Data Management Plans

Site: Saylor Academy
Course: BUS611: Data Management
Book: Data Management Plans

Description

Study the Data Management Plans section of the US Geological Survey. Notice the templates and examples that are provided. How would you evaluate these data management plans? How could you apply one of the templates to develop your own DMP for a situation from your professional experience?

Now that you can describe and explain DMP, let's cover a few careers within the data management field.


Introduction

Planning for a project involves making decisions about data resources and potential products. A Data Management Plan (DMP) describes the data that will be acquired or produced during research; how the data will be managed, described, and stored; what standards you will use; and how the data will be handled and protected during and after the completion of the project.



Source: USGS, https://www.usgs.gov/data-management/data-management-plans
This work is in the Public Domain.

Data Management Plan Checklist

USGS Data Management Plan Checklist

Effective October 1, 2016, all new USGS projects must include a Data Management Plan (DMP) (refer to SM 502.6) as part of the Project Work Plan (refer to SM 502.2), which is approved by the Science Center Director. This checklist provides guidance on what you must consider when developing a DMP at the onset of a project to satisfy USGS Fundamental Science Practices.

Note: Items shaded in gray (or with an asterisk at the beginning) may be more accurately described as the project evolves.


Plan

Provide the basic identification information for your project (e.g., project title, any project tracking numbers, point of contact).
Provide the time frame of the project and data collection activities (e.g., start and end dates for project and data collection periods).
Provide contact information for staff and partners involved in the project.
Identify who (the USGS, a partner, or a cooperator) has overall data management responsibility for project-related data acquisition, processing, quality control, documentation, and preservation.
* Provide your estimated budget for data management activities.
* If applicable, identify the data sharing agreement, Memorandum of Understanding, or Memorandum of Agreement that defines roles and responsibilities for data collection and/or sharing.
 

Acquire

Provide the basic identification information for each dataset (e.g., title, description, source, point of contact).
Describe the purpose of each dataset in context of the project.
Identify any inherent restrictions on use of any dataset.
* Identify the format of each dataset.
* Identify storage requirements for each dataset.


Process and Analyze

* Capture the data transformations, synthesis actions, or other processing steps to produce the datasets. If possible, use workflow software such as VisTrails or others.
* Describe technologies, capabilities, or models that will be used for data processing.
* For models, software, and code, list data inputs, data outputs, and calibration details.


Preserve

Document who has the responsibility for ensuring that data preservation is provided for all approved data releases.
* State what open data formats you plan to use when submitting your data for preservation.
* Include the estimated storage volume of the approved data releases.
* Identify where your approved data releases will be stored for long term preservation. State which trusted digital repository you plan to use. List any other websites that will provide the approved data, software, model, or code.

 

Publish/Share

Provide the preliminary identification information for the anticipated project publication(s) (e.g., title, description, list of authors).

Describe the anticipated format of each publication (e.g., publication series, data type, or model).
State how you plan to maintain and update detailed metadata records in FGDC or ISO XML standard formats with dataset(s).
Describe how (e.g., tool, responsible person) the Digital Object Identifier (DOI) number will be assigned to each approved data release.
Describe any inherent restrictions that will have to be imposed on the derived product based on use of proprietary data inputs or other factors.
 

Describe/Metadata

* Describe the tools or process that will be used to create metadata.
* Identify the person responsible for creating metadata files.

 

Manage Quality

* Document project team roles and operational procedures. Reference Science Center policy or other standard operating procedures if applicable.


Backup and Secure

* Identify the location of internal storage resources that provide replication and backup capability and that will be used to store acquired data during processing and analysis.
* Identify the contact person for the storage resource that will be used.
* If known, describe the records disposition schedule for the data.
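
The asterisk convention described in the note at the top of this checklist lends itself to simple tooling. As a minimal sketch (the function name parse_checklist and the dictionary keys are illustrative assumptions, not part of the USGS checklist), the following Python separates items that should be settled at the outset from items that may be described more accurately as the project evolves:

```python
def parse_checklist(lines):
    """Split checklist lines into items, flagging those marked with '*'.

    An item that starts with an asterisk is one the checklist says
    "may be more accurately described as the project evolves".
    """
    items = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank or whitespace-only lines
        refine_later = line.startswith("*")
        text = line.lstrip("*").strip()
        items.append({"text": text, "refine_later": refine_later})
    return items


# Three lines copied from the Acquire section of the checklist.
acquire_section = [
    "Identify any inherent restrictions on use of any dataset.",
    "* Identify the format of each dataset.",
    "* Identify storage requirements for each dataset.",
]

items = parse_checklist(acquire_section)
needed_at_outset = [i["text"] for i in items if not i["refine_later"]]
print(needed_at_outset)
```

A project team could use a structure like this to track which checklist items are still open at each DMP revision; how the results are stored or reported is left to the science center.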

Science Center Data Management Strategy

  A Template for Science Center Data Management Strategies (Version 1)

Preface

To achieve the USGS mission of providing scientific data that serves the nation, data management must go beyond the immediate needs of the scientific research project. USGS science centers must take additional steps to provide data to the public in useful forms and to preserve data and make it discoverable far into the future. These requirements are expressed in the USGS Science Data Lifecycle Model (see diagram in appendix section A and USGS OFR 2013-1265), the USGS Public Access Plan, and new chapters in the Survey Manual. USGS information will move into a more open and transparent world of scientific research as we address these new data management policies. The USGS is committed to meeting the following requirements by October 1, 2016:

  • Research data must be actively managed throughout the USGS Science Data Lifecycle Model: Plan, Acquire, Process, Analyze, Preserve, and Publish/Share.
  • An approved data management plan is required for every new research project.
  • Metadata records describing how the data were acquired, what steps were used to process the data, and what tools were used to analyze the data need to accompany all data.
  • The metadata must be submitted to the USGS Science Data Catalog.
  • Data must be approved for release, which requires review of both data and metadata. In many cases, approval of data for release is delegated to the Science Center Director.
  • The supporting digital data must be released, free of charge, in a machine-readable form at the same time as or prior to the publication of research results that are completely or partially funded by the USGS. Limited exceptions will be made in cases of security, privacy, confidentiality, and other legal constraints.
  • Released data must be identified and linked to the corresponding publications through use of Digital Object Identifiers (DOIs).
  • Data acquired by USGS must be stored and managed in one of the USGS trusted public repositories or offline archives.

Each USGS science center will need to implement data management practices that meet these new policies and are consistent with local research activities. A science center data management strategy is a plan to meet these goals. It serves as a foundation for decisions about staffing, training, tools, procedures, and supervision; a demonstration of commitment to complying with policy; and a guide for researchers. This template for science center strategies uses the USGS Science Data Lifecycle as a structure.

Roles and Responsibilities

This section of the strategy identifies roles and procedures that will be used in the science center for data management, organized using the elements of the data lifecycle. For each element, a Roles and Responsibilities Matrix summarizes major steps in the procedure and identifies who is involved in each step. The matrix uses the following RACI model (http://racichart.org/) to designate responsibilities:

  • R stands for Responsible – the person who will actually do the work; one individual per step.
  • A stands for Accountable – the person who approves the work and ensures that it is completed; zero or one per step.
  • C stands for Consulted – people who provide advice, assistance, or agreement before the step is completed; zero, one, or several per step.
  • I stands for Informed – people who must be told after the step is completed; zero, one, or several per step.
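
The per-step limits in the RACI definitions above can be checked mechanically once a matrix is filled in. The following Python is a minimal sketch under stated assumptions: the function name check_raci_row is hypothetical, each role column holds at most one letter, and the sample assignment is invented for illustration (the template itself leaves every cell blank for the science center to complete).

```python
from collections import Counter


def check_raci_row(assignments):
    """Return a list of rule violations for one process step.

    `assignments` maps a role title to "R", "A", "C", "I",
    or "" when the role is not involved in the step.
    """
    counts = Counter(code for code in assignments.values() if code)
    problems = []
    unknown = set(counts) - set("RACI")
    if unknown:
        problems.append(f"unknown codes: {sorted(unknown)}")
    if counts["R"] != 1:
        problems.append(f"expected exactly one R, found {counts['R']}")
    if counts["A"] > 1:
        problems.append(f"expected at most one A, found {counts['A']}")
    # C and I may appear zero, one, or several times, so they are not checked.
    return problems


# Hypothetical assignment for the step "Archive final DMP."
row = {
    "Approving Officials": "A",
    "Data Management Staff": "R",
    "IT Staff": "C",
    "Center Level Managers": "I",
    "Data Producers": "C",
    "Information Reviewers": "",
}
print(check_raci_row(row))
```

A row with two R entries, or with none, would be reported, which is a quick way to catch steps that nobody or more than one role has been assigned to carry out.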

The column headings used in the examples below are role titles defined in Appendix section C. Additional roles or variations in the definitions might be required to fit your science center staffing. To customize this section for your science center, identify the roles that apply to each lifecycle element and adjust the process steps to fit your research and management activities. USGS Operational Database standard operating procedures can be applied in addition to this section. Your science center staff might appreciate identification of the people or job titles that play each role.


Plan

Goal: To create a data management plan (DMP) that documents data management approaches, needed resources, and data outputs for a project.

Customize the roles and process steps that apply to your science center.

| Process Steps | Approving Officials (Job Titles) | Data Management Staff | IT Staff | Center Level Managers | Data Producers | Information Reviewers |
| Before finalizing a proposal for a new project, create and submit a project DMP for review. | | | | | | |
| Before project data release or end of fiscal year closeout revise project DMP. | | | | | | |
| Archive final DMP. | | | | | | |


Acquire, Process, and Analyze

Goals: To generate, collect, or evaluate for re-use the data inputs for a project. To convert input data to forms suitable for integration and analysis. To explore and interpret processed data to discover or produce scientific results, interpretations, and conclusions. Details on the Acquire, Process, and Analyze stages are collected in the project DMPs where these actions occur.

Customize the roles and process steps that apply to your science center.

| Process Steps | Approving Officials (Job Titles) | Data Management Staff | IT Staff | Center Level Managers | Data Producers | Information Reviewers |
| Conduct research activities that generate or collect data. | | | | | | |
| Process data to a version that is suitable for analysis and interpretation. | | | | | | |
| Analyze and interpret data. | | | | | | |


Backup and Secure

Goal: To ensure the integrity and availability of data, plans, documentation, software, products, and applications throughout the project.

Customize the roles and process steps that apply to your science center.

| Process Steps | Approving Officials (Job Titles) | Data Management Staff | IT Staff | Center Level Managers | Data Producers | Information Reviewers |
| Preserve raw data collected by the project. | | | | | | |
| Establish secure storage of working project data and derivatives. | | | | | | |


Describe

Goal: To compile information throughout the project and produce standards-based metadata records and documentation to accompany data collections and product releases.

Customize the roles and process steps that apply to your science center.

| Process Steps | Approving Officials (Job Titles) | Data Management Staff | IT Staff | Center Level Managers | Data Producers | Information Reviewers |
| Track details of data acquisition, processing, analysis, and interpretation. | | | | | | |
| Use a metadata tool to create standards-based metadata records. | | | | | | |


Manage Quality

Goal: To conduct and document data quality control throughout the project.

Customize the roles and process steps that apply to your science center.

| Process Steps | Approving Officials (Job Titles) | Data Management Staff | IT Staff | Center Level Managers | Data Producers | Information Reviewers |
| Examine data for quality and document procedure in metadata. | | | | | | |
| Conduct project team data reviews. | | | | | | |
| Review data and metadata during science center level project reviews. | | | | | | |


Preserve

Goal: To convert data, metadata, and ancillary products to sustainable formats and store them in a USGS Trusted Digital Repository. 

Customize the roles and process steps that apply to your science center.

| Process Steps | Approving Officials (Job Titles) | Data Management Staff | IT Staff | Center Level Managers | Data Producers | Information Reviewers |
| Evaluate and identify raw, intermediate, or final data products of long-term value. | | | | | | |
| Deposit identified data (raw, intermediate, final) to a repository appropriate for each category. | | | | | | |


Publish/Share

Goal: To review, approve, and release data consistent with the USGS Fundamental Science Practices. (Reference Appendix sections F and G for details on requirements for data.)

Customize the roles and process steps that apply to your science center.

| Process Steps | Approving Officials (Job Titles) | Data Management Staff | IT Staff | Center Level Managers | Data Producers | Information Reviewers |
| Compile data and metadata files for release. | | | | | | |
| Create IPDS record and update throughout process. | | | | | | |
| Reserve digital object identifier. | | | | | | |
| Review data. | | | | | | |
| Review metadata. | | | | | | |
| Revise data and metadata. | | | | | | |
| Create online content placeholder. | | | | | | |
| Approve release. | | | | | | |
| Make content public. | | | | | | |
| Activate digital object identifier. | | | | | | |


Infrastructure

This section of the strategy lists the tools and facilities supported or recommended by your science center for each stage of the data lifecycle. The template links to suggested options; customize the document by filling in options that fit your center's available resources and operating procedures.


Plan

There is no standard tool for creating data management plans. The science center's DMPs will need to be archived in a single location; if the selected DMP tool does not provide storage, specify the archive location in addition to the tool.

  • A list of tools can be found here:
    https://usgs.gov/products/data-and-tools/data-management/data-management-plans

| DMP Creation Tool(s) | URL | Contact |
| | | |

| DMP Archive(s) | URL | Contact |
| | | |


Acquire, Process, and Analyze

If the science center supports, provides assistance with, or requires the use of certain tools for data processing and analysis, list them below. Details on the Acquire, Process, and Analyze stages are collected in individual project DMPs where these actions occur.

| Software | Data Type | URL or Contact |
| | | |


Backup and Secure

Is there an internal network resource where data producers can store acquired data for processing and analysis that has replication and backup capability?  How is data organized on this resource? Is its use required, recommended, or optional?

 


Describe

Many metadata tools exist and any tool can be identified below as long as it produces CSDGM (FGDC) or ISO standard metadata records that can be validated. 

  • A list of metadata tools can be found at:
    https://usgs.gov/products/data-and-tools/data-management/metadata#tools
  • A list of validation tools can be found at:
    https://usgs.gov/products/data-and-tools/data-management/metadata#validating

| Metadata Tool(s) | URL | Contact |
| | | |


Preserve

What digital or physical repositories will preserve and curate science center data? Will all data go to a single archive, or will several be used based on data types? Be sure these will meet the USGS standard for trusted digital repositories and will maintain federal custody of the data.

  • A list of repositories can be found here:
    https://usgs.gov/products/data-and-tools/data-management/repositories
  • Guidance on repositories can be found at: https://my.usgs.gov/confluence/display/cdi/Trusted+Digital+Repository

| Storage Option(s) | URL or Physical Address | Contact |
| | | |


Publish/Share

ScienceBase should be a default option for science centers not having in-house solutions for data portals and code repositories or not solely contributing to an approved USGS database.  Models, software, and code can take advantage of common code repositories in lieu of or in addition to data portals.  (Reference the Approved Data Package Content in the Additional Options section to describe options for data release packages.)

  • A list of data portals can be found here:
    https://usgs.gov/products/data-and-tools/data-management/data-catalogs-and-portals

All USGS data releases require identifiers assigned using the USGS DOI Generator, which is online at https://www1.usgs.gov/csas/doi/. There are many options for assigning responsibility for DOI generation. This is an important role to clarify in the Publish/Share stage of the Roles and Responsibilities section, above.

| Portals and Repositories | URL | Contact |
| | | |

Additional Options

The following sections might be useful for further defining your science center's data management strategy.


Approved Data Package Content

Authors and center directors have discretion over which data to include in a data package. Customize the following tables to define what types of data to include or not include in a data package intended for public release. For more information, see http://internal.usgs.gov/fsp/faqs-basics.html.


Inclusion

Data that will be required to meet Fundamental Science Practice requirements and obtain approval for public release.  The following are considered required content. 

  • Data represented in a publication as the final analysis.
  • Models, software, or code that are the subject of the publication or were used to produce a final analysis.
  • Data Series, basic data sets, databases, and multimedia or motion graphics. This series can be used for videos, computer programs, and collections of digital photographs.

 

List data to include in the table below.

| Data Type | Process Stage |
| | |

Exclusion

Data that have security, privacy, confidentiality, or other legal constraints are never released to the public. Exclusion of raw or minimally processed data that are the foundation for interpreted data products is an option left to the discretion of the Science Center. For more information, see https://usgs.gov/products/data-and-tools/data-management/proprietary-and-sensitive-data.

 

Other excluded data include referenceable data, which should be identified in metadata and the references section of manuscripts:

  • Published and Citable Datasets and Models, Software, and Code (secondary data: data collected and previously published by another USGS or non-USGS source and used, with permission, in a USGS information product).
  • Unpublished Personal Correspondence Contributions (data collected by someone else, not publicly released by the data collector, and used, with permission, in a USGS information product).
  • Approved USGS Operational Database.

 

List any additional data to exclude in the table below.

| Data Type | Process Stage |
| | |

Provisional Data Release

Provisional Data release is subject to Fundamental Science Practice requirements that can be found at http://internal.usgs.gov/preview/fsp/toolbox/provisional_data_information_release.html.


Data Management Plan Elements

Data Management Plans are interactive documents that have a life cycle of their own throughout a project.  What elements are required to be updated at each project stage? 

| Project Stage | Required Elements |
| Proposal | |
| Project review | |
| Final archive | |


Appendix

A.  USGS Data Lifecycle Model


B.  Definitions

  • Approved Data - Those data that have USGS approval for release.
  • Approved USGS Operational Database – An online database, such as NWIS, that is approved for release of USGS data. These databases are under the care of data managers who assure the quality, integrity, and preservation of the data and provide appropriate metadata.
  • Data - Observations or measurements (unprocessed or processed) represented as text, numbers, or multimedia.
  • Data Lifecycle - The USGS Science Data Lifecycle Model is a structure of data management activities that relate to research project workflows, from conception through preservation and sharing. This structure is used to ensure that USGS data products will be well-described, preserved, accessible, and fit for re-use. For more information see USGS OFR 2013-1265 and www.usgs.gov/datamanagement/.
  • Data Management Plan (DMP) - A structured document that is submitted with a project proposal to summarize intentions and necessary resources for data management, then updated throughout the data lifecycle to serve as an official record of the data collected and how it has been managed.
  • Dataset - A structured collection of data.
  • Database - Datasets and other items stored together to serve one or more purposes or applications, often including data query or search and retrieval capabilities.
  • Fundamental Science Practices (FSP) – The set of USGS policies that govern the management and release of data as well as scientific publications. These Chapters of the Survey Manual can be found at www.usgs.gov/fsp/ and are enforced by the Office of Science Quality and Integrity.
  • Metadata - A structured, machine readable file that provides basic information about data (who, what, when, where, why, and how) that is essential to promote scientific collaboration; enable discovery, interpretation, and effective use of the data; and document its nature and quality. Current Approved Standards: FGDC Content Standard for Digital Geospatial Metadata or the International Organization for Standardization (ISO). Extensions to the standards exist, and those FGDC and ISO approved profiles or extensions that apply must be used.
  • Provisional Data - USGS data, such as real-time data or preliminary measurements that are permitted to be released prior to approval to meet an immediate need, with the stipulation that they are subject to revision. For more information about restrictions on release of preliminary data, see http://internal.usgs.gov/fsp/toolbox/provisional_data_information_release.html
  • Source Data - Primary or Secondary data used as input to produce products. Primary data is data measured or observed by the researcher, and is in a basic form that has been calibrated, converted to standard units, and has passed quality control procedures that remove or flag incorrect data. Secondary data is defined as data collected by someone other than the user.
  • USGS Data Portal - A USGS maintained data storage system that can ensure the long-term preservation, discoverability, accessibility, and usability of USGS data that is released to the public.
  • USGS Trusted Digital Repository - A USGS storage system that meets the standards at https://my.usgs.gov/confluence/display/cdi/Trusted+Digital+Repository.


C. Roles

  • Generic Classifications:
    1. Approving Officials - Science Center Directors (or their designees) and Bureau Approving Officials in the OSQI, who collaborate with authors, mission area managers, and others as needed regarding review and approval of scientific data. They have latitude in determining what is needed to uphold USGS standards for data quality, including ensuring the necessary reviews are obtained and the method of release is appropriate.
    2. Data Management Staff - The assigned or designated individuals, teams, or organizations that are responsible for stewarding scientific data through the release process using designated tools for creation of metadata and Digital Object Identifiers (DOIs) and USGS data portals. They collaborate with their mission area Science Center Directors, managers, supervisors, and scientists in the conduct of their data stewardship activities and interact with USGS data portals and other technical infrastructure for preservation of data.
    3. IT Staff - The assigned or designated individuals, teams, or organizations that are responsible for maintaining website servers, USGS data portals, and other technical infrastructure for access, discovery, and preservation of data.
    4. Center Level Managers - The assigned or designated individuals who oversee project operations and are responsible for understanding data management requirements and providing projects with guidance to ensure compliance.
    5. Data Producers - USGS scientists and authors who ensure that data are in a non-proprietary, publicly available format and that sufficient metadata records and Digital Object Identifiers (DOIs) are created for each data, software, and other information product they produce in accordance with requirements. This includes ensuring that the appropriate metadata review, peer review, editorial review, and approval for products they produce are obtained.
    6. Information Reviewers - The assigned or designated individuals, teams, or organizations with the skills necessary to accurately review data and metadata for products produced by USGS scientists and authors.
  • Specific Classifications:
    1. Branch, Project, or Section Chiefs and Supervisors - Persons who oversee projects within a center.
    2. Bureau Approving Official (BAO) - A person who works for the Office of Science Quality and Integrity and is responsible for ensuring that our science center complies with FSP policies and for approval of publications that contain new interpretive content.
    3. Data Manager - Coordinates data governance and data stewardship activities, oversees data management projects, and supervises data management activities.
    4. Data Quality Specialist - A person who can review data for publications.
    5. Data Steward - A person knowledgeable in a particular area or topic who is assigned accountability for data specifications and data quality for a specific project or dataset.
    6. Database Administrator - For NWIS and other USGS database applications, an IT professional responsible for the installation, configuration, upgrade, administration, monitoring, maintenance, security, and backup of databases.
    7. DOI Manager - A person who creates and updates Digital Object Identifier numbers.
    8. IPDS Manager - A person who creates and monitors IPDS records, answers publications process questions and inquiries, and guides scientists through IPDS routing and requirements.
    9. Metadata Specialist - A person who can provide metadata training and review metadata for publications. This requires running metadata validation software and knowledge of XML file structure.
    10. PI/Project Chief/Researcher - A person responsible for a project and its resulting publications. These people are identified in BASIS+ workplans.
    11. Science Center Director - The science center director is responsible for approving the release of data that are not considered new interpretive, and for determining which data releases and publications must be approved by the BAO. This staff member can also be responsible for overall planning and management of research activities at the science center.
    12. Science Center Web Staff - A person responsible for project pages and data release on web pages.
    13. USGS Data Portal Manager - A person who manages dataset organization and permissions of a data portal, oversees the process that ensures the quality of data added to the database or service, routine data reviews, and documentation of methods and procedures.


D. Public Release of Data Packages

The section below describes the classification of data releases and associated requirements. There is nothing to fill out below; the Science Center is only required to acknowledge these requirements.

  • Approved (Data Release, Models, Software, Code)
    1. All data and models, software, or code intended for public release must meet USGS FSP review, approval, and release requirements. These requirements are a minimum of two reviews that include one data review and one metadata review followed by Bureau approval documented in IPDS in addition to any related written publication reviews and approval.
    2. Special Considerations:
      1. "New Interpretive" information requires BAO approval.
      2. Software, models, and code products do not require FGDC XML metadata files. Meta information should be documented within the code.
      3. Must include appropriate disclaimer (https://www2.usgs.gov/fsp/fsp_disclaimers.asp#1).

| DMP | Metadata | Access |
| YES | YES | USGS Data Portal |

 

  • Provisional
    1. Emergency and non-emergency provisional data are those data (such as real-time data, preliminary measurements) that are subject to revision, and may be released prior to approval to meet an immediate need.
    2. Special Considerations:
      1. Must include appropriate disclaimer: https://www2.usgs.gov/fsp/fsp_disclaimers.asp#11
      2. Emergency provisional data does not require finalized metadata

| DMP | Metadata | Access |
| YES | YES | Personal communication, websites |

 

  • USGS Operational Database Collections
    1. Data collections that are part of USGS-supported database operations (e.g., NWIS, Borehole LogArchiver, and Biodata).
    2. Special Considerations:
      1. Follow other Science Center and database SOP's as authoritative procedural documentation and methods of metadata creation.

| DMP | Metadata | Access |
| NO | NO | USGS Operational Database |

 

  • Unpublished USGS Operational Database Parameters
    1. Data that are collected and entered into a Bureau database, but not approved for release through the database because of conflict with USGS science community data standards.
    2. Special Considerations:
      1. Follow other Science Center and database SOP's as authoritative procedural documentation in addition to maintaining a DMP and metadata.

| DMP | Metadata | Access |
| YES | YES | USGS Data Portal |

 

E.  Data Not Suitable for Public Release

The section below describes the classifications of data not suitable for data release and associated requirements. There is nothing to fill out below; the Science Center is only required to acknowledge these requirements.

  • Restricted Data
    1. Proprietary or sensitive data collected or purchased by a data producer on behalf of the USGS.
    2. Storage and Access:
      1. The source data should be stored on an encrypted device with backup capability to preserve the information.
      2. The source data must not be released to the public through any USGS data portal, FTP, or website.
      3. Derivative products to which the proprietary and sensitive constraints no longer apply can be released in the final data package for publication, must have a DOI, and must use the Science Center's designated USGS Data Portal as the primary public access location.
    3. Special Considerations:
      1. The proprietary or sensitive data may be such that the data management staff does not have privileges to handle the data.
      2. The source data will not be approved for public release, but data producers and Science Center Directors are responsible for meeting preservation requirements.

| DMP | Metadata | Access |
| YES | YES | NONE |

 

  • Raw Data
    1. Data that is collected and remains unprocessed or unverified, often requiring data producer interpretation to create meaningful information.
    2. Storage and Access:
      1. Raw data is required to be stored on the Science Center's internal network storage location or a USGS data portal if access controls are provided to restrict public access.
      2. Field note submission to National Archives and Records Administration (NARA) may apply to meet archival requirements.
    3. Special Considerations:
      1. The data will not be approved for public release, but data producers and Science Center Directors are responsible for meeting preservation requirements.

| DMP | Metadata | Access |
| YES | YES | LOCAL |
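
The DMP, Metadata, and Access values listed for each classification in Appendix sections D and E can be gathered into a single lookup. The Python below is a minimal sketch: the names Requirements and REQUIREMENTS are illustrative assumptions, while the values are copied from the six classifications above. It does not capture the special considerations (for example, that emergency provisional data do not require finalized metadata).

```python
from typing import NamedTuple


class Requirements(NamedTuple):
    dmp: bool        # is a DMP required?
    metadata: bool   # is metadata required?
    access: str      # access location as stated in the appendix


# Values transcribed from Appendix sections D and E.
REQUIREMENTS = {
    "Approved": Requirements(True, True, "USGS Data Portal"),
    "Provisional": Requirements(True, True, "Personal communication, websites"),
    "USGS Operational Database Collections": Requirements(False, False, "USGS Operational Database"),
    "Unpublished USGS Operational Database Parameters": Requirements(True, True, "USGS Data Portal"),
    "Restricted Data": Requirements(True, True, "NONE"),
    "Raw Data": Requirements(True, True, "LOCAL"),
}

# Classifications for which the appendix does not require a DMP.
no_dmp = [name for name, req in REQUIREMENTS.items() if not req.dmp]
print(no_dmp)
print(REQUIREMENTS["Raw Data"].access)
```

Keeping the values in one place makes it easy to see that only USGS Operational Database Collections are exempt from the DMP and metadata requirements, because those collections follow database standard operating procedures instead.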