5. Data Publishing: A Process for Bridging the Gap between Data Author and Data User

An emerging approach in scientific communication is Data Publication. By Data Publication, we mean a process that allows the research community to discover, understand, and make assertions about the trustworthiness and fitness for purpose of the data. In addition, it should allow those who create data to receive academic credit for their work. The ultimate aim of Data Publication is to make scientific data available for reuse both within the original disciplines and in the wider community. Many of the issues regarding data availability and usability can be addressed if the principles of publication, rather than those of sharing, are applied. The Data Publication approach imitates the publication of scholarly literature and generally emerges from the culture of academic research and scholarly communication.

The Data Publication process should perform the following main functions:

  • Data Peer-Reviewing

The purpose of peer review is to ensure a certain level of data quality. A peer-reviewed dataset can be considered to have been through a process of scientific quality assurance.

  • Data Registration

The purpose of registration is to make data citable as a unique piece of work and to support claims of precedence for a scholarly finding. Registration should also facilitate data discoverability.

  • Data Semantic Enrichment

The purpose of semantic enrichment is to make data understandable. To this end, the published data should be endowed with appropriate discipline-specific metadata.

  • Data Archiving

The purpose of archiving is to preserve data over time.

  • Awareness

Publishing data allows scholars to remain aware of new claims and findings.

  • Rewarding

The purpose of rewarding is to bring scholarly credit to the data authors.

In short, Data Publication is a process that guarantees the "right to know" of scholars and the "right to be known" of data authors.

Data Publication is enabled by data curation and data stewardship, two fields of practice of paramount importance for data (re)usability.

Data curation is the active and ongoing management of data throughout its entire lifecycle of interest and usefulness to scholarship. Data curation activities maintain data quality, add value, and provide for reuse over time; they include authentication, archiving, management, preservation, and representation. Curation, in essence, is concerned with the availability and future use of data, including the enhancement, extension, and improvement of data for reuse beyond a single scientific community.

Data stewardship is concerned with the management of shared data collections and is essential to their preservation and persistence. Stewardship is the process of overseeing and enforcing these management and preservation activities in accordance with the policies defined by the owners of the data collections. In practice, stewardship is often primarily an administrative workflow.

Data curation focuses on the "interest and usefulness" of data to scholarship and thus, in essence, addresses the data quality criterion of relevance, whereas data stewardship is mainly concerned with data trustworthiness.

Both data curation and data stewardship serve the critical function of helping users gain confidence in the usability of data, judged against various quality criteria, and are thus instrumental to Data Publication. In fact, they are the two pillars on which Data Publication rests.

An instrument that effectively supports the Data Publication process, and therefore data reusability, is the Data Management Plan (DMP). A DMP is a formal document that states what data will be created and how, and outlines the plans for sharing and preservation. It also states any restrictions that may need to be applied to the collected data. All data organizations that maintain data collections, as well as all research projects that create data collections, must have a DMP.
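To make the elements of a DMP concrete, the following is a minimal sketch that models a plan as a record with one field per element named above (what data, how it is created, sharing, preservation, restrictions) and flags required sections left empty. The class name, field names, the `missing_sections` helper, and the sample values are all hypothetical and for illustration only; they do not follow any particular funder's or repository's DMP template.

```python
from dataclasses import dataclass, field


@dataclass
class DataManagementPlan:
    """Hypothetical DMP record; field names are illustrative, not a standard schema."""
    data_description: str = ""   # what data will be created
    creation_method: str = ""    # how the data will be created
    sharing_plan: str = ""       # how the data will be shared
    preservation_plan: str = ""  # how the data will be preserved over time
    restrictions: list[str] = field(default_factory=list)  # restrictions on the data, if any


# Sections every plan is assumed to fill in; restrictions may legitimately be empty.
REQUIRED_SECTIONS = ("data_description", "creation_method", "sharing_plan", "preservation_plan")


def missing_sections(dmp: DataManagementPlan) -> list[str]:
    """Return the names of required sections that are empty or whitespace-only."""
    return [name for name in REQUIRED_SECTIONS if not getattr(dmp, name).strip()]


if __name__ == "__main__":
    # Hypothetical draft plan with the sharing section not yet written.
    draft = DataManagementPlan(
        data_description="Survey responses (hypothetical example)",
        creation_method="Online questionnaire",
        sharing_plan="",
        preservation_plan="Deposit in a disciplinary repository",
        restrictions=["Embargo until the main article is published"],
    )
    print(missing_sections(draft))  # ['sharing_plan']
```

Treating restrictions as optional reflects the text's wording that a DMP states restrictions only where they "may need to be applied"; the other four elements are treated as mandatory in this sketch.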