6. Enabling Technologies

6.5. Linking Data to Publications

In data-dominated science, scientific communication is undergoing a significant change. Modern scientific communication should support references to data in the same way that researchers routinely provide bibliographic references to printed resources. The need to cite data is starting to be recognized as one of the key practices underpinning the recognition of data as a primary research output rather than a by-product of research. Data will thus become a first-class citizen of scientific communication. Linking scientific data to publications will produce significant benefits, as publications (i) facilitate data findability; (ii) facilitate data interpretability; and (iii) give the data author better credit for the data.

As a consequence, accessing a data set through a scientific publication will increase the usability of this data set.


Linked Open Data

The usability of scientific data could be greatly increased by adopting "Linked Data" technologies, as they provide a more generic and more flexible data publishing paradigm. This paradigm makes it easier for data producers to interconnect their data with data produced in other scientific disciplines, and for data consumers to discover and integrate data from large numbers of data sources. The term Linked Data refers to a set of best practices for publishing structured data on the Web. In particular, Linked Data provides (i) a unifying data model: Linked Data relies on the Resource Description Framework (RDF) as a single, unifying model; (ii) a standardized data access mechanism: Linked Data commits itself to a specific pattern of using the HTTP protocol; (iii) hyperlink-based data discovery: by using URIs as global identifiers for entities, Linked Data allows hyperlinks to be set between entities in different data sources; and (iv) self-descriptive data.
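
Points (i) to (iii) can be illustrated with a minimal Python sketch that uses only the standard library. It models RDF statements as (subject, predicate, object) triples, writes them in N-Triples syntax, and prepares (without sending) an HTTP request that asks for an RDF representation. The example.org and example.net URIs, the publication and data set they name, and the helper functions to_ntriples and rdf_request are hypothetical and serve only as illustration; the two predicates are taken from the Dublin Core terms vocabulary.

```python
from urllib.request import Request

DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical URIs, for illustration only: a publication held by one
# data source and a data set held by another.
PUBLICATION = "http://example.org/publications/42"
DATASET = "http://example.net/datasets/ocean-temperature"

# (i) Unifying data model: every statement is a (subject, predicate, object) triple.
triples = [
    (PUBLICATION, DCTERMS + "title", '"A study of ocean temperature"'),
    # (iii) Hyperlink-based discovery: the object is a URI in another data source.
    (PUBLICATION, DCTERMS + "references", DATASET),
]


def to_ntriples(statements):
    """Serialize triples as N-Triples: URIs in angle brackets, quoted literals as given."""
    lines = []
    for subject, predicate, obj in statements:
        rendered = obj if obj.startswith('"') else f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


def rdf_request(uri):
    """(ii) Standardized access: build an HTTP request for an RDF (Turtle) representation.

    The request is only constructed here, not sent.
    """
    return Request(uri, headers={"Accept": "text/turtle"})


print(to_ntriples(triples))
```

In this sketch the link from the publication to the data set is just another triple, so a consumer that understands RDF needs no source-specific logic to discover the data behind a publication.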

Linked Data has gained significant uptake in several scientific domains as a technology that makes it possible to connect the various data sets used by researchers in different domains, and to navigate along the RDF links between different scientific data sets as well as between publications and their supporting data.
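
Navigation along RDF links can be sketched as a simple graph traversal. The following Python fragment is an illustration under stated assumptions, not a description of any particular system: two independent data sources are simulated by in-memory dictionaries, all URIs are hypothetical, and the dereference function stands in for what would in practice be an HTTP lookup of each URI.

```python
from collections import deque

# Hypothetical in-memory stand-ins for two independent data sources.
# Each maps a subject URI to its outgoing (predicate, object) links.
SOURCES = {
    "http://example.org/": {
        "http://example.org/publications/42": [
            ("http://purl.org/dc/terms/references",
             "http://example.net/datasets/ocean-temperature"),
        ],
    },
    "http://example.net/": {
        "http://example.net/datasets/ocean-temperature": [
            ("http://purl.org/dc/terms/isPartOf",
             "http://example.net/collections/oceanography"),
        ],
    },
}


def dereference(uri):
    """Return the outgoing links of a URI from whichever simulated source hosts it."""
    for prefix, data in SOURCES.items():
        if uri.startswith(prefix):
            return data.get(uri, [])
    return []


def reachable(start):
    """Walk breadth-first along links and return the URIs in the order visited."""
    seen = [start]
    queue = deque([start])
    while queue:
        for _predicate, target in dereference(queue.popleft()):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


for uri in reachable("http://example.org/publications/42"):
    print(uri)
```

Starting from the publication, the walk crosses from the first source into the second by following a single link, which is the kind of navigation between publications and supporting data described above.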

A recent grassroots effort, Linked Open Data, aims to publish and interlink openly licensed data sets from different data sources as Linked Data on the Web.