7. Standards

Standards play a crucial role in increasing data understandability and reusability, and standardization activities characterize the different phases of the scientific data life-cycle. Several activities aim at defining and developing standards for representing scientific data, i.e., standard data models; standards for querying data collections/databases, i.e., standard query languages; standards for modeling domain-specific metadata information, i.e., metadata standards; standards for identifying data, i.e., data identification standards; standards for creating a common understanding of a domain-specific data collection, i.e., standard domain-specific ontologies/taxonomies and lexicons; and standards for facilitating the transfer of data between domains, i.e., standard transportation protocols, among others.

Considerable effort has been devoted to creating metadata standards for different research communities. Metadata standards vary in specificity, structure, and maturity, largely because each standard has been developed to meet the needs of a particular user community.

Given the plethora of standards that now exist, some attention should be directed to creating crosswalks or maps between the different standards.
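To make the idea of a crosswalk concrete, the following is a minimal sketch in Python, assuming two hypothetical metadata schemas, A and B. The field names, the CROSSWALK table, and the apply_crosswalk helper are all invented for illustration and do not correspond to any published standard or official mapping.

```python
# Hypothetical crosswalk from schema A to schema B.
# All field names are invented for illustration only.
CROSSWALK = {
    "title": "dataset_name",
    "creator": "author",
    "date": "collection_date",
}


def apply_crosswalk(record, crosswalk):
    """Translate a metadata record from schema A to schema B.

    Returns the translated record together with the list of source
    fields that have no counterpart in the target schema.
    """
    translated = {}
    unmapped = []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            # No equivalent in schema B: report it rather than drop it silently.
            unmapped.append(field)
        else:
            translated[target] = value
    return translated, unmapped


record_a = {
    "title": "Sea surface temperature",
    "creator": "J. Doe",
    "instrument": "buoy",
}
record_b, unmapped = apply_crosswalk(record_a, CROSSWALK)
```

Returning the unmapped fields explicitly reflects the fact that standards differ in specificity: a crosswalk is rarely lossless, and a field with no counterpart in the target standard should be visible to whoever reuses the data.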

In, standardization is considered to be particularly important for the reuse of data across distance, where distance arises whenever data are used outside their original context. The word distance is subject to a variety of interpretations. Most commonly, it refers to something outside the local sphere of activity; an example is the gap between the assumptions and methods of one discipline and those of another. Distance can also exist within a community, for reasons such as personal or institutional status, subspecialty, or epistemological view. Finally, distance can be understood in a temporal sense: for example, there can be a time lag between the original data collection and reuse.

Standards are important because they can help to span all kinds of distance (spatial, temporal, cultural, etc.): they can transform local knowledge into public knowledge and thereby prevent epistemological differences arising from distance from leading to different interpretations of the same data.