Read this article. Be sure you can explain the methods (approach) for extracting data based on usability.
3. The Conceptual Foundations of Data Reusability
3.1. Relational Thinking
The definition of data usability assumes that the two entities, data author and data user, are neatly separated from one another, and it treats the properties attached to these entities as independent of the relationships in which they exist and interact. It therefore tends to reify the attributes of these entities by detaching them from their scientific context. This often happens because substantive attributes are easier to identify or more convenient to count, and so are assumed to be more concrete or "real" than relational attributes. However, such a substantialist approach is not appropriate for addressing data reusability issues. We think that an approach based on relational thinking is more appropriate. By relational thinking, we mean a loosely structured framework or scaffold around which various practice theories and methods are being developed.
In relational thinking found in practice theory, subjects, social groups, networks, or even artifacts develop their properties only in relation to other subjects, social groups, or networks. Social objects derive their significance from the relations that link them, rather than from the intrinsic features of individual elements.
A dataset cannot be understood and used in isolation, and cannot be transferred from one scientific context to another without changes to its properties. Relational thinking entails that a dataset produced by one community of practice for use by another must be endowed with properties (auxiliary information) that take into account the characteristics of the "usability relation" between the two communities.
Several kinds of usability relations can be established between two communities of practice. For example, a "confirmation relation" is established when the consumer community tries to confirm some scientific expectation by gathering enough evidence from a dataset produced by the producer community. Another kind of usability relation is the "reproduction/verification relation", established when the consumer community tries to reproduce or verify a scientific result by using a dataset produced by the producer community. A third kind is the "discovery relation", established when the consumer community tries to discover new insights from a dataset produced by the producer community.
Therefore, to make a dataset (re)usable by another community of practice, the community that produces it must complement it with appropriate metadata. The properties of this metadata (provenance, context, quality, uncertainty, etc.) depend heavily on the "usability relation" established between the producer and consumer communities of practice.
Thus, if a dataset is to be used by different communities of practice, each must be provided with different metadata, depending on the characteristics of the "usability relation" that links the producer community with it. For example, for one consumer community it could be enough to know who produced a given dataset, and where and when; for another, it could be important to know how the dataset was produced.
As a consequence, a data producer community of practice must define metadata models based on the usability relations established between this community and the communities that consume the data produced by it.
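As an illustration of a metadata model driven by usability relations, the following minimal Python sketch maps each kind of relation to the metadata categories a consumer community would receive. The three relation names and the four metadata categories come from the text above; the specific assignment of categories to relations, and the names `REQUIRED_METADATA`, `metadata_view` and `missing_metadata`, are assumptions made only for this example.

```python
from enum import Enum


class UsabilityRelation(Enum):
    """The kinds of usability relation named in the text."""
    CONFIRMATION = "confirmation"
    REPRODUCTION = "reproduction/verification"
    DISCOVERY = "discovery"


# Hypothetical assignment of metadata categories to relations.
# A real producer community would define this mapping itself.
REQUIRED_METADATA = {
    UsabilityRelation.CONFIRMATION: {"provenance", "quality"},
    UsabilityRelation.REPRODUCTION: {"provenance", "context", "quality", "uncertainty"},
    UsabilityRelation.DISCOVERY: {"provenance", "context"},
}


def metadata_view(metadata: dict, relation: UsabilityRelation) -> dict:
    """Return only the metadata categories relevant to the given relation."""
    wanted = REQUIRED_METADATA[relation]
    return {key: value for key, value in metadata.items() if key in wanted}


def missing_metadata(metadata: dict, relation: UsabilityRelation) -> list:
    """List the categories the relation calls for that the producer has not supplied."""
    return sorted(REQUIRED_METADATA[relation] - set(metadata))


# Example metadata record (all values are invented placeholders).
record = {
    "provenance": {"who": "Lab A", "where": "Site 1", "when": "2020-05-01"},
    "quality": "validated against reference samples",
}

confirmation_view = metadata_view(record, UsabilityRelation.CONFIRMATION)
reproduction_gaps = missing_metadata(record, UsabilityRelation.REPRODUCTION)
```

Here the same record is sufficient for a consumer in a confirmation relation, while `reproduction_gaps` shows which categories the producer would still have to supply for a consumer who wants to reproduce a result.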
Relational thinking makes it possible to choose and organize the metadata information so as to overcome the semantic and pragmatic boundaries between communities of research and, thus, increase the understandability and reusability of the data.
In order to apply the relational thinking approach to improving data reusability, we have to characterize the "usability relation" between data producer and data consumer communities. In particular, we have to consider:
(a) Differences characterizing the relation. A first characterization entails delineating the differences associated with a "usability" relation between the data author and data user. It would be important, for example, to be able to characterize the differences in the knowledge and perspectives of a data author and a data user when working in the context of a multidisciplinary/interdisciplinary collaborative research activity.
(b) Dependencies characterizing the relation. A second characterization entails delineating the dependencies associated with a "usability" relation between the data producer and data consumer. The knowledge developed by the data producer is not inconsequential to the data consumer but develops in dependence on the perspectives promoted by the data consumer. It would be important to be able to delineate the dependencies that characterize the "usability" relation between data producer and data consumer.
(c) Changes characterizing the relation. Differences and dependencies characterizing a "usability" relation between data producer and data consumer change over time. We must assume that the "usability" relation undergoes continuous refinement and/or revision through interactions between the two entities (data author, data user).
The above characterizations should be taken into consideration when defining metadata models. They should guide the definition of good quality metadata models (purpose-oriented, community-specific) that can increase data reusability.
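The three characterizations (differences, dependencies, changes) can be sketched as a small data structure that a producer community might keep alongside its metadata model. This is only one possible representation, under the assumption that differences and dependencies are recorded as free-text statements and that change over time is captured as a list of successive snapshots; the class names, field names and example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RelationSnapshot:
    """Differences and dependencies of a usability relation at one point in time."""
    differences: List[str]
    dependencies: List[str]


@dataclass
class UsabilityRelationProfile:
    """A usability relation between a producer and a consumer community."""
    producer: str
    consumer: str
    # Successive snapshots model characterization (c): the relation changes over time.
    history: List[RelationSnapshot] = field(default_factory=list)

    def revise(self, differences: List[str], dependencies: List[str]) -> None:
        """Record a refined or revised characterization of the relation."""
        self.history.append(RelationSnapshot(list(differences), list(dependencies)))

    @property
    def current(self) -> Optional[RelationSnapshot]:
        """The most recent characterization, or None if none has been recorded."""
        return self.history[-1] if self.history else None


# Invented example communities and statements.
profile = UsabilityRelationProfile(producer="field ecologists", consumer="climate modellers")
profile.revise(
    differences=["different spatial resolution conventions"],
    dependencies=["consumers need sampling protocol details"],
)
profile.revise(
    differences=["different spatial resolution conventions", "different uncertainty vocabulary"],
    dependencies=["consumers need sampling protocol details"],
)
```

Keeping earlier snapshots rather than overwriting them lets the producer community see how its understanding of the relation was refined, which is the information a revision of the metadata model would draw on.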
As the quality of metadata is probably the most important factor that determines the reusability of data, we can affirm that the relational thinking approach is decisive in achieving a good level of data reusability.