3. The Conceptual Foundations of Data Reusability

3.2. Knowledge Boundaries

All research activities, whether experimental, observational, or computational, produce not only scientific data but also a rich body of knowledge. This knowledge is of two types: explicit knowledge and tacit knowledge.

In order to make scientific data effectively reusable, the underpinning explicit and tacit knowledge also has to be made reusable. Knowledge reuse refers to transferring and reusing existing knowledge bases in significantly different contexts. Ideally, both types of knowledge would be handled as a commodity that can be extracted, represented, and packaged within a given context (data producer context) and then transferred to and easily inserted into another context (data user context).

This means that both types of knowledge should be part of the data publishing process, that is, the process through which scientific data are made sharable and usable.

The difficulty in making knowledge reusable lies in the fact that what is codified in one discipline may not be understood by those in other fields, because of its intellectual content and the amount of background needed.


3.2.1. Explicit Knowledge Reuse

By explicit knowledge, we mean knowledge that can be encoded in some language and exchanged between distributed research teams.

Building a knowledge base also implies endowing it with a number of components that help to generate knowledge from it. Part of the complexity of reusing knowledge stems from the multiple components of knowledge that should be reused. In fact, making a knowledge base reusable implies that these components should also be reusable. Among these components, we identify two that are particularly important:

Reusable Lexicons: In building a knowledge base, an important step is the establishment of the domain of discourse. It consists of identifying the objects in the world about which an inference engine will reason and the set of linguistic terms, each with a precise and invariant meaning, by which both the engine and the users will refer to those objects. A lexicon is reusable if it contains a set of reusable terms. By reusable terms, we mean that an equivalence can be established between these terms and the terms of other lexicons.

Reusable Ontologies: In many cases, it is important to share more than a common vocabulary: it is also necessary to specify the relationships among the objects in the world to which the terms refer, and to understand how classes of objects can be defined and which rules allow the assignment of individual objects to particular classes. In essence, it is necessary to create ontologies. A domain-specific ontology is reusable if it can be aligned with other domain-specific ontologies (see Section 6.2).
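The notion of term equivalence between lexicons can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two lexicons, their terms, the equivalence table, and the helper functions reusable_terms and translate_record are hypothetical and do not come from any existing system. The sketch only shows how, once equivalences are declared, a data record expressed in the producer's terms could be re-expressed in the consumer's terms.

```python
# Hypothetical lexicons of a data producer and a data consumer
# (illustrative terms only, not taken from any real vocabulary).
producer_lexicon = {"sea_surface_temp", "salinity", "sampling_depth"}
consumer_lexicon = {"SST", "salinity_psu", "depth_m", "chlorophyll"}

# Assumed equivalence table: producer term -> consumer term.
equivalences = {
    "sea_surface_temp": "SST",
    "salinity": "salinity_psu",
    "sampling_depth": "depth_m",
}


def reusable_terms(source, target, mapping):
    """Return the source terms that have a declared equivalent in target."""
    return {term for term in source if mapping.get(term) in target}


def translate_record(record, mapping):
    """Rename the keys of a record using the equivalence table.

    Keys without a declared equivalent are kept unchanged.
    """
    return {mapping.get(key, key): value for key, value in record.items()}


if __name__ == "__main__":
    print(sorted(reusable_terms(producer_lexicon, consumer_lexicon, equivalences)))
    print(translate_record({"sea_surface_temp": 18.4, "station": "A1"}, equivalences))
```

A flat one-to-one table is the simplest possible choice. It does not capture relationships among objects or class membership rules, which is precisely what an ontology, and the alignment of ontologies, would add.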


3.2.2. Tacit Knowledge

By tacit knowledge, we mean knowledge that is confined within specific practices and interpersonal exchanges and bound up with a set of communications, tools, etc. The main characteristic of this type of knowledge is its embeddedness. This characteristic of tacit knowledge makes its codification in some language very difficult.

In order to make tacit knowledge reusable, we must transform it into "mobile knowledge", that is, knowledge that can be codified in some language and easily transported or translated from one working context to another.

Unfortunately, several difficult problems hinder the transformation of tacit knowledge into mobile knowledge.

The main conceptual problem is how to transform knowledge that is embedded within highly specific scientific domains into mobile knowledge that can cross several scientific domains. The literature in many scientific fields addresses the tension between rich knowledge that is embedded in interpersonal contexts, and the need to make knowledge mobile when it must be shared and reused by distributed teams of researchers. Factors that can influence the effectiveness and efficiency of the knowledge transformation from tacit to mobile include:

(i) the characteristics of the knowledge,

(ii) the functionality of the data and communications infrastructures that support the data publishing process and the mobilization of knowledge, and

(iii) the characteristics of the working contexts involved in a distributed collaborative effort (data producer context, data consumer context).

The mobile knowledge derived from tacit knowledge is highly contextualized. Therefore, in order to make the mobile knowledge shareable among different teams, it is essential to create an interpretative context shared by all the actors involved in collaborative efforts.

The concept of the "boundary object" has been proposed as a conceptual framework within which knowledge can be embodied, mobilized, and shared. Boundary objects are those objects that both inhabit several communities of practice and satisfy the informational requirements of each of them. Boundary objects are thus both plastic enough to adapt to the local needs and constraints of the several parties employing them and robust enough to maintain a common identity across sites. They are weakly structured in common use and become strongly structured in individual-site use. These objects may be abstract or concrete. Such objects have different meanings in different social worlds, but their structure is common enough to more than one world to make them recognizable and thus a means of translation.

Boundary objects could play a key role in the successful translation of knowledge between different communities. Unfortunately, boundary objects are not well understood or easily identified, so their use as a translation tool is not widely implemented.