1. Introduction

New high-throughput scientific instruments, telescopes, satellites, accelerators, supercomputers, sensor networks, and running simulations are generating massive amounts of scientific data. These massive datasets, often referred to as a data deluge, are revolutionizing the way research is carried out and have given rise to a fourth paradigm of science based on data-intensive computing. This data-dominated science will lead to a data-centric way of thinking about, organizing, and conducting research. It could open new approaches to problems previously considered extremely hard or even impossible to solve, and it could also lead to serendipitous discoveries. Today, one of the main challenges researchers face is making the best use of the world's growing wealth of data.

By data (re)usability, we mean the ease with which one or more research communities (consumer communities) can use data produced by other research communities (producer communities) for legitimate scientific research. In particular, we use the term data reusability for the ease of using data collected for one purpose to study a new problem, that is, the reutilization of existing datasets in significantly different contexts. Data reusability is becoming a distinct characteristic of modern scientific practice. It enables the reanalysis of evidence and the reproduction and verification of results, minimizes duplication of effort, and allows researchers to build on the work of others.

Data (re)usability can be effectively implemented within the Open Science framework, as the ultimate goal of Open Science is to make research data publicly available and (re)usable. The European Commission is moving decisively towards implementing an Open Science framework in Europe. In 2012, through a Recommendation, it encouraged all European Union (EU) Member States to put publicly funded research results in the public sphere in order to make science better and strengthen their knowledge-based economies. A more recent document, The Amsterdam Call for Action on Open Science, advocates "full open access for all scientific publications" and endorses an environment in which data sharing and stewardship are the default approach for all publicly funded research. This document was produced at an Open Science meeting organized by the Dutch Presidency of the Council of the European Union (4–5 April 2016).

Another European Commission initiative worth mentioning is the publication of the Guidelines on FAIR Data Management in Horizon 2020, which set out guiding principles for making data Findable, Accessible, Interoperable, and Reusable.

Data reusability has four main dimensions: policy, legal, economic, and technological. A legal and policy framework should favor the open availability of scientific data and allow jurisdictional boundaries to be overcome. The economic dimension concerns how the costs of making scientific data reusable are distributed among the stakeholders. Technology should render physical and semantic barriers irrelevant. In this paper, we concentrate on the technological dimension of data reusability.

The paper is organized as follows. Section 2 describes the research-data universe, which comprises different types of research data, different kinds of data collections, many data actors, and different data uses. Section 3 introduces the conceptual foundations of data reusability, i.e., relational thinking, knowledge boundaries, data abstraction/levellism, and representation. Section 4 discusses the barriers that hamper data reuse. Section 5 describes the data publication process, which spans the distance between the data author and the data user. Section 6 briefly describes the technologies that enable this process. Section 7 stresses the important role of standards in making data usable. Finally, Section 8 summarizes the main points to consider when addressing the pressing need to reuse large datasets produced by research communities.