This paper explores what a Database Management System (DBMS) suited to the future may look like, based on issues that can be seen today as well as emerging trends, and discusses how such a system may be built. An apt example is a system that allows efficient and continuous querying and mining of data flows and that can be deployed on media with different computing capacities. Human-to-machine communication and interoperability play a central role here: as processes grow more complex and storage facilities become more distributed, even individual embedded medical devices may need to be included in a DBMS. We identify key aspects of DBMS that could benefit future architectures.
1. Introduction
Database management systems (DBMS) emerged as a flexible and cost-effective solution to the information organization, maintenance, and access problems found in organizations (e.g., business, academia, and government). DBMS addressed these problems by (i) providing data models and reliable long-term data storage capabilities, and (ii) providing retrieval and manipulation facilities over stored data for multiple concurrent users or transactions. The concept of a data model (most notably the relational model proposed by Codd, and the object-oriented data models described by Cattell and Barry), the Structured Query Language (SQL; Melton and Simon), and the concept of a transaction (Gray and Reuter) are crucial ingredients of successful data management in current enterprises.
Today, the data management market is dominated by major object-relational database management systems (OR-DBMS) like Oracle, DB2, and SQL Server. These systems arose from decades of corporate and academic research pioneered by the creators of System R and Ingres.
Fig. 1 Historical outline of DBMS
Since their emergence, innovations and extensions have been proposed to enhance DBMS in power, usability, and spectrum of applications (see Fig. 1).
The introduction of the relational model, prevalent today, addressed the shortcomings of earlier data models. Subsequent data models, in turn, were relegated or became complementary to the relational model. Further developments focused on transaction processing and on extending DBMS to support new types of data (e.g., spatial, multimedia), data analysis techniques and systems (e.g., data warehouses, OLAP systems, data mining). The evolution of data models and the consolidation of distributed systems made it possible to develop mediation infrastructures that enable transparent access to multiple data sources through querying, navigation and management facilities. Examples of such systems are multi-databases, data warehouses, Web portals deployed on Internet/Intranets, and polyglot persistence solutions. Common issues tackled by such systems are: (i) how to handle the diversity of data representations and semantics; (ii) how to provide a global view of the structure of the information system while respecting access and management constraints; (iii) how to ensure data quality (i.e., freshness, consistency, completeness, and correctness).
In addition, the Web of data has led to Web-based DBMS and XML data management systems serving as pivot models for integrating data and documents (e.g., active XML). The emergence of the Web marked a turning point, since attention turned to vast amounts of new data outside of the control of a single DBMS. This resulted in an increased use of data integration techniques and exchange data formats such as XML. Despite its relative youth, the Web itself has evolved significantly, resulting in a feedback loop between database and Web technologies, whereby both depart from their traditional dwellings into new application domains.
Figure 2 depicts the major shifts in the use of the Web. A first phase saw the Web as a means to facilitate communication between people, in the spirit of traditional media and mainly through email. Afterward, the WWW made available vast amounts of information in the form of HTML documents, which can be regarded as people-to-machine communication. Advances in mobile communication technologies extended this notion to mobile devices and a much larger number of users, and the emergence of IoT environments has increased the number of data producers to thousands of devices. A more recent trend, exemplified by Web Services, later Semantic Web Services, and Cloud computing, consists of machine-to-machine communication. Thus, the Web has also become a platform for applications and a plethora of devices to interoperate, share data and resources.
Fig. 2 Shifts in the use of the Web
The most recent milestones in data management (cf. Fig. 1) have addressed data streams, leading to data stream management systems (DSMS). In addition, mobile data providers and consumers have led to data management systems dealing with mobile queries and mobile objects. Finally, the last seven years concern challenges introduced by the XXL phenomenon, including the volume of data to be managed (i.e., Big Data), which has turned research attention back toward DBMS architectures (cloud, in-memory, and GPU DBMSs), data collection construction, and parallel data processing. In this context, Big Data processing must profit from available computing resources by applying parallel execution models, thereby achieving results in "acceptable" times.
Relational queries are ideally suited to parallel execution because they consist of uniform operations applied to uniform streams of data. Each operator produces a new relation, so the operators can be composed into highly parallel dataflow graphs. By streaming the output of one operator into the input of another operator, the two operators can work in series giving pipelined parallelism. By partitioning the input data among multiple processors and memories, an operator can often be split into many independent operators, each working on a part of the data. This partitioned data and execution leads to partitioned parallelism that can exploit available computing resources.
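The two forms of parallelism described above can be sketched in a few lines. The following is a minimal illustration, not a real query engine: pipelined parallelism is modeled with generators composed in series, and partitioned parallelism with the same selection/projection applied independently to data partitions. The relation, predicate, and partition count are hypothetical.

```python
# Sketch of pipelined vs. partitioned parallel evaluation of a
# relational selection + projection (hypothetical data and operators).
from concurrent.futures import ThreadPoolExecutor

rows = [{"id": i, "price": i * 10} for i in range(100)]  # toy relation

# Pipelined parallelism: each operator streams its output into the
# next, so selection and projection work in series, row by row.
def select(rows, pred):
    for r in rows:
        if pred(r):
            yield r

def project(rows, cols):
    for r in rows:
        yield {c: r[c] for c in cols}

pipelined = list(project(select(rows, lambda r: r["price"] > 500), ["id"]))

# Partitioned parallelism: split the input among workers, run the same
# operator independently on each partition, then merge the results.
def run_partition(part):
    return [{"id": r["id"]} for r in part if r["price"] > 500]

parts = [rows[i::4] for i in range(4)]  # 4 round-robin partitions
with ThreadPoolExecutor(max_workers=4) as ex:
    partitioned = [r for chunk in ex.map(run_partition, parts) for r in chunk]

# Both plans compute the same answer; only the execution strategy differs.
assert sorted(r["id"] for r in pipelined) == sorted(r["id"] for r in partitioned)
```

A real engine would partition by hash or range on a join/grouping key so that operators such as joins can also be split; the round-robin split here only illustrates the principle.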
In this context, we can identify three major aspects that involve database and Web technologies, and that are crucial for satisfying the new information access requirements of users: (i) a large number of heterogeneous data sources accessible via standardized interfaces, which we refer to as data services (e.g., Facebook, Twitter); (ii) computational resources supported by various platforms that are also publicly available through standardized interfaces, which we call computation services (e.g., the Amazon EC2 service); (iii) mobile devices that can both generate data and be used to process and display data on behalf of the user.
The new DBMS aims to fulfill the requirements of ambient applications, data curation and warehousing, scientific applications, and online games, among others. Therefore, future DBMS must address the following data management issues:
- Data persistence for managing distributed storage spaces delivered by different providers (e.g., Dropbox, OneDrive, and Google Drive); efficiently and continuously ensuring data availability using data sharing and duplication; and ensuring data usability and data migration onto new storage hardware.
- Efficient and continuous querying and mining of data flows. These are complex processes requiring huge amounts of computing resources. They must be designed, implemented and deployed on well-adapted architectures such as the grid and the cloud, but also on sensor networks and mobile devices with different physical capacities (i.e., computing and storage capacity).
- Querying services that can implement evaluation strategies able to:
- (i) process continuous and one-shot queries that include spatiotemporal elements and nomadic sources;
- (ii) deal with exhaustive, partial and approximate answers;
- (iii) use execution models that consider accessing services as data providers and that include as a source the wisdom of the crowd;
- (iv) integrate "cross-layer" optimization that includes the network and the enabling infrastructure as part of the evaluation process, and that can use dynamic cost models based on execution, economic, and energy costs.
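A continuous query of the kind listed above can be sketched as an operator that emits one answer per arriving stream item. The following is a minimal sketch under simplifying assumptions (an in-memory stream, a fixed sliding window, and hypothetical sensor readings); real DSMSs must also handle out-of-order arrivals, load shedding, and approximate answers.

```python
# Sketch of a continuous query: a sliding-window average over a stream.
from collections import deque

def continuous_avg(stream, window=3):
    """Yield the running average over the last `window` items."""
    buf = deque(maxlen=window)  # old items are evicted automatically
    for item in stream:
        buf.append(item)
        yield sum(buf) / len(buf)

readings = [10, 20, 30, 40]  # hypothetical sensor temperatures
answers = list(continuous_avg(readings, window=3))
# One answer per arriving item; the first answers are partial,
# computed before the window has filled.
```

Unlike a one-shot query, the operator never terminates on an unbounded stream: each new item refines or replaces the previous answer, which is what makes windowed, partial, and approximate answers natural in this setting.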
The DBMS of the future must also enable the execution of algorithms and of complex processes (scientific experiments) that use huge data collections (e.g., multimedia documents, complex graphs with thousands of nodes). This calls for a thorough revision of the hypotheses underlying the algorithms, protocols and architectures developed for classic data management approaches. In the following sections we discuss some of these issues, focusing mainly on the way data management services of different granularities are delivered by different DBMS architectures. Section 2 analyses the evolution of DBMS architectures, from monolithic to customizable systems. Section 3 discusses how scalability and extensibility properties can affect system performance. Section 4 presents important issues and challenges concerning Big Data management through the description of Big Data stacks, Big Data management systems and environments, and data cloud services. Section 5 discusses open issues on DBMS architectures and how they deliver their functions for fulfilling application requirements. Section 6 describes some perspectives.