2. From Monolithic to Customizable DBMS Architectures

Different kinds of architectures serve different purposes. The ANSI/SPARC architecture (Fig. 3) that characterizes classic database management systems (relational, object-oriented, XML) deployed on client-server architectures has evolved in parallel with the advances resulting from new application requirements, data volumes, and data models. The three-level-schema architecture reflects the different levels of abstraction of data in a database system, distinguishing: (i) the external schemata that users work with; (ii) the internal integrated schema of the entire database; (iii) the physical schema determining the storage and organization of databases on secondary storage.

Fig. 3: ANSI-SPARC DBMS architecture


The structure of a monolithic DBMS, shown in Fig. 4, comprises three key components: the storage manager, the transaction manager, and the schema manager.

Fig. 4: Classic DBMS functions

The evolution of devices with different physical capacities (i.e., storage, computing, memory), and of systems requiring data management functions, started to show that adding more and more functions to a monolithic DBMS does not work. Instead, it is attractive to consider the alternative of extensible DBMS, allowing functionality to be added or replaced in a modular manner, as needed.

While such a closely woven implementation provides good performance and efficiency, customization is an expensive and difficult task because of the dependencies among the different components. For example, changing the indexing or clustering technique employed by the storage manager, the instance adaptation approach employed by the schema manager, or the transaction model can have a large ripple effect on the whole system.

During the last twenty years, many extensions have been proposed to enhance DBMS functions in order to better match the evolution of user and application needs. Extensible and personalizable database systems were an attempt to ease the construction of DBMS by exploiting software reusability, proposing a general core that can be customized, extended, or even used to generate some DBMS parts. Trade-offs between modularity and efficiency, the granularity of services, and the number of inter-service relationships result in DBMS designs that lack customizability. A study of the standard task-oriented architecture of DBMS can be useful to determine its viability in new environments and for new applications. The following paragraphs give an overview of the main elements showing how DBMS have evolved, and should evolve, in order to address the scalability and performance requirements of data management.


Classic Functional Architecture

The classic DBMS architecture consists of a number of layers, each supporting a set of data types and operations at its interface. It consists of several components (modules or managers of concrete or abstract resources). The data types and operations defined for the modules of one layer are implemented using the concepts (data types and operations) of the next-lower layer. Therefore, the layered architecture can also be considered as a stack of abstract machines. The layered architecture model introduced by Härder and Reuter (1983) is composed of five layers:

  1. The uppermost layer supports logical data structures such as relations, tuples, and views. Typical tasks of this layer include query processing and optimization, access control, and integrity enforcement.
  2. The next layer implements a record-oriented interface. Typical entities are records and sets as well as logical access paths. Typical components are the data dictionary, transaction management, and cursor management.
  3. The middle layer manages storage structures (internal records), physical access paths, locking, logging, and recovery. Therefore, relevant modules include the record manager, physical access path managers (e.g., a hash table manager), and modules for lock management, logging, and recovery.
  4. The next layer implements (page) buffer management and implements the page replacement strategy. Typical entities are pages and segments.
  5. The lowest layer implements the management of secondary storage (i.e., maps segments, and pages to blocks and files).
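
The view of this stack as a sequence of abstract machines can be sketched in code. The following is an illustrative sketch, not taken from any concrete system: only three of the five layers appear (secondary storage, buffer management, and record management), and all class and method names, page sizes, and record layouts are invented for the example.

```python
# Illustrative sketch: each layer is implemented only in terms of the
# interface of the next-lower layer, as in the Härder/Reuter stack.

class SecondaryStorage:
    """Lowest layer: maps block numbers to blocks on 'disk'."""
    def __init__(self):
        self._blocks = {}

    def read_block(self, no):
        return self._blocks.get(no, b"")

    def write_block(self, no, data):
        self._blocks[no] = data


class BufferManager:
    """Page buffer layer with a trivial replacement policy (unbounded cache)."""
    def __init__(self, storage):
        self._storage = storage
        self._cache = {}

    def get_page(self, no):
        if no not in self._cache:               # page miss: fetch from storage
            self._cache[no] = self._storage.read_block(no)
        return self._cache[no]

    def put_page(self, no, data):
        self._cache[no] = data
        self._storage.write_block(no, data)     # write-through for simplicity


class RecordManager:
    """Storage-structure layer: fixed-size records packed into 64-byte pages."""
    RECORD_SIZE = 16

    def __init__(self, buffer_mgr):
        self._buffer = buffer_mgr

    def write_record(self, page_no, slot, payload):
        page = bytearray(self._buffer.get_page(page_no).ljust(64, b"\x00"))
        start = slot * self.RECORD_SIZE
        page[start:start + self.RECORD_SIZE] = payload.ljust(self.RECORD_SIZE, b"\x00")
        self._buffer.put_page(page_no, bytes(page))

    def read_record(self, page_no, slot):
        page = self._buffer.get_page(page_no)
        start = slot * self.RECORD_SIZE
        return page[start:start + self.RECORD_SIZE].rstrip(b"\x00")


records = RecordManager(BufferManager(SecondaryStorage()))
records.write_record(page_no=0, slot=1, payload=b"alice")
print(records.read_record(0, 1))  # b'alice'
```

The point of the sketch is the dependency direction: the record manager never touches blocks directly, and the buffer manager never interprets record layouts, so each layer can in principle be replaced behind its interface.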

Due to performance considerations, no concrete DBMS has fully obeyed the layered architecture. Note that different layered architectures, with different numbers of layers, have been proposed, depending on the desired interfaces at the top layer. If, for instance, only a set-oriented interface is needed, it is useful to merge the upper two layers. In practice, most DBMS architectures have been influenced by System R, which consists of two layers:

  • The relational data system (RDS), providing the relational data interface (RDI). It implements SQL (including query optimization, access control, triggers, etc.);
  • The relational storage system (RSS), supporting the relational storage interface (RSI). It provides access to single tuples of base relations at its interface.
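
The two-layer split can be sketched as a tuple-at-a-time storage interface (in the spirit of the RSI) underneath a set-oriented relational layer (in the spirit of the RDI). The interfaces below are invented and drastically simplified for illustration; they are not System R's actual interfaces.

```python
class RSS:
    """Storage layer: yields single tuples of base relations (RSI-style)."""
    def __init__(self, relations):
        self._relations = relations             # name -> list of dict tuples

    def scan(self, relation):
        # Tuple-at-a-time access: one tuple per step of the iterator.
        yield from self._relations[relation]


class RDS:
    """Relational layer: set-oriented operations built on the RSI (RDI-style)."""
    def __init__(self, rss):
        self._rss = rss

    def select(self, relation, predicate):
        # A set-oriented query is evaluated by repeated single-tuple access.
        return [t for t in self._rss.scan(relation) if predicate(t)]


db = RDS(RSS({"emp": [{"name": "a", "dept": 1}, {"name": "b", "dept": 2}]}))
print(db.select("emp", lambda t: t["dept"] == 1))  # [{'name': 'a', 'dept': 1}]
```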

Layered architectures were designed to address customizability, but they provide partial solutions at a coarse granularity. In the layered architecture, for example, the concurrency control components are spread across two different layers. Customization of the lock management or recovery mechanisms (residing in the lower layer) has a knock-on effect on the transaction management component (residing in the higher layer).

As we will discuss in the next sections, layered architectures are used by existing DBMS and have remained in use across the different generations of these systems. In each generation, layers and modules were implemented according to different paradigms (e.g., object, component, and service oriented), changing their granularity and the degree of transparency adopted for encapsulating the functions implemented by each layer.


OODBMS: Relaxing Data Management and Program Independence

The first evolution of DBMS came when the object-oriented (OO) paradigm emerged and the logical and physical levels started to converge in order to provide efficient ways of dealing with persistent objects. With the emergence of the OO paradigm, it became possible to develop applications requiring databases that could handle very complex data, evolve gracefully, and provide the high performance dictated by interactive systems. Database applications could be programmed with an OO language, and object persistence was then managed by an object-oriented DBMS (OODBMS). The OODBMS manifesto stated that persistence should be orthogonal, i.e., each object, independent of its type, is allowed to become persistent as such (i.e., without explicit translation). Persistence should also be implicit: the user should not have to explicitly move or copy data to make it persistent. This also implied that transparency was enforced regarding secondary storage management (index management, data clustering, data buffering, access path selection, and query optimization).

Extensible database systems allowed new parts such as abstract data types or index structures to be added to the system. Enhancing DBMS with new abstract data types (ADTs) or index structures was pioneered in the Ingres/Postgres systems. Ingres supports the definition of new ADTs, including operators. References to other tuples can be expressed through queries (i.e., the data type postquel), but otherwise ADTs, and their associated relations, still had to be in first normal form. This restriction was relaxed in systems that have a more powerful type system (e.g., an OO data model). Another area in which extensions have been extensively considered is index structures. In Ingres/Postgres, existing indexes (such as B-trees) can be extended to also support new types (or support existing types in a better way). To extend an index mechanism, new implementations of the type-specific operators of indexes have to be provided by the user. In this way, existing index structures were tailored to fit new purposes and have thus been called extended secondary indexes.
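
The underlying mechanism can be sketched as a registry of type-specific ordering operators that a generic ordered index consults. The registration API below is hypothetical and far simpler than the actual Postgres operator-class machinery; the "point" ADT ordered by squared distance from the origin is likewise an invented example.

```python
import bisect

# Hypothetical registry of type-specific ordering operators for new ADTs.
OPERATORS = {}

def register_operator(type_name, key_func):
    """Register the ordering key an index should use for a new ADT."""
    OPERATORS[type_name] = key_func


class OrderedIndex:
    """Generic B-tree-like ordered index, kept as a sorted list for brevity."""
    def __init__(self, type_name):
        self._key = OPERATORS[type_name]        # type-specific operator
        self._entries = []                      # (key, value) pairs, sorted

    def insert(self, value):
        bisect.insort(self._entries, (self._key(value), value))

    def range(self, low, high):
        keys = [k for k, _ in self._entries]
        lo = bisect.bisect_left(keys, low)
        hi = bisect.bisect_right(keys, high)
        return [v for _, v in self._entries[lo:hi]]


# Extend the index to a new 'point' ADT by registering its operator.
register_operator("point", lambda p: p[0] ** 2 + p[1] ** 2)

idx = OrderedIndex("point")
for p in [(3, 4), (1, 1), (0, 2)]:
    idx.insert(p)
print(idx.range(0, 5))  # points within squared distance 5 of the origin
```

The index code never changes when a new ADT is added; only a new operator is registered, which mirrors the idea of extended secondary indexes.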

This evolution responded to the need to provide flexibility at the logical level, adapting the physical level in consequence. The idea was to bring the three levels closer by offering ad hoc query facilities and letting applications define the way they navigate object collections; for instance, a graphical browser could be sufficient to fulfill this functionality. This facility could be supported by the data manipulation language or a subset of it.

As the architecture of the DBMS evolved with the emergence of new programming paradigms like components and services, and with "new" data models like documents (XML), the frontiers among the three levels started to become thinner, and transparency concerning persistence and transaction management became less important. Component-oriented middleware started to provide persistence services and transaction monitors as services that required programmers to configure and integrate these properties within the applications. Data and program independence was broken, but the ad hoc configuration of data management components or services seemed easier, since it was more efficient to personalize functions according to application needs.


Component-Oriented DBMS: Personalizing Data Management

Component awareness was a paradigm addressing reusability, separation of concerns (i.e., separation of functional from non-functional concerns), and ease of construction. Component-based systems are built by putting components together to form new software systems. Systems constructed by composition can be modified or extended by replacing or adding new components (Fig. 5).

Fig. 5: Component DBMS

Approaches to extending and customizing DBMS adopted the component-oriented paradigm for designing, as components, at least the customizable modules of the architecture. Plug-in components are added to a functionally complete DBMS to fulfill specialized needs. The components of component database management systems (CDBMS) are families of base and abstract data types or implementations of some DBMS function, such as new index structures. To date, all systems in this category are based on the relational data model and existing relational DBMS, and all of them offer some OO extensions. Example systems include IBM DB2 UDB (IBM 1995), Informix Universal Server (Informix 1998), Oracle8 (Oracle 1999), and Predator.

Furthermore, the customization approach employed by most commercial DBMS is still largely monolithic (to improve performance). Special points, similar to hot spots in OO frameworks, allow custom components to be incorporated into the DBMS. Examples of such components include Informix DataBlades, Oracle Data Cartridges, and DB2 Relational Extenders. However, customization in these systems is limited to the introduction of user-defined types, functions, triggers, constraints, indexing mechanisms, and predicates.
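
The "hot spot" style of customization can be sketched as a registration point through which a separately packaged plug-in (in the spirit of a DataBlade or cartridge) contributes user-defined functions to an otherwise closed engine. All names below are illustrative; no real product exposes exactly this interface.

```python
class DBMSCore:
    """Functionally complete engine with one hot spot: a UDF registry."""
    def __init__(self):
        self._udfs = {}

    def register_udf(self, name, func):
        """The hot spot: the only place where plug-ins may extend the core."""
        self._udfs[name] = func

    def evaluate(self, udf_name, *args):
        """Invoke a registered user-defined function, e.g., from a query."""
        return self._udfs[udf_name](*args)


# A 'spatial' plug-in, packaged and shipped separately from the core.
def install_spatial_extension(core):
    core.register_udf(
        "distance",
        lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5,
    )


engine = DBMSCore()
install_spatial_extension(engine)
print(engine.evaluate("distance", (0, 0), (3, 4)))  # 5.0
```

Note how the customization is confined to what the hot spot admits (here, new functions): the rest of the engine stays monolithic, which is exactly the limitation the text describes.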


Database Middleware

Another way of addressing DBMS "componentization" was to provide database middleware. Such middleware leaves data items under the control of their original (external) management systems while integrating them into a common DBMS-style framework. External systems exhibit, in many cases, different capabilities, such as query languages with varying power or no querying facilities at all. The different data stores might also have different data models (i.e., different data definition and structuring means), or no explicit data model at all. The goal of graceful integration is achieved through componentization.

The architecture introduces a common (intermediate) format into which the local data formats can be translated. Specific components perform this kind of translation. In addition, common interfaces and protocols define how the database middleware system and the components should interact (e.g., in order to retrieve data from a data store). These components (called wrappers) are also able to transform requests issued via these interfaces (e.g., queries) into requests understandable by the external system. In other words, these components implement the functionality needed to access data managed by the external data store. Examples of this approach include Disco, Garlic, OLE DB, Tsimmis, Harmony (which implemented the CORBA query service), and Sybase Adaptive Server Enterprise. Sybase allows access to external data stores (in Sybase called specialty data stores) and other types of database systems. ADEMS proposes mediation cooperative components or services that can broker and integrate data coming from heterogeneous sources. The cooperative brokers allow an extensible data mediation system to be built.
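
The wrapper pattern can be sketched as follows: every wrapper implements the middleware's common interface, translates each request into whatever the external store understands, and maps results back into the common (intermediate) format. The stores and interfaces below are invented for illustration and do not correspond to any of the systems named above.

```python
from abc import ABC, abstractmethod

class Wrapper(ABC):
    """Common interface the middleware sees for every external store."""
    @abstractmethod
    def select(self, predicate):
        """Return matching items as dicts, the common intermediate format."""

class CSVWrapper(Wrapper):
    """Wraps a store with no query facility at all: raw CSV lines."""
    def __init__(self, header, lines):
        self._header, self._lines = header, lines

    def select(self, predicate):
        # The wrapper itself parses and filters, since the store cannot.
        rows = [dict(zip(self._header, line.split(","))) for line in self._lines]
        return [r for r in rows if predicate(r)]

class KeyValueWrapper(Wrapper):
    """Wraps a store that only supports key-value access."""
    def __init__(self, kv):
        self._kv = kv

    def select(self, predicate):
        # Translate the common query into iteration over key-value pairs.
        return [{"key": k, "value": v} for k, v in self._kv.items()
                if predicate({"key": k, "value": v})]


sources = [CSVWrapper(["name", "city"], ["ada,london", "sam,paris"]),
           KeyValueWrapper({"ada": "gauss street"})]
# The middleware poses one common query to every wrapped source.
hits = [r for s in sources for r in s.select(lambda r: "ada" in r.values())]
print(hits)
```

Each store keeps its own data and capabilities; only the wrapper knows how to bridge them to the common framework, which is the "graceful integration" the text refers to.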


Configuring and Unbundling Data Management

Configurable DBMS rely on unbundled DBMS tasks that can be mixed and matched to obtain database support (see Fig. 6). The difference lies in the possibility of adapting the implementations of functions (called services) to new requirements, or of defining new services whenever needed. Configurable DBMS also consider services as unbundled representations of DBMS tasks. However, the models underlying the various services, which define the semantics of the corresponding DBMS parts, can now, in addition, be customized. Components for the same DBMS task can vary not only in their implementations of the same standardized interface, but also in their interfaces for the same task. DBMS implementors select (or construct) components implementing the desired functionality and obtain a DBMS by assembling the selected components. There are different approaches for configuring and composing unbundled DBMS services: kernel systems, customizable systems, transformational systems, toolkits, generators, and frameworks.

Fig. 6: Extensible DBMS

In principle, (internal) DBMS components are programmed and exchanged to achieve specific functionality in a different way than in the original system. A crucial element is the underlying architecture of the kernel and the proper definition of points where exchanges can be performed. An example of this kind of DBMS is Starburst: its query language can be extended by new operators on relations, and various phases of query processing are also customizable (e.g., functions are implemented using the interfaces of a lower layer (the kernel), sometimes using a dedicated language). GENESIS is a transformational approach that supports the implementation of data models as a sequence of layers. The interface of each layer defines its notions of files, records, and links between files. Transformations themselves are collected in libraries so that they can be reused for future layer implementations. Another transformational approach, using specification constructs similar to those of ACTA, has been described in EXODUS. EXODUS applies the idea of a toolkit for specific parts of the DBMS. A library is provided for access methods. While the library initially contains type-independent access methods such as B-trees, grid files, and linear hashing, it can also be extended with new methods. Other examples are the Open OODB (Open Object-Oriented Database) approach, Trent for the construction of transaction managers (mainly transaction structures and concurrency control), and "à la carte" for the construction of heterogeneous DBMS.

One problem in any toolkit approach is the consistency (or compatibility) of reused components. Generation approaches instead support the specification of (parts of) a DBMS functionality and the generation of DBMS components based on those specifications. A programmer defines a model (e.g., an optimizer, a data model, or a transaction model), which is given as input to a generator. The generator then automatically creates a software component that implements the specified model based on some implementation base (e.g., a storage manager or kernel in the case of data model software generation). An example of a generator system is the EXODUS query-optimizer generator. Volcano, the successor of the EXODUS optimizer generator, also falls into the group of generator systems. Volcano has been used to build the optimizer for Open OODB.
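
The generator idea can be sketched as a function that takes a declarative specification (here, a list of rewrite rules) and emits an optimizer component applying them to plan trees until a fixpoint. The rule format and plan representation are invented and far simpler than the EXODUS/Volcano rule languages.

```python
def generate_optimizer(rules):
    """The generator: given a rule specification, emit an optimizer component."""
    def optimizer(plan):
        changed = True
        while changed:                          # apply rules until fixpoint
            changed = False
            for rule in rules:
                rewritten = rule(plan)
                if rewritten is not None:
                    plan, changed = rewritten, True
        return plan
    return optimizer


# Plans are nested tuples: ("filter", pred, child) / ("scan", relation).
# One illustrative rule: merge two adjacent filters into a conjunction
# (applied at the root only, for brevity).
def merge_filters(plan):
    if (isinstance(plan, tuple) and plan[0] == "filter"
            and plan[2][0] == "filter"):
        p1, (_, p2, child) = plan[1], plan[2]
        return ("filter", f"({p1}) AND ({p2})", child)
    return None


optimize = generate_optimizer([merge_filters])
plan = ("filter", "a>1", ("filter", "b<2", ("scan", "emp")))
print(optimize(plan))  # ('filter', '(a>1) AND (b<2)', ('scan', 'emp'))
```

The generated optimizer is an ordinary software component; changing the optimization strategy means changing the specification and regenerating, not editing the engine, which is the essence of the generator approach.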

Systems like KIDS, Navajo, and Objectivity provide a modular, component-based implementation. For example, the transaction manager (or any other component) can be exchanged, with the ripple effect mainly limited to the glue code. However, to strike the right balance between modularity and efficiency, the design of the individual components is not highly modular. In fact, the modularity of the individual components is compromised to preserve both the modularity and the efficiency of the DBMS. Approaches like NODS proposed service-oriented networked systems at various granularities that cooperate at the middleware level. The NODS services could be customized on a per-application basis at a fine-grained level. For example, persistence could be configured at different levels (memory, cache, or disk), and it could cooperate with fault tolerance protocols to provide, for example, different levels of atomic persistent data management. Other frameworks for building query optimizers are the ones described in Cascades and EROC (Extensible Reusable Optimization Components). Framboise and ODAS are frameworks for layering active database functionality on top of passive DBMS.
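
Per-application configuration of a persistence service at different levels (memory, cache, or disk) can be sketched as follows. The service name and configuration interface are assumptions made for illustration; they are not the actual NODS API.

```python
# Hypothetical persistence service configurable per application:
# memory only, memory+cache, or memory+cache+disk.
class PersistenceService:
    LEVELS = ("memory", "cache", "disk")

    def __init__(self, level):
        if level not in self.LEVELS:
            raise ValueError(f"unknown persistence level: {level}")
        # Enable every tier up to and including the requested one.
        self.tiers = {t: {} for t in self.LEVELS[: self.LEVELS.index(level) + 1]}

    def put(self, key, value):
        for store in self.tiers.values():       # write through all enabled tiers
            store[key] = value

    def get(self, key):
        for store in self.tiers.values():       # read fastest tier first
            if key in store:
                return store[key]
        return None


volatile = PersistenceService("memory")         # e.g., a cache-style application
durable = PersistenceService("disk")            # e.g., atomic persistent data
durable.put("x", 1)
print(len(volatile.tiers), len(durable.tiers))  # 1 3
```

Each application gets exactly the persistence behavior it pays for, instead of the single policy a monolithic kernel would impose on everyone.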

However, customization at a finer granularity (i.e., at the level of the components forming the DBMS) is expensive. Such customization is cost-effective only if changes are localized without compromising system performance. Such performance can be ensured through closely woven components, i.e., both modularity and efficiency need to be preserved.


Summarizing Componentization of DBMS

CDBMS were successful because of the adoption of the notion of cartridge or blade by commercial DBMS. Other academic solutions were applied in some concrete validations. It is true that they enabled the configuration of the DBMS, but they still provided monolithic, complex, resource-consuming systems ("kernels") that need to be tuned and carefully managed to handle huge data volumes. These systems continued to encourage the classic conception of information systems, with clear and complete knowledge of the data they manage, with global constraints, and with homogeneous management of well-identified needs. Yet the evolution of technology, the production of data stemming from different devices and services, the access to non-curated continuous data collections, and the democratized access to continuous information (for example, in social networks) call for lightweight data management services delivered in ad hoc, personalized manners, and not only in full-fledged, one-size-fits-all systems like the (C)DBMS. Together with this evolution emerged the notion of service, aiming to ease the construction of loosely coupled systems. DBMS then started to move toward this new paradigm and were redefined as data management service providers.


Service-Oriented DBMS

Today the DBMS architecture has evolved toward the notion of a service-based infrastructure where services are adapted and coordinated to implement ad hoc data management functions (storage, fragmentation, replication, analysis, decision making, data mining). These functions are adapted and tuned for managing huge, distributed, multiform, multimedia data collections. Applications can extend the functionality of DBMS through specific tasks that have to be provided by the data management systems; these tasks are called services and allow interoperability between DBMS and other applications.

Fig. 7: Service-oriented DBMS

Subasu et al. propose a database architecture based on the principles of service-oriented architecture (SOA), as a system capable of handling different data types and of providing methods for adding new database features (see Fig. 7). The service-based data management system (SBDMS) architecture borrows the architectural levels from Härder and includes the new features and advantages introduced by SOA into the field of database architecture. It is organized into functional layers, each with specialized services for specific tasks.

Storage services work at the byte level, in very close collaboration with the file management functions of the operating system. These services have to handle the physical specifications of each non-volatile device. In addition, they provide services for updating existing data and finding stored data, propagating information from the access services layer to the physical level. Since different data types require different storage optimizations, special services are created to supply their particular functional needs. This layer is equivalent to the first and second layers of the five-layer architecture presented by Härder and Reuter.

Access services are in charge of the physical representation of data records and provide access path structures like B-trees. They provide more complex access paths, mappings, and particular extensions for special data models that are represented in the storage services layer. Moreover, this layer is responsible for sorting record sets, navigating through logical record structures, performing joins, and similar higher-level operations; it represents a key factor in database performance. The access services layer has functions comparable to those of the third and fourth layers as presented by Härder and Reuter.

Data services provide data represented in logical structures like tables or views; these are data structures without any procedural interface to the underlying database. The data services layer can be mapped to the non-procedural and algebraic access level in the architecture by Härder and Reuter.

Extension services let users design tailored extensions, for example by creating new services or reusing any available service from the other layers. These extensions help to manage different data types like XML files or streaming data. In this layer, users can integrate application-specific services in order to provide specific data types or specific functionalities needed by their applications (e.g., for optimization purposes).

A service-based DBMS externalizes the functions of the different system layers and enables the programming of personalized data-management-as-a-service systems. It makes it possible to couple the data model characteristics with well-adapted management functions that can themselves be programmed in an ad hoc manner. The DBMS remains a general-purpose system that can be personalized, thanks to service composition, to provide ad hoc data management. It is then possible to have services deployed in architectures that make them available to applications in a simple way (e.g., cluster, cloud).
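
Personalization through service composition can be sketched as a pipeline assembled from small, independently replaceable services. The service set and the composition operator below are illustrative, not taken from any concrete SBDMS.

```python
def compose(*services):
    """Assemble data management services into one ad hoc pipeline."""
    def pipeline(data):
        for service in services:
            data = service(data)
        return data
    return pipeline


# Illustrative services, each a small, independently replaceable function.
def dedup(records):
    """Remove duplicate records (same field/value pairs)."""
    return list({tuple(sorted(r.items())): r for r in records}.values())

def filter_by(field, value):
    """Service factory: keep only records where field == value."""
    return lambda records: [r for r in records if r.get(field) == value]

def project(*fields):
    """Service factory: keep only the named fields of each record."""
    return lambda records: [{f: r[f] for f in fields} for r in records]


# An application composes exactly the data management it needs.
manage = compose(dedup, filter_by("city", "paris"), project("name"))
data = [{"name": "sam", "city": "paris"}, {"name": "sam", "city": "paris"},
        {"name": "ada", "city": "london"}]
print(manage(data))  # [{'name': 'sam'}]
```

Another application would compose a different pipeline from the same service pool, which is what "personalized, thanks to service composition" amounts to in practice.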

As discussed before, the evolution of the DBMS architecture responds to the evolution of application requirements with regard to efficient management. With the emergence of the notion of service, the DBMS architecture has been "fragmented" into components and services that are deployed on distributed platforms such as the Web 2.0. Applications use different kinds of data that must be managed according to different purposes: some data collections are read oriented with few writes; other data is modified continuously and exploited by non-concurrent read operations. Some collections are shared and can support low consistency levels as long as they are available. Furthermore, such data is multiform and increasingly multimedia; it is modeled, or at least exchanged, as documents, particularly if it stems from the Web.

Requirements concerning data management performance versus volume, and the effort of constructing the data collections themselves, have determined the evolution of DBMS toward efficiency. The three-level architecture that encouraged program-data independence through series of transformations among layers seems inappropriate for fulfilling performance requirements. Current architectures are making the separation between levels thin, the principle being that the fewer transformations data must undergo, the more efficient data management functions are, particularly querying, accessing, and processing. It seems that the very principle of independence between programs and data management is a very expensive quality that is not worth paying for in certain situations.