Discussion

In this section we compare and discuss the approaches presented in the previous sections in terms of the two perspectives that guide this survey: Data Modeling and Data Analytics. Each perspective defines a set of features used to compare Operational Databases, DWs and Big Data approaches among themselves.

Regarding the Data Modeling Perspective, Table 2 considers the following features of analysis: (1) the data model; (2) the abstraction level in which the data model resides, according to the abstraction levels (Conceptual, Logical and Physical) of the database design process; (3) the concepts or constructs that compose the data model; (4) the concrete languages used to produce the data models and that apply the previous concepts; (5) the modeling tools that allow specifying diagrams using those languages and; (6) the database tools that support the data model. Table 2 presents the values of each feature for each approach. It is possible to verify that the majority of the data models are at a logical and physical level, with the exception of the ER model and the OLAP cube model, which are more abstract and defined at conceptual and logical levels. It is also possible to verify that Big Data has more data models than the other approaches, what can explain the work and proposals that have been conducted over the last years, as well as the absence of a de facto data model. In terms of concepts, again Big Data-related data models have a more variety of concepts than the other approaches, ranging from key-value pairs or documents to nodes and edges. Concerning concrete languages, it is concluded that every data model presented in this survey is supported by a SQL-DDL-like language. However, we found that only the operational databases and DWs have concrete languages to express their data models in a graphical way, like Chen's notation for ER model, UML Data Profile for Relational model or CWM for multidimensional DW models. Also, related to that point, there are none modeling tools to express Big Data models. Thus, defining such a modeling language and respective supporting tool for Big Data models constitute an interesting research direction that fills this lack. At last, all approaches have database tools that support the development based on their data models, with the exception of the ER model that is not directly used by DBMSs.

Table 2. Comparison of the approaches from the Data Modeling Perspective.

Approaches

Features

Data Model

Abstraction Level

Concepts

Concrete

Languages

Modeling

Tools

DB Tools

Support

Operational

Entity-

Relationship Model

Conceptual, Logical

Entity

Relationship

Attribute

Primary Key

Foreign Key

Chen's, Crow's foot, Bachman's, Barker's, IDEF1X

Sparx Enterprise Architect,

Visual Paradigm,

Oracle Designer, MySQL Workbench,

ER/Studio

Relational Model

Logical, Physical

Table

Row

Attribute

Primary Key

Foreign Key,

View,

Index

SQL-DDL, UML Data Profile

Sparx Enterprise Architect,

Visual Paradigm,

Oracle Designer, MySQL Workbench,

ER/Studio

Microsoft SQL Server, Oracle, MySQL,

PostgreSQL,

IBM DB2

Decision Support

OLAP

Cube

Conceptual, Logical

Dimensions, Levels, Cube faces, Time dimension,

Local dimension

Common Warehouse Metamodel

Essbase Studio Tool,

Enterprise Architect,

Visual Paradigm

Oracle Warehouse Builder, Essbase Studio Tool,

Microsoft Analysis Services

Star

Schema

Logical, Physical

Fact table, Attributes table, Dimensions,

Foreign Key

SQL-DDL, DML, UML Data Model Profile, UML Profile for Modeling Data Warehouse Usage

Enterprise Architect,

Visual Paradigm,

Oracle SQL Data Modeler

Microsoft SQL Server, Oracle, MySQL,

PostgreSQL,

IBM DB2

Big Data

Key-Value

Logical, Physical

Key,

Value

SQL-DDL,

Dynamo Query Language

Dynamo,

Voldemort

Document

Logical, Physical

Document,

Primary Key

SQL-DDL, Javascript

MongoDB,

CounchDB

Wide-Column

Logical, Physical

Keyspace, Table, Column,

Column Family,

Super Column, Primary Key,

Index

CQL, Groovy

Cassandra,

HBase

Graph

Logical, Physical

Node,

Edge,

Property

Cypher Query Language, SPARQL

Neo4j,

AllegroGraph


On the other hand, in terms of the Data Analytics Perspective, Table 3 considers six features of analysis: (1) the class of application domains, which characterizes the approach suitability; (2) the common operations used in the approach, which can be reads and/or writes; (3) the operations types most typically used in the approach; (4) the concrete languages used to specify those operations; (5) the abstraction level of these concrete languages (Conceptual, Logical and Physical); and (6) the technology support of these languages and operations.

Table 3 shows that Big Data is used in more classes of application domains than the operational databases and DWs, which are used for OLTP and OLAP domains, respectively. It is also possible to observe that operational databases are commonly used for reads and writes of small operations (using transactions), because they need to handle fresh and critical data in a daily basis. On the other hand, DWs are mostly suited for read operations, since they perform analysis and data mining mostly with historical data. Big Data performs both reads and writes, but in a different way and at a different scale from the other approaches. Big Data applications are built to perform a huge amount of reads, and if a huge amount of writes is needed, like for OLTP, they sacrifice consistency (using "eventually consistency") in order to achieve great availability and horizontal scalability. Operational databases support their data manipulation operations (e.g. select, insert or delete) using SQL-ML, which has slight variations according to the technology used. DWs also use SQL-ML through the select statement, because their operations (e.g. slice, dice or drill down/up) are mostly reads. DWs also use SQL-based languages, like MDX and XMLA (XML for Analysis), for specifying their operations. On the other hand, regarding Big Data technologies, there is a great variety of languages to manipulate data according to the different class application domains. All of these languages provide equivalent operations to the ones offered by SQL-ML and add new constructs for supporting both ETL, data stream processing (e.g. create stream, window) [34] and MapReduce operations. It is important to note that concrete languages used in the different approaches reside at logical and physical levels, because they are directly used by the supporting software tools.