Data Modeling and Data Analytics: Discussion | Saylor Academy

Discussion

In this section we compare and discuss the approaches presented in the previous sections in terms of the two perspectives that guide this survey: Data Modeling and Data Analytics. Each perspective defines a set of features used to compare Operational Databases, DWs and Big Data approaches among themselves.

Regarding the Data Modeling Perspective, Table 2 considers the following features of analysis: (1) the data model; (2) the abstraction level in which the data model resides, according to the abstraction levels (Conceptual, Logical and Physical) of the database design process; (3) the concepts or constructs that compose the data model; (4) the concrete languages used to produce the data models and that apply the previous concepts; (5) the modeling tools that allow specifying diagrams using those languages and; (6) the database tools that support the data model. Table 2 presents the values of each feature for each approach. It is possible to verify that the majority of the data models are at a logical and physical level, with the exception of the ER model and the OLAP cube model, which are more abstract and defined at conceptual and logical levels. It is also possible to verify that Big Data has more data models than the other approaches, what can explain the work and proposals that have been conducted over the last years, as well as the absence of a de facto data model. In terms of concepts, again Big Data-related data models have a more variety of concepts than the other approaches, ranging from key-value pairs or documents to nodes and edges. Concerning concrete languages, it is concluded that every data model presented in this survey is supported by a SQL-DDL-like language. However, we found that only the operational databases and DWs have concrete languages to express their data models in a graphical way, like Chen's notation for ER model, UML Data Profile for Relational model or CWM for multidimensional DW models. Also, related to that point, there are none modeling tools to express Big Data models. Thus, defining such a modeling language and respective supporting tool for Big Data models constitute an interesting research direction that fills this lack. At last, all approaches have database tools that support the development based on their data models, with the exception of the ER model that is not directly used by DBMSs.

Table 2. Comparison of the approaches from the Data Modeling Perspective.

Approaches Features	Data Model	Abstraction Level	Concepts	Concrete Languages	Modeling Tools	DB Tools Support
Operational	Entity- Relationship Model	Conceptual, Logical	Entity Relationship Attribute Primary Key Foreign Key	Chen's, Crow's foot, Bachman's, Barker's, IDEF1X	Sparx Enterprise Architect, Visual Paradigm, Oracle Designer, MySQL Workbench, ER/Studio
Operational	Relational Model	Logical, Physical	Table Row Attribute Primary Key Foreign Key, View, Index	SQL-DDL, UML Data Profile	Sparx Enterprise Architect, Visual Paradigm, Oracle Designer, MySQL Workbench, ER/Studio	Microsoft SQL Server, Oracle, MySQL, PostgreSQL, IBM DB2
Decision Support	OLAP Cube	Conceptual, Logical	Dimensions, Levels, Cube faces, Time dimension, Local dimension	Common Warehouse Metamodel	Essbase Studio Tool, Enterprise Architect, Visual Paradigm	Oracle Warehouse Builder, Essbase Studio Tool, Microsoft Analysis Services
Decision Support	Star Schema	Logical, Physical	Fact table, Attributes table, Dimensions, Foreign Key	SQL-DDL, DML, UML Data Model Profile, UML Profile for Modeling Data Warehouse Usage	Enterprise Architect, Visual Paradigm, Oracle SQL Data Modeler	Microsoft SQL Server, Oracle, MySQL, PostgreSQL, IBM DB2
Big Data	Key-Value	Logical, Physical	Key, Value	SQL-DDL, Dynamo Query Language		Dynamo, Voldemort
	Document	Logical, Physical	Document, Primary Key	SQL-DDL, Javascript		MongoDB, CounchDB
	Wide-Column	Logical, Physical	Keyspace, Table, Column, Column Family, Super Column, Primary Key, Index	CQL, Groovy		Cassandra, HBase
	Graph	Logical, Physical	Node, Edge, Property	Cypher Query Language, SPARQL		Neo4j, AllegroGraph

On the other hand, in terms of the Data Analytics Perspective, Table 3 considers six features of analysis: (1) the class of application domains, which characterizes the approach suitability; (2) the common operations used in the approach, which can be reads and/or writes; (3) the operations types most typically used in the approach; (4) the concrete languages used to specify those operations; (5) the abstraction level of these concrete languages (Conceptual, Logical and Physical); and (6) the technology support of these languages and operations.

Table 3 shows that Big Data is used in more classes of application domains than the operational databases and DWs, which are used for OLTP and OLAP domains, respectively. It is also possible to observe that operational databases are commonly used for reads and writes of small operations (using transactions), because they need to handle fresh and critical data in a daily basis. On the other hand, DWs are mostly suited for read operations, since they perform analysis and data mining mostly with historical data. Big Data performs both reads and writes, but in a different way and at a different scale from the other approaches. Big Data applications are built to perform a huge amount of reads, and if a huge amount of writes is needed, like for OLTP, they sacrifice consistency (using "eventually consistency") in order to achieve great availability and horizontal scalability. Operational databases support their data manipulation operations (e.g. select, insert or delete) using SQL-ML, which has slight variations according to the technology used. DWs also use SQL-ML through the select statement, because their operations (e.g. slice, dice or drill down/up) are mostly reads. DWs also use SQL-based languages, like MDX and XMLA (XML for Analysis), for specifying their operations. On the other hand, regarding Big Data technologies, there is a great variety of languages to manipulate data according to the different class application domains. All of these languages provide equivalent operations to the ones offered by SQL-ML and add new constructs for supporting both ETL, data stream processing (e.g. create stream, window) [34] and MapReduce operations. It is important to note that concrete languages used in the different approaches reside at logical and physical levels, because they are directly used by the supporting software tools.

Course Introduction

Course Syllabus

Unit 1: Defining the Business Objective and Sourcing Data

1.1: Data Analysis Processes

Lifecycle of a Data Analysis Project

The Market Research Process

1.2: Data Analysis Business Objectives

Big Data Stream Analytics for Sentiment Analysis

Data Modeling and Data Analytics

1.3: Data Collection and Gathering Best Practices

Using BI and Decision-Making Process in Start-ups

Unit 1 Study Resources

Unit 1 Review Video

Unit 1 Review Slides

Study Guide: Unit 1

Unit 1 Assessment

Unit 1 Assessment

Unit 2: Data Analysis

2.1: Data Analysis Methods and Models

Introduction to Analytics

The Stages of Analytics Development

Quantitative Methods

Qualitative Methods

Quantitative and Qualitative Data

Statistical Language

The Difference between Qualitative and Quantitative

Qualitative and Quantitative Research

Data-Driven Decisions

Research Design

2.2: Synthesizing Data Findings

Measures of the Center of the Data

Frequency, Frequency Tables, and Levels of Measurement

Frequency Tables

Unit 2 Study Resources

Unit 2 Review Video

Unit 2 Review Slides

Study Guide: Unit 2

Unit 2 Assessment

Unit 2 Assessment

Unit 3: Visualization Principles and Processes

3.1: Visualization Concepts and Definitions

Data Visualization

Why Is Data Visualization Important?

Presenting Data in Meaningful and Interesting Ways

3.2: Interactive Visualizations and Dashboards

Data Visualization

Interactive Visualization of Refugee Demographics in the U.S.

3.3: Challenges in Visualization

Visualization in Exploratory Data Analysis

Visualizing Big Data with Augmented and Virtual Reality

Describing Data

Unit 3 Study Resources

Unit 3 Review Video

Unit 3 Review Slides

Study Guide: Unit 3

Unit 3 Assessment

Unit 3 Assessment

Unit 4: Visualization Tools and Techniques

4.1: Number Representation

Visual Aids

Using PowerPoint with Excel

Using Charts with Word and PowerPoint

Describing Data

4.2: Formatting and Organizing Data

Best Visualization Practices

4.3: Selecting Visual Representations

Visualization Tools

Visualization Thought Process

4.4: Representing Data Values

Presenting Data with Graphs and Tables

4.5: Coordinating Data Positions and Scales

Improving Visualizations

Unit 4 Study Resources

Unit 4 Review Video

Unit 4 Review Slides

Study Guide: Unit 4

Unit 4 Assessment

Unit 4 Assessment

Unit 5: Evaluating Data Visualizations

5.1: Develop the Data Story