CS403 Study Guide

Unit 3: Database History

3a. discuss the evolution of database technology and relate it to the evolution of computing

  • What are some major milestones in the history of databases?
  • What are specific drivers or causes that led to those milestones?
  • Outline the evolution of data models from about 1960 to the present. 

The roots of databases as we know them go back to around 1900, when early computing technology, in the form of mechanical tabulating equipment, was used to process data stored on punched cards. Punched paper tape soon followed, and then magnetic tape and disks. Around 1950, large data processing computers became available and were quickly adopted by business and finance to automate record and file systems. Manufacturing, engineering, and scientific applications arose shortly afterwards and required manipulating ever-increasing amounts of data.

Early computers had only a few registers and just thousands of words of expensive main memory (a typical large computer of the 1960s had 64K or 128K words), and they relied on slow external media for large amounts of application data. The need for more, cheaper, and faster memory led to improvements in mechanical and magnetic storage media, optical and laser devices, and semiconductor memory. These advances in storage technology were quickly applied to larger and more complex systems, which in turn revealed the limitations of the file-based paradigm and the need for new models to overcome them. What are some of those 'new' models, and what limitations did they overcome? To review, read A Brief History of Databases.

Technology milestones usually involve changes in data models or in the technology that implements them. These changes are driven by the need for greater scope, higher speed, better accessibility, or a higher level of abstraction – which brings an associated reduction in effort. The period from 1960 to the 1980s saw the development of database models that satisfied these needs for the applications of that period. What models were developed in that time period? What needs did they address? Even as database technology has evolved, those needs persist, because the problems have grown larger and more complex and the computing technology has changed.

In the 1980s, a new programming paradigm called object-oriented programming became popular; soon after came a new conceptual model – the object model – which provided a database model compatible with object-oriented systems. Object database models are not as widely used as other models, in particular the relational model, because of their dependence on object programming languages or APIs (application program interfaces). All of the models except the relational model provided solutions for the representation and storage of data at the expense of complicating queries. Which query languages are used for relational databases?

In the first decade of the 2000s, the internet boom brought exponential growth in the amount of all types of data and in the number of users for OLTP (online transaction processing), which in turn motivated the need for new data models. This period did not see the end of relational and SQL database systems, but new candidate models did appear. The 2000s also saw the rise of distributed databases, data warehouses, and OLAP (online analytical processing). Relational databases and SQL were considered too slow for the scale and complexity of these applications, and NoSQL databases emerged. NoSQL DBMSes improved performance at the expense of full ACID guarantees (Atomicity, Consistency, Isolation, and Durability); for these applications, those guarantees were treated as desirable rather than required. NewSQL data models emerged in the 2010s. What needs did they satisfy? What is a hybrid DBMS, and what needs does it address?

The history of databases has followed a recurring loop:

  1. data needs
  2. database model solutions for current applications
  3. advances in computing
  4. new applications
  5. return to step 2 for the new applications, keeping the data needs of step 1 unchanged if possible.

Sometimes requirements are relaxed to make gains elsewhere. The fundamental needs for database systems don't change often. They are generic and are often summarized by the acronym ACID: Atomic, Consistent, Isolated, and Durable. Some newer database systems relaxed some of these features to achieve acceptable performance for new applications. This relaxation is summarized as BASE: Basically Available, Soft state (not always consistent), and Eventually consistent. We'll go into these features in more detail in later units, but start to think about them now.

To review, watch History of Databases.

 

3b. evaluate the four common data models relative to significant criteria for a selected application

  • What are some common data models? Which four have historically been the most common?
  • Which of the three levels of data architecture does each correspond to?
  • What are some significant features that represent common data requirements and needs?
  • Which of those features are satisfied by the four historically common models? Which are not?

The four most common data models are the hierarchical, network, relational, and object models. Each is capable of organizing data, storing and retrieving it, and satisfying generic database goals for some applications. Each of these models is primarily a model for database design; however, some also impose internal/physical or application dependencies. Hierarchical and network models are best suited to problems that naturally use tree or graph data structures. Object models depend on object-oriented languages and APIs (application program interfaces). The relational model supports user views and conceptual and logical design, and it adheres to the level independence of the ANSI/SPARC framework. Moreover, it has a formal, mathematical foundation that enables SQL for data definition and manipulation. Can you describe the four common data models?

Unlike the file-based model, all four data structures (tree, network, relational, object) are independent of the application, although object models retain some language dependencies. A hierarchical, network, or relational database for a given domain can be used by many applications. An object database, however, should be used by an object-based application to keep the domain model consistent with the design model and to access proprietary programming-language libraries.

A hierarchical data model uses a tree structure to organize data, which is a natural representation for many real-world applications. The hierarchy order is used as the physical order of the data on the storage media. The hierarchy is accessed by traversing the paths of the tree. Thus, there is some dependency of the physical structure on the design structure.
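
The tree structure is easy to picture in code. Below is a minimal, hypothetical Python sketch (the names are invented for illustration, not taken from the readings) of a hierarchy in which every record is reached by exactly one path from the root:

    # Hypothetical sketch: a department -> course -> section hierarchy as a tree.
    # Each child has exactly one parent, so every record lies on a single path.
    tree = {
        "CS Department": {
            "CS403": {"Section A": ["Alice", "Bob"], "Section B": ["Carol"]},
            "CS301": {"Section A": ["Dave"]},
        }
    }

    def list_students(node, path=()):
        """Traverse the tree from the root, printing the path to each leaf record."""
        if isinstance(node, list):                    # leaf: a list of student names
            for student in node:
                print(" / ".join(path), "->", student)
        else:                                         # interior node: recurse on children
            for name, child in node.items():
                list_students(child, path + (name,))

    list_students(tree)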

A network model extends a hierarchical model by allowing a node to have more than one parent. This lets data that would otherwise be duplicated be stored efficiently: a data item that lies on more than one path is represented once and referenced from each path that reaches it. The data is thus organized as a graph structure, and the paths in the graph specify the order in which the data nodes are accessed in physical storage. As a result, the physical implementation of a network database depends on the network design model.
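
A network, in contrast, is a graph: a record can sit on several paths at once. Here is a minimal, hypothetical Python sketch (names invented for illustration) of one record shared by two parents:

    # Hypothetical sketch: in a network (graph) model a record may have several
    # parents, so a shared item is stored once and referenced from each path.
    class Node:
        def __init__(self, name):
            self.name = name
            self.children = []

    math_dept = Node("Math Department")
    cs_dept = Node("CS Department")
    statistics = Node("Statistics")              # one record ...

    math_dept.children.append(statistics)        # ... reachable from two parents
    cs_dept.children.append(statistics)

    # Both parents reference the same object; no duplicate copy of the course exists.
    print(math_dept.children[0] is cs_dept.children[0])   # prints True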

The elements of a relational model are tables, rows, columns, and table relationships. The implementation of those elements is not specified by the model, and therefore the physical model is independent of the relational design model.
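
Because the model specifies only tables, rows, columns, and relationships, a few lines of SQL are enough to illustrate it. Below is a minimal, hypothetical sketch using Python's built-in sqlite3 module (the table and column names are invented for illustration):

    # Hypothetical sketch: tables, rows, columns, and a relationship expressed
    # through a foreign key value rather than a physical pointer or storage order.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE course (
            course_id INTEGER PRIMARY KEY,
            title     TEXT,
            dept_id   INTEGER REFERENCES department(dept_id)
        );
        INSERT INTO department VALUES (1, 'Computer Science');
        INSERT INTO course VALUES (403, 'Database Systems', 1);
    """)

    # The join navigates the relationship by matching values, independent of how
    # or where the rows are physically stored.
    for row in conn.execute(
            "SELECT d.name, c.title "
            "FROM course AS c JOIN department AS d ON c.dept_id = d.dept_id"):
        print(row)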

An object design model does not specify the organization or order of the data in the physical model. For operational efficiency, the data design model and the application object model should be compatible, and their physical models should also be compatible with each other. 
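
A minimal, hypothetical Python sketch (classes invented for illustration) shows the object view, where the application's classes double as the data design and queries are written in the host language:

    # Hypothetical sketch: an object store holds application objects directly;
    # relationships are object references, and queries use the language's own API.
    class Course:
        def __init__(self, title):
            self.title = title

    class Student:
        def __init__(self, name):
            self.name = name
            self.courses = []            # references take the place of foreign keys

    object_store = []                    # stand-in for an object database
    alice = Student("Alice")
    alice.courses.append(Course("CS403"))
    object_store.append(alice)

    # A "query" is ordinary program code, which is the language/API dependency
    # noted above.
    print([c.title for s in object_store for c in s.courses if s.name == "Alice"])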

Redundancy, data dependency, application dependency, and natural or efficient representation of an application domain can all be used to evaluate a data model. Can you think of other criteria that might be used? Features or criteria are just general requirements and are often called quality requirements. The development of a database system starts with requirements, and the design model selected is the one that best satisfies them. Database requirements originate with sponsors and stakeholders and include functional, performance, and quality requirements. Functions depend on the type of systems that will use the database – for example, information systems, transaction processing, or decision support. Performance (that is, the time to process a transaction or search, or the storage space required) depends on how the data is organized, the size of the database, the complexity of the relationships, and how the database is used (such as for storage and retrieval, transaction processing, searches, discovery of relationships, or analysis). What sets of requirements might be satisfied by a hierarchical model? A network model? A relational model? An object model?

To review, read pages 113-116 of Database Design and consider the example for student, enrollment, and course entities. How might you outline a hierarchical design for those relationships? A network design for those relationships? An object design for those relationships? We will be studying relational models in great detail in later units.

To review, also see Database Models, Hierarchical Model, Network Model, Relational Model, and Object Model.

 

Unit 3 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course. 

Try to think of the reason why each term is included.

  • Punched card
  • Punched paper tape
  • Magnetic tape
  • Magnetic drum
  • File System
  • Hierarchical Model
  • Network Model
  • Relational Model
  • Object Model
  • IMS
  • DB2
  • Oracle
  • SQL
  • NoSQL
  • NewSQL
  • Criteria
  • Database features
  • Database requirements
  • ACID
  • BASE
  • OLAP
  • OLTP