CS403 Study Guide

Unit 1: Introduction to Modern Database Systems

1a. explain the difference between data and information and give examples of each

In the prerequisites for this course (Introduction to Computer Science and Elementary Data Structures), you were introduced to data types in the context of programming, and you worked with numerous examples of them.

A data type is a definition of properties for a group of values that enables program computations to be performed on those values. The values are referred to as program data. Some programs perform computations on large amounts of data. If we were to extract that data and store it apart from its program, we would have a simple database. In this course, we work with large collections of data, but the context is no longer a specific program. What is the context for our collections of data? It's the corporation, the organization, the enterprise, an industry, or a domain of application, however large.

  • What is the relationship between data and information?
  • Is there a simple answer to this question? 
  • Is this just a question of terminology, or is there a greater distinction that will affect all the other units of this course?

To address these questions, first, define data, and then define information. What are data and information? To review, read Chapter 1 in Database Design, which also identifies problems when an application is dependent on large amounts of data. How do databases help address those problems? Also, be sure to read the definitions of data and information on page 1-3 of Database Systems for Management.

The difference between data and information is not simple. It is not just a question of terminology. It involves important concepts of computer science, including programming language syntax, semantics, application domain modeling, and computational problem-solving. Information is data plus context. Context includes how data is represented (its data type or syntax), how it relates to other data, objects, concepts, or actions (its semantics), and the computations that data enables to solve problems (to use the data). Often the distinction between data and information is relative because it depends on the use of the data.

Think of a variety of different data types and values. Which of the examples are information, which are data, and why? The semantics associated with the data determines whether or not data is useful, and, therefore, whether or not it is information. Note that 'useful' is relative: what is useful to you may not be useful to me. To review, read this chapter.
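
As a minimal sketch of "information is data plus context", the Python below contrasts a bare value with the same value wrapped in its representation and semantics. The class name, field names, and the 38.0 threshold are hypothetical choices made for illustration; they do not come from the course readings.

```python
from dataclasses import dataclass

# Data: a bare value. Nothing says what it measures or how to use it.
raw_value = 38.5


@dataclass
class Measurement:
    """A value plus the context that makes it usable (hypothetical fields)."""
    value: float
    unit: str      # how the value is represented (syntax)
    quantity: str  # what the value describes (semantics)
    subject: str   # what other object the value relates to

    def is_fever(self) -> bool:
        """A computation that only makes sense once the context is known."""
        if self.quantity != "body temperature" or self.unit != "Celsius":
            raise ValueError("context does not support this question")
        return self.value >= 38.0  # illustrative threshold, an assumption


reading = Measurement(raw_value, "Celsius", "body temperature", "patient 17")
print(reading.is_fever())  # True
```

The same number, 38.5, would support a different computation (or none) if its unit or quantity were different, which is one way to see why the distinction between data and information depends on use.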

 

1b. contrast file processing systems and database systems, relating problems of the former that are addressed by features of the latter

  • Why are database systems more useful than file processing systems? 
  • What are some of the problems with file processing? Are they solved by database systems? 

Early applications stored large amounts of data in files, which were used by one or more related programs to perform computations. This created semantic dependencies among the files and programs, so a change to either required significant effort. Fewer dependencies result in easier changes. However, larger and more complicated applications introduce many dependencies. How would you remove dependencies so that many programs could easily use the data in the files? If the semantic dependencies were removed from the programs and stored with the data files, then changes to the data would not require changes to the programs, and more programs could more easily use the data. How would you do that? You could add semantics to the data to make it usable (that is, to make it information). How could you turn data into information? By adding relationships among the data and descriptions of the data, that is, properties, restrictions, and constraints on the data. These descriptions are called metadata. To review, study the diagram in Chapter 2 of Database Design.
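
The dependency problem and the role of metadata can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the employee table, and the function names are hypothetical, and SQLite (through Python's standard sqlite3 module) stands in for a database system in general.

```python
import sqlite3


# File-processing style: the record layout lives inside the program.
def parse_record(line):
    name, salary = line.split(",")  # assumes exactly two fields, in this order
    return name, float(salary)


print(parse_record("Ada,5200.0"))  # ('Ada', 5200.0)
# If the file gains a leading id field ("17,Ada,5200.0"), parse_record raises
# ValueError, and every program that reads the file has to be edited.

# Database style: the description of the data is stored with the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary REAL CHECK (salary >= 0))"
)
conn.execute("INSERT INTO employee (name, salary) VALUES ('Ada', 5200.0)")
conn.commit()

# Metadata: the database can describe its own table (column name and type).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]
print(columns)  # [('id', 'INTEGER'), ('name', 'TEXT'), ('salary', 'REAL')]


def names_and_salaries(connection):
    """A program that names the columns it needs instead of their positions."""
    return connection.execute("SELECT name, salary FROM employee").fetchall()


before = names_and_salaries(conn)
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")  # data changes
after = names_and_salaries(conn)
print(before == after)  # True: the program did not need to change
```

Because the query refers to columns by name and the layout is recorded in the database, adding a column leaves the existing program untouched, whereas the positional file parser breaks.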

In addition to high maintenance costs, there are other limitations of file-based systems that prompted the development of databases. What are some of these limitations? Review Disadvantages of the File-Based Approach on page 1. Disadvantages of file-based applications are balanced by advantages of database systems, as described on pages 109-115 under "Objectives of Database Systems" and "User's View".

 

1c. describe what a database management system is and demonstrate how it functions

  • Why does one need a DBMS?
  • What are the functions of a database as compared to the functions of a DBMS?

A database helps address the challenges we encounter when we use data, particularly large collections of data. Solutions to those challenges involve software that is not part of the database itself and is implemented as a logically separate and related subsystem called a database management system. Review these in the second diagram in Chapter 2.

When we need to perform a task, we first plan the task, determine roles, and assign responsibilities; then we perform the task, control it, and oversee its performance. These are generic management functions. Specific management functions for using a database are performed by a database management system. What are some database-specific management functions? Look at the description of a database on page 6, under "Database Properties", which lists properties of a database. Do the second and third bullets describe what is in a database? What functions of a DBMS correspond to those two bullet points?

The functions of a DBMS derive from the requirements for a database and are often listed as necessary properties of a database. Those properties are the positive counterparts to the challenges (negative counterparts) we encounter when we use large collections of data. What are some functions of a DBMS that derive from the necessary (positive) properties of a database? Review pages 1-8 to 1-14 about the Objectives of a Database System, which can serve as an overview of our requirements of a DBMS. Some of the requirements and functions of a DBMS, like concurrency, are supported by the operating system and other related systems. Those other systems are applications that utilize the database, as seen in Database Systems and Other Organizational Information Systems on page 1-5. Be sure to review Figure 1-3 on page 22. Where would you insert 'Operating System' into that diagram?
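
To make "functions of a DBMS" concrete, here is a hedged sketch of two of them: enforcing an integrity constraint and keeping a multi-step update all-or-nothing. The account table and the transfer function are hypothetical, and SQLite via Python's sqlite3 module is just one convenient DBMS to demonstrate with; the readings may list or name these functions differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity rule
)
conn.executemany(
    "INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)]
)
conn.commit()


def transfer(connection, source, target, amount):
    """Move amount from source to target; return True if it was committed."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The DBMS rejected the change; the partial credit was rolled back.
        return False


def balances(connection):
    return connection.execute(
        "SELECT id, balance FROM account ORDER BY id"
    ).fetchall()


print(transfer(conn, 1, 2, 30), balances(conn))   # True [(1, 70), (2, 80)]
print(transfer(conn, 1, 2, 500), balances(conn))  # False [(1, 70), (2, 80)]
```

Note that the program never checks the balance itself. The rule is declared once, with the data, and the DBMS enforces it for every program that uses the database.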

 

1d. compare the various database models

A database is a shared collection of common data plus context. Phrased more formally, a database is a representation of a model (actually, several models). Taking a hierarchical software engineering view of the development activities of a software system, we identify requirements analysis, design, implementation, and operation. These activities have a corresponding hierarchy of data models: an application domain model, a conceptual model, a logical model, and a physical model. Unit 1 introduces us to several of these models. The terminology for common data models varies and is more specific than the generic software engineering names.

  • What benefit do data models provide us?
  • What are some types of data models?
  • What are some common database models?

The software engineering and database levels are levels of abstraction that we use to understand an operational database and its development. Working through them is primarily a top-down approach that builds understanding, provides a framework, and guides efficient and effective database development. A database can be viewed from a user (or external) perspective or from an internal perspective, each of which has a corresponding type of model. Database development proceeds from a user view (application domain data model), to design views (conceptual and logical database models), to an internal view (physical database model). Thus, database models provide us with abstraction levels that we use to understand databases and to develop databases.
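
One possible way to see these levels side by side is the small SQLite sketch below. It treats a table definition as the logical level, a view as a user (external) view, and an index as a physical-level choice. That mapping is an assumption made for illustration, and the student table, view, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the table definition that designers agree on.
conn.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " gpa REAL,"
    " email TEXT)"
)
conn.execute(
    "INSERT INTO student (name, gpa, email) VALUES ('Lin', 3.7, 'lin@example.org')"
)

# User (external) level: a view that exposes only what one group of users needs.
conn.execute("CREATE VIEW student_directory AS SELECT id, name FROM student")
print(conn.execute("SELECT * FROM student_directory").fetchall())  # [(1, 'Lin')]

# Internal (physical) level: an index changes how data is reached, not what
# users see through the view.
conn.execute("CREATE INDEX idx_student_name ON student (name)")
print(conn.execute("SELECT * FROM student_directory").fetchall())  # [(1, 'Lin')]

# The definitions at every level are themselves stored in the database.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
print(catalog)
# [('index', 'idx_student_name'), ('table', 'student'), ('view', 'student_directory')]
```

A change at the physical level (the index) leaves the result seen through the view unchanged, which is the practical benefit of separating the levels.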

To review, see the second paragraph on pages 2-7 and 2-8 on "Database Design Techniques". Also review "Degrees of Abstraction" on pages 15-18. Note Figure 5-1 on page 17. 'Internal view' is a term often used for physical view. We have terms for software engineering database development phases, for views, for data model types, and for specific database models. Can you relate all those terms? Where do the terms schema and subschema fit in the hierarchies? What are some specific common database models? Here is a table that summarizes these ideas:

 

Unit 1 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course. 

Think of the list of terms as a data dictionary of pointers to key topics in the course content.

Try to think of the reason why each term is included. 

  • File processing system or file-based system
  • Data
  • Abstraction
    • Redundancy
    • Integrity
  • Levels
  • Semantics 
  • Information
  • Database
  • Database management system (DBMS)
  • Data model
  • Database model
  • Database properties
    • Consistency
    • Isolation
  • Schema
  • Subschema
  • Metadata
  • View
  • Conceptual model
  • Entity-Relationship model
  • Hierarchical database model
  • Network database model
  • Relational database model