BUS611 Study Guide

Unit 3: Data Models

3a. Describe the various data models, including logical models like the ERD model and physical models like the relational model

  • What are data models, and how are they used?
  • What are the different types of data models, and how do they differ?
Data models provide a basis for designing a computerized database system and are also beneficial for communicating with non-technical business users. Business process owners work with data modelers to ensure that the developed database system will meet the organization's needs. The data model provides a common point of reference in the development process. As a data professional, you must create and structure database tables to support business operations and processes.
 
Conceptual, logical, and physical data models are methods used to model data in a domain. While each of these is considered a data model, they are created for different purposes.
 
A conceptual model focuses on identifying data used within a business. It supports business development and events and tracks performance measures. However, it does not focus on process flow or other physical characteristics.
 
Logical data models are a way to translate business concepts for data purpose/use into Entity-Relationship diagrams. Logical data models describe data in as much detail as possible. This model is more complex than conceptual models. Logical data models establish column (data) types to include all entities and relationships among them. Remember, logical data models have nothing to do with creating a database. We need to identify the data elements we want to translate and how they are related through key fields.
 
Physical data models define all required logical database components and services to build a database. Physical data models can also be a layout of an existing database. Typically, physical data models contain the table structure, column names, primary keys, and other relationships among tables. The most commonly used physical model is the Relational Model. The relational model was initially developed by Edgar F. Codd in 1969 and has become the international standard. Today most Database Management Systems are Relational Database Management Systems (RDBMS). Gaining an understanding of the relational model and relational databases is thus an essential skill for anyone seeking to work in the data management field.
 
This figure illustrates the relationship between the user's view of the data and the different types of models that are used to design databases.

relationship between the user's view of the data and the different types of models that are used to design databases

The ultimate product of a logical and physical data modeling process is the design of the structure of the database tables, also called relations. These tables must be appropriately designed to show things like cardinality, the key structure that preserves data integrity, and the relationships between the tables. These relational designs are created using relational notation and are then implemented in the DBMS using languages like SQL.
 
To review, see Types of Data Models.

 

3b. Distinguish between the disadvantages and advantages of different data models

  • What are the advantages and disadvantages of the E-R model?
  • What are the advantages and disadvantages of the relational model?
  • How can E-R and relational models be used to develop a database?
Conceptual data models allow us to represent the business process through a typically graphical representation. The most common type of model is the entity-relationship (E-R) model. The E-R model starts by allowing us to identify physical entities in our business process and identify the attributes of these entities and the relationships among them. An entity is anything we find in the real world. It can be physical, such as a customer, or conceptual, such as a project. The entity is composed of attributes representing properties like a customer's name, address, and age. A relationship is an association among entities; for example, a customer might buy many different products. A relationship then exists between the customer and the product.
 
A primary advantage of the E-R model is that it makes intuitive sense to non-technical business process owners. Other advantages include its use of non-technical, non-mathematical terminology. The main disadvantage of the E-R model is that such models cannot be directly converted to relational database implementations. A relational model must first be derived from the E-R model, and then that relational model is used to implement the physical structure of the database itself.
 
Older logical models might represent how the data is stored in the computer disk drives. The two most common of these older models are network data models and hierarchical data models. The two older models, hierarchical and network models, were developed before the widespread availability of commercial relational database management software and allowed for the development of databases that could run on older, non-relational software. Over the last several decades, the relational model and relational database software have come to predominate the marketplace. As a result, almost all new development will be done using the relational model. However, the older two models are still encountered in practice, especially on projects that are updating older systems.
 
The primary advantage of working directly in the relational model is that a relational model can be easily converted into a relational database. There are many automated design tools available to do this. The main disadvantage of the relational model is that it requires a certain level of understanding of relational design and uses somewhat arcane terminology that can make it difficult for business process owners to grasp intuitively.
 
To review, see Data Modeling and Metadata Management.
 

3c. Describe why each data model, ERD and Relational, are used, and the benefits and drawbacks of each

  • What are ERD and Relational modeling?
  • How do ERD models flow from logical models and business rules?
Logical data models are a way to translate business concepts for data purposes/uses into Entity-Relationship diagrams. Logical data models describe data in as much detail as possible. This model is more complex than conceptual models. Logical data models establish column (data) types to include all entities and relationships among them. Remember, logical data models have nothing to do with creating a database. We want to translate the data elements and identify how they are related through key fields.
 
Businesses define how entities or tables are related in the business process. A typical relationship characteristic is called cardinality. Cardinality expresses the minimum and the maximum number of entity occurrences associated with one occurrence of another entity.
 
In the following figure, from a database used by an educational institution to relate teachers to classes, the cardinality is indicated with the "crow's feet", where the business rule states that one teacher can teach many classes. Still, one class can only have one teacher, as indicated by (1:N). Notice also that the O symbol and the double vertical line symbols are required versus optional. In other words, each class must have a teacher, but each teacher does not have to teach any particular number of classes.



The entity-relationship (ER) data model has existed for over 35 years. The model is very useful for data modeling for use with databases. The model is abstract enough to represent complex situations, yet it is fairly intuitive and easy to convey to the business user. In addition, analysts can easily translate ER models into relational models. We call the graphical representation of an ER model an ER diagram.
 
ER modeling depends on two fundamental concepts:
 
  1. The entities themselves, with data defined as tables that hold specific information
  2. The relationships between the entities that define their relationships and interactions.

To review, see Physical Data Models.

 

3d. Apply E-R modeling to several practical examples

  • How is E-R modeling applied in practice?
  • What is the role of the database administrator in E-R modeling?
The database administration process and the database administrator (DBA) are responsible for the design and administration of data models and the data integrity constraints included in those models. Missing data elements are likely caused by poor data integrity controls and would thus result from poor administration. Database administration is the function of managing and maintaining database management systems (DBMS) software. As a part of this, database administrators are responsible for the data modeling and design process and ensure that operational databases are designed to high professional standards.
 
This figure provides an example in a practical case. In this case, we are storing information about students in a database being used by an educational institution. Notice the structure of the table. The fields are identified, the key field is identified, and the non-key fields are identified.



Notice that this type of table structure can be used both to facilitate communication between the business and technical staff on the development team and to support the later construction of the relational model and the design of the physical structure of the relational database.
 
During the process of conceptual modeling, the users and business process owners have identified the entities involved in their business process. Consider an academic institution. Two entities of interest are the teachers and the classes that those teachers are assigned to. During the modeling process, the business rules were determined, and one of those rules is that teachers may teach many different classes but that each class is only taught by a single teacher. Notice that this would be reflected in the relational model figure using "crow's feet" that identify the cardinality 1:N. One teacher can teach a number (N) of courses. Since a given teacher may not be teaching any courses at a point in time, this is indicated by the 0 on the relationship. However, each class has a teacher. This is indicated by the two vertical hash marks on the relationship.
 
Notice also that the key attributes have been identified for each entity and are underlined. Non-key attributes are not underlined.
 
By identifying all of the entities and the relationships between those entities, we have constructed a model of the business rules. This E-R model can then be used to implement the relational database to support the business process.
 
To review, see Lessons in Data Modeling.

 

3e. Analyze a database system for security flaws

  • What are some of the ways that we can analyze a database to identify security flaws?
Like any system, the security of a database processing system is both necessary and critical. Because organizations store large amounts of data, much of it proprietary, confidential, or subject to legal requirements for security, it is essential that database security is "designed in" and that any flaws be identified and corrected.
 
There are several dimensions to database security. Because most databases are used in a distributed, multiuser environment, access and privilege management is critical. Users and administrative staff should only be given access to those data elements required to perform their job or execute the allowed function. Most modern database management systems allow for much granularity and specificity concerning access; a common flaw is the overbroad granting of access. Great care should be taken to rigorously examine users' actual needs and grant them only the minimum required privileges.
 
In addition to the determination and granting of privileges to users authorized to access the system, we are also concerned with others who have no legitimate need to access the system gaining entry. We need to rely on the functionality built into the database system and the network operating environment for these situations. A somewhat trivial though prevalent flaw is the management of access passwords. The administrator should ensure that users are required to select strong passwords and that procedural and training processes are in place to avoid users sharing those passwords. Strategies like two-factor authentication, requiring frequent changing of passwords, and prohibiting the sharing of passwords are examples.
 
Other, more technical flaws may also exist in database systems. Because SQL is used extensively with relational database systems, security flaws can be introduced into the system through improperly developed SQL code. For example, SQL injection attacks are possible because of sloppy coding practices.
 
Data lineage includes the data origin, what happens to it, and where it moves over time (essentially the full journey of a piece of data). This page explains the concept of data lineage and its utility in tracing errors back to their root cause in the data process. Data lineage is a way of debugging big data pipelines, but the process is not simple. There are many challenges, such as scalability, fault tolerance, and anomaly detection.
 
To review, see Database Security.

 

3f. Design mitigating systems and procedures to address security flaws

  • What are some of the key strategies for addressing database security flaws?
  • Why is administration as important as technical processes in addressing security flaws?
As security policies are developed to support business operations, good security practice ensures that data is only made accessible to those staff with a documented business need to access the data. Security policies should be designed so that only those users who have a legitimate need to access particular data are given access to that data. This is particularly important in the case of sensitive information like customer data.
 
A flexible privileges policy should allow different people to have different levels of privilege. The privilege level assigned to a given user would depend on the roles and responsibilities of that individual. Careful attention should be paid to this. Good security practice is ensuring that data is only made accessible to those staff with a documented business need to access the data.
 
The administrative team should practice good password management. The administrator should ensure that users are required to select strong passwords and that procedural and training processes are in place to avoid users sharing those passwords. Strategies like two-factor authentication, requiring frequent changing of passwords, and prohibiting the sharing of passwords are examples.
 
Careful attention to developing, testing, and constructing SQL code can mitigate against SQL injection attacks.
 
To review, see Database Security.

 

Unit 3 Vocabulary

This vocabulary list includes the terms that you will need to know to successfully complete the final exam.
 
  • attribute
  • cardinality
  • database administrator (DBA)
  • entity
  • entity-relationship (E-R) model
  • data lineage
  • password management
  • relational model
  • SQL injection