CS403 Study Guide

Unit 7: Introduction to Data Normalization

7a. describe normalization

  • What is normalization?
  • What is its purpose and benefits?
  • Are there disadvantages to normalization?
  • How is normalization performed?

Consistency increases as dependencies decrease and relationships are simplified. More dependencies enable more anomalies. Normalization is a procedure that increases consistency by decomposing complex tables into simpler tables and removing problematic functional dependencies (partial and transitive ones). Normalization is performed during the detailed design of the data model; it improves the design by helping to prevent design problems, including modification anomalies (deletion, insertion, and update anomalies). Software tools can assist with normalization, but identifying the functional dependencies still requires an understanding of the data and its business rules. The goals of normalization support the goals of database modeling and design. Those goals identify the purpose and benefits of normalization.

The main disadvantage of normalization is 'composition' effort. Because normalization decomposes complex tables into simpler tables, answering a complex query means retrieving information from several simpler tables and composing (joining) it. This takes more user effort to write the query and more execution time to run it. User effort also includes understanding the table structure of the database well enough to know which tables to compose.
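The composition effort described above can be illustrated with a minimal Python sketch. The table names, columns, and data are hypothetical and chosen only for illustration; a real database would use an SQL join rather than Python dictionaries.

```python
# Hypothetical normalized tables: student data and enrollment data
# are stored separately, linked by student_id.
students = {
    1: {"name": "Ada"},
    2: {"name": "Grace"},
}
enrollments = [
    {"student_id": 1, "course": "CS403"},
    {"student_id": 2, "course": "CS403"},
    {"student_id": 1, "course": "CS301"},
]


def courses_with_names(students, enrollments):
    """Compose (join) the two tables on student_id."""
    return [
        (students[e["student_id"]]["name"], e["course"])
        for e in enrollments
    ]


# Answering "who takes which course?" requires both tables.
report = courses_with_names(students, enrollments)
```

Neither table alone can answer the question; the extra lookup per enrollment row is the composition cost.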

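The normalization procedure transforms a table step by step from normal form 0 (unnormalized) to normal form n, where n is usually 3. Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF, sometimes called 3.5NF; it is not the same as fourth normal form (4NF), which deals with multivalued dependencies. Review these normal form definitions and examples in this article.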

In summary, normalization is a step-by-step process for transforming a table into simpler tables by removing a certain type of dependency at each step. Normalization is usually performed when a table is being designed and again when its structure is being changed. Normalization prevents data anomalies that cause data inconsistencies (update anomaly), unintended loss of data (deletion anomaly), or the inability to add data (insertion anomaly).
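As a small illustration of an update anomaly, the following Python sketch uses a hypothetical table in which an instructor's office is repeated on every course row. All names and values are assumptions made for this example.

```python
# Hypothetical unnormalized table: the instructor's office is
# repeated on every row for that instructor.
rows = [
    {"course": "CS403", "instructor": "Lee", "office": "B12"},
    {"course": "CS301", "instructor": "Lee", "office": "B12"},
]

# Update anomaly: only one of the two copies is changed.
rows[0]["office"] = "C07"


def offices_for(rows, instructor):
    """Return the set of distinct offices recorded for an instructor."""
    return {r["office"] for r in rows if r["instructor"] == instructor}


# More than one office for the same instructor means inconsistent data.
inconsistent = len(offices_for(rows, "Lee")) > 1

# Normalized alternative: the office is stored once, so a single
# update keeps the data consistent.
instructors = {"Lee": {"office": "B12"}}
courses = [
    {"course": "CS403", "instructor": "Lee"},
    {"course": "CS301", "instructor": "Lee"},
]
instructors["Lee"]["office"] = "C07"
```

In the normalized version there is no second copy of the office to forget, which is how decomposition prevents this anomaly.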

To review, read Chapter 9 of Database Systems for Management.

 

7b. discuss and contrast the different types of dependencies

  • What is a functional dependency?
  • What is a database functional dependency?
  • How does a functional dependency affect database integrity and consistency?

In general, a functional dependency refers to a function, denoted F: D → R, where D is a set that represents the domain values of the function and R is a set that represents the range of the function. F(d) = r means that the value r in R is determined by d in D. We say that r is functionally dependent (via F) on d, and d is called the determinant. Given a d in D, there is only one r such that F(d) = r. In a database table, an attribute B is functionally dependent on an attribute (or set of attributes) A if each value of A is associated with exactly one value of B.

Two misconceptions are common. First, it is a misconception that a determined attribute value cannot appear more than once in a column; different determinant values may map to the same value. Second, it is a misconception that dependency is symmetric: if d determines r, don't assume that r determines d.
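The definition and both misconceptions can be checked against sample data. The sketch below is a minimal illustration with a hypothetical student table; note that sample rows can only disprove a functional dependency, since a dependency that holds in the sample might still be violated by future data.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The same determinant value must always map to the same value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data.
rows = [
    {"student_id": 1, "name": "Ada", "major": "Math"},
    {"student_id": 2, "name": "Grace", "major": "Math"},
    {"student_id": 3, "name": "Alan", "major": "Logic"},
]

# "Math" appears twice in the major column, yet the dependency holds.
id_determines_major = holds(rows, ["student_id"], ["major"])

# The reverse direction fails: "Math" maps to two different student_ids.
major_determines_id = holds(rows, ["major"], ["student_id"])
```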

The mathematical theory of functional dependencies provides us with rules that help us infer all the functional dependencies in a database. We apply the rules to each table in the database. Five important rules are reflexivity, augmentation, transitivity, union, and decomposition.
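One common way to apply these rules mechanically is to compute the closure of a set of attributes, that is, everything those attributes determine. The following is a minimal sketch of that idea; the attribute names and dependencies are hypothetical.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under `fds`.

    `fds` is a list of (determinant, dependent) pairs of attribute sets.
    """
    result = set(attributes)  # reflexivity: attributes determine themselves
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole determinant, we gain its
            # dependents; repeating this applies transitivity.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Hypothetical dependencies: student_id -> major, major -> department.
fds = [
    ({"student_id"}, {"major"}),
    ({"major"}, {"department"}),
]

student_closure = closure({"student_id"}, fds)
```

Here the closure of student_id includes department even though no dependency states that directly; it is inferred by transitivity.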

As mentioned previously, unwanted functional dependencies (partial and transitive ones) put the integrity of a database at risk by introducing anomalies that could lead to data inconsistencies. To avoid those risks, we remove such dependencies when we are designing the tables, and we check for them when we maintain the tables.

Review Chapter 12 of Database Design and Chapter 5 of Database Systems for Management.

 

7c. construct a dependency diagram

  • What is a dependency diagram?
  • Why use a dependency diagram?

A dependency diagram is simply a diagram that depicts the dependencies among the values of the attributes of a relational table.

There are various kinds of representations that can be used to represent any information. For example, there are pictures, graphs, matrices, text, and numbers. They are equivalent, in that they represent the same information. They differ in their suitability for a particular type of user or application. For example, a state diagram is a pictorial representation of a state machine that is suitable for human users and for depicting behavior. A matrix can also be used to represent that state machine and is suitable for calculations, especially automated calculations by a computer. Another familiar example is a graph of a curve versus an equation that represents that graph. What purpose does each serve?

A dependency diagram is suitable for human users; it depicts the amount and complexity of the dependencies. A dependency is like a wire that connects signals between two devices – imagine the complexity of hundreds of wires interconnecting a relatively small number of devices!

A dependency diagram is a graphic aid that documents dependencies and helps us apply the dependency rules to find all the dependencies, so that the problematic ones can then be removed. Thus, dependency rules and diagrams are database design aids that help us produce normalized designs and databases that have integrity and consistency.
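A dependency diagram is normally drawn, but its content can be sketched in text form. The Python example below lists each dependency as an arrow and labels it full, partial, or transitive. The order table and its dependencies are hypothetical, and the sketch assumes every determinant is either the whole primary key, part of it, or made up of non-key attributes.

```python
def classify(lhs, primary_key):
    """Label a dependency by comparing its determinant to the primary key."""
    lhs, primary_key = set(lhs), set(primary_key)
    if lhs == primary_key:
        return "full"        # depends on the whole key
    if lhs < primary_key:
        return "partial"     # depends on only part of a composite key
    return "transitive"      # determinant is not (part of) the key


def diagram(fds, primary_key):
    """Return one text line per dependency: determinant -> dependents."""
    lines = []
    for lhs, rhs in fds:
        kind = classify(lhs, primary_key)
        left = ", ".join(sorted(lhs))
        right = ", ".join(sorted(rhs))
        lines.append(f"{left} -> {right}  [{kind}]")
    return lines


# Hypothetical order-line table with a composite primary key.
primary_key = ("order_id", "product_id")
fds = [
    (("order_id", "product_id"), ("quantity",)),
    (("product_id",), ("product_name", "supplier_id")),
    (("supplier_id",), ("supplier_city",)),
]

lines = diagram(fds, primary_key)
for line in lines:
    print(line)
```

The partial and transitive arrows are the ones that the 2NF and 3NF steps remove.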

Review Chapter 11 of Database Design and Chapter 5 of Database Systems for Management.

 

7d. normalize a relation to at least third normal form

As with E-R modeling, relational modeling, and the relational algebra, practicing with examples is necessary to understand normalization and to prepare for the final exam. 

  • What is the process and procedure for normalization of a set of tables?
  • Give an example of a table that violates the first, second, third, and Boyce-Codd normal forms.

Normalization of a table starts by assuming that the table is in normal form 0. Step 1 checks if it is in normal form 1 by checking that each cell (i.e., the intersection of a column and a row) has only one value (no repeating groups). If there is a repeating group, the table is not in normal form 1. To make it normal form 1, the table is split into two tables, where the first table contains the non-repeating attributes and the second table contains the repeating group. The primary key for the second table is a composite key composed of the primary key of the first table and an attribute that identifies each entry within the repeating group. Repeat this step for each repeating group.
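Step 1 can be sketched in Python as follows. The student table, its repeating group of phone numbers, and the function name to_1nf are all hypothetical; the sketch handles one repeating group at a time.

```python
# Hypothetical table with a repeating group: several phones in one cell.
unnormalized = [
    {"student_id": 1, "name": "Ada", "phones": ["555-0100", "555-0101"]},
    {"student_id": 2, "name": "Grace", "phones": ["555-0102"]},
]


def to_1nf(rows, key, repeating, child_column):
    """Split one repeating group into its own table.

    Returns (parent, child): parent holds the non-repeating attributes,
    child holds one row per repeated value.
    """
    parent, child = [], []
    for row in rows:
        parent.append({k: v for k, v in row.items() if k != repeating})
        for value in row[repeating]:
            # The child's composite key is (key, child_column).
            child.append({key: row[key], child_column: value})
    return parent, child


students, student_phones = to_1nf(
    unnormalized, "student_id", "phones", "phone"
)
```

Every cell in both resulting tables now holds a single value, and the child table's key combines student_id with phone, as the step describes.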

Step 2 checks if the tables from step 1 satisfy normal form 2 (they are already 1NF from step 1). If the primary key of a table consists of a single attribute, the table is 2NF. If the primary key is a composite key, then each non-key attribute must depend on the composite key in its entirety, not on only part of it. If a table is not 2NF, remove the attributes that are not fully dependent on the primary key and put them in a separate table. The primary key of that second table is the part of the composite key that those attributes depend on.
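Step 2 can be sketched at the schema level, working with attribute names rather than rows. The function to_2nf and the order-line example are hypothetical, and the sketch assumes the dependencies have already been identified and that no dependent attribute is itself part of the primary key.

```python
def to_2nf(attributes, primary_key, fds):
    """Move partially dependent attributes into their own tables.

    Returns (base_table, new_tables), each table described by its key
    and its full attribute set.
    """
    key = set(primary_key)
    remaining = set(attributes)
    new_tables = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if lhs < key:  # determinant is only part of the composite key
            new_tables.append({"key": lhs, "attributes": lhs | rhs})
            remaining -= rhs
    return {"key": key, "attributes": remaining}, new_tables


# Hypothetical order-line table: product_name depends only on product_id.
attributes = ["order_id", "product_id", "quantity", "product_name"]
primary_key = ["order_id", "product_id"]
fds = [
    (["order_id", "product_id"], ["quantity"]),
    (["product_id"], ["product_name"]),
]

order_lines, extracted = to_2nf(attributes, primary_key, fds)
```

The base table keeps product_id so that it can still be joined to the new product table.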

Step 3 checks if the tables from step 2 satisfy normal form 3 (they are already 2NF from step 2). If no table from step 2 has a transitive dependency, that is, no non-key attribute is functionally dependent on another non-key attribute, the tables are 3NF. Otherwise, move each attribute that depends on a non-key attribute into a separate table, together with a copy of its determinant, which becomes the primary key of the new table. Repeat this step until all tables satisfy 3NF.
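Step 3 can be sketched the same way as a schema-level transformation. The function to_3nf and the student example are hypothetical; the sketch assumes the table is already 2NF and treats any dependency whose determinant contains no key attribute as transitive.

```python
def to_3nf(attributes, primary_key, fds):
    """Move transitively dependent attributes into their own tables.

    Returns (base_table, new_tables), each table described by its key
    and its full attribute set.
    """
    key = set(primary_key)
    remaining = set(attributes)
    new_tables = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if lhs.isdisjoint(key):  # determinant is a non-key attribute
            new_tables.append({"key": lhs, "attributes": lhs | rhs})
            remaining -= rhs
    return {"key": key, "attributes": remaining}, new_tables


# Hypothetical student table: department depends on major, not on the key.
attributes = ["student_id", "name", "major", "department"]
primary_key = ["student_id"]
fds = [
    (["student_id"], ["name", "major"]),
    (["major"], ["department"]),
]

student_table, extracted = to_3nf(attributes, primary_key, fds)
```

The determinant major stays in the student table as a link to the new table, where it serves as the primary key.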

Normalization steps continue for each type of normal form, although steps higher than 3 are not always used. Step 4 checks for BCNF (Boyce-Codd Normal Form), which means every determinant is a candidate key. If a table has a determinant that is not a candidate key, split that determinant and the attributes it determines off into a new table. Note that dependencies often come from business rules.
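The BCNF check can be sketched by testing whether each determinant determines every attribute of the table, which is the usual practical test for a determinant being a key. The functions and the enrollment example below are hypothetical, and the sketch reports violations without performing the split.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the dependencies whose determinant is not a key."""
    all_attrs = set(attributes)
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        is_key = all_attrs <= closure(lhs, fds)
        if not trivial and not is_key:
            violations.append((set(lhs), set(rhs)))
    return violations


# Hypothetical business rules: each instructor teaches one course, and
# a student has one instructor per course.
attributes = ["student", "course", "instructor"]
fds = [
    (["student", "course"], ["instructor"]),
    (["instructor"], ["course"]),
]

violations = bcnf_violations(attributes, fds)
```

This table satisfies 3NF, because course is part of a candidate key, yet instructor is a determinant that is not a key, so it fails BCNF.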

To review, read Chapter 12 of Database Design.

 

Unit 7 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course. 

Try to think of the reason why each term is included.

  • Data Integrity
  • Consistency
  • Functional dependency
  • Anomaly
  • Normalization
  • Dependency diagram
  • 0NF
  • 1NF
  • 2NF
  • 3NF
  • BCNF
  • Reflexive
  • Augmentation
  • Transitive
  • Union
  • Decomposition