# CS403 Study Guide

## Unit 6: Relational Algebra

### 6a. explain the basic relational algebra operations

- What is an algebra?
- What is the relational algebra?
- What are the elements of the relational algebra?
- What are the operations of the relational algebra?
- What are some theorems of the relational algebra?
- What is the benefit of relational algebra for databases?
- Explain relational algebra calculations.

Algebra is a field of mathematics that studies a defined set together with defined operations closed on that set (closed means that the result of an operation is a member of the set). Given the definitions of the set and its operations, theorems pertaining to calculation and equality of expressions, are discovered and proved. An expression is a combination of set elements and operations that satisfy the definitions. For example, commutative and associative properties of an operation, say '*', may be proved. Commutative means that, given a and b in the set, a * b = b * a; associative means that, given a, b, and c in the set, a * (b * c) = (a * b) * c.

Relational algebra is a study of a set of tuples (a tuple is a list of elements, such as, '(a, b, c, d)', where a, b, c, and d are members of defined sets) and a collection of tuple operations. Moreover, a specific relational algebra models a portion of an application domain, and serves as a logical design for database language operations.

For generality, elements of a relational algebra are usually defined as tables, which are sets of tuples. A general table has n rows and m columns of tuples, which we denote, as an 'n x m ' table. As we have seen in earlier units, rows and columns are also called records and attribute values, respectively. Examples of tables, include; 1 x n or row tuple, n x 1 or column tuple, n x m or table, and 1 x 1 or atomic value. The list of attributes is denoted, 'name' (A1, A2, A3, ...., An) where 'name' is taken from the domain of application. The attributes are names for the columns, A1, A2, ... , An. Be careful to clearly distinguish records from record values; attributes, from attribute values. Values are members of the set, called the set of domain values, used to form the components of a tuple.

A relational algebra operation may be unary (operates on 1 table) or binary (operates on 2 tables), and results in a table that is in the defined set of tuples (that is, the operations are closed). Relational algebra operations of interest include: projection, selection, set union, set intersection, set cross product, set difference, and join. An operation may be primitive or derived; a derived operation is one that can be written as an expression using primitive operations, for example, join. If an expression occurs frequently, it is given a name and, thereby, is a derived operation.

It is surprising to learn that numerous theorems can be discovered, given only a few definitions. For example, review Relational Algebra, which gives examples of equivalent expressions, one of which may be more efficient or effective than the other.

It is not necessary that we know a lot of theorems or how to prove them, for several reasons. First, keep in mind that our goals for a database are achievement of requirements at reasonable cost, which we have called effectiveness and efficiency, respectively. The relational calculus is an effective model, as are other models. For example, we can write relational algebraic expressions that specify the operational steps that result in the retrieval of desired information from a database. A main benefit of relation algebra is its formality, or algebraic mathematical foundation. If the attributes and the domain values constitute an 'appropriate set' of tables, then the mathematical foundation enables us to write relational expressions in a database language (SQL) and utilize software tools to compile them into optimized implementation steps that efficiently retrieve desired information from a database. Software tools help determine the 'appropriate set' of tables for a relational algebra, by a process called normalization, described in a later Unit in this course. The translation of SQL commands into optimal code by software tools supports execution efficiency and effective satisfaction of database requirements.

From the database development process, we know that effectiveness and efficiency are affected by key development activities of other development activities, in addition to implementation. Analysis, design, implementation, test, and maintenance activities impact general effectiveness and efficiency. Knowledge of relational rules and theorems helps database users, designers, analysts, and administrators understand the best formulation of a SQL relational expression to optimize usability, design, verification, validation, change, and performance.

In summary, to explain relational calculus computations, let's take two perspectives, the design of a relational database and the use of the database for query transactions. When we design a relational database, we need to make decisions on the atomic elements, the base tables, and the table relationships. Those design decisions will affect the operation of the database. The decisions are made using knowledge of the application and business domain, supported by normalization processes and tools that help produce base tables that are optimal for effectiveness and efficiency of performance. When the database is used, understanding of relational operations and rules for transforming relational expressions, helps users formulate optimal queries, again supported by software translators that optimize the queries for efficient performance.

### 6b. discuss relational algebra set operations

The relational algebra is built on set theory and some of the relational algebra operations correspond to set theory operations. Those relational operations are called table operations.

- Which relational operations are table operations?
- What is union - compatibility?

The relational operators Union, Intersection, and Difference are the table or set operations of relational algebra. They are operations that operate on two tables; and they result in the set union, intersection, or difference of sets of rows. For example, the Union, 'U', of the two tables, T1 and T2, is the table T3 whose rows are the set union of the rows of T1 and rows of T2. Thus, the definitions of the relational set operations are specified in terms of the mathematical set operations of the same names.

The set union, intersection, or difference of sets of rows must result in a valid relational table. A condition, called union compatibility, is placed on the tables T1 and T2, to ensure that the combination of their rows is a valid relational table. Union compatibility means that the two tables have the same number of columns and the value domains of the corresponding columns are the same. For example, the values in the nth columns are of the same type.

### 6c. explain what a derived operation is

As with all algebras, operations can be written in several equivalent ways or forms. For the relational calculus, we want to write a query expression in a form that enables the most effective and efficient database development and performance. Usually, the simplest or atomic, i.e. 'smallest', form is the most desirable. Projection and selection are two atomic operations.

Typically, a derived operation or expression is a query that is used so often, that it is given a name. Then a user can specify that query by name, instead of writing out a complicated expression involving several tables and multiple operations. It is not always obvious which form for a query will result in greater effectiveness and/or efficiency. Sometimes the smallest/shortest, or simplest form should be used. For other situations, the most desirable form depends on the organization and complexity of the database tables and the type of information needed. Consider this example:

Let T be a table representing Students. Let its columns be represented by Student (Id, Name, Major, Year, Advisor), Suppose the domain of values for Year is {1,2,3,4}. Suppose a user query is 'Find all 2nd, 3rd, and 4th-year students?' Two equivalent query formulations are:

(first query) 𝛔_{Year=2}(Student) U 𝛔_{Year=3}(Student) U 𝛔_{Year=4}(Student) and

(second, equivalent, query) Student - 𝛔_{Year=1}(Student)

Which is most effective and efficient? What if the value domain for Year was 1,2,3,4,5,6,7,8,9,10 (for college, master's degree, doctorate, or professional degree)? The answer is 'it depends' on the situation and the database and the database requirements. Clearly, the second is more beneficial for a user if the query is submitted many times by many users. The first may be more beneficial if computing resources are limited. Or the second may be more beneficial if computing resources are sufficiently powerful to produce the desired information in real time. Which form is better can be determined either by complexity calculations (counting the number of primitive steps, and the execution time for each step involved in retrieving the information), or by executing test cases to determine the user and computer resources required by each query formulation; and by knowledge of and experience with the database.

### Unit 6 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course.

Try to think of the reason why each term is included.

- Algebra
- Relational algebra
- Relational operations
- Tuple
- Table
- Column
- Row
- Unary operation
- Binary operation
- Equivalent expressions
- Closed operation on a set
- Primitive or base operation
- Derived operation
- Domain of values
- Effectiveness
- Efficiency
- Normalization
- Set or table operations
- Union compatible
- 'Best' query expression
- Complexity of an expression