### Unit 6: Relational Algebra

We have seen that database entities can be viewed as logical tables. While this is useful in its own way, we can learn more from the data if we can perform operations on the tables within a database, as data from one table may not be meaningful without the data from another table. In this unit, we will introduce relational algebra, the mathematical notation used to represent how data retrievals and updates are performed on tables in a database. Understanding relational algebra will serve as a prelude to using the Structure Query Language (SQL).

One of the overall themes of computer science is commonality: common components are useful for building many kinds of applications. A database is one of these components, and its usefulness is due to its effectiveness and efficiency in creating, storing, and operating on all types of data. Relational algebra covers basic operations and composing them to form complex queries. Relational algebra is a mathematical system, or model, that formally specifies queries of a relational database, and is implemented as a formal language, SQL. A query against a database can be expressed as a SQL statement in more than one way, each having the same semantics. Relational algebra enables optimization of SQL queries, and allows you to structure queries in such a way that they execute more efficiently.

**Completing this unit should take you approximately 5 hours.**

Upon successful completion of this unit, you will be able to:

- explain the basic relational algebra operations;
- discuss relational algebra set operations; and
- explain what a derived operation is.

### 6.1: Basic Operations

An algebra consists of elements and operations that combine elements to produce new elements – also known as composite elements or expressions. An algebra also includes a definition of equivalence, which is when two expressions are equivalent. This subunit defines the basic operations for relations.

The elements of relational algebra are 'relations', which are tables. Operations on a table or set of tables result in a table. A relational algebra expression is a combination of operations that specify the order in which the operations are performed. In SQL terminology, this is the order in which basic SQL commands are performed to execute the query. This page explains the basic relational algebra operations.

Relational algebra is used to express a query as a sequence of basic operations which, when executed, will result in the relation that gives the query result. Relational algebra specifies 'how' to obtain the result. Relational calculus, on the other hand, is used to specify the desired result. This is 'what' the result is. The distinction is a difference of perspectives: relational algebra focuses on operations for constructing sequences of operations to produce a desired table, and relational calculus focuses on a relational expression that defines the desired table. Review pages 6–7 to 6–12 and study the case examples. Tables for the examples are included at the end of the chapter.

This lecture explains the division operation and demonstrates an implementation using fundamental operations.

### 6.2: Derived Operations

A combination of relations using basic operation is called a derived relation.

The first 4 minutes of this video characterize relational algebra relative to relational calculus. Relational algebra has a set of basic operations, as well as derived operations, like Join, which can be written as a composition of the basic operations. Since these operations are used frequently, they are made into a single command, the derived operation.

The last 10 minutes introduce set operations, which are used by relational calculus statements. The middle 20 minutes or so of the video work through several small examples of relational algebra. In this resource and other resources, the terms set, multi-set, and bag are used. A multi-set is simply a set that includes duplicates. A bag is just another name for a multi-set.

Read chapter 16, which explains SQL DML (data manipulation language), which is used for making queries to a database. Queries specify operations on relations that result in desired information. A combination of relational operations is called a derived operation. SQL DDL (data definition language) is used to create a database. A DML query is a SQL statement or command that is a high-level programming language representation for a relational calculus expression. The relational calculus expression is translated by the SQL optimizer (the query planner) into a composition, or a derived operation, of relational algebra operations that specify how the SQL statement is executed.

### 6.3: Set Operations

Some relational algebra operations are based on mathematical set operations. Relational calculus statements, which specify a desired result, are interpreted as operations on sets.

Review pages 6-12 to 6-19 and study the examples. These pages discuss relational calculus, which utilizes mathematical set operations. Basic relational algebra set operations are discussed earlier in the chapter, on pages 6-1 to 6-12. Tables for the examples are included at the end of the chapter.

### 6.4: Aggregate Functions

An aggregate function is a function defined on a relation and it returns a single value. A typical example is a function that returns the average of a set of values.

Read chapter 7, which discusses SQL statements. It also introduces aggregate functions, which return a single numeric value when given a single column of numeric values. Study the aggregate function section.

### Unit 6 Assessment

- Receive a grade
Take this assessment to see how well you understood this unit.

- This assessment
**does not count towards your grade**. It is just for practice! - You will see the correct answers when you submit your answers. Use this to help you study for the final exam!
- You can take this assessment as many times as you want, whenever you want.

- This assessment