CS403 Study Guide

Unit 10: The Join Statement

10a. explain the different methods for joining tables

  • Which relational algebra operation corresponds to the SQL JOIN statement?
  • Why is there a join statement if the same result can be obtained from subqueries?
  • Describe the various types of JOINs?
  • What is a virtual table?
  • What is a view?
  • Since a SQL JOIN statement corresponds to a derived virtual ( JOIN creates a virtual or temporary table composed of the base tables in the database) operation, why is the JOIN necessary? Couldn't we just use the operations that it is derived from?

A major benefit of a relational database is the capability to relate tables. For integrity, tables are designed to be simple and represent atomic elements of an application domain. However, interesting information is represented by relationships, often complex relationships among the tables. Multiple SELECT statements can be used to join and query multiple tables to retrieve that information. SQL, however, provides The JOIN statement, which is often more effective and efficient, to combine related tables into a virtual or temporary table that incorporates and, from which, it retrieves the information of interest. 

Recall from the relational algebra that one way to combine two tables is via the product operation: if A and B are two tables with columns (A1, A2, ..., An) and (B1, B2, ..., Bm) then the product is the table (A1, A2, ..., An, B1, B2, ..., Bm). However, the product contains all possible relations, n * m. For a given domain of application, only some of the attributes will be related and, thus, only comprise a subset of the product. The relational algebra operation, natural join, combines two relations if they have the same attribute. Suppose Ai is an attribute of A and Bj is the same attribute. Then, a tuple (a1, a2, ..., ai, ..., an, b1, ..., bj-1, bj+1, ..., bm) is a tuple in the natural join of A and B, if and only if ai = bj). The natural join is a reasonable relationship of two tables that has meaning for most application domains. The natural join operation corresponds to the SQL Equi JOIN statement.

A SQL join is faster than subqueries. Moreover, it is more efficient for a user, since the SQL translator checks that conditions are satisfied, the SQL translator optimizes the execution plan which includes selecting from several or more tables (vs. separate execution plans for each subquery select command - this assumes that the SQL translator is a traditional interpreter), and a join lets a user work at the level of relations, instead of the level of columns and conditions. 

There are several types of joins, including, natural, equi, inner, and left outer, right outer, and full outer join. These types are similar, in that, they select rows from two tables, A and B, and combine them to produce a subset of the rows of A x B. They differ in how the rows are selected from A and B, and how the selected rows are combined. Here is a table that summarizes how each type of join selects records from A and B:

 

What columns are checked?

What rows are selected?

How are the selected rows combined?

What is the result table of the statement?

Comment

Natural Join

Column, say Ai, in A and, say Bj, in B that has the same name

For each value in the Ai column, select every row in B where Ai = Bj

Concatenate the select rows but only keep only one of the common column values, ie. the value of Ai or the value of Bj

Project the selected columns from the concatenated rows

Natural Join does not use an ON clause to specify what is compared and what comparison operator is used. The common column values are assumed to be compared using equality.

Equi Join

The columns checked are specified in the ON clause

If a value of one of the specified columns is equal to a value of the other specified column, select their corresponding rows.

Concatenate the selected rows, keep both of the values of the specified columns. 

Same, namely, project the selected columns from the concatenated rows

The columns whose values are checked are specified in the On clause

Inner Join

Same as the Equi Join

Same as the Equi Join except the comparison can be another comparison operator; it is not restricted to the equality operator

Same as Equi Join

Same, namely, project the selected columns from the concatenated rows

See the picture in

Pictorial representation

Left Outer Join

The columns checked are specified in the ON clause

Select all the rows that the Inner Join selects plus all the rows of the left table, i.e. the table in the FROM clause, which do not satisfy the join condition in the ON clause

Concatenate rows of the left table with those of the right table which satisfy the join condition in the ON clause; concatenate rows of the left table that do not satisfy the join condition with null values for the columns of right table, i.e. the table in the join clause

Same, namely, project the selected columns from the concatenated rows

See the picture in

Pictorial representation

Right Outer Join

The columns checked are specified in the ON clause

Select all the rows that the Inner Join selects plus all the rows of the right table, i.e. the table in the join clause, which do not satisfy the join condition in the ON clause

Concatenate rows of the left table with those of the right table which satisfy the join condition in the ON clause; concatenate rows of the right table that do not satisfy the join condition with null values for the columns of left table, i.e. the table in the FROM clause

Same, namely, project the selected columns from the concatenated rows

See the picture in

Pictorial representation

Full Outer

 Join

The columns 

checked are specified in the ON clause

Select all the rows that the Inner Join selects plus all the rows the Left Outer Join selects plus all the rows the Right Outer Join selects

Concatenate rows of the left table with those of the right table which satisfy the join condition in the ON clause; concatenate rows of the left table that do not satisfy the join condition with null values for the columns of right table, i.e. the table in the join clause; concatenate rows of the right table that do not satisfy the join condition with null values for the columns of left table, i.e. the table in the FROM clause

Same, namely, project the selected columns from the concatenated rows

See the picture in

Pictorial representation

A few final comments:

  1. The Cross Join produces the cartesian product of two tables; the cartesian product of two tables is the set of rows where each row of one table is concatenated with each row of the other table, and has n * m rows, where one table has n rows and the other m rows. The cartesian product is the 'universe' set for the joins - i.e. each join is a specific subset of the cartesian product. 
  2. Joining can be performed using just the SELECT statement. In particular, the two 'equivalent' commands, the SELECT query vs the SELECT JOIN query. Recall that the SELECT JOIN has user and execution advantages.

 

10b. construct advanced queries of two or more tables using join operations

  • What are some advanced types of queries using SQL JOIN commands or operations?
  • Describe the ORDER BY clause?
  • Describe the GROUP BY clause?
  • Name the type of key associations used to join multiple tables.

The following clauses are advanced, in that, they involve more than one table or they add features that modify or extend the more commonly used commands:

  • ORDER BY used with the SELECT statement to sort records, after by a column(s), after they have been retrieved
  • GROUP BY used with the SELECT to organize retrieved rows (note: after they are retried) into groups conditioned on the value of a column(s) and on an optional HAVING expression, where the expression often involves an AGGREGATE or COUNT function
  • SQL UNION, which corresponds to the relational algebra union operator; SQL UNION is used to combine the retrieved records from several SELECT queries, assuming that the output tables are union compatible
  • CREATE VIEW is used to create a view, which is a virtual table: it is essentially a query and the result of the query is the content of the view. A view does not have any data content of its own; its data content is stored in other tables. A view can be treated as a regular database table (called a base table), except there are restrictions related to changing data via UPDATE or INSERT. 
  • CREATE INDEX is used to create an index for the columns of a table to immediately access a column, and to order the rows of the column to quickly search the rows of a column.

The WHERE clause filters records from tables, then the GROUP BY clause organizes the records into subgroups, then the HAVING clause filters the subgrouped records, and then the resulting subgrouped records can be sorted by the ORDER BY clause.

When the GROUP BY clause lists more than one column the number of groups is multiplicative – if the first column listed was 'Year' and the second column listed by 'Month', where there are 10 year groups, and there are 12 month groups, then the final organization would have 10 * 12 = 120 subgroups. At the end, when the 120 subgroups have been retrieved, they can be sorted using the ORDER BY clause to order the 12 month subgroups within each year subgroup, and to order the year subgroups into, like ascending order by year.

CREATE INDEX may not be part of a particular ANSI Standard SQL version.

In summary, SQL base commands and operations are sufficient for searching a relational database. Since a relational database is organized as a related collection of normalized tables, needed information is often contained in multiple tables. A user, therefore, needs to understand the table organization in terms of columns and rows, table access via primary and candidate keys, and the relationship of tables via foreign keys. In addition, to the base commands and operations, SQL provides derived operations that are more user convenient and, often, more efficient, to query for information that is spread across multiple tables. JOINS are powerful derived commands. 

The best preparation for the final exam includes trying out the SELECT statement features on the sample database you defined and updated in the SQL Sandbox.

 

Unit 10 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course. 

Try to think of the reason why each term is included.

  • Joining tables
  • SELECT
  • JOIN
  • Query
  • Subquery
  • Virtual table
  • View
  • Product of tables
  • Natural Join
  • Equi Join
  • Inner Join
  • Left Outer Join
  • Right Outer Join
  • Full Outer Join
  • Cross Product
  • ORDER BY
  • GROUP BY
  • UNION
  • VIEW
  • INDEX
  • HAVING clause