The elements of relational algebra are 'relations', which are tables. Operations on a table or set of tables result in a table. A relational algebra expression is a combination of operations that specify the order in which the operations are performed. In SQL terminology, this is the order in which basic SQL commands are performed to execute the query. This page explains the basic relational algebra operations.
Relational algebra is the mathematical formalization of what happens in relational databases.
Prior to relational databases, there were various kinds of databases (hierarchical, among others), but there was little regularity among them. The formalization of relational databases led to a big step forward in systematizing database technology.
Why is it still useful? Because it gives us a standard internal representation of a query that allows for query optimization: ways to re-write a query that is:
For example, consider the relative costs of executing the following:
The latter is equivalent, but a lot faster to execute. The point is that algebraic manipulations of an expression can yield another expression that has the same results but executes faster. That's how query optimization works.
The minimal relational algebra operators:
CREATE TABLE elevation ( name CHAR(10), height INT );
CREATE TABLE population ( name CHAR(10), pop INT );
Slightly different from mathematical cross-product, but not a concern. Major problem is the size of the result. Quadratic is bad enough!
Additional (derived) operators:
You might try this exercise outside of class. We'll do some more exercises in class.
Write the relational algebra notation for our bi-coastal people. Once you've found your own solution, click on the link to see one possible solution. Assume we have the relations
A join is just a cross-product followed by a selection, but it's better to define it directly because that's (1) more efficient and (2) more useful.
General Join (AKA theta joins) just puts the selection condition in the join operator. Only conjunction is AND.
Equijoin is a special case where all the conjuncts are equality. This knits tables related by foreign keys together.
Natural Join is the special case that is also the most common. It's an equijoin with equality on all identically named attributes, projecting out the duplicates. Therefore, no subscripts necessary!
To facilitate this, it's a good idea for all attributes for a particular thing to have the same name across different tables. Not StudID here and ID there, but always StudID.
However, as natural as it is, there's no special support for it in standard SQL! You still have to write out all the equality conditions! Insane. Fortunately, MySQL (and other DBMSes) treat us better.
We saw this before, but just to recap:
Columns from the other table are padded with null.
Q: Explain the following:
mysql> select count(*) from credit; +----------+ | count(*) | +----------+ | 473 | +----------+ mysql> select count(*) from person natural join credit; +----------+ | count(*) | +----------+ | 473 | +----------+ mysql> select count(*) from movie natural join credit; +----------+ | count(*) | +----------+ | 473 | +----------+ mysql> select count(*) from person natural join credit natural join movie; +----------+ | count(*) | +----------+ | 236 | +----------+ mysql> select count(*) from person,credit,movie where person.nm=credit.nm and movie.tt=credit.tt; +----------+ | count(*) | +----------+ | 473 | +----------+
The "division" operator is diabolical but fortunately rare. It also has no natural counterpart in SQL, but we will see how to do queries that demand division. The idea:
Example: Relation R is
R(professors,courses) where there is a tuple if professor P teaches course C.
List students who have taken all the CS courses, where there are relations
As a query, we list students where there isn't a CS course that they didn't take:
SELECT S.id FROM Student S WHERE NOT EXISTS ( -- all CS courses (SELECT C.crsCode FROM Course C WHERE C.crsCode LIKE 'CS%') EXCEPT -- This student's courses (SELECT R.crsCode FROM Transcript R WHERE R.StudID = S.id) )
MySQL doesn't seem to have the EXCEPT operator, but we can accomplish the same result using EXISTS and NOT IN:
The idea is to search a space of equivalent algebraic expressions, estimating the run time of each, and trying to find the minimum.
σcond A(σcond B(R)) ≡ σcond B(σcond A(R))
σcond A ∧ B(R) ≡ σcond A(σcond B(R))
πX(R) ≡ πX(πY(R))
R × S ≡ S × R
R × (S × T) ≡ (S × R) × T
πX(σcond A(R)) ≡ σcond A(πX(R))
σcond A(R × S) ≡ σcond A(R) × S
πX(R × S) ≡ πX(R) × πX(S)
Estimation of cost is done essentially with big-O notation, plus estimates of the number of tuples in different relations and in estimated results of sub-queries.
Names of directors of movies namedHamlet
mergesortidea (see below) and ignore the index.
Names of directors who are under 50 and directed movies released after 1990.
Movies with a rating of at least 4 and with at least 5 listed cast members.
To find the intersection or union or difference of two sets, you need to make sure there are no duplicates. In general, we are
merging the sets. You can, of course, merge them and remove duplicates afterwards, but it is often faster to remove them as we merge.
One of the most efficient ways to remove duplicates is to sort so that identical elements end up next to each other. Sorting is O(N log N), and most any other algorithm will end up being O(N2), as you iterate through one set, looking in the other.
The mergesort algorithm has two sorted subsets and merges them in O(N) time to yield a sorted merger. We can use that idea to produce the union of the two subsets, the intersection, and so forth.
Source: Scott D. Anderson, http://cs.wellesley.edu/~webdb/lectures/16-Relational-Algebra/
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 1.0 License.