As you are studying databases, strive to read as many database examples as you can. This resource includes an example E-R model for a SQL movie database.
Sub-queries are very powerful and expressive. You can use them in at least three ways:
- Returning multiple columns. You can treat the subquery like a table and put it in the FROM clause.
- Returning a list of values. You can treat the subquery like a set and have conditions like NOT IN (select ...)
- Returning a single value (e.g. longest album). You can put the subquery wherever you might put the single value
Exercise on Sub-Queries
Devise a query to list the name of the youngest person in the WMDB. The oldest?
- Pictoral, high-level design of entities and their relationships.
- Can be systematically converted into MySQL tables.
- Not all info can be captured.
- There are many variant drawing methods; we'll use one.
WMDB: Wellesley Movie Database
Let's discuss the WMDB tables for a few minutes:
- Person (nm, name, birthdate, addedby)
- Movie (tt, title, release, director, addedby)
- Credit (nm,tt)
- Staff that adds people and movies.
Thinking of them as sets, what is the relationship between movies and directors, at least in the WMDB? What is the relationship between movies and actors?
What is a SQL query to look up all movies directed by Clint Eastwood?
- Goal is to accurately model the "world" that we are trying to capture in our database.
- Pictorial tool.
- Eventually "translated" into SQL DDL (table definitions). This translation can be rough sometimes.
Entities are the objects or nouns we are interested in. Relationships are the connections or verbs we care about.
No standard drawing, although some are very common.
Afterwards, convert ER diagrams to CREATE TABLE statements
- Entities: like objects. Entity Types (Entity Sets) are like classes
- Attributes: properties or characteristics of entity types. May be set valued. For example, a "genre" field added to our WMDB might be set-valued: Mystery, Drama, Comedy, Horror, etc. A movie might fit in more than one genre.
- Domains: the possible values of an attribute.
- Schema: The collection of entity type name, attributes, whether they are set-valued, and all that.
keys and key constraints
- a set of attributes that uniquely specifies an entity. E.g. SSN, or (FirstName,LastName,ClassYear)
- no subset of those attributes works
- The key should be unique by definition, not by happenstance. (E.g. names in the hwk1 example)
- There may be more than one key. Can you think of two or more keys for a Wellesley student?
- The database should enforce the key constraint.
Converting Entity Sets
- entity set → relation/table
- attribute → attribute/column
- But, set-valued attributes may require additional tables to avoid redundancy.
What attributes would you have for a movie in the WMDB? What would the key be?
What attributes would you have for cars at a new car dealership? What would the key be?
- Relationships: connections among entities. Binary, Ternary, more
- Entity sets play roles in relationship sets.
- Relationship sets connect entity sets.
- Relationship sets don't connect relationship sets.
- Relationship set → relation/table
- key is usually union of keys of roles
- Entity Sets are boxes
- Attributes are attached ovals
- Keys are underlined
- Relationship sets are diamonds, connected to entities
- Relationship sets can have attributes, too, as attached ovals
- If entity set is a key (one to many), draw an arrow to the diamond
- If full participation is required, line is thick
createtable person( nm int, namevarchar(30), birthdate date, primary key (nm)); -- as of MySQL 5, "release" is a reserved word; -- have to put the field name in backquotes. createtable movie( tt int primary key, title varchar(50), `release`char(4));
Remember this fact about the word
What about directors? That's a relationship between the two entities:
In our world, a movie only has 1 director, so this is a one-to-many relationship between Movies and People. Therefore, we draw an arrow from Movie to Directs, because Movie is a key of the relationship set. This is the key constraint.
createtable movie( tt int, title varchar(50), `release`char(4), director int, primary key (tt), foreign key (director) references person(nm) ondeletesetnullonupdatecascade);
We also have a one-to-many relationship between People and Staff via the AddedBy relationship. Similarly for movies. We have a one-to-one relationship between Staff and UserPass:
In this case the relationship disappears. This doesn't always happen with a one-to-one relationship, but almost always. (In our case, we could even absorb the password into the staff table.
When would we not want to do this?
createtable userpass( idint, username varchar(20), passwordchar(15), primary key (id), unique (username));
What about a many-to-many relationship? How about movies and people?
Here, we have to have the intermediate table representing the relationship. Essentially, the relationship is two one-to-many relationships.
createtable credit( tt int, nm int, primary key (tt,nm), foreign key (tt) references movie(tt) ondeletecascadeonupdatecascade foreign key (nm) references person(nm) ondeletecascadeonupdatecascade);
A relationship can have attributes as well. Here, we record the date that a student declared a particular major.
createtable majorsin( sidint, did int, decl date, primary key (sid,did), foreign key (sid) references student(sid) foreign key (did) references department(did));
Roles can be labeled. Here's an example.
The roles usually become column names:
createtableadvise( advisor int, advisee int, foreign key (advisor) references person(ssn), foreign key (advisee) references person(ssn), unique (advisor), unique (advisee));
Here's a (heterosexist, sorry) example:
createtable marriedto( husband int, wife int, foreign key (husband) references person(ssn), foreign key (wife) references person(ssn), unique (husband), unique (wife));
What if we had the primary key in
(husband,wife)? What would the implications be?
Relationships can be Ternary as well as Binary, though this is rare.
Here's an example. The idea is that there are generic
parts in the world, such as 2-inch #6 bolts or AA batteries. There are
suppliers, such as Eveready, Duracell, and others. There are
projects that need parts like bolts and batteries. If the project leader buys a particular product from a particular supplier for a given price on a given day, that event creates a relationship among three entities.
The table representing the relationship just has three foreign keys. Yikes, look at that primary key!
createtableproject (proj_id int, proj_title varchar(50)); createtable part (part_id int, sku char(10)); createtable supplier (supp_id int, namevarchar(50), address varchar(50)); createtable sold( customer int, product int, supplier int, `date`date, price float, primary key (customer,product,supplier,`date`), foreign key (customer) referencesproject(proj_id), foreign key (product) references part(part_id), foreign key (supplier) references supplier(sup_id));
Sketch an ER diagram for a student work-study program. What entities are there? What relationships? Are those relationships 1-1 or 1-many or many-many? Note that there's no single right answer to this.
- AddedBy relates Movie and Staff
- Works_in relates Profs and Departments
- Mentors relates Students and Students (big/little sister)
- Dates relates People and People
- Married relates Men, Women, Parsons (officiant)
If every X must have a Y, this is called a participation constraint. Examples:
- Every Movie must have a director
- Every Prof must have a Dept
- Every Dept must have a Chair
In an ER diagram, this is represented with a thick line or (sometimes) a double line:
In SQL, we can enforce a participation constraint with a "not null" and "foreign key" constraint. For example:
droptableifexists movie; createtable movie( tt int, title varchar(50), `release`char(4), director intnotnull, primary key (tt), foreign key (director) references person(nm));
However, this can be annoying:
- What if the value is unknown? You can't enter the data you do not know.
- What if the value is temporarily vacant? For example, a department doesn't have a chair for some interregnum period.
- An Employee ISA Person, as are Professors, Staff, and Students.
- A Professor ISA Employee, as is a Staff worker.
- What about contract employees, hourly, salaried, etc.?
- 1, 2, 3, 4th year students
Pictured using triangles:
Where would you put the following attributes?
- matriculation date
ISA hierarchies are converted to either of the following:
- to multiple relations, using key of supertype and extra attributes, OR
- to a single relation with all attributes, plus a distinguisher attribute.
- a references constraint to another table, such as a record in the Student table having a key for the Person table, OR
- include the other table's data, so the Student table has all the columns necessary for both oncampus and offcampus students, together with an indication of which kind the student is.
Limitations of ER
- Entity or Attribute? It's sometimes not clear whether something should be an entity or an attribute. Even something as simple as
However, if something participates in relationships, it must be an entity. Don't have a
deptattribute for a Student if there's a
Departmententity. Instead, put that entity in a relationship with the Department entity.
- Entity or Relationship? Should we have
directoras an entity, or
directsas a relationship?
- Ternary vs 3 Binary Relationships? Consider the
Exercise on Professors and Courses
For the following, assume we have (at least) two tables:
Prof (SSN,Name) Teaches Course(Dept,Number,Title)
Sketch the ER diagram you would want:
- Profs can teach the same course in several semesters, and only most recent offering is recorded
- Profs can teach the same course in several semesters, and each offering is recorded
- Every Prof teaches something
- Every Prof teaches exactly one course
- Every Prof teaches exactly one course and every course is taught by someone
- certain courses are team-taught.
Please remember the following syntax rules for ER diagrams, the way we're doing them:
- relationship lines connect boxes to diamonds; a box is never directly connected to another box
- arrowheads, if any, go on the diamond, not on the box
- In a one-to-many relationship, the arrow goes from the many towards the one. E.g. movie has just one director, but a person can direct more than one movie, so the arrow goes from movie to director.
- attributes are circled.
- attributes that are (part of) the key are underlined.
- Every entity set should have a key
- Use ISA for inheritance of attributes
- something that is a key to an entity set should not also be an attribute of another: instead, there's a relationship
A few more rules to keep in mind when converting ER diagrams to DDL:
- Not every diamond becomes a table. Some data can be "absorbed" into another table. (E.g. director is absorbed into the "movie" table)
- Many-to-many relationships may result in an extra table.
You can, of course, draw these diagrams with pencil and paper and either hand those in or scan them and email them to me. You can also create electronic drawings using Fireworks, Photoshop, the GIMP, or any of many other drawing applications. I did these with Xfig and with Inkscape.
Many students find that Google Slides and MS PowerPoint are sufficient.
The Wikipedia page lists several applications that have special support for ER diagrams. One I've played with is Dia which you can install on Windows and Macs. (There may even be implementations for Linux.) Here is a picture drawn using Dia:
Source: Scott D. Anderson, http://cs.wellesley.edu/~webdb/lectures/15-Entity-Relationships/activities.shtml
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 1.0 License.