## Entity-Relationships

As you are studying databases, strive to read as many database examples as you can. This resource includes an example E-R model for a SQL movie database.

### Sub-queries

Sub-queries are very powerful and expressive. You can use them in at least three ways:

• Returning multiple columns. You can treat the subquery like a table and put it in the FROM clause.
• Returning a list of values. You can treat the subquery like a set and have conditions like NOT IN (select ...)
• Returning a single value (e.g. longest album). You can put the subquery wherever you might put the single value

#### Exercise on Sub-Queries

Devise a query to list the name of the youngest person in the WMDB. The oldest?

### ER Diagrams

• Pictoral, high-level design of entities and their relationships.
• Can be systematically converted into MySQL tables.
• Not all info can be captured.
• There are many variant drawing methods; we'll use one.

### WMDB: Wellesley Movie Database

Let's discuss the WMDB tables for a few minutes:

• Person (nm, name, birthdate, addedby)
• Movie (tt, title, release, director, addedby)
• Credit (nm,tt)
• Staff that adds people and movies.

#### Exercise

Thinking of them as sets, what is the relationship between movies and directors, at least in the WMDB? What is the relationship between movies and actors?

What is a SQL query to look up all movies directed by Clint Eastwood?

### Entity-Relationship Model

• Goal is to accurately model the "world" that we are trying to capture in our database.
• Pictorial tool.
• Eventually "translated" into SQL DDL (table definitions). This translation can be rough sometimes.

### Conceptual Modeling

Entities are the objects or nouns we are interested in. Relationships are the connections or verbs we care about.

No standard drawing, although some are very common.

Afterwards, convert ER diagrams to CREATE TABLE statements

### Building Blocks

Some terminology:

• Entities: like objects. Entity Types (Entity Sets) are like classes
• Attributes: properties or characteristics of entity types. May be set valued. For example, a "genre" field added to our WMDB might be set-valued: Mystery, Drama, Comedy, Horror, etc. A movie might fit in more than one genre.
• Domains: the possible values of an attribute.
• Schema: The collection of entity type name, attributes, whether they are set-valued, and all that.

### Keys

keys and key constraints

• a set of attributes that uniquely specifies an entity. E.g. SSN, or (FirstName,LastName,ClassYear)
• no subset of those attributes works

Other observations:

• The key should be unique by definition, not by happenstance. (E.g. names in the hwk1 example)
• There may be more than one key. Can you think of two or more keys for a Wellesley student?
• The database should enforce the key constraint.

### Converting Entity Sets

• entity set → relation/table
• attribute → attribute/column
• But, set-valued attributes may require additional tables to avoid redundancy.

### Exercise

What attributes would you have for a movie in the WMDB? What would the key be?

What attributes would you have for cars at a new car dealership? What would the key be?

### Relationship Sets

• Relationships: connections among entities. Binary, Ternary, more
• Entity sets play roles in relationship sets.
• Relationship sets connect entity sets.
• Relationship sets don't connect relationship sets.

Conversion:

• Relationship set → relation/table
• key is usually union of keys of roles

### Diagrams

• Entity Sets are boxes
• Attributes are attached ovals
• Keys are underlined
• Relationship sets are diamonds, connected to entities
• Relationship sets can have attributes, too, as attached ovals
• If entity set is a key (one to many), draw an arrow to the diamond
• If full participation is required, line is thick

Examples:


createtable person(
nm int,
namevarchar(30),
birthdate date,
primary key (nm));
-- as of MySQL 5, "release" is a reserved word;
-- have to put the field name in backquotes.
createtable movie(
tt int primary key,
title varchar(50),
releasechar(4));


Remember this fact about the word release.

What about directors? That's a relationship between the two entities:

In our world, a movie only has 1 director, so this is a one-to-many relationship between Movies and People. Therefore, we draw an arrow from Movie to Directs, because Movie is a key of the relationship set. This is the  key constraint.


createtable movie(
tt int,
title varchar(50),
releasechar(4),
director int,
primary key (tt),
foreign key (director) references person(nm)


We also have a one-to-many relationship between People and Staff via the AddedBy relationship. Similarly for movies. We have a one-to-one relationship between Staff and UserPass:

In this case the relationship disappears. This doesn't always happen with a one-to-one relationship, but almost always. (In our case, we could even absorb the password into the staff table.

When would we not want to do this?


createtable userpass(
idint,
primary key (id),


Here, we have to have the intermediate table representing the relationship. Essentially, the relationship is two one-to-many relationships.


createtable credit(
tt int,
nm int,
primary key (tt,nm),
foreign key (tt) references movie(tt)
foreign key (nm) references person(nm)


### Relationship Attributes

A relationship can have attributes as well. Here, we record the date that a student declared a particular major.


createtable majorsin(
sidint,
did int,
decl date,
primary key (sid,did),
foreign key (sid) references student(sid)
foreign key (did) references department(did));


Roles can be labeled. Here's an example.

The roles usually become column names:




Here's a (heterosexist, sorry) example:


createtable marriedto(
husband int,
wife int,
foreign key (husband) references person(ssn),
foreign key (wife) references person(ssn),
unique (husband),
unique (wife));


What if we had the primary key in marriedto be (husband,wife)? What would the implications be?

Relationships can be Ternary as well as Binary, though this is rare.

Here's an example. The idea is that there are generic parts in the world, such as 2-inch #6 bolts or AA batteries. There are suppliers, such as Eveready, Duracell, and others. There are projects that need parts like bolts and batteries. If the project leader buys a particular product from a particular supplier for a given price on a given day, that event creates a relationship among three entities.

The table representing the relationship just has three foreign keys. Yikes, look at that primary key!


createtableproject (proj_id int, proj_title varchar(50));
createtable part (part_id int, sku char(10));
createtable supplier (supp_id int, namevarchar(50), address varchar(50));
createtable sold(
customer int,
product int,
supplier int,
datedate,
price float,
primary key (customer,product,supplier,date),
foreign key (customer) referencesproject(proj_id),
foreign key (product) references part(part_id),
foreign key (supplier) references supplier(sup_id));


### Long Exercise

Sketch an ER diagram for a student work-study program. What entities are there? What relationships? Are those relationships 1-1 or 1-many or many-many? Note that there's no single right answer to this.

#### More Examples

• AddedBy relates Movie and Staff
• Works_in relates Profs and Departments
• Mentors relates Students and Students (big/little sister)
• Dates relates People and People
• Married relates Men, Women, Parsons (officiant)

### Enhancements

#### Participation Constraints

If every X must have a Y, this is called a participation constraint. Examples:

• Every Movie must have a director
• Every Prof must have a Dept
• Every Dept must have a Chair

In an ER diagram, this is represented with a thick line or (sometimes) a double line:

In SQL, we can enforce a participation constraint with a "not null" and "foreign key" constraint. For example:


droptableifexists movie;
createtable movie(
tt int,
title varchar(50),
releasechar(4),
director intnotnull,
primary key (tt),
foreign key (director) references person(nm));


However, this can be annoying:

• What if the value is unknown? You can't enter the data you do not know.
• What if the value is temporarily vacant? For example, a department doesn't have a chair for some interregnum period.

#### ISA Hierarchies

• An Employee ISA Person, as are Professors, Staff, and Students.
• A Professor ISA Employee, as is a Staff worker.
• What about contract employees, hourly, salaried, etc.?
• 1, 2, 3, 4th year students

Pictured using triangles:

Where would you put the following attributes?

• birthdate
• GPA
• department
• dorm
• fee
• major
• matriculation date
• name
• salary
• ssn

ISA hierarchies are converted to either of the following:

• to multiple relations, using key of supertype and extra attributes, OR
• to a single relation with all attributes, plus a distinguisher attribute.

Represented by:

• references constraint to another table, such as a record in the Student table having a key for the Person table, OR
• include the other table's data, so the Student table has all the columns necessary for both oncampus and offcampus students, together with an indication of which kind the student is.

### Limitations of ER

• Entity or Attribute? It's sometimes not clear whether something should be an entity or an attribute. Even something as simple as color.

However, if something participates in relationships, it must be an entity. Don't have a dept attribute for a Student if there's a Department entity. Instead, put that entity in a relationship with the Department entity.

• Entity or Relationship? Should we have director as an entity, or directs as a relationship?
• Ternary vs 3 Binary Relationships? Consider the sold relationship above.

### Exercise on Professors and Courses

For the following, assume we have (at least) two tables:

Prof (SSN,Name) Teaches Course(Dept,Number,Title)

Sketch the ER diagram you would want:

1. Profs can teach the same course in several semesters, and only most recent offering is recorded
2. Profs can teach the same course in several semesters, and each offering is recorded
3. Every Prof teaches something
4. Every Prof teaches exactly one course
5. Every Prof teaches exactly one course and every course is taught by someone
6. certain courses are team-taught.

### Syntax Rules

Please remember the following syntax rules for ER diagrams, the way we're doing them:

1. relationship lines connect boxes to diamonds; a box is never directly connected to another box
2. arrowheads, if any, go on the diamond, not on the box
3. In a one-to-many relationship, the arrow goes from the many towards the one. E.g. movie has just one director, but a person can direct more than one movie, so the arrow goes from movie to director.
4. attributes are circled.
5. attributes that are (part of) the key are underlined.
6. Every entity set should have a key
7. Use ISA for inheritance of attributes
8. something that is a key to an entity set should not also be an attribute of another: instead, there's a relationship

A few more rules to keep in mind when converting ER diagrams to DDL:

1. Not every diamond becomes a table. Some data can be "absorbed" into another table. (E.g. director is absorbed into the "movie" table)
2. Many-to-many relationships may result in an extra table.

### Drawing Software

You can, of course, draw these diagrams with pencil and paper and either hand those in or scan them and email them to me. You can also create electronic drawings using Fireworks, Photoshop, the GIMP, or any of many other drawing applications. I did these with Xfig and with Inkscape.

Many students find that Google Slides and MS PowerPoint are sufficient.

The Wikipedia page lists several applications that have special support for ER diagrams. One I've played with is Dia which you can install on Windows and Macs. (There may even be implementations for Linux.) Here is a picture drawn using Dia:

Source: Scott D. Anderson, http://cs.wellesley.edu/~webdb/lectures/15-Entity-Relationships/activities.shtml