Data and Databases

This chapter covers the concepts of data and databases. Businesses are becoming more and more "data-driven"; understanding how data is collected, stored, and managed is essential for anyone wanting to succeed in business. Pay special attention to the sections on data warehouses and data mining, as they provide examples of how companies use data strategically.

Databases

Designing a Database

Suppose a university wants to create a School Database to track data. After interviewing several people, the design team learns that the goal of implementing the system is to give better insight into students' performance and academic resources. From this, the team decides that the system must keep track of the students, their grades, courses, and classrooms. Using this information, the design team determines that the following tables need to be created:

  • STUDENT: student name, major, and e-mail.
  • COURSE: course title, enrollment capacity.
  • GRADE: this table will correlate STUDENT with COURSE, allowing us to have any given student to enroll multiple courses and to receive a grade for each course.
  • CLASSROOM: classroom location, classroom type, and classroom capacity

Now that the design team has determined which tables to create, they need to define the specific data items that each table will hold. This requires identifying the fields that will be in each table. For example, course title would be one of the fields in the COURSE table. Finally, since this will be a relational database, every table should have a field in common with at least one other table (in other words, they should have relationships with each other).

A primary key must be selected for each table in a relational database. This key is a unique identifier for each record in the table. For example, in the STUDENT table, it might be possible to use the student name as a way to identify a student. However, it is more than likely that some students share the same name. A student's e-mail address might be a good choice for a primary key, since e-mail addresses are unique. However, a primary key cannot change, so this would mean that if students changed their e-mail address we would have to remove them from the database and then re-insert them – not an attractive proposition. Our solution is to use student ID as the primary key of the STUDENT table. We will also do this for the COURSE table and the CLASSROOM table. This solution is quite common and is the reason you have so many IDs! The primary key of table can be just one field, but can also be a combination of two or more fields. For example, the combination of StudentID and CourseID the GRADE table can be the primary key of the GRADE table, which means that a grade is received by a particular student for a specific course.

The next step of design of database is to identify and make the relationships between the tables so that you can pull the data together in meaningful ways. A relationship between two tables is implemented by using a foreign key. A foreign key is a field in one table that connects to the primary key data in the original table. For example, ClassroomID in the COURSE table is the foreign key that connects to the primary key ClassroomID in the CLASSROOM table. With this design, not only do we have a way to organize all of the data we need and have successfully related all the table together to meet the requirements, but have also prevented invalid data from being entered into the database. You can see the final database design in the figure below:

Tables of the student database

Tables of the student database