Data and Databases

This chapter covers the concepts of data and databases. Businesses are becoming more and more "data-driven"; understanding how data is collected, stored, and managed is essential for anyone wanting to succeed in business. Pay special attention to the sections on data warehouses and data mining, as they provide examples of how companies use data strategically.


Why Databases?

Data is a valuable organizational resource. However, many people know little about database technology. They store and manipulate business data with non-database tools, such as Excel spreadsheets or Word documents, or they rely on poorly designed databases for business processes. As a result, the data can become redundant, inconsistent, inaccurate, or corrupted. For a small data set, using a non-database tool such as a spreadsheet may not cause serious problems. For a large organization, however, corrupted data can lead to serious errors and damaging consequences. Three common defects in data resource management are explained below.

(1) No control of redundant data

People often keep redundant data for convenience. Redundant data can make the data set inconsistent. An illustrative example shows why redundant data is harmful. Suppose the registrar's office has two separate files that store student data: one is the registered student roster, which records all students who have registered and paid tuition, and the other is the student grade roster, which records all students who have received grades.

Example of Redundant Data

As you can see from the two spreadsheets, this data management system has problems. The fact that "Student 4567 is Mary Brown, and her major is Finance" is stored more than once. Such occurrences are called data redundancy. Redundant data often makes data access convenient, but it can be harmful. For example, if Mary Brown changes her name or her major, then every copy of her name and major stored in the system must be changed together. For a small data system, such a problem looks trivial. However, when the data system is huge, updating every redundant copy is difficult if not impossible. Any copy that is missed then disagrees with the others, so data redundancy can end up corrupting the entire data set.
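The following is a minimal sketch of how a database avoids this problem, using Python's built-in sqlite3 module. The table names, column names, the course FIN301, and the grades are assumptions made for illustration; only student 4567, Mary Brown, her Finance major, and the course MKT211 come from the example above. The fact about Mary Brown is stored once in a student table, and the grade table refers to her only by student number.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# The student's name and major are stored exactly once.
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, major TEXT NOT NULL)"
)
# Grades refer to the student by number only (hypothetical layout).
conn.execute(
    "CREATE TABLE grade ("
    "student_id INTEGER NOT NULL, course TEXT NOT NULL, grade TEXT NOT NULL)"
)

conn.execute("INSERT INTO student VALUES (4567, 'Mary Brown', 'Finance')")
conn.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [(4567, "MKT211", "A"), (4567, "FIN301", "B")],  # illustrative grades
)

# A change of major is a single update, however many grades exist.
conn.execute("UPDATE student SET major = 'Marketing' WHERE student_id = 4567")

# Every grade row now reports the new major, because it is looked up, not copied.
rows = conn.execute(
    "SELECT s.name, s.major, g.course, g.grade "
    "FROM grade AS g JOIN student AS s ON s.student_id = g.student_id "
    "ORDER BY g.course"
).fetchall()
print(rows)
```

Because the major is looked up through the student number rather than copied into each grade record, there is no second copy that could be missed during the update.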

(2) Violation of data integrity

Data integrity means consistency among the stored data. The example above illustrates the concept and shows how data integrity can be violated when the data system is flawed. Alex Wilson received a grade in MKT211; however, Alex Wilson does not appear in the student roster. That is, the two rosters are not consistent. If the system had a data integrity control that enforced a rule such as "no student can receive a grade unless she/he has registered and paid tuition", such a violation could not happen.
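One common way to express such a rule in a relational database is a foreign key constraint. The sketch below uses Python's built-in sqlite3 module; the table and column names, the grades, and the student number 9999 (standing in for a student who is not on the roster) are assumptions made for illustration. Note that SQLite only enforces foreign keys after the PRAGMA shown is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE registered_student ("
    "student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# Each grade must point at a student who exists in the roster.
conn.execute(
    "CREATE TABLE grade ("
    "student_id INTEGER NOT NULL REFERENCES registered_student(student_id), "
    "course TEXT NOT NULL, grade TEXT NOT NULL)"
)

conn.execute("INSERT INTO registered_student VALUES (4567, 'Mary Brown')")
# Accepted: student 4567 is on the roster (the grade itself is illustrative).
conn.execute("INSERT INTO grade VALUES (4567, 'MKT211', 'A')")

# Rejected: 9999 is a hypothetical number for a student who never registered.
try:
    conn.execute("INSERT INTO grade VALUES (9999, 'MKT211', 'B')")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("Grade refused:", err)

grade_count = conn.execute("SELECT COUNT(*) FROM grade").fetchone()[0]
```

With the constraint in place, the database itself refuses the inconsistent record, so the rule does not depend on every user remembering to check the roster first.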

(3) Relying on human memory to store and to search needed data

The third common mistake in data resource management is the overuse of human memory for data search. A person can remember what data is stored and where it is stored, but can also make mistakes. If a piece of data is stored in a place that no one remembers, it is effectively lost. As a result of relying on human memory to store and to search for needed data, the entire data set eventually becomes disorganized.

To avoid these common flaws in data resource management, database technology should be applied. A database is an organized collection of related data. It is organized because, in a database, all data is described and associated with other data. For the purposes of this text, we will only consider computerized databases.
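As a small illustration of what "described" can mean in practice, the sketch below asks a database to list its own tables and columns, so nobody has to remember where the data is kept. It uses Python's built-in sqlite3 module; the two tables and their columns are assumptions made for illustration, and other database systems expose this self-description through different mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical tables, loosely modelled on the registrar example.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)"
)
conn.execute(
    "CREATE TABLE grade ("
    "student_id INTEGER REFERENCES student(student_id), course TEXT, grade TEXT)"
)

# sqlite_master is SQLite's built-in catalog of everything stored in the database.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()

catalog = {}
for (table_name,) in tables:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    catalog[table_name] = [column[1] for column in columns]

print(catalog)
```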

Although spreadsheets are not a good replacement for databases, they can be ideal tools for analyzing the data stored in a database. A spreadsheet package can be connected to a specific table or query in a database and used to create charts or perform analysis on that data.