BUS611 Study Guide

Unit 8: Data-Driven Uses and Misuses

8a. Describe how data warehouses bring related information from disparate databases so that it can be analyzed

  • Why do we use data warehouses?
  • How is data brought into a data warehouse from internal and external sources?

The fundamental purpose of a data warehouse is to store data extracted from internal transaction processing systems and external sources. The data will be reformatted to meet the business intelligence needs and other systems that will use it. The data warehouse will not be integrated with and will not contain operational data from the transaction processing systems. A fundamental reason organizations use data warehouses is that the warehouse does not interfere with the operations of the transaction processing systems. In addition, the data warehouse may or may not be segmented into specialized data marts.
 
Data mining is a technique that allows for the modeling and discovery of patterns and information within data. Along with data warehouses, data mining arose primarily to address some limitations in OLTP systems.
 
Data mining is often implemented to populate a data warehouse. Data mining evolved as a mechanism to cater to the limitations of transaction processing systems to deal with massive data sets with high dimensionality, new data types, and multiple heterogeneous data resources. Data mining systems need to be able to perform data extraction from the large diversity of systems and data formats that will likely be found in any organizational environment. For example, a data mining system must not only be able to extract data from a well-designed relational database. Still, it must also be able to extract data such as audio, video, and free-form text comments on social media from sources that may be organized in a completely different way or not organized at all.
 
To review, see Data Warehousing and Data Mining.

 

8b. Explain what data-driven systems can achieve in an organization

  • How do data warehouses support data-driven decision-making?
  • How do decision support and other systems use data to improve decision-making?
  • What role do Key Performance Indicators (KPIs) play in making data-driven decisions?

Once an organization has established a data warehouse, it can use its stored data to make better-informed managerial and operating decisions. Managers can either automate or make decisions using business intelligence or decision support systems. For example, a system that identifies products a customer might be interested in based on what they have already bought and sends them an email might be an example of an automated decision. Such a decision will not require input from any human and thus can be executed very rapidly, often in real time. Managerial decision-making, on the other hand, requires that a human make the decision. To improve the quality of managerial decision-making, the goal of a data warehouse is to feed other types of tools like decision support systems explicitly designed to support humans. There is extensive evidence that humans will make better decisions when they have access to better data and information and use tools to assist them in processing, modeling, and analyzing that data.
 
To achieve objectives, managers need to track how progress is being made against goals. However, these goals need to be relevant to the business setting. If the manager is achieving goals, but they are the wrong goals, that can be worse than not having any goals. For this reason, a common component of organizational strategic planning is identifying critical success factors. The idea of a critical success factor is that these are the small number of strategic factors that will determine the enterprise's success. Things like profits, sales and revenue, customer and employee satisfaction, time to develop new products, and so forth are all commonly encountered success factors.
 
Critical success factors are best accomplished when the success factors can be measured. Measurement allows for the establishment of targets and benchmarks and allows for a higher degree of precision. Setting business Key Performance Indicators (KPIs) is one of the most effective ways to do this. For KPIs to be effective, it's essential that managers consider what information is critical and that the KPIs have certain characteristics.


To review, see Data-Driven Decisions.

 

8c. Identify the most widely-used data warehousing architectures and schemas and how they are applied in practice

  • What are some of the most common data warehouse architectures?
  • How do data warehouses manage the vast diversity of data types they must store?

This figure provides a high-level, conceptual representation of the data warehousing and management architecture and process. Notice that data is fed to the data warehouse from various internal and external sources to the organization. To feed in data from external to the organization, we rely on various techniques like using APIs and web crawlers. Using a data warehouse to collect data from external sources also means that the data will be in many different formats and need to be adapted for inclusion in the warehouse; this is the process of the staging phase.
 
In the staging phase, we take data in different formats and manipulate it to include it in the data warehouse. In addition to data in a relational form that may have come from internal transaction processing databases, we may need to include external data that may take the form of audio files, images, video files, free-form texts, and so forth. For example, if we store information about feedback that our customers may have left about us on social media platforms, we have to have a way to organize and sort this. We may use things like keywords (words or phrases that users type into search engines to find relevant content for their queries), tags (terms assigned to a piece of information), and so on to help up, but in general, this data will not be that well formatted.
 
The warehouse itself may be subdivided into smaller sections called data marts. The structure of these data marts would depend on the user's needs. By using smaller data marts for specific types of problems – for example, one for finance, one for marketing, one for customer service, and so on – we can reduce the amount of time and system resources that any given user might need to find a particular piece of information relevant to a problem or decision that they are working on.
 
Notice that the data warehouse itself will store different kinds of data. Examples are raw data, which may be quite unstructured, summary data which may be raw data that has been summarized or other summary data, and metadata, or data about the data. These different types of data each require different techniques for management and storage, and the data warehouse has to be able to accommodate these different techniques.

The bottom tier of a three-tier data warehouse architecture is a warehouse database server, typically a relational database. Various tools and utilities, called back-end tools, can feed records into this bottom tier. Operational databases are often the source of this data, but they can also come from other sources, often external to the organization. The middle tier is an OnLine Analytic Processing (OLAP) server executed using either a relational or multidimensional OLAP model. This middle tier then serves as the data management subsystem of a decision support system. The front-end, or client layer, that interfaces with the user is called the top tier. The top tier usually includes query and reporting, analysis, and data mining tools.
 
Another type of tool that is sometimes included in the definition of a data warehouse at the top tier is an analytic dashboard. A significant challenge in supporting human decision-makers is assisting the decision-maker in forming a high-level mental model of the system they are making decisions about. A dashboard can aid in visualization to allow decision-makers to gain this kind of intuitive understanding of the real-world processes their decisions will influence.
 
To review, see Data Warehouse Architecture and Schemas.

 

8d. Use data warehouse and mining techniques to support decision-making based on business objectives

  • How is data mining used to populate a data warehouse?
  • How are data mining and data warehouses used to support managerial decision-making?

We store big data in the data warehouse and then use data mining techniques to extract data for use by business intelligence systems to support decision-making. Data mining systems are designed to find patterns and correlations in data from data warehouses and generally prepare data for use in the decision support systems used by decision-makers.
 
Along with data warehouses, data mining arose primarily to address some limitations in online transaction processing (OLTP) systems, which are designed primarily to automate business processes. Customer order entry might be an example. They were not designed specifically to support decision-making. While it is true that we can run SQL queries against OLTPs, this presents several problems. The first of these problems is that SQL queries can slow down the operations of a database. This is the last thing we need in an OLTP.
 
The second problem is that data in an OLTP is often much more detailed and granular than is needed for decision-making. In making decisions, we might be interested in monthly sales, for example, and the OLTP may process thousands of transactions a second. What we want is summary data. Finally, an OLTP will not contain data from outside the organization. Yet this data is required for decision-making. Hence the need for a data warehouse.
 
To begin making data-driven decisions, the organization must start with a clear objective as to what they are trying to accomplish. It could be increased sales, reduced manufacturing costs, improved process efficiency, or any measurable outcomes. Once the objective(s) have been determined, the organization gathers and analyzes the available data to make decisions. After the decision is implemented, it is essential to determine if the analysis validated the results.
 
Business intelligence (BI) combines analytics, data warehousing and mining, visualization, and data infrastructure to help organizations make effective data-driven decisions. This is a different focus than Information Management and Data Processing. Information management generally includes the management and reporting of transactions and the operational systems necessary to operate the business. This is different from the focus of BI on support for decision-making. Data processing, as generally defined, would involve using relational database systems and transaction processing. There might also be some use of SQL for queries and reports.
 
Since the advent of data-driven decision-making in the 1950s, business intelligence has always striven to collect and analyze the largest amount of data possible. BI focuses on the data needed to make decisions; this is not necessary, only the most current data. There is often a need for historical data to make forecasts and to support decision-making. The focus of a BI system would not be on the sheer amount and variety of data but rather on the data most relevant to the decision-maker's needs.
 
To support the needs of business intelligence and decision-making systems, the DBA must ensure that the data stored in operational and transaction processing systems can be extracted and moved to the BI system's data warehouse. This extraction process must also allow for the conversion of the operational data into whatever format meets the needs of the warehouse and the BI system.
 
To review, see Data Mining Techniques in Analyzing Process.

 

Unit 8 Vocabulary

This vocabulary list includes the terms that you will need to know to successfully complete the final exam.

  • automated decision
  • business intelligence
  • critical success factor
  • data extraction
  • data mart
  • data mining
  • decision support system
  • key performance indicator (KPI)
  • keyword
  • information management
  • managerial decision-making
  • online analytic processing (OLAP)
  • online transaction processing (OLTP)
  • operational database
  • raw data
  • summary data
  • tag
  • three-tier data warehouse architecture