BUS610 Study Guide

Unit 4: Data Warehousing and Integration

4a. Describe how data storage and integration have changed over time to enable the prediction of future trends in data storage

  • What are some of the issues associated with cloud data storage?
  • What role does data governance play in the administration of data?

The history of data storage is long, varied, and complex. Punch cards were used to communicate information to equipment long before computers were developed, and Professor Frederic Williams created RAM in 1948. The longest-serving era was that of IBM, which ran from the mid-1950s to approximately 2003 and was built on the company's magnetic disk storage development and market domination. Since then, data warehousing and storage technology has advanced considerably in speed, but for large mainframes it has remained relatively the same in size.
 
Before the advent of relational databases, most transaction processing systems were characterized by application-specific data structures. Applications were not integrated, and thus there was no way to share data between applications. Since the advent of relational databases, there has been a much higher degree of centralization and coordination of transaction processing system data.
 
A cloud system consists of IT components (hardware, software, and infrastructure) that enable the delivery of cloud computing services such as SaaS (software as a service), PaaS (platform as a service), and IaaS (infrastructure as a service) via a network, typically the public internet. Cloud systems, and the vendors and service providers who support them, must be flexible enough to integrate many different types of technology and systems of different vintages, standards, and vendors. New technologies and systems are constantly being developed, and cloud systems must allow them to be integrated with the older technologies already in use.
 
Cloud services vendors must be able to provide non-proprietary network management solutions to allow for the wide range of technologies that must be integrated into the cloud system.
 
Data governance is the collection of processes, roles, policies, standards, and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its strategic goals. Data governance defines who can take what action, upon what data, in what situations, and using what methods. Data governance frameworks and maturity models have been developed to aid the organization in ensuring that its governance policies and processes are serving the organization's needs in the most effective way.
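
One way to picture "who can take what action, upon what data, in what situations, and using what methods" is as a lookup against a set of policy rules. The following is a minimal sketch only, not a real governance framework; the role names, datasets, situations, and methods are all hypothetical and were invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    role: str       # who
    action: str     # can take what action
    dataset: str    # upon what data
    situation: str  # in what situation
    method: str     # using what method


# Hypothetical policy entries, invented for illustration only.
POLICY = {
    Rule("analyst", "read", "sales_history", "monthly_reporting", "bi_dashboard"),
    Rule("data_steward", "update", "customer_master", "data_correction", "stewardship_tool"),
}


def is_permitted(role, action, dataset, situation, method):
    """Return True only if the policy contains an exactly matching rule."""
    return Rule(role, action, dataset, situation, method) in POLICY


# An analyst reading sales history for a monthly report is covered by a rule.
print(is_permitted("analyst", "read", "sales_history", "monthly_reporting", "bi_dashboard"))
# An analyst updating the customer master is not.
print(is_permitted("analyst", "update", "customer_master", "data_correction", "stewardship_tool"))
```

In this sketch, anything not explicitly listed in the policy is refused, which is one common design choice rather than a requirement of data governance.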
 
The operational DBA would typically be concerned with the operational database and not the administration of the data warehouse. A specific DBA role should be created for the administration and management of the warehouse. In addition, one or both of these database systems may include cloud systems and may not even involve the use of a locally-managed data center.
 
This figure is a conceptualization of a cloud storage system.


To review, see Modeling and Management of Big Data in Databases.
 

4b. Explain the basic concepts and theories of data warehousing, such as dimensional modeling and ETLA (extract, transform, load, analyze) 

  • What are some of the challenges of data warehouse administration?
  • How is data extracted from the data warehouse?

The fundamental purpose of a data warehouse is to store data extracted from internal transaction processing systems and external sources. The data is reformatted to meet the needs of the BI systems that will use it. The data warehouse is kept separate from the transaction processing systems: it holds extracted and reformatted copies of their data rather than the live operational data itself. In addition, the data warehouse may or may not be segmented into specialized data marts.
 
To support the needs of BI systems, the DBA must ensure that the data stored in operational and transaction processing systems can be extracted and moved to the data warehouse supporting the BI system. This extraction process must also allow for the conversion of the operational data into whatever format meets the needs of the warehouse and the BI system.
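
The extract, transform, and load steps described above can be sketched in a few lines. This is an illustrative example under stated assumptions, not a production ETL tool: it uses Python's built-in sqlite3 module with in-memory databases, and the table names (orders, fact_orders), columns, and sample rows are hypothetical.

```python
import sqlite3


def extract(source):
    """Read raw order rows from the operational database."""
    return source.execute(
        "SELECT order_id, order_date, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Convert operational rows to the warehouse format:
    ISO dates (YYYY-MM-DD) and amounts in dollars instead of cents."""
    converted = []
    for order_id, order_date, amount_cents in rows:
        month, day, year = order_date.split("/")
        converted.append((order_id, f"{year}-{month}-{day}", amount_cents / 100))
    return converted


def load(warehouse, rows):
    """Insert the converted rows into the warehouse table."""
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()


# Hypothetical operational system with dates stored as MM/DD/YYYY.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "03/15/2024", 1999), (2, "03/16/2024", 4500)],
)

# Hypothetical warehouse, kept in a separate database.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, order_date TEXT, amount REAL)"
)

load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
```

Keeping the three steps as separate functions mirrors the way the process is usually described, and makes it easy to change the conversion rules without touching the extraction or loading code.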
 
Organizational data is expected to be utilized more than ever to support business intelligence applications, data warehousing, data marts, and advanced analytics for business decision support.
 
Extraction systems extract and transform only the data they have been configured to handle. They have no mechanisms for auditing and checking data quality, completeness, or reliability. Such systems exist to automate extracting data from the warehouse, transforming it into the appropriate formats, and loading it into the BI systems.
 
The following figure provides a high-level conceptual representation of the data warehousing and management process. Notice that data is fed to the data warehouse from various sources, both internal and external to the organization. This means that the data will arrive in many different formats and will need to be adapted for inclusion in the warehouse, which is the job of the staging phase. The warehouse itself may be subdivided into smaller sections called data marts. The structure of these data marts depends on the needs of their users.
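
To make the staging phase and data marts more concrete, here is a small sketch. It assumes two hypothetical sources (an internal sales system and an external feed) that deliver the same kind of record in different formats; the field names, sample values, and the idea of splitting marts by region are assumptions made for this example only.

```python
from datetime import datetime

# Hypothetical raw records: each source uses its own layout and date format.
internal_sales = [{"region": "east", "sold_on": "2024-03-15", "total": "120.50"}]
external_feed = [("WEST", "15/03/2024", 80.0)]


def stage_internal(record):
    """Adapt an internal record to the common warehouse layout."""
    return {
        "region": record["region"].lower(),
        "date": record["sold_on"],
        "amount": float(record["total"]),
    }


def stage_external(record):
    """Adapt an external record (DD/MM/YYYY dates) to the same layout."""
    region, day, amount = record
    iso_date = datetime.strptime(day, "%d/%m/%Y").date().isoformat()
    return {"region": region.lower(), "date": iso_date, "amount": float(amount)}


# Staging: every source is converted to one format before entering the warehouse.
warehouse = [stage_internal(r) for r in internal_sales] + [
    stage_external(r) for r in external_feed
]


def data_mart(rows, region):
    """Return the subset of warehouse rows that one group of users needs."""
    return [row for row in rows if row["region"] == region]


print(data_mart(warehouse, "west"))
```

Here a data mart is modeled as nothing more than a filtered view of the warehouse; real data marts are often separate databases designed around a department's reporting needs.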


To review, see Data Warehouse Strategies.
 

4c. Explain data warehouse administration and security issues, such as user access and accountability, encryption, and emerging challenges 

  • What are some of the challenges of data warehouse administration?
  • How can security be managed and enforced in the data warehouse?

The database administrator of a warehouse is typically not involved in the usual duties of an operational database administrator. Their focus is specific to the warehouse and the BI systems it supports. Thus, the time it takes to make decisions, including the time it takes to extract data from the warehouse, is a primary concern.
 
The database administrator of a warehouse may (rarely) be a team member on an application redesign project but does not bear primary responsibility for such projects.
 
As security policies are developed to support business operations, good security practice ensures that data is accessible only to those staff who have a documented, legitimate business need to access it. This is particularly important in the case of sensitive information like customer data.
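
The idea of granting access only on a documented business need, and holding users accountable for what they access, might be sketched as follows. The user names, dataset name, and recorded reason are hypothetical, and the audit log is an assumption added here to illustrate accountability; it is not a description of any particular warehouse product.

```python
from datetime import datetime, timezone

# Hypothetical grants: (user, dataset) -> the documented business need.
GRANTS = {
    ("maria", "customer_data"): "Handles customer billing disputes",
}

# Every access attempt is recorded so that it can be reviewed later.
AUDIT_LOG = []


def request_access(user, dataset):
    """Allow access only when a documented need exists, and log every attempt."""
    reason = GRANTS.get((user, dataset))
    allowed = reason is not None
    AUDIT_LOG.append(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset": dataset,
            "allowed": allowed,
        }
    )
    return allowed


print(request_access("maria", "customer_data"))  # has a documented need
print(request_access("sam", "customer_data"))    # no documented need on file
print(len(AUDIT_LOG))                            # both attempts were recorded
```

Note that refused requests are logged as well as successful ones, since failed attempts to reach sensitive data are often what a security review looks for.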
 
Fault tolerance simply means a system's ability to continue operating uninterrupted despite the failure of one or more of its components. This is true whether it is a computer system, a cloud cluster, a network, or something else. You can make a BI server architecture more fault tolerant by running multiple instances: the added redundancy means that if one instance fails, another can continue to handle the work.
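
As a rough illustration of how redundancy provides fault tolerance, the sketch below sends a query to a list of server instances and moves on to the next one whenever an instance is down. The instance names (bi-1, bi-2) and the simulated failure are hypothetical; real BI servers typically rely on load balancers or clustering software rather than code like this.

```python
class InstanceDown(Exception):
    """Raised when a simulated server instance cannot respond."""


def make_instance(name, healthy):
    """Create a simulated BI server instance that either answers or fails."""
    def handle(query):
        if not healthy:
            raise InstanceDown(name)
        return f"{name} answered: {query}"
    return handle


def run_query(instances, query):
    """Try each redundant instance in turn until one responds."""
    for handle in instances:
        try:
            return handle(query)
        except InstanceDown:
            continue  # this instance failed, so fall through to the next one
    raise RuntimeError("all instances are down")


# The first instance has failed, but the second keeps the service running.
instances = [make_instance("bi-1", healthy=False), make_instance("bi-2", healthy=True)]
print(run_query(instances, "total sales by region"))
```

The service only becomes unavailable when every instance has failed, which is why adding instances tends to improve fault tolerance.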
 
A number of factors have recently changed the outlook for the future of data storage. In a move toward greater security, containers are being used more widely as more microservice architectures are implemented, and how to address the issues they raise, such as operationality, will be a key trend. As cloud infrastructure grows, so does the market for on-premises storage facilities, as more businesses want in-house control.
 
To review, see Big Data Management.
 

Unit 4 Vocabulary 

This vocabulary list includes the terms that you will need to know to successfully complete the final exam.

  • analytics development
  • cloud storage
  • containers
  • database administrator (DBA)
  • data center
  • data governance
  • data warehouse
  • data mart
  • fault tolerant
  • network management
  • platform as a service (PaaS)
  • relational database
  • security
  • software as a service (SaaS)
  • transaction processing system