Read this article and focus on the diagram about how data warehouses are constructed. What is a data warehouse? How does it differ from a database? What are the key steps in constructing a data warehouse? Why would a business want to utilize a data warehouse?
Learning Objectives
- Understand what data warehouses and data marts are and the purpose they serve.
- Know the issues that need to be addressed in order to design, develop, deploy, and maintain data warehouses and data marts.
Since running analytics against transactional data can bog down a system, and since most organizations need to combine and reformat data from multiple sources, firms typically need to create separate data repositories for their reporting and analytics work – a kind of staging area from which to turn that data into information.
Two terms you'll hear for these kinds of repositories are data warehouse and data mart.
A data warehouse is a set of databases designed to support decision
making in an organization. It is structured for fast online queries and
exploration. Data warehouses may aggregate enormous amounts of data from
many different operational systems.
A data mart is a database focused on addressing the concerns of a
specific problem (e.g., increasing customer retention, improving product
quality) or business unit (e.g., marketing, engineering).
Marts and warehouses may contain huge volumes of data. For example, a
firm may not need to keep large amounts of historical point-of-sale or
transaction data in its operational systems, but it might want past data
in its data mart so that managers can hunt for patterns and trends that
occur over time.
It's easy for firms to get seduced by a software vendor's demonstration
showing data at your fingertips, presented in pretty graphs. But as
mentioned earlier, getting data in a format that can be used for
analytics is hard, complex, and challenging work. Large data warehouses
can cost millions and take years to build. Every dollar spent on
technology may lead to five to seven more dollars on consulting and
other services.
Most firms will face a trade-off – do we attempt a large-scale integration
of the whole firm, or more targeted efforts with quicker payoffs? Firms
in fast-moving industries or with particularly complex businesses may
struggle to get sweeping projects completed in enough time to reap
benefits before business conditions change. Most consultants now advise
smaller projects with narrow scope driven by specific business goals.
Firms can eventually get to a unified data warehouse but it may take
time. Even analytics king Wal-Mart is just getting to that point. Retail
giant Wal-Mart once reported having over seven hundred different data
marts and hired Hewlett-Packard for help in bringing the systems
together to form a more integrated data warehouse.
The old saying from the movie Field of Dreams,
"If you build it, they will come," doesn't hold up well for large-scale
data analytics projects. This work should start with a clear vision
with business-focused objectives. When senior executives can see
objectives illustrated in potential payoff, they'll be able to champion
the effort, and experts agree, having an executive champion is a key
success factor. Focusing on business issues will also drive technology
choice, with the firm better able to focus on products that best fit its
needs.
Once a firm has business goals and hoped-for payoffs clearly defined, it
can address the broader issues needed to design, develop, deploy, and
maintain its system:
- Data relevance. What data is needed to compete on analytics and to meet our current and future goals?
- Data sourcing. Can we even get the data we'll need? Where can this data be obtained from? Is it available via our internal systems? Via third-party data aggregators? Via suppliers or sales partners? Do we need to set up new systems, surveys, and other collection efforts to acquire the data we need?
- Data quantity. How much data is needed?
- Data quality. Can our data be trusted as accurate? Is it clean, complete, and reasonably free of errors? How can the data be made more accurate and valuable for analysis? Will we need to ‘scrub,' calculate, and consolidate data so that it can be used?
- Data hosting. Where will the systems be housed? What are the hardware and networking requirements for the effort?
- Data governance. What rules and processes are needed to manage data from its creation through its retirement? Are there operational issues (backup, disaster recovery)? Legal issues? Privacy issues? How should the firm handle security and access?
E-discovery: Supporting Legal Inquiries
Data archiving isn't just for analytics. Sometimes the law requires organizations to dive into their electronic records. E-discovery
refers to identifying and retrieving relevant electronic information to
support litigation efforts. E-discovery is something a firm should
account for in its archiving and data storage plans. Unlike analytics
that promise a boost to the bottom line, there's no profit in complying
with a judge's order – it's just a sunk cost. But organizations can be
compelled by court order to scavenge their bits, and the cost to uncover
difficult to access data can be significant, if not planned for in
advance.
In one recent example, the Office of Federal Housing Enterprise
Oversight (OFHEO) was subpoenaed for documents in litigation involving
mortgage firms Fannie Mae and Freddie Mac. Even though the OFHEO wasn't a
party in the lawsuit, the agency had to comply with the search – an
effort that cost $6 million, a full 9 percent of its total yearly
budget.
Key Takeaways
- Data warehouses and data marts are repositories for large amounts of transactional data awaiting analytics and reporting.
- Large data warehouses are complex, can cost millions, and take years to build.
Questions and Exercises
- List the issues that need to be addressed in order to design, develop, deploy, and maintain data warehouses and data marts.
- What is meant by "data relevance"?
- What is meant by "data governance"?
- What is the difference between a data mart and a data warehouse?
- Why are data marts and data warehouses necessary? Why can't an organization simply query its transactional database?
- How can something as simple as customer gender be difficult to for a large organization to establish in a data warehouse?
This text was adapted by Saylor Academy under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License without attribution as requested by the work's original creator or licensor.