4. SYSTEM DESCRIPTION
OLAP Servers.
Online Analytical Processing (OLAP) supports operations such as filtering, aggregation, pivoting, rollup, and drill-down on the multidimensional view of the data. OLAP servers are implemented using either a multidimensional storage engine (MOLAP), a relational DBMS engine (ROLAP) as the backend, or a hybrid combination called HOLAP.
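To make the operations above concrete, here is a minimal sketch of rollup and drill-down over a tiny in-memory fact table. The field names (region, city, sales) and the figures are illustrative assumptions, not data from any particular OLAP server:

```python
from collections import defaultdict

# Tiny fact table at the finest (city) level of detail.
facts = [
    {"region": "East", "city": "Boston",  "sales": 100},
    {"region": "East", "city": "NYC",     "sales": 250},
    {"region": "West", "city": "Seattle", "sales": 175},
]

def rollup(rows, dim):
    """Aggregate the sales measure up to the given dimension level."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dim]] += row["sales"]
    return dict(totals)

# Rollup: summarize city-level facts at the coarser region level.
by_region = rollup(facts, "region")

# Drill-down: restrict to one region and view the finer city level again.
east_cities = rollup([r for r in facts if r["region"] == "East"], "city")
```

A real OLAP server performs the same aggregation over precomputed cube cells rather than scanning raw rows, which is what makes interactive response times possible.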
MOLAP servers. MOLAP servers directly support the multidimensional view of data through a storage engine that uses the multidimensional array abstraction. They typically precompute large data cubes to speed up query processing. Such an approach has the advantage of excellent indexing properties and fast query response times, but provides relatively poor storage utilization, especially when the data set is sparse. To better adapt to sparse data sets, MOLAP servers identify dense and sparse regions of the data, and store and index these regions differently. For example, dense sub-arrays of the cube are identified and stored in array format, whereas the sparse regions are compressed and stored separately.
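The dense/sparse split can be sketched in a few lines. This is a pure-Python illustration of the storage idea only; real MOLAP engines use chunked, compressed multidimensional storage, and the coordinates and values below are made up:

```python
# Dense region: a fully populated 2x3 sub-array stored positionally,
# so a cell is found by direct array indexing.
dense_block = [
    [10, 20, 30],
    [40, 50, 60],
]

# Sparse region: only non-empty cells are kept, keyed by coordinates,
# so storage cost is proportional to the number of occupied cells.
sparse_cells = {(97, 4): 7, (203, 11): 3}

def cell(i, j):
    """Read a cell, consulting the dense block first, then sparse storage."""
    if 0 <= i < 2 and 0 <= j < 3:
        return dense_block[i][j]
    return sparse_cells.get((i, j), 0)  # absent sparse cells default to 0
```

The dense block gives array-speed lookups where the data justifies it, while the coordinate map avoids wasting space on the empty cells that dominate a sparse cube.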
ETL METHODOLOGY
Introduction of ETL
Many of the business intelligence tools look impressive: they provide graphs, moving targets, drill-downs, and drill-throughs. But much of the work in an operational data warehouse involves getting the data from operational systems into the data warehouse so that business intelligence tools can display those pretty pictures. This paper addresses the extraction, transformation, and load components of data warehousing. We'll look at issues in extraction, transformation, and loading, and at common approaches to loading data. We assume that source data structures are generally not similar to target data structures (e.g., flat files and normalized tables).
The Plan
At the risk of being a bit simplistic, extraction, transformation, and load involves three main steps:
- Read the source data
- Apply business, transformation, and technical rules
- Load the data
Figure 3 shows this data flow. After the process reads the data, it must transform the data by applying technology, transformation, and business rules to it. An example of a transformation rule is: "Convert values in field x to integer." An example of a business rule is: "Customers must purchase products in the list Washer, Dryer, Refrigerator." Applying business, transformation, and technology rules to data means generating keys, translating codes, converting datatypes, parsing, merging, and many other operations. Once the data is in an appropriate technical and business format, the ETL process can load it into the target tables. Note that these steps can potentially be performed at many points in the system chain. For example, extract programs could transform code values as they read from DB2 tables or VSAM datasets; a Perl program could parse a file and generate keys for use by bcp or DTS; and stored procedures could split staging table rows in a set-oriented fashion.
Fig 3: High-Level Data Flow for Extraction, Transformation, and Load
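The transform step described above can be sketched as one function that applies each kind of rule in turn. The field name x and the product list come from the examples in the text; the surrogate-key counter and the rejection behavior are assumptions for illustration:

```python
ALLOWED_PRODUCTS = {"Washer", "Dryer", "Refrigerator"}  # business-rule list
_next_key = 1000  # assumed surrogate-key counter; real ETL uses a key generator

def transform(record):
    """Apply transformation and business rules to one source record."""
    global _next_key
    row = dict(record)
    # Transformation rule from the text: convert values in field x to integer.
    row["x"] = int(row["x"])
    # Business rule from the text: product must be in the allowed list.
    if row["product"] not in ALLOWED_PRODUCTS:
        raise ValueError(f"rejected product: {row['product']}")
    # Key generation, one of the operations the transform step performs.
    row["key"] = _next_key
    _next_key += 1
    return row

clean = transform({"x": "42", "product": "Dryer"})
```

Records that pass every rule come out typed and keyed, ready for the load step; records that fail a business rule are rejected rather than silently loaded.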
The Environment
The extraction, transformation, and load environment consists of three architectures (data, application, and technology) and a group of people (management, developers, and support). Figure 4 shows an abstract ETL environment.
Fig 4: ETL Environment.
The data architecture includes the data itself and its quality as well as the various models that represent the data, data structures, business and transformation rules, and business meaning embodied in the data and data structures. Data architecture models include conceptual data models, logical data models, physical data models, and physical representations such as COBOL copybooks, C structures, and SQL DDL statements.
Common Operations
Listed below are some common operations you may need to perform on the data you load.
Operation | Possible Solution
Generate a key | Identity column; primary key generator
Translate a code | if-then logic; lookup
Split data from one source into two targets | Multiple write statements
Merge two data sources into one | Join from multiple source tables
Log errors and progress information | Log/schedule table that all load processes write to
Load code tables | Set of scripts
Table 2. Potential Load Operations
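Two of the operations in Table 2, translating a code via a lookup and splitting one source into two targets, can be sketched together. The status codes and target record layouts below are hypothetical:

```python
# Lookup-table code translation (the "lookup" solution from Table 2).
CODE_LOOKUP = {"A": "Active", "I": "Inactive"}

def split_row(row):
    """Route one source row into two targets: a customer record and an order record."""
    customer = {
        "cust_id": row["cust_id"],
        "status": CODE_LOOKUP.get(row["status"], "Unknown"),  # translate the code
    }
    order = {"cust_id": row["cust_id"], "amount": row["amount"]}
    return customer, order  # two write targets from one source row

cust, order = split_row({"cust_id": 7, "status": "A", "amount": 99.5})
```

In a production load the two returned records would feed separate write statements against the two target tables, and unmatched codes would typically be logged to the error table rather than defaulted.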
Four Approaches
There are four common approaches to transforming and loading data into the data warehouse.
- Build a custom solution
- Buy a code generator
- Buy a data transformation engine
- Use a mixture of the above