4. SYSTEM DESCRIPTION

OLAP Servers.
Online Analytical Processing (OLAP) supports operations such as filtering, aggregation, pivoting, rollup, and drill-down on the multidimensional view of the data. OLAP servers are implemented using either a multidimensional storage engine (MOLAP); a relational DBMS engine (ROLAP) as the backend; or a hybrid combination called HOLAP.

MOLAP servers. MOLAP servers directly support the multidimensional view of data through a storage engine that uses the multidimensional array abstraction. They typically precompute large data cubes to speed up query processing. Such an approach has the advantage of excellent indexing properties and fast query response times, but provides relatively poor storage utilization, especially when the data set is sparse. To better adapt to sparse data sets, MOLAP servers identify dense and sparse regions of the data, and store and index these regions differently. For example, dense sub-arrays of the cube are identified and stored in array format, whereas the sparse regions are compressed and stored separately.
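The dense/sparse split described above can be sketched as follows. This is a minimal illustration, not any vendor's actual storage engine; the density threshold, cell representation, and function names are assumptions for the example.

```python
# Hypothetical sketch of a MOLAP dense/sparse partitioning decision:
# dense regions are materialized as full arrays (fast positional access),
# sparse regions keep only populated coordinates (compact storage).

def partition_region(cells, shape, density_threshold=0.5):
    """cells: dict mapping (row, col) -> measure value; shape: (rows, cols)."""
    rows, cols = shape
    density = len(cells) / (rows * cols)
    if density >= density_threshold:
        # Dense: store in array format, filling empty cells with 0.
        array = [[cells.get((r, c), 0) for c in range(cols)]
                 for r in range(rows)]
        return ("dense", array)
    # Sparse: store only the coordinate -> value map.
    return ("sparse", dict(cells))

dense_cells = {(r, c): r * 10 + c for r in range(2) for c in range(2)}
sparse_cells = {(0, 9): 7}

print(partition_region(dense_cells, (2, 2))[0])     # dense
print(partition_region(sparse_cells, (10, 10))[0])  # sparse
```

A real engine would compress the sparse map and pick regions by scanning the cube, but the storage trade-off is the same: array format trades space for lookup speed.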


ETL METHODOLOGY

Introduction to ETL

Many Business Intelligence tools look way cool. They provide graphs, moving targets, drill-downs, and drill-throughs. But much of the work in an operational data warehouse involves getting data from operational systems into the data warehouse so that business intelligence tools can display those pretty pictures. This paper addresses the extraction, transformation, and load (ETL) components of data warehousing. We'll look at issues in extraction, transformation, and loading, and at common approaches to loading data. We assume that source data structures are generally not similar to target data structures (e.g., flat files versus normalized tables).


The Plan

At the risk of being a bit simplistic, extraction, transformation, and load requires three main steps:

  • Read the source data
  • Apply business, transformation, and technical rules
  • Load the data

Figure 3 shows this data flow. After the process reads the data, it must transform the data by applying technology, transformation, and business rules to it. An example of a transformation rule is: "Convert values in field x to integer." An example of a business rule is: "Customers must purchase products from the list Washer, Dryer, Refrigerator." Applying business, transformation, and technology rules to data means generating keys, transforming codes, converting datatypes, parsing, merging, and many other operations. Once the data is in an appropriate technical and business format, the ETL process can load it into target tables. Note that these steps can potentially be performed at many points in the system chain. For example, extract programs could transform code values as they read from DB2 tables or VSAM datasets. A Perl program could parse a file and generate keys for use by bcp or DTS. Finally, stored procedures could split staging-table rows in a set-oriented fashion.
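The rule-application step can be sketched in a few lines. This is a hedged illustration of the two example rules above; the field names, the rejection behavior, and the allowed-product set are assumptions for the sketch, not part of any real system.

```python
# Illustrative rule application for one staged row:
# - transformation rule: convert values in field x to integer
# - business rule: product must be one of Washer, Dryer, Refrigerator

ALLOWED_PRODUCTS = {"Washer", "Dryer", "Refrigerator"}

def transform_row(row):
    """Apply transformation and business rules to one source row."""
    out = dict(row)
    out["x"] = int(out["x"])                      # transformation rule
    if out["product"] not in ALLOWED_PRODUCTS:    # business rule
        raise ValueError(f"rejected product: {out['product']}")
    return out

clean = transform_row({"x": "42", "product": "Dryer"})
print(clean["x"])  # 42
```

In practice the rejected rows would be written to an error table rather than raised as exceptions, so the load can continue past bad records.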

Fig 3: High-Level Data Flow for Extraction, Transformation, and Load


The Environment

The extraction, transformation, and load environment consists of three architectures (data, application, and technology) and a group of people (management, developers, and support). Figure 4 shows an abstract ETL environment.


Fig 4: ETL Environment.

The data architecture includes the data itself and its quality as well as the various models that represent the data, data structures, business and transformation rules, and business meaning embodied in the data and data structures. Data architecture models include conceptual data models, logical data models, physical data models, and physical representations such as COBOL copybooks, C structures, and SQL DDL statements.


Common Operations

Listed below are some common operations you may need to perform on the data you load.

Operation                                     Possible Solution
Generate a key                                Identity column; primary key generator
Translate a code                              if-then logic; lookup
Split data from one source into two targets   Multiple write statements
Merge two data sources into one               Join from multiple source tables
Log errors and progress information           Log/schedule table that all load processes write to
Load code tables                              Set of scripts

Table 2. Potential Load Operations
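One row of Table 2, splitting data from one source into two targets with multiple writes, can be illustrated briefly. The routing predicate and in-memory targets are assumptions for the example; a real load would write to two tables or files.

```python
# Illustrative split of one source into two targets: each row is
# routed by a predicate and written to exactly one target.

def split_load(source_rows, predicate):
    """Route each source row to target_a if predicate(row), else target_b."""
    target_a, target_b = [], []
    for row in source_rows:
        (target_a if predicate(row) else target_b).append(row)
    return target_a, target_b

rows = [{"amount": 10}, {"amount": 250}, {"amount": 75}]
large, small = split_load(rows, lambda r: r["amount"] > 100)
print(len(large), len(small))  # 1 2
```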


Four Approaches

There are four common approaches to transforming and loading data into the data warehouse.

  • Build a custom solution
  • Buy a code generator
  • Buy a data transformation engine
  • Use a mixture of the above