2. Related Works

2.2. ETL vs ELT

In a typical BI infrastructure, data extracted from Operational Data Sources (ODS) are first transformed, then cleaned, and finally loaded into a data warehouse. This processing of "raw data" before loading is necessary for several reasons. First, a data warehouse typically consolidates many different ODS, each with its own schema and metadata, so incoming data must be normalized. Second, the ODS may contain erroneous, corrupted or missing data, so cleaning and reconsolidation are needed. This preprocessing is commonly known as Extract, Transform and Load (ETL): data are first extracted from the original data source, then transformed (including normalization and cleansing), and finally loaded into the data warehouse.
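To make the order of the three ETL steps concrete, the following is a minimal Python sketch of such a pipeline. The two sources, their field names, the cleansing rules and the use of an in-memory SQLite database as the "warehouse" are all illustrative assumptions, not part of the approach described above. The only point it illustrates is that normalization and cleansing happen before anything is loaded, so rejected records are never stored.

```python
import sqlite3

# Two hypothetical operational data sources (ODS) with different schemas.
# Field names and values are illustrative assumptions only.
ODS_CRM = [
    {"cust_id": 1, "full_name": "Alice", "amount_eur": "120.50"},
    {"cust_id": 2, "full_name": "Bob", "amount_eur": None},  # missing value
]
ODS_SHOP = [
    {"customer": 3, "name": " carol ", "total_cents": 9900},
    {"customer": 4, "name": "", "total_cents": -5},  # corrupted record
]


def extract():
    """Extract: pull raw records from every source, tagging their origin."""
    return [("crm", r) for r in ODS_CRM] + [("shop", r) for r in ODS_SHOP]


def transform(records):
    """Transform: normalize both schemas to one layout and clean the data.

    Records that fail validation are dropped here, before loading, so they
    never reach the warehouse.
    """
    clean = []
    for source, r in records:
        if source == "crm":
            amount = None if r["amount_eur"] is None else float(r["amount_eur"])
            cid, name = r["cust_id"], r["full_name"]
        else:
            amount = r["total_cents"] / 100  # normalize cents to currency units
            cid, name = r["customer"], r["name"]
        name = name.strip().title()
        if not name or amount is None or amount < 0:
            continue  # cleansing: discard erroneous or incomplete rows
        clean.append((cid, name, amount))
    return clean


def load(conn, rows):
    """Load: insert only the already-transformed rows into the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    dw = sqlite3.connect(":memory:")  # stand-in for a data warehouse
    load(dw, transform(extract()))
    print(dw.execute("SELECT * FROM sales ORDER BY customer_id").fetchall())
    # → [(1, 'Alice', 120.5), (3, 'Carol', 99.0)]
```

Of the four extracted records only two reach the `sales` table; Bob's incomplete row and the corrupted shop row are lost at the transformation step and cannot be recovered from the warehouse later.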

While the database technologies used for data warehousing have seen tremendous improvements in performance and scalability over the past decade, ETL has not improved to the same degree. As a result, most BI infrastructures increasingly experience a bottleneck: data cannot easily be brought into the data warehouse with the required freshness. To provide near-real-time BI, this bottleneck must be resolved.

Data storage costs have always been a significant factor, but storage keeps getting cheaper; as a result, analysis can be performed over larger amounts of data with less investment. Under these changed circumstances, the established (and robust) Extract, Transform and Load approach cannot easily meet all business needs, including work with big data, so new approaches and/or architectural changes are needed. The main disadvantage of ETL is that data must be transformed before they are loaded. This means that during the transformation phase, large amounts of potentially valuable data are thrown away.

To eliminate this drawback, recent improvements in storage techniques can be exploited. One approach that addresses these challenges is called Extract, Load and Transform (ELT). The basic idea is to perform the Load step immediately after the Extract step, and to apply the Transformation only after the data have been stored. Compared with ETL, ELT has the following four advantages: (1) flexibility in adding new data sources (the EL part); (2) aggregation can be applied multiple times to the same raw data (the T part); (3) the transformation process can be re-adapted, even on legacy data; (4) faster implementation.
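For contrast, here is a minimal ELT sketch under similar illustrative assumptions: hypothetical order records, an in-memory SQLite database standing in for warehouse storage, and SQL as the transformation language. The names `raw_orders`, `extract_and_load` and `transform` are invented for this example. Every extracted row is loaded as-is into a raw table, and the transformation is an ordinary query over that table, so it can be run again with a different rule without re-extracting anything, which corresponds to advantages (2) and (3) above.

```python
import sqlite3

# Hypothetical raw records from an operational source. In ELT they are
# stored exactly as extracted, including rows that look invalid today.
RAW_ORDERS = [
    ("2024-01-03", "north", "120.50"),
    ("2024-01-03", "south", ""),       # missing amount
    ("2024-01-04", "north", "80.00"),
    ("2024-01-04", "south", "-5.00"),  # suspicious negative amount
]


def extract_and_load(conn, records):
    """E + L: copy source records into a raw staging table, untransformed."""
    conn.execute("CREATE TABLE raw_orders (day TEXT, region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", records)
    conn.commit()


def transform(conn, min_amount):
    """T: derive per-region totals from the stored raw data on demand.

    The cleansing rule (min_amount) is a parameter, so the same raw rows
    can be re-transformed later when business requirements change.
    """
    return conn.execute(
        """
        SELECT region, SUM(CAST(amount AS REAL))
        FROM raw_orders
        WHERE amount <> '' AND CAST(amount AS REAL) >= ?
        GROUP BY region
        ORDER BY region
        """,
        (min_amount,),
    ).fetchall()


if __name__ == "__main__":
    dw = sqlite3.connect(":memory:")
    extract_and_load(dw, RAW_ORDERS)
    print(transform(dw, min_amount=0))    # current rule
    print(transform(dw, min_amount=-10))  # revised rule, same raw data
    # The raw table still holds all four extracted rows.
    print(dw.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0])
```

With `min_amount=0` the query returns `[('north', 200.5)]`; relaxing the rule to `min_amount=-10` also returns `('south', -5.0)`, because the negative record was kept in `raw_orders`, whereas a pipeline that cleans before loading would already have discarded it.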

According to Competitive Advantage: Creating and Sustaining Superior Performance, an enterprise's competitiveness relies heavily on the time it takes to make decisions, and BI solutions have become the de facto standard for supporting decision making. Since time is such an important factor, it is also crucial to design BI solutions in a shorter period of time. One shortcoming, according to An architecture for ad-hoc and collaborative business intelligence, is that the time spent in the implementation phase of a BI solution increases its costs. Another shortcoming concerns flexibility: a BI solution needs to reflect environmental changes and adapt to them in the shortest possible time.

The ETL process does not address flexibility in terms of reflecting environmental changes, and classical BI solutions need a vast amount of time to be implemented. Because ETL applies the transformation after extraction and before loading, data are inserted into the data warehouse only in the last phase. In contrast, ELT first extracts and loads the data and then applies transformations on demand, according to business needs. Moreover, ELT transformations can be applied and re-applied as business requirements change. For these reasons, it is preferable to adopt Extract, Load and Transform (ELT) rather than Extract, Transform and Load (ETL) in BI solutions.