Designing BI Solutions in the Era of Big Data

This article highlights the changing dynamics that models must accommodate to be feasible within an organization. It proposes a new process, ETLA (extract, transform, load, and analyze), that challenges both ETL (extract, transform, load) and ELT (extract, load, transform). For context, think of the concept in terms of putting dirty dishes into a dishwasher.

2. Related Works

2.1. Business Intelligence and Big Data

Business intelligence (BI) systems support and assist decision-making processes. They also play a part in an organization's strategic planning, which normally addresses management effectiveness. BI is defined as "a set of methodologies, processes, architectures and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making". Effective BI systems give decision makers access to quality information, enabling them to accurately identify where the company has been, where it is now, and where it needs to be in the future. Despite the immense benefits that an effective BI system can bring, numerous studies have shown that the usage and adoption of BI systems remain low, particularly among smaller institutions and companies with resource constraints.

According to "Study on Port Business Intelligence System Combined with Business Performance Management", a BI system should have the following basic features:

  • Data Management: including data extraction, data cleaning, data integration, as well as efficient storage and maintenance of large amounts of data
  • Data Analysis: including information queries, report generation, and data visualization functions
  • Knowledge Discovery: extracting useful information (knowledge) from the rapidly growing volumes of digital data in databases
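The three features listed above can be illustrated with a minimal Python sketch. It is not taken from the cited study: the sample records, the table name `sales`, and the helper names `clean`, `manage`, `report`, and `discover` are all hypothetical, and the "knowledge discovery" step is reduced to a trivial rule purely for illustration.

```python
import sqlite3

# Hypothetical raw records from two source systems (invented for illustration).
crm_rows = [
    {"customer": " Alice ", "region": "north", "amount": "120.50"},
    {"customer": "Bob", "region": "SOUTH", "amount": "80"},
    {"customer": "Bob", "region": "SOUTH", "amount": "80"},  # exact duplicate
]
web_rows = [
    {"customer": "Carol", "region": "North", "amount": "200.00"},
    {"customer": "Dave", "region": "south", "amount": None},  # missing value
]


def clean(row):
    """Data cleaning: trim text, normalize case, convert types.

    Returns None when the record is unusable (here: a missing amount).
    """
    if row["amount"] is None:
        return None
    return (
        row["customer"].strip(),
        row["region"].strip().lower(),
        float(row["amount"]),
    )


def manage(*sources):
    """Data Management: extract, clean, integrate, and store the records."""
    # The set integrates all sources and drops exact duplicates
    # (a simplifying assumption; real deduplication needs business keys).
    cleaned = {clean(row) for source in sources for row in source}
    cleaned.discard(None)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sorted(cleaned))
    return conn


def report(conn):
    """Data Analysis: a simple revenue-per-region query."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())


def discover(totals):
    """Knowledge Discovery (greatly simplified): find the top region."""
    return max(totals, key=totals.get)


if __name__ == "__main__":
    connection = manage(crm_rows, web_rows)
    totals = report(connection)
    print(totals, discover(totals))
```

Even in this toy version, most of the code belongs to the Data Management step; the query and the discovery rule are short once the data is clean and integrated.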

The most important factor in successfully building a BI solution is performing well at the Data Management stage. Because Data Management is the foundation of a BI solution, it is usually the most stressful and time-consuming part. Nowadays, many companies offer their own solutions. However, their applications do not guarantee that all the information needed in the decision-making process will be available. Rather than focusing on the information needed to build good solutions, most of these providers focus on the technological aspects. Such behavior does not satisfy real business needs and does nothing to address the lack of alignment between the business and technological domains.

Informally, big data is defined as data that exceeds the analytics and storage capabilities of standard data processing tools such as database management systems. Formally, big data is characterized by a triple "V": volume, velocity, and variety. Volume refers to the processing limitations that come from the data's huge size. Velocity reflects that the speed of data input is also crucial, because data is generated and inserted into storage at high speed. Variety means that data comes from different heterogeneous sources (social networks, sensors, transactional data, etc.).

Although big data is something of a buzzword, businesses cannot ignore it without losing competitiveness in the market. Datameer Inc. (2013) reported that the major goals for companies implementing big data are: (1) increase revenue, (2) decrease costs, and (3) increase productivity.

Data and the extraction of knowledge from it are two different things, but they cannot be separated. Once data is stored, proper analytical methods must be applied to get value out of it. There are two main ways to implement analytics over data: SQL and MapReduce. SQL has proved its applicability through a long and robust history of usage (more than 40 years). Although MapReduce appeared less than a decade ago, it is already one of the most popular programming models for complex analysis over huge volumes of structured and unstructured data. Multiple studies state that SQL was not designed for current needs, and that new models and approaches, such as MapReduce, should deal with the analytical challenges posed by the era of big data. However, the works "A comparison of approaches to large-scale data analysis" and "A performance comparison of parallel DBMSs and MapReduce on large-scale text analytics" showed that database management systems with SQL on board were significantly faster and required less code to implement information extraction and analytical tasks. On the other hand, database tuning and data loading took more time than with MapReduce.
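To make the contrast between the two styles concrete, the following minimal Python sketch computes the same word count twice: once declaratively with SQL (using the standard-library `sqlite3` module) and once with a single-process imitation of the map, shuffle, and reduce phases. It is an illustration only, not a reproduction of the cited benchmarks: the sample documents and all function names are hypothetical, and a real MapReduce job would run distributed on a framework such as Hadoop.

```python
import sqlite3
from collections import defaultdict

# Hypothetical documents, invented for illustration.
docs = ["big data big value", "data needs analysis", "big analysis"]


def count_with_sql(documents):
    """Declarative route: load words into a table, then aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT)")
    # Loading step: the data must be in the database before it can be queried.
    conn.executemany(
        "INSERT INTO words VALUES (?)",
        [(word,) for doc in documents for word in doc.split()],
    )
    rows = conn.execute("SELECT word, COUNT(*) FROM words GROUP BY word")
    result = dict(rows.fetchall())
    conn.close()
    return result


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single count."""
    return key, sum(values)


def count_with_mapreduce(documents):
    """Procedural route: run map, shuffle, and reduce in sequence."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_with_sql(docs) == count_with_mapreduce(docs))
```

The sketch reflects the trade-off described above only qualitatively: the SQL version needs a loading step before any query can run, but the analysis itself is a single statement, whereas the MapReduce version works directly on the raw input at the cost of more hand-written code.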

As mentioned above, the major goals of adopting big data are: (1) increase revenue, (2) decrease costs, and (3) increase productivity. These three goals are very desirable for any BI project. In the typical architecture of a BI system, it is very common to have a data warehouse holding all the information needed, or even several data marts joined together to form the data warehouse.