Data Warehouses and Data Mining
BUS206: Management Information Systems
Friday, February 23, 2024, 9:26 AM
This article gives a detailed summary of the role of data warehouses and data mining, and their relationship to organizational databases. As you read, pay attention to how data warehouses are used to improve decision-making in organizations. Keep a summary in your notes of how an organization you are involved with could benefit from data mining and data warehousing.
Table of contents
- Data Warehouse
- Applications of Data Warehouses
- Types of Data Warehouses
- Example: Facebook
- Big Data
- Why Is Big Data Important?
- Big Data in Today's World
- Data Mining
- Data Mining Life Cycle
- How does Data Mining Work?
- Elements of Data Mining
- Data Mining Applications
- Web Mining
- Spatial Data Mining
- Where is Spatial Data Mining Used?
Introduction
Databases play a critical role in almost all areas where computers are used, including education, libraries, science, medicine, business, law, and engineering. As storage capacity and computing speed have increased, computers now handle big data using a variety of techniques. Data is a valuable asset for an organization. Data mining is the process of finding patterns in a given data set; it is sometimes called data discovery or knowledge discovery. Today, data mining is used for fraud detection, as an aid in marketing campaigns, and even by supermarkets. A data warehouse provides generalized and consolidated data in a multidimensional view. Along with this view, data warehouses also supply online analytical processing (OLAP) tools. Data in a data warehouse is not updated frequently. Data cleaning and data transformation are important steps in improving the quality of the data and of data mining results.
Source: Parul Mittal
This work is licensed under a Creative Commons Attribution 4.0 License.
Data Warehouse
The term "data warehouse" was coined by Bill Inmon in 1990. According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts make informed decisions in an organization.
- A data warehouse is a database that is kept separate from the organization's operational database.
- It holds consolidated historical data, which helps the organization analyze its business.
- It helps integrate a diversity of application systems.
Why is a data warehouse kept separate from operational databases?
- An operational database is built for well-known tasks and workloads, such as searching for particular records and indexing.
- An operational database query allows both read and modify operations.
- An operational database maintains current data.
A data warehouse, by contrast, has the following features:
- It is subject-oriented: it provides information organized around a subject.
- Its data is integrated from heterogeneous sources.
- It is time-variant: the data collected is identified with a particular time period.
- It is non-volatile: previous data is not erased when new data is added.
- It does not require transaction processing, recovery, or concurrency controls.
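The contrast between an operational database (current data, updated in place) and a warehouse (time-stamped, non-volatile history) can be illustrated with a minimal Python sketch. The customer record, the `snapshot` helper, and the dates are hypothetical and chosen only for illustration; a real warehouse would be loaded by dedicated tooling rather than in-memory structures.

```python
from datetime import date

# Operational store: one current row per customer, updated in place.
operational = {"C1": {"city": "Delhi"}}

# Warehouse: append-only list of dated snapshots (time-variant, non-volatile).
warehouse = []

def snapshot(as_of):
    """Copy the current operational state into the warehouse, stamped with a date."""
    for customer_id, row in operational.items():
        warehouse.append(
            {"customer_id": customer_id, "city": row["city"], "as_of": as_of}
        )

snapshot(date(2024, 1, 31))
operational["C1"]["city"] = "Mumbai"  # the operational update overwrites the old value
snapshot(date(2024, 2, 29))

# The warehouse still holds both periods; the operational store only the latest.
history = [
    (r["as_of"].isoformat(), r["city"])
    for r in warehouse
    if r["customer_id"] == "C1"
]
print(history)
```

After the update, the operational store knows only the current city, while the warehouse can still answer questions about the earlier period.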
Applications of Data Warehouses
A data warehouse helps business executives to organize, analyze and use their data for decision making. Data warehouses are widely used in the following fields:
- Financial services: These are the economic services provided by the finance industry, which encompasses a broad range of businesses that manage money, as well as some government-sponsored enterprises.
- Banking services: Banks combine global capabilities with local knowledge to provide products and services that meet the needs of their customers and clients.
- Consumer goods: Consumer goods are goods that are ultimately consumed rather than used in the production of another good. For example, a microwave oven or a bicycle which is sold to a consumer is a final good or consumer good, whereas the components which are sold to be used in those goods are called intermediate goods.
- Retail sectors: Retail is the process of selling consumer goods and/or services to customers through multiple channels of distribution to earn a profit.
- Controlled manufacturing: Manufacturing carried out under strict quality control, such as the precision machining of complex components and assemblies using Six Sigma methodology and lean manufacturing principles.
Types of Data Warehouses
There are three types of data warehouse applications:
- Information processing: The data can be accessed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.
- Analytical processing: The data can be analyzed by means of basic OLAP operations, including slice and dice, drill down, drill up, and pivoting.
- Data mining: Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction.
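The basic OLAP operations named above (slice, dice, drill down, drill up, and pivot) can be sketched in plain Python. The `sales` fact table, its dimension names, and the `aggregate` helper are hypothetical; real OLAP tools run these operations over a multidimensional store, not a list of dictionaries.

```python
from collections import defaultdict

# Hypothetical fact table: one row per sale, with four dimensions and one measure.
sales = [
    {"year": 2023, "quarter": "Q1", "region": "North", "product": "Bike", "amount": 100},
    {"year": 2023, "quarter": "Q2", "region": "North", "product": "Oven", "amount": 150},
    {"year": 2023, "quarter": "Q1", "region": "South", "product": "Bike", "amount": 80},
    {"year": 2024, "quarter": "Q1", "region": "North", "product": "Bike", "amount": 120},
    {"year": 2024, "quarter": "Q1", "region": "South", "product": "Oven", "amount": 90},
]

def aggregate(rows, dims):
    """Sum the 'amount' measure grouped by the given dimension names."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return dict(totals)

# Slice: fix one dimension to a single value.
slice_2023 = [r for r in sales if r["year"] == 2023]

# Dice: restrict two or more dimensions at once.
dice = [r for r in sales if r["region"] == "North" and r["product"] == "Bike"]

# Drill up (roll up): aggregate at the coarser year level.
by_year = aggregate(sales, ["year"])

# Drill down: break the same totals out at the finer quarter level.
by_year_quarter = aggregate(sales, ["year", "quarter"])

# Pivot: regions as rows, products as columns.
pivot = defaultdict(dict)
for (region, product), total in aggregate(sales, ["region", "product"]).items():
    pivot[region][product] = total

print(by_year)
print(by_year_quarter)
print(dict(pivot))
```

Drill up and drill down differ only in the level of the time hierarchy passed to `aggregate`, which is why they are usually presented as a pair.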
Example: Facebook
An example of data warehousing that everyone can relate to is what Facebook does. Facebook gathers all of your data (your friends, your likes, and so on) and stores it in one central repository. It does so for many reasons: it wants to make sure that you see the most relevant ads, the ones you are most likely to click on, and that every friend it suggests is relevant to you. This second step is the data mining phase, in which meaningful data and patterns are extracted from the aggregated data.
Big Data
While the term "big data" is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, search, sharing, storage, transfer, visualization, querying, and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction, and reduced risk. Big data is commonly defined by the three Vs:
- Volume: Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data.
- Velocity: Data streams in at an unprecedented speed and must be dealt with in a timely manner.
- Variety: Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data, and financial transactions.
Why Is Big Data Important?
The importance of big data doesn't revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable:
- Cost reductions
- Time reductions
- New product development and optimized offerings
- Smart decision making
When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:
- Determining root causes of failures, issues, and defects in nearly real time.
- Generating coupons at the point of sale based on the customer's buying habits.
- Recalculating entire risk portfolios in minutes.
- Detecting fraudulent behavior before it affects your organization.
Big Data in Today's World
Big data and the way organizations manage and derive insight from it are changing the way the world uses business information.
Visa: Like most credit card companies, Visa faced the challenge of combating fraudulent activity while providing seamless service for its customers, two tasks that don't always go hand in hand.
The book Big Data, Data Mining, and Machine Learning shows how organizations can harness the power of high-performance computing architectures together with data mining, text analytics, and machine learning algorithms. Written for corporate leaders and for technology and marketing executives, it offers a detailed review of how big data analytics can be used to gain an edge on the competition and increase the bottom line.
Data Mining
Organizations worldwide generate large amounts of data, most of it unorganized, which poses many challenges for data mining systems. Data mining is the technique of extracting meaningful information from large and mostly unorganized data banks: the automated extraction of predictive information by sorting through data to identify patterns and establish relationships. This extraction of meaningful information is also known as knowledge discovery, although views differ on whether the two terms should be used interchangeably. More formally, data mining is the computational process of discovering patterns in big data, involving methods at the intersection of data science, artificial intelligence, and database systems. A number of data mining systems are available.
Data mining analyzes data from different perspectives and summarizes it into useful information that can be used to increase revenue, cut costs, or both. People use vast amounts of data every day, in many different fields and in many forms: audio, images, text, video, and so on. Application areas of data mining include the retail industry and financial data analysis.
Many multinational companies and other organizations operate in different places across different countries, and every place of operation may generate big data. Data at this scale, measured in terabytes, has drastically changed data science and database systems. Analyzing this volume of data and making decisions based on it requires the techniques and methods collectively called data mining.
Data Mining Life Cycle
Data mining operations require a systematic approach. The sequence of the phases is not strict and moving back and forth between different phases is always required. The general phases in the data mining process to extract knowledge are:
- Problem definition: This phase is to understand the problem and the domain environment in which the problem occurs.
- Creating a database for data mining: This phase is to create a database where the data to be mined are stored for knowledge acquisition.
- Exploring the database: This phase is to select and examine important data sets of a data mining database in order to determine their feasibility to solve the problem.
- Preparation for creating a data mining model: This phase is to select variables to act as predictors.
- Building a data mining model: This phase is to create multiple data mining models and to select the best of these models.
- Evaluating the data mining model: This phase is to evaluate the accuracy of the selected data mining model.
- Deploying the data mining model: This phase is to deploy the built and the evaluated data mining model in the external working environment.
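The later phases of this life cycle (preparing data, building several models, selecting the best one, evaluating it, and deploying it) can be sketched with a toy example. The loan records, the single income predictor, and the threshold-rule "models" are all hypothetical simplifications; real projects would use far more data and proper modeling libraries.

```python
# Hypothetical records: (monthly_income_in_thousands, repaid_loan)
records = [
    (20, False), (25, False), (30, False), (35, True), (40, True),
    (45, True), (28, False), (50, True), (33, True), (22, False),
]

# Preparation: income is the only predictor; hold out the last records for evaluation.
train, holdout = records[:7], records[7:]

def make_model(threshold):
    """A candidate model: predict repayment when income is at least `threshold`."""
    return lambda income: income >= threshold

def accuracy(model, data):
    """Fraction of records for which the model's prediction matches the outcome."""
    return sum(model(x) == y for x, y in data) / len(data)

# Building: create several candidate models and select the best on the training data.
candidates = {t: make_model(t) for t in (25, 32, 42)}
best_threshold = max(candidates, key=lambda t: accuracy(candidates[t], train))

# Evaluating: measure the selected model on data it has not seen.
holdout_accuracy = accuracy(candidates[best_threshold], holdout)

# Deploying: expose the selected model for use on new cases.
predict = candidates[best_threshold]
print(best_threshold, holdout_accuracy, predict(38))
```

If the holdout accuracy were poor, the process would loop back to an earlier phase, which is the back-and-forth movement the text describes.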
How does Data Mining Work?
Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, four types of relationships are sought:
- Classes: stored data is used to locate data in predetermined groups.
- Clusters: data items are grouped according to logical relationships or consumer preferences.
- Associations: data can be mined to identify associations.
- Sequential patterns: data is mined to anticipate behavior patterns and trends.
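As one illustration of the "associations" relationship, the sketch below computes the support and confidence of simple one-item rules over a handful of market baskets. The transactions and the two thresholds are hypothetical, and the exhaustive pair search is only practical for tiny inputs; production systems use algorithms such as Apriori or FP-growth.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(itemset <= t for t in transactions)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

MIN_SUPPORT = Fraction(1, 2)       # assumed threshold for illustration
MIN_CONFIDENCE = Fraction(9, 10)   # assumed threshold for illustration

items = sorted(set().union(*transactions))
rules = [
    (lhs, rhs)
    for lhs, rhs in permutations(items, 2)
    if support({lhs, rhs}) >= MIN_SUPPORT
    and confidence({lhs}, {rhs}) >= MIN_CONFIDENCE
]
print(rules)
```

Exact fractions are used instead of floats so that comparisons against the thresholds are not affected by rounding.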
Elements of Data Mining
Data mining consists of five major elements:
- Extract, transform, and load transaction data onto the data warehouse system.
- Store and manage the data in a multidimensional database system.
- Provide data access to business analysts.
- Analyze the data by application software.
- Present the data in a useful format.
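The five elements above can be traced end to end in a small sketch. Here an in-memory SQLite database stands in for the warehouse (the text describes a multidimensional database system, so this is a simplification), and the raw records, the `transform` function, and the table layout are hypothetical.

```python
import sqlite3

# Hypothetical raw transaction records as they might arrive from a point-of-sale system.
raw = [
    {"date": "2024-01-05", "store": " north ", "amount": "19.99"},
    {"date": "2024-01-05", "store": "South", "amount": "5.00"},
    {"date": "2024-01-06", "store": "NORTH", "amount": "30.01"},
    {"date": "2024-01-06", "store": "South", "amount": ""},  # missing amount
]

def transform(record):
    """Clean one record: drop rows without an amount, normalize store names,
    and convert the amount to integer cents."""
    if not record["amount"]:
        return None
    cents = round(float(record["amount"]) * 100)
    return (record["date"], record["store"].strip().title(), cents)

# 1. Extract, transform, and load.
rows = [t for t in map(transform, raw) if t is not None]

# 2. Store and manage the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, store TEXT, amount_cents INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# 3 and 4. Provide access and analyze: total sales per store.
totals = conn.execute(
    "SELECT store, SUM(amount_cents) FROM sales GROUP BY store ORDER BY store"
).fetchall()

# 5. Present the result in a readable format.
for store, cents in totals:
    print(f"{store}: {cents / 100:.2f}")
```

Storing amounts as integer cents is a design choice made here to keep the totals exact; it is not something the text prescribes.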
Data Mining Applications
Data mining has been successfully applied in many areas, such as:
- Financial Data Analysis: The financial data in the banking and financial industry is generally reliable and of high quality, which facilitates systematic data analysis and data mining. Some of the typical cases are loan payment prediction and customer credit policy analysis; classification and clustering of customers for targeted marketing; and detection of money laundering and other financial crimes.
- Retail Industry: Data mining in the retail industry helps identify customer buying patterns and trends, leading to improved quality of customer service and better customer retention and satisfaction. Typical applications include the design and construction of data warehouses based on the benefits of data mining; multidimensional analysis of sales, customers, products, time, and region; and customer retention.
- Telecommunication Industry: Data mining in the telecommunication industry helps identify telecommunication patterns, catch fraudulent activities, make better use of resources, and improve the quality of service. Typical applications include multidimensional analysis of telecommunication data; fraudulent pattern analysis; identification of unusual patterns; and mobile telecommunication services.
- Biological Data Analysis: The following are aspects in which data mining contributes to biological data analysis: semantic integration of heterogeneous, distributed genomic, and proteomic databases; discovery of structural patterns; association and path analysis; and visualization tools in genetic data analysis.
- Other Scientific Applications: Large numbers of data sets are being generated by fast numerical simulations in fields such as climate and ecosystem modeling, chemical engineering, and fluid dynamics. Relevant techniques include data warehouses and data preprocessing; graph-based mining; and visualization and domain-specific knowledge.