A Review of Data Mining Techniques and Trends

Data mining will occupy an increasingly important position as the world moves from solving issues related to collecting data to generating information from large masses of data that are now easily gathered. This paper emphasizes that many industries depend on insights gathered from data, and thus naturally, data mining will become a central focus. We are now moving into an era where pattern recognition and prediction are common. What patterns do you recognize? Are you able to glean some insights into how you are learning?

Abstract

Terabytes of data are generated every day in many organizations, and at this scale it is difficult to analyze the data and make predictions from it manually. Because data volumes are growing day by day, new tools and techniques are needed to support humans in automatically and intelligently analyzing large data repositories to obtain useful information. These growing needs have given rise to a new research field called Data Mining (DM) or Knowledge Discovery in Databases (KDD). DM aims to extract implicit, previously unknown, and potentially useful information from data by digging intelligently in large data repositories. In other words, DM techniques are used to extract previously unknown predictive information from large masses of data. Nowadays data mining benefits many fields of human life, including business, education, agriculture, medicine, and science, by drawing on artificial intelligence, statistics, computational capabilities, pattern recognition, machine learning, and data visualization techniques. DM has thus become an essential component in various fields of human life. This paper describes DM and major DM techniques such as statistics, artificial intelligence, the decision tree approach, genetic algorithms, and visualization.

KEYWORDS: Knowledge Discovery in Databases (KDD), Data Mining, Trends, Association, Classification, Clustering, Prediction, Pattern Recognition.


Introduction

Data and information play a vital role in human activities. The knowledge discovery process is as old as Homo sapiens. Until some time ago, this process relied solely on the 'natural personal' computer provided by Mother Nature. Fortunately, in recent decades the problem has begun to be solved through the development of data mining technology, aided by the huge computational power of 'artificial' computers. DM is an active research area, and research is ongoing to bring statistical analysis and artificial intelligence (AI) techniques together to address these issues. DM is the search for valuable information in large volumes of data. It is the nontrivial extraction of implicit, previously unknown, and potentially useful information, such as knowledge rules, constraints, and regularities, from data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.


Data Mining

Data mining is an essential step in the knowledge discovery in databases (KDD) process that produces useful patterns or models from data. The terms KDD and data mining are not synonymous. KDD refers to the overall process of discovering useful knowledge from data. Data mining refers to discovering new patterns from a wealth of data in databases by focusing on the algorithms that extract useful knowledge. In general, there are three main steps in DM: preparing the data, reducing the data, and, finally, looking for valuable information. The specific approaches, however, differ from company to company and from researcher to researcher. Fayyad et al. proposed the following steps:

  1. Retrieving the data from a large database.
  2. Selecting the relevant subset to work with.
  3. Deciding on the appropriate sampling system, cleaning the data, and dealing with missing fields and records.
  4. Applying the appropriate transformations, dimensionality reduction, and projections.
  5. Fitting models to the preprocessed data.
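
The steps listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records, the field names, mean imputation for missing values, min-max scaling as the transformation, and a plain mean as the "model". It is not the procedure of Fayyad et al., only one possible reading of the five steps.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing field.
RAW = [
    {"id": 1, "age": 25, "income": 40000, "region": "north"},
    {"id": 2, "age": None, "income": 52000, "region": "south"},
    {"id": 3, "age": 41, "income": None, "region": "north"},
    {"id": 4, "age": 35, "income": 61000, "region": "north"},
]

def select(records, region):
    """Steps 1-2: retrieve the records and keep the relevant subset."""
    return [r for r in records if r["region"] == region]

def clean(records, fields):
    """Step 3: fill each missing field with the mean of the observed values."""
    cleaned = [dict(r) for r in records]
    for f in fields:
        fill = mean(r[f] for r in cleaned if r[f] is not None)
        for r in cleaned:
            if r[f] is None:
                r[f] = fill
    return cleaned

def transform(records, fields):
    """Step 4: rescale each field to the range [0, 1] (min-max scaling)."""
    out = [dict(r) for r in records]
    for f in fields:
        lo = min(r[f] for r in records)
        hi = max(r[f] for r in records)
        span = (hi - lo) or 1  # avoid division by zero for constant fields
        for r in out:
            r[f] = (r[f] - lo) / span
    return out

def fit(records, field):
    """Step 5: fit a trivial stand-in model, here the mean of one field."""
    return mean(r[field] for r in records)

subset = select(RAW, "north")
prepared = transform(clean(subset, ["age", "income"]), ["age", "income"])
model = fit(prepared, "age")
print(prepared, model)
```

In practice each function would be replaced by a far more capable component, but the order of the calls mirrors the order of the steps.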


Data Mining Techniques

Several major data mining techniques have been developed and are used in data mining projects, including association, classification, clustering, prediction, and pattern recognition.


Association Rules

Association rule mining is one of the best-known data mining techniques. An association rule expresses a relationship among a set of objects in a database. Association rule mining generally begins with the generation of frequent itemsets. Retailers now use association techniques to study customers' buying habits in order to provide better service and increase sales.
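
As a hedged illustration of frequent itemsets and association rules, the Python sketch below counts itemset support by brute force and computes the confidence of one rule. The transactions, the threshold, and the function names are hypothetical, and the exhaustive enumeration stands in for more efficient algorithms; it is only practical for tiny examples.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets whose support meets the threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

frequent = frequent_itemsets(transactions, min_support=0.4)
rule_conf = confidence({"bread"}, {"milk"}, transactions)
print(frequent)
print("confidence(bread -> milk) =", rule_conf)
```

A retailer could read the rule "bread -> milk" as: of the baskets that contain bread, this fraction also contain milk.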


Classification

Classification is a form of data analysis that can be used to extract models describing important data classes or to predict future data trends. Classification predicts discrete, unordered labels. The technique is mainly used to assign a data item to one of several predefined classes or groups. Any prediction can be thought of as classification or estimation. The main types of classification methods are:


Classification by decision tree induction

Bayesian Classification

Bayesian classification uses a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG).


Neural Networks

An artificial neural network (ANN), usually called a "neural network" (NN), is a learning algorithm inspired by the structure and/or functional aspects of biological neural networks.
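
To make the idea concrete, here is a minimal sketch of a single artificial neuron trained with the classic perceptron learning rule. It is an assumption-laden toy rather than a full neural network: the logical AND data set, the learning rate, and the number of epochs are all chosen only for illustration.

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Train one artificial neuron with the perceptron learning rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            activation = weights[0] * x1 + weights[1] * x2 + bias
            output = 1 if activation > 0 else 0
            error = target - output
            # Nudge the weights toward the target whenever the output is wrong.
            weights[0] += lr * error * x1
            weights[1] += lr * error * x2
            bias += lr * error
    return weights, bias

def predict(weights, bias, x1, x2):
    """Fire (return 1) when the weighted sum of the inputs is positive."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0

# Hypothetical training set: the logical AND of two binary inputs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
for (x1, x2), target in AND:
    print((x1, x2), "->", predict(weights, bias, x1, x2), "expected", target)
```

Real networks stack many such units in layers and use different training procedures, but the weighted sum followed by a threshold is the basic building block.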


Support Vector Machines (SVM)

Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. An SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.


Classification Based on Associations


Prediction

Regression techniques can be adapted for prediction in data mining. Regression analysis can be used to model the relationship between one or more independent variables and a dependent variable. In prediction, records are classified according to some predicted future behavior or projected future value. In data mining, the independent variables are attributes that are already known, and the response (dependent) variables are what we want to predict. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed.


Types of regression methods

  • Linear Regression
  • Multivariate Linear Regression
  • Nonlinear Regression
  • Multivariate Nonlinear Regression
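
Linear regression, the first method in the list above, can be sketched in a few lines of Python using the ordinary least-squares formulas for one independent variable. The data points are hypothetical and were chosen to lie exactly on the line y = 2x + 1, so that the fitted coefficients are easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: independent variable xs, dependent variable ys.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
predicted = slope * 6 + intercept  # predict the response for an unseen x = 6
print(slope, intercept, predicted)
```

The slope answers the question posed in the text: how much the typical value of the dependent variable changes when the independent variable increases by one unit.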


Clustering

Clustering groups objects into meaningful clusters, that is, it identifies subgroups of similar objects. Using a clustering technique, we find the similarities shared by the members of a cluster and label the cluster with a meaningful name. The records are grouped together on the basis of self-similarity. Many clustering algorithms have been developed, and they are commonly categorized as partitioning methods, hierarchical (agglomerative and divisive) methods, density-based methods, grid-based methods, and model-based methods.

General types of clusters are well-separated clusters, center-based clusters, contiguous clusters, and density-based clusters.
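
As one example of a partitioning method that produces center-based clusters, the sketch below implements a bare-bones k-means loop. The points, the starting centroids, and the fixed number of iterations are assumptions made for illustration; production implementations choose starting centroids more carefully and stop when the assignments no longer change.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster happens to be empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    # The returned clusters come from the last assignment step.
    return centroids, clusters

# Hypothetical two-dimensional records forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

Each record ends up in the cluster whose center it is closest to, which is the sense in which records are grouped by self-similarity.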


Pattern Recognition

There are many applications of machine learning in pattern recognition. One is optical character recognition, which is recognizing character codes from their images. This is an example where there are multiple classes, as many as there are characters we would like to recognize. Especially interesting is the case when the characters are handwritten. People have different handwriting styles; characters may be written small or large, slanted, with a pen or pencil, and there are many possible images corresponding to the same character. In the case of face recognition, the input is an image, the classes are the people to be recognized, and the learning program should learn to associate the face images with identities. This problem is more difficult than optical character recognition because there are more classes, the input image is larger, and a face is three-dimensional, so differences in pose and lighting cause significant changes in the image.


Different Trends of Data Mining

The diversity of data, data mining tasks, and data mining approaches poses many challenging research issues in data mining. The development of efficient and effective data mining methods, systems, and techniques to solve large application problems is an important task for DM researchers and for developers of data mining systems and applications. The era of data mining applications was conceived in the year 1980, primarily through research-driven tools. A number of data mining trends, in terms of technologies and methodologies, are currently being developed and researched. The early data mining trends are described below.


Data Trends

Previously, data were stored in flat files and in traditional and relational databases, which use a tabular representation. Data mining algorithms worked best for numerical data collected from a single database, and various data mining techniques evolved for that setting. Later on, with the union of statistics and machine learning techniques, various algorithms evolved to mine non-numerical data and relational databases.


Computing Trends

Computing is a necessity of the modern world. The field of data mining has been greatly influenced by fourth-generation programming languages and related computing techniques. In the early era of data mining, most algorithms relied only on statistical techniques. Later on, they evolved to incorporate different computing techniques such as artificial intelligence, machine learning, and pattern recognition.


Current Trends

Nowadays DM has become very popular due to its tremendous success in terms of broad-ranging application achievements and scientific progress. The ever-increasing complexity of various fields and enhancements in technology have posed new challenges to the world of data mining. Some of the current trends that address these challenges are described below.


Distributed/Collective Data Mining (CDM)

In distributed data mining, the data are stored in different physical locations. The main goal of CDM is to effectively mine distributed data that are located in heterogeneous sites. CDM provides a better approach to vertically partitioned datasets, using the notion of orthonormal basis functions, and computes the basis coefficients to generate the global model of the data.


Hypertext and Hypermedia Data Mining

Hypertext and hypermedia data mining can be characterized as mining data which includes text, hyperlinks, text markups, and various other forms of hypermedia information. Some of the important data mining techniques used for hypertext and hypermedia data mining include classification (supervised learning), clustering (unsupervised learning), semi-structured learning, and social network analysis.


Constraint-Based Data Mining

This form of data mining incorporates the use of constraints that guide the process. Frequently this is combined with the benefits of multidimensional mining to add greater power to the process. There are several categories of constraints that can be used, each of which has its own characteristics and purpose.


Ubiquitous Data Mining

The advent of laptops, palmtops, cell phones, and wearable computers is making ubiquitous access to large masses of data possible. Ubiquitous computing environments are in turn giving rise to Ubiquitous Data Mining (UDM). UDM is the process of analyzing data from ubiquitous computing devices to extract useful knowledge. Accessing and examining data from a ubiquitous computing device may pose many challenges.


Multimedia Data Mining

Multimedia Data Mining is the mining and study of several types of data, including images, audio, video, and animation. Some of the DM techniques that are applied to multimedia data are rule-based decision tree classification algorithms, artificial neural networks, instance-based learning algorithms, and support vector machines, as well as association rule mining and clustering methods. It is a new field of research, but it holds much potential for the future.


Spatial Data Mining

Spatial data include astronomical data, natural resources data, satellite data, and spacecraft data. Some of the data mining techniques and data structures used when analyzing spatial and related types of data include spatial warehouses, spatial data cubes, spatial OLAP, and spatial clustering methods. Most of these data are image-oriented and can represent a great deal of information if properly analyzed and mined.


Time Series Data Mining

Time series data mining focuses on identifying movements or components within data sets such as stock prices, currency exchange rates, product sales volumes, biomedical measurements, and weather data (trend analysis). These can include long-term or trend movements, seasonal variations, cyclical variations, and random movements. Some rule induction algorithms, such as Version Space, AQ15, and C4.5 rules, are presently employed in time series data mining applications.
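
One simple way to expose a long-term trend movement is to smooth the series with a moving average. The technique is not named in the text above and is used here only as an illustrative assumption, as are the sales figures and the window length of three periods.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values to smooth the series."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Hypothetical sales volumes: a repeating three-period seasonal pattern
# superimposed on a slow upward trend.
sales = [10, 12, 14, 11, 13, 15, 12, 14, 16]

# A window equal to the seasonal period averages the seasonal swings away.
trend = moving_average(sales, window=3)
print(trend)
```

Because the window spans exactly one seasonal cycle, the smoothed values rise steadily even though the raw series moves up and down.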


Business Trends

Early data mining applications focused mainly on helping businesses gain a competitive edge. The exploration of data mining for businesses continues to expand as e-commerce and e-marketing have become mainstream elements of the retail industry. Today's businesses and industries must be more cost-effective, much faster, and offer higher-value services than ever before.

Because of customers' expectations and constraints, data mining has become a fundamental technology for supporting customers' transactions more accurately. Classification and prediction techniques are the ones most commonly used to support business decisions. These applications progressed to Decision Support Systems (DSS) and, very recently, to Business Intelligence (BI) systems.


Future Trends

Owing to the enormous success of its various application areas, data mining has been establishing itself as a major discipline of computer science and has shown interesting potential for future developments. Ever-advancing technology and new application areas continually pose new challenges and opportunities for data mining. The typical future trends of data mining include:

  • Standardization of data mining languages
  • Data preprocessing
  • Complex objects of data
  • Computing resources
  • Web mining
  • Scientific Computing
  • Business data


Conclusion

In closing, it would not be overly optimistic to say that DM will be one of the world's main focuses. Although improvements are continuously being made in the DM field, many issues remain to be resolved and much research has yet to be done. The capability to continually adapt and provide new insights is the principal benefit of DM and will be at the core of DM's bright and promising future. Having the right information at the right time is essential for making the right decision. The problem of collecting data, which used to be a major concern for most organizations, is almost resolved. In the new millennium, the world will compete in generating information from large masses of data rather than in collecting data.


Source: Shital H. Bhojani and Dr. Nirav Bhatt, https://www.worldwidejournals.com/global-journal-for-research-analysis-GJRA/fileview/May_2016_1464950444__86.pdf
This work is licensed under a Creative Commons Attribution 4.0 License.

Last modified: Wednesday, March 15, 2023, 1:44 PM