Introduction to Data Mining

Data mining involves various algorithms and techniques for database searching, inferring data relationships, pattern recognition, and pattern classification. Pattern recognition is the process of comparing a sample observation against a fixed set of patterns (like those stored in a database) to search for an optimal match. Face recognition, voice recognition, character recognition, fingerprint recognition, and text string matching are all examples of pattern searching and pattern recognition.
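
As a rough illustration of matching a sample against a fixed set of stored patterns, the sketch below picks the stored pattern with the smallest Euclidean distance to the sample. The pattern names, the three-element feature vectors, and the choice of Euclidean distance are assumptions made for this example only; real recognition systems use far richer features and matching criteria.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(sample, patterns):
    """Return (name, distance) for the stored pattern closest to the sample."""
    scored = ((name, euclidean(sample, vec)) for name, vec in patterns.items())
    return min(scored, key=lambda pair: pair[1])

# Hypothetical "database" of stored patterns (made-up feature vectors).
stored_patterns = {
    "pattern_a": [0.0, 0.0, 1.0],
    "pattern_b": [1.0, 1.0, 0.0],
    "pattern_c": [0.5, 0.2, 0.9],
}

sample = [0.4, 0.1, 1.0]  # the observation to be recognized
name, dist = best_match(sample, stored_patterns)
print(name, round(dist, 3))  # prints: pattern_c 0.173
```

Here "optimal" simply means "smallest distance"; swapping in a different distance function changes what counts as the best match.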

Going one step further, given a set of prescribed pattern classes, pattern classification is the process of associating a sample observation with one of those classes. For example, consider a database containing two classes of face images: happy faces and sad faces. Given an input face image whose class is unknown, pattern classification assigns it to whichever class, happy or sad, fits it best. As you will soon see, the optimal pattern match or pattern classification is often defined using probabilistic and statistical measures such as distances, deviations, and confidence intervals.
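
One simple way to make the happy/sad example concrete is a nearest-centroid rule: average the known examples of each class, then assign a new observation to the class whose average is closest. This is only a minimal sketch under stated assumptions; the two-number feature vectors (imagined here as mouth curvature and eye openness) and the training values are invented for illustration and are not taken from any real face dataset.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def classify(sample, training):
    """Assign the sample to the class whose centroid is nearest.

    Returns the chosen label and the distance to every class centroid.
    """
    centroids = {label: centroid(vecs) for label, vecs in training.items()}
    distances = {label: math.dist(sample, c) for label, c in centroids.items()}
    label = min(distances, key=distances.get)
    return label, distances

# Hypothetical labeled examples: [mouth curvature, eye openness].
training = {
    "happy": [[0.8, 0.6], [0.9, 0.7], [0.7, 0.5]],
    "sad": [[-0.6, 0.3], [-0.8, 0.2], [-0.7, 0.4]],
}

label, distances = classify([0.5, 0.55], training)
print(label)  # prints: happy
```

The distance to each centroid is returned alongside the label so that the margin between the two classes can be inspected, which hints at the statistical measures mentioned above.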

Outline of This Course - What Topics Will Follow?

In this course, we will cover the following topics:

  • What is Statistical Learning (Supervised Learning and Unsupervised Learning)
  • Data Splitting, Model Building, and Cross-validation Techniques
  • Linear Regression and Variable Selection
  • Biased Regression, Shrinkage Methods of Regression
  • Dimension Reduction Techniques and Regression Based on Them
  • Nonlinear Regression Techniques
  • Discriminant Analysis
  • Methods based on Decision Trees: CART, Bagging, Boosting, Random Forest
  • Support Vector Machines
  • Clustering Methods: Hierarchical Clustering, K-Means Clustering, kNN Method