7.2: Supervised Learning
A set, collection, or database of pattern or class data is generically referred to as "training data" because data mining requires a collection of known examples against which new input observations can be compared. For pattern classification, as mentioned in the previous section, there are two broad approaches to learning from examples: supervised and unsupervised. In supervised learning, each training example comes with a known label or class. This unit deals specifically with supervised learning techniques, while the next unit deals with unsupervised learning techniques. Review the basic steps of solving a supervised learning problem shown here. Assuming data has already been collected, you will come to understand and be able to implement this process as the unit progresses:
Training set → Feature selection → Training algorithm → Evaluate model
Our tool for implementing these steps will be the scikit-learn module.
- Feature selection (or "feature extraction") is the process of taking raw training data and defining features that represent important characteristics of that data. For example, consider an image recognition application. An image can contain millions of pixels, yet our eyes key in on specific features that allow our brains to recognize objects within it. Object edges are one such feature, since they define the shape of an object. The original image consisting of millions of pixels can therefore be reduced to a much smaller set of edges.
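To make the edge example concrete, here is a minimal sketch using NumPy. It assumes a small synthetic image (a bright square on a dark background) and a simple finite-difference gradient with an arbitrarily chosen threshold. It is only an illustration of reducing pixels to edge features, not a production edge detector.

```python
import numpy as np

# Synthetic 64x64 grayscale image (an assumption for illustration):
# dark background with a bright square in the middle.
image = np.zeros((64, 64), dtype=float)
image[16:48, 16:48] = 1.0

# Finite differences approximate the intensity change along each axis.
# Both arrays are cropped to the same 63x63 shape so they can be added.
grad_rows = np.abs(np.diff(image, axis=0))[:, :-1]
grad_cols = np.abs(np.diff(image, axis=1))[:-1, :]

# A pixel counts as an edge if the intensity changes sharply there.
# The threshold 0.5 is a hypothetical value chosen for this toy image.
edge_mask = (grad_rows + grad_cols) > 0.5
edge_coords = np.argwhere(edge_mask)  # (row, column) of each edge pixel

print("raw pixel values:", image.size)
print("edge pixels kept as features:", len(edge_coords))
```

The list of edge coordinates is far shorter than the full grid of pixel values, which is the kind of reduction the bullet above describes.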
Once a set of features is chosen, a model must be trained on those features and then evaluated. With this overview, you should have a high-level picture of how data mining works. The rest of this unit will introduce some practical techniques and their implementations using scikit-learn.
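The four steps outlined above (training set, feature selection, training algorithm, model evaluation) can be sketched in scikit-learn as follows. The dataset (iris), the feature selector (SelectKBest with k=2), and the classifier (k-nearest neighbors) are assumptions picked for brevity. Treat this as one possible instance of the process, not the method this unit prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 1: training set. Hold out part of the labeled data for evaluation.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Step 2: feature selection. Keep the 2 features that score highest
# on an ANOVA F-test (k=2 is an arbitrary choice for this sketch).
selector = SelectKBest(score_func=f_classif, k=2)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Step 3: training algorithm. Fit a classifier on the selected features.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train_sel, y_train)

# Step 4: evaluate the model on data it has not seen during training.
accuracy = accuracy_score(y_test, model.predict(X_test_sel))
print(f"test accuracy: {accuracy:.2f}")
```

Note that the selector is fitted on the training split only and then applied to the test split, so the held-out data does not influence which features are chosen.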
The scikit-learn module contains a broad set of methods for statistical analysis and basic machine learning. During the remainder of this unit and the next on unsupervised learning, we will introduce scikit-learn in the context of data mining applications. Use this section as an introduction to see how modules such as pandas can be used in conjunction with the scikit-learn module. Make sure to follow along with the programming examples. There is no substitute for learning by doing. As this course progresses, you will understand more deeply how to apply the methods used in this video.
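As a small illustration of pandas working alongside scikit-learn, the sketch below holds the data in a DataFrame and passes its columns directly to a scikit-learn estimator. The iris dataset, the "species" column name, and the decision tree classifier are assumptions made for this example. They are not taken from the video.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Build a DataFrame: one column per measured feature, plus the class label.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target  # "species" is a column name chosen here

# pandas makes it easy to inspect the data before modeling.
print(df.describe().loc[["mean", "std"]])

# scikit-learn estimators accept DataFrame columns as input.
X = df[iris.feature_names]
y = df["species"]
model = DecisionTreeClassifier(random_state=0)

# 5-fold cross-validation returns one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=5)
print("mean cross-validated accuracy:", scores.mean())
```

Keeping the data in a DataFrame preserves the feature names, which helps when exploring or filtering columns before training.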