7.3: Principal Component Analysis
As part of the feature optimization process, when faced with a large set of features, a major goal is to determine a combination or mixture of features that leads to optimal model evaluations. You have already seen the subset selection approach, which you can use to reduce the number of features used to describe training set observations. In the language of vectors and matrices, this means we can reduce the dimension of a feature vector if a subset of features is found to give optimal results. Reducing the feature vector dimension is preferable because it directly translates into reduced training time for a given model. Additionally, higher-dimensional spaces impede the ability to define meaningful distances, as all points in the space begin to appear nearly equidistant from one another.
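The distance concentration effect mentioned above can be illustrated with a small simulation. The sketch below (an illustrative assumption, not an example from the text) draws random points in the unit hypercube and compares the relative spread of pairwise distances as the dimension grows; as the spread shrinks, "near" and "far" neighbors become harder to tell apart.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Illustrative sketch: as dimension grows, pairwise distances between
# random points concentrate around their mean, so all points start to
# look roughly equidistant from one another.
for d in (2, 100, 10_000):
    points = rng.random((100, d))            # 100 random points in [0, 1]^d
    dists = pdist(points)                    # all unique pairwise distances
    # Relative spread (coefficient of variation) shrinks with dimension.
    print(f"dim={d:>6}  relative spread = {dists.std() / dists.mean():.3f}")
```

Running this shows the relative spread dropping by orders of magnitude between 2 and 10,000 dimensions, which is why dimensionality reduction matters for distance-based methods.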
Many approaches exist for reducing the dimension of feature vectors while still optimizing model evaluations. The subset selection approach is very useful and regularly applied. However, it may not reveal underlying relationships between the features or explain why certain features work well together while others do not. To do this, it is necessary to develop algorithms and concrete recipes for mixing the most relevant features. Principal Component Analysis (PCA) is arguably one of the most popular methodologies for achieving this goal.
In this section, you will learn how to implement and apply PCA for feature optimization and dimensionality reduction using scikit-learn.
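As a first taste of the scikit-learn workflow, the minimal sketch below reduces the four-dimensional iris feature vectors to two principal components. The dataset choice and `n_components=2` are illustrative assumptions, not values prescribed by the text.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load a small, well-known dataset: 150 observations, 4 features each.
X = load_iris().data                      # shape (150, 4)

# Fit PCA and project the data onto the top 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)          # shape (150, 2)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)      # fraction of variance per component
```

The `explained_variance_ratio_` attribute reports how much of the original variance each retained component captures, which is the usual guide for choosing how far the feature vector can be shrunk.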