Many approaches exist for reducing the dimension of feature vectors while preserving model performance. The subset selection approach is useful and widely applied. However, it may not reveal underlying relationships between the features or explain why certain features work well together while others do not. To do this, it is necessary to develop algorithms that compute recipes for combining the most relevant features. Principal Component Analysis (PCA) is arguably one of the most popular methodologies for achieving this goal.
Principal Components Analysis (PCA)
Objective
Capture the intrinsic variability in the data.
Reduce the dimensionality of a data set, either to ease interpretation or as a way to avoid overfitting and to prepare for subsequent analysis.
The sample covariance matrix of $\mathbf{X}$ is $\mathbf{S} = \mathbf{X}^\top \mathbf{X} / N$, since $\mathbf{X}$ has zero mean.
The eigenvectors of $\mathbf{X}^\top \mathbf{X}$ (i.e., $v_1, v_2, \ldots, v_p$) are called the principal component directions of $\mathbf{X}$.
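As a concrete illustration of these definitions, the following sketch (using NumPy on a small synthetic, centered data matrix; the data, dimensions, and variable names are assumptions of this example, not part of the text) forms $\mathbf{S} = \mathbf{X}^\top \mathbf{X} / N$ and extracts the principal component directions as its eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Small synthetic data set (illustrative only): N observations, p features.
N, p = 500, 4
X = rng.normal(size=(N, p)) @ rng.normal(size=(p, p))
X = X - X.mean(axis=0)                  # center the columns so X has zero mean

# Sample covariance matrix S = X^T X / N.
S = X.T @ X / N

# Eigenvectors of X^T X (equivalently of S): the principal component directions.
eigvals, V = np.linalg.eigh(X.T @ X)    # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # reorder so that d_1^2 >= d_2^2 >= ...
d2, V = eigvals[order], V[:, order]

print("eigenvalues d_k^2 of X^T X:", d2)
print("principal component directions (columns):")
print(V)
```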
The first principal component direction $v_1$ has the following properties:
$v_1$ is the eigenvector associated with the largest eigenvalue, $d_1^2$, of $\mathbf{X}^\top \mathbf{X}$.
$z_1 = \mathbf{X} v_1$ has the largest sample variance amongst all normalized linear combinations of the columns of $\mathbf{X}$.
$z_1$ is called the first principal component of $\mathbf{X}$. And, we have $\operatorname{Var}(z_1) = d_1^2 / N$.
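These properties can be checked numerically. The sketch below (again NumPy with synthetic data; all names and numbers are illustrative assumptions) computes $v_1$ as the top eigenvector of $\mathbf{X}^\top \mathbf{X}$, forms $z_1 = \mathbf{X} v_1$, verifies that its sample variance equals $d_1^2 / N$, and compares it against random normalized linear combinations of the columns of $\mathbf{X}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic centered data matrix (illustrative only).
N, p = 500, 4
X = rng.normal(size=(N, p)) @ rng.normal(size=(p, p))
X = X - X.mean(axis=0)

# First principal component direction: top eigenvector of X^T X.
eigvals, V = np.linalg.eigh(X.T @ X)
v1, d1_sq = V[:, -1], eigvals[-1]       # eigh sorts eigenvalues in ascending order

# First principal component and its sample variance.
z1 = X @ v1
print("Var(z_1)  =", z1.var())          # biased sample variance, i.e. divide by N
print("d_1^2 / N =", d1_sq / N)         # the two values agree

# No other normalized linear combination of the columns of X has larger variance.
for _ in range(5):
    a = rng.normal(size=p)
    a /= np.linalg.norm(a)              # normalize to unit length
    assert (X @ a).var() <= z1.var() + 1e-12
```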
The second principal component direction $v_2$ (the direction orthogonal to the first component that has the largest projected variance) is the eigenvector corresponding to the second largest eigenvalue, $d_2^2$, of $\mathbf{X}^\top \mathbf{X}$, and so on. (The eigenvector for the $k$-th largest eigenvalue $d_k^2$ corresponds to the $k$-th principal component direction $v_k$.)
The $k$-th principal component of $\mathbf{X}$, $z_k = \mathbf{X} v_k$, has maximum variance $d_k^2 / N$, subject to being orthogonal to the earlier ones.
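In practice all components are usually obtained at once, for example from the singular value decomposition $\mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}^\top$, whose right singular vectors coincide with the directions $v_k$. The sketch below (NumPy, synthetic data; an illustrative assumption rather than the text's own recipe) checks that the component variances equal $d_k^2 / N$ and are non-increasing, and that the components are mutually orthogonal:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic centered data matrix (illustrative only).
N, p = 500, 4
X = rng.normal(size=(N, p)) @ rng.normal(size=(p, p))
X = X - X.mean(axis=0)

# SVD: X = U D V^T.  The columns of V are the principal component directions,
# and the singular values d_k give the component variances d_k^2 / N.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

Z = X @ Vt.T                            # principal components z_1, ..., z_p
variances = Z.var(axis=0)               # sample variance of each component

print("component variances:", variances)    # non-increasing
print("d_k^2 / N          :", d**2 / N)     # matches the line above

# The components are mutually orthogonal (uncorrelated, since X is centered).
gram = Z.T @ Z
off_diag = gram - np.diag(np.diag(gram))
print("max |z_j^T z_k| for j != k:", np.abs(off_diag).max())
```

Working from the SVD avoids forming $\mathbf{X}^\top \mathbf{X}$ explicitly, which is numerically preferable when $p$ is large or the columns of $\mathbf{X}$ are nearly collinear.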