Principal Components Analysis (PCA)
Objective
Capture the intrinsic variability in the data.
Reduce the dimensionality of a data set, either to ease interpretation or as a way to avoid overfitting and to prepare for subsequent analysis.
The sample covariance matrix of the $N \times p$ data matrix $X$ is $S = X^T X / N$, since $X$ has zero mean.
The eigenvectors of $X^T X$ (i.e., $v_j$, $j = 1, \dots, p$) are called the principal component directions of $X$.
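The definitions above can be sketched numerically. This is a minimal illustration, assuming NumPy and a small simulated data set; the names `X`, `S`, `V`, `d2` and the mixing matrix `A` are illustrative choices, not part of the notes. It centers the columns, forms the covariance matrix with divisor N, and takes the eigenvectors of X^T X as the principal component directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N = 200 observations of p = 3 correlated variables.
N, p = 200, 3
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.5, 0.2]])
X = rng.normal(size=(N, p)) @ A

# Center each column so that X has zero mean, as the text assumes.
X = X - X.mean(axis=0)

# Sample covariance matrix S = X^T X / N (divisor N, matching the text).
S = X.T @ X / N

# eigh is meant for symmetric matrices and returns eigenvalues in ascending
# order, so reorder both outputs to put the largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
d2 = eigvals[order]      # eigenvalues of X^T X, largest first
V = eigvecs[:, order]    # columns are the principal component directions
```

The columns of `V` are orthonormal, and `S` has the same eigenvectors as X^T X with eigenvalues `d2 / N`, so either matrix can be decomposed.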
The first principal component direction $v_1$ has the following properties:
$v_1$ is the eigenvector associated with the largest eigenvalue, $d_1^2$, of $X^T X$.
$z_1 = X v_1$ has the largest sample variance amongst all normalized linear combinations of the columns of $X$.
$z_1$ is called the first principal component of $X$, and we have $\mathrm{Var}(z_1) = d_1^2 / N$.
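The variance of the first principal component can be verified in a few lines. The sketch below writes $v_1$ for the first principal component direction, $d_1^2$ for the largest eigenvalue of $X^T X$, and $z_1 = X v_1$; it assumes $v_1$ is scaled to unit length ($v_1^T v_1 = 1$) and uses the divisor $N$. Because the columns of $X$ have zero mean, $z_1$ also has zero mean, so its sample variance is its mean square:

```latex
\begin{aligned}
\operatorname{Var}(z_1)
  &= \frac{1}{N} z_1^T z_1
   = \frac{1}{N} v_1^T X^T X v_1 \\
  &= \frac{1}{N} v_1^T \left( d_1^2 v_1 \right)
   = \frac{d_1^2}{N}.
\end{aligned}
```

The same steps for any unit vector $a$ give $\operatorname{Var}(Xa) = a^T X^T X a / N$, which is largest when $a$ is the eigenvector with the largest eigenvalue.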
The second principal component direction $v_2$ (the direction orthogonal to the first component that has the largest projected variance) is the eigenvector corresponding to the second largest eigenvalue, $d_2^2$, of $X^T X$, and so on. (The eigenvector for the $k$-th largest eigenvalue corresponds to the $k$-th principal component direction $v_k$.)
The $k$-th principal component of $X$, $z_k = X v_k$, has maximum variance $d_k^2 / N$, subject to being orthogonal to the earlier ones.
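A short numerical check of these statements, again as a sketch on simulated data rather than a definitive implementation. The data, the names `Z`, `component_var`, `Z_reduced`, `explained`, and the choice `k = 2` are assumptions made for illustration. The code projects a centered matrix onto the ordered eigenvectors of X^T X, computes each component's variance with divisor N, and keeps the first k components as the reduced representation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical centered data matrix with N rows and p columns.
N, p = 500, 4
X = rng.normal(size=(N, p)) @ rng.normal(size=(p, p))
X = X - X.mean(axis=0)

# Principal component directions: eigenvectors of X^T X, largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
d2, V = eigvals[order], eigvecs[:, order]

# Principal components: column k of Z is X times the k-th direction.
Z = X @ V

# Sample variance of each component with divisor N (components have zero mean).
component_var = (Z ** 2).sum(axis=0) / N

# Keep the first k components to reduce the dimension from p to k.
k = 2
Z_reduced = Z[:, :k]
explained = d2[:k].sum() / d2.sum()   # share of total variance retained
```

Here `component_var` matches `d2 / N` and is non-increasing, and the off-diagonal entries of `Z.T @ Z` vanish up to rounding, reflecting that the components are mutually orthogonal.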