Machine Learning: Dimensionality Reduction
1. MotivationPermalink
Dimensionality Reduction helps in:
- Data Compression
- Visualization (because we can only plot in 2D or 3D)
Principal Component Analysis (PCA) is the standard algorithm for implementing dimensionality reduction.
2. PCA: Problem Formulation
2.1 Reduce from 2 dimensions to 1 dimension
Suppose the two features are $x_1$ and $x_2$. We want to find a line onto which to project the data, so that each 2-D point $x^{(i)}$ is replaced by its 1-D projection.
How do we judge whether the projected points can adequately stand in for the original points? The criterion is the projection error: the sum of squared distances from each point to its projection, which should be as small as possible.
After some further derivation (not covered here), the problem becomes: find a direction (a vector $u^{(1)} \in \mathbb{R}^2$) onto which to project the data, so as to minimize the projection error.
2.2 Reduce from n dimensions to k dimensions
Similarly: find $k$ vectors $u^{(1)}, u^{(2)}, \ldots, u^{(k)}$ onto which to project the data, so as to minimize the projection error.
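A compact way to write the $k$-dimensional problem (notation assumed here rather than taken from the lecture; the $u^{(j)}$ are required to be unit-length and mutually orthogonal, and $x_{\text{proj}}^{(i)}$ denotes the projection of $x^{(i)}$ onto their span):

$$\min_{u^{(1)},\ldots,u^{(k)}} \; \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2, \qquad x_{\text{proj}}^{(i)} = \sum_{j=1}^{k} \left( u^{(j)\,T} x^{(i)} \right) u^{(j)}$$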
3. PCA: Algorithm
3.1 Data preprocessing
Given training set $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$, perform mean normalization:
- calculate $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$
- replace each $x_j^{(i)}$ with $x_j^{(i)} - \mu_j$

If different features are on different scales (e.g. $x_1$ is the size of a house and $x_2$ is the number of bedrooms), also scale the features to comparable ranges, e.g. replace $x_j^{(i)}$ with $\frac{x_j^{(i)} - \mu_j}{s_j}$, where $s_j$ is the standard deviation (or the range) of feature $j$.
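A minimal Octave sketch of this preprocessing step, assuming the training inputs are stored row-wise in an $m \times n$ matrix (the variable names X, mu, s, X_norm are illustrative, not from the lecture):

```octave
% Mean normalization: make every feature zero-mean.
mu = mean(X);            % 1-by-n row vector of feature means
X_norm = X - mu;         % broadcasting subtracts mu from every row

% Optional feature scaling, useful when features have very different ranges.
s = std(X);              % per-feature standard deviation (the range also works)
X_norm = X_norm ./ s;
```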
3.2 PCA algorithm and implementation in Octave
Suppose we are reducing data from $n$ dimensions to $k$ dimensions.
Step 1: Compute the covariance matrix
The non-vectorized formula is $\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)}) (x^{(i)})^T$. Moreover, since each $x^{(i)}$ is an $n \times 1$ vector, $\Sigma$ is an $n \times n$ matrix.
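In Octave, with the mean-normalized examples stored row-wise in an $m \times n$ matrix X (assumed name, as in the sketch above), this step is one line:

```octave
m = size(X, 1);            % number of training examples
Sigma = (1 / m) * X' * X;  % n-by-n covariance matrix
```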
Step 2: Compute the eigenvectors of the covariance matrix
In Octave: [U, S, V] = svd(Sigma), where svd stands for Singular Value Decomposition. eig(Sigma) also works, but it is numerically less stable. A covariance matrix always satisfies the property of being symmetric positive semidefinite, so svd and eig give the same result here.
The structure of U in [U, S, V] is an $n \times n$ matrix whose columns are the eigenvectors $u^{(1)}, u^{(2)}, \ldots, u^{(n)}$, each in $\mathbb{R}^n$:

$$U = \begin{bmatrix} | & | & & | \\ u^{(1)} & u^{(2)} & \cdots & u^{(n)} \\ | & | & & | \end{bmatrix}$$
Step 3: Generate the new dimensions
We want to reduce to $k$ dimensions, so we keep only the first $k$ columns of $U$, giving $U_{\text{reduce}} \in \mathbb{R}^{n \times k}$.
In Octave, use U_reduce = U(:, 1:k).
The new representation of each example is $z^{(i)} = U_{\text{reduce}}^T \, x^{(i)}$, so each $z^{(i)}$ is a $k \times 1$ vector.
The vectorized formula is $Z = X \, U_{\text{reduce}}$.
The structure of $Z$ is $m \times k$: row $i$ is the compressed representation of training example $x^{(i)}$.
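Putting Steps 2 and 3 together, a sketch in Octave (assuming Sigma was computed from the mean-normalized X as above and that a target dimension k has been chosen):

```octave
[U, S, V] = svd(Sigma);    % columns of U are the eigenvectors u^(1), ..., u^(n)
U_reduce = U(:, 1:k);      % keep the first k columns: n-by-k
Z = X * U_reduce;          % m-by-k matrix; row i is z^(i) = U_reduce' * x^(i)
```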
4. Reconstruction from Compressed Representation
Reconstruction here means reconstructing an approximation $x_{\text{approx}}^{(i)} \in \mathbb{R}^n$ of the original $x^{(i)}$ from the compressed $z^{(i)} \in \mathbb{R}^k$.
The formula is $x_{\text{approx}}^{(i)} = U_{\text{reduce}} \, z^{(i)}$.
The vectorized formula is $X_{\text{approx}} = Z \, U_{\text{reduce}}^T$.
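With the same assumed variable names as in the sketch above, the reconstruction is a single line in Octave:

```octave
X_approx = Z * U_reduce';  % m-by-n approximation (add mu back to undo mean normalization)
```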
5. Choosing the Number of Principal Components
I.e., how to choose $k$.
5.1 Algorithm
Average squared projection error: $\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{approx}}^{(i)} \right\|^2$
Total variation in the data: $\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} \right\|^2$
Typically, choose the smallest value of $k$ such that

$$\frac{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{approx}}^{(i)} \right\|^2}{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} \right\|^2} \le 0.01$$

i.e. so that "99% of the variance is retained".
Computing this ratio directly would mean re-running PCA and the reconstruction for every candidate $k$; in practice the implementation uses the SVD results instead, as described next.
5.2 Convenient calculation with SVD results
We use the matrix $S$ returned by [U, S, V] = svd(Sigma): it is an $n \times n$ diagonal matrix with entries $S_{11}, S_{22}, \ldots, S_{nn}$ on its diagonal.
For a given $k$, the fraction of variance retained is

$$\frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}}$$

so we pick the smallest $k$ for which this ratio is at least 0.99 (99% of variance retained).
This way we only need to compute [U, S, V] = svd(Sigma) once, and then try increasing values of $k$ until the condition is met.
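A possible Octave sketch of this selection rule, using the diagonal of S from a single svd call (the 0.99 threshold corresponds to retaining 99% of the variance):

```octave
[U, S, V] = svd(Sigma);
s = diag(S);                      % diagonal entries S11, S22, ..., Snn
retained = cumsum(s) / sum(s);    % variance retained for k = 1, ..., n
k = find(retained >= 0.99, 1);    % smallest k retaining at least 99%
```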
6. Advice for Applying PCA
6.1 Good use of PCA
Application of PCA:
- Compression
  - Reduce memory/disk needed to store data
  - Speed up a learning algorithm
  - Choose $k$ by the percentage of variance retained (e.g. 99%)
- Visualization
  - Choose $k = 2$ or $k = 3$, since we can only plot data in 2D or 3D
PCA can be used to speed up a learning algorithm, most commonly a supervised learning algorithm.
Suppose we have a supervised training set $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$ where the inputs $x^{(i)} \in \mathbb{R}^n$ are very high-dimensional. Extract the unlabeled inputs $x^{(1)}, \ldots, x^{(m)}$, run PCA on them to obtain $z^{(1)}, \ldots, z^{(m)}$ with $z^{(i)} \in \mathbb{R}^k$ and $k \ll n$, and then train the algorithm on the new, lower-dimensional training set $(z^{(1)}, y^{(1)}), \ldots, (z^{(m)}, y^{(m)})$.
6.2 Bad use of PCA
- To prevent overfitting
- Use PCA to reduce the number of features, reasoning that fewer features make the model less likely to overfit.
This is a bad use because PCA is not a good way to address overfitting: it throws away information without ever looking at the labels $y^{(i)}$. Use regularization instead.
6.3 Implementation tips
- Note 1: The mapping $x^{(i)} \mapsto z^{(i)}$ should be defined by running PCA only on the training set. The same mapping (the same mean $\mu$ and matrix $U_{\text{reduce}}$) can then be applied to the examples $x_{cv}^{(i)}$ and $x_{test}^{(i)}$ in the cross-validation and test sets; see the sketch after these notes.
- Note 2: Before implementing PCA, first try running whatever you want to do with the original/raw data $x^{(i)}$. Only if that does not do what you want should you implement PCA and consider using $z^{(i)}$.
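A short Octave sketch of Note 1, with assumed variable names (X_train, X_cv, X_test, and a k chosen as in Section 5): the mean and U_reduce are fitted on the training set only and then reused verbatim for the other sets.

```octave
% Fit the PCA mapping on the training set only.
mu = mean(X_train);
X_norm = X_train - mu;
Sigma = (1 / size(X_norm, 1)) * X_norm' * X_norm;
[U, S, V] = svd(Sigma);
U_reduce = U(:, 1:k);
Z_train = X_norm * U_reduce;          % train the learning algorithm on (Z_train, y_train)

% Apply the SAME mu and U_reduce to the cross-validation and test sets.
Z_cv   = (X_cv   - mu) * U_reduce;
Z_test = (X_test - mu) * U_reduce;
```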