The purpose
Principal component analysis, or PCA, reduces the number of features while retaining as much of the data's variance as possible, with variance serving as a proxy for information. It finds a new set of mutually orthogonal axes, called principal components, that point along the directions of greatest variance in the data.
How it works
PCA works from how the features vary together, as summarized by their covariance matrix:
- Center the data by subtracting each feature's mean
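- Find the directions along which the centered data spreads the most; these are the eigenvectors of the features' covariance matrix, and each eigenvalue is the variance along its direction
- Rank these directions by variance, so the first component captures the most variance, the second captures the most of what remains while staying orthogonal to the first, and so on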
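The steps above can be sketched in Python with NumPy. This is a minimal illustration rather than a reference implementation: it assumes the data is a NumPy array with one row per sample, and the function name `principal_components` is hypothetical. It builds the covariance matrix of the centered data and takes its eigenvectors as the principal directions, sorted by eigenvalue.

```python
import numpy as np

def principal_components(X):
    """Return (components, variances) for X of shape (n_samples, n_features).

    components[i] is the i-th principal direction (a unit vector);
    variances[i] is the variance of the data along it, in descending order.
    """
    # Step 1: center each feature by subtracting its mean
    Xc = X - X.mean(axis=0)
    # Step 2: sample covariance matrix of the features (n_features x n_features)
    cov = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 3: reorder so the highest-variance direction comes first
    order = np.argsort(eigvals)[::-1]
    # Eigenvectors are columns of eigvecs; transpose so each row is one component
    return eigvecs[:, order].T, eigvals[order]

rng = np.random.default_rng(0)
# Toy 2-D data stretched along the diagonal direction (1, 1)
t = rng.normal(size=500)
X = np.column_stack([t + 0.1 * rng.normal(size=500),
                     t + 0.1 * rng.normal(size=500)])
components, variances = principal_components(X)
print(components[0])  # roughly ±[0.71, 0.71]: the diagonal direction
print(variances)      # first entry much larger than the second
```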
Keeping only the top components projects the data into fewer dimensions while preserving most of its spread.
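A sketch of keeping only the top `k` components, under the same assumptions (a NumPy array with one row per sample; `pca_project` is a hypothetical helper). It swaps in a singular value decomposition of the centered data instead of an explicit covariance eigendecomposition; both yield the same directions, and the SVD is the more common choice numerically. It also reports the fraction of total variance the `k` components retain, one way to check how much spread survives the projection.

```python
import numpy as np

def pca_project(X, k):
    """Project X (n_samples, n_features) onto its top-k principal components.

    Returns the projected data (n_samples, k) and the fraction of total
    variance those k components retain.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Variance along each direction is proportional to its squared singular value
    retained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return Xc @ Vt[:k].T, retained

rng = np.random.default_rng(1)
# Synthetic 5-D data that lies close to a 2-D subspace, plus a little noise
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 5))

Z, retained = pca_project(X, k=2)
print(Z.shape)    # (300, 2)
print(retained)   # close to 1.0 for this nearly 2-D data
```

Because the synthetic data is built to be nearly two-dimensional, two components keep almost all of the variance; on real data, the retained fraction is what you would inspect when choosing `k`.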
Why use it
- Speeds up downstream models by cutting dimensionality
- Helps visualization by projecting to two or three dimensions
- Can reduce noise by dropping low-variance directions, which often carry more noise than signal
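The noise-reduction point can be illustrated on synthetic data. This sketch assumes the signal lies in a low-dimensional subspace (rank 2 here) and that the noise is spread evenly across all features; under those assumptions, reconstructing from the top components discards most of the noise. `pca_denoise` is a hypothetical helper, and real data may not satisfy these assumptions.

```python
import numpy as np

def pca_denoise(X, k):
    """Reconstruct X from its top-k principal components, discarding the rest."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T  # (n_features, k) orthonormal basis for the top-k subspace
    # Project onto the subspace, map back to feature space, and restore the mean
    return Xc @ V @ V.T + mean

rng = np.random.default_rng(2)
clean = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))  # rank-2 signal in 10-D
noisy = clean + 0.1 * rng.normal(size=clean.shape)            # add isotropic noise
denoised = pca_denoise(noisy, k=2)

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
print(err_noisy, err_denoised)  # the denoised error should be noticeably smaller
```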
Cautions
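Principal components are linear combinations of the original features, so they can be hard to interpret. Features measured on different scales should usually be standardized first: PCA is sensitive to magnitude, so a feature with a large numeric range will dominate the leading components regardless of how informative it is.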
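The effect of scale shows up on a small synthetic example. It assumes a dataset with two strongly correlated features on a unit scale and one unrelated feature whose values are about 1000 times larger; `first_component` is a hypothetical helper, and "standardize" here means subtracting each feature's mean and dividing by its standard deviation.

```python
import numpy as np

def first_component(X):
    """Return the unit vector of the first principal component of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(3)
n = 1000
z = rng.normal(size=n)
a = z + 0.1 * rng.normal(size=n)  # two strongly correlated features on a unit scale
b = z + 0.1 * rng.normal(size=n)
c = 1000 * rng.normal(size=n)     # unrelated feature on a much larger scale
X = np.column_stack([a, b, c])

raw_pc1 = first_component(X)
# Standardize: zero mean and unit variance for every feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
std_pc1 = first_component(X_std)

print(np.round(np.abs(raw_pc1), 2))  # dominated by the large-scale feature c
print(np.round(np.abs(std_pc1), 2))  # loads on the correlated pair a and b
```

Without scaling, the first component simply follows the feature with the largest numeric range; after standardization it picks out the structure shared by the correlated pair.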
Key idea
PCA projects data onto ranked directions of maximum variance, compressing dimensions while retaining most of the information.