The idea
A single deep decision tree has low bias but high variance: a small change in the training data can produce a very different tree. A random forest trains many such trees and averages them, which keeps the bias low while reducing the variance.
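The averaging effect can be illustrated with a small simulation. This is an idealized sketch, not a random forest: each hypothetical "tree" is modeled as an unbiased but noisy estimate of a known value, and the errors are assumed independent. Real trees are correlated, so the reduction is smaller in practice. The names `noisy_prediction` and `ensemble_prediction` are made up for this example.

```python
import random
import statistics

TRUTH = 1.0  # the value every hypothetical "tree" is trying to predict


def noisy_prediction(rng, noise_sd=1.0):
    """One stand-in 'tree': right on average (low bias), noisy (high variance)."""
    return rng.gauss(TRUTH, noise_sd)


def ensemble_prediction(rng, n_trees):
    """Average the predictions of n_trees independent stand-in trees."""
    return statistics.fmean(noisy_prediction(rng) for _ in range(n_trees))


rng = random.Random(0)
single = [noisy_prediction(rng) for _ in range(5000)]
forest = [ensemble_prediction(rng, 50) for _ in range(5000)]

var_single = statistics.pvariance(single)
var_forest = statistics.pvariance(forest)
mean_forest = statistics.fmean(forest)

# Under the independence assumption, var_forest is roughly var_single / 50,
# while mean_forest stays close to TRUTH (the bias does not change).
print(f"single: {var_single:.3f}  forest: {var_forest:.3f}  mean: {mean_forest:.3f}")
```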
Two sources of randomness
For averaging to help, the trees' errors must not be strongly correlated with each other. Random forests inject randomness in two places:
- Bagging trains each tree on a bootstrap sample, a random draw of rows with replacement
- At each split, only a random subset of features is considered, which decorrelates the trees
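The two sampling steps above can be sketched as follows. This is a minimal illustration using only the standard library, not any particular library's implementation. The helper names are hypothetical, and the square-root default for the subset size is an assumption (a common choice for classification), not something the text specifies.

```python
import math
import random


def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices WITH replacement: one tree's training sample."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def feature_subset(n_features, rng, k=None):
    """Pick k distinct candidate features for a single split."""
    if k is None:
        # Assumed default: about sqrt(n_features) candidates per split.
        k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(42)
rows = bootstrap_indices(10, rng)   # some rows repeat, others are missing
feats = feature_subset(16, rng)     # 4 of the 16 features
print(rows)
print(feats)
```

A fresh bootstrap sample is drawn once per tree, while a fresh feature subset is drawn at every split inside that tree.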
Combining predictions
For classification the forest takes a majority vote across trees. For regression it averages their outputs. Because the trees make different mistakes, the errors partly cancel.
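Both combination rules fit in a few lines. The sketch below assumes the per-tree predictions have already been collected into a list. The function names are illustrative, and resolving ties by first appearance is a property of this sketch, not a rule from the text.

```python
from collections import Counter
from statistics import fmean


def majority_vote(labels):
    """Classification: return the label predicted by the most trees.

    Ties go to the label that appears first in the list.
    """
    return Counter(labels).most_common(1)[0][0]


def average(outputs):
    """Regression: return the mean of the trees' numeric outputs."""
    return fmean(outputs)


tree_labels = ["spam", "ham", "spam", "spam", "ham"]
tree_values = [2.0, 3.0, 4.0]

print(majority_vote(tree_labels))  # spam
print(average(tree_values))        # 3.0
```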
Handy extras
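The rows left out of each bootstrap sample (about 37% of the training rows for each tree) form an out-of-bag (OOB) set. Scoring each row with only the trees that never saw it gives a validation estimate at no extra training cost. Forests can also rank features by importance, for example by the total impurity reduction achieved by the splits on each feature.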
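The size of the out-of-bag set can be checked empirically. When n rows are drawn with replacement n times, each row is missed with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The sketch below estimates that fraction by simulation; `oob_fraction` is a hypothetical helper written for this example.

```python
import random
from statistics import fmean


def oob_fraction(n_rows, rng):
    """Fraction of rows that never appear in one bootstrap sample."""
    in_bag = {rng.randrange(n_rows) for _ in range(n_rows)}
    return 1 - len(in_bag) / n_rows


rng = random.Random(0)
# Average over many bootstrap samples to smooth out sampling noise.
frac = fmean(oob_fraction(1000, rng) for _ in range(200))
print(f"average out-of-bag fraction: {frac:.3f}")
```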
Key idea
Random forests average many decorrelated trees built with bagging and random feature subsets, which cuts variance substantially at little cost in bias.