Random Forests and Bagging
A single deep tree has low bias but high variance: small changes in the training data can produce a very different tree. A random forest averages many trees so that their individual errors partly cancel, giving a more stable predictor.
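A standard identity shows how much averaging can help. Assume, as a simplification not stated above, that each tree's prediction $T_b(x)$ at a point has the same variance $\sigma^2$ and that any two trees have pairwise correlation $\rho$. The symbols $T_b$, $B$, $\sigma$ and $\rho$ are introduced here for illustration. The average of $B$ such trees then has variance:

```latex
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

The second term shrinks as trees are added, but the first does not. With correlated trees the variance levels off at $\rho\sigma^2$, so reducing the correlation between trees matters as much as adding more of them.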
Bagging
Bagging is short for bootstrap aggregating. Each tree trains on a bootstrap sample: a dataset of the same size as the original, drawn from it with replacement, so some points appear several times and others not at all. Because the samples differ, the trees differ, and averaging their predictions reduces variance.
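A minimal sketch of bootstrap sampling and averaging in plain Python. To keep it short it assumes a single numeric feature and replaces each full tree with a hypothetical one-split regression stump (`fit_stump`); a real forest would grow deep trees instead. All function names here are illustrative.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) points from data with replacement."""
    return [rng.choice(data) for _ in range(len(data))]


def fit_stump(sample):
    """Hypothetical stand-in for a tree: one split on a single feature.

    sample is a list of (x, y) pairs. Returns (threshold, left_mean, right_mean)
    for the split that minimizes the summed squared error.
    """
    xs = sorted({x for x, _ in sample})
    overall = mean(y for _, y in sample)
    best_sse, best = float("inf"), (xs[0], overall, overall)
    for t in xs[:-1]:  # splitting at the largest x would leave the right side empty
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best_sse:
            best_sse, best = sse, (t, lm, rm)
    return best


def predict_stump(stump, x):
    t, left_mean, right_mean = stump
    return left_mean if x <= t else right_mean


def fit_bagged(data, n_trees, seed=0):
    """Fit one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_trees)]


def predict_bagged(model, x):
    """Regression: average the individual stumps' predictions."""
    return mean(predict_stump(stump, x) for stump in model)


# Usage: a step function that is 0.0 for x < 10 and 1.0 for x >= 10.
data = [(x, 0.0 if x < 10 else 1.0) for x in range(20)]
model = fit_bagged(data, n_trees=25)
print(predict_bagged(model, 0), predict_bagged(model, 19))
```

Each stump sees a slightly different dataset, so their thresholds vary; the averaged prediction is smoother than any single stump's.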
Adding feature randomness
Bagging alone leaves the trees correlated: if a few features are strong predictors, most trees choose them for their top splits and end up looking alike. Random forests add a second source of randomness.
- At each split the tree considers only a random subset of features.
- This forces trees to use different signals, which makes them less correlated with one another.
- Less correlated errors cancel better when averaged.
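A minimal sketch of the per-split feature subsampling, assuming a classification tree whose splits are scored by Gini impurity (a common choice, not specified above). The function names are illustrative; `max_features` mirrors the name scikit-learn uses for this setting on its forest estimators, but the code does not use that library. A fresh subset is drawn every time the function is called, that is, at every split.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_on_random_features(rows, labels, max_features, rng):
    """Search for the best (feature, threshold) split, but only over a random
    subset of max_features feature indices drawn fresh for this split.

    Returns (candidates, best) where best is (weighted_gini, feature, threshold),
    or None if every candidate feature is constant.
    """
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), max_features)
    best = None
    for f in candidates:
        values = sorted({row[f] for row in rows})
        for t in values[:-1]:
            left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return candidates, best


# Usage: feature 0 separates the classes perfectly; features 1 and 2 are weaker.
rows = [[0, 5, 1], [1, 3, 0], [2, 4, 1], [3, 1, 0],
        [10, 2, 1], [11, 5, 0], [12, 3, 1], [13, 4, 0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rng = random.Random(0)
for _ in range(3):
    # With max_features=1, some splits never see feature 0 and must use a weaker one.
    print(best_split_on_random_features(rows, labels, max_features=1, rng=rng))
```

With `max_features` equal to the number of features this reduces to an ordinary bagged tree split, which always picks the dominant feature; smaller values are what let weaker features appear in some trees.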
Combining predictions
For classification the forest takes a majority vote across trees; for regression it averages their outputs. Adding more trees does not increase overfitting: the forest's error levels off as the tree count grows, and extra trees only stabilize the estimate, so the count is limited mainly by compute.
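The combination rules are short enough to write directly. The sketch below uses illustrative function names and also includes a soft-voting variant that averages per-class probabilities before taking the largest; scikit-learn's `RandomForestClassifier` combines its trees this way rather than by counting hard votes. The usage lines show that the two classification rules can disagree.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Hard vote: the most common predicted label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(prob_estimates):
    """Average each class's probability across trees, then pick the largest."""
    classes = prob_estimates[0].keys()
    averaged = {c: mean(p[c] for p in prob_estimates) for c in classes}
    return max(averaged, key=averaged.get)


def average_regression(outputs):
    """Regression: the mean of the trees' numeric outputs."""
    return mean(outputs)


# Three trees' class-probability estimates for one input.
probs = [{"a": 0.6, "b": 0.4}, {"a": 0.3, "b": 0.7}, {"a": 0.55, "b": 0.45}]
hard = majority_vote([max(p, key=p.get) for p in probs])  # individual votes: a, b, a
soft = soft_vote(probs)
print(hard, soft, average_regression([1.0, 2.0, 3.0]))
```

Here two of three trees lean towards `a`, but the tree favouring `b` is confident enough that the averaged probabilities favour `b`.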
Out-of-bag estimates
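Each bootstrap sample leaves out about a third of the data: the chance that a given point is never drawn in n draws with replacement is (1 - 1/n)^n, which approaches 1/e ≈ 0.37 for large n. Predicting each point using only the trees whose bootstrap samples left it out gives an out-of-bag (OOB) error estimate for free, without a separate validation split.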
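A minimal sketch of the out-of-bag estimate for regression. It records which indices each bootstrap sample drew, asks that tree to predict every point it left out, and scores each point on the average of the predictions from trees that never saw it. As a hypothetical stand-in for a deep tree it uses a 1-nearest-neighbour regressor, which, like an unpruned tree, fits its own training points exactly; the function names are illustrative.

```python
import random
from statistics import mean


def fit_1nn(sample):
    """Hypothetical stand-in for a deep tree: just memorize the (x, y) pairs."""
    return list(sample)


def predict_1nn(model, x):
    """Return the target of the training point whose x is closest."""
    return min(model, key=lambda point: abs(point[0] - x))[1]


def oob_mse(data, n_trees, seed=0):
    """Out-of-bag mean squared error and the number of points it was computed on."""
    rng = random.Random(seed)
    n = len(data)
    oob_preds = [[] for _ in range(n)]  # predictions from trees that left point i out
    for _ in range(n_trees):
        drawn = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
        in_bag = set(drawn)
        model = fit_1nn([data[i] for i in drawn])
        for i in range(n):
            if i not in in_bag:
                oob_preds[i].append(predict_1nn(model, data[i][0]))
    # Points that were in every bootstrap sample have no OOB prediction; skip them.
    errors = [(mean(preds) - data[i][1]) ** 2
              for i, preds in enumerate(oob_preds) if preds]
    return mean(errors), len(errors)


# Usage: a noiseless linear target, so the OOB error comes only from the
# stand-in's nearest-neighbour approximation of unseen points.
data = [(float(x), 2.0 * x) for x in range(30)]
print(oob_mse(data, n_trees=50))
```

Because no point is ever scored by a tree that trained on it, the estimate behaves like a held-out error even though every point was used for training somewhere in the ensemble.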
Key idea
Random forests cut variance by averaging many trees decorrelated through bootstrap sampling and random feature subsets.