What a forest controls
A random forest averages many decorrelated trees, each grown on a bootstrap sample with random feature subsets at each split. A few hyperparameters shape that ensemble.
The knobs that matter
- Number of trees: more trees lower the variance of the averaged prediction and, beyond sampling noise, do not hurt accuracy; they only add training and prediction cost. Use enough that the score plateaus.
- Max features: the number of features tried per split. Fewer features give more decorrelated, more diverse trees, at the price of weaker individual trees.
- Max depth and min samples per leaf: both limit individual tree complexity to control overfitting.
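One way to see why tree count and max features play different roles is the standard variance formula for an average of correlated predictors. The notation here is an assumption for illustration, not taken from the text above: B is the number of trees, T_b(x) is the prediction of tree b, and the trees are treated as identically distributed with variance \sigma^2 and pairwise correlation \rho.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Under these assumptions the second term shrinks toward zero as B grows, which is why adding trees helps only until a plateau. The first term is a floor set by \rho, and lowering max features is the main way to reduce that correlation.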
Using out-of-bag error
Each tree's bootstrap sample leaves out roughly a third of the data (about 37%); these are its out-of-bag samples. Predicting each row using only the trees that never saw it gives a free validation estimate without a separate holdout, handy for quick tuning.
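The "about a third" figure comes from sampling n rows with replacement: a given row is missed with probability (1 - 1/n)^n, which approaches e^-1, about 0.368, as n grows. The following is a minimal sketch using only the standard library; the function name `oob_fraction` and the sample sizes are illustrative choices, not part of any library.

```python
import math
import random


def oob_fraction(n_samples, rng):
    """Fraction of rows never drawn in one bootstrap sample of size n_samples."""
    # Draw n_samples row indices with replacement and keep the distinct ones.
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(drawn) / n_samples


rng = random.Random(0)
n_samples = 10_000
n_trees = 20  # one bootstrap sample per hypothetical tree

# Average the out-of-bag fraction over several bootstrap samples.
simulated = sum(oob_fraction(n_samples, rng) for _ in range(n_trees)) / n_trees

# Exact probability that a given row is skipped, and its large-n limit.
exact = (1 - 1 / n_samples) ** n_samples
limit = math.exp(-1)

print(f"simulated={simulated:.4f} exact={exact:.4f} limit={limit:.4f}")
```

The three printed values should agree closely, which is why every row ends up out-of-bag for a substantial share of the trees once the forest is reasonably large.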
Practical guidance
- Start by raising tree count until the out-of-bag score stops improving.
- Then tune max features, the single most impactful diversity knob.
- Random forests are forgiving, so depth limits rarely need fine-tuning unless overfitting is severe.
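The two-step procedure above can be sketched with scikit-learn, assuming that library is the one in use (the text does not name one). The synthetic dataset, the helper `oob_score_for`, and the candidate grids are all assumptions chosen for illustration; `n_estimators`, `max_features`, `oob_score`, and `oob_score_` are scikit-learn's names for tree count, max features, and the out-of-bag estimate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)


def oob_score_for(n_trees, max_features):
    """Fit a forest and return its out-of-bag accuracy (hypothetical helper)."""
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        max_features=max_features,
        oob_score=True,  # score each row using only trees that did not see it
        random_state=0,
    )
    forest.fit(X, y)
    return forest.oob_score_


# Step 1: raise the tree count and watch where the OOB score levels off.
tree_counts = [25, 50, 100, 200]
oob_by_trees = {n: oob_score_for(n, "sqrt") for n in tree_counts}
for n, score in oob_by_trees.items():
    print(f"n_estimators={n:>3}  oob_score={score:.3f}")

# Step 2: hold the tree count fixed and sweep max_features.
candidates = [2, 4, 8, 20]  # 20 means every feature is tried at each split
oob_by_features = {m: oob_score_for(200, m) for m in candidates}
best_max_features = max(oob_by_features, key=oob_by_features.get)
print("best max_features by OOB score:", best_max_features)
```

On a real dataset the plateau point and the best max features value will differ, and small differences in OOB score between candidates are often within noise, so it may be worth repeating the sweep with a few random seeds before committing to a setting.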
Key idea
The high-impact random forest knobs are tree count, which helps only until the score plateaus, and max features, which controls tree diversity. Out-of-bag error gives a free validation score for tuning.