Machine Learning

Random Forest Tuning

The handful of knobs that actually move a random forest, from tree count to feature sampling.

5 min read · core

What a forest controls

A random forest averages many decorrelated trees, each grown on a bootstrap sample with random feature subsets at each split. A few hyperparameters shape that ensemble.

The knobs that matter

  • Number of trees: more trees lower variance and do not hurt accuracy; they only add compute cost. Use enough that the score plateaus.
  • Max features: the number of features tried per split. Fewer features mean more decorrelated, more diverse trees, though each individual tree becomes weaker.
  • Max depth and min samples per leaf: both limit individual tree complexity to control overfitting.
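
One standard way to see why the first two knobs behave this way is the variance of an average of correlated predictors. The symbols are assumptions introduced here, not part of the lesson: B trees T_b, each with variance σ², and a pairwise correlation ρ ≥ 0 between any two trees.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Raising B shrinks only the second term, which is why extra trees help until the score plateaus and then stop mattering. Lowering max features reduces ρ, which shrinks the first term that no number of trees can remove.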

Using out-of-bag error

Each tree's bootstrap sample leaves out about a third of the rows (roughly 37%), its out-of-bag samples. Predicting each row using only the trees that never saw it gives a free validation estimate without a separate holdout, handy for quick tuning.
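
The "about a third" figure can be checked with a small simulation. This is a minimal sketch using only the standard library; the helper name `oob_fraction` and the row count of 1000 are illustrative choices, not part of any library.

```python
import random


def oob_fraction(n_rows, rng):
    """Fraction of rows left out of one bootstrap sample of size n_rows."""
    # Draw n_rows indices with replacement; the set keeps each drawn row once.
    drawn = {rng.randrange(n_rows) for _ in range(n_rows)}
    return 1 - len(drawn) / n_rows


rng = random.Random(0)  # fixed seed so the run is repeatable
fractions = [oob_fraction(1000, rng) for _ in range(200)]
mean_fraction = sum(fractions) / len(fractions)

# Chance that a given row is never drawn in 1000 draws, close to 1/e.
theory = (1 - 1 / 1000) ** 1000

print(round(mean_fraction, 3), round(theory, 3))
```

Both numbers should land near 0.368, a little more than a third of the rows.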

Practical guidance

  • Start by raising tree count until the out-of-bag score stops improving.
  • Then tune max features, the single most impactful diversity knob.
  • Random forests are forgiving, so depth limits rarely need fine-tuning unless overfitting is severe.
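
The first two steps above can be sketched with scikit-learn, assuming that library is the one in use (the lesson does not name one). The synthetic dataset, the candidate values, and the helper `oob_accuracy` are placeholders chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)


def oob_accuracy(n_trees, max_features="sqrt"):
    """Fit a forest and return its out-of-bag accuracy."""
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        max_features=max_features,
        bootstrap=True,   # out-of-bag samples only exist with bootstrapping
        oob_score=True,   # score each row with the trees that never saw it
        random_state=0,
    )
    forest.fit(X, y)
    return forest.oob_score_


# Step 1: raise the tree count and watch for the score to flatten.
tree_scores = {n: oob_accuracy(n) for n in (25, 50, 100, 200)}

# Step 2: with the tree count fixed, compare a few max_features settings.
feature_scores = {mf: oob_accuracy(200, mf) for mf in ("sqrt", 0.5, None)}
best_max_features = max(feature_scores, key=feature_scores.get)

print(tree_scores)
print(feature_scores, "best:", best_max_features)
```

Differences between settings on a small dataset can be within noise, so it is worth repeating the comparison with a few seeds before committing to a value.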

Key idea

The high-impact random forest knobs are tree count, which helps until the score plateaus, and max features, which controls tree diversity. Out-of-bag error gives a free validation score for tuning.

Check yourself

1. Why does adding more trees to a random forest rarely hurt accuracy?

2. What is the out-of-bag estimate?