Machine Learning

Gradient Boosted Trees

Building an ensemble by fitting trees to residual errors.

Random forests build trees independently (so they can be trained in parallel), each on a bootstrap sample of the data, and average their predictions. Gradient boosting builds trees in sequence, each one correcting the mistakes of the ensemble built so far.

Fitting the residuals

Boosting starts with a simple constant prediction, such as the mean of the targets. Then it repeats a cycle:

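  • Compute the current errors (residuals): the targets minus the current predictions.
  • Fit a new small tree to those residuals. For squared-error loss, the residuals are (up to a constant factor) the negative gradient of the loss with respect to the predictions; for other losses, the tree is fit to the negative gradients directly, often called pseudo-residuals.
  • Add a scaled version of that tree to the running model, multiplying its output by a small learning rate (also called shrinkage).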
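The cycle above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes squared-error loss, a single numeric feature, and depth-1 trees (stumps). The function names `fit_stump`, `fit_boosted`, and `predict`, the default learning rate of 0.1, and the toy data are hypothetical choices made for the example.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Least-squares depth-1 regression tree on 1-D inputs.

    Tries a threshold between each pair of adjacent distinct x values and
    returns (threshold, left_value, right_value) for the split with the
    smallest squared error. Each leaf predicts the mean residual in it.
    """
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # all x values equal: fall back to a constant leaf
        m = mean(residuals)
        return (float("inf"), m, m)
    return best[1:]

def stump_predict(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right

def fit_boosted(xs, ys, n_trees=100, learning_rate=0.1):
    base = mean(ys)  # start from a constant: the mean of the targets
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # current errors
        stump = fit_stump(xs, residuals)                # small tree on the errors
        stumps.append(stump)
        # Add a scaled copy of the new tree to the running model.
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 5.2, 4.8, 5.1]  # made-up toy data
model = fit_boosted(xs, ys, n_trees=200)
baseline = mean((y - mean(ys)) ** 2 for y in ys)                  # mean-only model
boosted = mean((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))
print(baseline, boosted)  # boosted training error ends up far below the baseline
```

With squared-error loss and a least-squares stump, each added tree cannot increase the training error as long as the learning rate is between 0 and 2, which is why the running error only moves downward on the training set.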

Each tree approximates a small step of gradient descent on the predictions: it nudges them in the direction that most reduces the loss, which is why the method is called gradient boosting.
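Written as a worked update, the loop looks like this. The notation is assumed for illustration rather than taken from the lesson: F_m is the model after m trees, h_m the m-th tree, r_{i,m} the pseudo-residual of example i, and ν the learning rate. The last two lines show that squared error yields ordinary residuals, while absolute error yields only their signs.

```latex
\begin{aligned}
F_0(x) &= \bar{y} = \tfrac{1}{n}\textstyle\sum_{i=1}^{n} y_i \\
r_{i,m} &= -\left.\frac{\partial L(y_i, F)}{\partial F}\right|_{F = F_{m-1}(x_i)} \\
h_m &= \text{shallow tree fit by least squares to } \{(x_i, r_{i,m})\}_{i=1}^{n} \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x), \qquad 0 < \nu \le 1 \\[4pt]
L(y, F) = \tfrac{1}{2}(y - F)^2 &\;\Rightarrow\; r_{i,m} = y_i - F_{m-1}(x_i) \\
L(y, F) = \lvert y - F \rvert &\;\Rightarrow\; r_{i,m} = \operatorname{sign}\big(y_i - F_{m-1}(x_i)\big)
\end{aligned}
```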

Weak learners

The added trees are deliberately shallow, often just a few levels deep. A single shallow tree is a weak learner, but hundreds of them combined form a strong model. Keeping each tree weak prevents any one step from overfitting.
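A hand-built sketch of why combining weak learners adds capacity. It assumes a made-up "staircase" dataset with four levels and hand-picked stumps (these are not fitted by boosting); the helpers `stump` and `ensemble_predict` are hypothetical. A single stump can output only two values, while a sum of three stumps reproduces the staircase exactly.

```python
from statistics import mean

def stump(threshold, left, right):
    """A depth-1 tree: one split on x, two constant leaves."""
    return lambda x: left if x <= threshold else right

xs = [1, 2, 3, 4, 5, 6, 7, 8]
staircase = [0, 0, 1, 1, 2, 2, 3, 3]  # toy targets with four distinct levels

# Best single split for this data (each leaf holds the mean of its side).
single = stump(4.5, 0.5, 2.5)

# Three stumps, each adding one step, summed into an ensemble.
ensemble = [stump(2.5, 0, 1), stump(4.5, 0, 1), stump(6.5, 0, 1)]

def ensemble_predict(x):
    return sum(s(x) for s in ensemble)

def mse(predict_fn):
    return mean((y - predict_fn(x)) ** 2 for x, y in zip(xs, staircase))

print(mse(single))            # 0.25: two levels cannot fit four
print(mse(ensemble_predict))  # 0: the sum of stumps fits exactly
```

Real boosting finds such stumps greedily from the residuals and usually uses trees a few levels deep, but the capacity argument is the same: each weak tree adds only a little, and the sum can represent much richer functions.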

Bias and variance

Boosting mainly reduces bias by adding capacity step by step, in contrast to bagging, which mainly reduces variance. Because it keeps fitting errors, boosting can overfit if run too long, so the number of trees is tuned carefully, typically by monitoring error on a held-out validation set and stopping early once it stops improving.
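Early stopping can be sketched as follows: keep adding trees while tracking error on held-out data, then keep the count where validation error was lowest. The sketch below assumes squared-error loss, one numeric feature, stumps as the weak learners, and synthetic noisy data; `boost_with_validation`, `make_data`, and their parameters are hypothetical names chosen for the example.

```python
import random

def fit_stump(xs, residuals):
    """Least-squares stump on 1-D inputs; assumes at least two distinct x values."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def boost_with_validation(train, valid, max_trees=300, learning_rate=0.1):
    """Boost on `train`, recording validation MSE after each added tree."""
    (xs, ys), (vx, vy) = train, valid
    base = sum(ys) / len(ys)
    preds, vpreds = [base] * len(ys), [base] * len(vy)
    curve = []
    for _ in range(max_trees):
        h = fit_stump(xs, [y - p for y, p in zip(ys, preds)])
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]
        vpreds = [p + learning_rate * h(x) for p, x in zip(vpreds, vx)]
        curve.append(mse(vy, vpreds))
    # Early stopping: keep the tree count with the lowest validation error.
    best_n = min(range(len(curve)), key=curve.__getitem__) + 1
    return best_n, curve

rng = random.Random(0)

def make_data(n):
    # Synthetic data: a step at x = 5 plus Gaussian noise.
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return xs, [(1.0 if x > 5 else 0.0) + rng.gauss(0, 0.5) for x in xs]

train, valid = make_data(40), make_data(40)
best_n, curve = boost_with_validation(train, valid)
print(best_n, curve[best_n - 1], curve[-1])
```

Validation error typically falls at first and then flattens or rises as later trees start fitting noise; the exact `best_n` depends on the data and the random seed. Libraries expose the same idea; for example, scikit-learn's `GradientBoostingRegressor` accepts `n_iter_no_change` and `validation_fraction`.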

Key idea

Gradient boosting adds shallow trees in sequence, each fit to the current errors, to steadily reduce bias.

Check yourself

1. What does each new tree in gradient boosting fit?

2. Why are boosting trees kept shallow?

3. What does boosting primarily reduce compared with bagging?