Machine Learning

Gradient Boosting

Building a strong model by adding trees that fix prior errors.

6 min read · advanced

The strategy

Gradient boosting builds an ensemble one tree at a time. Unlike a random forest, where trees are independent, each new tree is trained to correct the mistakes the current ensemble still makes.

How it works

The method connects to gradient descent in function space:

  • Start with a simple baseline prediction
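  • Compute the pseudo-residuals: the negative gradients of the loss with respect to the current predictions (for squared error, these are the ordinary residuals, actual minus predicted)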
  • Fit a small tree to predict those residuals
  • Add a shrunken version of that tree to the ensemble
  • Repeat
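The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes squared-error loss, a single numeric feature with at least two distinct values, and depth-1 trees (stumps). The names `fit_stump`, `fit_boosting`, `predict`, and `mse` are hypothetical helpers invented for this sketch.

```python
# Toy gradient boosting for squared-error regression on one feature.
# Assumptions: stumps (depth-1 trees) as the "small tree", and at least
# two distinct x values so that a split exists.

def fit_stump(xs, targets):
    """Return (threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every threshold leaves both sides non-empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    return best[1:]


def predict_stump(stump, x):
    threshold, lmean, rmean = stump
    return lmean if x <= threshold else rmean


def fit_boosting(xs, ys, n_trees=50, learning_rate=0.1):
    baseline = sum(ys) / len(ys)          # start with a simple baseline
    preds = [baseline] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # residuals = negative gradient of 0.5 * (y - prediction)^2
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)  # fit a small tree to the residuals
        stumps.append(stump)
        # add a shrunken version of the tree to the ensemble
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return baseline, learning_rate, stumps


def predict(model, x):
    baseline, learning_rate, stumps = model
    return baseline + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

On a small curve such as `xs = [0.0, 1.0, ..., 9.0]` with `ys = [x * x for x in xs]`, the training error of a 50-tree model should be lower than that of the baseline alone (`n_trees=0`), because every round removes part of the remaining residual.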

The shrinkage factor, called the learning rate, scales each tree's contribution so the model improves in small, careful steps. A smaller learning rate typically needs more trees to reach the same fit.
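Written out, one boosting round looks roughly like this. The notation is an assumption made for illustration: F_{m-1} is the ensemble after m-1 trees, h_m is the new tree, L is the loss, r_i is the pseudo-residual for example i, and ν is the learning rate.

```latex
% Pseudo-residual for example i: negative gradient of the loss
% with respect to the current prediction.
r_i = -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}}

% Special case: squared error gives the ordinary residual.
L(y, F) = \tfrac{1}{2}(y - F)^2 \;\Longrightarrow\; r_i = y_i - F_{m-1}(x_i)

% Shrunken update: the tree h_m is fit to the r_i, then scaled by \nu.
F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad 0 < \nu \le 1
```

With ν = 1 each tree is added at full strength. Smaller values take smaller steps along the negative gradient, which is the sense in which boosting resembles gradient descent in function space.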

Why it is powerful

By sequentially focusing on the hardest remaining errors, boosting reaches very high accuracy on structured (tabular) data. Libraries such as XGBoost and LightGBM add explicit regularization and much faster training.

The cost

Because each new tree chases whatever error remains, including noise, boosting is more prone to overfitting than bagging. It needs careful tuning of tree depth, learning rate, and the number of trees, usually with early stopping on a held-out validation set.
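Early stopping can be sketched as follows: keep adding trees while a held-out validation loss improves, and stop once it has not improved for a set number of rounds. This is an illustrative toy that assumes squared error, one numeric feature with at least two distinct training values, and stumps. `boost_with_early_stopping`, `patience`, and the helper functions are hypothetical names, not any library's API.

```python
# Toy early stopping for boosted stumps (squared error, single feature).

def fit_stump(xs, rs):
    """Best single-threshold split of xs for predicting rs (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # both sides of each split are non-empty
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def stump_value(stump, x):
    threshold, lm, rm = stump
    return lm if x <= threshold else rm


def mse(preds, ys):
    return sum((y - p) ** 2 for p, y in zip(preds, ys)) / len(ys)


def boost_with_early_stopping(train, valid, max_trees=200,
                              learning_rate=0.1, patience=10):
    (xtr, ytr), (xva, yva) = train, valid
    baseline = sum(ytr) / len(ytr)
    ptr = [baseline] * len(ytr)   # running predictions on the training set
    pva = [baseline] * len(yva)   # running predictions on the validation set
    stumps = []
    best_loss, best_n = mse(pva, yva), 0
    for n in range(1, max_trees + 1):
        residuals = [y - p for y, p in zip(ytr, ptr)]
        stump = fit_stump(xtr, residuals)
        stumps.append(stump)
        ptr = [p + learning_rate * stump_value(stump, x) for p, x in zip(ptr, xtr)]
        pva = [p + learning_rate * stump_value(stump, x) for p, x in zip(pva, xva)]
        loss = mse(pva, yva)
        if loss < best_loss:
            best_loss, best_n = loss, n      # validation loss improved
        elif n - best_n >= patience:
            break                            # no improvement for `patience` rounds
    # keep only the trees up to the best validation round
    return baseline, stumps[:best_n], best_n, best_loss
```

The returned `best_n` is the number of trees to keep. Trees added after that round only lowered the training error, not the validation error, so they are discarded.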

Key idea

Gradient boosting adds shallow trees one at a time, each fitting the residual errors of the current ensemble, yielding strong but overfit-prone models.

Check yourself

1. How does each new tree in boosting differ from the last?

2. What does the learning rate do in boosting?

3. Compared with random forests, how does boosting generally behave?