Gradient Boosting

The strategy

Gradient boosting builds an ensemble one tree at a time. Unlike a random forest, where trees are independent, each new tree is trained to correct the mistakes the current ensemble still makes.

How it works

The method connects to gradient descent in function space:

Start with a simple baseline prediction
Compute the residuals, which are the negative gradients of the loss
Fit a small tree to predict those residuals
Add a shrunken version of that tree to the ensemble
Repeat

The shrinkage factor, called the learning rate, scales each tree's contribution so the model improves in small careful steps.

Why it is powerful

By sequentially focusing on the hardest remaining errors, boosting reaches very high accuracy on structured data. Libraries like XGBoost and LightGBM add clever regularization and speed.

The cost

Because trees are added sequentially, boosting is more prone to overfitting than bagging and needs careful tuning of tree depth, learning rate, and the number of trees, usually with early stopping.

Key idea

Gradient boosting adds shallow trees one at a time, each fitting the residual errors of the current ensemble, yielding strong but overfit prone models.

The strategy

How it works

Why it is powerful

The cost

Key idea

Check yourself