The strategy
Gradient boosting builds an ensemble one tree at a time. Unlike a random forest, where trees are independent, each new tree is trained to correct the mistakes the current ensemble still makes.
How it works
The method connects to gradient descent in function space:
- Start with a simple baseline prediction
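- Compute the pseudo-residuals, which are the negative gradients of the loss with respect to the current predictions (for squared error, these are the ordinary residuals)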
- Fit a small tree to predict those residuals
- Add a shrunken version of that tree to the ensemble
- Repeat
The shrinkage factor, called the learning rate, scales each tree's contribution so the model improves in small, careful steps. A smaller learning rate generally needs more trees to reach the same training error.
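The loop above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library implements it. It assumes squared-error loss, one-dimensional inputs, and decision stumps (single-split trees) standing in for "small trees". The names fit_stump, fit_boosting, and predict are made up for this sketch.

```python
# Minimal gradient boosting for squared-error regression on 1-D inputs.
# Standard library only; stumps stand in for the "small trees".

def fit_stump(xs, targets):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate splits of the form x <= t
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all inputs identical: fall back to a constant
        mean = sum(targets) / len(targets)
        return (xs[0], mean, mean)
    return best[1:]

def stump_predict(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def fit_boosting(xs, ys, n_trees=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # simple baseline: the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For the loss 0.5 * (y - p)**2 the negative gradient is y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Add a shrunken version of the new stump to the ensemble.
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy, noise-free data chosen only for illustration.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]
for n in (0, 10, 100):
    print(n, "trees, training MSE:", mse(fit_boosting(xs, ys, n_trees=n), xs, ys))
```

On this toy data the training error falls as trees are added. Training error alone says nothing about overfitting, so in practice the number of trees is chosen on held-out data.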
Why it is powerful
By sequentially focusing on the hardest remaining errors, boosting often reaches very high accuracy on structured (tabular) data. Libraries like XGBoost and LightGBM add explicit regularization and much faster training.
The cost
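Because each new tree keeps fitting whatever error remains, a boosted ensemble with enough trees can end up fitting noise in the training data. That makes boosting more prone to overfitting than bagging, so it needs careful tuning of tree depth, learning rate, and the number of trees, usually with early stopping.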
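Early stopping can be illustrated with a small helper. This is a sketch under assumptions: it takes a list of validation losses, one per boosting round, and a hypothetical patience parameter giving how many rounds without improvement to tolerate. The function name and the loss values are invented for illustration and do not come from any library or real experiment.

```python
def early_stopping_round(val_losses, patience=5):
    """Return how many trees to keep: the round with the lowest validation
    loss seen before `patience` rounds pass without improvement."""
    best_loss = float("inf")
    best_round = 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_round = loss, i
        elif i - best_round >= patience:
            break  # no improvement for `patience` rounds: stop adding trees
    return best_round

# Made-up validation losses: improving, then drifting upward as overfitting sets in.
val_losses = [1.0, 0.6, 0.4, 0.35, 0.36, 0.37, 0.40, 0.45, 0.30]
print(early_stopping_round(val_losses, patience=3))
```

A small patience stops soon after validation loss stalls, which saves training time but can miss a later improvement. A larger patience waits longer before giving up.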
Key idea
Gradient boosting adds shallow trees one at a time, each fitting the residual errors of the current ensemble, yielding strong but overfit-prone models.