The LightGBM Specifics

Leaf-wise growth, gradient-based sampling, and feature bundling: the techniques that make LightGBM fast on big data.

Speed on large datasets

LightGBM is a gradient boosting library built for large data. It pairs histogram-based split finding with three ideas that cut work without much accuracy loss: leaf-wise tree growth, gradient-based one-side sampling, and exclusive feature bundling.

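Leaf-wise tree growth

Many boosters grow trees level by level, splitting every node at a given depth before moving deeper. LightGBM grows leaf-wise, always splitting the leaf whose best split gives the largest loss reduction. This reaches lower loss with fewer leaves, but the resulting deep, unbalanced trees can overfit, so a cap on the number of leaves (num_leaves) is essential.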
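The leaf-wise rule can be sketched in a few lines of plain Python. This is a toy illustration, not LightGBM's implementation: it assumes a single feature whose rows are already sorted, uses squared error as the loss, and the helper names (sse, best_split, grow_leaf_wise) are hypothetical.

```python
import heapq

def sse(ys):
    """Sum of squared errors around the mean (the loss of one leaf)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(ys):
    """Best cut of a leaf whose rows are sorted by the single feature.
    Returns (gain, cut_index); cut_index is None if no cut reduces the loss."""
    parent = sse(ys)
    best_gain, best_cut = 0.0, None
    for cut in range(1, len(ys)):
        gain = parent - sse(ys[:cut]) - sse(ys[cut:])
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_gain, best_cut

def grow_leaf_wise(ys, num_leaves):
    """Repeatedly split the leaf with the largest gain until num_leaves is reached."""
    counter = 0  # tie-breaker so the heap never has to compare lists
    gain, cut = best_split(ys)
    heap = [(-gain, counter, ys, cut)]  # max-heap on gain via negation
    finished = []                       # leaves that cannot be split further
    while heap and len(heap) + len(finished) < num_leaves:
        _, _, rows, cut = heapq.heappop(heap)
        if cut is None:
            finished.append(rows)
            continue
        for child in (rows[:cut], rows[cut:]):
            counter += 1
            g, c = best_split(child)
            heapq.heappush(heap, (-g, counter, child, c))
    return finished + [item[2] for item in heap]

targets = [1, 1, 1, 1, 10, 10, 20, 20]
print(sorted(grow_leaf_wise(targets, 3)))
# [[1, 1, 1, 1], [10, 10], [20, 20]]
```

After the first split, only the right-hand leaf is split again because it holds all the remaining gain. A level-wise grower would have considered both children at that depth, which is why leaf-wise trees tend to come out unbalanced.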

Two signature techniques

Gradient-based one-side sampling (GOSS) keeps all rows with large gradients and randomly samples the small-gradient rows, focusing effort where error is high. The sampled small-gradient rows are up-weighted so that the gradient statistics used to score splits stay close to those of the full data.
Exclusive feature bundling (EFB) merges sparse features that rarely take nonzero values on the same row into a single feature, shrinking the effective feature count.
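Both techniques can be sketched with the standard library alone. The block below is a simplified illustration, not LightGBM's code. The function names are hypothetical. top_rate and other_rate mirror the names of LightGBM's GOSS parameters, and the (1 - top_rate) / other_rate weight follows the published description of GOSS. The bundling step is a plain greedy heuristic that only decides which columns share a bundle; it does not perform the value-offset merge.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {row_index: weight} for one boosting iteration."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(top_rate * n)
    rand_n = int(other_rate * n)
    top, rest = order[:top_n], order[top_n:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(rand_n, len(rest)))
    # Up-weight the sampled small-gradient rows to compensate for the rows dropped.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights

def bundle_features(columns, max_conflicts=0):
    """Greedily group columns that are (almost) never nonzero on the same row.
    Returns a list of bundles, each a list of column indices."""
    nonzero = [{r for r, v in enumerate(col) if v != 0} for col in columns]
    # Place the densest columns first; they are the hardest to fit.
    order = sorted(range(len(columns)), key=lambda j: len(nonzero[j]), reverse=True)
    bundles, rows_used, conflicts_used = [], [], []
    for j in order:
        for b in range(len(bundles)):
            conflicts = len(nonzero[j] & rows_used[b])
            if conflicts_used[b] + conflicts <= max_conflicts:
                bundles[b].append(j)
                rows_used[b] |= nonzero[j]
                conflicts_used[b] += conflicts
                break
        else:  # no existing bundle can take this column
            bundles.append([j])
            rows_used.append(set(nonzero[j]))
            conflicts_used.append(0)
    return bundles

# Three one-hot-like sparse columns and one dense column.
cols = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0], [1, 1, 1, 1]]
print(bundle_features(cols))  # [[3], [0, 1, 2]]
```

With 100 rows and the default rates above, goss_sample keeps the 20 rows with the largest absolute gradients at weight 1 and 10 of the remaining 80 at weight 8, so each tree is fit on 30 rows instead of 100.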

Tuning notes

Control complexity mainly with num_leaves rather than max_depth, since growth is leaf-wise.
Increase min_data_in_leaf to fight the overfitting that leaf-wise growth invites.
LightGBM handles categorical features natively, so one-hot encoding is not required.
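The notes above map onto a parameter dictionary roughly as follows. The values are illustrative starting points rather than recommendations, and is_consistent is a hypothetical helper. The commented-out calls at the end assume the lightgbm package is installed and that X, y, and a "city" column exist.

```python
params = {
    "objective": "binary",
    "num_leaves": 31,         # main complexity control for leaf-wise trees
    "max_depth": -1,          # values <= 0 mean no depth limit
    "min_data_in_leaf": 50,   # raised to damp overfitting on small leaves
    "learning_rate": 0.05,
}

def is_consistent(p):
    """True when num_leaves can actually be reached under the depth cap.
    A binary tree of depth d has at most 2**d leaves."""
    depth = p.get("max_depth", -1)
    return depth <= 0 or p["num_leaves"] <= 2 ** depth

print(is_consistent(params))                              # True
print(is_consistent({"num_leaves": 31, "max_depth": 4}))  # False: depth 4 allows only 16 leaves

# Hedged usage sketch (not executed here):
# import lightgbm as lgb
# train_set = lgb.Dataset(X, label=y, categorical_feature=["city"])
# booster = lgb.train(params, train_set)
```

Checking num_leaves against 2**max_depth is worth doing whenever both are set, because a depth cap that is too tight silently makes the leaf limit irrelevant.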

Key idea

LightGBM grows trees leaf-wise for lower loss per leaf, then speeds training with GOSS sampling and exclusive feature bundling. Control overfitting through num_leaves and min_data_in_leaf.
