Machine Learning

The LightGBM Specifics

Leaf-wise growth, gradient-based sampling, and feature bundling: the techniques that make LightGBM fast on big data.

5 min read · advanced

Speed on large datasets

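LightGBM is a gradient boosting library built for large data. It finds splits on histograms of binned feature values rather than on raw values, and pairs that with three ideas that cut work without much accuracy loss: leaf-wise tree growth, gradient-based one-side sampling, and exclusive feature bundling.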

Leaf-wise tree growth

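Many boosters grow trees level by level (depth-wise). LightGBM grows leaf-wise, always splitting the leaf with the largest loss reduction. For the same number of leaves this usually reaches a lower loss, but the resulting trees can become deep and unbalanced and can overfit, especially on small datasets, so capping the leaf count with num_leaves is essential.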
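
The growth rule can be sketched on a toy one-dimensional regression problem. This is a minimal illustration, not LightGBM's implementation: it scans raw values instead of histograms, uses squared error as the loss, and the helper names sse, best_split, and grow_leaf_wise are made up for this example.

```python
import heapq

def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(points):
    """Best threshold split of (x, y) points; returns (gain, left, right)."""
    points = sorted(points)
    parent = sse([y for _, y in points])
    best = (0.0, None, None)
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between equal x values
        left, right = points[:i], points[i:]
        gain = parent - sse([y for _, y in left]) - sse([y for _, y in right])
        if gain > best[0]:
            best = (gain, left, right)
    return best

def grow_leaf_wise(points, num_leaves):
    """Grow best-first: always split the leaf with the largest gain."""
    leaves = []   # leaves that have no split with positive gain
    heap = []     # splittable leaves, ordered by negative gain
    counter = 0   # unique tie-breaker so heapq never compares lists

    def push(leaf):
        nonlocal counter
        gain, left, right = best_split(leaf)
        if left is None:
            leaves.append(leaf)
        else:
            heapq.heappush(heap, (-gain, counter, leaf, left, right))
            counter += 1

    push(points)
    # Each split turns one leaf into two, so the leaf count grows by one.
    while heap and len(leaves) + len(heap) < num_leaves:
        _, _, _, left, right = heapq.heappop(heap)
        push(left)
        push(right)
    leaves.extend(item[2] for item in heap)
    return leaves

points = [(1, 0.0), (2, 0.0), (3, 0.0), (4, 10.0), (5, 10.0), (6, 30.0)]
for leaf in grow_leaf_wise(points, num_leaves=3):
    print(leaf)
```

With this data the first split isolates the single outlying point at x = 6, and the next split goes to the larger leaf because that is where the remaining gain is. A level-by-level grower would try to split both children before going deeper. The num_leaves argument is the only thing that stops the loop, which is why the cap matters.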

Two signature techniques

  • Gradient-based one-side sampling (GOSS) keeps all rows with large gradients and randomly samples the small-gradient rows, upweighting the sampled ones so that estimated split gains stay approximately unbiased. This focuses effort where error is high.
  • Exclusive feature bundling (EFB) merges sparse features that rarely take nonzero values in the same row into a single feature, shrinking the effective feature count.
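
Both techniques can be sketched in plain Python. This is a simplified illustration under stated assumptions, not LightGBM's code: goss_sample and bundle_features are hypothetical helpers, top_rate and other_rate borrow the names of the corresponding LightGBM parameters, max_conflicts is an invented knob, and the bundling step only groups features without merging their values into one column.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (row indices, row weights) for one GOSS-style sampling pass."""
    n = len(gradients)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    # Rank rows by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top, rest = order[:n_top], order[n_top:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))
    # Upweight the sampled small-gradient rows to compensate for the
    # rows that were dropped from the small-gradient group.
    amplify = (1.0 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * len(top) + [amplify] * len(sampled)
    return indices, weights

def bundle_features(columns, max_conflicts=0):
    """Greedily group feature columns that are rarely nonzero in the same row."""
    nonzero = [{r for r, v in enumerate(col) if v != 0} for col in columns]
    # Visit features with more nonzero rows first.
    order = sorted(range(len(columns)), key=lambda j: len(nonzero[j]), reverse=True)
    bundles = []
    for j in order:
        for b in bundles:
            clash = len(b["rows"] & nonzero[j])  # rows where both are nonzero
            if b["conflicts"] + clash <= max_conflicts:
                b["features"].append(j)
                b["rows"] |= nonzero[j]
                b["conflicts"] += clash
                break
        else:
            # No existing bundle can take this feature; start a new one.
            bundles.append({"features": [j], "rows": set(nonzero[j]), "conflicts": 0})
    return [sorted(b["features"]) for b in bundles]

gradients = [0.1 * i for i in range(100)]
rows, row_weights = goss_sample(gradients)
print(len(rows), "rows kept out of", len(gradients))

columns = [[1, 0, 0, 0], [0, 2, 0, 0], [3, 0, 0, 4], [0, 0, 5, 0]]
print(bundle_features(columns))
```

In the sampling example, 30 of 100 rows survive: the 20 with the largest gradients at weight 1, plus 10 random others at a higher weight. In the bundling example, features 0 and 2 are both nonzero in the first row, so they cannot share a bundle, while the remaining features fit alongside feature 2.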

Tuning notes

  • Control complexity mainly with num_leaves rather than max_depth, since growth is leaf-wise.
  • Increase min_data_in_leaf to fight the overfitting that leaf-wise growth invites.
  • It handles categorical features natively, without one-hot encoding, provided the columns are declared as categorical.
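
As a starting point, these notes might translate into a parameter dictionary like the one below. The values are illustrative assumptions, not recommendations from this lesson, and leaf_budget_ok is a hypothetical helper that checks the leaf count against the ceiling implied by a depth cap. The training call is left as a comment because it needs the lightgbm package and real data; the column name "city" is a placeholder.

```python
# Illustrative values only; tune them on your own validation data.
params = {
    "objective": "binary",
    "num_leaves": 31,        # main complexity control for leaf-wise trees
    "max_depth": 8,          # optional extra cap on depth (<= 0 means no limit)
    "min_data_in_leaf": 50,  # larger values make new leaves harder to create
    "learning_rate": 0.05,
}

def leaf_budget_ok(p):
    """True when num_leaves fits under the 2**max_depth ceiling of a depth-capped tree."""
    depth = p.get("max_depth", -1)
    return depth <= 0 or p["num_leaves"] <= 2 ** depth

print(leaf_budget_ok(params))

# With the lightgbm package installed, training would look roughly like:
#   import lightgbm as lgb
#   train_set = lgb.Dataset(X, label=y, categorical_feature=["city"])
#   model = lgb.train(params, train_set, num_boost_round=200)
```

A binary tree of depth d has at most 2**d leaves, so a num_leaves value above that ceiling has no effect once max_depth is set.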

Key idea

LightGBM grows trees leaf-wise for lower loss per leaf, then speeds training with GOSS sampling and exclusive feature bundling. Control overfitting through num_leaves and min_data_in_leaf.

Check yourself

1. How does LightGBM grow its trees by default?

2. What does gradient-based one-side sampling focus on?