Machine Learning

Stacking Ensembles

Train a meta model to learn how best to combine diverse base predictions.

Stacking, short for stacked generalization, combines diverse base models by training a second-level meta model that learns how to weight their predictions. Rather than averaging the base predictions with fixed weights, it learns the combination from data.

How it works

  • Train several diverse base models on the training data.
  • Generate each base model's predictions; these become the input features for the meta model.
  • Train a meta model on those predictions to produce the final output.
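
The three steps above can be sketched with scikit-learn's StackingClassifier. This is a minimal illustration, not a tuned pipeline: the synthetic dataset, the choice of base models, and all hyperparameters are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, purely for illustration (assumption, not from the lesson).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: several diverse base models (different algorithm families).
base_models = [
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Steps 2 and 3: StackingClassifier builds the meta features with internal
# cross-validation (cv=5) and fits the meta model on them.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

# Base-model predictions as the meta model sees them at prediction time.
meta_features = stack.transform(X_test)
test_accuracy = stack.score(X_test, y_test)
print(meta_features.shape, round(test_accuracy, 3))
```

Each row of meta_features holds the base models' predicted probabilities for one test sample; the logistic regression then maps those to the final prediction.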

Avoiding leakage

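The crucial detail is generating the base predictions with out-of-fold cross-validation: split the training data into K folds and predict each fold with base models fit on the other K-1 folds. If the meta model were instead trained on predictions the base models made on their own training rows, those predictions would be overoptimistic, because the base models have already seen the labels, and the meta model would learn to over-trust whichever base model overfits most. Out-of-fold predictions keep the meta features honest.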
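
To make the leakage concrete, here is a small standard-library sketch. It assumes a hypothetical toy dataset and a 1-nearest-neighbour base model, chosen because it memorizes its training rows. In-sample predictions look perfect, while out-of-fold predictions show how the model really behaves on unseen rows. In scikit-learn, cross_val_predict produces out-of-fold predictions in the same spirit.

```python
import random

random.seed(0)

# Hypothetical toy data: one noisy feature and a binary label.
def make_row():
    label = random.randint(0, 1)
    return (label + random.gauss(0, 1.0), label)

rows = [make_row() for _ in range(200)]

def predict_1nn(train_rows, x):
    """1-nearest-neighbour base model: label of the closest training row."""
    return min(train_rows, key=lambda r: abs(r[0] - x))[1]

def accuracy(preds, rows):
    return sum(p == r[1] for p, r in zip(preds, rows)) / len(rows)

# Leaky: every row is predicted by a model that has already seen that row,
# so its nearest neighbour is itself and the prediction is trivially right.
in_sample = [predict_1nn(rows, x) for x, _ in rows]

# Honest: with K folds, each row is predicted by a model fit on the
# other K-1 folds only.
k = 5
out_of_fold = [None] * len(rows)
for fold in range(k):
    train = [rows[i] for i in range(len(rows)) if i % k != fold]
    for i in range(fold, len(rows), k):  # indices belonging to this fold
        out_of_fold[i] = predict_1nn(train, rows[i][0])

print("in-sample accuracy:  ", accuracy(in_sample, rows))
print("out-of-fold accuracy:", accuracy(out_of_fold, rows))
```

A meta model trained on the in-sample column would conclude that this base model is never wrong; the out-of-fold column gives it a realistic picture of the model's errors.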

Practical notes

  • Base models should be diverse, mixing algorithm types for complementary strengths.
  • The meta model is usually simple, such as linear or logistic regression, to avoid overfitting to the base predictions.
  • Stacking often squeezes out extra accuracy over the best single model, at the cost of added complexity and compute.
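
One benefit of a simple linear meta model is that its learned weights can be inspected directly. The sketch below assumes scikit-learn's StackingRegressor with a synthetic regression dataset and two arbitrarily chosen base models; none of these choices come from the lesson.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=1.0)),
        ("tree", DecisionTreeRegressor(max_depth=4, random_state=0)),
    ],
    final_estimator=LinearRegression(),  # simple linear meta model
    cv=5,  # meta features come from 5-fold out-of-fold predictions
)
stack.fit(X, y)

# The linear meta model learns one coefficient per base model.
names = [name for name, _ in stack.estimators]
weights = dict(zip(names, stack.final_estimator_.coef_))
print(weights)
```

A simple average would fix both weights at 0.5; here the weights are fit to the out-of-fold predictions instead.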

Key idea

Stacking trains a meta model on out-of-fold base predictions to learn the best combination, leveraging diverse models while preventing the leakage that naive reuse of in-sample predictions would cause.

Check yourself

1. What does the meta model in stacking learn from?

2. Why use out-of-fold predictions for the meta features?

3. What kind of meta model is typically preferred?