Machine Learning

Stacked Generalization

Training a meta model to combine the predictions of diverse base models.

Beyond simple averaging

Ensembles improve accuracy by combining models. The simplest combinations average the models' predictions or take a majority vote. Stacking, short for stacked generalization, goes further: it trains a second model, called the meta learner or meta model, to learn how the base models' predictions should be combined.
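
For reference, the averaging and voting baselines can be sketched in plain Python. The model names and probabilities below are made-up illustrative values, not outputs of real models.

```python
from collections import Counter

# Hypothetical class-1 probabilities from three base models for four samples.
probs = {
    "tree_ensemble": [0.9, 0.4, 0.2, 0.6],
    "linear_model": [0.8, 0.6, 0.1, 0.4],
    "neural_net": [0.7, 0.3, 0.3, 0.7],
}


def average_probs(probs):
    """Soft combination: mean probability per sample across all models."""
    columns = zip(*probs.values())  # one tuple of model outputs per sample
    return [sum(col) / len(col) for col in columns]


def majority_vote(probs, threshold=0.5):
    """Hard combination: each model votes for a class, the most common wins."""
    votes = []
    for col in zip(*probs.values()):
        labels = [int(p >= threshold) for p in col]
        votes.append(Counter(labels).most_common(1)[0][0])
    return votes


avg = average_probs(probs)
vote = majority_vote(probs)
print(avg)
print(vote)
```

Both rules treat every model as equally trustworthy on every sample, which is the limitation stacking tries to remove.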

How stacking works

The base models should be diverse, for example a tree ensemble, a linear model, and a neural network, so that their errors differ. Stacking then proceeds in three steps:

  • Each base model makes predictions
  • A meta model takes those predictions as its input features
  • The meta model learns which base models to trust in which situations

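Because a base model is naturally good at predicting the data it was trained on, its in-sample predictions look more accurate than they really are. Stacking therefore uses out-of-fold predictions generated through cross-validation: each training example is predicted by a copy of the base model that never saw that example during fitting, so the meta learner sees honest estimates rather than memorized answers.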
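
A minimal sketch of the out-of-fold mechanism, written with NumPy only so every step is visible. The synthetic data, the two base models (a least-squares line and a k-nearest-neighbour regressor), and all function names are assumptions chosen for illustration; a real pipeline would more likely rely on a library's cross-validation utilities.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.5 * X[:, 0] + rng.normal(0, 0.1, size=200)


def fit_linear(X_tr, y_tr):
    """Least-squares line; returns a predict function."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def fit_knn(X_tr, y_tr, k=5):
    """k-nearest-neighbour regressor on one feature; returns a predict function."""
    def predict(X_new):
        dists = np.abs(X_new[:, None, 0] - X_tr[None, :, 0])
        nearest = np.argsort(dists, axis=1)[:, :k]
        return y_tr[nearest].mean(axis=1)
    return predict


base_fitters = [fit_linear, fit_knn]


def out_of_fold_predictions(X, y, fitters, n_folds=5, seed=0):
    """Predict each row with models fitted on the other folds only."""
    n = len(X)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    oof = np.zeros((n, len(fitters)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        for j, fit in enumerate(fitters):
            model = fit(X[train], y[train])      # never sees the held-out rows
            oof[held_out, j] = model(X[held_out])
    return oof


def fit_ridge(Z, y, alpha=1.0):
    """Simple meta-model: ridge regression with an unpenalized intercept."""
    A = np.column_stack([np.ones(len(Z)), Z])
    penalty = alpha * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)


oof = out_of_fold_predictions(X, y, base_fitters)
meta_coef = fit_ridge(oof, y)  # meta-model is trained on out-of-fold columns


def stacked_predict(X_new):
    """Refit base models on all data, then combine them with the meta-model."""
    full_models = [fit(X, y) for fit in base_fitters]
    Z = np.column_stack([m(X_new) for m in full_models])
    return meta_coef[0] + Z @ meta_coef[1:]
```

Each row of `oof` holds one prediction per base model, and those columns are the only features the meta-model ever trains on.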

When to use it

Stacking is popular in machine learning competitions because it can squeeze out the last bit of accuracy, but it adds complexity and the risk of overfitting the meta layer. Keeping the meta model simple, for example a regularized linear model, usually works best.
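
One way to follow this advice is scikit-learn's `StackingClassifier` with a logistic regression as the meta-model. The sketch below assumes scikit-learn is installed; the synthetic dataset, the choice of base models, and the hyperparameters are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    # Simple meta-model: logistic regression applies an L2 penalty by default.
    final_estimator=LogisticRegression(C=1.0),
    # The meta-model is fitted on cross-validated base-model predictions.
    cv=5,
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

A smaller `C` strengthens the regularization of the meta-model, which is one lever for limiting overfitting in the meta layer.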

Key idea

Stacking trains a meta model on the out-of-fold predictions of diverse base models, learning a smarter combination than plain averaging.

Check yourself

1. What does the meta model in stacking take as input?

2. Why use out-of-fold predictions for the meta model?