Stacked Generalization

Beyond simple averaging

Ensembles improve accuracy by combining models, and the simplest combination just averages or votes. Stacking, short for stacked generalization, goes further by learning how to combine base models with a second model called the meta learner.

How stacking works

The base models should be diverse, for example a tree ensemble, a linear model, and a neural network, so their errors differ:

Each base model makes predictions
A meta model takes those predictions as its input features
The meta model learns which base models to trust in which situations

Because a base model is naturally good at predicting data it trained on, stacking uses out of fold predictions, generated through cross validation, so the meta learner sees honest estimates rather than memorized answers.

When to use it

Stacking often wins competitions by squeezing out the last bit of accuracy, but it adds complexity and risk of overfitting the meta layer. Keeping the meta model simple, like a regularized linear model, usually works best.

Key idea

Stacking trains a meta model on the out of fold predictions of diverse base models, learning a smarter combination than plain averaging.

Stacked Generalization

Beyond simple averaging

How stacking works

When to use it

Key idea

Check yourself