Stacking Ensembles
Stacking, short for stacked generalization, combines diverse base models by training a second-level meta model that learns how to combine their predictions. Instead of applying a fixed rule such as simple averaging, it learns the weighting from data.
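For regression with a linear meta model and squared error as its loss (an assumption made for this illustration), the combination can be written as below. The notation is introduced here for illustration only: $f_1, \dots, f_M$ are the base models, $w_0, \dots, w_M$ are the meta model's parameters, and simple averaging is the special case $w_0 = 0$, $w_m = 1/M$.

```latex
\hat{y}(x) = w_0 + \sum_{m=1}^{M} w_m \, f_m(x),
\qquad
(w_0, \dots, w_M) = \arg\min_{w} \sum_{i=1}^{n} \Big( y_i - w_0 - \sum_{m=1}^{M} w_m \, \tilde{f}_m(x_i) \Big)^2
```

Here $\tilde{f}_m(x_i)$ denotes base model $m$'s prediction for training row $i$ from a copy of that model that was not trained on row $i$, which is the point of the leakage discussion that follows.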
How it works
- Train several diverse base models on the training data.
- Collect their predictions for each training row; these become the input features for the meta model.
- Train a meta model on those predictions to produce the final output.
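The third step is what separates stacking from simple averaging. A minimal sketch, assuming a regression task with two base models and a linear meta model without intercept fit by least squares: the lists `p1` and `p2` are made-up toy numbers standing in for out-of-fold base predictions, and `fit_two_weights` is a hypothetical helper, not a library function.

```python
# Sketch of step 3: fit a linear meta model (no intercept) on base predictions.
# Assumption: p1 and p2 are toy numbers standing in for the out-of-fold
# predictions of two hypothetical base models.

def fit_two_weights(p1, p2, y):
    """Least-squares weights (w1, w2) minimising sum((y - w1*p1 - w2*p2)^2)."""
    # Normal equations: [[a, b], [b, c]] @ [w1, w2] = [d, e]
    a = sum(u * u for u in p1)
    b = sum(u * v for u, v in zip(p1, p2))
    c = sum(v * v for v in p2)
    d = sum(u * t for u, t in zip(p1, y))
    e = sum(v * t for v, t in zip(p2, y))
    det = a * c - b * b
    if det == 0:
        raise ValueError("base predictions are collinear")
    return (c * d - b * e) / det, (a * e - b * d) / det

def mse(pred, y):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

y = [1.0, 2.0, 3.0, 4.0, 5.0]
p1 = [1.1, 1.9, 3.2, 3.9, 5.1]  # accurate base model (toy values)
p2 = [2.0, 1.0, 4.5, 2.5, 6.0]  # noisier base model (toy values)

w1, w2 = fit_two_weights(p1, p2, y)
stacked = [w1 * u + w2 * v for u, v in zip(p1, p2)]
averaged = [(u + v) / 2 for u, v in zip(p1, p2)]
print(f"weights: {w1:.3f}, {w2:.3f}")
print(f"stacked MSE: {mse(stacked, y):.4f}  averaged MSE: {mse(averaged, y):.4f}")
```

Because least squares minimises training error over every weight pair, including (0.5, 0.5), the learned combination cannot fit these rows worse than the plain average. Here it puts a weight of about 1.1 on the accurate model and a small negative weight on the noisy one; the weights are not constrained to be non-negative or to sum to 1.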
Avoiding leakage
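The crucial detail is generating the base predictions out of fold with cross-validation: split the training data into K folds, train each base model on K-1 of them, and predict the fold that was held out, so every row's meta feature comes from a model that never saw that row. If the meta model were instead trained on predictions the base models made on their own training rows, those predictions would be overly optimistic, and the meta model would learn to trust whichever base model overfits most; this is a form of leakage. Out-of-fold predictions keep the meta features honest.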
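A quick way to see the leakage is a 1-nearest-neighbor regressor, which memorizes its training rows. The sketch below, assuming toy 1-D data and a simple interleaved K-fold split (fold f holds every row whose index mod K equals f), compares its in-sample predictions with out-of-fold ones; `fit_1nn` and `out_of_fold_predictions` are illustrative helpers, not a library API.

```python
# Sketch: in-sample vs out-of-fold predictions for a 1-nearest-neighbor
# regressor on toy 1-D data. Assumption: fold f holds every row with index % k == f.

def fit_1nn(xs, ys):
    """Return a predictor that outputs the target of the closest training x."""
    pairs = list(zip(xs, ys))
    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]
    return predict

def out_of_fold_predictions(fit, xs, ys, k):
    """Predict each row with a model trained on the other k-1 folds."""
    n = len(xs)
    preds = [None] * n
    for fold in range(k):
        held_out = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])
    return preds

def mse(pred, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, ys)) / len(ys)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.3, 1.2, 1.8, 3.4, 3.9, 5.3, 5.8, 7.1]  # toy data, roughly y = x plus noise

in_sample_model = fit_1nn(xs, ys)
in_sample = [in_sample_model(x) for x in xs]         # model has seen every row
oof = out_of_fold_predictions(fit_1nn, xs, ys, k=4)  # model never saw its own row

print("in-sample MSE:", mse(in_sample, ys))  # 0.0: 1-NN memorizes its training rows
print("out-of-fold MSE:", round(mse(oof, ys), 3))
```

The in-sample error is exactly zero, so a meta model trained on those predictions would treat the 1-NN model as perfect; the out-of-fold error is an honest estimate of how it does on rows it has not seen.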
Practical notes
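- Base models should be diverse, mixing algorithm types (for example tree ensembles, linear models, and nearest-neighbor models) so that their errors are less correlated and their strengths complement each other.
- The meta model is usually simple, such as linear or logistic regression, so that it does not overfit the base predictions.
- Stacking can add some accuracy over the best single base model, but it adds complexity and compute: with K-fold out-of-fold predictions, each base model is trained K times, plus once more on the full training data for use at prediction time.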
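For classification, a common simple meta model is logistic regression on the base models' predicted probabilities. A minimal pure-Python sketch, assuming two hypothetical base models whose out-of-fold probabilities are the made-up lists `p_a` and `p_b`, plain gradient descent, and a small L2 penalty on the weights as one way to keep the meta model from overfitting; centering the probabilities at 0.5 is a choice made for this sketch, and `fit_logistic_meta` is an illustrative helper, not a library function.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_meta(features, labels, lr=0.5, steps=2000, l2=0.01):
    """Gradient-descent logistic regression with an L2 penalty on the weights.

    features: list of rows, one column per base model.
    Returns (intercept, weights); the intercept is not penalised.
    """
    n, m = len(features), len(features[0])
    b, w = 0.0, [0.0] * m
    for _ in range(steps):
        grad_b, grad_w = 0.0, [0.0] * m
        for row, y in zip(features, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - y
            grad_b += err
            for j in range(m):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return b, w

def predict_proba(b, w, row):
    """Meta model's probability of class 1 for one row of meta features."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))

# Toy out-of-fold probabilities from two hypothetical base models.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
p_a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]  # informative model
p_b = [0.6, 0.4, 0.4, 0.6, 0.6, 0.4, 0.4, 0.6]  # uninformative model

# Meta features: each base probability centred at 0.5 (a choice for this sketch).
meta = [[a - 0.5, c - 0.5] for a, c in zip(p_a, p_b)]
b, (w_a, w_b) = fit_logistic_meta(meta, labels)
print(f"intercept={b:.3f}  w_a={w_a:.3f}  w_b={w_b:.3f}")

# Combine new base-model outputs (0.85 and 0.4) for an unseen row.
print(round(predict_proba(b, [w_a, w_b], [0.85 - 0.5, 0.4 - 0.5]), 3))
```

On this toy data the meta model assigns a clearly positive weight to the informative model and a weight of essentially zero to the uninformative one, which is the kind of weighting stacking is meant to learn. A real project would more likely use a library implementation than hand-written gradient descent.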
Key idea
Stacking trains a meta model on out-of-fold base predictions to learn how best to combine diverse models, avoiding the leakage that reusing in-sample predictions would cause.