Machine Learning

Ensembling Neural Nets

Combining multiple networks to cut variance and lift accuracy.

5 min read · advanced

Many models beat one

A single network's predictions vary with its random initialization and data order. Ensembling averages several independently trained models so that their partly uncorrelated errors cancel, typically lifting accuracy and calibration above those of the individual members.
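
The cancellation argument can be made concrete with a standard identity. As a simplifying assumption (not stated in the lesson), suppose each of the M members has prediction error e_m with the same variance σ² and the same pairwise correlation ρ. The variance of the averaged error is then:

```latex
\operatorname{Var}\!\left(\frac{1}{M}\sum_{m=1}^{M} e_m\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{M}\,\sigma^{2}
```

The second term shrinks as members are added, while the first is a floor set by how correlated the members are. With ρ = 0 the variance falls to σ²/M; with ρ = 1 averaging gains nothing.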

Ways to build diversity

  • Different seeds train the same architecture with different initializations and shuffles.
  • Different architectures combine models that make distinct errors.
  • Snapshot ensembles save several models from one run by cycling the learning rate, capturing diverse minima cheaply.
  • Bagging trains each model on a bootstrap resample of the training set, drawn with replacement.
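
Two of these ideas can be sketched in a few lines of Python. This is a minimal illustration, not a training recipe: the function names, the cosine shape of the cycle, and the cycle length are assumptions chosen for the example, and the actual training loop is omitted.

```python
import math
import numpy as np

def bootstrap_indices(n_samples, rng):
    """Bagging: draw n_samples row indices with replacement."""
    return rng.integers(0, n_samples, size=n_samples)

def snapshot_lr(step, steps_per_cycle, lr_max):
    """Cosine-annealed learning rate that restarts at lr_max every cycle.

    The cosine shape is an assumption for this sketch; any schedule that
    decays and then jumps back up serves the same purpose.
    """
    t = (step % steps_per_cycle) / steps_per_cycle  # position in cycle, in [0, 1)
    return 0.5 * lr_max * (1.0 + math.cos(math.pi * t))

def is_snapshot_step(step, steps_per_cycle):
    """True on the last step of each cycle, when the rate is lowest."""
    return (step + 1) % steps_per_cycle == 0

rng = np.random.default_rng(0)
idx = bootstrap_indices(1000, rng)  # indices for one bagged member

# Steps at which a snapshot of the weights would be saved (3 cycles of 100 steps).
snapshot_steps = [s for s in range(300) if is_snapshot_step(s, 100)]
```

Each bagged member would train on the rows selected by its own `idx`, while a snapshot run would save a copy of the weights at every step in `snapshot_steps` and ensemble those copies at test time.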

Combining predictions

Why it works

  • Averaging probabilities reduces variance more reliably than voting on hard labels.
  • The gain is largest when members are diverse, making correlated mistakes rare.
  • Ensembles are usually better calibrated than single models.
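
The difference between averaging probabilities and voting on hard labels shows up in a small NumPy sketch. The probabilities below are made up for illustration, and the helper names `soft_average` and `hard_vote` are hypothetical.

```python
import numpy as np

# probs[m, i, k]: probability that member m assigns to class k on example i.
# One example, two classes, three members (illustrative values).
probs = np.array([
    [[0.90, 0.10]],   # member 0: confident in class 0
    [[0.45, 0.55]],   # member 1: weakly prefers class 1
    [[0.40, 0.60]],   # member 2: weakly prefers class 1
])

def soft_average(probs):
    """Mean of the members' probability vectors, shape (examples, classes)."""
    return probs.mean(axis=0)

def hard_vote(probs):
    """Majority vote over each member's argmax label, shape (examples,)."""
    n_classes = probs.shape[-1]
    votes = probs.argmax(axis=-1)  # (members, examples)
    counts = np.stack(
        [(votes == k).sum(axis=0) for k in range(n_classes)], axis=-1
    )
    return counts.argmax(axis=-1)

avg = soft_average(probs)
soft_pred = avg.argmax(axis=-1)   # class chosen by averaging probabilities
vote_pred = hard_vote(probs)      # class chosen by majority vote
```

Here the two rules disagree: averaging picks class 0 because one member is very confident, while voting picks class 1 because two members lean that way slightly. Voting discards each member's confidence, which is one reason averaging tends to behave more smoothly.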

Practical notes

  • Inference cost scales with the number of members; distillation can later compress the ensemble into one model.
  • Snapshot and cyclic ensembles get much of the benefit for the price of a single training run.
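
One common way to set up distillation is to train a single student to match the ensemble's averaged probabilities. The sketch below shows only the loss, under assumptions: the function names are hypothetical, the member probabilities are invented, and temperature scaling and the optimizer are left out.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_probs):
    """Mean cross-entropy of the student against the ensemble's soft targets."""
    log_p = np.log(softmax(student_logits) + 1e-12)
    return float(-(teacher_probs * log_p).sum(axis=-1).mean())

# member_probs[m, i, k]: illustrative outputs of two members on two examples.
member_probs = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2]],
])
teacher = member_probs.mean(axis=0)  # the ensemble's averaged prediction

# A student that reproduces the teacher exactly vs. one that predicts uniformly.
matched = distillation_loss(np.log(teacher), teacher)
uniform = distillation_loss(np.zeros_like(teacher), teacher)
```

The loss is lowest when the student's probabilities equal the teacher's, so `matched` is smaller than `uniform`. Minimizing it pushes one network toward the ensemble's behavior at a single model's inference cost.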

Key idea

Ensembling averages diverse, independently trained networks so uncorrelated errors cancel, improving accuracy and calibration. Snapshot methods deliver much of the gain from a single run, and distillation can compress the cost.

Check yourself

1. Why does ensembling improve accuracy?

2. How does a snapshot ensemble obtain multiple models cheaply?