Augmenting at inference
Augmentation is usually a training trick, but test-time augmentation (TTA) applies it at inference too. You create several augmented versions of a test input, predict on each, and average the predictions. The effect resembles ensembling a single model over many views.
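The procedure
- Generate several label-preserving views of the test input.
- Run the same trained model on each view.
- Average the predicted probabilities across views to get the final prediction.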
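The transform, predict, and average loop can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes NumPy, and the names `tta_predict`, `toy_predict_proba`, `identity`, and `hflip` are hypothetical. The toy model is a fixed random linear classifier standing in for a trained network.

```python
import numpy as np

def tta_predict(predict_proba, x, transforms):
    """Average class probabilities over several augmented views of x.

    predict_proba: hypothetical callable, (N, H, W) array -> (N, C) probabilities.
    x:             batch of inputs, shape (N, H, W).
    transforms:    list of label-preserving functions applied to the whole batch.
    """
    # One prediction per view, stacked into shape (num_views, N, C).
    per_view = np.stack([predict_proba(t(x)) for t in transforms])
    # The mean over the view axis is the TTA prediction, shape (N, C).
    return per_view.mean(axis=0)

# Assumed views: the identity and a horizontal flip.
identity = lambda batch: batch
hflip = lambda batch: batch[:, :, ::-1]

# Toy stand-in for a trained model: a fixed random linear layer plus softmax.
rng = np.random.default_rng(0)
W = rng.normal(size=(8 * 8, 3))

def toy_predict_proba(batch):
    logits = batch.reshape(len(batch), -1) @ W
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

x = rng.random((4, 8, 8))  # four fake 8x8 "images"
probs = tta_predict(toy_predict_proba, x, [identity, hflip])
print(probs.shape)  # (4, 3)
```

Because each view yields a probability vector, their mean is also a probability vector, so `probs` can be used wherever a single-view prediction would be.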
Why it helps
- Each view exposes different cues, and averaging smooths out idiosyncratic errors on any single view.
- It tends to improve both accuracy and calibration, much like a small ensemble.
- It needs no extra training, only more inference compute.
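One way to make the smoothing argument concrete is an idealized variance calculation. The symbols are assumptions introduced for illustration only: $K$ views whose per-view prediction errors $\epsilon_k$ share a common variance $\sigma^2$ and a common pairwise correlation $\rho$.

```latex
\bar{\epsilon} = \frac{1}{K}\sum_{k=1}^{K} \epsilon_k,
\qquad
\operatorname{Var}(\bar{\epsilon}) = \rho\,\sigma^2 + \frac{1-\rho}{K}\,\sigma^2 .
```

Under these assumptions the uncorrelated part of the error shrinks as $1/K$, while the shared part $\rho\,\sigma^2$ remains, so averaging helps most when the views make different mistakes.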
Choosing transforms
- Use only label-preserving transforms, the same rule as for training augmentation.
- Mild geometric and color changes work best; extreme distortions can hurt.
- Average predicted probabilities rather than counting hard votes, so each view's confidence carries into the result.
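The difference between averaging probabilities and counting hard votes shows up in a small example. The numbers below are made up to make the two rules disagree; they are not results from any model.

```python
import numpy as np

# Hypothetical probabilities for one input: 3 views, 2 classes.
view_probs = np.array([
    [0.45, 0.55],  # view 1 narrowly prefers class 1
    [0.45, 0.55],  # view 2 narrowly prefers class 1
    [0.95, 0.05],  # view 3 strongly prefers class 0
])

# Soft voting: average the probabilities, then take the argmax.
mean_probs = view_probs.mean(axis=0)
soft_label = int(mean_probs.argmax())

# Hard voting: take each view's argmax, then pick the most common label.
hard_votes = view_probs.argmax(axis=1)
hard_label = int(np.bincount(hard_votes, minlength=view_probs.shape[1]).argmax())

print(soft_label, hard_label)  # 0 1
```

Here the averaged probabilities select class 0 while the majority vote selects class 1, because voting discards how confident each view was. The averaged vector also remains a probability distribution, which a vote count does not provide.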
Practical notes
- Inference cost grows in proportion to the number of views, so weigh the accuracy gain against the added latency.
- It pairs well with model ensembles when competition-grade accuracy is the goal.
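The cost trade-off is simple arithmetic. The sketch below assumes views are processed one after another with a fixed per-view latency (batching views on an accelerator may cost less); `per_view_ms` and `budget_ms` are hypothetical inputs, and the numbers are illustrative, not measurements.

```python
def tta_latency_ms(per_view_ms, num_views):
    """Estimated latency when views are processed sequentially."""
    return per_view_ms * num_views

def max_views(per_view_ms, budget_ms):
    """Largest number of views that fits the latency budget (at least 1)."""
    return max(1, int(budget_ms // per_view_ms))

print(tta_latency_ms(12.0, 5))  # 60.0
print(max_views(12.0, 50.0))    # 4
```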
Key idea
Test-time augmentation predicts on several label-preserving views of each input and averages the results, acting as a one-model ensemble. It tends to lift accuracy and calibration at the cost of extra inference compute, with no retraining.