Test Time Augmentation

Averaging predictions over augmented copies of each test input for an accuracy boost without retraining.

Augmenting at inference

Data augmentation is usually a training technique, but test time augmentation applies it at inference as well. You create several augmented versions of a test input, run the model on each, and average the predictions. The effect resembles an ensemble built from a single model evaluated on many views of the same input.

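The procedure

Generate several augmented views of the test input with label preserving transforms, often including the unmodified input as one of them.
Run the model on every view to obtain one vector of class probabilities per view.
Average these vectors element-wise and take the class with the highest mean probability as the prediction.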
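
The following is a minimal sketch of the procedure. It assumes Python with NumPy and images stored as 2-D arrays. The model is represented by a hypothetical `predict_proba` callable. The names `tta_predict`, `TRANSFORMS`, and `toy_predict_proba` are illustrative, and the four transforms are example choices rather than a prescribed set.

```python
import numpy as np


def tta_predict(predict_proba, image, transforms):
    """Average class probabilities over augmented views of one input.

    predict_proba: hypothetical callable, image -> 1-D probability vector.
    transforms: list of label preserving functions, image -> image.
    """
    # One probability vector per view, stacked to shape (num_views, num_classes).
    probs = np.stack([predict_proba(t(image)) for t in transforms])
    # Element-wise mean over views is the averaged prediction.
    return probs.mean(axis=0)


# Hypothetical mild views for a grayscale image with values in [0, 1].
TRANSFORMS = [
    lambda x: x,                           # unmodified input
    lambda x: x[:, ::-1],                  # horizontal flip
    lambda x: np.clip(x * 1.1, 0.0, 1.0),  # slightly brighter
    lambda x: np.clip(x * 0.9, 0.0, 1.0),  # slightly darker
]


def toy_predict_proba(image):
    """Hypothetical two-class stand-in for a trained model.

    Class 1 probability is the mean brightness of the image.
    """
    score = float(image.mean())
    return np.array([1.0 - score, score])


image = np.full((8, 8), 0.6)
avg = tta_predict(toy_predict_proba, image, TRANSFORMS)
print(avg, avg.argmax())
```

Passing `TRANSFORMS[:1]` reduces this to ordinary single-view prediction. That gives a convenient baseline to compare against.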

Why it helps

Each view exposes different cues, and averaging smooths out idiosyncratic errors on any single view.
It tends to improve both accuracy and calibration, much like a small ensemble.
It needs no extra training, only more inference compute.
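
The next sketch makes the averaging argument concrete. It uses made-up probabilities for four views of one input whose true class is 0, and one view (say, an unlucky crop) is confidently wrong. The numbers are illustrative, not measured. Log loss means the negative log-likelihood of the true class.

```python
import numpy as np

# Made-up per-view probabilities for a 2-class problem; the true class is 0.
view_probs = np.array([
    [0.70, 0.30],
    [0.65, 0.35],
    [0.10, 0.90],  # an idiosyncratic, confidently wrong view
    [0.60, 0.40],
])
true_class = 0

# Test time augmentation: average the probabilities over views.
avg = view_probs.mean(axis=0)

per_view_pred = view_probs.argmax(axis=1)  # [0 0 1 0]
tta_pred = int(avg.argmax())               # 0

# Log loss of the true class for each view and for the averaged prediction.
per_view_nll = -np.log(view_probs[:, true_class])
tta_nll = float(-np.log(avg[true_class]))
print(per_view_pred, tta_pred)
print(per_view_nll.round(3), round(tta_nll, 3))
```

Here the wrong third view is outvoted. The log loss of the averaged prediction (about 0.67) is below the mean single-view log loss (about 0.90).

That second relationship is not a coincidence. Because -log is convex, Jensen's inequality guarantees that the log loss of an averaged prediction never exceeds the average log loss of the views. It can still be worse than the best single view, as it is here.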

Choosing transforms

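Use only label preserving transforms, the same rule as for training augmentation: a horizontal flip suits most natural photos, but it can change the meaning of text or digits.
Mild geometric and color changes, such as flips, small shifts, or slight brightness adjustments, tend to work best; extreme distortions can push views outside the training distribution and hurt accuracy.
Average the predicted probabilities rather than taking hard votes, so that each view's confidence still counts toward the result.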
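
Averaging probabilities and hard voting diverge when views disagree with different confidence. This sketch uses made-up probabilities over three classes: two views narrowly favor class 0, while a third strongly favors class 1. The function names are illustrative.

```python
import numpy as np

# Made-up per-view probabilities over three classes.
view_probs = np.array([
    [0.45, 0.40, 0.15],
    [0.50, 0.45, 0.05],
    [0.05, 0.90, 0.05],
])


def soft_vote(probs):
    """Average probabilities across views, then pick the top class."""
    return int(probs.mean(axis=0).argmax())


def hard_vote(probs):
    """Majority vote over each view's top class (ties go to the lowest index)."""
    labels = probs.argmax(axis=1)
    counts = np.bincount(labels, minlength=probs.shape[1])
    return int(counts.argmax())


print(hard_vote(view_probs))  # 0: two of three views pick class 0
print(soft_vote(view_probs))  # 1: the confident view outweighs two narrow margins
```

Neither answer is guaranteed to be correct. The point is that hard voting discards how confident each view is, while averaging keeps that information.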

Practical notes

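Inference cost grows linearly with the number of views (eight views cost roughly eight forward passes per input), so weigh the accuracy gain against the added latency.
It combines naturally with model ensembles, averaging over both models and views, when competition grade accuracy matters more than speed.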
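
When test time augmentation is combined with a model ensemble, one simple option is to average over both axes at once. The sketch below assumes predictions have already been collected into an array of shape (models, views, examples, classes). It also measures validation accuracy using only the first k views, which is one way to judge how many views justify their latency. The array layout and the function names `ensemble_tta_probs` and `accuracy_vs_views` are illustrative assumptions.

```python
import numpy as np


def ensemble_tta_probs(probs):
    """Average over models (axis 0) and views (axis 1).

    probs: array of shape (num_models, num_views, num_examples, num_classes).
    Returns an array of shape (num_examples, num_classes).
    """
    return probs.mean(axis=(0, 1))


def accuracy_vs_views(probs, labels):
    """Accuracy when only the first k views are averaged, for k = 1..num_views."""
    num_views = probs.shape[1]
    accs = []
    for k in range(1, num_views + 1):
        preds = ensemble_tta_probs(probs[:, :k]).argmax(axis=1)
        accs.append(float((preds == labels).mean()))
    return accs


# Random stand-in data: 2 models, 4 views, 5 examples, 3 classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=(2, 4, 5))
labels = rng.integers(0, 3, size=5)
print(ensemble_tta_probs(probs).shape)  # (5, 3)
print(accuracy_vs_views(probs, labels))
```

The random data only exercises the shapes. On a real validation set, the accuracy list shows where adding further views stops paying for its cost.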

Key idea

Test time augmentation runs the model on several label preserving views of each input and averages the predicted probabilities, acting as a single-model ensemble. It typically improves accuracy and calibration at the cost of extra inference compute, with no retraining.