Data Augmentation for Vision

Data augmentation expands a training set by applying random, label-preserving transformations to images. It is one of the cheapest ways to improve the generalization of a vision model.

Common transforms

The transformations must not change the correct label.

Horizontal flips mirror the image left to right, which is useful when left-right orientation does not affect the label.
Crops and resizes show the network different framings of the same object.
Color jitter shifts brightness, contrast, and saturation.
Rotations and small shifts add geometric variety.
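
The transforms listed above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than a reference implementation: the image is assumed to be a small grayscale grid stored as nested lists with values in 0..255, and the helper names (hflip, random_crop, jitter_brightness, augment) are hypothetical. In practice an image library would do this work on arrays or tensors.

```python
import random

def hflip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def random_crop(img, out_h, out_w, rng):
    """Cut a random out_h x out_w window out of the image."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - out_h)    # randint is inclusive on both ends
    left = rng.randint(0, w - out_w)
    return [row[left:left + out_w] for row in img[top:top + out_h]]

def jitter_brightness(img, max_delta, rng):
    """Scale all pixels by one random factor in [1 - max_delta, 1 + max_delta]."""
    factor = 1.0 + rng.uniform(-max_delta, max_delta)
    # Clamp to the valid 0..255 range after scaling.
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def augment(img, rng):
    """Apply a coin-flip mirror, a random crop, and brightness jitter in sequence."""
    if rng.random() < 0.5:
        img = hflip(img)
    img = random_crop(img, out_h=3, out_w=3, rng=rng)
    return jitter_brightness(img, max_delta=0.2, rng=rng)

rng = random.Random(0)  # seeded so the run is repeatable
image = [[10 * (r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 toy image
augmented = augment(image, rng)
```

Every call to augment draws new random parameters, so the same source image yields a different 3x3 variant each time while its label stays the same.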

Why it works

Augmentation teaches the network that a cat is still a cat when flipped, brightened, or slightly rotated. This invariance reduces overfitting: the model sees more varied data, so it is less able to memorize exact pixel values.

Cautions

The transform must preserve meaning. A horizontal flip can hurt a digit recognition task because a mirrored six is no longer a valid six. Strong augmentation can also make training harder, so its strength is tuned like any other hyperparameter.
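
One way to make these cautions concrete is to build the augmentation per task, with the flip switched off for digits and the strength exposed as a tunable value. The sketch below is only an illustration under stated assumptions: SIX is a crude hand-drawn bitmap, and make_augment, allow_hflip, and strength are hypothetical names, not part of any library.

```python
import random

def hflip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

# Crude 5x4 bitmap of the digit six (1 = ink, 0 = background).
SIX = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

def make_augment(allow_hflip, strength):
    """Build an augment function for one task.

    strength in [0, 1] is the maximum relative brightness change;
    it is a hyperparameter to tune, not a fixed constant.
    """
    def augment(img, rng):
        if allow_hflip and rng.random() < 0.5:
            img = hflip(img)
        factor = 1.0 + rng.uniform(-strength, strength)
        # Pixels are treated as intensities in [0, 1] and clamped after scaling.
        return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]
    return augment

# The mirrored glyph is a different shape, so flipping does not preserve the label.
flip_changes_shape = hflip(SIX) != SIX

# For digits: no flip, mild jitter. For natural photos a flip is often safe.
digit_augment = make_augment(allow_hflip=False, strength=0.1)
photo_augment = make_augment(allow_hflip=True, strength=0.3)
```

Setting strength to 0 turns the jitter off entirely, which gives a convenient baseline when searching over augmentation strength.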

Augmentation is usually applied on the fly, with fresh random parameters drawn each epoch, so the network rarely sees exactly the same image twice.
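
A rough sketch of on-the-fly augmentation inside a training loop follows. It assumes a toy dataset of (image, label) pairs held as nested lists of intensities in [0, 1]. The names random_augment and run_epochs are hypothetical, and the actual model update is left as a comment.

```python
import random

def random_augment(img, rng):
    """Coin-flip mirror plus a random brightness factor in [0.8, 1.2]."""
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]
    factor = 1.0 + rng.uniform(-0.2, 0.2)
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]

def run_epochs(dataset, num_epochs, seed=0):
    """Visit every example once per epoch, augmenting at load time."""
    rng = random.Random(seed)
    seen = []
    for epoch in range(num_epochs):
        order = list(range(len(dataset)))
        rng.shuffle(order)  # new visiting order each epoch
        for i in order:
            img, label = dataset[i]
            variant = random_augment(img, rng)  # fresh random variant, same label
            seen.append((epoch, i, label, variant))
            # A real loop would run the forward and backward pass on variant here.
    return seen

# Two tiny 2x3 grayscale "images" with class labels.
dataset = [
    ([[0.2, 0.3, 0.4], [0.5, 0.6, 0.3]], "cat"),
    ([[0.6, 0.5, 0.4], [0.3, 0.2, 0.5]], "dog"),
]
history = run_epochs(dataset, num_epochs=5)
```

The stored dataset is never modified. Only the copy handed to the model changes, so no extra disk space is needed and each epoch sees a new sample of variants.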

Key idea

Data augmentation applies random, label-preserving transforms such as flips, crops, and color jitter to teach invariance and reduce overfitting.
