Data Augmentation

Why augment

Models generalize better with more diverse data, but collecting and labeling data is expensive. Data augmentation creates new training examples by transforming existing ones in ways that preserve the label.

Common transformations

For images, the classic operations include the following.

Flips and rotations that change orientation
Crops and scaling that change framing
Color jitter that varies brightness, contrast, saturation, and hue
Cutout that masks random patches
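
The operations listed above can be sketched in a few lines. The example below is a minimal illustration, not a production pipeline: it assumes a grayscale image stored as nested lists of floats in [0, 1] so that it runs without dependencies, and the function names (hflip, random_crop, brightness_jitter, cutout, augment) are hypothetical. Real pipelines would normally use an array or image library instead.

```python
import random

Image = list[list[float]]  # assumed format: rows of grayscale pixels in [0, 1]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def random_crop(img: Image, size: int, rng: random.Random) -> Image:
    """Cut a random size x size window out of the image."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]


def brightness_jitter(img: Image, max_delta: float, rng: random.Random) -> Image:
    """Shift every pixel by one random offset, clamped to [0, 1]."""
    delta = rng.uniform(-max_delta, max_delta)
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in img]


def cutout(img: Image, size: int, rng: random.Random) -> Image:
    """Set one random size x size patch to zero."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [
        [0.0 if top <= r < top + size and left <= c < left + size else p
         for c, p in enumerate(row)]
        for r, row in enumerate(img)
    ]


def augment(img: Image, rng: random.Random) -> Image:
    """Chain a few transformations; the probabilities and sizes are arbitrary."""
    if rng.random() < 0.5:
        img = hflip(img)
    img = brightness_jitter(img, 0.2, rng)
    return cutout(img, 2, rng)
```

Passing an explicit random.Random instance keeps the augmentations reproducible, which helps when debugging a training run. Each call to augment on the same image yields a different variant with the same label.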

Text and audio have analogues: synonym replacement and back translation for text, and time or frequency masking of spectrograms for audio.
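
As a rough illustration of time and frequency masking, the sketch below zeroes a random band of a spectrogram. It assumes the spectrogram is a nested list whose rows are frequency bins and whose columns are time frames. The names mask_time and mask_freq and the max_width parameter are hypothetical, chosen for this example only.

```python
import random

Spectrogram = list[list[float]]  # assumed layout: spec[freq_bin][time_frame]


def mask_time(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero a random run of consecutive time frames (columns)."""
    n_frames = len(spec[0])
    width = rng.randint(0, min(max_width, n_frames))
    start = rng.randint(0, n_frames - width)
    return [
        [0.0 if start <= t < start + width else v for t, v in enumerate(row)]
        for row in spec
    ]


def mask_freq(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero a random run of consecutive frequency bins (rows)."""
    n_bins = len(spec)
    width = rng.randint(0, min(max_width, n_bins))
    start = rng.randint(0, n_bins - width)
    return [
        [0.0] * len(row) if start <= f < start + width else list(row)
        for f, row in enumerate(spec)
    ]
```

Both functions return a new spectrogram and leave the input untouched, so the same clip can be masked differently on every epoch.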

Why it works

Augmentation teaches the model that the label is invariant to these changes. A cat rotated slightly or seen under different lighting is still a cat. This widens the effective training distribution and acts as a regularizer that reduces overfitting.

Stronger modern methods

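Beyond single-example transformations, several newer methods combine examples or automate the choice of operations.

Mixup blends two examples and their labels using the same mixing weight, typically drawn from a Beta distribution
CutMix pastes a rectangular patch from one image into another and mixes the labels in proportion to patch area
RandAugment applies a fixed number of randomly chosen operations, all at one shared magnitude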
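
To make the label mixing in Mixup and CutMix concrete, here is a minimal sketch under stated assumptions: inputs are plain Python lists, labels are one-hot vectors, and the function names and the alpha parameter are illustrative. The cutmix function here is a simplification that samples the patch size directly rather than deriving it from a sampled mixing weight.

```python
import random


def mixup(x1, y1, x2, y2, alpha, rng):
    """Blend two feature vectors and their one-hot labels with one weight."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def cutmix(img1, y1, img2, y2, rng):
    """Paste a random rectangle of img2 into img1 and mix labels by area."""
    h, w = len(img1), len(img1[0])
    ph, pw = rng.randint(1, h), rng.randint(1, w)  # patch height and width
    top, left = rng.randint(0, h - ph), rng.randint(0, w - pw)
    mixed = [
        [img2[r][c] if top <= r < top + ph and left <= c < left + pw
         else img1[r][c]
         for c in range(w)]
        for r in range(h)
    ]
    lam = 1 - (ph * pw) / (h * w)  # fraction of pixels still from img1
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return mixed, y, lam
```

In both cases the returned label is a soft target: if 25 percent of the pixels come from the second image, the second class receives 25 percent of the label mass.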

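The key rule is that a transformation must not change the true label unless the label is adjusted to match, as Mixup and CutMix do. Flipping a digit horizontally, for example, produces a mirrored glyph that is no longer a valid character, and rotating a 6 by 180 degrees turns it into a 9. The set of augmentations must therefore be chosen to fit the task.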

Key idea

Data augmentation manufactures label-preserving variations of existing examples to widen the training distribution and reduce overfitting.
