Why augment
Models generalize better with more diverse data, but collecting and labeling data is expensive. Data augmentation creates new training examples by transforming existing ones in ways that preserve the label.
Common transformations
For images, the classic operations include the following.
- Flips and rotations that change orientation
- Crops and scaling that change framing
- Color jitter that varies brightness and contrast
- Cutout that masks random patches
Text and audio have analogues: synonym replacement and back-translation for text, and time or frequency masking for audio.
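The image operations listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes images are float arrays of shape H x W x C with values in [0, 1], and the function names (`horizontal_flip`, `random_crop`, `brightness_jitter`, `cutout`) and parameter values are made up for this example.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror an H x W x C image left to right."""
    return img[:, ::-1, :]

def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size, :]

def brightness_jitter(img, max_delta, rng):
    """Shift all pixels by one random offset, keeping values in [0, 1]."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(img + delta, 0.0, 1.0)

def cutout(img, patch, rng):
    """Zero out one random patch x patch square (returns a copy)."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - patch + 1)
    left = rng.integers(0, w - patch + 1)
    out = img.copy()
    out[top:top + patch, left:left + patch, :] = 0.0
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for a real image (assumption)

# Chain the operations: flip, crop to 28 x 28, jitter, then mask a patch.
augmented = horizontal_flip(image)
augmented = random_crop(augmented, 28, rng)
augmented = brightness_jitter(augmented, 0.2, rng)
augmented = cutout(augmented, 8, rng)
print(augmented.shape)  # (28, 28, 3)
```

In practice each operation is usually applied with some probability rather than always, so the model also sees unmodified examples.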
Why it works
Augmentation teaches the model that the label is invariant to these changes. A cat rotated slightly or seen under different lighting is still a cat. This widens the effective training distribution and acts as a regularizer that reduces overfitting.
Stronger modern methods
- Mixup blends two examples and their labels into a weighted average
- CutMix pastes a patch from one image into another and mixes the labels by area
- RandAugment applies a random sequence of operations with a single strength knob
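Mixup and CutMix differ from the classic operations in that they change the label along with the input. A rough sketch of both follows, assuming one-hot label vectors, same-sized images, and a patch that fits inside the image. The function names and the Beta parameter 0.2 are illustrative choices, not values taken from this text.

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam):
    """Blend two examples and their one-hot labels with weight lam."""
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

def cutmix(x1, y1, x2, y2, top, left, ph, pw):
    """Paste a ph x pw patch of x2 into x1 and mix the labels by area."""
    h, w = x1.shape[:2]
    out = x1.copy()
    out[top:top + ph, left:left + pw] = x2[top:top + ph, left:left + pw]
    frac = (ph * pw) / (h * w)  # share of the image taken from x2
    y = (1.0 - frac) * y1 + frac * y2
    return out, y

rng = np.random.default_rng(0)
cat = rng.random((32, 32, 3))  # stand-ins for real images (assumption)
dog = rng.random((32, 32, 3))
y_cat = np.array([1.0, 0.0])
y_dog = np.array([0.0, 1.0])

# Mixup commonly draws the weight from a Beta(alpha, alpha) distribution.
lam = rng.beta(0.2, 0.2)
x_mix, y_mix = mixup(cat, y_cat, dog, y_dog, lam)

# A 16 x 16 patch covers a quarter of a 32 x 32 image.
x_cut, y_cut = cutmix(cat, y_cat, dog, y_dog, top=8, left=8, ph=16, pw=16)
print(y_cut)  # [0.75 0.25]
```

Because the targets are soft, the training loss has to accept a probability vector rather than a single class index.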
The key rule is that a transformation must not change the true label unless the target is updated to match, as Mixup and CutMix do. Flipping a digit horizontally, for example, can turn a valid character into an invalid one, so augmentations must respect the task.
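One way to enforce this rule is to keep a per-task allow-list of operations. The sketch below is hypothetical throughout: the task names, the `SAFE_OPS` table, and the tiny bitmap standing in for a handwritten 7 are invented for illustration.

```python
import numpy as np

def hflip(img):
    """Mirror a 2-D image left to right."""
    return img[:, ::-1]

def shift_right(img):
    """Move the image one pixel to the right, filling with zeros."""
    out = np.zeros_like(img)
    out[:, 1:] = img[:, :-1]
    return out

# Hypothetical allow-lists: which operations keep the label valid per task.
SAFE_OPS = {
    "natural_photos": [hflip, shift_right],  # a mirrored cat is still a cat
    "digits": [shift_right],                 # a mirrored 7 is not a digit
}

def augment(img, task, rng):
    """Apply one randomly chosen operation that is safe for the task."""
    ops = SAFE_OPS[task]
    return ops[rng.integers(len(ops))](img)

# Crude 3 x 5 bitmap of the digit 7.
seven = np.array([
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
])

rng = np.random.default_rng(0)
mirrored = hflip(seven)             # stem moves to the left: no longer a 7
shifted = augment(seven, "digits", rng)  # same glyph, moved one pixel
print(np.array_equal(mirrored, seven))   # False
```

Which operations are safe is a property of the task, not of the operation, so the table has to be written with the labels in mind.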
Key idea
Data augmentation manufactures label-preserving variations of existing examples to widen the effective training distribution and reduce overfitting.