← Lessons

quiz vs the machine

Gold1340

Machine Learning

The Data Augmentation Strategies

How label preserving transforms expand a dataset and improve generalization.

4 min read · core · beat Gold to climb

Making more from less

Data augmentation creates new training examples by applying transformations that change the input but keep the label valid. It is a cheap way to grow data and teach the model what variations it should ignore.

Common transforms

  • For images, flips, crops, rotations, and color jitter produce new views of the same object.
  • For text, synonym swaps, back translation, and random deletion produce paraphrases.
  • For audio, time stretch, pitch shift, and added noise simulate recording conditions.

The label preserving rule

  • The transform must not change the correct answer. Flipping a digit six into a nine breaks the label, so not every transform is safe for every task.
  • Choosing safe transforms requires knowing what invariances the task actually has.

Why it helps

  • Augmentation acts as a regularizer, forcing the model to rely on stable features rather than memorizing exact pixels or tokens.
  • It improves robustness to the kinds of variation seen at deployment.

A caution

  • Too aggressive augmentation can push examples off the real data distribution and hurt, so the strength is a tuning knob.

Key idea

Data augmentation applies label preserving transforms to grow data and teach invariances, but the transforms must match the task and not overpower the signal.

Check yourself

Answer to earn rating on the learn ladder.

1. What must a data augmentation transform preserve?

2. Why does augmentation improve generalization?