Machine Learning

Mixup and Cutmix

Blending samples and labels together to smooth decision boundaries.

Blending instead of just transforming

Mixup takes two training examples and forms a weighted average of both their inputs and their labels. For example, a blend of 70% cat image and 30% dog image carries the soft label 0.7 cat, 0.3 dog. Training on such blends encourages the model to behave linearly between training examples, which smooths the decision boundaries between classes.

Cutmix variation

Cutmix instead cuts a rectangular patch from one image and pastes it onto another, mixing the labels in proportion to the patch area. It keeps local image structure intact, which often helps localization tasks more than mixup.
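
A rough sketch of the patch-and-paste step, again in NumPy. The helper name `cutmix_pair`, the square-root sizing of the patch, and the choice to paste the patch at the same location it was cut from are assumptions for this example. The label weight is recomputed from the area actually pasted, because clipping at the image border can shrink the patch.

```python
import numpy as np

def cutmix_pair(img_a, img_b, y_a, y_b, lam, rng=None):
    """Paste a rectangular patch of img_b onto img_a and mix labels by area."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img_a.shape[:2]
    # Side lengths chosen so the patch covers roughly (1 - lam) of the image.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(h), rng.integers(w)   # random patch centre
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_a.copy()
    mixed[top:bottom, left:right] = img_b[top:bottom, left:right]
    # Fraction of pixels still coming from img_a after clipping.
    lam_adj = 1.0 - (bottom - top) * (right - left) / (h * w)
    y_mixed = lam_adj * y_a + (1.0 - lam_adj) * y_b
    return mixed, y_mixed, lam_adj

rng = np.random.default_rng(0)
img_a = np.zeros((32, 32, 3))                   # stand-in for the first image
img_b = np.ones((32, 32, 3))                    # stand-in for the second image
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mixed, y_mixed, lam_adj = cutmix_pair(img_a, img_b, y_a, y_b, lam=0.75, rng=rng)
```

With an all-zeros and an all-ones image, the mean of `mixed` equals the pasted fraction, which makes it easy to check that the label weights match the pixel areas.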

How the blend works
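
In symbols, one common way to write the two blends is shown here. The notation is an assumption introduced for this summary: (x_a, y_a) and (x_b, y_b) are the two training examples with one-hot labels, λ in [0, 1] is the mix ratio, and M is a binary mask that is 1 where the first image is kept and 0 inside the pasted patch.

```latex
% Mixup: average inputs and labels with the same ratio
\tilde{x} = \lambda x_a + (1 - \lambda)\, x_b, \qquad
\tilde{y} = \lambda y_a + (1 - \lambda)\, y_b

% Cutmix: paste a patch, then weight labels by kept area
\tilde{x} = M \odot x_a + (1 - M) \odot x_b, \qquad
\tilde{y} = \lambda y_a + (1 - \lambda)\, y_b, \qquad
\lambda = \frac{\text{number of pixels with } M = 1}{\text{total number of pixels}}
```

In both cases the label weights follow the share of the input that each example contributes.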

Why it helps

  • The model is trained to predict soft targets, which discourages overconfidence.
  • It acts as a strong regularizer and has been reported to improve robustness to noisy labels and adversarial inputs.
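  • The mix ratio is usually drawn from a Beta(α, α) distribution. With α below 1, most draws land near 0 or 1, so most blends are mild: one example dominates and the other adds a small perturbation.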
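
To see how the choice of beta distribution shapes the blends, the sketch below samples mix ratios from Beta(α, α) and measures how often one example contributes at least 90% of the blend. The helper name `dominant_fraction`, the 0.1 margin, and the two α values are illustrative assumptions, not recommended settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def dominant_fraction(alpha, n=100_000, margin=0.1):
    """Share of draws where one example contributes at least 1 - margin."""
    lam = rng.beta(alpha, alpha, size=n)
    return np.mean(np.minimum(lam, 1.0 - lam) < margin)

frac_small_alpha = dominant_fraction(0.2)   # U-shaped density: mass near 0 and 1
frac_uniform = dominant_fraction(1.0)       # Beta(1, 1) is uniform on [0, 1]

print(f"alpha=0.2: {frac_small_alpha:.2f}, alpha=1.0: {frac_uniform:.2f}")
```

With a small α, well over half of the blends are dominated by one example, while the uniform case gives about 0.2. Raising α toward and beyond 1 makes even 50/50 blends more common.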

Practical notes

  • These methods pair naturally with standard augmentation.
  • They can slow early convergence since targets are softer, so train a bit longer.
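  • Compute the loss against both original labels and weight the two terms by the mix ratio λ: λ · loss(prediction, first label) + (1 − λ) · loss(prediction, second label).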
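
The last note can be illustrated with cross-entropy. In this sketch the helper `cross_entropy`, the mix ratio 0.7, and the predicted probabilities are made-up values for the example. Because cross-entropy is linear in the target, weighting two hard-label losses gives the same number as one loss against the blended soft label.

```python
import numpy as np

def cross_entropy(probs, target, eps=1e-12):
    """Cross-entropy of predicted probabilities against a (possibly soft) target."""
    return -np.sum(target * np.log(probs + eps))

lam = 0.7                          # mix ratio (illustrative value)
y_a = np.array([1.0, 0.0])         # first label: cat
y_b = np.array([0.0, 1.0])         # second label: dog
probs = np.array([0.6, 0.4])       # hypothetical model output for the blended input

# Option 1: two hard-label losses weighted by the mix ratio.
loss_two_terms = lam * cross_entropy(probs, y_a) + (1 - lam) * cross_entropy(probs, y_b)

# Option 2: a single loss against the blended soft label.
loss_soft_label = cross_entropy(probs, lam * y_a + (1 - lam) * y_b)
```

The two-term form is convenient when a framework's loss function only accepts integer class labels.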

Key idea

Mixup averages inputs and labels while cutmix pastes patches and mixes labels by area. Both produce soft targets that smooth boundaries, reduce overconfidence, and boost robustness.

Check yourself

1. How does mixup differ from cutmix?

2. Why do mixup and cutmix improve calibration?