Blending instead of just transforming
Mixup takes two training examples and forms a weighted average of both their inputs and their labels. A blend of a cat and dog image carries a soft label of part cat and part dog. This encourages linear behavior between classes and smoother boundaries.
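The weighted average described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name mixup_pair, the toy 4x4 images, and the fixed ratio of 0.7 are all assumptions made for the example.

```python
import numpy as np


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Blend two examples (hypothetical helper).

    Inputs and one-hot labels are averaged with the same weight `lam`.
    """
    x = lam * x_a + (1.0 - lam) * x_b
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y


# Toy stand-ins for a cat image and a dog image (assumed shapes and values).
cat_img = np.zeros((4, 4, 3), dtype=np.float32)
dog_img = np.ones((4, 4, 3), dtype=np.float32)
cat_lbl = np.array([1.0, 0.0])  # one-hot: [cat, dog]
dog_lbl = np.array([0.0, 1.0])

mixed_img, mixed_lbl = mixup_pair(cat_img, cat_lbl, dog_img, dog_lbl, lam=0.7)
print(mixed_lbl)  # soft label: roughly 70% cat, 30% dog
```

In training, the ratio would normally be sampled at random rather than fixed.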
CutMix variation
CutMix instead cuts a rectangular patch from one image and pastes it onto another, mixing the labels in proportion to the patch area. It keeps local image structure intact, which often helps localization tasks more than mixup.
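A rough sketch of the patch-and-paste step is shown below. It assumes images in height x width x channel layout, picks a random patch center and clips the box to the image (a common choice, not the only one), and recomputes the label weight from the clipped area. The name cutmix_pair and the toy inputs are hypothetical.

```python
import numpy as np


def cutmix_pair(x_a, y_a, x_b, y_b, lam, rng):
    """Paste a rectangular patch of x_b onto x_a (hypothetical helper).

    `lam` is the target fraction of the image kept from x_a.
    """
    h, w = x_a.shape[:2]
    # Patch side lengths chosen so the patch covers about (1 - lam) of the image.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    # Random patch center; the box is clipped at the image border.
    cy = int(rng.integers(0, h))
    cx = int(rng.integers(0, w))
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed = x_a.copy()
    mixed[top:bottom, left:right] = x_b[top:bottom, left:right]

    # Clipping can shrink the patch, so weight the labels by the actual area.
    patch_frac = (bottom - top) * (right - left) / (h * w)
    y = (1.0 - patch_frac) * y_a + patch_frac * y_b
    return mixed, y


rng = np.random.default_rng(0)
img_a = np.zeros((8, 8, 3), dtype=np.float32)
img_b = np.ones((8, 8, 3), dtype=np.float32)
lbl_a = np.array([1.0, 0.0])
lbl_b = np.array([0.0, 1.0])

mixed, soft = cutmix_pair(img_a, lbl_a, img_b, lbl_b, lam=0.75, rng=rng)
print(soft)  # weight on the second class equals the pasted area fraction
```

Because the pasted region is copied unchanged, every pixel in the result comes from exactly one source image, which is what preserves local structure.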
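How the blend works
Both methods use a mix ratio λ between 0 and 1. Mixup forms the input λ · x_a + (1 − λ) · x_b and the label λ · y_a + (1 − λ) · y_b, where y_a and y_b are one-hot labels. CutMix keeps the pixels of x_a outside the pasted patch, takes the pixels of x_b inside it, and uses the same label formula with λ equal to the fraction of the image that still comes from x_a.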
Why it helps
- The model is trained to predict soft targets, which discourages overconfidence.
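- It acts as a strong regularizer and has been reported to improve robustness to noisy labels and adversarial inputs.
- The mix ratio is usually drawn from a Beta(α, α) distribution. With a small α (for example 0.2), most draws fall close to 0 or 1, so most blends are mild; α = 1 gives a uniform ratio instead.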
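To see how the beta distribution shapes the mix ratio, the sketch below samples ratios and measures how many land near 0 or 1. The α values, the 0.1 margin used to call a blend "mild", and the helper names are assumptions chosen for illustration.

```python
import numpy as np


def sample_mix_ratios(alpha, n, rng):
    """Draw n mix ratios from a symmetric Beta(alpha, alpha) distribution."""
    return rng.beta(alpha, alpha, size=n)


def mild_fraction(lams, margin=0.1):
    """Fraction of ratios within `margin` of 0 or 1 (one example dominates)."""
    return float(np.mean(np.minimum(lams, 1.0 - lams) < margin))


rng = np.random.default_rng(0)
fractions = {}
for alpha in (0.2, 1.0):
    lams = sample_mix_ratios(alpha, 10_000, rng)
    fractions[alpha] = mild_fraction(lams)
    print(f"alpha={alpha}: {fractions[alpha]:.2f} of blends are mild")
```

A small α concentrates the ratios near the endpoints, while α = 1 spreads them uniformly, so the choice of α controls how aggressive the mixing is.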
Practical notes
- These methods pair naturally with standard augmentation such as random crops and flips, typically applied to each image before mixing.
- They can slow early convergence since targets are softer, so train a bit longer.
- Compute the loss against both labels and weight the two terms by the mix ratio λ: λ · loss(pred, y_a) + (1 − λ) · loss(pred, y_b).
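The loss weighting in the last note can be sketched as follows, assuming a cross-entropy loss and integer class labels. The function names and the toy logits are hypothetical; a real training loop would use its framework's own loss function.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def cross_entropy(logits, label):
    """Cross-entropy of one example against an integer class label."""
    return -log_softmax(logits)[label]


def mixed_loss(logits, label_a, label_b, lam):
    """Weight the loss for each source label by its share of the blend."""
    return lam * cross_entropy(logits, label_a) + (1.0 - lam) * cross_entropy(logits, label_b)


logits = np.array([2.0, 0.5, -1.0])  # toy model output for a mixed input
loss = mixed_loss(logits, label_a=0, label_b=1, lam=0.7)

# Equivalent view: cross-entropy against the soft target [0.7, 0.3, 0.0].
soft_target = np.array([0.7, 0.3, 0.0])
loss_soft = -(soft_target * log_softmax(logits)).sum()
print(loss, loss_soft)
```

Because cross-entropy is linear in the target, weighting two hard-label losses gives the same value as a single loss against the soft label, so either form can be used.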
Key idea
Mixup averages inputs and labels while CutMix pastes patches and mixes labels by area. Both produce soft targets that smooth boundaries, reduce overconfidence, and boost robustness.