Softening Targets With Label Smoothing

The problem with hard labels

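In classification we usually train against a one-hot target: a 1 for the true class and a 0 for every other class. Cross-entropy against such a target keeps decreasing as the predicted probability of the correct class approaches 1, and it reaches zero only at exactly 1. That would require the correct logit to beat all the others by an infinite margin, so training keeps pushing the logits apart. The result is an overconfident model, which can hurt generalization.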
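A small numerical sketch makes the "never satisfied" behavior visible. It assumes K classes where the true class's logit exceeds every other logit by `gap` and all other logits are 0. The names `hard_label_loss` and `hard_label_grad` are illustrative, not from any library. Under that setup the loss is log(1 + (K − 1)e^(−gap)), and its derivative with respect to the gap is negative for every finite gap.

```python
import math

def hard_label_loss(gap, num_classes):
    """Cross-entropy against a one-hot target when the true logit beats
    every other logit by `gap` (hypothetical setup: other logits are 0)."""
    # -log(e^gap / (e^gap + K - 1)) rewritten as log1p((K - 1) * e^-gap) for stability
    return math.log1p((num_classes - 1) * math.exp(-gap))

def hard_label_grad(gap, num_classes):
    """d(loss)/d(gap) = -(1 - p_true); negative for every finite gap."""
    return -(num_classes - 1) / (math.exp(gap) + (num_classes - 1))

for gap in [0.0, 2.0, 5.0, 10.0, 20.0]:
    loss = hard_label_loss(gap, 10)
    grad = hard_label_grad(gap, 10)
    print(f"gap={gap:5.1f}  loss={loss:.3e}  grad={grad:.3e}")
```

The loss shrinks toward zero but never reaches it, and the gradient always points toward an even larger gap, which is the pressure toward overconfidence described above.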

The fix

Label smoothing replaces the hard target with a slightly softened one:

The correct class gets a value a little below one, written 1 − ε, such as 0.9 for ε = 0.1
The remaining small mass ε is spread evenly across the other K − 1 classes
The model is now rewarded for being confident but penalized for being absolutely certain
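The construction in the list above can be sketched as a short function. It assumes a smoothing amount `eps` (ε) and `num_classes` (K) classes, with 1 − ε on the true class and ε/(K − 1) on each of the others; `smooth_targets` is a hypothetical helper, not a library API. Some formulations instead mix the one-hot vector with a uniform distribution over all K classes, (1 − ε)·y + ε/K, which also gives the true class a share of ε; the two differ only slightly.

```python
def smooth_targets(true_class, num_classes, eps=0.1):
    """Softened target vector: 1 - eps on the true class,
    eps spread evenly over the remaining num_classes - 1 classes."""
    if num_classes < 2:
        raise ValueError("label smoothing needs at least two classes")
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must be in [0, 1)")
    off_value = eps / (num_classes - 1)   # mass for each wrong class
    targets = [off_value] * num_classes
    targets[true_class] = 1.0 - eps       # a little below one
    return targets

print(smooth_targets(2, 5))
```

With five classes and ε = 0.1, class 2 gets 0.9 and each of the other four gets 0.025, so the vector still sums to one.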

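This gentle pressure keeps the gap between the correct logit and the others from growing without bound: the loss is now minimized at a finite gap, where the predicted probability of the correct class equals its softened target value. It also tends to improve calibration, so predicted probabilities better match real accuracy.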
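The finite optimum can be checked numerically. This sketch assumes the softened target from the list above (1 − ε on the true class, ε/(K − 1) on each of the others) and the same simplified logits as before: the true class at `gap`, all others at 0. `smoothed_loss` is a hypothetical name. Setting the derivative to zero gives a true-class probability of exactly 1 − ε, at a gap of log((1 − ε)(K − 1)/ε); a grid search should land on the same point.

```python
import math

def smoothed_loss(gap, num_classes, eps):
    """Cross-entropy against the smoothed target, with logits [gap, 0, ..., 0]."""
    log_z = math.log(math.exp(gap) + (num_classes - 1))  # log of the softmax normalizer
    log_p_true = gap - log_z
    log_p_other = -log_z
    # The K - 1 wrong classes share eps of target mass and have equal log-probability.
    return -(1.0 - eps) * log_p_true - eps * log_p_other

K, eps = 10, 0.1
grid = [i / 100 for i in range(0, 1501)]  # candidate gaps 0.00 .. 15.00
best_gap = min(grid, key=lambda g: smoothed_loss(g, K, eps))
closed_form = math.log((1.0 - eps) * (K - 1) / eps)
print(f"grid-search optimum: {best_gap:.2f}, closed form: {closed_form:.3f}")
```

Past this gap the loss rises again, so the gradient stops pushing the logits apart instead of pushing forever as it does with a one-hot target.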

Tradeoffs

Smoothing usually gives a small accuracy gain and better-calibrated probabilities, and it is widely used in image classification and machine translation. Too much smoothing, though, blurs the distinction between classes and can make predictions needlessly soft, even on easy examples.
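The "too much smoothing" side of the tradeoff can be made concrete with a rough sketch. It assumes softened targets of 1 − ε on the true class and ε/(K − 1) on each of the K − 1 others; `optimal_gap` is a hypothetical helper. Under that assumption the loss is minimized when the true-class probability equals 1 − ε, at a logit gap of log((1 − ε)(K − 1)/ε), so larger ε caps both the rewarded confidence and the separation between classes. This illustrates the mechanism only; the actual accuracy and calibration effects depend on the task.

```python
import math

def optimal_gap(num_classes, eps):
    """Logit gap minimizing the smoothed loss (true-class probability = 1 - eps)."""
    return math.log((1.0 - eps) * (num_classes - 1) / eps)

for eps in [0.01, 0.05, 0.1, 0.2, 0.4]:
    print(f"eps={eps:.2f}  rewarded confidence={1 - eps:.2f}  "
          f"optimal gap={optimal_gap(10, eps):.2f}")
```

At ε = (K − 1)/K, which is 0.9 for ten classes, the target becomes uniform and the optimal gap drops to zero: the extreme case of smoothing blurring the classes together.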

Key idea

Label smoothing softens one-hot targets so the model avoids extreme overconfidence, improving calibration and often generalization.
