The problem with hard labels
In classification we usually train against a one-hot target: a one for the true class and zeros elsewhere. Cross-entropy then pushes the predicted probability of the correct class toward exactly one, a value the model can only approach by making its logits ever larger. This encourages overconfidence and can hurt generalization.
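To make the argument above concrete, the loss can be written out. The notation is an assumption introduced here, not taken from the text: K classes, logits z_k, softmax probabilities p_k, target vector t, and true class y.

```latex
p_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad
\mathcal{L}_{\text{hard}} = -\sum_{k=1}^{K} t_k \log p_k = -\log p_y
\quad \text{(since } t_y = 1 \text{ and } t_k = 0 \text{ for } k \neq y\text{)}.
```

Since -log p_y reaches zero only as p_y approaches 1, and softmax gives p_y = 1 only in the limit where z_y - z_k grows without bound for every other class k, the hard-label loss keeps rewarding larger logit gaps.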
The fix
Label smoothing replaces the hard target with a slightly softened one:
- The correct class gets a value a little below one, such as 0.9
- The remaining mass (0.1 in this example) is spread evenly across the other classes
- The model is now rewarded for being confident but penalized for being absolutely certain
This gentle pressure keeps the logits from growing without bound and tends to improve calibration, so predicted probabilities better match real accuracy.
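The recipe above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `smooth_targets` and `cross_entropy`, the parameter name `epsilon` (the mass taken from the true class), and the example logits are all hypothetical. It follows the variant described here, where the removed mass goes only to the other classes. Some libraries instead spread it over all classes, including the true one.

```python
import math

def smooth_targets(label, num_classes, epsilon=0.1):
    """Soft target: 1 - epsilon on the true class, epsilon / (K - 1) on each other class."""
    off_value = epsilon / (num_classes - 1)
    targets = [off_value] * num_classes
    targets[label] = 1.0 - epsilon
    return targets

def cross_entropy(logits, targets):
    """Cross-entropy between softmax(logits) and a (possibly soft) target vector."""
    peak = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return -sum(t * (z - log_norm) for t, z in zip(targets, logits))

hard = smooth_targets(2, num_classes=4, epsilon=0.0)  # plain one-hot target
soft = smooth_targets(2, num_classes=4, epsilon=0.1)  # 0.9 on the true class

moderate = [0.0, 0.0, 3.0, 0.0]   # confident prediction for class 2
extreme = [0.0, 0.0, 20.0, 0.0]   # near-certain prediction for class 2

for name, logits in (("moderate", moderate), ("extreme", extreme)):
    print(name, cross_entropy(logits, hard), cross_entropy(logits, soft))
```

With the hard target, the extreme logits give the lower loss. With the smoothed target the ordering flips, because the small mass on the other classes makes near-zero probabilities expensive.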
Tradeoffs
Smoothing usually gives a small accuracy bump and better-calibrated probabilities, and it is widely used in image classification and machine translation. Too much smoothing, though, blurs the distinction between classes and can slightly reduce sharpness on easy examples.
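One way to see the cost of heavy smoothing is to look at the logit gap the loss is actually asking for. Cross-entropy against a soft target is minimized when the predicted distribution equals that target, so with 1 - epsilon on the true class and epsilon / (K - 1) on each other class, the preferred gap between the true-class logit and any other logit is log((1 - epsilon)(K - 1) / epsilon). The sketch below assumes that setup. The function name `optimal_logit_gap` and the sample values of `epsilon` are illustrative only.

```python
import math

def optimal_logit_gap(epsilon, num_classes):
    """Logit gap (true class minus any other class) at which softmax equals the smoothed target."""
    true_mass = 1.0 - epsilon
    other_mass = epsilon / (num_classes - 1)
    return math.log(true_mass / other_mass)

# The preferred gap shrinks as epsilon grows, shown here for 10 classes.
for eps in (0.01, 0.1, 0.3, 0.5):
    print(eps, round(optimal_logit_gap(eps, num_classes=10), 3))
```

The gap is finite for any positive epsilon, which is why the logits stop growing. It falls to zero when epsilon reaches (K - 1) / K, at which point the target is uniform and carries no class information.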
Key idea
Label smoothing softens one-hot targets so the model avoids extreme overconfidence, improving calibration and often generalization.