Hard targets and overconfidence
In classification, the usual target is a one-hot vector that assigns all probability to the correct class. Training with softmax cross-entropy to match this pushes the correct logit ever higher relative to all the others, because the loss only approaches zero as that gap grows without bound. The result is a steady drive toward extreme overconfidence.
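To see why a one-hot target never stops pushing, here is a small sketch in plain Python. The helper names `softmax` and `cross_entropy` are illustrative, not from any library. It assumes K = 4 classes with class 0 correct, and places the correct logit a growing `gap` above the other three. The loss works out to log(1 + 3e^(-gap)), which keeps shrinking but never reaches zero at any finite gap.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, logits):
    # H(target, softmax(logits)) = -sum_k target_k * log p_k.
    # Terms with zero target weight contribute nothing, so skip them.
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

one_hot = [1.0, 0.0, 0.0, 0.0]  # K = 4 classes, class 0 is correct
for gap in [1.0, 5.0, 10.0, 20.0]:
    # The correct logit sits `gap` above the three (equal) wrong logits.
    loss = cross_entropy(one_hot, [gap, 0.0, 0.0, 0.0])
    print(f"gap={gap:>5}: loss={loss:.6f}")
```

Every increase in the gap still lowers the loss a little, so gradient descent keeps inflating the correct logit.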
What label smoothing does
Label smoothing replaces the hard target with a softened one. Most of the probability still goes to the correct class, but a small amount is spread evenly across the other classes.
- A smoothing value ε, such as 0.1, is held back from the true class, which keeps 1 − ε
- That held-back mass is divided equally among the other K − 1 classes (for K classes in total), so each receives ε / (K − 1)
- The model is trained to match this softer distribution
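Here is a minimal sketch of building the smoothed target and computing the loss against it. It assumes K classes indexed from 0 and a smoothing value `epsilon` (ε). The names `smooth_labels` and `smoothed_cross_entropy` are hypothetical helpers. The code follows the variant described above, where the true class keeps 1 − ε and each other class gets ε / (K − 1).

```python
import math

def smooth_labels(true_class, num_classes, epsilon=0.1):
    # Hold back `epsilon` from the true class and split it equally
    # among the other num_classes - 1 classes.
    if num_classes < 2:
        raise ValueError("label smoothing needs at least two classes")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    off_value = epsilon / (num_classes - 1)
    target = [off_value] * num_classes
    target[true_class] = 1.0 - epsilon
    return target

def smoothed_cross_entropy(logits, true_class, epsilon=0.1):
    # Cross-entropy against the smoothed target, -sum_k q_k * log p_k,
    # using a numerically stable log-softmax.
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_total for z in logits]
    target = smooth_labels(true_class, len(logits), epsilon)
    return -sum(q * lp for q, lp in zip(target, log_probs))

# True class 2 of 4 with epsilon = 0.1: class 2 gets 0.9, the others 0.1 / 3 each.
print(smooth_labels(2, 4))
print(smoothed_cross_entropy([0.5, -1.0, 3.0, 0.2], true_class=2))
```

With ε = 0 the target becomes one-hot again and the loss reduces to ordinary cross-entropy. Many libraries use a slightly different variant that spreads ε uniformly over all K classes, including the true one, so the true class ends up with 1 − ε + ε / K. PyTorch's `torch.nn.CrossEntropyLoss`, for example, takes a `label_smoothing` argument that works this way.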
Why it helps
- It discourages extreme logits: against the softened target, the loss is minimized at a finite gap between the correct logit and the others
- It improves calibration, so predicted confidence better matches observed accuracy
- It often gives a small boost in generalization and reduces overfitting
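The claim that logits stay bounded can be checked directly. For illustration, assume all incorrect logits are equal and the correct one sits `gap` above them. Against the smoothed target, the loss is convex in the gap. Setting its derivative to zero gives gap = log((1 − ε)(K − 1) / ε), and at that point the predicted probability of the true class is exactly 1 − ε. The sketch below uses hypothetical helper names. It compares this closed form with a numerical ternary search for K = 10.

```python
import math

def smoothed_loss_vs_gap(gap, num_classes, epsilon):
    # Loss when the true logit is `gap` above K - 1 equal logits at 0,
    # against a target of 1 - eps (true) and eps / (K - 1) (each other).
    log_total = math.log(math.exp(gap) + (num_classes - 1))
    log_p_true = gap - log_total
    log_p_other = -log_total
    # The K - 1 other classes together carry weight eps.
    return -((1 - epsilon) * log_p_true + epsilon * log_p_other)

def optimal_gap(num_classes, epsilon):
    # Closed form: d(loss)/d(gap) = 0  =>  exp(gap) = (1 - eps)(K - 1) / eps.
    return math.log((1 - epsilon) * (num_classes - 1) / epsilon)

def argmin_gap(num_classes, epsilon, lo=0.0, hi=50.0, iters=200):
    # Ternary search is valid because the loss is convex in the gap;
    # assumes the minimum lies inside [lo, hi].
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if smoothed_loss_vs_gap(m1, num_classes, epsilon) < smoothed_loss_vs_gap(m2, num_classes, epsilon):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

for eps in (0.01, 0.1, 0.2):
    print(f"eps={eps}: closed form {optimal_gap(10, eps):.4f}, search {argmin_gap(10, eps):.4f}")
```

For K = 10 and ε = 0.1 the optimum is log 81 ≈ 4.39. Smaller ε moves it further out, and as ε approaches 0 the finite optimum disappears, which recovers the one-hot behavior. The ceiling of 1 − ε on the true-class probability is also the mild penalty on complete certainty that the trade-off below describes.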
The trade-off
Because the target is no longer a pure one hot, the model is gently penalized for being completely certain even when it is right. This is usually a good thing, but if you need the raw confidence scores to be sharp, you may prefer to train without it or calibrate afterward.
Key idea
Label smoothing softens one-hot targets to curb overconfidence and improve calibration and generalization.