Label Smoothing

Hard targets and overconfidence

In classification, the usual target is a one-hot vector that assigns all probability to the correct class. Under cross-entropy loss, a softmax output can match this target only in the limit where the correct logit is infinitely larger than all the others, so training keeps widening that gap and drives the model toward extreme overconfidence.
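
The effect can be seen with a few lines of arithmetic. The following is a minimal sketch, assuming a softmax output trained with cross-entropy and a toy three-class problem; the helper name cross_entropy and the chosen logit gaps are illustrative, not taken from any particular library.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target, logits))

# Hypothetical 3-class example in which class 0 is the correct one.
one_hot = [1.0, 0.0, 0.0]
gaps = (1.0, 5.0, 10.0, 20.0)

# Raise the correct logit above the others by an increasing gap.
losses = [cross_entropy([gap, 0.0, 0.0], one_hot) for gap in gaps]

for gap, loss in zip(gaps, losses):
    print(f"logit gap {gap:5.1f} -> loss {loss:.9f}")
```

With a one-hot target the loss keeps shrinking as the gap grows and never reaches zero, so the optimizer always has a reason to push the logits further apart.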

What label smoothing does

Label smoothing replaces the hard target with a softened one. Most of the probability still goes to the correct class, but a small amount is spread evenly across the other classes.

A smoothing value, such as one tenth, is held back from the true class
That held-back mass is divided equally among the remaining classes
The model is trained to match this softer distribution
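
The steps above can be sketched as a small function. This is an illustrative sketch, not a library API: the name smooth_targets and its parameters are assumptions, and it follows the convention described here, where the held-back mass goes only to the other classes. Some implementations instead mix the one-hot target with a uniform distribution over all classes, which leaves the true class with slightly more than one minus the smoothing value.

```python
def smooth_targets(true_class, num_classes, smoothing=0.1):
    """Build a smoothed target: 1 - smoothing on the true class,
    with the remainder shared equally by the other classes."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    off_value = smoothing / (num_classes - 1)
    target = [off_value] * num_classes
    target[true_class] = 1.0 - smoothing
    return target

# Five classes, true class at index 2, one tenth held back.
target = smooth_targets(true_class=2, num_classes=5, smoothing=0.1)
print(target)       # 0.9 on the true class, 0.025 on each other class
print(sum(target))  # the target is still a probability distribution
```

Setting smoothing to zero recovers the ordinary one-hot target, so the same code path can serve both cases.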

Why it helps

It discourages the model from producing extreme logits
It often improves calibration, so predicted probabilities better match observed accuracy
It often gives a small boost in generalization and reduces overfitting
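
The first point can be checked numerically. The sketch below assumes a softmax output trained with cross-entropy, ten classes, a smoothing value of one tenth spread over the other nine classes, and equal logits on the wrong classes; the names K, eps, loss_at and best_gap are hypothetical. Under those assumptions the loss is smallest at a finite logit gap, log((1 - eps)(K - 1) / eps), rather than at infinity.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target, logits))

K, eps = 10, 0.1  # assumed class count and smoothing value
target = [1.0 - eps] + [eps / (K - 1)] * (K - 1)  # class 0 is correct

def loss_at(gap):
    """Loss when the correct logit exceeds all the others by `gap`."""
    return cross_entropy([gap] + [0.0] * (K - 1), target)

# Gap at which the softmax output equals the smoothed target exactly.
best_gap = math.log((1.0 - eps) * (K - 1) / eps)

for gap in (1.0, best_gap, 10.0, 20.0):
    print(f"logit gap {gap:6.3f} -> loss {loss_at(gap):.4f}")
```

Gaps larger than best_gap now cost more rather than less, which is what keeps the logits from growing without bound.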

The trade-off

Because the target is no longer a pure one-hot vector, the model is gently penalized for being completely certain, even when it is right. This is usually a good thing, but if you need the raw confidence scores to be sharp, you may prefer to train without label smoothing or calibrate afterward.

Key idea

Label smoothing softens one-hot targets to curb overconfidence and improve calibration and generalization.
