Machine Learning

Label Smoothing

Softening one-hot targets to curb overconfidence and improve calibration.

The overconfidence problem

With one-hot targets, cross-entropy is minimized only as the correct logit grows toward infinity relative to the rest, so training keeps pushing it higher. The model becomes overconfident and poorly calibrated, assigning near certainty even when it is wrong.
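A minimal sketch of this effect, under a simplifying assumption that is not from the lesson: every wrong logit is equal, and the correct logit exceeds them by a single `gap`. The function name and the choice of 10 classes are illustrative only.

```python
import math


def one_hot_cross_entropy(gap: float, num_classes: int) -> float:
    """Cross-entropy against a one-hot target.

    Assumes (for illustration) that all wrong logits are equal and the
    correct logit is larger than them by `gap`.
    """
    # Softmax probability of the correct class is e^gap / (e^gap + K - 1),
    # so the loss -log(p) simplifies to log(1 + (K - 1) * e^(-gap)).
    return math.log1p((num_classes - 1) * math.exp(-gap))


gaps = [0.0, 2.0, 5.0, 10.0, 20.0]
losses = [one_hot_cross_entropy(g, num_classes=10) for g in gaps]

for g, loss in zip(gaps, losses):
    print(f"gap={g:5.1f}  loss={loss:.8f}")
```

In this setup the loss keeps shrinking as the gap grows but never reaches zero, so there is always a gradient rewarding a larger logit.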

What smoothing does

Label smoothing replaces the hard target of 1 for the correct class with a slightly lower value and spreads the remaining mass evenly across the other classes. A common setting reserves a tenth of the probability for the wrong classes, leaving 0.9 for the correct one. Because the target is no longer extreme, the loss is minimized at a finite logit gap, so logits stay bounded.
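The description above can be sketched in a few lines of plain Python. The function name `smooth_one_hot` and its parameters are hypothetical, chosen for this example rather than taken from any library.

```python
def smooth_one_hot(true_class: int, num_classes: int, smoothing: float = 0.1) -> list[float]:
    """Build a smoothed target: 1 - smoothing on the true class,
    with `smoothing` split evenly across the other classes."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    if not 0 <= true_class < num_classes:
        raise ValueError("true_class out of range")

    off_value = smoothing / (num_classes - 1)  # share for each wrong class
    target = [off_value] * num_classes
    target[true_class] = 1.0 - smoothing       # lowered value for the true class
    return target


target = smooth_one_hot(true_class=2, num_classes=5, smoothing=0.1)
print(target)
```

With five classes and a smoothing value of 0.1, the true class keeps 0.9 and each of the four wrong classes receives 0.025, so the target still sums to 1.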

The transformation
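Consistent with the description above, the smoothed target can be written as follows. The symbols are introduced here for illustration: K is the number of classes, y the true class, ε the smoothing value, and q_k the target probability for class k.

```latex
q_k =
\begin{cases}
1 - \varepsilon, & k = y, \\[4pt]
\dfrac{\varepsilon}{K - 1}, & k \neq y,
\end{cases}
\qquad
\sum_{k=1}^{K} q_k = (1 - \varepsilon) + (K - 1)\,\frac{\varepsilon}{K - 1} = 1 .
```

Setting ε = 0.1 matches the "tenth of the probability" setting. A widely used variant instead mixes the one-hot target with a uniform distribution over all K classes, which gives the true class 1 − ε + ε/K and every other class ε/K.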

Why it helps

  • Predictions become better calibrated, so confidence tracks accuracy.
  • The network is discouraged from chasing infinite logits, which acts as a regularizer and often improves generalization.
  • It tightens the clustering of learned representations within a class.
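To see why smoothed targets remove the incentive for infinite logits, here is a small sketch under an illustrative assumption: all wrong logits are zero and the correct logit equals `gap`. The loss is then smallest when the softmax output matches the smoothed target, which happens at a finite gap. Function names are hypothetical.

```python
import math


def smoothed_cross_entropy(gap: float, num_classes: int, smoothing: float) -> float:
    """Cross-entropy between a smoothed target and softmax([gap, 0, ..., 0])."""
    log_norm = math.log(math.exp(gap) + num_classes - 1)
    log_p_true = gap - log_norm   # log-probability of the correct class
    log_p_other = -log_norm       # log-probability of each wrong class
    # The wrong classes share `smoothing` in total, each with the same log-prob.
    return -((1.0 - smoothing) * log_p_true + smoothing * log_p_other)


def optimal_gap(num_classes: int, smoothing: float) -> float:
    """Gap at which the softmax output equals the smoothed target."""
    return math.log((1.0 - smoothing) * (num_classes - 1) / smoothing)


best = optimal_gap(num_classes=10, smoothing=0.1)
for gap in [best - 1.0, best, best + 1.0, 20.0]:
    print(f"gap={gap:6.3f}  loss={smoothed_cross_entropy(gap, 10, 0.1):.6f}")
```

For 10 classes and a smoothing value of 0.1 the optimal gap is ln 81, about 4.39. Pushing the correct logit beyond that point raises the loss instead of lowering it.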

Practical notes

  • A smoothing value around 0.1 is a common default.
  • It can hurt if the model will later serve as a teacher for distillation, because smoothing erases some of the information the logits carry about how classes relate to one another.
  • It pairs well with mixup, which also produces soft labels.
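As a rough illustration of the mixup pairing, the sketch below blends two smoothed targets with a mixing weight drawn from a Beta distribution. It covers only the label side of mixup (the inputs would be blended with the same weight). The helper names, the four-class setup, and `alpha = 0.2` are assumptions made for this example.

```python
import random


def smooth_one_hot(true_class: int, num_classes: int, smoothing: float = 0.1) -> list[float]:
    """Smoothed target: 1 - smoothing on the true class, rest split evenly."""
    target = [smoothing / (num_classes - 1)] * num_classes
    target[true_class] = 1.0 - smoothing
    return target


def mix_targets(target_a: list[float], target_b: list[float], lam: float) -> list[float]:
    """Convex combination of two soft targets, as mixup does for labels."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(target_a, target_b)]


rng = random.Random(0)             # fixed seed so the sketch is repeatable
alpha = 0.2                        # assumed mixup hyperparameter
lam = rng.betavariate(alpha, alpha)

mixed = mix_targets(
    smooth_one_hot(true_class=0, num_classes=4),
    smooth_one_hot(true_class=3, num_classes=4),
    lam,
)
print(f"lam={lam:.3f}  mixed={mixed}")
```

Because both inputs are valid probability vectors, the mixed target also sums to 1, and every class keeps a nonzero share.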

Key idea

Label smoothing softens one-hot targets by reserving a little probability for the other classes. This keeps logits bounded, improves calibration, and often aids generalization, at the cost of slightly fuzzier targets.

Check yourself

1. What does label smoothing change about the training target?

2. What is a primary benefit of label smoothing?