Machine Learning

The Softmax and Cross Entropy

Turn scores into probabilities and measure them against the truth.

5 min read · advanced

Softmax

Softmax converts a vector of raw scores, called logits, into a probability distribution.

  • It exponentiates each logit and divides by the sum of all exponentials.
  • Outputs are positive and sum to one.
  • A higher logit gets a larger share of the probability, and the wider the gap between logits, the more sharply peaked the distribution.
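
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation: the function name `softmax` and the example logits are chosen here for demonstration, and subtracting the maximum logit is a common stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(logits):
    """Map a 1-D array of logits to a probability distribution."""
    # Shifting by the max does not change the output but keeps exp() from overflowing.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    # Divide each exponential by the sum of all exponentials.
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])  # illustrative values
probs = softmax(logits)

# Scaling the logits widens the gaps between them and sharpens the distribution.
sharper = softmax(logits * 3.0)
```

Here `probs` is positive everywhere and sums to one, and `sharper` puts more of its mass on the largest logit than `probs` does.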

Cross entropy

Cross entropy measures how far a predicted distribution is from the true distribution.

  • For a one-hot target it reduces to the negative log probability of the correct class.
  • Minimizing it pushes the model to assign high probability to the right answer.
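
A small sketch can make the one-hot reduction concrete. The function name `cross_entropy`, the example probabilities, and the small constant `eps` (added only to guard against `log(0)`) are assumptions made for illustration.

```python
import numpy as np

def cross_entropy(probs, target):
    """Cross entropy between a target distribution and predicted probabilities."""
    eps = 1e-12  # assumed guard against log(0); not part of the definition
    return -np.sum(target * np.log(probs + eps))

probs = np.array([0.7, 0.2, 0.1])    # illustrative prediction
target = np.array([1.0, 0.0, 0.0])   # one-hot: the correct class is index 0

loss = cross_entropy(probs, target)
neg_log_p = -np.log(probs[0])        # negative log probability of the correct class

# A more confident correct prediction gives a lower loss.
better = cross_entropy(np.array([0.9, 0.05, 0.05]), target)
```

With a one-hot target every term except the correct class is multiplied by zero, so `loss` matches `neg_log_p` up to the tiny `eps` offset, and `better` is smaller than `loss`.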

A clean gradient

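Combining softmax with cross entropy yields a simple gradient with respect to the logits: the predicted probabilities minus the target. Libraries fuse the two operations into a single function for numerical stability, since computing the log of the softmax directly avoids overflow in the exponentials and avoids taking the log of a probability that has rounded to zero.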
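
One way to convince yourself of the predicted-minus-target result is to compare it with a finite-difference estimate. The sketch below is illustrative only: the helper names (`log_softmax`, `softmax_cross_entropy`, `analytic_grad`, `numeric_grad`) and the example values are assumptions, and the fused loss is written by hand rather than taken from any particular library.

```python
import numpy as np

def log_softmax(logits):
    # Compute log(softmax) directly: shift by the max, then subtract log-sum-exp.
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def softmax_cross_entropy(logits, target):
    # Fused loss: cross entropy evaluated straight from the logits.
    return -np.sum(target * log_softmax(logits))

def analytic_grad(logits, target):
    # Predicted probabilities minus the target.
    return np.exp(log_softmax(logits)) - target

def numeric_grad(logits, target, h=1e-6):
    # Central finite differences, one logit at a time.
    grad = np.zeros_like(logits)
    for i in range(logits.size):
        step = np.zeros_like(logits)
        step[i] = h
        up = softmax_cross_entropy(logits + step, target)
        down = softmax_cross_entropy(logits - step, target)
        grad[i] = (up - down) / (2 * h)
    return grad

logits = np.array([2.0, 1.0, 0.1])   # illustrative values
target = np.array([0.0, 1.0, 0.0])   # one-hot: the correct class is index 1

g_analytic = analytic_grad(logits, target)
g_numeric = numeric_grad(logits, target)
```

The two gradients should agree to within finite-difference error. The entry for the correct class is negative (raise that logit) and the others are positive (lower them), and the entries sum to zero because both the probabilities and the target sum to one.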

Key idea

Softmax maps logits to a probability distribution, and cross entropy scores that distribution against the truth. Together they give the clean predicted-minus-target gradient that drives classification.

Check yourself

1. What does softmax produce from logits?

2. For a one-hot target, what does cross entropy reduce to?

3. What is the gradient of fused softmax cross entropy with respect to the logits?