← Lessons

quiz vs the machine

Gold1350

Machine Learning

The Softmax Regression

Generalizing logistic regression to many classes with one weight vector per class.

4 min read · core · beat Gold to climb

From two classes to many

Softmax regression, also called multinomial logistic regression, extends logistic regression to more than two classes. Each class gets its own weight vector, producing one score per class.

The softmax function

The softmax turns a vector of class scores into probabilities that are positive and sum to one. It exponentiates each score and divides by the total, so the largest score becomes the most probable class.

Training

  • The loss is categorical cross entropy, comparing predicted probabilities to the one hot true label.
  • The loss stays convex, so optimization is stable.
  • The gradient again reduces to predicted minus true probability times the feature.

Practical details

  • Scores are shift invariant, so subtracting the maximum score before exponentiating prevents numerical overflow.
  • One weight vector is redundant since probabilities sum to one, but keeping all of them simplifies code and pairs well with regularization.
  • Softmax is the standard output layer for multiclass neural networks for these same reasons.

Key idea

Softmax regression gives each class its own weight vector and uses the softmax to produce probabilities that sum to one. It trains with convex categorical cross entropy and is the canonical multiclass output.

Check yourself

Answer to earn rating on the learn ladder.

1. What property does the softmax output guarantee?

2. Why subtract the maximum score before exponentiating in softmax?