The Logistic Regression Deep

How the sigmoid, log odds, and cross entropy loss turn a linear score into a calibrated probability.

A linear score becomes a probability

Logistic regression computes a linear score, z = w · x + b (the dot product of weights and features plus a bias), then passes it through the sigmoid, σ(z) = 1 / (1 + e^(-z)), which maps any real number into the open interval between zero and one. That output is read as the probability of the positive class.
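
As a minimal sketch of the forward pass described above, the following uses only the Python standard library. The weights, bias, and feature values are made up for illustration, and the names sigmoid and predict_proba are hypothetical helpers, not part of any particular library.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def predict_proba(weights, bias, features) -> float:
    """Linear score (dot product plus bias) passed through the sigmoid."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


# Hypothetical parameters and one hypothetical example.
weights = [0.8, -0.5]
bias = -0.2
p = predict_proba(weights, bias, [1.0, 2.0])  # score is -0.4
print(round(p, 4))  # about 0.4013
```

A score of zero maps to exactly 0.5, positive scores map above 0.5, and negative scores map below it.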

Log odds and weights

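The linear score equals the log odds of the positive class, log(p / (1 - p)). Each weight is therefore the change in log odds per unit increase in its feature, with the other features held fixed. Exponentiating a weight gives an odds ratio, the factor by which the odds are multiplied, which is a directly interpretable effect.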
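
The relationship between score, log odds, and odds ratio can be checked numerically. This is a small sketch with made-up weights and inputs; log_odds and score are hypothetical helper names introduced only for this example.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p: float) -> float:
    """Logit: the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


# Hypothetical model parameters.
weights = [0.8, -0.5]
bias = -0.2


def score(features) -> float:
    return sum(w * x for w, x in zip(weights, features)) + bias


x = [1.0, 2.0]
x_plus = [2.0, 2.0]  # first feature raised by one unit, second held fixed

p0 = sigmoid(score(x))
p1 = sigmoid(score(x_plus))

# The log odds of the predicted probability recover the linear score.
print(abs(log_odds(p0) - score(x)) < 1e-9)  # True

# The ratio of odds after and before the one-unit change equals exp(weight).
odds_ratio = (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))
print(round(odds_ratio, 4), round(math.exp(weights[0]), 4))
```

Here a one-unit increase in the first feature multiplies the odds by exp(0.8), roughly 2.23, regardless of the starting point.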

Training with cross entropy

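The loss is binary cross entropy, also called log loss. For one example with label y and predicted probability p it is -[y log p + (1 - y) log(1 - p)], which grows without bound as the probability assigned to the true class approaches zero, so confident wrong predictions are punished heavily.
This loss is convex in the weights, so it has no spurious local minima, and gradient descent with a suitable step size converges toward the global minimum.
The gradient has a clean form: for each example, the difference between the predicted probability and the true label, times the feature vector, averaged over the training set.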
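
The loss and gradient described above can be sketched as a plain batch gradient descent loop. The tiny one-feature dataset, the learning rate, and the step count are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, bias, xi) -> float:
    return sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias)


def log_loss(weights, bias, X, y) -> float:
    """Mean binary cross entropy; probabilities are clipped to avoid log(0)."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(weights, bias, xi), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


def gradients(weights, bias, X, y):
    """Mean of (p - y) * x for the weights and mean of (p - y) for the bias."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for xi, yi in zip(X, y):
        err = predict(weights, bias, xi) - yi
        for j, xij in enumerate(xi):
            grad_w[j] += err * xij
        grad_b += err
    n = len(X)
    return [g / n for g in grad_w], grad_b / n


# Made-up, overlapping (non-separable) data with a single feature.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]

weights, bias = [0.0], 0.0
learning_rate = 0.5  # assumed step size, small enough for this data
initial_loss = log_loss(weights, bias, X, y)  # log(2) at the zero model

for _ in range(2000):
    grad_w, grad_b = gradients(weights, bias, X, y)
    weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b

final_loss = log_loss(weights, bias, X, y)
print(initial_loss, final_loss, weights, bias)
```

Because the classes overlap in this toy data, the minimizer is finite and the loop settles on a positive weight for the feature.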

Practical notes

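Regularization, L1 or L2, keeps the weights finite. Without it, when the classes are perfectly separable the loss can always be lowered by scaling the weights up, so they grow without bound.
Because training minimizes log loss, which rewards honest probability estimates, logistic regression is often well calibrated on many problems. Calibration can still degrade when the linear log odds assumption is a poor fit or when regularization is strong.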
The decision threshold, often 0.5, can be tuned to trade precision against recall.
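
To illustrate the threshold trade-off, here is a small sketch that computes precision and recall at a few thresholds. The predicted probabilities and labels are invented for the example, and precision_recall is a hypothetical helper.

```python
def precision_recall(probs, labels, threshold):
    """Precision and recall when predicting positive for p >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up model outputs and true labels.
probs = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

for threshold in (0.3, 0.5, 0.7):
    precision, recall = precision_recall(probs, labels, threshold)
    print(threshold, round(precision, 3), round(recall, 3))
# 0.3 0.667 1.0
# 0.5 0.75 0.75
# 0.7 1.0 0.5
```

On this data, raising the threshold from 0.3 to 0.7 moves precision from about 0.67 to 1.0 while recall falls from 1.0 to 0.5.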

Key idea

Logistic regression maps a linear score to a probability with the sigmoid, where the score is the log odds of the positive class. It trains by minimizing convex cross entropy loss, giving interpretable weights and, in many cases, well-calibrated probabilities.

Check yourself