A linear score becomes a probability
Logistic regression computes a linear score, the dot product of the weights and the features plus a bias, then passes it through the sigmoid function 1 / (1 + e^(-z)), which maps any real number into the open interval between zero and one. That output is read as the probability of the positive class.
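A minimal sketch of this forward pass in plain Python. The function names and the example weights are hypothetical, chosen only for illustration. The sigmoid branches on the sign of z so that `math.exp` never receives a large positive argument and overflows.

```python
import math


def linear_score(weights, features, bias):
    """Dot product of weights and features, plus the bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)).

    Mathematically the result lies strictly between 0 and 1; in floating
    point, very large |z| rounds to exactly 0.0 or 1.0.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so e is in (0, 1) and cannot overflow
    return e / (1.0 + e)


def predict_proba(weights, features, bias):
    """Probability of the positive class for a single example."""
    return sigmoid(linear_score(weights, features, bias))


# Hypothetical model and input: score = 0.8*2.0 - 0.4*1.0 - 0.5 = 0.7
p = predict_proba([0.8, -0.4], [2.0, 1.0], bias=-0.5)
print(round(p, 3))  # 0.668
```

The same score of 0.7 would be produced by many different weight and feature combinations; the sigmoid only sees the score.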
Log odds and weights
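The linear score equals the log odds of the positive class, log(p / (1 - p)). So each weight is the change in log odds per unit increase in its feature, with the other features held fixed. Exponentiating a weight gives an odds ratio, the factor by which the odds are multiplied for each unit increase in that feature, which makes the effect directly interpretable.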
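A small numeric check of both claims, assuming a hypothetical fitted model with weights [0.8, -0.4] and bias -0.5. Converting the predicted probability back to log odds recovers the linear score, and raising one feature by a unit multiplies the odds by the exponential of its weight.

```python
import math


def sigmoid(z):
    """Logistic function; the naive form is fine for the small z used here."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """The logit, log(p / (1 - p)), which inverts the sigmoid."""
    return math.log(p / (1.0 - p))


def odds(p):
    return p / (1.0 - p)


# Hypothetical fitted weights and bias, for illustration only.
weights = [0.8, -0.4]
bias = -0.5


def score(x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


x = [2.0, 1.0]
x_up = [3.0, 1.0]  # feature 0 raised by one unit, feature 1 held fixed

# The log odds of the predicted probability equal the linear score (up to rounding).
recovered = log_odds(sigmoid(score(x)))

# Raising feature 0 by one unit multiplies the odds by exp(weights[0]).
ratio = odds(sigmoid(score(x_up))) / odds(sigmoid(score(x)))
print(round(ratio, 4), round(math.exp(weights[0]), 4))  # 2.2255 2.2255
```

The ratio does not depend on the starting point x, which is why a single odds ratio summarizes each feature's effect.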
Training with cross entropy
- The loss is binary cross entropy, also called log loss, which heavily punishes confident wrong predictions.
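- This loss is convex in the weights, so any local minimum is a global one, and gradient descent with a suitably small step size converges toward the optimum. If the classes are perfectly separable, however, no finite optimum exists: the loss keeps shrinking as the weights grow.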
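- The gradient has a clean form: for each example, the predicted probability minus the true label, multiplied by the feature vector, averaged over the training set. The bias gradient is the same difference without the feature.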
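A minimal training sketch in plain Python, assuming batch gradient descent from all-zero parameters with a fixed learning rate. The function names, learning rate, step count, and toy dataset are hypothetical. The weight gradient is the averaged (p - y) * x described in the list, and the bias uses the same error term without the feature.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, xi):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)


def log_loss(w, b, X, y, eps=1e-12):
    """Mean binary cross entropy; eps keeps log() away from exactly 0."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, b, xi), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


def gradient(w, b, X, y):
    """Mean of (p - y) * x for the weights and mean of (p - y) for the bias."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for xi, yi in zip(X, y):
        err = predict(w, b, xi) - yi
        for j, xj in enumerate(xi):
            grad_w[j] += err * xj
        grad_b += err
    n = len(X)
    return [g / n for g in grad_w], grad_b / n


def fit(X, y, lr=0.5, steps=5000):
    """Plain batch gradient descent starting from all-zero parameters."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        grad_w, grad_b = gradient(w, b, X, y)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical one-feature data. The classes overlap (x=1.5 is positive,
# x=2.0 is negative), so the loss has a finite minimizer.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]
w, b = fit(X, y)
print(log_loss([0.0], 0.0, X, y), log_loss(w, b, X, y))
```

Because the bias gradient is zero at the optimum, the fitted model's mean predicted probability on the training data equals the fraction of positive labels, one concrete sense in which a fit with an intercept is calibrated on average.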
Practical notes
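- Regularization, L1 or L2, keeps the weights finite when the classes are separable; without it, training drives them toward infinity. L1 additionally tends to set some weights exactly to zero.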
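- Because it is fit by minimizing log loss, a proper scoring rule, logistic regression often produces reasonably well calibrated probabilities. This is not guaranteed: a misspecified model or strong regularization can still leave the probabilities miscalibrated.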
- The decision threshold, commonly 0.5, can be tuned to trade precision against recall: raising it usually increases precision and lowers recall, and lowering it does the reverse.
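A minimal sketch of the first and last points, assuming an L2 penalty of (lam / 2) * ||w||^2 added to the mean log loss, an unpenalized bias (a common but not universal choice), and plain batch gradient descent. The data, penalty strength, step counts, thresholds, predicted probabilities, and function names are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l2(X, y, lam, lr=0.5, steps=5000):
    """Gradient descent on mean log loss + (lam / 2) * sum(w_j ** 2).

    The bias is not penalized. lam=0.0 gives the unregularized fit.
    """
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def precision_recall(probs, labels, threshold):
    """Precision and recall when predicting 1 for every p >= threshold."""
    tp = sum(1 for p, t in zip(probs, labels) if p >= threshold and t == 1)
    fp = sum(1 for p, t in zip(probs, labels) if p >= threshold and t == 0)
    fn = sum(1 for p, t in zip(probs, labels) if p < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical separable data: every x below 2 is class 0, every x above is 1.
X = [[0.5], [1.0], [1.5], [2.5], [3.0], [3.5]]
y = [0, 0, 0, 1, 1, 1]
w_short, _ = fit_l2(X, y, lam=0.0, steps=500)
w_long, _ = fit_l2(X, y, lam=0.0, steps=5000)
w_reg, _ = fit_l2(X, y, lam=0.1, steps=5000)
print(w_short[0], w_long[0], w_reg[0])  # unregularized weight keeps growing

# Hypothetical predicted probabilities and true labels for threshold tuning.
probs = [0.1, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
for threshold in (0.3, 0.5, 0.75):
    print(threshold, precision_recall(probs, labels, threshold))
```

On these made-up numbers, raising the threshold from 0.3 to 0.75 moves precision from 4/7 to 1.0 while recall falls from 1.0 to 0.5. The regularized weight stops changing once the penalty gradient balances the loss gradient, whereas the unregularized one is still climbing after 5000 steps.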
Key idea
Logistic regression maps a linear score, interpreted as log odds, to a probability with the sigmoid. It trains by minimizing the convex cross entropy loss, which yields weights with an odds-ratio interpretation and probabilities that are often well calibrated.