Machine Learning

The Logistic Regression Deep

How the sigmoid, log odds, and cross entropy loss turn a linear score into a calibrated probability.

5 min read · core

A linear score becomes a probability

Logistic regression computes a linear score, the dot product of weights and features plus a bias, then passes it through the sigmoid to map any real number into the range zero to one. That output is read as a probability.
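As a minimal sketch of that pipeline, the snippet below computes a score and squashes it with the sigmoid. The weights, bias, and feature values are hypothetical numbers chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and one feature vector (illustrative values only).
w = np.array([0.8, -1.2])
b = 0.5
x = np.array([2.0, 1.0])

score = np.dot(w, x) + b   # linear score: w . x + b = 1.6 - 1.2 + 0.5
prob = sigmoid(score)      # read as the probability of the positive class

print(f"score={score:.2f}  probability={prob:.3f}")
```

A score of zero maps to exactly 0.5, large positive scores approach one, and large negative scores approach zero.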

Log odds and weights

Under the model, the linear score equals the log odds of the positive class, so each weight is the change in log odds per unit increase in its feature, holding the other features fixed. Exponentiating a weight gives an odds ratio, a directly interpretable effect.
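Both claims can be checked numerically. The sketch below reuses hypothetical weights and features (not from any fitted model): it recovers the score from the predicted probability, then raises one feature by one unit and compares the odds before and after.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def odds(p):
    """Odds of the positive class: p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical model and input (illustrative values only).
w = np.array([0.8, -1.2])
b = 0.5
x = np.array([2.0, 1.0])
x_plus = x + np.array([1.0, 0.0])   # same input with feature 0 raised by one unit

score = w @ x + b
p = sigmoid(score)
p_plus = sigmoid(w @ x_plus + b)

log_odds = np.log(odds(p))            # recovers the linear score
odds_ratio = odds(p_plus) / odds(p)   # matches exp(w[0])

print(f"score={score:.4f}  log odds={log_odds:.4f}")
print(f"odds ratio={odds_ratio:.4f}  exp(w[0])={np.exp(w[0]):.4f}")
```

Here a weight of 0.8 means each extra unit of that feature multiplies the odds by about 2.23, whatever the starting point.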

Training with cross entropy

  • The loss is binary cross entropy, also called log loss, which heavily punishes confident wrong predictions.
  • This loss is convex in the weights, so it has no spurious local minima and gradient descent with a suitable step size converges toward the global optimum.
  • The gradient has a clean form: for each example, the difference between predicted probability and true label, times the feature vector.
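The points above can be sketched as a small training loop. This is a minimal illustration on synthetic data, not a production implementation: the data-generating weights, the learning rate, and the step count are all assumptions picked for the demo.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, b, X, y, eps=1e-12):
    """Mean binary cross entropy; clipping avoids log(0)."""
    p = np.clip(sigmoid(X @ w + b), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradients(w, b, X, y):
    """Gradient of the mean loss: (predicted probability - label) times feature."""
    err = sigmoid(X @ w + b) - y
    return X.T @ err / len(y), err.mean()

# Synthetic data drawn from assumed "true" weights (for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([1.5, -2.0]), 0.3
y = (rng.random(200) < sigmoid(X @ true_w + true_b)).astype(float)

# Plain batch gradient descent from zero weights.
w, b = np.zeros(2), 0.0
lr = 0.5
losses = []
for _ in range(500):
    losses.append(log_loss(w, b, X, y))
    grad_w, grad_b = gradients(w, b, X, y)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"first loss={losses[0]:.4f}  last loss={losses[-1]:.4f}")
```

With zero weights every prediction is 0.5, so the first loss is log 2, roughly 0.693. From there the loss falls steadily, which is what convexity and a modest step size would lead one to expect.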

Practical notes

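  • Regularization, L1 or L2, keeps weights from growing without bound when the classes are linearly separable.
  • Because it minimizes log loss, a proper scoring rule, logistic regression often yields reasonably well-calibrated probabilities, though calibration is still worth checking on held-out data.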
  • The decision threshold, often 0.5, can be tuned to trade precision against recall.
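Two of these notes are easy to see on toy data. The sketch below first fits a one-weight model on perfectly separable points with and without an L2 penalty, then sweeps the threshold over a handful of made-up probabilities. The helper names `fit_1d` and `precision_recall`, the penalty strength, and all data values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_1d(x, y, l2=0.0, lr=1.0, steps=200):
    """Gradient descent on mean log loss + (l2 / 2) * w**2, one weight, no bias."""
    w = 0.0
    for _ in range(steps):
        grad = np.mean((sigmoid(w * x) - y) * x) + l2 * w
        w -= lr * grad
    return w

# Perfectly separable toy data: negative x is class 0, positive x is class 1.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

w_plain_short = fit_1d(x, y, steps=200)
w_plain_long = fit_1d(x, y, steps=2000)        # still growing with more steps
w_l2_short = fit_1d(x, y, l2=0.1, steps=200)
w_l2_long = fit_1d(x, y, l2=0.1, steps=2000)   # settles at a finite value

print(f"no penalty: {w_plain_short:.2f} -> {w_plain_long:.2f}")
print(f"L2 penalty: {w_l2_short:.2f} -> {w_l2_long:.2f}")

def precision_recall(probs, labels, threshold):
    """Precision and recall when predicting positive for probs >= threshold.
    Assumes at least one predicted positive and one actual positive."""
    pred = probs >= threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    return tp / (tp + fp), tp / (tp + fn)

# Made-up predicted probabilities and true labels.
probs = np.array([0.95, 0.85, 0.70, 0.60, 0.45, 0.35, 0.20, 0.10])
labels = np.array([1, 1, 0, 1, 1, 0, 0, 0])

for t in (0.3, 0.5, 0.8):
    p, r = precision_recall(probs, labels, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy set, lowering the threshold to 0.3 catches every positive at the cost of more false alarms, while raising it to 0.8 removes the false alarms but misses half the positives.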

Key idea

Logistic regression maps a linear score to a probability with the sigmoid, where the score is the log odds. It trains by minimizing convex cross entropy loss, giving interpretable weights and probabilities that are often well calibrated.

Check yourself

1. What does the linear score in logistic regression represent before the sigmoid?

2. Why can gradient descent reliably find the best logistic regression weights?

3. What does exponentiating a fitted weight give you?