Machine Learning

The Maximum Likelihood Principle

Pick the parameters that make the observed data most probable.

The idea

Maximum likelihood estimation (MLE) chooses the model parameters under which the data you actually observed is as probable as possible. Many classical methods, including least squares regression and logistic regression, can be derived from it.

How it works

  • Write the likelihood: the probability (or probability density, for continuous data) of the data given the parameters.
  • Treat that as a function of the parameters with the data fixed.
  • Find the parameter values that maximize it.
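
The three steps above can be sketched for a coin with unknown probability of heads. This is a minimal illustration, not a general-purpose estimator: the flip data, the function name likelihood, and the grid of candidate values are all assumptions made for the example, and the flips are assumed to be independent.

```python
import numpy as np

# Hypothetical data: 10 coin flips, 1 = heads, 0 = tails (7 heads in total).
flips = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])

def likelihood(p, data):
    """Step 1: probability of the observed flips if P(heads) = p,
    assuming the flips are independent."""
    return np.prod(np.where(data == 1, p, 1.0 - p))

# Step 2: hold the data fixed and treat the likelihood as a function of p.
grid = np.linspace(0.01, 0.99, 99)  # candidate values 0.01, 0.02, ..., 0.99
values = np.array([likelihood(p, flips) for p in grid])

# Step 3: pick the candidate with the largest likelihood.
p_hat = grid[np.argmax(values)]
print(p_hat)  # close to 0.7, the observed fraction of heads
```

A grid search is used here only because it mirrors the three steps directly. For this model the maximum can also be found in closed form, and it equals the fraction of heads in the sample.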

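For independent data points, the likelihood is a product of per-point probabilities, which quickly shrinks to numbers too small to represent in floating point. We therefore usually maximize the log likelihood instead. The logarithm is monotonically increasing, so the maximizing parameters are unchanged, and it turns the product into a sum, which is easier to differentiate and numerically more stable.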
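
The stability point can be seen numerically. The sketch below assumes 2000 independent draws from a standard normal distribution (synthetic data generated for the example) and uses hypothetical helper names normal_pdf and normal_logpdf. The raw product of densities underflows to zero, while the sum of log densities stays finite and can still be maximized.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 2000 independent draws from a standard normal.
x = rng.normal(loc=0.0, scale=1.0, size=2000)

def normal_pdf(x, mu, sigma):
    """Gaussian density evaluated at each point."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def normal_logpdf(x, mu, sigma):
    """Log of the Gaussian density, computed without exponentiating."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)

# Product of 2000 densities: too small for a float, so it underflows to 0.0.
raw_likelihood = np.prod(normal_pdf(x, 0.0, 1.0))

# Sum of 2000 log densities: a large negative but perfectly finite number.
log_likelihood = np.sum(normal_logpdf(x, 0.0, 1.0))

# Maximizing the log likelihood over a grid of candidate means (sigma fixed at 1).
mu_grid = np.linspace(-1.0, 1.0, 201)
ll = np.array([np.sum(normal_logpdf(x, mu, 1.0)) for mu in mu_grid])
mu_hat = mu_grid[np.argmax(ll)]  # lands on the grid point nearest the sample mean
```

With the raw likelihood equal to zero for every candidate mean, there would be nothing left to compare. The log likelihood keeps the candidates distinguishable.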

Why it unifies methods

Maximum likelihood quietly produces many familiar results.

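  • Assuming independent Gaussian noise with constant variance around a linear model recovers ordinary least squares regression.
  • Assuming Bernoulli outcomes whose probability is a sigmoid of a linear function of the inputs recovers logistic regression; its negative log likelihood is the log loss.
  • More generally, the negative log likelihood of the assumed noise model serves as the loss, so the principle gives a single recipe for deriving losses.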
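
The Gaussian case can be checked numerically. In the sketch below, the synthetic line y = 2x + 1, the noise level sigma, and the helper names sse and gaussian_nll are assumptions made for illustration. The Gaussian negative log likelihood differs from the scaled sum of squared errors only by a constant that does not depend on the weights, so both are minimized by the same weights.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: y = 2x + 1 plus Gaussian noise with known sigma.
sigma = 0.5
x = rng.uniform(-3.0, 3.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma, size=200)
X = np.column_stack([x, np.ones_like(x)])  # columns: slope, intercept

def sse(w):
    """Sum of squared errors, the least squares objective."""
    r = y - X @ w
    return float(r @ r)

def gaussian_nll(w):
    """Negative log likelihood under independent N(0, sigma^2) noise."""
    r = y - X @ w
    n = len(y)
    return float(0.5 * n * np.log(2.0 * np.pi * sigma**2) + (r @ r) / (2.0 * sigma**2))

# Ordinary least squares solution.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any other weights, here an arbitrary perturbation of the OLS solution.
w_other = w_ols + np.array([0.1, -0.2])

# NLL minus SSE / (2 sigma^2) is the same constant for both sets of weights,
# so ranking weights by NLL is the same as ranking them by SSE.
offset_a = gaussian_nll(w_ols) - sse(w_ols) / (2.0 * sigma**2)
offset_b = gaussian_nll(w_other) - sse(w_other) / (2.0 * sigma**2)
```

The same style of argument applies to the Bernoulli case, where the negative log likelihood of the labels is the log loss itself.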

Caveats

  • With little data, maximum likelihood can overfit, matching the noise in the sample rather than the underlying pattern.
  • Adding a prior over the parameters leads to maximum a posteriori (MAP) estimation, which acts as regularization.
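
A small worked case shows both caveats. Suppose a coin is flipped three times and lands heads every time. The data and the Beta(2, 2) prior below are assumptions chosen for illustration; the MAP formula is the mode of the resulting Beta posterior.

```python
# Hypothetical data: 3 flips, all heads.
heads, flips = 3, 3

# Maximum likelihood: the observed fraction of heads.
# With so little data it claims tails can never happen.
p_mle = heads / flips  # 1.0

# MAP with a Beta(a, b) prior on P(heads); a = b = 2 is an assumed prior
# that mildly favors a fair coin. The posterior is Beta(heads + a, tails + b),
# and its mode is the MAP estimate.
a, b = 2.0, 2.0
p_map = (heads + a - 1.0) / (flips + a + b - 2.0)  # (3 + 1) / (3 + 2) = 0.8

print(p_mle, p_map)
```

The prior behaves like a few extra imagined flips, pulling the estimate away from the extreme value. As more real data arrives, its influence shrinks and the MAP estimate approaches the maximum likelihood one.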

Key idea

Maximum likelihood selects the parameters that make the observed data most probable, unifying least squares and log loss under one principle.

Check yourself

1. What does maximum likelihood estimation maximize?

2. Why do we usually work with the log likelihood?

3. Which method does maximum likelihood recover under Gaussian noise?