The Maximum Likelihood Principle

The idea

Maximum likelihood estimation chooses the model parameters that make the data you actually observed as probable as possible. It underlies many classical methods, including least squares and logistic regression.

How it works

Write the likelihood: the probability (or, for continuous data, the probability density) of the observed data given the parameters.
Treat that as a function of the parameters with the data fixed.
Find the parameter values that maximize it.
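
The three steps above can be sketched on a toy problem. This is a minimal illustration, not a general-purpose estimator: the coin-flip data, the function name likelihood, and the grid of candidate values are all assumptions made up for the example. The model is a coin with unknown probability of heads p, and the flips are treated as independent.

```python
# Hypothetical example data: 8 coin flips, 1 = heads, 0 = tails.
flips = [1, 0, 1, 1, 0, 1, 1, 1]

def likelihood(p, data):
    """Step 1: probability of the observed flips if P(heads) = p."""
    result = 1.0
    for x in data:
        result *= p if x == 1 else (1.0 - p)
    return result

# Step 2: hold the data fixed and vary only the parameter p.
candidates = [i / 1000 for i in range(1, 1000)]

# Step 3: keep the candidate with the highest likelihood.
p_hat = max(candidates, key=lambda p: likelihood(p, flips))

print(p_hat)  # 0.75
```

The grid search lands on 0.75, the observed fraction of heads (6 out of 8), which is also what the closed-form solution for this model gives. A grid is used here only because it mirrors the steps directly; in practice the maximum is usually found analytically or with a numerical optimizer.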

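When the data points are independent, the likelihood is a product of one probability per point, and such products quickly shrink to numbers too small to represent. We therefore usually maximize the log likelihood instead. The logarithm is strictly increasing, so the maximizing parameters are unchanged, and it turns the product into a sum, which is easier to differentiate and numerically more stable.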
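
The numerical problem is easy to reproduce. The sketch below assumes 2000 independent data points that each have probability 0.5 under the model (numbers chosen only for illustration) and compares the raw product with the sum of logs.

```python
import math

# Hypothetical per-point probabilities: 2000 points, each with probability 0.5.
probs = [0.5] * 2000

# Raw likelihood: the product underflows to exactly 0.0 in double precision.
product = 1.0
for q in probs:
    product *= q

# Log likelihood: the same information as a sum, comfortably representable.
log_sum = sum(math.log(q) for q in probs)

print(product)  # 0.0
print(log_sum)  # roughly -1386.29
```

Once the product has collapsed to zero, every parameter setting looks equally bad and there is nothing left to maximize. The sum of logs keeps the differences between parameter settings visible.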

Why it unifies methods

Maximum likelihood quietly produces many familiar results.

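Assuming a linear model with independent Gaussian noise of constant variance recovers ordinary least squares regression.
Assuming Bernoulli outcomes whose probability is a sigmoid of a linear function recovers logistic regression and its log loss.
More generally, the principle gives a single recipe for deriving loss functions: choose a probability model for the data and minimize its negative log likelihood.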
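
Both results follow from a short derivation, sketched here in LaTeX. The notation is an assumption introduced for this sketch: inputs x_i, targets y_i, weight vector w, noise variance \sigma^2, and n independent data points.

```latex
% Gaussian noise: y_i = w^\top x_i + \varepsilon_i,
% with \varepsilon_i \sim \mathcal{N}(0, \sigma^2) independent.
\begin{align*}
\log L(w)
  &= \sum_{i=1}^{n} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left( -\frac{(y_i - w^\top x_i)^2}{2\sigma^2} \right) \right] \\
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 .
\end{align*}
% The first term does not depend on w, so
\[
\hat{w}_{\mathrm{ML}}
  = \arg\max_{w} \log L(w)
  = \arg\min_{w} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 .
\]

% Bernoulli outcomes: y_i \in \{0, 1\},
% with p_i = 1 / (1 + e^{-w^\top x_i}).
\[
-\log L(w)
  = -\sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] .
\]
```

In the Gaussian case, maximizing the log likelihood over w is the same as minimizing the sum of squared residuals. In the Bernoulli case, the negative log likelihood is exactly the log loss (binary cross-entropy) that logistic regression minimizes.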

Caveats

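With little data, maximum likelihood can overfit: it fits the noise in the sample rather than the underlying pattern.
Adding a prior over the parameters leads to maximum a posteriori (MAP) estimation, in which the prior acts as a regularizer.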
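
A small example makes the contrast concrete. The sketch below assumes a coin-flip model with a Beta(2, 2) prior on the probability of heads. The prior, the function names, and the counts are all illustrative choices, not part of the text above. For this model the MAP estimate is the mode of the Beta posterior, which has a closed form when both prior parameters exceed 1.

```python
def bernoulli_mle(heads, n):
    """Maximum likelihood estimate of P(heads): the observed fraction."""
    return heads / n

def bernoulli_map(heads, n, a=2.0, b=2.0):
    """MAP estimate under an assumed Beta(a, b) prior (requires a, b > 1).

    This is the mode of the Beta(heads + a, n - heads + b) posterior.
    """
    return (heads + a - 1.0) / (n + a + b - 2.0)

# Little data: 3 flips, all heads.
print(bernoulli_mle(3, 3))  # 1.0
print(bernoulli_map(3, 3))  # 0.8

# More data: 300 heads in 400 flips. The two estimates nearly agree.
print(bernoulli_mle(300, 400))  # 0.75
print(bernoulli_map(300, 400))
```

With three flips, maximum likelihood concludes that tails is impossible, while the prior pulls the MAP estimate back toward 0.5. With 400 flips the prior has little influence and the two estimates differ by roughly 0.001.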

Key idea