The idea
Maximum likelihood estimation chooses model parameters that make the data you actually observed as probable as possible. It is the engine behind many classical methods.
How it works
- Write the likelihood: the probability (or, for continuous data, the probability density) of the data given the parameters.
- Treat that as a function of the parameters with the data fixed.
- Find the parameter values that maximize it.
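The three steps above can be written compactly. As a sketch, assume the data are n independent observations x_1, ..., x_n with mass or density function p(x | \theta); these symbols are illustrative choices and are not fixed by the text.

```latex
L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta),
\qquad
\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} L(\theta)
```

Here L is read as a function of \theta alone, because the observations x_i are held fixed.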
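Because the probabilities of many independent data points multiply into vanishingly small numbers, we usually maximize the log likelihood instead. The logarithm is strictly increasing, so the maximizing parameters are unchanged, and it turns products into sums, which are easier to differentiate and numerically more stable.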
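A minimal sketch of the recipe for a coin with unknown heads probability, using only the standard library. The data, the grid search, and the names `flips`, `log_likelihood`, and `p_hat` are hypothetical, chosen for illustration. It also shows why the raw product is avoided in practice.

```python
import math

# Hypothetical data: 1 = heads, 0 = tails (7 heads in 10 flips).
flips = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

def log_likelihood(p, data):
    """Bernoulli log likelihood: sum of the log-probability of each flip."""
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in data)

# Crude maximization: evaluate the log likelihood on a grid inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, flips))

# For this model the maximizer also has a closed form: the sample mean.
p_closed_form = sum(flips) / len(flips)

# Why logs: the raw likelihood of many points underflows in floating point,
# while the log likelihood of the same data stays finite.
big_data = flips * 1000  # 10,000 flips
raw_likelihood = math.prod(p_hat if x == 1 else 1.0 - p_hat for x in big_data)
big_log_likelihood = log_likelihood(p_hat, big_data)

print(p_hat, p_closed_form)
print(raw_likelihood, big_log_likelihood)
```

A grid search is used only to keep the example transparent; real problems typically use calculus or a numerical optimizer.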
Why it unifies methods
Maximum likelihood quietly produces many familiar results.
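- Assuming Gaussian noise with constant variance in a linear model recovers ordinary least squares regression.
- Assuming Bernoulli outcomes whose probability is a sigmoid of a linear score recovers logistic regression and its log loss.
- The principle gives a single recipe for deriving losses: choose a distribution for the data, then minimize the negative log likelihood.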
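The two correspondences above can be checked numerically. The sketch below assumes NumPy, synthetic data, and a known noise scale `sigma`; all variable names and parameter values are hypothetical. It verifies that the least squares solution is a stationary point of the Gaussian negative log likelihood, and that the Bernoulli negative log likelihood is the same number as the summed log loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])  # intercept column plus one feature
sigma = 0.5                           # assumed known noise scale
y = 1.5 + 2.0 * x + rng.normal(scale=sigma, size=n)

def gaussian_nll(w):
    """Negative log likelihood of y under y = X @ w + Gaussian noise."""
    resid = y - X @ w
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum(resid**2) / (2 * sigma**2)

# Ordinary least squares solution.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient of the Gaussian NLL with respect to w; it vanishes at w_ols.
grad = -X.T @ (y - X @ w_ols) / sigma**2

# Bernoulli outcomes with a sigmoid of a linear score (small subset so the
# raw product below does not underflow).
Xs, xs = X[:50], x[:50]
t = (rng.random(50) < 1 / (1 + np.exp(-(0.5 + 1.0 * xs)))).astype(float)
w = np.array([0.2, 0.8])              # arbitrary parameters to evaluate
p = 1 / (1 + np.exp(-Xs @ w))

log_loss = -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))
neg_log_lik = -np.log(np.prod(np.where(t == 1, p, 1 - p)))

print(grad, log_loss, neg_log_lik)
```

The noise scale only rescales and shifts the Gaussian objective, which is why it does not affect which weights are optimal.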
Caveats
- With little data, maximum likelihood can overfit by fitting noise: three heads in three coin flips gives an estimated heads probability of 1.
- Adding a prior leads to maximum a posteriori estimation, which regularizes the estimate by pulling it toward values the prior considers plausible.
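Both caveats can be seen in a coin-flip sketch. The Beta(2, 2) prior and the function names below are assumptions made for illustration; with a Beta(a, b) prior and a, b > 1, the posterior mode has the closed form used here.

```python
def mle_heads(k, n):
    """Maximum likelihood estimate of the heads probability: k heads in n flips."""
    return k / n

def map_heads(k, n, a=2.0, b=2.0):
    """Maximum a posteriori estimate under a Beta(a, b) prior (requires a, b > 1).

    The posterior is Beta(a + k, b + n - k), and this is its mode.
    """
    return (k + a - 1.0) / (n + a + b - 2.0)

# Little data: three heads in three flips.
small_mle = mle_heads(3, 3)       # extreme estimate at the boundary
small_map = map_heads(3, 3)       # pulled toward 0.5 by the prior

# Plenty of data: the prior's influence fades.
large_mle = mle_heads(700, 1000)
large_map = map_heads(700, 1000)

print(small_mle, small_map)
print(large_mle, large_map)
```

With three flips the prior moves the estimate substantially; with a thousand flips the two estimates nearly coincide.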
Key idea
Maximum likelihood selects the parameters that make the observed data most probable, unifying least squares and log loss under one principle.