The Expectation Maximization Algorithm
Many models have hidden variables we cannot observe, such as which cluster a point belongs to. Expectation maximization, or EM, fits such models by alternating between two steps that improve each other.
The chicken and egg problem
If we knew the hidden labels we could fit the parameters easily, and if we knew the parameters we could guess the labels. We know neither, so EM bootstraps from a guess and refines.
The two steps
- E step: using current parameters, compute the probability that each point came from each hidden component, giving soft assignments.
- M step: using those soft assignments as weights, update the parameters to maximize the expected likelihood.
Repeating these steps drives the data likelihood upward every iteration.
Guarantees and limits
EM never decreases the likelihood, so it converges to a stable point. But that point can be a local maximum rather than the global best, so the starting guess matters. Running EM from several random starts and keeping the best result is standard.
Where it appears
EM underlies Gaussian mixtures, hidden Markov model training, and many latent variable models. Its power is turning a hard joint estimation into two simpler steps.
Key idea
EM alternates an E step that infers hidden assignments with an M step that updates parameters, never lowering likelihood.