The problem it solves
Expectation maximization (EM) fits models with latent variables, hidden assignments we never observe. A classic case is a Gaussian mixture where each point's cluster is unknown.
- Direct likelihood maximization is hard because of the hidden labels.
- EM splits it into two easier alternating steps.
The two steps
- E step: given current parameters, compute the expected hidden assignments, often soft probabilities of cluster membership.
- M step: given those expected assignments, update the parameters to maximize the expected likelihood.
Repeating these steps never decreases the likelihood.
Behavior
EM climbs the likelihood and converges to a local optimum. The result depends on initialization, so multiple restarts help avoid poor solutions.
EM is a general recipe: any model with clean E and M updates, like mixtures or hidden Markov models, can use it.
Key idea
EM alternates an E step that estimates hidden assignments with an M step that refits parameters, monotonically increasing the likelihood toward a local optimum that depends on initialization.