Gaussian Mixture Clustering

A Gaussian mixture model, or GMM, assumes the data were generated from a mixture of several Gaussian distributions. Each Gaussian component corresponds to one cluster and has its own mean, covariance, and mixing weight. The weights are non-negative and sum to one.
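
Written out, the assumption above corresponds to the following density. The symbols are notation chosen here for illustration, not taken from the original text: K is the number of components, and \pi_k, \mu_k, and \Sigma_k are the weight, mean, and covariance of component k.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1
```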

Soft assignments

Unlike k-means, which gives each point a single label, a GMM produces soft assignments. Every point receives a probability of belonging to each component, called the responsibility, and these probabilities sum to one across components. A point near the boundary between two clusters might be sixty percent in one and forty percent in the other.
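
As a minimal sketch of how a responsibility could be computed, the snippet below applies Bayes' rule to a one-dimensional mixture: each component's weight times its density at the point, divided by the sum over all components. The mixture parameters and the helper names gaussian_pdf and responsibilities are hypothetical, chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior probability of each component given the point x."""
    # Weighted density of x under every component.
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    # Normalizing makes the responsibilities sum to one.
    return [j / total for j in joint]


# Hypothetical two-component mixture (assumed values, not from the text).
weights = [0.5, 0.5]
means = [0.0, 4.0]
variances = [1.0, 1.0]

near_boundary = responsibilities(1.9, weights, means, variances)
near_center = responsibilities(0.0, weights, means, variances)

print(near_boundary)  # roughly [0.60, 0.40]
print(near_center)    # almost all mass on the first component
```

With these assumed parameters, the point at 1.9 sits just left of the midpoint between the two means, so it is split roughly sixty to forty, while the point at 0.0 belongs almost entirely to the first component.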

Fitting with EM

GMMs are typically fit with the expectation-maximization (EM) algorithm, whose two alternating steps mirror the assign-then-update loop of k-means.

E step: compute each point's responsibility for every component, given the current parameters.
M step: update each component's mean, covariance, and weight, using those responsibilities as weights on the points.

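An iteration never decreases the data likelihood, so the alternation is repeated until the likelihood stops improving. The result is typically a local maximum, not necessarily the global one.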
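
The two steps can be sketched for a one-dimensional mixture using only the Python standard library. This is an illustrative sketch, not a reference implementation: the synthetic data, the starting parameters, the fixed iteration count, and the small variance floor are all assumptions made for the example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances):
    """Run one EM iteration.

    Returns the updated parameters and the log-likelihood of the data
    under the parameters that were passed in.
    """
    k = len(weights)
    resp = []
    log_lik = 0.0
    # E step: responsibility of every component for every point.
    for x in data:
        joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        log_lik += math.log(total)
        resp.append([p / total for p in joint])
    # M step: responsibility-weighted updates of weight, mean, and variance.
    new_weights, new_means, new_variances = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)  # effective number of points in component j
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        new_weights.append(nj / len(data))
        new_means.append(mean)
        new_variances.append(max(var, 1e-6))  # assumed floor to avoid a zero variance
    return new_weights, new_means, new_variances, log_lik


# Synthetic data from two assumed clusters centered at 0 and 5.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(5.0, 1.0) for _ in range(200)]

# Deliberately rough starting guess.
weights, means, variances = [0.5, 0.5], [1.0, 3.0], [1.0, 1.0]

log_liks = []
for _ in range(50):
    weights, means, variances, log_lik = em_step(data, weights, means, variances)
    log_liks.append(log_lik)

print(sorted(means))
print(log_liks[0], log_liks[-1])
```

The recorded log-likelihoods should form a non-decreasing sequence, which is a convenient sanity check when writing EM by hand. A practical implementation would stop once the improvement falls below a tolerance instead of running a fixed number of iterations.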

Why covariance matters

Because each component has its own covariance matrix, a GMM can model stretched and tilted ellipses, not just spheres. That flexibility lets it fit elongated or correlated clusters that k-means would split or merge incorrectly. The cost is more parameters to estimate and greater sensitivity to initialization.
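
To make the parameter cost concrete, the sketch below counts free parameters for a mixture with full covariance matrices and, for comparison, for one restricted to spherical components. The function name gmm_parameter_count and the option labels "full" and "spherical" are hypothetical names chosen for this example.

```python
def gmm_parameter_count(n_components, n_dims, covariance="full"):
    """Count the free parameters of a Gaussian mixture model."""
    mean_params = n_components * n_dims
    # Weights sum to one, so the last weight is determined by the others.
    weight_params = n_components - 1
    if covariance == "full":
        # A symmetric d x d matrix has d * (d + 1) / 2 free entries.
        cov_per_component = n_dims * (n_dims + 1) // 2
    elif covariance == "spherical":
        # A single variance shared by all dimensions.
        cov_per_component = 1
    else:
        raise ValueError("covariance must be 'full' or 'spherical'")
    return mean_params + weight_params + n_components * cov_per_component


full_count = gmm_parameter_count(3, 10, "full")
spherical_count = gmm_parameter_count(3, 10, "spherical")

print(full_count)       # 197
print(spherical_count)  # 35
```

For three components in ten dimensions, full covariances account for 165 of the 197 parameters, and the covariance share grows quadratically with the number of dimensions.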

Key idea

A Gaussian mixture model represents data as overlapping Gaussian components fit by expectation-maximization, giving soft, probabilistic cluster memberships.
