← Lessons

quiz vs the machine

Silver1050

Machine Learning

K Means Clustering Revisited

Partitioning points into k groups by iterating assignment and update steps.

4 min read · intro · beat Silver to climb

K Means Clustering Revisited

K means is the workhorse of unsupervised partitioning. It splits a dataset into k clusters so that points in the same group are close to a shared center, called a centroid.

The two step loop

K means alternates between two steps until the assignments stop changing.

  • Assignment step: attach every point to its nearest centroid using Euclidean distance.
  • Update step: move each centroid to the mean of the points assigned to it.

Repeating these steps minimizes the within cluster sum of squares, the total squared distance from points to their centroids.

Practical notes

  • You must choose k in advance, which is a real limitation.
  • The result depends on the random starting centroids, so methods like k means plus plus pick smarter seeds.
  • K means assumes clusters are roughly round and similarly sized, so it struggles with stretched or nested shapes.

Why it converges

Each step can only lower or hold the objective, and there are finitely many assignments, so the loop reaches a stable point. That point may be a local minimum, which is why several restarts are common.

Key idea

K means iterates assignment and update steps to minimize within cluster spread, but needs a chosen k and good seeds.

Check yourself

Answer to earn rating on the learn ladder.

1. What does the update step of k means do?

2. Why are multiple random restarts often used?