Machine Learning

K-Means Clustering

Grouping unlabeled points around centers that you iteratively refine.

The goal

K-means is an unsupervised method that partitions data into k groups so that the points in each cluster lie close to one another. You choose k in advance, and the algorithm finds the cluster centers, called centroids.
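
One common way to make "close together" precise is the within-cluster sum of squared distances. This is a sketch that assumes squared Euclidean distance; the symbols C_j (the set of points in cluster j) and \mu_j (its centroid) are notation introduced here, not taken from the lesson.

```latex
J = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x
```

A smaller J means points sit closer to their own centroid.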

The loop

K-means alternates between two steps until it converges:

  • Assignment, where each point joins the nearest centroid
  • Update, where each centroid moves to the mean of its assigned points

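Neither step can increase the total within-cluster distance (measured as squared distance to the assigned centroid), so repeating them lowers it until assignments stop changing. That stopping point is a local optimum, not necessarily the best possible clustering.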
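
The two steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names, the `max_iters` cap, the random seeding from data points, and the choice to leave an empty cluster's centroid where it is are all assumptions made for this example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iters=100, seed=0):
    """Return (centroids, labels, cost) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Assumed seeding: k distinct data points picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Total squared distance from each point to its assigned centroid.
    cost = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, cost
```

Calling `kmeans(points, 2)` on two well-separated groups of points should give each group its own label; which group receives label 0 depends on the seed.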

Caveats

  • Results depend on the initial centroids, so smart seeding such as k-means++ helps
  • It assumes roughly round, similarly sized clusters and struggles with odd shapes
  • k is often chosen with the elbow method or silhouette scores
  • Features should be scaled since the method relies on distance
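
Two of these caveats, seeding and scaling, can be sketched in a few lines of plain Python. The helper names `standardize` and `kmeans_pp_seeds` are hypothetical, and the seeding function is a simplified take on the k-means++ idea (pick each new seed with probability proportional to its squared distance from the nearest seed chosen so far). It assumes the data contains at least k distinct points.

```python
import random

def standardize(points):
    """Rescale each feature to mean 0 and standard deviation 1."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; a constant column falls back to 1.0.
    stds = [
        (sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]

def kmeans_pp_seeds(points, k, seed=0):
    """Pick k starting centroids that tend to be spread apart."""
    rng = random.Random(seed)
    seeds = [rng.choice(points)]  # first seed: uniform at random
    while len(seeds) < k:
        # Squared distance from each point to its nearest chosen seed.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, s)) for s in seeds)
            for p in points
        ]
        # Far-away points are more likely to become the next seed;
        # already-chosen points have weight 0 and are never picked again.
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds
```

Standardizing first keeps a feature with a large numeric range from dominating the distance, and the spread-out seeds make a poor local optimum less likely, though not impossible.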

Key idea

K-means iterates assignment and update steps to place k centroids that minimize within-cluster distance. The result is sensitive to initialization and cluster shape.

Check yourself

1. What are the two alternating steps of k-means?

2. Why does initialization matter for k-means?