The Silhouette Score
The silhouette score measures the quality of a clustering without using any ground-truth labels. For each point it captures whether the point sits comfortably inside its own cluster or close to the boundary with another one.
The per-point silhouette
For each point we compute two distances.
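- a is the average distance from the point to the other points in its own cluster. It measures cohesion: smaller is better.
- b is the average distance from the point to the points of the nearest other cluster, where "nearest" means the cluster with the smallest such average. It measures separation: larger is better.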
The silhouette of the point is s = (b - a) / max(a, b), that is, b minus a divided by the larger of the two. Because of this normalization the value always lies between minus one and plus one.
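To make the definition concrete, here is a minimal Python sketch that computes the per-point silhouette directly from a and b as defined above. It assumes Euclidean distance and points given as coordinate tuples; the function name `per_point_silhouette` is a hypothetical helper, not part of any library. It also assumes the common convention that a point alone in its cluster gets a silhouette of zero, since a is undefined for it.

```python
import math
from collections import defaultdict


def per_point_silhouette(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every point, using Euclidean distance.

    points: sequence of coordinate tuples; labels: cluster id for each point.
    Assumed convention: points alone in their cluster get s = 0.
    """
    # Group point indices by cluster label.
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # singleton cluster: a is undefined, use s = 0
            continue
        # a: mean distance to the other members of the point's own cluster (cohesion).
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster (separation).
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy 1-D clustering in which the point at 9 is deliberately put in cluster 0.
pts = [(0.0,), (1.0,), (9.0,), (10.0,), (11.0,)]
print([round(s, 2) for s in per_point_silhouette(pts, [0, 0, 0, 1, 1])])
# -> [0.52, 0.53, -0.82, 0.85, 0.87]
```

The misassigned point at 9 is the only one with a negative score; every other point is closer on average to its own cluster than to the other one.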
Reading the values
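- Near plus one, the point is well inside its cluster and far from the other clusters.
- Near zero, the point sits on or very close to the boundary between two clusters.
- Negative values mean the point is, on average, closer to a neighboring cluster than to its own, which suggests it may have been assigned to the wrong cluster.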
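A small worked example may help make the sign of s concrete. It assumes one-dimensional points with absolute distance and a made-up clustering A = {0, 1, 9}, B = {10, 11}, in which the point 9 has deliberately been placed in the wrong cluster (these numbers are illustrative only). Since there is only one other cluster, b is simply the average distance to B for points in A.

```latex
\begin{aligned}
x = 0:\quad a &= \tfrac{1 + 9}{2} = 5, \quad b = \tfrac{10 + 11}{2} = 10.5, \quad s = \tfrac{10.5 - 5}{\max(5,\ 10.5)} \approx 0.52 \\
x = 9:\quad a &= \tfrac{9 + 8}{2} = 8.5, \quad b = \tfrac{1 + 2}{2} = 1.5, \quad s = \tfrac{1.5 - 8.5}{\max(8.5,\ 1.5)} \approx -0.82
\end{aligned}
```

The point at 0 sits comfortably in A, while the point at 9 is much closer on average to B than to the rest of A, which is exactly what its negative value flags.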
Using it to choose k
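Averaging the silhouette over all points gives a single score for the whole clustering. Running the clustering for several values of k (starting at two, since b and hence the silhouette are undefined when there is only one cluster) and keeping the k with the highest average silhouette is often a more principled alternative to the elbow method: the elbow method looks only at within-cluster variance, whereas the silhouette balances cohesion against separation.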
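The selection procedure can be sketched as follows. This is a minimal, self-contained illustration under several assumptions: Euclidean distance, plain Lloyd's k-means with a deterministic farthest-point initialization standing in for whatever clustering algorithm is actually used, and a toy dataset of three well-separated blobs. The helpers `kmeans`, `mean_silhouette` and `choose_k` are hypothetical names, not a library API.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from its nearest already-chosen center.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points (Euclidean distance)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # assumed convention: singleton clusters contribute s = 0
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def choose_k(points, k_values):
    """Cluster with each k and return the k with the highest mean silhouette."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Three well-separated toy blobs of four points each.
blobs = [(0, 0), (1, 0), (0, 1), (1, 1),
         (10, 0), (11, 0), (10, 1), (11, 1),
         (0, 10), (1, 10), (0, 11), (1, 11)]
best_k, scores = choose_k(blobs, range(2, 6))  # k = 1 skipped: silhouette undefined
print(best_k)  # -> 3
print({k: round(v, 2) for k, v in scores.items()})
```

Every point needs its distance to every other point, so this costs O(n²) distance evaluations. For real workloads a library routine such as scikit-learn's `sklearn.metrics.silhouette_score`, which can also score a random subsample through its `sample_size` argument, is usually the more practical choice.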
Key idea
The silhouette score compares within-cluster cohesion with nearest-cluster separation for every point, giving a label-free way to judge and tune a clustering.