DBSCAN Density Clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) treats clusters as dense regions of points separated by regions of low density. Unlike k-means, it does not need the number of clusters in advance and can find clusters that are not round.
Two parameters
DBSCAN is controlled by two settings.
- Epsilon is the radius that defines a neighborhood around a point.
- MinPts is the minimum number of points required inside that radius for the region to count as dense.
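The following is a minimal sketch of the neighborhood query that these two parameters control. It assumes Euclidean distance, a closed radius (distance ≤ epsilon), and a neighborhood that includes the point itself. The names `region_query` and `is_dense` are hypothetical and do not come from any library.

```python
import math

Point = tuple[float, float]

def region_query(points: list[Point], i: int, eps: float) -> list[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_dense(points: list[Point], i: int, eps: float, min_pts: int) -> bool:
    """True if the eps-neighborhood of points[i] holds at least min_pts points."""
    return len(region_query(points, i, eps)) >= min_pts

points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
print(region_query(points, 0, eps=1.0))         # [0, 1, 2]
print(is_dense(points, 0, eps=1.0, min_pts=3))  # True
print(is_dense(points, 3, eps=1.0, min_pts=3))  # False: only the point itself
```

Counting the point itself matches the original DBSCAN formulation. Some implementations count only the other points, which shifts the effective MinPts by one.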
Point categories
Each point falls into one of three roles.
- A core point has at least MinPts points within epsilon of it (in the original formulation, the point itself is counted).
- A border point is not a core point itself but lies within epsilon of a core point.
- A noise point is neither a core point nor a border point and is left unclustered.
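Once every neighborhood is known, all three roles can be assigned in one pass. This illustrative sketch uses a hypothetical `classify` function. It assumes Euclidean distance, a closed radius (distance ≤ epsilon), and that each point counts toward its own neighborhood.

```python
import math

def classify(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'."""
    # Neighborhood of each point, including the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not dense, but within eps of a core point
        else:
            labels.append("noise")   # not dense and not near any core point
    return labels

points = [(0, 0), (0.5, 0), (0, 0.5), (1.4, 0), (9, 9)]
print(classify(points, eps=1.0, min_pts=3))
# ['core', 'core', 'core', 'border', 'noise']
```

Here (1.4, 0) has only two points in its neighborhood, so it is not core. One of those points, (0.5, 0), is core, so (1.4, 0) becomes a border point.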
Clusters grow by linking core points that lie within epsilon of one another, then attaching the border points that fall within epsilon of those core points.
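The growth procedure can be sketched as a breadth-first expansion from each unvisited core point. This minimal from-scratch version uses a hypothetical `dbscan` function, brute-force O(n²) neighbor search, and Euclidean distance. It is not an optimized library implementation; practical versions typically use a spatial index for the neighbor queries.

```python
import math
from collections import deque

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) or NOISE (-1) for each point."""
    def neighbors(i):
        # eps-neighborhood of point i, including i itself
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster    # i is a core point: start a new cluster
        queue = deque(nb)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # known non-core: attach as border, do not expand
                continue
            if labels[j] is not None:
                continue             # already assigned
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)   # j is core: its neighbors join the cluster too
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 0),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 20)]
print(dbscan(points, eps=1.5, min_pts=3))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The point (2.2, 0) has only two points in its neighborhood, so it joins cluster 0 as a border point and does not extend the cluster any further. (5, 20) has no neighbors and stays noise. With this ordering, a border point within reach of two clusters keeps whichever cluster reaches it first.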
Why it is useful
- It discovers arbitrarily shaped clusters, such as crescents or rings.
- It labels outliers as noise instead of forcing them into a cluster.
- It needs no value of k.
The main difficulty is choosing epsilon. A single epsilon and MinPts set one global density threshold, which does not fit datasets whose clusters differ greatly in density.
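One heuristic for picking epsilon, proposed in the original DBSCAN paper, is to compute each point's distance to its k-th nearest neighbor, sort these distances, and look for a sharp bend in the resulting curve. This sketch uses a hypothetical `k_distances` function and assumes Euclidean distance. It sets k = MinPts - 1: when a point counts toward its own neighborhood, the point is core exactly when its k-th nearest other point lies within epsilon.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted descending.

    Assumes 1 <= k < len(points).
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)

min_pts = 3
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
print(k_distances(points, k=min_pts - 1))
# The outlier at (8, 8) has a k-distance above 10; the other four are all 1.0.
```

The jump between the outlier's value and the flat tail suggests an epsilon slightly above 1.0 for this toy data. On real data the bend is often less distinct, so this is a starting point rather than a rule.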
Key idea
DBSCAN grows clusters from dense core points using epsilon and MinPts, finds arbitrary shapes, and labels sparse points as noise.