DBSCAN Density Clustering

Finding clusters as dense regions separated by sparse gaps.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) treats clusters as dense regions of points separated by areas of low density. Unlike k-means, it does not need a preset number of clusters and can find clusters that are not round.

Two parameters

DBSCAN is controlled by two settings.

  • Epsilon is the radius that defines a neighborhood around a point.
  • MinPts is the minimum number of points required inside that radius for the region to count as dense.
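The two settings can be sketched as a neighborhood query followed by a density check. This is a minimal illustration, not a library API: the helper name region_query, the sample points, and the Euclidean distance are all assumptions made for the example, and the point itself is counted as part of its own neighborhood.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

# Hypothetical sample data: three nearby points and one far away.
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (5.0, 5.0)]
eps, min_pts = 1.0, 3

neighbors = region_query(points, 0, eps)   # neighborhood of the first point
is_dense = len(neighbors) >= min_pts       # dense if it holds at least MinPts points
print(neighbors, is_dense)
```

Shrinking eps or raising min_pts makes the density test stricter, so fewer regions qualify as dense.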

Point categories

Each point falls into one of three roles.

  • A core point has at least MinPts points within epsilon (most implementations count the point itself).
  • A border point lies within epsilon of a core point but has fewer than MinPts points in its own neighborhood.
  • A noise point is neither core nor border and is left unclustered.
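The three roles can be assigned with a brute-force pass over all pairs of points. The sketch below assumes Euclidean distance and counts a point as a member of its own neighborhood; the function name classify_points and the sample coordinates are hypothetical.

```python
import math

def classify_points(points, eps, min_pts):
    """Return a role ('core', 'border', or 'noise') for each point."""
    n = len(points)
    # Epsilon-neighborhood of every point (each point includes itself).
    neighborhoods = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighborhoods]

    roles = []
    for i in range(n):
        if is_core[i]:
            roles.append("core")
        elif any(is_core[j] for j in neighborhoods[i]):
            roles.append("border")   # near a core point, but not dense itself
        else:
            roles.append("noise")    # no core point within eps
    return roles

points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (1.2, 0.0), (5.0, 5.0)]
roles = classify_points(points, eps=1.0, min_pts=3)
print(roles)
```

Here the point at (1.2, 0.0) has only one other point within reach, so it is not dense, but that neighbor is a core point, which makes it a border point.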

Clusters grow by linking core points that lie within epsilon of one another, then attaching the border points that fall inside those core points' neighborhoods.
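The growth step can be sketched as a breadth-first expansion from each unvisited core point. This is an illustrative, unoptimized version (all pairwise distances are computed up front), not a reference implementation. The names dbscan and NOISE, the sample data, and the choice to give a border point to whichever cluster reaches it first are assumptions of the sketch.

```python
import math
from collections import deque

NOISE = -1  # label for points that end up in no cluster

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks unclustered points."""
    n = len(points)
    neighborhoods = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighborhoods]

    labels = [NOISE] * n
    cluster_id = 0
    for start in range(n):
        # Only an unlabeled core point can seed a new cluster.
        if not is_core[start] or labels[start] != NOISE:
            continue
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            i = queue.popleft()          # i is always a core point
            for j in neighborhoods[i]:
                if labels[j] != NOISE:
                    continue             # already claimed by a cluster
                labels[j] = cluster_id
                if is_core[j]:
                    queue.append(j)      # core points keep expanding
                # border points are attached but never expanded
        cluster_id += 1
    return labels

points = [
    (0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (1.2, 0.0),   # first dense group
    (5.0, 5.0), (5.0, 5.5), (5.5, 5.0),               # second dense group
    (10.0, 0.0),                                      # isolated point
]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

Only core points are placed on the queue, which is why a cluster cannot spread through a border point into a neighboring sparse region.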

Why it is useful

  • It discovers arbitrarily shaped clusters, such as crescents or rings.
  • It labels outliers as noise instead of forcing them into a cluster.
  • It does not need the number of clusters k to be specified in advance.

The main difficulty is choosing epsilon: together with MinPts it sets a single global density threshold, which cannot fit a dataset whose clusters differ widely in tightness.
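A small one-dimensional experiment illustrates the problem. The data are made up for the example: two tight groups that sit close together and one loosely spaced group. The helper names neighbor_count and summarize are hypothetical, and each point is counted in its own neighborhood. For a given epsilon, the sketch reports how many points of the loose group are dense enough to be core, and whether the two tight groups would be linked into a single cluster.

```python
def neighbor_count(xs, x, eps):
    """Number of values in xs within eps of x (x itself included)."""
    return sum(1 for y in xs if abs(x - y) <= eps)

tight_a = [0.0, 0.1, 0.2, 0.3, 0.4]   # spacing 0.1
tight_b = [1.2, 1.3, 1.4, 1.5, 1.6]   # spacing 0.1, only 0.8 away from tight_a
loose_c = [5.0, 6.0, 7.0, 8.0, 9.0]   # spacing 1.0
data = tight_a + tight_b + loose_c
min_pts = 3

def summarize(eps):
    core = {x for x in data if neighbor_count(data, x, eps) >= min_pts}
    loose_core = sum(1 for x in loose_c if x in core)
    # The tight groups merge if a core point of one is within eps
    # of a core point of the other.
    merged = any(
        abs(a - b) <= eps
        for a in tight_a if a in core
        for b in tight_b if b in core
    )
    return loose_core, merged

for eps in (0.15, 1.0):
    print(eps, summarize(eps))
```

With the small radius the tight groups stay separate but the loose group has no core points and is discarded as noise. With the large radius the loose group is recovered, but the two tight groups are joined. No single epsilon handles both on this data.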

Key idea

DBSCAN grows clusters from dense core points using epsilon and MinPts, finds arbitrary shapes, and labels sparse points as noise.

Check yourself

1. What defines a core point in DBSCAN?

2. How does DBSCAN treat points in sparse regions?

3. What is an advantage of DBSCAN over k-means?