Databases

Index Selectivity

Why the fraction of distinct values decides whether an index is worth using.

What Selectivity Means

Selectivity measures how well an index narrows a search. For a column, it is the ratio of distinct values to total rows, so it ranges from just above 0 to 1. A column where almost every value is unique, like an email address, has high selectivity (close to 1). A column with only a handful of values, like a boolean flag, has low selectivity (close to 0).
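As a rough illustration of the ratio described above, the following sketch computes distinct values divided by total rows for two made-up columns. The function name column_selectivity and the sample data are assumptions for this example, not part of any database engine.

```python
def column_selectivity(values):
    """Return distinct values divided by total rows (0.0 for an empty column)."""
    total = len(values)
    if total == 0:
        return 0.0
    return len(set(values)) / total


# Hypothetical sample data: 1000 unique emails and a boolean flag.
emails = [f"user{i}@example.com" for i in range(1000)]
flags = [i % 2 == 0 for i in range(1000)]

email_selectivity = column_selectivity(emails)  # 1000 distinct / 1000 rows
flag_selectivity = column_selectivity(flags)    # 2 distinct / 1000 rows
```

Here the email column scores 1.0 and the flag column scores 0.002, which matches the high versus low selectivity contrast in the text.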

Why It Drives Index Choice

The query planner estimates how many rows a predicate returns before deciding whether to use an index.

  • A highly selective predicate returns few rows, so an index seek saves a lot of work.
  • A poorly selective predicate matches a large share of the rows, so following the index and then fetching each row often costs more than a plain table scan.

Indexing a gender or status column rarely helps: each value matches a large share of the table, and fetching those rows through the index means reading scattered pages, which is slower than scanning the table sequentially.
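The trade-off above can be sketched with a toy cost model. Everything here is an assumption for illustration: the function names, the 4:1 ratio of random to sequential page cost, and the worst-case rule of one random page read per matching row. Real planners use more detailed models.

```python
def scan_cost(table_pages, seq_page_cost=1.0):
    """Cost of reading every page of the table sequentially."""
    return table_pages * seq_page_cost


def index_cost(matching_rows, random_page_cost=4.0):
    """Worst case: one random page fetch per matching row."""
    return matching_rows * random_page_cost


def choose_access_path(total_rows, rows_per_page, matching_fraction):
    """Return "index" or "scan", whichever the toy model estimates as cheaper."""
    table_pages = -(-total_rows // rows_per_page)  # ceiling division
    matching_rows = total_rows * matching_fraction
    if index_cost(matching_rows) < scan_cost(table_pages):
        return "index"
    return "scan"


# Hypothetical table: 1,000,000 rows, 100 rows per page.
unique_lookup = choose_access_path(1_000_000, 100, 0.000001)  # about 1 row
half_table = choose_access_path(1_000_000, 100, 0.5)          # 500,000 rows
```

With these made-up numbers, the single-row lookup picks the index and the half-table predicate picks the scan. Under this model the break-even point sits at a small fraction of the table, which is why low selectivity columns so rarely benefit.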

Estimating It

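Engines keep statistics such as histograms and distinct value counts. For a given predicate, they divide the estimated number of matching rows by total rows; the smaller that fraction, the more selective the predicate. The planner then compares the estimated cost of an index seek against the cost of a scan.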
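A minimal sketch of this kind of estimate for an equality predicate follows. It assumes the engine stores a distinct value count plus a small table of frequencies for the most frequent values (a simple stand-in for a histogram), and that all other values are spread evenly. The function name and the statistics shown are hypothetical; actual engines differ in what they store and how they combine it.

```python
def estimate_matching_fraction(value, n_distinct, known_freqs):
    """Estimate the fraction of rows where column == value.

    known_freqs maps frequent values to their share of the rows.
    Values not listed are assumed to split the remaining rows evenly.
    """
    if value in known_freqs:
        return known_freqs[value]
    remaining_fraction = 1.0 - sum(known_freqs.values())
    remaining_distinct = n_distinct - len(known_freqs)
    if remaining_distinct <= 0:
        return 0.0
    return remaining_fraction / remaining_distinct


# Hypothetical statistics for a status column with 4 distinct values.
status_freqs = {"active": 0.90, "inactive": 0.07}

active_fraction = estimate_matching_fraction("active", 4, status_freqs)
banned_fraction = estimate_matching_fraction("banned", 4, status_freqs)
```

In this example a predicate on "active" is estimated to match 90% of rows, while one on "banned" is estimated at roughly 1.5%. The same column can therefore produce both a poorly selective and a fairly selective predicate, depending on the value being searched for.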

Key idea

Selectivity is the share of distinct values in a column, and an index pays off only when a predicate is selective enough to return few rows.

Check yourself

1. Which column has the highest selectivity?

2. Why might the planner skip an index on a low selectivity column?