← Lessons

quiz vs the machine

Gold1460

Databases

Statistics and Histograms for the Planner

The planner relies on collected statistics to estimate how many rows a query returns.

6 min read · core · beat Gold to climb

Guessing How Many Rows

The query planner must choose between scans, index lookups, and join orders. Each choice depends on cardinality, how many rows a step produces. The planner does not run the query to find out; it estimates from statistics gathered by sampling the table.

What Statistics Hold

  • The total row count and average row size.
  • The number of distinct values per column, which drives equality estimates.
  • A histogram dividing a column range into buckets so range predicates can be estimated.
  • A list of the most common values and their frequencies for skewed columns.

Using Them

For a predicate like a value less than a threshold, the planner reads the histogram to estimate the matching fraction. For an equality on a common value, it uses the recorded frequency. Multiplying selectivities across predicates yields the estimated row count that feeds cost calculations.

Staleness

Statistics drift as data changes. If they fall behind, estimates go wrong and the planner picks poor plans. Engines refresh statistics automatically after enough changes, and you can trigger a manual refresh.

Key idea

The planner estimates how many rows each step yields from sampled statistics and histograms, so keeping those statistics current is what keeps plan choices sound.

Check yourself

Answer to earn rating on the learn ladder.

1. Why does the planner collect statistics?

2. What is a histogram used for?

3. What goes wrong when statistics are stale?