Statistics and Histograms for the Planner

The planner relies on collected statistics to estimate how many rows a query returns.

Guessing How Many Rows

The query planner must choose between scans, index lookups, and join orders. Each choice depends on cardinality, how many rows a step produces. The planner does not run the query to find out; it estimates from statistics gathered by sampling the table.

What Statistics Hold

The total row count and average row size.
The number of distinct values per column, which drives equality estimates.
A histogram dividing a column range into buckets so range predicates can be estimated.
A list of the most common values and their frequencies for skewed columns.

Using Them

For a predicate like a value less than a threshold, the planner reads the histogram to estimate the matching fraction. For an equality on a common value, it uses the recorded frequency. Multiplying selectivities across predicates yields the estimated row count that feeds cost calculations.

Staleness

Statistics drift as data changes. If they fall behind, estimates go wrong and the planner picks poor plans. Engines refresh statistics automatically after enough changes, and you can trigger a manual refresh.

Key idea

The planner estimates how many rows each step yields from sampled statistics and histograms, so keeping those statistics current is what keeps plan choices sound.

Statistics and Histograms for the Planner

Guessing How Many Rows

What Statistics Hold

Using Them

Staleness

Key idea

Check yourself