Guessing How Many Rows
The query planner must choose between scans, index lookups, and join orders. Each choice depends on cardinality, how many rows a step produces. The planner does not run the query to find out; it estimates from statistics gathered by sampling the table.
What Statistics Hold
- The total row count and average row size.
- The number of distinct values per column, which drives equality estimates.
- A histogram dividing a column range into buckets so range predicates can be estimated.
- A list of the most common values and their frequencies for skewed columns.
Using Them
For a predicate like a value less than a threshold, the planner reads the histogram to estimate the matching fraction. For an equality on a common value, it uses the recorded frequency. Multiplying selectivities across predicates yields the estimated row count that feeds cost calculations.
Staleness
Statistics drift as data changes. If they fall behind, estimates go wrong and the planner picks poor plans. Engines refresh statistics automatically after enough changes, and you can trigger a manual refresh.
Key idea
The planner estimates how many rows each step yields from sampled statistics and histograms, so keeping those statistics current is what keeps plan choices sound.