Statistics and Histograms
Good estimates need good summaries. Databases collect statistics about each column and store histograms that describe how values are distributed.
Basic statistics
For each column, the engine typically tracks the row count, the number of distinct values, the fraction of nulls, and the minimum and maximum values. These let it estimate the selectivity of simple filters cheaply, without scanning the data.
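As a rough illustration of how these numbers turn into estimates, here is a minimal sketch. It assumes values are spread uniformly across the distinct values and across the min-max range, which real data often violates. The `ColumnStats` class and both function names are hypothetical, not taken from any particular engine.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical per-column summary; field names are illustrative."""
    row_count: int
    distinct_count: int
    null_fraction: float
    min_value: float
    max_value: float


def equality_selectivity(stats: ColumnStats) -> float:
    """Estimated fraction of rows matching `col = v`.

    Assumes every distinct value is equally common (uniformity).
    """
    if stats.distinct_count == 0:
        return 0.0
    return (1.0 - stats.null_fraction) / stats.distinct_count


def range_selectivity(stats: ColumnStats, low: float, high: float) -> float:
    """Estimated fraction of rows with low <= col <= high.

    Assumes values are spread evenly between min_value and max_value.
    """
    non_null = 1.0 - stats.null_fraction
    width = stats.max_value - stats.min_value
    if width <= 0:
        # Column holds a single value: it is either inside the range or not.
        return non_null if low <= stats.min_value <= high else 0.0
    lo = max(low, stats.min_value)
    hi = min(high, stats.max_value)
    if hi < lo:
        return 0.0
    return non_null * (hi - lo) / width


stats = ColumnStats(row_count=10_000, distinct_count=50,
                    null_fraction=0.1, min_value=0.0, max_value=100.0)
eq_rows = stats.row_count * equality_selectivity(stats)            # about 180 rows
range_rows = stats.row_count * range_selectivity(stats, 20.0, 40.0)  # about 1800 rows
```

The uniformity assumption is what makes these estimates cheap, and also what makes them wrong on skewed columns.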
Histograms
A histogram divides a column's value range into buckets and records how many rows fall in each. This captures skew, where some values are far more common than others. An equi-depth histogram makes each bucket hold roughly the same number of rows, so dense regions get narrower buckets and therefore finer detail.
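The equi-depth idea can be sketched in a few lines. This is a simplified illustration, not how any specific engine builds its histograms: it sorts the full column rather than sampling, ignores duplicate values that straddle bucket boundaries, and assumes values are uniform within each bucket. The function names `build_equi_depth` and `estimate_range_rows` are hypothetical.

```python
def build_equi_depth(values, num_buckets):
    """Return (low, high, count) buckets holding roughly equal row counts."""
    ordered = sorted(values)
    n = len(ordered)
    buckets = []
    for i in range(num_buckets):
        start = i * n // num_buckets
        end = (i + 1) * n // num_buckets
        if start == end:
            continue  # fewer values than buckets: skip empty slices
        chunk = ordered[start:end]
        buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets


def estimate_range_rows(buckets, low, high):
    """Estimate rows with low <= value <= high.

    Fully covered buckets contribute their whole count; partially covered
    buckets contribute in proportion to the overlapping width.
    """
    total = 0.0
    for b_low, b_high, count in buckets:
        if high < b_low or low > b_high:
            continue  # no overlap with this bucket
        if b_high == b_low:
            total += count  # single-valued bucket lies entirely inside the range
            continue
        overlap = min(high, b_high) - max(low, b_low)
        total += count * overlap / (b_high - b_low)
    return total


# Skewed sample: eight values packed into 1..8, four spread out to 200.
values = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100, 150, 200]
buckets = build_equi_depth(values, num_buckets=4)
# buckets == [(1, 3, 3), (4, 6, 3), (7, 50, 3), (100, 200, 3)]
estimated = estimate_range_rows(buckets, 1, 6)
# estimated == 6.0, matching the six actual rows in that range
```

Note how the dense region between 1 and 8 is covered by narrow buckets while the sparse tail gets wide ones. A uniform min-max estimate over the same data would put only a small fraction of rows in the range 1 to 6.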
- Distinct counts estimate equality selectivity.
- Histograms estimate range selectivity and handle skew.
- Stale statistics cause bad estimates, so engines refresh them, typically automatically after enough rows change or on demand.
Key idea
Statistics and histograms summarize column distributions so the optimizer can estimate selectivity, and keeping them fresh is essential for good plans.