The Optimizer's Hardest Guess
Before running a query, the optimizer must guess how many rows each step will produce. This guess is the cardinality estimate. It drives every choice: which index to use, which join algorithm, and the join order. A wrong estimate leads to a slow plan.
Where Estimates Come From
- The engine keeps statistics such as row counts and distinct values.
- A histogram describes how values are distributed in a column.
- For a filter, it multiplies total rows by an estimated selectivity.
Why Estimates Go Wrong
- Filters on correlated columns break the independence assumption and compound error.
- Stale statistics after big data changes mislead the planner.
- Errors multiply through a chain of joins, so a small mistake explodes.
Helping It
- Keep statistics fresh by running analyze after large changes.
- Use multi column statistics for correlated predicates.
Key idea
Cardinality estimation predicts row counts from statistics and histograms to drive plan choices, and because errors multiply through joins, fresh and correlation aware statistics are critical.