← Lessons

quiz vs the machine

Platinum1820

System Design

Cost Based Query Planning

Choosing among equivalent execution plans using statistics to estimate which is cheapest to run.

6 min read · advanced · beat Platinum to climb

Many plans, one query

A single declarative query can run in many ways. The optimizer can reorder joins, pick a broadcast or shuffle join, and choose scan methods. These alternatives are logically equivalent but have wildly different costs. A cost based optimizer estimates each plan's cost and picks the cheapest.

What feeds the estimate

  • Statistics like row counts, distinct values, and column histograms.
  • Cardinality estimation predicts how many rows each operator emits, which drives downstream cost.
  • A cost model converts estimated rows into expected work for cpu, memory, and network.

Why join order dominates

Join order is the biggest lever. Joining two large tables early can explode intermediate rows, while filtering first keeps them small. The optimizer searches the space of orders guided by cardinality estimates.

The fragility

Cost based planning is only as good as its statistics. Stale or missing stats cause bad cardinality estimates and disastrous plans, which is why engines refresh stats and sometimes adapt plans at runtime.

Key idea

A cost based optimizer enumerates equivalent plans and uses statistics and cardinality estimates to choose the cheapest, with join order as the dominant and most fragile decision.

Check yourself

Answer to earn rating on the learn ladder.

1. What does a cost based optimizer choose between?

2. Why is cardinality estimation central?

3. Why can stale statistics ruin a plan?