The Dremel Idea
BigQuery descends from Dremel, a system built to query huge datasets in seconds. It combines columnar storage with a multi-level serving tree that fans a query out across thousands of workers and merges the results back up.
The Serving Tree
A root node receives the query and rewrites it. It splits the work into pieces and hands them to intermediate nodes, which subdivide the pieces further and pass them to leaf nodes that scan storage. Partial results flow back up the tree, aggregating at each level until the root returns the answer.
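The fan-out and merge described above can be sketched in a few lines of Python. This is a toy model, not Dremel's or BigQuery's actual implementation: the names leaf_scan, merge, serve, and query_avg, the fanout parameter, and the use of in-memory lists as "shards" are all assumptions made for illustration. The query computed is an average, carried up the tree as (sum, count) pairs so that partial results stay small.

```python
from concurrent.futures import ThreadPoolExecutor


def leaf_scan(shard):
    """Leaf node: scan one shard and return a partial (sum, count)."""
    return (sum(shard), len(shard))


def merge(partials):
    """Inner node: combine the children's (sum, count) pairs into one pair."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return (total, count)


def serve(shards, fanout=2):
    """Hypothetical serving-tree node.

    A node holding one shard acts as a leaf and scans it. Any other node
    splits its shards into at most `fanout` groups, hands each group to a
    child in parallel, and merges the partial results that come back.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not shards:
        return (0, 0)
    if len(shards) == 1:
        return leaf_scan(shards[0])

    size = -(-len(shards) // fanout)  # ceiling division
    groups = [shards[i:i + size] for i in range(0, len(shards), size)]

    # Each child runs in its own thread to mimic parallel workers.
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        partials = list(pool.map(lambda g: serve(g, fanout), groups))
    return merge(partials)


def query_avg(shards, fanout=2):
    """Root node: turn the final (sum, count) into the answer."""
    total, count = serve(shards, fanout)
    return total / count
```

For example, query_avg([[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]) returns 5.5. Each node passes only two numbers to its parent, regardless of how many rows its subtree scanned.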
Why It Is Fast
- Massive parallelism: thousands of leaves scan shards at once.
- Columnar reads: only requested columns leave storage.
- In-tree aggregation: partial sums combine on the way up, shrinking the data at each hop.
- Serverless pooling: workers come from a shared pool, so there is no cluster to manage.
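To make the columnar-reads point concrete, the sketch below compares how many values a scan touches under a row layout and a column layout. It is only an illustration under simple assumptions: the table contents, the function names scan_rows and scan_columnar, and the "values read" counter are hypothetical stand-ins for real storage I/O.

```python
# The same three-row table in two layouts (made-up data).
rows = [
    {"user": "a", "country": "US", "bytes": 120},
    {"user": "b", "country": "DE", "bytes": 300},
    {"user": "c", "country": "US", "bytes": 80},
]
columns = {
    "user": ["a", "b", "c"],
    "country": ["US", "DE", "US"],
    "bytes": [120, 300, 80],
}


def scan_rows(rows, wanted):
    """Row layout: every field of every row is read, then filtered."""
    values_read = sum(len(r) for r in rows)
    picked = {name: [r[name] for r in rows] for name in wanted}
    return picked, values_read


def scan_columnar(columns, wanted):
    """Column layout: only the requested columns are read at all."""
    picked = {name: columns[name] for name in wanted}
    values_read = sum(len(v) for v in picked.values())
    return picked, values_read
```

Asking for only the bytes column, scan_rows touches 9 values while scan_columnar touches 3. Both return the same data, and the gap widens as tables gain columns that a query does not need.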
Key Idea
The Dremel serving tree fans a query across thousands of leaf scanners and aggregates partial results upward, turning petabyte scans into interactive queries.