Two Levels of Organization
Partitioning divides a table into separate physical chunks by a key, often a date. A query filtered on that key reads only matching partitions. Clustering sorts data within storage by chosen columns so related rows sit near each other, tightening the value range of each block.
How They Speed Queries
- Partition pruning: a filter like a date range skips entire partitions.
- Cluster locality: sorted columns let the engine skip blocks whose ranges miss the filter.
- Together they cut the bytes scanned, which often sets both cost and latency.
Choosing Keys
Partition on a column that queries filter on most, usually time. Avoid too many tiny partitions, which add overhead. Cluster on columns used in filters or joins that are not the partition key.
Key idea
Partition keys let queries skip whole chunks of a table while cluster keys let them skip blocks within a chunk, so choosing keys to match query filters minimizes bytes scanned.