← Lessons

quiz vs the machine

Gold1410

Databases

Partition and Cluster Keys

Organizing data so queries skip what they do not need.

5 min read · core · beat Gold to climb

Two Levels of Organization

Partitioning divides a table into separate physical chunks by a key, often a date. A query filtered on that key reads only matching partitions. Clustering sorts data within storage by chosen columns so related rows sit near each other, tightening the value range of each block.

How They Speed Queries

  • Partition pruning: a filter like a date range skips entire partitions.
  • Cluster locality: sorted columns let the engine skip blocks whose ranges miss the filter.
  • Together they cut the bytes scanned, which often sets both cost and latency.

Choosing Keys

Partition on a column that queries filter on most, usually time. Avoid too many tiny partitions, which add overhead. Cluster on columns used in filters or joins that are not the partition key.

Key idea

Partition keys let queries skip whole chunks of a table while cluster keys let them skip blocks within a chunk, so choosing keys to match query filters minimizes bytes scanned.

Check yourself

Answer to earn rating on the learn ladder.

1. What does partition pruning accomplish?

2. What is a downside of creating too many tiny partitions?