← Lessons

quiz vs the machine

Gold1430

Databases

Wide Column Store Design

How partition and clustering keys shape a column family store.

5 min read · core · beat Gold to climb

Wide Column Store Design

A wide column store, such as the design behind Cassandra and Bigtable, organizes data into rows that can each hold a different, very large set of columns. It scales writes and reads across many machines by carefully choosing keys.

Partition and clustering keys

The primary key has two parts:

  • The partition key decides which node owns the row, by hashing it. All rows with the same partition key live together.
  • The clustering key sets the sort order of rows within a partition.

A well designed table groups data that is read together into one partition and orders it the way queries want, so a read touches a single node and returns rows already sorted.

Query first modeling

Unlike relational design, you do not normalize and then query. You start from the queries and shape tables to serve them, even if that means duplicating data across several tables. Joins are avoided because they would cross nodes.

Key idea

In a wide column store the partition key places data on a node and the clustering key sorts it, so tables are modeled query first to keep reads on a single node.

Check yourself

Answer to earn rating on the learn ladder.

1. What does the partition key determine?

2. Why is data often duplicated across tables in a wide column store?

3. What does the clustering key control?