Wide Column Store Design
A wide column store, such as the design behind Cassandra and Bigtable, organizes data into rows that can each hold a different, very large set of columns. It scales writes and reads across many machines by carefully choosing keys.
Partition and clustering keys
The primary key has two parts:
- The partition key decides which node owns the row, by hashing it. All rows with the same partition key live together.
- The clustering key sets the sort order of rows within a partition.
A well designed table groups data that is read together into one partition and orders it the way queries want, so a read touches a single node and returns rows already sorted.
Query first modeling
Unlike relational design, you do not normalize and then query. You start from the queries and shape tables to serve them, even if that means duplicating data across several tables. Joins are avoided because they would cross nodes.
Key idea
In a wide column store the partition key places data on a node and the clustering key sorts it, so tables are modeled query first to keep reads on a single node.