Query first, not entity first
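Wide-column stores like Cassandra reward a query-first mindset. Relational modeling normalizes around entities and relies on joins to assemble answers at read time; here you start from the queries the application must serve, design a table per query, and accept denormalization.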
Core rules
- Spread data evenly: choose a high-cardinality partition key so that no single partition, or the node that owns it, becomes a hot spot.
- Minimize the number of partitions read: ideally each query is served from exactly one partition.
- These two goals often conflict, and balancing them is the heart of the craft.
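The tension between the two rules can be made concrete with a small simulation. This is only a sketch: the sensor workload, the two candidate keys, and the helper names (build_partitions, partitions_touched) are invented for illustration, and a plain Python dict stands in for the store.

```python
from collections import defaultdict

# Hypothetical workload: 100 sensors, 10 days, 10 readings per sensor per day.
readings = [
    (f"sensor-{s}", f"2024-01-{d:02d}", r)
    for s in range(100)
    for d in range(1, 11)
    for r in range(10)
]

def build_partitions(key_fn):
    """Group rows by partition key, a stand-in for rows sharing a partition."""
    partitions = defaultdict(list)
    for row in readings:
        partitions[key_fn(row)].append(row)
    return partitions

def partitions_touched(partitions, predicate):
    """Count partitions holding at least one row that matches the query."""
    return sum(1 for rows in partitions.values() if any(predicate(r) for r in rows))

def stats(partitions, predicate):
    """Return (partition count, largest partition, partitions read by the query)."""
    largest = max(len(rows) for rows in partitions.values())
    return len(partitions), largest, partitions_touched(partitions, predicate)

def one_day(row):
    """The query under test: all readings taken on one day."""
    return row[1] == "2024-01-03"

# Candidate A: partition by day (low cardinality, large partitions).
day_key_stats = stats(build_partitions(lambda row: row[1]), one_day)

# Candidate B: partition by (sensor, day) (high cardinality, small partitions).
sensor_day_key_stats = stats(build_partitions(lambda row: (row[0], row[1])), one_day)

print("key=day         ", day_key_stats)
print("key=(sensor,day)", sensor_day_key_stats)
```

Under these assumptions, keying by day answers the query from one partition but concentrates 1,000 rows in each of only 10 partitions, while keying by (sensor, day) spreads the rows over 1,000 small partitions and forces the same query to read 100 of them.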
Denormalization and duplication
Because there are no joins, you duplicate data into multiple tables shaped for different queries.
- A users-by-id table and a users-by-email table may hold the same data, keyed differently.
- Every write must update all copies. Individual writes are cheap in Cassandra, but keeping the copies in step is generally the application's job.
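A minimal sketch of the duplication pattern, with two in-memory dicts standing in for the two tables. The User fields and the save_user helper are hypothetical; a real application would issue one write per table instead.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    name: str

# Two hypothetical "tables" holding the same rows under different keys.
users_by_id: dict[str, User] = {}
users_by_email: dict[str, User] = {}

def save_user(user: User) -> None:
    """Write every copy; the caller, not the store, keeps them in step."""
    old = users_by_id.get(user.user_id)
    if old is not None and old.email != user.email:
        # The email is the key of the second table, so changing it means
        # removing the old row as well as writing the new one.
        users_by_email.pop(old.email, None)
    users_by_id[user.user_id] = user
    users_by_email[user.email] = user

save_user(User("u1", "ada@example.com", "Ada"))
save_user(User("u1", "ada@newmail.example", "Ada"))

print(users_by_id["u1"])
print(sorted(users_by_email))
```

The second call shows the hidden cost of duplication: when a value that serves as a key in another table changes, the old copy has to be deleted explicitly or it lingers as stale data.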
Choosing keys
- The partition key answers "which group of rows?" and is picked for even spread.
- Clustering columns answer "in what order, and over what range, within that group?"
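The division of labor between the two kinds of key can be sketched as a dict of sorted lists: the dict key plays the partition key, and the sort order within each list plays the clustering column. The sensor and timestamp names are assumptions made for the example, not part of any real schema.

```python
import bisect
from collections import defaultdict

# partition key (sensor id) -> rows kept sorted by clustering column (timestamp)
table: dict[str, list[tuple[int, float]]] = defaultdict(list)

def insert(sensor_id: str, ts: int, value: float) -> None:
    """Place the row in its partition, keeping clustering order."""
    bisect.insort(table[sensor_id], (ts, value))

def range_query(sensor_id: str, start_ts: int, end_ts: int) -> list[tuple[int, float]]:
    """Return rows with start_ts <= ts < end_ts from a single partition."""
    rows = table.get(sensor_id, [])
    # A 1-tuple sorts before any 2-tuple with the same first element,
    # so (start_ts,) finds the first row whose timestamp is >= start_ts.
    lo = bisect.bisect_left(rows, (start_ts,))
    hi = bisect.bisect_left(rows, (end_ts,))
    return rows[lo:hi]

insert("s1", 30, 1.5)
insert("s1", 10, 0.5)
insert("s1", 20, 1.0)
insert("s2", 15, 9.0)

result = range_query("s1", 10, 30)
print(result)
```

The lookup first selects one group by its partition key and then slices a contiguous run of the already-sorted rows, which is why range queries on clustering columns stay inside a single partition.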
Key idea
Model wide-column tables query-first: spread load with the partition key while minimizing the partitions each query reads, and duplicate data freely since there are no joins.