Wide Column Data Modeling

Query first, not entity first

Wide-column stores such as Cassandra reward a query-first mindset. Relational modeling normalizes around entities; here you start from the queries the application must serve, design one table per query, and accept denormalization.

Core rules

Spread data evenly: choose a high-cardinality partition key to avoid hot partitions.
Minimize partitions read: each query should ideally hit a single partition.
These two goals often conflict. A finer-grained key spreads load better but scatters a query's rows across more partitions, and balancing the two is the heart of the craft.
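
The tension between the two rules can be seen in a small simulation. The following is a minimal sketch in plain Python, not Cassandra code: the sensor workload, the two key choices, and every function name are hypothetical, chosen only to show how a finer partition key shrinks the largest partition while raising the number of partitions a query touches.

```python
from collections import Counter

# Hypothetical workload: one busy sensor and two quiet ones, over 7 days.
READINGS = (
    [("sensor-a", day) for day in range(7) for _ in range(100)]
    + [("sensor-b", day) for day in range(7) for _ in range(5)]
    + [("sensor-c", day) for day in range(7) for _ in range(5)]
)


def by_sensor(sensor, day):
    """Coarse partition key: all of a sensor's rows share one partition."""
    return (sensor,)


def by_sensor_day(sensor, day):
    """Finer partition key: one partition per sensor per day."""
    return (sensor, day)


def partition_sizes(readings, key_fn):
    """Count rows per partition for a given partition key choice."""
    return Counter(key_fn(sensor, day) for sensor, day in readings)


def partitions_read(key_fn, sensor, days):
    """Count distinct partitions touched by 'this sensor over these days'."""
    return len({key_fn(sensor, day) for day in days})


largest_coarse = max(partition_sizes(READINGS, by_sensor).values())
largest_fine = max(partition_sizes(READINGS, by_sensor_day).values())
week_coarse = partitions_read(by_sensor, "sensor-a", range(7))
week_fine = partitions_read(by_sensor_day, "sensor-a", range(7))
```

In this toy workload the coarse key keeps a one-week query inside a single partition but lets the busy sensor's partition grow seven times larger than under the finer key, which in turn makes the same query read seven partitions.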

Denormalization and duplication

Because there are no joins, you duplicate data into multiple tables shaped for different queries.

A users by id table and a users by email table may hold the same data, keyed differently.
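Every write must update all copies. This is usually acceptable because writes are comparatively cheap in these stores, but keeping the copies consistent becomes the application's job.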
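
The users example can be sketched with two in-memory dictionaries standing in for the two tables. This is an illustrative assumption rather than a real driver API: `User`, `save_user`, `users_by_id`, and `users_by_email` are hypothetical names, and a real store would need its own mechanism for applying both writes together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    name: str


# Two "tables" holding the same data, keyed for two different queries.
users_by_id = {}
users_by_email = {}


def save_user(user):
    """Write the user to every copy so both lookups stay consistent."""
    previous = users_by_id.get(user.user_id)
    # If the email changed, the old entry in the email-keyed copy is stale.
    if previous is not None and previous.email != user.email:
        del users_by_email[previous.email]
    users_by_id[user.user_id] = user
    users_by_email[user.email] = user


save_user(User("u1", "ada@example.com", "Ada"))
save_user(User("u1", "ada@newmail.example", "Ada"))
```

The second call shows the hidden cost of duplication: changing a value that serves as a key in one copy means removing the old row there as well as writing the new one.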

Choosing keys

The partition key answers "which group of rows?" and is picked for even spread.
Clustering columns answer "in what order, and over what range, within the group?"
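
The division of labor between the two kinds of key can be modeled as a dictionary of sorted lists. The sketch below is a simplified mental model in plain Python, not how any particular store is implemented; `table`, `insert`, and `range_query` are hypothetical names, and timestamps are assumed to be ISO-style strings so that they sort chronologically.

```python
import bisect
from collections import defaultdict

# partition key -> list of (clustering key, value) rows, kept sorted
table = defaultdict(list)


def insert(partition_key, clustering_key, value):
    """Place the row in its partition, preserving clustering order."""
    bisect.insort(table[partition_key], (clustering_key, value))


def range_query(partition_key, start, end):
    """Return rows in one partition with start <= clustering key < end."""
    rows = table.get(partition_key, [])
    # A 1-tuple sorts before any 2-tuple sharing its first element,
    # so these bounds select on the clustering key alone.
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]


insert("sensor-a", "2024-01-01T09:00", 20.5)
insert("sensor-a", "2024-01-01T08:00", 19.0)
insert("sensor-a", "2024-01-01T10:00", 21.0)
insert("sensor-b", "2024-01-01T09:00", 5.0)

morning = range_query("sensor-a", "2024-01-01T08:00", "2024-01-01T10:00")
```

The partition key selects exactly one list, and the clustering key turns the range lookup into a slice of already-sorted rows, which is why a query that stays inside one partition is cheap.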

Key idea

Model wide-column tables query first: spread load with the partition key, minimize the partitions each query reads, and duplicate data freely since there are no joins.