
Databases

Wide-Column Data Modeling

Designing query-first, denormalized tables in wide-column stores.

6 min read · core

Query-first, not entity-first

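Wide-column stores such as Cassandra reward a query-first mindset. Relational modeling normalizes around entities and relies on joins to combine them at read time. Here you start from the queries the application must answer, design one table per query, and accept the denormalization that follows.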

Core rules

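  • Spread data evenly: choose a high-cardinality partition key so that no single partition, and no single node, becomes a hot spot.
  • Minimize the number of partitions each query reads; ideally, a query touches exactly one partition.
  • These two goals often conflict. A key fine-grained enough to spread load evenly can scatter the rows one query needs across many partitions, and balancing the two is the heart of the craft.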
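The tension between the two rules shows up even in a toy simulation. This is a minimal sketch under loudly stated assumptions: a hypothetical 4-node cluster, MD5 with a simple modulo standing in for the token ring Cassandra builds with its default Murmur3 partitioner, and a made-up user table in which 90% of rows share one country. The names `node_for`, `rows_per_node`, and `partitions_read` are illustrative, not part of any Cassandra API.

```python
import hashlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size

def node_for(partition_key: str) -> int:
    """Map a partition key to a node by hashing it.

    MD5 plus modulo is a stand-in; Cassandra's default partitioner uses
    Murmur3 and a token ring rather than a simple modulo.
    """
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES

# Made-up dataset: 1,000 users, 900 in "US" and 100 in "FR".
users = [
    {"user_id": f"u{i}", "country": "FR" if i % 10 == 0 else "US"}
    for i in range(1000)
]

def rows_per_node(key_field: str) -> Counter:
    """How many rows each node stores when the table is partitioned by key_field."""
    return Counter(node_for(u[key_field]) for u in users)

def partitions_read(key_field: str, country: str) -> int:
    """Distinct partitions a 'users in <country>' query must read."""
    return len({u[key_field] for u in users if u["country"] == country})

for key in ("country", "user_id"):
    print(f"partition key={key}: rows per node={dict(rows_per_node(key))}, "
          f"partitions read for US={partitions_read(key, 'US')}")
```

Partitioning by country puts all 900 US rows in one partition on one node, yet a "users in the US" query reads a single partition. Partitioning by user_id spreads rows roughly evenly across the nodes, but the same query must touch 900 partitions. Neither key satisfies both rules for this query.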

Denormalization and duplication

Because there are no joins, you duplicate data into multiple tables, each shaped for a different query.

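  • A users by id table and a users by email table may hold the same user data, partitioned by different keys.
  • Every write must update all copies. That is affordable because Cassandra writes are cheap (appended to a commit log and an in-memory memtable rather than updated in place), but keeping the copies in sync becomes the application's job.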
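The users by id / users by email pair can be sketched with plain dicts, where the dict key plays the role of the partition key. This is an in-memory illustration, not Cassandra driver code; `User` and `save_user` are hypothetical names. It shows the cost of duplication that is easy to miss: when a keyed field such as the email changes, the old row in the email-keyed table must be deleted, not just overwritten.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    name: str

# Two query-shaped tables holding the same data, keyed differently.
# Plain dicts stand in for tables: dict key = partition key.
users_by_id: dict[str, User] = {}
users_by_email: dict[str, User] = {}

def save_user(user: User) -> None:
    """Write every copy of the user.

    In Cassandra you might group these writes in a logged batch; here
    they are just dict assignments with no atomicity guarantee.
    """
    old = users_by_id.get(user.user_id)
    if old is not None and old.email != user.email:
        # The email is the key of the second table, so a changed email
        # means removing the stale row rather than overwriting it.
        users_by_email.pop(old.email, None)
    users_by_id[user.user_id] = user
    users_by_email[user.email] = user

# Each lookup is a single-partition read on the table shaped for it.
save_user(User("u1", "ada@example.com", "Ada"))
save_user(User("u1", "ada@newmail.example", "Ada"))
print(users_by_email.get("ada@example.com"))  # None: stale copy removed
print(users_by_id["u1"].email)                # ada@newmail.example
```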

Choosing keys

  • The partition key answers which group of rows (which partition) a row belongs to; pick it for even spread.
  • Clustering columns answer how rows are sorted within that partition, which is what makes range queries inside it efficient.
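The two kinds of key can be illustrated with a hypothetical sensor-readings table. In this sketch the partition key is the compound `(sensor_id, day)` and the clustering column is the timestamp `ts`. Adding the day to the partition key (time bucketing) is a common technique, not one this lesson names. It caps how large one sensor's partition grows while still letting a same-day range query read a single partition. Each partition is kept as a list sorted by `ts` to mimic clustering order. All names (`table`, `insert`, `range_query`) are illustrative assumptions.

```python
import bisect
from collections import defaultdict
from datetime import datetime

# Hypothetical table: readings by sensor and day
#   partition key:     (sensor_id, day) -> which group of rows
#   clustering column: ts               -> sort order within the group
table: dict[tuple[str, str], list[tuple[datetime, float]]] = defaultdict(list)

def insert(sensor_id: str, ts: datetime, value: float) -> None:
    """Place the row in its partition, keeping clustering (ts) order."""
    partition = table[(sensor_id, ts.date().isoformat())]
    bisect.insort(partition, (ts, value))

def range_query(sensor_id: str, start: datetime, end: datetime) -> list[tuple[datetime, float]]:
    """Rows with start <= ts < end.

    Assumes start and end fall on the same day, so exactly one
    partition is read; sorted order turns the range into a slice.
    """
    partition = table.get((sensor_id, start.date().isoformat()), [])
    keys = [ts for ts, _ in partition]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    return partition[lo:hi]

insert("s1", datetime(2024, 5, 1, 9, 0), 20.5)
insert("s1", datetime(2024, 5, 1, 8, 0), 19.0)
insert("s1", datetime(2024, 5, 1, 10, 0), 21.0)
insert("s1", datetime(2024, 5, 2, 9, 0), 22.0)  # lands in a separate partition
rows = range_query("s1", datetime(2024, 5, 1, 8, 30), datetime(2024, 5, 1, 10, 0))
print(rows)  # only the 09:00 reading on 2024-05-01
```

A query spanning several days would have to read one partition per day, which is the same spread-versus-partitions-read trade-off from the core rules, now tuned through the bucket size.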

Key idea

Model wide-column tables query-first: spread load with the partition key while minimizing the partitions each query reads, and duplicate data freely, since there are no joins.

Check yourself

1. What is the primary modeling principle in wide-column stores?

2. Why is duplicating data across tables acceptable here?