Two dimensions at once
Database sizing has two axes: the volume of data stored and the throughput of operations against it. A database can be small but write-hot, or huge but rarely touched.
Sizing for volume
- Estimate row count times average bytes per row, then add index overhead.
- Add headroom for growth and multiply by the number of replicated copies, as in any storage estimate.
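The volume estimate above can be sketched as a small calculation. This is a minimal illustration, not a definitive formula: the function name, the 30% index overhead, the 50% annual growth, the two-year horizon, and the replication factor of 3 are all assumed values chosen for the example, and real figures depend on the engine and workload.

```python
def estimate_storage_bytes(rows, bytes_per_row, index_overhead=0.3,
                           annual_growth=0.5, years=2, replication_factor=3):
    """Rough storage estimate. Every default is an illustrative assumption."""
    raw = rows * bytes_per_row                       # rows times bytes per row
    with_indexes = raw * (1 + index_overhead)        # plus index overhead
    grown = with_indexes * (1 + annual_growth) ** years  # compound growth
    return grown * replication_factor                # every copy stores the data


# Hypothetical table: 100 million rows of about 200 bytes each.
total = estimate_storage_bytes(rows=100_000_000, bytes_per_row=200)
print(f"{total / 1e9:.1f} GB")
```

Growth is compounded here; a linear growth model would be an equally reasonable assumption for some datasets.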
Sizing for throughput
- Estimate reads and writes per second at peak.
- Compare that to what one instance can sustain. A single primary has a write ceiling that no number of replicas raises.
When writes exceed what one node can sustain, you shard: you split the data across instances so each carries a slice of the write load.
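One common way to split data is to hash a key and take the result modulo the number of shards. The sketch below assumes that scheme; `shard_for` and the `user:` key format are hypothetical names for illustration, and other strategies (range-based or directory-based placement) are just as valid.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards


# Spread 10,000 hypothetical user keys over 4 shards and count each slice.
counts = [0] * 4
for user_id in range(10_000):
    counts[shard_for(f"user:{user_id}", 4)] += 1
print(counts)
```

Each shard ends up with roughly a quarter of the keys, so each primary sees roughly a quarter of the writes, provided the keys themselves are written evenly. With plain modulo placement, changing the shard count remaps most keys, which is a cost to weigh before choosing it.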
Read replicas scale reads but never writes, so a write-heavy workload that outgrows one primary must shard rather than add replicas.
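The choice between replicas and shards can be sketched as a short planning function. It is a deliberately simplified model: `plan_topology` and its parameters are hypothetical, the per-node ceilings are assumed to come from your own benchmarks, the primary is assumed to serve reads too, and read and write capacity are treated as independent.

```python
import math


def plan_topology(peak_reads, peak_writes, node_read_ceiling, node_write_ceiling):
    """Simplified sketch: shards follow writes, replicas follow reads."""
    # Writes can only be spread by sharding, so the write peak sets the shard count.
    shards = max(1, math.ceil(peak_writes / node_write_ceiling))
    # Reads divide across shards, then across the nodes within each shard.
    reads_per_shard = peak_reads / shards
    nodes_per_shard = max(1, math.ceil(reads_per_shard / node_read_ceiling))
    # One node per shard is the primary; the rest are read replicas.
    return {"shards": shards, "replicas_per_shard": nodes_per_shard - 1}


# Read-heavy workload: one primary is enough, replicas absorb the reads.
print(plan_topology(40_000, 2_000, 10_000, 5_000))
# Write-heavy workload: replicas would not help, so the plan shards.
print(plan_topology(5_000, 18_000, 10_000, 5_000))
```

With these made-up numbers, the first call yields one shard with three replicas and the second yields four shards with no replicas.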
Key idea
Size a database on both stored volume and operation throughput, and shard when writes exceed a single primary.