System Design

Storage Estimation

Projecting how many bytes a system will hold this year and the next few after it.

Counting the bytes

Storage estimation answers how much disk you will need over time. You build it from the size of one record and how many records arrive.

The chain

  • Estimate bytes per item: a text tweet is roughly a few hundred bytes, a photo a few hundred kilobytes.
  • Multiply by items created per day.
  • Multiply by the retention period in days to get the total stored size.
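The chain above is three multiplications. Here is a minimal Python sketch. The function name `raw_storage_bytes` and the text-post workload (300 bytes per post, 200 million posts a day, kept for five years) are hypothetical figures chosen only for illustration.

```python
def raw_storage_bytes(bytes_per_item: int, items_per_day: int, retention_days: int) -> int:
    """Raw size: bytes per item x items per day x retention in days (no overhead yet)."""
    return bytes_per_item * items_per_day * retention_days


# Hypothetical workload: 300-byte text posts, 200 million a day, retained 5 years.
raw = raw_storage_bytes(300, 200_000_000, 5 * 365)
print(f"{raw / 10**12:.1f} TB")  # prints 109.5 TB (decimal terabytes)
```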

For example, 100 million photos a day at 300 kilobytes each is about 30 terabytes per day, or roughly 11 petabytes over a year.
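The photo arithmetic can be checked directly. This sketch assumes decimal (SI) units, so 1 KB is 1,000 bytes, which is the usual convention in back-of-envelope estimates; binary units (KiB, TiB) would change the figures somewhat.

```python
# Decimal (SI) units, as is common for rough capacity estimates.
KB, TB, PB = 10**3, 10**12, 10**15

photos_per_day = 100_000_000
bytes_per_photo = 300 * KB

per_day = photos_per_day * bytes_per_photo  # 3 x 10^13 bytes
per_year = per_day * 365                    # ~1.1 x 10^16 bytes

print(f"{per_day / TB:.0f} TB/day, {per_year / PB:.2f} PB/year")
# prints 30 TB/day, 10.95 PB/year
```

The yearly figure of 10.95 PB is what rounds to the "roughly 11 petabytes" in the example.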

Do not forget overhead

  • Replication multiplies raw size, often by three.
  • Indexes and metadata add a meaningful percentage on top.
  • Growth compounds, so project several years out.

The replication factor often surprises newcomers, since a three-way copy triples your bill before any index is added.
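A sketch of how the overheads stack on the raw figure. The replication factor of three comes from the text; the 20% index and metadata overhead and the 40% annual growth rate are hypothetical defaults, and the model assumes nothing is deleted within the projection horizon. The helper name `projected_storage` is also illustrative.

```python
def projected_storage(raw_first_year: float, years: int, replication: int = 3,
                      index_overhead: float = 0.20, annual_growth: float = 0.40) -> list[float]:
    """Cumulative stored size at the end of each year, overhead included.

    Assumptions (hypothetical): yearly ingest grows by (1 + annual_growth) each
    year, nothing is deleted within the horizon, and indexes/metadata add a fixed
    fraction on top of the raw data before replication copies everything.
    """
    totals = []
    cumulative_raw = 0.0
    yearly_ingest = raw_first_year
    for _ in range(years):
        cumulative_raw += yearly_ingest
        totals.append(cumulative_raw * (1 + index_overhead) * replication)
        yearly_ingest *= 1 + annual_growth  # growth compounds year over year
    return totals


# Start from the ~10.95 PB/year of raw photos in the example above.
for year, pb in enumerate(projected_storage(10.95, years=3), start=1):
    print(f"year {year}: {pb:.1f} PB")
# year 1: 39.4 PB
# year 2: 94.6 PB
# year 3: 171.9 PB
```

Growth is applied to the yearly ingest rather than to the stored total, because retained data accumulates: each new year adds a larger slice on top of everything already kept.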

Key idea

Storage equals item size times creation rate times retention, then inflated by replication, indexes, and multi-year growth.
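The same idea written as a formula. The symbols are introduced here for convenience: s is bytes per item, r items per day, T retention in days, R the replication factor, o the index and metadata overhead as a fraction, and g an assumed annual growth rate applied to the daily creation rate in year k.

```latex
\begin{aligned}
S_{\text{raw}} &= s \cdot r \cdot T \\
S &\approx S_{\text{raw}} \cdot R \cdot (1 + o) \\
r_k &= r_0 \, (1 + g)^{k}
\end{aligned}
```

With the photo example (s = 3 x 10^5 bytes, r = 10^8 per day, T = 365), S_raw is about 1.1 x 10^16 bytes, roughly 11 PB, and a replication factor of R = 3 brings that to about 33 PB before any index overhead.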

Check yourself

1. Which factor commonly triples raw storage size?

2. What three quantities form the raw storage estimate?

3. Why project storage several years out?