Counting the bytes
Storage estimation answers how much disk you will need over time. You build it from the size of one record and how many records arrive.
The chain
- Estimate bytes per item: a text-only tweet is roughly a few hundred bytes, a photo a few hundred kilobytes.
- Multiply by items created per day.
- Multiply by the retention period in days to get the size of the data at rest.
For example, 100 million photos a day at 300 kilobytes each is about 30 terabytes per day, or roughly 11 petabytes over a year (30 terabytes times 365 days).
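The chain above can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function name `raw_storage_bytes` is hypothetical, and decimal units (1 KB = 1,000 bytes) are assumed for simplicity.

```python
# Decimal (SI) units, assumed here for simplicity.
KB = 1_000
TB = 1_000**4
PB = 1_000**5


def raw_storage_bytes(item_bytes: int, items_per_day: int, retention_days: int) -> int:
    """Item size x creation rate x retention, before any overhead."""
    return item_bytes * items_per_day * retention_days


# The photo example from the text: 100 million photos a day at 300 KB each.
per_day = raw_storage_bytes(300 * KB, 100_000_000, 1)
per_year = raw_storage_bytes(300 * KB, 100_000_000, 365)

print(f"{per_day / TB:.2f} TB per day")    # 30.00 TB per day
print(f"{per_year / PB:.2f} PB per year")  # 10.95 PB per year
```

Keeping the three inputs as separate arguments makes it easy to rerun the estimate when any one assumption changes.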
Do not forget overhead
- Replication multiplies raw size, often by three.
- Indexes and metadata add a meaningful percentage on top.
- Growth compounds, so project several years out.
The replication factor often surprises newcomers, since a three-way copy triples your bill before any index is added.
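The overhead factors can be layered on top of a raw figure in the same way. The sketch below is illustrative only: the 20% index overhead and 50% annual growth are placeholder values chosen for the example, not figures from the text, and both function names are hypothetical. The growth model assumes nothing is ever deleted and that each year's intake is a fixed multiple of the previous year's.

```python
def provisioned_bytes(raw_bytes: float, replication: int = 3,
                      index_overhead: float = 0.20) -> float:
    """Apply replication, then add indexes and metadata as a fraction on top.

    Assumes indexes are replicated along with the data (an assumption).
    """
    return raw_bytes * replication * (1 + index_overhead)


def cumulative_raw_by_year(first_year_bytes: float, annual_growth: float,
                           years: int) -> list[float]:
    """Running total of raw data stored at the end of each year."""
    totals = []
    stored = 0.0
    intake = first_year_bytes
    for _ in range(years):
        stored += intake
        totals.append(stored)
        intake *= 1 + annual_growth  # next year's intake compounds
    return totals


PB = 1_000**5
first_year = 10.95 * PB  # raw size of one year of photos at 30 TB per day

# Placeholder inputs: 50% annual growth, projected three years out.
for year, raw in enumerate(cumulative_raw_by_year(first_year, 0.50, 3), start=1):
    print(f"year {year}: {raw / PB:.1f} PB raw, "
          f"{provisioned_bytes(raw) / PB:.1f} PB provisioned")
```

With three-way replication alone, 10.95 petabytes of raw data becomes roughly 33 petabytes before indexes are counted.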
Key idea
Storage equals item size times creation rate times retention, inflated by replication, indexes, and multi-year growth.