System Design

Trace Storage and Retention

Deciding how long to keep traces and at what fidelity when full retention is unaffordable.

Traces Are Big and Many

A busy system produces enormous trace volume. Keeping every span forever is unaffordable, so storage and retention policy decides what survives and for how long.

Levers You Control

  • Sampling: store only a fraction of traces, the first defense against volume.
  • Retention windows: keep recent traces in fast storage, expire old ones.
  • Tiering: move older traces to cheaper, slower storage instead of deleting.
  • Index vs raw split: keep lightweight searchable summaries longer than the full span detail.
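As a rough illustration of the sampling lever, the sketch below keeps a fixed fraction of traces by hashing the trace ID. This is only one possible approach (deterministic, ID-based sampling), and the function name `keep_trace` and the `sample_rate` parameter are assumptions made for this example, not part of any specific tracing library.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Return True if this trace falls inside the sampled fraction.

    Hypothetical helper: hashing the trace ID means every service that
    sees the same ID reaches the same keep/drop decision.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the trace when its bucket lands below the sampled share of the range.
    return bucket < sample_rate * 2**64
```

Calling `keep_trace(trace_id, 0.1)` before writing to storage would keep roughly one trace in ten. Because the decision depends only on the ID, a kept trace keeps all of its spans rather than a random subset.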

A Tiered Lifecycle
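One way to picture a tiered lifecycle is as a function from a trace's age to the storage tier it should live in. The sketch below assumes two hypothetical windows (7 days in fast storage, 90 days in total before expiry). The tier names and thresholds are illustrative only and should come from real query patterns and budgets.

```python
from datetime import timedelta

# Hypothetical windows chosen for illustration, not recommendations.
HOT_WINDOW = timedelta(days=7)    # recent traces stay in fast storage
COLD_WINDOW = timedelta(days=90)  # older traces sit in cheap storage until this age


def storage_tier(age: timedelta) -> str:
    """Map a trace's age to the tier it should occupy."""
    if age < HOT_WINDOW:
        return "hot"
    if age < COLD_WINDOW:
        return "cold"
    return "expired"
```

A periodic job could call `storage_tier` on each stored trace and move or delete any trace whose tier has changed since the last run.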

Balancing Cost and Value

Recent traces are queried constantly during incidents, so they justify fast storage. Old traces are rarely opened but may still be needed for compliance or trend analysis, and cheap cold tiers exist for that kind of access. A common compromise keeps full detail for days, searchable summaries for weeks, and aggregate stats for far longer.
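The compromise described above can be sketched as two small pieces: a summary record that is much smaller than the raw spans, and a rule that picks the fidelity to keep based on age. All names here (`Span`, `TraceSummary`, `summarize`, `fidelity_for`) and the 7-day and 30-day windows are assumptions for illustration. The sketch also assumes the first span in the list is the root span.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class Span:
    name: str
    duration_ms: float
    error: bool = False


@dataclass
class TraceSummary:
    """Lightweight, searchable record kept after raw spans are dropped."""
    trace_id: str
    root_name: str
    total_ms: float
    span_count: int
    has_error: bool


def summarize(trace_id: str, spans: List[Span]) -> TraceSummary:
    """Reduce a full trace to the fields worth indexing long term."""
    if not spans:
        raise ValueError("a trace needs at least one span")
    root = spans[0]  # assumption: the first span is the root
    return TraceSummary(
        trace_id=trace_id,
        root_name=root.name,
        total_ms=root.duration_ms,
        span_count=len(spans),
        has_error=any(s.error for s in spans),
    )


# Hypothetical windows: "days" for full detail, "weeks" for summaries.
FULL_DETAIL_WINDOW = timedelta(days=7)
SUMMARY_WINDOW = timedelta(days=30)


def fidelity_for(age: timedelta) -> str:
    """Pick what to retain for a trace of the given age."""
    if age < FULL_DETAIL_WINDOW:
        return "full"       # every span
    if age < SUMMARY_WINDOW:
        return "summary"    # TraceSummary only
    return "aggregate"      # only rolled-up stats such as counts and latencies
```

Under these assumptions, a trace that is two weeks old would be stored only as its `TraceSummary`, which still answers questions like "which checkout requests failed last week" without paying for every span.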

Key idea

Trace retention balances cost against value using sampling, time-based windows, and storage tiers, keeping recent traces fast and aging old ones into cheap archives.

Check yourself

1. What is the first defense against trace storage volume?

2. Why move old traces to a cold tier instead of deleting them immediately?