System Design

Garbage Collection Storage

Reclaiming space from obsolete versions without disturbing live readers.

Space that is no longer needed

Append-only and copy-on-write storage never overwrite data in place, so they accumulate obsolete versions: superseded values, deleted keys, and unreferenced blocks. Garbage collection finds this dead space and frees it so the system does not fill up.
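To make the accumulation concrete, here is a minimal sketch of a toy append-only key-value log. The class `AppendOnlyLog`, its methods, and the `_TOMBSTONE` sentinel are hypothetical names for illustration, not any real engine's API. Every `put` or `delete` appends a record, so superseded records pile up until `collect` rewrites the log with only the newest live value per key.

```python
from dataclasses import dataclass, field

# Sentinel marking a deleted key (hypothetical; real engines use their own format).
_TOMBSTONE = object()


@dataclass
class AppendOnlyLog:
    records: list = field(default_factory=list)  # (key, value) pairs in write order
    index: dict = field(default_factory=dict)    # key -> position of its newest record

    def put(self, key, value):
        # Never overwrite in place: append a record and repoint the index.
        self.records.append((key, value))
        self.index[key] = len(self.records) - 1

    def delete(self, key):
        # A delete is also an append, recorded as a marker.
        self.records.append((key, _TOMBSTONE))
        self.index[key] = len(self.records) - 1

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        value = self.records[pos][1]
        return None if value is _TOMBSTONE else value

    def garbage_ratio(self):
        # Records the index no longer points at are superseded, i.e. dead.
        if not self.records:
            return 0.0
        return (len(self.records) - len(self.index)) / len(self.records)

    def collect(self):
        # Rewrite the log keeping only the newest non-deleted value per key.
        kept, new_index = [], {}
        for key, pos in self.index.items():
            value = self.records[pos][1]
            if value is not _TOMBSTONE:
                new_index[key] = len(kept)
                kept.append((key, value))
        self.records, self.index = kept, new_index


log = AppendOnlyLog()
log.put("a", 1)
log.put("a", 2)              # supersedes ("a", 1)
log.put("b", 1)
log.delete("b")              # supersedes ("b", 1)
print(log.garbage_ratio())   # 0.5
log.collect()
print(log.records)           # [('a', 2)]
```

Dropping delete markers outright in `collect` is safe only because this toy log has a single copy and no open snapshots; the sections below explain why real systems must be more careful.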

Knowing what is dead

A block or version is garbage only when nothing live still refers to it: no current data, snapshot, or in-flight reader may need it. A common rule is to keep a superseded version alive until the oldest active reader or snapshot that might still see it has finished, so the collector never reclaims data that is still in use.
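The oldest-reader rule can be sketched as a low-water mark over one multi-version key. In this hypothetical sketch, `versions` is a list of `(commit_ts, value)` pairs sorted by timestamp, a reader at timestamp `t` sees the newest version with `commit_ts <= t`, and new readers are assumed to always start at or after `now`. Every version older than the one visible at the watermark is invisible to all current and future readers, so it can be dropped.

```python
def read_at(versions, ts):
    """Value a reader at timestamp ts sees: the newest version committed at or before ts."""
    result = None
    for commit_ts, value in versions:
        if commit_ts <= ts:
            result = value
    return result


def prune_versions(versions, active_snapshots, now):
    """Drop versions that no active or future reader can see.

    versions: (commit_ts, value) pairs sorted by commit_ts ascending.
    active_snapshots: read timestamps of snapshots and readers still open.
    now: current time; new readers are assumed to start at or after it.
    """
    # Low-water mark: the oldest timestamp anyone may still read at.
    watermark = min([*active_snapshots, now])
    # Position of the newest version visible at the watermark (-1 if none).
    visible = -1
    for i, (commit_ts, _) in enumerate(versions):
        if commit_ts <= watermark:
            visible = i
    # Versions before it are shadowed for every reader at or after the watermark.
    return list(versions[max(visible, 0):])


versions = [(10, "v1"), (20, "v2"), (30, "v3")]
print(prune_versions(versions, active_snapshots=[25], now=40))  # [(20, 'v2'), (30, 'v3')]
print(prune_versions(versions, active_snapshots=[], now=40))    # [(30, 'v3')]
```

The open snapshot at 25 keeps `v2` alive; once it closes, only `v3` needs to remain.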

Tombstones for deletes

In a distributed store, a delete is recorded as a tombstone, a marker saying the key is gone. The tombstone must outlive any stale replica that still holds the old value: if it were dropped early, that replica could copy the old value back during repair and resurrect the key. So the tombstone is kept for a grace period before the collector finally removes both the value and the tombstone.
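Here is a hypothetical sketch of why the grace period matters, assuming last-write-wins replicas where each cell carries a write timestamp and a tombstone is a cell with no value. `Cell`, `merge`, and `purge_tombstones` are illustrative names; `grace_seconds` plays the role of a grace period such as Cassandra's `gc_grace_seconds` table option, though real repair logic is more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone
    written_at: int       # write timestamp in seconds

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def merge(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Last-write-wins merge of one key's cell from two replicas."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a.written_at >= b.written_at else b


def purge_tombstones(store: dict, now: int, grace_seconds: int) -> dict:
    """Drop tombstones older than the grace period; keep every other cell."""
    return {
        key: cell
        for key, cell in store.items()
        if not (cell.is_tombstone and now - cell.written_at > grace_seconds)
    }


fresh = {"k": Cell(None, written_at=100)}   # replica that saw the delete
stale = {"k": Cell("old", written_at=50)}   # replica that missed it

kept = purge_tombstones(fresh, now=150, grace_seconds=100)
print(merge(kept.get("k"), stale.get("k")))   # Cell(value=None, written_at=100): delete sticks

early = purge_tombstones(fresh, now=150, grace_seconds=10)
print(merge(early.get("k"), stale.get("k")))  # Cell(value='old', written_at=50): resurrected
```

With a 100-second grace period the stale value loses to the tombstone; purging after only 10 seconds lets the merge bring `old` back.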

Doing it without stalls

  • Incremental collection works in small steps to avoid long pauses.
  • Background threads reclaim space while reads and writes continue.
  • Collection often piggybacks on compaction, which already rewrites files.
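The bullets above can be sketched as an incremental collector that does a bounded amount of work per call. In this hypothetical sketch, storage is a list of segments of `(key, seq)` records, `latest` maps each key to the sequence number of its live version, and each `step` rewrites at most `max_segments` segments keeping only live records, the same rewrite a compaction pass performs. A background thread or the request loop could call `step` between operations; this version is single-threaded and does no locking, which a real collector would need.

```python
from collections import deque


class IncrementalCollector:
    """Hypothetical collector that reclaims dead records a few segments at a time."""

    def __init__(self, segments, latest):
        self.segments = segments                      # list of [(key, seq), ...]
        self.latest = latest                          # key -> seq of its live version
        self.pending = deque(range(len(segments)))    # segment ids still to scan

    def step(self, max_segments=1):
        """Do a bounded amount of work; return how many records were freed."""
        freed = 0
        for _ in range(min(max_segments, len(self.pending))):
            seg_id = self.pending.popleft()
            old = self.segments[seg_id]
            # Compaction-style rewrite: keep only records that are still the live version.
            live = [(key, seq) for key, seq in old if self.latest.get(key) == seq]
            freed += len(old) - len(live)
            self.segments[seg_id] = live
        return freed

    def done(self):
        return not self.pending


segments = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
collector = IncrementalCollector(segments, latest={"a": 3, "b": 5, "c": 4})
while not collector.done():
    print(collector.step())   # 2, then 0, then 0
```

Because each call touches at most one segment by default, foreground reads and writes only ever wait for one short step rather than a full pass over storage.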

Key idea

Garbage collection reclaims obsolete versions and tombstoned deletes only once no live reader, snapshot, or stale replica needs them, working incrementally in the background so storage stays bounded without stalling traffic.

Check yourself

1. When is a version safe to garbage collect?

2. Why must a tombstone be kept for a grace period?

3. How does garbage collection avoid stalling traffic?