Space that is no longer needed
Append-only and copy-on-write storage engines never overwrite data in place, so they accumulate obsolete versions: superseded values, deleted keys, and blocks that nothing references any more. Garbage collection finds and frees this dead space so the system does not fill up.
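To make the accumulation concrete, here is a minimal sketch of an append-only key-value log. The names `AppendOnlyLog`, `Record`, and `dead_positions` are hypothetical and do not come from any particular engine. Every `put` or `delete` appends a record, and an in-memory index points at the newest record per key, so older records for the same key become dead space that a collector could reclaim.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    key: str
    value: Optional[str]  # None marks a delete (a tombstone)


class AppendOnlyLog:
    """Hypothetical append-only store: every write appends; nothing is overwritten."""

    def __init__(self) -> None:
        self.records: List[Record] = []
        self.index: Dict[str, int] = {}  # key -> position of its newest record

    def put(self, key: str, value: str) -> None:
        self.index[key] = len(self.records)
        self.records.append(Record(key, value))

    def delete(self, key: str) -> None:
        # A delete is just another appended record; the old value stays in the log.
        self.index[key] = len(self.records)
        self.records.append(Record(key, None))

    def get(self, key: str) -> Optional[str]:
        pos = self.index.get(key)
        return None if pos is None else self.records[pos].value

    def dead_positions(self) -> List[int]:
        """Positions of records the index no longer points at (superseded)."""
        live = set(self.index.values())
        return [i for i in range(len(self.records)) if i not in live]


log = AppendOnlyLog()
log.put("a", "1")
log.put("a", "2")  # supersedes a=1
log.put("b", "x")
log.delete("b")    # supersedes b=x with a delete marker
print(log.dead_positions())  # [0, 2]
```

The log grows by one record per write regardless of whether the key already existed, which is exactly why some collector has to run eventually.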
Knowing what is dead
A block or version is garbage only when nothing live still refers to it: no current data, snapshot, or in-flight reader may need it. A common rule is to keep each superseded version until the oldest active reader or snapshot that might see it has finished, so the collector never reclaims data that is still in use.
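The "oldest active reader" rule can be sketched as a watermark check over one key's versions. This is a simplified multi-version model, not any specific database's algorithm. It assumes that a reader with read timestamp `s` sees the newest version committed at or before `s`, and that readers starting later get a timestamp at least as new as the latest commit. The function name `reclaimable` and its parameters are hypothetical.

```python
from __future__ import annotations

from typing import List


def reclaimable(versions: List[int], active_snapshots: List[int]) -> List[int]:
    """Return commit timestamps of one key's versions that no reader can still see.

    Assumed model: a reader or snapshot with read timestamp s sees the newest
    version whose commit timestamp is <= s, and readers that start later get a
    timestamp at least as new as the latest commit.
    """
    ordered = sorted(versions)
    if not ordered:
        return []
    # Watermark: the oldest reader or snapshot that is still active.
    watermark = min(active_snapshots) if active_snapshots else float("inf")
    visible_to_oldest = [ts for ts in ordered if ts <= watermark]
    if not visible_to_oldest:
        # Every version is newer than the oldest reader: conservatively keep all.
        return []
    # Versions older than the one the oldest reader sees are invisible to everyone.
    keep_from = visible_to_oldest[-1]
    return [ts for ts in ordered if ts < keep_from]


print(reclaimable([10, 20, 30], active_snapshots=[25, 40]))  # [10]
print(reclaimable([10, 20, 30], active_snapshots=[]))        # [10, 20]
```

With a snapshot open at timestamp 25, version 20 must survive because that snapshot still reads it. Once no snapshots remain, only the newest version is needed. The watermark is deliberately conservative: it may keep a version slightly longer than strictly necessary, but it never frees one a live reader can reach.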
Tombstones for deletes
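In a distributed store, a delete is recorded as a tombstone, a marker stating that the key has been deleted. The tombstone must outlive any stale replica that still holds the old value: if it were dropped too early, a replica that missed the delete could copy the old value back during repair and resurrect the key. The tombstone is therefore kept for a grace period, after which the collector removes both the value and the tombstone.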
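The resurrection hazard can be reproduced with two replicas and a last-write-wins merge. This is a minimal sketch under stated assumptions: a single shared logical clock for timestamps, and hypothetical names `Cell`, `merge`, and `purge_tombstones`. Real stores use more elaborate repair protocols.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None means this cell is a tombstone
    ts: int               # write time on a hypothetical shared clock


Replica = Dict[str, Cell]


def merge(a: Replica, b: Replica) -> Replica:
    """Last-write-wins repair: for each key keep the cell with the newer timestamp."""
    out = dict(a)
    for key, cell in b.items():
        if key not in out or cell.ts > out[key].ts:
            out[key] = cell
    return out


def purge_tombstones(replica: Replica, now: int, grace: int) -> Replica:
    """Drop tombstones older than the grace period.

    In this model the tombstone already replaced the value locally,
    so dropping it removes both.
    """
    return {k: c for k, c in replica.items()
            if not (c.value is None and now - c.ts >= grace)}


stale: Replica = {"k": Cell("old", ts=1)}  # replica that missed the delete
fresh: Replica = {"k": Cell(None, ts=5)}   # replica that recorded the delete

# Purged too early: repair copies "old" back and the key is resurrected.
early = merge(purge_tombstones(fresh, now=6, grace=1), stale)
# Purged only after the stale replica has been repaired: the delete sticks.
synced = merge(fresh, stale)
late = purge_tombstones(synced, now=100, grace=10)
print(early["k"].value, synced["k"].value, late)  # old None {}
```

The grace period is effectively a bet that every replica will be repaired within that window. If a replica stays partitioned longer than the grace period, the same resurrection can still happen.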
Doing it without stalls
- Incremental collection works in small steps to avoid long pauses.
- Background threads reclaim space while reads and writes continue.
- Collection often piggybacks on compaction: because compaction already rewrites files, it can drop dead records as it copies the live ones.
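These points can be combined in one small sketch: an incremental compactor that rewrites at most one segment per call, copying only live records. Each `step()` does bounded work, so a background thread could call it in a loop between foreground requests without causing a long pause. The class `IncrementalCompactor`, the `(key, version)` record layout, and the `is_live` predicate are all hypothetical. In a real engine the predicate would also consult snapshots and tombstone grace periods, and the segment swap would need to coordinate with concurrent writers, which this sketch does not show.

```python
from __future__ import annotations

from typing import Callable, List, Tuple

Segment = List[Tuple[str, int]]  # hypothetical record layout: (key, version)


class IncrementalCompactor:
    """Rewrites at most one segment per step, copying only live records."""

    def __init__(self, segments: List[Segment],
                 is_live: Callable[[str, int], bool]) -> None:
        self.segments = segments
        self.is_live = is_live
        self.cursor = 0

    def step(self) -> int:
        """Compact the next segment in round-robin order; return records reclaimed."""
        if not self.segments:
            return 0
        i = self.cursor % len(self.segments)
        old = self.segments[i]
        new = [rec for rec in old if self.is_live(*rec)]
        self.segments[i] = new  # swap the rewritten segment in place of the old one
        self.cursor = i + 1
        return len(old) - len(new)


latest = {"a": 2, "b": 1}  # newest version of each live key


def is_live(key: str, version: int) -> bool:
    return latest.get(key) == version


compactor = IncrementalCompactor([[("a", 1), ("b", 1)], [("a", 2), ("c", 1)]], is_live)
freed = [compactor.step(), compactor.step()]
print(freed, compactor.segments)  # [1, 1] [[('b', 1)], [('a', 2)]]
```

Compacting one segment at a time trades throughput for latency: total reclamation takes more calls, but no single call blocks reads or writes for long.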
Key idea
Garbage collection reclaims obsolete versions and tombstoned deletes only once no live reader, snapshot, or stale replica needs them, working incrementally in the background so storage stays bounded without stalling traffic.