Databases

Compaction and Tombstones

How LSM engines reclaim space and finally delete old data.

Why Compaction Exists

An LSM tree writes new sorted files instead of editing old ones in place. Over time the same key can appear in several files, each holding a different version. Compaction merges these sorted files, keeps the newest version of each key, and drops the older ones.
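The merge step can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine implements it: each file is assumed to be a list of `(key, seq, value)` tuples sorted by key with at most one entry per key, where `seq` is a hypothetical sequence number that grows with every write. The name `compact` is also just an illustrative choice.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def compact(runs):
    """Merge sorted runs of (key, seq, value) and keep the newest entry per key."""
    # Order by key ascending, then seq descending, so that for each key
    # the newest version is the first one the merge yields.
    merged = heapq.merge(*runs, key=lambda entry: (entry[0], -entry[1]))
    # groupby collects all versions of a key; we keep only the first (newest).
    return [next(versions) for _, versions in groupby(merged, key=itemgetter(0))]


older = [("a", 1, "apple"), ("b", 2, "bear")]
newer = [("a", 3, "avocado"), ("c", 4, "cat")]

result = compact([older, newer])
# result == [("a", 3, "avocado"), ("b", 2, "bear"), ("c", 4, "cat")]
```

Because every input is already sorted, the merge streams through the files once and never needs to hold them all in memory, which is one reason sorted files suit this design.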

Tombstones

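A delete cannot simply erase a row, because older versions of it may still live in older files. Instead the engine writes a tombstone, a marker that says the key is deleted. A read that finds the tombstone hides the key, even if an older file still holds a value for it. The tombstone itself can be physically removed only once compaction has merged it with every older file that might hold the key; dropping it earlier would let the old value reappear.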
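The sketch below illustrates both halves of that behaviour under simplifying assumptions: each file is modelled as a plain dict, files are passed newest first, and `TOMBSTONE`, `get`, and `compact_with_tombstones` are hypothetical names chosen for this example. The `covers_all_files` flag stands in for whatever rule a real engine uses to decide that no older file can still hold the key.

```python
TOMBSTONE = object()  # hypothetical sentinel marking a deleted key


def get(key, files_newest_first):
    """Return the live value for key, or None if it is absent or deleted."""
    for f in files_newest_first:
        if key in f:
            value = f[key]
            # The newest entry wins; a tombstone hides any older value.
            return None if value is TOMBSTONE else value
    return None


def compact_with_tombstones(files_newest_first, covers_all_files):
    """Merge files; drop tombstones only if no older file is left out."""
    merged = {}
    for f in files_newest_first:
        for key, value in f.items():
            merged.setdefault(key, value)  # first seen is the newest version
    if covers_all_files:
        # Nothing older can resurface, so tombstones are safe to discard.
        merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    return dict(sorted(merged.items(), key=lambda kv: kv[0]))


old = {"a": "apple", "b": "bear"}
mid = {"b": TOMBSTONE}  # the delete of "b"
new = {"c": "cat"}

# A partial merge must keep the tombstone, or "bear" would reappear.
partial = compact_with_tombstones([new, mid], covers_all_files=False)
still_hidden = get("b", [partial, old])  # None

# A full merge can drop both the tombstone and the data it shadows.
full = compact_with_tombstones([new, mid, old], covers_all_files=True)
# full == {"a": "apple", "c": "cat"}
```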

Costs to Watch

  • Too many small files raise read amplification, because a single read may have to check more files.
  • Aggressive compaction raises write amplification, because the same data is rewritten many times.
  • Long-lived tombstones bloat files and slow range scans, which must read and skip them until they are collected.
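A rough way to put numbers on the first two costs is sketched below. The model is deliberately simple and the figures are made up for illustration: files are dicts, a lookup checks files newest first with no index or filter to let it skip any, and write amplification is taken as bytes written to storage divided by bytes the application wrote. The function names are hypothetical.

```python
def files_probed(key, files_newest_first):
    """Count how many files a point lookup checks before it can answer."""
    probed = 0
    for f in files_newest_first:
        probed += 1
        if key in f:
            break
    return probed


def write_amplification(bytes_written_to_storage, bytes_written_by_user):
    """Ratio of physical writes to logical writes."""
    return bytes_written_to_storage / bytes_written_by_user


# Four small files, with the key only in the oldest one.
small_files = [{"d": 4}, {"c": 3}, {"b": 2}, {"a": 1}]
before = files_probed("a", small_files)  # 4 files checked

# After compaction into one file, the same lookup checks a single file.
one_file = [{"a": 1, "b": 2, "c": 3, "d": 4}]
after = files_probed("a", one_file)  # 1 file checked

# Illustrative figures: 100 MB written by the user, flushed once (100 MB)
# and then rewritten by three compactions (300 MB).
wa = write_amplification(100 + 300, 100)  # 4.0
```

The two numbers pull against each other: the compactions that brought the lookup down to one file are the same rewrites that raised the write amplification.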

Key idea

Compaction merges sorted files to reclaim space, and tombstones defer real deletion until every file holding the key has been merged away.

Check yourself

1. What is a tombstone?

2. Why can a delete not erase the row immediately?

3. What problem do long-lived tombstones cause?