Why Compaction Exists
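As memtables are flushed, SSTables pile up: reads slow down because a lookup may have to check many files, and deleted data lingers as tombstones and shadowed older versions. Compaction merges files to reduce their number, drop tombstones and obsolete versions, and keep key ranges organized. Two strategies dominate.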
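The merge at the heart of both strategies can be sketched in Python as a k-way merge of key-sorted runs. This is a minimal sketch under stated assumptions: each run is a hypothetical list of `(key, seqno, value)` tuples, sorted by key with each key appearing at most once per run, where a higher sequence number is newer and a `value` of `None` marks a tombstone. `merge_runs` and `drop_tombstones` are illustrative names, not any engine's API.

```python
import heapq
from typing import Optional

# Hypothetical entry format: (key, seqno, value); a higher seqno is newer,
# and value None is a tombstone (delete marker).
Entry = tuple[str, int, Optional[str]]

def merge_runs(runs: list[list[Entry]], drop_tombstones: bool) -> list[Entry]:
    """K-way merge of key-sorted runs, keeping only the newest version of each key."""
    # Order by key, then newest first, so the first entry seen for a key wins.
    merged = heapq.merge(*runs, key=lambda e: (e[0], -e[1]))
    out: list[Entry] = []
    last_key = None
    for key, seqno, value in merged:
        if key == last_key:
            continue  # older version shadowed by a newer one: discard it
        last_key = key
        if value is None and drop_tombstones:
            continue  # the delete has been applied; drop the marker itself
        out.append((key, seqno, value))
    return out

older = [("a", 1, "x"), ("b", 2, "y")]
newer = [("a", 5, None), ("c", 6, "z")]   # "a" was deleted after it was written
print(merge_runs([older, newer], drop_tombstones=True))
# [('b', 2, 'y'), ('c', 6, 'z')]
```

Dropping a tombstone is only safe when no older version of the key can exist in files outside the merge, which is why engines generally discard delete markers only when compacting into the bottommost level; otherwise `drop_tombstones` would be `False` and the marker is carried forward.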
Leveled Compaction
Each level holds non-overlapping files and is much larger than the level above it, typically around ten times larger. When a level exceeds its size limit, a file from it is merged with the files in the next level whose key ranges overlap it.
- Files within a level do not overlap, so a point read checks at most one file per level and read amplification is low.
- Data is rewritten each time it moves down a level, and merging one file can rewrite several overlapping files in the next level, so write amplification is high.
- Most data sits in the last, largest level, so obsolete versions in the levels above make up only a small fraction of the total and space amplification is low.
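The leveled mechanics can be sketched with in-memory stand-ins for SSTables. This is a minimal sketch under simplifying assumptions: `SSTable`, `compact_one`, and `get` are hypothetical names; every level, including level 0, is kept non-overlapping (real engines such as LevelDB and RocksDB allow overlapping files in level 0); tombstones are not modeled; and the file chosen for compaction is simply the first one in the level.

```python
import bisect
from dataclasses import dataclass
from typing import Optional

@dataclass
class SSTable:
    # Hypothetical stand-in for an SSTable: (key, value) pairs sorted by key, keys unique.
    entries: list[tuple[str, str]]

    @property
    def min_key(self) -> str:
        return self.entries[0][0]

    @property
    def max_key(self) -> str:
        return self.entries[-1][0]

def overlaps(a: SSTable, b: SSTable) -> bool:
    return a.min_key <= b.max_key and b.min_key <= a.max_key

def compact_one(levels: list[list[SSTable]], i: int, file_size: int) -> int:
    """Merge the first file of level i into level i + 1; return entries written."""
    src = levels[i].pop(0)
    hits = [f for f in levels[i + 1] if overlaps(src, f)]
    keep = [f for f in levels[i + 1] if not overlaps(src, f)]
    merged: dict[str, str] = {}
    for f in hits:
        merged.update(f.entries)    # older data from level i + 1
    merged.update(src.entries)      # newer data from level i overwrites it
    items = sorted(merged.items())
    # Cut the merged output into new files; their key ranges cannot overlap `keep`.
    new_files = [SSTable(items[j:j + file_size]) for j in range(0, len(items), file_size)]
    levels[i + 1] = sorted(keep + new_files, key=lambda f: f.min_key)
    return len(items)

def get(levels: list[list[SSTable]], key: str) -> tuple[Optional[str], int]:
    """Point lookup returning (value, files probed); level 0 holds the newest data."""
    probed = 0
    for level in levels:
        starts = [f.min_key for f in level]
        j = bisect.bisect_right(starts, key) - 1   # the only file that could hold key
        if j >= 0 and key <= level[j].max_key:
            probed += 1
            value = dict(level[j].entries).get(key)
            if value is not None:
                return value, probed
    return None, probed

levels = [
    [SSTable([("b", "new"), ("d", "new")])],                        # level 0
    [SSTable([("a", "1"), ("b", "1")]), SSTable([("c", "1"), ("e", "1")]),
     SSTable([("f", "1"), ("g", "1")])],                            # level 1
]
print(compact_one(levels, 0, file_size=2))   # 5
print([f.entries for f in levels[1]])
print(get(levels, "d"))                      # ('new', 1)
```

Compacting a two-entry file wrote five entries because the two level-1 files it overlapped had to be rewritten with it, which is where leveled write amplification comes from; the lookup, in turn, probed a single file in the only non-empty level.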
Tiered Compaction
Several similarly sized runs accumulate in a tier; once enough have gathered, they are merged together into one run in the next tier.
- Each key is rewritten roughly once per tier rather than many times per level, so write amplification is lower.
- Runs within a tier can overlap in key range, so a read may have to check many of them, raising read amplification.
- Obsolete versions of a key persist in older runs until those runs are merged, raising space amplification.
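Tiered merging can be sketched the same way. In this minimal sketch, `TieredLSM`, `runs_per_tier`, and `merge_tier` are hypothetical names; a tier is merged as soon as it holds `runs_per_tier` runs, tombstones are not modeled, and `entries_written` counts every entry written by a flush or a merge as a rough measure of write amplification.

```python
from typing import Optional

Run = list[tuple[str, str]]   # hypothetical run: (key, value) pairs sorted by key

def merge_tier(runs: list[Run]) -> Run:
    merged: dict[str, str] = {}
    for run in runs:             # oldest first, so newer values overwrite older ones
        merged.update(run)
    return sorted(merged.items())

class TieredLSM:
    def __init__(self, runs_per_tier: int) -> None:
        self.runs_per_tier = runs_per_tier
        self.tiers: list[list[Run]] = []    # tiers[0] holds the newest data
        self.entries_written = 0

    def flush(self, run: Run) -> None:
        self._add(sorted(run), 0)

    def _add(self, run: Run, tier: int) -> None:
        if tier == len(self.tiers):
            self.tiers.append([])
        self.tiers[tier].append(run)
        self.entries_written += len(run)
        if len(self.tiers[tier]) == self.runs_per_tier:
            # The tier is full: merge all its runs into one run in the next tier.
            merged = merge_tier(self.tiers[tier])
            self.tiers[tier] = []
            self._add(merged, tier + 1)

    def get(self, key: str) -> tuple[Optional[str], int]:
        """Point lookup returning (value, runs probed)."""
        probed = 0
        for tier in self.tiers:
            for run in reversed(tier):   # newest run first within a tier
                probed += 1
                value = dict(run).get(key)
                if value is not None:
                    return value, probed
        return None, probed

lsm = TieredLSM(runs_per_tier=3)
for i in range(8):   # eight flushes, each with two fresh keys
    lsm.flush([(f"k{2 * i:02d}", f"v{i}"), (f"k{2 * i + 1:02d}", f"v{i}")])
print([len(t) for t in lsm.tiers])   # [2, 2]
print(lsm.get("k00"))                # ('v0', 4)
print(lsm.entries_written / 16)      # 1.75
```

After eight flushes, tiers 0 and 1 each hold two runs, so a lookup for the oldest key probes four runs. Meanwhile each key has been written only once per tier it has reached so far (28 entry writes for 16 keys), which is the write saving tiered compaction buys with that extra read work.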
Choosing
Write-heavy workloads favor tiered compaction. Read-heavy or space-constrained workloads favor leveled compaction.
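The trade-off can be put into rough numbers with the usual back-of-envelope cost model for a size ratio T and L levels: leveled compaction rewrites an entry up to about T times per level while a point lookup probes about one run per level, whereas tiered compaction writes an entry about once per level but a lookup may probe up to T - 1 runs per level. The sketch below applies that model; `num_levels` and `rough_costs` are hypothetical helpers, and the model deliberately ignores level 0, Bloom filters, caching, and compression.

```python
def num_levels(total_bytes: int, buffer_bytes: int, size_ratio: int) -> int:
    """Smallest L such that buffer_bytes * size_ratio**L >= total_bytes."""
    levels, capacity = 0, buffer_bytes
    while capacity < total_bytes:
        capacity *= size_ratio
        levels += 1
    return levels

def rough_costs(total_bytes: int, buffer_bytes: int, size_ratio: int) -> dict[str, dict[str, int]]:
    """Back-of-envelope write amplification and runs probed per point lookup."""
    L = num_levels(total_bytes, buffer_bytes, size_ratio)
    T = size_ratio
    return {
        # Leveled: up to ~T rewrites per level, about one run to probe per level.
        "leveled": {"write_amp": T * L, "runs_per_lookup": L},
        # Tiered: ~1 write per level, up to T - 1 overlapping runs per level.
        "tiered": {"write_amp": L, "runs_per_lookup": (T - 1) * L},
    }

# 1 TB of data, a 100 MB write buffer, and a size ratio of 10 give 4 levels.
for strategy, costs in rough_costs(10**12, 10**8, 10).items():
    print(strategy, costs)
# leveled {'write_amp': 40, 'runs_per_lookup': 4}
# tiered {'write_amp': 4, 'runs_per_lookup': 36}
```

The constants differ by engine and configuration, but the direction matches the guidance above: tiered cuts write amplification by roughly the size ratio, and leveled cuts the number of runs a lookup must probe by about the same factor.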
Key idea
Leveled compaction minimizes read and space amplification at the cost of write amplification, while tiered compaction does the reverse.