Why compaction exists
Cassandra writes are append-only: memtables are flushed to disk as immutable SSTables. Over time a partition's data spreads across many SSTables, and deletes leave tombstones instead of removing data in place. Compaction merges SSTables, drops overwritten or deleted data, and purges tombstones once they are older than gc_grace_seconds.
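The merge step can be sketched in a few lines. This is a simplified illustration, not Cassandra's implementation: the Cell type, the compact function, and the dict-based "SSTables" are hypothetical, and it assumes every SSTable takes part in the merge (a real node must also check that a tombstone does not shadow data in SSTables outside the compaction).

```python
from typing import Dict, List, NamedTuple, Optional


class Cell(NamedTuple):
    value: Optional[str]  # None marks a tombstone (a delete)
    write_time: int       # write timestamp in seconds (simplified)


def compact(sstables: List[Dict[str, Cell]], now: int, gc_grace: int) -> Dict[str, Cell]:
    """Merge SSTables: the newest write per key wins, old tombstones are purged."""
    merged: Dict[str, Cell] = {}
    for table in sstables:
        for key, cell in table.items():
            current = merged.get(key)
            if current is None or cell.write_time > current.write_time:
                merged[key] = cell
    # Keep live cells; keep tombstones only while they are within the grace period.
    return {
        key: cell
        for key, cell in merged.items()
        if not (cell.value is None and now - cell.write_time > gc_grace)
    }


older = {"a": Cell("1", 10), "b": Cell("x", 10)}
newer = {"a": Cell("2", 20), "b": Cell(None, 30)}  # "b" was deleted at t=30
result = compact([older, newer], now=1000, gc_grace=100)
```

Here the overwritten value of "a" is dropped, and "b" disappears entirely because its tombstone is past the grace period.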
Size tiered compaction
SizeTieredCompactionStrategy (STCS) groups SSTables of similar size and merges a group once enough of them accumulate (four by default).
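- Suits write-heavy workloads, because write amplification is low.
- The downside is read amplification, since a partition may be spread across SSTables in many tiers. Large merges also need substantial free disk space, in the worst case about as much as the SSTables being compacted.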
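The grouping rule can be sketched as follows. This is a minimal sketch that assumes the STCS options bucket_low = 0.5, bucket_high = 1.5, and min_threshold = 4 (their documented defaults) and ignores min_sstable_size and max_threshold; the function names are hypothetical.

```python
from typing import List


def stcs_buckets(sizes_mb: List[float],
                 bucket_low: float = 0.5,
                 bucket_high: float = 1.5) -> List[List[float]]:
    """Group SSTable sizes into buckets of 'similar' size."""
    buckets: List[List[float]] = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A size joins a bucket if it is close to the bucket's average.
            if avg * bucket_low < size < avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar bucket found: start a new one
    return buckets


def compaction_candidates(sizes_mb: List[float], min_threshold: int = 4) -> List[List[float]]:
    """Buckets holding enough SSTables to be worth merging."""
    return [b for b in stcs_buckets(sizes_mb) if len(b) >= min_threshold]


sizes = [10, 11, 9, 12, 100, 400, 410]
buckets = stcs_buckets(sizes)
candidates = compaction_candidates(sizes)
```

With these sizes, only the four small SSTables form a bucket large enough to compact; the 100 MB and the two 400 MB SSTables wait until more files of similar size appear.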
Leveled compaction
LeveledCompactionStrategy (LCS) organizes fixed-size SSTables into levels, where each level holds roughly ten times as much data as the previous one.
- Above L0, SSTables within a level do not overlap, so a partition lives in at most one SSTable per level and reads touch fewer files.
- Best for read-heavy and update-heavy workloads, at the cost of higher write amplification.
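The tenfold growth per level means the number of levels grows only logarithmically with data size. The arithmetic below is a sketch that assumes 160 MB SSTables and a fanout of 10 (the usual defaults) and leaves L0 out; the helper names are hypothetical.

```python
def level_capacity_mb(level: int, sstable_size_mb: int = 160, fanout: int = 10) -> int:
    """Target capacity of level >= 1: fanout**level SSTables of fixed size."""
    if level < 1:
        raise ValueError("L0 has no fixed capacity in this sketch")
    return sstable_size_mb * fanout ** level


def levels_needed(data_mb: int, sstable_size_mb: int = 160, fanout: int = 10) -> int:
    """Smallest n such that levels L1..Ln together can hold data_mb."""
    level, total = 0, 0
    while total < data_mb:
        level += 1
        total += level_capacity_mb(level, sstable_size_mb, fanout)
    return level


l1 = level_capacity_mb(1)            # 10 SSTables of 160 MB
one_tb = levels_needed(1_000_000)    # roughly 1 TB, expressed in MB
```

Under these assumptions about 1 TB of data fits in four levels, so a read for one partition checks at most about four SSTables plus whatever currently sits in L0.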
Time window compaction
TimeWindowCompactionStrategy (TWCS) buckets SSTables by time window and compacts only within a window.
- Ideal for time-series data with a TTL, since whole old windows can be dropped cheaply once all their data has expired.
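Window bucketing can be illustrated with plain timestamp arithmetic. This is a sketch under stated assumptions: each SSTable is represented only by a hypothetical (name, newest write timestamp) pair, windows are one day long, and the expiry check ignores gc_grace_seconds and overlap with other SSTables.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def window_start(ts: int, window_seconds: int) -> int:
    """Round a timestamp down to the start of its window."""
    return ts - (ts % window_seconds)


def bucket_by_window(sstables: List[Tuple[str, int]],
                     window_seconds: int = 86400) -> Dict[int, List[str]]:
    """Group (name, newest_write_ts) pairs by the window of their newest write."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for name, newest_ts in sstables:
        buckets[window_start(newest_ts, window_seconds)].append(name)
    return dict(buckets)


def droppable_windows(buckets: Dict[int, List[str]], window_seconds: int,
                      ttl_seconds: int, now: int) -> List[int]:
    """Windows in which even the newest possible write has already expired."""
    return [start for start in sorted(buckets)
            if start + window_seconds + ttl_seconds <= now]


tables = [("a", 10), ("b", 86399), ("c", 86400), ("d", 200000)]
by_window = bucket_by_window(tables)
expired = droppable_windows(by_window, 86400, ttl_seconds=3600, now=200000)
```

Only SSTables in the same bucket are ever merged together, and the two oldest windows here could be discarded without reading or rewriting their data.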
Key idea
Pick the compaction strategy by workload: STCS for write-heavy workloads, LCS for read-heavy and update-heavy workloads, and TWCS for time-series data with a TTL.
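The rule of thumb can be written down as a lookup that produces the matching CQL statement. This is an illustrative sketch: the workload labels, the alter_table_cql helper, the table name ks.events, and the one-day window are assumptions, while the strategy class names and the compaction_window_unit and compaction_window_size options are the standard ones.

```python
COMPACTION_BY_WORKLOAD = {
    "write_heavy": {"class": "SizeTieredCompactionStrategy"},
    "read_heavy": {"class": "LeveledCompactionStrategy"},
    "update_heavy": {"class": "LeveledCompactionStrategy"},
    "time_series_ttl": {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": "DAYS",   # assumed window: one day
        "compaction_window_size": "1",
    },
}


def alter_table_cql(table: str, workload: str) -> str:
    """Build an ALTER TABLE statement that sets the compaction strategy."""
    options = COMPACTION_BY_WORKLOAD[workload]
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


statement = alter_table_cql("ks.events", "time_series_ttl")
print(statement)
```

The generated statement would still need to be reviewed and run through cqlsh or a driver; the sketch only builds the string.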