What an SSTable Is
A sorted string table, or SSTable, is the on-disk file an LSM engine writes when it flushes a memtable. Its two defining properties are that it is immutable and that its keys are stored in sorted order.
- Sorted keys allow binary search and efficient range scans.
- Immutability means a file is never edited, only created or deleted.
- A Bloom filter lets a reader skip files that cannot hold a key, and a sparse index narrows the search inside a file to a single block.
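The properties above can be illustrated with a minimal in-memory sketch. It is not a real file format: the class name `SSTable`, the `block_size` parameter, and the use of a plain `frozenset` as a stand-in for a Bloom filter are all assumptions made for illustration (a real Bloom filter is probabilistic and can return false positives).

```python
import bisect


class SSTable:
    """Toy in-memory stand-in for an immutable sorted file (hypothetical)."""

    def __init__(self, items, block_size=2):
        # Sort once at build time; the table is never modified afterwards.
        pairs = sorted(items.items())
        self.keys = tuple(k for k, _ in pairs)
        self.values = tuple(v for _, v in pairs)
        self.block_size = block_size
        # Sparse index: the first key of every block, not every key.
        self.index = self.keys[::block_size]
        # Assumption: an exact set stands in for a Bloom filter.
        self.members = frozenset(self.keys)

    def get(self, key):
        if key not in self.members:
            return None  # filter says "absent": skip this file entirely
        # Find the last block whose first key is <= key.
        block = bisect.bisect_right(self.index, key) - 1
        if block < 0:
            return None
        start = block * self.block_size
        end = min(start + self.block_size, len(self.keys))
        # Binary search inside the single candidate block.
        i = bisect.bisect_left(self.keys, key, start, end)
        if i < end and self.keys[i] == key:
            return self.values[i]
        return None

    def scan(self, lo, hi):
        """Yield (key, value) pairs with lo <= key < hi, in key order."""
        i = bisect.bisect_left(self.keys, lo)
        while i < len(self.keys) and self.keys[i] < hi:
            yield self.keys[i], self.values[i]
            i += 1


table = SSTable({"d": 4, "a": 1, "c": 3, "b": 2, "e": 5})
print(table.get("d"))
print(list(table.scan("b", "e")))
```

Storing keys and values in tuples mirrors immutability: once built, the only operations are point lookups and range scans.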
Why Compaction Is Needed
Because files are never modified in place, updating or deleting a key writes a new entry and leaves the stale copies behind in older SSTables. Over time the number of files grows, slowing reads and wasting space.
Compaction is the background process that merges several SSTables into fewer, larger ones.
- It keeps the newest value for each key and drops older versions.
- It physically removes keys marked with a tombstone, once no older file outside the merge can still hold a version of that key.
- It bounds how many files a read must inspect.
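The merge step can be sketched as a k-way merge over sorted runs. This is a simplified illustration, not any particular engine's implementation: the `TOMBSTONE` marker, the `compact` function, and the `drop_tombstones` flag are hypothetical names, and each table is modeled as a list of `(key, value)` pairs sorted by key.

```python
import heapq

TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"


def compact(tables, drop_tombstones=False):
    """Merge sorted runs into one sorted run; the newest version wins.

    `tables` is ordered newest first. Each table is a list of
    (key, value) pairs sorted by key, with no duplicate keys.
    """
    def tagged(age, table):
        # Tag entries with the table's age so equal keys sort newest first.
        for key, value in table:
            yield key, age, value

    runs = [tagged(age, table) for age, table in enumerate(tables)]
    merged = []
    last_key = object()  # sentinel that equals no real key
    for key, age, value in heapq.merge(*runs, key=lambda e: (e[0], e[1])):
        if key == last_key:
            continue  # an older version of a key already emitted
        last_key = key
        if value is TOMBSTONE and drop_tombstones:
            continue  # physically remove the deleted key
        merged.append((key, value))
    return merged


newer = [("a", 10), ("b", TOMBSTONE)]
older = [("a", 1), ("b", 2), ("c", 3)]
print(compact([newer, older], drop_tombstones=True))
```

In this sketch `drop_tombstones` should only be set when the merge covers every file that could hold an older version of the key. Otherwise the tombstone has to be kept so that it continues to hide that older version.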
Compaction Strategies
- Size-tiered compaction merges files of similar size, favoring write throughput.
- Leveled compaction keeps files with non-overlapping key ranges within each level, favoring read and space efficiency.
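The two strategies differ mainly in how they choose files to merge. The sketch below shows one plausible selection rule for each, under stated assumptions: `size_tiered_buckets`, its `ratio` and `min_files` parameters, and `overlapping` are hypothetical names and defaults, and real engines use their own thresholds and bookkeeping.

```python
def size_tiered_buckets(sizes, ratio=2.0, min_files=4):
    """Group file sizes into buckets of similar size.

    Returns only the buckets that hold at least `min_files` files,
    i.e. the ones this sketch would consider ready to merge.
    """
    buckets = []
    for size in sorted(sizes):
        # Join the current bucket if within `ratio` of its smallest file.
        if buckets and size <= buckets[-1][0] * ratio:
            buckets[-1].append(size)
        else:
            buckets.append([size])
    return [b for b in buckets if len(b) >= min_files]


def overlapping(file_range, level):
    """Return the key ranges in `level` that overlap `file_range`.

    Ranges are (smallest_key, largest_key) pairs with inclusive bounds.
    A leveled merge must include these files so that the target level
    stays non-overlapping afterwards.
    """
    lo, hi = file_range
    return [(a, b) for a, b in level if a <= hi and lo <= b]


print(size_tiered_buckets([10, 12, 11, 15, 100, 400]))
print(overlapping(("c", "f"), [("a", "b"), ("c", "d"), ("e", "h"), ("i", "k")]))
```

Size-tiered selection looks only at file sizes, so a key may live in many files at once. Leveled selection looks at key ranges, which costs more rewriting but means a read touches at most one file per non-overlapping level.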
Key idea
SSTables are immutable sorted files, and compaction periodically merges them to discard stale versions and keep read cost bounded.