Designed for Write Heavy Workloads
A B tree updates data in place, which means random disk writes. A log structured merge tree, or LSM tree, instead turns writes into sequential appends and defers the sorting work, trading some read cost for very high write throughput.
How It Works
- New writes land in an in memory memtable, usually a sorted structure, and in a write ahead log for durability.
- When the memtable fills, it is flushed to disk as an immutable sorted file called an SSTable.
- Over time many SSTables accumulate, so a background compaction merges them, discarding overwritten and deleted keys.
The Tradeoffs
A read may have to check the memtable plus several SSTables, so each level keeps a bloom filter to skip files that cannot hold the key. The cost is write amplification: data is rewritten each time it is compacted. Tuning compaction balances read speed, write amplification, and space used.
Key idea
An LSM tree buffers writes in a memtable, flushes immutable sorted SSTables, and compacts them in the background, trading write amplification and multi file reads for high write throughput.