Databases

The SSTable and Compaction

An SSTable is an immutable sorted file on disk, and compaction merges many of them to remove old versions and bound read cost.

What an SSTable Is

A sorted string table, or SSTable, is the on-disk file an LSM engine writes when it flushes a memtable. Its two defining properties are that it is immutable and that its keys are stored in sorted order.

  • Sorted keys allow binary search and efficient range scans.
  • Immutability means a file is never edited, only created or deleted.
  • A bloom filter lets a reader skip files that cannot hold a key, and a sparse index narrows the search inside a file to a single block.
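
A minimal sketch of these properties, using an in-memory stand-in rather than a real file. The class name SSTable, its methods, and the index_every parameter are hypothetical, chosen only for illustration; the bloom filter is left out to keep the example short.

```python
from bisect import bisect_left, bisect_right


class SSTable:
    """In-memory stand-in for an immutable sorted file (hypothetical API)."""

    def __init__(self, entries, index_every=2):
        # Sort once at "flush" time; tuples keep the table read-only afterwards.
        items = sorted(entries.items())
        self._keys = tuple(k for k, _ in items)
        self._values = tuple(v for _, v in items)
        # Sparse index: remember only the key at every index_every-th position.
        self._index_every = index_every
        self._index_positions = tuple(range(0, len(self._keys), index_every))
        self._index_keys = tuple(self._keys[p] for p in self._index_positions)

    def get(self, key):
        # Find the last indexed key <= key, then search only that block.
        slot = bisect_right(self._index_keys, key) - 1
        if slot < 0:
            return None  # key sorts before everything in this table
        start = self._index_positions[slot]
        end = min(start + self._index_every, len(self._keys))
        i = bisect_left(self._keys, key, start, end)
        if i < end and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, lo, hi):
        # Range scan over lo <= key < hi: one binary search, then a sequential read.
        i = bisect_left(self._keys, lo)
        while i < len(self._keys) and self._keys[i] < hi:
            yield self._keys[i], self._values[i]
            i += 1


table = SSTable({"cherry": 3, "apple": 1, "banana": 2, "date": 4, "fig": 5})
print(table.get("banana"))         # 2
print(table.get("coconut"))        # None
print(list(table.scan("b", "d")))  # [('banana', 2), ('cherry', 3)]
```

A real engine would keep the sparse index in memory and read one block from disk per lookup; here both live in tuples so the search logic is easy to follow.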

Why Compaction Is Needed

Because files are never modified in place, updating or deleting a key leaves stale copies behind in older SSTables. Over time the number of files grows, which slows reads and wastes space.

Compaction is the background process that merges several SSTables into fewer, larger ones.

  • It keeps the newest value for each key and drops older versions.
  • It physically removes keys marked with a tombstone, once no older file can still hold a copy the tombstone hides.
  • It bounds how many files a read must inspect.
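
The merge step can be sketched as a k-way merge over sorted runs. This is an illustration under stated assumptions, not any particular engine's implementation: each table is a sorted list of (key, value) pairs, tables are passed newest first, TOMBSTONE is a hypothetical deletion marker, and the input is assumed to include every file that could still hold an older copy of a deleted key, which is what makes dropping tombstones safe here.

```python
import heapq

TOMBSTONE = object()  # hypothetical marker for a deleted key


def compact(tables):
    """Merge sorted (key, value) lists into one; tables[0] is the newest."""
    # Tag each entry with its table's age so equal keys sort newest first.
    runs = [
        [(key, age, value) for key, value in table]
        for age, table in enumerate(tables)
    ]
    merged = []
    last_key = object()  # sentinel that equals no real key
    for key, _age, value in heapq.merge(*runs, key=lambda e: (e[0], e[1])):
        if key == last_key:
            continue  # older version of a key that was already handled
        last_key = key
        if value is not TOMBSTONE:
            merged.append((key, value))  # newest live value survives
    return merged


newer = [("a", 10), ("b", TOMBSTONE), ("d", 4)]
older = [("a", 1), ("b", 2), ("c", 3)]
print(compact([newer, older]))  # [('a', 10), ('c', 3), ('d', 4)]
```

The output is a single sorted run: "a" keeps its newest value, "b" disappears along with its tombstone, and a later read has one file to inspect instead of two.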

Compaction Strategies

  • Size-tiered merges files of similar size, favoring write throughput.
  • Leveled keeps non-overlapping files within each level, favoring read and space efficiency.
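
A rough sketch of the decision each strategy makes. All function names, the ratio of 2.0, and the min_files threshold of 4 are illustrative assumptions, not values taken from any specific engine. The first two functions pick similar-sized files to merge; the third shows why non-overlapping key ranges help reads: a point lookup touches at most one file per level.

```python
from bisect import bisect_right


def size_tiered_buckets(sizes, ratio=2.0):
    """Group file sizes so each bucket spans at most ratio x its smallest file."""
    buckets = []
    for size in sorted(sizes):
        if buckets and size <= buckets[-1][0] * ratio:
            buckets[-1].append(size)
        else:
            buckets.append([size])
    return buckets


def pick_size_tiered(sizes, min_files=4, ratio=2.0):
    """Return the first bucket with enough similar-sized files to merge, if any."""
    for bucket in size_tiered_buckets(sizes, ratio):
        if len(bucket) >= min_files:
            return bucket
    return None


def file_for_key(level, key):
    """level: list of (min_key, max_key) ranges, sorted and non-overlapping."""
    starts = [lo for lo, _ in level]
    i = bisect_right(starts, key) - 1
    if i >= 0 and level[i][0] <= key <= level[i][1]:
        return i  # the only file in this level that can hold the key
    return None


print(pick_size_tiered([10, 12, 11, 9, 100, 110, 400]))  # [9, 10, 11, 12]

level = [("a", "f"), ("g", "m"), ("n", "z")]
print(file_for_key(level, "h"))   # 1
print(file_for_key(level, "fz"))  # None
```

Size-tiered selection only looks at file sizes, so merged files may cover overlapping key ranges and a read may need to check several of them. Leveled compaction pays for its disjoint ranges by rewriting data more often.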

Key idea

SSTables are immutable sorted files, and compaction periodically merges them to discard stale versions and keep read cost bounded.

Check yourself

1. What are the two defining properties of an SSTable?

2. What does compaction do with multiple versions of a key?

3. Which strategy favors read and space efficiency?