← Lessons

quiz vs the machine

Platinum1780

System Design

Message Compaction

Keeping only the latest value per key so a log becomes a compact snapshot.

6 min read · advanced · beat Platinum to climb

Two retention models

Time based retention keeps everything for a window then deletes by age. Log compaction is different: it keeps the latest message for each key and removes older values for the same key. The log becomes a snapshot of the most recent state per key.

Why it is useful

Compaction suits changelog style data where you only care about the current value, such as the latest profile for each user id. A new consumer can rebuild full current state by replaying the compacted log, without storing every historical update.

How tombstones delete

To remove a key entirely, a producer writes a tombstone, a message with the key and a null value. Compaction keeps the tombstone long enough for all consumers to see the deletion, then eventually purges it.

The trade off

Compaction loses history: you can no longer see the full sequence of changes, only the final value per key. It also runs as a background process, so very recent duplicates may still be present until the next compaction pass.

Flow

Key idea

Compaction retains only the latest value per key, turning a log into a current state snapshot at the cost of losing historical change sequences.

Check yourself

Answer to earn rating on the learn ladder.

1. What does log compaction retain?

2. How is a key deleted from a compacted log?

3. What does compaction sacrifice?