Concurrency

Deduplication Windows

Bound how long a system remembers seen message ids to filter duplicates cheaply.

Why a window

Deduplication filters out repeated messages by remembering the ids it has already processed. Remembering every id forever is impractical because the set of ids grows without bound, so systems keep a deduplication window: a bounded record of recently seen ids.

How the window works

Each message carries a unique id. The receiver keeps ids seen within the window and rejects any repeat.

  • A time-based window keeps each id for a fixed duration, say five minutes.
  • A count-based window keeps the last N ids regardless of time.

If a duplicate arrives inside the window it is dropped. If it arrives after the window has expired, the system has forgotten the id and will process it again.
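
The two window types above can be sketched in a few lines. This is a minimal illustration, not a production design: the class names `TimeWindowDedup` and `CountWindowDedup`, the `accept` method, and the choice to pass the clock reading in as `now` are all assumptions made for the example, and the time-based version assumes `now` never goes backwards.

```python
from collections import OrderedDict


class TimeWindowDedup:
    """Hypothetical time-based window: remembers each id for window_seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._seen = OrderedDict()  # id -> time first seen, oldest first

    def accept(self, msg_id, now):
        # Forget ids that have aged out of the window.
        while self._seen:
            oldest_id, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.window_seconds:
                break
            self._seen.popitem(last=False)
        if msg_id in self._seen:
            return False  # duplicate inside the window: drop it
        self._seen[msg_id] = now
        return True  # new (or forgotten) id: process it


class CountWindowDedup:
    """Hypothetical count-based window: remembers the last max_ids ids."""

    def __init__(self, max_ids):
        self.max_ids = max_ids
        self._seen = OrderedDict()  # insertion order, oldest first

    def accept(self, msg_id):
        if msg_id in self._seen:
            return False  # duplicate among the last max_ids ids: drop it
        self._seen[msg_id] = None
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # evict the oldest id
        return True


# Five-minute window, times in seconds.
timed = TimeWindowDedup(window_seconds=300)
first = timed.accept("m1", now=0)     # True: never seen
repeat = timed.accept("m1", now=120)  # False: still inside the window
late = timed.accept("m1", now=301)    # True: the id has been forgotten

# Window of the last two ids.
counted = CountWindowDedup(max_ids=2)
results = [counted.accept(i) for i in ["a", "b", "a", "c", "a"]]
# results == [True, True, False, True, True]: "a" was evicted when "c" arrived
```

Passing `now` explicitly keeps the sketch deterministic; a real receiver would more likely read a monotonic clock itself.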

Choosing the size

The window must outlast the longest realistic retry period, measured from the first send to the last retry. If a sender retries for up to ten minutes but the window is five, late duplicates slip through.

  • Larger windows catch more duplicates but cost more memory.
  • Smaller windows are cheap but leak late retries.
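
The trade-off can be put into rough numbers. The sketch below is an illustration under stated assumptions: the helper names, the retry delays, and the rate of 1,000 messages per second are all hypothetical, and the memory figure counts ids only, ignoring per-entry overhead.

```python
def required_window_seconds(retry_delays, safety_margin=0.0):
    """Time from the first send to the last retry, plus an optional margin."""
    return sum(retry_delays) + safety_margin


def estimated_ids_held(messages_per_second, window_seconds):
    """Rough number of ids a time-based window holds at a steady rate."""
    return int(messages_per_second * window_seconds)


# Hypothetical sender: waits between attempts add up to ten minutes.
delays = [10, 30, 60, 120, 380]
needed = required_window_seconds(delays)  # 600 seconds

five_minute_window_leaks = 300 < needed   # True: late retries slip through

# Doubling the window doubles the ids held at the same message rate.
ids_at_5_min = estimated_ids_held(1000, 300)   # 300000
ids_at_10_min = estimated_ids_held(1000, 600)  # 600000
```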

Key idea

A deduplication window trades perfect duplicate filtering for bounded memory, so it must be at least as long as the longest period over which a sender may keep retrying.

Check yourself

1. What happens to a duplicate that arrives after the window expires?

2. How long should a dedup window be?

3. What is the cost of a larger window?