Concurrency

At Least Once With Deduplication

Pair generous retries with a dedup store so duplicates are filtered before they cause harm.

The pragmatic default

Most reliable pipelines choose at-least-once delivery. The sender keeps retrying until it receives an acknowledgement, accepting that some messages will arrive more than once, for example when a message is delivered but its acknowledgement is lost. Deduplication on the receiving side then discards the extra copies.
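
To make the source of duplicates concrete, here is a minimal sketch of a retrying sender. The names LossyAckChannel and send_with_retries are hypothetical, and the channel is a toy stand-in for a real transport: it always delivers the message but loses the first few acknowledgements. A real sender would normally also wait between attempts (for example with exponential backoff), which is left out here to keep the example short.

```python
class LossyAckChannel:
    """Toy transport (hypothetical): delivers every message but loses the first few acks."""

    def __init__(self, lost_acks):
        self.lost_acks = lost_acks
        self.delivered = []  # everything the receiver actually saw

    def send(self, message):
        self.delivered.append(message)  # delivery itself succeeds
        if self.lost_acks > 0:
            self.lost_acks -= 1
            return False  # the ack was lost on the way back
        return True  # the ack reached the sender


def send_with_retries(send, message, max_attempts=5):
    """Retry until send() reports an ack; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if send(message):
            return attempt
    raise RuntimeError(f"no acknowledgement after {max_attempts} attempts")


channel = LossyAckChannel(lost_acks=1)
attempts = send_with_retries(channel.send, {"id": "m-1", "body": "hello"})
print(attempts, len(channel.delivered))  # 2 2
```

The sender cannot tell a lost message from a lost acknowledgement, so it resends in both cases. Here one lost ack is enough for the receiver to see the same message twice.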

How deduplication works

  • Every message carries a stable unique id, set by the producer, not the transport.
  • The consumer keeps a seen set of ids that have already been processed.
  • Before acting, it checks the set: a known id is dropped; a new id is processed and then recorded.
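
The three steps above can be sketched as a small consumer. This is an illustrative in-memory version, not a production design: DedupConsumer and the message shape (a dict with an "id" field) are assumptions, and a real system would usually keep the seen set in durable storage.

```python
class DedupConsumer:
    """Hypothetical consumer that drops messages whose id was already processed."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # ids that have already been processed

    def receive(self, message):
        """Return True if the message was processed, False if it was a duplicate."""
        msg_id = message["id"]  # stable id set by the producer
        if msg_id in self.seen:
            return False  # known id: drop
        self.handler(message)  # new id: process...
        self.seen.add(msg_id)  # ...and record
        return True


processed = []
consumer = DedupConsumer(processed.append)
for msg in [{"id": "m-1"}, {"id": "m-2"}, {"id": "m-1"}]:
    consumer.receive(msg)
print([m["id"] for m in processed])  # ['m-1', 'm-2']
```

Recording the id only after the handler returns is one possible ordering. It means a failure inside the handler leaves the id unrecorded, so a later retry is processed rather than dropped.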

The seen set cannot grow forever, so it is bounded by a time window or a sliding range of ids. Ids older than the window are evicted, on the assumption that those messages will never be retried again.
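
A time-windowed seen set might look like the sketch below. WindowedSeenSet and its method names are hypothetical. The clock is injectable so the behaviour can be shown without sleeping, and the eviction logic assumes the clock never goes backwards, so insertion order matches time order.

```python
import time
from collections import OrderedDict


class WindowedSeenSet:
    """Hypothetical seen set that forgets ids recorded more than `window_seconds` ago."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._seen = OrderedDict()  # id -> time recorded, oldest first

    def _evict(self, now):
        # Drop entries from the old end until one is still inside the window.
        while self._seen:
            _oldest_id, recorded_at = next(iter(self._seen.items()))
            if now - recorded_at < self.window:
                break
            self._seen.popitem(last=False)

    def check_and_add(self, msg_id):
        """Return True if msg_id is new (and record it), False if it is a duplicate."""
        now = self.clock()
        self._evict(now)
        if msg_id in self._seen:
            return False
        self._seen[msg_id] = now
        return True

    def __len__(self):
        return len(self._seen)


now = [0.0]  # fake clock, advanced by hand
seen = WindowedSeenSet(window_seconds=60, clock=lambda: now[0])
print(seen.check_and_add("m-1"))  # True: first sighting
now[0] = 30.0
print(seen.check_and_add("m-1"))  # False: duplicate inside the window
now[0] = 61.0
print(seen.check_and_add("m-1"))  # True: the id was evicted, so a late duplicate passes
```

The last call shows the cost of bounding the set: once an id has aged out, a copy that arrives later is treated as new.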

Tradeoffs

Deduplication adds storage and a lookup on every message. The window must be longer than the longest possible retry delay, or a late duplicate slips through because its id has already been evicted. The choice of id also matters: a business key, rather than a random token, lets the consumer recognize the same logical event even when it comes from different producers.
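
As an illustration of the id choice, the sketch below derives an id from business fields and compares it with a random token. The function business_key_id and the fields order_id and event_type are made up for this example; which fields identify an event depends on the domain.

```python
import hashlib
import json
import uuid


def business_key_id(order_id, event_type):
    """Derive a stable id from the fields that identify the event (hypothetical fields)."""
    # json.dumps of a list keeps the field boundaries unambiguous.
    raw = json.dumps([order_id, event_type]).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


# Two producers reporting the same logical event compute the same id...
id_from_producer_a = business_key_id("order-42", "payment-captured")
id_from_producer_b = business_key_id("order-42", "payment-captured")
print(id_from_producer_a == id_from_producer_b)  # True

# ...whereas independently generated random tokens do not match,
# so the consumer could not recognize the second copy as a duplicate.
print(uuid.uuid4() == uuid.uuid4())  # False
```

A random token still works when a single producer generates it once and reuses it on every retry. The business key matters when the same event can be emitted from more than one place.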

Key idea

At-least-once delivery plus a deduplication store with stable ids and a bounded window processes every message without loss and without acting twice on duplicates that arrive inside the window.

Check yourself

1. What does the consumer use to filter duplicates?

2. Why must the dedup window exceed the longest retry delay?