System Design

Exactly Once Stream Processing

Achieve effectively once results by pairing idempotent writes with transactional offset commits.

6 min read · advanced

Why it is hard

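A stream consumer reads a record, processes it, writes output, and commits its offset. A crash between the write and the commit goes wrong in one of two ways. If the output is written first, the record is reprocessed after restart and its output is duplicated. If the offset is committed first, the record is skipped after restart and its output is lost. With network failures and retries, at least once and at most once are the easy defaults, while exactly once is the prize.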
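The two failure modes can be reproduced with a small in-memory simulation. This is only a sketch: the `run` function, the `Crash` exception, and the `order` and `crash_at` parameters are hypothetical names invented for illustration, and a real consumer has far more moving parts.

```python
class Crash(Exception):
    """Simulated process failure between the write and the commit."""


def run(records, order, crash_at=None):
    """Process records, crashing once between the two steps at crash_at.

    order is "write_then_commit" or "commit_then_write".
    After a crash the consumer restarts from the last committed offset.
    """
    output = []        # stands in for the output sink
    committed = 0      # next offset to read after a restart
    crashed = False
    while committed < len(records):
        offset = committed
        record = records[offset]
        try:
            if order == "write_then_commit":
                output.append(record)
                if record == crash_at and not crashed:
                    crashed = True
                    raise Crash
                committed = offset + 1
            else:
                committed = offset + 1
                if record == crash_at and not crashed:
                    crashed = True
                    raise Crash
                output.append(record)
        except Crash:
            continue   # restart: resume from the committed offset
    return output


print(run(["a", "b", "c"], "write_then_commit", crash_at="b"))  # ['a', 'b', 'b', 'c']
print(run(["a", "b", "c"], "commit_then_write", crash_at="b"))  # ['a', 'c']
```

Writing before committing gives at least once (a duplicate "b"), while committing before writing gives at most once (a lost "b").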

The real goal is effectively once

You cannot stop a message from being delivered twice across an unreliable network. What you can guarantee is that the observable effect happens once. Two techniques deliver this.

  • Idempotent output: writes are keyed so reapplying the same record changes nothing, for example an upsert by event id.
  • Transactional commit: bind the output write and the offset commit into one atomic unit, so either both happen or neither does.
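Both techniques can be sketched together when the sink is a transactional database. The example below uses SQLite from the Python standard library as a stand-in sink; the table names (`events`, `progress`), the column names, and the `apply` helper are assumptions made for illustration, not part of any particular framework.

```python
import sqlite3


def open_sink(path=":memory:"):
    """Create the output table and the offset table in the same database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (event_id TEXT PRIMARY KEY, amount INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress (part INTEGER PRIMARY KEY, next_offset INTEGER)"
    )
    conn.commit()
    return conn


def apply(conn, part, offset, event_id, amount, crash=False):
    """Write the output and advance the offset in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        # Idempotent output: keyed by event_id, so a replay overwrites the same row.
        conn.execute(
            "INSERT OR REPLACE INTO events (event_id, amount) VALUES (?, ?)",
            (event_id, amount),
        )
        if crash:
            raise RuntimeError("simulated crash between write and offset commit")
        # Transactional commit: the offset moves only if the write above commits too.
        conn.execute(
            "INSERT OR REPLACE INTO progress (part, next_offset) VALUES (?, ?)",
            (part, offset + 1),
        )
```

With this layout a crash leaves neither the output row nor the new offset behind, and replaying the same record after restart still yields a single row for that event id. Either technique alone would already prevent the visible duplicate here; combining them is a belt-and-braces choice.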

Kafka transactions

Kafka supports a transaction that spans producing output records and committing the consumed offsets together. A downstream consumer configured with isolation.level=read_committed never sees output from an aborted transaction, so a crash and retry produces no visible duplicate.
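A consume-transform-produce step built on such a transaction might look like the sketch below. The method names follow the confluent-kafka Python client, but the code makes several assumptions: the producer was created with a transactional.id and `init_transactions()` was already called once at startup, the consumer has auto commit disabled, and `process_batch`, `output_topic`, and `transform` are hypothetical names for this example. Error handling is deliberately simplified; a production loop would distinguish retriable, abortable, and fatal errors.

```python
def process_batch(consumer, producer, messages, output_topic, transform):
    """Produce outputs and commit the consumed offsets in one Kafka transaction."""
    producer.begin_transaction()
    try:
        for msg in messages:
            producer.produce(
                output_topic, key=msg.key(), value=transform(msg.value())
            )
        # Attach the consumer's current positions to the same transaction,
        # so the offsets are committed only if the outputs are.
        producer.send_offsets_to_transaction(
            consumer.position(consumer.assignment()),
            consumer.consumer_group_metadata(),
        )
        producer.commit_transaction()
    except Exception:
        # Nothing from this batch becomes visible to read_committed consumers.
        producer.abort_transaction()
        raise
```

After an abort, the caller would typically rewind the consumer to its last committed offsets before retrying the batch, since the in-memory positions have already moved past the failed records.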

The catch

  • Exactly once holds within the pipeline, not necessarily across an external system that lacks transactional or idempotent writes.
  • Transactions add latency and coordination overhead, so use them only where correctness demands it.

Key idea

Exactly once is really effectively once, reached by making outputs idempotent or by committing the output and the offset in a single transaction so a retry leaves no visible duplicate.

Check yourself

1. Why is true exactly once delivery impossible?

2. How do Kafka transactions enable effectively once?

3. What makes a write idempotent?