Why it is hard
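A stream consumer reads a record, processes it, writes output, and commits its offset. A crash between any two steps causes either duplicates or loss, depending on the order. If the output is written before the offset is committed, the restarted consumer reprocesses the record and writes it again. If the offset is committed first, the record is skipped. With network failures and retries, at least once and at most once are the easy defaults, while exactly once is the prize.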
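A toy simulation in plain Python, with no broker involved, shows how the step order decides between a duplicate and a loss. Everything here (run, crash_at, the in-memory output list) is hypothetical and stands in for a real consumer and sink.

```python
# Toy model: a consumer that handles each record in two steps, writing
# output and committing its offset, and crashes once between the steps.

def run(records, order, crash_at=None):
    """Consume records from offset 0 and return every output write.

    order is ("write", "commit") or ("commit", "write"). When the record at
    offset crash_at is first handled, the process crashes after the first
    step and restarts from the last committed offset.
    """
    output = []       # the sink; writes survive a crash
    committed = 0     # next offset to read after a restart
    crashed = False
    while committed < len(records):
        offset = committed
        for i, step in enumerate(order):
            if step == "write":
                output.append(records[offset])
            else:
                committed = offset + 1
            if i == 0 and offset == crash_at and not crashed:
                crashed = True
                break  # crash between the two steps; loop restarts at committed
    return output

records = ["a", "b", "c"]
print(run(records, ("write", "commit"), crash_at=1))  # ['a', 'b', 'b', 'c']
print(run(records, ("commit", "write"), crash_at=1))  # ['a', 'c']
```

Writing first gives at least once (record b appears twice); committing first gives at most once (record b is lost).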
The real goal is effectively once
You cannot prevent a message from being delivered more than once across an unreliable network. What you can guarantee is that its observable effect happens exactly once. Two techniques achieve this.
- Idempotent output: writes are keyed so reapplying the same record changes nothing, for example an upsert by event id.
- Transactional commit: bind the output write and the offset commit into one atomic unit, so either both happen or neither does.
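A minimal sketch of both techniques, using Python's built-in sqlite3 module as a stand-in sink. Keeping the consumer's offset in the same database as the output is an assumption of this sketch, one way to make the output write and the offset commit a single atomic unit when the sink is a database. The table names and functions (open_store, write_idempotent, write_and_commit) are hypothetical.

```python
import sqlite3

def open_store():
    """Hypothetical sink: keyed output table plus the consumer's offsets."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE output (event_id TEXT PRIMARY KEY, amount INTEGER)")
    db.execute("CREATE TABLE offsets (consumer TEXT PRIMARY KEY, next_offset INTEGER)")
    db.commit()
    return db

def upsert(db, event_id, amount):
    # Keyed by event_id: replaying the same event rewrites the same row.
    db.execute(
        "INSERT OR REPLACE INTO output (event_id, amount) VALUES (?, ?)",
        (event_id, amount),
    )

def write_idempotent(db, event_id, amount):
    """Idempotent output on its own: each write commits independently."""
    with db:
        upsert(db, event_id, amount)

def write_and_commit(db, consumer, offset, event_id, amount, crash=False):
    """Transactional commit: the output row and the offset share one transaction."""
    with db:  # commits on normal exit, rolls back if an exception escapes
        upsert(db, event_id, amount)
        if crash:
            raise RuntimeError("simulated crash before the offset commit")
        db.execute(
            "INSERT OR REPLACE INTO offsets (consumer, next_offset) VALUES (?, ?)",
            (consumer, offset + 1),
        )

def next_offset(db, consumer):
    row = db.execute(
        "SELECT next_offset FROM offsets WHERE consumer = ?", (consumer,)
    ).fetchone()
    return row[0] if row else 0

def count(db):
    return db.execute("SELECT COUNT(*) FROM output").fetchone()[0]

db = open_store()
write_idempotent(db, "evt-1", 10)
write_idempotent(db, "evt-1", 10)        # redelivery of the same event
print(count(db))                          # 1

try:
    write_and_commit(db, "c1", 0, "evt-2", 5, crash=True)
except RuntimeError:
    pass
print(count(db), next_offset(db, "c1"))  # 1 0: neither write survived
write_and_commit(db, "c1", 0, "evt-2", 5)
print(count(db), next_offset(db, "c1"))  # 2 1
```

INSERT OR REPLACE is SQLite's keyed overwrite; other databases spell an upsert differently.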
Kafka transactions
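Kafka transactions let a producer write output records and commit the consumer offsets for the input it just read as one atomic unit. If the application crashes mid-transaction, the transaction is aborted and the offsets are not committed, so the input is reprocessed and its output is written again in a new transaction. A downstream consumer configured with isolation.level=read_committed never sees records from an aborted transaction, so the retry leaves no visible duplicate.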
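Kafka's client API is not reproduced here. The sketch below is a toy in-memory model that mimics the visibility rule: output from an aborted attempt stays in the log, a read_committed reader filters it out, and the consumed offset advances only when the transaction commits. ToyLog and run_txn are hypothetical names; only the two isolation values mirror real Kafka settings.

```python
from dataclasses import dataclass, field

@dataclass
class ToyLog:
    """In-memory log with transaction outcomes; a model, not Kafka's API."""
    entries: list = field(default_factory=list)   # (txn_id, value) in append order
    outcome: dict = field(default_factory=dict)   # txn_id -> "commit" or "abort"
    committed_offset: int = 0                     # consumer position on the input

    def append(self, txn_id, value):
        self.entries.append((txn_id, value))

    def commit(self, txn_id, offset):
        # Outputs and the consumed offset become visible together.
        self.outcome[txn_id] = "commit"
        self.committed_offset = offset

    def abort(self, txn_id):
        # Aborted records stay in the log; the offset does not move.
        self.outcome[txn_id] = "abort"

    def read(self, isolation):
        """read_uncommitted returns every record; read_committed only committed ones."""
        if isolation == "read_uncommitted":
            return [v for _, v in self.entries]
        return [v for t, v in self.entries if self.outcome.get(t) == "commit"]

def run_txn(log, txn_id, inputs, crash_after=None):
    """Read-process-write loop: uppercase each input inside one transaction."""
    start = log.committed_offset             # resume from the committed offset
    for n, value in enumerate(inputs[start:], 1):
        log.append(txn_id, value.upper())    # the "processing" step
        if n == crash_after:
            log.abort(txn_id)                # stands in for a timeout or fencing abort
            return
    log.commit(txn_id, offset=len(inputs))

log = ToyLog()
run_txn(log, "t1", ["a", "b"], crash_after=1)  # crashes after writing "A"
run_txn(log, "t2", ["a", "b"])                 # retry from offset 0
print(log.read("read_uncommitted"))  # ['A', 'A', 'B']
print(log.read("read_committed"))    # ['A', 'B']
```

In the real Java client the same loop is built from producer calls such as initTransactions, beginTransaction, sendOffsetsToTransaction, commitTransaction, and abortTransaction, with downstream consumers set to isolation.level=read_committed.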
The catch
- Exactly once holds only within the Kafka read-process-write pipeline. Side effects in an external system, such as a database write or an HTTP call, are covered only if that system supports transactional or idempotent writes.
- Transactions add latency and coordination overhead, since read_committed consumers see output only after the transaction commits, so use them only where correctness demands it.
Key idea
Exactly once is really effectively once, reached by making outputs idempotent or by committing the output and the offset in a single transaction so a retry leaves no visible duplicate.