The Kappa Architecture Deep Dive

Using a single streaming pipeline for everything, reprocessing history by replaying the log.

The simplification

Kappa architecture removes the batch layer entirely. There is one streaming pipeline that handles both real time and historical processing. The key enabler is a durable, replayable log like a partitioned commit log that retains events long term.

How reprocessing works

When logic changes, you do not run a separate batch job. Instead you replay the log from the beginning through a new version of the stream job, building a fresh output. Once it catches up, you switch traffic to it.

Benefits over lambda

One codebase for both paths removes duplicated logic and drift.
Reprocessing is just replaying the retained log, so correction is uniform.
The same operators serve live and historical needs.

The constraints

The log must retain enough history, which costs storage.
Replaying very large histories can be slow and resource heavy.

Kappa fits well when the stream engine supports strong state and exactly once semantics, since the streaming job must match batch quality.

Key idea