An application as a graph
A Kafka Streams application is defined as a topology: a directed graph describing how records flow from input topics, through processing steps, to output topics. The topology is built once at startup and then executed by the Kafka Streams runtime.
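To make "a directed graph built once, then run" concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not the Kafka Streams API. The real graph is built in Java with the DSL (`StreamsBuilder`) or the Processor API (`Topology`), and `Topology#describe()` prints its structure. The names `TopologyGraph`, `add_node`, `freeze`, and `edges` are hypothetical. The model stores nodes and edges, then refuses changes once it is frozen.

```python
# Toy model of a topology as a directed graph of named nodes.
# Hypothetical names (TopologyGraph, add_node, freeze, edges); this is NOT
# the Kafka Streams API, which is Java (StreamsBuilder, Topology).

class TopologyGraph:
    def __init__(self):
        self._children = {}   # node name -> list of downstream node names
        self._frozen = False

    def add_node(self, name, parents=()):
        """Add a node and connect it downstream of each named parent."""
        if self._frozen:
            raise RuntimeError("topology is built once; it cannot change while running")
        if name in self._children:
            raise ValueError(f"duplicate node {name!r}")
        for p in parents:
            if p not in self._children:
                raise ValueError(f"unknown parent {p!r}")
        self._children[name] = []
        for p in parents:
            self._children[p].append(name)
        return self

    def freeze(self):
        """Mark the graph as built; from here on it would only be executed."""
        self._frozen = True
        return self

    def edges(self):
        """Return (parent, child) pairs in insertion order."""
        return [(p, c) for p, cs in self._children.items() for c in cs]


topology = (
    TopologyGraph()
    .add_node("source:orders")
    .add_node("filter:paid", parents=["source:orders"])
    .add_node("sink:paid-orders", parents=["filter:paid"])
    .freeze()
)
print(topology.edges())
# [('source:orders', 'filter:paid'), ('filter:paid', 'sink:paid-orders')]
```

Any later `add_node` call raises `RuntimeError`. This loosely mirrors the fact that a running application does not change its topology.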
The node types
- Source nodes read records from input topics.
- Processor nodes transform records, for example by mapping, filtering, aggregating, joining, or branching. Stateful processors, such as aggregations and joins, read from and write to an attached state store.
- Sink nodes write results to output topics.
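The three node types can be sketched as chained Python generators over in-memory "topics". This is a toy model under loud assumptions: topics are plain lists of `(key, value)` pairs, and the state store is a `dict`. Real Kafka Streams uses Kafka topics and RocksDB or in-memory stores. The names `source`, `filter_processor`, `count_processor`, and `sink` are hypothetical. The chain corresponds roughly to what `stream(...).filter(...).groupByKey().count()` expresses in the Java DSL.

```python
# Toy source -> processor -> sink pipeline over in-memory "topics".
# Assumptions: topics are lists of (key, value) pairs and the state store
# is a plain dict. All function names are hypothetical, not Kafka Streams APIs.

def source(topic):
    """Source node: emit each record from an input topic in order."""
    for key, value in topic:
        yield key, value

def filter_processor(records, predicate):
    """Stateless processor: drop records that fail the predicate."""
    for key, value in records:
        if predicate(key, value):
            yield key, value

def count_processor(records, store):
    """Stateful processor: keep a running count per key in the store."""
    for key, _ in records:
        store[key] = store.get(key, 0) + 1
        yield key, store[key]   # emit each updated count downstream

def sink(records, topic):
    """Sink node: append results to an output topic."""
    for record in records:
        topic.append(record)

input_topic = [("alice", 30), ("bob", 5), ("alice", 12), ("bob", 50)]
output_topic = []
count_store = {}

sink(
    count_processor(
        filter_processor(source(input_topic), lambda k, v: v >= 10),
        count_store,
    ),
    output_topic,
)
print(output_topic)   # [('alice', 1), ('alice', 2), ('bob', 1)]
```

The nodes are chained generators, so each record passes through the whole chain before the next one is read. This loosely mirrors how Kafka Streams pushes one record at a time through a task's topology.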
Tasks and parallelism
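The framework splits the topology into tasks, one per input partition, so the partition count of the input topics caps how far the application can parallelize. Each task runs its own copy of the topology over its assigned partitions, independently of the other tasks. Tasks are spread across stream threads and application instances. Instances of the same application join one consumer group, so adding or removing instances triggers a rebalance that reassigns tasks among them.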
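Here is a minimal sketch of task creation and reassignment. It assumes one task per input partition and a plain round-robin spread over every (instance, thread) slot. The real Kafka Streams assignor is sticky and state-aware, and round-robin is swapped in here only to keep the example short. The names `make_tasks` and `assign` are hypothetical.

```python
# Toy model of task creation and assignment. Assumptions: one task per
# input partition, round-robin over (instance, thread) slots. The real
# Kafka Streams assignor is sticky and state-aware; this is a simplification.

def make_tasks(num_partitions):
    """One task per input partition, identified by partition number."""
    return [f"task-{p}" for p in range(num_partitions)]

def assign(tasks, instances, threads_per_instance):
    """Spread tasks round-robin across all (instance, thread) slots."""
    slots = [(i, t) for i in instances for t in range(threads_per_instance)]
    assignment = {slot: [] for slot in slots}
    for n, task in enumerate(tasks):
        assignment[slots[n % len(slots)]].append(task)
    return assignment

tasks = make_tasks(6)
before = assign(tasks, ["app-1"], threads_per_instance=2)
after = assign(tasks, ["app-1", "app-2"], threads_per_instance=2)  # scale out
for slot, owned in after.items():
    print(slot, owned)
```

In this run, scaling out moves only `task-2` and `task-3` to `app-2`. With six partitions there are only six tasks, so more than six threads in total would leave some threads idle.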
Why a topology helps
Describing the flow as a graph lets the runtime handle threading, state, fault tolerance, and scaling. You declare the what while the framework manages the how.
Key idea
A Kafka Streams topology is a graph of source, processor, and sink nodes that the runtime splits into per-partition tasks, so you declare the flow while the framework handles scaling and fault tolerance.