The Kafka Streams Topology

How a processing application is described as a graph of sources, processors, and sinks.

An application as a graph

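A Kafka Streams application is defined as a topology: a directed acyclic graph whose nodes process records and whose edges are the streams connecting them. The graph describes how records flow from input topics to output topics. You build it once at startup, and the framework then runs it.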

The node types

  • Source nodes read records from input topics.
  • Processor nodes transform records with operations such as map, filter, branch, aggregate, and join. Stateful operations (aggregations and joins, for example) are backed by a state store attached to the processor.
  • Sink nodes write results to output topics.
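
The three node types can be sketched with a toy model in plain Java. This is not the Kafka Streams API: `ToyTopology` and its methods are hypothetical names, records are plain strings, and the graph is a straight line rather than a general DAG. In the real library the graph would be assembled with `StreamsBuilder` or with `Topology.addSource`, `addProcessor`, and `addSink`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Toy model of a linear topology: source -> processors -> sink.
// NOT the Kafka Streams API; every name here is hypothetical.
class ToyTopology {
    // Each processor node turns one record into zero or more records.
    private final List<Function<String, List<String>>> processors = new ArrayList<>();
    // State store attached to the stateful count() processor.
    private final Map<String, Long> store = new HashMap<>();
    // Stand-in for an output topic.
    private final List<String> sink = new ArrayList<>();

    // Stateless processor: drop records that fail the predicate.
    ToyTopology filter(Predicate<String> keep) {
        processors.add(r -> keep.test(r) ? List.of(r) : List.of());
        return this;
    }

    // Stateless processor: transform each record.
    ToyTopology map(Function<String, String> fn) {
        processors.add(r -> List.of(fn.apply(r)));
        return this;
    }

    // Stateful processor: emit a running count per record value.
    ToyTopology count() {
        processors.add(r -> List.of(r + "=" + store.merge(r, 1L, Long::sum)));
        return this;
    }

    // Source node: push each record of the "input topic" through the graph,
    // then let the sink node collect whatever comes out the other end.
    List<String> run(List<String> inputTopic) {
        for (String record : inputTopic) {
            List<String> current = List.of(record);
            for (Function<String, List<String>> processor : processors) {
                List<String> next = new ArrayList<>();
                for (String r : current) {
                    next.addAll(processor.apply(r));
                }
                current = next;
            }
            sink.addAll(current);
        }
        return List.copyOf(sink);
    }

    public static void main(String[] args) {
        ToyTopology topology = new ToyTopology()
            .filter(r -> !r.isBlank())
            .map(r -> r.toUpperCase(Locale.ROOT))
            .count();
        // prints [CLICK=1, VIEW=1, CLICK=2]
        System.out.println(topology.run(List.of("click", " ", "view", "click")));
    }
}
```

The graph is declared first (the chained calls) and only executed when `run` is invoked, which mirrors the build-once, run-by-the-framework split described above.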

Tasks and parallelism

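The framework splits the topology into tasks, with the task count fixed by the number of input partitions: for a single input topic, one task per partition. Each task runs its own copy of the topology over the partitions assigned to it, independently of the other tasks. Tasks are spread across threads and machines for parallelism, so the partition count caps how much parallelism is useful. Starting another instance of the application triggers a rebalance that reassigns tasks, just as adding a member to a consumer group reassigns partitions.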
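
A rough way to picture task placement is the toy model below. It assumes one task per input partition and a simple round-robin placement across instances; the real assignor is more elaborate, so treat `ToyTaskAssignment` and the instance names as hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of task placement. NOT the Kafka Streams assignor:
// round-robin placement is an assumption made for illustration.
class ToyTaskAssignment {
    // Creates one task per input partition and spreads the tasks over instances.
    static Map<String, List<Integer>> assign(int partitions, List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        Map<String, List<Integer>> byInstance = new TreeMap<>();
        for (String instance : instances) {
            byInstance.put(instance, new ArrayList<>());
        }
        for (int task = 0; task < partitions; task++) {
            String owner = instances.get(task % instances.size());
            byInstance.get(owner).add(task);
        }
        return byInstance;
    }

    public static void main(String[] args) {
        // One instance owns every task.
        System.out.println(assign(4, List.of("app-1")));
        // A second instance joins and takes over half of the tasks.
        System.out.println(assign(4, List.of("app-1", "app-2")));
        // More instances than tasks: the fifth instance gets nothing to do.
        System.out.println(assign(4, List.of("app-1", "app-2", "app-3", "app-4", "app-5")));
    }
}
```

The last call shows why the partition count is the ceiling: with four partitions there are only four tasks, so a fifth instance receives an empty assignment.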

Why a topology helps

Describing the flow as a graph lets the runtime handle threading, state, fault tolerance, and scaling. You declare the what while the framework manages the how.

Key idea

A Kafka Streams topology is a graph of source, processor, and sink nodes that the runtime splits into per-partition tasks, so you declare the flow while the framework handles scaling and fault tolerance.

Check yourself

1. What is a Kafka Streams topology?

2. How is the topology parallelized?

3. What attaches to a stateful processor node?