← Lessons

quiz vs the machine

Gold1460

System Design

Flink Stateful Streaming

How Flink keeps large keyed state local to operators and recovers it consistently after failures.

5 min read · core · beat Gold to climb

State as a first class citizen

Many stream computations are stateful. Counting events per user, joining two streams, or detecting patterns all require remembering past data. Flink treats this state as a core primitive that operators own and manage directly.

Keyed state

Most state in Flink is keyed state, partitioned by a key like user id. Flink routes all events for a key to the same operator instance, which keeps that key's state locally. This lets state scale by partitioning across the cluster.

Local state backends

State lives in a state backend. It can sit in heap memory for speed or in an embedded key value store on local disk for large state that exceeds memory. Keeping state local avoids a network call per event.

Consistency under failure

Because the job runs forever, state must survive crashes. Flink periodically snapshots all operator state into durable storage so it can restore a consistent view and resume without losing or double counting.

Key idea

Flink makes state a first class, locally held, key partitioned primitive and snapshots it durably so long running stateful streams scale and recover correctly.

Check yourself

Answer to earn rating on the learn ladder.

1. Why does Flink keep state local to operators?

2. How is most Flink state organized?