Stateful Stream Processing

How operators remember past records to compute joins, counts, and aggregations.

Beyond record-at-a-time processing

A stateless operator like a filter or map looks at one record and forgets it. Many useful computations need memory of earlier records: running counts, aggregations, joins, and deduplication.
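The contrast can be sketched in a few lines of Python. This is an illustrative toy, not any engine's API: `keep_large`, `Deduplicator`, and the record fields `id` and `amount` are hypothetical names chosen for the example.

```python
def keep_large(record):
    # Stateless: the decision depends only on the current record.
    return record["amount"] > 100


class Deduplicator:
    """Stateful: remembers every event id it has already let through."""

    def __init__(self):
        self.seen = set()  # state that outlives any single record

    def process(self, record):
        if record["id"] in self.seen:
            return None  # duplicate, drop it
        self.seen.add(record["id"])
        return record


events = [
    {"id": "a", "amount": 150},
    {"id": "b", "amount": 50},
    {"id": "a", "amount": 150},  # redelivered duplicate
]

# The filter cannot tell that the third event repeats the first.
large = [e for e in events if keep_large(e)]

# The deduplicator can, because it kept the ids it has seen.
dedup = Deduplicator()
unique = [e for e in events if dedup.process(e) is not None]
```

The filter passes both copies of event `a`, while the deduplicator passes each id once. Note that `seen` only ever grows here, which is the growth problem discussed later in the lesson.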

What state holds

A stateful operator keeps its state keyed by some field of the record. A per-user count stores one number per user id; a streaming join buffers records from one side until a matching record arrives on the other.
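A minimal sketch of both kinds of keyed state, assuming records arrive one at a time and everything fits in a single process. The class names `CountPerUser` and `StreamJoin` and the field `user_id` are hypothetical, not taken from a particular engine.

```python
from collections import defaultdict


class CountPerUser:
    def __init__(self):
        self.counts = defaultdict(int)  # keyed state: user_id -> count

    def process(self, record):
        user = record["user_id"]
        self.counts[user] += 1
        return user, self.counts[user]


class StreamJoin:
    """Inner join of two streams on a key; each side is buffered for later matches."""

    def __init__(self):
        self.left = defaultdict(list)   # key -> left values seen so far
        self.right = defaultdict(list)  # key -> right values seen so far

    def on_left(self, key, value):
        self.left[key].append(value)
        # Match the new left value against everything buffered on the right.
        return [(key, value, r) for r in self.right.get(key, [])]

    def on_right(self, key, value):
        self.right[key].append(value)
        # Match the new right value against everything buffered on the left.
        return [(key, l, value) for l in self.left.get(key, [])]


counter = CountPerUser()
results = [counter.process({"user_id": u}) for u in ["u1", "u2", "u1"]]

join = StreamJoin()
first = join.on_left("order-1", "order placed")        # nothing to match yet
second = join.on_right("order-1", "payment received")  # matches the buffered left record
```

The join emits nothing for the first record because its partner has not arrived; the buffered value is what makes the later match possible.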

Keeping state correct under failure

State lives in memory for speed but must survive crashes. Engines periodically checkpoint state to durable storage, together with the input position the state corresponds to. On failure the operator restores the latest checkpoint and the engine replays input from that position, so counts are neither lost nor double counted.
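The recovery path can be sketched as follows, assuming the input is a replayable log addressed by offset and using a JSON string as a stand-in for durable storage. `CheckpointedCounter`, `snapshot`, and `restore` are hypothetical names for this example, and the sketch covers only the operator's own state, not results already sent downstream.

```python
import json


class CheckpointedCounter:
    def __init__(self):
        self.counts = {}
        self.offset = 0  # position of the next input record to read

    def process(self, user_id):
        self.counts[user_id] = self.counts.get(user_id, 0) + 1
        self.offset += 1

    def snapshot(self):
        # State and input position are saved together so they stay consistent.
        return json.dumps({"counts": self.counts, "offset": self.offset})

    @classmethod
    def restore(cls, blob):
        data = json.loads(blob)
        op = cls()
        op.counts = data["counts"]
        op.offset = data["offset"]
        return op


log = ["u1", "u2", "u1", "u1", "u2"]  # replayable input

op = CheckpointedCounter()
for user in log[:2]:
    op.process(user)
durable = op.snapshot()  # checkpoint taken after two records

op.process(log[2])  # work done after the checkpoint...
del op              # ...is lost in the simulated crash

op = CheckpointedCounter.restore(durable)
for user in log[op.offset:]:  # replay from the checkpointed position
    op.process(user)
```

Because the offset is stored with the counts, the third record is processed again after recovery rather than skipped or counted twice, and the final counts match a run with no crash.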

Bounding state growth

State that grows without bound behaves like a memory leak: every new key adds an entry that is never removed. Engines bound it with:

  • A time to live (TTL) that expires a key once it has gone a set period without updates.
  • Window scoping that clears a window's state when the window closes.
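Both techniques can be sketched in plain Python. The assumptions here are integer timestamps in seconds, a TTL measured from a key's last update, and fixed-size (tumbling) windows closed by a caller-supplied watermark; `TtlCounter`, `TumblingWindowCounter`, `evict`, and `close_up_to` are hypothetical names, not an engine's API.

```python
class TtlCounter:
    """Per-key count whose entries expire ttl seconds after their last update."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.state = {}  # key -> (count, last_update_time)

    def process(self, key, now):
        count, last = self.state.get(key, (0, now))
        if now - last >= self.ttl:
            count = 0  # the old entry has expired, start over
        self.state[key] = (count + 1, now)
        return count + 1

    def evict(self, now):
        # Remove every key that has not been updated within the TTL.
        expired = [k for k, (_, last) in self.state.items() if now - last >= self.ttl]
        for k in expired:
            del self.state[k]


class TumblingWindowCounter:
    """Per-key counts in fixed windows; a window's state is dropped when it closes."""

    def __init__(self, size):
        self.size = size
        self.windows = {}  # window_start -> {key: count}

    def process(self, key, timestamp):
        start = timestamp - timestamp % self.size
        counts = self.windows.setdefault(start, {})
        counts[key] = counts.get(key, 0) + 1

    def close_up_to(self, watermark):
        # Emit and delete every window that ends at or before the watermark.
        closed = {}
        for start in sorted(self.windows):
            if start + self.size <= watermark:
                closed[start] = self.windows.pop(start)
        return closed


ttl = TtlCounter(ttl=60)
ttl.process("u1", now=0)
ttl.process("u2", now=50)
ttl.evict(now=70)  # u1 is 70 s stale and is removed; u2 is 20 s stale and stays

win = TumblingWindowCounter(size=10)
for ts in (3, 7, 12):
    win.process("u1", ts)
closed = win.close_up_to(10)  # window [0, 10) closes; window [10, 20) stays open
```

In both cases the amount of retained state depends on how much recent activity there is, not on how long the stream has been running.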

Key idea

Stateful operators remember past records in keyed state to compute counts and joins, rely on checkpoints to survive failure, and bound growth with time to live or windows.

Check yourself

1. Which computation requires stateful processing?

2. How does a stateful operator survive a crash?

3. How is unbounded state growth controlled?