Beyond record-at-a-time processing
A stateless operator like a filter or map looks at one record and forgets it. Many useful computations need memory of earlier records: running counts, aggregations, joins, and deduplication.
What state holds
A stateful operator keeps state keyed by some field. A count per user stores a number per user id; a streaming join holds records from one side waiting to match the other.
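As a rough illustration of keyed state, the per-user count could be sketched in Python as below. The `KeyedCounter` class, its `process` method, and the `user_id` field name are hypothetical names chosen for this example, not part of any particular engine.

```python
from collections import defaultdict


class KeyedCounter:
    """Hypothetical stateful operator: keeps a running count per key."""

    def __init__(self, key_field):
        self.key_field = key_field
        self.state = defaultdict(int)  # keyed state: key -> running count

    def process(self, record):
        # Look up this record's key, update its entry, and emit the new count.
        key = record[self.key_field]
        self.state[key] += 1
        return key, self.state[key]


counter = KeyedCounter("user_id")
events = [{"user_id": "a"}, {"user_id": "b"}, {"user_id": "a"}]
outputs = [counter.process(e) for e in events]
# outputs == [("a", 1), ("b", 1), ("a", 2)]
```

Unlike a filter or map, the result for the third record depends on the first one, which is exactly the memory a stateless operator lacks.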
Keeping state correct under failure
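State lives in memory for speed but must survive crashes. Engines periodically checkpoint state to durable storage, together with the input position that the state reflects. On failure the operator restores the latest checkpoint and resumes reading from that position, so updates made after the checkpoint are recomputed rather than lost or double-counted.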
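The restore-and-resume cycle can be sketched with a small simulation. It assumes the input can be replayed from an offset and that the checkpoint stores that offset next to the counts; the `run` function, the JSON file layout, and the two-record checkpoint interval are all illustrative choices, not how any specific engine does it.

```python
import json
import os
import tempfile


def run(events, path, crash_at=None):
    """Count events per user, checkpointing counts and input offset together."""
    # Restore the latest checkpoint if one exists, otherwise start empty.
    if os.path.exists(path):
        with open(path) as f:
            snap = json.load(f)
        counts, offset = snap["counts"], snap["offset"]
    else:
        counts, offset = {}, 0

    for i in range(offset, len(events)):
        if crash_at is not None and i == crash_at:
            raise RuntimeError("simulated crash")
        user = events[i]
        counts[user] = counts.get(user, 0) + 1
        if (i + 1) % 2 == 0:  # checkpoint every 2 records (arbitrary interval)
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"counts": counts, "offset": i + 1}, f)
            os.replace(tmp, path)  # atomic rename: never a half-written checkpoint


    return counts


events = ["a", "b", "a", "a", "b"]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run(events, path, crash_at=3)  # processes records 0..2, then crashes
except RuntimeError:
    pass
result = run(events, path)  # restores the checkpoint at offset 2, replays 2..4
# result == {"a": 3, "b": 2}
```

Record 2 is processed twice, once before the crash and once after the restore, yet it is counted once: the restore rolls the counts back to the same point as the offset, which is why the two must be saved together.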
Bounding state growth
State that grows without bound behaves like a memory leak: every new key adds an entry that is never removed. Engines bound it with:
- Time to live that expires old keys.
- Window scoping that clears state when a window closes.
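A time to live can be sketched on a deduplication operator, as below. The `TtlDeduplicator` class and its `now` parameter are hypothetical; the full scan for expired keys on every record keeps the example short, and a real engine would more likely use timers or lazy cleanup.

```python
class TtlDeduplicator:
    """Hypothetical dedup operator whose keyed state expires after `ttl` seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.last_seen = {}  # keyed state: key -> time of the most recent record

    def process(self, key, now):
        # Drop every key that has not been seen within the TTL.
        expired = [k for k, t in self.last_seen.items() if now - t > self.ttl]
        for k in expired:
            del self.last_seen[k]
        # A record is a duplicate only if its key is still held in state.
        is_duplicate = key in self.last_seen
        self.last_seen[key] = now
        return is_duplicate


dedup = TtlDeduplicator(ttl=10)
results = [
    dedup.process("x", now=0),
    dedup.process("x", now=5),
    dedup.process("y", now=6),
    dedup.process("x", now=30),
]
# results == [False, True, False, False]
```

The last record shows the trade-off: once "x" has expired, a repeat is no longer recognized as a duplicate, so the TTL has to be at least as long as the period over which duplicates are expected. Window scoping follows the same pattern, with the close of the window rather than a per-key age triggering the cleanup.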
Key idea
Stateful operators remember past records in keyed state to compute counts and joins, rely on checkpoints to survive failure, and bound growth with time to live or windows.