The challenge
A stream never ends, so you cannot wait for all data before computing a sum or average. Streaming aggregation maintains running results and emits them over windows of time.
Window types
- A tumbling window is fixed and non overlapping, like every one minute bucket.
- A sliding window overlaps, recomputing as it advances.
- A session window groups events separated by gaps of inactivity.
Handling time and lateness
The engine tracks event time, the moment an event happened, not when it arrived. A watermark estimates how far event time has advanced so the engine knows when a window is complete and how long to wait for late events before closing it.
Aggregation state lives in the engine and must be checkpointed so a failure does not lose partial counts.
Key idea
Streaming aggregation keeps running results over time windows, using event time and watermarks to handle late and out of order data.