Why windows exist
A stream is unbounded, so you cannot wait for it to end before computing a sum or average. Windows slice the endless stream into finite chunks you can aggregate.
Window types
- Tumbling: fixed size, non-overlapping. A five-minute tumbling window covers minutes zero to five, then five to ten, so each record falls into exactly one window.
- Sliding: fixed size but overlapping. The window advances by a step smaller than its size, so each record can fall into several windows.
- Session: dynamic, defined by gaps of inactivity. A new event within the gap extends the session; a silence longer than the gap closes it.
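The three window types can be sketched as plain assignment functions. This is a minimal illustration, not any particular engine's API: the function names are hypothetical, timestamps are integers in arbitrary units, windows are half-open intervals [start, end), and an event arriving exactly one gap after the previous one is assumed to extend the session.

```python
from typing import List, Tuple

def tumbling_window(ts: int, size: int) -> Tuple[int, int]:
    """Return the single [start, end) window that contains timestamp ts."""
    start = ts - (ts % size)
    return (start, start + size)

def sliding_windows(ts: int, size: int, step: int) -> List[Tuple[int, int]]:
    """Return every [start, end) window of length size, advancing by step,
    that contains timestamp ts."""
    start = ts - (ts % step)  # latest window start at or before ts
    windows = []
    while start > ts - size:  # window still covers ts
        windows.append((start, start + size))
        start -= step
    return sorted(windows)

def session_windows(timestamps: List[int], gap: int) -> List[Tuple[int, int]]:
    """Group timestamps into sessions, returned as (first_event, last_event).
    Assumption: a silence longer than gap closes the session."""
    sessions: List[Tuple[int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1] = (sessions[-1][0], ts)  # extend the open session
        else:
            sessions.append((ts, ts))             # start a new session
    return sessions

print(tumbling_window(7, 5))                       # (5, 10)
print(sliding_windows(7, 10, 5))                   # [(0, 10), (5, 15)]
print(session_windows([1, 2, 3, 10, 11, 30], 5))   # [(1, 3), (10, 11), (30, 30)]
```

Note that the sliding case returns two windows for one record, while the tumbling case always returns exactly one.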
Event time versus processing time
- Event time is when the event happened.
- Processing time is when the system saw it.
Windows usually use event time so that results reflect when events actually happened, even when records arrive late or out of order.
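To see why the choice of clock matters, the sketch below sums the same records into ten-second tumbling windows twice, once by event time and once by processing time. The records and the helper name sum_by_window are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (event_time, processing_time, value), in seconds.
records = [
    (1, 2, 10),
    (4, 11, 20),   # happened in the first window but was seen late
    (12, 12, 30),
    (7, 13, 40),   # arrived out of order, after a newer event
]

def sum_by_window(records, size, time_index):
    """Sum values per tumbling window, keyed by window start.
    time_index selects the clock: 0 = event time, 1 = processing time."""
    totals = defaultdict(int)
    for rec in records:
        ts = rec[time_index]
        totals[ts - (ts % size)] += rec[2]
    return dict(totals)

by_event_time = sum_by_window(records, 10, 0)
by_processing_time = sum_by_window(records, 10, 1)

print(by_event_time)        # {0: 70, 10: 30}
print(by_processing_time)   # {0: 10, 10: 90}
```

With processing time, the two delayed records are counted in the second window even though they happened during the first, so the totals depend on arrival delay rather than on what occurred.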
Watermarks and lateness
- A watermark is a moving threshold in event time. It asserts that no more events with timestamps earlier than the threshold are expected to arrive.
- When the watermark passes the end of a window, the window can fire its result.
- Late events, whose windows the watermark has already passed when they arrive, are dropped or sent to a side output.
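The watermark rules above can be simulated in a few lines. This is a sketch under stated assumptions, not how any specific engine works: the watermark is taken to be the largest event time seen so far minus a fixed max_delay (one common heuristic), windows are tumbling, and run_windows is a hypothetical name.

```python
from collections import defaultdict

def run_windows(events, size, max_delay):
    """Process (event_time, value) pairs in arrival order.
    Returns (fired, late, still_open)."""
    open_windows = defaultdict(int)  # window start -> running sum
    fired = []                       # (start, end, total), in firing order
    late = []                        # side output for late events
    max_seen = None
    watermark = float("-inf")
    for ts, value in events:
        start = ts - (ts % size)
        if start + size <= watermark:
            late.append((ts, value))      # window already fired
        else:
            open_windows[start] += value
        # Assumed heuristic: watermark trails the newest event by max_delay.
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - max_delay
        # Fire every open window whose end the watermark has passed.
        for s in sorted(w for w in open_windows if w + size <= watermark):
            fired.append((s, s + size, open_windows.pop(s)))
    return fired, late, dict(open_windows)

events = [(1, 10), (7, 40), (4, 20), (12, 30), (16, 1), (3, 5)]
fired, late, still_open = run_windows(events, size=10, max_delay=3)
print(fired)       # [(0, 10, 70)]
print(late)        # [(3, 5)]
print(still_open)  # {10: 31}
```

The out-of-order event at time 4 is still counted because the watermark had not yet passed the end of its window, while the event at time 3 arrives after that window fired and goes to the side output. A larger max_delay would tolerate more disorder at the cost of later results.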
Key idea
Windowing turns an unbounded stream into finite groups, and event-time windows with watermarks let you aggregate correctly despite out-of-order and late data.