Two clocks in a stream
Every record carries two notions of time. Event time is when the event actually happened, stamped at the source. Processing time is when the stream engine handles the record.
Why they diverge
The two drift apart because of network delay, queuing, retries, and offline devices. A phone in a tunnel may buffer events for minutes, so when it reconnects, its events reach the engine carrying event times far behind the current processing time.
When each is right
- Event time gives correct, reproducible results. Counting purchases per minute by event time yields the same answer no matter when the data arrives or is reprocessed.
- Processing time is simple and low-latency but nondeterministic, since a replay or a slow batch shifts which window a record lands in.
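The contrast above can be sketched in a few lines of Python. This is a minimal illustration, not any engine's API: the record layout, the field names event_time and processing_time, and the one-minute tumbling window are all assumptions made for the example. The same three purchases are counted once as they first arrived and once as a fast replay.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed one-minute tumbling windows


def window_start(ts: int) -> int:
    """Return the start of the tumbling window that contains timestamp ts."""
    return ts - ts % WINDOW_SECONDS


def count_per_window(records, key):
    """Count records per window, bucketing by the chosen time field."""
    return dict(Counter(window_start(r[key]) for r in records))


# Hypothetical purchases; all times are in seconds.
first_run = [
    {"event_time": 10, "processing_time": 12},
    {"event_time": 50, "processing_time": 65},  # delayed in transit
    {"event_time": 70, "processing_time": 71},
]

# The same events replayed later: event times are unchanged,
# processing times are whatever the clock says during the replay.
replay = [
    {"event_time": 10, "processing_time": 1000},
    {"event_time": 50, "processing_time": 1000},
    {"event_time": 70, "processing_time": 1001},
]

by_event_first = count_per_window(first_run, "event_time")
by_event_replay = count_per_window(replay, "event_time")
by_proc_first = count_per_window(first_run, "processing_time")
by_proc_replay = count_per_window(replay, "processing_time")

print(by_event_first, by_event_replay)  # identical across runs
print(by_proc_first, by_proc_replay)    # differ between runs
```

Bucketing by event_time gives the same counts in both runs, while bucketing by processing_time moves the delayed purchase into a different window on the first run and collapses everything into one window on the replay.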
The cost of event time
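Event time is more correct but harder: the engine cannot know when a window is complete, since late events may still arrive. That is why event-time processing relies on watermarks to decide when to emit. A watermark is the engine's estimate that no events with a timestamp earlier than a given time are still expected, so a window can be emitted once the watermark passes its end.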
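One way to picture the emit decision is the sketch below. It assumes a simple heuristic in which the watermark trails the largest event time seen so far by a fixed bound; the names run_windows and MAX_DELAY_SECONDS, the 30-second bound, and the choice to set late events aside are all illustrative assumptions, and real engines offer other watermark strategies and late-data policies.

```python
from collections import defaultdict

WINDOW_SECONDS = 60      # assumed one-minute tumbling windows
MAX_DELAY_SECONDS = 30   # assumed bound on how out of order events may be


def run_windows(event_times):
    """Process event times in arrival order.

    Returns (emitted, late, still_open): emitted is a list of
    (window_start, count) in emission order, late holds events whose
    window had already been emitted, still_open holds unfinished windows.
    """
    open_windows = defaultdict(int)
    emitted = []
    late = []
    watermark = float("-inf")

    for ts in event_times:
        start = ts - ts % WINDOW_SECONDS
        if start + WINDOW_SECONDS <= watermark:
            # The watermark already passed this window's end: too late.
            late.append(ts)
        else:
            open_windows[start] += 1

        # Watermark: largest event time seen so far, minus the allowed delay.
        watermark = max(watermark, ts - MAX_DELAY_SECONDS)

        # Emit every open window whose end the watermark has reached.
        for s in sorted(open_windows):
            if s + WINDOW_SECONDS <= watermark:
                emitted.append((s, open_windows.pop(s)))

    return emitted, late, dict(open_windows)


# Hypothetical arrival order; 40 is out of order but within the bound,
# 20 arrives after its window has been emitted.
emitted, late, still_open = run_windows([10, 50, 70, 40, 95, 20, 130, 160])
print(emitted)     # windows closed by the watermark
print(late)        # events that missed their window
print(still_open)  # windows still waiting for the watermark
```

The bound is a trade-off: a larger MAX_DELAY_SECONDS admits more stragglers but delays every result, while a smaller one emits sooner and marks more events as late.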
Key idea
Event time is when an event happened and gives reproducible results; processing time is when the engine handles it and is simpler but nondeterministic. Correct windowing therefore uses event time with watermarks.