System Design

Watermarks And Late Data

Using watermarks to estimate event time progress so windows can close while handling stragglers.

The problem of closing a window

In event time, events can arrive out of order or late. So when can a window safely close and emit its result? Waiting forever blocks output, but closing too early drops valid stragglers.

What a watermark is

A watermark is a marker that flows with the stream and asserts that no more events with a timestamp earlier than the watermark are expected. When the watermark passes the end of a window, the engine fires that window. Watermarks let the system make progress on event time despite disorder.
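
To make the firing rule concrete, here is a minimal sketch in Python, not tied to any particular engine. It assumes tumbling windows of a fixed size and a watermark derived as "largest timestamp seen so far minus a fixed delay", which is one common heuristic rather than the only way to generate watermarks. The names run, WINDOW_SIZE, and MAX_DELAY are hypothetical and chosen for illustration.

```python
from collections import defaultdict

WINDOW_SIZE = 10  # assumed tumbling window length, in event-time units
MAX_DELAY = 3     # assumed bound on how far out of order events may arrive


def run(events):
    """Count events per tumbling window, firing a window once the watermark reaches its end."""
    counts = defaultdict(int)  # window start -> events counted so far
    fired = []                 # (window_start, count) in firing order
    max_ts = float("-inf")
    for ts in events:
        # The watermark trails the largest timestamp seen by MAX_DELAY.
        max_ts = max(max_ts, ts)
        watermark = max_ts - MAX_DELAY
        start = ts - ts % WINDOW_SIZE
        if start + WINDOW_SIZE <= watermark:
            continue  # the window has already closed; this sketch simply drops the event
        counts[start] += 1
        # Fire every open window whose end is at or before the watermark.
        for s in sorted(w for w in counts if w + WINDOW_SIZE <= watermark):
            fired.append((s, counts.pop(s)))
    return fired, dict(counts)


fired, still_open = run([1, 4, 2, 12, 9, 13, 3, 25])
print(fired)       # [(0, 4), (10, 2)]
print(still_open)  # {20: 1}
```

In this run the event with timestamp 9 arrives out of order but is still counted, because the watermark has only reached 9 and the window [0, 10) is still open. The event with timestamp 3 arrives after the watermark has reached 10, so its window has already fired and the event is late.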

Handling late data

Some events still arrive after the watermark has passed their timestamp, often after their window has already fired. Engines typically offer these options:

  • Drop late events for simplicity.
  • Keep the window state for an allowed-lateness period so late events can still update the emitted result.
  • Route late events to a side output for separate handling.
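
The three options above can be sketched as a small routing function. This is an illustrative model under stated assumptions, not the API of any real engine: windows are tumbling, each event is paired with the watermark current at its arrival, and the names classify, route, allowed_lateness, and side_output are hypothetical.

```python
def classify(ts, watermark, window_size=10, allowed_lateness=0):
    """Classify an event against its tumbling window and the current watermark."""
    window_end = ts - ts % window_size + window_size
    if watermark < window_end:
        return "on_time"      # the window has not fired yet
    if watermark < window_end + allowed_lateness:
        return "late_update"  # the window fired, but its state is still retained
    return "too_late"         # the state is gone: drop or divert the event


def route(stream, allowed_lateness=0, side_output=None):
    """Apply a lateness policy to an iterable of (timestamp, watermark) pairs."""
    accepted = []  # events that reach the window, tagged with how they arrived
    dropped = 0
    for ts, watermark in stream:
        kind = classify(ts, watermark, allowed_lateness=allowed_lateness)
        if kind != "too_late":
            accepted.append((ts, kind))
        elif side_output is not None:
            side_output.append(ts)  # keep the event for separate handling
        else:
            dropped += 1            # simplest policy: discard it
    return accepted, dropped


side = []
stream = [(3, 9), (3, 12), (3, 20)]
print(route(stream, allowed_lateness=5, side_output=side))
# ([(3, 'on_time'), (3, 'late_update')], 0)
print(side)  # [3]
```

Passing no side_output and an allowed_lateness of 0 reduces this to the drop policy: every event whose window has fired is counted in dropped.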

The trade-off

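A conservative watermark lags further behind the newest event, so fewer events arrive late, but every window fires later and latency rises. An aggressive watermark fires windows sooner, but more events arrive after their window has fired and are dropped unless a lateness policy catches them.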
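
A rough way to see this trade-off is to count how many events would arrive late under different watermark delays. The sketch below assumes tumbling windows and a watermark equal to the largest timestamp seen minus a fixed delay. The function late_count and the sample timestamps are made up for illustration.

```python
def late_count(events, max_delay, window_size=10):
    """Count events that arrive after the watermark has passed the end of their window."""
    max_ts = float("-inf")
    late = 0
    for ts in events:
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_delay  # a larger delay means a more conservative watermark
        window_end = ts - ts % window_size + window_size
        if window_end <= watermark:
            late += 1
    return late


events = [1, 4, 12, 9, 13, 3, 25, 8]
for delay in (0, 3, 5, 20):
    print(delay, late_count(events, delay))
# 0 3
# 3 2
# 5 1
# 20 0
```

The larger delay loses fewer events, but in this model a window ending at time T cannot fire until an event with timestamp at least T plus the delay has been seen, so results are emitted correspondingly later.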

Key idea

Watermarks estimate event time progress so windows can close on disordered streams, while a lateness policy decides how to treat events that still arrive too late.

Check yourself

1. What does a watermark assert?

2. What does a more conservative watermark trade off?

3. What is one way to handle events arriving after the watermark?