Design an Ad Click Aggregator

Count ad clicks in near real time with accurate windowed aggregation at scale.

Requirements

Record every ad click and aggregate counts per ad over time windows.
Provide near-real-time dashboards and accurate billing totals.
Handle massive click volume and avoid double counting.

High-level design

A streaming pipeline ingests clicks and aggregates them into time windows.

Ingestion: click events land in a partitioned log such as Kafka, partitioned by ad id.
Stream processing: a stream job aggregates counts per ad per time window using event time and watermarks.
Storage: write rolled-up counts to an analytics store for dashboards and to a durable store for billing.
Dedup: attach a unique click id to each event and deduplicate on it so that retried deliveries are not counted twice.
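
The steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production stream job: the class name ClickAggregator, the 60 second tumbling window, and the 30 second lateness allowance are all assumptions chosen for the example, and timestamps are assumed to be non-negative epoch seconds. The watermark here is simply the largest event time seen minus the allowed lateness.

```python
from collections import defaultdict

WINDOW_SECONDS = 60     # assumed tumbling window size
ALLOWED_LATENESS = 30   # assumed lateness tolerance in seconds


class ClickAggregator:
    """Hypothetical single-process stand-in for the stream job."""

    def __init__(self):
        self.seen_click_ids = set()           # dedup state (a real job would expire it)
        self.open_windows = defaultdict(int)  # (ad_id, window_start) -> running count
        self.closed = {}                      # finalized (ad_id, window_start) -> count
        self.max_event_time = 0

    def process(self, click_id, ad_id, event_time):
        # Dedup: a retried delivery carries the same click id and is skipped.
        if click_id in self.seen_click_ids:
            return
        # Event time, not arrival time, picks the window.
        window_start = event_time - event_time % WINDOW_SECONDS
        watermark = self.max_event_time - ALLOWED_LATENESS
        if window_start + WINDOW_SECONDS <= watermark:
            return  # window already finalized; this sketch drops the late click
        self.seen_click_ids.add(click_id)
        self.open_windows[(ad_id, window_start)] += 1

        # Advance the watermark and finalize every window that ends at or before it.
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - ALLOWED_LATENESS
        done = [k for k in self.open_windows if k[1] + WINDOW_SECONDS <= watermark]
        for key in done:
            self.closed[key] = self.open_windows.pop(key)
```

Feeding the same click id twice changes nothing, and a click that arrives out of order is still counted in its own window as long as the watermark has not passed that window's end.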

Bottlenecks

Late events: watermarks decide when a window is complete while tolerating some lateness.
Exactly once: idempotent writes keyed by click id prevent double counting under retries.
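Hot ads: a popular ad overloads the single partition that owns its ad id, so salt the key (ad id plus a small shard suffix), pre-aggregate per shard, and merge the partial counts to spread the skew.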
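
One common way to spread a hot ad is key salting followed by a two-stage count. The sketch below is an assumption about how that could look, not a description of any particular system: NUM_SALTS, salted_key, pre_aggregate, and merge are hypothetical names, and the salt is derived from the click id so that a retry of the same click lands on the same sub-key.

```python
import zlib
from collections import Counter

NUM_SALTS = 4  # assumed number of sub-keys per ad


def salted_key(ad_id, click_id):
    # Deterministic salt: the same click id always maps to the same sub-key.
    salt = zlib.crc32(click_id.encode()) % NUM_SALTS
    return f"{ad_id}#{salt}"


def pre_aggregate(clicks):
    """Stage 1: count per salted key; each salted key can sit on its own partition."""
    partial = Counter()
    for click_id, ad_id in clicks:
        partial[salted_key(ad_id, click_id)] += 1
    return partial


def merge(partial):
    """Stage 2: strip the salt and sum the partial counts per ad."""
    totals = Counter()
    for key, count in partial.items():
        ad_id = key.rsplit("#", 1)[0]
        totals[ad_id] += count
    return totals
```

The merge stage only sees at most NUM_SALTS partial counts per ad per window, so it stays cheap even when one ad receives most of the traffic.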

Tradeoffs

Larger windows reduce overhead but increase result latency.
Strict exactly-once processing costs more coordination than at-least-once processing, which is cheaper but can overcount on retries.
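
To make the second tradeoff concrete, the sketch below replays one window result, as an at-least-once pipeline might after a failure, into two hypothetical sinks. IncrementSink, UpsertSink, and deliver_at_least_once are illustrative names only. Note that this example keys writes by (ad id, window start) because it stores window totals rather than individual clicks; that key choice is an assumption of the sketch.

```python
class IncrementSink:
    """Non-idempotent: every delivery adds to the stored total."""

    def __init__(self):
        self.totals = {}

    def write(self, ad_id, window_start, count):
        key = (ad_id, window_start)
        self.totals[key] = self.totals.get(key, 0) + count


class UpsertSink:
    """Idempotent: every delivery overwrites the stored total for its key."""

    def __init__(self):
        self.totals = {}

    def write(self, ad_id, window_start, count):
        self.totals[(ad_id, window_start)] = count


def deliver_at_least_once(sink, results):
    # Simulate a retry: the last result is delivered twice.
    for result in results:
        sink.write(*result)
    if results:
        sink.write(*results[-1])
```

With the incrementing sink the replayed window is counted twice, while the upsert sink ends with the correct total, which is why idempotent writes let a cheaper at-least-once pipeline still produce billing-safe numbers.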

Key idea

An ad click aggregator is a partitioned streaming pipeline that windows events by event time, deduplicates by click id, and rolls up counts for dashboards and billing.

Check yourself

1. Why attach a click id and deduplicate?

2. What do watermarks decide in the stream job?