The motivation
Lambda architecture answers a single question with two pipelines. A batch layer recomputes accurate results over all historical data, while a speed layer processes recent events for low latency. A serving layer merges both at query time.
The three layers
- Batch layer stores the immutable master dataset and periodically recomputes batch views from it. Batch views are slow to produce but correct.
- Speed layer processes only recent data the batch layer has not yet covered, producing real-time views that fill the gap.
- Serving layer indexes both views and combines them so a query reflects both the complete history and the freshest data.
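The three layers can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the event shape `(timestamp, page)`, the page-count aggregation, and the function names `build_batch_view`, `build_speed_view`, and `query` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical event shape: (timestamp, page) pairs.
master_dataset = [(1, "home"), (2, "docs"), (3, "home"), (4, "home")]
recent_events = [(5, "docs"), (6, "home")]  # arrived after the last batch run


def build_batch_view(events):
    """Batch layer: recompute page counts from the full master dataset."""
    return Counter(page for _, page in events)


def build_speed_view(events):
    """Speed layer: count only events the batch view does not yet cover."""
    view = Counter()
    for _, page in events:
        view[page] += 1  # incremental update, one event at a time
    return view


def query(page, batch_view, speed_view):
    """Serving layer: merge both views at query time."""
    return batch_view[page] + speed_view[page]


batch_view = build_batch_view(master_dataset)
speed_view = build_speed_view(recent_events)
home_count = query("home", batch_view, speed_view)
docs_count = query("docs", batch_view, speed_view)
```

Here the merge is a simple sum because counts are additive. Aggregations that are not additive, such as distinct counts, would need a different merge step in the serving layer.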
The cost
The big drawback is duplicated logic. The same aggregation must be written and maintained twice, once in batch and once in streaming, with different frameworks and subtle behavior differences.
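A small sketch of how the two implementations can drift apart. The events, the deduplication rule, and the names `batch_counts` and `StreamingCounts` are hypothetical; the point is only that one aggregation ends up written twice, and a detail handled in one copy is easy to miss in the other.

```python
# Hypothetical events: (event_id, user) pairs; event 2 was delivered twice.
events = [(1, "ann"), (2, "bob"), (2, "bob"), (3, "ann")]


def batch_counts(all_events):
    """Batch implementation: deduplicate by event_id, then count per user."""
    unique = {event_id: user for event_id, user in all_events}
    counts = {}
    for user in unique.values():
        counts[user] = counts.get(user, 0) + 1
    return counts


class StreamingCounts:
    """Streaming implementation of the same aggregation, written separately."""

    def __init__(self):
        self.counts = {}

    def on_event(self, event_id, user):
        # No deduplication here: the kind of detail that gets lost
        # when the logic is ported from one framework to another.
        self.counts[user] = self.counts.get(user, 0) + 1


stream = StreamingCounts()
for event_id, user in events:
    stream.on_event(event_id, user)

batch_result = batch_counts(events)
stream_result = stream.counts
```

With this input the batch result counts one event for `bob` while the streaming result counts two, even though both pieces of code are meant to compute the same thing.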
Lambda was popular before stream engines could reprocess history reliably. It offers eventual correctness: each batch run recomputes its views from the master dataset and replaces the speed layer's approximate results for the period it covers.
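The overwrite behaviour can be illustrated with a toy example. It assumes events are `(timestamp, value)` pairs, that each batch run covers everything up to a `cutoff` timestamp, and that the speed layer holds a slightly wrong value for the newest event; all of these names and numbers are invented for the sketch.

```python
def run_batch(master_dataset, cutoff):
    """Recompute the batch total from all events with timestamp <= cutoff."""
    return sum(value for ts, value in master_dataset if ts <= cutoff)


def speed_total(speed_events, cutoff):
    """Speed view counts only events newer than the latest batch cutoff."""
    return sum(value for ts, value in speed_events if ts > cutoff)


# The master dataset holds the correct values.
master_dataset = [(1, 10), (2, 20), (3, 30)]
# The speed layer saw event 3 with an approximate value (28 instead of 30).
speed_events = [(3, 28)]

# Before the batch layer catches up, the answer includes the approximation.
before = run_batch(master_dataset, cutoff=2) + speed_total(speed_events, cutoff=2)

# After the next batch run covers timestamp 3, the speed result is discarded.
after = run_batch(master_dataset, cutoff=3) + speed_total(speed_events, cutoff=3)
```

The query answer moves from 58 to the correct 60 once the batch run covers the third event, which is the sense in which the result is eventually correct.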
Key idea
Lambda architecture pairs an accurate batch layer with a fast speed layer and merges them at query time, accepting duplicated logic in exchange for both freshness and correctness.