Batch vs Streaming ETL

Choosing between processing data in scheduled chunks or as a continuous flow of events.

Two ways to move data

ETL stands for extract, transform, and load. The big question is when you run it.

  • Batch ETL collects data over a window such as an hour or a day, then processes it all at once on a schedule.
  • Streaming ETL processes each record, or a small micro-batch of records, as soon as it arrives, giving near-real-time results.
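
The two modes above can be sketched side by side. This is a minimal illustration, not a production pipeline: the `EVENTS` records and the function names `batch_etl` and `streaming_etl` are made up for this example. Both functions compute per-user totals; the only difference is when results become available.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input records: (user_id, amount). Not from the lesson.
EVENTS = [("alice", 10), ("bob", 5), ("alice", 7)]


def batch_etl(window: list[tuple[str, int]]) -> dict[str, int]:
    """Process a complete, bounded window in one pass and return final totals."""
    totals: dict[str, int] = defaultdict(int)
    for user, amount in window:
        totals[user] += amount
    return dict(totals)


def streaming_etl(events: Iterable[tuple[str, int]]) -> Iterator[tuple[str, int]]:
    """Update state for each record and emit that user's running total right away."""
    totals: dict[str, int] = defaultdict(int)
    for user, amount in events:
        totals[user] += amount
        yield user, totals[user]


# Batch: one answer, available only after the whole window has been collected.
batch_result = batch_etl(EVENTS)

# Streaming: one update per record, available as each record arrives.
stream_updates = list(streaming_etl(EVENTS))
```

Here `batch_result` holds one final total per user, while `stream_updates` holds an intermediate total after every event. The last streaming update for each user matches the batch total because this toy input has no late or duplicate records.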

Trade-offs

  • Latency: batch results are only as fresh as the last completed run, while streaming results are typically fresh within seconds.
  • Cost and simplicity: batch jobs are usually cheaper and easier to reason about because they see a complete, bounded dataset.
  • Complexity: streaming must handle late, out-of-order, and duplicate events, which adds engineering effort.
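
To make the complexity point concrete, here is a rough sketch of one common way a streaming job copes with duplicate, out-of-order, and late events: deduplicate by event ID, group by event time into fixed windows, and use a watermark to decide when a window is closed. The window size, the allowed lateness, the event shape, and the name `process_stream` are all assumptions for illustration. Real stream processors offer their own versions of these ideas.

```python
from collections import defaultdict

WINDOW_SECONDS = 60    # assumed fixed (tumbling) window size
ALLOWED_LATENESS = 30  # assumed grace period before a window is closed


def process_stream(events):
    """Sum values per event-time window.

    events: iterable of (event_id, event_time_seconds, value) in arrival order.
    Returns (window_sums, dropped_late_ids).
    """
    seen_ids = set()  # grows without bound here; a real job would expire old IDs
    window_sums = defaultdict(int)
    dropped_late = []
    max_event_time = 0

    for event_id, event_time, value in events:
        # Duplicates: skip any event ID we have already processed.
        if event_id in seen_ids:
            continue
        seen_ids.add(event_id)

        # Watermark: the newest event time seen so far, minus the grace period.
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS

        # Out-of-order: assign by event time, not arrival order.
        window_start = event_time - event_time % WINDOW_SECONDS
        window_end = window_start + WINDOW_SECONDS

        # Late: the window closed once the watermark passed its end.
        if window_end <= watermark:
            dropped_late.append(event_id)
            continue

        window_sums[window_start] += value

    return dict(window_sums), dropped_late


arrivals = [
    ("a", 10, 1),
    ("b", 70, 1),
    ("c", 50, 1),   # out of order, but still within the grace period
    ("b", 70, 1),   # duplicate delivery of "b"
    ("d", 130, 1),
    ("e", 20, 1),   # too late: its window has already closed
]
sums, dropped = process_stream(arrivals)
```

A batch job over the same six records would need none of the watermark logic: it would deduplicate and group the complete dataset in a single pass, and event "e" would be counted rather than dropped.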

When to use each

  • Use batch for daily reports, billing, and large historical recomputes where minutes or hours of delay are fine.
  • Use streaming for fraud detection, live dashboards, and alerting where stale data is useless.

Many teams run a hybrid approach: streaming for fresh signals and batch for accurate, reconciled totals.
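
One possible shape for the hybrid approach, sketched under assumptions: serve streaming totals right away, then overwrite them with batch totals once the batch job has reprocessed that period. The hour keys, the numbers, and the name `reconcile` are hypothetical.

```python
def reconcile(streaming_totals: dict[str, int], batch_totals: dict[str, int]) -> dict[str, int]:
    """Prefer batch totals where they exist; fall back to streaming elsewhere."""
    merged = dict(streaming_totals)  # start from the fresh but approximate values
    merged.update(batch_totals)      # overwrite with the reconciled values
    return merged


# Hypothetical hourly order counts.
streaming_totals = {"09:00": 98, "10:00": 120, "11:00": 45}  # may miss late events
batch_totals = {"09:00": 100, "10:00": 120}                  # batch has not reached 11:00 yet

served = reconcile(streaming_totals, batch_totals)
```

In this example the 09:00 figure is corrected from 98 to 100 by the batch run, while 11:00 still shows the streaming estimate until the next batch run covers it.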

Key idea

Batch trades freshness for simplicity, while streaming trades simplicity for low latency.

Check yourself

1. What is the main advantage of streaming ETL over batch?

2. Which workload best fits batch ETL?