Two ways to move data
ETL stands for extract, transform, and load. The key design choice is when the pipeline runs.
- Batch ETL collects data over a window such as an hour or a day, then processes it all at once on a schedule.
- Streaming ETL processes each record or small micro-batch the moment it arrives, giving near-real-time results.
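To make the difference concrete, here is a minimal Python sketch. It assumes each event carries an event timestamp in seconds and an amount, and it uses a hypothetical one-hour window (`WINDOW`). `batch_etl` sees the whole bounded dataset at once, while the hypothetical `StreamingETL` class updates a running total as each event arrives. Both compute the same per-window totals; what differs is when a result becomes available.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int        # event time in seconds (assumed field)
    amount: float  # value being aggregated (assumed field)


WINDOW = 3600  # hypothetical one-hour window


def batch_etl(events: list[Event]) -> dict[int, float]:
    """Process a complete, bounded set of events in one scheduled run."""
    totals: dict[int, float] = defaultdict(float)
    for e in events:
        totals[e.ts // WINDOW * WINDOW] += e.amount  # bucket by window start
    return dict(totals)


class StreamingETL:
    """Update running totals per event so a result exists immediately."""

    def __init__(self) -> None:
        self.totals: dict[int, float] = defaultdict(float)

    def on_event(self, e: Event) -> float:
        window_start = e.ts // WINDOW * WINDOW
        self.totals[window_start] += e.amount
        return self.totals[window_start]  # current value for this window, available now
```

Over the events `Event(10, 5.0)`, `Event(20, 7.0)`, and `Event(3700, 1.0)`, `batch_etl` returns `{0: 12.0, 3600: 1.0}` only after the run, while `StreamingETL.on_event` returns 5.0, 12.0, and 1.0 as each event arrives.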
Trade-offs
- Latency: batch results are only as fresh as the last run, so they can lag by up to a full window plus the job's runtime, while streaming results typically lag by seconds.
- Cost and simplicity: batch jobs are usually cheaper to run and easier to reason about because each run sees a complete, bounded dataset.
- Complexity: streaming must handle late, out-of-order, and duplicate events, which adds engineering effort.
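The extra streaming work can be sketched in plain Python. This is a hand-rolled illustration, not any framework's API: it borrows the idea of a watermark (an estimate of how far event time has progressed, used by systems such as Apache Flink and Apache Beam) and assumes hypothetical values for the window size and allowed lateness. Duplicates are dropped by `event_id`, out-of-order events are aggregated by event time rather than arrival order, and events that arrive after their window has closed are set aside instead of silently changing an already-emitted result.

```python
from collections import defaultdict

WINDOW = 60            # hypothetical one-minute event-time windows
ALLOWED_LATENESS = 30  # hypothetical: tolerate events up to 30 s behind the newest seen


class LateAwareStream:
    def __init__(self) -> None:
        self.seen_ids: set[str] = set()  # unbounded here; production systems bound this state
        self.open_windows: dict[int, float] = defaultdict(float)
        self.closed: dict[int, float] = {}  # finalized results
        self.watermark = float("-inf")
        self.dropped_late: list[tuple[str, int, float]] = []

    def on_event(self, event_id: str, ts: int, amount: float) -> None:
        # Duplicates: at-least-once delivery can replay an event, so ignore repeats.
        if event_id in self.seen_ids:
            return
        self.seen_ids.add(event_id)

        window_start = ts // WINDOW * WINDOW
        # Late events: the window already closed, so set the event aside.
        if window_start + WINDOW <= self.watermark:
            self.dropped_late.append((event_id, ts, amount))
            return

        # Out-of-order events: aggregate by event time, not arrival order.
        self.open_windows[window_start] += amount

        # Advance the watermark: newest event time minus allowed lateness.
        self.watermark = max(self.watermark, ts - ALLOWED_LATENESS)
        # Close every window that ends at or before the watermark.
        for start in sorted(self.open_windows):
            if start + WINDOW <= self.watermark:
                self.closed[start] = self.open_windows.pop(start)
```

Feeding events with timestamps 5, 50, a replay of 5, then 40, 100, and 30 ignores the replay, closes window 0 with a total of 7.0, and routes the event at 30 to `dropped_late`, because the watermark (100 - 30 = 70) has already passed the end of window 0.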
When to use each
- Use batch for daily reports, billing, and large historical recomputes where minutes or hours of delay are fine.
- Use streaming for fraud detection, live dashboards, and alerting, where data that is even a few minutes old may be too late to act on.
Many teams use a hybrid approach: streaming for fresh signals and batch for accurate, reconciled totals.
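One way the hybrid pattern might look, as a minimal sketch with hypothetical names: every event is appended to a raw log and immediately added to a live total, and a periodic batch run recomputes totals from the complete log, deduplicating by `event_id`, then overwrites the live view. The streaming path here deliberately skips deduplication to show how a redelivered event can inflate the fresh number until the batch run reconciles it.

```python
from collections import defaultdict


class HybridTotals:
    """Hypothetical hybrid: a fast streaming view, periodically replaced by a batch recompute."""

    def __init__(self) -> None:
        self.raw_log: list[tuple[str, str, float]] = []  # (event_id, account, amount)
        self.live: dict[str, float] = defaultdict(float)  # streaming view
        self.reconciled: dict[str, float] = {}            # batch view

    def on_event(self, event_id: str, account: str, amount: float) -> None:
        # Streaming path: keep the raw event for batch, update the live total now.
        # No deduplication here, so a redelivered event inflates the live total.
        self.raw_log.append((event_id, account, amount))
        self.live[account] += amount

    def run_batch(self) -> None:
        # Batch path: recompute from the complete, bounded log, counting each event_id once.
        totals: dict[str, float] = defaultdict(float)
        seen: set[str] = set()
        for event_id, account, amount in self.raw_log:
            if event_id in seen:
                continue
            seen.add(event_id)
            totals[account] += amount
        self.reconciled = dict(totals)
        # Replace the live view with the reconciled numbers so both agree again.
        self.live = defaultdict(float, self.reconciled)

    def total(self, account: str) -> float:
        return self.live.get(account, 0.0)
```

If event `t2` for a 5.0 payment is delivered twice after a 10.0 payment, `total("acct")` reads 20.0 until `run_batch()` runs, after which it reads the reconciled 15.0.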
Key idea
Batch trades freshness for simplicity, while streaming trades simplicity for low latency.