System Design

The Batch Processing Model

Processing large bounded datasets in scheduled jobs that read everything, compute, and write results.


What batch processing is

Batch processing runs computation over a bounded dataset that is fully available before the job starts. A scheduler kicks off the job, which reads the whole input, transforms it, and writes the output. Classic examples are nightly billing runs, daily report rollups, and ETL pipelines.
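
The read, transform, write shape described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the function name `run_batch_job`, the `customer` and `amount` columns, and the sample data are all assumptions made up for the example, and in-memory text buffers stand in for real input and output files.

```python
import csv
import io
from collections import defaultdict


def run_batch_job(input_file, output_file):
    """Read the whole bounded input, total amounts per customer, write results."""
    totals = defaultdict(float)

    # Read: the input is complete before the job starts, so one pass sees everything.
    for row in csv.DictReader(input_file):
        # Transform: accumulate a per-customer total.
        totals[row["customer"]] += float(row["amount"])

    # Write: emit one output row per customer, in a stable (sorted) order.
    writer = csv.writer(output_file, lineterminator="\n")
    writer.writerow(["customer", "total"])
    for customer in sorted(totals):
        writer.writerow([customer, f"{totals[customer]:.2f}"])


# Hypothetical sample input standing in for a day's worth of billing records.
SAMPLE_INPUT = "customer,amount\nalice,10.00\nbob,5.50\nalice,2.25\n"


def demo():
    output = io.StringIO()
    run_batch_job(io.StringIO(SAMPLE_INPUT), output)
    return output.getvalue()


if __name__ == "__main__":
    print(demo())
```

In a real deployment a scheduler such as cron would invoke the script, and the buffers would be replaced by files or tables.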

Core properties

  • High throughput is the goal, not low latency. A job can run for minutes or hours.
  • Bounded input means the system knows where the data ends, so it can compute exact totals.
  • Reproducibility is easy because the input is fixed. Rerunning a deterministic job on the same data gives the same result.
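
The bounded-input and reproducibility properties can be checked directly. The sketch below assumes a deterministic job (no randomness, no dependence on the wall clock); the `daily_rollup` and `fingerprint` helpers and the sample events are hypothetical names chosen for illustration.

```python
import hashlib
import json


def daily_rollup(events):
    """Sum units per day. The input is bounded, so the totals are exact."""
    totals = {}
    for event in events:
        totals[event["day"]] = totals.get(event["day"], 0) + event["units"]
    return totals


def fingerprint(result):
    """Hash a result so two runs can be compared cheaply."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical fixed input: the same data is handed to both runs.
EVENTS = [
    {"day": "2024-01-01", "units": 2},
    {"day": "2024-01-02", "units": 7},
    {"day": "2024-01-01", "units": 3},
]

first_run = daily_rollup(EVENTS)
second_run = daily_rollup(EVENTS)

# Same input and a deterministic job give identical output.
runs_match = fingerprint(first_run) == fingerprint(second_run)
```

Comparing fingerprints like this is one way to confirm that a rerun after a failure produced the same output as the original run would have.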

The trade-off

Batch jobs are simple and reliable, but they add latency between when an event happens and when its effect appears. A sale at noon may not show in a report until the next morning.
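
The delay in the noon-sale example can be worked out with simple date arithmetic. The schedule here is an assumption for illustration only: a nightly job that starts at 02:00 and takes 30 minutes. The function names are likewise hypothetical.

```python
from datetime import datetime, time, timedelta


def next_run_after(event_time, run_at=time(2, 0)):
    """Return the first scheduled start strictly after the event (assumed nightly at 02:00)."""
    candidate = datetime.combine(event_time.date(), run_at)
    if candidate <= event_time:
        # Today's run has already started, so the event waits for tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def visibility_delay(event_time, job_duration=timedelta(minutes=30)):
    """Time from the event until the job that includes it has finished."""
    finished_at = next_run_after(event_time) + job_duration
    return finished_at - event_time


# A sale at noon is picked up by the 02:00 run on the following day.
sale = datetime(2024, 3, 1, 12, 0)
delay = visibility_delay(sale)  # 14 hours 30 minutes under these assumptions
```

Under these assumptions the worst case is an event that lands just after a run starts, which waits almost a full day plus the job duration.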

Batch is ideal when freshness can wait and correctness over complete data matters most, as in finance and analytics.

Key idea

Batch processing trades freshness for simplicity and exactness by computing over a complete bounded dataset in scheduled runs.

Check yourself


1. What defines the input to a batch job?

2. What does batch processing trade away?