Batch vs Real Time Inference

Precompute predictions in bulk or score each request live, and the tradeoffs between them.

Two serving styles

Batch inference computes predictions for many items on a schedule and stores the results. When a request arrives, you look up the precomputed answer. Real time inference computes a prediction on demand for each request as it arrives.
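
A minimal sketch of the two styles side by side may help. Everything named here is hypothetical: `toy_model` stands in for a trained model, the feature names are invented, and a plain dictionary stands in for the prediction store.

```python
from typing import Callable, Dict

Features = Dict[str, float]
Model = Callable[[Features], float]


def toy_model(features: Features) -> float:
    """Illustrative scoring rule (an assumption, not a real model)."""
    return 0.5 * features["recent_views"] + 0.1 * features["account_age_days"]


def run_batch_job(model: Model, population: Dict[str, Features]) -> Dict[str, float]:
    """Batch: score every known item in one pass and store the results."""
    return {item_id: model(features) for item_id, features in population.items()}


def serve_batch(store: Dict[str, float], item_id: str) -> float:
    """Batch serving is a key lookup; the model is not called at request time."""
    return store[item_id]


def serve_real_time(model: Model, features: Features) -> float:
    """Real time: run the model on the features that arrive with the request."""
    return model(features)


# Scheduled step (for example, nightly): precompute for the known population.
population = {"u1": {"recent_views": 4.0, "account_age_days": 10.0}}
store = run_batch_job(toy_model, population)

# Request time: batch looks up an answer, real time computes one.
batch_score = serve_batch(store, "u1")
live_score = serve_real_time(toy_model, {"recent_views": 9.0, "account_age_days": 10.0})
```

In this sketch the batch answer for `u1` still reflects four recent views, while the live call sees nine. That gap is the staleness the lesson describes.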

When batch fits

  • Predictions do not need to reflect the latest event, like a daily product recommendation
  • Inputs are known in advance, so you can score the whole population overnight
  • You want simple, cheap serving that is just a key lookup

When real time fits

  • Inputs are only known at request time, like a fraud check on a new transaction
  • Freshness matters, so a stale prediction would be wrong
  • The space of possible inputs is too large to precompute
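
To see why a large input space rules out precomputing, a rough count can help. The feature names, cardinalities, and throughput below are made-up assumptions chosen only to illustrate the arithmetic.

```python
import math
from typing import Dict, Tuple

SECONDS_PER_DAY = 86_400


def precompute_cost(cardinalities: Dict[str, int],
                    predictions_per_second: float) -> Tuple[int, float]:
    """Return (number of distinct inputs, days needed to score them all once)."""
    combinations = math.prod(cardinalities.values())
    days = combinations / predictions_per_second / SECONDS_PER_DAY
    return combinations, days


# Hypothetical discretized features of a transaction.
cardinalities = {
    "merchant": 50_000,
    "amount_bucket": 1_000,
    "country": 200,
    "hour_of_day": 24,
}

combinations, days = precompute_cost(cardinalities, predictions_per_second=10_000)
print(f"{combinations:,} combinations, about {days:.0f} days to score once")
```

Under these assumptions there are 240 billion combinations, and one full pass would take most of a year. Scoring only the inputs that actually arrive is the practical option.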

The tradeoffs

Batch is cheaper and simpler, but its predictions can be as stale as the time since the last scheduled run, and they cover only inputs that can be precomputed. Real time is fresh and flexible but needs a low latency service, careful scaling, and tight monitoring. Many systems blend both, precomputing what they can and scoring live only when needed.
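
One way to blend the two is to serve the stored prediction when it exists and is fresh enough, and to fall back to live scoring otherwise. The sketch below assumes a dictionary as the store; `StoredPrediction`, `hybrid_predict`, and `max_age_seconds` are hypothetical names, and the current time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Features = Dict[str, float]


@dataclass
class StoredPrediction:
    score: float
    computed_at: float  # seconds since epoch when the batch job wrote it


def hybrid_predict(
    store: Dict[str, StoredPrediction],
    model: Callable[[Features], float],
    item_id: str,
    features: Features,
    now: float,
    max_age_seconds: float,
) -> Tuple[float, str]:
    """Return (score, source), where source is 'batch' or 'real_time'."""
    entry = store.get(item_id)
    # Use the precomputed answer only if it exists and is not too old.
    if entry is not None and now - entry.computed_at <= max_age_seconds:
        return entry.score, "batch"
    # Otherwise pay the cost of a live model call.
    return model(features), "real_time"


store = {"u1": StoredPrediction(score=0.9, computed_at=1_000.0)}
model = lambda features: features["x"] * 2.0  # illustrative stand-in model

fresh = hybrid_predict(store, model, "u1", {"x": 1.0}, now=1_500.0, max_age_seconds=3_600.0)
stale = hybrid_predict(store, model, "u1", {"x": 1.0}, now=10_000.0, max_age_seconds=3_600.0)
unseen = hybrid_predict(store, model, "u2", {"x": 1.0}, now=1_500.0, max_age_seconds=3_600.0)
```

Here `fresh` comes from the store, while `stale` and `unseen` both go to the model. The age threshold is a design choice: a larger value saves compute at the price of older answers.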

Key idea

Batch precomputes predictions cheaply for known inputs; real time scores live when freshness or unknown inputs demand it.

Check yourself

1. When is real time inference the right choice?

2. What is a key drawback of batch inference?