Machine Learning

Batch versus Real Time Inference

Precomputing predictions in bulk versus computing them on demand per request.

Two serving modes

You can compute predictions ahead of time or at request time. The right choice depends on freshness needs and the input space.

Batch inference

Compute predictions on a schedule and store them.

  • Good when: inputs are known and change slowly, such as daily user scores
  • Pros: simple serving, predictable cost, no latency pressure
  • Cons: stale between runs, wasteful if most predictions go unused
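
To make the batch pattern concrete, here is a minimal sketch. It assumes a toy weighted-sum model and an in-memory dict as the prediction store; the names `score`, `run_batch_job`, and `serve` are hypothetical and stand in for a real model, a scheduled job, and a key-value lookup.

```python
from typing import Dict

def score(features: Dict[str, float]) -> float:
    """Hypothetical stand-in model: a fixed weighted sum of two features."""
    return 0.7 * features["activity"] + 0.3 * features["tenure"]

def run_batch_job(users: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Score every known user in one pass. Meant to run on a schedule."""
    return {user_id: score(features) for user_id, features in users.items()}

def serve(store: Dict[str, float], user_id: str, default: float = 0.0) -> float:
    """Serving is only a lookup; no model runs at request time."""
    return store.get(user_id, default)

# Assumed example data, not taken from any real system.
users = {
    "alice": {"activity": 0.5, "tenure": 1.0},
    "bob": {"activity": 1.0, "tenure": 0.0},
}
store = run_batch_job(users)
print(serve(store, "alice"))
print(serve(store, "carol"))  # unknown user falls back to the default
```

Note how both drawbacks show up here: a user who appears after the job ran gets only the default until the next run, and every user is scored whether or not anyone asks for the result.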

Real time inference

Compute on demand when the request arrives.

  • Good when: inputs are fresh or the input space is huge, such as search
  • Pros: always current, only computes what is needed
  • Cons: tight latency budget, harder infrastructure
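
The on-demand pattern can be sketched in the same spirit. This assumes a toy term-overlap scorer in place of a real model, and the 50 ms budget is an arbitrary illustrative value; `score_query` and `handle_request` are hypothetical names.

```python
import time
from typing import List, Tuple

def score_query(query: str, doc: str) -> float:
    """Hypothetical stand-in model: fraction of query terms found in the document."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    doc_terms = set(doc.lower().split())
    return sum(term in doc_terms for term in terms) / len(terms)

def handle_request(
    query: str, docs: List[str], budget_ms: float = 50.0
) -> Tuple[List[str], bool]:
    """Score at request time and report whether the latency budget was met."""
    start = time.perf_counter()
    ranked = sorted(docs, key=lambda doc: score_query(query, doc), reverse=True)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ranked, elapsed_ms <= budget_ms

docs = ["real time serving", "batch inference jobs", "batch scoring"]
ranked, within_budget = handle_request("batch inference", docs)
print(ranked[0], within_budget)
```

Queries cannot be enumerated ahead of time, so nothing is precomputed. The cost is that the model now sits on the request path, which is why the sketch measures elapsed time against a budget.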

Hybrid

Many systems precompute heavy embeddings in batch, then run a light real-time pass that combines them with fresh context. This captures most of the freshness benefit at a fraction of the cost of running the full model on every request.
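
A minimal sketch of that split, assuming two-dimensional item embeddings already produced by an offline job and a per-request session vector as the fresh context. The dot-product scoring and all names here are illustrative assumptions, not a specific system's design.

```python
from typing import Dict, List

# Batch step (assumed): an expensive model wrote these embeddings offline.
ITEM_EMBEDDINGS: Dict[str, List[float]] = {
    "item_a": [1.0, 0.0],
    "item_b": [0.0, 1.0],
}

def dot(a: List[float], b: List[float]) -> float:
    """Cheap combination step that runs at request time."""
    return sum(x * y for x, y in zip(a, b))

def rank_items(
    session_vector: List[float], embeddings: Dict[str, List[float]]
) -> List[str]:
    """Rank precomputed items against fresh per-request context."""
    return sorted(
        embeddings,
        key=lambda item: dot(session_vector, embeddings[item]),
        reverse=True,
    )

# Fresh context arrives with the request; only the dot products run online.
print(rank_items([0.2, 0.9], ITEM_EMBEDDINGS))
```

The heavy work stays in the scheduled job, while the request path does only a handful of multiplications, so the ranking still reacts to what the user is doing right now.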

Key idea

Use batch when inputs are known and slow-changing; serve in real time when freshness or a vast input space demands it. Hybrids often win.

Check yourself

1. When is batch inference a strong fit?

2. What does a common hybrid approach do?