Concurrency

Latency Versus Throughput in Scheduling

The core tension between fast individual responses and high total work done.

Two goals that pull apart

A scheduler can optimize for latency, how quickly a single request finishes, or for throughput, how much total work completes per second. These goals often conflict, and tuning for one can hurt the other.

Why they conflict

  • Batching boosts throughput by amortizing fixed costs over many items, but the first item must wait for the rest of its batch, which raises latency.
  • Frequent preemption lowers latency for urgent tasks because they get the CPU sooner, but every context switch spends time on no useful work, which reduces total throughput.
  • Large time slices improve throughput by switching less, yet they make other waiting tasks wait longer for their turn.
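
The batching and time slice effects above can be made concrete with a small cost model. This is a minimal sketch under assumed numbers: `fixed_cost_ms`, `per_item_ms`, and `switch_cost_ms` are hypothetical parameters, not measurements from a real system. It also assumes a batch completes as a unit, so the first item finishes only when the whole batch does.

```python
def batch_metrics(batch_size, fixed_cost_ms=10.0, per_item_ms=1.0):
    """Return (throughput in items/s, latency in ms) for one batch.

    Assumes the batch is processed as a unit, so every item,
    including the first, finishes only when the batch finishes.
    """
    batch_time_ms = fixed_cost_ms + per_item_ms * batch_size
    throughput = batch_size / batch_time_ms * 1000.0
    return throughput, batch_time_ms


def useful_fraction(slice_ms, switch_cost_ms=0.1):
    """Fraction of CPU time spent on real work when each time slice
    is followed by one context switch (hypothetical switch cost)."""
    return slice_ms / (slice_ms + switch_cost_ms)


if __name__ == "__main__":
    for n in (1, 10, 100):
        tput, latency = batch_metrics(n)
        print(f"batch={n:>3}  throughput={tput:7.1f} items/s  latency={latency:6.1f} ms")
    for s in (0.5, 1.0, 10.0):
        print(f"slice={s:>4} ms  useful work={useful_fraction(s):.1%}")
```

In this model, larger batches raise both throughput and latency, while longer slices raise the useful fraction of CPU time at the cost of longer waits for everyone else.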

The queueing view

Latency grows sharply as a system approaches full utilization. Pushing a server to 100 percent busy maximizes throughput, but it causes queues, and with them waiting time, to explode. Latency-sensitive systems deliberately run with headroom, leaving capacity idle so bursts do not pile up.
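
One common way to quantify this is the M/M/1 queue, a textbook model with a single server, Poisson arrivals, and exponentially distributed service times. The lesson does not name a model, so treating the server as M/M/1 is an assumption made here for illustration. Under it, the mean time a request spends in the system is W = S / (1 - ρ), where S is the mean service time and ρ is the utilization. The 10 ms service time below is a hypothetical value.

```python
def mean_response_ms(utilization, service_ms=10.0):
    """Mean time in system for an M/M/1 queue: W = S / (1 - rho).

    `utilization` is rho (arrival rate / service rate) and must be
    below 1, otherwise the queue grows without bound.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


if __name__ == "__main__":
    for rho in (0.5, 0.8, 0.9, 0.99):
        print(f"utilization={rho:.2f}  mean response={mean_response_ms(rho):8.1f} ms")
```

Under this model, mean response time is twice the service time at 50 percent utilization, ten times at 90 percent, and a hundred times at 99 percent, which is why headroom buys so much latency.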

Picking a point

There is no single best answer. A batch analytics job favors throughput and tolerates delay. An interactive service favors latency and accepts lower peak utilization. Good systems make this trade-off explicit rather than stumbling into it.
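
One way to make the trade-off explicit is to state a latency budget per workload and derive the batch size from it. The sketch below is an illustration, not a prescribed method: the cost model (a fixed cost per batch plus a cost per item), the function name `max_batch_size`, and both budgets are hypothetical.

```python
import math


def max_batch_size(latency_budget_ms, fixed_cost_ms=10.0, per_item_ms=1.0):
    """Largest batch whose completion time fits the latency budget.

    Assumes batch time = fixed_cost_ms + per_item_ms * batch_size.
    Returns 0 when even a single item would exceed the budget.
    """
    room_ms = latency_budget_ms - fixed_cost_ms
    return max(0, math.floor(room_ms / per_item_ms))


if __name__ == "__main__":
    # Hypothetical budgets: tight for interactive work, loose for analytics.
    budgets_ms = {"interactive service": 20.0, "batch analytics": 5000.0}
    for name, budget in budgets_ms.items():
        print(f"{name}: budget={budget} ms -> batch size {max_batch_size(budget)}")
```

The interactive service ends up with small batches and lower throughput, while the analytics job gets large batches and tolerates the delay, and in both cases the choice is written down rather than accidental.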

Key idea

Latency and throughput pull against each other: batching and high utilization raise throughput but inflate waiting time. Systems therefore choose a point, either running with headroom for fast response or at full load for maximum work.

Check yourself

1. How does batching affect the two goals?

2. Why do latency-sensitive systems run with headroom?

3. Why does frequent preemption hurt throughput?