Latency Versus Throughput in Scheduling

Two goals that pull apart

A scheduler can optimize for latency, how quickly a single request finishes, or for throughput, how much total work completes per second. These goals often conflict, and tuning for one can hurt the other.

Why they conflict

Batching boosts throughput by amortizing fixed costs over many items, but it makes the first item wait, raising latency.
Frequent preemption lowers latency for urgent tasks, but every extra context switch is time not spent on real work, so total throughput drops.
Large time slices improve throughput by switching less, yet they delay other waiting tasks.
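The three effects above can be made concrete with two toy models. This is a sketch under stated assumptions, not a description of any real scheduler: `batch_metrics` and `round_robin` are hypothetical helpers, the batching cost model (a fixed overhead per batch plus a per-item cost, with evenly spaced arrivals) is invented for illustration, and times are in abstract integer units.

```python
from collections import deque


def batch_metrics(batch_size, fixed_cost, per_item_cost, arrival_gap):
    """Toy batching model (hypothetical, for illustration only).

    Each batch pays `fixed_cost` once plus `per_item_cost` per item, and
    items arrive one every `arrival_gap` time units. Returns
    (peak throughput while busy, latency of the first item in a batch).
    """
    service_time = fixed_cost + batch_size * per_item_cost
    throughput = batch_size / service_time
    # The first item waits for the other batch_size - 1 items to arrive,
    # then for the whole batch to be processed.
    first_item_latency = (batch_size - 1) * arrival_gap + service_time
    return throughput, first_item_latency


def round_robin(burst_times, quantum, switch_cost):
    """Round-robin over jobs that are all ready at time 0.

    Integer time units avoid floating-point leftovers in `remaining`.
    A context switch costing `switch_cost` is charged only when the CPU
    moves to a different job. Returns (finish time per job, total time).
    """
    remaining = list(burst_times)
    ready = deque(range(len(burst_times)))
    finish = [0] * len(burst_times)
    clock = 0
    previous = None
    while ready:
        job = ready.popleft()
        if previous is not None and previous != job:
            clock += switch_cost
        run = min(quantum, remaining[job])
        clock += run
        remaining[job] -= run
        if remaining[job] > 0:
            ready.append(job)  # preempted: back of the queue
        else:
            finish[job] = clock
        previous = job
    return finish, clock


# Batching: larger batches raise throughput but delay the first item.
for size in (1, 10):
    print(size, batch_metrics(size, fixed_cost=10, per_item_cost=1, arrival_gap=2))

# Preemption: a long job (100) and a short job (10), switch cost 1.
for q in (5, 100):
    print(q, round_robin([100, 10], quantum=q, switch_cost=1))
```

With these made-up numbers, a batch of 10 delivers 0.5 items per time unit versus 1/11 for a batch of 1, but its first item waits 38 units instead of 11. A quantum of 5 finishes the short job at time 23 instead of 111, while the four extra context switches stretch total completion from 111 to 114.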

The queueing view

Latency grows sharply as a system approaches full utilization. Pushing a server to one hundred percent busy maximizes throughput, but it causes queues, and with them waiting time, to explode. Latency-sensitive systems deliberately run with headroom, leaving some capacity idle so that bursts of requests do not pile up.
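One way to see the blow-up is the textbook M/M/1 queue (a single server with Poisson arrivals and exponentially distributed service times), where mean time in the system is W = S / (1 - rho) for mean service time S and utilization rho. Real workloads rarely match these assumptions exactly, so treat the sketch below as illustrating the shape of the curve rather than predicting any particular system; `mm1_time_in_system` is a hypothetical helper name.

```python
def mm1_time_in_system(service_time, utilization):
    """Mean time a request spends in an M/M/1 queue (waiting + service).

    Uses W = S / (1 - rho), which only holds for 0 <= rho < 1; at
    rho = 1 the queue grows without bound.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


# Assumed 10 ms mean service time.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    latency = mm1_time_in_system(10.0, rho)
    print(f"utilization {rho:.0%}: mean latency {latency:.0f} ms")
```

Under this model, mean latency is twice the service time at 50 percent utilization, ten times at 90 percent, and a hundred times at 99 percent, which is why the last few points of utilization are so expensive.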

Picking a point

There is no single best answer. A batch analytics job favors throughput and tolerates delay. An interactive service favors latency and accepts lower peak utilization. Good systems make this trade-off explicit rather than stumbling into it.
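Making the trade-off explicit can be as simple as turning a latency target into a utilization ceiling. The sketch below again assumes the M/M/1 model and a mean-latency target (production services usually target tail percentiles, which this ignores); solving S / (1 - rho) <= T gives rho <= 1 - S / T. The function name `max_safe_load` and the numbers are hypothetical.

```python
def max_safe_load(service_time, latency_target):
    """Highest utilization and arrival rate whose M/M/1 mean latency
    stays within `latency_target` (same time unit as `service_time`).

    From W = S / (1 - rho) <= T it follows that rho <= 1 - S / T.
    """
    if latency_target <= service_time:
        raise ValueError("target must exceed the bare service time")
    rho_max = 1 - service_time / latency_target
    arrival_rate = rho_max / service_time  # requests per time unit
    return rho_max, arrival_rate


# Assumed 10 ms service time, expressed in seconds.
interactive = max_safe_load(service_time=0.010, latency_target=0.020)
analytics = max_safe_load(service_time=0.010, latency_target=1.0)
print("interactive:", interactive)
print("analytics:", analytics)
```

A tight 20 ms target caps the interactive service at 50 percent utilization (about 50 requests per second), while a loose one-second target lets the analytics job run at 99 percent.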

Key idea

Latency and throughput pull against each other: batching and high utilization raise throughput but inflate waiting time. Systems therefore choose a point, keeping headroom for fast response or running at full load for maximum work.
