Machine Learning

Batch Size and GPU Utilization

How batch size fills the parallel hardware and where the trade-offs lie.

4 min read · core

Filling the machine

A GPU reaches its peak throughput only when it is given enough parallel work. Batch size is the most direct lever: processing many samples at once gives the hardware scheduler enough warps (groups of threads that execute together) to keep every streaming multiprocessor (SM) busy.

Too small versus too large

  • A tiny batch leaves cores idle: there is too little parallel work to hide kernel launch overhead and memory latency, so much of the device goes unused.
  • A large batch raises utilization and throughput, but uses more memory and can hurt model accuracy in training or per-request latency in inference.

There is a point of diminishing returns where utilization saturates and bigger batches only add memory pressure.
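The memory side of this trade-off can be estimated roughly. The sketch below assumes a simple linear footprint: a fixed cost (weights and similar state) plus a per-sample activation cost. The function name and all numbers are hypothetical and only illustrate the arithmetic; real footprints depend on the model and framework.

```python
def max_batch_that_fits(device_bytes: int, fixed_bytes: int, per_sample_bytes: int) -> int:
    """Largest batch size whose estimated footprint fits in device memory.

    Assumes footprint = fixed_bytes + batch_size * per_sample_bytes,
    which is a simplification, not a measurement.
    """
    if per_sample_bytes <= 0:
        raise ValueError("per_sample_bytes must be positive")
    free = device_bytes - fixed_bytes
    # If the fixed cost alone exceeds the device, no batch fits.
    return max(free // per_sample_bytes, 0)


GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical numbers: 16 GiB device, 4 GiB fixed cost, 48 MiB per sample.
limit = max_batch_that_fits(16 * GIB, 4 * GIB, 48 * MIB)
print(limit)  # 256
```

Under these assumed numbers the memory ceiling is a batch of 256. If throughput has already saturated well below that size, going larger buys nothing.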

The throughput curve

Throughput rises with batch size, then flattens once the GPU is saturated.
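One way to see why the curve has this shape is a toy cost model, sketched below. It assumes each step costs a fixed overhead plus a per-sample cost; both constants are made-up values, not measurements, and real GPUs behave less smoothly.

```python
def throughput(batch_size: int, fixed_s: float, per_sample_s: float) -> float:
    """Samples per second under a toy model: step_time = fixed_s + batch_size * per_sample_s."""
    return batch_size / (fixed_s + batch_size * per_sample_s)


FIXED_S = 1e-3        # hypothetical fixed cost per step (launch overhead, latency)
PER_SAMPLE_S = 1e-5   # hypothetical cost per sample once the GPU is busy

for b in (1, 8, 64, 512, 4096):
    rate = throughput(b, FIXED_S, PER_SAMPLE_S)
    print(f"batch={b:5d}  throughput={rate:9.0f} samples/s")

# The model can never exceed this ceiling, however large the batch.
print(f"ceiling = {1 / PER_SAMPLE_S:.0f} samples/s")
```

In this model small batches are dominated by the fixed cost, so throughput grows almost in proportion to batch size. Large batches are dominated by the per-sample cost, so throughput approaches the ceiling and flattens.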

Choosing a batch size

In training, larger batches improve hardware efficiency but may need learning-rate tuning. In inference, batching boosts throughput at the cost of per-request latency, so serving systems balance the two carefully.
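For inference, one simple way to strike that balance is to pick the largest batch size that still meets a latency budget. The sketch below assumes you have already measured latency at a few batch sizes; the function name, the measurements, and the budget are all hypothetical.

```python
from typing import Dict, Optional


def largest_batch_within_budget(latency_ms_by_batch: Dict[int, float],
                                budget_ms: float) -> Optional[int]:
    """Return the largest batch size whose measured latency fits the budget, or None."""
    ok = [b for b, ms in latency_ms_by_batch.items() if ms <= budget_ms]
    return max(ok) if ok else None


# Hypothetical measurements: batch size -> latency in milliseconds.
measured = {1: 4.0, 4: 5.5, 16: 11.0, 64: 38.0}

print(largest_batch_within_budget(measured, budget_ms=20.0))  # 16
print(largest_batch_within_budget(measured, budget_ms=1.0))   # None
```

Real serving systems usually also cap how long a request may wait for a batch to fill, which this sketch leaves out.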

Key idea

Batch size controls how fully the GPU is used: larger batches raise utilization and throughput up to saturation, trading memory and latency for efficiency.

Check yourself


1. Why can a very small batch waste a GPU?

2. What happens to throughput as batch size grows past saturation?