Machine Learning

GPU Versus CPU Inference Tradeoffs

When parallel GPU power beats cheap, flexible CPU serving.

Different shapes of compute

A CPU has a few powerful general-purpose cores. A GPU has thousands of small cores built to apply the same operation across huge arrays at once. Neural network inference is dominated by large matrix multiplications, which is exactly the kind of work the GPU shape is built for.
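To make "the same operation across huge arrays" concrete, here is a minimal pure-Python sketch of one dense layer. The names `dense_forward` and `multiply_adds` are hypothetical helpers written for this lesson, not part of any library, and real serving stacks use optimized kernels rather than loops like these.

```python
def dense_forward(x, w):
    """Multiply a batch of inputs by a weight matrix.

    x: list of rows, shape (batch, d_in)
    w: list of rows, shape (d_in, d_out)
    Returns a list of rows, shape (batch, d_out).
    """
    d_in = len(w)
    d_out = len(w[0])
    # Every output element is the same recipe: a sum of multiply-adds.
    # No element depends on another, so all of them could run in parallel.
    return [
        [sum(row[k] * w[k][j] for k in range(d_in)) for j in range(d_out)]
        for row in x
    ]


def multiply_adds(batch, d_in, d_out):
    """Count the multiply-add operations in one dense layer."""
    return batch * d_in * d_out


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2], [0, 1, 3]]
    print(dense_forward(x, w))            # [[1, 2, 8], [3, 4, 18]]
    print(multiply_adds(32, 4096, 4096))  # 536870912
```

A CPU works through those independent multiply-adds a few at a time per core, while a GPU spreads them across its many small cores.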

Where GPUs win

  • Large models and big batches keep thousands of cores busy.
  • High memory bandwidth feeds the math fast.
  • Throughput per dollar is strong when the GPU stays full.
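One way to see why big batches help is to count how much math is done per byte of weights read from memory. The sketch below is a simplified estimate for a single dense layer, assuming 2-byte weights and ignoring activation traffic and caches; the function names are hypothetical.

```python
def weight_bytes(d_in, d_out, bytes_per_weight=2):
    """Bytes needed to read the layer's weights once (2 bytes assumes 16-bit weights)."""
    return d_in * d_out * bytes_per_weight


def flops_per_weight_byte(batch, d_in, d_out, bytes_per_weight=2):
    """Floating-point operations performed per byte of weights read.

    Each multiply-add counts as 2 operations. The weights are read once
    and reused for every item in the batch, so this ratio grows with batch.
    """
    flops = 2 * batch * d_in * d_out
    return flops / weight_bytes(d_in, d_out, bytes_per_weight)


if __name__ == "__main__":
    for batch in (1, 8, 64):
        print(batch, flops_per_weight_byte(batch, 4096, 4096))
```

Under these assumptions a batch of 1 does about one operation per byte read, so the cores mostly wait on memory. A batch of 64 does about 64 operations per byte, which is much closer to keeping the cores busy.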

Where CPUs win

  • Small models or low traffic where a GPU would sit mostly idle.
  • Tasks with lots of branching logic that GPUs handle poorly.
  • Avoiding the cost and scarcity of GPUs entirely.

The utilization rule

A GPU is only cheap if it is busy. You pay for the whole device whether it runs at 5% or 95% load, so a lightly loaded GPU wastes an expensive resource. Low-volume services are therefore often cheaper, and for small models sometimes faster, on CPUs. High-volume or large-model services favor GPUs because they keep the hardware full.
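The utilization rule can be sketched as simple arithmetic. Every number below is a made-up assumption for illustration (hourly prices and peak requests per second are not measurements of any real hardware), and `cost_per_1k_requests` is a hypothetical helper.

```python
import math


def cost_per_1k_requests(hourly_price, peak_rps, actual_rps):
    """Cost of serving 1,000 requests at a steady traffic level.

    hourly_price: price of one instance per hour
    peak_rps:     requests per second one instance can handle when full
    actual_rps:   requests per second the service actually receives
    """
    if actual_rps <= 0 or peak_rps <= 0:
        raise ValueError("traffic and capacity must be positive")
    # You pay for whole instances, even if the last one is mostly idle.
    instances = math.ceil(actual_rps / peak_rps)
    requests_per_hour = actual_rps * 3600
    return instances * hourly_price / requests_per_hour * 1000


if __name__ == "__main__":
    # Hypothetical figures: a CPU box at $0.20/h handling 20 rps,
    # a GPU box at $2.00/h handling 1000 rps.
    for rps in (5, 500):
        cpu = cost_per_1k_requests(0.20, 20, rps)
        gpu = cost_per_1k_requests(2.00, 1000, rps)
        print(f"{rps} rps: CPU ${cpu:.4f}  GPU ${gpu:.4f} per 1k requests")
```

With these assumed figures the CPU is roughly ten times cheaper per request at 5 requests per second, because the GPU sits almost entirely idle. At 500 requests per second the service needs 25 CPU boxes but still only one GPU, and the GPU becomes the cheaper option.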

Key idea

GPUs win when large models or heavy traffic keep their many cores full; CPUs win for small models and low traffic where a GPU would idle. Match the hardware to model size and request volume.

Check yourself

1. Why might a low-traffic service be cheaper to run on a CPU?

2. What property of GPUs makes them strong for inference?