Different shapes of compute
A CPU has a few powerful general-purpose cores. A GPU has thousands of small cores built to apply the same operation across large arrays in parallel. Neural network inference is dominated by large matrix multiplications, which fit the GPU shape.
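To make "large matrix math" concrete, here is a deliberately naive sketch of one fully connected (dense) layer, y = xW + b, applied to a batch of inputs in plain Python. The name `dense_layer` and the list-of-lists layout are illustrative assumptions, not any framework's API. The point is the loop structure: every output element is an independent dot product built from the same multiply-add, which is the kind of work a GPU can spread across thousands of cores while a CPU works through it on a few.

```python
def dense_layer(x_batch, weights, bias):
    """Forward pass of one fully connected layer: y = x @ W + b.

    x_batch: list of B input rows, each of length d_in
    weights: d_in x d_out matrix stored as a list of rows
    bias:    list of length d_out
    """
    d_in = len(weights)
    d_out = len(bias)
    out = []
    for x in x_batch:              # each row in the batch is independent
        row = []
        for j in range(d_out):     # each output element is independent too
            # One dot product: the same multiply-add repeated d_in times.
            acc = bias[j]
            for k in range(d_in):
                acc += x[k] * weights[k][j]
            row.append(acc)
        out.append(row)
    return out


x = [[1.0, 2.0], [3.0, 4.0]]            # batch of 2 inputs, d_in = 2
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # d_in = 2, d_out = 3
b = [0.5, 0.5, 0.5]
print(dense_layer(x, W, b))  # [[1.5, 2.5, 4.5], [3.5, 4.5, 10.5]]
```

Real frameworks hand this to tuned matrix-multiply libraries rather than Python loops, but the independence of the outputs is what lets those libraries run them in parallel.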
Where GPUs win
- Large models and big batches keep thousands of cores busy.
- High memory bandwidth keeps the cores fed with weights and activations.
- Throughput per dollar is strong when the GPU stays full.
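A crude way to see why batch size and model width matter: count the independent output elements one dense layer produces per forward pass and compare that with the core count. The functions and the 10,000-core figure below are hypothetical, and the model assumes one output element per core while ignoring memory bandwidth, tiling, and scheduling, so treat the result as a rough indication of how much of the chip has work.

```python
def parallel_outputs(batch_size, d_out):
    """Independent output elements one dense layer produces per forward pass."""
    return batch_size * d_out


def crude_occupancy(batch_size, d_out, cores):
    """Fraction of cores with work under a hypothetical one-output-per-core
    assignment, ignoring memory bandwidth and every other real limit."""
    return min(1.0, parallel_outputs(batch_size, d_out) / cores)


CORES = 10_000  # hypothetical GPU core count, for illustration only

for batch, width in [(1, 512), (8, 512), (1, 4096), (32, 4096)]:
    print(f"batch={batch:>2} width={width:>4} "
          f"occupancy={crude_occupancy(batch, width, CORES):.2f}")
```

Batch 1 on a narrow layer leaves most of this hypothetical chip idle, while batching 32 requests through a 4,096-wide layer gives every core work. In practice, small-batch inference on large models is often limited by memory bandwidth rather than core count, which this sketch does not capture.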
Where CPUs win
- Small models or low traffic where a GPU would sit mostly idle.
- Branch-heavy logic, which GPUs handle poorly because threads that run in lockstep lose throughput when they split onto different code paths.
- Avoiding the cost and scarcity of GPUs entirely.
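The branching point comes from how GPUs schedule threads: groups of lanes (32 per warp on NVIDIA GPUs) share one instruction stream, so when lanes in a group take different sides of a branch, the hardware runs each side in turn with the other lanes masked off. The sketch below is a deliberately simplified cost model of that behavior; `warp_time` and the cycle counts are made up for illustration.

```python
WARP_SIZE = 32  # lanes that execute in lockstep (the warp size on NVIDIA GPUs)


def warp_time(branch_taken, branch_cost):
    """Simplified SIMT cost model: a warp executes every distinct branch its
    lanes take, one after another, with non-participating lanes masked off.

    branch_taken: branch label chosen by each lane
    branch_cost:  dict mapping branch label -> cycles (hypothetical numbers)
    """
    return sum(branch_cost[b] for b in set(branch_taken))


costs = {"if": 100, "else": 100}                  # hypothetical cycle counts
uniform = ["if"] * WARP_SIZE                      # every lane takes the same path
divergent = ["if", "else"] * (WARP_SIZE // 2)     # lanes alternate between paths
print(warp_time(uniform, costs), warp_time(divergent, costs))  # 100 200
```

Under this model, a warp whose lanes disagree pays for both paths, so code that branches differently per element can run at a fraction of the GPU's peak, while a CPU core simply follows whichever path its own thread takes.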
The utilization rule
A GPU is only cheap if it is busy. An idle or lightly loaded GPU still costs its full price, so low-volume services often run more cheaply on CPUs, and for small models with little batching to exploit, often just as fast. High-volume or large-model services favor GPUs because they keep the hardware full.
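The utilization rule is easy to check with arithmetic. The sketch below assumes machines are billed by the hour whether busy or idle, that each machine has a fixed sustainable throughput, and that you add whole machines as load grows. The function name and all prices and capacities are hypothetical placeholders, not real quotes.

```python
import math


def cost_per_1k_requests(hourly_price, capacity_rps, load_rps):
    """Cost of serving 1,000 requests with a fleet sized for the given load.

    hourly_price: what one machine costs per hour, busy or idle (hypothetical)
    capacity_rps: requests/second one machine sustains when fully busy
    load_rps:     requests/second actually arriving (must be > 0)
    """
    if load_rps <= 0:
        raise ValueError("load_rps must be positive")
    machines = max(1, math.ceil(load_rps / capacity_rps))  # keep at least one up
    fleet_cost_per_hour = machines * hourly_price
    requests_per_hour = load_rps * 3600
    return fleet_cost_per_hour / requests_per_hour * 1000


# Hypothetical machines, chosen only to illustrate the shape of the trade-off.
GPU = dict(hourly_price=3.00, capacity_rps=200)
CPU = dict(hourly_price=0.30, capacity_rps=10)

for load in (1, 10, 100, 200):
    gpu = cost_per_1k_requests(load_rps=load, **GPU)
    cpu = cost_per_1k_requests(load_rps=load, **CPU)
    print(f"{load:>4} req/s  GPU ${gpu:.4f}  CPU ${cpu:.4f}  per 1k requests")
```

With these made-up numbers, the single GPU costs ten times more per request than a CPU at 1 request per second, breaks even at 100, and costs half as much at 200, once it is running near capacity. Real break-even points also depend on model size, batching, and latency targets, which this sketch leaves out.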
Key idea
GPUs win when large models or heavy traffic keep their many cores full; CPUs win for small models and low traffic where a GPU would idle. Match the hardware to model size and request volume.