The Cost versus Accuracy Tradeoff

Deciding when a small accuracy gain is not worth its compute and complexity bill.

Accuracy is not free

Every accuracy point costs compute, latency, and engineering time. The question a senior engineer asks is whether the gain pays for itself in business value.

Diminishing returns

Accuracy gains usually flatten while cost keeps climbing. A model twice as expensive may add only a fraction of a percent.
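One way to see the flattening is to compute how many accuracy points each extra dollar buys as you step up in model size. The sketch below uses made-up model names, costs, and accuracies purely for illustration; none of the numbers come from a real benchmark.

```python
# Hypothetical (name, cost per 1k requests in dollars, accuracy) entries,
# ordered from cheapest to most expensive. All values are illustrative.
candidates = [
    ("small", 1.0, 0.900),
    ("medium", 2.0, 0.930),
    ("large", 4.0, 0.940),
    ("xlarge", 8.0, 0.943),
]


def marginal_gains(models):
    """Return (name, extra accuracy points per extra dollar) for each step up."""
    steps = []
    for (_, prev_cost, prev_acc), (name, cost, acc) in zip(models, models[1:]):
        extra_points = (acc - prev_acc) * 100  # convert to percentage points
        extra_cost = cost - prev_cost
        steps.append((name, extra_points / extra_cost))
    return steps


for name, gain in marginal_gains(candidates):
    print(f"{name}: {gain:.3f} accuracy points per extra dollar")
```

With these assumed numbers, each doubling of cost buys fewer points per dollar than the one before it, which is the pattern the section describes.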

Levers to cut cost

Distillation: train a small model to mimic a large one
Quantization: use lower precision for faster, cheaper inference
Caching: reuse predictions for repeated inputs
Cascades: run a cheap model first, escalate only hard cases to the big one
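The cascade lever can be sketched in a few lines. This is a minimal illustration, assuming each model returns a label together with a confidence score; the function names, the toy models, and the 0.9 threshold are all hypothetical and would need tuning against real data.

```python
from typing import Callable, Tuple

# Assumption: a model is any callable that returns (label, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def cascade_predict(
    x: str, cheap: Model, expensive: Model, threshold: float = 0.9
) -> Tuple[str, str]:
    """Return (label, name of the tier that answered)."""
    label, confidence = cheap(x)
    if confidence >= threshold:
        return label, "cheap"  # easy case: stop here and skip the big model
    label, _ = expensive(x)  # hard case: escalate
    return label, "expensive"


# Toy stand-ins for real models, for illustration only.
def toy_cheap(x: str) -> Tuple[str, float]:
    # Pretends to be confident only on short inputs.
    return ("short", 0.95) if len(x) < 10 else ("short", 0.55)


def toy_expensive(x: str) -> Tuple[str, float]:
    return ("long" if len(x) >= 10 else "short", 0.99)


print(cascade_predict("hi", toy_cheap, toy_expensive))
print(cascade_predict("a much longer input", toy_cheap, toy_expensive))
```

The threshold is the cost dial: raising it sends more traffic to the expensive model, lowering it saves money at the risk of accepting more of the cheap model's mistakes.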

Frame it in money

Translate accuracy into business value and compare it to the serving cost. If a one percent lift earns less than the extra inference spend, the simpler model wins.
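The comparison can be written as a small calculation. The sketch below assumes business value scales linearly with accuracy points, which is a simplification, and every figure in it is hypothetical.

```python
def net_value_of_upgrade(
    accuracy_lift_points: float,
    value_per_point: float,
    requests: int,
    extra_cost_per_request: float,
) -> float:
    """Extra business value minus extra serving spend over the same period."""
    extra_value = accuracy_lift_points * value_per_point
    extra_spend = requests * extra_cost_per_request
    return extra_value - extra_spend


# Hypothetical monthly figures: a one-point lift worth $5,000,
# 10 million requests, and $0.001 more per request for the bigger model.
net = net_value_of_upgrade(
    accuracy_lift_points=1.0,
    value_per_point=5_000.0,
    requests=10_000_000,
    extra_cost_per_request=0.001,
)
print("upgrade pays for itself" if net > 0 else "simpler model wins")
```

Under these assumed numbers the extra spend is larger than the extra value, so the simpler model wins.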

Key idea

Optimize value per dollar, not raw accuracy. Cascades, distillation, and quantization can often capture most of the quality for a fraction of the cost.
