The problem
Traffic arrives in bursts. A service that must process each request the instant it lands has to be sized for the peak, which wastes capacity between bursts and still fails when a burst exceeds the estimate. Queue-based load leveling puts a buffer between the burst and the work.
How it smooths load
Producers drop messages into a queue. Consumers pull at a steady, sustainable rate.
- A spike fills the queue instead of overwhelming the consumer.
- The consumer drains at its own pace, sized for average load.
- The queue depth shows how far behind the system is.
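To make the smoothing concrete, here is a minimal sketch that assumes discrete time ticks and a consumer that handles a fixed number of messages per tick. The function name `simulate_queue_depth`, the burst shape, and the rate of 6 are hypothetical, chosen only to show how a spike turns into a backlog that drains over the following ticks.

```python
from typing import List


def simulate_queue_depth(arrivals: List[int], service_rate: int) -> List[int]:
    """Return the queue depth at the end of each tick.

    arrivals[t] is how many messages producers enqueue during tick t
    (hypothetical model); the consumer removes at most service_rate
    messages per tick.
    """
    depth = 0
    depths = []
    for arrived in arrivals:
        depth += arrived                   # the burst lands in the queue
        depth -= min(depth, service_rate)  # the consumer drains at its steady rate
        depths.append(depth)
    return depths


# A burst of 30 messages in tick 2, otherwise a trickle of 2 per tick.
arrivals = [2, 2, 30, 2, 2, 2, 2, 2, 2, 2]
print(simulate_queue_depth(arrivals, service_rate=6))
# [0, 0, 24, 20, 16, 12, 8, 4, 0, 0]
```

The consumer never handles more than 6 messages per tick, yet it absorbs a tick with 30 arrivals. The depth series is the "how far behind" signal: it jumps at the spike and falls back to zero once the burst has drained.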
What it costs
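- Work becomes asynchronous: callers no longer get a result in the same request, so they must learn the outcome later, for example by polling or through a reply message.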
- A sustained overload grows the queue without bound unless you add capacity or shed load.
- Many queues deliver messages at least once, so consumers must handle retries and duplicate delivery, typically by making processing idempotent.
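The last two costs can be sketched in a few lines. This is an in-process illustration that assumes Python's standard-library `queue.Queue` stands in for a real message broker. The `Message` type, its `msg_id` field, `offer`, `IdempotentConsumer`, and the capacity of 3 are all hypothetical. A bounded queue sheds load when it is full instead of growing without limit, and the consumer skips message IDs it has already applied, so a redelivered message has no extra effect.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: str   # hypothetical unique ID assigned by the producer
    payload: int


def offer(q: "queue.Queue[Message]", msg: Message) -> bool:
    """Enqueue msg, or shed it if the bounded queue is already full."""
    try:
        q.put_nowait(msg)
        return True
    except queue.Full:
        return False  # shed load rather than let the backlog grow without bound


class IdempotentConsumer:
    """Applies each message ID at most once (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        # A real system would persist this set atomically with the work itself.
        self.seen: set = set()
        self.total = 0

    def handle(self, msg: Message) -> None:
        if msg.msg_id in self.seen:
            return  # duplicate delivery: already applied, so do nothing
        self.total += msg.payload
        self.seen.add(msg.msg_id)


q: "queue.Queue[Message]" = queue.Queue(maxsize=3)
accepted = [offer(q, Message(f"m{i}", 10)) for i in range(5)]
print(accepted)  # [True, True, True, False, False]

consumer = IdempotentConsumer()
while not q.empty():
    msg = q.get_nowait()
    consumer.handle(msg)
    consumer.handle(msg)  # simulate the broker redelivering the same message
print(consumer.total)  # 30
```

Rejecting at the producer is only one way to shed load; dropping the oldest messages or returning an error to the caller are other options, and the right choice depends on which work is safe to lose.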
The queue converts a spiky arrival pattern into a flat processing pattern, which is far cheaper to provision.
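A rough way to see the provisioning difference is to ask for the smallest steady per-tick rate that keeps the backlog under a chosen limit, and compare it with the rate needed to absorb the peak with no queue at all. The sketch below assumes the same hypothetical tick model and burst as a sample trace; `max_backlog`, `min_service_rate`, and the limit of 24 messages are illustrative, not a sizing method from any particular system.

```python
from typing import List


def max_backlog(arrivals: List[int], service_rate: int) -> int:
    """Worst end-of-tick queue depth for a consumer draining service_rate per tick."""
    depth = peak = 0
    for arrived in arrivals:
        depth = max(0, depth + arrived - service_rate)
        peak = max(peak, depth)
    return peak


def min_service_rate(arrivals: List[int], backlog_limit: int) -> int:
    """Smallest per-tick rate whose worst backlog stays within backlog_limit.

    Assumes backlog_limit >= 0; the search always stops by max(arrivals),
    where the backlog is zero.
    """
    rate = 0
    while max_backlog(arrivals, rate) > backlog_limit:
        rate += 1
    return rate


arrivals = [2, 2, 30, 2, 2, 2, 2, 2, 2, 2]
print(min_service_rate(arrivals, backlog_limit=24))  # 6: a queue holds the burst
print(min_service_rate(arrivals, backlog_limit=0))   # 30: no buffer, size for the peak
```

This only checks one sample trace. Over the long run the service rate must also exceed the average arrival rate, or the backlog grows without bound regardless of the limit.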
Key idea
Buffer bursts in a queue so consumers can work at a steady, affordable rate.