A queue is a buffer and a delay
A work queue sits in front of a pool of workers. It smooths bursts so workers stay busy, but the time an item spends waiting in the queue is added to that item's latency.
The depth tradeoff
- A deep queue absorbs big bursts and keeps throughput high, but items can wait a long time before a worker picks them up.
- A shallow queue keeps latency low because items are served quickly, but the part of a burst that does not fit is rejected.
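The time an item spends in the queue is roughly its position in the queue divided by the service rate of the worker pool as a whole. Double the depth and you roughly double the worst-case wait, because the last item in a full queue has twice as many items ahead of it.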
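The arithmetic above can be sketched in a few lines of Python. This is a simplified model, not a measurement: it assumes the whole burst arrives at once into an empty queue, that all workers are already busy, and that the pool drains items at a constant rate. The function names and the numbers in the usage note are made up for illustration.

```python
from typing import Tuple


def estimated_wait_seconds(position: int, service_rate_per_sec: float) -> float:
    """Approximate wait for an item with `position` items ahead of or including it.

    Assumption: the pool as a whole completes `service_rate_per_sec` items
    per second at a constant rate.
    """
    if service_rate_per_sec <= 0:
        raise ValueError("service_rate_per_sec must be positive")
    return position / service_rate_per_sec


def burst_outcome(
    burst_size: int, depth: int, service_rate_per_sec: float
) -> Tuple[int, int, float]:
    """Return (accepted, rejected, worst_wait_seconds) for one burst.

    Assumption: the burst hits an empty queue of capacity `depth` all at once.
    """
    accepted = min(burst_size, depth)   # the queue holds at most `depth` items
    rejected = burst_size - accepted    # the overflow is turned away
    # The last accepted item waits for everything queued ahead of it.
    worst_wait = estimated_wait_seconds(accepted, service_rate_per_sec)
    return accepted, rejected, worst_wait
```

With a burst of 150 items and a pool that serves 50 items per second, `burst_outcome(150, 100, 50.0)` gives 100 accepted, 50 rejected, and a worst wait of 2.0 seconds, while `burst_outcome(150, 200, 50.0)` accepts all 150 but pushes the worst wait to 3.0 seconds. The deeper queue rejects nothing and pays for it in latency.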
The hidden cost of unbounded queues
An unbounded queue never rejects work, which sounds friendly, but under sustained overload it grows without limit. Latency climbs until requests time out anyway, and the process can run out of memory. Bounding the queue lets you fail fast instead of failing slow.
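A minimal Python sketch of both halves of this argument follows. The first function models backlog growth under constant arrival and service rates, which is an idealization. The second shows one way to fail fast using the standard library's `queue.Queue` with a `maxsize`. The helper names `backlog_after` and `try_submit`, and the rates and sizes used, are hypothetical.

```python
import queue


def backlog_after(seconds: float, arrival_rate: float, service_rate: float) -> float:
    """Backlog of an unbounded queue that starts empty.

    Assumption: arrivals and service both run at constant rates (items/sec).
    When arrivals outpace service, the backlog grows linearly with time.
    """
    return max(0.0, float(arrival_rate - service_rate) * seconds)


def try_submit(q: queue.Queue, item: object) -> bool:
    """Enqueue without blocking; return False if the bounded queue is full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        # Fail fast: the caller learns immediately and can shed load or retry later.
        return False


# Unbounded case: 120 arrivals/sec against 100 served/sec for one minute.
backlog = backlog_after(60, arrival_rate=120, service_rate=100)
wait_for_newest = backlog / 100  # seconds the newest item would wait

# Bounded case: a queue of depth 3 receiving 5 items with no worker draining it.
work_queue = queue.Queue(maxsize=3)
accepted = [try_submit(work_queue, i) for i in range(5)]
```

In this model the unbounded queue holds 1200 items after a minute and the newest one waits about 12 seconds, and both numbers keep growing for as long as the overload lasts. The bounded queue accepts the first three items and rejects the last two immediately, so the caller can react instead of waiting for a timeout.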
Key idea
Queue depth trades latency for burst tolerance: a deeper queue protects throughput but lengthens waits. An unbounded queue under overload turns into unbounded latency, so bound it and reject early.