Rate Limiting With A Semaphore
A semaphore holds a fixed number of permits. A thread must acquire a permit before proceeding and release it when done. If no permit is free, the thread waits. This makes a semaphore a natural tool for limiting how many operations run concurrently.
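A semaphore initialized with one permit behaves much like a mutex, with one difference: a semaphore has no owner, so any thread may release a permit, not only the thread that acquired it. Initialized with N permits, it allows up to N concurrent holders, which is exactly what you want when capping concurrency against a fragile downstream service: never more than N requests in flight at once.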
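As a minimal sketch of the concurrency cap, the following uses java.util.concurrent.Semaphore. The class name ConcurrencyCap, the method peakInFlight, and the 5 ms sleep standing in for the downstream call are all illustrative assumptions, not part of any particular system. It records the highest number of calls seen in flight so the cap can be observed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrencyCap {
    // Runs `tasks` simulated calls on `threads` worker threads while a semaphore
    // with `maxInFlight` permits guards each call. Returns the highest number of
    // calls observed in flight at the same time.
    static int peakInFlight(int maxInFlight, int threads, int tasks) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    permits.acquire(); // wait here if N calls are already running
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return; // no permit was taken, so nothing to release
                }
                try {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // hypothetical stand-in for the downstream call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    permits.release(); // hand the permit to the next waiter
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in flight: " + peakInFlight(3, 16, 40));
    }
}
```

Even with 16 worker threads, the reported peak should never exceed the 3 permits, because a thread only increments the in-flight counter while it holds a permit.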
To limit requests per second rather than concurrency, combine a semaphore with timed replenishment. Start with a bucket of permits and top it back up to its capacity on a timer. Callers acquire one permit per request and, unlike in the concurrency case, do not release it when they finish; only the timer returns permits. When the bucket empties, callers wait until the next refill. This is the token bucket idea expressed with semaphore permits.
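One way this refill scheme might look in Java is sketched below. The class RefillingLimiter and its methods are hypothetical names chosen for illustration. The sketch assumes a single refilling thread and a refill that tops the bucket up to capacity rather than adding permits without bound.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class RefillingLimiter {
    private final int capacity;
    private final Semaphore tokens;

    RefillingLimiter(int capacity) {
        this.capacity = capacity;
        this.tokens = new Semaphore(capacity); // the bucket starts full
    }

    // Tops the bucket back up to capacity, never beyond it. This is only safe
    // with a single refilling thread (assumed here): callers can only lower the
    // count between the read and the release, so the total stays <= capacity.
    void refill() {
        int missing = capacity - tokens.availablePermits();
        if (missing > 0) {
            tokens.release(missing);
        }
    }

    // Blocks until a token is available. The caller does NOT release it
    // afterwards; tokens come back only through refill().
    void acquire() throws InterruptedException {
        tokens.acquire();
    }

    // Non-blocking variant: returns false when the bucket is empty.
    boolean tryAcquire() {
        return tokens.tryAcquire();
    }

    // Starts a daemon timer thread that refills once per period. The scheduler
    // is returned so the caller can shut it down.
    ScheduledExecutorService startRefilling(long period, TimeUnit unit) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "limiter-refill");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(this::refill, period, period, unit);
        return timer;
    }
}
```

With a capacity of 10 and `startRefilling(1, TimeUnit.SECONDS)`, callers get at most 10 requests per refill window. The limit is per window, not perfectly smooth: a burst at the end of one window followed by a burst at the start of the next can briefly exceed the average rate.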
- tryAcquire: take a permit without blocking, returning failure if none is free, so callers can shed load.
- acquire: block until a permit frees up.
- release: return a permit for others.
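The load-shedding use of tryAcquire can be sketched as follows. SheddingGate and callOrShed are hypothetical names, and the sketch assumes the supplied work returns a non-null value so that an empty Optional always means "rejected".

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class SheddingGate {
    private final Semaphore permits;

    SheddingGate(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Runs `work` if a permit is free right now; otherwise returns empty
    // immediately instead of queueing the caller.
    // Assumption: `work` returns a non-null result.
    <T> Optional<T> callOrShed(Supplier<T> work) {
        if (!permits.tryAcquire()) {
            return Optional.empty(); // shed: no permit was taken, nothing to release
        }
        try {
            return Optional.of(work.get());
        } finally {
            permits.release(); // release only because the acquire succeeded
        }
    }
}
```

A caller might map the empty result to an error response, for example `gate.callOrShed(() -> fetch()).orElse("busy, try later")`, where `fetch` is a placeholder for the real operation. Semaphore also offers `tryAcquire(long timeout, TimeUnit unit)` for callers willing to wait a bounded time before giving up.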
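When a semaphore caps concurrency, a subtle bug is forgetting to release on an error path. Each failure leaks one permit, and once all permits have leaked every caller blocks forever. Always release in a finally-style block, and only after the acquire has actually succeeded.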
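To make the leak concrete, the sketch below contrasts a leaky wrapper with one that releases in a finally block, then counts the permits left after a series of failing calls. All names are illustrative. It uses acquireUninterruptibly only to keep the example free of checked exceptions; acquire behaves the same here but can be interrupted.

```java
import java.util.concurrent.Semaphore;

class ReleaseOnError {
    // Buggy: if work throws, the release line is never reached.
    static void leaky(Semaphore permits, Runnable work) {
        permits.acquireUninterruptibly();
        work.run();
        permits.release();
    }

    // Correct: the permit is returned whether work succeeds or throws.
    static void safe(Semaphore permits, Runnable work) {
        permits.acquireUninterruptibly();
        try {
            work.run();
        } finally {
            permits.release();
        }
    }

    // Runs `calls` operations that always fail and reports how many permits
    // remain. Keep `calls` <= `permitCount` in the leaky case, otherwise the
    // next acquire would block forever, which is the bug being illustrated.
    static int permitsLeftAfterFailures(boolean useFinally, int permitCount, int calls) {
        Semaphore permits = new Semaphore(permitCount);
        Runnable failing = () -> { throw new IllegalStateException("downstream error"); };
        for (int i = 0; i < calls; i++) {
            try {
                if (useFinally) {
                    safe(permits, failing);
                } else {
                    leaky(permits, failing);
                }
            } catch (IllegalStateException expected) {
                // the failure itself is expected; only the permit count matters
            }
        }
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + permitsLeftAfterFailures(false, 3, 3)); // prints "leaky: 0"
        System.out.println("safe:  " + permitsLeftAfterFailures(true, 3, 3));  // prints "safe:  3"
    }
}
```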
Key idea
A semaphore caps concurrency by handing out a fixed pool of permits, and with timed refills it enforces a request rate.