What introspection is
Token introspection is when a resource server asks the authorization server whether a presented token is currently active and what it grants. Unlike verifying a self-contained JWT locally, introspection gets a live answer from the issuer, so revocation takes effect immediately.
The cost is a network call on potentially every request.
The scaling problem
If every request triggers an introspection call, the authorization server becomes a hot bottleneck. At high request rates this adds latency to every request and makes the authorization server a hard availability dependency: if it is slow or down, the resource server cannot authorize anything.
Scaling techniques
- Cache introspection results for a short window keyed by the token, so repeated calls within seconds are free.
- Choose a cache time to live that balances freshness against load. A few seconds usually limits revocation lag while cutting traffic sharply.
- Use local JWT validation for the common path and reserve introspection for sensitive operations.
- Push the deny list to resource servers so they can reject revoked tokens without a round trip.
The art is mixing fast local checks with occasional authoritative introspection.
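The cache-plus-deny-list pattern from the list above, as an added example (not code from the original note, and not a specific library's API). It assumes an introspection endpoint that returns an RFC 7662 style `{"active": ...}` object, stands in a constructed fake authorization server for it, keys the cache by a digest of the token rather than the raw bearer value, and lets a pushed revocation override any cached answer. The demo also shows the revocation lag a TTL introduces and how a pushed deny-list entry removes it.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict, Set


class FakeAuthServer:
    """Constructed stand-in for the authorization server's introspection endpoint.
    It counts calls so the demo can show how much traffic the cache removes."""

    def __init__(self) -> None:
        self.tokens = {"tok-alice": {"active": True, "scope": "read write", "sub": "alice"}}
        self.calls = 0

    def introspect(self, token: str) -> dict:
        self.calls += 1
        return dict(self.tokens.get(token, {"active": False}))

    def revoke(self, token: str) -> None:
        self.tokens[token] = {"active": False}


@dataclass
class CacheEntry:
    response: dict
    expires_at: float


class IntrospectionCache:
    """Resource-server side cache of introspection results with a short TTL
    plus a locally pushed deny list (assumed design for this example)."""

    def __init__(self, introspect: Callable[[str], dict], ttl_seconds: float = 5.0,
                 now: Callable[[], float] = time.monotonic) -> None:
        self._introspect = introspect
        self._ttl = ttl_seconds
        self._now = now  # injectable clock so the TTL can be tested deterministically
        self._cache: Dict[str, CacheEntry] = {}
        self._denied: Set[str] = set()

    @staticmethod
    def _key(token: str) -> str:
        # Key by a digest so the raw bearer token never sits in cache memory or logs.
        return hashlib.sha256(token.encode()).hexdigest()

    def deny(self, token: str) -> None:
        """Pushed revocation: effective immediately, regardless of cache state."""
        key = self._key(token)
        self._denied.add(key)
        self._cache.pop(key, None)

    def check(self, token: str) -> dict:
        key = self._key(token)
        if key in self._denied:
            return {"active": False}
        entry = self._cache.get(key)
        if entry is not None and entry.expires_at > self._now():
            return entry.response
        response = self._introspect(token)
        # Cache inactive answers too, so a rejected token replayed at high rate
        # does not turn into introspection load.
        self._cache[key] = CacheEntry(response, self._now() + self._ttl)
        return response


def demo() -> dict:
    clock = {"t": 0.0}
    server = FakeAuthServer()
    cache = IntrospectionCache(server.introspect, ttl_seconds=5.0, now=lambda: clock["t"])

    # 100 requests within one TTL window: exactly one upstream call.
    all_active = all(cache.check("tok-alice")["active"] for _ in range(100))
    calls_after_burst = server.calls

    # Revocation at the authorization server is not visible until the entry expires.
    server.revoke("tok-alice")
    stale_active = cache.check("tok-alice")["active"]
    clock["t"] += 5.0
    fresh_active = cache.check("tok-alice")["active"]

    # A pushed deny-list entry bypasses the cache entirely and costs no round trip.
    server.tokens["tok-bob"] = {"active": True, "sub": "bob"}
    cache.check("tok-bob")
    cache.deny("tok-bob")
    denied_active = cache.check("tok-bob")["active"]

    return {
        "all_active": all_active,
        "calls_after_burst": calls_after_burst,
        "stale_active": stale_active,
        "fresh_active": fresh_active,
        "denied_active": denied_active,
        "calls_total": server.calls,
    }


if __name__ == "__main__":
    print(demo())
```

The `stale_active` result is the revocation lag the note describes: it is bounded by the TTL, which is why a few seconds is the usual compromise. The deny-list path is what closes that gap for tokens the authorization server actively pushes out.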
Key idea
Token introspection gives live revocation but a call per request will not scale, so cache results briefly and reserve authoritative checks for sensitive paths.