The Thread-per-Request Model
In the thread-per-request model, a server assigns one operating system thread to each incoming request and keeps that thread until the request finishes. The code reads top to bottom: accept, parse, query the database, format, respond. Any step is free to block, because the thread serves no other request in the meantime.
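The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production server: the names `handle_request` and `serve` are hypothetical, the "query" step is replaced by a trivial string format, and the `max_requests` limit exists only so the loop terminates in a demo.

```python
import socket
import threading


def handle_request(conn: socket.socket) -> None:
    """Serve one request from start to finish on the calling thread."""
    with conn:
        data = conn.recv(1024)          # blocks until the client sends bytes
        name = data.decode().strip()    # parse
        body = f"hello, {name}\n"       # stand-in for "query the database, format"
        conn.sendall(body.encode())     # respond, then close the connection


def serve(listener: socket.socket, max_requests: int) -> None:
    """Accept connections and hand each one to its own new thread."""
    for _ in range(max_requests):
        conn, _addr = listener.accept() # blocks until a client connects
        threading.Thread(target=handle_request, args=(conn,)).start()
```

Each handler is plain sequential code. The only concurrency construct is the `threading.Thread` call in the accept loop.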
This model is popular because it is simple. A developer writes ordinary sequential code, and the operating system provides concurrency by scheduling many requests on many threads at once. Debugging is straightforward because a stack trace shows the full call path of one request.
The cost shows up under load:
- Memory: Each thread reserves a stack, often around one megabyte, so tens of thousands of threads exhaust memory.
- Scheduling: The kernel must context switch between threads, and beyond a few thousand active threads that overhead grows.
- Idle blocking: A thread waiting on a slow network call holds its stack and slot while doing nothing useful.
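The memory cost is simple multiplication, as the rough calculation below shows. It assumes the one-megabyte stack mentioned above. Actual defaults vary by platform and runtime, and a reserved stack is virtual address space that may not all be resident in physical memory. The function name `stack_reservation_bytes` is illustrative.

```python
def stack_reservation_bytes(threads: int, stack_bytes: int = 1024 * 1024) -> int:
    """Total stack space reserved if every thread gets `stack_bytes`."""
    return threads * stack_bytes


for n in (1_000, 10_000, 50_000):
    gib = stack_reservation_bytes(n) / 2**30
    print(f"{n:>6} threads -> {gib:.2f} GiB reserved for stacks")
```

Under these assumptions, 10,000 threads reserve just under 10 GiB for stacks alone, before any request data is allocated.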
Servers such as classic Apache and many Java application servers used this design with a bounded thread pool, which caps how many threads can exist at once and so limits the damage. It works well when concurrency is moderate and requests are short.
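A bounded pool can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. The example is a simulation under stated assumptions: `run_requests` and `MAX_WORKERS` are hypothetical names, the requests are fake, and `time.sleep` stands in for a blocking call. It records the peak number of requests in flight to show that the pool size is a hard ceiling.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # hypothetical cap; real servers tune this to their workload


def run_requests(num_requests: int, max_workers: int = MAX_WORKERS) -> int:
    """Run simulated requests on a bounded pool; return the peak number in flight."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def handle(request_id: int) -> int:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)  # stand-in for a blocking database or network call
        with lock:
            in_flight -= 1
        return request_id

    # Requests beyond max_workers wait in the executor's queue for a free thread.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(handle, range(num_requests)))
    return peak
```

Calling `run_requests(20)` never reports more than four requests in flight. The trade-off is that once every worker is blocked, new requests wait in the queue, which is why this design suits short requests.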
Key idea
Thread-per-request trades scalability for simple blocking code, and it breaks down when idle threads pile up under high concurrency.