← Lessons

quiz vs the machine

Silver1050

Concurrency

The Thread Per Request Model

Dedicate one thread to each incoming request for its whole lifetime.

4 min read · intro · beat Silver to climb

The Thread Per Request Model

In the thread per request model a server assigns one operating system thread to each incoming request and keeps that thread until the request finishes. The code reads top to bottom: accept, parse, query the database, format, respond. Every step can block because the thread has nothing else to do.

This model is loved for its simplicity. A developer writes ordinary sequential code, and the runtime handles parallelism by running many requests on many threads at once. Debugging is straightforward because a stack trace tells the full story of one request.

The cost shows up under load:

  • Memory Each thread reserves a stack, often around one megabyte, so tens of thousands of threads exhaust memory.
  • Scheduling The kernel must context switch between threads, and beyond a few thousand active threads that overhead grows.
  • Idle blocking A thread waiting on a slow network call holds its stack and slot while doing nothing useful.

Servers like classic Apache and many Java application servers used this design with a bounded thread pool to cap the damage. It works well when concurrency is moderate and requests are short.

Key idea

Thread per request trades scalability for simple blocking code, and breaks down when idle threads pile up under high concurrency.

Check yourself

Answer to earn rating on the learn ladder.

1. Why does thread per request scale poorly under high concurrency?

2. What is the main appeal of the thread per request model?