The Thread Per Connection Limit

The most natural server design assigns each incoming connection its own thread. The code reads as if only one client exists: read a request, do work, write a reply, repeat. The operating system handles the rest by switching threads when one blocks. This model is easy to reason about, which is why it dominated for years.
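
As a rough illustration of this model, here is a minimal Python sketch of a thread per connection echo server. It assumes Python 3.8+ (for `socket.create_server`). The names `handle_client`, `serve`, and `start_server` are hypothetical, and echoing the bytes back stands in for "do work". Each handler reads as if only one client exists, and `recv` simply blocks while the operating system runs other threads.

```python
import socket
import threading


def handle_client(conn: socket.socket) -> None:
    """Serve one client for its whole lifetime on a dedicated thread."""
    with conn:
        while True:
            data = conn.recv(4096)  # blocks; the OS schedules other threads meanwhile
            if not data:            # empty read: the client closed its side
                break
            conn.sendall(data)      # the "work" here is just echoing the request


def serve(listener: socket.socket) -> None:
    """Accept forever, starting one new thread per accepted connection."""
    while True:
        conn, _addr = listener.accept()
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()


def start_server(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Bind a listening socket and run the accept loop on a background thread."""
    listener = socket.create_server((host, port))
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener


if __name__ == "__main__":
    listener = start_server()
    port = listener.getsockname()[1]  # port 0 above lets the OS choose a free port
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello")
        print(client.recv(4096))
```

Every accepted connection gets its own thread for as long as the client stays connected, which is exactly the per-connection cost at issue once the number of clients grows.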

The trouble appears at scale. Every thread carries a fixed cost whether or not it is doing anything useful.

Stack memory: each thread reserves a stack, commonly hundreds of kilobytes to a megabyte, and that cost is multiplied by the number of open connections.
Context switching: when thousands of threads exist, the scheduler spends more time switching than working, and CPU caches are constantly evicted and refilled.
Diminishing returns: past a few thousand threads, adding connections slows the whole machine instead of serving more clients.
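
The stack cost grows linearly with the connection count. A back-of-the-envelope sketch, using hypothetical figures of 10,000 connections and a 1 MiB stack per thread (real defaults vary by platform and runtime):

```python
MIB = 1024 * 1024


def reserved_stack_bytes(connections: int, stack_bytes: int) -> int:
    """Stack space reserved when every connection owns exactly one thread."""
    return connections * stack_bytes


# Hypothetical figures: 10,000 open connections, 1 MiB stack per thread.
total = reserved_stack_bytes(10_000, 1 * MIB)
print(f"{total / 1024**3:.1f} GiB")  # 10,000 MiB / 1024 = 9.77 GiB, printed as "9.8 GiB"
```

On many systems most of that reservation is virtual address space that only becomes resident as pages are touched, but it still scales with every connection. In Python, `threading.stack_size()` can lower the size used for threads created afterward, which shrinks the multiplier without changing the linear growth.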

The model does not fail suddenly. It degrades: throughput rises, then flattens, then falls as overhead grows faster than useful work. For a handful of connections it is perfectly fine and very readable. For tens of thousands it collapses, which is exactly why event-driven approaches were invented.
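
The rise, plateau, and fall can be illustrated with a deliberately crude toy model. Everything in it is a hypothetical assumption, not a measurement: `cores` caps how many threads can do useful work at once, and `switch_cost` is an invented fraction of core time that each thread costs in scheduling and cache refills.

```python
def toy_throughput(threads: int, cores: int = 8, switch_cost: float = 1e-4) -> float:
    """Useful work per unit time under a crude, hypothetical model.

    Assumptions (invented for illustration, not measured):
    - at most `cores` threads can do useful work at the same time;
    - every thread, running or not, costs `switch_cost` of each core's
      time in scheduling and cache refills.
    """
    busy = min(threads, cores)                          # rises, then caps at the core count
    efficiency = max(0.0, 1.0 - switch_cost * threads)  # overhead grows with every thread
    return busy * efficiency


for n in (1, 8, 100, 1_000, 5_000, 10_000):
    print(n, round(toy_throughput(n), 2))
```

Under these made-up constants the output climbs up to the core count, stays nearly level for a while, then drops toward zero once overhead dominates. Only the shape is meaningful; real curves depend on the workload, the scheduler, and the hardware.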

Key idea

Thread per connection is simple and readable but scales poorly because each thread costs fixed memory and scheduling overhead regardless of useful work.
