The wrong assumption behind every web server

For decades, web servers made an assumption so natural nobody wrote it down: to hold a connection open, you need a thread sitting with it. One connection, one thread. It felt self-evident.

It was wrong — and the consequences of getting it right changed how most of the internet is served today.

The paradox

Picture a server with 100 users connected to it. Each one is holding an open connection — waiting for the next request, keeping the channel alive. No clicks. No data moving. The server is, for all practical purposes, doing nothing.

How many threads does Apache need to handle this?

100 threads. One per user. All of them parked, all of them burning memory, all of them waiting for something that isn’t coming.

That’s the C10K problem — named in a 1999 essay by engineer Dan Kegel, asking why a single machine couldn’t handle 10,000 concurrent connections. The hardware was capable. The math didn’t add up. Something about how servers were built was deeply wasteful, and it took years to name what.

A thread is not a unit of work. For most web servers, it’s a unit of waiting.

Why threads feel like the obvious answer

To understand why the old model seemed right, you have to start from the basics.

When a browser connects to your server, it opens a TCP connection — a persistent two-way channel. The OS represents this channel as a file descriptor: just a small integer, like 4 or 91. Think of it as a ticket number. The kernel manages the actual connection state internally; your program holds the number and passes it to system calls like read() and write() whenever it wants to interact with that connection.
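To make the "ticket number" idea concrete, here is a minimal sketch. It assumes a local socketpair() standing in for the browser and the server, so nothing touches the real network; the variable names are illustrative only.

```python
import socket

# socketpair() returns two already-connected sockets: a stand-in for
# "browser" and "server" that needs no real network.
client, server = socket.socketpair()

# The descriptor is just a small non-negative integer chosen by the kernel.
# Its exact value varies from process to process.
fd = server.fileno()
print(type(fd).__name__, fd)

# The program never touches connection state directly. It hands the
# descriptor (wrapped in a socket object here) to read/write-style calls.
client.sendall(b"GET / HTTP/1.1\r\n\r\n")
request = server.recv(1024)
print(request)

client.close()
server.close()
```

Everything the kernel knows about the connection (buffers, sequence numbers, peer address) stays on its side of the fence. The program only ever holds the integer.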

Now, between when the connection opens and when the request fully arrives, there’s a gap. The browser sends data in packets. Those packets travel across the internet. They arrive whenever they arrive — your server has no say in this.

So what does your server do while waiting?

The traditional answer — baked into every OS since the 1970s — is that you call read() on the file descriptor, and your program stops. The OS parks your thread, pulls it off the CPU, and wakes it up only when data arrives. This is blocking I/O. Your thread is blocked. It cannot do anything else. It just sits there, holding its stack in memory, waiting.
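A small sketch of what "parked" means in practice, again assuming a local socketpair() and an artificial 0.2 second delay in place of real network latency. The helper name wait_for_request is hypothetical.

```python
import socket
import threading
import time

client, server = socket.socketpair()
result = {}

def wait_for_request():
    start = time.monotonic()
    # Blocking read: the OS parks this thread here until bytes arrive.
    result["data"] = server.recv(1024)
    result["blocked_for"] = time.monotonic() - start

worker = threading.Thread(target=wait_for_request)
worker.start()

time.sleep(0.2)            # the "network": nothing has arrived yet
client.sendall(b"hello")   # data arrives, the kernel wakes the thread
worker.join()

print(result["data"], f"blocked for about {result['blocked_for']:.2f}s")

client.close()
server.close()
```

For the whole delay the worker thread did no work at all. It existed only to be woken up.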

For 100 users all waiting simultaneously, the obvious move: give each connection its own thread. Thread A waits for User 1. Thread B waits for User 2. When Thread A’s socket gets data, it wakes up, handles the request, goes back to waiting.

This is the classic Apache model: one thread per connection (or, under the older prefork setup, one whole process). For years, at reasonable scale, it worked perfectly well.

Think of it like a restaurant where each table gets a dedicated waiter — standing there, waiting for the customer to be ready. Ten tables, fine. Ten thousand tables, all reading the menu, none ordering yet — you’ve hired ten thousand waiters to stare at walls.
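The thread-per-connection model described above can be sketched in a few lines. This is an illustration under stated assumptions, not Apache’s actual code: the function names handle_connection and serve are hypothetical, the "request handling" is just upper-casing the bytes, and the server listens on a loopback port picked by the OS.

```python
import socket
import threading

def handle_connection(conn):
    # One thread per connection. This thread spends almost all of its
    # life parked inside recv(), waiting for the client to say something.
    with conn:
        while True:
            data = conn.recv(1024)      # blocks until data or disconnect
            if not data:
                break
            conn.sendall(data.upper())  # stand-in for real request handling

def serve(listener):
    while True:
        conn, _addr = listener.accept()  # blocks until a client connects
        threading.Thread(
            target=handle_connection, args=(conn,), daemon=True
        ).start()

listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks one
port = listener.getsockname()[1]
threading.Thread(target=serve, args=(listener,), daemon=True).start()

# Five clients connect; only one of them ever sends anything.
clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(5)]
clients[0].sendall(b"ping")
reply = clients[0].recv(1024)
print(reply, "threads alive:", threading.active_count())

for c in clients:
    c.close()
```

Four of the five handler threads never did anything but wait. Scale the client count up and that ratio is the whole problem.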

Where it breaks

Every thread needs a stack: a private block of memory holding its local variables, function call history, and execution state. On Linux, the default thread stack is typically 8MB, reserved as virtual address space when the thread is created. Physical pages are only allocated as the stack actually grows, but the address range itself is set aside upfront.

At 10,000 connections, that’s 80GB of virtual address space reserved for stacks — the overwhelming majority of it sitting idle, representing connections that are doing absolutely nothing.
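The arithmetic is easy to check. A quick sketch, assuming the 8MB default is read as 8 MiB (the function name reserved_bytes is illustrative):

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

STACK_RESERVATION = 8 * MIB  # assumed default per-thread stack reservation

def reserved_bytes(connections):
    # One thread per connection, one stack reservation per thread.
    return connections * STACK_RESERVATION

for connections in (100, 1_000, 10_000):
    total = reserved_bytes(connections)
    print(f"{connections:>6} threads -> {total / GIB:.2f} GiB reserved")
```

In binary units the 10,000-thread case comes to roughly 78 GiB, which is the "80GB" above in round numbers.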

But memory is only half the problem. When you have more threads than CPU cores (and with 10,000 threads, you certainly do), the OS has to context switch every time a different thread becomes runnable. Save Thread A’s register state. Load Thread B’s. Refill the caches with Thread B’s data. Repeat. Each switch costs a few microseconds. At scale, the CPU is spending a meaningful fraction of its time doing this bookkeeping rather than serving actual requests.

Threads aren’t slow because they’re working too hard. They’re slow because they’re doing nothing, and nothing, at scale, is expensive. The vast majority of those 10,000 threads are blocked and idle at any given moment. You’re reserving 8MB of address space per thread just to represent inactivity.

The assumption

To hold a connection open, you need a thread sitting with it.

It feels self-evident. A connection is open, something needs to watch it, that something is a thread. One connection, one thread.

What does a thread actually do for a waiting connection? Two things: it holds state — where we are in the request, what’s been read, what comes next — and it waits for the OS to wake it up when data arrives.

The state is necessary. The thread is not.

State is just data. A small struct in memory — a few hundred bytes describing where a connection is in its lifecycle. A thread, on the other hand, is an execution context. It has an 8MB stack, OS scheduler overhead, and the right to occupy a CPU core. You are paying for a surgeon when all you need is someone to hold the bag.

A connection and a thread are not the same thing. Blocking I/O fused them together. But they were never meant to be the same.
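Here is a sketch of "state is just data", assuming a toy protocol where a request is complete once a blank line arrives. The names Phase, ConnectionState, and on_bytes are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Phase(Enum):
    READING_REQUEST = auto()
    WRITING_RESPONSE = auto()

@dataclass
class ConnectionState:
    fd: int                                   # which connection this is
    phase: Phase = Phase.READING_REQUEST      # where we are in the request
    inbuf: bytearray = field(default_factory=bytearray)   # what's been read
    outbuf: bytearray = field(default_factory=bytearray)  # what comes next

def on_bytes(state, chunk):
    """Advance one connection using whatever bytes just arrived."""
    state.inbuf += chunk
    if state.phase is Phase.READING_REQUEST and b"\r\n\r\n" in state.inbuf:
        state.outbuf += b"HTTP/1.1 204 No Content\r\n\r\n"
        state.phase = Phase.WRITING_RESPONSE

conn = ConnectionState(fd=7)
on_bytes(conn, b"GET / HT")            # half a request: nothing to do yet
print(conn.phase)
on_bytes(conn, b"TP/1.1\r\n\r\n")      # the rest arrives later
print(conn.phase)
```

Between the two calls, no thread is waiting anywhere. The half-read request simply sits in a small object until someone calls on_bytes again.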

How Nginx solved it

The OS already knows which connections have data ready — it has to, it manages the network stack. The old approach asked about one connection at a time: park a thread on it, wait for data, wake up. One thread, one connection, one wait.

Nginx asked differently. Instead of asking about one connection at a time, it hands the kernel a list of thousands of open connections and says “tell me when any of these are ready.” The kernel does the watching. Nginx just waits for the answer, then handles whatever came in. That mechanism is epoll.
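The "tell me when any of these are ready" question can be sketched with Python’s selectors module, which picks epoll on Linux. This assumes 100 local socketpairs as stand-ins for idle client connections. It is not how Nginx itself is written, only the same kernel facility seen from a higher level.

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS

# 100 open "connections", all idle, all watched by a single thread.
pairs = [socket.socketpair() for _ in range(100)]
for index, (_client, server) in enumerate(pairs):
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, data=index)

# Nothing has been sent, so the kernel reports nothing ready.
idle = sel.select(timeout=0)

# Two clients speak up.
pairs[3][0].sendall(b"a")
pairs[42][0].sendall(b"b")

# One call, and the kernel names exactly the connections that need attention.
ready = sorted(key.data for key, _mask in sel.select(timeout=1))
print(len(idle), ready)

sel.close()
for client, server in pairs:
    client.close()
    server.close()
```

The other 98 connections cost one registration each and nothing more.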

Nginx runs one worker process per CPU core. Each worker runs a single, tight loop:

while (true) {
    events = epoll_wait()   // sleep until something is ready
    for each event:
        handle(event)       // do only what's immediately possible, never block
}

Think of it as a waiter moving through the room, serving only tables with raised hands. The rule everything depends on: never block. If handling a connection would require waiting — for a database response, a disk read, more bytes from the client — the worker doesn’t wait. It jots down where it left off in a small state struct and moves on. When data eventually arrives, epoll flags that connection again and the worker picks up exactly where it left off.
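The loop and the never-block rule can be sketched as runnable code. The assumptions: a toy line-based protocol that upper-cases each complete line, Python’s selectors module in place of raw epoll calls, and hypothetical names handle and run_once. A real server would also queue unsent bytes instead of calling send() directly.

```python
import selectors
import socket

def handle(sel, key):
    conn, state = key.fileobj, key.data
    try:
        chunk = conn.recv(4096)       # non-blocking: takes what is there now
    except BlockingIOError:
        return                        # nothing to read after all; move on
    if not chunk:                     # peer closed the connection
        sel.unregister(conn)
        conn.close()
        return
    state["buffer"] += chunk          # jot down where we left off
    while b"\n" in state["buffer"]:   # act only on complete lines
        line, _, rest = bytes(state["buffer"]).partition(b"\n")
        state["buffer"] = bytearray(rest)
        conn.send(line.upper() + b"\n")

def run_once(sel, timeout=1.0):
    # One turn of the loop: sleep until something is ready, then handle it.
    for key, _mask in sel.select(timeout):
        handle(sel, key)

sel = selectors.DefaultSelector()
client, server = socket.socketpair()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, data={"buffer": bytearray()})

client.sendall(b"hel")     # half a line arrives
run_once(sel)              # stored in state; the worker moves on
client.sendall(b"lo\n")    # the rest arrives later
run_once(sel)              # picked up exactly where it left off
reply = client.recv(1024)
print(reply)

sel.close()
client.close()
server.close()
```

A long-running worker would simply wrap run_once in a loop. The per-connection dictionary plays the role of the small state struct.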

That state struct — ngx_connection_t — holds the file descriptor, event handlers, buffer pointers, and a handful of status flags. A few hundred bytes. That’s the entire cost of an open connection sitting idle.

One might ask: if each worker handles events one at a time, isn’t true concurrency still limited to the number of cores? Yes — but that was always true. Apache’s 10,000 threads weren’t running in parallel either; they were mostly parked. The difference is that Nginx makes the parking explicit and cheap. What looks like “handling 10,000 connections” is really: 10,000 connections open, and whichever handful are ready right now getting handled, very fast.

The syscall behind all of this is epoll_wait() — it doesn’t make I/O faster, it makes waiting cheaper. The kernel was already tracking which connections were ready. Nginx just asks it directly, without pinning a thread to each question.

Compare this to the Apache model:

	Apache (thread-per-connection)	Nginx (event loop)
10,000 connections require	10,000 threads	A few worker processes
Memory per idle connection	~8MB virtual (stack)	A few hundred bytes (state)
CPU time goes toward	Context switching + actual work	Mostly actual work
Finding ready connections	Kernel wakes each blocked thread individually	epoll_wait() returns only the ready connections

Nginx’s memory stays close to flat as connections grow because each new connection adds a small piece of state, not a thread. And state is small.

Nginx is not thread-free. It has a thread pool for truly blocking operations like certain disk reads, and its workers are themselves full OS processes. The insight isn’t “threads are always wrong.” It’s narrower and more useful: idle connections don’t deserve dedicated execution contexts. You don’t need a thread to represent waiting. A struct and an event notification are enough.
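The same division of labor (event loop for waiting, a few threads for work that truly blocks) can be sketched with Python’s asyncio. This is an analogy rather than Nginx’s implementation: blocking_read is a hypothetical stand-in for a slow disk read, and the heartbeat task exists only to show the loop staying responsive.

```python
import asyncio
import threading
import time

def blocking_read():
    # Hypothetical stand-in for a read that cannot be made non-blocking.
    time.sleep(0.2)
    return threading.get_ident()

async def main():
    ticks = 0

    async def event_loop_heartbeat():
        # Keeps counting as long as the event loop is free to run.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    heartbeat = asyncio.create_task(event_loop_heartbeat())
    # Hand the blocking call to a pool thread instead of stalling the loop.
    pool_thread = await asyncio.to_thread(blocking_read)
    heartbeat.cancel()
    return pool_thread != threading.get_ident(), ticks

ran_off_loop, ticks_during_read = asyncio.run(main())
print(ran_off_loop, ticks_during_read)
```

The blocking call ran on a different thread, and the loop kept ticking the whole time. Threads are spent only where something would otherwise stall.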

The same pattern, everywhere

Nginx didn’t win by waiting for better hardware, or even for new OS primitives: epoll was already in the Linux kernel before Nginx’s first public release. It won because someone asked a different question: what does a thread actually do for a waiting connection? The answer was: not much that couldn’t be done cheaper.

Once you see this pattern, you start finding it everywhere. Go’s goroutines: lightweight execution contexts that park on I/O without consuming an OS thread underneath. Database connection pools: a handful of real connections multiplexed across many callers. Async/await in Python, JavaScript, Rust — syntax for yielding an execution context while you wait, so something else can run.
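A short sketch of the async/await version of the pattern, assuming asyncio.sleep() as a stand-in for waiting on the network. The coroutine name idle_connection is illustrative.

```python
import asyncio
import threading

async def idle_connection(index, seen_threads):
    # "await" gives the thread back while this connection waits.
    await asyncio.sleep(0.1)
    seen_threads.add(threading.get_ident())
    return index

async def main(count):
    seen_threads = set()
    results = await asyncio.gather(
        *(idle_connection(i, seen_threads) for i in range(count))
    )
    return len(results), len(seen_threads)

handled, threads_used = asyncio.run(main(1000))
print(handled, threads_used)   # 1000 1
```

A thousand waits overlap on a single OS thread. Each one is a small object parked in the event loop, not a stack.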

The assumption was the bottleneck. The hardware was fine the whole time.

Further reading: The C10K Problem — Dan Kegel’s original essay