Why are CDNs cheaper than your origin server?

Ashutosh Singh • May 2, 2026

The common answer is that CDNs are cheaper because they sit closer to your users, cutting network travel time and cost. That’s true. But it’s the least interesting reason. If you stop there, you miss what actually makes them economical at scale.

The full answer goes into hardware architecture, operating system design, and procurement economics.

First, the geography part — but properly

The internet isn’t free at the infrastructure level. Bandwidth costs money, and long-haul transit (traffic crossing oceans, jumping between different network operators) is especially expensive.

When your origin server in Mumbai serves a user in London, that request travels across multiple paid network links. Every hop costs someone money.

A CDN has PoPs (Points of Presence) — server clusters distributed globally, sitting close to end users. Here’s the key insight: the CDN still pays for that expensive Mumbai to London trip, but only once, to pull fresh content from your origin and cache it. After that, thousands of London users get served locally. The long-haul cost is paid once, not per user.

That’s the location advantage. Now for the hardware story.

Two completely different jobs

Origin servers and CDN nodes exist to do fundamentally different things, and that shapes every hardware decision.

An origin server, on every request: checks who you are, hits the database, runs business logic, builds a response, sends it back. The CPU is doing real work each time.

A CDN node: finds the file in cache, streams bytes to the socket. That’s it. No database. No auth. No business logic.

So CDN nodes flip the hardware budget entirely. Modest CPUs, but enormous storage capacity. An origin might run 32 cores with 2TB of SSD. A comparable CDN node might have 8 cores with 200TB of storage. The money goes where the bottleneck actually is.

This also means CDN nodes benefit more from falling storage costs. SSD prices have dropped dramatically year over year. The thing CDN nodes need most is getting cheaper faster than what origin servers need most.

The cache tier structure

CDN nodes don’t treat all cached content the same. They tier it by request frequency:

  • Hot tier (RAM): The top 1% of content by request frequency lives entirely in memory. Zero disk reads, served at memory speed.
  • Warm tier (NVMe SSD): Frequently accessed content that didn’t make the RAM cut. Fast, but a step slower.
  • Cold tier (SATA SSD / HDD): The long tail. Rarely accessed, cheapest storage. Fine because requests are rare anyway.
  • Cache miss: Only on a full miss does the request travel back to origin. At a 95% cache hit rate, the origin handles 5% of total traffic.

This tiering matches storage cost to access frequency. You’re not spending NVMe money on files that get requested once a week. That’s why CDN economics work. It’s not magic; it’s matching the right hardware to the right workload at each level.
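
In code, that lookup order is just a fastest-first probe. Here’s a minimal sketch in C, with hypothetical probe functions standing in for each tier’s index — this is an illustration of the structure, not any particular CDN’s implementation:

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical probe: does this tier's index contain the key? */
    typedef bool (*tier_probe)(const char *key);

    enum tier { TIER_RAM, TIER_NVME, TIER_COLD, TIER_MISS };

    /* Probe tiers fastest-first; only a full miss goes back to origin. */
    enum tier cache_lookup(const char *key, tier_probe probes[], size_t n) {
        for (size_t i = 0; i < n; i++)
            if (probes[i](key))
                return (enum tier)i;    /* probes ordered: RAM, NVMe, cold */
        return TIER_MISS;               /* fetch from origin, then admit */
    }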

The network card problem at scale

Every server has a NIC (Network Interface Card). It’s the physical hardware that connects the server to the internet and handles sending and receiving data.

On a normal server, when a packet arrives at the NIC, it triggers an interrupt — essentially a tap on the OS’s shoulder saying “something came in, deal with it.” The OS wakes up, runs through its network stack (checking packet integrity, managing sequencing, sending acknowledgements), copies the data into memory, and hands it to your application.

This works fine at moderate traffic. But at 100 gigabits per second, which top CDN nodes push, you’re receiving millions of packets every second (even at a full 1,500-byte MTU, 100Gbps works out to roughly 8 million packets per second). The interrupt overhead alone starts consuming a significant fraction of CPU time, before you’ve done anything useful with the data.

At 100Gbps, the cost of just telling the CPU a packet arrived becomes a bottleneck. The solution is to stop telling it, and have it check for itself.

Kernel bypass: skipping the OS entirely

CDN nodes at high throughput use a technique called kernel bypass, implemented with tools like DPDK (Data Plane Development Kit). Cloudflare has written about using kernel-bypass techniques in production.

Instead of waiting for interrupts, a dedicated CPU core runs in a tight loop constantly polling the NIC directly, asking “anything new?” This is called busy polling. The NIC’s memory is mapped directly into the application’s memory space, bypassing the OS kernel entirely.
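
The heart of that receive path, sketched with DPDK’s rte_eth_rx_burst. This is a minimal illustration; a real program would also initialize the EAL, configure the port, and set up packet memory pools:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    /* Busy-poll loop: runs forever on a dedicated core, never sleeps. */
    static void rx_loop(uint16_t port_id) {
        struct rte_mbuf *bufs[BURST_SIZE];
        for (;;) {
            /* Ask the NIC directly for up to 32 packets; no interrupt,
               no kernel. Returns immediately, full or empty. */
            uint16_t n = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
            for (uint16_t i = 0; i < n; i++) {
                /* payload already sits in application-mapped memory */
                rte_pktmbuf_free(bufs[i]);   /* ...process, then release */
            }
        }
    }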

Normal packet path vs. kernel bypass:

  • Normal: NIC → OS interrupt → kernel network stack → copy to kernel memory → copy to app memory → application reads it. Two memory copies. Full kernel involvement.
  • Kernel bypass: NIC → app memory directly. One memory copy. OS skipped entirely.

The tradeoff: that polling core burns 100% CPU even when there’s zero traffic. It never sleeps. This only makes economic sense above a certain throughput threshold where the CPU cost of constant interrupts would be worse. CDN nodes at major locations cross that threshold easily.

Zero-copy file serving

Even without kernel bypass, serving a file the normal way involves unnecessary work:

  1. Read file from disk into kernel memory
  2. Copy from kernel memory into application memory
  3. Copy from application memory into the socket buffer
  4. Kernel sends it over the network

That’s two copies of the file through memory. The copy into app memory is completely pointless for a CDN node that’s just going to pass it straight to the network anyway.

Linux has a syscall called sendfile() that eliminates the detour through application memory. The kernel sends the file directly from the disk page cache to the socket, without ever copying it into application memory.

One fewer copy per response. At millions of requests per second, memory bandwidth is a real constraint. Eliminating copies is eliminating real cost.
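
A minimal sketch of serving a file this way on Linux. Error handling is abbreviated; production code would also deal with EINTR and non-blocking sockets:

    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Stream a file to a connected socket without copying it through
       user space; the kernel moves it from page cache to the socket. */
    static int serve_file(int client_fd, const char *path) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) { close(fd); return -1; }

        off_t off = 0;
        while (off < st.st_size) {
            /* sendfile advances 'off' by the number of bytes sent */
            ssize_t sent = sendfile(client_fd, fd, &off, st.st_size - off);
            if (sent <= 0) { close(fd); return -1; }
        }
        close(fd);
        return 0;
    }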

CPU and memory placement

Modern servers have multiple CPU sockets, each with its own memory bank attached. This layout is called NUMA (Non-Uniform Memory Access): accessing memory attached to a different socket (a cross-NUMA access) is significantly slower than accessing local memory.

CDN nodes are careful about this. The NIC sits on PCIe lanes connected to a specific CPU socket. If the memory holding cached content is on the other socket’s memory bank, every packet transfer pays a cross-NUMA penalty. At 100Gbps, those penalties add up into real throughput loss.

So CDN hardware and software are designed to keep the NIC, the CPU processing that NIC’s traffic, and the memory holding the cached content all on the same socket. This is called NUMA locality.

Related: IRQ pinning. You can control exactly which CPU core handles interrupts from which NIC port. CDN operators pin this explicitly, then pin worker threads to the same cores, so data stays hot in L1/L2 cache and threads don’t migrate between cores and lose that warmth.
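
The thread half of that pinning, sketched with Linux’s pthread_setaffinity_np. The core number is whatever the operator mapped to the NIC’s interrupts; the IRQ half is configured separately through /proc:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin a worker thread to one core so it never migrates away from
       the caches, and the NUMA node, it shares with the NIC's IRQs. */
    static int pin_to_core(pthread_t thread, int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(thread, sizeof(set), &set);
    }

    /* The matching IRQ side is done outside the program, e.g.:
       echo <cpu-mask> > /proc/irq/<irq-number>/smp_affinity */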

OS tuning: the details that compound

A CDN node running default Linux kernel settings would leave significant performance on the table. The OS gets tuned everywhere:

TCP buffer sizes. Default Linux TCP buffers are conservatively small. You don’t want one connection eating all memory on a general-purpose server. A CDN node handling 500,000 concurrent connections needs much larger per-connection buffers, with memory pressure thresholds tuned so the kernel doesn’t start dropping connections under load.
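
The per-connection half can be requested from the application side with setsockopt. A sketch, with an illustrative 16MB figure; the kernel silently caps these values at the net.core.rmem_max and net.core.wmem_max sysctls, which have to be raised first:

    #include <sys/socket.h>

    /* Request larger per-socket buffers. Caveat: setting SO_RCVBUF
       explicitly also disables the kernel's receive autotuning. */
    static int widen_buffers(int fd) {
        int sz = 16 * 1024 * 1024;   /* 16MB, illustrative */
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &sz, sizeof sz) < 0)
            return -1;
        return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sz, sizeof sz);
    }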

Huge pages. Your OS doesn’t work with raw memory addresses directly. It divides all memory into fixed-size chunks called pages (think of them as numbered slots) and maintains a lookup table mapping each slot to where it actually lives in physical RAM. The CPU keeps recently used mappings in a small, fast hardware cache called the TLB (Translation Lookaside Buffer). When you access memory, the CPU checks the TLB first. If the mapping isn’t there (a TLB miss), it has to fetch it from the full table in RAM, which is slow. The default page size on Linux is 4KB. When a CDN node holds terabytes of cached content, that’s billions of pages, far more TLB entries than the CPU can cache at once, so TLB misses become constant. Huge pages bump the size to 2MB per page, so the same terabyte of memory needs 512x fewer TLB entries, and the CPU can actually keep up.
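
On Linux, one way to get 2MB pages for a cache arena is mmap with MAP_HUGETLB. A sketch, assuming huge pages were reserved beforehand (e.g. via the vm.nr_hugepages sysctl):

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stddef.h>

    /* Back a large allocation with 2MB huge pages: each TLB entry now
       covers 512x more memory than a default 4KB page would.
       'bytes' must be a multiple of the huge page size. */
    static void *alloc_huge(size_t bytes) {
        void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }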

Connection table sizes. The kernel maintains a lookup table for every active TCP connection. At millions of concurrent connections, the default table size is too small and the hash table backing it degrades. Operators tune this explicitly.

Utilization and procurement

Two more cost levers that compound everything above.

Utilization. Origin servers run at 20-40% average CPU to preserve headroom. A 2x traffic spike means 2x compute load, and if you’re already at 80% you’re in trouble. CDN nodes don’t have this problem. A 2x traffic spike mostly means 2x network throughput, not 2x compute. NICs are built with significant throughput headroom by design; CPUs are not. So CDN nodes run at higher effective utilization, meaning fewer servers serving the same traffic.

Commodity procurement. Because every CDN node does the same job, operators design one server spec and buy it in massive quantities. Cloudflare or Akamai buying 100,000 identical machines negotiates pricing no single company’s origin fleet can match. Origin servers have heterogeneous needs (GPU nodes, high-memory database servers, different specs per team) that fragment purchasing power.

Failure tolerance. When a CDN node dies, traffic reroutes. No incident. CDN nodes are stateless and interchangeable by design. So operators run cheaper commodity hardware with higher failure rates, accepting that individual machines will die, and building redundancy in software rather than buying it in hardware. Software redundancy is dramatically cheaper than hardware reliability.

The full picture

Layer by layer:

  • Geography: per-user long-haul transit cost is eliminated once the cache is warm.
  • Hardware profile: cheap compute, massive storage; spend goes where the bottleneck is.
  • Kernel bypass: the OS is skipped entirely at high throughput via DPDK.
  • Zero-copy I/O: sendfile() eliminates the pointless memory copy on every response.
  • NUMA + IRQ pinning: NIC, CPU, and memory are kept on the same socket; no cross-socket penalties.
  • Utilization: nodes run hotter; fewer servers per unit of traffic served.
  • Procurement: one server spec, bought at scale, at commodity pricing.

The origin server is a general-purpose computer that happens to serve HTTP. The CDN node is a specialized byte-moving machine. Every layer of the stack, from hardware to OS to network card, is co-designed to do one thing: take a file and stream it to as many people as possible, as cheaply as possible.

“Closer to the user” is just the first sentence.