Development · 3 min read

The Architecture That Handled 10K Requests Per Second

How we handled 10,000 requests per second when event traffic came flooding in.

Watching the Numbers Flood the Grafana Dashboard

Promo launch day. 12,000 requests per second hit us. 20% more than projected. But the servers held. Average API response time: 150ms. Error rate: 0.02%.

This is the story of how we prepped in 3 weeks.

On a Normal Day, We Did 200 RPS

Our service normally sat at about 200 requests per second. Simple setup: 2 EC2 instances, 1 RDS, 1 Redis. Then marketing planned a massive promotion. Expected concurrent users: 50,000. Peak traffic: 10,000 RPS. 50x our normal load. 3 weeks until D-day.

Load Testing Came First

Ran load tests with k6. At 1,000 RPS, API response times started exceeding 2 seconds. At 2,000, the DB connection pool was exhausted. At 3,000, the server died.

The current setup maxed out at about 800 RPS. Simply adding more servers wouldn't cut it.
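The real tests ran on k6, but the idea fits in a few lines of Python: fire concurrent requests at a handler, collect latencies, read off a percentile. A toy sketch against a stub handler (the names and numbers here are illustrative, not our actual test plan):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure(handler, total_requests=200, concurrency=20):
    """Fire `total_requests` calls at `handler` concurrently; return sorted latencies in ms."""
    def timed_call(_):
        start = time.perf_counter()
        handler()
        return (time.perf_counter() - start) * 1000
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sorted(pool.map(timed_call, range(total_requests)))

# Stub handler standing in for a real HTTP call
latencies = measure(lambda: time.sleep(0.001))
p95 = latencies[int(len(latencies) * 0.95)]
```

Ramping `concurrency` up in stages is what exposed the 1,000 / 2,000 / 3,000 RPS cliffs.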

The DB Broke First

RDS had max_connections at the default of 150. Two app servers each holding 50 pooled connections -- that's already 100. Add one more server and we'd be at the cap. No headroom.

Added a Read Replica and distributed read queries. Since reads like the promo page product listing made up 80% of traffic, the impact was huge. Hot data went into Redis -- promo product listings and inventory counts cached with a 10-second TTL. DB queries dropped by 80%.
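The caching here is plain cache-aside: check the cache, fall back to the DB on a miss, store the result with a short TTL. In production the cache was Redis; the sketch below uses an in-memory stand-in, and the key name is illustrative:

```python
import time

class TTLCache:
    """Minimal cache-aside helper with a per-key TTL (in-memory stand-in
    for what Redis did in production with a 10-second TTL)."""
    def __init__(self, ttl_seconds=10):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = time.monotonic()
        hit = self.store.get(key)
        if hit and hit[0] > now:
            return hit[1]          # cache hit: the DB never sees this request
        value = loader()           # cache miss: query the DB (read replica)
        self.store[key] = (now + self.ttl, value)
        return value

db_calls = []
def load_products():
    db_calls.append(1)             # stands in for the real DB query
    return ["item-a", "item-b"]

cache = TTLCache(ttl_seconds=10)
first = cache.get_or_load("promo:products", load_products)
second = cache.get_or_load("promo:products", load_products)
```

A 10-second TTL means inventory counts can be up to 10 seconds stale on the listing page, which was acceptable because the decrement path (below) stayed authoritative.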

(We expected the DB to be the bottleneck. It just broke sooner than expected.)

Race Condition Hit on Inventory

Limited stock: 1,000 items. Concurrent decrements mean race conditions. Optimistic locking had too high a failure rate; pessimistic locking killed performance.

We went with Redis DECR for atomic inventory decrements. Return value >= 0 means success, negative means failure. DB updates happen asynchronously. This approach let the inventory API handle up to 50,000 RPS.
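With redis-py the core of this is one call, roughly `remaining = r.decr("promo:stock")`. To keep the sketch self-contained (and to show why DECR's atomicity matters under concurrency), here's the same logic against a thread-safe in-memory stand-in; key names and thread counts are illustrative:

```python
import threading

class FakeRedis:
    """Thread-safe stand-in for Redis DECR (production used redis-py)."""
    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def decr(self, key):
        with self._lock:           # models DECR's single-threaded atomicity
            self._value -= 1
            return self._value

def try_reserve(client, key="promo:stock"):
    """Atomic decrement: a non-negative result means the reservation succeeded.
    On success, the DB write is queued asynchronously; on failure, stock is gone."""
    return client.decr(key) >= 0

stock = FakeRedis(initial=1000)
results = []

def worker():
    for _ in range(50):
        results.append(try_reserve(stock))

threads = [threading.Thread(target=worker) for _ in range(30)]
for t in threads:
    t.start()
for t in threads:
    t.join()
successes = sum(results)
```

1,500 concurrent attempts against 1,000 items, and exactly 1,000 succeed -- no oversell, no lock contention on the DB.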

Knocked Out the Smaller Bottlenecks Too

JWT validation was hitting the DB on every request for blacklist checks. Moved that to Redis. Added a CDN -- not just for static assets but also the promo page HTML, served with a short TTL.
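The blacklist check reduces to a single key lookup on the token's ID claim (`jti`) instead of a DB query per request. A sketch with an in-memory stand-in for Redis EXISTS/SET; the `blacklist:` key prefix is a hypothetical naming choice:

```python
class FakeRedis:
    """In-memory stand-in for the Redis SET/EXISTS commands used below."""
    def __init__(self):
        self._keys = {}

    def set(self, key, value):
        self._keys[key] = value

    def exists(self, key):
        return 1 if key in self._keys else 0

def is_token_revoked(client, jti):
    """O(1) revocation check against Redis instead of a per-request DB query.
    With redis-py this would be client.exists(f"blacklist:{jti}")."""
    return bool(client.exists(f"blacklist:{jti}"))

r = FakeRedis()
r.set("blacklist:abc123", "1")   # revoke one hypothetical token ID
```

Revocation entries can carry a TTL matching the token's remaining lifetime, so the blacklist cleans itself up.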

Reducing the requests that even reach origin is the most effective optimization.

Set up an Auto Scaling Group with scale-out at 60% CPU. Min 4 instances, max 12. But EC2 takes 2-3 minutes to boot, so it can't keep up with sudden spikes. Added scheduled scaling to pre-spin 8 instances 30 minutes before promo start.
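The scale-out rule itself is simple enough to state as a function: add instances when average CPU crosses the threshold, clamped to the group's min/max. The real policy lived in an AWS Auto Scaling Group; this toy version (step size and parameter names are assumptions, not our exact config) just makes the arithmetic explicit:

```python
def desired_capacity(current, cpu_percent, scale_out_threshold=60,
                     min_size=4, max_size=12, step=2):
    """Toy scale-out rule: add `step` instances when average CPU is at or
    above the threshold, then clamp to the group's [min_size, max_size]."""
    if cpu_percent >= scale_out_threshold:
        current += step
    return max(min_size, min(max_size, current))
```

The clamp is why scheduled scaling was still needed: even a perfect rule can't outrun a 2-3 minute EC2 boot, so capacity has to be in place before the spike arrives.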

D-Day Results

Scaled to 10 servers. Redis cache hit rate: 95%. All 1,000 items sold out in 47 seconds. Zero errors during inventory decrement.

Nothing Fancy

Three takeaways: running into high traffic without load testing is gambling. Bottlenecks always break at the weakest link. Caching and async processing solve most problems.

Handling 10,000 requests per second isn't about exotic technology -- it's about doing the basics thoroughly.
