A Collection of Caching Mistakes I've Made
Caching looks simple until you mess up invalidation, consistency, and memory management. I thought Redis would solve everything.
Caching makes things faster -- true, but only half the story
Our API response times were slow, so we added caching. The product listing API averaged 1,200ms, and after adding Redis caching, it dropped to 45ms. 26x faster. Happy ending so far.
What happened next was the problem.
Mistake 1: No TTL
I didn't set a cache expiration time at first. "I'll just invalidate it when the data changes," I thought. But I missed the invalidation logic in one place. A product price changed, but the old price stayed in cache. A customer ended up paying 19,800 won for a 23,900 won item.
It took 3 days to discover this. During that time, 17 orders went through at the wrong price. The company ate the difference.
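The lesson here is that a TTL is a safety net, not a replacement for invalidation: even if one invalidation path is missed, stale data can only survive until the key expires. A minimal read-through sketch, assuming a redis-py-compatible client (the function name and `db_fetch` loader are my own, not from the original code):

```python
import json

def get_product(client, product_id, db_fetch, ttl_seconds=300):
    """Read-through cache with a TTL as a safety net.

    `client` is any redis-py-compatible client (get/setex).
    Even if an invalidation path is missed, a stale price can
    live at most `ttl_seconds` before the key self-destructs.
    """
    key = f"product:{product_id}"
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    product = db_fetch(product_id)
    # setex = SET with expiry: the key is evicted after the TTL
    client.setex(key, ttl_seconds, json.dumps(product))
    return product
```

With a 5-minute TTL, the wrong price in Mistake 1 would have lingered for minutes instead of 3 days.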
Mistake 2: Lazy cache key design
I cached the product list under the key products:list. But there was pagination. Whether it was page 1 or page 5, they all hit the same cache key, so only page 1 data ever showed up.
Changed it to products:list:page:1:size:20. Then realized the sort condition was missing. Price-sorted and newest-first were sharing the same cache. Changed it to products:list:page:1:size:20:sort:price. Then realized filters were missing too...
(If you design cache keys lazily, you end up in this endless cycle of patching.)
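One way out of that cycle is to build the key mechanically from every parameter that changes the result, so adding a filter can't silently be forgotten. A sketch of the idea (the function name and the filter-hashing scheme are my own, not from the original code):

```python
import hashlib
import json

def product_list_key(page, size, sort, filters=None):
    """Build a cache key from *every* parameter that affects the result.

    Filters are serialized with sorted keys, so the same filter set
    always produces the same key regardless of argument order, and
    hashed so the key stays short no matter how many filters exist.
    """
    base = f"products:list:page:{page}:size:{size}:sort:{sort}"
    if filters:
        canonical = json.dumps(filters, sort_keys=True)
        digest = hashlib.sha1(canonical.encode()).hexdigest()[:12]
        base += f":filters:{digest}"
    return base
```

Any new query parameter then has exactly one place to be added, instead of one patch per forgotten dimension.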
Mistake 3: Cache stampede
The moment the product list cache expires, if 200 users are connected simultaneously, 200 DB queries fire at once. This is called a cache stampede. After setting the cache TTL to 5 minutes, DB load spiked every 5 minutes like clockwork.
Solutions include proactively refreshing the cache before expiry, or using a lock so only one request queries the DB. I went with the lock approach, which meant spending another day implementing Redis locks.
Mistake 4: No memory management
I didn't set maxmemory on Redis. Cache data kept piling up until it consumed all 8GB of server memory. Redis died. With the cache down, every request hit the DB directly, and the DB buckled under the load too.
Set maxmemory to 2GB and switched the eviction policy to allkeys-lru. This is covered in the Redis basic setup guide, which I had skipped. My fault.
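For reference, both settings are standard redis.conf directives (they can also be changed at runtime with CONFIG SET):

```
# redis.conf
maxmemory 2gb
# evict the least-recently-used key across ALL keys when memory is full
maxmemory-policy allkeys-lru
```

With allkeys-lru, Redis behaves like a bounded cache instead of growing until it kills the server; the trade-off is that any key, even a frequently-refreshed one, can be evicted under memory pressure.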
Caching isn't simple
"Slap Redis on it and it'll be fast" -- that's true. But if you don't handle invalidation, consistency, key design, and memory management properly, the cache creates bigger problems than it solves.
Honestly, optimizing the DB queries first might have been the right move. The 1,200ms query itself was the real problem, and I just papered over it with a cache. An index might have brought it down to 200ms. No proof, but that nagging feeling hasn't gone away.