A Practical Guide to Redis Caching Strategies
The trial and error of adding Redis as a cache layer, and how I chose the right strategy.
A Letter to My Past Self Who Said "Just Add a Cache"
Our main API had an average response time of 800ms. Joining multiple PostgreSQL tables, checking permissions, transforming data — the pipeline was complex. The DB query itself was 200ms, but permission checks, data transformation, and serialization at the application level ate up the other 600ms.
Adding a cache isn't the hard part. Managing it well is. I learned this the hard way over the course of a year.
Started with Cache-Aside
The most common pattern: when a request comes in, check Redis first; on a miss, query the DB and store the result in Redis. I set the TTL to 5 minutes, plugged it in, and response time dropped from 800ms to 50ms.
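The read path can be sketched in a few lines. This is a minimal illustration, not our production code: FakeRedis is an in-memory stand-in for a real redis-py client (so the example runs without a server), and load_from_db is a hypothetical loader representing the slow query-plus-transform pipeline.

```python
import json
import time

class FakeRedis:
    """In-memory stand-in for a Redis client (demo assumption)."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() > expires_at:      # honor the TTL like Redis would
            del self._data[key]
            return None
        return value

    def set(self, key, value, ex=None):
        self._data[key] = (value, time.time() + (ex or 10**9))

CACHE_TTL = 300  # 5 minutes, as in the original setup

def get_user(user_id, cache, load_from_db):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:                # hit: skip the DB entirely
        return json.loads(cached)
    user = load_from_db(user_id)          # miss: fall through to the DB
    cache.set(key, json.dumps(user), ex=CACHE_TTL)
    return user
```

With a real client the calls are the same shape: redis-py's `get` and `set(key, value, ex=300)`.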
Then problems hit immediately. An admin updated some data, but the old data kept showing for 5 minutes. "I just edited this but it's not reflecting" — that support ticket came in about 3 times a day.
The Cache Invalidation Swamp
Dropped the TTL to 30 seconds. Fewer complaints, but the cache hit rate fell to 60%. DB load climbed right back up.
I ended up mixing in Write-Through: when data is modified, the DB write and the cache update happen in the same code path. The code got more complex, but I could extend the TTL to 10 minutes while still keeping things up to date.
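The write path then looks roughly like this, hedged the same way: FakeRedis is a toy in-memory client and save_to_db is a hypothetical persistence function standing in for the real update query.

```python
import json

class FakeRedis:
    """Tiny in-memory stand-in for a Redis client (demo assumption)."""
    def __init__(self):
        self.store = {}

    def set(self, key, value, ex=None):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

def update_user(user_id, fields, cache, save_to_db):
    # Write-through: persist to the DB first, then refresh the cached
    # copy in the same code path, so reads never serve stale data.
    user = save_to_db(user_id, fields)
    cache.set(f"user:{user_id}", json.dumps(user), ex=600)  # 10-minute TTL
    return user
```

The point is that the cache refresh lives next to the DB write, in one function, instead of being something every caller has to remember.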
But then cache update logic got scattered across multiple places, and one spot missed the update — instant bug. There's a reason they say cache invalidation is one of the hardest problems in computer science.
(I once spent an entire day tracking down "why isn't the data changing?" It turned out one cache update line was missing.)
Design Your Cache Keys Poorly and You'll Pay
Initially I kept keys simple, like user:123. But the same user needed different data depending on the requester's role. Admins see all fields; regular users see only public ones.
Changed the key to user:123:role:admin, but with 3 roles and 10,000 users, that's 30,000 cache entries. Redis memory spiked. I eventually split public and private data into separate cache entries and assembled them at response time.
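The split-and-assemble approach can be sketched as follows. Again an illustration under assumptions: FakeRedis is an in-memory stand-in, and load_public / load_private are hypothetical loaders for the two field sets.

```python
import json

class FakeRedis:
    """Tiny in-memory stand-in for a Redis client (demo assumption)."""
    def __init__(self):
        self.store = {}

    def set(self, key, value, ex=None):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

def cached(cache, key, loader, ttl=600):
    """Small cache-aside helper: read-through on miss."""
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)
    value = loader()
    cache.set(key, json.dumps(value), ex=ttl)
    return value

def get_user_view(user_id, role, cache, load_public, load_private):
    # One public entry per user, shared by every role; private fields
    # live in a second entry fetched only for admins. The role is no
    # longer part of the key, so entries stop multiplying per role:
    # at most 2 entries per user instead of one per user-role pair.
    view = cached(cache, f"user:{user_id}:public",
                  lambda: load_public(user_id))
    if role == "admin":
        view = {**view, **cached(cache, f"user:{user_id}:private",
                                 lambda: load_private(user_id))}
    return view
```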
If I'd designed the cache keys properly from the start, I could've avoided this whole detour.
Cache Stampede Nearly Killed the DB
When traffic peaks and a cache TTL expires, dozens of requests simultaneously hit the DB. That's a cache stampede. We actually had our DB connection pool exhausted because of this.
Applied the mutex pattern. When a cache entry expires, only one request queries the DB while the rest wait briefly and then read the refreshed cache. Implemented with Redis SETNX. Simple, but remarkably effective.
Without Monitoring, It's All Pointless
After adding Redis, the three metrics I watched most closely: cache hit rate, memory usage, and key count.
When the hit rate dropped below 80%, I'd revisit the TTL or caching strategy. I wrote a script that periodically checks the hit rate via the INFO stats command and sends Slack alerts.
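The hit-rate check itself is just arithmetic over two counters that INFO stats really does expose: cumulative keyspace_hits and keyspace_misses. A sketch of the alert logic, where send_slack is a hypothetical notifier (a webhook call in practice); with redis-py the counters come from `r.info("stats")`.

```python
def hit_rate(info_stats):
    # INFO stats gives cumulative counters since server start:
    # hit rate = keyspace_hits / (keyspace_hits + keyspace_misses).
    hits = info_stats["keyspace_hits"]
    misses = info_stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 1.0

def check_and_alert(info_stats, send_slack, threshold=0.80):
    """Alert when the hit rate falls below the 80% line mentioned above."""
    rate = hit_rate(info_stats)
    if rate < threshold:
        send_slack(f"Redis hit rate {rate:.1%} is below {threshold:.0%}")
    return rate
```

One caveat from running this: the counters are cumulative, so for a meaningful periodic check you want the delta between two samples, not the lifetime ratio.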
Lessons After a Year of Running It
Caching isn't a silver bullet for performance; it introduces new complexity of its own. Optimizing the DB queries should come first. Only consider caching when that's not enough.
If you do add it, build the invalidation strategy and monitoring at the same time as the cache implementation. "We'll do it later" inevitably leads to data inconsistencies that hurt users.