
Distributed Caching

Scaling caches across nodes: consistent hashing, replication, cache clusters, and handling node failures without cache avalanche.

12 min read · High interview weight

Why Distribute a Cache?

A single cache node has finite RAM. Once your working set exceeds a single machine's memory — or your QPS exceeds what one machine can handle — you need to shard the cache across multiple nodes. Distributed caching adds the full complexity of distributed systems: how do you route requests to the right node? What happens when a node fails? How do you add capacity without invalidating the entire cache?

Naive Sharding: Modulo Hashing

The simplest sharding approach: `node_index = hash(key) % num_nodes`. Every client computes the same formula and routes to the same node. The problem: when you add or remove a node, `num_nodes` changes, and nearly every key maps to a different node. All cached data is effectively invalidated simultaneously — the database gets crushed by a wave of cache misses. This is a cache avalanche.

⚠️

Cache avalanche from naive resharding

Growing a cluster from N to N+1 nodes with modulo hashing remaps approximately N/(N+1) of all keys — about 91% for a 10-node cluster. This is catastrophic for production systems. Consistent hashing solves this by minimizing remapping on topology changes.
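The remapping fraction is easy to verify empirically. A minimal sketch (the key names are arbitrary) comparing node assignments before and after growing a 10-node cluster to 11:

```python
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    """Modulo sharding: stable hash of the key, mod cluster size."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(node_for(k, 10) != node_for(k, 11) for k in keys)
print(f"{moved / len(keys):.1%} of keys remapped")  # roughly 90.9%
```

Note the hash must be stable across processes — Python's built-in `hash()` is randomized per run, which is why the sketch uses `hashlib`.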

Consistent Hashing

Consistent hashing places both nodes and keys on a circular hash ring (0 to 2^32). Each key is assigned to the nearest node clockwise on the ring. When a node is added or removed, only the keys between it and its predecessor on the ring are remapped — approximately `1/N` of all keys for an N-node cluster. This minimizes cache invalidation during scaling.

[Diagram] Consistent hash ring: each node owns the keys between its predecessor and itself. Adding a node remaps only ~1/N of keys.

Virtual nodes (vnodes): A single physical node is represented by multiple positions on the ring (e.g., 150 virtual nodes per physical node). This improves load distribution — without vnodes, an uneven ring means some nodes are responsible for a much larger key range than others. Vnodes also make it easy to assign more ring positions to higher-capacity nodes. Redis Cluster takes a related approach: 16,384 fixed hash slots assigned to nodes — not a hash ring, but it achieves the same goal of minimal remapping when the topology changes.
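The ring with vnodes fits in a few dozen lines. A minimal sketch (node names and the vnode count are illustrative; a production ring would also handle replication and concurrency):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hash ring with virtual nodes, kept as a sorted list."""

    def __init__(self, nodes, vnodes: int = 150):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        # Each physical node gets `vnodes` positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wrap past the end).
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
before = {f"user:{i}": ring.get_node(f"user:{i}") for i in range(10_000)}
ring.add_node("cache-d")
moved = sum(before[k] != ring.get_node(k) for k in before)
print(f"{moved / len(before):.1%} of keys remapped")  # roughly 1/4
```

Adding the fourth node remaps only about 1/4 of keys — the keys the new node now owns — instead of the ~75% that `hash % 4` would remap.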

Replication in Distributed Caches

Sharding partitions keys across nodes but does not protect against node failure. Replication provides that protection: each primary node has one or more replicas that hold a copy of the same keys. Reads can be served by replicas (increased throughput); writes go to the primary and are asynchronously replicated.

| Replication Model | Write Path | Read Path | Failure Behavior | Example |
|---|---|---|---|---|
| Primary-Replica | Write to primary only | Primary or replicas | Replica promoted to primary on failover | Redis Sentinel |
| Multi-Primary | Write to any primary | Any node | Partition tolerance; conflict resolution needed | Redis Cluster with multiple primaries |
| Quorum writes | Write to W nodes; succeed when W ack | Read from R nodes | Tunable consistency via W + R > N | Dynamo-style systems |
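The quorum condition W + R > N guarantees that any read quorum overlaps the most recent write quorum in at least one replica. A one-liner makes the arithmetic concrete (the parameter names follow the Dynamo convention):

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """With N replicas, W write acks and R read responses:
    W + R > N means every read quorum intersects every write quorum."""
    return w + r > n

print(quorums_overlap(3, 2, 2))  # True — a common default (N=3, W=2, R=2)
print(quorums_overlap(3, 1, 1))  # False — a read may miss the latest write
```

Tuning W and R trades latency for consistency: W=1 makes writes fast but risks stale reads unless R is raised to compensate.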

Cache Avalanche vs Cache Stampede

Three catastrophic distributed-cache failure modes are often confused:

| Failure Mode | Cause | Effect | Prevention |
|---|---|---|---|
| Cache Stampede | One popular key expires; many concurrent requests miss at once | Database spiked by N simultaneous queries for one key | Mutex lock, stale-while-revalidate, jittered TTL |
| Cache Avalanche | Many keys expire simultaneously (e.g., after a cold restart, or naive resharding) | Database flooded by a massive miss wave across all keys | Staggered TTLs, warm-up strategies, gradual traffic shifting |
| Cache Penetration | Queries for keys that never exist (every request misses) | Database queried on every request; cache offers no protection | Bloom filter to reject invalid keys; null-value caching |
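The stampede preventions in the first row can be sketched together: a per-key mutex so only one caller recomputes, plus a jittered TTL so keys written together don't expire together. The `cache.get`/`cache.set` interface and `fetch` callback are assumptions, not a specific library's API:

```python
import random
import threading

_locks: dict[str, threading.Lock] = {}

def jittered_ttl(base: int, spread: float = 0.2) -> int:
    """Randomize the TTL by ±20% to stagger expirations."""
    return int(base * random.uniform(1 - spread, 1 + spread))

def get_with_lock(key: str, fetch, cache, base_ttl: int = 300):
    """On a miss, only one thread recomputes; others wait, then re-read."""
    value = cache.get(key)
    if value is not None:
        return value
    lock = _locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)  # double-check: another thread may have filled it
        if value is None:
            value = fetch()  # exactly one DB query for this key
            cache.set(key, value, ttl=jittered_ttl(base_ttl))
    return value
```

A production variant would also bound the `_locks` map and use a distributed lock (or stale-while-revalidate) when multiple app servers are involved; this sketch covers the single-process case.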

Cache Penetration and Bloom Filters

Cache penetration is a denial-of-service pattern: an attacker queries IDs that don't exist (e.g., `user:-1`, `product:99999999`). Every request is a cache miss and hits the database. The cache provides zero protection. The solution: use a Bloom filter — a probabilistic data structure that quickly answers 'definitely does not exist' or 'probably exists.' A Bloom filter for all valid user IDs can be loaded into the application and checked before any cache or database call.

```python
# Bloom filter check before cache/DB lookup
# (uses the pybloom_live package)
from pybloom_live import ScalableBloomFilter

# At startup, populate the filter with every valid user ID
user_bloom = ScalableBloomFilter()
for user_id in db.query("SELECT id FROM users"):
    user_bloom.add(user_id)

def get_user(user_id: str) -> User:
    # Reject impossible IDs without hitting cache or DB.
    # Bloom filters have no false negatives: "not in" is definitive.
    if user_id not in user_bloom:
        raise NotFoundError("User does not exist")

    # Safe to check the cache now
    cached = cache.get(f"user:{user_id}")
    if cached:
        return deserialize(cached)

    # Cache miss — fetch from the database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    if not user:
        # Bloom filter false positive: the ID passed the filter
        # but doesn't exist in the database
        raise NotFoundError("User does not exist")
    cache.set(f"user:{user_id}", serialize(user), ttl=300)
    return user
```

Hot Key Problem

Consistent hashing distributes keys evenly on average, but some keys receive disproportionate traffic — a celebrity's profile page, a trending news article, or a flash sale product. A single node handling millions of requests per second for one key becomes a hot spot regardless of how evenly other keys are distributed.

Solutions for hot keys:

  • Local in-process cache: Put the hot key in every application server's in-process cache. Reads never leave the process — zero network overhead.
  • Key replication: Shard the hot key into multiple copies (`product:trending#0`, `product:trending#1` ... `#N`) and distribute them across nodes. Reads pick a random shard.
  • Read replicas: Route read traffic for the hot key's primary node to its replicas, distributing the load across multiple physical machines.
  • Redis Cluster read-from-replica: Set `READONLY` on replica connections to allow reads, reducing primary load.
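The key-replication bullet above can be sketched in a few lines — write every shard copy, read a random one. The shard count and key format are illustrative, and the `cache` object is an assumed `get`/`set` interface:

```python
import random

N_SHARDS = 8  # number of copies of the hot key across the cluster

def hot_key_write(cache, key: str, value, ttl: int = 60):
    """Write the value to every shard copy so all replicas stay fresh.
    Each suffixed key hashes to a different ring position, so the
    copies land on different nodes."""
    for i in range(N_SHARDS):
        cache.set(f"{key}#{i}", value, ttl=ttl)

def hot_key_read(cache, key: str):
    """Read a random shard copy — traffic spreads across N_SHARDS nodes."""
    return cache.get(f"{key}#{random.randrange(N_SHARDS)}")
```

The trade-off: writes are N_SHARDS times more expensive and the copies can briefly diverge, which is usually acceptable for read-heavy hot keys like a trending product page.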

Gradual Cache Warm-Up

After a cold start (cluster restart, major resharding, or scaling out), the cache is empty. Sending all production traffic immediately causes a database avalanche. Use gradual warm-up: start by sending a small percentage of traffic to the new cluster while the old cluster handles the rest. As the hit rate climbs (monitor `cache_hits / (cache_hits + cache_misses)`), shift more traffic. Alternatively, pre-warm by replaying recent database reads against the new cluster before shifting traffic.
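A sketch of hit-rate-driven traffic shifting (the thresholds and ramp shape are illustrative policy choices, not a standard formula):

```python
def hit_rate(hits: int, misses: int) -> float:
    total = hits + misses
    return hits / total if total else 0.0

def traffic_share(rate: float) -> float:
    """Fraction of traffic to route to the new cluster, given its hit rate.
    Trickle traffic while the cache is cold, then ramp up linearly."""
    if rate < 0.5:
        return 0.05  # 5% warms the cache without overwhelming the DB
    return min(1.0, 0.05 + (rate - 0.5) * 1.9)  # ~100% near full hit rate

print(traffic_share(hit_rate(10, 990)))  # cold cache: stays at 0.05
print(traffic_share(hit_rate(950, 50)))  # warm cache: close to 1.0
```

In practice this loop runs in a load balancer or feature-flag system, re-evaluating the share every few minutes as the `cache_hits / (cache_hits + cache_misses)` metric climbs.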

💡

Interview Tip

Distributed caching questions come up in senior system design interviews for products with large datasets. Cover these four areas: (1) Sharding — consistent hashing to minimize remapping; (2) Replication — primaries + replicas for fault tolerance; (3) Failure modes — cache stampede (one key), cache avalanche (mass expiry), cache penetration (invalid keys); (4) Hot keys — local in-process caching or key replication. Mentioning Bloom filters for cache penetration is a high-signal answer that most candidates don't give.
