
Every web developer eventually hits the same wall. The application works fine in development, handles the demo smoothly, and then falls apart the moment real users show up. Response times climb from 200ms to 2 seconds. The database starts sweating. Someone suggests “just add caching,” and suddenly you’re staring at a problem that looks simple on the surface but hides a decade’s worth of engineering pitfalls underneath.

Caching is not a single technique. It’s a stack of interdependent layers, each with its own behavior, failure modes, and invalidation headaches. Get it right, and your application handles 100x the traffic without breaking a sweat. Get it wrong, and you’ll spend your weekends debugging stale data bugs that only appear in production.

This article walks through the full caching stack for modern web applications in 2026 — from the browser all the way down to the database query cache — with specific implementation details, real configuration examples, and the bugs that will bite you if you’re not careful.

The Multi-Layer Caching Model

Think of caching as a series of checkpoints between a user’s browser and your database. Each layer intercepts requests and tries to serve a response without passing the request further down the stack:

Browser Cache -> CDN Edge -> Reverse Proxy (Nginx/Varnish) -> Application Cache (Redis) -> ORM/Query Cache -> Database Buffer Pool

Every layer has different characteristics. Browser caches are per-user and controlled by HTTP headers. CDN caches are shared across users but geographically distributed. Application caches like Redis give you programmatic control but require explicit management. Database caches are mostly automatic but limited in what they can optimize.

The key insight is that each layer serves a different purpose, and skipping any one of them creates a bottleneck that the others can’t compensate for. A Redis cache won’t help if every user is downloading the same 500KB JavaScript bundle on every page load. A CDN won’t help if your API responses are personalized and uncacheable at the edge.

Browser Caching: The Most Underestimated Layer

Browser caching is free performance. You don’t need infrastructure, you don’t need Redis clusters, and you don’t need a CDN contract. You just need the right HTTP headers.

The two headers that matter most are Cache-Control and ETag. Here’s how they work in practice:

# Nginx configuration for static assets
location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff2)$ {
    expires 1y;
    add_header Cache-Control "public, max-age=31536000, immutable";
    add_header Vary "Accept-Encoding";
}

# API responses that change infrequently
location /api/v1/categories {
    add_header Cache-Control "public, max-age=3600, stale-while-revalidate=86400";
    add_header ETag $upstream_http_etag;
}

# User-specific content -- never cache at shared layers
location /api/v1/me {
    add_header Cache-Control "private, no-store";
}

The immutable directive is something many teams still overlook. Without it, browsers will send conditional requests (If-None-Match) even for assets with far-future expiry dates — especially during page reloads. Adding immutable tells the browser: “This resource will never change at this URL. Don’t even ask.” Combined with content-hashed filenames (like app.3f8a2b1c.js), this eliminates unnecessary network round-trips entirely.

The stale-while-revalidate directive, standardized in RFC 5861 and supported in Chromium-based browsers and Firefox (Safari support has historically lagged), is equally powerful for API responses. It tells the browser: “Serve the cached version immediately, but fetch a fresh copy in the background.” Users see instant responses while the cache stays reasonably fresh. This is perfect for data that changes periodically but doesn’t need to be real-time: product catalogs, blog listings, configuration data.

Common Browser Cache Bugs

Bug #1: Caching HTML pages with far-future expiry. If you set max-age=31536000 on your index.html, users will never see updates unless they hard-refresh. HTML documents should use no-cache (which still allows caching but forces revalidation) or short TTLs.

Bug #2: Missing Vary headers. If your server serves different content based on the Accept-Encoding or Accept-Language header, you must include Vary to prevent caches from serving the wrong version. Forgetting Vary: Accept-Encoding when serving compressed responses is a classic mistake that leads to users receiving garbled content.

Bug #3: Setting no-cache when you mean no-store. These are not the same thing. no-cache still stores the response but revalidates every time. no-store prevents any storage. For sensitive data (authentication tokens, personal information), you want no-store.

CDN Caching: Edge Performance at Scale

A CDN puts cached copies of your content on servers distributed around the world. When a user in Tokyo requests your page, they hit a server in Tokyo instead of your origin in Virginia. The physics alone — reducing round-trip time from 150ms to 5ms — makes a noticeable difference.

In 2026, the CDN market has consolidated around a few major players. Cloudflare remains the default choice for most applications, with their free tier handling a surprising amount of traffic. Fastly still dominates when you need VCL-level control or real-time log streaming. AWS CloudFront integrates tightly with the AWS ecosystem. Bunny CDN has carved out a niche with transparent pricing and strong performance in regions where the bigger players have gaps.

The important configuration decisions for CDN caching are:

What to cache at the edge: Static assets (images, JS, CSS, fonts) are obvious. But you can also cache API responses, HTML pages, and even GraphQL queries if you structure your cache keys correctly. Cloudflare’s Cache Rules and Fastly’s VCL both let you define custom caching logic based on URL patterns, headers, cookies, and query parameters.

// Cloudflare Cache Rules (via API)
// Cache API responses for product pages for 10 minutes at the edge
{
  "expression": "(http.request.uri.path matches \"^/api/v1/products/[0-9]+$\")",
  "action": "set_cache_settings",
  "action_parameters": {
    "cache": true,
    "edge_ttl": {
      "mode": "override_origin",
      "default": 600
    },
    "cache_key": {
      "custom_key": {
        "query_string": {
          "include": ["fields", "lang"]
        }
      }
    }
  }
}

Cache key design: The cache key determines what counts as “the same request.” By default, CDNs use the full URL including query parameters. But if you have tracking parameters like utm_source or fbclid in your URLs, every link from social media creates a cache miss. Strip irrelevant query parameters from your cache key. This single change often improves cache hit ratios from 40% to 80%+.
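The stripping step can be sketched as a small normalization function run before the URL becomes a cache key. The parameter list and function name here are illustrative, not any CDN's built-in API:

```python
from urllib.parse import urlsplit, urlencode, parse_qsl, urlunsplit

# Tracking parameters that should never affect the cache key
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "fbclid", "gclid"}

def normalize_cache_key(url: str) -> str:
    """Return the URL with tracking query parameters removed."""
    scheme, netloc, path, query, _ = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))
```

In practice you would configure this in the CDN itself (Cloudflare Cache Rules and Fastly VCL both support query-parameter filtering), but the logic is exactly this.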

Purging strategy: When content changes, you need to invalidate the CDN cache. Most CDNs offer three approaches: purge by URL, purge by tag/surrogate key, and purge everything. Tag-based purging is the most practical for dynamic sites. Tag your cached responses with logical identifiers (e.g., product-123, category-electronics), and when a product updates, purge all responses tagged with that product ID.
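The tagging half of this pattern is just a response header. A minimal sketch, using the Surrogate-Key header name that Fastly popularized (the helper function is ours, not a library API):

```python
def surrogate_key_header(*tags: str) -> dict:
    """Build the response header that tags a cached response for later purging.

    Tags are space-separated, e.g. "product-123 category-electronics".
    A later purge-by-key request for "product-123" invalidates every
    cached response carrying that tag.
    """
    return {"Surrogate-Key": " ".join(tags)}

# When rendering a product page, tag it with every entity it depends on:
headers = surrogate_key_header("product-123", "category-electronics")
```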

Application-Level Caching: Redis in 2026

Application-level caching is where you have the most control and the most responsibility. Redis remains the dominant choice here, though things have shifted meaningfully over the past few years.

Redis 8.0 (released in 2025) brought significant changes. The re-licensing controversy of 2024, when Redis (formerly Redis Labs) moved from BSD to dual RSALv2/SSPLv1 licensing, has largely settled. The community forks (Valkey, backed by the Linux Foundation, and Redict, aiming for strict Redis compatibility) have matured into viable alternatives. If you’re running on AWS, ElastiCache now defaults to Valkey under the hood. Azure Cache for Redis and GCP Memorystore still use upstream Redis.

Redis vs Memcached: Does the Comparison Still Matter?

In 2026, picking Memcached over Redis is an increasingly rare choice, but there are still valid reasons. Memcached’s multi-threaded architecture means a single instance can saturate a modern CPU for simple key-value lookups. Redis, despite its I/O threading improvements in 7.x and 8.x, is still fundamentally single-threaded for command execution. If your workload is pure cache — simple GET/SET operations with no data structure needs — and you’re running on a machine with 32+ cores, Memcached will give you better per-node throughput.

But for almost everyone else, Redis wins because of data structures. Sorted sets for leaderboards and rate limiting. Streams for event queues. HyperLogLog for cardinality estimation. Hash types for storing structured objects without serialization overhead. These aren’t just nice-to-haves — they fundamentally change how you design your caching layer.

Practical Redis Caching Patterns

Cache-aside (Lazy Loading): The most common pattern. Your application checks Redis first. On a miss, it queries the database, writes the result to Redis, and returns it. Simple and effective, but vulnerable to cache stampede on cold starts.

# Python example with redis-py 5.x
import json
import redis

r = redis.Redis(host='redis-primary', port=6379, decode_responses=True)

def get_product(product_id: int) -> dict | None:
    cache_key = f"product:{product_id}:v3"
    null_key = f"product:{product_id}:null"

    # Check cache
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # Check the negative-result marker before touching the database
    if r.get(null_key):
        return None

    # Cache miss -- query database (db is your application's DB handle)
    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    if product is None:
        # Cache negative results to prevent repeated DB queries
        r.setex(null_key, 300, "1")
        return None

    # Write to cache with TTL
    r.setex(cache_key, 3600, json.dumps(product))
    return product

Write-through: When data is updated, write to both the database and the cache in the same operation. This keeps the cache warm and consistent, but adds latency to write operations. Works well for data that’s read far more often than it’s written.

Cache stampede prevention: When a popular cache key expires, hundreds of concurrent requests might all miss the cache simultaneously and all hit the database. The standard solution is a distributed lock:

import time

def get_product_safe(product_id: int, retries: int = 50) -> dict | None:
    cache_key = f"product:{product_id}:v3"
    lock_key = f"lock:product:{product_id}"

    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # Try to acquire lock (SET NX with expiry, so a crashed worker can't deadlock us)
    acquired = r.set(lock_key, "1", nx=True, ex=10)

    if acquired:
        try:
            # We got the lock -- rebuild cache
            product = db.query("SELECT * FROM products WHERE id = %s", product_id)
            if product is not None:
                r.setex(cache_key, 3600, json.dumps(product))
            return product
        finally:
            r.delete(lock_key)
    elif retries > 0:
        # Another request is rebuilding -- wait briefly and retry (bounded)
        time.sleep(0.05)
        return get_product_safe(product_id, retries - 1)
    else:
        # Lock holder is taking too long -- fall back to the database
        return db.query("SELECT * FROM products WHERE id = %s", product_id)

Versioned Cache Keys

One of the most reliable invalidation strategies is key versioning. Instead of deleting cache entries when data changes, you increment a version number in the cache key. Old entries naturally expire via TTL, and new requests immediately get fresh data.

# Store a version counter per entity
def invalidate_product(product_id: int):
    r.incr(f"product:{product_id}:version")

def get_product_versioned(product_id: int) -> dict:
    version = r.get(f"product:{product_id}:version") or "0"
    cache_key = f"product:{product_id}:v{version}"

    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    r.setex(cache_key, 3600, json.dumps(product))
    return product

This approach avoids the thundering herd problem that comes with explicit cache deletion, and it works naturally in distributed systems where cache deletion messages might arrive out of order.

Cache Invalidation: The Hard Part

Phil Karlton’s famous quote — “There are only two hard things in Computer Science: cache invalidation and naming things” — remains painfully accurate. Cache invalidation is hard because it requires you to answer a fundamentally difficult question: “When does this data become stale, and what should I do about it?”

There are three broad strategies:

TTL-based expiry: Set a time-to-live on every cached entry and accept that data might be stale within that window. This is the simplest approach and works surprisingly well for many applications. The key is choosing the right TTL. Too short, and you’re not getting much cache benefit. Too long, and users see outdated data. For most applications, a 5-minute TTL on API responses and a 1-hour TTL on rarely-changing reference data is a reasonable starting point.
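One practical refinement worth sketching: add random jitter to TTLs so that entries written in the same burst don't all expire in the same instant and stampede the database together. The ±10% spread here is an arbitrary choice:

```python
import random

def jittered_ttl(base_seconds: int, spread: float = 0.1) -> int:
    """Return the base TTL +/- spread, so co-written keys expire at different times."""
    delta = int(base_seconds * spread)
    return base_seconds + random.randint(-delta, delta)

# e.g. r.setex(cache_key, jittered_ttl(3600), payload)
```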

Event-driven invalidation: When data changes in the database, publish an event that triggers cache invalidation. This gives you near-real-time consistency but requires infrastructure for event delivery. In practice, this means database triggers, application-level event publishing (e.g., after a successful write), or change data capture (CDC) tools like Debezium reading the database’s write-ahead log.

# Event-driven invalidation with a simple pub/sub approach
# Publisher (in your write path)
def update_product(product_id: int, data: dict):
    db.execute("UPDATE products SET ... WHERE id = %s", product_id)
    r.publish("cache_invalidation", json.dumps({
        "entity": "product",
        "id": product_id,
        "action": "updated"
    }))

# Subscriber (running as a background worker)
def cache_invalidation_listener():
    pubsub = r.pubsub()
    pubsub.subscribe("cache_invalidation")
    for message in pubsub.listen():
        if message["type"] == "message":
            event = json.loads(message["data"])
            pattern = f"{event['entity']}:{event['id']}:*"
            # Use SCAN, not KEYS -- KEYS blocks the server while it walks the keyspace
            keys = list(r.scan_iter(match=pattern))
            if keys:
                r.delete(*keys)

Versioned keys (described above): The hybrid approach. Use TTL for natural expiry but increment a version counter for immediate invalidation. This combines the simplicity of TTL with the responsiveness of event-driven invalidation.

Database Query Caching

Most databases have some form of internal caching. PostgreSQL’s shared_buffers setting controls how much memory is dedicated to caching table and index data. MySQL had a built-in query cache that was removed in version 8.0 because it caused more contention than benefit on multi-core systems. In 2026, if you’re using MySQL, query-level caching should happen in Redis or your application layer, not in the database itself.

PostgreSQL’s buffer cache is more nuanced. It caches disk pages, not query results, so repeated execution of the same query still incurs parsing and planning costs. For complex queries that are executed frequently, consider materialized views with periodic refresh:

-- Create a materialized view for a complex aggregation
CREATE MATERIALIZED VIEW product_stats AS
SELECT
    p.id,
    p.name,
    COUNT(r.id) as review_count,
    AVG(r.rating) as avg_rating,
    MAX(r.created_at) as latest_review
FROM products p
LEFT JOIN reviews r ON r.product_id = p.id
GROUP BY p.id, p.name;

-- Create an index on the materialized view
CREATE UNIQUE INDEX idx_product_stats_id ON product_stats(id);

-- Refresh periodically (e.g., via pg_cron)
SELECT cron.schedule('refresh_product_stats', '*/5 * * * *',
    'REFRESH MATERIALIZED VIEW CONCURRENTLY product_stats');

Putting It All Together: A Real-World Example

Let’s trace a request through all caching layers for an e-commerce product page.

First visit: The browser has nothing cached. The request hits the CDN, which also has nothing. The request reaches your origin server. The application checks Redis — cache miss. It queries the database, builds the HTML response, stores the result in Redis (TTL: 10 minutes), and returns it. The CDN caches the response (TTL: 5 minutes). The browser caches it with stale-while-revalidate=300. Total time: ~400ms.

Second visit (within 5 minutes): The browser serves its cached copy instantly. If using stale-while-revalidate, it also fires a background request to the CDN, which serves its cached copy. Total time: 0ms (from the user’s perspective).

Visit after CDN TTL expires but within Redis TTL: The CDN forwards the request to origin. The application finds the data in Redis and returns it quickly. The CDN re-caches the response. Total time: ~50ms.

Product data is updated: The write path publishes an invalidation event. Redis keys for this product are deleted. On the next CDN miss, the application fetches fresh data from the database and repopulates Redis. If you’re using surrogate keys with your CDN, you can also purge the edge cache immediately.

Monitoring Your Cache

A cache you don’t monitor is a liability. The minimum metrics you should track:

Hit ratio: The percentage of requests served from cache. For Redis, use INFO stats and look at keyspace_hits and keyspace_misses. A healthy application cache should have a hit ratio above 90%. Below 80%, something is wrong — either your TTLs are too short, your cache keys are too specific, or your working set doesn’t fit in memory.
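Computing the ratio from those counters is a one-liner worth getting right; the division-by-zero guard matters on a freshly started instance. The helper name is ours:

```python
def hit_ratio(info_stats: dict) -> float:
    """Compute the cache hit ratio from the dict returned by r.info('stats')."""
    hits = info_stats.get("keyspace_hits", 0)
    misses = info_stats.get("keyspace_misses", 0)
    total = hits + misses
    return hits / total if total else 0.0

# Usage with redis-py: ratio = hit_ratio(r.info("stats"))
```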

Eviction rate: If Redis is evicting keys, your maxmemory is too low for your working set. Check evicted_keys in INFO stats. Non-zero evictions on a cache that should be holding data long-term means you need more memory or need to be more selective about what you cache.

Latency percentiles: Redis is fast, but network latency, connection pooling issues, or slow commands can create tail latency. Use redis-cli --latency-history for quick diagnostics, or set up proper monitoring with Prometheus and the Redis exporter. Watch p99 latency — if it spikes while p50 stays flat, you likely have a slow command or connection issue.

CDN hit ratio by URL pattern: Most CDNs provide analytics dashboards. Look for URL patterns with low hit ratios — these are candidates for configuration fixes (cache key stripping, longer TTLs, or headers adjustment).

Common Caching Bugs (and How to Avoid Them)

The stale session bug: You cache user session data in Redis with a 30-minute TTL. A user logs out, but their session is still in the cache. Another request comes in with the old session token and gets the cached (now-invalid) session. Fix: on logout, explicitly delete the session from cache — don’t rely on TTL alone for security-sensitive data.

The hot key problem: One cache key gets vastly more traffic than others (e.g., a viral product page). A single Redis node handling millions of requests per second for one key becomes a bottleneck. Fix: replicate hot keys across multiple Redis nodes using client-side routing, or use local in-process caching (a small LRU cache in your application) as a buffer in front of Redis.

The cache-database race condition: Request A reads a product from the database (price: $10). Request B updates the price to $15 and invalidates the cache. Request A writes the stale $10 price to the cache. Now the cache has outdated data until the TTL expires. Fix: use versioned keys, or check a version/timestamp before writing to the cache.

The serialization mismatch: You add a new field to your cached object, deploy to production, and existing cached entries don’t have the field. Your application crashes with a KeyError. Fix: always handle missing fields gracefully in deserialization, and consider using the versioned key pattern so deployments naturally refresh the cache.
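A defensive deserializer along these lines keeps old cache entries from crashing new code; the field names are illustrative:

```python
import json

def deserialize_product(raw: str) -> dict:
    """Deserialize a cached product, tolerating fields added after the entry was written."""
    data = json.loads(raw)
    return {
        "id": data["id"],                       # required: present in every version
        "name": data.get("name", ""),           # tolerate missing fields with defaults
        "discount": data.get("discount", 0.0),  # field added in a later deploy
    }
```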

The negative caching omission: Your application caches successful database lookups but not failed ones. An attacker or a bug sends millions of requests for non-existent IDs. Every request misses the cache and hits the database. Fix: cache negative results too (with a shorter TTL), and implement rate limiting as an additional safeguard.

Wrapping Up

Caching is not something you bolt on when performance becomes a problem. It’s an architectural decision that should be part of your design from the beginning. Each layer — browser, CDN, application, database — solves a different class of performance problem, and the best applications use all of them deliberately.

Start with browser caching because it’s free and effective. Add a CDN for static assets and public content. Use Redis for application-specific data that doesn’t fit neatly into HTTP caching semantics. And choose your invalidation strategy based on your consistency requirements, not on what’s easiest to implement.

The hard part of caching isn’t the implementation. It’s the discipline of maintaining it — monitoring hit ratios, investigating anomalies, updating TTLs as usage patterns change, and writing code that handles cache failures gracefully. Treat your caching layer as a first-class component of your architecture, and it will pay dividends for years.

By Michael Sun

Founder and Editor-in-Chief of NovVista. Software engineer with hands-on experience in cloud infrastructure, full-stack development, and DevOps. Writes about AI tools, developer workflows, server architecture, and the practical side of technology. Based in China.
