Caching
Overview
Caching plays a crucial role in API performance, reducing latency and server load while improving response times.
APIs SHOULD leverage appropriate caching mechanisms.
APIs SHOULD use a multi-layered caching strategy, i.e. implement caching at various layers (e.g., API Gateway / CDN, application layer, database/persistence layer) to optimise performance.
APIs SHOULD use distributed caching.
Rule of Thumb
APIs SHOULD implement caching when:
- Serving frequently accessed, rarely changing data.
- Response generation is computationally expensive.
- Response times are slow or requests are timing out.
- Handling high volumes of traffic.
- Serving static reference data or lookup tables.
APIs SHOULD NOT implement caching when:
- Serving volatile data that changes frequently, unless you have a strategy for cache invalidation.
Use the following flowchart to help you decide:
```mermaid
flowchart LR
    A[Is the data frequently accessed?] -->|Yes| B[Is the data rarely changing?]
    A -->|No| F[Do Not Cache]
    B -->|Yes| C[Is response generation expensive or slow?]
    B -->|No| V[Do you have a strategy for cache invalidation?]
    V -->|Yes| E
    V -->|No| F
    C -->|Yes| D[Implement Caching]
    C -->|No| E[Would cache hit rate be high?]
    E -->|Yes| D
    E -->|No| F
```
Server-Side vs Client-Side Caching
APIs SHOULD implement server-side response caching and SHOULD avoid implementing client-side caching except where there are specific requirements that necessitate it. This approach provides greater control, security, and consistency.
The HTTP caching specification (RFC 9111) is complex and often inconsistently implemented by clients. RESTful APIs frequently have nuanced caching requirements, such as handling data with mixed sensitivity levels that require careful cache management, which generic client implementations might not handle correctly. Furthermore, client-side caching can lead to sensitive data being stored in uncontrolled environments.
Server-side caching reduces the risk of cached data leaking across different client contexts, ensures that caching policies are applied correctly regardless of client capabilities, and guarantees that all clients receive consistent data.
In addition, server-side caching provides greater control over what data is cached and for how long, offers fine-grained cache control based on resource type, authorisation level, or other factors, and makes it possible to invalidate cached content when the underlying data changes.
As such, APIs SHOULD follow the best practice of defaulting the Cache-Control response header to the following value to prevent any default client caching behaviour.
Cache-Control: no-cache, no-store, must-revalidate, max-age=0
Whilst this may seem inefficient, there is a good reason for doing so: it prevents any accidental client caching of sensitive or secure data, which makes it much safer to default to a posture of no client caching.
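As an illustrative sketch of this default posture, a response post-processing hook could apply the safe default while still letting individual handlers opt in to caching. The plain dict of headers here stands in for your framework's response object; the function name is hypothetical.

```python
# Safe default recommended above: prevent any client caching unless a
# handler has explicitly chosen a caching policy for its response.
DEFAULT_CACHE_CONTROL = "no-cache, no-store, must-revalidate, max-age=0"

def apply_default_cache_headers(headers: dict) -> dict:
    """Apply the no-client-caching default unless a policy is already set."""
    if "Cache-Control" not in headers:
        headers["Cache-Control"] = DEFAULT_CACHE_CONTROL
    return headers
```

A handler that sets its own `Cache-Control` (e.g. `public, max-age=86400` for reference data) is left untouched; everything else falls back to the safe default.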
Server-Side Response Caching
APIs SHOULD implement server-controlled response caching that is independent of client-specified caching headers.
APIs SHOULD utilise their respective development ecosystem and take advantage of the available caching tools/libraries to support server-side response caching. For example, if you are building your API with dotnet there is an output caching middleware specifically for server-controlled caching, and for Python there is a framework-agnostic caching library called cashews.
When utilising an API gateway, APIs SHOULD make use of any response caching functionality, as this helps to reduce the load on the backend API; Azure API Management (APIM) provides this functionality through the use of policies.
Implementation Approaches
Basic Implementation Pattern
The general pattern for implementing server-side response caching is:
- Check if the request can be served from the cache.
- If cached, return cached response.
- If not cached, generate response and store in cache.
- Return fresh response.
```mermaid
flowchart LR
    Request[Request Arrives] --> Lookup[Cache Key Lookup]
    Lookup --> Hit{Cache Hit?}
    Hit -->|Yes| Return[Return Cached Response]
    Hit -->|No| Generate[Generate Fresh Response]
    Generate --> CacheResponse[Cache Response]
    CacheResponse --> ReturnFresh[Return Fresh Response]
```
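The steps above can be sketched in a few lines of Python. This is a minimal in-process cache-aside implementation with a TTL; the dict stands in for a real cache store such as Redis, and `generate_response` stands in for your actual handler.

```python
import time

# In-process stand-in for a real cache store (e.g. Redis).
_store: dict = {}

def get_response(cache_key: str, generate_response, ttl: int = 60):
    """Serve from cache if present and unexpired; otherwise generate and store."""
    entry = _store.get(cache_key)
    now = time.monotonic()
    if entry is not None and entry["expires_at"] > now:
        return entry["value"]                 # cache hit: return cached response
    value = generate_response()               # cache miss: generate fresh response
    _store[cache_key] = {"value": value, "expires_at": now + ttl}
    return value                              # return fresh response
```

On the second call with the same key (within the TTL), the expensive handler is not invoked at all.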
Client-Side Caching
Client-side caching SHOULD be avoided. However, in addition to server-side response caching, there are cases where client-side caching MAY be appropriate:
- When offline capability is required (e.g., mobile applications)
- For static resources that rarely change (e.g., images, stylesheets)
- To reduce network traffic in bandwidth-constrained environments
If your API really requires supporting HTTP caching, please observe the following rules:
- MAY responsibly enable HTTP caching explicitly for any operations that require it.
- MUST document all cacheable GET, HEAD, and POST endpoints by declaring the support of Cache-Control, Vary, and ETag headers in the response.
- MUST NOT define the Expires header, to prevent redundant and ambiguous definition of cache lifetime.
- MUST take care to specify the ability to support caching by defining the right caching boundaries, i.e. time-to-live and cache constraints, by providing sensible values for Cache-Control and Vary in your service.
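The ETag rule above implies supporting conditional requests: when the client presents a matching If-None-Match, the server returns 304 with no body. A framework-agnostic sketch (the function names and the Cache-Control/Vary values are illustrative, not a prescribed API):

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Strong ETag derived from the response body content."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def conditional_get(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, body); 304 when the client's copy is still valid."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Cache-Control": "private, max-age=60",
        "Vary": "Accept",
    }
    if if_none_match == etag:
        return 304, headers, b""   # client revalidated: no body sent
    return 200, headers, body
```

The first request pays the full transfer cost; subsequent revalidations cost only headers.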
APIs SHOULD use appropriate Cache-Control directives:
Directive | Purpose | Example |
---|---|---|
max-age | How long the response can be cached (in seconds) | Cache-Control: max-age=3600 |
no-cache | Must revalidate before using cached content | Cache-Control: no-cache |
no-store | Don’t cache the response at all | Cache-Control: no-store |
private | Only the browser can cache, not intermediaries | Cache-Control: private, max-age=600 |
public | Response can be cached by any cache | Cache-Control: public, max-age=86400 |
Operations which require the use of the Authorization header, i.e. OAuth-protected endpoints, SHOULD also contain the private directive:
Cache-Control: private, must-revalidate, max-age=60
Example Implementations
Read-Only Reference Data
Cache-Control: public, max-age=86400
User-Specific Data (Non-Sensitive)
Cache-Control: private, max-age=300
Time-Sensitive Data
Cache-Control: public, max-age=60
Sensitive Data
Cache-Control: no-store
Considerations for Special Cases
Pagination
Paginated responses SHOULD use cache control headers that decrease in duration for later pages:
# First page might be cached longer
Cache-Control: public, max-age=3600
# Later pages cached for shorter periods
Cache-Control: public, max-age=600
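A tiny illustrative helper for this tiered pagination policy. The TTL values are the example numbers from this section, not mandated defaults, and the function name is hypothetical.

```python
def page_cache_control(page: int,
                       first_page_ttl: int = 3600,
                       later_page_ttl: int = 600) -> str:
    """Longer cache lifetime for the first page, shorter for deeper pages."""
    ttl = first_page_ttl if page <= 1 else later_page_ttl
    return f"public, max-age={ttl}"
```

A more gradual scheme (e.g. TTL decaying with page depth) follows the same shape; the key point is that deeper pages shift more often as data is inserted, so they deserve shorter lifetimes.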
Search Results
Search endpoints MAY cache results for popular queries but SHOULD use shorter cache durations:
Cache-Control: public, max-age=300
Versioned APIs
Versioned API endpoints MAY use longer cache durations since their responses are stable by definition:
Cache-Control: public, max-age=604800
Multi-Layered Caching Strategy
APIs SHOULD take a multi-layered approach to caching.
The diagrams below illustrate an example of multi-layered caching that provides performance benefits at different levels of your architecture.
Flowchart Diagram
```mermaid
flowchart TD
    Client[Client] --> BrowserCache[Browser Cache]
    BrowserCache -->|Cache Miss| CDN[Content Delivery Network]
    BrowserCache -->|Cache Hit| Client
    CDN -->|Cache Miss| APIGateway[API Gateway Cache]
    CDN -->|Cache Hit| BrowserCache
    APIGateway -->|Cache Miss| AppServer[Application Server]
    APIGateway -->|Cache Hit| CDN
    AppServer --> AppCache["Application Cache<br/>Redis/Memcached"]
    AppCache -->|Cache Miss| DBCache["Database Cache<br/>Query Cache"]
    AppCache -->|Cache Hit| AppServer
    DBCache -->|Cache Miss| Database[(Database)]
    DBCache -->|Cache Hit| AppServer
    Database --> DBCache
    %% Return paths for storing in caches
    Database -.->|Store| DBCache
    DBCache -.->|Store| AppCache
    AppCache -.->|Store| APIGateway
    APIGateway -.->|Store| CDN
    CDN -.->|Store| BrowserCache
```
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant BrowserCache as Browser Cache
    participant CDN as Content Delivery Network
    participant APIGateway as API Gateway
    participant AppCache as Application Cache<br/>Redis/Memcached
    participant AppServer as Application Server
    participant DBCache as Database Cache<br/>Query Cache
    participant Database
    Client->>BrowserCache: Request data
    alt Browser Cache Hit
        BrowserCache-->>Client: Return cached data
    else Browser Cache Miss
        BrowserCache->>CDN: Forward request
        alt CDN Cache Hit
            CDN-->>BrowserCache: Return cached data
            BrowserCache-->>Client: Return cached data
        else CDN Cache Miss
            CDN->>APIGateway: Forward request
            alt API Gateway Cache Hit
                APIGateway-->>CDN: Return cached data
                CDN-->>BrowserCache: Return & store cached data
                BrowserCache-->>Client: Return & store cached data
            else API Gateway Cache Miss
                APIGateway->>AppServer: Forward request
                AppServer->>AppCache: Check app cache
                alt Application Cache Hit
                    AppCache-->>AppServer: Return cached data
                    AppServer-->>APIGateway: Return response
                    APIGateway-->>CDN: Return & store response
                    CDN-->>BrowserCache: Return & store response
                    BrowserCache-->>Client: Return & store response
                else Application Cache Miss
                    AppServer->>DBCache: Query data
                    alt DB Cache Hit
                        DBCache-->>AppServer: Return cached data
                    else DB Cache Miss
                        DBCache->>Database: Query database
                        Database-->>DBCache: Return & cache results
                        DBCache-->>AppServer: Return results
                    end
                    AppServer-->>AppCache: Store in app cache
                    AppServer-->>APIGateway: Return response
                    APIGateway-->>CDN: Store & return response
                    CDN-->>BrowserCache: Store & return response
                    BrowserCache-->>Client: Store & return response
                end
            end
        end
    end
```
Benefits of Each Caching Layer
Browser Cache
- Eliminates network requests completely for repeat visits.
- Instant response times for cached resources.
- Reduces bandwidth consumption for the end user.
Content Delivery Network (CDN)
- Geographical distribution reduces latency by serving from edge locations.
- Massive scalability to handle traffic spikes.
- Offloads traffic from origin servers.
- Ideal for static assets, public API responses, and infrequently changing data.
API Gateway Cache
- Centralised caching for all API endpoints.
- Consistent policy enforcement across all services.
- Reduces load on backend application servers.
- Enables analytics on cache performance at the API level.
Application Cache (Redis/Memcached)
- High-speed data access for frequently used data.
- Flexible invalidation based on application-specific logic.
- Supports complex data structures beyond simple key-value pairs.
- Can be shared across multiple application instances.
Database Cache/Query Cache
- Optimises repeated queries without application changes.
- Reduces database load for read-heavy workloads.
- Often built into database systems (e.g., MySQL Query Cache, PostgreSQL).
- Transparent to the application in many cases.
Multi-Layer Advantages
Using this approach, APIs can achieve significantly better performance and scalability while reducing infrastructure costs.
- Defence in depth: Even if one cache fails, others may still provide performance benefits.
- Optimised resource usage: The most expensive operations (e.g. database queries) are cached at multiple levels.
- Improved resilience: Distributed caching improves availability and fault tolerance.
- Targeted optimisation: Each layer can be optimised for specific types of data and access patterns.
Cache Invalidation Strategies
APIs MUST implement appropriate cache invalidation strategies to ensure data consistency while maintaining performance benefits. Effective cache invalidation is critical to prevent serving stale or incorrect data to clients.
Types of Cache Invalidation Strategies
Time-Based Expiration
APIs MUST implement time-based expiration for all cacheable resources:
- MUST set appropriate max-age values based on data volatility.
- SHOULD use shorter expiration times for frequently changing data.
- MAY use longer expiration times for static reference data.
- MUST consider the business impact of serving stale data when setting expiration times.
```mermaid
sequenceDiagram
    participant Client
    participant Cache
    participant API
    Note over Cache: Cache entry created with<br/>max-age=3600
    rect rgb(200, 250, 200)
        Note right of Cache: Within TTL period
        Client->>Cache: Request resource
        Cache-->>Client: Return cached resource
    end
    Note over Cache: After 3600 seconds<br/>cache entry expires
    rect rgb(250, 230, 200)
        Note right of Cache: After TTL period
        Client->>Cache: Request resource
        Cache-->>API: Forward request (cache miss)
        API-->>Cache: Return fresh data
        Cache-->>Client: Return & store new data
    end
```
Event-Based Invalidation
For data that changes unpredictably, APIs SHOULD implement event-based invalidation:
- MUST trigger cache invalidation when the underlying data changes.
- SHOULD use publish/subscribe mechanisms to notify cache systems of changes.
- SHOULD implement targeted invalidation for specific resources rather than flushing entire caches.
- MAY use message queues or webhooks for distributed cache invalidation.
```mermaid
sequenceDiagram
    participant Client
    participant Cache
    participant API
    participant DB
    participant EventBus
    Note over Client,DB: Initial request and caching
    Client->>API: GET /resource/123
    API->>DB: Fetch data
    DB-->>API: Return data
    API->>Cache: Store in cache
    API-->>Client: Return response
    Note over Client,DB: Data update occurs
    Client->>API: PUT /resource/123
    API->>DB: Update data
    DB-->>API: Confirm update
    API->>EventBus: Publish "resource.123.updated" event
    EventBus->>Cache: Invalidate cache for /resource/123
    API-->>Client: Return success response
    Note over Client,DB: Subsequent request gets fresh data
    Client->>Cache: GET /resource/123
    Cache-->>API: Cache miss (invalidated)
    API->>DB: Fetch fresh data
    DB-->>API: Return updated data
    API->>Cache: Store updated data
    API-->>Client: Return updated response
```
Resource Versioning
APIs SHOULD consider resource versioning as a complementary strategy:
- MAY include version identifiers (e.g., ETags, timestamps) in cache keys.
- MAY use content-based hashing for automatic versioning of static resources.
- SHOULD NOT rely on versioning alone for frequently updated resources.
```mermaid
sequenceDiagram
    participant Client
    participant Cache
    participant API
    participant DB
    Note over Client,DB: Initial request
    Client->>API: GET /resource/123
    API->>DB: Fetch data
    DB-->>API: Data with version v1
    API->>Cache: Store with key "/resource/123:v1"
    API-->>Client: Return response with ETag "v1"
    Note over Client,DB: Data updated to v2
    Note over Client,DB: Subsequent request with new version
    Client->>API: GET /resource/123
    API->>DB: Check version
    DB-->>API: Current version is v2
    API->>Cache: Look for "/resource/123:v2"
    Cache-->>API: Cache miss
    API->>DB: Fetch data
    DB-->>API: Data with version v2
    API->>Cache: Store with key "/resource/123:v2"
    API-->>Client: Return response with ETag "v2"
```
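The key idea is that the current version is folded into the cache key, so an update naturally misses the stale entry without any explicit invalidation. A sketch using content-based hashing (the function name and the 8-character hash length are illustrative choices):

```python
import hashlib

def versioned_cache_key(resource_path: str, content: bytes) -> str:
    """Cache key that changes automatically whenever the content changes."""
    version = hashlib.sha256(content).hexdigest()[:8]
    return f"{resource_path}:{version}"
```

When the resource body changes, the derived key changes too, so the next lookup misses and fetches fresh data; the old entry simply ages out via its TTL.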
When to Use Each Strategy
Strategy | When to Use | When to Avoid |
---|---|---|
Time-Based Expiration | MUST use for all cacheable resources as a baseline strategy | SHOULD NOT rely solely on for critical, frequently changing data |
Event-Based Invalidation | SHOULD use for dynamic data with unpredictable update patterns | SHOULD NOT use if update events cannot be reliably captured or propagated |
Resource Versioning | SHOULD use for static assets and rarely changing resources | SHOULD NOT use as the only strategy for frequently updated resources |
Hybrid Approaches
APIs SHOULD implement hybrid invalidation approaches for optimal results:
Time-Based + Event-Based
- SHOULD set reasonable TTLs as a fallback.
- MUST trigger invalidation on data changes.
- MUST ensure cache consistency in distributed environments.
Versioning + Time-Based
- MAY version resources for major changes.
- SHOULD set appropriate TTLs for minor variations.
- SHOULD use conditional requests with ETags.
```mermaid
flowchart TD
    A[Evaluate Resource] --> B{How frequently<br/>does it change?}
    B -->|Rarely| C[Long TTL + Versioning]
    B -->|Sometimes| D[Medium TTL + Event-Based]
    B -->|Frequently| E[Short TTL + Event-Based]
    B -->|Unpredictably| F[Event-Based + Versioning]
    C --> G[Example: Static Assets]
    D --> H[Example: Product Information]
    E --> I[Example: Price Data]
    F --> J[Example: User Preferences]
    style C fill:#d4edda
    style D fill:#d4edda
    style E fill:#fff3cd
    style F fill:#f8d7da
```
Cache Key Strategies
APIs SHOULD carefully design cache keys to support effective invalidation:
- SHOULD use hierarchical keys to enable invalidation of related resources.
- MAY include relevant parameters in cache keys (e.g., user roles for permission-dependent content).
- MUST avoid including sensitive information in cache keys.
- SHOULD document cache key formats to aid debugging and maintenance.
```mermaid
graph LR
    subgraph "Good Cache Keys"
        A["users:123:profile"] --> B["users:123:permissions"]
        C["products:category:electronics"] --> D["products:id:456"]
    end
    subgraph "Poor Cache Keys"
        E["data_123"] --> F["info_abc"]
        G["result_xyz"] --> H["cache_456"]
    end
    style A fill:#d4edda,stroke:#c3e6cb
    style B fill:#d4edda,stroke:#c3e6cb
    style C fill:#d4edda,stroke:#c3e6cb
    style D fill:#d4edda,stroke:#c3e6cb
    style E fill:#f8d7da,stroke:#f5c6cb
    style F fill:#f8d7da,stroke:#f5c6cb
    style G fill:#f8d7da,stroke:#f5c6cb
    style H fill:#f8d7da,stroke:#f5c6cb
```
Performance Metrics
APIs using caching SHOULD monitor:
- Cache Hit Rate: Percentage of requests served from cache.
- Cache Latency: Time to retrieve data from cache.
- Origin Latency: Time to retrieve data from the origin.
- Cache Size: Memory/storage consumption by the cache.
APIs SHOULD aim for a cache hit rate of at least 80% for cacheable resources.
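As a trivial illustration of that target, the hit rate is simply hits over total lookups (the function is a sketch, not a prescribed API):

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; aim for >= 0.8 on cacheable resources."""
    total = hits + misses
    return hits / total if total else 0.0
```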
Response Headers for Monitoring
APIs MAY consider adding headers to help with debugging and monitoring:
X-Cache: HIT
X-Cache-TTL-Remaining: 286
X-Cache-Key: products:fec65fb3-1e5e-4ff2-a6e0-a423f77f0000
X-Cache: HIT
X-Cache-TTL-Remaining: 286
X-Cache-Key: products:list:limit=10:offset=0:sort=name|asc
Example metrics capture
The pseudo-Python example below shows how you could manually log caching statistics; however, there might be libraries that could collect this telemetry for you with OpenTelemetry instrumentation, such as opentelemetry-instrumentation-fastapi.
```python
# Python example of cache monitoring.
# `cache` and `metrics` are assumed to be provided elsewhere, e.g. a Redis
# client and a StatsD-style metrics client.
import time

def get_cached_response(cache_key):
    # Time the cache lookup itself.
    start_time = time.time()
    cached_response = cache.get(cache_key)
    lookup_time = time.time() - start_time
    metrics.timing('cache.lookup_time', lookup_time)

    if cached_response:
        metrics.increment('cache.hit')
        return cached_response
    else:
        metrics.increment('cache.miss')
        return None
```
Published: 6 March 2025
Last updated: 14 August 2025