Caching content with Fastly

The guidance on this page was tested with an older version (0.11.0) of the Rust SDK. It may still work with the latest version (0.11.2), but the change log may help if you encounter any issues.

The Fastly edge cache is an enormous pool of storage across the platform's network which allows you to satisfy end user requests with exceptional performance and reduce the need for requests to your backend servers.

Most use cases make use of the readthrough cache interface, which works automatically with the HTTP requests that transit your Fastly service to save responses in cache so they can be reused. The first time a cacheable resource is requested at a particular POP, the resource will be requested from your backend server and stored in cache automatically. Subsequent requests for that resource can then be satisfied from cache without having to be forwarded to your servers.

Other cache interfaces, such as simple and core, offer direct access to the shared cache layer from your own code and are exclusively available in Compute services.

Use cases for caching vary widely. Some are very simple, while others benefit from features built into the cache mechanism:

  • HTTP caching semantics, such as the Cache-Control header, are part of the HTTP specification and allow HTTP responses to include information that the Fastly cache uses to decide how they should be cached.
  • Request collapsing allows us to identify multiple simultaneous requests for the same resource, and make just one backend fetch for it, using the resulting response to populate the cache and satisfy all waiting clients.
  • Range collapsing merges requests for separate byte ranges of a backend object into a single backend fetch for the entire object. The resulting response is used to populate the cache and fulfill future client requests for any byte range of the object, as well as to manage the lifetime for the entire object as a whole.
  • Streaming miss writes a response stream to cache and to an end user at the same time.
  • Client revalidation handles conditional headers received from a client to indicate whether its cached copy is still valid, allowing us to send the response body only if necessary. In addition, depending on the state of the object in the cache, this process potentially forwards the request to the backend to add or refresh the cached object.
  • Backend revalidation adds conditional headers to a client request as it is forwarded to a backend, when the request is for a cached object that is stale. If the backend validates that the content in the cache is still good to use, it can instruct the cache to extend the lifetime of the object in place without having to send its contents again.
  • Purging allows cache entries to be expunged ahead of their normal expiry, so that changes to the source content can be reflected at the edge immediately.

Balancing the benefits of these caching features with the desire for simplicity is the reason multiple different cache interfaces are offered (although they all result in objects being stored in the same cache layer).

IMPORTANT: All data stored in the Fastly cache is ephemeral: it will expire, and may be evicted by the platform before it expires depending on how frequently it is used. If you require persistent storage at the edge, consider using dynamic configuration or data stores instead.

Interfaces

The Fastly cache is available through a readthrough interface built into the fetch mechanism. In Compute, it is additionally available using explicit calls to the simple or core cache interfaces.

| | Readthrough | Simple | Core |
| Overview | Automatically caches HTTP responses based on HTTP caching semantics as requests traverse through Fastly's network to a backend. | Offers a straightforward getOrSet method for programmatic access to the cache for simple use cases. | Offers programmatic access to the cache with full control over all cache metadata and semantics, intended for advanced use cases or building custom higher level abstractions. |
| Platform | VCL and Compute | Compute | Compute |
| Use it for... | Automatic caching | Simple key-value caching | Complex requirements |
| Cache freshness | HTTP semantics | Explicit | Explicit |
| Request collapsing | Heuristic | Always-on | Manual control |
| Range collapsing | ✅ (automatic) | | ✅ (manual) |
| Streaming miss | ✅ (automatic) | | ✅ (manual) |
| Client revalidation | ✅ (automatic) | | ✅ (manual) |
| Backend revalidation | ✅ (automatic) | | ✅ (manual) |
| Surrogate keys | | | |
| Purging | | | |

These interfaces all access the same underlying storage layer, and use the same address space. See interoperability for details.

Readthrough cache

The readthrough cache is the Fastly cache interface your services are most likely to use. In both VCL and Compute services, the readthrough interface is enabled by default and invoked every time you make a request to a backend from your edge application. It is the only cache interface available to VCL services.

In a VCL service, the readthrough interface works without any configuration or code required.

The readthrough cache interface understands HTTP caching semantics and seamlessly supports request collapsing (based on cacheability of the object), range collapsing, streaming miss, revalidating client requests, revalidating cached content with the backend, and purging.

Automatic request transformations

The readthrough cache is designed for HTTP and performs the following transformations automatically to make cache usage simple and efficient.

  • Ranged requests

    Ranged requests are requests that use headers to ask the server to return a desired byte range of an object's body, rather than the entire body. If the server honors the range, it will return a response that has the status 206 Partial Content and that contains only the specified part of the body.

    The readthrough cache interface automatically transforms a ranged request into a request for the entire body when forwarding it to the backend, so that the cache always contains the entire object. The cached response is then transformed back into a 206 Partial Content response containing the requested part of the body before returning it to the client. Subsequent requests for the entire object or for a range (even a different range) of the object are fulfilled by this cached object, transforming it for each range as necessary.

    This enables the cache to manage lifetime according to HTTP caching semantics for the entire object as a single entity. Additionally, this enables multiple simultaneous requests for the same or different parts of a single object to be collapsed using the request collapsing mechanism.

  • Conditional (aka revalidation) requests

    Conditional requests are requests that use headers to ask the server to either "revalidate" an existing response (e.g. by reporting that the object hasn’t been modified since it was given), or else to return a normal response. If the object is still valid, the server is expected to return a response with a 304 Not Modified code without a body and include updated headers which can be used to freshen a cached response, including updating its effective TTL.

    The readthrough cache interface automatically processes conditional requests from the client by first treating them as normal requests, potentially bringing response data into the cache. Afterward, the readthrough cache attempts a revalidation against the response's validators to determine whether to return a 304 Not Modified or a normal response to the client.

  • Requests for stale objects

    On the other hand, if a requested object exists in the Fastly cache as a stale object, the readthrough cache will automatically issue conditional requests to a backend when possible to update (revalidate) its stored copy. In response, the backend can avoid transferring the body if it has not changed, instructing us to update just the headers and lifetime of the cached object in-place.

    When within a stale-while-revalidate period, these backend revalidations may be issued asynchronously, immediately returning the stale object for client use.

    See revalidation for details.
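
The ranged-request transformation described above can be pictured with a small, self-contained model. This is only a sketch under simplifying assumptions, not the platform's implementation: the function names are hypothetical, only a single `bytes=start-end` or `bytes=start-` range is understood, and anything else falls back to the full body (a real server may answer 416 instead).

```rust
/// Parse a single `bytes=start-end` range (inclusive) against a body length.
/// Returns None for anything this toy parser does not understand.
fn parse_byte_range(header: &str, len: usize) -> Option<(usize, usize)> {
    let spec = header.strip_prefix("bytes=")?;
    let (start, end) = spec.split_once('-')?;
    let start: usize = start.parse().ok()?;
    // An empty end means "to the end of the object".
    let end: usize = if end.is_empty() {
        len.checked_sub(1)?
    } else {
        end.parse().ok()?
    };
    if start > end || start >= len {
        return None;
    }
    // Clamp the end to the last byte of the object.
    Some((start, end.min(len - 1)))
}

/// Serve a request from the full cached body: 206 with a slice when a usable
/// range is present, 200 with the whole body otherwise.
fn serve_from_full_object<'a>(full_body: &'a [u8], range: Option<&str>) -> (u16, &'a [u8]) {
    match range.and_then(|r| parse_byte_range(r, full_body.len())) {
        Some((start, end)) => (206, &full_body[start..=end]),
        None => (200, full_body),
    }
}
```

Because the cache holds the whole object, requests for different ranges are all answered by slicing the same stored body.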

IMPORTANT: Note that client revalidation and backend revalidation occur independently:

  • A client request may or may not be a conditional request.
  • The request may or may not be fulfilled by the Fastly cache. If it is not fulfilled by the cache (i.e., it is for a missing or stale object), the readthrough cache interface forwards it to the backend, in order to add or update the cached object in the cache.
    • At this time, if the object has a validator (an ETag or Last-Modified header), the cache interface automatically adds conditional headers to the forwarded request. If the backend is able to revalidate it, it can update the headers and lifetime of the cache object without sending the body again.
  • Regardless of whether the readthrough cache forwards the request to the backend, it responds to the client request:
    • If the client request was a conditional request, and the content in the readthrough cache is able to revalidate it, then it returns to the client a revalidation response that does not include the body.
    • Otherwise, the readthrough cache returns to the client a normal response that includes the body.

Automatically interpreting and issuing ranged and conditional requests are significant benefits provided by an HTTP-specific caching interface.
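
The independence of the two revalidation paths can be sketched as a pure function. This is an illustrative model only: the type and function names are hypothetical, only ETag validators with exact string comparison are considered (no Last-Modified, no weak comparison), and `object_etag` is assumed to be the validator of the object once any backend exchange has completed.

```rust
#[derive(Debug, PartialEq)]
enum CacheState {
    Missing,
    Fresh,
    Stale,
}

#[derive(Debug, PartialEq)]
struct Outcome {
    /// Whether the cache had to contact the backend at all.
    forwarded_to_backend: bool,
    /// Whether conditional headers were added to the forwarded request.
    backend_request_conditional: bool,
    /// Status returned to the client: 304 (no body) or 200 (with body).
    client_status: u16,
}

fn handle(
    state: CacheState,
    object_etag: Option<&str>,
    client_if_none_match: Option<&str>,
) -> Outcome {
    // Backend side: forward on a miss or a stale hit, and add conditional
    // headers only when a stale copy with a validator exists.
    let forwarded_to_backend = state != CacheState::Fresh;
    let backend_request_conditional = state == CacheState::Stale && object_etag.is_some();

    // Client side: decided separately, from the client's own validator.
    let client_status = match (client_if_none_match, object_etag) {
        (Some(client), Some(object)) if client == object => 304,
        _ => 200,
    };

    Outcome {
        forwarded_to_backend,
        backend_request_conditional,
        client_status,
    }
}
```

Note that a client can receive a 304 without any backend traffic (fresh hit), and a backend revalidation can happen for a client that sent no conditional headers at all.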

Controlling cache behavior

The readthrough cache is a good starting point for most caching use cases. There is no explicit read/write method, but it's possible to control the caching behavior in a number of ways, described below.

Bypassing the readthrough cache for a request

You can mark requests to ensure they are not served from cache.

In a VCL service, return(pass) from vcl_recv or vcl_miss.

IMPORTANT: A request marked to bypass the cache will also bypass automatic request transformations for that request.

Setting cache policy on a request

It's possible to explicitly set a TTL when making a request in Compute services, but not in VCL services.

In a VCL service, it's only possible to set a TTL when the response is received. See Setting cache policy on a response below.

Controlling the cache key

By default, the readthrough cache for both VCL and Compute services uses a combination of the request URL (including the path and query) and Host header as its cache key to create unique HTTP objects. Headers are taken into account according to any Vary rules, as a kind of secondary cache key.

For example, the following URLs will cause distinct objects to be cached:
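
As a rough illustration of the default key described above, the following sketch builds a primary key from the Host header and the URL path and query, and a secondary key from the request headers named by Vary. The function names, hostnames, and paths are placeholders, and the real key derivation is internal to the platform.

```rust
/// Primary key: Host header plus the URL path and query.
fn primary_key(host: &str, path_and_query: &str) -> String {
    format!("{host}{path_and_query}")
}

/// Secondary key: the values of the request headers listed in Vary.
/// Header names are matched case-insensitively; a missing header
/// contributes an empty value.
fn vary_key(vary: &[&str], request_headers: &[(&str, &str)]) -> String {
    vary.iter()
        .map(|name| {
            let value = request_headers
                .iter()
                .find(|(n, _)| n.eq_ignore_ascii_case(name))
                .map(|(_, v)| *v)
                .unwrap_or("");
            format!("{}={}", name.to_ascii_lowercase(), value)
        })
        .collect::<Vec<_>>()
        .join("&")
}
```

In this model, `example.com/a?x=1` and `example.com/a?x=2` produce different primary keys, as do `example.com/a` and `www.example.com/a`, while two requests for the same URL that differ only in a header named by Vary share a primary key but differ in the secondary key.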

In a VCL service, you can manipulate the cache key explicitly by adding a request setting via the web interface. For more information, check out the custom VCL documentation.

Setting cache policy on a response

In a VCL service, set beresp.ttl in vcl_fetch to adjust the cache lifetime of a response, or use the beresp.http.Surrogate-Key header to add surrogate keys to the response.

Knowing a response object's cache state

In a VCL service, the vcl_hit or vcl_miss subroutines are invoked based on the cache result.

In addition, the platform reports the cache state in fastly_info.state, and sets the X-Cache response header before the vcl_deliver subroutine is invoked.

In a VCL service, use the following to learn about an object's cache state:

For more information on how the readthrough cache determines cache lifetime, see HTTP caching semantics.
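
To give a feel for how a lifetime can be derived from HTTP caching headers, here is a deliberately simplified sketch of a shared cache reading Cache-Control. It is not Fastly's algorithm: the function name is hypothetical, it prefers `s-maxage` over `max-age` as the HTTP caching specification describes for shared caches, and it ignores Surrogate-Control, Expires, and any platform defaults.

```rust
use std::time::Duration;

/// Returns None when the response is marked `no-store` or `private`, or when
/// it carries no explicit lifetime; otherwise returns the lifetime taken from
/// `s-maxage` (preferred) or `max-age`.
fn shared_cache_ttl(cache_control: &str) -> Option<Duration> {
    let mut max_age = None;
    let mut s_maxage = None;
    for directive in cache_control.split(',').map(|d| d.trim().to_ascii_lowercase()) {
        match directive.split_once('=') {
            None if directive == "no-store" || directive == "private" => return None,
            Some(("max-age", v)) => max_age = v.parse::<u64>().ok(),
            Some(("s-maxage", v)) => s_maxage = v.parse::<u64>().ok(),
            _ => {} // Directives this sketch does not model are ignored.
        }
    }
    s_maxage.or(max_age).map(Duration::from_secs)
}
```

For example, `public, max-age=60, s-maxage=300` yields a 300 second lifetime in this model, while `private, max-age=60` yields no shared-cache lifetime at all.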

Customizing cache interaction with the backend

IMPORTANT: These features are available in the Rust SDK.

In a Compute service written in Rust, the readthrough cache interface can be further customized in the way it interacts with the backend, supporting use cases such as:

  • making modifications to a request only when it is forwarded to a backend
  • reading and modifying response status code and headers, and adjusting cache controls
  • transforming the body of the response that is stored into the cache

Modifying a request as it is forwarded to a backend

Use the Request::set_before_send() method to set a before-send callback on a request. When a send operation is made on the request, the readthrough cache interface invokes the before-send callback if it would forward the request to the backend, after cache lookup has occurred but before forwarding it. If a before-send callback is not set, the default behavior is to make no change to the request before forwarding it.

impl Request {
    pub fn set_before_send(
        &mut self,
        before_send: impl Fn(&mut Request) -> Result<(), SendError>
    );
}

Set a before-send callback to make modifications to the Request object only before forwarding it to the backend (i.e., when it is not fulfilled by the cache).

For VCL developers: The before-send callback is roughly equivalent to a combination of the vcl_miss and vcl_pass subroutines. It is invoked for both cache misses and revalidations.

For example, consider the case where the backend requires an additional authorization header but the header value is expensive to produce. A before-send callback can be used to add this header only when the request must be forwarded to the backend.

The following example demonstrates the use of a before-send callback to inject an authorization header that is not part of the originating request headers:

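let mut client_req = Request::from_client();
// The callback runs only when the request is forwarded to the backend,
// so the expensive header is not computed for cache hits.
client_req.set_before_send(|req| {
    // `prepare_auth_header()` stands in for your own token logic.
    req.set_header(http::header::AUTHORIZATION, prepare_auth_header());
    Ok(())
});
let backend_resp = client_req.send("example_backend")?;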
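
The "only when forwarded" behavior of a before-send callback can be modeled without the SDK. The following is a toy stand-in, not the Fastly API: `ToyCache`, its `send` method, and the fake backend response are all hypothetical and exist only to show when the callback runs.

```rust
use std::collections::HashMap;

/// Toy stand-in for a readthrough cache keyed by URL.
struct ToyCache {
    store: HashMap<String, String>,
}

impl ToyCache {
    fn new() -> Self {
        ToyCache { store: HashMap::new() }
    }

    /// Looks up `url`. Only on a miss is `before_send` run and the
    /// (pretend) backend contacted.
    fn send(&mut self, url: &str, before_send: impl Fn(&mut Vec<(String, String)>)) -> String {
        if let Some(body) = self.store.get(url) {
            return body.clone(); // Hit: the callback is not invoked.
        }
        let mut headers = Vec::new();
        before_send(&mut headers); // Miss: decorate the outgoing request.
        let body = format!("backend saw {} extra header(s)", headers.len());
        self.store.insert(url.to_string(), body.clone());
        body
    }
}
```

Calling `send` twice for the same URL invokes the callback once: the second call is answered from the stored entry and never reaches the callback.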

Controlling cache behavior based on backend response

Use the Request::set_after_send() method to set an after-send callback on a request. When a send operation is made on the request, the readthrough cache interface invokes the after-send callback if it forwards it to the backend, after a response has been received from the backend, but before it is potentially stored into the cache. If an after-send callback is not set, the default behavior is to make no change to the response before storing it into the cache.

impl Request {
    pub fn set_after_send(
        &mut self,
        after_send: impl Fn(&mut CandidateResponse) -> Result<(), SendError>,
    );
}

Set an after-send callback when you need to modify the status code or headers, adjust cache controls, or transform the body of the response before it is potentially stored into the cache and returned to the client.

For VCL developers: The after-send callback is roughly equivalent to a vcl_fetch subroutine, but unlike with VCL, it is invoked even for successful revalidations.

The CandidateResponse object

When the readthrough cache interface invokes the after-send callback, it passes in a CandidateResponse object. This object represents a "candidate" form of the response that the readthrough cache would store, and also provides methods for adjusting cache controls and transforming the body.

Working with the status code and headers

The CandidateResponse passed into the after-send callback supports an identical interface to Response for reading and writing the status code and headers.

  • Reading

    • CandidateResponse::get_status(&self) -> StatusCode
    • CandidateResponse::get_header(&self, name: impl ToHeaderName) -> Option<&HeaderValue>
  • Writing

    • CandidateResponse::set_status(&mut self, status: impl ToStatusCode)
    • CandidateResponse::set_header(&mut self, name: impl ToHeaderName, value: impl ToHeaderValue)

Changes you make to the CandidateResponse during the after-send callback are reflected in the response that will be stored in the cache and returned to the client.

Manipulating cache controls

CandidateResponse also supports checking and adjusting cache controls on a backend response before it is inserted into the cache.

  • The readthrough cache interprets HTTP caching headers on the backend response as cache controls and makes them available by calling the following methods:

    • CandidateResponse::get_ttl(&self) -> Duration
    • CandidateResponse::get_stale_while_revalidate(&self) -> Duration
    • CandidateResponse::get_vary(&self) -> impl Iterator<Item = &str>
    • CandidateResponse::get_surrogate_keys(&self) -> Vec<String>
    • CandidateResponse::is_cacheable(&self) -> bool
  • Cache controls can be modified either by directly modifying the HTTP caching headers or by calling the following cache control methods:

    • CandidateResponse::set_ttl(&mut self, override: Duration)
    • CandidateResponse::set_stale_while_revalidate(&mut self, override: Duration)
    • CandidateResponse::set_vary(&mut self, override: impl IntoIterator<Item = &'a HeaderName>)
    • CandidateResponse::set_surrogate_keys(&mut self, override: impl IntoIterator<Item = &str>)
    • CandidateResponse::set_cacheable(&mut self)
    • CandidateResponse::set_uncacheable(&mut self, record: bool)

IMPORTANT:

  • The set_* methods do not modify the headers in any way, but always override their values.
  • The get_* methods always return the effective value (i.e., the override if one is set, or the value based on the current headers otherwise).
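
The relationship between headers, overrides, and effective values can be sketched with a tiny model. This is an assumption-laden illustration rather than the SDK's internals: `ToyCandidate` and its fields are hypothetical, and only the TTL is modeled.

```rust
use std::time::Duration;

struct ToyCandidate {
    /// Value the cache derived from the response's caching headers.
    header_ttl: Duration,
    /// Set only by `set_ttl`; the headers themselves are left untouched.
    ttl_override: Option<Duration>,
}

impl ToyCandidate {
    fn set_ttl(&mut self, ttl: Duration) {
        self.ttl_override = Some(ttl);
    }

    /// Effective value: the override if one is set, else the header value.
    fn get_ttl(&self) -> Duration {
        self.ttl_override.unwrap_or(self.header_ttl)
    }
}
```

After `set_ttl` is called, `get_ttl` reports the override while `header_ttl` still holds whatever the headers implied.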

The following example demonstrates the use of an after-send callback to customize caching behavior based on the Content-Type returned by the backend:

let mut client_req = Request::from_client();
client_req.set_after_send(|resp| {
    match resp.get_header_str("Content-Type").as_deref() {
        Some("image") => resp.set_ttl(Duration::from_secs(67)),
        Some("text/html") => resp.set_ttl(Duration::from_secs(321)),
        Some("application/json") => resp.set_uncacheable(false),
        _ => resp.set_ttl(Duration::from_secs(2)),
    }
    Ok(())
});
let backend_resp = client_req.send("example_backend")?;

The following example demonstrates the use of an after-send callback to store a hit-for-pass object (a marker to disable request collapsing until a cacheable response is returned). This is done by passing true for the record_uncacheable parameter of CandidateResponse::set_uncacheable():

let mut client_req = Request::from_client();
client_req.set_after_send(|resp| {
    if resp.contains_header("my-private-header") {
        resp.set_uncacheable(true);
    }
    Ok(())
});
let backend_resp = client_req.send("example_backend")?;

Modifying the body that is saved to the cache

In an after-send callback, optionally use the CandidateResponse::set_body_transform() method to set a body-transform callback. This callback provides a mechanism that allows modifying the response body that will be stored into the cache. If a body-transform callback is not set, the default behavior is to make no changes to the response body before storing it into the cache.

impl CandidateResponse {
    fn set_body_transform(
        &mut self,
        transform: impl FnOnce(Body, &mut StreamingBody) -> Result<(), SendError>
    );
}

When the cache interface receives the response body from the backend, it invokes the body-transform callback, passing in the Body that contains the response received from the backend and a StreamingBody for your callback to use to write out the transformed body. This transformed body is stored into the cache and returned to the client from the send operation. If you return an error from the callback, the send operation fails without writing into the cache, and the error is used as the return value of the Request::send() call.

TIP: Reading the entire body into memory can impact the limited WebAssembly memory space. Therefore, it is best practice to stream bytes out of the Body and into the StreamingBody when possible.
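
A chunked copy is one way to follow this advice. The sketch below uses only the standard Read and Write traits, so it is independent of the SDK; it assumes that the input and output bodies can be used through those traits, and the function name and the upper-casing transform are placeholders for your own logic.

```rust
use std::io::{self, Read, Write};

/// Copies `input` to `output` in fixed-size chunks, upper-casing ASCII bytes
/// on the way through, so memory use stays bounded by the buffer size.
/// Returns the number of bytes written.
fn transform_streaming<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<u64> {
    let mut buf = [0u8; 8 * 1024];
    let mut total = 0u64;
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            return Ok(total); // End of input.
        }
        buf[..n].make_ascii_uppercase();
        output.write_all(&buf[..n])?;
        total += n as u64;
    }
}
```

This works for transforms that can be applied chunk by chunk. Transforms that need the whole document, such as parsing JSON, still have to buffer the body.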

The following example demonstrates the use of a body-transform callback to transform a response before it is cached. In this example, suppose a function json_to_html() exists that builds a templated HTML response from a backend’s JSON API response. The body-transform callback performs the templating when the response is received from the backend, enabling the readthrough cache interface to cache the templated HTML rather than regenerating it for each request:

let mut client_req = Request::from_client();
client_req.set_after_send(|resp| {
    resp.set_content_type(Mime::TEXT_HTML);
    resp.set_body_transform(|body_in, body_out| {
        let json = body_in.into_json()?;
        body_out.append(json_to_html(json));
        Ok(())
    });
    Ok(())
});
// The resulting `resp` will have an HTML body, just like the cached object,
// even though the backend returned a JSON response:
let resp = client_req.send("example_backend")?;

Unlike the remainder of the after-send callback, the readthrough cache interface does not invoke your body-transform callback during a successful revalidation (304 Not Modified) response from the backend. This is because the revalidation response from the backend does not contain a body. See additional considerations for details.

Additional considerations

  • The readthrough cache interface invokes the before-send and after-send callbacks whenever a send operation causes it to forward a request to the backend. This includes cache misses as well as revalidations, including background revalidations.

    IMPORTANT: A background revalidation occurs when an object is found in the cache during its stale-while-revalidate lifetime. In this case:

    • The send operation returns to the caller immediately with a Response object that represents the stale content. Therefore, changes that would be made to the response during an after-send callback will not be reflected in the response returned to the client during this execution.
    • At some later time before the end of the execution of the application instance, the readthrough cache performs the revalidation by forwarding the request to the backend and updating the cache object, a process that includes invoking the before-send and after-send callbacks if they have been set. The exact timing of this is outside the control of your application and should not be relied upon.

    IMPORTANT: Requests marked to bypass the cache skip the readthrough cache entirely, and as a result do not cause the callbacks to be invoked.

  • As described above in Automatic request transformations, the readthrough cache interface automatically transforms the request and response in order to make cache usage simple and efficient.

    IMPORTANT: If the client request is a ranged request:

    • The readthrough cache interface removes the range headers before it forwards the request, which will be reflected in the Request passed into the before-send callback if one is set. This enables the readthrough cache interface to retrieve and cache the entire object from the backend, manage its cache lifetime as a single entity, as well as subsequently serve arbitrary ranges from it.
    • Thus, the readthrough cache interface receives a response from the backend that includes the entire body, which will be reflected in the CandidateResponse passed into the after-send callback if one is set. The readthrough cache interface further transforms the response to return appropriate Response objects for both ranged- and non-ranged requests for this cache object.

    IMPORTANT: If the client request is for a cached object that is stale, and the object has a validator (an ETag or Last-Modified header):

    • The readthrough cache interface adds conditional headers before it forwards the request, which will be reflected in the Request passed into the before-send callback if one is set. This enables the readthrough cache interface to automatically attempt to revalidate the response with the backend, rather than re-retrieve it.
    • If the backend successfully revalidates the response (status code is 304 Not Modified): The readthrough cache interface invokes the after-send callback if one is set, but it does not invoke the body-transform callback. The CandidateResponse represents a complete response, i.e., it has status code 200 OK, and its headers and cache policy reflect updates specified by the backend response (for example, to extend an object's lifetime). These values are merged with any changes you may make during the after-send callback, and then used to update the cached response object "in-place", without changing the cached response body.
    • If the backend does not revalidate the response (status is other than 304 Not Modified): The readthrough cache interface treats the response as it would any other non-conditional request, invoking the after-send and body-transform callbacks if set, then storing the resulting response to the cache.

    This design enables the readthrough cache to internally manage the complexities of ranges and revalidation, allowing the developer to provide a single code path without needing to think about ranges or revalidation at all.
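
The stale-while-revalidate window and the in-place update on a 304 can be pictured with a small model. It is a sketch under stated assumptions, not the platform's logic: all names are hypothetical, freshness is reduced to age, TTL, and a stale-while-revalidate duration, and upper-casing stands in for a body-transform callback.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Freshness {
    Fresh,
    /// Stale, but usable while a background revalidation runs.
    StaleWhileRevalidate,
    Expired,
}

fn classify(age: Duration, ttl: Duration, swr: Duration) -> Freshness {
    if age < ttl {
        Freshness::Fresh
    } else if age < ttl.saturating_add(swr) {
        Freshness::StaleWhileRevalidate
    } else {
        Freshness::Expired
    }
}

struct ToyCachedObject {
    body: String,
    ttl: Duration,
}

/// Applies a backend response to a stale object. Returns true when the
/// body transform ran (that is, when the backend sent a new body).
fn apply_backend_response(
    cached: &mut ToyCachedObject,
    status: u16,
    ttl: Duration,
    body: &str,
) -> bool {
    cached.ttl = ttl; // The lifetime is updated in both cases.
    if status == 304 {
        return false; // Revalidated: the stored body is kept as-is.
    }
    cached.body = body.to_uppercase(); // Stand-in body transform.
    true
}
```

In this model a 304 extends the lifetime without touching the stored body, while any other status replaces the body after passing it through the transform.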

Simple cache

Often you may want to cache data directly from your Compute application in a simple, volatile key-value store, and do not require any of the more complex mechanisms supported by the readthrough cache. For example, if you want to cache the state required to resume an authentication flow, or flags that have been set for A/B testing in a session, a straightforward get/set interface is ideal.

Simple cache operations have always-on request collapsing, so if two operations attempt to populate the same cache key at the same time, the setter callback will only be executed once. However, values are treated as opaque data with no headers or metadata. This means simple cache does not support staleness, revalidation, or variation.
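
The always-on collapsing behavior can be imitated in plain Rust to show what "the setter callback will only be executed once" means in practice. This is a toy in-process model, not the simple cache API: `ToySimpleCache` and `get_or_set` are hypothetical names, values never expire, and the real interface works across the shared cache layer rather than within one process.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

#[derive(Default)]
struct ToySimpleCache {
    slots: Mutex<HashMap<String, Arc<OnceLock<String>>>>,
}

impl ToySimpleCache {
    fn get_or_set(&self, key: &str, setter: impl FnOnce() -> String) -> String {
        // Hold the map lock only long enough to find or create the slot.
        let mut slots = self.slots.lock().unwrap();
        let slot = slots.entry(key.to_string()).or_default().clone();
        drop(slots);

        // `get_or_init` runs at most one initializer per slot. Concurrent
        // callers block until it finishes and then share the stored value.
        slot.get_or_init(setter).clone()
    }
}
```

If several threads call `get_or_set` for the same key at once, one setter runs and every caller receives its value, which mirrors the collapsing described above.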

For complete documentation on the simple cache interface, refer to the reference for the Compute SDK of your choice.

In a VCL service, the simple cache interface is not available.

Core cache

Core cache is a low-level interface available in Compute services. It offers the primitive operations required to implement high-performance cache applications with all the same advanced features available from the readthrough cache, but gives you complete control of them.

Items cached via this interface consist of:

  • A cache key: up to 4KiB of arbitrary bytes that identify a cached item. The cache key may not uniquely identify an item; headers can be used to augment the key when multiple items are associated with the same key. See LookupBuilder::header() in the Rust SDK documentation for more details.
  • General metadata, such as expiry data (item age, when to expire, and surrogate keys for purging).
  • User-controlled metadata: arbitrary bytes stored alongside the cached item contents that can be updated when revalidating the cached item.
  • The object itself: arbitrary bytes read via Body and written via StreamingBody.

For complete documentation on the core cache interface, refer to the reference for the Compute SDK of your choice.

In a VCL service, the core cache interface is not available.

Interoperability

Whether you use the readthrough cache, simple cache or core cache interfaces, data is stored in the same namespace, but interoperability is currently limited to the following:

  • The core cache interface can read and overwrite objects inserted via the simple cache interface.
  • Simple cache can read (but cannot overwrite existing) objects inserted using core cache but provides only the body of the object.
  • The readthrough cache interface is not interoperable with other cache interfaces and cannot read data written through another interface, nor can it write data that is visible to other cache interfaces.
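
The rules above can be restated as a small lookup function, which may be handy as a mental checklist. The enum and function names are hypothetical, and the same-interface case is reported separately because the list only covers access across interfaces.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Interface {
    Readthrough,
    Simple,
    Core,
}

#[derive(Debug, PartialEq)]
enum Access {
    /// Reader and writer are the same interface (not covered by the list).
    SameInterface,
    ReadAndOverwrite,
    /// Can read the body only, and cannot overwrite the existing object.
    ReadBodyOnly,
    NoAccess,
}

/// What `reader` can do with an object that was inserted via `writer`.
fn cross_interface_access(reader: Interface, writer: Interface) -> Access {
    use Interface::*;
    match (reader, writer) {
        (a, b) if a == b => Access::SameInterface,
        (Core, Simple) => Access::ReadAndOverwrite,
        (Simple, Core) => Access::ReadBodyOnly,
        // The readthrough cache neither sees nor is seen by the others.
        _ => Access::NoAccess,
    }
}
```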

Interoperability also affects purging.

Limitations and constraints

The following limitations apply to all cache features:

  • Variants (created explicitly in the core cache interface or when the readthrough cache processes the Vary HTTP response header) are limited differently depending on platform.

    In VCL services, the number of variants is limited to 50 per cache object, regardless of the number of Vary rule permutations.

  • In the core cache interface, write operations target the primary storage node for that cache address only. If an existing object is overwritten, replicated copies may continue to be returned by subsequent reads until their TTL expires or the object is purged.

  • Purges are asynchronous while writes are synchronous, so performing a purge immediately before a write may result in a race condition in which the purge may clear the primary instance of the cached data after the write has completed.

Use of Fastly services is also subject to general limitations and constraints which are platform specific: for more information see separate limits for VCL services and for Compute services.