How Sleeping Duck survived Shark Tank
Since we launched Sleeping Duck in 2014, we've been actively trying to change the way the mattress industry works. Sleeping Duck came about because of our founders’ frustrations with the process of buying a mattress. Over the years we've worked hard to make our buying experience as smooth and risk-free as possible, nurturing every single person who's come through our online store. We even pioneered the 100-night trial that is now an industry standard for online mattress companies. In 2017, we had the unique opportunity to appear on Shark Tank — as the tech lead, it was my job to make sure our site would be ready for the attention we knew was coming our way. Here’s how we did just that.
Preparing for the Tank
We had just one technical requirement to prepare for Shark Tank: stay up while the show aired. Despite a ~60x increase in requests going through Fastly in the hour the episode aired, the site handled it well. The increase was sudden and dramatic: we went from ~100 req/min the minute before the episode aired to 48,600 requests the next minute. I am very happy we had Fastly in front of our servers to relieve the load. In this post, I’ll discuss what we did to prepare for appearing on Shark Tank, including:
Making sure we were caching as much as possible at the edge
Edge Dictionary-backed feature flags
Creating custom error pages in VCL
Implementing some common requests in AWS Lambda
Cache as much as possible
For Sleeping Duck, there is no way we could possibly replicate Fastly’s huge global network of super fast cache nodes. Serving cached HTML from Fastly means your app servers can just deal with the important things, like processing orders.
How to figure out what to cache
To find out which pages were not being cached, we turned to Fastly's log streaming to send logs to an S3 bucket. Each request logged the URL, the cache state (using the VCL variable fastly_info.state), and the time it took to process the request (using the VCL variable time.elapsed.usec). We coupled this with the awesome json_generate.vcl script to format the log lines as JSON, which made ingestion easier.
We then used an S3 AWS Lambda trigger to automatically insert the requests into a Postgres database for analytics. That said, this approach isn't limited to S3 and Lambda; using the built-in BigQuery log streaming will get you up and running even more easily.
The logging VCL looked like this:
include "json_generate.vcl";
sub vcl_log {
#FASTLY log
call json_generate_reset;
call json_generate_begin_object;
set req.http.value = "url";
call json_generate_string;
set req.http.value = req.url;
call json_generate_string;
set req.http.value = "cacheNode";
call json_generate_string;
set req.http.value = server.identity;
call json_generate_string;
set req.http.value = "method";
call json_generate_string;
set req.http.value = req.request;
call json_generate_string;
set req.http.value = "httpStatus";
call json_generate_string;
set req.http.value = resp.status;
call json_generate_number;
set req.http.value = "bytesWritten";
call json_generate_string;
set req.http.value = resp.bytes_written;
call json_generate_number;
set req.http.value = "fastlyState";
call json_generate_string;
set req.http.value = fastly_info.state;
call json_generate_string;
set req.http.value = "timestamp";
call json_generate_string;
set req.http.value = time.start;
call json_generate_string;
set req.http.value = "timeElapsed";
call json_generate_string;
set req.http.value = time.elapsed.usec;
call json_generate_number;
call json_generate_end_object;
log {"syslog [ID] S3 - Cache log :: "}req.http.json_generate_json;
}
Not only did we use these logs to find common URLs that were not being cached when they should have been, we also used the time.elapsed.usec variable to find slow requests and resp.status to find failing requests.
Make sure your cache hash key is perfect
At Sleeping Duck, we don't use query parameters to change the output of a page, so we ignore them when calculating the cache hash. This allows us to use query params like UTM tags while maintaining an average cache hit ratio (CHR) of 92.97% (this work paid off, as we had a truly amazing 99.68% CHR during Shark Tank).
sub vcl_hash {
  set req.hash += req.url.path;
  set req.hash += req.http.host;
  #FASTLY hash
  return(hash);
}
Cache "dynamic" things with a short TTL
If you are expecting 100+ requests per minute, setting a TTL of 1 minute on requests that are slightly more dynamic means you'll only send one request per minute per edge POP to your origin, and the data still stays reasonably fresh.
At Sleeping Duck, we have a couple of endpoints that pull in blog posts, CMS pages, reviews, etc. We cache those for 1 minute, and that helps keep the requests to the origin down.
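As a rough sketch, this kind of short-TTL caching can be set in vcl_fetch (the paths below are illustrative placeholders, not our real endpoints):
sub vcl_fetch {
#FASTLY fetch
  # Hypothetical "slightly dynamic" endpoints: blog posts, CMS pages, reviews
  if (req.url.path ~ "^/api/(blog|cms-pages|reviews)") {
    # At most one origin fetch per minute per POP; data is never more than a minute old
    set beresp.ttl = 60s;
  }
  return(deliver);
}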
Use stale_while_revalidate and stale_if_error
Even with all the planning and pre-scaling in the world, things can go wrong and backends go down. Luckily, you can configure the stale_if_error directive to tell Fastly to serve a stale cached copy if the backend throws errors. This is a perfect way to fall back on cache and serve stale content rather than nothing at all or an error page.
In the same vein, you can configure the stale_while_revalidate directive to tell Fastly that you’d prefer a stale cached copy to be returned quickly while the updated copy is fetched asynchronously in the background.
Both of these configuration options are a trade-off between not serving any content and serving stale content. In some cases it's unacceptable to serve stale content, but in our case it's much better to serve fast, slightly outdated content than slow content or none at all.
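A minimal sketch of how these directives can be set in vcl_fetch (the durations here are illustrative, not our production values):
sub vcl_fetch {
#FASTLY fetch
  # Serve stale for up to 30s while a fresh copy is fetched in the background
  set beresp.stale_while_revalidate = 30s;
  # Keep stale copies usable for a day in case the origin starts erroring
  set beresp.stale_if_error = 86400s;

  # If the origin returns a 5xx and a stale copy exists, serve the stale copy
  if (beresp.status >= 500 && beresp.status < 600 && stale.exists) {
    return(deliver_stale);
  }
  return(deliver);
}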
Cache POST requests
At Sleeping Duck we use a group of POST requests that return data to render different pages of the site. For the requests that don't rely on or modify the session, we turned on a short TTL and added the POST body to the cache hash key. Ideally these would be GET requests, but we use a structured JSON body in the request. Because VCL only exposes a limited amount of the POST body, we audited each request we were turning caching on for to ensure we wouldn't go over that limit:
sub vcl_hash {
  set req.hash += req.url.path;
  set req.hash += req.http.host;
  if (req.postbody) {
    set req.hash += req.postbody;
  }
  #FASTLY hash
  return(hash);
}
We use a whitelist-based lookup in VCL for the paths we'd like to cache. Normally these requests also bundle up the current session state, but we set an HTTP header in vcl_recv to tell the backend that the response is going to be cached, so the backend doesn't return any session data that might end up in the cache. Just to be extra sure, we also strip cookies from the request as a failsafe. This means that even if the backend somehow leaks session data into the cache, it would be a brand new, fresh session without any PII:
if (table.lookup(feature_flags, "enable_action_cache") == "true"
    && table.lookup(cache_action_paths, req.url.path)
    && req.request == "POST"
    && req.postbody) {
  set req.http.X-Fastly-Cache = "true";
  set req.http.X-Fastly-Cache-Type = "action";
  unset req.http.Cookie;
  return(lookup);
}
On the backend, this looks like:
if (req.header('X-Fastly-Cache') !== 'true') {
  result.session = getSessionState();
}
Harden your origins
No matter how much of our content is cacheable, some requests were always going to reach our origin. This could be because people were hitting POPs that didn't have a cached copy, TTLs had expired, or they were simply purchasing a mattress.
While we spent a good amount of time focusing on leveraging Fastly as much as possible, it was also important to look at what happened when requests did need to go to the origin, and plan accordingly. Here are a couple of key things to consider.
How often does a page change?
For Sleeping Duck, the bulk of our pages are very static — they can only change when a deploy occurs. So to reduce the load on our app servers from generating the HTML for those simple pages, we generated HTML copies of the pages and uploaded them to S3 at deploy time. We then used a Fastly Edge Dictionary to look up the path in VCL and dynamically rewrite the origin to S3. That way we could always fail over to our app servers if S3 went down. We also moved a lot of smaller, less important requests (such as page view beacons) off to AWS Lambda.
Dedicate resources to the business critical requests
We grouped our requests into business-critical and non-business-critical, and isolated the origins that hosted each group. This meant:
We could scale the business-critical hosting up to a ridiculous size if needed
The non-business-critical requests wouldn't interfere with the business-critical requests, and
If the business-critical requests had issues, we could share the load between the two environments.
Be able to tweak things quickly
At Sleeping Duck we use a "feature flag" Edge Dictionary that can turn different features on and off and change how requests are routed. This means that if one origin starts acting strangely, we can simply turn it off. This was inspired by the blog post How to solve anything in VCL, part 1: collecting data at the edge.
We built a very basic UI on top of our feature flag Edge Dictionary to allow us to tweak any of the settings we want:
if (table.lookup(feature_flags, "enable_static_frontend") == "true"
    && table.lookup(static_path_map, req.url.path)
    && (req.request == "GET" || req.request == "HEAD")) {
  set req.url = "/assets.sleepingduck.com/ludwig-static-frontend" table.lookup(static_path_map, req.url.path);
  set req.backend = F_S3___Assets_Website;
  set req.http.host = "s3-ap-southeast-2.amazonaws.com";
  [SNIP]
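The snippet above rewrites eligible requests to the static copies on S3. Falling back to the app servers when S3 misbehaves can be handled with a restart in vcl_fetch. The following is only a sketch: the header name and condition are illustrative, not our exact implementation.
sub vcl_fetch {
#FASTLY fetch
  # Hypothetical failover: if the static copy on S3 errors, retry the request
  # against the normal app-server backend (vcl_recv would need to skip the
  # S3 rewrite when it sees the X-Skip-Static-Frontend header)
  if (req.backend == F_S3___Assets_Website && beresp.status >= 500 && req.restarts == 0) {
    set req.http.X-Skip-Static-Frontend = "true";
    restart;
  }
  return(deliver);
}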
Have a backup plan
We wanted to make sure that even if everything behind Fastly fell apart, we could still show our potential customers something. To do this, we created a "down page": a self-contained HTML document embedded in our custom VCL that can be enabled by toggling a feature flag. Yes, we have an entire HTML document in our VCL to fail over to, just in case. We can also completely override the contents of the down page using another key in our feature flag Edge Dictionary.
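A rough sketch of how a down page like this can be wired up with a synthetic response (the flag name and markup below are placeholders, not our real ones):
sub vcl_recv {
#FASTLY recv
  # If the down-page flag is flipped, short-circuit straight to the synthetic page
  if (table.lookup(feature_flags, "enable_down_page") == "true") {
    error 600 "Down page";
  }
}

sub vcl_error {
#FASTLY error
  if (obj.status == 600) {
    set obj.status = 503;
    set obj.response = "Service Unavailable";
    set obj.http.Content-Type = "text/html; charset=utf-8";
    synthetic {"<!DOCTYPE html><html><body><h1>We'll be back shortly</h1></body></html>"};
    return(deliver);
  }
}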
Going forward
In the future, I see us moving more and more logic to the edge. We also serve the European market, so having as much logic as possible near our customers will create a much faster and better experience for those users. I also expect we will look at using Fastly's edge image optimizer along with image sets to serve smaller, device-specific images. And given that Fastly can respond to most requests from the cache, or from S3 for static content, we're exploring moving all our dynamic requests to AWS Lambda to reduce hosting costs by paying only per request.