How to Win Black Friday
For the majority of the population, Thanksgiving and the rest of the holiday period are a time for family, feasting, and frivolity. For eCommerce managers, however, it's the busiest period of the year, with almost a quarter of annual online sales happening in a four-week window.
When eCommerce first started making its mark, the retail world was a very different place than the one we now know. Online sales were minimal compared to those of traditional brick-and-mortar, in part because only a small percentage of customers had easy access to the internet at the time. Customers had low expectations and were therefore very forgiving. In many ways, retailer websites simply functioned as a way to entice customers into the store to part with their holiday bonus on that incredibly large TV at 70% off.
Fast forward 10+ years and the world has changed dramatically. eCommerce has entered the mainstream, with around 5.8% of all sales occurring online, up from approximately 2.3% in 2005. Because of this, both retail managers and customers have massively higher expectations for eCommerce sites, in terms of both performance and functionality. Sites are now expected to serve as more than a brochure, instead offering access to customer reviews, a seamless purchasing experience, and much more.
Along with this increase in expectations comes a change in traffic volumes. Huge levels of growth have compounded year on year, leaving sites woefully underprepared for today's traffic loads. In 2004, 13.3 million customers visited eCommerce sites on Black Friday, an 11% increase on the previous year. Last year, the top 500 retail sites in the US alone had 193.8 million visitors; that's a 1358% (13x!) increase from 2004. Outside of the US, sites see similar traffic spikes on other days too (e.g., the day after Christmas for the traditional "Boxing Day" sales).
The traditional answer to coping with these massive spikes is to throw hardware at the problem, but capacity planning for a peak that only lasts four weeks of the year (no matter how important) is costly and wasteful. Adding servers gets expensive quickly, and there is no guarantee that the approach will work: with more servers come inherent scaling problems, like handling session failover.
So what’s another approach? Rather than scaling up the number of servers required, you can focus on making fewer calls to the servers that are already in place.
A first pass - legacy CDNs
Since the late 90s, when CDNs first appeared on the scene, the practice of pushing static assets such as images, and later CSS and JavaScript files, closer to the end user has become commonplace.
This helps in two ways. Firstly, it reduces the number of requests back to the server which in turn reduces the load. Secondly, when assets are stored closer to the user, they load much faster. This means that site pages feel more responsive, which results in a better experience for shoppers.
This approach can only go so far, however. For a start, static assets are comparatively easy to serve: they require little to no computation and don't tax servers the way that dynamic content built from calls to application and database servers does.
In addition, the assets are referenced by the page content, so the browser only discovers that it needs to fetch them once the page itself has been downloaded and parsed; the dynamic request for the page still has to go all the way back to the server.
Improving the solution - Fastly Dynamic Caching
While a page’s content may be dynamically generated, it probably doesn't change all that often. A small website with 100 visitors a minute, for example, could serve the same content 500 times even if that content changes every 5 minutes. Caching the generated content for that five-minute window would drop the number of requests to the server down to just one whilst still serving all 500 visitors. After the first request is made, each subsequent visitor would get the content straight from the cache: a huge decrease in server load and a much faster experience for the customer.
The benefits for larger websites with 1,000 or even 10,000 visitors each minute would be even greater: still the same five-minute window, but this time serving 5,000 or 50,000 customers with only a single request to the server.
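To make that arithmetic concrete, here's a minimal sketch of the numbers above (the traffic figures are the illustrative ones from this section, not measurements):

# One origin request per TTL window; every other visitor is served from cache.
TTL_MINUTES = 5

for visitors_per_minute in (100, 1_000, 10_000):
    served = visitors_per_minute * TTL_MINUTES  # visitors during one TTL window
    origin_requests = 1                         # the first miss fills the cache
    cache_hits = served - origin_requests
    print(f"{visitors_per_minute:>6}/min: {served:>6} visitors served, "
          f"{origin_requests} origin request, {cache_hits:>6} cache hits")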
But what about highly personalised dynamic content that will only receive a few hits before it changes? Even if caching only saves five round trips to the server, those savings add up to a faster, better experience. And because the added cost of caching data is incredibly small, it still makes sense to cache content even if it will only be served a handful of times.
In short, instead of thinking,
“This object only gets served a few times; why would I cache it?”
the question should be the other way around:
“This object gets served more than once; why wouldn’t I cache it?”
Retaining control - Instant Purging
There's a paradox inherent in caching: the longer an object is kept in the cache, the greater the benefit, but also the higher the chance that the object is out of date.
This is especially true for eCommerce sites during peak sales and special events. Catalogue items must be kept up to date as inventory changes: if an item goes out of stock or a promotion has ended, the pages representing that item need to be updated immediately, or the retailer risks the wrath of confused and angry consumers. The natural inclination is therefore to set a low TTL (Time to Live), but this defeats the point of the exercise; set a really low TTL and you lose most of the benefit of caching.
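As a sketch of what that low-TTL approach looks like at the origin (Flask is used here purely as a stand-in origin server; the route and the 60-second figure are illustrative assumptions):

from flask import Flask, make_response

app = Flask(__name__)

def render_product(product_id: int) -> str:
    # Stand-in for a real template render backed by database lookups.
    return f"<html><body>Product {product_id}</body></html>"

@app.route("/product/<int:product_id>")
def product_page(product_id: int):
    resp = make_response(render_product(product_id))
    # A low TTL: caches re-fetch every 60 seconds, so stale stock data ages
    # out quickly -- but so does most of the caching benefit.
    resp.headers["Cache-Control"] = "max-age=60"
    return resp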
To get around this problem, Fastly provides customers with several ways to update their content instantly and on demand.
The nuclear option is the "Purge All" command, which dumps everything on your site from the cache using a simple call to the Fastly API:
POST /service/myServiceID/purge_all
Fastly-Key: myKey
Accept: application/json
Whilst this works, purging everything on your site when you only want to change a limited amount of content is a little like using a sledgehammer to crack a nut. You can, of course, purge individual URLs, but this may not be ideal either: generally you'll want to invalidate several objects as a group, such as a product page along with all of the product's photos and any promotion or category page containing that product.
With this in mind, Fastly allows you to set a response header that tags content with one or more "surrogate keys." For example, the page and images for a given sweater could return:
200 OK
Last-Modified: Tue, 12 Nov 2013 23:48:53 GMT
Date: Wed, 13 Nov 2013 15:30:39 GMT
Surrogate-Key: category:knitwear product:5674 promotion:summersale
which contains surrogate keys tagging the sweater’s category (knitwear), its product code (5674), and the promotions with which it is associated (summersale). Using this information, you can purge just the affected content when a stock update touches that sweater:
POST /service/myServiceID/purge/product:5674
Fastly-Key: myKey
Accept: application/json
… or you can purge everything in the promotion at the same time:
POST /service/myServiceID/purge/promotion:summersale
Fastly-Key: myKey
Accept: application/json
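If you're driving these purges from code rather than issuing them by hand, the same API calls sketch out to something like this (a minimal example using Python's third-party requests library; the service ID and key are the placeholders from the examples above):

import requests

FASTLY_API = "https://api.fastly.com"
SERVICE_ID = "myServiceID"  # placeholder from the examples above
HEADERS = {"Fastly-Key": "myKey", "Accept": "application/json"}

def purge_all() -> dict:
    """The nuclear option: drop every cached object for the service."""
    r = requests.post(f"{FASTLY_API}/service/{SERVICE_ID}/purge_all", headers=HEADERS)
    r.raise_for_status()
    return r.json()

def purge_key(key: str) -> dict:
    """Invalidate every object tagged with the given surrogate key."""
    r = requests.post(f"{FASTLY_API}/service/{SERVICE_ID}/purge/{key}", headers=HEADERS)
    r.raise_for_status()
    return r.json()

if __name__ == "__main__":
    purge_key("product:5674")          # a stock update touched only the sweater
    purge_key("promotion:summersale")  # the promotion ended: purge the lot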
The best of both worlds - Using AJAX or ESI to mix content
Of course, caching content for anonymous visitors who haven't logged in is one thing, but you don't want the experience to be markedly worse for users who have logged in, particularly as they're the ones most likely to actually purchase things! Logged-in users are more likely to generate revenue on an eCommerce site, so their content needs to be more personalised, and that tailored content typically changes far more often than the content served to anonymous visitors.
Fastly makes it possible to cache content for each user individually by adding a particular cookie to the Cache Key. More often than not, though, the bits of the page that are user-specific (e.g., the shopping cart, a greeting) are limited to a very small subset of the page as a whole.
In that case, it's best to separate the user-specific content into its own request, which can be done in one of two ways. The first is an AJAX call to an uncached URL that returns all of the details you need. This can return either HTML or an API response in XML or JSON; either works fine. And while that content won't be cached, you'll still get the benefit of our edge servers being closer to your customers than your server, which reduces connection latency and makes the page feel more responsive, especially for mobile users.
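The server side of that AJAX call might look something like the sketch below (Flask again as a stand-in, with a hypothetical /api/cart route and cart lookup); the important detail is marking the response as private so that neither Fastly nor any other shared cache stores it:

from flask import Flask, jsonify, request

app = Flask(__name__)

def load_cart(session_id: str) -> list:
    # Stand-in for a real session/cart lookup.
    return [{"sku": "5674", "qty": 1}]

@app.route("/api/cart")
def cart():
    resp = jsonify(items=load_cart(request.cookies.get("session", "")))
    # "private" tells shared caches (including the CDN) not to store this
    # per-user response, while the page around it stays fully cacheable.
    resp.headers["Cache-Control"] = "private, no-store"
    return resp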
Alternatively, we support ESI (Edge Side Includes), which allows you to embed content from another URL directly into your content (in whatever format it exists in) using a tag like:
<esi:include src="/shopping_cart" />
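Putting that together, the cached page shell carries the ESI tag while the fragment itself stays uncacheable. Here's a minimal sketch (the routes and markup are illustrative, and ESI processing must also be switched on for the Fastly service):

from flask import Flask, make_response

app = Flask(__name__)

PAGE_SHELL = """<html><body>
  <h1>Knitwear</h1>
  <!-- Fastly replaces this tag with the fragment below at the edge: -->
  <esi:include src="/shopping_cart" />
</body></html>"""

@app.route("/category/knitwear")
def category():
    resp = make_response(PAGE_SHELL)
    resp.headers["Cache-Control"] = "max-age=300"  # shared shell: cache it
    return resp

@app.route("/shopping_cart")
def shopping_cart():
    # Stand-in for rendering the visitor's real cart contents.
    resp = make_response("<div>3 items in your cart</div>")
    resp.headers["Cache-Control"] = "private, no-store"  # per-user: never cached
    return resp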
Summing up
By caching all your content, you'll sidestep most of your capacity planning headaches and end up with a faster site that is more resilient to unexpected and seasonal spikes in traffic. Profits will improve thanks to reduced CAPEX (you'll need fewer servers to handle the added traffic), and the improved responsiveness will increase revenue as bounce rates decrease and conversions increase.
And with the peace of mind of a smoothly running site, you can relax and enjoy your holidays.