Normalizing the Host Header
In the continued quest to increase cache hit ratios, the chant is: "Normalize, normalize, normalize." Less variation in your requests means you have a higher chance of getting hits. This month's highlight is the Host header.
The default VCL that comes with Varnish can safely be put in front of a web server with multiple virtual hosts, since the Host header is used to determine the hash of each object. Because of this, https://www.example.com/main.html and https://example.com/main.html will be two separate objects in the cache.
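For reference, the built-in hashing logic looks roughly like the sketch below in Varnish 4 and later. Older releases end the subroutine differently, so treat this as an illustration and check the builtin.vcl shipped with your version.

```vcl
sub vcl_hash {
    # The URL always contributes to the cache key.
    hash_data(req.url);
    # So does the Host header, which is why www.example.com and
    # example.com end up as two separate objects.
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        # No Host header at all: fall back to the receiving IP address.
        hash_data(server.ip);
    }
    return (lookup);
}
```

Anything that changes req.http.Host in vcl_recv therefore changes the cache key as well, which is what the rest of this post relies on.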
Sometimes caching different objects on different hostnames is what you want. One could be the actual page, and the other could be a redirect to it. The redirect keeps your URLs the same everywhere and looks tidy. On the other hand, a redirect costs an extra round trip before the page loads, so you might prefer that every hostname variation gets the same response. In that case, you'd want all those variations of a URL to share just one object in your cache.
This is incredibly simple to do:
sub vcl_recv {
    set req.http.Host = "example.com";
}
Maybe you also have assets on css.example.com and img.example.com, for which you have separate virtual hosts on your web server. In that case, you can extend your VCL with a condition:
sub vcl_recv {
    if (req.http.Host == "www.example.com") {
        set req.http.Host = "example.com";
    }
}
Now only www.example.com is changed to example.com, and the other hostnames are left as they are.
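Keep in mind that the == comparison is an exact string match, so a request for WWW.Example.com or www.example.com:80 would slip past the condition and still get its own cache object. A minimal sketch of a stricter normalization follows, assuming Varnish 4 or later with the bundled std VMOD available. The lowercase and port-stripping steps are additions for illustration, not part of the original recipe.

```vcl
import std;

sub vcl_recv {
    # Only touch the header when the client actually sent one.
    if (req.http.Host) {
        # Hostnames are case-insensitive, so fold to lower case first.
        set req.http.Host = std.tolower(req.http.Host);
        # Drop an explicit port such as ":80" or ":8080".
        set req.http.Host = regsub(req.http.Host, ":[0-9]+$", "");
    }
    # Collapse the www alias onto the bare domain, as before.
    if (req.http.Host == "www.example.com") {
        set req.http.Host = "example.com";
    }
}
```

Whether stripping the port is safe depends on your backend: if a virtual host is selected by port as well as by name, leave that line out.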
A different use case might be that you have different language versions of your website at nl.example.com, en.example.com, and fr.example.com, while the image, JavaScript, and CSS files are the same for all of these sites.
In this case you only rewrite the Host header for those files, like so:
sub vcl_recv {
    # Language independent files
    if (req.url ~ "^/(img|css|js)/") {
        set req.http.Host = "nl.example.com";
    }
}
If you're feeling really fancy, you could replace nl.example.com with static.example.com. Whatever works best with your web server setup.
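If some of the image, CSS, or JavaScript files do have language-dependent parts, you could simply place those in separate subdirectories from the root, like /nl/css/ or /en/js/. Because the pattern above is anchored at the start of the URL, those paths do not match and keep their original Host header.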
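Putting the last two suggestions together, the rewrite might look like the sketch below. It assumes your web server has a virtual host named static.example.com that serves the shared /img/, /css/, and /js/ trees. That hostname is only the example used above, so substitute whatever your setup provides.

```vcl
sub vcl_recv {
    # Shared, language-independent files: one cache object for all sites.
    # The pattern is anchored with ^, so /nl/css/ or /en/js/ do not match
    # and keep the Host header the client sent.
    if (req.url ~ "^/(img|css|js)/") {
        set req.http.Host = "static.example.com";
    }
}
```

With this in place, requests for /css/site.css on nl.example.com and on en.example.com (a hypothetical file name) share a single cached object, while /en/js/app.js on en.example.com is still cached under its own hostname.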
As an aside, if you do use multiple hostnames for the same content, remember to use canonical links to help search engines determine which is the preferred hostname. See this canonical URL article for more information, and this blog post for pitfalls to avoid.