Boost Cache Efficiency with Origin Log Analysis
Here's a short tip: analyze the logs on your origin servers.
If you want to increase the efficiency of your Varnish (or Fastly) cache, you need to figure out what traffic is not cached. By definition, any traffic that reaches your origin was not served from cache, and is thus worthy of investigation. If it reaches your origin, it's either a miss or a pass. You never want misses, and while some passes are intentional, many are not. You don't want a cookie to cause static files to be passed instead of cached, for instance.
Just generating a top 100 list of URLs requested on your origin goes a long way towards finding quick wins in the form of URLs you assumed were cached, but in reality are not. The common causes, and how to solve them, will be covered in next month's Varnish tip.
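If you'd rather not set up a full log analyzer first, a few lines of Python are enough for a rough top 100. The sketch below assumes your origin writes access logs in the Common or Combined Log Format, where the request line appears in double quotes; the function name top_urls is just a placeholder for illustration, and the regular expression will need adjusting for other log formats.

```python
import re
from collections import Counter

# Assumption: Common/Combined Log Format, where the request line looks like
# "GET /some/path HTTP/1.1". Adjust the pattern for other formats.
REQUEST_RE = re.compile(
    r'"(?:GET|HEAD|POST|PUT|DELETE|PATCH|OPTIONS) (\S+) [^"]*"'
)

def top_urls(lines, n=100):
    """Return the n most requested URLs as (url, count) pairs."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

To try it, pass an open file object, for example top_urls(open("access.log")), where access.log stands in for wherever your origin keeps its logs. Lines that don't match the pattern are skipped silently, so it's worth checking that the total count roughly matches the number of lines in the file.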
Commonly used offline log analysis tools are AWStats and Piwik. There are also plenty of online (live) log analysis tools like Logentries and Splunk.
One caveat concerning the top 100 approach is randomized cache busters. Used well, cache busters can save a lot of work, but if they're completely random, they'll wreak havoc on your cache. And because every randomized URL is unique and is requested only once, none of them will rank high enough to show up in a top 100 list. So don't look at just a single list, but explore the possibilities of your log analysis tools. Most will allow you to cut off query strings before the top 100 is compiled. Some might even allow you to whitelist only specific parameters, which could be useful if you're trying to cache a RESTful API, for instance.
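If your tool can't do this, the normalization is easy to sketch by hand. The example below uses Python's standard urllib.parse module to drop the query string entirely, or to keep only whitelisted parameters, before counting. The names normalize_url and count_normalized, and the parameter names in the usage note, are made up for illustration.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit

def normalize_url(url, allowed_params=()):
    """Strip the query string, keeping only whitelisted parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in allowed_params
    ]
    # Sort so that ?a=1&b=2 and ?b=2&a=1 are counted as the same URL.
    kept.sort()
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")

def count_normalized(urls, allowed_params=(), n=100):
    """Return the n most requested URLs after normalization."""
    counts = Counter(normalize_url(url, allowed_params) for url in urls)
    return counts.most_common(n)
```

With an empty whitelist, /app.js?cb=83741 and /app.js?cb=19422 both collapse to /app.js, so a randomized cache buster suddenly becomes visible as one heavily requested URL. With allowed_params={"page"}, a request for /api/items?page=2&_=1699 is counted as /api/items?page=2.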
Don't be afraid to immerse yourself in raw logs; sometimes the human brain trumps all other pattern detection.