A shell script to refresh a predefined set of pages in varnish cache

I have the varnish cache TTL set for a very long time (500 days), so cached content practically never expires. Every time I update this blog, there are several pages I need to purge. The following script seems to do a good job of purging a list of pages and then immediately repopulating the cache with the updated versions. The next time a real visitor comes, there's no waiting around for the slow backend to generate the page; instead, the amazing varnish promptly serves a hot copy from its cache.

Script in action:

ak@loon:~$ v-purge-common 
-----------------------------
Purging old pages from cache
-----------------------------
0x2e105f00 1321650069.214849     0 	req.url ~ ^/$
0x2e111740 1321650069.354750     0 	req.url ~ ^/alain/$
0x2e118140 1321650069.498892     0 	req.url ~ ^/alain/feed/$
0x2e1183c0 1321650069.642832     0 	req.url ~ ^/alain/feed/atom/$
---------------------------------------
Populating cache with new page content
---------------------------------------
200 http://giantdorks.org/
200 http://giantdorks.org/alain/
200 http://giantdorks.org/alain/feed/
200 http://giantdorks.org/alain/feed/atom/

The script:

#!/bin/bash

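# varnishadm invocation: management address and shared secret file
cmd="sudo varnishadm -T 127.0.0.1:6082 -S /etc/varnish/secret"
# Site base URL, without a trailing slash (each page path below starts with /)
site="http://giantdorks.org"
# Paths to refresh, one per line
pages="
/
/alain/
/alain/feed/
/alain/feed/atom/
"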

echo -----------------------------
echo Purging old pages from cache
echo -----------------------------
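# purge.url takes a regex; anchor it so only the exact path matches.
# (purge.url/purge.list are Varnish 2.x commands; Varnish 3.0 renamed them ban.url/ban.list.)
for page in $pages; do
 $cmd purge.url "^$page\$" | sed '/^$/d'
 $cmd purge.list | head -n 1
done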

echo ---------------------------------------
echo Populating cache with new page content
echo ---------------------------------------
# $page already begins with a slash, so join without adding another one
for page in $pages; do
 curl -sL -w "%{http_code} %{url_effective}\n" "$site$page" -o /dev/null
done

13 Comments

  • 1. David Hadaller replies at 22nd November 2011, 9:11 am :

    Thanks, this is exactly what I’ve been looking for. Nice and simple too, great work! Too bad Varnish doesn’t have this feature built in (would be nice if it scheduled a refetch when something is purged or right before it expires), but nice that people like you share your experiences and code to solve the problem a different way!

    Any thoughts on how to handle prefetching a large number of pages when you don’t necessarily know their URLs? e.g. on your WordPress blog when you post a new post it would be nice to prefetch the post page as well.

    Thanks again!

  • 2. Alain Kelder replies at 22nd November 2011, 5:51 pm :

    Hi David,

    “Any thoughts on how to handle prefetching a large number of pages when you don’t necessarily know their URLs?”

    You could use something like this script [1] to just blindly spider a whole site, or build a list of URLs by pulling post/category/etc. data out of the DB.

    Not sure what your exact use case is, but if it’s page changes that occur due to normal content management, I’d use a WP plugin (or a Drupal module) that purges the cache whenever existing content changes. I’m using the WP varnish plugin on this site and it works great for comments. It doesn’t correctly purge all the pages I need, though, which is why I’m running some scripts via cron to do it for me, so I don’t have to shell into the server every time content changes. I’ve been meaning to figure out where it breaks, but haven’t had the time. I also don’t know if the Drupal/WordPress varnish modules prefetch after purge, but it’s probably easy enough to add.

    [1] http://giantdorks.org/alain/little-shell-script-to-recursively-check-a-site-for-broken-links/

  • 3. David Hadaller replies at 23rd November 2011, 9:44 am :

    Thanks for the ideas Alain! I have both WordPress and Drupal sites behind varnish (mainly Drupal at this point). The varnish module for Drupal seems to be doing an excellent job purging changed content, as Drupal 7 has a pluggable cache system and the varnish module fits in there nicely. What I’m finding is that when I manually clear the cache after making a change to the site, it would be nice to have a way to preload the cache again, and I think your script is a good part of the answer to that. Also, I’d like to make sure that when a page expires or is purged, it is reloaded automatically. Your suggestion of having the Drupal/WordPress module trigger this is a good one!

    One more question. I’m new to WordPress but wp supercache looks pretty awesome and does prefetching I believe. Is there a reason you are using varnish over this?

    Thanks!

  • 4. Alain Kelder replies at 23rd November 2011, 11:11 am :

    I’d tried WP Super Cache among others before switching to varnish cache and found many of the application specific caching solutions quite good.

    Probably the most compelling reason to use something like varnish cache over application specific solutions for me is that the same solution works for all sites, regardless of what CMS/framework they’re built on. If your cache TTL is sufficiently low and you do content management/administration over https (bypassing varnish entirely), you could even do without any cache purging plugins.

    Also, Apache can be very fast at serving static files when properly tuned, but probably never as fast as varnish [1], so varnish cache in front of Apache should be faster than WP Super Cache or other similar solutions [2].

    For this blog, given the minuscule traffic, varnish is not necessary and I would be just fine with something like WP Super Cache, but not at my day job, so I’m using varnish here primarily to gain experience with it.

    There are also other solutions besides varnish that might be better depending on the use case.

    At work, I never know what web application I’ll be asked to support next, so running Apache as the default backend makes life easier, as most web apps have been tested against Apache. Also, Drupal has a lot of momentum at Stanford and varnish cache has a lot of support and momentum within the Drupal community (in fact I first learned of it at DrupalCon SF), so going with varnish for the frontends was an easy sell because it can’t be half bad if whitehouse.gov uses it. 🙂

    [1] http://nbonvin.wordpress.com/2011/03/14/apache-vs-nginx-vs-varnish-vs-gwan/
    [2] http://cd34.com/blog/scalability/wordpress-cache-plugin-benchmarks/

  • 5. David Hadaller replies at 26th November 2011, 4:04 pm :

    Thanks Alain, great answer! I totally agree that varnish likely outperforms even Apache serving static files. It’s also nice to know you have something sitting in front of Apache shielding it from possible anonymous traffic spikes.

    Like you, I also learned of varnish through the Drupal community (PNW Drupal Summit 2010/2011 for me). I’m starting to learn WordPress for some entry-level jobs where the client needs a simple to use admin interface (Sorry Drupal, you just aren’t there yet!).

    My main motivation for page caching has been for sites with low traffic but high back-end page generation times (2-3 seconds), running Drupal. Yes I should probably do some digging with xhprof to trim down my code, but the ability to keep a fresh copy in varnish to serve up instantly is too tempting not to spend time on instead.

    I’ll play around with your scripts to help keep my varnish cache primed at all times. If I have any more thoughts or questions, you’ll hear from me! Thanks again!

  • 6. Alain Kelder is a Giant D… replies at 18th July 2012, 11:44 am :

    […] on the heels of the shell script to refresh a predefined set of regularly changing pages in varnish cache, comes the updated version of single page refresh script. The previous version didn’t […]

  • 7. ratha replies at 5th September 2013, 2:08 am :

    Hi Alain Kelder,
    Any thoughts on how to handle prefetching a large number of pages when you don’t necessarily know their URLs? Or can I use a sitemap to prefetch them, and if so, how?

  • 8. Alain Kelder replies at 18th September 2013, 8:00 am :

    @ratha

    Your question is the same as comment #1 above. So my answer is the same as well (see comment #2).
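
For the sitemap idea raised in comment #7, a minimal sketch might look like the following. It assumes the site publishes a plain (non-index) sitemap with one URL per `<loc>` element; `sitemap_urls` and `warm_urls` are hypothetical helper names, not part of the original script.

```shell
#!/bin/bash

# Hypothetical helper: print every <loc> URL found in sitemap XML read from stdin.
# Assumes each <loc>...</loc> pair sits on a single line and contains no XML
# entities such as &amp; (those would need decoding first).
sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's/^<loc>//' -e 's/<\/loc>$//'
}

# Hypothetical helper: fetch each URL read from stdin so varnish caches it,
# printing the status code and URL the same way the script above does.
warm_urls() {
  while read -r url; do
    curl -sL -w "%{http_code} %{url_effective}\n" "$url" -o /dev/null
  done
}
```

Usage, assuming the sitemap lives at /sitemap.xml (adjust for your site): `curl -s http://example.com/sitemap.xml | sitemap_urls | warm_urls`. Sitemap index files, which point at other sitemaps, are not handled by this sketch.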

  • 9. cherouvim replies at 30th September 2013, 4:03 am :

    Great post. Thanks.

  • 10. ratha replies at 9th October 2013, 1:07 am :

    Thanks for the reply.
    Now I have another issue.
    I’m using an AWS server.

    I have a load balancer in front of my two Varnish servers (varnish1, varnish2): varnish1 goes to backend1 and varnish2 goes to backend2.

    When I run your script above, the requests go through the load balancer, which forwards them sometimes to varnish1 and sometimes to varnish2. Is there any way to make the script send requests directly to the Varnish servers I specify?
    Thanks for your help.

  • 11. ratha replies at 9th October 2013, 9:07 pm :

    Just want to add more to #10.
    Let’s say I have a website, example.com, hosted on two separate servers, 1.3.5.7 and 2.4.6.8.

    To view each version of the site I would normally edit my /etc/hosts file to point to the IP address of the server I wish to query.

    What I am looking to do is use wget/curl to send the header requests to a specific IP address.

    ***The server is a shared hosting server, so many domains share the same IP address.

  • 12. Alain Kelder replies at 10th October 2013, 7:09 am :

    @ratha

    To purge the cache on multiple Varnish hosts, you could purge via HTTP instead of using varnishadm. Instead of sending a purge command to example.com, you would send it directly to your Varnish servers (1.3.5.7 and 2.4.6.8). For a specific example, see the “Purging via HTTP” section here: http://giantdorks.org/alain/exploring-methods-to-purge-varnish-cache/. There’s a shell script at the bottom of that section that will send the PURGE command to multiple Varnish Cache frontends.
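
For the repopulating half of the job, a rough sketch of hitting each Varnish frontend directly, bypassing the load balancer, could use curl's `--resolve` option. This pins the hostname to a given IP for one request, so the Host header stays correct on shared IPs without editing /etc/hosts. The IPs and hostname are the ones from comment #11; the page list, port 80, and the helper names `frontend_urls` and `warm_frontends` are assumptions for illustration.

```shell
#!/bin/bash

frontends="1.3.5.7 2.4.6.8"   # Varnish server IPs (from comment #11)
host="example.com"            # site hostname (from comment #11)
pages="/ /about/"             # hypothetical page list; replace with your own

# Print one "ip url" pair for every frontend/page combination.
frontend_urls() {
  local ip page
  for ip in $frontends; do
    for page in $pages; do
      echo "$ip http://$host$page"
    done
  done
}

# Fetch every page from every frontend. --resolve maps $host:80 to the given
# IP for that request only, so the request carries the normal Host header.
warm_frontends() {
  frontend_urls | while read -r ip url; do
    curl -s -o /dev/null -w "%{http_code} $ip %{url_effective}\n" \
      --resolve "$host:80:$ip" "$url"
  done
}
```

Calling `warm_frontends` prints one status line per frontend and page. Sending `-X PURGE` the same way would only work if the VCL is set up to handle PURGE requests, as described in the linked article.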

  • 13. ratha replies at 11th October 2013, 12:42 am :

    Thanks Alain, I hope this will fix my issue.
