A general Varnish health check.

From the Varnish documentation it is quite clear that varnishstat gives a good picture of the general health of Varnish, including cache hit rate, uptime, number of failed backend connections and many other statistics. As a rule of thumb, a cache hit rate above 80% is a good state.
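To put a number on the hit rate, here is a minimal sketch that reads the one-shot output of `varnishstat -1` and computes hits / (hits + misses). It assumes the counters are named cache_hit and cache_miss, optionally prefixed with MAIN. as in newer releases; check the names on your version. hit_rate is a hypothetical helper name, not part of Varnish.

```shell
# Hypothetical helper: reads `varnishstat -1` output on stdin and prints
# the cache hit rate as a percentage.
# Assumption: the counter name is in column 1 and its value in column 2,
# and the counters are called cache_hit / cache_miss (with or without "MAIN.").
hit_rate() {
  awk '
    $1 ~ /(^|\.)cache_hit$/  { hit  = $2 }
    $1 ~ /(^|\.)cache_miss$/ { miss = $2 }
    END {
      total = hit + miss
      if (total > 0) printf "%.1f\n", 100 * hit / total
      else print "n/a"
    }'
}

# On a live server (assumes varnishstat is on the PATH):
# varnishstat -1 | hit_rate
```

The counters are cumulative since Varnish started, so the result is a long-run average and not the current rate.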

I also like to use another command, varnishncsa.
A practical example:

varnishncsa -d | cut -f 9 -d ' ' | sort | uniq -c | sort -n

which will give something like

3 302
3 500
100 206
584 404
1455 301
1746 304
6713 200

OK, but what does it mean? Each line is a count followed by an HTTP status code: here, for example, 584 requests ended in 404 (not found) and 6713 were served successfully with 200.
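For a coarser view, the same log can be grouped by status class (2xx, 3xx, 4xx, 5xx). This is only a sketch: it assumes the status code is the 9th space-separated field, as in the command above, and status_classes is a hypothetical helper name.

```shell
# Hypothetical helper: reads NCSA-style log lines on stdin and prints
# one "count class" line per status class, e.g. "584 4xx".
# Assumption: the status code is field 9, as in the cut command above.
status_classes() {
  awk '
    $9 ~ /^[1-5][0-9][0-9]$/ { n[substr($9, 1, 1) "xx"]++ }
    END { for (c in n) print n[c], c }
  ' | sort -k2
}

# varnishncsa -d | status_classes
```

With the counts listed above, this would report 6813 for 2xx, 3204 for 3xx, 584 for 4xx and 3 for 5xx.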

Then the next question: who is the culprit behind these 404s? Let’s check.

The following works, but it returns too much information:

varnishncsa -d | grep 404

like:

- - [18/Feb/2013:20:57:23 +0530] "GET http://example.com/404/?requestedPage=http%3a%2f%2fexample.com%2fedition%2frssSectionXml.aspx HTTP/1.1" 200 38924 "-" "Java/1.6.0_31"
- - [18/Feb/2013:20:57:23 +0530] "GET http://example.com/edition/rssSectionXml.aspx?SectionId=124 HTTP/1.1" 404 7796 "-" "MyRadio/1.2 CFNetwork/609 Darwin/13.0.0"

The difference between the two lines is that the second is a genuine page not found for a URL that is supposed to be there, while in the first the system is doing a GET towards its own server to render the 404 page for a specific URL. That request returns 200 OK and only matches because its URL contains "404", so lines like it can be ignored.
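Lines like the first one appear because grep matches the string 404 anywhere in the line. A minimal sketch that compares the status field instead, so that only real 404 responses get through. It assumes the status code is the 9th space-separated field, which holds when each line starts with the client address (the sample lines above seem to have it stripped). only_status is a hypothetical helper name.

```shell
# Hypothetical helper: keep only the log lines whose status field equals $1.
# Assumption: the status code is field 9 of the space-separated line.
only_status() {
  awk -v code="$1" '$9 == code'
}

# varnishncsa -d | only_status 404
```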

OK, that is too much information. Let’s make it neat and look at the URLs only.

varnishncsa -d | grep 404 | cut -f 7 -d ' '
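To find the worst offenders, the URLs can be counted and ranked in the same way as the status codes earlier. This is a sketch under the same field-layout assumption (URL in field 7, status in field 9); top_404_urls is a hypothetical helper name.

```shell
# Hypothetical helper: reads NCSA-style log lines on stdin and prints the
# URLs that returned 404, each with its count, most frequent first.
# Assumption: URL is field 7 and status is field 9, as in the cut commands.
top_404_urls() {
  awk '$9 == 404 { print $7 }' | sort | uniq -c | sort -rn
}

# varnishncsa -d | top_404_urls | head
```

Matching on the status field, not on the whole line, also drops the internal requests for the 404 page itself, since those return 200.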