Several tools and metrics are available for monitoring a Varnish installation. This tutorial describes the counters that matter most when monitoring the vital aspects of a Varnish installation.
The commands below give access to additional information.
Write the log records already held in memory to a file in raw (ungrouped) form:
$ varnishlog -d -g raw -w tmp/varnishlog.raw
Slow client responses:
$ varnishlog -d -g request -q "Timestamp:Resp[2] > 1.0"
Slow backend responses:
$ varnishlog -d -g request -q "Timestamp:Beresp[2] > 1.0"
Requests in waiting list:
$ varnishlog -d -g request -q "Timestamp:Waitinglist[2] > 0.0"
Backend failures:
$ varnishlog -d -g request -q "RespStatus ~ '^5' or Timestamp:Resp[3] > 10.0 or Error" -R 100/1m
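The three "slow" queries above share one shape: `Timestamp:<event>[2] > <seconds>`. As far as the vsl(7) description of Timestamp records goes, field 2 is the time elapsed since the start of the work unit, so the threshold applies to the total time up to that event. A minimal sketch of a helper that builds such a query for any event and threshold follows; the function name `slow_query` is hypothetical and not part of Varnish.

```shell
# Build a VSL query that matches transactions whose <event> timestamp
# was reached more than <seconds> after the start of the work unit.
# slow_query is an illustrative helper, not a Varnish command.
slow_query() {
    event=$1      # Timestamp label, e.g. Resp, Beresp, Waitinglist
    seconds=$2    # threshold in seconds, e.g. 1.0
    printf 'Timestamp:%s[2] > %s' "$event" "$seconds"
}

# Example use (requires a running Varnish instance):
#   varnishlog -d -g request -q "$(slow_query Resp 0.5)"
```

Keeping the threshold in one place makes it easier to tighten or relax it while investigating a specific incident.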
The varnishstat utility displays the counters and metrics that a Varnish instance has accumulated since startup.
Run varnishstat once and exit:
$ varnishstat -1
varnishstat also accepts filters that can be applied as follows:
$ varnishstat -1 -f 'filter'
More information on varnishstat and how to use it is available in the varnishstat man page.
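For scripting, it is often enough to pull a single number out of the one-shot output. The sketch below assumes the usual `varnishstat -1` layout, in which each line starts with the full counter name followed by its current value; the helper name `counter_value` is hypothetical.

```shell
# Print the current value of one counter, reading `varnishstat -1`
# style output from stdin. Exits non-zero if the counter is absent.
# Assumes column 1 is the counter name and column 2 is its value.
counter_value() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END        { if (!found) exit 1 }'
}

# Example use (requires a running Varnish instance):
#   varnishstat -1 -f MAIN.cache_hit | counter_value MAIN.cache_hit
```

Passing a `-f` filter to varnishstat keeps the output small, but the helper also works on the unfiltered output.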
The varnishadm utility establishes a CLI connection to varnishd (the Varnish daemon). The following commands are useful when troubleshooting a Varnish instance via varnishadm:
Collect Varnish parameters:
$ varnishadm -- param.show
Collect ban list:
$ varnishadm -- ban.list
Collect the last panic message:
$ varnishadm -- panic.show
Collect backend health:
$ varnishadm -- backend.list -p
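When gathering data for a support case, it can help to capture all four outputs in one go. The following is a minimal sketch under stated assumptions: the function name `collect_varnishadm`, the output file names, and the `VARNISHADM` override variable are all hypothetical conveniences, and only the four varnishadm commands themselves come from the list above.

```shell
# Save the output of the four varnishadm commands above into one
# directory, one file per command. VARNISHADM may point at a different
# binary (illustrative override; defaults to varnishadm on PATH).
collect_varnishadm() {
    outdir=$1
    adm=${VARNISHADM:-varnishadm}
    mkdir -p "$outdir" || return 1
    # stderr is captured too, so a command that fails still leaves
    # its message in the corresponding file.
    "$adm" -- param.show      > "$outdir/param.show.txt"   2>&1
    "$adm" -- ban.list        > "$outdir/ban.list.txt"     2>&1
    "$adm" -- panic.show      > "$outdir/panic.show.txt"   2>&1
    "$adm" -- backend.list -p > "$outdir/backend.list.txt" 2>&1
    return 0
}

# Example use (requires access to the varnishd CLI):
#   collect_varnishadm /tmp/varnish-diag
```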
client_req
Number of parsable client requests received.
cache_hit
Number of cache hits.
cache_miss
Number of cache misses.
threads_limited
Number of times more threads were needed, but limit was reached in a thread pool.
n_object
Number of HTTP objects (headers + body, if present) in the cache.
n_lru_nuked
How many objects have been forcefully evicted from storage to make room for a new object.
bans
Number of all bans in the system, including bans superseded by newer bans and bans already checked by the ban-lurker.
fetch_failed
Backend content fetches failed.
sess_queued
Number of sessions that were queued because no thread was immediately available. If this counter keeps growing, consider increasing the thread_pool_min parameter.
sess_dropped
Number of sessions dropped because varnishd reached the maximum thread queue length. To drop fewer sessions, consider increasing the thread_queue_limit parameter.
exp_mailed
Number of objects mailed to expiry thread for handling.
exp_received
Number of objects received by expiry thread for handling.
threads
Total number of threads being used by Varnish.
n_lru_nuked
Number of least recently used (LRU) objects thrown out to make room for new objects. If this is zero, there is no reason to enlarge your cache. Otherwise, your cache is evicting objects due to space constraints. In this case, consider increasing the size of your cache.
mse.c_bytes
Bytes allocated.
mse.c_freed
Bytes freed.
mse.g_alloc
Allocations outstanding.
mse.g_bytes
Bytes outstanding.
mse.g_space
Bytes available.
mse.insert_timeout
Number of inserts that timed out.
mse.n_lru_nuked
Number of LRU nuked objects.
mse.n_lru_moved
Number of LRU move operations.
mse.c_memcache_hit
Stored objects cache hits.
mse.c_memcache_miss
Stored objects cache misses.
mse.g_ykey_keys
Number of YKeys registered.
mse.c_ykey_purged
Number of objects purged with YKey.
g_bytes
Number of bytes allocated from the storage.
g_space
Number of bytes left in the storage.
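The g_bytes and g_space pair can be combined into a fill percentage for a storage backend. The sketch below assumes that both counters appear in `varnishstat -1` output under a common prefix (for example `SMA.s0` for a default malloc storage; the prefix on your installation may differ) and that used plus free bytes approximates the storage size. The helper name `storage_usage` is hypothetical.

```shell
# Print how full a storage backend is, as a percentage, reading
# `varnishstat -1` style output from stdin.
# $1 is the counter prefix, e.g. SMA.s0 (an assumption; check the
# names reported by your own varnishstat).
storage_usage() {
    awk -v p="$1" '
        $1 == (p ".g_bytes") { used = $2 }
        $1 == (p ".g_space") { avail = $2 }
        END {
            total = used + avail
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * used / total
        }'
}

# Example use (requires a running Varnish instance):
#   varnishstat -1 | storage_usage SMA.s0
```

A storage that stays close to 100% together with a rising n_lru_nuked is the situation in which enlarging the cache is worth considering.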
These counters give a good picture of the health and performance of Varnish. Many additional counters are available, however.
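As one example of putting the counters to work, cache_hit and cache_miss can be combined into a hit ratio. The sketch below uses the common simplification hit / (hit + miss), which is an assumption on my part and ignores other request outcomes; because the counters accumulate since startup, the result is a lifetime average rather than a current rate. The helper name `hit_ratio` is hypothetical.

```shell
# Print the cache hit ratio as a percentage, reading `varnishstat -1`
# style output from stdin. Uses hit / (hit + miss), a simplification.
hit_ratio() {
    awk '
        $1 == "MAIN.cache_hit"  { hit = $2 }
        $1 == "MAIN.cache_miss" { miss = $2 }
        END {
            total = hit + miss
            if (total == 0) { print "n/a"; exit }
            printf "%.2f\n", 100 * hit / total
        }'
}

# Example use (requires a running Varnish instance):
#   varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss | hit_ratio
```

To track the ratio over a shorter window, sample both counters twice and apply the same formula to the differences.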
More detailed information on the predefined counters can be found in the varnish-counters man page.
$ man varnish-counters