Varnish Monitoring Tutorial

Introduction

There are multiple tools and metrics available to monitor a Varnish installation. This tutorial covers the important counters that help monitor the vital aspects of a Varnish installation.

Commands

The following commands give access to the logs, counters, and runtime state of a Varnish instance.

varnishlog

Write the log records already held in shared memory to a file, in raw grouping:

$ varnishlog -d -g raw -w tmp/varnishlog.raw

Slow client responses:

$ varnishlog -d -g request -q "Timestamp:Resp[2] > 1.0"

Slow backend responses:

$ varnishlog -d -g request -q "Timestamp:Beresp[2] > 1.0"

Requests in waiting list:

$ varnishlog -d -g request -q "Timestamp:Waitinglist[2] > 0.0"

Backend failures:

$ varnishlog -d -g request -q "RespStatus ~ '^5' or Timestamp:Resp[3] > 10.0 or Error" -R 100/1m

varnishstat

The varnishstat utility collects and displays the counters and metrics of a Varnish instance, accumulated since startup.

Run varnishstat once and exit:

$ varnishstat -1

varnishstat also accepts field filters, which can be applied as follows:

$ varnishstat -1 -f 'filter'

More information on varnishstat and how to use it is available in the varnishstat man page.
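The one-shot output lends itself to scripting. The following is a minimal sketch, assuming each line of `varnishstat -1` starts with the counter name followed by its value; the helper name counter_value is hypothetical and not part of Varnish.

```shell
# counter_value NAME
# Reads `varnishstat -1` style lines on stdin and prints the value
# (second column) of the counter called NAME.
# Assumption: columns are "name value rate description".
# Exits with status 1 if the counter is not present.
counter_value() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { if (!found) exit 1 }
    '
}
```

Usage, assuming varnishstat is on the PATH:

$ varnishstat -1 | counter_value MAIN.client_req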

varnishadm

The varnishadm utility establishes a CLI connection to varnishd (the Varnish daemon). The following commands are useful for troubleshooting a Varnish instance via varnishadm.

Collect Varnish parameters:

$ varnishadm -- param.show

Collect ban list:

$ varnishadm -- ban.list

Collect the most recent panic, if any:

$ varnishadm -- panic.show

Collect backend health:

$ varnishadm -- backend.list -p

Counters

MAIN COUNTERS (MAIN.*)

client_req

Number of parsable client requests received.

cache_hit

Number of cache hits.

cache_miss

Number of cache misses.
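cache_hit and cache_miss are usually read together as a hit ratio. The following is a minimal sketch that computes cache_hit / (cache_hit + cache_miss) as a percentage from `varnishstat -1` style input; the function name hit_ratio is hypothetical, and this is only one common way to define the ratio.

```shell
# hit_ratio
# Reads `varnishstat -1` style lines on stdin and prints
# 100 * cache_hit / (cache_hit + cache_miss) with one decimal,
# or "n/a" when both counters are zero or missing.
hit_ratio() {
    awk '
        $1 == "MAIN.cache_hit"  { hit  = $2 }
        $1 == "MAIN.cache_miss" { miss = $2 }
        END {
            total = hit + miss
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * hit / total
        }
    '
}
```

Usage, assuming varnishstat is on the PATH:

$ varnishstat -1 | hit_ratio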

threads_limited

Number of times more threads were needed but the thread pool had already reached its limit.

n_object

Number of HTTP objects (headers + body, if present) in the cache.

bans

Number of all bans in the system, including bans superseded by newer bans and bans already checked by the ban-lurker.

fetch_failed

Number of backend content fetches that failed.

sess_queued

Number of sessions that were queued because no thread was immediately available. Consider increasing the thread_pool_min parameter.

sess_dropped

Number of sessions that were dropped because varnishd hit the maximum thread queue length. Consider increasing the thread_queue_limit parameter to drop fewer sessions.
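The thread-related counters above (threads_limited, sess_queued, sess_dropped) can be checked together. The following is a minimal sketch of such a check, reading `varnishstat -1` style input; the function name thread_pressure and the WARN output format are hypothetical.

```shell
# thread_pressure
# Reads `varnishstat -1` style lines on stdin and prints one WARN line
# for each of the three thread-related counters that is above zero.
# Exit status is 1 if any warning was printed, 0 otherwise.
thread_pressure() {
    awk '
        $1 == "MAIN.threads_limited" ||
        $1 == "MAIN.sess_queued" ||
        $1 == "MAIN.sess_dropped" {
            if ($2 > 0) { printf "WARN %s=%s\n", $1, $2; bad = 1 }
        }
        END { exit bad + 0 }
    '
}
```

Usage, assuming varnishstat is on the PATH:

$ varnishstat -1 | thread_pressure

Because the counters accumulate since startup, a non-zero value only shows that the condition occurred at some point, not that it is occurring now.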

exp_mailed

Number of objects mailed to expiry thread for handling.

exp_received

Number of objects received by expiry thread for handling.

threads

Total number of threads being used by Varnish.

n_lru_nuked

Number of least recently used (LRU) objects thrown out to make room for new objects. If this is zero, there is no reason to enlarge your cache. Otherwise, your cache is evicting objects due to space constraints. In this case, consider increasing the size of your cache.
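Since the counters accumulate from startup, a non-zero n_lru_nuked does not say whether evictions are still happening. One way to find out is to compare two snapshots. The following is a minimal sketch; the function name counter_delta and the snapshot file names are hypothetical.

```shell
# counter_delta NAME BEFORE_FILE AFTER_FILE
# Prints how much counter NAME grew between two saved
# `varnishstat -1` snapshots (value in AFTER minus value in BEFORE).
counter_delta() {
    awk -v name="$1" '
        # NR == FNR holds only while the first file is being read.
        $1 == name { if (NR == FNR) before = $2; else after = $2 }
        END { print after - before }
    ' "$2" "$3"
}
```

Usage, assuming varnishstat is on the PATH:

$ varnishstat -1 > before.txt
$ sleep 60
$ varnishstat -1 > after.txt
$ counter_delta MAIN.n_lru_nuked before.txt after.txt

A negative result would suggest that the counters were reset between the two snapshots, for example by a restart.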

MSE COUNTERS (MSE.*)

mse.c_bytes

Bytes allocated.

mse.c_freed

Bytes freed.

mse.g_alloc

Allocations outstanding.

mse.g_bytes

Bytes outstanding.

mse.g_space

Bytes available.

mse.insert_timeout

Number of inserts that timed out.

mse.n_lru_nuked

Number of LRU nuked objects.

mse.n_lru_moved

Number of LRU move operations.

mse.c_memcache_hit

Stored objects cache hits.

mse.c_memcache_miss

Stored objects cache misses.

mse.g_ykey_keys

Number of YKeys registered.

mse.c_ykey_purged

Number of objects purged with YKey.

SMA COUNTERS (SMA.*)

g_bytes

Number of bytes allocated from the storage.

g_space

Number of bytes left in the storage.
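g_bytes and g_space together give the fill level of a storage. The following is a minimal sketch that computes g_bytes / (g_bytes + g_space) as a percentage; the function name storage_used_pct is hypothetical, and the prefix SMA.s0 in the usage line is an assumption about how the storage is named on a given installation.

```shell
# storage_used_pct PREFIX
# Reads `varnishstat -1` style lines on stdin and prints the share of
# the storage identified by PREFIX that is in use, with one decimal,
# or "n/a" if neither counter is found.
storage_used_pct() {
    awk -v p="$1" '
        $1 == (p ".g_bytes") { used = $2 }
        $1 == (p ".g_space") { free = $2 }
        END {
            total = used + free
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * used / total
        }
    '
}
```

Usage, assuming varnishstat is on the PATH and the storage is named s0:

$ varnishstat -1 | storage_used_pct SMA.s0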

Conclusion

These counters are a good starting point for monitoring the health and performance of Varnish. Many additional counters are available, however.

More detailed information on the predefined counters can be found in the varnish-counters man page:

$ man varnish-counters