This document describes a set of experimental features. These features are subject to change, but we will only change them when we have a good reason to, and we will flag any changes clearly in the Changelog.
The features described in this document let you collect custom
statistics in Varnish through VCL and VMOD stat. These statistics help you
understand what’s happening inside Varnish and make informed decisions
about cache behavior.
You can track:

- Events, such as cache hits and misses (event observatories)
- Amounts, such as byte counts (amount observatories)
- Durations, such as backend response times (duration observatories)

All statistics use time-weighted averages that give more importance to recent data while still considering historical trends. This allows you to compare short-term behavior (last few minutes) against long-term averages (last hour or day). See the technical details section at the end for more information on how this works.
The event observatory, described in the VMOD stat
documentation,
lets you track events in your VCL and analyze patterns over time.
For example, you can create an observatory object for VCL misses and
register an event every time Varnish enters sub vcl_miss. By doing
the same for hit, pass and synth, you can monitor how requests
are being handled and potentially adjust cache behavior based on
recent patterns.
To read the statistics, use the .get_ewa(INT i) function, which
returns the number of events per second averaged over a time window.
The parameter i selects the time window (half-life):
- i=0: 10 seconds (very recent activity)
- i=1: 60 seconds (last minute)
- i=2: 5 minutes
- i=3: 1 hour
- i=4: 24 hours (long-term trends)

You can also call stat.get_observatory_half_time(INT i) to get the
exact half-time value for each index.
The amount observatory is similar to the event observatory, but tracks quantities like byte counts instead of discrete events.
You can use it to calculate averages of amounts you register. For example, you could track the average size of JPEG files from the backend by registering the Content-Length value each time a JPEG is served.
Like the event observatory, use .get_ewa(INT i) to read the
weighted average of registered amounts over different time windows
(see the event observatory section above for the meaning of parameter
i).
For the event and amount observatories, it is possible to ask Varnish
to create counters that can be read through varnishstat.
Duration observatories track time-based measurements and maintain histograms showing how durations are distributed. The primary use case is understanding backend response latency.
Unlike event and amount observatories, you cannot query duration
observatories directly from VCL. Instead, access the data through VMOD
stat, typically using the .prometheus_backend() function to
generate histograms for visualization in Grafana.
The function utils.backend_ttfb() (available from 6.0.14r9) can be
used with duration observatories to segment traffic and analyze
latency for different request types.
When you create a VMOD udo director (see the UDO
documentation), a
duration observatory is automatically created and maintained for you.
Each backend request’s time to first byte is automatically
registered without any VCL intervention.
This means UDO users can get backend latency histograms in Prometheus
and Grafana without writing any additional VCL code - just use VMOD
stat to export the metrics.
The data from UDO directors can be turned into Grafana graphs through PromQL queries. The following examples can serve as a starting point.
This query produces data suitable for a heat map of latencies:
sum by (lo) (varnish_udo_ttfb_interval{halftime="60"})
The above collapses the latencies of all UDO directors into one
graph, since it does not filter on the name label. In most setups,
you will also want to specify a name in the query, to consider only
the latencies of one director. Furthermore, if UDO directors are
nested, durations are registered at each level, so one request will
be counted more than once, which is probably not what you want.
A more complex query, which is similar but normalizes the values within each time interval:

(
  sum by (lo) (varnish_udo_ttfb_interval{halftime="60"})
)
/ ignoring(lo) group_left()
(
  sum without (lo) (
    sum by (lo) (varnish_udo_ttfb_interval{halftime="60"})
  )
)
It is also possible to use the “native” Prometheus histograms, and get estimated percentiles, like this:
histogram_quantile(0.5, sum by (le)
(rate(varnish_dur_udo_ttfb_bucket{instance="localhost:8088"}[5m])))
Multiple quantiles can be added to the same graph by repeating the query above with different values of the first parameter (0.5 corresponds to the median).
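For intuition, here is a minimal Python sketch of what histogram_quantile computes from cumulative (le) bucket counts, assuming linear interpolation within the bucket containing the quantile, which is how Prometheus documents the estimate. The bucket boundaries and counts below are made up for illustration.

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (le, cumulative_count) pairs sorted by upper
    bound `le`; the last entry should use float('inf'), mirroring
    Prometheus's +Inf bucket.
    """
    total = buckets[-1][1]
    rank = q * total  # position of the quantile in the cumulative counts
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if le == float('inf'):
                return prev_le  # quantile falls in the open-ended bucket
            # linear interpolation within this bucket
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le

# Hypothetical TTFB buckets in seconds: (upper bound, cumulative count)
buckets = [(0.05, 40), (0.1, 70), (0.25, 90), (0.5, 98), (float('inf'), 100)]
median = histogram_quantile(0.5, buckets)  # falls in the (0.05, 0.1] bucket
```

Note that the result is an estimate: its accuracy depends on how finely the bucket boundaries divide the latency range.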
A snapshot of the exponentially weighted histogram can be shown in a bar gauge using the query

varnish_udo_ttfb_interval{halftime="60"}

and then applying the "labels to fields" transformation to turn the lo label into a
field. This gives an administrator an instant view of the latency distribution.
This section explains the mathematical foundation behind the time-weighted averaging used in the observatory features. Most users can skip this section and use the features based on the practical descriptions above.
For any sequence of numbers/values and associated weights, it is possible to calculate a weighted average of the values in a natural way. Exponentially weighted averages are a special case where each observation has a weight based on how much time has passed since it occurred. Recent observations have high weights while older ones have progressively lower weights.
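This weighting can be sketched in a few lines of Python. This is a simplified model of the math, not VMOD stat's actual implementation: each observation's weight is 0.5 raised to its age divided by the half-life, and the average is the weight-normalized sum.

```python
def ewa(observations, half_life, now):
    """Exponentially weighted average of (timestamp, value) pairs.

    An observation's weight halves for every `half_life` seconds of
    age, so a value recorded one half-life ago counts half as much
    as a brand-new one.
    """
    num = den = 0.0
    for t, value in observations:
        weight = 0.5 ** ((now - t) / half_life)
        num += weight * value
        den += weight
    return num / den

# Two samples, one fresh and one exactly one half-life old:
obs = [(0.0, 100.0), (60.0, 400.0)]
avg = ewa(obs, half_life=60.0, now=60.0)
# fresh sample has weight 1, old sample weight 0.5:
# (1 * 400 + 0.5 * 100) / 1.5 = 300.0
```

The same formula covers both observatory types: for amount observatories the values are the registered amounts, and for event observatories each value is an event count.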
These averages are also known as EWMAs (Exponentially Weighted Moving Averages) and are widely used in computing, finance and other sciences.
The half-life (or half-time) for an average is defined as the time it takes for an observation’s weight to drop to half its original value. It is also the age difference needed for one observation to have half the weight of another. By convention, new observations are given weight 1.
When you read exponentially weighted averages with different half-lives, you can compare them to understand trends. For example, if the 5-minute average is significantly higher than the 1-hour average, you know recent activity has increased.
In Varnish, observations are accumulated in 10-second buckets for efficiency. All observations within the same bucket receive the same weight. This optimization keeps CPU cost very low while maintaining accuracy.
Because of this bucketing, the .get_ewa(INT i) function only
updates once every 10 seconds, and the value is based on completed
buckets (so up to 10 seconds of very recent data may not yet be
included).
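The bucketed update can be modeled as a simple recurrence: at each 10-second bucket boundary, the running average is decayed and the completed bucket's event rate is folded in. This is an illustrative sketch under the assumptions stated in the comments, not the actual Varnish code.

```python
BUCKET = 10.0  # seconds per bucket, matching the 10-second accumulation

def decay(half_life):
    # Per-bucket decay factor, chosen so that after half_life seconds
    # (half_life / BUCKET buckets) a bucket's weight has halved.
    return 0.5 ** (BUCKET / half_life)

def update(avg, weight_sum, bucket_count, half_life):
    """Fold one completed bucket's event count into the running average.

    Returns the new (events-per-second average, total weight). Tracking
    the weight sum keeps the average exact even before many buckets
    have accumulated.
    """
    d = decay(half_life)
    rate = bucket_count / BUCKET  # events per second in this bucket
    new_weight = weight_sum * d + 1.0
    new_avg = (avg * weight_sum * d + rate) / new_weight
    return new_avg, new_weight

avg = weight = 0.0
for _ in range(6):  # one minute of buckets at a steady 5 events/s
    avg, weight = update(avg, weight, bucket_count=50, half_life=60.0)
# avg is now ~5.0 events per second
```

Because the average only changes when a bucket completes, this model also shows why .get_ewa(INT i) updates at most once every 10 seconds.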
The function stat.get_observatory_half_time(INT i) returns the
half-time in seconds for indices i=0 through i=4, currently
returning 10s, 60s, 300s, 3600s and 86400s,
respectively.
The features described in this document are available in Varnish Enterprise 6.0.14r6 and later, but this documentation describes the current version and may not be accurate for older releases. See the Changelog for details on when each feature was introduced.