This document describes a set of experimental features. These features are subject to change, but we will only change them when we have a good reason to, and we will flag any changes clearly in the Changelog.
The features described in this document let you collect custom
statistics in Varnish through VCL and VMOD stat. These statistics help you
understand what’s happening inside Varnish and make informed decisions
about cache behavior.
You can track:

- Events, such as cache hits and misses (event observatories)
- Amounts, such as byte counts (amount observatories)
- Durations, such as backend response times (duration observatories)

All statistics use time-weighted averages that give more importance to recent data while still considering historical trends. This allows you to compare short-term behavior (last few minutes) against long-term averages (last hour or day). See the technical details section at the end for more information on how this works.
The event observatory, described in the VMOD stat
documentation,
lets you track events in your VCL and analyze patterns over time.
For example, you can create an observatory object for VCL misses and
register an event every time Varnish enters sub vcl_miss. By doing
the same for hit, pass and synth, you can monitor how requests
are being handled and potentially adjust cache behavior based on
recent patterns.
To read the statistics, use the .get_ewa(INT i) function, which
returns the number of events per second averaged over a time window.
The parameter i selects the time window (half-life):
- i=0: 10 seconds (very recent activity)
- i=1: 60 seconds (last minute)
- i=2: 5 minutes
- i=3: 1 hour
- i=4: 24 hours (long-term trends)

You can also call stat.get_observatory_half_time(INT i) to get the
exact half-time value for each index.
The amount observatory is similar to the event observatory, but tracks quantities like byte counts instead of discrete events.
You can use it to calculate averages of amounts you register. For example, you could track the average size of JPEG files from the backend by registering the Content-Length value each time a JPEG is served.
Like the event observatory, use .get_ewa(INT i) to read the
weighted average of registered amounts over different time windows
(see the event observatory section above for the meaning of parameter
i).
For the event and amount observatories, it is possible to ask Varnish
to create counters that can be read through varnishstat.
Duration observatories track time-based measurements and maintain histograms showing how durations are distributed. The primary use case is understanding backend response latency.
Unlike event and amount observatories, you cannot query duration
observatories directly from VCL. Instead, access the data through VMOD
stat, typically using the .prometheus_backend() function to
generate histograms for visualization in Grafana.
The function utils.backend_ttfb() (available from 6.0.14r9) can be
used with duration observatories to segment traffic and analyze
latency for different request types.
When you create a VMOD udo director (see the UDO
documentation), a
duration observatory is automatically created and maintained for you.
Each backend request’s time to first byte is automatically
registered without any VCL intervention.
This means UDO users can get backend latency histograms in Prometheus
and Grafana without writing any additional VCL code - just use VMOD
stat to export the metrics.
The data from UDO directors can be turned into Grafana graphs through PromQL queries. The following examples can serve as a starting point.
This query produces data suitable for a heat map of latencies:
sum by (lo) (varnish_udo_ttfb_interval{halftime="60"})
The above collapses the latencies of all UDO directors into one
graph, since it does not filter on the name label. In most setups,
you will also want to specify a name in the query, to consider only
the latencies of one director. Furthermore, if UDO directors are
nested, durations are registered at each level, so one request will
be counted more than once, which is probably not what you want.
A more complex query, which is similar but normalizes the values within each time interval:

(
  sum by (lo) (varnish_udo_ttfb_interval{halftime="60"})
)
/ ignoring(lo) group_left()
(
  sum without (lo) (
    sum by (lo) (varnish_udo_ttfb_interval{halftime="60"})
  )
)
It is also possible to use the “native” Prometheus histograms, and get estimated percentiles, like this:
histogram_quantile(0.5, sum by (le)
(rate(varnish_dur_udo_ttfb_bucket{instance="localhost:8088"}[5m])))
Multiple quantiles can be added to the same graph by repeating the query above with different values of the first parameter (0.5 corresponds to the median).
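For intuition, here is a minimal Python sketch of what histogram_quantile computes from cumulative (le) bucket counts, assuming linear interpolation within the bucket containing the quantile, which is how Prometheus documents the estimate. The bucket boundaries and counts below are made up for illustration.

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (le, cumulative_count) pairs sorted by upper
    bound `le`; the last entry should use float('inf'), mirroring
    Prometheus's +Inf bucket.
    """
    total = buckets[-1][1]
    rank = q * total  # position of the quantile in the cumulative counts
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if le == float('inf'):
                return prev_le  # quantile falls in the open-ended bucket
            # linear interpolation within this bucket
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le

# Hypothetical TTFB buckets in seconds: (upper bound, cumulative count)
buckets = [(0.05, 40), (0.1, 70), (0.25, 90), (0.5, 98), (float('inf'), 100)]
median = histogram_quantile(0.5, buckets)  # falls in the (0.05, 0.1] bucket
```

Note that the result is an estimate: its accuracy depends on how finely the bucket boundaries divide the latency range.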
A snapshot of the exponentially weighted histogram can be shown in a bar gauge using the query

varnish_udo_ttfb_interval{halftime="60"}

and then applying the "labels to fields" transformation to turn the lo label into a
field. This gives an administrator an instant view of the latency distribution.
This section explains the mathematical foundation behind the time-weighted averaging used in the observatory features. Most users can skip this section and use the features based on the practical descriptions above.
For any sequence of numbers/values and associated weights, it is possible to calculate a weighted average of the values in a natural way. Exponentially weighted averages are a special case where each observation has a weight based on how much time has passed since it occurred. Recent observations have high weights while older ones have progressively lower weights.
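This weighting can be sketched in a few lines of Python. This is a simplified model of the math, not VMOD stat's actual implementation: each observation's weight is 0.5 raised to its age divided by the half-life, and the average is the weight-normalized sum.

```python
def ewa(observations, half_life, now):
    """Exponentially weighted average of (timestamp, value) pairs.

    An observation's weight halves for every `half_life` seconds of
    age, so a value recorded one half-life ago counts half as much
    as a brand-new one.
    """
    num = den = 0.0
    for t, value in observations:
        weight = 0.5 ** ((now - t) / half_life)
        num += weight * value
        den += weight
    return num / den

# Two samples, one fresh and one exactly one half-life old:
obs = [(0.0, 100.0), (60.0, 400.0)]
avg = ewa(obs, half_life=60.0, now=60.0)
# fresh sample has weight 1, old sample weight 0.5:
# (1 * 400 + 0.5 * 100) / 1.5 = 300.0
```

The same formula covers both observatory types: for amount observatories the values are the registered amounts, and for event observatories each value is an event count.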
These averages are also known as EWMAs (Exponentially Weighted Moving Averages) and are widely used in computing, finance and other sciences.
The half-life (or half-time) for an average is defined as the time it takes for an observation’s weight to drop to half its original value. It is also the age difference needed for one observation to have half the weight of another. By convention, new observations are given weight 1.
When you read exponentially weighted averages with different half-lives, you can compare them to understand trends. For example, if the 5-minute average is significantly higher than the 1-hour average, you know recent activity has increased.
In Varnish, observations are accumulated in 10-second buckets for efficiency. All observations within the same bucket receive the same weight. This optimization keeps CPU cost very low while maintaining accuracy.
Because of this bucketing, the .get_ewa(INT i) function only
updates once every 10 seconds, and the value is based on completed
buckets (so up to 10 seconds of very recent data may not yet be
included).
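The bucketed update can be modeled as a simple recurrence: at each 10-second bucket boundary, the running average is decayed and the completed bucket's event rate is folded in. This is an illustrative sketch under the assumptions stated in the comments, not the actual Varnish code.

```python
BUCKET = 10.0  # seconds per bucket, matching the 10-second accumulation

def decay(half_life):
    # Per-bucket decay factor, chosen so that after half_life seconds
    # (half_life / BUCKET buckets) a bucket's weight has halved.
    return 0.5 ** (BUCKET / half_life)

def update(avg, weight_sum, bucket_count, half_life):
    """Fold one completed bucket's event count into the running average.

    Returns the new (events-per-second average, total weight). Tracking
    the weight sum keeps the average exact even before many buckets
    have accumulated.
    """
    d = decay(half_life)
    rate = bucket_count / BUCKET  # events per second in this bucket
    new_weight = weight_sum * d + 1.0
    new_avg = (avg * weight_sum * d + rate) / new_weight
    return new_avg, new_weight

avg = weight = 0.0
for _ in range(6):  # one minute of buckets at a steady 5 events/s
    avg, weight = update(avg, weight, bucket_count=50, half_life=60.0)
# avg is now ~5.0 events per second
```

Because the average only changes when a bucket completes, this model also shows why .get_ewa(INT i) updates at most once every 10 seconds.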
The function stat.get_observatory_half_time(INT i) returns the
half-time in seconds for indices i=0 through i=4, currently
returning 10s, 60s, 300s, 3600s and 86400s,
respectively.
The features described in this document are available in Varnish Enterprise 6.0.14r6 and later, but this documentation describes the current version and may not be accurate for older releases. See the Changelog for details on when each feature was introduced.