Varnish Custom Statistics

Use cases

Here we will go through some of the typical use cases for VCS. There are endless opportunities for tracking all aspects of website behavior. These examples will hopefully give you a good idea of what is possible and provide you with a bit of inspiration.

Most of these examples can be implemented with a few lines of VCL code in your Varnish setup. They work with both the vanilla Varnish Cache release and the Varnish Cache Plus release.

VCL key definitions

By default, vcs-agent installs with the -d parameter enabled. This configuration automatically generates a key for each URL, HOST, and a global ALL key.

Manually tagging a request with a key is done in VCL, by writing an std.log() line prefixed with the string "vcs-key:". The default key configuration is equivalent to the following VCL, used as an example:

sub vcl_deliver {
    std.log("vcs-key:ALL");
    std.log("vcs-key:HOST/" + req.http.Host);
    std.log("vcs-key:URL/" + req.http.Host + req.url);
}

In the above example, all requests will be tagged with the following keys:

  • the global key ALL
  • HOST/ followed by the contents of the Host header (HOST/example.com)
  • URL/ followed by Host + URL (URL/example.com/foo)

For std.log() you will also need to include the std VMOD, with an import std; directive in your VCL.

Splitting the VCS namespace

VCS has a flat namespace, and every key is created in it. To add a bit of organization to your VCS setup, we recommend that you split the namespace into sub-namespaces.

To split a namespace we recommend you use a separator. We recommend you use / and we’ll be using it in our examples here.

The reason for splitting the namespace is to be able to query VCS for a subset of its data. Let's say that you use VCS to track the number of views on your website. If you prefix those keys with VIEWS, you can ask VCS for a top list of every key beginning with VIEWS.

You might then have another query that gives you the top list of cache misses (keys beginning with MISSES), and further queries for other logical groups.
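To make the idea of prefix-based top lists concrete, here is a minimal Python sketch that ranks the keys of one sub-namespace. It works on an in-memory dictionary rather than a live VCS instance. The key names and request counts are invented for illustration, and top_keys is a hypothetical helper, not part of VCS.

```python
def top_keys(counts, namespace, limit=10, sep="/"):
    """Return the `limit` busiest keys whose name starts with namespace + sep."""
    prefix = namespace + sep
    matching = [(key, n) for key, n in counts.items() if key.startswith(prefix)]
    # Highest count first; ties are broken by key name so the order is stable.
    matching.sort(key=lambda item: (-item[1], item[0]))
    return matching[:limit]


# Invented sample data: request counts per key in a flat namespace.
sample_counts = {
    "VIEWS/article/1": 120,
    "VIEWS/article/2": 340,
    "MISSES/article/1": 7,
    "MISSES/img/logo.png": 15,
}

print(top_keys(sample_counts, "VIEWS"))
print(top_keys(sample_counts, "MISSES", limit=1))
```

Because the separator is part of the prefix, a key such as VIEWSKU/42 would not be picked up by a query for the VIEWS sub-namespace.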

Omitting URLs from the default keys

To omit certain URLs (or requests) from the default keys, remove the -d parameter from vcs-agent’s systemd configuration.

Next, add the following VCL to generate the VCS keys, using an if statement to skip the requests you do not want to send to VCS:

sub vcl_deliver {
    if (req.url != "/healthcheck") {
        std.log("vcs-key:ALL");
        std.log("vcs-key:HOST/" + req.http.Host);
        std.log("vcs-key:URL/" + req.http.Host + req.url);
    }
}

In the above example, requests for /healthcheck will not be sent to VCS.

Generating histograms

Using histograms is a good way to get more detailed data from each bucket. To specify a histogram to be generated for a certain stat you use the -H option.

vcs -H reqbytes:0,10,1000,10000

The above example will produce a histogram for stat reqbytes with limits 0,10,1000,10000. The output from VCS will have a corresponding histogram added to each bucket like this:

{
    "allowlist/login.css": [
        {
            "timestamp": "2021-08-30T17:58:00+00",
            "n_req": 17,
            "n_req_uniq": "NaN",
            "n_miss": 0,
            ...
            "resp_5xx": 0,
            "histograms": [
                {
                    "type": "reqbytes",
                    "limits": [0, 10, 1000, 10000],
                    "counts": [0, 4, 12, 1],
                    "sum": 37645,
                    "count": 17
                }
            ]
        },
        {
            "timestamp": "2021-08-30T17:57:30+00",
            "n_req": 3,
            "n_req_uniq": "NaN",
            "n_miss": 0,
            ...
            "resp_5xx": 0,
            "histograms": [
                {
                    "type": "reqbytes",
                    "limits": [0, 10, 1000, 10000],
                    "counts": [0, 3, 0, 0],
                    "sum": 314,
                    "count": 3
                }
            ]
        },
        {
            "timestamp": "2021-08-30T17:57:00+00",
...
        }
    ]
}
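As a rough illustration of how the histogram fields in the sample output above can be read, the following Python sketch computes the average of a stat from "sum" and "count" and shows one way values could be sorted into buckets. It assumes that each limit is the inclusive lower bound of its bucket. That reading is inferred from the reqbytes sample (three requests totalling 314 bytes are counted under the limit 10) and is not stated explicitly here. Both helper functions are hypothetical.

```python
from bisect import bisect_right


def bucket_counts(values, limits):
    """Count values per bucket, treating each limit as an inclusive lower bound."""
    counts = [0] * len(limits)
    for value in values:
        index = bisect_right(limits, value) - 1
        if index < 0:
            raise ValueError(f"{value} is below the first limit {limits[0]}")
        counts[index] += 1
    return counts


def mean(histogram):
    """Average value of the stat in one bucket, from its sum and count fields."""
    if histogram["count"] == 0:
        return 0.0
    return histogram["sum"] / histogram["count"]


# First histogram from the sample output.
reqbytes = {
    "type": "reqbytes",
    "limits": [0, 10, 1000, 10000],
    "counts": [0, 4, 12, 1],
    "sum": 37645,
    "count": 17,
}
print(round(mean(reqbytes), 1))

# Invented request sizes, bucketed with the same limits.
print(bucket_counts([5, 20, 500, 1500, 20000], reqbytes["limits"]))
```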

Advanced options

You can specify more than one histogram, and you can use different formats. You can also specify a histogram with a label. Histograms that are specified with a label will only be generated for keys that include a matching label string in their name.

$ vcs -H reqbytes:0,10,1000,10000 -H lbl1:restarts:A,1,4 -H lbl2:n_bodybytes:G,50,100,500 -H berespbytes:A,1,3,A,150,500,A,10,500

Let’s break down the above example.

First we have the explicitly specified sequence from the previous example:

-H reqbytes:0,10,1000,10000

Next we have a histogram with a label that is specified as an arithmetic sequence:

-H lbl1:restarts:A,1,4

This will produce a histogram with ranges that go from 0 and increase by 1 up to, but not exceeding, 4. Adding a label means that this histogram will only be generated for keys that have the specified label included as a label string in the key name.

Next we have a histogram specified as a geometric sequence:

-H lbl2:n_bodybytes:G,50,100,500

A geometric specification produces a sequence in which each limit increases by 100 percent of the previous value (that is, it doubles), starting at 50 and going up to, but not exceeding, 500.

And finally we have a sequence with a repeated arithmetic sequence:

-H berespbytes:A,1,3,A,150,500,A,10,500

This syntax is perfectly valid and each new sequence will continue where the previous one ended.
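The three notations described above can be expanded by hand, but a small script makes the rules easier to check. The Python sketch below is a hypothetical re-implementation based only on the examples in this section, not the parser used by vcs. It assumes that A takes a step and a maximum, that G takes a first limit, a percentage, and a maximum, that sequences start from 0, and that geometric growth is rounded down to whole numbers. A geometric sequence that continues a previous one is not described here, so the sketch rejects it.

```python
def expand_limits(spec):
    """Expand the limits part of a -H specification into a list of integers."""
    tokens = spec.split(",")
    limits = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "A":
            # Arithmetic sequence: A,<step>,<max>
            step, stop = int(tokens[i + 1]), int(tokens[i + 2])
            if step <= 0:
                raise ValueError("step must be positive")
            if not limits:
                limits.append(0)  # assumption: sequences start from 0
            value = limits[-1] + step  # continue where the previous one ended
            while value <= stop:
                limits.append(value)
                value += step
            i += 3
        elif token == "G":
            # Geometric sequence: G,<first>,<percent>,<max>
            first, percent, stop = (int(t) for t in tokens[i + 1:i + 4])
            if first <= 0 or percent <= 0:
                raise ValueError("first limit and percent must be positive")
            if limits:
                raise ValueError("geometric continuation is not covered by this sketch")
            limits.append(0)  # assumption: sequences start from 0
            value = first
            while value <= stop:
                limits.append(value)
                # Grow by <percent> percent, by at least 1 so the loop terminates.
                value += max(1, value * percent // 100)
            i += 4
        else:
            # Explicit limit, taken as written.
            limits.append(int(token))
            i += 1
    return limits


for spec in ("0,10,1000,10000", "A,1,4", "G,50,100,500", "A,1,3,A,150,500,A,10,500"):
    print(spec, "->", expand_limits(spec))
```

Under these assumptions the sketch reproduces the "limits" arrays that vcs reports for the four specifications on this command line.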

The resulting vcs output from the above specification (when specifying both labels) will be:

{
    "allowlist/login.css:lbl1,lbl2": [
        {
            "timestamp": "2021-08-30T17:58:00+00",
            "n_req": 17,
            "n_req_uniq": "NaN",
            "n_miss": 0,
            ...
            "resp_5xx": 0,
            "histograms": [
                {
                    "type": "reqbytes",
                    "limits": [0, 10, 1000, 10000],
                    "counts": [0, 4, 12, 1],
                    "sum": 37645,
                    "count": 17
                },
                {
                    "type": "lbl1:restarts",
                    "limits": [0, 1, 2, 3, 4],
                    "counts": [4, 0, 0, 0, 0],
                    "sum": 5,
                    "count": 17
                },
                {
                    "type": "lbl2:n_bodybytes",
                    "counts": [2, 13, 1, 0, 1],
                    "limits": [0, 50, 100, 200, 400],
                    "sum": 1903,
                    "count": 17
                },
                {
                    "type": "berespbytes",
                    "counts": [0, 1, 8, 1, 5, 0, 0, 0, 0, 0, 2],
                    "limits": [0, 1, 2, 3, 153, 303, 453, 463, 473, 483, 493],
                    "sum": 19915,
                    "count": 17
                }
            ]
        },
        {
            "timestamp": "2021-08-30T17:57:30+00",
...
        }
    ]
}

Adding histogram labels to VCS log entries

As indicated above, you can use labels to control which histograms are produced for which VCS keys. Specifying a label name for a histogram on the vcs command line is one part of this; the other is to add the label to the key you log from your VCL. This is done by appending a : and the label to the regular key name. To specify more than one label, separate them with a comma (,). This example shows how to apply labels to a key to control which histograms VCS generates for that key:

sub vcl_deliver {
    # (?i) makes the match case-insensitive; many mobile browsers send "Mobile".
    if (req.http.user-agent ~ "(?i)mobile") {
        std.log("vcs-key:MOBILE/" + req.url + ":lbl1,lbl2");
    }
    else {
        std.log("vcs-key:" + req.url + ":lbl2");
    }
}

This snippet would give you one extra histogram if the request comes from a mobile browser, but skip that for any other requests. By using labels you can choose the type of histogram that makes sense for a particular key without using up resources by generating this data for all keys.

Note:

  • When adding a label to a key, the label always becomes part of the key name.
  • Adding histograms to a bucket increases the amount of data vcs needs to store for that bucket, and the memory footprint will increase accordingly.
  • If your vcs keys already contain a :, that is fine. Only the part after the last occurrence of : is considered when vcs looks for label specifications.
  • Since labels are considered to be part of the key name, anything past the last : that doesn’t match any labels will be ignored.
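The label rules in the notes above can be summarized in a few lines of Python. This is a sketch of the described behavior, not the matching code used by vcs. It assumes that labels are compared as exact, comma-separated strings, and both function names are hypothetical. The specs list mirrors the -H options from the earlier command line as (label, stat) pairs, with None for histograms that have no label.

```python
def labels_in_key(key, known_labels):
    """Return the known labels named after the last ':' in `key`."""
    _, sep, tail = key.rpartition(":")
    if not sep:
        return set()
    # Anything after the last ':' that is not a known label is ignored.
    return {label for label in tail.split(",") if label in known_labels}


def histograms_for(key, specs):
    """List the stats for which a histogram would be generated for `key`."""
    known = {label for label, _ in specs if label is not None}
    labels = labels_in_key(key, known)
    return [stat for label, stat in specs if label is None or label in labels]


specs = [
    (None, "reqbytes"),
    ("lbl1", "restarts"),
    ("lbl2", "n_bodybytes"),
    (None, "berespbytes"),
]

print(histograms_for("MOBILE/index.html:lbl1,lbl2", specs))
print(histograms_for("/index.html:lbl2", specs))
# A ':' that belongs to the key itself (here a port) does not select any label.
print(histograms_for("URL/example.com:8080/foo", specs))
```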

Using VCS for tracking down slow pages and cache misses

To track which URLs have the slowest response times, we can make use of VCS’ ability to provide a sorted list of response times for the keys it is tracking. Simply issuing a request for:

/all/top_ttfb

will produce a list of the keys associated with the 10 slowest requests. To break this down further, for example to get the actual URLs, we can combine the default keys with VCS’ regex matching capabilities:

/match/^URL/top_ttfb

The abbreviation ttfb stands for time to first byte. It is the time from when Varnish starts handling the request until it starts transmitting the first byte to the client.
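If you issue these requests from a script, the two paths above can be built with a small helper. The Python sketch below only constructs the URL and does not contact a server. The base URL, including host and port, is a placeholder for wherever your VCS instance listens, and percent-encoding the regular expression is a conservative assumption so that characters such as ^ survive inside a URL path. top_ttfb_url is a hypothetical name.

```python
from urllib.parse import quote


def top_ttfb_url(base_url, regex=None):
    """Build the URL for the slowest-keys list, optionally limited by a key regex."""
    if regex is None:
        path = "/all/top_ttfb"
    else:
        # Encode every reserved character so the regex is sent as one path segment.
        path = "/match/" + quote(regex, safe="") + "/top_ttfb"
    return base_url.rstrip("/") + path


# Placeholder address; replace it with your own VCS host and port.
base = "http://vcs.example.com:6555"
print(top_ttfb_url(base))
print(top_ttfb_url(base, "^URL"))
```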

VCS for use in news media

For a news site there are a few specific things you might want to track. Content management systems typically assign each article a unique ID. Logging the article IDs into VCS gives you easy real-time access to which stories are being read right now. We have customers that embed this information on their websites to generate the "what is hot right now" lists we often see on news sites.

Logging the article ID and not just the URL makes the list ignore different presentations of the same article, so the list is about the articles themselves. It also removes the need to normalize the URL in any way, so query strings that annotate links will not pollute the list.

If your CMS can produce an x-artid header you should be all set.

In vcl_deliver you would need to add the following:

sub vcl_deliver {
    std.log("vcs-key:ARTICLE_ID/" + resp.http.x-artid);
}

You can expand on the setup in several ways. For instance, you might also want to measure the social impact of each article by looking at the Referer header (if set).

In vcl_deliver add the following:

sub vcl_deliver {
    if (req.http.referer) {
        std.log("vcs-key:ARTREF/" + resp.http.x-artid + "/" + req.http.referer);
    }
}

You might also want to expand it further by looking at the user agent and adding a separate time series for mobile views. In vcl_deliver:

sub vcl_deliver {
    # (?i) makes the match case-insensitive; many mobile browsers send "Mobile".
    if (req.http.user-agent ~ "(?i)mobile") {
        std.log("vcs-key:MOBILE/" + resp.http.x-artid);
    }
}

Using VCS to track conversions

Many websites want to measure conversions. A conversion might be a user clicking a link to sign up or putting an item in the shopping basket. Another use case is a paid content site, where the conversion happens when the user clicks through to the sign-up page while reading a specific article.

The first step is to identify the conversion taking place, typically done by looking at the request URL, maybe in combination with the HTTP method used.

In this example our article page might be /news/art/23245. On that page there is a link pointing to /signup. To register the conversion in VCS with the article as the main key we would need the following VCL in vcl_deliver:

sub vcl_deliver {
    if (req.url == "/signup") {
        set req.http.artid = regsub(...);
        std.log("vcs-key:CONVERSION/SIGNUP/" + req.http.artid);
    }
}
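The regsub() arguments are elided in the VCL above, and this sketch does not attempt to reproduce them. As a separate illustration of the idea, the following Python function derives a conversion key under stated assumptions: the article ID is taken from the Referer of the /signup request, and article pages follow the /news/art/<id> layout from the example. The function name and the pattern are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumed article URL layout, taken from the example page /news/art/23245.
ARTICLE_PATH = re.compile(r"^/news/art/(\d+)$")


def conversion_key(request_url, referer):
    """Return the VCS key for a sign-up conversion, or None if there is none."""
    if request_url != "/signup" or not referer:
        return None
    match = ARTICLE_PATH.match(urlsplit(referer).path)
    if match is None:
        return None  # the sign-up did not start from an article page
    return "CONVERSION/SIGNUP/" + match.group(1)


print(conversion_key("/signup", "https://example.com/news/art/23245"))
print(conversion_key("/signup", "https://example.com/"))
```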

For a more in-depth discussion on using VCS to track conversions, and a how-to on doing A/B testing with Varnish and VCS, please see this blog post: https://info.varnish-software.com/blog/live-ab-testing-varnish-and-vcs

Using VCS to track HLS/HDS/DASH video streams

If you are streaming HLS/HDS/Smooth/DASH through Varnish you might want to count the number of users on each Varnish server. This might be useful for statistical reasons but might also be used for directing traffic to your various Varnish Cache clusters.

The tricky part is to uniquely identify a user. To do this, you need some sort of session cookie to be present on the client. All HTTP video clients are supposed to support cookies. If a cookie is already present we can probably use it; if not, we have to generate a random one.

We recommend using the cookie VMOD when working with cookies. It will make the VCL much more readable. The following VCL sets a cookie if there is none present.

In vcl_deliver:

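import cookie;
import std;

sub vcl_deliver
{
    # Read the session ID from the client's cookies, if one is present.
    cookie.parse(req.http.cookie);
    set req.http.X-vcsid = cookie.get("_vcsid");
    if (req.http.X-vcsid == "") {
        # No session cookie yet: generate a random ID and send it to the client.
        set req.http.X-vcsid = std.random(1, 10000000) + "." + std.random(1, 10000000);
        set resp.http.Set-Cookie = "_vcsid=" + req.http.X-vcsid + "; HttpOnly; Path=/";
    }
    # Tag the request with a per-session key.
    std.log("vcs-key:SESSION/" + req.http.X-vcsid + "/" + req.http.Host + req.url);
}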

There is a blog post on the matter that discusses this in some detail: https://info.varnish-software.com/blog/getting-live-statistics-varnish-hlshds

Using VCS to gain insights in E-commerce

In an e-commerce setting VCS can be used to give stats about how various SKUs behave. A typical use case would be running statistics on which SKUs receive what traffic. In addition there are various other aspects that VCS can help gather data on:

  • How viral are the various SKUs?
  • How much traffic does a SKU receive from social networks?
  • How do the different sources of traffic to a SKU rank (organic search, domains)?
  • What are the SKUs that customers most often put in their baskets?

In vcl_deliver:

sub vcl_deliver {
    if (req.url ~ "/sku/\d+") {
        set req.http.sku = regsub(...);
    
        std.log("vcs-key:VIEWSKU/" + req.http.sku);
    
        if (req.http.referer ~ "facebook.com|twitter.com") {
            std.log("vcs-key:SOCIAL/" + req.http.sku);
        }
    
        if (req.http.referer ~ "yahoo.com|google.com") {
            std.log("vcs-key:ORGANIC/" + req.http.sku);
        }
    
        if (req.url ~ "/ajax/put/\d+") {
            std.log("vcs-key:PUTBASKET/" + req.http.sku);
        }
    }
}
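To see which keys a given request would produce, the classification logic above can be mirrored in Python. This is a sketch under assumptions, not a translation of the elided regsub() call: it assumes the SKU is the run of digits following /sku/ in the URL, and it escapes the dots in the domain patterns so that they match literally. The PUTBASKET branch is left out because the URL layout of basket requests is not specified here. sku_keys is a hypothetical helper.

```python
import re

SKU_URL = re.compile(r"/sku/(\d+)")  # assumed: the SKU is the digits after /sku/
SOCIAL = re.compile(r"facebook\.com|twitter\.com")
ORGANIC = re.compile(r"yahoo\.com|google\.com")


def sku_keys(url, referer=""):
    """Return the VCS keys that a request for a SKU page would be tagged with."""
    match = SKU_URL.search(url)
    if match is None:
        return []
    sku = match.group(1)
    keys = ["VIEWSKU/" + sku]
    if SOCIAL.search(referer):
        keys.append("SOCIAL/" + sku)
    if ORGANIC.search(referer):
        keys.append("ORGANIC/" + sku)
    return keys


print(sku_keys("/sku/42", "https://www.facebook.com/"))
print(sku_keys("/sku/7", "https://www.google.com/search?q=x"))
print(sku_keys("/about"))
```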