VCL syntax

All this talk about the illustrious Varnish Configuration Language, and yet it took us until this section to talk about the syntax. We’d apologize for this, but really it’s all part of the plan.

Although this is a domain-specific language with no real applications outside of Varnish, the syntax is pretty easy to understand. If you followed the built-in VCL code in the previous section, most of what follows should already look familiar.

In this section, we’ll take a look at some of the basics of VCL. We won’t focus too much on the subroutines and the finite state machine, because we’ve just covered them. Instead, let’s talk about how you can get things done within one of those subroutines.

VCL version declaration

Every VCL file starts with a version declaration. As of Varnish 6, the version declaration you’ll want to use is the following:

vcl 4.1;

The VCL version declaration does not reflect the Varnish version it runs on. Instead, it ensures compatibility of the VCL syntax. In Varnish 6, support for Unix domain sockets introduced a backwards compatibility break: the .path property in the backend declaration is not supported on older versions of Varnish or with VCL syntax version 4.0.

The vcl 4.0; declaration will still work on Varnish 6, but it prevents some specific Varnish 6 features from being supported.

Assigning values

The VCL language doesn’t require you to program the full behavior, but rather allows you to extend pre-existing built-in behavior. Given this scope and purpose, the main objective is to set values based on certain conditions.

Let’s take the following line of code for example:

set resp.http.Content-Type = "text/html; charset=utf-8";

It comes from the built-in vcl_synth subroutine and assigns the value text/html; charset=utf-8 to the Content-Type response header, which is represented by the resp.http.Content-Type variable.

We’re basically assigning a string to a variable. The assignment is done using the set keyword. To remove a variable, such as a header, you use the unset keyword.

Let’s illustrate the unset behavior with another example from the built-in VCL:

unset bereq.body;

We’re unsetting the bereq.body variable. This is part of the vcl_backend_fetch logic of the built-in VCL.
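To see set and unset side by side, here’s a small sketch for vcl_deliver. The X-Cache-Hits header name is hypothetical, and the sketch assumes your backend sends an X-Powered-By header you’d rather not expose:

```vcl
vcl 4.1;

sub vcl_deliver {
    # Assign the number of cache hits (an integer) to a response header;
    # the value is converted to a string
    set resp.http.X-Cache-Hits = obj.hits;

    # Remove a header that reveals backend software
    unset resp.http.X-Powered-By;
}
```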

Strings

VCL supports various data types, but the string type is by far the most common.

Here’s a conceptual example:

set variable = "value";

This is the easiest way to assign a string value. But as soon as you want to use newlines or double quotes, you’re in trouble: a regular string can contain neither.

Luckily there’s an alternative, which is called long strings. A long string begins with {" and ends with "}.

They may include newlines, double quotes, and other control characters, except for the NULL (0x00) character.

A very familiar usage of this is the built-in VCL implementation of vcl_synth, where the HTML template is composed using long strings:

set resp.body = {"<!DOCTYPE html>
<html>
  <head>
    <title>"} + resp.status + " " + resp.reason + {"</title>
  </head>
  <body>
    <h1>Error "} + resp.status + " " + resp.reason + {"</h1>
    <p>"} + resp.reason + {"</p>
    <h3>Guru Meditation:</h3>
    <p>XID: "} + req.xid + {"</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>
"};

There is also an alternate form of a long string, which can be delimited by triple double quotes, """...""".

This example also shows how to perform string concatenation: the + operator joins string literals and variables into a single string. Let’s reimagine the vcl_synth example and create a version that uses simple strings:

set resp.body = "Status: " + resp.status + ", reason: " + resp.reason;

And again we’re using the + sign to concatenate string literals with variables. Note that resp.status is an integer, which is automatically converted to a string in this context.
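Long strings and concatenation can be combined. The following sketch assumes you want a custom HTML body for 403 responses. It uses double quotes and newlines inside long strings, without any escaping:

```vcl
vcl 4.1;

sub vcl_synth {
    if (resp.status == 403) {
        set resp.http.Content-Type = "text/html; charset=utf-8";
        # Long strings may contain double quotes and newlines
        set resp.body = {"<p class="error">
"} + resp.reason + {"
</p>"};
        return (deliver);
    }
}
```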

Conditionals

Although the VCL language is limited in terms of control structures, it does provide conditionals, meaning if/else statements.

Let’s take some built-in VCL code as an example since we’re so familiar with it:

if (req.method != "GET" && req.method != "HEAD") {
    /* We only deal with GET and HEAD by default */
    return (pass);
}

This is just a regular if-statement. We can also add an else clause:

if (req.method != "GET" && req.method != "HEAD") {
    /* We only deal with GET and HEAD by default */
    return (pass);
} else {
    return (hash);
}

And as you’d expect, there’s also an elseif clause:

if (req.method == "GET") {
    return (hash);
} elseif (req.method == "HEAD") {
    return (hash);
} else {
    return (pass);
}

elsif, elif, and else if can also be used as equivalents of elseif.
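Here’s a sketch that mixes the different spellings in a single chain. The URL prefixes are hypothetical:

```vcl
vcl 4.1;

sub vcl_recv {
    if (req.url ~ "^/api/") {
        # Never cache API calls
        return (pass);
    } elsif (req.url ~ "^/static/") {
        # Static files don't need cookies; execution then
        # continues in the built-in vcl_recv
        unset req.http.Cookie;
    } else if (req.url == "/health") {
        # Answer health checks directly from Varnish
        return (synth(200, "OK"));
    }
}
```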

Operators

VCL supports the following operators. Apart from the assignment operator, they all evaluate to either true or false:

The = operator is used to assign values.
The ==, !=, <, <=, >, and >= operators are used to compare values.
The ~ operator is used to match values to a regular expression or an ACL, and !~ is its negated form.
The ! operator is used for negation.
&& is the logical and operator.
|| is the logical or operator.

And again the built-in VCL comes to the rescue to clarify how some of these operators can be used:

if (req.method != "GET" && req.method != "HEAD") {
    /* We only deal with GET and HEAD by default */
    return (pass);
}

Here you can see both the negation and the logical and: the expression only evaluates to true when the request method is neither GET nor HEAD.

We’ve already used the = operator to assign values, but here’s another example for reference:

set req.http.X-Forwarded-Proto = "https";

This example assigns the https value to the X-Forwarded-Proto request header.

A logical or looks like this:

if(req.method == "POST" || req.method == "PUT") {
    return(pass);
}

At least one of the two conditions has to be true for the expression to be true.

And let’s end this part with a less than or equals example:

if(beresp.ttl <= 0s) {
    set beresp.uncacheable = true;
}
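Operators can also be combined, using parentheses to group conditions. Here’s a sketch for vcl_recv that also uses !~, the negated form of the ~ operator. The /admin/ prefix is just an example:

```vcl
vcl 4.1;

sub vcl_recv {
    # Bypass the cache for requests that aren't GET or HEAD,
    # or that carry an Authorization header
    if ((req.method != "GET" && req.method != "HEAD") || req.http.Authorization) {
        return (pass);
    }

    # Strip cookies from every URL outside the admin area
    if (req.url !~ "^/admin/") {
        unset req.http.Cookie;
    }
}
```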

Comments

Documenting your code with comments is usually a good idea, and VCL supports three different comment styles.

We’ve listed all three of them in the example below:

sub vcl_recv {
    // A single line of commented-out VCL.
    # Another way of commenting out a single line.
    /*
        Multi-line block of commented-out VCL.
    */
}

So you can use // or # to create a single-line comment. And /* ... */ can be used for multi-line comments.

Pretty straightforward, not all that exciting, but definitely noteworthy.

Numbers

VCL supports numeric values, both integers and real numbers.

Certain fields are numeric, so it makes sense to assign literal integer or real values to them.

Here’s an example:

set resp.status = 200;

However, most variables are strings, so numbers are often converted to strings, for example when you assign them to a header. In that case, real numbers are rounded to three decimal places (e.g. 3.142).
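Numeric fields can be compared using the comparison operators. In this sketch, X-Client-Error is a hypothetical header name:

```vcl
vcl 4.1;

sub vcl_deliver {
    # resp.status is an integer, so it can be compared numerically
    if (resp.status >= 400 && resp.status < 500) {
        # Assigned to a header, the integer is converted to a string
        set resp.http.X-Client-Error = resp.status;
    }
}
```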

Booleans

Booleans can be either true or false. Here’s an example of a VCL variable that expects a boolean:

set beresp.uncacheable = true;

This example probably looks familiar. It comes from the built-in VCL and makes a response uncacheable.

Values of non-boolean types can also be evaluated as booleans.

For example, a string evaluates to true or false depending on whether it is set. This allows you to check whether a header exists:

if(!req.http.Cookie) {
    //Do something
}

if(req.http.Authorization) {
    //Do something
}

Be aware that a header must be undefined or unset to evaluate as false. If the header is present but has an empty value, it still evaluates as true.

Integers will evaluate to false if their value is 0; the same applies to duration types when their values are zero or less.

Boolean types can also be set based on the result of another boolean expression:

set beresp.uncacheable = (beresp.http.do-not-cache == "true");
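Because an empty header still evaluates as true, you need an explicit comparison to catch it. Here’s a minimal sketch that treats an empty Cookie header as if it weren’t there:

```vcl
vcl 4.1;

sub vcl_recv {
    # True when the header exists, even with an empty value...
    if (req.http.Cookie) {
        # ...so compare against the empty string explicitly
        if (req.http.Cookie == "") {
            unset req.http.Cookie;
        }
    }
}
```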

Time & durations

Time is an absolute value, whereas a duration is a relative value. However, in Varnish they are often combined.

Time

You can add a duration to a time, which results in a new time. It admittedly sounds confusing, but here’s some code to clarify this statement:

set req.http.tomorrow = now + 1d;

The now variable is how you can retrieve the current time in VCL. The now + 1d statement means we’re adding a day to the current time. The returned value is also a time type.

But since we’re assigning a time to a string field, the time value is converted to a string, which results in a value like the following:

Thu, 10 Sep 2020 12:34:54 GMT
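Since a time is converted to this HTTP date format in string context, you could use it to populate date headers. Here’s a sketch, assuming you want to send an Expires header one hour in the future:

```vcl
vcl 4.1;

sub vcl_deliver {
    # now + 1h is a time; assigned to a header, it is converted
    # to an HTTP date string like the one shown above
    set resp.http.Expires = now + 1h;
}
```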

Duration

As mentioned, durations are relative. They express a change in time and are written as a number with a time unit attached.

Here are a couple of examples that illustrate the various time units:

1ms equals 1 millisecond.
5s equals 5 seconds.
10m equals 10 minutes.
3h equals 3 hours.
9d equals 9 days.
4w equals 4 weeks.
1y equals 1 year.

In string context, a duration is converted to its value in seconds, and the time unit is stripped off. This is exactly what a real number looks like when converted to a string. And just like real numbers, durations are rounded to three decimal places (e.g. 1h becomes 3600.000).

Here’s an example of a VCL variable that supports durations:

set beresp.ttl = 1h;

So this example sets the TTL of an object to one hour.
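Durations can also be compared. The following sketch, with arbitrarily chosen values, enforces a minimum TTL of one minute for cacheable responses and sets a grace period:

```vcl
vcl 4.1;

sub vcl_backend_response {
    # Raise short, but positive, TTLs to one minute
    if (beresp.ttl > 0s && beresp.ttl < 1m) {
        set beresp.ttl = 1m;
    }

    # Allow stale content to be served for up to a day
    # after the TTL has expired (grace mode)
    set beresp.grace = 1d;
}
```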

Regular expressions

Pattern matching is a very common practice in VCL. That’s why VCL supports Perl Compatible Regular Expressions (PCRE), and we can match values to a PCRE regex through the ~ operator.

Let’s immediately throw in an example:

if(req.url ~ "^/[a-z]{2}/cart") {
    return(pass);
}

This example is matching the request URL to a regex pattern that looks for the shopping cart URL of a website. This URL is prefixed by two letters, which represent the user’s selected language. When the URL is matched, the request bypasses the cache.
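PCRE features such as inline flags work as well. In this sketch, (?i) makes the shopping cart match case-insensitive. The second rule is hypothetical and shows how to escape a literal dot:

```vcl
vcl 4.1;

sub vcl_recv {
    # (?i) enables case-insensitive matching, so /EN/Cart matches too
    if (req.url ~ "(?i)^/[a-z]{2}/cart") {
        return (pass);
    }

    # Escape the dot to match it literally: only URLs ending in .pdf
    if (req.url ~ "\.pdf$") {
        return (pass);
    }
}
```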

Backends

Varnish is a proxy server and depends on an origin server to provide (most of) the content. A backend definition is indispensable, even if you end up serving synthetic content.

The basics

This is what a backend looks like:

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

It has a name, default in this case, and uses the .host and .port properties to define how Varnish can connect to the origin server.

By default, Varnish uses the first backend that is defined.

If you’re not planning to use a backend, or if you are using a dynamic backend like goto, you’ll have to define the following backend configuration:

backend default none;

This bypasses the requirement that you must define a single backend in your VCL.
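When you define multiple backends, you can override that default per request by setting req.backend_hint. Here’s a sketch with a hypothetical api backend on port 8081:

```vcl
vcl 4.1;

# Defined first, so this backend is used unless VCL selects another one
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

backend api {
    .host = "127.0.0.1";
    .port = "8081";
}

sub vcl_recv {
    # Route API calls to the second backend
    if (req.url ~ "^/api/") {
        set req.backend_hint = api;
    }
}
```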

Optional values

Backends also support the following options:

.connect_timeout is how long to wait for a connection to be made to the backend.
.first_byte_timeout is how long to wait for the first byte of the response.
.between_bytes_timeout is the maximum time to wait between bytes when reading the response.
.last_byte_timeout is the total time to wait for the complete backend response.
.max_connections is the maximum number of concurrent connections Varnish will hold to the backend. When this limit is reached, requests will fail into vcl_backend_error.
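Here’s what a backend with some of these options could look like. The values are arbitrary and should be tuned to your own origin server:

```vcl
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    # Give up if no TCP connection is established within one second
    .connect_timeout = 1s;
    # Wait up to 30 seconds for the first byte of the response
    .first_byte_timeout = 30s;
    # Allow at most 5 seconds of silence between bytes
    .between_bytes_timeout = 5s;
    # Cap the number of concurrent connections to this backend
    .max_connections = 200;
}
```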

Probes

Knowing whether or not a backend is healthy is important. It helps to avoid unnecessary outages and allows you to use a fallback system.

When using probes, you can perform health checks at regular intervals. The probe sets the internal value of the health of that backend to healthy or sick.

Backends that are sick always result in an HTTP 503 error when called.

If you use vmod_directors to load balance with multiple backends, sick backends will be removed from the rotation until their health checks are successful and their state changes to healthy.

A sick backend will become healthy when a threshold of successful polls is reached within a polling window.

This is how you define a probe:

probe healthcheck {
}

Default values

The probe data structure has a number of attributes. Even if you don’t set them, they all have a default value:

.url is the URL that will be polled. The default value is /.
.expected_response is the HTTP status code that the probe expects. The default value is 200.
.timeout is the amount of time the probe is willing to wait for a response before timing out. The default value is 2s.
.interval is the polling interval. The default value is 5s.
.window is the number of polls that are examined to determine the backend health. The default value is 8.
.initial is the number of polls in .window that are considered successful when Varnish starts. The default value is 2, which is .threshold minus one.
.threshold is the number of polls in .window that have to be successful to consider the backend healthy. The default value is 3.
.tcponly is the mode of the probe. When set to 1, the probe only checks whether a TCP connection can be established. The default value is 0. This property is only available in Varnish Enterprise.
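To make these defaults concrete, the following probe spells out the Varnish Cache defaults explicitly, so it should behave like the empty healthcheck probe above. The probe name is hypothetical and, like any probe, it still needs to be assigned to a backend:

```vcl
probe healthcheck_defaults {
    .url = "/";
    .expected_response = 200;
    .timeout = 2s;
    .interval = 5s;
    .window = 8;
    .threshold = 3;
    # .threshold minus one
    .initial = 2;
}
```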

Extending values

You can customize the probe by overriding these default values.

Here’s an example:

probe healthcheck {
    .url = "/health";
    .interval = 10s;
    .timeout = 5s;
}

This example will call the /health endpoint for polling and will send a health check every ten seconds. The probe will wait for five seconds before it times out.

Customizing the entire HTTP request

When the various probe options do not give you enough flexibility, you can even choose to fully customize the HTTP request that the probe will send out.

The .request property allows you to do this. However, this property is mutually exclusive with the .url property.

Here’s an example:

probe healthcheck {
    .request =
        "HEAD /health HTTP/1.1"
        "Host: localhost"
        "Connection: close"
        "User-Agent: Varnish Health Probe";
    .interval = 10s;
    .timeout = 5s;
}

Although a lot of values remain the same, there are two customizations that are part of the request override:

The request method is HEAD instead of GET.
We’re using the custom Varnish Health Probe User-Agent.

Assigning the probe to a backend

Once your probe is set up and configured, you need to assign it to a backend.

It’s a matter of setting the .probe property in your backend to the name of the probe, as you can see in the example below:

vcl 4.1;

probe healthcheck {
    .url = "/health";
    .interval = 10s;
    .timeout = 5s;
}

backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = healthcheck;
}

By defining your probe as a separate data structure, it can be reused when multiple backends are in use.

Alternatively, you can define the probe inline, as the value of the .probe property. An inline probe cannot be reused by other backends:

vcl 4.1;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/health";
        .interval = 10s;
        .timeout = 5s;
    }
}

TCP-only probes

Probes usually perform HTTP requests to check the health of a backend. By using TCP-only probes, the health of a backend is checked by the availability of the TCP connection.

This can be used to probe non-HTTP endpoints. However, TCP-only probes cannot be used with .url, .request, or .expected_response properties.

Here’s how you define such a probe:

probe tcp_healthcheck {
  .tcponly = 1;
}

Keep in mind that TCP-only probes are only available in Varnish Enterprise.

UNIX domain sockets

The backend data structure has additional properties that can be set with regard to the endpoint it is connecting to.

If you want to connect to your backend using a UNIX domain socket, you’ll use the .path property. It is mutually exclusive with the .host property and is only available when you use the vcl 4.1; version declaration.

Here’s an example of a UDS-based backend definition:

backend default {
    .path = "/var/run/some-backend.sock";
}

Overriding the host header

If a request doesn’t contain a Host header, you can use the .host_header property to provide a default value.

Here’s an example:

backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .host_header = "example.com";
}

This .host_header property will be used for both regular backend requests and health probe checks.

Access control lists

An access control list (ACL) is a VCL data structure that contains hostnames, IP addresses, and subnets. An ACL is used to match client addresses and restrict access to certain resources.

Here’s how you define an ACL:

acl admin {
    "localhost";
    "secure.my-server.com";
    "192.168.0.0/24";
    ! "192.168.0.25";   
}

This ACL named admin contains the following rules:

Access from localhost is allowed.
Access from the hostname secure.my-server.com is also allowed.
All IP addresses in the 192.168.0.0/24 subnet are allowed.
The only IP address from that range that is not allowed is 192.168.0.25.

In your VCL code, you can then match the client IP address to that list, as you’ll see in the next example:

acl admin {
    "localhost";
    "secure.my-server.com";
    "192.168.0.0/24";
    ! "192.168.0.25";   
}

sub vcl_recv {
    if(req.url ~ "^/admin/?" && client.ip !~ admin) {
        return(synth(403,"Forbidden"));
    }
}

In this example, we’re hooking into vcl_recv to intercept requests for /admin or any resource under /admin/. When users try to access such a resource, we check whether their client IP address matches acl admin.

If it doesn’t match, an HTTP 403 Forbidden error is returned synthetically.
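ACLs are also commonly used to protect cache invalidation. The following sketch uses a hypothetical purgers ACL to only allow PURGE requests from trusted addresses, and uses return (purge) to remove the cached object for the requested URL and host:

```vcl
vcl 4.1;

acl purgers {
    "localhost";
    "192.168.0.0/24";
}

sub vcl_recv {
    if (req.method == "PURGE") {
        # Reject purge requests from untrusted clients
        if (client.ip !~ purgers) {
            return (synth(405, "Not allowed"));
        }
        return (purge);
    }
}
```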

Functions

Complex logic in a programming language is usually abstracted away by functions. This is also the case in VCL, which has a number of native functions.

The number of functions is limited, but extra functions are available in the wide range of VMODs that are supported by Varnish.

In chapter 5, we’ll talk about VMODs and how their functions extend the capabilities of Varnish.

ban()

ban() is a function that adds an expression to the ban list. These expressions are matched to cached objects. Every matching object is then removed from the cache.

In essence, the ban() function exists to invalidate multiple objects at the same time.

Although banning will be covered in detail in chapter 5, here’s a quick example:

ban("obj.age > 1h");

Multiple expressions can be chained using the && operator.
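Here’s a sketch of two chained expressions. It assumes a hypothetical BAN request method that invalidates all objects under /products/ for the requested host. In production, you’d restrict this with an ACL:

```vcl
vcl 4.1;

sub vcl_recv {
    if (req.method == "BAN") {
        # Both conditions must match for an object to be banned
        ban("req.http.host == " + req.http.host + " && req.url ~ ^/products/");
        return (synth(200, "Ban added"));
    }
}
```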

hash_data()

The hash_data() function is used within the vcl_hash subroutine to append string data to the hash input that is used to look up an object in the cache.

Let’s just revisit the built-in VCL for vcl_hash where hash_data() is used:

sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
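Because the built-in vcl_hash runs after your own code, you can add extra hash input without repeating it. Here’s a sketch that caches HTTP and HTTPS variants separately, assuming a TLS terminator in front of Varnish sets the X-Forwarded-Proto header:

```vcl
vcl 4.1;

sub vcl_hash {
    if (req.http.X-Forwarded-Proto) {
        hash_data(req.http.X-Forwarded-Proto);
    }
    # No return statement: the built-in vcl_hash still adds
    # the URL and the host, and then returns lookup
}
```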

synthetic()

The synthetic() function prepares a synthetic response body and uses a string argument for its input. This function can be used within vcl_synth and vcl_backend_error.

Here’s an example for vcl_synth:

synthetic(resp.reason);

However, this function is no longer used in the built-in VCL. As of Varnish Cache 5.0, it is recommended to set resp.body in vcl_synth, or beresp.body in vcl_backend_error, instead.
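In vcl_backend_error, this means setting beresp.body. Here’s a minimal sketch that returns a plain-text error message instead of the default HTML page:

```vcl
vcl 4.1;

sub vcl_backend_error {
    set beresp.http.Content-Type = "text/plain; charset=utf-8";
    # In vcl_synth you would set resp.body instead
    set beresp.body = "Backend fetch failed: " + beresp.status + " " + beresp.reason;
    return (deliver);
}
```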

regsub()

The regsub() function is a very popular function in Varnish. It performs string substitution using regular expressions: a find-and-replace on the first occurrence of a regex pattern.

This is the API of this function:

regsub(string, regex, sub)

The string argument is your input.
The regex argument is the regular expression you’re using to match what you’re looking for in the input string.
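The sub argument is what the matched part of the input string will be replaced with. It can refer to capture groups using \1 through \9, while \0 refers to the entire match.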
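Here are two small regsub() sketches in vcl_recv. The first removes a port number from the Host header, and the second rewrites a hypothetical /old/ URL prefix, using \1 to reuse the first capture group:

```vcl
vcl 4.1;

sub vcl_recv {
    if (req.http.host) {
        # "example.com:8080" becomes "example.com"
        set req.http.host = regsub(req.http.host, ":[0-9]+$", "");
    }

    # "/old/products/42" becomes "/new/products/42"
    set req.url = regsub(req.url, "^/old/(.*)$", "/new/\1");
}
```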

A practical example

Here’s a really practical example where we use regsub() to extract a cookie value:

vcl 4.1;

sub vcl_hash {
    # regsub() returns its input unchanged when there is no match,
    # so only add to the hash when a language cookie is present
    if (req.http.Cookie ~ "(^|;\s*)language=[a-z]{2}(;|$)") {
        hash_data(regsub(req.http.Cookie,"^(.*;\s*)?language=([a-z]{2})(;.*)?$","\2"));
    }
}

Let’s break it down because it looks quite complex.

This vcl_hash subroutine is used to extend the built-in VCL and to add the value of the language cookie to the hash. This creates a cache variation per language.

We really don’t want to hash the entire cookie because that will drive our hit rate down, especially when there are tracking cookies in place.

In order to extract the exact cookie value we need, we’ll match the req.http.Cookie header to a regular expression that uses grouping. In the substitution part, we can refer to those groups to extract the value we want.

Here’s the regular expression:

^(.*;\s*)?language=([a-z]{2})(;.*)?$

This regular expression looks for a language= occurrence, followed by two letters. These letters represent the language. The language cookie can occur at the beginning of the cookie string, in the middle, or at the end. The optional (.*;\s*)? and (;.*)? groups match everything before and after it, which makes this possible and ensures that the regular expression matches the entire header. As a result, regsub() replaces the whole Cookie header, not just the language cookie.

Because we’re using parentheses for grouping, the group that matches the language itself is indexed as group two. This means we can refer to it in the regsub() function as \2.

So if we look at the entire regsub() example:

regsub(req.http.Cookie,"^(.*;\s*)?language=([a-z]{2})(;.*)?$","\2")

And let’s imagine this is our Cookie header:

Cookie: privacy_accepted=1;language=en;sessionid=03F1C5944FF4

Given the regular expression and the group reference, the output of this regsub() function would be en.

This means that en will be added to the hash along with the URL and the host header.

Keep in mind that regsub() returns the input string unchanged when the pattern doesn’t match. That’s why the example first checks whether the Cookie header actually contains a language cookie: otherwise the entire Cookie header, tracking cookies included, would end up in the hash. When there is no language cookie, or no Cookie header at all, nothing extra is added to the hash, so all of those requests share a single cache variation.

regsuball()

The regsuball() function is very similar to the regsub() function we just covered. The only difference is that regsub() only matches and replaces the first occurrence of the pattern, whereas regsuball() replaces all occurrences.

Even the function API is identical:

regsuball(string, regex, sub)

The string argument is your input.
The regex argument is the regular expression you’re using to match what you’re looking for in the input string.
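The sub argument is what the matched part of the input string will be replaced with. It can refer to capture groups using \1 through \9, while \0 refers to the entire match.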
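The difference is easiest to see on the same input. In this sketch, the two response header names are hypothetical and only serve to expose the results:

```vcl
vcl 4.1;

sub vcl_deliver {
    # Only the first "-" is replaced: "a_b-c"
    set resp.http.X-Regsub = regsub("a-b-c", "-", "_");
    # Every "-" is replaced: "a_b_c"
    set resp.http.X-Regsuball = regsuball("a-b-c", "-", "_");
}
```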

A practical example

Let’s have a look at a similar example, where we’ll strip off some cookies again. Instead of matching the values we want to keep, we’ll match the values we want to remove. We need to ensure that all occurrences are matched, not just the first occurrence. That’s why we use regsuball() instead of regsub():

regsuball(req.http.Cookie,"_g[a-z0-9_]+=[^;]*($|;\s*)","")

This example removes all Google Analytics cookies. These are the cookies we need to remove:

_ga
_gid
_gat
_gac_<property-id>

Instead of stripping them off one by one, we can use the _g[a-z0-9_]+=[^;]*($|;\s*) regular expression to match them all at once. In the end we’ll replace the matched cookies with an empty string.

This could be the raw value of your req.http.Cookie header:

cookie1=a; _ga=GA1.2.1915485056.1587105100;cookie2=b; _gid=GA1.2.873028102.1599741176; _gat=1

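And the end result is the following string. Note that it still ends with the semicolon and space that originally followed cookie2=b, because the regular expression only removes the separator that follows each Google Analytics cookie:

cookie1=a; cookie2=b; 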
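In practice, you’d probably also want to clean up what regsuball() leaves behind. The following sketch trims a trailing separator and removes the Cookie header when no cookies are left. Since the built-in vcl_recv passes requests that carry a Cookie header, removing an empty one allows the request to be served from cache:

```vcl
vcl 4.1;

sub vcl_recv {
    if (req.http.Cookie) {
        # Remove all Google Analytics cookies
        set req.http.Cookie = regsuball(req.http.Cookie, "_g[a-z0-9_]+=[^;]*($|;\s*)", "");
        # Trim a separator that may be left at the end
        set req.http.Cookie = regsub(req.http.Cookie, ";\s*$", "");
        # If no cookies are left, remove the header altogether
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```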

Subroutines

At this point, the term subroutine in a VCL context is hopefully not a foreign concept. We’ve been through the Varnish finite state machine multiple times, and you’ve seen the corresponding built-in VCL code. But what you might not know is that you can define your own subroutines.

Need an example? Here you go:

vcl 4.1;

sub skipadmin {
    if(req.url ~ "^/admin/?") {
        return(pass);
    }
}

sub vcl_recv {
    call skipadmin;
}

The skipadmin subroutine is entirely custom and is called within vcl_recv using the call statement. The purpose of custom subroutines is to allow code to be properly structured and functionality compartmentalized.

The example above groups the logic to bypass requests to the admin panel in a separate subroutine, which is then called from within vcl_recv.

You are free to name your custom subroutine whatever you want, but keep in mind that the vcl_ prefix is reserved for the subroutines of the Varnish finite state machine. Also keep in mind that a subroutine is not a function: it doesn’t accept input parameters, and it doesn’t return values. It’s just a procedure that is called.
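Because a custom subroutine has no parameters, it works on the same variables as its caller. That makes it handy for sharing logic between built-in subroutines that have access to those variables. Here’s a sketch with a hypothetical add_served_by subroutine, which both vcl_deliver and vcl_synth can call because both can set resp.http.* headers:

```vcl
vcl 4.1;

# Add a debug header with the name of the Varnish server
sub add_served_by {
    set resp.http.X-Served-By = server.hostname;
}

sub vcl_deliver {
    call add_served_by;
}

sub vcl_synth {
    call add_served_by;
}
```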

Include

Not all of your VCL logic should necessarily be in the same VCL file. When the line count of your VCL file increases, readability can become an issue.

To tackle this issue, VCL allows you to include VCL from other files. The include syntax is not restricted to subroutines and fixed language structures: even individual lines of VCL code can be included.

The include "<filename>"; syntax tells the compiler to read the file and copy its contents into the main VCL file at that position.

When including a file, the order of execution in the main VCL file is determined by the order of inclusion.

This means that each included file can define its own VCL routing logic, and if an included file exits a subroutine early with a return statement, any logic that would have followed is bypassed.

The built-in VCL follows this logic and can be thought of as an included file at the end of your VCL. This means that if you put a return statement anywhere in your VCL, the built-in VCL logic will be skipped since it is always appended at the end of your VCL.

Let’s revisit the previous example, and move the custom skipadmin subroutine into a separate file:

#This is skipadmin.vcl

sub skipadmin {
    if(req.url ~ "^/admin/?") {
        return(pass);
    }
}

In your main VCL file, you’ll use the include syntax to include skipadmin.vcl:

vcl 4.1;

include "skipadmin.vcl";

sub vcl_recv {
    call skipadmin;
}

And the resulting VCL, as the compiler sees it, would be:

vcl 4.1;

sub skipadmin {
    if(req.url ~ "^/admin/?") {
        return(pass);
    }
}

sub vcl_recv {
    call skipadmin;
}
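An included file may also contain built-in subroutines such as vcl_recv. When the same built-in subroutine is defined more than once, the bodies are concatenated in the order in which they appear. Here’s a sketch with a hypothetical cookies.vcl file, shown in the same block as the main VCL file:

```vcl
# cookies.vcl (hypothetical included file)
sub vcl_recv {
    if (req.url ~ "^/static/") {
        unset req.http.Cookie;
    }
}

# Main VCL file
vcl 4.1;

include "cookies.vcl";

sub vcl_recv {
    # Runs after the vcl_recv code from cookies.vcl,
    # because the include comes first
    if (req.url ~ "^/admin/?") {
        return (pass);
    }
}
```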

Import

The import statement is used to import VMODs. These are Varnish modules, written in C, that are loaded into Varnish and offer a VCL interface. They extend VCL with new functions and objects without being part of the Varnish core.

We’ll cover all the ins and outs of VMODs in the next chapter.

Here’s a quick example:

vcl 4.1;

import std;

sub vcl_recv {
    set req.url = std.querysort(req.url);
}

This example uses import std; to import Varnish’s standard VMOD, which contains a set of utility functions. The std.querysort() function sorts the query string parameters of a URL alphabetically. URLs that only differ in parameter order then result in the same cached object, which improves the hit rate of the cache.
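Here’s another sketch using two other functions from the std VMOD: std.tolower() to normalize the Host header, and std.log() to write a custom record to the shared memory log, where varnishlog can display it:

```vcl
vcl 4.1;

import std;

sub vcl_recv {
    if (req.http.host) {
        # Example.com and example.com now end up with the same hash
        set req.http.host = std.tolower(req.http.host);
    }
    # Visible in varnishlog as a VCL_Log record
    std.log("Normalized host: " + req.http.host);
}
```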