All this talk about the illustrious Varnish Configuration Language, and yet it took us until this section to talk about the syntax. We’d apologize for this, but really it’s all part of the plan.
Although this is a domain-specific language with no real other applications, the syntax is pretty easy to understand. The built-in VCL code we showed in the previous section should already make enough sense to follow.
In this section, we’ll take a look at some of the basics of VCL. We won’t focus too much on the subroutines and the finite state machine because we’ve just done that. Let’s just talk about how you can get things done within one of those subroutines.
Every VCL file starts with a version declaration. As of Varnish 6, the version declaration you’ll want to use is the following:
vcl 4.1;
The VCL version declaration does not reflect the Varnish version it runs on, but instead ensures compatibility of the VCL syntax. In Varnish 6, Unix domain socket support introduced a backwards compatibility break: the
.path variable in the backend declaration is not supported on older versions of Varnish and VCL syntax version 4.0.
The vcl 4.0; declaration will still work on Varnish 6, but it
prevents some specific Varnish 6 features from being supported.
The VCL language doesn’t require you to program the full behavior, but rather allows you to extend pre-existing built-in behavior. Given this scope and purpose, the main objective is to set values based on certain conditions.
Let’s take the following line of code for example:
set resp.http.Content-Type = "text/html; charset=utf-8";
It comes from the vcl_synth built-in VCL and assigns the content
type text/html; charset=utf-8 to the response HTTP header
resp.http.Content-Type.
We’re basically assigning a string to a variable. The assigning is done
by using the set keyword. If we want to unset a variable, we just
use the unset keyword.
Let’s illustrate the unset behavior with another example from the
built-in VCL:
unset bereq.body;
We’re unsetting the bereq.body variable. This is part of the
vcl_backend_fetch logic of the built-in VCL.
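As a minimal sketch of both keywords side by side, the following VCL file sets one response header and removes another. The backend address and the X-Served-By header name are assumptions made for illustration only.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_deliver {
    # Assign a string to a response header (X-Served-By is a made-up name).
    set resp.http.X-Served-By = "varnish";

    # Remove a header before the response is sent to the client.
    unset resp.http.X-Powered-By;
}
```

Because the subroutine has no return statement, the built-in vcl_deliver logic still runs afterwards.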
VCL supports various data types, but the string type is by far the most common.
Here’s a conceptual example:
set variable = "value";
This is the easiest way to assign a string value. But as soon as you want to use newlines or double quotes, you’re in trouble.
Luckily there’s an alternative, which is called long strings. A long
string begins with {" and ends with "}.
They may include newlines, double quotes, and other control characters,
except for the NULL (0x00) character.
A very familiar usage of this is the built-in VCL implementation of
vcl_synth, where the HTML template is composed using long strings:
set resp.body = {"<!DOCTYPE html>
<html>
<head>
<title>"} + resp.status + " " + resp.reason + {"</title>
</head>
<body>
<h1>Error "} + resp.status + " " + resp.reason + {"</h1>
<p>"} + resp.reason + {"</p>
<h3>Guru Meditation:</h3>
<p>XID: "} + req.xid + {"</p>
<hr>
<p>Varnish cache server</p>
</body>
</html>
"};
There is also an alternate form of a long string, which can be delimited
by triple double quotes, """...""".
This example also shows how to perform string concatenation and
variable interpolation. Let’s reimagine the vcl_synth example, and
create a version using simple strings:
set resp.body = "Status: " + resp.status + ", reason: " + resp.reason;
And again we’re using the +-sign for string concatenation and
variable interpolation.
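To show why long strings matter, here is a small sketch of a vcl_synth override whose markup contains double quotes. The class name and the backend address are assumptions; the point is only that the {"..."} form can carry quotes that a simple string cannot.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_synth {
    set resp.http.Content-Type = "text/html; charset=utf-8";

    # The long string holds the double quotes around the attribute value.
    # The simple string " " is concatenated in between using the + sign.
    set resp.body = {"<p class="error">"} + resp.status + " " + resp.reason + {"</p>"};

    return (deliver);
}
```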
Although the VCL language is limited in terms of control structures, it does provide conditionals, meaning if/else statements.
Let’s take some built-in VCL code as an example since we’re so familiar with it:
if (req.method != "GET" && req.method != "HEAD") {
/* We only deal with GET and HEAD by default */
return (pass);
}
This is just a regular if-statement. We can also add an else clause:
if (req.method != "GET" && req.method != "HEAD") {
/* We only deal with GET and HEAD by default */
return (pass);
} else {
return (hash);
}
And as you’d expect, there’s also an elseif clause:
if (req.method == "GET") {
return (hash);
} elseif (req.method == "HEAD") {
return (hash);
} else {
return (pass);
}
elsif, elif and else if can also be used as equivalents for elseif.
VCL has a number of operators that either evaluate to true or to
false:
- The = operator is used to assign values.
- The ==, !=, <, <=, >, and >= operators are used to compare values.
- The ~ operator is used to match values to a regular expression or an ACL.
- The ! operator is used for negation.
- && is the logical and operator.
- || is the logical or operator.
And again the built-in VCL comes to the rescue to clarify how some of these operators can be used:
if (req.method != "GET" && req.method != "HEAD") {
/* We only deal with GET and HEAD by default */
return (pass);
}
You can clearly see the inequality operator and the logical and: the
expression only evaluates to true when both conditions hold, meaning the
request method is neither GET nor HEAD.
We’ve already used the = operator to assign values, but here’s another
example for reference:
set req.http.X-Forwarded-Proto = "https";
This example assigns the https value to the X-Forwarded-Proto
request header.
A logical or looks like this:
if(req.method == "POST" || req.method == "PUT") {
return(pass);
}
At least one of the two conditions has to be true for the expression to
be true.
And let’s end this part with a less than or equals example:
if(beresp.ttl <= 0s) {
set beresp.uncacheable = true;
}
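The regex match and negation operators have not appeared in an example yet, so here is a hedged sketch that combines them with a logical and. The list of file extensions and the backend address are assumptions chosen for illustration.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # ~ matches req.url against a regular expression,
    # ! negates the existence check on the Authorization header,
    # && requires both sides to be true.
    if (req.url ~ "\.(css|js|png|jpg)$" && !req.http.Authorization) {
        unset req.http.Cookie;
    }
}
```

Note that this pattern only matches URLs that end in one of the extensions, so a URL with a query string would not match.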
Documenting your code with comments is usually a good idea, and VCL supports three different comment styles.
We’ve listed all three of them in the example below:
sub vcl_recv {
// Single line of out-commented VCL.
# Another way of commenting out a single line.
/*
Multi-line block of commented-out VCL.
*/
}
So you can use // or # to create a single-line comment. And
/* ... */ can be used for multi-line comments.
Pretty straightforward, not all that exciting, but definitely noteworthy.
VCL supports numeric values, both integers and real numbers.
Certain fields are numeric, so it makes sense to assign literal integer or real values to them.
Here’s an example:
set resp.status = 200;
But most variables are strings, so these numbers get cast into strings.
For real numbers, their value is rounded to three decimal places
(e.g. 3.142).
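A short sketch of such a cast: obj.hits is an integer, and assigning it to a header turns it into a string. The X-Cache-Hits header name and the backend address are assumptions.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_deliver {
    # obj.hits is an integer; the header is a string, so the value is cast.
    set resp.http.X-Cache-Hits = obj.hits;
}
```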
Booleans can be either true or false. Here’s an example of a VCL
variable that expects a boolean:
set beresp.uncacheable = true;
This example probably looks familiar. It comes from the built-in VCL and makes a response uncacheable.
When evaluating values of non-boolean types, the result can also be a boolean.
For example strings will evaluate to true or false if their
existence is checked. This could result in the following example:
if(!req.http.Cookie) {
//Do something
}
if(req.http.Authorization) {
//Do something
}
Be aware that the header variable must be undefined or unset for it to
be evaluated as false. If the header variable is defined with an empty
value, it will evaluate as true.
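One way to deal with the empty-value case is to compare the header to an empty string and unset it, so that later existence checks behave as expected. This is a sketch under the assumption that an empty Cookie header should be treated the same as a missing one; the backend address is also an assumption.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # A header that is set but empty still evaluates as true,
    # so remove it explicitly when its value is an empty string.
    if (req.http.Cookie == "") {
        unset req.http.Cookie;
    }

    if (!req.http.Cookie) {
        # Reached for requests without a Cookie header,
        # including those that had an empty one.
        set req.http.X-Has-Cookie = "no";
    }
}
```

X-Has-Cookie is a made-up header that only serves to make the second check visible.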
Integers will evaluate to false if their value is 0; the same
applies to duration types when their values are zero or less.
Boolean types can also be set based on the result of another boolean expression:
set beresp.uncacheable = (beresp.http.do-no-cache == "true");
Time is an absolute value, whereas a duration is a relative value. However, in Varnish they are often combined.
You can add a duration to a time, which results in a new time. It admittedly sounds confusing, but here’s some code to clarify this statement:
set req.http.tomorrow = now + 1d;
The now variable is how you can retrieve the current time in VCL.
The now + 1d statement means we’re adding a day to the current time.
The returned value is also a time type.
But since we’re assigning a time type to a string field, the time value is cast to a string, which results in the following string value:
Thu, 10 Sep 2020 12:34:54 GMT
As mentioned, durations are relative. They express a time change and are expressed numerically, but with a time unit attached.
Here are a couple of examples that illustrate the various time units:
- 1ms equals 1 millisecond.
- 5s equals 5 seconds.
- 10m equals 10 minutes.
- 3h equals 3 hours.
- 9d equals 9 days.
- 4w equals 4 weeks.
- 1y equals 1 year.
In string context their numeric value is kept, and the time unit is
stripped off. This is exactly what a real number looks like when cast
to a string. And just like real numbers, they are rounded to three
decimal places (e.g. 3.142).
Here’s an example of a VCL variable that supports durations:
set beresp.ttl = 1h;
So this example sets the TTL of an object to one hour.
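Durations can also be compared with each other. The sketch below caps the TTL at one week; the cap itself and the backend address are assumptions picked for illustration.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Compare two durations, and lower the TTL when it exceeds one week.
    if (beresp.ttl > 1w) {
        set beresp.ttl = 1w;
    }
}
```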
Pattern matching is a very common practice in VCL. That’s why VCL
supports Perl Compatible Regular Expressions (PCRE), and we can match
values to a PCRE regex through the ~ operator.
Let’s immediately throw in an example:
if(req.url ~ "^/[a-z]{2}/cart") {
return(pass);
}
This example is matching the request URL to a regex pattern that looks for the shopping cart URL of a website. This URL is prefixed by two letters, which represent the user’s selected language. When the URL is matched, the request bypasses the cache.
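Since the patterns are PCRE, inline modifiers such as (?i) for case-insensitive matching can be used as well. The following sketch normalizes the Host header; example.com and the backend address are placeholder values.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # (?i) makes the match case-insensitive; the www. prefix is optional.
    if (req.http.Host ~ "(?i)^(www\.)?example\.com$") {
        set req.http.Host = "example.com";
    }
}
```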
Varnish is a proxy server and depends on an origin server to provide (most of) the content. A backend definition is indispensable, even if you end up serving synthetic content.
This is what a backend looks like:
backend default {
.host = "127.0.0.1";
.port = "8080";
}
It has a name, default in this case, and uses the .host and
.port properties to define how Varnish can connect to the origin
server.
The first backend that is defined will be used by Varnish.
If you’re not planning to use a backend, or if you are using a dynamic
backend like goto, you’ll have to define the following backend
configuration:
backend default none;
This bypasses the requirement that you must define at least one backend in your VCL.
Backends also support the following options:
- .connect_timeout is how long to wait for a connection to be made to the backend.
- .first_byte_timeout is how long to wait for the first byte of the response.
- .between_bytes_timeout is the maximum time to wait between bytes when reading the response.
- .last_byte_timeout is the total time to wait for the complete backend response.
- .max_connections is the maximum number of concurrent connections Varnish will hold to the backend. When this limit is reached, requests will fail into vcl_backend_error.
Knowing whether or not a backend is healthy is important. It helps to avoid unnecessary outages and allows you to use a fallback system.
When using probes, you can perform health checks at regular intervals. The probe sets the internal value of the health of that backend to healthy or sick.
Backends that are sick always result in an HTTP 503 error when called.
If you use vmod_directors to load balance with multiple backends, sick
backends will be removed from the rotation until their health checks are
successful and their state changes to healthy.
A sick backend will become healthy when a threshold of successful polls is reached within a polling window.
This is how you define a probe:
probe healthcheck {
}
The probe data structure has a bunch of attributes; even without mentioning these attributes, they will have a default behavior:
- .url is the URL that will be polled. The default value is /.
- .expected_response is the HTTP status code that the probe expects. The default value is 200.
- .timeout is the amount of time the probe is willing to wait for a response before timing out. The default value is 2s.
- .interval is the polling interval. The default value is 5s.
- .window is the number of polls that are examined to determine the backend health. The default value is 8.
- .initial is the number of polls in .window that are considered successful when Varnish starts. The default value is 2.
- .threshold is the number of polls in .window that have to be successful to consider the backend healthy. The default value is 3.
- .tcponly is the mode of the probe. When enabled with 1, the probe will only check for available TCP connections. The default value is 0. This property is only available in Varnish Enterprise.
You can start customizing the probe by overriding these defaults.
Here’s an example:
probe healthcheck {
.url = "/health";
.interval = 10s;
.timeout = 5s;
}
This example will call the /health endpoint for polling and will send
a health check every ten seconds. The probe will wait for five
seconds before it times out.
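The polling window and threshold can be tuned in the same way. In the sketch below, which uses made-up values, the backend is considered healthy once three of the last five polls succeed. The probe is attached to a backend so that the file compiles on its own.

```vcl
vcl 4.1;

probe healthcheck {
    .url = "/health";
    .expected_response = 200;
    .interval = 10s;
    .timeout = 5s;
    # Look at the last five polls...
    .window = 5;
    # ...and require three of them to be successful.
    .threshold = 3;
    # Treat two polls as successful when Varnish starts.
    .initial = 2;
}

# Hypothetical origin server that uses the probe.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = healthcheck;
}
```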
When the various probe options do not give you enough flexibility, you can even choose to fully customize the HTTP request that the probe will send out.
The .request property allows you to do this. However, this property is
mutually exclusive with the .url property.
Here’s an example:
probe healthcheck {
.request =
"HEAD /health HTTP/1.1"
"Host: localhost"
"Connection: close"
"User-Agent: Varnish Health Probe";
.interval = 10s;
.timeout = 5s;
}
Although a lot of values remain the same, there are two customizations that are part of the request override:
- The request method is HEAD instead of GET.
- The User-Agent header is set to Varnish Health Probe.
Once your probe is set up and configured, you need to assign it to a backend.
It’s a matter of setting the .probe property in your backend to the
name of the probe, as you can see in the example below:
vcl 4.1;
probe healthcheck {
.url = "/health";
.interval = 10s;
.timeout = 5s;
}
backend default {
.host = "127.0.0.1";
.port = "8080";
.probe = healthcheck;
}
By defining your probe as a separate data structure, it can be reused when multiple backends are in use.
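A sketch of that reuse could look like this: two backends share one probe and are load balanced through vmod_directors. The backend names, the 192.0.2.x addresses, and the choice of a round-robin director are assumptions made for the sake of the example.

```vcl
vcl 4.1;

import directors;

probe healthcheck {
    .url = "/health";
    .interval = 10s;
    .timeout = 5s;
}

# Two hypothetical origin servers that share the same probe.
backend web1 {
    .host = "192.0.2.10";
    .port = "8080";
    .probe = healthcheck;
}

backend web2 {
    .host = "192.0.2.11";
    .port = "8080";
    .probe = healthcheck;
}

sub vcl_init {
    # Sick backends are left out of the rotation by the director.
    new rr = directors.round_robin();
    rr.add_backend(web1);
    rr.add_backend(web2);
}

sub vcl_recv {
    set req.backend_hint = rr.backend();
}
```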
The verbose approach is to define the .probe property inline, as
illustrated in the example below:
vcl 4.1;
backend default {
.host = "127.0.0.1";
.port = "8080";
.probe = {
.url = "/health";
.interval = 10s;
.timeout = 5s;
}
}
Probes usually perform HTTP requests to check the health of a backend. By using TCP-only probes, the health of a backend is checked by the availability of the TCP connection.
This can be used to probe non-HTTP endpoints. However, TCP-only probes
cannot be used with .url, .request, or .expected_response
properties.
Here’s how you define such a probe:
probe tcp_healthcheck {
.tcponly = 1;
}
Keep in mind that TCP-only probes are only available in Varnish Enterprise.
The backend data structure has additional properties that can be set
with regard to the endpoint it is connecting to.
If you want to connect to your backend using a UNIX domain socket,
you’ll use the .path property. It is mutually exclusive with the
.host property and is only available when you use the vcl 4.1;
version declaration.
Here’s an example of a UDS-based backend definition:
backend default {
.path = "/var/run/some-backend.sock";
}
If for some reason the Host header is not set in your HTTP requests,
you can use the .host_header property to override it.
Here’s an example:
backend default {
.host = "127.0.0.1";
.port = "8080";
.host_header = "example.com";
}
This .host_header property will be used for both regular backend
requests and health probe checks.
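To tie the backend options together, here is a sketch of a backend that combines .host_header with some of the timeout and connection settings listed earlier. All values are illustrative assumptions, not recommendations.

```vcl
vcl 4.1;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
    # Host header to use when the request doesn't carry one.
    .host_header = "example.com";
    # Give up when no connection is made within two seconds.
    .connect_timeout = 2s;
    # Wait up to thirty seconds for the first byte of the response.
    .first_byte_timeout = 30s;
    # Allow at most five seconds between bytes while reading the response.
    .between_bytes_timeout = 5s;
    # Hold at most one hundred concurrent connections to this backend.
    .max_connections = 100;
}
```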
An access control list (ACL) is a VCL data structure that contains hostnames, IP addresses, and subnets. An ACL is used to match client addresses and restrict access to certain resources.
Here’s how you define an ACL:
acl admin {
"localhost";
"secure.my-server.com";
"192.168.0.0/24";
! "192.168.0.25";
}
This ACL named admin contains the following rules:
- localhost is allowed.
- secure.my-server.com is also allowed.
- All IP addresses in the 192.168.0.0/24 subnet are allowed.
- The exception within that subnet is 192.168.0.25, which the ! prefix excludes.
In your VCL code, you can then match the client IP address to that list, as you’ll see in the next example:
acl admin {
"localhost";
"secure.my-server.com";
"192.168.0.0/24";
! "192.168.0.25";
}
sub vcl_recv {
if(req.url ~ "^/admin/?" && client.ip !~ admin) {
return(synth(403,"Forbidden"));
}
}
In this example, we’re hooking into vcl_recv to intercept requests for
/admin or any subordinate resource of /admin/. If users try to
access this resource, we check if their client IP address is matched
by acl admin.
If it doesn’t match, an HTTP 403 Forbidden error is returned
synthetically.
Complex logic in a programming language is usually abstracted away by functions. This is also the case in VCL, which has a number of native functions.
The number of functions is limited, but extra functions are available in the wide range of VMODs that are supported by Varnish.
In chapter 5, we’ll talk about VMODs and how their functions extend the capabilities of Varnish.
ban() is a function that adds an expression to the ban list. These
expressions are matched to cached objects. Every matching object is then
removed from the cache.
In essence, the ban() function exists to invalidate multiple objects
at the same time.
Although banning will be covered in detail in chapter 5, here’s a quick example:
ban("obj.age > 1h");
Multiple expressions can be chained using the && operator.
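A sketch of chained ban expressions, triggered from vcl_recv and protected by an ACL, might look like the following. The BAN request method, the banners ACL name, and the backend address are assumptions for this illustration.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Only these clients may issue bans (hypothetical ACL name).
acl banners {
    "localhost";
}

sub vcl_recv {
    if (req.method == "BAN") {
        if (client.ip !~ banners) {
            return (synth(403, "Forbidden"));
        }
        # Two expressions chained with &&: the host has to be equal
        # and the URL has to match the pattern sent in the request URL.
        ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url);
        return (synth(200, "Ban added"));
    }
}
```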
The hash_data() function is used within the vcl_hash subroutine and
is used to append string data to the hash input that is used to
lookup an object in cache.
Let’s just revisit the built-in VCL for vcl_hash where hash_data()
is used:
sub vcl_hash {
hash_data(req.url);
if (req.http.host) {
hash_data(req.http.host);
} else {
hash_data(server.ip);
}
return (lookup);
}
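Extending the hash from your own VCL follows the same pattern. The sketch below adds the X-Forwarded-Proto request header to the hash input, under the assumption that HTTP and HTTPS responses need separate cache entries; the backend address is also an assumption.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_hash {
    # Append the protocol to the hash input.
    hash_data(req.http.X-Forwarded-Proto);
    # No return statement here, so the built-in vcl_hash
    # still adds the URL and the host afterwards.
}
```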
The synthetic() function prepares a synthetic response body and uses
a string argument for its input. This function can be used within
vcl_synth and vcl_backend_error.
Here’s an example for vcl_synth:
synthetic(resp.reason);
However, this function is no longer used in the built-in VCL. As of
Varnish Cache 5.0, it is recommended to instead use
set resp.body = {""}; in vcl_synth and set beresp.body = {""}; in vcl_backend_error.
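For vcl_backend_error, a sketch of a synthetic body built by assigning to the body variable could look like this. The plain-text format and the backend address are assumptions.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_error {
    set beresp.http.Content-Type = "text/plain; charset=utf-8";
    # Compose the synthetic body from the status code and the reason.
    set beresp.body = "Backend error " + beresp.status + ": " + beresp.reason;
    return (deliver);
}
```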
The regsub() function is a very popular function in Varnish. This
function performs string substitution using regular expressions.
Basically, it performs a find/replace on the first occurrence using a regex
pattern.
This is the API of this function:
regsub(string, regex, sub)
- The string argument is your input.
- The regex argument is the regular expression you’re using to match what you’re looking for in the input string.
- The sub argument is what the matched part of the input string will be substituted with.
Here’s a really practical example where we use regsub() to extract a
cookie value:
vcl 4.1;
sub vcl_hash {
hash_data(regsub(req.http.Cookie,"^.*(;|^)language=([a-z]{2})(;|$).*$","\2"));
}
Let’s break it down because it looks quite complex.
This vcl_hash subroutine is used to extend the built-in VCL and to
add the value of the language cookie to the hash. This creates a cache
variation per language.
We really don’t want to hash the entire cookie because that will drive our hit rate down, especially when there are tracking cookies in place.
In order to extract the exact cookie value we need, we’ll match the
req.http.Cookie header to a regular expression that uses grouping.
In the substitution part, we can refer to those groups to extract the
value we want.
Here’s the regular expression:
^.*(;|^)language=([a-z]{2})(;|$).*$
This regular expression looks for a language= occurrence, followed by
two letters. These letters represent the language. This language
cookie can occur at the beginning of the cookie string, in the middle,
or at the end. The (;|^) and (;|$) statements ensure that this is
possible. Because regsub() only replaces the part of the input that
matches, the leading ^.* and trailing .*$ make the pattern cover the
entire header, so the whole string is replaced by the substitution.
Because we’re using parentheses for grouping, the group where we match
the language itself, is indexed as group two. This means we can refer
to it in the regsub() function as \2.
So if we look at the entire regsub() example:
regsub(req.http.Cookie,"^.*(;|^)language=([a-z]{2})(;|$).*$","\2")
And let’s imagine this is our Cookie header:
Cookie: privacy_accepted=1;language=en;sessionid=03F1C5944FF4
Given the regular expression and the group referencing, the output
of this regsub() function would be en.
This means that en will be added to the hash along with the URL and
the host header.
When the Cookie header doesn’t contain a language cookie, the pattern
doesn’t match and regsub() returns its input unchanged, which means the
full Cookie header is added to the hash. When there is no Cookie header,
an empty string is returned. If that matters for your hit rate, check
that the language cookie is present before calling hash_data().
#### regsuball()
The regsuball() function is very similar to the regsub() function we
just covered. The only difference is where regsub() matches and
replaces the first occurrence of the pattern, regsuball() matches all
occurrences.
Even the function API is identical:
regsuball(string, regex, sub)
- The string argument is your input.
- The regex argument is the regular expression you’re using to match what you’re looking for in the input string.
- The sub argument is what the matched parts of the input string will be substituted with.
Let’s have a look at a similar example, where we’ll strip off some
cookies again. Instead of matching the values we want to keep, we’ll
match the values we want to remove. We need to ensure that all
occurrences are matched, not just the first occurrence. That’s why we
use regsuball() instead of regsub():
regsuball(req.http.Cookie,"_g[a-z0-9_]+=[^;]*($|;\s*)","")
What this example does is remove all Google Analytics cookies. This is the list of cookies we need to remove:
- _ga
- _gid
- _gat
- _gac_<property-id>
Instead of stripping them off one by one, we can use the
_g[a-z0-9_]+=[^;]*($|;\s*) regular expression to match them all at
once. In the end we’ll replace the matched cookies with an empty string.
This could be the raw value of your req.http.Cookie header:
cookie1=a; _ga=GA1.2.1915485056.1587105100;cookie2=b; _gid=GA1.2.873028102.1599741176; _gat=1
And the end result is the following, where the separator after cookie2=b is left behind because it isn’t part of any match:
cookie1=a; cookie2=b; 
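In a complete VCL file, the cleanup would typically be assigned back to the header. The sketch below does that in vcl_recv, trims a separator that may be left at the end, and removes the header altogether when nothing remains. The backend address and the extra cleanup steps are assumptions.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    if (req.http.Cookie) {
        # Remove every Google Analytics cookie, not just the first one.
        set req.http.Cookie = regsuball(req.http.Cookie, "_g[a-z0-9_]+=[^;]*($|;\s*)", "");

        # Trim a separator that may be left at the end of the header.
        set req.http.Cookie = regsub(req.http.Cookie, ";\s*$", "");

        # Drop the header when no cookies are left.
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```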
At this point, the term subroutine in a VCL context is hopefully not a foreign concept. We’ve been through the Varnish finite state machine multiple times, and you’ve seen the corresponding built-in VCL code. But what you might not know is that you can define your own subroutines.
Need an example? Here you go:
vcl 4.1;
sub skipadmin {
if(req.url ~ "^/admin/?") {
return(pass);
}
}
sub vcl_recv {
call skipadmin;
}
The skipadmin subroutine is entirely custom and is called within
vcl_recv using the call statement. The purpose of custom subroutines
is to allow code to be properly structured and functionality
compartmentalized.
The example above groups the logic to bypass requests to the admin
panel in a separate subroutine, which is then called from within
vcl_recv.
You are free to name your custom subroutine whatever you want, but keep in mind that the
vcl_ naming prefix is reserved for the Varnish finite state machine. Please also keep in mind that a subroutine is not a function: it does not accept input parameters, and it doesn’t return values. It’s just a procedure that is called.
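Since a subroutine takes no arguments, it works directly on the variables of the subroutine it is called from. The sketch below illustrates this with a hypothetical normalize_host subroutine that strips a port number from the Host header; the backend address is an assumption as well.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# No parameters and no return value: the subroutine changes
# req.http.Host in place.
sub normalize_host {
    set req.http.Host = regsub(req.http.Host, ":[0-9]+$", "");
}

sub vcl_recv {
    call normalize_host;
    # req.http.Host no longer contains a port number at this point.
}
```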
Not all of your VCL logic should necessarily be in the same VCL file. When the line count of your VCL file increases, readability can become an issue.
To tackle this issue, VCL allows you to include VCL from other files. The include syntax is not restricted to subroutines and fixed language structures; even individual lines of VCL code can be included.
The include "<filename>"; syntax will tell the compiler to read the
file and copy its contents into the main VCL file.
When including a file, the order of execution in the main VCL file will be determined by the order of inclusion.
This means that each include can define its own VCL routing logic and if an included file exits the subroutine early, it will bypass any logic that followed that return statement.
The built-in VCL follows this logic and can be thought of as an
included file at the end of your VCL. This means that if a
return statement is executed in one of your subroutines, the built-in
VCL logic for that subroutine will be skipped since it is always
appended at the end of your VCL.
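The difference between returning and falling through can be sketched as follows. The /admin path is reused from the earlier examples, and the backend address is an assumption.

```vcl
vcl 4.1;

# Hypothetical origin server, only here to make the file complete.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    if (req.url ~ "^/admin/?") {
        # Explicit return: the built-in vcl_recv is skipped
        # for these requests.
        return (pass);
    }
    # No return for all other requests: execution continues
    # in the built-in vcl_recv that is appended to this one.
}
```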
So let’s talk about the previous example, where the skipadmin
subroutine is used and put the custom subroutine in a separate file:
#This is skipadmin.vcl
sub skipadmin {
if(req.url ~ "^/admin/?") {
return(pass);
}
}
In your main VCL file, you’ll use the include syntax to include
skipadmin.vcl:
vcl 4.1;
include "skipadmin.vcl";
sub vcl_recv {
call skipadmin;
}
And the resulting compiled VCL would be:
vcl 4.1;
sub skipadmin {
if(req.url ~ "^/admin/?") {
return(pass);
}
}
sub vcl_recv {
call skipadmin;
}
The import statement can be used to import VMODs. These are Varnish
modules, written in C, that are loaded into Varnish and offer
a VCL interface. These modules basically enrich the VCL syntax
without being part of the Varnish core.
We’ll cover all the ins and outs of VMODs in the next chapter.
Here’s a quick example:
vcl 4.1;
import std;
sub vcl_recv {
set req.url = std.querysort(req.url);
}
This example uses import std; to import Varnish’s standard library
containing a set of utility functions. The std.querysort() function
will alphabetically sort the query string parameters of a URL, which
has a beneficial impact on the hit rate of the cache.