Making changes

In the previous section of the book, we took a deep dive into all the VCL variables. We also covered the built-in VCL and the Varnish finite state machine extensively.

In this section, we’ll cover some basic scenarios on how to make meaningful changes in your VCL.

Excluding URL patterns

You want to cache as much as possible, but in reality you can’t: resources that are stateful are often hard or impossible to cache.

When caching a stateful resource would result in too many variations, it’s not worth caching.

A very common pattern in VCL is to exclude URL patterns and do a return(pass) when they are matched.

This one comes right out of the Magento 2 VCL file:

vcl 4.1;

sub vcl_recv {
    if (req.url ~ "/checkout") {
        return (pass);
    }
}

Because the /checkout URL namespace is a very personalized experience, it’s not really cacheable: you’re dealing with logins and payment details. You really have to pass here.

And here’s another example coming from WordPress:

vcl 4.1;

sub vcl_recv {
    if (req.url ~ "wp-(login|admin)" || req.url ~ "preview=true") {
        return (pass);
    }
}

If the URL contains wp-login or wp-admin, you’re trying to access the login page or the admin panel, which is not cacheable.

This is also the case when you’re previewing cacheable pages while logged in. As a result, pages containing preview=true in the URL won’t be cached either.

Notice that only slight modifications were required to achieve our goal. Because we only return(pass) for specific patterns, the rest of the application can still rely on the built-in VCL. As always, the built-in VCL is your safety net.

Sanitizing the URL

Cache objects are identified by the URL. The URL is not just the identifier of the resource; it can also contain query string parameters. But in terms of hashing, Varnish treats the URL as a string.

This means that the slightest change in any of the query string parameters will result in a new hash, which in its turn results in a cache miss.

There are some strategies where the URL is sanitized in order to avoid too many cache misses.

Here’s some VCL to sanitize your URL:

vcl 4.1;

import std;

sub vcl_recv {
    # Sort the query string parameters alphabetically
    set req.url = std.querysort(req.url);
    
    # Remove third-party tracking parameters 
    if (req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|utm_content)=") {
        set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|utm_content)=([A-Za-z0-9_\-\.%25]+)", "");
        set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|utm_content)=([A-Za-z0-9_\-\.%25]+)", "?");
        set req.url = regsub(req.url, "\?&", "?");
        set req.url = regsub(req.url, "\?$", "");
    }
    
    # Remove hashes from the URL
    if (req.url ~ "\#") {
        set req.url = regsub(req.url, "\#.*$", "");
    }

    # Strip off trailing question marks
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }
}

Alphabetic sorting

The first step is to sort the query string parameters alphabetically. If you change the order of the query string parameters, you change the string, which results in a cache miss.

The std.querysort function from vmod_std does this for you. It’s a simple modification that can have a massive impact.

Removing tracking query string parameters

Marketing people are keen to figure out how their campaigns are performing. Google Analytics can add campaign context to a URL by adding tracking parameters.

Here’s a list of these parameters:

  • utm_source
  • utm_medium
  • utm_campaign
  • utm_content

In the example above we’re stripping them off because they are meaningless to the server, and they mess with our hit rate. Because these parameters are processed client-side, removing them server-side has no negative impact.

The regsub() and regsuball() functions in the example above strip off unwanted tracking query string parameters using regular expressions.
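To illustrate the combined effect, here’s what a hypothetical URL would look like at each stage (the path and parameters are made up for this example):

```text
/products?b=2&utm_source=newsletter&a=1     # original URL
/products?a=1&b=2&utm_source=newsletter     # after std.querysort()
/products?a=1&b=2                           # after stripping utm_* parameters
```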

Removing URL hashes

In HTML, we can mark page sections using anchors, as illustrated below:

<a name="my-section"></a>

You can directly scroll to this section by adding a hash to the URL. Here’s how that looks:

http://example.com/#my-section

We’ve said it 100 times at least, and we’ll have to repeat it again: changing the URL changes the lookup hash for the cache. These URL hashes are also meaningless in a server-side context and also mess with our hit rate.

Your best move is to strip them off. The statement set req.url = regsub(req.url, "\#.*$", ""); does this for you.

Removing trailing question marks

In the same vein as the previous example, we want to avoid cache misses by stripping off trailing question marks.

The ? in a URL indicates the start of the query string parameters. But if the question mark is at the end of the URL, there aren’t any parameters, so we need to strip off the ?. This is done by set req.url = regsub(req.url, "\?$", "");

Stripping off cookies

Cookies are indicators of state. And stateful content should not be cached unless the variations are manageable.

But a lot of cookies are there to personalize the experience. They keep track of session identifiers, and there are also tracking cookies that change upon every request.

There are two approaches to get rid of them:

  • Identify the cookies you want to remove
  • Identify the cookies you want to keep

Removing select cookies

vcl 4.1;

sub vcl_recv {
    # Some generic cookie manipulation, useful for all templates that follow
    # Remove the "has_js" cookie
    set req.http.Cookie = regsuball(req.http.Cookie, "has_js=[^;]+(; )?", "");
    
    # Remove any Google Analytics based cookies
    set req.http.Cookie = regsuball(req.http.Cookie, "__utm.=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "_ga=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "_gat=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "utmctr=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "utmcmd.=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "utmccn.=[^;]+(; )?", "");
    
    # Remove DoubleClick offensive cookies
    set req.http.Cookie = regsuball(req.http.Cookie, "__gads=[^;]+(; )?", "");
    
    # Remove the Quant Capital cookies (added by some plugin, all __qca)
    set req.http.Cookie = regsuball(req.http.Cookie, "__qc.=[^;]+(; )?", "");
    
    # Remove the AddThis cookies
    set req.http.Cookie = regsuball(req.http.Cookie, "__atuv.=[^;]+(; )?", "");
    
    # Remove a ";" prefix in the cookie if present
    set req.http.Cookie = regsuball(req.http.Cookie, "^;\s*", "");
    
    # Are there cookies left with only spaces or that are empty?
    if (req.http.cookie ~ "^\s*$") {
        unset req.http.cookie;
    }
}

This VCL snippet will identify every single cookie pattern that needs to be removed. It ranges from Google Analytics tracking cookies, to DoubleClick, all the way to AddThis.

Every cookie that matches is removed. If you end up with nothing more than a set of whitespace characters, this means there weren’t any cookies left, and we remove the entire Cookie header.

Cookies that weren’t removed remain in the Cookie header, and those requests fall through to the built-in VCL, which performs a return(pass);. That’s not really a problem: it’s by design.

Removing all but some cookies

The opposite is actually a lot easier: only keep a couple of cookies, and remove the rest.

Here’s an example that does that:

vcl 4.1;

sub vcl_recv {
    if (req.http.Cookie) {
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(PHPSESSID)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.cookie ~ "^\s*$") {
            unset req.http.cookie;
        }
    }
}

Imagine having a PHP web application that has an admin panel. When you create a session in PHP, the PHPSESSID cookie is used by default. This is the only cookie that matters server-side in our application.

When this cookie is set, you’re logged in, and the page can no longer be cached. This example looks quite complicated, but it just sets up a cookie format where PHPSESSID can easily be identified, and other cookies are replaced with an empty string.

And again: if you end up with a collection of whitespace characters, you can just remove that cookie.
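To make the find-and-replace magic more tangible, here’s how a hypothetical Cookie header would be transformed step by step (the cookie names and values are made up):

```text
foo=1; PHPSESSID=abc123; bar=2        # original Cookie header
;foo=1; PHPSESSID=abc123; bar=2       # prepend ";"
;foo=1;PHPSESSID=abc123;bar=2         # collapse "; " into ";"
;foo=1; PHPSESSID=abc123;bar=2        # mark PHPSESSID with "; " prefix
; PHPSESSID=abc123                    # remove cookies not prefixed "; "
PHPSESSID=abc123                      # strip leading/trailing "; "
```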

Using vmod_cookie

If you’re on Varnish Cache 6.4 or later, vmod_cookie is shipped by default.

Here’s the first example, where we explicitly remove cookies using vmod_cookie:

vcl 4.1;

import cookie;

sub vcl_recv {
    cookie.parse(req.http.cookie);
    cookie.filter("_ga,_gat,utmctr,__gads,has_js");
    cookie.filter_re("(__utm.|utmcmd.|utmccn.|__qc.|__atuv.)");
    set req.http.cookie = cookie.get_string();
    if (req.http.cookie ~ "^\s*$") {
        unset req.http.cookie;
    }
}

Here’s the second example, where we only keep the PHPSESSID cookie using vmod_cookie:

vcl 4.1;

import cookie;

sub vcl_recv {
    cookie.parse(req.http.cookie);
    cookie.keep("PHPSESSID");
    set req.http.cookie = cookie.get_string();
    if (req.http.cookie ~ "^\s*$") {
        unset req.http.cookie;
    }
}

You have to admit, this is a lot simpler. There are still regular expressions involved, but only to match cookie names. The complicated logic to match names, values, and separators is completely abstracted.

Using vmod_cookieplus

If you’re not on Varnish Cache 6.4, you can still benefit from another cookie VMOD: Varnish Enterprise ships vmod_cookieplus. This is an Enterprise version of vmod_cookie that has more features and a slightly different API.

Here’s the first example, where we explicitly remove cookies using vmod_cookieplus:

vcl 4.1;

import cookieplus;

sub vcl_recv {
    cookieplus.delete("_ga");
    cookieplus.delete("_gat");
    cookieplus.delete("utmctr");
    cookieplus.delete("__gads");
    cookieplus.delete("has_js");                
    cookieplus.delete_regex("(__utm.|utmcmd.|utmccn.|__qc.|__atuv.)");
    cookieplus.write(); 
}

And here’s how we only keep the PHPSESSID cookie using vmod_cookieplus:

vcl 4.1;

import cookieplus;

sub vcl_recv {
    cookieplus.keep("PHPSESSID");
    cookieplus.write();
}

As you can see, vmod_cookieplus doesn’t need to be initialized, the Cookie header doesn’t need to be parsed in advance, and although there is a cookieplus.write() function, it doesn’t require writing the value back to req.http.Cookie.

A final note about vmod_cookieplus: unlike vmod_cookie, the deletion process doesn’t leave you with an empty Cookie header. If the Cookie header ends up empty, it is stripped off automatically.

Sanitizing content negotiation headers

We already covered this in chapter 3, but sanitizing your content negotiation headers is important, especially if you’re planning on varying on them.

By content negotiation headers we mean Accept and Accept-Language. There’s also the Accept-Encoding header, but Varnish handles this one out of the box.

The Accept request header defines what content types the client supports. This could be text/plain, text/html, or even application/json.

The Accept-Language request header defines what languages the client understands. This is an ideal way to serve multilingual content with explicit language selection.

The problem with these headers is that they can have so many variations. If you were to issue Vary: Accept-Language, your hit rate might drop massively.

It’s not only the vast number of languages that causes this, but also the order, the priority, and the localization of these languages.

You probably have a pretty good idea which languages your web platform supports. Just allow them and rely on a default value when the client’s preferred language is not supported.

Here’s the example we used in chapter 3:

vcl 4.1;

import accept;

sub vcl_init {
    new lang = accept.rule("en");
    lang.add("nl");
}

sub vcl_recv {
    set req.http.accept-language = lang.filter(req.http.accept-language);
}

This is the Accept-Language header in my browser:

Accept-Language: nl-BE,nl;q=0.9,en-US;q=0.8,en;q=0.7

These settings are personalized, and your browser settings will undoubtedly differ. Without a proper cleanup, it is impossible to get a decent hit rate when you vary on this header.

My VCL script will pick nl as the selected language. If nl is nowhere to be found in the Accept-Language header, en will be the fallback.

vmod_accept also works for the Accept header. Here’s an example:

vcl 4.1;

import accept;

sub vcl_init {
    new format = accept.rule("text/plain");
    format.add("text/html");
    format.add("application/json");
}

sub vcl_recv {
    set req.http.accept = format.filter(req.http.accept);
}

In this example we support content that is HTML or JSON. Anything else results in the text/plain MIME type, which means the response is treated as plain text rather than being parsed.

By sanitizing your content negotiation headers, you limit the variations per header, and you can safely issue a Vary: Accept, or a Vary: Accept-Language in your web application.

Overriding TTLs

Developer empowerment is a term we use a lot when we talk about caching. In chapter 3, we covered it in great detail: HTTP has so many built-in mechanisms to improve the cacheability of your web application. If you use the right headers, you’re in control.

However, in the real world, Cache-Control and Expires headers aren’t always used. And quite often you’ll find Cache-Control: private, no-cache, no-store headers on a perfectly cacheable page.

Refactoring your code and implementing the proper HTTP headers is a good idea. But every now and then, you’ll run into a legacy application that you wouldn’t want to touch with a stick: “it works, but don’t ask how”.

That’s where VCL comes into play. The beresp.ttl value is determined by the value of Cache-Control or Expires. But you can override the value if required.

Static data example

The following example will identify images and videos based on the Content-Type header. For those resources we set the TTL to one year because it’s static data, and it’s not supposed to change.

And if the Cache-Control header contains no-cache, no-store, or private, we strip off the Cache-Control header. Otherwise, the built-in VCL would turn this into a hit-for-miss:

vcl 4.1;

sub vcl_backend_response {
    if (beresp.http.Content-Type ~ "^(image|video)/") {
        if (beresp.http.Cache-Control ~ "(?i:no-cache|no-store|private)") {
            unset beresp.http.Cache-Control;
        }
        set beresp.ttl = 1y;
    }
}

Overriding the default TTL

Varnish’s default TTL is defined by the default_ttl runtime parameter. By default this is 120 seconds.

If you change the value of the default_ttl parameter, Varnish will use that value if the HTTP response doesn’t contain a TTL.
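For example, assuming a typical setup, the default TTL can be set at startup or changed at runtime through the CLI (the value is expressed in seconds):

```shell
# At startup: set the default TTL to one hour
varnishd -a :80 -f /etc/varnish/default.vcl -t 3600

# At runtime: change the parameter on a running instance
# (only affects objects inserted after the change)
varnishadm param.set default_ttl 3600
```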

You can also do it in VCL:

vcl 4.1;

sub vcl_backend_response {
    set beresp.ttl = 1h;
}

Zero TTLs are evil

The lifetime of an object is defined by its TTL. If the TTL is zero, the object is stale. If grace and keep values are set, the TTL can even be less than zero.
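As a reminder, grace and keep are set through beresp.grace and beresp.keep in vcl_backend_response. A minimal sketch, with arbitrary example values:

```vcl
vcl 4.1;

sub vcl_backend_response {
    # Serve stale objects for up to one hour beyond their TTL
    set beresp.grace = 1h;
    # Keep expired objects around for another day, e.g. for revalidation
    set beresp.keep = 1d;
}
```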

An instinctive reaction is to set beresp.ttl = 0s if you want to make sure an object is not stored in cache. However, you’re doing more harm than good.

The built-in VCL has a mechanism in place to deal with uncacheable content:

set beresp.ttl = 120s;
set beresp.uncacheable = true;

By setting beresp.uncacheable = true, we’re deciding to cache the decision not to cache, as explained earlier in the book. We call this hit-for-miss and hit-for-pass, and these objects are kept for two minutes.

This metadata is used to bypass the waiting list, as we explained in the under the hood section in chapter 1.

By setting beresp.ttl = 0s, you lose the metadata, requests for this resource are put on the waiting list, and request coalescing will not satisfy the request.

The end result is serialization, which means that these items on the waiting list are processed serially rather than in parallel. The impact of serialization is increased latency for the clients.

We said it before, and we’ll say it again: zero TTLs are evil.

Dealing with websockets

Websockets are a mechanism that offers full-duplex communication over a single TCP connection. Websockets are used for real-time bi-directional communication between a client and a server without the typical request-response exchange.

Websockets are initiated via HTTP, but the Connection: Upgrade and Upgrade: websocket headers will trigger a protocol upgrade. This protocol upgrade results in a persistent open connection between client and server, where another protocol is used for communication over the TCP connection.

Here’s an example request:

GET /chat HTTP/1.1
Host: example.com
Origin: https://example.com
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Key: Iv8io/9s+lYFgZWcXczP8Q==
Sec-WebSocket-Version: 13

The server’s response could look like this:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: hsBlbuDTkk24srzEOTBUlZAlC2g=

And as soon as the protocol has been switched, we’re no longer communicating over HTTP.

If you remember the Varnish finite state machine, and the various return statements, then you’ll probably agree that return(pipe) is the way to go here.

The vcl_pipe subroutine is used to deal with traffic that couldn’t be identified as HTTP. The built-in VCL uses it when Varnish notices an unsupported request method. The pipe we refer to is the TCP connection between Varnish and the backend. When a return(pipe) is executed, the raw bytes are shuffled over the wire, without interpreting anything as HTTP.

Here’s how you detect websockets in VCL, and how you successfully pipe the request to the backend without the loss of the connection upgrade headers:

vcl 4.1;

sub vcl_recv {
    if (req.http.upgrade ~ "(?i)websocket") {
        return (pipe);
    }
}

sub vcl_pipe {
    if (req.http.upgrade) {
        set bereq.http.upgrade = req.http.upgrade;
        set bereq.http.connection = req.http.connection;
    }
}

Enabling ESI support

Edge Side Includes are a powerful hole-punching technique to dissect web pages into separate blocks that are processed as individual HTTP requests.

The ESI tag is a placeholder that is interpreted by Varnish and is replaced by the resource it refers to.

We already talked about this, but as a reminder, this is what an ESI tag looks like:

<esi:include src="/header" />

Varnish can interpret these tags, but this needs to be triggered through set beresp.do_esi = true. Because this is more computationally intensive, you don’t want to keep this turned on all the time.

Inspect the URL

In a lot of cases, people will match the URLs where ESI parsing is required, which might look like this:

vcl 4.1;

sub vcl_backend_response {
    if (bereq.url == "/" || bereq.url ~ "^/articles") {
        set beresp.do_esi = true;
    }
}

Unfortunately, this doesn’t offer you a lot of flexibility: whenever changes in the origin application occur, the VCL file needs to be modified. From a developer empowerment point of view, this is a poor implementation.

Inspect the Content-Type header

Another approach is to make an assumption about what kind of content would require ESI parsing.

The example below looks at the Content-Type, and assumes that all HTML pages are ESI parsing candidates. So if Content-Type: text/html is set, ESI parsing is enabled:

vcl 4.1;

sub vcl_backend_response {
    if (beresp.http.content-type ~ "text/html") {
        set beresp.do_esi = true;
    }
}

But again, this results in far too many non-ESI pages being processed.

Surrogate headers

The preferred solution takes us all the way back to chapter 3, where we talked about the capabilities of HTTP. Surrogate headers enable the capability that is most relevant to this use case: by leveraging the Surrogate-Capability and Surrogate-Control headers, you can negotiate behavior on the edge.

Varnish can announce ESI support through the following request header:

Surrogate-Capability: varnish="ESI/1.0"

When the origin has detected ESI support on the edge, it can leverage this and request ESI parsing through the following response header:

Surrogate-Control: content="ESI/1.0"

There is in fact a handshake that takes place to negotiate ESI parsing. Here is the VCL required to support this:

vcl 4.1;

sub vcl_recv {
    set req.http.Surrogate-Capability = "varnish=ESI/1.0";
}

sub vcl_backend_response {
    if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
        unset beresp.http.Surrogate-Control;
        set beresp.do_esi = true;
    }
}

And this is a conventional solution that only consumes CPU cycles to parse ESI when it’s absolutely necessary.

Protocol detection

Varnish Cache doesn’t support native TLS; Varnish Enterprise does. However, the most common way to support TLS in Varnish is by terminating it using a TLS proxy. We’ll discuss this in-depth in the TLS section of chapter 7.

But for now, it is important to know that Varnish usually only processes plain HTTP. But thanks to the PROXY protocol, Varnish has more information about the original connection that was made.

Protocol detection and protocol awareness are important for the origin, because they use this information to build the right URL schemes. If http:// is used as URL instead of https://, this might lead to mixed content, which is problematic from a browser point of view.

If you use a TLS proxy with PROXY protocol support, and connect it to Varnish using a listening socket that supports PROXY, VCL will use the connection metadata to populate the endpoint variables we discussed earlier in this chapter.

The following example uses the std.port(server.ip) expression to retrieve the server port. Because Varnish only does HTTP, this is not always 80. If Varnish receives a connection via the PROXY protocol, the value might be 443 if a TLS proxy terminated the connection:

vcl 4.1;

import std;

sub vcl_recv {
    set req.http.X-Forwarded-Port = std.port(server.ip);
    
    if(req.http.X-Forwarded-Port == "443") {
        set req.http.X-Forwarded-Proto = "https";
    } else {
        set req.http.X-Forwarded-Proto = "http";
    }
}

The result of this VCL snippet is the X-Forwarded-Proto header being sent to the origin. This header is a conventional one and contains either http or https. It’s up to the origin to interpret this header and act accordingly. This value can be used to force HTTPS redirection, but also to create the right URLs in hypermedia resources.
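For example, HTTPS redirection can even be enforced at the edge itself. This is a sketch, not part of the original example; the 750 status code is just an arbitrary internal marker used to trigger the synthetic response:

```vcl
vcl 4.1;

sub vcl_recv {
    if (req.http.X-Forwarded-Proto == "http") {
        # Trigger a synthetic redirect; 750 is an internal marker
        return (synth(750));
    }
}

sub vcl_synth {
    if (resp.status == 750) {
        set resp.status = 301;
        set resp.http.Location = "https://" + req.http.Host + req.url;
        return (deliver);
    }
}
```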

Using vmod_proxy

If your TLS proxy communicates with Varnish over the PROXY protocol, you can leverage vmod_proxy to easily check whether or not TLS/SSL was used for the request.

vcl 4.1;

import proxy;

sub vcl_recv {
    if(proxy.is_ssl()) {
        set req.http.X-Forwarded-Proto = "https";
    } else {
        set req.http.X-Forwarded-Proto = "http";
    }
}

As you can see, it’s only a matter of checking proxy.is_ssl(), and you’re good to go.

Using vmod_tls

If you’re using a recent version of Varnish Enterprise, native TLS is supported. If you’ve enabled native TLS using the -A flag, there is no TLS proxy, and the PROXY protocol isn’t used.

In Varnish Enterprise there is vmod_tls to check TLS parameters when native TLS is used.

Here’s the vmod_tls equivalent of proxy.is_ssl():

vcl 4.1;

import tls;

sub vcl_recv {
    if(tls.is_ssl()) {
        set req.http.X-Forwarded-Proto = "https";
    } else {
        set req.http.X-Forwarded-Proto = "http";
    }
}

Instead of using proxy.is_ssl(), there’s tls.is_ssl() to figure out what protocol was used.

VCL cache variations

Cache variations were discussed in chapter 3. Using the Vary header, an origin server can instruct Varnish to create a cache variation for a specific request header. Vary: Accept-Language would create a variation per cached object based on the browser language.

Although it is a very powerful instrument, a lot of web applications don’t use it. If refactoring your application to include Vary is impossible or too hard, you can also create the variation in VCL.

Protocol cache variations

What better way to illustrate VCL cache variations than by grabbing the previous example and creating a cache variation on X-Forwarded-Proto:

vcl 4.1;

import std;

sub vcl_recv {
    set req.http.X-Forwarded-Port = std.port(server.ip);
    
    if(req.http.X-Forwarded-Port == "443") {
        set req.http.X-Forwarded-Proto = "https";
    } else {
        set req.http.X-Forwarded-Proto = "http";
    }
}

sub vcl_hash {
    hash_data(req.http.X-Forwarded-Proto);
}

What we’re basically doing is adding X-Forwarded-Proto to the hash using hash_data(). Because we’re not returning anything in vcl_hash, we’re falling back on the built-in VCL, which also adds the request URL and the host.
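For reference, this is what the built-in vcl_hash looks like:

```vcl
sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```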

Language cache variations

Let’s grab yet another example from this chapter to illustrate language cache variations. Remember the example where we sanitized the Accept-Language header? Let’s use this example to create a cache variation:

vcl 4.1;

import accept;

sub vcl_init {
    new lang = accept.rule("en");
    lang.add("nl");
}

sub vcl_recv {
    set req.http.accept-language = lang.filter(req.http.accept-language);
}

sub vcl_hash {
    hash_data(req.http.Accept-Language);
}

Because Accept-Language is sanitized, the number of values is limited, which reduces the number of cache variations. You can confidently vary on this header. And if you don’t, you can still use hash_data(req.http.Accept-Language) to do it in VCL.

However, the majority of multilingual websites use a language selection menu, or a splash page, instead of the Accept-Language header. The selected language is then stored in a cookie.

But we know that Varnish doesn’t cache when cookies are present because it implies stateful content. Varying on the Cookie header is also a bad idea, given the amount of tracking cookies that are injected.

But all is not lost! We can extract the value of the language cookie, and create a variation using VCL.

Imagine this being the language cookie:

Cookie: language=en

This is the VCL code you could use to vary on the language value:

vcl 4.1;

sub vcl_recv {
    if (req.http.Cookie) {
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(language)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.cookie ~ "^\s*$") {
            unset req.http.cookie;
        }

        return(hash);
    }
}

sub vcl_hash {
    if (req.http.Cookie ~ "^.*language=(nl|en|fr);*.*$") {
        hash_data(regsub(req.http.Cookie, "^.*language=(nl|en|fr);*.*$", "\1"));
    } else {
        hash_data("en");
    }
}

In the vcl_recv subroutine, we’re doing the typical find and replace magic where we delete all the cookies, except the ones that matter. In our case that’s the language cookie.

Instead of doing a return(pass) when there are still cookies left, we deliberately call return(hash), and consider the content cacheable.

In vcl_hash, we check whether or not the cookies have been set. If not, we add en as the default language cache variation. Otherwise, we just extract the value from the cookie using the regsub() function.

Because we explicitly defined the list of supported languages in the regular expression, we prevent too many variations from occurring.

Here’s the same example, but with vmod_cookie for those who are on Varnish Cache 6.4 or later:

vcl 4.1;

import cookie;

sub vcl_recv {
    cookie.parse(req.http.cookie);
    cookie.keep("language");
    set req.http.cookie = cookie.get_string();
    if (req.http.cookie ~ "^\s*$") {
        unset req.http.cookie;
    }
}

sub vcl_hash {
    if (cookie.get("language") ~ "^(nl|en|fr|de|es)$") {
        hash_data(cookie.get("language"));
    } else {
        hash_data("en");
    }
}

Using vmod_cookieplus

Here’s the vmod_cookieplus implementation for those who use Varnish Enterprise:

vcl 4.1;

import cookieplus;

sub vcl_recv {
    cookieplus.keep("language");
    cookieplus.write();   
}

sub vcl_hash {
    if (cookieplus.get("language") ~ "^(nl|en|fr|de|es)$") {
        hash_data(cookieplus.get("language"));
    } else {
        hash_data("en");
    }
}

Custom error messages

When a backend response fails, Varnish will return an error page that looks like this:

Error 503 Backend fetch failed
Backend fetch failed

Guru Meditation:
XID: 3

---

Varnish cache server

It looks a bit weird, and the guru meditation message doesn’t look that appealing.

The current built-in VCL implementation

These error messages and the layout of synthetic responses are part of the built-in VCL. Here’s the VCL code for vcl_backend_error, which runs in case of backend errors:

sub vcl_backend_error {
    set beresp.http.Content-Type = "text/html; charset=utf-8";
    set beresp.http.Retry-After = "5";
    set beresp.body = {"<!DOCTYPE html>
<html>
  <head>
    <title>"} + beresp.status + " " + beresp.reason + {"</title>
  </head>
  <body>
    <h1>Error "} + beresp.status + " " + beresp.reason + {"</h1>
    <p>"} + beresp.reason + {"</p>
    <h3>Guru Meditation:</h3>
    <p>XID: "} + bereq.xid + {"</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>
"};
    return (deliver);
}

Regular synthetic responses triggered from client-side VCL logic have a similar VCL implementation:

sub vcl_synth {
    set resp.http.Content-Type = "text/html; charset=utf-8";
    set resp.http.Retry-After = "5";
    set resp.body = {"<!DOCTYPE html>
<html>
  <head>
    <title>"} + resp.status + " " + resp.reason + {"</title>
  </head>
  <body>
    <h1>Error "} + resp.status + " " + resp.reason + {"</h1>
    <p>"} + resp.reason + {"</p>
    <h3>Guru Meditation:</h3>
    <p>XID: "} + req.xid + {"</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>
"};
    return (deliver);
}

Customize error messages using templates

To tackle the issue, you could modify the string that is assigned to beresp.body in vcl_backend_error, or resp.body in vcl_synth, but that can go wrong really quickly.

Not only can it become a copy-paste mess, but you also have to take variable interpolations into account.

The ideal solution is to load a template from a file, potentially replace some placeholder values, and inject the string value into the response body.

Here’s the VCL code:

vcl 4.1;

import std;

sub vcl_synth {
    set resp.http.Content-Type = "text/html; charset=utf-8";
    set resp.http.Retry-After = "5";
    set resp.body = regsuball(std.fileread("/etc/varnish/synth.html"),"<<REASON>>",resp.reason);
    return (deliver);
}

sub vcl_backend_error {
    set beresp.http.Content-Type = "text/html; charset=utf-8";
    set beresp.http.Retry-After = "5";
    set beresp.body = regsuball(std.fileread("/etc/varnish/synth.html"),"<<REASON>>",beresp.reason);
    return (deliver);
}

This example will use std.fileread() to load a file from disk and present it as a string. Using regsuball() we’re going to replace all occurrences of the <<REASON>> placeholder in that file with the actual reason phrase. This will be provided by either resp.reason or beresp.reason.

The cool thing about this implementation is that you can have your frontend developers compose and style this file to match the branding of the actual website. It can contain images, CSS, JavaScript, and all the other goodies, but it doesn’t fill up your VCL with very verbose content.
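For illustration, a minimal /etc/varnish/synth.html could look like this; the stylesheet path is hypothetical, and anything goes, as long as the <<REASON>> placeholder is present:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Sorry, something went wrong</title>
    <!-- Hypothetical stylesheet, served by the origin -->
    <link rel="stylesheet" href="/static/error.css">
  </head>
  <body>
    <h1>Sorry, something went wrong</h1>
    <!-- Replaced by resp.reason or beresp.reason -->
    <p><<REASON>></p>
  </body>
</html>
```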

Caching objects on the second miss

Directly inserting an object in cache when a cache miss occurs is the default Varnish behavior. It also makes sense: we want to avoid hitting the origin server, so caching something as soon as possible is the logical course of action.

However, when you have a limited cache size, inserting random objects may not be very efficient. For long-tail content, you risk filling up your cache with objects that will hardly be requested.

A solution could be to only insert an object in cache on the second miss. The following example leverages vmod_utils, which is a Varnish Enterprise VMOD.

vcl 4.1;

import utils;

sub vcl_backend_response {
    if (!utils.waitinglist() && utils.backend_misses() == 0) {
        set beresp.uncacheable = true;
        set beresp.ttl = 24h;
    }
}

When a cache miss occurs, and we fetch content from the origin, utils.backend_misses() will tell us whether or not a hit-for-miss has already occurred.

As long as this value is 0, we know that this resource has not been requested, and didn’t result in a hit-for-miss for the last 24 hours. In that case we will enable beresp.uncacheable and set the TTL to 24 hours.

This ensures that Varnish keeps track of that hit-for-miss for a full day. When the next request for that resource is received during that timeframe, we know it’s somewhat popular, and we can insert the response in cache.

Because of request coalescing, it is possible that other clients are requesting the same content at exactly the same time. These requests will be put on the waiting list while the first in line fetches the content. We cannot hit-for-miss when this happens, because that would cause request serialization. Luckily utils.waitinglist() gives us insight into the number of waiting requests for that resource.

The end result is that only hot content is cached, and our precious caching space is less likely to be wasted on long-tail content. Of course you can tune this behavior and choose to only cache on the third or fourth miss. That’s up to you.
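Caching on the third miss is then just a matter of comparing against a higher counter value. A sketch, under the same vmod_utils (Varnish Enterprise) assumption:

```vcl
vcl 4.1;

import utils;

sub vcl_backend_response {
    # Only insert the object once two hit-for-miss fetches have occurred
    if (!utils.waitinglist() && utils.backend_misses() < 2) {
        set beresp.uncacheable = true;
        set beresp.ttl = 24h;
    }
}
```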

Keep in mind that this doesn’t work with hit-for-pass: when you add return(pass) to this logic to trigger hit-for-pass, the decision not to cache will be remembered for 24 hours and cannot be undone by the next cacheable response. This defeats the purpose of this feature.

