In the previous section of the book, we took a deep dive into all the VCL variables. We also covered the built-in VCL and the Varnish finite state machine extensively.
In this section, we’ll cover some basic scenarios on how to make meaningful changes in your VCL.
You want to cache as much as possible, but in reality you can’t: resources that are stateful are often hard or impossible to cache.
When caching a stateful resource would result in too many variations, it’s not worth caching.
A very common pattern in VCL is to exclude URL patterns and do a
return(pass) when they are matched.
This one comes right out of the Magento 2 VCL file:
vcl 4.1;

sub vcl_recv {
    if (req.url ~ "/checkout") {
        return (pass);
    }
}
Because the /checkout URL namespace is a very personalized experience,
it’s not really cacheable: you’re dealing with logins and payment
details. You really have to pass here.
And here’s another example coming from WordPress:
vcl 4.1;

sub vcl_recv {
    if (req.url ~ "wp-(login|admin)" || req.url ~ "preview=true") {
        return (pass);
    }
}
If the URL contains /wp-login or /wp-admin, you’re trying to access the admin panel, which is not cacheable. The same is true when you’re previewing otherwise cacheable pages while logged in. As a result, pages containing preview=true in the URL won’t be cached either.
Notice that only slight modifications were required to achieve our goal. Because we only return(pass) for specific patterns, the rest of the application can still rely on the built-in VCL. As always, the built-in VCL is your safety net.
Cache objects are identified by the URL. The URL is not just the identifier of the resource; it can also contain query string parameters. But in terms of hashing, Varnish treats the URL as a string.
This means that the slightest change in any of the query string parameters will result in a new hash, which in its turn results in a cache miss.
There are some strategies where the URL is sanitized in order to avoid too many cache misses.
Here’s some VCL to sanitize your URL:
vcl 4.1;

import std;

sub vcl_recv {
    # Sort the query string parameters alphabetically
    set req.url = std.querysort(req.url);

    # Remove third-party tracking parameters
    if (req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|utm_content)=") {
        set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|utm_content)=([a-zA-Z0-9_\-\.%]+)", "");
        set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|utm_content)=([a-zA-Z0-9_\-\.%]+)", "?");
        set req.url = regsub(req.url, "\?&", "?");
        set req.url = regsub(req.url, "\?$", "");
    }

    # Remove hashes from the URL
    if (req.url ~ "\#") {
        set req.url = regsub(req.url, "\#.*$", "");
    }

    # Strip off trailing question marks
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }
}
The first step is to sort the query string parameters alphabetically. If you change the order of a query string parameter, you change the string, which results in a cache miss.
The std.querysort function from vmod_std does this for you. It’s a
simple modification that can have a massive impact.
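To make that concrete, here’s what std.querysort() does to a hypothetical URL:

```vcl
# Before sorting:        /search?q=shoes&utm_source=mail&page=2
# After std.querysort(): /search?page=2&q=shoes&utm_source=mail
```

Both forms now hash to the same string, regardless of the order in which the client sent the parameters.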
Marketing people are keen to figure out how their campaigns are performing. Google Analytics can add campaign context to the URL by adding tracking query string parameters.
Here’s a list of these parameters:
utm_source
utm_medium
utm_campaign
utm_content

In the example above we’re stripping them off because they are meaningless to the server, and they mess with our hit rate. Because these parameters are processed client-side, removing them server-side has no negative impact.
The regsub() and regsuball() functions in the example above strip
off unwanted tracking query string parameters using regular
expressions.
In HTML, we can mark page sections using anchors, as illustrated below:
<a name="my-section"></a>
You can directly scroll to this section by adding a hash to the URL. Here’s how that looks:
http://example.com/#my-section
We’ve said it 100 times at least, and we’ll have to repeat it again: changing the URL changes the lookup hash for the cache. These URL hashes are also meaningless in a server-side context and also mess with our hit rate.
Your best move is to strip them off. The set req.url = regsub(req.url, "\#.*$", ""); statement does this for you.
In the same vein as the previous example, we want to avoid cache misses by stripping off trailing question marks.
The ? in a URL indicates the start of the query string parameters.
But if the question mark is at the end of the URL, there aren’t any
parameters, so we need to strip off the ?. This is done by
set req.url = regsub(req.url, "\?$", "");
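Putting the whole sanitization routine together, here’s the effect on two hypothetical URLs:

```vcl
# Before: /products?utm_source=newsletter&b=2&a=1
# After:  /products?a=1&b=2
#
# Before: /faq?#pricing
# After:  /faq
```

In the first case the parameters are sorted and the tracking parameter is stripped; in the second case the hash and the trailing question mark are removed.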
Cookies are indicators of state. And stateful content should not be cached unless the variations are manageable.
But a lot of cookies are there to personalize the experience. They keep track of session identifiers, and there are also tracking cookies that change upon every request.
There are two approaches to get rid of them. The first one removes a list of unwanted cookies and keeps the rest:
vcl 4.1;

sub vcl_recv {
    # Some generic cookie manipulation, useful for all templates that follow
    # Remove the "has_js" cookie
    set req.http.Cookie = regsuball(req.http.Cookie, "has_js=[^;]+(; )?", "");

    # Remove any Google Analytics based cookies
    set req.http.Cookie = regsuball(req.http.Cookie, "__utm.=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "_ga=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "_gat=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "utmctr=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "utmcmd.=[^;]+(; )?", "");
    set req.http.Cookie = regsuball(req.http.Cookie, "utmccn.=[^;]+(; )?", "");

    # Remove DoubleClick offensive cookies
    set req.http.Cookie = regsuball(req.http.Cookie, "__gads=[^;]+(; )?", "");

    # Remove the Quant Capital cookies (added by some plugin, all __qca)
    set req.http.Cookie = regsuball(req.http.Cookie, "__qc.=[^;]+(; )?", "");

    # Remove the AddThis cookies
    set req.http.Cookie = regsuball(req.http.Cookie, "__atuv.=[^;]+(; )?", "");

    # Remove a ";" prefix in the cookie if present
    set req.http.Cookie = regsuball(req.http.Cookie, "^;\s*", "");

    # Are there cookies left with only spaces or that are empty?
    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }
}
This VCL snippet will identify every single cookie pattern that needs to be removed. It ranges from Google Analytics tracking cookies, to DoubleClick, all the way to AddThis.
Every cookie that matches is removed. If you end up with nothing more
than a set of whitespace characters, this means there weren’t any
cookies left, and we remove the entire Cookie header.
Cookies that weren’t removed remain in the Cookie header, and the request falls through to the built-in VCL, which performs a return(pass). This is not really a problem: it’s by design.
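For reference, this is the fragment of the built-in VCL that triggers that pass (the exact layout varies slightly between Varnish versions):

```vcl
sub vcl_recv {
    # Built-in VCL: requests that carry state are not cacheable
    if (req.http.Authorization || req.http.Cookie) {
        return (pass);
    }
}
```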
The opposite is actually a lot easier: only keep a couple of cookies, and remove the rest.
Here’s an example that does that:
vcl 4.1;

sub vcl_recv {
    if (req.http.Cookie) {
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(PHPSESSID)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
Imagine having a PHP web application that has an admin panel. When
you create a session in PHP, the PHPSESSID cookie is used by
default. This is the only cookie that matters server-side in our
application.
When this cookie is set, you’re logged in, and the page can no longer be
cached. This example looks quite complicated, but it just sets up a
cookie format where PHPSESSID can easily be identified, and other
cookies are replaced with an empty string.
And again: if you end up with a collection of whitespace characters, you can just remove that cookie.
If you’re on Varnish Cache 6.4 or later, vmod_cookie is shipped by
default.
Here’s the first example, where we explicitly remove cookies using
vmod_cookie:
vcl 4.1;

import cookie;

sub vcl_recv {
    cookie.parse(req.http.Cookie);
    cookie.filter("_ga,_gat,utmctr,__gads,has_js");
    cookie.filter_re("(__utm.|utmcmd.|utmccn.|__qc.|__atuv.)");
    set req.http.Cookie = cookie.get_string();

    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }
}
Here’s the second example, where we only keep the PHPSESSID cookie
using vmod_cookie:
vcl 4.1;

import cookie;

sub vcl_recv {
    cookie.parse(req.http.Cookie);
    cookie.keep("PHPSESSID");
    set req.http.Cookie = cookie.get_string();

    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }
}
You have to admit, this is a lot simpler. There are still regular expressions involved, but only to match cookie names. The complicated logic to match names, values, and separators is completely abstracted.
If you’re not on Varnish Cache 6.4, you can still benefit from another
cookie VMOD: Varnish Enterprise ships vmod_cookieplus. This is an
Enterprise version of vmod_cookie that has more features and a
slightly different API.
Here’s the first example, where we explicitly remove cookies using
vmod_cookieplus:
vcl 4.1;

import cookieplus;

sub vcl_recv {
    cookieplus.delete("_ga");
    cookieplus.delete("_gat");
    cookieplus.delete("utmctr");
    cookieplus.delete("__gads");
    cookieplus.delete("has_js");
    cookieplus.delete_regex("(__utm.|utmcmd.|utmccn.|__qc.|__atuv.)");
    cookieplus.write();
}
And here’s how we only keep the PHPSESSID cookie using
vmod_cookieplus:
vcl 4.1;

import cookieplus;

sub vcl_recv {
    cookieplus.keep("PHPSESSID");
    cookieplus.write();
}
As you can see, vmod_cookieplus doesn’t need to be initialized, the Cookie header doesn’t need to be parsed in advance, and although there is a cookieplus.write() function, you don’t have to assign the value back to req.http.Cookie yourself.
A final note about vmod_cookieplus is that the deletion process
doesn’t leave you with an empty Cookie header, unlike vmod_cookie.
If the cookie is empty in the end, it is stripped off automatically.
We already covered this in chapter 3, but sanitizing your content negotiation headers is important, especially if you’re planning on varying on them.
By content negotiation headers we mean Accept and Accept-Language.
There’s also the Accept-Encoding header, but Varnish handles this
one out of the box.
The Accept request header defines what content types the client
supports. This could be text/plain, text/html, or even
application/json.
The Accept-Language request header defines what languages the client
understands. This is an ideal way to serve multilingual content with
explicit language selection.
The problem with these headers is that they can have many variations. If you were to issue Vary: Accept-Language, your hit rate might drop massively.
It’s not only the vast number of languages that cause this, but also the order, the priority and the localization of these languages.
You probably have a pretty good idea which languages your web platform supports. Just allow them and rely on a default value when the client’s preferred language is not supported.
Here’s the example we used in chapter 3:
vcl 4.1;

import accept;

sub vcl_init {
    new lang = accept.rule("en");
    lang.add("nl");
}

sub vcl_recv {
    set req.http.Accept-Language = lang.filter(req.http.Accept-Language);
}
This is the Accept-Language header in my browser:
Accept-Language: nl-BE,nl;q=0.9,en-US;q=0.8,en;q=0.7
These settings are personalized, and your browser settings will undoubtedly differ. Without a proper cleanup, it is impossible to get a decent hit rate when you vary on this header.
My VCL script will pick nl as the selected language. If nl is
nowhere to be found in the Accept-Language header, en will be the
fallback.
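In other words, the filter collapses the header to a single supported value. With the rule set above, this is the behavior you can expect:

```vcl
# Input:  Accept-Language: nl-BE,nl;q=0.9,en-US;q=0.8,en;q=0.7
# Output: Accept-Language: nl
#
# Input:  Accept-Language: de-DE,de;q=0.9
# Output: Accept-Language: en (fallback: de is not in the rule set)
```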
vmod_accept also works for the Accept header. Here’s an example:
vcl 4.1;

import accept;

sub vcl_init {
    new format = accept.rule("text/plain");
    format.add("text/html");
    format.add("application/json");
}

sub vcl_recv {
    set req.http.Accept = format.filter(req.http.Accept);
}
In this example we support content that is HTML or JSON. Anything else will result in the text/plain MIME type, which just means the document is returned as plain text without being parsed.
By sanitizing your content negotiation headers, you limit the
variations per header, and you can safely issue a Vary: Accept, or a
Vary: Accept-Language in your web application.
Developer empowerment is a term we use a lot when we talk about caching. In chapter 3, we covered it in great detail: HTTP has so many built-in mechanisms to improve the cacheability of your web application. If you use the right headers, you’re in control.
However, in the real world, Cache-Control and Expires headers aren’t
always used. And quite often you’ll find
Cache-Control: private, no-cache, no-store headers on a perfectly
cacheable page.
Refactoring your code and implementing the proper HTTP headers is a good idea. But every now and then, you’ll run into a legacy application that you wouldn’t want to touch with a stick: “it works, but don’t ask how”.
That’s where VCL comes into play. The beresp.ttl value is determined
by the value of Cache-Control or Expires. But you can override the
value if required.
The following example will identify images and videos based on the
Content-Type header. For those resources we set the TTL to one
year because it’s static data, and it’s not supposed to change.
And if the Cache-Control header contains no-cache, no-store, or
private, we strip off the Cache-Control header. Otherwise, the
built-in VCL would turn this into a hit-for-miss:
vcl 4.1;

sub vcl_backend_response {
    if (beresp.http.Content-Type ~ "^(image|video)/") {
        if (beresp.http.Cache-Control ~ "(?i:no-cache|no-store|private)") {
            unset beresp.http.Cache-Control;
        }
        set beresp.ttl = 1y;
    }
}
Varnish’s default TTL is defined by the default_ttl runtime
parameter. By default this is 120 seconds.
If you change the value of the default_ttl parameter, Varnish will
use that value if the HTTP response doesn’t contain a TTL.
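For example, the parameter can be changed at runtime through the varnishadm CLI (assuming a running Varnish instance with the default admin socket), or at startup with the -p option of varnishd, combined with your other startup options:

```vcl
# At runtime:
varnishadm param.set default_ttl 300

# At startup:
varnishd -p default_ttl=300
```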
You can also do it in VCL:
vcl 4.1;

sub vcl_backend_response {
    # Careful: unlike default_ttl, this overrides the TTL of every
    # response, even when the origin sends Cache-Control or Expires
    set beresp.ttl = 1h;
}
The lifetime of an object is defined by its TTL. If the TTL is zero, the object is stale. If grace and keep values are set, the TTL can even be less than zero.
An instinctive reaction is to set beresp.ttl = 0s if you want to make
sure an object is not stored in cache. However, you’re doing more harm
than good.
The built-in VCL has a mechanism in place to deal with uncacheable content:
set beresp.ttl = 120s;
set beresp.uncacheable = true;
By setting beresp.uncacheable = true, we’re deciding to cache the
decision not to cache, as explained earlier in the book. We call this
hit-for-miss and hit-for-pass, and these objects are kept for two
minutes.
This metadata is used to bypass the waiting list, as we explained in the under the hood section in chapter 1.
By setting beresp.ttl = 0s, you lose the metadata, requests for this
resource are put on the waiting list, and request coalescing will not
satisfy the request.
The end result is serialization, which means that these items on the waiting list are processed serially rather than in parallel. The impact of serialization is increased latency for the clients.
We said it before, and we’ll say it again: zero TTLs are evil.
Websockets are a mechanism that offers full-duplex communication over a single TCP connection. Websockets are used for real-time bi-directional communication between a client and a server without the typical request-response exchange.
Websockets are initiated via HTTP, but the Connection: Upgrade and
Upgrade: websocket headers will trigger a protocol upgrade. This
protocol upgrade results in a persisted open connection between client
and server, where another protocol is used for communication over the
TCP connection.
Here’s an example request:
GET /chat
Host: example.com
Origin: https://example.com
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Key: Iv8io/9s+lYFgZWcXczP8Q==
Sec-WebSocket-Version: 13
And this could be the server’s response:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: hsBlbuDTkk24srzEOTBUlZAlC2g=
And as soon as the protocol has been switched, we’re no longer communicating over HTTP.
If you remember the Varnish finite state machine, and the various
return statements, then you’ll probably agree that return(pipe) is
the way to go here.
The vcl_pipe subroutine is used to deal with traffic that couldn’t be
identified as HTTP. The built-in VCL uses it when Varnish notices
an unsupported request method. The pipe we refer to is the TCP
connection between Varnish and the backend. When a return(pipe) is
executed, the raw bytes are shuffled over the wire, without interpreting
anything as HTTP.
Here’s how you detect websockets in VCL, and how you successfully pipe the request to the backend without the loss of the connection upgrade headers:
vcl 4.1;

sub vcl_recv {
    if (req.http.upgrade ~ "(?i)websocket") {
        return (pipe);
    }
}

sub vcl_pipe {
    if (req.http.upgrade) {
        set bereq.http.upgrade = req.http.upgrade;
        set bereq.http.connection = req.http.connection;
    }
}
Edge Side Includes are a powerful hole-punching technique to dissect web pages into separate blocks that are processed as individual HTTP requests.
The ESI tag is a placeholder that is interpreted by Varnish and is replaced by the resource it refers to.
We already talked about this, but as a reminder, this is what an ESI tag looks like:
<esi:include src="/header" />
Varnish can interpret these tags, but this needs to be triggered
through set beresp.do_esi = true. Because this is more computationally
intensive, you don’t want to keep this turned on all the time.
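To make this concrete, here’s a hypothetical HTML page that uses ESI placeholders; Varnish fetches /header and /footer as separate subrequests and injects the results in place of the tags:

```vcl
<!DOCTYPE html>
<html>
    <body>
        <esi:include src="/header" />
        <main>Long-lived article content</main>
        <esi:include src="/footer" />
    </body>
</html>
```

Each block can have its own TTL, so a personalized header doesn’t prevent the rest of the page from being cached.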
In a lot of cases, people will match the URLs where ESI parsing is required, which might look like this:
vcl 4.1;

sub vcl_backend_response {
    if (bereq.url == "/" || bereq.url ~ "^/articles") {
        set beresp.do_esi = true;
    }
}
Unfortunately, this doesn’t offer you a lot of flexibility: whenever changes in the origin application occur, the VCL file needs to be modified. From a developer empowerment point of view, this is a poor implementation.
Another approach is to make an assumption about what kind of content would require ESI parsing.
The example below looks at the Content-Type, and assumes that all
HTML pages are ESI parsing candidates. So if
Content-Type: text/html is set, ESI parsing is enabled:
vcl 4.1;

sub vcl_backend_response {
    if (beresp.http.Content-Type ~ "text/html") {
        set beresp.do_esi = true;
    }
}
But again, this results in far too many non-ESI pages being processed.
The preferred solution takes us all the way back to chapter 3, where
we talked about the capabilities of HTTP. The surrogate headers
enable the capability that is most relevant to this use case: by
leveraging the Surrogate-Capability, and the Surrogate-Control
headers, you can negotiate about behavior on the edge.
Varnish can announce ESI support through the following request header:
Surrogate-Capability: varnish="ESI/1.0"
When the origin has detected ESI support on the edge, it can leverage this and request ESI parsing through the following response header:
Surrogate-Control: content="ESI/1.0"
There is in fact a handshake that takes place to negotiate ESI parsing. Here is the VCL required to support this:
vcl 4.1;

sub vcl_recv {
    set req.http.Surrogate-Capability = "varnish=ESI/1.0";
}

sub vcl_backend_response {
    if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
        unset beresp.http.Surrogate-Control;
        set beresp.do_esi = true;
    }
}
And this is a conventional solution that only consumes CPU cycles to parse ESI when it’s absolutely necessary.
Varnish Cache doesn’t support native TLS; Varnish Enterprise does. However, the most common way to support TLS in Varnish is by terminating it using a TLS proxy. We’ll discuss this in-depth in the TLS section of chapter 7.
But for now, it is important to know that Varnish usually only processes plain HTTP. But thanks to the PROXY protocol, Varnish has more information about the original connection that was made.
Protocol detection and protocol awareness are important for the origin, because it uses this information to build the right URL schemes. If http:// is used as the scheme instead of https://, this might lead to mixed content, which is problematic from a browser point of view.
If you use a TLS proxy with PROXY protocol support, and connect it to Varnish using a listening socket that supports PROXY, VCL will use the connection metadata to populate the endpoint variables we discussed earlier in this chapter.
The following example uses the std.port(server.ip) expression to
retrieve the server port. Because Varnish only does HTTP, this is
not always 80. If Varnish receives a connection via the PROXY
protocol, the value might be 443 if a TLS proxy terminated the
connection:
vcl 4.1;

import std;

sub vcl_recv {
    set req.http.X-Forwarded-Port = std.port(server.ip);

    if (req.http.X-Forwarded-Port == "443") {
        set req.http.X-Forwarded-Proto = "https";
    } else {
        set req.http.X-Forwarded-Proto = "http";
    }
}
The result of this VCL snippet is the X-Forwarded-Proto header being sent to the origin. This is a conventional header that contains either http or https. It’s up to the origin to interpret this header and act accordingly. The value can be used to force HTTPS redirection, but also to create the right URLs in hypermedia resources.
If your TLS proxy communicates with Varnish over the PROXY
protocol, you can leverage vmod_proxy to easily check whether or not
TLS/SSL was used for the request.
vcl 4.1;

import proxy;

sub vcl_recv {
    if (proxy.is_ssl()) {
        set req.http.X-Forwarded-Proto = "https";
    } else {
        set req.http.X-Forwarded-Proto = "http";
    }
}
As you can see, it’s only a matter of checking proxy.is_ssl(), and
you’re good to go.
If you’re using a recent version of Varnish Enterprise, native TLS is supported. If you’ve enabled native TLS using the -A flag, there is no TLS proxy, and the PROXY protocol isn’t used.
In Varnish Enterprise there is vmod_tls to check TLS parameters
when native TLS is used.
Here’s the vmod_tls equivalent of proxy.is_ssl():
vcl 4.1;

import tls;

sub vcl_recv {
    if (tls.is_ssl()) {
        set req.http.X-Forwarded-Proto = "https";
    } else {
        set req.http.X-Forwarded-Proto = "http";
    }
}
Instead of using proxy.is_ssl(), there’s tls.is_ssl() to figure out
what protocol was used.
Cache variations were discussed in chapter 3. Using the Vary
header, an origin server can instruct Varnish to create a cache
variation for a specific request header. Vary: Accept-Language would
create a variation per cached object based on the browser language.
Although it is a very powerful instrument, a lot of web applications
don’t use it. If refactoring your application to include Vary is
impossible or too hard, you can also create the variation in VCL.
What better way to illustrate VCL cache variations than by grabbing
the previous example and creating a cache variation on
X-Forwarded-Proto:
vcl 4.1;

import std;

sub vcl_recv {
    set req.http.X-Forwarded-Port = std.port(server.ip);

    if (req.http.X-Forwarded-Port == "443") {
        set req.http.X-Forwarded-Proto = "https";
    } else {
        set req.http.X-Forwarded-Proto = "http";
    }
}

sub vcl_hash {
    hash_data(req.http.X-Forwarded-Proto);
}
What we’re basically doing is adding X-Forwarded-Proto to the hash
using hash_data(). Because we’re not returning anything in vcl_hash,
we’re falling back on the built-in VCL, which also adds the request
URL and the host.
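For reference, this is what the built-in vcl_hash looks like:

```vcl
sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```

Our hash_data(req.http.X-Forwarded-Proto) call runs first, and then the built-in logic adds the URL and the host on top of it.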
Let’s grab yet another example from this chapter to illustrate language
cache variations. Remember the example where we sanitized the
Accept-Language header? Let’s use this example to create a cache
variation:
vcl 4.1;

import accept;

sub vcl_init {
    new lang = accept.rule("en");
    lang.add("nl");
}

sub vcl_recv {
    set req.http.Accept-Language = lang.filter(req.http.Accept-Language);
}

sub vcl_hash {
    hash_data(req.http.Accept-Language);
}
Because Accept-Language is sanitized, the number of possible values is limited, which reduces the number of cache variations. You can confidently vary on this header. And if you don’t, you can still use hash_data(req.http.Accept-Language) to do it in VCL.
However, the majority of multilingual websites use a language selection
menu, or a splash page, instead of the Accept-Language header. The
selected language is then stored in a cookie.
But we know that Varnish doesn’t cache when cookies are present
because it implies stateful content. Varying on the Cookie header is
also a bad idea, given the amount of tracking cookies that are injected.
But all is not lost! We can extract the value of the language cookie, and create a variation using VCL.
Imagine this being the language cookie:
Cookie: language=en
This is the VCL code you could use to vary on the language value:
vcl 4.1;

sub vcl_recv {
    if (req.http.Cookie) {
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(language)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }

        return (hash);
    }
}

sub vcl_hash {
    if (req.http.Cookie ~ "^.*language=(nl|en|fr);*.*$") {
        hash_data(regsub(req.http.Cookie, "^.*language=(nl|en|fr);*.*$", "\1"));
    } else {
        hash_data("en");
    }
}
In the vcl_recv subroutine, we’re doing the typical find and replace
magic where we delete all the cookies, except the ones that matter. In
our case that’s the language cookie.
Instead of doing a return(pass) when there are still cookies left, we
deliberately call return(hash), and consider the content cacheable.
In vcl_hash, we check whether the language cookie has been set. If not, we add en as the default language cache variation. Otherwise, we extract the value from the cookie using the regsub() function.
Because we explicitly defined the list of supported languages in the regular expression, we avoid creating too many variations.
Here’s the same example, but with vmod_cookie for those who are on
Varnish Cache 6.4 or later:
vcl 4.1;

import cookie;

sub vcl_recv {
    cookie.parse(req.http.Cookie);
    cookie.keep("language");
    set req.http.Cookie = cookie.get_string();

    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }
}

sub vcl_hash {
    if (cookie.get("language") ~ "^(nl|en|fr|de|es)$") {
        hash_data(cookie.get("language"));
    } else {
        hash_data("en");
    }
}
Here’s the vmod_cookieplus implementation for those who use Varnish
Enterprise:
vcl 4.1;

import cookieplus;

sub vcl_recv {
    cookieplus.keep("language");
    cookieplus.write();
}

sub vcl_hash {
    if (cookieplus.get("language") ~ "^(nl|en|fr|de|es)$") {
        hash_data(cookieplus.get("language"));
    } else {
        hash_data("en");
    }
}
When a backend response fails, Varnish will return an error page that looks like this:
Error 503 Backend fetch failed
Backend fetch failed
Guru Meditation:
XID: 3
---
Varnish cache server
It looks a bit weird, and the guru meditation message doesn’t look that appealing.
These error messages, and the layout for synthetic responses are part
of the built-in VCL. Here’s the VCL code for vcl_backend_error, in
case of errors:
sub vcl_backend_error {
    set beresp.http.Content-Type = "text/html; charset=utf-8";
    set beresp.http.Retry-After = "5";
    set beresp.body = {"<!DOCTYPE html>
<html>
  <head>
    <title>"} + beresp.status + " " + beresp.reason + {"</title>
  </head>
  <body>
    <h1>Error "} + beresp.status + " " + beresp.reason + {"</h1>
    <p>"} + beresp.reason + {"</p>
    <h3>Guru Meditation:</h3>
    <p>XID: "} + bereq.xid + {"</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>
"};
    return (deliver);
}
Regular synthetic responses triggered from client-side VCL logic have a similar VCL implementation:
sub vcl_synth {
    set resp.http.Content-Type = "text/html; charset=utf-8";
    set resp.http.Retry-After = "5";
    set resp.body = {"<!DOCTYPE html>
<html>
  <head>
    <title>"} + resp.status + " " + resp.reason + {"</title>
  </head>
  <body>
    <h1>Error "} + resp.status + " " + resp.reason + {"</h1>
    <p>"} + resp.reason + {"</p>
    <h3>Guru Meditation:</h3>
    <p>XID: "} + req.xid + {"</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>
"};
    return (deliver);
}
To tackle the issue, you could modify the string that is assigned to beresp.body in vcl_backend_error, or resp.body in vcl_synth, but that can go wrong really quickly.
Not only can it become a copy-paste mess, but you also have to take variable interpolations into account.
The ideal solution is to load a template from a file, potentially replace some placeholder values, and inject the string value into the response body.
Here’s the VCL code:
vcl 4.1;

import std;

sub vcl_synth {
    set resp.http.Content-Type = "text/html; charset=utf-8";
    set resp.http.Retry-After = "5";
    set resp.body = regsuball(std.fileread("/etc/varnish/synth.html"), "<<REASON>>", resp.reason);
    return (deliver);
}

sub vcl_backend_error {
    set beresp.http.Content-Type = "text/html; charset=utf-8";
    set beresp.http.Retry-After = "5";
    set beresp.body = regsuball(std.fileread("/etc/varnish/synth.html"), "<<REASON>>", beresp.reason);
    return (deliver);
}
This example will use std.fileread() to load a file from disk and
present it as a string. Using regsuball() we’re going to replace all
occurrences of the <<REASON>> placeholder in that file with the actual
reason phrase. This will be provided by either resp.reason or
beresp.reason.
The cool thing about this implementation is that you can have your frontend developers compose and style this file to match the branding of the actual website. It can contain images, CSS, JavaScript, and all the other goodies, but it doesn’t fill up your VCL with very verbose content.
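A minimal synth.html could look like the sketch below. The <<REASON>> placeholder is the one our VCL replaces; everything else, including the hypothetical stylesheet path, is up to your frontend team:

```vcl
<!DOCTYPE html>
<html>
    <head>
        <title>Sorry, something went wrong</title>
        <link rel="stylesheet" href="/static/error.css">
    </head>
    <body>
        <h1>Sorry, something went wrong</h1>
        <p><<REASON>></p>
    </body>
</html>
```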
Directly inserting an object in cache when a cache miss occurs is the default Varnish behavior. It also makes sense: we want to avoid hitting the origin server, so caching something as soon as possible is the logical course of action.
However, when you have a limited cache size, inserting random objects may not be very efficient. For long-tail content, you risk filling up your cache with objects that will hardly be requested.
A solution could be to only insert an object in cache on the second
miss. The following example leverages vmod_utils, which is a Varnish
Enterprise VMOD.
vcl 4.1;

import utils;

sub vcl_backend_response {
    if (!utils.waitinglist() && utils.backend_misses() == 0) {
        set beresp.uncacheable = true;
        set beresp.ttl = 24h;
    }
}
When a cache miss occurs, and we fetch content from the origin,
utils.backend_misses() will tell us whether or not a hit-for-miss
has already occurred.
As long as this value is 0, we know that this resource has not been
requested, and didn’t result in a hit-for-miss for the last 24 hours.
In that case we will enable beresp.uncacheable and set the TTL to
24 hours.
This ensures that Varnish keeps track of that hit-for-miss for a full day. When the next request for that resource is received during that timeframe, we know it’s somewhat popular, and we can insert the response in cache.
Because of request coalescing, it is possible that other clients are
requesting the same content at exactly the same time. These requests
will be put on the waiting list while the first in line fetches the
content. We cannot hit-for-miss when this happens, because that would
cause request serialization. Luckily utils.waitinglist() gives us
insight into the number of waiting requests for that resource.
The end result is that only hot content is cached, and our precious caching space is less likely to be wasted on long-tail content. Of course you can tune this behavior and choose to only cache on the third or fourth miss. That’s up to you.
Keep in mind that this doesn’t work with hit-for-pass: when you add
return(pass) to this logic to trigger hit-for-pass, the decision not
to cache will be remembered for 24 hours and cannot be undone by the
next cacheable response. This defeats the purpose of this feature.