Referring to a point we’ve already made a couple of times:
Most cache invalidation strategies are based on the URL of a request. This only works if the content in your application can easily be mapped to one or more URLs.
Sometimes a content change impacts many URLs, and sometimes it is impossible to know which URLs will need to be invalidated. Under those circumstances, banning and purging don’t work.
We already hinted at tag-based invalidation in the previous section of the book.
Instead of identifying objects in the cache based on their request URL, you can use arbitrary tags to identify objects. When you invalidate a tag, all objects that carry this tag are purged from the cache at once.
For request-based invalidation, we go through the typical lookup
logic that is triggered in the vcl_hash subroutine: we take the URL
and the Host header, and turn this into a hash key. This key can be
considered the primary key.
But if we start using other identifiers to match objects, such as tags, we can say that there’s a secondary key involved. Hence, the name of the section.
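To make the idea of a primary key concrete, the following sketch mirrors the lookup logic of the built-in VCL: the URL and the Host header (or the server IP when no Host header is present) are fed into the hash. The backend address is a placeholder assumption that is only there to make the VCL file complete.

```vcl
vcl 4.1;

# Placeholder backend, assumed for the sake of a complete VCL file.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_hash {
    # The URL is the first part of the primary key.
    hash_data(req.url);

    # The Host header is the second part; fall back to the server IP.
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```

Secondary keys such as tags are not part of this hash. They don't influence which object a request matches; they only offer an extra way to find objects when invalidating.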
Although the ban("obj.http.tags ~ " + req.http.x-ban-tag) example that
we saw earlier works, it is not really built for the job.
Varnish has two VMODs that store secondary keys for objects, which allow these objects to be purged based on these secondary keys:
vmod_xkey is an open source VMOD that is part of the Varnish Software VMOD collection.
vmod_ykey is the successor of vmod_xkey. It is only available in Varnish Enterprise.
Let’s talk about those VMODs for a minute.
vmod_xkey is part of the Varnish Software VMOD collection. It is
open source, and its API can be found at
https://github.com/varnish/varnish-modules/blob/master/src/vmod_xkey.vcc.
The API for this VMOD is pretty simple. There are only two functions:
xkey.purge()
xkey.softpurge()
Both functions take a string as an argument. This string refers to the key that needs to be purged. This string may contain an individual key or a space-separated list of keys.
The initialization of vmod_xkey happens automatically. As soon as
import xkey; is part of your VCL file, vmod_xkey will be
bootstrapped, and any new objects that are inserted in cache will be
analyzed.
Xkey will look for the xkey or the X-HashTwo response headers
and will register the tags that are exposed through these headers.
As mentioned, vmod_xkey will look for the xkey or the X-HashTwo
response headers. Xkey headers are normally added by the backend
application, but you can also add them in the vcl_backend_response
subroutine. Multiple keys in one header line are separated by spaces or
commas.
So if you want the keys category_sports, id_1265778, type_article
for a page on a news website, the response would look like this:
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Cache-Control: public, s-maxage=60
Xkey: category_sports id_1265778 type_article
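If the backend application cannot emit these headers itself, a minimal sketch of adding a key in vcl_backend_response could look like this. The type_image key and the backend address are hypothetical and only serve as an illustration.

```vcl
vcl 4.1;

import xkey;

# Placeholder backend, assumed for the sake of a complete VCL file.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    if (beresp.http.Content-Type ~ "^image/") {
        if (beresp.http.xkey) {
            # Keep the keys from the backend and append our own,
            # using a space as the separator.
            set beresp.http.xkey = beresp.http.xkey + " type_image";
        } else {
            # No keys from the backend: set the header ourselves.
            set beresp.http.xkey = "type_image";
        }
    }
}
```

Because the header is set before the object is inserted into the cache, the extra key should be picked up in the same way as keys that come from the backend.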
Once vmod_xkey is imported, and xkey headers of objects are
processed, we have a collection of secondary keys that can be used for
invalidation.
If for some reason all articles from the sports category need to be
purged from cache, it’s just a matter of purging the category_sports
key.
For the implementation of the vmod_xkey invalidation logic, we can
revisit the tag-based invalidation example that used bans. It’s just
a matter of swapping out the ban() function with the corresponding
xkey.purge() function, and some cosmetic changes:
vcl 4.1;

import xkey;
import std;

# Clients that are allowed to send PURGE requests
acl purge {
    "localhost";
    "192.168.55.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        # Reject clients that are not part of the purge ACL
        if (!client.ip ~ purge) {
            return(synth(405));
        }
        # The keys to purge are passed in the X-Xkey-Purge request header
        if (!req.http.x-xkey-purge) {
            return(synth(400, "x-xkey-purge header missing"));
        }
        # xkey.purge() returns the number of purged objects
        set req.http.x-purges = xkey.purge(req.http.x-xkey-purge);
        if (std.integer(req.http.x-purges, 0) != 0) {
            return(synth(200, req.http.x-purges + " objects purged"));
        } else {
            return(synth(404, "Key not found"));
        }
    }
}
If we look back at the example of our news website, we can use the following HTTP request to invalidate all news articles from the sports category:
PURGE / HTTP/1.1
Host: example.com
X-Xkey-Purge: category_sports
If no matching keys were found, you’ll get an HTTP 404 response; otherwise you’ll get a regular HTTP 200 response containing the number of purged objects.
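The API also offers xkey.softpurge(). As a minimal sketch, assuming the same X-Xkey-Purge request header and a placeholder backend address, swapping in the soft variant could look like this. A soft purge sets the TTL of the matching objects to zero but leaves grace and keep untouched, so stale content can still be served while a fresh copy is fetched.

```vcl
vcl 4.1;

import xkey;
import std;

# Placeholder backend, assumed for the sake of a complete VCL file.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

acl purge {
    "localhost";
    "192.168.55.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
            return (synth(405));
        }
        if (!req.http.x-xkey-purge) {
            return (synth(400, "x-xkey-purge header missing"));
        }
        # Soft purge: expire the objects, but keep them around for grace.
        set req.http.x-purges = xkey.softpurge(req.http.x-xkey-purge);
        if (std.integer(req.http.x-purges, 0) != 0) {
            return (synth(200, req.http.x-purges + " objects soft purged"));
        }
        return (synth(404, "Key not found"));
    }
}
```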
We started this chapter talking about purging. It’s simple, it’s effective, but it’s not really flexible. Then we introduced banning, which seemed like the perfect alternative, but the flexibility comes at a cost.
So here we are, talking about vmod_xkey as a more powerful alternative
for tag-based invalidation. The cost of invalidating many objects with
many keys is a lot lower than for bans.
But there is still a cost, some limitations, and some unpleasant side effects.
vmod_xkey doesn’t scale that well because of its architecture. It is
not a core concept, but rather an afterthought that was introduced in
the form of a VMOD. The Varnish core doesn’t have a framework in
place to natively support secondary keys alongside other parts of the
core.
This means Xkey had to look for an existing mechanism in Varnish that allowed it to safely invalidate content in a multi-threaded context.
The dynamic data structure that is responsible for object expiry
seemed like a good match. vmod_xkey’s interaction with Varnish is
a bit forced: it piggybacks on the expiry mechanism for safe
access to objects.
The cost that Xkey incurs, both during object insertion and eviction, is added to the expiry data structure. This is mainly due to locking.
While vmod_xkey processes keys during object insertions, or purges
objects using keys, it uses the mutexes of the expiry data structure
for locking. This ensures safe access to these objects, but also blocks
anything else from accessing the expiry mechanism.
This can really bring Varnish to its knees on busy sites where many new objects are inserted or a lot of purges happen.
Another limitation is that vmod_xkey only processes newly inserted
objects. Objects that were already in the cache before import xkey;
took place cannot be purged using secondary keys.
This limitation only occurs when varnishd was started using a VCL
file that did not import vmod_xkey. Any object that was inserted
using that VCL configuration will not be subject to secondary key
inspection.
This can become a tangible issue in cases where custom VCL is deployed to your Varnish server as part of a config management strategy: your Varnish server is first started using boilerplate VCL, and at a later stage in the setup, the config management system deploys a VCL config that uses Xkey.
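One way to sidestep this, sketched here under the assumption that you control the boilerplate VCL, is to import vmod_xkey in the boilerplate as well, so that objects are indexed from the very first insertion. The backend address is a placeholder.

```vcl
vcl 4.1;

# Importing xkey in the boilerplate VCL bootstraps the VMOD early,
# even though no xkey function is called here.
import xkey;

# Placeholder backend, assumed for the sake of a complete VCL file.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
```

The VCL that the config management system deploys later can then add the actual purge logic.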
A third known limitation of Xkey is the fact that it behaves very poorly on persisted MSE caches.
The Massive Storage Engine supports cache persistence by storing objects on disk, while hot objects are kept in memory. Although this is a highly optimized storage component, Xkey doesn’t manage to benefit from this.
Imagine having a persisted cache with one million objects. When
vmod_xkey is imported by the VCL, the persisted cache will look like
new objects to Xkey, and it will start analyzing them one by one.
This analysis process consists of inspecting the Xkey response header,
which means reading the object headers that are stored on disk. If you
have one million persisted objects stored in cache, one million disk
operations need to take place.
The secondary keys that result from the lookups in cache also need to be composed, which is CPU-intensive, although this pales in comparison to the disk I/O that is caused by Xkey’s object indexing.
This can make the startup of Varnish extremely slow.
The locking effect that was previously discussed will only be amplified by the use of persisted MSE caches: the waits will be longer, the locking will last longer, and it will take more time for resources to be freed.
It is important to understand that not every Varnish setup will suffer from these performance issues. It’s a matter of scale: the number of requests your Varnish server receives, the number of objects in cache, the number of purges that take place, and the number of inserts that happen.
You might never experience this with
vmod_xkey. But even if you haven’t yet, it could just be a matter of time.
vmod_ykey is the Varnish Enterprise interpretation of what a
secondary key invalidation VMOD should look like.
Its approach and core concepts are similar to vmod_xkey. But it cannot
be considered a v2 of Xkey, because they are completely different
modules in the way they address secondary keys.
The name Ykey is intended to reflect the fact that while Ykey is distinctly different from Xkey, it fulfills a similar use case.
Where vmod_xkey was bolted onto the expiry data structure of
Varnish, vmod_ykey is a proper implementation that is backed by
changes in the Varnish core.
It is important to know that vmod_ykey is only available in Varnish
Enterprise and that the core changes aren’t reflected in the source
code of Varnish Cache.
vmod_xkey is a very square module with tons of sharp edges. It has
very specific rules and not a lot of flexibility. Over time, we received
lots of requests to make Xkey more flexible, but due to its
architecture that was not possible.
The API that Ykey offers inside VCL is also not backwards compatible
with Xkey. As a matter of fact, the indexing that vmod_xkey performs
happens outside of the scope of VCL.
The lack of a viable upgrade path for vmod_xkey led to the development
of vmod_ykey.
Ykey is integrated into the core of Varnish, and we specifically made sure it works well with MSE, and with persisted MSE caches in particular.
As you may remember, Xkey reindexes every persisted object separately after each restart, which results in tons of disk I/O.
vmod_ykey is designed to persist the secondary key index, not in the
MSE stores, but in the MSE books.
More detail about the Massive Storage Engine and its architecture will be presented in the next chapter. But until then, just remember the following two concepts:
The store contains the headers and payload of an object. It is stored in a big pre-allocated file on disk.
The book is a metadatabase, implemented using LMDB. It’s an embedded database based on memory-mapped files.
The fact that indexed keys are persisted in a fast but reliable mechanism doesn’t just speed up invalidation, it also makes indexing a one-time cost.
Indexing doesn’t happen automatically when import ykey; takes place.
The VCL API allows for various rules to be defined, which impacts how
secondary key indexing is done. By default nothing is done until you
instruct Ykey to do so.
Because vmod_ykey doesn’t piggyback on the expiry data structure,
and has its own data structures in the core of Varnish Enterprise, the
expiry mechanism doesn’t block all the time due to locking. This
results in a smoother flow that doesn’t jeopardize regular operations.
As mentioned, vmod_ykey behaves in an entirely different way from
vmod_xkey, especially in terms of indexing. The API reflects this.
An interesting concept is that not all keys have to be registered via HTTP response headers:
ykey.add_key() registers an individual key to an object.
ykey.add_keys() registers multiple keys to an object, based on a separator.
ykey.add_hashed_keys() registers multiple keys to an object, based on a separator, with the assumption that they are already hashed.
ykey.add_blob() also registers an individual key to an object, but instead of a string value, a BLOB value is used to create the hash of the key.
Headers are also supported, just like in vmod_xkey, but the VCL API
allows for a lot more flexibility:
ykey.add_header(): registers the header that should be inspected.
Multiple keys coming from that header will be registered as keys, based
on a separator.
A combined VCL example featuring ykey.add_key() and
ykey.add_header() will show you how to implement this:
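vcl 4.1;

import ykey;

sub vcl_backend_response {
    # Keys from the Ykey header, separated by a comma and a space
    ykey.add_header(beresp.http.Ykey, ", ");
    # Keys from the Xkey header, separated by a space
    ykey.add_header(beresp.http.Xkey, " ");
    # Tag every image, regardless of its response headers
    if (beresp.http.Content-Type ~ "^image/") {
        ykey.add_key("IMAGE");
    }
}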
This example will inspect the Ykey header from each HTTP response
and will extract the keys. A comma space separator is used for this.
However, we want to remain compatible with Xkey, so we’re also looking
out for the Xkey header where a space is used as a separator.
Meanwhile, we also tag images automatically if their Content-Type
response header starts with image/. This doesn’t require any response
header being set.
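Because keys can be registered from VCL, they can also be derived from request data rather than from response headers. The following sketch tags each object with the first path segment of its URL. The section_ prefix and the backend address are hypothetical and chosen purely for illustration.

```vcl
vcl 4.1;

import ykey;

# Placeholder backend, assumed for the sake of a complete VCL file.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Only tag URLs that have a first path segment, e.g. /sports/article-1
    if (bereq.url ~ "^/[^/?]+/") {
        # Hypothetical naming scheme: /sports/... becomes section_sports
        ykey.add_key("section_" + regsub(bereq.url, "^/([^/?]+)/.*$", "\1"));
    }
}
```

With this in place, purging the section_sports key would target every cached object under /sports/, without the backend having to emit any key header.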
Invalidation of content using Ykey is quite similar to Xkey. The
ykey.purge() function’s API is very similar to xkey.purge().
There is no dedicated soft purge method in vmod_ykey, but the
ykey.purge() method takes a second argument, which is a boolean. When
set to true a soft purge is done, which sets the TTL to zero, but
keeps grace and keep values as they are.
By default the soft purge argument is false.
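As a minimal sketch of that second argument, the following VCL lets the caller opt into a soft purge through a hypothetical X-Ykey-Soft request header. Both that header name and the backend address are assumptions made for illustration.

```vcl
vcl 4.1;

import ykey;

# Placeholder backend, assumed for the sake of a complete VCL file.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

acl purge {
    "localhost";
    "192.168.55.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
            return (synth(405));
        }
        if (!req.http.x-ykey-purge) {
            return (synth(400, "x-ykey-purge header missing"));
        }
        if (req.http.x-ykey-soft == "true") {
            # Soft purge: TTL goes to zero, grace and keep stay intact.
            set req.http.x-purges = ykey.purge(req.http.x-ykey-purge, true);
        } else {
            # Hard purge: the soft purge argument defaults to false.
            set req.http.x-purges = ykey.purge(req.http.x-ykey-purge);
        }
        return (synth(200, req.http.x-purges + " objects purged"));
    }
}

sub vcl_backend_response {
    ykey.add_header(beresp.http.Ykey, ", ");
}
```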
The following example will use the ykey.purge() function, and
replicate the behavior of the Xkey example:
vcl 4.1;

import ykey;
import std;

# Clients that are allowed to send PURGE requests
acl purge {
    "localhost";
    "192.168.55.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
            return(synth(405));
        }
        if (!req.http.x-ykey-purge) {
            return(synth(400, "x-ykey-purge header missing"));
        }
        # Hard purge of a single key
        set req.http.x-purges = ykey.purge(req.http.x-ykey-purge);
        if (std.integer(req.http.x-purges, 0) != 0) {
            return(synth(200, req.http.x-purges + " objects purged"));
        } else {
            return(synth(404, "Key not found"));
        }
    }
}

sub vcl_backend_response {
    # Register the keys from the Ykey response header
    ykey.add_header(beresp.http.Ykey, ", ");
}
The limitation of this example is that ykey.purge() only allows a
single key to be invalidated. Luckily ykey.purge_keys() can take care
of that.
Let’s keep the limitations of the previous example in mind, and write an example that can invalidate multiple keys at once. But to switch it up a bit, we’ll perform a soft purge, which will keep the grace and keep settings intact.
This means that the burden of the invalidation is not on the next user. If you paid attention, you’ll remember that grace will allow users to receive a stale version of the object, while Varnish asynchronously fetches the new version.
Here’s the code to achieve this:
vcl 4.1;

import ykey;
import std;

# Clients that are allowed to send PURGE requests
acl purge {
    "localhost";
    "192.168.55.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
            return(synth(405));
        }
        if (!req.http.x-ykey-purge) {
            return(synth(400, "x-ykey-purge header missing"));
        }
        # Soft purge of multiple keys, separated by a comma and a space
        set req.http.x-purges = ykey.purge_keys(req.http.x-ykey-purge, ", ", true);
        if (std.integer(req.http.x-purges, 0) != 0) {
            return(synth(200, req.http.x-purges + " objects purged"));
        } else {
            return(synth(404, "Key not found"));
        }
    }
}

sub vcl_backend_response {
    # Register the keys from the Ykey response header
    ykey.add_header(beresp.http.Ykey, ", ");
}
A quick heads-up here: this example expects the keys in the X-Ykey-Purge header to be separated by a comma followed by a space.
The previous example was entirely built on the concepts of vmod_xkey.
Some of the checks aren’t required, as vmod_ykey has native support
for headers through the ykey.purge_header() function.
The difference is subtle and can only be felt if you use multiple
x-ykey-purge headers in a single request.
This is the code:
vcl 4.1;

import ykey;
import std;

# Clients that are allowed to send PURGE requests
acl purge {
    "localhost";
    "192.168.55.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
            return(synth(405));
        }
        if (!req.http.x-ykey-purge) {
            return(synth(400, "x-ykey-purge header missing"));
        }
        # Soft purge based on every occurrence of the X-Ykey-Purge header
        set req.http.x-purges = ykey.purge_header(req.http.x-ykey-purge, ", ", true);
        if (std.integer(req.http.x-purges, 0) != 0) {
            return(synth(200, req.http.x-purges + " objects purged"));
        } else {
            return(synth(404, "Key not found"));
        }
    }
}

sub vcl_backend_response {
    # Register the keys from the Ykey response header
    ykey.add_header(beresp.http.Ykey, ", ");
}
This is an example where the X-Ykey-Purge header has multiple
occurrences:
PURGE / HTTP/1.1
Host: example.com
X-Ykey-Purge: category_sports
X-Ykey-Purge: category_breaking_news
The ykey.purge_header() function will loop through all occurrences
because the req.http.x-ykey-purge argument is treated as a header type,
and is not converted into a string type.
The previous example where we used ykey.purge_keys() wouldn’t support
this because the req.http.x-ykey-purge argument is treated as a
string type, and would only return the first occurrence of the
X-Ykey-Purge header, which would be category_sports.
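If you do want to keep ykey.purge_keys() and still accept repeated headers, one possible workaround is to collapse them first. This sketch assumes std.collect() from vmod_std, which joins multiple occurrences of a header into a single value separated by a comma and a space; that matches the separator that is passed to ykey.purge_keys(). The backend address is a placeholder.

```vcl
vcl 4.1;

import ykey;
import std;

# Placeholder backend, assumed for the sake of a complete VCL file.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

acl purge {
    "localhost";
    "192.168.55.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
            return (synth(405));
        }
        if (!req.http.x-ykey-purge) {
            return (synth(400, "x-ykey-purge header missing"));
        }
        # Merge repeated X-Ykey-Purge headers into one value,
        # e.g. "category_sports, category_breaking_news".
        std.collect(req.http.x-ykey-purge);
        set req.http.x-purges = ykey.purge_keys(req.http.x-ykey-purge, ", ", true);
        if (std.integer(req.http.x-purges, 0) != 0) {
            return (synth(200, req.http.x-purges + " objects purged"));
        }
        return (synth(404, "Key not found"));
    }
}

sub vcl_backend_response {
    ykey.add_header(beresp.http.Ykey, ", ");
}
```

In practice ykey.purge_header() remains the more direct choice; the sketch mainly illustrates why the string conversion loses the extra occurrences.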
Another advantage of vmod_ykey is namespace support. It allows
secondary keys to be stored in a namespace to avoid key collisions in
a multi-tenant setup.
Without namespacing, multiple independent clients or backends that use the same Varnish could risk using the same keys.
The ykey.namespace() function allows key indexing at the backend
level, and purging at the client level, to happen in a separate
namespace.
The following example injects ykey.namespace() calls into vcl_recv
for the client-side context, into vcl_backend_response for the
backend-side context, and resets it when not in the same namespace:
vcl 4.1;

import ykey;
import std;

# Clients that are allowed to send PURGE requests
acl purge {
    "localhost";
    "192.168.55.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
            return(synth(405));
        }
        if (!req.http.x-ykey-purge) {
            return(synth(400, "x-ykey-purge header missing"));
        }
        # Client side: purge within the namespace of the tenant
        if (req.http.host ~ "tenant1") {
            ykey.namespace(req.http.host);
        } else {
            ykey.namespace_reset();
        }
        set req.http.x-purges = ykey.purge_header(req.http.x-ykey-purge, ", ", true);
        if (std.integer(req.http.x-purges, 0) != 0) {
            return(synth(200, req.http.x-purges + " objects purged"));
        } else {
            return(synth(404, "Key not found"));
        }
    }
}

sub vcl_backend_response {
    # Backend side: index keys within the namespace of the tenant
    if (bereq.http.host ~ "tenant1") {
        ykey.namespace(bereq.http.host);
    } else {
        ykey.namespace_reset();
    }
    ykey.add_header(beresp.http.Ykey, ", ");
}