Secondary keys

To repeat a point we’ve already made a couple of times:

Most cache invalidation strategies are based on the URL of a request. This only works if the content in your application can easily be mapped to one or more URLs.

Sometimes a content change impacts many URLs, and sometimes it is impossible to know which URLs will need to be invalidated. Under those circumstances, URL-based banning and purging don’t work.

We already hinted at tag-based invalidation in the previous section of the book.

Instead of identifying objects in the cache based on their request URL, you can use arbitrary tags to identify objects. By invalidating a tag, all objects that carry this tag are removed from the cache at once.

For request-based invalidation, we go through the typical lookup logic that is triggered in the vcl_hash subroutine: we take the URL and the Host header, and turn this into a hash key. This key can be considered the primary key.

But if we start using other identifiers to match objects, such as tags, we can say that there’s a secondary key involved. Hence, the name of the section.

Although the ban("obj.http.tags ~ " + req.http.x-ban-tag) example that we saw earlier works, it is not really built for the job.

Varnish has two VMODs that store secondary keys for objects, which allow these objects to be purged based on these secondary keys:

vmod_xkey is an open source VMOD that is part of the Varnish Software VMOD collection.
vmod_ykey is the successor of vmod_xkey. It is only available in Varnish Enterprise.

Let’s talk about those VMODs for a minute.

vmod_xkey

vmod_xkey is part of the Varnish Software VMOD collection. It is open source, and its API can be found at https://github.com/varnish/varnish-modules/blob/master/src/vmod_xkey.vcc.

The API for this VMOD is pretty simple. There are only two functions:

xkey.purge()
xkey.softpurge()

Both functions take a string as an argument. This string refers to the key that needs to be purged. This string may contain an individual key or a space-separated list of keys.

Initializing vmod_xkey

The initialization of vmod_xkey happens automatically. As soon as import xkey; is part of your VCL file, vmod_xkey will be bootstrapped, and any new objects that are inserted in cache will be analyzed.

Xkey will look for the xkey or the X-HashTwo response headers and will register the tags that are exposed through these headers.

Registering keys

As mentioned, vmod_xkey will look for the xkey or the X-HashTwo response headers. Xkey headers are normally added by the backend application, but you can also add them in the vcl_backend_response subroutine. Multiple keys in one header line are separated by spaces or commas.

So if you want the keys category_sports, id_1265778, type_article for a page on a news website, the response would look like this:

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Cache-Control: public, s-maxage=60
Xkey: category_sports id_1265778 type_article
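
Keys can also be attached in VCL rather than by the application. The following is a minimal sketch of that idea, not a reference implementation: the type_image key and the backend address are made up for illustration, and the sketch assumes the backend sends at most one xkey header per response.

```vcl
vcl 4.1;

import xkey;

# Assumption: placeholder backend, only here to make the sketch complete.
backend default {
	.host = "127.0.0.1";
	.port = "8080";
}

sub vcl_backend_response {
	# Tag every image, whether or not the application sent its own keys.
	if (beresp.http.Content-Type ~ "^image/") {
		if (beresp.http.xkey) {
			# Append to the keys the application already exposed.
			set beresp.http.xkey = beresp.http.xkey + " type_image";
		} else {
			set beresp.http.xkey = "type_image";
		}
	}
}
```

Because vmod_xkey reads the header when the object is inserted, keys set this way are indexed just like keys that come from the backend.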

Invalidating content

Once vmod_xkey is imported, and xkey headers of objects are processed, we have a collection of secondary keys that can be used for invalidation.

If for some reason all articles from the sports category need to be purged from cache, it’s just a matter of purging the category_sports key.

For the implementation of the vmod_xkey invalidation logic, we can revisit the tag-based invalidation example that used bans. It’s just a matter of swapping out the ban() function with the corresponding xkey.purge() function, and some cosmetic changes:

vcl 4.1;

import xkey;
import std;

acl purge {
	"localhost";
	"192.168.55.0"/24;
}

sub vcl_recv {
	if (req.method == "PURGE") {
		if (!client.ip ~ purge) {
			return(synth(405));
		}
		if (!req.http.x-xkey-purge) {
			return(synth(400,"x-xkey-purge header missing"));
		}

		set req.http.x-purges = xkey.purge(req.http.x-xkey-purge);

		if (std.integer(req.http.x-purges,0) != 0) {
			return(synth(200, req.http.x-purges + " objects purged"));
		} else {
			return(synth(404, "Key not found"));
		}
	}
}

If we look back at the example of our news website, we can use the following HTTP request to invalidate all news articles from the sports category:

PURGE / HTTP/1.1
Host: example.com
X-Xkey-Purge: category_sports

If no matching keys were found, you’ll get an HTTP 404 response; otherwise you’ll get a regular HTTP 200 response containing the number of purged objects.
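
The same pattern applies to xkey.softpurge(), the second function in the API. What follows is a hedged sketch rather than a canonical setup: the x-xkey-softpurge header name and the backend address are assumptions chosen for this example. A soft purge expires the matching objects but leaves them available for grace mode.

```vcl
vcl 4.1;

import xkey;

# Assumption: placeholder backend, only here to make the sketch complete.
backend default {
	.host = "127.0.0.1";
	.port = "8080";
}

acl purge {
	"localhost";
}

sub vcl_recv {
	if (req.method == "PURGE") {
		if (!client.ip ~ purge) {
			return(synth(405));
		}
		# Hypothetical header name carrying the keys to soft purge.
		if (!req.http.x-xkey-softpurge) {
			return(synth(400, "x-xkey-softpurge header missing"));
		}

		# Expire matching objects, but keep them around for grace.
		set req.http.x-purges = xkey.softpurge(req.http.x-xkey-softpurge);
		return(synth(200, req.http.x-purges + " objects soft purged"));
	}
}
```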

vmod_xkey limitations

We started this chapter talking about purging. It’s simple, it’s effective, but it’s not really flexible. Then we introduced banning, which seemed like the perfect alternative, but the flexibility comes at a cost.

So here we are, talking about vmod_xkey as a more powerful alternative for tag-based invalidation. The cost of invalidating many objects with many keys is a lot lower than for bans.

But there is still a cost, some limitations, and some unpleasant side effects.

vmod_xkey doesn’t scale that well because of its architecture. It is not a core concept, but rather an afterthought that was introduced in the form of a VMOD. The Varnish core doesn’t have a framework in place to natively support secondary keys alongside other parts of the core.

Locking

This means Xkey had to look for an existing mechanism in Varnish that allowed it to safely invalidate content in a multi-threaded context.

The dynamic data structure that is responsible for object expiry seemed like a good match. vmod_xkey’s interaction with Varnish is a bit forced: it piggybacks on the expiry mechanism for safe access to objects.

The cost that Xkey incurs, both during object insertion and eviction, is added to the expiry data structure. This is mainly due to locking.

While vmod_xkey processes keys during object insertions, or purges objects using keys, it uses the mutexes of the expiry data structure for locking. This ensures safe access to these objects, but also blocks anything else from accessing the expiry mechanism.

This can really bring Varnish to its knees on busy sites where many new objects are inserted or a lot of purges happen.

Old objects aren’t processed

Another limitation is that vmod_xkey only processes newly inserted objects. Objects that were already in the cache before import xkey; took place cannot be purged.

This limitation only occurs when varnishd was started using a VCL file that did not import vmod_xkey. Any object that was inserted using that VCL configuration will not be subject to secondary key inspection.

This can become a tangible issue in cases where custom VCL is deployed to your Varnish server as part of a config management strategy: your Varnish server is first started using boilerplate VCL, and at a later stage in the setup, the config management system deploys a VCL config that uses Xkey.
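
One way to sidestep this is to import vmod_xkey in the boilerplate VCL as well, so that indexing is active from the very first insert. A minimal sketch, assuming a placeholder backend that your config management system would replace later:

```vcl
vcl 4.1;

# Importing xkey here is enough to enable indexing:
# no xkey function has to be called in this boot VCL.
import xkey;

# Assumption: placeholder backend for the boilerplate configuration.
backend default {
	.host = "127.0.0.1";
	.port = "8080";
}
```

Objects cached while this VCL is active carry their secondary keys, so they can still be purged after the final configuration is deployed.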

Performs poorly with persisted MSE caches

A third known limitation of Xkey is the fact that it behaves very poorly on persisted MSE caches.

The Massive Storage Engine supports cache persistence by storing objects on disk, while hot objects are kept in memory. Although this is a highly optimized storage component, Xkey doesn’t manage to benefit from this.

Imagine having a persisted cache with one million objects. When vmod_xkey is imported by the VCL, the persisted cache will look like new objects to Xkey, and it will start analyzing them one by one.

This analysis process consists of inspecting the Xkey response header, which means reading the object’s headers from disk. If you have one million persisted objects stored in cache, one million disk operations need to take place.

The secondary keys that result from the lookups in cache also need to be composed, which is CPU-intensive, although this pales in comparison to the disk I/O that is caused by Xkey’s object indexing.

This can make the startup of Varnish extremely slow.

The locking effect that was previously discussed will only be amplified by the use of persisted MSE caches: the waits will be longer, the locking will last longer, and it will take more time for resources to be freed.

It is important to understand that not every Varnish setup will suffer from these performance issues. It’s a matter of scale: the number of requests your Varnish server receives, the number of objects in cache, the number of purges that take place, and the number of inserts that happen.

You might never experience this with vmod_xkey. But even if you haven’t yet, it could just be a matter of time.

vmod_ykey

vmod_ykey is the Varnish Enterprise interpretation of what a secondary key invalidation VMOD should look like.

Its approach and core concepts are similar to vmod_xkey. But it cannot be considered a v2 of Xkey, because they are completely different modules in the way they address secondary keys.

Why Ykey?

The name Ykey is intended to reflect the fact that while Ykey is distinctly different from Xkey, it fulfills a similar use case.

Where vmod_xkey was bolted onto the expiry data structure of Varnish, vmod_ykey is a proper implementation that is backed by changes in the Varnish core.

It is important to know that vmod_ykey is only available in Varnish Enterprise and that the core changes aren’t reflected in the source code of Varnish Cache.

vmod_xkey is a very square module with tons of sharp edges. It has very specific rules and not a lot of flexibility. Over time, we received lots of requests to make Xkey more flexible, but due to its architecture that was not possible.

The API that Ykey exposes in VCL is also not backwards compatible with Xkey. As a matter of fact, vmod_xkey indexes keys outside of the scope of VCL.

The lack of a viable upgrade path for vmod_xkey led to the development of vmod_ykey.

vmod_ykey performance improvements

Ykey is integrated into the core of Varnish, and we specifically made sure it works well with MSE.

More specifically, with persisted MSE caches.

As you’ll remember, Xkey reindexes all persisted objects one by one after every restart, which results in tons of disk I/O.

vmod_ykey is designed to persist the secondary key index, not in the MSE stores, but in the MSE books.

More detail about the Massive Storage Engine and its architecture will be presented in the next chapter. But until then, just remember the following two concepts:

The store contains the headers and payload of an object. It is stored in a big pre-allocated file on disk.

The book is a metadatabase, implemented using LMDB. It’s an embedded database based on memory-mapped files.

The fact that indexed keys are persisted in a fast but reliable mechanism doesn’t just speed up invalidation, it also makes indexing a one-time cost.

Indexing doesn’t happen automatically when import ykey; takes place. The VCL API allows for various rules to be defined, which impact how secondary key indexing is done. By default nothing is done until you instruct Ykey to do so.

Because vmod_ykey doesn’t piggyback on the expiry data structure, and has its own data structures in the core of Varnish Enterprise, the expiry mechanism doesn’t block all the time due to locking. This results in a smoother flow that doesn’t jeopardize regular operations.

Registering keys

As mentioned, vmod_ykey behaves in an entirely different way from vmod_xkey, especially in terms of indexing. The API reflects this.

An interesting concept is that not all keys need to be registered via HTTP response headers:

ykey.add_key() registers an individual key to an object.
ykey.add_keys() registers multiple keys to an object, based on a separator.
ykey.add_hashed_keys() registers multiple keys to an object, based on a separator, with the assumption that they are already hashed.
ykey.add_blob() also registers an individual key to an object, but instead of a string value, a BLOB value is used to create the hash of the key.

Headers are also supported, just like in vmod_xkey, but the VCL API allows for a lot more flexibility:

ykey.add_header() registers the header that should be inspected. Multiple keys coming from that header will be registered as keys, based on a separator.

A combined VCL example featuring ykey.add_key() and ykey.add_header() will show you how to implement this:

vcl 4.1;

import ykey;

sub vcl_backend_response {
	ykey.add_header(beresp.http.Ykey, ", ");
	ykey.add_header(beresp.http.Xkey, " ");
	if (beresp.http.Content-Type ~ "^image/") {
		ykey.add_key("IMAGE");
	}
}

This example will inspect the Ykey header from each HTTP response and will extract the keys. A comma followed by a space is used as the separator.

However, we want to remain compatible with Xkey, so we’re also looking out for the Xkey header where a space is used as a separator.

Meanwhile, we also tag images automatically if their Content-Type response header starts with image/. This doesn’t require any response header being set.
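
The ykey.add_keys() function isn’t part of that example. The sketch below shows how it could be used to register several keys from VCL without any response header. It assumes that ykey.add_keys() takes the key list first and the separator second, mirroring ykey.add_header(); the URL pattern, the key names, and the backend address are purely illustrative.

```vcl
vcl 4.1;

import ykey;

# Assumption: placeholder backend, only here to make the sketch complete.
backend default {
	.host = "127.0.0.1";
	.port = "8080";
}

sub vcl_backend_response {
	# Hypothetical rule: everything under /sports/ gets two keys,
	# split on the space separator passed as the second argument.
	if (bereq.url ~ "^/sports/") {
		ykey.add_keys("category_sports type_article", " ");
	}
}
```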

Invalidating content

Invalidation of content using Ykey is quite similar to Xkey. The ykey.purge() function’s API is very similar to xkey.purge().

There is no dedicated soft purge method in vmod_ykey, but the ykey.purge() method takes a second argument, which is a boolean. When set to true a soft purge is done, which sets the TTL to zero, but keeps grace and keep values as they are.

By default the soft purge argument is false.
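
To make the boolean argument concrete, here is a minimal sketch that performs a hard purge by default and a soft purge on request. The x-ykey-soft request header is a hypothetical switch invented for this example, and the backend address is a placeholder.

```vcl
vcl 4.1;

import ykey;

# Assumption: placeholder backend, only here to make the sketch complete.
backend default {
	.host = "127.0.0.1";
	.port = "8080";
}

acl purge {
	"localhost";
}

sub vcl_recv {
	if (req.method == "PURGE") {
		if (!client.ip ~ purge) {
			return(synth(405));
		}
		if (!req.http.x-ykey-purge) {
			return(synth(400, "x-ykey-purge header missing"));
		}

		if (req.http.x-ykey-soft) {
			# Second argument set to true: soft purge.
			set req.http.x-purges = ykey.purge(req.http.x-ykey-purge, true);
		} else {
			# Second argument omitted: regular purge.
			set req.http.x-purges = ykey.purge(req.http.x-ykey-purge);
		}
		return(synth(200, req.http.x-purges + " objects invalidated"));
	}
}

sub vcl_backend_response {
	ykey.add_header(beresp.http.Ykey, ", ");
}
```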

A vmod_xkey replica

The following example will use the ykey.purge() function, and replicate the behavior of the Xkey example:

vcl 4.1;

import ykey;
import std;

acl purge {
	"localhost";
	"192.168.55.0"/24;
}

sub vcl_recv {
	if (req.method == "PURGE") {
		if (!client.ip ~ purge) {
			return(synth(405));
		}
		if (!req.http.x-ykey-purge) {
			return(synth(400,"x-ykey-purge header missing"));
		}

		set req.http.x-purges = ykey.purge(req.http.x-ykey-purge);

		if (std.integer(req.http.x-purges,0) != 0) {
			return(synth(200, req.http.x-purges + " objects purged"));
		} else {
			return(synth(404, "Key not found"));
		}
	}
}

sub vcl_backend_response {
	ykey.add_header(beresp.http.Ykey, ", ");
}

The limitation of this example is that ykey.purge() only allows a single key to be invalidated. Luckily ykey.purge_keys() can take care of that.

Multiple keys, soft purging

Let’s keep the limitations of the previous example in mind, and write an example that can invalidate multiple keys at once. But to switch it up a bit, we’ll perform a soft purge, which will keep the grace and keep settings intact.

This means that the burden of the invalidation is not on the next user. If you paid attention, you’ll remember that grace will allow users to receive a stale version of the object, while Varnish asynchronously fetches the new version.

Here’s the code to achieve this:

vcl 4.1;

import ykey;
import std;

acl purge {
	"localhost";
	"192.168.55.0"/24;
}

sub vcl_recv {
	if (req.method == "PURGE") {
		if (!client.ip ~ purge) {
			return(synth(405));
		}
		if (!req.http.x-ykey-purge) {
			return(synth(400,"x-ykey-purge header missing"));
		}

		set req.http.x-purges = ykey.purge_keys(req.http.x-ykey-purge, ", ", true);

		if (std.integer(req.http.x-purges,0) != 0) {
			return(synth(200, req.http.x-purges + " objects purged"));
		} else {
			return(synth(404, "Key not found"));
		}
	}
}

sub vcl_backend_response {
	ykey.add_header(beresp.http.Ykey, ", ");
}

A quick heads-up here: this example uses a comma followed by a space as the separator.
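
A soft purge only pays off when the purged objects still have grace time left. With the default grace of ten seconds, stale content is only available for a short while. The following sketch extends the grace and keep values at insertion time; the one-hour and one-day values and the backend address are illustrative assumptions, not recommendations.

```vcl
vcl 4.1;

# Assumption: placeholder backend, only here to make the sketch complete.
backend default {
	.host = "127.0.0.1";
	.port = "8080";
}

sub vcl_backend_response {
	# Allow stale delivery for up to an hour after the TTL reaches zero.
	set beresp.grace = 1h;
	# Keep the object around for conditional requests for up to a day.
	set beresp.keep = 1d;
}
```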

Native support for headers

The previous example was entirely built on the concepts of vmod_xkey. Some of the checks aren’t required, as vmod_ykey has native support for headers through the ykey.purge_header() function.

The difference is subtle and can only be felt if you use multiple x-ykey-purge headers in a single request.

This is the code:

vcl 4.1;

import ykey;
import std;

acl purge {
	"localhost";
	"192.168.55.0"/24;
}

sub vcl_recv {
	if (req.method == "PURGE") {
		if (!client.ip ~ purge) {
			return(synth(405));
		}
		if (!req.http.x-ykey-purge) {
			return(synth(400,"x-ykey-purge header missing"));
		}

		set req.http.x-purges = ykey.purge_header(req.http.x-ykey-purge, ", ", true);

		if (std.integer(req.http.x-purges,0) != 0) {
			return(synth(200, req.http.x-purges + " objects purged"));
		} else {
			return(synth(404, "Key not found"));
		}
	}
}

sub vcl_backend_response {
	ykey.add_header(beresp.http.Ykey, ", ");
}

This is an example where the X-Ykey-Purge header has multiple occurrences:

PURGE / HTTP/1.1
Host: example.com
X-Ykey-Purge: category_sports
X-Ykey-Purge: category_breaking_news

The ykey.purge_header() function will loop through all occurrences because the req.http.x-ykey-purge argument is treated as a header type, and is not converted into a string type.

The previous example where we used ykey.purge_keys() wouldn’t support this because the req.http.x-ykey-purge argument is converted into a string type, which only contains the first occurrence of the X-Ykey-Purge header: category_sports.
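
If you need to stick with ykey.purge_keys(), one possible workaround is to fold the repeated headers into a single line first. The sketch below uses std.collect() from vmod_std for that, which joins the occurrences with a comma and a space. This is a swapped-in technique for illustration, not what ykey.purge_header() does internally, and the backend address is a placeholder.

```vcl
vcl 4.1;

import ykey;
import std;

# Assumption: placeholder backend, only here to make the sketch complete.
backend default {
	.host = "127.0.0.1";
	.port = "8080";
}

acl purge {
	"localhost";
}

sub vcl_recv {
	if (req.method == "PURGE") {
		if (!client.ip ~ purge) {
			return(synth(405));
		}
		if (!req.http.x-ykey-purge) {
			return(synth(400, "x-ykey-purge header missing"));
		}

		# Collapse all X-Ykey-Purge occurrences into one header line,
		# joined by ", ", so the string conversion sees every key.
		std.collect(req.http.x-ykey-purge);

		set req.http.x-purges = ykey.purge_keys(req.http.x-ykey-purge, ", ", true);
		return(synth(200, req.http.x-purges + " objects purged"));
	}
}

sub vcl_backend_response {
	ykey.add_header(beresp.http.Ykey, ", ");
}
```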

Namespacing

Another advantage of vmod_ykey is namespace support. It allows secondary keys to be stored in a namespace to avoid key collisions in a multi-tenant setup.

Without namespacing, multiple independent clients or backends that use the same Varnish could risk using the same keys.

The ykey.namespace() function allows key indexing at the backend level, and purging at the client level, to happen in a separate namespace.

The following example injects ykey.namespace() calls into vcl_recv for the client-side context and into vcl_backend_response for the backend-side context. For hosts that don’t belong to the tenant, the namespace is reset:

vcl 4.1;

import ykey;
import std;

acl purge {
	"localhost";
	"192.168.55.0"/24;
}

sub vcl_recv {
	if (req.method == "PURGE") {
		if (!client.ip ~ purge) {
			return(synth(405));
		}
		if (!req.http.x-ykey-purge) {
			return(synth(400,"x-ykey-purge header missing"));
		}
		if (req.http.host ~ "tenant1") {
			ykey.namespace(req.http.host);
		} else {
			ykey.namespace_reset();
		}
		set req.http.x-purges = ykey.purge_header(req.http.x-ykey-purge, ", ", true);

		if (std.integer(req.http.x-purges,0) != 0) {
			return(synth(200, req.http.x-purges + " objects purged"));
		} else {
			return(synth(404, "Key not found"));
		}
	}
}

sub vcl_backend_response {
	if (bereq.http.host ~ "tenant1") {
		ykey.namespace(bereq.http.host);
	} else {
		ykey.namespace_reset();
	}
	ykey.add_header(beresp.http.Ykey, ", ");
}