Distributed invalidation with Varnish Broadcaster

A ban, a purge, a refresh, or even a secondary-key invalidation can easily be triggered via a simple HTTP request.

We’ve discussed the invalidation mechanisms that offer the most flexibility when it comes to invalidating multiple objects. But there’s a different kind of flexibility that we haven’t talked about yet.

For really basic setups, Varnish can be hosted on the same machine as the origin. But for mission-critical setups, you want some level of high availability.

This means that in most setups you’ll use more than one Varnish server. Although we’ll discuss high availability in the next chapter, there is one aspect that we need to cover here: invalidating content on multiple Varnish servers.

In situations where multiple Varnish servers are in play, the client is responsible for sending an invalidation call to each server. But as your Varnish inventory grows, keeping track of every server becomes challenging, and so does sending out all those purge calls.

And that’s where you need Varnish Broadcaster.

Varnish Broadcaster

The Varnish Broadcaster comes in the form of the broadcaster program and is a utility that is shipped with Varnish Enterprise.

As the name indicates, it broadcasts messages to a pre-defined inventory of Varnish servers. This tool was specifically developed to perform purges and bans on multiple Varnish servers through a single point of entry.

In this section we will treat the Varnish Broadcaster as a utility to invalidate the cache across multiple servers, but we will not focus on the broadcaster itself or on how it is configured. In the next chapter, we will talk more about certain operational elements of Varnish, and the broadcaster will be covered in more depth there.

Varnish inventory

The broadcaster cannot figure out on its own where the Varnish servers are located. For node discovery, it depends on a nodes.conf file where the inventory is specified.

Multiple Varnish endpoints can be described in this file, and nodes can be grouped as well.

Imagine the following setup in nodes.conf:

[eu]
eu-varnish1 = http://varnish1.eu.example.com
eu-varnish2 = http://varnish2.eu.example.com
eu-varnish3 = http://varnish3.eu.example.com

[us]
us-varnish1 = http://varnish1.us.example.com
us-varnish2 = http://varnish2.us.example.com
us-varnish3 = http://varnish3.us.example.com

The setup described in the example above consists of two geographic zones:

An eu zone with three Varnish servers
A us zone with three Varnish servers

When a purge call is sent through the Varnish Broadcaster without any group restriction, the purge is broadcast to the following Varnish servers:

http://varnish1.eu.example.com
http://varnish2.eu.example.com
http://varnish3.eu.example.com
http://varnish1.us.example.com
http://varnish2.us.example.com
http://varnish3.us.example.com
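To make the fan-out concrete, the following Python sketch reads an inventory in the format shown above and lists the endpoints a broadcast would reach, optionally limited to one group. It is an illustration only: the function name broadcast_targets is hypothetical, and the sketch assumes that this nodes.conf can be read as a plain INI file. That holds for this example, but it says nothing about how the broadcaster itself parses the file.

```python
import configparser

# The inventory from the nodes.conf example above.
NODES_CONF = """
[eu]
eu-varnish1 = http://varnish1.eu.example.com
eu-varnish2 = http://varnish2.eu.example.com
eu-varnish3 = http://varnish3.eu.example.com

[us]
us-varnish1 = http://varnish1.us.example.com
us-varnish2 = http://varnish2.us.example.com
us-varnish3 = http://varnish3.us.example.com
"""


def broadcast_targets(conf_text, group=None):
    """Return {node_name: url} for every group, or for a single group.

    Hypothetical helper: it only mimics the scoping described in the text.
    """
    # Split on "=" only, so the ":" in "http://" is never taken as a delimiter.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)

    groups = [group] if group is not None else parser.sections()
    targets = {}
    for name in groups:
        # parser[name] raises KeyError if the group does not exist.
        targets.update(dict(parser[name]))
    return targets


if __name__ == "__main__":
    for node, url in broadcast_targets(NODES_CONF).items():
        print(node, url)
```

Calling broadcast_targets(NODES_CONF, "eu") returns only the three eu nodes, which mirrors the scoping that a group restriction applies.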

If we just want the eu zone to be invalidated, a specific header can be sent to the broadcaster service. This will limit the scope of the broadcasting.

Issuing a purge

If we want to perform a purge on our full inventory, we could send the following request to the broadcaster:

PURGE / HTTP/1.1
Host: example.com

The call itself is identical to a purge sent directly to a Varnish server, but the endpoint we connect to is different:

$ curl -X PURGE example.com:8088/

As you can see, the broadcaster endpoint listens on a different port than Varnish: port 8088 in this example. Here’s the output you get:

{
	"method": "PURGE",
	"uri": "/",
	"ts": 1603633688,
	"nodes": {
		"eu-varnish1": 200,
		"eu-varnish2": 200,
		"eu-varnish3": 200,
		"us-varnish1": 200,
		"us-varnish2": 200,
		"us-varnish3": 200
	},
	"rate": 100,
	"done": true
}

What you’re seeing is JSON output with metadata of your request, but also the nodes that were called. All six nodes were purged, and each node returned an HTTP 200 status.
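Because the response is plain JSON, a client can check it programmatically. The Python sketch below picks out the nodes that did not answer with a 2xx status. The field names are taken from the output shown above. The helper name failed_nodes, the sample body with a 503, and the choice to treat any non-2xx status as a failure are assumptions made for illustration.

```python
import json

# Sample body shaped like the broadcaster output above, with one
# made-up failing node to show the filtering.
SAMPLE_RESPONSE = """
{
    "method": "PURGE",
    "uri": "/",
    "ts": 1603633688,
    "nodes": {
        "eu-varnish1": 200,
        "eu-varnish2": 200,
        "us-varnish1": 503
    },
    "rate": 100,
    "done": true
}
"""


def failed_nodes(response_body):
    """Return {node_name: status} for every node with a non-2xx status."""
    report = json.loads(response_body)
    return {
        node: status
        for node, status in report["nodes"].items()
        if not 200 <= status < 300
    }


if __name__ == "__main__":
    print(failed_nodes(SAMPLE_RESPONSE))
```

A deployment script could retry or raise an alert whenever the returned dictionary is not empty.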

Bans and secondary keys

Let’s add a level of complexity and evict objects from the cache based on a regular expression pattern. Under the hood, we use bans to achieve this.

The HTTP request is as follows:

BAN / HTTP/1.1
Host: example.com
X-Ban-Pattern: ^/products/

Here’s the curl implementation:

$ curl -X BAN -H "X-Ban-Pattern: ^/products/" example.com:8088/

If we want to invalidate based on secondary keys, this will be the request:

PURGE / HTTP/1.1
Host: example.com
X-Ykey-Purge: category_sports

Here’s the curl implementation:

$ curl -X PURGE -H "X-Ykey-Purge: category_sports" example.com:8088/

In both cases the output has the same structure. For the secondary-key purge it looks like this:

{
	"method": "PURGE",
	"uri": "/",
	"ts": 1603652566,
	"nodes": {
		"eu-varnish1": 200,
		"eu-varnish2": 200,
		"eu-varnish3": 200,
		"us-varnish1": 200,
		"us-varnish2": 200,
		"us-varnish3": 200
	},
	"rate": 100,
	"done": true
}
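The same two requests can be issued from application code instead of curl. The sketch below uses Python’s standard urllib.request module to build them. The BROADCASTER constant reuses the example.com:8088 endpoint from the curl examples, and the helper name invalidation_request is hypothetical. The requests are only constructed here; the commented lines at the end show how they could be sent.

```python
import urllib.request

# Broadcaster endpoint from the curl examples above.
BROADCASTER = "http://example.com:8088"


def invalidation_request(method, path="/", headers=None):
    """Build (but do not send) an invalidation request for the broadcaster."""
    return urllib.request.Request(
        BROADCASTER + path,
        method=method,          # custom verbs such as PURGE or BAN are allowed
        headers=headers or {},
    )


# Ban every object whose URL matches the pattern.
ban = invalidation_request("BAN", headers={"X-Ban-Pattern": "^/products/"})

# Purge every object tagged with the secondary key.
ykey = invalidation_request("PURGE", headers={"X-Ykey-Purge": "category_sports"})

# To actually send a request and print the JSON report:
# with urllib.request.urlopen(ban, timeout=5) as response:
#     print(response.read().decode())
```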

Broadcast groups

Because nodes in the nodes.conf file can be grouped, it is possible to only broadcast messages to a single group.

Imagine that the content on Varnish servers in the eu group differs from the us group. In this case, it sometimes makes sense to only invalidate a specific group of servers.

Let’s throw in an example where we want to invalidate all files in the /images folder for the eu group:

BAN / HTTP/1.1
Host: example.com
X-Ban-Pattern: ^/images/
X-Broadcast-Group: eu

This is how you execute this via curl:

$ curl -X BAN -H "X-Ban-Pattern: ^/images/" -H "X-Broadcast-Group: eu" example.com:8088/

The output will be slightly different and will only feature responses from eu nodes:

{
	"method": "BAN",
	"uri": "/",
	"ts": 1603654040,
	"nodes": {
		"eu-varnish1": 200,
		"eu-varnish2": 200,
		"eu-varnish3": 200
	},
	"rate": 100,
	"done": true
}

Other than the group definition, there are other X-Broadcast headers that can be combined and used to define the broadcasting strategy:

X-Broadcast-Random: if the value of this header is set to *, the broadcaster will only broadcast to one node in each configured group. The node is selected randomly.
X-Broadcast-InOrder: if this header is set to true, the broadcaster will handle each node one after the other. This is useful for purging multi-layer setups from upstream to downstream.
X-Broadcast-Skip: this header blacklists caches as a whitespace-separated list. They will be skipped when processing a group.
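As a rough illustration of how these headers combine, the Python sketch below assembles them into a dictionary that could be attached to a PURGE or BAN request. The header names and values come from the descriptions above. The function name broadcast_headers and its parameters are hypothetical, and the sketch assumes that the entries in X-Broadcast-Skip are node names from nodes.conf.

```python
def broadcast_headers(group=None, random_node=False, in_order=False, skip=()):
    """Build the X-Broadcast-* headers for one broadcaster request."""
    headers = {}
    if group:
        # Limit the broadcast to a single group from nodes.conf.
        headers["X-Broadcast-Group"] = group
    if random_node:
        # "*" asks for one randomly selected node per group.
        headers["X-Broadcast-Random"] = "*"
    if in_order:
        # Handle the nodes one after the other.
        headers["X-Broadcast-InOrder"] = "true"
    if skip:
        # Whitespace-separated list of caches to skip (assumed: node names).
        headers["X-Broadcast-Skip"] = " ".join(skip)
    return headers


if __name__ == "__main__":
    print(broadcast_headers(group="eu", in_order=True, skip=["eu-varnish3"]))
```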