Varnish Cluster is a solution for increasing cache hit rate in a Varnish Enterprise deployment and reducing load on the origin service. It’s dynamic, scalable, and can be enabled with just a few lines of VCL.
Software:
Networking:
VCL:
sub vcl_recv may be executed multiple times, so manipulation of req.url and req.http.Host must be idempotent.
DNS: (Applies to dynamic clusters only)
A, AAAA, and SRV records are supported.
For SRV records, the port, weight, and priority attributes are respected by default. weight must not be changed while the cluster is receiving traffic.
In a static cluster, each node is defined as a separate backend or director in the VCL. This is a good fit for clusters where nodes are added and removed infrequently.
Step 1: Include cluster.vcl near the top of your VCL:
include "cluster.vcl";
Step 2: Create a backend or director for each Varnish node and add them to the cluster director (created by cluster.vcl):
backend node_a { .host = "ip:port"; }
backend node_b { .host = "ip:port"; }
backend node_c { .host = "ip:port"; }
sub vcl_init {
cluster.add_backend(node_a);
cluster.add_backend(node_b);
cluster.add_backend(node_c);
}
Step 3: Set a cluster token. This is used to tell regular client requests apart from internal cluster requests:
sub vcl_init {
cluster_opts.set("token", "secret");
}
Step 4: Set req.backend_hint or bereq.backend to your origin backend or director:
sub vcl_backend_fetch {
set bereq.backend = origin;
}
Final VCL:
vcl 4.1;
include "cluster.vcl";
backend node_a { .host = "ip:port"; }
backend node_b { .host = "ip:port"; }
backend node_c { .host = "ip:port"; }
backend origin { .host = "ip:port"; }
sub vcl_init {
cluster.add_backend(node_a);
cluster.add_backend(node_b);
cluster.add_backend(node_c);
cluster_opts.set("token", "secret");
}
sub vcl_backend_fetch {
set bereq.backend = origin;
}
In a dynamic cluster, nodes are resolved from a domain name, allowing the cluster to shrink and grow on demand. This is a good fit for autoscaling clusters.
Step 1: Include cluster.vcl near the top of your VCL:
include "cluster.vcl";
Step 2: Create a DNS group with a domain name that resolves to all of the cluster node IPs and subscribe the cluster director to the group:
sub vcl_init {
new cluster_group = activedns.dns_group("varnish.nodes");
cluster.subscribe(cluster_group.get_tag());
}
Step 3: Set a cluster token. This is used to tell regular client requests apart from internal cluster requests:
sub vcl_init {
cluster_opts.set("token", "secret");
}
Step 4: Set req.backend_hint or bereq.backend to your origin backend or director:
sub vcl_backend_fetch {
set bereq.backend = origin;
}
Final VCL:
vcl 4.1;
import activedns;
include "cluster.vcl";
backend origin { .host = "ip:port"; }
sub vcl_init {
new cluster_group = activedns.dns_group("varnish.nodes");
cluster.subscribe(cluster_group.get_tag());
cluster_opts.set("token", "secret");
}
sub vcl_backend_fetch {
set bereq.backend = origin;
}
To validate the configuration, the following steps can be taken:
Step 1: Log into any of the Varnish nodes and execute the following command:
varnishstat -1 -f 'KVSTORE.cluster_stats.*'
A set of cluster metrics should appear in the output.
Step 2: Log into any of the Varnish nodes and execute the following command:
varnishadm backend.list -p
The output should contain one backend for each cluster node, including the node itself. Make sure all cluster nodes are marked as healthy.
Step 3: Enable the X-Cluster-Trace response header by setting the trace option to true:
sub vcl_init {
cluster_opts.set("trace", "true");
}
Step 4: Use curl -I to send a request to any of the cluster nodes. Make sure to request a cacheable object that is not currently in the cache. The response should contain an X-Cluster-Trace header that shows the request’s path through the cluster. The header may look like any of the following:
v1->MISS, origin: The node has determined itself to be the primary node for this object and fetched it from the origin. Requesting the same object from the v2 Varnish node should then result in a v2->MISS, v1->HIT trace.
v1->MISS->RETRY(1), origin: The node has likely self-identified, and its self_identified counter should now be 1.
v1->MISS, v2->MISS, origin: The request was autosharded to v2, which fetched the object from the origin.
Here, v1 and v2 are the hostnames of two Varnish nodes (or the server identity if Varnish was started with -i), and origin is the name of the VCL backend or director representing the origin server.
The token is used to prove cluster membership for requests from one node to another. It must be set to the same value on all nodes in the same cluster.
It is recommended to set this to a hard-to-guess string:
sub vcl_init {
cluster_opts.set("token", "correct horse battery staple");
}
The number of times a node will retry a backend fetch to other nodes in the cluster before going to the origin. The default max_retries value of 4 means that with a fallback value of 3, failed fetches will automatically retry to other cluster nodes up to three times before making a final fetch attempt to origin.
The default value is 3. Setting this parameter to 0 makes the cluster nodes immediately retry to the origin when the fetch to another cluster node fails:
sub vcl_init {
cluster_opts.set("fallback", "3");
}
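As an illustration of the 0 setting described above, the following is a minimal sketch that uses only the cluster_opts.set() call already shown in this section. It assumes cluster.vcl has been included earlier in the VCL:

```vcl
sub vcl_init {
    # fallback = 0: a failed fetch to another cluster node is retried
    # against the origin immediately, with no further cluster attempts.
    cluster_opts.set("fallback", "0");
}
```

Whether this is preferable to the default depends on how much extra origin traffic the deployment can tolerate when cluster nodes fail.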
Determines whether or not to return an X-Cluster-Trace response header to the client. The header provides information about the path a request took through the cluster, and can be useful for testing and troubleshooting.
It is normally recommended to leave this setting at its default value (false) to avoid exposing request handling to external clients:
sub vcl_init {
cluster_opts.set("trace", "false");
}
Determines the number of primary nodes for each object. An object’s primary node is responsible for fetching it from the origin. The default value of 1 means that each object has exactly one primary node in the cluster, ensuring that the object is fetched only once from the origin.
Setting this value to 2 means that any given object has two primary nodes, and may be fetched from the origin by either node. Requests for an object to non-primary nodes are load balanced over the two primary nodes. This may reduce the strain on cluster nodes in extreme cases, at the cost of duplicate requests to origin.
It is normally recommended to leave this setting at its default value (1):
sub vcl_init {
cluster_opts.set("primaries", "1");
}
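For comparison, a minimal sketch of the two-primary configuration described above might look like this. It relies only on the cluster_opts.set() call from this section and assumes cluster.vcl is already included:

```vcl
sub vcl_init {
    # primaries = 2: each object gets two primary nodes, and either of
    # them may fetch it from the origin. Requests arriving at other
    # nodes are load balanced over the two primaries.
    cluster_opts.set("primaries", "2");
}
```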
Health checks can be enabled between cluster nodes by adding probes to the cluster backend definition. For the health checks to succeed, a synthetic 200 response can be added to sub vcl_recv.
Step 1: Define a probe:
probe cluster_probe {
.url = "/health";
}
Step 2: Assign the probe to the cluster nodes.
For static clusters:
backend node_a { .host = "ip:port"; .probe = cluster_probe; }
backend node_b { .host = "ip:port"; .probe = cluster_probe; }
backend node_c { .host = "ip:port"; .probe = cluster_probe; }
For dynamic clusters:
sub vcl_init {
new cluster_group = activedns.dns_group("varnish.nodes");
cluster_group.set_probe_template(cluster_probe);
cluster.subscribe(cluster_group.get_tag());
}
Step 3: Define the health check endpoint at the top of sub vcl_recv:
sub vcl_recv {
if (req.url == "/health") {
return (synth(200));
}
}
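The probe in Step 1 relies on Varnish’s default probe timing. If explicit timing is wanted, a sketch using the standard VCL probe attributes could look like the following. The specific values are illustrative assumptions, not recommendations from this document, and the definition would replace the one in Step 1 rather than be added next to it:

```vcl
probe cluster_probe {
    .url = "/health";   # endpoint answered by the synth(200) in vcl_recv
    .interval = 5s;     # assumed value: time between probes
    .timeout = 1s;      # assumed value: time allowed for each probe
    .window = 5;        # assumed value: number of recent probes considered
    .threshold = 3;     # assumed value: probes in the window that must succeed
}
```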
TLS can be enabled between cluster nodes the same way as with regular backends.
For static clusters:
backend node_a { .host = "ip:port"; .ssl = 1; }
backend node_b { .host = "ip:port"; .ssl = 1; }
backend node_c { .host = "ip:port"; .ssl = 1; }
For dynamic clusters:
sub vcl_init {
new cluster_group = activedns.dns_group("varnish.nodes:443");
cluster.subscribe(cluster_group.get_tag());
}
Any request can be marked to skip autosharding and go directly to the origin in case of a cache MISS. This is done by setting the X-Cluster-Skip header to true in sub vcl_recv:
sub vcl_recv {
if (req.url == "/foo") {
set req.http.X-Cluster-Skip = "true";
}
}
Any request can also be marked to skip receiving accounting keys. This is done by setting the X-Cluster-Skip-Accounting header to true in sub vcl_recv:
sub vcl_recv {
if (req.url == "/foo") {
set req.http.X-Cluster-Skip-Accounting = "true";
}
}
Cluster storage capacity can be scaled horizontally with storage sharding. By using the autosharding algorithm to selectively persist objects to disk, the total storage capacity is increased with each node added to the cluster.
To implement storage sharding, import the mse VMOD and add the following snippet to sub vcl_backend_response:
import mse;
sub vcl_backend_response {
if (bereq.backend == cluster.backend() && !cluster.self_is_next(1)) {
# Storage sharding: Mark the response as memory-only
mse.set_stores("none");
}
}
By making objects memory-only on all but the primary node, we ensure that any given object is persisted to disk on only one node in the cluster. This type of sharding is called full sharding.
Partial sharding is also possible by changing the cluster.self_is_next() argument from 1 to 2 (or more). With a value of 2, each object is persisted on both its primary and secondary node. The cluster can then lose any node without significantly increasing traffic to origin, but the total cluster storage capacity is reduced by 50% compared to full sharding.
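The partial sharding variant described above can be sketched by changing only the cluster.self_is_next() argument in the full sharding snippet. As before, this assumes cluster.vcl is included and that MSE is configured with persistent stores:

```vcl
import mse;

sub vcl_backend_response {
    if (bereq.backend == cluster.backend() && !cluster.self_is_next(2)) {
        # Partial sharding: only the primary and secondary nodes persist
        # the object. Every other node keeps it memory-only.
        mse.set_stores("none");
    }
}
```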
Cache invalidation can be performed as normal in a cluster, with one significant exception: It must be run twice. Whether PURGEs, BANs, or yKey purges are used, two rounds of invalidation must be performed to guarantee that all matching objects in the cluster have been evaluated.
The first invalidation round will invalidate all primary and non-primary objects currently cached in the cluster. When the first round has been completed, the second round will invalidate all non-primary objects that were created during the first invalidation round. It is important to wait for the first round to complete before starting the second round.
For examples on how to invalidate cache, see the cache invalidation tutorial.
The following varnishstat counters are created by cluster.vcl:
error_token: Bad cluster tokens received. This is likely caused by cluster nodes not being configured with the same token, or by overlap between two clusters.
error_fallback_limit: Cluster fallback limit exceeded. Incremented when a backend transaction reaches the cluster fallback limit. This indicates issues with getting successful responses from the other cluster nodes.
error_unhealthy: No healthy nodes in the cluster. Incremented when autosharding was not possible due to all the nodes in the cluster being marked unhealthy. This is likely caused by health probes failing.
skipped: Autosharding was skipped.
passed: Cluster was bypassed. Incremented for PASS requests, which skip autosharding and go directly to the origin.
hitmiss: Cluster was bypassed. Incremented for Hit-For-Miss requests, which skip autosharding and go directly to the origin.
self_identified: Node has self-identified. Set to 1 when the node has identified itself with a backend in the cluster director. Not automatically set to 1 if cluster.set_identity() has been used instead of self-identification.
These can be observed by running the following varnishstat command:
varnishstat -1 -f 'KVSTORE.cluster_stats.*'
cluster.vcl uses the accounting VMOD to make it easier to monitor the cache efficiency of a cluster. An accounting namespace called cluster is automatically created and used for every request. The following keys may be added to a cluster transaction:
client_deliver: Added in sub vcl_deliver when a response is being delivered to a real client.
cluster_deliver: Added in sub vcl_deliver when a response is being delivered to a cluster node.
cluster_backend_response: Added in sub vcl_backend_response when a response has been received from a cluster node.
origin_backend_response: Added in sub vcl_backend_response when a response has been received from the origin.
These can be observed by running the following varnishstat command:
varnishstat -1 -f 'ACCG.cluster.*'
The accounting metrics can, for example, be used to calculate the cluster-wide cache HIT rate:
(client_deliver.client_hit_count + cluster_deliver.client_hit_count) /
client_deliver.client_req_count
This calculates the number of client requests that resulted in a cache HIT on either the first or second hop in the cluster divided by the total number of client requests received by the cluster. To get a more complete picture of cluster request handling, the MISS, SYNTH, PASS, and PIPE rates should also be calculated in a similar way.
If a namespace has already been set when sub vcl_recv is entered in cluster.vcl (for example in a shared deployment with labeled VCLs), keys are added to that namespace instead of cluster.
Requests can be exempted from accounting with the X-Cluster-Skip-Accounting header.
The X-Cluster-Trace response header contains useful information about a given request’s path through the cluster. It is based on each node’s server.identity value, which defaults to the server’s hostname, but may be changed with the varnishd -i command line argument. The trace header is not transmitted to clients by default, but this can be changed by setting the trace cluster configuration parameter to true.
cluster.vcl logs are prefixed with Cluster: and are logged with the VCL_Log VSL tag. They can be observed with the following varnishlog command:
varnishlog -g request -q 'VCL_Log ~ "^Cluster:"' -i VCL_Log
When using a dynamic cluster, backend creation and destruction events can be observed with the following command:
varnishlog -g raw -q 'VCL_Log ~ "^udo:"' -i VCL_Log
And DNS events can be observed with the following command:
varnishlog -g raw -q 'ADNS ~ "^libadns:"' -i ADNS
A consistent hashing algorithm is used to assign each client request to a primary node in the cluster. The primary node for a request is responsible for fetching it from the origin and optionally persisting it to disk. When a node receives a request it is not the primary for, it will fetch the object from the primary node. We call this autosharding.
Autosharding has two major benefits:
A node will not fetch from the primary node if:
The X-Cluster-Skip request header is set to true.
The fallback limit has been reached.
The request hash is by default based on the request’s Host header and req.url, but this can be changed in sub vcl_hash or overridden with cluster.set_hash().
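As a hedged illustration of changing the hash in sub vcl_hash, the sketch below adds one extra value to the hash using standard VCL. The X-Device-Type header is hypothetical and is assumed to be normalized earlier in sub vcl_recv. The sketch also assumes, as the text above states, that the hash data set in sub vcl_hash is what autosharding uses:

```vcl
sub vcl_hash {
    # Hypothetical header: variants of the same URL are hashed separately,
    # and may therefore be assigned to different primary nodes.
    hash_data(req.http.X-Device-Type);

    # No return here: the built-in vcl_hash runs afterwards and still
    # adds req.url and the Host header to the hash.
}
```

Any value added to the hash must be the same on every node that handles the request, otherwise nodes may disagree on which node is the primary for it.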
Each node in the cluster will automatically discover which backend in the cluster director corresponds to itself through a procedure we call Self-Identification. This procedure happens each time the VCL is reloaded.
Before a node has established its own identity, it will autoshard all requests like normal, but each fetch includes an X-Cluster-Identifier header. This identifier is a randomly generated string associated with one of the backends in the cluster director. When the node eventually receives an identifier that it has generated itself, it knows which backend represents its own identity.
From this point on, whenever the autosharding algorithm determines the primary backend for a given request to be the node itself, the node knows to fetch directly from the origin instead of looping back on itself.
Q: Can Slicer be used with cluster.vcl?
A: Yes, Slicer can be enabled like normal and will take advantage of autosharding in a cluster. The hash of a Slicer subrequest is based on the top level request, so all Slicer subrequests for the same object are autosharded to the same primary node.
Q: Can ESI be used with cluster.vcl?
A: Yes, ESI can be enabled like normal and will take advantage of autosharding in a cluster. Unlike Slicer subrequests, each ESI subrequest is hashed on its own request hash rather than on that of the top level request. Make sure your VCL does not set resp.do_esi to true in sub vcl_deliver.
Q: Can cluster.vcl be used with VCL labels?
A: Yes, each labeled VCL can choose to include cluster.vcl and define the cluster as normal. It is best practice to define a different cluster token for each labeled VCL, as it makes it easier to discover misconfigurations in label routing. The root VCL does not need to include cluster.vcl.
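A rough sketch of such a layout, under stated assumptions, is shown below. The label names (label_a, label_b), the host names, and the token strings are all hypothetical. The labels are assumed to have been created beforehand with the vcl.label CLI command:

```vcl
# Root VCL (does not include cluster.vcl): route by Host header.
sub vcl_recv {
    if (req.http.Host == "a.example.com") {
        return (vcl(label_a));
    }
    if (req.http.Host == "b.example.com") {
        return (vcl(label_b));
    }
}

# In the VCL loaded under label_a, after include "cluster.vcl";
#   sub vcl_init { cluster_opts.set("token", "token-for-label-a"); }
# In the VCL loaded under label_b, after include "cluster.vcl";
#   sub vcl_init { cluster_opts.set("token", "token-for-label-b"); }
```

With distinct tokens, a request routed to the wrong label on a peer node should show up in the error_token counter instead of being served silently.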
Q: How are background fetches performed between cluster nodes?
A: When a client request hits a stale object in cache on a non-primary node, a background fetch is kicked off as normal to the primary node. For this fetch, any stale object from the primary node is ignored. This happens automatically, and avoids revalidating a stale object with another stale object.
Q: Will PASS requests be autosharded?
A: No, any PASS request will go directly to the origin.
Q: Do cluster nodes communicate through a side-channel?
A: All communication between cluster nodes happens over the regular HTTP(s) listening endpoints (varnishd -a or varnishd -A). There is no side-channel communication outside normal request handling.
Q: Does cluster.vcl increase memory usage?
A: Cluster headers increase workspace usage by a small amount, but memory usage for the system as a whole should not be affected significantly.
Q: Does cluster.vcl increase network usage?
A: Network usage will typically stay the same for each Varnish node when clustering is enabled, but network traffic to the origin should decrease. Network usage may increase if the cluster has a low cache HIT rate.
cluster.vcl is a versioned VCL shipped with Varnish Enterprise. The version is stated at the top of the VCL.