vmod_slicer
vmod_slicer lets you enable caching of partial responses. Instead of asking the backend for the full response, it splits the object into smaller pieces, with Varnish issuing Range requests to the backend. The range fetched is based on the client’s Range header, ensuring we only fetch what is necessary to satisfy the client’s requested range.
The initial fetch where slicing is enabled will result in what we refer to as a segment meta object. This object will not store any response body bytes, and is merely used as a structure for cache lookup and verification of future range requests.
Hitting a segment meta object will trigger a special delivery mode that issues subrequests on its behalf. Each of these subrequests will ask for a specific range of the response. The segments will be stitched together on delivery, presented as a single contiguous response to the client.
Each subrequest will get a full execution of the VCL. In varnishlog this will be presented as a set of linked subrequests. Executing varnishlog with the -g request option will present the top-level request, all subrequests and any fetches logically grouped together.
Each segment subrequest will also contribute to varnishstat counters: a single client request may lead to a number of cache hits and misses, depending on which of the segments overlapping the client’s requested range are currently in cache.
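Because each segment subrequest runs the full VCL, it can be useful to distinguish these internal subrequests from real client requests in your own VCL logic. The following is a minimal sketch, assuming you only want custom log entries for client-facing requests; it uses slicer.is_sub() (documented below) together with std.log:

import slicer;
import std;

sub vcl_recv {
    # Segment subrequests also run vcl_recv; only log the
    # client-facing request to keep custom log entries readable.
    if (!slicer.is_sub()) {
        std.log("client request: " + req.url);
    }
}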
The Slicer VMOD has two modes of operation: it can be invoked from either vcl_backend_fetch or vcl_backend_response. The vcl_backend_fetch mode has the potential for slightly better latency, but it operates with limited knowledge at the point where it is enabled.
When invoked from vcl_backend_fetch, the Slicer VMOD will turn the GET request into a HEAD. If we then find that the response is eligible for segmented fetch, we will issue separate requests for the relevant parts. Since the initial request was a HEAD, the connection can be reused as usual for future backend fetches.
import slicer;

sub vcl_backend_fetch {
    slicer.enable();
}
If we later find during processing that this particular response cannot be sliced, the fetch will result in a 503 error.
Recovering from a failed enable() can in this case be accomplished via a retry in vcl_backend_error. The following VCL uses slicer.failed() to show a possible solution:
import slicer;

sub vcl_backend_fetch {
    if (!slicer.failed()) {
        slicer.enable();
    }
}

sub vcl_backend_error {
    if (slicer.failed()) {
        return (retry);
    }
}
This VCL will first attempt slicing and then do a retry with slicing disabled.
If invoked from vcl_backend_response, the Slicer VMOD will inspect the response headers to see if the response is eligible for segmented caching. If it is, the backend connection will be closed without receiving the response body bytes. Further fetching of the body will be handled in separate slicer subrequests.
The key advantage of enabling the slicer in vcl_backend_response is that we have the full response header set available to us, which not only lets us know immediately if the object is a candidate for slicing, but also offers the VCL user more information in deciding whether a particular object should be sliced.
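As an illustration of using that extra information, here is a minimal sketch of a hypothetical policy that only slices responses larger than 100 MB, based on the Content-Length header (the threshold and the use of std.integer are example choices, not part of the VMOD):

import slicer;
import std;

sub vcl_backend_response {
    # Hypothetical policy: only slice large responses. The 100 MB
    # threshold (104857600 bytes) is an arbitrary example value.
    if (std.integer(beresp.http.Content-Length, 0) > 104857600) {
        if (!slicer.enable()) {
            return (fail);
        }
    }
}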
On the other hand, since this mode issues a plain GET request, the backend is allowed to keep sending the response body until the socket buffers are full, even though the actual body fetch is deferred to separate slicer subrequests. For a response eligible for slicing, this can amount to several MB of transfer before the connection is closed, an overhead that is not accounted for in varnishstat since it happens in kernel space.
import slicer;

sub vcl_backend_response {
    if (!slicer.enable()) {
        return (fail);
    }
}
Exception handling is done explicitly in the VCL. In the example above, failure to enable slicing is handled via a return (fail), which will result in a 503.
An alternative way of handling a failure is to treat it as a pass, to avoid caching a full-sized response. If the objective is to also limit the consumption of transient memory, we recommend enabling the transit_buffer feature, which limits the amount of readahead and thus the buffer size required for a pass.
import slicer;

sub vcl_backend_response {
    if (!slicer.enable()) {
        set beresp.transit_buffer = 5M;
        return (pass);
    }
}
There are a few preconditions that need to be satisfied. For a response to be eligible for slicing, the following requirements apply:

- No Content-Encoding header.
- A Content-Length header.
- A Last-Modified or an ETag header.

The presence of a request body will prevent slicing from being enabled.
Additionally, if the request was processed as a PASS (including “Hit-For-Miss”, “Hit-For-Pass” and VCL return (pass)), slicing will not be enabled.
For passed requests, the client’s Range header will be maintained on the backend request, and Varnish can thus issue Range requests to the backend without help from the slicer.
Note that the presence of a client Range request header is not a condition for enabling the slicer. To enable the slicer only when a Range header is present, see the VCL usage example below.
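Returning to the header requirements listed above, the following sketch only attempts slicing when those headers are present and otherwise falls back to a pass with a bounded transit buffer. The explicit checks are assumptions mirroring the list above; slicer.enable() performs its own verification regardless:

import slicer;

sub vcl_backend_response {
    # Only attempt slicing when the response carries the headers it
    # needs; otherwise pass with a limited transit buffer.
    if (!beresp.http.Content-Encoding &&
        beresp.http.Content-Length &&
        (beresp.http.Last-Modified || beresp.http.ETag)) {
        if (!slicer.enable()) {
            return (fail);
        }
    } else {
        set beresp.transit_buffer = 5M;
        return (pass);
    }
}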
The lifetime of a single segment strictly follows the lifetime of the meta object. The TTL of the meta object when it was first inserted will apply to all segments, i.e. all segments belonging to a response will expire at the same time.
This also applies when it comes to invalidation: An invalidation of a segment meta object will also wipe all of its segments. Any form of invalidation is supported (e.g. ban, purge, ykey).
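As a minimal sketch, a conventional purge setup is enough to invalidate a sliced response as a whole (the PURGE method convention is an example; a real setup should also restrict it with an ACL):

sub vcl_recv {
    # Purging the segment meta object also removes all of its segments.
    if (req.method == "PURGE") {
        return (purge);
    }
}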
Replication of sliced objects is currently not supported and is explicitly disabled.
The Slicer VMOD fully supports the Varnish Massive Storage Engine (MSE), including persisted mode. Slicer can be enabled with an already populated persisted MSE store, without any need to reinitialize the MSE configuration.
If Slicer has been enabled and Varnish is later downgraded to a previous version without Slicer support, some manual steps need to be taken.

In this case we will end up with segment meta objects and partial response objects in our cache, which the older Varnish version will not be able to make sense of. The result is that Varnish will serve empty responses for the requests that would previously have been handled by the Slicer.

To remedy this, the following steps are required in the event of a downgrade where one wishes to maintain the MSE persisted store:
$ sudo varnishadm 'ban obj.http.slicer-meta ~ .'
$ sudo varnishadm 'ban obj.http.slicer-sub ~ .'
VMOD Slicer is available in Varnish Cache Plus 6.0.9r1.
The following is an example of how you may integrate the Slicer VMOD into your setup. This example enables slicing of responses in the case where the client presented a Range request header. It also implements transit_buffer as a fallback in case slicing was not possible.
import slicer;

sub vcl_recv {
    if (req.http.Range) {
        set req.http.is-range = "1";
    }
}

sub vcl_backend_response {
    if (bereq.http.is-range && !slicer.enable()) {
        set beresp.transit_buffer = 5M;
        return (pass);
    }
}
BOOL enable(BYTES size = 5242880)
Enables segmented fetch. Varnish will fetch at most size bytes per fetch. If no size is provided, a default of 5MB will be used. Callable from vcl_backend_fetch and vcl_backend_response. A false return value indicates that slicing could not be performed for this fetch.
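For instance, to use a smaller segment size than the default (the 1 MB value here is just an example):

import slicer;

sub vcl_backend_fetch {
    # Fetch the object in segments of at most 1 MB each.
    slicer.enable(size = 1M);
}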
BOOL failed()
Returns true if a previous attempt to enable the slicer failed, otherwise false. This can be used for implementing VCL error handling. See the example above.
BOOL is_top()
Tells us if the ongoing transaction is a top-level segmented request. This is true if a previous call to enable() for this request succeeded and we are now initiating a segmented fetch. Callable from vcl_hit, vcl_deliver and vcl_backend_response.
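As an example, a debug response header could be set on top-level segmented deliveries (the header name is just an illustration):

import slicer;

sub vcl_deliver {
    # Mark responses that are assembled from cached segments.
    if (slicer.is_top()) {
        set resp.http.X-Sliced = "true";
    }
}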
BOOL is_sub()
Tells us if the ongoing transaction is a partial fetch subrequest.
Callable from all VCL subroutines.
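One possible use, sketched under the assumption that is_sub() also identifies segment fetches on the backend side (as suggested by it being callable from all VCL subroutines), is to tune backend fetch parameters for segment fetches:

import slicer;

sub vcl_backend_fetch {
    # Segment fetches only cover a bounded range, so a tighter first
    # byte timeout than for regular fetches may be appropriate.
    if (slicer.is_sub()) {
        set bereq.first_byte_timeout = 10s;
    }
}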