Before we can start personalizing the caching experience, we need to cover some fundamentals.
An important one is understanding how to access the body of an HTTP request or response.
A request body may contain a field or a parameter that we will use to identify the user. This request body information can also be used to create a cache variation.
We can also inspect and rewrite the response body. This means we can modify the output on the edge without having to access the origin application.
Let’s start with the request body. A GET request doesn’t usually
carry a request body. Although it is theoretically possible,
Varnish just strips it off before sending the request to the origin.
This means request bodies are mostly found in requests that use a POST,
PUT, PATCH, or maybe even a DELETE method.
Unfortunately there is no req.body variable in VCL, and bereq.body
can only be unset.
Accessing the request body starts by calling std.cache_req_body().
This function call is required to ensure that the request body is read
from the client and stored in memory. Otherwise, the request body could
only be read once and could not be accessed in other parts of the VCL.
The function takes an argument that defines the maximum size of a request body that may be cached.
The argument is of the BYTES type, as illustrated below:
vcl 4.1;
import std;
sub vcl_recv {
	if(std.cache_req_body(1KB)) {
		std.log("Request body accessible");
	} else {
		std.log("Request body not accessible");
	}
}
This means the request body is only accessible if the size is smaller than 1 KB.
Once std.cache_req_body() has been called, there is a variety of
VMODs we can use to leverage this information.
Storing the request body in memory by calling std.cache_req_body also
makes sense when you restart or retry a transaction. Because the
request body needs to be sent to the origin multiple times, we cannot
afford losing this information. That’s why we cache it.
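When the body cannot be cached, it is often safer to fail early than to continue with a body that cannot be replayed. The following is a minimal sketch of that idea, not a definitive policy: the list of methods, the 1 KB limit and the 413 status are assumptions chosen for illustration.

```vcl
vcl 4.1;
import std;

# Backend definition omitted, as in the other examples.

sub vcl_recv {
	# Only buffer bodies for methods that normally carry one.
	if (req.method == "POST" || req.method == "PUT" || req.method == "PATCH") {
		if (!std.cache_req_body(1KB)) {
			# The body could not be cached, for example because it
			# exceeds the limit. Rejecting the request is an assumed policy.
			return (synth(413, "Payload Too Large"));
		}
	}
}
```

Whether a synthetic error or a different fallback is the right reaction depends on the application.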
The easiest way to access the request body is by using
vmod_bodyaccess. It is an open source VMOD that is part of the
Varnish Software VMOD collection. See chapter 5 to refresh your
memory.
vmod_bodyaccess has a pretty limited API and is able to hash the
request body, calculate the length of the request body, search for
strings in the body, and log the request body to the VSL.
Let’s store the request body in the hash first. Here’s how you do this:
vcl 4.1;
import std;
import bodyaccess;
sub vcl_recv {
	std.cache_req_body(10KB);
	set req.http.x-method = req.method;
	return(hash);
}
sub vcl_hash {
	bodyaccess.hash_req_body();
}
sub vcl_backend_fetch {
	set bereq.method = bereq.http.x-method;
}
Because built-in VCL doesn’t allow POST calls to be served from
cache, we must override this behavior in vcl_recv. It starts with
explicitly calling return(hash) to bypass standard behavior.
Another trick we must do is store the original request method in a
custom request header. If we try to cache a POST call, Varnish
strips the request body and turns the request into a GET request.
That’s why we reset the request method inside vcl_backend_fetch.
It’s also pretty obvious that we’re only caching request bodies that are
at most 10 KB in size. The actual caching happens by adding
bodyaccess.hash_req_body() to vcl_hash.
When you run the following command, you’ll get the output specifically
for the key=value post data:
curl -XPOST -d "key=value" localhost
And when you change the post data to key=otherValue, the output will
be different:
curl -XPOST -d "key=otherValue" localhost
Just look at the Age header. It will tell you how long the object has been in cache. Changes in post data will result in cache variations, which will result in different values for the Age header.
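The request body caching example hashes the body of every request and sends every request to the cache lookup. A more cautious variant, sketched here under the assumption that only POST requests should be treated this way, limits the behavior to that method and discards any x-method header supplied by the client:

```vcl
vcl 4.1;
import std;
import bodyaccess;

# Backend definition omitted, as in the other examples.

sub vcl_recv {
	# Never trust a client-supplied copy of our internal header.
	unset req.http.x-method;

	if (req.method == "POST" && std.cache_req_body(10KB)) {
		set req.http.x-method = req.method;
		return (hash);
	}
	# Everything else falls through to the built-in VCL.
}

sub vcl_hash {
	# Only add the body to the hash for requests we flagged in vcl_recv.
	if (req.http.x-method) {
		bodyaccess.hash_req_body();
	}
}

sub vcl_backend_fetch {
	# Restore the original method for flagged requests only.
	if (bereq.http.x-method) {
		set bereq.method = bereq.http.x-method;
	}
}
```

Regular GET requests keep their usual hash, so enabling POST caching does not affect how they are looked up.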
The bodyaccess.rematch_req_body() function allows us to inspect the
request body, making it possible to reject unwanted requests and make
request body caching conditional. Here’s an example:
vcl 4.1;
import std;
import bodyaccess;
sub vcl_recv {
	std.cache_req_body(10KB);
	if(bodyaccess.rematch_req_body("key=[^=]+") == 1) {
		set req.http.x-method = req.method;
		return(hash);
	}
}
sub vcl_backend_fetch {
	if(bereq.http.x-method) {
		set bereq.method = bereq.http.x-method;
	}
}
sub vcl_hash {
	bodyaccess.hash_req_body();
}
In this example, we only cache requests where the request body contains
a field named key that has a value. If it is set to key=value, then the
response is cached, with the request body acting as the cache variation.
The following curl call is cacheable:
curl -XPOST -d "key=value" localhost
And because the following curl call doesn’t match the key=[^=]+
regular expression, it is not cacheable:
curl -XPOST -d "key" localhost
If you’re interested in what the actual length of the request body was,
you can use the bodyaccess.len_req_body() function.
And if you want the request body to be visible in varnishlog, you can
leverage
bodyaccess.log_req_body(STRING prefix = "", INT length = 200), which
takes a prefix and a max line length argument, so the body could be
split up across multiple lines.
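A minimal sketch of both functions in action could look like the VCL below. The x-body-length header name and the BODY: prefix are assumptions made for this illustration.

```vcl
vcl 4.1;
import std;
import bodyaccess;

# Backend definition omitted, as in the other examples.

sub vcl_recv {
	if (std.cache_req_body(10KB)) {
		# Store the body length in a (hypothetical) request header,
		# which makes it visible in the logs.
		set req.http.x-body-length = bodyaccess.len_req_body();

		# Log the body to VSL, prefixed with "BODY: " and split into
		# lines of at most 200 characters.
		bodyaccess.log_req_body("BODY: ", 200);
	}
}
```

Keep in mind that a request header set this way is also forwarded to the origin unless it is removed again.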
A more feature-rich alternative to vmod_bodyaccess is vmod_xbody. It’s
part of Varnish Enterprise, and it’s a very useful tool in your
edge-computing toolbox.
vmod_xbody is capable of caching the request body, just like
vmod_bodyaccess, but it can also access the response body. And more
importantly, the module is also capable of modifying request and
response bodies.
Here’s the vmod_xbody equivalent of request body caching:
vcl 4.1;
import xbody;
import std;
import blob;
sub vcl_recv {
	std.cache_req_body(10KB);
	if(xbody.get_req_body() ~ "key=[^=]+") {
		set req.http.x-method = req.method;
		set req.http.x-hash = blob.encode(encoding=BASE64,blob=xbody.get_req_body_hash(md5));
		return(hash);
	}
}
sub vcl_backend_fetch {
	if(bereq.http.x-method) {
		set bereq.method = bereq.http.x-method;
	}
}
sub vcl_hash {
	hash_data(req.http.x-hash);
}
You may have noticed that we execute the xbody.get_req_body_hash()
function from within vcl_recv. That’s because this function is only
accessible from that subroutine. The return type of this function is a
BLOB, so we need vmod_blob to turn it into a string.
In the end, we can use the x-hash request header to transport the
request body hash to the vcl_hash subroutine.
Because xbody.get_req_body() returns a string, we can make our cache
variations more efficient. The vmod_bodyaccess example used the
entire request body as a cache variation. But if we use regsub() we
can choose the exact part of the request body we want to vary on.
Here’s an example where we only create variations on the value of the
key field:
vcl 4.1;
import xbody;
import std;
sub vcl_recv {
	std.cache_req_body(10KB);
	if(xbody.get_req_body() ~ "(^|.+&)key=([^\=\&]+)(&.+|$)") {
		set req.http.x-method = req.method;
		set req.http.x-hash = regsub(xbody.get_req_body(),"(^|.+&)key=([^\=\&]+)(&.+|$)","\2");
		return(hash);
	}
}
sub vcl_backend_fetch {
	if(bereq.http.x-method) {
		set bereq.method = bereq.http.x-method;
	}
}
sub vcl_hash {
	hash_data(req.http.x-hash);
}
You’ve probably spotted that the regular expression we use to match the
key is more complicated. That’s true, but it’s also more
intelligent: key=value is the typical pattern we try to match, and
without the extra regex logic, mykey=value would also match.
The following curl call would create a variation for the term value:
curl -XPOST -d "key=value" localhost
The following curl call, which has a different request body, would
also hit that same variation:
curl -XPOST -d "foo=bar&key=value" localhost
By being more deliberate about the way we create request body variations, we can significantly increase our hit rate.
And because our regular expression is stricter, the following curl
call would miss that variation and result in a cache miss:
curl -XPOST -d "mykey=value" localhost
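To check which value ended up as the cache variation, the extracted value can be exposed in a response header while testing. This is only a debugging sketch that assumes it is combined with the VCL above; returning x-hash to the client is an assumption for illustration and probably not something to keep in production.

```vcl
vcl 4.1;

# Backend definition omitted; meant to be combined with the example above.

sub vcl_deliver {
	# Debugging aid: expose the value that served as the cache variation.
	if (req.http.x-hash) {
		set resp.http.x-hash = req.http.x-hash;
	}
}
```

Running the curl calls with the -i flag then shows the x-hash header next to the Age header, which makes it easier to see which requests share a variation.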
vmod_json is a Varnish Enterprise module that can parse JSON data
and can return individual JSON fields. From a request body point of
view, the json.parse_req_body() function is of particular interest to
us.
Let’s revisit the earlier examples, and try to cache POST requests
using the function.
vcl 4.1;
import json;
import std;
sub vcl_recv {
	std.cache_req_body(10KB);
	json.parse_req_body();
	if (json.is_valid() && json.is_object() && json.get("key")) {
		set req.http.x-method = req.method;
		return(hash);
	}
}
sub vcl_backend_fetch {
	if(bereq.http.x-method) {
		set bereq.method = bereq.http.x-method;
	}
}
sub vcl_hash {
	hash_data(json.get("key"));
}
The json.parse_req_body() function in this example will parse the
request body as JSON and store the result in a new JSON context. Via
json.get() we can fetch individual values at a later stage.
Via json.is_valid() we can check whether or not valid
JSON was parsed. Via json.is_object() we can check whether or not
the JSON data is an object. And finally, we check whether or not the
key property is found inside the JSON object by using
json.get("key").
If all of these conditions apply, we can look the object up in cache,
even if the request is a POST request. If not, the built-in VCL will
handle it from there.
And just like in the previous example, we only create variations on
specific fields, in this case the value of the key property.
Here’s a curl call where the payload matches the criteria, which
results in this POST call being cached:
curl -XPOST -d "{ \"key\": \"value\"}" localhost
Even though the following example has a different JSON request
payload, it will also match the initial variation because the key
property has the same value:
curl -XPOST -d "{ \"key\": \"value\", \"foo\": \"bar\" }" localhost
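The same checks can also be used to reject malformed payloads before they reach the origin. The sketch below only relies on the vmod_json calls shown above; the Content-Type check and the 400 status are assumptions chosen for illustration.

```vcl
vcl 4.1;
import json;
import std;

# Backend definition omitted, as in the other examples.

sub vcl_recv {
	# Only inspect requests that claim to carry JSON (assumed convention).
	if (req.method == "POST" && req.http.Content-Type ~ "^application/json") {
		if (std.cache_req_body(10KB)) {
			json.parse_req_body();
			if (!json.is_valid()) {
				# Refuse payloads that cannot be parsed as JSON.
				return (synth(400, "Invalid JSON"));
			}
		}
	}
}
```

Requests that pass the check continue through the rest of vcl_recv, where the caching logic from the previous example could be applied.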
Analyzing and changing the response body is where it gets really exciting.
In very basic terms, you can change the response body by setting the
beresp.body and resp.body variables.
Unfortunately their usage is very restricted. resp.body can only be
set in vcl_synth, as illustrated below:
vcl 4.1;
sub vcl_recv {
	return(synth(200));
}
sub vcl_synth {
	set resp.body = "Welcome";
	return(deliver);
}
And beresp.body can only be set in vcl_backend_error:
vcl 4.1;
backend default none;
sub vcl_backend_error {
	set beresp.body = "Welcome";
	return(deliver);
}
Alternatively, the synthetic() function can be used to achieve the
same, and depending on the subroutine it is used in, either
beresp.body or resp.body will be set.
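As a small sketch, this is what the vcl_synth example could look like with synthetic() instead. The Content-Type header is an addition made for this illustration.

```vcl
vcl 4.1;

backend default none;

sub vcl_recv {
	return (synth(200));
}

sub vcl_synth {
	# Assumed addition: tell the client how to interpret the body.
	set resp.http.Content-Type = "text/plain; charset=utf-8";
	# In vcl_synth, synthetic() sets the client-side response body.
	synthetic("Welcome");
	return (deliver);
}
```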
Remember xbody? As mentioned, this VMOD can also inspect and modify the response body.
We promise to present really good examples where vmod_xbody and
vmod_edgestash are combined. But that’s for later; first let us show
you some really basic examples:
Imagine the following obnoxiously hypothetical response body:
Hello world
The following VCL example will replace world with the IP address
of the client:
vcl 4.1;
import xbody;
sub vcl_backend_response {
	xbody.regsub("Hello \w+","Hello " + client.ip);
}
The end result on our local computer would be:
Hello 192.168.16.1
We can also use the xbody.capture() function to capture values that we
can retrieve using xbody.get() and xbody.get_all() afterwards:
vcl 4.1;
import xbody;
import std;
sub vcl_backend_response {
	xbody.capture("name","Hello (\w+)","\1");
}
sub vcl_deliver {
	std.log("Name: " + xbody.get("name"));
}
Although the response body remains untouched when we use
xbody.capture, the captured value will be logged in VSL:
$ varnishlog -g raw -I VCL_Log:Name
	 32770 VCL_Log        c Name: world
Trust us: we’ll show you a more exciting example once we’ve introduced
you to vmod_edgestash.
Speaking of which, Edgestash is one of our favorite Varnish
Enterprise features, which is available through vmod_edgestash.
You’ve probably heard of Mustache, a simple handlebars-based templating language that originated in the JavaScript world. It has tons of implementations in other languages and is somewhat of an industry standard in terms of templating.
Edgestash is a module that processes Mustache handlebars. Basically, you have Mustache on the edge, or Edgestash, if you will.
The idea is that placeholders like {{variable}} are put into your
templates. The business logic of your application is responsible for
parsing the values into those placeholders.
An origin application can emit a placeholder for potentially non-cacheable, personalized content, and have Varnish cache the otherwise uncacheable page and populate it with its required value.
This value may be identified by a session cookie or authentication credentials. The basic business logic that identifies the user and collects the stateful information can be offloaded to Varnish. Edgestash will be responsible for assembling the bits and pieces and parsing them into a single HTTP response body.
Imagine that your origin application returns the following output:
Hello {{name}}
The {{name}} placeholder could then be replaced by the client IP
address using the following VCL code:
vcl 4.1;
import edgestash;
sub vcl_backend_response {
	if(beresp.http.edgestash) {
		edgestash.parse_response();
	}
}
sub vcl_deliver {
	if (edgestash.is_edgestash()) {
		edgestash.add_json({"
		{
			"name":""} + client.ip + {""
		}
		"});
		edgestash.execute();
	}
}
At first, it doesn’t seem more interesting than the xbody.regsub()
example. However, not only does Edgestash support the full Mustache
syntax, the parsing happens at delivery time in vcl_deliver instead
of at cache-insertion time in vcl_backend_response.
This means values could be injected on-the-fly. It’s also important to note that JSON is the basis of the parsing.
Also keep in mind that the previous VCL example only
processes Mustache handlebars when the response contains an
edgestash response header. This avoids wasting CPU cycles on
non-Mustache content.
This is the parsed JSON that is processed by Edgestash:
{
	"name":"192.168.16.1"
}
And this is the final output:
Hello 192.168.16.1
Manually composing a JSON string in edgestash.add_json() can be
clunky at times. A very elegant way to inject JSON is by using the
edgestash.add_json_url() function.
This function takes an HTTP endpoint as its first argument and expects the response to be JSON output. RESTful APIs are excellent candidates for these endpoints.
This allows you to split cacheable responses and stateful content into separate endpoints.
Here’s an example:
vcl 4.1;
import edgestash;
sub vcl_backend_response {
	if(bereq.url == "/api") {
		edgestash.index_json();
	} elseif(beresp.http.edgestash) {
		edgestash.parse_response();
	}
}
sub vcl_deliver {
	if (edgestash.is_edgestash()) {
		edgestash.add_json_url("/api");
		edgestash.execute();
	}
}
As long as the /api endpoint produces a JSON object that has a
name property, the value can be parsed into the placeholder.
If the JSON endpoint is located on another domain, you can use the
second argument to specify the hostname. This could end up being
edgestash.add_json_url("/api","api.example.com").
The edgestash.index_json() function inside vcl_backend_response will
index the JSON for faster processing when edgestash.execute() is
called.
The Mustache templating language does more than replace placeholders with actual values.
It can perform loops; it has conditionals; there are variables, expressions, and basic arithmetic.
Imagine the following JSON output, which represents a shopping cart:
[
  {
	"id": 1,
	"name": "Watch",
	"price": 25,
	"amount": 2
  },
  {
	"id": 2,
	"name": "Shoes",
	"price": 80,
	"amount": 1
  }
]
This is stateful data that depends on a session cookie. The curl
call that is required to retrieve the JSON could be the following:
curl -s -H"Cookie: PHPSESSID=9755a8b773f76bffeda28f746ac3957e" localhost/session
As you can see, the Cookie: PHPSESSID=9755a8b773f76bffeda28f746ac3957e
header is set to identify the user.
The goal is to turn this JSON data into the following HTML code:
<ul>
	<li>Watch: 2 x 25 EUR = 50 EUR</li>
	<li>Shoes: 1 x 80 EUR = 80 EUR</li>
</ul>
This means we have to find a way to list the product name for each item in the cart, but also the price and the product quantity.
The following Mustache syntax would be required to do the job:
<ul>
{{#.}}
	<li>{{name}}: {{amount}} x {{price}} EUR = {{amount * price}} EUR</li>
{{/.}}
</ul>
The {{#.}}...{{/.}} expression can be used to iterate over a JSON
array. The {{amount * price}} expression does a multiplication.
Whereas {{#.}}...{{/.}} was used in the previous example to iterate
through an array, {{#name}}...{{/name}} could be used to check
whether or not the name property exists.
Here’s some example JSON:
{
	"name": "Thijs"
}
And here’s the conditional:
{{#name}}Welcome {{name}}{{/name}}
{{^name}}Welcome guest{{/name}}
Under normal circumstances Welcome Thijs would be returned. If for
some reason the name property is not in the JSON output, or the
JSON endpoint is not accessible, Welcome guest would be returned.
In this subsection, we’ll show you an example where we can combine xbody and Edgestash to cache personalized data.
The use case is an e-commerce platform. In this case it’s written in PHP and uses the Symfony framework. There is a shopping cart that shows the number of items in the cart.
The shopping cart is stored by the framework’s session handler in the
/sessions folder on disk. The session id could, for example, be
9755a8b773f76bffeda28f746ac3957e.
The corresponding session file would be
/sessions/sess_9755a8b773f76bffeda28f746ac3957e, and the cookie that
tracks this session would be
Cookie: PHPSESSID=9755a8b773f76bffeda28f746ac3957e.
Inside sess_9755a8b773f76bffeda28f746ac3957e you could find the
following session data:
_sf2_attributes|a:2:{s:4:"cart";a:1:{i:1;i:9;}s:11:"itemsInCart";i:9;}_sf2_meta|a:3:{s:1:"u";i:1611851104;s:1:"c";i:1611759335;s:1:"l";s:1:"0";}
The session file is serialized using PHP’s built-in serializer. It’s not
exactly JSON, but you can spot certain data structures. The number of
items that this user has in the shopping cart is represented by
s:11:"itemsInCart";i:9;. This means this user has nine items in the
cart.
When you visit the e-commerce platform, this value is visible in the HTML source code:
<span id="items-in-cart">9</span>
When you don’t have any items in the shopping cart, the HTML element remains empty, no session is initialized, and no cookie is set. This means the page is perfectly cacheable.
However, as soon as an item is added to the cart, the cookie is set:
Set-Cookie: PHPSESSID=4fde6819330b5d7d2166ae8fcab71a52; path=/; HttpOnly; SameSite=lax
The Set-Cookie will trigger a hit-for-miss, and subsequent requests
that have the Cookie header will trigger a pass. This makes the
platform uncacheable.
But even if we decide to cache despite the cookie, the shopping cart value will also be cached. This is not acceptable.
An alternative solution would be to create a cache variation per session id. Unfortunately, this will impact the hit rate.
The solution we’re going to apply is a non-intrusive one that doesn’t require any code changes.
First we’re going to match <span id="items-in-cart">9</span> with
xbody.regsub and inject Edgestash handlebars. This makes the page
cacheable.
We’re going to use vmod_kvstore to store the items in the cart inside
Varnish. The key-value store has a value per session id. At delivery
time vmod_edgestash will parse the value into the placeholder and
display the right value per session. This happens without accessing the
origin application.
The key-value store will be populated from the session file. Using
vmod_file we can read the right session file, extract the
itemsInCart value, and store it in the key-value store.
To avoid excessive file system access, we’ll only read the session file
when an item is added to or removed from the cart. This requires
intercepting requests for /add/to/cart/$id and
/remove/from/cart/$id.
Let’s go over the VCL code for our non-intrusive caching solution:
vcl 4.1;
import edgestash;
import xbody;
import cookieplus;
import kvstore;
import file;
sub vcl_init {
	new cart = kvstore.init();
	new sessions = file.init("/sessions/");
}
As you can see, we need a number of VMODs to get the job done. In
vcl_init we’re initializing the key-value store as the cart
object.
We’re also configuring file system access by creating a sessions
object that has access to the /sessions folder.
The next step involves checking for incoming requests:
sub vcl_recv {
	cookieplus.keep("PHPSESSID");
	cookieplus.write();
	if(req.url ~ "^/add/to/cart/[0-9]+$" || req.url ~ "^/remove/from/cart/[0-9]+$") {
		return(pass);
	}
	if(req.url == "/") {
		return(hash);
	}
}
We’re making sure that only the PHPSESSID cookie is kept. Any other
cookie is removed.
The next step involves intercepting requests to /add/to/cart/$id and
/remove/from/cart/$id. When either of these is received we perform a
return(pass) to make sure these pages aren’t cached.
Time to see how xbody facilitates the use of Edgestash in
vcl_backend_response:
sub vcl_backend_response {
	if(bereq.url == "/") {
		unset beresp.http.cache-control;
		set beresp.ttl = 3600s;
		xbody.regsub({"(<span id="items-in-cart"[^>]*>)(\w*)(</span>)"},
			{"\1{{items-in-cart}}\3"});
		edgestash.parse_response();
	}
	if(bereq.url ~ "^/add/to/cart/[0-9]+$" || bereq.url ~ "^/remove/from/cart/[0-9]+$") {
		call refresh_cart;
	}
}
When we first receive the backend response from the origin server, we look for the HTML element that contains the number of items in the cart.
xbody.regsub() will turn <span id="items-in-cart">9</span> into
<span id="items-in-cart">{{items-in-cart}}</span>. This placeholder
will be cached, and edgestash.parse_response() will ensure it gets
recognized as an Edgestash placeholder.
vcl_backend_response also contains logic to refresh the shopping cart
information when we receive the backend response for requests that add
or delete shopping cart items.
The refresh happens by calling the custom refresh_cart
subroutine. Let’s have a look at this mysterious refresh_cart
subroutine:
sub refresh_cart {
	if(sessions.exists("sess_" + cookieplus.get("PHPSESSID"))) {
		set beresp.http.session = sessions.read("sess_" + cookieplus.get("PHPSESSID"));
		set beresp.http.items = regsub(beresp.http.session,{".+s:11:"itemsInCart";i:([0-9]+);.+"},"\1");
		cart.set(cookieplus.get("PHPSESSID"),beresp.http.items);
	} else {
		cart.set(cookieplus.get("PHPSESSID"),"0");
	}
	unset beresp.http.session;
	unset beresp.http.items;
}
This subroutine will attach the value of the PHPSESSID to sess_ and
check whether the corresponding file exists on disk. If that is the
case, it reads contents from the file. In our case this will be
sess_9755a8b773f76bffeda28f746ac3957e.
And again, this is what the session file looks like:
_sf2_attributes|a:2:{s:4:"cart";a:1:{i:1;i:9;}s:11:"itemsInCart";i:9;}_sf2_meta|a:3:{s:1:"u";i:1611851104;s:1:"c";i:1611759335;s:1:"l";s:1:"0";}
Using regsub() we’re going to extract the value of the itemsInCart
key. The .+s:11:"itemsInCart";i:([0-9]+);.+ regular expression takes
care of that, and the first regex capturing group contains this value.
This value gets temporarily stored inside the beresp.http.items
header, before finding its way to the cart key-value store.
Via cart.set(cookieplus.get("PHPSESSID"),beresp.http.items), a key is
stored per session, containing the number of items inside the shopping
cart. This value will be used later by Edgestash. If the session file
doesn’t exist, we set the value to 0.
And finally, it’s a matter of parsing the right items-in-cart value into the Edgestash placeholder:
sub vcl_deliver {
	if(edgestash.is_edgestash()) {
		edgestash.add_json({"{ "items-in-cart": ""}
			+ cart.get(cookieplus.get("PHPSESSID"),0)
			+ {"" }"});
		edgestash.execute();
	}
}
In the end we can store a template in cache that we can populate on-the-fly based on a placeholder. In this case, the HTML code of the application didn’t even have the placeholder.
Thanks to xbody, the response body was modified, a placeholder was created, and Edgestash managed to parse in a value per user without having to create a cache variation per user.
We believe that this is a very powerful example of how to combine both modules, along with some other Varnish Enterprise VMODs.
A hard requirement for this example to work was having access to the session files of the application. When Varnish is hosted on the same machine as the origin application, that’s an easy task. Otherwise shared storage would be required. But there are other, more creative ways of tackling this issue, as you will see later in this chapter.