Body access

Before we can start personalizing the caching experience, we need to cover some fundamentals.

An important one is understanding how to access the body of an HTTP request or response.

A request body may contain a field or a parameter that we will use to identify the user. This request body information can also be used to create a cache variation.

We can also inspect and rewrite the response body. This means we can modify the output on the edge without having to access the origin application.

Request body access

Let’s start with the request body. GET requests don’t usually carry a request body. Although it is theoretically possible, Varnish strips it off before sending the request to the origin.

In practice, request bodies are used with POST, PUT, and PATCH requests, and occasionally with DELETE requests.

Unfortunately there is no req.body variable in VCL, and bereq.body can only be unset.

Accessing the request body starts by calling std.cache_req_body(). This function call is required to ensure that the request body is read from the client and stored in memory. Otherwise, the request body could only be read once and could not be accessed in other parts of the VCL.

The function takes a single argument of type BYTES that defines the maximum size of the request body that can be cached. It returns a boolean that tells you whether caching succeeded, as illustrated below:

vcl 4.1;

import std;

sub vcl_recv {
	if(std.cache_req_body(1KB)) {
		std.log("Request body accessible");
	} else {
		std.log("Request body not accessible");
	}
}

This means the request body is only accessible if it doesn’t exceed 1 KB. For larger bodies, std.cache_req_body() returns false and the body isn’t cached.
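Because std.cache_req_body() returns a boolean, you can also act on the outcome. The following sketch is our own illustration, not a required pattern: it rejects oversized bodies on methods that typically carry one, using a synthetic 413 response. The status code, the reason phrase, and the list of methods are choices we made for the example.

```vcl
vcl 4.1;

import std;

sub vcl_recv {
	# Only inspect methods that typically carry a request body
	if (req.method == "POST" || req.method == "PUT" || req.method == "PATCH") {
		# std.cache_req_body() returns false when the body exceeds 1KB
		# or cannot be read, in which case we refuse the request
		if (!std.cache_req_body(1KB)) {
			return(synth(413, "Payload Too Large"));
		}
	}
}
```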

Once std.cache_req_body() has been called, there are several VMODs we can use to leverage this information.

Storing the request body in memory with std.cache_req_body() also makes sense when you restart or retry a transaction. Because the request body may need to be sent to the origin more than once, we cannot afford to lose it. That’s why we cache it.
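Here’s a minimal sketch of the retry scenario, assuming an origin that occasionally returns a 503. The 100 KB limit and the single retry are values we picked for illustration. The body is cached in vcl_recv, so it can be sent again when vcl_backend_response retries the fetch.

```vcl
vcl 4.1;

import std;

sub vcl_recv {
	if (req.method == "POST") {
		# Keep up to 100KB of the body in memory, so it can be resent
		std.cache_req_body(100KB);
	}
}

sub vcl_backend_response {
	# Retry once when the origin returns a 503. Because the request
	# body was cached, it is sent again with the retried request.
	if (beresp.status == 503 && bereq.retries < 1) {
		return(retry);
	}
}
```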

vmod_bodyaccess

The easiest way to access the request body is by using vmod_bodyaccess. It is an open source VMOD that is part of the Varnish Software VMOD collection. See chapter 5 to refresh your memory.

vmod_bodyaccess has a fairly limited API: it can hash the request body, calculate its length, match it against a regular expression, and log it to the VSL.

Let’s start by adding the request body to the cache hash. Here’s how you do this:

vcl 4.1;

import std;
import bodyaccess;

sub vcl_recv {
	std.cache_req_body(10KB);
	set req.http.x-method = req.method;
	return(hash);
}

sub vcl_hash {
	bodyaccess.hash_req_body();
}

sub vcl_backend_fetch {
	set bereq.method = bereq.http.x-method;
}

Because the built-in VCL passes POST requests to the origin instead of serving them from cache, we must override this behavior in vcl_recv. Explicitly calling return(hash) ensures the built-in vcl_recv logic is never reached.

We also need to store the original request method in a custom request header. When a POST request is looked up in cache, Varnish strips the request body and turns the backend request into a GET request.

That’s why we restore the original request method inside vcl_backend_fetch.

Note that we only cache request bodies of at most 10 KB. Calling bodyaccess.hash_req_body() in vcl_hash adds the request body to the cache key, so every distinct body results in a separate cache variation.
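One thing to keep in mind: in the example above, a GET request without a body and a POST request with an empty body for the same URL end up with the same hash. If you’d rather keep them apart, a minimal sketch is to add the request method to the hash as well. In vcl_hash, req.method still holds the original method, because only the backend request is turned into a GET.

```vcl
vcl 4.1;

import std;
import bodyaccess;

sub vcl_recv {
	std.cache_req_body(10KB);
	set req.http.x-method = req.method;
	return(hash);
}

sub vcl_hash {
	# Keep GET and POST variations apart, even when the body is empty
	hash_data(req.method);
	bodyaccess.hash_req_body();
	# No return statement: the built-in vcl_hash still adds URL and host
}

sub vcl_backend_fetch {
	set bereq.method = bereq.http.x-method;
}
```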

When you run the following command, you’ll get the output specifically for the key=value post data:

$ curl -XPOST -d "key=value" localhost

And when you change the post data to key=otherValue, the output will be different:

$ curl -XPOST -d "key=otherValue" localhost

Just look at the Age header, which tells you how long the object has been in cache. Different POST data results in separate cache variations, each with its own Age value.

The bodyaccess.rematch_req_body() function allows us to inspect the request body, making it possible to reject unwanted requests and make request body caching conditional. Here’s an example:

vcl 4.1;

import std;
import bodyaccess;

sub vcl_recv {
	std.cache_req_body(10KB);
	if(bodyaccess.rematch_req_body("key=[^=]+") == 1) {
		set req.http.x-method = req.method;
		return(hash);
	}
}

sub vcl_backend_fetch {
	if(bereq.http.x-method) {
		set bereq.method = bereq.http.x-method;
	}
}

sub vcl_hash {
	bodyaccess.hash_req_body();
}

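In this example, we only cache requests whose body contains key= followed by at least one character other than =. A body like key=value matches, so the request is looked up in cache. Any other POST request falls through to the built-in VCL and is passed to the origin.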

The following curl call is cacheable:

$ curl -XPOST -d "key=value" localhost

And because the following curl call doesn’t match the key=[^=]+ regular expression, it is not cacheable:

$ curl -XPOST -d "key" localhost
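The same function can also reject unwanted requests instead of merely skipping the cache. In the sketch below, the /form URL prefix is hypothetical: POST requests to it must contain a non-empty key field, otherwise Varnish returns a synthetic 400 response. We compare the result with 1 because rematch_req_body() returns 1 on a match, 0 when there is no match, and -1 when an error occurs, for example when the body wasn’t cached.

```vcl
vcl 4.1;

import std;
import bodyaccess;

sub vcl_recv {
	# /form is a hypothetical URL prefix used for illustration
	if (req.method == "POST" && req.url ~ "^/form") {
		std.cache_req_body(10KB);
		# 1 = match, 0 = no match, -1 = error (e.g. body not cached)
		if (bodyaccess.rematch_req_body("(^|&)key=[^&]+") != 1) {
			return(synth(400, "Bad Request"));
		}
	}
}
```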

If you’re interested in what the actual length of the request body was, you can use the bodyaccess.len_req_body() function.

And if you want the request body to be visible in varnishlog, you can use bodyaccess.log_req_body(STRING prefix = "", INT length = 200). It takes a prefix and a maximum line length, so a long body is split across multiple log lines.
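Here’s a small sketch that combines both functions. The "body: " prefix and the line length of 100 are arbitrary values we picked for illustration.

```vcl
vcl 4.1;

import std;
import bodyaccess;

sub vcl_recv {
	if (std.cache_req_body(10KB)) {
		# len_req_body() returns the size of the cached body in bytes
		std.log("Body length: " + bodyaccess.len_req_body());
		# Log the body with a "body: " prefix, at most 100 per log line
		bodyaccess.log_req_body("body: ", 100);
	}
}
```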

xbody

A more feature-rich alternative to vmod_bodyaccess is xbody. It’s part of Varnish Enterprise, and it’s a very useful tool in your edge-computing toolbox.

vmod_xbody is capable of caching the request body, just like vmod_bodyaccess, but it can also access the response body. And more importantly, the module is also capable of modifying request and response bodies.

Here’s the vmod_xbody equivalent of request body caching:

vcl 4.1;

import xbody;
import std;
import blob;

sub vcl_recv {
	std.cache_req_body(10KB);
	if(xbody.get_req_body() ~ "key=[^=]+") {
		set req.http.x-method = req.method;
		set req.http.x-hash = blob.encode(encoding=BASE64,blob=xbody.get_req_body_hash(md5));
		return(hash);
	}
}

sub vcl_backend_fetch {
	if(bereq.http.x-method) {
		set bereq.method = bereq.http.x-method;
	}
}

sub vcl_hash {
	hash_data(req.http.x-hash);
}

You may have noticed that we execute the xbody.get_req_body_hash() function from within vcl_recv. That’s because this function is only accessible from that subroutine. The return type of this function is a BLOB, so we need vmod_blob to turn it into a string.

In the end, we can use the x-hash request header to transport the request body hash to the vcl_hash subroutine.

Because xbody.get_req_body() returns a string, we can make our cache variations more efficient. The vmod_bodyaccess example used the entire request body as a cache variation. But if we use regsub() we can choose the exact part of the request body we want to vary on.

Here’s an example where we only create variations on the value of the key field:

vcl 4.1;

import xbody;
import std;

sub vcl_recv {
	std.cache_req_body(10KB);
	if(xbody.get_req_body() ~ "(^|.+&)key=([^\=\&]+)(&.+|$)") {
		set req.http.x-method = req.method;
		set req.http.x-hash = regsub(xbody.get_req_body(),"(^|.+&)key=([^\=\&]+)(&.+|$)","\2");
		return(hash);
	}
}

sub vcl_backend_fetch {
	if(bereq.http.x-method) {
		set bereq.method = bereq.http.x-method;
	}
}

sub vcl_hash {
	hash_data(req.http.x-hash);
}

You’ve probably spotted that the regular expression we use to match the key is more complicated. That’s true, but it’s also more precise: we want to match key=value, and without the extra regex logic, mykey=value would match as well. The expression requires key to appear at the start of the body or right after an &, and its value ends up in the second capturing group.

The following curl call would create a variation for the term value:

$ curl -XPOST -d "key=value" localhost

The following curl call, which has a different request body, would also hit that same variation:

$ curl -XPOST -d "foo=bar&key=value" localhost

By being more deliberate about the way we create request body variations, we can significantly increase our hit rate.

And because our regular expression is stricter, the following curl call doesn’t match at all. It isn’t looked up in cache, and the built-in VCL passes it to the origin:

$ curl -XPOST -d "mykey=value" localhost
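The same technique scales to more than one field. The following sketch assumes a hypothetical second form field named lang and varies on both values; requests without a lang field share one variation per key. We also unset the helper headers at the start of vcl_recv, so clients can’t inject their own values into the hash.

```vcl
vcl 4.1;

import xbody;
import std;

sub vcl_recv {
	# Never trust these helper headers when they come from the client
	unset req.http.x-key;
	unset req.http.x-lang;
	std.cache_req_body(10KB);
	if (xbody.get_req_body() ~ "(^|.+&)key=([^\=\&]+)(&.+|$)") {
		set req.http.x-method = req.method;
		set req.http.x-key = regsub(xbody.get_req_body(),
			"(^|.+&)key=([^\=\&]+)(&.+|$)", "\2");
		# lang is a hypothetical second field we also vary on
		if (xbody.get_req_body() ~ "(^|.+&)lang=([^\=\&]+)(&.+|$)") {
			set req.http.x-lang = regsub(xbody.get_req_body(),
				"(^|.+&)lang=([^\=\&]+)(&.+|$)", "\2");
		}
		return(hash);
	}
}

sub vcl_backend_fetch {
	if (bereq.http.x-method) {
		set bereq.method = bereq.http.x-method;
	}
}

sub vcl_hash {
	hash_data(req.http.x-key);
	# Only hashed when present, so bodies without lang share a variation
	if (req.http.x-lang) {
		hash_data(req.http.x-lang);
	}
}
```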

json.parse_req_body()

vmod_json is a Varnish Enterprise module that can parse JSON data and can return individual JSON fields. From a request body point of view, the json.parse_req_body() function is of particular interest to us.

Let’s revisit the earlier examples, and try to cache POST requests using the function.

vcl 4.1;
import json;
import std;

sub vcl_recv {
	std.cache_req_body(10KB);
	json.parse_req_body();
	if (json.is_valid() && json.is_object() && json.get("key")) {
		set req.http.x-method = req.method;
		return(hash);
	}
}

sub vcl_backend_fetch {
	if(bereq.http.x-method) {
		set bereq.method = bereq.http.x-method;
	}
}

sub vcl_hash {
	hash_data(json.get("key"));
}

The json.parse_req_body() function in this example will parse the request body as JSON and store the result in a new JSON context. Via json.get() we can fetch individual values at a later stage.

Before using those values, json.is_valid() checks whether valid JSON was parsed, and json.is_object() checks whether the JSON data is an object. Finally, json.get("key") checks whether the key property exists inside that object.

If all of these conditions apply, we can look the object up in cache, even if the request is a POST request. If not, the built-in VCL will handle it from there.

And just like in the previous example, we only create variations on a specific field: in this case, the value of the key property.

Here’s a curl call where the payload matches the criteria, which results in this POST call being cached:

$ curl -XPOST -d "{ \"key\": \"value\"}" localhost

Even though the following example has a different JSON request payload, it will also match the initial variation because the key property exists:

$ curl -XPOST -d "{ \"key\": \"value\", \"foo\": \"bar\" }" localhost
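The validation functions can also reject malformed payloads outright instead of passing them to the origin. A minimal sketch, assuming a hypothetical /api/ URL prefix whose endpoints only accept JSON objects:

```vcl
vcl 4.1;

import json;
import std;

sub vcl_recv {
	# /api/ is a hypothetical prefix for endpoints that expect JSON objects
	if (req.method == "POST" && req.url ~ "^/api/") {
		if (!std.cache_req_body(10KB)) {
			return(synth(413, "Payload Too Large"));
		}
		json.parse_req_body();
		# Refuse anything that isn't a valid JSON object
		if (!json.is_valid() || !json.is_object()) {
			return(synth(400, "Invalid JSON"));
		}
	}
}
```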

Response body access

Analyzing and changing the response body is where it gets really exciting.

In very basic terms, you can change the response body by setting the beresp.body and resp.body variables.

Unfortunately their usage is very restricted. resp.body can only be set in vcl_synth, as illustrated below:

vcl 4.1;

sub vcl_recv {
	return(synth(200));
}

sub vcl_synth {
	set resp.body = "Welcome";
	return(deliver);
}

And beresp.body can only be set in vcl_backend_error:

vcl 4.1;

backend default none;

sub vcl_backend_error {
	set beresp.body = "Welcome";
	return(deliver);
}

Alternatively, the synthetic() function achieves the same result: depending on the subroutine it is called from, it sets either resp.body or beresp.body.
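Here’s a sketch that uses synthetic() in both places. The /hello URL and the response texts are just examples. Because the backend is set to none, every other request ends up in vcl_backend_error.

```vcl
vcl 4.1;

backend default none;

sub vcl_recv {
	if (req.url == "/hello") {
		return(synth(200));
	}
}

sub vcl_synth {
	# In vcl_synth, synthetic() sets resp.body
	set resp.http.Content-Type = "text/plain";
	synthetic("Welcome");
	return(deliver);
}

sub vcl_backend_error {
	# In vcl_backend_error, synthetic() sets beresp.body
	set beresp.http.Content-Type = "text/plain";
	synthetic("Backend unavailable");
	return(deliver);
}
```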

xbody revisited

Remember xbody? As mentioned, this VMOD can also inspect and modify the response body.

We’ll present more compelling examples that combine vmod_xbody and vmod_edgestash later on. First, let’s look at some really basic examples.

Imagine the following obnoxiously hypothetical response body:

Hello world

The following VCL example will replace world with the IP address of the client:

vcl 4.1;
import xbody;

sub vcl_backend_response {
	xbody.regsub("Hello \w+","Hello " + client.ip);
}

The end result on our local computer would be:

Hello 192.168.16.1
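Because this rewrite happens in vcl_backend_response, it’s the rewritten body that ends up in cache: until the object expires, every client sees the IP address of the client that triggered the fetch. It also makes sense to limit body rewrites to the content types you expect. A sketch, assuming the origin serves this page as text/html:

```vcl
vcl 4.1;

import xbody;

sub vcl_backend_response {
	# Leave images, scripts and other non-HTML responses untouched
	if (beresp.http.Content-Type ~ "^text/html") {
		xbody.regsub("Hello \w+", "Hello " + client.ip);
	}
}
```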

We can also use the xbody.capture() function to capture values that we can retrieve using xbody.get() and xbody.get_all() afterwards:

vcl 4.1;

import xbody;
import std;

sub vcl_backend_response {
	xbody.capture("name","Hello (\w+)","\1");
}

sub vcl_deliver {
	std.log("Name: " + xbody.get("name"));
}

xbody.capture() leaves the response body untouched. The captured value is retrieved with xbody.get() and written to VSL by std.log():

$ varnishlog -g raw -I VCL_Log:Name
	 32770 VCL_Log        c Name: world

Trust us: we’ll show you a more exciting example once we’ve introduced you to vmod_edgestash.

Edgestash

Speaking of which, Edgestash is one of our favorite Varnish Enterprise features, which is available through vmod_edgestash.

You’ve probably heard of Mustache, a simple templating language built around {{...}} placeholders, often called handlebars. It originated in the Ruby world, has implementations in many other languages, and is something of an industry standard for templating.

Edgestash is a module that processes Mustache handlebars. Basically, you have Mustache on the edge, or Edgestash, if you will.

The idea is that placeholders like {{variable}} are put into your templates. The business logic of your application is responsible for parsing the values into those placeholders.

An origin application can emit a placeholder for personalized, potentially non-cacheable content. Varnish then caches the otherwise uncacheable page and populates the placeholder with the required value.

This value may be identified by a session cookie or authentication credentials. The basic business logic that identifies the user and collects the stateful information can be offloaded to Varnish. Edgestash will be responsible for assembling the bits and pieces and parsing it into a single HTTP response body.

Imagine that your origin application returns the following output:

Hello {{name}}

The {{name}} placeholder could then be replaced by the client IP address using the following VCL code:

vcl 4.1;

import edgestash;

sub vcl_backend_response {
	if(beresp.http.edgestash) {
		edgestash.parse_response();
	}
}

sub vcl_deliver {
	if (edgestash.is_edgestash()) {
		edgestash.add_json({"
		{
			"name":""} + client.ip + {""
		}
		"});
		edgestash.execute();
	}
}

At first, it doesn’t seem more interesting than the xbody.regsub() example. However, not only does Edgestash support the full Mustache syntax, the parsing happens at delivery time in vcl_deliver instead of at cache-insertion time in vcl_backend_response.

This means values can be injected on the fly, per request, while the template itself stays in cache. Edgestash uses JSON as the data source for its placeholders.

Also note that the previous VCL example only processes Mustache handlebars when the response contains an edgestash response header. This avoids wasting CPU cycles on non-Mustache content.

This is the parsed JSON that is processed by Edgestash:

{
	"name":"192.168.16.1"
}

And this is the final output:

Hello 192.168.16.1
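The edgestash response header is only a marker for Varnish. A small variation of the example above, which is our own suggestion, removes it before the response reaches the client:

```vcl
vcl 4.1;

import edgestash;

sub vcl_backend_response {
	# Only parse responses that the origin explicitly marks as templates
	if (beresp.http.edgestash) {
		edgestash.parse_response();
	}
}

sub vcl_deliver {
	if (edgestash.is_edgestash()) {
		edgestash.add_json({"{ "name": ""} + client.ip + {"" }"});
		edgestash.execute();
	}
	# The marker header is meant for Varnish, not for the client
	unset resp.http.edgestash;
}
```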

JSON endpoint

Manually composing a JSON string in edgestash.add_json() can be clunky at times. A very elegant way to inject JSON is by using edgestash.add_json_url().

This function takes an HTTP endpoint as its first argument and expects the response to be JSON output. RESTful APIs are excellent candidates for these endpoints.

This allows you to split cacheable responses and stateful content into separate endpoints.

Here’s an example:

vcl 4.1;

import edgestash;

sub vcl_backend_response {
	if(bereq.url == "/api") {
		edgestash.index_json();
	} elseif(beresp.http.edgestash) {
		edgestash.parse_response();
	}
}

sub vcl_deliver {
	if (edgestash.is_edgestash()) {
		edgestash.add_json_url("/api");
		edgestash.execute();
	}
}

As long as the /api endpoint produces a JSON object that has a name property, the value can be parsed into the placeholder.

If the JSON endpoint is located on another domain, you can use the second argument to specify the hostname. This could end up being edgestash.add_json_url("/api","api.example.com").

The edgestash.index_json() function inside vcl_backend_response will index the JSON for faster processing when edgestash.execute() is called.

Advanced Mustache templating

The Mustache templating language does more than replace placeholders with actual values.

It supports loops, conditionals, variables, expressions, and basic arithmetic.

Imagine the following JSON output, which represents a shopping cart:

[
  {
	"id": 1,
	"name": "Watch",
	"price": 25,
	"amount": 2
  },
  {
	"id": 2,
	"name": "Shoes",
	"price": 80,
	"amount": 1
  }
]

This is stateful data that depends on a session cookie. The curl call that is required to retrieve the JSON could be the following:

$ curl -s -H"Cookie: PHPSESSID=9755a8b773f76bffeda28f746ac3957e" localhost/session

As you can see, the Cookie: PHPSESSID=9755a8b773f76bffeda28f746ac3957e header identifies the user. The response contains the same cart data:

[
  {
	"id": 1,
	"name": "Watch",
	"price": 25,
	"amount": 2
  },
  {
	"id": 2,
	"name": "Shoes",
	"price": 80,
	"amount": 1
  }
]

The goal is to turn this JSON data into the following HTML code:

<ul>
	<li>Watch: 2 x 25 EUR = 50 EUR</li>
	<li>Shoes: 1 x 80 EUR = 80 EUR</li>
</ul>

This means we have to find a way to list the product name for each item in the cart, but also the price and the product quantity.

The following Mustache syntax would be required to do the job:

<ul>
{{#.}}
	<li>{{name}}: {{amount}} x {{price}} EUR = {{amount * price}} EUR</li>
{{/.}}
</ul>

The {{#.}}...{{/.}} expression can be used to iterate over a JSON array. The {{amount * price}} expression does a multiplication.

Whereas {{#.}}...{{/.}} was used in the previous example to iterate through an array, {{#name}}...{{/name}} can be used to check whether the name property exists.

Here’s some example JSON:

{
	"name": "Thijs"
}

And here’s the conditional:

{{#name}}Welcome {{name}}{{/name}}
{{^name}}Welcome guest{{/name}}

Under normal circumstances Welcome Thijs would be returned. If for some reason the name property is not in the JSON output, or the JSON endpoint is not accessible, Welcome guest would be returned.

An e-commerce example

In this subsection, we’ll show you an example where we can combine xbody and Edgestash to cache personalized data.

The use case is an e-commerce platform. In this case it’s written in PHP and uses the Symfony framework. There is a shopping cart that shows the number of items in the cart.

Sessions

The shopping cart is stored by the framework’s session handler in the /sessions folder on disk. The session id could, for example, be 9755a8b773f76bffeda28f746ac3957e.

The corresponding session file would be /sessions/sess_9755a8b773f76bffeda28f746ac3957e, and the cookie that tracks this session would be Cookie: PHPSESSID=9755a8b773f76bffeda28f746ac3957e.

Inside sess_9755a8b773f76bffeda28f746ac3957e you could find the following session data:

_sf2_attributes|a:2:{s:4:"cart";a:1:{i:1;i:9;}s:11:"itemsInCart";i:9;}_sf2_meta|a:3:{s:1:"u";i:1611851104;s:1:"c";i:1611759335;s:1:"l";s:1:"0";}

The session file is serialized using PHP’s built-in serializer. It’s not exactly JSON, but you can spot certain data structures. The number of items that this user has in the shopping cart is represented by s:11:"itemsInCart";i:9;. This means this user has nine items in the cart.

Cacheability

When you visit the e-commerce platform, this value is visible in the HTML source code:

<span id="items-in-cart">9</span>

When you don’t have any items in the shopping cart, the HTML element remains empty, no session is initialized, and no cookie is set. This means the page is perfectly cacheable.

However, as soon as an item is added to the shopping cart, a session is started and the cookie is set:

Set-Cookie: PHPSESSID=4fde6819330b5d7d2166ae8fcab71a52; path=/; HttpOnly; SameSite=lax

The Set-Cookie will trigger a hit-for-miss, and subsequent requests that have the Cookie header will trigger a pass. This makes the platform uncacheable.

But even if we decide to cache despite the cookie, the shopping cart value will also be cached. This is not acceptable.

An alternative solution would be to create a cache variation per session id. Unfortunately, this would severely reduce the hit rate, because every session gets its own copy of each page.

The caching solution

The solution we’re going to apply is a non-intrusive one that doesn’t require any changes to the application code.

First, we use xbody.regsub() to replace the cart value (9 in our example) with an Edgestash placeholder. This makes the page cacheable.

We’re going to use vmod_kvstore to store the items in the cart inside Varnish. The key-value store has a value per session id. At delivery time vmod_edgestash will parse the value into the placeholder and display the right value per session. This happens without accessing the origin application.

The key-value store will be populated from the session file. Using vmod_file we can read the right session file, extract the itemsInCart value, and store it in the key-value store.

To avoid excessive file system access, we’ll only read the session file when an item is added to or removed from the cart. This requires intercepting requests for /add/to/cart/$id and /remove/from/cart/$id.

The VCL code

Let’s go over the VCL code for our non-intrusive caching solution:

vcl 4.1;
import edgestash;
import xbody;
import cookieplus;
import kvstore;
import file;

sub vcl_init {
	new cart = kvstore.init();
	new sessions = file.init("/sessions/");
}

As you can see, we need a number of VMODs to get the job done. In vcl_init we’re initializing the key-value store as the cart object.

We’re also configuring file system access by creating a sessions object that has access to the /sessions folder.

The next step involves handling incoming requests:

sub vcl_recv {
	cookieplus.keep("PHPSESSID");
	cookieplus.write();
	if(req.url ~ "^/add/to/cart/[0-9]+$" || req.url ~ "^/remove/from/cart/[0-9]+$") {
		return(pass);
	}
	if(req.url == "/") {
		return(hash);
	}
}

We’re making sure that only the PHPSESSID cookie is kept. Any other cookie is removed.

The next step involves intercepting requests to /add/to/cart/$id and /remove/from/cart/$id. When either of these is received we perform a return(pass) to make sure these pages aren’t cached.

Time to see how xbody facilitates the use of Edgestash in vcl_backend_response:

sub vcl_backend_response {
	if(bereq.url == "/") {
		unset beresp.http.cache-control;
		set beresp.ttl = 3600s;
		xbody.regsub({"(<span id="items-in-cart"[^>]*>)(\w*)(</span>)"},
			{"\1{{items-in-cart}}\3"});
		edgestash.parse_response();
	}
	if(bereq.url ~ "^/add/to/cart/[0-9]+$" || bereq.url ~ "^/remove/from/cart/[0-9]+$") {
		call refresh_cart;
	}
}

When we first receive the backend response from the origin server, we look for the HTML element that contains the number of items in the cart.

xbody.regsub() turns 9 into {{items-in-cart}}. This placeholder is cached, and edgestash.parse_response() ensures it gets recognized as an Edgestash placeholder.

vcl_backend_response also contains logic to refresh the shopping cart information when we receive the backend response for requests that add or remove shopping cart items.

The refresh happens by calling the custom refresh_cart subroutine. Let’s have a look at this mysterious subroutine:

sub refresh_cart {
	if(sessions.exists("sess_" + cookieplus.get("PHPSESSID"))) {
		set beresp.http.session = sessions.read("sess_" + cookieplus.get("PHPSESSID"));
		set beresp.http.items = regsub(beresp.http.session,{".+s:11:"itemsInCart";i:([0-9]+);.+"},"\1");
		cart.set(cookieplus.get("PHPSESSID"),beresp.http.items);
	} else {
		cart.set(cookieplus.get("PHPSESSID"),"0");
	}
	unset beresp.http.session;
	unset beresp.http.items;
}

This subroutine appends the value of the PHPSESSID cookie to the sess_ prefix and checks whether the corresponding file exists on disk. If it does, it reads the contents of that file. In our case, this is sess_9755a8b773f76bffeda28f746ac3957e.

And again, this is what the session file looks like:

_sf2_attributes|a:2:{s:4:"cart";a:1:{i:1;i:9;}s:11:"itemsInCart";i:9;}_sf2_meta|a:3:{s:1:"u";i:1611851104;s:1:"c";i:1611759335;s:1:"l";s:1:"0";}

Using regsub() we’re going to extract the value of the itemsInCart key. The .+s:11:"itemsInCart";i:([0-9]+);.+ regular expression takes care of that, and the first regex capturing group contains this value. This value gets temporarily stored inside the beresp.http.items header, before finding its way to the cart key-value store.

Via cart.set(cookieplus.get("PHPSESSID"),beresp.http.items), a key is stored per session, containing the number of items inside the shopping cart. This value will be used later by Edgestash. If the session file doesn’t exist, we set the value to 0.
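Because the PHPSESSID value comes from the client and becomes part of a file name, we’d rather not hand it to vmod_file unchecked. The following variant of refresh_cart is a sketch that only reads the session file when the session id consists of characters PHP uses for session ids: digits, letters, comma and dash, depending on the session.sid_bits_per_character setting. It’s meant as a drop-in replacement for the subroutine above, using the same cart and sessions objects, and it doesn’t assume that vmod_file restricts access to the /sessions/ folder.

```vcl
sub refresh_cart {
	# The cookie value becomes part of a file name, so only accept
	# characters PHP generates for session ids before touching the disk
	if (cookieplus.get("PHPSESSID") ~ "^[a-zA-Z0-9,-]+$" &&
		sessions.exists("sess_" + cookieplus.get("PHPSESSID"))) {
		set beresp.http.session = sessions.read("sess_" + cookieplus.get("PHPSESSID"));
		set beresp.http.items = regsub(beresp.http.session,
			{".+s:11:"itemsInCart";i:([0-9]+);.+"}, "\1");
		cart.set(cookieplus.get("PHPSESSID"), beresp.http.items);
	} else {
		cart.set(cookieplus.get("PHPSESSID"), "0");
	}
	unset beresp.http.session;
	unset beresp.http.items;
}
```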

And finally, it’s a matter of parsing the right items-in-cart value into the Edgestash placeholder:

sub vcl_deliver {
	if(edgestash.is_edgestash()) {
		edgestash.add_json({"{ "items-in-cart": ""}
			+ cart.get(cookieplus.get("PHPSESSID"),"0")
			+ {"" }"});
		edgestash.execute();
	}
}

The end result

In the end, we can store a template in cache and populate its placeholder on the fly. In this case, the HTML code of the application didn’t even contain the placeholder.

Thanks to xbody, the response body was modified, a placeholder was created, and Edgestash managed to parse in a value per user without having to create a cache variation per user.

We believe that this is a very powerful example of how to combine both modules, along with some other Varnish Enterprise VMODs.

A hard requirement for this example to work was having access to the session files of the application. When Varnish is hosted on the same machine as the origin application, that’s an easy task. Otherwise shared storage would be required. But there are other, more creative ways of tackling this issue, as you will see later in this chapter.