Database access

To offer a personalized caching experience, Varnish needs access to stateful data. So far we have used the file system and API calls for that.

Although they are valid candidates as the source of truth, there are limiting factors:

The required files aren’t always available to Varnish.
Reading files may not always provide the right data-querying facilities.
The source data may not be accessible via an API.
The API containing the data may not be equipped to scale along with Varnish, causing a potential outage on the API due to excessive load.

Unless data is readily available in files, or unless data APIs can keep up with Varnish, we need to find another solution.

Having direct access to a database may be the better solution. The term database can refer to many implementations. Some databases may be accessible via a RESTful API, which can be leveraged using vmod_http.

In this section, we’re going to cover four types of databases:

SQLite
Key-value storage (kvstore)
Memcached
Redis

SQLite

For the record: SQLite is a library that implements a serverless, self-contained relational database system. Varnish Enterprise contains a VMOD that interacts with SQLite. We already featured this VMOD in chapters 2 and 5.

In chapter 5 I showed you an example where sessions were stored in the database, and a cookie value was used to retrieve the username of a logged-in user.

This time, we’ll use SQLite to store caching policies about specific pages.

Here are the commands you need to create and populate the database:

sqlite3 sqlite.db <<EOF
CREATE TABLE pages (
	cache BOOLEAN NOT NULL,
	url TEXT NOT NULL,
	host TEXT NOT NULL,
	PRIMARY KEY (url, host)
);

INSERT INTO pages (cache,url,host) VALUES
(0,'/checkout','example.com'),
(1,'/','example.com'),
(1,'/products','example.com'),
(0,'/cart','example.com');
EOF

Once the database has been put in place, we can match the URL and hostname of a page to determine its caching behavior. When the page is not found, the built-in VCL behavior is used.

Here’s the VCL:

vcl 4.1;

import sqlite3;

sub vcl_init {
	sqlite3.open("/etc/varnish/sqlite.db", "|;");
}

sub vcl_fini {
	sqlite3.close();
}

sub vcl_recv {
	set req.http.cache = sqlite3.exec("SELECT `cache` FROM `pages` WHERE url='"
		+ sqlite3.escape(req.url) + "' AND host='"
		+ sqlite3.escape(req.http.host) + "'");
	if(req.http.cache == "1") {
		return(hash);
	} elseif (req.http.cache == "0") {
		return(pass);
	}
}

The sqlite3.exec call returns the value of the cache field for the row that matches the URL and hostname of the request.

If there’s a matching row in the database, and the cache field is 1, the page is cacheable and return(hash) is called. If cache is 0, return(pass) is called.

If there’s no matching row, we’re not returning anything, which means the built-in VCL behavior applies.
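The three outcomes described above can also be modeled outside of Varnish. The following Python sketch is not part of the VCL setup: it rebuilds the pages table in an in-memory SQLite database and maps each lookup to a label. The function names and the labels hash, pass and builtin are illustrative assumptions, and the query uses bound parameters where the VCL uses sqlite3.escape().

```python
import sqlite3


def build_pages_db():
    """Create an in-memory copy of the pages table used in this section."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages ("
        " cache BOOLEAN NOT NULL,"
        " url TEXT NOT NULL,"
        " host TEXT NOT NULL,"
        " PRIMARY KEY (url, host))"
    )
    conn.executemany(
        "INSERT INTO pages (cache, url, host) VALUES (?, ?, ?)",
        [
            (0, "/checkout", "example.com"),
            (1, "/", "example.com"),
            (1, "/products", "example.com"),
            (0, "/cart", "example.com"),
        ],
    )
    return conn


def sqlite_decision(conn, host, url):
    """Return 'hash', 'pass' or 'builtin' (hypothetical labels) for a request."""
    row = conn.execute(
        "SELECT cache FROM pages WHERE url = ? AND host = ?", (url, host)
    ).fetchone()
    if row is None:
        # No policy stored for this page: fall through to the built-in VCL.
        return "builtin"
    return "hash" if row[0] == 1 else "pass"


conn = build_pages_db()
for path in ("/products", "/cart", "/unknown"):
    print(path, sqlite_decision(conn, "example.com", path))
```

A sketch like this can help to verify the contents of the database before pointing Varnish at it.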

SQLite is a very lightweight database system and performs quite well for read-only access. As soon as you start writing to the database from VCL, expect added latency, because write operations lock the database file.

Key-value storage (kvstore)

Can vmod_kvstore be considered a database? The examples we used throughout the book would suggest otherwise: the key-value store is populated in VCL, and a restart removes all content.

However, there is a very basic level of persistence available that can be triggered via the .init_file() function.

Here’s the vmod_kvstore implementation of the SQLite example, but backed by a file:

vcl 4.1;
import kvstore;

sub vcl_init {
	new pages = kvstore.init();
	pages.init_file("/etc/varnish/pages.store",",");
}

sub vcl_recv {
	set req.http.cache = pages.get(req.http.host+req.url,"");
	if(req.http.cache == "1") {
		return(hash);
	} elseif (req.http.cache == "0") {
		return(pass);
	}
}

The following command can be used to populate the pages.store file that contains the same rules as the SQLite database:

cat <<EOF > /etc/varnish/pages.store
example.com/,1
example.com/products,1
example.com/cart,0
example.com/checkout,0
EOF

The pages.init_file("/etc/varnish/pages.store",",") function can be called in other places in your VCL when a resynchronization is required.

This persisted kvstore example will perform better than SQLite, but does not offer the flexibility of the SQL language.
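To make the file format and the lookup key concrete, here is a minimal Python sketch of the same idea. It is an illustration rather than a description of how vmod_kvstore works internally: the helper names are hypothetical, and splitting each line at the first delimiter is an assumption.

```python
def load_store(lines, delimiter=","):
    """Parse 'key<delimiter>value' lines into a dictionary."""
    store = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: the key ends at the first delimiter on the line.
        key, _, value = line.partition(delimiter)
        store[key] = value
    return store


def kv_decision(store, host, url):
    """Mirror the VCL logic: the key is the host followed by the URL."""
    value = store.get(host + url, "")
    if value == "1":
        return "hash"
    if value == "0":
        return "pass"
    return "builtin"


PAGES_STORE = """\
example.com/,1
example.com/products,1
example.com/cart,0
example.com/checkout,0
"""

store = load_store(PAGES_STORE.splitlines())
print(kv_decision(store, "example.com", "/cart"))
```

Because the key is a plain concatenation of host and URL, a lookup only succeeds on an exact match. Pattern-based rules would need SQL or extra logic in VCL.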

Memcached

Memcached is a distributed key-value store that has client implementations in many programming languages. It is extremely fast and scalable, but offers no persistence layer. Technically, Memcached can be viewed as a simple cache that is accessible over the network.

vmod_memcached is an open source VMOD that provides access to a Memcached setup. It is available via https://github.com/varnish/libvmod-memcached, but is also packaged with Varnish Enterprise.

Let’s revisit the basic authentication example from earlier in this chapter. We featured this example to show the power of vmod_http. Let’s strip out the HTTP calls and replace them with Memcached calls.

Here’s the code:

vcl 4.1;

import crypto;
import memcached;

sub vcl_init {
   memcached.servers("--SERVER=192.168.98.101");
   memcached.error_string("error");
}

sub vcl_recv {
   # Base64 payloads may contain letters, digits, "+", "/" and "=" padding
   if (req.http.Authorization !~ "^Basic ([a-zA-Z0-9+/=]+)$") {
	   return(synth(401,"Authentication required"));
   }

   set req.http.base64 = regsub(req.http.Authorization,"^Basic ([a-zA-Z0-9+/=]+)$","\1");
   set req.http.usernamepassword = crypto.string(crypto.base64_decode(req.http.base64));
   # Split at the first colon: the password itself may contain colons
   set req.http.username = regsub(req.http.usernamepassword,"^([^:]+):.*$","\1");
   set req.http.password = regsub(req.http.usernamepassword,"^[^:]+:(.*)$","\1");
   set req.http.memcached = memcached.get(req.http.username);

   if (req.http.memcached == "error") {
	   return(synth(403));
   }

   if (req.http.password != req.http.memcached) {
	   return(synth(401,"Authentication required"));
   }
   unset req.http.Authorization;
   unset req.http.base64;
   unset req.http.usernamepassword;
   unset req.http.username;
   unset req.http.password;
   unset req.http.memcached;
}

The Memcached server is accessible via 192.168.98.101 on the standard 11211 port and contains login credentials. Varnish uses these credentials to grant or deny access to the platform.

Varnish decodes the Authorization header using the crypto.base64_decode() function. Via regular expressions, the username and password are extracted.

The Memcached key is the username, and the corresponding value is the password. If a Memcached lookup results in an error, this means the user was not found. In that case we return an HTTP 403 response.

If the passwords don’t match, we return an HTTP 401 response, which gives the client the opportunity to try logging in again.

Once authentication is successful, the Authorization header is stripped off to ensure the built-in VCL can consider the request cacheable.
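The decision flow above can be traced in a few lines of Python. This is a sketch under stated assumptions, not a drop-in replacement for the VCL: a dictionary stands in for the Memcached server, the credentials are made up, the return value 200 simply means that the request may continue, and treating an undecodable header as a 401 is an assumption.

```python
import base64
import binascii

# Hypothetical stand-in for the Memcached server: username -> password.
CREDENTIALS = {"alice": "s3cret"}


def authenticate(authorization, credentials=CREDENTIALS):
    """Return 200 (continue), 401 (retry login) or 403 (unknown user)."""
    scheme, _, token = (authorization or "").partition(" ")
    if scheme != "Basic" or not token:
        return 401
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        # Assumption: a header that cannot be decoded is treated as a 401.
        return 401
    # Split at the first colon only: the password may contain colons.
    username, separator, password = decoded.partition(":")
    if not separator or not username:
        return 401
    stored = credentials.get(username)
    if stored is None:
        # The key does not exist, so the user is unknown.
        return 403
    if password != stored:
        return 401
    return 200


header = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")
print(authenticate(header))
```

Keep in mind that this scheme, like the VCL it mirrors, compares plain-text passwords. In a real setup you would more likely store and compare password hashes.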

Memcached can also be used to store session information, or as a way to store projected results from relational databases.

Redis

Redis is also a distributed key-value store, like Memcached. It can be considered the successor of Memcached and offers a lot more features. To some extent we can say that Redis is steadily becoming the industry standard.

Unlike Memcached, Redis offers multiple data types and specific commands to interact with them in an atomic way. Redis also offers persistence, replication, security, and many more operational features.

The fun thing about Redis is that it supports Lua scripting, which allows you to script certain behavior on the server.

There is an open source VMOD available for Redis, which you can get via https://github.com/carlosabalde/libvmod-redis. It has a very extensive API.

Let’s feature an example where Redis can be used to provide a personalized caching experience.

A shopping cart example

Remember the shopping cart example from earlier in this chapter? We used the file system to access the session file, and we extracted the right key from the serialized session data.

It’s easy to replicate this example and use Redis instead. However, this example will store the product and session data in a more intuitive way:

Products will be stored as Redis hashes and product properties will be stored as fields for the hash.
Shopping cart items will be stored in a Redis list per session.

Whenever someone adds a product to their shopping cart, an RPUSH $sessionId $productId command is sent to Redis. Whenever the quantity of a product in the cart is decreased by one, an LREM $sessionId 1 $productId command is sent. When a product is removed from the shopping cart entirely, an LREM $sessionId 0 $productId command removes every occurrence of it.

Computing the number of items in the shopping cart can be done using the following Redis command:

LLEN $sessionId

If we have access to Redis from VCL, there are many ways we can offload this stateful logic from the origin, but in this example we’ll limit it to counting the shopping cart items.
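To see how these three commands interact, the following Python sketch models a cart as a plain list per session. It doesn't talk to a real Redis server. The class and method names are hypothetical, and only the count values used in this section (a positive count and 0) are modeled.

```python
from collections import defaultdict


class CartStore:
    """In-memory model of the Redis list commands used for the cart."""

    def __init__(self):
        self._lists = defaultdict(list)

    def rpush(self, key, value):
        """Append a value to the list and return the new length."""
        self._lists[key].append(value)
        return len(self._lists[key])

    def lrem(self, key, count, value):
        """Remove the first `count` occurrences of value, or all if count is 0.

        Returns the number of removed items. Negative counts are not modeled.
        """
        removed = 0
        kept = []
        for item in self._lists[key]:
            if item == value and (count == 0 or removed < count):
                removed += 1
            else:
                kept.append(item)
        self._lists[key] = kept
        return removed

    def llen(self, key):
        """Return the list length; an unknown key counts as an empty list."""
        return len(self._lists.get(key, []))


cart = CartStore()
cart.rpush("session-1", "42")   # add product 42
cart.rpush("session-1", "42")   # add product 42 again: quantity is now 2
cart.rpush("session-1", "7")    # add product 7
cart.lrem("session-1", 1, "42") # decrease the quantity of product 42 by one
print(cart.llen("session-1"))
```

Storing one list entry per unit keeps the counter cheap: the total number of items is just the length of the list, so no per-product quantities have to be summed.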

Here’s the VCL code:

vcl 4.1;

import redis;
import cookieplus;
import xbody;
import edgestash;

sub vcl_init {
	new sessions = redis.db(
		location="192.168.98.102:6379",
		shared_connections=false,
		max_connections=1);
}

sub vcl_recv {
	cookieplus.keep("PHPSESSID");
	cookieplus.write();
	if(req.url ~ "^/add/to/cart/[0-9]+$" || req.url ~ "^/remove/from/cart/[0-9]+$") {
		return(pass);
	}
	if(req.url == "/") {
		return(hash);
	}
}

sub vcl_backend_response {
	if(bereq.url == "/") {
		unset beresp.http.cache-control;
		set beresp.ttl = 3600s;
		xbody.regsub({"(<span id="items-in-cart" [^>]+>)(\w*)(</span>)"},
		{"\1{{items-in-cart}}\3"});
		edgestash.parse_response();
	}
}

sub vcl_deliver {
	sessions.command("LLEN");
	sessions.push(cookieplus.get("PHPSESSID"));
	sessions.execute();
	if(edgestash.is_edgestash() && sessions.reply_is_integer()) {
		edgestash.add_json({"{ "items-in-cart": ""}
			+ sessions.get_integer_reply()
			+ {"" }"});
		edgestash.execute();
	}
}

Let’s talk through this one:

In vcl_init we initialize a Redis client object called sessions.
In vcl_recv we strip off all cookies except PHPSESSID.
In vcl_recv we don’t allow /add/to/cart/$productId and /remove/from/cart/$productId to be served from cache.
In vcl_recv we explicitly cache the homepage, despite the PHPSESSID cookie being present.
In vcl_backend_response we use xbody.regsub() to replace the items in cart counter with a {{items-in-cart}} Edgestash placeholder.
In vcl_deliver we execute an LLEN Redis command to get the number of items in the shopping cart.
In vcl_deliver we inject the LLEN result into the items-in-cart placeholder through Edgestash.
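The two-step templating flow in this walkthrough can be sketched in Python. The sketch is a simplified stand-in for xbody and Edgestash, not their actual implementation: the regular expression is copied from the VCL, the sample markup is made up, and the render() helper only handles flat {{name}} placeholders.

```python
import json
import re

# Same pattern as the xbody.regsub() call in vcl_backend_response.
COUNTER = re.compile(r'(<span id="items-in-cart" [^>]+>)(\w*)(</span>)')
PLACEHOLDER = re.compile(r"\{\{([\w-]+)\}\}")


def to_template(body):
    """Backend side: swap the hard-coded counter for a placeholder."""
    return COUNTER.sub(r"\1{{items-in-cart}}\3", body)


def render(template, json_text):
    """Delivery side: fill each {{name}} placeholder from a JSON object."""
    values = json.loads(json_text)
    return PLACEHOLDER.sub(
        lambda match: str(values.get(match.group(1), "")), template
    )


# Hypothetical homepage fragment as the origin would return it.
body = '<p>Cart: <span id="items-in-cart" class="badge">0</span></p>'
template = to_template(body)

# The cached template is personalized per request with the LLEN result.
print(render(template, '{ "items-in-cart": "3" }'))
```

The important point is that the template is created once and cached, while the cheap placeholder substitution runs for every request.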

Instead of temporarily storing the value via vmod_kvstore, we directly connect to Redis at delivery time. Although Redis scales really well, there might be some operational concerns. Please keep in mind that your Redis server should be properly tuned if you receive a lot of incoming requests.