Security

As online services become more important, and as security risks grow at the same time, it is crucial to have the necessary security measures in place.

Widely covered vulnerabilities like Heartbleed, Shellshock, Spectre, and Meltdown were a wakeup call for the IT industry and changed the security landscape.

In this section we’ll cover security from two angles:

  • Prevention: how do we reduce attack vectors?
  • Mitigation: how do we reduce the damage if we still manage to get hacked?

Because Varnish operates at the edge, it is our first line of defense, but also the first component that will be under attack.

Caches are designed to store large amounts of data in a tightly packed space, and they prioritize performance. Even so, Varnish does an exceptional job when it comes to defensive coding practices, secure design, and maintaining cache integrity.

But that doesn’t mean we shouldn’t pay attention to security and potential risks. Let’s look at how we can prevent hacking and mitigate damage.

Firewalling

The very first thing we do is to shut down all ports that are not essential. This is a preventive measure.

Varnish will typically operate on port 80. If native-TLS is active, port 443 also needs to be accessible. The same applies if Hitch is used for TLS termination.

There are situations where Varnish sits behind a load balancer. In that case, the load balancer will be exposed to the outside world, and Varnish isn’t.

While ports 80 and 443 will be exposed to the outside world, there is also access to the Varnish CLI to consider.

The Varnish CLI, which runs on port 6082 by default, should only be accessed by IP addresses or IP ranges that are entitled to access it.

In most cases these will be private IP addresses or ranges that aren’t accessible via the internet. In that case it makes sense to set the -T parameter to only listen on a private IP address within the range of the management network.
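As an illustration, the CLI listener can be bound to a management-network address while the firewall drops CLI traffic from everywhere else. The 10.0.0.5 address and the 10.0.0.0/24 range below are hypothetical examples; substitute your own management network:

```shell
# Bind the CLI to a private management-network address only
varnishd -a :80 -f /etc/varnish/default.vcl -T 10.0.0.5:6082

# iptables sketch: allow the management range, drop everyone else
iptables -A INPUT -p tcp --dport 6082 -s 10.0.0.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 6082 -j DROP
```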

Cache encryption

In the unlikely event that someone can hack the varnishd process, cached data can be accessed, and maybe even modified. Depending on the sensitivity of that data, this might result in a serious security risk.

A possible mitigation strategy for Varnish Enterprise users is to use Total Encryption.

Total Encryption is a Varnish Enterprise feature, written in VCL, that leverages vmod_crypto.

Using Total Encryption for non-persistent memory caches only requires the following include:

include "total-encryption/random_key.vcl";

The files you include are automatically shipped with Varnish Enterprise.

Objects are encrypted using the AES-256 cipher with a dual-key algorithm for extra security.

  • The first key is a 128-bit randomly generated number that is stored in kernel space for the duration of varnishd’s lifetime.
  • The second key contains the request hash and isn’t stored anywhere.

These two keys are used to create an HMAC signature that represents our master key. The random number is the key of our HMAC signing object, and the request hash is the value that is signed.

The HMAC signature is generated using the Linux Kernel Crypto API. This means that values are never stored in user space and are kept inside the Linux kernel. When varnishd restarts, a new master key is generated.
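Conceptually, the master-key derivation can be modeled in a few lines of Python. This is only an illustration of the HMAC construction described above — the function and variable names are ours, and the real implementation keeps the key material inside the Linux kernel rather than in user space:

```python
import hashlib
import hmac
import os

def master_key(process_key: bytes, request_hash: bytes) -> bytes:
    # The random per-process key signs the request hash; the HMAC
    # output acts as the per-object master key.
    return hmac.new(process_key, request_hash, hashlib.sha256).digest()

# 128-bit random key, regenerated every time the process starts
process_key = os.urandom(16)

# Every request hash yields a different per-object key.
key_a = master_key(process_key, b"hash-of-request-a")
key_b = master_key(process_key, b"hash-of-request-b")
```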

Once we have the master key, we can start encrypting. Behind the scenes the crypto.aes_encrypt_response() and crypto.aes_decrypt_response() functions are used to encrypt and decrypt content. Encryption happens in a vcl_backend_response hook, whereas the decryption happens in a vcl_deliver hook.

Not only does our AES256 encryption use our dual-key algorithm, we also add a randomly generated salt for extra security.

Total Encryption doesn’t really care whether or not you passed the right master key. It uses whatever key is presented, and if it cannot successfully decrypt the content, garbage is returned. So even if you tamper with the settings, the only way to successfully decrypt an object is if you know the request hash, the master key, and the salt.

And even if you succeed, you can only decrypt that single object because every object uses a different key.

Encrypting persisted cache objects

For persisted objects, our dual-key algorithm is implemented slightly differently.

Because the persisted cache can outlive the varnishd process, we cannot rely on the random key to still be the same for that object.

The solution is to use a local secret key that is stored on disk. We still use the request hash for the second key.

Here’s a safe way to generate the local key:

$ cat /dev/urandom | head -c 1024 > /etc/varnish/disk_secret
$ sudo chmod 600 /etc/varnish/disk_secret
$ sudo chown root: /etc/varnish/disk_secret

As you can see, the key is long enough to be secure, and the permissions are tightly locked down. The end result is the /etc/varnish/disk_secret file.

Varnish Enterprise uses the -E runtime parameter to take in the secret key, which is then exposed to VCL via the crypto.secret() function.

Again, the key is not stored inside the varnishd process but is kept in the Linux kernel.

Here’s an example of the -E runtime parameter:

varnishd -a :80 -f /etc/varnish/default.vcl -E /etc/varnish/disk_secret

Whereas memory caches include the total-encryption/random_key.vcl file, this is how persisted caches should enable Total Encryption:

include "total-encryption/secret_key.vcl";

The rest of the behavior is identical and will ensure that persisted objects can also be encrypted and decrypted.

Performance impact

Varnish Total Encryption performance is on par with any other AES implementation. AES calculations are hardware accelerated, but still quite CPU-intensive.

We performed some benchmarks, and here are some performance results with and without Total Encryption:

Mode              Requests   Bandwidth    Response time
Unencrypted       23068      17.61 Gbit   0.084 ms
Total Encryption  11353      8.68 Gbit    0.135 ms
Overhead          50.78%     50.77%       61.68%

As you can derive from the table, there is a 50% performance overhead when using Total Encryption. These tests were run on a four-core server with 100 KB objects.

Because the performance decrease is related to the CPU, adding more CPUs will bring your performance back to original levels.

Skipping encryption

Primarily because of the performance overhead, there might be situations where you don’t want to encrypt certain objects.

The crypto.aes_skip_response() function ensures that the current object is not encrypted. The example below uses this function to skip encryption on video files:

vcl 4.1;

import crypto;

sub vcl_backend_response {
	if (beresp.http.Content-Type ~ "video") {
		crypto.aes_skip_response();
	}
}

If you’re certain that some objects don’t contain any sensitive data, and if you suspect these objects are quite big, skipping them might be a good decision.

Some objects are automatically skipped: HTTP 304 responses, for example, are never encrypted. As you may remember, an HTTP 304 response has no response body, so there is nothing to encrypt.

Choosing an alternate encryption cipher

The standard AES implementation uses cipher block chaining (CBC). If you want to switch to propagating cipher block chaining (PCBC), you can set it by modifying the algorithm setting in the te_opts key-value store.

Here’s how you can do this:

sub vcl_init {
	te_opts.set("algorithm", "pcbc(aes)");
}

Header encryption

Although Total Encryption encrypts the response body of an HTTP response, it doesn’t encrypt the headers.

vmod_crypto does have the required methods to achieve this. However, you’ll have to encrypt each header manually, and separately, as you can see in the example below:

import crypto;

sub vcl_backend_response {
	set beresp.http.Content-Type = crypto.hex_encode(crypto.aes_encrypt(beresp.http.Content-Type));
}

sub vcl_deliver {
	if (resp.http.Content-Type != "") {
		set resp.http.Content-Type = crypto.aes_decrypt(crypto.hex_decode(resp.http.Content-Type));
	}
}

Jailing

Varnish uses jails to reduce the privileges of the Varnish processes.

Usually, the varnishd process runs with root privileges. It uses these privileges to load the files it needs for its operations.

Once that has happened, the jailing mechanism kicks in, and varnishd switches to an alternative user. This is the varnish user by default.

The worker process that is spawned will run as the vcache user.

It is possible to change these values via the -j runtime parameter.

Here’s an example where the management process uses the varnish-mgt user, and the worker process uses the varnish-wrk user:

varnishd -a :80 -f /etc/varnish/default.vcl -j unix,user=varnish-mgt,workuser=varnish-wrk

It is even possible to define a group to which the varnishd process and its subprocesses belong. This is done using the ccgroup option, which is also part of the -j runtime parameter.

Here’s an example:

varnishd -a :80 -f /etc/varnish/default.vcl \
	-j unix,user=varnish-mgt,ccgroup=varnish-grp,workuser=varnish-wrk

Making runtime parameters read-only

varnishd parameters that are set via the -p option can be overridden using the param.set command in varnishadm.

Some of these parameters may result in privilege escalation. This can be especially dangerous if remote CLI access is available and the CLI client gets compromised.

The -r option for varnishd can make certain parameters read-only.

Here’s an example where some sensitive runtime parameters are made read-only:

varnishd -a :80 -f /etc/varnish/default.vcl -r "cc_command, vcc_allow_inline_c, vmod_path"

When we then try to change cc_command via varnishadm, we get the following message:

$ varnishadm param.set cc_command "bla"
parameter "cc_command" is protected.
Command failed with error code 107

VCL security

When you perform tasks in VCL that are restricted to authorized hosts or users, you should write security logic in your VCL file.

We’ve already covered this in chapter 6 when we talked about purging and banning.

Here’s the very first example we used in that chapter, and it contains an ACL to prohibit unauthorized access:

vcl 4.1;

acl purge {
	"localhost";
	"192.168.55.0"/24;
}

sub vcl_recv {
	if (req.method == "PURGE") {
		if (!client.ip ~ purge) {
			return(synth(405));
		}
		return (purge);
	}
}

There might even be other parts of your web platform that should only be accessible for specific IP addresses, ranges, or hostnames.

Another way to secure your VCL, or maybe even an additional way, is to add an authentication layer. In this case Varnish would serve as an authentication gateway.

We won’t go into much detail about this because it will be covered in the next chapter. Let’s just throw in a simple example where basic authentication is used on top of the ACL to protect purges:

vcl 4.1;

acl purge {
	"localhost";
	"192.168.55.0"/24;
}

sub vcl_recv {
	if (req.method == "PURGE") {
		if (!client.ip ~ purge) {
			return(synth(405));
		}
		if (! req.http.Authorization ~ "Basic Zm9vOmJhcg==") {
			return(synth(401, "Authentication required"));
		}
		unset req.http.Authorization;
		return (purge);
	}
}

sub vcl_synth {
  if (resp.status == 401) {
	set resp.status = 401;
	set resp.http.WWW-Authenticate = "Basic";
	return(deliver);
  }
}

So not only does the client need to execute the purge from localhost or the 192.168.55.0/24 IP range, the client also needs to log in with username foo and password bar.
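The Basic credential in the Authorization header is nothing more than base64("username:password"). A quick Python check shows where the Zm9vOmJhcg== token in the VCL example comes from:

```python
import base64

def basic_auth_token(user: str, password: str) -> str:
    # HTTP Basic authentication: base64-encode "user:password"
    return base64.b64encode(f"{user}:{password}".encode()).decode()

token = basic_auth_token("foo", "bar")  # "Zm9vOmJhcg=="
```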

But again: more about authentication in the next chapter.

TLS

Remember the Heartbleed security vulnerability? The bug in the OpenSSL library allowed memory to be leaked and exposed sensitive data.

Although this vulnerability should no longer affect software that uses the updated OpenSSL version, it should serve as a warning. Varnish Enterprise uses OpenSSL for its native-TLS feature. Hitch also uses OpenSSL.

Although we really have no reason to suspect similar vulnerabilities to be present, we can put mitigating measures in place by splitting up caching and cryptography into separate services.

It sounds quite mysterious, but the truth is that it just involves using Hitch because Hitch is a separate service that runs under a different user.

Here’s an example where the socket is placed under /var/run/varnish.sock, owned by the varnish user and the varnish group. The file has 660 permissions, which only grants read and write access to the varnish user and users that are in the varnish group:

varnishd -a uds=/var/run/varnish.sock,PROXY,user=varnish,group=varnish,mode=660 \ 
	-a http=:80 -f /etc/varnish/default.vcl

Just make sure the following Hitch configuration directives are set:

backend = "/var/run/varnish.sock"
user = "hitch"
group = "varnish"
write-proxy-v2 = on

The fact that the hitch process is owned by the varnish group is what allows it to access /var/run/varnish.sock.

In the unlikely event that your Hitch setup is hacked, only this part of the memory would leak. Varnish itself, which is a separate process, running under a separate user, would remain secure.

Cache busting

Cache busting is a type of attack that involves deliberately causing cache misses to bring down the origin server.

It either happens by calling random URLs or by attaching random query strings to an otherwise cached object.

Regardless of the attack specifics, the goal is to send as much traffic to the origin as possible in an attempt to bring it down.

There are measures we can take to prevent certain types of cache busting as well as measures to mitigate the impact of cache busting.

Let’s have a look at the various options:

Query string filtering

Quite often, cache busting attacks use random query string parameters to cause cache misses. While we cannot completely prevent this from happening, we can at least make sure that we only allow the query string parameters we need.

To enforce this, we use vmod_urlplus, an enterprise-only VMOD that can strip out unwanted query string parameters.

Imagine that the attacker calls tens of thousands of URLs that look like this:

http://example.com/?DA80F1C6-2244-4F48-82FF-807445621783
http://example.com/?59FEF405-3292-4326-A214-53A5681D3E24
http://example.com/?BFD51681-CFE0-4C4B-A276-FB638F5FCB82
http://example.com/?DE5B3B6D-AF08-435D-B361-C5235460418E
http://example.com/?FE0B8B92-E163-4276-B12B-AF9A2B355ED6
http://example.com/?B4D05958-D0A9-4554-A64D-63D6FD4F16C5
http://example.com/?3F3915C5-97B5-4C03-ACB8-BF4FCA63D783
http://example.com/?AF86F196-26D6-4FFC-8A2D-7B5040F6B903
http://example.com/?EA72A80C-DDDB-45AF-BCE1-19357C8484DA

The urlplus.query_keep() and urlplus.query_keep_regex() functions will help us get rid of this garbage while still keeping the important query string parameters. Here’s the VCL code:

vcl 4.1;

import urlplus;

sub vcl_recv {
	urlplus.query_keep("id");
	urlplus.query_keep("sort");
	urlplus.query_keep_regex("^product_");
	urlplus.write();
}

This example will only keep the following query string parameters while removing all others:

  • id
  • sort
  • All query string parameters that start with product_

Once the filtering is complete, urlplus.write() will not only write back the value to req.url, it will also sort the query string alphabetically.

The sorting feature is great because it prevents cache busting when the attacker reorders the query string parameters.
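To make the mechanics concrete, here is a small Python model of the keep-and-sort behavior. This is a conceptual sketch of the same idea, not the VMOD’s implementation:

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit

def filter_and_sort(url: str, keep=("id", "sort"), keep_regex=r"^product_") -> str:
    # Keep only whitelisted parameters (by exact name or by regex), then
    # sort them alphabetically -- roughly what urlplus.query_keep(),
    # urlplus.query_keep_regex(), and urlplus.write() do combined.
    parts = urlsplit(url)
    params = sorted((k, v) for k, v in parse_qsl(parts.query)
                    if k in keep or re.search(keep_regex, k))
    query = urlencode(params)
    return parts.path + ("?" + query if query else "")
```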

With this VCL code in place for query string filtering, we could call /?id=1&foo=bar&sort=asc&product_category=shoes&xyz=123 and end up with the following output:

varnishlog -g request -i ReqUrl
*   << Request  >> 3964951
-   ReqURL         /?id=1&foo=bar&sort=asc&product_category=shoes&xyz=123
-   ReqURL         /?foo=bar&id=1&product_category=shoes&sort=asc&xyz=123
-   ReqURL         /?id=1&product_category=shoes&sort=asc
**  << BeReq    >> 3964952

  • The first ReqURL tag displays the input URL.
  • The second ReqURL tag shows the alphabetically sorted version.
  • The third ReqURL tag shows the filtered end result.

If you’re not using Varnish Enterprise, you can achieve the same result with the regsuball() function, but it will require writing potentially complex regular expressions.

You may also remember the std.querysort() function. This function is readily available in Varnish Cache and will at least take care of the query string sorting.

Don’t forget that query string filtering only works on parameter names, not on values. You can still add random values to id and cause cache busting.

If you want to protect the values of your query string parameters, you can check their values, as illustrated here:

if(urlplus.query_get("id") !~ "^[0-9]{1,9}$") {
	return(synth(400));
}

So at least you narrow the values of id down to numeric ones. But this still leaves us with nine digits to abuse, which might be enough to take down the origin.

Max connections

It’s safe to say that query string filtering doesn’t offer a foolproof solution to prevent cache busting.

If we can’t fully prevent this from happening, we can focus on reducing the impact.

By setting the .max_connections backend setting, we can control the maximum number of open connections to the origin server.

Here’s an example where we allow a maximum of 100 connections to the origin server:

vcl 4.1;

backend default {
	.host = "origin.example.com";
	.port = "80";
	.max_connections = 100;
}

Setting .max_connections is always a good idea, not just to counter attacks. But it is important to know that once the limit is reached, excess requests will receive an HTTP 503 error because the backend is not available to them.

Unfortunately this is not an elegant solution because you also punish regular visitors. One could even say that the HTTP 503 is just as bad as an outage.

Backend throttling

A better mitigation strategy is to punish the culprit. We can use vmod_vsthrottle to make this happen.

Previous vmod_vsthrottle examples featured rate limiting at the request level. This prevents users from sending too many requests within a given timeframe.

In this case, we’re moving the rate-limiting logic to the backend side of Varnish: we’ll temporarily block access to Varnish for users that have caused too many backend requests within a given timeframe.

Here’s an example:

vcl 4.1;

import vsthrottle;

sub vcl_backend_fetch {
	if (vsthrottle.is_denied(client.ip, 100, 1s, 1m)) {
		return(error(429, "Too Many Requests"));
	}
}

If a client, identified by its IP address, makes more than 100 requests per second that result in a cache miss, access is blocked for one minute.
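The behavior can be approximated with a simple token-bucket model in Python. This is a conceptual sketch of what vsthrottle.is_denied() does, not the VMOD’s actual code:

```python
import time

class Throttle:
    """Token-bucket sketch of vsthrottle.is_denied(): each key gets a
    bucket of `limit` tokens that refills over `period` seconds; once a
    bucket runs dry, the key is blocked for `block_duration` seconds."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.buckets = {}   # key -> (tokens, last_seen)
        self.blocked = {}   # key -> blocked_until

    def is_denied(self, key, limit, period, block_duration):
        now = self.clock()
        if self.blocked.get(key, 0.0) > now:
            return True
        tokens, last = self.buckets.get(key, (float(limit), now))
        # Refill tokens proportionally to the time elapsed since last call.
        tokens = min(float(limit), tokens + (now - last) * limit / period)
        if tokens < 1.0:
            self.blocked[key] = now + block_duration
            return True
        self.buckets[key] = (tokens - 1.0, now)
        return False
```

With a limit of 3 per second and a 60-second block, the fourth burst call from the same key is denied, and the key stays denied until the block expires.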

Although vmod_vsthrottle is packaged with Varnish Enterprise, it is an open source module that is part of the Varnish Software VMOD collection. We talked about it in chapter 5 in case you forgot.

Slowloris attacks

Slowloris attacks are denial of service attacks that hold the connection open as long as possible. The goal is to exhaust all available connections and cause new, valid connections to be refused.

Merely keeping a connection open is not good enough for the attacker: Varnish has a timeout_idle runtime parameter with a default value of five seconds, which closes a client connection once it has been idle for five seconds.

Slowloris attacks are much smarter than that: they actually send partial requests, adding data as slowly as possible, to prevent the timeout_idle from being triggered.

By tuning idle_send_timeout, you can control how long Varnish waits between individual pieces of received data. The default value is 60 seconds, which means Varnish is willing to wait up to one minute between every chunk of data being sent.

Luckily, there’s also the send_timeout runtime parameter, with a default value of 600 seconds. This parameter represents the total timeout: basically a last-byte timeout.

If you’re suffering from slowloris attacks, you can tune these settings to mitigate the impact.
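As a sketch, all three timeouts can be tightened when starting varnishd. The values below are illustrative only; what is appropriate depends on your traffic:

```shell
varnishd -a :80 -f /etc/varnish/default.vcl \
	-p timeout_idle=5 \
	-p idle_send_timeout=10 \
	-p send_timeout=120
```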

Web application firewall

One can say that with the power of VCL, and the way it allows you to intelligently block unwanted requests, Varnish is really also a web application firewall (WAF).

However, writing the VCL, inspecting requests, and making sure you’re prepared for the next zero-day exploit can be a lot of work.

To make things easier, and to transform Varnish into an actual WAF, Varnish Enterprise offers a WAF add-on that leverages the ModSecurity library.

Installing the Varnish WAF

If you have the right Varnish Enterprise subscription, you’ll have access to the package repository that contains the varnish-plus-waf package.

The example below installs that package along with Varnish Enterprise itself. This example is targeted at Red Hat, CentOS, and Fedora systems:

sudo yum install varnish-plus varnish-plus-waf

If you’re on a Debian or Ubuntu system, you’ll use the following command:

sudo apt install varnish-plus varnish-plus-waf

The ModSecurity library has ruleset definitions that are stored in separate files. You can define your own rules, but you can also download the OWASP Core Rule Set (OWASP CRS).

Here’s how you download these rules to your Varnish server:

sudo get_owasp_crs

The result is that a collection of files is placed in /etc/varnish/modsec/owasp-crs-{VERSION_NUMBER}, which you can then load into vmod_waf.

In this case, this leads to the following result:

OWASP CRS VERSION v3.1.1 installed to /etc/varnish/modsec/owasp-crs-v3.1.1

The vmod_waf API, and the complexity of the WAF in general, are neatly abstracted away by the include "waf.vcl" statement.

Here’s how you can easily enable the WAF:

vcl 4.1;

include "waf.vcl";

sub vcl_init {
	varnish_waf.add_files("/etc/varnish/modsec/modsecurity.conf");
	varnish_waf.add_files("/etc/varnish/modsec/owasp-crs-v3.1.1/crs-setup.conf");
	varnish_waf.add_files("/etc/varnish/modsec/owasp-crs-v3.1.1/rules/*.conf");
}

The varnish_waf.add_files() calls load the various rulesets for the WAF to use.

Here’s an extract of the files inside the rules directory:

REQUEST-900-EXCLUSION-RULES-BEFORE-CRS.conf.example
REQUEST-901-INITIALIZATION.conf
REQUEST-903.9001-DRUPAL-EXCLUSION-RULES.conf
REQUEST-903.9002-WORDPRESS-EXCLUSION-RULES.conf
REQUEST-903.9003-NEXTCLOUD-EXCLUSION-RULES.conf
REQUEST-903.9004-DOKUWIKI-EXCLUSION-RULES.conf
REQUEST-903.9005-CPANEL-EXCLUSION-RULES.conf
REQUEST-905-COMMON-EXCEPTIONS.conf
REQUEST-910-IP-REPUTATION.conf
REQUEST-911-METHOD-ENFORCEMENT.conf
REQUEST-912-DOS-PROTECTION.conf
REQUEST-913-SCANNER-DETECTION.conf
REQUEST-920-PROTOCOL-ENFORCEMENT.conf
REQUEST-921-PROTOCOL-ATTACK.conf
REQUEST-930-APPLICATION-ATTACK-LFI.conf
REQUEST-931-APPLICATION-ATTACK-RFI.conf
REQUEST-932-APPLICATION-ATTACK-RCE.conf
REQUEST-933-APPLICATION-ATTACK-PHP.conf
REQUEST-941-APPLICATION-ATTACK-XSS.conf
REQUEST-942-APPLICATION-ATTACK-SQLI.conf
REQUEST-943-APPLICATION-ATTACK-SESSION-FIXATION.conf
REQUEST-944-APPLICATION-ATTACK-JAVA.conf
REQUEST-949-BLOCKING-EVALUATION.conf
RESPONSE-950-DATA-LEAKAGES.conf
RESPONSE-951-DATA-LEAKAGES-SQL.conf
RESPONSE-952-DATA-LEAKAGES-JAVA.conf
RESPONSE-953-DATA-LEAKAGES-PHP.conf
RESPONSE-954-DATA-LEAKAGES-IIS.conf
RESPONSE-959-BLOCKING-EVALUATION.conf
RESPONSE-980-CORRELATION.conf
RESPONSE-999-EXCLUSION-RULES-AFTER-CRS.conf.example

SQL injections are a common way to take advantage of an application that uses a SQL database. When input parameters are parsed into the SQL statement and the input is poorly sanitized, an attacker can inject malicious input in an attempt to retrieve data or modify the database.

Inside REQUEST-942-APPLICATION-ATTACK-SQLI.conf there are a variety of SQL-related rules; here’s a specific one that detects attempts to run sleep() statements:

SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "@rx (?i:sleep\(\s*?\d*?\s*?\)|benchmark\(.*?\,.*?\))" \
	"id:942160,\
	phase:2,\
	block,\
	capture,\
	t:none,t:urlDecodeUni,\
	msg:'Detects blind sqli tests using sleep() or benchmark().',\
	logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}',\
	tag:'application-multi',\
	tag:'language-multi',\
	tag:'platform-multi',\
	tag:'attack-sqli',\
	tag:'OWASP_CRS/WEB_ATTACK/SQL_INJECTION',\
	ver:'OWASP_CRS/3.1.1',\
	severity:'CRITICAL',\
	setvar:'tx.msg=%{rule.msg}',\
	setvar:'tx.sql_injection_score=+%{tx.critical_anomaly_score}',\
	setvar:'tx.anomaly_score_pl1=+%{tx.critical_anomaly_score}',\
	setvar:'tx.%{rule.id}-OWASP_CRS/WEB_ATTACK/SQLI-%{MATCHED_VAR_NAME}=%{tx.0}'"

However, by default the WAF won’t block these kinds of requests because the SecRuleEngine DetectionOnly setting doesn’t allow this.

By setting SecRuleEngine on in /etc/varnish/modsec/modsecurity.conf, requests matching any of the ModSecurity rules will be blocked.

Here’s the malicious request that injects sleep(10) into the request body of a POST request:

curl -XPOST -d "sleep(10)" http://example.com

The VSL WAF tag reports the following:

varnishlog -g request -i WAF
*   << Request  >> 5
**  << BeReq    >> 6
--  WAF            proto: / POST HTTP/1.1
--  WAF            ReqHeaders: 8
--  WAF            ReqBody: 9
--  WAF            LOG: [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Matched "Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `10' ) [file "/etc/varnish/modsec/owasp-crs-v3.1.1/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "80"] [id "949110"] [rev ""] [msg "Inbound Anomaly Score Exceeded (Total Score: 10)"] [data ""] [severity "2"] [ver ""] [maturity "0"] [accuracy "0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "127.0.0.1"] [uri "/"] [unique_id "161070346090.720057"] [ref ""]

Because we matched the security rule, the anomaly score increased and exceeded the threshold. That’s why we get the Inbound Anomaly Score Exceeded message in the output.
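The blocking decision follows the CRS anomaly-scoring model: each matching rule adds its severity score, and the request is blocked once the total reaches the inbound threshold (5, as the "Operator `Ge' with parameter `5'" fragment in the log shows). A minimal Python sketch of that logic, not ModSecurity’s implementation:

```python
def inbound_anomaly_check(matched_rule_scores, threshold=5):
    # Each matched rule contributes its anomaly score; the request is
    # blocked when the total reaches the threshold. The log above shows
    # a total score of 10 against a threshold of 5.
    total = sum(matched_rule_scores)
    return total, total >= threshold

CRITICAL = 5  # score of a critical-severity rule in the CRS defaults
score, blocked = inbound_anomaly_check([CRITICAL, CRITICAL])  # (10, True)
```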

In the end, we received an HTTP/1.1 403 Forbidden response, preventing us from impacting a potentially vulnerable origin server.

When using the Varnish WAF it is advisable to set the thread_pool_stack runtime parameter to 96 KB. This can be done by adding -p thread_pool_stack=96k to varnishd.


© Varnish Software, Wallingatan 12, 111 60 Stockholm, Organization nr. 556805-6203