Varnish Cache Plus

Rewrite

Description

vmod_rewrite aims to reduce the amount of VCL code dedicated to URL and header manipulation. The usual way of handling this in VCL is a long list of if-else clauses:

sub vcl_recv {
    # simple regex
    if (req.url ~ "pattern1") {
        set req.url = regsub(req.url, "pattern1", "substitute1");
    # case insensitivity
    } else if (req.url ~ "(?i)PaTtERn2") {
        set req.url = regsub(req.url, "(?i)PaTtERn2", "substitute2");
    # capturing group
    } else if (req.url ~ "pattern([0-9]*)") {
        set req.url = regsub(req.url, "pattern([0-9]*)", "substitute\1");
    }
    ...
}

Using vmod_rewrite, the VCL boils down to:

import rewrite;

sub vcl_init {
    new rs = rewrite.ruleset("/path/to/file.rules");
}

sub vcl_recv {
    set req.url = rs.match_rewrite(req.url);
}

with file.rules containing:

"pattern1"        "substitute1"
"(?i)PaTtERn2"    "substitute2"
"pattern([0-9]*)" "substitute\1"
  ...

This is especially useful for cleaning up URL normalization and redirect generation code. Thanks to the object-oriented approach, you can create multiple rulesets, one for each task, and keep your VCL code clean and isolated.

API

rewrite.ruleset()

OBJECT ruleset(STRING path=NULL, STRING string=NULL, INT min_fields = 2, ENUM {any, regex, prefix, suffix, exact} type = "regex")

Parse the rules in the file indicated by path, or in the text passed as string, and create a new ruleset object that stores them.

Rules are listed one per line, in the following format (unless type=any):

PAT [SUB...]

If type=any, a first field is inserted to give the type for that line:

TYPE PAT [SUB...]

PAT and the SUBs are quoted strings, TYPE is not, and all fields are separated by whitespace. PAT is the pattern to match (a regex by default) and SUB is the string to rewrite the match with. Empty lines and lines starting with “#” are ignored.

TYPE (in the rule file) and type (as function argument) can be:

  • regex: PAT is matched as a regular expression.
  • prefix: PAT is a string that tries to match the beginning of the target.
  • suffix: PAT is a string that tries to match the end of the target.
  • exact: PAT is a string and tries to match the full string.
  • any: the type is read from the first field (TYPE) of each line in the rule file. any is only valid as the function argument; it can’t be used as a TYPE in the rule file.

min_fields dictates how many strings each line must contain (not including TYPE). If a line has fewer, the call fails and the VCL won’t load.

This method is called in vcl_init and you can create as many objects as you need:

sub vcl_init {
    new redirect = rewrite.ruleset("/path/to/redirect.rules");
    new normalize = rewrite.ruleset(string = {"
        # this is a comment
        "pattern1"        "substitute1"

        "(?i)PaTtERn2"    "substitute2"
        "pattern([0-9]*)" "substitute\1"
        "});
}
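
The type=any format can be sketched as follows. This is an illustration only: the ruleset name, the patterns and the 403 response are hypothetical, and the ruleset is used for matching only (min_fields = 1), so no substitution strings are needed:

import rewrite;

sub vcl_init {
    new blocked = rewrite.ruleset(string = {"
        # TYPE   PAT (TYPE is unquoted, PAT is quoted)
        prefix   "/admin/"
        suffix   ".bak"
        exact    "/server-status"
        regex    "^/user/[0-9]+/private"
    "}, min_fields = 1, type = any);
}

sub vcl_recv {
    # true as soon as one of the rules above matches the URL
    if (blocked.match(req.url)) {
        return (synth(403, "Forbidden"));
    }
}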

.add_rules()

VOID .add_rules(STRING path=NULL, STRING string=NULL)

Add rules to an existing ruleset. This is a convenience for split-VCL setups where rules need to be centralized in a single ruleset, but initialized in multiple places to be co-located with related logic.

This function can only be called from vcl_init. As with the ruleset constructor, the path and string arguments are mutually exclusive. When rules are added in multiple places, they are evaluated in the order they were added. A ruleset can be given rules in its constructor and then have more added.
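
As a minimal sketch of this, the ruleset below gets one rule from its constructor and a second one from .add_rules(). The ruleset name and the URL patterns are hypothetical; in a split-VCL setup the two calls would typically live in the vcl_init of different included files:

import rewrite;

sub vcl_init {
    # initial rule, given to the constructor
    new redirect = rewrite.ruleset(string = {"
        "^/old-blog/(.*)"  "/blog/\1"
    "});

    # added afterwards, so it is evaluated after the rule above
    redirect.add_rules(string = {"
        "^/old-shop/(.*)"  "/shop/\1"
    "});
}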

.match()

BOOL .match(STRING)

Returns true if a rule in the ruleset matches the argument, false otherwise.

Example:

import rewrite;

sub vcl_init {
	new rs = rewrite.ruleset(string = {"
		"^/admin/"
		"^/purge/"
		"^/private"
	"}, min_fields = 1);
}

sub vcl_recv {
	if (rs.match(req.url)) {
		return (synth(405, "Restricted"));
	}
}

.rewrite()

STRING .rewrite(INT field = 2, ENUM {regsub, regsuball, only_matching} mode = "regsuball")

.rewrite() is called after .match(), and applies the previously matched rule, skipping the lookup operation.

By default, the first substitute string (index 2 of the rule definition) is used, but you can specify a different field if needed. If the field doesn’t exist, the string is not rewritten.

mode dictates how the string should be rewritten:

  • regsub: replace only the first match found
  • regsuball: replace all matches
  • only_matching: only output the rewritten (first) match, discarding both the prefix and suffix. It’s the same as the --only-matching option of GNU grep.

For example, considering this rule:

"bar" "qux"

and the string “/foo/bar/bar”:

  • regsub will output “/foo/qux/bar”
  • regsuball will output “/foo/qux/qux”
  • only_matching will output “qux”
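
The same comparison can be written as a VCL sketch. The X-Regsub, X-Regsuball and X-Only-Matching header names are hypothetical and only hold the results; the values in the comments are the ones listed above:

import rewrite;

sub vcl_init {
    new rs = rewrite.ruleset(string = {"
        "bar" "qux"
    "});
}

sub vcl_recv {
    # match once, then apply the matched rule in each mode
    if (rs.match("/foo/bar/bar")) {
        set req.http.X-Regsub        = rs.rewrite(mode = regsub);        # /foo/qux/bar
        set req.http.X-Regsuball     = rs.rewrite(mode = regsuball);     # /foo/qux/qux
        set req.http.X-Only-Matching = rs.rewrite(mode = only_matching); # qux
    }
}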

You can use this function to retrieve multiple values associated with one rule:

import std;
import rewrite;

sub vcl_init {
	new rs = rewrite.ruleset(string = {"
		# pattern       ttl     grace   keep
		"\.(js|css)"    "1m"    "10m"   "1d"
		"\.(jpg|png)"   "1w"    "1w"    "10w"
	"});
}

sub vcl_backend_response {
	# if there's a match, convert text to duration
	if (rs.match(bereq.url)) {
		# field 1 is the pattern, so ttl, grace and keep are fields 2, 3 and 4
		set beresp.ttl   = std.duration(rs.rewrite(2, mode = only_matching), 0s);
		set beresp.grace = std.duration(rs.rewrite(3, mode = only_matching), 0s);
		set beresp.keep  = std.duration(rs.rewrite(4, mode = only_matching), 0s);
	}
}
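
The .match() and .rewrite() pair also lends itself to redirect generation. The following is a sketch under assumptions, not a reference implementation: the ruleset name, the URL patterns and the X-Redirect-To header are all hypothetical, and the header is only used to carry the target from vcl_recv to vcl_synth:

import rewrite;

sub vcl_init {
    new redirect = rewrite.ruleset(string = {"
        "^/old-blog/(.*)"   "/blog/\1"
        "^/shop/item/(.*)"  "/products/\1"
    "});
}

sub vcl_recv {
    # look the URL up once, then reuse the matched rule for the rewrite
    if (redirect.match(req.url)) {
        set req.http.X-Redirect-To = redirect.rewrite();
        return (synth(301, "Moved Permanently"));
    }
}

sub vcl_synth {
    # turn the synthetic 301 into a redirect to the rewritten URL
    if (resp.status == 301 && req.http.X-Redirect-To) {
        set resp.http.Location = req.http.X-Redirect-To;
        return (deliver);
    }
}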

.match_rewrite()

STRING .match_rewrite(STRING, INT field = 2, ENUM {regsub, regsuball, only_matching} mode = "regsuball")

This is a convenience function combining the .match() and .rewrite() methods:

redirect.match_rewrite(req.url, field = 3, mode = regsuball);

is functionally equivalent to:

redirect.match(req.url);
redirect.rewrite(field = 3, mode = regsuball);

You can use it to apply the first matching rewrite rule to a string:

import rewrite;

sub vcl_init {
	new rs = rewrite.ruleset(string = {"
		"^(api|www)\.example\.com$"     "example.com"
		"^img(|1|2|3)\.example\.com$"   "img.example.com"
		"temp\.example\.com"            "test.example.com"
	"});
}

sub vcl_recv {
	# normalize the host
	set req.http.host = rs.match_rewrite(req.http.host);
}

.replace()

STRING .replace(STRING, INT field = 2, ENUM {regsub, regsuball, only_matching} mode = "regsuball")

.replace() is deprecated. Please use .match_rewrite() instead.
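
Since both methods have the same signature, migrating should be a matter of renaming the call. The sketch below assumes that the two behave identically for the same arguments, which the text above does not state explicitly; rs is a hypothetical ruleset created in vcl_init:

sub vcl_recv {
    # before (deprecated):
    #   set req.url = rs.replace(req.url);
    # after:
    set req.url = rs.match_rewrite(req.url);
}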

Availability

vmod_rewrite is available starting from version 4.1.7r1.

Installation

This vmod is packaged directly in the varnish-plus package.

More

The package contains further installation and usage instructions, accessible via man vmod_rewrite.

Contact support@varnish-software.com if you need assistance.