Body Access & Transformation (xbody)

Varnish 6.0

Description

vmod-xbody provides access to request and response bodies.

On the response side, xbody is a streaming regular expression engine which allows Varnish to manipulate, capture, hash, and log body text. xbody supports full PCRE regular expressions, capture groups, and backreferences.

xbody supports two important modes: regsub and capture.

regsub performs a regular expression substitution. Any number of regsub substitutions can be done on a response body.

capture finds a pattern and then puts its substitution, or value, into a JSON object which can then be queried using VCL or included directly in an Edgestash template. Any number of capture calls can be done on a response body.

When regsub and capture are combined, Varnish can convert a dynamic, personalized response into a JSON object and an Edgestash template, allowing otherwise uncacheable content to be cached and accelerated. Please see Edgestash for more information on JSON-based templating.

Note: Because xbody is a streaming regex parser, embedded use of anchors (^ and $) is not supported. Anchors can only be used as the first and last characters of your regular expression.
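
The anchor rule can be sketched as follows. This is a minimal illustration, assuming (as the JSONP example in this document does) that ^ and $ refer to the start and end of the body rather than to individual lines. The patterns and replacement text are placeholders, not taken from a real site.

```vcl
import xbody;

sub vcl_backend_response {
    # Supported: ^ as the first character of the pattern.
    xbody.regsub("^<html>", "<html lang=en>");

    # Supported: $ as the last character of the pattern.
    xbody.regsub("</html>$", "<!-- cached --></html>");

    # Not supported: anchors embedded inside the pattern,
    # for example "(^|,)value" or "end$|stop".
}
```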

For assistance in creating PCRE compatible regular expressions, please use regex101.com.

For the legacy VMOD, see vmod_bodyaccess.

Example VCL

Example regsub and capture

Input (backend): This is a test!

import std;
import xbody;

sub vcl_backend_response
{
    xbody.regsub("test", "new string");
    xbody.regsub("\sis", " isn't");
    xbody.regsub("string", "cat");
    xbody.capture("name", "new (\w+)", "\1");
}

sub vcl_deliver
{
    std.log("Name: " + xbody.get("name"));
    std.log("Capture JSON: " + xbody.get_all());
}

Output (cache): This isn't a new cat!

Captures (xbody.get_all()):

{
  "name": "cat"
}

Change All Domain References

import xbody;

sub vcl_backend_response
{
    xbody.regsub("www.example.com", "test.example.com");
    xbody.regsub("admin@example.com", "dev@test.example.com");
}

JavaScript CSP Nonce

import crypto;
import edgestash;
import xbody;

sub vcl_backend_response{
    if (beresp.http.Content-Type ~ "html") {
        xbody.regsub("<script>", "<script nonce={{nonce}}>");
        edgestash.parse_response();
    }
}

sub vcl_deliver {
    if (edgestash.is_edgestash()) {
        set req.http.X-nonce = crypto.hex_encode(crypto.urandom(16));
        set resp.http.Content-Security-Policy = "script-src 'nonce-" +
            req.http.X-nonce + "'";
        edgestash.add_json({"
            {
              "nonce":""} + req.http.X-nonce + {""
            }
        "});
        edgestash.execute();
    }
}

JSONP

Convert a JSON response into JSONP using a callback query parameter.

import urlplus;
import xbody;

sub vcl_backend_response {
    # Insert the JSONP callback
    if (urlplus.query_get("callback") && beresp.http.Content-Type ~ "json") {
        xbody.regsub("^", urlplus.query_get("callback") + "(");
        xbody.regsub("$", ");");
        set beresp.http.Content-Type = regsub(beresp.http.Content-Type, "json", "javascript");
    }
}

Personalization Caching

Cache PHP pages with personalized content. Personalized content is wrapped in an element with an edgestash-name attribute. The contents of these elements are templated and delivered per session.

<span edgestash-name="greeting">
Hello [name]!
</span>

<span edgestash-name="number-items">
5
</span>

Content ...

import cookieplus;
import edgestash;
import xbody;
import ykey;

sub vcl_recv
{
    # Purge session data
    if (req.method == "POST") {
        # Broadcaster can be used to send this purge to a cluster
        ykey.purge(cookieplus.get("PHPSESSID"));
    }

    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }

    return(hash);
}

sub vcl_hash
{
    # Add the session to xbody data
    if (req.http.host ~ "^xbody\.") {
        hash_data(cookieplus.get("PHPSESSID", "guest"));
    }
}

sub vcl_pass
{
    # When you purge, you may be forced to pass
    if (req.http.host ~ "^xbody\.") {
        return (restart);
    }
}

sub vcl_backend_fetch
{
    # Remove xbody host marker
    unset bereq.http.xbody;
    if (bereq.http.Host ~ "^xbody\.") {
        set bereq.http.Host = regsub(bereq.http.Host, "^xbody\.", "");
        set bereq.http.xbody = "true";
    }
}

sub vcl_backend_response
{
    if (beresp.http.Content-Type ~ "text") {
        if(bereq.http.xbody) {
            # Extract xbody data, add a ykey session key
            xbody.capture("\2", {"(<[^>]*edgestash-name="?([^"\s]+)"?[^>]+>)([^>]*)<"}, "\3", "g");
            ykey.add_key(cookieplus.get("PHPSESSID", "guest"));
            set beresp.ttl = 1d;
        } else {
            # Edgestash template
            xbody.regsub({"(<[^>]*edgestash-name="?([^"\s]+)"?[^>]+>)([^>]*)<"}, "\1{{\2}}<");
            edgestash.parse_response();
        }
    }
}

sub vcl_deliver
{
    if (edgestash.is_edgestash()) {
        # Use the homepage as the xbody data source
        edgestash.add_json_url("/", json_host="xbody." + req.http.host, xbody=true);
        edgestash.execute();
    }
}

Functions

regsub

VOID regsub(STRING pattern, STRING substitution, STRING mode, INT max)

Perform a regex substitution on the response body. Can only be used in vcl_backend_response.

Arguments

  • STRING pattern - PCRE regular expression to find.
  • STRING substitution - Substitution text. Backreferences are allowed (\1 through \9).
  • STRING mode - (optional) Pattern mode. Defaults to g. Possible values: [none], g.
  • INT max - (optional) Maximum number of substitutions. Defaults to 0 (no limit).

Returns

Nothing.
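
A short sketch of the optional max argument. It assumes that VMOD arguments can be passed by name, the way the Personalization Caching example passes json_host, so that mode can be left at its default. The patterns and replacement text are placeholders.

```vcl
import xbody;

sub vcl_backend_response {
    # Default mode (g) and no limit: replace every occurrence.
    xbody.regsub("http://", "https://");

    # Stop after the first substitution.
    xbody.regsub("<head>", "<head><meta name=generator content=varnish>", max = 1);
}
```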

capture

VOID capture(STRING name, STRING pattern, STRING value, STRING mode, INT max)

Perform a regex capture on the response body. If pattern is found, value is set as the value of name. Use xbody.get() to read the value in vcl_deliver. Can only be used in vcl_backend_response. When used, streaming is disabled so that the captured value can be safely read in vcl_deliver.

Arguments

  • STRING name - JSON variable name. Backreferences are allowed (\1 through \9).
  • STRING pattern - PCRE regular expression to find.
  • STRING value - Value to use if pattern is found. Backreferences are allowed (\1 through \9).
  • STRING mode - (optional) Pattern mode. Defaults to no mode. Global mode (g) repeats the capture and produces a JSON array value of all captures found.
  • INT max - (optional) Maximum number of captures. Defaults to 0 (no limit).

Returns

Nothing.
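
The difference between the default mode and global mode can be sketched like this. The capture names (title, headings) and the patterns are hypothetical, and the exact JSON layout of the result is not shown here because it is not specified beyond "a JSON array value".

```vcl
import std;
import xbody;

sub vcl_backend_response {
    # No mode: a single capture stored under "title".
    xbody.capture("title", "<title>([^<]*)</title>", "\1");

    # Global mode: every <h2> heading is collected into a JSON array.
    xbody.capture("headings", "<h2>([^<]*)</h2>", "\1", "g");
}

sub vcl_deliver {
    # Inspect the resulting JSON object in the log.
    std.log("Captures: " + xbody.get_all());
}
```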

get

STRING get(STRING name, STRING default)

Get a capture value by name. Can only be used in vcl_deliver, or in vcl_synth when combined with xbody.synth().

Arguments

  • STRING name - The name of the capture. Please see xbody.capture().
  • STRING default - (optional) The default value if name is not found. Defaults to an empty string.

Returns

The value of name if it exists, otherwise default.
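
A minimal sketch of reading a capture with a fallback. The capture name, the fallback text, and the X-Page-Title header are illustrative assumptions.

```vcl
import xbody;

sub vcl_backend_response {
    xbody.capture("title", "<title>([^<]*)</title>", "\1");
}

sub vcl_deliver {
    # Falls back to "untitled" when no capture named "title" exists.
    set resp.http.X-Page-Title = xbody.get("title", "untitled");
}
```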

get_all

STRING get_all()

Get all capture values as a JSON object string. Can only be used in vcl_deliver, or in vcl_synth when combined with xbody.synth().

Arguments

None.

Returns

A JSON object representing all capture values. If no capture values exist, an empty JSON object is returned. This is used to send all captures directly to Edgestash or for VHA replication.

get_req_body

STRING get_req_body()

Get the contents of the request body. Can only be used in vcl_recv. Must call std.cache_req_body() prior to this call.

Arguments

None.

Returns

A string of the request body.
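
As a sketch of the required call order, the following buffers the request body with std.cache_req_body() and then inspects it. The 10KB limit, the pattern, and the 400 response are assumptions chosen for illustration.

```vcl
import std;
import xbody;

sub vcl_recv {
    if (req.method == "POST") {
        # Buffer up to 10KB of the body; returns false if that fails.
        if (std.cache_req_body(10KB)) {
            # Reject bodies that declare XML entities.
            if (xbody.get_req_body() ~ "<!ENTITY") {
                return (synth(400, "Bad Request"));
            }
        }
    }
}
```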

get_req_body_hash

BLOB get_req_body_hash(ENUM {md5, sha1, sha224, sha256, sha384, sha512} algorithm)

Generate and return the hash of the request body. Can only be used in vcl_recv.

Arguments

  • ENUM algorithm - The algorithm to hash the body with.

Returns

A blob containing the hash of the request body.
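
Because the return value is a BLOB, it has to be encoded before it can be used as a string. The sketch below assumes the blob VMOD that ships with Varnish 6.0 is available, and it buffers the body with std.cache_req_body() first, which this document only states as a requirement for get_req_body(). The X-Body-Hash header name is hypothetical.

```vcl
import blob;
import std;
import xbody;

sub vcl_recv {
    if (req.method == "POST" && std.cache_req_body(10KB)) {
        # Hex-encode the SHA-256 digest of the request body.
        set req.http.X-Body-Hash =
            blob.encode(HEX, LOWER, xbody.get_req_body_hash(sha256));
        std.log("Body hash: " + req.http.X-Body-Hash);
    }
}
```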

hash_body

VOID hash_body(ENUM {md5, sha1, sha224, sha256, sha384, sha512} algorithm, STRING name)

Generate the hash of the response body. Can only be used in vcl_backend_response.

Arguments

  • ENUM algorithm - The algorithm to hash the body with.
  • STRING name - (optional) The name of the hash used when storing multiple hashes. If not used, this is the default hash.

Returns

Nothing.

get_hash

BLOB get_hash(STRING name)

Get the hash generated with hash_body. Can only be used in vcl_deliver.

Arguments

  • STRING name - (optional) The name of the hash used when generating multiple hashes. If not used, it returns the default hash.

Returns

A blob of the binary encoded hash.
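
hash_body and get_hash are used as a pair, which the following sketch illustrates with one default hash and one named hash. It assumes the blob VMOD bundled with Varnish 6.0 for hex encoding. The hash name "legacy" and the response header names are hypothetical.

```vcl
import blob;
import xbody;

sub vcl_backend_response {
    # Default (unnamed) hash plus a second, named hash.
    xbody.hash_body(sha256);
    xbody.hash_body(md5, "legacy");
}

sub vcl_deliver {
    set resp.http.X-Body-SHA256 = blob.encode(HEX, LOWER, xbody.get_hash());
    set resp.http.X-Body-MD5 = blob.encode(HEX, LOWER, xbody.get_hash("legacy"));
}
```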

log_body

VOID log_body(BYTES max)

Log the response body to the VSL (varnishlog). The Body VSL tag is used for this operation. The logged body can span multiple tag lines, depending on the maximum VSL record length.

Arguments

  • BYTES max - (optional) The maximum number of bytes to log. After logging this many bytes, a truncation message will be appended: __XBODY_TRUNCATED_%MISSING_BYTES%. Defaults to 4KB.

Returns

Nothing.
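
A minimal sketch of limiting the logged size. It assumes log_body is called from vcl_backend_response like the other response-side functions, which this section does not state explicitly. The 1KB limit and the Content-Type check are illustrative.

```vcl
import xbody;

sub vcl_backend_response {
    # Log at most the first 1KB of HTML response bodies.
    if (beresp.http.Content-Type ~ "html") {
        xbody.log_body(1KB);
    }
}
```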

synth

VOID synth()

Allow captured JSON data to be accessible in vcl_synth.

Arguments

None.

Returns

Nothing.

set

VOID set(STRING json)

Set capture values. When used, regex and capture operations are skipped. This is used to replicate xbody operations via VHA. Can only be used in vcl_backend_response.

Arguments

  • STRING json - Capture values from xbody.get_all(). Must be a valid JSON object.

Returns

Nothing.

reset

VOID reset()

Reset the internal state.

Arguments

None.

Returns

Nothing.