Solutions and Use Cases Tutorial

This tutorial shows how to configure Varnish for various use cases:

How to use Varnish to deliver fast video streaming

There are essentially two ways to deliver video over HTTP. You can serve the entire file and let the browser request parts of it as playback progresses, or you can split the media file into smaller objects (segments) and provide a playlist (manifest). The playlist lists the names, locations, and playback order of the segments, along with metadata such as resolution and bitrate. HTTP caching with Varnish helps in both cases. Because segmented streaming is the more popular model today, this tutorial focuses mostly on it.

Linear streaming

Linear live streaming is the streaming of a scheduled event where all viewers watch the same content. Because everyone views the same content, the dataset is small enough that media segments and manifest files can be cached in memory. A basic configuration for linear streaming is quite simple, as the following example shows.

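vcl 4.0;

import std;

backend default {
    .host = "";    # set this to your origin server's address
    .port = "80";
}

# Clients allowed to send PURGE requests.
# "localhost" is an example entry; list your own trusted addresses.
acl purge {
    "localhost";
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purge) {
            return (synth(401, "Unauthorized."));
        }
        return (purge);
    }
    return (hash);
}

sub vcl_backend_response {
    # Stream to the client while the object is still being fetched
    set beresp.do_stream = true;
    if (bereq.url ~ "m3u8") {
        # assuming chunks are 4 seconds long
        set beresp.ttl = 3s;
        set beresp.grace = 0s;
    } else {
        set beresp.ttl = 120s;
    }
}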
In the backend response, we set the TTL to three seconds and grace to zero seconds when bereq.url contains m3u8. M3U8 files contain the playlist for your media, and their TTL should be low, at or below the segment duration used by the source. The manifest has to be refreshed as new segments are added to the stream, and a stale manifest would hide those segments from the player.

You can set longer TTLs for the segments, depending on how long you need the segments in the cache. In this example, we think 120s is a reasonable value.
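Matching on "m3u8" anywhere in the URL can also catch unrelated paths that happen to contain that string. As a minimal sketch, the variant below anchors the match to the file extension and tolerates a query string. It assumes that manifests end in .m3u8 and that segments are MPEG-TS files ending in .ts; both patterns are illustrative and should be adapted to your packager's naming.

```vcl
sub vcl_backend_response {
    if (bereq.url ~ "\.m3u8(\?.*)?$") {
        # Manifest: keep the TTL below the (assumed) 4 second segment duration
        set beresp.ttl = 3s;
        set beresp.grace = 0s;
    } else if (bereq.url ~ "\.ts(\?.*)?$") {
        # Segment (assumed .ts extension): immutable once published
        set beresp.ttl = 120s;
    }
}
```

This fragment would take the place of the TTL logic in vcl_backend_response from the example above. Any response that matches neither pattern keeps the TTL Varnish derived from the backend's headers.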

Varnish can also start delivering an object to the client as soon as the first bytes arrive from the backend, while the rest of the object is still being fetched, which reduces latency. The example enables this with set beresp.do_stream = true;.

OTT streaming

OTT (over-the-top) streaming serves a huge catalog of titles, so the dataset is massive and memory alone cannot hold the whole cache. Fortunately, Varnish has its own advanced stevedore, MSE (Massive Storage Engine), for persisting large caches to disk. Some Varnish customers use MSE with over 100TB of storage. Because the right TTL and storage size vary greatly with the content, you will need to choose values that suit your needs.

For a detailed explanation, see the MSE documentation.

Scale with VHA

As the number of clients grows, you will need more Varnish instances to meet demand. However, every additional Varnish server has to be populated with cached content from the backend, and this can overload your origin server. Varnish High Availability (VHA) replicates content between Varnish instances, which reduces the number of requests that reach the origin.

For more information, see the VHA documentation.

How to use Varnish to increase content security

Varnish Software provides a customizable security package for customers to implement increased security measures to combat known threats and unknown vulnerabilities.

At some point during transport, the sender and the receiver lose control of the data. It is therefore important to authenticate the other party in a connection, check the integrity of the data, and encrypt it in transit. Varnish Enterprise supports TLS on backend connections as well as in-process TLS offloading.
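As a rough illustration of TLS on backend connections, a backend definition in Varnish Enterprise might look like the sketch below. The hostname is a placeholder, and the .ssl attribute names are written as they are recalled from the Varnish Enterprise documentation, so treat them as assumptions and check them against the reference for your version. Open source Varnish does not accept these attributes.

```vcl
vcl 4.0;

backend default {
    .host = "origin.example.com";  # placeholder origin hostname
    .port = "443";
    .ssl = 1;                      # assumed attribute: use TLS for this backend
    .ssl_sni = 1;                  # assumed attribute: send SNI in the handshake
    .ssl_verify_peer = 1;          # assumed attribute: validate the certificate chain
    .ssl_verify_host = 1;          # assumed attribute: check the certificate hostname
}
```

Enabling both verification options is what provides the authentication of the other party described above. Encryption without verification does not tell you who is on the other end of the connection.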

Total encryption

With “total encryption”, every cache object has its own unique AES256 encryption key. Decrypting an object requires both that key and the associated request, so reading the contents of a leaked cache would require breaking AES256 encryption for each and every object in it.

The Varnish WAF (web application firewall) lets you define your own security rules in the ModSecurity style and helps protect your backend from predatory traffic. According to the documentation, the OWASP CRS (Core Rule Set) can be installed and included in the VCL file.

JSON Web Token

JSON Web Token (JWT) allows secure transmission of information between parties as a JSON object. Varnish has the JWT vmod, which can create, manipulate, and verify JWT and JWS tokens.

Other VMODs

Varnish modules (VMODs) also allow customers to control how and when they detect malicious traffic patterns. For example, vmod-bodyaccess helps identify potentially dangerous traffic, and vmod-vsthrottle will limit incoming requests when suspicious activity is detected.
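The two modules named above can be combined as in the following sketch. It assumes both VMODs are installed. The backend address, the rate limit of 15 requests per 10 seconds, the 100KB body limit, and the "<script" pattern are all illustrative values, not recommendations.

```vcl
vcl 4.0;

import std;
import bodyaccess;
import vsthrottle;

backend default {
    .host = "127.0.0.1";  # placeholder origin
    .port = "8080";
}

sub vcl_recv {
    # Deny clients that exceed 15 requests in any 10 second window
    if (vsthrottle.is_denied(client.identity, 15, 10s)) {
        return (synth(429, "Too Many Requests"));
    }

    if (req.method == "POST") {
        # The body must be buffered before bodyaccess can inspect it
        if (!std.cache_req_body(100KB)) {
            return (synth(413, "Request body too large"));
        }
        # rematch_req_body returns 1 when the regular expression matches
        if (bodyaccess.rematch_req_body("(?i)<script") == 1) {
            return (synth(403, "Forbidden"));
        }
    }
}
```

Keying the throttle on client.identity limits each client separately. A different key, such as the client identity combined with the request URL, would limit requests per client and path instead.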