Varnish Cache Plus

MSE

Massive Storage Engine 3.0 (Varnish 6.0)

The Massive Storage Engine (MSE) is an advanced stevedore for Varnish Cache Plus. The stevedore is the component that handles storing the cached objects and their metadata, and keeping track of which objects in the cache are most relevant, and which to purge if needed to make room for new content. MSE adds several advanced features compared to standard stevedores that ship with Varnish:

  • Compact memory object structure. MSE has a more compact object storage structure, giving less storage overhead. This is most noticeable for small objects.

  • Fair LRU eviction strategy. When evicting content to make room for fresh content in the cache, the fetch task that does the eviction is given priority to the space that it made available. This ensures that a fetch does not fail because other simultaneous fetch tasks steal the space from under it.

  • Large caches using disks to cache objects. MSE can use disks as backing for object data, enabling cache sizes that are much larger than the available system memory. Content that is frequently used will be kept in memory, while less frequently used content will be read from disk instead of fetching from the backend.

  • Persisted caches. MSE will persist the disk stored objects, keeping the content in the cache between planned and unplanned restarts of the Varnish daemon.

Configuration and usage

MSE uses a structured configuration file to describe the layout of the devices to use for object storage. The syntax of the configuration file is shown in the examples below.

The configuration structure is hierarchical. At the top level there is exactly one environment, which configures the global sizes and rules to use for memory held object fragments.

An environment by itself configures a non-persistent cache. This makes MSE behave like a regular Varnish instance with a memory only cache, much like when using the default malloc stevedore, while giving the benefits of the compact object memory structure and fair LRU eviction.

To configure a persisted cache using disk for object storage, one or more books with associated stores need to be configured in the environment. The books contain metadata internal to MSE, while the stores contain the object data used by Varnish.

Once the configuration file has been created, MSE can be enabled using the -s mse,<path-to-config-file> option to the Varnish daemon.

Persisted caching

When books and stores are configured, the cached objects are also persisted, keeping the content between restarts of the Varnish daemon.

The book is an embedded database that contains the necessary metadata about the objects stored in the cache. This includes e.g. the hash values for the objects, their associated TTLs and Vary matching information. The book also contains the maps for where in the store the payload data for the object resides, and the lists mapping free store space. Lastly the book has one journal file to persist bans on the system, and one journal file for each configured store to speed up metadata updates. All the data files that make up the book are kept in a directory, and the path to this directory is given in the configuration file.

Each book needs to have at least one store associated with it. The store holds the object payload data, consisting of its attributes (object headers, ESI instructions etc.) and the object body. Each store is a single large file in the filesystem that can contain any number of objects. New objects are assigned to a store on a round robin basis.
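
The round robin assignment can be pictured with a short shell sketch. It only illustrates the idea and is not MSE's actual allocation code. The helper name pick_store is hypothetical, and the store ids are borrowed from the example configuration in this document.

```shell
# Illustration only: cycle through a fixed list of stores in order.
# pick_store is a hypothetical helper, not part of MSE.
pick_store() {
    # $1 = sequence number of the new object (0, 1, 2, ...)
    # remaining arguments = store ids
    n=$1
    shift
    idx=$(( n % $# ))
    shift "$idx"
    printf '%s\n' "$1"
}

# Four consecutive objects alternate between the two stores.
for n in 0 1 2 3; do
    pick_store "$n" store-1-1 store-1-2
done
```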

Keeping books and stores configured and stored separately is useful when the disks to use may not have the same IO capacity. It would be advisable to e.g. keep the book on a fast SSD type of drive, while using the larger but slower rotating disk for the store.

The data files, both for books and stores, need to be initialized before starting the Varnish daemon for the first time. This is done using the bundled mkfs.mse utility, passing the configuration file as an option. See the mkfs.mse(1) manpage for details.

All the data files are marked with a version marker identifying the on disk data format. If this version marker does not match that of the Varnish daemon, Varnish will refuse to start, and the files will have to be recreated using the mkfs.mse utility with the -f force option, clearing the cache in the process. If a new release of Varnish Cache Plus comes with a new on disk format, the changelog entry will clearly say so.

Using MSE with SELinux

If SELinux is enabled, the MSE books and stores need to be located on a path that the SELinux policy allows the Varnish daemon to access. The policy shipped with the packages enables the /var/lib/mse/* path for this purpose. If your stores are on separate drives, you will need to mount those drives below that path.
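
As a quick sanity check before starting Varnish, the planned book and store paths can be compared against the /var/lib/mse/ prefix. The following is a minimal sketch with a hypothetical helper name. It is a plain string comparison: it does not resolve symlinks or ".." components, and it does not inspect SELinux labels.

```shell
# Hypothetical helper: succeeds if the given path is below /var/lib/mse/.
# This is a string match only, not an SELinux policy check.
is_under_mse_path() {
    case "$1" in
        /var/lib/mse/*) return 0 ;;
        *) return 1 ;;
    esac
}

# The last path is a made-up example of a location outside the allowed prefix.
for path in /var/lib/mse/book1 /var/lib/mse/stores/disk1/store-1-1.dat /data/store.dat; do
    if is_under_mse_path "$path"; then
        echo "ok: $path"
    else
        echo "outside /var/lib/mse: $path"
    fi
done
```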

Example memory only configuration

The following is an example configuration for a memory only cache using 100 GB of memory to hold cached objects.

env: {
	id = "myenv";
	memcache_size = "100G";
};
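
To connect this example with the -s mse,<path-to-config-file> option described earlier, the sketch below writes the configuration to a file and prints the matching storage argument. A temporary file is used so the snippet can be run anywhere. A real setup would use a fixed path of your choosing, and the varnishd command is only printed, not executed.

```shell
# Write the memory only example to a file.
# A temporary file is used here; a real setup would use a fixed path.
conf=$(mktemp)
cat > "$conf" <<'EOF'
env: {
	id = "myenv";
	memcache_size = "100G";
};
EOF

# Print (do not run) the storage argument that points at this file.
echo "varnishd ... -s mse,$conf"
```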

Example with 2 books each holding 2 stores

The following example demonstrates how to configure multiple books each holding multiple stores. In this example the server will use 100 GB of memory to hold frequently accessed object data. There will be 2 books, each configured for 1 GB of metadata space. Each book has 2 stores, each holding 1 TB of object data.

env: {
	id = "myenv";
	memcache_size = "100G";

	books = ( {
		id = "book1";
		directory = "/var/lib/mse/book1";
		database_size = "1G";

		stores = ( {
			id = "store-1-1";
			filename = "/var/lib/mse/stores/disk1/store-1-1.dat";
			size = "1T";
		}, {
			id = "store-1-2";
			filename = "/var/lib/mse/stores/disk2/store-1-2.dat";
			size = "1T";
		} );
	}, {
		id = "book2";
		directory = "/var/lib/mse/book2";
		database_size = "1G";

		stores = ( {
			id = "store-2-1";
			filename = "/var/lib/mse/stores/disk3/store-2-1.dat";
			size = "1T";
		}, {
			id = "store-2-2";
			filename = "/var/lib/mse/stores/disk4/store-2-2.dat";
			size = "1T";
		} );
	} );
};

Further reading

In addition to the information above, the manual page varnish-mse(7) includes information on all configuration flags and parameters.

Massive Storage Engine 2.0 (Varnish 4.1)

With Varnish Cache Plus 4.1, the Massive Storage Engine 2.0 facilitates persistence in the cache.

Description

Varnish Massive Storage Engine 2.0 (MSE2) is an improved storage backend for Varnish, replacing the traditional malloc and file storages.

MSE2 adds data persistence across Varnish and server restarts, while continuing to offer the performance improvements seen in MSE1 on Varnish Cache Plus 4.0.

MSE1 and MSE2 are designed and tested with storage sizes up to 10 TB, but are expected to work on far larger workloads.

Availability

MSE2 is available in Varnish Cache Plus 4.1.2r1.

Installation

MSE2 is built into supported Varnish Cache Plus versions, and does not need additional installation steps.

Configuration

MSE2 introduces a two stage configuration, where the storage files on disk are created by mkfs.mse (installed by the varnish-plus package) before being used by Varnish Cache Plus.

mkdir -p /var/lib/varnish/mse/
mkfs.mse -s /var/lib/varnish/mse/store,100G -b /var/lib/varnish/mse/book,1G

This will create a small 100 GB storage segment that Varnish Cache Plus can use, with a 1 GB bookkeeping journal for persistence.

Note: The bookkeeping file should be on a file system backed by storage with fast random access speeds. The store and the book do not need to be on the same disk.

To use these files, we can use the -s mse,$STORAGE_FILE,$BOOK_FILE parameter in the varnishd command. In the above example case, this would be:

varnishd ... -s mse,/var/lib/varnish/mse/store,/var/lib/varnish/mse/book

Or if you have a /etc/varnish/varnish.params file, you can change the VARNISH_STORAGE variable:

VARNISH_STORAGE="mse,/var/lib/varnish/mse/store,/var/lib/varnish/mse/book"

Note: after creating the storage, mkfs.mse will output the proper -s argument to use.

If persistence is not required, mkfs.mse can be run without the -b argument to create only the store file. The storage argument is then simply -s mse,$STORAGE_FILE.

MSE2 has a small set of tunable parameters that must be set when the storage files are made. It is recommended to keep most of these at the default values. For optimal performance, -p big_alloc=SIZE (default 1 MB) can be tuned to approximately the size of the largest cached objects.
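
The two forms of the storage argument can be summarized in a small shell sketch. The helper name mse2_storage_arg is hypothetical. It only builds the string passed to -s and does not call mkfs.mse or varnishd.

```shell
# Hypothetical helper that builds the value for varnishd's -s option.
# $1 = store file, $2 = book file (optional; omit for a non-persistent cache)
mse2_storage_arg() {
    if [ -n "${2:-}" ]; then
        printf 'mse,%s,%s\n' "$1" "$2"
    else
        printf 'mse,%s\n' "$1"
    fi
}

# Persistent setup: store and book.
mse2_storage_arg /var/lib/varnish/mse/store /var/lib/varnish/mse/book
# Non-persistent setup: store only.
mse2_storage_arg /var/lib/varnish/mse/store
```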

See mkfs.mse(1) man page for more information.

Storage sizing

Sizing the storage should be done based on the following recommendations.

On setups with storage in the gigabyte range, the bookkeeping file should be around 1% of the storage size.

Setups in the terabyte range should have a bookkeeping file of around 0.5% of the storage size.
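
The two rules of thumb can be turned into a small calculation. The sketch below is only an illustration: the helper name is hypothetical, and treating 1024 GB and above as "terabyte range" with a hard cut-over is an assumption, since the recommendation does not give an exact boundary. For a 100 GB store it yields 1024 MB, which matches the 100G store and 1G book used in the mkfs.mse example above.

```shell
# Hypothetical helper: suggest a bookkeeping file size in MB
# for a store of the given size in GB.
# Assumption: stores of 1024 GB and above count as "terabyte range".
book_size_mb() {
    store_gb=$1
    if [ "$store_gb" -ge 1024 ]; then
        echo $(( store_gb * 1024 / 200 ))   # 0.5% of the store
    else
        echo $(( store_gb * 1024 / 100 ))   # 1% of the store
    fi
}

book_size_mb 100     # 100 GB store, prints 1024
book_size_mb 2048    # 2 TB store, prints 10485
```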

Incorrect bookkeeping file sizing can be seen in g_sparenode in varnishstat. The number of spare nodes should never run out. If it does, objects will be evicted to make room for new ones. Having spare nodes wastes a comparatively small amount of disk space in the bookkeeping file and does no harm.

If running on a system with standalone disks (no RAID controller), use separate -s mse instances for each disk. If the disks are SSDs, the bookkeeping file can be kept on the same disk.

The file system should be ext4, and the total usage of a single disk or file system should not go above 95% of its capacity.
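
One way to keep an eye on the 95% limit is a small check based on the POSIX df -P output. This is a sketch under assumptions: both helper names are hypothetical, and the ratio is computed as used blocks over total blocks, which can differ slightly from the percentage df itself prints on file systems with reserved blocks.

```shell
# Hypothetical helper: succeeds if used/total is at or below 95%.
# Both arguments must be in the same unit, e.g. 1024-byte blocks.
within_capacity_limit() {
    used=$1
    total=$2
    [ $(( used * 100 )) -le $(( total * 95 )) ]
}

# Hypothetical wrapper: check the file system that holds the given path.
# df -P prints: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
check_fs() {
    set -- $(df -P "$1" | awk 'NR == 2 { print $3, $2 }')
    within_capacity_limit "$1" "$2"
}
```

A call such as check_fs /var/lib/varnish/mse || echo "above 95%" could then be run from a monitoring script.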

MSE2 is a userspace filesystem and fully allocates the storage on creation. TRIM for SSD storages is not necessary.

Massive Storage Engine (Varnish 4.0)

Description

Varnish Massive Storage Engine 1.0 (MSE or MSE1) is an improved storage backend for Varnish. It is a replacement for the original malloc and file storage backends. Main improvements are decreased disk IO load and lower storage fragmentation.

MSE is designed and tested with storage sizes up to 10 TB, but is expected to work on far larger workloads.

Availability

MSE1 is available in Varnish Cache Plus 4.0 from version 4.0.2r1 onwards. In Varnish Cache Plus 4.1, MSE1 has been replaced by MSE2.

Installation

MSE is built into supported Varnish Cache Plus versions, and does not need additional installation steps.

Configuration

MSE is configured in the same way that the file storage backend is:

varnishd -s mse,<file>,<size>[,<segments>]

It expects the path to a data file on disk and the size the file should have. The number of storage segments can be left out.

MSE introduces a set of new parameters:

mse_bigalloc               1M [bytes] (default)
mse_delay_writes           on [bool] (default)
mse_membuf_size            4 [pages] (default)
mse_minextfree             4k [bytes] (default)
mse_nuke_limit             10 (default)
mse_pad_writes             on [bool] (default)
mse_prune_factor           2 (default)
mse_prune_loop             10 (default)
mse_sendfile_min           0b [bytes] (default)

It is recommended to keep these at the default values, with the exception of mse_bigalloc. This parameter should be as large as the largest objects cached, for example 5 MB for HTTP Live Streaming setups.
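
As a rough aid for picking the value, the sketch below rounds an object size up to whole megabytes. The helper name is hypothetical, and it assumes that mse_bigalloc is set with varnishd's -p option like other run-time parameters and accepts a value with an M suffix, as the default of 1M in the list above suggests. The command is only printed, not executed.

```shell
# Hypothetical helper: round an object size in bytes up to whole megabytes
# and print it in the form assumed for mse_bigalloc (e.g. 5M).
bigalloc_for() {
    bytes=$1
    mb=$(( (bytes + 1048575) / 1048576 ))
    printf '%sM\n' "$mb"
}

# A largest object of about 4.8 million bytes rounds up to 5M.
echo "varnishd ... -p mse_bigalloc=$(bigalloc_for 4800000)"
```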

Advanced configuration

Using multiple stores

Using multiple storage backends side by side is possible. For example, it can be beneficial to use malloc storage (standard) for normal objects, and MSE for special objects.

This can be done by giving each storage defined on the command line a name, and then setting the storage explicitly in vcl_backend_response:

varnishd -s memory=malloc,1G -s msestorage=mse,/var/lib/varnish/mse,100G

Example VCL:

# Note: advanced usage
sub vcl_backend_response {
	set beresp.storage_hint = "memory";
	if (bereq.url ~ "\.mp4$") {
		set beresp.storage_hint = "msestorage";
	}
}

Number of MSE segments

The number of storage segments within MSE can be set on the command line.

Increasing it reduces lock contention when doing memory allocations, especially when content has to be forcefully expunged.

See the varnishd manual pages for more information on tuning this parameter for big cache sizes.