The Massive Storage Engine (MSE) is an advanced stevedore for Varnish Cache Plus. The stevedore is the component that handles storing the cached objects and their metadata, and keeping track of which objects in the cache are most relevant, and which to purge if needed to make room for new content. MSE adds several advanced features compared to standard stevedores that ship with Varnish:
Compact memory object structure. MSE has a more compact object storage structure giving less storage overhead. This is most noticeable for small objects.
Fair LRU eviction strategy. When evicting content to make room for fresh content in the cache, the fetch task that does the eviction is given priority to the space that it made available. This ensures that a fetch does not fail because other simultaneous fetch tasks stole the space from under it.
Large caches using disks to cache objects. MSE can use disks as backing for object data, enabling cache sizes that are much larger than the available system memory. Content that is frequently used will be kept in memory, while less frequently used content will be read from disk instead of fetching from the backend.
Persisted caches. MSE will persist the disk stored objects, keeping the content in the cache between planned and unplanned restarts of the Varnish daemon.
MSE doesn’t need disks to work; it can be used in a pure memory setup. Apart from the fair eviction feature, it also simplifies cache size computation thanks to the Memory Governor, which takes into account various memory overheads like thread stacks, workspaces or Transient storage.
To use it, remove any existing -s malloc,... argument from your varnishd command line and add:
# replace 20G with your own cache size
-s mse -p memory_target=20G
Create the following MSE 3.0 configuration file, /var/lib/mse/mse.conf:
env: {
    id = "mse";
    # how much RAM can be used, "auto" defaults to 80% of the total memory
    # see the "Memory Governor" section for more information
    memcache_size = "auto";
    # one caching set, containing one book (for metadata) and one or more
    # stores (for the actual cache content)
    books = ( {
        id = "book";
        # the location of the book
        directory = "/var/lib/mse/book";
        # how large the book should be, a cautious rule of thumb is
        # 2KB per object in the stores
        database_size = "2G";
        stores = ( {
            id = "store";
            # the store is actually a large contiguous file, note that
            # it does NOT have to be on the same directory or disk as
            # the book
            filename = "/var/lib/mse/store.dat";
            # how large the store should be.
            size = "100G";
        } );
    } );
};
Important note: make sure both your book and stores are on an ext4 volume, see the mkfs.mse section for more information.
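The rule of thumb in the configuration comments (about 2KB of book space per object in the stores) can be turned into a quick estimate. The following Python sketch is only an illustration: the function name, the use of binary units, and the 100 KiB average object size are assumptions, not part of MSE.

```python
# Rough book sizing based on the "2KB per object in the stores" rule of thumb.
# The average object size is an assumption you must supply from your own traffic.

KIB = 1024
GIB = 1024 ** 3


def estimate_database_size(store_bytes: int, avg_object_bytes: int,
                           per_object_book_bytes: int = 2 * KIB) -> int:
    """Return an estimated book size in bytes for a given total store capacity."""
    if avg_object_bytes <= 0:
        raise ValueError("avg_object_bytes must be positive")
    # Ceiling division: a partially used slot still counts as one object.
    max_objects = -(-store_bytes // avg_object_bytes)
    return max_objects * per_object_book_bytes


if __name__ == "__main__":
    # A 100G store, as in the example, with a hypothetical 100 KiB average object.
    size = estimate_database_size(100 * GIB, 100 * KIB)
    print(f"{size / GIB:.2f} GiB")  # prints "2.00 GiB"
```

Under these assumptions the estimate matches the 2G book used in the example. Smaller average objects mean more objects per store, and therefore a larger book.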
Next, initialize the MSE configuration:
mkfs.mse -c /var/lib/mse/mse.conf
You should now see a book directory and a store file in /var/lib/mse. This is called a “bookstore”.
Finally, edit your varnishd command line:
# replace
-s malloc,xxG
# with
-s mse,/var/lib/mse/mse.conf
You can then (re)start varnishd to benefit from MSE.
For very large MSE stores (which can house many millions of objects), consider increasing the startup_timeout to allow for a longer startup time.
As you’ve seen in the Quick Start section, MSE uses a structured configuration file to describe the layout of the devices to use for object storage. The syntax of the configuration file is formally described in the settings page, but you will find higher-level explanations and examples below.
The configuration structure is hierarchical. At the top level there is exactly one environment, which configures the global sizes and rules to use for memory held object fragments.
An environment by itself configures a non-persistent cache. This makes MSE behave like a regular Varnish instance with a memory only cache, much like when using the default malloc stevedore, while giving the benefits of the compact object memory structure and fair LRU eviction.
It is possible to specify MSE with no configuration file by simply using -s mse. This is needed if you want a memory only cache with the Memory Governor active. Read the separate page for the Memory Governor for more information.
To configure a persisted cache using disk for object storage, one or more books with associated stores need to be configured in the environment. The books contain metadata internal to MSE, while the stores contain the object data used by Varnish.
Once the configuration file has been created, MSE can be enabled using the -s mse,<path-to-config-file> option to the Varnish daemon.
When books and stores are configured, the cached objects are also persisted, keeping the content between restarts of the Varnish daemon.
The book is an embedded database that contains the necessary metadata about the objects stored in the cache. This includes e.g. the hash values for the objects, their associated TTLs and Vary matching information. The book also contains the maps for where in the store the payload data for the object resides, and the lists mapping free store space. Lastly, the book has one journal file to persist bans on the system, and one journal file for each configured store to speed up metadata updates. All the data files that make up the book are kept in a directory, and the path to this directory is given in the configuration file.
Each book needs to have at least one store associated with it. The store holds the object payload data, consisting of its attributes (object headers, ESI instructions etc.) and the object body. Each store is a single large file in the filesystem that can contain any number of objects. New objects are assigned to a store on a round robin basis.
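As a minimal illustration of round robin assignment, the Python sketch below hands each new object to the next store in turn. It is a toy model, not MSE code: the function name and object identifiers are hypothetical, and it ignores store selection and weighting.

```python
from itertools import cycle


def assign_round_robin(object_ids, store_ids):
    """Pair each new object with the next store in turn (toy model only)."""
    if not store_ids:
        raise ValueError("at least one store is required")
    stores = cycle(store_ids)
    return {obj: next(stores) for obj in object_ids}


if __name__ == "__main__":
    # Three hypothetical objects spread over two stores.
    print(assign_round_robin(["a", "b", "c"], ["store-1", "store-2"]))
    # {'a': 'store-1', 'b': 'store-2', 'c': 'store-1'}
```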
Keeping books and stores configured and stored separately is useful when the disks to use may not have the same IO capacity. It would be advisable to e.g. keep the book on a fast SSD type of drive, while using the larger but slower rotating disk for the store.
The data files, both for books and stores, need to be initialized before starting the Varnish daemon for the first time. This is done using the bundled mkfs.mse utility, passing the configuration file as an option. See the mkfs.mse(1) manpage for details.
All the data files are marked with a version marker identifying the on-disk data format. If this version marker does not match that of the Varnish daemon, Varnish will refuse to start, and the files will have to be recreated using the mkfs.mse utility with the -f (force) option, clearing the cache in the process. If a new release of Varnish Cache Plus comes with a new on-disk format, the changelog entry will clearly say so.
If SELinux is enabled, the MSE books and stores need to be located on a path that the SELinux policy allows the Varnish daemon to access. The policy shipped with the packages enables the /var/lib/mse/* path for this purpose. If your stores are on separate drives, you will need to mount those drives below that path.
The following is an example configuration for a memory only cache using 100 GB of memory to hold cached objects.
env: {
    id = "myenv";
    memcache_size = "100G";
};
The configuration above is discouraged. Use -s mse with no configuration file instead, and let the Memory Governor automatically adjust the cache size based on total memory usage by MSE.
The following example demonstrates how to configure multiple books, each holding multiple stores. There will be two books, each configured for 1 GB of metadata space. Each book has 2 stores, each holding 1 TB of object data.
env: {
    id = "myenv";
    memcache_size = "auto";
    books = ( {
        id = "book1";
        directory = "/var/lib/mse/book1";
        database_size = "1G";
        stores = ( {
            id = "store-1-1";
            filename = "/var/lib/mse/stores/disk1/store-1-1.dat";
            size = "1T";
        }, {
            id = "store-1-2";
            filename = "/var/lib/mse/stores/disk2/store-1-2.dat";
            size = "1T";
        } );
    }, {
        id = "book2";
        directory = "/var/lib/mse/book2";
        database_size = "1G";
        stores = ( {
            id = "store-2-1";
            filename = "/var/lib/mse/stores/disk3/store-2-1.dat";
            size = "1T";
        }, {
            id = "store-2-2";
            filename = "/var/lib/mse/stores/disk4/store-2-2.dat";
            size = "1T";
        } );
    } );
};
This example is similar to the previous one, but the stores are of different sizes. In addition, a default set of stores is selected through the default_stores parameter.
The two books in the configuration are of the same size, while their stores’ sizes differ by an order of magnitude. Although using spinning disks is often a bad idea, it can be a good option if you have many huge, uncommonly requested files that you want to cache.
In the example, imagine that the second book contains two stores on spinning disks, while the first book uses faster SSD/NVMe drives. By default, through the default_stores parameter, only the first book’s stores will be selected during object insertion:
env: {
    id = "myenv";
    memcache_size = "auto";
    books = ( {
        id = "book1";
        directory = "/var/lib/mse/book1";
        database_size = "1G";
        stores = ( {
            id = "store-1-1";
            filename = "/var/lib/mse/stores/disk1/store-1-1.dat";
            size = "1T";
        }, {
            id = "store-1-2";
            filename = "/var/lib/mse/stores/disk2/store-1-2.dat";
            size = "1T";
        } );
    }, {
        id = "book2";
        directory = "/var/lib/mse/book2";
        database_size = "1G";
        stores = ( {
            id = "store-2-1";
            filename = "/var/lib/mse/stores/disk3/store-2-1.dat";
            size = "10T";
        }, {
            id = "store-2-2";
            filename = "/var/lib/mse/stores/disk4/store-2-2.dat";
            size = "10T";
        } );
    } );
    default_stores = "book1";
};
It is possible to use vmod_mse to override the store selection on each individual backend request, and this will be the only way to get objects into book2 with the above configuration.
It is possible to attach tags to books and individual stores, and use these to select which stores to use. Consider the following, somewhat silly, example:
env: {
    id = "myenv";
    memcache_size = "auto";
    books = ( {
        id = "book1";
        directory = "/var/lib/mse/book1";
        database_size = "1G";
        tags = "red";
        stores = ( {
            id = "store-1-1";
            filename = "/var/lib/mse/stores/disk1/store-1-1.dat";
            size = "1T";
        }, {
            tags = ( "orange", "store-2-1" );
            id = "store-1-2";
            filename = "/var/lib/mse/stores/disk2/store-1-2.dat";
            size = "1T";
        } );
    }, {
        id = "book2";
        directory = "/var/lib/mse/book2";
        database_size = "1G";
        tags = ( "pink", "red" );
        stores = ( {
            id = "store-2-1";
            filename = "/var/lib/mse/stores/disk3/store-2-1.dat";
            size = "1T";
            tags = "green";
        }, {
            id = "store-2-2";
            filename = "/var/lib/mse/stores/disk4/store-2-2.dat";
            size = "1T";
            tags = ( "blue", "book1", "red" );
        } );
    } );
    default_stores = "none";
};
The example above is similar to the previous examples, but tags have been added on both the books and the stores, and default_stores is set to the special value "none" (indicating that objects should be memory only by default).
Tags can be specified either as a single string, or as a list of strings, and they can be applied to books and stores.
When a set of stores is selected, either by using default_stores or vmod_mse, the string will be matched against book names, store names, and tags.
In the example above, mse.set_stores("red"); will select all stores, since both books have been tagged "red".
Even though store-2-2 is tagged "red" twice (in the book and the store itself), it will not be chosen twice as often as the other "red" stores.
Names and tags are treated alike, so mse.set_stores("book1"); will select all the stores in book1, and store-2-2, since this store has the "book1" tag.
Similarly, mse.set_stores("store-2-1"); will select two stores, one because of a matching name, and the other because a tag matches.
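The matching rules described above can be modelled in a few lines of Python. This is a sketch of the documented behaviour applied to the tagged example configuration, not MSE's actual implementation; the STORES table and the select_stores function are hypothetical names introduced for illustration.

```python
# Hypothetical model of the tagged example: each store carries its own id,
# its book's id, and the union of book-level and store-level tags.
STORES = [
    {"id": "store-1-1", "book": "book1", "tags": {"red"}},
    {"id": "store-1-2", "book": "book1", "tags": {"red", "orange", "store-2-1"}},
    {"id": "store-2-1", "book": "book2", "tags": {"pink", "red", "green"}},
    {"id": "store-2-2", "book": "book2", "tags": {"pink", "red", "blue", "book1"}},
]


def select_stores(selector, stores=STORES):
    """Return the ids of stores whose name, book name, or tags match selector."""
    if selector == "none":
        # Special value: no store is selected, objects stay in memory only.
        return []
    # Each store is listed at most once, however many of its labels match.
    return [s["id"] for s in stores
            if selector in ({s["id"], s["book"]} | s["tags"])]


if __name__ == "__main__":
    print(select_stores("red"))        # all four stores
    print(select_stores("book1"))      # ['store-1-1', 'store-1-2', 'store-2-2']
    print(select_stores("store-2-1"))  # ['store-1-2', 'store-2-1']
```

Because the labels of each store are collected into a set, a store that matches through both its book and its own tag, such as store-2-2 for "red", still appears only once in the selection.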
Read more about store selection in the MSE VMOD documentation, where it is also explained how stores are weighted after selection, and how to change the weighting.
It is possible to enable fault tolerance for an MSE environment, in which case it may start with a subset of its books and stores. In the event of a device failure, misconfiguration, or any other reason that would result in a book or a store not successfully opening during startup, MSE can ignore these failures and proceed in degraded mode.
If an environment is successfully loaded, but corrupted, a fault may occur after startup. In this case Varnish tries to catch the error and cache it on disk for the next startup. This cache is a directory containing one text file per MSE environment that can be edited to unregister failed books and stores after restoring them to a pristine state.
Configuring a fault-tolerant environment can be done like this:
env: {
    id = "myenv";
    memcache_size = "auto";
    degradable = true;
    degradable_cache = "/var/lib/mse/degradable_cache";
    # define books and stores
};
MSE3 was not initially designed with fault tolerance, so failures will first manifest as panics. As the cache process crashes, the manager process might cache the MSE error before restarting a new cache process. Soon Varnish is ready to serve traffic, with a degraded persistent storage capacity.
It is strongly recommended to use utils.fast_304 with MSE whenever possible. A fast 304 insertion means only a simple book update is performed on revalidation, whereas normally a full copy of the object payload is made, which is often unnecessary. This represents a significant reduction in IO cost. For more information on utils.fast_304, refer to the vmod_utils documentation.
Example configuration of fast_304:
// utils.fast_304() is provided by vmod_utils, which must be imported.
import utils;

sub vcl_backend_response {
    // If the fast_304() is all you need, the if test can be omitted.
    // The utils.fast_304() does nothing if the response wasn't 304.
    if (beresp.was_304) {
        utils.fast_304();
    }
}