MSE4 features a mechanism which makes it possible to place cached content in specific categories. This feature is named Content Categories. Once content is categorized, MSE4 provides real-time statistics on exactly how much content and payload data each category holds. Furthermore, categories can have dedicated memory and disk resources assigned to them, making it possible to dedicate cache resources to specific types of content.
The set of categories is defined in the MSE4 configuration file. The categories are hierarchical and form a tree with a single root node. Each level of the tree can have any number of branches. The root of the tree is configured on the categories key in the environment section of the configuration file.
Content is categorized by associating the objects with a leaf node in the category tree through VCL rules. The MSE4 vmod contains methods for associating content with a category when content is fetched into the cache.
Content Categories is an optional feature in MSE4, and no categories need to be defined to start using Varnish Enterprise with MSE4. If no categories key exists in the environment section of the configuration file, a single unnamed root category is automatically created, and all cache content is always associated with this category. All configured stores are also associated with this root category (store and category association is explained below). As a result, the MSE4 instance regards all content as equally important, and all stores can be used to hold any type of content.
The following example MSE4 configuration file is for an imaginary site serving images, video and supporting HTML resources. Four leaf categories are defined. Below the root, one category is defined for holding media content and one for other content. The media category is further split into two categories, one for images and one for video files. The images category is split into icons and pictures.
env: {
    categories = {
        other = {};
        media = {
            images = {
                icons = {};
                pictures = {};
            };
            video = {};
        };
    };
    default_category = "other";
};
For this example, the VCL code for associating content with a category could look like:
vcl 4.1;

import mse4;

sub vcl_backend_response {
    if (beresp.http.content-type ~ "^image/") {
        if (bereq.url ~ "icon") {
            mse4.set_category("media.images.icons");
        } else {
            mse4.set_category("media.images.pictures");
        }
    } else if (beresp.http.content-type ~ "^video/") {
        mse4.set_category("media.video");
    } else {
        # Our configuration uses "other" as a default category
        # Alternatively we could do 'mse4.set_category("other");' here
    }
}
Content categories are named by a category path. The category path for a specific node is a string formed by concatenating the names of every level of the tree from the root to that node, separated by dots. From the example above, the icons category has the category path media.images.icons. When associating content with a category, the category path is used.
If the configuration specifies a default_category (like the example above does), this category is assigned to all content that the VCL did not explicitly associate with a category.
The default_category is also applied if the VCL program fails to correctly associate a category. This could for example happen if the argument to set_category() names an unknown category.
If no category is assigned to the content and no default_category is configured, the cache object creation will fail.
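The fallback rules above can be sketched in Python as a small model. This is not MSE4 code: the nested dictionary, the function names and the LookupError are all hypothetical stand-ins, and treating a non-leaf path as a failed association is an assumption based on the statement that content is associated with leaf nodes.

```python
# Hypothetical model of the example category tree. Nested dicts stand in
# for the MSE4 configuration, and an empty dict marks a leaf category.
CATEGORIES = {
    "other": {},
    "media": {
        "images": {"icons": {}, "pictures": {}},
        "video": {},
    },
}
DEFAULT_CATEGORY = "other"


def is_leaf_path(tree, path):
    """Return True if the dotted category path names a leaf of the tree."""
    node = tree
    for part in path.split("."):
        if part not in node:
            return False
        node = node[part]
    return node == {}


def resolve_category(requested, tree=CATEGORIES, default=DEFAULT_CATEGORY):
    """Pick the category an object ends up in.

    requested is the path given to set_category(), or None if the VCL
    never set one. Assumption: a path that is unknown or not a leaf
    counts as a failed association.
    """
    if requested is not None and is_leaf_path(tree, requested):
        return requested
    if default is not None:
        return default
    # Stands in for the failed cache object creation.
    raise LookupError("no category assigned and no default_category configured")


print(resolve_category("media.images.icons"))  # explicit, valid path
print(resolve_category(None))                  # falls back to the default
print(resolve_category("media.audio"))         # unknown path, default applies
```

With the default set to None, the last two calls would raise instead, which corresponds to the object creation failure described above.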
Detailed VSC (Varnish Shared Counter) resource usage counter sets are created for each leaf node of the category tree. They are of the VSC class MSE4_CAT, and have the category path, in parentheses, prefixed to the counter names.
The counter values in the class correspond to the similarly named counter values in the global MSE4_MEM counter set, and show how much of the total is consumed by the specific category.
Using the example configuration above, the counter MSE4_CAT.(media.video).g_bytes would show the values for content placed in the media.video leaf node category.
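As an illustration of the naming scheme, the following Python sketch lists the g_bytes counter name for every leaf category in the example tree. The helper names and the nested dictionary are hypothetical; only the MSE4_CAT class, the parenthesized category path and the g_bytes counter come from the text above.

```python
# Hypothetical model of the example category tree (empty dict = leaf).
CATEGORIES = {
    "other": {},
    "media": {
        "images": {"icons": {}, "pictures": {}},
        "video": {},
    },
}


def leaf_paths(tree, prefix=""):
    """Yield the dotted category path of every leaf in the tree."""
    for name, children in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


def counter_name(category_path, counter="g_bytes"):
    """Build a counter name in the MSE4_CAT.(<path>).<counter> form."""
    return f"MSE4_CAT.({category_path}).{counter}"


for path in leaf_paths(CATEGORIES):
    print(counter_name(path))
```

Names built this way can be used when filtering the counters in a monitoring setup. Note that the parentheses usually need quoting when such a name is passed on a shell command line.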
The content categories may have memory weights associated with them. This allows prioritization on memory usage between categories, and control over how much memory each of them will be allowed to consume.
Note that the memory usage is controlled by weights rather than absolute byte counts. This makes the memory controls work regardless of how much memory is actually consumed, which may fluctuate with the traffic patterns the server is experiencing.
The memory weights describe the relative memory usage allowance for categories sharing the same parent. If two categories A and B share the same parent, and A has a memory weight of 1 and B a memory weight of 3, then the content in category B is allowed to consume 3 times as much memory as A.
It is a requirement that either none or all sibling categories (categories having the same parent) have a memory weight specified. Furthermore, if a category specifies a memory weight, then the parent category also needs to specify a memory weight (the root category is an exception to this rule, as it always has an implied memory weight of 1). If these rules are not followed, a configuration error message is given when attempting to start the Varnish daemon.
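The two rules can be expressed as a short consistency check. The Python sketch below is only a model of the rules as stated here, not the validation MSE4 performs; the dictionary layout, the function names and the error texts are hypothetical.

```python
WEIGHT_KEY = "*memory_weight*"


def child_categories(node):
    """Return the sub-categories of a node, skipping special *...* keys."""
    return {
        name: value
        for name, value in node.items()
        if not (name.startswith("*") and name.endswith("*"))
    }


def check_weights(node, path="<root>", node_weighted=True):
    """Collect violations of the two memory weight rules.

    node_weighted defaults to True because the root category always has
    an implied memory weight of 1.
    """
    errors = []
    kids = child_categories(node)
    weighted = [name for name, kid in kids.items() if WEIGHT_KEY in kid]
    if weighted and len(weighted) != len(kids):
        errors.append(f"{path}: only some sibling categories have a memory weight")
    if weighted and not node_weighted:
        errors.append(f"{path}: children have memory weights but this category has none")
    for name, kid in kids.items():
        child_path = name if path == "<root>" else f"{path}.{name}"
        errors.extend(check_weights(kid, child_path, WEIGHT_KEY in kid))
    return errors


# A tree where every weighted category has weighted siblings and parent.
VALID = {
    "other": {WEIGHT_KEY: 1},
    "media": {
        WEIGHT_KEY: 5,
        "images": {WEIGHT_KEY: 1, "icons": {}, "pictures": {}},
        "video": {WEIGHT_KEY: 3},
    },
}
print(check_weights(VALID))  # prints []
```

Removing the weight from "other" alone, or from "media" while keeping the weights on its children, makes the function return one error for each broken rule.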
Whenever content needs to be removed from the cache, an algorithm walks the category tree starting from the root. At each level, the relative memory usage of each of the child categories is assessed. The child category that shows the highest weight-adjusted memory usage is selected, and the process is repeated for that category. When a leaf category is reached, or a category is reached whose child categories do not have memory weights associated with them, a cache eviction is performed at that level.
The final step in the cache eviction process is to select the Least Recently Used content from among all of the objects associated with the selected category.
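The walk can be modelled in a few lines of Python. This is a sketch under stated assumptions rather than the MSE4 implementation: "weight-adjusted memory usage" is taken to mean bytes used divided by the memory weight, and the Category class, its fields and the sample objects are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Category:
    name: str
    weight: Optional[int] = None  # None means no memory weight configured
    children: List["Category"] = field(default_factory=list)
    # One (last_used, object_key, size_in_bytes) tuple per cached object.
    objects: List[Tuple[int, str, int]] = field(default_factory=list)

    def all_objects(self):
        """Objects held by this category and everything below it."""
        found = list(self.objects)
        for child in self.children:
            found.extend(child.all_objects())
        return found

    def usage(self):
        return sum(size for _, _, size in self.all_objects())


def pick_eviction_level(root):
    """Descend while the children carry weights, following the child with
    the highest usage/weight ratio (assumed meaning of weight-adjusted)."""
    node = root
    while node.children and all(c.weight is not None for c in node.children):
        node = max(node.children, key=lambda c: c.usage() / c.weight)
    return node


def pick_victim(root):
    """Return the key of the least recently used object at the chosen level."""
    candidates = pick_eviction_level(root).all_objects()
    if not candidates:
        return None
    return min(candidates, key=lambda obj: obj[0])[1]


def build_example(video_bytes):
    """Tree shaped like the weighted example: other=1, media=5, images=1, video=3."""
    icons = Category("icons", objects=[(10, "icon-a", 50)])
    pictures = Category("pictures", objects=[(5, "pic-a", 150)])
    images = Category("images", weight=1, children=[icons, pictures])
    video = Category("video", weight=3, objects=[(7, "vid-a", video_bytes)])
    media = Category("media", weight=5, children=[images, video])
    other = Category("other", weight=1, objects=[(1, "page-a", 100)])
    return Category("root", children=[other, media])


print(pick_victim(build_example(900)))
print(pick_victim(build_example(450)))
```

With 900 bytes of video the walk ends in the video leaf. With 450 bytes it stops at images, whose children have no weights, so the least recently used object is taken from icons and pictures combined.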
This example expands our previous example, adding memory weights to some of the categories. The new MSE4 configuration is:
env: {
    categories = {
        other = {
            *memory_weight* = 1;
        };
        media = {
            *memory_weight* = 5;
            images = {
                *memory_weight* = 1;
                icons = {};
                pictures = {};
            };
            video = {
                *memory_weight* = 3;
            };
        };
    };
    default_category = "other";
};
The VCL code does not change from the previous example.
In this example, MSE4 will aim to spend 5 times as much memory on the media content as on the other supporting HTML files. So 1/6 of the cache space will be spent on the other category, and 5/6 on the media category.
In the same way, MSE4 will spend 3 times as much memory on video as on images. So the 5/6 of the total cache space reserved for media content is further divided into 1 part images and 3 parts video, which works out to 5/24 of the total for images and 15/24 for video.
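The arithmetic behind these proportions can be checked with exact fractions. The short Python sketch below only reproduces the weight ratios of the example; the shares() helper is hypothetical, and the results are the proportions the weights imply rather than fixed reservations.

```python
from fractions import Fraction


def shares(weights, total=Fraction(1)):
    """Split total between siblings in proportion to their memory weights."""
    weight_sum = sum(weights.values())
    return {name: total * Fraction(w, weight_sum) for name, w in weights.items()}


top = shares({"other": 1, "media": 5})
media = shares({"images": 1, "video": 3}, total=top["media"])

print(top["other"])     # 1/6
print(top["media"])     # 5/6
print(media["images"])  # 5/24
print(media["video"])   # 5/8, which is 15/24
```

The three shares for other, images and video add up to 1, so the whole cache is accounted for.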
Note that the media.images.icons and media.images.pictures categories do not have memory weights associated with them. This means that, for cache eviction purposes, the content in these categories is all treated as belonging to media.images. If an eviction needs to happen there, the Least Recently Used item is selected from the combined set of both icons and pictures.
When Varnish Enterprise is configured to use persisted caching, Content Categories is the mechanism used to assign objects to stores when the content enters the cache. The VCL program decides the category, and if the selected category has one or more stores associated with it, then the object may be persisted to one of these stores. The stores are associated with a category by listing the store ID on the special *stores* key in the category declaration. A store can only be associated with a single category, meaning all objects from the same store will always belong to the same category.
The combined size of the stores associated with a category determines the total amount of disk space available for persisted caching of objects in that category. This is how the administrator provisions disk resources between different categories of objects. If there is a need to share a disk between multiple categories, multiple stores should be created on the disk, splitting the disk space between the categories as needed.
Upon daemon restarts, the objects from a given store get their category designation from the category that the store is assigned to. This means that if a store’s category is changed in the configuration file and the Varnish daemon is restarted, then the persisted objects from that store also change category.
For more information about provisioning disks for persisted caching, please see Persisted Caching.
In this example, we build on the previous example and add stores to the other and media.video categories from two disks. The other category gets a small persisted space on one of the disks only, while the media.video category gets the remaining space on the first disk and the complete space on the second.
env: {
    categories = {
        other = {
            *memory_weight* = 1;
            *stores* = ("other");
        };
        media = {
            *memory_weight* = 5;
            images = {
                *memory_weight* = 1;
                icons = {};
                pictures = {};
            };
            video = {
                *memory_weight* = 3;
                *stores* = ("video_disk1", "video_disk2");
            };
        };
    };
    default_category = "other";
    books = ( {
        id = "book_disk1";
        filename = "/var/lib/mse/disk1/book_disk1";
        size = "2g";
        stores = ( {
            id = "other";
            filename = "/var/lib/mse/disk1/store_other";
            size = "100g";
        }, {
            id = "video_disk1";
            filename = "/var/lib/mse/disk1/store_video";
            size = "400g";
        } );
    }, {
        id = "book_disk2";
        filename = "/var/lib/mse/disk2/book_disk2";
        size = "2g";
        stores = ( {
            id = "video_disk2";
            filename = "/var/lib/mse/disk2/store_video";
            size = "500g";
        } );
    } );
};
The VCL code does not change from the previous example.
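To see how much persisted space each category ends up with in this configuration, the store sizes can be summed per category. The Python sketch below hard-codes the store IDs and sizes from the example; the dictionaries and function names are hypothetical, and sizes are kept in the same "g" unit the configuration uses.

```python
# Store sizes from the example configuration, in the "g" unit used there.
STORE_SIZES_G = {"other": 100, "video_disk1": 400, "video_disk2": 500}

# Store IDs listed on the *stores* key of each category.
CATEGORY_STORES = {
    "other": ["other"],
    "media.video": ["video_disk1", "video_disk2"],
}


def check_unique_assignment(category_stores):
    """Return a store-to-category map, rejecting a store listed twice.

    A store can only be associated with a single category.
    """
    owner = {}
    for category, stores in category_stores.items():
        for store in stores:
            if store in owner:
                raise ValueError(f"store {store!r} is assigned to more than one category")
            owner[store] = category
    return owner


def disk_per_category(category_stores, store_sizes):
    """Sum the sizes of the stores associated with each category."""
    check_unique_assignment(category_stores)
    return {
        category: sum(store_sizes[store] for store in stores)
        for category, stores in category_stores.items()
    }


print(disk_per_category(CATEGORY_STORES, STORE_SIZES_G))
# {'other': 100, 'media.video': 900}
```

The media.images categories have no stores listed, so they do not appear in the result.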
Categories can further be subdivided, which is a mechanism that splits a category into multiple categories under the hood. This enables the splitting of some potentially contended data structures, allowing more concurrent changes to happen without creating bottlenecks. This can be a useful tuning to apply to caches that see a very high churn rate, with objects being inserted and evicted very frequently.
The subdivided categories do not get any names and are not visible to the administrator. MSE4 assigns content randomly to a subdivision. When a subdivided category is selected for content eviction, the subdivision with the largest number of objects assigned to it is selected to produce an object to evict.
The statistics for the individual subdivisions are not accessible. The category statistics shown are a sum across all of the subdivisions, giving the complete category information.
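The behaviour described above can be modelled roughly as follows. This Python sketch is a toy, not MSE4's data structure: the class and method names are hypothetical, each subdivision is a plain list, and removing the oldest inserted entry stands in for the real Least Recently Used selection.

```python
import random


class SubdividedCategory:
    """Toy model of a category split into anonymous subdivisions."""

    def __init__(self, subdivisions=4, rng=None):
        self._rng = rng or random.Random()
        self._subdivisions = [[] for _ in range(subdivisions)]

    def insert(self, key):
        # Content is assigned to a randomly chosen subdivision.
        self._rng.choice(self._subdivisions).append(key)

    def evict_one(self):
        # The subdivision holding the most objects supplies the victim.
        fullest = max(self._subdivisions, key=len)
        if not fullest:
            return None
        # Oldest insertion is used here as a stand-in for LRU order.
        return fullest.pop(0)

    def object_count(self):
        # Only the sum across subdivisions is reported for the category.
        return sum(len(sub) for sub in self._subdivisions)


category = SubdividedCategory(subdivisions=4, rng=random.Random(0))
for i in range(10):
    category.insert(f"obj{i}")
category.evict_one()
print(category.object_count())  # prints 9
```

In the real system each subdivision would have its own lock or equivalent, which is where the reduced contention comes from; the toy model does not attempt to show that.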
The number of subdivisions for a category is controlled by setting the special *subdivisions* key in a category definition. The key should be set to an integer value, which specifies the number of subdivisions to create.
Categories that do not specify a *subdivisions* key will have the value from the environment level default_subdivisions applied to them. The default value of default_subdivisions is 4, meaning that all categories are given 4 subdivisions by default.
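The lookup order for the subdivision count can be summarized in a few lines of Python. The dictionaries below are hypothetical stand-ins for a parsed configuration; only the *subdivisions* and default_subdivisions key names and the default of 4 come from the text above.

```python
BUILTIN_DEFAULT_SUBDIVISIONS = 4  # documented default of default_subdivisions


def effective_subdivisions(category, env):
    """Return the subdivision count that applies to one category.

    category and env are plain dicts standing in for a category
    definition and the environment section.
    """
    if "*subdivisions*" in category:
        return category["*subdivisions*"]
    return env.get("default_subdivisions", BUILTIN_DEFAULT_SUBDIVISIONS)


print(effective_subdivisions({"*subdivisions*": 16}, {}))    # 16
print(effective_subdivisions({}, {"default_subdivisions": 8}))  # 8
print(effective_subdivisions({}, {}))                        # 4
```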