Content Categories in MSE4

MSE4 features a mechanism for placing cached content in specific categories. This feature is named Content Categories. Once content is categorized, MSE4 provides real-time statistics on exactly how much content and payload data each category holds. Furthermore, categories can have dedicated memory and disk resources assigned to them, making it possible to dedicate cache resources to particular types of content.

The set of categories is defined in the MSE4 configuration file. The categories are hierarchical and form a tree with a single root node. Each level of the tree can have any number of branches. The root of the tree is configured with the categories key in the environment section of the configuration file.

Content is categorized by associating the objects with a leaf node in the category tree through VCL rules. The MSE4 vmod contains methods for associating content with a category when content is fetched into the cache.

Content Categories is an optional feature in MSE4, and it is not required to define any categories to start using Varnish Enterprise with MSE4. If no categories key exists in the environment section of the configuration file, a single unnamed root category is automatically created, and all cache content is always associated with this category. All configured stores are also associated with this root category (store and category association is explained below). The result of this is that the MSE4 instance will regard all content as equally important, and all stores can be used to hold any type of content.

Example configuration

The following example MSE4 configuration file is for an imaginary site serving images, video and supporting HTML resources. Four leaf categories are defined. Below the root, one category is defined for holding media content and one for other content. The media category is further split into two categories, one for images and one for video files. The images category is split into icons and pictures.

   env: {
	categories = {
		other = {};
		media = {
			images = {
				icons = {};
				pictures = {};
			};
			video = {};
		};
	};
	default_category = "other";
   };

For this example, the VCL code for associating content with a category could look like this:

   vcl 4.1;

   import mse4;

   sub vcl_backend_response {
	if (beresp.http.content-type ~ "^image/") {
		if (bereq.url ~ "icon") {
			mse4.set_category("media.images.icons");
		} else {
			mse4.set_category("media.images.pictures");
		}
	} else if (beresp.http.content-type ~ "^video/") {
		mse4.set_category("media.video");
	} else {
		# Our configuration uses "other" as a default category
		# Alternatively we could do 'mse4.set_category("other");' here
	}
   }

Category path name

Content categories are named by a category path. The category path for a specific node is a string formed by concatenating every level of the tree from the root to that node. From the example above, the icons category has the category path media.images.icons. When associating content with a category, the category path is used.
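
The path construction can be illustrated with a short Python sketch. It is not part of MSE4: the nested dictionary is a hypothetical in-memory mirror of the example category tree, and leaf_paths is an illustrative helper that lists the category path of every leaf.

```python
# Hypothetical mirror of the example category tree; an empty dict marks a leaf.
CATEGORIES = {
    "other": {},
    "media": {
        "images": {"icons": {}, "pictures": {}},
        "video": {},
    },
}


def leaf_paths(tree, prefix=""):
    """Yield the dot-separated category path of every leaf below `tree`."""
    for name, children in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


for path in sorted(leaf_paths(CATEGORIES)):
    print(path)
```

The four printed paths are the strings that can be passed to mse4.set_category() in the example VCL.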

Default category

If the configuration specifies a default_category (as the example above does), this category is assigned to all content that the VCL did not explicitly associate with a category.

The default_category will also be applied if the VCL program failed to correctly associate a category. This could for example happen if the argument to set_category() was for an unknown category.

If no category is assigned to the content and no default_category is configured, the cache object creation will fail.
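
The three rules above can be summarized in a small Python sketch. It models the described behavior only and is not MSE4 code: resolve_category and ObjectCreationError are hypothetical names, and treating an unknown category without a default as a failure is an assumption drawn from the last rule.

```python
class ObjectCreationError(Exception):
    """Stands in for a failed cache object creation (hypothetical)."""


def resolve_category(requested, known_leaves, default_category=None):
    """Pick the category an object ends up in.

    `requested` is the path given by the VCL, or None if the VCL did not
    set one. `known_leaves` is the set of configured leaf category paths.
    """
    # An explicit association wins when it names a known leaf category.
    if requested in known_leaves:
        return requested
    # Nothing was set, or the path was unknown: use the default if there is one.
    if default_category is not None:
        return default_category
    # Assumption: without a default the object cannot be created.
    raise ObjectCreationError("no category assigned and no default_category")


LEAVES = {"other", "media.images.icons", "media.images.pictures", "media.video"}

print(resolve_category("media.video", LEAVES, "other"))
print(resolve_category(None, LEAVES, "other"))
print(resolve_category("media.audio", LEAVES, "other"))
```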

Content categories and Varnish counters

Detailed VSC (Varnish Shared Counter) resource usage counter sets are created for each leaf node of the category tree. They belong to the VSC class MSE4_CAT, and the category path is prefixed in parentheses to the counter names.

The counter values in the class correspond to the similarly named counter values in the global MSE4_MEM counter set, and show how much of the total is consumed by the specific category.

Using the example configuration above, the counter MSE4_CAT.(media.video).g_bytes would show the values for content placed in the media.video leaf node category.
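
Because the category path itself contains dots, splitting a counter name on "." does not work when post-processing statistics. The following Python sketch shows one way to separate the path from the counter name. The helper split_category_counter is hypothetical, and the pattern is derived only from the naming shown above.

```python
import re

# Matches names of the form MSE4_CAT.(<category path>).<counter name>.
_CAT_COUNTER = re.compile(r"^MSE4_CAT\.\((?P<path>[^)]*)\)\.(?P<counter>.+)$")


def split_category_counter(name):
    """Return (category_path, counter_name), or None for other counters."""
    match = _CAT_COUNTER.match(name)
    if match is None:
        return None
    return match.group("path"), match.group("counter")


print(split_category_counter("MSE4_CAT.(media.video).g_bytes"))
```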

Memory prioritization for Content Categories

The content categories may have memory weights associated with them. This allows prioritization on memory usage between categories, and control over how much memory each of them will be allowed to consume.

Note that the memory usage is controlled by weights rather than absolute byte counts. This makes the memory controls work regardless of how much memory is actually consumed, which may fluctuate with the traffic patterns the server is experiencing.

The memory weights describe the relative memory usage allowance for categories sharing the same parent. If two categories A and B share the same parent, and A has a memory weight of 1 and B a memory weight of 3, then the content in category B is allowed to consume 3 times as much memory as A.

It is a requirement that either none or all sibling categories (categories having the same parent) have a memory weight specified. Furthermore, if a category specifies a memory weight, then its parent category also needs to specify a memory weight (the root category is an exception to this rule, as it always has an implied memory weight of 1). If these rules are not followed, a configuration error message will be given when attempting to start the Varnish daemon.
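
The two rules can be expressed as a small check over a category tree. The Python sketch below is an illustration, not the validation MSE4 performs: the dictionary layout (a "memory_weight" entry and a nested "children" dictionary per category) and the function check_memory_weights are assumptions made for the example.

```python
def check_memory_weights(children, parent_has_weight=True, path="<root>"):
    """Return a list of rule violations found below `path`.

    The root counts as weighted, since it has an implied weight of 1.
    """
    errors = []
    weighted = [name for name, node in children.items() if "memory_weight" in node]
    # Rule 1: either none or all siblings specify a memory weight.
    if weighted and len(weighted) != len(children):
        errors.append(f"{path}: only some child categories have a memory weight")
    # Rule 2: a weighted category needs a weighted parent.
    if weighted and not parent_has_weight:
        errors.append(f"{path}: children have a memory weight but the parent has none")
    for name, node in children.items():
        child_path = name if path == "<root>" else f"{path}.{name}"
        errors += check_memory_weights(
            node.get("children", {}), "memory_weight" in node, child_path
        )
    return errors


VALID = {
    "other": {"memory_weight": 1},
    "media": {
        "memory_weight": 5,
        "children": {
            "images": {"memory_weight": 1, "children": {"icons": {}, "pictures": {}}},
            "video": {"memory_weight": 3},
        },
    },
}
MIXED_SIBLINGS = {"other": {"memory_weight": 1}, "media": {}}

print(check_memory_weights(VALID))
print(check_memory_weights(MIXED_SIBLINGS))
```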

Whenever content needs to be removed from the cache, an algorithm walks the category tree, starting from the root. At each level, the relative memory usage of each child category is assessed. The child category that shows the highest weight-adjusted memory usage is selected, and the process is repeated for that category. When a leaf category is reached, or a category is reached whose child categories do not have memory weights associated with them, a cache eviction is performed at that level.

The final step in the cache eviction process is to select the Least Recently Used content from among all of the objects associated with the selected category.
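
The walk can be sketched in Python as follows. This is a simplified model and not the MSE4 implementation: the Category class, the object tuples and pick_victim are hypothetical, and "weight-adjusted memory usage" is assumed to mean bytes used divided by the memory weight.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Category:
    name: str
    memory_weight: Optional[float] = None
    children: List["Category"] = field(default_factory=list)
    # Objects held directly by a leaf: (last_used, size_in_bytes, object_id).
    objects: List[Tuple[float, int, str]] = field(default_factory=list)

    def all_objects(self):
        """Objects of this category and of every category below it."""
        found = list(self.objects)
        for child in self.children:
            found.extend(child.all_objects())
        return found

    def usage(self):
        """Bytes consumed by this category, including its descendants."""
        return sum(size for _, size, _ in self.all_objects())


def pick_victim(root):
    """Return the id of the object to evict, or None if nothing is cached."""
    node = root
    # Descend for as long as the child categories carry memory weights.
    while node.children and all(c.memory_weight is not None for c in node.children):
        # Assumption: weight-adjusted usage is usage divided by weight.
        node = max(node.children, key=lambda c: c.usage() / c.memory_weight)
    candidates = node.all_objects()
    if not candidates:
        return None
    # Least Recently Used: the smallest last_used value.
    return min(candidates, key=lambda obj: obj[0])[2]


icons = Category("icons", objects=[(1.0, 100, "icon-a")])
pictures = Category("pictures", objects=[(5.0, 200, "pic-a")])
images = Category("images", 1, [icons, pictures])
video = Category("video", 3, objects=[(2.0, 600, "vid-a")])
media = Category("media", 5, [images, video])
other = Category("other", 1, objects=[(3.0, 100, "page-a")])
root = Category("<root>", 1, [other, media])

print(pick_victim(root))
```

With these made-up sizes, media (900 bytes at weight 5) is further over its allowance than other (100 bytes at weight 1), and within media the images category (300 bytes at weight 1) is further over than video (600 bytes at weight 3). The walk stops at images because its children carry no weights, so the least recently used object among icons and pictures is chosen.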

Example configuration using memory weights

This example expands our previous example, adding memory weights to some of the categories. The new MSE4 configuration is:

   env: {
	categories = {
		other = {
			memory_weight = 1;
		};
		media = {
			memory_weight = 5;
			images = {
				memory_weight = 1;
				icons = {};
				pictures = {};
			};
			video = {
				memory_weight = 3;
			};
		};
	};
	default_category = "other";
   };

The VCL code does not change from the previous example.

In this example, MSE4 will aim to spend 5 times as much memory on the media content as on the other supporting HTML files. So 1/6 of the cache space will go to the other category, and 5/6 to the media category.

In the same way, MSE4 will spend 3 times as much memory on video as on images. So the 5/6 of the total cache space reserved for media content is further divided into 1 part images and 3 parts video, that is, 5/24 and 15/24 of the total.

Note that the media.images.icons and media.images.pictures categories do not have memory weights associated with them. This means that, for cache eviction purposes, the content in these categories is all treated as belonging to media.images. If an eviction needs to happen there, the Least Recently Used item is selected from the combined set of both icons and pictures.
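
The share arithmetic for this example can be reproduced with a short Python sketch. It only illustrates how relative weights translate into target fractions of the total. The dictionary layout and the function memory_shares are assumptions made for the example, not an MSE4 interface.

```python
from fractions import Fraction

# Hypothetical mirror of the weighted example configuration.
CATEGORIES = {
    "other": {"memory_weight": 1},
    "media": {
        "memory_weight": 5,
        "children": {
            "images": {"memory_weight": 1, "children": {"icons": {}, "pictures": {}}},
            "video": {"memory_weight": 3},
        },
    },
}


def memory_shares(children, parent_share=Fraction(1), prefix=""):
    """Map each weighted category path to its target fraction of the total."""
    weights = {
        name: node["memory_weight"]
        for name, node in children.items()
        if "memory_weight" in node
    }
    total = sum(weights.values())
    shares = {}
    for name, weight in weights.items():
        path = f"{prefix}.{name}" if prefix else name
        # A category gets its weight's proportion of the parent's share.
        share = parent_share * Fraction(weight, total)
        shares[path] = share
        shares.update(memory_shares(children[name].get("children", {}), share, path))
    return shares


for path, share in memory_shares(CATEGORIES).items():
    print(path, share)
```

The unweighted icons and pictures categories do not appear in the result, because they share the allowance of media.images.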

Content Categories and Persisted Caches

When configuring Varnish Enterprise to use persisted caching, Content Categories is the mechanism that is used to assign objects to stores when the content enters the cache. The VCL program decides the category, and if the selected category has one or more stores associated with it, then the object may be persisted to one of these stores. The stores are associated with a category by listing the store ID on the special stores key in the category declaration. A store can only be associated with a single category, meaning all objects from the same store will always belong to the same category.

The combined sizes of the stores associated with a category determine the total amount of disk space available for persisted caching of objects in that category. This is the method by which the administrator can provision the disk resources between different categories of objects. If there is a need to share a disk between multiple categories, multiple stores should be created on the disk, splitting the disk space between the categories as needed.

Upon daemon restarts, the objects from a given store get their category designation from the category that the store is assigned to. This means that if a store’s category is changed in the configuration file, and the Varnish daemon is restarted, then the persisted objects from the given store also change category.

For more information about provisioning disks for persisted caching, please see Persisted Caching.

Example configuration using Content Categories and Persisted Caching

In this example, we build on the previous example and add stores from two disks to the other and media.video categories. The other category gets a small persisted space on one of the disks only, while media.video gets the remaining space on the first disk and the complete space on the second.

   env: {
	categories = {
		other = {
			memory_weight = 1;
			stores = ("other");
		};
		media = {
			memory_weight = 5;
			images = {
				memory_weight = 1;
				icons = {};
				pictures = {};
			};
			video = {
				memory_weight = 3;
				stores = ("video_disk1", "video_disk2");
			};
		};
		};
	};
	default_category = "other";

	books = ( {
		id = "book_disk1";
		filename = "/var/lib/mse/disk1/book_disk1";
		size = "2g";

		stores = ( {
			id = "other";
			filename = "/var/lib/mse/disk1/store_other";
			size = "100g";
		}, {
			id = "video_disk1";
			filename = "/var/lib/mse/disk1/store_video";
			size = "400g";
		} );
	}, {
		id = "book_disk2";
		filename = "/var/lib/mse/disk2/book_disk2";
		size = "2g";

		stores = ( {
			id = "video_disk2";
			filename = "/var/lib/mse/disk2/store_video";
			size = "500g";
		} );
	} );
   };

The VCL code does not change from the previous example.
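
To make the resulting disk provisioning explicit, the Python sketch below adds up the store sizes per category for the example above and rejects a store that is listed by more than one category. The dictionaries and the function persisted_capacity are hypothetical, and sizes are kept as plain numbers in the "g" unit used in the configuration.

```python
# Store sizes from the example configuration, in the "g" unit used there.
STORE_SIZES_G = {"other": 100, "video_disk1": 400, "video_disk2": 500}

# Store IDs listed on the stores key of each category.
CATEGORY_STORES = {
    "other": ["other"],
    "media.video": ["video_disk1", "video_disk2"],
}


def persisted_capacity(category_stores, store_sizes):
    """Return the combined store size per category."""
    owner = {}
    capacity = {}
    for category, store_ids in category_stores.items():
        for store_id in store_ids:
            # A store can only be associated with a single category.
            if store_id in owner:
                raise ValueError(
                    f"store {store_id!r} is listed by {owner[store_id]!r} and {category!r}"
                )
            owner[store_id] = category
        capacity[category] = sum(store_sizes[s] for s in store_ids)
    return capacity


print(persisted_capacity(CATEGORY_STORES, STORE_SIZES_G))
```

Categories without stores, such as media.images.icons and media.images.pictures in this example, have no persisted space of their own and therefore do not appear in the result.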

Category subdivisions

Categories can further be subdivided, which is a mechanism that splits a category into multiple categories under the hood. This enables the splitting of some potentially contended data structures, allowing more concurrent changes to happen without creating bottlenecks. This can be a useful tuning to apply to caches that see a very high churn rate, with objects being inserted and evicted very frequently.

The subdivided categories do not get any names and are not visible to the administrator. MSE4 will assign content randomly to a subdivision. When a subdivided category is selected for content eviction, the subdivision with the largest number of objects assigned to it will be the one selected to produce an object to evict.

The statistics for the subdivisions are not made accessible. The category statistics shown will be a sum across all of the subdivisions, displaying the complete category information.
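
The behavior described above can be modeled with a few lines of Python. This is a conceptual sketch only: the class SubdividedCategory and its methods are hypothetical, and each subdivision is represented as a plain list rather than the internal data structures MSE4 uses.

```python
import random


class SubdividedCategory:
    """Toy model of a category that is split into anonymous subdivisions."""

    def __init__(self, subdivisions=4, rng=None):
        self._rng = rng or random.Random()
        self._subdivisions = [[] for _ in range(subdivisions)]

    def insert(self, object_id):
        # Content is assigned to a randomly chosen subdivision.
        self._rng.choice(self._subdivisions).append(object_id)

    def eviction_subdivision(self):
        # The subdivision holding the most objects produces the object to evict.
        return max(
            range(len(self._subdivisions)),
            key=lambda i: len(self._subdivisions[i]),
        )

    def object_count(self):
        # Statistics are reported as a sum across all subdivisions.
        return sum(len(objects) for objects in self._subdivisions)


category = SubdividedCategory(subdivisions=4, rng=random.Random(0))
for number in range(100):
    category.insert(f"object-{number}")

print(category.object_count())
```

Splitting the objects over several independent lists is what allows concurrent inserts and evictions to touch different structures. Selecting the fullest subdivision for eviction keeps the subdivisions roughly balanced over time.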

The number of subdivisions for a category is controlled by setting the special subdivisions key in a category definition. The key should be set to an integer value, which specifies the number of subdivisions to create.

Categories that do not specify a subdivisions key will have the value from the environment level default_subdivisions applied to them. The default value of default_subdivisions is 4, meaning that all categories are given 4 subdivisions by default.