If you’re planning to build your own CDN, why should you consider using Varnish for the job?
To make our point, we won’t present a lot of new information, but instead will reiterate facts we have already mentioned throughout the book.
Request coalescing ensures that massive numbers of concurrent requests for non-cached content don't cause a stampede of backend requests.
As explained earlier, request coalescing will put requests for the same resource on a waiting list, and only send a single request to the origin. The response will be stored in cache and will satisfy all queued sessions in parallel.
In terms of origin shielding, this is a killer feature that makes Varnish an excellent building block for a private CDN.
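Request coalescing itself works out of the box and requires no VCL. The one situation to watch out for is uncacheable content: a response that turns out to be uncacheable cannot satisfy the waiting list, and the queued requests would be processed serially. Varnish avoids this through hit-for-miss, which you can steer in vcl_backend_response. Here's a minimal sketch; the 120-second hit-for-miss window and the backend address are arbitrary placeholder choices:

```vcl
vcl 4.1;

backend origin {
    .host = "origin.example.com";
    .port = "80";
}

sub vcl_backend_response {
    if (beresp.http.Cache-Control ~ "(?i)private|no-store") {
        # Mark this object as hit-for-miss for the next two minutes:
        # later requests for it skip the waiting list and are fetched
        # in parallel instead of being serialized.
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
        return (deliver);
    }
}
```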
Because of VCL, Varnish is capable of doing granular routing of incoming requests to the selected backend.
A backend can be the origin server, but it could also be another caching tier that is part of your CDN strategy.
vmod_directors offers a wide range of load-balancing algorithms, and
when content affinity matters, the shard director is the director of
choice.
Extra logic, written in VCL, can even precede the use of directors.
When you have to connect to a large number of backends, or register backends
on the fly, Varnish Enterprise's vmod_goto is an essential tool.
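As a sketch, this is what shard-director-based routing could look like. The storage-node addresses are placeholders, and hashing on the URL is one of several keying options the shard director supports:

```vcl
vcl 4.1;

import directors;

backend storage1 { .host = "192.0.2.10"; .port = "80"; }
backend storage2 { .host = "192.0.2.11"; .port = "80"; }
backend storage3 { .host = "192.0.2.12"; .port = "80"; }

sub vcl_init {
    new storage = directors.shard();
    storage.add_backend(storage1);
    storage.add_backend(storage2);
    storage.add_backend(storage3);
    # Rebuild the consistent-hash ring after adding backends.
    storage.reconfigure();
}

sub vcl_backend_fetch {
    # Hash on the request URL: every miss for the same URL is sent
    # to the same storage node, which provides content affinity.
    set bereq.backend = storage.backend(by=URL);
}
```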
Varnish is designed for performance and scales incredibly well. If you were to build a private CDN using Varnish, the following facts and figures will give you an idea of how it is going to perform.
These are not marketing numbers: these numbers were measured in actual environments, both by Varnish Software and some of its clients.
Of course you will only attain these numbers if you have the proper hardware, and if your network is fast and stable enough to handle the throughput. Some of the hardware that was used for these benchmarks is incredibly expensive.
In real-world situations on commercial off-the-shelf hardware, you will probably not be able to match this performance; however, Varnish is still freakishly fast.
It is easy to scale out a cluster of Varnish servers to increase the capacity of the CDN.
In fact it is quite common to have two layers of Varnish for scalability reasons:
A request-routing component selects one of the two edge nodes. As explained, these edge nodes only contain the most popular objects. Traffic is routed to the storage layer via consistent hashing: the sharding director creates a consistent hash and provides content affinity.
This content affinity, based on the request URL, will ensure that every miss for a URL on the edge tier will also be routed to the same server on the storage tier.
Adding storage capacity in your CDN is as simple as adding extra storage nodes.
Horizontally scaling the edge tier is also possible, but the hit rate doesn't matter too much at that level. The only reason to do it is to increase the outward bandwidth of your PoP.
Remember the statement earlier in this chapter?
It’s not about the cache hits; it’s about how good your misses are.
In this case our misses are very good because they are also served by Varnish at the storage-tier level. There is really no need to beef up your edge tier too much as long as it doesn’t get crushed under the load of incoming requests.
Because of Varnish’s unparalleled logging and monitoring tools, the transparency of a Varnish-based CDN is quite amazing.
varnishlog provides in-depth information about requests, responses,
timing, caching, selected backends, and VCL decisions. On top of that
you can use std.log() to log custom messages.
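std.log() writes a VCL_Log record into the shared memory log, where varnishlog can display and filter it. A minimal sketch; the "routing:" prefix and the backend address are arbitrary placeholder choices:

```vcl
vcl 4.1;

import std;

backend default {
    .host = "origin.example.com";
    .port = "80";
}

sub vcl_recv {
    # Emits a VCL_Log record that appears in varnishlog output.
    std.log("routing: host=" + req.http.Host + " url=" + req.url);
}
```

Such messages can then be isolated with a VSL query, for example `varnishlog -q 'VCL_Log ~ routing'`.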
When you start using multiple Varnish nodes in a single environment,
running varnishlog on each node can become tedious. At this point log
centralization will become important.
Tools like Logstash and Beats offer plugins for varnishlog, which
facilitate shipping logs to a central location without having to
transfer log files.
In chapter 7 we already talked about Prometheus, and how it has
become something of an industry standard for time-series data.
varnishstat counters can easily be exported and centralized in
Prometheus. A tool like Grafana can be used to visualize these
metrics.
In Varnish Enterprise, vmod_kvstore gives you the ability to keep
custom counters, and on top of that there's Varnish Custom Statistics.
Having transparency is important: knowing how your CDN is behaving, being able to troubleshoot, and having actionable data to base decisions on. This leads to more control and a better understanding of your end-to-end delivery.
Once again Varnish proves to be an excellent candidate as CDN software.
It is entirely possible to build your own CDN using Varnish Cache.
Storage becomes a bit trickier with Varnish Cache: we advise against using the file stevedore, which means your storage tier relies on memory only.
As long as you can equip your storage tier with enough memory and enough nodes, your CDN will scale out just fine. Just keep the increased complexity of managing large amounts of servers in mind.
A very important component is the shard director. It is responsible
for creating the content affinity that is required to provide
horizontal scalability of your storage tier. This director is part of
vmod_directors and is shipped with Varnish Cache.
The reality is that the Massive Storage Engine (MSE) is a key feature for building a private CDN:
vmod_mse allows MSE book and store selection on a per-request
basis. MSE is only available in Varnish Enterprise and is the number one reason why people who are building their own CDN choose Varnish Enterprise.
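A sketch of what per-request store selection with vmod_mse could look like. The store tags ("sata", "ssd") are placeholders that would have to match tags defined in your MSE configuration, and the TTL and Content-Type rules are arbitrary examples:

```vcl
vcl 4.1;

import mse;

backend default {
    .host = "origin.example.com";
    .port = "80";
}

sub vcl_backend_response {
    if (beresp.ttl < 120s) {
        # Short-lived objects stay in memory only.
        mse.set_stores("none");
    } elsif (beresp.http.Content-Type ~ "^video/") {
        # Large video objects go to the cheaper rotational disks.
        mse.set_stores("sata");
    } else {
        mse.set_stores("ssd");
    }
}
```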
When your CDN increases in size, being able to benefit from the Varnish Controller will simplify managing those nodes.
Other than that, choosing between Varnish Cache and Varnish Enterprise will mainly depend on the VMODs you need.