If you’re planning to build your own CDN, why should you consider using Varnish for the job?
To make our point, we won’t present a lot of new information, but instead will reiterate facts we have already mentioned throughout the book.
Request coalescing ensures that massive numbers of concurrent requests for non-cached content don't cause a stampede of backend requests.
As explained earlier, request coalescing will put requests for the same resource on a waiting list, and only send a single request to the origin. The response will be stored in cache and will satisfy all queued sessions in parallel.
In terms of origin shielding, this is a killer feature that makes Varnish an excellent building block for a private CDN.
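Request coalescing itself works out of the box and requires no VCL. The one situation to watch out for is uncacheable content: a response that turns out to be uncacheable cannot satisfy the waiting list, and the queued requests would be processed serially. Varnish avoids this through hit-for-miss, which you can steer in vcl_backend_response. Here's a minimal sketch; the 120-second hit-for-miss window and the backend address are arbitrary placeholder choices:

```vcl
vcl 4.1;

backend origin {
    .host = "origin.example.com";
    .port = "80";
}

sub vcl_backend_response {
    if (beresp.http.Cache-Control ~ "(?i)private|no-store") {
        # Mark this object as hit-for-miss for the next two minutes:
        # later requests for it skip the waiting list and are fetched
        # in parallel instead of being serialized.
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
        return (deliver);
    }
}
```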
Because of VCL, Varnish is capable of doing granular routing of incoming requests to the selected backend.
A backend can be the origin server, but it could also be another caching tier that is part of your CDN strategy.
vmod_directors offers a wide range of load-balancing algorithms, and
when content affinity matters, the shard director is the director of
choice.
Extra logic, written in VCL, can even precede the use of directors.
When you have to connect to a large number of backends, or register backends
on the fly, Varnish Enterprise's vmod_goto is an essential tool.
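As a sketch, this is what shard-director-based routing could look like. The storage-node addresses are placeholders, and hashing on the URL is one of several keying options the shard director supports:

```vcl
vcl 4.1;

import directors;

backend storage1 { .host = "192.0.2.10"; .port = "80"; }
backend storage2 { .host = "192.0.2.11"; .port = "80"; }
backend storage3 { .host = "192.0.2.12"; .port = "80"; }

sub vcl_init {
    new storage = directors.shard();
    storage.add_backend(storage1);
    storage.add_backend(storage2);
    storage.add_backend(storage3);
    # Rebuild the consistent-hash ring after adding backends.
    storage.reconfigure();
}

sub vcl_backend_fetch {
    # Hash on the request URL: every miss for the same URL is sent
    # to the same storage node, which provides content affinity.
    set bereq.backend = storage.backend(by=URL);
}
```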
Varnish is designed for performance and scales incredibly well. If you were to build a private CDN using Varnish, the following facts and figures will give you an idea of how it is going to perform.
These are not marketing numbers: these numbers were measured in actual environments, both by Varnish Software and some of its clients.
Of course you will only attain these numbers if you have the proper hardware, and if your network is fast and stable enough to handle the throughput. Some of the hardware that was used for these benchmarks is incredibly expensive.
In real-world situations on commercial off-the-shelf hardware, you will probably not be able to match this performance; however, Varnish is still freakishly fast.
It is easy to scale out a cluster of Varnish servers to increase the capacity of the CDN.
In fact it is quite common to have two layers of Varnish for scalability reasons:
A request-routing component selects one of the two edge nodes. As explained, these edge nodes only contain the most popular objects. Traffic is routed to the storage layer via consistent hashing: the sharding director creates a consistent hash and provides content affinity.
This content affinity, based on the request URL, will ensure that every miss for a URL on the edge tier will also be routed to the same server on the storage tier.
Adding storage capacity in your CDN is as simple as adding extra storage nodes.
Horizontally scaling the edge tier is also possible, but the hit rate doesn't matter too much at that level. The only reason to do it is to increase the outward bandwidth of your PoP.
Remember the statement earlier in this chapter?
It’s not about the cache hits; it’s about how good your misses are.
In this case our misses are very good because they are also served by Varnish at the storage-tier level. There is really no need to beef up your edge tier too much as long as it doesn’t get crushed under the load of incoming requests.
Because of Varnish’s unparalleled logging and monitoring tools, the transparency of a Varnish-based CDN is quite amazing.
varnishlog provides in-depth information about requests, responses,
timing, caching, selected backends, and VCL decisions. On top of that
you can use std.log() to log custom messages.
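std.log() writes a VCL_Log record into the shared memory log, where varnishlog can display and filter it. A minimal sketch; the "routing:" prefix and the backend address are arbitrary placeholder choices:

```vcl
vcl 4.1;

import std;

backend default {
    .host = "origin.example.com";
    .port = "80";
}

sub vcl_recv {
    # Emits a VCL_Log record that appears in varnishlog output.
    std.log("routing: host=" + req.http.Host + " url=" + req.url);
}
```

Such messages can then be isolated with a VSL query, for example `varnishlog -q 'VCL_Log ~ routing'`.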
When you start using multiple Varnish nodes in a single environment,
running varnishlog on each node can become tedious. At this point log
centralization will become important.
Tools like Logstash and Beats offer plugins for varnishlog, which
facilitate shipping logs to a central location without having to
transfer log files.
In chapter 7 we already talked about Prometheus, and how it has
become something of an industry standard for time-series data.
varnishstat counters can easily be exported and centralized in
Prometheus. A tool like Grafana can be used to visualize these
metrics.
In Varnish Enterprise, vmod_kvstore gives you the ability to keep
custom counters, and on top of that there's Varnish Custom Statistics.
Having transparency is important: knowing how your CDN is behaving, being able to troubleshoot, and having actionable data to base decisions on. This leads to more control and a better understanding of your end-to-end delivery.
Once again Varnish proves to be an excellent candidate as CDN software.
It is entirely possible to build your own CDN using Varnish Cache.
Storage becomes a bit trickier with Varnish Cache: we advise against using the file stevedore, which means your storage tier relies on memory only.
As long as you can equip your storage tier with enough memory and enough nodes, your CDN will scale out just fine. Just keep the increased complexity of managing large amounts of servers in mind.
A very important component is the shard director. It is responsible
for creating the content affinity that is required to provide
horizontal scalability of your storage tier. This director is part of
vmod_directors and is shipped with Varnish Cache.
The reality is that the Massive Storage Engine (MSE) is a key feature for building a private CDN:
vmod_mse allows MSE book and store selection on a per-request
basis. MSE is only available in Varnish Enterprise and is the number one reason why people who are building their own CDN choose Varnish Enterprise.
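A sketch of what per-request store selection with vmod_mse could look like. The store tags ("sata", "ssd") are placeholders that would have to match tags defined in your MSE configuration, and the TTL and Content-Type rules are arbitrary examples:

```vcl
vcl 4.1;

import mse;

backend default {
    .host = "origin.example.com";
    .port = "80";
}

sub vcl_backend_response {
    if (beresp.ttl < 120s) {
        # Short-lived objects stay in memory only.
        mse.set_stores("none");
    } elsif (beresp.http.Content-Type ~ "^video/") {
        # Large video objects go to the cheaper rotational disks.
        mse.set_stores("sata");
    } else {
        mse.set_stores("ssd");
    }
}
```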
When your CDN increases in size, being able to benefit from the Varnish Controller will simplify managing those nodes.
Other than that, choosing between Varnish Cache and Varnish Enterprise will mainly depend on the VMODs you need.