Varnish built-in VCL behavior

In the previous section, we talked about how Varnish deals with the Cache-Control header and how it interprets the various values internally.

Parts of this behavior are implemented in the Varnish core, but other parts are implemented in the so-called built-in VCL.

The built-in VCL contains a set of rules that will be executed by default, even if they are not specified in your own VCL file. It is possible to bypass the built-in VCL through a return statement, but this chapter assumes that this does not happen. The built-in VCL provides much of the safe-by-default behavior of Varnish, so be very careful if you decide to skip it.

The VCL language is tightly connected with the Varnish finite state machine, where the various VCL subroutines correspond to states in the machine. Since the built-in VCL executes last, each VCL subroutine ends with a return statement, which controls the flow of the machine. Fully understanding the finite state machine requires an understanding of the built-in VCL, and vice versa. Luckily, a basic understanding of both is sufficient to solve most use cases for VCL.

We won’t be going over the full built-in VCL code in this section, only the behavior it implements, illustrated along the way with a few simplified VCL sketches. The next chapter covers VCL in detail, including the syntax, and presents the complete built-in VCL code, which will then make a lot more sense.

It is important to note that the built-in VCL is very cautious in its implementation. This prevents Varnish from caching anything that shouldn’t be cached, but it can result in a very low hit rate when the backend doesn’t provide proper caching headers or uses cookies. How to mitigate this is explained in chapter 4.

Let’s look at the concrete behavior that the built-in VCL implements.

When is a request cacheable?

The first task for the built-in VCL is to decide if a request is cacheable.

When Varnish receives an HTTP request from the client, it first looks at the HTTP request method to decide what needs to happen.

Cacheable request methods

In the previous section, we talked about cacheable request methods. This logic is also part of the built-in VCL.

If the request method is GET or HEAD, Varnish will allow the request to be served from cache. Any other request method will immediately result in a pass or pipe to the backend.

A small side note about GET requests: although it’s not that common, a GET request can contain a payload in its request body. However, Varnish will strip the request body from a GET request before sending it to the backend.
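
In simplified VCL, this check looks as follows. This is a sketch of the logic, not the verbatim built-in code, which is presented in the next chapter:

    sub vcl_recv {
        if (req.method != "GET" && req.method != "HEAD") {
            # Only GET and HEAD requests are served from cache;
            # everything else goes straight to the backend.
            return (pass);
        }
    }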

Invalid request methods

The cacheability of a request method isn’t the only thing Varnish cares about. Varnish will only deal with request methods it can handle.

If the request method is PRI, which shouldn’t normally happen, Varnish will stop execution with an HTTP 405 Method Not Allowed error. That’s because PRI is part of HTTP/2 and is handled in the Varnish core when HTTP/2 is enabled.

If you receive a request with such a method, it means HTTP/2 wasn’t properly configured, and you’re receiving an HTTP/2 request on an HTTP/1.1 server.
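
In VCL terms, this boils down to something like the following sketch:

    sub vcl_recv {
        if (req.method == "PRI") {
            # PRI is the HTTP/2 connection preface. Seeing it here means
            # an HTTP/2 request arrived on an HTTP/1.1 listener.
            return (synth(405));
        }
    }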

By default Varnish will only handle the following request methods:

  • GET
  • HEAD
  • POST
  • PUT
  • PATCH
  • DELETE
  • OPTIONS
  • TRACE

For any other request method, other than PRI of course, the built-in VCL will turn the connection into a pipe.

This means that Varnish doesn’t just pass the request to the backend: it stops treating the incoming traffic as HTTP altogether.

Varnish opens a pipe to the backend and sends the incoming bytes through as plain TCP, without applying any HTTP semantics.

Please note that pipe is not supported for HTTP/2. For HTTP/1.1, WebSockets are the main use case.
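
In VCL, this method whitelist and the pipe fallback look like the following simplified sketch:

    sub vcl_recv {
        if (req.method != "GET" &&
            req.method != "HEAD" &&
            req.method != "PUT" &&
            req.method != "POST" &&
            req.method != "TRACE" &&
            req.method != "OPTIONS" &&
            req.method != "DELETE" &&
            req.method != "PATCH") {
            # Unknown method: hand the raw connection over to the backend.
            return (pipe);
        }
    }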

State getting in the way

As mentioned before, caching when state is involved is tricky.

Stateful data is usually personalized, and storing personalized data in a shared cache doesn’t make sense. Previously we mentioned cache variations as a potential solution.

But by default, Varnish doesn’t serve any content from cache when the request contains either a Cookie or an Authorization header.

In the real world, cookies will almost always be used. That’s why you’ll need to write custom VCL for your specific application that keeps the cookies that matter and strips off the others. In the next chapter, we’ll show you how to do this.
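
The corresponding check, again as a simplified sketch:

    sub vcl_recv {
        if (req.http.Authorization || req.http.Cookie) {
            # Stateful requests are not served from cache by default.
            return (pass);
        }
    }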

How does Varnish identify objects in cache?

Once Varnish decides that an incoming request is cacheable, it needs to look up the corresponding object in cache.

When an object is stored in cache, a hash key is used to identify the object in cache. As previously explained, the hostname and the URL are used as identifiers to create this hash.

The hostname is the value that comes out of the Host request header. When the request doesn’t contain a Host header, Varnish will use its own server IP address instead. This can only happen when the request uses an old version of HTTP.

As of HTTP/1.1, a Host header is no longer optional. Varnish will return an HTTP 400 error when it notices an HTTP/1.1 request without a Host header.
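
Simplified, the hashing logic looks like this:

    sub vcl_hash {
        hash_data(req.url);
        if (req.http.host) {
            hash_data(req.http.host);
        } else {
            # No Host header: fall back to the server IP address.
            hash_data(server.ip);
        }
        return (lookup);
    }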

Dealing with stale content

One of the core caching principles is the time to live (TTL). This is not part of the built-in VCL, but part of the core caching logic of Varnish.

We talked about Cache-Control values, and about the Expires header. We even talked briefly about overriding the TTL in VCL code.

Varnish will use these mechanisms to come up with the TTL of an object. And as long as the TTL is greater than zero, the object is deemed fresh, and will be served from cache.

Varnish will check the freshness of an object upon every cache hit. If it turns out the TTL has become zero or less, the object is deemed stale and revalidation needs to happen.

Because of the so-called grace mode, Varnish is able to serve stale data to the client while asynchronously revalidating the content with the backend server. We already talked about this when explaining stale-while-revalidate behavior.

Varnish will check the sum of the TTL and the grace value, and if it is greater than zero, it will still serve the stale data while performing a background fetch.

If the sum is zero or less, the request will result in a cache miss, and a synchronous fetch will happen.
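
Although this logic lives in the Varnish core rather than in the built-in VCL, it can be expressed in VCL-like pseudocode. The sketch below mirrors what older Varnish versions did in vcl_hit:

    sub vcl_hit {
        if (obj.ttl >= 0s) {
            # Fresh object: a plain cache hit.
            return (deliver);
        }
        if (obj.ttl + obj.grace > 0s) {
            # Stale, but within grace: serve the stale object
            # and revalidate with an asynchronous background fetch.
            return (deliver);
        }
        # TTL and grace have both expired: treat it as a miss
        # and fetch synchronously.
        return (miss);
    }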

The default_grace runtime parameter defines the default grace period when no grace period is specified in a response header from the backend. The default value for default_grace is ten seconds, but it can be changed on the command line or dynamically while Varnish is running. If an object’s grace is ten seconds on insertion, the object will be asynchronously revalidated until ten seconds after the TTL has expired. When the grace period has passed, a request for this object will be a cache miss and will trigger a synchronous fetch to the origin.

The grace period can also be set in VCL by assigning a value to the beresp.grace variable, which will be discussed in the next chapter. As mentioned before, the Cache-Control header also has the stale-while-revalidate directive to set the grace period.
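
For example, to allow objects to be served for up to one hour past their TTL, you could set the following in VCL. The one-hour value is just an illustration:

    sub vcl_backend_response {
        # Keep serving stale content for up to an hour past the TTL,
        # while revalidation happens in the background.
        set beresp.grace = 1h;
    }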

When does Varnish store a response in cache?

There’s a difference between deciding that a request is cacheable and deciding that a response should be stored in cache.

The former decision is made when Varnish receives an incoming request. Chances are that the object is already stored in cache, which results in a hit. If that’s not the case, it’s a miss.

When a request is not served from cache, a backend request is made. When the backend responds, Varnish will receive the HTTP response and will decide what to do with it.

The decision-making process as to whether or not to cache is based on response headers sent by the backend.

The first step in the decision-making process is interpreting the TTL. If the TTL, derived from the Cache-Control or Expires headers, is zero or less, Varnish will decide not to store the response in cache.

In the next step, the built-in VCL will check if there’s a Set-Cookie header in the response. The Set-Cookie header is used to store state on the client and is usually used for private information or as a unique identifier for the user. For this reason the built-in VCL will mark the response as uncacheable, so that no other clients can receive the same, potentially private, Set-Cookie header.

In the previous section we talked about surrogates. Varnish checks whether a Surrogate-Control: no-store header is set. If it is, Varnish will not store the response in cache.

When there’s no Surrogate-Control header being returned, Varnish will also look at the semantics of the Cache-Control header, beyond the interpretation of the TTL. If directives like no-cache, no-store, or private occur in this header, Varnish will decide not to store the response in cache.

And finally, if a Vary: * header is sent by the origin, Varnish won’t cache the response either, because varying on every header makes no sense if you want to cache a response.
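
Put together, the checks described above correspond to the following condition in vcl_backend_response, shown here as a simplified sketch. The two statements inside the if-block create the hit-for-miss object explained in the next subsection:

    sub vcl_backend_response {
        if (beresp.ttl <= 0s ||
            beresp.http.Set-Cookie ||
            beresp.http.Surrogate-Control ~ "(?i)no-store" ||
            (!beresp.http.Surrogate-Control &&
             beresp.http.Cache-Control ~ "(?i:no-cache|no-store|private)") ||
            beresp.http.Vary == "*") {
            # Mark the response as hit-for-miss for the next two minutes.
            set beresp.ttl = 120s;
            set beresp.uncacheable = true;
        }
        return (deliver);
    }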

What happens if the response couldn’t be stored in cache?

When Varnish decides that a response is not cacheable, for the reasons mentioned above, the response is directly served to the client that requested it without being cached.

However, Varnish will store some metadata in the cache for uncacheable resources: it creates a hit-for-miss object to record the fact that the response was uncacheable.

The purpose of this hit-for-miss object is to bypass the waiting list for future requests to this resource.

Requests for cacheable resources can be put on the waiting list while request coalescing happens.

Request coalescing was explained in chapter 1. It makes sure that multiple requests that can be satisfied by a single backend request are not sent to the backend, but instead are put on a waiting list. When the backend response is received, the response is sent to all satisfiable requests that were queued in the waiting list.

By marking a request for a specific resource uncacheable through a hit-for-miss object, we avoid the waiting list and immediately issue a backend request. We do this knowing that a request for this resource will probably not be satisfied through request coalescing.

As a consequence, we avoid potential request serialization. This term refers to requests being processed in a serial manner instead of in parallel. Request serialization causes extra latency and even becomes a bottleneck when there are enough clients requesting the same URL.

Request coalescing is otherwise a powerful Varnish feature, and the waiting list is part of this implementation. But for uncacheable content, the waiting list would become counterproductive. That’s why the hit-for-miss is there to deliberately bypass it.

The hit-for-miss logic is actually quite forgiving: hit-for-miss objects can be replaced with actual cached responses when the next backend response is considered cacheable.

A hit-for-miss object is kept in cache for a certain amount of time. By default a hit-for-miss object has a TTL of two minutes. This TTL represents the upper limit, but if the next response is cacheable, the hit-for-miss object is replaced with the actual cacheable response.

