Tuning Varnish

Out of the box, Varnish performs exceptionally well. But depending on the available server resources, your traffic patterns, and other criteria, you might not get the most out of Varnish.

It’s also entirely possible that the default settings are too taxing on your system.

Either way, there are dozens of parameters you can tune. The goal is to strike a balance between making efficient use of the available resources and protecting your server from excessive load.

Threading settings

When we talk about getting the most out of Varnish, this usually means increasing the number of simultaneous requests Varnish can handle.

The threading settings are the best way to tune the concurrency of Varnish. As mentioned in the Under The Hood section in chapter 1: thread_pool_min and thread_pool_max are the settings that control how many threads are in the thread pool.

As a quick reminder: instead of creating threads on demand, a couple of thread pools are initialized that contain a certain number of threads ready to use. Creating and destroying threads on the fly causes overhead and a slight delay.

By default there are two thread pools, which can be changed by modifying the thread_pools parameter. But benchmarks have shown no significant improvement from changing this value.

When varnishd starts, 200 threads are created in advance: 100 threads per thread pool. The thread_pool_min parameter can be used to tune this number.
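
These values can be overridden at startup with -p flags. Here's a minimal sketch; the listen address, VCL path, and thread counts are purely illustrative and should be sized against your own baseline:

# Pre-create 500 threads per pool and allow each pool to grow to 5000.
varnishd -a :80 -f /etc/varnish/default.vcl \
	-p thread_pool_min=500 \
	-p thread_pool_max=5000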

Growing the thread pools

When the thread pools run out of available threads, Varnish will grow the pools until thread_pool_max is reached. Growing the pools is also somewhat resource intensive.

By increasing thread_pool_min, it’s easier to cope with an onslaught of incoming traffic upon restart. This is common when Varnish sits behind a load balancer and is suddenly added to the rotation.

On the other hand, starting varnishd with too many threads will have an impact on the resource consumption of your server.

A good indicator is the MAIN.threads counter in varnishstat. It lets you know how many threads are currently active. You can correlate this number to your resource usage, and it helps you establish a baseline.
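
For example, a quick way to look at the current thread usage is varnishstat in one-shot mode; the glob pattern below is just one way to select the relevant counters:

# Print the thread counters once; -f accepts glob patterns.
varnishstat -1 -f "MAIN.threads*"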

The thread_pool_min parameter should be set to at least the number of threads in use under an average traffic pattern.

When the MAIN.threads_limited counter increases, it means the thread pools ran out of available threads. If there is room to grow the pools, Varnish will add threads. If not, tasks will be queued until a thread is available.

The MAIN.threads_limited counter might increase early on, when the MAIN.threads counter reaches the thread_pool_min threshold. As the pools grow, this should happen less often. But once MAIN.threads reaches the thread_pool_max value, the rate at which MAIN.threads_limited increases can indicate problematic behavior.

The thread queue doesn’t have an infinite size: each thread pool has a queue limit of 20 tasks. This is configurable via the thread_queue_limit parameter. When the queue is full, any new task or request will be dropped.

So when the MAIN.threads_limited counter increases, and the MAIN.sess_dropped or MAIN.req_dropped counters increase as well, you know the queue is full and sessions or streams are being dropped.

The MAIN.sess_dropped counter refers to HTTP/1.1 sessions being dropped, whereas MAIN.req_dropped refers to HTTP/2 streams being dropped.

You can choose to increase thread_queue_limit, which allows more tasks to be queued. But unless resources are too tight, you really want to increase thread_pool_min instead, because it will make your system more responsive.
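
Both parameters can be changed at runtime without restarting Varnish. A minimal sketch using varnishadm, with illustrative values:

# Raise the number of pre-created threads per pool at runtime.
varnishadm param.set thread_pool_min 500

# Only if resources are too tight to grow the pools: allow a deeper queue.
varnishadm param.set thread_queue_limit 100

# Verify the new value.
varnishadm param.show thread_pool_min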

Shrinking the thread pools

When a thread has been idle for 300 seconds, Varnish will clean it up. This is controlled by the thread_pool_timeout parameter. The MAIN.threads counter will reflect this.

This means Varnish automatically shrinks the thread pools based on demand, albeit with a delay. If the increased server load caused by varnishd worker threads is too much for your system, you should decrease thread_pool_max to an acceptable level.

If you believe Varnish needs to clean up idle threads quicker, you can reduce the thread_pool_timeout. But remember: destroying threads also consumes resources.

Another factor that will impact server load is the worker stack size. This is stack space that is consumed by every worker thread. By limiting the stack size, we manage to reduce the memory footprint of Varnish on the system. The size is configurable via the thread_pool_stack parameter. The default stack size in Varnish is 48 KB. The default process stack size on Linux is typically multiple orders of magnitude larger than the stack sizes we use in Varnish.

Stack space is typically consumed by third-party libraries that are used by Varnish. libpcre, the library to run Perl Compatible Regular Expressions, can consume quite a bit of stack space. If you write very complicated regular expressions in VCL, you might even cause a stack overflow.

When a stack overflow happens, you should increase the value of thread_pool_stack. But this, in turn, will have a direct impact on resource consumption because the worker stack size applies per thread.

If you set your worker stack size to 100 KB and you have 5000 threads in each of the two thread pools, this will consume almost 1 GB of memory: 2 x 5000 threads x 100 KB is roughly 1,000,000 KB. So be careful, and consider reducing thread_pool_max when this would be too taxing on your system.

Client-side timeouts

Varnish has some client-side timeouts that can be configured, which can improve the overall experience.

Most of these settings have already been discussed in the security section of this chapter, as they can be used to mitigate denial of service (DoS) attacks.

The timeout_idle parameter is one of these. It's a sort of keep-alive timeout that defines how long a connection remains open after a request. If no new request is received on the connection within the five-second default, the connection is closed.

The idle_send_timeout parameter defines how long Varnish is willing to wait for the next bytes, after having already received data. This is a typical between-bytes timeout.

And then there’s also the send_timeout parameter, which acts as a last-byte timeout.

From a DoS perspective these settings can help you prevent slowloris attacks, as mentioned earlier.

From a performance point of view, these settings can also be used to improve the end-user experience. If your Varnish servers sit behind a set of load balancers, it makes sense to increase timeout_idle because you know they are the only devices that are directly connecting to Varnish, and they are most probably going to reuse their connections with Varnish.

If you’re handling large volumes of data that are processed by potentially slow clients, you can also increase the send_timeout value.
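
These are all runtime parameters, so they can be set at startup or adjusted on the fly. A minimal sketch with illustrative values, assuming Varnish sits behind trusted load balancers that reuse their connections:

# Keep idle load-balancer connections open longer than the 5-second default.
varnishadm param.set timeout_idle 65

# Give slow clients more time to consume large responses (last-byte timeout).
varnishadm param.set send_timeout 900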

Backend timeouts

For requests that cannot be served from cache, a backend connection is made to the origin server, which acts as the source of truth.

If your backend is slow, or the connection is unreliable, backend connections might be left open for too long. It is also possible that the connection is closed while data is still being sent.

In order to strike the right balance, Varnish offers a set of backend timeouts. You should already be familiar with the settings, as they are configurable in your VCL backend definition.

The backend timeouts you can tune are the following ones:

  • connect_timeout: the amount of time Varnish is willing to wait for the backend to accept the connection
  • first_byte_timeout: the timeout for receiving the first byte from the origin
  • between_bytes_timeout: the amount of time we are willing to wait in between receiving bytes from the backend

Here’s a quick reminder on how to configure this in VCL:

vcl 4.1;

backend default {
	.host = "origin.example.com";
	.port = "80";
	.connect_timeout = "10s";
	.first_byte_timeout = "90s";
	.between_bytes_timeout = "5s";
}

These settings are of course also available as varnishd runtime parameters, but it is important to know that the values in VCL are on a per-backend basis, and take precedence over the runtime parameters.

These parameters can also be specified on a per-request basis, using bereq.connect_timeout or bereq.first_byte_timeout from vcl_backend_fetch in VCL.
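
For example, here's a minimal sketch of a per-request override; the URL pattern and the timeout value are purely illustrative:

sub vcl_backend_fetch {
	# Hypothetical example: give a known slow reporting endpoint more time
	# to produce its first byte, without raising the global defaults.
	if (bereq.url ~ "^/report") {
		set bereq.first_byte_timeout = 120s;
	}
}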

The backend_idle_timeout parameter is not configurable in VCL, defaults to 60 seconds, and defines how long a backend connection can be idle before Varnish closes it.

Workspace settings

You might remember the concept of workspaces from the Under The Hood section in chapter 1. A workspace is a chunk of memory that is allocated up front for a specific task in Varnish, which lessens the strain on the memory allocator.

Whereas the stack space is an operating system concept, the workspace is a Varnish-specific concept. The workspace memory is used for request and response parsing, for VCL storage, and also for any VMOD requiring memory space to store data.

There are different kinds of workspaces, and each of them can be tuned. When a workspace overflow occurs, it means a transaction couldn't allocate enough memory to perform its task.

  • The workspace_client parameter, with a default value of 64 KB, is used to limit memory allocation for HTTP request handling.
  • The workspace_backend parameter, which also has a default value of 64 KB, sets the amount of memory that can be used during backend processing.
  • The workspace_session parameter limits the size of workspace memory used to establish the TCP connection to Varnish. The default value is 0.5 KB.

When a task consumes more memory than allowed in one of the specific workspace contexts, the transaction is aborted, and an HTTP 503 response is returned. When a workspace_session overflow occurs, the connection will be closed.

It is always possible to increase the size of the various workspaces. Memory consumption depends on what happens in VCL, but also depends on the input Varnish receives from clients and backends.

A better solution is to optimize your VCL, or reduce the size and the amount of headers that are sent by the backend. But sometimes, you have no control over this, or no way to significantly reduce memory consumption. In that case, increasing workspace_client or workspace_backend is your best move.

Luckily there are ways to monitor workspace overflow. These workspaces have a varnishstat overflow counter:

  • MAIN.ws_client_overflow
  • MAIN.ws_backend_overflow
  • MAIN.ws_session_overflow

When these counters start increasing, don’t blindly increase the workspace size. Instead, have a look at your logs, see which transactions cause the overflow, and try to figure out if you can optimize that part of your VCL to avoid the overflows in the future.

As always, varnishstat and varnishlog will be the tools you need to figure out what is going on before deciding to increase the size of the workspaces.
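
A minimal monitoring-and-tuning sketch, with an illustrative value for the client workspace:

# Watch the workspace overflow counters.
varnishstat -1 -f "MAIN.ws_*_overflow"

# Inspect the transactions that triggered the overflow.
varnishlog -g request -q 'Error ~ "workspace_client overflow"'

# Only after ruling out a VCL fix: raise the client workspace (illustrative value).
varnishadm param.set workspace_client 128k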

HTTP limits

HTTP requests and responses are parsed by Varnish. As mentioned earlier, parsing them requires a bit of workspace memory.

Incoming requests and cached responses are parsed in the client context, and use client workspace memory. When a cache miss takes place, and the response needs to be parsed from the origin server, we operate in the backend context. This will consume backend workspace memory.

There are certain limits in place that prevent Varnish from wasting too much memory on request and response parsing, and that also help avoid DoS attacks.

Here’s an overview of runtime parameters that limit the length and size of requests and responses:

  • http_max_hdr: the maximum number of headers an HTTP request or response may contain. The default value is 64.
  • http_req_hdr_len: the maximum size of an individual request header. By default this is 8 KB.
  • http_req_size: the maximum total size of the HTTP request headers. This defaults to 32 KB.
  • http_resp_hdr_len: the maximum size of an individual response header. By default this is 8 KB.
  • http_resp_size: the maximum total size of the HTTP response headers. This defaults to 32 KB.

When requests or responses exceed these limits, the transaction will fail.

HTTP request limit examples

Here’s some example logging output when the http_max_hdr threshold is exceeded:

*   << Request  >> 5
-   Begin          req 4 rxreq
-   Timestamp      Start: 1611051232.286266 0.000000 0.000000
-   Timestamp      Req: 1611051232.286266 0.000000 0.000000
-   BogoHeader     Too many headers: foo:bar
-   HttpGarbage    "GET%00"
-   RespProtocol   HTTP/1.1
-   RespStatus     400
-   RespReason     Bad Request
-   ReqAcct        519 0 519 28 0 28
-   End

As you can see, an HTTP 400 status code is returned when this happens.

Here’s an example where an individual request header exceeds the http_req_hdr_len limit:

*   << Request  >> 98314
-   Begin          req 98313 rxreq
-   Timestamp      Start: 1611051653.320914 0.000000 0.000000
-   Timestamp      Req: 1611051653.320914 0.000000 0.000000
-   BogoHeader     Header too long: test:YnEJyVqxTMgn7aX
-   HttpGarbage    "HEAD%00"
-   RespProtocol   HTTP/1.1
-   RespStatus     400
-   RespReason     Bad Request
-   ReqAcct        10081 0 10081 28 0 28
-   End

When the total request size exceeds http_req_size, the following output can be found in your VSL:

*   << Session  >> 32793
-   Begin          sess 0 HTTP/1
-   SessOpen       172.21.0.1 60576 http 172.21.0.3 80 1611052643.429084 30
-   SessClose      RX_OVERFLOW 0.001
-   End
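
If you want to check how often this happens, one option is to query the VSL for these session closures; a sketch:

# Show sessions that were closed because the request exceeded http_req_size.
varnishlog -g session -q 'SessClose ~ "RX_OVERFLOW"'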

HTTP response limit examples

When the origin server returns too many headers and exceeds the http_max_hdr limit, this doesn’t result in an HTTP 400 status, but in an actual HTTP 503.

You might see the following output appear in your VSL:

-- BogoHeader     Too many headers:foo: bar
-- HttpGarbage    "HTTP/1.1%00"
-- BerespStatus   503
-- BerespReason   Service Unavailable
-- FetchError     http format error

And when this happens, the MAIN.losthdr counter will also increase.

When the http_resp_hdr_len limit is exceeded, you will see the following output end up in VSL:

--  BogoHeader     Header too long: Last-Modified: Tue,
--  HttpGarbage    "HTTP/1.1%00"
--  BerespStatus   503
--  BerespReason   Service Unavailable
--  FetchError     http format error

And finally, when the http_resp_size limit is exceeded, the following VSL line may serve as an indicator:

--  FetchError     overflow

Make sure you have enough workspace memory

Remember, HTTP header processing, both for requests and responses, is done using workspace memory. If you decide to increase some of the HTTP header limits in varnishd, there’s no guarantee that Varnish will work flawlessly.

The HTTP limits are there for a reason, and the defaults have been chosen pragmatically. When for example a client workspace overflow occurs, you’ll see the following occur in your VSL:

-   Error          workspace_client overflow
-   RespProtocol   HTTP/1.1
-   RespStatus     500
-   RespReason     Internal Server Error

Interestingly, the status code is HTTP 500 and not HTTP 503. This makes sense because the backend didn’t fail; it’s actually Varnish that failed.

In the backend context, Varnish is more likely to drop response headers that would cause a backend workspace overflow rather than fail the transaction.

When this happens, you’ll see LostHeader tags appear in your VSL output:

--  LostHeader     foo:
--  LostHeader     bar:

If you really need to increase some of the HTTP limits, please ensure the workspace size is updated accordingly.
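
Here's a minimal sketch of raising the request-side limits together with the client workspace; the listen address, VCL path, and values are illustrative and should be sized against your own traffic:

varnishd -a :80 -f /etc/varnish/default.vcl \
	-p http_max_hdr=128 \
	-p http_req_size=64k \
	-p workspace_client=128k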

Limiting I/O with tmpfs

Varnish uses the /var/lib/varnish folder quite extensively. It’s the place where the VSL circular buffer is located. It’s also the place where the compiled VCL files are stored as .so files.

To prevent VSL from causing too many I/O operations, we can mount /var/lib/varnish as a tmpfs volume. This means /var/lib/varnish is actually stored in memory on a RAM disk.

These are the commands you need to make it happen:

echo "tmpfs /var/lib/varnish tmpfs defaults,noatime 0 0" | sudo tee -a /etc/fstab
sudo mount /var/lib/varnish

You’ll agree that this is a no-brainer.
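
After mounting, you can verify that the directory is indeed backed by tmpfs, and restart varnishd so its working files are recreated on the new mount. The service name below assumes the official packages:

findmnt /var/lib/varnish
sudo systemctl restart varnish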

Other settings

There are some other settings that don't deserve their own subsection but can be useful nonetheless.

Listen depth

listen_depth is one of those settings. It defines the depth of the TCP listen queue: the number of pending connections that are allowed to queue up before Varnish accepts them. On really busy systems, setting the queue deep enough will yield better results.

But there is a fine line between better results and increased server load. The default value for listen_depth is set to 1024 connections.

However, this value is ignored if the operating system’s somaxconn value is lower. Please verify the contents of /proc/sys/net/core/somaxconn to be sure.

If your operating system’s value is too low, you can tune it via sysctl -w net.core.somaxconn=1024.
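
To make that change survive a reboot, and to raise Varnish's own queue at the same time, a sketch follows; the drop-in file name and the values are illustrative:

# Persist the kernel-side backlog limit.
echo "net.core.somaxconn = 4096" | sudo tee /etc/sysctl.d/90-varnish.conf
sudo sysctl --system

# Raise the Varnish listen queue accordingly at startup.
varnishd -a :80 -f /etc/varnish/default.vcl -p listen_depth=4096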

Nuke limit

This section doesn’t apply to MSE, which will still use LRU eviction but relies on different mechanisms to keep things in check.

When you have a lot of inserts and your cache is full, the nuking mechanism will remove the least recently used objects to free the required space.

The MAIN.n_lru_nuked counter indicates that LRU nuking is taking place.

When a lot of small objects are stored in the cache, and a large object needs to be inserted, the nuking mechanism may need to remove multiple objects before having enough space to store the new object.

There is an upper limit to how many objects can be removed before the eviction is aborted. This is defined by the nuke_limit parameter. The default value is 50.

If more than 50 object removals are required to free up space, Varnish will abort the transaction and return an HTTP 503 error. The MAIN.n_lru_limited counter will count the number of times the nuke limit was reached.

Unfortunately, Varnish Cache has a limitation where a task can request LRU nuking, but where another competing task will steal its space. This might also be a reason why the nuke_limit threshold is reached.
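
A monitoring-and-tuning sketch; the nuke_limit value is illustrative:

# Check how often nuking happens and how often the limit is hit.
varnishstat -1 -f MAIN.n_lru_nuked -f MAIN.n_lru_limited

# If MAIN.n_lru_limited keeps climbing, raising the limit can help.
varnishadm param.set nuke_limit 200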

Short-lived

The shortlived parameter defines the threshold for short-lived objects. This means that objects with a TTL lower than the value of shortlived will not be stored in the regular caching stevedore. Instead these objects will be stored in transient storage.

You may remember that transient storage is unbounded by default. This can result in your server running out of memory when the number of transient objects rapidly increases. By default the shortlived threshold is set to ten seconds.

Objects whose full lifetime, meaning TTL plus grace plus keep, is lower than ten seconds will go into transient storage.
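
If you change shortlived, or if short-lived objects are common in your setup, it also makes sense to put a bound on transient storage. A sketch with illustrative sizes and paths:

# Treat anything with less than 30 seconds of total lifetime as short-lived,
# and cap transient storage at 256 MB instead of leaving it unbounded.
varnishd -a :80 -f /etc/varnish/default.vcl \
	-p shortlived=30 \
	-s malloc,4g \
	-s Transient=malloc,256m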

Logging CLI traffic in syslog

By default CLI commands are logged to syslog via syslog(LOG_INFO). On systems that rely a lot on the CLI, this may result in a lot of noise in the logs, but also in degraded performance.

If you don't need CLI commands to be logged, just set syslog_cli_traffic to off. It's always a tradeoff, unfortunately.
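
Turning it off is a one-liner, for example:

varnishadm param.set syslog_cli_traffic off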

