This tutorial shows how to use the agent’s maintenance file to drain traffic from Varnish nodes. While the Drain Traffic tutorial covers the CLI, UI, and API approaches, this tutorial focuses on the file-based mechanism, which is useful for automation and scripting.
For a conceptual overview of the maintenance file, see Concepts: Maintenance File.
In this example, we have three Varnish nodes (cache1, cache2, cache3) behind a Traffic Router.
We want to perform OS updates on each node without dropping client traffic.
The maintenance file is located at <base-dir>/<agent-name>/maintenance by default. If you changed
the maintenance-file configuration parameter, use that path instead.
# For an agent named "cache1" with default base directory:
$ ls /var/lib/varnish-controller/cache1/
agent.uid traffic_router_health.json
The maintenance file does not exist yet, which means the agent is in normal operating mode.
Check the agent’s health check file to confirm the current status:
$ cat /var/lib/varnish-controller/cache1/traffic_router_health.json
{"version":1,"status":"healthy","mbps":245.3,"max_mbps":1000,"score":0.24,"updated_at":"2026-04-14T10:30:00Z"}
The status is healthy, confirming the node is actively receiving routed traffic.
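For scripting, the status field can be pulled out of the health JSON without a JSON parser. A minimal sketch using grep and cut; the helper name `health_status` is ours, not part of the product:

```shell
# Hypothetical helper (not a product command): extract the "status"
# field from the agent's single-line health JSON.
health_status() {
  grep -o '"status":"[^"]*"' "$1" | cut -d'"' -f4
}

# Example against a sample file:
printf '%s' '{"version":1,"status":"healthy","mbps":245.3}' > /tmp/health.json
health_status /tmp/health.json   # prints: healthy
```

The same one-liner works in the verification steps later in this tutorial, e.g. in a watch loop while waiting for the status to flip.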
Create an empty file at the maintenance file path:
$ touch /var/lib/varnish-controller/cache1/maintenance
Within one second, the agent detects the file and updates its status. Verify:
$ cat /var/lib/varnish-controller/cache1/traffic_router_health.json
{"version":1,"status":"maintenance","mbps":245.3,"max_mbps":1000,"score":0.24,"updated_at":"2026-04-14T10:30:01Z"}
The status has changed to maintenance. The Traffic Router will no longer route new clients to
cache1.
The Varnish node continues to serve in-flight requests and clients whose cached DNS records still point to it. Monitor traffic until it subsides:
# Check active connections using varnishstat
$ varnishstat -1 -f MAIN.sess_conn -f MAIN.client_req
For DNS routing, wait at least the TTL duration of the DNS records before proceeding.
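The wait can be scripted by sampling MAIN.client_req (a cumulative counter) and stopping once the delta between samples reaches zero. A sketch, assuming varnishstat is on the PATH; `parse_counter` and `wait_for_drain` are illustrative names, and the interval is an example:

```shell
# Illustrative helpers, not product commands.
# parse_counter pulls the value column out of "varnishstat -1" output,
# e.g. "MAIN.client_req   123456   12.34 Good client requests" -> 123456
parse_counter() { awk '{print $2; exit}'; }

wait_for_drain() {
  interval=${1:-10}
  prev=$(varnishstat -1 -f MAIN.client_req | parse_counter)
  while :; do
    sleep "$interval"
    cur=$(varnishstat -1 -f MAIN.client_req | parse_counter)
    echo "requests in last ${interval}s: $((cur - prev))"
    [ "$((cur - prev))" -eq 0 ] && break
    prev=$cur
  done
}
```

Note that a zero delta only shows the sampled window was quiet; for DNS routing you should still honor the record TTL before taking the node down.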
With traffic drained, perform your maintenance tasks:
$ sudo apt update && sudo apt upgrade -y
$ sudo systemctl restart varnish
Remove the maintenance file to restore the agent to healthy status:
$ rm /var/lib/varnish-controller/cache1/maintenance
Verify the status is restored:
$ cat /var/lib/varnish-controller/cache1/traffic_router_health.json
{"version":1,"status":"healthy","mbps":0,"max_mbps":1000,"score":0,"updated_at":"2026-04-14T10:45:00Z"}
The Traffic Router will begin routing new clients to cache1 again.
Repeat steps 3–6 for cache2 and cache3, one at a time.
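For scripted use, the enable and disable steps reduce to two one-liners. A minimal sketch with the base directory and agent name as parameters (function names are ours; the demo runs against a scratch directory, not a real agent):

```shell
# Illustrative wrappers around steps 3 and 6.
enable_maintenance()  { touch "$1/$2/maintenance"; }
disable_maintenance() { rm -f "$1/$2/maintenance"; }

# Demo against a scratch directory:
mkdir -p /tmp/vc-demo/cache1
enable_maintenance /tmp/vc-demo cache1
ls /tmp/vc-demo/cache1            # shows: maintenance
disable_maintenance /tmp/vc-demo cache1
```

On a real node, pass the agent's actual base directory (default /var/lib/varnish-controller) and run with sufficient privileges to write there.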
The whole procedure can be automated with Ansible. The playbook below drains, updates, and restores one node at a time (serial: 1); it assumes each host defines an agent_name variable in the inventory:

---
- name: Rolling maintenance for Varnish nodes
  hosts: varnish_nodes
  serial: 1
  become: true  # apt and systemd tasks require root
  vars:
    base_dir: /var/lib/varnish-controller
    drain_wait: 60
  tasks:
    - name: Enable maintenance mode
      file:
        path: "{{ base_dir }}/{{ agent_name }}/maintenance"
        state: touch
        mode: "0644"

    - name: Wait for traffic to drain
      pause:
        seconds: "{{ drain_wait }}"

    - name: Perform OS updates
      apt:
        upgrade: dist
        update_cache: yes

    - name: Restart Varnish
      systemd:
        name: varnish
        state: restarted

    - name: Disable maintenance mode
      file:
        path: "{{ base_dir }}/{{ agent_name }}/maintenance"
        state: absent

    - name: Wait before proceeding to next node
      pause:
        seconds: 10
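The playbook expects an agent_name host variable; a minimal inventory sketch (the hostnames are this tutorial's examples, and the filename is arbitrary):

```ini
# inventory.ini (illustrative)
[varnish_nodes]
cache1 agent_name=cache1
cache2 agent_name=cache2
cache3 agent_name=cache3
```

Run it with, for example: $ ansible-playbook -i inventory.ini rolling-maintenance.yml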
You can verify the maintenance status from the Controller CLI at any time:
# List agents and their routing status
$ vcli agent list
+----+--------+---------+--------------+
| ID | Name | State | StopRouting |
+----+--------+---------+--------------+
| 1 | cache1 | Running | true |
| 2 | cache2 | Running | false |
| 3 | cache3 | Running | false |
+----+--------+---------+--------------+
When the maintenance file is created on disk, the StopRouting column reflects the change after
the agent reports it to brainz.
Draining via the CLI (vcli agent stop-routing) and via the filesystem
(touch maintenance) achieve the same result. Mixing both methods simultaneously is
safe; the agent reconciles the state.