
Invalidations

Invalidations is the collective name for purging, banning and tag-based invalidation. Support for invalidations was added in version 3.0.0. The Varnish Controller has built-in features that make invalidations easy to use in a multi-layered setup. Invalidations can be targeted by VCLGroup, domain and tag. These selectors can be combined, for example to invalidate a single domain in a VCLGroup, or to invalidate a storage layer first and the edge layer once that has finished.

An invalidation can have the following states:

  • (1) Running - Invalidation is being run on agents.
  • (2) Completed - Invalidation completed successfully.
  • (3) Failed - Invalidation failed. List the agent errors with the agent errors API endpoint and check the brainz errors in the invalidation response.

The following states only apply when monitor agents is enabled in the invalidation request:

  • (4) Retrying - Retrying monitored agents whose state has changed.
  • (5) Monitoring - Monitoring agents that were down, had NATS connection issues, or previously had a VCLGroup deployed that could be re-deployed and would then still hold old caches.
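For client code that polls an invalidation, the states above can be modelled as a small enumeration. This is only a sketch: the class and helper names are made up for illustration, and whether the API encodes states with these numbers is an assumption based on the numbering in the list.

```python
from enum import IntEnum


class InvalidationState(IntEnum):
    """States from the list above; values mirror the numbering used there."""
    RUNNING = 1
    COMPLETED = 2
    FAILED = 3
    RETRYING = 4     # only with monitor agents enabled
    MONITORING = 5   # only with monitor agents enabled


def is_finished(state: InvalidationState) -> bool:
    """True once the invalidation has stopped: it either completed or failed."""
    return state in (InvalidationState.COMPLETED, InvalidationState.FAILED)
```

A polling loop could call is_finished after each status check and stop as soon as it returns True.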

First steps

To start using invalidations you need to set up support for purging and/or banning in your VCL. The Varnish Controller does not change or add VCL logic to handle invalidations; the user must implement this in the VCL deployed to the agents. The agent fires the invalidation requests from localhost. When setting up purging and banning in your VCL, be sure to allow them only from localhost, otherwise anyone could send invalidation requests. HTTP headers can also be specified for invalidation requests, which makes access control via headers possible. See the Varnish Developer Portal to find out how to set up purging with an ACL. In a containerized environment, using CIDR blocks is recommended.
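The real access control belongs in your VCL, but the idea behind an ACL built from CIDR blocks can be illustrated in a few lines of Python. The sketch below only shows the membership check such an ACL performs. The network ranges are hypothetical examples, not values the Varnish Controller requires.

```python
import ipaddress

# Hypothetical allow-list: localhost plus an example container network.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("::1/128"),
    ipaddress.ip_network("172.17.0.0/16"),  # assumed container bridge range
]


def may_invalidate(client_ip: str) -> bool:
    """Return True if client_ip falls inside one of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```

In a container setup the agent may not reach Varnish from 127.0.0.1, which is why a CIDR block covering the container network is the more robust choice.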

The Varnish Controller does basic ACL checks for the logged-in user. Logged-in users can only send invalidation requests to VCLGroups, tags or domains they have access to. This prevents users from sending invalidations across organizations.

When an invalidation request comes in, brainz filters the agents and sends the invalidation request over NATS to the agents that need to invalidate. Each agent then sends the configured HTTP request to the domain from localhost. This happens in parallel and asynchronously from the invalidation request. You can check the invalidation state to monitor the progress.

Invalidations are automatically removed 1 hour after they succeeded or failed. This can be configured with the -remove-invalidations 1h option in the brainz configuration. The agent errors are also cleared when an invalidation is deleted.

Monitoring agents

Monitoring agents is used to:

  1. Run invalidations on agents that are down
  2. Run invalidations on agents that are having NATS connection issues
  3. Run invalidations on agents that previously had the tag, domain or VCLGroup deployed.

When setting the monitor agents flag, it is recommended to set a suitable execution TTL, as monitoring stops when the execution TTL is reached. (See Execution TTL)

Agents that are down or have NATS connection issues at the time the invalidation request is sent will be invalidated as soon as they are reachable again. The VCLGroup still needs to be deployed to that agent for the invalidations to run. The invalidation request can remain in the monitoring state if the tag, domain or VCLGroup being invalidated was previously deployed to an agent. The reason is that if the tag, domain or VCLGroup is added to that agent again, it would serve the old caches. The invalidation therefore monitors for such a change and runs the invalidations as soon as the agent has that tag, domain or VCLGroup again.

There is a list of agents to monitor and a list of monitoring reasons explaining why each agent is being monitored. This explains most situations. The currently possible reasons are:

  1. Unable to reach agent over NATS
  2. Monitoring for previously deployed VCLGroups to be re-deployed
  3. Agent not running

Agents that are readonly will not be invalidated or monitored.
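The rules in this section can be summarised as a small decision function. This is a sketch of the described behaviour, not the brainz implementation: the AgentStatus fields are hypothetical names, and the order in which the conditions are checked is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentStatus:
    # All field names are hypothetical and exist only for this sketch.
    running: bool
    nats_reachable: bool
    readonly: bool
    has_target_deployed: bool  # tag, domain or VCLGroup deployed right now
    had_target_deployed: bool  # tag, domain or VCLGroup deployed in the past


def monitoring_reason(agent: AgentStatus) -> Optional[str]:
    """Return why an agent would be monitored, or None if it is not monitored."""
    if agent.readonly:
        return None  # readonly agents are neither invalidated nor monitored
    if not agent.running:
        return "Agent not running"
    if not agent.nats_reachable:
        return "Unable to reach agent over NATS"
    if not agent.has_target_deployed and agent.had_target_deployed:
        return "Monitoring for previously deployed VCLGroups to be re-deployed"
    return None  # nothing to wait for on this agent
```

An agent for which the function returns None is either invalidated right away, skipped because it is readonly, or not a target at all.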

Execution TTL

The execution TTL tells brainz how long the overall invalidation request is allowed to run. The default is 2 minutes, as the default TTL of caches in Varnish is 2 minutes. If the invalidation request takes longer than that, it gets the state Failed. This feature works well in combination with the monitor agents feature, as you can set it to, for example, 1 hour. If agents that were down when the invalidation request was sent become reachable within that hour, the invalidation is run on those agents.
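The deadline check behind the execution TTL can be sketched as follows. The function and constant names are illustrative, and how brainz tracks time internally is not documented here. Only the 2 minute default and the 1 hour example come from the text above.

```python
from datetime import datetime, timedelta

DEFAULT_EXECUTION_TTL = timedelta(minutes=2)  # default named in the text


def has_expired(started_at: datetime, now: datetime,
                execution_ttl: timedelta = DEFAULT_EXECUTION_TTL) -> bool:
    """True when the invalidation has run longer than its execution TTL."""
    return now - started_at > execution_ttl
```

With the default TTL an invalidation that is still monitoring after three minutes would count as expired, while the same invalidation with a TTL of timedelta(hours=1) would keep waiting for agents to come back.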

Checking errors

If the invalidation gets a Failed state and you want to know why, check the invalidation response. It shows the brainz errors if an error occurred in brainz, but most likely the cause is agent errors. Agent errors are logged per agent, per domain and per path, and the list can be large if you send many paths at once. There are two endpoints for viewing agent errors. To see the agent errors grouped by agent, domain and status code, use the /api/v1/invalidations/{invalidation}/errors-grouped endpoint. In the following example there are 24,000 errors in total, 12,000 per server and all on the same domain, which generates the following output:

[
    {
        "agentName": "server3",
        "errorCount": 12000,
        "host": "example.org",
        "statusCode": 500
    },
    {
        "agentName": "server1",
        "errorCount": 12000,
        "host": "example.org",
        "statusCode": 500
    }
]

To view a list of all the errors, use /api/v1/invalidations/{invalidation-id}/errors. This returns the following:

[
    {
        "agentName": "server1",
        "createdAt": "2021-09-24T12:44:57.49496Z",
        "error": "",
        "host": "example.org",
        "id": 2,
        "invalidationId": 1,
        "method": "PURGE",
        "path": "/images/1.png",
        "statusCode": 500,
        "updatedAt": "2021-09-24T12:45:19.803054Z"
    },
    23,999 more items ...
]
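The grouped response is an aggregation of the full error list. If you have already fetched the full list, the same summary can be computed locally, as in this sketch. The function name is illustrative, and the field names are taken from the example responses above.

```python
from collections import Counter


def group_errors(errors):
    """Count errors per (agentName, host, statusCode), like the grouped output."""
    counts = Counter((e["agentName"], e["host"], e["statusCode"]) for e in errors)
    return [
        {"agentName": agent, "errorCount": count, "host": host, "statusCode": status}
        for (agent, host, status), count in counts.items()
    ]


# Small hand-made sample in the shape of the full error list.
sample = [
    {"agentName": "server1", "host": "example.org", "statusCode": 500, "path": "/images/1.png"},
    {"agentName": "server1", "host": "example.org", "statusCode": 500, "path": "/images/2.png"},
    {"agentName": "server3", "host": "example.org", "statusCode": 500, "path": "/images/1.png"},
]
grouped = group_errors(sample)
```

For large error lists the grouped endpoint is still the better choice, since it avoids transferring every single error.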

These endpoints support filtering and pagination like the other API endpoints. To filter errors by domain, use /api/v1/invalidations/{invalidation-id}/errors?host=example.org; to paginate, use /api/v1/invalidations/{invalidation-id}/errors?take=20,0.

CLI commands are available as well. To list all errors grouped, use vcli inv errors {invalidation-id}; to list every error, add the verbose flag: vcli inv errors -v {invalidation-id}.
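When calling the errors endpoint from a script, the query string can be assembled with the standard library. In this sketch the helper name and the base URL are placeholders; only the path and the host and take parameters are taken from the examples above.

```python
from urllib.parse import urlencode


def errors_url(base_url: str, invalidation_id: int, **query: str) -> str:
    """Build the URL of the errors endpoint with optional query parameters."""
    url = f"{base_url}/api/v1/invalidations/{invalidation_id}/errors"
    if query:
        # Keep the comma in values such as take=20,0 unescaped.
        url += "?" + urlencode(query, safe=",")
    return url


# https://controller.example.com is a placeholder for your own API address.
filtered = errors_url("https://controller.example.com", 1, host="example.org")
paged = errors_url("https://controller.example.com", 1, take="20,0")
```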

Invalidation configuration

[Figure 1. Agent setup (image: invalidations-documentation-storage-edge.png)]

Based on the setup above, the following scenarios show how invalidation requests can be set up. In every scenario the user needs access to the domains, tags and VCLGroup the invalidation is requested for. If a user does not have access to a domain, tag or VCLGroup, an authorization error is returned.

Scenario 1

Using tags we can invalidate all agents within the EU region by sending only the eu tag with the invalidation request. This approach works with all tags the user has access to. This will run invalidations on Agent 2 and Agent 4 in no specific order.

Example invalidation request:

{
    "method": "PURGE",
    "paths": ["/images/1.png", "/images/2.png"],
    "tags": [[{"name": "eu"}]]
}

Scenario 2

Using a tag list in the API we can define the order in which tags are invalidated. We want to invalidate all agents with the storage tag and, after those invalidations have succeeded, the agents with the edge tag, both within the EU region (indicated by the eu tag). In this scenario an error while invalidating the storage agents stops the invalidation, so the edge agents are not affected. Here we give 2 tag lists: the Varnish Controller first runs invalidations on Agent 4 and, after those have succeeded, on Agent 2.

Example invalidation request:

{
    "method": "PURGE",
    "paths": ["/images/1.png", "/images/2.png"],
    "tags": [[{"name": "storage"}, {"name": "eu"}], [{"name": "edge"}, {"name": "eu"}]]
}
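How the tag lists in Scenarios 1 and 2 select agents can be sketched in a few lines. The sketch assumes that an agent must carry every tag of an inner list and that the inner lists are processed in order, which is what the two scenarios describe. The tag assignment per agent is inferred from the scenarios, and the us tag is a made-up placeholder for the non-EU region.

```python
# Assumed tags per agent, inferred from the scenarios; "us" is a made-up
# placeholder for the non-EU region.
AGENT_TAGS = {
    "Agent 1": {"edge", "us"},
    "Agent 2": {"edge", "eu"},
    "Agent 3": {"storage", "us"},
    "Agent 4": {"storage", "eu"},
}


def plan_steps(tag_lists):
    """For each tag list, in order, list the agents that carry every tag in it."""
    steps = []
    for tag_list in tag_lists:
        wanted = {tag["name"] for tag in tag_list}
        steps.append(sorted(name for name, tags in AGENT_TAGS.items() if wanted <= tags))
    return steps


scenario_1 = plan_steps([[{"name": "eu"}]])
scenario_2 = plan_steps([
    [{"name": "storage"}, {"name": "eu"}],
    [{"name": "edge"}, {"name": "eu"}],
])
```

Under these assumptions scenario_1 contains a single step with Agent 2 and Agent 4, while scenario_2 contains two steps: Agent 4 first, then Agent 2.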

Scenario 3

Using just a domain we can invalidate agents that have that domain deployed. The domain has to exist in the Varnish Controller; we will use the domain edge.example.com. This invalidation request will only invalidate Agent 1, but if more agents had this domain configured, those would be invalidated as well.

With root deployments a VCLGroup is required, as the domains then do not exist in the Varnish Controller.

Example invalidation request:

{
    "method": "PURGE",
    "paths": ["/images/1.png", "/images/2.png"],
    "domains": [{"fqdn": "edge.example.com"}]
}

Scenario 4

Invalidate a non-root VCLGroup. We are going to invalidate using VCLGroup 1, which has ID 1. This invalidation request will invalidate Agent 3 and Agent 4. The Varnish Controller retrieves all domains configured for this VCLGroup; in this case it adds the domains strg.example.com and strg.example.org to the invalidation request.

Example invalidation request:

{
    "method": "PURGE",
    "paths": ["/images/1.png", "/images/2.png"],
    "vclGroup": {"id":1}
}

Scenario 5

Only invalidate for VCLGroup 1 with domain strg.example.org. This will invalidate Agent 3 and Agent 4, but invalidation requests are only sent for domain strg.example.org.

Example invalidation request:

{
    "method": "PURGE",
    "paths": ["/images/1.png", "/images/2.png"],
    "vclGroup": {"id":1},
    "domains": [{"fqdn": "strg.example.org"}]
}

Scenario 6

Only invalidate for VCLGroup 1 with domain strg.example.org and tag eu. This will invalidate only Agent 4, and invalidation requests are only sent for domain strg.example.org.

Example invalidation request:

{
    "method": "PURGE",
    "paths": ["/images/1.png", "/images/2.png"],
    "vclGroup": {"id":1},
    "tags": [[{"name": "eu"}]],
    "domains": [{"fqdn": "strg.example.org"}]
}

Scenario 7

Invalidate agents with the tag eu and send additional headers with the request. This will invalidate Agent 2 and Agent 4 in no specific order, with the additional headers, which you can read in your VCL.

Example invalidation request:

{
    "method": "PURGE",
    "paths": ["/images/1.png", "/images/2.png"],
    "headers": {"X-Custom-Header": "custom-value"},
    "tags": [[{"name": "eu"}]]
}

Scenario 8

Invalidate a root VCLGroup. This is not shown in Figure 1 but is good to know. If you have a root deployment and want to invalidate that VCLGroup, the Varnish Controller supports sending in domains that do not exist in the Varnish Controller. At least one domain is required for a root deployment invalidation. In the following example we have a root VCLGroup with ID 3 and we invalidate all agents that have this VCLGroup deployed. Each agent sends invalidation requests for all given domains and paths to the Varnish instance on that agent.

Example invalidation request:

{
    "method": "PURGE",
    "paths": ["/images/1.png", "/images/2.png"],
    "vclGroup": {"id":3},
    "domains": [
        {"fqdn": "example.eu"},
        {"fqdn": "example.info"}
    ]
}

Other combinations

It is also possible to combine just tags and domains, VCLGroups and tags, and so on. The scenarios above are only a selection of the ways to run invalidations.
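Since all scenarios share the same request shape, the body can be assembled by one small helper. This is a convenience sketch, not part of the Varnish Controller tooling: the function and parameter names are made up, while the JSON keys are the ones used in the scenario examples.

```python
import json


def build_invalidation(method, paths, vcl_group_id=None, tag_lists=None,
                       fqdns=None, headers=None):
    """Assemble a request body with the same keys as the scenario examples."""
    body = {"method": method, "paths": list(paths)}
    if vcl_group_id is not None:
        body["vclGroup"] = {"id": vcl_group_id}
    if tag_lists:
        # Each inner list becomes one ordered tag list of {"name": ...} objects.
        body["tags"] = [[{"name": name} for name in tags] for tags in tag_lists]
    if fqdns:
        body["domains"] = [{"fqdn": fqdn} for fqdn in fqdns]
    if headers:
        body["headers"] = dict(headers)
    return body


# Scenario 6: VCLGroup 1, tag eu, domain strg.example.org.
scenario_6 = build_invalidation(
    "PURGE", ["/images/1.png", "/images/2.png"],
    vcl_group_id=1, tag_lists=[["eu"]], fqdns=["strg.example.org"],
)
print(json.dumps(scenario_6, indent=4))
```

Leaving out an argument simply omits the corresponding key, so the same helper covers the tag-only, domain-only and combined requests.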

Root deployments

With root deployments a domain still needs to be set. The domain is sent as the Host header of the HTTP request. The domain does not need to exist in the Varnish Controller, but you need to set it to the domain you wish to invalidate for. The VCL is responsible for handling the invalidation requests.

With a root deployment you need to specify the VCLGroup and set the domains you wish to invalidate for. Using the fqdn you can send in domains that do not exist in the Varnish Controller; this only works for root deployments.

{
    "method": "PURGE",
    "paths": ["/images/1.png", "/images/2.png"],
    "vclGroup": {"id":3},
    "domains": [
        {"fqdn": "example.eu"},
        {"fqdn": "example.info"}
    ]
}
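To get a feeling for the number of HTTP requests such a request produces on each agent, the combinations of domain and path can be enumerated. The sketch assumes one request per domain and path, with the domain used as the Host header. The function name is illustrative, and the order in which an agent actually sends the requests is not specified.

```python
from itertools import product


def planned_requests(method, fqdns, paths):
    """List one (method, Host header, path) triple per domain and path."""
    return [(method, fqdn, path) for fqdn, path in product(fqdns, paths)]


requests_per_agent = planned_requests(
    "PURGE",
    ["example.eu", "example.info"],
    ["/images/1.png", "/images/2.png"],
)
```

Under this assumption, two domains and two paths result in four requests on every agent that has the VCLGroup deployed.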

Banning

With invalidations it is also possible to set up banning over HTTP. First the VCL needs to be set up to allow banning over HTTP; see this tutorial for more information. The reason why vcl_backend_response is set in the VCL is explained here.

The difference between vcli command ban and vcli inv is that vcli command ban runs the ban command through varnishadm on the Varnish instance. The vcli command is only available to system administrators, not to organization accounts. The vcli inv command runs HTTP requests from the agent to the Varnish instance, and organization accounts are allowed to use it. With vcli inv the Varnish Controller applies basic ACL checks so that invalidations are only sent to the resources the organization account has access to.

An example ban request if you are using the example code from the tutorial above is:

vcli inv new -m PURGE -p / -t name=example -H 'x-invalidate-pattern=^/example/[0-9]+\.html' -d fqdn=example.org

The actual ban pattern is sent in a header that is read in the VCL. The path is set to / as at least 1 path is needed for the HTTP request to be run on the agent. For each path an HTTP request is sent from the agent to Varnish. The method is set to PURGE, but you can change it to BAN or something else as long as it corresponds with the method used in the VCL.
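Before sending a ban it can be useful to check which URLs the pattern covers. The sketch below tests the pattern from the example command with Python's re module. It assumes the VCL applies the header value as a regular expression against the URL path. Varnish uses its own regex engine, so treat this as a quick sanity check for simple patterns only.

```python
import re

# Pattern from the example command above.
BAN_PATTERN = re.compile(r"^/example/[0-9]+\.html")


def would_be_banned(url_path: str) -> bool:
    """True if the pattern matches the given URL path."""
    return BAN_PATTERN.search(url_path) is not None
```

The pattern has no end anchor, so a URL such as /example/42.html?page=2 matches as well.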

Caveats

Invalidation VCL Configuration & ACL in Varnish

The Varnish Controller will not add configuration to your VCL to handle purging and banning requests; the user is responsible for setting this up according to their needs and use case. The Varnish Controller is also not able to perform ACL checks in Varnish for purging and banning; these have to be configured by the user in the VCL as well. The Varnish Controller can only perform basic ACL checks on the logged-in user of the UI, CLI or API, and makes sure the invalidation request is only sent to the resources the user has access to.

Untagging agents & monitoring

If an agent is untagged, the previously deployed VCLGroups are kept. An invalidation request that monitors agents may then stay in the monitoring state until the execution TTL is reached. In this case the invalidation fails because it took too long to invalidate and the Varnish instance still holds old caches of the previously deployed VCLGroup, meaning the invalidation was not 100% successful.

Removing domains

When removing a domain from a VCLGroup, the Varnish Controller does not know that the Varnish instance previously served that domain. Therefore an invalidation request sent for the removed domain will not invalidate that Varnish instance. The Varnish Controller only keeps track of the history based on VCLGroup ID.