Versions above 7.2.1 can expose monitoring data for your cluster, along with health checks for each component (Brainz, API-GW, Agent and Router). Each component has its own set of checks that can be included in a health check request. To use the monitoring functionality, the Management Interface has to be enabled per component.
Each component has a few new configuration options, the same ones that the Traffic Router
already had. To enable the Management Interface for a component, set the -enable-mgmt
configuration value. To change the host and port that the Management Interface listens on,
set the -mgmt-host and -mgmt-port values.
| Flag | Config file | Environment | Default |
|---|---|---|---|
| -enable-mgmt | enable-mgmt | VARNISH_CONTROLLER_ENABLE_MGMT | false |
| -mgmt-host | mgmt-host | VARNISH_CONTROLLER_MGMT_HOST | localhost |
| -mgmt-port | mgmt-port | VARNISH_CONTROLLER_MGMT_PORT | 8092 |
When using TLS for the Management Interface, enable the -mgmt-tls configuration value
and set -key and -cert. These values must be paths to the private key and
certificate you want to use for the Management Interface. The key and certificate are loaded before the
process drops privileges to the configured user. Since the Management Interface is mainly used for local
traffic, TLS is not necessary as long as the interface is not publicly reachable from the internet.
| Flag | Config file | Environment | Default |
|---|---|---|---|
| -mgmt-tls | mgmt-tls | VARNISH_CONTROLLER_MGMT_TLS | false |
| -cert | cert | VARNISH_CONTROLLER_CERT | |
| -key | key | VARNISH_CONTROLLER_KEY | |
The Management Interface exposes a health check endpoint.
By default, health checks are only reachable from localhost. To make them reachable from other hosts,
set the -mgmt-host configuration value to the appropriate address.
A health check is performed by sending an HTTP GET request to the /healthy endpoint. Without
any requested checks, this endpoint returns 200 OK whenever the binary is running. Extra checks
can be requested by adding a query parameter called check. Multiple checks can be requested
by comma separating them within that same parameter. The response body lists the
performed checks and their status (true or false).
Examples:
# Always returns 200 OK if the binary is running.
GET http://<MGMT_HOST>:<MGMT_PORT>/healthy
# Only returns 200 OK if NATS is connected.
GET http://<MGMT_HOST>:<MGMT_PORT>/healthy?check=nats
# Only returns 200 OK if NATS and the database are connected.
GET http://<MGMT_HOST>:<MGMT_PORT>/healthy?check=nats,db
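The requests above can also be issued from a script. The following is a minimal sketch using only the Python standard library; the function names `healthy_url` and `is_healthy` are hypothetical helpers, not part of the product, and it assumes the endpoint is reachable over plain HTTP unless another scheme is passed.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlencode
from urllib.request import urlopen


def healthy_url(host, port, checks=(), domain=None, scheme="http"):
    """Build the /healthy URL, joining the requested checks with commas."""
    params = {}
    if checks:
        params["check"] = ",".join(checks)
    if domain is not None:
        # Only used together with the Router's specific-endpoint check.
        params["domain"] = domain
    # safe="," keeps the comma separator literal instead of encoding it as %2C.
    query = urlencode(params, safe=",")
    base = f"{scheme}://{host}:{port}/healthy"
    return f"{base}?{query}" if query else base


def is_healthy(url, timeout=2.0):
    """Return True only when the endpoint answers 200 OK."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (HTTPError, URLError, OSError):
        # Non-2xx answers, refused connections and timeouts all count as unhealthy.
        return False
```

For example, `is_healthy(healthy_url("localhost", 8092, ["nats", "db"]))` would request `/healthy?check=nats,db` on the default host and port.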
The checks available per component are listed below. Pass them, comma separated, in the check query parameter.
If all requested checks succeed, or no checks are requested at all, 200 OK is returned.
Brainz
- nats - Checks if NATS is connected
- db - Checks if the database is connected

API-GW
- nats - Checks if NATS is connected

Agent
- nats - Checks if NATS is connected
- varnish - Checks if VarnishADM is connected

Router
- nats - Checks if NATS is connected
- any-endpoint - Checks if any endpoint is healthy
- specific-endpoint - Checks if a specific endpoint is healthy, specified by the domain query parameter: ?check=nats,specific-endpoint&domain=test.se

Kubernetes uses liveness and readiness probes to determine the health of a container:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: my-service:latest
          ports:
            - containerPort: <MGMT_PORT>
          readinessProbe:
            httpGet:
              path: /healthy?check=nats
              port: <MGMT_PORT>
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /healthy
              port: <MGMT_PORT>
            initialDelaySeconds: 10
            periodSeconds: 15
            timeoutSeconds: 2
            failureThreshold: 3
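To make the effect of failureThreshold in the readiness probe concrete, here is a small simulation. It is only a sketch of the general Kubernetes behaviour (a probe is considered failed after that many consecutive failures, and a single success resets the count when successThreshold is left at its default of 1); the function name `readiness_states` is hypothetical and the probe results are made up.

```python
def readiness_states(probe_results, failure_threshold=3):
    """Return the ready/unready state after each probe result.

    probe_results is a sequence of booleans, True meaning the /healthy
    request returned 200 OK within the timeout.
    """
    consecutive_failures = 0
    ready = True
    states = []
    for ok in probe_results:
        if ok:
            # One success is enough to be considered ready again.
            consecutive_failures = 0
            ready = True
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                ready = False
        states.append(ready)
    return states
```

With the values above (periodSeconds: 10, failureThreshold: 3), a component whose NATS check keeps failing would be marked unready after roughly three probe periods.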
Docker Compose supports container-level health checks using the healthcheck field in docker-compose.yml:
version: "3.9"
services:
  my-service:
    image: my-service:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:<MGMT_PORT>/healthy"]
      interval: 10s
      timeout: 2s
      retries: 3
      start_period: 5s
Find out how to set up Prometheus and Datadog scraping here.
Example of how to graph the MBPS per agent, joining in the agent name for identification:
sum by(agent_id) (varnish_controller_agent_mbps)
* on(agent_id) group_left(name, version, varnish_version)
max by(agent_id, name, version, varnish_version) (
varnish_controller_agent_info{}
)
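The same expression can be evaluated outside of a dashboard through the Prometheus HTTP API instant query endpoint (`/api/v1/query`). The sketch below assumes a Prometheus server that already scrapes the components; the base URL passed in and the helper names (`instant_query_url`, `parse_vector`, `mbps_per_agent`) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# The expression from the example above, written on one line.
MBPS_PER_AGENT = (
    "sum by(agent_id) (varnish_controller_agent_mbps) "
    "* on(agent_id) group_left(name, version, varnish_version) "
    "max by(agent_id, name, version, varnish_version) "
    "(varnish_controller_agent_info{})"
)


def instant_query_url(base_url, expr):
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def parse_vector(payload):
    """Turn an instant-vector response into {agent name: value} pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    values = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        _timestamp, value = sample["value"]  # the value arrives as a string
        # Fall back to the agent ID when the name label is missing.
        values[labels.get("name", labels.get("agent_id", ""))] = float(value)
    return values


def mbps_per_agent(base_url, timeout=5.0):
    """Run the query against a Prometheus server and return MBPS per agent."""
    url = instant_query_url(base_url, MBPS_PER_AGENT)
    with urlopen(url, timeout=timeout) as response:
        return parse_vector(json.load(response))
```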
Here are some default dashboards for Grafana that can be imported when Prometheus is connected to Grafana.
varnish_controller_error_total
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: The total amount of errors registered in the logs.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

PromQL examples:
# Get the error count per brainz component
varnish_controller_error_total{brainz_id!=""}
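Because varnish_controller_error_total is a counter, its growth over a time window is usually more informative than its raw value. The sketch below illustrates, in simplified form, how an increase can be derived from raw samples while treating a drop in value as a counter reset (for instance after a component restart). It is an illustration of counter semantics only, not how Prometheus or the components implement it; the function name and the sample values are hypothetical.

```python
def counter_increase(samples):
    """Sum the growth of a counter across consecutive samples, tolerating resets."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter dropped, so it restarted from zero and
            # everything counted since the restart is new growth.
            total += current
    return total


# Errors logged across four scrapes, with a restart before the third scrape.
print(counter_increase([10, 12, 3, 4]))
```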
varnish_controller_go_cpus
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of CPUs.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_gc_duration_seconds
Type: Summary
Exposed by: Brainz, API-GW, Agent and Router
Description: A summary of the pause duration of garbage collection cycles.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_alloc_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of bytes allocated and still in use.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_alloc_bytes_total
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: Total number of bytes allocated, even if freed.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_buck_hash_sys_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of bytes used by the profiling bucket hash table.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_forced_gc_total
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of completed forced GC cycles. A forced GC cycle is triggered from the application.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_frees_total
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: Total number of frees.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_gc_sys_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of bytes used for garbage collection system metadata.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_gc_total
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of completed GC cycles.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_heap_alloc_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of heap bytes allocated and still in use.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_heap_idle_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of heap bytes waiting to be used.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_heap_inuse_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of heap bytes that are in use.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_heap_objects
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of allocated objects.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_heap_released_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of heap bytes released to OS.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_heap_sys_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of heap bytes obtained from system.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_last_gc_time_unix_nano
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of nanoseconds since 1970 of last garbage collection.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_lookups_total
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: Total number of pointer lookups.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_mallocs_total
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: Total number of mallocs.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_mcache_inuse_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of bytes in use by mcache structures.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_mcache_sys_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of bytes used for mcache structures obtained from system.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_mspan_inuse_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of bytes in use by mspan structures.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_mspan_sys_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of bytes used for mspan structures obtained from system.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_next_gc_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of heap bytes when next garbage collection will take place.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_other_sys_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of bytes used for other system allocations.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_stack_inuse_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of bytes in use by the stack allocator.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_stack_sys_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of bytes obtained from system for stack allocator.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_memstats_sys_bytes
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of bytes obtained from system.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_routines
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: The total number of goroutines running in this component.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_go_threads
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Number of OS threads created.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_hb_time_unix
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: The UNIX UTC timestamp in seconds of when the last heartbeat came in.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

PromQL examples:
# Get the number of seconds since the last heartbeat came in, per component
(time() - varnish_controller_hb_time_unix)
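The heartbeat age computed by this expression can be used to spot components that have gone quiet. The following sketch shows the same arithmetic on already-scraped values; the helper names, the component IDs and the staleness threshold are all hypothetical, and a suitable threshold depends on how often heartbeats are expected in your deployment.

```python
import time


def heartbeat_age(hb_time_unix, now=None):
    """Seconds elapsed since the last heartbeat, mirroring time() - hb_time_unix."""
    if now is None:
        now = time.time()
    return now - hb_time_unix


def stale_components(heartbeats, max_age_seconds, now=None):
    """Return the sorted IDs whose last heartbeat is older than max_age_seconds.

    heartbeats maps a component ID to its varnish_controller_hb_time_unix value.
    """
    return sorted(
        component_id
        for component_id, hb_time_unix in heartbeats.items()
        if heartbeat_age(hb_time_unix, now) > max_age_seconds
    )
```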
varnish_controller_start_time_unix
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: The UNIX UTC timestamp in seconds of when the component started.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

PromQL examples:
# Get the number of hours each process has been running
(time() - varnish_controller_start_time_unix) / 3600
# Get the uptime for all Brainz components and add the hostname as a label (host)
(time() - varnish_controller_start_time_unix) / 3600 * on(brainz_id) group_left(host) varnish_controller_brainz_info
# Get the uptime for all Agent components and add the hostname as a label (host)
(time() - varnish_controller_start_time_unix) / 3600 * on(agent_id) group_left(host) varnish_controller_agent_info
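The `* on(...) group_left(host)` pattern in the last two queries attaches a label from the info metric to each uptime series. The sketch below emulates that join on plain dictionaries to show what the result looks like. It assumes the info metric has the value 1 for every series (the usual convention for info-style metrics, but not confirmed here), so the multiplication leaves the uptime unchanged; the helper names and sample data are hypothetical.

```python
def uptime_hours(start_times, now):
    """Mirror (time() - varnish_controller_start_time_unix) / 3600 per component."""
    return {cid: (now - started) / 3600 for cid, started in start_times.items()}


def join_info_label(values, info, label):
    """Emulate `values * on(id) group_left(label) info` for an info metric of value 1.

    values maps a component ID to a number, info maps a component ID to its
    info labels. Series without a matching info entry are dropped, as a
    PromQL vector match would drop them.
    """
    joined = {}
    for component_id, value in values.items():
        labels = info.get(component_id)
        if labels is None:
            continue
        joined[(component_id, labels.get(label, ""))] = value
    return joined


uptimes = uptime_hours({"1": 0, "2": 3600}, now=7200)
print(join_info_label(uptimes, {"1": {"host": "cache-01"}}, "host"))
```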
varnish_controller_warning_total
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: The total amount of warnings registered in the logs.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

PromQL examples:
# Get the warning count per brainz component
varnish_controller_warning_total{brainz_id!=""}
varnish_controller_nats_connected
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: Indicates if NATS is connected. (1) connected, (0) disconnected
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_nats_incoming_bytes
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: The total number of bytes this component has received over NATS.
Labels:
- agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
- api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
- brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
- router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.

varnish_controller_nats_incoming_messages
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: The total amount of incoming NATS messages received by this component.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.
varnish_controller_nats_outgoing_bytes
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: The total amount of outgoing NATS bytes sent by this component.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.
varnish_controller_nats_outgoing_messages
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: The total amount of outgoing NATS messages sent by this component.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.
varnish_controller_nats_payload_max
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: The maximum payload of a single NATS message.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.
varnish_controller_nats_reconnects_total
Type: Counter
Exposed by: Brainz, API-GW, Agent and Router
Description: The total amount of NATS reconnects that have occurred.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.
varnish_controller_nats_servers_total
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: The total amount of NATS servers that are known to this component.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.
varnish_controller_nats_subscriptions_total
Type: Gauge
Exposed by: Brainz, API-GW, Agent and Router
Description: The total amount of NATS subscriptions that this component has.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
router_id: The ID of the Router component that has exposed the metric, used for grouping extra labels from the varnish_controller_router_info metric.
varnish_controller_database_connected
Type: Gauge
Exposed by: Brainz
Description: Indicates if the database is connected. (1) connected, (0) disconnected
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
varnish_controller_database_connections_exhausted_total
Type: Counter
Exposed by: Brainz
Description: The total count of failures trying to acquire a database connection.
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
varnish_controller_database_connections_max
Type: Gauge
Exposed by: Brainz
Description: The configured maximum database connections, if not defined the default is the amount of CPUs with a minimum of 4.
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
varnish_controller_database_current_connections_total
Type: Gauge
Exposed by: Brainz
Description: The current amount of active database connections.
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
varnish_controller_database_idle_connections_total
Type: Gauge
Exposed by: Brainz
Description: The current amount of idle database connections.
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
varnish_controller_db_table_total
Type: Gauge
Exposed by: Brainz
Description: The total number of rows in a database table. The word table in the metric name is a placeholder for the database table you wish to track. For example, for the agents table the metric name is varnish_controller_db_agents_total, which shows the total amount of agents in the system.
Labels: None
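Since the table name is part of the metric name, a regular expression on the built-in __name__ label can list every table metric at once. This is a sketch; it assumes that no unrelated metric matches the pattern.

```promql
# All per-table row counts exposed by Brainz
{__name__=~"varnish_controller_db_.+_total"}

# Row count of a single table, here agents
varnish_controller_db_agents_total
```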
varnish_controller_state_total{component=agent, state=down}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Agents that are in a down state.
varnish_controller_state_total{component=agent, state=failing}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Agents that are in a failing state.
varnish_controller_state_total{component=agent, state=locked}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Agents that are in a locked state.
varnish_controller_state_total{component=agent, state=readonly}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Agents that are in a readonly state.
varnish_controller_state_total{component=agent, state=running}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Agents that are in a running state.
varnish_controller_state_total{component=api-gw, state=down}
Type: Gauge
Exposed by: Brainz
Description: The total amount of API-GWs that are in a down state.
varnish_controller_state_total{component=api-gw, state=running}
Type: Gauge
Exposed by: Brainz
Description: The total amount of API-GWs that are in a running state.
varnish_controller_state_total{component=brainz, state=down}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Brainz that are in a down state.
varnish_controller_state_total{component=brainz, state=running}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Brainz that are in a running state.
varnish_controller_state_total{component=certificate, state=invalid}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Certificates in an invalid state.
varnish_controller_state_total{component=certificate, state=renewal_enabled}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Certificates that can be renewed by the system.
varnish_controller_state_total{component=certificate, state=renewal_error}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Certificates in a renewal error state.
varnish_controller_state_total{component=certificate, state=valid}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Certificates in a valid state.
varnish_controller_state_total{component=router, state=down}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Routers that are in a down state.
varnish_controller_state_total{component=router, state=failing}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Routers that are in a failing state.
varnish_controller_state_total{component=router, state=locked}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Routers that are in a locked state.
varnish_controller_state_total{component=router, state=running}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Routers that are in a running state.
varnish_controller_state_total{component=router, state=unlicensed}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Routers that are in an unlicensed state.
varnish_controller_state_total{component=routing_health, state=down}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Routing health checks that are in a down state.
varnish_controller_state_total{component=routing_health, state=running}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Routing health checks that are in a running state.
varnish_controller_state_total{component=varnish, state=died_restarting}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Varnish servers in a died, restarting state.
varnish_controller_state_total{component=varnish, state=running}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Varnish servers in a running state.
varnish_controller_state_total{component=varnish, state=starting}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Varnish servers in a starting state.
varnish_controller_state_total{component=varnish, state=stopped}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Varnish servers in a stopped state.
varnish_controller_state_total{component=varnish, state=stopping}
Type: Gauge
Exposed by: Brainz
Description: The total amount of Varnish servers in a stopping state.
varnish_controller_state_total{component=vcl_groups, state=deployed}
Type: Gauge
Exposed by: Brainz
Description: The total amount of VCLGroups that are in a deployed state.
varnish_controller_state_total{component=vcl_groups, state=undeployed}
Type: Gauge
Exposed by: Brainz
Description: The total amount of VCLGroups that are in an undeployed state.
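In an actual PromQL query the label values of varnish_controller_state_total must be quoted. The queries below sketch how the state gauges above might be used. The max aggregation assumes that every Brainz reports the same system-wide count, which is an assumption and not something stated in this reference.

```promql
# Overview of all states, one series per component and state
max by (component, state) (varnish_controller_state_total)

# Example for alerting: at least one Agent is in a down state
max(varnish_controller_state_total{component="agent", state="down"}) > 0
```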
varnish_controller_agent_down_events_total
Type: Counter
Exposed by: Brainz
Description: The total count of down events for agent.
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
PromQL examples:
# Example for alerting: If an agent has gone down more than 10 times over the last 5 minutes, then we alert.
expr: increase(varnish_controller_agent_down_events_total[5m]) > 10
# Example for alerting: Group per brainz and alert per brainz if there is an increase in agents going down over the last 5 minutes.
expr: sum by (brainz_id) (increase(varnish_controller_agent_down_events_total[5m])) > 0
varnish_controller_agent_failing_events_total
Type: Counter
Exposed by: Brainz
Description: The total count of failing events for agent.
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
varnish_controller_brainz_info
Type: Gauge
Exposed by: Brainz
Description: The information record. This can be used to join onto other data
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
db_uid: The UID stored in the database. If this UID changes, all agents and routers go into a locked state.
go_version: The Go version that the component was built with.
host: The hostname of the machine or container that the component runs on.
name: The name of the component.
revision: The revision of the component.
state: The state of the component (running or down).
version: The version of the component.
varnish_controller_brainz_state
Type: Gauge
Exposed by: Brainz
Description: The current state of the brainz component. (1) running, (3) down
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
varnish_controller_brainz_total
Type: Gauge
Exposed by: Brainz
Description: Total amount of brainz components in the system.
Labels: None
varnish_controller_certificate_renewal_retries_max
Type: Gauge
Exposed by: Brainz
Description: The maximum amount of retries the Controller will do for certificate renewals.
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
varnish_controller_router_down_events_total
Type: Counter
Exposed by: Brainz
Description: The total count of down events for router.
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
varnish_controller_router_failing_events_total
Type: Counter
Exposed by: Brainz
Description: The total count of failing events for router.
Labels:
brainz_id: The ID of the Brainz component that has exposed the metric, used for grouping extra labels from the varnish_controller_brainz_info metric.
varnish_controller_api_gw_info
Type: Gauge
Exposed by: API-GW
Description: The information record. This can be used to join onto other data
Labels:
api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
go_version: The Go version that the component was built with.
host: The hostname of the machine or container that the component runs on.
name: The name of the component.
revision: The revision of the component.
state: The state of the component (running or down).
version: The version of the component.
varnish_controller_api_gw_requests_average_seconds
Type: Gauge
Exposed by: API-GW
Description: The average amount of seconds for the endpoint in the API-GW.
Labels:
api_gw_endpoint: The endpoint in question. Example value: v1.brainz.vclgroup.all, this is the endpoint for all VCLGroups.
api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
varnish_controller_api_gw_requests_total
Type: Counter
Exposed by: API-GW
Description: The total amount of requests to the endpoint in the API-GW.
Labels:
api_gw_endpoint: The endpoint in question. Example value: v1.brainz.vclgroup.all, this is the endpoint for all VCLGroups.
api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
PromQL examples:
# Example to get a graph for the total amount of invalidation requests. 1 line for Basic Invalidations and 1 for regular invalidations that run through brainz. This is combined over all API-GWs known in the system.
sum(varnish_controller_api_gw_requests_total{api_gw_endpoint=~"v1.apigw.invalidation.basic|v1.brainz.invalidation.create"}) by (api_gw_endpoint)
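As varnish_controller_api_gw_requests_total is a counter, graphing its raw value shows an ever-growing line, and a request rate is usually easier to read. The queries below are a sketch; the 5 minute window and the cutoff of 5 endpoints are arbitrary choices.

```promql
# Requests per second per endpoint, combined over all API-GWs
sum by (api_gw_endpoint) (rate(varnish_controller_api_gw_requests_total[5m]))

# The 5 busiest endpoints over the last 5 minutes
topk(5, sum by (api_gw_endpoint) (rate(varnish_controller_api_gw_requests_total[5m])))
```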
varnish_controller_api_gw_state
Type: Gauge
Exposed by: API-GW
Description: The current state of the api-gw component. (1) running, (3) down
Labels:
api_gw_id: The ID of the API-GW component that has exposed the metric, used for grouping extra labels from the varnish_controller_api_gw_info metric.
varnish_controller_api_gw_total
Type: Gauge
Exposed by: API-GW
Description: Total amount of API-GWs in the system.
Labels: None
varnish_controller_agent_info
Type: Gauge
Exposed by: Agent
Description: The information record. This can be used to join onto other data
Labels:
access_level: The access level of the component (private or system). If the agent has a private token attached it is private.
agent_id: The ID of the Agent component that has exposed the metric, used for grouping to other metrics.
go_version: The Go version that the component was built with.
host: The hostname of the machine or container that the component runs on.
name: The name of the component.
private_token_id: The private token ID of the agent. In combination with the name label this makes the agent metric unique.
revision: The revision of the component.
state: The state of the component (running, failing, down, readonly or locked).
stop_routing: Boolean to indicate if stop routing is enabled. Also available as a metric.
varnish_version: The Varnish version that the agent is communicating with.
version: The version of the component.
varnish_controller_agent_mbps
Type: Gauge
Exposed by: Agent
Description: The amount of Mbit per second the agent is sending out as traffic.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping to other metrics.
varnish_controller_agent_root_vcl_reloads_total
Type: Counter
Exposed by: Agent
Description: The total amount of root VCL reloads.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping to other metrics.
varnish_controller_agent_state
Type: Gauge
Exposed by: Agent
Description: The current state of the agent component. (1) running, (2) failing, (3) down, (4) readonly, (5) locked
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping to other metrics.
varnish_controller_agent_stop_routing
Type: Gauge
Exposed by: Agent
Description: Indicates if the agent is in the stop routing mode. (1) stop routing, (0) receive routing traffic
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping to other metrics.
PromQL examples:
# Show the total amount of agents that are marked as Stop Routing.
sum(varnish_controller_agent_stop_routing)
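To see which agents are marked as Stop Routing rather than only how many, the gauge can be joined onto varnish_controller_agent_info via agent_id. This sketch assumes that the info metric has a single series with value 1 per agent_id.

```promql
# One series per agent in stop routing mode, with name and host copied from the info metric
(varnish_controller_agent_stop_routing == 1)
  * on (agent_id) group_left (name, host)
  varnish_controller_agent_info
```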
varnish_controller_agent_vcl_reloads_total
Type: Counter
Exposed by: Agent
Description: The total amount of VCL reloads.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping to other metrics.
varnish_controller_agents_stop_routing_total
Type: Gauge
Exposed by: Agent
Description: Total amount of agents in the system that have ‘stop routing’ enabled.
Labels: None
varnish_controller_router_dns_enabled
Type: Gauge
Exposed by: Router
Description: Indicates if the router is set up for DNS routing. (1) enabled, (0) disabled
Labels:
router_id: The ID of the Router component that has exposed the metric, used for grouping to other metrics.
varnish_controller_router_http_enabled
Type: Gauge
Exposed by: Router
Description: Indicates if the router is set up for HTTP routing. (1) enabled, (0) disabled
Labels:
router_id: The ID of the Router component that has exposed the metric, used for grouping to other metrics.
varnish_controller_router_https_enabled
Type: Gauge
Exposed by: Router
Description: Indicates if the router is set up for HTTPS routing. (1) enabled, (0) disabled
Labels:
router_id: The ID of the Router component that has exposed the metric, used for grouping to other metrics.
varnish_controller_router_info
Type: Gauge
Exposed by: Router
Description: The information record. This can be used to join onto other data
Labels:
access_level: The access level of the component (private or system). If the router has a private token attached it is private.
dns: Boolean to indicate if DNS routing is enabled.
go_version: The Go version that the component was built with.
host: The hostname of the machine or container that the component runs on.
http: Boolean to indicate if HTTP routing is enabled.
https: Boolean to indicate if HTTPS routing is enabled.
name: The name of the component.
private_token_id: The private token ID of the router. In combination with the name label this makes the router metric unique.
revision: The revision of the component.
router_id: The ID of the Router component that has exposed the metric, used for grouping to other metrics.
state: The state of the component (running, failing, down, unlicensed or locked).
version: The version of the component.
varnish_controller_router_routing_rule_reloads_total
Type: Counter
Exposed by: Router
Description: The total amount of routing rule reloads.
Labels:
router_id: The ID of the Router component that has exposed the metric, used for grouping to other metrics.
varnish_controller_router_state
Type: Gauge
Exposed by: Router
Description: The current state of the router component. (1) running, (2) failing, (3) down, (5) locked, (6) unlicensed
Labels:
router_id: The ID of the Router component that has exposed the metric, used for grouping to other metrics.
varnish_controller_domain_info
Type: Gauge
Exposed by: Domain
Description: The information record. This can be used to join onto other data
Labels:
domain_id: The ID of the domain that has exposed the metric, used for grouping to other metrics.
fqdn: The FQDN of the domain.
varnish_controller_domains_with_certificate_total
Type: Gauge
Exposed by: Domain
Description: Total amount of domains in the system that have a certificate.
Labels: None
varnish_controller_routing_health_grace_enabled
Type: Gauge
Exposed by: Routing health check
Description: Indicates if the grace period is enabled or disabled. (1) enabled, (0) disabled
Labels:
agent_id: The ID of the agent that is related to the metric, used for grouping to other metrics.
domain_id: The ID of the domain that is related to the metric, used for grouping to other metrics.
router_id: The ID of the router that is related to the metric, used for grouping to other metrics.
varnish_controller_routing_health_healthy_threshold
Type: Gauge
Exposed by: Routing health check
Description: The threshold of how many healthy health checks there need to be.
Labels:
agent_id: The ID of the agent that is related to the metric, used for grouping to other metrics.
domain_id: The ID of the domain that is related to the metric, used for grouping to other metrics.
router_id: The ID of the router that is related to the metric, used for grouping to other metrics.
varnish_controller_routing_health_healthy_total
Type: Gauge
Exposed by: Routing health check
Description: The total amount of healthy health checks.
Labels:
agent_id: The ID of the agent that is related to the metric, used for grouping to other metrics.
domain_id: The ID of the domain that is related to the metric, used for grouping to other metrics.
router_id: The ID of the router that is related to the metric, used for grouping to other metrics.
varnish_controller_routing_health_info
Type: Gauge
Exposed by: Routing health check
Description: The information record. This can be used to join onto other data
Labels:
agent_id: The ID of the agent that is related to the metric, used for grouping to other metrics.
domain_id: The ID of the domain that is related to the metric, used for grouping to other metrics.
expected_status: The configured expected response status code.
grace: Boolean (true or false) if the health check is in grace period.
method: The configured request method.
router_id: The ID of the router that is related to the metric, used for grouping to other metrics.
url: The configured request URL.
varnish_controller_routing_health_last_probe_seconds
Type: Gauge
Exposed by: Routing health check
Description: Last health probe duration of Varnish Traffic Routers in seconds.
Labels:
agent_id: The ID of the agent that is related to the metric, used for grouping to other metrics.
domain_id: The ID of the domain that is related to the metric, used for grouping to other metrics.
router_id: The ID of the router that is related to the metric, used for grouping to other metrics.
varnish_controller_routing_health_last_probe_unix
Type: Gauge
Exposed by: Routing health check
Description: The UNIX UTC timestamp in seconds of when the last probe was executed.
Labels:
agent_id: The ID of the agent that is related to the metric, used for grouping to other metrics.
domain_id: The ID of the domain that is related to the metric, used for grouping to other metrics.
router_id: The ID of the router that is related to the metric, used for grouping to other metrics.
varnish_controller_routing_health_state
Type: Gauge
Exposed by: Routing health check
Description: Status of health probes of Varnish Traffic Routers. (1) running, (3) down
Labels:
agent_id: The ID of the agent that is related to the metric, used for grouping to other metrics.
domain_id: The ID of the domain that is related to the metric, used for grouping to other metrics.
router_id: The ID of the router that is related to the metric, used for grouping to other metrics.
varnish_controller_routing_health_window_total
Type: Gauge
Exposed by: Routing health check
Description: The total amount of health checks being kept.
Labels:
agent_id: The ID of the agent that is related to the metric, used for grouping to other metrics.
domain_id: The ID of the domain that is related to the metric, used for grouping to other metrics.
router_id: The ID of the router that is related to the metric, used for grouping to other metrics.
varnish_controller_grpc_health_connected
Type: Gauge
Exposed by: GRPC Plugin
Description: Status of GRPC health probes of Varnish Traffic Routers. (1) connected, (0) disconnected
Labels:
grpc_id: The ID of the GRPC that is related to the metric, used for grouping to other metrics.
router_id:
varnish_controller_grpc_health_has_connection_error
Type: Gauge
Exposed by: GRPC Plugin
Description: Determine if the GRPC health probe has a connection error. (1) error, (0) no error
Labels:
grpc_id: The ID of the GRPC that is related to the metric, used for grouping to other metrics.
router_id:
varnish_controller_grpc_health_has_request_error
Type: Gauge
Exposed by: GRPC Plugin
Description: Determine if the GRPC health probe has a request error. (1) error, (0) no error
Labels:
grpc_id: The ID of the GRPC that is related to the metric, used for grouping to other metrics.
router_id:
varnish_controller_grpc_health_last_connection_error_unix
Type: Gauge
Exposed by: GRPC Plugin
Description: The UNIX timestamp of when the GRPC health probe had the last connection error.
Labels:
grpc_id: The ID of the GRPC that is related to the metric, used for grouping to other metrics.
router_id:
varnish_controller_grpc_health_last_request_error_unix
Type: Gauge
Exposed by: GRPC Plugin
Description: The UNIX timestamp of when the GRPC health probe had the last request error.
Labels:
grpc_id: The ID of the GRPC that is related to the metric, used for grouping to other metrics.
router_id:
varnish_controller_grpc_info
Type: Gauge
Exposed by: GRPC Plugin
Description: The information record. This can be used to join onto other data
Labels:
grpc_id: The ID of the GRPC that is related to the metric, used for grouping to other metrics.
name: The name of the GRPC plugin.
url: The configured GRPC URL.
verify_tls: Boolean to indicate if the GRPC plugin verifies TLS (true or false).
varnish_controller_vcl_group_agent_state_deployed
Type: Gauge
Exposed by: VCLGroup
Description: A boolean indicating if the VCLGroup is deployed to this specific agent. (1) deployed, (0) undeployed
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
vcl_group_id: The ID of the VCLGroup that is related to the metric, used for grouping to other metrics.
varnish_controller_vcl_group_agent_state_last_deployed_at_unix
Type: Gauge
Exposed by: VCLGroup
Description: The timestamp of when the last deployment took place as a UNIX timestamp.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
vcl_group_id: The ID of the VCLGroup that is related to the metric, used for grouping to other metrics.
varnish_controller_vcl_group_agent_state_last_failed_at_unix
Type: Gauge
Exposed by: VCLGroup
Description: The timestamp of when the deployment last failed as a UNIX timestamp.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
vcl_group_id: The ID of the VCLGroup that is related to the metric, used for grouping to other metrics.
varnish_controller_vcl_group_agent_state_last_retried_at_unix
Type: Gauge
Exposed by: VCLGroup
Description: The timestamp of when the deployment was last retried as a UNIX timestamp.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
vcl_group_id: The ID of the VCLGroup that is related to the metric, used for grouping to other metrics.
varnish_controller_vcl_group_agent_state_retries_total
Type: Gauge
Exposed by: VCLGroup
Description: The amount of deployment retries per agent and VCLGroup.
Labels:
agent_id: The ID of the Agent component that has exposed the metric, used for grouping extra labels from the varnish_controller_agent_info metric.
vcl_group_id: The ID of the VCLGroup that is related to the metric, used for grouping to other metrics.
varnish_controller_vcl_group_deployed
Type: Gauge
Exposed by: VCLGroup
Description: A boolean indicating if the VCLGroup is deployed. (1) deployed, (0) undeployed
Labels:
vcl_group_id: The ID of the VCLGroup that is related to the metric, used for grouping to other metrics.
varnish_controller_vcl_group_deployed_agents_total
Type: Gauge
Exposed by: VCLGroup
Description: The total amount of agents this VCLGroup has been deployed to.
Labels:
vcl_group_id: The ID of the VCLGroup that is related to the metric, used for grouping to other metrics.
varnish_controller_vcl_group_deployment_errors_total
Type: Gauge
Exposed by: VCLGroup
Description: The total amount of agents in this VCLGroup that have a deployment error.
Labels:
vcl_group_id: The ID of the VCLGroup that is related to the metric, used for grouping to other metrics.
varnish_controller_vcl_group_info
Type: Gauge
Exposed by: VCLGroup
Description: The information record. This can be used to join onto other data
Labels:
deployed: Boolean to indicate if the VCLGroup is deployed (true or false).
name: The name of the VCLGroup.
root: Boolean to indicate if the VCLGroup is a root deployment (true or false).
track_latest: Boolean to indicate if the VCLGroup is tracking the latest changes in the VCLs automatically (true or false).
vcl_group_id: The ID of the VCLGroup that is related to the metric, used for grouping to other metrics.
varnish_controller_vcl_group_matched_agents_total
Type: Gauge
Exposed by: VCLGroup
Description: The total amount of agents this VCLGroup should be deployed to.
Labels:
vcl_group_id: The ID of the VCLGroup that is related to the metric, used for grouping to other metrics.
varnish_controller_certificate_info
Type: Gauge
Exposed by: Certificates
Description: The information record. This can be used to join onto other data
Labels:
certificate_id: The ID of the Certificate that is related to the metric, used for grouping to other metrics.
name: The name of the Certificate.
type: The type of the Certificate.
varnish_controller_certificate_not_after_unix
Type: Gauge
Exposed by: Certificates
Description: The UNIX timestamp after which the certificate must no longer be used.
Labels:
certificate_id: The ID of the Certificate that is related to the metric, used for grouping to other metrics.
varnish_controller_certificate_not_before_unix
Type: Gauge
Exposed by: Certificates
Description: The UNIX timestamp before which the certificate must not be used.
Labels:
certificate_id: The ID of the Certificate that is related to the metric, used for grouping to other metrics.
varnish_controller_certificate_remaining_days
Type: Gauge
Exposed by: Certificates
Description: The total remaining days the certificate is valid for.
Labels:
certificate_id: The ID of the Certificate that is related to the metric, used for grouping to other metrics.
varnish_controller_certificate_renewal_enabled
Type: Gauge
Exposed by: Certificates
Description: Indicates if the certificate is automatically being renewed by the Varnish Controller. (1) enabled, (0) disabled
Labels:
certificate_id: The ID of the Certificate that is related to the metric, used for grouping to other metrics.
varnish_controller_certificate_renewal_error
Type: Gauge
Exposed by: Certificates
Description: Only exposed when it is a renewable certificate. Indicates if the certificate has a renewal error. (1) error, (0) no error
Labels:
certificate_id: The ID of the Certificate that is related to the metric, used for grouping to other metrics.
varnish_controller_certificate_renewal_retry_count
Type: Gauge
Exposed by: Certificates
Description: Only exposed when it is a renewable certificate. The total number of times renewal of the certificate has been retried.
Labels:
certificate_id: The ID of the Certificate that is related to the metric, used for grouping to other metrics.
varnish_controller_certificate_renewed_at_unix
Type: Gauge
Exposed by: Certificates
Description: Only exposed when it is a renewable certificate. The UNIX timestamp of when the certificate was last renewed.
Labels:
certificate_id: The ID of the Certificate that is related to the metric, used for grouping to other metrics.
varnish_controller_certificate_valid
Type: Gauge
Exposed by: Certificates
Description: Indicates if the certificate is valid. (1) valid, (0) invalid
Labels:
certificate_id: The ID of the Certificate that is related to the metric, used for grouping to other metrics.
varnish_controller_certificates_acme_db_total
Type: Gauge
Exposed by: Certificates
Description: Total amount of ACME-DB certificates in the system.
Labels: None
varnish_controller_certificates_database_total
Type: Gauge
Exposed by: Certificates
Description: Total amount of Database certificates in the system.
Labels: None
varnish_controller_certificates_disk_total
Type: Gauge
Exposed by: Certificates
Description: Total amount of Disk certificates in the system.
Labels: None
varnish_controller_certificates_renew_enabled_total
Type: Gauge
Exposed by: Certificates
Description: Total amount of certificates that are able to be renewed by the system. This is not the amount of certificates up for renewal, but rather those that are automatically renewed by the system.
Labels: None
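The certificate info record carries the human-readable name, while the other per-certificate metrics only carry certificate_id. The following is a minimal Python sketch of joining the two on certificate_id to list certificates that are close to expiry. It assumes the metrics are served in the standard Prometheus text format and that the info record has the value 1. The sample scrape output, its label values, and the function names parse_samples and expiring_certificates are hypothetical and only serve the illustration.

```python
import re

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse exposition text into (name, labels, value) tuples.

    Simplified on purpose: comment lines are skipped, timestamps are
    ignored, and escape sequences in label values are left untouched.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def expiring_certificates(text, max_days):
    """Return (name, remaining_days) for certificates at or below max_days."""
    samples = parse_samples(text)
    # Info record: map certificate_id to the value of the name label.
    names = {
        labels["certificate_id"]: labels.get("name", "")
        for metric, labels, _ in samples
        if metric == "varnish_controller_certificate_info"
        and "certificate_id" in labels
    }
    result = []
    for metric, labels, value in samples:
        if metric != "varnish_controller_certificate_remaining_days":
            continue
        cert_id = labels.get("certificate_id")
        if value <= max_days:
            # Fall back to the ID when no info record was found.
            result.append((names.get(cert_id, cert_id), value))
    return sorted(result, key=lambda item: item[1])


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# TYPE varnish_controller_certificate_info gauge
varnish_controller_certificate_info{certificate_id="1",name="example.com",type="acme"} 1
varnish_controller_certificate_info{certificate_id="2",name="internal",type="disk"} 1
varnish_controller_certificate_remaining_days{certificate_id="1"} 12
varnish_controller_certificate_remaining_days{certificate_id="2"} 200
"""

print(expiring_certificates(SAMPLE, 30))
```

With the hypothetical sample above, only the certificate named example.com falls within the 30-day threshold. The same join pattern should apply to the VCLGroup info record, using vcl_group_id as the key.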
varnish_controller_license_remaining_days
Type: Gauge
Exposed by: Licenses
Description: The total remaining days that the license is still valid.
Labels:
component: The component that the license applies to. brainz or varnish.
varnish_controller_license_valid
Type: Gauge
Exposed by: Licenses
Description: A boolean indicating if the license is still valid. (1) valid, (0) invalid
Labels:
component: The component that the license applies to. brainz or varnish.
varnish_controller_licenses_total
Type: Gauge
Exposed by: Licenses
Description: Total amount of licenses in the system.
Labels: None
varnish_controller_data_collection_duration_seconds
Type: Gauge
Exposed by: Data collector
Description: The amount of seconds it took to collect all data for the Prometheus scrape.
Labels: None
varnish_controller_data_collection_successful
Type: Gauge
Exposed by: Data collector
Description: The data collection was successful. (1) successful, (0) unsuccessful
Labels: None
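The two data collector metrics can act as a guard before the rest of a scrape is trusted. A minimal Python sketch follows, assuming the metrics arrive in the Prometheus text format. The function names, the sample text, and the 10 second limit are hypothetical and not values taken from the Varnish Controller.

```python
def read_unlabelled(text):
    """Collect metrics without labels into a name -> value dict."""
    values = {}
    for line in text.splitlines():
        parts = line.strip().split()
        # Skip blanks, comments, and samples that carry labels.
        if len(parts) < 2 or parts[0].startswith("#") or "{" in parts[0]:
            continue
        try:
            values[parts[0]] = float(parts[1])
        except ValueError:
            continue  # Not a numeric sample; ignore it.
    return values


def scrape_is_trustworthy(text, max_seconds=10.0):
    """True when collection succeeded and finished within max_seconds."""
    values = read_unlabelled(text)
    ok = values.get("varnish_controller_data_collection_successful") == 1.0
    duration = values.get("varnish_controller_data_collection_duration_seconds")
    return ok and duration is not None and duration <= max_seconds


# Hypothetical scrape output, for illustration only.
GOOD = """\
varnish_controller_data_collection_duration_seconds 0.42
varnish_controller_data_collection_successful 1
"""
FAILED = """\
varnish_controller_data_collection_duration_seconds 0.42
varnish_controller_data_collection_successful 0
"""

print(scrape_is_trustworthy(GOOD), scrape_is_trustworthy(FAILED))
```

Treating a missing duration as untrustworthy is a design choice of this sketch: a scrape that lacks the collector metrics is handled the same way as a failed collection.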
varnish_controller_sessions_api_total
Type: Gauge
Exposed by: Sessions
Description: Total amount of API sessions in the system
Labels: None
varnish_controller_sessions_ui_total
Type: Gauge
Exposed by: Sessions
Description: Total amount of UI sessions in the system
Labels: None
varnish_controller_sessions_user_total
Type: Gauge
Exposed by: Sessions
Description: Total amount of user sessions in the system
Labels: None
varnish_controller_db_tags_total
Type: Gauge
Exposed by: Tags
Description: Total amount of tags in the system.
Labels: None
varnish_controller_tags_static_total
Type: Gauge
Exposed by: Tags
Description: Total amount of static tags in the system
Labels: None
varnish_controller_db_orgs_total
Type: Gauge
Exposed by: Organizations
Description: Total amount of organizations in the system.
Labels: None
varnish_controller_organizations_locked_total
Type: Gauge
Exposed by: Organizations
Description: Total amount of locked organizations in the system
Labels: None
varnish_controller_accounts_locked_total
Type: Gauge
Exposed by: Accounts
Description: Total amount of locked accounts in the system
Labels: None
varnish_controller_db_accounts_total
Type: Gauge
Exposed by: Accounts
Description: Total amount of accounts in the system.
Labels: None