
Setting up Varnish Enterprise autoscaling

Introduction

Autoscaling is useful in a scenario where Varnish Enterprise instances need to be scaled up or down depending on usage, whether to optimize for resource usage or for cost. Varnish Enterprise Helm Chart fully supports autoscaling using both resource-based metrics and custom metrics through Prometheus.

Autoscaling using resource-based metrics

To use autoscaling with resource-based metrics, make sure that Kubernetes’ metrics-server is installed and configured in the cluster. To verify that metrics-server is already installed, try the following command:

kubectl top node

The node statistics should be displayed if metrics-server is already installed. If not, please consult the Kubernetes provider’s documentation on how to enable metrics-server.
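
On a self-managed cluster, metrics-server can typically be installed from the upstream manifest. A minimal sketch (the exact installation method depends on the cluster and provider):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml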

Using CPU metrics

Autoscaling with CPU metrics requires that all containers have CPU requests set:

---
server:
  resources:
    requests:
      cpu: 4
      memory: 12Gi

  varnishncsa:
    resources:
      requests:
        cpu: 500m
        memory: 512Mi

  agent:
    resources:
      requests:
        cpu: 500m
        memory: 512Mi

As resource requirements differ from setup to setup, it is recommended to measure actual usage and set these values based on normal traffic plus some headroom for an occasional spike. In a Kubernetes setup with metrics-server installed, this can be measured by running the following command:

kubectl top pod --containers
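
The output lists CPU and memory usage per container. Illustrative output only; Pod and container names depend on the Helm release and chart configuration:

POD                    NAME          CPU(cores)   MEMORY(bytes)
varnish-enterprise-0   varnish       2104m        9861Mi
varnish-enterprise-0   varnishncsa   98m          52Mi
varnish-enterprise-0   agent         21m          48Mi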

If Prometheus metrics are set up in the cluster, the CPU metrics can also be queried from Prometheus:

max by (container) (rate(container_cpu_usage_seconds_total{namespace="varnish", container!=""}[5m]))

Note: it is highly recommended to set only CPU requests and not CPU limits, as CPU limits are enforced by time slicing and may result in severe throttling during a peak period, e.g., the scheduler will pause the Varnish Enterprise process once it has used its allotted CPU time until the next scheduling interval, resulting in higher latency.

Once resource requests are set up, autoscaling can be configured in Varnish Enterprise Helm Chart using the following values:

---
server:
  autoscaling:
    enabled: true
    minReplicas: 1
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 80

In this case, Kubernetes will create a new Varnish Enterprise Pod once the average CPU utilization across all replicas exceeds 80% of the requested CPU. The value to use for averageUtilization depends on the desired number of replicas during peak time, and the CPU usage delta between normal traffic and peak traffic.
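
As a rule of thumb, the HorizontalPodAutoscaler scales to roughly ceil(currentReplicas × currentAverageUtilization / targetAverageUtilization). For example, if 2 replicas are running at an average of 120% of their CPU request and the target is 80%, the autoscaler will scale out to ceil(2 × 120 / 80) = 3 replicas.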

Using memory metrics

Using memory metrics for autoscaling Varnish Enterprise is not recommended. Varnish Enterprise will try to use as much memory as it can for caching content, so memory usage is not a reliable scaling signal. Scaling out based on memory metrics will also result in a lower cache hit rate, as a newly created Varnish Enterprise Pod comes up with an empty cache.

Autoscaling using Prometheus metrics

To use Prometheus metrics for autoscaling, prometheus-adapter is needed to expose metrics from Prometheus to Kubernetes. Refer to the installation section in the prometheus-adapter repository for more information.

This guide assumes that the prometheus-community/prometheus-adapter Helm Chart is used.
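
A minimal installation sketch, assuming the adapter is installed into a monitoring namespace under the release name prometheus-adapter (both are arbitrary choices), with a values.yaml that points the adapter at the Prometheus server and contains the custom rules described below:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --values values.yaml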

Using bandwidth via container metrics

To use container metrics for autoscaling, a custom rule must be configured in prometheus-adapter. For example, to expose outgoing bandwidth per second (Tx) as a metric for autoscaling, configure custom rules as follows:

# in values.yaml of prometheus-community/prometheus-adapter
---
rules:
  custom:
    - seriesQuery: 'container_network_transmit_bytes_total{interface="eth0"}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: "sum by (<<.GroupBy>>) (rate(<<.Series>>{<<.LabelMatchers>>}[3m]))"

This custom rule will expose container_network_transmit_bytes_per_second as an autoscaling metric by matching against namespace and pod in a PromQL query, then taking an average rate over 3 minutes. The rate interval used here must be at least twice the scrape interval of the container metrics.

To confirm that container_network_transmit_bytes_per_second is properly configured once prometheus-adapter is deployed, run the following command:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/*/container_network_transmit_bytes_per_second"

Replace <namespace> with any namespace that is configured to export container metrics and contains at least one Pod (e.g. kube-system). The value shown in the value field for each Pod is denoted in milli format (where 1m equals 1/1000). For example, 1 MiB/s is denoted as 1048576000m.
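
The response is a MetricValueList from the custom metrics API. An abbreviated, illustrative example (Pod names and values will differ):

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "kube-system",
        "name": "coredns-5d78c9869d-abcde"
      },
      "metricName": "container_network_transmit_bytes_per_second",
      "value": "1048576000m"
    }
  ]
}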

To use container_network_transmit_bytes_per_second for autoscaling, configure server.autoscaling.metrics as follows:

---
server:
  autoscaling:
    enabled: true
    minReplicas: 1
    metrics:
      - type: Pods
        pods:
          metric:
            name: container_network_transmit_bytes_per_second
          target:
            type: AverageValue
            averageValue: 26214400
            # As Kubernetes always uses the `m` unit when running `kubectl get hpa`,
            # it may be helpful to also use the `m` unit here (where 1 equals 1000m):
            #averageValue: 26214400000m

In this example, Kubernetes will create a new Varnish Enterprise Pod once the average outgoing bandwidth across all replicas exceeds 25 MiB/s (about 200 Mbps), and will scale back down towards server.autoscaling.minReplicas when the average can be satisfied with fewer replicas.
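
Once the HPA is deployed, its current and target values can be checked with the following command (the HPA name depends on the Helm release name):

kubectl get hpa --namespace <namespace>

The TARGETS column shows the current and target average values, both expressed in the m unit, e.g. 13107200000m/26214400000m for 12.5 MiB/s measured against a 25 MiB/s target.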

Using Varnish Enterprise metrics (vmod_stat)

It is possible to use metrics from Varnish Enterprise via vmod_stat for autoscaling, by applying custom rules to prometheus-adapter. For example, to autoscale based on requests per second, custom rules can be configured as follows:

---
rules:
  custom:
    - seriesQuery: '{__name__=~"varnish_main_client", type="req"}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: '^.*$'
        as: "varnish_main_client_req_per_second"
      metricsQuery: "sum by (<<.GroupBy>>) (rate(<<.Series>>{<<.LabelMatchers>>}[3m]))"

As stats reported by Varnish Enterprise may have multiple types on a single metric, it is necessary to manually select the metric of interest via rules.custom[0].seriesQuery and adjust rules.custom[0].name.as accordingly. In this example, the custom rules will expose varnish_main_client{type="req"} for each Varnish Enterprise Pod as the varnish_main_client_req_per_second metric.
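
To see which type labels are available for a given stat before writing the rule, the series can be inspected directly in Prometheus. A sketch, assuming the stat is exported under the varnish_main_client name used above:

count by (type) (varnish_main_client)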

To confirm that varnish_main_client_req_per_second is properly configured once prometheus-adapter is deployed, run the following command:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/*/varnish_main_client_req_per_second"

Replace <namespace> with any namespace that is configured to export Varnish Enterprise metrics (e.g. varnish). The value shown in the value field for each Pod is denoted in milli format (where 1m equals 1/1000). For example, 10 requests per second is denoted as 10000m.

To use varnish_main_client_req_per_second for autoscaling, configure server.autoscaling.metrics as follows:

---
server:
  autoscaling:
    enabled: true
    minReplicas: 1
    metrics:
      - type: Pods
        pods:
          metric:
            name: varnish_main_client_req_per_second
          target:
            type: AverageValue
            averageValue: 500
            # As Kubernetes always uses the `m` unit when running `kubectl get hpa`,
            # it may be helpful to also use the `m` unit here (where 1 equals 1000m):
            #averageValue: 500000m

In this example, Kubernetes will create a new Varnish Enterprise Pod once the average request rate across all replicas exceeds 500 requests per second. The appropriate value to use for averageValue depends on the workload and available resources.

Using Varnish Controller metrics

When Prometheus is set up to scrape aggregated metrics from Varnish Controller, it is possible to use these values for autoscaling by configuring custom rules in prometheus-adapter as follows:

---
rules:
  custom:
    - seriesQuery: '{__name__=~"varnish_controller_.*"}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          agent_name: {resource: "pod"}
      name:
        matches: '^varnish_controller_(.*)$'
        as: "varnish_controller_${1}_per_second"
      metricsQuery: "sum by (<<.GroupBy>>) (rate(<<.Series>>{<<.LabelMatchers>>}[3m]))"

This custom rule will make varnish_controller_*_per_second available based on any metric with a name matching varnish_controller_*. For example, varnish_controller_client_req_count will be available for use in autoscaling as varnish_controller_client_req_count_per_second.

To confirm that varnish_controller_*_per_second is properly configured once prometheus-adapter is deployed, run the following command (e.g. for varnish_controller_client_req_count_per_second):

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/*/varnish_controller_client_req_count_per_second"

Replace <namespace> with any namespace that is configured to export Varnish Controller metrics (e.g. varnish). The value shown in the value field for each Pod is denoted in milli format (where 1m equals 1/1000). For example, 10 requests per second is denoted as 10000m.

To autoscale based on varnish_controller_client_req_count_per_second, configure server.autoscaling.metrics as follows:

---
server:
  autoscaling:
    enabled: true
    minReplicas: 1
    metrics:
      - type: Pods
        pods:
          metric:
            name: varnish_controller_client_req_count_per_second
          target:
            type: AverageValue
            averageValue: 500
            # As Kubernetes always uses the `m` unit when running `kubectl get hpa`,
            # it may be helpful to also use the `m` unit here (where 1 equals 1000m):
            #averageValue: 500000m

In this example, Kubernetes will create a new Varnish Enterprise Pod once the average request rate across all replicas exceeds 500 requests per second. The appropriate value to use for averageValue depends on the workload and available resources.
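
Regardless of which metric is used, scaling decisions and any metric retrieval errors can be inspected through the HorizontalPodAutoscaler events (the HPA name depends on the Helm release name):

kubectl describe hpa --namespace <namespace>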