Varnish Controller

Router

Routers in Varnish Controller are used for traffic routing. The router supports two types of routing: HTTP redirect and DNS.

The router runs as an integrated part of Varnish Controller. Routers are configured over NATS by brainz and also receive utilization information from agents over NATS. Reconfiguration, updates, health states, etc. are applied to the routers automatically.

Each router must have a unique name that is specified during the startup of the router process.

Note: In this chapter, endpoints, Varnish servers and cache nodes are interchangeable.

States

A router can have the following states:

  • (1) Running - Available for receiving configuration and traffic. Whether it can actually route traffic depends on the loaded configuration.
  • (3) Down - Brainz has no contact with the router and considers it down. Note that the router could still be up and routing traffic; the state only indicates that brainz has no contact with the router over NATS.
  • (5) Locked - The router is locked: it only reports information and cannot be deployed to (see “Locked Routers”).
  • (6) Unlicensed - No valid license from brainz. Make sure to apply a valid license with the router add-on in brainz. No configuration will be reloaded in the router; it keeps running with any existing configuration until the license issue is resolved.

HTTP Redirect

Incoming HTTP requests from clients are redirected to the most suitable caching node using a 302 Found HTTP response. The response includes a Location header containing the URL of the caching node. The original Host header is preserved in the redirected URL as the first part of the path, which makes it possible to route multiple tenants in a shared environment. The caching node can extract the original host and revert the path back to the original path as needed.

This routing mechanism is usually suitable for static assets and media streaming.

HTTP routing can be enabled/disabled via the -http-routing flag for the router.

Basic example:

router_http.png

  • The router is configured with the domain cdn.example.com and has 6 running endpoints (with agents), 3 in the US and 3 in the EU.
  • cdn.example.com is configured as a VCLGroup with assigned RoutingRules that use GeoIP for routing.
  • Each agent has its specific BaseURL: http://eu1.example.com, http://eu2.example.com, …, http://us1.example.com etc.
  • cdn.example.com points to the router IP.
  • The router performs health checks for cdn.example.com towards all Varnish instances to verify which endpoints are healthy.
  1. The client requests http://cdn.example.com/movie.m3u8
  2. The router receives the request:
    • Verifies that it has cdn.example.com configured.
    • Checks the client IP and, based on the configured RoutingRules, decides that the client is in the US and that the geographically closest endpoint is the 3rd server located in the US.
    • Returns an HTTP 302 redirect with the Location header set to: http://us3.example.com/cdn.example.com/movie.m3u8
  3. The client is redirected to the Varnish server us3.example.com with the URL: http://us3.example.com/cdn.example.com/movie.m3u8
    • The Varnish server parses the URL (in VCL) and takes cdn.example.com from the first part of the URL and verifies that this domain is configured on the server.
    • The loaded VCL for cdn.example.com receives the request.
  4. The Varnish server serves the requested manifest to the client.
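
The URL handling in steps 2 and 3 can be sketched as follows. This is only an illustration of the path convention described above, not the router's or root.vcl's actual code; the function names build_redirect_location and split_routed_path are hypothetical.

```python
from typing import Tuple


def build_redirect_location(base_url: str, original_host: str, request_path: str) -> str:
    """Router side: put the original host in front of the path on the agent's BaseURL."""
    base = base_url.rstrip("/")
    path = request_path if request_path.startswith("/") else "/" + request_path
    return f"{base}/{original_host}{path}"


def split_routed_path(path: str) -> Tuple[str, str]:
    """Cache node side: recover (original_host, original_path) from a redirected path."""
    host, _, rest = path.lstrip("/").partition("/")
    return host, "/" + rest


location = build_redirect_location("http://us3.example.com", "cdn.example.com", "/movie.m3u8")
host, original_path = split_routed_path("/cdn.example.com/movie.m3u8")
```

Here location is the value placed in the Location header, and host and original_path are what the cache node would use to pick the VCL for the domain.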

VCL Configuration For HTTP Routing

In order for the Varnish servers to handle HTTP redirect URLs, the root.vcl which is deployed by Varnish Controller picks out the redirected host and path. This is handled automatically for shared deployments. For root deployments, this must be manually handled in the VCL.

The Host and URL are not changed to the requested host and URL in the loaded user VCL. Hence, the Host will be the host of the configured BaseURL for the agent, and the path will include the domain (e.g. http://agent1.example.com/mydomain.com/myfile.m3u8). This is subject to change in future releases.

For web browsers to accept cross-origin responses (CORS) with HTTP redirect routing, the VCL that handles the requests must add CORS headers to the backend responses.

Example:

sub vcl_backend_response {
    set beresp.http.Access-Control-Allow-Origin = "*";
    set beresp.http.Access-Control-Allow-Methods = "GET,HEAD";
}

TLS

TLS certificates must be managed manually for now. Support for managing TLS certificates via Varnish Controller will be added in future releases.

DNS

Incoming DNS requests from clients are directed to the best caching node using dynamic A and AAAA records. Geographical distance is calculated using edns-client-subnet if available, or the source IP of the DNS resolver doing the request.

The router acts as a remote backend for PowerDNS. PowerDNS is the DNS server and queries to PowerDNS will be forwarded over a REST API (defined by PowerDNS) towards the router. The router will then respond with the most appropriate cache node’s IP address for the specific DNS request.

This routing mechanism is usually suitable for web and APIs.

DNS can be enabled/disabled via the -dns-routing flag for the router.

Basic example:

router_dns.png

  • The router is configured with the domain cdn.example.com and has 6 running endpoints (with agents), 3 in US and 3 in EU.
  • cdn.example.com is configured with a VCLGroup and assigned RoutingRules that contain a GeoIP routing rule.
  • Each agent has its specific IPv4/IPv6 addresses configured.
  • The router has the PowerDNS name server configured (-ns).
  • The router performs health checks for cdn.example.com towards all Varnish instances to verify which endpoints are healthy.
  1. The client asks their closest resolver (8.8.8.8 in this example) for cdn.example.com
  2. The PowerDNS server is the authoritative server for cdn.example.com and receives the request.
  3. PowerDNS sends a request to the router over the HTTP REST interface.
    • The router checks whether the request contains a client subnet (via edns-client-subnet); otherwise it uses the remote IP of the resolver.
    • Based on the IP and the configured RoutingRules for the domain, the geographically closest, healthy, endpoint is selected (us3 in this case).
    • The IPv4 (or IPv6) address for the us3 endpoint is returned back as the A (or AAAA) record.
  4. PowerDNS responds to 8.8.8.8 with the IP of endpoint us3 as the address for cdn.example.com.
  5. The client's local resolver receives the IP for cdn.example.com.
  6. The client contacts the us3 endpoint with a URL such as http://cdn.example.com/movie.m3u8.
  7. The Varnish server (us3) responds back with the manifest file.
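
The address selection in step 3 can be sketched as below. It is a minimal illustration, assuming the subnet arrives as a CIDR string; the function name lookup_address and the choice to use the subnet's network address for the lookup are assumptions, not documented router behavior.

```python
import ipaddress
from typing import Optional


def lookup_address(resolver_ip: str, edns_client_subnet: Optional[str] = None) -> str:
    """Return the address to use for the geographical lookup.

    Prefer the edns-client-subnet value when the resolver forwarded one,
    otherwise fall back to the resolver's own source IP.
    """
    if edns_client_subnet:
        # strict=False accepts values such as "203.0.113.77/24" with host bits set.
        network = ipaddress.ip_network(edns_client_subnet, strict=False)
        return str(network.network_address)
    return str(ipaddress.ip_address(resolver_ip))


with_subnet = lookup_address("8.8.8.8", "203.0.113.77/24")
without_subnet = lookup_address("8.8.8.8")
```

Without a forwarded subnet, the lookup is made for the resolver (8.8.8.8 in the example), which may be far away from the actual client.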

Routing Types

Each VCLGroup can have a specific order of routing decisions. These are configured via RoutingRules. For each client request the configured routing order will be applied. The first routing type that returns a matching endpoint will succeed and will be sent back to the client (via HTTP redirect or DNS). If there is no match, the router will try with the next routing type in the configured routing order.

Example order:

history,geoip,plugin:1,plugin:2,tags,leastutilized,external

If no endpoint is found after all rules have been evaluated, a 503 Service Unavailable will be returned with a RetryAfter header. The RetryAfter header value (duration) is configurable in the RoutingRules. LeastUtilized is always the fallback rule applied if no other configured rule could find a suitable endpoint.

In the example above, external will only be evaluated if no healthy endpoints are found by the preceding rules. If the external rule contains no healthy endpoints either, leastutilized will be evaluated again as the fallback. Since it already failed earlier in the order, it will most likely fail again, and a 503 Service Unavailable will be reported back to the client.

The plugin:1 and plugin:2 in the example above indicate that the plugins with ID 1 and 2 should be used. The naming convention to use plugins is plugin:<id>.

When a plugin rule of the gRPC type is added, the connection is not established instantly, so some requests may skip the plugin rule while it is not yet connected to the gRPC service. Those requests fall back to the rule after the plugin.
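
The evaluation order described in this section can be sketched as a simple loop over rule functions. This is a minimal model, not the router's implementation: the Rule type, the route function and the way rules are registered in a dictionary are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A rule maps a client IP to an endpoint name, or None when it finds no match.
Rule = Callable[[str], Optional[str]]


def route(client_ip: str, order: List[str], rules: Dict[str, Rule]) -> Optional[str]:
    """Apply the rules in the configured order; the first match wins."""
    for name in order:
        endpoint = rules[name](client_ip)
        if endpoint is not None:
            return endpoint
    # leastutilized is always applied as the fallback rule.
    fallback = rules.get("leastutilized")
    if fallback is not None:
        return fallback(client_ip)
    return None  # the caller would answer 503 Service Unavailable


rules = {
    "history": lambda ip: None,          # nothing remembered for this client
    "geoip": lambda ip: "us3",           # closest healthy endpoint
    "leastutilized": lambda ip: "eu1",
}
selected = route("198.51.100.7", ["history", "geoip", "leastutilized"], rules)
```

In this run history finds nothing, geoip matches, and the remaining rules are never evaluated.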

Geographical (GeoIP)

The client IP is looked up in a MaxMind GeoIP database. If the lookup succeeds, the endpoint geographically closest to the client’s location will be used. If two or more endpoints are at the same location, the first healthy, not over-utilized endpoint will be selected.

Note that this is the geographical distance between the cache node and the client, not the closest endpoint in terms of network hops or latency. With DNS-based routing, the client IP is only known if the client’s recursive resolver forwards edns-client-subnet; otherwise, the recursive resolver’s IP is used for the lookup.

If no healthy and not over-utilized endpoint is found, then the next routing rule will be evaluated (if any).
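
As an illustration of "geographically closest", the sketch below ranks endpoints by great-circle (haversine) distance. The source does not state which distance formula the router uses, so the haversine formula, the Endpoint type and the coordinates are assumptions made for the example.

```python
import math
from typing import List, NamedTuple, Optional


class Endpoint(NamedTuple):
    name: str
    lat: float
    lon: float
    healthy: bool = True
    over_utilized: bool = False


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def closest_endpoint(lat: float, lon: float, endpoints: List[Endpoint]) -> Optional[Endpoint]:
    """Pick the nearest healthy, not over-utilized endpoint; None means 'try the next rule'."""
    usable = [e for e in endpoints if e.healthy and not e.over_utilized]
    if not usable:
        return None
    # min() keeps the first endpoint when several share the same distance.
    return min(usable, key=lambda e: haversine_km(lat, lon, e.lat, e.lon))


endpoints = [
    Endpoint("eu1", 50.11, 8.68),    # hypothetical location
    Endpoint("us3", 39.04, -77.49),  # hypothetical location
]
choice = closest_endpoint(40.71, -74.01, endpoints)
```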

Tagged Based

Tagged based routing makes use of the tags assigned to the router and the agents. If the router has the same tag as an agent, the agent’s Varnish server will be selected as the endpoint. This can be used for selecting endpoints at the same location as the router itself.

If no healthy and not over-utilized endpoint is found, then the next routing rule will be evaluated (if any).
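
The tag comparison can be pictured as a set intersection. The sketch below is an assumption about the matching logic only; the dictionary layout and the function name tagged_candidates are hypothetical.

```python
from typing import Dict, Iterable, List


def tagged_candidates(router_tags: Iterable[str], agents: List[Dict]) -> List[str]:
    """Names of usable agents that share at least one tag with the router."""
    wanted = set(router_tags)
    return [
        agent["name"]
        for agent in agents
        if agent["healthy"] and not agent["over_utilized"] and wanted & set(agent["tags"])
    ]


agents = [
    {"name": "eu1", "tags": ["eu", "prod"], "healthy": True, "over_utilized": False},
    {"name": "us1", "tags": ["us", "prod"], "healthy": True, "over_utilized": False},
    {"name": "us2", "tags": ["us"], "healthy": False, "over_utilized": False},
]
candidates = tagged_candidates(["us"], agents)
```

An empty result corresponds to the case where the next routing rule is evaluated.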

Least Utilized

This will select the least utilized endpoint based on utilization reported by the agents. This will return an endpoint even if it is over-utilized.

If no healthy endpoint is found, then the next routing rule will be evaluated (if any).

This is always the fallback rule if no other rule finds an appropriate endpoint.
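
The difference from the other rules is that over-utilization does not exclude an endpoint here. A minimal sketch, assuming utilization is reported as a single number per agent (a hypothetical representation):

```python
from typing import Dict, List, Optional


def least_utilized(agents: List[Dict]) -> Optional[str]:
    """Return the healthy agent with the lowest utilization, even if it is over-utilized."""
    healthy = [agent for agent in agents if agent["healthy"]]
    if not healthy:
        return None
    return min(healthy, key=lambda agent: agent["utilization"])["name"]


agents = [
    {"name": "us1", "healthy": True, "utilization": 1.20},   # over-utilized
    {"name": "us2", "healthy": True, "utilization": 1.05},   # over-utilized, but lowest
    {"name": "us3", "healthy": False, "utilization": 0.10},  # unhealthy, never selected
]
fallback = least_utilized(agents)
```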

Random

Randomly selects one endpoint that is healthy and not over-utilized.

If no healthy and not over-utilized endpoint is found, then the next routing rule will be evaluated (if any).

External

External routing uses the configured external endpoint with the highest weight, if it is healthy. If it is not healthy, the healthy endpoint with the next highest weight is selected. The external endpoints are redirect URLs or IP addresses that could point to a 3rd party CDN or another type of cache node.

External routes have no utilization probes; if they are over-utilized, they should report themselves as unhealthy. Hence, how to measure utilization for external endpoints is up to the user.

If no healthy external endpoint is found, then the next routing rule will be evaluated (if any).
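
The weight-based selection can be sketched as a sort followed by a health filter. The dictionary fields and the URLs are hypothetical examples.

```python
from typing import Dict, List, Optional


def pick_external(endpoints: List[Dict]) -> Optional[Dict]:
    """Return the healthy external endpoint with the highest weight, if any."""
    for endpoint in sorted(endpoints, key=lambda e: e["weight"], reverse=True):
        if endpoint["healthy"]:
            return endpoint
    return None  # no healthy external endpoint: the next rule is evaluated


externals = [
    {"url": "http://cdn-a.example.net", "weight": 10, "healthy": False},
    {"url": "http://cdn-b.example.net", "weight": 5, "healthy": True},
    {"url": "http://cdn-c.example.net", "weight": 1, "healthy": True},
]
chosen = pick_external(externals)
```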

History

Uses the same endpoint that was selected previously for the same IP address during the configured period of time. Once the history is timed out for the IP address, a new routing decision will be made. This leads to faster lookups of endpoints.

If the existing endpoint in the history is not healthy, is over-utilized or it has timed out (HistoryTTL), then the next routing rule will be evaluated (if any).
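
The history rule behaves like a per-client cache with a time-to-live. The sketch below models that behavior; the RoutingHistory class, its method names and the injectable clock are hypothetical and only serve to make the expiry visible.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RoutingHistory:
    """Remember the endpoint chosen for a client IP for ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def remember(self, client_ip: str, endpoint: str) -> None:
        self._entries[client_ip] = (endpoint, self._clock())

    def lookup(self, client_ip: str, is_usable: Callable[[str], bool]) -> Optional[str]:
        """Return the remembered endpoint, or None so that the next rule is evaluated."""
        entry = self._entries.get(client_ip)
        if entry is None:
            return None
        endpoint, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[client_ip]  # timed out (HistoryTTL)
            return None
        # Unhealthy or over-utilized endpoints are not reused.
        return endpoint if is_usable(endpoint) else None


now = [0.0]  # fake clock so the example does not need to sleep
history = RoutingHistory(ttl_seconds=30.0, clock=lambda: now[0])
history.remember("123.1.1.1", "ag1")
hit = history.lookup("123.1.1.1", lambda endpoint: True)
now[0] = 31.0
expired = history.lookup("123.1.1.1", lambda endpoint: True)
```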

Plugins: gRPC

From version 4.1.0, the router supports plugins as routing rules. The first available plugin type is gRPC (Google Remote Procedure Call). The router provides a pre-defined FlatBuffers schema that defines the functions for communicating with the gRPC service. The router acts as a client towards the gRPC server, which the user has to implement.

The router sends information (defined in the schema) about available and healthy endpoints, client information and HTTP headers (for HTTP requests). The gRPC service then responds either with the ID of the endpoint to select or with a custom result containing a redirect URL and IPv4/IPv6 addresses. The router then performs an HTTP redirect or sends a DNS response based on the gRPC service response, using either the IPv4/IPv6 addresses or URL that the selected endpoint is configured with, or the information given in the custom response.

The gRPC service can be written in most common languages (Go/C++/C/C#/Java/Kotlin/Javascript/Lua/PHP/Rust/Swift etc.), as long as it is implemented using the FlatBuffers schema provided by the router.

For implementation details of a gRPC routing plugin, see Routing Plugins/gRPC.

Locked Routers

Routers become Locked when brainz reports the wrong unique ID (UID) to them. The brainz UID is created the first time brainz initializes the database. When routers are started for the first time, brainz provides them with this UID. The routers will then only accept configuration from a brainz instance that uses the same UID.

This functionality is to prevent empty/wrong configuration if the database becomes reset or destroyed without backups. Starting brainz with a new database, without any data, would otherwise wipe the configuration from the routers. So in order to avoid this scenario, the router will become locked.

Locked routers will keep running the previously known configuration until unlocked.

Unlocking Routers

Before unlocking a router, make sure that the database contains the same settings that should be deployed to the routers. That means the same tags, deployments, domains, vclgroups and routingrules, and that the routers have the same tags as before (if tagged based routing is used). Once this has been added to the database (via API/UI/CLI), the router’s UID file can be removed. The UID file is located in the router’s configured base-dir and is called router.uid. Remove the file and restart the router, and the router will return to the Running state. Brainz will then update the router with the correct configuration.

Management Interface

The router has an internal management interface: a very simple API that supports health checks for the router, health checks for specific domains and Prometheus statistics output. The management interface is usually used for debugging purposes only. More detailed information can be retrieved via the Varnish Controller REST API.

Management API endpoints:

  • /metrics - Prometheus metrics output
  • /health?domain=<domain> - Check health for a given domain, returns 200 OK if the domain has at least one healthy endpoint, else 503 Service Unavailable.
  • /live - Always responds 200 OK to verify that the router is up and running.

The management interface can be enabled/disabled via the -enable-mgmt flag to the router.
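
A small client for the /health endpoint could look like the sketch below. The management address http://127.0.0.1:9000 is a placeholder, since the actual listen address depends on how the router is started; the function names are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request


def health_url(mgmt_base: str, domain: str) -> str:
    """Build the /health URL for a domain on the management interface."""
    query = urllib.parse.urlencode({"domain": domain})
    return f"{mgmt_base.rstrip('/')}/health?{query}"


def domain_is_healthy(mgmt_base: str, domain: str, timeout: float = 2.0) -> bool:
    """True on 200 OK, False on 503 Service Unavailable; other errors are raised."""
    try:
        with urllib.request.urlopen(health_url(mgmt_base, domain), timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 503:
            return False
        raise


# Placeholder address; replace it with the router's management address.
url = health_url("http://127.0.0.1:9000", "cdn.example.com")
```

Calling domain_is_healthy with the same arguments performs the request and maps the two documented status codes to a boolean.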

Health Checks

Each router performs health checks for the configured domains. The health checks are sent to each agent’s configured -base-url. The Host header is set to the configured domain, and the method and path are taken from the RoutingRules configuration.

Example:

Method: GET 
HOST: mydomain.com 
URL: http://agent1.example.com/ping
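
For debugging, an equivalent request can be built by hand. The sketch below only constructs the request from the example values above; it is not the router's health check code, and the function name build_health_check is hypothetical.

```python
import urllib.request


def build_health_check(base_url: str, domain: str, path: str, method: str = "GET") -> urllib.request.Request:
    """Build a request to the agent's base URL that carries the domain in the Host header."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Host": domain}, method=method)


request = build_health_check("http://agent1.example.com", "mydomain.com", "/ping")
# urllib.request.urlopen(request, timeout=2) would send it to the agent.
```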

Status of the health checks can be retrieved via the Varnish Controller REST API, CLI and GUI.

Router Trace

To verify that routing works as expected, it is possible to perform dry-run lookups towards the routers via API, CLI and GUI. These traces are not counted in the statistics for the routers. The router trace shows information about decisions made, endpoints routed to, client information, GeoIP information (if applicable), timing information etc.

The trace is performed as if it originated from the specified IP, which makes it possible to see routing decisions for the given IP address. The result can vary depending on health, routing order, routing types and utilization.

Example using the CLI:

$ vcli router trace 123.1.1.1 example.com   
+---------+-----------+-------------+--------+---------+------------+-----------------------+-----------+------+-------+
| Router  | ClientIP  | Domain      | Agent  |  Type   | LookupTime |          URL          |   IPv4    | IPv6 | Error |
+---------+-----------+-------------+--------+---------+------------+-----------------------+-----------+------+-------+
| router1 | 123.1.1.1 | example.com | ag1(1) | history | 4.879µs    | http://127.0.0.1:8091 | 127.0.0.1 |      |       |
+---------+-----------+-------------+--------+---------+------------+-----------------------+-----------+------+-------+