
Varnish with S3 compatible storage services Tutorial

Introduction

The S3 HTTP REST API is the standard interface used by most object storage services on the Internet. This set of tutorials shows how to use Varnish as a reverse proxy in front of S3 compatible storage.

Diagram

Available tutorials

Basic setup:

Additional functionality:

Providers

The following lists contain some of the providers of S3 compatible storage services and storage products.

Providers of S3 compatible services

Providers of S3 compatible products

Provider notes

Amazon S3 endpoints now resolve to multiple IP addresses

Amazon recently introduced a change to DNS resolution for Amazon S3 endpoints. Earlier, each endpoint resolved to a single IP address. Now, each endpoint resolves to multiple IP addresses. This change is incompatible with some Varnish deployments that use Amazon S3 as a backend, and those deployments need configuration changes to keep working.

DNS names in Varnish

A backend in Varnish Configuration Language (VCL) is defined by a single IP address and a single port specifying where the backend can be reached:

backend example {
    .host = "192.168.0.1";
    .port = "80";
}

A backend can also be defined using a DNS name. The DNS name will be resolved automatically when the VCL is loaded (for example when Varnish is started or reloaded), and its IP address will be used for the lifetime of the VCL:

backend example {
    .host = "backend.example.com";
    .port = "80";
}

If the DNS name does not exist or resolves to multiple IP addresses, it cannot be used to define a single backend and the configuration is invalid. Varnish then fails to start or to reload the configuration.

A Varnish director enables load balancing between multiple backends, and is part of the proper solution if a DNS name resolves to multiple IP addresses. More about this in the solution description below.
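To illustrate what a director does, here is a minimal sketch using the round-robin director from the `directors` vmod that ships with Varnish. The backend names `s3_a` and `s3_b`, the director name `s3`, and the two IP addresses are hypothetical placeholders (the addresses come from a range reserved for documentation, not from Amazon S3). Because the backends are static, this sketch does not follow changes in DNS on its own.

```vcl
vcl 4.1;

import directors;

# Two statically defined backends. The addresses are placeholders
# chosen for illustration, not real Amazon S3 addresses.
backend s3_a {
    .host = "192.0.2.10";
    .port = "80";
}

backend s3_b {
    .host = "192.0.2.11";
    .port = "80";
}

sub vcl_init {
    # Group the backends in a round-robin director.
    new s3 = directors.round_robin();
    s3.add_backend(s3_a);
    s3.add_backend(s3_b);
}

sub vcl_recv {
    # Let the director pick a backend for each request.
    set req.backend_hint = s3.backend();
}
```

Each request is handed to the next backend in turn, which spreads the load across the listed IP addresses.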

DNS names in Amazon S3

Amazon S3 uses DNS, among other mechanisms, to load balance clients within the Amazon infrastructure. The result is that:

  • each HTTP(S) endpoint for Amazon S3 resolves to multiple IP addresses (between one and eight to be specific), and
  • each DNS resolution for the same endpoint may end up with a new set of IP addresses.

Example DNS resolution for the bucket endpoint varnish-example.s3.us-east-1.amazonaws.com:

$ dig varnish-example.s3.us-east-1.amazonaws.com
[....]
varnish-example.s3.us-east-1.amazonaws.com. 293 IN CNAME s3-r-w.us-east-1.amazonaws.com.
s3-r-w.us-east-1.amazonaws.com. 5 IN    A       54.231.140.2
s3-r-w.us-east-1.amazonaws.com. 5 IN    A       52.216.168.158
s3-r-w.us-east-1.amazonaws.com. 5 IN    A       52.217.139.234
s3-r-w.us-east-1.amazonaws.com. 5 IN    A       54.231.201.114
s3-r-w.us-east-1.amazonaws.com. 5 IN    A       54.231.133.74
s3-r-w.us-east-1.amazonaws.com. 5 IN    A       54.231.228.194
s3-r-w.us-east-1.amazonaws.com. 5 IN    A       52.216.59.74
s3-r-w.us-east-1.amazonaws.com. 5 IN    A       52.216.216.18

The output above shows that the hostname varnish-example.s3.us-east-1.amazonaws.com has a CNAME record pointing to s3-r-w.us-east-1.amazonaws.com, which has eight A records that change with each DNS resolution. Clients, like Varnish, interfacing with Amazon S3 need to handle both the multiple A records and the frequent rotation of IP addresses. Ideally, the clients also load balance their requests over the different IP addresses.

Solution description

The basic setup tutorials listed above show how to set up and configure Varnish using vmods to provide:

  • DNS resolution of Amazon S3 endpoints, including those that resolve to multiple IP addresses.
  • A director that enables load balancing of HTTP requests over the different IP addresses that the DNS name resolves to. The director also makes it possible to retry failed backend requests against other IP addresses in the director.
  • Ongoing refresh of the DNS names during operation. New IP addresses are added to the director and expired IP addresses are removed from it automatically.
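
The three points above could look roughly like the sketch below. It assumes the third-party open source `dynamic` vmod (vmod_dynamic), which may not be the vmod the tutorials actually use. It also assumes plain HTTP on port 80 and a 5 second refresh interval, chosen to match the TTL in the `dig` output. The director name `s3` and the retry limit are likewise assumptions made for illustration.

```vcl
vcl 4.1;

# Assumption: vmod_dynamic is installed; it is not bundled with Varnish.
import dynamic;

# No static backend; backends are created by the director at runtime.
backend default none;

sub vcl_init {
    # Resolve names on demand and refresh them every 5 seconds.
    new s3 = dynamic.director(port = "80", ttl = 5s);
}

sub vcl_recv {
    # Bucket endpoint taken from the dig example above.
    set req.http.Host = "varnish-example.s3.us-east-1.amazonaws.com";
    # Select a backend among the addresses the name currently resolves to.
    set req.backend_hint = s3.backend(req.http.Host);
}

sub vcl_backend_error {
    # Retry a failed fetch up to two times; the retry may be sent
    # to a different IP address by the director.
    if (bereq.retries < 2) {
        return (retry);
    }
}
```

Because the director resolves the name itself, the VCL loads even though the endpoint has multiple A records, and the set of backends follows DNS without a reload.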