AWS S3 is an object storage service that is available over HTTP(S). There are several use cases where it is beneficial to put Varnish in front of S3, and this tutorial covers how it can be done using Varnish Enterprise.
To follow this tutorial, you will need the following:
One or more servers running Varnish Enterprise.
Follow the Getting Started tutorial to install Varnish Enterprise on one or more servers. If you do not have a token to access the software, please reach out to sales or deploy Varnish Enterprise from the AWS Marketplace where a token is not needed.
An S3 bucket with an HTTPS endpoint.
AWS S3 buckets are by default available at https://$BUCKET.s3.$REGION.amazonaws.com/, where $BUCKET is the name of the bucket and $REGION is the region ID. In this tutorial we will use the bucket varnish-example in us-east-1.
If Varnish is required to send authenticated requests in order to access the bucket, the following will also be needed:
An IAM user with an access key pair (access key ID and secret access key).
An IAM policy associated with the IAM user allowing read access to the objects in the bucket. The following can be used as a starting point for your own policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::varnish-example/*"
            ]
        }
    ]
}
Buckets can either be public or private. Public buckets can be accessed by anyone while private buckets accept only authenticated requests. Follow the instructions below depending on whether your bucket is public or private:
The following VCL configuration resolves the S3 hostname via DNS, load balances requests across the resolved IP addresses, and retries failed backend requests against another backend.
The Varnish modules used are udo and activedns.
vcl 4.1;

import udo;
import activedns;

# Static backends are not used in this example.
backend default none;

sub vcl_init {
    # Create a DNS group to regularly resolve the DNS name.
    new s3_group = activedns.dns_group();
    s3_group.set_host("varnish-example.s3.us-east-1.amazonaws.com:443");

    # Create a load balancer to use for S3.
    new s3 = udo.director();

    # Have the load balancer subscribe to DNS changes. This will let Varnish
    # load balance over all the IP addresses that the S3 hostname resolves to
    # and automatically add/remove backends as needed on the fly.
    s3.subscribe(s3_group.get_tag());

    # Set the load balancing type to random.
    s3.set_type(random);
}

sub vcl_backend_fetch {
    # Set the backend and the hostname that S3 expects to see.
    # Replace the hostname below with the hostname of your S3 endpoint.
    set bereq.backend = s3.backend();
    set bereq.http.Host = "varnish-example.s3.us-east-1.amazonaws.com";
}

sub vcl_backend_error {
    # Retry the backend request to another backend if the request failed.
    return (retry);
}
Go to Step 2 for verification.
If the bucket is private, Varnish needs to authenticate by adding a signature to each individual request it sends to it.
Put the access key ID and secret access key in environment variables that Varnish can access. One way to do this is to put them in a systemd service overrides file. Open the overrides file using sudo systemctl edit varnish and add the environment variables according to the following:
[Service]
Environment="AWS_ACCESS_KEY_ID=your-access-key-goes-here"
Environment="AWS_SECRET_ACCESS_KEY=your-secret-access-key-goes-here"
Note: The keys will be readable by unprivileged users on the host. If this is not acceptable, please consider using other mechanisms to store the keys.
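As an optional sketch (not part of the configuration below), the presence of the credentials can be checked at VCL load time with std.getenv, so a misconfigured service fails fast instead of sending unsigned requests to S3:

```vcl
vcl 4.1;

import std;

backend default none;

sub vcl_init {
    # std.getenv returns an empty string when a variable is not set.
    # Fail the VCL load early if either credential is missing.
    if (std.getenv("AWS_ACCESS_KEY_ID") == "" ||
        std.getenv("AWS_SECRET_ACCESS_KEY") == "") {
        return (fail);
    }
}
```

With this check in place, varnishd refuses to load the VCL until the overrides file has been populated and the service restarted.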
The following VCL configuration reads the credentials from the environment variables, signs each backend request with AWSv4, and load balances requests across the resolved IP addresses of the S3 endpoint.
The Varnish modules used are udo, activedns and std as well as the AWS VCL library.
vcl 4.1;

import std;
import udo;
import activedns;

# Include the VCL library to use for request signing. This library is
# provided as part of Varnish Enterprise.
include "aws/init.vcl";
include "aws/sign.vcl";

# Static backends are not used in this example.
backend default none;

sub vcl_init {
    # Read credentials from the environment variables.
    aws_config.set("aws_access_key_id", std.getenv("AWS_ACCESS_KEY_ID"));
    aws_config.set("aws_secret_access_key", std.getenv("AWS_SECRET_ACCESS_KEY"));

    # Specify the region and hostname for the S3 endpoint.
    aws_config.set("region", "us-east-1");
    aws_config.set("host", "varnish-example.s3.us-east-1.amazonaws.com");

    # Create a DNS group to regularly resolve the DNS name.
    new s3_group = activedns.dns_group();
    s3_group.set_host(aws_config.get("host") + ":443");

    # Create a load balancer to use for S3.
    new s3 = udo.director();

    # Have the load balancer subscribe to DNS changes.
    s3.subscribe(s3_group.get_tag());
}

sub vcl_backend_fetch {
    # Set the backend and the hostname that S3 expects to see.
    set bereq.backend = s3.backend();
    set bereq.http.Host = aws_config.get("host");

    # Add AWSv4 signature to the backend request.
    call aws_sign_bereq;
}

sub vcl_backend_error {
    # Retry the backend request to another backend if the request failed.
    return (retry);
}
With the configuration from Step 1 in place, Varnish will automatically discover new backends and remove the ones that become inactive as the DNS resolution changes. You can expect a few backends to linger for a period before they are removed, as it takes some time for them to become inactive. This is normal. The following is example output from the backend.list command showing multiple backends:
$ sudo varnishadm backend.list
Backend name Admin Probe Last updated
boot.udo.s3.(sa4:52.216.93.30:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:55 GMT
boot.udo.s3.(sa4:54.231.226.82:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:55 GMT
boot.udo.s3.(sa4:52.216.214.194:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:55 GMT
boot.udo.s3.(sa4:52.217.173.34:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:55 GMT
boot.udo.s3.(sa4:52.217.41.248:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:55 GMT
boot.udo.s3.(sa4:54.231.197.202:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:55 GMT
boot.udo.s3.(sa4:52.216.222.98:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:55 GMT
boot.udo.s3.(sa4:52.216.27.160:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:55 GMT
boot.udo.s3.(sa4:54.231.134.66:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:56 GMT
boot.udo.s3.(sa4:54.231.128.42:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:56 GMT
boot.udo.s3.(sa4:52.217.41.56:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:56 GMT
boot.udo.s3.(sa4:52.217.94.152:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:56 GMT
boot.udo.s3.(sa4:52.216.245.8:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:56 GMT
boot.udo.s3.(sa4:52.217.38.152:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:56 GMT
boot.udo.s3.(sa4:52.216.92.190:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:56 GMT
boot.udo.s3.(sa4:52.217.167.106:443) probe Healthy (no probe) Thu, 16 Mar 2023 08:50:56 GMT
Verify that a user agent can fetch objects from S3 via Varnish. Example using cURL:
$ curl -i http://varnish.example.com/test.txt
HTTP/2 200
x-amz-id-2: sZuV5fCHQEreyA7LQP3lMGReBLkyJgbz1ojonCbmeQZ81Uf7LAk+b6VX3txKNIWgyPNrjXe9Lx0=
x-amz-request-id: ET982H7NWWTG5DNT
date: Wed, 15 Mar 2023 14:06:16 GMT
last-modified: Fri, 10 Mar 2023 13:28:27 GMT
etag: "cc18924e71607a1df1c5d90bd1de1fe8"
x-amz-server-side-encryption: AES256
content-type: text/plain
server: AmazonS3
content-length: 10
x-varnish: 458764
age: 67154
via: 1.1 varnish (Varnish/6.0)
accept-ranges: bytes
The caching policy can be set by using custom response headers from AWS S3 or it can be set in VCL. The three mechanisms to use when configuring this in VCL are TTL, grace and keep.
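For the header-based approach, Varnish derives the TTL from standard Cache-Control and Expires response headers, which can be attached to S3 objects as metadata at upload time. A minimal sketch of a VCL fallback for objects that carry no such headers (the 1h/10m values are illustrative assumptions):

```vcl
sub vcl_backend_response {
    # When S3 sends Cache-Control or Expires, Varnish has already computed
    # beresp.ttl from them. Apply a default policy only when both are absent.
    if (!beresp.http.Cache-Control && !beresp.http.Expires) {
        set beresp.ttl = 1h;
        set beresp.grace = 10m;
    }
}
```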
As objects expire from the cache and need to be refreshed, Varnish can use conditional requests to revalidate these objects without having to transfer their actual content from S3, even after the TTL of the objects has expired. This mechanism reduces the network transfer from S3.
The following VCL expands on Step 1 with a caching policy that allows conditional requests for an extended period of time. All objects get the same caching policy.
import utils;

sub vcl_backend_response {
    # For the duration of beresp.ttl since an object was inserted into the
    # cache, it will be delivered from the cache without revalidating with S3.
    set beresp.ttl = 2h;

    # For the duration of beresp.grace since the ttl of an object expired, the
    # object will be delivered from the cache while an asynchronous refresh
    # from S3 is triggered. This is a compromise that allows a recently expired
    # object to be delivered from the cache with low latency while it is being
    # refreshed in the background.
    set beresp.grace = 1h;

    # For the duration of beresp.keep since the grace of an object expired, the
    # object will be kept in the cache to allow synchronous conditional
    # requests. This may reduce bandwidth consumption between Varnish and S3
    # since only objects that have actually changed need to be transferred.
    # Objects that have not been changed will be reused from the cache.
    set beresp.keep = 180d;

    # More efficient handling of responses to conditional requests.
    # https://docs.varnish-software.com/varnish-enterprise/vmods/utils/#fast_304
    if (beresp.was_304) {
        utils.fast_304();
    }
}
If different types of objects need to have different caching policies, it is possible to set the caching policy per content-type as reported by S3 instead:
sub vcl_backend_response {
    # Cache image objects (such as image/png and image/jpeg) for one week.
    if (beresp.http.content-type ~ "^image") {
        set beresp.ttl = 7d;
        set beresp.grace = 6h;
        set beresp.keep = 1y;
    }
    # Cache video objects (such as video/mp4) for 30 days.
    else if (beresp.http.content-type ~ "^video") {
        set beresp.ttl = 30d;
        # Disable grace to require expired objects to be revalidated before
        # delivery.
        set beresp.grace = 0s;
        set beresp.keep = 1y;
    }
    # Any other objects are cached for one day and kept for roughly a month
    # to allow conditional requests. Note that the duration unit "m" means
    # minutes in VCL, not months.
    else {
        set beresp.ttl = 1d;
        set beresp.grace = 1h;
        set beresp.keep = 30d;
    }

    # More efficient handling of responses to conditional requests.
    # https://docs.varnish-software.com/varnish-enterprise/vmods/utils/#fast_304
    if (beresp.was_304) {
        utils.fast_304();
    }
}
Varnish will cache objects in memory by default. This allows for very fast caching, but the cache capacity is limited to the amount of memory available to Varnish. If the dataset is large, it may be beneficial to allow Varnish to use disks for caching.
Persistent caching on disks is provided by the Massive Storage Engine stevedore.
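As a sketch, MSE is typically enabled by pointing varnishd at an MSE configuration file (for example with -s mse,/etc/varnish/mse.conf). The identifiers, paths, and sizes below are illustrative assumptions; consult the MSE documentation for the exact syntax and tuning guidance:

```
env: {
    id = "mse";
    memcache_size = "10G";

    books = ( {
        id = "book1";
        directory = "/var/lib/mse/book1";
        database_size = "1G";

        stores = ( {
            id = "store1";
            filename = "/var/lib/mse/book1/store1.dat";
            size = "100G";
        } );
    } );
};
```

With a layout like this, objects spill over from memory to the persisted store, so the cache can be much larger than the available memory and survives restarts.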
Varnish will cache entire objects by default, even if clients send partial requests (using the range request header). If the dataset contains large files that are fetched by clients using partial requests, it may be more beneficial to cache partial responses in Varnish.
Partial request caching is provided by the slicer module.
Varnish can use stale-if-error to keep and serve stale content if S3 becomes unavailable.
Stale-if-error handling is provided by the stale module.
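A sketch of how this can look, following the pattern from the stale vmod documentation (the durations are illustrative and should be tuned to your traffic):

```vcl
import stale;

sub stale_if_error {
    # If a stale copy of the object exists in the cache, revive it with a
    # fresh ttl/grace and deliver it instead of the error response.
    if (stale.exists()) {
        stale.revive(20m, 1h);
        stale.deliver();
        return (abandon);
    }
}

sub vcl_backend_response {
    # Keep objects around after they expire so they can be revived when S3
    # returns errors.
    set beresp.keep = 1d;
    if (beresp.status >= 500) {
        call stale_if_error;
    }
}

sub vcl_backend_error {
    # S3 could not be reached at all; fall back to stale content if possible.
    call stale_if_error;
}
```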
AWS S3, AWSv4, AWS EC2 and AWS Marketplace are trademarks of Amazon Web Services, Inc.