Streaming in an HTTP context is often associated with video streaming. Although OTT video streaming is an important use case for Varnish, this section is not about that; we will cover OTT video streaming in detail in chapter 10.
Here, it means that Varnish avoids buffering, in the sense that it doesn’t need to receive the full response from the origin before starting to send it to the user.
Instead, Varnish can start streaming the origin data to users as soon as it’s available. This can significantly reduce latency when delivering live video, where certain origins can start delivering video chunks before they are completely processed. If Varnish were to buffer the chunk, the low-latency benefit of the origin would be lost.
An HTTP server that wants to send the response in chunks, which
implies not sending a Content-Length header, signals this by sending
the following header to the client:
Transfer-Encoding: chunked
When all the headers have been sent to the client, the server starts sending HTTP chunks.
Each chunk is prefixed by its chunk length, expressed in hexadecimal and
followed by a \r\n sequence; then comes the chunk data itself, also
followed by a \r\n sequence. A web server sending a chunked response will
typically write one chunk at a time to the network socket, flushing each
chunk to the client. The process is repeated until all chunks are sent.
Finally the server sends a zero-length chunk to mark the end of the
transaction, and the HTTP connection can be used to request more
resources.
This may sound very confusing, so here’s an example:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

8\r\n
Varnish\n\r\n
9\r\n
supports\n\r\n
8\r\n
chunked\n\r\n
9\r\n
transfer\n\r\n
9\r\n
encoding\n\r\n
0\r\n
\r\n
The chunk length is the number of bytes in the chunk; the terminating
\r\n is not counted, but any newlines that are part of the chunk data
itself are. In this case, Varnish is seven characters long, but the added
newline results in a chunk length of eight.
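Applied to the example, the chunk lengths add up as follows (character count plus the trailing newline):

Varnish\n    7 + 1 = 8
supports\n   8 + 1 = 9
chunked\n    7 + 1 = 8
transfer\n   8 + 1 = 9
encoding\n   8 + 1 = 9

Together the chunks form a 43-byte response body.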
The output of this HTTP response will be:
Varnish
supports
chunked
transfer
encoding
In total there are five chunks, flushed to the client one at a time unless the kernel combines some of them into a single TCP packet. If it takes one second to render each chunk, output starts appearing after the first second.
If this were done using regular buffered output, the client would have to wait five seconds before any output appeared.
For time-consuming processes, content streaming using chunked transfer encoding has a positive impact on the quality of experience for the end user.
Streaming delivery support in Varnish goes beyond support for chunked transfer encoding. When Varnish fetches from a server, it starts sending the response body to clients while the backend fetch is still in progress, regardless of whether chunked transfer encoding is used.
When Varnish does not yet know the length of the body, because the server is using chunked transfer encoding and the fetch is still ongoing, it uses chunked transfer encoding when sending the response to clients.
Once the response has been fully fetched, Varnish knows the content length and no longer uses chunked encoding when sending the response to new clients.
This means that a cache miss will be streamed to the client using the
Transfer-Encoding: chunked response header. But for the next request,
if it is a hit, and the object has been fully fetched from the backend,
the entire response body is sent at once, and a Content-Length header
is included.
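To illustrate, reusing the 43-byte body from the example above, the two responses could look roughly like this (other headers that Varnish adds, such as Age and Via, are omitted). On a cache miss, while the fetch is still in progress:

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

On a subsequent cache hit, once the object is complete in cache:

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 43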
Varnish switches to Content-Length as soon as it can because it is
cheaper in terms of resource usage. When the entire object is in
memory, Varnish can send all of the data to the kernel in a single
call, and the overhead associated with chunked encoding is eliminated.
In VCL, you can enable or disable streaming by toggling the value for
the beresp.do_stream variable. The default value for this variable is
true.
Here’s an example in which streaming is disabled for a specific URL:
sub vcl_backend_response {
    if (bereq.url == "/my-page") {
        set beresp.do_stream = false;
    }
}
This snippet will disable streaming if the request URL is /my-page.
You can also check at delivery time whether streaming is being used by
reading the value of resp.is_streaming, which returns a boolean.
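For example, here’s a minimal sketch that exposes this value in a response header; the X-Streamed header name is just an illustration:

sub vcl_deliver {
    # Expose whether this response is being streamed to the client.
    if (resp.is_streaming) {
        set resp.http.X-Streamed = "true";
    } else {
        set resp.http.X-Streamed = "false";
    }
}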
Backends sometimes fail, or stop sending data in the middle of a transaction. If Varnish is streaming and data stops coming from the backend, it can only signal this to the client by closing the connection, leaving the client with a partial response. If, on the other hand, streaming has been disabled for that transaction, Varnish can send an HTTP 503 error to the client when it realizes that the full response cannot be delivered.
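If you have endpoints where a truncated body would be worse than a clean error, you can buffer those responses explicitly. Here’s a sketch, assuming a hypothetical /reports/ URL prefix:

sub vcl_backend_response {
    # Hypothetical endpoints where clients must receive either the
    # complete body or an error: disable streaming so Varnish can
    # return a 503 if the backend fails mid-fetch.
    if (bereq.url ~ "^/reports/") {
        set beresp.do_stream = false;
    }
}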