Varnish Enterprise

tcp

Description

The tcp vmod contains functions to select the TCP congestion control algorithm, set socket pacing (rate limiting), and log protocol-related information.

Examples

TCP rate-limiting

import std;
import tcp;

sub vcl_recv
{
  # Limit all clients to 1000 KB/s.
  tcp.set_socket_pace(1000);
}

TCP congestion control algorithm

import std;
import tcp;

sub vcl_recv
{
  set req.http.X-Tcp = tcp.congestion_algorithm("bbr");
}

Here, the X-Tcp header field is set to 0 if the congestion control algorithm was changed successfully, or to -1 if an error occurred.

See the tcp.congestion_algorithm() function for more information about congestion control algorithms.

API

congestion_algorithm

INT congestion_algorithm(STRING algorithm)

Set the congestion control algorithm of the client socket to algorithm. Returns 0 on success and -1 on error.

sub vcl_recv {
  set req.http.x-tcp = tcp.congestion_algorithm("cubic");
}

To list the congestion control algorithms available on your system:

# sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = reno cubic bbr

The bbr congestion control algorithm requires kernel version 4.9.0 or later. See: https://www.vultr.com/docs/how-to-deploy-google-bbr-on-centos-7
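Because an algorithm such as bbr may not be available on every kernel, it can be useful to act on the return value. The following VCL fragment is a minimal sketch, assuming that falling back to cubic is acceptable for your deployment; it only uses tcp.congestion_algorithm() as documented above and std.log() from vmod std.

import std;
import tcp;

sub vcl_recv
{
  # Try bbr first. congestion_algorithm() returns -1 on error,
  # for example when bbr is not available in the running kernel.
  if (tcp.congestion_algorithm("bbr") != 0) {
    std.log("bbr unavailable, falling back to cubic");
    if (tcp.congestion_algorithm("cubic") != 0) {
      std.log("Failed to change the congestion control algorithm");
    }
  }
}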

Arguments:

  • algorithm accepts type STRING

Type: Function

Returns: Int

Restricted to: client

dump_info

VOID dump_info()

Write the contents of the TCP_INFO data structure into varnishlog.

sub vcl_recv {
  tcp.dump_info();
}

The varnishlog output could look like this:

VCL_Log    tcpi: snd_mss=1448 rcv_mss=536 lost=0 retrans=0
VCL_Log    tcpi2: pmtu=1500 rtt=12042 rttvar=6021 snd_cwnd=10 advmss=1448 reordering=3

This function is provided for backward compatibility. Refer to log_info() for the meaning of the individual values and for a more flexible way of logging the kernel TCP info metrics.
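As a rough sketch of a migration path, the fields shown in the sample dump_info() output can all be requested through log_info(). The prefix "tcpi" below is an arbitrary choice, not a required value. Keep in mind that log_info() writes its record when processing of the client request ends, so the values are not sampled at the same moment as with dump_info().

import tcp;

sub vcl_recv
{
  # Log the same metrics that dump_info() reports, via log_info().
  tcp.log_info(
    record_prefix = "tcpi",
    fields = "snd_mss,rcv_mss,lost,retrans,pmtu,rtt,rttvar,snd_cwnd,advmss,reordering");
}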

Arguments: None

Type: Function

Returns: None

Restricted to: client

log_info

VOID log_info(STRING record_prefix = "tcpinfo", STRING fields = "snd_mss,rcv_mss,segs_out,total_retrans,delta:segs_out,delta:total_retrans,pmtu,rtt,rttvar", ENUM {text, column, json} format = text)

This function produces a log record when processing of the client request ends. The record contains the values of the requested fields, taken from the TCP information reported by the kernel.

The following fields, as calculated by the Linux kernel, can be reported:

  • advmss : advertised maximum segment size
  • data_segs_in : number of payload packets received from the client
  • data_segs_out : number of payload packets sent to the client
  • lost : number of currently queued packets marked lost
  • min_rtt : minimum estimated round trip time observed (in microseconds)
  • notsent_bytes : bytes ready to be sent to the client which have not been sent yet
  • pmtu : number of bytes which can be transmitted in a single packet
  • rcv_mss : maximum segment size of packets received from the client
  • rcv_rtt : estimated round trip time of the client (in microseconds)
  • rcv_ssthresh : maximum receive capacity advertised to the client
  • retrans : number of currently queued packets being actively retransmitted
  • reordering : maximum number of duplicate acknowledgements before retransmitting
  • rtt : estimated round trip time (in microseconds)
  • rttvar : estimated mean deviation of the round trip time (variance)
  • segs_in : number of packets received from the client
  • segs_out : number of packets sent to the client
  • snd_cwnd : maximum number of packets that can be waiting for client acknowledgement (congestion window)
  • snd_mss : maximum segment size for transmitting to the client
  • snd_ssthresh : number of packets in the slow start threshold for transmitting to the client
  • total_retrans : number of packets retransmitted

Each field name can be prefixed with delta:, in which case the output is the difference between the value at the end of processing and the value at the time log_info() was called from VCL.
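For example, the following fragment (a sketch; the field selection is only illustrative) logs the total retransmission count for the connection together with the number of segments sent and retransmitted between the call in vcl_recv and the end of request processing.

import tcp;

sub vcl_recv
{
  # total_retrans is the absolute value at the end of processing;
  # the delta: fields cover only the span since this call.
  tcp.log_info(fields = "total_retrans,delta:segs_out,delta:total_retrans");
}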

The record_prefix is used as a prefix in the output log and can later be used as a record selection criterion in a VSL query.
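A distinct prefix makes it easier to select one class of requests later. The sketch below assumes a hypothetical /video/ URL space that should be logged under its own prefix, while all other requests keep the default tcpinfo prefix.

import tcp;

sub vcl_recv
{
  if (req.url ~ "^/video/") {
    # Hypothetical prefix for video traffic.
    tcp.log_info(record_prefix = "tcpinfo-video");
  } else {
    # Default prefix and fields.
    tcp.log_info();
  }
}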

If format is set to column, only the values, without field names, are written to the log.

If format is set to json, the key-value pairs are written as a JSON document in which all values are JSON numbers. Note that the JSON format is subject to change in the future.
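A minimal sketch of selecting the output format follows; the field list is only an example, and format = column can be passed in the same way.

import tcp;

sub vcl_recv
{
  # Write the selected fields as JSON. All values are JSON numbers,
  # and the layout may change in future releases.
  tcp.log_info(fields = "rtt,rttvar,snd_cwnd,delta:total_retrans", format = json);
}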

Arguments:

  • record_prefix accepts type STRING (optional, default: tcpinfo)

  • fields accepts type STRING (optional, default: snd_mss,rcv_mss,segs_out,total_retrans,delta:segs_out,delta:total_retrans,pmtu,rtt,rttvar)

  • format is an ENUM that accepts the values text, column, and json (optional, default: text)

Type: Function

Returns: None

Restricted to: client

get_estimated_rtt

REAL get_estimated_rtt()

Get the estimated round-trip time of the client socket, in milliseconds.

sub vcl_recv
{
  if (tcp.get_estimated_rtt() > 300) {
    std.log("Client is far away!");
  }
}

Arguments: None

Type: Function

Returns: Real

Restricted to: client
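The estimate can also drive other functions in this vmod. The following fragment is a purely illustrative sketch: it switches distant clients to bbr, and the 300 ms threshold is an arbitrary value, not a recommendation.

import std;
import tcp;

sub vcl_recv
{
  # Arbitrary threshold, in milliseconds, chosen for illustration.
  if (tcp.get_estimated_rtt() > 300) {
    if (tcp.congestion_algorithm("bbr") != 0) {
      std.log("Could not switch high-RTT client to bbr");
    }
  }
}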

set_socket_pace

VOID set_socket_pace(INT, ENUM {sess, req} scope = sess)

Socket pacing is a Linux mechanism for rate limiting TCP connections in a network-friendly way.

Controls TCP rate limiting for the client connection. The pace is measured in KB/s. The outgoing network interface must be configured with a supported queueing discipline (scheduler), such as fq.

sub vcl_recv
{
  # Limit this client to 1000 KB/s. set_socket_pace() returns VOID,
  # so there is no result to check. Pacing only takes effect if the
  # outgoing interface uses a supported scheduler such as fq.
  tcp.set_socket_pace(1000);
}

Servers that use rate limiting must switch to a supported network scheduler. The default scheduler can be changed with a sysctl setting:

net.core.default_qdisc=fq

See: https://wiki.mikejung.biz/Sysctl_tweaks

The scope parameter has two options, req and sess:

  • req scope: content-based pacing (e.g. large files)
  • sess scope: client-based pacing (e.g. ACL)

Note that this is a no-op for HTTP/2 clients when used with req scope.
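The two scopes can be combined in one policy. In the sketch below, the ACL name throttled, the documentation address range 192.0.2.0/24, the .iso pattern, and the pace values are all hypothetical; only set_socket_pace() and its scope parameter come from this page.

import tcp;

acl throttled {
  "192.0.2.0"/24;
}

sub vcl_recv
{
  if (client.ip ~ throttled) {
    # sess scope (default): client-based pacing.
    tcp.set_socket_pace(500);
  } elsif (req.url ~ "\.iso$") {
    # req scope: content-based pacing. No-op for HTTP/2 clients.
    tcp.set_socket_pace(2000, scope = req);
  }
}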

Arguments:

  • scope is an ENUM that accepts the values sess and req (optional, default: sess)

Type: Function

Returns: None

Restricted to: client

get_socket_pace

INT get_socket_pace()

Get the socket pace.
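For debugging, the current value can be exposed in a response header. This is a minimal sketch; the header name X-Socket-Pace is hypothetical, and this page does not state the unit of the returned value.

import tcp;

sub vcl_deliver
{
  # Debugging aid: report the current socket pace to the client.
  set resp.http.X-Socket-Pace = tcp.get_socket_pace();
}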

Arguments: None

Type: Function

Returns: Int

Restricted to: client

get_quick_ack

INT get_quick_ack()

Get the current setting of the TCP_QUICKACK socket option of the client socket.

Arguments: None

Type: Function

Returns: Int

Restricted to: client

set_quick_ack

VOID set_quick_ack(INT quickack)

Set the TCP_QUICKACK socket option of the client socket. When quickack is 1, TCP sends ACKs immediately instead of waiting for TCP to select an appropriate time to send one. This is useful to work around a client that sends small messages without using TCP_NODELAY. TCP may override this setting at a later time or delay ACKs for other reasons. This function may only be called from the client side.
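As an illustration, the fragment below enables quick ACKs for requests that typically carry a request body. Restricting it to POST and PUT is a hypothetical policy, not a recommendation from this page.

import tcp;

sub vcl_recv
{
  # Hypothetical policy: request immediate ACKs for uploads.
  # TCP may still delay ACKs later.
  if (req.method == "POST" || req.method == "PUT") {
    tcp.set_quick_ack(1);
  }
}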

Arguments:

  • quickack accepts type INT

Type: Function

Returns: None

Restricted to: client

Availability

The tcp VMOD is available in Varnish Enterprise version 6.0.0r0 and later.


®Varnish Software, Wallingatan 12, 111 60 Stockholm, Organization nr. 556805-6203