Christian Huitema's blog

Cloudy sky, waves on the sea, the sun is
shining

Update QUIC timers once per RTT

22 Jun 2023

In a previous post I observed that the classic way to compute RTT statistics and retransmission timers does not work well. When acknowledgements are too frequent, the correlation between successive RTT measurements causes the smoothed RTT estimate to track closely the last values, and the RTT variations to be widely underestimated. This imprecision impacts both the effectiveness of loss recovery and the implementation of delay based congestion algorithms like BBR. I think this can be fixed by updating the variables just “once per RTT”. In this paper, I first describe the once per RTT update algorithm, then report on promising measurements using the simulations in the Picoquic test suite.

The once per RTT update algorithm

The goal of the algorithm is to update the smoothed_rtt, rttvar and PTO variables just once per RTT. The once per RTT update algorithm operates as follow:

initialize smoothed_rtt and rttvar as specified in RFC 9002.
```
  smoothed_rtt = kInitialRtt
  rttvar = kInitialRtt / 2
```

remember the packet number of the first packet sent in an RTT measurement epoch, and initialize epoch specific variables:

  rtt_sample_number_in_epoch = 0
  rtt_epoch_pn = last_packet_number_sent + 1
  rtt_epoch_nb_samples = 0
  rtt_epoch_sum_samples = 0
  rtt_epoch_sample_max = 0
  rtt_epoch_sample_min = UINT64_MAX;

upon receiving an acknowledgement, process the RTT sample:

if this is the first sample after initialization, set smoothed_rtt rttvar and rtt
```
  smoothed_rtt = latest_rtt
  rttvar = latest_rtt / 2
  min_rtt = latest_rtt
```

else, update the per epoch statistics:

  adjusted_rtt = latest_rtt
  if (latest_rtt >= min_rtt + ack_delay):
  adjusted_rtt = latest_rtt - ack_delay
  rtt_epoch_nb_samples += 1
  rtt_epoch_sum_samples += adjusted_rtt
  rtt_epoch_sample_max = max(adjusted_rtt, rtt_epoch_sample_max)
  rtt_epoch_sample_min = min(adjusted_rtt, rtt_epoch_sample_min)
  min_rtt = min(min_rtt, adjusted_rtt)

if the acknowledged packet number is greater than the first packet number sent in the epoch, update the RTT variables. The rttvar_sample is set to the maximum value of the difference between the smoothed_rtt and the maximum or minimum RTT seen during the epoch.

  rttvar_sample = max(abs(rtt_epoch_sample_max - smoothed_rtt), 
                      abs(rtt_epoch_sample_min - smoothed_rtt))
  rttvar = 3/4 * rttvar + 1/4 * rttvar_sample
  rtt_epoch_estimate = rtt_epoch_sum_samples / rtt_epoch_nb_samples
  smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * rtt_epoch_estimate

  PTO = smoothed_rtt + 3*rttvar + max_ack_delay

Then, reinialize the per-epoch variables.

Evaluating the once per RTT algorithm

After coding the once RTT algorithm in picoquic I ran the picoquic test suite to evaluate the results. The test suite include a couple hundred simulations of QUIC connections under various conditions. Each of these tests verifies result such as expected simulated time for completing the smulated transfers, or sometimes maximum number of retransmission errors or maximum number of spurious retransmissions. All these tests passed, which means that the new algorithm was well within the tolerances set for the new algorithms. Looking at the details, I saw that:

in many cases, the old and new algorithm produced exactly the same results. These is expected, because many tests do not involve packet losses, or involve packet losses that are detected via packet numbers instead of timers.
when they differ, the new algorithm results were generally better, but not always. In particular, the results tended to be better when simulating RTT-dependent congestion control algorithm like BBR, but delays could be very slightly higher for some tests using Cubic.

The big differences happened with tests simulating connection events such as link breaks in multipath scenarios. In these cases, having a better estimate of the RTT helps react sooner. In single path scenarios link breaks result in connection failures and the measurements are not very interesting, but the test suite includes “MTU Drop” tests that simulate a change in the path MTU size, and measure how long it takes to ract to the event and continue the session with the new MTU. This reaction is also very dependent on RTT estimates, and I analyzed it in details.

Comparison of RTT algorithms in MTU-Drop test

The “MTU Drop” test simulates the download of a 1MB file over a 1Mbps link, with the transmission delay set at 100ms in each direction, for a nominal RTT of 200ms. Initially, the path MTU is set to 1500 bytes. The code simulates the establishment of the connection, then waiting for 1 second, and verifying that path MTU discovery was performed, and that the connection is using a packet size of 1440 bytes. At that point, the path MTU is reset to 1394 bytes. The test verifies that the code notices the drop in size, and renegotiates an appropriate path MTU. The test verifies that the download completes in the expected time.

The goal of the test is to verify the PMTU adaptation, but that adaptation depends on the packet loss recovery algorithm, which itself depends on the computation of the RTT and the retransmission timer. We evaluate two ways to computing these timers:

Classic, as specified in RFC 9002
Once per RTT, as explained above.

When using the once per RTT computation, we evaluate tree variants:

Once 4V, in which the retransmission timer is computed as specified in RFC 9002, adding 4 times “RTT_VAR” and the ACK delay to the smoothed RTT value
Once MaxRTT, in which the retransmission timer is set to the MaxRTT plus RTT_VAR plus the ACK delay
Once 3V, which is the same as Once 4V, but only adding 3 times “RTT_VAR” instead of 4.

The results of the four test runs are summarized in the following table:

RTT-method	Completion	Delta time%	Losses	Delta-loss%
Classic	11.3	0.0%	79	0%
Once 4V	10.7	-5.6%	50	-37%
Once MaxRTT	11.5	1.3%	48	-39%
Once 3V	10.5	-7.2%	48	-39%

We see that the “Once 3V” performs better than the classic implementation: the download completes faster, and there are significantly fewer packet losses.

It is hard to understand exactly why one specific tuning works better than others. When looking at “Once 3V” versus “Classic”, it seems that most of the difference happened during the initial “start up” phase. The log of the “Once 3V” trial shows it exiting start-up 631ms after the beginning of the connection, with the “smoothed RTT” set at 224ms. The “Classic” log shows it exiting start up 991ms after the start, with the smoothed RTT set at 353ms. By the time the MTU drop is simulated, the “classic” connection has the smoothed RTT set at 407ms, while the “once 3V” connection has the RTT at 217ms. The larger RTT means that the “classic” connection takes almost twice as long to detect the packet losses caused by the MTU drop and recovers almost twice slower.

That same reasoning explains why setting the retransmission timer from the Max RTT does not work so well. In our samples, the maximum observed RTT happens at the end of start up, before stabilizing to lower values. The retransmission timer set from Max RTT is too large, and losses are not detected quickly enough.

How many deviations in the timer?

We thus arrive to the notion that “large retransmission timers slow down recovery”. That’s the observation that pushes us to deviate from the original formula:

retransmission timer = smoothed_RTT + 4 * rtt_var + ack_delay

That formula seems pretty arbitrary. It is related to the classic observations that for a Gaussian distribution, observations over 4 standard deviations away from the average happen less than 0.01% of the time, which seems a good way to minimize “spurious packet loss detection”. But it is far from clear that the distribution of RTT samples is Gaussian, and the computation of rtt_var is only a vague approximation of computing a standard deviation. In any case, the use of N times rtt_var is a tradeoff, fewer spurious detection versus longer recovery time from errors.

In QUIC, most loss detections are based on packet number comparisons, not timers. This, alone, renders spurious loss detections fairly rare. In addition, congestion control algorithms like BBR recover much better from spurious loss detection than old algorithms like RENO. It is probably time to revisit the tradeoff. Our tests show that using a coefficent 3 rather than 4 speeds up error detection without visible side effect.

Further work

The “once per RTT” algorithm design appears sound, but the evaluations are based on simulations. The next step is to works with other QUIC developers and see whether the algorithm can be validated in large scale deployments.

Comments

If you want to start or join a discussion on this post, the simplest way is to send a toot on the Fediverse/Mastodon to @huitema@social.secret-wg.org.