linux/include
Neal Cardwell ed66dfaf23 tcp: when scheduling TLP, time of RTO should account for current ACK
Fix the TLP scheduling logic so that when scheduling a TLP probe, we
ensure that the estimated time at which an RTO would fire accounts for
the fact that ACKs indicating forward progress should push back RTO
times.

After the following fix:

df92c8394e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")

we had an unintentional behavior change in the following kind of
scenario: suppose the RTT variance has been very low recently. Then
suppose we send out a flight of N packets and our RTT is 100ms:

t=0: send a flight of N packets
t=100ms: receive an ACK for N-1 packets

The response before df92c8394e that was:
  -> schedule a TLP for now + RTO_interval

The response after df92c8394e is:
  -> schedule a TLP for t=0 + RTO_interval

Since RTO_interval = srtt + RTT_variance, this means that we have
scheduled a TLP timer at a point in the future that only accounts for
RTT_variance. If the RTT_variance term is small, this means that the
timer fires soon.

Before df92c8394e this would not happen, because in that code, when
we receive an ACK for a prefix of flight, we did:

    1) Near the top of tcp_ack(), switch from TLP timer to RTO
       at write_queue_head->paket_tx_time + RTO_interval:
            if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
                   tcp_rearm_rto(sk);

    2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval:
            if (flag & FLAG_ACKED) {
                   tcp_rearm_rto(sk);

    3) In tcp_ack() after tcp_fastretrans_alert() switch from RTO
       to TLP at now + RTO_interval:
            if (icsk->icsk_pending == ICSK_TIME_RETRANS)
                   tcp_schedule_loss_probe(sk);

In df92c8394e we removed that 3-phase dance, and instead directly
set the TLP timer once: we set the TLP timer in cases like this to
write_queue_head->packet_tx_time + RTO_interval. So if the RTT
variance is small, then this means that this is setting the TLP timer
to fire quite soon. This means if the ACK for the tail of the flight
takes longer than an RTT to arrive (often due to delayed ACKs), then
the TLP timer fires too quickly.

Fixes: df92c8394e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-19 12:25:26 +09:00
..
acpi TTY/Serial patches for 4.15-rc1 2017-11-13 21:05:31 -08:00
asm-generic include/asm-generic/topology.h: remove unused parent_node() macro 2017-11-17 16:10:05 -08:00
clocksource arm64 updates for 4.15 2017-11-15 10:56:56 -08:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2017-11-14 10:52:09 -08:00
drm amdgpu DC display code for Vega. 2017-11-17 14:34:42 -08:00
dt-bindings We have two changes to the core framework this time around. The first being a 2017-11-17 20:04:24 -08:00
keys
kvm KVM: arm/arm64: Rework kvm_timer_should_fire 2017-11-06 16:23:17 +01:00
lib
linux Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc 2017-11-17 20:21:44 -08:00
math-emu
media media updates for v4.15-rc1 2017-11-15 20:30:12 -08:00
memory
misc
net tcp: when scheduling TLP, time of RTO should account for current ACK 2017-11-19 12:25:26 +09:00
pcmcia
ras
rdma Updates for 4.15 kernel merge window 2017-11-15 14:54:53 -08:00
scsi SCSI misc on 20171114 2017-11-14 16:23:44 -08:00
soc We have two changes to the core framework this time around. The first being a 2017-11-17 20:04:24 -08:00
sound ASoC: Updates for v4.15 2017-11-13 15:45:57 +01:00
target A couple of configfs cleanups: 2017-11-14 14:44:04 -08:00
trace Tracing updates for 4.15: 2017-11-17 14:58:01 -08:00
uapi Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-11-17 20:18:37 -08:00
video
xen xen: features and fixes for v4.15-rc1 2017-11-16 13:06:27 -08:00