summaryrefslogtreecommitdiffstats
path: root/net/ipv4
diff options
context:
space:
mode:
authorYuchung Cheng <ycheng@google.com>2017-04-07 11:42:05 -0700
committerDavid S. Miller <davem@davemloft.net>2017-04-07 11:44:00 -0700
commitcc663f4d4c97b7297fb45135ab23cfd508b35a77 (patch)
treefd04acd46637db063b4b2ea1528c7b43e1542461 /net/ipv4
parent16cf72bb085626ae1323e59985d6cbb58a8f71d8 (diff)
downloadop-kernel-dev-cc663f4d4c97b7297fb45135ab23cfd508b35a77.zip
op-kernel-dev-cc663f4d4c97b7297fb45135ab23cfd508b35a77.tar.gz
tcp: restrict F-RTO to work-around broken middle-boxes
The recent extension of F-RTO 89fe18e44 ("tcp: extend F-RTO to catch more spurious timeouts") interacts badly with certain broken middle-boxes. These broken boxes modify and falsely raise the receive window on the ACKs. During a timeout induced recovery, F-RTO would send new data packets to probe if the timeout is false or not. Since the receive window is falsely raised, the receiver would silently drop these F-RTO packets. The recovery would take N (exponentially backoff) timeouts to repair N packet losses. A TCP performance killer. Due to this unfortunate situation, this patch removes this extension to revert F-RTO back to the RFC specification. Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/ipv4')
-rw-r--r--net/ipv4/tcp_input.c20
1 files changed, 12 insertions, 8 deletions
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2c1f593..659d1ba 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1935,6 +1935,7 @@ void tcp_enter_loss(struct sock *sk)
struct tcp_sock *tp = tcp_sk(sk);
struct net *net = sock_net(sk);
struct sk_buff *skb;
+ bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
bool is_reneg; /* is receiver reneging on SACKs? */
bool mark_lost;
@@ -1994,15 +1995,18 @@ void tcp_enter_loss(struct sock *sk)
tp->high_seq = tp->snd_nxt;
tcp_ecn_queue_cwr(tp);
- /* F-RTO RFC5682 sec 3.1 step 1 mandates to disable F-RTO
- * if a previous recovery is underway, otherwise it may incorrectly
- * call a timeout spurious if some previously retransmitted packets
- * are s/acked (sec 3.2). We do not apply that retriction since
- * retransmitted skbs are permanently tagged with TCPCB_EVER_RETRANS
- * so FLAG_ORIG_SACK_ACKED is always correct. But we do disable F-RTO
- * on PTMU discovery to avoid sending new data.
+ /* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous
+ * loss recovery is underway except recurring timeout(s) on
+ * the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing
+ *
+ * In theory F-RTO can be used repeatedly during loss recovery.
+ * In practice this interacts badly with broken middle-boxes that
+ * falsely raise the receive window, which results in repeated
+ * timeouts and stop-and-go behavior.
*/
- tp->frto = sysctl_tcp_frto && !inet_csk(sk)->icsk_mtup.probe_size;
+ tp->frto = sysctl_tcp_frto &&
+ (new_recovery || icsk->icsk_retransmits) &&
+ !inet_csk(sk)->icsk_mtup.probe_size;
}
/* If ACK arrived pointing to a remembered SACK, it means that our
OpenPOWER on IntegriCloud