author		lstewart <lstewart@FreeBSD.org>	2010-10-16 07:12:39 +0000
committer	lstewart <lstewart@FreeBSD.org>	2010-10-16 07:12:39 +0000
commit		91a790ee6eb79ee70716805678dce2cca93a990b (patch)
tree		d94319f99714864b36efa859a45e46f057caf4d7 /sys/netinet
parent		ab26135efcc4fcf05ac5f88fde6ca4a8275a4b75 (diff)
download	FreeBSD-src-91a790ee6eb79ee70716805678dce2cca93a990b.zip
		FreeBSD-src-91a790ee6eb79ee70716805678dce2cca93a990b.tar.gz
Retire the system-wide, per-reassembly queue segment limit. The mechanism is far
too coarse grained to be useful and the default value significantly degrades TCP
performance on moderate to high bandwidth-delay product paths with non-zero loss
(e.g. 5+Mbps connections across the public Internet often suffer).

Replace the outgoing mechanism with an individual per-queue limit based on the
number of MSS segments that fit into the socket's receive buffer. This should
strike a good balance between performance and the potential for resource
exhaustion when FreeBSD is acting as a TCP receiver. With socket buffer
autotuning (which is enabled by default), the reassembly queue tracks the socket
buffer and benefits too.

As the XXX comment suggests, my testing uncovered some unexpected behaviour
which requires further investigation. By using so->so_rcv.sb_hiwat instead of
sbspace(&so->so_rcv), we allow more segments to be held across both the socket
receive buffer and reassembly queue than we probably should. The tradeoff is
better performance in at least one common scenario, versus a devious sender's
ability to consume more resources on a FreeBSD receiver.

Sponsored by:	FreeBSD Foundation
Reviewed by:	andre, gnn, rpaulo
MFC after:	2 weeks
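Not part of the commit: a minimal sketch of the arithmetic behind the new
per-queue limit, using example values (64 KB receive buffer, 1460-byte MSS)
that are assumptions for illustration only.

/*
 * Sketch of the per-queue reassembly limit computed in the patch:
 * the number of MSS-sized segments that fit in the receive buffer, plus one
 * so the missing segment that created the queue always gets through.
 */
#include <stdio.h>

int
main(void)
{
	unsigned long sb_hiwat = 65536;	/* example: 64 KB receive buffer */
	unsigned long t_maxseg = 1460;	/* example: typical Ethernet MSS */

	/* Same formula as the patch: sb_hiwat / t_maxseg + 1. */
	printf("per-queue segment limit: %lu\n",
	    sb_hiwat / t_maxseg + 1);		/* prints 45 */

	/* If autotuning grows the buffer to 2 MB, the limit scales with it. */
	sb_hiwat = 2UL * 1024 * 1024;
	printf("per-queue segment limit: %lu\n",
	    sb_hiwat / t_maxseg + 1);		/* prints 1437 */
	return (0);
}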
Diffstat (limited to 'sys/netinet')
-rw-r--r--	sys/netinet/tcp_reass.c | 26
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/sys/netinet/tcp_reass.c b/sys/netinet/tcp_reass.c
index e361551..9efacdc 100644
--- a/sys/netinet/tcp_reass.c
+++ b/sys/netinet/tcp_reass.c
@@ -92,12 +92,6 @@ SYSCTL_VNET_PROC(_net_inet_tcp_reass, OID_AUTO, cursegments, CTLFLAG_RD,
&VNET_NAME(tcp_reass_qsize), 0, &tcp_reass_sysctl_qsize, "I",
"Global number of TCP Segments currently in Reassembly Queue");
-static VNET_DEFINE(int, tcp_reass_maxqlen) = 48;
-#define V_tcp_reass_maxqlen VNET(tcp_reass_maxqlen)
-SYSCTL_VNET_INT(_net_inet_tcp_reass, OID_AUTO, maxqlen, CTLFLAG_RW,
- &VNET_NAME(tcp_reass_maxqlen), 0,
- "Maximum number of TCP Segments per individual Reassembly Queue");
-
static VNET_DEFINE(int, tcp_reass_overflows) = 0;
#define V_tcp_reass_overflows VNET(tcp_reass_overflows)
SYSCTL_VNET_INT(_net_inet_tcp_reass, OID_AUTO, overflows, CTLFLAG_RD,
@@ -197,13 +191,23 @@ tcp_reass(struct tcpcb *tp, struct tcphdr *th, int *tlenp, struct mbuf *m)
goto present;
/*
- * Limit the number of segments in the reassembly queue to prevent
- * holding on to too many segments (and thus running out of mbufs).
- * Make sure to let the missing segment through which caused this
- * queue.
+ * Limit the number of segments that can be queued to reduce the
+ * potential for mbuf exhaustion. For best performance, we want to be
+ * able to queue a full window's worth of segments. The size of the
+ * socket receive buffer determines our advertised window and grows
+ * automatically when socket buffer autotuning is enabled. Use it as the
+ * basis for our queue limit.
+ * Always let the missing segment through which caused this queue.
+ * NB: Access to the socket buffer is left intentionally unlocked as we
+ * can tolerate stale information here.
+ *
+ * XXXLAS: Using sbspace(so->so_rcv) instead of so->so_rcv.sb_hiwat
+ * should work but causes packets to be dropped when they shouldn't.
+ * Investigate why and re-evaluate the below limit after the behaviour
+ * is understood.
*/
if (th->th_seq != tp->rcv_nxt &&
- tp->t_segqlen >= V_tcp_reass_maxqlen) {
+ tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1) {
V_tcp_reass_overflows++;
TCPSTAT_INC(tcps_rcvmemdrop);
m_freem(m);
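Not part of the patch: a small sketch of the tradeoff flagged in the XXXLAS
comment above, under the same example assumptions plus a hypothetical amount of
data already sitting unread in the receive buffer. sbspace() reports the free
room left in the buffer, so a limit derived from it shrinks as unread data
accumulates, while the committed sb_hiwat-based limit stays at the buffer's
full size.

/*
 * Illustration only: contrast a limit based on the buffer's high-water mark
 * with one based on the remaining free space, for example values.
 */
#include <stdio.h>

int
main(void)
{
	unsigned long sb_hiwat = 65536;	/* example receive buffer size */
	unsigned long sb_cc = 40000;	/* example: bytes already buffered */
	unsigned long t_maxseg = 1460;	/* example MSS */

	/* Limit used by the patch: full buffer size in segments, plus one. */
	printf("sb_hiwat-based limit: %lu\n",
	    sb_hiwat / t_maxseg + 1);			/* prints 45 */

	/* Stricter alternative: only the space still free, plus one. */
	printf("free-space-based limit: %lu\n",
	    (sb_hiwat - sb_cc) / t_maxseg + 1);		/* prints 18 */
	return (0);
}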