path: root/sys/netinet/tcp_input.c
Commit message (author, date, files changed, lines removed/added)
* Merge remote-tracking branch 'origin/stable/10' into devel (Renato Botelho, 2016-12-05, 1 file, -57/+61)
  * MFC r286227, r286443: (jch, 2016-11-24, 1 file, -57/+61)
    r286227: Decompose the TCP INP_INFO lock to increase the scalability of short-lived TCP connections:
    - The existing TCP INP_INFO lock continues to protect the stability of the global inpcb list during full list traversal (e.g. tcp_pcblist()).
    - A new INP_LIST lock protects actual modifications of the inpcb list (inp allocation and free) and the global inpcb counters.
    This allows the TCP INP_INFO_RLOCK lock to be used in critical paths (e.g. tcp_input()), and INP_INFO_WLOCK only in occasional operations that walk all connections.
    PR: 183659
    Differential Revision: https://reviews.freebsd.org/D2599
    Reviewed by: jhb, adrian
    Tested by: adrian, nitroboost-gmail.com
    Sponsored by: Verisign, Inc.
    r286443: Fix a kernel assertion issue introduced with r286227: avoid overly strict INP_INFO_RLOCK_ASSERT checks due to tcp_notify() being called from in6_pcbnotify().
    Reported by: Larry Rosenman <ler@lerctr.org>
    Submitted by: markj, jch
* Merge remote-tracking branch 'origin/stable/10' into devel (Renato Botelho, 2016-11-02, 1 file, -0/+18)
  * MFC r307551: (jch, 2016-10-25, 1 file, -0/+18)
    Fix a double-free when an inp transitions to INP_TIMEWAIT state after having been dropped.
    This change enforces in_pcbdrop() logic in tcp_input(): "in_pcbdrop() is used by TCP to mark an inpcb as unused and avoid future packet delivery or event notification when a socket remains open but TCP has closed."
    PR: 203175
    Reported by: Palle Girgensohn, Slawa Olhovchenkov
    Tested by: Slawa Olhovchenkov
    Reviewed by: Slawa Olhovchenkov
    Approved by: gnn, Slawa Olhovchenkov
    Differential Revision: https://reviews.freebsd.org/D8211
    Sponsored by: Verisign, inc
* Merge remote-tracking branch 'origin/stable/10' into devel (Renato Botelho, 2016-08-08, 1 file, -12/+16)
  * MFC r271119, r272081: (jch, 2016-07-27, 1 file, -12/+16)
    r271119: In tcp_input(), don't acquire the pcbinfo global write lock for SYN packets targeting a listening socket. This reduces TCP input processing starvation under high SYN load (e.g. short-lived TCP connections or a SYN flood).
    Submitted by: Julien Charbon <jcharbon@verisign.com>
    Reviewed by: adrian, hiren, jhb, Mike Bentkofsky
    r272081: Catch up with r271119.
* Merge remote-tracking branch 'origin/stable/10' into devel (Renato Botelho, 2016-06-21, 1 file, -1/+1)
  * MFC r300240 (truckman, 2016-06-20, 1 file, -1/+1)
    Change the net.inet.tcp.ecn.enable sysctl MIB from a binary off/on control to a three-way setting:
    0 - Totally disable ECN. (no change)
    1 - Enable ECN if incoming connections request it. Outgoing connections will request ECN. (no change from the present != 0 setting)
    2 - Enable ECN if incoming connections request it. Outgoing connections will not request ECN.
    Change the default value of net.inet.tcp.ecn.enable from 0 to 2.
    Linux 2.4.20 and newer, Solaris, and Mac OS X 10.5 and newer have similar capabilities. The actual values above match Linux, and the default matches the current Linux default.
    Reviewed by: eadler
    Relnotes: yes
    Differential Revision: https://reviews.freebsd.org/D6386
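A minimal C sketch of the three-way setting, assuming each value maps onto two independent decisions: whether to answer an incoming SYN that requests ECN, and whether to request ECN on an outgoing SYN. The enum and function names are hypothetical and do not reflect how the kernel actually structures this check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the net.inet.tcp.ecn.enable values. */
enum ecn_mode {
    ECN_DISABLED = 0, /* never negotiate ECN */
    ECN_ENABLED  = 1, /* accept incoming requests, request on outgoing SYNs */
    ECN_PASSIVE  = 2  /* accept incoming requests only (the new default) */
};

/* Answer an incoming SYN that asks for ECN with ECN? */
static bool ecn_accept_incoming(enum ecn_mode mode, bool peer_requested)
{
    return peer_requested && mode != ECN_DISABLED;
}

/* Ask for ECN on an outgoing SYN? */
static bool ecn_request_outgoing(enum ecn_mode mode)
{
    return mode == ECN_ENABLED;
}
```

With the new default (2), a host negotiates ECN only when the peer asks for it, which is why changing the default is low-risk for outgoing connections.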
* Merge remote-tracking branch 'origin/stable/10' into devel (Renato Botelho, 2016-05-09, 1 file, -2/+11)
  * MFC r298408: (jtl, 2016-05-06, 1 file, -2/+11)
    Prevent underflows in tp->snd_wnd if the remote side ACKs more than tp->snd_wnd. This can happen, for example, when the remote side responds to a window probe by ACKing the one byte it contains.
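To illustrate the underflow, here is a sketch assuming a 32-bit unsigned window (the kernel field type may differ). With snd_wnd at 0 during a window probe, a naive `snd_wnd - acked` with `acked == 1` wraps to 4294967295. The helper name is hypothetical; it clamps at zero instead.

```c
#include <assert.h>
#include <stdint.h>

/* Shrink the send window by the bytes just acknowledged. If the peer
 * ACKed more than the window (e.g. the single byte of a window probe
 * sent while snd_wnd was 0), clamp at 0 instead of wrapping around. */
static uint32_t snd_wnd_after_ack(uint32_t snd_wnd, uint32_t acked)
{
    return (acked >= snd_wnd) ? 0 : snd_wnd - acked;
}
```

For the window-probe case, `snd_wnd_after_ack(0, 1)` yields 0 rather than a huge bogus window.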
* Merge remote-tracking branch 'origin/stable/10' into devel (Renato Botelho, 2016-01-13, 1 file, -10/+58)
  * MFC: r292003 (hiren, 2016-01-11, 1 file, -8/+33)
    Improve TCP duplicate ACK processing when SACK is present.
  * MFC: r290122 (hiren, 2016-01-11, 1 file, -2/+25)
    Calculate the correct number of bytes that are in flight for a connection, as suggested by RFC 6675.
    MFC: r292046
    r290122 added 4 bytes and removed 8 in struct sackhint. Add a 4-byte pad entry to restore the size.
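A simplified sketch of the RFC 6675 "pipe" estimate, assuming the sender tracks how many bytes above snd_una the receiver has SACKed and how many bytes it has retransmitted during recovery. The struct and function are hypothetical stand-ins, not the layout of struct sackhint. The sketch also omits RFC 6675's IsLost() test, so it treats every un-SACKed byte as still in flight.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical SACK bookkeeping; not the real struct sackhint layout. */
struct sack_state {
    uint32_t snd_una;           /* oldest unacknowledged sequence number */
    uint32_t snd_max;           /* highest sequence number sent so far */
    uint32_t sacked_bytes;      /* bytes above snd_una covered by SACK blocks */
    uint32_t sack_bytes_rexmit; /* bytes retransmitted during this recovery */
};

/* Approximate RFC 6675 "pipe": bytes sent but not cumulatively ACKed,
 * minus bytes the receiver reports via SACK, plus retransmitted bytes,
 * which occupy the network a second time. */
static int32_t compute_pipe(const struct sack_state *s)
{
    /* Modular unsigned subtraction copes with sequence number wraparound. */
    uint32_t outstanding = s->snd_max - s->snd_una;

    return (int32_t)outstanding - (int32_t)s->sacked_bytes +
        (int32_t)s->sack_bytes_rexmit;
}
```

For example, with 10000 bytes outstanding, 4000 of them SACKed, and 1000 retransmitted, the estimate is 7000 bytes.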
* Merge remote-tracking branch 'origin/stable/10' into devel (Renato Botelho, 2015-12-28, 1 file, -6/+86)
  * MFC r292706: (pkelsey, 2015-12-28, 1 file, -6/+86)
    Implementation of server-side TCP Fast Open (TFO) [RFC7413].
    TFO is disabled by default in the kernel build. See the top comment in sys/netinet/tcp_fastopen.c for implementation particulars.
    Differential Revision: https://reviews.freebsd.org/D4350
    Sponsored by: Verisign, Inc.
* Merge branch 'stable/10' into devel (Renato Botelho, 2015-10-21, 1 file, -0/+10)
  * MFC r288914 (hiren, 2015-10-14, 1 file, -0/+10)
    Add a comment specifying how we implement RFC 3042.
* MFC r275716: (Luiz Otavio O Souza, 2015-10-20, 1 file, -2/+0)
  Do not count security policy violations twice; ipsec*_in_reject() already does this on its own.
  Obtained from: Yandex LLC
  Sponsored by: Yandex LLC
  TAG: IPSEC-HEAD
  Issue: #4841
* MFC r285567: (pkelsey, 2015-07-21, 1 file, -0/+1)
  Check TCP timestamp option flag so that the automatic receive buffer scaling code does not use an uninitialized timestamp echo reply value from the stack when timestamps are not enabled.
  Approved by: re (gjb)
* MFC r266420 (by adrian) (hiren, 2015-06-19, 1 file, -0/+1)
  Ensure that the flowid hashtype is assigned to the inp if the flowid is also assigned.
  Spotted by: gallatin
  Tested by: gallatin
* MFC r275358 r275483 r276982 - Removing M_FLOWID by hps@ (hiren, 2015-04-24, 1 file, -6/+4)
  r275358: Start the process of removing the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but no longer has any users. Instead, the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field, use the existing "M_HASHTYPE_XXX" macros defined in the "sys/mbuf.h" header file.
  This patch introduces new behaviour in the transmit direction. Previously, network drivers checked whether "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now been replaced by checking whether "M_HASHTYPE_GET(m)" differs from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware-dedicated flows.
  "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in the TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment, which is later checked by the respective network drivers as before.
  r275483: Remove M_FLOWID from SCTP code.
  r276982: Remove the no longer used "M_FLOWID" flag from mbuf.h and update the netisr manpage.
  Note: The FreeBSD version has been bumped.
  Reviewed by: hps, tuexen
  Sponsored by: Limelight Networks
* MFC r271946 and r272595: (hselasky, 2014-11-03, 1 file, -0/+2)
  Improve the TCP segmentation offload (TSO) algorithm in general. This change allows all HCAs from Mellanox Technologies to function properly when TSO is enabled. See r271946 and r272595 for more details about this commit.
  Sponsored by: Mellanox Technologies
* Fix Denial of Service in TCP packet processing. (delphij, 2014-09-16, 1 file, -5/+1)
  Security: FreeBSD-SA-14:19.tcp
  Approved by: re (implicit, security advisory)
* MFC r266620: (bz, 2014-08-16, 1 file, -4/+0)
  Remove the prototype for the static inline function tcp_signature_verify_input(). The function is already defined before its first use.
* MFC r266597: (bz, 2014-08-16, 1 file, -2/+0)
  Remove the prototypes for things that are no longer file-local but were moved to the header file.
  Was supposed to be MFCed with: r266596
  Pointy hat to: bz
* MFC r266596: (bz, 2014-08-16, 1 file, -20/+0)
  Move the tcp_fields_to_host() and tcp_fields_to_net() (inline) functions to the tcp_var.h header file in order to avoid further duplication with upcoming commits.
  Reviewed by: np
* MFC r258622: dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE (avg, 2014-01-17, 1 file, -3/+3)
* MFC r258605: Convert over the TCP probes to use mtod() (avg, 2014-01-17, 1 file, -8/+8)
  MFC slacker: adrian
* Revert MFC of r258821 - it was already handled by MFC of r239672. (peter, 2014-01-08, 1 file, -4/+2)
  Pointy hat to: peter
* MFC r258821 - fix tcp simultaneous close (peter, 2014-01-07, 1 file, -2/+4)
  PR: kern/99188
* MFC r259906: Draft-ietf-tcpm-initcwnd-05 became RFC6928. (pluknet, 2014-01-02, 1 file, -2/+2)
* MFC r256920: (andre, 2013-10-29, 1 file, -4/+7)
  The TCP delayed ACK logic isn't aware of LRO passing up large aggregated segments, and thinks it received only one segment. This causes it to delay the ACK for 100 ms, waiting for another segment that may never come because all the data was already received.
  Doing delayed ACK for LRO segments is bogus for two reasons: a) it pushes us further away from ACKing every other packet; b) it introduces additional delay in responding to the sender. The latter is especially bad because it is in the nature of LRO to aggregate all segments of a burst, with no more coming until an ACK is sent back.
  Change the delayed ACK logic to detect LRO segments by their being larger than the MSS for this connection, and issue an immediate ACK for them to keep the ACK clock ticking without interruption.
  Reported by: julian, cperciva
  Tested by: cperciva
  Reviewed by: lstewart
  Approved by: re (glebius)
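The detection rule can be sketched as a predicate, assuming the caller knows the segment's payload length, the connection's MSS, and whether a delayed ACK is already pending. The function name and the every-other-segment rule are a simplified, hypothetical model of delayed-ACK behaviour, not the code in tcp_do_segment().

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether to ACK immediately or leave the delayed-ACK timer armed.
 * Hypothetical model: a single full-sized segment is normally delayed,
 * and the next one triggers an ACK (ACK every other segment). */
static bool ack_immediately(unsigned seg_len, unsigned mss, bool delack_pending)
{
    /* An LRO-aggregated segment carries more than one MSS of data, so it
     * already stands for at least two segments: ACK now. */
    if (seg_len > mss)
        return true;
    /* Otherwise ACK now only if a delayed ACK is already outstanding. */
    return delack_pending;
}
```

Using the MSS as the threshold needs no extra signal from the LRO layer; any segment the wire could not have carried in one piece is treated as aggregated.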
* When processing ACK in tcp_do_segment, use sbcut_locked() instead of sbdrop_locked() to cut acked mbufs from the socket buffer. (glebius, 2013-10-09, 1 file, -2/+5)
  Free this chain in a batch manner after the socket buffer lock is dropped. This measurably reduces contention on the socket buffer.
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Approved by: re (marius)
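A user-space sketch of the "cut under the lock, free after unlocking" pattern, using a pthread mutex in place of the socket buffer lock and a hypothetical singly linked buffer type in place of mbufs. Only the pointer relinking happens inside the critical section; the detached chain is freed after the lock is released. The function names mimic the kernel ones for readability but are not the kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an mbuf and a socket buffer. */
struct buf {
    struct buf *next;
    unsigned len;
};

struct sockbuf_sketch {
    pthread_mutex_t lock;
    struct buf *head;
    unsigned cc;                /* total bytes queued */
};

/* Queue a new buffer of `len` bytes at the tail. */
static void sb_append(struct sockbuf_sketch *sb, unsigned len)
{
    struct buf *b = calloc(1, sizeof(*b));
    assert(b != NULL);
    b->len = len;

    pthread_mutex_lock(&sb->lock);
    struct buf **pp = &sb->head;
    while (*pp != NULL)
        pp = &(*pp)->next;
    *pp = b;
    sb->cc += len;
    pthread_mutex_unlock(&sb->lock);
}

/* Caller holds sb->lock. Unlink fully acknowledged buffers and return
 * them as a chain instead of freeing them here; a partially acknowledged
 * head buffer is trimmed in place. */
static struct buf *sb_cut_locked(struct sockbuf_sketch *sb, unsigned acked)
{
    struct buf *chain = NULL, **tailp = &chain;

    while (sb->head != NULL && acked > 0) {
        struct buf *b = sb->head;

        if (b->len > acked) {   /* partially acknowledged */
            b->len -= acked;
            sb->cc -= acked;
            break;
        }
        acked -= b->len;
        sb->cc -= b->len;
        sb->head = b->next;     /* unlink from the socket buffer */
        b->next = NULL;
        *tailp = b;             /* append to the detached chain */
        tailp = &b->next;
    }
    return chain;
}

static void buf_chain_free(struct buf *chain)
{
    while (chain != NULL) {
        struct buf *next = chain->next;
        free(chain);
        chain = next;
    }
}

/* ACK processing: hold the lock only while relinking pointers. */
static void sb_ack(struct sockbuf_sketch *sb, unsigned acked)
{
    pthread_mutex_lock(&sb->lock);
    struct buf *chain = sb_cut_locked(sb, acked);
    pthread_mutex_unlock(&sb->lock);

    buf_chain_free(chain);      /* runs outside the critical section */
}
```

The design choice is that freeing memory can be comparatively slow, so moving it out of the locked region shortens the time other threads (for example, a writer appending to the same buffer) spend waiting.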
* Implement the ip, tcp, and udp DTrace providers. (markj, 2013-08-25, 1 file, -10/+30)
  The probe definitions use dynamic translation so that their arguments match the definitions for these providers in Solaris and illumos. Thus, existing scripts for these providers should work unmodified on FreeBSD.
  Tested by: gnn, hiren
  MFC after: 1 month
* Remove the large part of struct ipsecstat. (ae, 2013-07-23, 1 file, -2/+2)
  Only a few fields of this structure are used, but they already have equivalent fields in struct newipsecstat, which was introduced with FAST_IPSEC and then merged together with the old ipsecstat structure.
  This fixes a kernel stack overflow on some architectures after migrating ipsecstat to PCPU counters.
  Reported by: Taku YAMAMOTO, Maciej Milewski
* Extend debug logging of TCP timestamp related specification violations. (andre, 2013-07-10, 1 file, -5/+25)
  Update related comments and style.
* Use new macros to implement ipstat and tcpstat using PCPU counters. (ae, 2013-07-09, 1 file, -58/+7)
  Change the interface of kread_counters() to be similar to kread() in netstat(1).
* Fix kmod_*stat_inc() after r249276. (glebius, 2013-06-21, 1 file, -1/+1)
  The incorrect code actually incremented the pointer, not the memory it points to.
  In collaboration with: kib
  Reported & tested by: Ian FREISLICH <ianf clue.co.za>
  Sponsored by: Nginx, Inc.
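The commit message does not quote the faulty line, so the pair below is a hypothetical illustration of this class of bug: postfix `++` binds more tightly than unary `*`, so `*p++` advances the pointer instead of incrementing the counter it points to.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: parsed as *(p++). The pointer (a local copy) is advanced and
 * the value it pointed at is read and thrown away, so the counter
 * never changes. The (void) cast only silences the unused-value warning. */
static void stat_inc_buggy(uint64_t *p)
{
    (void)*p++;
}

/* Fixed: parentheses make the increment apply to the pointed-to value. */
static void stat_inc_fixed(uint64_t *p)
{
    (*p)++;
}
```

Because the buggy form still compiles (at most with an unused-value warning), the mistake shows up only as statistics that never move.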
* Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics accounting. (ae, 2013-06-20, 1 file, -2/+2)
  MFC after: 2 weeks
* Allow drivers to specify a maximum TSO length in bytes if they are limited in the amount of data they can handle at once. (andre, 2013-06-03, 1 file, -7/+10)
  Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to change the limit.
  The lowest allowable size is IP_MAXPACKET / 8 (8191 bytes), as anything less wouldn't be very useful anymore. The upper limit is still IP_MAXPACKET (65535 bytes). Raising it requires further auditing of the IPv4/v6 code paths, as the length field in the IP header would overflow, leading to confusion in firewalls and other packet handlers about the real size of the packet.
  The placement into "struct ifnet" is a bit hackish but the best place that was found. When the stack/driver boundary is updated it should be handled in a better way.
  Submitted by: cperciva (earlier version)
  Reviewed by: cperciva
  Tested by: cperciva
  MFC after: 1 week (using spare struct members to preserve ABI)
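The following sketch shows how a driver-supplied limit might be kept inside the stated bounds. In <netinet/ip.h> IP_MAXPACKET is 65535, so IP_MAXPACKET / 8 evaluates to 8191 with integer division. The helper and macro names are hypothetical, and the commit message does not say whether out-of-range values are clamped or rejected; this sketch clamps.

```c
#include <assert.h>

#define SKETCH_IP_MAXPACKET 65535u                 /* IP_MAXPACKET in <netinet/ip.h> */
#define SKETCH_TSO_MIN (SKETCH_IP_MAXPACKET / 8)   /* 8191 */

/* Keep a driver-requested TSO limit within [IP_MAXPACKET / 8, IP_MAXPACKET]. */
static unsigned tsomax_clamp(unsigned requested)
{
    if (requested < SKETCH_TSO_MIN)
        return SKETCH_TSO_MIN;
    if (requested > SKETCH_IP_MAXPACKET)
        return SKETCH_IP_MAXPACKET;
    return requested;
}
```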
* When doing RFC3042 limited transmit on the first or second duplicate ACK, make sure we actually have new data to send. (andre, 2013-04-23, 1 file, -1/+12)
  This prevents us from sending unnecessary pure ACKs.
  Reported by: Matt Miller <matt@matthewjmiller.net>
  Tested by: Matt Miller <matt@matthewjmiller.net>
  MFC after: 2 weeks
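This is a simplified, hypothetical predicate for the limited-transmit check, assuming `sb_bytes` counts every byte in the send buffer starting at snd_una. It models only the first-or-second duplicate ACK condition, the "unsent data is actually queued" condition, and the peer's advertised window. RFC 3042's additional congestion-window constraint is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limited-transmit check (RFC 3042), simplified.
 * sb_bytes: bytes in the send buffer, counted from snd_una. */
static bool limited_transmit_ok(unsigned dupacks, uint32_t snd_una,
    uint32_t snd_nxt, uint32_t sb_bytes, uint32_t snd_wnd)
{
    uint32_t in_flight = snd_nxt - snd_una;

    /* Only the first and second duplicate ACKs qualify. */
    if (dupacks != 1 && dupacks != 2)
        return false;
    /* Nothing unsent is queued: sending now would only be a pure ACK. */
    if (sb_bytes <= in_flight)
        return false;
    /* The peer's advertised window must have room for new data. */
    return in_flight < snd_wnd;
}
```

The second test is the one the commit adds in spirit: without it, the sender would emit a segment with no payload.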
* Fix a race condition on TCP listen socket teardown with pending connections in the accept queue and continuous new incoming SYNs. (andre, 2013-04-09, 1 file, -0/+9)
  Compared to the original submitter's patch, I've moved the test next to the SYN handling to keep it together in a logical unit, and reworded the comment explaining the issue.
  Submitted by: Matt Miller <matt@matthewjmiller.net>
  Submitted by: Juan Mojica <jmojica@gmail.com>
  Reviewed by: Matt Miller (changes)
  Tested by: pho
  MFC after: 1 week
* Fix VIMAGE build. (glebius, 2013-04-09, 1 file, -2/+2)
* Merge from projects/counters: TCP/IP stats. (glebius, 2013-04-08, 1 file, -10/+64)
  Convert 'struct ipstat' and 'struct tcpstat' to counter(9). This speeds up IP forwarding at extreme packet rates, and makes accounting more precise.
  Sponsored by: Nginx, Inc.
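A minimal sketch of the per-CPU counter idea behind counter(9): each CPU updates its own slot without locks or atomic read-modify-write operations, and a reader sums all slots. Here the CPU index is passed explicitly; the real kernel interface (counter_u64_add(), counter_u64_fetch()) uses per-CPU memory and handles the current CPU itself. All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_SKETCH 4   /* hypothetical CPU count */

/* One slot per CPU; in the kernel these would live in per-CPU memory,
 * ideally on separate cache lines to avoid false sharing. */
struct pcpu_counter {
    uint64_t slot[NCPU_SKETCH];
};

/* Writer: each CPU bumps only its own slot, so no lock or atomic
 * operation is needed as long as the update is not migrated mid-way. */
static void counter_add(struct pcpu_counter *c, unsigned cpu, uint64_t n)
{
    c->slot[cpu] += n;
}

/* Reader: sum every slot. The total may be slightly stale while
 * writers are active, which is acceptable for statistics. */
static uint64_t counter_fetch(const struct pcpu_counter *c)
{
    uint64_t sum = 0;

    for (unsigned i = 0; i < NCPU_SKETCH; i++)
        sum += c->slot[i];
    return sum;
}
```

The trade-off is cheap, contention-free increments on the hot path in exchange for a slightly more expensive read, which suits packet counters that are bumped constantly and read rarely.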
* Keep fwd_tag around for subsequent pcb lookups (emaste, 2013-03-29, 1 file, -17/+8)
  For TIMEWAIT handling tcp_input may have to jump back for an additional pass through pcblookup. Prior to this change the fwd_tag had been discarded after the first lookup, so a new connection attempt delivered locally via 'ipfw fwd' would fail to find a match.
  As of r248886 the tag will be detached and freed when passed to the socket buffer.
* Simplify and fix a bug in cc_ack_received()'s "are we congestion window limited" logic (refer to [1] for associated discussion). (lstewart, 2013-01-22, 1 file, -1/+1)
  snd_cwnd and snd_wnd are unsigned long, and on 64-bit hosts min() will truncate them to 32 bits and could therefore potentially corrupt the result (although under normal operation, neither variable should legitimately exceed 32 bits).
  [1] http://lists.freebsd.org/pipermail/freebsd-net/2013-January/034297.html
  Submitted by: jhb
  MFC after: 1 week
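Here is a sketch of the truncation, assuming an LP64 host where unsigned long is 64 bits, and a min() helper that, like the libkern one, takes unsigned int arguments. The two cwnd_limited_* functions are a hypothetical reconstruction of a before/after comparison, not the actual cc_ack_received() code.

```c
#include <assert.h>
#include <limits.h>

/* Like the libkern min(): unsigned int arguments, so wider values are
 * truncated on the way in. */
static inline unsigned int min_uint(unsigned int a, unsigned int b)
{
    return (a < b ? a : b);
}

/* Hypothetical "before": squeezes both unsigned long windows through min(). */
static int cwnd_limited_buggy(unsigned long cwnd, unsigned long wnd)
{
    return cwnd == min_uint(cwnd, wnd);
}

/* Hypothetical "after": compares the full-width values directly. */
static int cwnd_limited_fixed(unsigned long cwnd, unsigned long wnd)
{
    return cwnd <= wnd;
}
```

With cwnd = 20 and wnd = 2^32 + 10, the truncated version sees wnd as 10 and reports "not cwnd-limited", although cwnd is far below wnd. The direct comparison is both simpler and correct for all widths.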
* Fix !INET6 build after r244365. (glebius, 2012-12-18, 1 file, -2/+11)
* Clear correct flag in INET6 case. (glebius, 2012-12-18, 1 file, -1/+1)
* Since we use different flags to detect tcp forwarding, and we share the same code for IPv4 and IPv6 in tcp_input, we should check both M_IP_NEXTHOP and M_IP6_NEXTHOP flags. (ae, 2012-12-17, 1 file, -1/+2)
  MFC after: 3 days
* Fix a crash in tcp_input() that happens when an mbuf has a fwd_tag on it, but later, after processing and freeing the tag, we need to jump back again to the findpcb label. (glebius, 2012-12-12, 1 file, -0/+2)
  Since the fwd_tag pointer wasn't NULL, we tried to process and free the tag for the second time.
  Reported & tested by: Pawel Tyll <ptyll nitronet.pl>
  MFC after: 3 days
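A sketch of the fix pattern, assuming a hypothetical tag type: whoever frees the tag also clears the pointer, so a second pass through the lookup (the jump back to findpcb) sees NULL and skips the tag instead of freeing it again.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the forwarding tag attached to an mbuf. */
struct fwd_tag_sketch {
    int next_hop;
};

/* Process and free the tag at most once. Clearing the caller's pointer
 * means a later pass (after jumping back to the lookup label) sees NULL
 * and skips the tag instead of freeing it a second time. */
static int consume_fwd_tag(struct fwd_tag_sketch **tagp)
{
    int hop = -1;               /* -1: no tag present */

    if (*tagp != NULL) {
        hop = (*tagp)->next_hop;
        free(*tagp);
        *tagp = NULL;
    }
    return hop;
}
```

Passing a pointer to the pointer lets the consumer reset the caller's variable, which is what makes a repeated pass safe.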
OpenPOWER on IntegriCloud