summaryrefslogtreecommitdiffstats
path: root/sys/netinet/tcp_reass.c
Commit message (Collapse)AuthorAgeFilesLines
* Fix devfs rules not applied by default for jails.delphij2014-04-301-3/+4
| | | | | | | | | | | | | Fix OpenSSL use-after-free vulnerability. Fix TCP reassembly vulnerability. Security: FreeBSD-SA-14:07.devfs Security: CVE-2014-3001 Security: FreeBSD-SA-14:08.tcp Security: CVE-2014-3000 Security: FreeBSD-SA-14:09.openssl Security: CVE-2010-5298
* uma_zone_set_max() directly returns the rounded effective zoneandre2013-02-011-4/+4
| | | | | | | limit. Use the return value directly instead of doing a second uma_zone_set_max() step. MFC after: 1 week
* Fix sysctl_handle_int() usage. Either arg1 or arg2 should be supplied,glebius2012-12-251-1/+1
| | | | and arg2 doesn't pass size of arg1.
* Simplify implementation of net.inet.tcp.reass.maxsegments andandre2012-10-281-17/+11
| | | | | | net.inet.tcp.reass.cursegments. MFC after: 2 weeks
* Plug a TCP reassembly UMA zone leak introduced in r226113 by only using thelstewart2011-11-271-17/+22
| | | | | | | | | | backup stack queue entry when the zone is exhausted, otherwise we leak a zone allocation each time we plug a hole in the reassembly queue. Reported by: many on freebsd-stable@ (thread: "TCP Reassembly Issues") Tested by: many on freebsd-stable@ (thread: "TCP Reassembly Issues") Reviewed by: bz (very brief sanity check) MFC after: 3 days
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.ed2011-11-071-1/+1
| | | | | | The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
* Prevent TCP sessions from stalling indefinitely in reassemblyandre2011-10-071-2/+28
| | | | | | | | | | | | | | | | | | | | when reaching the zone limit of reassembly queue entries. When the zone limit was reached not even the missing segment that would complete the sequence space could be processed preventing the TCP session forever from making any further progress. Solve this deadlock by using a temporary on-stack queue entry for the missing segment followed by an immediate dequeue again by delivering the contiguous sequence space to the socket. Add logging under net.inet.tcp.log_debug for reassembly queue issues. Reviewed by: lsteward (previous version) Tested by: Steven Hartland <killing-at-multiplay.co.uk> MFC after: 3 days
* Specify a CTLTYPE_FOO so that a future sysctl(8) change does not needmdf2011-01-181-3/+6
| | | | | | to rely on the format string. For SYSCTL_PROC instances that I noticed a discrepancy between the CTLTYPE and the format specifier, fix the CTLTYPE.
* Trim extra spaces before tabs.jhb2011-01-071-1/+1
|
* After some off-list discussion, revert a number of changes to thedim2010-11-221-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
* Add new, per connection, statistics for TCP, including:gnn2010-11-171-0/+1
| | | | | | | | | | Retransmitted Packets Zero Window Advertisements Out of Order Receives These statistics are available via the -T argument to netstat(1). MFC after: 2 weeks
* Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughoutdim2010-11-141-4/+4
| | | | the tree.
* Retire the system-wide, per-reassembly queue segment limit. The mechanism is farlstewart2010-10-161-11/+15
| | | | | | | | | | | | | | | | | | | | | | | | too coarse grained to be useful and the default value significantly degrades TCP performance on moderate to high bandwidth-delay product paths with non-zero loss (e.g. 5+Mbps connections across the public Internet often suffer). Replace the outgoing mechanism with an individual per-queue limit based on the number of MSS segments that fit into the socket's receive buffer. This should strike a good balance between performance and the potential for resource exhaustion when FreeBSD is acting as a TCP receiver. With socket buffer autotuning (which is enabled by default), the reassembly queue tracks the socket buffer and benefits too. As the XXX comment suggests, my testing uncovered some unexpected behaviour which requires further investigation. By using so->so_rcv.sb_hiwat instead of sbspace(&so->so_rcv), we allow more segments to be held across both the socket receive buffer and reassembly queue than we probably should. The tradeoff is better performance in at least one common scenario, versus a devious sender's ability to consume more resources on a FreeBSD receiver. Sponsored by: FreeBSD Foundation Reviewed by: andre, gnn, rpaulo MFC after: 2 weeks
* - Switch the "net.inet.tcp.reass.cursegments" andlstewart2010-10-161-13/+23
| | | | | | | | | | | | | | | | | "net.inet.tcp.reass.maxsegments" sysctl variables to be based on UMA zone stats. The value returned by the cursegments sysctl is approximate owing to the way in which uma_zone_get_cur is implemented. - Discontinue use of V_tcp_reass_qsize as a global reassembly segment count variable in the reassembly implementation. The variable was used without proper synchronisation and was duplicating accounting done by UMA already. The lack of synchronisation was particularly problematic on SMP systems terminating many TCP sessions, resulting in poor TCP performance for connections with non-zero packet loss. Sponsored by: FreeBSD Foundation Reviewed by: andre, gnn, rpaulo (as part of a larger patch) MFC after: 2 weeks
* Internalise reassembly queue related functionality and variables which shouldlstewart2010-09-251-3/+25
| | | | | | | | | | not be used outside of the reassembly queue implementation. Provide a new function to flush all segments from a reassembly queue and call it from the appropriate places instead of manipulating the queue directly. Sponsored by: FreeBSD Foundation Reviewed by: andre, gnn, rpaulo MFC after: 2 weeks
* MFP4: @176978-176982, 176984, 176990-176994, 177441bz2010-04-291-14/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | "Whitspace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOABLS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days
* Destroy TCP UMA zones (empty or not) upon network stack teardownbz2010-03-071-0/+9
| | | | | | | | | | | | to not leak them, otherwise making UMA/vmstat unhappy with every stoped vnet. We will still leak pages (especially for zones marked NOFREE). Reshuffle cleanup order in tcp_destroy() to get rid of what we can easily free first. Sponsored by: ISPsystem Reviewed by: rwatson MFC after: 5 days
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c andrwatson2009-08-011-1/+1
| | | | | | | | | | vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
* Remove unused VNET_SET() and related macros; only VNET_GET() isrwatson2009-07-161-3/+3
| | | | | | | | | ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocatorrwatson2009-07-141-21/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
* Remove comment about moving tcp_reass() to its own file named tcp_reass.c,rwatson2009-05-251-2/+1
| | | | | | that happened a while ago. MFC after: 3 days
* Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() andrwatson2009-04-111-6/+6
| | | | | | | | TCPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days
* First pass at separating per-vnet initializer functionszec2009-04-061-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from existing functions for initializing global state. At this stage, the new per-vnet initializer functions are directly called from the existing global initialization code, which should in most cases result in compiler inlining those new functions, hence yielding a near-zero functional change. Modify the existing initializer functions which are invoked via protosw, like ip_init() et. al., to allow them to be invoked multiple times, i.e. per each vnet. Global state, if any, is initialized only if such functions are called within the context of vnet0, which will be determined via the IS_DEFAULT_VNET(curvnet) check (currently always true). While here, V_irtualize a few remaining global UMA zones used by net/netinet/netipsec networking code. While it is not yet clear to me or anybody else whether this is the right thing to do, at this stage this makes the code more readable, and makes it easier to track uncollected UMA-zone-backed objects on vnet removal. In the long run, it's quite possible that some form of shared use of UMA zone pools among multiple vnets should be considered. Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw, vnet_inet and vnet_net. Approved by: julian (mentor)
* Rather than using hidden includes (with cicular dependencies),bz2008-12-021-0/+1
| | | | | | | | | | | directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
* Change the initialization methodology for global variables scheduledzec2008-11-191-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | for virtualization. Instead of initializing the affected global variables at instatiation, assign initial values to them in initializer functions. As a rule, initialization at instatiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks. Essentialy, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to intialize multiple instances of such container structures. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
* Step 1.5 of importing the network stack virtualization infrastructurezec2008-10-021-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(*). (*) netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
* Commit step 1 of the vimage project, (network stack)bz2008-08-171-18/+19
| | | | | | | | | | | | | | | | | | | | | | | | virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
* Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros torwatson2008-04-171-1/+1
| | | | | | | | | | | | | | | explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock main exclusive, and all instances of inpcb lock acquisition are exclusive. This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code. MFC after: 3 months Tested by: kris (superset of committered patch)
* Add FBSDID to all files in netinet so that people can moresilby2007-10-071-1/+3
| | | | | | easily include file version information in bug reports. Approved by: re (kensmith)
* Complete the (mechanical) move of the TCP reassembly and timewaitandre2007-05-131-31/+2
| | | | | | | functions from their origininal place to their own files. TCP Reassembly from tcp_input.c -> tcp_reass.c TCP Timewait from tcp_subr.c -> tcp_timewait.c
* Drop everything that doesn't belong into this new file.andre2007-05-111-2980/+0
| | | | It's neither functional nor connected to the build yet.
* Move universally to ANSI C function declarations, with relativelyrwatson2007-05-101-1/+2
| | | | consistent style(9)-ish layout.
* o Fix style(9) bugs introduced in the last commit.maxim2007-05-091-3/+3
| | | | Pointed out by: bde
* o Unbreak "options TCPDEBUG" && "nooptions INET6" kernel build.maxim2007-05-091-0/+2
| | | | | PR: kern/112517 Submitted by: vd
* Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead ofandre2007-05-061-22/+22
| | | | a decdicated sack_enable int for this bool. Change all users accordingly.
* o Remove redundant tcp reassembly check in header prediction codeandre2007-05-061-19/+9
| | | | | | o Rearrange code to make intent in TCPS_SYN_SENT case more clear o Assorted style cleanup o Comment clarification for tcp_dropwithreset()
* Reorder the TCP header prediction test to check for the most volatileandre2007-05-061-4/+6
| | | | values first to spend less time on a fallback to normal processing.
* Remove the defunct remains of the TCPS_TIME_WAIT cases from tcp_do_segmentandre2007-05-061-65/+17
| | | | | | | | and change it to a void function. We use a compressed structure for TCPS_TIME_WAIT to save memory. Any late late segments arriving for such a connection is handled directly in the TW code.
* Tweak comment at end of tcp_input() when calling into tcp_do_segment(): therwatson2007-05-041-3/+3
| | | | pcbinfo lock will be released as well, not just the pcb lock.
* o Fix INP lock leak in the minttl caseandre2007-04-231-5/+6
| | | | | o Remove indirection in the decision of unlocking inp o Further annotation of locking in tcp_input()
* o Remove unncessary TOF_SIGLEN flag from struct tcpoptandre2007-04-201-1/+2
| | | | | o Correctly set to->to_signature in tcp_dooptions() o Update comments
* Add more KASSERT's.andre2007-04-201-0/+4
|
* Remove bogus check for accept queue length and associated failure handlingandre2007-04-201-16/+10
| | | | | | | | | | | | | | from the incoming SYN handling section of tcp_input(). Enforcement of the accept queue limits is done by sonewconn() after the 3WHS is completed. It is not necessary to have an earlier check before a connection request enters the SYN cache awaiting the full handshake. It rather limits the effectiveness of the syncache by preventing legit and illegit connections from entering it and having them shaken out before we hit the real limit which may have vanished by then. Change return value of syncache_add() to void. No status communication is required.
* Simplifly syncache_expand() and clarify its semantics. Zero is returnedandre2007-04-201-8/+8
| | | | | | | | | | | | | | | when the ACK is invalid and doesn't belong to any registered connection, either in syncache or through SYN cookies. True but a NULL struct socket is returned when the 3WHS completed but the socket could not be created due to insufficient resources or limits reached. For both cases an RST is sent back in tcp_input(). A logic error leading to a panic is fixed where syncache_expand() would free the mbuf on socket allocation failure but tcp_input() later supplies it to tcp_dropwithreset() to issue a RST to the peer. Reported by: kris (the panic)
* Remove unused variable tcbinfo_mtx.rwatson2007-04-151-1/+0
|
* Change the TCP timer system from using the callout system five timesandre2007-04-111-30/+22
| | | | | | | | | | | | | | | | directly to a merged model where only one callout, the next to fire, is registered. Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout. The single new callout is a mutex callout on inpcb simplifying the locking a bit. tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions. Reviewed by: rwatson (earlier version)
* Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add someandre2007-04-041-0/+3
| | | | further INP_INFO_WLOCK_ASSERT() while there.
* Move last tcpcb initialization for the inbound connection case fromandre2007-04-041-10/+2
| | | | | | | | tcp_input() to syncache_socket() where it belongs and the majority of it already happens. The "tp->snd_up = tp->snd_una" is removed as it is done with the tcp_sendseqinit() macro a few lines earlier.
* Retire unused TCP_SACK_DEBUG.andre2007-04-041-1/+0
|
* In tcp_dooptions() skip over SACK options if it is a SYN segment.andre2007-04-041-0/+2
|
OpenPOWER on IntegriCloud