summaryrefslogtreecommitdiffstats
path: root/sys/netinet/tcp_reass.c
Commit message (Collapse)AuthorAgeFilesLines
* MFP4: @176978-176982, 176984, 176990-176994, 177441bz2010-04-291-14/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | "Whitspace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOABLS (before r185088) and try to reduce the diff to stable/7 and earlier as good as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days
* Destroy TCP UMA zones (empty or not) upon network stack teardownbz2010-03-071-0/+9
| | | | | | | | | | | | to not leak them, otherwise making UMA/vmstat unhappy with every stoped vnet. We will still leak pages (especially for zones marked NOFREE). Reshuffle cleanup order in tcp_destroy() to get rid of what we can easily free first. Sponsored by: ISPsystem Reviewed by: rwatson MFC after: 5 days
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c andrwatson2009-08-011-1/+1
| | | | | | | | | | vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
* Remove unused VNET_SET() and related macros; only VNET_GET() isrwatson2009-07-161-3/+3
| | | | | | | | | ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocatorrwatson2009-07-141-21/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
* Remove comment about moving tcp_reass() to its own file named tcp_reass.c,rwatson2009-05-251-2/+1
| | | | | | that happened a while ago. MFC after: 3 days
* Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() andrwatson2009-04-111-6/+6
| | | | | | | | TCPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days
* First pass at separating per-vnet initializer functionszec2009-04-061-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from existing functions for initializing global state. At this stage, the new per-vnet initializer functions are directly called from the existing global initialization code, which should in most cases result in compiler inlining those new functions, hence yielding a near-zero functional change. Modify the existing initializer functions which are invoked via protosw, like ip_init() et. al., to allow them to be invoked multiple times, i.e. per each vnet. Global state, if any, is initialized only if such functions are called within the context of vnet0, which will be determined via the IS_DEFAULT_VNET(curvnet) check (currently always true). While here, V_irtualize a few remaining global UMA zones used by net/netinet/netipsec networking code. While it is not yet clear to me or anybody else whether this is the right thing to do, at this stage this makes the code more readable, and makes it easier to track uncollected UMA-zone-backed objects on vnet removal. In the long run, it's quite possible that some form of shared use of UMA zone pools among multiple vnets should be considered. Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw, vnet_inet and vnet_net. Approved by: julian (mentor)
* Rather than using hidden includes (with cicular dependencies),bz2008-12-021-0/+1
| | | | | | | | | | | directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
* Change the initialization methodology for global variables scheduledzec2008-11-191-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | for virtualization. Instead of initializing the affected global variables at instatiation, assign initial values to them in initializer functions. As a rule, initialization at instatiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks. Essentialy, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to intialize multiple instances of such container structures. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
* Step 1.5 of importing the network stack virtualization infrastructurezec2008-10-021-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(*). (*) netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
* Commit step 1 of the vimage project, (network stack)bz2008-08-171-18/+19
| | | | | | | | | | | | | | | | | | | | | | | | virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
* Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros torwatson2008-04-171-1/+1
| | | | | | | | | | | | | | | explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock main exclusive, and all instances of inpcb lock acquisition are exclusive. This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code. MFC after: 3 months Tested by: kris (superset of committered patch)
* Add FBSDID to all files in netinet so that people can moresilby2007-10-071-1/+3
| | | | | | easily include file version information in bug reports. Approved by: re (kensmith)
* Complete the (mechanical) move of the TCP reassembly and timewaitandre2007-05-131-31/+2
| | | | | | | functions from their origininal place to their own files. TCP Reassembly from tcp_input.c -> tcp_reass.c TCP Timewait from tcp_subr.c -> tcp_timewait.c
* Drop everything that doesn't belong into this new file.andre2007-05-111-2980/+0
| | | | It's neither functional nor connected to the build yet.
* Move universally to ANSI C function declarations, with relativelyrwatson2007-05-101-1/+2
| | | | consistent style(9)-ish layout.
* o Fix style(9) bugs introduced in the last commit.maxim2007-05-091-3/+3
| | | | Pointed out by: bde
* o Unbreak "options TCPDEBUG" && "nooptions INET6" kernel build.maxim2007-05-091-0/+2
| | | | | PR: kern/112517 Submitted by: vd
* Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead ofandre2007-05-061-22/+22
| | | | a decdicated sack_enable int for this bool. Change all users accordingly.
* o Remove redundant tcp reassembly check in header prediction codeandre2007-05-061-19/+9
| | | | | | o Rearrange code to make intent in TCPS_SYN_SENT case more clear o Assorted style cleanup o Comment clarification for tcp_dropwithreset()
* Reorder the TCP header prediction test to check for the most volatileandre2007-05-061-4/+6
| | | | values first to spend less time on a fallback to normal processing.
* Remove the defunct remains of the TCPS_TIME_WAIT cases from tcp_do_segmentandre2007-05-061-65/+17
| | | | | | | | and change it to a void function. We use a compressed structure for TCPS_TIME_WAIT to save memory. Any late late segments arriving for such a connection is handled directly in the TW code.
* Tweak comment at end of tcp_input() when calling into tcp_do_segment(): therwatson2007-05-041-3/+3
| | | | pcbinfo lock will be released as well, not just the pcb lock.
* o Fix INP lock leak in the minttl caseandre2007-04-231-5/+6
| | | | | o Remove indirection in the decision of unlocking inp o Further annotation of locking in tcp_input()
* o Remove unncessary TOF_SIGLEN flag from struct tcpoptandre2007-04-201-1/+2
| | | | | o Correctly set to->to_signature in tcp_dooptions() o Update comments
* Add more KASSERT's.andre2007-04-201-0/+4
|
* Remove bogus check for accept queue length and associated failure handlingandre2007-04-201-16/+10
| | | | | | | | | | | | | | from the incoming SYN handling section of tcp_input(). Enforcement of the accept queue limits is done by sonewconn() after the 3WHS is completed. It is not necessary to have an earlier check before a connection request enters the SYN cache awaiting the full handshake. It rather limits the effectiveness of the syncache by preventing legit and illegit connections from entering it and having them shaken out before we hit the real limit which may have vanished by then. Change return value of syncache_add() to void. No status communication is required.
* Simplifly syncache_expand() and clarify its semantics. Zero is returnedandre2007-04-201-8/+8
| | | | | | | | | | | | | | | when the ACK is invalid and doesn't belong to any registered connection, either in syncache or through SYN cookies. True but a NULL struct socket is returned when the 3WHS completed but the socket could not be created due to insufficient resources or limits reached. For both cases an RST is sent back in tcp_input(). A logic error leading to a panic is fixed where syncache_expand() would free the mbuf on socket allocation failure but tcp_input() later supplies it to tcp_dropwithreset() to issue a RST to the peer. Reported by: kris (the panic)
* Remove unused variable tcbinfo_mtx.rwatson2007-04-151-1/+0
|
* Change the TCP timer system from using the callout system five timesandre2007-04-111-30/+22
| | | | | | | | | | | | | | | | directly to a merged model where only one callout, the next to fire, is registered. Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout. The single new callout is a mutex callout on inpcb simplifying the locking a bit. tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions. Reviewed by: rwatson (earlier version)
* Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add someandre2007-04-041-0/+3
| | | | further INP_INFO_WLOCK_ASSERT() while there.
* Move last tcpcb initialization for the inbound connection case fromandre2007-04-041-10/+2
| | | | | | | | tcp_input() to syncache_socket() where it belongs and the majority of it already happens. The "tp->snd_up = tp->snd_una" is removed as it is done with the tcp_sendseqinit() macro a few lines earlier.
* Retire unused TCP_SACK_DEBUG.andre2007-04-041-1/+0
|
* In tcp_dooptions() skip over SACK options if it is a SYN segment.andre2007-04-041-0/+2
|
* When blackholing do a 'dropunlock' in the new world order to prevent theandre2007-03-281-1/+1
| | | | | | | INP_INFO_LOCK from leaking. Reported by: ache Found by: rwatson
* o Use a define for a buffer size.maxim2007-03-241-1/+10
| | | | | | | | Prodded by: db o Add missed vars for TCPDEBUG in tcp_do_segment(). Prodded by: tinderbox
* Split tcp_input() into its two functional parts:andre2007-03-231-132/+208
| | | | | | | | | | | | | | o tcp_input() now handles TCP segment sanity checks and preparations including the INPCB lookup and syncache. o tcp_do_segment() handles all data and ACK processing and is IPv4/v6 agnostic. Change all KASSERT() messages to ("%s: ", __func__). The changes in this commit are primarily of mechanical nature and no functional changes besides the function split are made. Discussed with: rwatson
* Tidy up some code to conform better to surroundings and style(9), 0 = NULLandre2007-03-231-17/+16
| | | | and space/tab.
* Bring SACK option handling in tcp_dooptions() in line with all otherandre2007-03-231-4/+7
| | | | options and ajust users accordingly.
* ANSIfy function declarations and remove register keywords for variables.andre2007-03-211-50/+24
| | | | Consistently apply style to all function declarations.
* Tidy up IPFIREWALL_FORWARD sections and comments.andre2007-03-211-4/+3
|
* Update and clarify comments in first section of tcp_input().andre2007-03-211-15/+13
|
* Tidy up the ACCEPTCONN section of tcp_input(), ajust comments and removeandre2007-03-211-57/+27
| | | | old dead T/TCP code.
* Tidy up tcp_log_in_vain and blackhole.andre2007-03-211-44/+31
|
* Make TCP_DROP_SYNFIN a standard part of TCP. Disabled by default itandre2007-03-211-5/+0
| | | | | | doesn't impede normal operation negatively and is only a few lines of code. It's close relatives blackhole and log_in_vain aren't options either.
* Remove tcp_minmssoverload DoS detection logic. The problem it tried toandre2007-03-211-59/+0
| | | | | | protect us from wasn't really there and it only bloats the code. Should the problem surface in the future we can simply resurrect it from cvs history.
* Match up SYSCTL declaration style.andre2007-03-191-14/+15
|
* Consolidate insertion of TCP options into a segment from within tcp_output()andre2007-03-151-2/+2
| | | | | | | | | | | | | | and syncache_respond() into its own generic function tcp_addoptions(). tcp_addoptions() is alignment agnostic and does optimal packing in all cases. In struct tcpopt rename to_requested_s_scale to just to_wscale. Add a comment with quote from RFC1323: "The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself is never scaled." Reviewed by: silby, mohans, julian Sponsored by: TCP/IP Optimization Fundraise 2005
* This patch is provided to fix a couple of deployment issues observedqingli2007-03-071-5/+7
| | | | | | | | | | | | | | | | | | in the field. In one situation, one end of the TCP connection sends a back-to-back RST packet, with delayed ack, the last_ack_sent variable has not been update yet. When tcp_insecure_rst is turned off, the code treats the RST as invalid because last_ack_sent instead of rcv_nxt is compared against th_seq. Apparently there is some kind of firewall that sits in between the two ends and that RST packet is the only RST packet received. With short lived HTTP connections, the symptom is a large accumulation of connections over a short period of time . The +/-(1) factor is to take care of implementations out there that generate RST packets with these types of sequence numbers. This behavior has also been observed in live environments. Reviewed by: silby, Mike Karels MFC after: 1 week
OpenPOWER on IntegriCloud