path: root/sys/netinet/tcp_reass.c
Commit message | Author | Age | Files | Lines
* Add FBSDID to all files in netinet so that people can more | silby | 2007-10-07 | 1 | -1/+3
  easily include file version information in bug reports.
  Approved by: re (kensmith)
* Complete the (mechanical) move of the TCP reassembly and timewait | andre | 2007-05-13 | 1 | -31/+2
  functions from their original place to their own files.
  TCP Reassembly from tcp_input.c -> tcp_reass.c
  TCP Timewait from tcp_subr.c -> tcp_timewait.c
* Drop everything that doesn't belong into this new file. | andre | 2007-05-11 | 1 | -2980/+0
  It's neither functional nor connected to the build yet.
* Move universally to ANSI C function declarations, with relatively | rwatson | 2007-05-10 | 1 | -1/+2
  consistent style(9)-ish layout.
* o Fix style(9) bugs introduced in the last commit. | maxim | 2007-05-09 | 1 | -3/+3
  Pointed out by: bde
* o Unbreak "options TCPDEBUG" && "nooptions INET6" kernel build. | maxim | 2007-05-09 | 1 | -0/+2
  PR: kern/112517
  Submitted by: vd
* Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of | andre | 2007-05-06 | 1 | -22/+22
  a dedicated sack_enable int for this bool. Change all users accordingly.
* o Remove redundant tcp reassembly check in header prediction code | andre | 2007-05-06 | 1 | -19/+9
  o Rearrange code to make intent in TCPS_SYN_SENT case more clear
  o Assorted style cleanup
  o Comment clarification for tcp_dropwithreset()
* Reorder the TCP header prediction test to check for the most volatile | andre | 2007-05-06 | 1 | -4/+6
  values first to spend less time on a fallback to normal processing.
* Remove the defunct remains of the TCPS_TIME_WAIT cases from tcp_do_segment | andre | 2007-05-06 | 1 | -65/+17
  and change it to a void function. We use a compressed structure for TCPS_TIME_WAIT to save memory. Any late segments arriving for such a connection are handled directly in the TW code.
* Tweak comment at end of tcp_input() when calling into tcp_do_segment(): the | rwatson | 2007-05-04 | 1 | -3/+3
  pcbinfo lock will be released as well, not just the pcb lock.
* o Fix INP lock leak in the minttl case | andre | 2007-04-23 | 1 | -5/+6
  o Remove indirection in the decision of unlocking inp
  o Further annotation of locking in tcp_input()
* o Remove unnecessary TOF_SIGLEN flag from struct tcpopt | andre | 2007-04-20 | 1 | -1/+2
  o Correctly set to->to_signature in tcp_dooptions()
  o Update comments
* Add more KASSERT's. | andre | 2007-04-20 | 1 | -0/+4
* Remove bogus check for accept queue length and associated failure handling | andre | 2007-04-20 | 1 | -16/+10
  from the incoming SYN handling section of tcp_input(). Enforcement of the accept queue limits is done by sonewconn() after the 3WHS is completed. It is not necessary to have an earlier check before a connection request enters the SYN cache awaiting the full handshake. It rather limits the effectiveness of the syncache by preventing legit and illegit connections from entering it and having them shaken out before we hit the real limit which may have vanished by then.
  Change return value of syncache_add() to void. No status communication is required.
* Simplify syncache_expand() and clarify its semantics. Zero is returned | andre | 2007-04-20 | 1 | -8/+8
  when the ACK is invalid and doesn't belong to any registered connection, either in syncache or through SYN cookies. True but a NULL struct socket is returned when the 3WHS completed but the socket could not be created due to insufficient resources or limits reached. For both cases an RST is sent back in tcp_input().
  A logic error leading to a panic is fixed where syncache_expand() would free the mbuf on socket allocation failure but tcp_input() later supplies it to tcp_dropwithreset() to issue a RST to the peer.
  Reported by: kris (the panic)
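The return convention described in the entry above can be sketched from the caller's side. This is only an illustration: `struct socket_stub`, `enum ack_action`, and `classify_expand_result()` are hypothetical names, not kernel interfaces, and the real code paths in tcp_input() are more involved.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct socket. */
struct socket_stub { int unused; };

/* Hypothetical outcome codes for the caller. */
enum ack_action { ACK_SEND_RST, ACK_CONTINUE };

/*
 * Decide what the caller does with the result of an expand step:
 *   rv == 0              ACK matched no syncache entry or SYN cookie
 *   rv != 0, so == NULL  handshake completed, socket allocation failed
 *   rv != 0, so != NULL  handshake completed, new socket is ready
 * Both failure cases end in an RST, so the segment must still be
 * available to the caller at that point.
 */
static enum ack_action
classify_expand_result(int rv, const struct socket_stub *so)
{
	if (rv == 0 || so == NULL)
		return ACK_SEND_RST;
	return ACK_CONTINUE;
}
```

Keeping both failure cases on one path mirrors the point of the fix: the function that detects the failure does not free the buffer that the caller still needs for the reset.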
* Remove unused variable tcbinfo_mtx. | rwatson | 2007-04-15 | 1 | -1/+0
* Change the TCP timer system from using the callout system five times | andre | 2007-04-11 | 1 | -30/+22
  directly to a merged model where only one callout, the next to fire, is registered. Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout.
  The single new callout is a mutex callout on inpcb simplifying the locking a bit.
  tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions.
  Reviewed by: rwatson (earlier version)
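The "only one callout, the next to fire" idea in the entry above might look roughly like the following. This is a minimal sketch, not the kernel's data layout: `struct timer_set`, `NTIMERS`, and `next_timer_to_fire()` are hypothetical, a deadline of 0 is assumed to mean "not armed", and every armed deadline is assumed to lie at or after `now`.

```c
#include <stddef.h>

#define NTIMERS 5	/* the entry mentions five callouts being merged */

/* Hypothetical per-connection timer state: 0 means "not armed". */
struct timer_set {
	unsigned int deadline[NTIMERS];	/* absolute tick at which each timer fires */
};

/*
 * Return the index of the armed timer with the earliest deadline
 * relative to 'now', or -1 if none is armed. In a merged model only
 * that timer needs a registered callout.
 */
static int
next_timer_to_fire(const struct timer_set *ts, unsigned int now)
{
	int best = -1;
	unsigned int best_delta = 0;

	for (int i = 0; i < NTIMERS; i++) {
		if (ts->deadline[i] == 0)
			continue;
		/* Unsigned subtraction gives the distance even across a tick wrap. */
		unsigned int delta = ts->deadline[i] - now;
		if (best == -1 || delta < best_delta) {
			best = i;
			best_delta = delta;
		}
	}
	return best;
}
```

Whenever a timer is armed or stopped, the selection is simply recomputed and the single callout is rescheduled for the winner.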
* Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add some | andre | 2007-04-04 | 1 | -0/+3
  further INP_INFO_WLOCK_ASSERT() while there.
* Move last tcpcb initialization for the inbound connection case from | andre | 2007-04-04 | 1 | -10/+2
  tcp_input() to syncache_socket() where it belongs and the majority of it already happens. The "tp->snd_up = tp->snd_una" is removed as it is done with the tcp_sendseqinit() macro a few lines earlier.
* Retire unused TCP_SACK_DEBUG. | andre | 2007-04-04 | 1 | -1/+0
* In tcp_dooptions() skip over SACK options if it is a SYN segment. | andre | 2007-04-04 | 1 | -0/+2
* When blackholing do a 'dropunlock' in the new world order to prevent the | andre | 2007-03-28 | 1 | -1/+1
  INP_INFO_LOCK from leaking.
  Reported by: ache
  Found by: rwatson
* o Use a define for a buffer size. | maxim | 2007-03-24 | 1 | -1/+10
    Prodded by: db
  o Add missed vars for TCPDEBUG in tcp_do_segment().
    Prodded by: tinderbox
* Split tcp_input() into its two functional parts: | andre | 2007-03-23 | 1 | -132/+208
  o tcp_input() now handles TCP segment sanity checks and preparations including the INPCB lookup and syncache.
  o tcp_do_segment() handles all data and ACK processing and is IPv4/v6 agnostic.
  Change all KASSERT() messages to ("%s: ", __func__).
  The changes in this commit are primarily of mechanical nature and no functional changes besides the function split are made.
  Discussed with: rwatson
* Tidy up some code to conform better to surroundings and style(9), 0 = NULL | andre | 2007-03-23 | 1 | -17/+16
  and space/tab.
* Bring SACK option handling in tcp_dooptions() in line with all other | andre | 2007-03-23 | 1 | -4/+7
  options and adjust users accordingly.
* ANSIfy function declarations and remove register keywords for variables. | andre | 2007-03-21 | 1 | -50/+24
  Consistently apply style to all function declarations.
* Tidy up IPFIREWALL_FORWARD sections and comments. | andre | 2007-03-21 | 1 | -4/+3
* Update and clarify comments in first section of tcp_input(). | andre | 2007-03-21 | 1 | -15/+13
* Tidy up the ACCEPTCONN section of tcp_input(), adjust comments and remove | andre | 2007-03-21 | 1 | -57/+27
  old dead T/TCP code.
* Tidy up tcp_log_in_vain and blackhole. | andre | 2007-03-21 | 1 | -44/+31
* Make TCP_DROP_SYNFIN a standard part of TCP. Disabled by default it | andre | 2007-03-21 | 1 | -5/+0
  doesn't impede normal operation negatively and is only a few lines of code. Its close relatives blackhole and log_in_vain aren't options either.
* Remove tcp_minmssoverload DoS detection logic. The problem it tried to | andre | 2007-03-21 | 1 | -59/+0
  protect us from wasn't really there and it only bloats the code. Should the problem surface in the future we can simply resurrect it from cvs history.
* Match up SYSCTL declaration style. | andre | 2007-03-19 | 1 | -14/+15
* Consolidate insertion of TCP options into a segment from within tcp_output() | andre | 2007-03-15 | 1 | -2/+2
  and syncache_respond() into its own generic function tcp_addoptions().
  tcp_addoptions() is alignment agnostic and does optimal packing in all cases.
  In struct tcpopt rename to_requested_s_scale to just to_wscale.
  Add a comment with quote from RFC1323: "The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself is never scaled."
  Reviewed by: silby, mohans, julian
  Sponsored by: TCP/IP Optimization Fundraise 2005
* This patch is provided to fix a couple of deployment issues observed | qingli | 2007-03-07 | 1 | -5/+7
  in the field. In one situation, one end of the TCP connection sends a back-to-back RST packet; with delayed ACK, the last_ack_sent variable has not been updated yet. When tcp_insecure_rst is turned off, the code treats the RST as invalid because last_ack_sent instead of rcv_nxt is compared against th_seq. Apparently there is some kind of firewall that sits in between the two ends and that RST packet is the only RST packet received. With short lived HTTP connections, the symptom is a large accumulation of connections over a short period of time.
  The +/-(1) factor is to take care of implementations out there that generate RST packets with these types of sequence numbers. This behavior has also been observed in live environments.
  Reviewed by: silby, Mike Karels
  MFC after: 1 week
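The sequence-number comparison discussed in the entry above can be illustrated with a small check. This is a sketch under assumptions, not the committed code: the helper names are hypothetical, and accepting a match against either `rcv_nxt` or `last_ack_sent` is one plausible reading of the description.

```c
#include <stdint.h>
#include <stdbool.h>

/* Wrap-safe comparisons on 32-bit TCP sequence numbers. */
static bool seq_geq(uint32_t a, uint32_t b) { return (int32_t)(a - b) >= 0; }
static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/* True if seq lies within +/-1 of ref, including across a wrap. */
static bool
within_one(uint32_t seq, uint32_t ref)
{
	return seq_geq(seq, ref - 1) && seq_leq(seq, ref + 1);
}

/*
 * Strict RST check (the tcp_insecure_rst-off case): accept the RST when
 * its sequence number is within +/-1 of rcv_nxt or of last_ack_sent, so
 * that a RST arriving while an ACK is still delayed (last_ack_sent
 * lagging behind rcv_nxt) is not rejected.
 */
static bool
rst_seq_acceptable(uint32_t th_seq, uint32_t rcv_nxt, uint32_t last_ack_sent)
{
	return within_one(th_seq, rcv_nxt) || within_one(th_seq, last_ack_sent);
}
```

For example, with `rcv_nxt = 1000` and `last_ack_sent = 900`, a RST carrying sequence number 1000 passes, while one carrying 950 does not.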
* In the SYN_SENT case, Initialize the snd_wnd before the call to tcp_mss(). | mohans | 2007-02-28 | 1 | -3/+2
  The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.
* Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigates | mohans | 2007-02-26 | 1 | -1/+5
  potential issues where the peer does not close, potentially leaving thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl fast_finwait2_recycle, which is disabled by default.
  Reviewed by: gnn, silby.
* Rename two identically named log_in_vain variables: tcp_input.c's static | rwatson | 2007-02-20 | 1 | -4/+4
  log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to udp_log_in_vain.
  MFC after: 1 week
* Auto sizing TCP socket buffers. | andre | 2007-02-01 | 1 | -3/+81
  Normally the socket buffers are static (either derived from global defaults or set with setsockopt) and do not adapt to real network conditions. Two things happen: a) your socket buffers are too small and you can't reach the full potential of the network between both hosts; b) your socket buffers are too big and you waste a lot of kernel memory for data just sitting around.
  With automatic TCP send and receive socket buffers we can start with a small buffer and quickly grow it in parallel with the TCP congestion window to match real network conditions.
  FreeBSD has a default 32K send socket buffer. This supports a maximal transfer rate of only slightly more than 2Mbit/s on a 100ms RTT trans-continental link. Or at 200ms just above 1Mbit/s. With TCP send buffer auto scaling and the default values below it supports 20Mbit/s at 100ms and 10Mbit/s at 200ms. That's an improvement of factor 10, or 1000%. For the receive side it looks slightly better with a default of 64K buffer size.
  New sysctls are:
    net.inet.tcp.sendbuf_auto=1 (enabled)
    net.inet.tcp.sendbuf_inc=8192 (8K, step size)
    net.inet.tcp.sendbuf_max=262144 (256K, growth limit)
    net.inet.tcp.recvbuf_auto=1 (enabled)
    net.inet.tcp.recvbuf_inc=16384 (16K, step size)
    net.inet.tcp.recvbuf_max=262144 (256K, growth limit)
  Tested by: many (on HEAD and RELENG_6)
  Approved by: re
  MFC after: 1 month
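The transfer-rate figures in the entry above follow from the rule that at most one buffer's worth of data can be in flight per round trip. The sketch below reproduces that arithmetic, assuming "32K" and "256K" mean 32768 and 262144 bytes. `max_throughput_bps()` and `grow_buffer()` are hypothetical helpers; the growth step only illustrates the "step size" and "growth limit" sysctls and says nothing about when the kernel decides to grow a buffer.

```c
#include <stdint.h>

/*
 * Upper bound on throughput, in bits per second, when at most
 * bufsize_bytes can be in flight per round trip of rtt_ms milliseconds.
 */
static uint64_t
max_throughput_bps(uint64_t bufsize_bytes, uint64_t rtt_ms)
{
	return bufsize_bytes * 8 * 1000 / rtt_ms;
}

/*
 * One hypothetical growth step: add 'inc' bytes without exceeding 'max'.
 * A size already at or above 'max' is returned unchanged.
 */
static uint64_t
grow_buffer(uint64_t cur, uint64_t inc, uint64_t max)
{
	if (cur >= max)
		return cur;
	return (max - cur < inc) ? max : cur + inc;
}
```

With these definitions, `max_throughput_bps(32768, 100)` is 2621440 (about 2.6 Mbit/s) and `max_throughput_bps(262144, 100)` is 20971520 (about 21 Mbit/s), in line with the rounded values quoted in the commit message.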
* MFp4: 92972, 98913 + one more change | bz | 2006-12-12 | 1 | -3/+7
  In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument.
* Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h | rwatson | 2006-10-22 | 1 | -1/+2
  begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.
  This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.
  Obtained from: TrustedBSD Project
  Sponsored by: SPARTA
* fix calculating to_tsecr... This prevents the rtt calculations from | jmg | 2006-09-26 | 1 | -1/+1
  going all wonky...
* Always set the IP version in the TCP input path, to preserve | bms | 2006-09-23 | 1 | -2/+0
  the header field for possible later IPSEC SPD lookup, even when the kernel is built without 'options INET6'.
  PR: kern/57760
  MFC after: 1 week
  Submitted by: Joachim Schueth
* Rewrite of TCP syncookies to remove locking requirements and to enhance | andre | 2006-09-13 | 1 | -6/+16
  functionality:
  - Remove a rwlock acquisition/release per generated syncookie. Locking is now integrated with the bucket row locking of syncache itself and syncookies no longer add any additional lock overhead.
  - Syncookie secrets are different for and stored per syncache bucket row. Secrets expire after 16 seconds and are reseeded on-demand.
  - The computational overhead for syncookie generation and verification is one MD5 hash computation as before.
  - Syncache can be turned off and run with syncookies only by setting the sysctl net.inet.tcp.syncookies_only=1.
  This implementation extends the original idea and first implementation of FreeBSD by using not only the initial sequence number field to store information but also the timestamp field if present. This way we can keep track of the entire state we need to know to recreate the session in its original form. Almost all TCP speakers implement RFC1323 timestamps these days. For those that do not we still have to live with the known shortcomings of the ISN only SYN cookies. The use of the timestamp field causes the timestamps to be randomized if syncookies are enabled.
  The idea of SYN cookies is to encode and include all necessary information about the connection setup state within the SYN-ACK we send back and thus to get along without keeping any local state until the ACK to the SYN-ACK arrives (if ever). Everything we need to know should be available from the information we encoded in the SYN-ACK.
  A detailed description of the inner working of the syncookies mechanism is included in the comments in tcp_syncache.c.
  Reviewed by: silby (slightly earlier version)
  Sponsored by: TCP/IP Optimization Fundraise 2005
* Back when we had T/TCP support, we used to apply different | ru | 2006-09-07 | 1 | -2/+2
  timeouts for TCP and T/TCP connections in the TIME_WAIT state, and we had two separate timed wait queues for them. Now that it has gone, the timeout is always 2*MSL again, and there is no reason to keep two queues (the first was unused anyway!).
  Also, reimplement the remaining queue using a TAILQ (it was technically impossible before, with two queues).
* First step of TSO (TCP segmentation offload) support in our network stack. | andre | 2006-09-06 | 1 | -4/+9
  o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6
  o add CSUM_TSO flag to mbuf pkthdr csum_flags field
  o add tso_segsz field to mbuf pkthdr
  o enhance ip_output() packet length check to allow for large TSO packets
  o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities
  o adjust all callers of tcp_maxmtu[46]() accordingly
  Discussed on: -current, -net
  Sponsored by: TCP/IP Optimization Fundraise 2005
* Fixes an edge case bug in timewait handling where ticks rolling over causing | mohans | 2006-08-11 | 1 | -1/+1
  the timewait expiry to be exactly 0 corrupts the timewait queues (and that entry).
  Reviewed by: silby
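The edge case in the entry above arises when the tick counter plus the timeout wraps around to exactly 0. The log does not say how the fix was implemented; the following is one possible guard, assuming that an expiry value of 0 is reserved to mean "no expiry set". `timewait_expiry()` is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Compute an absolute expiry tick. Unsigned arithmetic keeps the
 * wraparound well defined. A result of exactly 0 is bumped to 1, on the
 * assumption that 0 is reserved as a "not set" marker.
 */
static uint32_t
timewait_expiry(uint32_t ticks, uint32_t timeout)
{
	uint32_t expiry = ticks + timeout;

	if (expiry == 0)
		expiry = 1;
	return expiry;
}
```

The cost of the guard is an expiry that is one tick late in a case that occurs once per counter wrap.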
* Use INPLOOKUP_WILDCARD instead of just 1 more consistently. | bz | 2006-06-29 | 1 | -3/+6
  OKed by: rwatson (some weeks ago)