path: root/sys/netinet/tcp_reass.c
Commit message | Author | Age | Files | Lines
* Oops, I forgot this file in a prior commit (the change was still sitting here, uncommitted): rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work on mbufs and tags.
  Author: darrenr | Age: 2004-05-02 | Files: 1 | Lines: -1/+1
* Tighten up reset handling in order to make reset attacks as difficult as possible while maintaining compatibility with the widest range of TCP stacks.
  The algorithm is as follows:
  ---
  For connections in the ESTABLISHED state, only resets with sequence numbers exactly matching last_ack_sent will cause a reset; all other segments will be silently dropped.
  For connections in all other states, a reset anywhere in the window will cause the connection to be reset. All other segments will be silently dropped.
  ---
  The necessity of accepting all in-window resets was discovered by jayanth and jlemon, both of whom have seen TCP stacks that will respond to FIN-ACK packets with resets not meeting the strict last_ack_sent check.
  Idea by: Darren Reed
  Reviewed by: truckman, jlemon, others(?)
  Author: silby | Age: 2004-04-26 | Files: 1 | Lines: -0/+10
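The two rules above can be sketched roughly as follows. This is an illustrative userland model, not the kernel code: the names `classify_rst`, `in_window` and `RstAction` are hypothetical, and anchoring the receive window at last_ack_sent with a width of rcv_wnd is an assumption made for the sketch.

```python
from enum import Enum

SEQ_MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


class RstAction(Enum):
    RESET = "reset"
    DROP = "drop"


def in_window(seq, window_start, window_size):
    """True if seq lies in [window_start, window_start + window_size), modulo 2**32."""
    return (seq - window_start) % SEQ_MOD < window_size


def classify_rst(established, seg_seq, last_ack_sent, rcv_wnd):
    """Decide what to do with an incoming RST segment (hypothetical helper)."""
    if established:
        # ESTABLISHED: only an exact match on last_ack_sent resets the connection.
        return RstAction.RESET if seg_seq == last_ack_sent else RstAction.DROP
    # Any other state: a reset anywhere in the (assumed) window is honoured.
    if in_window(seg_seq, last_ack_sent, rcv_wnd):
        return RstAction.RESET
    return RstAction.DROP
```

In this model an off-by-one RST is dropped on an ESTABLISHED connection but still accepted in other states, which is the compatibility trade-off the commit message describes.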
* Correct an edge case in tcp_mss() where the cached path MTU from tcp_hostcache would have overridden a (now) lower MTU of an interface or route that changed since first PMTU discovery. The bug would have caused TCP to redo the PMTU discovery when not strictly necessary.
  Make a comment about already pre-initialized default values more clear.
  Reviewed by: sam
  Author: andre | Age: 2004-04-23 | Files: 1 | Lines: -2/+2
* Remove advertising clause from University of California Regents' license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson.
  Approved by: core, peter, alc, rwatson
  Author: imp | Age: 2004-04-07 | Files: 1 | Lines: -4/+0
* fix -O0 compilation without INET6.
  Pointed out by: ru
  Author: ume | Age: 2004-03-01 | Files: 1 | Lines: -2/+12
* Remove now unneeded arguments to tcp_twrespond() -- so and msrc. These were needed by the MAC Framework until inpcbs gained labels.
  Submitted by: sam
  Author: rwatson | Age: 2004-02-28 | Files: 1 | Lines: -1/+1
* Re-remove MT_TAGs. The problems with dummynet have been fixed now.
  Tested by: -current, bms(mentor), me
  Approved by: bms(mentor), sam
  Author: mlaier | Age: 2004-02-25 | Files: 1 | Lines: -5/+2
* Relax a KASSERT condition to allow for a valid corner case where the FIN on the last segment consumes an extra sequence number.
  Spurious panic reported by Mike Silbersack <silby@silby.com>.
  Author: hsu | Age: 2004-02-25 | Files: 1 | Lines: -2/+5
* Convert the tcp segment reassembly queue to UMA and limit the maximum number of segments it will hold.
  The following tunables and sysctls control the behaviour of the tcp segment reassembly queue:
    net.inet.tcp.reass.maxsegments (loader tunable) specifies the maximum number of segments all tcp reassembly queues can hold (defaults to 1/16 of nmbclusters).
    net.inet.tcp.reass.maxqlen specifies the maximum number of segments any individual tcp session queue can hold (defaults to 48).
    net.inet.tcp.reass.cursegments (readonly) counts the number of segments currently in all reassembly queues.
    net.inet.tcp.reass.overflows (readonly) counts how often either the global or local queue limit has been reached.
  Tested by: bms, silby
  Reviewed by: bms, silby
  Author: andre | Age: 2004-02-24 | Files: 1 | Lines: -9/+77
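How the four knobs above interact can be modelled with a small sketch. This is only a toy model under stated assumptions: the class `ReassemblyLimits` and its methods are hypothetical names, queues are plain Python lists, and the real code may treat some segments specially, which is ignored here.

```python
class ReassemblyLimits:
    """Toy model of the global and per-connection reassembly limits."""

    def __init__(self, nmbclusters, maxqlen=48):
        self.maxsegments = nmbclusters // 16  # global cap: 1/16 of nmbclusters
        self.maxqlen = maxqlen                # per-connection cap (default 48)
        self.cursegments = 0                  # segments held in all queues
        self.overflows = 0                    # times either limit was hit

    def enqueue(self, queue, segment):
        """Queue a segment unless the global or the local limit is reached."""
        if self.cursegments >= self.maxsegments or len(queue) >= self.maxqlen:
            self.overflows += 1
            return False
        queue.append(segment)
        self.cursegments += 1
        return True

    def dequeue(self, queue):
        """Remove the oldest segment from a queue and update the global count."""
        segment = queue.pop(0)
        self.cursegments -= 1
        return segment
```

With nmbclusters=64 and maxqlen=3, for example, one connection can hold at most three segments while all connections together can hold at most four.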
* Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is not working properly with the patch in place.
  Approved by: bms(mentor)
  Author: mlaier | Age: 2004-02-18 | Files: 1 | Lines: -2/+6
* IPSEC and FAST_IPSEC have the same internal API now; so merge these (IPSEC has an extra ipsecstat)
  Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
  Author: ume | Age: 2004-02-17 | Files: 1 | Lines: -16/+8
* This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing them mostly with packet tags (one case is handled by using an mbuf flag since the linkage between "caller" and "callee" is direct and there's no need to incur the overhead of a packet tag).
  This is (mostly) work from: sam
  Silence from: -arch
  Approved by: bms(mentor), sam, rwatson
  Author: mlaier | Age: 2004-02-13 | Files: 1 | Lines: -6/+2
* Brucification.
  Submitted by: bde
  Author: bms | Age: 2004-02-13 | Files: 1 | Lines: -1/+1
* Remove an unnecessary initialization that crept in from the code which verifies TCP-MD5 digests.
  Noticed by: njl
  Author: bms | Age: 2004-02-12 | Files: 1 | Lines: -2/+1
* Initial import of RFC 2385 (TCP-MD5) digest support.
  This is the first of two commits; bringing in the kernel support first. This can be enabled by compiling a kernel with options TCP_SIGNATURE and FAST_IPSEC.
  For the uninitiated, this is a TCP option which provides for a means of authenticating TCP sessions which came into being before IPSEC. It is still relevant today, however, as it is used by many commercial router vendors, particularly with BGP, and as such has become a requirement for interconnect at many major Internet points of presence.
  Several parts of the TCP and IP headers, including the segment payload, are digested with MD5, including a shared secret. The PF_KEY interface is used to manage the secrets using security associations in the SADB.
  There is a limitation here in that there is no way to map a TCP flow per-port back to an SPI without polluting tcpcb or using the SPD; the code to do the latter is unstable at this time. Therefore this code only supports per-host keying granularity.
  Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6), TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective users of this feature, this will not pose any problem.
  This implementation is output-only; that is, the option is honoured when responding to a host initiating a TCP session, but no effort is made [yet] to authenticate inbound traffic. This is, however, sufficient to interwork with Cisco equipment.
  Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with local patches. Patches for tcpdump to validate TCP-MD5 sessions are also available from me upon request.
  Sponsored by: sentex.net
  Author: bms | Age: 2004-02-11 | Files: 1 | Lines: -1/+16
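As a rough illustration of what gets digested, the sketch below follows the input order given in RFC 2385 for IPv4: pseudo-header, fixed TCP header with the checksum treated as zero and options excluded, segment data, then the shared key. The function name `tcp_md5_digest` and its argument layout are hypothetical; this is a userland model and not the code added by the commit.

```python
import hashlib
import socket
import struct

IPPROTO_TCP = 6


def tcp_md5_digest(src_ip, dst_ip, tcp_segment, key):
    """Return the 16-byte RFC 2385 digest for an IPv4 TCP segment.

    tcp_segment is the full TCP header (with options) followed by the data.
    """
    if len(tcp_segment) < 20:
        raise ValueError("TCP segment shorter than the fixed header")
    data_offset = (tcp_segment[12] >> 4) * 4  # header length in bytes
    md5 = hashlib.md5()
    # 1. Pseudo-header: source, destination, zero-padded protocol, segment length.
    md5.update(socket.inet_aton(src_ip))
    md5.update(socket.inet_aton(dst_ip))
    md5.update(struct.pack("!BBH", 0, IPPROTO_TCP, len(tcp_segment)))
    # 2. Fixed 20-byte TCP header, options excluded, checksum assumed zero.
    md5.update(tcp_segment[:16] + b"\x00\x00" + tcp_segment[18:20])
    # 3. Segment data, if any.
    md5.update(tcp_segment[data_offset:])
    # 4. The shared secret known to both peers.
    md5.update(key)
    return md5.digest()
```

Because the checksum field is zeroed before hashing, the digest does not depend on the checksum value the sender later writes into the header.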
* Pass pcb rather than so. It is expected that per socket policy works again.
  Author: ume | Age: 2004-02-03 | Files: 1 | Lines: -2/+2
* Merge from DragonFlyBSD rev 1.10:
    date: 2003/09/02 10:04:47; author: hsu; state: Exp; lines: +5 -6
    Account for when Limited Transmit is not congestion window limited.
  Obtained from: DragonFlyBSD
  Author: hsu | Age: 2004-01-20 | Files: 1 | Lines: -6/+5
* Limiters and sanity checks for TCP MSS (maximum segment size) resource exhaustion attacks.
  For network link optimization TCP can adjust its MSS and thus packet size according to the observed path MTU. This is done dynamically based on feedback from the remote host and network components along the packet path. This information can be abused to pretend an extremely low path MTU.
  The resource exhaustion works in two ways:
  o during tcp connection setup the advertised local MSS is exchanged between the endpoints. The remote endpoint can set this arbitrarily low (except for a minimum MTU of 64 octets enforced in the BSD code). When the local host is sending data it is forced to send many small IP packets instead of a large one. For example instead of the normal TCP payload size of 1448 it forces TCP payload size of 12 (MTU 64) and thus we have a 120 times increase in workload and packets. On fast links this quickly saturates the local CPU and may also hit pps processing limits of network components along the path. This type of attack is particularly effective for servers where the attacker can download large files (WWW and FTP). We mitigate it by enforcing a minimum MTU settable by sysctl net.inet.tcp.minmss defaulting to 256 octets.
  o the local host is receiving data on a TCP connection from the remote host. The local host has no control over the packet size the remote host is sending. The remote host may choose to do what is described in the first attack and send the data in packets with a TCP payload of at least one byte. For each packet the tcp_input() function will be entered, the packet is processed and a sowakeup() is signalled to the connected process. For example an attack with 2 Mbit/s gives 4716 packets per second and the same amount of sowakeup()s to the process (and context switches). This type of attack is particularly effective for servers where the attacker can upload large amounts of data. Normally this is the case with WWW servers where large POSTs can be made. We mitigate this by calculating the average MSS payload per second. If it goes below 'net.inet.tcp.minmss' and the pps rate is above 'net.inet.tcp.minmssoverload' defaulting to 1000 this particular TCP connection is reset and dropped.
  MITRE CVE: CAN-2004-0002
  Reviewed by: sam (mentor)
  MFC after: 1 day
  Author: andre | Age: 2004-01-08 | Files: 1 | Lines: -0/+60
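The figures quoted above (payload 1448 versus 12, a factor of about 120, and 4716 packets per second at 2 Mbit/s) can be reproduced with a little arithmetic. The sketch assumes 20-byte IP and TCP headers plus 12 bytes of TCP options per packet; that header budget is an assumption chosen because it matches the quoted numbers. The overload check `is_minmss_overload` is a hypothetical model of the second mitigation over a one-second interval, not the kernel implementation.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, fixed TCP header
TCP_OPTIONS = 12  # bytes, assumed per-packet TCP options


def payload_for_mtu(mtu):
    """TCP payload bytes that fit into one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER - TCP_OPTIONS


def packets_per_second(bits_per_second, payload):
    """Whole packets per second an attacker can send at the given bit rate."""
    packet_bytes = IP_HEADER + TCP_HEADER + TCP_OPTIONS + payload
    return (bits_per_second // 8) // packet_bytes


def is_minmss_overload(bytes_received, packets_received,
                       minmss=256, minmssoverload=1000):
    """True if one second of traffic looks like a small-segment flood.

    Both conditions from the commit message must hold: the packet rate is
    above minmssoverload and the average payload is below minmss.
    """
    if packets_received <= minmssoverload:
        return False
    return bytes_received / packets_received < minmss
```

For instance, `payload_for_mtu(1500)` gives 1448 and `payload_for_mtu(64)` gives 12, so the same amount of data needs roughly 120 times as many packets.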
* Enable the following TCP options by default to give them more exposure:
    rfc3042   Limited Transmit
    rfc3390   Increasing TCP's initial congestion window
    inflight  TCP inflight bandwidth limiting
  All my production servers have them enabled and there have been no issues. I am confident about having them on by default and it gives us better overall TCP performance.
  Reviewed by: sam (mentor)
  Author: andre | Age: 2004-01-06 | Files: 1 | Lines: -2/+2
* Restructure a too broad ifdef which was disabling the setting of the tcp flightsize sysctl value for local networks in the !INET6 case.
  Approved by: re (scottl)
  Author: andre | Age: 2003-11-25 | Files: 1 | Lines: -2/+4
* Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache.
  It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve.
  tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address.
  It removes significant locking requirements from the tcp stack with regard to the routing table.
  Reviewed by: sam (mentor), bms
  Reviewed by: -net, -current, core@kame.net (IPv6 parts)
  Approved by: re (scottl)
  Author: andre | Age: 2003-11-20 | Files: 1 | Lines: -144/+200
* Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer.
  This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check.
  For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel(), to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update.
  Biba, LOMAC, and MLS implement these entry points, as do the stub policy and test policy.
  Reviewed by: sam, bms
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  Author: rwatson | Age: 2003-11-18 | Files: 1 | Lines: -2/+2
* dropwithreset is not needed in this case as tcp_drop() is already notifying the other side. Before we were sending two RST packets.
  Author: andre | Age: 2003-11-12 | Files: 1 | Lines: -1/+1
* o correct locking problem: the inpcb must be held across tcp_respond
  o add assertions in tcp_respond to validate inpcb locking assumptions
  o use local variable instead of chasing pointers in tcp_respond
  Supported by: FreeBSD Foundation
  Author: sam | Age: 2003-11-08 | Files: 1 | Lines: -3/+3
* speedup stream socket recv handling by tracking the tail of the mbuf chain instead of walking the list for each append
  Submitted by: ps/jayanth
  Obtained from: netbsd (jason thorpe)
  Author: sam | Age: 2003-10-28 | Files: 1 | Lines: -3/+3
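The idea of tracking the tail can be shown with a generic singly linked chain. This is a minimal sketch only: `Mbuf` and `SockBuf` here are hypothetical stand-ins and do not mirror the kernel structures or their field names.

```python
class Mbuf:
    """Hypothetical stand-in for one buffer in a chain."""

    def __init__(self, data):
        self.data = data
        self.next = None


class SockBuf:
    """Chain that remembers its last element so appends do not walk the list."""

    def __init__(self):
        self.head = None
        self.tail = None  # cached pointer to the last buffer

    def append(self, m):
        """Append in constant time by linking after the cached tail."""
        if self.tail is None:
            self.head = m
        else:
            self.tail.next = m
        self.tail = m

    def contents(self):
        """Walk the chain once and return the stored data in order."""
        out = []
        m = self.head
        while m is not None:
            out.append(m.data)
            m = m.next
        return out
```

Without the cached tail every append would have to follow `next` pointers from the head, which makes receiving n segments cost on the order of n squared pointer hops.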
* enclose IPv6 part with ifdef INET6.
  Obtained from: KAME
  Author: ume | Age: 2003-10-20 | Files: 1 | Lines: -2/+3
* correct linkmtu handling.
  Obtained from: KAME
  Author: ume | Age: 2003-10-20 | Files: 1 | Lines: -2/+11
* - add dom_if{attach,detach} framework.
  - transition to use ifp->if_afdata.
  Obtained from: KAME
  Author: ume | Age: 2003-10-17 | Files: 1 | Lines: -2/+1
* A number of patches in the last years have created new return paths in tcp_input that leave the function before hitting the tcp_trace function call for the TCPDEBUG option. This has made TCPDEBUG mostly useless (and tools like ports/benchmarks/dbs not working). Add tcp_trace calls to the return paths that could be identified in this maze.
  This is a NOP unless you compile with TCPDEBUG.
  Author: harti | Age: 2003-08-13 | Files: 1 | Lines: -0/+21
* Unify the "send high" and "recover" variables as specified in the latest rev of the spec. Use an explicit flag for Fast Recovery. [1]
  Fix bug with exiting Fast Recovery on a retransmit timeout diagnosed by Lu Guohan. [2]
  Reviewed by: Thomas Henderson <thomas.r.henderson@boeing.com>
  Reported and tested by: Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2]
  Approved by: Thomas Henderson <thomas.r.henderson@boeing.com>, Sally Floyd <floyd@acm.org> [1]
  Author: hsu | Age: 2003-07-15 | Files: 1 | Lines: -19/+24
* Add /* FALLTHROUGH */
  Found by: FlexeLint
  Author: phk | Age: 2003-05-31 | Files: 1 | Lines: -0/+1
* Correct a bug introduced with reduced TCP state handling; make sure that the MAC label on TCP responses during TIMEWAIT is properly set from either the socket (if available), or the mbuf that it's responding to.
  Unfortunately, this is made somewhat difficult by the TCP code, as tcp_twstart() calls tcp_twrespond() after discarding the socket but without a reference to the mbuf that causes the "response". Passing both the socket and the mbuf works around this--eventually it might be good to make sure the mbuf always gets passed in in "response" scenarios but working through this proved to complicate things too much.
  Approved by: re (scottl)
  Reviewed by: hsu
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  Author: rwatson | Age: 2003-05-07 | Files: 1 | Lines: -1/+1
* Explicitly declare 'int' parameters.
  Author: obrien | Age: 2003-04-21 | Files: 1 | Lines: -0/+1
* Observe conservation of packets when entering Fast Recovery while doing Limited Transmit. Only artificially inflate the congestion window by 1 segment instead of the usual 3 to take into account the 2 already sent by Limited Transmit.
  Approved in principle by: Mark Allman <mallman@grc.nasa.gov>, Hari Balakrishnan <hari@nms.lcs.mit.edu>, Sally Floyd <floyd@icir.org>
  Author: hsu | Age: 2003-04-01 | Files: 1 | Lines: -3/+21
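The window adjustment described above amounts to subtracting the segments already sent by Limited Transmit from the usual inflation of three. The helper below is a sketch of that arithmetic; the function name and parameter names are hypothetical and the window is expressed in bytes.

```python
DUPACK_THRESHOLD = 3  # duplicate ACKs that trigger Fast Recovery


def fast_recovery_cwnd(ssthresh, maxseg, limited_transmit_segments=0):
    """Congestion window (bytes) on entering Fast Recovery.

    limited_transmit_segments is how many new segments Limited Transmit
    already sent on the first two duplicate ACKs (0, 1 or 2).
    """
    if not 0 <= limited_transmit_segments <= 2:
        raise ValueError("Limited Transmit sends at most two segments")
    inflate = DUPACK_THRESHOLD - limited_transmit_segments
    return ssthresh + maxseg * inflate
```

With two segments already sent, the window is inflated by a single segment, so the total number of packets put on the wire stays the same as without Limited Transmit.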
* Greatly simplify the unlocking logic by holding the TCP protocol lock until after FIN_WAIT_2 processing.
  Helped with debugging: Doug Barton
  Author: hsu | Age: 2003-03-13 | Files: 1 | Lines: -8/+2
* Add support for RFC 3390, which allows for a variable-sized initial congestion window.
  Author: hsu | Age: 2003-03-13 | Files: 1 | Lines: -2/+9
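For reference, RFC 3390 gives the upper bound on the initial window as min(4*MSS, max(2*MSS, 4380 bytes)). The one-line helper below just evaluates that formula; the function name is hypothetical and it says nothing about how the commit wires the value into the stack.

```python
def rfc3390_initial_window(mss):
    """Upper bound on the initial congestion window in bytes (RFC 3390)."""
    return min(4 * mss, max(2 * mss, 4380))
```

For a typical Ethernet MSS of 1460 bytes this yields 4380 bytes (three segments), while small segment sizes such as 536 bytes get four segments.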
* Implement the Limited Transmit algorithm (RFC 3042).
  Author: hsu | Age: 2003-03-12 | Files: 1 | Lines: -0/+14
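As background, RFC 3042 lets a sender transmit one new segment on each of the first two duplicate ACKs, provided the receiver's advertised window allows it and the outstanding data stays within cwnd plus two segments. The predicate below is a sketch of that rule; the function and parameter names are hypothetical, and modelling the receiver-window condition as "flight size after sending fits in rwnd" is an assumption.

```python
def limited_transmit_allowed(dupacks, flight_size, cwnd, smss, rwnd, segment_len):
    """True if a new segment may be sent under Limited Transmit (RFC 3042).

    All sizes are in bytes; dupacks is the count of consecutive duplicate ACKs.
    """
    if dupacks not in (1, 2):
        # The third duplicate ACK triggers Fast Retransmit instead.
        return False
    new_flight = flight_size + segment_len
    # The receiver must have room, and the data in flight may exceed
    # cwnd by at most two sender segments.
    return new_flight <= rwnd and new_flight <= cwnd + 2 * smss
```

Note that cwnd itself is left unchanged by these transmissions; only the amount of data allowed in flight is temporarily relaxed.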
* Remove a panic(); if the zone allocator can't provide more timewait structures, reuse the oldest one. Also move the expiry timer from a per-structure callout to the tcp slow timer.
  Sponsored by: DARPA, NAI Labs
  Author: jlemon | Age: 2003-03-08 | Files: 1 | Lines: -4/+3
* In timewait state, if the incoming segment is a pure in-sequence ack that matches snd_max, then do not respond with an ack, just drop the segment. This fixes a problem where a simultaneous close results in an ack loop between two time-wait states.
  Test case supplied by: Tim Robbins <tjr@FreeBSD.ORG>
  Sponsored by: DARPA, NAI Labs
  Author: jlemon | Age: 2003-02-26 | Files: 1 | Lines: -2/+4
* The TCP protocol lock may still be held if the reassembly queue dropped FIN. Detect this case and drop the lock accordingly.
  Sponsored by: DARPA, NAI Labs
  Author: jlemon | Age: 2003-02-26 | Files: 1 | Lines: -1/+2
* tcp_twstart() needs to be called with the TCP protocol lock held to avoid a race condition with the TCP timer routines.
  Author: hsu | Age: 2003-02-24 | Files: 1 | Lines: -6/+8
* Pass the right function to callout_reset() for a compressed TIME-WAIT control block.
  Author: hsu | Age: 2003-02-24 | Files: 1 | Lines: -1/+1
* Yesterday just wasn't my day. Remove testing delta that crept into the diff.
  Pointy hat provided by: sam
  Author: jlemon | Age: 2003-02-23 | Files: 1 | Lines: -1/+1
* Check to see if the TF_DELACK flag is set before returning from tcp_input(). This unbreaks delack handling, while still preserving correct T/TCP behavior.
  Tested by: maxim
  Sponsored by: DARPA, NAI Labs
  Author: jlemon | Age: 2003-02-22 | Files: 1 | Lines: -8/+7
* Add a TCP TIMEWAIT state which uses less space than a fullblown TCP control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket.
  Reviewed by: hsu, silby, jayanth
  Sponsored by: DARPA, NAI Labs
  Author: jlemon | Age: 2003-02-19 | Files: 1 | Lines: -30/+186
* Correct comments.
  Author: jlemon | Age: 2003-02-19 | Files: 1 | Lines: -7/+4
* Clean up delayed acks and T/TCP interactions:
  - delay acks for T/TCP regardless of delack setting
  - fix bug where a single pass through tcp_input might not delay acks
  - use callout_active() instead of callout_pending()
  Sponsored by: DARPA, NAI Labs
  Author: jlemon | Age: 2003-02-19 | Files: 1 | Lines: -28/+27
* The protocol lock is always held in the dropafterack case, so we don't need to check for it at runtime.
  Author: hsu | Age: 2003-02-13 | Files: 1 | Lines: -2/+2
* Add the TCP flags to the log message whenever log_in_vain is 1, not just when set to 2.
  PR: kern/43348
  MFC after: 5 days
  Author: cjc | Age: 2003-02-02 | Files: 1 | Lines: -8/+3
* Fix NewReno.
  Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>
  Author: hsu | Age: 2003-01-13 | Files: 1 | Lines: -41/+44