path: root/sys/netinet/tcp_var.h
Commit log, newest first. Each entry gives author, date, files changed and lines removed/added, followed by the commit message.
...
* silby, 2004-04-20, 1 file, -0/+1
  Enhance our RFC1948 implementation to perform better in some pathological TIME_WAIT recycling cases I was able to generate with HTTP testing tools. In short, because the old algorithm relied on ticks to create the time offset component of an ISN, two connections with exactly the same host/port pair that were generated between timer ticks would have exactly the same sequence number. As a result, the second connection would fail to pass the TIME_WAIT check on the server side, and the SYN would never be acknowledged. I've "fixed" this by adding random positive increments to the time component between clock ticks so that ISNs will *always* be increasing, no matter how quickly the port is recycled. Except in such contrived benchmarking situations, this problem should never come up in normal usage... until networks get faster. No MFC planned; 4.x is missing other optimizations that are needed to even create the situation in which such quick port recycling will occur.
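The tick-plus-increment idea can be sketched in C. This is a minimal illustration, not the FreeBSD ISN code: tuple_hash() is a hypothetical FNV-1a stand-in for the keyed, MD5-based F() of RFC 1948 (and is not cryptographically secure), and ISN_PER_TICK and ISN_RANDOM_MAX are invented constants. The point is that every call adds a positive random increment, so two ISNs for the same 4-tuple differ even when both are drawn within one tick.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants, chosen only for illustration. */
#define ISN_PER_TICK   1000u  /* advance of the time component per tick */
#define ISN_RANDOM_MAX 4096u  /* upper bound of each random increment */

/* Hypothetical stand-in for RFC 1948's keyed F(): FNV-1a over the
 * 4-tuple and a secret.  NOT cryptographically secure. */
static uint32_t tuple_hash(uint32_t laddr, uint16_t lport, uint32_t faddr,
                           uint16_t fport, uint32_t secret)
{
    uint32_t words[5] = { laddr, lport, faddr, fport, secret };
    uint32_t h = 2166136261u;
    for (int i = 0; i < 5; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

struct isn_state {
    uint32_t last_tick;  /* tick at which the time component was updated */
    uint32_t offset;     /* time component plus accumulated increments */
    uint32_t rng;        /* xorshift32 state; must be nonzero */
};

static uint32_t xorshift32(uint32_t *s)
{
    assert(*s != 0);     /* xorshift cannot leave the all-zero state */
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *s = x;
    return x;
}

/* Returns the next ISN for a connection whose 4-tuple hashes to `hash`. */
static uint32_t isn_next(struct isn_state *st, uint32_t now_tick,
                         uint32_t hash)
{
    /* Time component: a fixed advance per elapsed tick. */
    st->offset += (now_tick - st->last_tick) * ISN_PER_TICK;
    st->last_tick = now_tick;
    /* Random positive increment in [1, ISN_RANDOM_MAX] on every call,
     * so calls within the same tick still yield increasing ISNs. */
    st->offset += 1u + xorshift32(&st->rng) % ISN_RANDOM_MAX;
    return hash + st->offset;
}
```

Two calls with the same tick and tuple return ISNs that differ by between 1 and ISN_RANDOM_MAX (modulo 2^32), which is the property the server-side TIME_WAIT check needs.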
* imp, 2004-04-07, 1 file, -4/+0
  Remove advertising clause from University of California Regents' license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson.
  Approved by: core, peter, alc, rwatson
* rwatson, 2004-02-28, 1 file, -1/+1
  Remove now-unneeded arguments to tcp_twrespond(): so and msrc. These were needed by the MAC Framework until inpcbs gained labels.
  Submitted by: sam
* bde, 2004-02-25, 1 file, -1/+4
  Fixed namespace pollution in rev. 1.74. Implementation of the syncache increased <netinet/tcp_var>'s already large set of prerequisites, and this was handled badly. Just don't declare the complete syncache struct unless <netinet/pcb.h> is included before <netinet/tcp_var.h>.
  Approved by: jlemon (years ago, for a more invasive fix)
* bde, 2004-02-25, 1 file, -3/+1
  Don't use the negatively-opaque type uma_zone_t or be chummy with <vm/uma.h>'s idempotency identifier or its misspelling.
* andre, 2004-02-24, 1 file, -3/+6
  Convert the tcp segment reassembly queue to UMA and limit the maximum number of segments it will hold. The following tunables and sysctls control the behaviour of the tcp segment reassembly queue:
    o net.inet.tcp.reass.maxsegments (loader tunable) specifies the maximum number of segments all tcp reassembly queues can hold (defaults to 1/16 of nmbclusters).
    o net.inet.tcp.reass.maxqlen specifies the maximum number of segments any individual tcp session queue can hold (defaults to 48).
    o net.inet.tcp.reass.cursegments (read-only) counts the number of segments currently in all reassembly queues.
    o net.inet.tcp.reass.overflows (read-only) counts how often either the global or local queue limit has been reached.
  Tested by: bms, silby
  Reviewed by: bms, silby
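The two limits and the two read-only counters interact as a simple admission check. The sketch below is a hypothetical model, not the FreeBSD reassembly code: the struct reass_limits fields merely mirror the sysctl names, and the caller is assumed to drop any out-of-order segment that is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the reassembly knobs and counters. */
struct reass_limits {
    int maxsegments;  /* net.inet.tcp.reass.maxsegments: global cap */
    int maxqlen;      /* net.inet.tcp.reass.maxqlen: per-connection cap */
    int cursegments;  /* net.inet.tcp.reass.cursegments: in all queues */
    int overflows;    /* net.inet.tcp.reass.overflows: refusals so far */
};

/* Default global cap described in the commit: 1/16 of nmbclusters. */
static int reass_default_maxsegments(int nmbclusters)
{
    return nmbclusters / 16;
}

/* Accounts for one out-of-order segment on a connection whose queue
 * currently holds *qlen segments.  Returns false if either limit is hit. */
static bool reass_admit(struct reass_limits *rl, int *qlen)
{
    assert(rl != NULL && qlen != NULL && *qlen >= 0);
    if (rl->cursegments >= rl->maxsegments || *qlen >= rl->maxqlen) {
        rl->overflows++;
        return false;
    }
    rl->cursegments++;
    (*qlen)++;
    return true;
}

/* Called when a queued segment is delivered to the socket or freed. */
static void reass_release(struct reass_limits *rl, int *qlen)
{
    assert(*qlen > 0 && rl->cursegments > 0);
    rl->cursegments--;
    (*qlen)--;
}
```

Keeping one global counter next to a per-connection length is what lets a single check enforce both the system-wide and the per-session limit.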
* bms, 2004-02-13, 1 file, -6/+3
  Brucification.
  Submitted by: bde
* bms, 2004-02-12, 1 file, -1/+1
  Update the prototype for tcpsignature_apply() to reflect the spelling of the types used by m_apply()'s callback function, f, as documented in mbuf(9).
  Noticed by: njl
* bms, 2004-02-11, 1 file, -0/+25
  Initial import of RFC 2385 (TCP-MD5) digest support.

  This is the first of two commits; it brings in the kernel support first. This can be enabled by compiling a kernel with options TCP_SIGNATURE and FAST_IPSEC.

  For the uninitiated, this is a TCP option, predating IPSEC, which provides a means of authenticating TCP sessions. It is still relevant today, however, as it is used by many commercial router vendors, particularly with BGP, and as such has become a requirement for interconnect at many major Internet points of presence.

  Several parts of the TCP and IP headers, together with the segment payload and a shared secret, are digested with MD5. The PF_KEY interface is used to manage the secrets using security associations in the SADB. There is a limitation here in that there is no way to map a TCP flow per-port back to an SPI without polluting tcpcb or using the SPD; the code to do the latter is unstable at this time. Therefore this code only supports per-host keying granularity.

  Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6), TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective users of this feature, this will not pose any problem.

  This implementation is output-only; that is, the option is honoured when responding to a host initiating a TCP session, but no effort is made [yet] to authenticate inbound traffic. This is, however, sufficient to interwork with Cisco equipment.

  Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with local patches. Patches for tcpdump to validate TCP-MD5 sessions are also available from me upon request.

  Sponsored by: sentex.net
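On the wire, RFC 2385 carries the digest in a TCP option of kind 19 and length 18: one kind byte, one length byte and the 16-byte MD5 digest. The sketch below only lays out that option and pads it with NOPs to a 32-bit boundary; computing the digest itself (over the IPv4 pseudo-header, the TCP header without options and with a zero checksum, the segment data, and the shared key, as RFC 2385 specifies) is left out. The macro and function names are chosen for illustration and are not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_KIND_MD5   19u  /* TCP option kind assigned by RFC 2385 */
#define OPT_LEN_MD5    18u  /* kind + length + 16-byte digest */
#define MD5_DIGEST_LEN 16u
#define OPT_NOP         1u  /* TCP no-operation option, used as padding */

/* Writes the MD5 signature option followed by NOP padding to a 32-bit
 * boundary.  buf must have room for 20 bytes.  Returns bytes written. */
static size_t put_md5_option(uint8_t *buf,
                             const uint8_t digest[MD5_DIGEST_LEN])
{
    assert(buf != NULL && digest != NULL);
    size_t n = 0;
    buf[n++] = OPT_KIND_MD5;
    buf[n++] = OPT_LEN_MD5;
    memcpy(buf + n, digest, MD5_DIGEST_LEN);
    n += MD5_DIGEST_LEN;
    while (n % 4 != 0)
        buf[n++] = OPT_NOP;
    return n;
}
```

The padded option takes 20 of the 40 bytes available for TCP options, which leaves room for other options such as MSS or timestamps.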
* andre, 2004-01-08, 1 file, -0/+7
  Limiters and sanity checks for TCP MSS (maximum segment size) resource exhaustion attacks.

  For network link optimization TCP can adjust its MSS, and thus its packet size, according to the observed path MTU. This is done dynamically based on feedback from the remote host and network components along the packet path. This information can be abused to pretend an extremely low path MTU.

  The resource exhaustion works in two ways:

  o During TCP connection setup the advertised local MSS is exchanged between the endpoints. The remote endpoint can set this arbitrarily low (except for a minimum MTU of 64 octets enforced in the BSD code). When the local host is sending data it is forced to send many small IP packets instead of a large one. For example, instead of the normal TCP payload size of 1448 it forces a TCP payload size of 12 (MTU 64), and thus we have a 120 times increase in workload and packets. On fast links this quickly saturates the local CPU and may also hit pps processing limits of network components along the path. This type of attack is particularly effective for servers where the attacker can download large files (WWW and FTP). We mitigate it by enforcing a minimum MSS settable by sysctl net.inet.tcp.minmss, defaulting to 256 octets.

  o The local host is receiving data on a TCP connection from the remote host. The local host has no control over the packet size the remote host is sending. The remote host may choose to do what is described in the first attack and send the data in packets with a TCP payload as small as one byte. For each packet the tcp_input() function will be entered, the packet is processed and a sowakeup() is signalled to the connected process. For example, an attack with 2 Mbit/s gives 4716 packets per second and the same amount of sowakeup()s to the process (and context switches). This type of attack is particularly effective for servers where the attacker can upload large amounts of data. Normally this is the case with WWW servers, where large POSTs can be made. We mitigate this by calculating the average MSS payload per second. If it goes below 'net.inet.tcp.minmss' and the pps rate is above 'net.inet.tcp.minmssoverload' (defaulting to 1000), this particular TCP connection is reset and dropped.

  MITRE CVE: CAN-2004-0002
  Reviewed by: sam (mentor)
  MFC after: 1 day
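Both figures in the commit message can be reproduced, and the overload rule is a two-condition test. In the sketch below, the 53-byte packet size is an assumption (20-byte IP header, 32-byte TCP header carrying the timestamp option, 1 byte of payload) chosen because it reproduces the quoted 4716 packets per second; struct mss_monitor and its once-per-second reset are hypothetical and not the kernel's bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Packets per second at a given bit rate for a fixed on-wire size. */
static uint32_t pps_for_rate(uint64_t bits_per_sec, uint32_t pkt_bytes)
{
    assert(pkt_bytes > 0);
    return (uint32_t)(bits_per_sec / (8u * (uint64_t)pkt_bytes));
}

/* Hypothetical per-connection counters, cleared once per second. */
struct mss_monitor {
    uint32_t bytes;    /* payload bytes received this second */
    uint32_t packets;  /* segments received this second */
};

/* True if the connection should be reset and dropped: the average
 * payload is below minmss AND the packet rate exceeds minmssoverload. */
static bool mss_overload(const struct mss_monitor *m, uint32_t minmss,
                         uint32_t minmssoverload)
{
    if (m->packets == 0)
        return false;
    uint32_t avg = m->bytes / m->packets;
    return avg < minmss && m->packets > minmssoverload;
}
```

With the defaults from the commit (minmss 256, minmssoverload 1000), a 2 Mbit/s stream of 1-byte segments trips both conditions, while a bulk transfer of full 1448-byte segments at 1000 packets per second does not. Requiring both conditions avoids dropping interactive sessions that send small segments at low rates.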
* andre, 2003-11-20, 1 file, -11/+33
  Introduce tcp_hostcache and remove the tcp-specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for subsequent connections from or to the same source or destination. Depending on the network parameters to/from the remote host, this can lead to significant speedups for new tcp connections after the first one, because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote IP address. It removes significant locking requirements from the tcp stack with regard to the routing table.
  Reviewed by: sam (mentor), bms
  Reviewed by: -net, -current, core@kame.net (IPv6 parts)
  Approved by: re (scottl)
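A hash table keyed by the remote address is the core of the design. The sketch below is hypothetical, not the tcp_hostcache code: HC_BUCKETS, the entry fields and the mixing function are invented for illustration, only IPv4 keys are handled, expiry is omitted, and the per-bucket mutex that would let lookups on different buckets proceed without contention is indicated by a comment rather than implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HC_BUCKETS 512u  /* hypothetical size; must be a power of two */

/* Cached metrics for one remote host (hypothetical subset). */
struct hc_entry {
    struct hc_entry *next;
    uint32_t faddr;     /* remote IPv4 address: the hash key */
    uint32_t rtt_ms;    /* smoothed RTT learned from earlier sessions */
    uint32_t ssthresh;  /* slow-start threshold learned earlier */
};

struct hc_bucket {
    /* An SMP implementation would keep a mutex here, so threads that
     * touch different buckets never contend; omitted in this sketch. */
    struct hc_entry *head;
};

static struct hc_bucket hc_table[HC_BUCKETS];

static unsigned hc_hash(uint32_t faddr)
{
    assert((HC_BUCKETS & (HC_BUCKETS - 1u)) == 0);
    faddr ^= faddr >> 16;   /* mix high bits into the low bits */
    faddr *= 0x45d9f3bu;
    faddr ^= faddr >> 16;
    return faddr & (HC_BUCKETS - 1u);
}

/* Returns the cached entry for faddr, or NULL if none exists. */
static struct hc_entry *hc_lookup(uint32_t faddr)
{
    for (struct hc_entry *e = hc_table[hc_hash(faddr)].head; e != NULL;
         e = e->next)
        if (e->faddr == faddr)
            return e;
    return NULL;
}

/* Creates or updates the entry for faddr, e.g. when a connection closes. */
static struct hc_entry *hc_update(uint32_t faddr, uint32_t rtt_ms,
                                  uint32_t ssthresh)
{
    struct hc_entry *e = hc_lookup(faddr);
    if (e == NULL) {
        e = calloc(1, sizeof(*e));
        if (e == NULL)
            return NULL;
        unsigned b = hc_hash(faddr);
        e->faddr = faddr;
        e->next = hc_table[b].head;
        hc_table[b].head = e;
    }
    e->rtt_ms = rtt_ms;
    e->ssthresh = ssthresh;
    return e;
}
```

A new connection to the same host would call hc_lookup() and seed its RTT and ssthresh from the entry instead of starting from defaults, which is the "inherit the learning curve" effect described above.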
* silby, 2003-11-02, 1 file, -0/+1
  Add an additional check to the tcp_twrecycleable function; I had previously only considered the send sequence space. Unfortunately, some OSes (Windows) still use a random positive increments scheme for their SYN-ACK ISNs, so I must consider the receive sequence space as well. The value of 250000 bytes/second for Microsoft's ISN rate of increase was determined by testing with an XP machine.
* silby, 2003-11-01, 1 file, -0/+2
  - Add a new function tcp_twrecycleable, which tells us if the ISN which we will generate for a given ip/port tuple has advanced far enough for the TIME_WAIT socket in question to be safely recycled.
  - Have in_pcblookup_local use tcp_twrecycleable to determine if TIME_WAIT sockets which are hogging local ports can be safely freed.
  This change preserves proper TIME_WAIT behavior under normal circumstances while allowing for safe and fast recycling whenever ephemeral port space is scarce.
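Taken together with the follow-up check in the entry above, the recycling test compares projected ISNs in both directions against the sequence numbers recorded for the old connection. This is a hedged sketch: it assumes both ISN generators advance linearly with time, the struct tw_rec fields and the our_bytes_per_sec parameter are hypothetical, and only the 250000 bytes/second figure for Windows comes from the commit messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tcp_seq;
/* Sequence-space comparison modulo 2^32, in the usual BSD style. */
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/* Rate quoted in the commit for Windows XP SYN-ACK ISNs. */
#define MS_ISN_BYTES_PER_SECOND 250000u

/* Hypothetical subset of a TIME_WAIT record. */
struct tw_rec {
    tcp_seq iss;      /* our ISN for the old connection */
    tcp_seq irs;      /* the peer's ISN for the old connection */
    tcp_seq snd_nxt;  /* next sequence number the old connection sends */
    tcp_seq rcv_nxt;  /* next sequence number expected from the peer */
};

/* True if, after elapsed_ms, the ISNs both sides would pick for a new
 * connection on the same tuple lie beyond the old connection's sequence
 * space in both directions, so the TIME_WAIT entry can be recycled. */
static bool tw_recycleable(const struct tw_rec *tw, uint32_t elapsed_ms,
                           uint32_t our_bytes_per_sec)
{
    assert(tw != NULL);
    tcp_seq new_iss = tw->iss +
        (tcp_seq)((uint64_t)elapsed_ms * our_bytes_per_sec / 1000u);
    tcp_seq new_irs = tw->irs +
        (tcp_seq)((uint64_t)elapsed_ms * MS_ISN_BYTES_PER_SECOND / 1000u);
    return SEQ_GT(new_iss, tw->snd_nxt) && SEQ_GT(new_irs, tw->rcv_nxt);
}
```

Checking only the send side would accept a peer whose slowly advancing SYN-ACK ISN still falls inside the old receive window, which is the Windows case described in the 2003-11-02 entry.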
* hsu, 2003-07-15, 1 file, -22/+27
  Unify the "send high" and "recover" variables as specified in the latest rev of the spec. Use an explicit flag for Fast Recovery. [1]
  Fix bug with exiting Fast Recovery on a retransmit timeout diagnosed by Lu Guohan. [2]
  Reviewed by: Thomas Henderson <thomas.r.henderson@boeing.com>
  Reported and tested by: Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2]
  Approved by: Thomas Henderson <thomas.r.henderson@boeing.com>, Sally Floyd <floyd@acm.org> [1]
* rwatson, 2003-05-07, 1 file, -1/+1
  Correct a bug introduced with reduced TCP state handling; make sure that the MAC label on TCP responses during TIMEWAIT is properly set from either the socket (if available), or the mbuf that it's responding to. Unfortunately, this is made somewhat difficult by the TCP code, as tcp_twstart() calls tcp_twrespond() after discarding the socket but without a reference to the mbuf that causes the "response". Passing both the socket and the mbuf works around this; eventually it might be good to make sure the mbuf always gets passed in "response" scenarios, but working through this proved to complicate things too much.
  Approved by: re (scottl)
  Reviewed by: hsu
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
* hsu, 2003-04-01, 1 file, -0/+1
  Observe conservation of packets when entering Fast Recovery while doing Limited Transmit. Only artificially inflate the congestion window by 1 segment instead of the usual 3 to take into account the 2 already sent by Limited Transmit.
  Approved in principle by: Mark Allman <mallman@grc.nasa.gov>, Hari Balakrishnan <hari@nms.lcs.mit.edu>, Sally Floyd <floyd@icir.org>
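The adjustment can be written as a small function. This sketch assumes the RFC 2581 rule ssthresh = max(FlightSize / 2, 2 * SMSS) on entry to Fast Recovery; the function name and the limited_transmit flag are hypothetical, and only the 3-versus-1 segment inflation comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Computes ssthresh and the inflated cwnd on entering Fast Recovery.
 * If Limited Transmit already sent 2 new segments for the first two
 * duplicate ACKs, inflate by 1 segment instead of 3 so that the number
 * of packets in flight is unchanged (conservation of packets). */
static uint32_t fast_recovery_cwnd(uint32_t flight_size, uint32_t mss,
                                   bool limited_transmit, uint32_t *ssthresh)
{
    assert(mss > 0 && ssthresh != NULL);
    uint32_t half = flight_size / 2;
    *ssthresh = half > 2 * mss ? half : 2 * mss;
    uint32_t inflate = limited_transmit ? 1u : 3u;
    return *ssthresh + inflate * mss;
}
```

For a 20000-byte flight with a 1000-byte MSS, ssthresh becomes 10000 and cwnd becomes 13000 without Limited Transmit, or 11000 with it.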
* jlemon, 2003-03-08, 1 file, -2/+4
  Remove a panic(); if the zone allocator can't provide more timewait structures, reuse the oldest one. Also move the expiry timer from a per-structure callout to the tcp slow timer.
  Sponsored by: DARPA, NAI Labs
* jlemon, 2003-02-19, 1 file, -0/+18
  Add a TCP TIMEWAIT state which uses less space than a full-blown TCP control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket.
  Reviewed by: hsu, silby, jayanth
  Sponsored by: DARPA, NAI Labs
* jlemon, 2003-02-19, 1 file, -2/+2
  Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so the routine does not require a tcpcb to operate. Since we no longer keep template mbufs around, move pseudo checksum out of this routine, and merge it with the length update.
  Sponsored by: DARPA, NAI Labs
* hsu, 2003-01-13, 1 file, -1/+3
  Fix NewReno.
  Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>
* dillon, 2002-08-17, 1 file, -0/+7
  Implement TCP bandwidth delay product window limiting, similar to (but not meant to duplicate) TCP/Vegas. Add four sysctls and default the implementation to 'off'.
    net.inet.tcp.inflight_enable   enable algorithm (defaults to 0=off)
    net.inet.tcp.inflight_debug    debugging (defaults to 1=on)
    net.inet.tcp.inflight_min      minimum window limit
    net.inet.tcp.inflight_max      maximum window limit
  MFC after: 1 week
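The core of bandwidth-delay limiting is an estimate of bandwidth times round-trip time, clamped to the two configured limits and used as one more bound on the send window. The sketch below assumes bandwidth and smoothed RTT estimates are already available; the function names are hypothetical, and the estimators themselves, which the commit does not describe, are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes, clamped to [inflight_min, inflight_max]. */
static uint32_t inflight_window(uint64_t bw_bytes_per_sec, uint32_t srtt_ms,
                                uint32_t inflight_min, uint32_t inflight_max)
{
    assert(inflight_min <= inflight_max);
    uint64_t bdp = bw_bytes_per_sec * srtt_ms / 1000u;
    if (bdp < inflight_min)
        bdp = inflight_min;
    if (bdp > inflight_max)
        bdp = inflight_max;
    return (uint32_t)bdp;
}

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Usable window: the usual min(snd_wnd, cwnd), further limited by the
 * bandwidth-delay window only when inflight_enable is set. */
static uint32_t send_window(bool inflight_enable, uint32_t snd_wnd,
                            uint32_t cwnd, uint32_t bwnd)
{
    uint32_t w = min_u32(snd_wnd, cwnd);
    return inflight_enable ? min_u32(w, bwnd) : w;
}
```

For example, 10 Mbit/s (1250000 bytes/s) at a 40 ms RTT gives a 50000-byte window, which then caps the amount of data in flight even if cwnd has grown larger.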
* dillon, 2002-07-19, 1 file, -0/+1
  Add the tcps_sndrexmitbad statistic; keep track of late ACKs that caused unnecessary retransmissions.
* hsu, 2002-06-14, 1 file, -3/+6
  Notify functions can destroy the pcb, so they have to return an indication of whether this happened so the calling function knows whether or not to unlock the pcb.
  Submitted by: Jennifer Yang (yangjihui@yahoo.com)
  Bug reported by: Sid Carter (sidcarter@symonds.net)
* silby, 2002-06-14, 1 file, -0/+1
  Re-commit w/fix:
  Ensure that the syn cache's syn-ack packets contain the same ip_tos, ip_ttl, and DF bits as all other tcp packets.
  PR: 39141
  MFC after: 2 weeks
  This time, make sure that IPv4-specific code (aka all of the above) is only run in the IPv4 case.
* silby, 2002-06-14, 1 file, -1/+0
  Back out ip_tos/ip_ttl/DF "fix"; it just panic'd my box. :)
  Pointy-hat to: silby
* silby, 2002-06-14, 1 file, -0/+1
  Ensure that the syn cache's syn-ack packets contain the same ip_tos, ip_ttl, and DF bits as all other tcp packets.
  PR: 39141
  MFC after: 2 weeks
* hsu, 2002-06-10, 1 file, -0/+1
  Lock up inpcb.
  Submitted by: Jennifer Yang <yangjihui@yahoo.com>
* alfred, 2002-03-19, 1 file, -28/+27
  Remove __P.
* dillon, 2001-12-02, 1 file, -0/+1
  Fix a bug with transmitter restart after receiving a 0 window. The receiver was not sending an immediate ACK with delayed ACKs turned on when the input buffer was drained, preventing the transmitter from restarting immediately.
  Propagate the TCP_NODELAY option to accept()ed sockets. (Helps tbench and is a good idea anyway.)
  Some cleanup. Identify additional issues in comments.
  MFC after: 1 day
* jlemon, 2001-11-22, 1 file, -7/+75
  Introduce a syncache, which enables FreeBSD to withstand a SYN flood DoS in an improved fashion over the existing code.
  Reviewed by: silby (in a previous iteration)
  Sponsored by: DARPA, NAI Labs
* jayanth, 2001-10-05, 1 file, -0/+1
  Add a flag TF_LASTIDLE that forces a previously idle connection to send all its data, especially when the data is less than one MSS. This fixes an issue where the stack was delaying the sending of data even though there was enough window to send all the data and the sending of data was emptying the socket buffer.
  Problem found by Yoshihiro Tsuchiya (tsuchiya@flab.fujitsu.co.jp)
  Submitted by: Jayanth Vijayaraghavan
* julian, 2001-09-03, 1 file, -1/+1
  Patches from Keiichi SHIMA <keiichi@iij.ad.jp> to make ip use the standard protosw structure again.
  Obtained from: Well, KAME I guess.
* silby, 2001-08-22, 1 file, -5/+1
  Much delayed but now present: RFC 1948 style sequence numbers.
  In order to ensure security and functionality, RFC 1948 style initial sequence number generation has been implemented. Barring any major cryptographic breakthroughs, this algorithm should be unbreakable. In addition, the problems with TIME_WAIT recycling which affect our currently used algorithm are not present.
  Reviewed by: jesper
* silby, 2001-07-08, 1 file, -0/+1
  Temporary feature: runtime-tunable tcp initial sequence number generation scheme. Users may now select between the currently used OpenBSD algorithm and the older random positive increment method. While the OpenBSD algorithm is more secure, it also breaks TIME_WAIT handling; this is causing trouble for an increasing number of folks. To switch between generation schemes, one sets the sysctl net.inet.tcp.tcp_seq_genscheme: 0 = random positive increments, 1 = the OpenBSD algorithm. 1 is still the default. Once a secure _and_ compatible algorithm is implemented, this sysctl will be removed.
  Reviewed by: jlemon
  Tested by: numerous subscribers of -net
* silby, 2001-06-23, 1 file, -2/+3
  Eliminate the allocation of a tcp template structure for each connection. The information contained in a tcptemp can be reconstructed from a tcpcb when needed. Previously, tcp templates required the allocation of one mbuf per connection. On large systems, this change should free up a large number of mbufs.
  Reviewed by: bmilekic, jlemon, ru
  MFC after: 2 weeks
* kris, 2001-04-17, 1 file, -0/+4
  Randomize the TCP initial sequence numbers more thoroughly.
  Obtained from: OpenBSD
  Reviewed by: jesper, peter, -developers
* jlemon, 2001-02-26, 1 file, -1/+0
  Remove in_pcbnotify and use in_pcblookup_hash to find the cb directly. For TCP, verify that the sequence number in the ICMP packet falls within the tcp receive window before performing any actions indicated by the icmp packet. Clean up some layering violations (access to tcp internals from in_pcb).
* jesper, 2001-02-25, 1 file, -1/+0
  Remove tcp_drop_all_states, which is unneeded after jlemon removed it from tcp_subr.c in rev 1.92.
* phk, 2001-02-18, 1 file, -0/+1
  Remove unneeded loop increment in src/sys/netinet/in_pcb.c:in_pcbnotify.
  Add new PRC_UNREACH_ADMIN_PROHIB in sys/sys/protosw.h.
  Remove condition on TCP in src/sys/netinet/ip_icmp.c:icmp_input.
  In src/sys/netinet/ip_icmp.c:icmp_input, set code = PRC_UNREACH_ADMIN_PROHIB or PRC_UNREACH_HOST for all unreachables except ICMP_UNREACH_NEEDFRAG.
  Rename sysctl icmp_admin_prohib_like_rst to icmp_unreach_like_rst to reflect the fact that we also react to ICMP unreachables that are not administratively prohibited. Also update the comments to reflect this.
  In sys/netinet/tcp_subr.c:tcp_ctlinput, add code to treat PRC_UNREACH_ADMIN_PROHIB and PRC_UNREACH_HOST differently.
  PR: 23986
  Submitted by: Jesper Skriver <jesper@skriver.dk>
* phk, 2000-12-24, 1 file, -0/+1
  Update the "icmp_admin_prohib_like_rst" code to check the tcp-window and to be configurable with respect to acting only in SYN or in all TCP states.
  PR: 23665
  Submitted by: Jesper Skriver <jesper@skriver.dk>
* phk, 2000-12-16, 1 file, -0/+1
  We currently do not react to ICMP administratively prohibited messages sent by routers when they deny our traffic; this causes a timeout when trying to connect to TCP ports/services on a remote host which is blocked by routers or firewalls.

  RFC 1122 (Requirements for Internet Hosts) section 3.2.2.1 actually requires that we treat such a message for a TCP session as if we had received a RST.

  quote begin.
  A Destination Unreachable message that is received MUST be reported to the transport layer. The transport layer SHOULD use the information appropriately; for example, see Sections 4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol that has its own mechanism for notifying the sender that a port is unreachable (e.g., TCP, which sends RST segments) MUST nevertheless accept an ICMP Port Unreachable for the same purpose.
  quote end.

  I've written a small extension that implements this; it also creates a sysctl "net.inet.tcp.icmp_admin_prohib_like_rst" to control whether this new behaviour is activated.

  When it's activated (set to 1) we'll treat an ICMP administratively prohibited message (ICMP type 3, codes 9, 10 and 13) for a TCP session as if we had received a TCP RST, but only if the TCP session is in SYN_SENT state.

  The reason for only reacting when in SYN_SENT state is that this will solve the problem and at the same time minimize the risk of this being abused.

  I suggest that we enable this new behaviour by default, but it would be a change of current behaviour, so if people prefer to leave it disabled by default, at least for now, this would be OK for me; the attached diff actually has the sysctl set to 0 by default.

  PR: 23086
  Submitted by: Jesper Skriver <jesper@skriver.dk>
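The decision described here, together with the refinements in the 2000-12-24 entry (a window check, and an option to act in all states rather than only SYN_SENT), fits in one predicate. This is a hedged sketch, not the tcp_ctlinput() code: the enum, the parameter names, and the reading of the "tcp-window" check as "the sequence number quoted in the ICMP payload lies in [snd_una, snd_max)" are assumptions for illustration. like_rst stands in for the net.inet.tcp.icmp_admin_prohib_like_rst sysctl; the later rename and the extension to other unreachables (2001-02-18 entry) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;
#define SEQ_GEQ(a, b) ((int32_t)((a) - (b)) >= 0)
#define SEQ_LT(a, b)  ((int32_t)((a) - (b)) < 0)

#define ICMP_UNREACH 3   /* ICMP type: Destination Unreachable */

enum tcp_state { ST_SYN_SENT, ST_ESTABLISHED, ST_OTHER };  /* subset */

/* True for the "administratively prohibited" unreachable codes. */
static bool icmp_admin_prohib(int type, int code)
{
    return type == ICMP_UNREACH && (code == 9 || code == 10 || code == 13);
}

/* Decides whether to drop the connection as if a RST had arrived.
 * like_rst: the sysctl value (0 = off, the default in the original diff).
 * syn_only: act only in SYN_SENT, as in the original change. */
static bool icmp_acts_like_rst(int like_rst, bool syn_only, int type,
                               int code, enum tcp_state state,
                               tcp_seq icmp_seq, tcp_seq snd_una,
                               tcp_seq snd_max)
{
    assert(like_rst == 0 || like_rst == 1);
    if (!like_rst || !icmp_admin_prohib(type, code))
        return false;
    if (syn_only && state != ST_SYN_SENT)
        return false;
    /* The quoted segment must be one we sent that is not yet acked. */
    return SEQ_GEQ(icmp_seq, snd_una) && SEQ_LT(icmp_seq, snd_max);
}
```

The window check makes blind abuse harder: a forged ICMP message must quote a sequence number the connection actually has outstanding.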
* jayanth, 2000-07-21, 1 file, -0/+1
  When a connection is being dropped due to a listen queue overflow, delete the cloned route that is associated with the connection. This avoids exhausting the routing table memory when the system is under a SYN flood attack. The route entry is not deleted if there is any prior information cached in it.
  Reviewed by: Peter Wemm, asmodai
* sheldonh, 2000-07-18, 1 file, -1/+1
  Fix a comment which was broken in rev 1.36.
  PR: 19947
  Submitted by: Tetsuya Isaki <isaki@net.ipc.hiroshima-u.ac.jp>
* jake, 2000-05-26, 1 file, -2/+2
  Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen.
  Requested by: msmith and others
* jake, 2000-05-23, 1 file, -2/+2
  Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct.
  Suggested by: phk
  Reviewed by: phk
  Approved by: mdodd
* jlemon, 2000-05-06, 1 file, -0/+3
  Implement TCP NewReno, as documented in RFC 2582. This allows better recovery from multiple packet losses in a single window. The algorithm can be toggled via the sysctl net.inet.tcp.newreno, which defaults to "on".
  Submitted by: Jayanth Vijayaraghavan <jayanth@yahoo-inc.com>
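The partial-ACK rule that lets NewReno recover several losses from one window can be sketched as follows. Assumptions not taken from the commit: recover holds snd_max at the moment Fast Recovery began, a full ACK sets cwnd to ssthresh (one of the two exits RFC 2582 allows), and struct nr_state and newreno_ack() are hypothetical names; the net.inet.tcp.newreno toggle is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;
#define SEQ_GEQ(a, b) ((int32_t)((a) - (b)) >= 0)

struct nr_state {
    tcp_seq  snd_una;      /* oldest unacknowledged sequence number */
    tcp_seq  recover;      /* snd_max when Fast Recovery was entered */
    uint32_t cwnd, ssthresh, mss;
    bool     in_recovery;  /* explicit Fast Recovery flag */
};

/* Processes a new (non-duplicate) ACK during Fast Recovery.
 * Returns true if the first unacknowledged segment must be resent. */
static bool newreno_ack(struct nr_state *s, tcp_seq ack)
{
    assert(s->in_recovery && SEQ_GEQ(ack, s->snd_una));
    uint32_t acked = ack - s->snd_una;
    s->snd_una = ack;
    if (SEQ_GEQ(ack, s->recover)) {
        /* Full ACK: all data outstanding at entry is acknowledged. */
        s->cwnd = s->ssthresh;
        s->in_recovery = false;
        return false;
    }
    /* Partial ACK: another segment from the same window was lost.
     * Deflate cwnd by the newly acked data, then add back one MSS. */
    s->cwnd = (acked < s->cwnd ? s->cwnd - acked : 0) + s->mss;
    return true;
}
```

Plain Reno would leave Fast Recovery on the first new ACK; staying in recovery until snd_una reaches recover is what avoids a retransmit timeout when a window loses more than one segment.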
* shin, 2000-01-09, 1 file, -6/+26
  tcp updates to support IPv6. Also a small patch to sys/nfs/nfs_socket.c, as the max_hdr size changed.
  Reviewed by: freebsd-arch, cvs-committers
  Obtained from: KAME project
* peter, 1999-12-29, 1 file, -2/+2
  Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSDs, which made this change quite some time ago. More commits to come.
* shin, 1999-12-22, 1 file, -1/+1
  IPSEC support in the kernel. The prototype of the pr_input() routines is also changed to support IPSEC and IPV6 chained protocol headers.
  Reviewed by: freebsd-arch, cvs-committers
  Obtained from: KAME project
* shin, 1999-11-05, 1 file, -1/+4
  KAME-related header file additions and merges (only those which don't affect C source files so much).
  Reviewed by: cvs-committers
  Obtained from: KAME project