path: root/sys/netinet/tcp_var.h
Commit message  [Author, Date, Files, Lines]
* Remove the now unused tcp_canceltimers() function. tcpcb timers are [rwatson, 2004-12-23, 1 file, -1/+0]
  now stopped as part of tcp_discardcb().
  MFC after: 2 weeks
* Remove RFC1644 T/TCP support from the TCP side of the network stack. [andre, 2004-11-02, 1 file, -40/+0]
  A complete rationale and discussion is given in this message and the
  resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706
  Note that this commit removes only the functional part of T/TCP from the
  tcp_* related functions in the kernel. Other features introduced with
  RFC1644 are left intact (socket layer changes, sendmsg(2) on connection
  oriented protocols) and are meant to be reused by a simpler and less
  intrusive reimplementation of the previous T/TCP functionality.
  Discussed on: -arch
* Shave 40 unused bytes from struct tcpcb. [andre, 2004-10-22, 1 file, -1/+0]
* - Estimate the amount of data in flight in sack recovery and use it [ps, 2004-10-05, 1 file, -1/+4]
    to control the packets injected while in sack recovery (for both
    retransmissions and new data).
  - Cleanups to the sack codepaths in tcp_output.c and tcp_sack.c.
  - Add a new sysctl (net.inet.tcp.sack.initburst) that controls the number
    of sack retransmissions done upon initiation of sack recovery.
  Submitted by: Mohan Srinivasan <mohans@yahoo-inc.com>
* White space cleanup for netinet before branch: [rwatson, 2004-08-16, 1 file, -13/+13]
  - Trailing tab/space cleanup
  - Remove spurious spaces between or before tabs
  This change avoids touching files that Andre likely has in his working set
  for PFIL hooks changes for IPFW/DUMMYNET.
  Approved by: re (scottl)
  Submitted by: Xin LI <delphij@frontfree.net>
* The tcp syncache code was leaving the IPv6 flowlabel uninitialised [dwmalone, 2004-07-17, 1 file, -0/+1]
  for the SYN|ACK packet and then letting in6_pcbconnect set the flowlabel
  later. Arrange for the syncache/syncookie code to set and recall the flow
  label so that the flowlabel used for the SYN|ACK is consistent. This is
  done by using some of the cookie (when tcp cookies are enabled) and by
  stashing the flowlabel in the syncache.
  Tested and Discovered by: Orla McGann <orly@cnri.dit.ie>
  Approved by: ume, silby
  MFC after: 1 month
* Whitespace. [bms, 2004-06-25, 1 file, -3/+3]
* Add support for TCP Selective Acknowledgements. The work for this [ps, 2004-06-23, 1 file, -1/+48]
  originated on RELENG_4 and was ported to -CURRENT. The scoreboarding code
  was obtained from OpenBSD, and many of the remaining changes were inspired
  by OpenBSD, but not taken directly from there.
  You can enable/disable sack using net.inet.tcp.do_sack. You can also limit
  the number of sack holes that all senders can have in the scoreboard with
  net.inet.tcp.sackhole_limit.
  Reviewed by: gnn
  Obtained from: Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan)
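The scoreboard mentioned above is essentially a per-connection list of "holes", ranges the peer has not yet selectively acknowledged, bounded by a global limit. The following is a minimal illustrative sketch rather than the committed code; the structure layout, names and limit value are hypothetical.

    /*
     * Illustrative sketch of a SACK scoreboard hole list with a global
     * limit; names and layout are hypothetical, not FreeBSD's actual code.
     */
    #include <stdint.h>
    #include <stdlib.h>

    struct sackhole {
        uint32_t         start;   /* first sequence number of the hole */
        uint32_t         end;     /* one past the last sequence number */
        struct sackhole *next;
    };

    static int sackhole_limit = 65536;  /* cf. net.inet.tcp.sackhole_limit */
    static int sackholes_in_use;        /* count across all connections */

    /* Allocate and link a new hole unless the global limit is reached. */
    static struct sackhole *
    sackhole_insert(struct sackhole **head, uint32_t start, uint32_t end)
    {
        struct sackhole *h;

        if (sackholes_in_use >= sackhole_limit)
            return (NULL);        /* caller falls back to non-SACK behaviour */
        if ((h = malloc(sizeof(*h))) == NULL)
            return (NULL);
        h->start = start;
        h->end = end;
        h->next = *head;
        *head = h;
        sackholes_in_use++;
        return (h);
    }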
* Tighten up reset handling in order to make reset attacks as difficult as [silby, 2004-04-26, 1 file, -0/+1]
  possible while maintaining compatibility with the widest range of TCP
  stacks. The algorithm is as follows:
  ---
  For connections in the ESTABLISHED state, only resets with sequence
  numbers exactly matching last_ack_sent will cause a reset, all other
  segments will be silently dropped.
  For connections in all other states, a reset anywhere in the window will
  cause the connection to be reset. All other segments will be silently
  dropped.
  ---
  The necessity of accepting all in-window resets was discovered by jayanth
  and jlemon, both of whom have seen TCP stacks that will respond to FIN-ACK
  packets with resets not meeting the strict last_ack_sent check.
  Idea by: Darren Reed
  Reviewed by: truckman, jlemon, others(?)
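The policy quoted between the "---" markers maps onto a simple per-segment check. The sketch below is illustrative only and uses hypothetical structure and field names rather than the real tcpcb.

    /*
     * Sketch of the reset-acceptance policy described above; the types and
     * field names are hypothetical.
     */
    #include <stdbool.h>
    #include <stdint.h>

    enum tcp_state { ESTABLISHED, OTHER };

    struct conn {
        enum tcp_state state;
        uint32_t last_ack_sent;   /* last ACK number we transmitted */
        uint32_t rcv_nxt;         /* next sequence number expected */
        uint32_t rcv_wnd;         /* current receive window */
    };

    /* Should an incoming RST carrying sequence number 'seq' tear down the
     * connection?  If not, the segment is silently dropped. */
    static bool
    rst_accepted(const struct conn *c, uint32_t seq)
    {
        if (c->state == ESTABLISHED)
            return (seq == c->last_ack_sent);      /* exact match only */
        /* All other states: accept a reset anywhere inside the window
         * (unsigned subtraction handles sequence-number wraparound). */
        return (seq - c->rcv_nxt < c->rcv_wnd);
    }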
* Fix a typo in a comment. [bms, 2004-04-20, 1 file, -1/+1]
* Enhance our RFC1948 implementation to perform better in some pathological [silby, 2004-04-20, 1 file, -0/+1]
  TIME_WAIT recycling cases I was able to generate with http testing tools.
  In short, as the old algorithm relied on ticks to create the time offset
  component of an ISN, two connections with the exact same host, port pair
  that were generated between timer ticks would have the exact same sequence
  number. As a result, the second connection would fail to pass the
  TIME_WAIT check on the server side, and the SYN would never be
  acknowledged.
  I've "fixed" this by adding random positive increments to the time
  component between clock ticks so that ISNs will *always* be increasing, no
  matter how quickly the port is recycled.
  Except in such contrived benchmarking situations, this problem should
  never come up in normal usage... until networks get faster.
  No MFC planned, 4.x is missing other optimizations that are needed to even
  create the situation in which such quick port recycling will occur.
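A small sketch of the idea described in that entry: when two ISNs are requested within the same clock tick, the time component is bumped by a small random positive amount instead of being repeated. All names and the increment range are hypothetical, not the actual FreeBSD code.

    /*
     * Illustrative sketch: keep the ISN time component strictly increasing
     * even when several connections are created within one clock tick.
     */
    #include <stdint.h>
    #include <stdlib.h>

    static uint32_t last_tick;       /* tick at which the last ISN was issued */
    static uint32_t time_component;  /* monotonically increasing time offset */

    static uint32_t
    isn_time_offset(uint32_t now_ticks, uint32_t bytes_per_tick)
    {
        if (now_ticks != last_tick) {
            uint32_t base = now_ticks * bytes_per_tick;

            last_tick = now_ticks;
            /* Never step backwards even if earlier random bumps ran ahead. */
            time_component = (base > time_component) ? base : time_component + 1;
        } else {
            /* Same tick: add a small random positive increment instead of
             * handing out the identical offset again. */
            time_component += 1 + (rand() % 256);
        }
        return (time_component);
    }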
* Remove advertising clause from University of California Regent's [imp, 2004-04-07, 1 file, -4/+0]
  license, per letter dated July 22, 1999 and email from Peter Wemm, Alan
  Cox and Robert Watson.
  Approved by: core, peter, alc, rwatson
* Remove now unneeded arguments to tcp_twrespond() -- so and msrc. These [rwatson, 2004-02-28, 1 file, -1/+1]
  were needed by the MAC Framework until inpcbs gained labels.
  Submitted by: sam
* Fixed namespace pollution in rev.1.74. Implementation of the syncache [bde, 2004-02-25, 1 file, -1/+4]
  increased <netinet/tcp_var.h>'s already large set of prerequisites, and
  this was handled badly. Just don't declare the complete syncache struct
  unless <netinet/pcb.h> is included before <netinet/tcp_var.h>.
  Approved by: jlemon (years ago, for a more invasive fix)
* Don't use the negatively-opaque type uma_zone_t or be chummy with [bde, 2004-02-25, 1 file, -3/+1]
  <vm/uma.h>'s idempotency identifier or its misspelling.
* Convert the tcp segment reassembly queue to UMA and limit the maximum [andre, 2004-02-24, 1 file, -3/+6]
  amount of segments it will hold. The following tuneables and sysctls
  control the behaviour of the tcp segment reassembly queue:
  net.inet.tcp.reass.maxsegments (loader tuneable) specifies the maximum
  number of segments all tcp reassembly queues can hold (defaults to 1/16 of
  nmbclusters).
  net.inet.tcp.reass.maxqlen specifies the maximum number of segments any
  individual tcp session queue can hold (defaults to 48).
  net.inet.tcp.reass.cursegments (readonly) counts the number of segments
  currently in all reassembly queues.
  net.inet.tcp.reass.overflows (readonly) counts how often either the global
  or local queue limit has been reached.
  Tested by: bms, silby
  Reviewed by: bms, silby
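The relationship between those four knobs can be illustrated with a short admission check run before an out-of-order segment is queued. This is a hypothetical sketch, not the actual tcp_reass() code; the names mirror the sysctls above but the logic is simplified.

    /*
     * Sketch of the per-connection and global reassembly-queue limits.
     */
    #include <stdbool.h>

    static int reass_maxsegments = 1024;  /* cf. net.inet.tcp.reass.maxsegments */
    static int reass_maxqlen     = 48;    /* cf. net.inet.tcp.reass.maxqlen */
    static int reass_cursegments;         /* cf. net.inet.tcp.reass.cursegments */
    static int reass_overflows;           /* cf. net.inet.tcp.reass.overflows */

    /* Decide whether one more out-of-order segment may be queued for a
     * connection that already holds 'qlen_this_connection' segments. */
    static bool
    reass_may_queue(int qlen_this_connection)
    {
        if (reass_cursegments >= reass_maxsegments ||
            qlen_this_connection >= reass_maxqlen) {
            reass_overflows++;            /* segment is dropped, not queued */
            return (false);
        }
        reass_cursegments++;
        return (true);
    }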
* Brucification. [bms, 2004-02-13, 1 file, -6/+3]
  Submitted by: bde
* Update the prototype for tcpsignature_apply() to reflect the spelling of [bms, 2004-02-12, 1 file, -1/+1]
  the types used by m_apply()'s callback function, f, as documented in
  mbuf(9).
  Noticed by: njl
* Initial import of RFC 2385 (TCP-MD5) digest support. [bms, 2004-02-11, 1 file, -0/+25]
  This is the first of two commits; bringing in the kernel support first.
  This can be enabled by compiling a kernel with options TCP_SIGNATURE and
  FAST_IPSEC.
  For the uninitiated, this is a TCP option which provides for a means of
  authenticating TCP sessions which came into being before IPSEC. It is
  still relevant today, however, as it is used by many commercial router
  vendors, particularly with BGP, and as such has become a requirement for
  interconnect at many major Internet points of presence.
  Several parts of the TCP and IP headers, including the segment payload,
  are digested with MD5, including a shared secret. The PF_KEY interface is
  used to manage the secrets using security associations in the SADB.
  There is a limitation here: there is no way to map a TCP flow per-port
  back to an SPI without polluting tcpcb or using the SPD, and the code to
  do the latter is unstable at this time. Therefore this code only supports
  per-host keying granularity.
  Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6),
  TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective
  users of this feature, this will not pose any problem.
  This implementation is output-only; that is, the option is honoured when
  responding to a host initiating a TCP session, but no effort is made [yet]
  to authenticate inbound traffic. This is, however, sufficient to interwork
  with Cisco equipment.
  Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with
  local patches. Patches for tcpdump to validate TCP-MD5 sessions are also
  available from me upon request.
  Sponsored by: sentex.net
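For reference, RFC 2385 defines the digest input as the IP pseudo-header, the fixed TCP header with the checksum zeroed (options excluded), the segment payload, and finally the shared key. The sketch below shows that ordering using libmd-style MD5 calls; the struct layout and helper name are hypothetical and it is not the code imported by this commit.

    /*
     * Sketch of the RFC 2385 digest computation over a TCP segment.
     */
    #include <md5.h>          /* MD5Init/MD5Update/MD5Final, as in libmd */
    #include <stdint.h>
    #include <string.h>

    struct pseudo_hdr {       /* IPv4 pseudo-header, network byte order */
        uint32_t src, dst;
        uint8_t  zero, proto; /* proto = 6 for TCP */
        uint16_t tcp_len;     /* TCP header + payload length */
    };

    static void
    tcp_md5_digest(const struct pseudo_hdr *ph, const void *tcp_fixed_hdr,
        const void *payload, size_t paylen,
        const unsigned char *key, size_t keylen, unsigned char digest[16])
    {
        MD5_CTX ctx;
        unsigned char th[20];

        /* Copy the fixed 20-byte TCP header (options are excluded) and
         * zero the checksum field, which occupies bytes 16-17. */
        memcpy(th, tcp_fixed_hdr, sizeof(th));
        th[16] = th[17] = 0;

        MD5Init(&ctx);
        MD5Update(&ctx, ph, sizeof(*ph));     /* 1. IP pseudo-header */
        MD5Update(&ctx, th, sizeof(th));      /* 2. TCP header, checksum = 0 */
        MD5Update(&ctx, payload, paylen);     /* 3. segment payload */
        MD5Update(&ctx, key, keylen);         /* 4. shared secret */
        MD5Final(digest, &ctx);
    }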
* Limiters and sanity checks for TCP MSS (maximum segment size) [andre, 2004-01-08, 1 file, -0/+7]
  resource exhaustion attacks.
  For network link optimization TCP can adjust its MSS and thus packet size
  according to the observed path MTU. This is done dynamically based on
  feedback from the remote host and network components along the packet
  path. This information can be abused to pretend an extremely low path MTU.
  The resource exhaustion works in two ways:
  o during tcp connection setup the advertised local MSS is exchanged
    between the endpoints. The remote endpoint can set this arbitrarily low
    (except for a minimum MTU of 64 octets enforced in the BSD code). When
    the local host is sending data it is forced to send many small IP
    packets instead of a large one. For example instead of the normal TCP
    payload size of 1448 it forces a TCP payload size of 12 (MTU 64) and
    thus we have a 120 times increase in workload and packets. On fast links
    this quickly saturates the local CPU and may also hit pps processing
    limits of network components along the path. This type of attack is
    particularly effective for servers where the attacker can download large
    files (WWW and FTP). We mitigate it by enforcing a minimum MTU settable
    by sysctl net.inet.tcp.minmss defaulting to 256 octets.
  o the local host is receiving data on a TCP connection from the remote
    host. The local host has no control over the packet size the remote host
    is sending. The remote host may choose to do what is described in the
    first attack and send the data in packets with a TCP payload of at least
    one byte. For each packet the tcp_input() function will be entered, the
    packet is processed and a sowakeup() is signalled to the connected
    process. For example an attack with 2 Mbit/s gives 4716 packets per
    second and the same amount of sowakeup()s to the process (and context
    switches). This type of attack is particularly effective for servers
    where the attacker can upload large amounts of data. Normally this is
    the case with WWW servers where large POSTs can be made. We mitigate
    this by calculating the average MSS payload per second. If it goes below
    'net.inet.tcp.minmss' and the pps rate is above
    'net.inet.tcp.minmssoverload' defaulting to 1000 this particular TCP
    connection is reset and dropped.
  MITRE CVE: CAN-2004-0002
  Reviewed by: sam (mentor)
  MFC after: 1 day
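The second mitigation boils down to a periodic test on each connection. The following is a hypothetical sketch of that test, not the committed code; the function name and the way the interval counters are gathered are assumptions, while the two threshold values mirror the sysctl defaults quoted above.

    /*
     * Sketch: drop a connection whose average payload per segment stays
     * below the minimum MSS while its packet rate exceeds the overload
     * threshold.
     */
    #include <stdbool.h>

    static unsigned tcp_minmss         = 256;   /* cf. net.inet.tcp.minmss */
    static unsigned tcp_minmssoverload = 1000;  /* cf. net.inet.tcp.minmssoverload */

    /* Called roughly once per second with the segment and payload counts
     * observed on one connection during that interval. */
    static bool
    connection_should_be_dropped(unsigned packets, unsigned payload_bytes)
    {
        if (packets <= tcp_minmssoverload)
            return (false);                   /* packet rate is harmless */
        /* Average payload per segment over the interval. */
        return (payload_bytes / packets < tcp_minmss);
    }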
* Introduce tcp_hostcache and remove the tcp specific metrics from [andre, 2003-11-20, 1 file, -11/+33]
  the routing table. Move all usage and references in the tcp stack from the
  routing table metrics to the tcp hostcache.
  It caches measured parameters of past tcp sessions to provide better
  initial start values for following connections from or to the same source
  or destination. Depending on the network parameters to/from the remote
  host this can lead to significant speedups for new tcp connections after
  the first one because they inherit and shortcut the learning curve.
  tcp_hostcache is designed for multiple concurrent access in SMP
  environments with high contention and is hash indexed by remote ip
  address. It removes significant locking requirements from the tcp stack
  with regard to the routing table.
  Reviewed by: sam (mentor), bms
  Reviewed by: -net, -current, core@kame.net (IPv6 parts)
  Approved by: re (scottl)
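Conceptually the host cache is a hash table keyed by the remote address, each bucket holding the measured path parameters a new connection can start from. The sketch below is purely illustrative; the entry fields, table size and hash function are all assumptions, not the tcp_hostcache layout.

    /*
     * Sketch of a host cache keyed by remote IPv4 address.
     */
    #include <stdint.h>

    #define HC_BUCKETS 512

    struct hc_entry {
        uint32_t addr;        /* remote IPv4 address, network byte order */
        uint32_t rtt;         /* smoothed RTT estimate from the last session */
        uint32_t ssthresh;    /* slow-start threshold from the last session */
        uint32_t mtu;         /* discovered path MTU */
        int      valid;
    };

    static struct hc_entry hostcache[HC_BUCKETS];

    static struct hc_entry *
    hc_lookup(uint32_t addr)
    {
        /* Simple multiplicative hash on the remote address. */
        struct hc_entry *e = &hostcache[(addr * 2654435761u) % HC_BUCKETS];

        return (e->valid && e->addr == addr) ? e : (struct hc_entry *)0;
    }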
* Add an additional check to the tcp_twrecycleable function; I had [silby, 2003-11-02, 1 file, -0/+1]
  previously only considered the send sequence space. Unfortunately, some
  OSes (windows) still use a random positive increments scheme for their
  syn-ack ISNs, so I must consider receive sequence space as well.
  The value of 250000 bytes / second for Microsoft's ISN rate of increase
  was determined by testing with an XP machine.
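Together with the entry below, the idea is: estimate how far both our own ISN and the peer's likely ISN have advanced since the connection started, and recycle the TIME_WAIT port only once both estimates clear the old send and receive sequence spaces. This is a hedged sketch with hypothetical names and rate constants; the 250000 bytes/second figure comes from the entry above, while the local rate is an assumption.

    /*
     * Sketch of a TIME_WAIT recycling test over both sequence spaces.
     */
    #include <stdbool.h>
    #include <stdint.h>

    #define LOCAL_ISN_BYTES_PER_SECOND  1048576   /* assumed local ISN growth */
    #define REMOTE_ISN_BYTES_PER_SECOND 250000    /* observed Windows XP rate */

    /* SEQ_GT(a, b): is a "greater than" b in 32-bit sequence space? */
    #define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

    static bool
    tw_recycleable(uint32_t iss, uint32_t irs, uint32_t snd_nxt,
        uint32_t rcv_nxt, uint32_t age_seconds)
    {
        /* Project both initial sequence numbers forward in time. */
        uint32_t new_iss = iss + age_seconds * LOCAL_ISN_BYTES_PER_SECOND;
        uint32_t new_irs = irs + age_seconds * REMOTE_ISN_BYTES_PER_SECOND;

        /* Safe to reuse only if both old sequence spaces are left behind. */
        return (SEQ_GT(new_iss, snd_nxt) && SEQ_GT(new_irs, rcv_nxt));
    }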
* - Add a new function tcp_twrecycleable, which tells us if the ISN which [silby, 2003-11-01, 1 file, -0/+2]
    we will generate for a given ip/port tuple has advanced far enough for
    the time_wait socket in question to be safely recycled.
  - Have in_pcblookup_local use tcp_twrecycleable to determine if TIME_WAIT
    sockets which are hogging local ports can be safely freed.
  This change preserves proper TIME_WAIT behavior under normal circumstances
  while allowing for safe and fast recycling whenever ephemeral port space
  is scarce.
* Unify the "send high" and "recover" variables as specified in thehsu2003-07-151-22/+27
| | | | | | | | | | | | lastest rev of the spec. Use an explicit flag for Fast Recovery. [1] Fix bug with exiting Fast Recovery on a retransmit timeout diagnosed by Lu Guohan. [2] Reviewed by: Thomas Henderson <thomas.r.henderson@boeing.com> Reported and tested by: Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2] Approved by: Thomas Henderson <thomas.r.henderson@boeing.com>, Sally Floyd <floyd@acm.org> [1]
* Correct a bug introduced with reduced TCP state handling; make [rwatson, 2003-05-07, 1 file, -1/+1]
  sure that the MAC label on TCP responses during TIMEWAIT is properly set
  from either the socket (if available), or the mbuf that it's responding
  to.
  Unfortunately, this is made somewhat difficult by the TCP code, as
  tcp_twstart() calls tcp_twrespond() after discarding the socket but
  without a reference to the mbuf that causes the "response". Passing both
  the socket and the mbuf works around this--eventually it might be good to
  make sure the mbuf always gets passed in in "response" scenarios, but
  working through this proved to complicate things too much.
  Approved by: re (scottl)
  Reviewed by: hsu
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
* Observe conservation of packets when entering Fast Recovery while [hsu, 2003-04-01, 1 file, -0/+1]
  doing Limited Transmit. Only artificially inflate the congestion window by
  1 segment instead of the usual 3 to take into account the 2 already sent
  by Limited Transmit.
  Approved in principle by: Mark Allman <mallman@grc.nasa.gov>,
    Hari Balakrishnan <hari@nms.lcs.mit.edu>, Sally Floyd <floyd@icir.org>
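The arithmetic behind that entry is small enough to show directly. This is a hypothetical sketch of the window adjustment on entering Fast Recovery, not the committed code; the function and parameter names are assumptions.

    /*
     * Sketch: congestion window on entry to Fast Recovery.  After three
     * duplicate ACKs the window is normally set to ssthresh plus 3
     * segments; with Limited Transmit two segments are already in flight,
     * so only one segment's worth is added (conservation of packets).
     */
    #include <stdbool.h>
    #include <stdint.h>

    static uint32_t
    cwnd_on_fast_recovery(uint32_t ssthresh, uint32_t maxseg,
        bool limited_transmit)
    {
        uint32_t inflation = limited_transmit ? 1 : 3;

        return (ssthresh + inflation * maxseg);
    }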
* Remove a panic(); if the zone allocator can't provide more timewait [jlemon, 2003-03-08, 1 file, -2/+4]
  structures, reuse the oldest one. Also move the expiry timer from a
  per-structure callout to the tcp slow timer.
  Sponsored by: DARPA, NAI Labs
* Add a TCP TIMEWAIT state which uses less space than a fullblown TCP [jlemon, 2003-02-19, 1 file, -0/+18]
  control block. Allow the socket and tcpcb structures to be freed earlier
  than inpcb. Update code to understand an inp w/o a socket.
  Reviewed by: hsu, silby, jayanth
  Sponsored by: DARPA, NAI Labs
* Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so the [jlemon, 2003-02-19, 1 file, -2/+2]
  routine does not require a tcpcb to operate. Since we no longer keep
  template mbufs around, move pseudo checksum out of this routine, and merge
  it with the length update.
  Sponsored by: DARPA, NAI Labs
* Fix NewReno. [hsu, 2003-01-13, 1 file, -1/+3]
  Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>
* Implement TCP bandwidth delay product window limiting, similar to (but [dillon, 2002-08-17, 1 file, -0/+7]
  not meant to duplicate) TCP/Vegas. Add four sysctls and default the
  implementation to 'off'.
  net.inet.tcp.inflight_enable   enable algorithm (defaults to 0=off)
  net.inet.tcp.inflight_debug    debugging (defaults to 1=on)
  net.inet.tcp.inflight_min      minimum window limit
  net.inet.tcp.inflight_max      maximum window limit
  MFC after: 1 week
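The core of bandwidth-delay-product limiting is estimating the pipe size as measured bandwidth times RTT and clamping the send window to it. The sketch below is a rough illustration under assumed names and default values, not the committed algorithm.

    /*
     * Sketch: compute a bandwidth-delay-product window and clamp it
     * between the configured minimum and maximum limits.
     */
    #include <stdint.h>

    static uint32_t inflight_min = 6144;     /* cf. net.inet.tcp.inflight_min (assumed) */
    static uint32_t inflight_max = 1048576;  /* cf. net.inet.tcp.inflight_max (assumed) */

    static uint32_t
    inflight_window(uint32_t acked_bytes, uint32_t interval_ms, uint32_t srtt_ms)
    {
        uint32_t bwnd;

        if (interval_ms == 0)
            return (inflight_max);
        /* bytes/ms of acked data times the smoothed RTT gives the number
         * of bytes the path can hold without building a queue. */
        bwnd = (acked_bytes / interval_ms) * srtt_ms;

        if (bwnd < inflight_min)
            bwnd = inflight_min;
        if (bwnd > inflight_max)
            bwnd = inflight_max;
        return (bwnd);
    }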
* Add the tcps_sndrexmitbad statistic, keep track of late acks that caused [dillon, 2002-07-19, 1 file, -0/+1]
  unnecessary retransmissions.
* Notify functions can destroy the pcb, so they have to return an [hsu, 2002-06-14, 1 file, -3/+6]
  indication of whether this happened so the calling function knows whether
  or not to unlock the pcb.
  Submitted by: Jennifer Yang (yangjihui@yahoo.com)
  Bug reported by: Sid Carter (sidcarter@symonds.net)
* Re-commit w/fix: [silby, 2002-06-14, 1 file, -0/+1]
  Ensure that the syn cache's syn-ack packets contain the same ip_tos,
  ip_ttl, and DF bits as all other tcp packets.
  PR: 39141
  MFC after: 2 weeks
  This time, make sure that ipv4 specific code (aka all of the above) is
  only run in the ipv4 case.
* Back out ip_tos/ip_ttl/DF "fix", it just panic'd my box. :) [silby, 2002-06-14, 1 file, -1/+0]
  Pointy-hat to: silby
* Ensure that the syn cache's syn-ack packets contain the same [silby, 2002-06-14, 1 file, -0/+1]
  ip_tos, ip_ttl, and DF bits as all other tcp packets.
  PR: 39141
  MFC after: 2 weeks
* Lock up inpcb. [hsu, 2002-06-10, 1 file, -0/+1]
  Submitted by: Jennifer Yang <yangjihui@yahoo.com>
* Remove __P. [alfred, 2002-03-19, 1 file, -28/+27]
* Fix a bug with transmitter restart after receiving a 0 window. The [dillon, 2001-12-02, 1 file, -0/+1]
  receiver was not sending an immediate ack with delayed acks turned on when
  the input buffer is drained, preventing the transmitter from restarting
  immediately.
  Propagate the TCP_NODELAY option to accept()ed sockets. (Helps tbench and
  is a good idea anyway).
  Some cleanup. Identify additional issues in comments.
  MFC after: 1 day
* Introduce a syncache, which enables FreeBSD to withstand a SYN flood [jlemon, 2001-11-22, 1 file, -7/+75]
  DoS in an improved fashion over the existing code.
  Reviewed by: silby (in a previous iteration)
  Sponsored by: DARPA, NAI Labs
* Add a flag TF_LASTIDLE, that forces a previously idle connection [jayanth, 2001-10-05, 1 file, -0/+1]
  to send all its data, especially when the data is less than one MSS. This
  fixes an issue where the stack was delaying the sending of data, even
  though there was enough window to send all the data and the sending of
  data was emptying the socket buffer.
  Problem found by Yoshihiro Tsuchiya (tsuchiya@flab.fujitsu.co.jp)
  Submitted by: Jayanth Vijayaraghavan
* Patches from Keiichi SHIMA <keiichi@iij.ad.jp> [julian, 2001-09-03, 1 file, -1/+1]
  to make ip use the standard protosw structure again.
  Obtained from: Well, KAME I guess.
* Much delayed but now present: RFC 1948 style sequence numbers [silby, 2001-08-22, 1 file, -5/+1]
  In order to ensure security and functionality, RFC 1948 style initial
  sequence number generation has been implemented. Barring any major
  cryptographic breakthroughs, this algorithm should be unbreakable. In
  addition, the problems with TIME_WAIT recycling which affect our currently
  used algorithm are not present.
  Reviewed by: jesper
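For context, RFC 1948 derives an ISN as a time-based component plus a keyed hash of the connection tuple, so the sequence space is unique per tuple yet unpredictable without the secret. The sketch below illustrates that structure with libmd-style MD5 calls; the function name, secret size and time-component handling are assumptions, not the committed implementation.

    /*
     * Sketch of RFC 1948 ISN generation:
     *   ISN = time component + hash(local addr/port, foreign addr/port, secret)
     */
    #include <md5.h>
    #include <stdint.h>
    #include <string.h>

    static unsigned char isn_secret[32];   /* periodically re-keyed random data */

    static uint32_t
    rfc1948_isn(uint32_t laddr, uint16_t lport, uint32_t faddr, uint16_t fport,
        uint32_t time_component)
    {
        MD5_CTX ctx;
        unsigned char digest[16];
        uint32_t hash;

        MD5Init(&ctx);
        MD5Update(&ctx, &laddr, sizeof(laddr));
        MD5Update(&ctx, &lport, sizeof(lport));
        MD5Update(&ctx, &faddr, sizeof(faddr));
        MD5Update(&ctx, &fport, sizeof(fport));
        MD5Update(&ctx, isn_secret, sizeof(isn_secret));
        MD5Final(digest, &ctx);
        memcpy(&hash, digest, sizeof(hash));

        return (hash + time_component);
    }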
* Temporary feature: Runtime tuneable tcp initial sequence number [silby, 2001-07-08, 1 file, -0/+1]
  generation scheme.
  Users may now select between the currently used OpenBSD algorithm and the
  older random positive increment method. While the OpenBSD algorithm is
  more secure, it also breaks TIME_WAIT handling; this is causing trouble
  for an increasing number of folks.
  To switch between generation schemes, one sets the sysctl
  net.inet.tcp.tcp_seq_genscheme. 0 = random positive increments, 1 = the
  OpenBSD algorithm. 1 is still the default.
  Once a secure _and_ compatible algorithm is implemented, this sysctl will
  be removed.
  Reviewed by: jlemon
  Tested by: numerous subscribers of -net
* Eliminate the allocation of a tcp template structure for each [silby, 2001-06-23, 1 file, -2/+3]
  connection. The information contained in a tcptemp can be reconstructed
  from a tcpcb when needed.
  Previously, tcp templates required the allocation of one mbuf per
  connection. On large systems, this change should free up a large number
  of mbufs.
  Reviewed by: bmilekic, jlemon, ru
  MFC after: 2 weeks
* Randomize the TCP initial sequence numbers more thoroughly. [kris, 2001-04-17, 1 file, -0/+4]
  Obtained from: OpenBSD
  Reviewed by: jesper, peter, -developers
* Remove in_pcbnotify and use in_pcblookup_hash to find the cb directly. [jlemon, 2001-02-26, 1 file, -1/+0]
  For TCP, verify that the sequence number in the ICMP packet falls within
  the tcp receive window before performing any actions indicated by the icmp
  packet.
  Clean up some layering violations (access to tcp internals from in_pcb).
* Remove tcp_drop_all_states, which is unneeded after jlemon removed it [jesper, 2001-02-25, 1 file, -1/+0]
  from tcp_subr.c in rev 1.92.
* Remove unneeded loop increment in src/sys/netinet/in_pcb.c:in_pcbnotify [phk, 2001-02-18, 1 file, -0/+1]
  Add new PRC_UNREACH_ADMIN_PROHIB in sys/sys/protosw.h
  Remove condition on TCP in src/sys/netinet/ip_icmp.c:icmp_input
  In src/sys/netinet/ip_icmp.c:icmp_input set code = PRC_UNREACH_ADMIN_PROHIB
  or PRC_UNREACH_HOST for all unreachables except ICMP_UNREACH_NEEDFRAG
  Rename sysctl icmp_admin_prohib_like_rst to icmp_unreach_like_rst to
  reflect the fact that we also react on ICMP unreachables that are not
  administratively prohibited. Also update the comments to reflect this.
  In sys/netinet/tcp_subr.c:tcp_ctlinput add code to treat
  PRC_UNREACH_ADMIN_PROHIB and PRC_UNREACH_HOST differently.
  PR: 23986
  Submitted by: Jesper Skriver <jesper@skriver.dk>
* Update the "icmp_admin_prohib_like_rst" code to check the tcp-window andphk2000-12-241-0/+1
| | | | | | | to be configurable with respect to acting only in SYN or in all TCP states. PR: 23665 Submitted by: Jesper Skriver <jesper@skriver.dk>