FreeBSD-src - Raptor Engineering's fork of pfSense FreeBSD src with pfSense changes

Commit message	Author	Age	Files	Lines
*	Fix for a bug in the change that defers sack option processing until	ps	2005-07-01	1	-2/+1
	after PAWS checks. The symptom of this is an inconsistency in the cached sack state, caused by the fact that the sack scoreboard was not being updated for an ACK handled in the header prediction path. Found by: Andrey Chernov. Submitted by: Noritoshi Demizu, Raja Mukerji. Approved by: re
*	Assert tcbinfo lock in tcp_drop() due to its call of tcp_close()	rwatson	2005-06-01	1	-0/+6
	Assert tcbinfo lock in tcp_close() due to its call to in{,6}_detach(). Assert tcbinfo lock in tcp_drop_syn_sent() due to its call to tcp_drop(). MFC after: 7 days
*	Fix two issues which were missed in FreeBSD-SA-05:08.kmem.	cperciva	2005-05-07	1	-0/+2
	Reported by: Uwe Doering
*	If we don't get a suggested MTU during path MTU discovery	andre	2005-05-04	1	-9/+20
	look up the packet size of the packet that generated the response, step down the MTU by one step through ip_next_mtu() and try again. Suggested by: dwmalone
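The fallback in this commit, stepping the MTU down by one plateau when the ICMP message carries no suggested MTU, can be sketched as a search through a descending table. This is a user-space sketch, not the kernel code: the table holds the plateau values listed in RFC 1191, Section 7, and the helper name `next_mtu_down()` is hypothetical; the kernel's ip_next_mtu() may use a different table and signature.

```c
#include <assert.h>
#include <stddef.h>

/* MTU plateaus from RFC 1191, Section 7, in descending order.
 * Assumption: the kernel's own table may contain additional entries. */
static const int mtu_plateaus[] = {
	65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68
};

/*
 * Hypothetical helper: return the largest plateau strictly smaller than
 * 'mtu' (the size of the packet that triggered the ICMP message), or 0
 * if no smaller plateau is left to try.
 */
static int
next_mtu_down(int mtu)
{
	size_t i;

	assert(mtu > 0);
	for (i = 0; i < sizeof(mtu_plateaus) / sizeof(mtu_plateaus[0]); i++) {
		/* The table is descending, so the first hit is the largest. */
		if (mtu_plateaus[i] < mtu)
			return (mtu_plateaus[i]);
	}
	return (0);
}
```

For example, a 1500-byte packet that triggers a needfrag without an MTU hint would be retried at 1492 bytes, then 1006, and so on; a return of 0 means no smaller plateau remains.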
*	- Make the sack scoreboard logic use the TAILQ macros. This improves	ps	2005-04-21	1	-0/+2
	code readability and facilitates some anticipated optimizations in tcp_sack_option(). - Remove tcp_print_holes() and TCP_SACK_DEBUG. Submitted by: Raja Mukerji. Reviewed by: Mohan Srinivasan, Noritoshi Demizu.
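A scoreboard kept on the queue(3) TAILQ macros might look roughly like the sketch below. The `struct sackhole` layout and the helper names are hypothetical illustrations of the pattern, not the fields or functions in tcp_sack.c; only the TAILQ_* macros from <sys/queue.h> are real.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical SACK hole: a range of sequence space not yet SACKed. */
struct sackhole {
	uint32_t start;			/* first missing sequence number */
	uint32_t end;			/* one past the last missing one */
	TAILQ_ENTRY(sackhole) link;	/* linkage within the scoreboard */
};
TAILQ_HEAD(sackhole_head, sackhole);

/* Append a hole; the caller keeps holes in ascending sequence order. */
static struct sackhole *
hole_append(struct sackhole_head *head, uint32_t start, uint32_t end)
{
	struct sackhole *h = malloc(sizeof(*h));

	if (h == NULL)
		return (NULL);
	h->start = start;
	h->end = end;
	TAILQ_INSERT_TAIL(head, h, link);
	return (h);
}

/* Total number of bytes still missing across the scoreboard. */
static uint32_t
hole_bytes(struct sackhole_head *head)
{
	struct sackhole *h;
	uint32_t total = 0;

	TAILQ_FOREACH(h, head, link)
		total += h->end - h->start;
	return (total);
}

/* Free every hole, e.g. when recovery ends. */
static void
hole_free_all(struct sackhole_head *head)
{
	struct sackhole *h;

	while ((h = TAILQ_FIRST(head)) != NULL) {
		TAILQ_REMOVE(head, h, link);
		free(h);
	}
}
```

Using TAILQ_FIRST/TAILQ_REMOVE in the teardown loop avoids touching an element after it has been freed, which a plain TAILQ_FOREACH would do.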
*	Move Path MTU discovery ICMP processing from icmp_input() to	andre	2005-04-21	1	-7/+36
	tcp_ctlinput() and subject it to active tcpcb and sequence number checking. Previously any ICMP unreachable/needfrag message would cause an update to the TCP hostcache. Now only ICMP PMTU messages belonging to an active TCP session with the correct src/dst/port and sequence number will update the hostcache and complete the path MTU discovery process. Note that we don't entirely implement the recommended countermeasures of Section 7.2 of the paper. However, exploiting this degradation vector now goes from trivially easy to complex and resource intensive. In addition, we have limited the smallest acceptable MTU with the net.inet.tcp.minmss sysctl for some time already, further reducing the effect of any degradation due to an attack. Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.2 MFC after: 3 days
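The sequence number check can be sketched with the usual modular (serial-number) comparisons. This sketch assumes the TCP sequence number quoted in the ICMP payload must lie in [snd_una, snd_max), i.e. refer to data still outstanding; `seq_lt()`, `seq_geq()`, `struct conn_state` and `icmp_seq_acceptable()` are hypothetical stand-ins for the kernel's SEQ_* macros and tcpcb fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modular 32-bit sequence comparisons, in the spirit of SEQ_LT/SEQ_GEQ. */
static int seq_lt(uint32_t a, uint32_t b)  { return ((int32_t)(a - b) < 0); }
static int seq_geq(uint32_t a, uint32_t b) { return ((int32_t)(a - b) >= 0); }

/* Hypothetical subset of per-connection state. */
struct conn_state {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_max;	/* highest sequence number sent so far */
};

/*
 * Accept an ICMP "fragmentation needed" message only if the quoted TCP
 * sequence number refers to data that is still outstanding.
 */
static int
icmp_seq_acceptable(const struct conn_state *cs, uint32_t icmp_seq)
{
	assert(cs != NULL);
	return (seq_geq(icmp_seq, cs->snd_una) && seq_lt(icmp_seq, cs->snd_max));
}
```

Because the comparison uses the signed difference, the window test still works when snd_una and snd_max straddle the 32-bit wrap point. A blind attacker now has to guess an in-window sequence number as well as the connection's addresses and ports.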
*	Ignore ICMP Source Quench messages for TCP sessions. Source Quench is	andre	2005-04-21	1	-24/+11
	ineffective, deprecated and can be abused to degrade the performance of active TCP sessions if spoofed. Replace a bogus call to tcp_quench() in tcp_output() with the direct equivalent tcpcb variable assignment. Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.1 MFC after: 3 days
*	- If the reassembly queue limit was reached or if we couldn't allocate	ps	2005-04-10	1	-0/+1
	a reassembly queue state structure, don't update (receiver) sack report. - Similarly, if tcp_drain() is called, freeing up all items on the reassembly queue, clean the sack report. Found, Submitted by: Noritoshi Demizu <demizu at dd dot iij4u dot or dot jp> Reviewed by: Mohan Srinivasan (mohans at yahoo-inc dot com), Raja Mukerji (raja at moselle dot com).
*	Use NET_CALLOUT_MPSAFE macro.	glebius	2005-03-01	1	-7/+5
*	o Add handling of an IPv4-mapped IPv6 address.	maxim	2005-02-14	1	-0/+98
	o Use SYSCTL_IN() macro instead of a direct call to copyin(9). Submitted by: ume o Move sysctl_drop() implementation to sys/netinet/tcp_subr.c where most of the tcp sysctls live. o There are net.inet[6].tcp[6].getcred sysctls already, so there is no need for a separate struct tcp_ident_mapping. Suggested by: ume
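Handling an IPv4-mapped IPv6 address (::ffff:a.b.c.d) means recognizing it and extracting the embedded IPv4 address before looking up an IPv4 connection. The sketch below uses the standard IN6_IS_ADDR_V4MAPPED() macro from <netinet/in.h>; the helper `v4mapped_to_v4()` is hypothetical and only illustrates the kind of check such a sysctl handler has to make.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical helper: if 'a6' is an IPv4-mapped IPv6 address, copy the
 * embedded IPv4 address into '*a4' and return 1; otherwise return 0.
 */
static int
v4mapped_to_v4(const struct in6_addr *a6, struct in_addr *a4)
{
	if (!IN6_IS_ADDR_V4MAPPED(a6))
		return (0);
	/* The IPv4 address occupies the last 4 bytes, in network order. */
	memcpy(&a4->s_addr, &a6->s6_addr[12], sizeof(a4->s_addr));
	return (1);
}
```

A caller would then use the IPv4 lookup path for mapped addresses and the IPv6 path for everything else.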
*	Teach net.inet6.tcp6.getcred about the scope of IPv6 addresses.	ume	2005-02-04	1	-4/+10
	MFC after: 1 week
*	Update an additional reference to the rate of ISN tick callouts that was	rwatson	2005-01-31	1	-1/+1
	missed in tcp_subr.c:1.216: projected_offset must also reflect how often the tcp_isn_tick() callout will fire. MFC after: 2 weeks Submitted by: silby
*	Have tcp_isn_tick() fire 100 times a second, rather than HZ times a	rwatson	2005-01-30	1	-1/+1
	second; since the default hz has changed to 1000 times a second, this resulted in unnecessary work being performed. MFC after: 2 weeks Discussed with: phk, cperciva General head nod: silby
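The arithmetic behind these two commits: if the callout fires 100 times a second regardless of hz, it must be rescheduled every hz/100 clock ticks, and each firing must advance the ISN offset by one hundredth of the per-second rate. This is a sketch, not tcp_subr.c: the constant `ISN_BYTES_PER_SECOND` (1048576) and the names `struct isn_state`, `isn_callout_interval()` and `isn_tick()` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants; the values used in tcp_subr.c may differ. */
#define ISN_BYTES_PER_SECOND	1048576u
#define ISN_TICKS_PER_SECOND	100u	/* callout rate, independent of hz */

struct isn_state {
	uint32_t offset;	/* current time component of the ISN */
	uint32_t offset_old;	/* value at the previous tick */
};

/* Callout interval in kernel clock ticks for a given hz. */
static int
isn_callout_interval(int hz)
{
	assert(hz >= (int)ISN_TICKS_PER_SECOND);
	return (hz / (int)ISN_TICKS_PER_SECOND);
}

/*
 * One tick: move the offset to where the configured rate says it should
 * be, unless new connections have already pushed it further ahead.
 */
static void
isn_tick(struct isn_state *st)
{
	uint32_t projected;

	projected = st->offset_old + ISN_BYTES_PER_SECOND / ISN_TICKS_PER_SECOND;
	if ((int32_t)(projected - st->offset) > 0)
		st->offset = projected;
	st->offset_old = st->offset;
}
```

With hz = 1000 the callout is rescheduled every 10 ticks. If the per-tick step were still divided by hz while the callout only ran 100 times a second, the offset would advance ten times too slowly, which is the mismatch the projected_offset fix addresses.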
*	/* -> /*- for license, minor formatting changes	imp	2005-01-07	1	-1/+1
*	Attempt to consistently use () around return values in calls to	rwatson	2004-12-23	1	-18/+18
	return() in newer code (sysctl, ISN, timewait).
*	Remove an XXXRW comment relating to whether or not the TCP timers are	rwatson	2004-12-23	1	-6/+1
	MPSAFE: they are now believed to be. Correct a typo in a second comment. MFC after: 2 weeks
*	Assert inpcb lock in:	rwatson	2004-12-05	1	-1/+12
	tcpip_fillheaders(), tcp_discardcb(), tcp_close(), tcp_notify(), tcp_new_isn(), tcp_xmit_bandwidth_limit(). Fix a locking comment in tcp_twstart(): the pcbinfo will be locked (and is asserted). MFC after: 2 weeks
*	tcp_timewait() performs multiple non-atomic reads on the tcptw	rwatson	2004-11-23	1	-0/+2
	structure, so assert the inpcb lock associated with the tcptw. Also assert the tcbinfo lock, as tcp_timewait() may call tcp_twclose() or tcp_timer_2msl_reset(), which require it. Since tcp_timewait() is already called with that lock from tcp_input(), this doesn't change current locking, merely documents the reasons for it. In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_reset() is called, which requires that lock. In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop() is called, which requires that lock. Document the locking strategy for the time wait queues in tcp_timer.c, which consists of protecting the time wait queues in the same manner as the tcbinfo structure (using the tcbinfo lock). In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait queues are modified. In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait queues may be modified. In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait queues may be modified. MFC after: 2 weeks
*	Assert the inpcb lock in tcp_twstart(), which does both read-modify-write	rwatson	2004-11-23	1	-0/+10
	on the tcpcb, but also calls into tcp_close() and tcp_twrespond(). Annotate that tcp_twrecycleable() requires the inpcb lock because it does a series of non-atomic reads of the tcpcb, but is currently called without the inpcb lock by the caller. This is a bug. Assert the inpcb lock in tcp_twclose() as it performs a read-modify-write of the timewait structure/inpcb, and calls in_pcbdetach() which requires the lock. Assert the inpcb lock in tcp_twrespond(), as it performs multiple non-atomic reads of the tcptw and inpcb structures, as well as calling mac_create_mbuf_from_inpcb(), tcpip_fillheaders(), which require the inpcb lock. MFC after: 2 weeks
*	Assert inpcb lock in tcp_quench(), tcp_drop_syn_sent(), tcp_mtudisc(),	rwatson	2004-11-23	1	-0/+4
	and tcp_drop(), due to read-modify-write of TCP state variables. MFC after: 2 weeks
*	Assert the tcbinfo write lock in tcp_new_isn(), as the tcbinfo lock	rwatson	2004-11-23	1	-4/+11
	protects access to the ISN state variables. Acquire the tcbinfo write lock in tcp_isn_tick() to synchronize timer-driven isn bumping. Staticize internal ISN variables since they're not used outside of tcp_subr.c. MFC after: 2 weeks
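The split described here, where the timer path takes the lock itself and the connection path only asserts that its caller already holds it, can be modelled in a few lines. This is a deliberately single-threaded sketch: `struct info_lock`, `INFO_LOCK_ASSERT()`, `isn_tick_locked()` and `new_isn_checked()` are hypothetical stand-ins for the tcbinfo mutex and its assertion macros, and the 4096-byte per-connection step is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the tcbinfo lock: only tracks whether it is held. */
struct info_lock {
	int held;
};

static void info_lock_acquire(struct info_lock *l) { assert(!l->held); l->held = 1; }
static void info_lock_release(struct info_lock *l) { assert(l->held);  l->held = 0; }
#define INFO_LOCK_ASSERT(l)	assert((l)->held)

/* File-static ISN state, protected by the lock. */
static struct info_lock tcbinfo_lock;
static uint32_t isn_offset;

/* Timer path: nobody holds the lock for us, so acquire it here. */
static void
isn_tick_locked(uint32_t step)
{
	info_lock_acquire(&tcbinfo_lock);
	isn_offset += step;
	info_lock_release(&tcbinfo_lock);
}

/* Connection path: the caller must already hold the lock; assert it. */
static uint32_t
new_isn_checked(uint32_t hash)
{
	INFO_LOCK_ASSERT(&tcbinfo_lock);
	isn_offset += 4096;	/* assumed fixed per-connection increment */
	return (hash + isn_offset);
}
```

The assertion documents the locking contract at the point of use: calling new_isn_checked() without the lock trips the assert instead of silently racing with the timer.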
*	support TCP-MD5(IPv4) in KAME-IPSEC, too.	suz	2004-11-08	1	-0/+1
	MFC after: 3 weeks
*	Remove RFC1644 T/TCP support from the TCP side of the network stack.	andre	2004-11-02	1	-39/+2
	A complete rationale and discussion is given in this message and the resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706 Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplementation of the previous T/TCP functionality. Discussed on: -arch
*	Push acquisition of the accept mutex out of sofree() into the caller	rwatson	2004-10-18	1	-0/+1
	(sorele()/sotryfree()): - This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be freed. - This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket. This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket. The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements. RELENG_5_3 candidate. MFC after: 3 days Reviewed by: dwhite Discussed with: gnn, dwhite, green Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de> Reported by: Vlad <marchenko at gmail dot com>
*	- Estimate the amount of data in flight in sack recovery and use it	ps	2004-10-05	1	-5/+0
	to control the packets injected while in sack recovery (for both retransmissions and new data). - Cleanups to the sack codepaths in tcp_output.c and tcp_sack.c. - Add a new sysctl (net.inet.tcp.sack.initburst) that controls the number of sack retransmissions done upon initiation of sack recovery. Submitted by: Mohan Srinivasan <mohans@yahoo-inc.com>
*	fix up socket/ip layer violation... don't assume/know that	jmg	2004-09-05	1	-1/+2
	SO_DONTROUTE == IP_ROUTETOIF and SO_BROADCAST == IP_ALLOWBROADCAST...
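The layering fix amounts to translating socket-level option bits into IP output flags explicitly, rather than masking so_options and handing the result to the IP layer on the assumption that the bit values coincide. In the sketch below all four constants are hypothetical and deliberately given different values per layer, so the test would catch a mask-and-pass shortcut; `sockopts_to_ipflags()` is likewise a hypothetical name.

```c
#include <assert.h>

/* Hypothetical flag values, deliberately different between layers. */
#define SOCK_OPT_DONTROUTE	0x0010	/* socket-layer option bits */
#define SOCK_OPT_BROADCAST	0x0020
#define IPOUT_ROUTETOIF		0x0001	/* IP-output flag bits */
#define IPOUT_ALLOWBROADCAST	0x0002

/*
 * Map each socket option to its IP output flag one by one, so neither
 * layer depends on the other's bit assignments.
 */
static int
sockopts_to_ipflags(int so_options)
{
	int flags = 0;

	if (so_options & SOCK_OPT_DONTROUTE)
		flags |= IPOUT_ROUTETOIF;
	if (so_options & SOCK_OPT_BROADCAST)
		flags |= IPOUT_ALLOWBROADCAST;
	return (flags);
}
```

Unrelated socket options simply produce no IP flags, and either layer can renumber its bits without breaking the other.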
*	For IPv6, access the pointer to the tcpcb only after we have checked that it is valid.	andre	2004-08-19	1	-1/+4
	Found by: Coverity's automated analysis (via Ted Unangst)
*	White space cleanup for netinet before branch:	rwatson	2004-08-16	1	-68/+68
	- Trailing tab/space cleanup - Remove spurious spaces between or before tabs This change avoids touching files that Andre likely has in his working set for PFIL hooks changes for IPFW/DUMMYNET. Approved by: re (scottl) Submitted by: Xin LI <delphij@frontfree.net>
*	In tcp6_ctlinput, lock tcbinfo around the call to syncache_unreach	dwmalone	2004-08-12	1	-0/+2
	so that the locks held are the same as the IPv4 case. Reviewed by: rwatson
*	Backout removal of UMA_ZONE_NOFREE flag for all zones which are established	andre	2004-08-11	1	-4/+4
	for structures with timers in them. A timer might fire even after the associated structure has already been freed. Having type-stable storage in this case is beneficial for graceful failure handling and debugging. Discussed with: bosko, tegge, rwatson
*	Remove the UMA_ZONE_NOFREE flag from all uma_zcreate() calls in the IP and	andre	2004-08-11	1	-4/+4
	TCP code. This flag would have prevented giving back excessive free slabs to the global pool after a transient peak usage.
*	Pass pcbinfo structures to in6_pcbnotify() rather than pcbhead	rwatson	2004-08-06	1	-2/+2
	structures, allowing in6_pcbnotify() to lock the pcbinfo and each inpcb that it notifies of ICMPv6 events. This prevents inpcb assertions from firing when IPv6 generates and delivers event notifications for inpcbs. Reported by: kuriyama Tested by: kuriyama
*	o Move the inflight sysctls to their own sub-tree under net.inet.tcp to be	andre	2004-08-03	1	-5/+9
	more consistent with the other sysctls around it.
*	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is	cperciva	2004-07-26	1	-2/+2
	somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb
*	Let IN_FASTRECOVERY macro decide if we are in recovery mode.	jayanth	2004-07-19	1	-4/+0
	Nuke sackhole_limit for now. We need to add it back to limit the total number of sack blocks in the system.
*	Move the sack sysctl's under net.inet.tcp.sack	ps	2004-06-23	1	-4/+4
	net.inet.tcp.do_sack -> net.inet.tcp.sack.enable; net.inet.tcp.sackhole_limit -> net.inet.tcp.sack.sackhole_limit. Requested by: wollman
*	Add support for TCP Selective Acknowledgements. The work for this	ps	2004-06-23	1	-0/+16
	originated on RELENG_4 and was ported to -CURRENT. The scoreboarding code was obtained from OpenBSD, and many of the remaining changes were inspired by OpenBSD, but not taken directly from there. You can enable/disable sack using net.inet.tcp.do_sack. You can also limit the number of sack holes that all senders can have in the scoreboard with net.inet.tcp.sackhole_limit. Reviewed by: gnn Obtained from: Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan)
*	If debug.mpsafenet is set, initialize TCP callouts as CALLOUT_MPSAFE.	rwatson	2004-06-20	1	-5/+12
*	Extend coverage of SOCK_LOCK(so) to include so_count, the socket	rwatson	2004-06-12	1	-2/+3
	reference count: - Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele(). - Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree(). - Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers. - In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket. - Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS
*	Switch to using the inpcb MAC label instead of socket MAC label when	rwatson	2004-05-04	1	-2/+7
	labeling new mbufs created from sockets/inpcbs in IPv4. This helps avoid the need for socket layer locking in the lower level network paths where inpcb locks are already frequently held where needed. In particular: - Use the inpcb for label instead of socket in raw_append(). - Use the inpcb for label instead of socket in tcp_output(). - Use the inpcb for label instead of socket in tcp_respond(). - Use the inpcb for label instead of socket in tcp_twrespond(). - Use the inpcb for label instead of socket in syncache_respond(). While here, modify tcp_respond() to avoid assigning NULL to a stack variable and centralize assertions about the inpcb when inp is assigned. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research
*	Enhance our RFC1948 implementation to perform better in some pathological	silby	2004-04-20	1	-2/+53
	TIME_WAIT recycling cases I was able to generate with http testing tools. In short, as the old algorithm relied on ticks to create the time offset component of an ISN, two connections with the exact same host, port pair that were generated between timer ticks would have the exact same sequence number. As a result, the second connection would fail to pass the TIME_WAIT check on the server side, and the SYN would never be acknowledged. I've "fixed" this by adding random positive increments to the time component between clock ticks so that ISNs will always be increasing, no matter how quickly the port is recycled. Except in such contrived benchmarking situations, this problem should never come up in normal usage... until networks get faster. No MFC planned, 4.x is missing other optimizations that are needed to even create the situation in which such quick port recycling will occur.
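The fix can be sketched as bumping the time component by a fixed amount plus a random amount on every ISN request, so two connections on the same 4-tuple created within one clock tick still get strictly increasing ISNs. This is an illustration under stated assumptions: the increments (4096 fixed, up to 4095 random) are assumed values, `secret_hash` stands in for the RFC 1948 hash of the 4-tuple and a secret, and the xorshift32 generator `prng32()` is a non-cryptographic stand-in for the kernel's random source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed increments; the real values in tcp_subr.c may differ. */
#define ISN_STATIC_INCREMENT	4096u
#define ISN_RANDOM_INCREMENT	(4096u - 1u)	/* used as a bit mask */

static uint32_t isn_offset;			/* time component, bumped per call */
static uint32_t prng_state = 2463534242u;	/* hypothetical nonzero seed */

/* xorshift32: stand-in for the kernel's random source. Not cryptographic. */
static uint32_t
prng32(void)
{
	uint32_t x = prng_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	prng_state = x;
	return (x);
}

/*
 * Every call moves the offset forward by at least ISN_STATIC_INCREMENT
 * and at most ISN_STATIC_INCREMENT + ISN_RANDOM_INCREMENT, so ISNs for
 * the same 'secret_hash' increase even between clock ticks.
 */
static uint32_t
new_isn(uint32_t secret_hash)
{
	isn_offset += ISN_STATIC_INCREMENT + (prng32() & ISN_RANDOM_INCREMENT);
	return (secret_hash + isn_offset);
}
```

Keeping the random part bounded preserves the property the TIME_WAIT check relies on (a later SYN carries a higher sequence number) while still making the next ISN hard to predict exactly.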
*	Remove advertising clause from University of California Regent's	imp	2004-04-07	1	-4/+0
	license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
*	Two missed in previous commit -- compare pointer with NULL rather than	rwatson	2004-04-05	1	-2/+2
	using it as a boolean.
*	Prefer NULL to 0 when checking pointer values as integers or booleans.	rwatson	2004-04-05	1	-19/+20
*	Remove now unneeded arguments to tcp_twrespond() -- so and msrc. These	rwatson	2004-02-28	1	-10/+2
	were needed by the MAC Framework until inpcbs gained labels. Submitted by: sam
*	Split the mlock() kernel code into two parts, mlock(), which unpacks	truckman	2004-02-26	1	-1/+3
	the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms
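The "wire only the minimum" rule can be sketched as follows. Everything here is a hypothetical, user-space model: `struct sysctl_req_sketch`, `wire_old_buffer()` and the 64 KiB limit standing in for RLIMIT_MEMLOCK are illustrations, treating an estimate of 0 as "no estimate" is an assumption, and returning ENOMEM on failure is likewise assumed rather than taken from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified view of a sysctl request. */
struct sysctl_req_sketch {
	size_t oldlen;		/* buffer length supplied by the caller */
	size_t validlen;	/* bytes actually wired for this request */
};

/* Hypothetical limit standing in for RLIMIT_MEMLOCK. */
static size_t memlock_limit = 64 * 1024;
static size_t memlock_used;

/*
 * Wire only min(oldlen, estimate) bytes. Fail instead of exceeding the
 * limit, so every caller has to check the return value.
 */
static int
wire_old_buffer(struct sysctl_req_sketch *req, size_t estimate)
{
	size_t len = req->oldlen;

	if (estimate != 0 && estimate < len)
		len = estimate;
	if (len > memlock_limit - memlock_used)
		return (ENOMEM);
	memlock_used += len;
	req->validlen = len;
	return (0);
}
```

A handler that passes a realistic estimate wires only that much, even if the user supplied a much larger buffer; a handler that passes no estimate risks hitting the limit on a large request.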
*	Convert the tcp segment reassembly queue to UMA and limit the maximum	andre	2004-02-24	1	-2/+7
	amount of segments it will hold. The following tuneables and sysctls control the behaviour of the tcp segment reassembly queue: net.inet.tcp.reass.maxsegments (loader tuneable) specifies the maximum number of segments all tcp reassembly queues can hold (defaults to 1/16 of nmbclusters). net.inet.tcp.reass.maxqlen specifies the maximum number of segments any individual tcp session queue can hold (defaults to 48). net.inet.tcp.reass.cursegments (readonly) counts the number of segments currently in all reassembly queues. net.inet.tcp.reass.overflows (readonly) counts how often either the global or local queue limit has been reached. Tested by: bms, silby Reviewed by: bms, silby
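The interplay of the global and per-session limits can be sketched as a single admission check. The struct fields mirror the four reass.* knobs described above, but the names `struct reass_limits`, `reass_admit()` and `reass_release()` are hypothetical; the kernel performs the equivalent checks inline when queuing a segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters mirroring the net.inet.tcp.reass.* knobs. */
struct reass_limits {
	int maxsegments;	/* global cap across all reassembly queues */
	int maxqlen;		/* cap for any single connection's queue */
	int cursegments;	/* segments queued right now, all connections */
	int overflows;		/* times either limit was hit */
};

/*
 * May one more out-of-order segment be queued for a connection that
 * already holds 'qlen' segments? On success the global count is charged.
 */
static bool
reass_admit(struct reass_limits *rl, int qlen)
{
	assert(rl != NULL && qlen >= 0);
	if (rl->cursegments >= rl->maxsegments || qlen >= rl->maxqlen) {
		rl->overflows++;
		return (false);
	}
	rl->cursegments++;
	return (true);
}

/* Called when a queued segment is delivered or freed. */
static void
reass_release(struct reass_limits *rl)
{
	assert(rl->cursegments > 0);
	rl->cursegments--;
}
```

A rejected segment is simply dropped and the sender retransmits it later, so the limits trade some retransmissions for a hard bound on memory held by out-of-order data.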
*	Fixed ucred structure leak.	pjd	2004-02-19	1	-0/+2
	Approved by: scottl (mentor) PR: 54163 MFC after: 3 days
*	Final brucification pass. Spell types consistently (u_int). Remove bogus	bms	2004-02-14	1	-1/+1
	casts. Remove unnecessary parentheses. Submitted by: bde
*	Brucification.	bms	2004-02-13	1	-10/+14
	Submitted by: bde