FreeBSD-src - Raptor Engineering's fork of pfSense FreeBSD src with pfSense changes

	Commit message	Author	Age	Files	Lines
*	tcp/lro: Allow drivers to set the TCP ACK/data segment aggregation limit	sephe	2016-02-18	2	-2/+18
	The ACK aggregation limit is append-count based, while the TCP data segment aggregation limit is length based. Unless the network driver sets these two limits, this change is a no-op.
	Reviewed by: adrian, gallatin (previous version), hselasky (previous version)
	Approved by: adrian (mentor)
	MFC after: 1 week
	Sponsored by: Microsoft OSTC
	Differential Revision: https://reviews.freebsd.org/D5185
*	Add protection code for issues reported by PVS / D5245.	tuexen	2016-02-17	1	-2/+4
	MFC after: 3 days
*	Code cleanup which will silence a warning in PVS / D5245.	tuexen	2016-02-17	2	-7/+3
*	Address a warning reported by D5245 / PVS.	tuexen	2016-02-17	1	-2/+2
	MFC after: 3 days
*	Whitespace changes.	tuexen	2016-02-16	4	-6/+7
*	Improve the teardown of the SCTP stack.	tuexen	2016-02-16	5	-54/+109
	Obtained from: bz@
	MFC after: 1 week
*	Loopback addresses are 127.0.0.0/8, not 127.0.0.1/32.	tuexen	2016-02-11	1	-4/+1
	MFC after: 1 week
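	The difference between the two prefixes in the commit subject above can be illustrated with a minimal sketch. The helper names below are hypothetical and are not taken from the SCTP sources; addresses are assumed to be in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a host-byte-order IPv4 address lies in
 * 127.0.0.0/8, i.e. its top 8 bits equal 127.
 */
bool
demo_is_loopback_slash8(uint32_t addr)
{
	return ((addr >> 24) == 127);
}

/* Narrower check that accepts only 127.0.0.1/32. */
bool
demo_is_loopback_slash32(uint32_t addr)
{
	return (addr == 0x7f000001u);
}

/* Usage: 127.0.0.2 is loopback under /8 but is missed by the /32 check. */
void
demo_loopback_usage(void)
{
	assert(demo_is_loopback_slash8(0x7f000002u));
	assert(!demo_is_loopback_slash32(0x7f000002u));
}
```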
*	Use 4 spaces instead of a tab.	tuexen	2016-02-11	1	-4/+4
*	Merge SVN r295220 (bz) from projects/vnet/	dteske	2016-02-11	1	-1/+2
	Fix a panic that occurs when a vnet interface is unavailable at the time the vnet jail referencing said interface is stopped.
	Sponsored by: FIS Global, Inc.
*	Use a pair of ifs when comparing the 32-bit flowid integers so that	hselasky	2016-02-11	1	-3/+4
	the sign bit doesn't cause an overflow. The overflow manifests itself as a sorting index wrap around in the middle of the sorted array, which is not a problem for the LRO code, but might be a problem for the logic inside qsort().
	Reviewed by: gnn@
	Sponsored by: Mellanox Technologies
	Differential Revision: https://reviews.freebsd.org/D5239
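	The comparison issue described in the commit message above can be sketched in userland C. This is an illustration only, not the LRO code itself: the struct and function names are hypothetical. It contrasts a subtraction-based comparator with a comparator that uses a pair of ifs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record; only the 32-bit flowid matters for this sketch. */
struct demo_pkt {
	uint32_t flowid;
};

/*
 * Fragile variant: the 32-bit difference is squeezed into an int, so the
 * sign of the result is wrong once the values are far enough apart.
 */
int
demo_cmp_by_subtraction(const void *a, const void *b)
{
	const struct demo_pkt *pa = a, *pb = b;

	return ((int)(pa->flowid - pb->flowid));
}

/* Robust variant: a pair of ifs yields a consistent total order. */
int
demo_cmp_by_ifs(const void *a, const void *b)
{
	const struct demo_pkt *pa = a, *pb = b;

	if (pa->flowid > pb->flowid)
		return (1);
	if (pa->flowid < pb->flowid)
		return (-1);
	return (0);
}

/* Sort an array of records by flowid using the robust comparator. */
void
demo_sort_by_flowid(struct demo_pkt *v, size_t n)
{
	qsort(v, n, sizeof(v[0]), demo_cmp_by_ifs);
}
```

	With flowids 0 and 0x80000001, the subtraction variant reports the smaller value as the larger one, which is the kind of wrap-around the commit message refers to.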
*	Garbage collect unused arguments of m_init().	glebius	2016-02-10	1	-1/+1
*	Increase max allowed backlog for listen sockets	alfred	2016-02-02	2	-4/+10
	from short to int.
	PR: 203922
	Submitted by: White Knight <white_knight@2ch.net>
	MFC After: 4 weeks
*	These files were getting sys/malloc.h and vm/uma.h with header pollution	glebius	2016-02-01	3	-1/+4
	via sys/mbuf.h
*	Add missing parentheses. This was reported by ccaughie via GitHub	tuexen	2016-01-30	1	-1/+1
	for the userland stack.
	MFC after: 3 days
*	Update the path mtu when turning on/off UDP encapsulation for SCTP.	tuexen	2016-01-30	1	-12/+33
	MFC after: 3 days
*	Don't allow a remote encapsulation port change during the	tuexen	2016-01-30	3	-20/+41
	SCTP restart procedure.
	MFC after: 3 days
*	Don't change the remote UDP encapsulation port for SCTP packets	tuexen	2016-01-30	1	-3/+9
	containing an INIT chunk.
	MFC after: 3 days
*	Ignore peer addresses in a consistent way also when checking for	tuexen	2016-01-30	1	-31/+58
	new addresses during restart. If this is not done, restart doesn't work when the local socket is IPv4 only and the peer uses IPv4 and IPv6 addresses.
	MFC after: 3 days
*	Remove debug output which was committed by accident.	tuexen	2016-01-28	1	-3/+0
	Thanks to Oliver Pinter for reporting.
	MFC after: 3 days
	X-MFC with: r294995
*	Always look in the TCP pool.	tuexen	2016-01-28	2	-15/+5
	This fixes issues with a restarting peer when the listening 1-to-1 style socket is closed.
	MFC after: 3 days
*	Rename netinet/tcp_cc.h to netinet/cc/cc.h.	glebius	2016-01-27	16	-18/+18
	Discussed with: lstewart
*	Fix issues with TCP_CONGESTION handling after r294540:	glebius	2016-01-27	1	-16/+15
	o Bring back the buf[TCP_CA_NAME_MAX] for TCP_CONGESTION; for TCP_CCALGOOPT use a dynamically allocated *pbuf.
	o For SOPT_SET TCP_CONGESTION, NUL-terminate the string taken from userland.
	o For SOPT_SET TCP_CONGESTION, search for the algorithm while holding the inpcb lock.
	o For SOPT_GET TCP_CONGESTION, first strlcpy() the name into a temporary buffer while holding the inpcb lock, then copyout.
	Together with: lstewart
*	Grab a snap amount of TCP connections in syncache from tcpstat.	glebius	2016-01-27	3	-22/+3
*	Augment struct tcpstat with tcps_states[], which is used for book-keeping	glebius	2016-01-27	6	-2/+22
	the number of TCP connections by state. Provides a cheap way to get the connection count without traversing the whole pcb list.
	Sponsored by: Netflix
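	The bookkeeping idea in the commit message above can be sketched as a per-state counter array that is adjusted on every state transition, so that a count is an array read rather than a list walk. All names below are hypothetical stand-ins; the real struct tcpstat layout and the TCPSTAT_* macros are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, shortened state list for the sketch. */
enum demo_state {
	DEMO_CLOSED,
	DEMO_LISTEN,
	DEMO_SYN_SENT,
	DEMO_ESTABLISHED,
	DEMO_NSTATES
};

/* Stand-in for a statistics structure with one counter per state. */
struct demo_stat {
	uint64_t states[DEMO_NSTATES];
};

struct demo_conn {
	enum demo_state state;
};

/* Account for a new connection in its initial state. */
void
demo_conn_init(struct demo_stat *st, struct demo_conn *c, enum demo_state s)
{
	c->state = s;
	st->states[s]++;
}

/* Move a connection between states, keeping both counters in sync. */
void
demo_conn_set_state(struct demo_stat *st, struct demo_conn *c,
    enum demo_state s)
{
	st->states[c->state]--;
	st->states[s]++;
	c->state = s;
}

/* Usage: the established count is read directly from the array. */
void
demo_stat_usage(void)
{
	struct demo_stat st = { { 0 } };
	struct demo_conn c;

	demo_conn_init(&st, &c, DEMO_SYN_SENT);
	demo_conn_set_state(&st, &c, DEMO_ESTABLISHED);
	assert(st.states[DEMO_ESTABLISHED] == 1);
}
```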
*	Provide TCPSTAT_DEC() and TCPSTAT_FETCH() macros.	glebius	2016-01-27	1	-0/+3
*	Persist timers TCPTV_PERSMIN and TCPTV_PERSMAX are hardcoded with 5 seconds and	hiren	2016-01-26	4	-2/+14
	60 seconds, respectively. Turn them into sysctls that can be tuned live. The default values of 5 seconds and 60 seconds have been retained.
	Submitted by: Jason Wolfe (j at nitrology dot com)
	Reviewed by: gnn, rrs, hiren, bz
	MFC after: 1 week
	Sponsored by: Limelight Networks
	Differential Revision: https://reviews.freebsd.org/D5024
*	Convert TCP mtu checks to the new routing KPI.	melifaro	2016-01-25	1	-31/+22
*	MFP r287070,r287073: split radix implementation and route table structure.	melifaro	2016-01-25	3	-20/+20
	There are a number of radix consumers in kernel land (pf, ipfw, nfs, route) with different requirements. In fact, the first 3 don't have _any_ requirements and the first 2 do not use radix locking. On the other hand, the routing structure does have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not know anything about its consumers' internals.
	So, radix code now uses a tiny 'struct radix_head' structure along with an internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still use the same 'struct radix_node_head' with slight modifications: they need to pass a pointer to the (embedded) 'struct radix_head' to all radix callbacks.
	Routing code now uses the new 'struct rib_head' with different locking macros: the RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base).
	A new net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.
*	Provide new socket option TCP_CCALGOOPT, which stands for TCP congestion	glebius	2016-01-22	3	-1/+31
	control algorithm options. The argument is variable length and is opaque to TCP, forwarded directly to the algorithm's ctl_output method.
	Provide a new includes directory netinet/cc, where algorithm-specific headers can be installed.
	The new API doesn't yet have any in-tree consumers.
	The original code was written by lstewart.
	Reviewed by: rrs, emax
	Sponsored by: Netflix
	Differential Revision: https://reviews.freebsd.org/D711
*	Refactor TCP_CONGESTION setsockopt handling:	glebius	2016-01-21	1	-43/+39
	- Use M_TEMP instead of stack variable.
	- Unroll error handling, removing several levels of indentation.
*	- Rename cc.h to more meaningful tcp_cc.h.	glebius	2016-01-21	16	-29/+36
	- Declare it a kernel only include, which it already is.
	- Don't include tcp.h implicitly from tcp_cc.h
*	Cleanup TCP files from unnecessary interface related includes.	glebius	2016-01-21	8	-13/+2
*	The variable is write once only and not used.	bz	2016-01-21	1	-4/+0
	Recover the vertical space.
	Sponsored by: The FreeBSD Foundation
	MFC After: 3 days
	Obtained from: p4 CH=180830
	Reviewed by: gnn, hiren
	Differential Revision: https://reviews.freebsd.org/D4898
*	Add optimizing LRO wrapper:	hselasky	2016-01-19	2	-26/+181
	- Add optimizing LRO wrapper which pre-sorts all incoming packets according to the hash type and flowid. This prevents exhaustion of the LRO entries due to too many connections at the same time. Testing using a larger number of higher bandwidth TCP connections showed that the incoming ACK packet aggregation rate increased from ~1.3:1 to almost 3:1. Another test showed that for a number of TCP connections greater than 16 per hardware receive ring, where 8 TCP connections was the LRO active entry limit, there was a significant improvement in throughput due to being able to fully aggregate more than 8 TCP streams. For very few very high bandwidth TCP streams, the optimizing LRO wrapper will add CPU usage instead of reducing CPU usage. This is expected. Network drivers which want to use the optimizing LRO wrapper need to call "tcp_lro_queue_mbuf()" instead of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of "tcp_lro_flush()". Further, the LRO control structure must be initialized using "tcp_lro_init_args()", passing a non-zero number into the "lro_mbufs" argument.
	- Make LRO statistics 64-bit. Previously 32-bit integers were used for statistics which can be prone to wrap-around. Fix this while at it and update all SYSCTL's which expose LRO statistics.
	- Ensure all data is freed when destroying an LRO control structure, especially leftover LRO entries.
	- Reduce number of memory allocations needed when setting up an LRO control structure by precomputing the total amount of memory needed.
	- Add own memory allocation counter for LRO.
	- Bump the FreeBSD version to force recompilation of all KLDs due to change of the LRO control structure size.
	Sponsored by: Mellanox Technologies
	Reviewed by: gallatin, sbruno, rrs, gnn, transport
	Tested by: Netflix
	Differential Revision: https://reviews.freebsd.org/D4914
*	Fix a bug in INIT handling on accepted 1-to-1 style sockets when the	tuexen	2016-01-15	1	-2/+6
	listener is closed. This fix allows the following packetdrill test to pass:
	    // Setup a connected, blocking 1-to-1 style socket
	    +0.0 socket(..., SOCK_STREAM, IPPROTO_SCTP) = 3
	    // Check the handshake with an empty(!) cookie
	    +0.0 bind(3, ..., ...) = 0
	    +0.0 listen(3, 1) = 0
	    +0.0 < sctp: INIT[flgs=0, tag=1, a_rwnd=1500, os=1, is=1, tsn=1]
	    +0.0 > sctp: INIT_ACK[flgs=0, tag=2, a_rwnd=..., os=..., is=..., tsn=1, ...]
	    +0.0 < sctp: COOKIE_ECHO[flgs=0, len=..., val=...]
	    +0.0 > sctp: COOKIE_ACK[flgs=0]
	    +0.0 accept(3, ..., ...) = 4
	    +0.0 close(3) = 0
	    // Inject an INIT chunk and expect an INIT-ACK
	    +0.0 < sctp: INIT[flgs=0, tag=3, a_rwnd=1500, os=1, is=1, tsn=1]
	    +0.0 > sctp: INIT_ACK[flgs=0, tag=..., a_rwnd=..., os=..., is=..., tsn=..., ...]
	MFC after: 3 days
*	Fail the SCTP_GET_ASSOC_NUMBER and SCTP_GET_ASSOC_ID_LIST	tuexen	2016-01-14	1	-2/+16
	socket options for 1-to-1 style sockets as specified in RFC 6458.
	MFC after: 3 days
*	There is a bug in tcp_output()'s implementation of the TCP_SIGNATURE	glebius	2016-01-14	1	-2/+4
	(RFC 2385/TCP-MD5) kernel option. If a tcpcb has the TF_NOOPT flag, then tcp_addoptions() is not called, and to.to_signature is an uninitialized stack variable. The value is later used as a write offset, which leads to a write to a random address.
	Submitted by: rstone, jtl
	Security: SA-16:05.tcp
*	Remove now-unused wrappers for various routing functions.	melifaro	2016-01-14	3	-15/+1
*	Store the timer type for logging, because the timer can be freed	tuexen	2016-01-13	1	-8/+9
	during processing of the timeout.
	MFC after: 3 days
*	Bring RADIX_MPATH support to new routing KPI to ease migration.	melifaro	2016-01-11	1	-0/+7
	Move the actual rte selection process from rtalloc_mpath_fib() to the rt_path_selectrte() function. Add public rt_mpath_select() to use in fibX_lookup_ functions.
*	Finish r275196: do not dereference rtentry in if_output() routines.	melifaro	2016-01-09	1	-0/+1
	The only piece of information that is required is a subset of rt_flags.
	In particular, if_loop() requires the RTF_REJECT and RTF_BLACKHOLE flags to check if this particular mbuf needs to be dropped (and what error should be returned). Note that if_loop() will always return EHOSTUNREACH for "reject" routes regardless of RTF_HOST flag existence. This is due to upcoming routing changes where the RTF_HOST value won't be available as a lookup result.
	All other functions require the RTF_GATEWAY flag to check if they need to return EHOSTUNREACH instead of the EHOSTDOWN error.
	There are 11 places where a non-zero 'struct route' is passed to if_output(). Most of the callers (forwarding, bpf, arp) do not care about the exact error value. In fact, the only place where this result is propagated is ip_output(). (ip6_output() passes a NULL route to nd6_output_ifp()).
	Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and RT_IS_GW) and an inline function (rt_update_ro_flags()) to copy the necessary rte flags to ro_flags. Call this function in ip_output() after looking up/verifying the rte.
	Reviewed by: ae
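	The flag-copying step described in the commit message above might look roughly like the sketch below. It is not the kernel's rt_update_ro_flags(): the struct, the function, and every DEMO_ constant (including the bit values) are placeholders made up for illustration.

```c
#include <assert.h>

/* Placeholder route-entry flag bits, assumed for this sketch only. */
#define DEMO_RTF_GATEWAY	0x0002
#define DEMO_RTF_REJECT		0x0008
#define DEMO_RTF_BLACKHOLE	0x1000

/* Placeholder 'struct route' flag bits, assumed for this sketch only. */
#define DEMO_RT_REJECT		0x0020
#define DEMO_RT_BLACKHOLE	0x0040
#define DEMO_RT_IS_GW		0x0080

struct demo_route {
	unsigned int ro_flags;
};

/*
 * Copy the three relevant route-entry flags into ro_flags so that output
 * routines can consult ro_flags without dereferencing the route entry.
 */
void
demo_update_ro_flags(struct demo_route *ro, unsigned int rt_flags)
{
	/* Clear stale copies first, leaving unrelated bits untouched. */
	ro->ro_flags &= ~(unsigned int)(DEMO_RT_REJECT | DEMO_RT_BLACKHOLE |
	    DEMO_RT_IS_GW);

	if (rt_flags & DEMO_RTF_REJECT)
		ro->ro_flags |= DEMO_RT_REJECT;
	if (rt_flags & DEMO_RTF_BLACKHOLE)
		ro->ro_flags |= DEMO_RT_BLACKHOLE;
	if (rt_flags & DEMO_RTF_GATEWAY)
		ro->ro_flags |= DEMO_RT_IS_GW;
}

/* Usage: a gateway route sets only the gateway bit. */
void
demo_ro_flags_usage(void)
{
	struct demo_route ro = { 0 };

	demo_update_ro_flags(&ro, DEMO_RTF_GATEWAY);
	assert(ro.ro_flags == DEMO_RT_IS_GW);
}
```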
*	Remove sys/eventhandler.h from net/route.h	melifaro	2016-01-09	6	-0/+7
	Reviewed by: ae
*	(Temporarily) remove route_redirect_event eventhandler.	melifaro	2016-01-09	1	-15/+0
	Such a handler should pass a different set of variables, instead of directly providing 2 locked route entries. Given that it hasn't really been used since at least 2012, remove the current code. Will re-add it after finishing most major routing-related changes.
	Discussed with: np
*	Apply the changes from r293284 to one additional file.	jtl	2016-01-07	1	-3/+1
	Discussed with: glebius
*	Historically we have two fields in tcpcb to describe sender MSS: t_maxopd,	glebius	2016-01-07	6	-90/+121
	and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned up after T/TCP removal. After all permutations over the years the result is that t_maxopd stores the minimum of the peer offered MSS and the MTU reduced by the minimum protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if timestamps are in action, or is equal to t_maxopd otherwise. That's a very rough estimate of MSS reduced by options length. Throughout the code it was used in places where precision was not important, like cwnd or ssthresh calculations.
	With this change:
	- t_maxopd goes away.
	- t_maxseg now stores MSS not adjusted by options.
	- new function tcp_maxseg() is provided, that calculates MSS reduced by options length. The function gives a better estimate, since it takes into account SACK state as well.
	Reviewed by: jtl
	Differential Revision: https://reviews.freebsd.org/D3593
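	The kind of calculation the commit message above attributes to tcp_maxseg() can be sketched as follows. This is a simplified stand-in, not the kernel function: the name, signature, and constants are assumptions (a 12-byte padded timestamp option, a 2-byte SACK header plus 8 bytes per block padded to a 4-byte boundary, and a 40-byte cap on TCP option space).

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_TSTAMP_APPA_LEN	12u	/* assumed padded timestamp option */
#define DEMO_SACK_HDR_LEN	2u	/* assumed SACK kind + length bytes */
#define DEMO_SACK_BLOCK_LEN	8u	/* assumed bytes per SACK block */
#define DEMO_MAX_OPT_LEN	40u	/* assumed TCP option space limit */

/* Round an option length up to the next multiple of 4 bytes. */
static unsigned int
demo_pad4(unsigned int len)
{
	return ((len + 3u) & ~3u);
}

/*
 * Hypothetical estimate: MSS (not adjusted by options) minus the option
 * bytes a data segment would carry for timestamps and SACK blocks.
 */
unsigned int
demo_maxseg(unsigned int mss, bool timestamps, unsigned int sack_blocks)
{
	unsigned int optlen = 0;

	if (timestamps)
		optlen += DEMO_TSTAMP_APPA_LEN;
	if (sack_blocks > 0)
		optlen += demo_pad4(DEMO_SACK_HDR_LEN +
		    DEMO_SACK_BLOCK_LEN * sack_blocks);
	if (optlen > DEMO_MAX_OPT_LEN)
		optlen = DEMO_MAX_OPT_LEN;
	return (mss - optlen);
}

/* Usage: timestamps alone reduce a 1460-byte MSS by 12 bytes. */
void
demo_maxseg_usage(void)
{
	assert(demo_maxseg(1460, true, 0) == 1448);
}
```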
*	Get struct sctp_net_route in sync with struct route again.	tuexen	2016-01-04	1	-3/+5
*	Maintain consistent behavior: make fib4_lookup_nh_ext() return	melifaro	2016-01-04	1	-1/+4
	rt_ifp pointer by default, as done by other fib lookup functions.
*	Add rib_lookup_info() to provide API for retrieving individual route	melifaro	2016-01-04	1	-19/+29
	entries data in unified format.
	There are control plane functions that require information other than just next-hop data (e.g. individual rtentry fields like flags or prefix/mask). Given that the goal is to avoid rte reference/refcounting, re-use the rt_addrinfo structure to store most rte fields. If the caller wants to retrieve key/mask or gateway (which are sockaddrs and are allocated separately), it needs to provide sufficiently sized sockaddr structures with their pointers saved in the passed rt_addrinfo.
	Convert:
	* lltable new records checks (in_lltable_rtcheck(), nd6_is_new_addr_neighbor()).
	* rtsock pre-add/change route check.
	* IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because 1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should not be multiple host routes for such hosts, 2) if we have multiple routes we should inspect them (which is not done), 3) the entire idea of abusing KRT as storage for ND proxy seems odd. Userland programs should be used for that purpose).
*	Fix fib4_lookup_nh_ext() flags/flowid order messed up while merging.	melifaro	2016-01-03	1	-2/+2
*	Remove second EVENTHANDLER_REGISTER slipped in r292978.	melifaro	2016-01-01	1	-3/+0
	Describe the reason for doing unconditional M_PREPEND in ether_output().