path: root/sys/netinet
Commit message [author, age, files changed, lines removed/added]
* tcp/lro: Allow drivers to set the TCP ACK/data segment aggregation limit [sephe, 2016-02-18, 2 files, -2/+18]
  The ACK aggregation limit is based on the append count, while the TCP data segment aggregation limit is based on length. Unless the network driver sets these two limits, the change is a no-op.
  Reviewed by: adrian, gallatin (previous version), hselasky (previous version)
  Approved by: adrian (mentor)
  MFC after: 1 week
  Sponsored by: Microsoft OSTC
  Differential Revision: https://reviews.freebsd.org/D5185
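The two kinds of limit described above (append count for pure ACKs, aggregated length for data segments) can be illustrated with a small decision function. This is a minimal sketch, not the tcp_lro code: the struct and function names are hypothetical, and using 0 to mean "driver did not set a limit" is an assumption made here to model the no-op default.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-entry LRO state; field names are illustrative only. */
struct lro_entry_sketch {
    uint32_t append_cnt;  /* segments already merged into this entry */
    uint32_t total_len;   /* payload bytes already merged */
};

/* Hypothetical driver-set limits; 0 models "not set by the driver". */
struct lro_limits_sketch {
    uint32_t ack_append_limit; /* count based, applies to pure ACKs */
    uint32_t length_limit;     /* length based, applies to data segments */
};

/* Return true if a segment carrying seg_len payload bytes may be merged. */
static bool lro_can_append(const struct lro_entry_sketch *e,
                           const struct lro_limits_sketch *lim,
                           uint32_t seg_len)
{
    if (seg_len == 0) {
        /* Pure ACK: limit by how many segments were appended so far. */
        return lim->ack_append_limit == 0 ||
               e->append_cnt < lim->ack_append_limit;
    }
    /* Data segment: limit by the total aggregated payload length. */
    return lim->length_limit == 0 ||
           (uint64_t)e->total_len + seg_len <= lim->length_limit;
}
```

Keeping the two checks separate mirrors the commit's point that ACKs carry no payload, so a byte-based limit would never stop ACK aggregation.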
* Add protection code for issues reported by PVS / D5245. [tuexen, 2016-02-17, 1 file, -2/+4]
  MFC after: 3 days
* Code cleanup which will silence a warning in PVS / D5245. [tuexen, 2016-02-17, 2 files, -7/+3]
* Address a warning reported by D5245 / PVS. [tuexen, 2016-02-17, 1 file, -2/+2]
  MFC after: 3 days
* Whitespace changes. [tuexen, 2016-02-16, 4 files, -6/+7]
* Improve the teardown of the SCTP stack. [tuexen, 2016-02-16, 5 files, -54/+109]
  Obtained from: bz@
  MFC after: 1 week
* Loopback addresses are 127.0.0.0/8, not 127.0.0.1/32. [tuexen, 2016-02-11, 1 file, -4/+1]
  MFC after: 1 week
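The difference between treating only 127.0.0.1 as loopback and treating the whole 127.0.0.0/8 block as loopback comes down to a prefix mask. A minimal sketch, assuming the address is given as a 32-bit integer in host byte order; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 127.0.0.0/8: only the top 8 bits must equal 127. addr is host order. */
static bool is_loopback_v4(uint32_t addr)
{
    return (addr & 0xff000000u) == 0x7f000000u;
}

/* The too-narrow check: matches exactly 127.0.0.1 and nothing else. */
static bool is_loopback_v4_narrow(uint32_t addr)
{
    return addr == 0x7f000001u;
}
```

For example, 127.10.11.12 (0x7f0a0b0c) is a loopback address under the /8 rule but is rejected by the narrow check.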
* Use 4 spaces instead of a tab. [tuexen, 2016-02-11, 1 file, -4/+4]
* Merge SVN r295220 (bz) from projects/vnet/ [dteske, 2016-02-11, 1 file, -1/+2]
  Fix a panic that occurs when a vnet interface is unavailable at the time the vnet jail referencing said interface is stopped.
  Sponsored by: FIS Global, Inc.
* Use a pair of ifs when comparing the 32-bit flowid integers so that the sign bit doesn't cause an overflow. [hselasky, 2016-02-11, 1 file, -3/+4]
  The overflow manifests itself as a sorting index wrap-around in the middle of the sorted array, which is not a problem for the LRO code but might be a problem for the logic inside qsort().
  Reviewed by: gnn @
  Sponsored by: Mellanox Technologies
  Differential Revision: https://reviews.freebsd.org/D5239
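The "pair of ifs" pattern for a qsort() comparator over unsigned 32-bit keys looks like the following. This is a generic sketch, not the LRO source; the function names are hypothetical. Returning the difference of the two keys cast to int would give the wrong sign whenever the true difference does not fit in a signed int (for example 0xffffffff versus 1).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort() comparator for 32-bit flowids using two explicit comparisons,
 * so no subtraction can overflow or flip sign. */
static int flowid_cmp(const void *pa, const void *pb)
{
    uint32_t a = *(const uint32_t *)pa;
    uint32_t b = *(const uint32_t *)pb;

    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}

/* Sort an array of flowids in ascending unsigned order. */
static void sort_flowids(uint32_t *ids, size_t n)
{
    qsort(ids, n, sizeof(ids[0]), flowid_cmp);
}
```

Values with the top bit set (0x80000000 and above) then sort after all smaller values instead of wrapping around to the middle of the array.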
* Garbage collect unused arguments of m_init(). [glebius, 2016-02-10, 1 file, -1/+1]
* Increase the maximum allowed backlog for listen sockets from short to int. [alfred, 2016-02-02, 2 files, -4/+10]
  PR: 203922
  Submitted by: White Knight <white_knight@2ch.net>
  MFC After: 4 weeks
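Widening the backlog from short to int matters because a short cannot hold limits above 32767. The sketch below shows a typical clamp of a requested listen() backlog against a system maximum; the function name is hypothetical, and falling back to the maximum for negative requests is an assumption modeled on common BSD behavior, not a quote of the kernel code.

```c
/* Clamp a requested listen() backlog to the system maximum (somaxconn).
 * Negative or oversized requests fall back to the maximum. Using int for
 * both values allows limits larger than a short could represent. */
static int clamp_backlog(int requested, int somaxconn)
{
    if (requested < 0 || requested > somaxconn)
        return somaxconn;
    return requested;
}
```

With an int field, a somaxconn of 65535 is representable, whereas a short-typed backlog would have topped out at 32767.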
* These files were getting sys/malloc.h and vm/uma.h through header pollution via sys/mbuf.h. [glebius, 2016-02-01, 3 files, -1/+4]
* Add missing parentheses. This was reported by ccaughie via GitHub for the userland stack. [tuexen, 2016-01-30, 1 file, -1/+1]
  MFC after: 3 days
* Update the path MTU when turning UDP encapsulation for SCTP on or off. [tuexen, 2016-01-30, 1 file, -12/+33]
  MFC after: 3 days
* Don't allow a remote encapsulation port change during the SCTP restart procedure. [tuexen, 2016-01-30, 3 files, -20/+41]
  MFC after: 3 days
* Don't change the remote UDP encapsulation port for SCTP packets containing an INIT chunk. [tuexen, 2016-01-30, 1 file, -3/+9]
  MFC after: 3 days
* Ignore peer addresses consistently, also when checking for new addresses during restart. [tuexen, 2016-01-30, 1 file, -31/+58]
  If this is not done, restart doesn't work when the local socket is IPv4-only and the peer uses both IPv4 and IPv6 addresses.
  MFC after: 3 days.
* Remove debug output which was committed by accident. [tuexen, 2016-01-28, 1 file, -3/+0]
  Thanks to Oliver Pinter for reporting.
  MFC after: 3 days
  X-MFC with: r294995
* Always look in the TCP pool. [tuexen, 2016-01-28, 2 files, -15/+5]
  This fixes issues with a restarting peer when the listening 1-to-1 style socket is closed.
  MFC after: 3 days
* Rename netinet/tcp_cc.h to netinet/cc/cc.h. [glebius, 2016-01-27, 16 files, -18/+18]
  Discussed with: lstewart
* Fix issues with TCP_CONGESTION handling after r294540: [glebius, 2016-01-27, 1 file, -16/+15]
  o Restore buf[TCP_CA_NAME_MAX] for TCP_CONGESTION; for TCP_CCALGOOPT, use the dynamically allocated *pbuf.
  o For SOPT_SET TCP_CONGESTION, NUL-terminate the string taken from userland.
  o For SOPT_SET TCP_CONGESTION, search for the algorithm while holding the inpcb lock.
  o For SOPT_GET TCP_CONGESTION, first strlcpy() the name into a temporary buffer while holding the inpcb lock, then copyout().
  Together with: lstewart
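NUL-terminating a name copied from userland means never trusting the caller's buffer to contain a terminator. A minimal sketch of that step: memcpy() stands in for the kernel's copyin(), and CA_NAME_MAX_SKETCH and copy_ca_name() are hypothetical names standing in for TCP_CA_NAME_MAX and the real setsockopt path.

```c
#include <stddef.h>
#include <string.h>

#define CA_NAME_MAX_SKETCH 16  /* hypothetical stand-in for TCP_CA_NAME_MAX */

/* Copy a caller-supplied algorithm name into a fixed-size buffer. The
 * source may lack a terminating NUL, so copy at most bufsize - 1 bytes
 * and write the terminator explicitly. */
static void copy_ca_name(char buf[CA_NAME_MAX_SKETCH], const char *user,
                         size_t user_len)
{
    size_t n = user_len < CA_NAME_MAX_SKETCH - 1 ? user_len
                                                 : CA_NAME_MAX_SKETCH - 1;
    memcpy(buf, user, n);  /* copyin() in the kernel */
    buf[n] = '\0';
}
```

The resulting buffer is always a valid C string, so a subsequent name lookup cannot read past its end.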
* Grab a snapshot of the number of TCP connections in the syncache from tcpstat. [glebius, 2016-01-27, 3 files, -22/+3]
* Augment struct tcpstat with tcps_states[], which is used for bookkeeping the number of TCP connections by state. [glebius, 2016-01-27, 6 files, -2/+22]
  This provides a cheap way to get the connection count without traversing the whole pcb list.
  Sponsored by: Netflix
* Provide TCPSTAT_DEC() and TCPSTAT_FETCH() macros. [glebius, 2016-01-27, 1 file, -0/+3]
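Keeping a per-state counter array means each state transition costs one decrement and one increment, and reading a count is a single fetch instead of a walk over the pcb list. A minimal sketch: the enum, array and function names are hypothetical, and a plain, non-atomic array stands in for the kernel's statistics storage, so it is not thread-safe.

```c
#include <stdint.h>

/* Hypothetical subset of TCP states for this sketch. */
enum tcp_state_sketch {
    ST_CLOSED, ST_LISTEN, ST_SYN_RECEIVED, ST_ESTABLISHED, ST_NSTATES
};

/* One live counter per state (plain array, not per-CPU, not atomic). */
static uint64_t tcps_states_sketch[ST_NSTATES];

static void tcps_state_inc(enum tcp_state_sketch s) { tcps_states_sketch[s]++; }
static void tcps_state_dec(enum tcp_state_sketch s) { tcps_states_sketch[s]--; }

/* A connection moving between states updates both buckets. */
static void tcps_state_change(enum tcp_state_sketch from,
                              enum tcp_state_sketch to)
{
    tcps_state_dec(from);
    tcps_state_inc(to);
}

/* Counterpart of a FETCH macro: read the current count for one state. */
static uint64_t tcps_state_fetch(enum tcp_state_sketch s)
{
    return tcps_states_sketch[s];
}
```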
* The persist timers TCPTV_PERSMIN and TCPTV_PERSMAX are hardcoded to 5 seconds and 60 seconds, respectively. [hiren, 2016-01-26, 4 files, -2/+14]
  Turn them into sysctls that can be tuned live. The default values of 5 seconds and 60 seconds have been retained.
  Submitted by: Jason Wolfe (j at nitrology dot com)
  Reviewed by: gnn, rrs, hiren, bz
  MFC after: 1 week
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D5024
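The persist minimum and maximum act as bounds on a backed-off probe interval. The sketch below clamps a doubling backoff into [persmin, persmax], in the spirit of the kernel's TCPT_RANGESET() range clamp. The global tunables, function names, and the simple doubling (instead of the kernel's backoff table and RTT-based base value) are assumptions made for illustration.

```c
/* Hypothetical tunables standing in for the new sysctls; the defaults
 * match the 5 and 60 second values retained by the commit. */
static int persmin_sec = 5;
static int persmax_sec = 60;

/* Clamp value into [lo, hi]. */
static int range_set(int value, int lo, int hi)
{
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

/* Persist timeout after `probes` unanswered probes, doubling from base_sec.
 * The loop stops once the maximum is reached, so t cannot overflow. */
static int persist_timeout(int base_sec, int probes)
{
    long t = base_sec;

    for (int i = 0; i < probes && t < persmax_sec; i++)
        t *= 2;
    return range_set((int)t, persmin_sec, persmax_sec);
}
```

Because the bounds are read at call time, changing the tunables (as a live sysctl write would) immediately affects the next computed timeout.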
* Convert TCP MTU checks to the new routing KPI. [melifaro, 2016-01-25, 1 file, -31/+22]
* MFP r287070,r287073: split radix implementation and route table structure. [melifaro, 2016-01-25, 3 files, -20/+20]
  There are a number of radix consumers in kernel land (pf, ipfw, nfs, route) with different requirements. In fact, the first three don't have _any_ requirements and the first two do not use radix locking. On the other hand, the routing structure does have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not know anything about its consumers' internals.
  So, radix code now uses a tiny 'struct radix_head' structure along with an internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still use the same 'struct radix_node_head' with slight modifications: they need to pass a pointer to the (embedded) 'struct radix_head' to all radix callbacks.
  Routing code now uses the new 'struct rib_head' with different locking macros: the RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base).
  A new net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.
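Passing a pointer to an embedded head lets generic code stay ignorant of its consumers, while each consumer can recover its own wrapper from that pointer. A minimal sketch of the pattern: all struct, macro and function names are hypothetical, and CONTAINER_OF is a local offsetof()-based helper, not a kernel API quoted from this change.

```c
#include <stddef.h>

/* Tiny head that generic radix code sees (hypothetical). */
struct radix_head_sketch { int nodes; };

/* Routing-specific head that embeds the generic one (hypothetical). */
struct rib_head_sketch {
    struct radix_head_sketch head; /* pointer to this goes to radix code */
    unsigned gen;                  /* routing-only state */
};

/* Recover the enclosing structure from a pointer to an embedded member. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)(void *)((char *)(ptr) - offsetof(type, member)))

/* Generic code: knows nothing about the consumer's wrapper. */
static void radix_insert_sketch(struct radix_head_sketch *rh)
{
    rh->nodes++;
}

/* Consumer: passes the embedded head, then updates its own state. */
static void rib_add_sketch(struct rib_head_sketch *rib)
{
    radix_insert_sketch(&rib->head);
    rib->gen++;
}

/* A consumer callback that only receives the radix head. */
static unsigned rib_gen_from_radix(struct radix_head_sketch *rh)
{
    return CONTAINER_OF(rh, struct rib_head_sketch, head)->gen;
}
```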
* Provide a new socket option, TCP_CCALGOOPT, which stands for TCP congestion control algorithm options. [glebius, 2016-01-22, 3 files, -1/+31]
  The argument is variable length and is opaque to TCP, forwarded directly to the algorithm's ctl_output method.
  Provide a new include directory, netinet/cc, where algorithm-specific headers can be installed.
  The new API doesn't yet have any in-tree consumers.
  The original code was written by lstewart.
  Reviewed by: rrs, emax
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D711
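"Opaque to TCP, forwarded directly to the algorithm's ctl_output method" amounts to a function-pointer dispatch where the transport layer never looks inside the buffer. A minimal sketch under stated assumptions: struct cc_algo_sketch is hypothetical (not FreeBSD's struct cc_algo), and returning ENOENT when an algorithm has no ctl_output hook is an assumption, not the kernel's documented behavior.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical algorithm descriptor with an optional ctl_output hook. */
struct cc_algo_sketch {
    const char *name;
    int (*ctl_output)(void *state, void *buf, size_t len);
};

/* Transport layer: forwards the opaque buffer without interpreting it.
 * The ENOENT for a missing hook is an assumption of this sketch. */
static int ccalgoopt_forward(const struct cc_algo_sketch *algo, void *state,
                             void *buf, size_t len)
{
    if (algo->ctl_output == NULL)
        return ENOENT;
    return algo->ctl_output(state, buf, len);
}

/* Example algorithm that understands a single int-valued option. */
static int demo_ctl_output(void *state, void *buf, size_t len)
{
    if (len != sizeof(int))
        return EINVAL;
    *(int *)state = *(const int *)buf;
    return 0;
}
```

Only the algorithm validates the length and layout of the argument, which is what allows the argument to be variable length.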
* Refactor TCP_CONGESTION setsockopt handling: [glebius, 2016-01-21, 1 file, -43/+39]
  - Use M_TEMP instead of a stack variable.
  - Unroll error handling, removing several levels of indentation.
* - Rename cc.h to the more meaningful tcp_cc.h. [glebius, 2016-01-21, 16 files, -29/+36]
  - Declare it a kernel-only include, which it already is.
  - Don't include tcp.h implicitly from tcp_cc.h.
* Clean up unnecessary interface-related includes in TCP files. [glebius, 2016-01-21, 8 files, -13/+2]
* The variable is write-once only and not used. [bz, 2016-01-21, 1 file, -4/+0]
  Recover the vertical space.
  Sponsored by: The FreeBSD Foundation
  MFC After: 3 days
  Obtained from: p4 CH=180830
  Reviewed by: gnn, hiren
  Differential Revision: https://reviews.freebsd.org/D4898
* Add optimizing LRO wrapper: [hselasky, 2016-01-19, 2 files, -26/+181]
  - Add an optimizing LRO wrapper which pre-sorts all incoming packets according to the hash type and flowid. This prevents exhaustion of the LRO entries due to too many connections at the same time. Testing with a larger number of higher-bandwidth TCP connections showed that the incoming ACK packet aggregation rate increased from ~1.3:1 to almost 3:1. Another test showed that for more than 16 TCP connections per hardware receive ring, where 8 TCP connections was the LRO active entry limit, there was a significant improvement in throughput due to being able to fully aggregate more than 8 TCP streams. For a small number of very high-bandwidth TCP streams, the optimizing LRO wrapper will add CPU usage instead of reducing it. This is expected. Network drivers which want to use the optimizing LRO wrapper need to call "tcp_lro_queue_mbuf()" instead of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of "tcp_lro_flush()". Further, the LRO control structure must be initialized using "tcp_lro_init_args()", passing a non-zero number into the "lro_mbufs" argument.
  - Make LRO statistics 64-bit. Previously, 32-bit integers were used for statistics, which are prone to wrap-around. Fix this while at it and update all SYSCTLs which expose LRO statistics.
  - Ensure all data is freed when destroying an LRO control structure, especially leftover LRO entries.
  - Reduce the number of memory allocations needed when setting up an LRO control structure by precomputing the total amount of memory needed.
  - Add a dedicated memory allocation counter for LRO.
  - Bump the FreeBSD version to force recompilation of all KLDs due to the change of the LRO control structure size.
  Sponsored by: Mellanox Technologies
  Reviewed by: gallatin, sbruno, rrs, gnn, transport
  Tested by: Netflix
  Differential Revision: https://reviews.freebsd.org/D4914
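Pre-sorting a batch by hash type and flowid places all packets of one flow next to each other, so a single LRO entry can absorb them before moving on and a limited entry pool is not exhausted by interleaved flows. A minimal sketch of that idea, not the tcp_lro_queue_mbuf()/tcp_lro_flush_all() implementation: the struct, the 64-bit key layout and all names are hypothetical. Since qsort() is not stable, ties within a flow are broken on an arrival index to keep per-flow order.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a received packet's metadata. */
struct rx_pkt_sketch {
    uint8_t  hash_type; /* RSS hash type reported by the NIC */
    uint32_t flowid;    /* RSS hash value */
    uint32_t seq;       /* arrival index, used to keep per-flow order */
};

/* Combine hash type and flowid into one 64-bit sort key. */
static uint64_t pkt_key(const struct rx_pkt_sketch *p)
{
    return ((uint64_t)p->hash_type << 32) | p->flowid;
}

static int pkt_cmp(const void *a, const void *b)
{
    const struct rx_pkt_sketch *pa = a, *pb = b;
    uint64_t ka = pkt_key(pa), kb = pkt_key(pb);

    if (ka != kb)
        return ka < kb ? -1 : 1;
    /* Same flow: preserve arrival order. */
    return (pa->seq > pb->seq) - (pa->seq < pb->seq);
}

/* Sort a batch so each flow is contiguous; return the number of flows. */
static size_t presort_and_count_flows(struct rx_pkt_sketch *pkts, size_t n)
{
    size_t flows = 1;

    if (n == 0)
        return 0;
    qsort(pkts, n, sizeof(pkts[0]), pkt_cmp);
    for (size_t i = 1; i < n; i++)
        if (pkt_key(&pkts[i]) != pkt_key(&pkts[i - 1]))
            flows++;
    return flows;
}
```

The sort itself costs CPU, which is consistent with the commit's note that the wrapper adds CPU usage when only a few very high-bandwidth streams are present.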
* Fix a bug in INIT handling on accepted 1-to-1 style sockets when the listener is closed. [tuexen, 2016-01-15, 1 file, -2/+6]
  This fix allows the following packetdrill test to pass:

    // Set up a connected, blocking 1-to-1 style socket
    +0.0 socket(..., SOCK_STREAM, IPPROTO_SCTP) = 3
    // Check the handshake with an empty(!) cookie
    +0.0 bind(3, ..., ...) = 0
    +0.0 listen(3, 1) = 0
    +0.0 < sctp: INIT[flgs=0, tag=1, a_rwnd=1500, os=1, is=1, tsn=1]
    +0.0 > sctp: INIT_ACK[flgs=0, tag=2, a_rwnd=..., os=..., is=..., tsn=1, ...]
    +0.0 < sctp: COOKIE_ECHO[flgs=0, len=..., val=...]
    +0.0 > sctp: COOKIE_ACK[flgs=0]
    +0.0 accept(3, ..., ...) = 4
    +0.0 close(3) = 0
    // Inject an INIT chunk and expect an INIT-ACK
    +0.0 < sctp: INIT[flgs=0, tag=3, a_rwnd=1500, os=1, is=1, tsn=1]
    +0.0 > sctp: INIT_ACK[flgs=0, tag=..., a_rwnd=..., os=..., is=..., tsn=..., ...]

  MFC after: 3 days
* Fail the SCTP_GET_ASSOC_NUMBER and SCTP_GET_ASSOC_ID_LIST socket options for 1-to-1 style sockets, as specified in RFC 6458. [tuexen, 2016-01-14, 1 file, -2/+16]
  MFC after: 3 days
* There is a bug in tcp_output()'s implementation of the TCP_SIGNATURE (RFC 2385/TCP-MD5) kernel option. [glebius, 2016-01-14, 1 file, -2/+4]
  If a tcpcb has the TF_NOOPT flag, then tcp_addoptions() is not called, and to.to_signature is an uninitialized stack variable. The value is later used as a write offset, which leads to writing to a random address.
  Submitted by: rstone, jtl
  Security: SA-16:05.tcp
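The bug class here is a struct field that is only assigned on one code path and then read unconditionally. One defensive pattern is sketched below: zero the options block before the conditional call, and let the caller treat a NULL signature pointer as "nothing to write". This is not the actual SA-16:05 patch; struct tcpopt_sketch, add_options_sketch() and signature_slot() are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down options block. */
struct tcpopt_sketch {
    uint32_t flags;
    uint8_t *to_signature; /* where the MD5 digest would be written */
};

/* Stand-in for tcp_addoptions(): records the signature's location,
 * pretending the option's kind and length bytes come first. */
static void add_options_sketch(struct tcpopt_sketch *to, uint8_t *opt_area)
{
    to->flags |= 0x1;
    to->to_signature = opt_area + 2;
}

/* Return where the signature should be written, or NULL if nowhere.
 * Zeroing `to` first means the no-options path never reads garbage. */
static uint8_t *signature_slot(int no_options, uint8_t *opt_area)
{
    struct tcpopt_sketch to;

    memset(&to, 0, sizeof(to));
    if (!no_options)
        add_options_sketch(&to, opt_area);
    return to.to_signature;
}
```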
* Remove now-unused wrappers for various routing functions. [melifaro, 2016-01-14, 3 files, -15/+1]
* Store the timer type for logging, because the timer can be freed while processing the timeout. [tuexen, 2016-01-13, 1 file, -8/+9]
  MFC after: 3 days
* Bring RADIX_MPATH support to the new routing KPI to ease migration. [melifaro, 2016-01-11, 1 file, -0/+7]
  Move the actual rte selection process from rtalloc_mpath_fib() to the rt_path_selectrte() function. Add a public rt_mpath_select() for use in the fibX_lookup_ functions.
* Finish r275196: do not dereference rtentry in if_output() routines. [melifaro, 2016-01-09, 1 file, -0/+1]
  The only piece of information that is required is a subset of rt_flags. In particular, if_loop() requires the RTF_REJECT and RTF_BLACKHOLE flags to check whether this particular mbuf needs to be dropped (and what error should be returned). Note that if_loop() will always return EHOSTUNREACH for "reject" routes regardless of whether the RTF_HOST flag is set. This is due to upcoming routing changes where the RTF_HOST value won't be available as a lookup result. All other functions require the RTF_GATEWAY flag to check whether they need to return EHOSTUNREACH instead of EHOSTDOWN.
  There are 11 places where a non-zero 'struct route' is passed to if_output(). Most of the callers (forwarding, bpf, arp) do not care about the exact error value. In fact, the only place where this result is propagated is ip_output(). (ip6_output() passes a NULL route to nd6_output_ifp().)
  Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and RT_IS_GW) and an inline function (rt_update_ro_flags()) to copy the necessary rte flags to ro_flags. Call this function in ip_output() after looking up/verifying the rte.
  Reviewed by: ae
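Copying just three route-entry flags into the route cache is enough to drive the error decisions described above. A minimal sketch: the *_S flag values, struct route_sketch and all function names are hypothetical stand-ins for RTF_*, RT_* and rt_update_ro_flags(). The if_loop() rule modeled here (blackhole drops silently, reject drops with EHOSTUNREACH) follows the commit text.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values; the real RTF_ and RT_ constants differ. */
#define RTF_GATEWAY_S   0x1u
#define RTF_REJECT_S    0x2u
#define RTF_BLACKHOLE_S 0x4u
#define RT_REJECT_S     0x10u
#define RT_BLACKHOLE_S  0x20u
#define RT_IS_GW_S      0x40u

struct route_sketch { uint32_t ro_flags; };

/* Copy only the route-entry flags that output routines need. */
static void update_ro_flags(struct route_sketch *ro, uint32_t rt_flags)
{
    ro->ro_flags &= ~(RT_REJECT_S | RT_BLACKHOLE_S | RT_IS_GW_S);
    if (rt_flags & RTF_REJECT_S)
        ro->ro_flags |= RT_REJECT_S;
    if (rt_flags & RTF_BLACKHOLE_S)
        ro->ro_flags |= RT_BLACKHOLE_S;
    if (rt_flags & RTF_GATEWAY_S)
        ro->ro_flags |= RT_IS_GW_S;
}

/* if_loop()-style decision: reject and blackhole routes drop the packet;
 * only reject routes report an error (always EHOSTUNREACH). */
static int loop_output_error(const struct route_sketch *ro, int *dropped)
{
    if (ro->ro_flags & (RT_REJECT_S | RT_BLACKHOLE_S)) {
        *dropped = 1;
        return (ro->ro_flags & RT_BLACKHOLE_S) ? 0 : EHOSTUNREACH;
    }
    *dropped = 0;
    return 0;
}

/* Other output routines: a gateway route yields EHOSTUNREACH,
 * a directly connected destination yields EHOSTDOWN. */
static int host_down_error(const struct route_sketch *ro)
{
    return (ro->ro_flags & RT_IS_GW_S) ? EHOSTUNREACH : EHOSTDOWN;
}
```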
* Remove sys/eventhandler.h from net/route.h [melifaro, 2016-01-09, 6 files, -0/+7]
  Reviewed by: ae
* (Temporarily) remove the route_redirect_event eventhandler. [melifaro, 2016-01-09, 1 file, -15/+0]
  Such a handler should pass a different set of variables instead of directly providing 2 locked route entries. Given that it hasn't really been used since at least 2012, remove the current code. It will be re-added after most major routing-related changes are finished.
  Discussed with: np
* Apply the changes from r293284 to one additional file. [jtl, 2016-01-07, 1 file, -3/+1]
  Discussed with: glebius
* Historically we have had two fields in tcpcb to describe the sender MSS: t_maxopd and t_maxseg. [glebius, 2016-01-07, 6 files, -90/+121]
  This dualism emerged with T/TCP, but was not properly cleaned up after T/TCP removal. After all the permutations over the years, the result is that t_maxopd stores the minimum of the peer-offered MSS and the MTU reduced by the minimum protocol header, and t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if timestamps are in use, or is equal to t_maxopd otherwise. That's a very rough estimate of the MSS reduced by options length. Throughout the code it was used in places where precision was not important, like cwnd or ssthresh calculations.
  With this change:
  - t_maxopd goes away.
  - t_maxseg now stores the MSS not adjusted by options.
  - A new function, tcp_maxseg(), is provided that calculates the MSS reduced by options length. The function gives a better estimate, since it also takes the SACK state into account.
  Reviewed by: jtl
  Differential Revision: https://reviews.freebsd.org/D3593
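Reducing the MSS by the options that will actually be sent, including SACK blocks, is simple arithmetic over option sizes. The sketch below is a simplified model and not FreeBSD's tcp_maxseg(): it counts the 12-byte timestamp option (TCPOLEN_TSTAMP_APPA), a 2-byte SACK header plus 8 bytes per SACK block padded to a 4-byte boundary, and caps SACK blocks to the 40-byte TCP option space. The constant and function names are hypothetical, and other options (such as TCP-MD5) are ignored.

```c
#include <stdbool.h>

#define OPT_TSTAMP_APPA 12 /* timestamp option with 2 NOPs of padding */
#define OPT_SACK_HDR     2 /* SACK option kind + length bytes */
#define OPT_SACK_BLOCK   8 /* one SACK block: two 32-bit sequence edges */
#define OPT_MAX         40 /* maximum TCP option space */

static int roundup4(int n) { return (n + 3) & ~3; }

/* Payload bytes per segment: t_maxseg (MSS not adjusted by options)
 * minus the option bytes this simplified model expects to send. */
static int maxseg_sketch(int t_maxseg, bool timestamps, int sack_blocks)
{
    int optlen = timestamps ? OPT_TSTAMP_APPA : 0;

    if (sack_blocks > 0) {
        /* Only as many blocks as fit in the remaining option space. */
        int room = (OPT_MAX - optlen - OPT_SACK_HDR) / OPT_SACK_BLOCK;
        if (sack_blocks > room)
            sack_blocks = room;
        optlen += roundup4(OPT_SACK_HDR + sack_blocks * OPT_SACK_BLOCK);
    }
    return t_maxseg - optlen;
}
```

For a 1460-byte MSS this gives 1448 with timestamps alone (the old t_maxseg estimate), 1436 with timestamps and one SACK block, and 1420 when timestamps and three SACK blocks fill all 40 option bytes.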
* Get struct sctp_net_route in sync with struct route again. [tuexen, 2016-01-04, 1 file, -3/+5]
* Maintain consistent behavior: make fib4_lookup_nh_ext() return the rt_ifp pointer by default, as other fib lookup functions do. [melifaro, 2016-01-04, 1 file, -1/+4]
* Add rib_lookup_info() to provide an API for retrieving individual route entries' data in a unified format. [melifaro, 2016-01-04, 1 file, -19/+29]
  There are control plane functions that require information other than just next-hop data (e.g. individual rtentry fields like flags or prefix/mask). Given that the goal is to avoid rte reference/refcounting, re-use the rt_addrinfo structure to store most rte fields. If the caller wants to retrieve the key/mask or gateway (which are sockaddrs and are allocated separately), it needs to provide sufficiently sized sockaddr structures with their pointers saved in the passed rt_addrinfo.
  Convert:
  * lltable new records checks (in_lltable_rtcheck(), nd6_is_new_addr_neighbor()).
  * rtsock pre-add/change route check.
  * IPv6 NS ND-proxy check. The RADIX_MPATH code was eliminated because:
    1) we don't support RTF_ANNOUNCE ND-proxy for networks, and there should not be multiple host routes for such hosts;
    2) if we have multiple routes, we should inspect them (which is not done);
    3) the entire idea of abusing the KRT as storage for ND proxy seems odd. Userland programs should be used for that purpose.
* Fix fib4_lookup_nh_ext() flags/flowid order messed up while merging. [melifaro, 2016-01-03, 1 file, -2/+2]
* Remove the second EVENTHANDLER_REGISTER that slipped in with r292978. [melifaro, 2016-01-01, 1 file, -3/+0]
  Describe the reason for doing an unconditional M_PREPEND in ether_output().