path: root/sys/netinet
Commit message Author Age Files Lines
* mdoc: drop even more redundant .Pp callsuqs2010-10-191-1/+0
| | | | | | No change in rendered output, fewer mandoc lint warnings. Tool provided by: Nobuyuki Koganemaru n-kogane at syd.odn.ne.jp
* MfP4 CH182763 (original version):bz2010-10-161-0/+15
| | | | | | | | | | | | | | | | Make it harder to exploit certain in_control() related races between the initial lookup at the beginning and the time we will remove the entry from the lists by re-checking that the entry is still in the list before trying to remove it. (*) It is believed that with the current code and locking strategy we cannot completely fix all races. Reported by: Nima Misaghian (nima_misa hotmail.com) on net@ 20100817 Tested by: Nima Misaghian (nima_misa hotmail.com) (original version) PR: kern/146250 Submitted by: Mikolaj Golub (to.my.trociny gmail.com) (different version) MFC after: 1 week
* Retire the system-wide, per-reassembly queue segment limit. The mechanism is farlstewart2010-10-161-11/+15
| | | | | | | | | | | | | | | | | | | | | | | | too coarse grained to be useful and the default value significantly degrades TCP performance on moderate to high bandwidth-delay product paths with non-zero loss (e.g. 5+Mbps connections across the public Internet often suffer). Replace the outgoing mechanism with an individual per-queue limit based on the number of MSS segments that fit into the socket's receive buffer. This should strike a good balance between performance and the potential for resource exhaustion when FreeBSD is acting as a TCP receiver. With socket buffer autotuning (which is enabled by default), the reassembly queue tracks the socket buffer and benefits too. As the XXX comment suggests, my testing uncovered some unexpected behaviour which requires further investigation. By using so->so_rcv.sb_hiwat instead of sbspace(&so->so_rcv), we allow more segments to be held across both the socket receive buffer and reassembly queue than we probably should. The tradeoff is better performance in at least one common scenario, versus a devious sender's ability to consume more resources on a FreeBSD receiver. Sponsored by: FreeBSD Foundation Reviewed by: andre, gnn, rpaulo MFC after: 2 weeks
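The per-queue limit described above can be sketched roughly as follows. This is a minimal illustration, not the committed kernel code: the helper names `reass_queue_limit` and `reass_may_queue` are hypothetical, and the extra `+ 1` slot for a partially filled segment is an assumption. The `rcvbuf_hiwat` parameter stands in for `so->so_rcv.sb_hiwat`, which the commit message names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the kernel's code): upper bound on the number
 * of segments a single reassembly queue may hold, derived from the socket
 * receive buffer high-water mark and the connection's MSS.
 */
static size_t
reass_queue_limit(size_t rcvbuf_hiwat, size_t mss)
{
	assert(mss > 0);
	/* Whole MSS-sized segments that fit, plus one slot (assumed). */
	return (rcvbuf_hiwat / mss + 1);
}

/* Returns nonzero if one more out-of-order segment may be queued. */
static int
reass_may_queue(size_t queued, size_t rcvbuf_hiwat, size_t mss)
{
	return (queued < reass_queue_limit(rcvbuf_hiwat, mss));
}
```

Because the limit is computed from the buffer size at the time of the check, it grows along with an autotuned receive buffer, which is the behaviour the commit message describes.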
* - Switch the "net.inet.tcp.reass.cursegments" andlstewart2010-10-161-13/+23
| | | | | | | | | | | | | | | | | "net.inet.tcp.reass.maxsegments" sysctl variables to be based on UMA zone stats. The value returned by the cursegments sysctl is approximate owing to the way in which uma_zone_get_cur is implemented. - Discontinue use of V_tcp_reass_qsize as a global reassembly segment count variable in the reassembly implementation. The variable was used without proper synchronisation and was duplicating accounting done by UMA already. The lack of synchronisation was particularly problematic on SMP systems terminating many TCP sessions, resulting in poor TCP performance for connections with non-zero packet loss. Sponsored by: FreeBSD Foundation Reviewed by: andre, gnn, rpaulo (as part of a larger patch) MFC after: 2 weeks
* Use ifa_ifwithaddr_check() rather than ifa_ifwithaddr() as we are notbz2010-10-141-1/+1
| | | | | | | | interested in the result and would leak a reference otherwise. PR: kern/151435 Submitted by: Andrew Boyer (aboyer averesystems.com) MFC after: 3 days
* put back the assignment to sched_time. It was correct, andluigi2010-10-011-0/+1
| | | | | | it was necessary. Submitted by: Riccardo Panicucci
* Proper bracketing.bz2010-10-011-2/+2
| | | | | | PR: kern/151100 Submitted by: SunMinghao (sunminghao hotmail.com) MFC after: 3 days
* remove an unnecessary (and wrong) assignment.luigi2010-09-291-1/+0
| | | | | | | | It was meant to reset idle_time (and it was not needed), but I even used the wrong field. Obtained from: Oleg MFC after: 3 days
* whitespace changes in preparation for future commitsluigi2010-09-295-8/+15
|
* fix handling of initial credit for an idle pipe.luigi2010-09-291-1/+4
| | | | | | | | | This fixes the bug where setting bw > 1 MTU/tick resulted in infinite bandwidth if io_fast=1 PR: 147245 148429 Obtained from: Riccardo Panicucci MFC after: 3 days
* fix breakage in in-kernel NAT: the code did not honorluigi2010-09-281-0/+5
| | | | | | | | | | | net.inet.ip.fw.one_pass and always moved to the next rule in case of a successful nat. This should fix several related PR (waiting for feedback before closing them) PR: 145167 149572 150141 MFC after: 3 days
* Whitespace changes to reduce diffs wrt the most recent ipfw/dummynet code:luigi2010-09-283-10/+7
| | | | | | | | + remove an unused macro, + adjust the constants in an enum + small whitespace changes MFC after: 3 days
* Add a bandaid for a long-standing race condition during route entrydelphij2010-09-271-2/+3
| | | | | | | | | | | | | | | | | | | | | | un-expiring. The previous version of the code had no locking when testing rt_refcnt. The lack of locking could result in a condition where a routing entry has a reference count but at the same time has the RTPRF_OURS bit set and an expiration timer. This would eventually lead to a panic: panic: rtqkill route really not free. This can happen, for instance, when the system accepts ICMP redirects from a local gateway at a moderate frequency. Commit this workaround for now until we have a better solution. PR: kern/149804 Reviewed by: bz Tested by: Zhao Xin, Pete French MFC after: 2 weeks
* Log the number of segments currently in the reassembly queue.lstewart2010-09-251-6/+11
| | | | Sponsored by: FreeBSD Foundation
* Internalise reassembly queue related functionality and variables which shouldlstewart2010-09-253-24/+28
| | | | | | | | | | not be used outside of the reassembly queue implementation. Provide a new function to flush all segments from a reassembly queue and call it from the appropriate places instead of manipulating the queue directly. Sponsored by: FreeBSD Foundation Reviewed by: andre, gnn, rpaulo MFC after: 2 weeks
* Make the RPC specific __rpc_inet_ntop() and __rpc_inet_pton() generalattilio2010-09-241-0/+2
| | | | | | | | | | in the kernel (just as inet_ntoa() and inet_aton() are) and sync their prototypes with those functions accordingly. Sponsored by: Sandvine Incorporated Reviewed by: emaste, rstone Approved by: dfr MFC after: 2 weeks
* IP_BINDANY is not correctly handled in getsockopt() case.attilio2010-09-241-0/+4
| | | | | | | | | Fix it by specifying the correct bits. Sponsored by: Sandvine Incorporated Reviewed by: bz, emaste, rstone Obtained from: Sandvine Incorporated MFC after: 10 days
* Do not convert some meaningful error value to EINVAL.glebius2010-09-201-4/+4
| | | | Reviewed by: will
* Fix a locking issue which resulted in aborted associationstuexen2010-09-201-4/+4
| | | | | | due to a corrupted nr-mapping array. MFC after: 2 weeks.
* Allow the initial congestion window to be configuredtuexen2010-09-191-2/+2
| | | | | | to one MTU. Improve the description. MFC after: 2 weeks.
* Fix a locking issue which shows up when the code is usedtuexen2010-09-192-4/+4
| | | | | | on Mac OS X. MFC after: 2 weeks.
* Rearrange the TSO code to make it more readable and to clearlyandre2010-09-171-33/+49
| | | | | | | | | | | | | | | | | | separate the decision logic, of whether we can do TSO, and the calculation of the burst length into two distinct parts. Change the way the TSO burst length calculation is done. While TSO could do bursts of 65535 bytes that can't be represented in ip_len together with the IP and TCP header. Account for that and use IP_MAXPACKET instead of TCP_MAXWIN as base constant (both have the same value of 64K). When more data is available prevent less than MSS sized segments from being sent during the current TSO burst. Add two more KASSERTs to ensure the integrity of the packets. Tested by: Ben Wilber <ben-at-desync com> MFC after: 10 days
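The burst length calculation described above might look roughly like this. It is a sketch under stated assumptions, not the code from tcp_output(): the function name `tso_burst_len`, its parameters, and the constant `SKETCH_IP_MAXPACKET` are hypothetical stand-ins, and the header length is taken as a single combined IP plus TCP value.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for IP_MAXPACKET: the largest value ip_len can carry. */
#define SKETCH_IP_MAXPACKET 65535u

/*
 * Hypothetical helper: payload bytes to hand to the NIC in one TSO burst.
 *   avail     - payload bytes ready to send
 *   hdrlen    - combined IP and TCP header length
 *   mss       - payload bytes per segment
 *   more_data - nonzero if further data will follow this burst
 */
static uint32_t
tso_burst_len(uint32_t avail, uint32_t hdrlen, uint32_t mss, int more_data)
{
	uint32_t len = avail;

	assert(mss > 0 && hdrlen < SKETCH_IP_MAXPACKET);

	/* Headers and payload together must be representable in ip_len. */
	if (len > SKETCH_IP_MAXPACKET - hdrlen) {
		len = SKETCH_IP_MAXPACKET - hdrlen;
		more_data = 1;	/* the remainder goes out in a later burst */
	}

	/* With more data pending, avoid ending on a sub-MSS segment. */
	if (more_data && len > mss)
		len -= len % mss;

	return (len);
}
```

With 100000 bytes ready, 40 bytes of headers and an MSS of 1460, the cap is 65495 bytes, which is then trimmed to 44 full segments (64240 bytes); the trailing partial segment is left for the next burst.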
* Fix a bug where the wrong PR-SCTP policy was considered.tuexen2010-09-173-11/+3
| | | | | | | While there, always use the same code for the check of TTL expiration. MFC after: 2 weeks.
* Make the initial congestion window configurable via sysctl.tuexen2010-09-173-7/+29
| | | | MFC after: 2 weeks.
* * Implement initial version of send buffer splitting.tuexen2010-09-174-6/+36
| | | | | * Make send/recv buffer splitting switchable via sysctl. * While there: Fix some comments.
* Remove the TCP inflight bandwidth limiter as announced in r211315andre2010-09-168-235/+16
| | | | | | | | | | | | | | | | | | | | | | | to give way for the pluggable congestion control framework. It is the task of the congestion control algorithm to set the congestion window and amount of inflight data without external interference. In 'struct tcpcb' the variables previously used by the inflight limiter are renamed to spares to keep the ABI intact and to have some more space for future extensions. In 'struct tcp_info' the variable 'tcpi_snd_bwnd' is not removed to preserve the ABI. It is always set to 0. In siftr.c in 'struct pkt_node' the variable 'snd_bwnd' is not removed to preserve the ABI. It is always set to 0. These unused variables in the various structures may be reused in the future or garbage collected before the next release or at some other point when an ABI change happens anyway for other reasons. No MFC is planned. The inflight bandwidth limiter stays disabled by default in the other branches but remains available.
* Improve comment to TCP_MINMSS by taking the wording from lstewart (withandre2010-09-161-7/+7
| | | | | | | a small difference in the last paragraph though) as suggested by jhb. Clarify that the 'reviewed by' in r212653 by lstewart was for the functional change, not the comments in the committed version.
* Remove old debug code.tuexen2010-09-154-48/+0
| | | | MFC after: 2 weeks.
* Remove unused variable/assignment.tuexen2010-09-151-3/+1
| | | | MFC after: 3 weeks.
* Delay the assignment of a path for DATA chunk until they hittuexen2010-09-158-191/+112
| | | | | | | the sent_queue. Honor a given path when the SCTP_ADDR_OVER flag is set. MFC after: 2 weeks.
* Use TAILQ_EMPTY() for testing if a tail queue is empty.tuexen2010-09-151-4/+5
| | | | Set whoFrom to NULL after freeing whoFrom.
* Remove unused variable/assignment.tuexen2010-09-151-2/+1
| | | | MFC after: 2 weeks.
* Remove assignment without effect.tuexen2010-09-151-2/+0
| | | | MFC after: 2 weeks.
* * Use !TAILQ_EMPTY() for checking if a tail queue is not empty.tuexen2010-09-151-4/+3
| | | | | | * Remove assignment without any effect. MFC after: 2 weeks.
* Change the default MSS for IPv4 and IPv6 TCP connections from anandre2010-09-151-19/+27
| | | | | | | | | | | | | | | | | | artificial power-of-2 rounded number to their real values specified in RFC879 and RFC2460. From the history and existing comments it appears that the rounded numbers were intended to be advantageous for the kernel and mbuf system. However this hasn't been the case for at least a long time. The mbuf clusters used in tcp_output() have enough space to hold the larger real value for the default MSS for both IPv4 and IPv6. Note that the default MSS is only used when path MTU discovery is disabled. Update and expand related comments. Reviewed by: lstewart (including some word-smithing) MFC after: 2 weeks
* Adding an address on an interface also requires the loopback route toqingli2010-09-121-0/+2
| | | | | | | | that address be installed. PR: kern/150481 Submitted by: Ingo Flaschberger <if at xip.at> MFC after: 5 days
* * Remove code which has no effect.tuexen2010-09-091-108/+61
| | | | | | * Clean up the handling in sctp_lower_sosend(). MFC after: 3 weeks.
* Fix CARP in backup mode by properly registering its hooks for INET and INET6will2010-09-061-0/+15
| | | | | | | | | using ipproto_{un,}register() and the newly created ip6proto_{un,}register() so that it can again receive IPPROTO_CARP packets allowing its state machine to work. Reviewed by: bz Approved by: ken (mentor)
* Fix static kernel builds with carp(4) by changing its SYSINIT order so thatwill2010-09-061-1/+1
| | | | | | | | it is initialized after basic protocol initialization, which allows it to register via pf_proto_register(). Reviewed by: bz Approved by: ken (mentor)
* in_delayed_cksum() requires host byte order.glebius2010-09-061-6/+4
| | | | | Reported by: Alexander Levin <amindomao googlemail.com> MFC after: 1 week
* Implement correct handling of address parameter andtuexen2010-09-052-122/+78
| | | | | | sendinfo for SCTP send calls. MFC after: 4 weeks.
* Fix some CLANG warnings. One clang warning is leftrrs2010-09-055-17/+35
| | | | | | due to the fact that it's bogus. nam->sa_family will not change from AF_INET6 to AF_INET (but clang thinks it does ;-D)
* In case of RADIX_MPATH do not leak the IN_IFADDR read lock onbz2010-09-041-2/+3
| | | | | | early return. MFC after: 3 days
* MFp4 CH=183052 183053 183258:bz2010-09-022-12/+8
| | | | | | | | | | | | | | | | | | | | | In protosw we define pr_protocol as short, while on the wire it is an uint8_t. That way we can have "internal" protocols like DIVERT, SEND or gaps for modules (PROTO_SPACER). Switch ipproto_{un,}register to accept a short protocol number(*) and do an upfront check for valid boundaries. With this we also consistently report EPROTONOSUPPORT for out of bounds protocols, as we did for proto == 0. This allows a caller to not error for this case, which is especially important if we want to automatically call these from domain handling. (*) the functions have been without any in-tree consumer since the initial introduction, so this is considered safe. Implement ip6proto_{un,}register() similarly to their legacy IP counterparts to allow modules to hook up dynamically. Reviewed by: philip, will MFC after: 1 week
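The upfront bounds check described above can be illustrated with a small sketch. The function name `proto_check` and the constant `SKETCH_IPPROTO_MAX` are hypothetical; the bound of 256 is an assumption that follows from the on-wire protocol field being a uint8_t. Only the EPROTONOSUPPORT return value for out-of-range numbers and for 0 comes from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Assumed bound: on-wire protocol numbers fit in a uint8_t. */
#define SKETCH_IPPROTO_MAX 256

/*
 * Hypothetical validation step: accept a short protocol number and
 * reject anything that cannot be a registrable on-wire protocol.
 * Returns 0 if the number is acceptable, EPROTONOSUPPORT otherwise.
 */
static int
proto_check(short proto)
{
	/* 0 and out-of-bounds values are reported the same way. */
	if (proto <= 0 || proto >= SKETCH_IPPROTO_MAX)
		return (EPROTONOSUPPORT);
	return (0);
}
```

Returning one consistent error for every out-of-range value lets a caller that iterates over internal protocol numbers skip those entries without treating them as a failure.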
* Fix a bug which results in peer IPv4 addresses a.b.c.d with 224<=d<=239tuexen2010-09-011-1/+1
| | | | | | incorrectly being detected as multicast addresses on little endian systems. MFC after: 2 weeks
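The symptom described above is what a missing byte-order conversion would produce; the commit message does not show the diff, so the following is only an illustration of that class of bug. All function names here are hypothetical. The sketch simulates a little-endian load with an explicit byte swap so that it behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Class D test on a host-order address: the top four bits are 1110. */
static int
is_multicast_host(uint32_t addr_host)
{
	return ((addr_host & 0xf0000000u) == 0xe0000000u);
}

/* Build a.b.c.d as a host-order 32-bit value. */
static uint32_t
ipv4_host(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return (((uint32_t)a << 24) | ((uint32_t)b << 16) |
	    ((uint32_t)c << 8) | (uint32_t)d);
}

/*
 * The value a little-endian CPU sees when it loads the network-order
 * bytes of an address without converting them: the bytes are reversed,
 * so the last octet d ends up in the most significant position.
 */
static uint32_t
as_loaded_on_little_endian(uint32_t addr_host)
{
	return (((addr_host & 0x000000ffu) << 24) |
	    ((addr_host & 0x0000ff00u) << 8) |
	    ((addr_host & 0x00ff0000u) >> 8) |
	    ((addr_host & 0xff000000u) >> 24));
}
```

For 10.0.0.230 the correct check on the host-order value reports a unicast address, while the same check on the unconverted value sees 230 in the top octet and reports multicast, matching the 224<=d<=239 range in the commit message.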
* o Some programs could send broadcast/multicast traffic to ipfwmaxim2010-08-301-2/+21
| | | | | | | | | | | | | pseudo-interface. This leads to a panic due to uninitialized if_broadcastaddr address. Initialize it and implement ip_output() method to prevent mbuf leak later. ipfw pseudo-interface should never send anything therefore call panic(9) in if_start() method. PR: kern/149807 Submitted by: Dmitrij Tejblum MFC after: 2 weeks
* Fix the SCTP_WITH_NO_CSUM option when used in combination withtuexen2010-08-297-101/+123
| | | | | | | interface supporting CRC offload. While at it, make use of the feature that the loopback interface provides CRC offloading. MFC after: 4 weeks
* Bugfix: Do not send a packet drop report in response to a receivedtuexen2010-08-281-3/+6
| | | | INIT-ACK with incorrect CRC.
* Fix the switching on/off of CMT using sysctl and socket option.tuexen2010-08-2811-174/+159
| | | | | | | | Fix the switching on/off of PF and NR-SACKs using sysctl. Add minor improvement in handling malloc failures. Improve the address checks when sending. MFC after: 4 weeks
* Simplify the tcp pcblist estimate logic slightly.jhb2010-08-271-5/+3
| | | | MFC after: 3 days