path: root/sys/netinet
Commit message | Author | Age | Files | Lines
* Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 | bz | 2011-02-16 | 2 | -0/+4
  VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147.
  While this reduces the number of vnet recursions in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix.
  The current expectations are documented at the beginning of uipc_socket.c along with the other information there.
  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by: jhb
  Tested by: zec
  Tested by: Mikolaj Golub (to.my.trociny gmail.com)
  MFC after: 2 weeks
* Bump dummynet module version to meet dummynet schedulers' requirements, | pluknet | 2011-02-16 | 1 | -1/+1
  and thus unbreak loading dummynet.ko via /boot/loader.conf.
  Reported by: rihad <rihad att mail.ru> on freebsd-net
  Approved by: kib (mentor)
* Fix a bug reported by Jonathan Leighton in his web-sctp testing | rrs | 2011-02-13 | 1 | -7/+15
  at the Univ-of-Del. Basically, when a 1-to-1 socket did a socket/bind/send(data)/close and the timing was right, we would dereference a socket that is NULL.
  MFC after: 1 month
* Fix several bugs related to stream scheduling. | tuexen | 2011-02-13 | 1 | -127/+103
  Obtained from: Robin Seggelmann
  MFC after: 3 months.
* Oops, revert an accidental local change that got added in | deischen | 2011-02-13 | 1 | -4/+0
  my last commit (r218627). No damage was done in the last commit, just some duplicated code was added (which is now removed).
* Allow the SO_SETFIB socket option to select the default (0) | deischen | 2011-02-13 | 1 | -0/+4
  routing table.
  Reviewed by: julian
* Remove addresses from endpoint when there are no associations. | tuexen | 2011-02-10 | 1 | -3/+10
  This fixes a bug reported by brucec@.
  MFC after: 3 months.
* Fix bugs related to M_FLOWID: | tuexen | 2011-02-07 | 4 | -13/+25
  * Store the flowid when receiving an SCTP/IPv6 packet.
  * Store the flowid when receiving an SCTP packet with wrong CRC.
  * Initialize flowid correctly.
  * Put test code under INVARIANTS.
  MFC after: 3 months.
* If not set (due to some error Michael is working on | rrs | 2011-02-07 | 1 | -0/+12
  fixing) set it for the net.
  MFC after: 3 months
* 1) Track when flowid does get set. | rrs | 2011-02-07 | 3 | -2/+4
  MFC after: 3 months
* 1) Use same scheme Michael and I discussed for a selected for a flowid | rrs | 2011-02-06 | 1 | -20/+20
  2) If flowid is not set, arrange so it is stored.
  3) If flowid is set by lower layer, use it.
  MFC after: 3 Months
* correct the 'output_time' of packets generated by dummynet. | luigi | 2011-02-05 | 1 | -1/+1
  In the Dec. 2009 rewrite I introduced a bug, using for the computation the arrival time instead of the time the packet has exited from the queue. The bandwidth computation was still correct because it is computed elsewhere, but traffic was sent out in bursts.
  The bug is also present in RELENG_8 after Dec. 2009.
  Thanks to Daikichi Osuga for investigating, finding and fixing the bug with detailed graphs of the behaviour before and after the fix.
  Submitted by: Daikichi Osuga
  MFC after: 2 weeks
* Add support for M_FLOWID. | tuexen | 2011-02-05 | 4 | -10/+52
* 1) Typo correction in comments and one spacing change. | rrs | 2011-02-05 | 36 | -51/+78
  2) Mass update to all copyrights.
  MFC after: 3 Months
* When turning off TCP_NOPUSH, only call tcp_output() to immediately flush | jhb | 2011-02-04 | 1 | -2/+3
  any pending data if the connection is established.
  Submitted by: csjp
  Reviewed by: lstewart
  MFC after: 1 week
* 1) Fix cpu mapping per JB's suggestions | rrs | 2011-02-04 | 2 | -19/+50
  2) Fix it so INIT's don't always end up on CPU0
  MFC after: 3 months
* Fix typo (Tuneable -> Tunable). | brucec | 2011-02-04 | 1 | -4/+4
* Fix several bugs in the stream schedulers. | tuexen | 2011-02-03 | 6 | -54/+67
  From Robin Seggelmann.
  MFC after: 3 months.
* Make sure that changing the ECN sysctl does not affect | tuexen | 2011-02-03 | 6 | -30/+34
  existing associations and endpoints.
  MFC after: 3 months.
* 1) Move per John Baldwin to mp_maxid | rrs | 2011-02-03 | 7 | -22/+37
  2) Some signed/unsigned errors found by Mac OS compiler (from Michael)
  3) A couple of copyright updates on the affected files.
  MFC after: 3 months
* Fix the per CPU stats so that: | rrs | 2011-02-03 | 4 | -8/+39
  1) They don't use the giant "MAX_CPU" define and instead are allocated dynamically based on mp_ncpus
  2) Will zero with the netstat -z -s -p sctp
  3) Will be properly handled by both the sctp_init and finish (the multi-net stuff was incorrectly bzero'ing in sctp_init the wrong size.. the bzero is now moved to the right places). And of course the free is put in at the very end.
  MFC after: 3 Months
* Adds an experimental option to create a pool of | rrs | 2011-02-03 | 8 | -4/+253
  threads. These serve as input threads and are queued packets based on the V-tag number. This is similar to what a modern card can do with queues for TCP... but alas modern cards know nothing about SCTP.
  MFC after: 3 months (maybe)
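A rough sketch of how packets might be spread across such a pool follows. The commit only says packets are queued "based on the V-tag number"; the modulo mapping and the name `ex_pick_input_thread` are assumptions for illustration, not taken from the SCTP sources.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose an input thread from the packet's
 * verification tag. Packets with the same V-tag always map to the same
 * thread, so one association is never processed by two threads at once.
 */
static unsigned int
ex_pick_input_thread(uint32_t vtag, unsigned int nthreads)
{
	assert(nthreads > 0);	/* an empty pool cannot accept packets */
	return (vtag % nthreads);
}
```

Any stable function of the V-tag would preserve per-association ordering; modulo is simply the shortest one to write down.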
* 1) Allow a chunk to track the cwnd it was at when sent. | rrs | 2011-02-02 | 14 | -29/+120
  2) Add separate max-bursts for retransmit and hb. These are set to sysctlable values but not settable via the socket api. This makes sure we don't blast out HB's or fast-retransmits.
  3) Determine on the first data transmission on a net if it's local-lan (by being under or over a RTT). This can later be used to think about different algorithms based on locallan vs big-i (experimental)
  4) The cwnd should NOT be allowed to grow when an ECNEcho is seen (TCP has this same bug). We fix this in SCTP so an ECNe being seen prevents an advance of cwnd.
  5) CWR's should not be sent multiple times to the same network, instead just updating the TSN being transmitted if needed.
  MFC after: 1 Month
* Algorithm modules can define their own private congestion signal types in the | lstewart | 2011-02-01 | 2 | -4/+10
  top 8 bits of the 32 bit signal bit field space for internal use. These private signals should not be leaked outside of a module. Given that many algorithm modules use the NewReno hook functions to simplify their implementation, the obvious place such a leak would show up is in the NewReno cong_signal hook function.
  - Show the full number of significant bits in the signal type definitions in <netinet/cc.h>.
  - Add a bitmask to simplify figuring out if a given signal is in the private or public bit range.
  - Add a sanity check in newreno_cong_signal() to ensure private signals are not being leaked into the hook function.
  Sponsored by: FreeBSD Foundation
  Discussed with: David Hayes <dahayes at swin edu au>
  MFC after: 1 week
  X-MFC with: r215166
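The private/public split described above can be illustrated with a small userspace sketch. All `EX_`/`ex_` names and the concrete values are assumptions made for the example; the authoritative definitions are the ones in <netinet/cc.h>, and the kernel would use its own assertion macro rather than assert().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; not copied from <netinet/cc.h>. */
#define EX_CC_ECN		0x00000001u	/* example public signal */
#define EX_CC_SIGPRIVMASK	0xFF000000u	/* top 8 bits: module-private range */
#define EX_MY_PRIVATE_SIG	0x01000000u	/* hypothetical private signal */

/* Return nonzero if any bit of the signal lies in the private range. */
static int
ex_sig_is_private(uint32_t type)
{
	return ((type & EX_CC_SIGPRIVMASK) != 0);
}

/* Stand-in for a shared hook function: private signals must not reach it. */
static void
ex_public_cong_signal(uint32_t type)
{
	assert(!ex_sig_is_private(type));
	/* ... react to the public signal here ... */
}
```

A module that reuses a shared hook would filter its own private signals first and pass on only the public ones.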
* Fix typo in comment: "course" -> "coarse" | lstewart | 2011-02-01 | 1 | -1/+1
  Sponsored by: FreeBSD Foundation
  Submitted by: jmallett
  MFC after: 3 months
  X-MFC with: r218152
* Import an implementation of the CAIA-Hamilton-Delay (CHD) congestion control | lstewart | 2011-02-01 | 1 | -0/+497
  algorithm described in the paper "Improved coexistence and loss tolerance for delay based TCP congestion control" by Hayes and Armitage. It is implemented as a kernel module compatible with the recently committed modular congestion control framework.
  CHD enhances the approach taken by the Hamilton-Delay (HD) algorithm to provide tolerance to non-congestion related packet loss and improvements to coexistence with loss-based congestion control algorithms. A key idea in improving coexistence with loss-based congestion control algorithms is the use of a shadow window, which attempts to track how NewReno's congestion window (cwnd) would evolve. At the next packet loss congestion event, CHD uses the shadow window to correct cwnd in a way that reduces the amount of unfairness CHD experiences when competing with loss-based algorithms.
  In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au>
  Sponsored by: FreeBSD Foundation
  Reviewed by: bz and others along the way
  MFC after: 3 months
* Import a clean-room implementation of the Hamilton-Delay (HD) congestion control | lstewart | 2011-02-01 | 1 | -0/+254
  algorithm based on the paper "A strategy for fair coexistence of loss and delay-based congestion control algorithms" by Budzisz, Stanojevic, Shorten and Baker. It is implemented as a kernel module compatible with the recently committed modular congestion control framework.
  HD uses a probabilistic approach to reacting to delay-based congestion. The probability of reducing cwnd is zero when the queuing delay is very small, increasing to a maximum at a set threshold, then back down to zero again when the queuing delay is high. Normal operation keeps the queuing delay below the set threshold. However, since loss-based congestion control algorithms push the queuing delay high when probing for bandwidth, having the probability of reducing cwnd drop back to zero for high delays allows HD to compete with loss-based algorithms.
  In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au>
  Sponsored by: FreeBSD Foundation
  Reviewed by: bz and others along the way
  MFC after: 3 months
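The shape of that backoff probability can be sketched as a function of queuing delay. The commit message does not state how the curve rises and falls, so this sketch assumes straight-line segments; the function name and the parameters `qmin`, `qth`, `qmax` and `pmax` are hypothetical and do not come from the kernel module.

```c
#include <assert.h>

/*
 * Hypothetical backoff probability for queuing delay q:
 *   0 below qmin, rising linearly to pmax at the threshold qth,
 *   then falling linearly back to 0 at qmax and staying 0 above it.
 */
static double
ex_hd_backoff_prob(double q, double qmin, double qth, double qmax,
    double pmax)
{
	assert(qmin < qth && qth < qmax);
	if (q <= qmin || q >= qmax)
		return (0.0);
	if (q <= qth)
		return (pmax * (q - qmin) / (qth - qmin));
	return (pmax * (qmax - q) / (qmax - qth));
}
```

The falling segment above `qth` is what lets the flow hold its ground when loss-based competitors drive the queue high.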
* Import a clean-room implementation of the VEGAS congestion control algorithm | lstewart | 2011-02-01 | 1 | -0/+308
  based on the paper "TCP Vegas: end to end congestion avoidance on a global internet" by Brakmo and Peterson. It is implemented as a kernel module compatible with the recently committed modular congestion control framework.
  VEGAS uses network delay as a congestion indicator and unlike regular loss-based algorithms, attempts to keep the network operating with stable queuing delays and no congestion losses. By keeping network buffers used along the path within a set range, queuing delays are kept low while maintaining high throughput.
  In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au>
  Sponsored by: FreeBSD Foundation
  Reviewed by: bz and others along the way
  MFC after: 3 months
* More ECN fixes: | rrs | 2011-01-31 | 13 | -486/+86
  1) We now remove ECN-Nonce since it will no longer continue as an I-D
  2) Eliminate last_tsn_echo, this tied us to an assoc not the net and thus we were not doing m-homing on the ECN-Echo senders side right.
  3) Increment the count going out even if the TSN is lower in the pending ECN-Echo, this way the receiver knows exactly how many packets were marked even with network re-ordering
  4) Fix so we DO NOT stop doing delayed sack if an ECN Echo is in queue
  MFC after: 1 month
* Remove duplicate printing of TF_NOPUSH in db_print_tflags(). | bz | 2011-01-29 | 1 | -4/+0
  MFC after: 10 days
* Fixes to ECN in SCTP. | rrs | 2011-01-29 | 9 | -53/+165
  1) ECN was on an association basis, this is incorrect and will not work with CMT or for that matter if the user is sending to multiple addresses. This commit makes ECN on a per path basis.
  2) Adopt the new format for the ECN internet draft. This also maintains compatibility with old format chunks as well.
  3) Keep track of the real time of a RTT down to microseconds. For some future conditional features (for like a data center this is good information to have).
  MFC after: 1 month
* Keep track of the real last RTT on each net. | rrs | 2011-01-28 | 2 | -1/+8
  This will be used for Data Center congestion control, we won't want to engage it in the ECN code unless we KNOW that the RTT is less than 500us.
  MFC after: 1 week
* Fix a bug in the way ECN-Echo chunk | rrs | 2011-01-28 | 2 | -3/+19
  sends were being accounted for. The counting was such that we counted only when we queued a chunk, not when we sent it. Now keep an additional counter for queuing and one for sending.
  MFC after: 1 week
* * Use 300 ms as the default for RTO_MIN. | tuexen | 2011-01-26 | 1 | -4/+3
  * Disable burst mitigation by default.
  * Remove unused constant.
  Discussed with rrs.
  MFC after: 3 months.
* Make SCTP_MAX_BURST compliant with the latest version of | tuexen | 2011-01-26 | 1 | -13/+22
  the socket API ID. This is not compatible with the API in stable/8.
* Change infrastructure for SCTP_MAX_BURST to allow compliance | tuexen | 2011-01-26 | 6 | -33/+43
  with the latest socket API ID. Especially it can be disabled. Full compliance needs changing the structure used in the socket option. Since this breaks the API, it will be a separate commit which will not be MFCed to stable/8.
  MFC after: 3 months.
* Prison check addresses set with multicast interface options. | deischen | 2011-01-26 | 1 | -6/+9
  Reviewed by: bz
  MFC after: 1 week
* When matching an incoming ARP against a bridge, ensure both interfaces belong | thompsa | 2011-01-25 | 1 | -2/+2
  to the same bridge.
  Submitted by: Alexander Zagrebin
* Import the ERTT (Enhanced Round Trip Time) Khelp module. ERTT uses the | lstewart | 2011-01-24 | 2 | -0/+634
  Khelp/Hhook KPIs to hook into the TCP stack and maintain a per-connection, low noise estimate of the instantaneous RTT. ERTT's implementation is robust even in the face of delayed acknowledgements and/or TSO being in use for a connection.
  A high quality, low noise RTT estimate is a requirement for applications such as delay-based congestion control, for which we will be importing some algorithm implementations shortly.
  In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au>
  Sponsored by: FreeBSD Foundation
  Reviewed by: bz and others along the way
  MFC after: 3 months
* Add stream scheduling support. | tuexen | 2011-01-23 | 14 | -178/+1198
  This work is based on a patch received from Robin Seggelmann.
  MFC after: 3 months.
* An sbuf configured with SBUF_AUTOEXTEND will call malloc with M_WAITOK when a | lstewart | 2011-01-23 | 1 | -4/+21
  write to the buffer causes it to overflow. We therefore can't hold the CC list rwlock over a call to sbuf_printf() for an sbuf configured with SBUF_AUTOEXTEND. Switch to a fixed length sbuf which should be of sufficient size except in the very unlikely event that the sysctl is being processed as one or more new algorithms are loaded. If that happens, we accept the race and may fail the sysctl gracefully if there is insufficient room to print the names of all the algorithms.
  This should address a WITNESS warning and the potential panic that would occur if the sbuf call to malloc did sleep whilst holding the CC list rwlock.
  Sponsored by: FreeBSD Foundation
  Reported by: Nick Hibma
  Reviewed by: bz
  MFC after: 3 weeks
  X-MFC with: r215166
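The trade-off described above, failing cleanly instead of growing a buffer while a lock is held, can be shown with a userspace analogue. This is only a sketch: it uses snprintf() on a caller-supplied buffer rather than the kernel sbuf API, and the function name and the ENOMEM return value are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Append a comma-separated list of names to a fixed-size buffer.
 * Never allocates, so it would be safe to call with a lock held;
 * returns ENOMEM (assumed error code) when the names do not fit.
 */
static int
ex_list_names(char *buf, size_t len, const char *const *names, size_t n)
{
	size_t used = 0;

	assert(buf != NULL && names != NULL);
	if (len == 0)
		return (ENOMEM);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w = snprintf(buf + used, len - used, "%s%s",
		    i == 0 ? "" : ", ", names[i]);
		if (w < 0 || (size_t)w >= len - used)
			return (ENOMEM);	/* out of room: fail gracefully */
		used += (size_t)w;
	}
	return (0);
}
```

The caller sizes the buffer before taking the lock; if more names appear in the meantime, the call reports an error rather than sleeping in an allocation.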
* Remove unnecessary checking of variable. | tuexen | 2011-01-23 | 1 | -12/+2
  MFC after: 3 months.
* Some correctness and robustness fixes related to CUBIC's mean RTT estimate: | lstewart | 2011-01-21 | 1 | -6/+17
  - The mean RTT is updated at the end of each congestion epoch, but if we switch to congestion avoidance within the first epoch (e.g. if ssthresh was primed from the hostcache), we'll trigger a divide by zero panic in cubic_ack_received(). Set the mean to the min in cubic_record_rtt() if the mean is less than the min to ensure we have a sane mean for use in this situation. This fixes the panic reported by Nick Hibma.
  - Adjust conditions under which we update the mean RTT in cubic_post_recovery() to ensure a low latency path won't yield an RTT of less than 1. This avoids another potential divide by zero panic when running CUBIC in networks with sub-millisecond latencies.
  - Remove the "safety" assignment of min into mean when we don't update the mean because of failed conditions. The above change to the conditions for updating the mean ensures the safety issue is addressed and I feel it is better to keep our previous mean estimate around if we can't update than to revert to the min.
  - Initialise the mean RTT to 1 on connection startup to act as a safety belt if a situation we haven't considered and addressed with the above changes were to crop up in the wild.
  Sponsored by: FreeBSD Foundation
  Reported and tested by: Nick Hibma
  Discussed with: David Hayes <dahayes at swin edu au>
  MFC after: 5 weeks
  X-MFC with: r216114
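The first and last points above amount to keeping the mean RTT usable as a divisor at all times. A minimal userspace sketch of that invariant follows; the struct, the function names and the use of 0 as the "no sample yet" marker are assumptions for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct ex_cubic {
	int min_rtt_ticks;	/* smallest RTT seen; 0 means no sample yet */
	int mean_rtt_ticks;	/* mean RTT; used as a divisor elsewhere */
};

/* Connection startup: a mean of 1 acts as the safety belt. */
static void
ex_cubic_init(struct ex_cubic *cd)
{
	assert(cd != NULL);
	cd->min_rtt_ticks = 0;
	cd->mean_rtt_ticks = 1;
}

/* Record an RTT sample; keep min >= 1 and mean >= min when min changes. */
static void
ex_cubic_record_rtt(struct ex_cubic *cd, int rtt_ticks)
{
	assert(cd != NULL);
	if (rtt_ticks < 1)
		rtt_ticks = 1;	/* sub-tick RTTs are rounded up to 1 */
	if (cd->min_rtt_ticks == 0 || rtt_ticks < cd->min_rtt_ticks) {
		cd->min_rtt_ticks = rtt_ticks;
		if (cd->mean_rtt_ticks < cd->min_rtt_ticks)
			cd->mean_rtt_ticks = cd->min_rtt_ticks;
	}
}
```

With both guards in place, code that divides by the mean before the first congestion epoch has ended still sees a value of at least 1.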
* Improve comments. | tuexen | 2011-01-20 | 1 | -5/+5
  MFC after: 1 week.
* Fix it so we align with new socket API draft for | rrs | 2011-01-20 | 3 | -25/+21
  states in destination (i.e. ACTIVE/INACTIVE/UNCONFIRMED)
  MFC after: 1 week
* Cleanup the management of CC functions. | tuexen | 2011-01-19 | 5 | -256/+68
  MFC after: 3 months.
* Fix style 9 nit that snuck in when I | rrs | 2011-01-19 | 1 | -1/+1
  grabbed the wrong patch ;-0 (thanks Daniel)
  MFC after: 1 week
* Fix a bug where Multicast packets sent from a | rrs | 2011-01-19 | 1 | -3/+5
  udp endpoint may end up echoing back to the sender even WITHOUT joining the multi-cast group.
  Reviewed by: gnn, bms, bz?
  Obtained from: deischen (with help from)
* Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need | mdf | 2011-01-18 | 6 | -75/+81
  to rely on the format string. For SYSCTL_PROC instances where I noticed a discrepancy between the CTLTYPE and the format specifier, fix the CTLTYPE.
* Add support for resource pooling to CMT. | tuexen | 2011-01-16 | 1 | -44/+119
  An original version of the patch was developed by Martin Becke and Thomas Dreibholz.
  MFC after: 3 months