path: root/sys/netinet
Commit message | Author | Age | Files | Lines
* MFC r260871: | adrian | 2014-02-10 | 1 | -0/+10
  If the flowid is available for the mbuf that finalised the creation of a syncache connection, copy it into the inp_flowid field. Without this, an incoming TCP connection won't have an inp_flowid marked until some data comes in, and this means that things like the per-CPU TCP timer option will choose a different CPU for the timer work. (It also means that if one grabbed the flowid via an ioctl from userland, it won't be available until some data has been received.)
  Sponsored by: Netflix, Inc.
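A minimal sketch of the idea, assuming heavily simplified stand-ins for the mbuf packet header and the inpcb. The struct names, the `SKETCH_*` flags and `sketch_syncache_copy_flowid()` are hypothetical, not the kernel's definitions; the point is only that the flowid is copied when the handshake completes rather than when the first data segment arrives.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; not the kernel's real values. */
#define SKETCH_M_FLOWID      0x1u   /* mbuf carries a valid flowid */
#define SKETCH_INP_HW_FLOWID 0x1u   /* inp_flowid holds a valid value */

/* Simplified stand-in for an mbuf with its packet-header flowid. */
struct sketch_mbuf {
    uint32_t m_flags;
    uint32_t flowid;
};

/* Simplified stand-in for the connection's inpcb. */
struct sketch_inpcb {
    uint32_t inp_flags;
    uint32_t inp_flowid;
};

/* Called when the final ACK turns a syncache entry into a connection:
 * if that mbuf has a flowid, record it right away so CPU selection for
 * timers (and userland queries) see it before any data arrives. */
static void
sketch_syncache_copy_flowid(struct sketch_inpcb *inp,
    const struct sketch_mbuf *m)
{
    if (m->m_flags & SKETCH_M_FLOWID) {
        inp->inp_flowid = m->flowid;
        inp->inp_flags |= SKETCH_INP_HW_FLOWID;
    }
}
```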
* MFC r260702 (by melifaro): | ae | 2014-02-06 | 1 | -0/+8
  Fix ipfw fwd for IPv4 traffic broken by r249894.
  Problem case: the original lookup returns a route with the gateway set, so gw points to rte->rt_gateway. After that we change dst and perform the lookup a second time. Since the fwd host is most probably directly reachable, the resulting rte does not contain rt_gateway, so gw is not updated. Finally, we end up with the packet transmitted on the proper interface but with the wrong link-layer address.
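The failure mode comes down to a next-hop pointer that is only assigned when a route has a gateway. The sketch below uses a hypothetical one-field route entry (`struct sketch_rtentry`, `sketch_next_hop()`; not the kernel's API). It recomputes the next hop from scratch for every lookup and falls back to the destination itself, so no stale gateway from an earlier lookup can survive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route entry: rt_gateway is NULL when the destination
 * is directly reachable on the outgoing interface. */
struct sketch_rtentry {
    const char *rt_gateway;
};

/* Pick the next hop for the *current* lookup result.  gw starts out
 * as dst on every call, so a gateway from a previous lookup can never
 * leak into the link-layer resolution of a directly reachable host. */
static const char *
sketch_next_hop(const struct sketch_rtentry *rte, const char *dst)
{
    const char *gw = dst;           /* default: deliver directly */

    if (rte->rt_gateway != NULL)
        gw = rte->rt_gateway;       /* indirect route: use the gateway */
    return gw;
}
```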
* MFC 260796 | gnn | 2014-02-03 | 1 | -8/+18
  Fix various places where we don't properly release a lock.
  PR: 185043
  Submitted by: Michael Bentkofsky
* Merge 261024: fix PIM input regression. | glebius | 2014-01-27 | 1 | -5/+3
* Merge r257846: | glebius | 2014-01-22 | 1 | -0/+21
  Make TCP_KEEP* socket options readable. At least PostgreSQL wants to read the values.
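With the options readable, a userland program (such as a database server) can query its keepalive settings back with getsockopt(2). This sketch reads TCP_KEEPIDLE. It assumes a system whose `<netinet/tcp.h>` defines TCP_KEEPIDLE and supports reading it. `read_keepidle()` is an illustrative helper name, not an existing API.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>
#include <assert.h>

/* Return the keepalive idle time (in seconds) configured on a TCP
 * socket, or -1 if the option cannot be read. */
static int
read_keepidle(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &val, &len) == -1)
        return -1;
    return val;
}
```

A caller would typically set the value with setsockopt(2) and later read it back with `read_keepidle(fd)` to report the effective setting.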
* MFC r258622: dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE | avg | 2014-01-17 | 6 | -37/+37
* MFC r258605: Convert over the TCP probes to use mtod() | avg | 2014-01-17 | 2 | -10/+11
  MFC slacker: adrian
* MFC r260151 (by adrian): | ae | 2014-01-10 | 2 | -4/+5
  Use an RLOCK here instead of an RWLOCK - matching all the other calls to lla_lookup(). This drastically reduces the very high lock contention when doing parallel TCP throughput tests (> 1024 sockets) with IPv6.
  MFC r260187:
  lla_lookup() does modification only when LLE_CREATE is specified. Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing lla_lookup() without LLE_CREATE flag.
  MFC r260217:
  Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with LLE_CREATE flag.
* Revert MFC of r258821 - it was already handled by MFC of r239672. | peter | 2014-01-08 | 1 | -4/+2
  Pointy hat to: peter
* MFC r259943: | tuexen | 2014-01-07 | 2 | -3/+3
  Address some warnings which showed up on the userland version.
* MFC r258821 - fix tcp simultaneous close | peter | 2014-01-07 | 1 | -2/+4
  PR: kern/99188
* Merge r260188 from head: | glebius | 2014-01-05 | 1 | -0/+6
  Fix regression from r249894. Now we pass "gw" as an argument to the if_output method, thus for the multicast case we need it to point at "dst".
  PR: 185395
* MFC r259906: Draft-ietf-tcpm-initcwnd-05 became RFC6928. | pluknet | 2014-01-02 | 1 | -2/+2
* MFC r259839: | dim | 2013-12-28 | 1 | -0/+4
  In sys/netinet/in_mcast.c, inm_is_ifp_detached() is only used when KTR is defined, so put it between #ifdef KTR guards. This avoids a warning about an unused function if KTR is not enabled.
* MFC r258574: | tuexen | 2013-12-03 | 2 | -2/+3
  Only initialize some mutexes for the default VNET.
  In r208160, sctp_it_ctl was made a global variable, across all VNETs. However, sctp_init() is called for every VNET that is created. This results in the same global mutexes which are part of sctp_it_ctl being initialized more than once. This can result in crashes if many jails are created.
  To reproduce the problem:
  (1) Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS, INVARIANTS.
  (2) Run this command in a loop:
      jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo
      (see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html )
  Witness will warn about the same mutex being initialized.
  Fix the problem by only initializing these mutexes in the default VNET.
  MFC r258765:
  In http://svnweb.freebsd.org/changeset/base/258221 I introduced a bug which initialized global locks whenever the SCTP stack initialized. This was fixed in http://svnweb.freebsd.org/changeset/base/258574 by rodrigc@. He just initialized the locks for the default vnet. This fix reverts to the old behaviour before r258221, which explicitly makes sure it is only called once, because this also works on other platforms.
  Approved by: re@ (gjb)
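The "initialize global locks only once, no matter how many stack instances are created" rule from the MFC r258765 note can be sketched in userland with pthread_once(3). Using pthread_once as a stand-in for the kernel's own once-only guard is an assumption of this sketch, and all `sketch_*` names are hypothetical. Global state is set up on the first call, while per-instance setup runs every time.

```c
#include <assert.h>
#include <pthread.h>

/* Global state shared by every stack instance (hypothetical). */
static pthread_mutex_t sketch_global_mtx;
static pthread_once_t sketch_global_once = PTHREAD_ONCE_INIT;
static int sketch_global_inits;     /* counts global initializations */

static void
sketch_global_init(void)
{
    pthread_mutex_init(&sketch_global_mtx, NULL);
    sketch_global_inits++;
}

/* Called once per virtual network stack instance (e.g. per vnet jail);
 * the shared mutex is initialized only on the first call. */
static void
sketch_instance_init(void)
{
    pthread_once(&sketch_global_once, sketch_global_init);
    /* ... per-instance initialization would go here ... */
}
```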
* MFC r256556: | tuexen | 2013-11-21 | 11 | -112/+171
  Remove a buggy comparison when manually setting the path MTU. After fixing, the comparison would have become redundant. Thanks to Andrew Galante for reporting the issue.
  MFC r257272: Fix compilation if SCTP_DONT_DO_PRIVADDR_SCOPE is defined. The issue was reported by Andrew Galante.
  MFC r257274: Fix the value of *optlen when calling getsockopt() for SCTP_REMOTE_UDP_ENCAPS_PORT. This issue was reported by Andrew Galante.
  MFC r257359: Terminate a debug output with a \n.
  MFC r257555: Changes from upstream to improve compilation when INET or INET6 or none of them is defined.
  MFC r257574: Unlock the lock before destroying it. This issue was reported by Andrew Galante.
  MFC r257800: Use htons()/ntohs() appropriately. These issues were reported by Andrew Galante.
  MFC r257803: Make sure that we don't try to build an ASCONF-ACK chunk larger than what fits in the mbuf cluster. This issue was reported by Andrew Galante.
  MFC r257804: Get rid of the artificial limitation enforced by SCTP_AUTH_RANDOM_SIZE_MAX. This was suggested by Andrew Galante.
  MFC r258221: Cleanups which result in fixes which have been made upstream and were partially suggested by Andrew Galante. There is no functional change in FreeBSD.
  MFC r258224: When determining if an address belongs to an stcb, take the address family into account for wildcard bound endpoints.
  MFC r258228: Remove a stray write operation.
  MFC r258235: Use SCTP_PR_SCTP_TTL when the user provides a positive timetolive in sctp_sendmsg().
  Approved by: re@
* MFC r256920: | andre | 2013-10-29 | 1 | -4/+7
  The TCP delayed ACK logic isn't aware of LRO passing up large aggregated segments, so it thinks it received only one segment. This causes it to delay the ACK for 100ms to wait for another segment, which may never come because all the data was received already.
  Doing delayed ACK for LRO segments is bogus for two reasons: a) it pushes us further away from acking every other packet; b) it introduces additional delay in responding to the sender. The latter is especially bad because it is in the nature of LRO to aggregate all segments of a burst, with no more coming until an ACK is sent back.
  Change the delayed ACK logic to detect LRO segments by their being larger than the MSS for this connection, and issue an immediate ACK for them to keep the ACK clock ticking without interruption.
  Reported by: julian, cperciva
  Tested by: cperciva
  Reviewed by: lstewart
  Approved by: re (glebius)
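A simplified sketch of the resulting decision: a segment may only be ACKed late if delayed ACKs are enabled, no delayed ACK is already pending (which keeps "ack every other segment"), and the segment is no larger than the MSS. A larger segment can only have come from LRO aggregation, so it is ACKed immediately. `sketch_delay_ack()` and its parameters are hypothetical and omit the other conditions the real TCP input path checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if the ACK for a segment of 'tlen' payload bytes may be
 * delayed; false means "ACK now". */
static bool
sketch_delay_ack(bool delack_enabled, bool delack_timer_running,
    unsigned int tlen, unsigned int maxseg)
{
    if (!delack_enabled)
        return false;
    if (delack_timer_running)   /* second segment: ack every other one */
        return false;
    if (tlen > maxseg)          /* LRO-aggregated: more than one segment */
        return false;
    return true;
}
```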
* When processing ACK in tcp_do_segment, use sbcut_locked() instead of sbdrop_locked() to cut acked mbufs from the socket buffer. | glebius | 2013-10-09 | 1 | -2/+5
  Free this chain in a batch after the socket buffer lock is dropped. This measurably reduces contention on the socket buffer.
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Approved by: re (marius)
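The pattern is "detach under the lock, free outside it". This userland sketch uses a pthread mutex and a hypothetical singly linked buffer chain. `sketch_sbcut_locked()` is illustrative, not the kernel's sbcut_locked(), and it only removes whole buffers to stay short. The lock is held only for the pointer updates, and the detached chain is freed after the lock is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct sketch_buf {
    struct sketch_buf *next;
    unsigned int len;
};

struct sketch_sockbuf {
    pthread_mutex_t mtx;
    struct sketch_buf *head;
    unsigned int cc;            /* bytes queued */
};

/* Detach the buffers fully covered by 'acked' bytes and return them as
 * a NULL-terminated chain.  Caller holds sb->mtx. */
static struct sketch_buf *
sketch_sbcut_locked(struct sketch_sockbuf *sb, unsigned int acked)
{
    struct sketch_buf *first = sb->head, *last = NULL;

    while (sb->head != NULL && sb->head->len <= acked) {
        acked -= sb->head->len;
        sb->cc -= sb->head->len;
        last = sb->head;
        sb->head = sb->head->next;
    }
    if (last == NULL)
        return NULL;            /* nothing fully acknowledged */
    last->next = NULL;          /* terminate the detached chain */
    return first;
}

static void
sketch_free_chain(struct sketch_buf *b)
{
    while (b != NULL) {
        struct sketch_buf *next = b->next;
        free(b);
        b = next;
    }
}

/* ACK processing: short critical section, batch free afterwards. */
static void
sketch_ack(struct sketch_sockbuf *sb, unsigned int acked)
{
    struct sketch_buf *freelist;

    pthread_mutex_lock(&sb->mtx);
    freelist = sketch_sbcut_locked(sb, acked);
    pthread_mutex_unlock(&sb->mtx);
    sketch_free_chain(freelist);
}
```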
* Add a separate translator for headers passed to the TCP probes in the input path. | markj | 2013-10-02 | 1 | -4/+4
  These probes get some of the fields in host order, whereas the output probes get them in network order, so a single translator isn't enough. This workaround ensures that the problem is essentially invisible to users: none of the probe arguments or their fields have changed.
  Approved by: re (hrs)
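The reason two translators are needed can be reduced to a byte-order normalisation step: the same field arrives in host order on one path and in network order on the other, and the consumer must always see one convention. This is a hypothetical helper, `sketch_port_for_probe()`, not the DTrace translator itself. It shows that normalisation for a 16-bit port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Normalise a port value to host order for the probe consumer.
 * 'host_order' says which convention the capturing path used. */
static uint16_t
sketch_port_for_probe(uint16_t raw, bool host_order)
{
    return host_order ? raw : ntohs(raw);
}
```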
* Introduce spares in the TCP syncache and timewait structures so that fixed TCP_SIGNATURE handling can later be merged. | bz | 2013-09-21 | 2 | -1/+4
  This is derived from follow-up work to SVN r183001 posted to net@ on Sep 13 2008.
  Approved by: re (gjb)
* Unregister inet/inet6 pfil hooks on vnet destroy. | trociny | 2013-09-13 | 1 | -0/+5
  Discussed with: andre
  Approved by: re (rodrigc)
* Fix the aborting of association with the iterator using an empty user initiated error cause (using SCTP_ABORT|SCTP_SENDALL). | tuexen | 2013-09-09 | 1 | -37/+35
  Approved by: re (delphij)
  MFC after: 1 week
* Release the interface in the last. | trociny | 2013-09-08 | 1 | -1/+1
  Reviewed by: glebius
  Approved by: re (kib)
* When computing the partial delivery point, take the receiver socket buffer size correctly into account. | tuexen | 2013-09-07 | 1 | -3/+2
  MFC after: 1 week
* Use LIST_FOREACH_SAFE() instead of doing it by hand. | jhb | 2013-09-05 | 1 | -7/+5
* Use an unsigned long when indexing into mfchashtbl[] and mf6ctable[]. | jhb | 2013-09-05 | 1 | -4/+4
  This matches the types used when computing hash indices and the type of the maximum size of mfchashtbl[].
  PR: kern/181821
  Submitted by: Sven-Thorsten Dietrich <sven@vyatta.com> (IPv4)
  MFC after: 1 week
* Remove unused code and sort variable declarations. | ae | 2013-09-05 | 1 | -8/+2
  PR: kern/181822
  MFC after: 1 week
* Remove redundant field pr_sctp_on. | tuexen | 2013-09-03 | 5 | -13/+3
  MFC after: 1 week
* Use uint16_t instead of in_port_t for consistency with the SCTP code. | tuexen | 2013-09-02 | 1 | -1/+1
  MFC after: 1 week
* All changes affect only SCTP-AUTH: | tuexen | 2013-09-02 | 4 | -109/+29
  * Remove non-working code related to SHA224.
  * Remove support for non-standardised HMAC-IDs using SHA384 and SHA512.
  * Prefer SHA256 over SHA1.
  * Minor cleanup.
  MFC after: 2 weeks
* Merge r254336 from user/np/cxl_tuning. | np | 2013-08-28 | 2 | -1/+26
  Add a last-modified timestamp to each LRO entry and provide an interface to flush all inactive entries. Drivers decide when to flush and what the inactivity threshold should be.
  Network drivers that process an rx queue to completion can enter a livelock type situation when the rate at which packets are received reaches equilibrium with the rate at which the rx thread is processing them. When this happens the final LRO flush (normally when the rx routine is done) does not occur. Pure ACKs and segments with total payload < 64K can get stuck in an LRO entry. Symptoms are that TCP tx-mostly connections' performance falls off a cliff during heavy, unrelated rx on the interface.
  Flushing only inactive LRO entries works better than any of these alternatives that I tried:
  - don't LRO pure ACKs
  - flush _all_ LRO entries periodically (every 'x' microseconds or every 'y' descriptors)
  - stop rx processing in the driver periodically and schedule remaining work for later.
  Reviewed by: andre
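A sketch of "flush only inactive entries". It assumes a plain microsecond counter instead of whatever time representation the real interface uses. `struct sketch_lro_entry` and `sketch_lro_flush_inactive()` are hypothetical names. Entries that have not been appended to for longer than the driver-chosen threshold are flushed; recently touched entries keep aggregating.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LRO entry with a last-modified timestamp. */
struct sketch_lro_entry {
    int active;                 /* holds aggregated, unflushed data */
    unsigned long mtime_us;     /* time of the last append */
};

/* Flush entries idle for more than 'timeout_us'; return how many. */
static int
sketch_lro_flush_inactive(struct sketch_lro_entry *e, size_t n,
    unsigned long now_us, unsigned long timeout_us)
{
    int flushed = 0;

    for (size_t i = 0; i < n; i++) {
        if (e[i].active && now_us - e[i].mtime_us > timeout_us) {
            e[i].active = 0;    /* a driver would pass the data up here */
            flushed++;
        }
    }
    return flushed;
}
```

A driver stuck in a busy rx loop could call this every so often with its chosen threshold, so a pure ACK parked in an idle entry is released even though the end-of-loop flush never runs.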
* Remove most of the remaining sysctl name list macros. | jhb | 2013-08-26 | 6 | -61/+0
  They were only ever intended for use in sysctl(8) and it has not used them for many years.
  Reviewed by: bde
  Tested by: exp-run by bdrewery
* The second-to-last argument of udp:::receive is supposed to contain the connection state, not the IP header. | markj | 2013-08-26 | 1 | -1/+1
  X-MFC with: r254889
* Implement the ip, tcp, and udp DTrace providers. | markj | 2013-08-25 | 13 | -21/+294
  The probe definitions use dynamic translation so that their arguments match the definitions for these providers in Solaris and illumos. Thus, existing scripts for these providers should work unmodified on FreeBSD.
  Tested by: gnn, hiren
  MFC after: 1 month
* Provide human readable debug output. | tuexen | 2013-08-25 | 1 | -2/+2
* For now limit printf(9) %x of the 64bit pkthdr.csum_flags field to 32bits. | andre | 2013-08-25 | 1 | -1/+1
  The upper 32bits are not occupied for now.
  Sponsored by: The FreeBSD Foundation
* Restructure the mbuf pkthdr to make it fit for upcoming capabilities and features. | andre | 2013-08-24 | 2 | -6/+6
  The changes in particular are:
  o Remove the rarely used "header" pointer and replace it with a 64bit protocol/layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on.
  o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header.
  o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others).
  o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros.
  o Add a cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ.
  o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find the location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid.
  o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage, making it very flexible and adaptable to future capabilities.
  o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented, leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. The drivers can be corrected one by one and MFC'd without issue.
  o The size of pkthdr stays the same at 48/56 bytes (32/64bit architectures).
  Sponsored by: The FreeBSD Foundation
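To illustrate two of the ideas above, here is a simplified, hypothetical header: a 64-bit scratch union that protocols overlay with 8- to 64-bit fields, and per-layer header lengths that let a driver locate the L4 header without parsing. The layout is illustrative and not the kernel's `struct pkthdr`. It also assumes that each l*hlen field holds the length of that layer's header, so the L4 header starts at l2hlen + l3hlen.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit scratch space that protocols overlay as they see fit. */
union sketch_ph_loc {
    uint8_t  eight[8];
    uint16_t sixteen[4];
    uint32_t thirtytwo[2];
    uint64_t sixtyfour;
    void    *ptr;
};

/* Hypothetical, trimmed-down packet header. */
struct sketch_pkthdr {
    uint8_t  l2hlen, l3hlen, l4hlen, l5hlen;  /* header lengths, bytes */
    uint64_t csum_flags;                      /* widened to 64 bits */
    union sketch_ph_loc PH_loc;
};

/* Offset of the transport header from the start of the packet. */
static unsigned int
sketch_l4_offset(const struct sketch_pkthdr *ph)
{
    return (unsigned int)ph->l2hlen + ph->l3hlen;
}
```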
* Export the inpcb features as a 64-bit entity. | tuexen | 2013-08-22 | 2 | -3/+3
  Bump __FreeBSD_version to 1000048 since the modified structure is user visible and used by netstat, for example.
* Make the features of the association 64-bit as well. | tuexen | 2013-08-22 | 2 | -2/+2
  When exporting to xinpcb, just export the lower 32 bits. Using 64 bits there as well would break the ABI and will be committed separately.
  MFC after: 2 weeks
  X-MFC with: 254248
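Keeping the in-kernel value 64-bit while exporting only its low half into an existing 32-bit field looks like this. `sketch_export_features()` is a hypothetical helper. The exported structure keeps its size and layout, so existing consumers keep working. Any feature bits above bit 31 are simply not visible through that field.

```c
#include <assert.h>
#include <stdint.h>

/* Truncate 64-bit features to the 32-bit field of an existing,
 * ABI-frozen export structure. */
static uint32_t
sketch_export_features(uint64_t features)
{
    return (uint32_t)(features & UINT64_C(0xffffffff));
}
```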
* Fix an integer overflow in computing the size of a temporary buffer, which can result in a buffer which is too small for the requested operation. | delphij | 2013-08-22 | 1 | -0/+2
  Security: CVE-2013-3077
  Security: FreeBSD-SA-13:09.ip_multicast
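As a general illustration of this class of bug (not the exact patch, which is not shown here): a buffer size computed as `count * elemsize` from user-supplied input can wrap around to a small number, so the resulting allocation is too small for the copy that follows. A common defence is to reject the request before multiplying. `sketch_checked_size()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute count * elemsize into *out; return -1 instead of wrapping. */
static int
sketch_checked_size(size_t count, size_t elemsize, size_t *out)
{
    if (elemsize != 0 && count > SIZE_MAX / elemsize)
        return -1;              /* product would overflow: reject */
    *out = count * elemsize;
    return 0;
}
```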
* Reorder the mbuf defines to make more sense and group related flags together. | andre | 2013-08-19 | 1 | -1/+1
  Add M_FLAG_PRINTF for use with the printf(9) %b identifier.
  Use the generic mbuf flags print names in the net80211 code and adjust the protocol specific bits for their new positions.
  Change SCTP M_PROTO mapping from 5 to 1 to fit within the 16bit field they use internally to store some additional information.
  Discussed with: trociny, glebius
* Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings. | andre | 2013-08-19 | 5 | -3/+17
  Consistently use it within IP, IPv6 and ethernet protocols.
  Discussed with: trociny, glebius
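Clearing protocol-private flags at a layer crossing is a single mask operation. This sketch uses a hypothetical flag layout; the `SKETCH_*` values are not the real mbuf flag values. Protocol-private bits, which may mean different things to different layers, are dropped, while generic flags pass through unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a block of protocol-private bits plus
 * generic flags that must survive layer crossings. */
#define SKETCH_M_PROTOFLAGS 0x00ff0000u   /* all protocol-private bits */
#define SKETCH_M_PROTO1     0x00010000u   /* one protocol-private flag */
#define SKETCH_M_BCAST      0x00000001u   /* a generic flag */

/* Drop protocol-private bits before handing the packet to another
 * layer, so stale meanings cannot leak across the boundary. */
static void
sketch_clrprotoflags(uint32_t *m_flags)
{
    *m_flags &= ~SKETCH_M_PROTOFLAGS;
}
```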
* Move the SCTP specific definition of M_NOTIFICATION onto a protocol specific mbuf flag from sys/mbuf.h to netinet/sctp_os_bsd.h. | andre | 2013-08-19 | 1 | -0/+5
  It is only relevant within SCTP.
  Discussed with: tuexen
* Move the global M_SKIP_FIREWALL mbuf flag to a protocol layer specific flag instead. | andre | 2013-08-19 | 1 | -1/+3
  The flag is only used within the IP and IPv6 layer 3 protocols. Because some firewall packages treat IPv4 and IPv6 packets the same, the flag should have the same value for both.
  Discussed with: trociny, glebius
* Move ip_reassemble()'s use of the global M_FRAG mbuf flag to a protocol layer specific flag instead. | andre | 2013-08-19 | 2 | -3/+4
  The flag is only relevant while the packet stays in the IP reassembly queue.
  Discussed with: trociny, glebius
* Remove unused M_FRAG, M_FIRSTFRAG and M_LASTFRAG tagging from ip_fragment(). | andre | 2013-08-19 | 1 | -8/+3
  There wasn't any real driver (and hardware) support for it. Modern hardware does full fragmentation/segmentation offload instead.
* Specify SDT probe argument types in the probe definition itself rather than using SDT_PROBE_ARGTYPE(). | markj | 2013-08-15 | 1 | -159/+102
  This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types.
  There is no functional change.
  MFC after: 2 weeks
* Don't send uninitialized memory (two instances of 4 bytes) in every cookie on the wire. | tuexen | 2013-08-14 | 1 | -0/+8
  This bug was reported in https://bugzilla.mozilla.org/show_bug.cgi?id=905080
  MFC after: 3 days
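The usual guard against leaking stale memory onto the wire is to zero the whole structure before filling in the fields that are actually used. This sketch uses a hypothetical cookie layout (`struct sketch_cookie`, `sketch_build_cookie()`) with two 4-byte fields that some code paths never write. It does not reproduce the real SCTP state cookie.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cookie with two 4-byte fields not always written. */
struct sketch_cookie {
    uint32_t tag;
    uint32_t reserved[2];
};

/* Zero everything first so no stale stack/heap bytes are sent. */
static void
sketch_build_cookie(struct sketch_cookie *c, uint32_t tag)
{
    memset(c, 0, sizeof(*c));
    c->tag = tag;
}
```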
* Virtualize carp(4) variables to have per vnet control. | trociny | 2013-08-13 | 1 | -53/+61
  Reviewed by: ae, glebius
* Make the features a 64-bit value instead of 32-bit. | tuexen | 2013-08-12 | 4 | -35/+34
  This will allow an easier integration of the support for NDATA. While there, also do some minor cleanups.
  Obtained from: rrs@
  MFC after: 2 weeks