summaryrefslogtreecommitdiffstats
path: root/sys/netinet
Commit message (Collapse)AuthorAgeFilesLines
* MFC r264879smh2014-05-263-8/+10
| | | | | | | Fix jailed raw sockets not setting the correct source address by calling in_pcbladdr instead of prison_get_ip4. Sponsored by: Multiplay
* MFC r265942:yongari2014-05-161-2/+3
| | | | Fix checksum computation. Previously it didn't include carry.
* MFC r264212,r264213,r264248,r265776,r265811,r265909:kevlo2014-05-137-70/+314
| | | | | | | | | | | | | | - Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks. Tested with vlc and a test suite [1]. [1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz Reviewed by: jhb, glebius, adrian - Fix a logic bug which prevented the sending of UDP packet with 0 checksum. - Disable TX checksum offload for UDP-Lite completely. It wasn't used for partial checksum coverage, but even for full checksum coverage it doesn't work.
* Merge 260488, r260508.melifaro2014-05-081-43/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | r260488: Split rt_newaddrmsg_fib() into two different functions. Adding/deleting interface addresses involves access to 3 different subsystems, int different parts of code. Each call can fail, so reporting successful operation by rtsock in the middle of the process error-prone. Further split routing notification API and actual rtsock calls via creating public-available rt_addrmsg() / rt_routemsg() functions with "private" rtsock_* backend. r260508: Simplify inet alias handling code: if we're adding/removing alias which has the same prefix as some other alias on the same interface, use newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg(). This eliminates the following rtsock messages: Pinned RTM_ADD for prefix (for alias addition). Pinned RTM_DELETE for prefix (for alias withdrawal). Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24): before commit, addition: got message of size 116 on Fri Jan 10 14:13:15 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 got message of size 192 on Fri Jan 10 14:13:15 2014 RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff after commit, addition: got message of size 116 on Fri Jan 10 13:56:26 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255 before commit, wihdrawal: got message of size 192 on Fri Jan 10 13:58:59 2014 RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff got message of size 116 on Fri Jan 10 13:58:59 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 adter commit, withdrawal: got message of size 116 on Fri Jan 10 14:14:11 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong (and requires some hacks to keep prefix in route table on RTM_DELETE). I've tested this change with quagga (no change) and bird (*). bird alias handling is already broken in *BSD sysdep code, so nothing changes here, too. I'm going to MFC this change if there will be no complains about behavior change. While here, fix some style(9) bugs introduced by r260488 (pointed by glebius and bde).
* MFC: r264739rmacklem2014-05-061-2/+4
| | | | | | Add {} braces so that the code conforms to the indentation. Fortunately, I don't think doing the assignment of cap->tsomax unconditionally causes any problem.
* Fix devfs rules not applied by default for jails.delphij2014-04-301-3/+4
| | | | | | | | | | | | | Fix OpenSSL use-after-free vulnerability. Fix TCP reassembly vulnerability. Security: FreeBSD-SA-14:07.devfs Security: CVE-2014-3001 Security: FreeBSD-SA-14:08.tcp Security: CVE-2014-3000 Security: FreeBSD-SA-14:09.openssl Security: CVE-2010-5298
* MFC r263966:ae2014-04-071-0/+1
| | | | | | | Don't copy the MF flag from original IP header to ICMP error message. PR: 188092 Sponsored by: Yandex LLC
* Merge r262341:glebius2014-04-041-38/+33
| | | | | - Improve logging of send errors, reporting error code and interface. - Reduce code duplication between INET and INET6.
* Merge r262763, r262767, r262771, r262806 from head:glebius2014-03-218-33/+26
| | | | | | | | | | - Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.
* Merge r262747: remove extraneous ifa_ref()/ifa_free().glebius2014-03-191-9/+1
|
* Merge r263091: fix mbuf flags clash that lead to failure of operationglebius2014-03-182-9/+3
| | | | | | | of IPSEC and packet filters. PR: kern/185876 PR: kern/186755
* Merge r261582, r261601, r261610, r261613, r261627, r261640, r261641, r261823,glebius2014-03-042-42/+3
| | | | | | | | | | r261825, r261859, r261875, r261883, r261911, r262027, r262028, r262029, r262030, r262162 from head. Large flowtable revamp. See commit messages for merged revisions for details. Sponsored by: Netflix
* Merge r261590: Fixup for r261590 (vnet sysctl handlers cleanup)glebius2014-03-042-11/+2
|
* Merge r261590, r261592 from head:glebius2014-03-041-4/+0
| | | | | | | | | Remove identical vnet sysctl handlers, and handle CTLFLAG_VNET in the sysctl_root(). Note: SYSCTL_VNET_* macros can be removed as well. All is needed to virtualize a sysctl oid is set CTLFLAG_VNET on it. But for now keep macros in place to avoid large code churn.
* MFC r260871:adrian2014-02-101-0/+10
| | | | | | | | | | | | | If the flowid is available for the mbuf that finalised the creation of a syncache connection, copy it into the inp_flowid field. Without this, an incoming TCP connection won't have an inp_flowid marked until some data comes in, and this means that things like the per-CPU TCP timer option will choose a different CPU for the timer work. (It also means that if one grabbed the flowid via an ioctl from userland, it won't be available until some data has been received.) Sponsored by: Netflix, Inc.
* MFC r260702 (by melifaro):ae2014-02-061-0/+8
| | | | | | | | | | | | | Fix ipfw fwd for IPv4 traffic broken by r249894. Problem case: Original lookup returns route with GW set, so gw points to rte->rt_gateway. After that we're changing dst and performing lookup another time. Since fwd host is most probably directly reachable, resulting rte does not contain rt_gateway, so gw is not set. Finally, we end with packet transmitted to proper interface but wrong link-layer address.
* MFC 260796gnn2014-02-031-8/+18
| | | | | | | Fix various places where we don't properly release a lock PR: 185043 Submitted by: Michael Bentkofsky
* Merge 261024: fix PIM input regression.glebius2014-01-271-5/+3
|
* Merge r257846:glebius2014-01-221-0/+21
| | | | | Make TCP_KEEP* socket options readable. At least PostgreSQL wants to read the values.
* MFC r258622: dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINEavg2014-01-176-37/+37
|
* MFC r258605: Convert over the TCP probes to use mtod()avg2014-01-172-10/+11
| | | | MFC slacker: adrian
* MFC r260151 (by adrian):ae2014-01-102-4/+5
| | | | | | | | | | | | | | | | | Use an RLOCK here instead of an RWLOCK - matching all the other calls to lla_lookup(). This drastically reduces the very high lock contention when doing parallel TCP throughput tests (> 1024 sockets) with IPv6. MFC r260187: lla_lookup() does modification only when LLE_CREATE is specified. Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing lla_lookup() without LLE_CREATE flag. MFC r260217: Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with LLE_CREATE flag.
* Revert MFC of r258821 - it was already handled by MFC of r239672.peter2014-01-081-4/+2
| | | | Pointy hat to: peter
* MFC r259943:tuexen2014-01-072-3/+3
| | | | Address some warnings which showed up on the userland version.
* MFC r258821 - fix tcp simultaneous closepeter2014-01-071-2/+4
| | | | PR: kern/99188
* Merge r260188 from head:glebius2014-01-051-0/+6
| | | | | | | Fix regression from r249894. Now we pass "gw" as argument to if_output method, thus for multicast case we need it to point at "dst". PR: 185395
* MFC r259906: Draft-ietf-tcpm-initcwnd-05 became RFC6928.pluknet2014-01-021-2/+2
|
* MFC r259839:dim2013-12-281-0/+4
| | | | | | In sys/netinet/in_mcast.c, inm_is_ifp_detached() is only used whenever KTR is defined, so put it between #ifdef KTR guards. This avoids a warning about a unused function if KTR is not enabled.
* MFC r258574:tuexen2013-12-032-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only initialize some mutexes for the default VNET. In r208160, sctp_it_ctl was made a global variable, across all VNETs. However, sctp_init() is called for every VNET that is created. This results in the same global mutexes which are part of sctp_it_ctl being initialized. This can result in crashes if many jails are created. To reproduce the problem: (1) Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS, INVARIANTS. (2) Run this command in a loop: jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo (see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html ) Witness will warn about the same mutex being initialized. Fix the problem by only initializing these mutexes in the default VNET. MFC r258765: In http://svnweb.freebsd.org/changeset/base/258221 I introduced a bug which initialized global locks whenever the SCTP stack initialized. This was fixed in http://svnweb.freebsd.org/changeset/base/258574 by rodrigc@. He just initialized the locks for the default vnet. This fix reverts to the old behaviour before r258221, which explicitly makes sure it is only called once, because this works also on other platforms. Approved by: re@ (gjb)
* MFC r256556:tuexen2013-11-2111-112/+171
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove a buggy comparision when setting manually the path MTU. After fixing, the comparision would have become redundant. Thanks to Andrew Galante for reporting the issue. MFC r257272: Fix compilation if SCTP_DONT_DO_PRIVADDR_SCOPE is defined. The issue was reported by Andrew Galante. MFC r257274: Fix the value of *optlen when calling getsockopt() for SCTP_REMOTE_UDP_ENCAPS_PORT. This issue was reported by Andrew Galante. MFC r257359: Terminate a debug output with a \n. MFC r257555: Changes from upstream to improve compilation when INET or INET6 or none of them is defined. MFC r257574: Unlock the lock before destroying it. This issue was reported by Andrew Galante. MFC r257800: Use htons()/ntohs() appropriately. These issues were reported by Andrew Galante. MFC r257803: Make sure that we don't try to build an ASCONF-ACK chunk larger than what fits in the the mbuf cluster. This issue was reported by Andrew Galante. MFC r257804: Get rid of the artification limitation enforced by SCTP_AUTH_RANDOM_SIZE_MAX. This was suggested by Andrew Galante. MFC r258221: Cleanups which result in fixes which have been made upstream and where partially suggested by Andrew Galante. There is no functional change in FreeBSD. MFC r258224: When determining if an address belongs to an stcb, take the address family into account for wildcard bound endpoints. MFC r258228: Remove a stray write operation. MFC r258235: Use SCTP_PR_SCTP_TTL when the user provides a positive timetolive in sctp_sendmsg(). Approved by: re@
* MFC r256920:andre2013-10-291-4/+7
| | | | | | | | | | | | | | | | | | | | | | | The TCP delayed ACK logic isn't aware of LRO passing up large aggregated segments thinking it received only one segment. This causes it to enable the delay the ACK for 100ms to wait for another segment which may never come because all the data was received already. Doing delayed ACK for LRO segments is bogus for two reasons: a) it pushes us further away from acking every other packet; b) it introduces additional delay in responding to the sender. The latter is especially bad because it is in the nature of LRO to aggregated all segments of a burst with no more coming until an ACK is sent back. Change the delayed ACK logic to detect LRO segments by being larger than the MSS for this connection and issuing an immediate ACK for them to keep the ACK clock ticking without interruption. Reported by: julian, cperciva Tested by: cperciva Reviewed by: lstewart Approved by: re (glebius)
* When processing ACK in tcp_do_segment, use sbcut_locked() instead ofglebius2013-10-091-2/+5
| | | | | | | | | | | sbdrop_locked() to cut acked mbufs from the socket buffer. Free this chain a batch manner after the socket buffer lock is dropped. This measurably reduces contention on socket buffer. Sponsored by: Netflix Sponsored by: Nginx, Inc. Approved by: re (marius)
* Add a separate translator for headers passed to the TCP probes in themarkj2013-10-021-4/+4
| | | | | | | | | input path. These probes get some of the fields in host order, whereas the output probes get them in network order, so a single translator isn't enough. This workaround ensures that the problem is essentially invisble to users: none of the probe arguments or their fields have changed. Approved by: re (hrs)
* Introduce spares in the TCP syncache and timewait structuresbz2013-09-212-1/+4
| | | | | | | | | so that fixed TCP_SIGNATURE handling can later be merged. This is derived from follow-up work to SVN r183001 posted to net@ on Sep 13 2008. Approved by: re (gjb)
* Unregister inet/inet6 pfil hooks on vnet destroy.trociny2013-09-131-0/+5
| | | | | Discussed with: andre Approved by: re (rodrigc)
* Fix the aborting of association with the iterator using an emptytuexen2013-09-091-37/+35
| | | | | | | user initiated error cause (using SCTP_ABORT|SCTP_SENDALL). Approved by: re (delphij) MFC after: 1 week
* Relese the interface in the last.trociny2013-09-081-1/+1
| | | | | Reviewed by: glebius Approved by: re (kib)
* When computing the partial delivery point, take thetuexen2013-09-071-3/+2
| | | | | | receiver socket buffer size correctly into account. MFC after: 1 week
* Use LIST_FOREACH_SAFE() instead of doing it by hand.jhb2013-09-051-7/+5
|
* Use an unsigned long when indexing into mfchashtbl[] and mf6ctable[]. Thisjhb2013-09-051-4/+4
| | | | | | | | | matches the types used when computing hash indices and the type of the maximum size of mfchashtbl[]. PR: kern/181821 Submitted by: Sven-Thorsten Dietrich <sven@vyatta.com> (IPv4) MFC after: 1 week
* Remove unused code and sort variables declarations.ae2013-09-051-8/+2
| | | | | PR: kern/181822 MFC after: 1 week
* Remove redundant field pr_sctp_on.tuexen2013-09-035-13/+3
| | | | MFC after: 1 week
* Use uint16_t instead of in_port_t for consistency with the SCTP code.tuexen2013-09-021-1/+1
| | | | MFC after: 1 week
* All changes affect only SCTP-AUTH:tuexen2013-09-024-109/+29
| | | | | | | | | * Remove non working code related to SHA224. * Remove support for non-standardised HMAC-IDs using SHA384 and SHA512. * Prefer SHA256 over SHA1. * Minor cleanup. MFC after: 2 weeks
* Merge r254336 from user/np/cxl_tuning.np2013-08-282-1/+26
| | | | | | | | | | | | | | | | | | | | | | | | | Add a last-modified timestamp to each LRO entry and provide an interface to flush all inactive entries. Drivers decide when to flush and what the inactivity threshold should be. Network drivers that process an rx queue to completion can enter a livelock type situation when the rate at which packets are received reaches equilibrium with the rate at which the rx thread is processing them. When this happens the final LRO flush (normally when the rx routine is done) does not occur. Pure ACKs and segments with total payload < 64K can get stuck in an LRO entry. Symptoms are that TCP tx-mostly connections' performance falls off a cliff during heavy, unrelated rx on the interface. Flushing only inactive LRO entries works better than any of these alternates that I tried: - don't LRO pure ACKs - flush _all_ LRO entries periodically (every 'x' microseconds or every 'y' descriptors) - stop rx processing in the driver periodically and schedule remaining work for later. Reviewed by: andre
* Remove most of the remaining sysctl name list macros. They were onlyjhb2013-08-266-61/+0
| | | | | | | | ever intended for use in sysctl(8) and it has not used them for many years. Reviewed by: bde Tested by: exp-run by bdrewery
* The second last argument of udp:::receive is supposed to contain themarkj2013-08-261-1/+1
| | | | | | connection state, not the IP header. X-MFC with: r254889
* Implement the ip, tcp, and udp DTrace providers. The probe definitions usemarkj2013-08-2513-21/+294
| | | | | | | | | dynamic translation so that their arguments match the definitions for these providers in Solaris and illumos. Thus, existing scripts for these providers should work unmodified on FreeBSD. Tested by: gnn, hiren MFC after: 1 month
* Provide human readable debug output.tuexen2013-08-251-2/+2
|
* For now limit printf(9) %x of the 64bit pkthdr.csum_flags field to 32bits.andre2013-08-251-1/+1
| | | | | | The upper 32bits are not occupied for now. Sponsored by: The FreeBSD Foundation
OpenPOWER on IntegriCloud