path: root/sys/netinet
Commit message | Author | Age | Files | Lines
* Do not check if found IPv4 rte is dynamic if net.inet.icmp.drop_redirect is enabled. | melifaro | 2012-10-10 | 4 | -6/+48
  This eliminates one mtx_lock() per routing lookup, thus improving performance in several cases (routing to a directly connected interface or routing to the default gateway).
  ICMP redirects should not be used to provide routing direction nowadays, even for end hosts. Routers should not use them either (and this is explicitly restricted in IPv6, see RFC 4861, clause 8.2).
  The current commit changes the rnh_matchaddr function to 'stock' rn_match (and back) for every AF_INET routing table in the given VNET instance on drop_redirect sysctl change.
  This change is part of a bigger patch eliminating rte locking.
  Sponsored by: Yandex LLC
  MFC after: 2 weeks
* Revert previous commit... | kevlo | 2012-10-10 | 5 | -5/+5
  Pointyhat to: kevlo (myself)
* Prefer NULL over 0 for pointers | kevlo | 2012-10-09 | 5 | -5/+5
* After r241245 it appeared that in_delayed_cksum(), which still expected host byte order, was sometimes called with net byte order. | glebius | 2012-10-08 | 4 | -6/+9
  Since we are moving towards net byte order throughout the stack, the function was converted to expect net byte order, and its consumers were fixed appropriately:
  - ip_output(), ipfilter(4) not changed, since they already call in_delayed_cksum() with header in net byte order.
  - divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order there and back.
  - mrouting code and IPv6 ipsec now need to switch byte order there and back, but I hope this is a temporary solution.
  - In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum().
  - pf_route() catches up on r241245 changes to ip_output().
* No reason to play with IP header before calling sctp_delayed_cksum() with offset beyond the IP header. | glebius | 2012-10-08 | 1 | -2/+0
* A step in resolving mess with byte ordering for AF_INET. After this change: | glebius | 2012-10-06 | 3 | -53/+62
  - All packets in NETISR_IP queue are in net byte order.
  - ip_input() is entered in net byte order and converts packet to host byte order right _after_ processing pfil(9) hooks.
  - ip_output() is entered in host byte order and converts packet to net byte order right _before_ processing pfil(9) hooks.
  - ip_fragment() accepts and emits packet in net byte order.
  - ip_forward(), ip_mloopback() use host byte order (untouched actually).
  - ip_fastforward() no longer modifies packet at all (except ip_ttl).
  - Swapping of byte order there and back removed from the following modules: pf(4), ipfw(4), enc(4), if_bridge(4).
  - Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
  - __FreeBSD_version bumped.
  - pfil(9) manual page updated.
  Reviewed by: ray, luigi, eri, melifaro
  Tested by: glebius (LE), ray (BE)
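As a userland illustration of the host versus net byte order distinction in this entry, the sketch below packs a 16-bit IP total length in network (big-endian) order and shows what a little-endian reader would see if it skipped the swap. This is Python, not the kernel code; the helper names `ip_len_on_wire` and `ip_len_to_host` and the value 1500 are assumptions made for the example.

```python
import struct


def ip_len_on_wire(total_length: int) -> bytes:
    """Encode a 16-bit length in network (big-endian) byte order."""
    return struct.pack("!H", total_length)


def ip_len_to_host(wire: bytes) -> int:
    """Decode two network-order bytes into a plain integer."""
    (value,) = struct.unpack("!H", wire)
    return value


wire = ip_len_on_wire(1500)             # 1500 == 0x05DC
correct = ip_len_to_host(wire)          # swap applied: 1500

# Hypothetical mistake: interpret the same bytes as little-endian
# without converting, as a little-endian host would do if it forgot
# the conversion step.
(misread,) = struct.unpack("<H", wire)  # 0xDC05

print(wire.hex(), correct, misread)
```

The point of the commit's list is that each function documents which order it expects, so exactly one conversion happens on each path.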
* There is a complex race in in_pcblookup_hash() and in_pcblookup_group(). | glebius | 2012-10-02 | 2 | -1/+12
  Both functions need to obtain a lock on the found PCB, and they can't do the classic inter-lock with the PCB hash lock, due to lock order reversal. To keep the PCB stable, these functions put a reference on it and drop it after the PCB lock is acquired. If the reference was the last one, this means we've raced with in_pcbfree() and the PCB is no longer valid.
  This approach works okay only if we are acquiring a writer-lock on the PCB. In case of a reader-lock, the following scenario can happen:
  - 2 threads locate pcb, and do in_pcbref() on it.
  - These 2 threads drop the inp hash lock.
  - Another thread comes to delete pcb via in_pcbfree(), it obtains hash lock, does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which doesn't free the pcb due to two references on it. Then it unlocks the pcb.
  - 2 aforementioned threads acquire reader lock on the pcb and run in_pcbrele_rlocked(). One gets 1 from in_pcbrele_rlocked() and continues, second gets 0 and considers pcb freed, returns.
  - The thread that got 1 continues working with detached pcb, which later leads to panic in the underlying protocol level.
  To plug that hole, an additional INPCB flag is introduced: INP_FREED. We check for that flag in in_pcbrele_rlocked() and if it is set, we pretend that that was the last reference.
  Discussed with: rwatson, jhb
  Reported by: Vladimir Medvedkin <medved rambler-co.ru>
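The scenario above can be replayed with a small single-threaded model. This is only a sketch of the reference-counting logic in Python, not the kernel implementation: the `Pcb` class, the function names, and the boolean return convention (True means "treat the pcb as gone") are assumptions made for the example, and real locking is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Pcb:
    refcount: int = 1    # the reference held while the pcb is on the lists
    freed: bool = False  # stands in for the INP_FREED flag


def pcb_ref(pcb: Pcb) -> None:
    """Models a lookup taking a temporary reference."""
    pcb.refcount += 1


def pcb_free(pcb: Pcb) -> None:
    """Models the deleting thread: mark the pcb and drop its reference."""
    pcb.freed = True
    pcb.refcount -= 1


def pcb_rele_rlocked(pcb: Pcb, check_freed: bool) -> bool:
    """Drop a temporary reference; True means the caller must give up."""
    pcb.refcount -= 1
    if pcb.refcount == 0:
        return True
    # With the fix, a pcb marked as freed is reported as gone even
    # though this was not the last reference.
    return check_freed and pcb.freed


def run_scenario(check_freed: bool) -> list:
    pcb = Pcb()
    pcb_ref(pcb)   # reader thread 1 finds the pcb
    pcb_ref(pcb)   # reader thread 2 finds the pcb
    pcb_free(pcb)  # deleting thread runs in between
    return [pcb_rele_rlocked(pcb, check_freed) for _ in range(2)]


print(run_scenario(check_freed=False))  # one reader keeps using the pcb
print(run_scenario(check_freed=True))   # both readers back off
```

Without the flag check, the first reader sees a non-zero count and proceeds with a detached pcb; with it, both readers treat the pcb as gone.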
* carp_send_ad() should never return without rescheduling next run. | glebius | 2012-09-29 | 1 | -11/+6
* Fix bug in TCP_KEEPCNT setting, which slipped in during the last round of reviewing of r231025. | glebius | 2012-09-27 | 1 | -8/+14
  Unlike other options from this family, TCP_KEEPCNT doesn't specify a time interval but a count, thus the parameter supplied doesn't need to be multiplied by hz.
  Reported & tested by: amdmi3
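A minimal sketch of the distinction this fix draws, written as a Python model and not taken from the kernel source: time-valued keepalive options are scaled by the tick rate, while the count option is stored unchanged. The function name `store_keepalive_option`, the grouping of TCP_KEEPIDLE and TCP_KEEPINTVL as the time-valued options, and the tick rate of 1000 are assumptions for illustration.

```python
HZ = 1000  # assumed tick rate (ticks per second)

# Assumed grouping: options given in seconds versus options that are counts.
TIME_OPTIONS = {"TCP_KEEPIDLE", "TCP_KEEPINTVL"}
COUNT_OPTIONS = {"TCP_KEEPCNT"}


def store_keepalive_option(name: str, value: int, hz: int = HZ) -> int:
    """Return the value as it would be stored internally in this model."""
    if name in TIME_OPTIONS:
        return value * hz  # seconds converted to ticks
    if name in COUNT_OPTIONS:
        return value       # a probe count, never scaled by hz
    raise ValueError(f"unknown option: {name}")


print(store_keepalive_option("TCP_KEEPIDLE", 60))
print(store_keepalive_option("TCP_KEEPCNT", 8))
```

Scaling the count by hz, as the bug did, would turn a request for 8 probes into 8000 at this tick rate.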
* Whitespace change. | tuexen | 2012-09-23 | 1 | -2/+1
  MFC after: 3 days
* Declare a static function as such. | tuexen | 2012-09-23 | 1 | -1/+1
  MFC after: 3 days
* Fix a bug related to handling Re-config chunks. | tuexen | 2012-09-22 | 1 | -17/+0
  It is not true that the association can be removed if the socket is gone.
  MFC after: 3 days
* Small cleanups. No functional change. | tuexen | 2012-09-22 | 2 | -53/+18
  MFC after: 10 days
* Fix typo: s/pakcet/packet | kevlo | 2012-09-20 | 1 | -1/+1
* s/teh/the/g | eadler | 2012-09-14 | 1 | -1/+1
  Approved by: cperciva
  MFC after: 3 days
* Small cleanups. No functional change. | tuexen | 2012-09-14 | 1 | -8/+3
  MFC after: 10 days
* o Create directory sys/netpfil, where all packet filters should reside, and move there ipfw(4) and pf(4). | glebius | 2012-09-14 | 27 | -17622/+0
  o Move most modified parts of pf out of contrib.
  Actual movements:
    sys/contrib/pf/net/*.c -> sys/netpfil/pf/
    sys/contrib/pf/net/*.h -> sys/net/
    contrib/pf/pfctl/*.c -> sbin/pfctl
    contrib/pf/pfctl/*.h -> sbin/pfctl
    contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
    contrib/pf/pfctl/*.4 -> share/man/man4
    contrib/pf/pfctl/*.5 -> share/man/man5
    sys/netinet/ipfw -> sys/netpfil/ipfw
  The arguable movement is pf/net/*.h -> sys/net. There are future plans to refactor pf includes, so I decided not to break things twice.
  Not modified bits of pf left in contrib: authpf, ftp-proxy, tftp-proxy, pflogd.
  The ipfw(4) movement is planned to be merged to stable/9, to make head and stable match.
  Discussed with: bz, luigi
* Whitespace changes. | tuexen | 2012-09-09 | 1 | -6/+4
  MFC after: 10 days
* Whitespace cleanup. | tuexen | 2012-09-08 | 1 | -1/+0
  MFC after: 10 days
* Merge the projects/pf/head branch, that was worked on for last six months, into head. | glebius | 2012-09-08 | 5 | -10/+12
  The most significant achievements in the new code:
  o Fine grained locking, thus much better performance.
  o Fixes to many problems in pf, that were specific to FreeBSD port.
  New code doesn't have that many ifdefs and much less OpenBSDisms, thus is more attractive to our developers.
  Those interested in details, can browse through SVN log of the projects/pf/head branch. And for reference, here is exact list of revisions merged:
  r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330, r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656, r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782, r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868, r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223, r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456, r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505, r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168, r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230, r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398, r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548, r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672, r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169, r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442, r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522, r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661, r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.
  I'd like to thank people who participated in early testing:
  Tested by: Florian Smeets <flo freebsd.org>
  Tested by: Chekaluk Vitaly <artemrts ukr.net>
  Tested by: Ben Wilber <ben desync.com>
  Tested by: Ian FREISLICH <ianf cloudseed.co.za>
* Don't include a structure containing a flexible array in another structure. | tuexen | 2012-09-07 | 5 | -24/+13
  MFC after: 10 days
* Get rid of a gcc'ism. | tuexen | 2012-09-06 | 1 | -4/+3
  MFC after: 10 days
* Using %p in a format string requires a void *. | tuexen | 2012-09-05 | 9 | -47/+47
  MFC after: 10 days
* Consistently use the size of a variable. This helps to keep the code simpler for the userland implementation. | tuexen | 2012-09-04 | 1 | -21/+21
  MFC after: 3 days
* Whitespace change. | tuexen | 2012-09-04 | 1 | -1/+1
  MFC after: 3 days
* Introduce new link-layer PFIL hook V_link_pfil_hook. | melifaro | 2012-09-04 | 4 | -30/+149
  Merge ether_ipfw_chk() and part of bridge_pfil() into unified ipfw_check_frame() function called by PFIL. This change was suggested by rwatson? @ DevSummit.
  Remove ipfw headers from ether/bridge code since they are unneeded now.
  Note this change introduces some (temporary) performance penalty since the PFIL read lock has to be acquired for every link-level packet.
  MFC after: 3 weeks
* Provide a sysctl switch that allows installing ARP entries with the multicast bit set. | glebius | 2012-09-03 | 1 | -3/+5
  FreeBSD has refused to install such entries since 9.0, and this broke installations running Microsoft NLB, which are violating standards.
  Tested by: Tarasov Oleg <oleg_tarasov sg-tea.com>
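For readers unfamiliar with the "multicast bit", the following Python sketch shows the check in isolation: in an Ethernet address, the least significant bit of the first octet marks a group (multicast) address. The function names, the `allow_multicast` parameter (standing in for the sysctl switch, whose real name is not given in this log), and the sample addresses are all assumptions for illustration.

```python
def is_multicast_mac(mac: str) -> bool:
    """True if the group bit (lowest bit of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


def accept_arp_entry(mac: str, allow_multicast: bool) -> bool:
    """Model of the policy: refuse group addresses unless the switch is on."""
    return allow_multicast or not is_multicast_mac(mac)


unicast = "00:1b:21:aa:bb:cc"  # illustrative unicast address
group = "03:bf:0a:00:00:01"    # illustrative address with the group bit set

print(is_multicast_mac(unicast), is_multicast_mac(group))
print(accept_arp_entry(group, allow_multicast=False))
print(accept_arp_entry(group, allow_multicast=True))
```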
* Fix a typo which results in the RTT being off by a factor of 10 if the RTT is larger than 1 second. | tuexen | 2012-09-02 | 1 | -1/+1
  MFC after: 3 days
* Mark the ipfw interface type as not being ether. | eadler | 2012-09-01 | 1 | -2/+2
  This fixes an issue where uuidgen tried to obtain an ipfw device's MAC address, which was always zero.
  PR: 170460
  Submitted by: wxs
  Reviewed by: bdrewery
  Reviewed by: delphij
  Approved by: cperciva
  MFC after: 1 week
* This small change takes care of a race condition that can occur when both sides close at the same time. | rrs | 2012-08-25 | 1 | -0/+30
  If that occurs, without this fix the connection enters FIN1 on both sides and they will forever send FIN|ACK at each other until the connection times out. This is because we stopped processing the FIN|ACK and thus did not advance the sequence and so never ACK'd each other's FIN. This fix adjusts it so we *do* process the FIN properly and the race goes away ;-)
  MFC after: 1 month
* Correctly handle the case where an inp has already been dropped by the time the TOE driver reports that an active open failed. | np | 2012-08-21 | 2 | -5/+7
  toe_connect_failed is supposed to handle this but it should be provided the inpcb instead of the tcpcb which may no longer be around.
* Though I disagree, I concede to jhb & Rui. | rrs | 2012-08-19 | 1 | -1/+1
  Note that we still have a problem with this whole structure of locks and in_input.c [it does not lock which it should not, but this *can* lead to crashes]. (I have seen it in our SQA testbed.. besides the one with a refcnt issue that I will have SQA work on next week ;-)
* Ok jhb, let's move the ifa_free() down to the bottom to assure that *all* tables and such are removed before we start to free. | rrs | 2012-08-17 | 1 | -1/+1
  This won't protect the Hash in ip_input.c but in theory should protect any other uses that *do* use locks.
  MFC after: 1 week (or more)
* The TCP PAWS fix for kernels with fast tick rates (r231767) changed the TCP timestamp related stack variables to reference ms directly instead of ticks. | lstewart | 2012-08-17 | 1 | -4/+6
  The h_ertt(4) Khelp module relies on TCP timestamp information in order to calculate its enhanced RTT estimates, but was not updated as part of r231767.
  Consequently, h_ertt has not been calculating correct RTT estimates since r231767 was committed, which in turn broke all delay-based congestion control algorithms because they rely on the h_ertt RTT estimates.
  Fix the breakage by switching h_ertt to use tcp_ts_getticks() in place of all previous uses of the ticks variable. This ensures all timestamp related variables in h_ertt use the same units as the TCP stack and therefore results in meaningful comparisons and RTT estimate calculations.
  Reported & tested by: Naeem Khademi (naeemk at ifi uio no)
  Discussed with: bz
  MFC after: 3 days
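The class of bug described here is a unit mismatch: one side of a subtraction is in milliseconds, the other in ticks. The Python sketch below only illustrates that mismatch and is not a model of h_ertt itself; the function names, the tick rate of 100, and the sample timestamps are assumptions for the example.

```python
def now_in_ticks(uptime_ms: int, hz: int) -> int:
    """Clock reading in ticks for a given uptime and tick rate."""
    return uptime_ms * hz // 1000


def now_in_ms(uptime_ms: int) -> int:
    """Clock reading in milliseconds, the unit the timestamps use."""
    return uptime_ms


def rtt_mixed_units(sent_ms: int, received_ms: int, hz: int) -> int:
    """Buggy pattern: 'now' is in ticks, the echoed timestamp is in ms."""
    return now_in_ticks(received_ms, hz) - now_in_ms(sent_ms)


def rtt_same_units(sent_ms: int, received_ms: int) -> int:
    """Fixed pattern: both readings come from the same clock unit."""
    return now_in_ms(received_ms) - now_in_ms(sent_ms)


sent, received = 10_000, 10_050  # a 50 ms round trip
print(rtt_mixed_units(sent, received, hz=100))
print(rtt_same_units(sent, received))
```

In this model the two patterns agree only when hz is 1000, which is why such a mismatch can go unnoticed on some configurations.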
* It's never a good idea to double free the same address. | rrs | 2012-08-16 | 1 | -1/+1
  MFC after: 1 week (after the other commits ahead of this get MFC'd)
* s/lenght/length/ in comments | luigi | 2012-08-07 | 2 | -3/+3
* move functions outside the SYSBEGIN/SYSEND block | luigi | 2012-08-06 | 1 | -15/+22
  (SYSBEGIN/SYSEND are specific to ipfw/dummynet and are used to emulate sysctl on platforms that do not have them, and they work by creating an array which contains all the sysctl-ed symbols.)
* use FREE_PKT instead of m_freem to free an mbuf. | luigi | 2012-08-06 | 1 | -1/+1
  The former is the standard form used in ipfw/dummynet, so that it is easier to remap it to different memory managers depending on the platform.
* Fix a bug found by dim@: | tuexen | 2012-08-06 | 1 | -1/+1
  Don't use an uninitialized variable if INVARIANTS is on and an illegal packet with destination 0 is received.
  MFC after: 3 days
  X-MFC with: 238003
* In tcp timers, check INP_DROPPED flag a little later, after callout_deactivate(), so if INP_DROPPED is set we return with the timer active flag cleared. | trociny | 2012-08-05 | 1 | -9/+38
  For me this fixes negative keep timer values reported by `netstat -x' for connections in CLOSE state.
  Approved by: net (silence)
  MFC after: 2 weeks
* Fix a refcount issue. The called function only decrements if stcb is NULL. | tuexen | 2012-08-05 | 1 | -4/+3
  MFC after: 3 days
  Discussed with: rrs
* Fix a bug reported by Simon L. B. Nielsen: | tuexen | 2012-08-04 | 1 | -2/+0
  If an SCTP endpoint receives an ASCONF with a wildcard lookup address and incorrect verification tag, the system crashes.
  MFC after: 3 days.
* Testing an interface property should depend on the interface, not on an address. | tuexen | 2012-08-04 | 1 | -15/+13
  MFC after: 3 days
* Fix races between in_lltable_prefix_free(), lla_lookup(), llentry_free() and arptimer(): | glebius | 2012-08-02 | 2 | -42/+35
  o Use callout_init_rw() for lle timeout, this allows us to safely disestablish them.
    - This allows us to simplify the arptimer() and make it race safe.
  o Consistently use ifp->if_afdata_lock to lock access to linked lists in the lle hashes.
  o Introduce new lle flag LLE_LINKED, which marks an entry that is attached to the hash.
    - Use LLE_LINKED to avoid double unlinking via consecutive calls to llentry_free().
    - Mark lle with LLE_DELETED via |= operation instead of =, so that other flags won't be lost.
  o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more consistent and provide more informative KASSERTs.
  The patch is a collaborative work of all submitters and myself.
  PR: kern/165863
  Submitted by: Andrey Zonov <andrey zonov.org>
  Submitted by: Ryan Stone <rysto32 gmail.com>
  Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>
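Two of the points in this entry, setting LLE_DELETED with |= and using LLE_LINKED to guard against double unlinking, can be shown with plain bit flags. This is a Python sketch of the idea only; the flag names come from the commit message, but their numeric values and the helper functions are assumptions.

```python
# Flag names are from the commit message; the values are hypothetical.
LLE_DELETED = 0x01
LLE_LINKED = 0x02


def mark_deleted_by_assignment(flags: int) -> int:
    """Old pattern: plain assignment discards every other flag."""
    return LLE_DELETED


def mark_deleted_by_or(flags: int) -> int:
    """New pattern: OR the bit in so the other flags survive."""
    return flags | LLE_DELETED


def unlink_once(flags: int):
    """Unlink only if still linked; returns (new_flags, did_unlink)."""
    if not flags & LLE_LINKED:
        return flags, False  # a second call becomes a no-op
    return flags & ~LLE_LINKED, True


flags = LLE_LINKED
print(bool(mark_deleted_by_assignment(flags) & LLE_LINKED))  # flag lost
print(bool(mark_deleted_by_or(flags) & LLE_LINKED))          # flag kept

after_first, first = unlink_once(flags)
after_second, second = unlink_once(after_first)
print(first, second)
```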
* replace __unused with a portable construct; fix a couple of signed/unsigned warnings. | luigi | 2012-08-02 | 1 | -3/+4
* replace inet_ntoa_r with the more standard inet_ntop(). | luigi | 2012-08-01 | 2 | -10/+12
  As discussed on -current, inet_ntoa_r() is non standard, has different arguments in userspace and kernel, and almost unused (no clients in userspace, only net/flowtable.c, net/if_llatbl.c, netinet/in_pcb.c, netinet/tcp_subr.c in the kernel)
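For reference, this is what the standard interface looks like from userland, using the wrappers of the same name in Python's `socket` module. It is not the kernel call changed by this commit, and the documentation address 192.0.2.1 is just a sample value.

```python
import socket

# Four raw bytes in network order, as they would sit in an IPv4 header.
packed = bytes([192, 0, 2, 1])

# inet_ntop takes the address family explicitly, so the same call
# shape works for AF_INET and AF_INET6.
text = socket.inet_ntop(socket.AF_INET, packed)

# inet_pton is the inverse conversion.
round_trip = socket.inet_pton(socket.AF_INET, text)

print(text, round_trip == packed)
```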
* add a cast to avoid a signed/unsigned warning (to be removed when we will have TUNABLE_UINT constructors) | luigi | 2012-08-01 | 1 | -1/+1
* Some more whitespace cleanup. | glebius | 2012-08-01 | 2 | -13/+13
* Some style(9) and whitespace changes. | glebius | 2012-07-31 | 2 | -67/+59
  Together with: Andrey Zonov <andrey zonov.org>
* nobody uses this file except the userspace ipfw code, but the cast of a pointer to an integer needs a cast to prevent a warning for size mismatch. | luigi | 2012-07-31 | 1 | -1/+1
  MFC after: 1 week