summaryrefslogtreecommitdiffstats
path: root/sys/net
Commit message (Collapse)AuthorAgeFilesLines
* Prefer NULL over 0 for pointerskevlo2012-10-0920-20/+20
|
* A step in resolving mess with byte ordering for AF_INET. After this change:glebius2012-10-062-27/+1
| | | | | | | | | | | | | | | | | | | - All packets in NETISR_IP queue are in net byte order. - ip_input() is entered in net byte order and converts packet to host byte order right _after_ processing pfil(9) hooks. - ip_output() is entered in host byte order and converts packet to net byte order right _before_ processing pfil(9) hooks. - ip_fragment() accepts and emits packet in net byte order. - ip_forward(), ip_mloopback() use host byte order (untouched actually). - ip_fastforward() no longer modifies packet at all (except ip_ttl). - Swapping of byte order there and back removed from the following modules: pf(4), ipfw(4), enc(4), if_bridge(4). - Swapping of byte order added to ipfilter(4), based on __FreeBSD_version - __FreeBSD_version bumped. - pfil(9) manual page updated. Reviewed by: ray, luigi, eri, melifaro Tested by: glebius (LE), ray (BE)
* MFV: libpcap 1.3.0.delphij2012-10-051-1/+37
| | | | MFC after: 4 weeks
* Remove the M_NOWAIT from bridge_rtable_init as it isn't needed. The functionthompsa2012-10-041-8/+3
| | | | | | return value is not even checked and could lead to a panic on a null sc_rthash. MFC after: 2 weeks
* Cast through void * to silence compiler warningemaste2012-10-031-7/+8
| | | | | | | | The base netmap pointer and offsets involved are provided by the kernel side of the netmap interface and will have appropriate alignment. Sponsored by: ADARA Networks MFC After: 2 weeks
* Rename the module for 'device enc' to "if_enc" to avoid conflicting withjhb2012-10-021-2/+2
| | | | | | | | | the CAM "enc" peripheral (part of ses(4)). Previously the two modules used the same name, so only one was included in a linked kernel causing enc0 to not be created if you added IPSEC to GENERIC. The new module name follows the pattern of other network interfaces (e.g. "if_loop"). MFC after: 1 week
* The drbr(9) API appeared to be so unclear, that most drivers inglebius2012-09-281-17/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tree used it incorrectly, which lead to inaccurate overrated if_obytes accounting. The drbr(9) used to update ifnet stats on drbr_enqueue(), which is not accurate since enqueuing doesn't imply successful processing by driver. Dequeuing neither mean that. Most drivers also called drbr_stats_update() which did accounting again, leading to doubled if_obytes statistics. And in case of severe transmitting, when a packet could be several times enqueued and dequeued it could have been accounted several times. o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between ALTQ queueing or buf_ring(9) queueing. - It doesn't touch the buf_ring stats any more. - It doesn't touch ifnet stats anymore. - drbr_stats_update() no longer exists. o buf_ring(9) handles its stats itself: - It handles br_drops itself. - br_prod_bytes stats are dropped. Rationale: no one ever reads them but update of a common counter on every packet negatively affects performance due to excessive cache invalidation. - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since we no longer account bytes. o Drivers handle their stats theirselves: if_obytes, if_omcasts. o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer use drbr_stats_update(), and update ifnet stats theirselves. o bxe(4) was the most correct driver, it didn't call drbr_stats_update(), thus it was the only driver accurate under moderate load. Now it also maintains stats itself. o ixgbe(4) had already taken stats from hardware, so just - drop software stats updating. - take multicast packet count from hardware as well. o mxge(4) just no longer needs NO_SLOW_STATS define. o cxgb(4), cxgbe(4) need no change, since they obtain stats from hardware. Reviewed by: jfv, gnn
* - In the bridge_enqueue() do success/error accounting forglebius2012-09-261-5/+4
| | | | | each fragment, not only once. - In the GRAB_OUR_PACKETS() macro do increase if_ibytes.
* Correct misspelling in debug output.emaste2012-09-261-1/+1
|
* Revert part of an earlier patch attempt that snuck in with r240938.emaste2012-09-251-1/+0
|
* Avoid INVARIANTS panic destroying an in-use tap(4)emaste2012-09-252-5/+2
| | | | | | | | | | | | | | The requirement (implied by the KASSERT in tap_destroy) that the tap is closed isn't valid; destroy_dev will block in devdrn while other threads are in d_* functions. Note: if_tun had the same issue, addressed in SVN revisions r186391, r186483 and r186497. The use of the condvar there appears to be redundant with the functionality provided by destroy_dev. Sponsored by: ADARA Networks Reviewed by: dwhite MFC after: 2 weeks
* Remove an incorrect commentemaste2012-09-251-1/+0
|
* Convert lagg(4) to use if_transmit instead of if_start.glebius2012-09-201-24/+32
| | | | In collaboration with: thompsa, sbruno, fabient
* Utilize Jenkins hash with random seed for source nodes storage.glebius2012-09-201-22/+0
|
* Add missing break.glebius2012-09-201-0/+1
| | | | Pointy hat to: glebius
* Fix build, pass the pointy hat please.glebius2012-09-181-1/+1
|
* Make ruleset anchors in pf(4) reentrant. We've got two problems here:glebius2012-09-181-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) Ruleset parser uses a global variable for anchor stack. 2) When processing a wildcard anchor, matching anchors are marked. To fix the first one: o Allocate anchor processing stack on stack. To make this allocation as small as possible, following measures taken: - Maximum stack size reduced from 64 to 32. - The struct pf_anchor_stackframe trimmed by one pointer - parent. We can always obtain the parent via the rule pointer. - When pf_test_rule() calls pf_get_translation(), the former lends its stack to the latter, to avoid recursive allocation 32 entries. The second one appeared more tricky. The code, that marks anchors was added in OpenBSD rev. 1.516 of pf.c. According to commit log, the idea is to enable the "quick" keyword on an anchor rule. The feature isn't documented anywhere. The most obscure part of the 1.516 was that code examines the "match" mark on a just processed child, which couldn't be put here by current frame. Since this wasn't documented even in the commit message and functionality of this is not clear to me, I decided to drop this examination for now. The rest of 1.516 is redone in a thread safe manner - the mark isn't put on the anchor itself, but on current stack frame. To avoid growing stack frame, we utilize LSB from the rule pointer, relying on kernel malloc(9) returning pointer aligned addresses. Discussed with: dhartmei
* - Add $FreeBSD$ to allow modifications to this file.glebius2012-09-181-2/+2
| | | | - Move $OpenBSD$ to a more standard place.
* o Create directory sys/netpfil, where all packet filters shouldglebius2012-09-144-0/+2387
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reside, and move there ipfw(4) and pf(4). o Move most modified parts of pf out of contrib. Actual movements: sys/contrib/pf/net/*.c -> sys/netpfil/pf/ sys/contrib/pf/net/*.h -> sys/net/ contrib/pf/pfctl/*.c -> sbin/pfctl contrib/pf/pfctl/*.h -> sbin/pfctl contrib/pf/pfctl/pfctl.8 -> sbin/pfctl contrib/pf/pfctl/*.4 -> share/man/man4 contrib/pf/pfctl/*.5 -> share/man/man5 sys/netinet/ipfw -> sys/netpfil/ipfw The arguable movement is pf/net/*.h -> sys/net. There are future plans to refactor pf includes, so I decided not to break things twice. Not modified bits of pf left in contrib: authpf, ftp-proxy, tftp-proxy, pflogd. The ipfw(4) movement is planned to be merged to stable/9, to make head and stable match. Discussed with: bz, luigi
* Merge the projects/pf/head branch, that was worked on for last six months,glebius2012-09-081-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into head. The most significant achievements in the new code: o Fine grained locking, thus much better performance. o Fixes to many problems in pf, that were specific to FreeBSD port. New code doesn't have that many ifdefs and much less OpenBSDisms, thus is more attractive to our developers. Those interested in details, can browse through SVN log of the projects/pf/head branch. And for reference, here is exact list of revisions merged: r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330, r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656, r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782, r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868, r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223, r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456, r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505, r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168, r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230, r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398, r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548, r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672, r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169, r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442, r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522, r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661, r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212. I'd like to thank people who participated in early testing: Tested by: Florian Smeets <flo freebsd.org> Tested by: Chekaluk Vitaly <artemrts ukr.net> Tested by: Ben Wilber <ben desync.com> Tested by: Ian FREISLICH <ianf cloudseed.co.za>
* Fix the build broken by r240099.melifaro2012-09-041-0/+2
| | | | | | Hide link_pfil_hook under _KERNEL macro. MFC after: 3 weeks
* Introduce new link-layer PFIL hook V_link_pfil_hook.melifaro2012-09-043-206/+62
| | | | | | | | | | | | | Merge ether_ipfw_chk() and part of bridge_pfil() into unified ipfw_check_frame() function called by PFIL. This change was suggested by rwatson? @ DevSummit. Remove ipfw headers from ether/bridge code since they are unneeded now. Note this thange introduce some (temporary) performance penalty since PFIL read lock has to be acquired for every link-level packet. MFC after: 3 weeks
* - Move jenkins.h to jenkins_hash.cglebius2012-09-041-3/+3
| | | | | | | | | | - Provide missing function that can do hashing of arbitrary sized buffer. - Refetch lookup3.c and do only minimal edits to it, so that diff between our jenkins_hash.c and lookup3.c is minimal. - Add declarations for jenkins_hash(), jenkins_hash32() to sys/hash.h. - Document these functions in hash(9) Obtained from: http://burtleburtle.net/bob/c/lookup3.c
* Change bridge(4) to use if_transmit for forwarding packets to underlyingglebius2012-09-031-33/+33
| | | | | | interfaces instead of queueing. Tested by: ray
* In ifc_alloc_unit():glebius2012-08-301-6/+15
| | | | | | | | - In the !wildcard case, return ENOSPC instead of confusing EEXIST in case if ifc->ifc_maxunit reached. - Fix unit leak, that I've introduced in previous revision. Submitted by: Daan Vreeken <Daan vitsch.nl>
* Fix a silly grammar bogon.jhb2012-08-211-1/+1
| | | | Submitted by: Stephen McKay
* Refine the changes made in r208212 to avoid bogus failures fromjhb2012-08-201-13/+21
| | | | | | | | | | | | | if_delmulti() when clearing the configuration for a subinterface when the parent interface is being detached. The current code was still triggering an assertion in if_delmulti() due to the parent interface being partially detached. Fix this by not calling if_delmulti() at all if the parent interface is being detached. Warn if if_delmulti() fails when the parent is not being detached (but similar to 208212, still proceed with tearing down the vlan state). Tested by: ae@ MFC after: 1 month
* Unexpand a couple of TAILQ_FOREACH()s.jhb2012-08-171-4/+1
|
* After the PHYS_TO_VM_PAGE() function was de-inlined, the main reasonkib2012-08-051-0/+1
| | | | | | | | | | | | | to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks
* Fix races between in_lltable_prefix_free(), lla_lookup(),glebius2012-08-023-13/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | llentry_free() and arptimer(): o Use callout_init_rw() for lle timeout, this allows us safely disestablish them. - This allows us to simplify the arptimer() and make it race safe. o Consistently use ifp->if_afdata_lock to lock access to linked lists in the lle hashes. o Introduce new lle flag LLE_LINKED, which marks an entry that is attached to the hash. - Use LLE_LINKED to avoid double unlinking via consequent calls to llentry_free(). - Mark lle with LLE_DELETED via |= operation istead of =, so that other flags won't be lost. o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more consistent and provide more informative KASSERTs. The patch is a collaborative work of all submitters and myself. PR: kern/165863 Submitted by: Andrey Zonov <andrey zonov.org> Submitted by: Ryan Stone <rysto32 gmail.com> Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>
* The llentry_update() is used only by flowtable and the latterglebius2012-08-023-24/+15
| | | | | always passes NULL pointer to it. Thus, code can be simplified and function renamed to llentry_alloc() to match rtalloc().
* Some more whitespace cleanup.glebius2012-08-011-5/+5
|
* Some style(9) and whitespace changes.glebius2012-07-312-16/+17
| | | | Together with: Andrey Zonov <andrey zonov.org>
* Hardcode the loopback rx/tx checkum options for IPv6 to on withoutbz2012-07-281-2/+26
| | | | | | | | | | | | | | | | checking. This allows the FreeBSD 9.1 release process to move forward. Work around the problem that loopback connections to local addresses not on loopback interfaces and not on interfaces w/ IPv6 checksum offloading enabled would not work. A proper fix to allow us to disable the "checksum offload" on loopback for testing, measurements, ... as we allow for IPv4 needs to put in place later. Reported by: tuexen, Matthew Seaman (m.seaman infracaninophile.co.uk) Reported by: Mike Andrews (mandrews bit0.com), kib, ... PR: kern/170070 MFC after: 1 day X-MFC after: re approval
* Permit changing MTU in 6to4 relay.melifaro2012-07-151-2/+14
| | | | | | | | | | | | This behavior is recommended by RFC 4213 clause 3.2. Sometimes fragmentation is the least evil. For example, some Linux IPVS kernels forwards ICMPv6 checksums to real servers incorrectly. Reviewed by: hrs(previous version) Approved by: kib(mentor) MFC after: 1 week
* Simplify error caseemaste2012-07-101-4/+4
| | | | Submitted by: thompsa@
* Plug potential mbuf leak when bridging fragmentsemaste2012-07-101-0/+2
| | | | | | | If an error occurs when transmitting one mbuf in a chain of fragments, free the subsequent fragments instead of leaking them. Sponsored by: ADARA Networks
* In epair_clone_destroy(), when destroying the second half, we have totrociny2012-07-091-18/+20
| | | | | | | | | | | | | switch to its vnet before calling ether_ifdetach(). Otherwise if the second half resides in a different vnet, if_detach() silently fails leaving a stale pointer in V_ifnet list, and the system crashes trying to access this pointer later. Another solution could be not to allow to destroy epair unless both ends are in the home vnet. Discussed with: bz Tested by: delphij
* Restore error handling lost in r191603emaste2012-07-091-1/+1
| | | | | | This was missed in the change from IFQ_ENQUEUE to if_transmit. Sponsored by: ADARA Networks
* Implement SIOCGIFMEDIA for if_tap(4)emaste2012-07-061-4/+22
| | | | | | | | | Appease certain if_tap(4) consumers by providing simulated Ethernet media status. DragonFly commit 70d9a675bf5441cc854a843ead702d08928c37f3 Obtained from: DragonFly BSD
* When ip_output()/ip6_output() is supplied a struct route *ro argument,glebius2012-07-042-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning here: it may be supplied to provide route, and it may be supplied to store and return to caller the route that ip_output()/ip6_output() finds. In the latter case skipping FLOWTABLE lookup is pessimisation. The difference between struct route filled by FLOWTABLE and filled by rtalloc() family is that the former doesn't hold a reference on its rtentry. Reference is hold by flow entry, and it is about to be released in future. Thus, route filled by FLOWTABLE shouldn't be passed to RTFREE() macro. - Introduce new flag for struct route/route_in6, that marks route not holding a reference on rtentry. - Introduce new macro RO_RTFREE() that cleans up a struct route depending on its kind. - All callers to ip_output()/ip6_output() that do supply non-NULL but empty route should use RO_RTFREE() to free results of lookup. - ip_output()/ip6_output() now do FLOWTABLE lookup always when ro->ro_rt == NULL. Tested by: tuexen (SCTP part)
* Add the same check as vlan(4) where we ignore the ifnet departure event if thethompsa2012-06-301-0/+3
| | | | | | | | interface is just being renamed. PR: kern/169557 Submitted by: Mark Johnston MFC after: 3 days
* Hold GIF_LOCK() for almost all of gif_start(). It is required to be heldjhb2012-06-292-19/+0
| | | | | | | | | across in_gif_output() and in6_gif_output() anyway, and once it is held across those it might as well be held for the entire loop. This simplifies the code and removes the need for the custom IFF_GIF_WANTED flag (which belonged in the softc and not as an IFF_* flag anyway). Tested by: Vincent Hoffman vince unsane co uk
* - Updated TOE support in the kernel.np2012-06-192-2/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs. These are available as t3_tom and t4_tom modules that augment cxgb(4) and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as usual with or without these extra features. - iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the works and will follow soon. Build-tested with make universe. 30s overview ============ What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the capabilities of an interface: # ifconfig -m | grep TOE Enable/disable TCP offload on an interface (just like any other ifnet capability): # ifconfig cxgbe0 toe # ifconfig cxgbe0 -toe Which connections are offloaded? Look for toe4 and/or toe6 in the output of netstat and sockstat: # netstat -np tcp | grep toe # sockstat -46c | grep toe Reviewed by: bz, gnn Sponsored by: Chelsio communications. MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
* Fix comment to better reflect how we arerrs2012-06-121-6/+11
| | | | | cheating and using the csum_data. Also fix style issues with the comments.
* Note to self. Have morning coffee *before* committing things.rrs2012-06-121-4/+6
| | | | | | There is no mac_addr in the mbuf for BSD.. cheat like we are supposed to and use the csum field since our friend the gif tunnel itself will never use offload.
* Opps forgot to commit the flag.rrs2012-06-121-1/+1
|
* Allow a gif tunnel to be used with ALTq.rrs2012-06-121-46/+102
| | | | Reviewed by: gnn
* Fix a panic I introduced in r234487, the bridge softc pointer is set to nullthompsa2012-06-111-14/+22
| | | | | | | | early in the detach so rearrange things not to explode. Reported by: David Roffiaen, Gustau Perez Querol Tested by: David Roffiaen MFC after: 3 days
* Fix typo introduced in r236559.melifaro2012-06-091-1/+1
| | | | | Pointed by: bcr Approved by: kib(mentor)
OpenPOWER on IntegriCloud