path: root/sys/net
Commit message | Author | Age | Files | Lines
* MFS r304704: Update iflib to support more NIC designs | shurd | 2016-08-24 | 3 | -181/+410
  - Move group task queue into kern/subr_gtaskqueue.c
  - Change intr_enable to return an int so it can be detected if it's not implemented
  - Allow different TX/RX queues per set to be different sizes
  - Don't split up TX mbufs before transmit
  - Allow a completion queue for TX as well as RX
  - Pass the RX budget to isc_rxd_available() to allow an earlier return and avoid multiple calls
  Approved by: re (glb), davidch
  Requested by: shurd
* Merge r303263: | glebius | 2016-08-02 | 1 | -0/+13
  Partially revert r257696/r257713, which have an issue with writing to a user-controlled address. Restore the old code that emulated OSIOCGIFCONF in if.c.
  Approved by: re (kib)
* MFC r302476: | pfg | 2016-07-13 | 2 | -2/+3
  ng_mppc(4): basic code readability cleanups. In particular use __unreachable() to appease static analyzers. No functional change.
  CID: 1356591
  Approved by: re (gjb)
* MFC r302439: | bdrewery | 2016-07-12 | 1 | -3/+3
  iflib: Fix typo in 'iflib_rx_miss_bufs' sysctl name
  Approved by: re (gjb)
* Add variable declaration missing in r302372. | nwhitehorn | 2016-07-06 | 1 | -0/+1
  Submitted by: andrew
  Approved by: re (gjb, kib)
* Replace a number of conflations of mp_ncpus and mp_maxid with either | nwhitehorn | 2016-07-06 | 2 | -5/+5
  mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way.
  There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch.
  PR: kern/210106
  Reviewed by: jhb
  Approved by: re (glebius)
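The dense-versus-sparse distinction in this message can be sketched in user space. Everything below is a hypothetical stand-in (the `sketch_` names, the fixed-size table); it is not the kernel's cpuset API, only an illustration of why loops bounded by the CPU count go wrong when IDs have holes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's CPU bookkeeping. */
#define SKETCH_MAX_CPUS 64

struct sketch_cpus {
    bool present[SKETCH_MAX_CPUS]; /* true where a CPU with that ID exists */
    int  maxid;                    /* highest valid CPU ID (role of mp_maxid) */
    int  ncpus;                    /* number of CPUs present (role of mp_ncpus) */
};

/* Build a topology where only every stride-th ID exists, as with SMT off. */
static struct sketch_cpus
sketch_make_sparse(int ncpus, int stride)
{
    struct sketch_cpus c = { .maxid = 0, .ncpus = ncpus };

    assert(ncpus > 0 && stride > 0);
    assert((ncpus - 1) * stride < SKETCH_MAX_CPUS);
    for (int i = 0; i < ncpus; i++)
        c.present[i * stride] = true;
    c.maxid = (ncpus - 1) * stride;
    return c;
}

/* Wrong assumption: IDs are dense in [0, ncpus). Counts IDs that do not exist. */
static int
sketch_count_missing_dense(const struct sketch_cpus *c)
{
    int missing = 0;

    for (int id = 0; id < c->ncpus; id++)
        if (!c->present[id])
            missing++;
    return missing;
}

/* Walk [0, maxid] and skip absent IDs, in the spirit of CPU_FOREACH(). */
static int
sketch_count_visited(const struct sketch_cpus *c)
{
    int visited = 0;

    for (int id = 0; id <= c->maxid; id++)
        if (c->present[id])
            visited++;
    return visited;
}

/* Arrays indexed by CPU ID need maxid + 1 slots, not ncpus. */
static size_t
sketch_percpu_slots(const struct sketch_cpus *c)
{
    return (size_t)c->maxid + 1;
}
```

With four CPUs at IDs 0, 8, 16 and 24, the dense loop touches three IDs that do not exist, while the sparse-aware walk visits exactly four CPUs and the per-CPU array needs 25 slots.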
* Several device drivers call if_alloc() and then do further checks and | bz | 2016-06-29 | 1 | -0/+3
  will call if_free() in case of conflict, error, etc. if_free() however sets the VNET instance from the ifp->if_vnet, which was not yet initialized but would only be in if_attach(). Fix this by setting the curvnet from where we allocate the interface in if_alloc(). if_attach() will later overwrite this as needed. We do not set the home_vnet early on as we only want to prevent the if_free() panic but not change any of the other housekeeping, e.g., triggered through ifioctl()s.
  Reviewed by: brooks
  Approved by: re (gjb)
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D7010
* Update pf(4) and pflog(4) to survive basic VNET testing, which includes | bz | 2016-06-23 | 1 | -1/+3
  proper virtualisation, teardown, avoiding use-after-free, race conditions, no longer creating a thread per VNET (which could easily be a couple of thousand threads), gracefully ignoring global events (e.g., eventhandlers) on teardown, clearing various globally cached pointers and checking them before use.
  Reviewed by: kp
  Approved by: re (gjb)
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D6924
* Add spares to struct ifnet and socket for packet pacing and/or general | np | 2016-06-23 | 1 | -0/+2
  use. Update comments regarding the spare fields in struct inpcb.
  Bump __FreeBSD_version for the changes to the size of the structures.
  Reviewed by: gnn@
  Approved by: re@ (gjb@)
  Sponsored by: Chelsio Communications
* Add more fields to if_debug.c for ddb(4) 'show ifnet'; resort | bz | 2016-06-22 | 1 | -3/+10
  some fields to match the order in the struct. Especially needed if_pf_kif to do pf(4) VNET debugging.
  Approved by: re (marius)
  Obtained from: projects/vnet
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
* After r302054 unloading a network interface driver on a kernel | bz | 2016-06-22 | 1 | -1/+8
  without VIMAGE support would dereference a NULL pointer unconditionally, leading to a panic. Wrap the entire VIMAGE related code with #ifdefs rather than just the decision making part to save an extra bit of resources.
  Reported by: np
  Sponsored by: The FreeBSD Foundation
  MFC After: 13 days
  Approved by: re (marius)
* Get closer to a VIMAGE network stack teardown from top to bottom rather | bz | 2016-06-21 | 13 | -45/+144
  than removing the network interfaces first. This change is rather large and convoluted as the ordering requirements cannot be separated.
  Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC.
  Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet.
  For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along with (or before) the higher layer protocol when it is shut down.
  Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we have cleaned up the higher layers.
  For interface teardown there are multiple paths:
  (a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
  (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or
  (c) a virtual network stack is being shut down.
  All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will clean up their own parts, thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks.
  When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()).
  Provide or enhance helper functions to do proper cleanup at a protocol rather than at an interface level.
  Approved by: re (hrs)
  Obtained from: projects/vnet
  Reviewed by: gnn, jhb
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D6747
* pf: Filter on and set vlan PCP values | kp | 2016-06-17 | 1 | -1/+7
  Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to filter on it.
  Reviewed by: allanjude, araujo
  Approved by: re (gjb)
  Obtained from: OpenBSD (mostly)
  Differential Revision: https://reviews.freebsd.org/D6786
* iflib: Improve cleanup on iflib_queues_alloc error path | cem | 2016-06-07 | 1 | -5/+16
  Fix some memory leaks. Some may remain.
  Reported by: Coverity
  Discussed with: mmacy
  CIDs: 1356036, 1356037, 1356038
  Sponsored by: EMC / Isilon Storage Division
* iflib: Fix potential leak in iflib_if_transmit | cem | 2016-06-07 | 1 | -2/+2
  Due to an accidental mismatch between allocation and release in the slow path of iflib_if_transmit, if a caller passed 9-16 mbufs to the routine, the mbuf array would be leaked.
  Fix the mismatch by removing the magic numbers in favor of nitems() on the stack array.
  According to mmacy, this leak is unlikely.
  Reported by: Coverity
  Discussed with: mmacy
  CID: 1356040
  Sponsored by: EMC / Isilon Storage Division
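The nitems() idea from this fix can be shown with a small user-space sketch. The routine below is hypothetical and is not the iflib code; the macro is defined locally in the form commonly found in FreeBSD's sys/param.h. The point is only that allocation and release test the same bound, derived from the array itself rather than from two separately typed constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef nitems
#define nitems(x) (sizeof((x)) / sizeof((x)[0]))
#endif

/*
 * Hypothetical batch routine: use a small stack array when the batch fits,
 * otherwise fall back to the heap. Returns the number of heap buffers still
 * outstanding on return (0 means nothing leaked), or -1 if allocation failed.
 */
static int
sketch_process_batch(size_t count)
{
    void *stack_slots[16];
    void **slots = stack_slots;
    int outstanding = 0;

    /* Allocation decision: bound comes from the array, not a magic number. */
    if (count > nitems(stack_slots)) {
        slots = calloc(count, sizeof(*slots));
        if (slots == NULL)
            return -1;
        outstanding++;
    }
    assert(slots != NULL);

    /* Placeholder for the real work on slots[0..count). */
    for (size_t i = 0; i < count; i++)
        slots[i] = NULL;

    /* Release decision: exactly the same test as the allocation above. */
    if (count > nitems(stack_slots)) {
        free(slots);
        outstanding--;
    }
    return outstanding;
}
```

If the two tests used different hard-coded limits, some range of batch sizes would allocate without freeing, which is the shape of the leak the message describes.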
* ng_mppc(4): Bring in netgraph(3) MPPC compression support. | pfg | 2016-06-07 | 3 | -0/+645
  Support for compression has been available from July 2007 but it was never imported due to concerns with patents once held by STAC/HiFn. The issues have clearly been resolved so bring it in now.
  Special thanks to Brett Glass for preserving the code and pointing to documentation for the expiration case.
  Obtained from: mav (through Brett Glass)
  Relnotes: yes
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D6739
* net: Use M_HASHTYPE_OPAQUE_HASH if the mbuf flowid has hash properties | sephe | 2016-06-07 | 2 | -3/+2
  Reviewed by: hps, erj, tuexen
  Sponsored by: Microsoft OSTC
  Differential Revision: https://reviews.freebsd.org/D6688
* After tearing down the interface per-"domain" bits, set the data area | bz | 2016-06-06 | 1 | -1/+3
  to NULL to avoid it being mis-treated on a possible re-attach but also to get a clean NULL pointer dereference in case of errors due to unexpected race conditions elsewhere in the code, e.g., callouts.
  Obtained from: projects/vnet
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* Similarly to r301505 protect the removal of the ifa from the if_addrhead | bz | 2016-06-06 | 1 | -1/+4
  by a lock (as well as the check that the list is not empty).
  Obtained from: projects/vnet
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* In if_purgeaddrs() we cannot hold the lock over the entire loop | bz | 2016-06-06 | 1 | -0/+3
  due to called functions (as in other parts of the stack, leave a comment). However, put a lock around the removal of the ifa from the list to reduce the possible race with other places.
  Obtained from: projects/vnet
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* SYSINIT functions do not return a value; switch to void, remove | bz | 2016-06-06 | 1 | -6/+4
  the return value, and mark the unused argument __unused.
  Obtained from: projects/vnet
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* Provide a public interface to rt_flushifroutes which takes the address | bz | 2016-06-06 | 2 | -0/+10
  family as an argument as well. This will be used to clean up individual protocols during VNET teardown.
  Obtained from: projects/vnet
  Sponsored by: The FreeBSD Foundation
* Make the KASSERT message more helpful by also printing the ifp information | bz | 2016-06-06 | 1 | -1/+2
  which we are asserting.
  Obtained from: projects/vnet
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* Add support for priority code point (PCP), which is a 3-bit field | araujo | 2016-06-06 | 3 | -9/+127
  that refers to IEEE 802.1p class of service and maps to the frame priority level.
  Values in order of priority are: 1 (Background (lowest)), 0 (Best effort (default)), 2 (Excellent effort), 3 (Critical applications), 4 (Video, < 100ms latency), 5 (Video, < 10ms latency), 6 (Internetwork control) and 7 (Network control (highest)).
  Example of usage:
  root# ifconfig em0.1 create
  root# ifconfig em0.1 vlanpcp 3
  Note: The review D801 includes the pf(4) part, but as discussed with kristof, we won't commit the pf(4) bits for now.
  Credit for the original code goes to rwatson.
  Differential Revision: https://reviews.freebsd.org/D801
  Reviewed by: gnn, adrian, loos
  Discussed with: rwatson, glebius, kristof
  Tested by: many including Matthew Grooms <mgrooms__shrew.net>
  Obtained from: pfSense
  Relnotes: Yes
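As a rough illustration of where those three bits live, the sketch below packs a PCP value into an IEEE 802.1Q tag control field (PCP in bits 15 to 13, VLAN ID in bits 11 to 0) and ranks values in the priority order listed above. The `sketch_` names and macros are hypothetical and are not the identifiers used by if_vlan.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants describing the 16-bit 802.1Q tag control field. */
#define SKETCH_PCP_SHIFT 13
#define SKETCH_PCP_MASK  0xE000u
#define SKETCH_VID_MASK  0x0FFFu

/* Replace the PCP bits of a tag control field, leaving the other bits alone. */
static uint16_t
sketch_tci_set_pcp(uint16_t tci, uint8_t pcp)
{
    assert(pcp <= 7);
    return (uint16_t)((tci & ~SKETCH_PCP_MASK) |
        ((uint16_t)(pcp & 7u) << SKETCH_PCP_SHIFT));
}

/* Extract the 3-bit PCP value. */
static uint8_t
sketch_tci_get_pcp(uint16_t tci)
{
    return (uint8_t)((tci & SKETCH_PCP_MASK) >> SKETCH_PCP_SHIFT);
}

/* Extract the 12-bit VLAN ID. */
static uint16_t
sketch_tci_get_vid(uint16_t tci)
{
    return (uint16_t)(tci & SKETCH_VID_MASK);
}

/*
 * Map a PCP value to its rank in the order given in the message:
 * 1 (lowest), 0, 2, 3, 4, 5, 6, 7 (highest). Rank 0 is the lowest priority.
 */
static int
sketch_pcp_rank(uint8_t pcp)
{
    assert(pcp <= 7);
    if (pcp == 1)
        return 0;
    if (pcp == 0)
        return 1;
    return pcp;
}
```

Note that the numeric value alone does not give the ordering: value 1 (Background) ranks below value 0 (Best effort), which is why the rank helper swaps the two.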
* Introduce a per-VNET flag to enable/disable netisr processing on that VNET. | bz | 2016-06-03 | 5 | -6/+200
  Add accessor functions to toggle the state per VNET. The base system (vnet0) will always enable itself with the normal registration. We will share the registered protocol handlers in all VNETs, minimising duplication and management. Upon disabling netisr processing for a VNET, drain the netisr queue of packets for that VNET.
  Update netisr consumers to (de)register on a per-VNET start/teardown using VNET_SYS(UN)INIT functionality.
  The change should be transparent for non-VIMAGE kernels.
  Reviewed by: gnn (, hiren)
  Obtained from: projects/vnet
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D6691
* This change re-adds L2 caching for TCP and UDP, as originally added in D4306 | gnn | 2016-06-02 | 9 | -26/+67
  but removed due to other changes in the system. Restore the llentry pointer to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as appropriate.
  Submitted by: Mike Karels
  Differential Revision: https://reviews.freebsd.org/D6262
* In if_attachdomain1() there does not seem to be any reason | bz | 2016-05-28 | 1 | -2/+1
  to use TRYLOCK rather than just acquire the lock, so just do that.
  Reviewed by: markj
  Obtained from: projects/vnet
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D6578
* Change net.link.log_promisc_mode_change to a read-only tunable | n_hibma | 2016-05-25 | 1 | -1/+1
  PR: 166255
  Submitted by: eugen.grosbein.net
  Obtained from: hselasky
  MFC after: 3 days
* Allow an MTU of 65535 bytes to be set via TUN[SG]IFINFO. This requires | tuexen | 2016-05-24 | 1 | -2/+2
  changing the type on the mtu field in struct tuninfo from short to unsigned short. This is used, for example, by packetdrill to test with MTUs up to the maximum value.
  Differential Revision: 6452
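Why the type change is needed can be checked with a short range test. The helpers below are hypothetical and assume the usual 16-bit short; they are not the tun(4) code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if mtu can be stored in a signed short without truncation. */
static bool
sketch_mtu_fits_short(long mtu)
{
    return (mtu >= SHRT_MIN && mtu <= SHRT_MAX);
}

/* True if mtu can be stored in an unsigned short without truncation. */
static bool
sketch_mtu_fits_ushort(long mtu)
{
    return (mtu >= 0 && mtu <= USHRT_MAX);
}

/* Store an MTU in the wider-range field type after checking the range. */
static unsigned short
sketch_store_mtu(long mtu)
{
    assert(sketch_mtu_fits_ushort(mtu));
    return (unsigned short)mtu;
}
```

On platforms where short is 16 bits, 65535 is outside the signed range (which tops out at 32767) but is exactly the largest unsigned short.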
* sys/net: more spelling. | pfg | 2016-05-19 | 1 | -2/+2
* Allow writing IP packets of length TUNMRU no matter if TUNSIFHEAD is set | tuexen | 2016-05-19 | 1 | -2/+5
  or not.
* Rather than having the if_vmove() code intermixed in the vnet_destroy() | bz | 2016-05-18 | 3 | -9/+17
  function in vnet.c move it to if.c where it logically belongs and put it under a VNET_SYSUNINIT() call. To not change the current behaviour make sure it runs first thing during teardown. In the future this will allow us more flexibility on changing the order on when we want to get rid of interfaces.
  Stop exporting if_vmove() and make it file static.
  Reviewed by: gnn
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D6438
* Add a "vnet_state" field to struct vnet. | bz | 2016-05-18 | 2 | -0/+5
  This is set to the SI_SUB_* value before executing any VNET_SYSINIT or VNET_SYSUNINIT. While good for debugging, especially VNET teardown problems where it helps to know at which level during teardown we are, it will also be used to detect a "stable state" (as in fully up and running) later on.
  Obtained from: projects/vnet
  Sponsored by: The FreeBSD Foundation
* Activate the NO_64BIT_ATOMICS code for mips and powerpc | scottl | 2016-05-18 | 1 | -3/+5
* Remove assertions that don't make sense for the data type. | scottl | 2016-05-18 | 1 | -2/+0
* Add a dummy VNET_SYSINIT that will make sure all VNETs started will | bz | 2016-05-18 | 1 | -0/+10
  always end on SI_SUB_VNET_DONE.
  Obtained from: projects/vnet
  Sponsored by: The FreeBSD Foundation
* Split 'show vnets' into 'show vnet' and 'show all vnets'. | bz | 2016-05-18 | 1 | -13/+31
  While here adjust some db_printf format strings.
  Document the two show commands in ddb.4.
  Sponsored by: The FreeBSD Foundation
* Make compile without INET or without IP support in the kernel by hiding | bz | 2016-05-18 | 1 | -7/+18
  variables and lro function calls behind appropriate #ifdefs.
  Also move the #includes for "opt_*" to the place where they should be.
* Import the 'iflib' API library for network drivers. From the author: | scottl | 2016-05-18 | 7 | -0/+6085
  "iflib is a library to eliminate the need for frequently duplicated device independent logic propagated (poorly) across many network drivers."
  Participation is purely optional. The IFLIB kernel config option is provided for drivers that want to transition between legacy and iflib modes of operation. ixl and ixgbe driver conversions will be committed shortly. We hope to see participation from the Broadcom and maybe Chelsio drivers in the near future.
  Submitted by: mmacy@nextbsd.org
  Reviewed by: gallatin
  Differential Revision: D5211
* Don't repeat the the word 'the' | eadler | 2016-05-17 | 1 | -1/+1
  (one manual change to fix grammar)
  Confirmed With: db
  Approved by: secteam (not really, but this is a comment typo fix)
* Mark the unused arguments of various SYSINIT functions __unused. | bz | 2016-05-17 | 1 | -3/+3
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* When handling SIOCSIFNAME ensure that the new interface name is NUL | truckman | 2016-05-15 | 1 | -0/+5
  terminated. Reject the rename attempt if the name is too long.
  MFC after: 1 week
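The check described here amounts to confirming that a terminator appears somewhere inside the fixed-size name buffer. A minimal user-space sketch follows; `SKETCH_IFNAMSIZ` and the function name are hypothetical stand-ins, and the real ioctl handler may perform the test differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's interface name buffer size. */
#define SKETCH_IFNAMSIZ 16

/*
 * Return true if the caller-supplied buffer holds a NUL-terminated name,
 * i.e. a terminator exists within the first SKETCH_IFNAMSIZ bytes.
 * A name that fills the whole buffer with no terminator is too long.
 */
static bool
sketch_ifname_terminated(const char *buf)
{
    assert(buf != NULL);
    return (memchr(buf, '\0', SKETCH_IFNAMSIZ) != NULL);
}
```

Using memchr() with an explicit bound avoids reading past the buffer, which an unbounded strlen() on an unterminated name would do.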
* Add an EARLY_AP_STARTUP option to start APs earlier during boot. | jhb | 2016-05-14 | 1 | -0/+15
  Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot.
  This change moves SI_SUB_SMP up to just before software interrupt threads are created, allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed.
  This last feature fixes a problem with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu.
  Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case.
  As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code.
  These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).
  PR: kern/199321
  Reviewed by: markj, gnn, kib
  Sponsored by: Netflix
* Allow silencing of 'promiscuous mode enabled/disabled' messages. | n_hibma | 2016-05-12 | 1 | -4/+14
  PR: 166255
  Submitted by: eugen.grosbein.net
  Obtained from: eugen.grosbein.net
  MFC after: 1 week
* Improve performance and functionality of the bitstring(3) api | asomers | 2016-05-04 | 1 | -7/+2
  Two new functions are provided, bit_ffs_at() and bit_ffc_at(), which allow for efficient searching of set or cleared bits starting from any bit offset within the bit string.
  Performance is improved by operating on longs instead of bytes and using ffsl() for searches within a long. ffsl() is a compiler builtin in both clang and gcc for most architectures, converting what was a brute force while loop search into a couple of instructions.
  All of the bitstring(3) API continues to be contained in the header file. Some of the functions are large enough that perhaps they should be uninlined and moved to a library, but that is beyond the scope of this commit.
  sys/sys/bitstring.h:
    Convert the majority of the existing bit string implementation from macros to inline functions.
    Properly protect the implementation from inadvertent macro expansion when included in a user's program by prefixing all private macros/functions and local variables with '_'.
    Add bit_ffs_at() and bit_ffc_at(). Implement bit_ffs() and bit_ffc() in terms of their "at" counterparts.
    Provide a kernel implementation of bit_alloc(), making the full API usable in the kernel.
    Improve code documentation.
  share/man/man3/bitstring.3:
    Add pre-existing API bit_ffc() to the synopsis.
    Document new APIs.
    Document the initialization state of the bit strings allocated/declared by bit_alloc() and bit_decl().
    Correct documentation for bitstr_size(). The original code comments indicate the size is in bytes, not "elements of bitstr_t". The new implementation follows this lead. Only hastd assumed "elements" rather than bytes and it has been corrected.
  etc/mtree/BSD.tests.dist:
  tests/sys/Makefile:
  tests/sys/sys/Makefile:
  tests/sys/sys/bitstring.c:
    Add tests for all existing and new functionality.
  include/bitstring.h
    Include all headers needed by sys/bitstring.h
  lib/libbluetooth/bluetooth.h:
  usr.sbin/bluetooth/hccontrol/le.c:
    Include bitstring.h instead of sys/bitstring.h.
  sbin/hastd/activemap.c:
    Correct usage of bitstr_size().
  sys/dev/xen/blkback/blkback.c
    Use new bit_alloc.
  sys/kern/subr_unit.c:
    Remove hard-coded assumption that sizeof(bitstr_t) is 1. Get rid of unrb.busy, which caches the number of bits set in unrb.map. When INVARIANTS are disabled, nothing needs to know that information. collapse_unr can be adapted to use bit_ffs and bit_ffc instead. Eliminating unrb.busy saves memory, simplifies the code, and provides a slight speedup when INVARIANTS are disabled.
  sys/net/flowtable.c:
    Use the new kernel implementation of bit_alloc, instead of hacking the old libc-dependent macro.
  sys/sys/param.h
    Update __FreeBSD_version to indicate availability of new API
  Submitted by: gibbs, asomers
  Reviewed by: gibbs, ngie
  MFC after: 4 weeks
  Sponsored by: Spectra Logic Corp
  Differential Revision: https://reviews.freebsd.org/D6004
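The word-at-a-time search described in this message can be sketched as follows. This is a hypothetical helper with its own signature, not the bit_ffs_at() from sys/bitstring.h, and it assumes a gcc or clang compiler for `__builtin_ffsl`. It masks off the bits below the start offset in the first word, then scans whole words until a set bit is found.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the first set bit at or after `start` among the first
 * `nbits` bits of `words`, or -1 if there is none. Bit i lives in
 * words[i / SKETCH_BITS_PER_LONG] at position i % SKETCH_BITS_PER_LONG.
 */
static long
sketch_ffs_at(const unsigned long *words, size_t start, size_t nbits)
{
    size_t w, last, bit;
    unsigned long cur;

    assert(words != NULL);
    if (start >= nbits)
        return -1;

    w = start / SKETCH_BITS_PER_LONG;
    last = (nbits - 1) / SKETCH_BITS_PER_LONG;

    /* Ignore bits below the start offset in the first word. */
    cur = words[w] & (~0UL << (start % SKETCH_BITS_PER_LONG));

    for (;;) {
        if (cur != 0) {
            /* __builtin_ffsl returns 1 + index of the lowest set bit. */
            bit = w * SKETCH_BITS_PER_LONG +
                (size_t)__builtin_ffsl((long)cur) - 1;
            return (bit < nbits) ? (long)bit : -1;
        }
        if (w == last)
            return -1;
        cur = words[++w];
    }
}
```

Compared with testing one bit per iteration, each loop pass here disposes of a full word, and the position inside the word comes from a single builtin call.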
* sys/net*: minor spelling fixes. | pfg | 2016-05-03 | 20 | -23/+23
  No functional change.
* Remove the most useful INET || INET6 check leftover from whenever, | bz | 2016-05-03 | 1 | -3/+0
  doing nothing.
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
* Complete the UDP tunneling of ICMP msgs to those protocols | rrs | 2016-04-28 | 1 | -1/+1
  interested in having tunneled UDP and finding out about the ICMP (tested by Michael Tuexen with SCTP.. soon to be using this feature).
  Differential Revision: http://reviews.freebsd.org/D5875
* radix_mpath: Don't dereference a NULL pointer in for loop iteration | cem | 2016-04-26 | 1 | -1/+1
  It seems rn_dupedkey may be NULL, because of the NULL check inside the loop. (Also, the rt gets assigned from rn_dupedkey and NULL checked at top of loop.) However, the for-loop update condition happens before the top-of-loop check and dereferences 'rt' unconditionally. Instead, NULL-check before dereferencing.
  If rn_dupedkey cannot in fact be NULL, or something else protects this, feel free to revert this and add an ASSERT of some kind instead.
  This was introduced in r191080 (2009) and moved around slightly in r293657.
  Reported by: Coverity
  CID: 1348482
  Sponsored by: EMC / Isilon Storage Division
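The ordering problem described above (the update expression of a for loop runs before the loop condition is re-tested) can be shown with a simplified chain walk. The structure and function names are hypothetical and only loosely modeled on the description; this is not the radix_mpath code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: a chain linked through dupedkey, each with a weight. */
struct sketch_rt {
    struct sketch_rt *dupedkey;
    int               weight;
};

/*
 * Sum the weights along the chain. The update step advances rt and then
 * reads rt->weight only after checking that the new rt is not NULL.
 * Writing the update as "rt = rt->dupedkey, w = rt->weight" would
 * dereference NULL on the final iteration, before the loop test runs.
 */
static int
sketch_sum_weights(struct sketch_rt *head)
{
    struct sketch_rt *rt = head;
    int sum = 0;
    int w;

    for (w = (rt != NULL) ? rt->weight : 0;
        rt != NULL;
        rt = rt->dupedkey, w = (rt != NULL) ? rt->weight : 0)
        sum += w;
    return sum;
}

/* The alternative the message mentions: if NULL is impossible, assert it. */
static int
sketch_head_weight(const struct sketch_rt *head)
{
    assert(head != NULL);
    return head->weight;
}
```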
* sys: extend use of the howmany() macro when available. | pfg | 2016-04-26 | 1 | -1/+1
  We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.
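For readers unfamiliar with the macro, howmany(x, y) is a rounding-up division. The sketch below defines it locally in the form commonly found in sys/param.h and wraps it in a hypothetical helper; the helper name is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

#ifndef howmany
#define howmany(x, y) (((x) + ((y) - 1)) / (y))
#endif

/* Number of fixed-size words needed to hold nbits bits. */
static size_t
sketch_words_for_bits(size_t nbits, size_t bits_per_word)
{
    assert(bits_per_word > 0);
    return howmany(nbits, bits_per_word);
}
```

The macro replaces the open-coded `(n + size - 1) / size` idiom, so the intent (how many units of size `y` cover `x`) is visible at the call site.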