path: root/sys/net/netisr.c
Commit log (each entry shows the commit message, author, date, files changed, and lines removed/added)
* Changes to support crashdump analysis of netisr (rwatson, 2010-03-01, 1 file, -113/+52)
  - Rename the netisr protocol registration array 'np' to 'netisr_proto' in
    order to reduce the chances of symbol name collisions.  It remains
    statically defined, but it will be looked up by netstat(1).
  - Move certain internal structure definitions from netisr.c to
    netisr_internal.h so that netstat(1) can find them.  They remain private
    and should not be used for any other purpose (for example, they should
    not be used by kernel modules, which must instead use the public
    interfaces in netisr.h).
  - Store a kernel-compiled version of NETISR_MAXPROT in the global variable
    netisr_maxprot, and export it via a sysctl so that it is available for
    use by netstat(1).  This is especially important for crashdump
    interpretation, where the size of the workstream structure is determined
    by the maximum number of protocols compiled into the kernel.
  MFC after: 1 week
  Sponsored by: Juniper Networks
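  Illustrative sketch of the export described in the last item above; the
  exact sysctl declaration and description string are assumptions, not text
  from the commit:

      /* Expose the compile-time protocol limit read-only via sysctl(9). */
      SYSCTL_DECL(_net_isr);          /* net.isr node, declared elsewhere */
      u_int netisr_maxprot = NETISR_MAXPROT;
      SYSCTL_UINT(_net_isr, OID_AUTO, maxprot, CTLFLAG_RD,
          &netisr_maxprot, 0,
          "Compile-time limit on the number of protocols supported by netisr");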
* Fix edge cases in several KASSERTs: use <= rather than < when testing that
  counters have not gone above MAXCPU or NETISR_MAXPROT (rwatson, 2010-02-25,
  1 file, -3/+3)
  These problems caused panics on UP kernels with INVARIANTS when using
  sysctl -a, but would also have caused problems for 32-core boxes or if the
  netisr protocol vector was fully populated.
  Reported by: nwhitehorn, Neel Natu <neelnatu@gmail.com>
  MFC after: 4 days
* Export netisr configuration and statistics to userspace via sysctl(9)
  (rwatson, 2010-02-22, 1 file, -0/+168)
  MFC after: 1 week
  Sponsored by: Juniper Networks
* Mark various sysctls also as tunables (pjd, 2010-02-15, 1 file, -4/+4)
  Reviewed by: rwatson
  MFC after: 1 week
* When warning about possible netisr configuration problems during boot,
  report using "netisr_init" rather than "netisr2", which was the development
  name for the project (rwatson, 2009-12-23, 1 file, -4/+4)
  MFC after: 3 days
* Refine netisr.c comments a bit (rwatson, 2009-12-23, 1 file, -20/+28)
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h
  (rwatson, 2009-08-01, 1 file, -1/+1)
  We now use jails (rather than vimages) as the abstraction for
  virtualization management, and what remained was specific to virtual
  network stacks.  Minor cleanups are done in the process, and comments are
  updated to reflect these changes.
  Reviewed by: bz
  Approved by: re (vimage blanket)
* When a packet cannot be queued because the queue limit has been reached,
  retain the semantics netisr_queue() always had and free the mbuf along
  with returning the error (bz, 2009-06-30, 1 file, -0/+1)
  Reviewed by: rwatson
  Approved by: re (kensmith)
* In light of DPCPU use by netisr, revise various for loops to use mp_maxid
  rather than MAXCPU, and improve the handling and reporting of requests to
  use more threads than we have CPUs to run them on (rwatson, 2009-06-26,
  1 file, -17/+16)
  Reviewed by: bz
  Approved by: re (kib)
  MFC after: 6 weeks
* Convert netisr to use dynamic per-CPU storage (DPCPU) instead of sizing
  arrays to [MAXCPU], offering moderate memory savings (rwatson, 2009-06-26,
  1 file, -21/+40)
  In some places, this requires using CPU_ABSENT() to handle less common
  platforms with sparse CPU IDs.  In several places, assert that the
  selected CPU ID for work placement or statistics is not CPU_ABSENT() to be
  on the safe side.
  Discussed with: bz, jeff
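  A minimal sketch of the sparse-CPU-aware iteration this change calls for;
  the DPCPU variable name 'nws' is an assumption for illustration:

      /* Visit per-CPU netisr state, skipping CPU IDs absent on this platform. */
      static void
      example_foreach_cpu(void)
      {
          u_int cpuid;

          for (cpuid = 0; cpuid <= mp_maxid; cpuid++) {
              if (CPU_ABSENT(cpuid))
                  continue;
              /* e.g., inspect DPCPU_ID_PTR(cpuid, nws) for this CPU's workstream. */
          }
      }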
* Add an optional callback function that will be invoked when a per-CPU
  queue was drained (bz, 2009-06-14, 1 file, -0/+4)
  It will never fire for a directly dispatched packet.  You will most likely
  never want to use this for any ordinary netisr usage, and you will never
  blame netisr in case you try to use it and it does not work as expected.
  Reviewed by: rwatson
* Revert a recent netisr2 change: when billing packets to the current CPU,
  don't lock the workstream, as its mutexes may not have been initialized if
  there are fewer workstreams than CPUs (rwatson, 2009-06-01, 1 file, -2/+0)
  Run into by: hps, ps
* Garbage collect NETISR_POLL and NETISR_POLLMORE, which are no longer
  required for options DEVICE_POLLING (rwatson, 2009-06-01, 1 file, -6/+17)
  De-fragment the NETISR_ constant space and lower NETISR_MAXPROT from 32 to
  16 -- when sizing queue arrays using this compile-time constant,
  significant amounts of memory are saved.
  Warn on the console when tunable values for netisr are automatically
  adjusted during boot due to exceeding limits, invalid values, or as a
  result of DEVICE_POLLING.
* Reimplement the netisr framework in order to support parallel netisr
  threads (rwatson, 2009-06-01, 1 file, -157/+1029)
  - Support up to one netisr thread per CPU, each processing its own
    workstream, or set of per-protocol queues.  Threads may be bound to
    specific CPUs, or allowed to migrate, based on a global policy.  In the
    future it would be desirable to support topology-centric policies, such
    as "one netisr per package".
  - Allow each protocol to advertise an ordering policy, which can currently
    be one of:
      NETISR_POLICY_SOURCE: packets must maintain ordering with respect to
        an implicit or explicit source (such as an interface or socket).
      NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work,
        as well as allowing protocols to provide a flow generation function
        for mbufs without flow identifiers (m2flow).  Falls back on
        NETISR_POLICY_SOURCE if no flow ID is available.
      NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for
        each packet handled by netisr (m2cpuid).
  - Provide utility functions for querying the number of workstreams being
    used, as well as a mapping function from workstream to CPU ID, which
    protocols may use in work placement decisions.
  - Add explicit interfaces to get and set per-protocol queue limits, and to
    get and clear drop counters, which query data or apply changes across
    all workstreams.
  - Add a more extensible netisr registration interface, in which protocols
    declare 'struct netisr_handler' structures for each registered NETISR_
    type.  These include name, handler function, optional mbuf to flow ID
    function, optional mbuf to CPU ID function, queue limit, and ordering
    policy.  Padding is present to allow these to be expanded in the future.
    If no queue limit is declared, then a default is used.
  - Queue limits are now per-workstream, and raised from the previous
    IFQ_MAXLEN default of 50 to 256.
  - All protocols are updated to use the new registration interface, and,
    with the exception of netnatm, default queue limits.  Most protocols
    register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use
    NETISR_POLICY_FLOW, and will therefore take advantage of
    driver-generated flow IDs if present.
  - Formalize a non-packet-based interface between interface polling and the
    netisr, rather than having polling pretend to be two protocols.  Provide
    two explicit hooks in the netisr worker for start and end events for
    runs: netisr_poll() and netisr_pollmore(), as well as a function,
    netisr_sched_poll(), to allow the polling code to schedule netisr
    execution.  DEVICE_POLLING still embeds single-netisr assumptions in its
    implementation, so for now, if it is compiled into the kernel, a single
    and un-bound netisr thread is enforced regardless of tunable
    configuration.
  In the default configuration, the new netisr implementation maintains the
  same basic assumptions as the previous implementation: a single, un-bound
  worker thread processes all deferred work, and direct dispatch is enabled
  by default wherever possible.
  Performance measurement shows a marginal performance improvement over the
  old implementation due to the use of batched dequeue.
  An rmlock is used to synchronize use and registration/unregistration using
  the framework; currently, synchronized use is disabled (replicating
  current netisr policy) due to a measurable 3%-6% hit in ping-pong
  micro-benchmarking.  It will be enabled once further rmlock optimization
  has taken place.  However, in practice, netisrs are rarely registered or
  unregistered at runtime.
  A new man page for netisr will follow, but since one doesn't currently
  exist, it hasn't been updated.
  This change is not appropriate for MFC, although the polling shutdown
  handler should be merged to 7-STABLE.
  Bump __FreeBSD_version.
  Reviewed by: bz
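  A rough sketch of registration under the interface described above; the
  field names and example protocol values here are illustrative assumptions,
  not text from the commit:

      /* Hypothetical handler: consumes one mbuf per invocation. */
      static void
      example_input(struct mbuf *m)
      {
          m_freem(m);                     /* placeholder processing */
      }

      /* Declare a netisr_handler and register it with the framework. */
      static struct netisr_handler example_nh = {
          .nh_name = "example",
          .nh_handler = example_input,
          .nh_proto = NETISR_IP,          /* stand-in protocol number */
          .nh_policy = NETISR_POLICY_SOURCE,
          /* nh_qlimit omitted: the framework applies its default limit. */
      };

      static void
      example_register(void)
      {
          netisr_register(&example_nh);
      }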
* Garbage collect now-unused NETISR_FORCEQUEUE, which overrode the global
  direct dispatch policy for specific protocols (NETISR_USB)
  (rwatson, 2009-05-13, 1 file, -10/+5)
  We leave the additional 'flags' argument to netisr_register() for the time
  being, even though it is no longer required.
* Change the curvnet variable from a global const struct vnet *, previously
  always pointing to the default vnet context, to a dynamically changing
  thread-local one (zec, 2009-05-05, 1 file, -0/+4)
  The curvnet context should be set on entry to networking code via
  CURVNET_SET() macros, and reverted to the previous state via
  CURVNET_RESTORE().  Recursions on curvnet are permitted, though strongly
  discouraged.
  This change should have no functional impact on nooptions VIMAGE kernel
  builds, where CURVNET_* macros expand to whitespace.
  The curthread->td_vnet (aka curvnet) variable's purpose is to be an
  indicator of the vnet context in which the current network-related
  operation takes place, in case we cannot deduce the current vnet context
  from any other source, such as by looking at the mbuf's
  m->m_pkthdr.rcvif->if_vnet, the socket's so->so_vnet, etc.  Moreover, so
  far curvnet has turned out to be an invaluable consistency-checking aid:
  it helps to catch cases when sockets, ifnets or any other vnet-aware
  structures may have leaked from one vnet to another.
  The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a
  result of an empirical iterative process, with an aim to reduce recursions
  on CURVNET_SET() to a minimum, while still reducing the scope of
  CURVNET_SET() to networking-only operations -- the alternative would be
  calling CURVNET_SET() on each system call entry.  In general, curvnet has
  to be set in three typical cases: when processing socket-related requests
  from userspace or from within the kernel; when processing inbound traffic
  flowing from device drivers to upper layers of the networking stack; and
  when executing timer-driven networking functions.
  This change also introduces a DDB subcommand to show the list of all vnet
  instances.
  Approved by: julian (mentor)
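  For illustration, the macro pattern described above might be used roughly
  as in this sketch (the surrounding function and the use of so->so_vnet as
  the context source are assumptions):

      /* Set the vnet context around network-related work, then restore it. */
      static void
      example_socket_op(struct socket *so)
      {
          CURVNET_SET(so->so_vnet);
          /* ... processing that relies on curvnet being correct ... */
          CURVNET_RESTORE();
      }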
* Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly
  dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific
  netisr handlers to always be dispatched via a queue (deferred)
  (rwatson, 2008-07-04, 1 file, -31/+15)
  Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and
  explicitly acquire Giant in those handlers.
  Previously, any netisr handler not marked NETISR_MPSAFE would necessarily
  run deferred and with Giant acquired.  This change removes Giant
  scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows
  non-MPSAFE handlers to continue to force deferred dispatch so as to avoid
  lock order reversals between their acquisition of Giant and any calling
  context.
  It is likely we will be able to remove NETISR_FORCEQUEUE once
  IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no
  longer be supported.
  Reviewed by: bz
  MFC after: 1 month
  X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons,
  but the rest can go back.
* In keeping with style(9)'s recommendations on macros, use a ';' after each
  SYSINIT() macro invocation (rwatson, 2008-03-16, 1 file, -1/+1)
  This makes a number of lightweight C parsers much happier with the FreeBSD
  kernel source, including cflow's prcc and lxr.
  MFC after: 1 month
  Discussed with: imp, rink
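  A small illustration of the style in question; the identifiers are
  placeholders, not taken from netisr.c:

      static void
      example_init_func(void *arg __unused)
      {
      }
      /* Note the trailing ';' after the macro invocation, per style(9). */
      SYSINIT(example_init, SI_SUB_PSEUDO, SI_ORDER_ANY, example_init_func, NULL);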
* Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
  previously conditionally acquired Giant based on debug.mpsafenet
  (rwatson, 2007-08-06, 1 file, -1/+0)
  As that has now been removed, they are no longer required.  Removing them
  significantly simplifies error handling in the socket layer, eliminating
  quite a bit of unwinding of locking in error cases.
  While here, clean up the now-unneeded opt_net.h, which previously was used
  for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
  consistency.
  Reviewed by: bz, csjp
  Tested by: kris
  Approved by: re (kensmith)
* First in a series of changes to remove the now-unused Giant compatibility
  framework for non-MPSAFE network protocols (rwatson, 2007-07-27, 1 file,
  -92/+0)
  - Remove the debug_mpsafenet variable, sysctl, and tunable.
  - Remove NET_NEEDS_GIANT() and the associated SYSINITs used by it to force
    debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
  - Remove logic to automatically flag interrupt handlers as non-MPSAFE if
    debug.mpsafenet is set for an INTR_TYPE_NET handler.
  - Remove logic to automatically flag netisr handlers as non-MPSAFE if
    debug.mpsafenet is set.
  - Remove references in a few subsystems, including NFS and Cronyx drivers,
    which keyed off debug_mpsafenet to determine various aspects of their
    own locking behavior.
  - Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
    no-ops, as their entire behavior was determined by the value in
    debug_mpsafenet.
  - Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.
  Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are
  still present in subsystems, and will be removed in followup commits.
  Reviewed by: bz, jhb
  Approved by: re (kensmith)
* Change net.isr.direct from defaulting to 0 to 1 in 7-CURRENT
  (rwatson, 2006-11-28, 1 file, -1/+1)
  This enables direct dispatch of the network stack from the device driver
  ithread, enabling input path parallelism by default when multiple
  interfaces are present.
  The strategy for network stack parallelism is something being actively
  discussed, and this is just one of several possible (and perfectly
  reasonable) strategies, but it has the distinct advantage of reducing the
  number of context switches and preemptions significantly, resulting in
  higher efficiency in many cases.  In some cases, this may reduce network
  stack parallelism due to work not being deferred from the ithread to the
  netisr.  Therefore, the strategy may change in the future, but this offers
  a reasonable first pass at enabling parallelism while maintaining strong
  ordering.
  Hopefully this will trigger lots of nice new bugs.
  This change is not intended for MFC.
* Don't pollute opt_global.h with DEVICE_POLLING; introduce
  opt_device_polling.h (glebius, 2005-10-05, 1 file, -0/+1)
  - Include opt_device_polling.h in the appropriate files.
  - Wrap the include in HAVE_KERNEL_OPTION_HEADERS in the files that can be
    compiled as loadable modules.
  Reviewed by: bde
* Rename net.isr.enable to net.isr.dispatch (rwatson, 2005-10-04, 1 file,
  -5/+5)
  No compatibility code is provided, as this will be the production name as
  of 6.0.
  MFC after: 3 days
  Requested by: scottl
* Correctly unregister a netisr by also clearing the ni->ni_queue field to
  NULL (andre, 2004-10-11, 1 file, -0/+1)
  This field is actually used by various netisr functions to determine the
  availability of the specified netisr.  The incomplete unregister leads
  directly to a crash when the KLD unregistering the netisr is unloaded.
  Submitted by: Sam <sah@softcardsystems.com>
  MFC after: 3 days
* Correct a comment typo: s/Note/Not/ (rwatson, 2004-09-03, 1 file, -1/+1)
  Pointed out by: kensmith
* Correct typo in printf() warning (rwatson, 2004-08-28, 1 file, -1/+1)
  Submitted by: Pawel Worach <pawel.worach at telia.com>
* Change the default disposition of debug.mpsafenet from 0 to 1, which will
  cause the network stack to operate without the Giant lock by default
  (rwatson, 2004-08-28, 1 file, -3/+84)
  This change has the potential to improve performance by increasing
  parallelism and decreasing latency in network processing.
  Due to the potential exposure of existing or new bugs, the following
  compatibility functionality is maintained:
  - It is still possible to disable Giant-free operation by setting
    debug.mpsafenet to 0 in loader.conf.
  - Add "options NET_WITH_GIANT", which will restore the default value of
    debug.mpsafenet to 0, and is intended for use on systems compiled with
    known unsafe components, or where a more conservative configuration is
    desired.
  - Add a new declaration, NET_NEEDS_GIANT("componentname"), which permits
    kernel components to declare a dependence on Giant over the network
    stack.  If the declaration is made by a preloaded module or a
    compiled-in component, the disposition of debug.mpsafenet will be set to
    0 and a warning concerning performance-degraded operation printed to the
    console.  If it is declared by a loadable kernel module after boot, a
    warning is displayed but the disposition cannot be changed.  This is
    implemented by defining a new SYSINIT() value, SI_SUB_SETTINGS, which is
    intended for the processing of configuration choices after tunables are
    read in and the console is available to generate errors, but before much
    else gets going.
  This compatibility behavior will go away when we've finished the last of
  the locking work and are confident that operation is correct.
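  As a small illustration, a component that still required Giant over the
  network stack might have carried a declaration like the following sketch
  (the component name is a placeholder):

      /* Forces debug.mpsafenet back to 0 when compiled in or preloaded. */
      NET_NEEDS_GIANT("example_component");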
* Apply error and success logic consistently to the function netisr_queue()
  and its users (andre, 2004-08-27, 1 file, -3/+5)
  netisr_queue() now returns (0) on success and an errno on failure.  At the
  moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full)
  are supported.
  Previously it would return (1) on success, but the return value of
  IF_HANDOFF() was interpreted wrongly and (0) was actually returned on
  success.  Due to this, schednetisr() was never called to kick the
  scheduling of the isr.  However, this was masked by other normal packets
  coming through netisr_dispatch() causing the dequeueing of waiting
  packets.
  PR: kern/70988
  Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
  MFC after: 3 days
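  A minimal sketch of a caller under the convention described above; the
  function name and the error handling shown are assumptions for
  illustration:

      /* Hand a packet to the netisr queue and check the errno-style result. */
      static void
      example_enqueue(struct mbuf *m)
      {
          int error;

          error = netisr_queue(NETISR_IP, m);
          if (error != 0) {
              /* ENXIO: queue not functional; ENOBUFS: queue full.  Per the
               * 2009-06-30 entry above, netisr_queue() frees the mbuf when
               * it returns an error, so only account for the drop here. */
              printf("netisr_queue: packet dropped, error %d\n", error);
          }
      }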
* Comment clarifying debug_mpsafenet (rwatson, 2004-07-18, 1 file, -4/+5)
* Add a flags parameter to netisr_register that is used to specify whether
  or not the isr needs to hold Giant when running; Giant-less operation is
  also controlled by the setting of debug_mpsafenet (sam, 2003-11-08,
  1 file, -39/+45)
  o mark all netisrs except NETISR_IP as needing Giant
  o add a GIANT_REQUIRED assertion to the top of netisrs that need Giant
  o pick up Giant (when debug_mpsafenet is 1) inside ip_input before calling
    up with a packet
  o change netisr handling so swi_net runs w/o Giant; instead we grab Giant
    before invoking handlers based on whether the handler needs Giant
  o change netisr handling so that netisrs that are marked MPSAFE may have
    multiple instances active at a time
  o add netisr statistics for packets dropped because the isr is inactive
  Supported by: FreeBSD Foundation
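  For illustration only, a registration call under the flags-based interface
  described above might have looked roughly like this; the exact argument
  list and flag value are assumptions about the pre-2009 interface, not text
  from the commit:

      /* Hypothetical flags-era registration: protocol number, handler,
       * input queue, and a flags argument describing Giant requirements. */
      netisr_register(NETISR_IP, ip_input, &ipintrq, NETISR_MPSAFE);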
* Make debug_mpsafenet globally visible (sam, 2003-11-05, 1 file, -0/+10)
  o move it from subr_bus.c to netisr.c where it more properly belongs
  o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to grab
    Giant as needed when MPSAFE operation is enabled
  Supported by: FreeBSD Foundation
* When direct dispatching a netisr (net.isr.enable=1), if there are already
  any queued packets for the isr, process those packets before the newly
  submitted packet, maintaining ordering of all packets being delivered to
  the netisr (rwatson, 2003-10-03, 1 file, -13/+21)
  Remove the bypass counter since we don't bypass anymore.
  Leave the comment about possible problems and options, since later
  performance optimization may change the strategy for addressing ordering
  problems here.  Specifically, this maintains the strong isr ordering
  guarantee; additional parallelism and lower latency may be possible by
  moving to weaker guarantees (per-interface, for example).  We will
  probably at some point also want to remove the one-instance netisr
  dispatch limit currently enforced by a mutex, but it's not clear that's
  100% safe yet, even in the netperf branch.
  Reviewed by: sam, others
* Create a tunable for net.isr.enable so that it may be set from inception,
  rather than having to wait for the boot to finish (rwatson, 2003-10-02,
  1 file, -0/+1)
* Temporarily turn net.isr.enable back off again until patches to correct
  potential nits in packet ordering are resolved (rwatson, 2003-10-01,
  1 file, -1/+1)
* Enable net.isr.enable by default, causing "delivery to completion" (direct
  dispatch) in interrupt threads when the netisr in question isn't already
  active (rwatson, 2003-10-01, 1 file, -1/+1)
  If a netisr is already active, or direct dispatch is already in progress,
  we queue the packet for later delivery.
  Previously, this option was disabled by default.  I have measured 20%+
  performance improvements in IP packet forwarding with this enabled.
  Please report any problems ASAP, especially relating to stack depth or
  out-of-order packet processing.
  Discussed with: jlemon, peter
  Sponsored by: DARPA, Network Associates Laboratories
* Discard the packet if the netisr queue is null instead of panicking, for
  the benefit of modules which are compiled differently than the kernel
  (jlemon, 2003-03-08, 1 file, -2/+8)
* Update netisr handling; each SWI now registers its queue, and all queue
  drain routines are done by swi_net, which allows for better queue control
  at some future point (jlemon, 2003-03-04, 1 file, -58/+187)
  Packets may also be directly dispatched to a netisr instead of queued;
  this may be of interest at some installations, but currently defaults to
  off.
  Reviewed by: hsu, silby, jayanth, sam
  Sponsored by: DARPA, NAI Labs
* Moved netisr code from kern/kern_intr.c to net/netisr.c as threatened in a
  comment (jake, 2002-09-22, 1 file, -0/+116)