path: root/sys/netgraph/ng_base.c
Commit message | Author | Age | Files | Lines
* (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. | antoine | 2009-12-28 | 1 | -1/+1
  Fix some wrong usages.
  Note: this does not affect generated binaries as this argument is not used.

  PR: 137213
  Submitted by: Eygene Ryabinkin (initial version)
  MFC after: 1 month
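The usage that the commit above corrects can be illustrated with a minimal userland sketch. It assumes a `<sys/queue.h>` that provides the BSD list macros; `demo_node`, `demo_head`, `demo_list`, `demo_add()` and `demo_count()` are hypothetical names, not taken from ng_base.c.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical element type, for illustration only. */
struct demo_node {
	int			id;
	LIST_ENTRY(demo_node)	link;
};

/* Declares "struct demo_head", the list head type. */
LIST_HEAD(demo_head, demo_node);

/* Pass the head variable itself to the initializer, not a type or field. */
static struct demo_head demo_list = LIST_HEAD_INITIALIZER(demo_list);

/* Put one element at the front of the list. */
static void
demo_add(struct demo_node *n)
{
	LIST_INSERT_HEAD(&demo_list, n, link);
}

/* Count the elements currently on the list. */
static int
demo_count(void)
{
	struct demo_node *n;
	int count = 0;

	LIST_FOREACH(n, &demo_list, link)
		count++;
	return (count);
}
```

Because the macro ignores its argument, a wrong argument compiles to the same binary, which matches the note in the commit message; passing the head keeps the code consistent with the documented interface.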
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c and | rwatson | 2009-08-01 | 1 | -1/+0
  vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes.

  Reviewed by: bz
  Approved by: re (vimage blanket)
* Introduce and use a sysinit-based initialization scheme for virtual | rwatson | 2009-07-23 | 1 | -20/+4
  network stacks, VNET_SYSINIT:

  - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINITs to track their order and properties on registration, using them for each vnet when created/destroyed, or immediately on module load for already-started vnets.
  - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme.
  - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case.
  - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events.

  Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup)
  Discussed with: jhb, bz, julian, zec
  Reviewed by: bz
  Approved by: re (VIMAGE blanket)
* Remove unused VNET_SET() and related macros; only VNET_GET() is | rwatson | 2009-07-16 | 1 | -3/+3
  ever actually used. Rename VNET_GET() to VNET() to shorten variable references.

  Discussed with: bz, julian
  Reviewed by: bz
  Approved by: re (kensmith, kib)
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocator | rwatson | 2009-07-14 | 1 | -42/+16
  (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables.

  Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker.

  Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided.

  This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS.

  Bump __FreeBSD_version and update UPDATING.

  Portions submitted by: bz
  Reviewed by: bz, zec
  Discussed with: gnn, jamie, jeff, jhb, julian, sam
  Suggested by: peter
  Approved by: re (kensmith)
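The reference-copy and accessor idea described in the commit above can be sketched in userland roughly as follows. This is only an illustration under stated assumptions: a plain struct stands in for the linker set, and `ref_region`, `demo_vnet`, `demo_vnet_alloc()`, `demo_vnet_free()` and `DEMO_VNET()` are hypothetical names rather than the kernel's interfaces.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for the linker set: in this sketch the "reference copies" of all
 * virtualized globals live in one struct so that they are contiguous.
 */
struct ref_region {
	int	ip_ttl;
	int	tcp_mss;
};

static struct ref_region ref = { .ip_ttl = 64, .tcp_mss = 536 };

struct demo_vnet {
	void	*data;		/* this instance's private copy of the region */
};

/* Create an instance whose globals start as copies of the reference values. */
static struct demo_vnet *
demo_vnet_alloc(void)
{
	struct demo_vnet *v = malloc(sizeof(*v));

	if (v == NULL)
		return (NULL);
	v->data = malloc(sizeof(ref));
	if (v->data == NULL) {
		free(v);
		return (NULL);
	}
	memcpy(v->data, &ref, sizeof(ref));
	return (v);
}

static void
demo_vnet_free(struct demo_vnet *v)
{
	free(v->data);
	free(v);
}

/* Translate (instance, symbol) into that instance's copy of the variable. */
#define	DEMO_VNET(v, sym)	(((struct ref_region *)(v)->data)->sym)
```

Writing through `DEMO_VNET(a, ip_ttl)` changes only instance `a`; the reference copy and other instances keep their statically initialized values.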
* Introduce a mechanism for detecting calls from outbound path of the | zec | 2009-06-11 | 1 | -1/+5
  network stack when reentering the inbound path from netgraph, and force queueing of mbufs at the outbound netgraph node.

  The mechanism relies on two components. First, in netgraph nodes where outbound path of the network stack calls into netgraph, the current thread has to be appropriately marked using the new NG_OUTBOUND_THREAD_REF() macro before proceeding to call further into the netgraph topology, and unmarked using the NG_OUTBOUND_THREAD_UNREF() macro before returning to the caller. Second, netgraph nodes which can potentially reenter the network stack in the inbound path have to mark their inbound hooks using NG_HOOK_SET_TO_INBOUND() macro.

  The netgraph framework will then detect when there is a danger of a call graph looping back from outbound to inbound path via netgraph, and defer handing off the mbufs to the "inbound" node to a worker thread with a clean stack.

  In this first pass only the most obvious netgraph nodes have been updated to ensure no outbound to inbound calls can occur. Nodes such as ng_ipfw, ng_gif etc. should be examined further to determine whether a potential for outbound to inbound call looping exists.

  This commit changes the layout of struct thread, but due to __FreeBSD_version number shortage a version bump has been omitted at this time, nevertheless kernel and modules have to be rebuilt.

  Reviewed by: julian, rwatson, bz
  Approved by: julian (mentor)
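The two-part check described above (a marked thread plus a marked hook) might be modelled as in the following sketch. The struct layouts and the `demo_*` names are assumptions made for illustration; only the macro names quoted in the commit message come from the source.

```c
#include <stdbool.h>

/* Hypothetical per-thread state; the real change adds state to struct thread. */
struct demo_thread {
	int	outbound_refs;	/* > 0 while inside the outbound path */
};

/* Hypothetical hook state. */
struct demo_hook {
	bool	to_inbound;	/* peer may reenter the inbound path */
};

/* Mark the thread before calling further into the graph. */
static void
demo_outbound_ref(struct demo_thread *td)
{
	td->outbound_refs++;
}

/* Unmark the thread before returning to the caller. */
static void
demo_outbound_unref(struct demo_thread *td)
{
	td->outbound_refs--;
}

/* True if delivery should be deferred to a worker thread with a clean stack. */
static bool
demo_must_queue(const struct demo_thread *td, const struct demo_hook *hook)
{
	return (td->outbound_refs > 0 && hook->to_inbound);
}
```

Only the combination of both marks forces queueing, so ordinary inbound traffic and outbound traffic toward unmarked hooks can still be delivered directly.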
* Introduce an infrastructure for dismantling vnet instances. | zec | 2009-06-08 | 1 | -3/+42
  Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework.

  While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be fixed over the next weeks in smaller incremental commits.

  Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should in general be compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time.

  Bump __FreeBSD_version to 800097.

  Reviewed by: bz, julian
  Approved by: rwatson, kib (re), julian (mentor)
* Unbreak LINT build, caused by a change in struct ng_node layout introduced | zec | 2009-05-05 | 1 | -0/+1
  with r191816, which was uncovered only with NETGRAPH_DEBUG defined.

  NOT approved by mentor (julian) due to emergency.
* In preparation to make options VIMAGE operational, where needed, | zec | 2009-04-26 | 1 | -2/+28
  initialize / release netgraph related state in iattach() / idetach() functions called via the vnet module registration / initialization framework, instead of initialization / cleanups being done in mod_event handlers.

  While here, introduce a crude hack aimed at preventing ng_ether from autoattaching to ng_eiface ifnets, which are also netgraph nodes already.

  Reviewed by: bz
  Approved by: julian (mentor)
* To avoid one doubtless netgraph SMP scalability limitation point, switch | mav | 2008-12-14 | 1 | -13/+32
  node queues processing from single swi:net thread to several specialized threads.

  Reviewed by: julian
  Tested with: Netperf Cluster
* Revert rev. 183277: | mav | 2008-12-13 | 1 | -8/+2
  Remove ng_rmnode_flags() function. ng_rmnode_self() was made to be called only while having node locked. When node is properly locked, any function call sent to it will always be queued. So turning ng_rmnode_self() into the ng_rmnode_flags() is not just meaningless, but incorrect, as it violates node locking when called outside.

  No objections: julian, thompsa
* Conditionally compile out V_ globals while instantiating the appropriate | zec | 2008-12-10 | 1 | -1/+14
  container structures, depending on VIMAGE_GLOBALS compile time option.

  Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instantiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out.

  Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

  Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively

      #ifdef VIMAGE_GLOBALS
      #define V_rt_tables rt_tables
      #else
      #define V_rt_tables vnet_net_0._rt_tables
      #endif

  Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs.

  Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c.

  Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS.

  De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import.

  Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions.

  Discussed at: devsummit Strassburg
  Reviewed by: bz, julian
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
* Unhide declarations of network stack virtualization structs from | zec | 2008-11-28 | 1 | -1/+0
  underneath #ifdef VIMAGE blocks.

  This change introduces some churn in #include ordering and nesting throughout the network stack and drivers but is not expected to cause any additional issues.

  In the next step this will allow us to instantiate the virtualization container structures and switch from using global variables to their "containerized" counterparts.

  Reviewed by: bz, julian
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
* Remove unneeded NULL check. First, msg can't be NULL here, and second, | mav | 2008-11-22 | 1 | -2/+1
  NG_FREE_MSG() also checks it.

  Found with: Coverity Prevent(tm)
* Retire the MALLOC and FREE macros. They are an abomination unto style(9). | des | 2008-10-23 | 1 | -4/+4
  MFC after: 3 months
* Step 1.5 of importing the network stack virtualization infrastructure | zec | 2008-10-02 | 1 | -1/+5
  from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit

  Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs.

  Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

  Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.).

  All the changes are verified to have zero functional impact at this point in time by doing MD5 comparison between pre- and post-change object files(*).

  (*) netipsec/keysock.c did not validate depending on compile time options.

  Implemented by: julian, bz, brooks, zec
  Reviewed by: julian, bz, brooks, kris, rwatson, ...
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
* Add ng_rmnode_flags() so the caller can pass NG_QUEUE and have the node | thompsa | 2008-09-22 | 1 | -2/+8
  destroyed asynchronously due to locking or other constraints.

  Reviewed by: julian
* We can't implicitly trust the hook on NGQF_FN/NGQF_FN2 processing in | mav | 2008-09-13 | 1 | -6/+14
  ng_apply_item(). There are possible (and I have got one) use-after-free class panics because of it. If hook is specified, require it to be valid at the apply time. The only exceptions are the internal ng_con_part2(), ng_con_part3() and ng_rmhook_part2() functions which are specially made to work with invalid hooks.
* A bunch of formatting fixes brought to light by, or created by the Vimage commit | julian | 2008-08-20 | 1 | -1/+1
  a few days ago.
* Commit step 1 of the vimage project, (network stack) | bz | 2008-08-17 | 1 | -8/+9
  virtualization work done by Marko Zec (zec@).

  This is the first in a series of commits over the course of the next few weeks.

  Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only.

  We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again.

  Obtained from: //depot/projects/vimage-commit2/...
  Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ...
               (various people I forgot, different versions)
               md5 (with a bit of help)
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  X-MFC after: never
  V_Commit_Message_Reviewed_By: more people than the patch
* Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly | rwatson | 2008-07-04 | 1 | -2/+1
  dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific netisr handlers to always be dispatched via a queue (deferred). Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly acquire Giant in those handlers.

  Previously, any netisr handler not marked NETISR_MPSAFE would necessarily run deferred and with Giant acquired. This change removes Giant scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows non-MPSAFE handlers to continue to force deferred dispatch so as to avoid lock order reversals between their acquisition of Giant and any calling context.

  It is likely we will be able to remove NETISR_FORCEQUEUE once IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no longer be supported.

  Reviewed by: bz
  MFC after: 1 month
  X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons, but the rest can go back.
* ng_address_hook() microoptimization. Use local variables as they should be. | mav | 2008-04-19 | 1 | -7/+5
  It helps the compiler to avoid some extra memory accesses.
* Use separate UMA zone for data items allocation. It is a partial | mav | 2008-04-16 | 1 | -40/+70
  rev. 1.149 rework. It saves several percent of CPU time on SMP by using UMA's internal per-CPU allocation limits instead of its own global variable updated with atomics each time.

  Tested with: Netperf cluster
* Several changes breaking netgraph module ABI collected together: | mav | 2008-04-15 | 1 | -104/+91
  - reorder structures fields (XX_refs) a bit to group fields modified same time together. According to my tests it gives up to 10% SMP performance benefit on real workload due to reduced inter-CPU cache thrashing.
  - change q_flags from long to int as long is not really needed there and its usage with atomics is argued by some people.
  - move NGF_WORKQ flag into the separate field q_flags2 as it is protected by queue mutex instead of node writer protection used by the rest of flags.
  - move nd_work queue entry to ng_queue structure to which it is more related and make it STAILQ instead of TAILQ as now it is a classic FIFO.
  - remove q_node pointer from ng_queue structure as it is not really needed.
  - reimplement item queue using STAILQ instead of own equal implementation. As soon as BT subsystem has own item queues using ng_item.el_next update it also.
  - change depth field in ng_item from uintptr_t to u_int. It was made uintptr_t to keep ABI compatibility.

  Reviewed by: julian, emax
  Tested with: Netperf cluster
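The STAILQ-based FIFO mentioned in the list above can be sketched in userland as follows, assuming a `<sys/queue.h>` that provides the STAILQ macros. `demo_item`, `demo_queue`, `demo_enqueue()` and `demo_dequeue()` are hypothetical; only the link field name `el_next` echoes the commit message.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical queue item with a singly-linked tail queue link. */
struct demo_item {
	int				value;
	STAILQ_ENTRY(demo_item)		el_next;
};

/* Declares "struct demo_queue", which tracks both first and last element. */
STAILQ_HEAD(demo_queue, demo_item);

/* Append at the tail: O(1) because the head keeps a tail pointer. */
static void
demo_enqueue(struct demo_queue *q, struct demo_item *it)
{
	STAILQ_INSERT_TAIL(q, it, el_next);
}

/* Remove from the front, or return NULL if the queue is empty. */
static struct demo_item *
demo_dequeue(struct demo_queue *q)
{
	struct demo_item *it = STAILQ_FIRST(q);

	if (it != NULL)
		STAILQ_REMOVE_HEAD(q, el_next);
	return (it);
}
```

A singly-linked tail queue is enough for strict first-in, first-out use because items are only ever added at the tail and removed at the head, so the back pointers of a TAILQ are not needed.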
* Add memory barriers to the node locking operations. | mav | 2008-04-09 | 1 | -18/+32
  Add some comments.
* Rewrite node's r/w/q-lock semantics using only atomics instead of mutex | mav | 2008-04-06 | 1 | -247/+76
  and atomics combination. Mutex is now used only for queue protection. Also avoid unneeded extra swi scheduling calls.
* Use new atomic_fetchadd() primitive instead of looping atomic_cmpset(). | mav | 2008-03-30 | 1 | -8/+5
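The difference between the two approaches named in the commit above can be shown with a portable sketch. It uses C11 `<stdatomic.h>` as a stand-in for the kernel's atomic(9) primitives, which is an assumption made so the example builds in userland; the function names are hypothetical.

```c
#include <stdatomic.h>

/* Increment with a compare-and-swap retry loop; returns the old value. */
static unsigned int
demo_inc_cas(atomic_uint *p)
{
	unsigned int old = atomic_load(p);

	/* On failure "old" is refreshed with the current value, then retry. */
	while (!atomic_compare_exchange_weak(p, &old, old + 1))
		;
	return (old);
}

/* Same effect with a single fetch-and-add; returns the old value. */
static unsigned int
demo_inc_fetchadd(atomic_uint *p)
{
	return (atomic_fetch_add(p, 1));
}
```

Both functions return the previous value, but the fetch-and-add form has no retry loop, so it does a bounded amount of work even when several CPUs update the counter at once.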
* There is no need to erase hook->hk_node before freeing hook. | mav | 2008-03-29 | 1 | -3/+1
* Remove ng_setisr() call from ng_dequeue(). It is useless as we | mav | 2008-03-27 | 1 | -11/+0
  will never exit ngintr() anyway while there are ready requests on the queue. It was added years ago in the hope of parallel queue processing by several net threads. But even if we sometimes have several threads, we have no right to process the queue in parallel, as it would break the original request serialization that is critically important for some setups.
* Remove impossible (hk_peer == NULL) check from ng_address_hook(). | mav | 2008-03-16 | 1 | -1/+0
  Valid hook can't have NULL peer. Even invalid one can't, as it is reset to deadhook, but not NULL.
* Improve apply callback error reporting: | mav | 2008-03-11 | 1 | -8/+25
  Before this patch the callback returned the result of the last finished call chain. Now it returns the last nonzero result from all call chain results in this request.

  As this improvement gives reliable error reporting, it is now possible to remove the dirty workaround in ng_socket, made to return ENOBUFS error statuses of request-response operations. That workaround was responsible for returning ENOBUFS errors to completely unrelated requests working at the same time on the socket.
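The "last nonzero result" rule described above amounts to a small fold over the call chain results. The helper below is a hypothetical sketch of that rule, not code from ng_base.c.

```c
#include <errno.h>

/*
 * Fold one call chain's result into the request's running status.
 * A later success (0) does not clear an earlier error; a later error
 * replaces an earlier one.
 */
static int
demo_merge_result(int so_far, int result)
{
	return (result != 0 ? result : so_far);
}
```

With the older "result of the last finished chain" behaviour, a chain that finished successfully after a failed one would have hidden the failure; folding the results this way keeps the error visible to the callback.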
* Increase default queue items allocation limit from 512 to 4096 items | mav | 2008-03-05 | 1 | -2/+16
  to avoid the severe, unpredictable effects on netgraph operation when the limit is exhausted while allocating control messages. Add a separate configurable 512-item limit for data item allocation for DoS/overload protection.

  Discussed with: julian
* Implement 128-item node name hash for faster name search. | mav | 2008-03-04 | 1 | -47/+70
  Increase node ID hash size from 32 to 128 items.
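A name hash with 128 buckets, as mentioned above, might look like the following sketch. The hash function, the struct layout and the `demo_*` names are assumptions chosen for illustration; only the bucket count comes from the commit message.

```c
#include <stddef.h>
#include <string.h>

#define	DEMO_NAME_HASH_SIZE	128	/* bucket count from the commit message */

/* Hypothetical node record, chained within one bucket. */
struct demo_node {
	const char		*name;
	struct demo_node	*next;
};

static struct demo_node *demo_buckets[DEMO_NAME_HASH_SIZE];

/* Simple multiplicative string hash; not the kernel's actual function. */
static unsigned int
demo_name_hash(const char *name)
{
	unsigned int h = 0;

	while (*name != '\0')
		h = h * 31 + (unsigned char)*name++;
	return (h % DEMO_NAME_HASH_SIZE);
}

/* Link a node at the front of its bucket. */
static void
demo_insert(struct demo_node *n)
{
	unsigned int b = demo_name_hash(n->name);

	n->next = demo_buckets[b];
	demo_buckets[b] = n;
}

/* Search only the bucket that the name hashes to. */
static struct demo_node *
demo_find(const char *name)
{
	struct demo_node *n;

	for (n = demo_buckets[demo_name_hash(name)]; n != NULL; n = n->next)
		if (strcmp(n->name, name) == 0)
			return (n);
	return (NULL);
}
```

A lookup walks one bucket instead of every node, so with a reasonably even hash the average search length drops by roughly the number of buckets.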
* Fix incorrect constant used in rev. 1.146 that broke node writer locking. | mav | 2008-02-25 | 1 | -1/+1
* Cleanup and tune ng_snd_item() function as it is one of the | mav | 2008-02-06 | 1 | -95/+42
  most busy netgraph functions. Tune stack protection constants to avoid division operation.
* Fix one more grammo. | marck | 2008-02-02 | 1 | -1/+1
  Noticed by: ru
* Reword recent comment a bit. | marck | 2008-02-01 | 1 | -3/+3
* Add comments about stack protection mechanism. | mav | 2008-02-01 | 1 | -0/+8
* Some code reformat. | mav | 2008-01-31 | 1 | -55/+38
* Implement stack protection based on GET_STACK_USAGE() macro. | mav | 2008-01-31 | 1 | -8/+22
  This fixes system panics possible with complicated netgraph setups and avoids unneeded extra queueing for stack unwrapping.
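The decision that such stack protection has to make (call directly or queue) can be sketched as a threshold test on stack usage. In this sketch `total` and `used` stand in for the values that GET_STACK_USAGE() reports in the kernel, and the three-quarters threshold is an assumption for illustration, not the constant used in ng_base.c.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a direct (same-stack) call is still safe.  The comparison
 * "used < 3/4 of total" is written as a multiplication on both sides so that
 * no division is needed.
 */
static bool
demo_direct_call_ok(size_t total, size_t used)
{
	return (used * 4 < total * 3);
}
```

When the test fails, the caller would queue the item so that it is processed later from a thread with a fresh stack, instead of recursing deeper.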
* Add a new 'why' argument to kdb_enter(), and a set of constants to use | rwatson | 2007-12-25 | 1 | -2/+2
  for that argument. This will allow DDB to detect the broad category of reason why the debugger has been entered, which it can use for the purposes of deciding which DDB script to run.

  Assign approximate why values to all current consumers of the kdb_enter() interface.
* - Merge all the ng_send_fn2* functions into one - ng_send_fn2(), | glebius | 2007-11-14 | 1 | -77/+36
    removing some copy&pasted code.
  - Reduce copy and paste in ng_apply_item().
  - Resurrect ng_send_fn() as a valid symbol, not a define.

  Reviewed by: mav, julian
* Minor debug message fix. | mav | 2007-10-28 | 1 | -1/+1
* Fix build with NETGRAPH_DEBUG. | ru | 2007-10-19 | 1 | -1/+9
* Implement new apply callback mechanism to handle item forwarding. | mav | 2007-10-19 | 1 | -65/+182
  When an item is forwarded, the reference counter is incremented; when an item is processed, the counter is decremented. When the counter reaches zero, the apply handler is called. This makes it possible to report the correct connect() call status to user level at the right time.
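The counting scheme described above can be sketched as a small reference-counted record with a completion handler. This is a single-threaded illustration with hypothetical names (`demo_apply`, `demo_apply_ref()`, `demo_apply_unref()`, `demo_count_calls()`); the kernel code would also need atomic updates or locking.

```c
#include <stddef.h>

typedef void demo_apply_fn(void *ctx);

/* Hypothetical apply record shared by all items of one request. */
struct demo_apply {
	demo_apply_fn	*apply;	/* invoked once all work has completed */
	void		*ctx;
	int		 refs;	/* items still in flight */
};

/* An item carrying this record was forwarded to another node. */
static void
demo_apply_ref(struct demo_apply *a)
{
	a->refs++;
}

/* An item finished processing; run the handler when none remain. */
static void
demo_apply_unref(struct demo_apply *a)
{
	if (--a->refs == 0 && a->apply != NULL)
		a->apply(a->ctx);
}

/* Example handler: counts how many times it ran. */
static void
demo_count_calls(void *ctx)
{
	(*(int *)ctx)++;
}
```

The handler runs exactly once, after the last outstanding item has been processed, which is the point at which a final status for the whole request is known.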
* Add ng_send_fn() error handling inside ng_con_nodes(). | mav | 2007-08-18 | 1 | -2/+5
  Without it some errors may go unnoticed and unhandled, leaving hooks in a half-connected state.

  Reviewed by: julian@
  Approved by: re (kensmith), glebius (mentor)
* Despite several examples in the kernel, the third argument of | dwmalone | 2007-06-04 | 1 | -1/+1
  sysctl_handle_int is not sizeof the int type you want to export. The type must always be an int or an unsigned int.

  Remove the instances where a sizeof(variable) is passed to stop people accidentally cutting and pasting these examples.

  In a few places sysctl_handle_int was being used on 64 bit types, which would truncate the value to be exported. In these cases use sysctl_handle_quad to export them and change the format to Q so that sysctl(1) can still print them.
* Partially back out rev. 1.127, to restore broken functionality. This | glebius | 2007-06-01 | 1 | -5/+4
  should be redesigned, but better enter RELENG_7 with a working ngctl(8).

  Agreed by: julian
* Universally adopt most conventional spelling of acquire. | rwatson | 2007-05-27 | 1 | -1/+2
* We don't need spinning locks here. Change them to the adaptive mutexes. This | wkoszek | 2007-03-31 | 1 | -6/+6
  change should bring no performance decrease, as it did not in my tests.

  Reviewed by: julian, glebius
  Approved by: cognet (mentor)