author | adrian <adrian@FreeBSD.org> | 2014-07-21 04:48:02 +0000 |
---|---|---|
committer | adrian <adrian@FreeBSD.org> | 2014-07-21 04:48:02 +0000 |
commit | 6a2f31c5016ff399c9064a8a247a0af3fb39a067 (patch) | |
tree | a8c8e596f8e54bbad21fe8ff55e5e6686876f121 | |
parent | f3bb5d8aca394a890918761a5ad93a0bafcbd748 (diff) | |
Add the PCBGROUPS manpage.
Thanks to wblock for helping me with this manpage.
-rw-r--r-- | share/man/man9/Makefile | 1 |
-rw-r--r-- | share/man/man9/PCBGROUPS.9 | 228 |
2 files changed, 229 insertions, 0 deletions
diff --git a/share/man/man9/Makefile b/share/man/man9/Makefile
index abfcb20..73f1c68 100644
--- a/share/man/man9/Makefile
+++ b/share/man/man9/Makefile
@@ -188,6 +188,7 @@ MAN= accept_filter.9 \
 	osd.9 \
 	panic.9 \
 	pbuf.9 \
+	PCBGROUPS.9 \
 	p_candebug.9 \
 	p_cansee.9 \
 	pci.9 \
diff --git a/share/man/man9/PCBGROUPS.9 b/share/man/man9/PCBGROUPS.9
new file mode 100644
index 0000000..5e09213
--- /dev/null
+++ b/share/man/man9/PCBGROUPS.9
@@ -0,0 +1,228 @@
+.\" Copyright (c) 2014 Adrian Chadd
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\" 3. The name of the author may not be used to endorse or promote products
+.\"    derived from this software without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd July 18, 2014
+.Dt PCBGROUPS 9
+.Os
+.Sh NAME
+.Nm PCBGROUPS
+.Nd Distributed Protocol Control Block Groups
+.Sh SYNOPSIS
+.Ft void
+.Fn in_pcbgroup_init(struct inpcbinfo *pcbinfo, u_int hashfields, int hash_nelements);
+.Ft void
+.Fn in_pcbgroup_destroy(struct inpcbinfo *pcbinfo);
+.Ft struct inpcbgroup *
+.Fn in_pcbgroup_byhash(struct inpcbinfo *pcbinfo, u_int hashtype, uint32_t hash);
+.Ft struct inpcbgroup *
+.Fn in_pcbgroup_byinpcb(struct inpcb *inp);
+.Ft void
+.Fn in_pcbgroup_update(struct inpcb *inp);
+.Ft void
+.Fn in_pcbgroup_update_mbuf(struct inpcb *inp, struct mbuf *m);
+.Ft void
+.Fn in_pcbgroup_remove(struct inpcb *inp);
+.Ft int
+.Fn in_pcbgroup_enabled(struct inpcbinfo *pcbinfo);
+.Ft struct inpcbgroup *
+.Fn in6_pcbgroup_byhash(struct inpcbinfo *pcbinfo, u_int hashtype, uint32_t hash);
+.Pp
+.Cd "options PCBGROUPS"
+.Sh DESCRIPTION
+PCBGROUPS, or "connection groups", are based on Willman, Rixner, and Cox's
+2006 USENIX paper,
+.Qo
+An Evaluation of Network Stack Parallelization Strategies in Modern
+Operating Systems
+.Qc .
+.Pp
+The PCBGROUPS paper describes two main kind of connection groups.
+The first, called ConnP-T, uses a pool of worker threads which
+implement the network stack.
+Serialization occurs when queuing work into and completing work from
+the network stack.
+No locking is required inside each worker thread.
+.Pp
+The second type of connection group, called ConnP-L, uses an array
+of PCB groups rather than a single list.
+Each PCB group is protected by its own lock.
+.Pp
+This implementation differs significantly from that described in the
+paper, in that it attempts to introduce not just notions of affinity
+for connections and distribute work so as to reduce lock contention,
+but also align those notions with hardware work distribution strategies
+such as RSS.
+In this construction, connection groups supplement, rather than replace,
+existing reservation tables for protocol 4-tuples, offering CPU-affine
+lookup tables with minimal cache line migration and lock contention
+during steady state operation.
+.Pp
+Internet protocols like UDP and TCP register to use connection groups
+by providing an ipi_hashfields value other than IPI_HASHFIELDS_NONE.
+This indicates to the connection group code whether a 2-tuple or
+4-tuple is used as an argument to hashes that assign a connection to
+a particular group.
+This must be aligned with any hardware-offloaded distribution model,
+such as RSS or similar approaches taken in embedded network boards.
+Wildcard sockets require special handling, as in Willman 2006, and
+are shared between connection groups while being protected by
+group-local locks.
+Connection establishment and teardown can be signficantly more
+expensive than without connection groups, but that steady-state
+processing can be significantly faster.
+.Pp
+Enabling PCBGROUPS in the kernel only provides the infrastructure
+required to create and manage multiple PCB groups.
+An implementation needs to fill in a few functions to provide PCB
+group hash information in order for PCBs to be placed in a PCB group.
+.Ss Operation
+By default, each PCB info block (struct pcbinfo) has a single hash for
+all PCB entries for the given protocol with a single lock protecting it.
+This can be a significant source of lock contention on SMP hardware.
+When a PCBGROUP is created, an array of separate hash tables are
+created, each with its own lock.
+A separate table for wildcard PCBs is provided.
+By default, a PCBGROUP table is created for each available CPU.
+The PCBGROUP code attempts to calculate a hash value from the given
+PCB or mbuf when looking up a PCBGROUP.
+While processing a received frame,
+.Fn in_pcbgroup_byhash()
+can be used in conjunction with either a hardware-provided hash
+value
+.Po
+eg the
+.Xr RSS 9
+calculated hash value provided by some NICs
+.Pc
+or a software-provided hash value in order to choose a PCBGROUP
+table to query.
+A single table lock is held while performing a wildcard match.
+However, all of the table locks are acquired before modifying the
+wildcard table.
+The PCBGROUP tables operate in conjunction with the normal single PCB list
+in a PCB info block.
+Thus, inserting and removing a PCB will still incur the same costs
+as without PCBGROUPS.
+A protocol which uses PCBGROUPS should fall back to the normal PCB list
+lookup if a call to the PCBGROUPS layer does not yield a lookup hit.
+.Ss Usage
+Initialize a PCBGROUP in a PCB info block
+.Pq Vt "struct pcbinfo"
+by calling
+.Fn in_pcbgroup_init .
+.Pp
+Add a connection to a PCBGROUP with
+.Fn in_pcbgroup_update .
+Connections are removed by with
+.Fn in_pcbgroup_remove .
+These in turn will determine which PCBGROUP bucket the given PCB
+is placed into and calculate the hash value appropriately.
+.Pp
+Wildcard PCBs are hashed differently and placed in a single wildcard
+PCB list.
+If
+.Xr RSS 9
+is enabled and in use, RSS-aware wildcard PCBs are placed in a single
+PCBGROUP based on RSS information.
+Protocols may look up the PCB entry in a PCBGROUP by using the lookup
+functions
+.Fn in_pcbgroup_byhash
+and
+.Fn in_pcbgroup_byinpcb .
+.Sh IMPLEMENTATION NOTES
+The PCB code in
+.Pa sys/netinet
+and
+.Pa sys/netinet6
+is aware of PCBGROUPS and will call into the PCBGROUPS code to do
+PCBGROUP assignment and lookup, preferring a PCBGROUP lookup to the
+default global PCB info table.
+.Pp
+An implementor wishing to experiment or modify the PCBGROUP assignment
+should modify this set of functions:
+.Bl -tag -width "12345678" -offset indent
+.It Fn in_pcbgroup_getbucket No and Fn in6_pcbgroup_getbucket
+Map a given 32 bit hash value to a PCBGROUP.
+By default this is hash % number_of_pcbgroups.
+However, this distribution may not align with NIC receive queues or
+the
+.Xr netisr 9
+configuration.
+.It Fn in_pcbgroup_byhash No and Fn in6_pcbgroup_byhash
+Map a 32 bit hash value and a hash type identifier to a PCBGROUP.
+By default, this simply returns NULL.
+This function is used by the
+.Xr mbuf 9
+receive path in
+.Pa sys/netinet/in_pcb.c
+to map an mbuf to a PCBGROUP.
+.It Fn in_pcbgroup_bytuple No and Fn in6_pcbgroup_bytuple
+Map the source and destination address and port details to a PCBGROUP.
+By default, this does a very simple XOR hash.
+This function is used by both the PCB lookup code and as a fallback in
+the
+.Xr mbuf 9
+receive path in
+.Pa sys/netinet/in_pcb.c .
+.El
+.Sh SEE ALSO
+.Xr mbuf 9 ,
+.Xr RSS 9 ,
+.Xr netisr 9
+.Sh HISTORY
+PCBGROUPS first appeared in FreeBSD 9.0.
+.Pp
+The PCBGROUPS implementation is inspired by Willman, Rixner, and Cox's
+2006 USENIX paper,
+.Qo
+An Evaluation of Network Stack Parallelization Strategies in Modern
+Operating Systems
+.Qc :
+.Li http://www.ece.rice.edu/~willmann/pubs/paranet_usenix.pdf
+.Sh AUTHORS
+.An -nosplit
+The PCBGROUPS implementation was written by
+.An Robert N. M. Watson Aq Mt rwatson@FreeBSD.org
+under contract to Juniper Networks, Inc.
+.Pp
+This manual page written by
+.An Adrian Chadd Aq Mt adrian@FreeBSD.org .
+.Sh NOTES
+The
+.Xr RSS 9
+implementation currently uses
+.Ic #ifdef
+blocks to tie into PCBGROUPS.
+This is a sign that a more abstract programming API is needed.
+.Pp
+There is currently no support for re-balancing the PCBGROUPS assignment,
+nor is there any support for overriding which PCBGROUP a socket/PCB
+should be in.
+.Pp
+No statistics are kept to indicate how often PCBGROUPS lookups
+succeed or fail.
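
Illustrative note, not part of the commit: the IMPLEMENTATION NOTES in the new manpage say the default bucket mapping is "hash % number_of_pcbgroups" and that the tuple-based mapping does "a very simple XOR hash". The following is a minimal user-space sketch of those two ideas, assuming an IPv4 4-tuple. The struct and every function name here (tuple4, sketch_getbucket, sketch_tuple_hash, sketch_bytuple) are hypothetical stand-ins and not the kernel's definitions; the exact fields the kernel XORs together may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a connection 4-tuple; not the kernel's struct. */
struct tuple4 {
	uint32_t faddr;	/* foreign IPv4 address */
	uint32_t laddr;	/* local IPv4 address */
	uint16_t fport;	/* foreign port */
	uint16_t lport;	/* local port */
};

/* Map a 32-bit hash to a group index, as in "hash % number_of_pcbgroups". */
static unsigned int
sketch_getbucket(uint32_t hash, unsigned int ngroups)
{
	assert(ngroups > 0);
	return (hash % ngroups);
}

/*
 * A very simple XOR hash over the tuple. Which fields are combined, and
 * in what order, is an assumption made for illustration only.
 */
static uint32_t
sketch_tuple_hash(const struct tuple4 *t)
{
	return (t->faddr ^ t->laddr ^ (uint32_t)t->fport ^ (uint32_t)t->lport);
}

/* Tuple -> group index: hash the tuple, then reduce it to a bucket. */
static unsigned int
sketch_bytuple(const struct tuple4 *t, unsigned int ngroups)
{
	return (sketch_getbucket(sketch_tuple_hash(t), ngroups));
}
```

Because XOR is commutative, this sketch maps a tuple and its mirror image (local and foreign sides swapped) to the same group. As the manpage warns, a plain modulo reduction is not guaranteed to line up with NIC receive queues or the netisr configuration.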