author adrian <adrian@FreeBSD.org> 2014-07-21 04:48:02 +0000
committer adrian <adrian@FreeBSD.org> 2014-07-21 04:48:02 +0000
commit 6a2f31c5016ff399c9064a8a247a0af3fb39a067
tree a8c8e596f8e54bbad21fe8ff55e5e6686876f121
parent f3bb5d8aca394a890918761a5ad93a0bafcbd748
Add the PCBGROUPS manpage.
Thanks to wblock for helping me with this manpage.
-rw-r--r-- share/man/man9/Makefile | 1
-rw-r--r-- share/man/man9/PCBGROUPS.9 | 228
2 files changed, 229 insertions, 0 deletions
diff --git a/share/man/man9/Makefile b/share/man/man9/Makefile
index abfcb20..73f1c68 100644
--- a/share/man/man9/Makefile
+++ b/share/man/man9/Makefile
@@ -188,6 +188,7 @@ MAN= accept_filter.9 \
osd.9 \
panic.9 \
pbuf.9 \
+ PCBGROUPS.9 \
p_candebug.9 \
p_cansee.9 \
pci.9 \
diff --git a/share/man/man9/PCBGROUPS.9 b/share/man/man9/PCBGROUPS.9
new file mode 100644
index 0000000..5e09213
--- /dev/null
+++ b/share/man/man9/PCBGROUPS.9
@@ -0,0 +1,228 @@
+.\" Copyright (c) 2014 Adrian Chadd
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\" 3. The name of the author may not be used to endorse or promote products
+.\" derived from this software without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd July 18, 2014
+.Dt PCBGROUPS 9
+.Os
+.Sh NAME
+.Nm PCBGROUPS
+.Nd Distributed Protocol Control Block Groups
+.Sh SYNOPSIS
+.Ft void
+.Fn in_pcbgroup_init "struct inpcbinfo *pcbinfo" "u_int hashfields" "int hash_nelements"
+.Ft void
+.Fn in_pcbgroup_destroy "struct inpcbinfo *pcbinfo"
+.Ft struct inpcbgroup *
+.Fn in_pcbgroup_byhash "struct inpcbinfo *pcbinfo" "u_int hashtype" "uint32_t hash"
+.Ft struct inpcbgroup *
+.Fn in_pcbgroup_byinpcb "struct inpcb *inp"
+.Ft void
+.Fn in_pcbgroup_update "struct inpcb *inp"
+.Ft void
+.Fn in_pcbgroup_update_mbuf "struct inpcb *inp" "struct mbuf *m"
+.Ft void
+.Fn in_pcbgroup_remove "struct inpcb *inp"
+.Ft int
+.Fn in_pcbgroup_enabled "struct inpcbinfo *pcbinfo"
+.Ft struct inpcbgroup *
+.Fn in6_pcbgroup_byhash "struct inpcbinfo *pcbinfo" "u_int hashtype" "uint32_t hash"
+.Pp
+.Cd "options PCBGROUPS"
+.Sh DESCRIPTION
+PCBGROUPS, or "connection groups", are based on Willmann, Rixner, and Cox's
+2006 USENIX paper,
+.Qo
+An Evaluation of Network Stack Parallelization Strategies in Modern
+Operating Systems
+.Qc .
+.Pp
+The PCBGROUPS paper describes two main kinds of connection groups.
+The first, called ConnP-T, uses a pool of worker threads which
+implement the network stack.
+Serialization occurs when queuing work into and completing work from
+the network stack.
+No locking is required inside each worker thread.
+.Pp
+The second type of connection group, called ConnP-L, uses an array
+of PCB groups rather than a single list.
+Each PCB group is protected by its own lock.
+.Pp
+This implementation differs significantly from that described in the
+paper, in that it attempts to introduce not just notions of affinity
+for connections and distribute work so as to reduce lock contention,
+but also align those notions with hardware work distribution strategies
+such as RSS.
+In this construction, connection groups supplement, rather than replace,
+existing reservation tables for protocol 4-tuples, offering CPU-affine
+lookup tables with minimal cache line migration and lock contention
+during steady state operation.
+.Pp
+Internet protocols like UDP and TCP register to use connection groups
+by providing an ipi_hashfields value other than IPI_HASHFIELDS_NONE.
+This indicates to the connection group code whether a 2-tuple or
+4-tuple is used as an argument to hashes that assign a connection to
+a particular group.
+This must be aligned with any hardware-offloaded distribution model,
+such as RSS or similar approaches taken in embedded network boards.
+Wildcard sockets require special handling, as in Willmann 2006, and
+are shared between connection groups while being protected by
+group-local locks.
+Connection establishment and teardown can be significantly more
+expensive than without connection groups, but steady-state
+processing can be significantly faster.
+.Pp
+Enabling PCBGROUPS in the kernel only provides the infrastructure
+required to create and manage multiple PCB groups.
+An implementation needs to fill in a few functions to provide PCB
+group hash information in order for PCBs to be placed in a PCB group.
+.Ss Operation
+By default, each PCB info block (struct inpcbinfo) has a single hash for
+all PCB entries for the given protocol with a single lock protecting it.
+This can be a significant source of lock contention on SMP hardware.
+When a PCBGROUP is created, an array of separate hash tables is
+created, each with its own lock.
+A separate table for wildcard PCBs is provided.
+By default, a PCBGROUP table is created for each available CPU.
+The PCBGROUP code attempts to calculate a hash value from the given
+PCB or mbuf when looking up a PCBGROUP.
+While processing a received frame,
+.Fn in_pcbgroup_byhash
+can be used in conjunction with either a hardware-provided hash
+value
+.Po
+e.g., the
+.Xr RSS 9
+calculated hash value provided by some NICs
+.Pc
+or a software-provided hash value in order to choose a PCBGROUP
+table to query.
+A single table lock is held while performing a wildcard match.
+However, all of the table locks are acquired before modifying the
+wildcard table.
+The PCBGROUP tables operate in conjunction with the normal single PCB list
+in a PCB info block.
+Thus, inserting and removing a PCB will still incur the same costs
+as without PCBGROUPS.
+A protocol which uses PCBGROUPS should fall back to the normal PCB list
+lookup if a call to the PCBGROUPS layer does not yield a lookup hit.
+.Ss Usage
+Initialize a PCBGROUP in a PCB info block
+.Pq Vt "struct inpcbinfo"
+by calling
+.Fn in_pcbgroup_init .
+.Pp
+Add a connection to a PCBGROUP with
+.Fn in_pcbgroup_update .
+Connections are removed with
+.Fn in_pcbgroup_remove .
+These in turn will determine which PCBGROUP bucket the given PCB
+is placed into and calculate the hash value appropriately.
+.Pp
+Wildcard PCBs are hashed differently and placed in a single wildcard
+PCB list.
+If
+.Xr RSS 9
+is enabled and in use, RSS-aware wildcard PCBs are placed in a single
+PCBGROUP based on RSS information.
+Protocols may look up the PCB entry in a PCBGROUP by using the lookup
+functions
+.Fn in_pcbgroup_byhash
+and
+.Fn in_pcbgroup_byinpcb .
+.Sh IMPLEMENTATION NOTES
+The PCB code in
+.Pa sys/netinet
+and
+.Pa sys/netinet6
+is aware of PCBGROUPS and will call into the PCBGROUPS code to do
+PCBGROUP assignment and lookup, preferring a PCBGROUP lookup to the
+default global PCB info table.
+.Pp
+An implementor wishing to experiment with or modify the PCBGROUP assignment
+should modify this set of functions:
+.Bl -tag -width "12345678" -offset indent
+.It Fn in_pcbgroup_getbucket No and Fn in6_pcbgroup_getbucket
+Map a given 32-bit hash value to a PCBGROUP.
+By default this is hash % number_of_pcbgroups.
+However, this distribution may not align with NIC receive queues or
+the
+.Xr netisr 9
+configuration.
+.It Fn in_pcbgroup_byhash No and Fn in6_pcbgroup_byhash
+Map a 32-bit hash value and a hash type identifier to a PCBGROUP.
+By default, this simply returns NULL.
+This function is used by the
+.Xr mbuf 9
+receive path in
+.Pa sys/netinet/in_pcb.c
+to map an mbuf to a PCBGROUP.
+.It Fn in_pcbgroup_bytuple No and Fn in6_pcbgroup_bytuple
+Map the source and destination address and port details to a PCBGROUP.
+By default, this does a very simple XOR hash.
+This function is used by both the PCB lookup code and as a fallback in
+the
+.Xr mbuf 9
+receive path in
+.Pa sys/netinet/in_pcb.c .
+.El
+.Sh SEE ALSO
+.Xr mbuf 9 ,
+.Xr RSS 9 ,
+.Xr netisr 9
+.Sh HISTORY
+PCBGROUPS first appeared in FreeBSD 9.0.
+.Pp
+The PCBGROUPS implementation is inspired by Willmann, Rixner, and Cox's
+2006 USENIX paper,
+.Qo
+An Evaluation of Network Stack Parallelization Strategies in Modern
+Operating Systems
+.Qc :
+.Li http://www.ece.rice.edu/~willmann/pubs/paranet_usenix.pdf
+.Sh AUTHORS
+.An -nosplit
+The PCBGROUPS implementation was written by
+.An Robert N. M. Watson Aq Mt rwatson@FreeBSD.org
+under contract to Juniper Networks, Inc.
+.Pp
+This manual page was written by
+.An Adrian Chadd Aq Mt adrian@FreeBSD.org .
+.Sh NOTES
+The
+.Xr RSS 9
+implementation currently uses
+.Ic #ifdef
+blocks to tie into PCBGROUPS.
+This is a sign that a more abstract programming API is needed.
+.Pp
+There is currently no support for re-balancing the PCBGROUPS assignment,
+nor is there any support for overriding which PCBGROUP a socket/PCB
+should be in.
+.Pp
+No statistics are kept to indicate how often PCBGROUPS lookups
+succeed or fail.
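
The manpage added by this commit says the default bucket mapping is hash % number_of_pcbgroups, and that a protocol should fall back to the normal PCB list when the PCBGROUPS lookup misses. The following userspace sketch illustrates those two ideas only. It is not the kernel code: the names sketch_pcb, sketch_group, sketch_getbucket, and sketch_lookup are hypothetical, each group holds a single entry for brevity, and locking is omitted entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCB; only the hash matters in this sketch. */
struct sketch_pcb {
	uint32_t hash;
};

/* Hypothetical per-group table holding at most one PCB, for brevity. */
struct sketch_group {
	struct sketch_pcb *entry;
};

/* Map a 32-bit hash to a group index; ngroups must be nonzero. */
static unsigned int
sketch_getbucket(uint32_t hash, unsigned int ngroups)
{
	return (hash % ngroups);
}

/*
 * Try the per-group table first.  On a miss, fall back to a linear scan
 * of the global list, mirroring the fallback the manpage recommends.
 */
static struct sketch_pcb *
sketch_lookup(struct sketch_group *groups, unsigned int ngroups,
    struct sketch_pcb **global, size_t nglobal, uint32_t hash)
{
	struct sketch_pcb *pcb;
	size_t i;

	pcb = groups[sketch_getbucket(hash, ngroups)].entry;
	if (pcb != NULL && pcb->hash == hash)
		return (pcb);
	for (i = 0; i < nglobal; i++) {
		if (global[i]->hash == hash)
			return (global[i]);
	}
	return (NULL);
}
```

In the real implementation each group has its own lock and hash table, so the first lookup touches only group-local state; the sketch keeps just the control flow.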