summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorjtl <jtl@FreeBSD.org>2016-06-28 13:37:01 +0000
committerjtl <jtl@FreeBSD.org>2016-06-28 13:37:01 +0000
commitbbaee6454fe08d2a791576177149a54d980569d5 (patch)
treea337c7b411f726b37b98bb60f23b450701d6fca5
parent96d5d9cf5de60ade43c054d2cae374fd3e7b8054 (diff)
downloadFreeBSD-src-bbaee6454fe08d2a791576177149a54d980569d5.zip
FreeBSD-src-bbaee6454fe08d2a791576177149a54d980569d5.tar.gz
Document support for alternate TCP stacks.
Differential Revision: https://reviews.freebsd.org/D6940 Reviewed by: hiren Approved by: re (gjb) Sponsored by: Juniper Networks
-rw-r--r--share/man/man4/tcp.426
-rw-r--r--share/man/man9/Makefile3
-rw-r--r--share/man/man9/tcp_functions.9285
-rw-r--r--tools/build/options/WITH_EXTRA_TCP_STACKS2
4 files changed, 314 insertions, 2 deletions
diff --git a/share/man/man4/tcp.4 b/share/man/man4/tcp.4
index d02ff55..f275a4d 100644
--- a/share/man/man4/tcp.4
+++ b/share/man/man4/tcp.4
@@ -34,7 +34,7 @@
.\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\" $FreeBSD$
.\"
-.Dd May 19, 2016
+.Dd June 28, 2016
.Dt TCP 4
.Os
.Sh NAME
@@ -119,7 +119,7 @@ supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
-.Bl -tag -width ".Dv TCP_CONGESTION"
+.Bl -tag -width ".Dv TCP_FUNCTION_BLK"
.It Dv TCP_INFO
Information about a socket's underlying TCP session may be retrieved
by passing the read-only option
@@ -148,6 +148,20 @@ connection.
See
.Xr mod_cc 4
for details.
+.It Dv TCP_FUNCTION_BLK
+Select or query the set of functions that TCP will use for this connection.
+This allows a user to select an alternate TCP stack.
+The alternate TCP stack must already be loaded in the kernel.
+To list the available TCP stacks, see
+.Va functions_available
+in the
+.Sx MIB Variables
+section further down.
+To list the default TCP stack, see
+.Va functions_default
+in the
+.Sx MIB Variables
+section.
.It Dv TCP_KEEPINIT
This
.Xr setsockopt 2
@@ -568,6 +582,10 @@ Number of times default MSS was used in an attempt to downshift.
.It Va pmtud_blackhole_failed
Number of connections for which retransmits continued even after MSS
downshift.
+.It Va functions_available
+List of available TCP function blocks (TCP stacks).
+.It Va functions_default
+The default TCP function block (TCP stack).
.El
.Sh ERRORS
A socket operation may fail with one of the following errors returned:
@@ -599,6 +617,10 @@ exists;
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
address.
+.It Bq Er EINVAL
+when trying to change TCP function blocks at an invalid point in the session;
+.It Bq Er ENOENT
+when trying to use a TCP function block that is not available;
.El
.Sh SEE ALSO
.Xr getsockopt 2 ,
diff --git a/share/man/man9/Makefile b/share/man/man9/Makefile
index 2894c12..8617cba 100644
--- a/share/man/man9/Makefile
+++ b/share/man/man9/Makefile
@@ -284,6 +284,7 @@ MAN= accept_filter.9 \
sysctl_ctx_init.9 \
SYSINIT.9 \
taskqueue.9 \
+ tcp_functions.9 \
thread_exit.9 \
time.9 \
timeout.9 \
@@ -1734,6 +1735,8 @@ MLINKS+=taskqueue.9 TASK_INIT.9 \
taskqueue.9 taskqueue_start_threads_pinned.9 \
taskqueue.9 taskqueue_unblock.9 \
taskqueue.9 TIMEOUT_TASK_INIT.9
+MLINKS+=tcp_functions.9 register_tcp_functions.9 \
+ tcp_functions.9 deregister_tcp_functions.9
MLINKS+=time.9 boottime.9 \
time.9 time_second.9 \
time.9 time_uptime.9
diff --git a/share/man/man9/tcp_functions.9 b/share/man/man9/tcp_functions.9
new file mode 100644
index 0000000..06c42ff
--- /dev/null
+++ b/share/man/man9/tcp_functions.9
@@ -0,0 +1,285 @@
+.\"
+.\" Copyright (c) 2016 Jonathan Looney <jtl@FreeBSD.org>
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
+.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd June 28, 2016
+.Dt TCP_FUNCTIONS 9
+.Os
+.Sh NAME
+.Nm tcp_functions
+.Nd Alternate TCP Stack Framework
+.Sh SYNOPSIS
+.In netinet/tcp.h
+.In netinet/tcp_var.h
+.Ft int
+.Fn register_tcp_functions "struct tcp_function_block *blk" "int wait"
+.Ft int
+.Fn deregister_tcp_functions "struct tcp_function_block *blk"
+.Sh DESCRIPTION
+The
+.Nm
+framework allows a kernel developer to implement alternate TCP stacks.
+The alternate stacks can be compiled in the kernel or can be implemented in
+loadable kernel modules.
+This functionality is intended to encourage experimentation with the TCP stack
+and to allow alternate behaviors to be deployed for different TCP connections
+on a single system.
+.Pp
+A system administrator can set a system default stack.
+By default, all TCP connections will use the system default stack.
+Additionally, users can specify a particular stack to use on a per-connection
+basis.
+(See
+.Xr tcp 4
+for details on setting the system default stack, or selecting a specific stack
+for a given connection.)
+.Pp
+This man page treats "TCP stacks" as synonymous with "function blocks".
+This is intentional.
+A "TCP stack" is a collection of functions that implement a set of behavior.
+Therefore, an alternate "function block" defines an alternate "TCP stack".
+.Pp
+.Nm
+modules must call the
+.Fn register_tcp_functions
+function during initialization and successfully call the
+.Fn deregister_tcp_functions
+function prior to allowing the module to be unloaded.
+.Pp
+The
+.Fn register_tcp_functions
+function requests that the system add a specified function block to the system.
+.Pp
+The
+.Fn deregister_tcp_functions
+function requests that the system remove a specified function block from the
+system.
+If the call fails because sockets are still using the specified function block,
+the system will mark the function block as being in the process of being
+removed.
+This will prevent additional sockets from using the specified function block.
+However, it will not impact sockets that are already using the function block.
+.Pp
+The
+.Fa blk
+argument is a pointer to a
+.Vt "struct tcp_function_block" ,
+which is explained below (see
+.Sx Function Block Structure ) .
+The
+.Fa wait
+argument is used as the
+.Fa flags
+argument to
+.Xr malloc 9 ,
+and must be set to one of the valid values defined in that man page.
+.Ss Function Block Structure
+The
+.Fa blk argument is a pointer to a
+.Vt "struct tcp_function_block" ,
+which has the following members:
+.Bd -literal -offset indent
+struct tcp_function_block {
+ char tfb_tcp_block_name[TCP_FUNCTION_NAME_LEN_MAX];
+ int (*tfb_tcp_output)(struct tcpcb *);
+ void (*tfb_tcp_do_segment)(struct mbuf *, struct tcphdr *,
+ struct socket *, struct tcpcb *,
+ int, int, uint8_t,
+ int);
+ int (*tfb_tcp_ctloutput)(struct socket *so,
+ struct sockopt *sopt,
+ struct inpcb *inp, struct tcpcb *tp);
+ /* Optional memory allocation/free routine */
+ void (*tfb_tcp_fb_init)(struct tcpcb *);
+ void (*tfb_tcp_fb_fini)(struct tcpcb *);
+ /* Optional timers, must define all if you define one */
+ int (*tfb_tcp_timer_stop_all)(struct tcpcb *);
+ void (*tfb_tcp_timer_activate)(struct tcpcb *,
+ uint32_t, u_int);
+ int (*tfb_tcp_timer_active)(struct tcpcb *, uint32_t);
+ void (*tfb_tcp_timer_stop)(struct tcpcb *, uint32_t);
+ void (*tfb_tcp_rexmit_tmr)(struct tcpcb *);
+ volatile uint32_t tfb_refcnt;
+ uint32_t tfb_flags;
+};
+.Ed
+.Pp
+The
+.Va tfb_tcp_block_name
+field identifies the unique name of the TCP stack, and should be no longer than
+TCP_FUNCTION_NAME_LEN_MAX-1 characters in length.
+.Pp
+The
+.Va tfb_tcp_output ,
+.Va tfb_tcp_do_segment ,
+and
+.Va tfb_tcp_ctloutput
+fields are pointers to functions that perform the equivalent actions
+as the default
+.Fn tcp_output ,
+.Fn tcp_do_segment ,
+and
+.Fn tcp_default_ctloutput
+functions, respectively.
+Each of these function pointers must be non-NULL.
+.Pp
+If a TCP stack needs to initialize data when a socket first selects the TCP
+stack (or, when the socket is first opened), it should set a non-NULL
+pointer in the
+.Va tfb_tcp_fb_init
+field.
+Likewise, if a TCP stack needs to cleanup data when a socket stops using the
+TCP stack (or, when the socket is closed), it should set a non-NULL pointer
+in the
+.Va tfb_tcp_fb_fini
+field.
+.Pp
+If the TCP stack implements additional timers, the TCP stack should set a
+non-NULL pointer in the
+.Va tfb_tcp_timer_stop_all ,
+.Va tfb_tcp_timer_activate ,
+.Va tfb_tcp_timer_active ,
+and
+.Va tfb_tcp_timer_stop
+fields.
+These fields should all be
+.Dv NULL
+or should all contain pointers to functions.
+The
+.Va tfb_tcp_timer_activate ,
+.Va tfb_tcp_timer_active ,
+and
+.Va tfb_tcp_timer_stop
+functions will be called when the
+.Fn tcp_timer_activate ,
+.Fn tcp_timer_active ,
+and
+.Fn tcp_timer_stop
+functions, respectively, are called with a timer type other than the standard
+types.
+The functions defined by the TCP stack have the same semantics (both for
+arguments and return values) as the normal timer functions they supplement.
+.Pp
+Additionally, a stack may define its own actions to take when the retransmit
+timer fires by setting a non-NULL function pointer in the
+.Va tfb_tcp_rexmit_tmr
+field.
+This function is called very early in the process of handling a retransmit
+timer.
+However, care must be taken to ensure the retransmit timer leaves the
+TCP control block in a valid state for the remainder of the retransmit
+timer logic.
+.Pp
+The
+.Va tfb_refcnt
+and
+.Va tfb_flags
+fields are used by the kernel's TCP code and will be initialized when the
+TCP stack is registered.
+.Ss Requirements for Alternate TCP Stacks
+If the TCP stack needs to store data beyond what is stored in the default
+TCP control block, the TCP stack can initialize its own per-connection storage.
+The
+.Va t_fb_ptr
+field in the
+.Vt "struct tcpcb"
+control block structure has been reserved to hold a pointer to this
+per-connection storage.
+If the TCP stack uses this alternate storage, it should understand that the
+value of the
+.Va t_fb_ptr
+pointer may not be initialized to
+.Dv NULL .
+Therefore, it should use a
+.Va tfb_tcp_fb_init
+function to initialize this field.
+Additionally, it should use a
+.Va tfb_tcp_fb_fini
+function to deallocate storage when the socket is closed.
+.Pp
+It is understood that alternate TCP stacks may keep different sets of data.
+However, in order to ensure that data is available to both the user and the
+rest of the system in a standardized format, alternate TCP stacks must
+update all fields in the TCP control block to the greatest extent practical.
+.Sh RETURN VALUES
+The
+.Fn register_tcp_functions
+and
+.Fn deregister_tcp_functions
+functions return zero on success and non-zero on failure.
+In particular, the
+.Fn deregister_tcp_functions
+will return
+.Er EBUSY
+until no more connections are using the specified TCP stack.
+A module calling
+.Fn deregister_tcp_functions
+must be prepared to wait until all connections have stopped using the
+specified TCP stack.
+.Sh ERRORS
+The
+.Fn register_tcp_functions
+function will fail if:
+.Bl -tag -width Er
+.It Bq Er EINVAL
+Any of the members of the
+.Fa blk
+argument are set incorrectly.
+.It Bq Er ENOMEM
+The function could not allocate memory for its internal data.
+.It Bq Er EALREADY
+A function block is already registered with the same name.
+.El
+The
+.Fn deregister_tcp_functions
+function will fail if:
+.Bl -tag -width Er
+.It Bq Er EPERM
+The
+.Fa blk
+argument references the kernel's compiled-in default function block.
+.It Bq Er EBUSY
+The function block is still in use by one or more sockets, or is defined as
+the current default function block.
+.It Bq Er ENOENT
+The
+.Fa blk
+argument references a function block that is not currently registered.
+.Sh SEE ALSO
+.Xr malloc 9 ,
+.Xr tcp 4
+.Sh HISTORY
+This framework first appeared in
+.Fx 11.0 .
+.Sh AUTHORS
+.An -nosplit
+The
+.Nm
+framework was written by
+.An Randall Stewart Aq Mt rrs@FreeBSD.org .
+.Pp
+This manual page was written by
+.An Jonathan Looney Aq Mt jtl@FreeBSD.org .
diff --git a/tools/build/options/WITH_EXTRA_TCP_STACKS b/tools/build/options/WITH_EXTRA_TCP_STACKS
new file mode 100644
index 0000000..6e9de17
--- /dev/null
+++ b/tools/build/options/WITH_EXTRA_TCP_STACKS
@@ -0,0 +1,2 @@
+.\" $FreeBSD$
+Set to build extra TCP stack modules.
OpenPOWER on IntegriCloud