Diffstat (limited to 'share/man/man9/zero_copy.9')
-rw-r--r-- share/man/man9/zero_copy.9 | 142
1 files changed, 142 insertions, 0 deletions
diff --git a/share/man/man9/zero_copy.9 b/share/man/man9/zero_copy.9
new file mode 100644
index 0000000..8c2f6e8
--- /dev/null
+++ b/share/man/man9/zero_copy.9
@@ -0,0 +1,142 @@
+.\"
+.\" Copyright (c) 2002 Kenneth D. Merry.
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions, and the following disclaimer,
+.\" without modification, immediately at the beginning of the file.
+.\" 2. The name of the author may not be used to endorse or promote products
+.\" derived from this software without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
+.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd June 23, 2002
+.Dt ZERO_COPY 9
+.Os
+.Sh NAME
+.Nm zero_copy ,
+.Nm zero_copy_sockets
+.Sh SYNOPSIS
+.Cd options ZERO_COPY_SOCKETS
+.Sh DESCRIPTION
+The FreeBSD kernel includes a facility for eliminating data copies on
+socket reads and writes.
+.Pp
+This code is collectively known as the zero copy sockets code, because during
+normal network I/O, data will not be copied by the CPU at all. Rather, it
+will be DMAed from the user's buffer to the NIC (for sends), or DMAed from
+the NIC to a buffer that will then be given to the user (receives).
+.Pp
+The zero copy sockets code uses the standard socket read and write
+semantics, and therefore has some limitations and restrictions that
+programmers should be aware of when trying to take advantage of this
+functionality.
+.Pp
+For sending data, there are no special requirements or capabilities that
+the sending NIC must have. The data written to the socket, though, must be
+at least a page in size and page aligned in order to be mapped into the
+kernel. If it doesn't meet the page size and alignment constraints, it
+will be copied into the kernel, as is normally the case with socket I/O.
+.Pp
+The user should be careful not to overwrite buffers that have been written
+to the socket before the data has been freed by the kernel, and the
+copy-on-write mapping cleared. If a buffer is overwritten before it has
+been given up by the kernel, the data will be copied, and no savings in CPU
+utilization and memory bandwidth utilization will be realized.
+.Pp
+The
+.Xr socket 2
+API gives the user no indication of when the data has
+actually been sent over the wire, or when the data has been freed from
+kernel buffers. For protocols like TCP, the data will be kept around in
+the kernel until it has been acknowledged by the other side; it must be
+kept until the acknowledgement is received in case retransmission is required.
+.Pp
+From an application standpoint, the best way to guarantee that the data has
+been sent out over the wire and freed by the kernel (for TCP-based sockets)
+is to set a socket buffer size (see the SO_SNDBUF socket option in the
+.Xr setsockopt 2
+man page) appropriate for the application and network environment and then
+make sure you have sent out twice as much data as the socket buffer size
+before reusing a buffer. For TCP, the send and receive socket buffer sizes
+generally directly correspond to the TCP window size.
+.Pp
+For receiving data, in order to take advantage of the zero copy receive
+code, the user must have a NIC that is configured for an MTU greater than
+the architecture page size (for example, 8KB on alpha or 4KB on i386).
+Additionally, in order for zero copy receive to work,
+packet payloads must be at least a page in size and page aligned.
+.Pp
+Achieving page aligned payloads requires a NIC that can split an incoming
+packet into multiple buffers. It also generally requires some sort of
+intelligence on the NIC to make sure that the payload starts in its own
+buffer. This is called "header splitting". Currently the only NICs with
+support for header splitting are Alteon Tigon 2 based boards running
+slightly modified firmware. The FreeBSD
+.Xr ti 4
+driver includes modified firmware for Tigon 2 boards only. Header
+splitting code can be written, however, for any NIC that allows putting
+received packets into multiple buffers and that has enough programmability
+to determine that the header should go into one buffer and the payload into
+another.
+.Pp
+You can also do a form of header splitting that doesn't require any NIC
+modifications if your NIC is at least capable of splitting packets into
+multiple buffers. This requires that you optimize the NIC driver for your
+most common packet header size. If that size (ethernet + IP + TCP headers)
+is generally 66 bytes, for instance, you would set the first buffer in a
+set for a particular packet to be 66 bytes long, and then subsequent
+buffers would be a page in size. For packets that have headers that are
+exactly 66 bytes long, your payload will be page aligned.
+.Pp
+The other requirement for zero copy receive to work is that the buffer that
+is the destination for the data read from a socket must be at least a page
+in size and page aligned.
+.Pp
+Obviously the requirements for receive side zero copy are impossible to
+meet without NIC hardware that is programmable enough to do header
+splitting of some sort. Since most NICs aren't that programmable, or their
+manufacturers won't share the source code to their firmware, this approach
+to zero copy receive isn't widely useful.
+.Pp
+There are other approaches, such as RDMA and TCP Offload, that may
+potentially help alleviate the CPU overhead associated with copying data
+out of the kernel. Most known techniques require some sort of support at
+the NIC level to work, and describing such techniques is beyond the scope
+of this manual page.
+.Pp
+The zero copy send and zero copy receive code can be individually turned
+off via the
+.Va kern.ipc.zero_copy.send
+and
+.Va kern.ipc.zero_copy.receive
+.Xr sysctl 8
+variables, respectively.
+.Sh SEE ALSO
+.Xr socket 2 ,
+.Xr sendfile 2 ,
+.Xr ti 4 ,
+.Xr jumbo 9
+.Sh HISTORY
+The zero copy sockets code first appeared in FreeBSD 5.0, although it has
+been in existence in patch form since at least mid-1999.
+.Sh AUTHORS
+The zero copy sockets code was originally written by
+.An Andrew Gallatin Aq gallatin@FreeBSD.org
+and substantially modified and updated by
+.An Kenneth Merry Aq ken@FreeBSD.org .
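
Note (not part of the patch): the send-side rules in the DESCRIPTION above can be sketched from userland as follows. This is a minimal illustration under stated assumptions, not code from the zero copy implementation. The helper names (zc_send_eligible, zc_alloc_pages, zc_set_sndbuf, zc_safe_to_reuse) are hypothetical, and the reuse check only encodes the "twice the socket buffer size" guideline from the page, where the byte counters are totals the application tracks itself.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L	/* for posix_memalign() */
#endif
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: true if buf/len meet the constraints the manual
 * page gives for zero copy sends (at least a page long, page aligned).
 * Buffers that fail this check are copied into the kernel as usual.
 */
static inline bool
zc_send_eligible(const void *buf, size_t len)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

	return (len >= pagesz && ((uintptr_t)buf % pagesz) == 0);
}

/*
 * Hypothetical helper: allocate npages of page-aligned memory.
 * Returns NULL on failure; release the memory with free().
 */
static inline void *
zc_alloc_pages(size_t npages)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	void *buf = NULL;

	if (posix_memalign(&buf, pagesz, npages * pagesz) != 0)
		return (NULL);
	return (buf);
}

/* Hypothetical helper: request a send buffer size via SO_SNDBUF. */
static inline int
zc_set_sndbuf(int s, int bytes)
{
	return (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)));
}

/*
 * Hypothetical helper encoding the reuse guideline: a buffer is treated
 * as reusable once at least twice the socket buffer size has been
 * written to the socket after the write that used it.
 * sent_at_write: application's running byte total when that write finished.
 * sent_now:      application's running byte total now.
 */
static inline bool
zc_safe_to_reuse(uint64_t sent_at_write, uint64_t sent_now, size_t sndbuf)
{
	return (sent_now - sent_at_write >= 2 * (uint64_t)sndbuf);
}
```

An application would typically call zc_set_sndbuf() once after creating the socket, write from buffers obtained with zc_alloc_pages(), and consult zc_safe_to_reuse() before overwriting a buffer it has already written. Overwriting too early is not an error; per the page, it only causes the data to be copied.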