diff options
Diffstat (limited to 'share/doc/iso/wisc/ipc.nr')
-rw-r--r-- | share/doc/iso/wisc/ipc.nr | 372 |
1 files changed, 372 insertions, 0 deletions
diff --git a/share/doc/iso/wisc/ipc.nr b/share/doc/iso/wisc/ipc.nr new file mode 100644 index 0000000..9f9d962 --- /dev/null +++ b/share/doc/iso/wisc/ipc.nr @@ -0,0 +1,372 @@ +.NC "The Design of Unix IPC" +.sh 1 "General" +.pp +The ARGO implementation of +TP and CLNP was designed to fit into the AOS +kernel +as easily as possible. +All the standard protocol hooks are used. +To understand the design, it is useful to have +read +Leffler, Joy, and Fabry: +\*(lq4.2 BSD Networking Implementation Notes\*(rq July 1983. +This section describes the +design of the IPC support in the AOS kernel. +.sh 1 "Functional Unit Overview" +.pp +The +AOS +kernel +is a monolithic program of considerable size and complexity. +The code can be separated into parts of distinct function, +but there are no kernel processes per se. +The kernel code is either executed on behalf of a user +process, in which case the kernel was entered by a system call, +or it is executed on behalf of a hardware or software interrupt. +The following sections describe briefly the major functional units +of the kernel. +.\" FIGURE +.so figs/func_units.nr +.CF +shows the arrangement of these kernel units and +their interactions. +.sh 2 "The file system." +.pp +.sh 2 "Virtual memory support." +.pp +This includes protection, swapping, paging, and +text sharing. +.sh 2 "Blocked device drivers (disks, tapes)." +.pp +All these drivers share some minor functional units, +such as buffer management and bus support +for the various types of busses on the machine. +.sh 2 "Interprocess communication (IPC)." +.pp +This includes +support for various protocols, +buffer management, and a standard interface for inter-protocol +communication. +.sh 2 "Network interface drivers." +.pp +These drivers are closely tied to the IPC support. +They use the IPC's buffer management unit rather +than the buffers used by the blocked device drivers. +The interface between these drivers and the rest of the kernel +differs from the interface used by the blocked devices. +.sh 2 "Tty driver" +.pp +This is terminal support, including the user interface +and the device drivers. +.sh 2 "System call interface." +.pp +This handles signals, traps, and system calls. +.sh 2 "Clock." +.pp +The clock is used in various forms by many +other units. +.sh 2 "User process support (the rest)." +.pp +This includes support for accounting, process creation, +control, scheduling, and destruction. +.pp +.sh 2 "IPC" +.pp +The major functional unit that supports IPC +can be divided into the following smaller functional +units. +.sh 3 "Buffer management." +.pp +All protocols share a pool of buffers called \fImbufs\fR: +.(b +\fC +.TS +tab(+); +l s s s. +struct mbuf { +.T& +l l l l. ++struct mbuf+*m_next;+/* next buffer in chain */ ++u_long+m_off;+/* offset of data */ ++short+m_len;+/* amount of data */ ++short+m_type;+/* mbuf type (0 == free) */ ++u_char+m_dat[MLEN];+/* data storage */ ++struct mbuf+*m_act;+/* link in 2-d structure */ +}; +.TE +\fR +.)b +.pp +There are two forms of mbufs - small ones and large ones. +Small ones are 128 octets in +AOS +and 256 octets +in the ARGO release. Small mbufs are copied by byte-to-byte +copies. +The data in these mbufs are kept in the character +array field \fIm_dat\fR in the mbuf structure +itself. +For this type of mbuf, the field \fIm_off\fR is positive, +and is the offset to the beginning of the data from +the beginning of the mbuf structure itself. +Large mbufs, called \fIclusters\fR, are page-sized +and page-aligned. +They may be \*(lqcopied\*(rq by multiply mapping the pages they occupy. +They consist of a page of memory plus a small mbuf structure +whose fields are used +to link clusters into chains, but whose \fIm_dat\fR array is +not used. +The \fIm_off\fR field of the structure +is the offset (positive or negative) from the +beginning of the mbuf structure to the beginning +of the data page part of the cluster. +In the case of clusters, the offset is always out of the +bounds of the \fIm_dat\fR array and so it is alway possible +to tell from the \fIm_off\fR field whether an mbuf structure +is part of a cluster or is a small mbuf. +All mbufs permanently reside in memory. +The mbuf management unit manages its own page table. +The mbuf manager keeps limited statistics on the quantities and +types of buffers in use. +Mbufs are used for many purposes, and most of these purposes +have a type associated with them. +Some of the types that buffers may take are +MT_FREE (not allocated), MT_DATA, +MT_HEADER, MT_SOCKET (socket structure), +MT_PCB (protocol control block), +MT_RTABLE (routing tables), +and +MT_SOOPTS (arguments passed to \fIgetsockopt()\fR and +\fIsetsockopt()\fR. +Data are passed among functional units by means +of queues, the contents of which are +either chains of mbufs or groups of chains of mbufs. +Mbufs are linked into chains with the \fIm_next\fR field. +Chains of mbufs are linked into groups with the \fIm_act\fR +field. +The \fIm_act\fR field allows a protocol to retain packet +boundaries in a queue of mbufs. +.sh 3 "Routing." +.pp +Routing decisions in the kernel are made by the procedure \fIrtalloc()\fR. +This procedure will scan the kernel routing tables (stored in mbufs) +looking for a route. A route is represented by +.(b +\fC +.TS +tab(+); +l s s s. +struct rtentry { +.T& +l l l l. ++u_long+rt_hash;+/* to speed lookups */ ++struct sockaddr+rt_dst;+/* key */ ++struct sockaddr+rt_gateway;+/* value */ ++short+rt_flags;+/* up/down?, host/net */ ++short+rt_refcnt;+/* # held references */ ++u_long+rt_use;+/* raw # packets forwarded */ ++struct ifnet+*rt_ifp;+/* interface to use */ +} +.TE +\fR +.)b +When looking for a route, \fIrtalloc()\fR will first hash the entire destination +address, and scan the routing tables looking for a complete route. If a route +is not found, then \fIrtalloc()\fR will rescan the table looking for a route +which matches the \fInetwork\fR portion of the address. If a route is still +not found, then a default route is used (if present). +.pp +If a route is found, the entity which called \fIrtalloc()\fR can use information +from the \fIrtentry\fR structure to dispatch the datagram. Specifically, the +datagram is queued on the interface identified by the interface +pointer \fIrt_ifp\fR. +.sh 3 "Socket code." +.pp +This is the protocol-independent part of the IPC support. +Each communication endpoint (which may or may not be associated +with a connection) is represented by the following structure: +.(b +\fC +.TS +tab(+); +l s s s. +struct socket { +.T& +l l l l. ++short+so_type;+/* type, e.g. SOCK_DGRAM */ ++short+so_options;+/* from socket call */ ++short+so_linger;+/* time to linger @ close */ ++short+so_state;+/* internal state flags */ ++caddr_t+so_pcb;+/* network layer pcb */ ++struct protosw+*so_proto;+/* protocol handle */ ++struct socket+*so_head;+/* ptr to accept socket */ ++struct socket+*so_q0;+/* queue of partial connX */ ++short+so_q0len;+/* # partials on so_q0 */ ++struct socket+*so_q;+/* queue of incoming connX */ ++short+so_qlen;+/* # connections on so_q */ ++short+so_qlimit;+/* max # queued connX */ ++struct sockbuf+{ +++short+sb_cc;+/* actual chars in buffer */ +++short+sb_hiwat;+/* max actual char count */ +++short+sb_mbcnt;+/* chars of mbufs used */ +++short+sb_mbmax;+/* max chars of mbufs to use */ +++short+sb_lowat;+/* low water mark (not used yet) */ +++short+sb_timeo;+/* timeout (not used ) */ +++struct mbuf+*sb_mb;+/* the mbuf chain */ +++struct proc+*sb_sel;+/* process selecting */ +++short+sb_flags;+/* flags, see below */ ++} so_rcv, so_snd; ++short+so_timeo;+/* connection timeout */ ++u_short+so_error;+/* error affecting connX */ ++short+so_oobmark;+/* oob mark (TCP only) */ ++short+so_pgrp;+/* pgrp for signals */ +} +.TE +\fR +.)b +.pp +The socket code maintains a pair of queues for each socket, +\fIso_rcv\fR and \fIso_snd\fR. +Each queue is associated with a count of the number of characters +in the queue, the maximum number of characters allowed to be put +in the queue, some status information (\fIsb_flags\fR), and +several unused fields. +For a send operation, data are copied from the user's address space +into chains of mbufs. +This is done by the socket module, which then calls the underlying +transport protocol module to place the data +on the send queue. +This is generally done by +appending to the chain beginning at \fIsb_mb\fR. +The socket module copies data from the \fIso_rcv\fR queue +to the user's address space to effect a receive operation. +The underlying transport layer is expected to have put incoming +data into \fIso_rcv\fR by calling procedures in this module. +.in -5 +.sh 3 "Transport protocol management." +.pp +All protocols and address types must be \*(lqregistered\*(rq in a +common way in order to use the IPC user interface. +Each protocol must have an entry in a protocol switch table. +Each entry takes the form: +.(b +\fC +.TS +tab(+); +l s s s. +struct protosw { +.T& +l l l l. ++short+pr_type;+/* socket type used for */ ++short+pr_family;+/* protocol family */ ++short+pr_protocol;+/* protocol # from the database */ ++short+pr_flags;+/* status information */ ++++/* protocol-protocol hooks */ ++int+(*pr_input)();+/* input (from below) */ ++int+(*pr_output)();+/* output (from above) */ ++int+(*pr_ctlinput)();+/* control input */ ++int+(*pr_ctloutput)();+/* control output */ ++++/* user-protocol hook */ ++int+(*pr_usrreq)();+/* user request: see list below */ ++++/* utility hooks */ ++int+(*pr_init)();+/* initialization hook */ ++int+(*pr_fasttimo)();+/* fast timeout (200ms) */ ++int+(*pr_slowtimo)();+/* slow timeout (500ms) */ ++int+(*pr_drain)();+/* free some space (not used) */ +} +.TE +\fR +.)b +.pp +Associated with each protocol are the types of socket +abstractions supported by the protocol (\fIpr_type\fR), the +format of the addresses used by the protocol (\fIpr_family\fR), +the routines to be called to perform +a standard set of protocol functions (\fIpr_input\fR,...,\fIpr_drain\fR), +and some status information (\fIpr_flags\fR). +The field pr_flags keeps such information as +SS_ISCONNECTED (this socket has a peer), +SS_ISCONNECTING (this socket is in the process of establishing +a connection), +SS_ISDISCONNECTING (this socket is in the process of being disconnected), +SS_CANTSENDMORE (this socket is half-closed and cannot send), +SS_CANTRCVMORE (this socket is half-closed and cannot receive). +There are some flags that are specific to the TCP concept +of out-of-band data. +A flag SS_OOBAVAIL was added for the ARGO implementation, to support +the TP concept of out-of-band data (expedited data). +.sh 3 "Network Interface Drivers" +.pp +The drivers for the devices attaching a Unix machine to a network +medium share a common interface to the protocol +software. +There is a common data structure for managing queues, +not surprisingly, a chain of mbufs. +There is a set of macros that are used to enqueue and +dequeue mbuf chains at high priority. +A driver +delivers an indication to a protocol entity when +an incoming packet has been placed on a queue by +issuing a +software +interrupt. +.sh 3 "Support for individual protocols." +.pp +Each protocol is written as a separate functional unit. +Because all protocols share the clock and the mbuf pool, they +are not entirely insulated from each other. +The details of TP are described in a section that +follows. +.\"***************************************************** +.\" FIGURE +.so figs/unix_ipc.nr +.pp +.CF +shows the arrangement of the IPC support. +.pp +The AOS +IPC was designed for DoD Internet protocols, all of +which run over DoD IP. +The assumptions that DoD Internet is the domain +and that DoD IP is the network layer +appear in the code and data structures in numerous places. +For example, it is assumed that addresses can be compared +by a bitwise comparison of 4 octets. +Another example is that the transport protocols all directly call +IP routines. +There are no hooks in the data structures through +which the transport layer can choose a network level protocol. +A third example is that the host's local addresses +are stored in the network interface drivers and the drivers +have only one address - an Internet address. +A fourth example is that headers are assumed to +fit in one small mbuf (112 bytes for data in AOS). +A fifth example is this: +It is assumed in many places that buffer space is managed +in units of characters or octets. +The user data are copied from user address space into the kernel mbufs +amorphously +by the socket code, a protocol-independent part of the kernel. +This is fine for a stream protocol, but it means that a +packet protocol, in order to \*(lqpacketize\*(rq the data, +must perform a memory-to-memory copy +that might have been avoided had the protocol layer done the original +copy from user address space. +Furthermore, protocols that count credit in terms of packets or +buffers rather than characters do not work efficiently because +the computation of buffer space is not in the protocol module, +but rather it is in the socket code module. +This list of examples is not complete. +.pp +To summarize, adding a new transport protocol to the kernel consists of +adding entries to the tables in the protocol management +unit, +modifying the network interface driver(s) to recognize +new network protocol identifiers, +adding the +new system calls to the kernel and to the user library, +and +adding code modules for each of the protocols, +and correcting deficiencies in the socket code, +where the assumptions made about the nature of +transport protocols do not apply. |