Diffstat (limited to 'share/doc')
317 files changed, 75354 insertions, 0 deletions
diff --git a/share/doc/IPv6/IMPLEMENTATION b/share/doc/IPv6/IMPLEMENTATION new file mode 100644 index 0000000..9b5d8ee --- /dev/null +++ b/share/doc/IPv6/IMPLEMENTATION @@ -0,0 +1,2392 @@ + Implementation Note + + KAME Project + http://www.kame.net/ + $KAME: IMPLEMENTATION,v 1.216 2001/05/25 07:43:01 jinmei Exp $ + $FreeBSD$ + +NOTE: The document tries to describe behaviors/implementation choices +of the latest KAME/*BSD stack. The description here may not be +applicable to KAME-integrated *BSD releases, as we have certain amount +of changes between them. Still, some of the content can be useful for +KAME-integrated *BSD releases. + +Table of Contents + + 1. IPv6 + 1.1 Conformance + 1.2 Neighbor Discovery + 1.3 Scope Zone Index + 1.3.1 Kernel internal + 1.3.2 Interaction with API + 1.3.3 Interaction with users (command line) + 1.4 Plug and Play + 1.4.1 Assignment of link-local, and special addresses + 1.4.2 Stateless address autoconfiguration on hosts + 1.4.3 DHCPv6 + 1.5 Generic tunnel interface + 1.6 Address Selection + 1.6.1 Source Address Selection + 1.6.2 Destination Address Ordering + 1.7 Jumbo Payload + 1.8 Loop prevention in header processing + 1.9 ICMPv6 + 1.10 Applications + 1.11 Kernel Internals + 1.12 IPv4 mapped address and IPv6 wildcard socket + 1.12.1 KAME/BSDI3 and KAME/FreeBSD228 + 1.12.2 KAME/FreeBSD[34]x + 1.12.2.1 KAME/FreeBSD[34]x, listening side + 1.12.2.2 KAME/FreeBSD[34]x, initiating side + 1.12.3 KAME/NetBSD + 1.12.3.1 KAME/NetBSD, listening side + 1.12.3.2 KAME/NetBSD, initiating side + 1.12.4 KAME/BSDI4 + 1.12.4.1 KAME/BSDI4, listening side + 1.12.4.2 KAME/BSDI4, initiating side + 1.12.5 KAME/OpenBSD + 1.12.5.1 KAME/OpenBSD, listening side + 1.12.5.2 KAME/OpenBSD, initiating side + 1.12.6 More issues + 1.12.7 Interaction with SIIT translator + 1.13 sockaddr_storage + 1.14 Invalid addresses on the wire + 1.15 Node's required addresses + 1.15.1 Host case + 1.15.2 Router case + 1.16 Advanced API + 1.17 DNS resolver + 2. 
Network Drivers + 2.1 FreeBSD 2.2.x-RELEASE + 2.2 BSD/OS 3.x + 2.3 NetBSD + 2.4 FreeBSD 3.x-RELEASE + 2.5 FreeBSD 4.x-RELEASE + 2.6 OpenBSD 2.x + 2.7 BSD/OS 4.x + 3. Translator + 3.1 FAITH TCP relay translator + 3.2 IPv6-to-IPv4 header translator + 4. IPsec + 4.1 Policy Management + 4.2 Key Management + 4.3 AH and ESP handling + 4.4 IPComp handling + 4.5 Conformance to RFCs and IDs + 4.6 ECN consideration on IPsec tunnels + 4.7 Interoperability + 4.8 Operations with IPsec tunnel mode + 4.8.1 RFC2401 IPsec tunnel mode approach + 4.8.2 draft-touch-ipsec-vpn approach + 5. ALTQ + 6. Mobile IPv6 + 6.1 KAME node as correspondent node + 6.2 KAME node as home agent/mobile node + 6.3 Old Mobile IPv6 code + 7. Coding style + 8. Policy on technology with intellectual property right restriction + +1. IPv6 + +1.1 Conformance + +The KAME kit conforms, or tries to conform, to the latest set of IPv6 +specifications. For future reference we list some of the relevant documents +below (NOTE: this is not a complete list - this is too hard to maintain...). +For details please refer to specific chapter in the document, RFCs, manpages +come with KAME, or comments in the source code. + +Conformance tests have been performed on past and latest KAME STABLE kit, +at TAHI project. Results can be viewed at http://www.tahi.org/report/KAME/. +We also attended Univ. of New Hampshire IOL tests (http://www.iol.unh.edu/) +in the past, with our past snapshots. + +RFC1639: FTP Operation Over Big Address Records (FOOBAR) + * RFC2428 is preferred over RFC1639. ftp clients will first try RFC2428, + then RFC1639 if failed. +RFC1886: DNS Extensions to support IPv6 +RFC1933: (see RFC2893) +RFC1981: Path MTU Discovery for IPv6 +RFC2080: RIPng for IPv6 + * KAME-supplied route6d, bgpd and hroute6d support this. +RFC2283: Multiprotocol Extensions for BGP-4 + * so-called "BGP4+". + * KAME-supplied bgpd supports this. 
+RFC2292: Advanced Sockets API for IPv6 + * see RFC3542 +RFC2362: Protocol Independent Multicast-Sparse Mode (PIM-SM) + * RFC2362 defines the packet formats and the protocol of PIM-SM. +RFC2373: IPv6 Addressing Architecture + * KAME supports node required addresses, and conforms to the scope + requirement. +RFC2374: An IPv6 Aggregatable Global Unicast Address Format + * KAME supports 64-bit length of Interface ID. +RFC2375: IPv6 Multicast Address Assignments + * Userland applications use the well-known addresses assigned in the RFC. +RFC2428: FTP Extensions for IPv6 and NATs + * RFC2428 is preferred over RFC1639. ftp clients will first try RFC2428, + then RFC1639 if failed. +RFC2460: IPv6 specification +RFC2461: Neighbor discovery for IPv6 + * See 1.2 in this document for details. +RFC2462: IPv6 Stateless Address Autoconfiguration + * See 1.4 in this document for details. +RFC2463: ICMPv6 for IPv6 specification + * See 1.9 in this document for details. +RFC2464: Transmission of IPv6 Packets over Ethernet Networks +RFC2465: MIB for IPv6: Textual Conventions and General Group + * Necessary statistics are gathered by the kernel. Actual IPv6 MIB + support is provided as patchkit for ucd-snmp. +RFC2466: MIB for IPv6: ICMPv6 group + * Necessary statistics are gathered by the kernel. Actual IPv6 MIB + support is provided as patchkit for ucd-snmp. +RFC2467: Transmission of IPv6 Packets over FDDI Networks +RFC2472: IPv6 over PPP +RFC2492: IPv6 over ATM Networks + * only PVC is supported. +RFC2497: Transmission of IPv6 packet over ARCnet Networks +RFC2545: Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing +RFC2553: (see RFC3493) +RFC2671: Extension Mechanisms for DNS (EDNS0) + * see USAGE for how to use it. + * not supported on kame/freebsd4 and kame/bsdi4. +RFC2673: Binary Labels in the Domain Name System + * KAME/bsdi4 supports A6, DNAME and binary label to some extent.
+ * KAME apps/bind8 repository has resolver library with partial A6, DNAME + and binary label support. +RFC2675: IPv6 Jumbograms + * See 1.7 in this document for details. +RFC2710: Multicast Listener Discovery for IPv6 +RFC2711: IPv6 router alert option +RFC2732: Format for Literal IPv6 Addresses in URL's + * The spec is implemented in programs that handle URLs + (like freebsd ftpio(3) and fetch(1), or netbsd ftp(1)) +RFC2874: DNS Extensions to Support IPv6 Address Aggregation and Renumbering + * KAME/bsdi4 supports A6, DNAME and binary label to some extent. + * KAME apps/bind8 repository has resolver library with partial A6, DNAME + and binary label support. +RFC2893: Transition Mechanisms for IPv6 Hosts and Routers + * IPv4 compatible address is not supported. + * automatic tunneling (4.3) is not supported. + * "gif" interface implements IPv[46]-over-IPv[46] tunnel in a generic way, + and it covers "configured tunnel" described in the spec. + See 1.5 in this document for details. +RFC2894: Router renumbering for IPv6 +RFC3041: Privacy Extensions for Stateless Address Autoconfiguration in IPv6 +RFC3056: Connection of IPv6 Domains via IPv4 Clouds + * So-called "6to4". + * "stf" interface implements it. Be sure to read + draft-itojun-ipv6-transition-abuse-01.txt + below before configuring it, there can be security issues. +RFC3142: An IPv6-to-IPv4 transport relay translator + * FAITH tcp relay translator (faithd) implements this. See 3.1 for more + details. +RFC3152: Delegation of IP6.ARPA + * libinet6 resolvers contained in the KAME snaps support to use + the ip6.arpa domain (with the nibble format) for IPv6 reverse + lookups. +RFC3484: Default Address Selection for IPv6 + * the selection algorithm for both source and destination addresses + is implemented based on the RFC, though some rules are still omitted. 
+RFC3493: Basic Socket Interface Extensions for IPv6 + * IPv4 mapped address (3.7) and special behavior of IPv6 wildcard bind + socket (3.8) are, + - supported and turned on by default on KAME/FreeBSD[34] + and KAME/BSDI4, + - supported but turned off by default on KAME/NetBSD and KAME/FreeBSD5, + - not supported on KAME/FreeBSD228, KAME/OpenBSD and KAME/BSDI3. + see 1.12 in this document for details. + * The AI_ALL and AI_V4MAPPED flags are not supported. +RFC3542: Advanced Sockets API for IPv6 (revised) + * For supported library functions/kernel APIs, see sys/netinet6/ADVAPI. + * Some of the updates in the draft are not implemented yet. See + TODO.2292bis for more details. +RFC4007: IPv6 Scoped Address Architecture + * some part of the documentation (especially about the routing + model) is not supported yet. + * zone indices that contain scope types have not been supported yet. + +draft-ietf-ipngwg-icmp-name-lookups-09: IPv6 Name Lookups Through ICMP +draft-ietf-ipv6-router-selection-07.txt: + Default Router Preferences and More-Specific Routes + * router-side: both router preference and specific routes are supported. + * host-side: only router preference is supported. +draft-ietf-pim-sm-v2-new-02.txt + A revised version of RFC2362, which includes the IPv6 specific + packet format and protocol descriptions. +draft-ietf-dnsext-mdns-00.txt: Multicast DNS + * kame/mdnsd has test implementation, which will not be built in + default compilation. The draft will experience a major change in the + near future, so don't rely upon it. +draft-ietf-ipngwg-icmp-v3-02.txt: ICMPv6 for IPv6 specification (revised) + * See 1.9 in this document for details. 
+draft-itojun-ipv6-tcp-to-anycast-01.txt: + Disconnecting TCP connection toward IPv6 anycast address +draft-ietf-ipv6-rfc2462bis-06.txt: IPv6 Stateless Address + Autoconfiguration (revised) +draft-itojun-ipv6-transition-abuse-01.txt: + Possible abuse against IPv6 transition technologies (expired) + * KAME does not implement RFC1933/2893 automatic tunnel. + * "stf" interface implements some address filters. Refer to stf(4) + for details. Since there's no way to make 6to4 interface 100% secure, + we do not include "stf" interface into GENERIC.v6 compilation. + * kame/openbsd completely disables IPv4 mapped address support. + * kame/netbsd makes IPv4 mapped address support off by default. + * See section 1.12.6 and 1.14 for more details. +draft-itojun-ipv6-flowlabel-api-01.txt: Socket API for IPv6 flow label field + * no consideration is made against the use of routing headers and such. + +1.2 Neighbor Discovery + +Our implementation of Neighbor Discovery is fairly stable. Currently +Address Resolution, Duplicated Address Detection, and Neighbor +Unreachability Detection are supported. In the near future we will be +adding an Unsolicited Neighbor Advertisement transmission command as +an administration tool. + +Duplicated Address Detection (DAD) will be performed when an IPv6 address +is assigned to a network interface, or the network interface is enabled +(ifconfig up). It is documented in RFC2462 5.4. +If DAD fails, the address will be marked "duplicated" and message will be +generated to syslog (and usually to console). The "duplicated" mark +can be checked with ifconfig. It is administrators' responsibility to check +for and recover from DAD failures. We may try to improve failure recovery +in future KAME code. 
+ +A successor version of RFC2462 (called rfc2462bis) clarifies the +behavior when DAD fails (i.e., duplicate is detected): if the +duplicate address is a link-local address formed from an interface +identifier based on the hardware address which is supposed to be +uniquely assigned (e.g., EUI-64 for an Ethernet interface), IPv6 +operation on the interface should be disabled. The KAME +implementation supports this as follows: if this type of duplicate is +detected, the kernel marks "disabled" in the ND specific data +structure for the interface. Every IPv6 I/O operation in the kernel +checks this mark, and the kernel will drop packets received on or +being sent to the "disabled" interface. Whether the IPv6 operation is +disabled or not can be confirmed by the ndp(8) command. See the man +page for more details. + +DAD procedure may not be effective on certain network interfaces/drivers. +If a network driver needs long initialization time (with wireless network +interfaces this situation is common), and the driver mistakenly raises +IFF_RUNNING before the driver becomes ready, DAD code will try to transmit +DAD probes to a not-really-ready network driver and the packets will not go out +from the interface. In such cases, network drivers should be corrected. + +Some network drivers loop multicast packets back to themselves, +even if instructed not to do so (especially in promiscuous mode). In +such cases DAD may fail, because the DAD engine sees an inbound NS packet +(actually from the node itself) and considers it as a sign of +duplicate. In this case, drivers should be corrected to honor +IFF_SIMPLEX behavior. For example, you may need to check source MAC +address on an inbound packet, and reject it if it is from the node +itself.
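The source-MAC check suggested above can be sketched in plain C as follows. This is only an illustration under stated assumptions: the frame is taken to be an untagged Ethernet frame (6-byte destination, 6-byte source, 2-byte type), and the names frame_is_from_self, TOY_MAC_LEN and TOY_ETH_HDR_LEN are hypothetical and do not come from any KAME or BSD driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAC_LEN      6   /* length of an Ethernet MAC address */
#define TOY_ETH_HDR_LEN 14   /* dst MAC + src MAC + ethertype */

/*
 * Hypothetical helper (not a real driver API): return 1 when the
 * source MAC address of an inbound Ethernet frame equals the node's
 * own address, i.e. the frame is our own transmission looped back.
 * A driver honoring IFF_SIMPLEX would drop such a frame instead of
 * passing it up to the DAD engine.
 */
int
frame_is_from_self(const uint8_t *frame, size_t len,
    const uint8_t own_mac[TOY_MAC_LEN])
{
	if (len < TOY_ETH_HDR_LEN)
		return 0;	/* too short to carry an Ethernet header */

	/* The source address follows the 6-byte destination address. */
	return memcmp(frame + TOY_MAC_LEN, own_mac, TOY_MAC_LEN) == 0;
}
```

A receive path would call this before handing the frame to the upper layers and discard the frame when the result is 1. Real drivers usually get the same effect by configuring the hardware not to loop frames back.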
+ +Neighbor Discovery specification (RFC2461) does not talk about neighbor +cache handling in the following cases: +(1) when there was no neighbor cache entry, node received unsolicited + RS/NS/NA/redirect packet without link-layer address +(2) neighbor cache handling on medium without link-layer address + (we need a neighbor cache entry for IsRouter bit) +For (1), we implemented workaround based on discussions on IETF ipngwg mailing +list. For more details, see the comments in the source code and email +thread started from (IPng 7155), dated Feb 6 1999. + +IPv6 on-link determination rule (RFC2461) is quite different from +assumptions in BSD IPv4 network code. To implement the behavior in +RFC2461 section 6.3.6 (3), the kernel needs to know the default +outgoing interface. To configure the default outgoing interface, use +commands like "ndp -I de0" as root. Then the kernel will have a +"default" route to the interface with the cloning "C" bit being on. +This default route will cause to make a neighbor cache entry for every +destination that does not match an explicit route entry. + +Note that we intentionally disable configuring the default interface +by default. This is because we found it sometimes caused inconvenient +situation while it was rarely useful in practical usage. For example, +consider a destination that has both IPv4 and IPv6 addresses but is +only reachable via IPv4. Since our getaddrinfo(3) prefers IPv6 by +default, an (TCP) application using the library with PF_UNSPEC first +tries to connect to the IPv6 address. If we turn on RFC 2461 6.3.6 +(3), we have to wait for quite a long period before the first attempt +to make a connection fails. If we turn it off, the first attempt will +immediately fail with EHOSTUNREACH, and then the application can try +the next, reachable address. + +The notion of the default interface is also disabled when the node is +acting as a router. 
The reason is that routers tend to control all +routes stored in the kernel and the default route automatically +installed would rather confuse the routers. Note that the spec misuses +the words "host" and "node" in several places in Section 5.2 of RFC +2461. We basically read the word "node" in this section as "host," +and thus believe the implementation policy does not break the +specification. + +To avoid possible DoS attacks and infinite loops, the KAME stack will accept +only 10 options on an ND packet. Therefore, if you have 20 prefix options +attached to RA, only the first 10 prefixes will be recognized. +If this troubles you, please contact the KAME team and/or modify +nd6_maxndopt in sys/netinet6/nd6.c. If there are high demands we may +provide a sysctl knob for the variable. + +Proxy Neighbor Advertisement support is implemented in the kernel. +For instance, you can configure it by using the following command: + # ndp -s fe80::1234%ne0 0:1:2:3:4:5 proxy +where ne0 is the interface which attaches to the same link as the +proxy target. +There are certain limitations, though: +- It does not send unsolicited multicast NA on configuration. This is MAY + behavior in RFC2461. +- It does not add random delay before transmission of solicited NA. This is + SHOULD behavior in RFC2461. +- We cannot configure proxy NDP for off-link address. The target address for + proxying must be link-local address, or must be in prefixes configured to + node which does proxy NDP. +- RFC2461 is unclear about whether it is legal for a host to perform proxy ND. + We do not prohibit hosts from doing proxy ND, but there will be very limited + use in it. + +Starting mid March 2000, we support Neighbor Unreachability Detection +(NUD) on p2p interfaces, including tunnel interfaces (gif). NUD is +turned on by default. Before March 2000 the KAME stack did not +perform NUD on p2p interfaces. If the change raises any +interoperability issues, you can turn off/on NUD on a per-interface +basis.
Use "ndp -i interface -nud" to turn it off. Consult ndp(8) +for details. + +RFC2461 specifies upper-layer reachability confirmation hint. Whenever +upper-layer reachability confirmation hint comes, ND process can use it +to optimize neighbor discovery process - ND process can omit real ND exchange +and keep the neighbor cache state in REACHABLE. +We currently have two sources for hints: (1) setsockopt(IPV6_REACHCONF) +defined by the RFC3542 API, and (2) hints from tcp(6)_input. + +It is questionable whether they are really trustworthy. For example, a +rogue userland program can use IPV6_REACHCONF to confuse the ND +process. Neighbor cache is a system-wide information pool, and it is +bad to allow a single process to affect others. Also, tcp(6)_input +can be hosed by hijack attempts. It is wrong to allow hijack attempts +to affect the ND process. + +Starting June 2000, the ND code has a protection mechanism against +incorrect upper-layer reachability confirmation. The ND code counts +subsequent upper-layer hints. If the number of hints reaches the +maximum, the ND code will ignore further upper-layer hints and run +real ND process to confirm reachability to the peer. sysctl +net.inet6.icmp6.nd6_maxnudhint defines the maximum # of subsequent +upper-layer hints to be accepted. +(from April 2000 to June 2000, we rejected setsockopt(IPV6_REACHCONF) from +non-root process - after a local discussion, it appears that hints are not +that trustworthy even if they are from privileged processes) + +If inbound ND packets carry invalid values, the KAME kernel will +drop these packets and increment a statistics variable. See +"netstat -sn", icmp6 section. For detailed debugging session, you can +turn on syslog output from the kernel on errors, by turning on sysctl MIB +net.inet6.icmp6.nd6_debug. nd6_debug can be turned on at bootstrap +time, by defining ND6_DEBUG kernel compilation option (so you can +debug behavior during bootstrap).
nd6_debug configuration should +only be used for test/debug purposes - for a production environment, +nd6_debug must be set to 0. If you leave it at 1, malicious parties +can inject broken packets and fill up the /var/log partition. + +1.3 Scope Zone Index + +IPv6 uses scoped addresses. It is therefore very important to +specify the scope zone index (link index for a link-local address, or +site index for a site-local address) with an IPv6 address. Without a +zone index, a scoped IPv6 address is ambiguous to the kernel, and +the kernel would not be able to determine the outbound zone for a +packet to the scoped address. KAME code tries to address the issue in +several ways. + +The entire architecture of scoped addresses is documented in RFC4007. +One non-trivial point of the architecture is that the link scope is +(theoretically) larger than the interface scope. That is, two +different interfaces can belong to the same link. However, in +normal operation, we can assume that there is a 1-to-1 relationship +between links and interfaces. In other words, we can usually put +links and interfaces in the same scope type. The current KAME +implementation assumes the 1-to-1 relationship. In particular, we use +interface names such as "ne1" as unique link identifiers. This would +be much more human-readable and intuitive than numeric identifiers, +but please keep in mind the theoretical difference between links +and interfaces. + +Site-local addresses are very vaguely defined in the specs, and both +the specification and the KAME code need tons of improvements to +enable their actual use. For example, it is still very unclear how we +define a site, or how we resolve host names in a site. There is work +underway to define behavior of routers at site border, but we have +almost no code for site boundary node support (neither forwarding nor +routing) and we bet almost no one has.
We recommend that, at this moment, +you use global addresses for experiments - there are way too many +pitfalls if you use site-local addresses. + +1.3.1 Kernel internal + +In the kernel, the link index for a link-local scope address is +embedded into the 2nd 16bit-word (the 3rd and 4th bytes) in the IPv6 +address. +For example, you may see something like: + fe80:1::200:f8ff:fe01:6317 +in the routing table and the interface address structure (struct +in6_ifaddr). The address above is a link-local unicast address which +belongs to a network link whose link identifier is 1 (note that it +equals the interface index by the assumption of our +implementation). The embedded index enables us to identify IPv6 +link-local addresses over multiple links effectively and with only a +little code change. + +The use of the internal format must be limited to the kernel. In +particular, addresses sent by an application should not contain the +embedded index (except via some very special APIs such as routing +sockets). Instead, the index should be specified in the sin6_scope_id +field of a sockaddr_in6 structure. Obviously, packets sent to or +received from the wire must not contain the embedded index either, since the +index is meaningful only within the sending/receiving node. + +In order to deal with the differences, several kernel routines are +provided. These are available by including <netinet6/scope_var.h>. +Typically, the following functions are the most commonly used: + +- int sa6_embedscope(struct sockaddr_in6 *sa6, int defaultok); + Embed sa6->sin6_scope_id into sa6->sin6_addr. If sin6_scope_id is + 0, defaultok is non-0, and the default zone ID (see RFC4007) is + configured, the default ID will be used instead of the value of the + sin6_scope_id field. On success, sa6->sin6_scope_id will be reset + to 0. + + This function returns 0 on success, or a non-0 error code otherwise.
+ +- int sa6_recoverscope(struct sockaddr_in6 *sa6); + Extract embedded zone ID in sa6->sin6_addr and set + sa6->sin6_scope_id to that ID. The embedded ID will be cleared with + 0. + + This function returns 0 on success, or a non-0 error code otherwise. + +- int in6_clearscope(struct in6_addr *in6); + Reset the embedded zone ID in 'in6' to 0. This function never fails, and + returns 0 if the original address is intact or non 0 if the address is + modified. The return value doesn't matter in most cases; currently, the + only point where we care about the return value is ip6_input() for checking + whether the source or destination addresses of the incoming packet is in + the embedded form. + +- int in6_setscope(struct in6_addr *in6, struct ifnet *ifp, + u_int32_t *zoneidp); + Embed zone ID determined by the address scope type for 'in6' and the + interface 'ifp' into 'in6'. If zoneidp is non NULL, *zoneidp will + also have the zone ID. + + This function returns 0 on success, or a non-0 error code otherwise. + +The typical usage of these functions is as follows: + +sa6_embedscope() will be used at the socket or transport layer to +convert a sockaddr_in6 structure passed by an application into the +kernel-internal form. In this usage, the second argument is often the +'ip6_use_defzone' global variable. + +sa6_recoverscope() will also be used at the socket or transport layer +to convert an in6_addr structure with the embedded zone ID into a +sockaddr_in6 structure with the corresponding ID in the sin6_scope_id +field (and without the embedded ID in sin6_addr). + +in6_clearscope() will be used just before sending a packet to the wire +to remove the embedded ID. In general, this must be done at the last +stage of an output path, since otherwise the address would lose the ID +and could be ambiguous with regard to scope. 
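To make the embedded form concrete, the following user-space sketch models what embedding and clearing a zone ID means for a link-local address. It is NOT the kernel code: toy_embed_zone() and toy_clear_zone() are hypothetical names, the sketch handles only link-local unicast addresses, and it assumes the layout described in 1.3.1 (zone ID in the 3rd and 4th bytes). Kernel code must keep using the scope6.c interface functions rather than anything like this.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical model of the kernel-internal form: store a 16-bit
 * zone ID in the 3rd and 4th bytes of a link-local address.
 * Other address types are left untouched in this sketch.
 */
void
toy_embed_zone(struct in6_addr *a, uint16_t zone)
{
	if (IN6_IS_ADDR_LINKLOCAL(a)) {
		a->s6_addr[2] = (uint8_t)(zone >> 8);
		a->s6_addr[3] = (uint8_t)(zone & 0xff);
	}
}

/*
 * Hypothetical model of clearing the embedded ID before a packet
 * goes to the wire.  Mirrors the documented return convention:
 * 0 if the address was already clean, non-0 if it was modified.
 */
int
toy_clear_zone(struct in6_addr *a)
{
	int modified = 0;

	if (IN6_IS_ADDR_LINKLOCAL(a)) {
		modified = (a->s6_addr[2] != 0 || a->s6_addr[3] != 0);
		a->s6_addr[2] = 0;
		a->s6_addr[3] = 0;
	}
	return modified;
}
```

With zone 1 embedded, fe80::200:f8ff:fe01:6317 prints as fe80:1::200:f8ff:fe01:6317, which is the form shown in 1.3.1. After toy_clear_zone() the address is back in its on-the-wire form.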
+ +in6_setscope() will be used when the kernel receives a packet from the +wire to construct the kernel internal form for each address field in +the packet (typical examples are the source and destination addresses +of the packet). In the typical usage, the third argument 'zoneidp' +will be NULL. A non-NULL value will be used when the validity of the +zone ID must be checked, e.g., when forwarding a packet to another +link (see ip6_forward() for this usage). + +An application, when sending a packet, is basically assumed to specify +the appropriate scope zone of the destination address by the +sin6_scope_id field (this might be done transparently from the +application with getaddrinfo() and the extended textual format - see +below), or at least the default scope zone(s) must be configured as a +last resort. In some cases, however, an application could specify an +ambiguous address with regard to scope, expecting it is disambiguated +in the kernel by some other means. A typical usage is to specify the +outgoing interface through another API, which can disambiguate the +unspecified scope zone. Such a usage is not recommended, but the +kernel implements some trick to deal with even this case. + +A rough sketch of the trick can be summarized as the following +sequence. + + sa6_embedscope(dst, ip6_use_defzone); + in6_selectsrc(dst, ..., &ifp, ...); + in6_setscope(&dst->sin6_addr, ifp, NULL); + +sa6_embedscope() first tries to convert sin6_scope_id (or the default +zone ID) into the kernel-internal form. This can fail with an +ambiguous destination, but it still tries to get the outgoing +interface (ifp) in the attempt of determining the source address of +the outgoing packet using in6_selectsrc(). If the interface is +detected, and the scope zone was originally ambiguous, in6_setscope() +can finally determine the appropriate ID with the address itself and +the interface, and construct the kernel-internal form. 
See, for +example, comments in udp6_output() for more concrete example. + +In any case, kernel routines except ones in netinet6/scope6.c MUST NOT +directly refer to the embedded form. They MUST use the above +interface functions. In particular, kernel routines MUST NOT have the +following code fragment: + + /* This is a bad practice. Don't do this */ + if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr)) + sin6->sin6_addr.s6_addr16[1] = htons(ifp->if_index); + +This is bad for several reasons. First, address ambiguity is not +specific to link-local addresses (any non-global multicast addresses +are inherently ambiguous, and this is particularly true for +interface-local addresses). Secondly, this is vulnerable to future +changes of the embedded form (the embedded position may change, or the +zone ID may not actually be the interface index). Only scope6.c +routines should know the details. + +The above code fragment should thus actually be as follows: + + /* This is correct. */ + in6_setscope(&sin6->sin6_addr, ifp, NULL); + (and catch errors if possible and necessary) + +1.3.2 Interaction with API + +There are several candidates of API to deal with scoped addresses +without ambiguity. + +The IPV6_PKTINFO ancillary data type or socket option defined in the +advanced API (RFC2292 or RFC3542) can specify +the outgoing interface of a packet. Similarly, the IPV6_PKTINFO or +IPV6_RECVPKTINFO socket options tell kernel to pass the incoming +interface to user applications. + +These options are enough to disambiguate scoped addresses of an +incoming packet, because we can uniquely identify the corresponding +zone of the scoped address(es) by the incoming interface. However, +they are too strong for outgoing packets. For example, consider a +multi-sited node and suppose that more than one interface of the node +belongs to a same site. 
When we want to send a packet to the site, +we can only specify one of the interfaces for the outgoing packet with +these options; we cannot just say "send the packet to (one of the +interfaces of) the site." + +Another kind of candidates is to use the sin6_scope_id member in the +sockaddr_in6 structure, defined in RFC2553. The KAME kernel +interprets the sin6_scope_id field properly in order to disambiguate scoped +addresses. For example, if an application passes a sockaddr_in6 +structure that has a non-zero sin6_scope_id value to the sendto(2) +system call, the kernel should send the packet to the appropriate zone +according to the sin6_scope_id field. Similarly, when the source or +the destination address of an incoming packet is a scoped one, the +kernel should detect the correct zone identifier based on the address +and the receiving interface, fill the identifier in the sin6_scope_id +field of a sockaddr_in6 structure, and then pass the packet to an +application via the recvfrom(2) system call, etc. + +However, the semantics of the sin6_scope_id is still vague and on the +way to standardization. Additionally, not so many operating systems +support the behavior above at this moment. + +In summary, +- If your target system is limited to KAME based ones (i.e. BSD + variants and KAME snaps), use the sin6_scope_id field assuming the + kernel behavior described above. +- Otherwise, (i.e. if your program should be portable on other systems + than BSDs) + + Use the advanced API to disambiguate scoped addresses of incoming + packets. + + To disambiguate scoped addresses of outgoing packets, + * if it is okay to just specify the outgoing interface, use the + advanced API. This would be the case, for example, when you + should only consider link-local addresses and your system + assumes 1-to-1 relationship between links and interfaces. + * otherwise, sorry but you lose. Please rush the IETF IPv6 + community into standardizing the semantics of the sin6_scope_id + field. 
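For the first case in the summary above (a KAME-based target system), an application could fill in sin6_scope_id roughly as sketched below. This is an illustration only: make_scoped_dest() is a hypothetical helper, and the zone index passed to it is assumed to be an interface index, which is valid only under the 1-to-1 link/interface assumption described earlier.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: build a sockaddr_in6 for a scoped destination.
 * The address text must be a plain numeric address WITHOUT an
 * embedded index; the zone goes into sin6_scope_id instead.
 * Returns 0 on success, -1 if the address text cannot be parsed.
 * (On BSD systems sin6_len could also be set; omitted here to keep
 * the sketch portable.)
 */
int
make_scoped_dest(const char *addr_text, unsigned int zone_index,
    unsigned short port, struct sockaddr_in6 *out)
{
	memset(out, 0, sizeof(*out));
	out->sin6_family = AF_INET6;
	out->sin6_port = htons(port);
	if (inet_pton(AF_INET6, addr_text, &out->sin6_addr) != 1)
		return -1;
	out->sin6_scope_id = zone_index;	/* disambiguates the zone */
	return 0;
}
```

A caller would typically obtain the zone index with if_nametoindex(3) for an interface name such as "ne0", and then pass the resulting structure to sendto(2) or connect(2).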
+ +Routing daemons and configuration programs, like route6d and ifconfig, +will need to manipulate the "embedded" zone index. These programs use +routing sockets and ioctls (like SIOCGIFADDR_IN6) and the kernel API +will return IPv6 addresses with the 2nd 16bit-word filled in. The +APIs are for manipulating kernel internal structure. Programs that +use these APIs have to be prepared about differences in kernels +anyway. + +getaddrinfo(3) and getnameinfo(3) support an extended numeric IPv6 +syntax, as documented in RFC4007. You can specify the outgoing link, +by using the name of the outgoing interface as the link, like +"fe80::1%ne0" (again, note that we assume there is 1-to-1 relationship +between links and interfaces.) This way you will be able to specify a +link-local scoped address without much trouble. + +Other APIs like inet_pton(3) and inet_ntop(3) are inherently +unfriendly with scoped addresses, since they are unable to annotate +addresses with zone identifier. + +1.3.3 Interaction with users (command line) + +Most of user applications now support the extended numeric IPv6 +syntax. In this case, you can specify outgoing link, by using the name +of the outgoing interface like "fe80::1%ne0" (sorry for the duplicated +notice, but please recall again that we assume 1-to-1 relationship +between links and interfaces). This is even the case for some +management tools such as route(8) or ndp(8). For example, to install +the IPv6 default route by hand, you can type like + # route add -inet6 default fe80::9876:5432:1234:abcd%ne0 +(Although we suggest you to run dynamic routing instead of static +routes, in order to avoid configuration mistakes.) + +Some applications have command line options for specifying an +appropriate zone of a scoped address (like "ping6 -I ne0 ff02::1" to +specify the outgoing interface). However, you can't always expect such +options. 
Additionally, specifying the outgoing "interface" is in +theory an overspecification as a way to specify the outgoing "link" +(see above). Thus, we recommend that you use the extended format +described above. This should apply to the case where the outgoing +interface is specified. + +In any case, when you specify a scoped address on the command line, +NEVER write the embedded form (such as ff02:1::1 or fe80:2::fedc), +which should only be used inside the kernel (see Section 1.3.1), and +is not supposed to work. + +1.4 Plug and Play + +The KAME kit implements most of the IPv6 stateless address +autoconfiguration in the kernel. +Neighbor Discovery functions are implemented in the kernel as a whole. +Router Advertisement (RA) input for hosts is implemented in the +kernel. Router Solicitation (RS) output for endhosts, RS input +for routers, and RA output for routers are implemented in the +userland. + +1.4.1 Assignment of link-local, and special addresses + +An IPv6 link-local address is generated from the IEEE802 address (ethernet MAC +address). Each interface is assigned an IPv6 link-local address +automatically, when the interface becomes up (IFF_UP). Also, a direct route +for the link-local address is added to the routing table. + +Here is an output of netstat command: + +Internet6: +Destination Gateway Flags Netif Expire +fe80::%ed0/64 link#1 UC ed0 +fe80::%ep0/64 link#2 UC ep0 + +Interfaces that have no IEEE802 address (pseudo interfaces like tunnel +interfaces, or ppp interfaces) will borrow an IEEE802 address from other +interfaces, such as ethernet interfaces, whenever possible. +If there is no IEEE802 hardware attached, a last-resort pseudorandom value, +derived from MD5(hostname), will be used as the source of the link-local address. +If it is not suitable for your usage, you will need to configure the +link-local address manually. + +If an interface is not capable of handling IPv6 (such as lack of multicast +support), a link-local address will not be assigned to that interface.
+See section 2 for details. + +Each interface joins the solicited-node multicast address and the +link-local all-nodes multicast address (e.g. ff02::1:ff01:6317 +and ff02::1, respectively, on the link the interface is attached to). +In addition to a link-local address, the loopback address (::1) will be +assigned to the loopback interface. Also, ::1/128 and ff01::/32 are +automatically added to the routing table, and the loopback interface joins +node-local multicast group ff01::1. + +1.4.2 Stateless address autoconfiguration on hosts + +In the IPv6 specification, nodes are separated into two categories: +routers and hosts. Routers forward packets addressed to others; hosts do +not forward packets. net.inet6.ip6.forwarding defines whether this +node is a router or a host (router if it is 1, host if it is 0). + +It is NOT recommended to change net.inet6.ip6.forwarding while the node +is in operation. The IPv6 specification defines behavior for "host" and "router" +quite differently, and switching from one to the other can cause serious +trouble. It is recommended to configure the variable at bootstrap time only. + +The first step in stateless address configuration is Duplicate Address +Detection (DAD). See 1.2 for more detail on DAD. + +When a host hears a Router Advertisement from the router, the host may +autoconfigure itself by stateless address autoconfiguration. This +behavior can be controlled by the net.inet6.ip6.accept_rtadv sysctl +variable and a per-interface flag managed in the kernel. The latter, +which we call "if_accept_rtadv" here, can be changed by the ndp(8) +command (see the manpage for more details). When the sysctl variable +is set to 1, and the flag is set, the host autoconfigures itself. By +autoconfiguration, network address prefixes for the receiving +interface (usually global address prefix) are added. The default +route is also configured. + +Routers periodically generate Router Advertisement packets.
To +request an adjacent router to generate an RA packet, a host can transmit +a Router Solicitation. To generate an RS packet at any time, use the +"rtsol" command. The "rtsold" daemon is also available. "rtsold" +generates Router Solicitations whenever necessary, and it works well +for nomadic usage (notebooks/laptops). If one wishes to ignore Router +Advertisements, use sysctl to set net.inet6.ip6.accept_rtadv to 0. +Additionally, the ndp(8) command can be used to control the behavior +on a per-interface basis. + +To generate Router Advertisement from a router, use the "rtadvd" daemon. + +Note that the IPv6 specification assumes the following items and that +nonconforming cases are left unspecified: +- Only hosts will listen to router advertisements +- Hosts have a single network interface (except loopback) +It is therefore unwise to enable net.inet6.ip6.accept_rtadv on routers, +or multi-interface hosts. A misconfigured node can behave strangely +(KAME code allows nonconforming configuration, for those who would like +to do some experiments). + +To summarize the sysctl knob: + accept_rtadv forwarding role of the node + --- --- --- + 0 0 host (to be manually configured) + 0 1 router + 1 0 autoconfigured host + (spec assumes that hosts have a single + interface only, autoconfigured hosts + with multiple interfaces are + out-of-scope) + 1 1 invalid, or experimental + (out-of-scope of spec) + +The if_accept_rtadv flag is consulted only when accept_rtadv is 1 (the +latter two cases). The flag does not have any effect when the sysctl +variable is 0. + +See 1.2 in the document for relationship between DAD and autoconfiguration. + +1.4.3 DHCPv6 + +We supply a tiny DHCPv6 server/client in kame/dhcp6. However, the +implementation is premature (for example, this does NOT implement +address lease/release), and it is not in default compilation tree on +some platforms. If you want to do some experiment, compile it on your +own. + +DHCPv6 and autoconfiguration also need more work.
"Managed" and "Other" +bits in RA have no special effect to stateful autoconfiguration procedure +in DHCPv6 client program ("Managed" bit actually prevents stateless +autoconfiguration, but no special action will be taken for DHCPv6 client). + +1.5 Generic tunnel interface + +GIF (Generic InterFace) is a pseudo interface for configured tunnel. +Details are described in gif(4) manpage. +Currently + v6 in v6 + v6 in v4 + v4 in v6 + v4 in v4 +are available. Use "gifconfig" to assign physical (outer) source +and destination address to gif interfaces. +Configuration that uses same address family for inner and outer IP +header (v4 in v4, or v6 in v6) is dangerous. It is very easy to +configure interfaces and routing tables to perform infinite level +of tunneling. Please be warned. + +gif can be configured to be ECN-friendly. See 4.5 for ECN-friendliness +of tunnels, and gif(4) manpage for how to configure. + +If you would like to configure an IPv4-in-IPv6 tunnel with gif interface, +read gif(4) carefully. You may need to remove IPv6 link-local address +automatically assigned to the gif interface. + +1.6 Address Selection + +1.6.1 Source Address Selection + +The KAME kernel chooses the source address for an outgoing packet +sent from a user application as follows: + +1. if the source address is explicitly specified via an IPV6_PKTINFO + ancillary data item or the socket option of that name, just use it. + Note that this item/option overrides the bound address of the + corresponding (datagram) socket. + +2. if the corresponding socket is bound, use the bound address. + +3. otherwise, the kernel first tries to find the outgoing interface of + the packet. If it fails, the source address selection also fails. + If the kernel can find an interface, choose the most appropriate + address based on the algorithm described in RFC3484. + + The policy table used in this algorithm is stored in the kernel. + To install or view the policy, use the ip6addrctl(8) command. 
The + kernel does not have pre-installed policy. It is expected that the + default policy described in the draft should be installed at the + bootstrap time using this command. + + This draft allows an implementation to add implementation-specific + rules with higher precedence than the rule "Use longest matching + prefix." KAME's implementation has the following additional rules + (that apply in the appeared order): + + - prefer addresses on alive interfaces, that is, interfaces with + the UP flag being on. This rule is particularly useful for + routers, since some routing daemons stop advertising prefixes + (addresses) on interfaces that have become down. + + - prefer addresses on "preferred" interfaces. "Preferred" + interfaces can be specified by the ndp(8) command. By default, + no interface is preferred, that is, this rule does not apply. + Again, this rule is particularly useful for routers, since there + is a convention, among router administrators, of assigning + "stable" addresses on a particular interface (typically a + loopback interface). + + In any case, addresses that break the scope zone of the + destination, or addresses whose zone do not contain the outgoing + interface are never chosen. + +When the procedure above fails, the kernel usually returns +EADDRNOTAVAIL to the application. + +In some cases, the specification explicitly requires the +implementation to choose a particular source address. The source +address for a Neighbor Advertisement (NA) message is an example. +Under the spec (RFC2461 7.2.2) NA's source should be the target +address of the corresponding NS's target. In this case we follow the +spec rather than the above rule. + +If you would like to prohibit the use of deprecated address for some +reason, configure net.inet6.ip6.use_deprecated to 0. The issue +related to deprecated address is described in RFC2462 5.5.4 (NOTE: +there is some debate underway in IETF ipngwg on how to use +"deprecated" address). 
+ +As documented in the source address selection document, temporary +addresses for privacy extension are less preferred to public addresses +by default. However, for administrators who are particularly aware of +the privacy, there is a system-wide sysctl(3) variable +"net.inet6.ip6.prefer_tempaddr". When the variable is set to +non-zero, the kernel will rather prefer temporary addresses. The +default value of this variable is 0. + +1.6.2 Destination Address Ordering + +KAME's getaddrinfo(3) supports the destination address ordering +algorithm described in RFC3484. Getaddrinfo(3) needs to know the +source address for each destination address and policy entries +(described in the previous section) for the source and destination +addresses. To get the source address, the library function opens a +UDP socket and tries to connect(2) for the destination. To get the +policy entry, the function issues sysctl(3). + +1.7 Jumbo Payload + +KAME supports the Jumbo Payload hop-by-hop option used to send IPv6 +packets with payloads longer than 65,535 octets. But since currently +KAME does not support any physical interface whose MTU is more than +65,535, such payloads can be seen only on the loopback interface(i.e. +lo0). + +If you want to try jumbo payloads, you first have to reconfigure the +kernel so that the MTU of the loopback interface is more than 65,535 +bytes; add the following to the kernel configuration file: + options "LARGE_LOMTU" #To test jumbo payload +and recompile the new kernel. + +Then you can test jumbo payloads by the ping6 command with -b and -s +options. The -b option must be specified to enlarge the size of the +socket buffer and the -s option specifies the length of the packet, +which should be more than 65,535. For example, type as follows; + % ping6 -b 70000 -s 68000 ::1 + +The IPv6 specification requires that the Jumbo Payload option must not +be used in a packet that carries a fragment header. 
If this condition +is violated, an ICMPv6 Parameter Problem message must be sent to the +sender. KAME kernel follows the specification, but you cannot usually +see an ICMPv6 error caused by this requirement. + +When KAME kernel receives an IPv6 packet, it checks the frame length of +the packet and compares it to the length specified in the payload +length field of the IPv6 header or in the value of the Jumbo Payload +option, if any. If the former is shorter than the latter, KAME kernel +discards the packet and increments the statistics. You can see the +statistics as output of netstat command with `-s -p ip6' option: + % netstat -s -p ip6 + ip6: + (snip) + 1 with data size < data length + +So, KAME kernel does not send an ICMPv6 error unless the erroneous +packet is an actual Jumbo Payload, that is, its packet size is more +than 65,535 bytes. As described above, KAME kernel currently does not +support physical interface with such a huge MTU, so it rarely returns an +ICMPv6 error. + +TCP/UDP over jumbogram is not supported at this moment. This is because +we have no medium (other than loopback) to test this. Contact us if you +need this. + +IPsec does not work on jumbograms. This is due to some specification twists +in supporting AH with jumbograms (AH header size influences payload length, +and this makes it really hard to authenticate an inbound packet with jumbo payload +option as well as AH). + +There are fundamental issues in *BSD support for jumbograms. We would like to +address those, but we need more time to finalize the task. To name a few: +- mbuf pkthdr.len field is typed as "int" in 4.4BSD, so it cannot hold + jumbogram with len > 2G on 32bit architecture CPUs. If we would like to + support jumbogram properly, the field must be expanded to hold 4G + + IPv6 header + link-layer header. Therefore, it must be expanded to at least + int64_t (u_int32_t is NOT enough). +- We mistakenly use "int" to hold packet length in many places.
We need + to convert them into a larger numeric type. This needs great care, as we may + experience overflow during packet length computation. +- We mistakenly check the ip6_plen field of the IPv6 header for packet payload + length in various places. We should be checking mbuf pkthdr.len instead. + ip6_input() will perform sanity check on jumbo payload option on input, + and we can safely use mbuf pkthdr.len afterwards. +- TCP code needs careful updates in a bunch of places, of course. + +1.8 Loop prevention in header processing + +The IPv6 specification allows an arbitrary number of extension headers to +be placed onto packets. If we implement IPv6 packet processing +code in the way BSD IPv4 code is implemented, the kernel stack may +overflow due to a long function call chain. KAME sys/netinet6 code +is carefully designed to avoid kernel stack overflow. Because of +this, KAME sys/netinet6 code defines its own protocol switch +structure, as "struct ip6protosw" (see netinet6/ip6protosw.h). + +In addition to this, we restrict the number of extension headers +(including the IPv6 header) in each incoming packet, in order to +prevent a DoS attack that tries to send packets with a massive number +of extension headers. The upper limit can be configured by the sysctl +value net.inet6.ip6.hdrnestlimit. In particular, if the value is 0, +the node will allow an arbitrary number of headers. As of this writing, +the default value is 50. + +IPv4 part (sys/netinet) remains untouched for compatibility. +Because of this, if you receive an IPsec-over-IPv4 packet with a massive +number of IPsec headers, the kernel stack may blow up. IPsec-over-IPv6 is okay. + +1.9 ICMPv6 + +After RFC2463 was published, IETF ipngwg decided to disallow ICMPv6 error +packets against ICMPv6 redirects, to prevent an ICMPv6 storm on a network medium. +KAME already implements this in the kernel. + +RFC2463 requires rate limitation for ICMPv6 error packets generated by a +node, to avoid possible DoS attacks.
KAME kernel implements two rate-limitation +mechanisms, tunable via sysctl: +- Minimum time interval between ICMPv6 error packets + KAME kernel will generate no more than one ICMPv6 error packet, + during configured time interval. net.inet6.icmp6.errratelimit + controls the interval (default: disabled). +- Maximum ICMPv6 error packet-per-second + KAME kernel will generate no more than the configured number of + packets in one second. net.inet6.icmp6.errppslimit controls the + maximum packet-per-second value (default: 200pps) +Basically, we need to pick values that are suitable for the bandwidth +of link layer devices directly attached to the node. In some cases the +default values may not fit well. We are still unsure if the default value +is sane or not. Comments are welcome. + +1.10 Applications + +For userland programming, we support IPv6 socket API as specified in +RFC2553/3493, RFC3542 and upcoming internet drafts. + +TCP/UDP over IPv6 is available and quite stable. You can enjoy "telnet", +"ftp", "rlogin", "rsh", "ssh", etc. These applications are protocol +independent. That is, they automatically choose IPv4 or IPv6 +according to DNS. + +1.11 Kernel Internals + + (*) TCP/UDP part is handled differently between operating system platforms. + See 1.12 for details. + +The current KAME has escaped from the IPv4 netinet logic. While +ip_forward() calls ip_output(), ip6_forward() directly calls +if_output() since routers must not divide IPv6 packets into fragments. + +An ICMPv6 error should contain as much of the original packet as possible, up to +1280 bytes. UDP6/IP6 port unreach, for instance, should contain all +extension headers and the *unchanged* UDP6 and IP6 headers. +So, all IP6 functions except TCP6 never convert network byte +order into host byte order, to save the original packet. + +tcp6_input(), udp6_input() and icmp6_input() can't assume that the IP6 +header immediately precedes the transport headers, due to extension +headers.
So, in6_cksum() was implemented to handle packets whose IP6 +header and transport header are not contiguous. Neither a TCP/IP6 nor a UDP/IP6 +header structure exists for checksum calculation. + +To process IP6 header, extension headers and transport headers easily, +KAME requires network drivers to store packets in one internal mbuf or +one or more external mbufs. A typical old driver prepares two +internal mbufs for 100 - 208 bytes of data; however, KAME's reference +implementation stores it in one external mbuf. + +"netstat -s -p ip6" tells you whether or not your driver conforms to +KAME's requirement. In the following example, "cce0" violates the +requirement. (For more information, refer to Section 2.) + + Mbuf statistics: + 317 one mbuf + two or more mbuf:: + lo0 = 8 + cce0 = 10 + 3282 one ext mbuf + 0 two or more ext mbuf + +Each input function calls IP6_EXTHDR_CHECK in the beginning to check +if the region between IP6 and its header is +contiguous. IP6_EXTHDR_CHECK calls m_pullup() only if the mbuf has +M_LOOP flag, that is, the packet comes from the loopback +interface. m_pullup() is never called for packets coming from physical +network interfaces. + +TCP6 reassembly makes use of the IP6 header to store reassembly +information. IP6 is not supposed to be just before TCP6, so +ip6tcpreass structure has a pointer to TCP6 header. Of course, it also +has a pointer back to mbuf to avoid m_pullup(). + +Like TCP6, both IP and IP6 reassembly functions never call m_pullup(). + +xxx_ctlinput() calls in_mrejoin() on PRC_IFNEWADDR. We think this is +one of 4.4BSD implementation flaws. Since 4.4BSD keeps ia_multiaddrs +in in_ifaddr{}, it can't use multicast feature if the interface has no +unicast address. So, if an application joins a multicast group on an interface and then +all unicast addresses are removed from the interface, the application +can't send/receive any multicast packets. Moreover, if a new unicast +address is assigned to the interface, in_mrejoin() must be called.
+KAME's interfaces, however, have ALWAYS one link-local unicast +address. These extensions have thus not been implemented in KAME. + +1.12 IPv4 mapped address and IPv6 wildcard socket + +RFC2553/3493 describes IPv4 mapped address (3.7) and special behavior +of IPv6 wildcard bind socket (3.8). The spec allows you to: +- Accept IPv4 connections by AF_INET6 wildcard bind socket. +- Transmit IPv4 packet over AF_INET6 socket by using special form of + the address like ::ffff:10.1.1.1. +but the spec itself is very complicated and does not specify how the +socket layer should behave. +Here we call the former one "listening side" and the latter one "initiating +side", for reference purposes. + +Almost all KAME implementations treat tcp/udp port number space separately +between IPv4 and IPv6. You can perform wildcard bind on both of the address +families, on the same port. + +There are some OS-platform differences in KAME code, as we use tcp/udp +code from different origin. The following table summarizes the behavior. + + listening side initiating side + (AF_INET6 wildcard (connection to ::ffff:10.1.1.1) + socket gets IPv4 conn.) + --- --- +KAME/BSDI3 not supported not supported +KAME/FreeBSD228 not supported not supported +KAME/FreeBSD3x configurable supported + default: enabled +KAME/FreeBSD4x configurable supported + default: enabled +KAME/NetBSD configurable supported + default: disabled +KAME/BSDI4 enabled supported +KAME/OpenBSD not supported not supported + +The following sections will give you more details, and how you can +configure the behavior. + +Comments on listening side: + +It looks that RFC2553/3493 talks too little on wildcard bind issue, +specifically on (1) port space issue, (2) failure mode, (3) relationship +between AF_INET/INET6 wildcard bind like ordering constraint, and (4) behavior +when conflicting socket is opened/closed. There can be several separate +interpretation for this RFC which conform to it but behaves differently. 
+So, to implement a portable application you should assume nothing +about the behavior in the kernel. Using getaddrinfo() is the safest way. +Port number space and wildcard bind issues were discussed in detail +on ipv6imp mailing list, in mid March 1999, and it appears that there is +no concrete consensus (that is, it is up to implementers). You may want to +check the mailing list archives. +We supply a tool called "bindtest" that explores the behavior of +kernel bind(2). The tool will not be compiled by default. + +If a server application would like to accept IPv4 and IPv6 connections, +it should use AF_INET and AF_INET6 sockets (you'll need two sockets). +Use getaddrinfo() with AI_PASSIVE set in ai_flags, and socket(2) and bind(2) +to all the addresses returned. +By opening multiple sockets, you can accept connections onto the socket with +proper address family. IPv4 connections will be accepted by AF_INET socket, +and IPv6 connections will be accepted by AF_INET6 socket (NOTE: KAME/BSDI4 +kernel sometimes violates this - we will fix it). + +If you try to support IPv6 traffic only and would like to reject IPv4 +traffic, always check the peer address when a connection is made toward +AF_INET6 listening socket. If the address is IPv4 mapped address, you may +want to reject the connection. You can check the condition by using +IN6_IS_ADDR_V4MAPPED() macro. This is one of the reasons the author of +the section (itojun) dislikes special behavior of AF_INET6 wildcard bind. + +Comments on initiating side: + +Advice to application implementers: to implement a portable IPv6 application +(which works on multiple IPv6 kernels), we believe that the following +is the key to success: +- NEVER hardcode AF_INET nor AF_INET6. +- Use getaddrinfo() and getnameinfo() throughout the system. + Never use gethostby*(), getaddrby*(), inet_*() or getipnodeby*(). +- If you would like to connect to a destination, use getaddrinfo() and try + all the destinations returned, like telnet does.
+- Some of the IPv6 stack is shipped with buggy getaddrinfo(). Ship a minimal + working version with your application and use that as last resort. + +If you would like to use AF_INET6 socket for both IPv4 and IPv6 outgoing +connection, you will need tweaked implementation in DNS support libraries, +as documented in RFC2553/3493 6.1. KAME libinet6 includes the tweak in +getipnodebyname(). Note that getipnodebyname() itself is not recommended as +it does not handle scoped IPv6 addresses at all. For IPv6 name resolution +getaddrinfo() is the preferred API. getaddrinfo() does not implement the +tweak. + +When writing applications that make outgoing connections, story goes much +simpler if you treat AF_INET and AF_INET6 as totally separate address family. +{set,get}sockopt issue goes simpler, DNS issue will be made simpler. We do +not recommend you to rely upon IPv4 mapped address. + +1.12.1 KAME/BSDI3 and KAME/FreeBSD228 + +The platforms do not support IPv4 mapped address at all (both listening side +and initiating side). AF_INET6 and AF_INET sockets are totally separated. + +Port number space is totally separate between AF_INET and AF_INET6 sockets. + +It should be noted that KAME/BSDI3 and KAME/FreeBSD228 are not conformant +to RFC2553/3493 section 3.7 and 3.8. It is due to code sharing reasons. + +1.12.2 KAME/FreeBSD[34]x + +KAME/FreeBSD3x and KAME/FreeBSD4x use shared tcp4/6 code (from +sys/netinet/tcp*) and shared udp4/6 code (from sys/netinet/udp*). +They use unified inpcb/in6pcb structure. + +1.12.2.1 KAME/FreeBSD[34]x, listening side + +The platform can be configured to support IPv4 mapped address/special +AF_INET6 wildcard bind (enabled by default). There is no kernel compilation +option to disable it. You can enable/disable the behavior with sysctl +(per-node), or setsockopt (per-socket). 
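
The per-socket control mentioned above can be sketched as follows. This is only an illustration. It assumes that the relevant socket option is IPV6_V6ONLY, the name used in the conditions listed right after this point, and the helper names set_v6only(), get_v6only() and open_v6only_socket() are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Turn the "IPv6 only" behavior of an AF_INET6 socket on (1) or off (0). */
static int
set_v6only(int s, int on)
{
	assert(s >= 0);
	return setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
}

/* Returns 1 or 0 for the current setting, -1 on error. */
static int
get_v6only(int s)
{
	int on = 0;
	socklen_t len = sizeof(on);

	assert(s >= 0);
	if (getsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &on, &len) == -1)
		return -1;
	return on != 0;
}

/*
 * Open an AF_INET6 socket that will not grab IPv4 traffic, whatever
 * the per-node default is.  Returns the descriptor, or -1 on failure.
 */
static int
open_v6only_socket(int type)
{
	int s = socket(AF_INET6, type, 0);

	if (s == -1)
		return -1;
	if (set_v6only(s, 1) == -1) {
		close(s);
		return -1;
	}
	return s;
}
```

Setting the option explicitly before bind(2) keeps the application independent of the per-node default.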
+ +Wildcard AF_INET6 socket grabs IPv4 connection if and only if the following +conditions are satisfied: +- there's no AF_INET socket that matches the IPv4 connection +- the AF_INET6 socket is configured to accept IPv4 traffic, i.e. + getsockopt(IPV6_V6ONLY) returns 0. + +(XXX need checking) + +1.12.2.2 KAME/FreeBSD[34]x, initiating side + +KAME/FreeBSD3x supports outgoing connection to IPv4 mapped address +(::ffff:10.1.1.1), if the node is configured to accept IPv4 connections +by AF_INET6 socket. + +(XXX need checking) + +1.12.3 KAME/NetBSD + +KAME/NetBSD uses shared tcp4/6 code (from sys/netinet/tcp*) and shared +udp4/6 code (from sys/netinet/udp*). The implementation is made differently +from KAME/FreeBSD[34]x. KAME/NetBSD uses separate inpcb/in6pcb structures, +while KAME/FreeBSD[34]x uses merged inpcb structure. + +It should be noted that the default configuration of KAME/NetBSD is not +conformant to RFC2553/3493 section 3.8. It is intentionally turned off by +default for security reasons. + +The platform can be configured to support IPv4 mapped address/special AF_INET6 +wildcard bind (disabled by default). Kernel behavior can be summarized as +follows: +- default: special support code will be compiled in, but is disabled by + default. It can be controlled by sysctl (net.inet6.ip6.v6only), + or setsockopt(IPV6_V6ONLY). +- add "INET6_BINDV6ONLY": No special support code for AF_INET6 wildcard socket + will be compiled in. AF_INET6 sockets and AF_INET sockets are totally + separate. The behavior is similar to what is described in 1.12.1. + +The sysctl setting will affect per-socket configuration at in6pcb creation time +only. In other words, per-socket configuration will be copied from sysctl +configuration at in6pcb creation time. To change per-socket behavior, you +must perform setsockopt or reopen the socket. A change in sysctl configuration +will not change the behavior of sockets that are already opened.
+ +1.12.3.1 KAME/NetBSD, listening side + +Wildcard AF_INET6 socket grabs IPv4 connection if and only if the following +conditions are satisfied: +- there's no AF_INET socket that matches the IPv4 connection +- the AF_INET6 socket is configured to accept IPv4 traffic, i.e. + getsockopt(IPV6_V6ONLY) returns 0. + +You cannot bind(2) with IPv4 mapped address. This is a workaround for port +number duplication and other twists. + +1.12.3.2 KAME/NetBSD, initiating side + +When getsockopt(IPV6_V6ONLY) is 0 for a socket, you can send outgoing +traffic to an IPv4 destination over AF_INET6 socket, using IPv4 mapped +address destination (::ffff:10.1.1.1). + +When getsockopt(IPV6_V6ONLY) is 1 for a socket, you cannot use IPv4 mapped +address for outgoing traffic. + +1.12.4 KAME/BSDI4 + +KAME/BSDI4 uses NRL-based TCP/UDP stack and inpcb source code, +which was derived from NRL IPv6/IPsec stack. We guess it supports IPv4 mapped +address and special AF_INET6 wildcard bind. The implementation is, again, +different from other KAME/*BSDs. + +1.12.4.1 KAME/BSDI4, listening side + +NRL inpcb layer supports special behavior of AF_INET6 wildcard socket. +There is no way to disable the behavior. + +Wildcard AF_INET6 socket grabs IPv4 connection if and only if the following +condition is satisfied: +- there's no AF_INET socket that matches the IPv4 connection + +1.12.4.2 KAME/BSDI4, initiating side + +KAME/BSDI4 supports connection initiation to IPv4 mapped address +(like ::ffff:10.1.1.1). + +1.12.5 KAME/OpenBSD + +KAME/OpenBSD uses NRL-based TCP/UDP stack and inpcb source code, +which was derived from NRL IPv6/IPsec stack. + +It should be noted that KAME/OpenBSD is not conformant to RFC2553/3493 section +3.7 and 3.8. Support is intentionally omitted for security reasons.
+ +1.12.5.1 KAME/OpenBSD, listening side + +KAME/OpenBSD disables special behavior on AF_INET6 wildcard bind for +security reasons (if IPv4 traffic toward AF_INET6 wildcard bind is allowed, +access control will become much harder). KAME/BSDI4 uses NRL-based TCP/UDP +stack as well; however, the behavior is different due to OpenBSD's security +policy. + +As a result the behavior of KAME/OpenBSD is similar to KAME/BSDI3 and +KAME/FreeBSD228 (see 1.12.1 for more detail). + +1.12.5.2 KAME/OpenBSD, initiating side + +KAME/OpenBSD does not support connection initiation to IPv4 mapped address +(like ::ffff:10.1.1.1). + +1.12.6 More issues + +IPv4 mapped address support adds a big requirement to EVERY userland codebase. +Every piece of userland code should check if an AF_INET6 sockaddr contains an IPv4 +mapped address or not. This adds many twists: + +- Access control code becomes harder to write. + For example, if you would like to reject packets from 10.0.0.0/8, + you need to reject packets to AF_INET socket from 10.0.0.0/8, + and to AF_INET6 socket from ::ffff:10.0.0.0/104. +- If a protocol is defined differently over IPv4 and over IPv6, we need to be + really careful when we determine which protocol to use. + For example, with FTP protocol, we can not simply use sa_family to determine + FTP command sets. The following example is incorrect: + if (sa_family == AF_INET) + use EPSV/EPRT or PASV/PORT; /*IPv4*/ + else if (sa_family == AF_INET6) + use EPSV/EPRT or LPSV/LPRT; /*IPv6*/ + else + error; + The correct code, with consideration to IPv4 mapped address, would be: + if (sa_family == AF_INET) + use EPSV/EPRT or PASV/PORT; /*IPv4*/ + else if (sa_family == AF_INET6 && IPv4 mapped address) + use EPSV/EPRT or PASV/PORT; /*IPv4 command set on AF_INET6*/ + else if (sa_family == AF_INET6 && !IPv4 mapped address) + use EPSV/EPRT or LPSV/LPRT; /*IPv6*/ + else + error; + It is too much to ask for everybody to be careful like this.
+ The problem is, we are not sure if the above code fragment is perfect for + all situations. +- By enabling kernel support for IPv4 mapped address (outgoing direction), + servers on the kernel can be hosed by IPv6 native packet that has IPv4 + mapped address in IPv6 header source, and can generate unwanted IPv4 packets. + draft-itojun-ipv6-transition-abuse-01.txt, + draft-cmetz-v6ops-v4mapped-api-harmful-00.txt, and + draft-itojun-v6ops-v4mapped-harmful-01.txt + have more on this scenario. + +Due to the above twists, some of KAME userland programs have restrictions on +the use of IPv4 mapped addresses: +- rshd/rlogind do not accept connections from IPv4 mapped address. + This is to avoid malicious use of IPv4 mapped address in IPv6 native + packet, to bypass source-address based authentication. +- ftp/ftpd assume that you are on dual stack network. IPv4 mapped address + will be decoded in userland, and will be passed to AF_INET sockets + (in other words, ftp/ftpd do not support SIIT environment). + +1.12.7 Interaction with SIIT translator + +SIIT translator is specified in RFC2765. KAME node cannot become a SIIT +translator box, nor SIIT end node (a node in SIIT cloud). + +To become a SIIT translator box, we would need to add code for that. +We do not have the code in our tree at this moment. + +There are multiple reasons that we are unable to become SIIT end node. +(1) SIIT translators require end nodes in the SIIT cloud to be IPv6-only. +Since we are unable to compile INET-less kernel, we are unable to become +SIIT end node. (2) As presented in 1.12.6, some of our userland code assumes +dual stack network. (3) KAME stack filters out IPv6 packets with IPv4 +mapped address in the header, to secure non-SIIT case (which is much more +common). Effectively KAME node will reject any packets via SIIT translator +box. See section 1.14 for more detail about the last item. + +There are documentation issues too - SIIT document requires very strange +things.
For example, the SIIT document asks an IPv6-only (meaning no IPv4 code)
+node to be able to construct IPv4 IPsec headers. If a node knows how to
+construct IPv4 IPsec headers, it is not an IPv6-only node, it is a dual-stack
+node. The requirements imposed in the SIIT document contradict the other
+parts of the document itself.
+
+1.13 sockaddr_storage
+
+When RFC2553 was about to be finalized, there was discussion on how struct
+sockaddr_storage members should be named. One proposal was to prepend "__"
+to the members (like "__ss_len") as they should not be touched. The other
+proposal was not to prepend it (like "ss_len") as we need to touch those
+members directly. There was no clear consensus on it.
+
+As a result, RFC2553 defines struct sockaddr_storage as follows:
+        struct sockaddr_storage {
+                u_char  __ss_len;       /* address length */
+                u_char  __ss_family;    /* address family */
+                /* and bunch of padding */
+        };
+On the contrary, XNET draft defines it as follows:
+        struct sockaddr_storage {
+                u_char  ss_len;         /* address length */
+                u_char  ss_family;      /* address family */
+                /* and bunch of padding */
+        };
+
+In December 1999, it was agreed that RFC2553bis (RFC3493) should pick the
+latter (XNET) definition.
+
+KAME kit prior to December 1999 used the RFC2553 definition. KAME kit after
+December 1999 (including December) conforms to the XNET definition,
+based on the RFC3493 discussion.
+
+If you look at multiple IPv6 implementations, you will be able to see
+both definitions. As a userland programmer, the most portable way of
+dealing with it is to:
+(1) ensure ss_family and/or ss_len are available on the platform, by using
+    GNU autoconf,
+(2) have -Dss_family=__ss_family to unify all occurrences (including header
+    file) into __ss_family, or
+(3) never touch __ss_family.
cast to sockaddr * and use sa_family like:
+        struct sockaddr_storage ss;
+        family = ((struct sockaddr *)&ss)->sa_family;
+
+1.14 Invalid addresses on the wire
+
+Some IPv6 transition technologies embed an IPv4 address into an IPv6 address.
+These specifications themselves are fine; however, they enable a certain
+set of attacks. Recent specification documents cover those issues, but
+there are already-published RFCs that do not have protection against them
+(like using a source address of ::ffff:127.0.0.1 to bypass a "reject packet
+from remote" filter).
+
+To name a few, these address ranges can be used to hose an IPv6 implementation,
+or bypass security controls:
+- IPv4 mapped address that embeds unspecified/multicast/loopback/broadcast
+  IPv4 address (if they are in IPv6 native packet header, they are malicious)
+        ::ffff:0.0.0.0/104      ::ffff:127.0.0.0/104
+        ::ffff:224.0.0.0/100    ::ffff:255.0.0.0/104
+- 6to4 (RFC3056) prefix generated from unspecified/multicast/loopback/
+  broadcast/private IPv4 address
+        2002:0000::/24  2002:7f00::/24  2002:e000::/20
+        2002:ff00::/24  2002:0a00::/24  2002:ac10::/28
+        2002:c0a8::/32
+- IPv4 compatible address that embeds unspecified/multicast/loopback/broadcast
+  IPv4 address (if they are in IPv6 native packet header, they are malicious).
+  Note that, since KAME does not support RFC1933/2893 auto tunnels, KAME nodes
+  are not vulnerable to these packets.
+        ::0.0.0.0/104   ::127.0.0.0/104 ::224.0.0.0/100 ::255.0.0.0/104
+
+Also, since KAME does not support RFC1933/2893 auto tunnels, seeing IPv4
+compatible addresses is very rare. You should take caution if you see those
+on the wire.
+
+If we see IPv6 packets with IPv4 mapped address (::ffff:0.0.0.0/96) in the
+header in dual-stack environment (not in SIIT environment), they indicate
+that someone is trying to impersonate an IPv4 peer. The packet should be
+dropped.
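
The address ranges listed above can be checked mechanically. The following
is a minimal sketch in C, not the actual KAME filter code. The names
get_v4(), v4_is_bogus(), v4_is_private() and src_addr_is_suspect() are
hypothetical, and the policy (treat every IPv4 mapped source as suspect, as
on a dual-stack, non-SIIT node) is an assumption taken from the text above.
Only the standard IN6_IS_ADDR_V4MAPPED and IN6_IS_ADDR_V4COMPAT macros are
used.

```c
#include <netinet/in.h>
#include <stdint.h>

/* Read 4 bytes in network byte order as a host-order IPv4 address. */
static uint32_t
get_v4(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* unspecified (0/8), loopback (127/8), multicast (224/4), broadcast (255/8) */
static int
v4_is_bogus(uint32_t a)
{
	uint32_t top = a >> 24;

	return top == 0 || top == 127 || (top & 0xf0) == 0xe0 || top == 255;
}

/* private ranges: 10/8, 172.16/12, 192.168/16 */
static int
v4_is_private(uint32_t a)
{
	return (a >> 24) == 10 || (a & 0xfff00000u) == 0xac100000u ||
	    (a & 0xffff0000u) == 0xc0a80000u;
}

/*
 * Hypothetical check: returns 1 if the IPv6 source address falls into one
 * of the ranges listed above, 0 otherwise.
 */
int
src_addr_is_suspect(const struct in6_addr *src)
{
	const uint8_t *b = src->s6_addr;

	/* dual-stack, non-SIIT assumption: any ::ffff:0.0.0.0/96 is suspect */
	if (IN6_IS_ADDR_V4MAPPED(src))
		return 1;

	/* 6to4 (2002::/16): the IPv4 address sits in bytes 2..5 */
	if (b[0] == 0x20 && b[1] == 0x02) {
		uint32_t v4 = get_v4(b + 2);

		return v4_is_bogus(v4) || v4_is_private(v4);
	}

	/* IPv4 compatible: the IPv4 address sits in the last 4 bytes */
	if (IN6_IS_ADDR_V4COMPAT(src))
		return v4_is_bogus(get_v4(b + 12));

	return 0;
}
```

A caller would fill a struct in6_addr (for example with inet_pton(3)) and
drop the packet when the function returns 1. Whether to reject all IPv4
mapped sources depends on the SIIT consideration discussed in 1.12.7.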
+
+IPv6 specifications do not talk very much about IPv6 unspecified address (::)
+in the IPv6 source address field. Clarification is in progress.
+Here are a couple of comments:
+- IPv6 unspecified address can be used in IPv6 source address field, if and
+  only if we have no legal source address for the node. The legal situations
+  include, but may not be limited to, (1) MLD while no IPv6 address is assigned
+  to the node and (2) DAD.
+- If an IPv6 TCP packet has the IPv6 unspecified address as its source, it is
+  an attack attempt. The form can be used as a trigger for TCP DoS attack.
+  KAME code already filters them out.
+- The following examples are seemingly illegal. It seems that there's general
+  consensus among ipngwg for those. (1) Mobile IPv6 home address option,
+  (2) offlink packets (so routers should not forward them).
+  KAME implements (2) already.
+
+KAME code is carefully written to avoid such incidents. More specifically,
+KAME kernel will reject packets with certain source/destination addresses in
+the IPv6 base header, or IPv6 routing header. Also, KAME default configuration
+file is written carefully, to avoid those attacks.
+
+draft-itojun-ipv6-transition-abuse-01.txt,
+draft-cmetz-v6ops-v4mapped-api-harmful-00.txt and
+draft-itojun-v6ops-v4mapped-harmful-01.txt have more on this issue.
+
+1.15 Node's required addresses
+
+RFC2373 section 2.8 talks about required addresses for an IPv6
+node. This section talks about how KAME stack manages those required
+addresses.
+
+1.15.1 Host case
+
+The following items are automatically assigned to the node (or the node will
+automatically join the group), at bootstrap time:
+- Loopback address
+- All-nodes multicast addresses (ff01::1)
+
+The following items will be automatically handled when the interface becomes
+IFF_UP:
+- Its link-local address for each interface
+- Solicited-node multicast address for link-local addresses
+- Link-local allnodes multicast address (ff02::1)
+
+The following items need to be configured manually by ifconfig(8) or prefix(8).
+Alternatively, these can be autoconfigured by using stateless address
+autoconfiguration.
+- Assigned unicast/anycast addresses
+- Solicited-Node multicast address for assigned unicast address
+
+Users can join groups by using appropriate system calls like setsockopt(2).
+
+1.15.2 Router case
+
+In addition to the above, routers need to handle the following items.
+
+The following items need to be configured manually by using ifconfig(8).
+o The subnet-router anycast addresses for the interfaces it is configured
+  to act as a router on (prefix::/64)
+o All other anycast addresses with which the router has been configured
+
+The router will join the following multicast group when rtadvd(8) is available
+for the interface.
+o All-Routers Multicast Addresses (ff02::2)
+
+Routing daemons will join appropriate multicast groups, as necessary,
+like ff02::9 for RIPng.
+
+Users can join groups by using appropriate system calls like setsockopt(2).
+
+1.16 Advanced API
+
+Current KAME kernel implements RFC3542 API. It also implements RFC2292 API,
+for backward compatibility purposes with *BSD-integrated codebase.
+KAME tree ships with RFC3542 headers.
+*BSD-integrated codebase implements either RFC2292, or RFC3542, API.
+See "COVERAGE" document for detailed implementation status.
+
+Here are a couple of issues to mention:
+- *BSD-integrated binaries, compiled for RFC2292, will work on KAME kernel.
+  For example, OpenBSD 2.7 /sbin/rtsol will work on KAME/openbsd kernel.
+- KAME binaries, compiled using RFC3542, will not work on *BSD-integrated
+  kernel. For example, KAME /usr/local/v6/sbin/rtsol will not work on
+  OpenBSD 2.7 kernel.
+- RFC3542 API is not compatible with RFC2292 API. RFC3542 #define symbols
+  conflict with RFC2292 symbols. Therefore, if you compile programs that
+  assume RFC2292 API, the compilation itself goes fine; however, the compiled
+  binary will not work correctly. The problem is not a KAME issue, but an API
+  issue. For example, Solaris 8 implements RFC3542 API. If you compile
+  RFC2292-based code on Solaris 8, the binary can behave strangely.
+
+There are a few incompatible behaviors in RFC2292 binary backward
+compatibility support in KAME tree. To enumerate:
+- Type 0 routing header lacks support for strict/loose bitmap.
+  Even if we see packets with "strict" bit set, those bits will not be made
+  visible to the userland.
+  Background: RFC2292 document is based on RFC1883 IPv6, and it uses
+  strict/loose bitmap. RFC3542 document is based on RFC2460 IPv6, and it has
+  no strict/loose bitmap (it was removed from RFC2460). KAME tree obeys
+  RFC2460 IPv6, and lacks support for strict/loose bitmap.
+
+The RFC3542 document leaves some particular cases unspecified. The
+KAME implementation treats them as follows:
+- The IPV6_DONTFRAG and IPV6_RECVPATHMTU socket options for TCP
+  sockets are ignored. That is, the setsockopt() call will succeed
+  but the specified value will have no effect.
+
+1.17 DNS resolver
+
+KAME ships with a modified DNS resolver, in libinet6.a.
+libinet6.a has a couple of extensions against libc DNS resolver:
+- Can take "options insecure1" and "options insecure2" in /etc/resolv.conf,
+  which toggle RES_INSECURE[12] option flag bits.
+- EDNS0 receive buffer size notification support. It can be enabled by
+  "options edns0" in /etc/resolv.conf. See USAGE for details.
+- IPv6 transport support (queries/responses over IPv6). Most official BSD
+  releases now have it already.
+- Partial A6 chain chasing/DNAME/bit string label support (KAME/BSDI4).
+
+
+2. Network Drivers
+
+KAME requires two items to be added into the standard drivers:
+
+(1) (freebsd[234] and bsdi[34] only) mbuf clustering requirement.
+    In this stable release, we changed MINCLSIZE into MHLEN+1 for all the
+    operating systems in order to make all the drivers behave as we expect.
+
+(2) multicast. If "ifmcstat" yields no multicast group for an
+    interface, that interface has to be patched.
+
+To avoid trouble, we suggest that you comment out the device drivers
+for unsupported/unnecessary cards from the kernel configuration file.
+If you accidentally enable unsupported drivers, some of the userland
+tools may not work correctly (routing daemons are a typical example).
+
+In the following sections, "official support" means that KAME developers
+are using that ethernet card/driver frequently.
+
+(NOTE: In the past we required all pcmcia drivers to have a call to
+in6_ifattach(). We have no such requirement any more)
+
+2.1 FreeBSD 2.2.x-RELEASE
+
+Here is a list of FreeBSD 2.2.x-RELEASE drivers and their conditions:
+
+        driver  mbuf(1)         multicast(2)    official support?
+        ---     ---             ---             ---
+        (Ethernet)
+        ar      looks ok        -               -
+        cnw     ok              ok              yes (*)
+        ed      ok              ok              yes
+        ep      ok              ok              yes
+        fe      ok              ok              yes
+        sn      looks ok        -               -   (*)
+        vx      looks ok        -               -
+        wlp     ok              ok              -   (*)
+        xl      ok              ok              yes
+        zp      ok              ok              -
+        (FDDI)
+        fpa     looks ok        ?               -
+        (ATM)
+        en      ok              ok              yes
+        (Serial)
+        lp      ?               -               not work
+        sl      ?               -               not work
+        sr      looks ok        ok              -   (**)
+
+You may want to add an invocation of "rtsol" in "/etc/pccard_ether",
+if you are using notebook computers and PCMCIA ethernet card.
+
+(*) These drivers are distributed with PAO (http://www.jp.freebsd.org/PAO/).
+
+(**) There was a report that, if you bring the sr driver up, down and
+then up again, the kernel may hang.
We have disabled frame-relay support in the
+sr driver and after that it looks to be working fine. If you need
+frame-relay support to come back, please contact KAME developers.
+
+2.2 BSD/OS 3.x
+
+The following lists BSD/OS 3.x device drivers and their conditions:
+
+        driver  mbuf(1)         multicast(2)    official support?
+        ---     ---             ---             ---
+        (Ethernet)
+        cnw     ok              ok              yes
+        de      ok              ok              -
+        df      ok              ok              -
+        eb      ok              ok              -
+        ef      ok              ok              yes
+        exp     ok              ok              -
+        mz      ok              ok              yes
+        ne      ok              ok              yes
+        we      ok              ok              -
+        (FDDI)
+        fpa     ok              ok              -
+        (ATM)
+        en      maybe           ok              -
+        (Serial)
+        ntwo    ok              ok              yes
+        sl      ?               -               not work
+        appp    ?               -               not work
+
+You may want to use "@insert" directive in /etc/pccard.conf to invoke
+"rtsol" command right after dynamic insertion of PCMCIA ethernet cards.
+
+2.3 NetBSD
+
+The following table lists the network drivers we have tried so far.
+
+        driver                  mbuf(1)         multicast(2)    official support?
+        ---                     ---             ---             ---
+        (Ethernet)
+        awi     pcmcia/i386     ok              ok              -
+        bah     zbus/amiga      NG(*)
+        cnw     pcmcia/i386     ok              ok              yes
+        ep      pcmcia/i386     ok              ok              -
+        fxp     pci/i386        ok(*2)          ok              -
+        tlp     pci/i386        ok              ok              -
+        le      sbus/sparc      ok              ok              yes
+        ne      pci/i386        ok              ok              yes
+        ne      pcmcia/i386     ok              ok              yes
+        rtk     pci/i386        ok              ok              -
+        wi      pcmcia/i386     ok              ok              yes
+        (ATM)
+        en      pci/i386        ok              ok              -
+
+(*) This may need some fix, but I'm not sure what arcnet interfaces assume...
+
+2.4 FreeBSD 3.x-RELEASE
+
+Here is a list of FreeBSD 3.x-RELEASE drivers and their conditions:
+
+        driver  mbuf(1)         multicast(2)    official support?
+        ---     ---             ---             ---
+        (Ethernet)
+        cnw     ok              ok              -(*)
+        ed      ?               ok              -
+        ep      ok              ok              -
+        fe      ok              ok              yes
+        fxp     ?(**)
+        lnc     ?               ok              -
+        sn      ?               ?               -(*)
+        wi      ok              ok              yes
+        xl      ?               ok              -
+
+(*) These drivers are distributed with PAO as PAO3
+    (http://www.jp.freebsd.org/PAO/).
+(**) There were trouble reports with multicast filter initialization.
+
+More drivers will simply work on KAME FreeBSD 3.x-RELEASE but have not
+been checked yet.
+
+2.5 FreeBSD 4.x-RELEASE
+
+Here is a list of FreeBSD 4.x-RELEASE drivers and their conditions:
+
+        driver          multicast
+        ---             ---
+        (Ethernet)
+        lnc/vmware      ok
+
+2.6 OpenBSD 2.x
+
+Here is a list of OpenBSD 2.x drivers and their conditions:
+
+        driver                  mbuf(1)         multicast(2)    official support?
+        ---                     ---             ---             ---
+        (Ethernet)
+        de      pci/i386        ok              ok              yes
+        fxp     pci/i386        ?(*)
+        le      sbus/sparc      ok              ok              yes
+        ne      pci/i386        ok              ok              yes
+        ne      pcmcia/i386     ok              ok              yes
+        wi      pcmcia/i386     ok              ok              yes
+
+(*) There seems to be some problem in the driver, with multicast filter
+configuration. This happens with certain revisions of the chipset on the card.
+Should be fixed by now by a workaround in sys/net/if.c, but still not sure.
+
+2.7 BSD/OS 4.x
+
+The following lists BSD/OS 4.x device drivers and their conditions:
+
+        driver  mbuf(1)         multicast(2)    official support?
+        ---     ---             ---             ---
+        (Ethernet)
+        de      ok              ok              yes
+        exp     (*)
+
+You may want to use "@insert" directive in /etc/pccard.conf to invoke
+"rtsol" command right after dynamic insertion of PCMCIA ethernet cards.
+
+(*) exp driver has a serious conflict with KAME initialization sequence.
+A workaround is committed into sys/i386/pci/if_exp.c, and should be okay by now.
+
+
+3. Translator
+
+We categorize IPv4/IPv6 translators into 4 types.
+
+Translator A --- It is used in the early stage of transition to make
+it possible to establish a connection from an IPv6 host in an IPv6
+island to an IPv4 host in the IPv4 ocean.
+
+Translator B --- It is used in the early stage of transition to make
+it possible to establish a connection from an IPv4 host in the IPv4
+ocean to an IPv6 host in an IPv6 island.
+
+Translator C --- It is used in the late stage of transition to make it
+possible to establish a connection from an IPv4 host in an IPv4 island
+to an IPv6 host in the IPv6 ocean.
+
+Translator D --- It is used in the late stage of transition to make it
+possible to establish a connection from an IPv6 host in the IPv6 ocean
+to an IPv4 host in an IPv4 island.
+
+KAME provides a TCP relay translator for category A. This is called
+"FAITH". We also provide IP header translator for category A.
+
+3.1 FAITH TCP relay translator
+
+FAITH system uses a TCP relay daemon called "faithd", helped by the KAME
+kernel. FAITH will reserve an IPv6 address prefix, and relay TCP connections
+toward that prefix to the IPv4 destination.
+
+For example, if the reserved IPv6 prefix is 3ffe:0501:0200:ffff::, and
+the IPv6 destination for TCP connection is 3ffe:0501:0200:ffff::163.221.202.12,
+the connection will be relayed toward IPv4 destination 163.221.202.12.
+
+        destination IPv4 node (163.221.202.12)
+          ^
+          | IPv4 tcp toward 163.221.202.12
+        FAITH-relay dual stack node
+          ^
+          | IPv6 TCP toward 3ffe:0501:0200:ffff::163.221.202.12
+        source IPv6 node
+
+faithd must be invoked on FAITH-relay dual stack node.
+
+For more details, consult kame/kame/faithd/README and RFC3142.
+
+3.2 IPv6-to-IPv4 header translator
+
+(to be written)
+
+
+4. IPsec
+
+IPsec is implemented as the following three components.
+
+(1) Policy Management
+(2) Key Management
+(3) AH, ESP and IPComp handling in kernel
+
+Note that KAME/OpenBSD does NOT include support for KAME IPsec code,
+as OpenBSD team has their home-brew IPsec stack and they have no plan
+to replace it. IPv6 support for IPsec is, therefore, lacking on KAME/OpenBSD.
+
+http://www.netbsd.org/Documentation/network/ipsec/ has more information
+including usage examples.
+
+4.1 Policy Management
+
+The kernel implements experimental policy management code. There are two ways
+to manage security policy. One is to configure per-socket policy using
+setsockopt(2). In this case, policy configuration is described in
+ipsec_set_policy(3). The other is to configure kernel packet filter-based
+policy using PF_KEY interface, via setkey(8).
+
+The policy entries will be matched in order. The order of entries makes a
+difference in behavior.
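
To see why the order of entries matters, here is a minimal sketch in C of a
first-match lookup over an ordered policy list. It is a toy model, not the
kernel SPD code: struct sp_entry, enum sp_action and sp_lookup() are
hypothetical names, and the entries match on an IPv4 destination prefix
only, while real policies also match on source, protocol and direction.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy policy actions; SP_NOMATCH means "no entry matched". */
enum sp_action { SP_NOMATCH, SP_NONE, SP_DISCARD, SP_IPSEC };

/* Hypothetical policy entry: destination prefix/mask in host byte order. */
struct sp_entry {
	uint32_t dst;
	uint32_t mask;
	enum sp_action action;
};

/* Walk the list in order and return the action of the first match. */
enum sp_action
sp_lookup(const struct sp_entry *spd, size_t n, uint32_t dst)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if ((dst & spd[i].mask) == (spd[i].dst & spd[i].mask))
			return spd[i].action;
	}
	return SP_NOMATCH;
}
```

With the entries "10.0.0.0/8 -> ipsec" followed by "10.1.0.0/16 -> none",
a packet to 10.1.2.3 hits the first entry and gets "ipsec". With the two
entries swapped, the same packet gets "none".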
+ +4.2 Key Management + +The key management code implemented in this kit (sys/netkey) is a +home-brew PFKEY v2 implementation. This conforms to RFC2367. + +The home-brew IKE daemon, "racoon" is included in the kit (kame/kame/racoon, +or usr.sbin/racoon). +Basically you'll need to run racoon as daemon, then setup a policy +to require keys (like ping -P 'out ipsec esp/transport//use'). +The kernel will contact racoon daemon as necessary to exchange keys. + +In IKE spec, there's ambiguity about interpretation of "tunnel" proposal. +For example, if we would like to propose the use of following packet: + IP AH ESP IP payload +some implementation proposes it as "AH transport and ESP tunnel", since +this is more logical from packet construction point of view. Some +implementation proposes it as "AH tunnel and ESP tunnel". +Racoon follows the latter route (previously it followed the former, and +the latter interpretation seems to be popular/consensus). +This raises real interoperability issue. We hope this to be resolved quickly. + +racoon does not implement byte lifetime for both phase 1 and phase 2 +(RFC2409 page 35, Life Type = kilobytes). + +4.3 AH and ESP handling + +IPsec module is implemented as "hooks" to the standard IPv4/IPv6 +processing. When sending a packet, ip{,6}_output() checks if ESP/AH +processing is required by checking if a matching SPD (Security +Policy Database) is found. If ESP/AH is needed, +{esp,ah}{4,6}_output() will be called and mbuf will be updated +accordingly. When a packet is received, {esp,ah}4_input() will be +called based on protocol number, i.e. (*inetsw[proto])(). +{esp,ah}4_input() will decrypt/check authenticity of the packet, +and strips off daisy-chained header and padding for ESP/AH. It is +safe to strip off the ESP/AH header on packet reception, since we +will never use the received packet in "as is" form. + +By using ESP/AH, TCP4/6 effective data segment size will be affected by +extra daisy-chained headers inserted by ESP/AH. 
Our code takes care of
+the case.
+
+Basic crypto functions can be found in directory "sys/crypto". ESP/AH
+transforms are listed in {esp,ah}_core.c with wrapper functions. If you
+wish to add an algorithm, add a wrapper function in {esp,ah}_core.c, and
+add your crypto algorithm code into sys/crypto.
+
+Tunnel mode works basically fine, but comes with the following restrictions:
+- You cannot run routing daemon across IPsec tunnel, since we do not model
+  IPsec tunnel as pseudo interfaces.
+- Authentication model for AH tunnel must be revisited. We'll need to
+  improve the policy management engine, eventually.
+- Path MTU discovery does not work across IPv6 IPsec tunnel gateway due to
+  insufficient code.
+
+AH specification does not talk much about "multiple AH on a packet" case.
+We incrementally compute AH checksum, from inside to outside. Also, we
+treat inner AH to be immutable.
+For example, if we are to create the following packet:
+        IP AH1 AH2 AH3 payload
+we do it incrementally. As a result, we get crypto checksums like below:
+        AH3 has checksum against "IP AH3' payload".
+            where AH3' = AH3 with checksum field filled with 0.
+        AH2 has checksum against "IP AH2' AH3 payload".
+        AH1 has checksum against "IP AH1' AH2 AH3 payload".
+Also note that AH3 has the smallest sequence number, and AH1 has the largest
+sequence number.
+
+To avoid traffic analysis on shorter packets, ESP output logic supports
+random length padding. By setting net.inet.ipsec.esp_randpad (or
+net.inet6.ipsec6.esp_randpad) to positive value N, you can ask the kernel
+to randomly pad packets shorter than N bytes, to random length smaller than
+or equal to N. Note that N does not include ESP authentication data length.
+Also note that the random padding is not included in TCP segment
+size computation. Negative value will turn off the functionality.
+Recommended value for N is something like 128 or 256. If you use too big a
+number as N, you may experience inefficiency due to fragmented packets.
+ +4.4 IPComp handling + +IPComp stands for IP payload compression protocol. This is aimed for +payload compression, not the header compression like PPP VJ compression. +This may be useful when you are using slow serial link (say, cell phone) +with powerful CPU (well, recent notebook PCs are really powerful...). +The protocol design of IPComp is very similar to IPsec, though it was +defined separately from IPsec itself. + +Here are some points to be noted: +- IPComp is treated as part of IPsec protocol suite, and SPI and + CPI space is unified. Spec says that there's no relationship + between two so they are assumed to be separate in specs. +- IPComp association (IPCA) is kept in SAD. +- It is possible to use well-known CPI (CPI=2 for DEFLATE for example), + for outbound/inbound packet, but for indexing purposes one element from + SPI/CPI space will be occupied anyway. +- pfkey is modified to support IPComp. However, there's no official + SA type number assignment yet. Portability with other IPComp + stack is questionable (anyway, who else implement IPComp on UN*X?). +- Spec says that IPComp output processing must be performed before AH/ESP + output processing, to achieve better compression ratio and "stir" data + stream before encryption. The most meaningful processing order is: + (1) compress payload by IPComp, (2) encrypt payload by ESP, then (3) attach + authentication data by AH. + However, with manual SPD setting, you are able to violate the ordering + (KAME code is too generic, maybe). Also, it is just okay to use IPComp + alone, without AH/ESP. +- Though the packet size can be significantly decreased by using IPComp, no + special consideration is made about path MTU (spec talks nothing about MTU + consideration). IPComp is designed for serial links, not ethernet-like + medium, it seems. +- You can change compression ratio on outbound packet, by changing + deflate_policy in sys/netinet6/ipcomp_core.c. 
You can also change outbound + history buffer size by changing deflate_window_out in the same source code. + (should it be sysctl accessible, or per-SAD configurable?) +- Tunnel mode IPComp is not working right. KAME box can generate tunnelled + IPComp packet, however, cannot accept tunneled IPComp packet. +- You can negotiate IPComp association with racoon IKE daemon. +- KAME code does not attach Adler32 checksum to compressed data. + see ipsec wg mailing list discussion in Jan 2000 for details. + +4.5 Conformance to RFCs and IDs + +The IPsec code in the kernel conforms (or, tries to conform) to the +following standards: + "old IPsec" specification documented in rfc182[5-9].txt + "new IPsec" specification documented in: + rfc240[1-6].txt rfc241[01].txt rfc2451.txt rfc3602.txt + IPComp: + RFC2393: IP Payload Compression Protocol (IPComp) +IKE specifications (rfc240[7-9].txt) are implemented in userland +as "racoon" IKE daemon. + +Currently supported algorithms are: + old IPsec AH + null crypto checksum (no document, just for debugging) + keyed MD5 with 128bit crypto checksum (rfc1828.txt) + keyed SHA1 with 128bit crypto checksum (no document) + HMAC MD5 with 128bit crypto checksum (rfc2085.txt) + HMAC SHA1 with 128bit crypto checksum (no document) + HMAC RIPEMD160 with 128bit crypto checksum (no document) + old IPsec ESP + null encryption (no document, similar to rfc2410.txt) + DES-CBC mode (rfc1829.txt) + new IPsec AH + null crypto checksum (no document, just for debugging) + keyed MD5 with 96bit crypto checksum (no document) + keyed SHA1 with 96bit crypto checksum (no document) + HMAC MD5 with 96bit crypto checksum (rfc2403.txt + HMAC SHA1 with 96bit crypto checksum (rfc2404.txt) + HMAC SHA2-256 with 96bit crypto checksum (draft-ietf-ipsec-ciph-sha-256-00.txt) + HMAC SHA2-384 with 96bit crypto checksum (no document) + HMAC SHA2-512 with 96bit crypto checksum (no document) + HMAC RIPEMD160 with 96bit crypto checksum (RFC2857) + AES XCBC MAC with 96bit crypto 
checksum (RFC3566) + new IPsec ESP + null encryption (rfc2410.txt) + DES-CBC with derived IV + (draft-ietf-ipsec-ciph-des-derived-01.txt, draft expired) + DES-CBC with explicit IV (rfc2405.txt) + 3DES-CBC with explicit IV (rfc2451.txt) + BLOWFISH CBC (rfc2451.txt) + CAST128 CBC (rfc2451.txt) + RIJNDAEL/AES CBC (rfc3602.txt) + AES counter mode (rfc3686.txt) + + each of the above can be combined with new IPsec AH schemes for + ESP authentication. + IPComp + RFC2394: IP Payload Compression Using DEFLATE + +The following algorithms are NOT supported: + old IPsec AH + HMAC MD5 with 128bit crypto checksum + 64bit replay prevention + (rfc2085.txt) + keyed SHA1 with 160bit crypto checksum + 32bit padding (rfc1852.txt) + +The key/policy management API is based on the following document, with fair +amount of extensions: + RFC2367: PF_KEY key management API + +4.6 ECN consideration on IPsec tunnels + +KAME IPsec implements ECN-friendly IPsec tunnel, described in +draft-ietf-ipsec-ecn-02.txt. +Normal IPsec tunnel is described in RFC2401. On encapsulation, +IPv4 TOS field (or, IPv6 traffic class field) will be copied from inner +IP header to outer IP header. On decapsulation outer IP header +will be simply dropped. The decapsulation rule is not compatible +with ECN, since ECN bit on the outer IP TOS/traffic class field will be +lost. +To make IPsec tunnel ECN-friendly, we should modify encapsulation +and decapsulation procedure. This is described in +draft-ietf-ipsec-ecn-02.txt, chapter 3.3. + +KAME IPsec tunnel implementation can give you three behaviors, by setting +net.inet.ipsec.ecn (or net.inet6.ipsec6.ecn) to some value: +- RFC2401: no consideration for ECN (sysctl value -1) +- ECN forbidden (sysctl value 0) +- ECN allowed (sysctl value 1) +Note that the behavior is configurable in per-node manner, not per-SA manner +(draft-ietf-ipsec-ecn-02 wants per-SA configuration, but it looks too much +for me). 
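
The three settings can be sketched in C as follows. This is a minimal
illustration, not the KAME source: the names ecn_mode, ecn_encap() and
ecn_decap() are hypothetical. The masks 0xfc and 0xfe come from the summary
table in this section, and the bit values (CE = 0x01, ECT = 0x02) follow
RFC2481.

```c
#include <stdint.h>

#define ECN_BIT_CE	0x01	/* Congestion Experienced (RFC2481) */
#define ECN_BIT_ECT	0x02	/* ECN-Capable Transport (RFC2481) */

/* Values mirror the sysctl settings described above. */
enum ecn_mode {
	ECN_RFC2401 = -1,	/* no consideration for ECN */
	ECN_FORBIDDEN = 0,
	ECN_ALLOWED = 1
};

/* Encapsulation: compute the outer TOS/traffic class from the inner one. */
uint8_t
ecn_encap(enum ecn_mode mode, uint8_t inner)
{
	switch (mode) {
	case ECN_FORBIDDEN:
		/* copy all but the ECN bits (mask 0xfc) */
		return inner & (uint8_t)~(ECN_BIT_ECT | ECN_BIT_CE);
	case ECN_ALLOWED:
		/* copy all but the CE bit (mask 0xfe) */
		return inner & (uint8_t)~ECN_BIT_CE;
	case ECN_RFC2401:
	default:
		return inner;	/* copy all TOS bits */
	}
}

/* Decapsulation: compute the inner TOS/traffic class after the outer
 * header has been dropped. */
uint8_t
ecn_decap(enum ecn_mode mode, uint8_t outer, uint8_t inner)
{
	/* only "ECN allowed" propagates CE from outer to inner */
	if (mode == ECN_ALLOWED && (outer & ECN_BIT_CE))
		return inner | ECN_BIT_CE;
	return inner;		/* use inner TOS bits as is */
}
```

In the "ECN allowed" case a congestion mark set by a router inside the
tunnel survives decapsulation; in the other two cases it is lost with the
outer header.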
+ +The behavior is summarized as follows (see source code for more detail): + + encapsulate decapsulate + --- --- +RFC2401 copy all TOS bits drop TOS bits on outer + from inner to outer. (use inner TOS bits as is) + +ECN forbidden copy TOS bits except for ECN drop TOS bits on outer + (masked with 0xfc) from inner (use inner TOS bits as is) + to outer. set ECN bits to 0. + +ECN allowed copy TOS bits except for ECN use inner TOS bits with some + CE (masked with 0xfe) from change. if outer ECN CE bit + inner to outer. is 1, enable ECN CE bit on + set ECN CE bit to 0. the inner. + +General strategy for configuration is as follows: +- if both IPsec tunnel endpoint are capable of ECN-friendly behavior, + you'd better configure both end to "ECN allowed" (sysctl value 1). +- if the other end is very strict about TOS bit, use "RFC2401" + (sysctl value -1). +- in other cases, use "ECN forbidden" (sysctl value 0). +The default behavior is "ECN forbidden" (sysctl value 0). + +For more information, please refer to: + draft-ietf-ipsec-ecn-02.txt + RFC2481 (Explicit Congestion Notification) + KAME sys/netinet6/{ah,esp}_input.c + +(Thanks goes to Kenjiro Cho <kjc@csl.sony.co.jp> for detailed analysis) + +4.7 Interoperability + +IPsec, IPComp (in kernel) and IKE (in userland as "racoon") has been tested +at several interoperability test events, and it is known to interoperate +with many other implementations well. Also, KAME IPsec has quite wide +coverage for IPsec crypto algorithms documented in RFC (we do not cover +algorithms with intellectual property issues, though). + +Here are (some of) platforms we have tested IPsec/IKE interoperability +in the past, no particular order. Note that both ends (KAME and +others) may have modified their implementation, so use the following +list just for reference purposes. 
+        6WIND, ACC, Allied-telesis, Altiga, Ashley-laurent (vpcom.com),
+        BlueSteel, CISCO IOS, Checkpoint FW-1, Compaq Tru64 UNIX
+        X5.1B-BL4, Cryptek, Data Fellows (F-Secure), Ericsson,
+        F-Secure VPN+ 5.40, Fitec, Fitel, FreeS/WAN, HITACHI, HiFn,
+        IBM AIX 5.1, III, IIJ (fujie stack), Intel Canada, Intel
+        Packet Protect, MEW NetCocoon, MGCS, Microsoft WinNT/2000/XP,
+        NAI PGPnet, NEC IX5000, NIST (linux IPsec + plutoplus),
+        NetLock, Netoctave, Netopia, Netscreen, Nokia EPOC, Nortel
+        GatewayController/CallServer 2000 (not released yet),
+        NxNetworks, OpenBSD isakmpd on OpenBSD, Oullim information
+        technologies SECUREWORKS VPN gateway 3.0, Pivotal, RSA,
+        Radguard, RapidStream, RedCreek, Routerware, SSH, SecGo
+        CryptoIP v3, Secure Computing, Soliton, Sun Solaris 8,
+        TIS/NAI Gauntlet, Toshiba, Trilogy AdmitOne 2.6, Trustworks
+        TrustedClient v3.2, USAGI linux, VPNet, Yamaha RT series,
+        ZyXEL
+
+Here are (some of) platforms we have tested IPComp/IKE interoperability
+in the past, in no particular order.
+        Compaq, IRE, SSH, NetLock, FreeS/WAN, F-Secure VPN+ 5.40
+
+VPNC (vpnc.org) provides IPsec conformance tests, using KAME and OpenBSD
+IPsec/IKE implementations. Their test results are available at
+http://www.vpnc.org/conformance.html, and they may give you a better idea
+about which implementations interoperate with KAME IPsec/IKE implementation.
+
+4.8 Operations with IPsec tunnel mode
+
+First of all, IPsec tunnel is a very hairy thing. It seems to do neat things
+like VPN configuration or secure remote access; however, it comes with lots
+of architectural twists.
+
+RFC2401 defines IPsec tunnel mode, within the context of IPsec. RFC2401
+defines tunnel mode packet encapsulation/decapsulation on its own, and
+does not refer to other tunnelling specifications. Since RFC2401 advocates
+filter-based SPD database matches, it would be natural for us to implement
+IPsec tunnel mode as filters - not as pseudo interfaces.
+
+There are some people who are trying to separate IPsec "tunnel mode" from
+the IPsec itself. They would like to implement IPsec transport mode only,
+and combine it with tunneling pseudo devices. The prime example is found
+in draft-touch-ipsec-vpn-01.txt. However, if you really define pseudo
+interfaces separately from IPsec, IKE daemons would need to negotiate
+transport mode SAs, instead of tunnel mode SAs. Therefore, we cannot
+really mix RFC2401-based interpretation and draft-touch-ipsec-vpn-01.txt
+interpretation.
+
+The KAME stack can be configured in two ways. You may need
+to recompile your kernel to switch the behavior.
+- RFC2401 IPsec tunnel mode approach (4.8.1)
+- draft-touch-ipsec-vpn approach (4.8.2)
+  Works in all kernel configurations, but racoon(8) may not interoperate.
+
+There are pros and cons on these approaches:
+
+RFC2401 IPsec tunnel mode (filter-like) approach
+        PRO: SPD lookup fits nicely with packet filters (if you integrate them)
+        CON: cannot run routing daemons across IPsec tunnels
+        CON: it is very hard to control source address selection on originating
+                cases
+        ???: IPv6 scope zone is kept the same
+draft-touch-ipsec-vpn (transport mode + pseudo-interface) approach
+        PRO: run routing daemons across IPsec tunnels
+        PRO: source address selection can be done normally, by looking at
+                IPsec tunnel pseudo devices
+        CON: on outbound, possibility of infinite loops if routing setup
+                is wrong
+        CON: due to differences in encap/decap logic from RFC2401, it may not
+                interoperate with very picky RFC2401 implementations
+                (those who check TOS bits, for example)
+        CON: cannot negotiate IKE with other IPsec tunnel-mode devices
+                (the other end has to implement the same approach)
+        ???: IPv6 scope zone is likely to be different from the real ethernet
+                interface
+
+The recommendation is different depending on the situation you have:
+- use draft-touch-ipsec-vpn if you have the control over the other end.
+  this one is the best in terms of simplicity.
+- if the other end is a normal IPsec device with an RFC2401 implementation, + you need to use the RFC2401 approach; otherwise you won't be able to run IKE. +- use the RFC2401 approach if you just want to forward packets back and forth + and there's no plan to use the IPsec gateway itself as an originating device. + +4.8.1 RFC2401 IPsec tunnel mode approach + +To configure your device as an RFC2401 IPsec tunnel mode endpoint, you will +use the "tunnel" keyword in setkey(8) "spdadd" directives. Let us assume the +following topology (A and B could be a network, like prefix/length): + + ((((((((((((The internet)))))))))))) + | | + |C (global) |D + your device peer's device + |A (private) |B + ==+===== VPN net ==+===== VPN net + +The policy configuration directive is like this. You will need manual +SAs, or an IKE daemon, for actual encryption: + + # setkey -c <<EOF + spdadd A B any -P out ipsec esp/tunnel/C-D/use; + spdadd B A any -P in ipsec esp/tunnel/D-C/use; + EOF + +The inbound/outbound traffic is monitored/captured by the SPD engine, which works +just like packet filters. + +With this, the forwarding case should work flawlessly. However, troubles arise +when you have one of the following requirements: +- When you originate traffic from your VPN gateway device to the VPN net on the + other end (like B), you want your source address to be A (private side) + so that the traffic would be protected by the policy. + With this approach, however, the source address selection logic follows + the normal routing table, and C (global side) will be picked for any outgoing + traffic, even if the destination is B. The resulting packet will be like + this: + IP[C -> B] payload + and will not match the policy (= sent in clear). +- When you want to run routing protocols on top of the IPsec tunnel, it is + not possible. As there is no pseudo device that identifies the IPsec tunnel, + you cannot identify where the routing information came from. As a result, + you can't run routing daemons.
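The origination problem described above can be illustrated with a small
sketch. This is NOT the KAME SPD code; it is a toy model that assumes
IPv4 addresses in host byte order and prefix-only selectors. All names
(struct prefix, struct policy, prefix_match, policy_match, IP4) and all
addresses are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified selector: IPv4 prefix in host byte order. */
struct prefix {
	uint32_t addr;
	int plen;		/* assumed to be within 0..32 */
};

/* Toy model of one "spdadd A B ..." entry: source and destination. */
struct policy {
	struct prefix src;	/* "A" */
	struct prefix dst;	/* "B" */
};

#define IP4(a, b, c, d)							\
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |		\
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

int
prefix_match(const struct prefix *p, uint32_t addr)
{
	uint32_t mask;

	/* Shifting a 32-bit value by 32 is undefined, so handle /0. */
	mask = (p->plen == 0) ? 0 : (uint32_t)0xffffffff << (32 - p->plen);
	return ((addr & mask) == (p->addr & mask));
}

/* A packet is covered only if BOTH source and destination match. */
int
policy_match(const struct policy *sp, uint32_t src, uint32_t dst)
{
	return (prefix_match(&sp->src, src) && prefix_match(&sp->dst, dst));
}

void
check_origination_case(void)
{
	/* Made-up values: A = 10.0.1.0/24, B = 10.0.2.0/24, C = 192.0.2.1 */
	struct policy out = {
		{ IP4(10, 0, 1, 0), 24 },
		{ IP4(10, 0, 2, 0), 24 }
	};

	/* Forwarded packet IP[A -> B]: matches, so it would be tunneled. */
	assert(policy_match(&out, IP4(10, 0, 1, 5), IP4(10, 0, 2, 9)));
	/* Locally originated IP[C -> B]: no match, i.e. sent in clear. */
	assert(!policy_match(&out, IP4(192, 0, 2, 1), IP4(10, 0, 2, 9)));
}
```

In this model the second check fails to match only because the source
address is C instead of A, which is the situation the first item above
describes.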
+ +4.8.2 draft-touch-ipsec-vpn approach + +With this approach, you will configure gif(4) tunnel interfaces, as well as +IPsec transport mode SAs. + + # gifconfig gif0 C D + # ifconfig gif0 A B + # setkey -c <<EOF + spdadd C D any -P out ipsec esp/transport//use; + spdadd D C any -P in ipsec esp/transport//use; + EOF + +Since we have a pseudo-interface "gif0", and it affects the routes and +the source address selection logic, we can have source address A, for +packets originated by the VPN gateway to B (and the VPN cloud). +We can also exchange routing information over the tunnel (gif0), as the tunnel +is represented as a pseudo interface (dynamic routes point to the +pseudo interface). + +There is a big drawback, however: with this approach, you can use IKE if and only if +the other end is using the draft-touch-ipsec-vpn approach too. Since racoon(8) +grabs phase 2 IKE proposals from the kernel SPD database, you will be +negotiating IPsec transport-mode SAs with the other end, not tunnel-mode SAs. +Also, since the encapsulation mechanism is different from RFC2401, you may not +be able to interoperate with picky RFC2401 implementations - if the other +end checks certain outer IP header fields (like TOS), you will not be able to +interoperate. + + +5. ALTQ + +KAME kit includes ALTQ, which supports FreeBSD3, FreeBSD4, FreeBSD5, and +NetBSD. OpenBSD has ALTQ merged into pf and its ALTQ code is not +compatible with other platforms, so KAME's ALTQ is not used for +OpenBSD. For BSD/OS, ALTQ does not work. +ALTQ in KAME supports IPv6. +(actually, ALTQ has been developed in the KAME repository since ALTQ 2.1 - Jan 2000) + +ALTQ occupies a single character device number. For FreeBSD, it is officially +allocated. For OpenBSD and NetBSD, we use a number which is not +currently allocated (will eventually get an official number). +The character device is enabled for the i386 architecture only.
To enable and +compile an ALTQ-ready kernel for other architectures, take the following steps: +- assume that your architecture is FOOBAA. +- modify sys/arch/FOOBAA/FOOBAA/conf.c (or somewhere that defines cdevsw), + to include a line for ALTQ. Look at sys/arch/i386/i386/conf.c for + example. The major number must be the same as in the i386 case. +- copy a kernel configuration file (like ALTQ.v6 or GENERIC.v6) from i386, + and modify accordingly. +- build a kernel. +- before building userland, change netbsd/{lib,usr.sbin,usr.bin}/Makefile + (or openbsd/foobaa) so that it will visit altq-related sub directories. + + +6. Mobile IPv6 + +6.1 KAME node as correspondent node + +Default installation recognizes home address option (in destination +options header). No sub-options are supported. Interaction with +IPsec, and/or 2292bis API, needs further study. + +6.2 KAME node as home agent/mobile node + +KAME kit includes Ericsson mobile-ip6 code. The integration has just started +(in Feb 2000), and we will need some more time to integrate it better. + +See kame/mip6config/{QUICKSTART,README_MIP6.txt} for more details. + +The Ericsson code implements revision 09 of the mobile-ip6 draft. There +are other implementations available: + NEC: http://www.6bone.nec.co.jp/mipv6/internal-dist/ (-13 draft) + SFC: http://neo.sfc.wide.ad.jp/~mip6/ (-13 draft) + +7. Coding style + +The KAME developers basically do not make a fuss about coding +style. However, there is still some agreement on the style, in order +to make the distributed development smooth. + +- follow *BSD KNF where possible. note: there are multiple KNF standards. +- the tab character should be 8 columns wide (tabstops are at 8, 16, 24, ... + column). With vi, use ":set ts=8 sw=8".
+ With GNU Emacs 20 and later, the easiest way is to use the "bsd" style of + cc-mode with the variable "c-basic-offset" being 8; + (add-hook 'c-mode-common-hook + (function + (lambda () + (c-set-style "bsd") + (setq c-basic-offset 8) ; XXX for Emacs 20 only + ))) + The "bsd" style in GNU Emacs 21 sets the variable to 8 by default, + so the line marked by "XXX" is not necessary if you only use GNU + Emacs 21. +- each line should be within 80 characters. +- keep a single open/close bracket in a comment such as in the following + line: + putchar('('); /* ) */ + without this, some vi users would have a hard time matching a pair of + brackets. Although this type of bracket seems clumsy and is even + harmful for some other type of vi users and Emacs users, the + agreement among the KAME developers is to allow it. +- add the following line to the head of every KAME-derived file: + /* (dollar)KAME(dollar) */ + where "(dollar)" is the dollar character ($), and around "$" are tabs. + (this is for C. For other languages, you should use their own comment + line.) + Once committed to the CVS repository, this line will contain its + version number (see, for example, at the top of this file). This + would make it easy to report a bug. +- when creating a new file with the WIDE copyright, type "make copyright.c" at + the top-level, and use copyright.c as a template. KAME RCS tag will be + included automatically. +- when editing a third-party package, keep its own coding style as + much as possible, even if the style does not follow the items above. +- it is recommended to always wrap an expression containing + bitwise operators in parentheses, especially when the expression is + combined with relational operators, in order to avoid unintentional + mismatch of operators.
Thus, we should write + if ((a & b) == 0) /* (A) */ + or + if (a & (b == 0)) /* (B) */ + instead of + if (a & b == 0) /* (C) */ + even if the programmer's intention was (C), which is equivalent to + (B) according to the grammar of the language C. + Thus, we should write a code to test if a bit-flag is set for a + given variable as follows: + if ((flag & FLAG_A) == 0) /* (D) the FLAG_A is NOT set */ + if ((flag & FLAG_A) != 0) /* (E) the FLAG_A is set */ + Some developers in the KAME project rather prefer the following style: + if (!(flag & FLAG_A)) /* (F) the FLAG_A is NOT set */ + if ((flag & FLAG_A)) /* (G) the FLAG_A is set */ + because it would be more intuitive in terms of the relationship + between the negation operator (!) and the semantics of the + condition. The KAME developers have discussed the style, and have + agreed that all the styles from (D) to (G) are valid. So, when you + see styles like (D) and (E) in the KAME code and feel a bit strange, + please just keep them. They are intentional. +- When inserting a separate block just to define some intra-block + variables, add the level of indentation as if the block was in a + control statement such as if-else, for, or while. For example, + foo () + { + int a; + + { + int internal_a; + ... + } + } + should be used, instead of + foo () + { + int a; + + { + int internal_a; + ... + } + } +- Do not use printf() or log() in the packet input path of the kernel code. + They can make the system vulnerable to packet flooding attacks (results in + /var overflow). +- (not a style issue) + To disable a module that is mistakenly imported (by CVS), just + remove the source tree in the repository. Note, however, that the + removal might annoy other developers who have already checked the + module out, so you should announce the removal as soon as possible. + Also, be 100% sure not to remove other modules. 
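The operator-precedence item in the list above can be checked with a
short C sketch. The flag values and function names below (FLAG_A,
FLAG_B, flag_is_set, flag_is_set_short, form_c, check_styles) are
hypothetical and exist only for this illustration; they are not taken
from the KAME tree. form_c() spells out how the compiler parses the
unparenthesized form (C), rather than writing it literally.

```c
#include <assert.h>

#define FLAG_A	0x01	/* hypothetical flag bits, for illustration only */
#define FLAG_B	0x02

/* Style (E): the bitwise test is parenthesized explicitly. */
int
flag_is_set(int flag, int mask)
{
	return ((flag & mask) != 0);
}

/* Style (G): the same test, relying on C truth values. */
int
flag_is_set_short(int flag, int mask)
{
	return (!!(flag & mask));
}

/*
 * What form (C) really computes: == binds tighter than &, so
 * "flag & mask == 0" is parsed as "flag & (mask == 0)".
 */
int
form_c(int flag, int mask)
{
	return (flag & (mask == 0));
}

/* Small self-check showing where the forms agree and disagree. */
void
check_styles(void)
{
	int flag = FLAG_B;		/* FLAG_A is NOT set */

	assert(!flag_is_set(flag, FLAG_A));
	assert(flag_is_set(flag, FLAG_B));
	assert(flag_is_set(flag, FLAG_A) == flag_is_set_short(flag, FLAG_A));
	/* The intended "FLAG_A is not set" test, style (D), is true ... */
	assert((flag & FLAG_A) == 0);
	/* ... but form (C) yields 0, because (FLAG_A == 0) is 0. */
	assert(form_c(flag, FLAG_A) == 0);
}
```

With FLAG_A being non-zero, form (C) evaluates to 0 for every value of
flag, so a test written that way would never fire. Styles (D) through
(G) all avoid this.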
+ +When you want to contribute something to the KAME project, and if *you +do not mind* the agreement, it would be helpful for the project to +keep these rules. Note, however, that we would never intend to force +you to adopt our rules. We would rather respect your own style, +especially when you have a policy about the style. + + +8. Policy on technology with intellectual property right restriction + +There are quite a few IETF documents/whatever which have intellectual property +right (IPR) restrictions. KAME's stance is stated below. + + The goal of KAME is to provide a freely redistributable, BSD-licensed + implementation of Internet protocol technologies. + For this purpose, we implement protocols that (1) do not need a license + contract with the IPR holder, and (2) are royalty-free. + The reason for (1) is that, even if KAME contracts with the IPR holder in + question, the users of the KAME stack (usually implementers of some other + codebase) would need to make a license contract with the IPR holder. + It would damage the "freely redistributable" status of the KAME codebase. + + By doing so KAME is (implicitly) trying to advocate no-license-contract, + royalty-free, release of IPRs. + +Note however, as documented in README, we do not guarantee that KAME code +is free of IPR infringement; you MUST check it if you are to integrate +KAME into your product (or whatever): + READ CAREFULLY: Several countries have legal enforcement for + export/import/use of cryptographic software. Check it before playing + with the kit. We do not intend to be your legalease clearing house + (NO WARRANTY). If you intend to include KAME stack into your product, + you'll need to check if the licenses on each file fit your situations, + and/or possible intellectual property right issues.
+ + <end of IMPLEMENTATION> diff --git a/share/doc/IPv6/Makefile b/share/doc/IPv6/Makefile new file mode 100644 index 0000000..62e160c --- /dev/null +++ b/share/doc/IPv6/Makefile @@ -0,0 +1,7 @@ +# $FreeBSD$ + +NO_OBJ= +FILES= IMPLEMENTATION +FILESDIR= ${SHAREDIR}/doc/IPv6 + +.include <bsd.prog.mk> diff --git a/share/doc/Makefile b/share/doc/Makefile new file mode 100644 index 0000000..61b26d7 --- /dev/null +++ b/share/doc/Makefile @@ -0,0 +1,16 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/5/93 +# $FreeBSD$ + +.include <bsd.own.mk> + +SUBDIR= ${_bind9} IPv6 legal papers psd smm usd + +.if ${MK_BIND} != "no" +_bind9= bind9 +.endif + +# Default output format for troff documents is ascii. +# To generate postscript versions of troff documents, use: +# make PRINTERDEVICE=ps + +.include <bsd.subdir.mk> diff --git a/share/doc/bind9/Makefile b/share/doc/bind9/Makefile new file mode 100644 index 0000000..ce36fcd --- /dev/null +++ b/share/doc/bind9/Makefile @@ -0,0 +1,28 @@ +# $FreeBSD$ + +BIND_DIR= ${.CURDIR}/../../../contrib/bind9 +SRCDIR= ${BIND_DIR}/doc + +.PATH: ${BIND_DIR} ${SRCDIR}/arm ${SRCDIR}/misc + +NO_OBJ= + +FILESGROUPS= TOP ARM MISC +TOP= CHANGES COPYRIGHT FAQ NSEC3-NOTES README \ + README.idnkit README.pkcs11 +TOPDIR= ${DOCDIR}/bind9 +ARM= Bv9ARM.ch01.html Bv9ARM.ch02.html Bv9ARM.ch03.html \ + Bv9ARM.ch04.html Bv9ARM.ch05.html Bv9ARM.ch06.html \ + Bv9ARM.ch07.html Bv9ARM.ch08.html Bv9ARM.ch09.html \ + Bv9ARM.ch10.html Bv9ARM.html man.dig.html \ + man.dnssec-dsfromkey.html man.dnssec-keyfromlabel.html \ + man.dnssec-keygen.html man.dnssec-signzone.html man.host.html \ + man.named-checkconf.html man.named-checkzone.html \ + man.named.html man.nsupdate.html \ + man.rndc-confgen.html man.rndc.conf.html man.rndc.html +ARMDIR= ${TOPDIR}/arm +MISC= dnssec format-options.pl ipv6 migration migration-4to9 \ + options rfc-compliance roadmap sdb sort-options.pl +MISCDIR= ${TOPDIR}/misc + +.include <bsd.prog.mk> diff --git a/share/doc/legal/Makefile b/share/doc/legal/Makefile 
new file mode 100644 index 0000000..bc079ec --- /dev/null +++ b/share/doc/legal/Makefile @@ -0,0 +1,7 @@ +# $FreeBSD$ + +SUBDIR= intel_ipw \ + intel_iwi \ + intel_wpi + +.include <bsd.subdir.mk> diff --git a/share/doc/legal/intel_ipw/Makefile b/share/doc/legal/intel_ipw/Makefile new file mode 100644 index 0000000..8f4f822 --- /dev/null +++ b/share/doc/legal/intel_ipw/Makefile @@ -0,0 +1,7 @@ +# $FreeBSD$ + +NO_OBJ= +FILES= ${.CURDIR}/../../../../sys/contrib/dev/ipw/LICENSE +FILESDIR= ${SHAREDIR}/doc/legal/intel_ipw + +.include <bsd.prog.mk> diff --git a/share/doc/legal/intel_iwi/Makefile b/share/doc/legal/intel_iwi/Makefile new file mode 100644 index 0000000..8596237 --- /dev/null +++ b/share/doc/legal/intel_iwi/Makefile @@ -0,0 +1,7 @@ +# $FreeBSD$ + +NO_OBJ= +FILES= ${.CURDIR}/../../../../sys/contrib/dev/iwi/LICENSE +FILESDIR= ${SHAREDIR}/doc/legal/intel_iwi + +.include <bsd.prog.mk> diff --git a/share/doc/legal/intel_wpi/Makefile b/share/doc/legal/intel_wpi/Makefile new file mode 100644 index 0000000..81014be --- /dev/null +++ b/share/doc/legal/intel_wpi/Makefile @@ -0,0 +1,8 @@ +# $FreeBSD$ + +NO_OBJ= +FILES= ${.CURDIR}/../../../../sys/contrib/dev/wpi/LICENSE +FILESDIR= ${SHAREDIR}/doc/legal/intel_wpi + +.include <bsd.prog.mk> + diff --git a/share/doc/papers/Makefile b/share/doc/papers/Makefile new file mode 100644 index 0000000..866fe20 --- /dev/null +++ b/share/doc/papers/Makefile @@ -0,0 +1,19 @@ +# $FreeBSD$ + +SUBDIR= beyond4.3 \ + bufbio \ + contents \ + devfs \ + diskperf \ + fsinterface \ + hwpmc \ + jail \ + kernmalloc \ + kerntune \ + malloc \ + newvm \ + relengr \ + sysperf \ + timecounter + +.include <bsd.subdir.mk> diff --git a/share/doc/papers/beyond4.3/Makefile b/share/doc/papers/beyond4.3/Makefile new file mode 100644 index 0000000..7d1fa49 --- /dev/null +++ b/share/doc/papers/beyond4.3/Makefile @@ -0,0 +1,9 @@ +# From: @(#)Makefile 5.2 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= papers +DOC= beyond43 +SRCS= beyond43.ms +MACROS= -ms + +.include 
<bsd.doc.mk> diff --git a/share/doc/papers/beyond4.3/beyond43.ms b/share/doc/papers/beyond4.3/beyond43.ms new file mode 100644 index 0000000..b682ffc --- /dev/null +++ b/share/doc/papers/beyond4.3/beyond43.ms @@ -0,0 +1,519 @@ +.\" Copyright (c) 1989 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)beyond43.ms 5.1 (Berkeley) 6/5/90 +.\" $FreeBSD$ +.\" +.\" *troff -ms +.rm CM +.sp 2 +.ce 100 +\fB\s+2Current Research by +The Computer Systems Research Group +of Berkeley\s-2\fP +.ds DT "February 10, 1989 +.\" \fBDRAFT of \*(DT\fP +.sp 2 +.nf +Marshall Kirk McKusick +Michael J Karels +Keith Sklower +Kevin Fall +Marc Teitelbaum +Keith Bostic +.fi +.sp 2 +.ce 1 +\fISummary\fP +.ce 0 +.PP +The release of 4.3BSD in April of 1986 addressed many of the +performance problems and unfinished interfaces +present in 4.2BSD [Leffler84] [McKusick85]. +The Computer Systems Research Group at Berkeley +has now embarked on a new development phase to +update other major components of the system, as well as to offer +new functionality. +There are five major ongoing projects. +The first is to develop an OSI network protocol suite and to integrate +existing ISO applications into Berkeley UNIX. +The second is to develop and support an interface compliant with the +P1003.1 POSIX standard recently approved by the IEEE. +The third is to refine the TCP/IP networking to improve +its performance and limit congestion on slow and/or lossy networks. +The fourth is to provide a standard interface to file systems +so that multiple local and remote file systems can be supported, +much as multiple networking protocols are supported by 4.3BSD. 
+The fifth is to evaluate alternate access control mechanisms and +audit the existing security features of the system, particularly +with respect to network services. +Other areas of work include multi-architecture support, +a general purpose kernel memory allocator, disk labels, and +extensions to the 4.2BSD fast filesystem. +.PP +We are planning to finish implementation prototypes for each of the +five main areas of work over the next year, and provide an informal +test release sometime next year for interested developers. +After incorporating feedback and refinements from the testers, +they will appear in the next full Berkeley release, which is typically +made about a year after the test release. +.br +.ne 10 +.sp 2 +.NH +Recently Completed Projects +.PP +There have been several changes in the system that were included +in the recent 4.3BSD Tahoe release. +.NH 2 +Multi-architecture support +.PP +Support has been added for the DEC VAX 8600/8650, VAX 8200/8250, +MicroVAXII and MicroVAXIII. +.PP +The largest change has been the incorporation of support for the first +non-VAX processor, the CCI Power 6/32 and 6/32SX. (This addition also +supports the +Harris HCX-7 and HCX-9, as well as the Sperry 7000/40 and ICL machines.) +The Power 6 version of 4.3BSD is largely based on the compilers and +device drivers done for CCI's 4.2BSD UNIX, +and is otherwise similar to the VAX release of 4.3BSD. +The entire source tree, including all kernel and user-level sources, +has been merged using a structure that will easily accommodate the addition +of other processor families. A MIPS R2000 has been donated to us, +making the MIPS architecture a likely candidate for inclusion into a future +BSD release. +.NH 2 +Kernel Memory Allocator +.PP +The 4.3BSD UNIX kernel used 10 different memory allocation mechanisms, +each designed for the particular needs of the utilizing subsystem. 
+These mechanisms have been replaced by a general purpose dynamic +memory allocator that can be used by all of the kernel subsystems. +The design of this allocator takes advantage of known memory usage +patterns in the UNIX kernel and a hybrid strategy that is time-efficient +for small allocations and space-efficient for large allocations. +This allocator replaces the multiple memory allocation interfaces +with a single easy-to-program interface, +results in more efficient use of global memory by eliminating +partitioned and specialized memory pools, +and is quick enough (approximately 15 VAX instructions) that no +performance loss is observed relative to the current implementations. +[McKusick88]. +.NH 2 +Disk Labels +.PP +During the work on the CCI machine, +it became obvious that disk geometry and filesystem layout information +must be stored on each disk in a pack label. +Disk labels were implemented for the CCI disks and for the most common +types of disk controllers on the VAX. +A utility was written to create and maintain the disk information, +and other user-level programs that use such information now obtain +it from the disk label. +The use of this facility has allowed improvements in the file system's +knowledge of irregular disk geometries such as track-to-track skew. +.NH 2 +Fat Fast File System +.PP +The 4.2 fast file system [McKusick84] +contained several statically sized structures, +imposing limits on the number of cylinders per cylinder group, +inodes per cylinder group, +and number of distinguished rotational positions. +The new ``fat'' filesystem allows these limits to be set at filesystem +creation time. +Old kernels will treat the new filesystems as read-only, +and new kernels +will accommodate both formats. +The filesystem check facility, \fBfsck\fP, has also been modified to check +either type. 
+.br +.ne 10 +.sp 2 +.NH +Current UNIX Research at Berkeley +.PP +Since the release of 4.3BSD in mid 1986, +we have begun work on several new major areas of research. +Our goal is to apply leading edge research ideas into a stable +and reliable implementation that solves current problems in +operating systems development. +.NH 2 +OSI network protocol development +.PP +The network architecture of 4.2BSD was designed to accommodate +multiple network protocol families and address formats, +and an implementation of the ISO OSI network protocols +should enter into this framework without much difficulty. +We plan to +implement the OSI connectionless internet protocol (CLNP), +and device drivers for X.25, 802.3, and possibly 802.5 interfaces, and +to integrate these with an OSI transport class 4 (TP-4) implementation. +We will also incorporate into the Berkeley Software Distribution an +updated ISO Development Environment (ISODE) +featuring International Standard (IS) versions of utilities. +ISODE implements the session and presentation layers of the OSI protocol suite, +and will include an implementation of the file transfer protocol (FTAM). +It is also possible that an X.400 implementation now being done at +University College, London and the University of Nottingham +will be available for testing and distribution. +.LP +This implementation is comprised of four areas. +.IP 1) +We are updating the University of +Wisconsin TP-4 to match GOSIP requirements. +The University of Wisconsin developed a transport class 4 +implementation for the 4.2BSD kernel under contract to Mitre. +This implementation must be updated to reflect the National Institute +of Standards and Technology (NIST, formerly NBS) workshop agreements, +GOSIP, and 4.3BSD requirements. +We will make this TP-4 operate with an OSI IP, +as the original implementation was built to run over the DoD IP. +.IP 2) +A kernel version of the OSI IP and ES-IS protocols must be produced. 
+We will implement the kernel version of these protocols. +.IP 3) +The required device drivers need to be integrated into a BSD kernel. +4.3BSD has existing device drivers for many Ethernet devices; future +BSD versions may also support X.25 devices as well as token ring +networks. +These device drivers must be integrated +into the kernel OSI protocol implementations. +.IP 4) +The existing OSINET interoperability test network is available so +that the interoperability of the ISODE and BSD kernel protocols +can be established through tests with several vendors. +Testing is crucial because an openly available version of GOSIP protocols +that does not interoperate with DEC, IBM, SUN, ICL, HIS, and other +major vendors would be embarrassing. +To allow testing of the integrated pieces the most desirable +approach is to provide access to OSINET at UCB. +A second approach is to do the interoperability testing at +the site of an existing OSINET member, such as the NBS. +.NH 2 +Compliance with POSIX 1003 +.PP +Berkeley became involved several months ago in the development +of the IEEE POSIX P1003.1 system interface standard. +Since then, we have been participating in the working groups +of P1003.2 (shell and application utility interface), +P1003.6 (security), P1003.7 (system administration), and P1003.8 +(networking). +.PP +The IEEE published the POSIX P1003.1 standard in late 1988. +POSIX related changes to the BSD system have included a new terminal +driver, support for POSIX sessions and job control, expanded signal +functionality, restructured directory access routines, and new set-user +and set-group id facilities. +We currently have a prototype implementation of the +POSIX driver with extensions to provide binary compatibility with +applications developed for the old Berkeley terminal driver. +We also have a prototype implementation of the 4.2BSD-based POSIX +job control facility. 
+.PP +The P1003.2 draft is currently being voted on by the IEEE +P1003.2 balloting group. +Berkeley is particularly interested in the results of this standard, +as it will profoundly influence the user environment. +The other groups are in comparatively early phases, with drafts +coming to ballot sometime in the 90's. +Berkeley will continue to participate in these groups, and +move in the near future toward a P1003.1 and P1003.2 compliant +system. +We have many of the utilities outlined in the current P1003.2 draft +already implemented, and have other parties willing to contribute +additional implementations. +.NH 2 +Improvements to the TCP/IP Networking Protocols +.PP +The Internet and the Berkeley collection of local-area networks +have both grown at high rates in the last year. +The Bay Area Regional Research Network (BARRNet), +connecting several UC campuses, Stanford and NASA-Ames +has recently become operational, increasing the complexity +of the network connectivity. +Both Internet and local routing algorithms are showing the strain +of continued growth. +We have made several changes in the local routing algorithm +to keep accommodating the current topology, +and are participating in the development of new routing algorithms +and standard protocols. +.PP +Recent work in collaboration with Van Jacobson of the Lawrence Berkeley +Laboratory has led to the design and implementation of several new algorithms +for TCP that improve throughput on both local and long-haul networks +while reducing unnecessary retransmission. +The improvement is especially striking when connections must traverse +slow and/or lossy networks. +The new algorithms include ``slow-start,'' +a technique for opening the TCP flow control window slowly +and using the returning stream of acknowledgements as a clock +to drive the connection at the highest speed tolerated by the intervening +network. 
+A modification of this technique allows the sender to dynamically modify +the send window size to adjust to changing network conditions. +In addition, the round-trip timer has been modified to estimate the variance +in round-trip time, thus allowing earlier retransmission of lost packets +with less spurious retransmission due to increasing network delay. +Along with a scheme proposed by Phil Karn of Bellcore, +these changes reduce unnecessary retransmission over difficult paths +such as Satnet by nearly two orders of magnitude +while improving throughput dramatically. +.PP +The current TCP implementation is now being readied +for more widespread distribution via the network and as a +standard Berkeley distribution unencumbered by any commercial licensing. +We are continuing to refine the TCP and IP implementations +using the ARPANET, BARRNet, the NSF network +and local campus nets as testbeds. +In addition, we are incorporating applicable algorithms from this work +into the TP-4 protocol implementation. +.NH 2 +Toward a Compatible File System Interface +.PP +The most critical shortcoming of the 4.3BSD UNIX system was in the +area of distributed file systems. +As with networking protocols, +there is no single distributed file system +that provides sufficient speed and functionality for all problems. +It is frequently necessary to support several different remote +file system protocols, just as it is necessary to run several +different network protocols. +.PP +As network or remote file systems have been implemented for UNIX, +several stylized interfaces between the file system implementation +and the rest of the kernel have been developed. +Among these are Sun Microsystems' Virtual File System interface (VFS) +using \fBvnodes\fP [Sandburg85] [Kleiman86], +Digital Equipment's Generic File System (GFS) architecture [Rodriguez86], +AT&T's File System Switch (FSS) [Rifkin86], +the LOCUS distributed file system [Walker85], +and Masscomp's extended file system [Cole85]. 
+Other remote file systems have been implemented in research or +university groups for internal use, +notably the network file system in the Eighth Edition UNIX +system [Weinberger84] and two different file systems used at Carnegie Mellon +University [Satyanarayanan85]. +Numerous other remote file access methods have been devised for use +within individual UNIX processes, +many of them by modifications to the C I/O library +similar to those in the Newcastle Connection [Brownbridge82]. +.PP +Each design attempts to isolate file system-dependent details +below a generic interface and to provide a framework within which +new file systems may be incorporated. +However, each of these interfaces is different from +and incompatible with the others. +Each addresses somewhat different design goals, +having been based on a different version of UNIX, +having targeted a different set of file systems with varying characteristics, +and having selected a different set of file system primitive operations. +.PP +Our effort in this area is aimed at providing a common framework to +support these different distributed file systems simultaneously rather than to +simply implement yet another protocol. +This requires a detailed study of the existing protocols, +and discussion with their implementors to determine whether +they could modify their implementation to fit within our proposed +framework. We have studied the various file system interfaces to determine +their generality, completeness, robustness, efficiency, and aesthetics +and are currently working on a file system interface +that we believe includes the best features of +each of the existing implementations. +This work and the rationale underlying its development +have been presented to major software vendors as an early step +toward convergence on a standard compatible file system interface. 
+Briefly, the proposal adopts the 4.3BSD calling convention for file +name lookup but otherwise is closely related to Sun's VFS +and DEC's GFS. [Karels86]. +.NH 2 +System Security +.PP +The recent invasion of the DARPA Internet by a quickly reproducing ``worm'' +highlighted the need for a thorough review of the access +safeguards built into the system. +Until now, we have taken a passive approach to dealing with +weaknesses in the system access mechanisms, rather than actively +searching for possible weaknesses. +When we are notified of a problem or loophole in a system utility +by one of our users, +we have a well defined procedure for fixing the problem and +expeditiously disseminating the fix to the BSD mailing list. +This procedure has proven itself to be effective in +solving known problems as they arise +(witness its success in handling the recent worm). +However, we feel that it would be useful to take a more active +role in identifying problems before they are reported (or exploited). +We will make a complete audit of the system +utilities and network servers to find unintended system access mechanisms. +.PP +As a part of the work to make the system more resistant to attack +from local users or via the network, it will be necessary to produce +additional documentation on the configuration and operation of the system. +This documentation will cover such topics as file and directory ownership +and access, network and server configuration, +and control of privileged operations such as file system backups. +.PP +We are investigating the addition of access control lists (ACLs) for +filesystem objects. +ACLs provide a much finer granularity of control over file access permissions +than the current +discretionary access control mechanism (mode bits). +Furthermore, they are necessary +in environments where C2 level security or better, as defined in the DoD +TCSEC [DoD83], is required. 
+The POSIX P1003.6 security group has made notable progress in determining +how an ACL mechanism should work, and several vendors have implemented +ACLs for their commercial systems. +Berkeley will investigate the existing implementations and determine +how to best integrate ACLs with the existing mechanism. +.PP +A major shortcoming of the present system is that authentication +over the network is based solely on the privileged port mechanism +between trusting hosts and users. +Although privileged ports can only be created by processes running as root +on a UNIX system, +such processes are easy for a workstation user to obtain; +they simply reboot their workstation in single user mode. +Thus, a better authentication mechanism is needed. +At present, we believe that the MIT Kerberos authentication +server [Steiner88] provides the best solution to this problem. +We propose to investigate Kerberos further as well as other +authentication mechanisms and then to integrate +the best one into Berkeley UNIX. +Part of this integration would be the addition of the +authentication mechanism into utilities such as +telnet, login, remote shell, etc. +We will add support for telnet (eventually replacing rlogin), +the X window system, and the mail system within an authentication +domain (a Kerberos \fIrealm\fP). +We hope to replace the existing password authentication on each host +with the network authentication system. +.NH +References +.sp +.IP Brownbridge82 +Brownbridge, D.R., L.F. Marshall, B. Randell, +``The Newcastle Connection, or UNIXes of the World Unite!,'' +\fISoftware\- Practice and Experience\fP, Vol. 12, pp. 1147-1162, 1982. +.sp +.IP Cole85 +.br +Cole, C.T., P.B. Flinn, A.B. Atlas, +``An Implementation of an Extended File System for UNIX,'' +\fIUsenix Conference Proceedings\fP, +pp. 131-150, June, 1985. +.sp +.IP DoD83 +.br +Department of Defense, +``Trusted Computer System Evaluation Criteria,'' +\fICSC-STD-001-83\fP, +DoD Computer Security Center, August, 1983. 
+.sp +.IP Karels86 +Karels, M., M. McKusick, +``Towards a Compatible File System Interface,'' +\fIProceedings of the European UNIX Users Group Meeting\fP, +Manchester, England, pp. 481-496, September 1986. +.sp +.IP Kleiman86 +Kleiman, S., +``Vnodes: An Architecture for Multiple File System Types in Sun UNIX,'' +\fIUsenix Conference Proceedings\fP, +pp. 238-247, June, 1986. +.sp +.IP Leffler84 +Leffler, S., M.K. McKusick, M. Karels, +``Measuring and Improving the Performance of 4.2BSD,'' +\fIUsenix Conference Proceedings\fP, pp. 237-252, June, 1984. +.sp +.IP McKusick84 +McKusick, M.K., W. Joy, S. Leffler, R. Fabry, +``A Fast File System for UNIX'', +\fIACM Transactions on Computer Systems 2\fP, 3. +pp 181-197, August 1984. +.sp +.IP McKusick85 +McKusick, M.K., M. Karels, S. Leffler, +``Performance Improvements and Functional Enhancements in 4.3BSD,'' +\fIUsenix Conference Proceedings\fP, pp. 519-531, June, 1985. +.sp +.IP McKusick86 +McKusick, M.K., M. Karels, +``A New Virtual Memory Implementation for Berkeley UNIX,'' +\fIProceedings of the European UNIX Users Group Meeting\fP, +Manchester, England, pp. 451-460, September 1986. +.sp +.IP McKusick88 +McKusick, M.K., M. Karels, +``Design of a General Purpose Memory Allocator for the 4.3BSD UNIX Kernel,'' +\fIUsenix Conference Proceedings\fP, +pp. 295-303, June, 1988. +.sp +.IP Rifkin86 +Rifkin, A.P., M.P. Forbes, R.L. Hamilton, M. Sabrio, S. Shah, K. Yueh, +``RFS Architectural Overview,'' \fIUsenix Conference Proceedings\fP, +pp. 248-259, June, 1986. +.sp +.IP Rodriguez86 +Rodriguez, R., M. Koehler, R. Hyde, +``The Generic File System,'' +\fIUsenix Conference Proceedings\fP, +pp. 260-269, June, 1986. +.sp +.IP Sandberg85 +Sandberg, R., D. Goldberg, S. Kleiman, D. Walsh, B. Lyon, +``Design and Implementation of the Sun Network File System,'' +\fIUsenix Conference Proceedings\fP, +pp. 119-130, June, 1985. 
+.sp +.IP Satyanarayanan85 +Satyanarayanan, M., \fIet al.\fP, +``The ITC Distributed File System: Principles and Design,'' +\fIProc. 10th Symposium on Operating Systems Principles\fP, pp. 35-50, +ACM, December, 1985. +.sp +.IP Steiner88 +Steiner, J., C. Newman, J. Schiller, +``\fIKerberos:\fP An Authentication Service for Open Network Systems,'' +\fIUsenix Conference Proceedings\fP, pp. 191-202, February, 1988. +.sp +.IP Walker85 +Walker, B.J. and S.H. Kiser, ``The LOCUS Distributed File System,'' +\fIThe LOCUS Distributed System Architecture\fP, +G.J. Popek and B.J. Walker, ed., The MIT Press, Cambridge, MA, 1985. +.sp +.IP Weinberger84 +Weinberger, P.J., ``The Version 8 Network File System,'' +\fIUsenix Conference presentation\fP, +June, 1984. diff --git a/share/doc/papers/bufbio/Makefile b/share/doc/papers/bufbio/Makefile new file mode 100644 index 0000000..9bdd487 --- /dev/null +++ b/share/doc/papers/bufbio/Makefile @@ -0,0 +1,14 @@ +# $FreeBSD$ + +VOLUME= papers +DOC= bio +SRCS= bio.ms-patched +EXTRA= bufsize.eps +MACROS= -ms +USE_PIC= +CLEANFILES= bio.ms-patched + +bio.ms-patched: bio.ms + sed "s;bufsize\.eps;${.CURDIR}/&;" ${.ALLSRC} > ${.TARGET} + +.include <bsd.doc.mk> diff --git a/share/doc/papers/bufbio/bio.ms b/share/doc/papers/bufbio/bio.ms new file mode 100644 index 0000000..123f8e7 --- /dev/null +++ b/share/doc/papers/bufbio/bio.ms @@ -0,0 +1,830 @@ +.\" ---------------------------------------------------------------------------- +.\" "THE BEER-WARE LICENSE" (Revision 42): +.\" <phk@FreeBSD.ORG> wrote this file. As long as you retain this notice you +.\" can do whatever you want with this stuff. If we meet some day, and you think +.\" this stuff is worth it, you can buy me a beer in return. 
Poul-Henning Kamp +.\" ---------------------------------------------------------------------------- +.\" +.\" $FreeBSD$ +.\" +.if n .ftr C R +.nr PI 2n +.TL +The case for struct bio +.br +- or - +.br +A road map for a stackable BIO subsystem in FreeBSD +.AU +Poul-Henning Kamp <phk@FreeBSD.org> +.AI +The FreeBSD Project +.AB +Historically, the only translation performed on I/O requests after +they left the file-system layer was the logical sub-disk implementation +done in the device driver. No universal standard for how sub-disks are +configured and implemented exists; in fact pretty much every single platform +and operating system has done it its own way. As FreeBSD migrates to +other platforms it needs to understand these local conventions to be +able to co-exist with other operating systems on the same disk. +.PP +Recently a number of technologies like RAID have expanded the +concept of "a disk" a fair bit and while these technologies initially +were implemented in separate hardware they increasingly migrate into +the operating systems as standard functionality. +.PP +Both of these factors indicate the need for a structured approach to +systematic "geometry manipulation" facilities in FreeBSD. +.PP +This paper contains the road-map for a stackable "BIO" system in +FreeBSD, which will support these facilities. +.AE +.NH +The miseducation of \fCstruct buf\fP. +.PP +To fully appreciate the topic, I include a little historic overview +of struct buf; it is a most enlightening case of not exactly bit-rot +but more appropriately design-rot. +.PP +In the beginning, which for this purpose extends until virtual +memory was introduced into UNIX, all disk I/O was done from or +to a struct buf. 
In the 6th edition sources, as printed in Lions +Book, struct buf looks like this: +.DS +.ft C +.ps -1 +struct buf +{ + int b_flags; /* see defines below */ + struct buf *b_forw; /* headed by devtab of b_dev */ + struct buf *b_back; /* ' */ + struct buf *av_forw; /* position on free list, */ + struct buf *av_back; /* if not BUSY*/ + int b_dev; /* major+minor device name */ + int b_wcount; /* transfer count (usu. words) */ + char *b_addr; /* low order core address */ + char *b_xmem; /* high order core address */ + char *b_blkno; /* block # on device */ + char b_error; /* returned after I/O */ + char *b_resid; /* words not transferred after + error */ +} buf[NBUF]; +.ps +1 +.ft P +.DE +.PP +At this point in time, struct buf had only two functions: +To act as a cache +and to transport I/O operations to device drivers. For the purpose of +this document, the cache functionality is uninteresting and will be +ignored. +.PP +The I/O operations functionality consists of three parts: +.IP "" 5n +\(bu Where in Ram/Core is the data located (b_addr, b_xmem, b_wcount). +.IP +\(bu Where on disk is the data located (b_dev, b_blkno) +.IP +\(bu Request and result information (b_flags, b_error, b_resid) +.PP +In addition to this, the av_forw and av_back elements are +used by the disk device drivers to put requests on a linked list. +All in all the majority of struct buf is involved with the I/O +aspect and only a few fields relate exclusively to the cache aspect. +.PP +If we step forward to the BSD 4.4-Lite-2 release, struct buf has grown +a bit here or there: +.DS +.ft C +.ps -1 +struct buf { + LIST_ENTRY(buf) b_hash; /* Hash chain. */ + LIST_ENTRY(buf) b_vnbufs; /* Buffer's associated vnode. */ + TAILQ_ENTRY(buf) b_freelist; /* Free list position if not active. */ + struct buf *b_actf, **b_actb; /* Device driver queue when active. */ + struct proc *b_proc; /* Associated proc; NULL if kernel. */ + volatile long b_flags; /* B_* flags. */ + int b_error; /* Errno value. 
*/ + long b_bufsize; /* Allocated buffer size. */ + long b_bcount; /* Valid bytes in buffer. */ + long b_resid; /* Remaining I/O. */ + dev_t b_dev; /* Device associated with buffer. */ + struct { + caddr_t b_addr; /* Memory, superblocks, indirect etc. */ + } b_un; + void *b_saveaddr; /* Original b_addr for physio. */ + daddr_t b_lblkno; /* Logical block number. */ + daddr_t b_blkno; /* Underlying physical block number. */ + /* Function to call upon completion. */ + void (*b_iodone) __P((struct buf *)); + struct vnode *b_vp; /* Device vnode. */ + long b_pfcent; /* Center page when swapping cluster. */ + /* XXX pfcent should be int; overld. */ + int b_dirtyoff; /* Offset in buffer of dirty region. */ + int b_dirtyend; /* Offset of end of dirty region. */ + struct ucred *b_rcred; /* Read credentials reference. */ + struct ucred *b_wcred; /* Write credentials reference. */ + int b_validoff; /* Offset in buffer of valid region. */ + int b_validend; /* Offset of end of valid region. */ +}; +.ps +1 +.ft P +.DE +.PP +The main piece of action is the addition of vnodes, a VM system and a +prototype LFS filesystem, all of which needed some handles on struct +buf. Comparison will show that the I/O aspect of struct buf is in +essence unchanged, the length field is now in bytes instead of words, +the linked list the drivers can use has been renamed (b_actf, +b_actb) and a b_iodone pointer for callback notification has been added +but otherwise there is no change to the fields which +represent the I/O aspect. All the new fields relate to the cache +aspect, link buffers to the VM system, provide hacks for file-systems +(b_lblkno) etc etc. +.PP +By the time we get to FreeBSD 3.0 more stuff has grown on struct buf: +.DS +.ft C +.ps -1 +struct buf { + LIST_ENTRY(buf) b_hash; /* Hash chain. */ + LIST_ENTRY(buf) b_vnbufs; /* Buffer's associated vnode. */ + TAILQ_ENTRY(buf) b_freelist; /* Free list position if not active. */ + TAILQ_ENTRY(buf) b_act; /* Device driver queue when active. 
*new* */ + struct proc *b_proc; /* Associated proc; NULL if kernel. */ + long b_flags; /* B_* flags. */ + unsigned short b_qindex; /* buffer queue index */ + unsigned char b_usecount; /* buffer use count */ + int b_error; /* Errno value. */ + long b_bufsize; /* Allocated buffer size. */ + long b_bcount; /* Valid bytes in buffer. */ + long b_resid; /* Remaining I/O. */ + dev_t b_dev; /* Device associated with buffer. */ + caddr_t b_data; /* Memory, superblocks, indirect etc. */ + caddr_t b_kvabase; /* base kva for buffer */ + int b_kvasize; /* size of kva for buffer */ + daddr_t b_lblkno; /* Logical block number. */ + daddr_t b_blkno; /* Underlying physical block number. */ + off_t b_offset; /* Offset into file */ + /* Function to call upon completion. */ + void (*b_iodone) __P((struct buf *)); + /* For nested b_iodone's. */ + struct iodone_chain *b_iodone_chain; + struct vnode *b_vp; /* Device vnode. */ + int b_dirtyoff; /* Offset in buffer of dirty region. */ + int b_dirtyend; /* Offset of end of dirty region. */ + struct ucred *b_rcred; /* Read credentials reference. */ + struct ucred *b_wcred; /* Write credentials reference. */ + int b_validoff; /* Offset in buffer of valid region. */ + int b_validend; /* Offset of end of valid region. */ + daddr_t b_pblkno; /* physical block number */ + void *b_saveaddr; /* Original b_addr for physio. */ + caddr_t b_savekva; /* saved kva for transfer while bouncing */ + void *b_driver1; /* for private use by the driver */ + void *b_driver2; /* for private use by the driver */ + void *b_spc; + union cluster_info { + TAILQ_HEAD(cluster_list_head, buf) cluster_head; + TAILQ_ENTRY(buf) cluster_entry; + } b_cluster; + struct vm_page *b_pages[btoc(MAXPHYS)]; + int b_npages; + struct workhead b_dep; /* List of filesystem dependencies. */ +}; +.ps +1 +.ft P +.DE +.PP +Still we find that the I/O aspect of struct buf is in essence unchanged. 
A couple of fields which allow the driver to hang local data off the buf while working on it have been added (b_driver1, b_driver2), and a "physical block number" (b_pblkno) has been added. +.PP +This b_pblkno is relevant: it has been added because the disklabel/slice +code has been abstracted out of the device drivers. The filesystem +asks for b_blkno, and the slice/label code translates this into b_pblkno, +which the device driver operates on. +.PP +After this point some minor cleanups have happened, some unused fields +have been removed etc but the I/O aspect of struct buf is still only +a fraction of the entire structure: less than a quarter of the +bytes in a struct buf are used for the I/O aspect and struct buf +seems to continue to grow and grow. +.PP +Since version 6 as documented in Lions' book, three significant pieces +of code have emerged which need to do non-trivial translations of +the I/O request before it reaches the device drivers: CCD, slice/label +and Vinum. They all basically do the same thing: they map I/O requests from +a logical space to a physical space, and the mappings they perform +can be 1:1 or 1:N. \** +.FS +It is interesting to note that Lions in his comments to the \fCrkaddr\fP +routine (p. 16-2) writes \fIThe code in this procedure incorporates +a special feature for files which extend over more than one disk +drive. This feature is described in the UPM Section "RK(IV)". Its +usefulness seems to be restricted.\fP This more than hints at the +presence already then of various hacks to stripe/span multiple devices. +.FE +.PP +The 1:1 mapping of the slice/label code is rather trivial, and the +addition of the b_pblkno field catered for the majority of the issues +this resulted in, leaving but one: Reads or writes to the magic "disklabel" +or equally magic "MBR" sectors on a disk must be caught, examined and in +some cases modified before being passed on to the device driver. 
This need +resulted in the addition of the b_iodone_chain field which adds a limited +ability to stack I/O operations. +.PP +The 1:N mappings of CCD and Vinum are far more interesting. These two +subsystems look like a device driver, but rather than drive some piece +of hardware, they allocate new struct buf data structures, populate +these and pass them on to other device drivers. +.PP +Apart from it being inefficient to lug about a 348-byte data structure +when 80 bytes would have done, it also leads to significant code rot +when programmers don't know what to do about the remaining fields or +even worse: "borrow" a field or two for their own uses. +.PP +.ID +.if t .PSPIC bufsize.eps +.if n [graph not available in this format] +.DE +.I +Conclusions: +.IP "" 5n +\(bu Struct buf is a victim of chronic bloat. +.IP +\(bu The I/O aspect of +struct buf is practically constant and only about \(14 of the total bytes. +.IP +\(bu Struct buf currently has several users, vinum, ccd and to +a limited extent diskslice/label, which +need only the I/O aspect, not the vnode, caching or VM linkage. +.IP +.I +The I/O aspect of struct buf should be put in a separate \fCstruct bio\fP. +.R +.NH 1 +Implications for future struct buf improvements +.PP +Concerns have been raised about the implications this separation +will have for future work on struct buf; I will try to address +these concerns here. +.PP +As the existence and popularity of vinum and ccd prove, there is +a legitimate and valid requirement to be able to do I/O operations +which are not initiated by a vnode or filesystem operation. +In other words, an I/O request is a fully valid entity in its own +right and should be treated like that. +.PP +Without doubt, the I/O request has to be tuned to fit the needs +of struct buf users in the best possible way, and consequently +any future changes in struct buf are likely to affect the I/O request +semantics. 
+.PP +One particular change which has been proposed is to drop the present +requirement that a struct buf be mapped contiguously into kernel +address space. The argument goes that since many modern drivers use +physical address DMA to transfer the data maintaining such a mapping +is needless overhead. +.PP +Of course some drivers will still need to be able to access the +buffer in kernel address space and some kind of compatibility +must be provided there. +.PP +The question is, if such a change is made impossible by the +separation of the I/O aspect into its own data structure? +.PP +The answer to this is ``no''. +Anything that could be added to or done with +the I/O aspect of struct buf can also be added to or done +with the I/O aspect if it lives in a new "struct bio". +.NH 1 +Implementing a \fCstruct bio\fP +.PP +The first decision to be made was who got to use the name "struct buf", +and considering the fact that it is the I/O aspect which gets separated +out and that it only covers about \(14 of the bytes in struct buf, +obviously the new structure for the I/O aspect gets a new name. +Examining the naming in the kernel, the "bio" prefix seemed a given, +for instance, the function to signal completion of an I/O request is +already named "biodone()". +.PP +Making the transition smooth is obviously also a priority and after +some prototyping \** +.FS +The software development technique previously known as "Trial & Error". +.FE +it was found that a totally transparent transition could be made by +embedding a copy of the new "struct bio" as the first element of "struct buf" +and by using cpp(1) macros to alias the fields to the legacy struct buf +names. +.NH 2 +The b_flags problem. +.PP +Struct bio was defined by examining all code existing in the driver tree +and finding all the struct buf fields which were legitimately used (as +opposed to "hi-jacked" fields). +One field was found to have "dual-use": the b_flags field. +This required special attention. 
+Examination showed that b_flags was used for three things: +.IP "" 5n +\(bu Communication of the I/O command (READ, WRITE, FORMAT, DELETE) +.IP +\(bu Communication of ordering and error status +.IP +\(bu General status for non I/O aspect consumers of struct buf. +.PP +For historic reasons B_WRITE was defined to be zero, which led to +confusion and bugs; this pushed the decision to have a separate +"b_iocmd" field in struct buf and struct bio for communicating +only the action to be performed. +.PP +The ordering and error status bits were put in a new flag field "b_ioflags". +This has left enough now-unused bits in b_flags that the b_xflags element +can now be merged back into b_flags. +.NH 2 +Definition of struct bio +.PP +With the cleanup of b_flags in place, the definition of struct bio looks like this: +.DS +.ft C +.ps -1 +struct bio { + u_int bio_cmd; /* I/O operation. */ + dev_t bio_dev; /* Device to do I/O on. */ + daddr_t bio_blkno; /* Underlying physical block number. */ + off_t bio_offset; /* Offset into file. */ + long bio_bcount; /* Valid bytes in buffer. */ + caddr_t bio_data; /* Memory, superblocks, indirect etc. */ + u_int bio_flags; /* BIO_ flags. */ + struct buf *_bio_buf; /* Parent buffer. */ + int bio_error; /* Errno for BIO_ERROR. */ + long bio_resid; /* Remaining I/O in bytes. */ + void (*bio_done) __P((struct buf *)); + void *bio_driver1; /* Private use by the callee. */ + void *bio_driver2; /* Private use by the callee. */ + void *bio_caller1; /* Private use by the caller. */ + void *bio_caller2; /* Private use by the caller. */ + TAILQ_ENTRY(bio) bio_queue; /* Disksort queue. 
*/ + daddr_t bio_pblkno; /* physical block number */ + struct iodone_chain *bio_done_chain; +}; +.ps +1 +.ft P +.DE +.NH 2 +Definition of struct buf +.PP +After adding a struct bio to struct buf and the fields aliased into it +struct buf looks like this: +.DS +.ft C +.ps -1 +struct buf { + /* XXX: b_io must be the first element of struct buf for now /phk */ + struct bio b_io; /* "Builtin" I/O request. */ +#define b_bcount b_io.bio_bcount +#define b_blkno b_io.bio_blkno +#define b_caller1 b_io.bio_caller1 +#define b_caller2 b_io.bio_caller2 +#define b_data b_io.bio_data +#define b_dev b_io.bio_dev +#define b_driver1 b_io.bio_driver1 +#define b_driver2 b_io.bio_driver2 +#define b_error b_io.bio_error +#define b_iocmd b_io.bio_cmd +#define b_iodone b_io.bio_done +#define b_iodone_chain b_io.bio_done_chain +#define b_ioflags b_io.bio_flags +#define b_offset b_io.bio_offset +#define b_pblkno b_io.bio_pblkno +#define b_resid b_io.bio_resid + LIST_ENTRY(buf) b_hash; /* Hash chain. */ + TAILQ_ENTRY(buf) b_vnbufs; /* Buffer's associated vnode. */ + TAILQ_ENTRY(buf) b_freelist; /* Free list position if not active. */ + TAILQ_ENTRY(buf) b_act; /* Device driver queue when active. *new* */ + long b_flags; /* B_* flags. */ + unsigned short b_qindex; /* buffer queue index */ + unsigned char b_xflags; /* extra flags */ +[...] +.ps +1 +.ft P +.DE +.PP +Putting the struct bio as the first element in struct buf during a transition +period allows a pointer to either to be cast to a pointer of the other, +which means that certain pieces of code can be left un-converted with the +use of a couple of casts while the remaining pieces of code are tested. +The ccd and vinum modules have been left un-converted like this for now. +.PP +This is basically where FreeBSD-current stands today. +.PP +The next step is to substitute struct bio for struct buf in all the code +which only care about the I/O aspect: device drivers, diskslice/label. +The patch to do this is up for review. 
\** +.FS +And can be found at http://phk.freebsd.dk/misc +.FE +and consists mainly of systematic substitutions like these +.DS +.ft C +s/struct buf/struct bio/ +s/b_flags/bio_flags/ +s/b_bcount/bio_bcount/ +&c &c +.ft P +.DE +.NH 2 +Future work +.PP +It can be successfully argued that the cpp(1) macros used for aliasing +above are ugly and should be expanded in place. It would certainly +be trivial to do so, but not by definition worthwhile. +.PP +Retaining the aliasing for the b_* and bio_* name-spaces this way +leaves us with considerable flexibility in modifying the future +interaction between the two. The DEV_STRATEGY() macro is the single +point where a struct buf is turned into a struct bio and launched +into the drivers to fulfil the I/O request and this provides us +with a single isolated location for performing non-trivial translations. +.PP +As an example of this flexibility: It has been proposed to essentially +drop the b_blkno field and use the b_offset field to communicate the +on-disk location of the data. b_blkno is a 32-bit offset counted in DEV_BSIZE +(512-byte) sectors, which allows us to address two terabytes worth +of data. Using b_offset as a 64-bit byte address would not only allow +us to address 8 million times larger disks, it would also make it +possible to accommodate disks which use non-power-of-two sector sizes, +Audio CD-ROMs for instance. +.PP +The above-mentioned flexibility makes an implementation almost trivial: +.IP "" 5n +\(bu Add code to DEV_STRATEGY() to populate b_offset from b_blkno in the +cases where it is not valid. Today it is only valid for a struct buf +marked B_PHYS. +.IP +\(bu Change diskslice/label, ccd, vinum and device drivers to use b_offset +instead of b_blkno. +.IP +\(bu Remove the bio_blkno field from struct bio, add it to struct buf as +b_blkno and remove the cpp(1) macro which aliased it into struct bio. +.PP +Another possible transition could be to not have a "built-in" struct bio +in struct buf. 
If for some reason struct bio grows fields of no relevance +to struct buf it might be cheaper to remove struct bio from struct buf, +un-alias the fields and have DEV_STRATEGY() allocate a struct bio and populate +the relevant fields from struct buf. +This would also be entirely transparent to both users of struct buf and +struct bio as long as we retain the aliasing mechanism and DEV_STRATEGY(). +.bp +.NH 1 +Towards a stackable BIO subsystem. +.PP +Considering that we now have three distinct pieces of code living +in the nowhere between DEV_STRATEGY() and the device drivers: +diskslice/label, ccd and vinum, it is not unreasonable to start +to look for a more structured and powerful API for these pieces +of code. +.PP +In traditional UNIX semantics a "disk" is a one-dimensional array of +512-byte sectors which can be read or written. Support for sectors +of a multiple of 512 bytes was implemented with a sort of "don't ask-don't tell" policy where the system administrator would specify a larger minimum sector-size +to the filesystem, and things would "just work", but no formal communication about the size of the smallest transfer possible was exchanged between the disk driver and the filesystem. +.PP +A truly generalised concept of a disk needs to be more flexible and more +expressive. For instance, a user of a disk will want to know: +.IP "" 5n +\(bu What is the sector size. Sector-size these days may not be a power +of two, for instance Audio CDs have 2352 byte "sectors". +.IP +\(bu How many sectors are there. +.IP +\(bu Is writing of sectors supported. +.IP +\(bu Is freeing of sectors supported. This is important for flash based +devices where a wear-distribution software or hardware function uses +the information about which sectors are actually in use to optimise the +usage of the slow erase function to a minimum. +.IP +\(bu Is opening this device in a specific mode (read-only or read-write) +allowed. 
The VM system and the file-systems generally assume that nobody +writes to "their storage" under their feet, and therefore opens which +would make that possible should be rejected. +.IP +\(bu What is the "native" geometry of this device (Sectors/Heads/Cylinders). +This is useful for staying compatible with badly designed on-disk formats +from other operating systems. +.PP +Obviously, all of these properties are dynamic in the sense that in +these days disks are removable devices, and they may therefore change +at any time. While some devices like CD-ROMs can lock the media in +place with a special command, this cannot be done for all devices; +in particular it cannot be done with normal floppy disk drives. +.PP +If we adopt such a model for disks, retain the existing "strategy/biodone" model of I/O scheduling and decide to use a modular or stackable approach to +geometry translations we find that nearly endless flexibility emerges: +Mirroring, RAID, striping, interleaving, disk-labels and sub-disks, all of +these techniques would get a common framework to operate in. +.PP +In practice of course, such a scheme must not complicate the use of or +installation of FreeBSD. The code will have to act and react exactly +like the current code but fortunately the current behaviour is not at +all hard to emulate so implementation-wise this is a non-issue. +.PP +But let's look at some drawings to see what this means in practice. 
+.PP +Today the plumbing might look like this on a machine: +.DS +.PS + Ad0: box "disk (ad0)" + arrow up from Ad0.n + SL0: box "slice/label" + Ad1: box "disk (ad1)" with .w at Ad0.e + (.2,0) + arrow up from Ad1.n + SL1: box "slice/label" + Ad2: box "disk (ad2)" with .w at Ad1.e + (.2,0) + arrow up from Ad2.n + SL2: box "slice/label" + Ad3: box "disk (ad3)" with .w at Ad2.e + (.2,0) + arrow up from Ad3.n + SL3: box "slice/label" + DML: box dashed width 4i height .9i with .sw at SL0.sw + (-.2,-.2) + "Disk-mini-layer" with .n at DML.s + (0, .1) + + V: box "vinum" at 1/2 <SL1.n, SL2.n> + (0,1.2) + + A0A: arrow up from 1/4 <SL0.nw, SL0.ne> + A0B: arrow up from 2/4 <SL0.nw, SL0.ne> + A0E: arrow up from 3/4 <SL0.nw, SL0.ne> + A1C: arrow up from 2/4 <SL1.nw, SL1.ne> + arrow to 1/3 <V.sw, V.se> + A2C: arrow up from 2/4 <SL2.nw, SL2.ne> + arrow to 2/3 <V.sw, V.se> + A3A: arrow up from 1/4 <SL3.nw, SL3.ne> + A3E: arrow up from 2/4 <SL3.nw, SL3.ne> + A3F: arrow up from 3/4 <SL3.nw, SL3.ne> + + "ad0s1a" with .s at A0A.n + (0, .1) + "ad0s1b" with .s at A0B.n + (0, .3) + "ad0s1e" with .s at A0E.n + (0, .5) + "ad1s1c" with .s at A1C.n + (0, .1) + "ad2s1c" with .s at A2C.n + (0, .1) + "ad3s4a" with .s at A3A.n + (0, .1) + "ad3s4e" with .s at A3E.n + (0, .3) + "ad3s4f" with .s at A3F.n + (0, .5) + + V1: arrow up from 1/4 <V.nw, V.ne> + V2: arrow up from 2/4 <V.nw, V.ne> + V3: arrow up from 3/4 <V.nw, V.ne> + "V1" with .s at V1.n + (0, .1) + "V2" with .s at V2.n + (0, .1) + "V3" with .s at V3.n + (0, .1) + +.PE +.DE +.PP +And while this drawing looks nice and clean, the code underneath isn't. 
+With a stackable BIO implementation, the picture would look like this: +.DS +.PS + Ad0: box "disk (ad0)" + arrow up from Ad0.n + M0: box "MBR" + arrow up + B0: box "BSD" + + A0A: arrow up from 1/4 <B0.nw, B0.ne> + A0B: arrow up from 2/4 <B0.nw, B0.ne> + A0E: arrow up from 3/4 <B0.nw, B0.ne> + + Ad1: box "disk (ad1)" with .w at Ad0.e + (.2,0) + Ad2: box "disk (ad2)" with .w at Ad1.e + (.2,0) + Ad3: box "disk (ad3)" with .w at Ad2.e + (.2,0) + arrow up from Ad3.n + SL3: box "MBR" + arrow up + B3: box "BSD" + + V: box "vinum" at 1/2 <Ad1.n, Ad2.n> + (0,.8) + arrow from Ad1.n to 1/3 <V.sw, V.se> + arrow from Ad2.n to 2/3 <V.sw, V.se> + + A3A: arrow from 1/4 <B3.nw, B3.ne> + A3E: arrow from 2/4 <B3.nw, B3.ne> + A3F: arrow from 3/4 <B3.nw, B3.ne> + + "ad0s1a" with .s at A0A.n + (0, .1) + "ad0s1b" with .s at A0B.n + (0, .3) + "ad0s1e" with .s at A0E.n + (0, .5) + "ad3s4a" with .s at A3A.n + (0, .1) + "ad3s4e" with .s at A3E.n + (0, .3) + "ad3s4f" with .s at A3F.n + (0, .5) + + V1: arrow up from 1/4 <V.nw, V.ne> + V2: arrow up from 2/4 <V.nw, V.ne> + V3: arrow up from 3/4 <V.nw, V.ne> + "V1" with .s at V1.n + (0, .1) + "V2" with .s at V2.n + (0, .1) + "V3" with .s at V3.n + (0, .1) + +.PE +.DE +.PP +The first thing we notice is that the disk mini-layer is gone; instead +separate modules for the Microsoft style MBR and the BSD style disklabel +are now stacked over the disk. We can also see that Vinum no longer +needs to go through the BSD/MBR layers if it wants access to the entire +physical disk, it can be stacked right over the disk. +.PP +Now, imagine that a ZIP drive is connected to the machine, and the +user loads a ZIP disk in it. First the device driver notices the +new disk and instantiates a new disk: +.DS +.PS + box "disk (da0)" +.PE +.DE +.PP +A number of the geometry modules have registered as "auto-discovering" +and will be polled sequentially to see if any of them recognise what +is on this disk. 
The MBR module finds an MBR in sector 0 and attaches +an instance of itself to the disk: +.DS +.PS + D: box "disk (da0)" + arrow up from D.n + M: box "MBR" + M1: arrow up from 1/3 <M.nw, M.ne> + M2: arrow up from 2/3 <M.nw, M.ne> +.PE +.DE +.PP +It finds two "slices" in the MBR and creates two new "disks", one for +each of these. The polling of modules is repeated and this time the +BSD label module recognises a FreeBSD label on one of the slices and +attaches itself: +.DS +.PS + D: box "disk (da0)" + arrow "O" up from D.n + M: box "MBR" + M1: line up .3i from 1/3 <M.nw, M.ne> + arrow "O" left + M2: arrow "O" up from 2/3 <M.nw, M.ne> + B: box "BSD" + B1: arrow "O" up from 1/4 <B.nw, B.ne> + B2: arrow "O" up from 2/4 <B.nw, B.ne> + B3: arrow "O" up from 3/4 <B.nw, B.ne> + +.PE +.DE +.PP +The BSD module finds three partitions, creates them as disks and the +polling is repeated for each of these. No modules recognise these +and the process ends. In theory one could have a module recognise +the UFS superblock and extract from there the path to mount the disk +on, but this is probably better implemented in a general "device-daemon" +in user-land. +.PP +On this last drawing I have marked with "O" the "disks" which can be +accessed from user-land or kernel. The VM and file-systems generally +prefer to have exclusive write access to the disk sectors they use, +so we need to enforce this policy. Since we cannot know what transformation +a particular module implements, we need to ask the modules if the open +is OK, and they may need to ask their neighbours before they can answer. +.PP +We decide to mount a filesystem on one of the BSD partitions at the very top. +The open request is passed to the BSD module, which finds that none of +the other open partitions (there are none) overlaps this one; so far no +objections. 
It then passes the open to the MBR module, which goes through +basically the same procedure, finds no objections and passes the request to +the disk driver, which, since it was not previously open, approves of the +open. +.PP +Next we mount a filesystem on the next BSD partition. The +BSD module again checks for overlapping open partitions and finds none. +This time, however, it finds that it has already opened the "downstream" +in R/W mode, so it does not need to ask for permission for that again +and the open is OK. +.PP +Next we mount a msdos filesystem on the other MBR slice. This is the +same case: the MBR finds no overlapping open slices and has already +opened "downstream", so the open is OK. +.PP +Now suppose we try to open the other slice for writing, the one which has the +BSD module attached already. The open is passed to the MBR module, which +notes that the device is already opened for writing by a module (the BSD +module), and consequently the open is refused. +.PP +While this sounds complicated, it actually took less than 200 lines of +code to implement in a prototype implementation. +.PP +Now, the user ejects the ZIP disk. If the hardware can give a notification +of intent to eject, a call-up from the driver can try to get devices synchronised +and closed; this is pretty trivial. If the hardware just disappears, like +an unplugged parallel zip drive, a floppy disk or a PC-card, we have no +choice but to dismantle the setup. The device driver sends a "gone" notification to the MBR module, which replicates this upwards to the mounted msdosfs +and the BSD module. The msdosfs unmounts forcefully, invalidates any blocks +in the buf/vm system and returns. The BSD module replicates the "gone" to +the two mounted file-systems, which in turn unmount forcefully, invalidate +blocks and return, after which the BSD module releases any resources held +and returns, the MBR module releases any resources held and returns, and all +traces of the device have been removed.
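The write-open arbitration walked through above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the prototype mentioned in the text: the names bio_port, bio_module, bio_open_write and bio_demo_zip are hypothetical, extents are plain sector ranges, and only exclusive write opens are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; none of these names come from the paper. */
struct bio_module;

struct bio_port {                 /* one "disk" that a module offers upward */
    struct bio_module *owner;     /* module exporting this port */
    long start, length;           /* extent within the owner's downstream device */
    int writers;                  /* write opens currently granted on this port */
};

struct bio_module {
    struct bio_port *down;        /* port this module is stacked on; NULL for a driver */
    struct bio_port *ports;       /* ports exported upward */
    size_t nports;
    bool down_open;               /* downstream already opened R/W by this module? */
};

static bool overlaps(const struct bio_port *a, const struct bio_port *b)
{
    return a->start < b->start + b->length && b->start < a->start + a->length;
}

/* Ask the owning module, and if needed its downstream neighbours, for write access. */
bool bio_open_write(struct bio_port *p)
{
    assert(p != NULL && p->owner != NULL);
    struct bio_module *m = p->owner;

    /* Refuse if any port already open for writing overlaps this one (p overlaps itself). */
    for (size_t i = 0; i < m->nports; i++)
        if (m->ports[i].writers > 0 && overlaps(p, &m->ports[i]))
            return false;

    /* Ask downstream only the first time; later opens reuse that permission. */
    if (m->down != NULL && !m->down_open) {
        if (!bio_open_write(m->down))
            return false;
        m->down_open = true;
    }
    p->writers++;
    return true;
}

/* Replays the ZIP disk scenario; bit n of the result is set if open number n was granted. */
unsigned bio_demo_zip(void)
{
    struct bio_module disk = {0}, mbr = {0}, bsd = {0};
    struct bio_port da0 = { &disk, 0, 1000, 0 };
    struct bio_port slices[2] = { { &mbr, 0, 500, 0 }, { &mbr, 500, 500, 0 } };
    struct bio_port parts[3] = { { &bsd, 0, 100, 0 }, { &bsd, 100, 100, 0 },
                                 { &bsd, 200, 300, 0 } };

    disk.ports = &da0;     disk.nports = 1;
    mbr.down = &da0;       mbr.ports = slices; mbr.nports = 2;
    bsd.down = &slices[1]; bsd.ports = parts;  bsd.nports = 3;

    unsigned granted = 0;
    granted |= (unsigned)bio_open_write(&parts[2]) << 0;  /* first filesystem on a BSD partition */
    granted |= (unsigned)bio_open_write(&parts[1]) << 1;  /* second filesystem, next partition */
    granted |= (unsigned)bio_open_write(&slices[0]) << 2; /* msdos filesystem on the other slice */
    granted |= (unsigned)bio_open_write(&slices[1]) << 3; /* slice already held by the BSD module */
    return granted;
}
```

In this sketch the first three opens succeed and the fourth is refused, because the slice carrying the BSD module already has a writer. A real implementation would also need read opens, close handling and locking, all of which are left out here.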
+.PP +Now, let us get a bit more complicated. We add another disk and mirror +two of the MBR slices: +.DS +.PS + D0: box "disk (da0)" + + arrow "O" up from D0.n + M0: box "MBR" + M01: line up .3i from 1/3 <M0.nw, M0.ne> + arrow "O" left + M02: arrow "O" up from 2/3 <M0.nw, M0.ne> + + D1: box "disk (da1)" with .w at D0.e + (.2,0) + arrow "O" up from D1.n + M1: box "MBR" + M11: line up .3i from 1/3 <M1.nw, M1.ne> + line "O" left + M11a: arrow up .2i + + I: box "Mirror" with .s at 1/2 <M02.n, M11a.n> + arrow "O" up + BB: box "BSD" + BB1: arrow "O" up from 1/4 <BB.nw, BB.ne> + BB2: arrow "O" up from 2/4 <BB.nw, BB.ne> + BB3: arrow "O" up from 3/4 <BB.nw, BB.ne> + + M12: arrow "O" up from 2/3 <M1.nw, M1.ne> + B: box "BSD" + B1: arrow "O" up from 1/4 <B.nw, B.ne> + B2: arrow "O" up from 2/4 <B.nw, B.ne> + B3: arrow "O" up from 3/4 <B.nw, B.ne> +.PE +.DE +.PP +Now assuming that we lose disk da0, the notification goes up like before +but the mirror module still has a valid mirror from disk da1, so it +doesn't propagate the "gone" notification further up and the three +file-systems mounted are not affected. +.PP +It is possible to modify the graph while in action, as long as the +modules know that they will not affect any I/O in progress. This is +very handy for moving things around. At any of the arrows we can +insert a mirroring module, since it has a 1:1 mapping from input +to output. Next we can add another copy to the mirror, give the +mirror time to sync the two copies. Detach the first mirror copy +and remove the mirror module. We have now in essence moved a partition +from one disk to another transparently. +.NH 1 +Getting stackable BIO layers from where we are today. +.PP +Most of the infrastructure is in place now to implement stackable +BIO layers: +.IP "" 5n +\(bu The dev_t change gave us a public structure where +information about devices can be put. 
This enabled us to get rid +of all the NFOO limits on the number of instances of a particular +driver/device, and significantly cleaned up the vnode aliasing for +device vnodes. +.IP +\(bu The disk-mini-layer has +taken the knowledge about diskslice/labels out of the +majority of the disk-drivers, saving on average 100 lines of code per +driver. +.IP +\(bu The struct bio/buf divorce is giving us an IO request of manageable +size which can be modified without affecting all the filesystem and +VM system users of struct buf. +.PP +The missing bits are: +.IP "" 5n +\(bu changes to struct bio to make it more +stackable. This mostly relates to the handling of the biodone() +event, something which will be transparent to all current users +of struct buf/bio. +.IP +\(bu code to stitch modules together and to pass events and notifications +between them. +.NH 1 +An Implementation plan for stackable BIO layers +.PP +My plan for implementing stackable BIO layers is to first complete +the struct bio/buf divorce with the already mentioned patch. +.PP +The next step is to re-implement the monolithic disk-mini-layer so +that it becomes the stackable BIO system. Vinum and CCD and all +other consumers should be unable to tell the difference between +the current and the new disk-mini-layer. The new implementation +will initially use a static stacking to remain compatible with the +current behaviour. This will be the next logical checkpoint commit. +.PP +The next step is to make the stackable layers configurable, +to provide the means to initialise the stacking and to subsequently +change it. This will be the next logical checkpoint commit. +.PP +At this point new functionality can be added inside the stackable +BIO system: CCD can be re-implemented as a mirror module and a stripe +module. Vinum can be integrated either as one "macro-module" or +as separate functions in separate modules. Also modules for other +purposes can be added: sub-disk handling for Solaris, MacOS, +etc.
These modules can be committed one at a time. diff --git a/share/doc/papers/bufbio/bufsize.eps b/share/doc/papers/bufbio/bufsize.eps new file mode 100644 index 0000000..2396ac6 --- /dev/null +++ b/share/doc/papers/bufbio/bufsize.eps @@ -0,0 +1,479 @@ +%!PS-Adobe-2.0 EPSF-2.0 +%%Title: a.ps +%%Creator: $FreeBSD$ +%%CreationDate: Sat Apr 8 08:32:58 2000 +%%DocumentFonts: (atend) +%%BoundingBox: 50 50 410 302 +%%Orientation: Portrait +%%EndComments +/gnudict 256 dict def +gnudict begin +/Color false def +/Solid false def +/gnulinewidth 5.000 def +/userlinewidth gnulinewidth def +/vshift -46 def +/dl {10 mul} def +/hpt_ 31.5 def +/vpt_ 31.5 def +/hpt hpt_ def +/vpt vpt_ def +/M {moveto} bind def +/L {lineto} bind def +/R {rmoveto} bind def +/V {rlineto} bind def +/vpt2 vpt 2 mul def +/hpt2 hpt 2 mul def +/Lshow { currentpoint stroke M + 0 vshift R show } def +/Rshow { currentpoint stroke M + dup stringwidth pop neg vshift R show } def +/Cshow { currentpoint stroke M + dup stringwidth pop -2 div vshift R show } def +/UP { dup vpt_ mul /vpt exch def hpt_ mul /hpt exch def + /hpt2 hpt 2 mul def /vpt2 vpt 2 mul def } def +/DL { Color {setrgbcolor Solid {pop []} if 0 setdash } + {pop pop pop Solid {pop []} if 0 setdash} ifelse } def +/BL { stroke gnulinewidth 2 mul setlinewidth } def +/AL { stroke gnulinewidth 2 div setlinewidth } def +/UL { gnulinewidth mul /userlinewidth exch def } def +/PL { stroke userlinewidth setlinewidth } def +/LTb { BL [] 0 0 0 DL } def +/LTa { AL [1 dl 2 dl] 0 setdash 0 0 0 setrgbcolor } def +/LT0 { PL [] 1 0 0 DL } def +/LT1 { PL [4 dl 2 dl] 0 1 0 DL } def +/LT2 { PL [2 dl 3 dl] 0 0 1 DL } def +/LT3 { PL [1 dl 1.5 dl] 1 0 1 DL } def +/LT4 { PL [5 dl 2 dl 1 dl 2 dl] 0 1 1 DL } def +/LT5 { PL [4 dl 3 dl 1 dl 3 dl] 1 1 0 DL } def +/LT6 { PL [2 dl 2 dl 2 dl 4 dl] 0 0 0 DL } def +/LT7 { PL [2 dl 2 dl 2 dl 2 dl 2 dl 4 dl] 1 0.3 0 DL } def +/LT8 { PL [2 dl 2 dl 2 dl 2 dl 2 dl 2 dl 2 dl 4 dl] 0.5 0.5 0.5 DL } def +/Pnt { stroke [] 0 setdash + gsave 1 
setlinecap M 0 0 V stroke grestore } def +/Dia { stroke [] 0 setdash 2 copy vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V closepath stroke + Pnt } def +/Pls { stroke [] 0 setdash vpt sub M 0 vpt2 V + currentpoint stroke M + hpt neg vpt neg R hpt2 0 V stroke + } def +/Box { stroke [] 0 setdash 2 copy exch hpt sub exch vpt add M + 0 vpt2 neg V hpt2 0 V 0 vpt2 V + hpt2 neg 0 V closepath stroke + Pnt } def +/Crs { stroke [] 0 setdash exch hpt sub exch vpt add M + hpt2 vpt2 neg V currentpoint stroke M + hpt2 neg 0 R hpt2 vpt2 V stroke } def +/TriU { stroke [] 0 setdash 2 copy vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt neg vpt 1.62 mul V closepath stroke + Pnt } def +/Star { 2 copy Pls Crs } def +/BoxF { stroke [] 0 setdash exch hpt sub exch vpt add M + 0 vpt2 neg V hpt2 0 V 0 vpt2 V + hpt2 neg 0 V closepath fill } def +/TriUF { stroke [] 0 setdash vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt neg vpt 1.62 mul V closepath fill } def +/TriD { stroke [] 0 setdash 2 copy vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V closepath stroke + Pnt } def +/TriDF { stroke [] 0 setdash vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V closepath fill} def +/DiaF { stroke [] 0 setdash vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V closepath fill } def +/Pent { stroke [] 0 setdash 2 copy gsave + translate 0 hpt M 4 {72 rotate 0 hpt L} repeat + closepath stroke grestore Pnt } def +/PentF { stroke [] 0 setdash gsave + translate 0 hpt M 4 {72 rotate 0 hpt L} repeat + closepath fill grestore } def +/Circle { stroke [] 0 setdash 2 copy + hpt 0 360 arc stroke Pnt } def +/CircleF { stroke [] 0 setdash hpt 0 360 arc fill } def +/C0 { BL [] 0 setdash 2 copy moveto vpt 90 450 arc } bind def +/C1 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 90 arc closepath fill + vpt 0 360 arc closepath } bind def +/C2 { BL [] 0 setdash 2 copy 
moveto + 2 copy vpt 90 180 arc closepath fill + vpt 0 360 arc closepath } bind def +/C3 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 180 arc closepath fill + vpt 0 360 arc closepath } bind def +/C4 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 180 270 arc closepath fill + vpt 0 360 arc closepath } bind def +/C5 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 90 arc + 2 copy moveto + 2 copy vpt 180 270 arc closepath fill + vpt 0 360 arc } bind def +/C6 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 90 270 arc closepath fill + vpt 0 360 arc closepath } bind def +/C7 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 270 arc closepath fill + vpt 0 360 arc closepath } bind def +/C8 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 270 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/C9 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 270 450 arc closepath fill + vpt 0 360 arc closepath } bind def +/C10 { BL [] 0 setdash 2 copy 2 copy moveto vpt 270 360 arc closepath fill + 2 copy moveto + 2 copy vpt 90 180 arc closepath fill + vpt 0 360 arc closepath } bind def +/C11 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 180 arc closepath fill + 2 copy moveto + 2 copy vpt 270 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/C12 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 180 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/C13 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 90 arc closepath fill + 2 copy moveto + 2 copy vpt 180 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/C14 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 90 360 arc closepath fill + vpt 0 360 arc } bind def +/C15 { BL [] 0 setdash 2 copy vpt 0 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/Rec { newpath 4 2 roll moveto 1 index 0 rlineto 0 exch rlineto + neg 0 rlineto closepath } bind def +/Square { dup Rec } bind def +/Bsquare { vpt sub exch vpt sub exch vpt2 Square } bind def +/S0 { BL [] 0 setdash 2 copy moveto 0 vpt rlineto BL Bsquare } bind def +/S1 { 
BL [] 0 setdash 2 copy vpt Square fill Bsquare } bind def +/S2 { BL [] 0 setdash 2 copy exch vpt sub exch vpt Square fill Bsquare } bind def +/S3 { BL [] 0 setdash 2 copy exch vpt sub exch vpt2 vpt Rec fill Bsquare } bind def +/S4 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt Square fill Bsquare } bind def +/S5 { BL [] 0 setdash 2 copy 2 copy vpt Square fill + exch vpt sub exch vpt sub vpt Square fill Bsquare } bind def +/S6 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt vpt2 Rec fill Bsquare } bind def +/S7 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt vpt2 Rec fill + 2 copy vpt Square fill + Bsquare } bind def +/S8 { BL [] 0 setdash 2 copy vpt sub vpt Square fill Bsquare } bind def +/S9 { BL [] 0 setdash 2 copy vpt sub vpt vpt2 Rec fill Bsquare } bind def +/S10 { BL [] 0 setdash 2 copy vpt sub vpt Square fill 2 copy exch vpt sub exch vpt Square fill + Bsquare } bind def +/S11 { BL [] 0 setdash 2 copy vpt sub vpt Square fill 2 copy exch vpt sub exch vpt2 vpt Rec fill + Bsquare } bind def +/S12 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill Bsquare } bind def +/S13 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill + 2 copy vpt Square fill Bsquare } bind def +/S14 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill + 2 copy exch vpt sub exch vpt Square fill Bsquare } bind def +/S15 { BL [] 0 setdash 2 copy Bsquare fill Bsquare } bind def +/D0 { gsave translate 45 rotate 0 0 S0 stroke grestore } bind def +/D1 { gsave translate 45 rotate 0 0 S1 stroke grestore } bind def +/D2 { gsave translate 45 rotate 0 0 S2 stroke grestore } bind def +/D3 { gsave translate 45 rotate 0 0 S3 stroke grestore } bind def +/D4 { gsave translate 45 rotate 0 0 S4 stroke grestore } bind def +/D5 { gsave translate 45 rotate 0 0 S5 stroke grestore } bind def +/D6 { gsave translate 45 rotate 0 0 S6 stroke grestore } bind def +/D7 { gsave translate 45 rotate 0 0 S7 stroke grestore } bind def +/D8 { gsave 
translate 45 rotate 0 0 S8 stroke grestore } bind def +/D9 { gsave translate 45 rotate 0 0 S9 stroke grestore } bind def +/D10 { gsave translate 45 rotate 0 0 S10 stroke grestore } bind def +/D11 { gsave translate 45 rotate 0 0 S11 stroke grestore } bind def +/D12 { gsave translate 45 rotate 0 0 S12 stroke grestore } bind def +/D13 { gsave translate 45 rotate 0 0 S13 stroke grestore } bind def +/D14 { gsave translate 45 rotate 0 0 S14 stroke grestore } bind def +/D15 { gsave translate 45 rotate 0 0 S15 stroke grestore } bind def +/DiaE { stroke [] 0 setdash vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V closepath stroke } def +/BoxE { stroke [] 0 setdash exch hpt sub exch vpt add M + 0 vpt2 neg V hpt2 0 V 0 vpt2 V + hpt2 neg 0 V closepath stroke } def +/TriUE { stroke [] 0 setdash vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt neg vpt 1.62 mul V closepath stroke } def +/TriDE { stroke [] 0 setdash vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V closepath stroke } def +/PentE { stroke [] 0 setdash gsave + translate 0 hpt M 4 {72 rotate 0 hpt L} repeat + closepath stroke grestore } def +/CircE { stroke [] 0 setdash + hpt 0 360 arc stroke } def +/Opaque { gsave closepath 1 setgray fill grestore 0 setgray closepath } def +/DiaW { stroke [] 0 setdash vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V Opaque stroke } def +/BoxW { stroke [] 0 setdash exch hpt sub exch vpt add M + 0 vpt2 neg V hpt2 0 V 0 vpt2 V + hpt2 neg 0 V Opaque stroke } def +/TriUW { stroke [] 0 setdash vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt neg vpt 1.62 mul V Opaque stroke } def +/TriDW { stroke [] 0 setdash vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V Opaque stroke } def +/PentW { stroke [] 0 setdash gsave + translate 0 hpt M 4 {72 rotate 0 hpt L} repeat + Opaque stroke grestore } def +/CircW { stroke [] 0 setdash + hpt 0 360 arc 
Opaque stroke } def +/BoxFill { gsave Rec 1 setgray fill grestore } def +end +%%EndProlog +gnudict begin +gsave +50 50 translate +0.050 0.050 scale +0 setgray +newpath +(Helvetica) findfont 140 scalefont setfont +1.000 UL +LTb +630 420 M +63 0 V +6269 0 R +-63 0 V +546 420 M +(0) Rshow +630 1020 M +63 0 V +6269 0 R +-63 0 V +-6353 0 R +(50) Rshow +630 1620 M +63 0 V +6269 0 R +-63 0 V +-6353 0 R +(100) Rshow +630 2220 M +63 0 V +6269 0 R +-63 0 V +-6353 0 R +(150) Rshow +630 2820 M +63 0 V +6269 0 R +-63 0 V +-6353 0 R +(200) Rshow +630 3420 M +63 0 V +6269 0 R +-63 0 V +-6353 0 R +(250) Rshow +630 4020 M +63 0 V +6269 0 R +-63 0 V +-6353 0 R +(300) Rshow +630 4620 M +63 0 V +6269 0 R +-63 0 V +-6353 0 R +(350) Rshow +630 420 M +0 63 V +0 4137 R +0 -63 V +630 280 M +(0) Cshow +1263 420 M +0 63 V +0 4137 R +0 -63 V +0 -4277 R +(10) Cshow +1896 420 M +0 63 V +0 4137 R +0 -63 V +0 -4277 R +(20) Cshow +2530 420 M +0 63 V +0 4137 R +0 -63 V +0 -4277 R +(30) Cshow +3163 420 M +0 63 V +0 4137 R +0 -63 V +0 -4277 R +(40) Cshow +3796 420 M +0 63 V +0 4137 R +0 -63 V +0 -4277 R +(50) Cshow +4429 420 M +0 63 V +0 4137 R +0 -63 V +0 -4277 R +(60) Cshow +5062 420 M +0 63 V +0 4137 R +0 -63 V +0 -4277 R +(70) Cshow +5696 420 M +0 63 V +0 4137 R +0 -63 V +0 -4277 R +(80) Cshow +6329 420 M +0 63 V +0 4137 R +0 -63 V +0 -4277 R +(90) Cshow +6962 420 M +0 63 V +0 4137 R +0 -63 V +0 -4277 R +(100) Cshow +1.000 UL +LTb +630 420 M +6332 0 V +0 4200 V +-6332 0 V +630 420 L +140 2520 M +currentpoint gsave translate 90 rotate 0 0 M +(Bytes) Cshow +grestore +3796 70 M +(CVS revision of <sys/buf.h>) Cshow +3796 4830 M +(Sizeof\(struct buf\)) Cshow +1.000 UL +LT0 +693 1764 M +64 384 V +63 0 V +63 0 V +64 -96 V +63 0 V +63 0 V +64 816 V +63 0 V +63 0 V +64 768 V +63 48 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +64 0 V +63 48 V +63 96 V +64 0 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +64 -48 V +63 0 V +63 -48 V +64 0 V +63 0 V +63 96 V +64 0 V +63 0 V +63 0 V +63 0 V 
+64 0 V +63 0 V +63 0 V +64 0 V +63 0 V +63 48 V +64 0 V +63 48 V +63 96 V +64 -48 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +63 0 V +64 96 V +63 -96 V +63 -48 V +64 48 V +63 0 V +63 384 V +64 0 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +64 0 V +63 0 V +63 0 V +63 48 V +64 0 V +63 0 V +63 96 V +64 96 V +63 0 V +stroke +grestore +end +showpage +%%Trailer +%%DocumentFonts: Helvetica diff --git a/share/doc/papers/contents/Makefile b/share/doc/papers/contents/Makefile new file mode 100644 index 0000000..d15ff9c --- /dev/null +++ b/share/doc/papers/contents/Makefile @@ -0,0 +1,8 @@ +# $FreeBSD$ + +VOLUME= papers +DOC= contents +SRCS= contents.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/papers/contents/contents.ms b/share/doc/papers/contents/contents.ms new file mode 100644 index 0000000..12b287a --- /dev/null +++ b/share/doc/papers/contents/contents.ms @@ -0,0 +1,218 @@ +.\" Copyright (c) 1996 FreeBSD Inc. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.OH '''Papers Contents' +.EH 'Papers Contents''' +.TL +UNIX Papers coming with FreeBSD +.PP +These papers are of both historic and current interest, but most of them are +many years old. +More recent documentation is available from +.>> <a href="http://www.freebsd.org/docs/"> +http://www.FreeBSD.org/docs/ +.>> </a> + +.IP +.tl '\fBBerkeley Pascal''px\fP' +.if !r.U .nr .U 0 +.if \n(.U \{\ +.br +.>> <a href="px.html">px.html</a> +.\} +.QP +Berkeley Pascal +PX Implementation Notes +.br +Version 2.0 +.sp +Performance Effects of Disk Subsystem Choices +for VAX\(dg Systems Running 4.2BSD UNIX. +.sp +William N. Joy, M. Kirk McKusick. +.sp +Revised January, 1979. + +.sp +.IP +.tl '\fBDisk Performance''diskperf\fP' +.if \n(.U \{\ +.br +.>> <a href="diskperf.html">diskperf.html</a> +.\} +.QP +Performance Effects of Disk Subsystem Choices +for VAX\(dg Systems Running 4.2BSD UNIX. +.sp +Bob Kridle, Marshall Kirk McKusick. +.sp +Revised July 27, 1983. + +.sp +.IP +.tl '\fBTune the 4.2BSD Kernel''kerntune\fP' +.if \n(.U \{\ +.br +.>> <a href="kerntune.html">kerntune.html</a> +.\} +.QP +Using gprof to Tune the 4.2BSD Kernel. +.sp +Marshall Kirk McKusick. +.sp +Revised May 21, 1984 (?). + +.sp +.IP +.tl '\fBNew Virtual Memory''newvm\fP' +.if \n(.U \{\ +.br +.>> <a href="newvm.html">newvm.html</a> +.\} +.QP +A New Virtual Memory Implementation for Berkeley. +.sp +Marshall Kirk McKusick, Michael J. Karels. +.sp +Revised 1986. 
+ +.sp +.IP +.tl '\fBKernel Malloc''kernmalloc\fP' +.if \n(.U \{\ +.br +.>> <a href="kernmalloc.html">kernmalloc.html</a> +.\} +.QP +Design of a General Purpose Memory Allocator for the 4.3BSD UNIX Kernel. +.sp +Marshall Kirk McKusick, Michael J. Karels. +.sp +Reprinted from: +\fIProceedings of the San Francisco USENIX Conference\fP, +pp. 295-303, June 1988. + +.sp +.IP +.tl '\fBRelease Engineering''relengr\fP' +.if \n(.U \{\ +.br +.>> <a href="releng.html">releng.html</a> +.\} +.QP +The Release Engineering of 4.3\s-1BSD\s0. +.sp +Marshall Kirk McKusick, Michael J. Karels, Keith Bostic. +.sp +Revised 1989. + +.sp +.IP +.tl '\fBBeyond 4.3BSD''beyond4.3\fP' +.if \n(.U \{\ +.br +.>> <a href="beyond43.html">beyond43.html</a> +.\} +.QP +Current Research by The Computer Systems Research Group of Berkeley. +.sp +Marshall Kirk McKusick, Michael J Karels, Keith Sklower, Kevin Fall, +Marc Teitelbaum, Keith Bostic. +.sp +Revised February 2, 1989. + +.sp +.IP +.tl '\fBFilesystem Interface''fsinterface\fP' +.if \n(.U \{\ +.br +.>> <a href="fsinterface.html">fsinterface.html</a> +.\} +.QP +Toward a Compatible Filesystem Interface. +.sp +Michael J. Karels, Marshall Kirk McKusick. +.sp +Conference of the European Users' Group, September 1986. +Last modified April 16, 1991. + +.sp +.IP +.tl '\fBSystem Performance''sysperf\fP' +.if \n(.U \{\ +.br +.>> <a href="sysperf.html">sysperf.html</a> +.\} +.QP +Measuring and Improving the Performance of Berkeley UNIX. +.sp +Marshall Kirk McKusick, Samuel J. Leffler, Michael J. Karels. +.sp +Revised April 17, 1991. + +.sp +.IP +.tl '\fBNot Quite NFS''nqnfs\fP' +.if \n(.U \{\ +.br +.>> <a href="nqnfs.html">nqnfs.html</a> +.\} +.QP +Not Quite NFS, Soft Cache Consistency for NFS. +.sp +Rick Macklem. +.sp +Reprinted with permission from the "Proceedings of the Winter 1994 Usenix +Conference", January 1994, San Francisco. 
+ +.sp +.IP +.tl '\fBMalloc(3)''malloc\fP' +.if \n(.U \{\ +.br +.>> <a href="malloc.html">malloc.html</a> +.\} +.QP +Malloc(3) in modern Virtual Memory environments. +.sp +Poul-Henning Kamp. +.sp +Revised April 5, 1996. + +.sp +.IP +.tl '\fBJails: Confining the omnipotent root''jail\fP' +.if \n(.U \{\ +.br +.>> <a href="jail.html">jail.html</a> +.\} +.QP +The jail system call sets up a jail and locks the current process in it. +.sp +Poul-Henning Kamp, Robert N. M. Watson. +.sp +This paper was presented at the 2nd International System Administration +and Networking Conference "SANE 2000" May 22-25, 2000 in Maastricht, +The Netherlands and is published in the proceedings. diff --git a/share/doc/papers/devfs/Makefile b/share/doc/papers/devfs/Makefile new file mode 100644 index 0000000..53a79fc --- /dev/null +++ b/share/doc/papers/devfs/Makefile @@ -0,0 +1,9 @@ +# $FreeBSD$ + +VOLUME= papers +DOC= devfs +SRCS= paper.me +MACROS= -me +USE_PIC= + +.include <bsd.doc.mk> diff --git a/share/doc/papers/devfs/paper.me b/share/doc/papers/devfs/paper.me new file mode 100644 index 0000000..9b775e9 --- /dev/null +++ b/share/doc/papers/devfs/paper.me @@ -0,0 +1,1277 @@ +.\" format with ditroff -me +.\" $FreeBSD$ +.\" format made to look as a paper for the proceedings is to look +.\" (as specified in the text) +.if n \{ .po 0 +. ll 78n +. na +.\} +.if t \{ .po 1.0i +. ll 6.5i +. nr pp 10 \" text point size +. nr sp \n(pp+2 \" section heading point size +. nr ss 1.5v \" spacing before section headings +.\} +.nr tm 1i +.nr bm 1i +.nr fm 2v +.he '''' +.de bu +.ip \0\s-2\(bu\s+2 +.. 
+.lp +.rs +.ce 5 +.sp +.sz 14 +.b "Rethinking /dev and devices in the UNIX kernel" +.sz 12 +.sp +.i "Poul-Henning Kamp" +.sp .1 +.i "<phk@FreeBSD.org>" +.i "The FreeBSD Project" +.i +.sp 1.5 +.b Abstract +.lp +An outstanding novelty in UNIX at its introduction was the notion +of ``a file is a file is a file and even a device is a file.'' +Going from ``hardware only changes when the DEC Field engineer is here'' +to ``my toaster has USB'' has put serious strain on the rather crude +implementation of the ``devices as files'' concept, an implementation which +has survived practically unchanged for 30 years in most UNIX variants. +Starting from a high-level view of devices and the semantics that +have grown around them over the years, this paper takes the audience on a +grand tour of the redesigned FreeBSD device-I/O system, +to convey an overview of how it all fits together, and to explain why +things ended up as they did, how to use the new features and +in particular how not to. +.sp +.if t \{ +.2c +.\} +.\" end boilerplate... paper starts here. +.sh 1 "Introduction" +.sp +There are really only two fundamental ways to conceptualise +I/O devices in an operating system: +The usual way and the UNIX way. +.lp +The usual way is to treat I/O devices as their own class of things, +possibly several classes of things, and provide APIs tailored +to the semantics of the devices. +In practice this means that a program must know what it is dealing +with, it has to interact with disks one way, tapes another and +rodents yet a third way, all of which are different from how it +interacts with a plain disk file. +.lp +The UNIX way has never been described better than in the very first +paper +published on UNIX by Ritchie and Thompson [Ritchie74]: +.(q +Special files constitute the most unusual feature of the UNIX filesystem. +Each supported I/O device is associated with at least one such file. 
+Special files are read and written just like ordinary disk files, +but requests to read or write result in activation of the associated device. +An entry for each special file resides in directory /dev, +although a link may be made to one of these files just as it may to an +ordinary file. +Thus, for example, to write on a magnetic tape one may write on the file /dev/mt. + +Special files exist for each communication line, each disk, each tape drive, +and for physical main memory. +Of course, the active disks and the memory special files are protected from indiscriminate access. + +There is a threefold advantage in treating I/O devices this way: +file and device I/O are as similar as possible; +file and device names have the same syntax and meaning, +so that a program expecting a file name as a parameter can be passed a device name; +finally, special files are subject to the same protection mechanism as regular files. +.)q +.lp +.\" (Why was this so special at the time?) +At the time, this was quite a strange concept; it was totally accepted +for instance, that neither the system administrator nor the users were +able to interact with a disk as a disk. +Operating systems simply +did not provide access to disk other than as a filesystem. +Most vendors did not even release a program to initialise a +disk-pack with a filesystem: selling pre-initialised and ``quality +tested'' disk-packs was quite a profitable business. +.lp +In many cases some kind of API for reading and +writing individual sectors on a disk pack +did exist in the operating system, +but more often than not +it was not listed in the public documentation. +.sh 2 "The traditional implementation" +.lp +.\" (Explain how opening /dev/lpt0 lands you in the right device driver) +The initial implementation used hardcoded inode numbers [Ritchie98]. +The console +device would be inode number 5, the paper-tape-punch number 6 and so on, +even if those inodes were also actual regular files in the filesystem. 
+.lp +For reasons one can only too vividly imagine, this was changed and +Thompson +[Thompson78] +describes how the implementation now used ``major and minor'' +device numbers to index through the devsw array to the correct device driver. +.lp +For all intents and purposes, this is the implementation which survives +in most UNIX-like systems even to this day. +Apart from the access control and timestamp information which is +found in all inodes, the special inodes in the filesystem contain only +one piece of information: the major and minor device numbers, often +logically OR'ed to one field. +.lp +When a program opens a special file, the kernel uses the major number +to find the entry points in the device driver, and passes the combined +major and minor numbers as a parameter to the device driver. +.sh 1 "The challenge" +.lp +Now, we did not talk much about where the special inodes came from +to begin with. +They were created by hand, using the +mknod(2) system call, usually through the mknod(8) program. +.lp +In those days a +computer had a very static hardware configuration\** +.(f +\** Unless your assigned field engineer was present on site. +.)f +and it certainly did not +change while the system was up and running, so creating device nodes +by hand was certainly an acceptable solution. +.lp +The first sign that this would not hold up as a solution came with +the advent of TCP/IP and the telnet(1) program, or more precisely +with the telnetd(8) daemon. +In order to support remote login a ``pseudo-tty'' device driver was implemented, +basically a tty driver which, instead of hardware, had another device which +would allow a process to ``act as hardware'' for the tty. +The telnetd(8) daemon would read and write data on the ``master'' side of +the pseudo-tty and the user would be running on the ``slave'' side, +which would act just like any other tty: you could change the erase +character if you wanted to and all the signals and all that stuff worked.
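The major and minor number dispatch described in the previous section can be sketched as a toy in C. Everything here is an assumption made for illustration: the toy_ names, the 8-bit split between major and minor, the two fake drivers and their return values do not reflect the layout of any real kernel's devsw.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device number: major in the high bits, minor in the low 8 bits (an assumption). */
typedef unsigned int toy_dev_t;

#define TOY_MAKEDEV(maj, min) ((toy_dev_t)(((maj) << 8) | ((min) & 0xff)))
#define TOY_MAJOR(dev)        ((unsigned)((dev) >> 8))
#define TOY_MINOR(dev)        ((unsigned)((dev) & 0xff))

/* One entry per driver; a real table would also carry close, read, write, ioctl. */
struct toy_devsw {
    int (*d_open)(toy_dev_t dev);
};

/* Two fake drivers that only report which unit was opened. */
static int cons_open(toy_dev_t dev) { return 100 + (int)TOY_MINOR(dev); }
static int tape_open(toy_dev_t dev) { return 200 + (int)TOY_MINOR(dev); }

static const struct toy_devsw toy_devsw_tab[] = {
    { cons_open },   /* major 0 */
    { tape_open },   /* major 1 */
};

/* What the kernel does on open of a special file: index by major, pass the full number. */
int toy_spec_open(toy_dev_t dev)
{
    unsigned maj = TOY_MAJOR(dev);

    if (maj >= sizeof(toy_devsw_tab) / sizeof(toy_devsw_tab[0]))
        return -1;                      /* no driver configured for this major */
    assert(toy_devsw_tab[maj].d_open != NULL);
    return toy_devsw_tab[maj].d_open(dev);
}
```

Calling toy_spec_open(TOY_MAKEDEV(1, 3)) reaches the second driver with minor 3. The point of the sketch is that the special inode needs to store nothing but the combined number; the driver interprets the minor part as it sees fit.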
+.lp +Obviously with a device requiring no hardware, you can compile as many +instances into the kernel as you like, as long as you do not use +too much memory. +As system after system was connected +to the ARPANet, ``increasing number of ptys'' became a regular task +for system administrators, and part of this task was to create +more special nodes in the filesystem. +.lp +Several UNIX vendors also noticed an issue when they sold minicomputers +in many different configurations: explaining to system administrators +just which special nodes they would need and how to create them was +a significant documentation hassle. Some opted for the simple solution +and pre-populated /dev with every conceivable device node, resulting +in a predictable slowdown on access to filenames in /dev. +.lp +System V UNIX provided a band-aid solution: +a special boot sequence would take effect if the kernel or +the hardware had changed since last reboot. +This boot procedure would +amongst other things create the necessary special files in the filesystem, +based on an intricate system of per device driver configuration files. +.lp +In recent years, we have become used to hardware which changes +configuration at any time: people plug USB, Firewire and PCCard +devices into their computers. +These devices can be anything from modems and disks to GPS receivers +and fingerprint authentication hardware. +Suddenly maintaining the +correct set of special devices in ``/dev'' became a major headache. +.lp +Along the way, UNIX kernels had learned to deal with multiple filesystem +types [Heidemann91a] and a ``device-pseudo-filesystem'' was a pretty +obvious idea. +The device drivers have a pretty good idea which +devices they have found in the configuration, so all that is needed is +to present this information as a filesystem filled with just the right +special files. +Experience has shown that this, like most other ``pseudo +filesystems'', sounds a lot simpler in theory than in practice.
+.sh 1 "Truly understanding devices" +.lp +Before we continue, we need to fully understand the +``device special file'' in UNIX. +.lp +First we need to realize that a special file has the nature of +a pointer from the filesystem into a different namespace; +a little understood fact with far reaching consequences. +.lp +One implication of this is that several special files can +exist in the filename namespace all pointing to the same device +but each having their own access and timestamp attributes: +.lp +.(b M +.vs -3 +\fC\s-3guest# ls -l /dev/fd0 /tmp/fd0 +crw-r----- 1 root operator 9, 0 Sep 27 19:21 /dev/fd0 +crw-rw-rw- 1 root wheel 9, 0 Sep 27 19:24 /tmp/fd0\fP\s+3 +.vs +3 +.)b +Obviously, the administrator needs to be on top of this: +one popular way to exploit an unguarded root prompt is +to create a replica of the special file /dev/kmem +in a location where it will not be noticed. +Since /dev/kmem gives access to the kernel memory, +gaining any particular +privilege can be arranged by suitably modifying the kernel's +data structures through the illicit special file. +.lp +When NFS appeared it opened a new avenue for this attack: +People may have root privilege on one machine but not another. +Since device nodes are not interpreted on the NFS server +but rather on the local computer, +a user with root privilege on an NFS client +computer can create a device node to his liking on a filesystem +mounted from an NFS server. +This device node can in turn be used to +circumvent the security of other computers which mount that filesystem, +including the server, unless they protect themselves by not +trusting any device entries on untrusted filesystems by mounting such +filesystems with the \fCnodev\fP mount-option.
+.lp +The fact that the device itself does not actually exist inside the +filesystem which holds the special file makes it possible +to perform boot-strapping stunts in the spirit +of Baron Von Münchausen [raspe1785], +where a filesystem is (re)mounted using one of its own +device vnodes: +.(b M +.vs -3 +\fC\s-2guest# mount -o ro /dev/fd0 /mnt +guest# fsck /mnt/dev/fd0 +guest# mount -u -o rw /mnt/dev/fd0 /mnt\fP\s+2 +.vs +3 +.)b +.lp +Other interesting details are chroot(2) and jail(2) [Kamp2000] which +provide filesystem isolation for process-trees. +Whereas chroot(2) was not implemented as a security tool [Mckusick1999] +(although it has been widely used as such), the jail(2) security +facility in FreeBSD provides a pretty convincing ``virtual machine'' +where even the root privilege is isolated and restricted to the designated +area of the machine. +Obviously chroot(2) and jail(2) may require access to a well-defined +subset of devices like /dev/null, /dev/zero and /dev/tty, +whereas access to other devices such as /dev/kmem +or any disks could be used to compromise the integrity of the jail(2) +confinement. +.lp +For a long time FreeBSD, like almost all UNIX-like systems had two kinds +of devices, ``block'' and +``character'' special files, the difference being that ``block'' +devices would provide caching and alignment for disk device access. +This was one of those minor architectural mistakes which took +forever to correct. +.lp +The argument that block devices were a mistake is really very +very simple: Many devices other than disks have multiple modes +of access which you select by choosing which special file to use. +.lp +Pick any old timer and he will be able to recite painful +sagas about the crucial difference between the /dev/rmt +and /dev/nrmt devices for tape access.\** +.(f +\** Make absolutely sure you know the difference before you take +important data on a multi-file 9-track tape to remote locations. 
+.)f +.lp +Tapes, asynchronous ports, line printer ports and many other devices +have implemented submodes, selectable by the user +at a special filename level, but that has not earned them their +own special file types. +Only disks\** +.(f +\** Well, OK: and some 9-track tapes. +.)f +have enjoyed the privilege of getting an entire file type dedicated to +a minor device mode. +.lp +Caching and alignment modes should have been enabled by setting +some bit in the minor device number on the disk special file, +not by polluting the filesystem code with another file type. +.lp +In FreeBSD block devices were not even implemented in a fashion +which would be of any use, since any write errors would never be +reported to the writing process. For this reason, and since no +applications +were found to be in existence which relied on block devices +and since historical usage was indeed historical [Mckusick2000], +block devices were removed from the FreeBSD system. +This greatly simplified the task of keeping track of open(2) +reference counts for disks and +removed much magic special-case code throughout. +.lp +.sh 1 "Files, sockets, pipes, SVID IPC and devices" +.sp +It is an instructive lesson in inconsistency to look at the +various types of ``things'' a process can access in UNIX-like +systems today. +.lp +First there are normal files, which are our reference yardstick here: +they are accessed with open(2), read(2), write(2), mmap(2), close(2) +and various other auxiliary system calls. +.lp +Sockets and pipes are also accessed via file handles but each has +its own namespace. That means you cannot open(2) a socket,\** +.(f +\** This is particularly bizarre in the case of UNIX domain sockets +which use the filesystem as their namespace and appear in directory +listings. +.)f +but you can read(2) and write(2) to it. +Sockets and pipes vector off at the file descriptor level and do +not get in touch with the vnode based part of the kernel at all.
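The point about sockets and pipes can be made concrete with a short sketch: neither descriptor below comes from open(2), yet both are driven with the same read(2) and write(2) calls. The function names are chosen for this illustration only.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Push msg into wfd and read it back from rfd using only the generic
 * descriptor calls.  Returns the number of bytes read, or -1 on error.
 */
static ssize_t
roundtrip(int wfd, int rfd, const char *msg, char *buf, size_t buflen)
{
	size_t len = strlen(msg);

	if (write(wfd, msg, len) != (ssize_t)len)
		return (-1);
	return (read(rfd, buf, buflen));
}

/* A pipe: the descriptors come from pipe(2), there is no name to open. */
ssize_t
pipe_roundtrip(const char *msg, char *buf, size_t buflen)
{
	int fd[2];
	ssize_t n;

	if (pipe(fd) != 0)
		return (-1);
	n = roundtrip(fd[1], fd[0], msg, buf, buflen);
	close(fd[0]);
	close(fd[1]);
	return (n);
}

/* A socket pair: the descriptors come from socketpair(2). */
ssize_t
socket_roundtrip(const char *msg, char *buf, size_t buflen)
{
	int sv[2];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-1);
	n = roundtrip(sv[0], sv[1], msg, buf, buflen);
	close(sv[0]);
	close(sv[1]);
	return (n);
}
```

The shared roundtrip() helper is the whole point: once a descriptor exists, the code using it does not need to know which namespace it came from.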
+.lp +Devices land somewhere in the middle between pipes and sockets on +one side and normal files on the other. +They use the filesystem +namespace, are implemented with vnodes, and can be operated +on like normal files, but don't actually live in the filesystem. +.lp +Devices are in fact special-cased all the way through the vnode system. +For one thing devices break the ``one file-one vnode'' +rule, making it necessary to chain all vnodes for the same +device together in +order to be able to find ``the canonical vnode for this device node'', +but more importantly, many operations have to be specifically denied +on special file vnodes since they do not make any sense. +.lp +For true inconsistency, consider the SVID IPC mechanisms - not +only do they not operate via file handles, +but they also sport a singularly +ill-conceived 32 bit numeric namespace and a dedicated set of +system calls for access. +.lp +Several people have convincingly argued that this is an inconsistent +mess, and have proposed and implemented more consistent operating systems +like Plan 9 from Bell Labs [Pike90a] [Pike92a]. +Unfortunately, the reality is that people are not interested in learning a new +operating system when the one they have is pretty darn good, and +consequently research into better and more consistent ways is +a pretty frustrating [Pike2000] but by no means irrelevant topic. +.sh 1 "Solving the /dev maintenance problem" +.lp +There are a number of obvious, simple but wrong ways one could +go about solving the ``/dev'' maintenance problem. +.lp +The very straightforward way is to hack the namei() kernel function +responsible for filename translation and lookup. +It is only a minor matter of programming to +add code to special-case any lookup which ends up in ``/dev''.
+But this leads to problems: in the case of chroot(2) or jail(2), the +administrator will want to present only a subset of the available +devices in ``/dev'', so some kind of state will have to be kept per +chroot(2)/jail(2) about which devices are visible and +which devices are hidden, but no obvious location for this information +is available in the absence of a mount data structure. +.lp +It also leads to some unpleasant issues +because of the fact that ``/dev/foo'' is a synthesised directory +entry which may or may not actually be present on the filesystem +which seems to provide ``/dev''. +The vnodes either have to belong to a filesystem or they +must be special-cased throughout the vnode layer of the kernel. +.lp +Finally there is the simple matter of generality: +hardcoding the string "/dev" in the kernel is not very general. +.lp +A cruder solution is to leave it to a daemon: make a special +device driver, have a daemon read messages from it and create and +destroy nodes in ``/dev'' in response to these messages. +.lp +The main drawback to this idea is that now we have added IPC +to the mix introducing new and interesting race conditions. +.lp +Otherwise this solution is surprisingly effective, +but chroot(2)/jail(2) requirements prevent a simple implementation +and running a daemon per jail would become an administrative +nightmare. +.lp +Another pitfall of +this approach is that we are not able to remount the root filesystem +read-write at boot until we have a device node for the root device, +but if this node is missing we cannot create it with a daemon since +the root filesystem (and hence /dev) is read-only. +Adding a read-write memory-filesystem mount on /dev to solve this problem +does not improve +the architectural qualities further and certainly the KISS principle has +been violated by now. +.lp +The final and in the end only satisfactory solution is to write a ``DEVFS'' +which mounts on ``/dev''.
+.lp +The good news is that it does solve the problem with chroot(2) and jail(2): +just mount a DEVFS instance on the ``dev'' directory inside the filesystem +subtree where the chroot or jail lives. Having a mountpoint gives us +a convenient place to keep track of the local state of this DEVFS mount. +.lp +The bad news is that it takes a lot of cleanup and care to implement +a DEVFS into a UNIX kernel. +.sh 1 "DEVFS architectural decisions" +.lp +Before implementing a DEVFS, it is necessary to decide on a range +of corner cases in behaviour, and some of these choices have proved +surprisingly hard to settle for the FreeBSD project. +.sh 2 "The ``persistence'' issue" +.lp +When DEVFS in FreeBSD was initially presented at a BoF at the 1995 +USENIX Technical Conference in New Orleans, +a group of people demanded that it provide ``persistence'' +for administrative changes. +.lp +When trying to get a definition of ``persistence'', people can generally +agree that if the administrator changes the access control bits of +a device node, they want that mode to survive across reboots. +.lp +Once more tricky examples of the sort of manipulations one can do +on special files are proposed, people rapidly disagree about what +should be supported and what should not. +.lp +For instance, imagine a +system with one floppy drive which appears in DEVFS as ``/dev/fd0''. +Now the administrator, in order to get some badly written software +to run, links this to ``/dev/fd1'': +.(b M +\fC\s-2ln /dev/fd0 /dev/fd1\fP\s+2 +.)b +This works as expected and with persistence in DEVFS, the link is +still there after a reboot. +But what if after a reboot another floppy drive has been connected +to the system? +This drive would naturally have the name ``/dev/fd1'', +but this name is now occupied by the administrator's hard link. +Should the link be broken? +Should the new floppy drive be called +``/dev/fd2''? Nobody can agree on anything but the ugliness of the +situation.
+.lp +Given that we are no longer dependent on DEC Field engineers to +change all four wheels to see which one is flat, the basic assumption +that the machine has a constant hardware configuration is simply no +longer true. +The new assumption one should start from when analysing this +issue is that when the system boots, we cannot know what devices we +will find, and we can not know if the devices we do find +are the same ones we had when the system was last shut down. +.lp +And in fact, this is very much the case with laptops today: if I attach +my IOmega Zip drive to my laptop it appears like a SCSI disk named +``/dev/da0'', but so does the RAID-5 array attached to the PCI SCSI controller +installed in my laptop's docking station. If I change mode to ``a+rw'' +on the Zip drive, do I want that mode to apply to the RAID-5 as well? +Unlikely. +.lp +And what if we have persistent information about the mode of +device ``/dev/sio0'', but we boot and do not find any sio devices? +Do we keep the information in our device-persistence registry? +How long do we keep it? If I borrow a modem card, +set the permissions to some non-standard value like 0666, +and then attach some other serial device a year from now - do I +want some old permissions changes to come back and haunt me, +just because they both happened to be ``/dev/sio0''? +Unlikely. +.lp +The fact that more people have laptop computers today than +five years ago, and the fact that nobody has been able to credibly +propose where a persistent DEVFS would actually store the +information about these things in the first place has settled the issue. +.lp +Persistence may be the right answer, but to the +wrong question: persistence is not a desirable property for a DEVFS +when the hardware configuration may change literally at any time. +.sh 2 "Who decides on the names?" 
+.lp +In a DEVFS-enabled system, the responsibility for creating nodes in +/dev shifts to the device drivers, and consequently the device +drivers get to choose the names of the device files. +In addition, initial values for the owner, group and mode bits are +provided by the device driver. +.lp +But should it be possible to rename ``/dev/lpt0'' to ``/dev/myprinter''? +While the obvious affirmative answer is easy to arrive at, it leaves +a lot to be desired once the implications are unmasked. +.lp +Most device drivers know their own name and use it purposefully in +their debug and log messages to identify themselves. +Furthermore, the ``NewBus'' [NewBus] infrastructure facility, +which ties hardware to device drivers, identifies things by name +and unit numbers. +.lp +In fact, a very common way to report errors is: +.(b M +.vs -3 +\fC\s-2#define LPT_NAME "lpt" /* our official name */ +[...] +printf(LPT_NAME + ": cannot alloc ppbus (%d)!", error);\fP\s+2 +.vs +3 +.)b +.lp +So despite the user renaming the device node pointing to the printer +to ``myprinter'', this has absolutely no effect in the kernel and can +be considered a userland aliasing operation. +.lp +The decision was therefore made that it should not be possible to rename +device nodes since it would only lead to confusion and because the desired +effect could be attained by giving the user the ability to create +symlinks in DEVFS. +.sh 2 "On-demand device creation" +.lp +Pseudo-devices like pty, tun and bpf, +but also some real devices, may not pre-emptively create entries for all +possible device nodes. It would be a pointless waste of resources +to always create 1000 ptys just in case they are needed, +and in the worst case more than 1800 device nodes would be needed per +physical disk to represent all possible slices and partitions.
+.lp +For pseudo-devices the task at hand is to make a magic device node, +``/dev/pty'', which when opened will magically transmogrify into the +first available pty subdevice, maybe ``/dev/pty123''. +.lp +Device submodes, on the other hand, work by having multiple +entries in /dev, each with a different minor number, as a way to instruct +the device driver in aspects of its operation. The most widespread +example is probably ``/dev/mt0'' and ``/dev/nmt0'', where the node +with the extra ``n'' +instructs the tape device driver to not rewind on close.\** +.(f +\** This is the answer to the question in footnote number 2. +.)f +.lp +Some UNIX systems have solved the problem for pseudo-devices by +creating magic cloning devices like ``/dev/tcp''. +When a cloning device is opened, +it finds a free instance and through vnode and file descriptor mangling +returns this new device to the opening process. +.lp +This scheme has two disadvantages: the complexity of switching vnodes +in midstream is non-trivial, but even worse is the fact that it +does not work for +submodes for a device because it only reacts to one particular /dev entry. +.lp +The solution for both needs is a more flexible on-demand device +creation, implemented in FreeBSD as a two-level lookup. +When a +filename is looked up in DEVFS, a match in the existing device nodes is +sought first and if found, returned. +If no match is found, device drivers are polled in turn to ask if +they would be able to synthesise a device node of the given name. +.lp +The device driver gets a chance to modify the name +and create a device with make_dev().
+If one of the drivers succeeds in this, the lookup is started over and +the newly found device node is returned: +.(b M +.vs -3 +\fC\s-2pty_clone() + if (strcmp(name, "pty") != 0) + return(NULL); /* no luck */ + n = find_next_unit(); + dev = make_dev(...,n,"pty%d",n); + name = dev->name; + return(dev);\fP\s+2 +.vs +3 +.)b +.lp +An interesting mixed use of this mechanism is with the sound device drivers. +Modern sound devices have multiple channels, presumably to allow the +user to listen to CNN, Napstered MP3 files and Quake sound effects at +the same time. +The only problem is that all applications attempt to open ``/dev/dsp'' +since they have no concept of multiple sound devices. +The sound device drivers use the cloning facility to direct ``/dev/dsp'' +to the first available sound channel completely transparently to the +process. +.lp +There are very few drawbacks to this mechanism, the major one being +that ``ls /dev'' now errs on the sparse side instead of the rich when used +as a system device inventory, a practice which has always been +of dubious precision at best. +.sh 2 "Deleting and recreating devices" +.lp +Deleting device nodes is no problem to implement, but as likely as not, +some people will want a method to get them back. +Since only the device driver knows how to create a given device, +recreation cannot be performed solely on the basis of the parameters +provided by a process in userland. +.lp +In order to not complicate the code which updates the directory +structure for a mountpoint to reflect changes in the DEVFS inode list, +a deleted entry is merely marked with DE_WHITEOUT instead of being +removed entirely. +Otherwise a separate list would be needed for inodes which we had +deleted so that they would not be mistaken for new inodes. +.lp +The obvious way to recreate deleted devices is to let mknod(2) do it +by matching the name and disregarding the major/minor arguments. +Recreating the device with mknod(2) will simply remove the DE_WHITEOUT +flag.
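The two-level lookup and the whiteout handling described above can be modelled in userland. This is a toy sketch under stated assumptions, not the FreeBSD code: the table, struct dirent_sketch, and the functions lookup(), remove_node() and recreate_node() are all invented for illustration, and only the DE_WHITEOUT flag name comes from the text. A single hard-wired handler stands in for polling the drivers in turn.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DE_WHITEOUT	0x1	/* deleted on this mount, kept as a marker */
#define MAXNODES	16
#define NAMELEN		16

struct dirent_sketch {
	char	name[NAMELEN];
	int	flags;
};

static struct dirent_sketch nodes[MAXNODES];
static int nnodes;
static int next_pty;

static struct dirent_sketch *
find_node(const char *name)
{
	for (int i = 0; i < nnodes; i++)
		if (strcmp(nodes[i].name, name) == 0)
			return (&nodes[i]);
	return (NULL);
}

/* Stand-in for make_dev(): add a named node to the table. */
static struct dirent_sketch *
add_node(const char *name)
{
	if (nnodes == MAXNODES)
		return (NULL);
	snprintf(nodes[nnodes].name, NAMELEN, "%s", name);
	nodes[nnodes].flags = 0;
	return (&nodes[nnodes++]);
}

/* Clone handler: on "pty", create the next unit and rewrite the name. */
static int
pty_clone(char *name)
{
	if (strcmp(name, "pty") != 0)
		return (0);			/* no luck */
	snprintf(name, NAMELEN, "pty%d", next_pty++);
	return (add_node(name) != NULL);
}

/* Two-level lookup: existing nodes first, then the clone handler. */
struct dirent_sketch *
lookup(const char *wanted)
{
	char name[NAMELEN];
	struct dirent_sketch *de;

	snprintf(name, sizeof(name), "%s", wanted);
	de = find_node(name);
	if (de == NULL && pty_clone(name))
		de = find_node(name);		/* start the lookup over */
	if (de == NULL || (de->flags & DE_WHITEOUT))
		return (NULL);
	return (de);
}

/* Deleting only marks the entry; it stays in the table. */
int
remove_node(const char *name)
{
	struct dirent_sketch *de = find_node(name);

	if (de == NULL || (de->flags & DE_WHITEOUT))
		return (-1);
	de->flags |= DE_WHITEOUT;
	return (0);
}

/* mknod by name: clear the whiteout; major/minor would be ignored. */
int
recreate_node(const char *name)
{
	struct dirent_sketch *de = find_node(name);

	if (de == NULL || !(de->flags & DE_WHITEOUT))
		return (-1);
	de->flags &= ~DE_WHITEOUT;
	return (0);
}
```

Because a whited-out entry is still found by find_node(), the clone handler is never asked to synthesise a name that was deliberately deleted, which is one reason for keeping the marker instead of dropping the entry.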
+.sh 2 "Jail(2), chroot(2) and DEVFS" +.lp +The primary requirement from facilities like jail(2) and chroot(2) +is that it must be possible to control the contents of a DEVFS mount +point. +.lp +Obviously, it would not be desirable for dynamic devices to pop +into existence in the carefully pruned /dev of jails so it must be +possible to mark a DEVFS mountpoint as ``no new devices''. +And in the same way, the jailed root should not be able to recreate +device nodes which the real root has removed. +.lp +These behaviours will be controlled with mount options, but these have not +yet been implemented because FreeBSD has run out of bitmap flags for +mount options, and a new unlimited mount option implementation is +still not in place at the time of writing. +.lp +One mount option ``jaildevfs'', will restrict the contents of the +DEVFS mountpoint to the ``normal set'' of devices for a jail and +automatically hide all future devices and make it impossible +for a jailed root to un-hide hidden entries while letting an un-jailed +root do so. +.lp +Mounting or remounting read-only, will prevent all future +devices from appearing and will make it impossible to +hide or un-hide entries in the mountpoint. +This is probably only useful for chroots or jails where no tty +access is intended since cloning will not work either. +.lp +More mount options may be needed as more experience is gained. +.sh 2 "Default mode, owner & group" +.lp +When a device driver creates a device node, and a DEVFS mount adds it +to its directory tree, it needs to have some values for the access +control fields: mode, owner and group. +.lp +Currently, the device driver specifies the initial values in the +make_dev() call, but this is far from optimal. +For one thing, embedding magic UIDs and GIDs in the kernel is simply +bad style unless they are numerically zero. +More seriously, they represent compile-time defaults which in these +enlightened days is rather old-fashioned. 
+.lp +.sh 1 "Cleaning up before we build: struct specinfo and dev_t" +.lp +Most of the rest of the paper will be about the various challenges +and issues in the implementation of DEVFS in FreeBSD. +All of this should be applicable to other systems derived from +4.4BSD-Lite as well. +.lp +POSIX has defined a type called ``dev_t'' which is the identity of a device. +This is mainly for use in the few system calls which know about devices: +stat(2), fstat(2) and mknod(2). +A dev_t is constructed by logically OR'ing +the major# and minor# for the device. +Since those have been defined +as having no overlapping bits, the major# and minor# +can be retrieved from the dev_t by a simple masking operation. +.lp +Although the kernel had a well-defined concept of any particular +device it did not have a data structure to represent "a device". +The device driver has such a structure, traditionally called ``softc'' +but the high kernel does not (and should not!) have access to the +device driver's private data structures. +.lp +It is an interesting tale how things got to be this way,\** +.(f +\** Basically, devices should have been moved up with sockets and +pipes at the file descriptor level when the VFS layering was introduced, +rather than have all the special casing throughout the vnode system. +.)f +but for now just record for +a fact how the actual relationship between the data structures was +in the 4.4BSD release (Fig. 1). [44BSDBook] +.(z +.PS 3 +F: box "file" "handle" +arrow down from F.s +V: box "vnode" +arrow right from V.e +S: box "specinfo" +arrow down from V.s +I: box "inode" +arrow right from I.e +C: box invis "devsw[]" "[major#]" +arrow down from C.s +D: box "device" "driver" +line right from D.e +box invis "softc[]" "[minor#]" +F2: box "file" "handle" at F + (2.5,0) +arrow down from F2.s +V2: box "vnode" +arrow right from V2.e +S2: box "specinfo" +arrow down from V2.s +I2: box "inode" +arrow left from I2.w +.PE +.ce 1 +Fig.
1 - Data structures in 4.4BSD +.)z +.lp +As for all other files, a vnode references a filesystem inode, but +in addition it points to a ``specinfo'' structure. In the inode +we find the dev_t which is used to reference the device driver. +.lp +Access to the device driver happens by extracting the major# from +the dev_t, indexing through the global devsw[] array to locate +the device driver's entry point. +.lp +The device driver will extract the minor# from the dev_t and use +that as the index into the softc array of private data per device. +.lp +The ``specinfo'' structure is a little sidekick vnodes grew underway, +and is used to find all vnodes which reference the same device (i.e. +they have the same major# and minor#). +This linkage is used to determine +which vnode is the ``chosen one'' for this device, and to keep track of +open(2)/close(2) against this device. +The actual implementation was an inefficient hash implementation, +which depending on the vnode reclamation rate and /dev directory lookup +traffic, may become a measurable performance liability. +.sh 2 "The new vnode/inode/dev_t layout" +.lp +In the new layout (Fig. 2) the specinfo structure takes a central +role. There is only one instance of struct specinfo per +device (i.e. unique major# +and minor# combination) and all vnodes referencing this device point +to this structure directly. +.(z +.PS 2.25 +F: box "file" "handle" +arrow down from F.s +V: box "vnode" +arrow right from V.e +S: box "specinfo" +arrow down from V.s +I: box "inode" +F2: box "file" "handle" at F + (2.5,0) +arrow down from F2.s +V2: box "vnode" +arrow left from V2.w +arrow down from V2.s +I2: box "inode" +arrow down from S.s +D: box "device" "driver" +.PE +.ce 1 +Fig. 2 - The new FreeBSD data structures. +.)z +.lp +In userland, a dev_t is still the logical OR of the major# and +minor#, but this entity is now called a udev_t in the kernel. +In the kernel a dev_t is now a pointer to a struct specinfo.
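The split between the packed number and the kernel pointer can be sketched as follows. The bit layout (major# in bits 8-15, minor# in the remaining bits) is an assumption made for this example, as are the helper names, the kdev_t typedef and every structure field except si_drv1; the real definitions may differ.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t udev_t;		/* packed major#/minor# */

/* Assumed layout for this sketch: no overlapping bits. */
#define UMAJOR_MASK	0x0000ff00u
#define UMINOR_MASK	0xffff00ffu

/* Build the packed number by OR'ing the two non-overlapping fields. */
static inline udev_t
makeudev(unsigned maj, unsigned min)
{
	return (((maj << 8) & UMAJOR_MASK) | (min & UMINOR_MASK));
}

/* Recover each field with a simple masking operation. */
static inline unsigned
umajor(udev_t u)
{
	return ((u & UMAJOR_MASK) >> 8);
}

static inline unsigned
uminor(udev_t u)
{
	return (u & UMINOR_MASK);
}

/* One instance per device; vnodes point here directly. */
struct specinfo {
	udev_t		 si_udev;	/* hypothetical field name */
	const char	*si_name;	/* hypothetical field name */
	void		*si_drv1;	/* owned by the device driver */
};

/* In the kernel the device identity is the pointer itself. */
typedef struct specinfo *kdev_t;	/* stands in for the kernel dev_t */

/* Going from the pointer back to the number is a field read. */
static inline udev_t
dev2udev(kdev_t dev)
{
	return (dev->si_udev);
}
```

Converting in the other direction, from a udev_t to the matching specinfo, needs a search of some kind, which is why that lookup is confined to the few places where a number arrives from userland or from an on-disk inode.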
+.lp +All vnodes referencing a device are linked to a list hanging +directly off the specinfo structure, removing the need for the +hash table and consequently simplifying and speeding up a lot +of code dealing with vnode instantiation, retirement and +name-caching. +.lp +The entry points to the device driver are stored in the specinfo +structure, removing the need for the devsw[] array and allowing +device drivers to use separate entrypoints for various minor numbers. +.lp +This is very convenient for devices which have a ``control'' +device for management and tuning. The control device almost always +has entirely separate open/close/ioctl implementations [MD.C]. +.lp +In addition to this, two data elements are included in the specinfo +structure but ``owned'' by the device driver. Typically the +device driver will store a pointer to the softc structure in +one of these, and unit number or mode information in the other. +.lp +This removes the need for drivers to find the softc using array +indexing based on the minor#, and at the same time has obviated +the need for the compiled-in ``NFOO'' constants which traditionally +determined how many softc structures and therefore devices +the driver could support.\** +.(f +\** Not to mention all the drivers which implemented panic(2) +because they forgot to perform bounds checking on the index before +using it on their softc arrays. +.)f +.lp +There are some trivial technical issues relating to allocating +the storage for specinfo early in the boot sequence and how to +find a specinfo from the udev_t/major#+minor#, but they will +not be discussed here. +.sh 2 "Creating and destroying devices" +.lp +Ideally, devices should only be created and +destroyed by the device drivers which know what devices are present. +This is accomplished with the make_dev() and destroy_dev() +function calls. +.lp +Life is seldom quite that simple.
The operating system might be called +on to act as an NFS server for a diskless workstation, possibly even +of a different architecture, so we still need to be able to represent +device nodes with no device driver backing in the filesystems and +consequently we need to be able to create a specinfo from +the major#+minor# in these inodes when we encounter them. +In practice this is quite trivial, but in a few places in the code +one needs to be aware of the existence +of both ``named'' and ``anonymous'' specinfo structures. +.lp +The make_dev() call creates a specinfo structure and populates +it with driver entry points, major#, minor#, device node name +(for instance ``lpt0''), UID, GID and access mode bits. The return +value is a dev_t (i.e., a pointer to struct specinfo). +If the device driver determines that the device is no longer +present, it calls destroy_dev(), giving a dev_t as argument +and the dev_t will be cleaned and converted to an anonymous dev_t. +.lp +Once created with make_dev() a named dev_t exists until destroy_dev() +is called by the driver. The driver can rely on this and keep state +in the fields in dev_t which are reserved for driver use. +.sh 1 "DEVFS" +.lp +By now we have all the relevant information about each device node +collected in struct specinfo but we still have one problem to +solve before we can add the DEVFS filesystem on top of it. +.sh 2 "The interrupt problem" +.lp +Some device drivers, notably the CAM/SCSI subsystem in FreeBSD +will discover changes in the device configuration inside an interrupt +routine. +.lp +This imposes some limitations on what can and should be done: +first one should minimise the amount +of work done in an interrupt routine for performance reasons; +second, to avoid deadlocks, vnodes and mountpoints should not be +accessed from an interrupt routine.
+.lp +Also, in addition to the locking issue, +a machine can have many instances of DEVFS mounted: +for a jail(8) based virtual-machine web-server several hundred instances +is not unheard of, making it far too expensive to update all of them +in an interrupt routine. +.lp +The solution to this problem is to do all the filesystem work on +the filesystem side of DEVFS and use atomically manipulated integer indices +(``inode numbers'') as the barrier between the two sides. +.lp +The functions called from the device drivers, make_dev(), destroy_dev() +&c. only manipulate the DEVFS inode number of the dev_t in +question and do not even get near any mountpoints or vnodes. +.lp +For make_dev() the task is to assign a unique inode number to the +dev_t and store the dev_t in the DEVFS-global inode-to-dev_t array. +.(b M +.vs -3 +\fC\s-2make_dev(...) + store argument values in dev_t + assign unique inode number to dev_t + atomically insert dev_t into inode_array\fP\s+2 +.vs +3 +.)b +.lp +For destroy_dev() the task is the opposite: clear the inode number +in the dev_t and NULL the pointer in the devfs-global inode-to-dev_t +array. +.(b M +.vs -3 +\fC\s-2destroy_dev(...) + clear fields in dev_t + zero dev_t inode number. + atomically clear entry in inode_array\fP\s+2 +.vs +3 +.)b +.lp +Both functions conclude by atomically incrementing a global variable +\fCdevfs_generation\fP to leave an indication to the filesystem +side that something has changed. +.lp +By modifying the global state only with atomic instructions, locks +have been entirely avoided in this part of the code which means that +the make_dev() and destroy_dev() functions can be called from practically +anywhere in the kernel at any time. 
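The driver-side half of this scheme can be approximated in portable C11 with <stdatomic.h>. This is a sketch under assumptions, not the kernel code: the names devfs_create(), devfs_destroy(), struct specinfo_sketch, NDEVINO and next_inode are invented here, inode numbers are never recycled, and the kernel uses its own atomic primitives instead of the C11 ones.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NDEVINO	64			/* table size: assumption for the sketch */

struct specinfo_sketch {
	const char	*si_name;
	unsigned	 si_inode;	/* 0 means "no DEVFS inode assigned" */
};

static _Atomic(struct specinfo_sketch *) inode_array[NDEVINO];
static atomic_uint next_inode = 1;	/* inode 0 is reserved as "none" */
static atomic_uint devfs_generation;

/* Driver side of make_dev(): publish the device, touch no vnodes. */
int
devfs_create(struct specinfo_sketch *dev, const char *name)
{
	/* fetch-and-add hands every caller a distinct inode number */
	unsigned ino = atomic_fetch_add(&next_inode, 1);

	if (ino >= NDEVINO)
		return (-1);
	dev->si_name = name;
	dev->si_inode = ino;
	atomic_store(&inode_array[ino], dev);
	/* tell the filesystem side that something has changed */
	atomic_fetch_add(&devfs_generation, 1);
	return (0);
}

/* Driver side of destroy_dev(): retract the device. */
void
devfs_destroy(struct specinfo_sketch *dev)
{
	unsigned ino = dev->si_inode;

	assert(ino != 0 && ino < NDEVINO);
	dev->si_inode = 0;
	atomic_store(&inode_array[ino], (struct specinfo_sketch *)NULL);
	atomic_fetch_add(&devfs_generation, 1);
}
```

Neither function sleeps, takes a lock, or looks at a mountpoint, which is the property that makes calls from an interrupt routine acceptable.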
+.lp +On the filesystem side of DEVFS, the only two vnode methods which examine +or rely on the directory structure, VOP_LOOKUP and VOP_READDIR, +call the function devfs_populate() to update their mountpoint's view +of the device hierarchy to match current reality before doing any work. +.(b M +.vs -3 +\fC\s-2devfs_readdir(...) + devfs_populate(...) + ...\fP\s+2 +.)b +.vs +3 +.lp +The devfs_populate() function compares the current \fCdevfs_generation\fP +to the value saved in the mountpoint last time devfs_populate() completed +and if (actually: while) they differ a linear run is made through the +devfs-global inode-array and the directory tree of the mountpoint is +brought up to date. +.lp +The actual code is slightly more complicated than shown in the pseudo-code +here because it has to deal with subdirectories and hidden entries. +.(b M +.vs -3 +\fC\s-2devfs_populate(...) + while (mount->generation != devfs_generation) + for i in all inodes + if (inode created) + create directory entry + else if (inode destroyed) + remove directory entry\fP\s+2 +.vs +3 +.)b +.lp +Access to the global DEVFS inode table is again implemented +with atomic instructions and failsafe retries to avoid the +need for locking. +.lp +From a performance point of view this scheme also means that a particular +DEVFS mountpoint is not updated until it needs to be, and then always by +a process belonging to the jail in question thus minimising and +distributing the CPU load. +.sh 1 "Device-driver impact" +.lp +All these changes have had a significant impact on how device drivers +interact with the rest of the kernel regarding registration of +devices. +.lp +If we look first at the ``before'' image in Fig. 3, we notice first +the NFOO define which imposes a firm upper limit on the number of +devices the kernel can deal with. +Also notice that the softc structure for all of them is allocated +at compile time.
+This is because most device drivers (and texts on writing device +drivers) are from before the general +kernel malloc facility [Mckusick1988] was introduced into the BSD kernel. +.lp +.(b M +.vs -3 +\fC\s-2 +#ifndef NFOO +# define NFOO 4 +#endif + +struct foo_softc { + ... +} foo_softc[NFOO]; + +int nfoo = 0; + +foo_open(dev, ...) +{ + int unit = minor(dev); + struct foo_softc *sc; + + if (unit >= NFOO || unit >= nfoo) + return (ENXIO); + + sc = &foo_softc[unit]; + + ... +} + +foo_attach(...) +{ + struct foo_softc *sc; + static int once; + + ... + if (nfoo >= NFOO) { + /* Have hardware, can't handle */ + return (-1); + } + sc = &foo_softc[nfoo++]; + if (!once) { + cdevsw_add(&cdevsw); + once++; + } + ... +} +\fP\s+2 +Fig. 3 - Device-driver, old style. +.vs +3 +.)b +.lp +Also notice how range checking is needed to make sure that the +minor# is inside range. This code gets more complex if device-numbering +is sparse. Code equivalent to that shown in the foo_open() routine +would also be needed in foo_read(), foo_write(), foo_ioctl() &c. +.lp +Finally notice how the attach routine needs to remember to register +the cdevsw structure (not shown) when the first device is found. +.lp +Now, compare this to our ``after'' image in Fig. 4. +NFOO is totally gone and so is the compile time allocation +of space for softc structures. +.lp +The foo_open (and foo_close, foo_ioctl &c) functions can now +derive the softc pointer directly from the dev_t they receive +as an argument. +.lp +.(b M +.vs -3 +\fC\s-2 +struct foo_softc { + .... +}; + +int nfoo; + +foo_open(dev, ...) +{ + struct foo_softc *sc = dev->si_drv1; + + ... +} + +foo_attach(...) +{ + struct foo_softc *sc; + + ... + sc = MALLOC(..., M_ZERO); + if (sc == NULL) { + /* Have hardware, can't handle */ + return (-1); + } + sc->dev = make_dev(&cdevsw, nfoo, + UID_ROOT, GID_WHEEL, 0644, + "foo%d", nfoo); + nfoo++; + sc->dev->si_drv1 = sc; + ... +} +\fP\s+2 +Fig. 4 - Device-driver, new style.
+.vs +3 +.)b +.lp +In foo_attach() we can now attach to all the devices we can +allocate memory for and we register the cdevsw structure per +dev_t rather than globally. +.lp +This last trick is what allows us to discard all bounds checking +in the foo_open() &c. routines, because they can only be +called through the cdevsw, and the cdevsw is only attached to +dev_t's which foo_attach() has created. +There is no way to end +up in foo_open() with a dev_t not created by foo_attach(). +.lp +In the two examples here, the difference is only 10 lines of source +code, primarily because only one of the worker functions of the +device driver is shown. +In real device drivers it is not uncommon to save 50 or more lines +of source code, which typically is about a percent or two of the +entire driver. +.sh 1 "Future work" +.lp +Apart from some minor issues to be cleaned up, DEVFS is now a reality +and future work therefore is likely to concentrate on applying the +facilities and functionality of DEVFS to FreeBSD. +.sh 2 "devd" +.lp +It would be logical to complement DEVFS with a ``device-daemon'' which +could configure and de-configure devices as they come and go. +When a disk appears, mount it. +When a network interface appears, configure it. +And in some configurable way allow the user to customise the action, +so that for instance images will automatically be copied off the +flash-based media from a camera, &c. +.lp +In this context it is good to question how we view dynamic devices. +If for instance a printer is removed in the middle of a print job +and another printer arrives a moment later, should the system +automatically continue the print job on this new printer? +When a disk-like device arrives, should we always mount it? Should +we have a database of known disk-like devices to tell us where to +mount it, what permissions to give the mountpoint? +Some computers come in multiple configurations, for instance laptops +with and without their docking station.
How do we want to present +this to the users and what behaviour do the users expect? +.sh 2 "Pathname length limitations" +.lp +In order to simplify memory management in the early stages of boot, +the pathname relative to the mountpoint is presently stored in a +small fixed size buffer inside struct specinfo. +It should be possible to use filenames as long as the system otherwise +permits, so some kind of extension mechanism is called for. +.lp +Since it cannot be guaranteed that memory can be allocated in +all the possible scenarios where make_dev() can be called, it may +be necessary to mandate that the caller allocates the buffer if +the content will not fit inside the default buffer size. +.sh 2 "Initial access parameter selection" +.lp +As it is now, device drivers propose the initial mode, owner and group +for the device nodes, but it would be more flexible if it were possible +to give the kernel a set of rules, much like packet filtering rules, +which allow the user to set the wanted policy for new devices. +Such a mechanism could also be used to filter new devices for mount +points in jails and to determine other behaviour. +.lp +Doing these things from userland results in some awkward race conditions +and software bloat for embedded systems, so a kernel approach may be more +suitable. +.sh 2 "Applications of on-demand device creation" +.lp +The facility for on-demand creation of devices has some very interesting +possibilities. +.lp +One planned use is to enable user-controlled labelling +of disks. +Today disks have names like /dev/da0, /dev/ad4, but since +this numbering is topological any change in the hardware configuration +may rename the disks, causing /etc/fstab and backup procedures +to get out of sync with the hardware. 
+.lp +The current idea is to store on the media of the disk a user-chosen +disk name and allow access through this name, so that for instance +/dev/mydisk0 +would be a symlink to whatever topological name the disk might have +at any given time. +.lp +To simplify this and to avoid a forest of symlinks, it will probably +be decided to move all the sub-divisions of a disk into one subdirectory +per disk so just a single symlink can do the job. +In practice that means that the current /dev/ad0s2f will become +something like /dev/ad0/s2f and so on. +Obviously, in the same way, disks could also be accessed by their +topological address, down to the specific path in a SAN environment. +.lp +Another potential use could be for automated offline data media libraries. +It would be quite trivial to make it possible to access all the media +in the library using /dev/lib/$LABEL which would be a remarkable +simplification compared with most current automated retrieval facilities. +.lp +Another use could be to access devices by parameter rather than by +name. One could imagine sending a printjob to /dev/printer/color/A2 +and behind the scenes a search would be made for a device with the +correct properties and paper-handling facilities. +.sh 1 "Conclusion" +.lp +DEVFS has been successfully implemented in FreeBSD, +including a powerful, simple and flexible solution supporting +pseudo-devices and on-demand device node creation. +.lp +Contrary to the trend, the implementation added functionality +with a net decrease in source lines, +primarily because of the improved API seen from device drivers point of view. +.lp +Even if DEVFS is not desired, other 4.4BSD derived UNIX variants +would probably benefit from adopting the dev_t/specinfo related +cleanup. +.sh 1 "Acknowledgements" +.lp +I first got started on DEVFS in 1989 because the abysmal performance +of the Olivetti M250 computer forced me to implement a network-disk-device +for Minix in order to retain my sanity. 
+That initial work led to a +crude but working DEVFS for Minix, so obviously both Andrew Tanenbaum +and Olivetti deserve credit for inspiration. +.lp +Julian Elischer implemented a DEVFS for FreeBSD around 1994 which never +quite made it to maturity and subsequently was abandoned. +.lp +Bruce Evans deserves special credit not only for his keen eye for detail +and his competent criticism, but also for his enthusiastic resistance to the +very concept of DEVFS. +.lp +Many thanks to the people who took time to help me stamp out ``Danglish'' +through their reviews and comments: Chris Demetriou, Paul Richards, +Brian Somers, Nik Clayton, and Hanne Munkholm. +Any remaining insults to proper use of the English language are my own fault. +.\" (list & why) +.sh 1 "References" +.lp +[44BSDBook] +Mckusick, Bostic, Karels & Quarterman: +``The Design and Implementation of the 4.4BSD Operating System.'' +Addison Wesley, 1996, ISBN 0-201-54979-4. +.lp +[Heidemann91a] +John S. Heidemann: +``Stackable layers: an architecture for filesystem development.'' +Master's thesis, University of California, Los Angeles, July 1991. +Available as UCLA technical report CSD-910056. +.lp +[Kamp2000] +Poul-Henning Kamp and Robert N. M. Watson: +``Confining the Omnipotent root.'' +Proceedings of the SANE 2000 Conference. +Available in FreeBSD distributions in \fC/usr/share/papers\fP. +.lp +[MD.C] +Poul-Henning Kamp et al: +FreeBSD memory disk driver: +\fCsrc/sys/dev/md/md.c\fP +.lp +[Mckusick1988] +Marshall Kirk Mckusick, Mike J. Karels: +``Design of a General Purpose Memory Allocator for the 4.3BSD UNIX-Kernel'' +Proceedings of the San Francisco USENIX Conference, pp. 295-303, June 1988. +.lp +[Mckusick1999] +Dr. Marshall Kirk Mckusick: +Private email communication. +\fI``According to the SCCS logs, the chroot call was added by Bill Joy +on March 18, 1982 approximately 1.5 years before 4.2BSD was released.
+That was well before we had ftp servers of any sort (ftp did not +show up in the source tree until January 1983). My best guess as +to its purpose was to allow Bill to chroot into the /4.2BSD build +directory and build a system using only the files, include files, +etc contained in that tree. That was the only use of chroot that +I remember from the early days.''\fP +.lp +[Mckusick2000] +Dr. Marshall Kirk Mckusick: +Private communication at BSDcon2000 conference. +\fI``I have not used block devices since I wrote FFS and that +was \fPmany\fI years ago.''\fP +.lp +[NewBus] +NewBus is a subsystem which provides most of the glue between +hardware and device drivers. Despite the importance of this +there has never been published any good overview documentation +for it. +The following article by Alexander Langer in ``Dæmonnews'' is +the best reference I can come up with: +\fC\s-2http://www.daemonnews.org/200007/newbus-intro.html\fP\s+2 +.lp +[Pike2000] +Rob Pike: +``Systems Software Research is Irrelevant.'' +\fC\s-2http://www.cs.bell\-labs.com/who/rob/utah2000.pdf\fP\s+2 +.lp +[Pike90a] +Rob Pike, Dave Presotto, Ken Thompson and Howard Trickey: +``Plan 9 from Bell Labs.'' +Proceedings of the Summer 1990 UKUUG Conference. +.lp +[Pike92a] +Rob Pike, Dave Presotto, Ken Thompson, Howard Trickey and Phil Winterbottom: +``The Use of Name Spaces in Plan 9.'' +Proceedings of the 5th ACM SIGOPS Workshop. +.lp +[Raspe1785] +Rudolf Erich Raspe: +``Baron Münchhausen's Narrative of his marvellous Travels and Campaigns in Russia.'' +Kearsley, 1785. +.lp +[Ritchie74] +D.M. Ritchie and K. Thompson: +``The UNIX Time-Sharing System'' +Communications of the ACM, Vol. 17, No. 7, July 1974. +.lp +[Ritchie98] +Dennis Ritchie: private conversation at USENIX Annual Technical Conference +New Orleans, 1998. +.lp +[Thompson78] +Ken Thompson: +``UNIX Implementation'' +The Bell System Technical Journal, vol 57, 1978, number 6 (part 2) p. 1931ff. 
diff --git a/share/doc/papers/diskperf/Makefile b/share/doc/papers/diskperf/Makefile new file mode 100644 index 0000000..7f7670c --- /dev/null +++ b/share/doc/papers/diskperf/Makefile @@ -0,0 +1,11 @@ +# From: @(#)Makefile 6.3 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= papers +DOC= diskperf +SRCS= abs.ms motivation.ms equip.ms methodology.ms tests.ms \ + results.ms conclusions.ms appendix.ms +MACROS= -ms +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/papers/diskperf/abs.ms b/share/doc/papers/diskperf/abs.ms new file mode 100644 index 0000000..a61104d --- /dev/null +++ b/share/doc/papers/diskperf/abs.ms @@ -0,0 +1,176 @@ +.\" Copyright (c) 1983 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)abs.ms 6.2 (Berkeley) 4/16/91 +.\" +.if n .ND +.TL +Performance Effects of Disk Subsystem Choices +for VAX\(dg Systems Running 4.2BSD UNIX* +.sp +Revised July 27, 1983 +.AU +Bob Kridle +.AI +mt Xinu +2560 9th Street +Suite #312 +Berkeley, California 94710 +.AU +Marshall Kirk McKusick\(dd +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, CA 94720 +.AB +.FS +\(dgVAX, UNIBUS, and MASSBUS are trademarks of Digital Equipment Corporation. +.FE +.FS +* UNIX is a trademark of Bell Laboratories. +.FE +.FS +\(ddThis work was supported under grants from +the National Science Foundation under grant MCS80-05144, +and the Defense Advance Research Projects Agency (DoD) under +Arpa Order No. 4031 monitored by Naval Electronic System Command under +Contract No. N00039-82-C-0235. +.FE +Measurements were made of the UNIX file system +throughput for various I/O operations using the most attractive currently +available Winchester disks and controllers attached to both the +native busses (SBI/CMI) and the UNIBUS on both VAX 11/780s and VAX 11/750s. +The tests were designed to highlight the performance of single +and dual drive subsystems operating in the 4.2BSD +.I +fast file system +.R +environment. 
+Many of the results of the tests were initially counter-intuitive +and revealed several important aspects of the VAX implementations +which were surprising to us. +.PP +The hardware used included two Fujitsu 2351A +``Eagle'' +disk drives on each of two foreign-vendor disk controllers +and two DEC RA-81 disk drives on a DEC UDA-50 disk controller. +The foreign-vendor controllers were Emulex SC750, SC780 +and Systems Industries 9900 native bus interfaced controllers. +The DEC UDA-50 controller is a UNIBUS interfaced, heavily buffered +controller which is the first implementation of a new DEC storage +system architecture, DSA. +.PP +One of the most important results of our testing was the correction +of several timing parameters in our device handler for devices +with an RH750/RH780 type interface and having high burst transfer +rates. +The correction of these parameters resulted in an increase in +performance of over twenty percent in some cases. +In addition, one of the controller manufacturers altered their bus +arbitration scheme to produce another increase in throughput. +.AE +.LP +.de PT +.lt \\n(LLu +.pc % +.nr PN \\n% +.tl '\\*(LH'\\*(CH'\\*(RH' +.lt \\n(.lu +.. +.af PN i +.ds LH Performance +.ds RH Contents +.bp 1 +.\".if t .ds CF July 27, 1983 +.\".if t .ds LF CSRG TR/8 +.\".if t .ds RF Kridle, et. al. +.ce +.B "TABLE OF CONTENTS" +.LP +.sp 1 +.nf +.B "1. Motivation" +.LP +.sp .5v +.nf +.B "2. Equipment +2.1. DEC UDA50 disk controller +2.2. Emulex SC750/SC780 disk controllers +2.3. Systems Industries 9900 disk controller +2.4. DEC RA81 disk drives +2.5. Fujitsu 2351A disk drives +.LP +.sp .5v +.nf +.B "3. Methodology +.LP +.sp .5v +.nf +.B "4. Tests +.LP +.sp .5v +.nf +.B "5. Results +.LP +.sp .5v +.nf +.B "6. Conclusions +.LP +.sp .5v +.nf +.B Acknowledgements +.LP +.sp .5v +.nf +.B References +.LP +.sp .5v +.nf +.B "Appendix A +A.1. read_8192 +A.2. write_4096 +A.3. write_8192 +A.4. 
rewrite_8192 +.ds RH Motivation +.af PN 1 +.bp 1 +.de _d +.if t .ta .6i 2.1i 2.6i +.\" 2.94 went to 2.6, 3.64 to 3.30 +.if n .ta .84i 2.6i 3.30i +.. +.de _f +.if t .ta .5i 1.25i 2.5i +.\" 3.5i went to 3.8i +.if n .ta .7i 1.75i 3.8i +.. diff --git a/share/doc/papers/diskperf/appendix.ms b/share/doc/papers/diskperf/appendix.ms new file mode 100644 index 0000000..e059249 --- /dev/null +++ b/share/doc/papers/diskperf/appendix.ms @@ -0,0 +1,102 @@ +.\" Copyright (c) 1983 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)appendix.ms 6.2 (Berkeley) 4/16/91 +.\" +.\" .nr H2 1 +.ds RH Appendix A +.NH +\s+2Appendix A\s0 +.NH 2 +read_8192 +.PP +.DS +#define BUFSIZ 8192 +main( argc, argv) +char **argv; +{ + char buf[BUFSIZ]; + int i, j; + + j = open(argv[1], 0); + for (i = 0; i < 1024; i++) + read(j, buf, BUFSIZ); +} +.DE +.NH 2 +write_4096 +.PP +.DS +#define BUFSIZ 4096 +main( argc, argv) +char **argv; +{ + char buf[BUFSIZ]; + int i, j; + + j = creat(argv[1], 0666); + for (i = 0; i < 2048; i++) + write(j, buf, BUFSIZ); +} +.DE +.NH 2 +write_8192 +.PP +.DS +#define BUFSIZ 8192 +main( argc, argv) +char **argv; +{ + char buf[BUFSIZ]; + int i, j; + + j = creat(argv[1], 0666); + for (i = 0; i < 1024; i++) + write(j, buf, BUFSIZ); +} +.DE +.bp +.NH 2 +rewrite_8192 +.PP +.DS +#define BUFSIZ 8192 +main( argc, argv) +char **argv; +{ + char buf[BUFSIZ]; + int i, j; + + j = open(argv[1], 2); + for (i = 0; i < 1024; i++) + write(j, buf, BUFSIZ); +} +.DE diff --git a/share/doc/papers/diskperf/conclusions.ms b/share/doc/papers/diskperf/conclusions.ms new file mode 100644 index 0000000..9e20f1a --- /dev/null +++ b/share/doc/papers/diskperf/conclusions.ms @@ -0,0 +1,128 @@ +.\" Copyright (c) 1983 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. 
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)conclusions.ms 6.2 (Berkeley) 4/16/91 +.\" $FreeBSD$ +.\" +.ds RH Conclusions +.NH +Conclusions +.PP +Peak available throughput is only one criterion +in most storage system purchasing decisions. +Most of the VAX UNIX systems we are familiar with +are not I/O bandwidth constrained. 
+Nevertheless, an adequate disk bandwidth is necessary for +good performance and especially to preserve snappy +response time. +All of the disk systems we tested provide more than +adequate bandwidth for typical VAX UNIX system application. +Perhaps in some I/O-intensive applications such as +image processing, more consideration should be given +to the peak throughput available. +In most situations, we feel that other factors are more +important in making a storage choice between the systems we +tested. +Cost, reliability, availability, and support are some of these +factors. +The maturity of the technology purchased must also be weighed +against the future value and expandability of newer technologies. +.PP +Two important conclusions about storage systems in general +can be drawn from these tests. +The first is that buffering can be effective in smoothing +the effects of lower bus speeds and bus contention. +Even though the UDA50 is located on the relatively slow +UNIBUS, its performance is similar to controllers located on +the faster processor busses. +However, the SC780 with only one sector of buffering shows that +little buffering is needed if the underlying bus is fast enough. +.PP +Placing more intelligence in the controller seems to hinder UNIX system +performance more than it helps. +Our profiling tests have indicated that UNIX spends about +the same percentage of time in the SC780 driver and the UDA50 driver +(about 10-14%). +Normally UNIX uses a disk sort algorithm that separates reads and +writes into two seek order queues. +The read queue has priority over the write queue, +since reads cause processes to block, +while writes can be done asynchronously. +This is particularly useful when generating large files, +as it allows the disk allocator to read +new disk maps and begin doing new allocations +while the blocks allocated out of the previous map are written to disk. 
+Because the UDA50 handles all block ordering, +and because it keeps all requests in a single queue, +there is no way to force the longer seek needed to get the next disk map. +This dysfunction causes all the writes to be done before the disk map read, +which idles the disk until a new set of blocks can be allocated. +.PP +The additional functionality of the UDA50 controller that allows it +to transfer simultaneously from two drives tends to make +the two drive transfer tests run much more effectively. +Tuning for the single drive case works more effectively in the two +drive case than when controllers that cannot handle simultaneous +transfers are used. +.ds RH Acknowledgements +.nr H2 1 +.sp 1 +.NH +\s+2Acknowledgements\s0 +.PP +We thank Paul Massigilia and Bill Grace +of Digital Equipment Corp for helping us run our +disk tests on their UDA50/RA81. +We also thank Rich Notari and Paul Ritkowski +of Emulex for making their machines available +to us to run our tests of the SC780/Eagles. +Dan McKinster, then of Systems Industries, +arranged to make their equipment available for the tests. +We appreciate the time provided by Bob Gross, Joe Wolf, and +Sam Leffler on their machines to refine our benchmarks. +Finally we thank our sponsors, +the National Science Foundation under grant MCS80-05144, +and the Defense Advanced Research Projects Agency (DoD) under +Arpa Order No. 4031 monitored by Naval Electronic System Command under +Contract No. N00039-82-C-0235. +.ds RH References +.nr H2 1 +.sp 1 +.NH +\s+2References\s0 +.LP +.IP [McKusick83] 20 +M. McKusick, W. Joy, S. Leffler, R. Fabry, +``A Fast File System for UNIX'', +\fIACM Transactions on Computer Systems 2\fP, 3. +pp 181-197, August 1984.
+.ds RH Appendix A +.bp diff --git a/share/doc/papers/diskperf/equip.ms b/share/doc/papers/diskperf/equip.ms new file mode 100644 index 0000000..264ea04 --- /dev/null +++ b/share/doc/papers/diskperf/equip.ms @@ -0,0 +1,177 @@ +.\" Copyright (c) 1983 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)equip.ms 6.2 (Berkeley) 4/16/91 +.\" +.ds RH Equipment +.NH +Equipment +.PP +Various combinations of the three manufacturers' disk controllers +and two pairs of Winchester disk drives were tested on both +VAX 11/780 and VAX 11/750 CPUs. The Emulex and Systems Industries +disk controllers were interfaced to Fujitsu 2351A +``Eagle'' +404 Megabyte disk drives. +The DEC UDA50 disk controller was interfaced to two DEC RA81 +456 Megabyte Winchester disk drives. +All three controllers were tested on the VAX 11/780 although +only the Emulex and DEC controllers were benchmarked on the VAX 11/750. +Systems Industries makes a VAX 11/750 CMI interface for +their controller, but we did not have time to test this device. +In addition, not all the storage systems were tested for +two drive throughput. +Each of the controllers and disk drives used in the benchmarks +is described briefly below. +.NH 2 +DEC UDA50 disk controller +.PP +This is a new controller design which is part of a larger, long range +storage architecture referred to as +``DSA'' +or \fBD\fRigital \fBS\fRtorage \fBA\fRrchitecture. +An important aspect of DSA is migrating a large part +of the storage management previously handled in the operating +system to the storage system. Thus, the UDA50 is a much more +intelligent controller than previous interfaces like the RH750 or +RH780. +The UDA50 handles all error correction.
+It also deals with most of the physical storage parameters. +Typically, system software requests a logical block or +sequence of blocks. +The physical locations of these blocks, +their head, track, and cylinder indices, +are determined by the controller. +The UDA50 also orders disk requests to maximize throughput +where possible, minimizing total seek and rotational delays. +Where multiple drives are attached to a single controller, +the UDA50 can interleave +simultaneous +data transfers from multiple drives. +.PP +The UDA50 is a UNIBUS implementation of a DSA controller. +It contains 52 sectors of internal buffering to minimize +the effects of a slow UNIBUS such as the one on the VAX-11/780. +This buffering also minimizes the effects of contention with +other UNIBUS peripherals. +.NH 2 +Emulex SC750/SC780 disk controllers +.PP +These two models of the same controller interface to the CMI bus +of a VAX 11/750 and the SBI bus of a VAX 11/780, respectively. +To the operating system, they emulate either an RH750 or +an RH780. +The controllers install in the +MASSBUS +locations in the CPU cabinets and operate from the +VAX power supplies. +They provide an +``SMD'' +or \fBS\fRtorage \fBM\fRodule \fBD\fRrive +interface to the disk drives. +Although a large number of disk drives use this interface, we tested +the controller exclusively connected to Fujitsu 2351A disks. +.PP +The controller was first implemented for the VAX-11/750 as the SC750 +model several years ago. Although the SC780 was introduced more +recently, both are stable products with no bugs known to us. +.NH 2 +Systems Industries 9900 disk controller +.PP +This controller is an evolution of the S.I. 9400 first introduced +as a UNIBUS SMD interface. +The 9900 has been enhanced to include an interface to the VAX 11/780 native +bus, the SBI. +It has also been upgraded to operate with higher data rate drives such +as the Fujitsu 2351As we used in this test.
+The controller is contained in its own rack-mounted drawer with an integral +power supply. +The interface to the SMD is a four module set which mounts in a +CPU cabinet slot normally occupied by an RH780. +The SBI interface derives power from the VAX CPU cabinet power +supplies. +.NH 2 +DEC RA81 disk drives +.PP +The RA81 is a rack-mountable 456 Megabyte (formatted) Winchester +disk drive manufactured by DEC. +It includes a great deal of technology which is an integral part +of the DEC \fBDSA\fR scheme. +The novel technology includes a serial packet based communications +protocol with the controller over a pair of mini-coaxial cables. +The physical characteristics of the RA81 are shown in the +table below: +.DS +.TS +box,center; +c s +l l. +DEC RA81 Disk Drive Characteristics +_ +Peak Transfer Rate 2.2 Mbytes/sec. +Rotational Speed 3,600 RPM +Data Sectors/Track 51 +Logical Cylinders 1,248 +Logical Data Heads 14 +Data Capacity 456 Mbytes +Minimum Seek Time 6 milliseconds +Average Seek Time 28 milliseconds +Maximum Seek Time 52 milliseconds +.TE +.DE +.NH 2 +Fujitsu 2351A disk drives +.PP +The Fujitsu 2351A disk drive is a Winchester disk drive +with an SMD controller interface. +Fujitsu has developed a very good reputation for +reliable storage products over the last several years. +The 2351A has the following physical characteristics: +.DS +.TS +box,center; +c s +l l. +Fujitsu 2351A Disk Drive Characteristics +_ +Peak Transfer Rate 1.859 Mbytes/sec. 
+Rotational Speed 3,961 RPM +Data Sectors/Track 48 +Cylinders 842 +Data Heads 20 +Data Capacity 404 Mbytes +Minimum Seek Time 5 milliseconds +Average Seek Time 18 milliseconds +Maximum Seek Time 35 milliseconds +.TE +.DE +.ds RH Methodology +.bp diff --git a/share/doc/papers/diskperf/methodology.ms b/share/doc/papers/diskperf/methodology.ms new file mode 100644 index 0000000..703d7b6 --- /dev/null +++ b/share/doc/papers/diskperf/methodology.ms @@ -0,0 +1,111 @@ +.\" Copyright (c) 1983 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)methodology.ms 6.2 (Berkeley) 4/16/91 +.\" +.ds RH Methodology +.NH +Methodology +.PP +Our goal was to evaluate the performance of the target peripherals +in an environment as much like our 4.2BSD UNIX systems as possible. +There are two basic approaches to creating this kind of test environment. +These might be termed the \fIindirect\fR and the \fIdirect\fR approach. +The approach used by DEC in producing most of the performance data +on the UDA50/RA81 system under VMS is what we term the indirect +approach. +We chose to use the direct approach. +.PP +The indirect approach used by DEC involves two steps. +First, the environment in which performance is to be evaluated +is parameterized. +In this case, the disk I/O characteristics of VMS were measured +as to the distribution of various sizes of accesses and the proportion +of reads and writes. +This parameterization of +typical +I/O activity was termed a +``vax mix.'' +The second stage involves simulating this mixture of I/O activities +with the devices to be tested and noting the total volume of transactions +processed per unit time by each system. +.PP +The problems encountered with this indirect approach often +have to do with the completeness and correctness of the parameterization +of the context environment. +For example, the +``vax mix'' +model constructed for DEC's tests uses a random distribution of seeks +to the blocks read or written.
+It is not likely that any real system produces a distribution +of disk transfer locations which is truly random and does not +exhibit strong locality characteristics. +.PP +The methodology chosen by us is direct +in the sense that it uses the standard structured file system mechanism present +in the 4.2BSD UNIX operating system to create the sequence of locations +and sizes of reads and writes to the benchmarked equipment. +We simply create, write, and read +files as they would be by users' activities. +The disk space allocation and disk caching mechanism built into +UNIX is used to produce the actual device reads and writes as well +as to determine their size and location on the disk. +We measure and compare the rate at which these +.I +user files +.R +can be written, rewritten, or read. +.PP +The advantage of this approach is the implicit accuracy in +testing in the same environment in which the peripheral +will be used. +Although this system does not account for the I/O produced +by some paging and swapping, in our memory-rich environment +these activities account for a relatively small portion +of the total disk activity. +.PP +A more significant disadvantage to the direct approach +is the occasional difficulty we have in accounting for our +measured results. +The apparently straightforward activity of reading or writing a logical file +on disk can produce a complex mixture of disk traffic. +File I/O is supported by a file management system that +buffers disk traffic through an internal cache, +which allows writes to be handled asynchronously. +Reads must be done synchronously; +however, this restriction is moderated by the use of read-ahead. +Small changes in the performance of the disk controller +subsystem can result in large and unexpected +changes in the file system performance, +as it may change the characteristics of the memory contention +experienced by the processor.
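Editor's illustration (not part of the committed file): the direct approach described above amounts to reading or writing an ordinary file through the file system and dividing the bytes moved by the elapsed time. The following is a minimal sketch of that measurement in present-day POSIX C, not the authors' tooling. The function names kbytes_per_sec and timed_sequential_read are hypothetical, and the sketch assumes a POSIX system providing clock_gettime with CLOCK_MONOTONIC. It reports only the transfer rate, not the CPU utilization figures that the paper also tabulates.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Convert a byte count and an elapsed time into Kbytes/sec (1 Kbyte = 1024 bytes). */
double kbytes_per_sec(long long bytes, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;             /* avoid dividing by zero on an empty or instant run */
    return ((double)bytes / 1024.0) / seconds;
}

/*
 * Hypothetical helper: read `path` sequentially through the file system in
 * `bufsize`-byte requests and return the observed rate in Kbytes/sec,
 * or -1.0 on error. The kernel's allocation, caching and read-ahead decide
 * what device traffic actually results, which is the point of the direct approach.
 */
double timed_sequential_read(const char *path, size_t bufsize)
{
    static char buf[65536];     /* upper bound on the request size in this sketch */
    struct timespec start, end;
    long long total = 0;
    ssize_t n;
    int fd;

    if (bufsize == 0 || bufsize > sizeof(buf))
        return -1.0;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while ((n = read(fd, buf, bufsize)) > 0)
        total += n;
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    if (n < 0)
        return -1.0;            /* a read failed part-way through */
    return kbytes_per_sec(total,
        (double)(end.tv_sec - start.tv_sec) +
        (double)(end.tv_nsec - start.tv_nsec) / 1e9);
}
```

A caller might invoke timed_sequential_read("/mnt0/foo/tst2", 8192) on a file considerably larger than the buffer cache; on a small or recently written file the result mostly reflects the cache rather than the disk.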
+.ds RH Tests +.bp diff --git a/share/doc/papers/diskperf/motivation.ms b/share/doc/papers/diskperf/motivation.ms new file mode 100644 index 0000000..d5fde9d --- /dev/null +++ b/share/doc/papers/diskperf/motivation.ms @@ -0,0 +1,95 @@ +.\" Copyright (c) 1983 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)motivation.ms 6.2 (Berkeley) 4/16/91 +.\" +.\" $FreeBSD$ +.\" +.ds RH Motivation +.NH +Motivation +.PP +These benchmarks were performed for several reasons. +Foremost was our desire to obtain guidelines to aid +in choosing one of the most expensive components of any +VAX UNIX configuration, the disk storage system. +The range of choices in this area has increased dramatically +in the last year. +DEC has become, with the introduction of the UDA50/RA81 system, +cost competitive +in the area of disk storage for the first time. +Emulex's entry into the VAX 11/780 SBI controller +field, the SC780, represented an important choice for us to examine, given +our previous success with their VAX 11/750 SC750 controller and +their UNIBUS controllers. +The Fujitsu 2351A +Winchester disk drive represents the lowest cost-per-byte disk storage +known to us. +In addition, Fujitsu's reputation for reliability was appealing. +The many attractive aspects of these components justified a more +careful examination of their performance aspects under UNIX. +.PP +In addition to the direct motivation of developing an effective +choice of storage systems, we hoped to gain more insight into +VAX UNIX file system and I/O performance in general. +What generic characteristics of I/O subsystems are most +important? +How important is the location of the controller on the SBI/CMI versus +the UNIBUS?
+Is extensive buffering in the controller essential or even important? +How much can be gained by putting more of the storage system +management and optimization function in the controller as +DEC does with the UDA50? +.PP +We also wanted to resolve particular speculation about the value of +storage system optimization by a controller in a UNIX +environment. +Is the access optimization as effective as that already provided +by the existing 4.2BSD UNIX device handlers for traditional disks? +VMS disk handlers do no seek optimization. +This gives the UDA50 controller an advantage over other controllers +under VMS which is not likely to be as important to UNIX. +Are there penalties associated with greater intelligence in the controller? +.PP +A third and last reason for evaluating this equipment is comparable +to the proverbial mountain climber's answer when asked why he climbs +a particular mountain, +``It was there.'' +In our case the equipment +was there. +We were lucky enough to assemble all the desired disks and controllers +and get them installed on a temporarily idle VAX 11/780. +This got us started collecting data. +Although many of the tests were later rerun on a variety of other systems, +this initial test bed was essential for working out the testing bugs +and getting our feet wet. +.ds RH Equipment +.bp diff --git a/share/doc/papers/diskperf/results.ms b/share/doc/papers/diskperf/results.ms new file mode 100644 index 0000000..09f61a8 --- /dev/null +++ b/share/doc/papers/diskperf/results.ms @@ -0,0 +1,337 @@ +.\" Copyright (c) 1983 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2.
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)results.ms 6.2 (Berkeley) 4/16/91 +.\" +.ds RH Results +.NH +Results +.PP +The following tables indicate the results of our +test runs. +Note that each table contains results for tests run +on two varieties of 4.2BSD file systems. +The first set of results is always for a file system +with a basic blocking factor of eight Kilobytes and a +fragment size of 1 Kilobyte. The second sets of measurements +are for file systems with a four Kilobyte block size and a +one Kilobyte fragment size. 
+The values in parenthesis indicate the percentage of CPU +time used by the test program. +In the case of the two disk arm tests, +the value in parenthesis indicates the sum of the percentage +of the test programs that were run. +Entries of ``n. m.'' indicate this value was not measured. +.DS +.TS +box,center; +c s s s s +c s s s s +c s s s s +l | l s | l s +l | l s | l s +l | l l | l l +l | c c | c c. +4.2BSD File Systems Tests - \fBVAX 11/750\fR += +Logically Sequential Transfers +from an \fB8K/1K\fR 4.2BSD File System (Kbytes/sec.) +_ +Test Emulex SC750/Eagle UDA50/RA81 + + 1 Drive 2 Drives 1 Drive 2 Drives +_ +read_8192 490 (69%) 620 (96%) 310 (44%) 520 (65%) +write_4096 380 (99%) 370 (99%) 370 (97%) 360 (98%) +write_8192 470 (99%) 470 (99%) 320 (71%) 410 (83%) +rewrite_8192 650 (99%) 620 (99%) 310 (50%) 450 (70%) += +.T& +c s s s s +c s s s s +l | l s | l s +l | l s | l s +l | l l | l l +l | c c | c c. +Logically Sequential Transfers +from \fB4K/1K\fR 4.2BSD File System (Kbytes/sec.) +_ +Test Emulex SC750/Eagle UDA50/RA81 + + 1 Drive 2 Drives 1 Drive 2 Drives +_ +read_8192 300 (60%) 400 (84%) 210 (42%) 340 (77%) +write_4096 320 (98%) 320 (98%) 220 (67%) 290 (99%) +write_8192 340 (98%) 340 (99%) 220 (65%) 310 (98%) +rewrite_8192 450 (99%) 450 (98%) 230 (47%) 340 (78%) +.TE +.DE +.PP +Note that the rate of write operations on the VAX 11/750 are ultimately +CPU limited in some cases. +The write rates saturate the CPU at a lower bandwidth than the reads +because they must do disk allocation in addition to moving the data +from the user program to the disk. +The UDA50/RA81 saturates the CPU at a lower transfer rate for a given +operation than the SC750/Eagle because +it causes more memory contention with the CPU. +We do not know if this contention is caused by +the UNIBUS controller or the UDA50. +.PP +The following table reports the results of test runs on a VAX 11/780 +with 4 Megabytes of main memory. 
+.DS +.TS +box,center; +c s s s s s s +c s s s s s s +c s s s s s s +l | l s | l s | l s +l | l s | l s | l s +l | l l | l l | l l +l | c c | c c | c c. +4.2BSD File Systems Tests - \fBVAX 11/780\fR += +Logically Sequential Transfers +from an \fB8K/1K\fR 4.2BSD File System (Kbytes/sec.) +_ +Test Emulex SC780/Eagle UDA50/RA81 Sys. Ind. 9900/Eagle + + 1 Drive 2 Drives 1 Drive 2 Drives 1 Drive 2 Drives +_ +read_8192 560 (70%) 480 (58%) 360 (45%) 540 (72%) 340 (41%) 520 (66%) +write_4096 440 (98%) 440 (98%) 380 (99%) 480 (96%) 490 (96%) 440 (84%) +write_8192 490 (98%) 490 (98%) 220 (58%)* 480 (92%) 490 (80%) 430 (72%) +rewrite_8192 760 (100%) 560 (72%) 220 (50%)* 180 (52%)* 490 (60%) 520 (62%) += +.T& +c s s s s s s +c s s s s s s +l | l s | l s | l s +l | l s | l s | l s +l | l l | l l | l l +l | c c | c c | c c. +Logically Sequential Transfers +from an \fB4K/1K\fR 4.2BSD File System (Kbytes/sec.) +_ +Test Emulex SC780/Eagle UDA50/RA81 Sys. Ind. 9900/Eagle + + 1 Drive 2 Drives 1 Drive 2 Drives 1 Drive 2 Drives +_ +read_8192 490 (77%) 370 (66%) n.m. n.m. 200 (31%) 370 (56%) +write_4096 380 (98%) 370 (98%) n.m. n.m. 200 (46%) 370 (88%) +write_8192 380 (99%) 370 (97%) n.m. n.m. 200 (45%) 320 (76%) +rewrite_8192 490 (87%) 350 (66%) n.m. n.m. 200 (31%) 300 (46%) +.TE +* the operation of the hardware was suspect during these tests. +.DE +.PP +The dropoff in reading and writing rates for the two drive SC780/Eagle +tests are probably due to the file system using insufficient +rotational delay for these tests. +We have not fully investigated these times. +.PP +The following table compares data rates on VAX 11/750s directly +with those of VAX 11/780s using the UDA50/RA81 storage system. +.DS +.TS +box,center; +c s s s s +c s s s s +c s s s s +l | l s | l s +l | l s | l s +l | l l | l l +l | c c | c c. +4.2BSD File Systems Tests - \fBDEC UDA50 - 750 vs. 780\fR += +Logically Sequential Transfers +from an \fB8K/1K\fR 4.2BSD File System (Kbytes/sec.) 
+_ +Test VAX 11/750 UNIBUS VAX 11/780 UNIBUS + + 1 Drive 2 Drives 1 Drive 2 Drives +_ +read_8192 310 (44%) 520 (84%) 360 (45%) 540 (72%) +write_4096 370 (97%) 360 (100%) 380 (99%) 480 (96%) +write_8192 320 (71%) 410 (96%) 220 (58%)* 480 (92%) +rewrite_8192 310 (50%) 450 (80%) 220 (50%)* 180 (52%)* += +.T& +c s s s s +c s s s s +l | l s | l s +l | l s | l s +l | l l | l l +l | c c | c c. +Logically Sequential Transfers +from an \fB4K/1K\fR 4.2BSD File System (Kbytes/sec.) +_ +Test VAX 11/750 UNIBUS VAX 11/780 UNIBUS + + 1 Drive 2 Drives 1 Drive 2 Drives +_ +read_8192 210 (42%) 342 (77%) n.m. n.m. +write_4096 215 (67%) 294 (99%) n.m. n.m. +write_8192 215 (65%) 305 (98%) n.m. n.m. +rewrite_8192 227 (47%) 336 (78%) n.m. n.m. +.TE +* the operation of the hardware was suspect during these tests. +.DE +.PP +The higher throughput available on VAX 11/780s is due to a number +of factors. +The larger main memory size allows a larger file system cache. +The block allocation routines run faster, raising the upper limit +on the data rates in writing new files. +.PP +The next table makes the same comparison using an Emulex controller +on both systems. +.DS +.TS +box, center; +c s s s s +c s s s s +c s s s s +l | l s | l s +l | l s | l s +l | l l | l l +l | c c | c c. +4.2BSD File Systems Tests - \fBEmulex - 750 vs. 780\fR += +Logically Sequential Transfers +from an \fB8K/1K\fR 4.2BSD File System (Kbytes/sec.) +_ +Test VAX 11/750 CMI Bus VAX 11/780 SBI Bus + + 1 Drive 2 Drives 1 Drive 2 Drives +_ +read_8192 490 (69%) 620 (96%) 560 (70%) 480 (58%) +write_4096 380 (99%) 370 (99%) 440 (98%) 440 (98%) +write_8192 470 (99%) 470 (99%) 490 (98%) 490 (98%) +rewrite_8192 650 (99%) 620 (99%) 760 (100%) 560 (72%) += +.T& +c s s s s +c s s s s +l | l s | l s +l | l s | l s +l | l l | l l +l | c c | c c. +Logically Sequential Transfers +from an \fB4K/1K\fR 4.2BSD File System (Kbytes/sec.) 
+_ +Test VAX 11/750 CMI Bus VAX 11/780 SBI Bus + + 1 Drive 2 Drives 1 Drive 2 Drives +_ +read_8192 300 (60%) 400 (84%) 490 (77%) 370 (66%) +write_4096 320 (98%) 320 (98%) 380 (98%) 370 (98%) +write_8192 340 (98%) 340 (99%) 380 (99%) 370 (97%) +rewrite_8192 450 (99%) 450 (98%) 490 (87%) 350 (66%) +.TE +.DE +.PP +The following table illustrates the evolution of our testing +process as both hardware and software problems affecting +the performance of the Emulex SC780 were corrected. +The software change was suggested to us by George Goble +of Purdue University. +.PP +The 4.2BSD handler for RH750/RH780 interfaced disk drives +contains several constants which determine how +much time is provided between an interrupt signaling the completion +of a positioning command and the subsequent start of a data transfer +operation. These lead times are expressed as sectors of rotational delay. +If they are too small, an extra complete rotation will often be required +between a seek and subsequent read or write operation. +The higher bit rate and rotational speed of the 2351A Fujitsu +disk drives required +increasing these constants. +.PP +The hardware change involved allowing for slightly longer +delays in arbitrating for cycles on the SBI bus by +starting the bus arbitration cycle a little further ahead of +when the data was ready for transfer. +Finally we had to increase the rotational delay between consecutive +blocks in the file because +the higher bandwidth from the disk generated more memory contention, +which slowed down the processor. +.DS +.TS +box,center,expand; +c s s s s s s +c s s s s s s +c s s s s s s +l | l s | l s | l s +l | l s | l s | l s +l | l s | l s | l s +l | c c | c c | c c +l | c c | c c | c c. +4.2BSD File Systems Tests - \fBEmulex SC780 Disk Controller Evolution\fR += +Logically Sequential Transfers +from an \fB8K/1K\fR 4.2BSD File System (Kbytes/sec.)
+_ +Test Inadequate Search Lead OK Search Lead OK Search Lead + Initial SBI Arbitration Init SBI Arb. Improved SBI Arb. + + 1 Drive 2 Drives 1 Drive 2 Drives 1 Drive 2 Drives +_ +read_8192 320 370 440 (60%) n.m. 560 (70%) 480 (58%) +write_4096 250 270 300 (63%) n.m. 440 (98%) 440 (98%) +write_8192 250 280 340 (60%) n.m. 490 (98%) 490 (98%) +rewrite_8192 250 290 380 (48%) n.m. 760 (100%) 560 (72%) += +.T& +c s s s s s s +c s s s s s s +l | l s | l s | l s +l | l s | l s | l s +l | l s | l s | l s +l | c c | c c | c c +l | c c | c c | c c. +Logically Sequential Transfers +from an \fB4K/1K\fR 4.2BSD File System (Kbytes/sec.) +_ +Test Inadequate Search Lead OK Search Lead OK Search Lead + Initial SBI Arbitration Init SBI Arb. Improved SBI Arb. + + 1 Drive 2 Drives 1 Drive 2 Drives 1 Drive 2 Drives +_ +read_8192 200 220 280 n.m. 490 (77%) 370 (66%) +write_4096 180 190 300 n.m. 380 (98%) 370 (98%) +write_8192 180 200 320 n.m. 380 (99%) 370 (97%) +rewrite_8192 190 200 340 n.m. 490 (87%) 350 (66%) +.TE +.DE +.ds RH Conclusions +.bp diff --git a/share/doc/papers/diskperf/tests.ms b/share/doc/papers/diskperf/tests.ms new file mode 100644 index 0000000..e937931 --- /dev/null +++ b/share/doc/papers/diskperf/tests.ms @@ -0,0 +1,109 @@ +.\" Copyright (c) 1983 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)tests.ms 6.2 (Berkeley) 4/16/91 +.\" $FreeBSD$ +.\" +.ds RH Tests +.NH +Tests +.PP +Our battery of tests consists of four programs, +read_8192, write_8192, write_4096 +and rewrite_8192 originally written by [McKusick83] +to evaluate the performance of the new file system in 4.2BSD. +These programs all follow the same model and are typified by +read_8192 shown here. +.DS +#define BUFSIZ 8192 +main( argc, argv) +char **argv; +{ + char buf[BUFSIZ]; + int i, j; + + j = open(argv[1], 0); + for (i = 0; i < 1024; i++) + read(j, buf, BUFSIZ); +} +.DE +The remaining programs are included in appendix A. +.PP +These programs read, write with two different blocking factors, +and rewrite logical files in structured file system on the disk +under test. 
+The write programs create new files while the rewrite program +overwrites an existing file. +Each of these programs represents an important segment of the +typical UNIX file system activity with the read program +representing by far the largest class and the rewrite the smallest. +.PP +A blocking factor of 8192 is used by all programs except write_4096. +This is typical of most 4.2BSD user programs since a standard set of +I/O support routines is commonly used and these routines buffer +data in similar block sizes. +.PP +For each test run, an empty eight Kilobyte block +file system was created in the target +storage system. +Then each of the four tests was run and timed. +Each test was run three times; +the first to clear out any useful data in the cache, +and the second two to ensure that the experiment +had stabilized and was repeatable. +Each test operated on eight Megabytes of data to +ensure that the cache did not overly influence the results. +Another file system was then initialized using a +basic blocking factor of four Kilobytes and the same tests +were run again and timed.
+A command script for a run appears as follows: +.DS +#!/bin/csh +set time=2 +echo "8K/1K file system" +newfs /dev/rhp0g eagle +mount /dev/hp0g /mnt0 +mkdir /mnt0/foo +echo "write_8192 /mnt0/foo/tst2" +rm -f /mnt0/foo/tst2 +write_8192 /mnt0/foo/tst2 +rm -f /mnt0/foo/tst2 +write_8192 /mnt0/foo/tst2 +rm -f /mnt0/foo/tst2 +write_8192 /mnt0/foo/tst2 +echo "read_8192 /mnt0/foo/tst2" +read_8192 /mnt0/foo/tst2 +read_8192 /mnt0/foo/tst2 +read_8192 /mnt0/foo/tst2 +umount /dev/hp0g +.DE +.ds RH Results +.bp diff --git a/share/doc/papers/fsinterface/Makefile b/share/doc/papers/fsinterface/Makefile new file mode 100644 index 0000000..f11021b --- /dev/null +++ b/share/doc/papers/fsinterface/Makefile @@ -0,0 +1,9 @@ +# From: @(#)Makefile 5.3 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= papers +DOC= fsinterface +SRCS= fsinterface.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/papers/fsinterface/abstract.ms b/share/doc/papers/fsinterface/abstract.ms new file mode 100644 index 0000000..ab8b473 --- /dev/null +++ b/share/doc/papers/fsinterface/abstract.ms @@ -0,0 +1,73 @@ +.\" Copyright (c) 1986 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. 
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)abstract.ms 5.2 (Berkeley) 4/16/91 +.\" +.TL +Toward a Compatible Filesystem Interface +.AU +Michael J. Karels +Marshall Kirk McKusick +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, California 94720 +.LP +As network or remote filesystems have been implemented for +.UX , +several stylized interfaces between the filesystem implementation +and the rest of the kernel have been developed. +Notable among these are Sun Microsystems' virtual filesystem interface +using vnodes, Digital Equipment's Generic File System architecture, +and AT&T's File System Switch. +Each design attempts to isolate filesystem-dependent details +below the generic interface and to provide a framework within which +new filesystems may be incorporated. +However, each of these interfaces is different from +and incompatible with the others. 
+Each of them addresses somewhat different design goals. +Each was based upon a different starting version of +.UX , +targetted a different set of filesystems with varying characteristics, +and uses a different set of primitive operations provided by the filesystem. +The current study compares the various filesystem interfaces. +Criteria for comparison include generality, completeness, robustness, +efficiency and esthetics. +As a result of this comparison, a proposal for a new filesystem interface +is advanced that includes the best features of the existing implementations. +The proposal adopts the calling convention for name lookup introduced +in 4.3BSD. +A prototype implementation is described. +This proposal and the rationale underlying its development +have been presented to major software vendors +as an early step toward convergence upon a compatible filesystem interface. diff --git a/share/doc/papers/fsinterface/fsinterface.ms b/share/doc/papers/fsinterface/fsinterface.ms new file mode 100644 index 0000000..453cc7e --- /dev/null +++ b/share/doc/papers/fsinterface/fsinterface.ms @@ -0,0 +1,1176 @@ +.\" Copyright (c) 1986 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. 
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)fsinterface.ms 1.4 (Berkeley) 4/16/91 +.\" $FreeBSD$ +.\" +.nr UX 0 +.de UX +.ie \\n(UX \s-1UNIX\s0\\$1 +.el \{\ +\s-1UNIX\s0\\$1\(dg +.FS +\(dg \s-1UNIX\s0 is a registered trademark of AT&T. +.FE +.nr UX 1 +.\} +.. +.TL +Toward a Compatible Filesystem Interface +.AU +Michael J. Karels +Marshall Kirk McKusick +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, California 94720 +.AB +.LP +As network or remote filesystems have been implemented for +.UX , +several stylized interfaces between the filesystem implementation +and the rest of the kernel have been developed. +.FS +This is an update of a paper originally presented +at the September 1986 conference of the European +.UX +Users' Group. +Last modified April 16, 1991. 
+.FE +Notable among these are Sun Microsystems' Virtual Filesystem interface (VFS) +using vnodes, Digital Equipment's Generic File System (GFS) architecture, +and AT&T's File System Switch (FSS). +Each design attempts to isolate filesystem-dependent details +below a generic interface and to provide a framework within which +new filesystems may be incorporated. +However, each of these interfaces is different from +and incompatible with the others. +Each of them addresses somewhat different design goals. +Each was based on a different starting version of +.UX , +targetted a different set of filesystems with varying characteristics, +and uses a different set of primitive operations provided by the filesystem. +The current study compares the various filesystem interfaces. +Criteria for comparison include generality, completeness, robustness, +efficiency and esthetics. +Several of the underlying design issues are examined in detail. +As a result of this comparison, a proposal for a new filesystem interface +is advanced that includes the best features of the existing implementations. +The proposal adopts the calling convention for name lookup introduced +in 4.3BSD, but is otherwise closely related to Sun's VFS. +A prototype implementation is now being developed at Berkeley. +This proposal and the rationale underlying its development +have been presented to major software vendors +as an early step toward convergence on a compatible filesystem interface. +.AE +.NH +Introduction +.PP +As network communications and workstation environments +became common elements in +.UX +systems, several vendors of +.UX +systems have designed and built network file systems +that allow client process on one +.UX +machine to access files on a server machine. +Examples include Sun's Network File System, NFS [Sandberg85], +AT&T's recently-announced Remote File Sharing, RFS [Rifkin86], +the LOCUS distributed filesystem [Walker85], +and Masscomp's extended filesystem [Cole85]. 
+Other remote filesystems have been implemented in research or university groups +for internal use, notably the network filesystem in the Eighth Edition +.UX +system [Weinberger84] and two different filesystems used at Carnegie-Mellon +University [Satyanarayanan85]. +Numerous other remote file access methods have been devised for use +within individual +.UX +processes, +many of them by modifications to the C I/O library +similar to those in the Newcastle Connection [Brownbridge82]. +.PP +Multiple network filesystems may frequently +be found in use within a single organization. +These circumstances make it highly desirable to be able to transport filesystem +implementations from one system to another. +Such portability is considerably enhanced by the use of a stylized interface +with carefully-defined entry points to separate the filesystem from the rest +of the operating system. +This interface should be similar to the interface between device drivers +and the kernel. +Although varying somewhat among the common versions of +.UX , +the device driver interfaces are sufficiently similar that device drivers +may be moved from one system to another without major problems. +A clean, well-defined interface to the filesystem also allows a single +system to support multiple local filesystem types. +.PP +For reasons such as these, several filesystem interfaces have been used +when integrating new filesystems into the system. +The best-known of these are Sun Microsystems' Virtual File System interface, +VFS [Kleiman86], and AT&T's File System Switch, FSS. +Another interface, known as the Generic File System, GFS, +has been implemented for the ULTRIX\(dd +.FS +\(dd ULTRIX is a trademark of Digital Equipment Corp. +.FE +system by Digital [Rodriguez86]. +There are numerous differences among these designs. 
+The differences may be understood from the varying philosophies +and design goals of the groups involved, from the systems under which +the implementations were done, and from the filesystems originally targeted +by the designs. +These differences are summarized in the following sections +within the limitations of the published specifications. +.NH +Design goals +.PP +There are several design goals which, in varying degrees, +have driven the various designs. +Each attempts to divide the filesystem into a filesystem-type-independent +layer and individual filesystem implementations. +The division between these layers occurs at somewhat different places +in these systems, reflecting different views of the diversity and types +of the filesystems that may be accommodated. +Compatibility with existing local filesystems has varying importance; +at the user-process level, each attempts to be completely transparent +except for a few filesystem-related system management programs. +The AT&T interface also makes a major effort to retain familiar internal +system interfaces, and even to retain object-file-level binary compatibility +with operating system modules such as device drivers. +Both Sun and DEC were willing to change internal data structures and interfaces +so that other operating system modules might require recompilation +or source-code modification. +.PP +AT&T's interface both allows and requires filesystems to support the full +and exact semantics of their previous filesystem, +including interruptions of system calls on slow operations. +System calls that deal with remote files are encapsulated +with their environment and sent to a server where execution continues. +The system call may be aborted by either client or server, returning +control to the client. +Most system calls that descend into the filesystem-dependent layer +of a filesystem other than the standard local filesystem do not return +to the higher-level kernel calling routines. 
+Instead, the filesystem-dependent code completes the requested +operation and then executes a non-local goto (\fIlongjmp\fP) to exit the +system call. +These efforts to avoid modification of main-line kernel code +indicate a far greater emphasis on internal compatibility than on modularity, +clean design, or efficiency. +.PP +In contrast, the Sun VFS interface makes major modifications to the internal +interfaces in the kernel, with a very clear separation +of filesystem-independent and -dependent data structures and operations. +The semantics of the filesystem are largely retained for local operations, +although this is achieved at some expense where it does not fit the internal +structuring well. +The filesystem implementations are not required to support the same +semantics as local +.UX +filesystems. +Several historical features of +.UX +filesystem behavior are difficult to achieve using the VFS interface, +including the atomicity of file and link creation and the use of open files +whose names have been removed. +.PP +A major design objective of Sun's network filesystem, +statelessness, +permeates the VFS interface. +No locking may be done in the filesystem-independent layer, +and locking in the filesystem-dependent layer may occur only during +a single call into that layer. +.PP +A final design goal of most implementors is performance. +For remote filesystems, +this goal tends to be in conflict with the goals of complete semantic +consistency, compatibility and modularity. +Sun has chosen performance over modularity in some areas, +but has emphasized clean separation of the layers within the filesystem +at the expense of performance. +Although the performance of RFS is yet to be seen, +AT&T seems to have considered compatibility far more important than modularity +or performance. +.NH +Differences among filesystem interfaces +.PP +The existing filesystem interfaces may be characterized +in several ways. 
+Each system is centered around a few data structures or objects, +along with a set of primitives for performing operations upon these objects. +In the original +.UX +filesystem [Ritchie74], +the basic object used by the filesystem is the inode, or index node. +The inode contains all of the information about a file except its name: +its type, identification, ownership, permissions, timestamps and location. +Inodes are identified by the filesystem device number and the index within +the filesystem. +The major entry points to the filesystem are \fInamei\fP, +which translates a filesystem pathname into the underlying inode, +and \fIiget\fP, which locates an inode by number and installs it in the in-core +inode table. +\fINamei\fP performs name translation by iterative lookup +of each component name in its directory to find its inumber, +then using \fIiget\fP to return the actual inode. +If the last component has been reached, this inode is returned; +otherwise, the inode describes the next directory to be searched. +The inode returned may be used in various ways by the caller; +it may be examined, the file may be read or written, +types and access may be checked, and fields may be modified. +Modified inodes are automatically written back to the filesystem +on disk when the last reference is released with \fIiput\fP. +Although the details are considerably different, +the same general scheme is used in the faster filesystem in 4.2BSD +.UX +[McKusick85]. +.PP +Both the AT&T interface and, to a lesser extent, the DEC interface +attempt to preserve the inode-oriented interface. +Each modifies the inode to allow different varieties of the structure +for different filesystem types by separating the filesystem-dependent +parts of the inode into a separate structure or one arm of a union. 
+Both interfaces allow operations +equivalent to the \fInamei\fP and \fIiget\fP operations +of the old filesystem to be performed in the filesystem-independent +layer, with entry points to the individual filesystem implementations to support +the type-specific parts of these operations. Implicit in this interface +is that files may conveniently be named by and located using a single +index within a filesystem. +The GFS provides specific entry points to the filesystems +to change most file properties rather than allowing arbitrary changes +to be made to the generic part of the inode. +.PP +In contrast, the Sun VFS interface replaces the inode as the primary object +with the vnode. +The vnode contains no filesystem-dependent fields except the pointer +to the set of operations implemented by the filesystem. +Properties of a vnode that might be transient, such as the ownership, +permissions, size and timestamps, are maintained by the lower layer. +These properties may be presented in a generic format upon request; +callers are expected not to hold this information for any length of time, +as they may not be up-to-date later on. +The vnode operations do not include a counterpart for \fIiget\fP; +the only external interface for obtaining vnodes for specific files +is the name lookup operation. +(Separate procedures are provided outside of this interface +that obtain a ``file handle'' for a vnode which may be given +to a client by a server, such that the vnode may be retrieved +upon later presentation of the file handle.) +.NH +Name translation issues +.PP +Each of the systems described includes a mechanism for performing +pathname-to-internal-representation translation. +The style of the name translation function is very different in all +three systems. +As described above, the AT&T and DEC systems retain the \fInamei\fP function. +The two are quite different, however, as the ULTRIX interface uses +the \fInamei\fP calling convention introduced in 4.3BSD. 
+The parameters and context for the name lookup operation +are collected in a \fInameidata\fP structure which is passed to \fInamei\fP +for operation. +Intent to create or delete the named file is declared in advance, +so that the final directory scan in \fInamei\fP may retain information +such as the offset in the directory at which the modification will be made. +Filesystems that use such mechanisms to avoid redundant work +must therefore lock the directory to be modified so that it may not +be modified by another process before completion. +In the System V filesystem, as in previous versions of +.UX , +this information is stored in the per-process \fIuser\fP structure +by \fInamei\fP for use by a low-level routine called after performing +the actual creation or deletion of the file itself. +In 4.3BSD and in the GFS interface, these side effects of \fInamei\fP +are stored in the \fInameidata\fP structure given as argument to \fInamei\fP, +which is also presented to the routine implementing file creation or deletion. +.PP +The ULTRIX \fInamei\fP routine is responsible for the generic +parts of the name translation process, such as copying the name into +an internal buffer, validating it, interpolating +the contents of symbolic links, and indirecting at mount points. +As in 4.3BSD, the name is copied into the buffer in a single call, +according to the location of the name. +After determining the type of the filesystem at the start of translation +(the current directory or root directory), it calls the filesystem's +\fInamei\fP entry with the same structure it received from its caller. +The filesystem-specific routine translates the name, component by component, +as long as no mount points are reached. +It may return after any number of components have been processed. +\fINamei\fP performs any processing at mount points, then calls +the correct translation routine for the next filesystem. 
+Network filesystems may pass the remaining pathname to a server for translation, +or they may look up the pathname components one at a time. +The former strategy would be more efficient, +but the latter scheme allows mount points within a remote filesystem +without server knowledge of all client mounts. +.PP +The AT&T \fInamei\fP interface is presumably the same as that in previous +.UX +systems, accepting the name of a routine to fetch pathname characters +and an operation (one of: lookup, lookup for creation, or lookup for deletion). +It translates, component by component, as before. +If it detects that a mount point crosses to a remote filesystem, +it passes the remainder of the pathname to the remote server. +A pathname-oriented request other than open may be completed +within the \fInamei\fP call, +avoiding return to the (unmodified) system call handler +that called \fInamei\fP. +.PP +In contrast to the first two systems, Sun's VFS interface has replaced +\fInamei\fP with \fIlookupname\fP. +This routine simply calls a new pathname-handling module to allocate +a pathname buffer and copy in the pathname (copying a character per call), +then calls \fIlookuppn\fP. +\fILookuppn\fP performs the iteration over the directories leading +to the destination file; it copies each pathname component to a local buffer, +then calls the filesystem \fIlookup\fP entry to locate the vnode +for that file in the current directory. +Per-filesystem \fIlookup\fP routines may translate only one component +per call. +For creation and deletion of new files, the lookup operation is unmodified; +the lookup of the final component only serves to check for the existence +of the file. +The subsequent creation or deletion call, if any, must repeat the final +name translation and associated directory scan. +For new file creation in particular, this is rather inefficient, +as file creation requires two complete scans of the directory. 
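The cost difference described above can be illustrated with a small user-space sketch. It is not kernel code from any of the systems discussed: the toy directory, and the names toy_scan, toy_enter, create_two_pass and create_one_pass, are all hypothetical and exist only for illustration. The two-pass routine mimics a lookup that merely checks for existence followed by a create that must rescan; the one-pass routine mimics a lookup that knows creation is intended and saves the free slot it finds.

```c
#include <assert.h>
#include <string.h>

#define NSLOTS  8
#define NAMELEN 16

/* Toy directory: an array of name slots; an empty name marks a free slot. */
struct toydir {
	char names[NSLOTS][NAMELEN];
	int scans;		/* number of full directory scans performed */
};

/*
 * Scan the directory for name.  Returns the slot index, or -1 if absent.
 * If freep is non-null, the first free slot (or -1) is stored through it.
 */
int
toy_scan(struct toydir *dp, const char *name, int *freep)
{
	int i, found = -1, free_slot = -1;

	dp->scans++;
	for (i = 0; i < NSLOTS; i++) {
		if (dp->names[i][0] == '\0') {
			if (free_slot < 0)
				free_slot = i;
		} else if (found < 0 && strcmp(dp->names[i], name) == 0)
			found = i;
	}
	if (freep != NULL)
		*freep = free_slot;
	return (found);
}

/* Write name into the given slot, truncating to NAMELEN - 1 characters. */
void
toy_enter(struct toydir *dp, int slot, const char *name)
{
	assert(slot >= 0 && slot < NSLOTS);
	strncpy(dp->names[slot], name, NAMELEN - 1);
	dp->names[slot][NAMELEN - 1] = '\0';
}

/* Lookup only checks existence; the create step repeats the scan. */
int
create_two_pass(struct toydir *dp, const char *name)
{
	int slot;

	if (toy_scan(dp, name, NULL) >= 0)	/* lookup of final component */
		return (-1);			/* already exists */
	(void)toy_scan(dp, name, &slot);	/* create rescans for a slot */
	if (slot < 0)
		return (-1);			/* directory full */
	toy_enter(dp, slot, name);
	return (slot);
}

/* Intent to create is known in advance, so one scan saves the slot. */
int
create_one_pass(struct toydir *dp, const char *name)
{
	int slot;

	if (toy_scan(dp, name, &slot) >= 0 || slot < 0)
		return (-1);			/* exists, or directory full */
	toy_enter(dp, slot, name);
	return (slot);
}
```

Creating a new name in an empty toy directory costs two scans with create_two_pass and one with create_one_pass. A real filesystem would also have to keep the directory locked between the scan and the commit, which the sketch omits.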
+.PP +Several of the important performance improvements in 4.3BSD +were related to the name translation process [McKusick85][Leffler84]. +The following changes were made: +.IP 1. 4 +A system-wide cache of recent translations is maintained. +The cache is separate from the inode cache, so that multiple names +for a file may be present in the cache. +The cache does not hold ``hard'' references to the inodes, +so that the normal reference pattern is not disturbed. +.IP 2. +A per-process cache is kept of the directory and offset +at which the last successful name lookup was done. +This allows sequential lookups of all the entries in a directory to be done +in linear time. +.IP 3. +The entire pathname is copied into a kernel buffer in a single operation, +rather than using two subroutine calls per character. +.IP 4. +A pool of pathname buffers is held by \fInamei\fP, avoiding allocation +overhead. +.LP +All of these performance improvements from 4.3BSD are well worth using +within a more generalized filesystem framework. +The generalization of the structure may otherwise make an already-expensive +function even more costly. +Most of these improvements are present in the GFS system, as it derives +from the beta-test version of 4.3BSD. +The Sun system uses a name-translation cache generally like that in 4.3BSD. +The name cache is a filesystem-independent facility provided for the use +of the filesystem-specific lookup routines. +The Sun cache, like that first used at Berkeley but unlike that in 4.3BSD, +holds a ``hard'' reference to the vnode (increments the reference count). +The ``soft'' reference scheme in 4.3BSD cannot be used with the current +NFS implementation, as NFS allocates vnodes dynamically and frees them +when the reference count returns to zero rather than caching them. +As a result, fewer names may be held in the cache +than (local filesystem) vnodes, and the cache distorts the normal reference +patterns otherwise seen by the LRU cache. 
+As the name cache references overflow the local filesystem inode table, +the name cache must be purged to make room in the inode table. +Also, to determine whether a vnode is in use (for example, +before mounting upon it), the cache must be flushed to free any +cache reference. +These problems should be corrected +by the use of the soft cache reference scheme. +.PP +A final observation on the efficiency of name translation in the current +Sun VFS architecture is that the number of subroutine calls used +by a multi-component name lookup is dramatically larger +than in the other systems. +The name lookup scheme in GFS suffers from this problem much less, +at no expense in violation of layering. +.PP +A final problem to be considered is synchronization and consistency. +As the filesystem operations are more stylized and broken into separate +entry points for parts of operations, it is more difficult to guarantee +consistency throughout an operation and/or to synchronize with other +processes using the same filesystem objects. +The Sun interface suffers most severely from this, +as it forbids the filesystems from locking objects across calls +to the filesystem. +It is possible that a file may be created between the time that a lookup +is performed and a subsequent creation is requested. +Perhaps more strangely, after a lookup fails to find the target +of a creation attempt, the actual creation might find that the target +now exists and is a symbolic link. +The call will either fail unexpectedly, as the target is of the wrong type, +or the generic creation routine will have to note the error +and restart the operation from the lookup. +This problem will always exist in a stateless filesystem, +but the VFS interface forces all filesystems to share the problem. +This restriction against locking between calls also +forces duplication of work during file creation and deletion. +This is considered unacceptable. 
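The restart that a generic creation routine needs when no lock is held between lookup and creation can be sketched as follows. This is a user-space illustration under stated assumptions, not code from VFS or from the proposed interface: the toyfs structure, the functions toy_lookup, toy_create and generic_open_create, and the MAXRETRY bound are all hypothetical. A concurrent process is simulated by a counter that makes the name appear just before the create takes effect.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAXRETRY 4	/* arbitrary bound on restarts, for illustration */

/* Toy filesystem state: a single name that either exists or does not. */
struct toyfs {
	int exists;	/* nonzero once the target name exists */
	int intruders;	/* pending creations by a simulated other process */
	int lookups;	/* number of lookup calls seen */
};

/* Lookup entry: returns 0 if the name exists, ENOENT otherwise. */
int
toy_lookup(struct toyfs *fs)
{
	fs->lookups++;
	return (fs->exists ? 0 : ENOENT);
}

/*
 * Create entry.  Nothing has been locked since the lookup, so a
 * simulated concurrent process may create the name first.
 */
int
toy_create(struct toyfs *fs)
{
	if (fs->intruders > 0) {
		fs->intruders--;
		fs->exists = 1;
	}
	if (fs->exists)
		return (EEXIST);
	fs->exists = 1;
	return (0);
}

/*
 * Generic "open, creating if absent".  Whenever the create loses the
 * race, the operation restarts from the lookup.  On success, *createdp
 * reports whether this caller created the file.
 */
int
generic_open_create(struct toyfs *fs, int *createdp)
{
	int error, tries;

	assert(createdp != NULL);
	for (tries = 0; tries < MAXRETRY; tries++) {
		error = toy_lookup(fs);
		if (error == 0) {		/* found: use existing file */
			*createdp = 0;
			return (0);
		}
		if (error != ENOENT)
			return (error);
		error = toy_create(fs);
		if (error == 0) {
			*createdp = 1;
			return (0);
		}
		if (error != EEXIST)
			return (error);
		/* lost the race; go around and look the name up again */
	}
	return (EEXIST);
}
```

With one simulated intruder, the routine performs two lookups and ends up opening the file that the other process created. If the lookup could instead lock the parent directory until the creation is committed or aborted, the loop would be unnecessary for local filesystems.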
+.NH +Support facilities and other interactions +.PP +Several support facilities are used by the current +.UX +filesystem and require generalization for use by other filesystem types. +For filesystem implementations to be portable, +it is desirable that these modified support facilities +should also have a uniform interface and +behave in a consistent manner in target systems. +A prominent example is the filesystem buffer cache. +The buffer cache in a standard (System V or 4.3BSD) +.UX +system contains physical disk blocks with no reference to the files containing +them. +This works well for the local filesystem, but has obvious problems +for remote filesystems. +Sun has modified the buffer cache routines to describe buffers by vnode +rather than by device. +For remote files, the vnode used is that of the file, and the block +numbers are virtual data blocks. +For local filesystems, a vnode for the block device is used for cache reference, +and the block numbers are filesystem physical blocks. +Use of per-file cache description does not easily accommodate +caching of indirect blocks, inode blocks, superblocks or cylinder group blocks. +However, the vnode describing the block device for the cache +is one created internally, +rather than the vnode for the device looked up when mounting, +and it is located by searching a private list of vnodes +rather than by holding it in the mount structure. +Although the Sun modification makes it possible to use the buffer +cache for data blocks of remote files, a better generalization +of the buffer cache is needed. +.PP +The RFS filesystem used by AT&T does not currently cache data blocks +on client systems, thus the buffer cache is probably unmodified. +The form of the buffer cache in ULTRIX is unknown to us. +.PP +Another subsystem that has a large interaction with the filesystem +is the virtual memory system. +The virtual memory system must read data from the filesystem +to satisfy fill-on-demand page faults. 
+For efficiency, this read call is arranged to place the data directly +into the physical pages assigned to the process (a ``raw'' read) to avoid +copying the data. +Although the read operation normally bypasses the filesystem buffer cache, +consistency must be maintained by checking the buffer cache and copying +or flushing modified data not yet stored on disk. +The 4.2BSD virtual memory system, like that of Sun and ULTRIX, +maintains its own cache of reusable text pages. +This creates additional complications. +As the virtual memory systems are redesigned, these problems should be +resolved by reading through the buffer cache, then mapping the cached +data into the user address space. +If the buffer cache or the process pages are changed while the other reference +remains, the data would have to be copied (``copy-on-write''). +.PP +In the meantime, the current virtual memory systems must be used +with the new filesystem framework. +Both the Sun and AT&T filesystem interfaces +provide entry points to the filesystem for optimization of the virtual +memory system by performing logical-to-physical block number translation +when setting up a fill-on-demand image for a process. +The VFS provides a vnode operation analogous to the \fIbmap\fP function of the +.UX +filesystem. +Given a vnode and logical block number, it returns a vnode and block number +which may be read to obtain the data. +If the filesystem is local, it returns the private vnode for the block device +and the physical block number. +As the \fIbmap\fP operations are all performed at one time, during process +startup, any indirect blocks for the file will remain in the cache +after they are once read. +In addition, the interface provides a \fIstrategy\fP entry that may be used +for ``raw'' reads from a filesystem device, +used to read data blocks into an address space without copying. +This entry uses a buffer header (\fIbuf\fP structure) +to describe the I/O operation +instead of a \fIuio\fP structure. 
+The buffer-style interface is the same as that used by disk drivers internally. +This difference allows the current \fIuio\fP primitives to be avoided, +as they copy all data to/from the current user process address space. +Instead, for local filesystems these operations could be done internally +with the standard raw disk read routines, +which use a \fIuio\fP interface. +When loading from a remote filesystem, +the data will be received in a network buffer. +If network buffers are suitably aligned, +the data may be mapped into the process address space by a page swap +without copying. +In either case, it should be possible to use the standard filesystem +read entry from the virtual memory system. +.PP +Other issues that must be considered in devising a portable +filesystem implementation include kernel memory allocation, +the implicit use of user-structure global context, +which may create problems with reentrancy, +the style of the system call interface, +and the conventions for synchronization +(sleep/wakeup, handling of interrupted system calls, semaphores). +.NH +The Berkeley Proposal +.PP +The Sun VFS interface has been the most widely used of the three described here. +It is also the most general of the three, in that filesystem-specific +data and operations are best separated from the generic layer. +Although it has several disadvantages which were described above, +most of them may be corrected with minor changes to the interface +(and, in a few areas, philosophical changes). +The DEC GFS has other advantages, in particular the use of the 4.3BSD +\fInamei\fP interface and optimizations. +It allows single or multiple components of a pathname +to be translated in a single call to the specific filesystem +and thus accommodates filesystems with either preference. +The FSS is least well understood, as there is little public information +about the interface. +However, the design goals are the least consistent with those of the Berkeley +research groups. 
+Accordingly, a new filesystem interface has been devised to avoid +some of the problems in the other systems. +The proposed interface derives directly from Sun's VFS, +but, like GFS, uses a 4.3BSD-style name lookup interface. +Additional context information has been moved from the \fIuser\fP structure +to the \fInameidata\fP structure so that name translation may be independent +of the global context of a user process. +This is especially desired in any system where kernel-mode servers +operate as light-weight or interrupt-level processes, +or where a server may store or cache context for several clients. +This calling interface has the additional advantage +that the call parameters need not all be pushed onto the stack for each call +through the filesystem interface, +and they may be accessed using short offsets from a base pointer +(unlike global variables in the \fIuser\fP structure). +.PP +The proposed filesystem interface is described very tersely here. +For the most part, data structures and procedures are analogous +to those used by VFS, and only the changes will be treated here. +See [Kleiman86] for complete descriptions of the vfs and vnode operations +in Sun's interface. +.PP +The central data structure for name translation is the \fInameidata\fP +structure. +The same structure is used to pass parameters to \fInamei\fP, +to pass these same parameters to filesystem-specific lookup routines, +to communicate completion status from the lookup routines back to \fInamei\fP, +and to return completion status to the calling routine. +For creation or deletion requests, the parameters to the filesystem operation +to complete the request are also passed in this same structure. +The form of the \fInameidata\fP structure is: +.br +.ne 2i +.ID +.nf +.ta .5i +\w'caddr_t\0\0\0'u +\w'struct\0\0'u +\w'vnode *nc_prevdir;\0\0\0\0\0'u +/* + * Encapsulation of namei parameters. + * One of these is located in the u. 
area to + * minimize space allocated on the kernel stack + * and to retain per-process context. + */ +struct nameidata { + /* arguments to namei and related context: */ + caddr_t ni_dirp; /* pathname pointer */ + enum uio_seg ni_seg; /* location of pathname */ + short ni_nameiop; /* see below */ + struct vnode *ni_cdir; /* current directory */ + struct vnode *ni_rdir; /* root directory, if not normal root */ + struct ucred *ni_cred; /* credentials */ + + /* shared between namei, lookup routines and commit routines: */ + caddr_t ni_pnbuf; /* pathname buffer */ + char *ni_ptr; /* current location in pathname */ + int ni_pathlen; /* remaining chars in path */ + short ni_more; /* more left to translate in pathname */ + short ni_loopcnt; /* count of symlinks encountered */ + + /* results: */ + struct vnode *ni_vp; /* vnode of result */ + struct vnode *ni_dvp; /* vnode of intermediate directory */ + +/* BEGIN UFS SPECIFIC */ + struct diroffcache { /* last successful directory search */ + struct vnode *nc_prevdir; /* terminal directory */ + long nc_id; /* directory's unique id */ + off_t nc_prevoffset; /* where last entry found */ + } ni_nc; +/* END UFS SPECIFIC */ +}; +.DE +.DS +.ta \w'#define\0\0'u +\w'WANTPARENT\0\0'u +\w'0x40\0\0\0\0\0\0\0'u +/* + * namei operations and modifiers + */ +#define LOOKUP 0 /* perform name lookup only */ +#define CREATE 1 /* setup for file creation */ +#define DELETE 2 /* setup for file deletion */ +#define WANTPARENT 0x10 /* return parent directory vnode also */ +#define NOCACHE 0x20 /* name must not be left in cache */ +#define FOLLOW 0x40 /* follow symbolic links */ +#define NOFOLLOW 0x0 /* don't follow symbolic links (pseudo) */ +.DE +As in current systems other than Sun's VFS, \fInamei\fP is called +with an operation request, one of LOOKUP, CREATE or DELETE. +For a LOOKUP, the operation is exactly like the lookup in VFS. 
+CREATE and DELETE allow the filesystem to ensure consistency +by locking the parent inode (private to the filesystem), +and (for the local filesystem) to avoid duplicate directory scans +by storing the new directory entry and its offset in the directory +in the \fIndirinfo\fP structure. +This is intended to be opaque to the filesystem-independent levels. +Not all lookups for creation or deletion are actually followed +by the intended operation; permission may be denied, the filesystem +may be read-only, etc. +Therefore, an entry point to the filesystem is provided +to abort a creation or deletion operation +and allow release of any locked internal data. +After a \fInamei\fP with a CREATE or DELETE flag, the pathname pointer +is set to point to the last filename component. +Filesystems that choose to implement creation or deletion entirely +within the subsequent call to a create or delete entry +are thus free to do so. +.PP +The \fInameidata\fP is used to store context used during name translation. +The current and root directories for the translation are stored here. +For the local filesystem, the per-process directory offset cache +is also kept here. +A file server could leave the directory offset cache empty, +could use a single cache for all clients, +or could hold caches for several recent clients. +.PP +Several other data structures are used in the filesystem operations. +One is the \fIucred\fP structure which describes a client's credentials +to the filesystem. +This is modified slightly from the Sun structure; +the ``accounting'' group ID has been merged into the groups array. +The actual number of groups in the array is given explicitly +to avoid use of a reserved group ID as a terminator. +Also, typedefs introduced in 4.3BSD for user and group ID's have been used. +The \fIucred\fP structure is thus: +.DS +.ta .5i +\w'caddr_t\0\0\0'u +\w'struct\0\0'u +\w'vnode *nc_prevdir;\0\0\0\0\0'u +/* + * Credentials. 
+ */ +struct ucred { + u_short cr_ref; /* reference count */ + uid_t cr_uid; /* effective user id */ + short cr_ngroups; /* number of groups */ + gid_t cr_groups[NGROUPS]; /* groups */ + /* + * The following either should not be here, + * or should be treated as opaque. + */ + uid_t cr_ruid; /* real user id */ + gid_t cr_svgid; /* saved set-group id */ +}; +.DE +.PP +A final structure used by the filesystem interface is the \fIuio\fP +structure mentioned earlier. +This structure describes the source or destination of an I/O +operation, with provision for scatter/gather I/O. +It is used in the read and write entries to the filesystem. +The \fIuio\fP structure presented here is modified from the one +used in 4.2BSD to specify the location of each vector of the operation +(user or kernel space) +and to allow an alternate function to be used to implement the data movement. +The alternate function might perform page remapping rather than a copy, +for example. +.DS +.ta .5i +\w'caddr_t\0\0\0'u +\w'struct\0\0'u +\w'vnode *nc_prevdir;\0\0\0\0\0'u +/* + * Description of an I/O operation which potentially + * involves scatter-gather, with individual sections + * described by iovec, below. uio_resid is initially + * set to the total size of the operation, and is + * decremented as the operation proceeds. uio_offset + * is incremented by the amount of each operation. + * uio_iov is incremented and uio_iovcnt is decremented + * after each vector is processed. + */ +struct uio { + struct iovec *uio_iov; + int uio_iovcnt; + off_t uio_offset; + int uio_resid; + enum uio_rw uio_rw; +}; + +enum uio_rw { UIO_READ, UIO_WRITE }; +.DE +.DS +.ta .5i +\w'caddr_t\0\0\0'u +\w'vnode *nc_prevdir;\0\0\0\0\0'u +/* + * Description of a contiguous section of an I/O operation. + * If iov_op is non-null, it is called to implement the copy + * operation, possibly by remapping, with the call + * (*iov_op)(from, to, count); + * where from and to are caddr_t and count is int. 
+ * Otherwise, the copy is done in the normal way, + * treating base as a user or kernel virtual address + * according to iov_segflg. + */ +struct iovec { + caddr_t iov_base; + int iov_len; + enum uio_seg iov_segflg; + int (*iov_op)(); +}; +.DE +.DS +.ta .5i +\w'UIO_USERSPACE\0\0\0\0\0'u +/* + * Segment flag values. + */ +enum uio_seg { + UIO_USERSPACE, /* from user data space */ + UIO_SYSSPACE, /* from system space */ +}; +.DE +.NH +File and filesystem operations +.PP +With the introduction of the data structures used by the filesystem +operations, the complete list of filesystem entry points may be listed. +As noted, they derive mostly from the Sun VFS interface. +Lines marked with \fB+\fP are additions to the Sun definitions; +lines marked with \fB!\fP are modified from VFS. +.PP +The structure describing the externally-visible features of a mounted +filesystem, \fIvfs\fP, is: +.DS +.ta .5i +\w'struct vfsops\0\0\0'u +\w'*vfs_vnodecovered;\0\0\0\0\0'u +/* + * Structure per mounted file system. + * Each mounted file system has an array of + * operations and an instance record. + * The file systems are put on a doubly linked list. + */ +struct vfs { + struct vfs *vfs_next; /* next vfs in vfs list */ +\fB+\fP struct vfs *vfs_prev; /* prev vfs in vfs list */ + struct vfsops *vfs_op; /* operations on vfs */ + struct vnode *vfs_vnodecovered; /* vnode we mounted on */ + int vfs_flag; /* flags */ +\fB!\fP int vfs_fsize; /* fundamental block size */ +\fB+\fP int vfs_bsize; /* optimal transfer size */ +\fB!\fP uid_t vfs_exroot; /* exported fs uid 0 mapping */ + short vfs_exflags; /* exported fs flags */ + caddr_t vfs_data; /* private data */ +}; +.DE +.DS +.ta \w'\fB+\fP 'u +\w'#define\0\0'u +\w'VFS_EXPORTED\0\0'u +\w'0x40\0\0\0\0\0'u + /* + * vfs flags. + * VFS_MLOCK lock the vfs so that name lookup cannot proceed past the vfs. + * This keeps the subtree stable during mounts and unmounts. 
+ */ + #define VFS_RDONLY 0x01 /* read only vfs */ +\fB+\fP #define VFS_NOEXEC 0x02 /* can't exec from filesystem */ + #define VFS_MLOCK 0x04 /* lock vfs so that subtree is stable */ + #define VFS_MWAIT 0x08 /* someone is waiting for lock */ + #define VFS_NOSUID 0x10 /* don't honor setuid bits on vfs */ + #define VFS_EXPORTED 0x20 /* file system is exported (NFS) */ + + /* + * exported vfs flags. + */ + #define EX_RDONLY 0x01 /* exported read only */ +.DE +.LP +The operations supported by the filesystem-specific layer +on an individual filesystem are: +.DS +.ta .5i +\w'struct vfsops\0\0\0'u +\w'*vfs_vnodecovered;\0\0\0\0\0'u +/* + * Operations supported on virtual file system. + */ +struct vfsops { +\fB!\fP int (*vfs_mount)( /* vfs, path, data, datalen */ ); +\fB!\fP int (*vfs_unmount)( /* vfs, forcibly */ ); +\fB+\fP int (*vfs_mountroot)(); + int (*vfs_root)( /* vfs, vpp */ ); +\fB!\fP int (*vfs_statfs)( /* vfs, vp, sbp */ ); +\fB!\fP int (*vfs_sync)( /* vfs, waitfor */ ); +\fB+\fP int (*vfs_fhtovp)( /* vfs, fhp, vpp */ ); +\fB+\fP int (*vfs_vptofh)( /* vp, fhp */ ); +}; +.DE +.LP +The \fIvfs_statfs\fP entry returns a structure of the form: +.DS +.ta .5i +\w'struct vfsops\0\0\0'u +\w'*vfs_vnodecovered;\0\0\0\0\0'u +/* + * file system statistics + */ +struct statfs { +\fB!\fP short f_type; /* type of filesystem */ +\fB+\fP short f_flags; /* copy of vfs (mount) flags */ +\fB!\fP long f_fsize; /* fundamental file system block size */ +\fB+\fP long f_bsize; /* optimal transfer block size */ + long f_blocks; /* total data blocks in file system */ + long f_bfree; /* free blocks in fs */ + long f_bavail; /* free blocks avail to non-superuser */ + long f_files; /* total file nodes in file system */ + long f_ffree; /* free file nodes in fs */ + fsid_t f_fsid; /* file system id */ +\fB+\fP char *f_mntonname; /* directory on which mounted */ +\fB+\fP char *f_mntfromname; /* mounted filesystem */ + long f_spare[7]; /* spare for later */ +}; + +typedef long fsid_t[2]; /* file 
system id type */ +.DE +.LP +The modifications to Sun's interface at this level are minor. +Additional arguments are present for the \fIvfs_mount\fP and \fIvfs_unmount\fP +entries. +\fIvfs_statfs\fP accepts a vnode as well as a filesystem identifier, +as the information may not be uniform throughout a filesystem. +For example, +if a client mounts a file tree that spans multiple physical +filesystems on a server, different sections may have different amounts +of free space. +(NFS does not allow remotely-mounted file trees to span physical filesystems +on the server.) +The final additions are the entries that support file handles. +\fIvfs_vptofh\fP is provided for the use of file servers, +which need to obtain an opaque +file handle to represent the current vnode for transmission to clients. +This file handle may later be used to relocate the vnode using \fIvfs_fhtovp\fP +without requiring the vnode to remain in memory. +.PP +Finally, the external form of a filesystem object, the \fIvnode\fP, is: +.DS +.ta .5i +\w'struct vnodeops\0\0'u +\w'*v_vfsmountedhere;\0\0\0'u +/* + * vnode types. VNON means no type. + */ +enum vtype { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VSOCK }; + +struct vnode { + u_short v_flag; /* vnode flags (see below) */ + u_short v_count; /* reference count */ + u_short v_shlockc; /* count of shared locks */ + u_short v_exlockc; /* count of exclusive locks */ + struct vfs *v_vfsmountedhere; /* ptr to vfs mounted here */ + struct vfs *v_vfsp; /* ptr to vfs we are in */ + struct vnodeops *v_op; /* vnode operations */ +\fB+\fP struct text *v_text; /* text/mapped region */ + enum vtype v_type; /* vnode type */ + caddr_t v_data; /* private data for fs */ +}; +.DE +.DS +.ta \w'#define\0\0'u +\w'NOFOLLOW\0\0'u +\w'0x40\0\0\0\0\0\0\0'u +/* + * vnode flags.
+ */ +#define VROOT 0x01 /* root of its file system */ +#define VTEXT 0x02 /* vnode is a pure text prototype */ +#define VEXLOCK 0x10 /* exclusive lock */ +#define VSHLOCK 0x20 /* shared lock */ +#define VLWAIT 0x40 /* proc is waiting on shared or excl. lock */ +.DE +.LP +The operations supported by the filesystems on individual \fIvnode\fP\^s +are: +.DS +.ta .5i +\w'int\0\0\0\0\0'u +\w'(*vn_getattr)(\0\0\0\0\0'u +/* + * Operations on vnodes. + */ +struct vnodeops { +\fB!\fP int (*vn_lookup)( /* ndp */ ); +\fB!\fP int (*vn_create)( /* ndp, vap, fflags */ ); +\fB+\fP int (*vn_mknod)( /* ndp, vap, fflags */ ); +\fB!\fP int (*vn_open)( /* vp, fflags, cred */ ); + int (*vn_close)( /* vp, fflags, cred */ ); + int (*vn_access)( /* vp, fflags, cred */ ); + int (*vn_getattr)( /* vp, vap, cred */ ); + int (*vn_setattr)( /* vp, vap, cred */ ); + +\fB+\fP int (*vn_read)( /* vp, uiop, offp, ioflag, cred */ ); +\fB+\fP int (*vn_write)( /* vp, uiop, offp, ioflag, cred */ ); +\fB!\fP int (*vn_ioctl)( /* vp, com, data, fflag, cred */ ); + int (*vn_select)( /* vp, which, cred */ ); +\fB+\fP int (*vn_mmap)( /* vp, ..., cred */ ); + int (*vn_fsync)( /* vp, cred */ ); +\fB+\fP int (*vn_seek)( /* vp, offp, off, whence */ ); + +\fB!\fP int (*vn_remove)( /* ndp */ ); +\fB!\fP int (*vn_link)( /* vp, ndp */ ); +\fB!\fP int (*vn_rename)( /* src ndp, target ndp */ ); +\fB!\fP int (*vn_mkdir)( /* ndp, vap */ ); +\fB!\fP int (*vn_rmdir)( /* ndp */ ); +\fB!\fP int (*vn_symlink)( /* ndp, vap, nm */ ); + int (*vn_readdir)( /* vp, uiop, offp, ioflag, cred */ ); + int (*vn_readlink)( /* vp, uiop, ioflag, cred */ ); + +\fB+\fP int (*vn_abortop)( /* ndp */ ); +\fB+\fP int (*vn_lock)( /* vp */ ); +\fB+\fP int (*vn_unlock)( /* vp */ ); +\fB!\fP int (*vn_inactive)( /* vp */ ); +}; +.DE +.DS +.ta \w'#define\0\0'u +\w'NOFOLLOW\0\0'u +\w'0x40\0\0\0\0\0'u +/* + * flags for ioflag + */ +#define IO_UNIT 0x01 /* do io as atomic unit for VOP_RDWR */ +#define IO_APPEND 0x02 /* append write for VOP_RDWR */ 
+#define IO_SYNC 0x04 /* sync io for VOP_RDWR */ +.DE +.LP +The argument types listed in the comments following each operation are: +.sp +.IP ndp 10 +A pointer to a \fInameidata\fP structure. +.IP vap +A pointer to a \fIvattr\fP structure (vnode attributes; see below). +.IP fflags +File open flags, possibly including O_APPEND, O_CREAT, O_TRUNC and O_EXCL. +.IP vp +A pointer to a \fIvnode\fP previously obtained with \fIvn_lookup\fP. +.IP cred +A pointer to a \fIucred\fP credentials structure. +.IP uiop +A pointer to a \fIuio\fP structure. +.IP ioflag +Any of the IO flags defined above. +.IP com +An \fIioctl\fP command, with type \fIunsigned long\fP. +.IP data +A pointer to a character buffer used to pass data to or from an \fIioctl\fP. +.IP which +One of FREAD, FWRITE or 0 (select for exceptional conditions). +.IP off +A file offset of type \fIoff_t\fP. +.IP offp +A pointer to file offset of type \fIoff_t\fP. +.IP whence +One of L_SET, L_INCR, or L_XTND. +.IP fhp +A pointer to a file handle buffer. +.sp +.PP +Several changes have been made to Sun's set of vnode operations. +Most obviously, the \fIvn_lookup\fP receives a \fInameidata\fP structure +containing its arguments and context as described. +The same structure is also passed to one of the creation or deletion +entries if the lookup operation is for CREATE or DELETE to complete +an operation, or to the \fIvn_abortop\fP entry if no operation +is undertaken. +For filesystems that perform no locking between lookup for creation +or deletion and the call to implement that action, +the final pathname component may be left untranslated by the lookup +routine. +In any case, the pathname pointer points at the final name component, +and the \fInameidata\fP contains a reference to the vnode of the parent +directory. +The interface is thus flexible enough to accommodate filesystems +that are fully stateful or fully stateless, while avoiding redundant +operations whenever possible. 
+One operation remains problematic, the \fIvn_rename\fP call. +It is tempting to look up the source of the rename for deletion +and the target for creation. +However, filesystems that lock directories during such lookups must avoid +deadlock if the two paths cross. +For that reason, the source is translated for LOOKUP only, +with the WANTPARENT flag set; +the target is then translated with an operation of CREATE. +.PP +In addition to the changes concerned with the \fInameidata\fP interface, +several other changes were made in the vnode operations. +The \fIvn_rdwr\fP entry was split into \fIvn_read\fP and \fIvn_write\fP; +frequently, the read/write entry amounts to a routine that checks +the direction flag, then calls either a read routine or a write routine. +The two entries may be identical for any given filesystem; +the direction flag is contained in the \fIuio\fP given as an argument. +.PP +All of the read and write operations use a \fIuio\fP to describe +the file offset and buffer locations. +All of these fields must be updated before return. +In particular, the \fIvn_readdir\fP entry uses this +to return a new file offset token for its current location. +.PP +Several new operations have been added. +The first, \fIvn_seek\fP, is a concession to record-oriented files +such as directories. +It allows the filesystem to verify that a seek leaves a file at a sensible +offset, or to return a new offset token relative to an earlier one. +For most filesystems and files, this operation amounts to performing +simple arithmetic. +Another new entry point is \fIvn_mmap\fP, for use in mapping device memory +into a user process address space. +Its semantics are not yet decided. +The final additions are the \fIvn_lock\fP and \fIvn_unlock\fP entries. +These are used to request that the underlying file be locked against +changes for short periods of time if the filesystem implementation allows it.
+They are used to maintain consistency +during internal operations such as \fIexec\fP, +and may not be used to construct atomic operations from other filesystem +operations. +.PP +The attributes of a vnode are not stored in the vnode, +as they might change with time and may need to be read from a remote +source. +Attributes have the form: +.DS +.ta .5i +\w'struct vnodeops\0\0'u +\w'*v_vfsmountedhere;\0\0\0'u +/* + * Vnode attributes. A field value of -1 + * represents a field whose value is unavailable + * (getattr) or which is not to be changed (setattr). + */ +struct vattr { + enum vtype va_type; /* vnode type (for create) */ + u_short va_mode; /* files access mode and type */ +\fB!\fP uid_t va_uid; /* owner user id */ +\fB!\fP gid_t va_gid; /* owner group id */ + long va_fsid; /* file system id (dev for now) */ +\fB!\fP long va_fileid; /* file id */ + short va_nlink; /* number of references to file */ + u_long va_size; /* file size in bytes (quad?) */ +\fB+\fP u_long va_size1; /* reserved if not quad */ + long va_blocksize; /* blocksize preferred for i/o */ + struct timeval va_atime; /* time of last access */ + struct timeval va_mtime; /* time of last modification */ + struct timeval va_ctime; /* time file changed */ + dev_t va_rdev; /* device the file represents */ + u_long va_bytes; /* bytes of disk space held by file */ +\fB+\fP u_long va_bytes1; /* reserved if va_bytes not a quad */ +}; +.DE +.NH +Conclusions +.PP +The Sun VFS filesystem interface is the most widely used generic +filesystem interface. +Of the interfaces examined, it creates the cleanest separation +between the filesystem-independent and -dependent layers and data structures. +It has several flaws, but it is felt that certain changes in the interface +can ameliorate most of them. +The interface proposed here includes those changes. +The proposed interface is now being implemented by the Computer Systems +Research Group at Berkeley. 
+If the design succeeds in improving the flexibility and performance +of the filesystem layering, it will be advanced as a model interface. +.NH +Acknowledgements +.PP +The filesystem interface described here is derived from Sun's VFS interface. +It also includes features similar to those of DEC's GFS interface. +We are indebted to members of the Sun and DEC system groups +for long discussions of the issues involved. +.br +.ne 2i +.NH +References + +.IP Brownbridge82 \w'Satyanarayanan85\0\0'u +Brownbridge, D.R., L.F. Marshall, B. Randell, +``The Newcastle Connection, or UNIXes of the World Unite!,'' +\fISoftware\- Practice and Experience\fP, Vol. 12, pp. 1147-1162, 1982. + +.IP Cole85 +Cole, C.T., P.B. Flinn, A.B. Atlas, +``An Implementation of an Extended File System for UNIX,'' +\fIUsenix Conference Proceedings\fP, +pp. 131-150, June, 1985. + +.IP Kleiman86 +``Vnodes: An Architecture for Multiple File System Types in Sun UNIX,'' +\fIUsenix Conference Proceedings\fP, +pp. 238-247, June, 1986. + +.IP Leffler84 +Leffler, S., M.K. McKusick, M. Karels, +``Measuring and Improving the Performance of 4.2BSD,'' +\fIUsenix Conference Proceedings\fP, pp. 237-252, June, 1984. + +.IP McKusick84 +McKusick, M.K., W.N. Joy, S.J. Leffler, R.S. Fabry, +``A Fast File System for UNIX,'' \fITransactions on Computer Systems\fP, +Vol. 2, pp. 181-197, +ACM, August, 1984. + +.IP McKusick85 +McKusick, M.K., M. Karels, S. Leffler, +``Performance Improvements and Functional Enhancements in 4.3BSD,'' +\fIUsenix Conference Proceedings\fP, pp. 519-531, June, 1985. + +.IP Rifkin86 +Rifkin, A.P., M.P. Forbes, R.L. Hamilton, M. Sabrio, S. Shah, and K. Yueh, +``RFS Architectural Overview,'' \fIUsenix Conference Proceedings\fP, +pp. 248-259, June, 1986. + +.IP Ritchie74 +Ritchie, D.M. and K. Thompson, ``The Unix Time-Sharing System,'' +\fICommunications of the ACM\fP, Vol. 17, pp. 365-375, July, 1974. + +.IP Rodriguez86 +Rodriguez, R., M. Koehler, R. 
Hyde, +``The Generic File System,'' \fIUsenix Conference Proceedings\fP, +pp. 260-269, June, 1986. + +.IP Sandberg85 +Sandberg, R., D. Goldberg, S. Kleiman, D. Walsh, B. Lyon, +``Design and Implementation of the Sun Network Filesystem,'' +\fIUsenix Conference Proceedings\fP, +pp. 119-130, June, 1985. + +.IP Satyanarayanan85 +Satyanarayanan, M., \fIet al.\fP, +``The ITC Distributed File System: Principles and Design,'' +\fIProc. 10th Symposium on Operating Systems Principles\fP, pp. 35-50, +ACM, December, 1985. + +.IP Walker85 +Walker, B.J. and S.H. Kiser, ``The LOCUS Distributed Filesystem,'' +\fIThe LOCUS Distributed System Architecture\fP, +G.J. Popek and B.J. Walker, ed., The MIT Press, Cambridge, MA, 1985. + +.IP Weinberger84 +Weinberger, P.J., ``The Version 8 Network File System,'' +\fIUsenix Conference presentation\fP, +June, 1984. diff --git a/share/doc/papers/fsinterface/slides.t b/share/doc/papers/fsinterface/slides.t new file mode 100644 index 0000000..3caaafb --- /dev/null +++ b/share/doc/papers/fsinterface/slides.t @@ -0,0 +1,318 @@ +.\" Copyright (c) 1986 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. 
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)slides.t 5.2 (Berkeley) 4/16/91 +.\" +.so macros +.nf +.LL +Encapsulation of namei parameters +.NP 0 +.ta .5i +\w'caddr_t\0\0'u +\w'struct\0\0'u +\w'vnode *nc_prevdir;\0\0\0\0\0'u +struct nameidata { + /* arguments and context: */ + caddr_t ni_dirp; + enum uio_seg ni_seg; + short ni_nameiop; + struct vnode *ni_cdir; + struct vnode *ni_rdir; + struct ucred *ni_cred; +.sp .2 + /* shared with lookup and commit: */ + caddr_t ni_pnbuf; + char *ni_ptr; + int ni_pathlen; + short ni_more; + short ni_loopcnt; +.sp .2 + /* results: */ + struct vnode *ni_vp; + struct vnode *ni_dvp; +.sp .2 +/* BEGIN UFS SPECIFIC */ + struct diroffcache { + struct vnode *nc_prevdir; + long nc_id; + off_t nc_prevoffset; + } ni_nc; +/* END UFS SPECIFIC */ +}; +.bp + + +.LL +Namei operations and modifiers + +.NP 0 +.ta \w'#define\0\0'u +\w'WANTPARENT\0\0'u +\w'0x40\0\0\0\0\0\0\0'u +#define LOOKUP 0 /* name lookup only */ +#define CREATE 1 /* setup for creation */ +#define DELETE 2 /* setup for deletion */ 
+#define WANTPARENT 0x10 /* return parent vnode also */ +#define NOCACHE 0x20 /* remove name from cache */ +#define FOLLOW 0x40 /* follow symbolic links */ +.bp + +.LL +Namei operations and modifiers + +.NP 0 +.ta \w'#define\0\0'u +\w'WANTPARENT\0\0'u +\w'0x40\0\0\0\0\0\0\0'u +#define LOOKUP 0 +#define CREATE 1 +#define DELETE 2 +#define WANTPARENT 0x10 +#define NOCACHE 0x20 +#define FOLLOW 0x40 +.bp + + +.LL +Credentials + +.NP 0 +.ta .5i +\w'caddr_t\0\0\0'u +\w'struct\0\0'u +\w'vnode *nc_prevdir;\0\0\0\0\0'u +struct ucred { + u_short cr_ref; + uid_t cr_uid; + short cr_ngroups; + gid_t cr_groups[NGROUPS]; + /* + * The following either should not be here, + * or should be treated as opaque. + */ + uid_t cr_ruid; + gid_t cr_svgid; +}; +.bp +.LL +Scatter-gather I/O +.NP 0 +.ta .5i +\w'caddr_t\0\0\0'u +\w'struct\0\0'u +\w'vnode *nc_prevdir;\0\0\0\0\0'u +struct uio { + struct iovec *uio_iov; + int uio_iovcnt; + off_t uio_offset; + int uio_resid; + enum uio_rw uio_rw; +}; + +enum uio_rw { UIO_READ, UIO_WRITE }; + + + +.ta .5i +\w'caddr_t\0\0\0'u +\w'vnode *nc_prevdir;\0\0\0\0\0'u +struct iovec { + caddr_t iov_base; + int iov_len; + enum uio_seg iov_segflg; + int (*iov_op)(); +}; +.bp +.LL +Per-filesystem information +.NP 0 +.ta .25i +\w'struct vfsops\0\0\0'u +\w'*vfs_vnodecovered;\0\0\0\0\0'u +struct vfs { + struct vfs *vfs_next; +\fB+\fP struct vfs *vfs_prev; + struct vfsops *vfs_op; + struct vnode *vfs_vnodecovered; + int vfs_flag; +\fB!\fP int vfs_fsize; +\fB+\fP int vfs_bsize; +\fB!\fP uid_t vfs_exroot; + short vfs_exflags; + caddr_t vfs_data; +}; + +.NP 0 +.ta \w'\fB+\fP 'u +\w'#define\0\0'u +\w'VFS_EXPORTED\0\0'u +\w'0x40\0\0\0\0\0'u + /* vfs flags: */ + #define VFS_RDONLY 0x01 +\fB+\fP #define VFS_NOEXEC 0x02 + #define VFS_MLOCK 0x04 + #define VFS_MWAIT 0x08 + #define VFS_NOSUID 0x10 + #define VFS_EXPORTED 0x20 + + /* exported vfs flags: */ + #define EX_RDONLY 0x01 +.bp + + +.LL +Operations supported on virtual file system. 
+ +.NP 0 +.ta .25i +\w'int\0\0'u +\w'*vfs_mountroot();\0'u +struct vfsops { +\fB!\fP int (*vfs_mount)(vfs, path, data, len); +\fB!\fP int (*vfs_unmount)(vfs, forcibly); +\fB+\fP int (*vfs_mountroot)(); + int (*vfs_root)(vfs, vpp); + int (*vfs_statfs)(vfs, sbp); +\fB!\fP int (*vfs_sync)(vfs, waitfor); +\fB+\fP int (*vfs_fhtovp)(vfs, fhp, vpp); +\fB+\fP int (*vfs_vptofh)(vp, fhp); +}; +.bp + + +.LL +Dynamic file system information + +.NP 0 +.ta .5i +\w'struct\0\0\0'u +\w'*vfs_vnodecovered;\0\0\0\0\0'u +struct statfs { +\fB!\fP short f_type; +\fB+\fP short f_flags; +\fB!\fP long f_fsize; +\fB+\fP long f_bsize; + long f_blocks; + long f_bfree; + long f_bavail; + long f_files; + long f_ffree; + fsid_t f_fsid; +\fB+\fP char *f_mntonname; +\fB+\fP char *f_mntfromname; + long f_spare[7]; +}; + +typedef long fsid_t[2]; +.bp +.LL +Filesystem objects (vnodes) +.NP 0 +.ta .25i +\w'struct vnodeops\0\0'u +\w'*v_vfsmountedhere;\0\0\0'u +enum vtype { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VSOCK }; + +struct vnode { + u_short v_flag; + u_short v_count; + u_short v_shlockc; + u_short v_exlockc; + struct vfs *v_vfsmountedhere; + struct vfs *v_vfsp; + struct vnodeops *v_op; +\fB+\fP struct text *v_text; + enum vtype v_type; + caddr_t v_data; +}; +.ta \w'#define\0\0'u +\w'NOFOLLOW\0\0'u +\w'0x40\0\0\0\0\0\0\0'u + +/* vnode flags */ +#define VROOT 0x01 +#define VTEXT 0x02 +#define VEXLOCK 0x10 +#define VSHLOCK 0x20 +#define VLWAIT 0x40 +.bp +.LL +Operations on vnodes + +.NP 0 +.ta .25i +\w'int\0\0'u +\w'(*vn_getattr)(\0\0\0\0\0'u +struct vnodeops { +\fB!\fP int (*vn_lookup)(ndp); +\fB!\fP int (*vn_create)(ndp, vap, fflags); +\fB+\fP int (*vn_mknod)(ndp, vap, fflags); +\fB!\fP int (*vn_open)(vp, fflags, cred); + int (*vn_close)(vp, fflags, cred); + int (*vn_access)(vp, fflags, cred); + int (*vn_getattr)(vp, vap, cred); + int (*vn_setattr)(vp, vap, cred); +.sp .5 +\fB+\fP int (*vn_read)(vp, uiop, + offp, ioflag, cred); +\fB+\fP int (*vn_write)(vp, uiop, + offp, ioflag, cred); +\fB!\fP int 
(*vn_ioctl)(vp, com, + data, fflag, cred); + int (*vn_select)(vp, which, cred); +\fB+\fP int (*vn_mmap)(vp, ..., cred); + int (*vn_fsync)(vp, cred); +\fB+\fP int (*vn_seek)(vp, offp, off, + whence); +.bp +.LL +Operations on vnodes (cont) + +.NP 0 +.ta .25i +\w'int\0\0'u +\w'(*vn_getattr)(\0\0\0\0\0'u + +\fB!\fP int (*vn_remove)(ndp); +\fB!\fP int (*vn_link)(vp, ndp); +\fB!\fP int (*vn_rename)(sndp, tndp); +\fB!\fP int (*vn_mkdir)(ndp, vap); +\fB!\fP int (*vn_rmdir)(ndp); +\fB!\fP int (*vn_symlink)(ndp, vap, nm); +\fB!\fP int (*vn_readdir)(vp, uiop, + offp, ioflag, cred); +\fB!\fP int (*vn_readlink)(vp, uiop, + offp, ioflag, cred); +.sp .5 +\fB+\fP int (*vn_abortop)(ndp); +\fB!\fP int (*vn_inactive)(vp); +}; + +.NP 0 +.ta \w'#define\0\0'u +\w'NOFOLLOW\0\0'u +\w'0x40\0\0\0\0\0'u +/* flags for ioflag */ +#define IO_UNIT 0x01 +#define IO_APPEND 0x02 +#define IO_SYNC 0x04 +.bp + +.LL +Vnode attributes + +.NP 0 +.ta .5i +\w'struct timeval\0\0'u +\w'*v_vfsmountedhere;\0\0\0'u +struct vattr { + enum vtype va_type; + u_short va_mode; +\fB!\fP uid_t va_uid; +\fB!\fP gid_t va_gid; + long va_fsid; +\fB!\fP long va_fileid; + short va_nlink; + u_long va_size; +\fB+\fP u_long va_size1; + long va_blocksize; + struct timeval va_atime; + struct timeval va_mtime; + struct timeval va_ctime; + dev_t va_rdev; +\fB!\fP u_long va_bytes; +\fB+\fP u_long va_bytes1; +}; diff --git a/share/doc/papers/hwpmc/Makefile b/share/doc/papers/hwpmc/Makefile new file mode 100644 index 0000000..d24fe06 --- /dev/null +++ b/share/doc/papers/hwpmc/Makefile @@ -0,0 +1,8 @@ +# $FreeBSD$ + +VOLUME= papers +DOC= hwpmc +SRCS= hwpmc.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/papers/hwpmc/hwpmc.ms b/share/doc/papers/hwpmc/hwpmc.ms new file mode 100644 index 0000000..9061bb7 --- /dev/null +++ b/share/doc/papers/hwpmc/hwpmc.ms @@ -0,0 +1,34 @@ +.\" Copyright (c) 2004 Joseph Koshy. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY JOSEPH KOSHY AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL JOSEPH KOSHY OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" $FreeBSD$ +.\" +.OH '''Using Hardware Performance Monitoring Counters' +.EH 'HWPMC''' +.TL +Using Hardware Performance Monitoring Counters in FreeBSD +.sp +\s-2FreeBSD 5.2.1\s+2 +.sp +\fRJuly, 2004\fR +.PP diff --git a/share/doc/papers/jail/Makefile b/share/doc/papers/jail/Makefile new file mode 100644 index 0000000..5d49354 --- /dev/null +++ b/share/doc/papers/jail/Makefile @@ -0,0 +1,14 @@ +# $FreeBSD$ + +VOLUME= papers +DOC= jail +SRCS= paper.ms-patched +EXTRA= implementation.ms mgt.ms future.ms jail01.eps +MACROS= -ms +USE_SOELIM= +CLEANFILES= paper.ms-patched + +paper.ms-patched: paper.ms + sed "s;jail01\.eps;${.CURDIR}/&;" ${.ALLSRC} > ${.TARGET} + +.include <bsd.doc.mk> diff --git a/share/doc/papers/jail/future.ms b/share/doc/papers/jail/future.ms new file mode 100644 index 0000000..01c325d --- /dev/null +++ b/share/doc/papers/jail/future.ms @@ -0,0 +1,104 @@ +.\" +.\" $FreeBSD$ +.\" +.NH +Future Directions +.PP +The jail facility has already been deployed in numerous capacities and +a few opportunities for improvement have manifested themselves. +.NH 2 +Improved Virtualisation +.PP +As it stands, the jail code provides a strict subset of system resources +to the jail environment, based on access to processes, files, network +resources, and privileged services. +Virtualisation, or making the jail environments appear to be fully +functional FreeBSD systems, allows maximum application support and the +ability to offer a wide range of services within a jail environment. +However, there are a number of limitations on the degree of virtualisation +in the current code, and removing these limitations will enhance the +ability to offer services in a jail environment. +Two areas that deserve greater attention are the virtualisation of +network resources, and management of scheduling resources. +.PP +Currently, a single IP address may be allocated to each jail, and all +communication from the jail is limited to that IP address. 
+In particular, these addresses are IPv4 addresses. +There has been substantial interest in improving interface virtualisation, +allowing one or more addresses to be assigned to an interface, and +removing the requirement that the address be an IPv4 address, allowing +the use of IPv6. +Also, access to raw sockets is currently prohibited, as the current +implementation of raw sockets allows access to raw IP packets associated +with all interfaces. +Limiting the scope of the raw socket would allow its safe use within +a jail, re-enabling support for ping, and other network debugging and +evaluation tools. +.PP +Another area of great interest to the current consumers of the jail code +is the ability to limit the impact of one jail on the CPU resources +available for other jails. +Specifically, this would require that the jail of a process play a role in +its scheduling parameters. +Prior work in the area of lottery scheduling, currently available as +patches on FreeBSD 2.2.x, might be leveraged to allow some degree of +partitioning between jail environments \s-2[LOTTERY1] [LOTTERY2]\s+2. +However, as the current scheduling mechanism is targeted at time +sharing, and FreeBSD does not currently support real time preemption +of processes in the kernel, complete partitioning is not possible within the +current framework. +.NH 2 +Improved Management +.PP +Management of jail environments is currently somewhat ad hoc--creating +and starting jails is a well-documented procedure, but day-to-day +management of jails, as well as special case procedures such as shutdown, +are not well analysed and documented. +The current kernel process management infrastructure does not have the +ability to manage pools of processes in a jail-centric way. +For example, it is possible, from within a jail, to deliver a signal to all +processes in that jail, but it is not possible to atomically target all +processes within a jail from outside of the jail.
+If the jail code is to effectively limit the behaviour of a jail, the +ability to shut it down cleanly is paramount. +Similarly, shutting down a jail cleanly from within is also not well +defined, the traditional shutdown utilities having been written with +a host environment in mind. +This suggests a number of improvements, both in the kernel and in the +user-land utility set. +.PP +First, the ability to address kernel-centric management mechanisms at +jails is important. +One way in which this might be done is to assign a unique jail id, not +unlike a process id or process group id, at jail creation time. +A new jailkill() syscall would permit the direction of signals to +specific jailids, allowing for the effective termination of all processes +in the jail. +A unique jailid could also supplant the hostname as the unique +identifier for a jail, allowing the hostname to be changed by the +processes in the jail without interfering with jail management. +.PP +More carefully defining the user-land semantics of a jail during startup +and shutdown is also important. +The traditional FreeBSD environment makes use of an init process to +bring the system up during the boot process, and to assist in shutdown. +A similar technique might be used for jail, in effect a jailinit, +formulated to handle the clean startup and shutdown, including calling +out to jail-local /etc/rc.shutdown, and other useful shutdown functions. +A jailinit would also present a central location for delivering +management requests to within a jail from the host environment, allowing +the host environment to request the shutdown of the jail cleanly, before +resorting to terminating processes, in the same style as the host +environment shutting down before killing all processes and halting the +kernel. 
+.PP +Improvements in the host environment would also assist in improving +jail management, possibly including automated runtime jail management tools, +tools to more easily construct the per-jail file system area, and +the inclusion of jail shutdown as part of normal system shutdown. +.PP +These improvements in the jail framework would improve both raw +functionality and usability from a management perspective. +The jail code has raised significant interest in the FreeBSD community, +and it is hoped that this type of improved functionality will be +available in upcoming releases of FreeBSD. diff --git a/share/doc/papers/jail/implementation.ms b/share/doc/papers/jail/implementation.ms new file mode 100644 index 0000000..eafc8f2 --- /dev/null +++ b/share/doc/papers/jail/implementation.ms @@ -0,0 +1,126 @@ +.\" +.\" $FreeBSD$ +.\" +.NH +Implementation of jail in the FreeBSD kernel. +.NH 2 +The jail(2) system call, allocation, refcounting and deallocation of +\fCstruct prison\fP. +.PP +The jail(2) system call is implemented as a non-optional system call +in FreeBSD. Other system calls are controlled by compile time options +in the kernel configuration file, but due to the minute footprint of +the jail implementation, it was decided to make it a standard +facility in FreeBSD. +.PP +The implementation of the system call is straightforward: a data structure +is allocated and populated with the arguments provided. The data structure +is attached to the current process' \fCstruct proc\fP, its reference count +is set to one, and a call to the +chroot(2) syscall implementation completes the task. +.PP +Hooks in the code implementing process creation and destruction maintain +the reference count on the data structure and free it when the last reference +is lost. +Any new process created by a process in a jail will inherit a reference +to the jail, which effectively puts the new process in the same jail.
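The reference counting described above can be sketched in user-space C. This is only an illustration under stated assumptions: \fCstruct prison\fP is named in the text, but every field, the stand-in \fCstruct proc_sketch\fP, and the three function names below are hypothetical and are not taken from the FreeBSD kernel sources. The chroot(2) step is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct prison. */
struct prison {
    int          pr_ref;        /* number of processes holding a reference */
    char         pr_host[64];   /* hostname given at jail creation */
    unsigned int pr_ip;         /* the jail's single IPv4 address */
};

/* Hypothetical stand-in for the jail-related part of struct proc. */
struct proc_sketch {
    struct prison *p_prison;    /* NULL when the process is not jailed */
};

static int prisons_freed;       /* counts deallocations, for demonstration only */

/* jail(2)-like step: allocate, populate, attach, set the count to one. */
struct prison *
prison_create(struct proc_sketch *p, const char *host, unsigned int ip)
{
    struct prison *pr = calloc(1, sizeof(*pr));

    if (pr == NULL)
        return NULL;
    strncpy(pr->pr_host, host, sizeof(pr->pr_host) - 1);
    pr->pr_ip = ip;
    pr->pr_ref = 1;
    p->p_prison = pr;
    return pr;
}

/* Process creation hook: the child inherits the parent's jail reference. */
void
proc_fork_hook(const struct proc_sketch *parent, struct proc_sketch *child)
{
    child->p_prison = parent->p_prison;
    if (child->p_prison != NULL)
        child->p_prison->pr_ref++;
}

/* Process destruction hook: drop the reference, free on the last one. */
void
proc_exit_hook(struct proc_sketch *p)
{
    struct prison *pr = p->p_prison;

    p->p_prison = NULL;
    if (pr == NULL)
        return;
    assert(pr->pr_ref > 0);
    if (--pr->pr_ref == 0) {
        free(pr);
        prisons_freed++;
    }
}
```

Because the only way to gain a reference in this sketch is through the creation hook, a process can enter a jail only by being created by a process already inside it, which matches the behaviour the text describes.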
+.PP +There is no way to modify the contents of the data structure describing +the jail after its creation, and no way to attach a process to an existing +jail if it was not created from inside that jail. +.NH 2 +Fortification of the chroot(2) facility for filesystem name scoping. +.PP +A number of ways to escape the confines of a chroot(2)-created subscope +of the filesystem view have been identified over the years. +chroot(2) was never intended to be a security mechanism as such, but even +so the ftp daemon largely depended on the security provided by +chroot(2) to provide the ``anonymous ftp'' access method. +.PP +Three classes of escape routes existed: recursive chroot(2) escapes, +``..'' based escapes and fchdir(2) based escapes. +All of these exploited the fact that chroot(2) didn't try sufficiently +hard to enforce the new root directory. +.PP +New code was added to detect and thwart these escapes, amongst +other things by tracking the directory of the first level of chroot(2) +experienced by a process and refusing backwards traversal across +this directory, as well as additional code to refuse chroot(2) if +file descriptors referencing directories were open. +.NH 2 +Restriction of process visibility and interaction. +.PP +A macro was already available in the kernel to determine if one process +could affect another process. This macro did the rather complex checking +of uid and gid values. It was felt that the complexity of the macro was +approaching the lower edge of IOCCC entrance criteria, and it was therefore +converted to a proper function named \fCp_trespass(p1, p2)\fP which does +all the previous checks and additionally checks the jail aspect of the access. +The check is implemented such that access fails if the origin process is jailed +but the target process is not in the same jail. +.PP +Process visibility is provided through two mechanisms in FreeBSD, +the \fCprocfs\fP file system and a sub-tree of the \fCsysctl\fP tree.
+Both of these were modified to report only the processes in the same +jail to a jailed process. +.NH 2 +Restriction to one IP number. +.PP +Restricting TCP and UDP access to just one IP number was done almost +entirely in the code which manages ``protocol control blocks''. +When a jailed process binds a socket, the IP number provided by +the process will not be used; instead the pre-configured IP number of +the jail is used. +.PP +BSD based TCP/IP network stacks sport a special interface, the loop-back +interface, which has the ``magic'' IP number 127.0.0.1. +This is often used by processes to contact servers on the local machine, +and consequently special handling for jails was needed. +To handle this case it was necessary to also intercept and modify the +behaviour of connection establishment, and when the 127.0.0.1 address +was seen from a jailed process, substitute the jail's configured IP number. +.PP +Finally the APIs through which the network configuration and connection +state may be queried were modified to report only information relevant +to the configured IP number of a jailed process. +.NH 2 +Adding jail awareness to selected device drivers. +.PP +A couple of device drivers needed to be taught about jails; the ``pty'' +driver is one of them. The pty driver provides ``virtual terminals'' to +services like telnet, ssh, rlogin and X11 terminal window programs. +Therefore jails need access to the pty driver, and code had to be added +to enforce that a particular virtual terminal was not accessed from more +than one jail at the same time. +.NH 2 +General restriction of super-user powers for jailed super-users. +.PP +This item proved to be the simplest but most tedious to implement. +Tedious because a manual review of all places where the kernel allowed +the super-user special powers was called for, +simple because very few places were required to let a jailed root through.
+Of the approximately 260 checks in the FreeBSD 4.0 kernel, only +about 35 will let a jailed root through. +.PP +Since the default is for jailed roots to not receive privilege, new +code or drivers in the FreeBSD kernel are automatically jail-aware: they +will refuse jailed roots privilege. +The other part of this protection comes from the fact that a jailed +root cannot create new device nodes with the mknod(2) system call, so +unless the machine administrator creates device nodes for a particular +device inside the jail's filesystem tree, the driver in effect does +not exist in the jail. +.PP +As a side-effect of this work the suser(9) API was cleaned up and +extended, not only to cater for the jail facility, but also to make room +for future partitioning facilities. +.NH 2 +Implementation statistics +.PP +The change of the suser(9) API modified approx. 350 source lines +distributed over approx. 100 source files. The vast majority of +these changes were generated automatically with a script. +.PP +The implementation of the jail facility added approx. 200 lines of +code in total, distributed over approx. 50 files, and about 200 lines +in two new kernel files.
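The default-deny treatment of jailed super-users and the same-jail rule for inter-process access, both described earlier in this section, might be sketched in C as follows. This is a simplified user-space model: the structures, the JAIL_ROOT_OK flag and both function names are hypothetical and only illustrate the shape of the checks, not the kernel's actual suser(9) or p_trespass code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: the call site explicitly lets a jailed root through. */
#define JAIL_ROOT_OK 0x1

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct prison {
    int pr_id;
};

struct proc {
    unsigned int   p_uid;     /* effective uid; 0 is the super-user */
    struct prison *p_prison;  /* NULL when the process is not jailed */
};

/*
 * Privilege check with a default-deny policy for jailed processes.
 * Returns 0 when privilege is granted and -1 (think EPERM) otherwise.
 * A call site that passes no flags refuses jailed roots automatically,
 * which is why new code is jail-aware without any extra work.
 */
int priv_check(const struct proc *p, int flags)
{
    assert(p != NULL);
    if (p->p_uid != 0)
        return -1;            /* not the super-user at all */
    if (p->p_prison != NULL && !(flags & JAIL_ROOT_OK))
        return -1;            /* jailed root, and not explicitly allowed */
    return 0;
}

/*
 * Jail aspect of the inter-process access check: access fails if the
 * origin is jailed and the target is not in the same jail.
 * Returns 1 when the jail rule permits access, 0 when it forbids it.
 * The uid and gid checks of the real function are not modelled here.
 */
int jail_can_affect(const struct proc *origin, const struct proc *target)
{
    assert(origin != NULL && target != NULL);
    if (origin->p_prison == NULL)
        return 1;             /* unjailed origin: no jail restriction */
    return origin->p_prison == target->p_prison;
}
```

Under this model, only the small number of call sites that pass the opt-in flag need to be audited, which corresponds to the roughly 35 of 260 checks mentioned above.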
diff --git a/share/doc/papers/jail/jail01.eps b/share/doc/papers/jail/jail01.eps new file mode 100644 index 0000000..ffcfa30 --- /dev/null +++ b/share/doc/papers/jail/jail01.eps @@ -0,0 +1,234 @@ +%!PS-Adobe-2.0 EPSF-2.0 +%%Title: jail01.eps +%%Creator: fig2dev Version 3.2 Patchlevel 1 +%%CreationDate: Fri Mar 24 20:37:59 2000 +%%For: $FreeBSD$ +%%Orientation: Portrait +%%BoundingBox: 0 0 425 250 +%%Pages: 0 +%%BeginSetup +%%EndSetup +%%Magnification: 1.0000 +%%EndComments +/$F2psDict 200 dict def +$F2psDict begin +$F2psDict /mtrx matrix put +/col-1 {0 setgray} bind def +/col0 {0.000 0.000 0.000 srgb} bind def +/col1 {0.000 0.000 1.000 srgb} bind def +/col2 {0.000 1.000 0.000 srgb} bind def +/col3 {0.000 1.000 1.000 srgb} bind def +/col4 {1.000 0.000 0.000 srgb} bind def +/col5 {1.000 0.000 1.000 srgb} bind def +/col6 {1.000 1.000 0.000 srgb} bind def +/col7 {1.000 1.000 1.000 srgb} bind def +/col8 {0.000 0.000 0.560 srgb} bind def +/col9 {0.000 0.000 0.690 srgb} bind def +/col10 {0.000 0.000 0.820 srgb} bind def +/col11 {0.530 0.810 1.000 srgb} bind def +/col12 {0.000 0.560 0.000 srgb} bind def +/col13 {0.000 0.690 0.000 srgb} bind def +/col14 {0.000 0.820 0.000 srgb} bind def +/col15 {0.000 0.560 0.560 srgb} bind def +/col16 {0.000 0.690 0.690 srgb} bind def +/col17 {0.000 0.820 0.820 srgb} bind def +/col18 {0.560 0.000 0.000 srgb} bind def +/col19 {0.690 0.000 0.000 srgb} bind def +/col20 {0.820 0.000 0.000 srgb} bind def +/col21 {0.560 0.000 0.560 srgb} bind def +/col22 {0.690 0.000 0.690 srgb} bind def +/col23 {0.820 0.000 0.820 srgb} bind def +/col24 {0.500 0.190 0.000 srgb} bind def +/col25 {0.630 0.250 0.000 srgb} bind def +/col26 {0.750 0.380 0.000 srgb} bind def +/col27 {1.000 0.500 0.500 srgb} bind def +/col28 {1.000 0.630 0.630 srgb} bind def +/col29 {1.000 0.750 0.750 srgb} bind def +/col30 {1.000 0.880 0.880 srgb} bind def +/col31 {1.000 0.840 0.000 srgb} bind def + +end +save +-117.0 298.0 translate +1 -1 scale + +/cp {closepath} bind def +/ef 
{eofill} bind def +/gr {grestore} bind def +/gs {gsave} bind def +/sa {save} bind def +/rs {restore} bind def +/l {lineto} bind def +/m {moveto} bind def +/rm {rmoveto} bind def +/n {newpath} bind def +/s {stroke} bind def +/sh {show} bind def +/slc {setlinecap} bind def +/slj {setlinejoin} bind def +/slw {setlinewidth} bind def +/srgb {setrgbcolor} bind def +/rot {rotate} bind def +/sc {scale} bind def +/sd {setdash} bind def +/ff {findfont} bind def +/sf {setfont} bind def +/scf {scalefont} bind def +/sw {stringwidth} bind def +/tr {translate} bind def +/tnt {dup dup currentrgbcolor + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb} + bind def +/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul + 4 -2 roll mul srgb} bind def +/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def +/$F2psEnd {$F2psEnteredState restore end} def +%%EndProlog + +$F2psBegin +10 setmiterlimit +n -1000 5962 m -1000 -1000 l 10022 -1000 l 10022 5962 l cp clip + 0.06000 0.06000 sc +/Courier-BoldOblique ff 180.00 scf sf +7725 3600 m +gs 1 -1 sc (10.0.0.2) dup sw pop neg 0 rm col0 sh gr +% Polyline +15.000 slw +n 9000 3300 m 9000 4275 l gs col0 s gr +% Polyline +2 slc +n 7875 3225 m 7800 3225 l gs col0 s gr +% Polyline +0 slc +n 7875 4125 m 7800 4125 l gs col0 s gr +% Polyline +n 7875 3225 m 7875 4425 l gs col0 s gr +% Polyline +n 7875 3825 m 7800 3825 l gs col0 s gr +% Polyline +n 7875 3525 m 7800 3525 l gs col0 s gr +% Polyline +n 8175 3825 m 7875 3825 l gs col0 s gr +% Polyline +2 slc +n 7875 4425 m 7800 4425 l gs col0 s gr +/Courier-Bold ff 180.00 scf sf +8700 3900 m +gs 1 -1 sc (fxp0) dup sw pop neg 0 rm col0 sh gr +% Polyline +0 slc +7.500 slw +n 2925 1425 m 3075 1425 l gs col0 s gr +% Polyline +15.000 slw +n 2475 1350 m 2472 1347 l 2465 1342 l 2453 1334 l 2438 1323 l 2420 1311 l + 2401 1299 l 2383 1289 l 2366 1281 l 2351 1275 l 2338 1274 l + 2325 1275 l 2314 1279 l 2303 1285 l 2291 
1293 l 2278 1303 l + 2264 1314 l 2250 1326 l 2236 1339 l 2222 1353 l 2209 1366 l + 2198 1379 l 2188 1391 l 2181 1403 l 2177 1414 l 2175 1425 l + 2177 1436 l 2181 1447 l 2188 1459 l 2198 1471 l 2209 1484 l + 2222 1497 l 2236 1511 l 2250 1524 l 2264 1536 l 2278 1547 l + 2291 1557 l 2303 1565 l 2314 1571 l 2325 1575 l 2338 1576 l + 2351 1575 l 2366 1569 l 2383 1561 l 2401 1551 l 2420 1539 l + 2438 1527 l 2453 1516 l 2465 1508 l 2472 1503 l 2475 1500 l gs col0 s gr +/Courier-Bold ff 180.00 scf sf +2550 1500 m +gs 1 -1 sc (lo0) col0 sh gr +/Courier-BoldOblique ff 180.00 scf sf +3075 1500 m +gs 1 -1 sc (127.0.0.1) col0 sh gr +% Polyline +7.500 slw +n 2100 3525 m 2250 3525 l gs col0 s gr +% Polyline +n 2550 2100 m 2250 2400 l 2250 4500 l 2550 4800 l gs col0 s gr +/Courier-Bold ff 180.00 scf sf +1950 3600 m +gs 1 -1 sc (/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +2550 3900 m +gs 1 -1 sc (jail_1/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +2550 4200 m +gs 1 -1 sc (jail_2/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +2550 4500 m +gs 1 -1 sc (jail_3/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +2550 2400 m +gs 1 -1 sc (dev/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +2550 2700 m +gs 1 -1 sc (etc/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +2550 3000 m +gs 1 -1 sc (usr/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +2550 3300 m +gs 1 -1 sc (var/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +2550 3600 m +gs 1 -1 sc (home/) col0 sh gr +% Polyline +n 3375 3825 m 3900 3825 l 4950 1800 l 5100 1800 l gs col0 s gr +% Polyline +n 3375 4125 m 3900 4125 l 4950 3900 l 5100 3900 l gs col0 s gr +% Polyline +n 5400 900 m 5100 1200 l 5100 2400 l 5400 2700 l gs col0 s gr +% Polyline +n 5400 3000 m 5100 3300 l 5100 4500 l 5400 4800 l gs col0 s gr +% Polyline +n 4650 825 m 4650 2775 l 6675 2775 l 6675 3375 l 7950 3375 l 7950 825 l + cp gs col0 s gr +% Polyline +n 4650 2775 m 4650 4950 l 6300 4950 l 6300 3675 l 7950 3675 l 7950 3375 l + 6675 3375 l 6675 2775 l cp gs col0 s gr +/Courier-Bold 
ff 180.00 scf sf +5400 1200 m +gs 1 -1 sc (dev/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +5400 1500 m +gs 1 -1 sc (etc/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +5400 1800 m +gs 1 -1 sc (usr/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +5400 2100 m +gs 1 -1 sc (var/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +5400 2400 m +gs 1 -1 sc (home/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +5400 3300 m +gs 1 -1 sc (dev/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +5400 3600 m +gs 1 -1 sc (etc/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +5400 3900 m +gs 1 -1 sc (usr/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +5400 4200 m +gs 1 -1 sc (var/) col0 sh gr +/Courier-Bold ff 180.00 scf sf +5400 4500 m +gs 1 -1 sc (home/) col0 sh gr +/Courier-BoldOblique ff 180.00 scf sf +7725 3300 m +gs 1 -1 sc (10.0.0.1) dup sw pop neg 0 rm col0 sh gr +/Courier-BoldOblique ff 180.00 scf sf +7725 4500 m +gs 1 -1 sc (10.0.0.5) dup sw pop neg 0 rm col0 sh gr +/Courier-BoldOblique ff 180.00 scf sf +7725 4200 m +gs 1 -1 sc (10.0.0.4) dup sw pop neg 0 rm col0 sh gr +/Courier-BoldOblique ff 180.00 scf sf +7725 3900 m +gs 1 -1 sc (10.0.0.3) dup sw pop neg 0 rm col0 sh gr +% Polyline +15.000 slw +n 9000 3825 m 8775 3825 l gs col0 s gr +$F2psEnd +rs diff --git a/share/doc/papers/jail/jail01.fig b/share/doc/papers/jail/jail01.fig new file mode 100644 index 0000000..d4ef165 --- /dev/null +++ b/share/doc/papers/jail/jail01.fig @@ -0,0 +1,86 @@ +#FIG 3.2 +# $FreeBSD$ +Landscape +Center +Inches +A4 +100.00 +Single +-2 +1200 2 +6 7725 3150 9075 4500 +6 8700 3225 9075 4350 +2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2 + 9000 3825 8775 3825 +2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2 + 9000 3300 9000 4275 +-6 +2 1 0 2 0 7 100 0 -1 0.000 0 2 -1 0 0 2 + 7875 3225 7800 3225 +2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2 + 7875 4125 7800 4125 +2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2 + 7875 3225 7875 4425 +2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2 + 7875 3825 7800 3825 +2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2 + 7875 3525 
7800 3525 +2 1 0 2 0 7 100 0 -1 0.000 0 0 -1 0 0 2 + 8175 3825 7875 3825 +2 1 0 2 0 7 100 0 -1 0.000 0 2 -1 0 0 2 + 7875 4425 7800 4425 +4 2 0 100 0 14 12 0.0000 4 180 420 8700 3900 fxp0\001 +-6 +6 2100 1200 4050 1650 +2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 2 + 2925 1425 3075 1425 +3 2 0 2 0 7 100 0 -1 0.000 0 0 0 5 + 2475 1350 2325 1275 2175 1425 2325 1575 2475 1500 + 0.000 -1.000 -1.000 -1.000 0.000 +4 0 0 100 0 14 12 0.0000 4 135 315 2550 1500 lo0\001 +4 0 0 100 0 15 12 0.0000 4 135 945 3075 1500 127.0.0.1\001 +-6 +6 1950 2100 3300 4800 +2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 2 + 2100 3525 2250 3525 +2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 4 + 2550 2100 2250 2400 2250 4500 2550 4800 +4 0 0 100 0 14 12 0.0000 4 150 105 1950 3600 /\001 +4 0 0 100 0 14 12 0.0000 4 180 735 2550 3900 jail_1/\001 +4 0 0 100 0 14 12 0.0000 4 180 735 2550 4200 jail_2/\001 +4 0 0 100 0 14 12 0.0000 4 180 735 2550 4500 jail_3/\001 +4 0 0 100 0 14 12 0.0000 4 165 420 2550 2400 dev/\001 +4 0 0 100 0 14 12 0.0000 4 150 420 2550 2700 etc/\001 +4 0 0 100 0 14 12 0.0000 4 150 420 2550 3000 usr/\001 +4 0 0 100 0 14 12 0.0000 4 150 420 2550 3300 var/\001 +4 0 0 100 0 14 12 0.0000 4 165 525 2550 3600 home/\001 +-6 +2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 4 + 3375 3825 3900 3825 4950 1800 5100 1800 +2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 4 + 3375 4125 3900 4125 4950 3900 5100 3900 +2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 4 + 5400 900 5100 1200 5100 2400 5400 2700 +2 1 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 4 + 5400 3000 5100 3300 5100 4500 5400 4800 +2 3 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 7 + 4650 825 4650 2775 6675 2775 6675 3375 7950 3375 7950 825 + 4650 825 +2 3 0 1 0 7 100 0 -1 0.000 0 0 -1 0 0 9 + 4650 2775 4650 4950 6300 4950 6300 3675 7950 3675 7950 3375 + 6675 3375 6675 2775 4650 2775 +4 0 0 100 0 14 12 0.0000 4 165 420 5400 1200 dev/\001 +4 0 0 100 0 14 12 0.0000 4 150 420 5400 1500 etc/\001 +4 0 0 100 0 14 12 0.0000 4 150 420 5400 1800 usr/\001 +4 0 0 100 0 14 12 0.0000 4 150 420 5400 2100 var/\001 +4 0 
0 100 0 14 12 0.0000 4 165 525 5400 2400 home/\001 +4 0 0 100 0 14 12 0.0000 4 165 420 5400 3300 dev/\001 +4 0 0 100 0 14 12 0.0000 4 150 420 5400 3600 etc/\001 +4 0 0 100 0 14 12 0.0000 4 150 420 5400 3900 usr/\001 +4 0 0 100 0 14 12 0.0000 4 150 420 5400 4200 var/\001 +4 0 0 100 0 14 12 0.0000 4 165 525 5400 4500 home/\001 +4 2 0 100 0 15 12 0.0000 4 135 840 7725 3300 10.0.0.1\001 +4 2 0 100 0 15 12 0.0000 4 135 840 7725 4500 10.0.0.5\001 +4 2 0 100 0 15 12 0.0000 4 135 840 7725 4200 10.0.0.4\001 +4 2 0 100 0 15 12 0.0000 4 135 840 7725 3900 10.0.0.3\001 +4 2 0 100 0 15 12 0.0000 4 135 840 7725 3600 10.0.0.2\001 diff --git a/share/doc/papers/jail/mgt.ms b/share/doc/papers/jail/mgt.ms new file mode 100644 index 0000000..e2835d7 --- /dev/null +++ b/share/doc/papers/jail/mgt.ms @@ -0,0 +1,218 @@ +.\" +.\" $FreeBSD$ +.\" +.NH +Managing Jails and the Jail File System Environment +.NH 2 +Creating a Jail Environment +.PP +While the jail(2) call could be used in a number of ways, the expected +configuration creates a complete FreeBSD installation for each jail. +This includes copies of all relevant system binaries, data files, and its +own \fC/etc\fP directory. +Such a configuration maximises the independence of various jails, +and reduces the chances of interference between jails being possible, +especially when it is desirable to provide root access within a jail to +a less trusted user. +.PP +On a box making use of the jail facility, we refer to two types of +environment: the host environment, and the jail environment. +The host environment is the real operating system environment, which is +used to configure interfaces, and start up the jails. +There are then one or more jail environments, effectively virtual +FreeBSD machines. +When configuring Jail for use, it is necessary to configure both the +host and jail environments to prevent overlap. 
+.PP +As jailed virtual machines are generally bound to an IP address configured +using the normal IP alias mechanism, those jail IP addresses are also +accessible to host environment applications to use. +If the accessibility of some host applications in the jail environment is +not desirable, it is necessary to configure those applications to only +listen on appropriate addresses. +.PP +In most of the production environments where jail is currently in use, +one IP address is allocated to the host environment, and then a number +are allocated to jail boxes, with each jail box receiving a unique IP. +In this situation, it is sufficient to configure the networking applications +on the host to listen only on the host IP. +Generally, this consists of specifying the appropriate IP address to be +used by inetd and SSH, and disabling applications that are not capable +of limiting their address scope, such as sendmail, the port mapper, and +syslogd. +Other third party applications that have been installed on the host must also be +configured in this manner, or users connecting to the jailbox will +discover the host environment service, unless the jailbox has +specifically bound a service to that port. +In some situations, this can actually be the desirable behaviour. +.PP +The jail environments must also be custom-configured. +This consists of building and installing a miniature version of the +FreeBSD file system tree off of a subdirectory in the host environment, +usually \fC/usr/jail\fP, or \fC/data/jail\fP, with a subdirectory per jail. +Appropriate instructions for generating this tree are included in the +jail(8) man page, but generally this process may be automated using the +FreeBSD build environment. +.PP +One notable difference from the default FreeBSD install is that only +a limited set of device nodes should be created. +MAKEDEV(8) has been modified to accept a ``jail'' argument that creates +the correct set of nodes. 
+.PP +To improve storage efficiency, a fair number of the binaries in the system tree +may be deleted, as they are not relevant in a jail environment. +This includes the kernel, boot loader, and related files, as well as +hardware and network configuration tools. +.PP +After the creation of the jail tree, the easiest way to configure it is +to start up the jail in single-user mode. +The sysinstall admin tool may be used to help with the task, although +it is not installed by default as part of the system tree. +These tools should be run in the jail environment, or they will affect +the host environment's configuration. +.DS +.ft C +.ps -2 +# mkdir /data/jail/192.168.11.100/stand +# cp /stand/sysinstall /data/jail/192.168.11.100/stand +# jail /data/jail/192.168.11.100 testhostname 192.168.11.100 \e + /bin/sh +.ps +2 +.R +.DE +.PP +After running the jail command, the shell is now within the jail environment, +and all further commands +will be limited to the scope of the jail until the shell exits. +If the network alias has not yet been configured, then the jail will be +unable to access the network. +.PP +The startup configuration of the jail environment may be configured so +as to quell warnings from services that cannot run in the jail. +Any per-system configuration required for a normal FreeBSD system +is also required for each jailbox. +Typically, this includes: +.IP "" 5n +\(bu Create empty /etc/fstab +.IP +\(bu Disable portmapper +.IP +\(bu Run newaliases +.IP +\(bu Disable interface configuration +.IP +\(bu Configure the resolver +.IP +\(bu Set root password +.IP +\(bu Set timezone +.IP +\(bu Add any local accounts +.IP +\(bu Install any packages +.NH 2 +Starting Jails +.PP +Jails are typically started by executing their /etc/rc script in much +the same manner as a shell was started in the previous section. +Before starting the jail, any relevant networking configuration +should also be performed.
+Typically, this involves adding an additional IP address to the +appropriate network interface, setting network properties for the +IP address using IP filtering, forwarding, and bandwidth shaping, +and mounting a process file system for the jail, if the ability to +debug processes from within the jail is desired. +.DS +.ft C +.ps -2 +# ifconfig ed0 inet add 192.168.11.100 netmask 255.255.255.255 +# mount -t procfs proc /data/jail/192.168.11.100/proc +# jail /data/jail/192.168.11.100 testhostname 192.168.11.100 \e + /bin/sh /etc/rc +.ps +2 +.ft P +.DE +.PP +A few warnings are generated for sysctls that are not permitted +to be set within the jail, but the end result is a set of processes +in an isolated process environment, bound to a single IP address. +Normal procedures for accessing a FreeBSD machine apply: telnetting in +through the network reveals a telnet prompt, login, and shell. +.DS +.ft C +.ps -2 +% ps ax + PID TT STAT TIME COMMAND + 228 ?? SsJ 0:18.73 syslogd + 247 ?? IsJ 0:00.05 inetd -wW + 249 ?? IsJ 0:28.43 cron + 252 ?? SsJ 0:30.46 sendmail: accepting connections on port 25 + 291 ?? IsJ 0:38.53 /usr/local/sbin/sshd +93694 ?? SJ 0:01.01 sshd: rwatson@ttyp0 (sshd) +93695 p0 SsJ 0:00.06 -csh (csh) +93700 p0 R+J 0:00.00 ps ax +.ps +2 +.ft P +.DE +.PP +It is immediately obvious that the environment is within a jailbox: there +is no init process, there are no kernel daemons, and a J flag is present beside all +processes indicating the presence of a jail. +.PP +As with any FreeBSD system, accounts may be created and deleted, +mail is delivered, logs are generated, packages may be added, and the +system may be hacked into if configured incorrectly, or if it is running a buggy +version of a piece of software. +However, all of this happens strictly within the scope of the jail.
+.NH 2 +Jail Management +.PP +Jail management is an interesting prospect, as there are two perspectives +from which a jail environment may be administered: from within the jail, +and from the host environment. +From within the jail, as described above, the process is remarkably similar +to any regular FreeBSD install, although certain actions are prohibited, +such as mounting file systems, modifying system kernel properties, etc. +The only area that really differs is that of shutting +the system down: the processes within the jail may deliver signals +between them, allowing all processes to be killed, but bringing the +system back up requires intervention from outside of the jailbox. +.PP +From outside of the jail, there is a range of capabilities, as well +as limitations. +The jail environment is, in effect, a subset of the host environment: +the jail file system appears as part of the host file system, and may +be directly modified by processes in the host environment. +Processes within the jail appear in the process listing of the host, +and may likewise be signalled or debugged. +The host process file system makes the hostname of the jail environment +accessible in /proc/procnum/status, allowing utilities in the host +environment to manage processes based on jailname. +However, the default configuration allows privileged processes within +jails to set the hostname of the jail, which makes the status file less +useful from a management perspective if the contents of the jail are +malicious. +To prevent a jail from changing its hostname, the +"security.jail.set_hostname_allowed" sysctl may be set to 0 prior to +starting any jails. +.PP +One aspect immediately observable in an environment with multiple jails +is that uids and gids are local to each jail environment: the uid associated +with a process in one jail may be for a different user than in another +jail.
+This collision of identifiers is only visible in the host environment, +as normally processes from one jail are never visible in an environment +with another scope for user/uid and group/gid mapping. +Managers in the host environment should understand these scoping issues, +or confusion and unintended consequences may result. +.PP +Jailed processes are subject to the normal restrictions present for +any processes, including resource limits, and limits placed by the network +code, including firewall rules. +By specifying firewall rules for the IP address bound to a jail, it is +possible to place connectivity and bandwidth limitations on individual +jails, restricting services that may be consumed or offered. +.PP +Management of jails is an area that will see further improvement in +future versions of FreeBSD. Some of these potential improvements are +discussed later in this paper. diff --git a/share/doc/papers/jail/paper.ms b/share/doc/papers/jail/paper.ms new file mode 100644 index 0000000..60be9f2 --- /dev/null +++ b/share/doc/papers/jail/paper.ms @@ -0,0 +1,438 @@ +.\" +.\" $FreeBSD$ +.\" +.if n .ftr C R +.ig TL +.ds CH " +.nr PI 2n +.nr PS 12 +.nr LL 15c +.nr PO 3c +.nr FM 3.5c +.po 3c +.TL +Jails: Confining the omnipotent root. +.FS +This paper was presented at the 2nd International System Administration and Networking Conference "SANE 2000" May 22-25, 2000 in Maastricht, The Netherlands and is published in the proceedings. +.FE +.AU +Poul-Henning Kamp <phk@FreeBSD.org> +.AU +Robert N. M. Watson <rwatson@FreeBSD.org> +.AI +The FreeBSD Project +.FS +This work was sponsored by \fChttp://www.servetheweb.com/\fP and +donated to the FreeBSD Project for inclusion in the FreeBSD +OS. FreeBSD 4.0-RELEASE was the first release including this +code. +Follow-on work was sponsored by Safeport Network Services, +\fChttp://www.safeport.com/\fP +.FE +.AB +The traditional UNIX security model is simple but inexpressive. 
+Adding fine-grained access control improves the expressiveness, +but often dramatically increases both the cost of system management +and implementation complexity. +In environments with a more complex management model, with delegation +of some management functions to parties under varying degrees of trust, +the base UNIX model and most natural +extensions are inappropriate at best. +Where multiple mutually un-trusting parties are introduced, +``inappropriate'' rapidly transitions to ``nightmarish'', especially +with regard to data integrity and privacy protection. +.PP +The FreeBSD ``Jail'' facility provides the ability to partition +the operating system environment, while maintaining the simplicity +of the UNIX ``root'' model. +In Jail, users with privilege find that the scope of their requests +is limited to the jail, allowing system administrators to delegate +management capabilities for each virtual machine +environment. +Creating virtual machines in this manner has many potential uses; the +most popular thus far has been for providing virtual machine services +in Internet Service Provider environments. +.AE +.NH +Introduction +.PP +The UNIX access control mechanism is designed for an environment with two +types of users: those with and those without administrative privilege. +Within this framework, every attempt is made to provide an open +system, allowing easy sharing of files and inter-process communication. +As a member of the UNIX family, FreeBSD inherits these +security properties. +Users of FreeBSD in non-traditional UNIX environments must balance +their need for strong application support, high network performance +and functionality, and low total cost of ownership with the need +for alternative security models that are difficult or impossible to +implement with the UNIX security mechanisms.
+.PP +One such consideration is the desire to delegate some (but not all) +administrative functions to untrusted or less trusted parties, and +simultaneously impose system-wide mandatory policies on process +interaction and sharing. +Attempting to create such an environment in the current-day FreeBSD +security environment is both difficult and costly: in many cases, +the burden of implementing these policies falls on user +applications, which means an increase in the size and complexity +of the code base, in turn translating to higher development +and maintenance cost, as well as less overall flexibility. +.PP +This abstract risk becomes more clear when applied to a practical, +real-world example: +many web service providers turn to the FreeBSD +operating system to host customer web sites, as it provides a +high-performance, network-centric server environment. +However, these providers have a number of concerns on their plate, both in +terms of protecting the integrity and confidentiality of their own +files and services from their customers, as well as protecting the files +and services of one customer from (accidental or +intentional) access by any other customer. +At the same time, a provider would like to provide +substantial autonomy to customers, allowing them to install and +maintain their own software, and to manage their own services, +such as web servers and other content-related daemon programs. +.PP +This problem space points strongly in the direction of a partitioning +solution, in which customer processes and storage are isolated from those of +other customers, both in terms of accidental disclosure of data or process +information, but also in terms of the ability to modify files or processes +outside of a compartment. +Delegation of management functions within the system must +be possible, but not at the cost of system-wide requirements, including +integrity and privacy protection between partitions. 
+.PP +However, UNIX-style access control makes it notoriously difficult to +compartmentalise functionality. +While mechanisms such as chroot(2) provide a modest +level of compartmentalisation, it is well known +that these mechanisms have serious shortcomings, both in terms of the +scope of their functionality, and effectiveness at what they provide \s-2[CHROOT]\s+2. +.PP +In the case of the chroot(2) call, a process's visibility of +the file system name-space is limited to a single subtree. +However, the compartmentalisation does not extend to the process +or networking spaces and therefore both observation of and interference +with processes outside their compartment are possible. +.PP +To this end, we describe the new FreeBSD ``Jail'' facility, which +provides a strong partitioning solution, leveraging existing +mechanisms, such as chroot(2), to provide what effectively amounts to a +virtual machine environment. Processes in a jail are provided +full access to the files that they may manipulate, processes they +may influence, and network services they can make use of, and neither +access to nor visibility of files, processes or network services outside +their partition. +.PP +Unlike other fine-grained security solutions, Jail does not +substantially increase the policy management requirements for the +system administrator, as each Jail is a virtual FreeBSD environment +permitting local policy to be independently managed, with much the +same properties as the main system itself, making Jail easy to use +for the administrator, and far more compatible with applications. +.NH +Traditional UNIX Security, or, ``God, root, what difference?'' \s-2[UF]\s+2. +.PP +The traditional UNIX access model assigns a numeric uid to each user of the +system. In turn, each process ``owned'' by a user will be tagged with that +user's uid in an unforgeable manner.
The uids serve two purposes: first, +they determine how discretionary access control mechanisms will be applied, and +second, they are used to determine whether special privileges are accorded. +.PP +In the case of discretionary access controls, the primary object protected is +a file. The uid (and related gids indicating group membership) are mapped to +a set of rights for each object, courtesy the UNIX file mode, in effect acting +as a limited form of access control list. Jail is, in general, not concerned +with modifying the semantics of discretionary access control mechanisms, +although there are important implications from a management perspective. +.PP +For the purposes of determining whether special privileges are accorded to a +process, the check is simple: ``is the numeric uid equal to 0 ?''. +If so, the +process is acting with ``super-user privileges'', and all access checks are +granted, in effect allowing the process the ability to do whatever it wants +to \**. +.FS +\&... no matter how patently stupid it may be. +.FE +.PP +For the purposes of human convenience, uid 0 is canonically allocated +to the ``root'' user \s-2[ROOT]\s+2. +For the purposes of jail, this behaviour is extremely relevant: many of +these privileged operations can be used to manage system hardware and +configuration, file system name-space, and special network operations. +.PP +Many limitations to this model are immediately clear: the root user is a +single, concentrated source of privilege that is exposed to many pieces of +software, and as such an immediate target for attacks. In the event of a +compromise of the root capability set, the attacker has complete control over +the system. Even without an attacker, the risks of a single administrative +account are serious: delegating a narrow scope of capability to an +inexperienced administrator is difficult, as the granularity of delegation is +that of all system management abilities. 
These features make the omnipotent +root account a sharp, efficient and extremely dangerous tool. +.PP +The BSD family of operating systems have implemented the ``securelevel'' +mechanism which allows the administrator to block certain configuration +and management functions from being performed by root, +until the system is restarted and brought up into single-user mode. +While this does provide some amount of protection in the case of a root +compromise of the machine, it does nothing to address the need for +delegation of certain root abilities. +.NH +Other Solutions to the Root Problem +.PP +Many operating systems attempt to address these limitations by providing +fine-grained access controls for system resources \s-2[BIBA]\s+2. +These efforts vary in +degrees of success, but almost all suffer from at least three serious +limitations: +.PP +First, increasing the granularity of security controls increases the +complexity of the administration process, in turn increasing both the +opportunity for incorrect configuration, as well as the demand on +administrator time and resources. In many cases, the increased complexity +results in significant frustration for the administrator, which may result +in two +disastrous types of policy: ``all doors open as it's too much trouble'', and +``trust that the system is secure, when in fact it isn't''. +.PP +The extent of the trouble is best illustrated by the fact that an entire +niche industry has emerged providing tools to manage fine grained security +controls \s-2[UAS]\s+2. +.PP +Second, usefully segregating capabilities and assigning them to running code +and users is very difficult. Many privileged operations in UNIX seem +independent, but are in fact closely related, and the handing out of one +privilege may, in effect, be transitive to the many others. For example, in +some trusted operating systems, a system capability may be assigned to a +running process to allow it to read any file, for the purposes of backup. 
+However, this capability is, in effect, equivalent to the ability to switch to +any other account, as the ability to access any file provides access to system +keying material, which in turn provides the ability to authenticate as any +user. Similarly, many operating systems attempt to segregate management +capabilities from auditing capabilities. In a number of these operating +systems, however, ``management capabilities'' permit the administrator to +assign ``auditing capabilities'' to itself, or another account, circumventing +the segregation of capability. +.PP +Finally, introducing new security features often involves introducing new +security management APIs. When fine-grained capabilities are introduced to +replace the setuid mechanism in UNIX-like operating systems, applications that +previously did an ``appropriateness check'' to see if they were running as +root before executing must now be changed to know that they need not run as +root. In the case of applications running with privilege and executing other +programs, there is now a new set of privileges that must be voluntarily given +up before executing another program. These changes can introduce significant +incompatibility for existing applications, and make life more difficult for +application developers who may not be aware of differing security semantics on +different systems \s-2[POSIX1e]\s+2. +.NH +The Jail Partitioning Solution +.PP +Jail neatly side-steps the majority of these problems through partitioning. +Rather +than introduce additional fine-grained access control mechanisms, we partition +a FreeBSD environment (processes, file system, network resources) into a +management environment, and optionally subset Jail environments. In doing so, +we simultaneously maintain the existing UNIX security model, allowing +multiple users and a privileged root user in each jail, while +limiting the scope of root's activities to that jail.
+Consequently the administrator of a +FreeBSD machine can partition the machine into separate jails, and provide +access to the super-user account in each of these without losing control of +the overall environment. +.PP +A process in a partition is referred to as ``in jail''. When a FreeBSD +system is booted up after a fresh install, no processes will be in jail. +When +a process is placed in a jail, it, and any descendants of the process created +after the jail creation, will be in that jail. A process may be in only one +jail, and after creation, it cannot leave the jail. +Jails are created when a +privileged process calls the jail(2) syscall, with a description of the jail as an +argument to the call. Each call to jail(2) creates a new jail; the only way +for a new process to enter the jail is by inheriting access to the jail from +another process already in that jail. +Processes may never +leave the jail they created, or were created in. +.KF +.if t .PSPIC jail01.eps 4i +.ce 1 +Fig. 1 \(em Schematic diagram of machine with two configured jails +.sp +.KE +.PP +Membership in a jail involves a number of restrictions: access to the file +name-space is restricted in the style of chroot(2), the ability to bind network +resources is limited to a specific IP address, the ability to manipulate +system resources and perform privileged operations is sharply curtailed, and +the ability to interact with other processes is limited to only processes +inside the same jail. +.PP +Jail takes advantage of the existing chroot(2) behaviour to limit access to the +file system name-space for jailed processes. When a jail is created, it is +bound to a particular file system root. +Processes are unable to manipulate files that they cannot address, +and as such the integrity and confidentiality of files outside of the jail +file system root are protected. Traditional mechanisms for breaking out of +chroot(2) have been blocked.
+In the expected and documented configuration, each jail is provided +with its own exclusive file system root, and a standard FreeBSD directory layout, +but this is not mandated by the implementation. +.PP +Each jail is bound to a single IP address: processes within the jail may not +make use of any other IP address for outgoing or incoming connections; this +includes the ability to restrict what network services a particular jail may +offer. As FreeBSD distinguishes attempts to bind all IP addresses from +attempts to bind a particular address, bind requests for all IP addresses are +redirected to the individual Jail address. Some network functionality +associated with privileged calls is disabled wholesale due to the nature of the +functionality offered; in particular, facilities which would allow ``spoofing'' +of IP numbers or disruptive traffic to be generated have been disabled. +.PP +Processes running without root privileges will notice few, if any, differences +between a jailed environment and an un-jailed environment. Processes running with +root privileges will find that many restrictions apply to the privileged calls +they may make. Some calls will now return an access error \(em for example, an +attempt to create a device node will now fail. Others will have a more +limited scope than normal \(em attempts to bind a reserved port number on all +available addresses will result in binding only the address associated with +the jail. Other calls will succeed as normal: root may read a file owned by +any uid, as long as it is accessible through the jail file system name-space. +.PP +Processes within the jail will find that they are unable to interact with or +even verify the existence of +processes outside the jail \(em processes within the jail are +prevented from delivering signals to processes outside the jail, as well as +connecting to those processes with debuggers, or even seeing them in the +sysctl or process file system monitoring mechanisms.
Jail does not prevent, +nor is it intended to prevent, the use of covert channels or communications +mechanisms via accepted interfaces \(em for example, two processes may communicate +via sockets over the IP network interface. Nor does it attempt to provide +scheduling services based on the partition; however, it does prevent calls +that interfere with normal process operation. +.PP +As a result of these attempts to retain the standard FreeBSD API and +framework, almost all applications will run unaffected. Standard system +services such as Telnet, FTP, and SSH all behave normally, as do most third-party +applications, including the popular Apache web server. +.NH +Jail Implementation +.PP +Processes running with root privileges in the jail find that there are serious +restrictions on what they are capable of doing \(em in particular, activities that +would extend outside of the jail: +.IP "" 5n +\(bu Modifying the running kernel by direct access and loading kernel +modules is prohibited. +.IP +\(bu Modifying any of the network configuration, interfaces, addresses, and +routing table is prohibited. +.IP +\(bu Mounting and unmounting file systems is prohibited. +.IP +\(bu Creating device nodes is prohibited. +.IP +\(bu Accessing raw, divert, or routing sockets is prohibited. +.IP +\(bu Modifying kernel runtime parameters, such as most sysctl settings, is +prohibited. +.IP +\(bu Changing securelevel-related file flags is prohibited. +.IP +\(bu Accessing network resources not associated with the jail is prohibited. +.PP +Other privileged activities are permitted as long as they are limited to the +scope of the jail: +.IP "" 5n +\(bu Signalling any process within the jail is permitted. +.IP +\(bu Changing the ownership and mode of any file within the jail is permitted, as +long as the file flags permit this. +.IP +\(bu Deleting any file within the jail is permitted, as long as the file flags +permit this.
+.IP +\(bu Binding reserved TCP and UDP port numbers on the jail's IP address is +permitted. (Attempts to bind TCP and UDP ports using INADDR_ANY will be +redirected to the jail's IP address.) +.IP +\(bu Functions which operate on the uid/gid space are all permitted since they +act as labels for filesystem objects or processes +which are partitioned off by other mechanisms. +.PP +These restrictions on root access limit the scope of root processes, enabling +most applications to run un-hindered, but preventing calls that might allow an +application to reach beyond the jail and influence other processes or +system-wide configuration. +.PP +.so implementation.ms +.so mgt.ms +.so future.ms +.NH +Conclusion +.PP +The jail facility provides FreeBSD with a conceptually simple security +partitioning mechanism, allowing the delegation of administrative rights +within virtual machine partitions. +.PP +The implementation relies on +restricting access within the jail environment to a well-defined subset +of the overall host environment. This includes limiting interaction +between processes, and to files, network resources, and privileged +operations. Administrative overhead is reduced through avoiding +fine-grained access control mechanisms, and maintaining a consistent +administrative interface across partitions and the host environment. +.PP +The jail facility has already seen widespread deployment, in particular as +a vehicle for delivering ``virtual private server'' services. +.PP +The jail code is included in the base system as part of FreeBSD 4.0-RELEASE, +and fully documented in the jail(2) and jail(8) man-pages. +.bp +.SH +Notes & References +.IP \s-2[BIBA]\s+2 .5i +K. J. Biba, Integrity Considerations for Secure +Computer Systems, USAF Electronic Systems Division, 1977 +.IP \s-2[CHROOT]\s+2 .5i +Dr.
Marshall Kirk McKusick, private communication: +``According to the SCCS logs, the chroot call was added by Bill Joy +on March 18, 1982 approximately 1.5 years before 4.2BSD was released. +That was well before we had ftp servers of any sort (ftp did not +show up in the source tree until January 1983). My best guess as +to its purpose was to allow Bill to chroot into the /4.2BSD build +directory and build a system using only the files, include files, +etc contained in that tree. That was the only use of chroot that +I remember from the early days.'' +.IP \s-2[LOTTERY1]\s+2 .5i +David Petrou and John Milford. Proportional-Share Scheduling: +Implementation and Evaluation in a Widely-Deployed Operating System, +December 1997. +.nf +\s-2\fChttp://www.cs.cmu.edu/~dpetrou/papers/freebsd_lottery_writeup98.ps\fP\s+2 +\s-2\fChttp://www.cs.cmu.edu/~dpetrou/code/freebsd_lottery_code.tar.gz\fP\s+2 +.IP \s-2[LOTTERY2]\s+2 .5i +Carl A. Waldspurger and William E. Weihl. Lottery Scheduling: Flexible Proportional-Share Resource Management, Proceedings of the First Symposium on Operating Systems Design and Implementation (OSDI '94), pages 1-11, Monterey, California, November 1994. +.nf +\s-2\fChttp://www.research.digital.com/SRC/personal/caw/papers.html\fP\s+2 +.IP \s-2[POSIX1e]\s+2 .5i +Draft Standard for Information Technology \(em +Portable Operating System Interface (POSIX) \(em +Part 1: System Application Program Interface (API) \(em Amendment: +Protection, Audit and Control Interfaces [C Language] +IEEE Std 1003.1e Draft 17 Editor Casey Schaufler +.IP \s-2[ROOT]\s+2 .5i +Historically other names have been used at times, Zilog for instance +called the super-user account ``zeus''. +.IP \s-2[UAS]\s+2 .5i +One such niche product is the ``UAS'' system to maintain and audit +RACF configurations on MVS systems. +.nf +\s-2\fChttp://www.entactinfo.com/products/uas/\fP\s+2 +.IP \s-2[UF]\s+2 .5i +Quote from the User-Friendly cartoon by Illiad.
+.nf +\s-2\fChttp://www.userfriendly.org/cartoons/archives/98nov/19981111.html\fP\s+2 diff --git a/share/doc/papers/kernmalloc/Makefile b/share/doc/papers/kernmalloc/Makefile new file mode 100644 index 0000000..e706f0a --- /dev/null +++ b/share/doc/papers/kernmalloc/Makefile @@ -0,0 +1,18 @@ +# From: @(#)Makefile 1.8 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= papers +DOC= kernmalloc +SRCS= kernmalloc.t appendix.ms +EXTRA= alloc.fig usage.tbl +MACROS= -ms +USE_EQN= +USE_PIC= +USE_SOELIM= +USE_TBL= +CLEANFILES= appendix.ms + +appendix.ms: appendix.t + ${GRIND} < ${.ALLSRC} > ${.TARGET} + +.include <bsd.doc.mk> diff --git a/share/doc/papers/kernmalloc/alloc.fig b/share/doc/papers/kernmalloc/alloc.fig new file mode 100644 index 0000000..1ef260b --- /dev/null +++ b/share/doc/papers/kernmalloc/alloc.fig @@ -0,0 +1,115 @@ +.\" Copyright (c) 1988 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)alloc.fig 5.1 (Berkeley) 4/16/91 +.\" +.PS +scale=100 +define m0 | +[ box invis ht 16 wid 32 with .sw at 0,0 +line from 4,12 to 4,4 +line from 8,12 to 8,4 +line from 12,12 to 12,4 +line from 16,12 to 16,4 +line from 20,12 to 20,4 +line from 24,12 to 24,4 +line from 28,12 to 28,4 +line from 0,16 to 0,0 +line from 0,8 to 32,8 +] | + +define m1 | +[ box invis ht 16 wid 32 with .sw at 0,0 +line from 8,12 to 8,4 +line from 16,12 to 16,4 +line from 24,12 to 24,4 +line from 0,8 to 32,8 +line from 0,16 to 0,0 +] | + +define m2 | +[ box invis ht 16 wid 32 with .sw at 0,0 +line from 0,8 to 32,8 +line from 0,16 to 0,0 +] | + +define m3 | +[ box invis ht 16 wid 31 with .sw at 0,0 +line from 15,12 to 15,4 +line from 0,8 to 31,8 +line from 0,16 to 0,0 +] | + +box invis ht 212 wid 580 with .sw at 0,0 +"\f1\s10\&kernel memory pages\f1\s0" at 168,204 +"\f1\s10\&Legend:\f1\s0" at 36,144 +"\f1\s10\&cont \- continuation of previous page\f1\s0" at 28,112 ljust +"\f1\s10\&free \- unused page\f1\s0" at 28,128 ljust +"\f1\s10\&Usage:\f1\s0" at 34,87 +"\f1\s10\&memsize(addr)\f1\s0" at 36,71 ljust +"\f1\s10\&char *addr;\f1\s0" at 66,56 ljust +"\f1\s10\&{\f1\s0" at 36,43 ljust 
+"\f1\s10\&return(kmemsizes[(addr \- kmembase) \- \s-1PAGESIZE\s+1]);\f1" at 66,29 ljust +"\f1\s10\&}\f1\s0" at 36,8 ljust +line from 548,192 to 548,176 +line from 548,184 to 580,184 dotted +"\f1\s10\&1024,\f1\s0" at 116,168 +"\f1\s10\&256,\f1\s0" at 148,168 +"\f1\s10\&512,\f1\s0" at 180,168 +"\f1\s10\&3072,\f1\s0" at 212,168 +"\f1\s10\&cont,\f1\s0" at 276,168 +"\f1\s10\&cont,\f1\s0" at 244,168 +"\f1\s10\&128,\f1\s0" at 308,168 +"\f1\s10\&128,\f1\s0" at 340,168 +"\f1\s10\&free,\f1\s0" at 372,168 +"\f1\s10\&cont,\f1\s0" at 404,168 +"\f1\s10\&128,\f1\s0" at 436,168 +"\f1\s10\&1024,\f1\s0" at 468,168 +"\f1\s10\&free,\f1\s0" at 500,168 +"\f1\s10\&cont,\f1\s0" at 532,168 +"\f1\s10\&cont,\f1\s0" at 564,168 +m2 with .nw at 100,192 +m1 with .nw at 132,192 +m3 with .nw at 164,192 +m2 with .nw at 196,192 +m2 with .nw at 228,192 +m2 with .nw at 260,192 +m0 with .nw at 292,192 +m0 with .nw at 324,192 +m2 with .nw at 356,192 +m2 with .nw at 388,192 +m0 with .nw at 420,192 +m2 with .nw at 452,192 +m2 with .nw at 484,192 +m2 with .nw at 516,192 +"\f1\s10\&kmemsizes[] = {\f1\s0" at 100,168 rjust +"\f1\s10\&char *kmembase\f1\s0" at 97,184 rjust +.PE diff --git a/share/doc/papers/kernmalloc/appendix.t b/share/doc/papers/kernmalloc/appendix.t new file mode 100644 index 0000000..bcd3e8c --- /dev/null +++ b/share/doc/papers/kernmalloc/appendix.t @@ -0,0 +1,137 @@ +.\" Copyright (c) 1988 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)appendix.t 5.1 (Berkeley) 4/16/91 +.\" +.bp +.H 1 "Appendix A - Implementation Details" +.LP +.nf +.vS +/* + * Constants for setting the parameters of the kernel memory allocator. + * + * 2 ** MINBUCKET is the smallest unit of memory that will be + * allocated. It must be at least large enough to hold a pointer. + * + * Units of memory less or equal to MAXALLOCSAVE will permanently + * allocate physical memory; requests for these size pieces of memory + * are quite fast. Allocations greater than MAXALLOCSAVE must + * always allocate and free physical memory; requests for these size + * allocations should be done infrequently as they will be slow. 
+ * Constraints: CLBYTES <= MAXALLOCSAVE <= 2 ** (MINBUCKET + 14) + * and MAXALLOCSIZE must be a power of two. + */ +#define MINBUCKET 4 /* 4 => min allocation of 16 bytes */ +#define MAXALLOCSAVE (2 * CLBYTES) + +/* + * Maximum amount of kernel dynamic memory. + * Constraints: must be a multiple of the pagesize. + */ +#define MAXKMEM (1024 * PAGESIZE) + +/* + * Arena for all kernel dynamic memory allocation. + * This arena is known to start on a page boundary. + */ +extern char kmembase[MAXKMEM]; + +/* + * Array of descriptors that describe the contents of each page + */ +struct kmemsizes { + short ks_indx; /* bucket index, size of small allocations */ + u_short ks_pagecnt; /* for large allocations, pages allocated */ +} kmemsizes[MAXKMEM / PAGESIZE]; + +/* + * Set of buckets for each size of memory block that is retained + */ +struct kmembuckets { + caddr_t kb_next; /* list of free blocks */ +} bucket[MINBUCKET + 16]; +.bp +/* + * Macro to convert a size to a bucket index. If the size is constant, + * this macro reduces to a compile time constant. + */ +#define MINALLOCSIZE (1 << MINBUCKET) +#define BUCKETINDX(size) \ + (size) <= (MINALLOCSIZE * 128) \ + ? (size) <= (MINALLOCSIZE * 8) \ + ? (size) <= (MINALLOCSIZE * 2) \ + ? (size) <= (MINALLOCSIZE * 1) \ + ? (MINBUCKET + 0) \ + : (MINBUCKET + 1) \ + : (size) <= (MINALLOCSIZE * 4) \ + ? (MINBUCKET + 2) \ + : (MINBUCKET + 3) \ + : (size) <= (MINALLOCSIZE* 32) \ + ? (size) <= (MINALLOCSIZE * 16) \ + ? (MINBUCKET + 4) \ + : (MINBUCKET + 5) \ + : (size) <= (MINALLOCSIZE * 64) \ + ? (MINBUCKET + 6) \ + : (MINBUCKET + 7) \ + : (size) <= (MINALLOCSIZE * 2048) \ + /* etc ... 
*/ + +/* + * Macro versions for the usual cases of malloc/free + */ +#define MALLOC(space, cast, size, flags) { \ + register struct kmembuckets *kbp = &bucket[BUCKETINDX(size)]; \ + long s = splimp(); \ + if (kbp->kb_next == NULL) { \ + (space) = (cast)malloc(size, flags); \ + } else { \ + (space) = (cast)kbp->kb_next; \ + kbp->kb_next = *(caddr_t *)(space); \ + } \ + splx(s); \ +} + +#define FREE(addr) { \ + register struct kmembuckets *kbp; \ + register struct kmemsizes *ksp = \ + &kmemsizes[((addr) - kmembase) / PAGESIZE]; \ + long s = splimp(); \ + if (1 << ksp->ks_indx > MAXALLOCSAVE) { \ + free(addr); \ + } else { \ + kbp = &bucket[ksp->ks_indx]; \ + *(caddr_t *)(addr) = kbp->kb_next; \ + kbp->kb_next = (caddr_t)(addr); \ + } \ + splx(s); \ +} +.vE diff --git a/share/doc/papers/kernmalloc/kernmalloc.t b/share/doc/papers/kernmalloc/kernmalloc.t new file mode 100644 index 0000000..d074c9e --- /dev/null +++ b/share/doc/papers/kernmalloc/kernmalloc.t @@ -0,0 +1,653 @@ +.\" Copyright (c) 1988 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. 
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)kernmalloc.t 5.1 (Berkeley) 4/16/91 +.\" $FreeBSD$ +.\" +.\" reference a system routine name +.de RN +\fI\\$1\fP\^(\h'1m/24u')\\$2 +.. +.\" reference a header name +.de H +.NH \\$1 +\\$2 +.. +.\" begin figure +.\" .FI "title" +.nr Fn 0 1 +.de FI +.ds Lb Figure \\n+(Fn +.ds Lt \\$1 +.KF +.DS B +.nf +.. +.\" +.\" end figure +.de Fe +.DE +.ce +\\*(Lb. \\*(Lt +.sp +.KE +.. +.EQ +delim $$ +.EN +.ds CH " +.pn 295 +.sp +.rs +.ps -1 +.sp -1 +.fi +Reprinted from: +\fIProceedings of the San Francisco USENIX Conference\fP, +pp. 295-303, June 1988. +.ps +.\".sp |\n(HMu +.rm CM +.nr PO 1.25i +.TL +Design of a General Purpose Memory Allocator for the 4.3BSD UNIX\(dg Kernel +.ds LF Summer USENIX '88 +.ds CF "% +.ds RF San Francisco, June 20-24 +.EH 'Design of a General Purpose Memory ...''McKusick, Karels' +.OH 'McKusick, Karels''Design of a General Purpose Memory ...' +.FS +\(dgUNIX is a registered trademark of AT&T in the US and other countries. 
+.FE +.AU +Marshall Kirk McKusick +.AU +Michael J. Karels +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, California 94720 +.AB +The 4.3BSD UNIX kernel uses many memory allocation mechanisms, +each designed for the particular needs of the utilizing subsystem. +This paper describes a general purpose dynamic memory allocator +that can be used by all of the kernel subsystems. +The design of this allocator takes advantage of known memory usage +patterns in the UNIX kernel and a hybrid strategy that is time-efficient +for small allocations and space-efficient for large allocations. +This allocator replaces the multiple memory allocation interfaces +with a single easy-to-program interface, +results in more efficient use of global memory by eliminating +partitioned and specialized memory pools, +and is quick enough that no performance loss is observed +relative to the current implementations. +The paper concludes with a discussion of our experience in using +the new memory allocator, +and directions for future work. +.AE +.LP +.H 1 "Kernel Memory Allocation in 4.3BSD +.PP +The 4.3BSD kernel has at least ten different memory allocators. +Some of them handle large blocks, +some of them handle small chained data structures, +and others include information to describe I/O operations. +Often the allocations are for small pieces of memory that are only +needed for the duration of a single system call. +In a user process such short-term +memory would be allocated on the run-time stack. +Because the kernel has a limited run-time stack, +it is not feasible to allocate even moderate blocks of memory on it. +Consequently, such memory must be allocated through a more dynamic mechanism. +For example, +when the system must translate a pathname, +it must allocate a one-kilobyte buffer to hold the name.
+Other blocks of memory must be more persistent than a single system call +and really have to be allocated from dynamic memory. +Examples include protocol control blocks that remain throughout +the duration of the network connection. +.PP +Demands for dynamic memory allocation in the kernel have increased +as more services have been added. +Each time a new type of memory allocation has been required, +a specialized memory allocation scheme has been written to handle it. +Often the new memory allocation scheme has been built on top +of an older allocator. +For example, the block device subsystem provides a crude form of +memory allocation through the allocation of empty buffers [Thompson78]. +The allocation is slow because of the implied semantics of +finding the oldest buffer, pushing its contents to disk if they are dirty, +and moving physical memory into or out of the buffer to create +the requested size. +To reduce the overhead, a ``new'' memory allocator was built in 4.3BSD +for name translation that allocates a pool of empty buffers. +It keeps them on a free list so they can +be quickly allocated and freed [McKusick85]. +.PP +This memory allocation method has several drawbacks. +First, the new allocator can only handle a limited range of sizes. +Second, it depletes the buffer pool, as it steals memory intended +to buffer disk blocks to other purposes. +Finally, it creates yet another interface of +which the programmer must be aware. +.PP +A generalized memory allocator is needed to reduce the complexity +of writing code inside the kernel. +Rather than providing many semi-specialized ways of allocating memory, +the kernel should provide a single general purpose allocator. +With only a single interface, +programmers do not need to figure +out the most appropriate way to allocate memory. +If a good general purpose allocator is available, +it helps avoid the syndrome of creating yet another special +purpose allocator. 
+.PP +To ease the task of understanding how to use it, +the memory allocator should have an interface similar to the interface +of the well-known memory allocator provided for +applications programmers through the C library routines +.RN malloc +and +.RN free . +Like the C library interface, +the allocation routine should take a parameter specifying the +size of memory that is needed. +The range of sizes for memory requests should not be constrained. +The free routine should take a pointer to the storage being freed, +and should not require additional information such as the size +of the piece of memory being freed. +.H 1 "Criteria for a Kernel Memory Allocator +.PP +The design specification for a kernel memory allocator is similar to, +but not identical to, +the design criteria for a user level memory allocator. +The first criterion for a memory allocator is that it make good use +of the physical memory. +Good use of memory is measured by the amount of memory needed to hold +a set of allocations at any point in time. +Percentage utilization is expressed as: +.ie t \{\ +.EQ +utilization~=~requested over required +.EN +.\} +.el \{\ +.sp +.ce +\fIutilization\fP=\fIrequested\fP/\fIrequired\fP +.sp +.\} +Here, ``requested'' is the sum of the memory that has been requested +and not yet freed. +``Required'' is the amount of memory that has been +allocated for the pool from which the requests are filled. +An allocator requires more memory than requested because of fragmentation +and a need to have a ready supply of free memory for future requests. +A perfect memory allocator would have a utilization of 100%. +In practice, +having a 50% utilization is considered good [Korn85]. +.PP +Good memory utilization in the kernel is more important than +in user processes. +Because user processes run in virtual memory, +unused parts of their address space can be paged out. 
+Thus pages in the process address space +that are part of the ``required'' pool that are not +being ``requested'' need not tie up physical memory. +Because the kernel is not paged, +all pages in the ``required'' pool are held by the kernel and +cannot be used for other purposes. +To keep the kernel utilization percentage as high as possible, +it is desirable to release unused memory in the ``required'' pool +rather than to hold it as is typically done with user processes. +Because the kernel can directly manipulate its own page maps, +releasing unused memory is fast; +a user process must do a system call to release memory. +.PP +The most important criterion for a memory allocator is that it be fast. +Because memory allocation is done frequently, +a slow memory allocator will degrade the system performance. +Speed of allocation is more critical when executing in the +kernel than in user code, +because the kernel must allocate many data structures that user +processes can allocate cheaply on their run-time stack. +In addition, the kernel represents the platform on which all user +processes run, +and if it is slow, it will degrade the performance of every process +that is running. +.PP +Another problem with a slow memory allocator is that programmers +of frequently-used kernel interfaces will feel that they +cannot afford to use it as their primary memory allocator. +Instead they will build their own memory allocator on top of the +original by maintaining their own pool of memory blocks. +Multiple allocators reduce the efficiency with which memory is used. +The kernel ends up with many different free lists of memory +instead of a single free list from which all allocation can be drawn. +For example, +consider the case of two subsystems that need memory. +If they have their own free lists, +the amount of memory tied up in the two lists will be the +sum of the greatest amount of memory that each of +the two subsystems has ever used.
+If they share a free list, +the amount of memory tied up in the free list may be as low as the +greatest amount of memory that either subsystem used. +As the number of subsystems grows, +the savings from having a single free list grow. +.H 1 "Existing User-level Implementations +.PP +There are many different algorithms and +implementations of user-level memory allocators. +A survey of those available on UNIX systems appeared in [Korn85]. +Nearly all of the memory allocators tested made good use of memory, +though most of them were too slow for use in the kernel. +The fastest memory allocator in the survey by nearly a factor of two +was the memory allocator provided on 4.2BSD originally +written by Chris Kingsley at California Institute of Technology. +Unfortunately, +the 4.2BSD memory allocator also wasted twice as much memory +as its nearest competitor in the survey. +.PP +The 4.2BSD user-level memory allocator works by maintaining a set of lists +that are ordered by increasing powers of two. +Each list contains a set of memory blocks of its corresponding size. +To fulfill a memory request, +the size of the request is rounded up to the next power of two. +A piece of memory is then removed from the list corresponding +to the specified power of two and returned to the requester. +Thus, a request for a block of memory of size 53 returns +a block from the 64-sized list. +A typical memory allocation requires a roundup calculation +followed by a linked list removal. +Only if the list is empty is a real memory allocation done. +The free operation is also fast; +the block of memory is put back onto the list from which it came. +The correct list is identified by a size indicator stored +immediately preceding the memory block. +.H 1 "Considerations Unique to a Kernel Allocator +.PP +There are several special conditions that arise when writing a +memory allocator for the kernel that do not apply to a user process +memory allocator. 
+First, the maximum memory allocation can be determined at +the time that the machine is booted. +This number is never more than the amount of physical memory on the machine, +and is typically much less since a machine with all its +memory dedicated to the operating system is uninteresting to use. +Thus, the kernel can statically allocate a set of data structures +to manage its dynamically allocated memory. +These data structures never need to be +expanded to accommodate memory requests; +yet, if properly designed, they need not be large. +For a user process, the maximum amount of memory that may be allocated +is a function of the maximum size of its virtual memory. +Although it could allocate static data structures to manage +its entire virtual memory, +even if they were efficiently encoded they would potentially be huge. +The other alternative is to allocate data structures as they are needed. +However, that adds extra complications such as new +failure modes if it cannot allocate space for additional +structures and additional mechanisms to link them all together. +.PP +Another special condition of the kernel memory allocator is that it +can control its own address space. +Unlike user processes that can only grow and shrink their heap at one end, +the kernel can keep an arena of kernel addresses and allocate +pieces from that arena which it then populates with physical memory. +The effect is much the same as a user process that has parts of +its address space paged out when they are not in use, +except that the kernel can explicitly control the set of pages +allocated to its address space. +The result is that the ``working set'' of pages in use by the +kernel exactly corresponds to the set of pages that it is really using. +.FI "One day memory usage on a Berkeley time-sharing machine" +.so usage.tbl +.Fe +.PP +A final special condition that applies to the kernel is that +all of the different uses of dynamic memory are known in advance. 
+Each one of these uses of dynamic memory can be assigned a type. +For each type of dynamic memory that is allocated, +the kernel can provide allocation limits. +One reason given for having separate allocators is that +no single allocator could starve the rest of the kernel of all +its available memory and thus a single runaway +client could not paralyze the system. +By putting limits on each type of memory, +the single general purpose memory allocator can provide the same +protection against memory starvation.\(dg +.FS +\(dgOne might seriously ask the question what good it is if ``only'' +one subsystem within the kernel hangs if it is something like the +network on a diskless workstation. +.FE +.PP +\*(Lb shows the memory usage of the kernel over a one day period +on a general timesharing machine at Berkeley. +The ``In Use'', ``Free'', and ``Mem Use'' fields are instantaneous values; +the ``Requests'' field is the number of allocations since system startup; +the ``High Use'' field is the maximum value of +the ``Mem Use'' field since system startup. +The figure demonstrates that most +allocations are for small objects. +Large allocations occur infrequently, +and are typically for long-lived objects +such as buffers to hold the superblock for +a mounted file system. +Thus, a memory allocator only needs to be +fast for small pieces of memory. +.H 1 "Implementation of the Kernel Memory Allocator +.PP +In reviewing the available memory allocators, +none of their strategies could be used without some modification. +The kernel memory allocator that we ended up with is a hybrid +of the fast memory allocator found in the 4.2BSD C library +and a slower but more-memory-efficient first-fit allocator. +.PP +Small allocations are done using the 4.2BSD power-of-two list strategy; +the typical allocation requires only a computation of +the list to use and the removal of an element if it is available, +so it is quite fast. 
+Macros are provided to avoid the cost of a subroutine call. +Only if the request cannot be fulfilled from a list is a call +made to the allocator itself. +To ensure that the allocator is always called for large requests, +the lists corresponding to large allocations are always empty. +Appendix A shows the data structures and implementation of the macros. +.PP +Similarly, freeing a block of memory can be done with a macro. +The macro computes the list on which to place the request +and puts it there. +The free routine is called only if the block of memory is +considered to be a large allocation. +Including the cost of blocking out interrupts, +the allocation and freeing macros generate respectively +only nine and sixteen (simple) VAX instructions. +.PP +Because of the inefficiency of power-of-two allocation strategies +for large allocations, +a different strategy is used for allocations larger than two kilobytes. +The selection of two kilobytes is derived from our statistics on +the utilization of memory within the kernel, +that showed that 95 to 98% of allocations are of size one kilobyte or less. +A frequent caller of the memory allocator +(the name translation function) +always requests a one kilobyte block. +Additionally the allocation method for large blocks is based on allocating +pieces of memory in multiples of pages. +Consequently the actual allocation size for requests of size +$2~times~pagesize$ or less are identical.\(dg +.FS +\(dgTo understand why this number is $size 8 {2~times~pagesize}$ one +observes that the power-of-two algorithm yields sizes of 1, 2, 4, 8, \&... +pages while the large block algorithm that allocates in multiples +of pages yields sizes of 1, 2, 3, 4, \&... pages. 
+Thus for allocations of sizes between one and two pages +both algorithms use two pages; +it is not until allocations of sizes between two and three pages +that a difference emerges where the power-of-two algorithm will use +four pages while the large block algorithm will use three pages. +.FE +In 4.3BSD on the VAX, the (software) page size is one kilobyte, +so two kilobytes is the smallest logical cutoff. +.PP +Large allocations are first rounded up to be a multiple of the page size. +The allocator then uses a first-fit algorithm to find space in the +kernel address arena set aside for dynamic allocations. +Thus a request for a five kilobyte piece of memory will use exactly +five pages of memory rather than eight kilobytes as with +the power-of-two allocation strategy. +When a large piece of memory is freed, +the memory pages are returned to the free memory pool, +and the address space is returned to the kernel address arena +where it is coalesced with adjacent free pieces. +.PP +Another technique to improve both the efficiency of memory utilization +and the speed of allocation +is to cluster same-sized small allocations on a page. +When a list for a power-of-two allocation is empty, +a new page is allocated and divided into pieces of the needed size. +This strategy speeds future allocations as several pieces of memory +become available as a result of the call into the allocator. +.PP +.FI "Calculation of allocation size" +.so alloc.fig +.Fe +Because the size is not specified when a block of memory is freed, +the allocator must keep track of the sizes of the pieces it has handed out. +The 4.2BSD user-level allocator stores the size of each block +in a header just before the allocation. +However, this strategy doubles the memory requirement for allocations that +require a power-of-two-sized block. +Therefore, +instead of storing the size of each piece of memory with the piece itself, +the size information is associated with the memory page. 
+\*(Lb shows how the kernel determines +the size of a piece of memory that is being freed, +by calculating the page in which it resides, +and looking up the size associated with that page. +Eliminating the cost of the overhead per piece improved utilization +far more than expected. +The reason is that many allocations in the kernel are for blocks of +memory whose size is exactly a power of two. +These requests would be nearly doubled if the user-level strategy were used. +Now they can be accommodated with no wasted memory. +.PP +The allocator can be called both from the top half of the kernel, +which is willing to wait for memory to become available, +and from the interrupt routines in the bottom half of the kernel +that cannot wait for memory to become available. +Clients indicate their willingness (and ability) to wait with a flag +to the allocation routine. +For clients that are willing to wait, +the allocator guarantees that their request will succeed. +Thus, these clients need not check the return value from the allocator. +If memory is unavailable and the client cannot wait, +the allocator returns a null pointer. +These clients must be prepared to cope with this +(hopefully infrequent) condition +(usually by giving up and hoping to do better later). +.H 1 "Results of the Implementation +.PP +The new memory allocator was written about a year ago. +Conversion from the old memory allocators to the new allocator +has been going on ever since. +Many of the special purpose allocators have been eliminated. +This list includes +.RN calloc , +.RN wmemall , +and +.RN zmemall . +Many of the special purpose memory allocators built on +top of other allocators have also been eliminated. +For example, the allocator that was built on top of the buffer pool allocator +.RN geteblk +to allocate pathname buffers in +.RN namei +has been eliminated. +Because the typical allocation is so fast, +we have found that none of the special purpose pools are needed.
+Indeed, the allocation is about the same as the previous cost of +allocating buffers from the network pool (\fImbuf\fP\^s). +Consequently applications that used to allocate network +buffers for their own uses have been switched over to using +the general purpose allocator without increasing their running time. +.PP +Quantifying the performance of the allocator is difficult because +it is hard to measure the amount of time spent allocating +and freeing memory in the kernel. +The usual approach is to compile a kernel for profiling +and then compare the running time of the routines that +implemented the old abstraction versus those that implement the new one. +The old routines are difficult to quantify because +individual routines were used for more than one purpose. +For example, the +.RN geteblk +routine was used both to allocate one kilobyte memory blocks +and for its intended purpose of providing buffers to the filesystem. +Differentiating these uses is often difficult. +To get a measure of the cost of memory allocation before +putting in our new allocator, +we summed up the running time of all the routines whose +exclusive task was memory allocation. +To this total we added the fraction +of the running time of the multi-purpose routines that could +clearly be identified as memory allocation usage. +This number showed that approximately three percent of +the time spent in the kernel could be accounted to memory allocation. +.PP +The new allocator is difficult to measure +because the usual case of the memory allocator is implemented as a macro. +Thus, its running time is a small fraction of the running time of the +numerous routines in the kernel that use it. +To get a bound on the cost, +we changed the macro always to call the memory allocation routine. +Running in this mode, the memory allocator accounted for six percent +of the time spent in the kernel. 
+Factoring out the cost of the statistics collection and the +subroutine call overhead for the cases that could +normally be handled by the macro, +we estimate that the allocator would account for +at most four percent of time in the kernel. +These measurements show that the new allocator does not introduce +significant new run-time costs. +.PP +The other major success has been in keeping the size information +on a per-page basis. +This technique allows the most frequently requested sizes to be +allocated without waste. +It also reduces the amount of bookkeeping information associated +with the allocator to four kilobytes of information +per megabyte of memory under management (with a one kilobyte page size). +.H 1 "Future Work +.PP +Our next project is to convert many of the static +kernel tables to be dynamically allocated. +Static tables include the process table, the file table, +and the mount table. +Making these tables dynamic will have two benefits. +First, it will reduce the amount of memory +that must be statically allocated at boot time. +Second, it will eliminate the arbitrary upper limit imposed +by the current static sizing +(although a limit will be retained to constrain runaway clients). +Other researchers have already shown the memory savings +achieved by this conversion [Rodriguez88]. +.PP +Under the current implementation, +memory is never moved from one size list to another. +With the 4.2BSD memory allocator this causes problems, +particularly for large allocations where a process may use +a quarter megabyte piece of memory once, +which is then never available for any other size request. +In our hybrid scheme, +memory can be shuffled between large requests so that large blocks +of memory are never stranded as they are with the 4.2BSD allocator. +However, pages allocated to small requests are allocated once +to a particular size and never changed thereafter. 
+If a burst of requests came in for a particular size, +that size would acquire a large amount of memory +that would then not be available for other future requests. +.PP +In practice, we do not find that the free lists become too large. +However, we have been investigating ways to handle such problems +if they occur in the future. +Our current investigations involve a routine +that can run as part of the idle loop that would sort the elements +on each of the free lists into order of increasing address. +Since any given page has only one size of elements allocated from it, +the effect of the sorting would be to sort the list into distinct pages. +When all the pieces of a page became free, +the page itself could be released back to the free pool so that +it could be allocated to another purpose. +Although there is no guarantee that all the pieces of a page would ever +be freed, +most allocations are short-lived, lasting only for the duration of +an open file descriptor, an open network connection, or a system call. +As new allocations would be made from the page sorted to +the front of the list, +return of elements from pages at the back would eventually +allow pages later in the list to be freed. +.PP +Two of the traditional UNIX +memory allocators remain in the current system. +The terminal subsystem uses \fIclist\fP\^s (character lists). +That part of the system is expected to undergo major revision within +the next year or so, and it will probably be changed to use +\fImbuf\fP\^s as it is merged into the network system. +The other major allocator that remains is +.RN getblk , +the routine that manages the filesystem buffer pool memory +and associated control information. +Only the filesystem uses +.RN getblk +in the current system; +it manages the constant-sized buffer pool. +We plan to merge the filesystem buffer cache into the virtual memory system's +page cache in the future. 
+This change will allow the size of the buffer pool to be changed +according to memory load, +but will require a policy for balancing memory needs +with filesystem cache performance. +.H 1 "Acknowledgments +.PP +In the spirit of community support, +we have made various versions of our allocator available to our test sites. +They have been busily burning it in and giving +us feedback on their experiences. +We acknowledge their invaluable input. +The feedback from the Usenix program committee on the initial draft of +our paper suggested numerous important improvements. +.H 1 "References +.LP +.IP Korn85 \w'Rodriguez88\0\0'u +David Korn, Kiem-Phong Vo, +``In Search of a Better Malloc'' +\fIProceedings of the Portland Usenix Conference\fP, +pp 489-506, June 1985. +.IP McKusick85 +M. McKusick, M. Karels, S. Leffler, +``Performance Improvements and Functional Enhancements in 4.3BSD'' +\fIProceedings of the Portland Usenix Conference\fP, +pp 519-531, June 1985. +.IP Rodriguez88 +Robert Rodriguez, Matt Koehler, Larry Palmer, Ricky Palmer, +``A Dynamic UNIX Operating System'' +\fIProceedings of the San Francisco Usenix Conference\fP, +June 1988. +.IP Thompson78 +Ken Thompson, +``UNIX Implementation'' +\fIBell System Technical Journal\fP, volume 57, number 6, +pp 1931-1946, 1978. 
diff --git a/share/doc/papers/kernmalloc/spell.ok b/share/doc/papers/kernmalloc/spell.ok new file mode 100644 index 0000000..10c3ab7 --- /dev/null +++ b/share/doc/papers/kernmalloc/spell.ok @@ -0,0 +1,57 @@ +BUCKETINDX +CLBYTES +CM +Karels +Kiem +Koehler +Korn +Korn85 +MAXALLOCSAVE +MAXALLOCSIZE +MAXKMEM +MINALLOCSIZE +MINBUCKET +Matt +McKusick +McKusick85 +Mem +Phong +Ricky +Rodriguez88 +S.Leffler +Thompson78 +ULTRIX +Usenix +VAX +Vo +arptbl +caddr +devbuf +extern +fragtbl +freelist +geteblk +indx +ioctlops +kb +kbp +kmembase +kmembuckets +kmemsizes +ks +ksp +mbuf +mbufs +namei +pagecnt +pathname +pcb +pp +routetbl +runtime +splimp +splx +superblk +temp +wmemall +zmemall diff --git a/share/doc/papers/kernmalloc/usage.tbl b/share/doc/papers/kernmalloc/usage.tbl new file mode 100644 index 0000000..c5ebdfe --- /dev/null +++ b/share/doc/papers/kernmalloc/usage.tbl @@ -0,0 +1,75 @@ +.\" Copyright (c) 1988 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)usage.tbl 5.1 (Berkeley) 4/16/91 +.\" +.TS +box; +c s s s +c c c c +n n n n. +Memory statistics by bucket size += +Size In Use Free Requests +_ +128 329 39 3129219 +256 0 0 0 +512 4 0 16 +1024 17 5 648771 +2048 13 0 13 +2049\-4096 0 0 157 +4097\-8192 2 0 103 +8193\-16384 0 0 0 +16385\-32768 1 0 1 +.TE +.DE +.DS B +.TS +box; +c s s s s +c c c c c +c n n n n. +Memory statistics by type += +Type In Use Mem Use High Use Requests +_ +mbuf 6 1K 17K 3099066 +devbuf 13 53K 53K 13 +socket 37 5K 6K 1275 +pcb 55 7K 8K 1512 +routetbl 229 29K 29K 2424 +fragtbl 0 0K 1K 404 +zombie 3 1K 1K 24538 +namei 0 0K 5K 648754 +ioctlops 0 0K 1K 12 +superblk 24 34K 34K 24 +temp 0 0K 8K 258 +.TE diff --git a/share/doc/papers/kerntune/0.t b/share/doc/papers/kerntune/0.t new file mode 100644 index 0000000..90fa2bf --- /dev/null +++ b/share/doc/papers/kerntune/0.t @@ -0,0 +1,129 @@ +.\" Copyright (c) 1984 M. K. McKusick +.\" Copyright (c) 1984 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. 
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)0.t 1.2 (Berkeley) 11/8/90 +.\" +.EQ +delim $$ +.EN +.if n .ND +.TL +Using gprof to Tune the 4.2BSD Kernel +.AU +Marshall Kirk McKusick +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, California 94720 +.AB +This paper describes how the \fIgprof\fP profiler +accounts for the running time of called routines +in the running time of the routines that call them. +It then explains how to configure a profiling kernel on +the 4.2 Berkeley Software Distribution of +.UX +for the VAX\(dd +.FS +\(dd VAX is a trademark of Digital Equipment Corporation. +.FE +and discusses tradeoffs in techniques for collecting +profile data. +\fIGprof\fP identifies problems +that severely affect the overall performance of the kernel. +Once a potential problem area is identified, +benchmark programs are devised to highlight the bottleneck. +These benchmarks verify that the problem exists and provide +a metric against which to validate proposed solutions. +Two caches are added to the kernel to alleviate the bottleneck +and \fIgprof\fP is used to validate their effectiveness. +.AE +.LP +.de PT +.lt \\n(LLu +.pc % +.nr PN \\n% +.tl '\\*(LH'\\*(CH'\\*(RH' +.lt \\n(.lu +.. +.af PN i +.ds LH 4.2BSD Performance +.ds RH Contents +.bp 1 +.if t .ds CF May 21, 1984 +.if t .ds LF +.if t .ds RF McKusick +.ce +.B "TABLE OF CONTENTS" +.LP +.sp 1 +.nf +.B "1. Introduction" +.LP +.sp .5v +.nf +.B "2. The \fIgprof\fP Profiler" +\0.1. Data Presentation" +\0.1.1. The Flat Profile +\0.1.2. The Call Graph Profile +\0.2 Profiling the Kernel +.LP +.sp .5v +.nf +.B "3. Using \fIgprof\fP to Improve Performance +\0.1. Using the Profiler +\0.2. An Example of Tuning +.LP +.sp .5v +.nf +.B "4. 
Conclusions" +.LP +.sp .5v +.nf +.B Acknowledgements +.LP +.sp .5v +.nf +.B References +.af PN 1 +.bp 1 +.de _d +.if t .ta .6i 2.1i 2.6i +.\" 2.94 went to 2.6, 3.64 to 3.30 +.if n .ta .84i 2.6i 3.30i +.. +.de _f +.if t .ta .5i 1.25i 2.5i +.\" 3.5i went to 3.8i +.if n .ta .7i 1.75i 3.8i +.. diff --git a/share/doc/papers/kerntune/1.t b/share/doc/papers/kerntune/1.t new file mode 100644 index 0000000..49b653f5 --- /dev/null +++ b/share/doc/papers/kerntune/1.t @@ -0,0 +1,49 @@ +.\" Copyright (c) 1984 M. K. McKusick +.\" Copyright (c) 1984 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.t 1.2 (Berkeley) 11/8/90 +.\" $FreeBSD$ +.\" +.ds RH Introduction +.NH 1 +Introduction +.PP +The purpose of this paper is to describe the tools and techniques +that are available for improving the performance of the kernel. +The primary tool used to measure the kernel is the hierarchical +profiler \fIgprof\fP. +The profiler enables the user to measure the cost of +the abstractions that the kernel provides to the user. +Once the expensive abstractions are identified, +optimizations are postulated to help improve their performance. +These optimizations are each individually +verified to insure that they are producing a measurable improvement. diff --git a/share/doc/papers/kerntune/2.t b/share/doc/papers/kerntune/2.t new file mode 100644 index 0000000..2857dc2 --- /dev/null +++ b/share/doc/papers/kerntune/2.t @@ -0,0 +1,234 @@ +.\" Copyright (c) 1984 M. K. McKusick +.\" Copyright (c) 1984 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)2.t 1.3 (Berkeley) 11/8/90 +.\" +.ds RH The \fIgprof\fP Profiler +.NH 1 +The \fIgprof\fP Profiler +.PP +The purpose of the \fIgprof\fP profiling tool is to +help the user evaluate alternative implementations +of abstractions. +The \fIgprof\fP design takes advantage of the fact that the kernel, +though large, is structured and hierarchical. +We provide a profile in which the execution time +for a set of routines that implement an +abstraction is collected and charged +to that abstraction.
+The profile can be used to compare and assess the costs of +various implementations [Graham82] [Graham83]. +.NH 2 +Data presentation +.PP +The data is presented to the user in two different formats. +The first presentation simply lists the routines +without regard to the amount of time their descendants use. +The second presentation incorporates the call graph of the +kernel. +.NH 3 +The Flat Profile +.PP +The flat profile consists of a list of all the routines +that are called during execution of the kernel, +with the count of the number of times they are called +and the number of seconds of execution time for which they +are themselves accountable. +The routines are listed in decreasing order of execution time. +A list of the routines that are never called during execution of +the kernel is also available +to verify that nothing important is omitted by +this profiling run. +The flat profile gives a quick overview of the routines that are used, +and shows the routines that are themselves responsible +for large fractions of the execution time. +In practice, +this profile usually shows that no single function +is overwhelmingly responsible for +the total time of the kernel. +Notice that for this profile, +the individual times sum to the total execution time. +.NH 3 +The Call Graph Profile +.PP +Ideally, we would like to print the call graph of the kernel, +but we are limited by the two-dimensional nature of our output +devices. +We cannot assume that a call graph is planar, +and even if it is, that we can print a planar version of it. +Instead, we choose to list each routine, +together with information about +the routines that are its direct parents and children. +This listing presents a window into the call graph. +Based on our experience, +both parent information and child information +are important, +and should be available without searching +through the output. +Figure 1 shows a sample \fIgprof\fP entry.
+.KF +.DS L +.TS +box center; +c c c c c l l +c c c c c l l +c c c c c l l +l n n n c l l. + called/total \ \ parents +index %time self descendants called+self name index + called/total \ \ children +_ + 0.20 1.20 4/10 \ \ \s-1CALLER1\s+1 [7] + 0.30 1.80 6/10 \ \ \s-1CALLER2\s+1 [1] +[2] 41.5 0.50 3.00 10+4 \s-1EXAMPLE\s+1 [2] + 1.50 1.00 20/40 \ \ \s-1SUB1\s+1 <cycle1> [4] + 0.00 0.50 1/5 \ \ \s-1SUB2\s+1 [9] + 0.00 0.00 0/5 \ \ \s-1SUB3\s+1 [11] +.TE +.ce +Figure 1. Profile entry for \s-1EXAMPLE\s+1. +.DE +.KE +.PP +The major entries of the call graph profile are the entries from the +flat profile, augmented by the time propagated to each +routine from its descendants. +This profile is sorted by the sum of the time for the routine +itself plus the time inherited from its descendants. +The profile shows which of the higher level routines +spend large portions of the total execution time +in the routines that they call. +For each routine, we show the amount of time passed by each child +to the routine, which includes time for the child itself +and for the descendants of the child +(and thus the descendants of the routine). +We also show the percentage these times represent of the total time +accounted to the child. +Similarly, the parents of each routine are listed, +along with time, +and percentage of total routine time, +propagated to each one. +.PP +Cycles are handled as single entities. +The cycle as a whole is shown as though it were a single routine, +except that members of the cycle are listed in place of the children. +Although the number of calls of each member +from within the cycle is shown, +it does not affect time propagation. +When a child is a member of a cycle, +the time shown is the appropriate fraction of the time +for the whole cycle. +Self-recursive routines have their calls broken +down into calls from the outside and self-recursive calls. +Only the outside calls affect the propagation of time.
+.PP +The example shown in Figure 2 is the fragment of a call graph +corresponding to the entry in the call graph profile listing +shown in Figure 1. +.KF +.DS L +.so fig2.pic +.ce +Figure 2. Example call graph fragment. +.DE +.KE +.PP +The entry is for routine \s-1EXAMPLE\s+1, which has +the Caller routines as its parents, +and the Sub routines as its children. +The reader should keep in mind that all information +is given \fIwith respect to \s-1EXAMPLE\s+1\fP. +The index in the first column shows that \s-1EXAMPLE\s+1 +is the second entry in the profile listing. +The \s-1EXAMPLE\s+1 routine is called ten times, four times by \s-1CALLER1\s+1, +and six times by \s-1CALLER2\s+1. +Consequently 40% of \s-1EXAMPLE\s+1's time is propagated to \s-1CALLER1\s+1, +and 60% of \s-1EXAMPLE\s+1's time is propagated to \s-1CALLER2\s+1. +The self and descendant fields of the parents +show the amount of self and descendant time \s-1EXAMPLE\s+1 +propagates to them (but not the time used by +the parents directly). +Note that \s-1EXAMPLE\s+1 calls itself recursively four times. +The routine \s-1EXAMPLE\s+1 calls routine \s-1SUB1\s+1 twenty times, \s-1SUB2\s+1 once, +and never calls \s-1SUB3\s+1. +Since \s-1SUB2\s+1 is called a total of five times, +20% of its self and descendant time is propagated to \s-1EXAMPLE\s+1's +descendant time field. +Because \s-1SUB1\s+1 is a member of \fIcycle 1\fR, +the self and descendant times +and call count fraction +are those for the cycle as a whole. +Since cycle 1 is called a total of forty times +(not counting calls among members of the cycle), +it propagates 50% of the cycle's self and descendant +time to \s-1EXAMPLE\s+1's descendant time field. +Finally each name is followed by an index that shows +where on the listing to find the entry for that routine. 
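The arithmetic behind the entry in Figure 1 can be checked with a short sketch. The Python below is illustrative only and is not taken from gprof; the helper name propagate and the layout of the children table are assumptions made for this example. It charges a routine's self and descendant time to each parent in proportion to the calls made from outside the routine, which is the rule described above.

```python
def propagate(self_time, desc_time, calls_from_parent, outside_calls):
    """Share of a routine's (self, descendant) time charged to one parent.

    Hypothetical helper, not gprof code: time is split in proportion to
    call counts, counting only calls from outside the routine (or cycle).
    """
    if outside_calls == 0:
        return (0.0, 0.0)
    return (self_time * calls_from_parent / outside_calls,
            desc_time * calls_from_parent / outside_calls)


# Values read from Figure 1. EXAMPLE is called 10+4 times, so only the
# 10 outside calls count; the 4 self-recursive calls are ignored.
EXAMPLE_SELF, EXAMPLE_DESC, EXAMPLE_CALLS = 0.50, 3.00, 10

to_caller1 = propagate(EXAMPLE_SELF, EXAMPLE_DESC, 4, EXAMPLE_CALLS)
to_caller2 = propagate(EXAMPLE_SELF, EXAMPLE_DESC, 6, EXAMPLE_CALLS)

# The child lines of Figure 1 already hold the propagated shares, so
# EXAMPLE's descendant time is the sum of their self and descendant columns.
children = {"SUB1": (1.50, 1.00), "SUB2": (0.00, 0.50), "SUB3": (0.00, 0.00)}
descendant_time = sum(s + d for s, d in children.values())

print("to CALLER1: %.2f self, %.2f descendants" % to_caller1)
print("to CALLER2: %.2f self, %.2f descendants" % to_caller2)
print("EXAMPLE descendants: %.2f" % descendant_time)
```

The shares agree with the parent lines of Figure 1 (0.20 and 1.20 for CALLER1, 0.30 and 1.80 for CALLER2), and the three child lines sum to the 3.00 seconds of descendant time shown for EXAMPLE.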
+.NH 2 +Profiling the Kernel +.PP +It is simple to build a 4.2BSD kernel that will automatically +collect profiling information as it operates by specifying the +.B \-p +option to \fIconfig\fP\|(8) when configuring a kernel. +The program counter sampling can be driven by the system clock, +or by an alternate real time clock. +The latter is highly recommended as use of the system clock results +in statistical anomalies in accounting for +the time spent in the kernel clock routine. +.PP +Once a profiling system has been booted, statistics gathering is +handled by \fIkgmon\fP\|(8). +\fIKgmon\fP allows profiling to be started and stopped +and the internal state of the profiling buffers to be dumped. +\fIKgmon\fP can also be used to reset the state of the internal +buffers to allow multiple experiments to be run without +rebooting the machine. +The profiling data can then be processed with \fIgprof\fP\|(1) +to obtain information regarding the system's operation. +.PP +A profiled system is about 5-10% larger in its text space because of +the calls to count the subroutine invocations. +When the system executes, +the profiling data is stored in a buffer that is 1.2 +times the size of the text space. +All the information is summarized in memory; +it is not necessary to have a trace file +being continuously dumped to disk. +The overhead for running a profiled system varies; +under normal load we see anywhere from 5-25% +of the system time spent in the profiling code. +Thus the system is noticeably slower than an unprofiled system, +yet is not so bad that it cannot be used in a production environment. +This is important since it allows us to gather data +in a real environment rather than trying to +devise synthetic work loads. diff --git a/share/doc/papers/kerntune/3.t b/share/doc/papers/kerntune/3.t new file mode 100644 index 0000000..e03236b --- /dev/null +++ b/share/doc/papers/kerntune/3.t @@ -0,0 +1,290 @@ +.\" Copyright (c) 1984 M. K.
McKusick +.\" Copyright (c) 1984 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)3.t 1.2 (Berkeley) 11/8/90 +.\" +.ds RH Techniques for Improving Performance +.NH 1 +Techniques for Improving Performance +.PP +This section gives several hints on general optimization techniques. +It then proceeds with an example of how they can be +applied to the 4.2BSD kernel to improve its performance. +.NH 2 +Using the Profiler +.PP +The profiler is a useful tool for improving +a set of routines that implement an abstraction. +It can be helpful in identifying poorly coded routines, +and in evaluating the new algorithms and code that replace them. +Taking full advantage of the profiler +requires a careful examination of the call graph profile, +and a thorough knowledge of the abstractions underlying +the kernel. +.PP +The easiest optimization that can be performed +is a small change +to a control construct or data structure. +An obvious starting point +is to expand a small frequently called routine inline. +The drawback to inline expansion is that the data abstractions +in the kernel may become less parameterized, +hence less clearly defined. +The profiling will also become less useful since the loss of +routines will make its output more granular. +.PP +Further potential for optimization lies in routines that +implement data abstractions whose total execution +time is long. +If the data abstraction function cannot easily be speeded up, +it may be advantageous to cache its results, +and eliminate the need to rerun +it for identical inputs. +These and other ideas for program improvement are discussed in +[Bentley81]. +.PP +This tool is best used in an iterative approach: +profiling the kernel, +eliminating one bottleneck, +then finding some other part of the kernel +that begins to dominate execution time. +.PP +A completely different use of the profiler is to analyze the control +flow of an unfamiliar section of the kernel. 
+By running an example that exercises the unfamiliar section of the kernel, +and then using \fIgprof\fR, you can get a view of the +control structure of the unfamiliar section. +.NH 2 +An Example of Tuning +.PP +The first step is to come up with a method for generating +profile data. +We prefer to run a profiling system for about a one day +period on one of our general timesharing machines. +While this is not as reproducible as a synthetic workload, +it certainly represents a realistic test. +We have run one day profiles on several +occasions over a three month period. +Despite the long period of time that elapsed +between the test runs, the shape of the profiles, +as measured by the number of times each system call +entry point was called, was remarkably similar. +.PP +A second alternative is to write a small benchmark +program to repeatedly exercise a suspected bottleneck. +While these benchmarks are not useful as a long term profile, +they can give quick feedback on whether a hypothesized +improvement is really having an effect. +It is important to realize that the only real assurance +that a change has a beneficial effect is through +long term measurements of general timesharing. +We have numerous examples where a benchmark program +suggests vast improvements while the change +in the long term system performance is negligible, +and conversely examples in which the benchmark program runs more slowly, +but the long term system performance improves significantly. +.PP +An investigation of our long term profiling showed that +the single most expensive function performed by the kernel +is path name translation. +We find that our general time sharing systems do about +500,000 name translations per day. +The cost of doing name translation in the original 4.2BSD +is 24.2 milliseconds, +representing 40% of the time processing system calls, +which is 19% of the total cycles in the kernel, +or 11% of all cycles executed on the machine. +The times are shown in Figure 3.
+.KF +.DS L +.TS +center box; +l r r. +part time % of kernel +_ +self 14.3 ms/call 11.3% +child 9.9 ms/call 7.9% +_ +total 24.2 ms/call 19.2% +.TE +.ce +Figure 3. Call times for \fInamei\fP. +.DE +.KE +.PP +The system measurements collected showed the +pathname translation routine, \fInamei\fP, +was clearly worth optimizing. +An inspection of \fInamei\fP shows that +it consists of two nested loops. +The outer loop is traversed once per pathname component. +The inner loop performs a linear search through a directory looking +for a particular pathname component. +.PP +Our first idea was to observe that many programs +step through a directory performing an operation on +each entry in turn. +This caused us to modify \fInamei\fP to cache +the directory offset of the last pathname +component looked up by a process. +The cached offset is then used +as the point at which a search in the same directory +begins. Changing directories invalidates the cache, as +does modifying the directory. +For programs that step sequentially through a directory with +$N$ files, search time decreases from $O ( N sup 2 )$ +to $O(N)$. +.PP +The cost of the cache is about 20 lines of code +(about 0.2 kilobytes) +and 16 bytes per process, with the cached data +stored in a process's \fIuser\fP vector. +.PP +As a quick benchmark to verify the effectiveness of the +cache we ran ``ls \-l'' +on a directory containing 600 files. +Before the per-process cache this command +used 22.3 seconds of system time. +After adding the cache the program used the same amount +of user time, but the system time dropped to 3.3 seconds. +.PP +This change prompted our rerunning a profiled system +on a machine containing the new \fInamei\fP. +The results showed that the time in \fInamei\fP +dropped by only 2.6 ms/call and +still accounted for 36% of the system call time, +18% of the kernel, or about 10% of all the machine cycles. +This amounted to a drop in system time from 57% to about 55%. 
+The results are shown in Figure 4. +.KF +.DS L +.TS +center box; +l r r. +part time % of kernel +_ +self 11.0 ms/call 9.2% +child 10.6 ms/call 8.9% +_ +total 21.6 ms/call 18.1% +.TE +.ce +Figure 4. Call times for \fInamei\fP with per-process cache. +.DE +.KE +.PP +The small performance improvement +was caused by a low cache hit ratio. +Although the cache was 90% effective when hit, +it was only usable on about 25% of the names being translated. +An additional reason for the small improvement was that +although the amount of time spent in \fInamei\fP itself +decreased substantially, +more time was spent in the routines that it called +since each directory had to be accessed twice: +once to search from the middle to the end, +and once to search from the beginning to the middle. +.PP +Most missed names were caused by path name components +other than the last. +Thus Robert Elz introduced a system wide cache of most recent +name translations. +The cache is keyed on a name and the +inode and device number of the directory that contains it. +Associated with each entry is a pointer to the corresponding +entry in the inode table. +This has the effect of short circuiting the outer loop of \fInamei\fP. +For each path name component, +\fInamei\fP first looks in its cache of recent translations +for the needed name. +If it exists, the directory search can be completely eliminated. +If the name is not recognized, +then the per-process cache may still be useful in +reducing the directory search time. +The two caching schemes complement each other well. +.PP +The cost of the name cache is about 200 lines of code +(about 1.2 kilobytes) +and 44 bytes per cache entry. +Depending on the size of the system, +about 200 to 1000 entries will normally be configured, +using 10-44 kilobytes of physical memory. +The name cache is resident in memory at all times. +.PP +After adding the system wide name cache we reran ``ls \-l'' +on the same directory.
+The user time remained the same; +however, the system time rose slightly to 3.7 seconds. +This was not surprising as \fInamei\fP +now had to maintain the cache, +but was never able to make any use of it. +.PP +Another profiled system was created and measurements +were collected over a one day period. These measurements +showed a 6 ms/call decrease in \fInamei\fP, with +\fInamei\fP accounting for only 31% of the system call time, +16% of the time in the kernel, +or about 7% of all the machine cycles. +System time dropped from 55% to about 49%. +The results are shown in Figure 5. +.KF +.DS L +.TS +center box; +l r r. +part time % of kernel +_ +self 9.5 ms/call 9.6% +child 6.1 ms/call 6.1% +_ +total 15.6 ms/call 15.7% +.TE +.ce +Figure 5. Call times for \fInamei\fP with both caches. +.DE +.KE +.PP +Statistics on the performance of both caches show +the large performance improvement is +caused by the high hit ratio. +On the profiled system a 60% hit rate was observed in +the system wide cache. This, coupled with the 25% +hit rate in the per-process offset cache yielded an +effective cache hit rate of 85%. +While the system wide cache reduces the amount of time both in +the routines that \fInamei\fP calls and in \fInamei\fP itself +(since fewer directories need to be accessed or searched), +it is interesting to note that the actual percentage of system +time spent in \fInamei\fP itself increases even though the +actual time per call decreases. +This is because less total time is being spent in the kernel, +hence a smaller absolute time becomes a larger total percentage. diff --git a/share/doc/papers/kerntune/4.t b/share/doc/papers/kerntune/4.t new file mode 100644 index 0000000..38bae43 --- /dev/null +++ b/share/doc/papers/kerntune/4.t @@ -0,0 +1,99 @@ +.\" Copyright (c) 1984 M. K. McKusick +.\" Copyright (c) 1984 The Regents of the University of California. +.\" All rights reserved.
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)4.t 1.2 (Berkeley) 11/8/90 +.\" +.ds RH Conclusions +.NH 1 +Conclusions +.PP +We have created a profiler that aids in the evaluation +of the kernel. 
+For each routine in the kernel, +the profile shows the extent to which that routine +helps support various abstractions, +and how that routine uses other abstractions. +The profile assesses the cost of routines +at all levels of the kernel decomposition. +The profiler is easily used, +and can be compiled into the kernel. +It adds only five to thirty percent execution overhead to the kernel +being profiled, +produces no additional output while the kernel is running +and allows the kernel to be measured in its real environment. +Kernel profiles can be used to identify bottlenecks in performance. +We have shown how to improve performance +by caching recently calculated name translations. +The combined caches added to the name translation process +reduce the average cost of translating a pathname to an inode by 35%. +These changes reduce the percentage of time spent running +in the system by nearly 9%. +.nr H2 1 +.ds RH Acknowledgements +.NH +\s+2Acknowledgements\s0 +.PP +I would like to thank Robert Elz for sharing his ideas and +his code for caching system wide names. +Thanks also to all the users at Berkeley who provided all the +input to generate the kernel profiles. +This work was supported by +the Defense Advanced Research Projects Agency (DoD) under +Arpa Order No. 4031 monitored by Naval Electronic Systems Command under +Contract No. N00039-82-C-0235. +.ds RH References +.nr H2 1 +.sp 2 +.NH +\s+2References\s-2 +.LP +.IP [Bentley81] 20 +Bentley, J. L., +``Writing Efficient Code'', +Department of Computer Science, +Carnegie-Mellon University, +Pittsburgh, Pennsylvania, +CMU-CS-81-116, 1981. +.IP [Graham82] 20 +Graham, S., Kessler, P., McKusick, M., +``gprof: A Call Graph Execution Profiler'', +Proceedings of the SIGPLAN '82 Symposium on Compiler Construction, +Volume 17, Number 6, June 1982. pp 120-126 +.IP [Graham83] 20 +Graham, S., Kessler, P., McKusick, M., +``An Execution Profiler for Modular Programs'', +Software - Practice and Experience, +Volume 13, 1983.
pp 671-685 +.IP [Ritchie74] 20 +Ritchie, D. M. and Thompson, K., +``The UNIX Time-Sharing System'', +CACM 17, 7. July 1974. pp 365-375 diff --git a/share/doc/papers/kerntune/Makefile b/share/doc/papers/kerntune/Makefile new file mode 100644 index 0000000..33416d6 --- /dev/null +++ b/share/doc/papers/kerntune/Makefile @@ -0,0 +1,14 @@ +# From: @(#)Makefile 1.5 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= papers +DOC= kerntune +SRCS= 0.t 1.t 2.t 3.t 4.t +EXTRA= fig2.pic +MACROS= -ms +USE_EQN= +USE_PIC= +USE_SOELIM= +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/papers/kerntune/fig2.pic b/share/doc/papers/kerntune/fig2.pic new file mode 100644 index 0000000..6731ca9 --- /dev/null +++ b/share/doc/papers/kerntune/fig2.pic @@ -0,0 +1,57 @@ +.\" Copyright (c) 1987 M. K. McKusick +.\" Copyright (c) 1987 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)fig2.pic 1.2 (Berkeley) 11/8/90 +.\" +.PS +ellipse ht .3i wid .75i "\s-1CALLER1\s+1" +ellipse ht .3i wid .75i "\s-1CALLER2\s+1" at 1st ellipse + (2i,0i) +ellipse ht .3i wid .8i "\s-1EXAMPLE\s+1" at 1st ellipse + (1i,-.5i) +ellipse ht .3i wid .5i "\s-1SUB1\s+1" at 1st ellipse - (0i,1i) +ellipse ht .3i wid .5i "\s-1SUB2\s+1" at 3rd ellipse - (0i,.5i) +ellipse ht .3i wid .5i "\s-1SUB3\s+1" at 2nd ellipse - (0i,1i) +line <- from 1st ellipse up .5i left .5i chop .1875i +line <- from 1st ellipse up .5i right .5i chop .1875i +line <- from 2nd ellipse up .5i left .5i chop .1875i +line <- from 2nd ellipse up .5i right .5i chop .1875i +arrow from 1st ellipse to 3rd ellipse chop +arrow from 2nd ellipse to 3rd ellipse chop +arrow from 3rd ellipse to 4th ellipse chop +arrow from 3rd ellipse to 5th ellipse chop .15i chop .15i +arrow from 3rd ellipse to 6th ellipse chop +arrow from 4th ellipse down .5i left .5i chop .1875i +arrow from 4th ellipse down .5i right .5i chop .1875i +arrow from 5th ellipse down .5i left .5i chop .1875i +arrow from 5th ellipse down .5i right .5i chop .1875i +arrow from 6th ellipse down .5i left .5i chop .1875i +arrow from 6th ellipse down .5i right .5i 
chop .1875i +.PE diff --git a/share/doc/papers/malloc/Makefile b/share/doc/papers/malloc/Makefile new file mode 100644 index 0000000..00e1e3d --- /dev/null +++ b/share/doc/papers/malloc/Makefile @@ -0,0 +1,10 @@ +# From: @(#)Makefile 6.3 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= papers +DOC= malloc +SRCS= abs.ms intro.ms kernel.ms malloc.ms problems.ms alternatives.ms \ + performance.ms implementation.ms conclusion.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/papers/malloc/abs.ms b/share/doc/papers/malloc/abs.ms new file mode 100644 index 0000000..f58d719 --- /dev/null +++ b/share/doc/papers/malloc/abs.ms @@ -0,0 +1,35 @@ +.\" +.\" ---------------------------------------------------------------------------- +.\" "THE BEER-WARE LICENSE" (Revision 42): +.\" <phk@FreeBSD.org> wrote this file. As long as you retain this notice you +.\" can do whatever you want with this stuff. If we meet some day, and you think +.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp +.\" ---------------------------------------------------------------------------- +.\" +.\" $FreeBSD$ +.\" +.if n .ND +.TL +Malloc(3) in modern Virtual Memory environments. +.sp +Revised +Fri Apr 5 12:50:07 1996 +.AU +Poul-Henning Kamp +.AI +<phk@FreeBSD.org> +Den Andensidste Viking +Valbygaardsvej 8 +DK-4200 Slagelse +Denmark +.AB +Malloc/free is one of the oldest parts of the C language environment +and obviously the world has changed a bit since it was first made. +The fact that most UNIX kernels have changed from swap/segment to +virtual memory/page based memory management has not been sufficiently +reflected in the implementations of the malloc/free API. +.PP +A new implementation was designed, written, tested and bench-marked +with an eye on the workings and performance characteristics of modern +Virtual Memory systems. 
+.AE diff --git a/share/doc/papers/malloc/alternatives.ms b/share/doc/papers/malloc/alternatives.ms new file mode 100644 index 0000000..5a46f95 --- /dev/null +++ b/share/doc/papers/malloc/alternatives.ms @@ -0,0 +1,45 @@ +.\" +.\" ---------------------------------------------------------------------------- +.\" "THE BEER-WARE LICENSE" (Revision 42): +.\" <phk@FreeBSD.org> wrote this file. As long as you retain this notice you +.\" can do whatever you want with this stuff. If we meet some day, and you think +.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp +.\" ---------------------------------------------------------------------------- +.\" +.\" $FreeBSD$ +.\" +.ds RH Alternative implementations +.NH +Alternative implementations +.PP +These problems were actually the inspiration for the first alternative +malloc implementations. +Since their main aim was debugging, they would often use techniques +like allocating a guard zone before and after the chunk, +and possibly filling these guard zones +with some pattern, so accesses outside the allocated chunk could be detected +with some decent probability. +Another widely used technique is to use tables to keep track of which +chunks are actually in which state and so on. +.PP +This class of debugging has been taken to its practical extreme by +the product "Purify" which does the entire memory-coloring exercise +and not only keeps track of what is in use and what isn't, but also +detects if the first reference is a read (which would return undefined +values) and other such violations. +.PP +Later actual complete implementations of malloc arrived, but many of +these still based their workings on the basic schema mentioned previously, +disregarding that in the meantime virtual memory and paging have +become the standard environment. +.PP +The most widely used "alternative" malloc is undoubtedly ``gnumalloc'' +which has received wide acclaim and certainly runs faster than +most stock mallocs. 
It does, however, tend to fare badly in +cases where paging is the norm rather than the exception. +.PP +The particular malloc that prompted this work basically didn't bother +reusing storage until the kernel forced it to do so by refusing +further allocations with sbrk(2). +That may make sense if you work alone on your own personal mainframe, +but as a general policy it is less than optimal. diff --git a/share/doc/papers/malloc/conclusion.ms b/share/doc/papers/malloc/conclusion.ms new file mode 100644 index 0000000..da7d7e9 --- /dev/null +++ b/share/doc/papers/malloc/conclusion.ms @@ -0,0 +1,48 @@ +.\" +.\" ---------------------------------------------------------------------------- +.\" "THE BEER-WARE LICENSE" (Revision 42): +.\" <phk@FreeBSD.org> wrote this file. As long as you retain this notice you +.\" can do whatever you want with this stuff. If we meet some day, and you think +.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp +.\" ---------------------------------------------------------------------------- +.\" +.\" $FreeBSD$ +.\" +.ds RH Conclusion and experience. +.NH +Conclusion and experience. +.PP +In general the performance differences between gnumalloc and this +malloc are not that big. +The major difference comes when primary storage is seriously +over-committed, in which case gnumalloc +wastes time paging in pages it's not going to use. +In such cases as much as a factor of five in wall-clock time has +been seen in difference. +Apart from that gnumalloc and this implementation are pretty +much head-on performance-wise. +.PP +Several legacy programs in the BSD 4.4 Lite distribution had +code that depended on the memory returned from malloc +being zeroed. In a couple of cases, free(3) was called more than +once for the same allocation, and a few cases even called free(3) +with pointers to objects in the data section or on the stack. 
+.PP +A couple of users have reported that using this malloc on other +platforms yielded "pretty impressive results", but no hard benchmarks +have been made. +.ds RH Acknowledgements & references. +.NH +Acknowledgements & references. +.PP +The first implementation of this algorithm was actually a file system, +done in assembler using 5-hole ``Baudot'' paper tape for a drum storage +device attached to a 20 bit germanium transistor computer with 2000 words +of memory, but that was many years ago. +.PP +Peter Wemm <peter@FreeBSD.org> came up with the idea to store the +page-directory in mmap(2)'ed memory instead of in the heap. +This has proven to be a good move. +.PP +Lars Fredriksen <fredriks@mcs.com> found and identified a +fence-post bug in the code. diff --git a/share/doc/papers/malloc/implementation.ms b/share/doc/papers/malloc/implementation.ms new file mode 100644 index 0000000..2507e4c --- /dev/null +++ b/share/doc/papers/malloc/implementation.ms @@ -0,0 +1,225 @@ +.\" +.\" ---------------------------------------------------------------------------- +.\" "THE BEER-WARE LICENSE" (Revision 42): +.\" <phk@FreeBSD.org> wrote this file. As long as you retain this notice you +.\" can do whatever you want with this stuff. If we meet some day, and you think +.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp +.\" ---------------------------------------------------------------------------- +.\" +.\" $FreeBSD$ +.\" +.ds RH Implementation +.NH +Implementation +.PP +A new malloc(3) implementation was written to meet the goals, +and to the extent possible to address the shortcomings listed previously. +.PP +The source is 1218 lines of C code, and can be found in FreeBSD 2.2 +(and probably later versions as well) as src/lib/libc/stdlib/malloc.c. +.PP +The main data structure is the +.I page-directory +which contains a +.B void* +for each page we have control over. 
+The value can be one of: +.IP +.B MALLOC_NOT_MINE +Another part of the code may call brk(2) to get a piece of the cake. +Consequently, we cannot rely on the memory we get from the kernel +being one consecutive piece of memory, and therefore we need a way to +mark such pages as "untouchable". +.IP +.B MALLOC_FREE +This is a free page. +.IP +.B MALLOC_FIRST +This is the first page in a (multi-)page allocation. +.IP +.B MALLOC_FOLLOW +This is a subsequent page in a multi-page allocation. +.IP +.B +struct pginfo* +.R +A pointer to a structure describing a partitioned page. +.PP +In addition, there exists a linked list of small data structures that +describe the free space as runs of free pages. +.PP +Notice that these structures are not part of the free pages themselves, +but rather allocated with malloc so that the free pages themselves +are never referenced while they are free. +.PP +When a request for storage comes in, it will be treated as a ``page'' +allocation if it is bigger than half a page. +The free list will be searched and the first run of free pages that +can satisfy the request is used. The first page gets set to +.B MALLOC_FIRST +status. If more than that one page is needed, the rest of them get +.B MALLOC_FOLLOW +status in the page-directory. +.PP +If there were no pages on the free list, brk(2) will be called, and +the pages will get added to the page-directory with status +.B MALLOC_FREE +and the search restarts. +.PP +Freeing a number of pages is done by changing their state in the +page directory to MALLOC_FREE, and then traversing the free-pages list to +find the right place for this run of pages, possibly collapsing +with the two neighboring runs into one run and, if possible, +releasing some memory back to the kernel by calling brk(2). 
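The page-level bookkeeping described above can be sketched in C. This is a minimal illustration and not the code in malloc.c: the enum, the 4096-byte page size and every function name are assumptions made for the example. The real page-directory holds a void* per page and the real allocator searches a separate list of free runs; the sketch scans a small array directly to stay short.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the page-directory states named in the text. */
enum page_state { PAGE_NOT_MINE, PAGE_FREE, PAGE_FIRST, PAGE_FOLLOW };

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* A request bigger than half a page is treated as a page allocation. */
static int is_page_allocation(size_t size)
{
    return size > SKETCH_PAGE_SIZE / 2;
}

/* Number of whole pages needed to hold size bytes. */
static size_t pages_needed(size_t size)
{
    return (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* First-fit search for n consecutive free pages.
 * Returns the index of the first page, or (size_t)-1 if no run fits. */
static size_t find_free_run(const enum page_state *dir, size_t npages, size_t n)
{
    size_t run = 0;

    if (n == 0)
        return (size_t)-1;
    for (size_t i = 0; i < npages; i++) {
        run = (dir[i] == PAGE_FREE) ? run + 1 : 0;
        if (run == n)
            return i + 1 - n;
    }
    return (size_t)-1;
}

/* Mark the first page FIRST and the remaining n - 1 pages FOLLOW. */
static void mark_allocation(enum page_state *dir, size_t first, size_t n)
{
    dir[first] = PAGE_FIRST;
    for (size_t i = 1; i < n; i++)
        dir[first + i] = PAGE_FOLLOW;
}

/* Return a page allocation to the FREE state.
 * Returns the number of pages released, or 0 if first does not
 * start an allocation. */
static size_t release_allocation(enum page_state *dir, size_t npages, size_t first)
{
    size_t count;

    if (first >= npages || dir[first] != PAGE_FIRST)
        return 0;
    dir[first] = PAGE_FREE;
    count = 1;
    for (size_t i = first + 1; i < npages && dir[i] == PAGE_FOLLOW; i++) {
        dir[i] = PAGE_FREE;
        count++;
    }
    return count;
}
```

Because the length of an allocation is recoverable from the run of FOLLOW entries, the sketch needs no per-allocation header, and a release that does not start at a FIRST page is simply rejected.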
+.PP +If the request is less than or equal to half of a page, its size will be +rounded up to the nearest power of two before being processed +and if the request is less than some minimum size, it is rounded up to +that size. +.PP +These sub-page allocations are served from pages which are split up +into some number of equal size chunks. +For each of these pages a +.B +struct pginfo +.R +describes the size of the chunks on this page, how many there are, +how many are free and so on. +The description consists of a bitmap of used chunks, and various counters +and numbers used to keep track of the stuff in the page. +.PP +For each size of sub-page allocation, the pginfo structures for the +pages that have free chunks in them form a list. +The heads of these lists are stored in predetermined slots at +the beginning of the page directory to make access fast. +.PP +To allocate a chunk of some size, the head of the list for the +corresponding size is examined, and a free chunk found. The number +of free chunks on that page is decreased by one and, if zero, the +pginfo structure is unlinked from the list. +.PP +To free a chunk, the page is derived from the pointer, the page table +for that page contains a pointer to the pginfo structure, where the +free bit is set for the chunk, the number of free chunks increased by +one, and if equal to one, the pginfo structure is linked into the +proper place on the list for this size of chunks. +If the count increases to match the number of chunks on the page, the +pginfo structure is unlinked from the list and free(3)'ed and the +actual page itself is free(3)'ed too. +.PP +To be 100% correct performance-wise these lists should be ordered +according to the recent number of accesses to that page. This +information is not available and it would essentially mean a reordering +of the list on every memory reference to keep it up-to-date. +Instead they are ordered according to the address of the pages.
+Interestingly enough, in practice this comes out to almost the same +thing performance-wise. +.PP +It's not that surprising after all, it's the difference between +following the crowd or actively directing where it can go, in both +ways you can end up in the middle of it all. +.PP +The side effect of this compromise is that it also uses less storage, +and the list never has to be reordered, all the ordering happens when +pages are added or deleted. +.PP +It is an interesting twist to the implementation that the +.B +struct pginfo +.R +is allocated with malloc. +That is, "as with malloc" to be painfully correct. +The code knows the special case where the first (couple) of allocations on +the page is actually the pginfo structure and deals with it accordingly. +This avoids some silly "chicken and egg" issues. +.ds RH Bells and whistles. +.NH +Bells and whistles. +.PP +brk(2) is actually not a very fast system call when you ask for storage. +This is mainly because of the need by the kernel to zero the pages before +handing them over, so therefore this implementation does not release +heap pages until there is a large chunk to release back to the kernel. +Chances are pretty good that we will need it again pretty soon anyway. +Since these pages are not accessed at all, they will soon be paged out +and don't affect anything but swap-space usage. +.PP +The page directory is actually kept in a mmap(2)'ed piece of +anonymous memory. This avoids some rather silly cases that +would otherwise have to be handled when the page directory +has to be extended. +.PP +One particularly nice feature is that all pointers passed to free(3) +and realloc(3) can be checked conclusively for validity: +First the pointer is masked to find the page. 
The page directory +is then examined; it must contain either MALLOC_FIRST, in which +case the pointer must point exactly at the page, or it can contain +a struct pginfo*, in which case the pointer must point to one of +the chunks described by that structure. +Warnings will be printed on +.B stderr +and nothing will be done with +the pointer if it is found to be invalid. +.PP +An environment variable +.B MALLOC_OPTIONS +allows the user some control over the behavior of malloc. +Some of the more interesting options are: +.IP +.B Abort +If malloc fails to allocate storage, core-dump the process with +a message rather than expect it to handle this correctly. +It's amazing how few programs actually handle this condition correctly, +and consequently the havoc they can create is all the more creative or +destructive. +.IP +.B Dump +Writes malloc statistics to a file called ``malloc.out'' prior +to process termination. +.IP +.B Hint +Pass a hint to the kernel about pages we no longer need through the +madvise(2) system call. This can help performance on machines that +page heavily by eliminating unnecessary page-ins and page-outs of +unused data. +.IP +.B Realloc +Always do a free and malloc when realloc(3) is called. +For programs doing garbage collection using realloc(3), this makes the +heap collapse faster since malloc will reallocate from the +lowest available address. +The default +is to leave things alone if the size of the allocation is still in +the same size-class. +.IP +.B Junk +will explicitly fill the allocated area with a particular value +to try to detect if programs rely on it being zero. +.IP +.B Zero +will explicitly zero out the allocated chunk of memory, while any +space after the allocation in the chunk will be filled with the +junk value to try to catch out of the chunk references. +.ds RH The road not taken. +.NH +The road not yet taken. +.PP +A couple of avenues were explored that could be interesting in some +set of circumstances.
+.PP +Using mmap(2) instead of brk(2) was actually slower, since brk(2) +knows a lot of the things that mmap has to find out first. +.PP +In general there is little room for further improvement of the +time-overhead of the malloc, further improvements will have to +be in the area of improving paging behavior. +.PP +It is still under consideration to add a feature such that +if realloc is called with two zero arguments, the internal +allocations will be reallocated to perform a garbage collect. +This could be used in certain types of programs to collapse +the memory use, but so far it doesn't seem to be worth the effort. +.PP +Malloc/Free can be a significant point of contention in multi-threaded +programs. Low-grain locking of the data-structures inside the +implementation should be implemented to avoid excessive spin-waiting. diff --git a/share/doc/papers/malloc/intro.ms b/share/doc/papers/malloc/intro.ms new file mode 100644 index 0000000..0ee87c9 --- /dev/null +++ b/share/doc/papers/malloc/intro.ms @@ -0,0 +1,74 @@ +.\" +.\" ---------------------------------------------------------------------------- +.\" "THE BEER-WARE LICENSE" (Revision 42): +.\" <phk@FreeBSD.org> wrote this file. As long as you retain this notice you +.\" can do whatever you want with this stuff. If we meet some day, and you think +.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp +.\" ---------------------------------------------------------------------------- +.\" +.\" $FreeBSD$ +.\" +.ds RH Introduction +.NH +Introduction +.PP +Most programs need to allocate storage dynamically in addition +to whatever static storage the compiler reserved at compile-time. +To C programmers this fact is rather obvious, but for many years +this was not an accepted and recognized fact, and many languages +still used today don't support this notion adequately. 
+.PP +The classic UNIX kernel provides two very simple and powerful +mechanisms for obtaining dynamic storage, the execution stack +and the heap. +The stack is usually put at the far upper end of the address-space, +from where it grows down as far as needed, though this may depend on +the CPU design. +The heap starts at the end of the +.B bss +segment and grows upwards as needed. +.PP +There isn't really a kernel-interface to the stack as such. +The kernel will allocate some amount of memory for it, +not even telling the process the exact size. +If the process needs more space than that, it will simply try to access +it, hoping that the kernel will detect that an access has been +attempted outside the allocated memory, and try to extend it. +If the kernel fails to extend the stack, this could be because of lack +of resources or permissions or because it may just be impossible +to do in the first place, the process will usually be shot down by the +kernel. +.PP +In the C language, there exists a little used interface to the stack, +.B alloca(3) , +which will explicitly allocate space on the stack. +This is not an interface to the kernel, but merely an adjustment +done to the stack-pointer such that space will be available and +unharmed by any subroutine calls yet to be made while the context +of the current subroutine is intact. +.PP +Due to the nature of normal use of the stack, there is no corresponding +"free" operator, but instead the space is returned when the current +function returns to its caller and the stack frame is dismantled. +This is the cause of much grief, and probably the single most important +reason that alloca(3) is not, and should not be, used widely. +.PP +The heap on the other hand has an explicit kernel-interface in the +system call +.B brk(2) . +The argument to brk(2) is a pointer to where the process wants the +heap to end. 
+There is also an interface called +.B sbrk(2) +taking an increment to the current end of the heap, but this is merely a +.B libc +front for brk(2). +.PP +In addition to these two memory resources, modern virtual memory kernels +provide the mmap(2)/munmap(2) interface which allows almost complete +control over any bit of virtual memory in the process address space. +.PP +Because of the generality of the mmap(2) interface and the way the +data structures representing the regions are laid out, sbrk(2) is actually +faster in use than the equivalent mmap(2) call, simply because +mmap(2) has to search for information that is implicit in the sbrk(2) call. diff --git a/share/doc/papers/malloc/kernel.ms b/share/doc/papers/malloc/kernel.ms new file mode 100644 index 0000000..952e95c --- /dev/null +++ b/share/doc/papers/malloc/kernel.ms @@ -0,0 +1,56 @@ +.\" +.\" ---------------------------------------------------------------------------- +.\" "THE BEER-WARE LICENSE" (Revision 42): +.\" <phk@FreeBSD.org> wrote this file. As long as you retain this notice you +.\" can do whatever you want with this stuff. If we meet some day, and you think +.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp +.\" ---------------------------------------------------------------------------- +.\" +.\" $FreeBSD$ +.\" +.ds RH The kernel and memory +.NH +The kernel and memory +.PP +Brk(2) isn't a particularly convenient interface, +it was probably made more to fit the memory model of the +hardware being used, than to fill the needs of the programmers. +.PP +Before paged and/or virtual memory systems became +common, the most popular memory management facility used for +UNIX was segments. +This was also very often the only vehicle for imposing protection on +various parts of memory. +Depending on the hardware, segments can be anything, and consequently +how the kernels exploited them varied a lot from UNIX to UNIX and from +machine to machine. 
+.PP +Typically a process would have one segment for the text section, one +for the data and bss section combined and one for the stack. +On some systems the text shared a segment with the data and bss, and was +consequently just as writable as them. +.PP +In this setup all the brk(2) system call has to do is to find the +right amount of free storage, possibly moving things around in physical +memory, maybe even swapping out a segment or two to make space, +and change the upper limit on the data segment according to the address given. +.PP +In a more modern page based virtual memory implementation this is still +pretty much the situation, except that the granularity is now pages: +The kernel finds the right number of free pages, possibly paging some +pages out to free them up, and then plugs them into the page-table of +the process. +.PP +As such the difference is very small, the real difference is that in +the old world of swapping, either the entire process was in primary +storage or it wouldn't be selected to be run. In a modern VM kernel, +a process might only have a subset of its pages in primary memory, +the rest will be paged in, if and when the process tries to access them. +.PP +Only very few programs deal with the brk(2) interface directly. +The few that do usually have their own memory management facilities. +LISP or FORTH interpreters are good examples. +Most other programs use the +.B malloc(3) +interface instead, and leave it to the malloc implementation to +use brk(2) to get storage allocated from the kernel. diff --git a/share/doc/papers/malloc/malloc.ms b/share/doc/papers/malloc/malloc.ms new file mode 100644 index 0000000..4f3cf7d --- /dev/null +++ b/share/doc/papers/malloc/malloc.ms @@ -0,0 +1,72 @@ +.\" +.\" ---------------------------------------------------------------------------- +.\" "THE BEER-WARE LICENSE" (Revision 42): +.\" <phk@FreeBSD.org> wrote this file. 
As long as you retain this notice you +.\" can do whatever you want with this stuff. If we meet some day, and you think +.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp +.\" ---------------------------------------------------------------------------- +.\" +.\" $FreeBSD$ +.\" +.ds RH Malloc and free +.NH +Malloc and free +.PP +The job of malloc(3) is to turn the rather simple +brk(2) facility into a service programs can +actually use without getting hurt. +.PP +The archetypical malloc(3) implementation keeps track of the memory between +the end of the bss section, as defined by the +.B _end +symbol, and the current brk(2) point using a linked list of chunks of memory. +Each item on the list has a status as either free or used, a pointer +to the next entry and in most cases to the previous as well, to speed +up inserts and deletes in the list. +.PP +When a malloc(3) request comes in, the list is traversed from the +front and if a free chunk big enough to hold the request is found, +it is returned, if the free chunk is bigger than the size requested, +a new free chunk is made from the excess and put back on the list. +.PP +When a chunk is +.B free(3) 'ed, +the chunk is found in the list, its status +is changed to free and if one or both of the surrounding chunks +are free, they are collapsed to one. +.PP +A third kind of request, +.B realloc(3) , +will resize +a chunk, trying to avoid copying the contents if possible. +It is seldom used, and has only had a significant impact on performance +in a few special situations. +The typical pattern of use is to malloc(3) a chunk of the maximum size +needed, read in the data and adjust the size of the chunk to match the +size of the data read using realloc(3). +.PP +For reasons of efficiency, the original implementation of malloc(3) +put the small structure used to contain the next and previous pointers +plus the state of the chunk right before the chunk itself. 
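The archetypical scheme described above can be sketched in C. This is a minimal illustration under stated assumptions and not any historical implementation: it manages a fixed static arena instead of growing the heap with brk(2), and the names struct chunk, classic_alloc and classic_free are invented for the example. Like most allocators, it writes headers into a byte array through a struct pointer.

```c
#include <stddef.h>

#define ALIGN       (_Alignof(max_align_t))
#define ROUND_UP(n) (((n) + ALIGN - 1) / ALIGN * ALIGN)
#define HDR_SIZE    ROUND_UP(sizeof(struct chunk))
#define ARENA_SIZE  4096        /* assumed arena size, stands in for the heap */

/* The small structure placed right before each chunk. */
struct chunk {
    struct chunk *next;
    struct chunk *prev;
    size_t size;                /* usable bytes following the header */
    int is_free;
};

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static struct chunk *head;

static void arena_init(void)
{
    head = (struct chunk *)arena;
    head->next = head->prev = NULL;
    head->size = ARENA_SIZE - HDR_SIZE;
    head->is_free = 1;
}

/* First fit: walk the list from the front, split off any excess. */
static void *classic_alloc(size_t size)
{
    if (head == NULL)
        arena_init();
    if (size > ARENA_SIZE)
        return NULL;
    size = ROUND_UP(size ? size : 1);
    for (struct chunk *c = head; c != NULL; c = c->next) {
        if (!c->is_free || c->size < size)
            continue;
        if (c->size >= size + HDR_SIZE + ALIGN) {
            /* Make a new free chunk from the excess. */
            struct chunk *rest =
                (struct chunk *)((unsigned char *)c + HDR_SIZE + size);
            rest->size = c->size - size - HDR_SIZE;
            rest->is_free = 1;
            rest->prev = c;
            rest->next = c->next;
            if (c->next != NULL)
                c->next->prev = rest;
            c->next = rest;
            c->size = size;
        }
        c->is_free = 0;
        return (unsigned char *)c + HDR_SIZE;
    }
    return NULL;
}

/* Absorb the chunk after c into c. */
static void merge_with_next(struct chunk *c)
{
    struct chunk *n = c->next;

    c->size += HDR_SIZE + n->size;
    c->next = n->next;
    if (n->next != NULL)
        n->next->prev = c;
}

/* Mark the chunk free and collapse it with free neighbours. */
static void classic_free(void *p)
{
    struct chunk *c;

    if (p == NULL)
        return;
    c = (struct chunk *)((unsigned char *)p - HDR_SIZE);
    c->is_free = 1;
    if (c->next != NULL && c->next->is_free)
        merge_with_next(c);
    if (c->prev != NULL && c->prev->is_free)
        merge_with_next(c->prev);
}
```

Note that classic_alloc reads the header of every chunk it passes, free or used, so a search touches memory spread across the whole arena. The sketch also trusts any pointer handed to classic_free, which is why a stray write past the end of a chunk or a bogus free damages the list.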
+.PP +As a matter of fact, the canonical malloc(3) implementation can be +studied in the ``Old testament'', chapter 8 verse 7 [Kernighan & Ritchie] +.PP +Various optimisations can be applied to the above basic algorithm: +.IP +If in freeing a chunk, we end up with the last chunk on the list being +free, we can return that to the kernel by calling brk(2) with the first +address of that chunk and then make the previous chunk the last on the +chain by terminating its ``next'' pointer. +.IP +A best-fit algorithm can be used instead of first-fit at an expense +of memory, because statistically fewer chances to brk(2) backwards will +present themselves. +.IP +Splitting the list in two, one for used and one for free chunks, to +speed the searching. +.IP +Putting free chunks on one of several free lists, depending on their size, +to speed allocation. +.IP +\&... diff --git a/share/doc/papers/malloc/performance.ms b/share/doc/papers/malloc/performance.ms new file mode 100644 index 0000000..773f92a --- /dev/null +++ b/share/doc/papers/malloc/performance.ms @@ -0,0 +1,113 @@ +.\" +.\" ---------------------------------------------------------------------------- +.\" "THE BEER-WARE LICENSE" (Revision 42): +.\" <phk@FreeBSD.org> wrote this file. As long as you retain this notice you +.\" can do whatever you want with this stuff. If we meet some day, and you think +.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp +.\" ---------------------------------------------------------------------------- +.\" +.\" $FreeBSD$ +.\" +.ds RH Performance +.NH +Performance +.PP +Performance for a malloc(3) implementation comes as two variables: +.IP +A: How much time does it use for searching and manipulating data structures. +We will refer to this as ``overhead time''. +.IP +B: How well does it manage the storage. +This rather vague metric we call ``quality of allocation''. 
+.PP +The overhead time is easy to measure, just do a lot of malloc/free calls +of various kinds and combinations, and compare the results. +.PP +The quality of allocation is not quite as simple as that. +One measure of quality is the size of the process, that should obviously +be minimized. +Another measure is the execution time of the process. +This is not an obvious indicator of quality, but people will generally +agree that it should be minimized as well, and if malloc(3) can do +anything to do so, it should. +Explanation why it is still a good metric follows: +.PP +In a traditional segment/swap kernel, the desirable behavior of a process +is to keep the brk(2) as low as possible, thus minimizing the size of the +data/bss/heap segment, which in turn translates to a smaller process and +a smaller probability of the process being swapped out, qed: faster +execution time as an average. +.PP +In a paging environment this is not a bad choice for a default, but +a couple of details need to be looked at much more carefully. +.PP +First of all, the size of a process becomes a more vague concept since +only the pages that are actually used need to be in primary storage +for execution to progress, and they only need to be there when used. +That implies that many more processes can fit in the same amount of +primary storage, since most processes have a high degree of locality +of reference and thus only need some fraction of their pages to actually +do their job. +.PP +From this it follows that the interesting size of the process is some +subset of the total amount of virtual memory occupied by the process. +This number isn't a constant, it varies depending on the whereabouts +of the process, and it may indeed fluctuate wildly over the lifetime +of the process. +.PP +One of the names for this vague concept is ``current working set''. +It has been defined many different ways over the years, mostly to +satisfy and support claims in marketing or benchmark contexts.
+.PP +For now we can simply say that it is the number of pages the process +needs in order to run at a sufficiently low paging rate in a congested +primary storage. +(If primary storage isn't congested, this is not really important +of course, but most systems would be better off using the pages for +disk-cache or similar functions, so from that perspective it will +always be congested.) +If the number of pages is too small, the process will wait for its +pages to be read from secondary storage much of the time, if it's too +big, the space could be used better for something else. +.PP +From the view of any single process, this number of pages is +"all of my pages", but from the point of view of the OS it should +be tuned to maximise the total throughput of all the processes on +the machine at the time. +This is usually done using various kinds of least-recently-used +replacement algorithms to select page candidates for replacement. +.PP +With this knowledge, can we decide what the performance goal is for +a modern malloc(3) ? +Well, it's almost as simple as it used to be: +.B +Minimize the number of pages accessed. +.R +.PP +This really is the core of it all. +If the number of accessed pages is smaller, then locality of reference is +higher, and all kinds of caches (which is essentially what the +primary storage is in a VM system) work better. +.PP +It's interesting to notice that the classical malloc fails on this one +because the information about free chunks is kept with the free +chunks themselves. In some of the benchmarks this came out as all the +pages being paged in every time a malloc call was made, because malloc +had to traverse the free list to find a suitable chunk for the allocation. +If memory is not in use, then you shouldn't access it. +.PP +The secondary goal is more evident: +.B +Try to work in pages. +.R +.PP +That makes it easier for the kernel, and wastes less virtual memory. 
+Most modern implementations do this when they interact with the +kernel, but few try to avoid objects spanning pages. +.PP +If an object's size +is less than or equal to a page, there is no reason for it to span two pages. +Having objects span pages means that two pages must be +paged in, if that object is accessed. +.PP +With this analysis in the luggage, we can start coding. diff --git a/share/doc/papers/malloc/problems.ms b/share/doc/papers/malloc/problems.ms new file mode 100644 index 0000000..980f2e9 --- /dev/null +++ b/share/doc/papers/malloc/problems.ms @@ -0,0 +1,54 @@ +.\" +.\" ---------------------------------------------------------------------------- +.\" "THE BEER-WARE LICENSE" (Revision 42): +.\" <phk@FreeBSD.org> wrote this file. As long as you retain this notice you +.\" can do whatever you want with this stuff. If we meet some day, and you think +.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp +.\" ---------------------------------------------------------------------------- +.\" +.\" $FreeBSD$ +.\" +.ds RH The problems +.NH +The problems +.PP +Even though malloc(3) is a lot simpler to use +than the raw brk(2)/sbrk(2) interface, +or maybe exactly because +of that, +a lot of problems arise from its use. +.IP +Writing to memory outside the allocated chunk. +The most likely result being that the data structure used to hold +the links and flags about this chunk or the next one gets thrashed. +.IP +Freeing a pointer to memory not allocated by malloc. +This is often a pointer that points to an object on the stack or in the +data-section, in newer implementations of C it may even be in the text- +section where it is likely to be readonly. +Some malloc implementations detect this, some don't. +.IP +Freeing a modified pointer. This is a very common mistake, freeing +not the pointer malloc(3) returned, but rather some offset from it. +Some mallocs will handle this correctly if the offset is positive. 
+.IP +Freeing the same pointer more than once. +.IP +Accessing memory in a chunk after it has been free(3)'ed. +.PP +The handling of these problems has traditionally been weak. +A core-dump was the most common form for "handling", but in rare +cases one could experience the famous "malloc: corrupt arena." +message before the core-dump. +Even worse though, very often the program will just continue, +possibly giving wrong results. +.PP +An entirely different form of problem is that +the memory returned by malloc(3) can contain any value. +Unfortunately most kernels, correctly, zero out the storage they +provide with brk(2), and thus the storage malloc returns will be zeroed +in many cases as well, so programmers are not particularly apt to notice +that their code depends on malloc'ed storage being zeroed. +.PP +With problems this big and error handling this weak, it is not +surprising that problems are hard and time-consuming to find and fix. diff --git a/share/doc/papers/newvm/0.t b/share/doc/papers/newvm/0.t new file mode 100644 index 0000000..e23a95d --- /dev/null +++ b/share/doc/papers/newvm/0.t @@ -0,0 +1,86 @@ +.\" Copyright (c) 1986 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4.
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)0.t 5.1 (Berkeley) 4/16/91 +.\" +.rm CM +.TL +A New Virtual Memory Implementation for Berkeley +.UX +.AU +Marshall Kirk McKusick +Michael J. Karels +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, California 94720 +.AB +With the cost per byte of memory approaching that of the cost per byte +for disks, and with file systems increasingly distant from the host +machines, a new approach to the implementation of virtual memory is +necessary. Rather than preallocating swap space which limits the +maximum virtual memory that can be supported to the size of the swap +area, the system should support virtual memory up to the sum of the +sizes of physical memory plus swap space. 
For systems with a local swap +disk, but remote file systems, it may be useful to use some of the memory +to keep track of the contents of the swap space to avoid multiple fetches +of the same data from the file system. +.PP +The new implementation should also add new functionality. Processes +should be allowed to have large sparse address spaces, to map files +into their address spaces, to map device memory into their address +spaces, and to share memory with other processes. The shared address +space may either be obtained by mapping a file into (possibly +different) parts of their address space, or by arranging to share +``anonymous memory'' (that is, memory that is zero fill on demand, and +whose contents are lost when the last process unmaps the memory) with +another process as is done in System V. +.PP +One use of shared memory is to provide a high-speed +Inter-Process Communication (IPC) mechanism between two or more +cooperating processes. To insure the integrity of data structures +in a shared region, processes must be able to use semaphores to +coordinate their access to these shared structures. In System V, +these semaphores are provided as a set of system calls. Unfortunately, +the use of system calls reduces the throughput of the shared memory +IPC to that of existing IPC mechanisms. We are proposing a scheme +that places the semaphores in the shared memory segment, so that +machines that have a test-and-set instruction can handle the usual +uncontested lock and unlock without doing a system call. Only in +the unusual case of trying to lock an already-locked lock or in +releasing a wanted lock will a system call be required. The +interface will allow a user-level implementation of the System V +semaphore interface on most machines with a much lower runtime cost. 
+.AE +.LP +.bp diff --git a/share/doc/papers/newvm/1.t b/share/doc/papers/newvm/1.t new file mode 100644 index 0000000..02ac8be --- /dev/null +++ b/share/doc/papers/newvm/1.t @@ -0,0 +1,378 @@ +.\" Copyright (c) 1986 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.t 5.1 (Berkeley) 4/16/91 +.\" $FreeBSD$ +.\" +.NH +Motivations for a New Virtual Memory System +.PP +The virtual memory system distributed with Berkeley UNIX has served +its design goals admirably well over the ten years of its existence. +However the relentless advance of technology has begun to render it +obsolete. +This section of the paper describes the current design, +points out the current technological trends, +and attempts to define the new design considerations that should +be taken into account in a new virtual memory design. +.NH 2 +Implementation of 4.3BSD virtual memory +.PP +All Berkeley Software Distributions through 4.3BSD +have used the same virtual memory design. +All processes, whether active or sleeping, have some amount of +virtual address space associated with them. +This virtual address space +is the combination of the amount of address space with which they initially +started plus any stack or heap expansions that they have made. +All requests for address space are allocated from available swap space +at the time that they are first made; +if there is insufficient swap space left to honor the allocation, +the system call requesting the address space fails synchronously. +Thus, the limit to available virtual memory is established by the +amount of swap space allocated to the system. +.PP +Memory pages are used in a sort of shell game to contain the +contents of recently accessed locations. 
+As a process first references a location +a new page is allocated and filled either with initialized data or +zeros (for new stack and break pages). +As the supply of free pages begins to run out, dirty pages are +pushed to the previously allocated swap space so that they can be reused +to contain newly faulted pages. +If a previously accessed page that has been pushed to swap is once +again used, a free page is reallocated and filled from the swap area +[Babaoglu79], [Someren84]. +.NH 2 +Design assumptions for 4.3BSD virtual memory +.PP +The design criteria for the current virtual memory implementation +were made in 1979. +At that time the cost of memory was about a thousand times greater per +byte than magnetic disks. +Most machines were used as centralized time sharing machines. +These machines had far more disk storage than they had memory +and given the cost tradeoff between memory and disk storage, +wanted to make maximal use of the memory even at the cost of +wasting some of the disk space or generating extra disk I/O. +.PP +The primary motivation for virtual memory was to allow the +system to run individual programs whose address space exceeded +the memory capacity of the machine. +Thus the virtual memory capability allowed programs to be run that +could not have been run on a swap based system. +Equally important in the large central timesharing environment +was the ability to allow the sum of the memory requirements of +all active processes to exceed the amount of physical memory on +the machine. +The expected mode of operation for which the system was tuned +was to have the sum of active virtual memory be one and a half +to two times the physical memory on the machine. +.PP +At the time that the virtual memory system was designed, +most machines ran with little or no networking. +All the file systems were contained on disks that were +directly connected to the machine. +Similarly all the disk space devoted to swap space was also +directly connected. 
+Thus the speed and latency with which file systems could be accessed +were roughly equivalent to the speed and latency with which swap +space could be accessed. +Given the high cost of memory there was little incentive to have +the kernel keep track of the contents of the swap area once a process +exited since it could almost as easily and quickly be reread from the +file system. +.NH 2 +New influences +.PP +In the ten years since the current virtual memory system was designed, +many technological advances have occurred. +One effect of the technological revolution is that the +micro-processor has become powerful enough to allow users to have their +own personal workstations. +Thus the computing environment is moving away from a purely centralized +time sharing model to an environment in which users have a +computer on their desk. +This workstation is linked through a network to a centralized +pool of machines that provide filing, computing, and spooling services. +The workstations tend to have a large quantity of memory, +but little or no disk space. +Because users do not want to be bothered with backing up their disks, +and because of the difficulty of having a centralized administration +backing up hundreds of small disks, these local disks are typically +used only for temporary storage and as swap space. +Long term storage is managed by the central file server. +.PP +Another major technical advance has been in all levels of storage capacity. +In the last ten years we have experienced a factor of four decrease in the +cost per byte of disk storage. +In this same period of time the cost per byte of memory has dropped +by a factor of a hundred! +Thus the cost per byte of memory compared to the cost per byte of disk is +approaching a difference of only about a factor of ten. +The effect of this change is that the way in which a machine is used +is beginning to change dramatically. 
+As the amount of physical memory on machines increases and the number of +users per machine decreases, the expected +mode of operation is changing from that of supporting more active virtual +memory than physical memory to that of having a surplus of memory that can +be used for other purposes. +.PP +Because many machines will have more physical memory than they do swap +space (with diskless workstations as an extreme example!), +it is no longer reasonable to limit the maximum virtual memory +to the amount of swap space as is done in the current design. +Consequently, the new design will allow the maximum virtual memory +to be the sum of physical memory plus swap space. +For machines with no swap space, the maximum virtual memory will +be governed by the amount of physical memory. +.PP +Another effect of the current technology is that the latency and overhead +associated with accessing the file system is considerably higher +since the access must be over the network +rather than to a locally-attached disk. +One use of the surplus memory would be to +maintain a cache of recently used files; +repeated uses of these files would require at most a verification from +the file server that the data was up to date. +Under the current design, file caching is done by the buffer pool, +while the free memory is maintained in a separate pool. +The new design should have only a single memory pool so that any +free memory can be used to cache recently accessed files. +.PP +Another portion of the memory will be used to keep track of the contents +of the blocks on any locally-attached swap space analogously +to the way that memory pages are handled. +Thus inactive swap blocks can also be used to cache less-recently-used +file data. +Since the swap disk is locally attached, it can be much more quickly +accessed than a remotely located file system. 
+This design allows the user to simply allocate their entire local disk +to swap space, thus allowing the system to decide what files should +be cached to maximize its usefulness. +This design has two major benefits. +It relieves the user of deciding what files +should be kept in a small local file system. +It also insures that all modified files are migrated back to the +file server in a timely fashion, thus eliminating the need to dump +the local disk or push the files manually. +.NH +User Interface +.PP +This section outlines our new virtual memory interface as it is +currently envisioned. +The details of the system call interface are contained in Appendix A. +.NH 2 +Regions +.PP +The virtual memory interface is designed to support both large, +sparse address spaces as well as small, densely-used address spaces. +In this context, ``small'' is an address space roughly the +size of the physical memory on the machine, +while ``large'' may extend up to the maximum addressability of the machine. +A process may divide its address space up into a number of regions. +Initially a process begins with four regions; +a shared read-only fill-on-demand region with its text, +a private fill-on-demand region for its initialized data, +a private zero-fill-on-demand region for its uninitialized data and heap, +and a private zero-fill-on-demand region for its stack. +In addition to these regions, a process may allocate new ones. +The regions may not overlap and the system may impose an alignment +constraint, but the size of the region should not be limited +beyond the constraints of the size of the virtual address space. +.PP +Each new region may be mapped either as private or shared. +When it is privately mapped, changes to the contents of the region +are not reflected to any other process that maps the same region. +Regions may be mapped read-only or read-write. 
+As an example, a shared library would be implemented as two regions; +a shared read-only region for the text, and a private read-write +region for the global variables associated with the library. +.PP +A region may be allocated with one of several allocation strategies. +It may map some memory hardware on the machine such as a frame buffer. +Since the hardware is responsible for storing the data, +such regions must be exclusive use if they are privately mapped. +.PP +A region can map all or part of a file. +As the pages are first accessed, the region is filled in with the +appropriate part of the file. +If the region is mapped read-write and shared, changes to the +contents of the region are reflected back into the contents of the file. +If the region is read-write but private, +changes to the region are copied to a private page that is not +visible to other processes mapping the file, +and these modified pages are not reflected back to the file. +.PP +The final type of region is ``anonymous memory''. +Uninitialized data uses such a region, privately mapped; +it is zero-fill-on-demand and its contents are abandoned +when the last reference is dropped. +Unlike a region that is mapped from a file, +the contents of an anonymous region will never be read from or +written to a disk unless memory is short and part of the region +must be paged to a swap area. +If one of these regions is mapped shared, +then all processes see the changes in the region. +This difference has important performance considerations; +the overhead of reading, flushing, and possibly allocating a file +is much higher than simply zeroing memory. +.PP +If several processes wish to share a region, +then they must have some way of rendezvousing. +For a mapped file this is easy; +the name of the file is used as the rendezvous point. +However, processes may not need the semantics of mapped files +nor be willing to pay the overhead associated with them. 
+For anonymous memory they must use some other rendezvous point. +Our current interface allows processes to associate a +descriptor with a region, which they may then pass to other +processes that wish to attach to the region. +Such a descriptor may be bound into the UNIX file system +name space so that other processes can find it just as +they would with a mapped file. +.NH 2 +Shared memory as high speed interprocess communication +.PP +The primary use envisioned for shared memory is to +provide a high speed interprocess communication (IPC) mechanism +between cooperating processes. +Existing IPC mechanisms (\fIe.g.\fP pipes, sockets, or streams) +require a system call to hand off a set +of data destined for another process, and another system call +by the recipient process to receive the data. +Even if the data can be transferred by remapping the data pages +to avoid a memory to memory copy, the overhead of doing the system +calls limits the throughput of all but the largest transfers. +Shared memory, by contrast, allows processes to share data at any +level of granularity without system intervention. +.PP +However, to maintain all but the simplest of data structures, +the processes must serialize their modifications to shared +data structures if they are to avoid corrupting them. +This serialization is typically done with semaphores. +Unfortunately, most implementations of semaphores are +done with system calls. +Thus processes are once again limited by the need to do two +system calls per transaction, one to lock the semaphore, the +second to release it. +The net effect is that the shared memory model provides little if +any improvement in interprocess bandwidth. +.PP +To achieve a significant improvement in interprocess bandwidth +requires a large decrease in the number of system calls needed to +achieve the interaction. +In profiling applications that use +serialization locks such as the UNIX kernel, +one typically finds that most locks are not contested. 
+Thus if one can find a way to avoid doing a system call in the case +in which a lock is not contested, +one would expect to be able to dramatically reduce the number +of system calls needed to achieve serialization. +.PP +In our design, cooperating processes manage their semaphores +in their own address space. +In the typical case, a process executes an atomic test-and-set instruction +to acquire a lock, finds it free, and thus is able to get it. +Only in the (rare) case where the lock is already set does the process +need to do a system call to wait for the lock to clear. +When a process is finished with a lock, +it can clear the lock itself. +Only if the ``WANT'' flag for the lock has been set is +it necessary for the process to do a system call to cause the other +process(es) to be awakened. +.PP +Another issue that must be considered is portability. +Some computers require access to special hardware to implement +atomic interprocessor test-and-set. +For such machines the setting and clearing of locks would +all have to be done with system calls; +applications could still use the same interface without change, +though they would tend to run slowly. +.PP +The other issue of compatibility is with System V's semaphore +implementation. +Since the System V interface has been in existence for several years, +and applications have been built that depend on this interface, +it is important that this interface also be available. +Although the interface is based on system calls for both setting and +clearing locks, +the same interface can be obtained using our interface without +system calls in most cases. +.PP +This implementation can be achieved as follows. +System V allows entire sets of semaphores to be set concurrently. +If any of the locks are unavailable, the process is put to sleep +until they all become available. +Under our paradigm, a single additional semaphore is defined +that serializes access to the set of semaphores being simulated. 
+Once obtained in the usual way, the set of semaphores can be +inspected to see if the desired ones are available. +If they are available, they are set, the guardian semaphore +is released and the process proceeds. +If one or more of the requested set is not available, +the guardian semaphore is released and the process selects an +unavailable semaphore for which to wait. +On being reawakened, the whole selection process must be repeated. +.PP +In all the above examples, there appears to be a race condition. +Between the time that the process finds that a semaphore is locked, +and the time that it manages to call the system to sleep on the +semaphore another process may unlock the semaphore and issue a wakeup call. +Luckily the race can be avoided. +The insight that is critical is that the process and the kernel agree +on the physical byte of memory that is being used for the semaphore. +The system call to put a process to sleep takes a pointer +to the desired semaphore as its argument so that once inside +the kernel, the kernel can repeat the test-and-set. +If the lock has cleared +(and possibly the wakeup issued) between the time that the process +did the test-and-set and eventually got into the sleep request system call, +then the kernel immediately resumes the process rather than putting +it to sleep. +Thus the only problem to solve is how the kernel interlocks between testing +a semaphore and going to sleep; +this problem has already been solved on existing systems. +.NH +References +.sp +.IP [Babaoglu79] 20 +Babaoglu, O., and Joy, W., +``Data Structures Added in the Berkeley Virtual Memory Extensions +to the UNIX Operating System'' +Computer Systems Research Group, Dept of EECS, University of California, +Berkeley, CA 94720, USA, November 1979. +.IP [Someren84] 20 +Someren, J. van, +``Paging in Berkeley UNIX'', +Laboratorium voor schakeltechniek en techneik v.d. +informatieverwerkende machines, +Codenummer 051560-44(1984)01, February 1984. 
diff --git a/share/doc/papers/newvm/Makefile b/share/doc/papers/newvm/Makefile new file mode 100644 index 0000000..6b1a9e3 --- /dev/null +++ b/share/doc/papers/newvm/Makefile @@ -0,0 +1,9 @@ +# From: @(#)Makefile 1.4 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= papers +DOC= newvm +SRCS= 0.t 1.t a.t +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/papers/newvm/a.t b/share/doc/papers/newvm/a.t new file mode 100644 index 0000000..bb20df1 --- /dev/null +++ b/share/doc/papers/newvm/a.t @@ -0,0 +1,240 @@ +.\" Copyright (c) 1986 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)a.t 5.1 (Berkeley) 4/16/91 +.\" $FreeBSD$ +.\" +.sp 2 +.ne 2i +.NH +Appendix A \- Virtual Memory Interface +.NH 2 +Mapping pages +.PP +The system supports sharing of data between processes +by allowing pages to be mapped into memory. These mapped +pages may be \fIshared\fP with other processes or \fIprivate\fP +to the process. +Protection and sharing options are defined in \fI<sys/mman.h>\fP as: +.DS +.ta \w'#define\ \ 'u +\w'MAP_HASSEMAPHORE\ \ 'u +\w'0x0080\ \ 'u +/* protections are chosen from these bits, or-ed together */ +#define PROT_READ 0x04 /* pages can be read */ +#define PROT_WRITE 0x02 /* pages can be written */ +#define PROT_EXEC 0x01 /* pages can be executed */ +.DE +.DS +.ta \w'#define\ \ 'u +\w'MAP_HASSEMAPHORE\ \ 'u +\w'0x0080\ \ 'u +/* flags contain mapping type, sharing type and options */ +/* mapping type; choose one */ +#define MAP_FILE 0x0001 /* mapped from a file or device */ +#define MAP_ANON 0x0002 /* allocated from memory, swap space */ +#define MAP_TYPE 0x000f /* mask for type field */ +.DE +.DS +.ta \w'#define\ \ 'u +\w'MAP_HASSEMAPHORE\ \ 'u +\w'0x0080\ \ 'u +/* sharing types; choose one */ +#define MAP_SHARED 0x0010 /* share changes */ +#define MAP_PRIVATE 0x0000 /* changes are private */ +.DE +.DS +.ta \w'#define\ \ 'u +\w'MAP_HASSEMAPHORE\ \ 'u +\w'0x0080\ \ 'u +/* other flags */ +#define MAP_FIXED 0x0020 /* map addr must be exactly as requested */ +#define MAP_INHERIT 0x0040 /* region is 
retained after exec */ +#define MAP_HASSEMAPHORE 0x0080 /* region may contain semaphores */ +.DE +The cpu-dependent size of a page is returned by the +\fIgetpagesize\fP system call: +.DS +pagesize = getpagesize(); +result int pagesize; +.DE +.LP +The call: +.DS +maddr = mmap(addr, len, prot, flags, fd, pos); +result caddr_t maddr; caddr_t addr; int *len, prot, flags, fd; off_t pos; +.DE +causes the pages starting at \fIaddr\fP and continuing +for at most \fIlen\fP bytes to be mapped from the object represented by +descriptor \fIfd\fP, starting at byte offset \fIpos\fP. +The starting address of the region is returned; +for the convenience of the system, +it may differ from that supplied +unless the MAP_FIXED flag is given, +in which case the exact address will be used or the call will fail. +The actual amount mapped is returned in \fIlen\fP. +The \fIaddr\fP, \fIlen\fP, and \fIpos\fP parameters +must all be multiples of the pagesize. +A successful \fImmap\fP will delete any previous mapping +in the allocated address range. +The parameter \fIprot\fP specifies the accessibility +of the mapped pages. +The parameter \fIflags\fP specifies +the type of object to be mapped, +mapping options, and +whether modifications made to +this mapped copy of the page +are to be kept \fIprivate\fP, or are to be \fIshared\fP with +other references. +Possible types include MAP_FILE, +mapping a regular file or character-special device memory, +and MAP_ANON, which maps memory not associated with any specific file. +The file descriptor used for creating MAP_ANON regions is used only +for naming, and may be given as \-1 if no name +is associated with the region.\(dg +.FS +\(dg The current design does not allow a process +to specify the location of swap space. +In the future we may define an additional mapping type, MAP_SWAP, +in which the file descriptor argument specifies a file +or device to which swapping should be done. 
+.FE +The MAP_INHERIT flag allows a region to be inherited after an \fIexec\fP. +The MAP_HASSEMAPHORE flag allows special handling for +regions that may contain semaphores. +.PP +A facility is provided to synchronize a mapped region with the file +it maps; the call +.DS +msync(addr, len); +caddr_t addr; int len; +.DE +writes any modified pages back to the filesystem and updates +the file modification time. +If \fIlen\fP is 0, all modified pages within the region containing \fIaddr\fP +will be flushed; +if \fIlen\fP is non-zero, only the pages containing \fIaddr\fP and \fIlen\fP +succeeding locations will be examined. +Any required synchronization of memory caches +will also take place at this time. +Filesystem operations on a file that is mapped for shared modifications +are unpredictable except after an \fImsync\fP. +.PP +A mapping can be removed by the call +.DS +munmap(addr, len); +caddr_t addr; int len; +.DE +This call deletes the mappings for the specified address range, +and causes further references to addresses within the range +to generate invalid memory references. +.NH 2 +Page protection control +.PP +A process can control the protection of pages using the call +.DS +mprotect(addr, len, prot); +caddr_t addr; int len, prot; +.DE +This call changes the specified pages to have protection \fIprot\fP\|. +Not all implementations will guarantee protection on a page basis; +the granularity of protection changes may be as large as an entire region. 
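The mapping, synchronization, and protection calls described above can be tried out with their present-day POSIX descendants. The following is a minimal sketch under stated assumptions, not the interface proposed in this appendix: it uses today's POSIX signatures, in which the length is passed to mmap by value and msync takes a flags argument (MS_SYNC here), both of which differ from the proposal. The helper name roundtrip_through_mapping is hypothetical and exists only for illustration. It maps one page of a temporary file as a shared read-write region, stores a string through the mapping, flushes it with msync, downgrades the pages to read-only with mprotect, and then checks with ordinary file I/O that the change reached the file.

```c
/* Sketch only: modern POSIX calls, not the 1986 proposed interface. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write msg into a temporary file through a
 * MAP_SHARED mapping, then read it back with pread().
 * Returns 0 if the file contents match msg, -1 on any failure.
 */
int roundtrip_through_mapping(const char *msg)
{
    char buf[256];
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 0);

    size_t len = (size_t)pagesize;      /* map exactly one page */
    size_t n = strlen(msg);
    if (n >= sizeof buf || n >= len)
        return -1;                      /* message must fit, with its NUL */

    FILE *fp = tmpfile();               /* anonymous temporary file */
    if (fp == NULL)
        return -1;
    int fd = fileno(fp);
    if (ftruncate(fd, (off_t)len) != 0) {   /* give the file one page */
        fclose(fp);
        return -1;
    }

    /* Shared, read-write mapping: stores are reflected into the file. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        fclose(fp);
        return -1;
    }

    int rc = -1;
    memcpy(p, msg, n + 1);              /* modify the file via memory */

    if (msync(p, len, MS_SYNC) == 0 &&          /* push modified pages */
        mprotect(p, len, PROT_READ) == 0 &&     /* now read-only */
        pread(fd, buf, n + 1, 0) == (ssize_t)(n + 1) &&
        memcmp(buf, msg, n + 1) == 0)
        rc = 0;

    munmap(p, len);                     /* remove the mapping */
    fclose(fp);                         /* also closes fd, deletes file */
    return rc;
}
```

Replacing MAP_SHARED with MAP_PRIVATE in this sketch would be expected to make the final comparison fail, since private modifications are not reflected back to the file.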
+.NH 2 +Giving and getting advice +.PP +A process that has knowledge of its memory behavior may +use the \fImadvise\fP call: +.DS +madvise(addr, len, behav); +caddr_t addr; int len, behav; +.DE +\fIBehav\fP describes expected behavior, as given +in \fI<sys/mman.h>\fP: +.DS +.ta \w'#define\ \ 'u +\w'MADV_SEQUENTIAL\ \ 'u +\w'00\ \ \ \ 'u +#define MADV_NORMAL 0 /* no further special treatment */ +#define MADV_RANDOM 1 /* expect random page references */ +#define MADV_SEQUENTIAL 2 /* expect sequential references */ +#define MADV_WILLNEED 3 /* will need these pages */ +#define MADV_DONTNEED 4 /* don't need these pages */ +#define MADV_SPACEAVAIL 5 /* insure that resources are reserved */ +.DE +Finally, a process may obtain information about whether pages are +core resident by using the call +.DS +mincore(addr, len, vec) +caddr_t addr; int len; result char *vec; +.DE +Here the current core residency of the pages is returned +in the character array \fIvec\fP, with a value of 1 meaning +that the page is in-core. +.NH 2 +Synchronization primitives +.PP +Primitives are provided for synchronization using semaphores in shared memory. +Semaphores must lie within a MAP_SHARED region with at least modes +PROT_READ and PROT_WRITE. +The MAP_HASSEMAPHORE flag must have been specified when the region was created. +To acquire a lock a process calls: +.DS +value = mset(sem, wait) +result int value; semaphore *sem; int wait; +.DE +\fIMset\fP indivisibly tests and sets the semaphore \fIsem\fP. +If the previous value is zero, the process has acquired the lock +and \fImset\fP returns true immediately. +Otherwise, if the \fIwait\fP flag is zero, +failure is returned. +If \fIwait\fP is true and the previous value is non-zero, +\fImset\fP relinquishes the processor until notified that it should retry. +.LP +To release a lock a process calls: +.DS +mclear(sem) +semaphore *sem; +.DE +\fIMclear\fP indivisibly tests and clears the semaphore \fIsem\fP. 
+If the ``WANT'' flag is zero in the previous value, +\fImclear\fP returns immediately. +If the ``WANT'' flag is non-zero in the previous value, +\fImclear\fP arranges for waiting processes to retry before returning. +.PP +Two routines provide services analogous to the kernel +\fIsleep\fP and \fIwakeup\fP functions interpreted in the domain of +shared memory. +A process may relinquish the processor by calling \fImsleep\fP +with a set semaphore: +.DS +msleep(sem) +semaphore *sem; +.DE +If the semaphore is still set when it is checked by the kernel, +the process will be put in a sleeping state +until some other process issues an \fImwakeup\fP for the same semaphore +within the region using the call: +.DS +mwakeup(sem) +semaphore *sem; +.DE +An \fImwakeup\fP may awaken all sleepers on the semaphore, +or may awaken only the next sleeper on a queue. diff --git a/share/doc/papers/newvm/spell.ok b/share/doc/papers/newvm/spell.ok new file mode 100644 index 0000000..543dc7e --- /dev/null +++ b/share/doc/papers/newvm/spell.ok @@ -0,0 +1,56 @@ +ANON +Babaoglu +Babaoglu79 +Behav +CM +Codenummer +DONTNEED +Dept +EECS +Filesystem +HASSEMAPHORE +IPC +Karels +Laboratorium +MADV +McKusick +Mclear +Mset +NOEXTEND +PROT +SPACEAVAIL +Someren +Someren84 +WILLNEED +addr +behav +caching +caddr +es +fd +filesystem +getpagesize +informatieverwerkende +len +maddr +madvise +mclear +mincore +mman.h +mmap +mprotect +mset +msleep +msync +munmap +mwakeup +pagesize +pos +prot +runtime +schakeltechniek +sem +techneik +v.d +vec +voor diff --git a/share/doc/papers/relengr/0.t b/share/doc/papers/relengr/0.t new file mode 100644 index 0000000..569be69 --- /dev/null +++ b/share/doc/papers/relengr/0.t @@ -0,0 +1,92 @@ +.\" Copyright (c) 1989 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. 
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)0.t 5.1 (Berkeley) 4/17/91 +.\" $FreeBSD$ +.\" +.rm CM +.nr PO 1.25i +.ds CH " +.ds CF "% +.nr Fn 0 1 +.ds b3 4.3\s-1BSD\s+1 +.de KI +.ds Lb "Fig. \\n+(Fn +.KF +.ce 1 +Figure \\n(Fn - \\$1. +.. +.de SM +\\s-1\\$1\\s+1\\$2 +.. +.de NM +\&\fI\\$1\fP\\$2 +.. +.de RN +\&\fI\\$1\fP\^(\^)\\$2 +.. +.de PN +\&\fB\\$1\fP\\$2 +.. 
+.TL +The Release Engineering of 4.3\s-1BSD\s0 +.AU +Marshall Kirk McKusick +.AU +Michael J. Karels +.AU +Keith Bostic +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, California 94720 +.AB +This paper describes an approach used by a small group of people +to develop and integrate a large software system. +It details the development and release engineering strategy +used during the preparation of the \*(b3 version of the UNIX\(dg +.FS +\(dgUNIX is a registered trademark of AT&T in the US and other countries. +.FE +operating system. +Each release cycle is divided into an initial development phase +followed by a release engineering phase. +The release engineering of the distribution is done in three steps. +The first step has an informal control policy for tracking modifications; +it results in an alpha distribution. +The second step has more rigid change mechanisms in place; +it results in a beta release. +During the final step changes are tracked very closely; +the result is the final distribution. +.AE +.LP diff --git a/share/doc/papers/relengr/1.t b/share/doc/papers/relengr/1.t new file mode 100644 index 0000000..6fbe287 --- /dev/null +++ b/share/doc/papers/relengr/1.t @@ -0,0 +1,69 @@ +.\" Copyright (c) 1989 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.t 5.1 (Berkeley) 4/17/91 +.\" +.NH +Introduction +.PP +The Computer Systems Research Group (\c +.SM CSRG ) +has always been a small group of software developers. +This resource limitation requires careful software-engineering management +as well as careful coordination of both +.SM CSRG +personnel and the members of the general community who +contribute to the development of the system. +.PP +Releases from Berkeley alternate between those that introduce +major new facilities and those that provide bug fixes and efficiency +improvements. +This alternation allows timely releases, while providing for refinement, +tuning, and correction of the new facilities. 
+The timely followup of ``cleanup'' releases reflects the importance +.SM CSRG +places on providing a reliable and robust system on which its +user community can depend. +.PP +The development of the Berkeley Software Distribution (\c +.SM BSD ) +illustrates an \fIadvantage\fP of having a few +principal developers: +the developers all understand the entire system thoroughly enough +to be able to coordinate their own work with +that of other people to produce a coherent final system. +Companies with large development organizations find +this result difficult to duplicate. +This paper describes the process by which +the development effort for \*(b3 was managed. +.[ +design and implementation +.] diff --git a/share/doc/papers/relengr/2.t b/share/doc/papers/relengr/2.t new file mode 100644 index 0000000..0c3ce8c --- /dev/null +++ b/share/doc/papers/relengr/2.t @@ -0,0 +1,146 @@ +.\" Copyright (c) 1989 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)2.t 5.1 (Berkeley) 4/17/91 +.\" +.NH +System Development +.PP +The first phase of each Berkeley system is its development. +.SM CSRG +maintains a continuously evolving list of projects that are candidates +for integration into the system. +Some of these are prompted by emerging ideas from the research world, +such as the availability of a new technology, while other additions +are suggested by the commercial world, such as the introduction of +new standards like +.SM POSIX , +and still other projects are emergency responses to situations like +the Internet Worm. +.PP +These projects are ordered based on the perceived benefit of the +project as opposed to its difficulty; +the most important are selected for inclusion in each new release. +Often there is a prototype available from a group outside +.SM CSRG . +Because of the limited staff at +.SM CSRG , +this prototype is obtained to use as a starting base +for integration into the +.SM BSD +system. +Only if no prototype is available is the project begun in-house. +In either case, the design of the facility is forced to conform to the +.SM CSRG +style. 
+.PP +Unlike other development groups, the staff of +.SM CSRG +specializes by projects rather than by particular parts +of the system; +a staff person will be responsible for all aspects of a project. +This responsibility starts at the associated kernel device drivers; +it proceeds up through the rest of the kernel, +through the C library and system utility programs, +ending at the user application layer. +This staff person is also responsible for related documentation, +including manual pages. +Many projects proceed in parallel, +interacting with other projects as their paths cross. +.PP +All source code, documentation, and auxiliary files are kept +under a source code control system. +During development, +this control system is critical for notifying people +when they are colliding with other ongoing projects. +Even more important, however, +is the audit trail maintained by the control system that +is critical to the release engineering phase of the project +described in the next section. +.PP +Much of the development of +.SM BSD +is done by personnel that are located at other institutions. +Many of these people not only have interim copies of the release +running on their own machines, +but also have user accounts on the main development +machine at Berkeley. +Such users are commonly found logged in at Berkeley over the +Internet, or sometimes via telephone dialup, from places as far away +as Massachusetts or Maryland, as well as from closer places, such as +Stanford. +For the \*(b3 release, +certain users had permission to modify the master copy of the +system source directly. +People given access to the master sources +are carefully screened beforehand, +but are not closely supervised. +Their work is checked at the end of the beta-test period by +.SM CSRG +personnel who back out inappropriate changes. 
+Several facilities, including the +Fortran and C compilers, +as well as important system programs, for example, +.PN telnet +and +.PN ftp , +include significant contributions from people who did not work +directly for +.SM CSRG . +One important exception to this approach is that changes to the kernel +are made only by +.SM CSRG +personnel, although the changes are often suggested by the larger community. +.PP +The development phase continues until +.SM CSRG +decides that it is appropriate to make a release. +The decision to halt development and transition to release mode +is driven by several factors. +The most important is that enough projects have been completed +to make the system significantly superior to the previously released +version of the system. +For example, +\*(b3 was released primarily because of the need for +the improved networking capabilities and the markedly +improved system performance. +Of secondary importance is the issue of timing. +If the releases are too infrequent, then +.SM CSRG +will be inundated with requests for interim releases. +Conversely, +if systems are released too frequently, +the integration cost for many vendors will be too high, +causing them to ignore the releases. +Finally, +the process of release engineering is long and tedious. +Frequent releases slow the rate of development and +cause undue tedium to the staff. diff --git a/share/doc/papers/relengr/3.t b/share/doc/papers/relengr/3.t new file mode 100644 index 0000000..8d89ded --- /dev/null +++ b/share/doc/papers/relengr/3.t @@ -0,0 +1,390 @@ +.\" Copyright (c) 1989 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)3.t 5.1 (Berkeley) 4/17/91 +.\" +.NH +System Release +.PP +Once the decision has been made to halt development +and begin release engineering, +all currently unfinished projects are evaluated. +This evaluation involves computing the time required to complete +the project as opposed to how important the project is to the +upcoming release. 
+Projects that are not selected for completion are +removed from the distribution branch of the source code control system +and saved on branch deltas so they can be retrieved, +completed, and merged into a future release; +the remaining unfinished projects are brought to orderly completion. +.PP +Developments from +.SM CSRG +are released in three steps: alpha, beta, and final. +Alpha and beta releases are not true distributions\(emthey +are test systems. +Alpha releases are normally available to only a few sites, +usually those working closely with +.SM CSRG . +More sites are given beta releases, +as the system is closer to completion, +and needs wider testing to find more obscure problems. +For example, \*(b3 alpha was distributed to about fifteen +sites, while \*(b3 beta ran at more than a hundred. +.NH 2 +Alpha Distribution Development +.PP +The first step in creating an alpha distribution is to evaluate the +existing state of the system and to decide what software should be +included in the release. +This decision process includes not only deciding what software should +be added, but also what obsolete software ought to be retired from the +distribution. +The new software includes the successful projects that have been +completed at +.SM CSRG +and elsewhere, as well as some portion of the vast quantity of +contributed software that has been offered during the development +period. +.PP +Once an initial list has been created, +a prototype filesystem corresponding to the distribution +is constructed, typically named +.PN /nbsd . +This prototype will eventually turn into the master source tree for the +final distribution. +During the period that the alpha distribution is being created, +.PN /nbsd +is mounted read-write, and is highly fluid. +Programs are created and deleted, +old versions of programs are completely replaced, +and the correspondence between the sources and binaries +is only loosely tracked. 
+People outside +.SM CSRG +who are helping with the distribution are free to +change their parts of the distribution at will. +.PP +During this period the newly forming distribution is +checked for interoperability. +For example, +in \*(b3 the output of context differences from +.PN diff +was changed to merge overlapping sections. +Unfortunately, this change broke the +.PN patch +program which could no longer interpret the output of +.PN diff . +Since the change to +.PN diff +and the +.PN patch +program had originated outside Berkeley, +.SM CSRG +had to coordinate the efforts of the respective authors +to make the programs work together harmoniously. +.PP +Once the sources have stabilized, +an attempt is made to compile the entire source tree. +Often this exposes errors caused by changed header files, +or use of obsoleted C library interfaces. +If the incompatibilities affect too many programs, +or require excessive amounts of change in the programs +that are affected, +the incompatibility is backed out or some backward-compatible +interface is provided. +The incompatibilities that are found and left in are noted +in a list that is later incorporated into the release notes. +Thus, users upgrading to the new system can anticipate problems +in their own software that will require change. +.PP +Once the source tree compiles completely, +it is installed and becomes the running system that +.SM CSRG +uses on its main development machine. +Once in day-to-day use, +other interoperability problems become apparent +and are resolved. +When all known problems have been resolved, and the system has been +stable for some period of time, an alpha distribution tape is made +from the contents of +.PN /nbsd . +.PP +The alpha distribution is sent out to a small set of test sites. +These test sites are selected as having a +sophisticated user population, not only capable of finding bugs, +but also of determining their cause and developing a fix for the problem. 
+These sites are usually composed of groups that are contributing +software to the distribution or groups that have a particular expertise +with some portion of the system. +.NH 2 +Beta Distribution Development +.PP +After the alpha tape is created, +the distribution filesystem is mounted read-only. +Further changes are requested in a change log rather than +being made directly to the distribution. +The change requests are inspected and implemented by a +.SM CSRG +staff person, followed by a compilation of the affected +programs to ensure that they still build correctly. +Once the alpha tape has been cut, +changes to the distribution are no longer made by people outside +.SM CSRG . +.PP +As the alpha sites install and begin running the alpha distribution, +they monitor the problems that they encounter. +For minor bugs, they typically report back the bug along with +a suggested fix. +Since many of the alpha sites are selected from among the people +working closely with +.SM CSRG , +they often have accounts on, and access to, the primary +.SM CSRG +development machine. +Thus, they are able to directly install the fix themselves, +and simply notify +.SM CSRG +when they have fixed the problem. +After verifying the fix, the affected files are added to +the list to be updated on +.PN /nbsd . +.PP +The more important task of the alpha sites is to test out the +new facilities that have been added to the system. +The alpha sites often find major design flaws +or operational shortcomings of the facilities. +When such problems are found, +the person in charge of that facility is responsible +for resolving the problem. +Occasionally this requires redesigning and reimplementing +parts of the affected facility. +For example, +in 4.2\s-1BSD\s+1, +the alpha release of the networking system did not have connection queueing. +This shortcoming prevented the network from handling many +connections to a single server. 
+The result was that the networking interface had to be +redesigned to provide this functionality. +.PP +The alpha sites are also responsible for ferreting out interoperability +problems between different utilities. +The user populations of the test sites differ from the user population at +.SM CSRG , +and, as a result, the utilities are exercised in ways that differ +from the ways that they are used at +.SM CSRG . +These differences in usage patterns turn up problems that +do not occur in our initial test environment. +.PP +The alpha sites frequently redistribute the alpha tape to several +of their own alpha sites that are particularly interested +in parts of the new system. +These additional sites are responsible for reporting +problems back to the site from which they received the distribution, +not to +.SM CSRG . +Often these redistribution sites are less sophisticated than the +direct alpha sites, so their reports need to be filtered +to avoid spurious, or site dependent, bug reports. +The direct alpha sites sift through the reports to find those that +are relevant, and usually verify the suggested fix if one is given, +or develop a fix if none is provided. +This hierarchical testing process forces +bug reports, fixes, and new software +to be collected, evaluated, and checked for inaccuracies +by first-level sites before being forwarded to +.SM CSRG , +allowing the developers at +.SM CSRG +to concentrate on tracking the changes being made to the system +rather than sifting through information (often voluminous) from every +alpha-test site. +.PP +Once the major problems have been attended to, +the focus turns to getting the documentation synchronized +with the code that is being shipped. +The manual pages need to be checked to be sure that +they accurately reflect any changes to the programs that +they describe. +Usually the manual pages are kept up to date as +the program they describe evolves. 
+However, the supporting documents frequently do not get changed, +and must be edited to bring them up to date. +During this review, the need for other documents becomes evident. +For example, it was +during this phase of \*(b3 that it was decided +to add a tutorial document on how to use the socket +interprocess communication primitives. +.PP +Another task during this period is to contact the people that +have contributed complete software packages +(such as +.PN RCS +or +.PN MH ) +in previous releases to see if they wish to +make any revisions to their software. +For those who do, +the new software has to be obtained, +and tested to verify that it compiles and runs +correctly on the system to be released. +Again, this integration and testing can often be done by the +contributors themselves by logging directly into the master machine. +.PP +After the stream of bug reports has slowed down +to a reasonable level, +.SM CSRG +begins a careful review of all the changes to the +system since the previous release. +The review is done by running a recursive +.PN diff +of the entire source tree\(emhere, of +.PN /nbsd +with 4.2\s-1BSD\s+1. +All the changes are checked to ensure that they are reasonable, +and have been properly documented. +The process often turns up questionable changes. +When such a questionable change is found, +the source code control system log is examined to find +out who made the change and what their explanation was +for the change. +If the log does not resolve the problem, +the person responsible for the change is asked for an explanation +of what they were trying to accomplish. +If the reason is not compelling, +the change is backed out. +Facilities deemed inappropriate in \*(b3 included new options to +the directory-listing command and a changed return value for the +.RN fseek +library routine; +the changes were removed from the source before final distribution. 
+Although this process is long and tedious, +it forces the developers to obtain a coherent picture of the entire set of +changes to the system. +This exercise often turns up inconsistencies that would +otherwise never be found. +.PP +The comparison results in +a pair of documents detailing +changes to every user-level command +.[ +Bug Fixes and Changes +.] +and to every kernel source file. +.[ +Changes to the Kernel +.] +These documents are delivered with the final distribution. +A user can look up any command by name and see immediately +what has changed, +and a developer can similarly look up any kernel +file by name and get a summary of the changes to that file. +.PP +Once the review of the entire system is complete, +the preparation of the beta distribution is started. +Unlike the alpha distribution, where pieces of the system +may be unfinished and the documentation incomplete, +the beta distribution is put together as if it were +going to be the final distribution. +All known problems are fixed, and any remaining development +is completed. +Once the beta tape has been prepared, +no further changes are permitted to +.PN /nbsd +without careful review, +as spurious changes made after the system has been +.PN diff ed +are unlikely to be caught. +.NH 2 +Final Distribution Development +.PP +The beta distribution goes to more sites than the +alpha distribution for three main reasons. +First, as it is closer to the final release, more sites are willing +to run it in a production environment without fear of catastrophic failures. +Second, more commercial sites delivering +.SM BSD -\c +derived systems are interested in getting a preview of the +upcoming changes in preparation for merging them into their +own systems. +Finally, because the beta tape has fewer problems, +it is beneficial to offer it to more sites in hopes of +finding as many of the remaining problems as possible. 
+Also, by handing the system out to less sophisticated sites, +issues that would be ignored by the users of the alpha sites +become apparent. +.PP +The anticipation is that the beta tape will not require +extensive changes to either the programs or the documentation. +Most of the work involves sifting through the reported bugs +to find those that are relevant and devising the minimal +reasonable set of changes to fix them. +After each fix has been thoroughly tested, it is listed in the update log for +.PN /nbsd . +One person at +.SM CSRG +is responsible for doing the update of +.PN /nbsd +and ensuring that everything affected by the change is rebuilt and tested. +Thus, a change to a C library routine requires that the entire +system be rebuilt. +.PP +During this period, the documentation is all printed and proofread. +As minor changes are made to the manual pages and documentation, +the affected pages must be reprinted. +.PP +The final step in the release process is to check the distribution tree +to ensure that it is in a consistent state. +This step includes verification that every file and directory +on the distribution has the proper owner, group, and modes. +All source files must be checked to be sure that they have +appropriate copyright notices and source code control system headers. +Any extraneous files must be removed. +Finally, the installed binaries must be checked to ensure that they correspond +exactly to the sources and libraries that are on the distribution. +.PP +This checking is a formidable task given that there are over 20,000 files on +a typical distribution. +Much of the checking can be done by a set of programs set to scan +over the distribution tree. +Unfortunately, the exception list is long, and requires +hours of tedious hand checking; this has caused +.SM CSRG +to develop even +more comprehensive validation programs for use in our next release. 
+.PP +Once the final set of checks has been run, +the master tape can be made, and the official distribution started. +As for the staff of +.SM CSRG , +we usually take a brief vacation before plunging back into +a new development phase. diff --git a/share/doc/papers/relengr/Makefile b/share/doc/papers/relengr/Makefile new file mode 100644 index 0000000..88ab5af --- /dev/null +++ b/share/doc/papers/relengr/Makefile @@ -0,0 +1,15 @@ +# From: @(#)Makefile 1.6 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= papers +DOC= releng +SRCS= stubs 0.t 1.t 2.t 3.t +EXTRA= ref.bib +MACROS= -ms +USE_REFER= +CLEANFILES= stubs + +stubs: + @(echo .R1; echo database ${.CURDIR}/ref.bib; echo .R2) > ${.TARGET} + +.include <bsd.doc.mk> diff --git a/share/doc/papers/relengr/ref.bib b/share/doc/papers/relengr/ref.bib new file mode 100644 index 0000000..6f33cd7 --- /dev/null +++ b/share/doc/papers/relengr/ref.bib @@ -0,0 +1,26 @@ +%A M. K. McKusick +%A J. M. Bloom +%A M. J. Karels +%T Bug Fixes and Changes in 4.3BSD +%B \s-1UNIX\s0 System Manager's Manual, 4.3 Berkeley Software Distribution, Virtual VAX-11 Version +%I \s-1USENIX\s0 Association +%C Berkeley, CA +%P 12:1\-22 +%D 1986 + +%A M. J. Karels +%T Changes to the Kernel in 4.3BSD +%B \s-1UNIX\s0 System Manager's Manual, 4.3 Berkeley Software Distribution, Virtual VAX-11 Version +%I \s-1USENIX\s0 Association +%C Berkeley, CA +%P 13:1\-32 +%D 1986 + +%A S. J. Leffler +%A M. K. McKusick +%A M. J. Karels +%A J. S. 
Quarterman +%T The Design and Implementation of the 4.3BSD UNIX Operating System +%I Addison-Wesley +%C Reading, MA +%D 1989 diff --git a/share/doc/papers/relengr/spell.ok b/share/doc/papers/relengr/spell.ok new file mode 100644 index 0000000..13f5cf8 --- /dev/null +++ b/share/doc/papers/relengr/spell.ok @@ -0,0 +1,15 @@ +BSD +Bostic +CH +CM +CSRG +Fn +Karels +Lb +McKusick +POSIX +editted +filesystem +followup +mothballed +nbsd diff --git a/share/doc/papers/sysperf/0.t b/share/doc/papers/sysperf/0.t new file mode 100644 index 0000000..0c27a34 --- /dev/null +++ b/share/doc/papers/sysperf/0.t @@ -0,0 +1,247 @@ +.\" Copyright (c) 1985 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)0.t 5.1 (Berkeley) 4/17/91 +.\" +.if n .ND +.TL +Measuring and Improving the Performance of Berkeley UNIX* +.sp +April 17, 1991 +.AU +Marshall Kirk McKusick, +Samuel J. Leffler\(dg, +Michael J. Karels +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, CA 94720 +.AB +.FS +* UNIX is a trademark of AT&T Bell Laboratories. +.FE +.FS +\(dg Samuel J. Leffler is currently employed by: +Silicon Graphics, Inc. +.FE +.FS +This work was done under grants from +the National Science Foundation under grant MCS80-05144, +and the Defense Advanced Research Projects Agency (DoD) under +ARPA Order No. 4031 monitored by Naval Electronic System Command under +Contract No. N00039-82-C-0235. +.FE +The 4.2 Berkeley Software Distribution of +.UX +for the VAX\(dd +.FS +\(dd VAX, MASSBUS, UNIBUS, and DEC are trademarks of +Digital Equipment Corporation. +.FE +had several problems that could severely affect the overall +performance of the system. +These problems were identified with +kernel profiling and system tracing during day to day use. +Once potential problem areas had been identified, +benchmark programs were devised to highlight the bottlenecks. +These benchmarks verified that the problems existed and provided +a metric against which to validate proposed solutions.
+This paper examines +the performance problems encountered and describes +modifications that have been made +to the system since the initial distribution. +.PP +The changes to the system have consisted of improvements to the +performance of the existing facilities, +as well as enhancements to the current facilities. +Performance improvements in the kernel include cacheing of path name +translations, reductions in clock handling and scheduling overhead, +and improved throughput of the network subsystem. +Performance improvements in the libraries and utilities include replacement of +linear searches of system databases with indexed lookup, +merging of most network services into a single daemon, +and conversion of system utilities to use the more efficient +facilities available in 4.2BSD. +Enhancements in the kernel include the addition of subnets and gateways, +increases in many kernel limits, +cleanup of the signal and autoconfiguration implementations, +and support for windows and system logging. +Functional extensions in the libraries and utilities include +the addition of an Internet name server, +new system management tools, +and extensions to \fIdbx\fP to work with Pascal. +The paper concludes with a brief discussion of changes made to +the system to enhance security. +All of these enhancements are present in Berkeley UNIX 4.3BSD. +.AE +.LP +.sp 2 +CR Categories and Subject Descriptors: +D.4.3 +.B "[Operating Systems]": +File Systems Management \- +.I "file organization, directory structures, access methods"; +D.4.8 +.B "[Operating Systems]": +Performance \- +.I "measurements, operational analysis"; +.sp +Additional Keywords and Phrases: +Berkeley UNIX, +system performance, +application program interface. +.sp +General Terms: +UNIX operating system, +measurement, +performance. +.de PT +.lt \\n(LLu +.pc % +.nr PN \\n% +.tl '\\*(LH'\\*(CH'\\*(RH' +.lt \\n(.lu +.. 
+.af PN i +.ds LH Performance +.ds RH Contents +.bp 1 +.if t .ds CF April 17, 1991 +.if t .ds LF DRAFT +.if t .ds RF McKusick, et al. +.ce +.B "TABLE OF CONTENTS" +.LP +.sp 1 +.nf +.B "1. Introduction" +.LP +.sp .5v +.nf +.B "2. Observation techniques +\0.1. System maintenance tools +\0.2. Kernel profiling +\0.3. Kernel tracing +\0.4. Benchmark programs +.LP +.sp .5v +.nf +.B "3. Results of our observations +\0.1. User programs +\0.1.1. Mail system +\0.1.2. Network servers +\0.2. System overhead +\0.2.1. Micro-operation benchmarks +\0.2.2. Path name translation +\0.2.3. Clock processing +\0.2.4. Terminal multiplexors +\0.2.5. Process table management +\0.2.6. File system buffer cache +\0.2.7. Network subsystem +\0.2.8. Virtual memory subsystem +.LP +.sp .5v +.nf +.B "4. Performance Improvements +\0.1. Performance Improvements in the Kernel +\0.1.1. Name Cacheing +\0.1.2. Intelligent Auto Siloing +\0.1.3. Process Table Management +\0.1.4. Scheduling +\0.1.5. Clock Handling +\0.1.6. File System +\0.1.7. Network +\0.1.8. Exec +\0.1.9. Context Switching +\0.1.10. Setjmp and Longjmp +\0.1.11. Compensating for Lack of Compiler Technology +\0.2. Improvements to Libraries and Utilities +\0.2.1. Hashed Databases +\0.2.2. Buffered I/O +\0.2.3. Mail System +\0.2.4. Network Servers +\0.2.5. The C Run-time Library +\0.2.6. Csh +.LP +.sp .5v +.nf +.B "5. Functional Extensions +\0.1. Kernel Extensions +\0.1.1. Subnets, Broadcasts, and Gateways +\0.1.2. Interface Addressing +\0.1.3. User Control of Network Buffering +\0.1.4. Number of File Descriptors +\0.1.5. Kernel Limits +\0.1.6. Memory Management +\0.1.7. Signals +\0.1.8. System Logging +\0.1.9. Windows +\0.1.10. Configuration of UNIBUS Devices +\0.1.11. Disk Recovery from Errors +\0.2. Functional Extensions to Libraries and Utilities +\0.2.1. Name Server +\0.2.2. System Management +\0.2.3. Routing +\0.2.4. Compilers +.LP +.sp .5v +.nf +.B "6. Security Tightening +\0.1. Generic Kernel +\0.2.
Security Problems in Utilities +.LP +.sp .5v +.nf +.B "7. Conclusions +.LP +.sp .5v +.nf +.B Acknowledgements +.LP +.sp .5v +.nf +.B References +.LP +.sp .5v +.nf +.B "Appendix \- Benchmark Programs" +.de _d +.if t .ta .6i 2.1i 2.6i +.\" 2.94 went to 2.6, 3.64 to 3.30 +.if n .ta .84i 2.6i 3.30i +.. +.de _f +.if t .ta .5i 1.25i 2.5i +.\" 3.5i went to 3.8i +.if n .ta .7i 1.75i 3.8i +.. diff --git a/share/doc/papers/sysperf/1.t b/share/doc/papers/sysperf/1.t new file mode 100644 index 0000000..88608ee --- /dev/null +++ b/share/doc/papers/sysperf/1.t @@ -0,0 +1,81 @@ +.\" Copyright (c) 1985 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.t 5.1 (Berkeley) 4/17/91 +.\" +.ds RH Introduction +.af PN 1 +.bp 1 +.NH +Introduction +.PP +The Berkeley Software Distributions of +.UX +for the VAX have added many new capabilities that were +previously unavailable under +.UX . +The development effort for 4.2BSD concentrated on providing new +facilities, and in getting them to work correctly. +Many new data structures were added to the system to support +these new capabilities. +In addition, +many of the existing data structures and algorithms +were put to new uses or their old functions placed under increased demand. +The effect of these changes was that +mechanisms that were well tuned under 4.1BSD +no longer provided adequate performance for 4.2BSD. +The increased user feedback that came with the release of +4.2BSD and a growing body of experience with the system +highlighted the performance shortcomings of 4.2BSD. +.PP +This paper details the work that we have done since +the release of 4.2BSD to measure the performance of the system, +detect the bottlenecks, +and find solutions to remedy them. +Most of our tuning has been in the context of the real +timesharing systems in our environment. +Rather than using simulated workloads, +we have sought to analyze our tuning efforts under +realistic conditions. 
+Much of the work has been done in the machine independent parts +of the system, hence these improvements could be applied to +other variants of UNIX with equal success. +All of the changes made have been included in 4.3BSD. +.PP +Section 2 of the paper describes the tools and techniques +available to us for measuring system performance. +In Section 3 we present the results of using these tools, while Section 4 +has the performance improvements +that have been made to the system based on our measurements. +Section 5 highlights the functional enhancements that have +been made to Berkeley UNIX 4.2BSD. +Section 6 discusses some of the security problems that +have been addressed. diff --git a/share/doc/papers/sysperf/2.t b/share/doc/papers/sysperf/2.t new file mode 100644 index 0000000..703cbb6 --- /dev/null +++ b/share/doc/papers/sysperf/2.t @@ -0,0 +1,258 @@ +.\" Copyright (c) 1985 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)2.t 5.1 (Berkeley) 4/17/91 +.\" +.ds RH Observation techniques +.NH +Observation techniques +.PP +There are many tools available for monitoring the performance +of the system. +Those that we found most useful are described below. +.NH 2 +System maintenance tools +.PP +Several standard maintenance programs are invaluable in +observing the basic actions of the system. +The \fIvmstat\fP(1) +program is designed to be an aid to monitoring +systemwide activity. Together with the +\fIps\fP\|(1) +command (as in ``ps av''), it can be used to investigate systemwide +virtual memory activity. +By running \fIvmstat\fP +when the system is active you can judge the system activity in several +dimensions: job distribution, virtual memory load, paging and swapping +activity, disk and cpu utilization. +Ideally, to have a balanced system in activity, +there should be few blocked (b) jobs, +there should be little paging or swapping activity, there should +be available bandwidth on the disk devices (most single arms peak +out at 25-35 tps in practice), and the user cpu utilization (us) should +be high (above 50%). 
+.PP +If the system is busy, then the count of active jobs may be large, +and several of these jobs may often be blocked (b). If the virtual +memory is active, then the paging demon will be running (sr will +be non-zero). It is healthy for the paging demon to free pages when +the virtual memory gets active; it is triggered by the amount of free +memory dropping below a threshold and increases its pace as free memory +goes to zero. +.PP +If you run \fIvmstat\fP +when the system is busy (a ``vmstat 5'' gives all the +numbers computed by the system), you can find +imbalances by noting abnormal job distributions. If many +processes are blocked (b), then the disk subsystem +is overloaded or imbalanced. If you have several non-dma +devices or open teletype lines that are ``ringing'', or user programs +that are doing high-speed non-buffered input/output, then the system +time may go high (60-80% or higher). +It is often possible to pin down the cause of high system time by +looking to see if there is excessive context switching (cs), interrupt +activity (in) or system call activity (sy). Long term measurements +on one of +our large machines show +an average of 60 context switches and interrupts +per second and an average of 90 system calls per second. +.PP +If the system is heavily loaded, or if you have little memory +for your load (1 megabyte is little in our environment), then the system +may be forced to swap. This is likely to be accompanied by a noticeable +reduction in the system responsiveness and long pauses when interactive +jobs such as editors swap out. +.PP +A second important program is \fIiostat\fP\|(1). +\fIIostat\fP +iteratively reports the number of characters read and written to terminals, +and, for each disk, the number of transfers per second, kilobytes +transferred per second, +and the milliseconds per average seek. 
+It also gives the percentage of time the system has +spent in user mode, in user mode running low priority (niced) processes, +in system mode, and idling. +.PP +To compute this information, for each disk, seeks and data transfer completions +and the number of words transferred are counted; +for terminals collectively, the number +of input and output characters are counted. +Also, every 100 ms, +the state of each disk is examined +and a tally is made if the disk is active. +From these numbers and the transfer rates +of the devices it is possible to determine +average seek times for each device. +.PP +When filesystems are poorly placed on the available +disks, figures reported by \fIiostat\fP can be used +to pinpoint bottlenecks. Under heavy system load, disk +traffic should be spread out among the drives with +higher traffic expected to the devices where the root, swap, and +/tmp filesystems are located. When multiple disk drives are +attached to the same controller, the system will +attempt to overlap seek operations with I/O transfers. When +seeks are performed, \fIiostat\fP will show +non-zero average seek times. Most modern disk drives should +exhibit an average seek time of 25-35 ms. +.PP +Terminal traffic reported by \fIiostat\fP should be heavily +output oriented unless terminal lines are being used for +data transfer by programs such as \fIuucp\fP. Input and +output rates are system specific. Screen editors +such as \fIvi\fP and \fIemacs\fP tend to exhibit output/input +ratios of anywhere from 5/1 to 8/1. On one of our largest +systems, 88 terminal lines plus 32 pseudo terminals, we observed +an average of 180 characters/second input and 450 characters/second +output over 4 days of operation. +.NH 2 +Kernel profiling +.PP +It is simple to build a 4.2BSD kernel that will automatically +collect profiling information as it operates simply by specifying the +.B \-p +option to \fIconfig\fP\|(8) when configuring a kernel. 
+The program counter sampling can be driven by the system clock, +or by an alternate real time clock. +The latter is highly recommended as use of the system clock results +in statistical anomalies in accounting for +the time spent in the kernel clock routine. +.PP +Once a profiling system has been booted, statistics gathering is +handled by \fIkgmon\fP\|(8). +\fIKgmon\fP allows profiling to be started and stopped +and the internal state of the profiling buffers to be dumped. +\fIKgmon\fP can also be used to reset the state of the internal +buffers to allow multiple experiments to be run without +rebooting the machine. +.PP +The profiling data is processed with \fIgprof\fP\|(1) +to obtain information regarding the system's operation. +Profiled systems maintain histograms of the kernel program counter, +the number of invocations of each routine, +and a dynamic call graph of the executing system. +The postprocessing propagates the time spent in each +routine along the arcs of the call graph. +\fIGprof\fP then generates a listing for each routine in the kernel, +sorted according to the time it uses +including the time of its call graph descendents. +Below each routine entry is shown its (direct) call graph children, +and how their times are propagated to this routine. +A similar display above the routine shows how this routine's time and the +time of its descendents is propagated to its (direct) call graph parents. +.PP +A profiled system is about 5-10% larger in its text space because of +the calls to count the subroutine invocations. +When the system executes, +the profiling data is stored in a buffer that is 1.2 +times the size of the text space. +All the information is summarized in memory, +so it is not necessary to have a trace file +being continuously dumped to disk. +The overhead for running a profiled system varies; +under normal load we see anywhere from 5-25% +of the system time spent in the profiling code.
+Thus the system is noticeably slower than an unprofiled system, +yet is not so bad that it cannot be used in a production environment. +This is important since it allows us to gather data +in a real environment rather than trying to +devise synthetic work loads. +.NH 2 +Kernel tracing +.PP +The kernel can be configured to trace certain operations by +specifying ``options TRACE'' in the configuration file. This +forces the inclusion of code that records the occurrence of +events in \fItrace records\fP in a circular buffer in kernel +memory. Events may be enabled/disabled selectively while the +system is operating. Each trace record contains a time stamp +(taken from the VAX hardware time of day clock register), an +event identifier, and additional information that is interpreted +according to the event type. Buffer cache operations, such as +initiating a read, include +the disk drive, block number, and transfer size in the trace record. +Virtual memory operations, such as a pagein completing, include +the virtual address and process id in the trace record. The circular +buffer is normally configured to hold 256 16-byte trace records.\** +.FS +\** The standard trace facilities distributed with 4.2 +differ slightly from those described here. The time stamp in the +distributed system is calculated from the kernel's time of day +variable instead of the VAX hardware register, and the buffer cache +trace points do not record the transfer size. +.FE +.PP +Several user programs were written to sample and interpret the +tracing information. One program runs in the background and +periodically reads the circular buffer of trace records. The +trace information is compressed, in some instances interpreted +to generate additional information, and a summary is written to a +file. In addition, the sampling program can also record +information from other kernel data structures, such as those +interpreted by the \fIvmstat\fP program. 
Data written out to +a file is further buffered to minimize I/O load. +.PP +Once a trace log has been created, programs that compress +and interpret the data may be run to generate graphs showing the +data and relationships between traced events and +system load. +.PP +The trace package was used mainly to investigate the operation of +the file system buffer cache. The sampling program maintained a +history of read-ahead blocks and used the trace information to +calculate, for example, percentage of read-ahead blocks used. +.NH 2 +Benchmark programs +.PP +Benchmark programs were used in two ways. First, a suite of +programs was constructed to calculate the cost of certain basic +system operations. Operations such as system call overhead and +context switching time are critically important in evaluating the +overall performance of a system. Because of the drastic changes in +the system between 4.1BSD and 4.2BSD, it was important to verify +that the overhead of these low level operations had not changed appreciably. +.PP +The second use of benchmarks was in exercising +suspected bottlenecks. +When we suspected a specific problem with the system, +a small benchmark program was written to repeatedly use +the facility. +While these benchmarks are not useful as a general tool +they can give quick feedback on whether a hypothesized +improvement is really having an effect. +It is important to realize that the only real assurance +that a change has a beneficial effect is through +long term measurements of general timesharing. +We have numerous examples where a benchmark program +suggests vast improvements while the change +in the long term system performance is negligible, +and conversely examples in which the benchmark program runs more slowly, +but the long term system performance improves significantly.
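Editorial illustration (not part of the committed file): the micro-benchmark idea described above, timing many repetitions of one basic operation, can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's benchmark code, which was written in C and is listed in its Appendix. The names time_operation, noop and syscall_cost are hypothetical, and subtracting an empty-call baseline is an assumption made here to separate loop overhead from the cost of the operation itself.

```python
import os
import time


def time_operation(operation, iterations):
    """Return the mean wall-clock seconds per call of `operation`."""
    start = time.perf_counter()
    for _ in range(iterations):
        operation()
    elapsed = time.perf_counter() - start
    return elapsed / iterations


def noop():
    """Empty call used to estimate loop and call overhead (hypothetical helper)."""
    return None


def syscall_cost(iterations=100_000):
    """Estimate the per-call cost of getpid beyond the empty-call baseline.

    The result is noisy and may even be slightly negative on a busy
    machine; it is only meaningful when compared across systems or builds.
    """
    baseline = time_operation(noop, iterations)
    measured = time_operation(os.getpid, iterations)
    return measured - baseline


if __name__ == "__main__":
    print(f"getpid: ~{syscall_cost() * 1e6:.3f} microseconds per call")
```

In an interpreted language most of the measured time is interpreter overhead, so a sketch like this is useful only for relative comparisons. As the text cautions, such numbers give quick feedback but do not replace long term measurements under a real timesharing load.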
diff --git a/share/doc/papers/sysperf/3.t b/share/doc/papers/sysperf/3.t new file mode 100644 index 0000000..832ad42 --- /dev/null +++ b/share/doc/papers/sysperf/3.t @@ -0,0 +1,694 @@ +.\" Copyright (c) 1985 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)3.t 5.1 (Berkeley) 4/17/91 +.\" +.ds RH Results of our observations +.NH +Results of our observations +.PP +When 4.2BSD was first installed on several large timesharing systems +the degradation in performance was significant. +Informal measurements showed 4.2BSD providing 80% of the throughput +of 4.1BSD (based on load averages observed under a normal timesharing load). +Many of the initial problems found were because of programs that were +not part of 4.1BSD. Using the techniques described in the previous +section and standard process profiling several problems were identified. +Later work concentrated on the operation of the kernel itself. +In this section we discuss the problems uncovered; in the next +section we describe the changes made to the system. +.NH 2 +User programs +.PP +.NH 3 +Mail system +.PP +The mail system was the first culprit identified as a major +contributor to the degradation in system performance. +At Lucasfilm the mail system is heavily used +on one machine, a VAX-11/780 with eight megabytes of memory.\** +.FS +\** During part of these observations the machine had only four +megabytes of memory. +.FE +Message +traffic is usually between users on the same machine and ranges from +person-to-person telephone messages to per-organization distribution +lists. 
After conversion to 4.2BSD, it was +immediately noticed that mail to distribution lists of 20 or more people +caused the system load to jump by anywhere from 3 to 6 points. +The number of processes spawned by the \fIsendmail\fP program and +the messages sent from \fIsendmail\fP to the system logging +process, \fIsyslog\fP, generated significant load both from their +execution and their interference with basic system operation. The +number of context switches and disk transfers often doubled while +\fIsendmail\fP operated; the system call rate jumped dramatically. +System accounting information consistently +showed \fIsendmail\fP as the top cpu user on the system. +.NH 3 +Network servers +.PP +The network services provided in 4.2BSD add new capabilities to the system, +but are not without cost. The system uses one daemon process to accept +requests for each network service provided. The presence of many +such daemons increases the numbers of active processes and files, +and requires a larger configuration to support the same number of users. +The overhead of the routing and status updates can consume +several percent of the cpu. +Remote logins and shells incur more overhead +than their local equivalents. +For example, a remote login uses three processes and a +pseudo-terminal handler in addition to the local hardware terminal +handler. When using a screen editor, sending and echoing a single +character involves four processes on two machines. +The additional processes, context switching, network traffic, and +terminal handler overhead can roughly triple the load presented by one +local terminal user. +.NH 2 +System overhead +.PP +To measure the costs of various functions in the kernel, +a profiling system was run for a 17 hour +period on one of our general timesharing machines. +While this is not as reproducible as a synthetic workload, +it certainly represents a realistic test. +This test was run on several occasions over a three month period. 
+Despite the long period of time that elapsed +between the test runs the shape of the profiles, +as measured by the number of times each system call +entry point was called, were remarkably similar. +.PP +These profiles turned up several bottlenecks that are +discussed in the next section. +Several of these were new to 4.2BSD, +but most were caused by overloading of mechanisms +which worked acceptably well in previous BSD systems. +The general conclusion from our measurements was that +the ratio of user to system time had increased from +45% system / 55% user in 4.1BSD to 57% system / 43% user +in 4.2BSD. +.NH 3 +Micro-operation benchmarks +.PP +To compare certain basic system operations +between 4.1BSD and 4.2BSD a suite of benchmark +programs was constructed and run on a VAX-11/750 with 4.5 megabytes +of physical memory and two disks on a MASSBUS controller. +Tests were run with the machine operating in single user mode +under both 4.1BSD and 4.2BSD. Paging was localized to the drive +where the root file system was located. +.PP +The benchmark programs were modeled after the Kashtan benchmarks, +[Kashtan80], with identical sources compiled under each system. +The programs and their intended purpose are described briefly +before the presentation of the results. The benchmark scripts +were run twice with the results shown as the average of +the two runs. +The source code for each program and the shell scripts used during +the benchmarks are included in the Appendix. +.PP +The set of tests shown in Table 1 was concerned with +system operations other than paging. The intent of most +benchmarks is clear. The result of running \fIsignocsw\fP is +deducted from the \fIcsw\fP benchmark to calculate the context +switch overhead. The \fIexec\fP tests use two different jobs to gauge +the cost of overlaying a larger program with a smaller one +and vice versa. The +``null job'' and ``big job'' differ solely in the size of their data +segments, 1 kilobyte versus 256 kilobytes. 
In both cases the +text segment of the parent is larger than that of the child.\** +.FS +\** These tests should also have measured the cost of expanding the +text segment; unfortunately time did not permit running additional tests. +.FE +All programs were compiled into the default load format that causes +the text segment to be demand paged out of the file system and shared +between processes. +.KF +.DS L +.TS +center box; +l | l. +Test Description +_ +syscall perform 100,000 \fIgetpid\fP system calls +csw perform 10,000 context switches using signals +signocsw send 10,000 signals to yourself +pipeself4 send 10,000 4-byte messages to yourself +pipeself512 send 10,000 512-byte messages to yourself +pipediscard4 send 10,000 4-byte messages to child who discards +pipediscard512 send 10,000 512-byte messages to child who discards +pipeback4 exchange 10,000 4-byte messages with child +pipeback512 exchange 10,000 512-byte messages with child +forks0 fork-exit-wait 1,000 times +forks1k sbrk(1024), fault page, fork-exit-wait 1,000 times +forks100k sbrk(102400), fault pages, fork-exit-wait 1,000 times +vforks0 vfork-exit-wait 1,000 times +vforks1k sbrk(1024), fault page, vfork-exit-wait 1,000 times +vforks100k sbrk(102400), fault pages, vfork-exit-wait 1,000 times +execs0null fork-exec ``null job''-exit-wait 1,000 times +execs0null (1K env) execs0null above, with 1K environment added +execs1knull sbrk(1024), fault page, fork-exec ``null job''-exit-wait 1,000 times +execs1knull (1K env) execs1knull above, with 1K environment added +execs100knull sbrk(102400), fault pages, fork-exec ``null job''-exit-wait 1,000 times +vexecs0null vfork-exec ``null job''-exit-wait 1,000 times +vexecs1knull sbrk(1024), fault page, vfork-exec ``null job''-exit-wait 1,000 times +vexecs100knull sbrk(102400), fault pages, vfork-exec ``null job''-exit-wait 1,000 times +execs0big fork-exec ``big job''-exit-wait 1,000 times +execs1kbig sbrk(1024), fault page, fork-exec ``big job''-exit-wait 1,000 
times +execs100kbig sbrk(102400), fault pages, fork-exec ``big job''-exit-wait 1,000 times +vexecs0big vfork-exec ``big job''-exit-wait 1,000 times +vexecs1kbig sbrk(1024), fault pages, vfork-exec ``big job''-exit-wait 1,000 times +vexecs100kbig sbrk(102400), fault pages, vfork-exec ``big job''-exit-wait 1,000 times +.TE +.ce +Table 1. Kernel Benchmark programs. +.DE +.KE +.PP +The results of these tests are shown in Table 2. If the 4.1BSD results +are scaled to reflect their being run on a VAX-11/750, they +correspond closely to those found in [Joy80].\** +.FS +\** We assume that a VAX-11/750 runs at 60% of the speed of a VAX-11/780 +(not considering floating point operations). +.FE +.KF +.DS L +.TS +center box; +c s s s s s s s s s +c || c s s || c s s || c s s +c || c s s || c s s || c s s +c || c | c | c || c | c | c || c | c | c +l || n | n | n || n | n | n || n | n | n. +Berkeley Software Distribution UNIX Systems +_ +Test Elapsed Time User Time System Time +\^ _ _ _ +\^ 4.1 4.2 4.3 4.1 4.2 4.3 4.1 4.2 4.3 += +syscall 28.0 29.0 23.0 4.5 5.3 3.5 23.9 23.7 20.4 +csw 45.0 60.0 45.0 3.5 4.3 3.3 19.5 25.4 19.0 +signocsw 16.5 23.0 16.0 1.9 3.0 1.1 14.6 20.1 15.2 +pipeself4 21.5 29.0 26.0 1.1 1.1 0.8 20.1 28.0 25.6 +pipeself512 47.5 59.0 55.0 1.2 1.2 1.0 46.1 58.3 54.2 +pipediscard4 32.0 42.0 36.0 3.2 3.7 3.0 15.5 18.8 15.6 +pipediscard512 61.0 76.0 69.0 3.1 2.1 2.0 29.7 36.4 33.2 +pipeback4 57.0 75.0 66.0 2.9 3.2 3.3 25.1 34.2 29.7 +pipeback512 110.0 138.0 125.0 3.1 3.4 2.2 52.2 65.7 57.7 +forks0 37.5 41.0 22.0 0.5 0.3 0.3 34.5 37.6 21.5 +forks1k 40.0 43.0 22.0 0.4 0.3 0.3 36.0 38.8 21.6 +forks100k 217.5 223.0 176.0 0.7 0.6 0.4 214.3 218.4 175.2 +vforks0 34.5 37.0 22.0 0.5 0.6 0.5 27.3 28.5 17.9 +vforks1k 35.0 37.0 22.0 0.6 0.8 0.5 27.2 28.6 17.9 +vforks100k 35.0 37.0 22.0 0.6 0.8 0.6 27.6 28.9 17.9 +execs0null 97.5 92.0 66.0 3.8 2.4 0.6 68.7 82.5 48.6 +execs0null (1K env) 197.0 229.0 75.0 4.1 2.6 0.9 167.8 212.3 62.6 +execs1knull 99.0 100.0 66.0 4.1 1.9 0.6 70.5 
86.8 48.7 +execs1knull (1K env) 199.0 230.0 75.0 4.2 2.6 0.7 170.4 214.9 62.7 +execs100knull 283.5 278.0 216.0 4.8 2.8 1.1 251.9 269.3 202.0 +vexecs0null 100.0 92.0 66.0 5.1 2.7 1.1 63.7 76.8 45.1 +vexecs1knull 100.0 91.0 66.0 5.2 2.8 1.1 63.2 77.1 45.1 +vexecs100knull 100.0 92.0 66.0 5.1 3.0 1.1 64.0 77.7 45.6 +execs0big 129.0 201.0 101.0 4.0 3.0 1.0 102.6 153.5 92.7 +execs1kbig 130.0 202.0 101.0 3.7 3.0 1.0 104.7 155.5 93.0 +execs100kbig 318.0 385.0 263.0 4.8 3.1 1.1 286.6 339.1 247.9 +vexecs0big 128.0 200.0 101.0 4.6 3.5 1.6 98.5 149.6 90.4 +vexecs1kbig 125.0 200.0 101.0 4.7 3.5 1.3 98.9 149.3 88.6 +vexecs100kbig 126.0 200.0 101.0 4.2 3.4 1.3 99.5 151.0 89.0 +.TE +.ce +Table 2. Kernel Benchmark results (all times in seconds). +.DE +.KE +.PP +In studying the measurements we found that the basic system call +and context switch overhead did not change significantly +between 4.1BSD and 4.2BSD. The \fIsignocsw\fP results were caused by +the changes to the \fIsignal\fP interface, resulting +in an additional subroutine invocation for each call, not +to mention additional complexity in the system's implementation. +.PP +The times for the use of pipes are significantly higher under +4.2BSD because of their implementation on top of the interprocess +communication facilities. Under 4.1BSD pipes were implemented +without the complexity of the socket data structures and with +simpler code. Further, while not obviously a factor here, +4.2BSD pipes have less system buffer space provided them than +4.1BSD pipes. +.PP +The \fIexec\fP tests shown in Table 2 were performed with 34 bytes of +environment information under 4.1BSD and 40 bytes under 4.2BSD. +To figure the cost of passing data through the environment, +the execs0null and execs1knull tests were rerun with +1065 additional bytes of data. The results are shown in Table 3. +.KF +.DS L +.TS +center box; +c || c s || c s || c s +c || c s || c s || c s +c || c | c || c | c || c | c +l || n | n || n | n || n | n. 
+Test Real User System +\^ _ _ _ +\^ 4.1 4.2 4.1 4.2 4.1 4.2 += +execs0null 197.0 229.0 4.1 2.6 167.8 212.3 +execs1knull 199.0 230.0 4.2 2.6 170.4 214.9 +.TE +.ce +Table 3. Benchmark results with ``large'' environment (all times in seconds). +.DE +.KE +These results show that passing argument data is significantly +slower under 4.2BSD than under 4.1BSD: 121 microseconds/byte versus 93 microseconds/byte. Even using +this factor to adjust the basic overhead of an \fIexec\fP system +call, this facility is more costly under 4.2BSD than under 4.1BSD. +.NH 3 +Path name translation +.PP +The single most expensive function performed by the kernel +is path name translation. +This has been true in almost every UNIX kernel [Mosher80]; +we find that our general time sharing systems do about +500,000 name translations per day. +.PP +Name translations became more expensive in 4.2BSD for several reasons. +The single most expensive addition was the symbolic link. +Symbolic links +have the effect of increasing the average number of components +in path names to be translated. +As an insidious example, +consider the system manager who decides to change /tmp +to be a symbolic link to /usr/tmp. +A name such as /tmp/tmp1234 that previously required two component +translations +now requires four component translations plus the cost of reading +the contents of the symbolic link. +.PP +The new directory format also changes the characteristics of +name translation. +The more complex format requires more computation to determine +where to place new entries in a directory. +Conversely, the additional information allows the system to +look only at active entries when searching, +so directories that had once grown large +but currently have few active entries are searched quickly. +The new format also stores the length of each name so that +costly string comparisons are only done on names that are the +same length as the name being sought. 
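The effect of the stored name length can be illustrated with a minimal Python sketch. It is not the 4.2BSD directory code: the `DirEntry` record, the convention that inode 0 marks an inactive slot, and the `dir_lookup` helper are all assumptions made for illustration. The sketch only shows that inactive entries and names of the wrong length are rejected before any string comparison is attempted.

```python
from dataclasses import dataclass


@dataclass
class DirEntry:
    inode: int    # assumption for this sketch: 0 marks an inactive slot
    namelen: int  # length of the name, stored alongside it
    name: str


def dir_lookup(entries, target):
    """Search a directory for target.

    Returns (inode, compares): the inode number (or None if absent) and the
    number of full string comparisons that were actually performed.
    """
    compares = 0
    want = len(target)
    for entry in entries:
        if entry.inode == 0:       # inactive entry: skip it
            continue
        if entry.namelen != want:  # cheap length test comes first
            continue
        compares += 1              # only now pay for the string comparison
        if entry.name == target:
            return entry.inode, compares
    return None, compares
```

In a directory holding `old` (inactive), `temp`, `tmp` and `usr`, a search for `usr` performs two string comparisons rather than four, since the inactive slot and the four-character name are rejected by the cheaper tests.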
+.PP +The net effect of the changes is that the average time to +translate a path name in 4.2BSD is 24.2 milliseconds, +representing 40% of the time processing system calls, +that is 19% of the total cycles in the kernel, +or 11% of all cycles executed on the machine. +The times are shown in Table 4. We have no comparable times +for \fInamei\fP under 4.1BSD, though they are certain to +be significantly less. +.KF +.DS L +.TS +center box; +l r r. +part time % of kernel +_ +self 14.3 ms/call 11.3% +child 9.9 ms/call 7.9% +_ +total 24.2 ms/call 19.2% +.TE +.ce +Table 4. Call times for \fInamei\fP in 4.2BSD. +.DE +.KE +.NH 3 +Clock processing +.PP +Nearly 25% of the time spent in the kernel is spent in the clock +processing routines. +(This is a clear indication that to avoid sampling bias when profiling the +kernel with our tools +we need to drive them from an independent clock.) +These routines are responsible for implementing timeouts, +scheduling the processor, +maintaining kernel statistics, +and tending various hardware operations such as +draining the terminal input silos. +Only minimal work is done in the hardware clock interrupt +routine (at high priority); the rest is performed (at a lower priority) +in a software interrupt handler scheduled by the hardware interrupt +handler. +In the worst case, with a clock rate of 100 Hz +and with every hardware interrupt scheduling a software +interrupt, the processor must field 200 interrupts per second. +The overhead of simply trapping and returning +is 3% of the machine cycles; +figuring out that there is nothing to do +requires an additional 2%. +.NH 3 +Terminal multiplexors +.PP +The terminal multiplexors supported by 4.2BSD have programmable receiver +silos that may be used in two ways. +With the silo disabled, each character received causes an interrupt +to the processor. +Enabling the receiver silo allows the silo to fill before +generating an interrupt, allowing multiple characters to be read +for each interrupt. 
+At low rates of input, received characters will not be processed +for some time unless the silo is emptied periodically. +The 4.2BSD kernel uses the input silos of each terminal multiplexor, +and empties each silo on each clock interrupt. +This allows high input rates without the cost of per-character interrupts +while assuring low latency. +However, as character input rates on most machines are usually +low (about 25 characters per second), +this can result in excessive overhead. +At the current clock rate of 100 Hz, a machine with 5 terminal multiplexors +configured makes 500 calls to the receiver interrupt routines per second. +In addition, to achieve acceptable input latency +for flow control, each clock interrupt must schedule +a software interrupt to run the silo draining routines.\** +.FS +\** It is not possible to check the input silos at +the time of the actual clock interrupt without modifying the terminal +line disciplines, as the input queues may not be in a consistent state \**. +.FE +\** This implies that the worst case estimate for clock processing +is the basic overhead for clock processing. +.NH 3 +Process table management +.PP +In 4.2BSD there are numerous places in the kernel where a linear search +of the process table is performed: +.IP \(bu 3 +in \fIexit\fP to locate and wakeup a process's parent; +.IP \(bu 3 +in \fIwait\fP when searching for \fB\s-2ZOMBIE\s+2\fP and +\fB\s-2STOPPED\s+2\fP processes; +.IP \(bu 3 +in \fIfork\fP when allocating a new process table slot and +counting the number of processes already created by a user; +.IP \(bu 3 +in \fInewproc\fP, to verify +that a process id assigned to a new process is not currently +in use; +.IP \(bu 3 +in \fIkill\fP and \fIgsignal\fP to locate all processes to +which a signal should be delivered; +.IP \(bu 3 +in \fIschedcpu\fP when adjusting the process priorities every +second; and +.IP \(bu 3 +in \fIsched\fP when locating a process to swap out and/or swap +in. 
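The searches in the list above share one pattern, which the following Python sketch tries to make concrete. It is an illustration only: the `Proc` record, the state names, and both helper functions are hypothetical and are not taken from the kernel source. Each helper walks the table slot by slot, so its cost grows with the size of the table rather than with the number of processes of interest.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    ppid: int
    state: str  # "FREE", "ACTIVE" or "ZOMBIE" (state names assumed here)


def find_parent(table, child):
    """Locate a process's parent by scanning from the first slot.

    Returns (parent, slots_examined); parent is None if no slot matches.
    """
    for examined, proc in enumerate(table, start=1):
        if proc.state != "FREE" and proc.pid == child.ppid:
            return proc, examined
    return None, len(table)


def zombie_children(table, parent_pid):
    """Collect zombie children of parent_pid.

    Every slot has to be examined, including free ones, because a child
    may sit anywhere in the table.
    """
    found = [p for p in table if p.state == "ZOMBIE" and p.ppid == parent_pid]
    return found, len(table)
```

Even in this toy form, `zombie_children` always reports the full table length as its cost, however few children the parent has.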
+.LP +These linear searches can incur significant overhead. The rule +for calculating the size of the process table is: +.ce +nproc = 20 + 8 * maxusers +.sp +that means a 48 user system will have a 404 slot process table. +With the addition of network services in 4.2BSD, as many as a dozen +server processes may be maintained simply to await incoming requests. +These servers are normally created at boot time which causes them +to be allocated slots near the beginning of the process table. This +means that process table searches under 4.2BSD are likely to take +significantly longer than under 4.1BSD. System profiling shows +that as much as 20% of the time spent in the kernel on a loaded +system (a VAX-11/780) can be spent in \fIschedcpu\fP and, on average, +5-10% of the kernel time is spent in \fIschedcpu\fP. +The other searches of the proc table are similarly affected. +This shows the system can no longer tolerate using linear searches of +the process table. +.NH 3 +File system buffer cache +.PP +The trace facilities described in section 2.3 were used +to gather statistics on the performance of the buffer cache. +We were interested in measuring the effectiveness of the +cache and the read-ahead policies. +With the file system block size in 4.2BSD four to +eight times that of a 4.1BSD file system, we were concerned +that large amounts of read-ahead might be performed without +being used. Also, we were interested in seeing if the +rules used to size the buffer cache at boot time were severely +affecting the overall cache operation. +.PP +The tracing package was run over a three hour period during +a peak mid-afternoon period on a VAX 11/780 with four megabytes +of physical memory. +This resulted in a buffer cache containing 400 kilobytes of memory +spread among 50 to 200 buffers +(the actual number of buffers depends on the size mix of +disk blocks being read at any given time). +The pertinent configuration information is shown in Table 5. 
+.KF +.DS L +.TS +center box; +l l l l. +Controller Drive Device File System +_ +DEC MASSBUS DEC RP06 hp0d /usr + hp0b swap +Emulex SC780 Fujitsu Eagle hp1a /usr/spool/news + hp1b swap + hp1e /usr/src + hp1d /u0 (users) + Fujitsu Eagle hp2a /tmp + hp2b swap + hp2d /u1 (users) + Fujitsu Eagle hp3a / +.TE +.ce +Table 5. Active file systems during buffer cache tests. +.DE +.KE +.PP +During the test period the load average ranged from 2 to 13 +with an average of 5. +The system had no idle time, 43% user time, and 57% system time. +The system averaged 90 interrupts per second +(excluding the system clock interrupts), +220 system calls per second, +and 50 context switches per second (40 voluntary, 10 involuntary). +.PP +The active virtual memory (the sum of the address space sizes of +all jobs that have run in the previous twenty seconds) +over the period ranged from 2 to 6 megabytes with an average +of 3.5 megabytes. +There was no swapping, though the page daemon was inspecting +about 25 pages per second. +.PP +On average 250 requests to read disk blocks were initiated +per second. +These include read requests for file blocks made by user +programs as well as requests initiated by the system. +System reads include requests for indexing information to determine +where a file's next data block resides, +file system layout maps to allocate new data blocks, +and requests for directory contents needed to do path name translations. +.PP +On average, an 85% cache hit rate was observed for read requests. +Thus only 37 disk reads were initiated per second. +In addition, 5 read-ahead requests were made each second +filling about 20% of the buffer pool. +Despite the policies to rapidly reuse read-ahead buffers +that remain unclaimed, more than 90% of the read-ahead +buffers were used. +.PP +These measurements showed that the buffer cache was working +effectively. 
Independent tests have also shown that the size +of the buffer cache may be reduced significantly on memory-poor +systems without severe effects; +we have not yet tested this hypothesis [Shannon83]. +.NH 3 +Network subsystem +.PP +The overhead associated with the +network facilities found in 4.2BSD is often +difficult to gauge without profiling the system. +This is because most input processing is performed +in modules scheduled with software interrupts. +As a result, the system time spent performing protocol +processing is rarely attributed to the processes that +really receive the data. Since the protocols supported +by 4.2BSD can involve significant overhead, this was a serious +concern. Results from a profiled kernel show that an average +of 5% of the system time is spent +performing network input and timer processing in our environment +(a 3Mb/s Ethernet with most traffic using TCP). +This figure can vary significantly depending on +the network hardware used, the average message +size, and whether packet reassembly is required at the network +layer. On one machine we profiled over a 17 hour +period (our gateway to the ARPANET) +206,000 input messages accounted for 2.4% of the system time, +while another 0.6% of the system time was spent performing +protocol timer processing. +This machine was configured with an ACC LH/DH IMP interface +and a DMA 3Mb/s Ethernet controller. +.PP +The performance of TCP over slower long-haul networks +was degraded substantially by two problems. +The first problem was a bug that prevented round-trip timing measurements +from being made, thus increasing retransmissions unnecessarily. +The second was a problem with the maximum segment size chosen by TCP, +which was well tuned for Ethernet but poorly chosen for +the ARPANET, where it caused packet fragmentation. (The maximum +segment size was actually negotiated upwards to a value that +resulted in excessive fragmentation.) 
+.PP +When benchmarked in Ethernet environments the main memory buffer management +of the network subsystem presented some performance anomalies. +The overhead of processing small ``mbufs'' severely affected throughput for a +substantial range of message sizes. +In spite of the fact that most system utilities made use of the +throughput-optimal 1024 byte size, user processes faced large degradations for some +arbitrary sizes. This was especially true for TCP/IP transmissions [Cabrera84, +Cabrera85]. +.NH 3 +Virtual memory subsystem +.PP +We ran a set of tests intended to exercise the virtual +memory system under both 4.1BSD and 4.2BSD. +The tests are described in Table 6. +The test programs dynamically allocated +a 7.3 megabyte array (using \fIsbrk\fP\|(2)) then referenced +pages in the array either sequentially, in a purely random +fashion, or such that the distance between +successive pages accessed was randomly selected from a Gaussian +distribution. In the last case, successive runs were made with +increasing standard deviations. +.KF +.DS L +.TS +center box; +l | l. +Test Description +_ +seqpage sequentially touch pages, 10 iterations +seqpage-v as above, but first make \fIvadvise\fP\|(2) call +randpage touch random page 30,000 times +randpage-v as above, but first make \fIvadvise\fP call +gausspage.1 30,000 Gaussian accesses, standard deviation of 1 +gausspage.10 as above, standard deviation of 10 +gausspage.30 as above, standard deviation of 30 +gausspage.40 as above, standard deviation of 40 +gausspage.50 as above, standard deviation of 50 +gausspage.60 as above, standard deviation of 60 +gausspage.80 as above, standard deviation of 80 +gausspage.inf as above, standard deviation of 10,000 +.TE +.ce +Table 6. Paging benchmark programs. +.DE +.KE +.PP +The results in Table 7 show how the additional +memory requirements +of 4.2BSD can generate more work for the paging system. 
+Under 4.1BSD, +the system used 0.5 megabytes of the 4.5 megabytes of physical memory +on the test machine; +under 4.2BSD it used nearly 1 megabyte of physical memory.\** +.FS +\** The 4.1BSD system used for testing was really a 4.1a +system configured +with networking facilities and code to support +remote file access. The +4.2BSD system also included the remote file access code. +Since both +systems would be larger than similarly configured ``vanilla'' +4.1BSD or 4.2BSD systems, we consider our conclusions to still be valid. +.FE +This resulted in more page faults and, hence, more system time. +To establish a common ground on which to compare the paging +routines of each system, we instead compare the average page fault +service times for those test runs that had a statistically significant +number of random page faults. These figures, shown in Table 8, show +no significant difference between the two systems in +the area of page fault servicing. We currently have +no explanation for the results of the sequential +paging tests. +.KF +.DS L +.TS +center box; +l || c s || c s || c s || c s +l || c s || c s || c s || c s +l || c | c || c | c || c | c || c | c +l || n | n || n | n || n | n || n | n. +Test Real User System Page Faults +\^ _ _ _ _ +\^ 4.1 4.2 4.1 4.2 4.1 4.2 4.1 4.2 += +seqpage 959 1126 16.7 12.8 197.0 213.0 17132 17113 +seqpage-v 579 812 3.8 5.3 216.0 237.7 8394 8351 +randpage 571 569 6.7 7.6 64.0 77.2 8085 9776 +randpage-v 572 562 6.1 7.3 62.2 77.5 8126 9852 +gausspage.1 25 24 23.6 23.8 0.8 0.8 8 8 +gausspage.10 26 26 22.7 23.0 3.2 3.6 2 2 +gausspage.30 34 33 25.0 24.8 8.6 8.9 2 2 +gausspage.40 42 81 23.9 25.0 11.5 13.6 3 260 +gausspage.50 113 175 24.2 26.2 19.6 26.3 784 1851 +gausspage.60 191 234 27.6 26.7 27.4 36.0 2067 3177 +gausspage.80 312 329 28.0 27.9 41.5 52.0 3933 5105 +gausspage.inf 619 621 82.9 85.6 68.3 81.5 8046 9650 +.TE +.ce +Table 7. Paging benchmark results (all times in seconds). 
+.DE +.KE +.KF +.DS L +.TS +center box; +c || c s || c s +c || c s || c s +c || c | c || c | c +l || n | n || n | n. +Test Page Faults PFST +\^ _ _ +\^ 4.1 4.2 4.1 4.2 += +randpage 8085 9776 791 789 +randpage-v 8126 9852 765 786 +gausspage.inf 8046 9650 848 844 +.TE +.ce +Table 8. Page fault service times (all times in microseconds). +.DE +.KE diff --git a/share/doc/papers/sysperf/4.t b/share/doc/papers/sysperf/4.t new file mode 100644 index 0000000..373a0d0 --- /dev/null +++ b/share/doc/papers/sysperf/4.t @@ -0,0 +1,776 @@ +.\" Copyright (c) 1985 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)4.t 5.1 (Berkeley) 4/17/91 +.\" +.\" $FreeBSD$ +.\" +.ds RH Performance Improvements +.NH +Performance Improvements +.PP +This section outlines the changes made to the system +since the 4.2BSD distribution. +The changes reported here were made in response +to the problems described in Section 3. +The improvements fall into two major classes; +changes to the kernel that are described in this section, +and changes to the system libraries and utilities that are +described in the following section. +.NH 2 +Performance Improvements in the Kernel +.PP +Our goal has been to optimize system performance +for our general timesharing environment. +Since most sites running 4.2BSD have been forced to take +advantage of declining +memory costs rather than replace their existing machines with +ones that are more powerful, we have +chosen to optimize running time at the expense of memory. +This tradeoff may need to be reconsidered for personal workstations +that have smaller memories and higher latency disks. +Decreases in the running time of the system may be unnoticeable +because of higher paging rates incurred by a larger kernel. +Where possible, we have allowed the size of caches to be controlled +so that systems with limited memory may reduce them as appropriate. 
+.NH 3 +Name Cacheing +.PP +Our initial profiling studies showed that more than one quarter +of the time in the system was spent in the +pathname translation routine, \fInamei\fP, +translating path names to inodes\u\s-21\s0\d\**. +.FS +\** \u\s-21\s0\d Inode is an abbreviation for ``Index node''. +Each file on the system is described by an inode; +the inode maintains access permissions, and an array of pointers to +the disk blocks that hold the data associated with the file. +.FE +An inspection of \fInamei\fP shows that +it consists of two nested loops. +The outer loop is traversed once per pathname component. +The inner loop performs a linear search through a directory looking +for a particular pathname component. +.PP +Our first idea was to reduce the number of iterations +around the inner loop of \fInamei\fP by observing that many programs +step through a directory performing an operation on each entry in turn. +To improve performance for processes doing directory scans, +the system keeps track of the directory offset of the last component of the +most recently translated path name for each process. +If the next name the process requests is in the same directory, +the search is started from the offset that the previous name was found +(instead of from the beginning of the directory). +Changing directories invalidates the cache, as +does modifying the directory. +For programs that step sequentially through a directory with +.EQ +delim $$ +.EN +$N$ files, search time decreases from $O ( N sup 2 )$ to $O(N)$. +.EQ +delim off +.EN +.PP +The cost of the cache is about 20 lines of code +(about 0.2 kilobytes) +and 16 bytes per process, with the cached data +stored in a process's \fIuser\fP vector. +.PP +As a quick benchmark to verify the maximum effectiveness of the +cache we ran ``ls \-l'' +on a directory containing 600 files. +Before the per-process cache this command +used 22.3 seconds of system time. 
+After adding the cache the program used the same amount +of user time, but the system time dropped to 3.3 seconds. +.PP +This change prompted our rerunning a profiled system +on a machine containing the new \fInamei\fP. +The results showed that the time in \fInamei\fP +dropped by only 2.6 ms/call and +still accounted for 36% of the system call time, +18% of the kernel, or about 10% of all the machine cycles. +This amounted to a drop in system time from 57% to about 55%. +The results are shown in Table 9. +.KF +.DS L +.TS +center box; +l r r. +part time % of kernel +_ +self 11.0 ms/call 9.2% +child 10.6 ms/call 8.9% +_ +total 21.6 ms/call 18.1% +.TE +.ce +Table 9. Call times for \fInamei\fP with per-process cache. +.DE +.KE +.PP +The small performance improvement +was caused by a low cache hit ratio. +Although the cache was 90% effective when hit, +it was only usable on about 25% of the names being translated. +An additional reason for the small improvement was that +although the amount of time spent in \fInamei\fP itself +decreased substantially, +more time was spent in the routines that it called +since each directory had to be accessed twice; +once to search from the middle to the end, +and once to search from the beginning to the middle. +.PP +Frequent requests for a small set of names are best handled +with a cache of recent name translations\**. +.FS +\** The cache is keyed on a name and the +inode and device number of the directory that contains it. +Associated with each entry is a pointer to the corresponding +entry in the inode table. +.FE +This has the effect of eliminating the inner loop of \fInamei\fP. +For each path name component, +\fInamei\fP first looks in its cache of recent translations +for the needed name. +If it exists, the directory search can be completely eliminated. 
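A cache of recent translations along these lines might be sketched as follows. This is a hedged illustration in Python, not the kernel's implementation: the `NameCache` class, the `translate_component` helper, and the dictionary standing in for on-disk directories are all hypothetical. The key follows the footnote above (the name plus the device and inode number of its directory), and the cached value stands in for the pointer to the inode table entry.

```python
class NameCache:
    """Toy cache of recent name translations (illustrative only)."""

    def __init__(self):
        self._entries = {}  # (dev, dir_ino, name) -> inode-table entry
        self.hits = 0
        self.misses = 0

    def lookup(self, dev, dir_ino, name):
        entry = self._entries.get((dev, dir_ino, name))
        if entry is None:
            self.misses += 1
        else:
            self.hits += 1
        return entry

    def enter(self, dev, dir_ino, name, inode):
        self._entries[(dev, dir_ino, name)] = inode


def translate_component(cache, directories, dev, dir_ino, name):
    """Translate one path name component, consulting the cache first.

    directories maps (dev, dir_ino) to {name: inode}; reading it stands in
    for the linear directory search that a cache hit avoids.
    """
    inode = cache.lookup(dev, dir_ino, name)
    if inode is not None:
        return inode                         # hit: no directory search
    inode = directories[(dev, dir_ino)].get(name)
    if inode is not None:
        cache.enter(dev, dir_ino, name, inode)
    return inode
```

Translating the same component twice costs one directory search and one cache hit, which is the inner-loop saving the text describes. The sketch leaves out invalidation and any limit on the number of entries.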
+.PP +The system already maintained a cache of recently accessed inodes, +so the initial name cache +maintained a simple name-inode association that was used to +check each component of a path name during name translations. +We considered implementing the cache by tagging each inode +with its most recently translated name, +but eventually decided to have a separate data structure that +kept names with pointers to the inode table. +Tagging inodes has two drawbacks; +many inodes such as those associated with login ports remain in +the inode table for a long period of time, but are never looked +up by name. +Other inodes, such as those describing directories are looked up +frequently by many different names (\fIe.g.\fP ``..''). +By keeping a separate table of names, the cache can +truly reflect the most recently used names. +An added benefit is that the table can be sized independently +of the inode table, so that machines with small amounts of memory +can reduce the size of the cache (or even eliminate it) +without modifying the inode table structure. +.PP +Another issue to be considered is how the name cache should +hold references to the inode table. +Normally processes hold ``hard references'' by incrementing the +reference count in the inode they reference. +Since the system reuses only inodes with zero reference counts, +a hard reference insures that the inode pointer will remain valid. +However, if the name cache holds hard references, +it is limited to some fraction of the size of the inode table, +since some inodes must be left free for new files. +It also makes it impossible for other parts of the kernel +to verify sole use of a device or file. +These reasons made it impractical to use hard references +without affecting the behavior of the inode caching scheme. +Thus, we chose instead to keep ``soft references'' protected +by a \fIcapability\fP \- a 32-bit number +guaranteed to be unique\u\s-22\s0\d \**. 
+.FS +\** \u\s-22\s0\d When all the numbers have been exhausted, all outstanding +capabilities are purged and numbering starts over from scratch. +Purging is possible as all capabilities are easily found in kernel memory. +.FE +When an entry is made in the name cache, +the capability of its inode is copied to the name cache entry. +When an inode is reused it is issued a new capability. +When a name cache hit occurs, +the capability of the name cache entry is compared +with the capability of the inode that it references. +If the capabilities do not match, the name cache entry is invalid. +Since the name cache holds only soft references, +it may be sized independent of the size of the inode table. +A final benefit of using capabilities is that all +cached names for an inode may be invalidated without +searching through the entire cache; +instead all you need to do is assign a new capability to the inode. +.PP +The cost of the name cache is about 200 lines of code +(about 1.2 kilobytes) +and 48 bytes per cache entry. +Depending on the size of the system, +about 200 to 1000 entries will normally be configured, +using 10-50 kilobytes of physical memory. +The name cache is resident in memory at all times. +.PP +After adding the system wide name cache we reran ``ls \-l'' +on the same directory. +The user time remained the same, +however the system time rose slightly to 3.7 seconds. +This was not surprising as \fInamei\fP +now had to maintain the cache, +but was never able to make any use of it. +.PP +Another profiled system was created and measurements +were collected over a 17 hour period. These measurements +showed a 13 ms/call decrease in \fInamei\fP, with +\fInamei\fP accounting for only 26% of the system call time, +13% of the time in the kernel, +or about 7% of all the machine cycles. +System time dropped from 55% to about 49%. +The results are shown in Table 10. +.KF +.DS L +.TS +center box; +l r r. 
+part time % of kernel +_ +self 4.2 ms/call 6.2% +child 4.4 ms/call 6.6% +_ +total 8.6 ms/call 12.8% +.TE +.ce +Table 10. Call times for \fInamei\fP with both caches. +.DE +.KE +.PP +On our general time sharing systems we find that during the twelve +hour period from 8AM to 8PM the system does 500,000 to 1,000,000 +name translations. +Statistics on the performance of both caches show that +the large performance improvement is +caused by the high hit ratio. +The name cache has a hit rate of 70%-80%; +the directory offset cache gets a hit rate of 5%-15%. +The combined hit rate of the two caches almost always adds up to 85%. +With the addition of the two caches, +the percentage of system time devoted to name translation has +dropped from 25% to less than 13%. +While the system wide cache reduces both the amount of time in +the routines that \fInamei\fP calls as well as \fInamei\fP itself +(since fewer directories need to be accessed or searched), +it is interesting to note that the actual percentage of system +time spent in \fInamei\fP itself increases even though the +actual time per call decreases. +This is because less total time is being spent in the kernel, +hence a smaller absolute time becomes a larger total percentage. +.NH 3 +Intelligent Auto Siloing +.PP +Most terminal input hardware can run in two modes: +it can either generate an interrupt each time a character is received, +or collect characters in a silo that the system then periodically drains. +To provide quick response for interactive input and flow control, +a silo must be checked 30 to 50 times per second. +Ascii terminals normally exhibit +an input rate of less than 30 characters per second. +At this input rate +they are most efficiently handled with interrupt per character mode, +since this generates fewer interrupts than draining the input silos +of the terminal multiplexors at each clock interrupt. 
+When input is being generated by another machine +or a malfunctioning terminal connection, however, +the input rate is usually more than 50 characters per second. +It is more efficient to use a device's silo input mode, +since this generates fewer interrupts than handling each character +as a separate interrupt. +Since a given dialup port may switch between uucp logins and user logins, +it is impossible to statically select the most efficient input mode to use. +.PP +We therefore changed the terminal multiplexor handlers +to dynamically choose between the use of the silo and the use of +per-character interrupts. +At low input rates the handler processes characters on an +interrupt basis, avoiding the overhead +of checking each interface on each clock interrupt. +During periods of sustained input, the handler enables the silo +and starts a timer to drain input. +This timer runs less frequently than the clock interrupts, +and is used only when there is a substantial amount of input. +The transition from using silos to an interrupt per character is +damped to minimize the number of transitions with bursty traffic +(such as in network communication). +Input characters serve to flush the silo, preventing long latency. +By switching between these two modes of operation dynamically, +the overhead of checking the silos is incurred only +when necessary. +.PP +In addition to the savings in the terminal handlers, +the clock interrupt routine is no longer required to schedule +a software interrupt after each hardware interrupt to drain the silos. +The software-interrupt level portion of the clock routine is only +needed when timers expire or the current user process is collecting +an execution profile. +Thus, the number of interrupts attributable to clock processing +is substantially reduced. +.NH 3 +Process Table Management +.PP +As systems have grown larger, the size of the process table +has grown far past 200 entries. 
+With large tables, linear searches must be eliminated +from any frequently used facility. +The kernel process table is now multi-threaded to allow selective searching +of active and zombie processes. +A third list threads unused process table slots. +Free slots can be obtained in constant time by taking one +from the front of the free list. +The number of processes used by a given user may be computed by scanning +only the active list. +Since the 4.2BSD release, +the kernel maintained linked lists of the descendents of each process. +This linkage is now exploited when dealing with process exit; +parents seeking the exit status of children now avoid linear search +of the process table, but examine only their direct descendents. +In addition, the previous algorithm for finding all descendents of an exiting +process used multiple linear scans of the process table. +This has been changed to follow the links between child process and siblings. +.PP +When forking a new process, +the system must assign it a unique process identifier. +The system previously scanned the entire process table each time it created +a new process to locate an identifier that was not already in use. +Now, to avoid scanning the process table for each new process, +the system computes a range of unused identifiers +that can be directly assigned. +Only when the set of identifiers is exhausted is another process table +scan required. +.NH 3 +Scheduling +.PP +Previously the scheduler scanned the entire process table +once per second to recompute process priorities. +Processes that had run for their entire time slice had their +priority lowered. +Processes that had not used their time slice, or that had +been sleeping for the past second had their priority raised. +On systems running many processes, +the scheduler represented nearly 20% of the system time. +To reduce this overhead, +the scheduler has been changed to consider only +runnable processes when recomputing priorities. 
+To insure that processes sleeping for more than a second +still get their appropriate priority boost, +their priority is recomputed when they are placed back on the run queue. +Since the set of runnable processes is typically only a small fraction +of the total number of processes on the system, +the cost of invoking the scheduler drops proportionally. +.NH 3 +Clock Handling +.PP +The hardware clock interrupts the processor 100 times per second +at high priority. +As most of the clock-based events need not be done at high priority, +the system schedules a lower priority software interrupt to do the less +time-critical events such as CPU scheduling and timeout processing. +Often there are no such events, and the software interrupt handler +finds nothing to do and returns. +The high priority event now checks to see if there are low priority +events to process; +if there is nothing to do, the software interrupt is not requested. +Often, the high priority interrupt occurs during a period when the +machine had been running at low priority. +Rather than posting a software interrupt that would occur as +soon as it returns, +the hardware clock interrupt handler simply lowers the processor priority +and calls the software clock routines directly. +Between these two optimizations, nearly 80 of the 100 software +interrupts per second can be eliminated. +.NH 3 +File System +.PP +The file system uses a large block size, typically 4096 or 8192 bytes. +To allow small files to be stored efficiently, the large blocks can +be broken into smaller fragments, typically multiples of 1024 bytes. +To minimize the number of full-sized blocks that must be broken +into fragments, the file system uses a best fit strategy. +Programs that slowly grow files using writes of 1024 bytes or less +can force the file system to copy the data to +successively larger and larger fragments until it finally +grows to a full sized block.
+The file system still uses a best fit strategy the first time +a fragment is written. +However, the first time that the file system is forced to copy a growing +fragment it places it at the beginning of a full sized block. +Continued growth can be accommodated without further copying +by using up the rest of the block. +If the file ceases to grow, the rest of the block is still +available for holding other fragments. +.PP +When creating a new file name, +the entire directory in which it will reside must be scanned +to insure that the name does not already exist. +For large directories, this scan is time consuming. +Because there was no provision for shortening directories, +a directory that is once over-filled will increase the cost +of file creation even after the over-filling is corrected. +Thus, for example, a congested uucp connection can leave a legacy long +after it is cleared up. +To alleviate the problem, the system now deletes empty blocks +that it finds at the end of a directory while doing a complete +scan to create a new name. +.NH 3 +Network +.PP +The default amount of buffer space allocated for stream sockets (including +pipes) has been increased to 4096 bytes. +Stream sockets and pipes now return their buffer sizes in the block size field +of the stat structure. +This information allows the standard I/O library to use more optimal buffering. +Unix domain stream sockets also return a dummy device and inode number +in the stat structure to increase compatibility +with other pipe implementations. +The TCP maximum segment size is calculated according to the destination +and interface in use; non-local connections use a more conservative size +for long-haul networks. +.PP +On multiply-homed hosts, the local address bound by TCP now always corresponds +to the interface that will be used in transmitting data packets for the +connection. +Several bugs in the calculation of round trip timing have been corrected. 
+TCP now switches to an alternate gateway when an existing route fails, +or when an ICMP redirect message is received. +ICMP source quench messages are used to throttle the transmission +rate of TCP streams by temporarily creating an artificially small +send window, and retransmissions send only a single packet +rather than resending all queued data. +A send policy has been implemented +that decreases the number of small packets outstanding +for network terminal traffic [Nagle84], +providing additional reduction of network congestion. +The overhead of packet routing has been decreased by changes in the routing +code and by caching the most recently used route for each datagram socket. +.PP +The buffer management strategy implemented by \fIsosend\fP has been +changed to make better use of the increased size of the socket buffers +and a better tuned delayed acknowledgement algorithm. +Routing has been modified to include a one element cache of the last +route computed. +Multiple messages sent with the same destination now require less processing. +Performance deteriorates because of load in +the sender host, the receiver host, or the ether. +Also, any CPU contention substantially degrades +the throughput achievable by user processes [Cabrera85]. +We have observed empty VAX 11/750s using up to 90% of their cycles +transmitting network messages. +.NH 3 +Exec +.PP +When \fIexec\fP-ing a new process, the kernel creates the new +program's argument list by copying the arguments and environment +from the parent process's address space into the system, then back out +again onto the stack of the newly created process. +These two copy operations were done one byte at a time, but +are now done a string at a time. +This optimization reduced the time to process +an argument list by a factor of ten; +the average time to do an \fIexec\fP call decreased by 25%.
+.NH 3 +Context Switching +.PP +The kernel used to post a software event when it wanted to force +a process to be rescheduled. +Often the process would be rescheduled for other reasons before +exiting the kernel, delaying the event trap. +At some later time the process would again +be selected to run and would complete its pending system call, +finally causing the event to take place. +The event would cause the scheduler to be invoked a second time, +selecting the same process to run. +The fix to this problem is to cancel any software reschedule +events when saving a process context. +This change doubles the speed with which processes +can synchronize using pipes or signals. +.NH 3 +Setjmp/Longjmp +.PP +The kernel routine \fIsetjmp\fP, which saves the current system +context in preparation for a non-local goto, used to save many more +registers than necessary under most circumstances. +By trimming its operation to save only the minimum state required, +the overhead for system calls decreased by an average of 13%. +.NH 3 +Compensating for Lack of Compiler Technology +.PP +The current compilers available for C do not +do any significant optimization. +Good optimizing compilers are unlikely to be built; +the C language is not well suited to optimization +because of its rampant use of unbound pointers. +Thus, many classical optimizations such as common subexpression +analysis and selection of register variables must be done +by hand using ``exterior'' knowledge of when such optimizations are safe. +.PP +Another optimization usually done by optimizing compilers +is inline expansion of small or frequently used routines. +In past Berkeley systems this has been done by using \fIsed\fP to +run over the assembly language and replace calls to small +routines with the code for the body of the routine, often +a single VAX instruction.
+While this optimization eliminated the cost of the subroutine +call and return, +it did not eliminate the pushing and popping of several arguments +to the routine. +The \fIsed\fP script has been replaced by a more intelligent expander, +\fIinline\fP, that merges the pushes and pops into moves to registers. +For example, if the C code +.DS +if (scanc(map[i], 1, 47, i - 63)) +.DE +is compiled into assembly language it generates the code shown +in the left hand column of Table 11. +The \fIsed\fP inline expander changes this code to that +shown in the middle column. +The newer optimizer eliminates most of the stack +operations to generate the code shown in the right hand column. +.KF +.TS +center, box; +c s s s s s +c s | c s | c s +l l | l l | l l. +Alternative C Language Code Optimizations +_ +cc sed inline +_ +subl3 $64,_i,\-(sp) subl3 $64,_i,\-(sp) subl3 $64,_i,r5 +pushl $47 pushl $47 movl $47,r4 +pushl $1 pushl $1 pushl $1 +mull2 $16,_i,r3 mull2 $16,_i,r3 mull2 $16,_i,r3 +pushl \-56(fp)[r3] pushl \-56(fp)[r3] movl \-56(fp)[r3],r2 +calls $4,_scanc movl (sp)+,r5 movl (sp)+,r3 +tstl r0 movl (sp)+,r4 scanc r2,(r3),(r4),r5 +jeql L7 movl (sp)+,r3 tstl r0 + movl (sp)+,r2 jeql L7 + scanc r2,(r3),(r4),r5 + tstl r0 + jeql L7 +.TE +.ce +Table 11. Alternative inline code expansions. +.KE +.PP +Another optimization involved reevaluating +existing data structures in the context of the current system. +For example, disk buffer hashing was implemented when the system +typically had thirty to fifty buffers. +Most systems today have 200 to 1000 buffers. +Consequently, most of the hash chains contained +ten to a hundred buffers each! +The running time of the low level buffer management primitives was +dramatically improved simply by enlarging the size of the hash table. +.NH 2 +Improvements to Libraries and Utilities +.PP +Intuitively, changes to the kernel would seem to have the greatest +payoff since they affect all programs that run on the system. 
+However, the kernel has been tuned many times before, so the +opportunity for significant improvement was small. +By contrast, many of the libraries and utilities had never been tuned. +For example, we found utilities that spent 90% of their +running time doing single character read system calls. +Changing the utility to use the standard I/O library cut the +running time by a factor of five! +Thus, while most of our time has been spent tuning the kernel, +more than half of the speedups are because of improvements in +other parts of the system. +Some of the more dramatic changes are described in the following +subsections. +.NH 3 +Hashed Databases +.PP +UNIX provides a set of database management routines, \fIdbm\fP, +that can be used to speed lookups in large data files +with an external hashed index file. +The original version of dbm was designed to work with only one +database at a time. These routines were generalized to handle +multiple database files, enabling them to be used in rewrites +of the password and host file lookup routines. The new routines +used to access the password file significantly improve the running +time of many important programs such as the mail subsystem, +the C-shell (in doing tilde expansion), \fIls \-l\fP, etc. +.NH 3 +Buffered I/O +.PP +The new filesystem with its larger block sizes allows better +performance, but it is possible to degrade system performance +by performing numerous small transfers rather than using +appropriately-sized buffers. +The standard I/O library +automatically determines the optimal buffer size for each file. +Some C library routines and commonly-used programs use low-level +I/O or their own buffering, however. +Several important utilities did not use the standard I/O library +and were buffering I/O using the old optimal buffer size, +1 Kbyte; these programs were changed to buffer I/O according to the +optimal file system blocksize.
+These include the editor, the assembler, loader, remote file copy, +the text formatting programs, and the C compiler. +.PP +The standard error output has traditionally been unbuffered +to prevent delay in presenting the output to the user, +and to prevent it from being lost if buffers are not flushed. +The inordinate expense of sending single-byte packets through +the network led us to impose a buffering scheme on the standard +error stream. +Within a single call to \fIfprintf\fP, all output is buffered temporarily. +Before the call returns, all output is flushed and the stream is again +marked unbuffered. +As before, the normal block or line buffering mechanisms can be used +instead of the default behavior. +.PP +It is possible for programs with good intentions to unintentionally +defeat the standard I/O library's choice of I/O buffer size by using +the \fIsetbuf\fP call to assign an output buffer. +Because of portability requirements, the default buffer size provided +by \fIsetbuf\fP is 1024 bytes; this can lead, once again, to added +overhead. +One such program with this problem was \fIcat\fP; +there are undoubtedly other standard system utilities with similar problems +as the system has changed much since they were originally written. +.NH 3 +Mail System +.PP +The problems discussed in section 3.1.1 prompted significant work +on the entire mail system. The first problem identified was a bug +in the \fIsyslog\fP program. The mail delivery program, \fIsendmail\fP +logs all mail transactions through this process with the 4.2BSD interprocess +communication facilities. \fISyslog\fP then records the information in +a log file. Unfortunately, \fIsyslog\fP was performing a \fIsync\fP +operation after each message it received, whether it was logged to a file +or not. 
This wreaked havoc on the effectiveness of the +buffer cache and explained, to a large +extent, why sending mail to large distribution lists generated such a +heavy load on the system (one syslog message was generated for each +message recipient causing almost a continuous sequence of sync operations). +.PP +The hashed data base files were +installed in all mail programs, resulting in an order of magnitude +speedup on large distribution lists. The code in \fI/bin/mail\fP +that notifies the \fIcomsat\fP program when mail has been delivered to +a user was changed to cache host table lookups, resulting in a similar +speedup on large distribution lists. +.PP +Next, the file locking facilities +provided in 4.2BSD, \fIflock\fP\|(2), were used in place of the old +locking mechanism. +The mail system previously used \fIlink\fP and \fIunlink\fP in +implementing file locking primitives. +Because these operations usually modify the contents of directories +they require synchronous disk operations and cannot take +advantage of the name cache maintained by the system. +Unlink requires that the entry be found in the directory so that +it can be removed; +link requires that the directory be scanned to insure that the name +does not already exist. +By contrast the advisory locking facility in 4.2BSD is +efficient because it is all done with in-memory tables. +Thus, the mail system was modified to use the file locking primitives. +This yielded another 10% cut in the basic overhead of delivering mail. +Extensive profiling and tuning of \fIsendmail\fP and +compiling it without debugging code reduced the overhead by another 20%. +.NH 3 +Network Servers +.PP +With the introduction of the network facilities in 4.2BSD, +a myriad of services became available, each of which +required its own daemon process. +Many of these daemons were rarely if ever used, +yet they lay asleep in the process table consuming +system resources and generally slowing down response. 
+Rather than having many servers started at boot time, a single server, +\fIinetd\fP, was substituted. +This process reads a simple configuration file +that specifies the services the system is willing to support +and listens for service requests on each service's Internet port. +When a client requests service the appropriate server is created +and passed a service connection as its standard input. Servers +that require the identity of their client may use the \fIgetpeername\fP +system call; likewise \fIgetsockname\fP may be used to find out +a server's local address without consulting data base files. +This scheme is attractive for several reasons: +.IP \(bu 3 +it eliminates +as many as a dozen processes, easing system overhead and +allowing the file and text tables to be made smaller, +.IP \(bu 3 +servers need not contain the code required to handle connection +queueing, simplifying the programs, and +.IP \(bu 3 +installing and replacing servers becomes simpler. +.PP +With an increased number of networks, both local and external to Berkeley, +we found that the overhead of the routing process was becoming +inordinately high. +Several changes were made in the routing daemon to reduce this load. +Routes to external networks are no longer exchanged by routers +on the internal machines; only a route to a default gateway is exchanged. +This reduces the amount of network traffic and the time required +to process routing messages. +In addition, the routing daemon was profiled +and functions responsible for large amounts +of time were optimized. +The major changes were a faster hashing scheme, +and inline expansions of the ubiquitous byte-swapping functions. +.PP +Under certain circumstances, when output was blocked, +attempts by the remote login process +to send output to the user were rejected by the system, +although a prior \fIselect\fP call had indicated that data could be sent. +This resulted in continuous attempts to write the data until the remote +user restarted output.
+This problem was initially avoided in the remote login handler, +and the original problem in the kernel has since been corrected. +.NH 3 +The C Run-time Library +.PP +Several people have found poorly tuned code +in frequently used routines in the C library [Lankford84]. +In particular the running time of the string routines can be +cut in half by rewriting them using the VAX string instructions. +The memory allocation routines have been tuned to waste less +memory for memory allocations with sizes that are a power of two. +Certain library routines that did file input in one-character reads +have been corrected. +Other library routines including \fIfread\fP and \fIfwrite\fP +have been rewritten for efficiency. +.NH 3 +Csh +.PP +The C-shell was converted to run on 4.2BSD by +writing a set of routines to simulate the old jobs library. +While this provided a functioning C-shell, +it was grossly inefficient, generating up +to twenty system calls per prompt. +The C-shell has been modified to use the new signal +facilities directly, +cutting the number of system calls per prompt in half. +Additional tuning was done with the help of profiling +to cut the cost of frequently used facilities. diff --git a/share/doc/papers/sysperf/5.t b/share/doc/papers/sysperf/5.t new file mode 100644 index 0000000..ff008c3 --- /dev/null +++ b/share/doc/papers/sysperf/5.t @@ -0,0 +1,287 @@ +.\" Copyright (c) 1985 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)5.t 5.1 (Berkeley) 4/17/91 +.\" +.\" $FreeBSD$ +.\" +.ds RH Functional Extensions +.NH +Functional Extensions +.PP +Some of the facilities introduced in 4.2BSD were not completely +implemented. An important part of the effort that went into +4.3BSD was to clean up and unify both new and old facilities. +.NH 2 +Kernel Extensions +.PP +A significant effort went into improving +the networking part of the kernel. 
+The work consisted of fixing bugs, +tuning the algorithms, +and revamping the lowest levels of the system +to better handle heterogeneous network topologies. +.NH 3 +Subnets, Broadcasts and Gateways +.PP +To allow sites to expand their network in an autonomous +and orderly fashion, subnetworks have been introduced in 4.3BSD [GADS85]. +This facility allows sites to subdivide their local Internet address +space into multiple subnetwork address spaces that are visible +only to hosts at that site. To off-site hosts, machines on a site's +subnetworks appear to reside on a single network. The routing daemon +has been reworked to provide routing support in this type of +environment. +.PP +The default Internet broadcast address is now specified with a host part +of all ones, rather than all zeros. +The broadcast address may be set at boot time on a per-interface basis. +.NH 3 +Interface Addressing +.PP +The organization of network interfaces has been +reworked to more cleanly support multiple +network protocols. Network interfaces no longer +contain a host's address on that network; instead +each interface contains a pointer to a list of addresses +assigned to that interface. This permits a single +interface to support, for example, Internet protocols +at the same time as XNS protocols. +.PP +The Address Resolution Protocol (ARP) support +for 10 megabit/second Ethernet\(dg +.FS +\(dg Ethernet is a trademark of Xerox. +.FE +has been made more flexible by allowing hosts to +act as a ``clearing house'' for hosts that do +not support ARP. In addition, system managers have +more control over the contents of the ARP translation +cache and may interactively interrogate and modify +the cache's contents. +.NH 3 +User Control of Network Buffering +.PP +Although the system allocates reasonable default amounts of buffering +for most connections, certain operations such as file system dumps +to remote machines benefit from significant increases in buffering [Walsh84].
+The \fIsetsockopt\fP system call has been extended to allow such requests. +In addition, \fIgetsockopt\fP and \fIsetsockopt\fP +are now interfaced to the protocol level allowing protocol-specific +options to be manipulated by the user. +.NH 3 +Number of File Descriptors +.PP +To allow full use of the many descriptor based services available, +the previous hard limit of 30 open files per process has been relaxed. +The changes entailed generalizing \fIselect\fP to handle arrays of +32-bit words, removing the dependency on file descriptors from +the page table entries, +and limiting most of the linear scans of a process's file table. +The default per-process descriptor limit was raised from 20 to 64, +though there are no longer any hard upper limits on the number +of file descriptors. +.NH 3 +Kernel Limits +.PP +Many internal kernel configuration limits have been increased by suitable +modifications to data structures. +The limit on physical memory has been changed from 8 megabytes to 64 megabytes, +and the limit of 15 mounted file systems has been changed to 255. +The maximum file system size has been increased to 8 gigabytes, +number of processes to 65536, +and per process size to 64 megabytes of data and 64 megabytes of stack. +Note that these are upper bounds; +the default limits for these quantities are tuned for systems +with 4-8 megabytes of physical memory. +.NH 3 +Memory Management +.PP +The global clock page replacement algorithm used to have a single +hand that was used both to mark and to reclaim memory. +The first time that it encountered a page it would clear its reference bit. +If the reference bit was still clear on its next pass across the page, +it would reclaim the page. +The use of a single hand does not work well with large physical +memories as the time to complete a single revolution of the hand +can take up to a minute or more. +By the time the hand gets around to the marked pages, +the information is usually no longer pertinent.
+During periods of sudden shortages, +the page daemon will not be able to find any reclaimable pages until +it has completed a full revolution. +To alleviate this problem, +the clock hand has been split into two separate hands. +The front hand clears the reference bits, +the back hand follows a constant number of pages behind +reclaiming pages that still have cleared reference bits. +While the code has been written to allow the distance between +the hands to be varied, we have not found any algorithms +suitable for determining how to dynamically adjust this distance. +.PP +The configuration of the virtual memory system used to require +a significant understanding of its operation to do such +simple tasks as increasing the maximum process size. +This process has been significantly improved so that the most +common configuration parameters, such as the virtual memory sizes, +can be specified using a single option in the configuration file. +Standard configurations support data and stack segments +of 17, 33 and 64 megabytes. +.NH 3 +Signals +.PP +The 4.2BSD signal implementation would push several words +onto the normal run-time stack before switching to an +alternate signal stack. +The 4.3BSD implementation has been corrected so that +the entire signal handler's state is now pushed onto the signal stack. +Another limitation in the original signal implementation was +that it used an undocumented system call to return from signals. +Users could not write their own return from exceptions; +4.3BSD formally specifies the \fIsigreturn\fP system call. +.PP +Many existing programs depend on interrupted system calls. +The restartable system call semantics of 4.2BSD signals caused +many of these programs to break. +To simplify porting of programs from inferior versions of +.UX +the \fIsigvec\fP system call has been extended so that +programmers may specify that system calls are not to be +restarted after particular signals. 
+.NH 3 +System Logging +.PP +A system logging facility has been added +that sends kernel messages to the +syslog daemon for logging in /usr/adm/messages and possibly for +printing on the system console. +The revised scheme for logging messages +eliminates the time lag in updating the messages file, +unifies the format of kernel messages, +provides a finer granularity of control over the messages +that get printed on the console, +and eliminates the degradation in response during the printing of +low-priority kernel messages. +Recoverable system errors and common resource limitations are logged +using this facility. +Most system utilities, such as init and login, +have been modified to log errors to syslog +rather than writing directly on the console. +.NH 3 +Windows +.PP +The tty structure has been augmented to hold +information about the size +of an associated window or terminal. +These sizes can be obtained by programs such as editors that want +to know the size of the screen they are manipulating. +When these sizes are changed, +a new signal, SIGWINCH, is sent to the current process group. +The editors have been modified to catch this signal and reshape +their view of the world, and the remote login program and server +now cooperate to propagate window sizes and window size changes +across a network. +Other programs and libraries such as curses that need the width +or height of the screen have been modified to use this facility as well. +.NH 3 +Configuration of UNIBUS Devices +.PP +The UNIBUS configuration routines have been extended to allow auto-configuration +of dedicated UNIBUS memory held by devices. +The new routines simplify the configuration of memory-mapped devices +and correct problems occurring on reset of the UNIBUS.
+.NH 3 +Disk Recovery from Errors +.PP +The MASSBUS disk driver's error recovery routines have been fixed to +retry before correcting ECC errors, support ECC on bad-sector replacements, +and correctly attempt retries after earlier +corrective actions in the same transfer. +The error messages are more accurate. +.NH 2 +Functional Extensions to Libraries and Utilities +.PP +Most of the changes to the utilities and libraries have been to +allow them to handle a more general set of problems, +or to handle the same set of problems more quickly. +.NH 3 +Name Server +.PP +In 4.2BSD the name resolution routines (\fIgethostbyname\fP, +\fIgetservbyname\fP, +etc.) were implemented by a set of database files maintained on the +local machine. +Inconsistencies or obsolescence in these files resulted in inaccessibility of +hosts or services. +In 4.3BSD these files may be replaced by a network name server that can +insure a consistent view of the name space in a multimachine environment. +This name server operates in accordance with Internet standards +for service on the ARPANET [Mockapetris83]. +.NH 3 +System Management +.PP +A new utility, \fIrdist\fP, +has been provided to assist system managers in keeping +all their machines up to date with a consistent set of sources and binaries. +A master set of sources may reside on a single central machine, +or be distributed at (known) locations throughout the environment. +New versions of \fIgetty\fP, \fIinit\fP, and \fIlogin\fP +merge the functions of several +files into a single place, and allow more flexibility in the +startup of processes such as window managers. +.PP +The new utility \fItimed\fP keeps the time on a group of cooperating machines +(within a single LAN) synchronized to within 30 milliseconds. +It does its corrections using a new system call that changes +the rate of time advance without stopping or reversing the system clock. +It normally selects one machine to act as a master. 
+If the master dies or is partitioned, a new master is elected. +Other machines may participate in a purely slave role. +.NH 3 +Routing +.PP +Many bugs in the routing daemon have been fixed; +it is considerably more robust, +and now understands how to properly deal with +subnets and point-to-point networks. +Its operation has been made more efficient by tuning with the use +of execution profiles, along with inline expansion of common operations +using the kernel's \fIinline\fP optimizer. +.NH 3 +Compilers +.PP +The symbolic debugger \fIdbx\fP has had many new features added, +and all the known bugs fixed. In addition \fIdbx\fP +has been extended to work with the Pascal compiler. +The fortran compiler \fIf77\fP has had numerous bugs fixed. +The C compiler has been modified so that it can, optionally, +generate single precision floating point instructions when operating +on single precision variables. diff --git a/share/doc/papers/sysperf/6.t b/share/doc/papers/sysperf/6.t new file mode 100644 index 0000000..a445ee1 --- /dev/null +++ b/share/doc/papers/sysperf/6.t @@ -0,0 +1,70 @@ +.\" Copyright (c) 1985 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. 
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)6.t 5.1 (Berkeley) 4/17/91 +.\" +.ds RH Security Tightening +.NH +Security Tightening +.PP +Since we do not wish to encourage rampant system cracking, +we describe only briefly the changes made to enhance security. +.NH 2 +Generic Kernel +.PP +Several loopholes in the process tracing facility have been corrected. +Programs being traced may not be executed; +executing programs may not be traced. +Programs may not provide input to terminals to which they do not +have read permission. +The handling of process groups has been tightened to eliminate +some problems. +When a program attempts to change its process group, +the system checks to see if the process with the pid of the process +group was started by the same user. +If it exists and was started by a different user the process group +number change is denied. +.NH 2 +Security Problems in Utilities +.PP +Setuid utilities no longer use the \fIpopen\fP or \fIsystem\fP library routines. 
+Access to the kernel's data structures through the kmem device +is now restricted to programs that are set group id ``kmem''. +Thus many programs that used to run with root privileges +no longer need to do so. +Access to disk devices is now controlled by an ``operator'' group id; +this permission allows operators to function without being the super-user. +Only users in group wheel can do ``su root''; this restriction +allows administrators to define a super-user access list. +Numerous holes have been closed in the shell to prevent +users from gaining privileges from set user id shell scripts, +although use of such scripts is still highly discouraged on systems +that are concerned about security. diff --git a/share/doc/papers/sysperf/7.t b/share/doc/papers/sysperf/7.t new file mode 100644 index 0000000..68f5717 --- /dev/null +++ b/share/doc/papers/sysperf/7.t @@ -0,0 +1,164 @@ +.\" Copyright (c) 1985 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)7.t 5.1 (Berkeley) 4/17/91 +.\" +.ds RH Conclusions +.NH +Conclusions +.PP +4.2BSD, while functionally superior to 4.1BSD, lacked much of the +performance tuning required of a good system. We found that +the distributed system spent 10-20% more time in the kernel than +4.1BSD. This added overhead combined with problems with several +user programs severely limited the overall performance of the +system in a general timesharing environment. +.PP +Changes made to the system since the 4.2BSD distribution have +eliminated most of the +added system overhead by replacing old algorithms +or introducing additional cacheing schemes. +The combined caches added to the name translation process +reduce the average cost of translating a pathname to an inode by more than 50%. +These changes reduce the percentage of time spent running +in the system by nearly 9%. +.PP +The use of silo input on terminal ports only when necessary +has allowed the system to avoid a large amount of software interrupt +processing. Observations show that the system is forced to +field about 25% fewer interrupts than before. 
+.PP +The kernel +changes, combined with many bug fixes, make the system much more +responsive in a general timesharing environment. +The 4.3BSD Berkeley UNIX system now appears +capable of supporting loads at least as large as those supported under +4.1BSD while providing all the new interprocess communication, networking, +and file system facilities. +.nr H2 1 +.ds RH Acknowledgements +.SH +\s+2Acknowledgements\s0 +.PP +We would like to thank Robert Elz for sharing his ideas and +his code for cacheing system wide names and searching the process table. +We thank Alan Smith for initially suggesting the use of a +capability based cache. +We also acknowledge +George Goble who dropped many of our changes +into his production system and reported back fixes to the +disasters that they caused. +The buffer cache read-ahead trace package was based +on a program written by Jim Lawson. Ralph Campbell +implemented several of the C library changes. The original +version of the Internet daemon was written by Bill Joy. +In addition, +we would like to thank the many other people that contributed +ideas, information, and work while the system was undergoing change. +.ds RH References +.nr H2 1 +.sp 2 +.SH +\s+2References\s-2 +.LP +.IP [Cabrera84] 20 +Luis Felipe Cabrera, Eduard Hunter, Michael J. Karels, and David Mosher, +``A User-Process Oriented Performance Study of Ethernet Networking Under +Berkeley UNIX 4.2BSD,'' +Research Report No. UCB/CSD 84/217, University of California, +Berkeley, December 1984. +.IP [Cabrera85] 20 +Luis Felipe Cabrera, Michael J. Karels, and David Mosher, +``The Impact of Buffer Management on Networking Software Performance +in Berkeley UNIX 4.2BSD: A Case Study,'' +Proceedings of the Summer Usenix Conference, Portland, Oregon, +June 1985, pp. 507-517. +.IP [GADS85] 20 +GADS (Gateway Algorithms and Data Structures Task Force), +``Toward an Internet Standard for Subnetting,'' RFC-940, +Network Information Center, SRI International, +April 1985. 
+.IP [Joy80] 20 +Joy, William, +``Comments on the performance of UNIX on the VAX'', +Computer Systems Research Group, U.C. Berkeley. +April 1980. +.IP [Kashtan80] 20 +Kashtan, David L., +``UNIX and VMS, Some Performance Comparisons'', +SRI International. February 1980. +.IP [Lankford84] 20 +Jeffrey Lankford, +``UNIX System V and 4BSD Performance,'' +\fIProceedings of the Salt Lake City Usenix Conference\fP, +pp 228-236, June 1984. +.IP [Leffler84] 20 +Sam Leffler, Mike Karels, and M. Kirk McKusick, +``Measuring and Improving the Performance of 4.2BSD,'' +\fIProceedings of the Salt Lake City Usenix Conference\fP, +pp 237-252, June 1984. +.IP [McKusick85] 20 +M. Kirk McKusick, Mike Karels, and Samuel Leffler, +``Performance Improvements and Functional Enhancements in 4.3BSD,'' +\fIProceedings of the Portland Usenix Conference\fP, +pp 519-531, June 1985. +.IP [Mockapetris83] 20 +Paul Mockapetris, ``Domain Names \- Implementation and Schedule,'' +Network Information Center, SRI International, +RFC-883, +November 1983. +.IP [Mogul84] 20 +Jeffrey Mogul, ``Broadcasting Internet Datagrams,'' RFC-919, +Network Information Center, SRI International, +October 1984. +.IP [Mosher80] 20 +Mosher, David, +``UNIX Performance, an Introspection'', +Presented at the Boulder, Colorado Usenix Conference, January 1980. +Copies of the paper are available from +Computer Systems Research Group, U.C. Berkeley. +.IP [Nagle84] 20 +John Nagle, ``Congestion Control in IP/TCP Internetworks,'' RFC-896, +Network Information Center, SRI International, +January 1984. +.IP [Ritchie74] 20 +Ritchie, D. M. and Thompson, K., +``The UNIX Time-Sharing System'', +CACM 17, 7. July 1974. pp 365-375. +.IP [Shannon83] 20 +Shannon, W., +private communication, +July 1983. +.IP [Walsh84] 20 +Robert Walsh and Robert Gurwitz, +``Converting BBN TCP/IP to 4.2BSD,'' +\fIProceedings of the Salt Lake City Usenix Conference\fP, +pp 52-61, June 1984. 
diff --git a/share/doc/papers/sysperf/Makefile b/share/doc/papers/sysperf/Makefile new file mode 100644 index 0000000..9ddbc9d --- /dev/null +++ b/share/doc/papers/sysperf/Makefile @@ -0,0 +1,19 @@ +# From: @(#)Makefile 1.6 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= papers +DOC= sysperf +SRCS= 0.t 1.t 2.t 3.t 4.t 5.t 6.t 7.t appendix.tmp +EXTRA= a1.t a2.t +MACROS= -ms +USE_EQN= +USE_TBL= +CLEANFILES= appendix.tmp + +appendix.tmp: a1.t a2.t + ${GRIND} ${.CURDIR}/a1.t | awk '/\.\(\)/{ cnt = 2 } \ + { if (cnt) cnt -= 1; else print $$0; } ' > appendix.tmp + ${GRIND} -lcsh ${.CURDIR}/a2.t | awk '/\.\(\)/{ cnt = 2 } \ + { if (cnt) cnt -= 1; else print $$0; } ' >> appendix.tmp + +.include <bsd.doc.mk> diff --git a/share/doc/papers/sysperf/a1.t b/share/doc/papers/sysperf/a1.t new file mode 100644 index 0000000..b94f6aa --- /dev/null +++ b/share/doc/papers/sysperf/a1.t @@ -0,0 +1,668 @@ +.\" Copyright (c) 1985 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)a1.t 5.1 (Berkeley) 4/17/91 +.\" +.ds RH Appendix A \- Benchmark sources +.nr H2 1 +.sp 2 +.de vS +.nf +.. +.de vE +.fi +.. +.bp +.SH +\s+2Appendix A \- Benchmark sources\s-2 +.LP +The programs shown here run under 4.2 with only routines +from the standard libraries. When run under 4.1 they were augmented +with a \fIgetpagesize\fP routine and a copy of the \fIrandom\fP +function from the C library. The \fIvforks\fP and \fIvexecs\fP +programs are constructed from the \fIforks\fP and \fIexecs\fP programs, +respectively, by substituting calls to \fIfork\fP with calls to +\fIvfork\fP. +.SH +syscall +.LP +.vS +/* + * System call overhead benchmark. + */ +main(argc, argv) + char *argv[]; +{ + register int ncalls; + + if (argc < 2) { + printf("usage: %s #syscalls\n", argv[0]); + exit(1); + } + ncalls = atoi(argv[1]); + while (ncalls-- > 0) + (void) getpid(); +} +.vE +.SH +csw +.LP +.vS +/* + * Context switching benchmark. + * + * Force system to context switch 2*nsigs + * times by forking and exchanging signals. + * To calculate system overhead for a context + * switch, the signocsw program must be run + * with nsigs. 
Overhead is then estimated by + * t1 = time csw <n> + * t2 = time signocsw <n> + * overhead = t1 - 2 * t2; + */ +#include <signal.h> + +int sigsub(); +int otherpid; +int nsigs; + +main(argc, argv) + char *argv[]; +{ + int pid; + + if (argc < 2) { + printf("usage: %s nsignals\n", argv[0]); + exit(1); + } + nsigs = atoi(argv[1]); + signal(SIGALRM, sigsub); + otherpid = getpid(); + pid = fork(); + if (pid != 0) { + otherpid = pid; + kill(otherpid, SIGALRM); + } + for (;;) + sigpause(0); +} + +sigsub() +{ + + signal(SIGALRM, sigsub); + kill(otherpid, SIGALRM); + if (--nsigs <= 0) + exit(0); +} +.vE +.SH +signocsw +.LP +.vS +/* + * Signal without context switch benchmark. + */ +#include <signal.h> + +int pid; +int nsigs; +int sigsub(); + +main(argc, argv) + char *argv[]; +{ + register int i; + + if (argc < 2) { + printf("usage: %s nsignals\n", argv[0]); + exit(1); + } + nsigs = atoi(argv[1]); + signal(SIGALRM, sigsub); + pid = getpid(); + for (i = 0; i < nsigs; i++) + kill(pid, SIGALRM); +} + +sigsub() +{ + + signal(SIGALRM, sigsub); +} +.vE +.SH +pipeself +.LP +.vS +/* + * IPC benchmark, + * write to self using pipes. + */ + +main(argc, argv) + char *argv[]; +{ + char buf[512]; + int fd[2], msgsize; + register int i, iter; + + if (argc < 3) { + printf("usage: %s iterations message-size\n", argv[0]); + exit(1); + } + argc--, argv++; + iter = atoi(*argv); + argc--, argv++; + msgsize = atoi(*argv); + if (msgsize > sizeof (buf) || msgsize <= 0) { + printf("%s: Bad message size.\n", *argv); + exit(2); + } + if (pipe(fd) < 0) { + perror("pipe"); + exit(3); + } + for (i = 0; i < iter; i++) { + write(fd[1], buf, msgsize); + read(fd[0], buf, msgsize); + } +} +.vE +.SH +pipediscard +.LP +.vS +/* + * IPC benchmark, + * write and discard using pipes. 
+ */ + +main(argc, argv) + char *argv[]; +{ + char buf[512]; + int fd[2], msgsize; + register int i, iter; + + if (argc < 3) { + printf("usage: %s iterations message-size\n", argv[0]); + exit(1); + } + argc--, argv++; + iter = atoi(*argv); + argc--, argv++; + msgsize = atoi(*argv); + if (msgsize > sizeof (buf) || msgsize <= 0) { + printf("%s: Bad message size.\n", *argv); + exit(2); + } + if (pipe(fd) < 0) { + perror("pipe"); + exit(3); + } + if (fork() == 0) + for (i = 0; i < iter; i++) + read(fd[0], buf, msgsize); + else + for (i = 0; i < iter; i++) + write(fd[1], buf, msgsize); +} +.vE +.SH +pipeback +.LP +.vS +/* + * IPC benchmark, + * read and reply using pipes. + * + * Process forks and exchanges messages + * over a pipe in a request-response fashion. + */ + +main(argc, argv) + char *argv[]; +{ + char buf[512]; + int fd[2], fd2[2], msgsize; + register int i, iter; + + if (argc < 3) { + printf("usage: %s iterations message-size\n", argv[0]); + exit(1); + } + argc--, argv++; + iter = atoi(*argv); + argc--, argv++; + msgsize = atoi(*argv); + if (msgsize > sizeof (buf) || msgsize <= 0) { + printf("%s: Bad message size.\n", *argv); + exit(2); + } + if (pipe(fd) < 0) { + perror("pipe"); + exit(3); + } + if (pipe(fd2) < 0) { + perror("pipe"); + exit(3); + } + if (fork() == 0) + for (i = 0; i < iter; i++) { + read(fd[0], buf, msgsize); + write(fd2[1], buf, msgsize); + } + else + for (i = 0; i < iter; i++) { + write(fd[1], buf, msgsize); + read(fd2[0], buf, msgsize); + } +} +.vE +.SH +forks +.LP +.vS +/* + * Benchmark program to calculate fork+wait + * overhead (approximately). Process + * forks and exits while parent waits. + * The time to run this program is used + * in calculating exec overhead. 
+ */ + +main(argc, argv) + char *argv[]; +{ + register int nforks, i; + char *cp; + int pid, child, status, brksize; + + if (argc < 3) { + printf("usage: %s number-of-forks sbrk-size\n", argv[0]); + exit(1); + } + nforks = atoi(argv[1]); + if (nforks < 0) { + printf("%s: bad number of forks\n", argv[1]); + exit(2); + } + brksize = atoi(argv[2]); + if (brksize < 0) { + printf("%s: bad size to sbrk\n", argv[2]); + exit(3); + } + cp = (char *)sbrk(brksize); + if ((int)cp == -1) { + perror("sbrk"); + exit(4); + } + for (i = 0; i < brksize; i += 1024) + cp[i] = i; + while (nforks-- > 0) { + child = fork(); + if (child == -1) { + perror("fork"); + exit(-1); + } + if (child == 0) + _exit(-1); + while ((pid = wait(&status)) != -1 && pid != child) + ; + } + exit(0); +} +.vE +.SH +execs +.LP +.vS +/* + * Benchmark program to calculate exec + * overhead (approximately). Process + * forks and execs "null" test program. + * The time to run the fork program should + * then be deducted from this one to + * estimate the overhead for the exec. + */ + +main(argc, argv) + char *argv[]; +{ + register int nexecs, i; + char *cp, *sbrk(); + int pid, child, status, brksize; + + if (argc < 4) { + printf("usage: %s number-of-execs sbrk-size job-name\n", + argv[0]); + exit(1); + } + nexecs = atoi(argv[1]); + if (nexecs < 0) { + printf("%s: bad number of execs\n", argv[1]); + exit(2); + } + brksize = atoi(argv[2]); + if (brksize < 0) { + printf("%s: bad size to sbrk\n", argv[2]); + exit(3); + } + cp = sbrk(brksize); + if ((int)cp == -1) { + perror("sbrk"); + exit(4); + } + for (i = 0; i < brksize; i += 1024) + cp[i] = i; + while (nexecs-- > 0) { + child = fork(); + if (child == -1) { + perror("fork"); + exit(-1); + } + if (child == 0) { + execv(argv[3], argv); + perror("execv"); + _exit(-1); + } + while ((pid = wait(&status)) != -1 && pid != child) + ; + } + exit(0); +} +.vE +.SH +nulljob +.LP +.vS +/* + * Benchmark "null job" program. 
+ */ + +main(argc, argv) + char *argv[]; +{ + + exit(0); +} +.vE +.SH +bigjob +.LP +.vS +/* + * Benchmark "null big job" program. + */ +/* 250 here is intended to approximate vi's text+data size */ +char space[1024 * 250] = "force into data segment"; + +main(argc, argv) + char *argv[]; +{ + + exit(0); +} +.vE +.bp +.SH +seqpage +.LP +.vS +/* + * Sequential page access benchmark. + */ +#include <sys/vadvise.h> + +char *valloc(); + +main(argc, argv) + char *argv[]; +{ + register i, niter; + register char *pf, *lastpage; + int npages = 4096, pagesize, vflag = 0; + char *pages, *name; + + name = argv[0]; + argc--, argv++; +again: + if (argc < 1) { +usage: + printf("usage: %s [ -v ] [ -p #pages ] niter\n", name); + exit(1); + } + if (strcmp(*argv, "-p") == 0) { + argc--, argv++; + if (argc < 1) + goto usage; + npages = atoi(*argv); + if (npages <= 0) { + printf("%s: Bad page count.\n", *argv); + exit(2); + } + argc--, argv++; + goto again; + } + if (strcmp(*argv, "-v") == 0) { + argc--, argv++; + vflag++; + goto again; + } + niter = atoi(*argv); + pagesize = getpagesize(); + pages = valloc(npages * pagesize); + if (pages == (char *)0) { + printf("Can't allocate %d pages (%2.1f megabytes).\n", + npages, (npages * pagesize) / (1024. * 1024.)); + exit(3); + } + lastpage = pages + (npages * pagesize); + if (vflag) + vadvise(VA_SEQL); + for (i = 0; i < niter; i++) + for (pf = pages; pf < lastpage; pf += pagesize) + *pf = 1; +} +.vE +.SH +randpage +.LP +.vS +/* + * Random page access benchmark. 
+ */ +#include <sys/vadvise.h> + +char *valloc(); +int rand(); + +main(argc, argv) + char *argv[]; +{ + register int npages = 4096, pagesize, pn, i, niter; + int vflag = 0, debug = 0; + char *pages, *name; + + name = argv[0]; + argc--, argv++; +again: + if (argc < 1) { +usage: + printf("usage: %s [ -d ] [ -v ] [ -p #pages ] niter\n", name); + exit(1); + } + if (strcmp(*argv, "-p") == 0) { + argc--, argv++; + if (argc < 1) + goto usage; + npages = atoi(*argv); + if (npages <= 0) { + printf("%s: Bad page count.\n", *argv); + exit(2); + } + argc--, argv++; + goto again; + } + if (strcmp(*argv, "-v") == 0) { + argc--, argv++; + vflag++; + goto again; + } + if (strcmp(*argv, "-d") == 0) { + argc--, argv++; + debug++; + goto again; + } + niter = atoi(*argv); + pagesize = getpagesize(); + pages = valloc(npages * pagesize); + if (pages == (char *)0) { + printf("Can't allocate %d pages (%2.1f megabytes).\n", + npages, (npages * pagesize) / (1024. * 1024.)); + exit(3); + } + if (vflag) + vadvise(VA_ANOM); + for (i = 0; i < niter; i++) { + pn = random() % npages; + if (debug) + printf("touch page %d\n", pn); + pages[pagesize * pn] = 1; + } +} +.vE +.SH +gausspage +.LP +.vS +/* + * Random page access with + * a gaussian distribution. + * + * Allocate a large (zero fill on demand) address + * space and fault the pages in a random gaussian + * order. 
+ */ + +float sqrt(), log(), rnd(), cos(), gauss(); +char *valloc(); +int rand(); + +main(argc, argv) + char *argv[]; +{ + register int pn, i, niter, delta; + register char *pages; + float sd = 10.0; + int npages = 4096, pagesize, debug = 0; + char *name; + + name = argv[0]; + argc--, argv++; +again: + if (argc < 1) { +usage: + printf( +"usage: %s [ -d ] [ -p #pages ] [ -s standard-deviation ] iterations\n", name); + exit(1); + } + if (strcmp(*argv, "-s") == 0) { + argc--, argv++; + if (argc < 1) + goto usage; + sscanf(*argv, "%f", &sd); + if (sd <= 0) { + printf("%s: Bad standard deviation.\n", *argv); + exit(2); + } + argc--, argv++; + goto again; + } + if (strcmp(*argv, "-p") == 0) { + argc--, argv++; + if (argc < 1) + goto usage; + npages = atoi(*argv); + if (npages <= 0) { + printf("%s: Bad page count.\n", *argv); + exit(2); + } + argc--, argv++; + goto again; + } + if (strcmp(*argv, "-d") == 0) { + argc--, argv++; + debug++; + goto again; + } + niter = atoi(*argv); + pagesize = getpagesize(); + pages = valloc(npages*pagesize); + if (pages == (char *)0) { + printf("Can't allocate %d pages (%2.1f megabytes).\n", + npages, (npages*pagesize) / (1024. 
* 1024.)); + exit(3); + } + pn = 0; + for (i = 0; i < niter; i++) { + delta = gauss(sd, 0.0); + while (pn + delta < 0 || pn + delta >= npages) + delta = gauss(sd, 0.0); + pn += delta; + if (debug) + printf("touch page %d\n", pn); + else + pages[pn * pagesize] = 1; + } +} + +float +gauss(sd, mean) + float sd, mean; +{ + register float qa, qb; + + qa = sqrt(log(rnd()) * -2.0); + qb = 3.14159 * rnd(); + return (qa * cos(qb) * sd + mean); +} + +float +rnd() +{ + static int seed = 1; + static int biggest = 0x7fffffff; + + return ((float)rand(seed) / (float)biggest); +} +.vE diff --git a/share/doc/papers/sysperf/a2.t b/share/doc/papers/sysperf/a2.t new file mode 100644 index 0000000..e1882cf --- /dev/null +++ b/share/doc/papers/sysperf/a2.t @@ -0,0 +1,117 @@ +.\" Copyright (c) 1985 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)a2.t 5.1 (Berkeley) 4/17/91 +.\" +.SH +run (shell script) +.LP +.vS +#! /bin/csh -fx +# Script to run benchmark programs. +# +date +make clean; time make +time syscall 100000 +time seqpage -p 7500 10 +time seqpage -v -p 7500 10 +time randpage -p 7500 30000 +time randpage -v -p 7500 30000 +time gausspage -p 7500 -s 1 30000 +time gausspage -p 7500 -s 10 30000 +time gausspage -p 7500 -s 30 30000 +time gausspage -p 7500 -s 40 30000 +time gausspage -p 7500 -s 50 30000 +time gausspage -p 7500 -s 60 30000 +time gausspage -p 7500 -s 80 30000 +time gausspage -p 7500 -s 10000 30000 +time csw 10000 +time signocsw 10000 +time pipeself 10000 512 +time pipeself 10000 4 +time udgself 10000 512 +time udgself 10000 4 +time pipediscard 10000 512 +time pipediscard 10000 4 +time udgdiscard 10000 512 +time udgdiscard 10000 4 +time pipeback 10000 512 +time pipeback 10000 4 +time udgback 10000 512 +time udgback 10000 4 +size forks +time forks 1000 0 +time forks 1000 1024 +time forks 1000 102400 +size vforks +time vforks 1000 0 +time vforks 1000 1024 +time vforks 1000 102400 +countenv +size nulljob +time execs 1000 0 nulljob +time execs 1000 1024 nulljob +time execs 1000 102400 nulljob +time 
vexecs 1000 0 nulljob +time vexecs 1000 1024 nulljob +time vexecs 1000 102400 nulljob +size bigjob +time execs 1000 0 bigjob +time execs 1000 1024 bigjob +time execs 1000 102400 bigjob +time vexecs 1000 0 bigjob +time vexecs 1000 1024 bigjob +time vexecs 1000 102400 bigjob +# fill environment with ~1024 bytes +setenv a 012345678901234567890123456789012345678901234567890123456780123456789 +setenv b 012345678901234567890123456789012345678901234567890123456780123456789 +setenv c 012345678901234567890123456789012345678901234567890123456780123456789 +setenv d 012345678901234567890123456789012345678901234567890123456780123456789 +setenv e 012345678901234567890123456789012345678901234567890123456780123456789 +setenv f 012345678901234567890123456789012345678901234567890123456780123456789 +setenv g 012345678901234567890123456789012345678901234567890123456780123456789 +setenv h 012345678901234567890123456789012345678901234567890123456780123456789 +setenv i 012345678901234567890123456789012345678901234567890123456780123456789 +setenv j 012345678901234567890123456789012345678901234567890123456780123456789 +setenv k 012345678901234567890123456789012345678901234567890123456780123456789 +setenv l 012345678901234567890123456789012345678901234567890123456780123456789 +setenv m 012345678901234567890123456789012345678901234567890123456780123456789 +setenv n 012345678901234567890123456789012345678901234567890123456780123456789 +setenv o 012345678901234567890123456789012345678901234567890123456780123456789 +countenv +time execs 1000 0 nulljob +time execs 1000 1024 nulljob +time execs 1000 102400 nulljob +time execs 1000 0 bigjob +time execs 1000 1024 bigjob +time execs 1000 102400 bigjob +.vE +.bp diff --git a/share/doc/papers/timecounter/Makefile b/share/doc/papers/timecounter/Makefile new file mode 100644 index 0000000..f6d158b --- /dev/null +++ b/share/doc/papers/timecounter/Makefile @@ -0,0 +1,20 @@ +# $FreeBSD$ + +# You really want: +# PRINTERDEVICE=ps +# or you will not get the 
illustration. +VOLUME= papers +DOC= timecounter +SRCS= tmac.usenix timecounter.ms-patched +EXTRA= fig1.eps fig2.eps fig3.eps fig4.eps fig5.eps gps.ps intr.ps +MACROS= -ms +CLEANFILES= timecounter.ms-patched +USE_PIC= +USE_EQN= +USE_TBL= + +timecounter.ms-patched: timecounter.ms + sed -E -e 's;(gps|intr).ps;${.CURDIR}/&;' -e 's;fig[0-9].eps;${.CURDIR}/&;' \ + ${.ALLSRC} > ${.TARGET} + +.include <bsd.doc.mk> diff --git a/share/doc/papers/timecounter/fig1.eps b/share/doc/papers/timecounter/fig1.eps new file mode 100644 index 0000000..012fed2 --- /dev/null +++ b/share/doc/papers/timecounter/fig1.eps @@ -0,0 +1,227 @@ +%!PS-Adobe-2.0 EPSF-2.0 +%%Title: fig1.eps +%%Creator: fig2dev Version 3.2 Patchlevel 3d +%%CreationDate: $FreeBSD$ +%%For: phk@critter.freebsd.dk (Poul-Henning Kamp) +%%BoundingBox: 0 0 191 194 +%%Magnification: 1.0000 +%%EndComments +/$F2psDict 200 dict def +$F2psDict begin +$F2psDict /mtrx matrix put +/col-1 {0 setgray} bind def +/col0 {0.000 0.000 0.000 srgb} bind def +/col1 {0.000 0.000 1.000 srgb} bind def +/col2 {0.000 1.000 0.000 srgb} bind def +/col3 {0.000 1.000 1.000 srgb} bind def +/col4 {1.000 0.000 0.000 srgb} bind def +/col5 {1.000 0.000 1.000 srgb} bind def +/col6 {1.000 1.000 0.000 srgb} bind def +/col7 {1.000 1.000 1.000 srgb} bind def +/col8 {0.000 0.000 0.560 srgb} bind def +/col9 {0.000 0.000 0.690 srgb} bind def +/col10 {0.000 0.000 0.820 srgb} bind def +/col11 {0.530 0.810 1.000 srgb} bind def +/col12 {0.000 0.560 0.000 srgb} bind def +/col13 {0.000 0.690 0.000 srgb} bind def +/col14 {0.000 0.820 0.000 srgb} bind def +/col15 {0.000 0.560 0.560 srgb} bind def +/col16 {0.000 0.690 0.690 srgb} bind def +/col17 {0.000 0.820 0.820 srgb} bind def +/col18 {0.560 0.000 0.000 srgb} bind def +/col19 {0.690 0.000 0.000 srgb} bind def +/col20 {0.820 0.000 0.000 srgb} bind def +/col21 {0.560 0.000 0.560 srgb} bind def +/col22 {0.690 0.000 0.690 srgb} bind def +/col23 {0.820 0.000 0.820 srgb} bind def +/col24 {0.500 0.190 0.000 srgb} bind def 
+/col25 {0.630 0.250 0.000 srgb} bind def +/col26 {0.750 0.380 0.000 srgb} bind def +/col27 {1.000 0.500 0.500 srgb} bind def +/col28 {1.000 0.630 0.630 srgb} bind def +/col29 {1.000 0.750 0.750 srgb} bind def +/col30 {1.000 0.880 0.880 srgb} bind def +/col31 {1.000 0.840 0.000 srgb} bind def + +end +save +newpath 0 194 moveto 0 0 lineto 191 0 lineto 191 194 lineto closepath clip newpath +-7.6 201.2 translate +1 -1 scale + +/cp {closepath} bind def +/ef {eofill} bind def +/gr {grestore} bind def +/gs {gsave} bind def +/sa {save} bind def +/rs {restore} bind def +/l {lineto} bind def +/m {moveto} bind def +/rm {rmoveto} bind def +/n {newpath} bind def +/s {stroke} bind def +/sh {show} bind def +/slc {setlinecap} bind def +/slj {setlinejoin} bind def +/slw {setlinewidth} bind def +/srgb {setrgbcolor} bind def +/rot {rotate} bind def +/sc {scale} bind def +/sd {setdash} bind def +/ff {findfont} bind def +/sf {setfont} bind def +/scf {scalefont} bind def +/sw {stringwidth} bind def +/tr {translate} bind def +/tnt {dup dup currentrgbcolor + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb} + bind def +/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul + 4 -2 roll mul srgb} bind def + /DrawEllipse { + /endangle exch def + /startangle exch def + /yrad exch def + /xrad exch def + /y exch def + /x exch def + /savematrix mtrx currentmatrix def + x y tr xrad yrad sc 0 0 1 startangle endangle arc + closepath + savematrix setmatrix + } def + +/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def +/$F2psEnd {$F2psEnteredState restore end} def + +$F2psBegin +10 setmiterlimit + 0.06000 0.06000 sc +% +% Fig objects follow +% +/Times-Roman ff 180.00 scf sf +750 3300 m +gs 1 -1 sc (Imprecise) dup sw pop 2 div neg 0 rm col0 sh gr +15.000 slw +% Ellipse +n 750 750 300 300 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 750 750 450 450 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 750 
750 600 600 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 750 2250 150 150 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 750 2250 300 300 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 750 2250 450 450 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 750 2250 600 600 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 2250 2250 150 150 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 2250 2250 300 300 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 2250 2250 450 450 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 2250 2250 600 600 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 2250 750 150 150 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 2250 750 300 300 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 2250 750 450 450 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 2250 750 600 600 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 2280 2197 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 2152 2212 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 2145 2332 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 2265 2325 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 2370 2295 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 292 2002 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 367 1905 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 390 2040 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 180 1950 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 1965 472 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 2355 517 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 2505 870 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 907 1170 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 1282 1305 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + 
+% Ellipse +n 975 825 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 2071 1074 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 2550 600 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 1350 675 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 1350 1050 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +% Ellipse +n 225 2100 38 38 0 360 DrawEllipse gs 0.00 setgray ef gr gs col0 s gr + +/Times-Roman ff 180.00 scf sf +3300 750 m +gs 1 -1 sc 90.0 rot (Unstable) dup sw pop 2 div neg 0 rm col0 sh gr +/Times-Roman ff 180.00 scf sf +3300 2250 m +gs 1 -1 sc 90.0 rot (Stable) dup sw pop 2 div neg 0 rm col0 sh gr +/Times-Roman ff 180.00 scf sf +2250 3300 m +gs 1 -1 sc (Precise) dup sw pop 2 div neg 0 rm col0 sh gr +% Ellipse +n 750 750 150 150 0 360 DrawEllipse gs col0 s gr + +$F2psEnd +rs diff --git a/share/doc/papers/timecounter/fig2.eps b/share/doc/papers/timecounter/fig2.eps new file mode 100644 index 0000000..6771435 --- /dev/null +++ b/share/doc/papers/timecounter/fig2.eps @@ -0,0 +1,150 @@ +%!PS-Adobe-2.0 EPSF-2.0 +%%Title: fig2.eps +%%Creator: fig2dev Version 3.2 Patchlevel 3d +%%CreationDate: $FreeBSD$ +%%For: phk@critter.freebsd.dk (Poul-Henning Kamp) +%%BoundingBox: 0 0 191 194 +%%Magnification: 1.0000 +%%EndComments +/$F2psDict 200 dict def +$F2psDict begin +$F2psDict /mtrx matrix put +/col-1 {0 setgray} bind def +/col0 {0.000 0.000 0.000 srgb} bind def +/col1 {0.000 0.000 1.000 srgb} bind def +/col2 {0.000 1.000 0.000 srgb} bind def +/col3 {0.000 1.000 1.000 srgb} bind def +/col4 {1.000 0.000 0.000 srgb} bind def +/col5 {1.000 0.000 1.000 srgb} bind def +/col6 {1.000 1.000 0.000 srgb} bind def +/col7 {1.000 1.000 1.000 srgb} bind def +/col8 {0.000 0.000 0.560 srgb} bind def +/col9 {0.000 0.000 0.690 srgb} bind def +/col10 {0.000 0.000 0.820 srgb} bind def +/col11 {0.530 0.810 1.000 srgb} bind def +/col12 {0.000 0.560 0.000 srgb} bind def +/col13 
{0.000 0.690 0.000 srgb} bind def +/col14 {0.000 0.820 0.000 srgb} bind def +/col15 {0.000 0.560 0.560 srgb} bind def +/col16 {0.000 0.690 0.690 srgb} bind def +/col17 {0.000 0.820 0.820 srgb} bind def +/col18 {0.560 0.000 0.000 srgb} bind def +/col19 {0.690 0.000 0.000 srgb} bind def +/col20 {0.820 0.000 0.000 srgb} bind def +/col21 {0.560 0.000 0.560 srgb} bind def +/col22 {0.690 0.000 0.690 srgb} bind def +/col23 {0.820 0.000 0.820 srgb} bind def +/col24 {0.500 0.190 0.000 srgb} bind def +/col25 {0.630 0.250 0.000 srgb} bind def +/col26 {0.750 0.380 0.000 srgb} bind def +/col27 {1.000 0.500 0.500 srgb} bind def +/col28 {1.000 0.630 0.630 srgb} bind def +/col29 {1.000 0.750 0.750 srgb} bind def +/col30 {1.000 0.880 0.880 srgb} bind def +/col31 {1.000 0.840 0.000 srgb} bind def + +end +save +newpath 0 194 moveto 0 0 lineto 191 0 lineto 191 194 lineto closepath clip newpath +-7.7 201.2 translate +1 -1 scale + +/cp {closepath} bind def +/ef {eofill} bind def +/gr {grestore} bind def +/gs {gsave} bind def +/sa {save} bind def +/rs {restore} bind def +/l {lineto} bind def +/m {moveto} bind def +/rm {rmoveto} bind def +/n {newpath} bind def +/s {stroke} bind def +/sh {show} bind def +/slc {setlinecap} bind def +/slj {setlinejoin} bind def +/slw {setlinewidth} bind def +/srgb {setrgbcolor} bind def +/rot {rotate} bind def +/sc {scale} bind def +/sd {setdash} bind def +/ff {findfont} bind def +/sf {setfont} bind def +/scf {scalefont} bind def +/sw {stringwidth} bind def +/tr {translate} bind def +/tnt {dup dup currentrgbcolor + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb} + bind def +/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul + 4 -2 roll mul srgb} bind def +/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def +/$F2psEnd {$F2psEnteredState restore end} def + +$F2psBegin +10 setmiterlimit + 0.06000 0.06000 sc +% +% Fig objects follow +% +/Times-Roman ff 
180.00 scf sf +750 3300 m +gs 1 -1 sc (Imprecise) dup sw pop 2 div neg 0 rm col0 sh gr +% Polyline +15.000 slw +n 150 750 m + 1350 750 l gs 0.00 setgray ef gr gs col0 s gr +% Polyline +n 1650 150 m + 1650 1350 l gs 0.00 setgray ef gr gs col0 s gr +% Polyline +n 1650 750 m + 2850 750 l gs 0.00 setgray ef gr gs col0 s gr +% Polyline +n 1650 1650 m + 1650 2850 l gs 0.00 setgray ef gr gs col0 s gr +% Polyline +n 1650 2250 m + 2850 2250 l gs 0.00 setgray ef gr gs col0 s gr +% Polyline +n 150 1650 m + 150 2850 l gs 0.00 setgray ef gr gs col0 s gr +% Polyline +n 150 2250 m + 1350 2250 l gs 0.00 setgray ef gr gs col0 s gr +% Polyline +n 1665 2205 m 1792 2182 l 1942 2220 l 2100 2295 l 2257 2212 l 2392 2205 l + 2460 2280 l 2520 2295 l 2617 2197 l + 2850 2212 l gs col0 s gr +% Polyline +n 165 2565 m 360 2490 l 487 2362 l 615 2347 l 705 2250 l 825 2212 l + 915 2130 l 1057 2085 l 1155 1980 l 1237 1972 l 1297 1920 l + + 1342 1897 l gs col0 s gr +% Polyline +n 1657 465 m 1770 637 l 1927 705 l 2002 1020 l 2107 862 l 2190 525 l + 2227 652 l 2272 555 l 2362 982 l 2475 1147 l 2512 832 l + 2557 427 l 2587 502 l 2647 277 l 2677 630 l 2775 967 l + + 2850 525 l gs col0 s gr +% Polyline +n 150 232 m 352 307 l 375 637 l 562 577 l 622 982 l 690 622 l + 780 870 l 885 622 l 945 1207 l 1035 952 l 1080 1140 l + 1140 1080 l 1192 1372 l + 1350 1185 l gs col0 s gr +/Times-Roman ff 180.00 scf sf +3300 750 m +gs 1 -1 sc 90.0 rot (Unstable) dup sw pop 2 div neg 0 rm col0 sh gr +/Times-Roman ff 180.00 scf sf +3300 2250 m +gs 1 -1 sc 90.0 rot (Stable) dup sw pop 2 div neg 0 rm col0 sh gr +/Times-Roman ff 180.00 scf sf +2250 3300 m +gs 1 -1 sc (Precise) dup sw pop 2 div neg 0 rm col0 sh gr +% Polyline +n 150 150 m + 150 1350 l gs 0.00 setgray ef gr gs col0 s gr +$F2psEnd +rs diff --git a/share/doc/papers/timecounter/fig3.eps b/share/doc/papers/timecounter/fig3.eps new file mode 100644 index 0000000..9972823 --- /dev/null +++ b/share/doc/papers/timecounter/fig3.eps @@ -0,0 +1,126 @@ +%!PS-Adobe-2.0 
EPSF-2.0 +%%Title: fig3.eps +%%Creator: fig2dev Version 3.2 Patchlevel 3d +%%CreationDate: $FreeBSD$ +%%For: phk@critter.freebsd.dk (Poul-Henning Kamp) +%%BoundingBox: 0 0 181 56 +%%Magnification: 1.0000 +%%EndComments +/$F2psDict 200 dict def +$F2psDict begin +$F2psDict /mtrx matrix put +/col-1 {0 setgray} bind def +/col0 {0.000 0.000 0.000 srgb} bind def +/col1 {0.000 0.000 1.000 srgb} bind def +/col2 {0.000 1.000 0.000 srgb} bind def +/col3 {0.000 1.000 1.000 srgb} bind def +/col4 {1.000 0.000 0.000 srgb} bind def +/col5 {1.000 0.000 1.000 srgb} bind def +/col6 {1.000 1.000 0.000 srgb} bind def +/col7 {1.000 1.000 1.000 srgb} bind def +/col8 {0.000 0.000 0.560 srgb} bind def +/col9 {0.000 0.000 0.690 srgb} bind def +/col10 {0.000 0.000 0.820 srgb} bind def +/col11 {0.530 0.810 1.000 srgb} bind def +/col12 {0.000 0.560 0.000 srgb} bind def +/col13 {0.000 0.690 0.000 srgb} bind def +/col14 {0.000 0.820 0.000 srgb} bind def +/col15 {0.000 0.560 0.560 srgb} bind def +/col16 {0.000 0.690 0.690 srgb} bind def +/col17 {0.000 0.820 0.820 srgb} bind def +/col18 {0.560 0.000 0.000 srgb} bind def +/col19 {0.690 0.000 0.000 srgb} bind def +/col20 {0.820 0.000 0.000 srgb} bind def +/col21 {0.560 0.000 0.560 srgb} bind def +/col22 {0.690 0.000 0.690 srgb} bind def +/col23 {0.820 0.000 0.820 srgb} bind def +/col24 {0.500 0.190 0.000 srgb} bind def +/col25 {0.630 0.250 0.000 srgb} bind def +/col26 {0.750 0.380 0.000 srgb} bind def +/col27 {1.000 0.500 0.500 srgb} bind def +/col28 {1.000 0.630 0.630 srgb} bind def +/col29 {1.000 0.750 0.750 srgb} bind def +/col30 {1.000 0.880 0.880 srgb} bind def +/col31 {1.000 0.840 0.000 srgb} bind def + +end +save +newpath 0 56 moveto 0 0 lineto 181 0 lineto 181 56 lineto closepath clip newpath +-16.7 81.0 translate +1 -1 scale + +/cp {closepath} bind def +/ef {eofill} bind def +/gr {grestore} bind def +/gs {gsave} bind def +/sa {save} bind def +/rs {restore} bind def +/l {lineto} bind def +/m {moveto} bind def +/rm {rmoveto} bind def +/n 
{newpath} bind def +/s {stroke} bind def +/sh {show} bind def +/slc {setlinecap} bind def +/slj {setlinejoin} bind def +/slw {setlinewidth} bind def +/srgb {setrgbcolor} bind def +/rot {rotate} bind def +/sc {scale} bind def +/sd {setdash} bind def +/ff {findfont} bind def +/sf {setfont} bind def +/scf {scalefont} bind def +/sw {stringwidth} bind def +/tr {translate} bind def +/tnt {dup dup currentrgbcolor + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb} + bind def +/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul + 4 -2 roll mul srgb} bind def +/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def +/$F2psEnd {$F2psEnteredState restore end} def + +$F2psBegin +10 setmiterlimit + 0.06000 0.06000 sc +% +% Fig objects follow +% +% Polyline +7.500 slw +gs clippath +1740 780 m 1740 720 l 1588 720 l 1708 750 l 1588 780 l cp +eoclip +n 1200 750 m + 1725 750 l gs col0 s gr gr + +% arrowhead +n 1588 780 m 1708 750 l 1588 720 l col0 s +% Arc +n 900.0 750.0 150.0 180.0 0.0 arcn +gs col0 s gr + +% Polyline +15.000 slw +n 300 450 m 1200 450 l 1200 1050 l 300 1050 l + cp gs col0 s gr +% Arc +7.500 slw +n 600.0 750.0 150.0 180.0 0.0 arc +gs col0 s gr + +% Polyline +15.000 slw +n 1725 600 m 3225 600 l 3225 900 l 1725 900 l + cp gs col0 s gr +/Times-Roman ff 180.00 scf sf +1725 1350 m +gs 1 -1 sc (Oscillator + Counter = Clock) dup sw pop 2 div neg 0 rm col0 sh gr +/Helvetica-Bold ff 180.00 scf sf +2475 825 m +gs 1 -1 sc (1 0 3 7 5 4 2 5 0 0) dup sw pop 2 div neg 0 rm col0 sh gr +$F2psEnd +rs diff --git a/share/doc/papers/timecounter/fig4.eps b/share/doc/papers/timecounter/fig4.eps new file mode 100644 index 0000000..7a5684f --- /dev/null +++ b/share/doc/papers/timecounter/fig4.eps @@ -0,0 +1,259 @@ +%!PS-Adobe-2.0 EPSF-2.0 +%%Title: fig4.eps +%%Creator: fig2dev Version 3.2 Patchlevel 3d +%%CreationDate: $FreeBSD$ +%%For: phk@critter.freebsd.dk (Poul-Henning Kamp) 
+%%BoundingBox: 0 0 119 203 +%%Magnification: 1.0000 +%%EndComments +/$F2psDict 200 dict def +$F2psDict begin +$F2psDict /mtrx matrix put +/col-1 {0 setgray} bind def +/col0 {0.000 0.000 0.000 srgb} bind def +/col1 {0.000 0.000 1.000 srgb} bind def +/col2 {0.000 1.000 0.000 srgb} bind def +/col3 {0.000 1.000 1.000 srgb} bind def +/col4 {1.000 0.000 0.000 srgb} bind def +/col5 {1.000 0.000 1.000 srgb} bind def +/col6 {1.000 1.000 0.000 srgb} bind def +/col7 {1.000 1.000 1.000 srgb} bind def +/col8 {0.000 0.000 0.560 srgb} bind def +/col9 {0.000 0.000 0.690 srgb} bind def +/col10 {0.000 0.000 0.820 srgb} bind def +/col11 {0.530 0.810 1.000 srgb} bind def +/col12 {0.000 0.560 0.000 srgb} bind def +/col13 {0.000 0.690 0.000 srgb} bind def +/col14 {0.000 0.820 0.000 srgb} bind def +/col15 {0.000 0.560 0.560 srgb} bind def +/col16 {0.000 0.690 0.690 srgb} bind def +/col17 {0.000 0.820 0.820 srgb} bind def +/col18 {0.560 0.000 0.000 srgb} bind def +/col19 {0.690 0.000 0.000 srgb} bind def +/col20 {0.820 0.000 0.000 srgb} bind def +/col21 {0.560 0.000 0.560 srgb} bind def +/col22 {0.690 0.000 0.690 srgb} bind def +/col23 {0.820 0.000 0.820 srgb} bind def +/col24 {0.500 0.190 0.000 srgb} bind def +/col25 {0.630 0.250 0.000 srgb} bind def +/col26 {0.750 0.380 0.000 srgb} bind def +/col27 {1.000 0.500 0.500 srgb} bind def +/col28 {1.000 0.630 0.630 srgb} bind def +/col29 {1.000 0.750 0.750 srgb} bind def +/col30 {1.000 0.880 0.880 srgb} bind def +/col31 {1.000 0.840 0.000 srgb} bind def + +end +save +newpath 0 203 moveto 0 0 lineto 119 0 lineto 119 203 lineto closepath clip newpath +-8.3 207.7 translate +1 -1 scale + +/cp {closepath} bind def +/ef {eofill} bind def +/gr {grestore} bind def +/gs {gsave} bind def +/sa {save} bind def +/rs {restore} bind def +/l {lineto} bind def +/m {moveto} bind def +/rm {rmoveto} bind def +/n {newpath} bind def +/s {stroke} bind def +/sh {show} bind def +/slc {setlinecap} bind def +/slj {setlinejoin} bind def +/slw {setlinewidth} bind def 
+/srgb {setrgbcolor} bind def +/rot {rotate} bind def +/sc {scale} bind def +/sd {setdash} bind def +/ff {findfont} bind def +/sf {setfont} bind def +/scf {scalefont} bind def +/sw {stringwidth} bind def +/tr {translate} bind def +/tnt {dup dup currentrgbcolor + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb} + bind def +/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul + 4 -2 roll mul srgb} bind def +/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def +/$F2psEnd {$F2psEnteredState restore end} def + +$F2psBegin +10 setmiterlimit + 0.06000 0.06000 sc +% +% Fig objects follow +% +/Times-Roman ff 180.00 scf sf +300 450 m +gs 1 -1 sc (*volatile timehands;) col0 sh gr +% Polyline +7.500 slw +n 1005 750 m 900 750 900 1095 105 arcto 4 {pop} repeat + 900 1200 1245 1200 105 arcto 4 {pop} repeat + 1350 1200 1350 855 105 arcto 4 {pop} repeat + 1350 750 1005 750 105 arcto 4 {pop} repeat + cp gs col0 s gr +% Polyline +n 1755 750 m 1650 750 1650 1095 105 arcto 4 {pop} repeat + 1650 1200 1995 1200 105 arcto 4 {pop} repeat + 2100 1200 2100 855 105 arcto 4 {pop} repeat + 2100 750 1755 750 105 arcto 4 {pop} repeat + cp gs col0 s gr +% Polyline +n 1755 1500 m 1650 1500 1650 1845 105 arcto 4 {pop} repeat + 1650 1950 1995 1950 105 arcto 4 {pop} repeat + 2100 1950 2100 1605 105 arcto 4 {pop} repeat + 2100 1500 1755 1500 105 arcto 4 {pop} repeat + cp gs col0 s gr +% Polyline +n 1755 2250 m 1650 2250 1650 2595 105 arcto 4 {pop} repeat + 1650 2700 1995 2700 105 arcto 4 {pop} repeat + 2100 2700 2100 2355 105 arcto 4 {pop} repeat + 2100 2250 1755 2250 105 arcto 4 {pop} repeat + cp gs col0 s gr +% Polyline +n 1755 3000 m 1650 3000 1650 3345 105 arcto 4 {pop} repeat + 1650 3450 1995 3450 105 arcto 4 {pop} repeat + 2100 3450 2100 3105 105 arcto 4 {pop} repeat + 2100 3000 1755 3000 105 arcto 4 {pop} repeat + cp gs col0 s gr +% Polyline +n 1005 3000 m 900 3000 900 3345 105 arcto 4 {pop} 
repeat + 900 3450 1245 3450 105 arcto 4 {pop} repeat + 1350 3450 1350 3105 105 arcto 4 {pop} repeat + 1350 3000 1005 3000 105 arcto 4 {pop} repeat + cp gs col0 s gr +% Polyline +n 255 3000 m 150 3000 150 3345 105 arcto 4 {pop} repeat + 150 3450 495 3450 105 arcto 4 {pop} repeat + 600 3450 600 3105 105 arcto 4 {pop} repeat + 600 3000 255 3000 105 arcto 4 {pop} repeat + cp gs col0 s gr +% Polyline +n 255 2250 m 150 2250 150 2595 105 arcto 4 {pop} repeat + 150 2700 495 2700 105 arcto 4 {pop} repeat + 600 2700 600 2355 105 arcto 4 {pop} repeat + 600 2250 255 2250 105 arcto 4 {pop} repeat + cp gs col0 s gr +% Polyline +n 255 1500 m 150 1500 150 1845 105 arcto 4 {pop} repeat + 150 1950 495 1950 105 arcto 4 {pop} repeat + 600 1950 600 1605 105 arcto 4 {pop} repeat + 600 1500 255 1500 105 arcto 4 {pop} repeat + cp gs col0 s gr +% Polyline +gs clippath +915 1005 m 915 945 l 763 945 l 883 975 l 763 1005 l cp +eoclip +n 600 975 m + 900 975 l gs col0 s gr gr + +% arrowhead +n 763 1005 m 883 975 l 763 945 l col0 s +% Polyline +gs clippath +1665 1005 m 1665 945 l 1513 945 l 1633 975 l 1513 1005 l cp +eoclip +n 1350 975 m + 1650 975 l gs col0 s gr gr + +% arrowhead +n 1513 1005 m 1633 975 l 1513 945 l col0 s +% Polyline +gs clippath +1845 1515 m 1905 1515 l 1905 1363 l 1875 1483 l 1845 1363 l cp +eoclip +n 1875 1200 m + 1875 1500 l gs col0 s gr gr + +% arrowhead +n 1845 1363 m 1875 1483 l 1905 1363 l col0 s +% Polyline +gs clippath +1845 2265 m 1905 2265 l 1905 2113 l 1875 2233 l 1845 2113 l cp +eoclip +n 1875 1950 m + 1875 2250 l gs col0 s gr gr + +% arrowhead +n 1845 2113 m 1875 2233 l 1905 2113 l col0 s +% Polyline +gs clippath +1845 3015 m 1905 3015 l 1905 2863 l 1875 2983 l 1845 2863 l cp +eoclip +n 1875 2700 m + 1875 3000 l gs col0 s gr gr + +% arrowhead +n 1845 2863 m 1875 2983 l 1905 2863 l col0 s +% Polyline +gs clippath +1335 3195 m 1335 3255 l 1487 3255 l 1367 3225 l 1487 3195 l cp +eoclip +n 1650 3225 m + 1350 3225 l gs col0 s gr gr + +% arrowhead +n 1487 3195 m 1367 
3225 l 1487 3255 l col0 s +% Polyline +gs clippath +585 3195 m 585 3255 l 737 3255 l 617 3225 l 737 3195 l cp +eoclip +n 900 3225 m + 600 3225 l gs col0 s gr gr + +% arrowhead +n 737 3195 m 617 3225 l 737 3255 l col0 s +% Polyline +gs clippath +405 2685 m 345 2685 l 345 2837 l 375 2717 l 405 2837 l cp +eoclip +n 375 3000 m + 375 2700 l gs col0 s gr gr + +% arrowhead +n 405 2837 m 375 2717 l 345 2837 l col0 s +% Polyline +gs clippath +405 1935 m 345 1935 l 345 2087 l 375 1967 l 405 2087 l cp +eoclip +n 375 2250 m + 375 1950 l gs col0 s gr gr + +% arrowhead +n 405 2087 m 375 1967 l 345 2087 l col0 s +% Polyline +gs clippath +405 1185 m 345 1185 l 345 1337 l 375 1217 l 405 1337 l cp +eoclip +n 375 1500 m + 375 1200 l gs col0 s gr gr + +% arrowhead +n 405 1337 m 375 1217 l 345 1337 l col0 s +% Polyline +gs clippath +1845 765 m 1905 765 l 1905 613 l 1875 733 l 1845 613 l cp +eoclip +n 1800 375 m 1875 375 l + 1875 750 l gs col0 s gr gr + +% arrowhead +n 1845 613 m 1875 733 l 1905 613 l col0 s +/Times-Roman ff 180.00 scf sf +150 225 m +gs 1 -1 sc (struct timehands) col0 sh gr +% Polyline +n 255 750 m 150 750 150 1095 105 arcto 4 {pop} repeat + 150 1200 495 1200 105 arcto 4 {pop} repeat + 600 1200 600 855 105 arcto 4 {pop} repeat + 600 750 255 750 105 arcto 4 {pop} repeat + cp gs col0 s gr +$F2psEnd +rs diff --git a/share/doc/papers/timecounter/fig5.eps b/share/doc/papers/timecounter/fig5.eps new file mode 100644 index 0000000..b6274c1 --- /dev/null +++ b/share/doc/papers/timecounter/fig5.eps @@ -0,0 +1,211 @@ +%!PS-Adobe-2.0 EPSF-2.0 +%%Title: fig5.eps +%%Creator: fig2dev Version 3.2 Patchlevel 3d +%%CreationDate: $FreeBSD$ +%%For: phk@critter.freebsd.dk (Poul-Henning Kamp) +%%BoundingBox: 0 0 140 225 +%%Magnification: 1.0000 +%%EndComments +/$F2psDict 200 dict def +$F2psDict begin +$F2psDict /mtrx matrix put +/col-1 {0 setgray} bind def +/col0 {0.000 0.000 0.000 srgb} bind def +/col1 {0.000 0.000 1.000 srgb} bind def +/col2 {0.000 1.000 0.000 srgb} bind def +/col3 {0.000 
1.000 1.000 srgb} bind def +/col4 {1.000 0.000 0.000 srgb} bind def +/col5 {1.000 0.000 1.000 srgb} bind def +/col6 {1.000 1.000 0.000 srgb} bind def +/col7 {1.000 1.000 1.000 srgb} bind def +/col8 {0.000 0.000 0.560 srgb} bind def +/col9 {0.000 0.000 0.690 srgb} bind def +/col10 {0.000 0.000 0.820 srgb} bind def +/col11 {0.530 0.810 1.000 srgb} bind def +/col12 {0.000 0.560 0.000 srgb} bind def +/col13 {0.000 0.690 0.000 srgb} bind def +/col14 {0.000 0.820 0.000 srgb} bind def +/col15 {0.000 0.560 0.560 srgb} bind def +/col16 {0.000 0.690 0.690 srgb} bind def +/col17 {0.000 0.820 0.820 srgb} bind def +/col18 {0.560 0.000 0.000 srgb} bind def +/col19 {0.690 0.000 0.000 srgb} bind def +/col20 {0.820 0.000 0.000 srgb} bind def +/col21 {0.560 0.000 0.560 srgb} bind def +/col22 {0.690 0.000 0.690 srgb} bind def +/col23 {0.820 0.000 0.820 srgb} bind def +/col24 {0.500 0.190 0.000 srgb} bind def +/col25 {0.630 0.250 0.000 srgb} bind def +/col26 {0.750 0.380 0.000 srgb} bind def +/col27 {1.000 0.500 0.500 srgb} bind def +/col28 {1.000 0.630 0.630 srgb} bind def +/col29 {1.000 0.750 0.750 srgb} bind def +/col30 {1.000 0.880 0.880 srgb} bind def +/col31 {1.000 0.840 0.000 srgb} bind def + +end +save +newpath 0 225 moveto 0 0 lineto 140 0 lineto 140 225 lineto closepath clip newpath +-7.7 234.7 translate +1 -1 scale + +/cp {closepath} bind def +/ef {eofill} bind def +/gr {grestore} bind def +/gs {gsave} bind def +/sa {save} bind def +/rs {restore} bind def +/l {lineto} bind def +/m {moveto} bind def +/rm {rmoveto} bind def +/n {newpath} bind def +/s {stroke} bind def +/sh {show} bind def +/slc {setlinecap} bind def +/slj {setlinejoin} bind def +/slw {setlinewidth} bind def +/srgb {setrgbcolor} bind def +/rot {rotate} bind def +/sc {scale} bind def +/sd {setdash} bind def +/ff {findfont} bind def +/sf {setfont} bind def +/scf {scalefont} bind def +/sw {stringwidth} bind def +/tr {translate} bind def +/tnt {dup dup currentrgbcolor + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 
4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb} + bind def +/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul + 4 -2 roll mul srgb} bind def +/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def +/$F2psEnd {$F2psEnteredState restore end} def + +$F2psBegin +10 setmiterlimit + 0.06000 0.06000 sc +% +% Fig objects follow +% +/Times-Roman ff 180.00 scf sf +1950 600 m +gs 1 -1 sc (PPS#1) col0 sh gr +% Polyline +7.500 slw +gs clippath +915 2130 m 915 2070 l 763 2070 l 883 2100 l 763 2130 l cp +eoclip +n 600 2100 m + 900 2100 l gs col0 s gr gr + +% arrowhead +n 763 2130 m 883 2100 l 763 2070 l col0 s +% Polyline +gs clippath +1665 2130 m 1665 2070 l 1513 2070 l 1633 2100 l 1513 2130 l cp +eoclip +n 1350 2100 m + 1650 2100 l gs col0 s gr gr + +% arrowhead +n 1513 2130 m 1633 2100 l 1513 2070 l col0 s +% Polyline +15.000 slw +n 900 1050 m 1350 1050 l 1350 3000 l 900 3000 l + cp gs col0 s gr +% Polyline +n 1650 1050 m 2100 1050 l 2100 3000 l 1650 3000 l + cp gs col0 s gr +% Polyline +7.500 slw +gs clippath +345 3465 m 405 3465 l 405 3313 l 375 3433 l 345 3313 l cp +eoclip +n 375 3000 m + 375 3450 l gs col0 s gr gr + +% arrowhead +n 345 3313 m 375 3433 l 405 3313 l col0 s +% Polyline +gs clippath +1095 3465 m 1155 3465 l 1155 3313 l 1125 3433 l 1095 3313 l cp +eoclip +n 1125 3000 m + 1125 3450 l gs col0 s gr gr + +% arrowhead +n 1095 3313 m 1125 3433 l 1155 3313 l col0 s +% Polyline +gs clippath +1845 3465 m 1905 3465 l 1905 3313 l 1875 3433 l 1845 3313 l cp +eoclip +n 1875 3000 m + 1875 3450 l gs col0 s gr gr + +% arrowhead +n 1845 3313 m 1875 3433 l 1905 3313 l col0 s +% Polyline +gs clippath +2070 3915 m 2130 3915 l 2130 3763 l 2100 3883 l 2070 3763 l cp +eoclip +n 150 3450 m 2100 3450 l + 2100 3900 l gs col0 s gr gr + +% arrowhead +n 2070 3763 m 2100 3883 l 2130 3763 l col0 s +% Polyline +gs clippath +1845 1065 m 1905 1065 l 1905 913 l 1875 1033 l 1845 913 l cp +eoclip +n 1875 600 m + 1875 1050 l gs col0 s gr 
gr + +% arrowhead +n 1845 913 m 1875 1033 l 1905 913 l col0 s +% Polyline +gs clippath +1095 1065 m 1155 1065 l 1155 913 l 1125 1033 l 1095 913 l cp +eoclip +n 1125 450 m + 1125 1050 l gs col0 s gr gr + +% arrowhead +n 1095 913 m 1125 1033 l 1155 913 l col0 s +% Polyline +gs clippath +345 1065 m 405 1065 l 405 913 l 375 1033 l 345 913 l cp +eoclip +n 375 300 m + 375 1050 l gs col0 s gr gr + +% arrowhead +n 345 913 m 375 1033 l 405 913 l col0 s +/Times-Roman ff 180.00 scf sf +450 2850 m +gs 1 -1 sc 90.0 rot (26 bit binary counter.) col0 sh gr +/Times-Roman ff 180.00 scf sf +2250 2025 m +gs 1 -1 sc (...) col0 sh gr +/Times-Roman ff 180.00 scf sf +1200 2850 m +gs 1 -1 sc 90.0 rot (26 bit latch) col0 sh gr +/Times-Roman ff 180.00 scf sf +1950 2850 m +gs 1 -1 sc 90.0 rot (26 bit latch) col0 sh gr +/Times-Roman ff 180.00 scf sf +450 3675 m +gs 1 -1 sc (PCI system bus) col0 sh gr +/Times-Roman ff 180.00 scf sf +450 300 m +gs 1 -1 sc (Clock) col0 sh gr +/Times-Roman ff 180.00 scf sf +1200 450 m +gs 1 -1 sc (PPS#0) col0 sh gr +% Polyline +15.000 slw +n 150 1050 m 600 1050 l 600 3000 l 150 3000 l + cp gs col0 s gr +$F2psEnd +rs diff --git a/share/doc/papers/timecounter/gps.ps b/share/doc/papers/timecounter/gps.ps new file mode 100644 index 0000000..aaaae81 --- /dev/null +++ b/share/doc/papers/timecounter/gps.ps @@ -0,0 +1,1488 @@ +%!PS-Adobe-2.0 EPSF-2.0 +%%Title: _.ps +%%Creator: gnuplot 3.7 patchlevel 1 +%%CreationDate: $FreeBSD$ +%%DocumentFonts: (atend) +%%BoundingBox: 50 50 266 201 +%%Orientation: Portrait +%%EndComments +/gnudict 256 dict def +gnudict begin +/Color false def +/Solid false def +/gnulinewidth 5.000 def +/userlinewidth gnulinewidth def +/vshift -46 def +/dl {10 mul} def +/hpt_ 31.5 def +/vpt_ 31.5 def +/hpt hpt_ def +/vpt vpt_ def +/M {moveto} bind def +/L {lineto} bind def +/R {rmoveto} bind def +/V {rlineto} bind def +/vpt2 vpt 2 mul def +/hpt2 hpt 2 mul def +/Lshow { currentpoint stroke M + 0 vshift R show } def +/Rshow { currentpoint stroke M + dup 
stringwidth pop neg vshift R show } def +/Cshow { currentpoint stroke M + dup stringwidth pop -2 div vshift R show } def +/UP { dup vpt_ mul /vpt exch def hpt_ mul /hpt exch def + /hpt2 hpt 2 mul def /vpt2 vpt 2 mul def } def +/DL { Color {setrgbcolor Solid {pop []} if 0 setdash } + {pop pop pop Solid {pop []} if 0 setdash} ifelse } def +/BL { stroke userlinewidth 2 mul setlinewidth } def +/AL { stroke userlinewidth 2 div setlinewidth } def +/UL { dup gnulinewidth mul /userlinewidth exch def + 10 mul /udl exch def } def +/PL { stroke userlinewidth setlinewidth } def +/LTb { BL [] 0 0 0 DL } def +/LTa { AL [1 udl mul 2 udl mul] 0 setdash 0 0 0 setrgbcolor } def +/LT0 { PL [] 1 0 0 DL } def +/LT1 { PL [4 dl 2 dl] 0 1 0 DL } def +/LT2 { PL [2 dl 3 dl] 0 0 1 DL } def +/LT3 { PL [1 dl 1.5 dl] 1 0 1 DL } def +/LT4 { PL [5 dl 2 dl 1 dl 2 dl] 0 1 1 DL } def +/LT5 { PL [4 dl 3 dl 1 dl 3 dl] 1 1 0 DL } def +/LT6 { PL [2 dl 2 dl 2 dl 4 dl] 0 0 0 DL } def +/LT7 { PL [2 dl 2 dl 2 dl 2 dl 2 dl 4 dl] 1 0.3 0 DL } def +/LT8 { PL [2 dl 2 dl 2 dl 2 dl 2 dl 2 dl 2 dl 4 dl] 0.5 0.5 0.5 DL } def +/Pnt { stroke [] 0 setdash + gsave 1 setlinecap M 0 0 V stroke grestore } def +/Dia { stroke [] 0 setdash 2 copy vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V closepath stroke + Pnt } def +/Pls { stroke [] 0 setdash vpt sub M 0 vpt2 V + currentpoint stroke M + hpt neg vpt neg R hpt2 0 V stroke + } def +/Box { stroke [] 0 setdash 2 copy exch hpt sub exch vpt add M + 0 vpt2 neg V hpt2 0 V 0 vpt2 V + hpt2 neg 0 V closepath stroke + Pnt } def +/Crs { stroke [] 0 setdash exch hpt sub exch vpt add M + hpt2 vpt2 neg V currentpoint stroke M + hpt2 neg 0 R hpt2 vpt2 V stroke } def +/TriU { stroke [] 0 setdash 2 copy vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt neg vpt 1.62 mul V closepath stroke + Pnt } def +/Star { 2 copy Pls Crs } def +/BoxF { stroke [] 0 setdash exch hpt sub exch vpt add M + 0 vpt2 neg V hpt2 0 V 0 vpt2 V + hpt2 neg 0 V closepath fill } 
def +/TriUF { stroke [] 0 setdash vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt neg vpt 1.62 mul V closepath fill } def +/TriD { stroke [] 0 setdash 2 copy vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V closepath stroke + Pnt } def +/TriDF { stroke [] 0 setdash vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V closepath fill} def +/DiaF { stroke [] 0 setdash vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V closepath fill } def +/Pent { stroke [] 0 setdash 2 copy gsave + translate 0 hpt M 4 {72 rotate 0 hpt L} repeat + closepath stroke grestore Pnt } def +/PentF { stroke [] 0 setdash gsave + translate 0 hpt M 4 {72 rotate 0 hpt L} repeat + closepath fill grestore } def +/Circle { stroke [] 0 setdash 2 copy + hpt 0 360 arc stroke Pnt } def +/CircleF { stroke [] 0 setdash hpt 0 360 arc fill } def +/C0 { BL [] 0 setdash 2 copy moveto vpt 90 450 arc } bind def +/C1 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 90 arc closepath fill + vpt 0 360 arc closepath } bind def +/C2 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 90 180 arc closepath fill + vpt 0 360 arc closepath } bind def +/C3 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 180 arc closepath fill + vpt 0 360 arc closepath } bind def +/C4 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 180 270 arc closepath fill + vpt 0 360 arc closepath } bind def +/C5 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 90 arc + 2 copy moveto + 2 copy vpt 180 270 arc closepath fill + vpt 0 360 arc } bind def +/C6 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 90 270 arc closepath fill + vpt 0 360 arc closepath } bind def +/C7 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 270 arc closepath fill + vpt 0 360 arc closepath } bind def +/C8 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 270 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/C9 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 270 450 arc closepath fill + vpt 
0 360 arc closepath } bind def +/C10 { BL [] 0 setdash 2 copy 2 copy moveto vpt 270 360 arc closepath fill + 2 copy moveto + 2 copy vpt 90 180 arc closepath fill + vpt 0 360 arc closepath } bind def +/C11 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 180 arc closepath fill + 2 copy moveto + 2 copy vpt 270 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/C12 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 180 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/C13 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 90 arc closepath fill + 2 copy moveto + 2 copy vpt 180 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/C14 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 90 360 arc closepath fill + vpt 0 360 arc } bind def +/C15 { BL [] 0 setdash 2 copy vpt 0 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/Rec { newpath 4 2 roll moveto 1 index 0 rlineto 0 exch rlineto + neg 0 rlineto closepath } bind def +/Square { dup Rec } bind def +/Bsquare { vpt sub exch vpt sub exch vpt2 Square } bind def +/S0 { BL [] 0 setdash 2 copy moveto 0 vpt rlineto BL Bsquare } bind def +/S1 { BL [] 0 setdash 2 copy vpt Square fill Bsquare } bind def +/S2 { BL [] 0 setdash 2 copy exch vpt sub exch vpt Square fill Bsquare } bind def +/S3 { BL [] 0 setdash 2 copy exch vpt sub exch vpt2 vpt Rec fill Bsquare } bind def +/S4 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt Square fill Bsquare } bind def +/S5 { BL [] 0 setdash 2 copy 2 copy vpt Square fill + exch vpt sub exch vpt sub vpt Square fill Bsquare } bind def +/S6 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt vpt2 Rec fill Bsquare } bind def +/S7 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt vpt2 Rec fill + 2 copy vpt Square fill + Bsquare } bind def +/S8 { BL [] 0 setdash 2 copy vpt sub vpt Square fill Bsquare } bind def +/S9 { BL [] 0 setdash 2 copy vpt sub vpt vpt2 Rec fill Bsquare } bind def +/S10 { BL [] 0 setdash 2 copy vpt sub vpt Square fill 2 copy exch vpt sub exch 
vpt Square fill + Bsquare } bind def +/S11 { BL [] 0 setdash 2 copy vpt sub vpt Square fill 2 copy exch vpt sub exch vpt2 vpt Rec fill + Bsquare } bind def +/S12 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill Bsquare } bind def +/S13 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill + 2 copy vpt Square fill Bsquare } bind def +/S14 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill + 2 copy exch vpt sub exch vpt Square fill Bsquare } bind def +/S15 { BL [] 0 setdash 2 copy Bsquare fill Bsquare } bind def +/D0 { gsave translate 45 rotate 0 0 S0 stroke grestore } bind def +/D1 { gsave translate 45 rotate 0 0 S1 stroke grestore } bind def +/D2 { gsave translate 45 rotate 0 0 S2 stroke grestore } bind def +/D3 { gsave translate 45 rotate 0 0 S3 stroke grestore } bind def +/D4 { gsave translate 45 rotate 0 0 S4 stroke grestore } bind def +/D5 { gsave translate 45 rotate 0 0 S5 stroke grestore } bind def +/D6 { gsave translate 45 rotate 0 0 S6 stroke grestore } bind def +/D7 { gsave translate 45 rotate 0 0 S7 stroke grestore } bind def +/D8 { gsave translate 45 rotate 0 0 S8 stroke grestore } bind def +/D9 { gsave translate 45 rotate 0 0 S9 stroke grestore } bind def +/D10 { gsave translate 45 rotate 0 0 S10 stroke grestore } bind def +/D11 { gsave translate 45 rotate 0 0 S11 stroke grestore } bind def +/D12 { gsave translate 45 rotate 0 0 S12 stroke grestore } bind def +/D13 { gsave translate 45 rotate 0 0 S13 stroke grestore } bind def +/D14 { gsave translate 45 rotate 0 0 S14 stroke grestore } bind def +/D15 { gsave translate 45 rotate 0 0 S15 stroke grestore } bind def +/DiaE { stroke [] 0 setdash vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V closepath stroke } def +/BoxE { stroke [] 0 setdash exch hpt sub exch vpt add M + 0 vpt2 neg V hpt2 0 V 0 vpt2 V + hpt2 neg 0 V closepath stroke } def +/TriUE { stroke [] 0 setdash vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt 
neg vpt 1.62 mul V closepath stroke } def +/TriDE { stroke [] 0 setdash vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V closepath stroke } def +/PentE { stroke [] 0 setdash gsave + translate 0 hpt M 4 {72 rotate 0 hpt L} repeat + closepath stroke grestore } def +/CircE { stroke [] 0 setdash + hpt 0 360 arc stroke } def +/Opaque { gsave closepath 1 setgray fill grestore 0 setgray closepath } def +/DiaW { stroke [] 0 setdash vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V Opaque stroke } def +/BoxW { stroke [] 0 setdash exch hpt sub exch vpt add M + 0 vpt2 neg V hpt2 0 V 0 vpt2 V + hpt2 neg 0 V Opaque stroke } def +/TriUW { stroke [] 0 setdash vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt neg vpt 1.62 mul V Opaque stroke } def +/TriDW { stroke [] 0 setdash vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V Opaque stroke } def +/PentW { stroke [] 0 setdash gsave + translate 0 hpt M 4 {72 rotate 0 hpt L} repeat + Opaque stroke grestore } def +/CircW { stroke [] 0 setdash + hpt 0 360 arc Opaque stroke } def +/BoxFill { gsave Rec 1 setgray fill grestore } def +end +%%EndProlog +gnudict begin +gsave +50 50 translate +0.050 0.050 scale +0 setgray +newpath +(Helvetica) findfont 140 scalefont setfont +1.000 UL +LTb +1.000 UL +LTa +630 420 M +3452 0 V +1.000 UL +LTb +630 420 M +63 0 V +3389 0 R +-63 0 V +546 420 M +(-20) Rshow +1.000 UL +LTa +630 826 M +3452 0 V +1.000 UL +LTb +630 826 M +63 0 V +3389 0 R +-63 0 V +546 826 M +(-15) Rshow +1.000 UL +LTa +630 1232 M +3452 0 V +1.000 UL +LTb +630 1232 M +63 0 V +3389 0 R +-63 0 V +-3473 0 R +(-10) Rshow +1.000 UL +LTa +630 1638 M +3452 0 V +1.000 UL +LTb +630 1638 M +63 0 V +3389 0 R +-63 0 V +-3473 0 R +(-5) Rshow +1.000 UL +LTa +630 2044 M +3452 0 V +1.000 UL +LTb +630 2044 M +63 0 V +3389 0 R +-63 0 V +-3473 0 R +(0) Rshow +1.000 UL +LTa +630 2450 M +3452 0 V +1.000 UL +LTb +630 2450 M +63 0 V +3389 0 R +-63 0 V 
+-3473 0 R +(5) Rshow +1.000 UL +LTa +630 2856 M +3452 0 V +1.000 UL +LTb +630 2856 M +63 0 V +3389 0 R +-63 0 V +-3473 0 R +(10) Rshow +1.000 UL +LTa +630 420 M +0 2436 V +1.000 UL +LTb +630 420 M +0 63 V +0 2373 R +0 -63 V +630 280 M +(0) Cshow +1.000 UL +LTa +975 420 M +0 2436 V +1.000 UL +LTb +975 420 M +0 63 V +0 2373 R +0 -63 V +975 280 M +(100) Cshow +1.000 UL +LTa +1320 420 M +0 2436 V +1.000 UL +LTb +1320 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(200) Cshow +1.000 UL +LTa +1666 420 M +0 2436 V +1.000 UL +LTb +1666 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(300) Cshow +1.000 UL +LTa +2011 420 M +0 2436 V +1.000 UL +LTb +2011 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(400) Cshow +1.000 UL +LTa +2356 420 M +0 2436 V +1.000 UL +LTb +2356 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(500) Cshow +1.000 UL +LTa +2701 420 M +0 2436 V +1.000 UL +LTb +2701 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(600) Cshow +1.000 UL +LTa +3046 420 M +0 2436 V +1.000 UL +LTb +3046 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(700) Cshow +1.000 UL +LTa +3392 420 M +0 2436 V +1.000 UL +LTb +3392 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(800) Cshow +1.000 UL +LTa +3737 420 M +0 2373 V +0 63 V +1.000 UL +LTb +3737 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(900) Cshow +1.000 UL +LTa +4082 420 M +0 2436 V +1.000 UL +LTb +4082 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(1000) Cshow +1.000 UL +LTb +630 420 M +3452 0 V +0 2436 V +-3452 0 V +630 420 L +140 1638 M +currentpoint gsave translate 90 rotate 0 0 M +(nanoseconds) Cshow +grestore +2356 70 M +(seconds) Cshow +1.000 UL +LT0 +631 2125 M +3 -81 V +4 0 V +3 -162 V +4 0 V +3 162 V +3 0 V +4 0 V +3 -244 V +4 0 V +3 244 V +4 -162 V +3 0 V +4 -163 V +3 0 V +4 163 V +3 -163 V +3 0 V +4 731 V +3 -812 V +4 162 V +3 650 V +4 -812 V +3 731 V +4 81 V +3 -731 V +4 650 V +3 -812 V +3 731 V +4 81 V +3 -731 V +4 650 V +3 0 V +4 -82 V +3 82 V +4 81 V +3 -81 V +4 0 V +3 -163 V +3 81 V +4 82 V +3 -163 V +4 0 V +3 -81 V +4 81 V +3 
81 V +4 -81 V +3 0 V +3 -81 V +4 0 V +3 81 V +4 -81 V +3 0 V +4 -162 V +3 162 V +4 568 V +3 -568 V +4 0 V +3 568 V +3 -730 V +4 568 V +3 -568 V +4 0 V +3 649 V +4 -731 V +3 569 V +4 162 V +3 -812 V +4 731 V +3 81 V +3 -243 V +4 162 V +3 -731 V +4 569 V +3 81 V +4 -244 V +3 163 V +4 -731 V +3 568 V +3 0 V +4 -243 V +3 162 V +4 0 V +3 -244 V +4 82 V +3 -244 V +4 162 V +3 82 V +4 -244 V +3 81 V +3 -243 V +4 243 V +3 81 V +4 -243 V +3 81 V +4 -162 V +3 162 V +4 81 V +3 -243 V +4 81 V +3 -163 V +3 163 V +4 81 V +3 -162 V +4 81 V +3 -244 V +4 163 V +3 162 V +4 -244 V +3 82 V +3 -163 V +4 81 V +3 163 V +4 -244 V +3 81 V +4 -162 V +3 81 V +4 81 V +3 -162 V +4 0 V +3 -81 V +3 81 V +4 81 V +3 -81 V +4 0 V +3 650 V +4 -569 V +3 0 V +4 -162 V +3 162 V +4 569 V +3 -731 V +3 162 V +4 -162 V +3 81 V +4 812 V +3 0 V +4 -731 V +3 731 V +4 -893 V +3 731 V +4 0 V +3 -650 V +3 812 V +4 0 V +3 -244 V +4 82 V +3 162 V +4 -244 V +3 82 V +4 -244 V +3 81 V +3 163 V +4 -163 V +3 81 V +4 -162 V +3 0 V +4 244 V +3 -244 V +4 81 V +3 -162 V +4 81 V +3 162 V +3 -243 V +4 162 V +3 -162 V +4 0 V +3 162 V +4 -162 V +3 81 V +4 650 V +3 -812 V +4 162 V +3 650 V +3 -731 V +4 649 V +3 -812 V +4 731 V +3 163 V +4 -812 V +3 649 V +4 0 V +3 -81 V +4 162 V +3 -812 V +3 650 V +4 81 V +3 -243 V +4 162 V +3 -812 V +4 650 V +3 162 V +4 -325 V +3 81 V +4 0 V +3 -81 V +3 81 V +4 -243 V +3 162 V +4 0 V +3 -81 V +4 81 V +3 -162 V +4 162 V +3 81 V +3 -162 V +4 81 V +3 -243 V +4 243 V +3 81 V +4 -243 V +3 81 V +4 -162 V +3 162 V +4 0 V +3 -244 V +3 82 V +4 -82 V +3 82 V +4 81 V +3 -163 V +4 0 V +3 -81 V +4 81 V +3 82 V +4 -163 V +3 0 V +3 -162 V +4 162 V +3 0 V +4 -81 V +3 0 V +4 650 V +3 -569 V +4 0 V +3 -162 V +3 81 V +4 650 V +3 -731 V +4 162 V +3 569 V +4 -650 V +3 650 V +4 0 V +3 -569 V +3 569 V +4 162 V +3 -162 V +4 0 V +3 162 V +4 0 V +3 0 V +4 -162 V +3 162 V +4 0 V +3 0 V +3 0 V +4 -162 V +3 162 V +4 0 V +3 0 V +4 0 V +3 -244 V +4 244 V +3 0 V +4 -162 V +3 162 V +3 -162 V +4 162 V +3 -325 V +4 325 V +3 0 V 
+4 -162 V +3 162 V +4 -244 V +3 244 V +4 0 V +3 -162 V +3 162 V +4 -244 V +3 244 V +4 0 V +3 -162 V +4 162 V +3 -325 V +4 163 V +3 162 V +4 -244 V +3 82 V +3 568 V +4 -650 V +3 82 V +4 649 V +3 -731 V +4 569 V +3 162 V +4 -649 V +3 568 V +3 -731 V +4 569 V +3 162 V +4 -650 V +3 569 V +4 81 V +3 -162 V +4 81 V +3 -650 V +4 650 V +3 81 V +3 -162 V +4 0 V +3 243 V +4 -243 V +3 162 V +4 -162 V +3 0 V +4 243 V +3 -243 V +4 81 V +3 -81 V +3 0 V +4 162 V +3 -162 V +4 81 V +3 -163 V +4 0 V +3 244 V +4 -244 V +3 82 V +3 -82 V +4 0 V +3 163 V +4 -244 V +3 163 V +4 -163 V +3 0 V +4 -81 V +3 81 V +4 81 V +3 -81 V +3 0 V +4 -81 V +3 0 V +4 81 V +3 -81 V +4 -162 V +3 -163 V +4 81 V +3 82 V +4 -244 V +3 0 V +3 -162 V +4 81 V +3 81 V +4 -81 V +3 0 V +4 649 V +3 -649 V +4 0 V +3 -81 V +3 0 V +4 649 V +3 -649 V +4 0 V +3 730 V +4 -812 V +3 650 V +4 244 V +3 -812 V +4 649 V +3 -812 V +3 731 V +4 81 V +3 -731 V +4 650 V +3 81 V +4 -243 V +3 162 V +4 0 V +3 -162 V +4 162 V +3 -244 V +3 244 V +4 0 V +3 -244 V +4 82 V +3 -244 V +4 162 V +3 82 V +4 -163 V +3 81 V +4 -243 V +3 243 V +3 82 V +4 -244 V +3 81 V +4 -162 V +3 162 V +4 81 V +3 569 V +currentpoint stroke M +4 -731 V +3 731 V +3 81 V +4 -731 V +3 650 V +4 -731 V +3 650 V +4 0 V +3 -650 V +4 568 V +3 163 V +4 -163 V +3 0 V +3 163 V +4 -81 V +3 -82 V +4 -81 V +3 81 V +4 82 V +3 -163 V +4 0 V +3 -81 V +4 81 V +3 81 V +3 -162 V +4 0 V +3 0 V +4 0 V +3 81 V +4 -81 V +3 0 V +4 -162 V +3 162 V +4 0 V +3 -162 V +3 0 V +4 -163 V +3 81 V +4 82 V +3 -82 V +4 0 V +3 650 V +4 -731 V +3 163 V +3 649 V +4 -812 V +3 650 V +4 81 V +3 -650 V +4 650 V +3 -812 V +4 650 V +3 81 V +4 -731 V +3 650 V +3 81 V +4 -163 V +3 82 V +4 81 V +3 -163 V +4 82 V +3 -163 V +4 81 V +3 -162 V +4 81 V +3 163 V +3 -244 V +4 81 V +3 -81 V +4 0 V +3 81 V +4 -81 V +3 0 V +4 -244 V +3 244 V +3 0 V +4 -162 V +3 0 V +4 649 V +3 -731 V +4 82 V +3 649 V +4 -812 V +3 650 V +4 81 V +3 -731 V +3 650 V +4 81 V +3 -162 V +4 81 V +3 -731 V +4 650 V +3 0 V +4 -163 V +3 163 V +4 81 V 
+3 -163 V +3 0 V +4 -162 V +3 81 V +4 81 V +3 -81 V +4 0 V +3 -81 V +4 81 V +3 0 V +3 -81 V +4 0 V +3 -244 V +4 244 V +3 0 V +4 -244 V +3 0 V +4 650 V +3 -650 V +4 0 V +3 731 V +3 -812 V +4 650 V +3 162 V +4 -812 V +3 650 V +4 -812 V +3 731 V +4 162 V +3 0 V +4 -162 V +3 0 V +3 -82 V +4 163 V +3 0 V +4 -163 V +3 0 V +4 -81 V +3 163 V +4 0 V +3 -163 V +4 0 V +3 -81 V +3 81 V +4 81 V +3 -162 V +4 81 V +3 -81 V +4 0 V +3 81 V +4 -81 V +3 0 V +4 650 V +3 0 V +3 -650 V +4 650 V +3 -650 V +4 568 V +3 0 V +4 -568 V +3 650 V +4 0 V +3 -163 V +3 0 V +4 163 V +3 -163 V +4 0 V +3 -81 V +4 0 V +3 162 V +4 -162 V +3 0 V +4 -81 V +3 0 V +3 162 V +4 -325 V +3 82 V +4 -163 V +3 0 V +4 163 V +3 -244 V +4 81 V +3 -81 V +4 0 V +3 81 V +3 569 V +4 -650 V +3 487 V +4 81 V +3 -568 V +4 487 V +3 -731 V +4 650 V +3 0 V +3 -650 V +4 569 V +3 81 V +4 -162 V +3 0 V +4 162 V +3 -162 V +4 0 V +3 -82 V +4 0 V +3 163 V +3 -244 V +4 81 V +3 -162 V +4 0 V +3 162 V +4 -162 V +3 0 V +4 -162 V +3 162 V +4 81 V +3 -243 V +3 162 V +4 487 V +3 -731 V +4 82 V +3 730 V +4 -730 V +3 568 V +4 81 V +3 -731 V +3 650 V +4 81 V +3 -162 V +4 81 V +3 81 V +4 -162 V +3 81 V +4 -244 V +3 82 V +4 162 V +3 -244 V +3 82 V +4 -244 V +3 81 V +4 163 V +3 -163 V +4 81 V +3 -162 V +4 0 V +3 81 V +4 -81 V +3 0 V +3 -244 V +4 82 V +3 162 V +4 -162 V +3 0 V +4 730 V +3 -730 V +4 568 V +3 162 V +3 -812 V +4 650 V +3 162 V +4 -243 V +3 162 V +4 -812 V +3 650 V +4 81 V +3 -162 V +4 81 V +3 0 V +3 -81 V +4 81 V +3 -244 V +4 163 V +3 81 V +4 -163 V +3 82 V +4 -244 V +3 162 V +4 0 V +3 -162 V +3 0 V +4 -244 V +3 244 V +4 0 V +3 -325 V +4 0 V +3 -243 V +4 162 V +3 -81 V +4 -81 V +3 -82 V +3 731 V +4 -649 V +3 0 V +4 649 V +3 -812 V +4 731 V +3 162 V +4 -812 V +3 731 V +3 -812 V +4 731 V +3 162 V +4 -812 V +3 731 V +4 0 V +3 -81 V +4 162 V +3 -812 V +4 650 V +3 81 V +3 -81 V +4 81 V +3 0 V +4 -81 V +3 81 V +4 -243 V +3 162 V +4 162 V +3 -162 V +4 0 V +3 -244 V +3 244 V +4 0 V +3 -162 V +4 162 V +3 -244 V +4 82 V +3 162 V +4 -325 V +3 
81 V +3 -162 V +4 81 V +3 163 V +4 -244 V +3 162 V +4 -162 V +3 0 V +4 244 V +3 -244 V +4 162 V +3 -162 V +3 0 V +4 244 V +3 -244 V +4 81 V +3 -81 V +4 0 V +3 162 V +4 -81 V +3 0 V +4 -81 V +3 81 V +3 81 V +4 -81 V +3 0 V +4 -81 V +3 81 V +4 163 V +3 -163 V +4 81 V +3 -162 V +3 81 V +4 650 V +3 -650 V +4 81 V +3 650 V +4 -731 V +3 731 V +4 -731 V +3 81 V +4 569 V +3 -650 V +3 650 V +4 -731 V +3 162 V +4 650 V +3 -731 V +4 650 V +3 81 V +4 -731 V +3 731 V +4 -812 V +3 650 V +3 243 V +4 -812 V +3 650 V +4 0 V +3 -81 V +4 81 V +3 81 V +4 -162 V +3 81 V +4 -244 V +3 163 V +3 0 V +4 -82 V +3 82 V +4 -244 V +3 81 V +4 0 V +3 -81 V +4 -162 V +3 -244 V +3 81 V +4 -81 V +currentpoint stroke M +3 -406 V +4 -325 V +3 0 V +4 325 V +3 0 V +4 -81 V +3 0 V +4 -81 V +3 81 V +3 0 V +4 -163 V +3 -81 V +4 -162 V +3 162 V +4 0 V +3 -81 V +4 81 V +3 -162 V +4 162 V +3 0 V +3 -162 V +4 81 V +3 -162 V +4 162 V +3 81 V +4 -243 V +3 162 V +4 -162 V +3 162 V +3 81 V +4 -162 V +3 81 V +4 812 V +3 -812 V +4 244 V +3 -163 V +4 0 V +3 731 V +4 -731 V +3 81 V +3 731 V +4 -731 V +3 731 V +4 0 V +3 -649 V +4 649 V +3 -731 V +4 731 V +3 0 V +4 -649 V +3 649 V +3 81 V +4 -162 V +3 162 V +4 -730 V +3 730 V +4 82 V +3 -163 V +4 81 V +3 -162 V +3 162 V +4 82 V +3 -163 V +4 81 V +3 -162 V +4 81 V +3 81 V +4 -81 V +3 0 V +4 -81 V +3 162 V +3 0 V +4 -81 V +3 0 V +4 -162 V +3 243 V +4 0 V +3 -81 V +4 81 V +3 -162 V +4 162 V +3 82 V +3 -244 V +4 81 V +3 569 V +4 -650 V +3 81 V +4 569 V +3 -731 V +4 731 V +3 81 V +4 -731 V +3 650 V +3 81 V +4 -81 V +3 0 V +4 -650 V +3 731 V +4 0 V +3 -163 V +4 82 V +3 162 V +3 -81 V +4 0 V +3 -163 V +4 82 V +3 81 V +4 -81 V +3 0 V +4 -163 V +3 81 V +4 82 V +3 -163 V +3 163 V +4 -244 V +3 81 V +4 81 V +3 -81 V +4 81 V +3 -162 V +4 0 V +3 81 V +4 -81 V +3 81 V +3 650 V +4 -731 V +3 81 V +4 -81 V +3 0 V +4 650 V +3 -650 V +4 0 V +3 731 V +4 -731 V +3 568 V +3 82 V +4 -163 V +3 163 V +4 81 V +3 -244 V +4 81 V +3 -162 V +4 81 V +3 81 V +3 -243 V +4 81 V +3 -162 V +4 81 V +3 81 V 
+4 -244 V +3 82 V +4 -163 V +3 81 V +4 82 V +3 -244 V +3 81 V +4 -243 V +3 243 V +4 81 V +3 -162 V +4 0 V +3 568 V +4 -568 V +3 0 V +4 568 V +3 -730 V +3 649 V +4 81 V +3 -812 V +4 650 V +3 -731 V +4 569 V +3 162 V +4 0 V +3 -162 V +3 81 V +4 -244 V +3 163 V +4 0 V +3 -82 V +4 82 V +3 -244 V +4 162 V +3 82 V +4 -244 V +3 81 V +3 -81 V +4 0 V +3 81 V +4 -81 V +3 0 V +4 568 V +3 -568 V +4 0 V +3 487 V +4 81 V +3 -243 V +3 162 V +4 -731 V +stroke +grestore +end +showpage +%%Trailer +%%DocumentFonts: Helvetica diff --git a/share/doc/papers/timecounter/intr.ps b/share/doc/papers/timecounter/intr.ps new file mode 100644 index 0000000..a6bb7ce --- /dev/null +++ b/share/doc/papers/timecounter/intr.ps @@ -0,0 +1,1501 @@ +%!PS-Adobe-2.0 EPSF-2.0 +%%Title: _.ps +%%Creator: gnuplot 3.7 patchlevel 1 +%%CreationDate: $FreeBSD$ +%%DocumentFonts: (atend) +%%BoundingBox: 50 50 266 201 +%%Orientation: Portrait +%%EndComments +/gnudict 256 dict def +gnudict begin +/Color false def +/Solid false def +/gnulinewidth 5.000 def +/userlinewidth gnulinewidth def +/vshift -46 def +/dl {10 mul} def +/hpt_ 31.5 def +/vpt_ 31.5 def +/hpt hpt_ def +/vpt vpt_ def +/M {moveto} bind def +/L {lineto} bind def +/R {rmoveto} bind def +/V {rlineto} bind def +/vpt2 vpt 2 mul def +/hpt2 hpt 2 mul def +/Lshow { currentpoint stroke M + 0 vshift R show } def +/Rshow { currentpoint stroke M + dup stringwidth pop neg vshift R show } def +/Cshow { currentpoint stroke M + dup stringwidth pop -2 div vshift R show } def +/UP { dup vpt_ mul /vpt exch def hpt_ mul /hpt exch def + /hpt2 hpt 2 mul def /vpt2 vpt 2 mul def } def +/DL { Color {setrgbcolor Solid {pop []} if 0 setdash } + {pop pop pop Solid {pop []} if 0 setdash} ifelse } def +/BL { stroke userlinewidth 2 mul setlinewidth } def +/AL { stroke userlinewidth 2 div setlinewidth } def +/UL { dup gnulinewidth mul /userlinewidth exch def + 10 mul /udl exch def } def +/PL { stroke userlinewidth setlinewidth } def +/LTb { BL [] 0 0 0 DL } def +/LTa { AL [1 udl mul 
2 udl mul] 0 setdash 0 0 0 setrgbcolor } def +/LT0 { PL [] 1 0 0 DL } def +/LT1 { PL [4 dl 2 dl] 0 1 0 DL } def +/LT2 { PL [2 dl 3 dl] 0 0 1 DL } def +/LT3 { PL [1 dl 1.5 dl] 1 0 1 DL } def +/LT4 { PL [5 dl 2 dl 1 dl 2 dl] 0 1 1 DL } def +/LT5 { PL [4 dl 3 dl 1 dl 3 dl] 1 1 0 DL } def +/LT6 { PL [2 dl 2 dl 2 dl 4 dl] 0 0 0 DL } def +/LT7 { PL [2 dl 2 dl 2 dl 2 dl 2 dl 4 dl] 1 0.3 0 DL } def +/LT8 { PL [2 dl 2 dl 2 dl 2 dl 2 dl 2 dl 2 dl 4 dl] 0.5 0.5 0.5 DL } def +/Pnt { stroke [] 0 setdash + gsave 1 setlinecap M 0 0 V stroke grestore } def +/Dia { stroke [] 0 setdash 2 copy vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V closepath stroke + Pnt } def +/Pls { stroke [] 0 setdash vpt sub M 0 vpt2 V + currentpoint stroke M + hpt neg vpt neg R hpt2 0 V stroke + } def +/Box { stroke [] 0 setdash 2 copy exch hpt sub exch vpt add M + 0 vpt2 neg V hpt2 0 V 0 vpt2 V + hpt2 neg 0 V closepath stroke + Pnt } def +/Crs { stroke [] 0 setdash exch hpt sub exch vpt add M + hpt2 vpt2 neg V currentpoint stroke M + hpt2 neg 0 R hpt2 vpt2 V stroke } def +/TriU { stroke [] 0 setdash 2 copy vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt neg vpt 1.62 mul V closepath stroke + Pnt } def +/Star { 2 copy Pls Crs } def +/BoxF { stroke [] 0 setdash exch hpt sub exch vpt add M + 0 vpt2 neg V hpt2 0 V 0 vpt2 V + hpt2 neg 0 V closepath fill } def +/TriUF { stroke [] 0 setdash vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt neg vpt 1.62 mul V closepath fill } def +/TriD { stroke [] 0 setdash 2 copy vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V closepath stroke + Pnt } def +/TriDF { stroke [] 0 setdash vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V closepath fill} def +/DiaF { stroke [] 0 setdash vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V closepath fill } def +/Pent { stroke [] 0 setdash 2 copy gsave + translate 0 hpt M 4 {72 rotate 0 
hpt L} repeat + closepath stroke grestore Pnt } def +/PentF { stroke [] 0 setdash gsave + translate 0 hpt M 4 {72 rotate 0 hpt L} repeat + closepath fill grestore } def +/Circle { stroke [] 0 setdash 2 copy + hpt 0 360 arc stroke Pnt } def +/CircleF { stroke [] 0 setdash hpt 0 360 arc fill } def +/C0 { BL [] 0 setdash 2 copy moveto vpt 90 450 arc } bind def +/C1 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 90 arc closepath fill + vpt 0 360 arc closepath } bind def +/C2 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 90 180 arc closepath fill + vpt 0 360 arc closepath } bind def +/C3 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 180 arc closepath fill + vpt 0 360 arc closepath } bind def +/C4 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 180 270 arc closepath fill + vpt 0 360 arc closepath } bind def +/C5 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 90 arc + 2 copy moveto + 2 copy vpt 180 270 arc closepath fill + vpt 0 360 arc } bind def +/C6 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 90 270 arc closepath fill + vpt 0 360 arc closepath } bind def +/C7 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 270 arc closepath fill + vpt 0 360 arc closepath } bind def +/C8 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 270 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/C9 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 270 450 arc closepath fill + vpt 0 360 arc closepath } bind def +/C10 { BL [] 0 setdash 2 copy 2 copy moveto vpt 270 360 arc closepath fill + 2 copy moveto + 2 copy vpt 90 180 arc closepath fill + vpt 0 360 arc closepath } bind def +/C11 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 180 arc closepath fill + 2 copy moveto + 2 copy vpt 270 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/C12 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 180 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/C13 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 0 90 arc closepath fill + 2 copy moveto + 2 copy vpt 180 360 arc closepath fill + vpt 0 360 
arc closepath } bind def +/C14 { BL [] 0 setdash 2 copy moveto + 2 copy vpt 90 360 arc closepath fill + vpt 0 360 arc } bind def +/C15 { BL [] 0 setdash 2 copy vpt 0 360 arc closepath fill + vpt 0 360 arc closepath } bind def +/Rec { newpath 4 2 roll moveto 1 index 0 rlineto 0 exch rlineto + neg 0 rlineto closepath } bind def +/Square { dup Rec } bind def +/Bsquare { vpt sub exch vpt sub exch vpt2 Square } bind def +/S0 { BL [] 0 setdash 2 copy moveto 0 vpt rlineto BL Bsquare } bind def +/S1 { BL [] 0 setdash 2 copy vpt Square fill Bsquare } bind def +/S2 { BL [] 0 setdash 2 copy exch vpt sub exch vpt Square fill Bsquare } bind def +/S3 { BL [] 0 setdash 2 copy exch vpt sub exch vpt2 vpt Rec fill Bsquare } bind def +/S4 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt Square fill Bsquare } bind def +/S5 { BL [] 0 setdash 2 copy 2 copy vpt Square fill + exch vpt sub exch vpt sub vpt Square fill Bsquare } bind def +/S6 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt vpt2 Rec fill Bsquare } bind def +/S7 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt vpt2 Rec fill + 2 copy vpt Square fill + Bsquare } bind def +/S8 { BL [] 0 setdash 2 copy vpt sub vpt Square fill Bsquare } bind def +/S9 { BL [] 0 setdash 2 copy vpt sub vpt vpt2 Rec fill Bsquare } bind def +/S10 { BL [] 0 setdash 2 copy vpt sub vpt Square fill 2 copy exch vpt sub exch vpt Square fill + Bsquare } bind def +/S11 { BL [] 0 setdash 2 copy vpt sub vpt Square fill 2 copy exch vpt sub exch vpt2 vpt Rec fill + Bsquare } bind def +/S12 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill Bsquare } bind def +/S13 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill + 2 copy vpt Square fill Bsquare } bind def +/S14 { BL [] 0 setdash 2 copy exch vpt sub exch vpt sub vpt2 vpt Rec fill + 2 copy exch vpt sub exch vpt Square fill Bsquare } bind def +/S15 { BL [] 0 setdash 2 copy Bsquare fill Bsquare } bind def +/D0 { gsave translate 45 rotate 0 0 S0 stroke grestore } 
bind def +/D1 { gsave translate 45 rotate 0 0 S1 stroke grestore } bind def +/D2 { gsave translate 45 rotate 0 0 S2 stroke grestore } bind def +/D3 { gsave translate 45 rotate 0 0 S3 stroke grestore } bind def +/D4 { gsave translate 45 rotate 0 0 S4 stroke grestore } bind def +/D5 { gsave translate 45 rotate 0 0 S5 stroke grestore } bind def +/D6 { gsave translate 45 rotate 0 0 S6 stroke grestore } bind def +/D7 { gsave translate 45 rotate 0 0 S7 stroke grestore } bind def +/D8 { gsave translate 45 rotate 0 0 S8 stroke grestore } bind def +/D9 { gsave translate 45 rotate 0 0 S9 stroke grestore } bind def +/D10 { gsave translate 45 rotate 0 0 S10 stroke grestore } bind def +/D11 { gsave translate 45 rotate 0 0 S11 stroke grestore } bind def +/D12 { gsave translate 45 rotate 0 0 S12 stroke grestore } bind def +/D13 { gsave translate 45 rotate 0 0 S13 stroke grestore } bind def +/D14 { gsave translate 45 rotate 0 0 S14 stroke grestore } bind def +/D15 { gsave translate 45 rotate 0 0 S15 stroke grestore } bind def +/DiaE { stroke [] 0 setdash vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V closepath stroke } def +/BoxE { stroke [] 0 setdash exch hpt sub exch vpt add M + 0 vpt2 neg V hpt2 0 V 0 vpt2 V + hpt2 neg 0 V closepath stroke } def +/TriUE { stroke [] 0 setdash vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt neg vpt 1.62 mul V closepath stroke } def +/TriDE { stroke [] 0 setdash vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V closepath stroke } def +/PentE { stroke [] 0 setdash gsave + translate 0 hpt M 4 {72 rotate 0 hpt L} repeat + closepath stroke grestore } def +/CircE { stroke [] 0 setdash + hpt 0 360 arc stroke } def +/Opaque { gsave closepath 1 setgray fill grestore 0 setgray closepath } def +/DiaW { stroke [] 0 setdash vpt add M + hpt neg vpt neg V hpt vpt neg V + hpt vpt V hpt neg vpt V Opaque stroke } def +/BoxW { stroke [] 0 setdash exch hpt sub exch vpt add M + 0 vpt2 neg V 
hpt2 0 V 0 vpt2 V + hpt2 neg 0 V Opaque stroke } def +/TriUW { stroke [] 0 setdash vpt 1.12 mul add M + hpt neg vpt -1.62 mul V + hpt 2 mul 0 V + hpt neg vpt 1.62 mul V Opaque stroke } def +/TriDW { stroke [] 0 setdash vpt 1.12 mul sub M + hpt neg vpt 1.62 mul V + hpt 2 mul 0 V + hpt neg vpt -1.62 mul V Opaque stroke } def +/PentW { stroke [] 0 setdash gsave + translate 0 hpt M 4 {72 rotate 0 hpt L} repeat + Opaque stroke grestore } def +/CircW { stroke [] 0 setdash + hpt 0 360 arc Opaque stroke } def +/BoxFill { gsave Rec 1 setgray fill grestore } def +end +%%EndProlog +gnudict begin +gsave +50 50 translate +0.050 0.050 scale +0 setgray +newpath +(Helvetica) findfont 140 scalefont setfont +1.000 UL +LTb +1.000 UL +LTa +882 420 M +3200 0 V +1.000 UL +LTb +882 420 M +63 0 V +3137 0 R +-63 0 V +798 420 M +(0) Rshow +1.000 UL +LTa +882 768 M +3200 0 V +1.000 UL +LTb +882 768 M +63 0 V +3137 0 R +-63 0 V +798 768 M +(20000) Rshow +1.000 UL +LTa +882 1116 M +3200 0 V +1.000 UL +LTb +882 1116 M +63 0 V +3137 0 R +-63 0 V +-3221 0 R +(40000) Rshow +1.000 UL +LTa +882 1464 M +3200 0 V +1.000 UL +LTb +882 1464 M +63 0 V +3137 0 R +-63 0 V +-3221 0 R +(60000) Rshow +1.000 UL +LTa +882 1812 M +3200 0 V +1.000 UL +LTb +882 1812 M +63 0 V +3137 0 R +-63 0 V +-3221 0 R +(80000) Rshow +1.000 UL +LTa +882 2160 M +3200 0 V +1.000 UL +LTb +882 2160 M +63 0 V +3137 0 R +-63 0 V +-3221 0 R +(100000) Rshow +1.000 UL +LTa +882 2508 M +3200 0 V +1.000 UL +LTb +882 2508 M +63 0 V +3137 0 R +-63 0 V +-3221 0 R +(120000) Rshow +1.000 UL +LTa +882 2856 M +3200 0 V +1.000 UL +LTb +882 2856 M +63 0 V +3137 0 R +-63 0 V +-3221 0 R +(140000) Rshow +1.000 UL +LTa +882 420 M +0 2436 V +1.000 UL +LTb +882 420 M +0 63 V +0 2373 R +0 -63 V +882 280 M +(0) Cshow +1.000 UL +LTa +1202 420 M +0 2436 V +1.000 UL +LTb +1202 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(100) Cshow +1.000 UL +LTa +1522 420 M +0 2436 V +1.000 UL +LTb +1522 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(200) Cshow +1.000 UL 
+LTa +1842 420 M +0 2436 V +1.000 UL +LTb +1842 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(300) Cshow +1.000 UL +LTa +2162 420 M +0 2436 V +1.000 UL +LTb +2162 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(400) Cshow +1.000 UL +LTa +2482 420 M +0 2436 V +1.000 UL +LTb +2482 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(500) Cshow +1.000 UL +LTa +2802 420 M +0 2436 V +1.000 UL +LTb +2802 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(600) Cshow +1.000 UL +LTa +3122 420 M +0 2436 V +1.000 UL +LTb +3122 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(700) Cshow +1.000 UL +LTa +3442 420 M +0 2373 V +0 63 V +1.000 UL +LTb +3442 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(800) Cshow +1.000 UL +LTa +3762 420 M +0 2373 V +0 63 V +1.000 UL +LTb +3762 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(900) Cshow +1.000 UL +LTa +4082 420 M +0 2436 V +1.000 UL +LTb +4082 420 M +0 63 V +0 2373 R +0 -63 V +0 -2513 R +(1000) Cshow +1.000 UL +LTb +882 420 M +3200 0 V +0 2436 V +-3200 0 V +882 420 L +140 1638 M +currentpoint gsave translate 90 rotate 0 0 M +(nanoseconds) Cshow +grestore +2482 70 M +(seconds) Cshow +1.000 UL +LT0 +883 628 M +3 589 V +3 -538 V +3 -63 V +4 51 V +3 -42 V +3 458 V +3 -424 V +3 19 V +4 -31 V +3 30 V +3 10 V +3 65 V +3 -89 V +4 -39 V +3 33 V +3 13 V +3 -10 V +3 23 V +3 -18 V +4 25 V +3 -17 V +3 232 V +3 -248 V +4 19 V +3 -35 V +3 8 V +3 -46 V +3 9 V +4 19 V +3 -17 V +3 9 V +3 243 V +3 -226 V +4 -41 V +3 38 V +3 0 V +3 5 V +3 -40 V +4 571 V +3 823 V +3 -1311 V +3 -44 V +3 7 V +4 4 V +3 9 V +3 11 V +3 13 V +3 -81 V +3 54 V +4 -27 V +3 30 V +3 -8 V +3 -21 V +4 -2 V +3 19 V +3 -14 V +3 13 V +3 -25 V +4 27 V +3 15 V +3 1888 V +3 -1558 V +3 -413 V +4 38 V +3 2 V +3 -6 V +3 15 V +3 -42 V +4 24 V +3 -27 V +3 19 V +3 -8 V +3 1236 V +3 -1232 V +4 3 V +3 -15 V +3 9 V +3 10 V +3 -27 V +4 -31 V +3 40 V +3 17 V +3 -20 V +3 2 V +4 3 V +3 15 V +3 -27 V +3 15 V +4 -37 V +3 42 V +3 1 V +3 -10 V +3 -10 V +4 -22 V +3 11 V +3 34 V +3 -33 V +3 -6 V +4 3 V +3 13 V +3 17 V +3 -30 V 
[PostScript plot path data for the preceding EPS figure omitted: a long run of relative line segments ("dx dy V") and "currentpoint stroke M" commands]
+stroke +grestore +end +showpage +%%Trailer +%%DocumentFonts: Helvetica diff --git a/share/doc/papers/timecounter/timecounter.ms b/share/doc/papers/timecounter/timecounter.ms new file mode 100644 index 0000000..097ab65 --- /dev/null +++ b/share/doc/papers/timecounter/timecounter.ms @@ -0,0 +1,1076 @@ +.EQ +delim øø +.EN +.\" +.\" ---------------------------------------------------------------------------- +.\" "THE BEER-WARE LICENSE" (Revision 42): +.\"
<phk@login.dknet.dk> wrote this file. As long as you retain this notice you +.\" can do whatever you want with this stuff. If we meet some day, and you think +.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp +.\" ---------------------------------------------------------------------------- +.\" +.\" $FreeBSD$ +.\" +.if n .ND +.TI +Timecounters: Efficient and precise timekeeping in SMP kernels. +.AA +.A "Poul-Henning Kamp" "The FreeBSD Project" +.AB +.PP +The FreeBSD timecounters are an architecture-independent implementation +of a binary timescale using whatever hardware support is at hand +for tracking time. The binary timescale converts using simple +multiplication to canonical timescales based on micro- or nano-seconds +and can interface seamlessly to the NTP PLL/FLL facilities for clock +synchronisation. Timecounters are implemented using lock-less +stable-storage based primitives which scale efficiently in SMP +systems. The math and implementation behind timecounters will +be detailed as well as the mechanisms used for synchronisation. \** +.AE +.FS +This paper was presented at the EuroBSDcon 2002 conference in Amsterdam. +.FE +.SH +Introduction +.PP +Despite digging around for it, I have not been able to positively +identify the first computer which knew the time of day. +The feature probably arrived either from the commercial side +so service centres could bill computer cycles to customers or from +the technical side so computers could timestamp external events, +but I have not been able to conclusively nail the first implementation down. +.LP +But there is no doubt that it happened very early in the development +of computers +and if systems like the ``SAGE'' [SAGE] did not know what time +it was I would be amazed. +.LP +On the other hand, it took a long time for a real time clock to +become a standard feature: +.LP +The ``Apple ]['' computer +had neither in hardware or software any notion what time it was. 
+.LP +The original ``IBM PC'' did know what time it was, provided you typed +it in when you booted it, but it forgot when you turned it off. +.LP +One of the ``advanced technologies'' in the ``IBM PC/AT'' was a battery +backed CMOS chip which kept track of time even when the computer +was powered off. +.LP +Today we expect our computers to know the time, and with network +protocols like NTP we will usually find that they do, give or +take a few milliseconds. +.LP +This article is about the code in the FreeBSD kernel which keeps +track of time. +.SH +Time and timescale basics +.PP +Despite the fact that time is the physical quantity (or maybe entity +?) about which we know the least, it is at the same time [sic!] what we +can measure with the highest precision of all physical quantities. +.LP +The current crop of atomic clocks will neither gain nor lose a +second in the next couple of hundred million years, provided we +stick to the preventative maintenance schedules. This is a feat +roughly in line with knowing the circumference of the Earth +with one micrometer precision, in real time. +.LP +While it is possible to measure time by means other than oscillations, +for instance transport or consumption of a substance at a well-known +rate, such designs play no practical role in time measurement because +their performance is significantly inferior to oscillation based +designs. +.LP +In other words, it is pretty fair to say that all relevant +timekeeping is based on oscillating phenomena: +.TS +center; +l l. +sun-dial Earth's rotation about its axis. +calendar Ditto + Earth's orbit around the sun. +clockwork Mechanical oscillation of pendulum. +crystals Mechanical resonance in quartz. +atomic Quantum-state transitions in atoms.
+.TE +.LP +We can therefore with good fidelity define ``a clock'' to be the +combination of an oscillator and a counting mechanism: +.LP +.if t .PSPIC fig3.eps +.LP +The standard second is currently defined as +.QP +The duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom. +.LP +and we have frequency standards which are able to mark a sequence of +such seconds +with an error less than ø2 cdot 10 sup{-15}ø [DMK2001] with commercially +available products doing better than ø1 cdot 10 sup{-14}ø [AG2002]. +.LP +Unlike other physical units with a conventionally defined origin, +longitude for instance, the ephemeral nature of time prevents us +from putting a stake in the ground, so to speak, and measuring from +there. For measuring time we have to rely on ``dead reckoning'', +just like the navigators before Harrison built his clocks [RGO2002]: +We have to tally how far we went from our reference point, keeping a +running total at all times, and use that as our estimated position. +.LP +The upshot of this is that we cannot define a timescale by any +other means than some other timescale(s). +.LP +``Relative time'' is a time interval between two events, and for this +we only need to agree on the rate of the oscillator. +.LP +``Absolute time'' consists of a well defined point in time and the +time interval since then; this is a bit more tricky. +.LP +The internationally agreed upon TAI and UTC timescales +start at (from a physics point of view) arbitrary points in time +and progress in integral intervals of the standard second, with the +difference being that UTC does tricks to the counting to stay roughly +in sync with Earth's rotation \**.
+.FS +The first atomic based definition actually operated in a different way: +each year would have its own value determined for the frequency of the +caesium resonance, selected so as to match the revolution rate of the +Earth. This resulted in time-intervals being a very unwieldy business, +and more and more scientists realized that the caesium resonance +was many times more stable than the angular momentum of the Earth. +Eventually the new leap-second method was introduced in 1972. +It is interesting to note that the autumn leaves falling on the +northern hemisphere affect the angular momentum enough to change +the Earth's rotational rate measurably. +.FE +.LP +TAI is defined as a sequence of standard seconds (the first timescale), +counted from January 1st 1958 (the second timescale). +.LP +UTC is defined basically the same way, but every so often a leap-second +is inserted (or theoretically deleted) to keep UTC synchronised +with Earth's rotation. +.LP +The implementations of these two, and of a few other speciality +timescales, are the result of the +combined efforts of several hundred atomic frequency standards in +various laboratories and institutions throughout the world, all +reporting to the BIPM in Paris, which calculates the ``paper clock'' that +TAI and UTC really are, using a carefully designed weighting algorithm \**. +.FS +The majority of these clocks are model 5071A from Agilent (the test +and measurement company formerly known as ``Hewlett-Packard'') which +account for as much as 85% of the combined weight. +A fact the company deservedly is proud of. +The majority of the remaining weight is assigned to a handful of big +custom-design units like the PTB2 and NIST7. +.FE +.LP +Leap seconds are typically announced six to nine months in advance, +based on precise observations of median transit times of stars and VLBI +radio astronomy of very distant quasars.
+.LP +The perceived wisdom of leap-seconds has been gradually decreasing +in recent years, as devices and products with built-in calendar +functionality become more and more common and people realize that +user input or software upgrades are necessary to instruct the +calendar functionality about upcoming leap seconds. +.SH +UNIX timescales +.PP +UNIX systems use a timescale which pretends to be UTC, but is defined +as the count of standard seconds since 00:00:00 01-01-1970 UTC, +ignoring the leap-seconds. This definition has never been perceived +as wise. +.LP +Ignoring leap seconds means that unless some trickery is performed +when a leap second happens on the UTC scale, UNIX clocks would be +one second off. Another implication is that the length of a +time interval calculated on UNIX time_t variables can be up to 22 +(and counting) seconds wrong relative to the same time interval +measured on the UTC timescale. +.LP +Recent efforts have tried to make the NTP protocol make up for this +deficiency by transmitting the UTC-TAI offset as part of the protocol. +[MILLS2000A] +.LP +Fractional seconds are represented in two ways in UNIX, ``timeval'' and +``timespec''. Both of these formats are two-component structures +which record the number of seconds, and the number of microseconds +or nanoseconds respectively.
+.LP +This unfortunate definition makes arithmetic on these two formats +quite expensive to perform in terms of computer instructions: +.DS +.ps -1 +/* Subtract timeval from timespec */ +t3.tv_sec = t1.tv_sec - t2.tv_sec; +t3.tv_nsec = t1.tv_nsec - + t2.tv_usec * 1000; +if (t3.tv_nsec >= 1000000000) { + t3.tv_sec++; + t3.tv_nsec -= 1000000000; +} else if (t3.tv_nsec < 0) { + t3.tv_sec--; + t3.tv_nsec += 1000000000; +} +.ps +1 +.DE +.LP +While nanoseconds will probably be enough for most timestamping +tasks faced by UNIX computers for a number of years, it is an +increasingly uncomfortable situation that CPU clock periods and +instruction timings are already not representable in the standard +time formats available on UNIX for consumer grade hardware, +and the first POSIX mandated API, \fCclock_getres(3)\fP, has +already effectively reached end of life as a result of this. +.LP +Hopefully the various standards bodies will address this issue +better in the future. +.SH +Precision, Stability and Resolution +.PP +Three very important terms in timekeeping are ``precision'', +``stability'' and ``resolution''. +While the three words may seem to describe somewhat the +same property in most uses, their use in timekeeping covers three +very distinct and well defined properties of a clock. +.LP +Resolution in clocks is simply a matter of the step-size of the +counter or in other words: the rate at which it steps. +A counter running on a 1 MHz frequency will have a resolution +of 1 microsecond. +.LP +Precision talks about how close to the intended rate the clock runs, +stability about how much the rate varies and resolution about the +size of the smallest time interval we can measure.
+.LP +From a quality point of view, stability is a much more +valuable property than precision; this is probably best explained +using a graphic illustration of the difference between the two +concepts: +.LP +.if t .PSPIC fig1.eps +.LP +In the top row we have instability: the bullet holes are spread over +a large fraction of the target area. +In the bottom row, the bullets all hit in a very small area. +.LP +On the left side, we have lack of precision: the holes obviously are +not centred on the target, a systematic offset exists. +On the right side we have precision: the bullets are centred on +the target \**. +.FS +We cannot easily get resolution into this analogy: the obvious +representation as the diameter of the bullet-hole is not correct, +it would have to be the grid or other pattern of locations where +the bullet could possibly penetrate the target material, but this +gets too quantum-mechanical-oid to serve the instructional purpose. +.FE +.LP +Transposing these four targets to actual clocks, the situation +could look like the following plots: +.LP +.if t .PSPIC fig2.eps +.LP +On the x-axis we have time and on the y-axis how wrong the clock +was at a given point in time. +.LP +The reason atomic standards are such a big deal in timekeeping is +that they are incredibly stable: they are able to generate an oscillation +where the period varies by roughly a millionth of a billionth of a +second in long term measurements. +.LP +They are in fact not nearly as precise as they are stable, but as +one can see from the graphic above, a stable clock which is not +precise can be easily corrected for the offset and, thus calibrated, +is as good as any clock.
+.LP +This lack of precision is not necessarily a flaw in these kinds of +devices: once you get into the ø10 cdot 10 sup{-15}ø territory +things like the blackbody spectrum at the particular absolute +temperature of the clock's hardware and general relativistic +effects mostly dependent on the altitude above Earth's center +have to be corrected for \**. +.FS +This particularly becomes an issue with space-based atomic standards +such as those found on the ``Navstar'' GPS satellites. +.FE +.SH +Design goals of timecounters +.PP +After this brief description of the major features of the local +landscape, we can look at the design goals of timecounters in detail: +.LP +.I "Provide timestamps in timeval and timespec formats," +.IP +This is obviously the basic task we have to solve, but as was noted +earlier, this is in no way the performance requirement. +.LP +.I "on both the ``uptime'' and the POSIX timescales," +.IP +The ``uptime'' timescale is convenient for time intervals which are +not anchored in UTC time: the run time of processes, the access +time of disks and similar. +.IP +The uptime timescale counts seconds starting from when the system +is booted. The POSIX/UTC timescale is implemented by adding an +estimate of the POSIX time when the system booted to the uptime +timescale. +.LP +.I "using whatever hardware we have available at the time," +.IP +Which in a subtle way also implies ``be able to switch from one +piece of hardware to another on the fly'' since we may not know +right up front what hardware we have access to and which is +preferable to use. +.LP +.I "while supporting the NTP PLL/FLL discipline code," +.IP +The NTP kernel PLL/FLL code allows the local clock and timescale +to be synchronised or syntonised to an external timescale either +via network packets or hardware connection. This also implies +that the rate and phase of the timescale must be manoeuvrable +with sufficient resolution.
+.LP +.I "and providing support for the RFC 2783 PPS API," +.IP +This is mainly for the benefit of the NTPD daemon's communication +with external clock or frequency hardware, but it has many other +interesting uses as well [PHK2001]. +.LP +.I "in a SMP efficient way." +.IP +Timestamps are used in many places in the kernel and often at a pretty +high rate so it is important that the timekeeping facility +does not become a point of CPU or lock contention. +.SH +Timecounter timestamp format. +.PP +Choosing the fundamental timestamp format for the timecounters is +mostly a question of the resolution and steer-ability requirements. +.LP +There are two basic options on contemporary hardware: use a 32 bit +integer for the fractional part of seconds, or use a 64 bit integer, which +is computationally more expensive. +.LP +The question therefore reduces to the somewhat simpler: can we get +away with using only 32 bits? +.LP +Since 32 bit fractional seconds have a resolution of slightly +better than a quarter of a nanosecond (.2328 nsec) they can obviously +be converted to nanosecond resolution struct timespec timestamps +with no loss of precision, but unfortunately not with pure 32 bit +arithmetic as that would result in unacceptable rounding errors. +.LP +But timecounters also need to represent the clock period of the +chosen hardware and this hardware might be the GHz range CPU-clock. +The list of clock frequencies we could support with 32 bits is: +.TS +center; +l l n l. +ø2 sup{32} / 1ø ø=ø 4.294 GHz +ø2 sup{32} / 2ø ø=ø 2.147 GHz +ø2 sup{32} / 3ø ø=ø 1.432 GHz +\&... +ø2 sup{32} / (2 sup{32}-1)ø ø=ø 1.000 Hz +.TE +We can immediately see that 32 bit is insufficient to faithfully +represent clock frequencies even in the low GHz area, much less in +the range of frequencies which have already been vapourwared by +IBM, Intel and AMD. +QED: 32 bit fractions are not enough. +.LP +With 64 bit fractions the same table looks like: +.TS +center; +l l r l.
+ø2 sup{64} / 1ø ø=ø ø 18.45 cdot 10 sup{9}ø GHz +ø2 sup{64} / 2ø ø=ø ø 9.223 cdot 10 sup{9}ø GHz +\&... +ø2 sup{64} / 2 sup{32}ø ø=ø 4.294 GHz +\&... +ø2 sup{64} / (2 sup{64}-1)ø ø=ø 1.000 Hz +.TE +And the resolution in the 4 GHz frequency range is approximately one Hz. +.LP +The following format has therefore been chosen as the basic format +for timecounter operations: +.DS +.ps -1 +struct bintime { + time_t sec; + uint64_t frac; +}; +.ps +1 +.DE +Notice that the format will adapt to any size of time_t variable, +keeping timecounters safely out of the ``We SHALL prepare for the +Y2.038K problem'' war zone. +.LP +One beauty of the bintime format, compared to the timeval and +timespec formats is that it is a binary number, not a pseudo-decimal +number. If compilers and standards allowed, the representation +would have been ``int128_t'' or at least ``int96_t'', but since this +is currently not possible, we have to express the simple concept +of multiword addition in the C language which has no concept of a +``carry bit''. +.LP +To add two bintime values, the code therefore looks like this \**: +.FS +If the reader suspects the '>' is a typo, further study is suggested.
+.FE +.LP +.DS +.ps -1 +uint64_t u; + +u = bt1->frac; +bt3->frac = bt1->frac + bt2->frac; +bt3->sec = bt1->sec + bt2->sec; +if (u > bt3->frac) + bt3->sec += 1; +.ps +1 +.DE +.LP +An important property of the bintime format is that it can be +converted to and from timeval and timespec formats with simple +multiplication and shift operations as shown in these two +actual code fragments: +.DS +.ps -1 +void +bintime2timespec(struct bintime *bt, + struct timespec *ts) +{ + + ts->tv_sec = bt->sec; + ts->tv_nsec = + ((uint64_t)1000000000 * + (uint32_t)(bt->frac >> 32)) >> 32; +} +.ps +1 +.DE +.DS +.ps -1 +void +timespec2bintime(struct timespec *ts, + struct bintime *bt) +{ + + bt->sec = ts->tv_sec; + /* 18446744073 = + int(2^64 / 1000000000) */ + bt->frac = ts->tv_nsec * + (uint64_t)18446744073LL; +} +.ps +1 +.DE +.LP +.SH +How timecounters work +.PP +To produce a current timestamp the timecounter code +reads the hardware counter, subtracts a reference +count to find the number of steps the counter has +progressed since the reference timestamp. +This number of steps is multiplied with a factor +derived from the counter's frequency, taking into account +any corrections from the NTP PLL/FLL and this product +is added to the reference timestamp to get a timestamp. +.LP +This timestamp is on the ``uptime'' time scale, so if +UNIX/UTC time is requested, the estimated time of boot is +added to the timestamp and finally it is scaled to the +timeval or timespec if that is the desired format. +.LP +A fairly large number of functions are provided to produce +timestamps, depending on the desired timescale and output +format: +.TS +center; +l r r. +Desired uptime UTC/POSIX +Format timescale timescale +_ +bintime binuptime() bintime() +timespec nanouptime() nanotime() +timeval microuptime() microtime() +.TE +.LP +Some applications need to timestamp events, but are not +particularly picky about the precision. +In many cases a precision of tenths or hundredths of +a second is sufficient.
+.LP +A very typical case is UNIX file timestamps: +There is little point in spending computational resources getting an +exact nanosecond timestamp, when the data is written to +a mechanical device which has several milliseconds of unpredictable +delay before the operation is completed. +.LP +Therefore a complementary shadow family of timestamping functions +with the prefix ``get'' has been added. +.LP +These functions return the reference +timestamp from the current timehands structure without going to the +hardware to determine how much time has elapsed since then. +These timestamps are known to be correct to within the interval at which +the periodic update runs, which in practice means 1 to 10 milliseconds. +.SH +Timecounter math +.LP +The delta-count operation is straightforward subtraction, but we +need to logically AND the result with a bit-mask with the same number +(or fewer) bits as the counter implements, +to prevent higher order bits from getting set when the counter rolls over: +.DS +.ce +.EQ +Delta Count = (Count sub{now} - Count sub{ref}) ~ BITAND ~ mask +.EN +.DE +The scaling step is straightforward. +.DS +.ce +.EQ +T sub{now} = Delta Count cdot R sub{counter} + T sub{ref} +.EN +.DE +The scaling factor øR sub{counter}ø will be described below.
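The two formulas above can be sketched in C as follows. This is a minimal illustration and not the kernel's actual code: the names bintime_sketch, delta_count_sketch and scale_sketch are made up for this example, int64_t stands in for time_t, and the scaling factor is assumed to be handed in as a 64 bit binary fraction of a second per counter step.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bintime; int64_t replaces time_t. */
struct bintime_sketch {
    int64_t  sec;
    uint64_t frac;  /* binary fraction of a second, in units of 2^-64 s */
};

/* Delta Count = (Count_now - Count_ref) BITAND mask */
static uint32_t
delta_count_sketch(uint32_t now, uint32_t ref, uint32_t mask)
{
    /* Unsigned subtraction wraps; the mask drops the bits above the counter width. */
    return ((now - ref) & mask);
}

/* T_now = Delta Count * R_counter + T_ref */
static struct bintime_sketch
scale_sketch(struct bintime_sketch ref, uint32_t delta, uint64_t r_counter)
{
    uint64_t add, old;

    /* Assumption: the product fits in 64 bits, i.e. less than one second has elapsed. */
    assert(delta == 0 || r_counter <= UINT64_MAX / delta);
    add = r_counter * delta;
    old = ref.frac;
    ref.frac += add;
    if (old > ref.frac)     /* the fraction wrapped: carry into the seconds */
        ref.sec++;
    return (ref);
}
```

With a 16 bit counter (mask 0xFFFF), a reference count of 0xFFFB and a current count of 5, the masked difference is 10 steps even though the counter rolled over in between.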
+.LP +At regular intervals, scheduled by \fChardclock()\fP, a housekeeping +routine is run which does the following: +.LP +A timestamp with associated hardware counter reading is elevated +to be the new reference timecount: +.DS + +.ce +.EQ +Delta Count = (Count sub{now} - Count sub{ref}) ~ BITAND ~ mask +.EN + +.ce +.EQ +T sub{now} = Delta Count cdot R sub{counter} + T sub{ref} +.EN + +.ce +.EQ +Count sub{ref} = Count sub{now} +.EN + +.ce +.EQ +T sub{ref} = T sub{now} +.EN +.DE +.LP +If a new second has started, the NTP processing routines are called +and the correction they return and the counter's frequency are used +to calculate the new scaling factor øR sub{counter}ø: +.DS +.ce +.EQ +R sub{counter} = {2 sup{64} over Freq sub{counter}} cdot ( 1 + R sub{NTP} ) +.EN +.DE +Since we only have access to 64 bit arithmetic, dividing something +into ø2 sup{64}ø is a problem, so in the name of code clarity +and efficiency, we sacrifice the low order bit and instead calculate: +.DS +.ce +.EQ +R sub{counter} = 2 cdot {2 sup{63} over Freq sub{counter}} cdot ( 1 + R sub{NTP} ) +.EN +.DE +The øR sub{NTP}ø correction factor arrives as the signed number of +nanoseconds (with 32 bit binary fractions) to adjust per second. +This quasi-decimal number is a bit of a square peg in our round binary +hole, and a conversion factor is needed. +Ideally we want to multiply this factor by: +.DS +.ce +.EQ +2 sup {64} over {10 sup{9} cdot 2 sup{32}} = 4.294967296 +.EN +.DE +This is not a nice number to work with. +Fortunately, the precision of this correction is not critical, we are +within a factor of a million of the ø10 sup{-15}ø performance level +of state of the art atomic clocks, so we can use an approximation +on this term without anybody noticing. +.LP +Deciding which fraction to use as approximation needs to carefully +consider any possible overflows that could happen.
+In this case the correction may be as large as \(+- 5000 PPM which +leaves us room to multiply with about 850 in a multiply-before-divide +setting. +Unfortunately, there are no good fractions which multiply with less +than 850 and at the same time divide by a power of two, which is +desirable since it can be implemented as a binary shift instead of +an expensive full division. +.LP +A divide-before-multiply approximation necessarily results in a loss +of lower order bits, but in this case dividing by 512 and multiplying +by 2199 gives a good approximation where the lower order bit loss is +not a concern: +.DS +.ce +.EQ +2199 over 512 = 4.294921875 +.EN +.DE +The resulting error is a systematic under-compensation of 10.6PPM +of the requested change, or ø1.06 cdot 10 sup{-14}ø per nanosecond +of correction. +This is perfectly acceptable. +.LP +Putting it all together, including the one bit we put on the altar for the +Goddess of code clarity, the formula looks like this: +.DS +.ce +.EQ +R sub{counter} = 2 cdot {{2 sup{63} + 2199 cdot {R sub{NTP}} over 1024} over Freq sub{counter}} +.EN +.DE +Presented here in slightly unorthodox format to show the component arithmetic +operations as they are carried out in the code. +.SH +Frequency of the periodic update +.PP +The hardware counter should have a long enough +period, i.e. the number of distinct counter values divided by +frequency, to not roll over before our periodic update function +has had a chance to update the reference timestamp data. +.LP +The periodic update function is called from \fChardclock()\fP which +runs at a rate which is controlled by the kernel parameter +.I HZ . +.LP +By default HZ is 100 which means that only hardware with a period +longer than 10 msec is usable. +If HZ is configured higher than 1000, an internal divider is +activated to keep the timecounter periodic update running +no more often than 2000 times per second.
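The rollover constraint described above can be written down as two small helper functions. This is only a sketch under stated assumptions: the names max_counter_freq_sketch and min_update_hz_sketch are invented for this illustration and do not exist in the kernel, counters are assumed to be at most 32 bits wide, and no safety margin is applied.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Highest counter frequency (in Hz) at which a counter of the given width
 * cannot roll over between two periodic updates running hz times a second.
 */
static uint64_t
max_counter_freq_sketch(unsigned bits, unsigned hz)
{
    assert(bits > 0 && bits <= 32 && hz > 0);
    return (((uint64_t)1 << bits) * hz);
}

/*
 * Lowest periodic update rate (in Hz, rounded up) at which a counter of the
 * given width running at freq Hz cannot roll over between two updates.
 */
static unsigned
min_update_hz_sketch(unsigned bits, uint64_t freq)
{
    uint64_t values;

    assert(bits > 0 && bits <= 32 && freq > 0);
    values = (uint64_t)1 << bits;   /* number of distinct counter values */
    return ((unsigned)((freq + values - 1) / values));
}
```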
+.LP +Let us take an example: +At HZ=100 a 16 bit counter can run no faster than: +.DS +.ce +.EQ +2 sup{16} cdot {100 Hz} = 6.5536 MHz +.EN +.DE +Similarly, if the counter runs at 10MHz, the minimum HZ is +.DS +.ce +.EQ +{10 MHz} over {2 sup{16}} = 152.6 Hz +.EN +.DE +.LP +Some amount of margin is of course always advisable, +and a factor of two is considered prudent. +.LP +.SH +Locking, lack of ... +.PP +Provided our hardware can be read atomically, that our arithmetic +has enough bits to not roll over and that our clock frequency is +perfectly, or at least sufficiently, stable, we could avoid the +periodic update function, and consequently disregard the entire +issue of locking. +We are seldom that lucky in practice. +.LP +The straightforward way of dealing with meta data updates is to +put a lock of some kind on the data and grab hold of that before +doing anything. +This would however be a very heavy-handed approach. First of +all, the updates are infrequent compared to simple references; +second, it is not important which particular state of meta data +a consumer gets hold of, as long as it is consistent: as long +as the øCount sub{ref}ø and øT sub{ref}ø are a matching pair, +and not old enough to cause an ambiguity with hardware counter +rollover, a valid timestamp can be derived from them. +.LP +A pseudo-stable-storage with generation count method has been +chosen instead. +A ring of ten ``timehands'' data structures is used to hold the +state of the timecounter system; the periodic update function +updates the next structure with the new reference data and +scaling factor and makes it the current timehands. +.LP +The beauty of this arrangement lies in the fact that even though +a particular ``timehands'' data structure has been bumped from being +the ``current state'' by its successor, it still contains valid data +for some amount of time into the future.
+.LP +Therefore, a process which has started the timestamping process but +suffered an interrupt which resulted in the above periodic processing +can continue unaware of this afterwards and not suffer corruption +or miscalculation even though it holds no locks on the shared +meta-data. +.if t .PSPIC fig4.eps +.LP +This scheme has an inherent risk that a process may be de-scheduled for +so long that it will not manage to complete the timestamping +process before the entire ring of timehands has been recycled. +This case is covered by each timehands structure having a private generation number +which is temporarily set to zero during the periodic processing, to +mark inconsistent data, and incremented to one more than the +previous value when the update has finished and the timehands +is again consistent. +.LP +The timestamping code will grab a copy of this generation number and +compare this copy to the generation in the timehands after completion, +and if they differ it will restart the timestamping calculation. +.DS +.ps -1 +do { + th = timehands; + gen = th->th_generation; + /* calculate timestamp */ +} while (gen == 0 || + gen != th->th_generation); +.ps +1 +.DE +.LP +Each hardware device supporting timecounting is represented by a +small data structure called a timecounter, which documents the +frequency, the number of bits implemented by the counter and a method +function to read the counter. +.LP +Part of the state in the timehands structure is a pointer to the +relevant timecounter structure; this makes it possible to change +from one piece of hardware to another ``on the fly'' by updating +the current timehands pointer in a manner similar to the periodic +update function. +.LP +In practice this can be done with sysctl(8): +.DS +.ps -1 +sysctl kern.timecounter.hardware=TSC +.ps +1 +.DE +.LP +at any time while the system is running. +.SH +Suitable hardware +.PP +A closer look at ``suitable hardware'' is warranted +at this point.
+It is obvious from the above description that the ideal hardware +for timecounting is a wide binary counter running at a constant +high frequency +and atomically readable by all CPUs in the system with a fast +instruction(-sequence). +.LP +When looking at the hardware support on the PC platform, one +is somewhat tempted to sigh deeply and mutter ``so much for theory'', +because none of the above parameters seems to have been on the +drawing board together yet. +.LP +All IBM PC derivatives contain a device more or less compatible +with the venerable Intel i8254 chip. +This device contains 3 counters of 16 bits each, +one of which is wired so it can interrupt the CPU when the +programmable terminal count is reached. +.LP +The problem with this device is that it only has 8bit bus-width, +so reading a 16 bit timestamp takes 3 I/O operations: one to latch +the count in an internal register, and two to read the high and +low parts of that register respectively. +.LP +Obviously, on multi-CPU systems this cannot be done without some +kind of locking mechanism preventing the other CPUs from trying +to do the same thing at the same time. +.LP +Less obviously we find it is even worse than that: +Since a low priority kernel thread +might be reading a timestamp when an interrupt comes in, and since +the interrupt thread might also attempt to generate a timestamp, +we need to totally block interrupts out while doing those three +I/O instructions. +.LP +And just to make life even more complicated, FreeBSD uses the same +counter to provide the periodic interrupts which schedule the +\fChardclock()\fP routine, so in addition the code has to deal with the +fact that the counter does not count down from a power of two and +that an interrupt is generated right after the reloading of the +counter when it reaches zero. +.LP +Ohh, and did I mention that the interrupt rate for hardclock() will +be set to a higher frequency if profiling is active ? 
\** +.FS +I will not even mention the fact that it can be set also to ridiculously +high frequencies in order to be able to use the binary driven ``beep'' +speaker in the PC in a PCM fashion to output ``real sounds''. +.FE +.LP +It hopefully doesn't ever get more complicated than that, but it +shows, in its own bizarre and twisted way, just how little help the +timecounter code needs from the actual hardware. +.LP +The next kind of hardware support to materialise was the ``CPU clock +counter'' called ``TSC'' in official data-sheets. +This is basically an on-CPU counter, which counts at the rate +of the CPU clock. +.LP +Unfortunately, the electrical power needed to run a CPU is pretty +precisely proportional to the clock frequency for the +prevailing CMOS chip technology, so +the advent of computers powered by batteries prompted technologies +like APM, ACPI, SpeedStep and others which vary or throttle the +CPU clock to match computing demand in order to minimise the power +consumption \**. +.FS +This technology also found ways into stationary computers from +two different vectors. +The first vector was technical: Cheaper cooling solutions can be used +for the CPU if they are employed, resulting in cheaper commodity +hardware. +The second vector was political: For reasons beyond reason, energy +conservation became an issue with personal computers, despite the fact +that practically all North American households contain 4 to 5 household +items which through inefficient designs waste more power than a +personal computer uses. +.FE +.LP +Another wiggle for the TSC is that it is not usable on multi-CPU +systems because the counter is implemented inside the CPU and +not readable from other CPUs in the system. +.LP +The counters on different CPUs are not guaranteed +to run synchronously (i.e.: show the same count at the same time).
+For some architectures like the DEC/alpha architecture they do not even +run syntonously (i.e.: at the same rate) because the CPU clock frequency +is generated by a small SAW device on the chip which is very sensitive +to temperature changes. +.LP +The ACPI specification finally brings some light: +it postulates the existence of a 24 or 32 bit +counter running at a standardised constant frequency and +specifically notes that this is intended to be used for timekeeping. +.LP +The frequency chosen, 3.5795454... MHz\** +.FS +The reason for this odd-ball frequency has to be sought in the ghastly +colours offered by the original IBM PC Color Graphics Adapter: It +delivered NTSC format output and therefore introduced the NTSC colour +sync frequency into personal computers. +.FE + is not quite as high as one +could have wished for, but it is certainly a big improvement over +the i8254 hardware in terms of access path. +.LP +But trust it to Murphy's Law: The majority of implementations so far +have failed to provide latching suitable to avoid meta-stability +problems, and several readings from the counter are necessary to +get a reliable timestamp. +Unlike with the i8254 mentioned above, we do not need +any locking while doing so, since each individual read is atomic. +.LP +An initialization routine tries to test if the ACPI counter is properly +latched by examining the width of a histogram over read delta-values. +.LP +Other architectures are similarly equipped with means for timekeeping, +but generally more carefully thought out compared to the haphazard +developments of the IBM PC architecture. +.LP +One final important wiggle of all this is that it may not be possible +to determine which piece of hardware is best suited for clock +use until well into or even after the bootstrap process. +.LP +One example of this is the Loran-C receiver designed by Prof.
Dave Mills +[MILLS1992] +which is unsuitable as timecounter until the daemon program which +implements the software-half of the receiver has properly initialised +and locked onto a Loran-C signal. +.SH +Ideal timecounter hardware +.LP +As proof of concept, a sort of an existentialist protest against +the sorry state described above, the author undertook a project to +prove that it is possible to do better than that, since none of +the standard hardware offered a way to fully validate the timecounter +design. +.LP +Using a COTS product, ``HOT1'', from Virtual Computer Corporation +[VCC2002] containing an FPGA chip on a PCI form factor card, a 26 +bit timecounter running at 100MHz was successfully implemented. +.LP +.if t .PSPIC fig5.eps +.LP +.LP +In order to show that timestamping does not necessarily have to +be done using unpredictable and uncalibratable interrupts, an +array of latches was implemented as well, which allows up to 10 +external signals to latch the reading of the counter when +an external PPS signal transitions from logic high to logic +low or vice versa. +.LP +Using this setup, a standard 133 MHz Pentium based PC is able to +timestamp the PPS output of the Motorola UT+ GPS receiver with +a precision of \(+- 10 nanoseconds \(+- one count which in practice +averages out to roughly \(+- 15 nanoseconds\**: +.FS +The reason the plot does not show a very distinct 10 nanosecond +quantization is that the GPS receiver produces the PPS signal from +a clock with a roughly 55 nanosecond period and then predicts in +the serial data stream how many nanoseconds this will be offset +from the ideal time. +This plot shows the timestamps corrected for this ``negative +sawtooth correction''. +.FE +.LP +.if t .PSPIC gps.ps +.LP +It should be noted that the author is no hardware wizard and +a number of issues in the implementation result in less than +ideal noise performance.
+.LP +Now compare this ``ideal'' timecounter to the normal setup +where the PPS signal is used +to trigger an interrupt via the DCD pin on a serial port, and +the interrupt handler calls \fCnanotime()\fP to timestamp +the external event \**: +.FS +In both cases, the computer's clock frequency was controlled +with a Rubidium Frequency standard. +The average quality of crystals used for computers would +totally obscure the curves due to their temperature coefficient. +.FE +.LP +.if t .PSPIC intr.ps +.LP +It is painfully obvious that the interrupt latency is the +dominant noise factor in PPS timestamping in the second case. +The asymmetric distribution of the noise in the second plot +also more or less entirely invalidates the design assumption +in the NTP PLL/FLL kernel code that timestamps are dominated +by gaussian noise with few spikes. +.SH +Status and availability +.PP +The timecounter code has been developed and used in FreeBSD +for a number of years and has now reached maturity. +The source-code is located almost entirely in the kernel source file +kern_tc.c, with a few necessary adaptations in code which +interfaces to it, primarily the NTP PLL/FLL code. +.LP +The code runs on all FreeBSD platforms including i386, alpha, +PC98, sparc64, ia64 and s/390 and contains no wordsize or +endianness issues not specifically handled in the sourcecode. +.LP +The timecounter implementation is distributed under the ``BSD'' +open source license or the even more free ``Beer-ware'' license. +.LP +The ability to accurately model and compensate for +inaccuracies typical of atomic frequency standards does not +cater to the larger userbase, but this ability and the precision +of the code guarantee solid support for the widespread deployment +of NTP as a time synchronization protocol, without rounding +or accumulative errors.
+.LP +Adding support for new hardware and platforms has been +done several times by other developers without any input from the +author, so this particular aspect of the timecounter design +seems to work very well. +.SH +Future work +.PP +At this point in time, no specific plans exist for further +development of the timecounters code. +.LP +Various micro-optimizations, mostly to compensate for inadequate +compiler optimization, could be contemplated, but the author +resists these on the basis that they significantly decrease +the readability of the source code. +.SH +Acknowledgements +.PP +.EQ +delim ññ +.EN +The author would like to thank: +.LP +Bruce Evans +for his invaluable assistance +in taming the evil i8254 timecounter, as well as the enthusiastic +resistance he has provided throughout. +.PP +Professor Dave Mills of University of Delaware for his work on +NTP, for lending out the neglected twin Loran-C receiver and for +picking up the glove when timecounters made it clear +that the old ``microkernel'' NTP timekeeping code was not up to snuff +[MILLS2000B]. +.PP +Tom Van Baak for helping out, despite the best efforts of the National +Danish Post's center for Customs and Dues to prevent it. +.PP +Corby Dawson for helping with the care and feeding of caesium standards. +.PP +The staff at the NELS Loran-C control station in Bø, Norway for providing +information about step-changes. +.PP +The staff at NELS Loran-C station Eiðe, Faeroe +Islands for permission to tour their installation. +.PP +The FreeBSD users for putting up with ``micro uptime went backwards''. +.SH +References +.LP +[AG2002] +Published specifications for Agilent model 5071A Primary Frequency +Standard on +.br +http://www.agilent.com +.LP +[DMK2001] +"Accuracy Evaluation of a Cesium Fountain Primary Frequency Standard at NIST." +D. M. Meekhof, S. R. Jefferts, M. Stephanovic, and T. E. Parker +IEEE Transactions on instrumentation and measurement, VOL. 50, NO. 2, +APRIL 2001.
+.LP +[PHK2001] +"Monitoring Natural Gas Usage" +Poul-Henning Kamp +http://phk.freebsd.dk/Gasdims/ +.LP +[MILLS1992] +"A computer-controlled LORAN-C receiver for precision timekeeping." +Mills, D.L. +Electrical Engineering Department Report 92-3-1, University of Delaware, March 1992, 63 pp. +.LP +[MILLS2000A] +Levine, J., and D. Mills. "Using the Network Time Protocol to transmit International Atomic Time (TAI)". Proc. Precision Time and Time Interval (PTTI) Applications and Planning Meeting (Reston VA, November 2000), 431-439. +.LP +[MILLS2000B] +"The nanokernel." +Mills, D.L., and P.-H. Kamp. +Proc. Precision Time and Time Interval (PTTI) Applications and Planning Meeting (Reston VA, November 2000), 423-430. +.LP +[RGO2002] +For an introduction to Harrison and his clocks, see for +instance +.br +http://www.rog.nmm.ac.uk/museum/harrison/ +.br +or for +a more detailed and possibly better researched account: Dava +Sobel's excellent book, "Longitude: The True Story of a Lone +Genius Who Solved the Greatest Scientific Problem of His +Time" Penguin USA (Paper); ISBN: 0140258795. +.LP +[SAGE] +This ``gee-whiz'' kind of article in Dr. Dobb's Journal is a good place to +start: +.br +http://www.ddj.com/documents/s=1493/ddj0001hc/0085a.htm +.LP +[VCC2002] +Please consult Virtual Computer Corporation's homepage: +.br +http://www.vcc.com diff --git a/share/doc/papers/timecounter/tmac.usenix b/share/doc/papers/timecounter/tmac.usenix new file mode 100644 index 0000000..c0c13c2 --- /dev/null +++ b/share/doc/papers/timecounter/tmac.usenix @@ -0,0 +1,953 @@ +.\" $FreeBSD$ +.ds CC " +.nr PS 10 +.nr FU 0.0i \" priniter prints this much too low +.nr VS 11 +.ds Q `\h'-0.02i'` +.ds U '\h'-0.02i'' +.ds `` `\h'-0.02i'` +.ds '' '\h'-0.02i'' +.\" footnote stuff +.nr * 0 1 +.ds [. \|[ +.ds .] ] +.if t .ds [, \s-2\v'-.4m'\f2 +.if n .ds [, [ +.if t .ds ,] \v'.4m'\s+2\fP +.if n .ds ,] ] +.ds * \*([,\\n+*\*(,] +.ds [o `` +.ds [c '' +.ev 1 +.ps \n(PS +.vs \n(VS +.ev +.de pp +.PP +..
+.de PP +.LP +.if t .ti 0.3i +.if n .ti 5 +.. +.de LP +.if t .sp 0.3 +.if n .sp +.ne 1 +.in 0 +.nr Ia 0 +.nr Ic 0 +.fi +.. +.de IP +.if t .sp 0.3 +.if n .sp +.\" Ia = total indent for this guy +.\" Ib = .ti value for this guy +.\" Ic = auxiliary indent +.nr Ib 0.0i +.if \\n(Ia=0 .nr Ia 0.2i +.if !\\$1 \{\ +. nr Ia \w\\$1\ \ u +. nr Ib \\n(Ia +.\} +.if !\\$2 .nr Ia \\$2n +.in \\n(Iau +.in +\\n(Icu +.ti -\\n(Ibu +.if !\\$1 \{\ +\&\\$1\ \ \c +.\} +.. +.de QP +.IP +.. +.de RS +.nr Ic +0.2i +.. +.de RE +.nr Ic -0.2i +.. +.de PN +.rs +'sp |10.4i-\\n(FUu +.rs +'sp |10.4i-\\n(FUu \" how many traps could there be? +.rs +'sp |10.4i-\\n(FUu +.PO +'ie e \{\ +.ev 2 +.\".if t 'tl \s10\f3%\\*(CC\fP\s0 +.ev +'\} +'el \{\ +.ev 2 +.\".if t 'tl \s10\f3\\*(CC%\fP\s0 +.ev +'\} +.po +.wh 0 hh +'bp +.. +.de ff +.nr dn 0 +.if \\nx \{\ +. ev 1 +. vs \\n(VVu +. mk QR +' nr QS 11i+0.5v-1u+\\nyu +' if \\n(QS>\\n(QR 'if t 'sp |\\n(QSu +. nf +. FN \" print the footnotes +. vs +. rm FN +. if \\n(.zfy .br\" end overflow diversion +. if \\n(.zfy .di\" end overflow diversion +. nr x 0 1 +. ev +.\} +.nr N +1 +.if \n(dn .fz \" leftover footnote +.ie \\nN<\\nC \{\ +' if t 'sp |\\nTu +' ns +' po +3.12i \" postition of 2nd column +.\} +.el \{\ +. rF +. PN +. PO +. nr N 0 +.\} +.nr y 0-\\nb +.nr QQ 11i-\\nb +.ch fx +.ch ff +.if t .wh \\n(QQu ff +.if n .wh 66 ff +.wh 12i fx +.ch fx \\n(QQu +.if \\n(dn .fz +.. +.de fz \" get leftover footnote +.FS \& +.nf +.fy +.FE +.. +.de fx \" footnote overflow processing +.if \\nx .di fy +.. +.de FS \" start a footnote +.if \\n(.t<=1.7v .ne 2 +.da FN +.nr YY \\n(.lu +.ev 1 +.if t .ll \\n(YYu +.if n .ll 70 +.if \\n+x=1 .fs +.fi +.ie \\$1 \ \ \*([,\\n*\*(,]\c +.el \ \ \*([,\\$1\*(,]\c +.ps -1 +.vs -1 +.nr VV \\n(.v +.. +.de FE +.br +.ps +1 +.vs +1 +.ev +.da +.nr y -\\n(dn +.nr QR 11i-1v-1u+\\nyu \" y is negative +.ie \\n(nlu+1v<\\n(QRu .ch ff \\n(QRu +.el .ch ff \\n(nlu+1v +.. +.de fs +.br +.vs \\n(VS +\v'-0.4v'\s16\D'l 1.5i 0'\s0 +.sp -0.4v +.vs +.. 
+.de PO +.if t \{\ +.ie e .po 1.20i +.el .po 1.20i +.\} +.if n .po 0 +.. +.de NC +'PO +.if t 'll \\n(LLu +.if n 'll 78 +'nr N 0 +.. +.de 2C +.br +.nr LL 2.85i +'NC +'nr C 2 +'mk T +'ns +.. +.de 1C +.br +.if t .nr LL 6.5i +.if n .nr LL 78 +.NC +'nr C 1 +'mk T +'ns +.. +.de rF \" reset footer to nominal +.nr b 1.0i+\\n(FUu \" nominal footer place +.. +.rF +'nr x 0 1 \" init: +.nr y 0-\nb +.pl 11i +.nr QQ 11i+\ny +.wh \n(QQu ff +.wh 12i fx +.ch fx \n(QQu +.de hh +'rs +'if t 'sp |0.5i-\\n(FUu +.PO +'ie e \{\ +.ev 2 +'if t 'tl \s10\f3\\*(T2\\*(A2\fP\s0 +.ev +'\} +'el \{\ +.ev 2 +'if t 'tl \s10\f3\\*(A2\\*(T2\fP\s0 +.ev +'\} +'if t 'sp |1i-\\n(FUu +'mk T +'ns +'nr x 0 1 \" number of footnotes +.nr y 0-\\nb +.nr QQ 11i+\\ny +.ch ff +.wh \\n(QQu ff +.ch fx +.wh 12i fx +.ch fx \\n(QQu +.. +.\"------------------- +.de TI +.nh +.rs +.in 0i +.nr % \\$1 +.fi +.nr QS \\n(.lu +.ll 100i +.ps 14 +.vs 17 +.ft 3 +.ds TT \\ +.. +.de AA +.nr DL \w\\*(TT +.nr NN 1 +.nr NL \\n(QSu-1i \" a nice line length for title +.if \\n(NLu*\\n(NNu<\\n(DLu .nr NN +1 +.if \\n(NLu*\\n(NNu<\\n(DLu .nr NN +1 +.if \\n(NLu*\\n(NNu<\\n(DLu .nr NN +1 +.if \\n(NLu*\\n(NNu<\\n(DLu .nr NN +1 +.if \\n(NLu*\\n(NNu<\\n(DLu .nr NN +1 +.nr QR (\\n(DLu/\\n(NNu)+0.75i \" +.75 cuz words don't always balance +.ll \\n(QRu +.di TU +.ad l +\\*(TT +.br +.di +.sp |1.0i-\\n(FUu +.nr NP 0 +.if \\n(QSu>\\n(QRu .nr NP (\\n(QSu-\\n(QRu)/2u +.po +\\n(NPu +.ce 999 +.TU +.ce 0 +.po +.ll \\n(QSu +.sp 0.1i +.ft 1 +.ps 12 +.vs 14 +.sp 0.5 +.. +.de A \" .A "Brian Author" "Affiliation" +.in 0 +.ie !\\$2 \{\ +.ce +\f1\\$1 +.ce +\f2\\$2 +.\} +.el \{\ +.ce +\f1\\$1\f2 +.\} +.. +.de AB +.sp 0.20i +.po +0.5i +.ll -1.125i +.ce +\f3\s12ABSTRACT\s0\f1 +.sp 0.5 +.ps \\n(PS +.vs \\n(VS +.ad b +.fi +.. +.de EA +.sp +.if t .2C +.if n .1C +.hy 14 +.. +.de AE +.EA +.. +.de SH +.br +.in 0 +.di St +.ft 3 +.it 1 S2 +.. +.de SH +.NH "\\$1" "\\$2" "\\$3" +.. +.de S2 +.br +.di +.sp 0.75 +.ne 3 +.ce +.St +.br +.ft 1 +.sp 0.5 +.ns +.. 
+.de NH +.br +.ne 2 +.in 0 +.nr Ia 0 +.nr Ic 0 +.fi +.nr L 1 +.if !\\$1 .nr L \\$1\" level +.if \\nL1 .ft 3 +.if \\nL2 .ft 3 +.if \\nL3 .ft 2 +.di Nt +.in 0.3i +.ti 0 +.it 1 N2 +.. +.de N2 +.br +.in 0 +.di +.if t .if \\nL1 .sp 0.75 +.if t .if \\nL2 .sp 0.25 +.if t .if \\nL3 .sp 0.25 +.if t .if \\nL4 .sp 0.25 +.if n .sp +.ne 3 +.if \\nL1 .ce +.Nt +.br +.ft 1 +.if t .if \\nL1 .sp 0.50 +.if t .if \\nL2 .sp 0.25 +.if t .if \\nL3 .sp 0.25 +.if t .if \\nL4 .sp 0.25 +.if n .sp +.ns +.. +.de XP +.sp 0.5 +.ne 2 +.in \w[3]\ \ u +.ti 0 +.ns +.. +.de I +.nr PQ \\n(.f +.ft 2 +.if !"\\$1"" \&\\$1\\f\\n(PQ\\$2 +.. +.de R +.ft 1 +.. +.de B +.nr PQ \\n(.f +.ft 3 +.if !\\$1 \&\\$1\\f\\n(PQ\\$2 +.. +.de T +.nr PQ \\n(.f +.if !\\$1 \&\\$3\f(CW\\$1\\f\\n(PQ\\$2 +.. +.de Ds +'sp 0.4 +'nr DY \\n(.i +'in 0.1i +.if !\\$1 .in \\$1 +.ft CW +.nf +.. +.de DS +.br +.Ds \\$1 +.. +.de DE +.br +.De +.. +.de De +'sp 0.4 +.in \\n(DYu +.ft 1 +.fi +.. +.de np +.br +.in \w\(bu\ \ u +.ti -\w\(bu\ \ u +\(bu\ \ \c +.. +.de lp +.br +.in 0 +.. +.de TS +.br +.ul 0 +.sp 0.5 +.. +.de TE +.sp 0.5 +.. +.de RT +.ft 1 +.ce 0 +.ul 0 +.if t 'll \\n(LLu +.if n 'll \\n(LL +.ps \\n(PS +.vs \\n(VS +.in 0 +.\"bd 1 +.ta 5n 10n 15n 20n 25n 30n 35n 40n 45n 50n 55n 60n 65n 70n 75n 80n +.fi +.. +.de KF +'sp 0.4 +.ev 2 +.nr Zs \\n(.s +.nr Zv \\n(.v +.ll \\n(LLu +.in 0 +.. +.de KE +.br +.ps \\n(Zs +.vs \\n(Zvu +.ev +'sp 0.4 +.. +.de UX +\\$3\s-2UNIX\s0\\$1\\$2 +.. +.de SM +.ps -2 +.. +. \" LG - larger +.de LG +.ps +2 +.. +.de EB +.nr QQ 11i-\\nb-\\$1 +.nr b +\\n(QQu +.nr y 0+\\nyu-\\n(QQu +.nr QQ 11i+\\ny +.ch ff +.wh \\n(QQu ff +.ch fx +.wh 12i fx +.ch fx \\n(QQu +.. +.\"============================================== +.de Zz +.if \\nN=1 'ch Zz +'sp 11i +.. +.de Z +.br +.mk Qz +.ev 2 +.nr Qy \\n(.l +.ll 6.5i +.di J +.in 0 +.ft 1 +.. +.de ZZ +.br +.if !\\$1 \{\ +. if !\\$2 .ll \\$2 +. sp 0.4 +. ce +. ft 1 +\\$1 +. ft +. 
if !\\$2 .ll +.\} +.di +.ev +.nr QQ \\n(.t-\\n(dn-10u +.if \\n(QQ<0 .tm oops -- called Z too late on page \\n%! +.if \\n(QQ<0 .ex +.sp \\n(QQu +.mk Q2 +.ev 2 +.in 0 +.nf +.J +.fi +.rm J +.ll \\n(.lu +.ev +.sp |\\n(Qzu +.nr QQ \\n(Q2-0.8v +.EB \\n(QQu +.. +.\"====================================================== +.de KS +.\".tm KS: Not implemented yet +.. +.de KE +.\".tm KE: Not implemented yet +.. +.de KF +.\".tm KF: Not implemented yet +.. +.ds ' \h'\w'e'u*4/10'\z\(aa\h'-\w'e'u*4/10' +.de BE +.br +.. +.lt 6.5i +.de T1 +.ds T2 \\$1 +.. +.de A1 +.ds A2 \\$1 +.. +.nr P1 1.1i \" picture width +.nr P2 14u*\n(P1u/10u \" picture depth +.de BB +.in 0 +.\".nr QQ \\n(P2+0.1i +.\".ne \\n(QQu +.\".rs +.\".ll -\\n(P1u +.\".ll -0.1i +.\".po +\\n(.lu+0.1i +.\".sp 0.3 +.\" +.\".sp -0.8 +.\"\!H\\n(.o +.\".mk QQ +.\"\!V\\n(QQ +.\"\!DZ \\n(P1 \\n(P2 +.\".ie \\$1 .tm Picture not yet inserted for .BB +.\".el \!P \\$1 +.\".sp -0.3 +.\".po +.\".sp -1 +.\".if \\$1 \{\ +.\"\h'0.1i'\h'\\n(.lu'\D'l \\n(P1u 0'\D'l 0 \\n(P2u'\D'l -\\n(P1u 0'\D'l 0 -\\n(P2u' +.\".sp -1 +.\".\} +.\".sp 0.8 +.\".mk QQ +.\".nr QQ +\\n(P2u +.\".wh \\n(QQu Bb +.\"===== +.\" ::: .sp 1 +.\" ::: .ne 2 +.if \\n(SB=0 \{\ +.NH 1 +Author Information +.\} +.nr SB 1 +.PP +.. +.de Bb +'ch Bb +'ll +\\n(P1u +'ll +0.1i +.. +.de GS +.br +.. +.de GE +.. +.nr SL 0.3 +.nr LI 0.28i +.de BL \" begin list +.br +.sp \\n(SL +.in +\\n(LIu +.ll -0.1i +.if \\n(Ld \{\ +. ds Z\\n(Ld \\*(LT +. af LN 1 +. nr N\\n(Ld \\n(LN +. ds C\\n(Ld \\*(LC +.\} +.nr Ld +1 +.ds LT \\$1\" LT is the List Type: 1, a, or a bulletchar +.if \\$1 .if '\\n(Ld'1'.ds LT \(bu +.if \\$1 .if '\\n(Ld'2'.ds LT \(ci +.if \\$1 .if '\\n(Ld'3'.ds LT \(sq +.if '\\*(LT'1' .af LN \\$1 +.if '\\*(LT'i' .af LN \\$1 +.if '\\*(LT'I' .af LN \\$1 +.if '\\*(LT'a' .af LN \\$1 +.if '\\*(LT'A' .af LN \\$1 +.nr LN 0 \" LN is the list element number +.ds LC\\$2 +.\" LC is the optional bullet trailer... +.. +.de LE \" list element +.br +.ie '\\$1'' .nr LN +1 +.el \{\ +. nr LN 0 +. 
nr LN \\$1 +.\} +.ds LX \\*(LT\\*(LC +.if \\*(LT1 .ds LX \\n(LN\\*(LC +.if \\*(LTa .ds LX \\n(LN\\*(LC +.if \\*(LTA .ds LX \\n(LN\\*(LC +.if \\*(LTi .ds LX \\n(LN\\*(LC +.if \\*(LTI .ds LX \\n(LN\\*(LC +.if \\n(LN=0 \{\ +. if !'\\$1'' .ds LX \\$1\\*(LC +.\} +.nr QQ 3u*\w' 'u/2u +.ti -\\w'\\*(LX\h'\\n(QQu''u +\\*(LX\h'\\n(QQu'\c +.. +.de EL \" end list +.br +.nr Ld -1 +.if \\n(Ld>=0 \{\ +. ds LT \\*(Z\\n(Ld +. nr LN \\n(N\\n(Ld +. ds LC \\*(C\\n(Ld +.if '\\*(LT'1' .af LN \\*(LT +.if '\\*(LT'i' .af LN \\*(LT +.if '\\*(LT'I' .af LN \\*(LT +.if '\\*(LT'a' .af LN \\*(LT +.if '\\*(LT'A' .af LN \\*(LT +. \} +.in -\\n(LIu +.ll +0.1i +.. +.de F1 +.in 0 +\v'-0.4'\D'l \\n(.lu 0' +.sp -0.7 +.in +.. +.de F2 +.mk QQ +.if !'\\nT'\\n(QQ' \{\ +.in 0 +\v'-0.4'\D'l \\n(.lu 0' +.sp -0.4 +.in +.\} +.. +.de EM +.br +.if o \{\ +.ds A2 +.ds T2 +.rs +.bp +.ch ff +.ch fx +.PO +.rs +.sp |10.4i-\\n(FUu +.mk QQ +'ie e \{\ +. ev 2 +.if t 'tl \s10\f3%\\*(CC\fP\s0 +. ev +' \} +'el \{\ +. ev 2 +.if t 'tl \s10\f3\\*(CC%\fP\s0 +. ev +' \} +.\} +.. +.de RF +.sp 0.1 +.in 0.3i +.ie !\\$1 \{\ +.nr QQ \w'\\$1\ ' +.ti -\\n(QQu +\\$1\ \c +.\} +.el .ti 0 +.. +.de RZ +.sp 0.1 +.in 0.3i +.nr QQ \w'\\$1\ ' +.ti -\\n(QQu +\\$1\ \c +.. +.de zz +.tm note: .zz is not implemented. +.ex +.nr Z1 \\$1 +.nr Z2 \\$2 +.if \\n(.t<\\n(Z2 .tm note that figure ``\\$3'' does not fit at column bottom ------------------------ on page \\n% +.ie '\\n(.z'' \{\ +.sp 0.2 +.ne \\n(Z2u +\\!H\\n(.o +.mk QQ +.nr QQ +0.25v +\\!V\\n(QQ +\\!DZ \\n(Z1 \\n(Z2 +\\!P \\$3 +.rs +.sp \\n(Z2u +.sp 0.2 +.\} +.el \{\ +.sp 0.2 +\\!.z3 \\n(Z1 \\n(Z2 "\\$3" \\n(.o +.sp \\n(Z2u +.sp 0.2 +.\} +.. +.de z2 +.nr Z1 \\$1 +.nr Z2 \\$2 +.sp 0.2 +.ne \\n(Z2u +.nr QQ (\\n(.lu-\\$1)/2u +.sp \\n(Z2u +.vs 0 +.po +\\n(QQu +\X'ps: import \\$3 0 0 1 1 \\n(Z1 \\n(Z2' +.br +.po -\\n(QQu +.vs +.rs +.sp 0.2 +.. +.de sz +.vs \\$1 +.ps \\$1 +.. +.de M +\f2\\$1\f1\|(\\$2)\\$3 +.. +.de B1 +.br +.mk Bz +.. 
+.de B2 +.br +.mk By +.nr D \\n(Byu-\\n(Bzu +.nr L \\n(.lu+0.2i-\\n(.iu +\h'-0.1i'\v'-0.7v'\D'l \\nLu 0'\D'l 0 -\\nDu'\D'l -\\nLu 0'\D'l 0 \\nDu' +.sp -1 +.. +.de [] +.][ \\$1 +.. +.de ][ +.if \\$1>5 .tm Bad arg to [] +.[\\$1 +.. +.de [5 \" tm style +.FS +\\*([A, \\f2\\*([T\\f1, +.ie \\n(TN \\*([M. +.el Bell Laboratories internal memorandum (\\*([D). +.RT +.FE +.. +.de [0 \" other +.FS +.nr [: 0 +.if !\\*([F .FP \\*([F +.if !\\*([Q \{\ +.nr [: 1 +\\*([Q\c +.\} +.if !\\*([A \{\ +.nr [: 1 +\\*([A\c +.\} +.if !\\*([T \{\ +.if \\n([:>0 , +.nr [: 1 +\f2\\*([T\f1\c +.\} +.if !\\*([S , \\*([S\c +.if !\\*([V , \\*([V\c +.if !\\*([P \{\ +.ie \\n([P>0 , pp. \\*([P\c +.el , p. \\*([P\c +.\} +.if !\\*([C , \\*([C\c +.if !\\*([D , \\*([D\c +.if \\n([:>0 \&. +.if !\\*([O \\*([O +.FE +.. +.de [1 +.FS +.if !\\*([F .FP \\*([F +.if !\\*([Q \\*([Q, +.if !\\*([A \\*([A, +.if !\\*([T \\*([o\\*([T,\\*([c +\f2\\*([J\f1\c +.if !\\*([V , vol. \\*([V\c +.if !\\*([N , no. \\*([N\c +.if !\\*([P \{\ +.ie \\n([P>0 , pp. \\*([P\c +.el , p. \\*([P\c +.\} +.if !\\*([I , \\*([I\c +.if !\\*([C , \\*([C\c +.if !\\*([D , \\*([D\c +\&. +.if !\\*([O \\*([O +.FE +.. +.de [2 \" book +.FS +.if !\\*([F .FP \\*([F +.if !\\*([Q \\*([Q, +.if !\\*([A \\*([A, +.if !\\*([T \f2\\*([T,\f1 +.if !\\*([S \\*([S, +.if !\\*([V \\*([V, +.if !\\*([P \{\ +.ie \\n([P>0 pp. \\*([P, +.el p. \\*([P, +.\} +\\*([I\c +.if !\\*([C , \\*([C\c +.if !\\*([D , \\*([D\c +\&. +.if !\\*([O \\*([O +.FE +.. +.de [4 \" report +.FS +.if !\\*([F .FP \\*([F +.if !\\*([Q \\*([Q, +.if !\\*([A \\*([A, +.if !\\*([T \\*([o\\*([T,\\*([c +.if !\\*([R \\*([R\c +.if !\\*([G \& (\\*([G)\c +.if !\\*([P \{\ +.ie \\n([P>0 , pp. \\*([P\c +.el , p. \\*([P\c +.\} +.if !\\*([I , \\*([I\c +.if !\\*([C , \\*([C\c +.if !\\*([D , \\*([D\c +\&. +.if !\\*([O \\*([O +.FE +.. +.de [3 \" article in book +.FS +.if !\\*([F .FP \\*([F +.if !\\*([Q \\*([Q, +.if !\\*([A \\*([A, +.if !\\*([T \\*([o\\*([T,\\*([c +in \f2\\*([B\f1\c +.if !\\*([E , ed. 
\\*([E\c +.if !\\*([S , \\*([S\c +.if !\\*([V , vol. \\*([V\c +.if !\\*([P \{\ +.ie \\n([P>0 , pp. \\*([P\c +.el , p. \\*([P\c +.\} +.if !\\*([I , \\*([I\c +.if !\\*([C , \\*([C\c +.if !\\*([D , \\*([D\c +\&. +.if !\\*([O \\*([O +.FE +.. +.de [< +.]> +.. +.de ]< +.SH +References +.LP +.de FP +.\".IP \\\\$1. +.RZ \\\\$1. +\\.. +.rm FS FE +.. +.de [> +.]> +.. +.de ]> +.sp +.. +.de [- +.]- +.. +.de ]- +.rm [Q [A [T [J [B [E [S [V +.rm [N [P [I [C [D [O [R [G +.. +.de FG +.ds QQ \fB\\$1\\fP: \\$2 +.ie \w\\*(QQ>\\n(.l \{\ +.in +0.25i +.ti 0 +\\*(QQ +.in 0 +.\} +.el \{\ +.ce +\\*(QQ +.\} +.. +.1C diff --git a/share/doc/psd/01.cacm/Makefile b/share/doc/psd/01.cacm/Makefile new file mode 100644 index 0000000..14a2f70 --- /dev/null +++ b/share/doc/psd/01.cacm/Makefile @@ -0,0 +1,16 @@ +# @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= psd/01.cacm +SRCS= stubs p.mac p1 p2 p3 p4 p5 p6 +EXTRA= ref.bib +MACROS= -ms +USE_REFER= +USE_TBL= +CLEANFILES= stubs + +stubs: + @(echo .R1; echo database ${.CURDIR}/ref.bib; \ + echo accumulate; echo .R2) > ${.TARGET} + +.include <bsd.doc.mk> diff --git a/share/doc/psd/01.cacm/p.mac b/share/doc/psd/01.cacm/p.mac new file mode 100644 index 0000000..5450b5d --- /dev/null +++ b/share/doc/psd/01.cacm/p.mac @@ -0,0 +1,31 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)p.mac 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.de P1 +.DS +.. +.de P2 +.DE +.. +.de UL +.lg 0 +.if n .ul +\%\&\\$3\f3\\$1\fR\&\\$2 +.lg +.. +.de UC +\&\\$3\s-1\\$1\\s0\&\\$2 +.. +.de IT +.lg 0 +.if n .ul +\%\&\\$3\f2\\$1\fR\&\\$2 +.lg +.. +.de SP +.sp \\$1 +.. 
diff --git a/share/doc/psd/01.cacm/p1 b/share/doc/psd/01.cacm/p1 new file mode 100644 index 0000000..88987e0 --- /dev/null +++ b/share/doc/psd/01.cacm/p1 @@ -0,0 +1,567 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)p1 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.OH 'The UNIX Time-Sharing System''PSD:1-%' +.EH 'PSD:1-%''The UNIX Time-Sharing System' +.ds n \s+2 +.hw above-mentioned +.ds s \s-2 +.ds m \v'-.3'.\v'.3' +.TL +The UNIX +Time-Sharing System\f1\s10\v'-.2n'*\v'.2n'\s0\fP +.AU +D. M. Ritchie and K. Thompson +.AB +.FS +* Copyright 1974, +Association for Computing Machinery, Inc., +reprinted by permission. +This is a revised version of an article +that appeared in Communications of the \*sACM\*n, +.IT 17 , +No. 7 (July 1974), pp. 365-375. +That article was a +revised version of a paper presented +at the Fourth \*sACM\*n Symposium on Operating +Systems Principles, +\*sIBM\*n Thomas J. Watson Research Center, +Yorktown Heights, +New York, +October 15-17, 1973. +.FE +.UX +is a general-purpose, multi-user, interactive +operating system for the larger Digital Equipment Corporation +\*sPDP\*n-11 and +the Interdata 8/32 computers. +It offers a number of features +seldom found even in larger operating +systems, including +.IP i +A hierarchical file system incorporating +demountable volumes, +.IP ii +Compatible file, device, and inter-process I/O, +.IP iii +The ability to initiate asynchronous processes, +.IP iv +System command language selectable on a per-user basis, +.IP v +Over 100 subsystems including a dozen languages, +.IP vi +High degree of portability. +.LP +This paper discusses the nature +and implementation of the file system +and of the user command interface. +.AE +.NH +INTRODUCTION +.PP +There have been four versions of +the +.UX +time-sharing system. 
+.hy 12 +The earliest (circa 1969-70) ran on +the Digital Equipment Corporation \*sPDP\*n-7 and -9 computers. +The second version ran on the unprotected +\*sPDP\*n-11/20 computer. +The third incorporated multiprogramming and ran +on the \*sPDP\*n-11/34, /40, /45, /60, and /70 computers; +it is the one described in the previously published version +of this paper, and is also the most widely used today. +.hy 14 +This paper describes only the +fourth, current +system that runs on the \*sPDP\*n-11/70 and the +Interdata 8/32 computers. +In fact, the differences among the various systems is +rather small; +most of the revisions made to the originally published version of this +paper, +aside from those concerned with style, +had to do with details of the implementation of the file system. +.PP +Since +\*sPDP\*n-11 +.UX +became operational +in February, 1971, +over 600 installations have been put into service. +Most of them are engaged in applications such as +computer science education, +the preparation and formatting of documents +and other textual material, +the collection and processing of trouble data +from various switching machines within the Bell System, +and recording and checking telephone service +orders. +Our own installation is used mainly for research +in operating systems, languages, +computer networks, +and other topics in computer science, and also for +document preparation. +.PP +Perhaps the most important achievement of +.UX +is to demonstrate +that +a powerful operating system for interactive use +need not be expensive either in equipment or in human +effort: +it +can run on hardware costing as little as $40,000, and +less than two man-years were spent on the main system +software. +We hope, however, that users find +that the +most important characteristics of the system +are its simplicity, elegance, and ease of use. 
+.PP +Besides the operating system proper, some major programs +available under +.UX +are +.DS +.nf +C compiler +Text editor based on \*sQED\*n +.[ +qed lampson +.] +Assembler, linking loader, symbolic debugger +Phototypesetting and equation setting programs +.[ +cherry kernighan typesetting mathematics cacm +.] +.[ +kernighan lesk ossanna document preparation bstj +%Q This issue +.] +.fi +.in +3n +.ll -5n +.ti -3n +Dozens of languages including +Fortran 77, Basic, Snobol, \*sAPL\*n, Algol 68, M6, \*sTMG\*n, Pascal +.in +.ll +.DE +There is a host of maintenance, utility, recreation and novelty programs, +all written locally. +The +.UX +user community, which numbers in the thousands, +has contributed many more programs and languages. +It is worth noting that the system is totally self-supporting. +All +.UX +software is maintained on +the +system; +likewise, this paper and all other +documents +in this issue +were generated and formatted by the +.UX +editor and text formatting +programs. +.SH +II. HARDWARE AND SOFTWARE ENVIRONMENT +.PP +The \*sPDP\*n-11/70 on which the Research +.UX +system is installed is a 16-bit +word (8-bit byte) computer with 768K bytes of core memory; +the system kernel +occupies 90K bytes +about equally divided between code +and data tables. +This system, however, includes a very large number of +device drivers +and enjoys a generous allotment +of space for I/O buffers and system tables; +a minimal system capable of running the software +mentioned above can +require as little as 96K bytes +of core altogether. +There are even larger installations; +see the description of the +\*sPWB/UNIX\*n systems, +.[ +dolotta mashey workbench software engineering +.] +.[ +dolotta haight mashey workbench bstj +%Q This issue +.] +for example. +There are also much smaller, though somewhat restricted, +versions of the system. +.[ +lycklama microprocessor bstj +%Q This issue +.] 
+.PP +Our own \*sPDP\*n-11 has two +200-Mb moving-head disks +for file system storage and swapping. +There are 20 variable-speed +communications interfaces +attached to 300- and 1200-baud data sets, +and an additional 12 communication lines +hard-wired to 9600-baud terminals and +satellite computers. +There are also several 2400- and 4800-baud +synchronous communication interfaces +used for machine-to-machine file transfer. +Finally, there is a variety +of miscellaneous +devices including +nine-track magnetic tape, +a line printer, +a voice synthesizer, +a phototypesetter, +a digital switching network, +and a chess machine. +.PP +The preponderance of +.UX +software is written in the +abovementioned C language. +.[ +c programming language kernighan ritchie prentice-hall +.] +Early versions of the operating system were written in assembly language, +but during the summer of 1973, it was rewritten in C. +The size of the new system was about one-third greater +than that of the old. +Since the new system not only became much easier to +understand and to modify but also +included +many functional improvements, +including multiprogramming and the ability to +share reentrant code among several user programs, +we consider this increase in size quite acceptable. +.SH +III. THE FILE SYSTEM +.PP +The most important role of +the system +is to provide +a file system. +From the point of view of the user, there +are three kinds of files: ordinary disk files, +directories, and special files. +.SH +3.1 Ordinary files +.PP +A file +contains whatever information the user places on it, +for example, symbolic or binary +(object) programs. +No particular structuring is expected by the system. +A file of text consists simply of a string +of characters, with lines demarcated by the newline character. +Binary programs are sequences of words as +they will appear in core memory when the program +starts executing. 
+A few user programs manipulate files with more +structure; +for example, the assembler generates, and the loader +expects, an object file in a particular format. +However, +the structure of files is controlled by +the programs that use them, not by the system. +.SH +3.2 Directories +.PP +Directories provide +the mapping between the names of files +and the files themselves, and thus +induce a structure on the file system as a whole. +Each user has a directory of his own files; +he may also create subdirectories to contain +groups of files conveniently treated together. +A directory behaves exactly like an ordinary file except that it +cannot be written on by unprivileged programs, so that the system +controls the contents of directories. +However, anyone with +appropriate permission may read a directory just like any other file. +.PP +The system maintains several directories +for its own use. +One of these is the +.UL root +directory. +All files in the system can be found by tracing +a path through a chain of directories +until the desired file is reached. +The starting point for such searches is often the +.UL root . +Other system directories contain all the programs provided +for general use; that is, all the +.IT commands . +As will be seen, however, it is by no means necessary +that a program reside in one of these directories for it +to be executed. +.PP +Files are named by sequences of 14 or +fewer characters. +When the name of a file is specified to the +system, it may be in the form of a +.IT path +.IT name , +which +is a sequence of directory names separated by slashes, ``/\^'', +and ending in a file name. +If the sequence begins with a slash, the search begins in the +root directory. +The name +.UL /alpha/beta/gamma +causes the system to search +the root for directory +.UL alpha , +then to search +.UL alpha +for +.UL beta , +finally to find +.UL gamma +in +.UL beta . +.UL \&gamma +may be an ordinary file, a directory, or a special +file. 
+As a limiting case, the name ``/\^'' refers to the root itself. +.PP +A path name not starting with ``/\^'' causes the system to begin the +search in the user's current directory. +Thus, the name +.UL alpha/beta +specifies the file named +.UL beta +in +subdirectory +.UL alpha +of the current +directory. +The simplest kind of name, for example, +.UL alpha , +refers to a file that itself is found in the current +directory. +As another limiting case, the null file name refers +to the current directory. +.PP +The same non-directory file may appear in several directories under +possibly different names. +This feature is called +.IT linking ; +a directory entry for a file is sometimes called a link. +The +.UX +system +differs from other systems in which linking is permitted +in that all links to a file have equal status. +That is, a file does not exist within a particular directory; +the directory entry for a file consists merely +of its name and a pointer to the information actually +describing the file. +Thus a file exists independently of any +directory entry, although in practice a file is made to +disappear along with the last link to it. +.PP +Each directory always has at least two entries. +The name +``\|\fB.\|\fP'' in each directory refers to the directory itself. +Thus a program +may read the current directory under the name ``\fB\|.\|\fP'' without knowing +its complete path name. +The name ``\fB\|.\|.\|\fP'' by convention refers to the parent of the +directory in which it appears, that is, to the directory in which +it was created. +.PP +The directory structure is constrained to have the form +of a rooted tree. +Except for the special entries ``\|\fB\|.\|\fP'' and ``\fB\|.\|.\|\fP'', each directory +must appear as an entry in exactly one other directory, which is its +parent. +The reason for this is to simplify the writing of programs +that visit subtrees of the directory structure, and more +important, to avoid the separation of portions of the hierarchy. 
+If arbitrary links to directories were permitted, it would +be quite difficult to detect when the last connection from +the root to a directory was severed. +.SH +3.3 Special files +.PP +Special files constitute the most unusual feature of the +.UX +file system. +Each supported I/O device +is associated with at least one such file. +Special files are read and written just like ordinary +disk files, but requests to read or write result in activation of the associated +device. +An entry for each special file resides in directory +.UL /dev , +although a link may be made to one of these files +just as it may to an ordinary file. +Thus, for example, +to write on a magnetic tape +one may write on the file +.UL /dev/mt . +Special files exist for each communication line, each disk, +each tape drive, +and for physical main memory. +Of course, +the active disks +and the memory special file are protected from +indiscriminate access. +.PP +There is a threefold advantage in treating +I/O devices this way: +file and device I/O +are as similar as possible; +file and device names have the same +syntax and meaning, so that +a program expecting a file name +as a parameter can be passed a device +name; finally, +special files are subject to the same +protection mechanism as regular files. +.SH +3.4 Removable file systems +.PP +Although the root of the file system is always stored on the same +device, +it is not necessary that the entire file system hierarchy +reside on this device. +There is a +.UL mount +system request with two arguments: +the name of an existing ordinary file, and the name of a special +file whose associated +storage volume (e.g., a disk pack) should have the structure +of an independent file system +containing its own directory hierarchy. +The effect of +.UL mount +is to cause +references to the heretofore ordinary file +to refer instead to the root directory +of the file system on the removable volume. 
+In effect, +.UL mount +replaces a leaf of the hierarchy tree (the ordinary file) +by a whole new subtree (the hierarchy stored on the +removable volume). +After the +.UL mount , +there is virtually no distinction +between files on the removable volume and those in the +permanent file system. +In our installation, for example, +the root directory resides +on a small partition of one of +our disk drives, +while the other drive, +which contains the user's files, +is mounted by the system initialization +sequence. +A mountable file system is generated by +writing on its corresponding special file. +A utility program is available to create +an empty file system, +or one may simply copy an existing file system. +.PP +There is only one exception to the rule of identical +treatment of files on different devices: +no link may exist between one file system hierarchy and +another. +This restriction is enforced so as to avoid +the elaborate bookkeeping +that would otherwise be required to assure removal of the links +whenever the removable volume is dismounted. +.SH +3.5 Protection +.PP +Although the access control scheme +is quite simple, it has some unusual features. +Each user of the system is assigned a unique +user identification number. +When a file is created, it is marked with +the user \*sID\*n of its owner. +Also given for new files +is a set of ten protection bits. +Nine of these specify +independently read, write, and execute permission +for the +owner of the file, +for other members of his group, +and for all remaining users. +.PP +If the tenth bit is on, the system +will temporarily change the user identification +(hereafter, user \*sID\*n) +of the current user to that of the creator of the file whenever +the file is executed as a program. +This change in user \*sID\*n is effective only +during the execution of the program that calls for it. +The set-user-\*sID\*n feature provides +for privileged programs that may use files +inaccessible to other users. 
+For example, a program may keep an accounting file +that should neither be read nor changed +except by the program itself. +If the set-user-\*sID\*n bit is on for the +program, it may access the file although +this access might be forbidden to other programs +invoked by the given program's user. +Since the actual user \*sID\*n +of the invoker of any program +is always available, +set-user-\*sID\*n programs +may take any measures desired to satisfy themselves +as to their invoker's credentials. +This mechanism is used to allow users to execute +the carefully written +commands +that call privileged system entries. +For example, there is a system entry +invokable only by the ``super-user'' (below) +that creates +an empty directory. +As indicated above, directories are expected to +have entries for ``\fB\|.\|\fP'' and ``\fB\|.\|.\|\fP''. +The command which creates a directory +is owned by the super-user +and has the set-user-\*sID\*n bit set. +After it checks its invoker's authorization to +create the specified directory, +it creates it and makes the entries +for ``\fB\|.\|\fP'' and ``\fB\|.\|.\|\fP''. +.PP +Because anyone may set the set-user-\*sID\*n +bit on one of his own files, +this mechanism is generally +available without administrative intervention. +For example, +this protection scheme easily solves the \*sMOO\*n +accounting problem posed by ``Aleph-null.'' +.[ +aleph null software practice +.] +.PP +The system recognizes one particular user \*sID\*n (that of the ``super-user'') as +exempt from the usual constraints on file access; thus (for example), +programs may be written to dump and reload the file +system without +unwanted interference from the protection +system. diff --git a/share/doc/psd/01.cacm/p2 b/share/doc/psd/01.cacm/p2 new file mode 100644 index 0000000..8373c0e --- /dev/null +++ b/share/doc/psd/01.cacm/p2 @@ -0,0 +1,448 @@ +.\" This module is believed to contain source code proprietary to AT&T. 
+.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)p2 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +3.6 I/O calls +.PP +The system calls to do I/O are designed to eliminate +the differences between the various devices and styles of +access. +There is no distinction between ``random'' +and ``sequential'' I/O, nor is any logical record size imposed +by the system. +The size of an ordinary file is determined +by the number of bytes written on it; +no predetermination of the size of a file is necessary +or possible. +.PP +To illustrate the essentials of I/O, +some of the basic calls are +summarized below +in an anonymous language that will +indicate the required parameters without getting into the +underlying +complexities. +Each call to the system may potentially result in an error +return, which for simplicity is not represented +in the calling sequence. +.PP +To read or write a file assumed to exist already, it must +be opened by the following call: +.P1 +filep = open\|(\|name, flag\|) +.P2 +where +.UL name +indicates the name of the file. +An arbitrary path name may be given. +The +.UL flag +argument indicates whether the file is to be read, written, +or ``updated,'' that is, read and written simultaneously. +.PP +The returned value +.UL filep +is called a +.IT "file descriptor" . +It is a small integer used to identify the file +in subsequent calls to read, write, +or otherwise manipulate the file. +.PP +To create a new file or completely rewrite an old one, +there is a +.UL create +system call that +creates the given file if it does not exist, +or truncates it to zero length +if it does exist; +.UL create +also opens the new file for writing +and, like +.UL open , +returns a file descriptor. +.PP +The file system maintains no locks visible to the user, nor is there any +restriction on the number of users who may have a file +open for reading or writing. 
+Although it is possible for the contents of a file +to become scrambled when two users write on it simultaneously, +in practice difficulties do not arise. +We take the view that locks are neither +necessary nor sufficient, in our environment, +to prevent interference between users of the same file. +They are unnecessary because we are not +faced with large, single-file data bases +maintained by independent processes. +They are insufficient because +locks in the ordinary sense, whereby +one user is prevented from writing on a file that another +user is reading, +cannot prevent confusion +when, for example, both users are editing +a file with an editor that makes +a copy of the file being edited. +.PP +There are, however, +sufficient internal interlocks to maintain +the logical consistency of the file system +when two users engage simultaneously in +activities such as writing on +the same file, +creating files in the same directory, +or deleting each other's open files. +.PP +Except as indicated below, reading and writing +are sequential. +This means that if a particular +byte in the file was the last byte written (or read), +the next I/O call implicitly refers to the +immediately following byte. +For each open file there is a pointer, maintained +inside the system, +that indicates the next byte to be read +or written. +If +.IT n +bytes are read or written, the pointer advances +by +.IT n +bytes. +.PP +Once a file is open, the following calls +may be used: +.P1 +n = read\|(\|filep, buffer, count\|) +n = write\|(\|filep, buffer, count\|) +.P2 +Up to +.UL count +bytes are transmitted between the file specified +by +.UL filep +and the byte array +specified by +.UL buffer . +The returned value +.UL n +is the number of bytes actually transmitted. 
+In the +.UL write +case, +.UL n +is the same as +.UL count +except under exceptional conditions, such as I/O errors or +end of physical medium on special files; +in a +.UL read , +however, +.UL n +may without error be less than +.UL count . +If the read pointer is so near the end of the +file that reading +.UL count +characters +would cause reading beyond the end, only sufficient +bytes are transmitted to reach the end of the +file; +also, typewriter-like terminals +never return more than one line of input. +When a +.UL read +call returns with +.UL n +equal +to zero, the end of the file has been reached. +For disk files this occurs when the read pointer +becomes equal to the current +size of the file. +It is possible to generate an end-of-file +from a terminal by use of an escape +sequence that depends on the device used. +.PP +Bytes written affect only those parts of a file implied by +the position of the write pointer and the +count; no other part of the file +is changed. +If the last byte lies beyond the end of the file, the +file is made to grow as needed. +.PP +To do random (direct-access) I/O +it is only necessary to move the read or write pointer +to the appropriate location in the file. +.P1 +location = lseek\|(\|filep, offset, base\|) +.P2 +The pointer +associated with +.UL filep +is moved to a position +.UL offset +bytes from the beginning of the file, from the current position +of the pointer, or from the end of the file, +depending on +.UL base . +.UL \&offset +may be negative. +For some devices (e.g., paper +tape and +terminals) seek calls are +ignored. +The actual offset from the beginning of the file +to which the pointer was moved is returned +in +.UL location . +.PP +There are several additional system entries +having to do with I/O and with the file +system that will not be discussed.
+For example: +close a file, +get the status of a file, +change the protection mode or the owner +of a file, +create a directory, +make a link to an existing file, +delete a file. +.SH +IV. IMPLEMENTATION OF THE FILE SYSTEM +.PP +As mentioned in Section 3.2 above, a directory entry contains +only a name for the associated file and a pointer to the +file itself. +This pointer is an integer called the +.IT i-number +(for index number) +of the file. +When the file is accessed, +its i-number is used as an index into +a system table (the +.IT i-list \|) +stored in a known +part of the device on which +the directory resides. +The entry found thereby (the file's +.IT i-node \|) +contains +the description of the file: +.IP i +the user and group-\*sID\*n of its owner +.IP ii +its protection bits +.IP iii +the physical disk or tape addresses for the file contents +.IP iv +its size +.IP v +time of creation, last use, and last modification +.IP vi +the number of links to the file, that is, the number of times it appears in a directory +.IP vii +a code indicating whether the file is a directory, an ordinary file, or a special file. +.LP +The purpose of an +.UL open +or +.UL create +system call is to turn the path name given by the user +into an i-number +by searching the explicitly or implicitly named directories. +Once a file is open, +its device, i-number, and read/write pointer are stored in a system table +indexed by the file descriptor returned by the +.UL open +or +.UL create . +Thus, during a subsequent +call to read or write the +file, +the descriptor +may be easily related to the information necessary to access the file. +.PP +When a new file is created, +an i-node is allocated for it and a directory entry is made +that contains the name of the file and the i-node +number. +Making a link to an existing file involves +creating a directory entry with the new name, +copying the i-number from the original file entry, +and incrementing the link-count field of the i-node. 
+Removing (deleting) a file is done by +decrementing the +link-count of the i-node specified by its directory entry +and erasing the directory entry. +If the link-count drops to 0, +any disk blocks in the file +are freed and the i-node is de-allocated. +.PP +The space on all disks that +contain a file system is divided into a number of +512-byte +blocks logically addressed from 0 up to a limit that +depends on the device. +There is space in the i-node of each file for 13 device addresses. +For nonspecial files, +the first 10 device addresses point at the first +10 blocks of the file. +If the file is larger than 10 blocks, +the eleventh device address points to an +indirect block containing up to 128 addresses +of additional blocks in the file. +Still larger files use the twelfth device address +of the i-node to point to +a double-indirect block naming +128 indirect blocks, +each +pointing to 128 blocks of the file. +If required, +the thirteenth device address is +a triple-indirect block. +Thus files may conceptually grow to +[\|(10+128+128\u\s62\s0\d+128\u\s63\s0\d)\*m512\|] bytes. +Once opened, +bytes numbered below 5120 can be read with a single +disk access; +bytes in the range 5120 to 70,656 +require two accesses; +bytes in the range 70,656 +to 8,459,264 +require three accesses; +bytes from there to the +largest file +(1,082,201,088) +require four accesses. +In practice, +a device cache mechanism +(see below) +proves effective in eliminating +most of the indirect fetches. +.PP +The foregoing discussion applies to ordinary files. +When an I/O request is made to a file whose i-node indicates that it +is special, +the last 12 device address words are immaterial, +and the first specifies +an internal +.IT "device name" , +which is interpreted as a pair of numbers +representing, +respectively, a device type +and subdevice number.
+The device type indicates which +system routine will deal with I/O on that device; +the subdevice number selects, for example, a disk drive +attached to a particular controller or one of several +similar terminal interfaces. +.PP +In this environment, the implementation of the +.UL mount +system call (Section 3.4) is quite straightforward. +.UL \&mount +maintains a system table whose +argument is the i-number and device name of the +ordinary file specified +during the +.UL mount , +and whose corresponding value is the +device name of the indicated special file. +This table is searched for each i-number/device pair +that turns up while a path name is being scanned +during an +.UL open +or +.UL create ; +if a match is found, +the i-number is replaced by the i-number of the root +directory +and the device name is replaced by the table value. +.PP +To the user, both reading and writing of files appear to +be synchronous and unbuffered. +That is, immediately after +return from a +.UL read +call the data are available; conversely, +after a +.UL write +the user's workspace may be reused. +In fact, the system maintains a rather complicated +buffering mechanism that reduces greatly the number +of I/O operations required to access a file. +Suppose a +.UL write +call is made specifying transmission +of a single byte. +The system +will search its buffers to see +whether the affected disk block currently resides in main memory; +if not, it will be read in from the device. +Then the affected byte is replaced in the buffer and an +entry is made in a list of blocks to be written. +The return from the +.UL write +call may then take place, +although the actual I/O may not be completed until a later time. +Conversely, if a single byte is read, the system determines +whether the secondary storage block in which the byte is located is already +in one of the system's buffers; if so, the byte can be returned immediately. +If not, the block is read into a buffer and the byte picked out. 
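The buffering behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the system's actual code: the class name `BlockCache`, its methods and counters, and the in-memory list standing in for a disk are all hypothetical. Only the 512-byte block size and the read-modify-mark-for-later-write sequence come from the text.

```python
BLOCK_SIZE = 512  # block size given in the text


class BlockCache:
    """Toy write-behind cache over a simulated disk (hypothetical names)."""

    def __init__(self, disk):
        self.disk = disk        # list of bytearrays standing in for a device
        self.buffers = {}       # block number -> in-memory copy of the block
        self.dirty = set()      # blocks changed in memory, not yet written
        self.disk_reads = 0     # counters so the saving in I/O is visible
        self.disk_writes = 0

    def _get_block(self, blkno):
        # Read the block from the device only if it is not already buffered.
        if blkno not in self.buffers:
            self.buffers[blkno] = bytearray(self.disk[blkno])
            self.disk_reads += 1
        return self.buffers[blkno]

    def write_byte(self, offset, value):
        # Replace one byte in the buffer and note the block for later writing.
        blkno, within = divmod(offset, BLOCK_SIZE)
        self._get_block(blkno)[within] = value
        self.dirty.add(blkno)

    def read_byte(self, offset):
        # Served from the buffer when the block is already in memory.
        blkno, within = divmod(offset, BLOCK_SIZE)
        return self._get_block(blkno)[within]

    def flush(self):
        # The deferred device writes happen here, once per dirty block.
        for blkno in sorted(self.dirty):
            self.disk[blkno] = bytearray(self.buffers[blkno])
            self.disk_writes += 1
        self.dirty.clear()


disk = [bytearray(BLOCK_SIZE) for _ in range(4)]
cache = BlockCache(disk)
for i in range(100):
    cache.write_byte(600 + i, 7)  # 100 one-byte writes, all inside block 1
```

In this sketch the hundred single-byte writes cost one device read and no device writes until `flush` is called, which is the sense in which the caller sees synchronous, unbuffered I/O while the device sees far fewer operations.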
+.PP +The system recognizes when +a program has +made accesses to +sequential blocks of a file, +and asynchronously +pre-reads the next block. +This significantly reduces +the running time of most programs +while adding little to +system overhead. +.PP +A program that reads or writes files in units of 512 bytes +has an advantage over a program that reads or writes +a single byte at a time, but the gain is not immense; +it comes mainly from the avoidance of system overhead. +If a program is used rarely or does +no great volume of I/O, it may quite reasonably +read and write in units as small as it wishes. +.PP +The notion of the i-list is an unusual feature +of +.UX . +In practice, this method of organizing the file system +has proved quite reliable and easy to deal with. +To the system itself, one of its strengths is +the fact that each file has a short, unambiguous name +related in a simple way to the protection, addressing, +and other information needed to access the file. +It also permits a quite simple and rapid +algorithm for checking the consistency of a file system, +for example, verification +that the portions of each device containing useful information +and those free to be allocated are disjoint and together +exhaust the space on the device. +This algorithm is independent +of the directory hierarchy, because it need only scan +the linearly organized i-list. +At the same time the notion of the i-list induces certain +peculiarities not found in other file system organizations. +For example, there is the question of who is to be charged +for the space a file occupies, +because all directory entries for a file have equal status. +Charging the owner of a file is unfair in general, +for one user may create a file, another may link to +it, and the first user may delete the file. +The first user is still the owner of the +file, but it should be charged +to the second user. 
+The simplest reasonably fair algorithm +seems to be to spread the charges +equally among users who have links to a file. +Many installations +avoid the +issue by not charging any fees at all. diff --git a/share/doc/psd/01.cacm/p3 b/share/doc/psd/01.cacm/p3 new file mode 100644 index 0000000..2dc86d2 --- /dev/null +++ b/share/doc/psd/01.cacm/p3 @@ -0,0 +1,190 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)p3 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +V. PROCESSES AND IMAGES +.PP +An +.IT image +is a computer execution environment. +It includes a memory image, +general register values, +status of open files, +current directory and the like. +An image is the current state of a pseudo-computer. +.PP +A +.IT process +is the execution of an image. +While the processor is executing on behalf of a process, +the image must reside in main memory; +during the execution of other processes it remains in main memory +unless the appearance of an active, higher-priority +process +forces it to be swapped out to the disk. +.PP +The user-memory part of an image is divided into three logical segments. +The program text segment begins at location 0 in the virtual address space. +During execution, this segment is write-protected +and a single copy of it is shared among +all processes executing the same program. +At the first hardware protection byte boundary above the program text segment in the +virtual address space begins a non-shared, writable data segment, +the size of which may be extended by a system call. +Starting at the highest +address in the virtual address space is a stack segment, +which automatically grows downward +as the stack pointer fluctuates. 
+.SH +5.1 Processes +.PP +Except while +the system +is bootstrapping itself into operation, a new +process can come into existence only +by use of the +.UL fork +system call: +.P1 +processid = fork\|(\|\|)\| +.P2 +When +.UL fork +is executed, the process +splits into two independently executing processes. +The two processes have independent +copies of the original memory image, +and share all open files. +The new processes differ only in that one is considered +the parent process: +in the parent, +the returned +.UL processid +actually identifies the child process +and is never 0, +while in the child, +the returned value is always 0. +.PP +Because the values returned by +.UL fork +in the parent and child process are distinguishable, +each process may determine whether +it is the parent or child. +.SH +5.2 Pipes +.PP +Processes may communicate +with related processes using the same system +.UL read +and +.UL write +calls that are used for file-system I/O. +The call: +.P1 +filep = pipe\|(\|\|)\| +.P2 +returns a file descriptor +.UL filep +and +creates an inter-process channel called a +.IT pipe . +This channel, like other open files, is passed from parent to child process in +the image by the +.UL fork +call. +A +.UL read +using a pipe file descriptor +waits until another process writes using the +file descriptor for the same pipe. +At this point, data are passed between the images of the +two processes. +Neither process need know that a pipe, +rather than an ordinary file, +is involved. +.PP +Although +inter-process communication +via pipes is a quite valuable tool +(see Section 6.2), +it is not a completely general +mechanism, +because the pipe must be set up by a common ancestor +of the processes involved. +.SH +5.3 Execution of programs +.PP +Another major system primitive +is invoked by +.P1 +execute\|(\|file, arg\*s\d1\u\*n, arg\*s\d2\u\*n, .\|.\|. 
, arg\*s\dn\u\*n\|)\| +.P2 +which requests the system to read in and execute the program +named by +.UL file , +passing it string arguments +.UL arg\v'.3'\*s1\*n\v'-.3'\| , +.UL arg\v'.3'\*s2\*n\v'-.3'\| , +.UL .\|.\|.\|\| , +.UL arg\v'.3'\*sn\*n\v'-.3' . +All the code and data in the process invoking +.UL execute +is replaced from the +.UL file , +but +open files, current directory, and +inter-process relationships are unaltered. +Only if the call fails, for example +because +.UL file +could not be found or because +its execute-permission bit was not set, does a return +take place from the +.UL execute +primitive; +it resembles a ``jump'' machine instruction +rather than a subroutine call. +.SH +5.4 Process synchronization +.PP +Another process control system call: +.P1 +processid = wait\|(\|status\|)\| +.P2 +causes its caller to suspend +execution until one of its children has completed execution. +Then +.UL wait +returns the +.UL processid +of the terminated process. +An error return is taken if the calling process has no +descendants. +Certain status from the child process +is also available. +.SH +5.5 Termination +.PP +Lastly: +.P1 +exit\|(\|status\|)\| +.P2 +terminates a process, +destroys its image, +closes its open files, +and generally obliterates it. +The parent is notified through +the +.UL wait +primitive, +and +.UL status +is made available +to it. +Processes may also terminate as a result of +various illegal actions or user-generated signals +(Section VII below). diff --git a/share/doc/psd/01.cacm/p4 b/share/doc/psd/01.cacm/p4 new file mode 100644 index 0000000..09adb2b --- /dev/null +++ b/share/doc/psd/01.cacm/p4 @@ -0,0 +1,524 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)p4 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +VI. 
THE SHELL +.PP +For most users, +communication with +the system +is carried on with the +aid of a program called the \&shell. +The \&shell is a +command-line interpreter: it reads lines typed by the user and +interprets them as requests to execute +other programs. +(The \&shell is described fully elsewhere, +.[ +bourne shell bstj +%Q This issue +.] +so this section will discuss only the theory of its operation.) +In simplest form, a command line consists of the command +name followed by arguments to the command, all separated +by spaces: +.P1 +command arg\*s\d1\u\*n arg\*s\d2\u\*n .\|.\|. arg\*s\dn\u\*n +.P2 +The \&shell splits up the command name and the arguments into +separate strings. +Then a file with name +.UL command +is sought; +.UL command +may be a path name including the ``/'' character to +specify any file in the system. +If +.UL command +is found, it is brought into +memory and executed. +The arguments +collected by the \&shell are accessible +to the command. +When the command is finished, the \&shell +resumes its own execution, and indicates its readiness +to accept another command by typing a prompt character. +.PP +If file +.UL command +cannot be found, +the \&shell generally prefixes a string +such as +.UL /\|bin\|/ +to +.UL command +and +attempts again to find the file. +Directory +.UL /\|bin +contains commands +intended to be generally used. +(The sequence of directories to be searched +may be changed by user request.) +.SH +6.1 Standard I/O +.PP +The discussion of I/O in Section III above seems to imply that +every file used by a program must be opened or created by the program in +order to get a file descriptor for the file. +Programs executed by the \&shell, however, start off with +three open files with file descriptors +0, 1, and 2. +As such a program begins execution, file 1 is open for writing, +and is best understood as the standard output file. +Except under circumstances indicated below, this file +is the user's terminal. 
+Thus programs that wish to write informative +output ordinarily use file descriptor 1. +Conversely, file 0 starts off open for reading, and programs that +wish to read messages typed by the user +read this file. +.PP +The \&shell is able to change the standard assignments of +these file descriptors from the +user's terminal printer and keyboard. +If one of the +arguments to a command is prefixed by ``>'', file descriptor +1 will, for the duration of the command, refer to the +file named after the ``>''. +For example: +.P1 +ls +.P2 +ordinarily lists, on the typewriter, the names of the files in the current +directory. +The command: +.P1 +ls >there +.P2 +creates a file called +.UL there +and places the listing there. +Thus the argument +.UL >there +means +``place output on +.UL there .'' +On the other hand: +.P1 +ed +.P2 +ordinarily enters the editor, which takes requests from the +user via his keyboard. +The command: +.P1 +ed <script +.P2 +interprets +.UL script +as a file of editor commands; +thus +.UL <script +means ``take input from +.UL script .'' +.PP +Although the file name following ``<'' or ``>'' appears +to be an argument to the command, in fact it is interpreted +completely by the \&shell and is not passed to the +command at all. +Thus no special coding to handle I/O redirection is needed within each +command; the command need merely use the standard file +descriptors 0 and 1 where appropriate. +.PP +File descriptor 2 is, like file 1, +ordinarily associated with the terminal output stream. +When an output-diversion request with ``>'' is specified, +file 2 remains attached to the terminal, so that commands +may produce diagnostic messages that +do not silently end up in the output file. +.SH +6.2 Filters +.PP +An extension of the standard I/O notion is used +to direct output from one command to +the input of another.
+A sequence of commands separated by +vertical bars causes the \&shell to +execute all the commands simultaneously and to arrange +that the standard output of each command +be delivered to the standard input of +the next command in the sequence. +Thus in the command line: +.P1 +ls | pr \(mi2 | opr +.P2 +.UL ls +lists the names of the files in the current directory; +its output is passed to +.UL pr , +which +paginates its input with dated headings. +(The argument ``\(mi2'' requests +double-column output.) +Likewise, the output from +.UL pr +is input to +.UL opr ; +this command spools its input onto a file for off-line +printing. +.PP +This procedure could have been carried out +more clumsily by: +.P1 +ls >temp1 +pr \(mi2 <temp1 >temp2 +opr <temp2 +.P2 +followed by removal of the temporary files. +In the absence of the ability +to redirect output and input, +a still clumsier method would have been to +require the +.UL ls +command +to accept user requests to paginate its output, +to print in multi-column format, and to arrange +that its output be delivered off-line. +Actually it would be surprising, and in fact +unwise for efficiency reasons, +to expect authors of +commands such as +.UL ls +to provide such a wide variety of output options. +.PP +A program +such as +.UL pr +which copies its standard input to its standard output +(with processing) +is called a +.IT filter . +Some filters that we have found useful +perform +character transliteration, +selection of lines according to a pattern, +sorting of the input, +and encryption and decryption. +.SH +6.3 Command separators; multitasking +.PP +Another feature provided by the \&shell is relatively straightforward. +Commands need not be on different lines; instead they may be separated +by semicolons: +.P1 +ls; ed +.P2 +will first list the contents of the current directory, then enter +the editor. +.PP +A related feature is more interesting. 
+If a command is followed +by ``\f3&\f1,'' the \&shell will not wait for the command to finish before +prompting again; instead, it is ready immediately +to accept a new command. +For example: +.bd 3 +.P1 +as source >output & +.P2 +causes +.UL source +to be assembled, with diagnostic +output going to +.UL output ; +no matter how long the +assembly takes, the \&shell returns immediately. +When the \&shell does not wait for +the completion of a command, +the identification number of the +process running that command is printed. +This identification may be used to +wait for the completion of the command or to +terminate it. +The ``\f3&\f1'' may be used +several times in a line: +.P1 +as source >output & ls >files & +.P2 +does both the assembly and the listing in the background. +In these examples, an output file +other than the terminal was provided; if this had not been +done, the outputs of the various commands would have been +intermingled. +.PP +The \&shell also allows parentheses in the above operations. +For example: +.P1 +(\|date; ls\|) >x & +.P2 +writes the current date and time followed by +a list of the current directory onto the file +.UL x . +The \&shell also returns immediately for another request. +.SH 1 +6.4 The \&shell as a command; command files +.PP +The \&shell is itself a command, and may be called recursively. +Suppose file +.UL tryout +contains the lines: +.P1 +as source +mv a.out testprog +testprog +.P2 +The +.UL mv +command causes the file +.UL a.out +to be renamed +.UL testprog . +.UL \&a.out +is the (binary) output of the assembler, ready to be executed. +Thus if the three lines above were typed on the keyboard, +.UL source +would be assembled, the resulting program renamed +.UL testprog , +and +.UL testprog +executed. +When the lines are in +.UL tryout , +the command: +.P1 +sh <tryout +.P2 +would cause the \&shell +.UL sh +to execute the commands +sequentially.
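The sequential execution of a command file described above can be sketched in C. This is only an illustration under stated assumptions, not the shell's actual source: the function names (split_words, run_line, run_command_file) are hypothetical, the POSIX calls execvp and waitpid stand in for the paper's execute and wait primitives, and redirection, quoting, and ``&'' are not handled.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 32   /* hypothetical limit on words per command line */

/* Split a command line in place into blank-separated words.
   Returns the word count; argv is always NULL-terminated. */
int split_words(char *line, char *argv[], int max)
{
    int argc = 0;
    char *word = strtok(line, " \t\n");

    while (word != NULL && argc < max - 1) {
        argv[argc++] = word;
        word = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;
    return argc;
}

/* Run one command line: fork, execute in the child, wait in the parent.
   Returns the child's exit status, or -1 on error or abnormal exit. */
int run_line(char *line)
{
    char *argv[MAX_ARGS];
    int status;
    pid_t pid;

    if (split_words(line, argv, MAX_ARGS) == 0)
        return 0;                /* blank line: nothing to run */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);   /* returns only if the call fails */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Execute each line of an open command file in order, stopping at
   end of file. Returns the status of the last command run. */
int run_command_file(FILE *in)
{
    char line[1024];
    int last = 0;

    while (fgets(line, sizeof line, in) != NULL)
        last = run_line(line);
    return last;
}
```

Calling run_command_file(stdin) from a main function and starting the program with its input taken from tryout would approximate the effect of the sh command shown above; each command finishes before the next line is read.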
+.PP +The \&shell has further capabilities, including the +ability to substitute parameters +and +to construct argument lists from a specified +subset of the file names in a directory. +It also provides general conditional and looping constructions. +.SH 1 +6.5 Implementation of the \&shell +.PP +The outline of the operation of the \&shell can now be understood. +Most of the time, the \&shell +is waiting for the user to type a command. +When the +newline character ending the line +is typed, the \&shell's +.UL read +call returns. +The \&shell analyzes the command line, putting the +arguments in a form appropriate for +.UL execute . +Then +.UL fork +is called. +The child process, whose code +of course is still that of the \&shell, attempts +to perform an +.UL execute +with the appropriate arguments. +If successful, this will bring in and start execution of the program whose name +was given. +Meanwhile, the other process resulting from the +.UL fork , +which is the +parent process, +.UL wait s +for the child process to die. +When this happens, the \&shell knows the command is finished, so +it types its prompt and reads the keyboard to obtain another +command. +.PP +Given this framework, the implementation of background processes +is trivial; whenever a command line contains ``\f3&\f1,'' +the \&shell merely refrains from waiting for the process +that it created +to execute the command. +.PP +Happily, all of this mechanism meshes very nicely with +the notion of standard input and output files. +When a process is created by the +.UL fork +primitive, it +inherits not only the memory image of its parent +but also all the files currently open in its parent, +including those with file descriptors 0, 1, and 2. +The \&shell, of course, uses these files to read command +lines and to write its prompts and diagnostics, and in the ordinary case +its children\(emthe command programs\(eminherit them automatically. 
+When an argument with ``<'' or ``>'' is given, however, the +offspring process, just before it performs +.UL execute , +makes the standard I/O +file descriptor (0 or 1, respectively) refer to the named file. +This is easy +because, by agreement, +the smallest unused file descriptor is assigned +when a new file is +.UL open ed +(or +.UL create d); +it is only necessary to close file 0 (or 1) +and open the named file. +Because the process in which the command program runs simply terminates +when it is through, the association between a file +specified after ``<'' or ``>'' and file descriptor 0 or 1 is ended +automatically when the process dies. +Therefore +the \&shell need not know the actual names of the files +that are its own standard input and output, because it need +never reopen them. +.PP +Filters are straightforward extensions +of standard I/O redirection with pipes used +instead of files. +.PP +In ordinary circumstances, the main loop of the \&shell never +terminates. +(The main loop includes the +branch of the return from +.UL fork +belonging to the +parent process; that is, the branch that does a +.UL wait , +then +reads another command line.) +The one thing that causes the \&shell to terminate is +discovering an end-of-file condition on its input file. +Thus, when the \&shell is executed as a command with +a given input file, as in: +.P1 +sh <comfile +.P2 +the commands in +.UL comfile +will be executed until +the end of +.UL comfile +is reached; then the instance of the \&shell +invoked by +.UL sh +will terminate. +Because this \&shell process +is the child of another instance of the \&shell, the +.UL wait +executed in the latter will return, and another +command may then be processed. +.SH +6.6 Initialization +.PP +The instances of the \&shell to which users type +commands are themselves children of another process.
+The last step in the initialization of +the system +is the creation of +a single process and the invocation (via +.UL execute ) +of a program called +.UL init . +The role of +.UL init +is to create one process +for each terminal channel. +The various subinstances of +.UL init +open the appropriate terminals +for input and output +on files 0, 1, and 2, +waiting, if necessary, for carrier to be established on dial-up lines. +Then a message is typed out requesting that the user log in. +When the user types a name or other identification, +the appropriate instance of +.UL init +wakes up, receives the log-in +line, and reads a password file. +If the user's name is found, and if +he is able to supply the correct password, +.UL init +changes to the user's default current directory, sets +the process's user \*sID\*n to that of the person logging in, and performs +an +.UL execute +of the \&shell. +At this point, the \&shell is ready to receive commands +and the logging-in protocol is complete. +.PP +Meanwhile, the mainstream path of +.UL init +(the parent of all +the subinstances of itself that will later become \&shells) +does a +.UL wait . +If one of the child processes terminates, either +because a \&shell found an end of file or because a user +typed an incorrect name or password, this path of +.UL init +simply recreates the defunct process, which in turn reopens the appropriate +input and output files and types another log-in message. +Thus a user may log out simply by typing the end-of-file +sequence to the \&shell. +.SH +6.7 Other programs as \&shell +.PP +The \&shell as described above is designed to allow users +full access to the facilities of the system, because it will +invoke the execution of any program +with appropriate protection mode. +Sometimes, however, a different interface to the system +is desirable, and this feature is easily arranged for. 
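The respawning behavior of init described in Section 6.6 can be sketched in C as follows. This is a hedged illustration, not the historical init: the names NLINES, session_fn, demo_session, start_session, and supervise are hypothetical, the session function is a placeholder for opening the terminal, the log-in dialogue, and the shell, and the loop is bounded by a respawn count so that the sketch terminates (the real program waits indefinitely).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NLINES 3   /* hypothetical number of terminal channels */

/* Placeholder for the work done on one terminal channel. */
typedef int (*session_fn)(int line);

/* Trivial session that ends at once, as if the user had typed
   end of file immediately after logging in. */
int demo_session(int line)
{
    return line;
}

/* Create one child for a terminal channel; returns its process id
   to the parent, or -1 if the fork failed. */
pid_t start_session(int line, session_fn session)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(session(line));   /* child: run the session, then die */
    return pid;
}

/* Mainstream path: start one child per channel, wait, and recreate
   whichever child terminates. Stops after `respawns` recreations.
   Returns the number of recreations, or -1 on error. */
int supervise(session_fn session, int respawns)
{
    pid_t pids[NLINES];
    int i, done = 0;

    for (i = 0; i < NLINES; i++)
        if ((pids[i] = start_session(i, session)) < 0)
            return -1;

    while (done < respawns) {
        pid_t dead = wait(NULL);        /* sleep until a child dies */

        if (dead < 0)
            return -1;
        for (i = 0; i < NLINES; i++) {
            if (pids[i] == dead) {      /* recreate the defunct process */
                if ((pids[i] = start_session(i, session)) < 0)
                    return -1;
                done++;
            }
        }
    }

    /* Reap the remaining children so the sketch leaves nothing behind. */
    for (i = 0; i < NLINES; i++)
        waitpid(pids[i], NULL, 0);
    return done;
}
```

Keeping a table of process ids indexed by channel lets the parent decide which terminal to reopen from the value returned by wait alone, which is the only information the paper says it needs.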
+.PP +Recall that after a user has successfully logged in by supplying +a name and password, +.UL init +ordinarily invokes the \&shell +to interpret command lines. +The user's entry +in the password file may contain the name +of a program to be invoked after log-in instead of the \&shell. +This program is free to interpret the user's messages +in any way it wishes. +.PP +For example, the password file entries +for users of a secretarial editing system +might +specify that the +editor +.UL ed +is to be used instead of the \&shell. +Thus when users of the editing system log in, they are inside the editor and +can begin work immediately; also, they can be prevented from +invoking +programs not intended for their use. +In practice, it has proved desirable to allow a temporary +escape from the editor +to execute the formatting program and other utilities. +.PP +Several of the games (e.g., chess, blackjack, 3D tic-tac-toe) +available on +the system +illustrate +a much more severely restricted environment. +For each of these, an entry exists +in the password file specifying that the appropriate game-playing +program is to be invoked instead of the \&shell. +People who log in as a player +of one of these games find themselves limited to the +game and unable to investigate the (presumably more interesting) +offerings of +the +.UX +system +as a whole. diff --git a/share/doc/psd/01.cacm/p5 b/share/doc/psd/01.cacm/p5 new file mode 100644 index 0000000..cf40f2d --- /dev/null +++ b/share/doc/psd/01.cacm/p5 @@ -0,0 +1,235 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)p5 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +VII. 
TRAPS +.PP +The \*sPDP\*n-11 hardware detects a number of program faults, +such as references to non-existent memory, unimplemented instructions, +and odd addresses used where an even address is required. +Such faults cause the processor to trap to a system routine. +Unless other arrangements have been made, +an illegal action causes the system +to terminate the process and to write its +image +on file +.UL core +in the current directory. +A debugger can be used to determine +the state of the program at the time of the fault. +.PP +Programs that are looping, that produce unwanted output, or about which +the user has second thoughts may be halted by the use of the +.UL interrupt +signal, which is generated by typing the ``delete'' +character. +Unless special action has been taken, this +signal simply causes the program to cease execution +without producing a +.UL core +file. +There is also a +.UL quit +signal +used to force an image file to be produced. +Thus programs that loop unexpectedly may be +halted and the remains inspected without prearrangement. +.PP +The hardware-generated faults +and the interrupt and quit signals +can, by request, be either ignored or caught by a process. +For example, +the \&shell ignores quits to prevent +a quit from logging the user out. +The editor catches interrupts and returns +to its command level. +This is useful for stopping long printouts +without losing work in progress (the editor +manipulates a copy of the file it is editing). +In systems without floating-point hardware, +unimplemented instructions are caught +and floating-point instructions are +interpreted. +.SH +VIII. PERSPECTIVE +.PP +Perhaps paradoxically, +the success of +the +.UX +system +is largely due to the fact that it was not +designed to meet any +predefined objectives. 
+The first version was written when one of us +(Thompson), +dissatisfied with the available computer facilities, +discovered a little-used \*sPDP\*n-7 +and set out to create a more +hospitable environment. +This (essentially personal) effort was +sufficiently successful +to gain the interest of the other author +and several colleagues, +and later to justify the acquisition +of the \*sPDP\*n-11/20, specifically to support +a text editing and formatting system. +When in turn the 11/20 was outgrown, +the system +had proved useful enough to persuade management to +invest in the \*sPDP\*n-11/45, +and later in the +\*sPDP\*n-11/70 and Interdata 8/32 machines, +upon which it developed to its present form. +Our goals throughout the effort, +when articulated at all, have always been to build +a comfortable relationship with the machine +and to explore ideas and inventions in operating systems +and other software. +We have not been faced with the need to satisfy someone +else's requirements, +and for this freedom we are grateful. +.PP +Three considerations that influenced the design of +.UX +are visible in retrospect. +.PP +First: +because we are programmers, +we naturally designed the system to make it easy to +write, test, and run programs. +The most important expression of our desire for +programming convenience +was that the system +was arranged for interactive use, +even though the original version only +supported one user. +We believe that a properly designed +interactive system is much more +productive +and satisfying to use than a ``batch'' system. +Moreover, such a system is rather easily +adaptable to noninteractive use, while the converse is not true. +.PP +Second: +there have always been fairly severe size constraints +on the system and its software. +Given the partially antagonistic desires for reasonable efficiency and +expressive power, +the size constraint has encouraged +not only economy, but also a certain elegance of design. 
+This may be a thinly disguised version of the ``salvation +through suffering'' philosophy, +but in our case it worked. +.PP +Third: nearly from the start, the system was able to, and did, maintain itself. +This fact is more important than it might seem. +If designers of a system are forced to use that system, +they quickly become aware of its functional and superficial deficiencies +and are strongly motivated to correct them before it is too late. +Because all source programs were always available +and easily modified on-line, +we were willing to revise and rewrite the system and its software +when new ideas were invented, discovered, +or suggested by others. +.PP +The aspects of +.UX +discussed in this paper exhibit clearly +at least the first two of these +design considerations. +The interface to the file +system, for example, is extremely convenient from +a programming standpoint. +The lowest possible interface level is designed +to eliminate distinctions +between +the various devices and files and between +direct and sequential access. +No large ``access method'' routines +are required +to insulate the programmer from the +system calls; +in fact, all user programs either call the system +directly or +use a small library program, less than a page long, +that buffers a number of characters +and reads or writes them all at once. +.PP +Another important aspect of programming +convenience is that there are no ``control blocks'' +with a complicated structure partially maintained by +and depended on by the file system or other system calls. +Generally speaking, the contents of a program's address space +are the property of the program, and we have tried to +avoid placing restrictions +on the data structures within that address space. +.PP +Given the requirement +that all programs should be usable with any file or +device as input or output, +it is also desirable +to push device-dependent considerations +into the operating system itself. 
+The only alternatives seem to be to load, +with all programs, +routines for dealing with each device, +which is expensive in space, +or to depend on some means of dynamically linking to +the routine appropriate to each device when it is actually +needed, +which is expensive either in overhead or in hardware. +.PP +Likewise, +the process-control scheme and the command interface +have proved both convenient and efficient. +Because the \&shell operates as an ordinary, swappable +user program, +it consumes no ``wired-down'' space in the system proper, +and it may be made as powerful as desired +at little cost. +In particular, +given the framework in which the \&shell executes +as a process that spawns other processes to +perform commands, +the notions of I/O redirection, background processes, +command files, and user-selectable system interfaces +all become essentially trivial to implement. +.SH +Influences +.PP +The success of +.UX +lies +not so much in new inventions +but rather in the full exploitation of a carefully selected +set of fertile ideas, +and especially in showing that +they can be keys to the implementation of a small +yet powerful operating system. +.PP +The +.UL fork +operation, essentially as we implemented it, was +present in the \*sGENIE\*n time-sharing system. +.[ +lampson deutsch 930 manual 1965 system preliminary +.] +On a number of points we were influenced by Multics, +which suggested the particular form of the I/O system calls +.[ +multics input output feiertag organick +.] +and both the name of the \&shell and its general functions. +The notion that the \&shell should create a process +for each command was also suggested to us by +the early design of Multics, although in that +system it was later dropped for efficiency reasons. +A similar scheme is used by \*sTENEX\*n. +.[ +bobrow burchfiel tenex +.] 
diff --git a/share/doc/psd/01.cacm/p6 b/share/doc/psd/01.cacm/p6 new file mode 100644 index 0000000..77af7e7 --- /dev/null +++ b/share/doc/psd/01.cacm/p6 @@ -0,0 +1,72 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)p6 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +IX. STATISTICS +.PP +The following numbers +are presented to suggest the scale of the Research +.UX +operation. +Those of our users +not involved in document preparation +tend to use the system for +program development, especially language work. +There are few important +``applications'' programs. +.PP +Overall, we have today: +.PP +.SP .5 +.TS +center; +r5 l. +125 user population +33 maximum simultaneous users +1,630 directories +28,300 files +301,700 512-byte secondary storage blocks used +.TE +.SP .5 +There is a ``background'' process that +runs at the lowest possible priority; it is used +to soak up any idle \*sCPU\*n time. +It has been used to produce a million-digit +approximation to the constant \fIe\fR, +and other semi-infinite problems. +Not counting this background work, we average daily: +.SP .5 +.TS +center; +r 5 l. +13,500 commands +9.6 \*sCPU\*n hours +230 connect hours +62 different users +240 log-ins +.TE +.SP .5 +.SH +X. ACKNOWLEDGMENTS +.PP +The contributors to +.UX +are, in the traditional but here especially apposite +phrase, too numerous to mention. +Certainly, collective salutes are due to our colleagues in the +Computing Science Research Center. +R. H. Canaday contributed much to the basic design of the +file system. +We are particularly appreciative +of the inventiveness, +thoughtful criticism, +and constant support of +R. Morris, M. D. McIlroy, +and J. F. Ossanna. +.[ +$LIST$ +.] 
diff --git a/share/doc/psd/01.cacm/ref.bib b/share/doc/psd/01.cacm/ref.bib new file mode 100644 index 0000000..c4283b5 --- /dev/null +++ b/share/doc/psd/01.cacm/ref.bib @@ -0,0 +1,113 @@ +# $FreeBSD$ + +%A L. P. Deutsch +%A B. W. Lampson +%T An online editor +%J Comm. Assoc. Comp. Mach. +%V 10 +%N 12 +%D December 1967 +%P 793-799, 803 +%K qed + +%K cstr +%R Comp. Sci. Tech. Rep. No. 17 +%I Bell Laboratories +%C Murray Hill, New Jersey +%A B. W. Kernighan +%A L. L. Cherry +%T A System for Typesetting Mathematics +%d May 1974, revised April 1977 +%J Comm. Assoc. Comp. Mach. +%K acm cacm +%V 18 +%P 151-157 +%D March 1975 + +%T U\s-2NIX\s0 Time-Sharing System: Document Preparation +%K unix bstj +%A B. W. Kernighan +%A M. E. Lesk +%A J. F. Ossanna +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 2115-2135 +%D 1978 + +%A T. A. Dolotta +%A J. R. Mashey +%T An Introduction to the Programmer's Workbench +%J Proc. 2nd Int. Conf. on Software Engineering +%D October 13-15, 1976 +%P 164-168 + +%T U\s-2NIX\s0 Time-Sharing System: The Programmer's Workbench +%A T. A. Dolotta +%A R. C. Haight +%A J. R. Mashey +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 2177-2200 +%D 1978 +%K unix bstj + +%T U\s-2NIX\s0 Time-Sharing System: U\s-2NIX\s0 on a Microprocessor +%K unix bstj +%A H. Lycklama +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 2087-2101 +%D 1978 + +%T The C Programming Language +%A B. W. Kernighan +%A D. M. Ritchie +%I Prentice-Hall +%C Englewood Cliffs, New Jersey +%D 1978 + +%T Computer Recreations +%A Aleph-null +%J Software Practice and Experience +%V 1 +%N 2 +%D April-June 1971 +%P 201-204 + +%T U\s-2NIX\s0 Time-Sharing System: The U\s-2NIX\s0 Shell +%A S. R. Bourne +%K unix bstj +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 1971-1990 +%D 1978 + +%A L. P. Deutsch +%A B. W. Lampson +%T \*sSDS\*n 930 time-sharing system preliminary reference manual +%R Doc. 30.10.10, Project \*sGENIE\*n +%C Univ. Cal. at Berkeley +%D April 1965 + +%A R. J. Feiertag +%A E. I. 
Organick +%T The Multics input-output system +%J Proc. Third Symposium on Operating Systems Principles +%D October 18-20, 1971 +%P 35-41 + +%A D. G. Bobrow +%A J. D. Burchfiel +%A D. L. Murphy +%A R. S. Tomlinson +%T \*sTENEX\*n, a Paged Time Sharing System for the \*sPDP\*n-10 +%J Comm. Assoc. Comp. Mach. +%V 15 +%N 3 +%D March 1972 +%K tenex +%P 135-143 diff --git a/share/doc/psd/02.implement/Makefile b/share/doc/psd/02.implement/Makefile new file mode 100644 index 0000000..89d0dc3 --- /dev/null +++ b/share/doc/psd/02.implement/Makefile @@ -0,0 +1,17 @@ +# @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= psd/02.implement +SRCS= stubs implement +EXTRA= ref.bib +MACROS= -ms +USE_PIC= +USE_REFER= +USE_SOELIM= +CLEANFILES= stubs + +stubs: + @(echo .R1; echo database ${.CURDIR}/ref.bib; \ + echo accumulate; echo .R2) > ${.TARGET} + +.include <bsd.doc.mk> diff --git a/share/doc/psd/02.implement/fig1.pic b/share/doc/psd/02.implement/fig1.pic new file mode 100644 index 0000000..2d4b3d3 --- /dev/null +++ b/share/doc/psd/02.implement/fig1.pic @@ -0,0 +1,100 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. 
+.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+.\" +.\" @(#)fig1.pic 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.PS +.ps 9 +[ + PT: [ + T: box invis ht .2i "Process Table"; move down .125i + A: box ht .25i; down + PTE: box "Process" "Table" "Entry"; down + C: box ht .25i + ] + move right 1.5i + TT: [ + T: box invis ht .2i "Text Table"; move down .125i + A: box ht .25i; down + TTE: box "Text" "Table" "Entry"; down + C: box ht .25i + ] + move down 1i from TT.C.s + move right 0.5i + UTS: [ + box ht 0.75i wid 0.75i "User" "Text" "Segment" + ] + move left 1.5i from UTS.w + DS: [ + SDS: box "System" "Data" "Segment" ; move down .5i from SDS.n ; + UDS: box ht 0.75i "User" "Data" "Segment" + ] + move left 1i from DS.UDS.w + move down 0.25i + UAS: [ + box invis "User" "Address" "Space" + ] + line from UAS.ne to UAS.se + line from UAS.nw to UAS.sw + line right 0.15i from UAS.nw + line right 0.15i from UAS.sw + line left 0.15i from UAS.ne + line left 0.15i from UAS.se + arrow from 1/4 of the way between PT.PTE.ne and PT.PTE.se right 1.875i + arrow from TT.TTE.e right .5i then down to UTS.n + arrow from PT.PTE.e right .875i then down to DS.SDS.n + arrow from 3/4 of the way between PT.PTE.ne and PT.PTE.se right .25i then down 1.5i then right .25i + arrow from 1/4 of the way between UAS.ne and UAS.se right .375i then up .25i then right .25i + arrow from 3/4 of the way between UAS.ne and UAS.se right 2.375i then up .875i then right .5i + move up 1.3175i from UAS.nw + move left .75i + line right 5.625i + move left 5.25i + move up .3125i + RS: [ + box invis ht 0.2i "Resident" + ] + move down .8i + SW: [ + box invis ht 0.2i "Swapped" + ] + arrow <-> from RS.s to SW.n +] +.PE diff --git a/share/doc/psd/02.implement/fig2.pic b/share/doc/psd/02.implement/fig2.pic new file mode 100644 index 0000000..2dc2915 --- /dev/null +++ b/share/doc/psd/02.implement/fig2.pic @@ -0,0 +1,110 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Copyright (C) Caldera International Inc. 
2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+.\" +.\" @(#)fig2.pic 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.PS +.ps 9 +[ + PUOFT: [ + A: box invis ht .4i wid 1i "Per-User Open" "File Table" + B: box ht .25i with .n at A.s + C: box with .n at B.s + D: box ht .25i with .n at C.s + ] + move down 1.0625i left 1.25i from PUOFT.D.s + OFT: [ + A: box invis ht .4i wid 1i "Open File" "Table" + B: box ht .25i with .n at A.s + C: box with .n at B.s + D: box ht .25i with .n at C.s + ] + move down 1.0625i right 1.25i from PUOFT.D.s + AIT: [ + A: box invis ht .4i wid 1i "Active I-node" "Table" + B: box ht .25i with .n at A.s + C: box with .n at B.s + D: box ht .25i with .n at C.s + ] + move down 2.5i from PUOFT.D.s + IF: [ + A: box ht .25i + B: box ht .25i "I-node" with .n at A.s + C: box ht .25i with .n at B.s + D: box ht .25i "File" with .n at C.s + E: box ht .25i with .n at D.s + ] + move right 1.5i from IF.D.w + FMA: [ + box invis "File" "Mapping" "Algorithms" + ] + line from FMA.ne to FMA.se + line from FMA.nw to FMA.sw + line left .15i from FMA.se + line left .15i from FMA.ne + line right .15i from FMA.nw + line right .15i from FMA.sw + + arrow from FMA.w to IF.D.e + arrow from AIT.C.e right .25i then down 2.125i then left .5i + arrow from OFT.C.e to AIT.C.w + arrow from PUOFT.C.w left .5i then down 1.625i then left .5i + arrow <-> from IF.B.e right .5i then up 1.5i then right .5i + + move up .1875i from OFT.A.nw + line right 5i + move left 5i down 1.9375i + line right 5i + + move up 1.63475i right 2.75i from PUOFT.D.s + line right .1i down .1i then down .6i then right .1i down .1i then left .1i down .1i then down .6i then left .1i down .1i + move down .34375i right 2.75i from PUOFT.D.s + line right .1i down .1i then down .6i then right .1i down .1i then left .1i down .1i then down .6i then left .1i down .1i + move down 2.34375i right 2.75i from PUOFT.D.s + line right .1i down .1i then down .6i then right .1i down .1i then left .1i down .1i then down .6i then left .1i down .1i + + move up 0.817375i right 2.9i 
from PUOFT.D.s + box invis "Swapped" "Per User" + move down 1.15625i right 2.9i from PUOFT.D.s + box invis wid 1i "Resident" "Per System" + move down 3.15675i right 2.9i from PUOFT.D.s + box invis ht 1i wid 1i "Secondary" "Storage" "Per" "File System" +] +.PE diff --git a/share/doc/psd/02.implement/implement b/share/doc/psd/02.implement/implement new file mode 100644 index 0000000..f6ad7d7 --- /dev/null +++ b/share/doc/psd/02.implement/implement @@ -0,0 +1,1282 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)implement 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.EH 'PSD:2-%''UNIX Implementation' +.OH 'UNIX Implementation''PSD:2-%' +.de P1 +.DS +.. +.de P2 +.DE +.. +.de UL +.lg 0 +.if n .ul +\%\&\\$3\f3\\$1\fR\&\\$2 +.lg +.. +.de UC +\&\\$3\s-1\\$1\\s0\&\\$2 +.. +.de IT +.lg 0 +.if n .ul +\%\&\\$3\f2\\$1\fR\&\\$2 +.lg +.. +.de SP +.sp \\$1 +.. +.hw device +.TL +UNIX Implementation +.AU "MH 2C-523" 2394 +K. Thompson +.AI +AT&T Bell Laboratories +Murray Hill, NJ +.AB +This paper describes in high-level terms the +implementation of the resident +.UX +kernel. +This discussion is broken into three parts. +The first part describes +how the +.UX +system views processes, users, and programs. +The second part describes the I/O system. +The last part describes the +.UX +file system. +.AE +.NH +INTRODUCTION +.PP +The +.UX +kernel consists of about 10,000 +lines of C code and about 1,000 lines of assembly code. +The assembly code can be further broken down into +200 lines included for +the sake of efficiency +(they could have been written in C) +and 800 lines to perform hardware +functions not possible in C. +.PP +This code represents 5 to 10 percent of what has +been lumped into the broad expression +``the +.UX +operating system.'' +The kernel is the only +.UX +code that +cannot be substituted by a user to his +own liking. +For this reason, +the kernel should make as few real +decisions as possible. +This does not mean to allow the user +a million options to do the same thing. 
+Rather, it means to allow only one way to +do one thing, +but have that way be the least-common divisor +of all the options that might have been provided. +.PP +What is or is not implemented in the kernel +represents both a great responsibility and a great power. +It is a soap-box platform on +``the way things should be done.'' +Even so, if +``the way'' is too radical, +no one will follow it. +Every important decision was weighed +carefully. +Throughout, +simplicity has been substituted for efficiency. +Complex algorithms are used only if +their complexity can be localized. +.NH +PROCESS CONTROL +.PP +In the +.UX +system, +a user executes programs in an +environment called a user process. +When a system function is required, +the user process calls the system +as a subroutine. +At some point in this call, +there is a distinct switch of environments. +After this, +the process is said to be a system process. +In the normal definition of processes, +the user and system processes are different +phases of the same process +(they never execute simultaneously). +For protection, +each system process has its own stack. +.PP +The user process may execute +from a read-only text segment, +which is shared by all processes +executing the same code. +There is no +.IT functional +benefit +from shared-text segments. +An +.IT efficiency +benefit comes from the fact +that there is no need to swap read-only +segments out because the original +copy on secondary memory is still current. +This is a great benefit to interactive +programs that tend to be swapped while +waiting for terminal input. +Furthermore, +if two processes are +executing +simultaneously +from the same copy of a read-only segment, +only one copy needs to reside in +primary memory. +This is a secondary effect, +because +simultaneous execution of a program +is not common. 
+It is ironic that this effect, +which reduces the use of primary memory, +only comes into play when there is +an overabundance of primary memory, +that is, +when there is enough memory +to keep waiting processes loaded. +.PP +All current read-only text segments in the +system are maintained from the +.IT "text table" . +A text table entry holds the location of the +text segment on secondary memory. +If the segment is loaded, +that table also holds the primary memory location +and the count of the number of processes +sharing this entry. +When this count is reduced to zero, +the entry is freed along with any +primary and secondary memory holding the segment. +When a process first executes a shared-text segment, +a text table entry is allocated and the +segment is loaded onto secondary memory. +If a second process executes a text segment +that is already allocated, +the entry reference count is simply incremented. +.PP +A user process has some strictly private +read-write data +contained in its +data segment. +As far as possible, +the system does not use the user's +data segment to hold system data. +In particular, +there are no I/O buffers in the +user address space. +.PP +The user data segment has two growing boundaries. +One, increased automatically by the system +as a result of memory faults, +is used for a stack. +The second boundary is only grown (or shrunk) by +explicit requests. +The contents of newly allocated primary memory +is initialized to zero. +.PP +Also associated and swapped with +a process is a small fixed-size +system data segment. +This segment contains all +the data about the process +that the system needs only when the +process is active. +Examples of the kind of data contained +in the system data segment are: +saved central processor registers, +open file descriptors, +accounting information, +scratch data area, +and the stack for the system phase +of the process. 
+The system data segment is not +addressable from the user process +and is therefore protected. +.PP +Last, +there is a process table with +one entry per process. +This entry contains all the data +needed by the system when the process +is +.IT not +active. +Examples are +the process's name, +the location of the other segments, +and scheduling information. +The process table entry is allocated +when the process is created, and freed +when the process terminates. +This process entry is always directly +addressable by the kernel. +.PP +Figure 1 shows the relationships +between the various process control +data. +In a sense, +the process table is the +definition of all processes, +because +all the data associated with a process +may be accessed +starting from the process table entry. +.KF +.if t .in .375i +.so fig1.pic +.if t .in -.375i +.sp 2v +.ce +Fig. 1\(emProcess control data structure. +.KE +.NH 2 +Process creation and program execution +.PP +Processes are created by the system primitive +.UL fork . +The newly created process (child) is a copy of the original process (parent). +There is no detectable sharing of primary memory between the two processes. +(Of course, +if the parent process was executing from a read-only +text segment, +the child will share the text segment.) +Copies of all writable data segments +are made for the child process. +Files that were open before the +.UL fork +are +truly shared after the +.UL fork . +The processes are informed as to their part in the +relationship to +allow them to select their own +(usually non-identical) +destiny. +The parent may +.UL wait +for the termination of +any of its children. +.PP +A process may +.UL exec +a file. +This consists of exchanging the current text and data +segments of the process for new text and data +segments specified in the file. +The old segments are lost. 
+Doing an +.UL exec +does +.IT not +change processes; +the process that did the +.UL exec +persists, +but +after the +.UL exec +it is executing a different program. +Files that were open +before the +.UL exec +remain open after the +.UL exec . +.PP +If a program, +say the first pass of a compiler, +wishes to overlay itself with another program, +say the second pass, +then it simply +.UL exec s +the second program. +This is analogous +to a ``goto.'' +If a program wishes to regain control +after +.UL exec ing +a second program, +it should +.UL fork +a child process, +have the child +.UL exec +the second program, and +have the parent +.UL wait +for the child. +This is analogous to a ``call.'' +Breaking up the call into a binding followed by +a transfer is similar to the subroutine linkage in +SL-5. +.[ +griswold hanson sl5 overview +.] +.NH 2 +Swapping +.PP +The major data associated with a process +(the user data segment, +the system data segment, and +the text segment) +are swapped to and from secondary +memory, as needed. +The user data segment and the system data segment +are kept in contiguous primary memory to reduce +swapping latency. +(When low-latency devices, such as bubbles, +.UC CCD s, +or scatter/gather devices, +are used, +this decision will have to be reconsidered.) +Allocation of both primary +and secondary memory is performed +by the same simple first-fit algorithm. +When a process grows, +a new piece of primary memory is allocated. +The contents of the old memory is copied to the new memory. +The old memory is freed +and the tables are updated. +If there is not enough primary memory, +secondary memory is allocated instead. +The process is swapped out onto the +secondary memory, +ready to be swapped in with +its new size. +.PP +One separate process in the kernel, +the swapping process, +simply swaps the other +processes in and out of primary memory. +It examines the +process table looking for a process +that is swapped out and is +ready to run. 
+It allocates primary memory for that +process and +reads its segments into +primary memory, where that process competes for the +central processor with other loaded processes. +If no primary memory is available, +the swapping process makes memory available +by examining the process table for processes +that can be swapped out. +It selects a process to swap out, +writes it to secondary memory, +frees the primary memory, +and then goes back to look for a process +to swap in. +.PP +Thus there are two specific algorithms +to the swapping process. +Which of the possibly many processes that +are swapped out is to be swapped in? +This is decided by secondary storage residence +time. +The one with the longest time out is swapped in first. +There is a slight penalty for larger processes. +Which of the possibly many processes that +are loaded is to be swapped out? +Processes that are waiting for slow events +(i.e., not currently running or waiting for +disk I/O) +are picked first, +by age in primary memory, +again with size penalties. +The other processes are examined +by the same age algorithm, +but are not taken out unless they are +at least of some age. +This adds +hysteresis to the swapping and +prevents total thrashing. +.PP +These swapping algorithms are the +most suspect in the system. +With limited primary memory, +these algorithms cause total swapping. +This is not bad in itself, because +the swapping does not impact the +execution of the resident processes. +However, if the swapping device must +also be used for file storage, +the swapping traffic severely +impacts the file system traffic. +It is exactly these small systems +that tend to double usage of limited disk +resources. +.NH 2 +Synchronization and scheduling +.PP +Process synchronization is accomplished by having processes +wait for events. +Events are represented by arbitrary integers. +By convention, +events are chosen to be addresses of +tables associated with those events. 
+For example, a process that is waiting for +any of its children to terminate will wait +for an event that is the address of +its own process table entry. +When a process terminates, +it signals the event represented by +its parent's process table entry. +Signaling an event on which no process +is waiting has no effect. +Similarly, +signaling an event on which many processes +are waiting will wake all of them up. +This differs considerably from +Dijkstra's P and V +synchronization operations, +.[ +dijkstra sequential processes 1968 +.] +in that +no memory is associated with events. +Thus there need be no allocation of events +prior to their use. +Events exist simply by being used. +.PP +On the negative side, +because there is no memory associated with events, +no notion of ``how much'' +can be signaled via the event mechanism. +For example, +processes that want memory might +wait on an event associated with +memory allocation. +When any amount of memory becomes available, +the event would be signaled. +All the competing processes would then wake +up to fight over the new memory. +(In reality, +the swapping process is the only process +that waits for primary memory to become available.) +.PP +If an event occurs +between the time a process decides +to wait for that event and the +time that process enters the wait state, +then +the process will wait on an event that has +already happened (and may never happen again). +This race condition happens because there is no memory associated with +the event to indicate that the event has occurred; +the only action of an event is to change a set of processes +from wait state to run state. +This problem is relieved largely +by the fact that process switching can +only occur in the kernel by explicit calls +to the event-wait mechanism. +If the event in question is signaled by another +process, +then there is no problem. +But if the event is signaled by a hardware +interrupt, +then special care must be taken. 
+These synchronization races pose the biggest +problem when +.UX +is adapted to multiple-processor configurations. +.[ +hawley meyer multiprocessing unix +.] +.PP +The event-wait code in the kernel +is like a co-routine linkage. +At any time, +all but one of the processes have called event-wait. +The remaining process is the one currently executing. +When it calls event-wait, +a process whose event has been signaled +is selected and that process +returns from its call to event-wait. +.PP +Which of the runnable processes is to run next? +Associated with each process is a priority. +The priority of a system process is assigned by the code +issuing the wait on an event. +This is roughly equivalent to the response +that one would expect on such an event. +Disk events have high priority, +teletype events are low, +and time-of-day events are very low. +(From observation, +the difference in system process priorities +has little or no performance impact.) +All user-process priorities are lower than the +lowest system priority. +User-process priorities are assigned +by an algorithm based on the +recent ratio of the amount of compute time to real time consumed +by the process. +A process that has used a lot of +compute time in the last real-time +unit is assigned a low user priority. +Because interactive processes are characterized +by low ratios of compute to real time, +interactive response is maintained without any +special arrangements. +.PP +The scheduling algorithm simply picks +the process with the highest priority, +thus +picking all system processes first and +user processes second. +The compute-to-real-time ratio is updated +every second. +Thus, +all other things being equal, +looping user processes will be +scheduled round-robin with a +1-second quantum. +A high-priority process waking up will +preempt a running, low-priority process. +The scheduling algorithm has a very desirable +negative feedback character.
+If a process uses its high priority +to hog the computer, +its priority will drop. +At the same time, if a low-priority +process is ignored for a long time, +its priority will rise. +.NH +I/O SYSTEM +.PP +The I/O system +is broken into two completely separate systems: +the block I/O system and the character I/O system. +In retrospect, +the names should have been ``structured I/O'' +and ``unstructured I/O,'' respectively; +while the term ``block I/O'' has some meaning, +``character I/O'' is a complete misnomer. +.PP +Devices are characterized by a major device number, +a minor device number, and +a class (block or character). +For each class, +there is an array of entry points into the device drivers. +The major device number is used to index the array +when calling the code for a particular device driver. +The minor device number is passed to the +device driver as an argument. +The minor number has no significance other +than that attributed to it by the driver. +Usually, +the driver uses the minor number to access +one of several identical physical devices. +.PP +The use of the array of entry points +(configuration table) +as the only connection between the +system code and the device drivers is +very important. +Early versions of the system had a much +less formal connection with the drivers, +so that it was extremely hard to handcraft +differently configured systems. +Now it is possible to create new +device drivers in an average of a few hours. +The configuration table in most cases +is created automatically by a program +that reads the system's parts list. +.NH 2 +Block I/O system +.PP +The model block I/O device consists +of randomly addressed, secondary +memory blocks of 512 bytes each. +The blocks are uniformly addressed +0, 1, .\|.\|. up to the size of the device. +The block device driver has the job of +emulating this model on a +physical device. +.PP +The block I/O devices are accessed +through a layer of buffering software. 
+The system maintains a list of buffers +(typically between 10 and 70) +each assigned a device name and +a device address. +This buffer pool constitutes a data cache +for the block devices. +On a read request, +the cache is searched for the desired block. +If the block is found, +the data are made available to the +requester without any physical I/O. +If the block is not in the cache, +the least recently used block in the cache is renamed, +the correct device driver is called to +fill up the renamed buffer, and then the +data are made available. +Write requests are handled in an analogous manner. +The correct buffer is found +and relabeled if necessary. +The write is performed simply by marking +the buffer as ``dirty.'' +The physical I/O is then deferred until +the buffer is renamed. +.PP +The benefits in reduction of physical I/O +of this scheme are substantial, +especially considering the file system implementation. +There are, +however, +some drawbacks. +The asynchronous nature of the +algorithm makes error reporting +and meaningful user error handling +almost impossible. +The cavalier approach to I/O error +handling in the +.UX +system is partly due to the asynchronous +nature of the block I/O system. +A second problem is in the delayed writes. +If the system stops unexpectedly, +it is almost certain that there is a +lot of logically complete, +but physically incomplete, +I/O in the buffers. +There is a system primitive to +flush all outstanding I/O activity +from the buffers. +Periodic use of this primitive helps, +but does not solve, the problem. +Finally, +the associativity in the buffers +can alter the physical I/O sequence +from that of the logical I/O sequence. +This means that there are times +when data structures on disk are inconsistent, +even though the software is careful +to perform I/O in the correct order. +On non-random devices, +notably magnetic tape, +the inversions of writes can be disastrous. 
+The problem with magnetic tapes is ``cured'' by +allowing only one outstanding write request +per drive. +.NH 2 +Character I/O system +.PP +The character I/O system consists of all +devices that do not fall into the block I/O model. +This includes the ``classical'' character devices +such as communications lines, paper tape, and +line printers. +It also includes magnetic tape and disks when +they are not used in a stereotyped way, +for example, 80-byte physical records on tape +and track-at-a-time disk copies. +In short, +the character I/O interface +means ``everything other than block.'' +I/O requests from the user are sent to the +device driver essentially unaltered. +The implementation of these requests is, of course, +up to the device driver. +There are guidelines and conventions +to help the implementation of +certain types of device drivers. +.NH 3 +Disk drivers +.PP +Disk drivers are implemented +with a queue of transaction records. +Each record holds a read/write flag, +a primary memory address, +a secondary memory address, and +a transfer byte count. +Swapping is accomplished by passing +such a record to the swapping device driver. +The block I/O interface is implemented by +passing such records with requests to +fill and empty system buffers. +The character I/O interface to the disk +drivers creates a transaction record that +points directly into the user area. +The routine that creates this record also ensures +that the user is not swapped during this +I/O transaction. +Thus by implementing the general disk driver, +it is possible to use the disk +as a block device, +a character device, and a swap device. +The only really disk-specific code in normal +disk drivers is the pre-sort of transactions to +minimize latency for a particular device, and +the actual issuing of the I/O request. +.NH 3 +Character lists +.PP +Real character-oriented devices may +be implemented using the common +code to handle character lists.
+A character list is a queue of characters. +One routine puts a character on a queue. +Another gets a character from a queue. +It is also possible to ask how many +characters are currently on a queue. +Storage for all queues in the system comes +from a single common pool. +Putting a character on a queue will allocate +space from the common pool and link the +character onto the data structure defining the queue. +Getting a character from a queue returns +the corresponding space to the pool. +.PP +A typical character-output device +(paper tape punch, for example) +is implemented by passing characters +from the user onto a character queue until +some maximum number of characters is on the queue. +The I/O is prodded to start as +soon as there is anything on the queue +and, once started, +it is sustained by hardware completion interrupts. +Each time there is a completion interrupt, +the driver gets the next character from the queue +and sends it to the hardware. +The number of characters on the queue is checked and, +as the count falls through some intermediate level, +an event (the queue address) is signaled. +The process that is passing characters from +the user to the queue can be waiting on the event, and +refill the queue to its maximum +when the event occurs. +.PP +A typical character input device +(for example, a paper tape reader) +is handled in a very similar manner. +.PP +Another class of character devices is the terminals. +A terminal is represented by three +character queues. +There are two input queues (raw and canonical) +and an output queue. +Characters going to the output of a terminal +are handled by common code exactly as described +above. +The main difference is that there is also code +to interpret the output stream as +.UC ASCII +characters and to perform some translations, +e.g., escapes for deficient terminals. +Another common aspect of terminals is code +to insert real-time delay after certain control characters. 
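The character-output scheme described above (fill the queue to a maximum, drain it one character per completion interrupt, and signal an event as the count falls through an intermediate level) can be sketched as a small simulation. This is only an illustration under stated assumptions: the names OutputDevice, HIGH_WATER and LOW_WATER and the two limits are hypothetical, the event is modeled as a direct call to the refill routine, and none of it is taken from the kernel source.

```python
from collections import deque

HIGH_WATER = 8   # hypothetical maximum number of characters on the queue
LOW_WATER = 3    # hypothetical intermediate level at which the event is signaled


class OutputDevice:
    """Toy model of a character-output device fed from a character queue."""

    def __init__(self, data):
        self.pending = deque(data)  # characters the user process has yet to pass on
        self.queue = deque()        # the character queue itself
        self.sent = []              # characters handed to the "hardware"
        self.wakeups = 0            # how many times the event was signaled

    def fill(self):
        """Process side: pass characters onto the queue up to the maximum."""
        while self.pending and len(self.queue) < HIGH_WATER:
            self.queue.append(self.pending.popleft())

    def completion_interrupt(self):
        """Driver side: send the next character, then check the count."""
        if not self.queue:
            return False            # nothing left, so the I/O stops
        self.sent.append(self.queue.popleft())
        if len(self.queue) == LOW_WATER:
            self.wakeups += 1       # signal the event (the queue address in the text)
            self.fill()             # the waiting process refills the queue
        return True


def run(data):
    """Start the I/O and sustain it with completion interrupts until drained."""
    dev = OutputDevice(data)
    dev.fill()
    while dev.completion_interrupt():
        pass
    return dev
```

Signaling at an intermediate level rather than at empty gives the refilling process time to run before the device goes idle, which appears to be the point of the arrangement in the text.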
+.PP +Input on terminals is a little different. +Characters are collected from the terminal and +placed on a raw input queue. +Some device-dependent code conversion and +escape interpretation is handled here. +When a line is complete in the raw queue, +an event is signaled. +The code catching this signal then copies a +line from the raw queue to a canonical queue +performing the character erase and line kill editing. +User read requests on terminals can be +directed at either the raw or canonical queues. +.NH 3 +Other character devices +.PP +Finally, +there are devices that fit no general category. +These devices are set up as character I/O drivers. +An example is a driver that reads and writes +unmapped primary memory as an I/O device. +Some devices are too +fast to be treated a character at a time, +but do not fit the disk I/O mold. +Examples are fast communications lines and +fast line printers. +These devices either have their own buffers +or ``borrow'' block I/O buffers for a while and +then give them back. +.NH +THE FILE SYSTEM +.PP +In the +.UX +system, +a file is a (one-dimensional) array of bytes. +No other structure of files is implied by the +system. +Files are attached anywhere +(and possibly multiply) +onto a hierarchy of directories. +Directories are simply files that +users cannot write. +For a further discussion +of the external view of files and directories, +see Ref.\0 +.[ +ritchie thompson unix bstj 1978 +%Q This issue +.]. +.PP +The +.UX +file system is a disk data structure +accessed completely through +the block I/O system. +As stated before, +the canonical view of a ``disk'' is +a randomly addressable array of +512-byte blocks. +A file system breaks the disk into +four self-identifying regions. +The first block (address 0) +is unused by the file system. +It is left aside for booting procedures.
+The second block (address 1) +contains the so-called ``super-block.'' +This block, +among other things, +contains the size of the disk and +the boundaries of the other regions. +Next comes the i-list, +a list of file definitions. +Each file definition is +a 64-byte structure, called an i-node. +The offset of a particular i-node +within the i-list is called its i-number. +The combination of device name +(major and minor numbers) and i-number +serves to uniquely name a particular file. +After the i-list, +and to the end of the disk, +come free storage blocks that +are available for the contents of files. +.PP +The free space on a disk is maintained +by a linked list of available disk blocks. +Every block in this chain contains a disk address +of the next block in the chain. +The remaining space contains the addresses of up to +50 disk blocks that are also free. +Thus with one I/O operation, +the system obtains 50 free blocks and a +pointer where to find more. +The disk allocation algorithms are +very straightforward. +Since all allocation is in fixed-size +blocks and there is strict accounting of +space, +there is no need to compact or garbage collect. +However, +as disk space becomes dispersed, +latency gradually increases. +Some installations choose to occasionally compact +disk space to reduce latency. +.PP +An i-node contains 13 disk addresses. +The first 10 of these addresses point directly at +the first 10 blocks of a file. +If a file is larger than 10 blocks (5,120 bytes), +then the eleventh address points at a block +that contains the addresses of the next 128 blocks of the file. +If the file is still larger than this +(70,656 bytes), +then the twelfth address points at up to 128 blocks, +each pointing to 128 blocks of the file. +Files yet larger +(8,459,264 bytes) +use the thirteenth address for a ``triple indirect'' address. +The algorithm ends here with the maximum file size +of 1,082,201,087 bytes.
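The size thresholds quoted above follow directly from the 10 direct addresses, the 128 addresses per indirect block, and the 512-byte block size. The sketch below reproduces that arithmetic and maps a logical block number to the level of indirection needed to reach it. The function and constant names are hypothetical and chosen only for illustration; this is not the kernel's block-mapping routine.

```python
BLOCK_SIZE = 512      # bytes per disk block, from the text
DIRECT = 10           # direct addresses in an i-node
PER_INDIRECT = 128    # block addresses held by one indirect block


def level_capacities():
    """Blocks reachable directly and through one, two and three indirections."""
    return [DIRECT, PER_INDIRECT, PER_INDIRECT ** 2, PER_INDIRECT ** 3]


def size_thresholds():
    """Cumulative capacity in bytes after each addressing level."""
    total_blocks = 0
    thresholds = []
    for blocks in level_capacities():
        total_blocks += blocks
        thresholds.append(total_blocks * BLOCK_SIZE)
    return thresholds


def indirection_level(logical_block):
    """Return 0 for a direct block, 1, 2 or 3 for single, double or triple indirect."""
    remaining = logical_block
    for level, blocks in enumerate(level_capacities()):
        if remaining < blocks:
            return level
        remaining -= blocks
    raise ValueError("block number beyond the maximum file size")
```

The first three thresholds come out as 5,120, 70,656 and 8,459,264 bytes, matching the text. The last comes out as 1,082,201,088, one more than the maximum quoted above; that would be consistent with the text giving the largest byte offset rather than a byte count, though that reading is an assumption.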
+.PP +A logical directory hierarchy is added +to this flat physical structure simply +by adding a new type of file, the directory. +A directory is accessed exactly as an ordinary file. +It contains 16-byte entries consisting of +a 14-byte name and an i-number. +The root of the hierarchy is at a known i-number +(\fIviz.,\fR 2). +The file system structure allows an arbitrary, directed graph +of directories with regular files linked in +at arbitrary places in this graph. +In fact, +very early +.UX +systems used such a structure. +Administration of such a structure became so +chaotic that later systems were restricted +to a directory tree. +Even now, +with regular files linked multiply +into arbitrary places in the tree, +accounting for space has become a problem. +It may become necessary to restrict the entire +structure to a tree, +and allow a new form of linking that +is subservient to the tree structure. +.PP +The file system allows +easy creation, +easy removal, +easy random accessing, +and very easy space allocation. +With most physical addresses confined +to a small contiguous section of disk, +it is also easy to dump, restore, and +check the consistency of the file system. +Large files suffer from indirect addressing, +but the cache prevents most of the implied physical I/O +without adding much execution. +The space overhead properties of this scheme are quite good. +For example, +on one particular file system, +there are 25,000 files containing 130M bytes of data-file content. +The overhead (i-node, indirect blocks, and last block breakage) +is about 11.5M bytes. +The directory structure to support these files +has about 1,500 directories containing 0.6M bytes of directory content +and about 0.5M bytes of overhead in accessing the directories. +Added up any way, +this comes out to less than a 10 percent overhead for actual +stored data. +Most systems have this much overhead in +padded trailing blanks alone. 
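The claim that the overhead stays under 10 percent "added up any way" can be checked from the figures in the example above. The short sketch below tries three plausible ways of forming the ratio. The constant names and the three groupings are assumptions made for illustration; the text does not say which grouping was intended.

```python
DATA_MB = 130.0           # data-file content
FILE_OVERHEAD_MB = 11.5   # i-nodes, indirect blocks, last-block breakage
DIR_CONTENT_MB = 0.6      # directory content
DIR_OVERHEAD_MB = 0.5     # overhead in accessing the directories


def overhead_ratios():
    """Overhead as a fraction of stored data, computed three different ways."""
    strict = FILE_OVERHEAD_MB + DIR_OVERHEAD_MB   # directory content treated as data
    broad = strict + DIR_CONTENT_MB               # directory content treated as overhead
    return {
        "broad_vs_data": broad / DATA_MB,
        "broad_vs_total": broad / (DATA_MB + broad),
        "strict_vs_data_and_dirs": strict / (DATA_MB + DIR_CONTENT_MB),
    }
```

Each of the three ratios falls between roughly 8.8 and 9.7 percent, so the conclusion does not depend on how the directory content is classified.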
+.NH 2 +File system implementation +.PP +Because the i-node defines a file, +the implementation of the file system centers +around access to the i-node. +The system maintains a table of all active +i-nodes. +As a new file is accessed, +the system locates the corresponding i-node, +allocates an i-node table entry, and reads +the i-node into primary memory. +As in the buffer cache, +the table entry is considered to be the current +version of the i-node. +Modifications to the i-node are made to +the table entry. +When the last access to the i-node goes +away, +the table entry is copied back to the +secondary store i-list and the table entry is freed. +.KF +.if t .in .25i +.so fig2.pic +.if t .in -.25i +.sp 2v +.ce +Fig. 2\(emFile system data structure. +.sp +.KE +.PP +All I/O operations on files are carried out +with the aid of the corresponding i-node table entry. +The accessing of a file is a straightforward +implementation of the algorithms mentioned previously. +The user is not aware of i-nodes and i-numbers. +References to the file system are made in terms of +path names of the directory tree. +Converting a path name into an i-node table entry +is also straightforward. +Starting at some known i-node +(the root or the current directory of some process), +the next component of the path name is +searched by reading the directory. +This gives an i-number and an implied device +(that of the directory). +Thus the next i-node table entry can be accessed. +If that was the last component of the path name, +then this i-node is the result. +If not, +this i-node is the directory needed to look up +the next component of the path name, and the +algorithm is repeated. +.PP +The user process accesses the file system with +certain primitives. +The most common of these are +.UL open , +.UL create , +.UL read , +.UL write , +.UL seek , +and +.UL close . +The data structures maintained are shown in Fig. 2. 
+In the system data segment associated with a user, +there is room for some (usually between 10 and 50) open files. +This open file table consists of pointers that can be used to access +corresponding i-node table entries. +Associated with each of these open files is +a current I/O pointer. +This is a byte offset of +the next read/write operation on the file. +The system treats each read/write request +as random with an implied seek to the +I/O pointer. +The user usually thinks of the file as +sequential with the I/O pointer +automatically counting the number of bytes +that have been read/written from the file. +The user may, +of course, +perform random I/O by setting the I/O pointer +before reads/writes. +.PP +With file sharing, +it is necessary to allow related +processes to share a common I/O pointer +and yet have separate I/O pointers +for independent processes +that access the same file. +With these two conditions, +the I/O pointer cannot reside +in the i-node table nor can +it reside in the list of +open files for the process. +A new table +(the open file table) +was invented for the sole purpose +of holding the I/O pointer. +Processes that share the same open +file +(the result of +.UL fork s) +share a common open file table entry. +A separate open of the same file will +only share the i-node table entry, +but will have distinct open file table entries. +.PP +The main file system primitives are implemented as follows. +.UL \&open +converts a file system path name into an i-node +table entry. +A pointer to the i-node table entry is placed in a +newly created open file table entry. +A pointer to the file table entry is placed in the +system data segment for the process. +.UL \&create +first creates a new i-node entry, +writes the i-number into a directory, and +then builds the same structure as for an +.UL open . +.UL \&read +and +.UL write +just access the i-node entry as described above. +.UL \&seek +simply manipulates the I/O pointer. 
+No physical seeking is done. +.UL \&close +just frees the structures built by +.UL open +and +.UL create . +Reference counts are kept on the open file table entries and +the i-node table entries to free these structures after +the last reference goes away. +.UL \&unlink +simply decrements the count of the +number of directories pointing at the given i-node. +When the last reference to an i-node table entry +goes away, +if the i-node has no directories pointing to it, +then the file is removed and the i-node is freed. +This delayed removal of files prevents +problems arising from removing active files. +A file may be removed while still open. +The resulting unnamed file vanishes +when the file is closed. +This is a method of obtaining temporary files. +.PP +There is a type of unnamed +.UC FIFO +file called a +.UL pipe. +Implementation of +.UL pipe s +consists of implied +.UL seek s +before each +.UL read +or +.UL write +in order to implement +first-in-first-out. +There are also checks and synchronization +to prevent the +writer from grossly outproducing the +reader and to prevent the reader from +overtaking the writer. +.NH 2 +Mounted file systems +.PP +The file system of a +.UX +system +starts with some designated block device +formatted as described above to contain +a hierarchy. +The root of this structure is the root of +the +.UX +file system. +A second formatted block device may be +mounted +at any leaf of +the current hierarchy. +This logically extends the current hierarchy. +The implementation of +mounting +is trivial. +A mount table is maintained containing +pairs of designated leaf i-nodes and +block devices. +When converting a path name into an i-node, +a check is made to see if the new i-node is a +designated leaf. +If it is, +the i-node of the root +of the block device replaces it. +.PP +Allocation of space for a file is taken +from the free pool on the device on which the +file lives. 
+Thus a file system consisting of many +mounted devices does not have a common pool of +free secondary storage space. +This separation of space on different +devices is necessary to allow easy +unmounting +of a device. +.NH 2 +Other system functions +.PP +There are some other things that the system +does for the user\-a +little accounting, +a little tracing/debugging, +and a little access protection. +Most of these things are not very +well developed +because our use of the system in computing science research +does not need them. +There are some features that are missed in some +applications, for example, better inter-process communication. +.PP +The +.UX +kernel is an I/O multiplexer more than +a complete operating system. +This is as it should be. +Because of this outlook, +many features are +found in most +other operating systems that are missing from the +.UX +kernel. +For example, +the +.UX +kernel does not support +file access methods, +file disposition, +file formats, +file maximum size, +spooling, +command language, +logical records, +physical records, +assignment of logical file names, +logical file names, +more than one character set, +an operator's console, +an operator, +log-in, +or log-out. +Many of these things are symptoms rather than features. +Many of these things are implemented +in user software +using the kernel as a tool. +A good example of this is the command language. +.[ +bourne shell 1978 bstj +%Q This issue +.] +Each user may have his own command language. +Maintenance of such code is as easy as +maintaining user code. +The idea of implementing ``system'' code with general +user primitives +comes directly from +.UC MULTICS . +.[ +organick multics 1972 +.] +.LP +.[ +$LIST$ +.] 
diff --git a/share/doc/psd/02.implement/ref.bib b/share/doc/psd/02.implement/ref.bib new file mode 100644 index 0000000..3414064 --- /dev/null +++ b/share/doc/psd/02.implement/ref.bib @@ -0,0 +1,54 @@ +# $FreeBSD$ + +%T U\s-2NIX\s0 Time-Sharing System: The U\s-2NIX\s0 Shell +%A S. R. Bourne +%K unix bstj +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 1971-1990 +%D 1978 + +%A R. E. Griswold +%A D. R. Hanson +%T An Overview of SL5 +%J SIGPLAN Notices +%V 12 +%N 4 +%D April 1977 +%P 40-50 + +%A E. W. Dijkstra +%T Cooperating Sequential Processes +%B Programming Languages +%E F. Genuys +%I Academic Press +%C New York +%D 1968 +%P 43-112 + +%A J. A. Hawley +%A W. B. Meyer +%T M\s-2UNIX\s0, A Multiprocessing Version of U\s-2NIX\s0 +%K munix unix +%R M.S. Thesis +%I Naval Postgraduate School +%C Monterey, Cal. +%D 1975 + +%T The U\s-2NIX\s0 Time-Sharing System +%K unix bstj +%A D. M. Ritchie +%A K. Thompson +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 1905-1929 +%D 1978 + +%A E. I. Organick +%T The M\s-2ULTICS\s0 System +%K multics +%I M.I.T. Press +%C Cambridge, Mass. +%D 1972 diff --git a/share/doc/psd/03.iosys/Makefile b/share/doc/psd/03.iosys/Makefile new file mode 100644 index 0000000..113bf90 --- /dev/null +++ b/share/doc/psd/03.iosys/Makefile @@ -0,0 +1,8 @@ +# @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= psd/03.iosys +SRCS= iosys +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/psd/03.iosys/iosys b/share/doc/psd/03.iosys/iosys new file mode 100644 index 0000000..ce63bc2 --- /dev/null +++ b/share/doc/psd/03.iosys/iosys @@ -0,0 +1,1086 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. 
+.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)iosys 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.EH 'PSD:3-%''The UNIX I/O System' +.OH 'The UNIX I/O System''PSD:3-%' +.TL +The UNIX I/O System +.AU +Dennis M. Ritchie +.AI +AT&T Bell Laboratories +Murray Hill, NJ +.PP +This paper gives an overview of the workings of the UNIX\(dg +.FS +\(dgUNIX is a Trademark of Bell Laboratories. +.FE +I/O system. 
+It was written with an eye toward providing +guidance to writers of device driver routines, +and is oriented more toward describing the environment +and nature of device drivers than the implementation +of that part of the file system which deals with +ordinary files. +.PP +It is assumed that the reader has a good knowledge +of the overall structure of the file system as discussed +in the paper ``The UNIX Time-sharing System.'' +A more detailed discussion +appears in +``UNIX Implementation;'' +the current document restates parts of that one, +but is still more detailed. +It is most useful in +conjunction with a copy of the system code, +since it is basically an exegesis of that code. +.SH +Device Classes +.PP +There are two classes of device: +.I block +and +.I character. +The block interface is suitable for devices +like disks, tapes, and DECtape +which work, or can work, with addressable 512-byte blocks. +Ordinary magnetic tape just barely fits in this category, +since by use of forward +and +backward spacing any block can be read, even though +blocks can be written only at the end of the tape. +Block devices can at least potentially contain a mounted +file system. +The interface to block devices is very highly structured; +the drivers for these devices share a great many routines +as well as a pool of buffers. +.PP +Character-type devices have a much +more straightforward interface, although +more work must be done by the driver itself. +.PP +Devices of both types are named by a +.I major +and a +.I minor +device number. +These numbers are generally stored as an integer +with the minor device number +in the low-order 8 bits and the major device number +in the next-higher 8 bits; +macros +.I major +and +.I minor +are available to access these numbers. +The major device number selects which driver will deal with +the device; the minor device number is not used +by the rest of the system but is passed to the +driver at appropriate times. 
+Typically the minor number +selects a subdevice attached to +a given controller, or one of +several similar hardware interfaces. +.PP +The major device numbers for block and character devices +are used as indices in separate tables; +they both start at 0 and therefore overlap. +.SH +Overview of I/O +.PP +The purpose of +the +.I open +and +.I creat +system calls is to set up entries in three separate +system tables. +The first of these is the +.I u_ofile +table, +which is stored in the system's per-process +data area +.I u. +This table is indexed by +the file descriptor returned by the +.I open +or +.I creat, +and is accessed during +a +.I read, +.I write, +or other operation on the open file. +An entry contains only +a pointer to the corresponding +entry of the +.I file +table, +which is a per-system data base. +There is one entry in the +.I file +table for each +instance of +.I open +or +.I creat. +This table is per-system because the same instance +of an open file must be shared among the several processes +which can result from +.I forks +after the file is opened. +A +.I file +table entry contains +flags which indicate whether the file +was open for reading or writing or is a pipe, and +a count which is used to decide when all processes +using the entry have terminated or closed the file +(so the entry can be abandoned). +There is also a 32-bit file offset +which is used to indicate where in the file the next read +or write will take place. +Finally, there is a pointer to the +entry for the file in the +.I inode +table, +which contains a copy of the file's i-node. +.PP +Certain open files can be designated ``multiplexed'' +files, and several other flags apply to such +channels. +In such a case, instead of an offset, +there is a pointer to an associated multiplex channel table. +Multiplex channels will not be discussed here. 
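The relationship between the three tables can be sketched as follows. This is a simplified illustration and not the system's own declarations: the table size NOFILE, the field names, and the helpers sketch_open and sketch_fork are assumptions made here, and multiplexed files are ignored.

```c
#include <stddef.h>

#define NOFILE 20                     /* assumed size of the per-process table */

struct inode {                        /* in-core copy of a file's i-node */
    int i_count;                      /* references to this entry */
};

struct file {                         /* one entry per open or creat */
    int f_flag;                       /* open for reading, writing, pipe */
    int f_count;                      /* processes sharing this entry */
    long f_offset;                    /* where the next read or write happens */
    struct inode *f_inode;            /* entry in the inode table */
};

struct user {                         /* per-process data area */
    struct file *u_ofile[NOFILE];     /* indexed by file descriptor */
};

/* Set up a file table slot for ip; return the descriptor, or -1 if none is free. */
int sketch_open(struct user *u, struct file *slot, struct inode *ip, int flag)
{
    int fd;

    for (fd = 0; fd < NOFILE; fd++) {
        if (u->u_ofile[fd] == NULL) {
            slot->f_flag = flag;
            slot->f_count = 1;
            slot->f_offset = 0;
            slot->f_inode = ip;
            ip->i_count++;
            u->u_ofile[fd] = slot;
            return fd;
        }
    }
    return -1;
}

/* After a fork the child shares the parent's file table entries. */
void sketch_fork(const struct user *parent, struct user *child)
{
    int fd;

    for (fd = 0; fd < NOFILE; fd++) {
        child->u_ofile[fd] = parent->u_ofile[fd];
        if (child->u_ofile[fd] != NULL)
            child->u_ofile[fd]->f_count++;
    }
}
```

In this sketch a forked child sees the parent's offset move, because both descriptors point at the same file entry, while an unrelated open of the same file gets its own entry and offset and shares only the inode entry.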
+.PP +An entry in the +.I file +table corresponds precisely to an instance of +.I open +or +.I creat; +if the same file is opened several times, +it will have several +entries in this table. +However, +there is at most one entry +in the +.I inode +table for a given file. +Also, a file may enter the +.I inode +table not only because it is open, +but also because it is the current directory +of some process or because it +is a special file containing a currently-mounted +file system. +.PP +An entry in the +.I inode +table differs somewhat from the +corresponding i-node as stored on the disk; +the modified and accessed times are not stored, +and the entry is augmented +by a flag word containing information about the entry, +a count used to determine when it may be +allowed to disappear, +and the device and i-number +whence the entry came. +Also, the several block numbers that give addressing +information for the file are expanded from +the 3-byte, compressed format used on the disk to full +.I long +quantities. +.PP +During the processing of an +.I open +or +.I creat +call for a special file, +the system always calls the device's +.I open +routine to allow for any special processing +required (rewinding a tape, turning on +the data-terminal-ready lead of a modem, etc.). +However, +the +.I close +routine is called only when the last +process closes a file, +that is, when the i-node table entry +is being deallocated. +Thus it is not feasible +for a device to maintain, or depend on, +a count of its users, although it is quite +possible to +implement an exclusive-use device which cannot +be reopened until it has been closed. +.PP +When a +.I read +or +.I write +takes place, +the user's arguments +and the +.I file +table entry are used to set up the +variables +.I u.u_base, +.I u.u_count, +and +.I u.u_offset +which respectively contain the (user) address +of the I/O target area, the byte-count for the transfer, +and the current location in the file. 
+If the file referred to is +a character-type special file, the appropriate read +or write routine is called; it is responsible +for transferring data and updating the +count and current location appropriately +as discussed below. +Otherwise, the current location is used to calculate +a logical block number in the file. +If the file is an ordinary file the logical block +number must be mapped (possibly using indirect blocks) +to a physical block number; a block-type +special file need not be mapped. +This mapping is performed by the +.I bmap +routine. +In any event, the resulting physical block number +is used, as discussed below, to +read or write the appropriate device. +.SH +Character Device Drivers +.PP +The +.I cdevsw +table specifies the interface routines present for +character devices. +Each device provides five routines: +open, close, read, write, and special-function +(to implement the +.I ioctl +system call). +Any of these may be missing. +If a call on the routine +should be ignored, +(e.g. +.I open +on non-exclusive devices that require no setup) +the +.I cdevsw +entry can be given as +.I nulldev; +if it should be considered an error, +(e.g. +.I write +on read-only devices) +.I nodev +is used. +For terminals, +the +.I cdevsw +structure also contains a pointer to the +.I tty +structure associated with the terminal. +.PP +The +.I open +routine is called each time the file +is opened with the full device number as argument. +The second argument is a flag which is +non-zero only if the device is to be written upon. +.PP +The +.I close +routine is called only when the file +is closed for the last time, +that is when the very last process in +which the file is open closes it. +This means it is not possible for the driver to +maintain its own count of its users. +The first argument is the device number; +the second is a flag which is non-zero +if the file was open for writing in the process which +performs the final +.I close. 
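The dispatch through the character device switch can be pictured with a small sketch. Everything here is illustrative rather than the system's actual declarations: the structure layout, the uniform routine signature, the error cell sketch_error (standing in for the per-user error indication), and the read-only device are assumptions. The major and minor macros follow the 8-bit split described under Device Classes.

```c
#include <errno.h>

#define major(x) (((x) >> 8) & 0377)   /* next-higher 8 bits */
#define minor(x) ((x) & 0377)          /* low-order 8 bits */

typedef int (*devfn)(int dev, int arg);

struct sketch_cdevsw {
    devfn d_open, d_close, d_read, d_write, d_ioctl;
};

int sketch_error;          /* stands in for the per-user error cell */
int sketch_last_minor = -1;

/* Call is silently ignored. */
int nulldev(int dev, int arg)
{
    (void)dev; (void)arg;
    return 0;
}

/* Call is an error. */
int nodev(int dev, int arg)
{
    (void)dev; (void)arg;
    sketch_error = ENODEV;
    return -1;
}

/* Read routine of a hypothetical read-only device; it sees the full device number. */
int ro_read(int dev, int arg)
{
    (void)arg;
    sketch_last_minor = minor(dev);
    return 0;
}

/* Indexed by major device number. */
struct sketch_cdevsw sketch_table[] = {
    { nulldev, nulldev, ro_read, nodev, nodev },   /* major 0 */
};

int sketch_read(int dev, int arg)
{
    return (*sketch_table[major(dev)].d_read)(dev, arg);
}

int sketch_write(int dev, int arg)
{
    return (*sketch_table[major(dev)].d_write)(dev, arg);
}
```

The rest of the system uses only the major number to pick the table entry; the minor number travels inside dev and is interpreted by the driver alone.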
+.PP +When +.I write +is called, it is supplied the device +as argument. +The per-user variable +.I u.u_count +has been set to +the number of characters indicated by the user; +for character devices, this number may be 0 +initially. +.I u.u_base +is the address supplied by the user from which to start +taking characters. +The system may call the +routine internally, so the +flag +.I u.u_segflg +is supplied that indicates, +if +.I on, +that +.I u.u_base +refers to the system address space instead of +the user's. +.PP +The +.I write +routine +should copy up to +.I u.u_count +characters from the user's buffer to the device, +decrementing +.I u.u_count +for each character passed. +For most drivers, which work one character at a time, +the routine +.I "cpass( )" +is used to pick up characters +from the user's buffer. +Successive calls on it return +the characters to be written until +.I u.u_count +goes to 0 or an error occurs, +when it returns \(mi1. +.I Cpass +takes care of interrogating +.I u.u_segflg +and updating +.I u.u_count. +.PP +Write routines which want to transfer +a probably large number of characters into an internal +buffer may also use the routine +.I "iomove(buffer, offset, count, flag)" +which is faster when many characters must be moved. +.I Iomove +transfers up to +.I count +characters into the +.I buffer +starting +.I offset +bytes from the start of the buffer; +.I flag +should be +.I B_WRITE +(which is 0) in the write case. +Caution: +the caller is responsible for making sure +the count is not too large and is non-zero. +As an efficiency note, +.I iomove +is much slower if any of +.I "buffer+offset, count" +or +.I u.u_base +is odd. +.PP +The device's +.I read +routine is called under conditions similar to +.I write, +except that +.I u.u_count +is guaranteed to be non-zero. 
+To return characters to the user, the routine +.I "passc(c)" +is available; it takes care of housekeeping +like +.I cpass +and returns \(mi1 as the last character +specified by +.I u.u_count +is returned to the user; +before that time, 0 is returned. +.I Iomove +is also usable as with +.I write; +the flag should be +.I B_READ +but the same cautions apply. +.PP +The ``special-functions'' routine +is invoked by the +.I stty +and +.I gtty +system calls as follows: +.I "(*p) (dev, v)" +where +.I p +is a pointer to the device's routine, +.I dev +is the device number, +and +.I v +is a vector. +In the +.I gtty +case, +the device is supposed to place up to 3 words of status information +into the vector; this will be returned to the caller. +In the +.I stty +case, +.I v +is 0; +the device should take up to 3 words of +control information from +the array +.I "u.u_arg[0...2]." +.PP +Finally, each device should have appropriate interrupt-time +routines. +When an interrupt occurs, it is turned into a C-compatible call +on the device's interrupt routine. +The interrupt-catching mechanism makes +the low-order four bits of the ``new PS'' word in the +trap vector for the interrupt available +to the interrupt handler. +This is conventionally used by drivers +which deal with multiple similar devices +to encode the minor device number. +After the interrupt has been processed, +a return from the interrupt handler will +return from the interrupt itself. +.PP +A number of subroutines are available which are useful +to character device drivers. +Most of these handlers, for example, need a place +to buffer characters in the internal interface +between their ``top half'' (read/write) +and ``bottom half'' (interrupt) routines. +For relatively low data-rate devices, the best mechanism +is the character queue maintained by the +routines +.I getc +and +.I putc. 
+A queue header has the structure +.DS +struct { + int c_cc; /* character count */ + char *c_cf; /* first character */ + char *c_cl; /* last character */ +} queue; +.DE +A character is placed on the end of a queue by +.I "putc(c, &queue)" +where +.I c +is the character and +.I queue +is the queue header. +The routine returns \(mi1 if there is no space +to put the character, 0 otherwise. +The first character on the queue may be retrieved +by +.I "getc(&queue)" +which returns either the (non-negative) character +or \(mi1 if the queue is empty. +.PP +Notice that the space for characters in queues is +shared among all devices in the system +and in the standard system there are only some 600 +character slots available. +Thus device handlers, +especially write routines, must take +care to avoid gobbling up excessive numbers of characters. +.PP +The other major help available +to device handlers is the sleep-wakeup mechanism. +The call +.I "sleep(event, priority)" +causes the process to wait (allowing other processes to run) +until the +.I event +occurs; +at that time, the process is marked ready-to-run +and the call will return when there is no +process with higher +.I priority. +.PP +The call +.I "wakeup(event)" +indicates that the +.I event +has happened, that is, causes processes sleeping +on the event to be awakened. +The +.I event +is an arbitrary quantity agreed upon +by the sleeper and the waker-up. +By convention, it is the address of some data area used +by the driver, which guarantees that events +are unique. +.PP +Processes sleeping on an event should not assume +that the event has really happened; +they should check that the conditions which +caused them to sleep no longer hold. +.PP +Priorities can range from 0 to 127; +a higher numerical value indicates a less-favored +scheduling situation. +A distinction is made between processes sleeping +at priority less than the parameter +.I PZERO +and those at numerically larger priorities. 
+The former cannot +be interrupted by signals, although it +is conceivable that it may be swapped out. +Thus it is a bad idea to sleep with +priority less than PZERO on an event which might never occur. +On the other hand, calls to +.I sleep +with larger priority +may never return if the process is terminated by +some signal in the meantime. +Incidentally, it is a gross error to call +.I sleep +in a routine called at interrupt time, since the process +which is running is almost certainly not the +process which should go to sleep. +Likewise, none of the variables in the user area +``\fIu\fB.\fR'' +should be touched, let alone changed, by an interrupt routine. +.PP +If a device driver +wishes to wait for some event for which it is inconvenient +or impossible to supply a +.I wakeup, +(for example, a device going on-line, which does not +generally cause an interrupt), +the call +.I "sleep(&lbolt, priority)" +may be given. +.I Lbolt +is an external cell whose address is awakened once every 4 seconds +by the clock interrupt routine. +.PP +The routines +.I "spl4( ), spl5( ), spl6( ), spl7( )" +are available to +set the processor priority level as indicated to avoid +inconvenient interrupts from the device. +.PP +If a device needs to know about real-time intervals, +then +.I "timeout(func, arg, interval)" +will be useful. +This routine arranges that after +.I interval +sixtieths of a second, the +.I func +will be called with +.I arg +as argument, in the style +.I "(*func)(arg)." +Timeouts are used, for example, +to provide real-time delays after function characters +like new-line and tab in typewriter output, +and to terminate an attempt to +read the 201 Dataphone +.I dp +if there is no response within a specified number +of seconds. +Notice that the number of sixtieths of a second is limited to 32767, +since it must appear to be positive, +and that only a bounded number of timeouts +can be going on at once. 
+Also, the specified +.I func +is called at clock-interrupt time, so it should +conform to the requirements of interrupt routines +in general. +.SH +The Block-device Interface +.PP +Handling of block devices is mediated by a collection +of routines that manage a set of buffers containing +the images of blocks of data on the various devices. +The most important purpose of these routines is to assure +that several processes that access the same block of the same +device in multiprogrammed fashion maintain a consistent +view of the data in the block. +A secondary but still important purpose is to increase +the efficiency of the system by +keeping in-core copies of blocks that are being +accessed frequently. +The main data base for this mechanism is the +table of buffers +.I buf. +Each buffer header contains a pair of pointers +.I "(b_forw, b_back)" +which maintain a doubly-linked list +of the buffers associated with a particular +block device, and a +pair of pointers +.I "(av_forw, av_back)" +which generally maintain a doubly-linked list of blocks +which are ``free,'' that is, +eligible to be reallocated for another transaction. +Buffers that have I/O in progress +or are busy for other purposes do not appear in this list. +The buffer header +also contains the device and block number to which the +buffer refers, and a pointer to the actual storage associated with +the buffer. +There is a word count +which is the negative of the number of words +to be transferred to or from the buffer; +there is also an error byte and a residual word +count used to communicate information +from an I/O routine to its caller. +Finally, there is a flag word +with bits indicating the status of the buffer. +These flags will be discussed below. +.PP +Seven routines constitute +the most important part of the interface with the +rest of the system. 
+Given a device and block number, +both +.I bread +and +.I getblk +return a pointer to a buffer header for the block; +the difference is that +.I bread +is guaranteed to return a buffer actually containing the +current data for the block, +while +.I getblk +returns a buffer which contains the data in the +block only if it is already in core (whether it is +or not is indicated by the +.I B_DONE +bit; see below). +In either case the buffer, and the corresponding +device block, is made ``busy,'' +so that other processes referring to it +are obliged to wait until it becomes free. +.I Getblk +is used, for example, +when a block is about to be totally rewritten, +so that its previous contents are +not useful; +still, no other process can be allowed to refer to the block +until the new data is placed into it. +.PP +The +.I breada +routine is used to implement read-ahead. +It is logically similar to +.I bread, +but takes as an additional argument the number of +a block (on the same device) to be read asynchronously +after the specifically requested block is available. +.PP +Given a pointer to a buffer, +the +.I brelse +routine +makes the buffer again available to other processes. +It is called, for example, after +data has been extracted following a +.I bread. +There are three subtly-different write routines, +all of which take a buffer pointer as argument, +and all of which logically release the buffer for +use by others and place it on the free list. +.I Bwrite +puts the +buffer on the appropriate device queue, +waits for the write to be done, +and sets the user's error flag if required. +.I Bawrite +places the buffer on the device's queue, but does not wait +for completion, so that errors cannot be reflected directly to +the user. +.I Bdwrite +does not start any I/O operation at all, +but merely marks +the buffer so that if it happens +to be grabbed from the free list to contain +data from some other block, the data in it will +first be written +out. 
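The difference between the three write routines can be shown with a toy model in which device I/O completes instantly. This is a sketch under stated assumptions, not the system's code: the flag bit values are arbitrary, the structure holds only what the example needs, and every sketch_ name is invented here.

```c
#define B_DONE   01     /* bit values are arbitrary in this sketch */
#define B_BUSY   02
#define B_ASYNC  04
#define B_DELWRI 010

struct sketch_buf {
    int b_flags;
    int b_writes;       /* how many times the block reached the device */
};

/* Release the buffer for use by others. */
void sketch_brelse(struct sketch_buf *bp)
{
    bp->b_flags &= ~(B_BUSY | B_ASYNC);
}

/* Pretend device: the transfer completes immediately. */
void sketch_strategy(struct sketch_buf *bp)
{
    bp->b_writes++;
    bp->b_flags |= B_DONE;
    if (bp->b_flags & B_ASYNC)
        sketch_brelse(bp);          /* asynchronous: released on completion */
}

/* Start I/O, wait for it (a no-op here), then release. */
void sketch_bwrite(struct sketch_buf *bp)
{
    bp->b_flags &= ~(B_DONE | B_DELWRI);
    sketch_strategy(bp);
    sketch_brelse(bp);
}

/* Start I/O and return without waiting. */
void sketch_bawrite(struct sketch_buf *bp)
{
    bp->b_flags |= B_ASYNC;
    bp->b_flags &= ~(B_DONE | B_DELWRI);
    sketch_strategy(bp);
}

/* Start no I/O; just mark the buffer and release it. */
void sketch_bdwrite(struct sketch_buf *bp)
{
    bp->b_flags |= B_DELWRI | B_DONE;
    sketch_brelse(bp);
}

/* Reallocation for some other block: flush a delayed write first. */
void sketch_reuse(struct sketch_buf *bp)
{
    bp->b_flags |= B_BUSY;
    if (bp->b_flags & B_DELWRI)
        sketch_bwrite(bp);
    bp->b_flags = B_BUSY;           /* old contents no longer valid */
}
```

In this model a buffer passed to sketch_bdwrite reaches the device only when sketch_reuse claims it for another block, which is the behavior the text attributes to a delayed write.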
+.PP +.I Bwrite +is used when one wants to be sure that +I/O takes place correctly, and that +errors are reflected to the proper user; +it is used, for example, when updating i-nodes. +.I Bawrite +is useful when more overlap is desired +(because no wait is required for I/O to finish) +but when it is reasonably certain that the +write is really required. +.I Bdwrite +is used when there is doubt that the write is +needed at the moment. +For example, +.I bdwrite +is called when the last byte of a +.I write +system call falls short of the end of a +block, on the assumption that +another +.I write +will be given soon which will re-use the same block. +On the other hand, +as the end of a block is passed, +.I bawrite +is called, since probably the block will +not be accessed again soon and one might as +well start the writing process as soon as possible. +.PP +In any event, notice that the routines +.I "getblk" +and +.I bread +dedicate the given block exclusively to the +use of the caller, and make others wait, +while one of +.I "brelse, bwrite, bawrite," +or +.I bdwrite +must eventually be called to free the block for use by others. +.PP +As mentioned, each buffer header contains a flag +word which indicates the status of the buffer. +Since they provide +one important channel for information between the drivers and the +block I/O system, it is important to understand these flags. +The following names are manifest constants which +select the associated flag bits. +.IP B_READ 10 +This bit is set when the buffer is handed to the device strategy routine +(see below) to indicate a read operation. +The symbol +.I B_WRITE +is defined as 0 and does not define a flag; it is provided +as a mnemonic convenience to callers of routines like +.I swap +which have a separate argument +which indicates read or write. 
+.IP B_DONE 10 +This bit is set +to 0 when a block is handed to the device strategy +routine and is turned on when the operation completes, +whether normally or as the result of an error. +It is also used as part of the return argument of +.I getblk +to indicate, if 1, that the returned +buffer actually contains the data in the requested block. +.IP B_ERROR 10 +This bit may be set to 1 when +.I B_DONE +is set to indicate that an I/O or other error occurred. +If it is set the +.I b_error +byte of the buffer header may contain an error code +if it is non-zero. +If +.I b_error +is 0 the nature of the error is not specified. +Actually no driver at present sets +.I b_error; +the latter is provided for a future improvement +whereby a more detailed error-reporting +scheme may be implemented. +.IP B_BUSY 10 +This bit indicates that the buffer header is not on +the free list, i.e. is +dedicated to someone's exclusive use. +The buffer still remains attached to the list of +blocks associated with its device, however. +When +.I getblk +(or +.I bread, +which calls it) searches the buffer list +for a given device and finds the requested +block with this bit on, it sleeps until the bit +clears. +.IP B_PHYS 10 +This bit is set for raw I/O transactions that +need to allocate the Unibus map on an 11/70. +.IP B_MAP 10 +This bit is set on buffers that have the Unibus map allocated, +so that the +.I iodone +routine knows to deallocate the map. +.IP B_WANTED 10 +This flag is used in conjunction with the +.I B_BUSY +bit. +Before sleeping as described +just above, +.I getblk +sets this flag. +Conversely, when the block is freed and the busy bit +goes down (in +.I brelse) +a +.I wakeup +is given for the block header whenever +.I B_WANTED +is on. +This stratagem avoids the overhead +of having to call +.I wakeup +every time a buffer is freed on the chance that someone +might want it.
+.IP B_AGE 10 +This bit may be set on buffers just before releasing them; if it +is on, +the buffer is placed at the head of the free list, rather than at the +tail. +It is a performance heuristic +used when the caller judges that the same block will not soon be used again. +.IP B_ASYNC 10 +This bit is set by +.I bawrite +to indicate to the appropriate device driver +that the buffer should be released when the +write has been finished, usually at interrupt time. +The difference between +.I bwrite +and +.I bawrite +is that the former starts I/O, waits until it is done, and +frees the buffer. +The latter merely sets this bit and starts I/O. +The bit indicates that +.I brelse +should be called for the buffer on completion. +.IP B_DELWRI 10 +This bit is set by +.I bdwrite +before releasing the buffer. +When +.I getblk, +while searching for a free block, +discovers the bit is 1 in a buffer it would otherwise grab, +it causes the block to be written out before reusing it. +.SH +Block Device Drivers +.PP +The +.I bdevsw +table contains the names of the interface routines +and that of a table for each block device. +.PP +Just as for character devices, block device drivers may supply +an +.I open +and a +.I close +routine +called respectively on each open and on the final close +of the device. +Instead of separate read and write routines, +each block device driver has a +.I strategy +routine which is called with a pointer to a buffer +header as argument. +As discussed, the buffer header contains +a read/write flag, the core address, +the block number, a (negative) word count, +and the major and minor device number. +The role of the strategy routine +is to carry out the operation as requested by the +information in the buffer header. +When the transaction is complete the +.I B_DONE +(and possibly the +.I B_ERROR) +bits should be set. +Then if the +.I B_ASYNC +bit is set, +.I brelse +should be called; +otherwise, +.I wakeup.
+In cases where the device +is capable, under error-free operation, +of transferring fewer words than requested, +the device's word-count register should be placed +in the residual count slot of +the buffer header; +otherwise, the residual count should be set to 0. +This particular mechanism is really for the benefit +of the magtape driver; +when reading this device +records shorter than requested are quite normal, +and the user should be told the actual length of the record. +.PP +Although the most usual argument +to the strategy routines +is a genuine buffer header allocated as discussed above, +all that is actually required +is that the argument be a pointer to a place containing the +appropriate information. +For example the +.I swap +routine, which manages movement +of core images to and from the swapping device, +uses the strategy routine +for this device. +Care has to be taken that +no extraneous bits get turned on in the +flag word. +.PP +The device's table specified by +.I bdevsw +has a +byte to contain an active flag and an error count, +a pair of links which constitute the +head of the chain of buffers for the device +.I "(b_forw, b_back)," +and a first and last pointer for a device queue. +Of these things, all are used solely by the device driver +itself +except for the buffer-chain pointers. +Typically the flag encodes the state of the +device, and is used at a minimum to +indicate that the device is currently engaged in +transferring information and no new command should be issued. +The error count is useful for counting retries +when errors occur. +The device queue is used to remember stacked requests; +in the simplest case it may be maintained as a first-in +first-out list. +Since buffers which have been handed over to +the strategy routines are never +on the list of free buffers, +the pointers in the buffer which maintain the free list +.I "(av_forw, av_back)" +are also used to contain the pointers +which maintain the device queues. 
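The interplay of the strategy routine, the device queue and the completion flags can be sketched as follows. This is a user-space illustration under stated assumptions rather than a real driver: the flag values, the structure layouts, the field names d_active, d_actf and d_actl, and the routines toy_strategy and toy_intr are chosen for the example, and the calls to brelse and wakeup are replaced by counters so that the flow can be observed.

```c
#include <stddef.h>

/* Flag values are assumptions; the real ones come from the system header. */
#define B_READ   01
#define B_DONE   02
#define B_ERROR  04
#define B_ASYNC  010

struct toybuf {
	int b_flags;
	int b_blkno;
	struct toybuf *av_forw;   /* free-list link, reused as the queue link */
};

struct toydevtab {
	int d_active;             /* nonzero while a transfer is in progress */
	struct toybuf *d_actf;    /* first request on the device queue */
	struct toybuf *d_actl;    /* last request on the device queue */
};

static int released;              /* stands in for calls to brelse */
static int woken;                 /* stands in for calls to wakeup */

/* Start the device on the head of the queue, if there is one. */
static void toy_start(struct toydevtab *dp)
{
	dp->d_active = (dp->d_actf != NULL);
}

/* Strategy: clear the completion bits, append the request to the
 * first-in first-out queue, and start the device if it is idle. */
void toy_strategy(struct toydevtab *dp, struct toybuf *bp)
{
	bp->b_flags &= ~(B_DONE | B_ERROR);
	bp->av_forw = NULL;
	if (dp->d_actf == NULL)
		dp->d_actf = bp;
	else
		dp->d_actl->av_forw = bp;
	dp->d_actl = bp;
	if (!dp->d_active)
		toy_start(dp);
}

/* Completion ("interrupt"): finish the head request, mark it done,
 * release or awaken as B_ASYNC dictates, then start the next one.
 * Returns the completed buffer, or NULL if the queue was empty. */
struct toybuf *toy_intr(struct toydevtab *dp, int error)
{
	struct toybuf *bp = dp->d_actf;

	if (bp == NULL)
		return NULL;
	dp->d_actf = bp->av_forw;
	bp->b_flags |= B_DONE;
	if (error)
		bp->b_flags |= B_ERROR;
	if (bp->b_flags & B_ASYNC)
		released++;
	else
		woken++;
	toy_start(dp);
	return bp;
}
```

Because a buffer handed to the strategy routine is never on the free list at the same time, the sketch can reuse av_forw as the queue link without conflict, as the paragraph above notes.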
+.PP +A couple of routines +are provided which are useful to block device drivers. +.I "iodone(bp)" +arranges that the buffer to which +.I bp +points be released or awakened, +as appropriate, +when the +strategy module has finished with the buffer, +either normally or after an error. +(In the latter case the +.I B_ERROR +bit has presumably been set.) +.PP +The routine +.I "geterror(bp)" +can be used to examine the error bit in a buffer header +and arrange that any error indication found therein is +reflected to the user. +It may be called only in the non-interrupt +part of a driver when I/O has completed +.I (B_DONE +has been set). +.SH +Raw Block-device I/O +.PP +A scheme has been set up whereby block device drivers may +provide the ability to transfer information +directly between the user's core image and the device +without the use of buffers and in blocks as large as +the caller requests. +The method involves setting up a character-type special file +corresponding to the raw device +and providing +.I read +and +.I write +routines which set up what is usually a private, +non-shared buffer header with the appropriate information +and call the device's strategy routine. +If desired, separate +.I open +and +.I close +routines may be provided but this is usually unnecessary. +A special-function routine might come in handy, especially for +magtape. +.PP +A great deal of work has to be done to generate the +``appropriate information'' +to put in the argument buffer for +the strategy module; +the worst part is to map relocated user addresses to physical addresses. +Most of this work is done by +.I "physio(strat, bp, dev, rw)" +whose arguments are the name of the +strategy routine +.I strat, +the buffer pointer +.I bp, +the device number +.I dev, +and a read-write flag +.I rw +whose value is either +.I B_READ +or +.I B_WRITE.
+.I Physio +makes sure that the user's base address and count are +even (because most devices work in words) +and that the core area affected is contiguous +in physical space; +it delays until the buffer is not busy, and makes it +busy while the operation is in progress; +and it sets up user error return information. diff --git a/share/doc/psd/04.uprog/Makefile b/share/doc/psd/04.uprog/Makefile new file mode 100644 index 0000000..f149dcf --- /dev/null +++ b/share/doc/psd/04.uprog/Makefile @@ -0,0 +1,8 @@ +# @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= psd/04.uprog +SRCS= p.mac p0 p1 p2 p3 p4 p5 p6 p8 p9 +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/psd/04.uprog/p.mac b/share/doc/psd/04.uprog/p.mac new file mode 100644 index 0000000..7f9295c --- /dev/null +++ b/share/doc/psd/04.uprog/p.mac @@ -0,0 +1,71 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. 
+.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.\" @(#)p.mac 8.1 (Berkeley) 6/8/93 +.de UC +\&\\$3\s-1\\$1\\s0\&\\$2 +.. +.de IT +\&\\$3\fI\\$1\fR\^\&\\$2 +.. +.de UL +\%\&\\$3\f(CW\s-1\\$1\s0\fR\&\\$2 +.. +.de P1 +.DS I .5i +.nf +.ft CW +.ps \\n(PS-1 +.vs \\n(VS-1 +.. +.de P2 +.ps \\n(PS +.vs \\n(VS +.ft R +.DE +.. +.hy 14 \"2=not last lines; 4= no -xx; 8=no xx- +.am SH +.ft R +.. +.am NH +.ft R +.. +.am TL +.ft R +.. diff --git a/share/doc/psd/04.uprog/p0 b/share/doc/psd/04.uprog/p0 new file mode 100644 index 0000000..5aaf325 --- /dev/null +++ b/share/doc/psd/04.uprog/p0 @@ -0,0 +1,82 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. 
+.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.\" @(#)p0 8.1 (Berkeley) 6/8/93 +.\" +.if n .ls 1 +.\" .TM 78-1273-9 39199 39199-11 +.\" .ND October 2, 1978 +.\" .old TM 75-1273-11 October 22, 1975 +.OH 'UNIX Programming \(em Second Edition''PSD:4-%' +.EH 'PSD:4-%''UNIX Programming \(em Second Edition' +.TL +UNIX Programming \(em Second Edition +.AU "MH 2C-518" 6021 +Brian W. Kernighan +.AU "MH 2C-517" 3770 +Dennis M. 
Ritchie +.AI +AT&T Bell Laboratories +Murray Hill, NJ 07974 +.AB +.PP +This paper is an introduction to programming on +the +.UX +system. +The emphasis is on how to write programs that interface +to the operating system, +either directly or through the standard I/O library. +The topics discussed include +.IP " \(bu" +handling command arguments +.IP " \(bu" +rudimentary I/O; the standard input and output +.IP " \(bu" +the standard I/O library; file system access +.IP " \(bu" +low-level I/O: open, read, write, close, seek +.IP " \(bu" +processes: exec, fork, pipes +.IP " \(bu" +signals \(em interrupts, etc. +.PP +There is also an appendix which describes +the standard I/O library in detail. +.AE +.\" .CS 17 0 17 0 0 4 diff --git a/share/doc/psd/04.uprog/p1 b/share/doc/psd/04.uprog/p1 new file mode 100644 index 0000000..848862e --- /dev/null +++ b/share/doc/psd/04.uprog/p1 @@ -0,0 +1,88 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. 
+.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.\" @(#)p1 8.1 (Berkeley) 6/8/93 +.\" +.if n .ls 2 +.if t .tr |\(or +.NH +INTRODUCTION +.PP +This paper describes how to write +programs +that interface with the +.UC UNIX +operating system in a non-trivial way. +This includes programs that use files by name, +that use pipes, +that invoke other commands as they run, +or that attempt to catch interrupts and other signals +during execution. +.PP +The document collects material which is scattered +throughout several sections of +.I +The +.UC UNIX +Programmer's Manual +.R +[1] +for Version 7 +.UC UNIX . +There is no attempt to be complete; +only generally useful material is dealt with. +It is assumed that you will be programming in C, +so you must be able to read the language +roughly up to the level of +.I +The C Programming Language +.R +[2]. +Some of the material in sections 2 through 4 +is based on +topics covered more carefully there. +You should also be familiar with +.UC UNIX +itself +at least +to the level of +.I +.UC UNIX +for Beginners +.R +[3]. 
diff --git a/share/doc/psd/04.uprog/p2 b/share/doc/psd/04.uprog/p2 new file mode 100644 index 0000000..280f65c --- /dev/null +++ b/share/doc/psd/04.uprog/p2 @@ -0,0 +1,275 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.\" @(#)p2 8.1 (Berkeley) 6/8/93 +.\" +.NH +BASICS +.NH 2 +Program Arguments +.PP +When a C program is run as a command, +the arguments on the command line are made available +to the +function +.UL main +as an argument count +.UL argc +and an array +.UL argv +of +pointers to +character strings +that contain +the arguments. +By convention, +.UL argv[0] +is the command name itself, +so +.UL argc +is always greater than 0. +.PP +The following program illustrates the mechanism: +it simply echoes its arguments +back to the terminal. +(This is essentially the +.UL echo +command.) +.P1 +main(argc, argv) /* echo arguments */ +int argc; +char *argv[]; +{ + int i; + + for (i = 1; i < argc; i++) + printf("%s%c", argv[i], (i<argc-1) ? ' ' : '\en'); +} +.P2 +.UL argv +is a pointer to an array +whose individual elements are pointers to arrays of characters; +each is terminated by +.UL \e0 , +so they can be treated as strings. +The program starts by printing +.UL argv[1] +and loops until it has printed them all. +.PP +The argument count and the arguments +are parameters to +.UL main . +If you want to keep them around so other +routines can get at them, you must +copy them to external variables. +.NH 2 +The ``Standard Input'' and ``Standard Output'' +.PP +The simplest input mechanism is to read the ``standard input,'' +which is generally the user's terminal. +The function +.UL getchar +returns the next input character each time it is called. 
+A file may be substituted for the terminal by +using the +.UL < +convention: +if +.UL prog +uses +.UL getchar , +then +the command line +.P1 +prog <file +.P2 +causes +.UL prog +to read +.UL file +instead of the terminal. +.UL prog +itself need know nothing about where its input +is coming from. +This is also true if the input comes from another program via +the +.U +pipe mechanism: +.P1 +otherprog | prog +.P2 +provides the standard input for +.UL prog +from the standard output of +.UL otherprog. +.PP +.UL getchar +returns the value +.UL EOF +when it encounters the end of file +(or an error) +on whatever you are reading. +The value of +.UL EOF +is normally defined to be +.UL -1 , +but it is unwise to take any advantage +of that knowledge. +As will become clear shortly, +this value is automatically defined for you when +you compile a program, +and need not be of any concern. +.PP +Similarly, +.UL putchar(c) +puts the character +.UL c +on the ``standard output,'' +which is also by default the terminal. +The output can be captured on a file +by using +.UL > : +if +.UL prog +uses +.UL putchar , +.P1 +prog >outfile +.P2 +writes the standard output on +.UL outfile +instead of the terminal. +.UL outfile +is created if it doesn't exist; +if it already exists, its previous contents are overwritten. +And a pipe can be used: +.P1 +prog | otherprog +.P2 +puts the standard output of +.UL prog +into the standard input of +.UL otherprog. +.PP +The function +.UL printf , +which formats output in various ways, +uses +the same mechanism as +.UL putchar +does, +so calls to +.UL printf +and +.UL putchar +may be intermixed in any order; +the output will appear in the order of the calls. +.PP +Similarly, the function +.UL scanf +provides for formatted input conversion; +it will read the standard input and break it +up into strings, numbers, etc., +as desired. +.UL scanf +uses the same mechanism as +.UL getchar , +so calls to them may also be intermixed. 
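As a small illustration of intermixing these calls, the sketch below copies the standard input to the standard output and numbers each line, using printf for the number and putchar for the text. The function name number_lines and the field width are choices made for this example, and it is written with a modern prototype rather than in the older style of the programs in this paper.

```c
#include <stdio.h>

/* Copy standard input to standard output, prefixing each line with
 * its number.  Returns the number of lines that were started. */
int number_lines(void)
{
	int c;
	int lines = 0;
	int at_start = 1;       /* true when the next character begins a line */

	while ((c = getchar()) != EOF) {
		if (at_start) {
			printf("%4d  ", ++lines);   /* formatted output ... */
			at_start = 0;
		}
		putchar(c);                         /* ... mixed with putchar */
		if (c == '\n')
			at_start = 1;
	}
	return lines;
}
```

A main routine that simply calls number_lines could then be run as prog <file, or placed at either end of a pipe, without any change to the function itself.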
+.PP +Many programs +read only one input and write one output; +for such programs I/O +with +.UL getchar , +.UL putchar , +.UL scanf , +and +.UL printf +may be entirely adequate, +and it is almost always enough to get started. +This is particularly true if +the +.UC UNIX +pipe facility is used to connect the output of +one program to the input of the next. +For example, the following program +strips out all ascii control characters +from its input +(except for newline and tab). +.P1 +#include <stdio.h> + +main() /* ccstrip: strip non-graphic characters */ +{ + int c; + while ((c = getchar()) != EOF) + if ((c >= ' ' && c < 0177) || c == '\et' || c == '\en') + putchar(c); + exit(0); +} +.P2 +The line +.P1 +#include <stdio.h> +.P2 +should appear at the beginning of each source file. +It causes the C compiler to read a file +.IT /usr/include/stdio.h ) ( +of +standard routines and symbols +that includes the definition of +.UL EOF . +.PP +If it is necessary to treat multiple files, +you can use +.UL cat +to collect the files for you: +.P1 +cat file1 file2 ... | ccstrip >output +.P2 +and thus avoid learning how to access files from a program. +By the way, +the call to +.UL exit +at the end is not necessary to make the program work +properly, +but it assures that any caller +of the program will see a normal termination status +(conventionally 0) +from the program when it completes. +Section 6 discusses status returns in more detail. diff --git a/share/doc/psd/04.uprog/p3 b/share/doc/psd/04.uprog/p3 new file mode 100644 index 0000000..201c4a9 --- /dev/null +++ b/share/doc/psd/04.uprog/p3 @@ -0,0 +1,469 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+.\" +.\" $FreeBSD$ +.\" +.\" @(#)p3 8.1 (Berkeley) 6/8/93 +.\" +.NH +THE STANDARD I/O LIBRARY +.PP +The ``Standard I/O Library'' +is a collection of routines +intended to provide +efficient +and portable +I/O services +for most C programs. +The standard I/O library is available on each system that supports C, +so programs that confine +their system interactions +to its facilities +can be transported from one system to another essentially without change. +.PP +In this section, we will discuss the basics of the standard I/O library. +The appendix contains a more complete description of its capabilities. +.NH 2 +File Access +.PP +The programs written so far have all +read the standard input and written the standard output, +which we have assumed are magically pre-defined. +The next step +is to write a program that accesses +a file that is +.ul +not +already connected to the program. +One simple example is +.IT wc , +which counts the lines, words and characters +in a set of files. +For instance, the command +.P1 +wc x.c y.c +.P2 +prints the number of lines, words and characters +in +.UL x.c +and +.UL y.c +and the totals. +.PP +The question is how to arrange for the named files +to be read \(em +that is, how to connect the file system names +to the I/O statements which actually read the data. +.PP +The rules are simple. +Before it can be read or written +a file has to be +.ul +opened +by the standard library function +.UL fopen . +.UL fopen +takes an external name +(like +.UL x.c +or +.UL y.c ), +does some housekeeping and negotiation with the operating system, +and returns an internal name +which must be used in subsequent +reads or writes of the file. +.PP +This internal name is actually a pointer, +called a +.IT file +.IT pointer , +to a structure +which contains information about the file, +such as the location of a buffer, +the current character position in the buffer, +whether the file is being read or written, +and the like. 
+Users don't need to know the details, +because part of the standard I/O definitions +obtained by including +.UL stdio.h +is a structure definition called +.UL FILE . +The only declaration needed for a file pointer +is exemplified by +.P1 +FILE *fp, *fopen(); +.P2 +This says that +.UL fp +is a pointer to a +.UL FILE , +and +.UL fopen +returns a pointer to +a +.UL FILE . +.UL FILE \& ( +is a type name, like +.UL int , +not a structure tag. +.PP +The actual call to +.UL fopen +in a program +is +.P1 +fp = fopen(name, mode); +.P2 +The first argument of +.UL fopen +is the +name +of the file, +as a character string. +The second argument is the +mode, +also as a character string, +which indicates how you intend to +use the file. +The only allowable modes are +read +.UL \&"r" ), ( +write +.UL \&"w" ), ( +or append +.UL \&"a" ). ( +.PP +If a file that you open for writing or appending does not exist, +it is created +(if possible). +Opening an existing file for writing causes the old contents +to be discarded. +Trying to read a file that does not exist +is an error, +and there may be other causes of error +as well +(like trying to read a file +when you don't have permission). +If there is any error, +.UL fopen +will return the null pointer +value +.UL NULL +(which is defined as zero in +.UL stdio.h ). +.PP +The next thing needed is a way to read or write the file +once it is open. +There are several possibilities, +of which +.UL getc +and +.UL putc +are the simplest. +.UL getc +returns the next character from a file; +it needs the file pointer to tell it what file. +Thus +.P1 +c = getc(fp) +.P2 +places in +.UL c +the next character from the file referred to by +.UL fp ; +it returns +.UL EOF +when it reaches end of file. +.UL putc +is the inverse of +.UL getc : +.P1 +putc(c, fp) +.P2 +puts the character +.UL c +on the file +.UL fp +and returns +.UL c . +.UL getc +and +.UL putc +return +.UL EOF +on error. 
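Putting fopen, getc and putc together, a file copy might be sketched as below. The function name copy_file and its return convention are assumptions made for this example; fclose, which is described later in this section, is used to release both file pointers. The sketch is written with a modern prototype rather than in the older style used elsewhere in this paper.

```c
#include <stdio.h>

/* Copy the file named src to the file named dst, one character at a
 * time.  Returns the number of characters copied, or -1 if either
 * file cannot be opened. */
long copy_file(const char *src, const char *dst)
{
	FILE *in, *out;
	int c;                  /* int, not char, so that EOF is distinguishable */
	long n = 0;

	if ((in = fopen(src, "r")) == NULL)
		return -1;
	if ((out = fopen(dst, "w")) == NULL) {
		fclose(in);
		return -1;
	}
	while ((c = getc(in)) != EOF) {
		putc(c, out);
		n++;
	}
	fclose(in);
	fclose(out);
	return n;
}
```

Note that opening dst with mode "w" discards any previous contents, as explained above, so the copy replaces the destination rather than appending to it.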
+.PP +When a program is started, three files are opened automatically, +and file pointers are provided for them. +These files are the standard input, +the standard output, +and the standard error output; +the corresponding file pointers are +called +.UL stdin , +.UL stdout , +and +.UL stderr . +Normally these are all connected to the terminal, +but +may be redirected to files or pipes as described in +Section 2.2. +.UL stdin , +.UL stdout +and +.UL stderr +are pre-defined in the I/O library +as the standard input, output and error files; +they may be used anywhere an object of type +.UL FILE\ * +can be. +They are +constants, however, +.ul +not +variables, +so don't try to assign to them. +.PP +With some of the preliminaries out of the way, +we can now write +.IT wc . +The basic design +is one that has been found +convenient for many programs: +if there are command-line arguments, they are processed in order. +If there are no arguments, the standard input +is processed. +This way the program can be used stand-alone +or as part of a larger process. +.P1 +#include <stdio.h> + +main(argc, argv) /* wc: count lines, words, chars */ +int argc; +char *argv[]; +{ + int c, i, inword; + FILE *fp, *fopen(); + long linect, wordct, charct; + long tlinect = 0, twordct = 0, tcharct = 0; + + i = 1; + fp = stdin; + do { + if (argc > 1 && (fp=fopen(argv[i], "r")) == NULL) { + fprintf(stderr, "wc: can't open %s\en", argv[i]); + continue; + } + linect = wordct = charct = inword = 0; + while ((c = getc(fp)) != EOF) { + charct++; + if (c == '\en') + linect++; + if (c == ' ' || c == '\et' || c == '\en') + inword = 0; + else if (inword == 0) { + inword = 1; + wordct++; + } + } + printf("%7ld %7ld %7ld", linect, wordct, charct); + printf(argc > 1 ? 
" %s\en" : "\en", argv[i]); + fclose(fp); + tlinect += linect; + twordct += wordct; + tcharct += charct; + } while (++i < argc); + if (argc > 2) + printf("%7ld %7ld %7ld total\en", tlinect, twordct, tcharct); + exit(0); +} +.P2 +The function +.UL fprintf +is identical to +.UL printf , +save that the first argument is a file pointer +that specifies the file to be +written. +.PP +The function +.UL fclose +is the inverse of +.UL fopen ; +it breaks the connection between the file pointer and the external name +that was established by +.UL fopen , +freeing the +file pointer for another file. +Since there is a limit on the number +of files +that a program may have open simultaneously, +it's a good idea to free things when they are no longer needed. +There is also another reason to call +.UL fclose +on an output file +\(em it flushes the buffer +in which +.UL putc +is collecting output. +.UL fclose \& ( +is called automatically for each open file +when a program terminates normally.) +.NH 2 +Error Handling \(em Stderr and Exit +.PP +.UL stderr +is assigned to a program in the same way that +.UL stdin +and +.UL stdout +are. +Output written on +.UL stderr +appears on the user's terminal +even if the standard output is redirected. +.IT wc +writes its diagnostics on +.UL stderr +instead of +.UL stdout +so that if one of the files can't +be accessed for some reason, +the message +finds its way to the user's terminal instead of disappearing +down a pipeline +or into an output file. +.PP +The program actually signals errors in another way, +using the function +.UL exit +to terminate program execution. +The argument of +.UL exit +is available to whatever process +called it (see Section 6), +so the success or failure +of the program can be tested by another program +that uses this one as a sub-process. +By convention, a return value of 0 +signals that all is well; +non-zero values signal abnormal situations. 
+.PP +.UL exit +itself +calls +.UL fclose +for each open output file, +to flush out any buffered output, +then calls +a routine named +.UL _exit . +The function +.UL _exit +causes immediate termination without any buffer flushing; +it may be called directly if desired. +.NH 2 +Miscellaneous I/O Functions +.PP +The standard I/O library provides several other I/O functions +besides those we have illustrated above. +.PP +Normally output with +.UL putc , +etc., is buffered (except to +.UL stderr ); +to force it out immediately, use +.UL fflush(fp) . +.PP +.UL fscanf +is identical to +.UL scanf , +except that its first argument is a file pointer +(as with +.UL fprintf ) +that specifies the file from which the input comes; +it returns +.UL EOF +at end of file. +.PP +The functions +.UL sscanf +and +.UL sprintf +are identical to +.UL fscanf +and +.UL fprintf , +except that the first argument names a character string +instead of a file pointer. +The conversion is done from the string +for +.UL sscanf +and into it for +.UL sprintf . +.PP +.UL fgets(buf,\ size,\ fp) +copies the next line from +.UL fp , +up to and including a newline, +into +.UL buf ; +at most +.UL size-1 +characters are copied; +it returns +.UL NULL +at end of file. +.UL fputs(buf,\ fp) +writes the string in +.UL buf +onto file +.UL fp . +.PP +The function +.UL ungetc(c,\ fp) +``pushes back'' the character +.UL c +onto the input stream +.UL fp ; +a subsequent call to +.UL getc , +.UL fscanf , +etc., +will encounter +.UL c . +Only one character of pushback per file is permitted. diff --git a/share/doc/psd/04.uprog/p4 b/share/doc/psd/04.uprog/p4 new file mode 100644 index 0000000..fe23ac3 --- /dev/null +++ b/share/doc/psd/04.uprog/p4 @@ -0,0 +1,600 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.\" @(#)p4 8.1 (Berkeley) 6/8/93 +.\" +.NH +LOW-LEVEL I/O +.PP +This section describes the +bottom level of I/O on the +.UC UNIX +system. 
+The lowest level of I/O in +.UC UNIX +provides no buffering or any other services; +it is in fact a direct entry into the operating system. +You are entirely on your own, +but on the other hand, +you have the most control over what happens. +And since the calls and usage are quite simple, +this isn't as bad as it sounds. +.NH 2 +File Descriptors +.PP +In the +.UC UNIX +operating system, +all input and output is done +by reading or writing files, +because all peripheral devices, even the user's terminal, +are files in the file system. +This means that a single, homogeneous interface +handles all communication between a program and peripheral devices. +.PP +In the most general case, +before reading or writing a file, +it is necessary to inform the system +of your intent to do so, +a process called +``opening'' the file. +If you are going to write on a file, +it may also be necessary to create it. +The system checks your right to do so +(Does the file exist? +Do you have permission to access it?), +and if all is well, +returns a small positive integer +called a +.ul +file descriptor. +Whenever I/O is to be done on the file, +the file descriptor is used instead of the name to identify the file. +(This is roughly analogous to the use of +.UC READ(5,...) +and +.UC WRITE(6,...) +in Fortran.) +All +information about an open file is maintained by the system; +the user program refers to the file +only +by the file descriptor. +.PP +The file pointers discussed in section 3 +are similar in spirit to file descriptors, +but file descriptors are more fundamental. +A file pointer is a pointer to a structure that contains, +among other things, the file descriptor for the file in question. +.PP +Since input and output involving the user's terminal +are so common, +special arrangements exist to make this convenient. 
+When the command interpreter (the +``shell'') +runs a program, +it opens +three files, with file descriptors 0, 1, and 2, +called the standard input, +the standard output, and the standard error output. +All of these are normally connected to the terminal, +so if a program reads file descriptor 0 +and writes file descriptors 1 and 2, +it can do terminal I/O +without worrying about opening the files. +.PP +If I/O is redirected +to and from files with +.UL < +and +.UL > , +as in +.P1 +prog <infile >outfile +.P2 +the shell changes the default assignments for file descriptors +0 and 1 +from the terminal to the named files. +Similar observations hold if the input or output is associated with a pipe. +Normally file descriptor 2 remains attached to the terminal, +so error messages can go there. +In all cases, +the file assignments are changed by the shell, +not by the program. +The program does not need to know where its input +comes from nor where its output goes, +so long as it uses file 0 for input and 1 and 2 for output. +.NH 2 +Read and Write +.PP +All input and output is done by +two functions called +.UL read +and +.UL write . +For both, the first argument is a file descriptor. +The second argument is a buffer in your program where the data is to +come from or go to. +The third argument is the number of bytes to be transferred. +The calls are +.P1 +n_read = read(fd, buf, n); + +n_written = write(fd, buf, n); +.P2 +Each call returns a byte count +which is the number of bytes actually transferred. +On reading, +the number of bytes returned may be less than +the number asked for, +because fewer than +.UL n +bytes remained to be read. +(When the file is a terminal, +.UL read +normally reads only up to the next newline, +which is generally less than what was requested.) +A return value of zero bytes implies end of file, +and +.UL -1 +indicates an error of some sort. 
+For writing, the returned value is the number of bytes +actually written; +it is generally an error if this isn't equal +to the number supposed to be written. +.PP +The number of bytes to be read or written is quite arbitrary. +The two most common values are +1, +which means one character at a time +(``unbuffered''), +and +512, +which corresponds to a physical blocksize on many peripheral devices. +This latter size will be most efficient, +but even character at a time I/O +is not inordinately expensive. +.PP +Putting these facts together, +we can write a simple program to copy +its input to its output. +This program will copy anything to anything, +since the input and output can be redirected to any file or device. +.P1 +#define BUFSIZE 512 /* best size for PDP-11 UNIX */ + +main() /* copy input to output */ +{ + char buf[BUFSIZE]; + int n; + + while ((n = read(0, buf, BUFSIZE)) > 0) + write(1, buf, n); + exit(0); +} +.P2 +If the file size is not a multiple of +.UL BUFSIZE , +some +.UL read +will return a smaller number of bytes +to be written by +.UL write ; +the next call to +.UL read +after that +will return zero. +.PP +It is instructive to see how +.UL read +and +.UL write +can be used to construct +higher level routines like +.UL getchar , +.UL putchar , +etc. +For example, +here is a version of +.UL getchar +which does unbuffered input. +.P1 +#define CMASK 0377 /* for making char's > 0 */ + +getchar() /* unbuffered single character input */ +{ + char c; + + return((read(0, &c, 1) > 0) ? c & CMASK : EOF); +} +.P2 +.UL c +.ul +must +be declared +.UL char , +because +.UL read +accepts a character pointer. +The character being returned must be masked with +.UL 0377 +to ensure that it is positive; +otherwise sign extension may make it negative. +(The constant +.UL 0377 +is appropriate for the +.UC PDP -11 +but not necessarily for other machines.) +.PP +The second version of +.UL getchar +does input in big chunks, +and hands out the characters one at a time. 
+.P1 +#define CMASK 0377 /* for making char's > 0 */ +#define BUFSIZE 512 + +getchar() /* buffered version */ +{ + static char buf[BUFSIZE]; + static char *bufp = buf; + static int n = 0; + + if (n == 0) { /* buffer is empty */ + n = read(0, buf, BUFSIZE); + bufp = buf; + } + return((--n >= 0) ? *bufp++ & CMASK : EOF); +} +.P2 +.NH 2 +Open, Creat, Close, Unlink +.PP +Other than the default +standard input, output and error files, +you must explicitly open files in order to +read or write them. +There are two system entry points for this, +.UL open +and +.UL creat +[sic]. +.PP +.UL open +is rather like the +.UL fopen +discussed in the previous section, +except that instead of returning a file pointer, +it returns a file descriptor, +which is just an +.UL int . +.P1 +int fd; + +fd = open(name, rwmode); +.P2 +As with +.UL fopen , +the +.UL name +argument +is a character string corresponding to the external file name. +The access mode argument +is different, however: +.UL rwmode +is 0 for read, 1 for write, and 2 for read and write access. +.UL open +returns +.UL -1 +if any error occurs; +otherwise it returns a valid file descriptor. +.PP +It is an error to +try to +.UL open +a file that does not exist. +The entry point +.UL creat +is provided to create new files, +or to re-write old ones. +.P1 +fd = creat(name, pmode); +.P2 +returns a file descriptor +if it was able to create the file +called +.UL name , +and +.UL -1 +if not. +If the file +already exists, +.UL creat +will truncate it to zero length; +it is not an error to +.UL creat +a file that already exists. +.PP +If the file is brand new, +.UL creat +creates it with the +.ul +protection mode +specified by +the +.UL pmode +argument. +In the +.UC UNIX +file system, +there are nine bits of protection information +associated with a file, +controlling read, write and execute permission for +the owner of the file, +for the owner's group, +and for all others. 
+Thus a three-digit octal number +is most convenient for specifying the permissions. +For example, +0755 +specifies read, write and execute permission for the owner, +and read and execute permission for the group and everyone else. +.PP +To illustrate, +here is a simplified version of +the +.UC UNIX +utility +.IT cp , +a program which copies one file to another. +(The main simplification is that our version +copies only one file, +and does not permit the second argument +to be a directory.) +.P1 +#define NULL 0 +#define BUFSIZE 512 +#define PMODE 0644 /* RW for owner, R for group, others */ + +main(argc, argv) /* cp: copy f1 to f2 */ +int argc; +char *argv[]; +{ + int f1, f2, n; + char buf[BUFSIZE]; + + if (argc != 3) + error("Usage: cp from to", NULL); + if ((f1 = open(argv[1], 0)) == -1) + error("cp: can't open %s", argv[1]); + if ((f2 = creat(argv[2], PMODE)) == -1) + error("cp: can't create %s", argv[2]); + + while ((n = read(f1, buf, BUFSIZE)) > 0) + if (write(f2, buf, n) != n) + error("cp: write error", NULL); + exit(0); +} +.P2 +.P1 +error(s1, s2) /* print error message and die */ +char *s1, *s2; +{ + printf(s1, s2); + printf("\en"); + exit(1); +} +.P2 +.PP +As we said earlier, +there is a limit (typically 15-25) +on the number of files which a program +may have open simultaneously. +Accordingly, any program which intends to process +many files must be prepared to re-use +file descriptors. +The routine +.UL close +breaks the connection between a file descriptor +and an open file, +and frees the +file descriptor for use with some other file. +Termination of a program +via +.UL exit +or return from the main program closes all open files. +.PP +The function +.UL unlink(filename) +removes the file +.UL filename +from the file system. +.NH 2 +Random Access \(em Seek and Lseek +.PP +File I/O is normally sequential: +each +.UL read +or +.UL write +takes place at a position in the file +right after the previous one. 
+When necessary, however, +a file can be read or written in any arbitrary order. +The +system call +.UL lseek +provides a way to move around in +a file without actually reading +or writing: +.P1 +lseek(fd, offset, origin); +.P2 +forces the current position in the file +whose descriptor is +.UL fd +to move to position +.UL offset , +which is taken relative to the location +specified by +.UL origin . +Subsequent reading or writing will begin at that position. +.UL offset +is +a +.UL long ; +.UL fd +and +.UL origin +are +.UL int 's. +.UL origin +can be 0, 1, or 2 to specify that +.UL offset +is to be +measured from +the beginning, from the current position, or from the +end of the file respectively. +For example, +to append to a file, +seek to the end before writing: +.P1 +lseek(fd, 0L, 2); +.P2 +To get back to the beginning (``rewind''), +.P1 +lseek(fd, 0L, 0); +.P2 +Notice the +.UL 0L +argument; +it could also be written as +.UL (long)\ 0 . +.PP +With +.UL lseek , +it is possible to treat files more or less like large arrays, +at the price of slower access. +For example, the following simple function reads any number of bytes +from any arbitrary place in a file. +.P1 +get(fd, pos, buf, n) /* read n bytes from position pos */ +int fd, n; +long pos; +char *buf; +{ + lseek(fd, pos, 0); /* get to pos */ + return(read(fd, buf, n)); +} +.P2 +.PP +In pre-version 7 +.UC UNIX , +the basic entry point to the I/O system +is called +.UL seek . +.UL seek +is identical to +.UL lseek , +except that its +.UL offset +argument is an +.UL int +rather than a +.UL long . +Accordingly, +since +.UC PDP -11 +integers have only 16 bits, +the +.UL offset +specified +for +.UL seek +is limited to 65,535; +for this reason, +.UL origin +values of 3, 4, 5 cause +.UL seek +to multiply the given offset by 512 +(the number of bytes in one physical block) +and then interpret +.UL origin +as if it were 0, 1, or 2 respectively. 
+Thus to get to an arbitrary place in a large file +requires two seeks, first one which selects +the block, then one which +has +.UL origin +equal to 1 and moves to the desired byte within the block. +.NH 2 +Error Processing +.PP +The routines discussed in this section, +and in fact all the routines which are direct entries into the system +can incur errors. +Usually they indicate an error by returning a value of \-1. +Sometimes it is nice to know what sort of error occurred; +for this purpose all these routines, when appropriate, +leave an error number in the external cell +.UL errno . +The meanings of the various error numbers are +listed +in the introduction to Section II +of the +.I +.UC UNIX +Programmer's Manual, +.R +so your program can, for example, determine if +an attempt to open a file failed because it did not exist +or because the user lacked permission to read it. +Perhaps more commonly, +you may want to print out the +reason for failure. +The routine +.UL perror +will print a message associated with the value +of +.UL errno ; +more generally, +.UL sys\_errlist +is an array of character strings which can be indexed +by +.UL errno +and printed by your program. diff --git a/share/doc/psd/04.uprog/p5 b/share/doc/psd/04.uprog/p5 new file mode 100644 index 0000000..652641f --- /dev/null +++ b/share/doc/psd/04.uprog/p5 @@ -0,0 +1,577 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution.
+.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.\" @(#)p5 8.1 (Berkeley) 6/8/93 +.\" +.NH +PROCESSES +.PP +It is often easier to use a program written +by someone else than to invent one's own. +This section describes how to +execute a program from within another. +.NH 2 +The ``System'' Function +.PP +The easiest way to execute a program from another +is to use +the standard library routine +.UL system . +.UL system +takes one argument, a command string exactly as typed +at the terminal +(except for the newline at the end) +and executes it. 
+For instance, to time-stamp the output of a program, +.P1 +main() +{ + system("date"); + /* rest of processing */ +} +.P2 +If the command string has to be built from pieces, +the in-memory formatting capabilities of +.UL sprintf +may be useful. +.PP +Remember that +.UL getc +and +.UL putc +normally buffer their input and output; +terminal I/O will not be properly synchronized unless +this buffering is defeated. +For output, use +.UL fflush ; +for input, see +.UL setbuf +in the appendix. +.NH 2 +Low-Level Process Creation \(em Execl and Execv +.PP +If you're not using the standard library, +or if you need finer control over what +happens, +you will have to construct calls to other programs +using the more primitive routines that the standard +library's +.UL system +routine is based on. +.PP +The most basic operation is to execute another program +.ul +without +.IT returning , +by using the routine +.UL execl . +To print the date as the last action of a running program, +use +.P1 +execl("/bin/date", "date", NULL); +.P2 +The first argument to +.UL execl +is the +.ul +file name +of the command; you have to know where it is found +in the file system. +The second argument is conventionally +the program name +(that is, the last component of the file name), +but this is seldom used except as a place-holder. +If the command takes arguments, they are strung out after +this; +the end of the list is marked by a +.UL NULL +argument. +.PP +The +.UL execl +call +overlays the existing program with +the new one, +runs that, then exits. +There is +.ul +no +return to the original program. +.PP +More realistically, +a program might fall into two or more phases +that communicate only through temporary files. +Here it is natural to make the second pass +simply an +.UL execl +call from the first. +.PP +The one exception to the rule that the original program never gets control +back occurs when there is an error, for example if the file can't be found +or is not executable.
+If you don't know where +.UL date +is located, say +.P1 +execl("/bin/date", "date", NULL); +execl("/usr/bin/date", "date", NULL); +fprintf(stderr, "Someone stole 'date'\en"); +.P2 +.PP +A variant of +.UL execl +called +.UL execv +is useful when you don't know in advance how many arguments there are going to be. +The call is +.P1 +execv(filename, argp); +.P2 +where +.UL argp +is an array of pointers to the arguments; +the last pointer in the array must be +.UL NULL +so +.UL execv +can tell where the list ends. +As with +.UL execl , +.UL filename +is the file in which the program is found, and +.UL argp[0] +is the name of the program. +(This arrangement is identical to the +.UL argv +array for program arguments.) +.PP +Neither of these routines provides the niceties of normal command execution. +There is no automatic search of multiple directories \(em +you have to know precisely where the command is located. +Nor do you get the expansion of metacharacters like +.UL < , +.UL > , +.UL * , +.UL ? , +and +.UL [] +in the argument list. +If you want these, use +.UL execl +to invoke the shell +.UL sh , +which then does all the work. +Construct a string +.UL commandline +that contains the complete command as it would have been typed +at the terminal, then say +.P1 +execl("/bin/sh", "sh", "-c", commandline, NULL); +.P2 +The shell is assumed to be at a fixed place, +.UL /bin/sh . +Its argument +.UL -c +says to treat the next argument +as a whole command line, so it does just what you want. +The only problem is in constructing the right information +in +.UL commandline . +.NH 2 +Control of Processes \(em Fork and Wait +.PP +So far what we've talked about isn't really all that useful by itself. +Now we will show how to regain control after running +a program with +.UL execl +or +.UL execv . 
+Since these routines simply overlay the new program on the old one, +to save the old one requires that it first be split into +two copies; +one of these can be overlaid, while the other waits for the new, +overlaying program to finish. +The splitting is done by a routine called +.UL fork : +.P1 +proc_id = fork(); +.P2 +splits the program into two copies, both of which continue to run. +The only difference between the two is the value of +.UL proc_id , +the ``process id.'' +In one of these processes (the ``child''), +.UL proc_id +is zero. +In the other +(the ``parent''), +.UL proc_id +is non-zero; it is the process number of the child. +Thus the basic way to call, and return from, +another program is +.P1 +if (fork() == 0) + execl("/bin/sh", "sh", "-c", cmd, NULL); /* in child */ +.P2 +And in fact, except for handling errors, this is sufficient. +The +.UL fork +makes two copies of the program. +In the child, the value returned by +.UL fork +is zero, so it calls +.UL execl +which does the +.UL command +and then dies. +In the parent, +.UL fork +returns non-zero +so it skips the +.UL execl. +(If there is any error, +.UL fork +returns +.UL -1 ). +.PP +More often, the parent wants to wait for the child to terminate +before continuing itself. +This can be done with +the function +.UL wait : +.P1 +int status; + +if (fork() == 0) + execl(...); +wait(&status); +.P2 +This still doesn't handle any abnormal conditions, such as a failure +of the +.UL execl +or +.UL fork , +or the possibility that there might be more than one child running simultaneously. +(The +.UL wait +returns the +process id +of the terminated child, if you want to check it against the value +returned by +.UL fork .) +Finally, this fragment doesn't deal with any +funny behavior on the part of the child +(which is reported in +.UL status ). +Still, these three lines +are the heart of the standard library's +.UL system +routine, +which we'll show in a moment. 
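The abnormal conditions listed above (a failed fork, a failed execl, and a wait that returns some other child) can be handled along the following lines. This is a minimal sketch under stated assumptions, not the library's system routine: the name run_command is hypothetical, and the WIFEXITED and WEXITSTATUS macros come from <sys/wait.h> on POSIX systems rather than from the text.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run cmd through the shell, wait for it,
 * and return its exit status, or -1 if anything went wrong.
 */
int run_command(const char *cmd)
{
    int status;
    pid_t pid, w;

    pid = fork();
    if (pid == -1)
        return -1;                      /* fork failed */
    if (pid == 0) {                     /* in child */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                     /* reached only if execl failed */
    }
    /* in parent: keep waiting until this particular child is reported */
    while ((w = wait(&status)) != pid && w != -1)
        ;
    if (w == -1 || !WIFEXITED(status))
        return -1;                      /* wait failed or child died abnormally */
    return WEXITSTATUS(status);         /* argument the child gave to exit */
}
```

The child calls _exit rather than exit after a failed execl so that buffered output inherited from the parent is not flushed a second time.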
+.PP +The +.UL status +returned by +.UL wait +encodes in its low-order eight bits +the system's idea of the child's termination status; +it is 0 for normal termination and non-zero to indicate +various kinds of problems. +The next higher eight bits are taken from the argument +of the call to +.UL exit +which caused a normal termination of the child process. +It is good coding practice +for all programs to return meaningful +status. +.PP +When a program is called by the shell, +the three file descriptors +0, 1, and 2 are set up pointing at the right files, +and all other possible file descriptors +are available for use. +When this program calls another one, +correct etiquette suggests making sure the same conditions +hold. +Neither +.UL fork +nor the +.UL exec +calls affects open files in any way. +If the parent is buffering output +that must come out before output from the child, +the parent must flush its buffers +before the +.UL execl . +Conversely, +if a caller buffers an input stream, +the called program will lose any information +that has been read by the caller. +.NH 2 +Pipes +.PP +A +.ul +pipe +is an I/O channel intended for use +between two cooperating processes: +one process writes into the pipe, +while the other reads. +The system looks after buffering the data and synchronizing +the two processes. +Most pipes are created by the shell, +as in +.P1 +ls | pr +.P2 +which connects the standard output of +.UL ls +to the standard input of +.UL pr . +Sometimes, however, it is most convenient +for a process to set up its own plumbing; +in this section, we will illustrate how +the pipe connection is established and used. +.PP +The system call +.UL pipe +creates a pipe. +Since a pipe is used for both reading and writing, +two file descriptors are returned; +the actual usage is like this: +.P1 +int fd[2]; + +stat = pipe(fd); +if (stat == -1) + /* there was an error ... 
*/ +.P2 +.UL fd +is an array of two file descriptors, where +.UL fd[0] +is the read side of the pipe and +.UL fd[1] +is for writing. +These may be used in +.UL read , +.UL write +and +.UL close +calls just like any other file descriptors. +.PP +If a process reads a pipe which is empty, +it will wait until data arrives; +if a process writes into a pipe which +is too full, it will wait until the pipe empties somewhat. +If the write side of the pipe is closed, +a subsequent +.UL read +will encounter end of file. +.PP +To illustrate the use of pipes in a realistic setting, +let us write a function called +.UL popen(cmd,\ mode) , +which creates a process +.UL cmd +(just as +.UL system +does), +and returns a file descriptor that will either +read or write that process, according to +.UL mode . +That is, +the call +.P1 +fout = popen("pr", WRITE); +.P2 +creates a process that executes +the +.UL pr +command; +subsequent +.UL write +calls using the file descriptor +.UL fout +will send their data to that process +through the pipe. +.PP +.UL popen +first creates the +pipe with a +.UL pipe +system call; +it then +.UL fork s +to create two copies of itself. +The child decides whether it is supposed to read or write, +closes the other side of the pipe, +then calls the shell (via +.UL execl ) +to run the desired process. +The parent likewise closes the end of the pipe it does not use. +These closes are necessary to make end-of-file tests work properly. +For example, if a child that intends to read +fails to close the write end of the pipe, it will never +see the end of the pipe file, just because there is one writer +potentially active. +.P1 +#include <stdio.h> + +#define READ 0 +#define WRITE 1 +#define tst(a, b) (mode == READ ? 
(b) : (a)) +static int popen_pid; + +popen(cmd, mode) +char *cmd; +int mode; +{ + int p[2]; + + if (pipe(p) < 0) + return(NULL); + if ((popen_pid = fork()) == 0) { + close(tst(p[WRITE], p[READ])); + close(tst(0, 1)); + dup(tst(p[READ], p[WRITE])); + close(tst(p[READ], p[WRITE])); + execl("/bin/sh", "sh", "-c", cmd, 0); + _exit(1); /* disaster has occurred if we get here */ + } + if (popen_pid == -1) + return(NULL); + close(tst(p[READ], p[WRITE])); + return(tst(p[WRITE], p[READ])); +} +.P2 +The sequence of +.UL close s +in the child +is a bit tricky. +Suppose +that the task is to create a child process that will read data from the parent. +Then the first +.UL close +closes the write side of the pipe, +leaving the read side open. +The lines +.P1 +close(tst(0, 1)); +dup(tst(p[READ], p[WRITE])); +.P2 +are the conventional way to associate the pipe descriptor +with the standard input of the child. +The +.UL close +closes file descriptor 0, +that is, the standard input. +.UL dup +is a system call that +returns a duplicate of an already open file descriptor. +File descriptors are assigned in increasing order +and the first available one is returned, +so +the effect of the +.UL dup +is to copy the file descriptor for the pipe (read side) +to file descriptor 0; +thus the read side of the pipe becomes the standard input. +(Yes, this is a bit tricky, but it's a standard idiom.) +Finally, the old read side of the pipe is closed. +.PP +A similar sequence of operations takes place +when the child process is supposed to write +to the parent instead of reading from it. +You may find it a useful exercise to step through that case. +.PP +The job is not quite done, +for we still need a function +.UL pclose +to close the pipe created by +.UL popen . +The main reason for using a separate function rather than +.UL close +is that it is desirable to wait for the termination of the child process. +First, the return value from +.UL pclose +indicates whether the process succeeded.
+Equally important when a process creates several children +is that only a bounded number of unwaited-for children +can exist, even if some of them have terminated; +performing the +.UL wait +lays the child to rest. +Thus: +.P1 +#include <signal.h> + +pclose(fd) /* close pipe fd */ +int fd; +{ + register r, (*hstat)(), (*istat)(), (*qstat)(); + int status; + extern int popen_pid; + + close(fd); + istat = signal(SIGINT, SIG_IGN); + qstat = signal(SIGQUIT, SIG_IGN); + hstat = signal(SIGHUP, SIG_IGN); + while ((r = wait(&status)) != popen_pid && r != -1); + if (r == -1) + status = -1; + signal(SIGINT, istat); + signal(SIGQUIT, qstat); + signal(SIGHUP, hstat); + return(status); +} +.P2 +The calls to +.UL signal +make sure that no interrupts, etc., +interfere with the waiting process; +this is the topic of the next section. +.PP +The routine as written has the limitation that only one pipe may +be open at once, because of the single shared variable +.UL popen_pid ; +it really should be an array indexed by file descriptor. +A +.UL popen +function, with slightly different arguments and return value is available +as part of the standard I/O library discussed below. +As currently written, it shares the same limitation. diff --git a/share/doc/psd/04.uprog/p6 b/share/doc/psd/04.uprog/p6 new file mode 100644 index 0000000..c323d94 --- /dev/null +++ b/share/doc/psd/04.uprog/p6 @@ -0,0 +1,361 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. 
+.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.\" @(#)p6 8.1 (Berkeley) 6/8/93 +.\" +.NH +SIGNALS \(em INTERRUPTS AND ALL THAT +.PP +This section is concerned with how to +deal gracefully with signals from +the outside world (like interrupts), and with program faults. 
+Since there's nothing very useful that +can be done from within C about program +faults, which arise mainly from illegal memory references +or from execution of peculiar instructions, +we'll discuss only the outside-world signals: +.IT interrupt , +which is sent when the +.UC DEL +character is typed; +.IT quit , +generated by the +.UC FS +character; +.IT hangup , +caused by hanging up the phone; +and +.IT terminate , +generated by the +.IT kill +command. +When one of these events occurs, +the signal is sent to +.IT all +processes which were started +from the corresponding terminal; +unless other arrangements have been made, +the signal +terminates the process. +In the +.IT quit +case, a core image file is written for debugging +purposes. +.PP +The routine which alters the default action +is +called +.UL signal . +It has two arguments: the first specifies the signal, and the second +specifies how to treat it. +The first argument is just a number code, but the second is either the +address of a function, or a somewhat strange code +that requests that the signal either be ignored, or that it be +given the default action. +The include file +.UL signal.h +gives names for the various arguments, and should always be included +when signals are used. +Thus +.P1 +#include <signal.h> + ... +signal(SIGINT, SIG_IGN); +.P2 +causes interrupts to be ignored, while +.P1 +signal(SIGINT, SIG_DFL); +.P2 +restores the default action of process termination. +In all cases, +.UL signal +returns the previous value of the signal. +The second argument to +.UL signal +may instead be the name of a function +(which has to be declared explicitly if +the compiler hasn't seen it already). +In this case, the named routine will be called +when the signal occurs. 
+Most commonly this facility is used +to allow the program to clean up +unfinished business before terminating, for example to +delete a temporary file: +.P1 +#include <signal.h> + +main() +{ + int onintr(); + + if (signal(SIGINT, SIG_IGN) != SIG_IGN) + signal(SIGINT, onintr); + + /* Process ... */ + + exit(0); +} + +onintr() +{ + unlink(tempfile); + exit(1); +} +.P2 +.PP +Why the test and the double call to +.UL signal ? +Recall that signals like interrupt are sent to +.ul +all +processes started from a particular terminal. +Accordingly, when a program is to be run +non-interactively +(started by +.UL & ), +the shell turns off interrupts for it +so it won't be stopped by interrupts intended for foreground processes. +If this program began by announcing that all interrupts were to be sent +to the +.UL onintr +routine regardless, +that would undo the shell's effort to protect it +when run in the background. +.PP +The solution, shown above, is to test the state of interrupt handling, +and to continue to ignore interrupts if they are already being ignored. +The code as written +depends on the fact that +.UL signal +returns the previous state of a particular signal. +If signals were already being ignored, the process should continue to ignore them; +otherwise, they should be caught. +.PP +A more sophisticated program may wish to intercept +an interrupt and interpret it as a request +to stop what it is doing +and return to its own command-processing loop. +Think of a text editor: +interrupting a long printout should not cause it +to terminate and lose the work +already done. 
+The outline of the code for this case is probably best written like this: +.P1 +#include <signal.h> +#include <setjmp.h> +jmp_buf sjbuf; + +main() +{ + int (*istat)(), onintr(); + + istat = signal(SIGINT, SIG_IGN); /* save original status */ + setjmp(sjbuf); /* save current stack position */ + if (istat != SIG_IGN) + signal(SIGINT, onintr); + + /* main processing loop */ +} +.P2 +.P1 +onintr() +{ + printf("\enInterrupt\en"); + longjmp(sjbuf); /* return to saved state */ +} +.P2 +The include file +.UL setjmp.h +declares the type +.UL jmp_buf , +an object in which the state +can be saved. +.UL sjbuf +is such an object; it is an array of some sort. +The +.UL setjmp +routine then saves +the state of things. +When an interrupt occurs, +a call is forced to the +.UL onintr +routine, +which can print a message, set flags, or whatever. +.UL longjmp +takes as argument an object stored into by +.UL setjmp , +and restores control +to the location after the call to +.UL setjmp , +so control (and the stack level) will pop back +to the place in the main routine where +the signal is set up and the main loop entered. +Notice, by the way, that +the signal +gets set again after an interrupt occurs. +This is necessary; most signals are automatically +reset to their default action when they occur. +.PP +Some programs that want to detect signals simply can't be stopped +at an arbitrary point, +for example in the middle of updating a linked list. +If the routine called on occurrence of a signal +sets a flag and then +returns instead of calling +.UL exit +or +.UL longjmp , +execution will continue +at the exact point it was interrupted. +The interrupt flag can then be tested later. +.PP +There is one difficulty associated with this +approach. +Suppose the program is reading the +terminal when the interrupt is sent. +The specified routine is duly called; it sets its flag +and returns. 
+If it were really true, as we said +above, that ``execution resumes at the exact point it was interrupted,'' +the program would continue reading the terminal +until the user typed another line. +This behavior might well be confusing, since the user +might not know that the program is reading; +he presumably would prefer to have the signal take effect instantly. +The method chosen to resolve this difficulty +is to terminate the terminal read when execution +resumes after the signal, returning an error code +which indicates what happened. +.PP +Thus programs which catch and resume +execution after signals should be prepared for ``errors'' +which are caused by interrupted +system calls. +(The ones to watch out for are reads from a terminal, +.UL wait , +and +.UL pause .) +A program +whose +.UL onintr +routine just sets +.UL intflag , +resets the interrupt signal, and returns, +should usually include code like the following when it reads +the standard input: +.P1 +if (getchar() == EOF) + if (intflag) + /* EOF caused by interrupt */ + else + /* true end-of-file */ +.P2 +.PP +A final subtlety to keep in mind becomes important +when signal-catching is combined with execution of other programs. +Suppose a program catches interrupts, and also includes +a method (like ``!'' in the editor) +whereby other programs can be executed. +Then the code should look something like this: +.P1 +if (fork() == 0) + execl(...); +signal(SIGINT, SIG_IGN); /* ignore interrupts */ +wait(&status); /* until the child is done */ +signal(SIGINT, onintr); /* restore interrupts */ +.P2 +Why is this? +Again, it's not obvious but not really difficult. +Suppose the program you call catches its own interrupts. +If you interrupt the subprogram, +it will get the signal and return to its +main loop, and probably read your terminal. +But the calling program will also pop out of +its wait for the subprogram and read your terminal. 
+Having two processes reading +your terminal is very unfortunate, +since the system figuratively flips a coin to decide +who should get each line of input. +A simple way out is to have the parent program +ignore interrupts until the child is done. +This reasoning is reflected in the standard I/O library function +.UL system : +.P1 +#include <signal.h> + +system(s) /* run command string s */ +char *s; +{ + int status, pid, w; + register int (*istat)(), (*qstat)(); + + if ((pid = fork()) == 0) { + execl("/bin/sh", "sh", "-c", s, 0); + _exit(127); + } + istat = signal(SIGINT, SIG_IGN); + qstat = signal(SIGQUIT, SIG_IGN); + while ((w = wait(&status)) != pid && w != -1) + ; + if (w == -1) + status = -1; + signal(SIGINT, istat); + signal(SIGQUIT, qstat); + return(status); +} +.P2 +.PP +As an aside on declarations, +the function +.UL signal +obviously has a rather strange second argument. +It is in fact a pointer to a function delivering an integer, +and this is also the type of the signal routine itself. +The two values +.UL SIG_IGN +and +.UL SIG_DFL +have the right type, but are chosen so they coincide with +no possible actual functions. +For the enthusiast, here is how they are defined for the PDP-11; +the definitions should be sufficiently ugly +and nonportable to encourage use of the include file. +.P1 +#define SIG_DFL (int (*)())0 +#define SIG_IGN (int (*)())1 +.P2 diff --git a/share/doc/psd/04.uprog/p8 b/share/doc/psd/04.uprog/p8 new file mode 100644 index 0000000..3c7c7b4 --- /dev/null +++ b/share/doc/psd/04.uprog/p8 @@ -0,0 +1,62 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. 
+.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.\" @(#)p8 8.1 (Berkeley) 6/8/93 +.\" +.SH +References +.LP +.IP [1] +K. L. Thompson and D. M. Ritchie, +.ul +The +.ul +.UC UNIX +.ul +Programmer's Manual, +Bell Laboratories, 1978. +.IP [2] +B. W. Kernighan and D. M. Ritchie, +.ul +The C Programming Language, +Prentice-Hall, Inc., 1978. +.IP [3] +B. W. Kernighan, +.UC UNIX \& `` +for Beginners \(em Second Edition.'' +Bell Laboratories, 1978. 
diff --git a/share/doc/psd/04.uprog/p9 b/share/doc/psd/04.uprog/p9 new file mode 100644 index 0000000..2366444 --- /dev/null +++ b/share/doc/psd/04.uprog/p9 @@ -0,0 +1,680 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.\" @(#)p9 8.1 (Berkeley) 6/8/93 +.\" +.sp 100 +.TL +.ft R +Appendix \(em The Standard I/O Library +.AU +D. M. Ritchie +.AI +AT&T Bell Laboratories +Murray Hill, NJ 07974 +.PP +The standard I/O library +was designed with the following goals in mind. +.IP 1. +It must be as efficient as possible, both in time and in space, +so that there will be no hesitation in using it +no matter how critical the application. +.IP 2. +It must be simple to use, and also free of the magic +numbers and mysterious calls +whose use mars the understandability and portability +of many programs using older packages. +.IP 3. +The interface provided should be applicable on all machines, +whether or not the programs which implement it are directly portable +to other systems, +or to machines other than the PDP-11 running a version of +.UC UNIX . +.SH +1. General Usage +.PP +Each program using the library must have the line +.P1 + #include <stdio.h> +.P2 +which defines certain macros and variables. +The routines are in the normal C library, +so no special library argument is needed for loading. +All names in the include file intended only for internal use begin +with an underscore +.UL _ +to reduce the possibility +of collision with a user name. 
+The names intended to be visible outside the package are +.IP \f3stdin\f1 10 +The name of the standard input file +.IP \f3stdout\f1 10 +The name of the standard output file +.IP \f3stderr\f1 10 +The name of the standard error file +.IP \f3EOF\f1 10 +is actually \-1, and is the value returned by +the read routines on end-of-file or error. +.IP \f3NULL\f1 10 +is a notation for the null pointer, returned by +pointer-valued functions +to indicate an error +.IP \f3FILE\f1 10 +expands to +.UL struct +.UL _iob +and is a useful +shorthand when declaring pointers +to streams. +.IP \f3BUFSIZ\f1 10 +is a number (viz. 512) +of the size suitable for an I/O buffer supplied by the user. +See +.UL setbuf , +below. +.IP \f3getc,\ getchar,\ putc,\ putchar,\ feof,\ ferror,\ f\&ileno\f1 10 +.br +are defined as macros. +Their actions are described below; +they are mentioned here +to point out that it is not possible to +redeclare them +and that they are not actually functions; +thus, for example, they may not have breakpoints set on them. +.PP +The routines in this package +offer the convenience of automatic buffer allocation +and output flushing where appropriate. +The names +.UL stdin , +.UL stdout , +and +.UL stderr +are in effect constants and may not be assigned to. +.SH +2. Calls +.nr PD .4v +.LP +.UL FILE\ *fopen(filename,\ type)\ char\ *filename,\ *type; +.nr PD 0 +.IP +.br +opens the file and, if needed, allocates a buffer for it. +.UL filename +is a character string specifying the name. +.UL type +is a character string (not a single character). +It may be +.UL \&"r" , +.UL \&"w" , +or +.UL \&"a" +to indicate +intent to read, write, or append. +The value returned is a file pointer. +If it is +.UL NULL +the attempt to open failed. +.ne 3 +.nr PD .4v +.LP +.UL FILE\ *freopen(filename,\ type,\ ioptr)\ char\ *filename,\ *type;\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +The stream named by +.UL ioptr +is closed, if necessary, and then reopened +as if by +.UL fopen . 
+If the attempt to open fails, +.UL NULL +is returned, +otherwise +.UL ioptr , +which will now refer to the new file. +Often the reopened stream is +.UL stdin +or +.UL stdout . +.nr PD .4v +.LP +.UL int\ getc(ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +returns the next character from the stream named by +.UL ioptr , +which is a pointer to a file such as returned by +.UL fopen , +or the name +.UL stdin . +The integer +.UL EOF +is returned on end-of-file or when +an error occurs. +The null character +.UL \e0 +is a legal character. +.nr PD .4v +.LP +.UL int\ fgetc(ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +acts like +.UL getc +but is a genuine function, +not a macro, +so it can be pointed to, passed as an argument, etc. +.nr PD .4v +.LP +.UL putc(c,\ ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +.UL putc +writes the character +.UL c +on the output stream named by +.UL ioptr , +which is a value returned from +.UL fopen +or perhaps +.UL stdout +or +.UL stderr . +The character is returned as value, +but +.UL EOF +is returned on error. +.nr PD .4v +.LP +.UL fputc(c,\ ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +acts like +.UL putc +but is a genuine +function, not a macro. +.nr PD .4v +.LP +.UL fclose(ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +The file corresponding to +.UL ioptr +is closed after any buffers are emptied. +A buffer allocated by the I/O system is freed. +.UL fclose +is automatic on normal termination of the program. +.nr PD .4v +.LP +.UL fflush(ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +Any buffered information on the (output) stream named by +.UL ioptr +is written out. +Output files are normally buffered +if and only if they are not directed to the terminal; +however, +.UL stderr +always starts off unbuffered and remains so unless +.UL setbuf +is used, or unless it is reopened. +.nr PD .4v +.LP +.UL exit(errcode); +.nr PD 0 +.IP +.br +terminates the process and returns its argument as status +to the parent. 
+This is a special version of the routine +which calls +.UL fflush +for each output file. +To terminate without flushing, +use +.UL _exit . +.nr PD .4v +.LP +.UL feof(ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +returns non-zero when end-of-file +has occurred on the specified input stream. +.nr PD .4v +.LP +.UL ferror(ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +returns non-zero when an error has occurred while reading +or writing the named stream. +The error indication lasts until the file has been closed. +.nr PD .4v +.LP +.UL getchar(); +.nr PD 0 +.IP +.br +is identical to +.UL getc(stdin) . +.nr PD .4v +.LP +.UL putchar(c); +.nr PD 0 +.IP +.br +is identical to +.UL putc(c,\ stdout) . +.nr PD .4v +.nr PD .4v +.ne 2 +.LP +.UL char\ *fgets(s,\ n,\ ioptr)\ char\ *s;\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +reads up to +.UL n-1 +characters from the stream +.UL ioptr +into the character pointer +.UL s . +The read terminates with a newline character. +The newline character is placed in the buffer +followed by a null character. +.UL fgets +returns the first argument, +or +.UL NULL +if error or end-of-file occurred. +.nr PD .4v +.nr PD .4v +.LP +.UL fputs(s,\ ioptr)\ char\ *s;\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +writes the null-terminated string (character array) +.UL s +on the stream +.UL ioptr . +No newline is appended. +No value is returned. +.nr PD .4v +.LP +.UL ungetc(c,\ ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +The argument character +.UL c +is pushed back on the input stream named by +.UL ioptr . +Only one character may be pushed back. +.ne 5 +.nr PD .4v +.LP +.UL printf(format,\ a1,\ ...)\ char\ *format; +.br +.UL fprintf(ioptr,\ format,\ a1,\ ...)\ FILE\ *ioptr;\ char\ *format; +.br +.UL sprintf(s,\ format,\ a1,\ ...)char\ *s,\ *format; +.br +.nr PD 0 +.IP +.UL printf +writes on the standard output. +.UL fprintf +writes on the named output stream. +.UL sprintf +puts characters in the character array (string) +named by +.UL s . 
+The specifications are as described in section +.UL printf (3) +of the +.ul +.UC UNIX +.ul +Programmer's Manual. +.nr PD .4v +.LP +.UL scanf(format,\ a1,\ ...)\ char\ *format; +.br +.UL fscanf(ioptr,\ format,\ a1,\ ...)\ FILE\ *ioptr;\ char\ *format; +.br +.UL sscanf(s,\ format,\ a1,\ ...)\ char\ *s,\ *format; +.nr PD 0 +.IP +.br +.UL scanf +reads from the standard input. +.UL fscanf +reads from the named input stream. +.UL sscanf +reads from the character string +supplied as +.UL s . +.UL scanf +reads characters, interprets +them according to a format, and stores the results in its arguments. +Each routine expects as arguments +a control string +.UL format , +and a set of arguments, +.I +each of which must be a pointer, +.R +indicating where the converted input should be stored. +.if t .sp .4v +.UL scanf +returns as its value the number of successfully matched and assigned input +items. +This can be used to decide how many input items were found. +On end of file, +.UL EOF +is returned; note that this is different +from 0, which means that the next input character does not +match what was called for in the control string. +.nr PD .4v +.LP +.UL fread(ptr,\ sizeof(*ptr),\ nitems,\ ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +reads +.UL nitems +of data beginning at +.UL ptr +from file +.UL ioptr . +No advance notification +that binary I/O is being done is required; +when, for portability reasons, +it becomes required, it will be done +by adding an additional character to the mode-string on the +.UL fopen +call. +.nr PD .4v +.LP +.UL fwrite(ptr,\ sizeof(*ptr),\ nitems,\ ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +Like +.UL fread , +but in the other direction. +.nr PD .4v +.LP +.UL rewind(ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +rewinds the stream +named by +.UL ioptr . +It is not very useful except on input, +since a rewound output file is still open only for output. 
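The fread, fwrite, and rewind entries above can be combined as in the following sketch. It is only an illustration written in ANSI C rather than the older declaration style used in this appendix, and it assumes a UNIX system where no binary-mode character is needed in the mode string. The function names save_ints and sum_ints_twice are hypothetical helpers, not part of the library.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write n integers to the file called name.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int save_ints(const char *name, const int *vals, size_t n)
{
    FILE *fp = fopen(name, "w");
    size_t written;

    if (fp == NULL)
        return -1;
    written = fwrite(vals, sizeof(*vals), n, fp);
    /* fclose empties the buffer, so a failure there also counts */
    if (fclose(fp) == EOF || written != n)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: read the same file twice, rewinding between
 * passes, and store the sum found on each pass in sums[0] and sums[1].
 * Returns 0 on success, -1 on open or read error.
 */
int sum_ints_twice(const char *name, long sums[2])
{
    FILE *fp = fopen(name, "r");
    int v, pass;

    if (fp == NULL)
        return -1;
    for (pass = 0; pass < 2; pass++) {
        sums[pass] = 0;
        /* fread returns the number of whole items read */
        while (fread(&v, sizeof(v), 1, fp) == 1)
            sums[pass] += v;
        if (ferror(fp)) {
            fclose(fp);
            return -1;
        }
        rewind(fp);     /* back to the start for the next pass */
    }
    fclose(fp);
    return 0;
}
```

The file is written through one stream opened with "w" and read through a second stream opened with "r", which sidesteps the point made under rewind that a rewound output file is still open only for output.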
+.nr PD .4v +.LP +.UL system(string)\ char\ *string; +.nr PD 0 +.IP +.br +The +.UL string +is executed by the shell as if typed at the terminal. +.nr PD .4v +.LP +.UL getw(ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +returns the next word from the input stream named by +.UL ioptr . +.UL EOF +is returned on end-of-file or error, +but since this is a perfectly good +integer +.UL feof +and +.UL ferror +should be used. +A ``word'' is 16 bits on the +.UC PDP-11. +.nr PD .4v +.LP +.UL putw(w,\ ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +writes the integer +.UL w +on the named output stream. +.nr PD .4v +.LP +.UL setbuf(ioptr,\ buf)\ FILE\ *ioptr;\ char\ *buf; +.nr PD 0 +.IP +.br +.UL setbuf +may be used after a stream has been opened +but before I/O has started. +If +.UL buf +is +.UL NULL , +the stream will be unbuffered. +Otherwise the buffer supplied will be used. +It must be a character array of sufficient size: +.P1 +char buf[BUFSIZ]; +.P2 +.nr PD .4v +.LP +.UL fileno(ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +returns the integer file descriptor associated with the file. +.nr PD .4v +.LP +.UL fseek(ioptr,\ offset,\ ptrname)\ FILE\ *ioptr;\ long\ offset; +.nr PD 0 +.IP +.br +The location of the next byte in the stream +named by +.UL ioptr +is adjusted. +.UL offset +is a long integer. +If +.UL ptrname +is 0, the offset is measured from the beginning of the file; +if +.UL ptrname +is 1, the offset is measured from the current read or +write pointer; +if +.UL ptrname +is 2, the offset is measured from the end of the file. +The routine accounts properly for any buffering. +(When this routine is used on +.UC UNIX \& non- +systems, +the offset must be a value returned from +.UL ftell +and the ptrname must be 0). +.ne 3 +.nr PD .4v +.LP +.UL long\ ftell(ioptr)\ FILE\ *ioptr; +.nr PD 0 +.IP +.br +The byte offset, measured from the beginning of the file, +associated with the named stream is returned. +Any buffering is properly accounted for. 
+(On +.UC UNIX \& non- +systems the value of this call is useful only +for handing to +.UL fseek , +so as to position the file to the same place it was when +.UL ftell +was called.) +.nr PD .4v +.LP +.UL getpw(uid,\ buf)\ char\ *buf; +.nr PD 0 +.IP +.br +The password file is searched for the given integer user ID. +If an appropriate line is found, it is copied into +the character array +.UL buf , +and 0 is returned. +If no line is found corresponding to the user ID +then 1 is returned. +.nr PD .4v +.LP +.UL char\ *malloc(num); +.nr PD 0 +.IP +.br +allocates +.UL num +bytes. +The pointer returned is sufficiently well aligned to be usable for any purpose. +.UL NULL +is returned if no space is available. +.nr PD .4v +.LP +.UL char\ *calloc(num,\ size); +.nr PD 0 +.IP +.br +allocates space for +.UL num +items each of size +.UL size . +The space is guaranteed to be set to 0 and the pointer is +sufficiently well aligned to be usable for any purpose. +.UL NULL +is returned if no space is available. +.nr PD .4v +.LP +.UL cfree(ptr)\ char\ *ptr; +.nr PD 0 +.IP +.br +Space is returned to the pool used by +.UL calloc . +Disorder can be expected if the pointer was not obtained +from +.UL calloc . +.nr PD .4v +.LP +The following are macros whose definitions may be obtained by including +.UL <ctype.h> . +.nr PD .4v +.LP +.UL isalpha(c) +returns non-zero if the argument is alphabetic. +.nr PD .4v +.LP +.UL isupper(c) +returns non-zero if the argument is upper-case alphabetic. +.nr PD .4v +.LP +.UL islower(c) +returns non-zero if the argument is lower-case alphabetic. +.nr PD .4v +.LP +.UL isdigit(c) +returns non-zero if the argument is a digit. +.nr PD .4v +.LP +.UL isspace(c) +returns non-zero if the argument is a spacing character: +tab, newline, carriage return, vertical tab, +form feed, space. +.nr PD .4v +.LP +.UL ispunct(c) +returns non-zero if the argument is +any punctuation character, i.e., not a space, letter, +digit or control character. 
+.nr PD .4v +.LP +.UL isalnum(c) +returns non-zero if the argument is a letter or a digit. +.nr PD .4v +.LP +.UL isprint(c) +returns non-zero if the argument is printable \(em +a letter, digit, or punctuation character. +.nr PD .4v +.LP +.UL iscntrl(c) +returns non-zero if the argument is a control character. +.nr PD .4v +.LP +.UL isascii(c) +returns non-zero if the argument is an ascii character, i.e., less than octal 0200. +.nr PD .4v +.LP +.UL toupper(c) +returns the upper-case character corresponding to the lower-case +letter +.UL c . +.nr PD .4v +.LP +.UL tolower(c) +returns the lower-case character corresponding to the upper-case +letter +.UL c . diff --git a/share/doc/psd/05.sysman/0.t b/share/doc/psd/05.sysman/0.t new file mode 100644 index 0000000..865e8ff --- /dev/null +++ b/share/doc/psd/05.sysman/0.t @@ -0,0 +1,292 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)0.t 8.1 (Berkeley) 6/8/93 +.\" +.if n .ND +.TL +Berkeley Software Architecture Manual +.br +4.4BSD Edition +.AU +William Joy, Robert Fabry, +.AU +Samuel Leffler, M. Kirk McKusick, +.AU +Michael Karels +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, CA 94720 +.EH 'PSD:5-%''4.4BSD Architecture Manual' +.OH '4.4BSD Architecture Manual''PSD:5-%' +.AB +.FS +* UNIX is a trademark of Bell Laboratories. +.FE +This document summarizes the facilities +provided by the 4.4BSD version of the UNIX\|* operating system. +It does not attempt to act as a tutorial for use of the system +nor does it attempt to explain or justify the design of the +system facilities. +It gives neither motivation nor implementation details, +in favor of brevity. +.PP +The first section describes the basic kernel functions +provided to a UNIX process: process naming and protection, +memory management, software interrupts, +object references (descriptors), time and statistics functions, +and resource controls. 
+These facilities, as well as facilities for +bootstrap, shutdown and process accounting, +are provided solely by the kernel. +.PP +The second section describes the standard system +abstractions for +files and file systems, +communication, +terminal handling, +and process control and debugging. +These facilities are implemented by the operating system or by +network server processes. +.AE +.LP +.bp +.ft B +.br +.sv 2 +.ce +TABLE OF CONTENTS +.ft R +.LP +.sp 1 +.nf +.B "Introduction." +.LP +.if t .sp .5v +.nf +.B "0. Notation and types" +.LP +.if t .sp .5v +.nf +.B "1. Kernel primitives" +.LP +.if t .sp .5v +.nf +.nf +\fB1.1. Processes and protection\fP +1.1.1. Host and process identifiers +1.1.2. Process creation and termination +1.1.3. User and group ids +1.1.4. Process groups +.LP +.nf +\fB1.2. Memory management\fP +1.2.1. Text, data and stack +1.2.2. Mapping pages +1.2.3. Page protection control +1.2.4. Giving and getting advice +1.2.5. Protection primitives +.LP +.if t .sp .5v +.nf +\fB1.3. Signals\fP +1.3.1. Overview +1.3.2. Signal types +1.3.3. Signal handlers +1.3.4. Sending signals +1.3.5. Protecting critical sections +1.3.6. Signal stacks +.LP +.if t .sp .5v +.nf +\fB1.4. Timing and statistics\fP +1.4.1. Real time +1.4.2. Interval time +.LP +.if t .sp .5v +.nf +\fB1.5. Descriptors\fP +1.5.1. The reference table +1.5.2. Descriptor properties +1.5.3. Managing descriptor references +1.5.4. Multiplexing requests +1.5.5. Descriptor wrapping +.LP +.if t .sp .5v +.nf +\fB1.6. Resource controls\fP +1.6.1. Process priorities +1.6.2. Resource utilization +1.6.3. Resource limits +.LP +.if t .sp .5v +.nf +\fB1.7. System operation support\fP +1.7.1. Bootstrap operations +1.7.2. Shutdown operations +1.7.3. Accounting +.bp +.LP +.if t .sp .5v +.sp 1 +.nf +\fB2. System facilities\fP +.LP +.if t .sp .5v +.nf +\fB2.1. Generic operations\fP +2.1.1. Read and write +2.1.2. Input/output control +2.1.3. Non-blocking and asynchronous operations +.LP +.if t .sp .5v +.nf +\fB2.2. 
File system\fP +2.2.1 Overview +2.2.2. Naming +2.2.3. Creation and removal +2.2.3.1. Directory creation and removal +2.2.3.2. File creation +2.2.3.3. Creating references to devices +2.2.3.4. Portal creation +2.2.3.6. File, device, and portal removal +2.2.4. Reading and modifying file attributes +2.2.5. Links and renaming +2.2.6. Extension and truncation +2.2.7. Checking accessibility +2.2.8. Locking +2.2.9. Disc quotas +.LP +.if t .sp .5v +.nf +\fB2.3. Interprocess communication\fP +2.3.1. Interprocess communication primitives +2.3.1.1.\0 Communication domains +2.3.1.2.\0 Socket types and protocols +2.3.1.3.\0 Socket creation, naming and service establishment +2.3.1.4.\0 Accepting connections +2.3.1.5.\0 Making connections +2.3.1.6.\0 Sending and receiving data +2.3.1.7.\0 Scatter/gather and exchanging access rights +2.3.1.8.\0 Using read and write with sockets +2.3.1.9.\0 Shutting down halves of full-duplex connections +2.3.1.10.\0 Socket and protocol options +2.3.2. UNIX domain +2.3.2.1. Types of sockets +2.3.2.2. Naming +2.3.2.3. Access rights transmission +2.3.3. INTERNET domain +2.3.3.1. Socket types and protocols +2.3.3.2. Socket naming +2.3.3.3. Access rights transmission +2.3.3.4. Raw access +.LP +.if t .sp .5v +.nf +\fB2.4. Terminals and devices\fP +2.4.1. Terminals +2.4.1.1. Terminal input +2.4.1.1.1 Input modes +2.4.1.1.2 Interrupt characters +2.4.1.1.3 Line editing +2.4.1.2. Terminal output +2.4.1.3. Terminal control operations +2.4.1.4. Terminal hardware support +2.4.2. Structured devices +2.4.3. Unstructured devices +.LP +.if t .sp .5v +.nf +\fB2.5. Process control and debugging\fP +.LP +.if t .sp .5v +.nf +\fBI. Summary of facilities\fP +.LP +.de sh +.ds RH \\$1 +.bp +.NH \\*(ss +\s+2\\$1\s0 +.PP +.PP +.. +.bp +.ds ss 1 +.de _d +.if t .ta .6i 2.1i 2.6i +.\" 2.94 went to 2.6, 3.64 to 3.30 +.if n .ta .84i 2.6i 3.30i +.. +.de _f +.if t .ta .5i 1.25i 2.5i 3.5i +.\" 3.5i went to 3.8i +.if n .ta .7i 1.75i 3.8i 4.8i +.. 
+.nr H1 -1 +.sh "Notation and types +.PP +The notation used to describe system calls is a variant of a +C language call, consisting of a prototype call followed by +declaration of parameters and results. +An additional keyword \fBresult\fP, not part of the normal C language, +is used to indicate which of the declared entities receive results. +As an example, consider the \fIread\fP call, as described in +section 2.1: +.DS +cc = read(fd, buf, nbytes); +result int cc; int fd; result char *buf; int nbytes; +.DE +The first line shows how the \fIread\fP routine is called, with +three parameters. +As shown on the second line \fIcc\fP is an integer and \fIread\fP also +returns information in the parameter \fIbuf\fP. +.PP +Description of all error conditions arising from each system call +is not provided here; they appear in the programmer's manual. +In particular, when accessed from the C language, +many calls return a characteristic \-1 value +when an error occurs, returning the error code in the global variable +\fIerrno\fP. +Other languages may present errors in different ways. +.PP +A number of system standard types are defined in the include file +.I <sys/types.h> +and used in the specifications here and in many C programs. +These include \fBcaddr_t\fP giving a memory address (typically as +a character pointer), +\fBoff_t\fP giving a file offset (typically as a long integer), +and a set of unsigned types \fBu_char\fP, \fBu_short\fP, \fBu_int\fP +and \fBu_long\fP, shorthand names for \fBunsigned char\fP, \fBunsigned +short\fP, etc. diff --git a/share/doc/psd/05.sysman/1.0.t b/share/doc/psd/05.sysman/1.0.t new file mode 100644 index 0000000..5a465a7 --- /dev/null +++ b/share/doc/psd/05.sysman/1.0.t @@ -0,0 +1,56 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)1.0.t 8.1 (Berkeley) 6/8/93 +.\" +.ds ss 1 +.sh "Kernel primitives +.PP +The facilities available to a UNIX user process are logically +divided into two parts: kernel facilities directly implemented by +UNIX code running in the operating system, and system facilities +implemented either by the system, or in cooperation with a +\fIserver process\fP. These kernel facilities are described in +this section 1. +.PP +The facilities implemented in the kernel are those which define the +\fIUNIX virtual machine\fP in which each process runs. +Like many real machines, this virtual machine has memory management hardware, +an interrupt facility, timers and counters. The UNIX +virtual machine also allows access to files and other objects through a set of +\fIdescriptors\fP. Each descriptor resembles a device controller, +and supports a set of operations. Like devices on real machines, some +of which are internal to the machine and some of which are external, +parts of the descriptor machinery are built-in to the operating system, while +other parts are often implemented in server processes on other machines. +The facilities provided through the descriptor machinery are described in +section 2. +.ds ss 2 diff --git a/share/doc/psd/05.sysman/1.1.t b/share/doc/psd/05.sysman/1.1.t new file mode 100644 index 0000000..223cf83 --- /dev/null +++ b/share/doc/psd/05.sysman/1.1.t @@ -0,0 +1,216 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.1.t 8.1 (Berkeley) 6/8/93 +.\" $FreeBSD$ +.\" +.sh "Processes and protection +.NH 3 +Host and process identifiers +.PP +Each UNIX host has associated with it a 32-bit host id, and a host +name of up to 256 characters (as defined by MAXHOSTNAMELEN in +\fI<sys/param.h>\fP). 
+These are set (by a privileged user) +and returned by the calls: +.DS +sethostid(hostid) +long hostid; + +hostid = gethostid(); +result long hostid; + +sethostname(name, len) +char *name; int len; + +len = gethostname(buf, buflen) +result int len; result char *buf; int buflen; +.DE +On each host runs a set of \fIprocesses\fP. +Each process is largely independent of other processes, +having its own protection domain, address space, timers, and +an independent set of references to system or user implemented objects. +.PP +Each process in a host is named by an integer +called the \fIprocess id\fP. This number is +in the range 1-30000 +and is returned by +the \fIgetpid\fP routine: +.DS +pid = getpid(); +result int pid; +.DE +On each UNIX host this identifier is guaranteed to be unique; +in a multi-host environment, the (hostid, process id) pairs are +guaranteed unique. +.NH 3 +Process creation and termination +.PP +A new process is created by making a logical duplicate of an +existing process: +.DS +pid = fork(); +result int pid; +.DE +The \fIfork\fP call returns twice, once in the parent process, where +\fIpid\fP is the process identifier of the child, +and once in the child process where \fIpid\fP is 0. +The parent-child relationship induces a hierarchical structure on +the set of processes in the system. +.PP +A process may terminate by executing an \fIexit\fP call: +.DS +exit(status) +int status; +.DE +returning 8 bits of exit status to its parent. +.PP +When a child process exits or +terminates abnormally, the parent process receives +information about any +event which caused termination of the child process. A +second call provides a non-blocking interface and may also be used +to retrieve information about resources consumed by the process during its +lifetime. 
+.DS +#include <sys/wait.h> + +pid = wait(astatus); +result int pid; result union wait *astatus; + +pid = wait3(astatus, options, arusage); +result int pid; result union waitstatus *astatus; +int options; result struct rusage *arusage; +.DE +.PP +A process can overlay itself with the memory image of another program, +passing the newly created process a set of parameters, using the call: +.DS +execve(name, argv, envp) +char *name, **argv, **envp; +.DE +The specified \fIname\fP must be a file which is in a format recognized +by the system, either a binary executable file or a file which causes +the execution of a specified interpreter program to process its contents. +.NH 3 +User and group ids +.PP +Each process in the system has associated with it two user-id's: +a \fIreal user id\fP and an \fIeffective user id\fP, both 16 bit +unsigned integers (type \fBuid_t\fP). +Each process has a \fIreal accounting group id\fP and an \fIeffective +accounting group id\fP and a set of +\fIaccess group id's\fP. The group id's are 16 bit unsigned integers +(type \fBgid_t\fP). +Each process may be in several different access groups, with the maximum +concurrent number of access groups a system compilation parameter, +the constant NGROUPS in the file \fI<sys/param.h>\fP, +guaranteed to be at least 8. +.PP +The real and effective user ids associated with a process are returned by: +.DS +ruid = getuid(); +result uid_t ruid; + +euid = geteuid(); +result uid_t euid; +.DE +the real and effective accounting group ids by: +.DS +rgid = getgid(); +result gid_t rgid; + +egid = getegid(); +result gid_t egid; +.DE +The access group id set is returned by a \fIgetgroups\fP call*: +.DS +ngroups = getgroups(gidsetsize, gidset); +result int ngroups; int gidsetsize; result int gidset[gidsetsize]; +.DE +.FS +* The type of the gidset array in getgroups and setgroups +remains integer for compatibility with 4.2BSD. +It may change to \fBgid_t\fP in future releases. 
+.FE +.PP +The user and group id's +are assigned at login time using the \fIsetreuid\fP, \fIsetregid\fP, +and \fIsetgroups\fP calls: +.DS +setreuid(ruid, euid); +int ruid, euid; + +setregid(rgid, egid); +int rgid, egid; + +setgroups(gidsetsize, gidset) +int gidsetsize; int gidset[gidsetsize]; +.DE +The \fIsetreuid\fP call sets both the real and effective user-id's, +while the \fIsetregid\fP call sets both the real +and effective accounting group id's. +Unless the caller is the super-user, \fIruid\fP +must be equal to either the current real or effective user-id, +and \fIrgid\fP equal to either the current real or effective +accounting group id. The \fIsetgroups\fP call is restricted +to the super-user. +.NH 3 +Process groups +.PP +Each process in the system is also normally associated with a \fIprocess +group\fP. The group of processes in a process group is sometimes +referred to as a \fIjob\fP and manipulated by high-level system +software (such as the shell). +The current process group of a process is returned by the +\fIgetpgrp\fP call: +.DS +pgrp = getpgrp(pid); +result int pgrp; int pid; +.DE +When a process is in a specific process group it may receive +software interrupts affecting the group, causing the group to +suspend or resume execution or to be interrupted or terminated. +In particular, a system terminal has a process group and only processes +which are in the process group of the terminal may read from the +terminal, allowing arbitration of terminals among several different jobs. +.PP +The process group associated with a process may be changed by +the \fIsetpgrp\fP call: +.DS +setpgrp(pid, pgrp); +int pid, pgrp; +.DE +Newly created processes are assigned process id's distinct from all +processes and process groups, and the same process group as their +parent. A normal (unprivileged) process may set its process group equal +to its process id. A privileged process may set the process group of any +process to any value. 
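As an illustration of the process creation and termination calls described above, the following sketch forks a child, lets it exit, and collects the 8 bits of exit status in the parent. It is not part of the manual: the helper name child_exit_status is hypothetical, and the sketch assumes the POSIX waitpid call and the WIFEXITED and WEXITSTATUS macros found on current systems rather than the union wait form shown in the text.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the manual): fork a child that exits
 * with `status`, wait for it in the parent, and return the exit status
 * the parent observed, or -1 on error or abnormal termination.
 */
int
child_exit_status(int status)
{
	int wstatus;
	pid_t pid;

	pid = fork();
	if (pid == -1)
		return (-1);		/* fork failed; no child exists */
	if (pid == 0)
		_exit(status);		/* child: fork returned 0 */

	/* Parent: fork returned the process id of the child. */
	assert(pid > 0);
	if (waitpid(pid, &wstatus, 0) != pid)
		return (-1);
	if (!WIFEXITED(wstatus))
		return (-1);		/* child was terminated by a signal */
	return (WEXITSTATUS(wstatus));	/* low 8 bits passed to exit */
}
```

Because only 8 bits of status reach the parent, a child that exits with 259 is seen by the parent as having exited with 3.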
diff --git a/share/doc/psd/05.sysman/1.2.t b/share/doc/psd/05.sysman/1.2.t new file mode 100644 index 0000000..8527a75 --- /dev/null +++ b/share/doc/psd/05.sysman/1.2.t @@ -0,0 +1,273 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.2.t 8.1 (Berkeley) 6/8/93 +.\" $FreeBSD$ +.\" +.sh "Memory management\(dg +.NH 3 +Text, data and stack +.PP +.FS +\(dg This section represents the interface planned for later +releases of the system. Of the calls described in this section, +only \fIsbrk\fP and \fIgetpagesize\fP are included in 4.3BSD. +.FE +Each process begins execution with three logical areas of memory +called text, data and stack. +The text area is read-only and shared, while the data and stack +areas are private to the process. Both the data and stack areas may +be extended and contracted on program request. The call +.DS +addr = sbrk(incr); +result caddr_t addr; int incr; +.DE +changes the size of the data area by \fIincr\fP bytes and +returns the new end of the data area, while +.DS +addr = sstk(incr); +result caddr_t addr; int incr; +.DE +changes the size of the stack area. +The stack area is also automatically extended as needed. +On the VAX the text and data areas are adjacent in the P0 region, +while the stack section is in the P1 region, and grows downward. +.NH 3 +Mapping pages +.PP +The system supports sharing of data between processes +by allowing pages to be mapped into memory. These mapped +pages may be \fIshared\fP with other processes or \fIprivate\fP +to the process. 
+Protection and sharing options are defined in \fI<sys/mman.h>\fP as: +.DS +.ta \w'#define\ \ 'u +\w'MAP_HASSEMAPHORE\ \ 'u +\w'0x0080\ \ 'u +/* protections are chosen from these bits, or-ed together */ +#define PROT_READ 0x04 /* pages can be read */ +#define PROT_WRITE 0x02 /* pages can be written */ +#define PROT_EXEC 0x01 /* pages can be executed */ +.DE +.DS +.ta \w'#define\ \ 'u +\w'MAP_HASSEMAPHORE\ \ 'u +\w'0x0080\ \ 'u +/* flags contain mapping type, sharing type and options */ +/* mapping type; choose one */ +#define MAP_FILE 0x0001 /* mapped from a file or device */ +#define MAP_ANON 0x0002 /* allocated from memory, swap space */ +#define MAP_TYPE 0x000f /* mask for type field */ +.DE +.DS +.ta \w'#define\ \ 'u +\w'MAP_HASSEMAPHORE\ \ 'u +\w'0x0080\ \ 'u +/* sharing types; choose one */ +#define MAP_SHARED 0x0010 /* share changes */ +#define MAP_PRIVATE 0x0000 /* changes are private */ +.DE +.DS +.ta \w'#define\ \ 'u +\w'MAP_HASSEMAPHORE\ \ 'u +\w'0x0080\ \ 'u +/* other flags */ +#define MAP_FIXED 0x0020 /* map addr must be exactly as requested */ +#define MAP_INHERIT 0x0040 /* region is retained after exec */ +#define MAP_HASSEMAPHORE 0x0080 /* region may contain semaphores */ +#define MAP_NOPREALLOC 0x0100 /* do not preallocate space */ +.DE +The cpu-dependent size of a page is returned by the +\fIgetpagesize\fP system call: +.DS +pagesize = getpagesize(); +result int pagesize; +.DE +.LP +The call: +.DS +maddr = mmap(addr, len, prot, flags, fd, pos); +result caddr_t maddr; caddr_t addr; int *len, prot, flags, fd; off_t pos; +.DE +causes the pages starting at \fIaddr\fP and continuing +for at most \fIlen\fP bytes to be mapped from the object represented by +descriptor \fIfd\fP, starting at byte offset \fIpos\fP. +The starting address of the region is returned; +for the convenience of the system, +it may differ from that supplied +unless the MAP_FIXED flag is given, +in which case the exact address will be used or the call will fail. 
+The actual amount mapped is returned in \fIlen\fP. +The \fIaddr\fP, \fIlen\fP, and \fIpos\fP parameters +must all be multiples of the pagesize. +A successful \fImmap\fP will delete any previous mapping +in the allocated address range. +The parameter \fIprot\fP specifies the accessibility +of the mapped pages. +The parameter \fIflags\fP specifies +the type of object to be mapped, +mapping options, and +whether modifications made to +this mapped copy of the page +are to be kept \fIprivate\fP, or are to be \fIshared\fP with +other references. +Possible types include MAP_FILE, +mapping a regular file or character-special device memory, +and MAP_ANON, which maps memory not associated with any specific file. +The file descriptor used for creating MAP_ANON regions is used only +for naming, and may be given as \-1 if no name +is associated with the region.\(dd +.FS +\(dd The current design does not allow a process +to specify the location of swap space. +In the future we may define an additional mapping type, MAP_SWAP, +in which the file descriptor argument specifies a file +or device to which swapping should be done. +.FE +The MAP_INHERIT flag allows a region to be inherited after an \fIexec\fP. +The MAP_HASSEMAPHORE flag allows special handling for +regions that may contain semaphores. +The MAP_NOPREALLOC flag allows processes to allocate regions whose +virtual address space, if fully allocated, +would exceed the available memory plus swap resources. +Processes using such regions may get a SIGSEGV signal if they page fault and resources +are not available to service their request; +typically they would free up some resources via \fImunmap\fP so that +when they return from the signal the page +fault could be successfully completed. +.PP +A facility is provided to synchronize a mapped region with the file +it maps; the call +.DS +msync(addr, len); +caddr_t addr; int len; +.DE +writes any modified pages back to the filesystem and updates +the file modification time. 
+If \fIlen\fP is 0, all modified pages within the region containing \fIaddr\fP +will be flushed; +if \fIlen\fP is non-zero, only the pages containing \fIaddr\fP and \fIlen\fP +succeeding locations will be examined. +Any required synchronization of memory caches +will also take place at this time. +Filesystem operations on a file that is mapped for shared modifications +are unpredictable except after an \fImsync\fP. +.PP +A mapping can be removed by the call +.DS +munmap(addr, len); +caddr_t addr; int len; +.DE +This call deletes the mappings for the specified address range, +and causes further references to addresses within the range +to generate invalid memory references. +.NH 3 +Page protection control +.PP +A process can control the protection of pages using the call +.DS +mprotect(addr, len, prot); +caddr_t addr; int len, prot; +.DE +This call changes the specified pages to have protection \fIprot\fP\|. +Not all implementations will guarantee protection on a page basis; +the granularity of protection changes may be as large as an entire region. 
+.NH 3 +Giving and getting advice +.PP +A process that has knowledge of its memory behavior may +use the \fImadvise\fP call: +.DS +madvise(addr, len, behav); +caddr_t addr; int len, behav; +.DE +\fIBehav\fP describes expected behavior, as given +in \fI<sys/mman.h>\fP: +.DS +.ta \w'#define\ \ 'u +\w'MADV_SEQUENTIAL\ \ 'u +\w'00\ \ \ \ 'u +#define MADV_NORMAL 0 /* no further special treatment */ +#define MADV_RANDOM 1 /* expect random page references */ +#define MADV_SEQUENTIAL 2 /* expect sequential references */ +#define MADV_WILLNEED 3 /* will need these pages */ +#define MADV_DONTNEED 4 /* don't need these pages */ +#define MADV_SPACEAVAIL 5 /* insure that resources are reserved */ +.DE +Finally, a process may obtain information about whether pages are +core resident by using the call +.DS +mincore(addr, len, vec) +caddr_t addr; int len; result char *vec; +.DE +Here the current core residency of the pages is returned +in the character array \fIvec\fP, with a value of 1 meaning +that the page is in-core. +.NH 3 +Synchronization primitives +.PP +Primitives are provided for synchronization using semaphores in shared memory. +Semaphores must lie within a MAP_SHARED region with at least modes +PROT_READ and PROT_WRITE. +The MAP_HASSEMAPHORE flag must have been specified when the region was created. +To acquire a lock a process calls: +.DS +value = mset(sem, wait) +result int value; semaphore *sem; int wait; +.DE +\fIMset\fP indivisibly tests and sets the semaphore \fIsem\fP. +If the previous value is zero, the process has acquired the lock +and \fImset\fP returns true immediately. +Otherwise, if the \fIwait\fP flag is zero, +failure is returned. +If \fIwait\fP is true and the previous value is non-zero, +\fImset\fP relinquishes the processor until notified that it should retry. +.LP +To release a lock a process calls: +.DS +mclear(sem) +semaphore *sem; +.DE +\fIMclear\fP indivisibly tests and clears the semaphore \fIsem\fP. 
+If the ``WANT'' flag is zero in the previous value, +\fImclear\fP returns immediately. +If the ``WANT'' flag is non-zero in the previous value, +\fImclear\fP arranges for waiting processes to retry before returning. +.PP +Two routines provide services analogous to the kernel +\fIsleep\fP and \fIwakeup\fP functions interpreted in the domain of +shared memory. +A process may relinquish the processor by calling \fImsleep\fP +with a set semaphore: +.DS +msleep(sem) +semaphore *sem; +.DE +If the semaphore is still set when it is checked by the kernel, +the process will be put in a sleeping state +until some other process issues an \fImwakeup\fP for the same semaphore +within the region using the call: +.DS +mwakeup(sem) +semaphore *sem; +.DE +An \fImwakeup\fP may awaken all sleepers on the semaphore, +or may awaken only the next sleeper on a queue. diff --git a/share/doc/psd/05.sysman/1.3.t b/share/doc/psd/05.sysman/1.3.t new file mode 100644 index 0000000..f81a185 --- /dev/null +++ b/share/doc/psd/05.sysman/1.3.t @@ -0,0 +1,254 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. 
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.3.t 8.1 (Berkeley) 6/8/93 +.\" +.sh "Signals +.PP +.NH 3 +Overview +.PP +The system defines a set of \fIsignals\fP that may be delivered +to a process. Signal delivery resembles the occurrence of a hardware +interrupt: the signal is blocked from further occurrence, +the current process context is saved, and a new one +is built. A process may specify +the \fIhandler\fP to which a signal is delivered, or specify that +the signal is to be \fIblocked\fP or \fIignored\fP. A process may +also specify that a +\fIdefault\fP action is to be taken when signals occur. +.PP +Some signals +will cause a process to exit when they are not caught. This +may be accompanied by creation of a \fIcore\fP image file, containing +the current memory image of the process for use in post-mortem debugging. +A process may choose to have signals delivered on a special +stack, so that sophisticated software stack manipulations are possible. +.PP +All signals have the same \fIpriority\fP. 
If multiple signals +are pending simultaneously, the order in which they are delivered +to a process is implementation specific. Signal routines execute +with the signal that caused their invocation \fIblocked\fP, but other +signals may yet occur. Mechanisms are provided whereby critical sections +of code may protect themselves against the occurrence of specified signals. +.NH 3 +Signal types +.PP +The signals defined by the system fall into one of +five classes: hardware conditions, +software conditions, input/output notification, process control, or +resource control. +The set of signals is defined in the file \fI<signal.h>\fP. +.PP +Hardware signals are derived from exceptional conditions which +may occur during +execution. Such signals include SIGFPE representing floating +point and other arithmetic exceptions, SIGILL for illegal instruction +execution, SIGSEGV for addresses outside the currently assigned +area of memory, and SIGBUS for accesses that violate memory +protection constraints. +Other, more cpu-specific hardware signals exist, +such as those for the various customer-reserved instructions on +the VAX (SIGIOT, SIGEMT, and SIGTRAP). +.PP +Software signals reflect interrupts generated by user request: +SIGINT for the normal interrupt signal; SIGQUIT for the more +powerful \fIquit\fP signal, that normally causes a core image +to be generated; SIGHUP and SIGTERM that cause graceful +process termination, either because a user has ``hung up'', or +by user or program request; and SIGKILL, a more powerful termination +signal which a process cannot catch or ignore. +Programs may define their own asynchronous events using SIGUSR1 +and SIGUSR2. +Other software signals (SIGALRM, SIGVTALRM, SIGPROF) +indicate the expiration of interval timers. +.PP +A process can request notification via a SIGIO signal +when input or output is possible +on a descriptor, or when a \fInon-blocking\fP operation completes. 
+A process may request to receive a SIGURG signal when an +urgent condition arises. +.PP +A process may be \fIstopped\fP by a signal sent to it or the members +of its process group. The SIGSTOP signal is a powerful stop +signal, because it cannot be caught. Other stop signals, +SIGTSTP, SIGTTIN, and SIGTTOU, are used when a user request, input +request, or output request respectively is the reason for stopping the process. +A SIGCONT signal is sent to a process when it is +continued from a stopped state. +Processes may receive notification with a SIGCHLD signal when +a child process changes state, either by stopping or by terminating. +.PP +Exceeding resource limits may cause signals to be generated. +SIGXCPU occurs when a process nears its CPU time limit and SIGXFSZ +warns that the limit on file size creation has been reached. +.NH 3 +Signal handlers +.PP +A process has a handler associated with each signal. +The handler controls the way the signal is delivered. +The call +.DS +#include <signal.h> + +._f +struct sigvec { + int (*sv_handler)(); + int sv_mask; + int sv_flags; +}; + +sigvec(signo, sv, osv) +int signo; struct sigvec *sv; result struct sigvec *osv; +.DE +assigns interrupt handler address \fIsv_handler\fP to signal \fIsigno\fP. +Each handler address +specifies either an interrupt routine for the signal, that the +signal is to be ignored, +or that a default action (usually process termination) is to occur +if the signal occurs. +The constants +SIG_IGN and SIG_DFL used as values for \fIsv_handler\fP +cause ignoring or defaulting of a condition. +The \fIsv_mask\fP value specifies the +signal mask to be used when the handler is invoked; it implicitly includes +the signal which invoked the handler. +Signal masks include one bit for each signal; +the mask for a signal \fIsigno\fP is provided by the macro +\fIsigmask\fP(\fIsigno\fP), from \fI<signal.h>\fP. 
+\fISv_flags\fP specifies whether system calls should be +restarted if the signal handler returns and +whether the handler should operate on the normal run-time +stack or a special signal stack (see below). If \fIosv\fP +is non-zero, the previous signal vector is returned. +.PP +When a signal condition arises for a process, the signal +is added to a set of signals pending for the process. +If the signal is not currently \fIblocked\fP by the process +then it will be delivered. The process of signal delivery +adds the signal to be delivered and those signals +specified in the associated signal +handler's \fIsv_mask\fP to a set of those \fImasked\fP +for the process, saves the current process context, +and places the process in the context of the signal +handling routine. The call is arranged so that if the signal +handling routine exits normally the signal mask will be restored +and the process will resume execution in the original context. +If the process wishes to resume in a different context, then +it must arrange to restore the signal mask itself. +.PP +The mask of \fIblocked\fP signals is independent of handlers for +signals. It delays signals from being delivered much as a +raised hardware interrupt priority level delays hardware interrupts. +Preventing an interrupt from occurring by changing the handler is analogous to +disabling a device from further interrupts. +.PP +The signal handling routine \fIsv_handler\fP is called by a C call +of the form +.DS +(*sv_handler)(signo, code, scp); +int signo; long code; struct sigcontext *scp; +.DE +The \fIsigno\fP gives the number of the signal that occurred, and +the \fIcode\fP, a word of information supplied by the hardware. +The \fIscp\fP parameter is a pointer to a machine-dependent +structure containing the information for restoring the +context before the signal. 
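The handler installation described above can be sketched on a current system. The \fIsigvec\fP call is not generally available any more, so this sketch swaps in its POSIX successor \fIsigaction\fP, whose \fIsa_handler\fP, \fIsa_mask\fP and \fIsa_flags\fP fields play the roles of \fIsv_handler\fP, \fIsv_mask\fP and \fIsv_flags\fP. The names \fIon_signal\fP, \fIinstall_handler\fP and \fIlast_signal\fP are hypothetical and appear nowhere in the original text.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: records which signal was delivered. */
static void on_signal(int signo)
{
    got_signal = signo;
}

/*
 * Install on_signal for signo using sigaction (the POSIX successor
 * of sigvec).  SIGINT is additionally blocked while the handler
 * runs; the delivered signal itself is blocked implicitly.
 * Returns 0 on success, -1 on error.
 */
int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;        /* corresponds to sv_handler */
    sigemptyset(&sa.sa_mask);         /* corresponds to sv_mask */
    sigaddset(&sa.sa_mask, SIGINT);
    sa.sa_flags = SA_RESTART;         /* restart interrupted system calls */
    return sigaction(signo, &sa, NULL);
}

/* Returns the number of the last signal seen by on_signal, or 0. */
int last_signal(void)
{
    return got_signal;
}
```

A caller might run install_handler(SIGUSR1), send itself the signal with raise(SIGUSR1), and then read last_signal() to confirm delivery.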
+.NH 3 +Sending signals +.PP +A process can send a signal to another process or group of processes +with the calls: +.DS +kill(pid, signo) +int pid, signo; + +killpgrp(pgrp, signo) +int pgrp, signo; +.DE +Unless the process sending the signal is privileged, +it must have the same effective user id as the process receiving the signal. +.PP +Signals are also sent implicitly from a terminal device to the +process group associated with the terminal when certain input characters +are typed. +.NH 3 +Protecting critical sections +.PP +To block a section of code against one or more signals, a \fIsigblock\fP +call may be used to add a set of signals to the existing mask, returning +the old mask: +.DS +oldmask = sigblock(mask); +result long oldmask; long mask; +.DE +The old mask can then be restored later with \fIsigsetmask\fP\|, +.DS +oldmask = sigsetmask(mask); +result long oldmask; long mask; +.DE +The \fIsigblock\fP call can be used to read the current mask +by specifying an empty \fImask\fP\|. +.PP +It is possible to check conditions with some signals blocked, +and then to pause waiting for a signal and restoring the mask, by using: +.DS +sigpause(mask); +long mask; +.DE +.NH 3 +Signal stacks +.PP +Applications that maintain complex or fixed size stacks can use +the call +.DS +._f +struct sigstack { + caddr_t ss_sp; + int ss_onstack; +}; + +sigstack(ss, oss) +struct sigstack *ss; result struct sigstack *oss; +.DE +to provide the system with a stack based at \fIss_sp\fP for delivery +of signals. The value \fIss_onstack\fP indicates whether the +process is currently on the signal stack, +a notion maintained in software by the system. +.PP +When a signal is to be delivered, the system checks whether +the process is on a signal stack. If not, then the process is switched +to the signal stack for delivery, with the return from the signal +arranged to restore the previous stack. 
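The switch to a signal stack on delivery can be observed directly. The following is only a sketch under stated assumptions: it uses the POSIX calls \fIsigaltstack\fP and \fIsigaction\fP (with SA_ONSTACK) in place of the historical \fIsigstack\fP and \fIsigvec\fP, and the names \fIalt_stack\fP, \fIprobe_handler\fP and \fIhandler_uses_alt_stack\fP, as well as the 64 KB stack size, are hypothetical choices made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <string.h>

#define ALT_STACK_SIZE 65536          /* assumed size, chosen generously */

static char alt_stack[ALT_STACK_SIZE];
static volatile sig_atomic_t on_alt_stack = 0;

/* Handler: checks whether one of its locals lives inside alt_stack. */
static void probe_handler(int signo)
{
    char local;
    uintptr_t p = (uintptr_t)&local;
    uintptr_t lo = (uintptr_t)alt_stack;

    (void)signo;
    on_alt_stack = (p >= lo && p < lo + ALT_STACK_SIZE);
}

/*
 * Register alt_stack as the signal stack, install probe_handler for
 * signo with SA_ONSTACK, and deliver the signal once.
 * Returns 1 if the handler ran on the signal stack, 0 if it did not,
 * and -1 on error.
 */
int handler_uses_alt_stack(int signo)
{
    stack_t ss;
    struct sigaction sa;

    memset(&ss, 0, sizeof(ss));
    ss.ss_sp = alt_stack;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) == -1)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;         /* deliver on the signal stack */
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    on_alt_stack = 0;
    if (raise(signo) != 0)
        return -1;
    return on_alt_stack;
}
```

Checking the address of a local variable avoids calling further system routines from inside the handler.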
+.PP +If the process wishes to take a non-local exit from the signal routine, +or run code from the signal stack that uses a different stack, +a \fIsigstack\fP call should be used to reset the signal stack. diff --git a/share/doc/psd/05.sysman/1.4.t b/share/doc/psd/05.sysman/1.4.t new file mode 100644 index 0000000..a67a5ce --- /dev/null +++ b/share/doc/psd/05.sysman/1.4.t @@ -0,0 +1,137 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" @(#)1.4.t 8.1 (Berkeley) 6/8/93
+.\"
+.sh "Timers
+.NH 3
+Real time
+.PP
+The system's notion of the current Greenwich time and the current time
+zone is set and returned by the calls:
+.DS
+#include <sys/time.h>
+
+settimeofday(tp, tzp);
+struct timeval *tp;
+struct timezone *tzp;
+
+gettimeofday(tp, tzp);
+result struct timeval *tp;
+result struct timezone *tzp;
+.DE
+where the structures are defined in \fI<sys/time.h>\fP as:
+.DS
+._f
+struct timeval {
+	long	tv_sec;		/* seconds since Jan 1, 1970 */
+	long	tv_usec;	/* and microseconds */
+};
+
+struct timezone {
+	int	tz_minuteswest;	/* of Greenwich */
+	int	tz_dsttime;	/* type of dst correction to apply */
+};
+.DE
+The precision of the system clock is hardware dependent.
+Earlier versions of UNIX contained only a 1-second resolution version
+of this call, which remains as a library routine:
+.DS
+time(tvsec)
+result long *tvsec;
+.DE
+returning only the tv_sec field from the \fIgettimeofday\fP call.
+.NH 3
+Interval time
+.PP
+The system provides each process with three interval timers,
+defined in \fI<sys/time.h>\fP:
+.DS
+._d
+#define	ITIMER_REAL	0	/* real time intervals */
+#define	ITIMER_VIRTUAL	1	/* virtual time intervals */
+#define	ITIMER_PROF	2	/* user and system virtual time */
+.DE
+The ITIMER_REAL timer decrements
+in real time. It could be used by a library routine to
+maintain a wakeup service queue.
A SIGALRM signal is delivered +when this timer expires. +.PP +The ITIMER_VIRTUAL timer decrements in process virtual time. +It runs only when the process is executing. A SIGVTALRM signal +is delivered when it expires. +.PP +The ITIMER_PROF timer decrements both in process virtual time and when +the system is running on behalf of the process. +It is designed to be used by processes to statistically profile +their execution. +A SIGPROF signal is delivered when it expires. +.PP +A timer value is defined by the \fIitimerval\fP structure: +.DS +._f +struct itimerval { + struct timeval it_interval; /* timer interval */ + struct timeval it_value; /* current value */ +}; +.DE +and a timer is set or read by the call: +.DS +getitimer(which, value); +int which; result struct itimerval *value; + +setitimer(which, value, ovalue); +int which; struct itimerval *value; result struct itimerval *ovalue; +.DE +The third argument to \fIsetitimer\fP specifies an optional structure +to receive the previous contents of the interval timer. +A timer can be disabled by specifying a timer value of 0. +.PP +The system rounds argument timer intervals to be not less than the +resolution of its clock. This clock resolution can be determined +by loading a very small value into a timer and reading the timer back to +see what value resulted. +.PP +The \fIalarm\fP system call of earlier versions of UNIX is provided +as a library routine using the ITIMER_REAL timer. The process +profiling facilities of earlier versions of UNIX +remain because +it is not always possible to guarantee +the automatic restart of system calls after +receipt of a signal. +The \fIprofil\fP call arranges for the kernel to begin gathering +execution statistics for a process: +.DS +profil(buf, bufsize, offset, scale); +result char *buf; int bufsize, offset, scale; +.DE +This begins sampling of the program counter, with statistics maintained +in the user-provided buffer. 
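The interval timer calls described in this section can be exercised with a short sketch. It arms ITIMER_REAL through \fIsetitimer\fP, counts SIGALRM deliveries, and disables the timer again by loading a value of 0. The handler is installed with POSIX \fIsigaction\fP and the wait uses \fIsigsuspend\fP, which are assumptions of this sketch rather than calls named in the text; \fIon_alarm\fP and \fIcount_ticks\fP are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks = 0;

/* Handler: counts timer expirations. */
static void on_alarm(int signo)
{
    (void)signo;
    ticks++;
}

/*
 * Arm ITIMER_REAL to expire every interval_ms milliseconds, wait for
 * at least `count` expirations, then disable the timer.
 * Returns the number of expirations observed, or -1 on error.
 */
int count_ticks(long interval_ms, int count)
{
    struct sigaction sa;
    struct itimerval tv, off;
    sigset_t block, old, waitmask;

    if (interval_ms <= 0)
        return -1;            /* a zero value would disable the timer */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    /* Block SIGALRM so an expiration cannot slip in between the
     * test of `ticks` and the call that waits for the signal. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;
    waitmask = old;
    sigdelset(&waitmask, SIGALRM);

    ticks = 0;
    tv.it_interval.tv_sec = interval_ms / 1000;
    tv.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    tv.it_value = tv.it_interval;     /* first expiration after one interval */
    if (setitimer(ITIMER_REAL, &tv, NULL) == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    while (ticks < count)
        sigsuspend(&waitmask);        /* atomically unblock and wait */

    memset(&off, 0, sizeof(off));     /* a timer value of 0 disables it */
    setitimer(ITIMER_REAL, &off, NULL);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return ticks;
}
```

For instance, count_ticks(10, 3) returns once three 10-millisecond intervals have elapsed.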
diff --git a/share/doc/psd/05.sysman/1.5.t b/share/doc/psd/05.sysman/1.5.t new file mode 100644 index 0000000..e642e2d --- /dev/null +++ b/share/doc/psd/05.sysman/1.5.t @@ -0,0 +1,225 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.5.t 8.1 (Berkeley) 6/8/93 +.\" +.sh Descriptors +.PP +.NH 3 +The reference table +.PP +Each process has access to resources through +\fIdescriptors\fP. Each descriptor is a handle allowing +the process to reference objects such as files, devices +and communications links. +.PP +Rather than allowing processes direct access to descriptors, the system +introduces a level of indirection, so that descriptors may be shared +between processes. Each process has a \fIdescriptor reference table\fP, +containing pointers to the actual descriptors. The descriptors +themselves thus have multiple references, and are reference counted by the +system. +.PP +Each process has a fixed size descriptor reference table, where +the size is returned by the \fIgetdtablesize\fP call: +.DS +nds = getdtablesize(); +result int nds; +.DE +and guaranteed to be at least 20. The entries in the descriptor reference +table are referred to by small integers; for example if there +are 20 slots they are numbered 0 to 19. +.NH 3 +Descriptor properties +.PP +Each descriptor has a logical set of properties maintained +by the system and defined by its \fItype\fP. +Each type supports a set of operations; +some operations, such as reading and writing, are common to several +abstractions, while others are unique. +The generic operations applying to many of these types are described +in section 2.1. 
Naming contexts, files and directories are described in
+section 2.2. Section 2.3 describes communications domains and sockets.
+Terminals and (structured and unstructured) devices are described
+in section 2.4.
+.NH 3
+Managing descriptor references
+.PP
+A duplicate of a descriptor reference may be made by doing
+.DS
+new = dup(old);
+result int new; int old;
+.DE
+returning a copy of descriptor reference \fIold\fP indistinguishable from
+the original. The \fInew\fP chosen by the system will be the
+smallest unused descriptor reference slot.
+A copy of a descriptor reference may be made in a specific slot
+by doing
+.DS
+dup2(old, new);
+int old, new;
+.DE
+The \fIdup2\fP call causes the system to deallocate the descriptor reference
+currently occupying slot \fInew\fP, if any, replacing it with a reference
+to the same descriptor as \fIold\fP.
+This deallocation is also performed by:
+.DS
+close(old);
+int old;
+.DE
+.NH 3
+Multiplexing requests
+.PP
+The system provides a
+standard way to do
+synchronous and asynchronous multiplexing of operations.
+.PP
+Synchronous multiplexing is performed by using the \fIselect\fP call
+to examine the state of multiple descriptors simultaneously,
+and to wait for state changes on those descriptors.
+Sets of descriptors of interest are specified as bit masks,
+as follows:
+.DS
+#include <sys/types.h>
+
+nds = select(nd, in, out, except, tvp);
+result int nds; int nd; result fd_set *in, *out, *except;
+struct timeval *tvp;
+
+FD_ZERO(&fdset);
+FD_SET(fd, &fdset);
+FD_CLR(fd, &fdset);
+FD_ISSET(fd, &fdset);
+int fd; fd_set fdset;
+.DE
+The \fIselect\fP call examines the descriptors
+specified by the
+sets \fIin\fP, \fIout\fP and \fIexcept\fP, replacing
+the specified bit masks by the subsets that select true for input,
+output, and exceptional conditions respectively (\fInd\fP
+indicates the number of file descriptors specified by the bit masks).
+If any descriptors meet the following criteria, +then the number of such descriptors is returned in \fInds\fP and the +bit masks are updated. +.if n .ds bu * +.if t .ds bu \(bu +.IP \*(bu +A descriptor selects for input if an input oriented operation +such as \fIread\fP or \fIreceive\fP is possible, or if a +connection request may be accepted (see section 2.3.1.4). +.IP \*(bu +A descriptor selects for output if an output oriented operation +such as \fIwrite\fP or \fIsend\fP is possible, or if an operation +that was ``in progress'', such as connection establishment, +has completed (see section 2.1.3). +.IP \*(bu +A descriptor selects for an exceptional condition if a condition +that would cause a SIGURG signal to be generated exists (see section 1.3.2), +or other device-specific events have occurred. +.LP +If none of the specified conditions is true, the operation +waits for one of the conditions to arise, +blocking at most the amount of time specified by \fItvp\fP. +If \fItvp\fP is given as 0, the \fIselect\fP waits indefinitely. +.PP +Options affecting I/O on a descriptor +may be read and set by the call: +.DS +._d +dopt = fcntl(d, cmd, arg) +result int dopt; int d, cmd, arg; + +/* interesting values for cmd */ +#define F_SETFL 3 /* set descriptor options */ +#define F_GETFL 4 /* get descriptor options */ +#define F_SETOWN 5 /* set descriptor owner (pid/pgrp) */ +#define F_GETOWN 6 /* get descriptor owner (pid/pgrp) */ +.DE +The F_SETFL \fIcmd\fP may be used to set a descriptor in +non-blocking I/O mode and/or enable signaling when I/O is +possible. F_SETOWN may be used to specify a process or process +group to be signaled when using the latter mode of operation +or when urgent indications arise. +.PP +Operations on non-blocking descriptors will +either complete immediately, +note an error EWOULDBLOCK, +partially complete an input or output operation returning a partial count, +or return an error EINPROGRESS noting that the requested operation is +in progress. 
+A descriptor which has signaling enabled will cause the specified process
+and/or process group
+to be signaled, with a SIGIO for input, output, or in-progress
+operation complete, or
+a SIGURG for exceptional conditions.
+.PP
+For example, when writing to a terminal
+using non-blocking output,
+the system will accept only as much data as there is buffer space for
+and return; when making a connection on a \fIsocket\fP, the operation may
+return indicating that the connection establishment is ``in progress''.
+The \fIselect\fP facility can be used to determine when further
+output is possible on the terminal, or when the connection establishment
+attempt is complete.
+.NH 3
+Descriptor wrapping.\(dg
+.PP
+.FS
+\(dg The facilities described in this section are not included
+in 4.3BSD.
+.FE
+A user process may build descriptors of a specified type by
+\fIwrapping\fP a communications channel with a system-supplied protocol
+translator:
+.DS
+new = wrap(old, proto)
+result int new; int old; struct dprop *proto;
+.DE
+Operations on the descriptor \fInew\fP are then translated by the
+system-provided protocol translator into requests on the underlying
+object \fIold\fP in a way defined by the protocol.
+The protocols supported by the kernel may vary from system to system
+and are described in the programmer's manual.
+.PP
+Protocols may be based on communications multiplexing or a rights-passing
+style of handling multiple requests made on the same object. For instance,
+a protocol for implementing a file abstraction may or may not include
+locally generated ``read-ahead'' requests. A protocol that provides for
+read-ahead may provide higher performance but have a more difficult
+implementation.
+.PP
+Another example is the terminal driving facilities. Normally a terminal
+is associated with a communications line, and the terminal type
+and standard terminal access protocol are wrapped around a synchronous
+communications line and given to the user.
If a virtual terminal +is required, the terminal driver can be wrapped around a communications +link, the other end of which is held by a virtual terminal protocol +interpreter. diff --git a/share/doc/psd/05.sysman/1.6.t b/share/doc/psd/05.sysman/1.6.t new file mode 100644 index 0000000..109d271 --- /dev/null +++ b/share/doc/psd/05.sysman/1.6.t @@ -0,0 +1,135 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.6.t 8.1 (Berkeley) 6/8/93 +.\" +.sh "Resource controls +.NH 3 +Process priorities +.PP +The system gives CPU scheduling priority to processes that have not used +CPU time recently. This tends to favor interactive processes and +processes that execute only for short periods. +It is possible to determine the priority currently +assigned to a process, process group, or the processes of a specified user, +or to alter this priority using the calls: +.DS +._d +#define PRIO_PROCESS 0 /* process */ +#define PRIO_PGRP 1 /* process group */ +#define PRIO_USER 2 /* user id */ + +prio = getpriority(which, who); +result int prio; int which, who; + +setpriority(which, who, prio); +int which, who, prio; +.DE +The value \fIprio\fP is in the range \-20 to 20. +The default priority is 0; lower priorities cause more +favorable execution. +The \fIgetpriority\fP call returns the highest priority (lowest numerical value) +enjoyed by any of the specified processes. +The \fIsetpriority\fP call sets the priorities of all of the +specified processes to the specified value. +Only the super-user may lower priorities. 
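The priority calls can be illustrated with a short sketch for the current process (PRIO_PROCESS with a \fIwho\fP of 0). Because \-1 is a legitimate priority value, the sketch clears \fIerrno\fP before calling \fIgetpriority\fP to tell a real error apart. The helper names \fIcurrent_priority\fP and \fIbe_nicer\fP are hypothetical, and the exact upper bound of the range may differ between systems.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/resource.h>

/*
 * Store the scheduling priority of the calling process in *prio.
 * Returns 0 on success, -1 on error.
 */
int current_priority(int *prio)
{
    int p;

    errno = 0;                        /* -1 can be a valid priority */
    p = getpriority(PRIO_PROCESS, 0);
    if (p == -1 && errno != 0)
        return -1;
    *prio = p;
    return 0;
}

/*
 * Make the calling process less favored by adding delta (>= 0) to
 * its priority value.  No privilege is needed in this direction.
 * Returns 0 on success, -1 on error.
 */
int be_nicer(int delta)
{
    int prio;

    if (delta < 0 || current_priority(&prio) == -1)
        return -1;
    return setpriority(PRIO_PROCESS, 0, prio + delta);
}
```

Only the direction that needs no privilege is shown; moving the value back down is reserved to the super-user, as stated above.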
+.NH 3 +Resource utilization +.PP +The resources used by a process are returned by a \fIgetrusage\fP call, +returning information in a structure defined in \fI<sys/resource.h>\fP: +.DS +._d +#define RUSAGE_SELF 0 /* usage by this process */ +#define RUSAGE_CHILDREN -1 /* usage by all children */ + +getrusage(who, rusage) +int who; result struct rusage *rusage; + +._f +struct rusage { + struct timeval ru_utime; /* user time used */ + struct timeval ru_stime; /* system time used */ + int ru_maxrss; /* maximum core resident set size: kbytes */ + int ru_ixrss; /* integral shared memory size (kbytes*sec) */ + int ru_idrss; /* unshared data memory size */ + int ru_isrss; /* unshared stack memory size */ + int ru_minflt; /* page-reclaims */ + int ru_majflt; /* page faults */ + int ru_nswap; /* swaps */ + int ru_inblock; /* block input operations */ + int ru_oublock; /* block output operations */ + int ru_msgsnd; /* messages sent */ + int ru_msgrcv; /* messages received */ + int ru_nsignals; /* signals received */ + int ru_nvcsw; /* voluntary context switches */ + int ru_nivcsw; /* involuntary context switches */ +}; +.DE +The \fIwho\fP parameter specifies whose resource usage is to be returned. +The resources used by the current process, or by all +the terminated children of the current process may be requested. 
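As a small sketch of the \fIgetrusage\fP call, the following adds the \fIru_utime\fP and \fIru_stime\fP fields to obtain the total CPU time charged to the calling process. The function name \fIcpu_seconds_used\fP is hypothetical, and returning the result as a floating-point number of seconds is a choice made only for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <sys/resource.h>
#include <sys/time.h>

/*
 * Total CPU time (user plus system) used so far by the calling
 * process, in seconds.  Returns -1.0 if getrusage fails.
 */
double cpu_seconds_used(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == -1)
        return -1.0;
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6
         + (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
}
```

Calling the function before and after a piece of work and subtracting the two results gives the CPU time that work consumed; passing RUSAGE_CHILDREN instead would report on terminated children.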
+.NH 3 +Resource limits +.PP +The resources of a process for which limits are controlled by the +kernel are defined in \fI<sys/resource.h>\fP, and controlled by the +\fIgetrlimit\fP and \fIsetrlimit\fP calls: +.DS +._d +#define RLIMIT_CPU 0 /* cpu time in milliseconds */ +#define RLIMIT_FSIZE 1 /* maximum file size */ +#define RLIMIT_DATA 2 /* maximum data segment size */ +#define RLIMIT_STACK 3 /* maximum stack segment size */ +#define RLIMIT_CORE 4 /* maximum core file size */ +#define RLIMIT_RSS 5 /* maximum resident set size */ + +#define RLIM_NLIMITS 6 + +#define RLIM_INFINITY 0x7f\&f\&f\&f\&f\&f\&f + +._f +struct rlimit { + int rlim_cur; /* current (soft) limit */ + int rlim_max; /* hard limit */ +}; + +getrlimit(resource, rlp) +int resource; result struct rlimit *rlp; + +setrlimit(resource, rlp) +int resource; struct rlimit *rlp; +.DE +.PP +Only the super-user can raise the maximum limits. +Other users may only +alter \fIrlim_cur\fP within the range from 0 to \fIrlim_max\fP +or (irreversibly) lower \fIrlim_max\fP. diff --git a/share/doc/psd/05.sysman/1.7.t b/share/doc/psd/05.sysman/1.7.t new file mode 100644 index 0000000..09e1a02 --- /dev/null +++ b/share/doc/psd/05.sysman/1.7.t @@ -0,0 +1,100 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.7.t 8.1 (Berkeley) 6/8/93 +.\" +.sh "System operation support +.PP +Unless noted otherwise, +the calls in this section are permitted only to a privileged user. +.NH 3 +Bootstrap operations +.PP +The call +.DS +mount(blkdev, dir, ronly); +char *blkdev, *dir; int ronly; +.DE +extends the UNIX name space. The \fImount\fP call specifies +a block device \fIblkdev\fP containing a UNIX file system +to be made available starting at \fIdir\fP. If \fIronly\fP is +set then the file system is read-only; writes to the file system +will not be permitted and access times will not be updated +when files are referenced. +\fIDir\fP is normally a name in the root directory. 
+.PP
+The call
+.DS
+swapon(blkdev, size);
+char *blkdev; int size;
+.DE
+specifies a device to be made available for paging and swapping.
+.PP
+.NH 3
+Shutdown operations
+.PP
+The call
+.DS
+unmount(dir);
+char *dir;
+.DE
+unmounts the file system mounted on \fIdir\fP.
+This call will succeed only if the file system is
+not currently being used.
+.PP
+The call
+.DS
+sync();
+.DE
+schedules input/output to clean all system buffer caches.
+(This call does not require privileged status.)
+.PP
+The call
+.DS
+reboot(how)
+int how;
+.DE
+causes a machine halt or reboot. The call may request a reboot
+by specifying \fIhow\fP as RB_AUTOBOOT, or that the machine be halted
+with RB_HALT. These constants are defined in \fI<sys/reboot.h>\fP.
+.NH 3
+Accounting
+.PP
+The system optionally keeps an accounting record in a file
+for each process that exits on the system.
+The format of this record is beyond the scope of this document.
+Accounting may be enabled to a file \fIpath\fP by doing
+.DS
+acct(path);
+char *path;
+.DE
+If \fIpath\fP is null, then accounting is disabled. Otherwise,
+the named file becomes the accounting file.
diff --git a/share/doc/psd/05.sysman/2.0.t b/share/doc/psd/05.sysman/2.0.t
new file mode 100644
index 0000000..ca44bc2
--- /dev/null
+++ b/share/doc/psd/05.sysman/2.0.t
@@ -0,0 +1,83 @@
+.\" Copyright (c) 1983, 1993
+.\" The Regents of the University of California. All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\" 3.
All advertising materials mentioning features or use of this software
+.\" must display the following acknowledgement:
+.\" This product includes software developed by the University of
+.\" California, Berkeley and its contributors.
+.\" 4. Neither the name of the University nor the names of its contributors
+.\" may be used to endorse or promote products derived from this software
+.\" without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" @(#)2.0.t 8.1 (Berkeley) 6/8/93
+.\"
+.ds ss 1
+.sh "System facilities
+This section discusses the system facilities that
+are not considered part of the kernel.
+.PP
+The system abstractions described are:
+.IP "Directory contexts
+.br
+A directory context is a position in the UNIX file system name
+space. Operations on files and other named objects in a file system are
+always specified relative to such a context.
+.IP "Files
+.br
+Files are used to store uninterpreted sequences of bytes on which
+random access \fIreads\fP and \fIwrites\fP may occur.
+Pages from files may also be mapped into process address space.\(dg
+A directory may be read as a file.
+.FS
+\(dg Support for mapping files is not included in the 4.3 release.
+.FE +.IP "Communications domains +.br +A communications domain represents +an interprocess communications environment, such as the communications +facilities of the UNIX system, +communications in the INTERNET, or the resource sharing protocols +and access rights of a resource sharing system on a local network. +.IP "Sockets +.br +A socket is an endpoint of communication and the focal +point for IPC in a communications domain. Sockets may be created in pairs, +or given names and used to rendezvous with other sockets +in a communications domain, accepting connections from these +sockets or exchanging messages with them. These operations model +a labeled or unlabeled communications graph, and can be used in a +wide variety of communications domains. Sockets can have different +\fItypes\fP\| to provide different semantics of communication, +increasing the flexibility of the model. +.IP "Terminals and other devices +.br +Devices include +terminals, providing input editing and interrupt generation +and output flow control and editing, magnetic tapes, +disks and other peripherals. They often support the generic +\fIread\fP and \fIwrite\fP operations as well as a number of \fIioctl\fP\|s. +.IP "Processes +.br +Process descriptors provide facilities for control and debugging of +other processes. +.ds ss 2 diff --git a/share/doc/psd/05.sysman/2.1.t b/share/doc/psd/05.sysman/2.1.t new file mode 100644 index 0000000..ef25887 --- /dev/null +++ b/share/doc/psd/05.sysman/2.1.t @@ -0,0 +1,138 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)2.1.t 8.1 (Berkeley) 6/8/93 +.\" +.sh "Generic operations +.PP +.PP +Many system abstractions support the +operations \fIread\fP, \fIwrite\fP and \fIioctl\fP. We describe +the basics of these common primitives here. +Similarly, the mechanisms whereby normally synchronous operations +may occur in a non-blocking or asynchronous fashion are +common to all system-defined abstractions and are described here. 
+.NH 3 +Read and write +.PP +The \fIread\fP and \fIwrite\fP system calls can be applied +to communications channels, files, terminals and devices. +They have the form: +.DS +cc = read(fd, buf, nbytes); +result int cc; int fd; result caddr_t buf; int nbytes; + +cc = write(fd, buf, nbytes); +result int cc; int fd; caddr_t buf; int nbytes; +.DE +The \fIread\fP call transfers as much data as possible from the +object defined by \fIfd\fP to the buffer at address \fIbuf\fP of +size \fInbytes\fP. The number of bytes transferred is +returned in \fIcc\fP, which is \-1 if a return occurred before +any data was transferred because of an error or use of non-blocking +operations. +.PP +The \fIwrite\fP call transfers data from the buffer to the +object defined by \fIfd\fP. Depending on the type of \fIfd\fP, +it is possible that the \fIwrite\fP call will accept only some portion +of the provided bytes; the user should resubmit the other bytes +in a later request in this case. +Error returns because of interrupted or otherwise incomplete operations +are possible. +.PP +Scattering of data on input or gathering of data for output +is also possible using an array of input/output vector descriptors. +The type for the descriptors is defined in \fI<sys/uio.h>\fP as: +.DS +._f +struct iovec { + caddr_t iov_base; /* base of a component */ + int iov_len; /* length of a component */ +}; +.DE +The calls using an array of descriptors are: +.DS +cc = readv(fd, iov, iovlen); +result int cc; int fd; struct iovec *iov; int iovlen; + +cc = writev(fd, iov, iovlen); +result int cc; int fd; struct iovec *iov; int iovlen; +.DE +Here \fIiovlen\fP is the count of elements in the \fIiov\fP array. +.NH 3 +Input/output control +.PP +Control operations on an object are performed by the \fIioctl\fP +operation: +.DS +ioctl(fd, request, buffer); +int fd, request; caddr_t buffer; +.DE +This operation causes the specified \fIrequest\fP to be performed +on the object \fIfd\fP.
The \fIrequest\fP parameter specifies +whether the argument buffer is to be read, written, read and written, +or is not needed, and also the size of the buffer, as well as the +request. +Different descriptor types and subtypes within descriptor types +may use distinct \fIioctl\fP requests. For example, +operations on terminals control flushing of input and output +queues and setting of terminal parameters; operations on +disks cause formatting operations to occur; operations on tapes +control tape positioning. +.PP +The names for basic control operations are defined in \fI<sys/ioctl.h>\fP. +.NH 3 +Non-blocking and asynchronous operations +.PP +A process that wishes to do non-blocking operations on one of +its descriptors sets the descriptor in non-blocking mode as +described in section 1.5.4. Thereafter the \fIread\fP call will +return a specific EWOULDBLOCK error indication if there is no data to be +\fIread\fP. The process may +\fIselect\fP the associated descriptor to determine when a read is +possible. +.PP +Output attempted when a descriptor can accept less than is requested +will either accept some of the provided data, returning a shorter than normal +length, or return an error indicating that the operation would block. +More output can be performed as soon as a \fIselect\fP call indicates +the object is writeable. +.PP +Operations other than data input or output +may be performed on a descriptor in a non-blocking fashion. +These operations will return with a characteristic error indicating +that they are in progress +if they cannot complete immediately. The descriptor +may then be \fIselect\fPed for \fIwrite\fP to find out +when the operation has been completed. When \fIselect\fP indicates +the descriptor is writeable, the operation has completed. +Depending on the nature of the descriptor and the operation, +additional activity may be started or the new state may be tested. 
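The non-blocking read described above can be sketched in C as follows. This is a minimal sketch under assumptions the text does not make: it uses the POSIX spellings fcntl() and O_NONBLOCK to enter non-blocking mode, it treats EAGAIN the same as EWOULDBLOCK, and the helper names set_nonblocking and read_when_ready are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd in non-blocking mode (POSIX fcntl spelling). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: read up to n bytes from a non-blocking descriptor.
 * When no data is available, wait in select() until a read is possible
 * and then try again.  Returns the byte count, or -1 on a real error.
 */
static ssize_t read_when_ready(int fd, void *buf, size_t n)
{
    assert(buf != NULL);
    for (;;) {
        ssize_t cc = read(fd, buf, n);
        fd_set rfds;

        if (cc >= 0)
            return cc;
        if (errno != EWOULDBLOCK && errno != EAGAIN && errno != EINTR)
            return -1;

        /* Nothing to read yet: block in select() instead of in read(). */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (select(fd + 1, &rfds, NULL, NULL, NULL) == -1 && errno != EINTR)
            return -1;
    }
}
```

A caller that has other descriptors to service would normally add them to the same select() set rather than waiting on one descriptor alone.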
diff --git a/share/doc/psd/05.sysman/2.2.t b/share/doc/psd/05.sysman/2.2.t new file mode 100644 index 0000000..996e9b5 --- /dev/null +++ b/share/doc/psd/05.sysman/2.2.t @@ -0,0 +1,470 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)2.2.t 8.1 (Berkeley) 6/8/93 +.\" +.sh "File system +.NH 3 +Overview +.PP +The file system abstraction provides access to a hierarchical +file system structure. +The file system contains directories (each of which may contain +other sub-directories) as well as files and references to other +objects such as devices and inter-process communications sockets. +.PP +Each file is organized as a linear array of bytes. No record +boundaries or system related information is present in +a file. +Files may be read and written in a random-access fashion. +The user may read the data in a directory as though +it were an ordinary file to determine the names of the contained files, +but only the system may write into the directories. +The file system stores only a small amount of ownership, protection and usage +information with a file. +.NH 3 +Naming +.PP +The file system calls take \fIpath name\fP arguments. +These consist of zero or more component \fIfile names\fP +separated by ``/\^'' characters, where each file name +is up to 255 ASCII characters excluding null and ``/\^''. +.PP +Each process always has two naming contexts: one for the +root directory of the file system and one for the +current working directory. These are used +by the system in the filename translation process. +If a path name begins with a ``/\^'', it is called +a full path name and interpreted relative to the root directory context.
+If the path name does not begin with a ``/\^'' it is called +a relative path name and interpreted relative to the current directory +context. +.PP +The system limits +the total length of a path name to 1024 characters. +.PP +The file name ``..'' in each directory refers to +the parent directory of that directory. +The parent directory of the root of the file system is always that directory. +.PP +The calls +.DS +chdir(path); +char *path; + +chroot(path) +char *path; +.DE +change the current working directory and root directory context of a process. +Only the super-user can change the root directory context of a process. +.NH 3 +Creation and removal +.PP +The file system allows directories, files, special devices, +and ``portals'' to be created and removed from the file system. +.NH 4 +Directory creation and removal +.PP +A directory is created with the \fImkdir\fP system call: +.DS +mkdir(path, mode); +char *path; int mode; +.DE +where the mode is defined as for files (see below). +Directories are removed with the \fIrmdir\fP system call: +.DS +rmdir(path); +char *path; +.DE +A directory must be empty if it is to be deleted. +.NH 4 +File creation +.PP +Files are created with the \fIopen\fP system call, +.DS +fd = open(path, oflag, mode); +result int fd; char *path; int oflag, mode; +.DE +The \fIpath\fP parameter specifies the name of the +file to be created. The \fIoflag\fP parameter must +include O_CREAT from below to cause the file to be created. 
+Bits for \fIoflag\fP are +defined in \fI<sys/file.h>\fP: +.DS +._d +#define O_RDONLY 000 /* open for reading */ +#define O_WRONLY 001 /* open for writing */ +#define O_RDWR 002 /* open for read & write */ +#define O_NDELAY 004 /* non-blocking open */ +#define O_APPEND 010 /* append on each write */ +#define O_CREAT 01000 /* open with file create */ +#define O_TRUNC 02000 /* open with truncation */ +#define O_EXCL 04000 /* error on create if file exists */ +.DE +.PP +One of O_RDONLY, O_WRONLY and O_RDWR should be specified, +indicating what types of operations are desired to be performed +on the open file. The operations will be checked against the user's +access rights to the file before allowing the \fIopen\fP to succeed. +Specifying O_APPEND causes writes to automatically append to the +file. +The flag O_CREAT causes the file to be created if it does not +exist, owned by the current user +and the group of the containing directory. +The protection for the new file is specified in \fImode\fP. +The file mode is used as a three digit octal number. +Each digit encodes read access as 4, write access as 2 and execute +access as 1, or'ed together. The 0700 bits describe owner +access, the 070 bits describe the access rights for processes in the same +group as the file, and the 07 bits describe the access rights +for other processes. +.PP +If the open specifies to create the file with O_EXCL +and the file already exists, then the \fIopen\fP will fail +without affecting the file in any way. This provides a +simple exclusive access facility. +If the file exists but is a symbolic link, the open will fail +regardless of the existence of the file specified by the link. +.NH 4 +Creating references to devices +.PP +The file system allows entries which reference peripheral devices. +Peripherals are distinguished as \fIblock\fP or \fIcharacter\fP +devices according to their ability to support block-oriented +operations.
+Devices are identified by their ``major'' and ``minor'' +device numbers. The major device number determines the kind +of peripheral it is, while the minor device number indicates +one of possibly many peripherals of that kind. +Structured devices have all operations performed internally +in ``block'' quantities while +unstructured devices often have a number of +special \fIioctl\fP operations, and may have input and output +performed in varying units. +The \fImknod\fP call creates special entries: +.DS +mknod(path, mode, dev); +char *path; int mode, dev; +.DE +where \fImode\fP is formed from the object type +and access permissions. The parameter \fIdev\fP is a configuration +dependent parameter used to identify specific character or +block I/O devices. +.NH 4 +Portal creation\(dg +.PP +.FS +\(dg The \fIportal\fP call is not implemented in 4.3BSD. +.FE +The call +.DS +fd = portal(name, server, param, dtype, protocol, domain, socktype) +result int fd; char *name, *server, *param; int dtype, protocol; +int domain, socktype; +.DE +places a \fIname\fP in the file system name space that causes connection to a +server process when the name is used. +The portal call returns an active portal in \fIfd\fP as though an +access had occurred to activate an inactive portal, as now described. +.PP +When an inactive portal is accessed, the system sets up a socket +of the specified \fIsocktype\fP in the specified communications +\fIdomain\fP (see section 2.3), and creates the \fIserver\fP process, +giving it the specified \fIparam\fP as argument to help it identify +the portal, and also giving it the newly created socket as descriptor +number 0. The accessor of the portal will create a socket in the same +\fIdomain\fP and \fIconnect\fP to the server. The user will then +\fIwrap\fP the socket in the specified \fIprotocol\fP to create an object of +the required descriptor type \fIdtype\fP and proceed with the +operation which was in progress before the portal was encountered. 
+.PP +While the server process holds the socket (which it received as \fIfd\fP +from the \fIportal\fP call on descriptor 0 at activation) further references +will result in connections being made to the same socket. +.NH 4 +File, device, and portal removal +.PP +A reference to a file, special device or portal may be removed with the +\fIunlink\fP call, +.DS +unlink(path); +char *path; +.DE +The caller must have write access to the directory in which +the file is located for this call to be successful. +.NH 3 +Reading and modifying file attributes +.PP +Detailed information about the attributes of a file +may be obtained with the calls: +.DS +#include <sys/stat.h> + +stat(path, stb); +char *path; result struct stat *stb; + +fstat(fd, stb); +int fd; result struct stat *stb; +.DE +The \fIstat\fP structure includes the file +type, protection, ownership, access times, +size, and a count of hard links. +If the file is a symbolic link, then the status of the link +itself (rather than the file the link references) +may be found using the \fIlstat\fP call: +.DS +lstat(path, stb); +char *path; result struct stat *stb; +.DE +.PP +A newly created file is assigned the user id of the +process that created it and the group id of the directory +in which it was created. The ownership of a file may +be changed by either of the calls +.DS +chown(path, owner, group); +char *path; int owner, group; + +fchown(fd, owner, group); +int fd, owner, group; +.DE +.PP +In addition to ownership, each file has three levels of access +protection associated with it. These levels are owner relative, +group relative, and global (all users and groups). Each level +of access has separate indicators for read permission, write +permission, and execute permission.
+The protection bits associated with a file may be set by either +of the calls: +.DS +chmod(path, mode); +char *path; int mode; + +fchmod(fd, mode); +int fd, mode; +.DE +where \fImode\fP is a value indicating the new protection +of the file, as listed in section 2.2.3.2. +.PP +Finally, the access and modify times on a file may be set by the call: +.DS +utimes(path, tvp) +char *path; struct timeval tvp[2]; +.DE +This is particularly useful when moving files between media, to +preserve relationships between the times the file was modified. +.NH 3 +Links and renaming +.PP +Links allow multiple names for a file +to exist. Links exist independently of the file linked to. +.PP +Two types of links exist, \fIhard\fP links and \fIsymbolic\fP +links. A hard link is a reference counting mechanism that +allows a file to have multiple names within the same file +system. Symbolic links cause string substitution +during the pathname interpretation process. +.PP +Hard links and symbolic links have different +properties. A hard link ensures the target +file will always be accessible, even after its original +directory entry is removed; no such guarantee exists for a symbolic link. +Symbolic links can span file system boundaries. +.PP +The following calls create a new link, named \fIpath2\fP, +to \fIpath1\fP: +.DS +link(path1, path2); +char *path1, *path2; + +symlink(path1, path2); +char *path1, *path2; +.DE +The \fIunlink\fP primitive may be used to remove +either type of link. +.PP +If a file is a symbolic link, the ``value'' of the +link may be read with the \fIreadlink\fP call, +.DS +len = readlink(path, buf, bufsize); +result int len; char *path; result char *buf; int bufsize; +.DE +This call returns, in \fIbuf\fP, the string +substituted into pathnames passing through \fIpath\fP\|. +The string is not null-terminated; its length is returned in \fIlen\fP.
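Creating a symbolic link and reading its value back can be sketched in C as follows. This is a minimal sketch assuming the POSIX behaviour of readlink(), which does not write a terminating null byte, so the caller appends one; the helper names read_link_value and link_round_trip are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the value of the symbolic link at path into
 * buf and terminate it.  Returns the number of bytes copied, or -1 on
 * error.  readlink() silently truncates values longer than bufsize - 1.
 */
static ssize_t read_link_value(const char *path, char *buf, size_t bufsize)
{
    ssize_t len;

    assert(path != NULL && buf != NULL);
    if (bufsize == 0)
        return -1;
    /* Leave room for the terminator that readlink() does not write. */
    len = readlink(path, buf, bufsize - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0';
    return len;
}

/*
 * Hypothetical helper: create a symbolic link at linkpath whose value is
 * target, read the value back, and remove the link again.  The target
 * need not exist.  Returns 0 if the value read matches the target.
 */
static int link_round_trip(const char *target, const char *linkpath)
{
    char buf[1024];
    ssize_t len;
    int ok;

    (void)unlink(linkpath);             /* ignore "no such file" */
    if (symlink(target, linkpath) != 0)
        return -1;
    len = read_link_value(linkpath, buf, sizeof buf);
    ok = len == (ssize_t)strlen(target) && strcmp(buf, target) == 0;
    (void)unlink(linkpath);             /* unlink removes the link itself */
    return ok ? 0 : -1;
}
```

Because the link stores only a string, the round trip succeeds even when the target does not exist, which is the lack of guarantee noted above for symbolic links.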
+.PP +Atomic renaming of file system resident objects is possible +with the \fIrename\fP call: +.DS +rename(oldname, newname); +char *oldname, *newname; +.DE +where both \fIoldname\fP and \fInewname\fP must be +in the same file system. +If \fInewname\fP exists and is a directory, then it must be empty. +.NH 3 +Extension and truncation +.PP +Files are created with zero length and may be extended +simply by writing or appending to them. While a file is +open the system maintains a pointer into the file +indicating the current location in the file associated with +the descriptor. This pointer may be moved about in the +file in a random access fashion. +To set the current offset into a file, the \fIlseek\fP +call may be used, +.DS +oldoffset = lseek(fd, offset, type); +result off_t oldoffset; int fd; off_t offset; int type; +.DE +where \fItype\fP is given in \fI<sys/file.h>\fP as one of: +.DS +._d +#define L_SET 0 /* set absolute file offset */ +#define L_INCR 1 /* set file offset relative to current position */ +#define L_XTND 2 /* set offset relative to end-of-file */ +.DE +The call ``lseek(fd, 0, L_INCR)'' +returns the current offset into the file. +.PP +Files may have ``holes'' in them. Holes are void areas in the +linear extent of the file where data has never been +written. These may be created by seeking to +a location in a file past the current end-of-file and writing. +Holes are treated by the system as zero valued bytes. +.PP +A file may be truncated with either of the calls: +.DS +truncate(path, length); +char *path; int length; + +ftruncate(fd, length); +int fd, length; +.DE +reducing the size of the specified file to \fIlength\fP bytes. 
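The seek, hole, and truncation behaviour described in this section can be sketched in C as follows. This is a minimal sketch that assumes the POSIX constants SEEK_SET, SEEK_CUR and SEEK_END in place of L_SET, L_INCR and L_XTND, and an open() declared in <fcntl.h>; the helper names open_scratch, write_past_end and reads_as_zero are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or empty) a scratch file for reading and writing. */
static int open_scratch(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
}

/*
 * Hypothetical helper: write one byte at offset `where`.  If `where` is
 * past the current end of file, the skipped range becomes a hole.
 * Returns the offset after the write, or -1 on error.
 */
static off_t write_past_end(int fd, off_t where)
{
    assert(where >= 0);
    if (lseek(fd, where, SEEK_SET) == (off_t)-1)    /* L_SET in the text */
        return -1;
    if (write(fd, "x", 1) != 1)
        return -1;
    return lseek(fd, 0, SEEK_CUR);                  /* L_INCR: current offset */
}

/* Hypothetical helper: report whether the byte at `where` reads back as zero. */
static int reads_as_zero(int fd, off_t where)
{
    char c = 1;

    if (lseek(fd, where, SEEK_SET) == (off_t)-1 || read(fd, &c, 1) != 1)
        return 0;
    return c == 0;
}
```

After write_past_end(fd, 4096) on an empty file, the file is 4097 bytes long and every byte before offset 4096 reads as zero; a later ftruncate(fd, 10) reduces it to 10 bytes.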
+.NH 3 +Checking accessibility +.PP +A process running with +different real and effective user ids +may interrogate the accessibility of a file to the +real user by using +the \fIaccess\fP call: +.DS +accessible = access(path, how); +result int accessible; char *path; int how; +.DE +Here \fIhow\fP is constructed by or'ing the following bits, defined +in \fI<sys/file.h>\fP: +.DS +._d +#define F_OK 0 /* file exists */ +#define X_OK 1 /* file is executable */ +#define W_OK 2 /* file is writable */ +#define R_OK 4 /* file is readable */ +.DE +The presence or absence of advisory locks does not affect the +result of \fIaccess\fP\|. +.NH 3 +Locking +.PP +The file system provides basic facilities that allow cooperating processes +to synchronize their access to shared files. A process may +place an advisory \fIread\fP or \fIwrite\fP lock on a file, +so that other cooperating processes may avoid interfering +with the process' access. This simple mechanism +provides locking with file granularity. More granular +locking can be built using the IPC facilities to provide a lock +manager. +The system does not force processes to obey the locks; +they are of an advisory nature only. +.PP +Locking is performed after an \fIopen\fP call by applying the +\fIflock\fP primitive, +.DS +flock(fd, how); +int fd, how; +.DE +where the \fIhow\fP parameter is formed from bits defined in \fI<sys/file.h>\fP: +.DS +._d +#define LOCK_SH 1 /* shared lock */ +#define LOCK_EX 2 /* exclusive lock */ +#define LOCK_NB 4 /* don't block when locking */ +#define LOCK_UN 8 /* unlock */ +.DE +Successive lock calls may be used to increase or +decrease the level of locking. If an object is currently +locked by another process when a \fIflock\fP call is made, +the caller will be blocked until the current lock owner +releases the lock; this may be avoided by including LOCK_NB +in the \fIhow\fP parameter. +Specifying LOCK_UN removes all locks associated with the descriptor. 
+Advisory locks held by a process are automatically deleted when +the process terminates. +.NH 3 +Disk quotas +.PP +As an optional facility, each file system may be requested to +impose limits on a user's disk usage. +Two quantities are limited: the total amount of disk space which +a user may allocate in a file system and the total number of files +a user may create in a file system. Quotas are expressed as +\fIhard\fP limits and \fIsoft\fP limits. A hard limit is +always imposed; if a user would exceed a hard limit, the operation +which caused the resource request will fail. A soft limit results +in the user receiving a warning message, but with allocation succeeding. +Facilities are provided to turn soft limits into hard limits if a +user has exceeded a soft limit for an unreasonable period of time. +.PP +To enable disk quotas on a file system the \fIsetquota\fP call +is used: +.DS +setquota(special, file) +char *special, *file; +.DE +where \fIspecial\fP refers to a structured device file where +a mounted file system exists, and +\fIfile\fP refers to a disk quota file (residing on the file +system associated with \fIspecial\fP) from which user quotas +should be obtained. The format of the disk quota file is +implementation dependent. +.PP +To manipulate disk quotas the \fIquota\fP call is provided: +.DS +#include <sys/quota.h> + +quota(cmd, uid, arg, addr) +int cmd, uid, arg; caddr_t addr; +.DE +The indicated \fIcmd\fP is applied to the user ID \fIuid\fP. +The parameters \fIarg\fP and \fIaddr\fP are command specific. +The file \fI<sys/quota.h>\fP contains definitions pertinent to the +use of this call. diff --git a/share/doc/psd/05.sysman/2.3.t b/share/doc/psd/05.sysman/2.3.t new file mode 100644 index 0000000..edf3e10 --- /dev/null +++ b/share/doc/psd/05.sysman/2.3.t @@ -0,0 +1,413 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)2.3.t 8.1 (Berkeley) 6/8/93 +.\" $FreeBSD$ +.\" +.sh "Interprocess communications +.NH 3 +Interprocess communication primitives +.NH 4 +Communication domains +.PP +The system provides access to an extensible set of +communication \fIdomains\fP. A communication domain +is identified by a manifest constant defined in the +file \fI<sys/socket.h>\fP. +Important standard domains supported by the system are the ``unix'' +domain, AF_UNIX, for communication within the system, the ``Internet'' +domain for communication in the DARPA Internet, AF_INET, +and the ``NS'' domain, AF_NS, for communication +using the Xerox Network Systems protocols. +Other domains can be added to the system. +.NH 4 +Socket types and protocols +.PP +Within a domain, communication takes place between communication endpoints +known as \fIsockets\fP. Each socket has the potential to exchange +information with other sockets of an appropriate type within the domain. +.PP +Each socket has an associated +abstract type, which describes the semantics of communication using that +socket. Properties such as reliability, ordering, and prevention +of duplication of messages are determined by the type. +The basic set of socket types is defined in \fI<sys/socket.h>\fP: +.DS +/* Standard socket types */ +._d +#define SOCK_STREAM 1 /* virtual circuit */ +#define SOCK_DGRAM 2 /* datagram */ +#define SOCK_RAW 3 /* raw socket */ +#define SOCK_RDM 4 /* reliably-delivered message */ +#define SOCK_SEQPACKET 5 /* sequenced packets */ +.DE +The SOCK_DGRAM type models the semantics of datagrams in network communication: +messages may be lost or duplicated and may arrive out-of-order. +A datagram socket may send messages to and receive messages from multiple +peers. +The SOCK_RDM type models the semantics of reliable datagrams: messages +arrive unduplicated and in-order, and the sender is notified if +messages are lost.
+The \fIsend\fP and \fIreceive\fP operations (described below) +generate reliable/unreliable datagrams. +The SOCK_STREAM type models connection-based virtual circuits: two-way +byte streams with no record boundaries. +Connection setup is required before data communication may begin. +The SOCK_SEQPACKET type models a connection-based, +full-duplex, reliable, sequenced packet exchange; +the sender is notified if messages are lost, and messages are never +duplicated or presented out-of-order. +Users of the last two abstractions may use the facilities for +out-of-band transmission to send out-of-band data. +.PP +SOCK_RAW is used for unprocessed access to internal network layers +and interfaces; it has no specific semantics. +.PP +Other socket types can be defined. +.PP +Each socket may have a specific \fIprotocol\fP associated with it. +This protocol is used within the domain to provide the semantics +required by the socket type. +Not all socket types are supported by each domain; +support depends on the existence and the implementation +of a suitable protocol within the domain. +For example, within the ``Internet'' domain, the SOCK_DGRAM type may be +implemented by the UDP user datagram protocol, and the SOCK_STREAM +type may be implemented by the TCP transmission control protocol, while +no standard protocols to provide SOCK_RDM or SOCK_SEQPACKET sockets exist. +.NH 4 +Socket creation, naming and service establishment +.PP +Sockets may be \fIconnected\fP or \fIunconnected\fP. An unconnected +socket descriptor is obtained by the \fIsocket\fP call: +.DS +s = socket(domain, type, protocol); +result int s; int domain, type, protocol; +.DE +The socket domain and type are as described above, +and are specified using the definitions from \fI<sys/socket.h>\fP. +The protocol may be given as 0, meaning any suitable protocol. +One of several possible protocols may be selected using identifiers +obtained from a library routine, \fIgetprotobyname\fP. 
+.PP +An unconnected socket descriptor of a connection-oriented type +may yield a connected socket descriptor +in one of two ways: either by actively connecting to another socket, +or by becoming associated with a name in the communications domain and +\fIaccepting\fP a connection from another socket. +Datagram sockets need not establish connections before use. +.PP +To accept connections or to receive datagrams, +a socket must first have a binding +to a name (or address) within the communications domain. +Such a binding may be established by a \fIbind\fP call: +.DS +bind(s, name, namelen); +int s; struct sockaddr *name; int namelen; +.DE +Datagram sockets may have default bindings established when first +sending data if not explicitly bound earlier. +In either case, +a socket's bound name may be retrieved with a \fIgetsockname\fP call: +.DS +getsockname(s, name, namelen); +int s; result struct sockaddr *name; result int *namelen; +.DE +while the peer's name can be retrieved with \fIgetpeername\fP: +.DS +getpeername(s, name, namelen); +int s; result struct sockaddr *name; result int *namelen; +.DE +Domains may support sockets with several names. +.NH 4 +Accepting connections +.PP +Once a binding is made to a connection-oriented socket, +it is possible to \fIlisten\fP for connections: +.DS +listen(s, backlog); +int s, backlog; +.DE +The \fIbacklog\fP specifies the maximum count of connections +that can be simultaneously queued awaiting acceptance. +.PP +An \fIaccept\fP call: +.DS +t = accept(s, name, anamelen); +result int t; int s; result struct sockaddr *name; result int *anamelen; +.DE +returns a descriptor for a new, connected, socket +from the queue of pending connections on \fIs\fP. +If no new connections are queued for acceptance, +the call will wait for a connection unless non-blocking I/O has been enabled. 
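The passive side of connection establishment (bind, listen, accept) can be sketched in C as follows. This is a minimal sketch assuming the Internet domain, the loopback address, and a system-chosen port discovered with getsockname; the helper names listen_on_loopback and accept_one are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a stream socket to the loopback address with
 * a system-chosen port, start listening, and report the chosen port in
 * host byte order through *port.  Returns the listening descriptor or -1.
 */
static int listen_on_loopback(int backlog, unsigned short *port)
{
    struct sockaddr_in name;
    socklen_t namelen = sizeof name;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    assert(port != NULL);
    if (s < 0)
        return -1;
    memset(&name, 0, sizeof name);
    name.sin_family = AF_INET;
    name.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    name.sin_port = 0;                      /* let the system pick a port */
    if (bind(s, (struct sockaddr *)&name, sizeof name) != 0 ||
        listen(s, backlog) != 0 ||
        getsockname(s, (struct sockaddr *)&name, &namelen) != 0) {
        close(s);
        return -1;
    }
    *port = ntohs(name.sin_port);
    return s;
}

/*
 * Hypothetical helper: take one pending connection off the queue of s.
 * Blocks until a connection arrives unless s is non-blocking.
 */
static int accept_one(int s)
{
    struct sockaddr_in peer;
    socklen_t peerlen = sizeof peer;

    return accept(s, (struct sockaddr *)&peer, &peerlen);
}
```

The descriptor returned by accept_one is a new connected socket; the listening socket stays open and can accept further connections.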
+.NH 4 +Making connections +.PP +An active connection to a named socket is made by the \fIconnect\fP call: +.DS +connect(s, name, namelen); +int s; struct sockaddr *name; int namelen; +.DE +Although datagram sockets do not establish connections, +the \fIconnect\fP call may be used with such sockets +to create an \fIassociation\fP with the foreign address. +The address is recorded for use in future \fIsend\fP calls, +which then need not supply destination addresses. +Datagrams will be received only from that peer, +and asynchronous error reports may be received. +.PP +It is also possible to create connected pairs of sockets without +using the domain's name space to rendezvous; this is done with the +\fIsocketpair\fP call\(dg: +.FS +\(dg 4.3BSD supports \fIsocketpair\fP creation only in the ``unix'' +communication domain. +.FE +.DS +socketpair(domain, type, protocol, sv); +int domain, type, protocol; result int sv[2]; +.DE +Here the returned \fIsv\fP descriptors correspond to those obtained with +\fIaccept\fP and \fIconnect\fP. +.PP +The call +.DS +pipe(pv) +result int pv[2]; +.DE +creates a pair of SOCK_STREAM sockets in the UNIX domain, +with pv[0] only readable and pv[1] only writable. +.NH 4 +Sending and receiving data +.PP +Messages may be sent from a socket by: +.DS +cc = sendto(s, buf, len, flags, to, tolen); +result int cc; int s; caddr_t buf; int len, flags; caddr_t to; int tolen; +.DE +if the socket is not connected or: +.DS +cc = send(s, buf, len, flags); +result int cc; int s; caddr_t buf; int len, flags; +.DE +if the socket is connected.
+The corresponding receive primitives are: +.DS +msglen = recvfrom(s, buf, len, flags, from, fromlenaddr); +result int msglen; int s; result caddr_t buf; int len, flags; +result caddr_t from; result int *fromlenaddr; +.DE +and +.DS +msglen = recv(s, buf, len, flags); +result int msglen; int s; result caddr_t buf; int len, flags; +.DE +.PP +In the unconnected case, +the parameters \fIto\fP and \fItolen\fP +specify the destination of the message, while +the \fIfrom\fP parameter stores the source of the message, +and \fI*fromlenaddr\fP initially gives the size of the \fIfrom\fP +buffer and is updated to reflect the true length of the \fIfrom\fP +address. +.PP +All calls cause the message to be received in or sent from +the message buffer of length \fIlen\fP bytes, starting at address \fIbuf\fP. +The \fIflags\fP specify +peeking at a message without reading it or sending or receiving +high-priority out-of-band messages, as follows: +.DS +._d +#define MSG_PEEK 0x1 /* peek at incoming message */ +#define MSG_OOB 0x2 /* process out-of-band data */ +.DE +.NH 4 +Scatter/gather and exchanging access rights +.PP +It is possible to scatter and gather data and to exchange access rights +with messages. When either of these operations is involved, +the number of parameters to the call becomes large.
+Thus the system defines a message header structure, in \fI<sys/socket.h>\fP, +which can be +used to conveniently contain the parameters to the calls: +.DS +.if t .ta .5i 1.25i 2i 2.7i +.if n ._f +struct msghdr { + caddr_t msg_name; /* optional address */ + int msg_namelen; /* size of address */ + struct iov *msg_iov; /* scatter/gather array */ + int msg_iovlen; /* # elements in msg_iov */ + caddr_t msg_accrights; /* access rights sent/received */ + int msg_accrightslen; /* size of msg_accrights */ +}; +.DE +Here \fImsg_name\fP and \fImsg_namelen\fP specify the source or destination +address if the socket is unconnected; \fImsg_name\fP may be given as +a null pointer if no names are desired or required. +The \fImsg_iov\fP and \fImsg_iovlen\fP describe the scatter/gather +locations, as described in section 2.1.3. +Access rights to be sent along with the message are specified +in \fImsg_accrights\fP, which has length \fImsg_accrightslen\fP. +In the ``unix'' domain these are an array of integer descriptors, +taken from the sending process and duplicated in the receiver. +.PP +This structure is used in the operations \fIsendmsg\fP and \fIrecvmsg\fP: +.DS +sendmsg(s, msg, flags); +int s; struct msghdr *msg; int flags; + +msglen = recvmsg(s, msg, flags); +result int msglen; int s; result struct msghdr *msg; int flags; +.DE +.NH 4 +Using read and write with sockets +.PP +The normal UNIX \fIread\fP and \fIwrite\fP calls may be +applied to connected sockets and translated into \fIsend\fP and \fIreceive\fP +calls from or to a single area of memory and discarding any rights +received. A process may operate on a virtual circuit socket, a terminal +or a file with blocking or non-blocking input/output +operations without distinguishing the descriptor type. 
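The message header and the sendmsg and recvmsg operations described above can be sketched in present-day C as follows. This is an illustration only: it assumes the modern POSIX struct msghdr, whose scatter/gather array is of type struct iovec and which carries rights in msg_control rather than in the msg_accrights field shown in the text. The function name scatter_gather_demo is hypothetical, and no access rights are passed.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Gather two separate buffers into one datagram with sendmsg, then
 * scatter that datagram into the caller's two buffers with recvmsg.
 * Returns the number of bytes received, or -1 on failure.
 */
ssize_t scatter_gather_demo(char *a, size_t alen, char *b, size_t blen)
{
    char part1[] = "hello, ";           /* 7 bytes of data */
    char part2[] = "world";             /* 5 bytes of data */
    struct iovec src[2], dst[2];
    struct msghdr msg;
    ssize_t n = -1;
    int sv[2];

    /* A connected datagram pair, so no names are needed in the header. */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* Gather: both pieces leave as a single 12-byte message. */
    src[0].iov_base = part1;
    src[0].iov_len = 7;
    src[1].iov_base = part2;
    src[1].iov_len = 5;
    memset(&msg, 0, sizeof(msg));       /* null name, no rights */
    msg.msg_iov = src;
    msg.msg_iovlen = 2;
    if (sendmsg(sv[0], &msg, 0) != 12)
        goto done;

    /* Scatter: the first buffer is filled before the second is used. */
    dst[0].iov_base = a;
    dst[0].iov_len = alen;
    dst[1].iov_base = b;
    dst[1].iov_len = blen;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = dst;
    msg.msg_iovlen = 2;
    n = recvmsg(sv[1], &msg, 0);

done:
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With a 7-byte first buffer, the greeting is split at the same point on receipt as it was on transmission; a differently sized first buffer would split it elsewhere, since the receiver's array is independent of the sender's.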
+.NH 4 +Shutting down halves of full-duplex connections +.PP +A process that has a full-duplex socket such as a virtual circuit +and no longer wishes to read from or write to this socket can +give the call: +.DS +shutdown(s, direction); +int s, direction; +.DE +where \fIdirection\fP is 0 to not read further, 1 to not +write further, or 2 to completely shut the connection down. +If the underlying protocol supports unidirectional or bidirectional shutdown, +this indication will be passed to the peer. +For example, a shutdown for writing might produce an end-of-file +condition at the remote end. +.NH 4 +Socket and protocol options +.PP +Sockets, and their underlying communication protocols, may +support \fIoptions\fP. These options may be used to manipulate +implementation- or protocol-specific facilities. +The \fIgetsockopt\fP +and \fIsetsockopt\fP calls are used to control options: +.DS +getsockopt(s, level, optname, optval, optlen) +int s, level, optname; result caddr_t optval; result int *optlen; + +setsockopt(s, level, optname, optval, optlen) +int s, level, optname; caddr_t optval; int optlen; +.DE +The option \fIoptname\fP is interpreted at the indicated +protocol \fIlevel\fP for socket \fIs\fP. If a value is specified +with \fIoptval\fP and \fIoptlen\fP, it is interpreted by +the software operating at the specified \fIlevel\fP. The \fIlevel\fP +SOL_SOCKET is reserved to indicate options maintained +by the socket facilities. Other \fIlevel\fP values indicate +a particular protocol which is to act on the option request; +these values are normally interpreted as a ``protocol number''. +.NH 3 +UNIX domain +.PP +This section describes briefly the properties of the UNIX communications +domain. +.NH 4 +Types of sockets +.PP +In the UNIX domain, +the SOCK_STREAM abstraction provides pipe-like +facilities, while SOCK_DGRAM provides (usually) +reliable message-style communications. 
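The shutdown and option calls described earlier in this section can be exercised on a UNIX-domain stream pair, as in the following sketch. It assumes the modern POSIX names SHUT_WR (the value 1 in the text) and SO_TYPE, neither of which appears in the original description, and the function name shutdown_demo is hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Query a socket-level option on a UNIX-domain stream pair, then shut
 * down one end for writing and observe end-of-file at the other end.
 * Returns 0 if everything behaves as described, -1 otherwise.
 */
int shutdown_demo(void)
{
    int sv[2], type = 0, rc = -1;
    socklen_t len = sizeof(type);
    char buf[8];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* SOL_SOCKET selects options maintained by the socket layer itself. */
    if (getsockopt(sv[0], SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        goto done;
    if (type != SOCK_STREAM)
        goto done;

    /* Send three bytes, then give up the writing half of sv[0]. */
    if (write(sv[0], "bye", 3) != 3)
        goto done;
    if (shutdown(sv[0], SHUT_WR) < 0)   /* direction 1: no more writes */
        goto done;

    /* The peer first drains the queued data, then sees end-of-file. */
    if (read(sv[1], buf, sizeof(buf)) != 3)
        goto done;
    if (read(sv[1], buf, sizeof(buf)) != 0)
        goto done;
    rc = 0;

done:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Data written before the shutdown is still delivered; only after it has been read does the peer see the end-of-file condition.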
+.NH 4 +Naming +.PP +Socket names are strings and may appear in the UNIX file +system name space through portals\(dg. +.FS +\(dg The 4.3BSD implementation of the UNIX domain embeds +bound sockets in the UNIX file system name space; +this may change in future releases. +.FE +.NH 4 +Access rights transmission +.PP +The ability to pass UNIX descriptors with messages in this domain +allows migration of service within the system and allows +user processes to be used in building system facilities. +.NH 3 +INTERNET domain +.PP +This section describes briefly how the Internet domain is +mapped to the model described in this section. More +information will be found in the document describing the +network implementation in 4.3BSD. +.NH 4 +Socket types and protocols +.PP +SOCK_STREAM is supported by the Internet TCP protocol; +SOCK_DGRAM by the UDP protocol. +Each is layered atop the transport-level Internet Protocol (IP). +The Internet Control Message Protocol is implemented atop/beside IP +and is accessible via a raw socket. +The SOCK_SEQPACKET +has no direct Internet family analogue; a protocol +based on one from the XEROX NS family and layered on +top of IP could be implemented to fill this gap. +.NH 4 +Socket naming +.PP +Sockets in the Internet domain have names composed of the 32 bit +Internet address, and a 16 bit port number. +Options may be used to +provide IP source routing or security options. +The 32-bit address is composed of network and host parts; +the network part is variable in size and is frequency encoded. +The host part may optionally be interpreted as a subnet field +plus the host on subnet; this is enabled by setting a network address +mask at boot time. +.NH 4 +Access rights transmission +.PP +No access rights transmission facilities are provided in the Internet domain. +.NH 4 +Raw access +.PP +The Internet domain allows the super-user access to the raw facilities +of IP. +These interfaces are modeled as SOCK_RAW sockets. 
+Each raw socket is associated with one IP protocol number, +and receives all traffic received for that protocol. +This allows administrative and debugging +functions to occur, +and enables user-level implementations of special-purpose protocols +such as inter-gateway routing protocols. diff --git a/share/doc/psd/05.sysman/2.4.t b/share/doc/psd/05.sysman/2.4.t new file mode 100644 index 0000000..cd7dcb9 --- /dev/null +++ b/share/doc/psd/05.sysman/2.4.t @@ -0,0 +1,174 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)2.4.t 8.1 (Berkeley) 6/8/93 +.\" +.sh "Terminals and Devices +.NH 3 +Terminals +.PP +Terminals support \fIread\fP and \fIwrite\fP I/O operations, +as well as a collection of terminal specific \fIioctl\fP operations, +to control input character interpretation and editing, +and output format and delays. +.NH 4 +Terminal input +.PP +Terminals are handled according to the underlying communication +characteristics such as baud rate and required delays, +and a set of software parameters. +.NH 5 +Input modes +.PP +A terminal is in one of three possible modes: \fIraw\fP, \fIcbreak\fP, +or \fIcooked\fP. +In raw mode all input is passed through to the +reading process immediately and without interpretation. +In cbreak mode, the handler interprets input only by looking +for characters that cause interrupts or output flow control; +all other characters are made available as in raw mode. +In cooked mode, input +is processed to provide standard line-oriented local editing functions, +and input is presented on a line-by-line basis. +.NH 5 +Interrupt characters +.PP +Interrupt characters are interpreted by the terminal handler only in +cbreak and cooked modes, and +cause a software interrupt to be sent to all processes in the process +group associated with the terminal. 
Interrupt characters exist +to send SIGINT +and SIGQUIT signals, +and to stop a process group +with the SIGTSTP signal either immediately, or when +all input up to the stop character has been read. +.NH 5 +Line editing +.PP +When the terminal is in cooked mode, editing of an input line +is performed. Editing facilities allow deletion of the previous +character or word, or deletion of the current input line. +In addition, a special character may be used to reprint the current +input line after some number of editing operations have been applied. +.PP +Certain other characters are interpreted specially when a process is +in cooked mode. The \fIend of line\fP character determines +the end of an input record. The \fIend of file\fP character simulates +an end of file occurrence on terminal input. Flow control is provided +by \fIstop output\fP and \fIstart output\fP control characters. Output +may be flushed with the \fIflush output\fP character; and a \fIliteral +character\fP may be used to force literal input of the immediately +following character in the input line. +.PP +Input characters may be echoed to the terminal as they are received. +Non-graphic ASCII input characters may be echoed as a two-character +printable representation, ``^character.'' +.NH 4 +Terminal output +.PP +On output, the terminal handler provides some simple formatting services. +These include converting the carriage return character to the +two character return-linefeed sequence, +inserting delays after certain standard control characters, +expanding tabs, and providing translations +for upper-case only terminals. +.NH 4 +Terminal control operations +.PP +When a terminal is first opened it is initialized to a standard +state and configured with a set of standard control, editing, +and interrupt characters. 
A process +may alter this configuration with certain +control operations, specifying parameters in a standard structure:\(dg +.FS +\(dg The control interface described here is an internal interface only +in 4.3BSD. Future releases will probably use a modified interface +based on currently-proposed standards. +.FE +.DS +._f +struct ttymode { + short tt_ispeed; /* input speed */ + int tt_iflags; /* input flags */ + short tt_ospeed; /* output speed */ + int tt_oflags; /* output flags */ +}; +.DE +and ``special characters'' are specified with the +\fIttychars\fP structure, +.DS +._f +struct ttychars { + char tc_erasec; /* erase char */ + char tc_killc; /* erase line */ + char tc_intrc; /* interrupt */ + char tc_quitc; /* quit */ + char tc_startc; /* start output */ + char tc_stopc; /* stop output */ + char tc_eofc; /* end-of-file */ + char tc_brkc; /* input delimiter (like nl) */ + char tc_suspc; /* stop process signal */ + char tc_dsuspc; /* delayed stop process signal */ + char tc_rprntc; /* reprint line */ + char tc_flushc; /* flush output (toggles) */ + char tc_werasc; /* word erase */ + char tc_lnextc; /* literal next character */ +}; +.DE +.NH 4 +Terminal hardware support +.PP +The terminal handler allows a user to access basic +hardware related functions; e.g. line speed, +modem control, parity, and stop bits. A special signal, +SIGHUP, is automatically +sent to processes in a terminal's process +group when a carrier transition is detected. This is +normally associated with a user hanging up on a modem +controlled terminal line. +.NH 3 +Structured devices +.PP +Structured devices are typified by disks and magnetic +tapes, but may represent any random-access device. +The system performs read-modify-write type buffering actions on block +devices to allow them to be read and written in a totally random +access fashion like ordinary files. +File systems are normally created in block devices.
+.NH 3 +Unstructured devices +.PP +Unstructured devices are those devices which +do not support block structure. Familiar unstructured devices +are raw communications lines (with +no terminal handler), raster plotters, magnetic tape and disks unfettered +by buffering and permitting large block input/output and positioning +and formatting commands. diff --git a/share/doc/psd/05.sysman/2.5.t b/share/doc/psd/05.sysman/2.5.t new file mode 100644 index 0000000..109eb6a --- /dev/null +++ b/share/doc/psd/05.sysman/2.5.t @@ -0,0 +1,39 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)2.5.t 8.1 (Berkeley) 6/8/93 +.\" +.sh "Process and kernel descriptors +.PP +The status of the facilities in this section is still under discussion. +The \fIptrace\fP facility of earlier UNIX systems +remains in 4.3BSD. +Planned enhancements would allow a descriptor-based process control facility. diff --git a/share/doc/psd/05.sysman/Makefile b/share/doc/psd/05.sysman/Makefile new file mode 100644 index 0000000..2c0ec7b --- /dev/null +++ b/share/doc/psd/05.sysman/Makefile @@ -0,0 +1,10 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= psd/05.sysman +SRCS= 0.t 1.0.t 1.1.t 1.2.t 1.3.t 1.4.t 1.5.t 1.6.t 1.7.t \ + 2.0.t 2.1.t 2.2.t 2.3.t 2.4.t 2.5.t a.t +MACROS= -ms +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/psd/05.sysman/a.t b/share/doc/psd/05.sysman/a.t new file mode 100644 index 0000000..dd9cfd9 --- /dev/null +++ b/share/doc/psd/05.sysman/a.t @@ -0,0 +1,235 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)a.t 8.1 (Berkeley) 6/8/93 +.\" +.ds RH Summary of facilities +.bp +.SH +\s+2I. Summary of facilities\s0 +.PP +.de h +.br +.if n .ne 8 +\fB\\$1 \\$2\fP +.br +.. +.nr H1 0 +.NH +Kernel primitives +.LP +.h 1.1. "Process naming and protection +.in +5 +.TS +lw(1.6i) aw(3i). 
+sethostid set UNIX host id +gethostid get UNIX host id +sethostname set UNIX host name +gethostname get UNIX host name +getpid get process id +fork create new process +exit terminate a process +execve execute a different process +getuid get user id +geteuid get effective user id +setreuid set real and effective user id's +getgid get accounting group id +getegid get effective accounting group id +getgroups get access group set +setregid set real and effective group id's +setgroups set access group set +getpgrp get process group +setpgrp set process group +.TE +.in -5 +.h 1.2 "Memory management +.in +5 +.TS +lw(1.6i) aw(3i). +<sys/mman.h> memory management definitions +sbrk change data section size +sstk\(dg change stack section size +getpagesize get memory page size +mmap\(dg map pages of memory +msync\(dg flush modified mapped pages to filesystem +munmap\(dg unmap memory +mprotect\(dg change protection of pages +madvise\(dg give memory management advice +mincore\(dg determine core residency of pages +msleep\(dg sleep on a lock +mwakeup\(dg wakeup process sleeping on a lock +.TE +.FS +\(dg Not supported in 4.3BSD. +.FE +.in -5 +.h 1.3 "Signals +.in +5 +.TS +lw(1.6i) aw(3i). +<signal.h> signal definitions +sigvec set handler for signal +kill send signal to process +killpgrp send signal to process group +sigblock block set of signals +sigsetmask restore set of blocked signals +sigpause wait for signals +sigstack set software stack for signals +.TE +.in -5 +.h 1.4 "Timing and statistics +.in +5 +.TS +lw(1.6i) aw(3i). +<sys/time.h> time-related definitions +gettimeofday get current time and timezone +settimeofday set current time and timezone +getitimer read an interval timer +setitimer get and set an interval timer +profil profile process +.TE +.in -5 +.h 1.5 "Descriptors +.in +5 +.TS +lw(1.6i) aw(3i). 
+getdtablesize descriptor reference table size +dup duplicate descriptor +dup2 duplicate to specified index +close close descriptor +select multiplex input/output +fcntl control descriptor options +wrap\(dg wrap descriptor with protocol +.TE +.FS +\(dg Not supported in 4.3BSD. +.FE +.in -5 +.h 1.6 "Resource controls +.in +5 +.TS +lw(1.6i) aw(3i). +<sys/resource.h> resource-related definitions +getpriority get process priority +setpriority set process priority +getrusage get resource usage +getrlimit get resource limitations +setrlimit set resource limitations +.TE +.in -5 +.h 1.7 "System operation support +.in +5 +.TS +lw(1.6i) aw(3i). +mount mount a device file system +swapon add a swap device +umount umount a file system +sync flush system caches +reboot reboot a machine +acct specify accounting file +.TE +.in -5 +.NH +System facilities +.LP +.h 2.1 "Generic operations +.in +5 +.TS +lw(1.6i) aw(3i). +read read data +write write data +<sys/uio.h> scatter-gather related definitions +readv scattered data input +writev gathered data output +<sys/ioctl.h> standard control operations +ioctl device control operation +.TE +.in -5 +.h 2.2 "File system +.PP +Operations marked with a * exist in two forms: as shown, +operating on a file name, and operating on a file descriptor, +when the name is preceded with a ``f''. +.in +5 +.TS +lw(1.6i) aw(3i). 
+<sys/file.h> file system definitions +chdir change directory +chroot change root directory +mkdir make a directory +rmdir remove a directory +open open a new or existing file +mknod make a special file +portal\(dg make a portal entry +unlink remove a link +stat* return status for a file +lstat returned status of link +chown* change owner +chmod* change mode +utimes change access/modify times +link make a hard link +symlink make a symbolic link +readlink read contents of symbolic link +rename change name of file +lseek reposition within file +truncate* truncate file +access determine accessibility +flock lock a file +.TE +.in -5 +.h 2.3 "Communications +.in +5 +.TS +lw(1.6i) aw(3i). +<sys/socket.h> standard definitions +socket create socket +bind bind socket to name +getsockname get socket name +listen allow queuing of connections +accept accept a connection +connect connect to peer socket +socketpair create pair of connected sockets +sendto send data to named socket +send send data to connected socket +recvfrom receive data on unconnected socket +recv receive data on connected socket +sendmsg send gathered data and/or rights +recvmsg receive scattered data and/or rights +shutdown partially close full-duplex connection +getsockopt get socket option +setsockopt set socket option +.TE +.in -5 +.h 2.4 "Terminals, block and character devices +.in +5 +.in -5 +.h 2.5 "Processes and kernel hooks +.in +5 diff --git a/share/doc/psd/05.sysman/spell.ok b/share/doc/psd/05.sysman/spell.ok new file mode 100644 index 0000000..b0cbd9c --- /dev/null +++ b/share/doc/psd/05.sysman/spell.ok @@ -0,0 +1,332 @@ +AF +ANON +AUTOBOOT +Behav +CLR +DEF +DGRAM +DONTNEED +Datagram +Datagrams +EINPROGRESS +EWOULDBLOCK +EXCL +FD +FSIZE +Fabry +GETFL +GETOWN +HASSEMAPHORE +HASSEMPHORE +IGN +INCR +INET +IP +IPC +ISSET +ITIMER +Karels +Leffler +MADV +MAXHOSTNAMELEN +MSG +Manual''PS1:6 +McKusick +Mclear +Mset +NB +NDELAY +NGROUPS +NLIMITS +NOEXTEND +NS +OOB +PGRP +PRIO +PROT +PS1:6 +RB +RDM +RDONLY 
+RDWR +RH +RLIM +RLIMIT +RSS +RUSAGE +SEQPACKET +SETFL +SETOWN +SIG +SIGALRM +SIGBUS +SIGCHLD +SIGCONT +SIGEMT +SIGFPE +SIGHUP +SIGILL +SIGINT +SIGIO +SIGIOT +SIGKILL +SIGPROF +SIGQUIT +SIGSEGV +SIGSTOP +SIGTERM +SIGTRAP +SIGTSTP +SIGTTIN +SIGTTOU +SIGURG +SIGUSR1 +SIGUSR2 +SIGVTALRM +SIGXCPU +SIGXFSZ +Sem +Sv +TCP +TRUNC +UDP +VAX +WILLNEED +WRONLY +XTND +accessor +accrights +accrightslen +addr +anamelen +arg +argv +arusage +astatus +behav +blkdev +brkc +bu +buf +buflen +bufsize +caddr +cbreak +chroot +cmd +datagram +datagrams +dev +dopt +dprop +ds +dst +dsttime +dsuspc +dtype +dup2 +egid +envp +eofc +erasec +errno +euid +fchmod +fchown +fcntl +fd +fdset +file.h +filename +filesystem +flushc +fromlenaddr +fs +fstat +ftruncate +getdtablesize +getegid +geteuid +getgid +getgroups +gethostid +gethostname +getitimer +getpagesize +getpeername +getpriority +getprotobyname +getrlimit +getrusage +getsockname +getsockopt +gettimeofday +gid +gidset +gidsetsize +hostid +idrss +iflags +inblock +incr +intrc +ioctl.h +iov +iovec +iovlen +ispeed +isrss +itimerval +ixrss +kbytes +killc +killpgrp +len +linefeed +lnextc +lstat +maddr +madvise +majflt +maxrss +mclear +mincore +minflt +minuteswest +mman.h +mmap +mprotect +mremap +mset +msg +msghdr +msglen +msgrcv +msgsnd +msleep +msync +munmap +mwakeup +namelen +nbytes +nd +nds +newname +ngroups +nivcsw +nl +nsignals +nswap +nvcsw +oflag +oflags +oldmask +oldname +oldoffset +onstack +optlen +optname +optval +or'ed +or'ing +ospeed +oss +osv +oublock +ovalue +pagesize +param +param.h +path1 +path2 +pathname +pathnames +pgrp +pid +pos +prio +prot +proto +pv +quitc +quota.h +readlink +readv +reboot.h +recv +recvfrom +recvmsg +resource.h +rgid +rlim +rlimit +rlp +ronly +rprntc +ru +ruid +rusage +sbrk +scp +sem +sendmsg +sendto +setgroups +sethostid +sethostname +setitimer +setpriority +setquota +setregid +setreuid +setrlimit +setsockopt +settimeofday +sigblock +sigcontext +sigmask +signal.h +signo +sigpause +sigsetmask +sigstack +sigvec 
+sockaddr +socket.h +socketpair +socktype +sp +ss +sstk +startc +stat.h +stb +stopc +suspc +sv +sw +symlink +ta +time.h +timeval +timezone +tolen +tt +ttychars +ttymode +tv +tvp +tvsec +types.h +tz +tzp +uid +uio.h +umount +usec +vec +wait.h +waitstatus +werasc +writeable +writev diff --git a/share/doc/psd/06.Clang/Clang.ms b/share/doc/psd/06.Clang/Clang.ms new file mode 100644 index 0000000..6395913 --- /dev/null +++ b/share/doc/psd/06.Clang/Clang.ms @@ -0,0 +1,4575 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)Clang.ms 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.nr Cl 2 +.TL +The C Programming Language - Reference Manual +.AU +Dennis M. Ritchie +.AI +AT&T Bell Laboratories +Murray Hill, NJ 07974 +.PP +This manual is a reprint, with updates to the current C standard, from +\fIThe C Programming Language\fR, +by Brian W. Kernighan and Dennis M. Ritchie, Prentice-Hall, Inc., 1978. +.PP +\fBThis document is of historical interest only. Do not use it as a reference +for modern implementations of C.\fP +.EH 'PSD:6-%''The C Programming Language - Reference Manual' +.OH 'The C Programming Language - Reference Manual''PSD:6-%' +.NH 1 +Introduction +.PP +This manual describes the C language on the DEC PDP-11\(dg, the DEC VAX-11, +.FS +.LP +\(dg DEC PDP-11, and DEC VAX-11 are trademarks of Digital Equipment Corporation. +.LP +\(dd 3B 20 is a trademark of AT&T. +.FE +and the AT&T 3B 20\(dd. +Where differences exist, it concentrates on the VAX, but tries to point +out implementation-dependent details. With few exceptions, these dependencies +follow directly from the underlying properties of the hardware; the various +compilers are generally quite compatible. +.NH 1 +Lexical Conventions +.PP +There are six classes of tokens\ -\ +identifiers, keywords, constants, strings, operators, and other separators. +Blanks, tabs, new\(hylines, +and comments (collectively, ``white space'') as described below +are ignored except as they serve to separate +tokens. 
+Some white space is required to separate +otherwise adjacent identifiers, +keywords, and constants. +.PP +If the input stream has been parsed into tokens +up to a given character, the next token is taken +to include the longest string of characters +which could possibly constitute a token. +.NH 2 +Comments +.PP +The characters +.B +/* +.R +introduce a comment which terminates +with the characters +\fB\(**/\fR. +Comments do not nest. +.NH 2 +Identifiers (Names) +.PP +An identifier is a sequence of letters and digits. +The first character must be a letter. +The underscore +(\fB_\fR) +counts as a letter. +Uppercase and lowercase letters +are different. +Although there is no limit on the length of a name, +only initial characters are significant: at least +eight characters of a non-external name, and perhaps +fewer for external names. +Moreover, some implementations may collapse case +distinctions for external names. +The external name sizes include: +.DS +.TS +l l. +PDP-11 7 characters, 2 cases +VAX-11 >100 characters, 2 cases +AT&T 3B 20 >100 characters, 2 cases +.TE +.fi +.DE +.NH 2 +Keywords +.PP +The following identifiers are reserved for use +as keywords and may not be used otherwise: +.DS +.ta 0.8i 1.6i 2.4i 3.2i 4.0i +\fBauto do for return typedef +break double goto short union +case else if sizeof unsigned +char enum int static void +continue extern long struct while +default float register switch\fR +.ta 0.5i +.DE +.PP +Some implementations also reserve the words +.B +fortran, asm, gfloat, hfloat +.R +and +.B quad +.R +.NH 2 +Constants +.PP +There are several kinds +of constants. +Each has a type; an introduction to types is given in ``NAMES.'' +Hardware characteristics that affect sizes are summarized in +``Hardware Characteristics'' under ``LEXICAL CONVENTIONS.'' +.NH 3 +Integer Constants +.br +.PP +An integer constant consisting of a sequence of digits +is taken +to be octal if it begins with +.B +0 +.R +(digit zero).
+An octal constant consists of the digits \fB0\fR through \fB7\fR only. +A sequence of digits preceded by +.B +0x +.R +or +.B +0X +.R +(digit zero) is taken to be a hexadecimal integer. +The hexadecimal digits include +.B +a +.R +or +.B +A +.R +through +.B +f +.R +or +.B +F +.R +with values 10 through 15. +Otherwise, the integer constant is taken to be decimal. +A decimal constant whose value exceeds the largest +signed machine integer is taken to be +\fBlong\fR; +an octal or hex constant which exceeds the largest unsigned machine integer +is likewise taken to be +.B +long\fR. +.R +Otherwise, integer constants are \fBint\fR. +.NH 3 +Explicit Long Constants +.br +.PP +A decimal, octal, or hexadecimal integer constant immediately followed +by +.B +l +.R +(letter ell) +or +.B +L +.R +is a long constant. +As discussed below, +on some machines +integer and long values may be considered identical. +.NH 3 +Character Constants +.br +.PP +A character constant is a character enclosed in single quotes, +as in '\fBx\fR'. +The value of a character constant is the numerical value of the +character in the machine's character set. +.PP +Certain nongraphic characters, +the single quote +(\fB'\fR) +and the backslash +(\fB\e\fR), +may be represented according to the following table +of escape sequences: +.DS +.TS +l l l. +new\(hyline NL (LF) \en +horizontal tab HT \et +vertical tab VT \ev +backspace BS \eb +carriage return CR \er +form feed FF \ef +backslash \e \e\e +single quote ' \e' +bit pattern \fIddd\fR\^ \e\fIddd\fR\^ +.TE +.DE +.PP +The escape +\e\fIddd\fR +consists of the backslash followed by 1, 2, or 3 octal digits +which are taken to specify the value of the +desired character. +A special case of this construction is +.B +\e0 +.R +(not followed +by a digit), which indicates the character +.B +NUL\fR. +.R +If the character following a backslash is not one +of those specified, the +behavior is undefined. +A new-line character is illegal in a character constant. 
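As an illustration of the constant forms described above, the following sketch in present-day C shows octal, hexadecimal, and explicit long integer constants together with two escape sequences. The function names are hypothetical and exist only so that each value can be checked; the remark that octal 101 is the letter A assumes an ASCII character set.

```c
#include <assert.h>

/* Hypothetical helpers: each returns one constant so its value can be checked. */

int octal_constant(void) { return 017; }     /* leading 0: octal, decimal 15 */
int hex_constant(void)   { return 0x1F; }    /* 0x prefix: hexadecimal, decimal 31 */
long long_constant(void) { return 10L; }     /* L suffix: explicit long constant */
int octal_escape(void)   { return '\101'; }  /* \ddd escape: octal 101 is decimal 65 ('A' in ASCII) */
int nul_escape(void)     { return '\0'; }    /* \0 denotes the NUL character */
```

Writing a leading zero by accident is a common source of surprise: 017 and 17 are different values.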
+The type of a character constant is \fBint\fR. +.NH 3 +Floating Constants +.br +.PP +A floating constant consists of +an integer part, a decimal point, a fraction part, +an +.B +e +.R +or +\fBE\fR, +and an optionally signed integer exponent. +The integer and fraction parts both consist of a sequence +of digits. +Either the integer part or the fraction +part (not both) may be missing. +Either the decimal point or +the +.B +e +.R +and the exponent (not both) may be missing. +Every floating constant has type \fBdouble\fR. +.NH 3 +Enumeration Constants +.br +.PP +Names declared as enumerators +(see ``Structure, Union, and Enumeration Declarations'' under +``DECLARATIONS'') +have type \fBint\fR. +.NH 2 +Strings +.PP +A string is a sequence of characters surrounded by +double quotes, +as in +\fB"..."\fR. +A string has type +``array of \fBchar\fR'' and storage class +\fBstatic\fR +(see ``NAMES'') +and is initialized with +the given characters. +The compiler places +a null byte +(\fB\e0\fR) +at the end of each string so that programs +which scan the string can +find its end. +In a string, the double quote character +(\fB"\fR) +must be preceded by +a +\fB\e\fR; +in addition, the same escapes as described for character +constants may be used. +.PP +A +.B +\e +.R +and +the immediately following new\(hyline are ignored. +All strings, even when written identically, are distinct. +.NH 2 +Hardware Characteristics +.PP +The following figure summarizes +certain hardware properties that vary from machine to machine. +.DS +.TS +center box; +c cfB s cfB s cfB s +c c s c s c s +l | l1 lp8 | l1 lp8 | l1 lp8.
+ DEC PDP\-11 DEC VAX-11 AT&T 3B + (ASCII) (ASCII) (ASCII) +.sp +_ +char 8 bits 8 bits 8 bits +int 16 32 32 +short 16 16 16 +long 32 32 32 +float 32 32 32 +double 64 64 64 +float range \(+-10 \(+-38 \(+-10 \(+-38 \(+-10 \(+-38 +\^ \^ \^ \^ +double range \(+-10 \(+-38 \(+-10 \(+-38 \(+-10 \(+-308 +\^ \^ \^ \^ +.TE +.\" .FG 4 4 1 "DEC PDP-11 HARDWARE CHARACTERISTICS" +.DE +.PP +.NH 1 +Syntax Notation +.PP +Syntactic categories are indicated by +.I +italic +.R +type +and literal words and characters +in +\fBbold\fR +type. +Alternative categories are listed on separate lines. +An optional terminal or nonterminal symbol is +indicated by the subscript ``opt,'' so that +.DS +{ \fIexpression\v'0.5'\s-2opt\s0\v'-0.5'\fR } +.DE +.LP +indicates an optional expression enclosed in braces. +The syntax is summarized in ``SYNTAX SUMMARY''. +.NH 1 +Names +.PP +The C language bases the interpretation of an +identifier upon two attributes of the identifier \(mi its +.I +storage class +.R +and its +.I +type\fR. +The storage class determines the location and lifetime +of the storage associated with an identifier; +the type determines +the meaning of the values +found in the identifier's storage. +.NH 2 +Storage Class +.PP +.\" The original text had borrowed BL, LI and LE from the mm macros. +.\" That way madness lies. +There are four declarable storage classes: +.RS +.br +\(bu Automatic +.br +\(bu Static +.br +\(bu External +.br +\(bu Register. +.RE +.PP +Automatic variables are local to each invocation of +a block (see ``Compound Statement or Block'' in +``STATEMENTS'') and are discarded upon exit from the block. +Static variables are local to a block but retain +their values upon reentry to a block even after control +has left the block. +External variables exist and retain their values throughout +the execution of the entire program and +may be used for communication between +functions, even separately compiled functions.
+Register variables are (if possible) stored in the fast registers +of the machine; like automatic +variables, they are local to each block and disappear on exit from the block. +.NH 2 +Type +.PP +The C language supports several +fundamental +types of objects. +Objects declared as characters +(\fBchar\fR) +are large enough to store any member of the implementation's +character set. +If a genuine character from that character set is +stored in a \fBchar\fR variable, +its value is equivalent to the integer code for that character. +Other quantities may be stored into character variables, but +the implementation is machine dependent. +In particular, \fBchar\fR may be signed or unsigned by default. +.PP +Up to three sizes of integer, declared +.B +short +.R +\fBint\fR, +\fBint\fR, +and +.B +long +.R +\fBint\fR, +are available. +Longer integers provide no less storage than shorter ones, +but the implementation may make either short integers or long integers, +or both, equivalent to plain integers. +``Plain'' integers have the natural size suggested +by the host machine architecture. +The other sizes are provided to meet special needs. +.PP +The properties of \fBenum\fR types (see ``Structure, Union, and Enumeration Declarations'' +under ``DECLARATIONS'') +are identical to those of +some integer types. +The implementation may use the range of values to +determine how to allocate storage. +.PP +Unsigned +integers, declared +.B +unsigned, +.R +obey the laws of arithmetic modulo +2\v'-0.5'\fIn\fR\v'0.5' +where \fIn\fR is the number of bits in the representation. +(On the +PDP-11, +unsigned long quantities are not supported.) +.PP +Single-precision floating point +(\fBfloat\fR) +and double precision floating point +(\fBdouble\fR) +may be synonymous in some implementations. +.PP +Because objects of the foregoing types can usefully be interpreted +as numbers, they will be referred to as +.I +arithmetic +.R +types. 
+\fBChar\fR, +.B +int +.R +of all sizes whether \fBunsigned\fR or not, and +.B +enum +.R +will collectively be called +.I +integral +.R +types. +The +.B +float +.R +and +.B +double +.R +types will collectively be called +.I +floating +.R +types. +.PP +The +.B +void +.R +type +specifies an empty set of values. +It is used as the type returned by functions that +generate no value. +.PP +Besides the fundamental arithmetic types, there is a +conceptually infinite class of derived types constructed +from the fundamental types in the following ways: +.IP \fIArrays\fR +of objects of most types +.IP \fIFunctions\fR +which return objects of a given type +.IP \fIPointers\fR +to objects of a given type +.IP \fIStructures\fR +containing a sequence of objects of various types +.IP \fIUnions\fR +capable of containing any one of several objects of various types. +.LP +In general these methods +of constructing objects can +be applied recursively. +.NH 1 +Objects and Lvalues +.PP +An +.I +object +.R +is a manipulatable region of storage. +An +.I +lvalue +.R +is an expression referring to an object. +An obvious example of an lvalue +expression is an identifier. +There are operators which yield lvalues: +for example, +if +.B +E +.R +is an expression of pointer type, then +.B +\(**E +.R +is an lvalue +expression referring to the object to which +.B +E +.R +points. +The name ``lvalue'' comes from the assignment expression +.B +E1\ =\ E2 +.R +in which the left operand +.B +E1 +.R +must be +an lvalue expression. +The discussion of each operator +below indicates whether it expects lvalue operands and whether it +yields an lvalue. +.NH 1 +Conversions +.PP +A number of operators may, depending on their operands, +cause conversion of the value of an operand from one type to another. +This part explains the result to be expected from such +conversions. 
+The conversions demanded by most ordinary operators are summarized under +``Arithmetic Conversions.'' +The summary will be supplemented +as required by the discussion +of each operator. +.NH 2 +Characters and Integers +.PP +A character or a short integer may be used wherever an +integer may be used. +In all cases +the value is converted to an integer. +Conversion of a shorter integer +to a longer preserves sign. +Whether or not sign-extension occurs for characters is machine +dependent, but it is guaranteed that a member of the +standard character set is non-negative. +Of the machines treated here, +only the +PDP-11 +and +VAX-11 +sign-extend. +On these machines, +.B +char +.R +variables range in value from +\(mi128 to 127. +The more explicit type +.B +unsigned +.R +.B +char +.R +forces the values to range from 0 to 255. +.PP +On machines that treat characters as signed, +the characters of the +ASCII +set are all non-negative. +However, a character constant specified +with an octal escape suffers sign extension +and may appear negative; +for example, +\fB\'\e377\'\fR +\fRhas the value +.B +\(mi1\fR. +.PP +When a longer integer is converted to a shorter +integer +or to a +.B +char, +.R +it is truncated on the left. +Excess bits are simply discarded. +.NH 2 +Float and Double +.PP +All floating arithmetic in C is carried out in double precision. +Whenever a +.B +float +.R +appears in an expression it is lengthened to +.B +double +.R +by zero padding its fraction. +When a +.B +double +.R +must be +converted to +\fBfloat\fR, +for example by an assignment, +the +.B +double +.R +is rounded before +truncation to +.B +float +.R +length. +This result is undefined if it cannot be represented as a float. +On the VAX, the compiler can be directed to use single precision for expressions +containing only float and integer operands. +.NH 2 +Floating and Integral +.PP +Conversions of floating values to integral type +are rather machine dependent. 
+In particular, the direction of truncation of negative numbers +varies. +The result is undefined if +it will not fit in the space provided. +.PP +Conversions of integral values to floating type +are well behaved. +Some loss of accuracy occurs +if the destination lacks sufficient bits. +.NH 2 +Pointers and Integers +.PP +An expression of integral type may be added to or subtracted from +a pointer; in such a case, +the first is converted as +specified in the discussion of the addition operator. +Two pointers to objects of the same type may be subtracted; +in this case, the result is converted to an integer +as specified in the discussion of the subtraction +operator. +.NH 2 +Unsigned +.PP +Whenever an unsigned integer and a plain integer +are combined, the plain integer is converted to unsigned +and the result is unsigned. +The value +is the least unsigned integer congruent to the signed +integer (modulo 2\v'-0.3'\s-2wordsize\s+2\v'0.3'). +In a 2's complement representation, +this conversion is conceptual; and there is no actual change in the +bit pattern. +.PP +When an unsigned \fBshort\fR integer is converted to +\fBlong\fR, +the value of the result is the same numerically as that of the +unsigned integer. +Thus the conversion amounts to padding with zeros on the left. +.NH 2 +Arithmetic Conversions +.PP +A great many operators cause conversions +and yield result types in a similar way. +This pattern will be called the ``usual arithmetic conversions.'' +.IP 1. +First, any operands of type +.B +char +.R +or +.B +short +.R +are converted to +\fBint\fR, +and any operands of type \fBunsigned char\fR +or \fBunsigned short\fR are converted +to \fBunsigned int\fR. +.IP 2. +Then, if either operand is +.B +double, +.R +the other is converted to +.B +double +.R +and that is the type of the result. +.IP 3. +Otherwise, if either operand is \fBunsigned long\fR, +the other is converted to \fBunsigned long\fR and that +is the type of the result. +.IP 4. 
+Otherwise, if either operand is +\fBlong\fR, +the other is converted to +.B +long +.R +and that is the type of the result. +.IP 5. +Otherwise, if one operand is \fBlong\fR, and +the other is \fBunsigned int\fR, they are both +converted to \fBunsigned long\fR and that is +the type of the result. +.IP 6. +Otherwise, if either operand is +.B +unsigned, +.R +the other is converted to +.B +unsigned +.R +and that is the type of the result. +.IP 7. +Otherwise, both operands must be +\fBint\fR, +and that is the type of the result. +.LP +.NH 2 +Void +.PP +The (nonexistent) value of a +.B +void +.R +object may not be used in any way, +and neither explicit nor implicit conversion may be applied. +Because a void expression denotes a nonexistent value, +such an expression may be used only +as an expression statement +(see ``Expression Statement'' under ``STATEMENTS'') +or as the left operand +of a comma expression (see ``Comma Operator'' under ``EXPRESSIONS''). +.PP +An expression may be converted to +type +.B +void +.R +by use of a cast. +For example, this makes explicit the discarding of the value +of a function call used as an expression statement. +.NH 1 +Expressions +.PP +The precedence of expression operators is the same +as the order of the major +subsections of this section, highest precedence first. +Thus, for example, the expressions referred to as the operands of +.B +\(pl +.R +(see ``Additive Operators'') +are those expressions defined under ``Primary Expressions'', +``Unary Operators'', and ``Multiplicative Operators''. +Within each subpart, the operators have the same +precedence. +Left- or right-associativity is specified +in each subsection for the operators +discussed therein. +The precedence and associativity of all the expression +operators are summarized in the +grammar of ``SYNTAX SUMMARY''. +.PP +Otherwise, the order of evaluation of expressions +is undefined. 
In particular, the compiler +considers itself free to +compute subexpressions in the order it believes +most efficient +even if the subexpressions +involve side effects. +The order in which subexpression evaluation takes place is unspecified. +Expressions involving a commutative and associative +operator +(\fB\(**,\fR +\fB\(pl\fR, +\fB&\fR, +\fB|\fR, +\fB^\fR) +may be rearranged arbitrarily even in the presence +of parentheses; +to force a particular order of evaluation, +an explicit temporary must be used. +.PP +The handling of overflow and divide check +in expression evaluation +is undefined. +Most existing implementations of C ignore integer overflows; +treatment of +division by 0 and all floating-point exceptions +varies between machines and is usually +adjustable by a library function. +.NH 2 +Primary Expressions +.PP +Primary expressions +involving \fB\.\fR, +\fB\(mi>\fR, +subscripting, and function calls +group left to right. +.DS +\fIprimary-expression: + identifier + constant + string + ( expression ) + primary-expression [ expression ] + primary-expression ( expression-list\v'0.5'\s-2opt\s0\v'-0.5' ) + primary-expression . identifier + primary-expression \(mi> identifier\fR +.DE +.DS +\fIexpression-list: + expression + expression-list , expression\fR +.DE +.PP +An identifier is a primary expression provided it has been +suitably declared as discussed below. +Its type is specified by its declaration. +If the type of the identifier is ``array of .\|.\|.'', +then the value of the identifier expression +is a pointer +to the first object in the array; and the +type of the expression is +``pointer to .\|.\|.''. +Moreover, an array identifier is not an lvalue +expression. +Likewise, an identifier which is declared +``function returning .\|.\|.'', +when used except in the function-name position +of a call, is converted to ``pointer to function returning .\|.\|.''. +.PP +A +constant is a primary expression. 
+Its type may be +\fBint\fR, +\fBlong\fR, +or +.B +double +.R +depending on its form. +Character constants have type +.B +int +.R +and floating constants have type +.B +double\fR. +.R +.PP +A string is a primary expression. +Its type is originally ``array of +\fBchar\fR'', +but following +the same rule given above for identifiers, +this is modified to ``pointer to +\fBchar\fR'' and +the +result is a pointer to the first character +in the string. +(There is an exception in certain initializers; +see ``Initialization'' under ``DECLARATIONS.'') +.PP +A parenthesized expression is a primary expression +whose type and value are identical +to those of the unadorned expression. +The presence of parentheses does +not affect whether the expression is an +lvalue. +.PP +A primary expression followed by an expression in square +brackets is a primary expression. +The intuitive meaning is that of a subscript. +Usually, the primary expression has type ``pointer to .\|.\|.'', +the subscript expression is +\fBint\fR, +and the type of the result is ``\|.\|.\|.\|''. +The expression +.B +E1[E2] +.R +is +identical (by definition) to +.B +\(**((E1)\(pl(E2))\fR. +All the clues +needed to understand +this notation are contained in this subpart together +with the discussions +in ``Unary Operators'' and ``Additive Operators'' on identifiers, +.B +\(** +.R +and +.B +\(pl +.R +respectively. +The implications are summarized under ``Arrays, Pointers, and Subscripting'' +under ``TYPES REVISITED.'' +.PP +A function call is a primary expression followed by parentheses +containing a possibly +empty, comma-separated list of expressions +which constitute the actual arguments to the +function. +The primary expression must be of type ``function returning .\|.\|.,'' +and the result of the function call is of type ``\|.\|.\|.\|''.
+As indicated +below, a hitherto unseen identifier followed +immediately by a left parenthesis +is contextually declared +to represent a function returning +an integer; +thus in the most common case, integer-valued functions +need not be declared. +.PP +Any actual arguments of type +.B +float +.R +are +converted to +.B +double +.R +before the call. +Any of type +.B +char +.R +or +.B +short +.R +are converted to +.B +int\fR. +.R +Array names are converted to pointers. +No other conversions are performed automatically; +in particular, the compiler does not compare +the types of actual arguments with those of formal +arguments. +If conversion is needed, use a cast; +see ``Unary Operators'' and ``Type Names'' under +``DECLARATIONS.'' +.PP +In preparing for the call to a function, +a copy is made of each actual parameter. +Thus, all argument passing in C is strictly by value. +A function may +change the values of its formal parameters, but +these changes cannot affect the values +of the actual parameters. +It is possible +to pass a pointer on the understanding +that the function may change the value +of the object to which the pointer points. +An array name is a pointer expression. +The order of evaluation of arguments is undefined by the language; +take note that the various compilers differ. +Recursive calls to any +function are permitted. +.PP +A primary expression followed by a dot followed by an identifier +is an expression. +The first expression must be a structure or a union, and the identifier +must name a member of the structure or union. +The value is the named member of the structure or union, and it is +an lvalue if the first expression is an lvalue. +.PP +A primary expression followed by an arrow (built from +.B +\(mi +.R +and +.B +> +.R +) +followed by an identifier +is an expression. +The first expression must be a pointer to a structure or a union +and the identifier must name a member of that structure or union. 
+The result is an lvalue referring to the named member +of the structure or union +to which the pointer expression points. +Thus the expression +.B +E1\(mi>MOS +.R +is the same as +.B +(\(**E1).MOS\fR. +.R +Structures and unions are discussed in +``Structure, Union, and Enumeration Declarations'' under +``DECLARATIONS.'' +.NH 2 +Unary Operators +.PP +Expressions with unary operators +group right to left. +.tr ~~ +.DS +\fIunary-expression: + \(** expression + & lvalue + \(mi expression + ! expression + \s+2~\s0 expression + \(pl\(pl lvalue + \(mi\(milvalue + lvalue \(pl\(pl + lvalue \(mi\(mi + ( type-name ) expression\fR + sizeof\fI expression\fR + sizeof\fI ( type-name )\fR +.DE +.PP +The unary +.B +\(** +.R +operator +means +.I +indirection +.R +; +the expression must be a pointer, and the result +is an lvalue referring to the object to +which the expression points. +If the type of the expression is ``pointer to .\|.\|.,'' +the type of the result is ``\|.\|.\|.\|''. +.PP +The result of the unary +.B +& +.R +operator is a pointer +to the object referred to by the +lvalue. +If the type of the lvalue is ``\|.\|.\|.\|'', +the type of the result is ``pointer to .\|.\|.''. +.PP +The result +of the unary +.B +\(mi +.R +operator +is the negative of its operand. +The usual arithmetic conversions are performed. +The negative of an unsigned quantity is computed by +subtracting its value from +2\v'-0.5'\fIn\fR\^\v'0.5' where \fIn\fR\^ is the number of bits in +the corresponding signed type. +.sp +.tr ~~ +There is no unary +.B +\(pl +.R +operator. +.PP +The result of the logical negation operator +.B +! +.R +is one if the value of its operand is zero, zero if the value of its +operand is nonzero. +The type of the result is +.B +int\fR. +.R +It is applicable to any arithmetic type +or to pointers. +.PP +The +.B +\s+2~\s0 +.R +operator yields the one's complement of its operand. +The usual arithmetic conversions are performed. +The type of the operand must be integral. 
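The unary operators described so far can be sketched in present-day C as follows. This is a minimal illustration, not part of the original manual: the function names are hypothetical, and the code relies only on the rules stated above (the result of & is a pointer, the result of \(** is an lvalue, negation of an unsigned quantity wraps modulo 2 to the power n, ! yields 0 or 1, and ~ yields the one's complement).

```c
#include <assert.h>

/* & yields a pointer to the object; *p is an lvalue referring to it. */
int store_through_pointer(int value)
{
    int object = 0;
    int *p = &object;   /* type "pointer to int" from an int lvalue */
    *p = value;         /* assignment through the indirection */
    return object;
}

/* Negating an unsigned quantity subtracts it from 2^n (wraps around). */
unsigned negate_unsigned(unsigned u) { return -u; }

/* ! gives 1 for a zero operand and 0 for any nonzero operand. */
int logical_not(int x) { return !x; }

/* ~ inverts every bit of its integral operand. */
unsigned ones_complement(unsigned u) { return ~u; }
```

For example, negating the unsigned value 1 gives the largest unsigned value, which is also the one's complement of 0.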
+.PP +The object referred to by the lvalue operand of prefix +.B +\(pl\(pl +.R +is incremented. +The value is the new value of the operand +but is not an lvalue. +The expression +.B +\(pl\(plx +.R +is equivalent to +\fBx=x\(pl1\fR. +See the discussions ``Additive Operators'' and ``Assignment +Operators'' for information on conversions. +.PP +The lvalue operand of prefix +.B +\(mi\(mi +.R +is decremented +analogously to the +prefix +.B +\(pl\(pl +.R +operator. +.PP +When postfix +.B +\(pl\(pl +.R +is applied to an lvalue, +the result is the value of the object referred to by the lvalue. +After the result is noted, the object +is incremented in the same +manner as for the prefix +.B +\(pl\(pl +.R +operator. +The type of the result is the same as the type of the lvalue expression. +.PP +When postfix +.B +\(mi\(mi +.R +is applied to an lvalue, +the result is the value of the object referred to by the lvalue. +After the result is noted, the object +is decremented in the same manner as for the prefix +.B +\(mi\(mi +.R +operator. +The type of the result is the same as the type of the lvalue +expression. +.PP +An expression preceded by the parenthesized name of a data type +causes conversion of the value of the expression to the named type. +This construction is called a +.I +cast\fR. +.R +Type names are described in ``Type Names'' under ``DECLARATIONS.'' +.PP +The +.B +sizeof +.R +operator yields the size +in bytes of its operand. +(A +.I +byte +.R +is undefined by the language +except in terms of the value of +.B +sizeof\fR. +.R +However, in all existing implementations, +a byte is the space required to hold a +\fBchar.\fR) +When applied to an array, the result is the total +number of bytes in the array. +The size is determined from +the declarations of +the objects in the expression. +This expression is semantically an +.B +unsigned +.R +constant and may +be used anywhere a constant is required.
+Its major use is in communication with routines +like storage allocators and I/O systems. +.PP +The +.B +sizeof +.R +operator +may also be applied to a parenthesized type name. +In that case it yields the size in bytes of an object +of the indicated type. +.PP +The construction +\fBsizeof(\fItype\|\fR\^)\fR\^ +is taken to be a unit, +so the expression +\fBsizeof(\fItype\|\fB)-2\fR +is the same as +\fB(sizeof(\fItype\|\fB))-2\fR. +.NH 2 +Multiplicative Operators +.PP +The multiplicative operators +\fB\(**\fR, +\fB/\fR, +and +.B +% +.R +group left to right. +The usual arithmetic conversions are performed. +.DS +\fImultiplicative expression: + expression \(** expression + expression / expression + expression % expression\fR +.DE +.PP +The binary +.B +\(** +.R +operator indicates multiplication. +The +.B +\(** +.R +operator is associative, +and expressions with several multiplications at the same +level may be rearranged by the compiler. +The binary +.B +/ +.R +operator indicates division. +.PP +The binary +.B +% +.R +operator yields the remainder +from the division of the first expression by the second. +The operands must be integral. +.PP +When positive integers are divided, truncation is toward 0; +but the form of truncation is machine-dependent +if either operand is negative. +On all machines covered by this manual, +the remainder has the same sign as the dividend. +It is always true that +.B +(a/b)\(**b\ \(pl a%b +.R +is equal to +.B +a +.R +(if +.B +b +.R +is not 0). +.NH 2 +Additive Operators +.PP +The additive operators +.B +\(pl +.R +and +.B +\(mi +.R +group left to right. +The usual arithmetic conversions are performed. +There are some additional type possibilities for each operator. +.DS +\fIadditive-expression: + expression \(pl expression + expression \(mi expression\fR +.DE +.PP +The result of the +.B +\(pl +.R +operator is the sum of the operands. +A pointer to an object in an array and +a value of any integral type +may be added. 
+The latter is in all cases converted to +an address offset +by multiplying it +by the length of the object to which the +pointer points. +The result is a pointer +of the same type as the original pointer +which points to another object in the same array, +appropriately offset from the original object. +Thus if +.B +P +.R +is a pointer +to an object in an array, the expression +.B +P\(pl1 +.R +is a pointer +to the next object in the array. +No further type combinations are allowed for pointers. +.PP +The +.B +\(pl +.R +operator is associative, +and expressions with several additions at the same level may +be rearranged by the compiler. +.PP +The result of the +.B +\(mi +.R +operator is the difference of the operands. +The usual arithmetic conversions are performed. +Additionally, +a value of any integral type +may be subtracted from a pointer, +and then the same conversions for addition apply. +.PP +If two pointers to objects of the same type are subtracted, +the result is converted +(by division by the length of the object) +to an +.B +int +.R +representing the number of +objects separating +the pointed-to objects. +This conversion will in general give unexpected +results unless the pointers point +to objects in the same array, since pointers, even +to objects of the same type, do not necessarily differ +by a multiple of the object length. +.NH 2 +Shift Operators +.PP +The shift operators +.B +<< +.R +and +.B +>> +.R +group left to right. +Both perform the usual arithmetic conversions on their operands, +each of which must be integral. +Then the right operand is converted to +\fBint\fR; +the type of the result is that of the left operand. +The result is undefined if the right operand is negative +or greater than or equal to the length of the object in bits. +On the VAX a negative right operand is interpreted as reversing +the direction of the shift. 
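Because the result of a shift is undefined when the right operand is negative or at least as large as the width of the object, portable code may want to check the count first. The following is a minimal sketch in present-day C under stated assumptions: the helper names are hypothetical, and returning 0 for a rejected count is an arbitrary policy chosen for the example, not something the language prescribes.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero when count is a valid shift distance for an unsigned int. */
int shift_count_ok(int count)
{
    return count >= 0 && count < (int)(sizeof(unsigned) * CHAR_BIT);
}

/* Left shift that refuses counts for which the result is undefined.
   Returning 0 in that case is an assumption made for this example. */
unsigned safe_shift_left(unsigned value, int count)
{
    if (!shift_count_ok(count))
        return 0u;
    return value << count;   /* vacated bits are filled with 0 */
}
```

A caller that needs to distinguish a rejected count from a genuine zero result would test shift_count_ok() itself before shifting.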
+.DS +\fIshift-expression: + expression << expression + expression >> expression\fR +.DE +.PP +The value of +.B +E1<<E2 +.R +is +.B +E1 +.R +(interpreted as a bit +pattern) left-shifted +.B +E2 +.R +bits. +Vacated bits are 0 filled. +The value of +.B +E1>>E2 +.R +is +.B +E1 +.R +right-shifted +.B +E2 +.R +bit positions. +The right shift is guaranteed to be logical +(0 fill) +if +.B +E1 +.R +is +\fBunsigned\fR; +otherwise, it may be +arithmetic. +.NH 2 +Relational Operators +.PP +The relational operators group left to right. +.DS +\fIrelational-expression: + expression < expression + expression > expression + expression <= expression + expression >= expression\fR +.DE +.PP +The operators +.B +< +.R +(less than), +.B +> +.R +(greater than), \fB<=\fR +(less than +or equal to), and +.B +>= +.R +(greater than or equal to) +all yield 0 if the specified relation is false +and 1 if it is true. +The type of the result is +.B +int\fR. +The usual arithmetic conversions are performed. +Two pointers may be compared; +the result depends on the relative locations in the address space +of the pointed-to objects. +Pointer comparison is portable only when the pointers point to objects +in the same array. +.NH 2 +Equality Operators +.PP +.DS +\fIequality-expression: + expression == expression + expression != expression\fR +.DE +.PP +The +.B +== +.R +(equal to) and the +.B +!= +.R +(not equal to) operators +are exactly analogous to the relational +operators except for their lower +precedence. +(Thus +.B +a<b\ ==\ c<d +.R +is 1 whenever +.B +a<b +.R +and +.B +c<d +.R +have the same truth value). +.PP +A pointer may be compared to an integer +only if the +integer is the constant 0. +A pointer to which 0 has been assigned is guaranteed +not to point to any object +and will appear to be equal to 0. +In conventional usage, such a pointer is considered to be null. 
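The rules for relational and equality operators can be illustrated with a short sketch in present-day C. The function names and the sample array are hypothetical, introduced only for the demonstration. The first function leaves out parentheses on purpose to show that == binds less tightly than <, so some compilers may warn about it.

```c
#include <assert.h>

/* Hypothetical sample data; pointer comparison is portable only
   within a single array such as this one. */
static const int sample[3] = {1, 2, 3};

/* Parsed as (a < b) == (c < d): 1 when both relations agree. */
int same_truth_value(int a, int b, int c, int d)
{
    return a < b == c < d;
}

/* A pointer to which 0 has been assigned compares equal to 0. */
int null_compares_equal_to_zero(void)
{
    const int *p = 0;
    return p == 0;
}

/* Relational comparison of two pointers into the same array. */
int earlier_in_array(int i, int j)
{
    return &sample[i] < &sample[j];
}
```

Each of these functions returns an int that is either 0 or 1, as the text specifies for both groups of operators.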
+.NH 2 +Bitwise \s-1AND\s0 Operator +.PP +.DS +\fIand-expression: + expression & expression\fR +.DE +.PP +The +.B +& +.R +operator is associative, +and expressions involving +.B +& +.R +may be rearranged. +The usual arithmetic conversions are performed. +The result is the bitwise +AND +function of the operands. +The operator applies only to integral +operands. +.NH 2 +Bitwise Exclusive \s-1OR\s0 Operator +.DS +\fIexclusive-or-expression: + expression ^ expression\fR +.DE +.PP +The +.B +^ +.R +operator is associative, +and expressions involving +.B +^ +.R +may be rearranged. +The usual arithmetic conversions are performed; +the result is +the bitwise exclusive +OR +function of +the operands. +The operator applies only to integral +operands. +.NH 2 +Bitwise Inclusive \s-1OR\s0 Operator +.DS +\fIinclusive-or-expression: + expression | expression\fR +.DE +.PP +The +.B +| +.R +operator is associative, +and expressions involving +.B +| +.R +may be rearranged. +The usual arithmetic conversions are performed; +the result is the bitwise inclusive +OR +function of its operands. +The operator applies only to integral +operands. +.NH 2 +Logical \s-1AND\s0 Operator +.DS +\fIlogical-and-expression: + expression && expression\fR +.DE +.PP +The +.B +&& +.R +operator groups left to right. +It returns 1 if both its operands +evaluate to nonzero, 0 otherwise. +Unlike +\fB&\fR, +.B +&& +.R +guarantees left to right +evaluation; moreover, the second operand is not evaluated +if the first operand is 0. +.PP +The operands need not have the same type, but each +must have one of the fundamental +types or be a pointer. +The result is always +.B +int\fR. +.R +.NH 2 +Logical \s-1OR\s0 Operator +.DS +\fIlogical-or-expression: + expression || expression\fR +.DE +.PP +The +.B +|| +.R +operator groups left to right. +It returns 1 if either of its operands +evaluates to nonzero, 0 otherwise. 
+Unlike +\fB|\fR, +.B +|| +.R +guarantees left to right evaluation; moreover, +the second operand is not evaluated +if the value of the first operand is nonzero. +.PP +The operands need not have the same type, but each +must +have one of the fundamental types +or be a pointer. +The result is always +.B +int\fR. +.R +.NH 2 +Conditional Operator +.DS +\fIconditional-expression: + expression ? expression : expression\fR +.DE +.PP +Conditional expressions group right to left. +The first expression is evaluated; +and if it is nonzero, the result is the value of the +second expression, otherwise that of third expression. +If possible, the usual arithmetic conversions are performed +to bring the second and third expressions to a common type. +If both are structures or unions of the same type, +the result has the type of the structure or union. +If both pointers are of the same type, +the result has the common type. +Otherwise, one must be a pointer and the other the constant 0, +and the result has the type of the pointer. +Only one of the second and third +expressions is evaluated. +.NH 2 +Assignment Operators +.PP +There are a number of assignment operators, +all of which group right to left. +All require an lvalue as their left operand, +and the type of an assignment expression is that +of its left operand. +The value is the value stored in the +left operand after the assignment has taken place. +The two parts of a compound assignment operator are separate +tokens. +.DS +\fIassignment-expression: + lvalue = expression + lvalue \(pl= expression + lvalue \(mi= expression + lvalue \(**= expression + lvalue /= expression + lvalue %= expression + lvalue >>= expression + lvalue <<= expression + lvalue &= expression + lvalue ^= expression + lvalue |= expression\fR +.DE +.PP +In the simple assignment with +\fB=\fR, +the value of the expression replaces that of the object +referred +to by the lvalue. 
+If both operands have arithmetic type, +the right operand is converted to the type of the left +preparatory to the assignment. +Second, both operands may be structures or unions of the same type. +Finally, if the left operand is a pointer, the right operand must in general be a pointer +of the same type. +However, the constant 0 may be assigned to a pointer; +it is guaranteed that this value will produce a null +pointer distinguishable from a pointer to any object. +.PP +The behavior of an expression +of the form +\fBE1\fR\^ \fIop\fR\^ = \fBE2\fR\^ +may be inferred by +taking it as equivalent to +\fBE1 = E1 \fIop\fR\^ (\fBE2\fR\^); +however, +.B +E1 +.R +is evaluated only once. +In +.B +\(pl= +.R +and +\fB\(mi=\fR, +the left operand may be a pointer; in which case, the (integral) right +operand is converted as explained +in ``Additive Operators.'' +All right operands and all nonpointer left operands must +have arithmetic type. +.NH 2 +Comma Operator +.DS +\fIcomma-expression: + expression , expression\fR +.DE +.PP +A pair of expressions separated by a comma is evaluated +left to right, and the value of the left expression is +discarded. +The type and value of the result are the +type and value of the right operand. +This operator groups left to right. +In contexts where comma is given a special meaning, +e.g., in lists of actual arguments +to functions (see ``Primary Expressions'') and lists +of initializers (see ``Initialization'' under ``DECLARATIONS''), +the comma operator as described in this subpart +can only appear in parentheses. For example, +.DS +\fBf(a, (t=3, t\(pl2), c)\fR +.DE +.LP +has three arguments, the second of which has the value 5. +.NH 1 +Declarations +.PP +Declarations are used to specify the interpretation +which C gives to each identifier; they do not necessarily +reserve storage associated with the identifier. 
+Declarations have the form +.DS +\fIdeclaration: + decl-specifiers declarator-list\v'0.5'\s-2opt\s0\v'-0.5' ;\fR +.DE +.PP +The declarators in the declarator-list +contain the identifiers being declared. +The decl-specifiers +consist of a sequence of type and storage class specifiers. +.DS +\fIdecl-specifiers: + type-specifier decl-specifiers\v'0.5'\s-2opt\s0\v'-0.5' + sc-specifier decl-specifiers\v'0.5'\s-2opt\s0\v'-0.5'\fR +.DE +.PP +The list must be self-consistent in a way described below. +.NH 2 +Storage Class Specifiers +.PP +The sc-specifiers are: +.DS +\fIsc-specifier:\fB + auto + static + extern + register + typedef\fR +.DE +.PP +The +.B +typedef +.R +specifier does not reserve storage +and is called a ``storage class specifier'' only for syntactic convenience. +See ``Typedef'' for more information. +The meanings of the various storage classes were discussed in ``Names.'' +.PP +The +\fBauto\fR, +\fBstatic\fR, +and +.B +register +.R +declarations also serve as definitions +in that they cause an appropriate amount of storage to be reserved. +In the +.B +extern +.R +case, +there must be an external definition (see ``External Definitions'') +for the given identifiers +somewhere outside the function in which they are declared. +.PP +A +.B +register +.R +declaration is best thought of as an +.B +auto +.R +declaration, together with a hint to the compiler +that the variables declared will be heavily used. +Only the first few +such declarations in each function are effective. +Moreover, only variables of certain types will be stored in registers; +on the +PDP-11, +they are +.B +int +.R +or pointer. +One other restriction applies to register variables: +the address-of operator +.B +& +.R +cannot be applied to them. +Smaller, faster programs can be expected if register declarations +are used appropriately, +but future improvements in code generation +may render them unnecessary. +.PP +At most, one sc-specifier may be given in a declaration. 
+If the sc-specifier is missing from a declaration, it +is taken to be +.B +auto +.R +inside a function, +.B +extern +.R +outside. +Exception: +functions are never +automatic. +.NH 2 +Type Specifiers +.PP +The type-specifiers are +.DS +\fItype-specifier: + struct-or-union-specifier + typedef-name + enum-specifier +basic-type-specifier: + basic-type + basic-type basic-type-specifiers +basic-type:\fB + char + short + int + long + unsigned + float + double + void\fR +.DE +.PP +At most one of the words \fBlong\fR or \fBshort\fR +may be specified in conjunction with \fBint\fR; +the meaning is the same as if \fBint\fR were not mentioned. +The word \fBlong\fR may be specified in conjunction with +\fBfloat\fR; +the meaning is the same as \fBdouble\fR. +The word \fBunsigned\fR may be specified alone, or +in conjunction with \fBint\fR or any of its short +or long varieties, or with \fBchar\fR. +.PP +Otherwise, at most one type-specifier may be +given in a declaration. +In particular, adjectival use of \fBlong\fR, +\fBshort\fR, or \fBunsigned\fR is not permitted +with \fBtypedef\fR names. +If the type-specifier is missing from a declaration, +it is taken to be \fBint\fR. +.PP +Specifiers for structures, unions, and enumerations are discussed in +``Structure, Union, and Enumeration Declarations.'' +Declarations with +.B +typedef +.R +names are discussed in ``Typedef.'' +.NH 2 +Declarators +.PP +The declarator-list appearing in a declaration +is a comma-separated sequence of declarators, +each of which may have an initializer. +.DS +\fIdeclarator-list: + init-declarator + init-declarator , declarator-list +.DE +.DS +\fIinit-declarator: + declarator initializer\v'0.5'\s-2opt\s0\v'-0.5'\fR +.DE +.PP +Initializers are discussed in ``Initialization''. +The specifiers in the declaration +indicate the type and storage class of the objects to which the +declarators refer.
+Declarators have the syntax: +.DS +\fIdeclarator: + identifier + ( declarator ) + \(** declarator + declarator () + declarator [ constant-expression\v'0.5'\s-2opt\s0\v'-0.5' ]\fR +.DE +.PP +The grouping is +the same as in expressions. +.NH 2 +Meaning of Declarators +.PP +Each declarator is taken to be +an assertion that when a construction of +the same form as the declarator appears in an expression, +it yields an object of the indicated +type and storage class. +.PP +Each declarator contains exactly one identifier; it is this identifier that +is declared. +If an unadorned identifier appears +as a declarator, then it has the type +indicated by the specifier heading the declaration. +.PP +A declarator in parentheses is identical to the unadorned declarator, +but the binding of complex declarators may be altered by parentheses. +See the examples below. +.PP +Now imagine a declaration +.DS +\fBT D1\fR +.DE +.LP +where +.B +T +.R +is a type-specifier (like +\fBint\fR, +etc.) +and +.B +D1 +.R +is a declarator. +Suppose this declaration makes the identifier have type +``\|.\|.\|.\| +.B +T +.R +,'' +where the ``\|.\|.\|.\|'' is empty if +.B +D1 +.R +is just a plain identifier +(so that the type of +.B +x +.R +in +\fB``int x''\fR +is just +\fBint\fR). +Then if +.B +D1 +.R +has the form +.DS +\fB\(**D\fR +.DE +.LP +the type of the contained identifier is +``\|.\|.\|.\| pointer to +.B +T +.R +\&.'' +.PP +If +.B +D1 +.R +has the form +.DS +\fBD\|(\|\|)\|\fR +.DE +.LP +then the contained identifier has the type +``\|.\|.\|. function returning +\fBT\fR.'' +.LP +If +.B +D1 +.R +has the form +.DS +\fBD\|[\|\fIconstant-expression\fB\|]\fR +.DE +.LP +or +.DS +\fBD\|[\|]\|\fR +.DE +.LP +then the contained identifier has type +``\|.\|.\|.\| array of +\fBT\fR.'' +In the first case, the constant +expression +is an expression +whose value is determinable at compile time, +whose type is +.B +int\fR, +and whose value is positive.
+(Constant expressions are defined precisely in ``Constant Expressions.'') +When several ``array of'' specifications are adjacent, a multidimensional +array is created; +the constant expressions which specify the bounds +of the arrays may be missing only for the first member of the sequence. +This elision is useful when the array is external +and the actual definition, which allocates storage, +is given elsewhere. +The first constant expression may also be omitted +when the declarator is followed by initialization. +In this case the size is calculated from the number +of initial elements supplied. +.PP +An array may be constructed from one of the basic types, from a pointer, +from a structure or union, +or from another array (to generate a multidimensional array). +.PP +Not all the possibilities +allowed by the syntax above are actually +permitted. +The restrictions are as follows: +functions may not return +arrays or functions +although they may return pointers; +there are no arrays of functions although +there may be arrays of pointers to functions. +Likewise, a structure or union may not contain a function; +but it may contain a pointer to a function. +.PP +As an example, the declaration +.DS +\fBint i, \(**ip, f(), \(**fip(), (\(**pfi)();\fR +.DE +.LP +declares an integer +\fBi\fR, +a pointer +.B +ip +.R +to an integer, +a function +.B +f +.R +returning an integer, +a function +.B +fip +.R +returning a pointer to an integer, +and a pointer +.B +pfi +.R +to a function which +returns an integer. +It is especially useful to compare the last two. +The binding of +.B +\(**fip() +.R +is +.B +\(**(fip())\fR. +.R +The declaration suggests, +and the same construction in an expression +requires, the calling of a function +.B +fip\fR, +.R +and then using indirection through the (pointer) result +to yield an integer.
+In the declarator +\fB(\(**pfi)()\fR, +the extra parentheses are necessary, as they are also +in an expression, to indicate that indirection through +a pointer to a function yields a function, which is then called; +it returns an integer. +.PP +As another example, +.DS +\fBfloat fa[17], \(**afp[17];\fR +.DE +.LP +declares an array of +.B +float +.R +numbers and an array of +pointers to +.B +float +.R +numbers. +Finally, +.DS +\fBstatic int x3d[3][5][7];\fR +.DE +.LP +declares a static 3-dimensional array of integers, +with rank 3\(mu5\(mu7. +In complete detail, +.B +x3d +.R +is an array of three items; +each item is an array of five arrays; +each of the latter arrays is an array of seven +integers. +Any of the expressions +\fBx3d\fR, +\fBx3d[i]\fR, +\fBx3d[i][j]\fR, +.B +x3d[i][j][k] +.R +may reasonably appear in an expression. +The first three have type ``array'' +and the last has type +.B +int\fR. +.R +.NH 2 +Structure and Union Declarations +.PP +A structure +is an object consisting of a sequence of named members. +Each member may have any type. +A union is an object which may, at a given time, contain any one +of several members. +Structure and union specifiers have the same form. +.DS +\fIstruct-or-union-specifier: + struct-or-union { struct-decl-list } + struct-or-union identifier { struct-decl-list } + struct-or-union identifier +.DE +.DS +\fIstruct-or-union:\fB + struct + union\fR +.DE +.PP +The +struct-decl-list +.ne 4 +is a sequence of declarations for the members of the structure or union: +.DS +\fIstruct-decl-list: + struct-declaration + struct-declaration struct-decl-list +.DE +.DS +\fIstruct-declaration: + type-specifier struct-declarator-list ; +.DE +.DS +\fIstruct-declarator-list: + struct-declarator + struct-declarator , struct-declarator-list\fR +.DE +.PP +In the usual case, a struct-declarator is just a declarator +for a member of a structure or union. +A structure member may also consist of a specified number of bits. 
+Such a member is also called a +.I +field ; +.R +its length, +a non-negative constant expression, +is set off from the field name by a colon. +.DS +\fIstruct-declarator: + declarator + declarator : constant-expression + : constant-expression\fR +.DE +.PP +Within a structure, the objects declared +have addresses which increase as the declarations +are read left to right. +Each nonfield member of a structure +begins on an addressing boundary appropriate +to its type; +therefore, there may +be unnamed holes in a structure. +Field members are packed into machine integers; +they do not straddle words. +A field which does not fit into the space remaining in a word +is put into the next word. +No field may be wider than a word. +.PP +Fields are assigned right to left +on the +PDP-11 +and +VAX-11, +left to right on the 3B 20. +.PP +A struct-declarator with no declarator, only a colon and a width, +indicates an unnamed field useful for padding to conform +to externally-imposed layouts. +As a special case, a field with a width of 0 +specifies alignment of the next field at an implementation dependent boundary. +.PP +The language does not restrict the types of things that +are declared as fields, +but implementations are not required to support any but +integer fields. +Moreover, +even +.B +int +.R +fields may be considered to be unsigned. +On the +PDP-11, +fields are not signed and have only integer values; +on the +VAX-11, +fields declared with +.B +int +.R +are treated as containing a sign. +For these reasons, +it is strongly recommended that fields be declared as +.B +unsigned\fR. +.R +In all implementations, +there are no arrays of fields, +and the address-of operator +.B +& +.R +may not be applied to them, so that there are no pointers to +fields. +.PP +A union may be thought of as a structure all of whose members +begin at offset 0 and whose size is sufficient to contain +any of its members. +At most, one of the members can be stored in a union +at any time. 
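As a minimal sketch of the field and union rules above, the following declares a structure with unsigned fields (as the text recommends), an unnamed padding field, and a union whose members share storage. The names (struct flags, union number, stored_mode, store_int) are hypothetical, and the field widths are arbitrary choices for illustration; actual layout and field ordering remain implementation dependent, as described above.

```c
/* Hypothetical declarations, used only for illustration. */
struct flags {
    unsigned ready : 1;   /* one-bit field */
    unsigned mode  : 3;   /* three-bit field, holds 0..7 */
    unsigned       : 4;   /* unnamed field: padding only */
    unsigned count : 8;
};

union number {
    int   i;              /* every member begins at offset 0 */
    float f;
};

/* Store m in a 3-bit unsigned field; only the low three bits are kept. */
unsigned stored_mode(unsigned m)
{
    struct flags fl;

    fl.ready = 1;
    fl.mode = m;
    fl.count = 0;
    return fl.mode;
}

/* Only one member of a union holds a value at any time. */
int store_int(int v)
{
    union number n;

    n.i = v;
    return n.i;
}
```

Because the fields are declared unsigned, the value read back from fl.mode does not depend on whether the implementation treats plain int fields as signed.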
+.PP +A structure or union specifier of the second form, that is, one of +.DS + \fBstruct \fIidentifier { struct-decl-list \fR} + \fBunion \fIidentifier { struct-decl-list \fR} +.DE +.LP +declares the identifier to be the +.I +structure tag +.R +(or union tag) +of the structure specified by the list. +A subsequent declaration may then use +the third form of specifier, one of +.DS + \fBstruct \fIidentifier\fR + \fBunion \fIidentifier\fR +.DE +.PP +Structure tags allow definition of self-referential +structures. Structure tags also +permit the long part of the declaration to be +given once and used several times. +It is illegal to declare a structure or union +which contains an instance of +itself, but a structure or union may contain a pointer to an instance of itself. +.PP +The third form of a structure or union specifier may be +used prior to a declaration which gives the complete specification +of the structure or union in situations in which the size +of the structure or union is unnecessary. +The size is unnecessary in two situations: when a +pointer to a structure or union is being declared and +when a \fBtypedef\fR name is declared to be a synonym +for a structure or union. +This, for example, allows the declaration of a pair +of structures which contain pointers to each other. +.PP +The names of members and tags do not conflict +with each other or with ordinary variables. +A particular name may not be used twice +in the same structure, +but the same name may be used in several different structures in the same scope. +.PP +A simple but important example of a structure declaration is +the following binary tree structure: +.DS +\fBstruct tnode +{ + char tword[20]; + int count; + struct tnode \(**left; + struct tnode \(**right; +};\fR +.DE +.LP +which contains an array of 20 characters, an integer, and two pointers +to similar structures. 
+Once this declaration has been given, the +declaration +.DS +\fBstruct tnode s, \(**sp;\fR +.DE +.LP +declares +.B +s +.R +to be a structure of the given sort +and +.B +sp +.R +to be a pointer to a structure +of the given sort. +With these declarations, the expression +.DS +\fBsp->count\fR +.DE +.LP +refers to the +.B +count +.R +field of the structure to which +.B +sp +.R +points; +.DS +\fBs.left\fR +.DE +.LP +refers to the left subtree pointer +of the structure +\fBs\fR; +and +.DS +\fBs.right->tword[0]\fR +.DE +.LP +refers to the first character of the +.B +tword +.R +member of the right subtree of +.B +s\fR. +.R +.PP +.NH 2 +Enumeration Declarations +.PP +Enumeration variables and constants have integral type. +.DS +\fIenum-specifier:\fB + enum\fI { enum-list \fR}\fB + enum \fIidentifier { enum-list \fR}\fB + enum \fIidentifier +.sp +enum-list: + enumerator + enum-list , enumerator +.sp +enumerator: + identifier + identifier = constant-expression\fR +.DE +.PP +The identifiers in an enum-list are declared as constants +and may appear wherever constants are required. +If no enumerators with +.B += +.R +appear, then the values of the +corresponding constants begin at 0 and increase by 1 as the declaration is +read from left to right. +An enumerator with +.B += +.R +gives the associated identifier the value +indicated; subsequent identifiers continue the progression from the assigned value. +.PP +The names of enumerators in the same scope must all be distinct +from each other and from those of ordinary variables. +.PP +The role of the identifier in the enum-specifier +is entirely analogous to that of the structure tag +in a struct-specifier; it names a particular enumeration. +For example, +.DS L +\fBenum color { chartreuse, burgundy, claret=20, winedark }; +\&... +enum color *cp, col; +\&... +col = claret; +cp = &col; +\&... 
+if (*cp == burgundy) ...\fR +.DE +.LP +makes +.B +color +.R +the enumeration-tag of a type describing various colors, +and then declares +.B +cp +.R +as a pointer to an object of that type, +and +.B +col +.R +as an object of that type. +The possible values are drawn from the set {0,1,20,21}. +.NH 2 +Initialization +.PP +A declarator may specify an initial value for the +identifier being declared. +The initializer is preceded by +.B += +.R +and +consists of an expression or a list of values nested in braces. +.DS +\fIinitializer: + = expression + = { initializer-list } + = { initializer-list , } +.DE +.DS +\fIinitializer-list: + expression + initializer-list , initializer-list\fR + { \fIinitializer-list \fR} + { \fIinitializer-list\fR , } +.DE +.PP +All the expressions in an initializer +for a static or external variable must be constant +expressions, which are described in ``CONSTANT EXPRESSIONS'', +or expressions which reduce to the address of a previously +declared variable, possibly offset by a constant expression. +Automatic or register variables may be initialized by arbitrary +expressions involving constants and previously declared variables and functions. +.PP +Static and external variables that are not initialized are +guaranteed to start off as zero. +Automatic and register variables that are not initialized +are guaranteed to start off as garbage. +.PP +When an initializer applies to a +.I +scalar +.R +(a pointer or an object of arithmetic type), +it consists of a single expression, perhaps in braces. +The initial value of the object is taken from +the expression; the same conversions as for assignment are performed. +.PP +When the declared variable is an +.I +aggregate +.R +(a structure or array), +the initializer consists of a brace-enclosed, comma-separated list of +initializers for the members of the aggregate +written in increasing subscript or member order. 
+If the aggregate contains subaggregates, this rule +applies recursively to the members of the aggregate. +If there are fewer initializers in the list than there are members of the aggregate, +then the aggregate is padded with zeros. +It is not permitted to initialize unions or automatic aggregates. +.PP +Braces may in some cases be omitted. +If the initializer begins with a left brace, then +the succeeding comma-separated list of initializers initializes +the members of the aggregate; +it is erroneous for there to be more initializers than members. +If, however, the initializer does not begin with a left brace, +then only enough elements from the list are taken to account +for the members of the aggregate; any remaining members +are left to initialize the next member of the aggregate of which +the current aggregate is a part. +.PP +A final abbreviation allows a +.B +char +.R +array to be initialized by a string. +In this case successive characters of the string +initialize the members of the array. +.PP +For example, +.DS +\fBint x[] = { 1, 3, 5 };\fR +.DE +.LP +declares and initializes +.B +x +.R +as a one-dimensional array which has three members, since no size was specified +and there are three initializers. +.DS +\fBfloat y[4][3] = +{ + { 1, 3, 5 }, + { 2, 4, 6 }, + { 3, 5, 7 }, +};\fR +.DE +.LP +is a completely-bracketed initialization: +1, 3, and 5 initialize the first row of +the array +\fBy[0]\fR, +namely +\fBy[0][0]\fR, +\fBy[0][1]\fR, +and +.B +y[0][2]\fR. +.R +Likewise, the next two lines initialize +.B +y[1] +.R +and +.B +y[2]\fR. +.R +The initializer ends early and therefore +.B +y[3] +.R +is initialized with 0. +Precisely, the same effect could have been achieved by +.DS +\fBfloat y[4][3] = +{ + 1, 3, 5, 2, 4, 6, 3, 5, 7 +};\fR +.DE +.PP +The initializer for +.B +y +.R +begins with a left brace but that for +.B +y[0] +.R +does not; +therefore, three elements from the list are used. 
+Likewise, the next three are taken successively for +.B +y[1] +.R +and +.B +y[2]\fR. +.R +Also, +.DS +\fBfloat y[4][3] = +{ + { 1 }, { 2 }, { 3 }, { 4 } +};\fR +.DE +.LP +initializes the first column of +.B +y +.R +(regarded as a two-dimensional array) +and leaves the rest 0. +.PP +Finally, +.DS +\fBchar msg[] = "Syntax error on line %s\en";\fR +.DE +.LP +shows a character array whose members are initialized +with a string. +.NH 2 +Type Names +.PP +In two contexts (to specify type conversions explicitly +by means of a cast +and as an argument of +\fBsizeof\fR), +it is desired to supply the name of a data type. +This is accomplished using a ``type name'', which in essence +is a declaration for an object of that type which omits the name of +the object. +.DS +\fItype-name: + type-specifier abstract-declarator +.DE +.DS +\fIabstract-declarator: + empty + ( abstract-declarator ) + \(** abstract-declarator + abstract-declarator () + abstract-declarator\fR\^ [ \fIconstant-expression\v'0.5'\s-2opt\s0\v'-0.5' \fR\^] +.DE +.PP +To avoid ambiguity, +in the construction +.DS + \fI( abstract-declarator \fR) +.DE +.LP +the +abstract-declarator +is required to be nonempty. +Under this restriction, +it is possible to identify uniquely the location in the abstract-declarator +where the identifier would appear if the construction were a declarator +in a declaration. +The named type is then the same as the type of the +hypothetical identifier. 
+For example, +.DS +\fBint +int \(** +int \(**[3] +int (\(**)[3] +int \(**() +int (\(**)() +int (\(**[3])()\fR +.DE +.LP +name respectively the types ``integer,'' ``pointer to integer,'' +``array of three pointers to integers,'' +``pointer to an array of three integers,'' +``function returning pointer to integer,'' +``pointer to function returning an integer,'' +and ``array of three pointers to functions returning an integer.'' +.NH 2 +Typedef +.PP +Declarations whose ``storage class'' is +.B +typedef +.R +do not define storage but instead +define identifiers which can be used later +as if they were type keywords naming fundamental +or derived types. +.DS +\fItypedef-name:\fR + \fIidentifier\fR +.DE +.PP +Within the scope of a declaration involving +\fBtypedef\fR, +each identifier appearing as part of +any declarator therein becomes syntactically +equivalent to the type keyword +naming the type +associated with the identifier +in the way described in ``Meaning of Declarators.'' +For example, +after +.DS +\fBtypedef int MILES, \(**KLICKSP; +typedef struct { double re, im; } complex;\fR +.DE +.LP +the constructions +.DS +\fBMILES distance; +extern KLICKSP metricp; +complex z, \(**zp;\fR +.DE +.LP +are all legal declarations; the type of +.B +distance +.R +is +\fBint\fR, +that of +.B +metricp +.R +is ``pointer to \fBint\fR, '' +and that of +.B +z +.R +is the specified structure. +The +.B +zp +.R +is a pointer to such a structure. +.PP +The +.B +typedef +.R +does not introduce brand-new types, only synonyms for +types which could be specified in another way. +Thus +in the example above +.B +distance +.R +is considered to have exactly the same type as +any other +.B +int +.R +object. +.NH 1 +Statements +.PP +Except as indicated, statements are executed in sequence. +.NH 2 +Expression Statement +.PP +Most statements are expression statements, which have +the form +.DS +\fIexpression \fR; +.DE +.PP +Usually expression statements are assignments or function +calls. 
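The typedef names and type names discussed above can be tried out in ordinary expression statements. The sketch below reuses the MILES, KLICKSP, and complex declarations from the text; the function names (add_miles, deref, third_of, real_part) are hypothetical, and the prototype-style parameter lists are an assumption of a modern compiler rather than the syntax this manual describes.

```c
typedef int MILES, *KLICKSP;
typedef struct { double re, im; } complex;

/* MILES is only a synonym for int, so the two mix freely. */
int add_miles(MILES a, int b)
{
    MILES total;

    total = a + b;        /* an expression statement (assignment) */
    return total;
}

/* KLICKSP is a synonym for "pointer to int". */
int deref(KLICKSP p)
{
    return *p;
}

/* The parameter has the type named by int (*)[3]:
   pointer to an array of three integers. */
int third_of(int (*row)[3])
{
    return (*row)[2];
}

double real_part(complex *zp)
{
    return zp->re;
}
```

Since typedef introduces no new type, sizeof(MILES) and sizeof(int) are the same.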
+.NH 2 +Compound Statement or Block +.PP +So that several statements can be used where one is expected, +the compound statement (also, and equivalently, called ``block'') is provided: +.DS +\fIcompound-statement: + { declaration-list\v'0.5'\s-2opt\s0\v'-0.5' statement-list\v'0.5'\s-2opt\s0\v'-0.5' } +.DE +.DS +\fIdeclaration-list: + declaration + declaration declaration-list +.DE +.DS +\fIstatement-list: + statement + statement statement-list\fR +.DE +.PP +If any of the identifiers +in the declaration-list were previously declared, +the outer declaration is pushed down for the duration of the block, +after which it resumes its force. +.PP +Any initializations of +.B +auto +.R +or +.B +register +.R +variables are performed each time the block is entered at the top. +It is currently possible +(but a bad practice) +to transfer into a block; +in that case the initializations are not performed. +Initializations of +.B +static +.R +variables are performed only once when the program +begins execution. +Inside a block, +.B +extern +.R +declarations do not reserve storage +so initialization is not permitted. +.NH 2 +Conditional Statement +.PP +The two forms of the conditional statement are +.DS +\fBif\fR\^ ( \fIexpression\fR\^ ) \fIstatement\fR\^ +\fBif\fR\^ ( \fIexpression\fR\^ ) \fIstatement \fBelse \fIstatement\fR\^ +.DE +.PP +In both cases, the expression is evaluated; +and if it is nonzero, the first substatement +is executed. +In the second case, the second substatement is executed +if the expression is 0. +The ``else'' ambiguity is resolved by connecting +an +.B +else +.R +with the last encountered +\fBelse\fR-less +.B +if\fR. +.R +.NH 2 +While Statement +.PP +The +.B +while +.R +statement has the form +.DS +\fBwhile\fR\^ ( \fIexpression\fR\^ ) \fIstatement\fR\^ +.DE +.PP +The substatement is executed repeatedly +so long as the value of the +expression remains nonzero. +The test takes place before each execution of the +statement. 
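Two of the rules above lend themselves to a short sketch: an inner declaration pushes down the outer one for the duration of a block, and the while test precedes each execution of the substatement. The function names (shadow_demo, count_down) are hypothetical, and the prototype-style parameter lists are an assumption of a modern compiler.

```c
/* Hypothetical names, used only for illustration. */

/* The inner x hides the outer x only inside the block. */
int shadow_demo(void)
{
    int x = 1;
    int inner;

    {
        int x = 2;        /* outer x is pushed down here */
        inner = x;
    }
    return 10 * inner + x;  /* outer x has resumed its force */
}

/* The test is made before each execution, so the body may run zero times. */
int count_down(int n)
{
    int steps = 0;

    while (n > 0) {
        n = n - 1;
        steps = steps + 1;
    }
    return steps;
}
```

With a nonpositive argument, count_down never enters the loop body and returns 0.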
+.NH 2 +Do Statement +.PP +The +.B +do +.R +statement has the form +.DS +\fBdo \fIstatement \fBwhile\fR\^ ( \fIexpression \fR\^) ; +.DE +.PP +The substatement is executed repeatedly until +the value of the expression becomes 0. +The test takes place after each execution of the +statement. +.NH 2 +For Statement +.PP +The +.B +for +.R +statement has the form: +.DS +\fBfor\fI ( exp-1\v'0.5'\s-2opt\s0\v'-0.5' ; exp-2\v'0.5'\s-2opt\s0\v'-0.5' ; exp-3\v'0.5'\s-2opt\s0\v'-0.5' ) statement\fR +.DE +.PP +.sp +Except for the behavior of \fBcontinue\fR, +this statement is equivalent to +.DS +\fIexp-1 \fR; +\fBwhile\fR\^ ( \fIexp-2\ ) \fR\^ +{ + \fIstatement + exp-3 ;\fR +} +.DE +.PP +Thus the first expression specifies initialization +for the loop; the second specifies +a test, made before each iteration, such +that the loop is exited when the expression becomes +0. +The third expression often specifies an incrementing +that is performed after each iteration. +.PP +Any or all of the expressions may be dropped. +A missing +.I +exp-2 +.R +makes the +implied +.B +while +.R +clause equivalent to +\fBwhile(1)\fR; +other missing expressions are simply +dropped from the expansion above. +.NH 2 +Switch Statement +.PP +The +.B +switch +.R +statement causes control to be transferred +to one of several statements depending on +the value of an expression. +It has the form +.DS +\fBswitch\fR\^ ( \fIexpression\fR\^ ) \fIstatement\fR\^ +.DE +.PP +The usual arithmetic conversion is performed on the +expression, but the result must be +.B +int\fR. +.R +The statement is typically compound. +Any statement within the statement +may be labeled with one or more case prefixes +as follows: +.DS +\fBcase \fIconstant-expression \fR: +.DE +.LP +where the constant +expression +must be +.B +int\fR. +.R +No two of the case constants in the same switch +may have the same value. 
+Constant expressions are precisely defined in ``CONSTANT EXPRESSIONS.'' +.PP +There may also be at most one statement prefix of the +form +.DS +\fBdefault :\fR +.DE +.PP +When the +.B +switch +.R +statement is executed, its expression +is evaluated and compared with each case constant. +If one of the case constants is +equal to the value of the expression, +control is passed to the statement +following the matched case prefix. +If no case constant matches the expression +and if there is a +\fBdefault\fR +prefix, control +passes to the prefixed +statement. +If no case matches and if there is no +\fBdefault\fR, +then +none of the statements in the +switch is executed. +.PP +The prefixes +.B +case +.R +and +.B +default +.R +do not alter the flow of control, +which continues unimpeded across such prefixes. +To exit from a switch, see +``Break Statement.'' +.PP +Usually, the statement that is the subject of a switch is compound. +Declarations may appear at the head of this +statement, +but +initializations of automatic or register variables +are ineffective. +.NH 2 +Break Statement +.PP +The statement +.DS +\fBbreak ;\fR +.DE +.LP +causes termination of the smallest enclosing +\fBwhile\fR, +\fBdo\fR, +\fBfor\fR, +or +\fBswitch\fR +statement; +control passes to the +statement following the terminated statement. +.NH 2 +Continue Statement +.PP +The statement +.DS +\fBcontinue ;\fR +.DE +.LP +causes control to pass to the loop-continuation portion of the +smallest enclosing +\fBwhile\fR, +\fBdo\fR, +or +\fBfor\fR +statement; that is, to the end of the loop. +More precisely, in each of the statements +.DS +.TS +lw(2i) lw(2i) lw(2i). +\fBwhile (\|.\|.\|.\|) { do { for (\|.\|.\|.\|) {\fR + \fIstatement ; statement ; statement ;\fR + \fBcontin: ; contin: ; contin: ; +} } while (...); }\fR +.TE +.DE +.LP +a +.B +continue +.R +is equivalent to +.B +goto\ contin\fR. +.R +(Following the +.B +contin: +.R +is a null statement, see ``Null Statement''.)
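The flow of control through case prefixes, and the effect of break and continue, can be sketched as follows. The function names (weight, sum_odd) and the case values are hypothetical choices for illustration, and the prototype-style parameter lists assume a modern compiler.

```c
/* Hypothetical names, used only for illustration. */

/* Case prefixes do not alter the flow of control: without a break,
   execution continues across the next prefix. */
int weight(int c)
{
    int w = 0;

    switch (c) {
    case 1:
        w = w + 1;
        /* fall through */
    case 2:
        w = w + 10;
        break;            /* leaves the smallest enclosing switch */
    default:
        w = -1;
    }
    return w;
}

/* continue passes control to the end of the loop body;
   in a for statement the third expression is still evaluated. */
int sum_odd(int n)
{
    int i, s = 0;

    for (i = 1; i <= n; i++) {
        if (i % 2 == 0)
            continue;
        s = s + i;
    }
    return s;
}
```

Here weight(1) executes both the case 1 and case 2 statements, while weight(2) executes only the latter.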
+.NH 2 +Return Statement +.PP +A function returns to its caller by means of +the +.B +return +.R +statement which has one of the +forms +.DS +\fBreturn ; +return \fIexpression \fR; +.DE +.PP +In the first case, the returned value is undefined. +In the second case, the value of the expression +is returned to the caller +of the function. +If required, the expression is converted, +as if by assignment, to the type of +function in which it appears. +Flowing off the end of a function is +equivalent to a return with no returned value. +The expression may be parenthesized. +.NH 2 +Goto Statement +.PP +Control may be transferred unconditionally by means of +the statement +.DS +\fBgoto \fIidentifier \fR; +.DE +.PP +The identifier must be a label +(see ``Labeled Statement'') +located in the current function. +.NH 2 +Labeled Statement +.PP +Any statement may be preceded by +label prefixes of the form +.DS +\fIidentifier \fR: +.DE +.LP +which serve to declare the identifier +as a label. +The only use of a label is as a target of a +.B +goto\fR. +.R +The scope of a label is the current function, +excluding any subblocks in which the same identifier has been redeclared. +See ``SCOPE RULES.'' +.NH 2 +Null Statement +.PP +The null statement has the form +.DS + \fB;\fR +.DE +.PP +A null statement is useful to carry a label just before the +.B +} +.R +of a compound statement or to supply a null +body to a looping statement such as +.B +while\fR. +.R +.NH 1 +External Definitions +.PP +A C program consists of a sequence of external definitions. +An external definition declares an identifier to +have storage class +.B +extern +.R +(by default) +or perhaps +\fBstatic\fR, +and +a specified type. +The type-specifier (see ``Type Specifiers'' in +``DECLARATIONS'') may also be empty, in which +case the type is taken to be +.B +int\fR. 
+.R +The scope of external definitions persists to the end +of the file in which they are declared just as the effect +of declarations persists to the end of a block. +The syntax of external definitions is the same +as that of all declarations except that +only at this level may the code for functions be given. +.NH 2 +External Function Definitions +.PP +Function definitions have the form +.DS +\fIfunction-definition: + decl-specifiers\v'0.5'\s-2opt\s0\v'-0.5' function-declarator function-body\fR +.DE +.PP +The only sc-specifiers +allowed +among the decl-specifiers +are +.B +extern +.R +or +\fBstatic\fR; +see ``Scope of Externals'' in +``SCOPE RULES'' for the distinction between them. +A function declarator is similar to a declarator +for a ``function returning .\|.\|.\|'' except that +it lists the formal parameters of +the function being defined. +.DS +\fIfunction-declarator: + declarator ( parameter-list\v'0.5'\s-2opt\s0\v'-0.5' ) +.DE +.DS +\fIparameter-list: + identifier + identifier , parameter-list\fR +.DE +.PP +The function-body +has the form +.DS +\fIfunction-body: + declaration-list\v'0.5'\s-2opt\s0\v'-0.5' compound-statement\fR +.DE +.PP +The identifiers in the parameter list, and only those identifiers, +may be declared in the declaration list. +Any identifiers whose type is not given are taken to be +.B +int\fR. +.R +The only storage class which may be specified is +\fBregister\fR; +if it is specified, the corresponding actual parameter +will be copied, if possible, into a register +at the outset of the function. +.PP +A simple example of a complete function definition is +.DS +\fBint max(a, b, c) + int a, b, c; +{ + int m; +.sp + m = (a > b) ? a : b; + return((m > c) ? m : c); +}\fR +.DE +.PP +Here +.B +int +.R +is the type-specifier; +.B +max(a,\ b,\ c) +.R +is the function-declarator; +.B +int\ a,\ b,\ c; +.R +is the declaration-list for +the formal +parameters; +\fB{\ ...\ }\fR +is the +block giving the code for the statement. 
+.PP +The C program converts all +.B +float +.R +actual parameters +to +\fBdouble\fR, +so formal parameters declared +.B +float +.R +have their declaration adjusted to read +.B +double\fR. +.R +All \fBchar\fR and \fBshort\fR formal parameter +declarations are similarly adjusted +to read \fBint\fR. +Also, since a reference to an array in any context +(in particular as an actual parameter) +is taken to mean +a pointer to the first element of the array, +declarations of formal parameters declared ``array of .\|.\|.\|'' +are adjusted to read ``pointer to .\|.\|.\|.'' +.NH 2 +External Data Definitions +.PP +An external data definition has the form +.DS +\fIdata-definition: + declaration\fR +.DE +.PP +The storage class of such data may be +.B +extern +.R +(which is the default) +or +.B +static +.R +but not +.B +auto +.R +or +\fBregister\fR. +.NH 1 +Scope Rules +.PP +A C program need not all +be compiled at the same time. The source text of the +program +may be kept in several files, and precompiled +routines may be loaded from +libraries. +Communication among the functions of a program +may be carried out both through explicit calls +and through manipulation of external data. +.PP +Therefore, there are two kinds of scopes to consider: +first, what may be called the +.UL lexical +.UL scope +of an identifier, which is essentially the +region of a program during which it may +be used without drawing ``undefined identifier'' +diagnostics; +and second, the scope +associated with external identifiers, +which is characterized by the rule +that references to the same external +identifier are references to the same object. +.NH 2 +Lexical Scope +.PP +The lexical scope of identifiers declared in external definitions +persists from the definition through +the end of the source file +in which they appear. +The lexical scope of identifiers which are formal parameters +persists through the function with which they are +associated. 
+The lexical scope of identifiers declared at the head of a block +persists until the end of the block. +The lexical scope of labels is the whole of the +function in which they appear. +.PP +In all cases, however, +if an identifier is explicitly declared at the head of a block, +including the block constituting a function, +any declaration of that identifier outside the block +is suspended until the end of the block. +.PP +Remember also (see ``Structure, Union, and Enumeration Declarations'' in +``DECLARATIONS'') that tags, identifiers associated with +ordinary variables, +and identifiers associated with structure and union members +form three disjoint classes +which do not conflict. +Members and tags follow the same scope rules +as other identifiers. +The \fBenum\fR constants are in the same +class as ordinary variables and follow the same scope rules. +The +.B +typedef +.R +names are in the same class as ordinary identifiers. +They may be redeclared in inner blocks, but an explicit +type must be given in the inner declaration: +.DS +\fBtypedef float distance; +\&... +{ + auto int distance; + ...\fR +} +.DE +.PP +The +.B +int +.R +must be present in the second declaration, +or it would be taken to be +a declaration with no declarators and type +.B +distance\fR. +.R +.NH 2 +Scope of Externals +.PP +If a function refers to an identifier declared to be +\fBextern\fR, +then somewhere among the files or libraries +constituting the complete program +there must be at least one external definition +for the identifier. +All functions in a given program which refer to the same +external identifier refer to the same object, +so care must be taken that the type and size +specified in the definition +are compatible with those specified +by each function which references the data. +.PP +It is illegal to explicitly initialize any external +identifier more than once in the set of files and libraries +comprising a multi-file program.
+It is legal to have more than one data definition +for any external non-function identifier; +explicit use of \fBextern\fR does not +change the meaning of an external declaration. +.PP +In restricted environments, the use of the \fBextern\fR +storage class takes on an additional meaning. +In these environments, the explicit appearance of the +\fBextern\fR keyword in external data declarations of +identifiers without initialization indicates that +the storage for the identifiers is allocated elsewhere, +either in this file or another file. +It is required that there be exactly one definition of +each external identifier (without \fBextern\fR) +in the set of files and libraries +comprising a multi-file program. +.PP +Identifiers declared +.B +static +.R +at the top level in external definitions +are not visible in other files. +Functions may be declared +.B +static\fR. +.R +.nr Hu 1 +.NH 1 +Compiler Control Lines +.PP +The C compiler contains a preprocessor capable +of macro substitution, conditional compilation, +and inclusion of named files. +Lines beginning with +.B +# +.R +communicate +with this preprocessor. +There may be any number of blanks and horizontal tabs +between the \fB#\fR and the directive. +These lines have syntax independent of the rest of the language; +they may appear anywhere and have effect which lasts (independent of +scope) until the end of the source program file. +.nr Hu 1 +.NH 2 +Token Replacement +.PP +A compiler-control line of the form +.DS +\fB#define \fIidentifier token-string\v'0.5'\s-2opt\s0\v'-0.5'\fR +.DE +.LP +causes the preprocessor to replace subsequent instances +of the identifier with the given string of tokens. +Semicolons in or at the end of the token-string are part of that string. +A line of the form +.DS +\fB#define \fIidentifier(identifier, ... )token-string\v'0.5'\s-2opt\s0\v'-0.5'\fR +.DE +.LP +where there is no space between the first identifier +and the +\fB(\fR, +is a macro definition with arguments.
+There may be zero or more formal parameters. +Subsequent instances of the first identifier followed +by a +\fB(\fR, +a sequence of tokens delimited by commas, and a +\fB)\fR +are replaced +by the token string in the definition. +Each occurrence of an identifier mentioned in the formal parameter list +of the definition is replaced by the corresponding token string from the call. +The actual arguments in the call are token strings separated by commas; +however, commas in quoted strings or protected by +parentheses do not separate arguments. +The number of formal and actual parameters must be the same. +Strings and character constants in the token-string are scanned +for formal parameters, but +strings and character constants in the rest of the program are +not scanned for defined identifiers +to replace. +.PP +In both forms the replacement string is rescanned for more +defined identifiers. +In both forms +a long definition may be continued on another line +by writing +.B +\e +.R +at the end of the line to be continued. +.PP +This facility is most valuable for definition of ``manifest constants,'' +as in +.DS +\fB#define TABSIZE 100 +.sp +int table\|[\|TABSIZE\|]\|;\fR +.DE +.PP +A control line of the form +.DS +\fB#undef \fIidentifier\fR +.DE +.LP +causes the +identifier's preprocessor definition (if any) to be forgotten. +.PP +If a \fB#define\fRd identifier is the subject of a subsequent +\fB#define\fR with no intervening \fB#undef\fR, then +the two token-strings are compared textually. +If the two token-strings are not identical +(all white space is considered as equivalent), then +the identifier is considered to be redefined. +.nr Hu 1 +.NH 2 +File Inclusion +.PP +A compiler control line of +the form +.DS +\fB#include\fI "filename\|\fR" +.DE +.LP +causes the replacement of that +line by the entire contents of the file +.I +filename\fR.
+.R +The named file is searched for first in the directory +of the file containing the \fB#include\fR, +and then in a sequence of specified or standard places. +Alternatively, a control line of the form +.DS +\fB#include\fI <filename\|\fR> +.DE +.LP +searches only the specified or standard places +and not the directory of the \fB#include\fR. +(How the places are specified is not part of the language.) +.PP +\fB#include\fRs +may be nested. +.nr Hu 1 +.NH 2 +Conditional Compilation +.PP +A compiler control line of the form +.DS +\fB#if \fIrestricted-constant-expression\fR +.DE +.LP +checks whether the restricted-constant expression evaluates to nonzero. +(Constant expressions are discussed in ``CONSTANT EXPRESSIONS''; +the following additional restrictions apply here: +the constant expression may not contain +.B +sizeof\fR, +.R +casts, or an enumeration constant.) +.PP +A restricted constant expression may also contain the +additional unary expression +.PP +\fBdefined \fIidentifier\fR +.LP +or +.PP +\fBdefined( \fIidentifier )\fR +.LP +which evaluates to one if the identifier is currently +defined in the preprocessor and zero if it is not. +.PP +All currently defined identifiers in restricted-constant-expressions +are replaced by their token-strings (except those identifiers +modified by \fBdefined\fR) just as in normal text. +The restricted constant expression will be evaluated only +after all macro expansions have finished. +During this evaluation, all undefined (to the preprocessor) +identifiers evaluate to zero. +.PP +A control line of the form +.DS +\fB#ifdef \fIidentifier\fR +.DE +.LP +checks whether the identifier is currently defined +in the preprocessor; i.e., whether it has been the +subject of a +.B +#define +.R +control line. +It is equivalent to \fB#if defined(\fIidentifier\fB)\fR. +A control line of the form +.DS +\fB#ifndef \fIidentifier\fR +.DE +.LP +checks whether the identifier is currently undefined +in the preprocessor.
+It is equivalent to +.DS +\fB#if !\|defined(\fIidentifier\fB)\fR. +.DE +.PP +All three forms are followed by an arbitrary number of lines, +possibly containing a control line +.DS +\fB#else\fR +.DE +.LP +and then by a control line +.DS +\fB#endif\fR +.DE +.PP +If the checked condition is true, +then any lines +between +.B +#else +.R +and +.B +#endif +.R +are ignored. +If the checked condition is false, then any lines between +the test and a +.B +#else +.R +or, lacking a +\fB#else\fR, +the +.B +#endif +.R +are ignored. +.PP +These constructions may be nested. +.nr Hu 1 +.NH 2 +Line Control +.PP +For the benefit of other preprocessors which generate C programs, +a line of the form +.DS +\fB#line \fIconstant "filename\fR" +.DE +.LP +causes the compiler to believe, for purposes of error +diagnostics, +that the line number of the next source line is given by the constant and the current input +file is named by "\fIfilename\fR". +If "\fIfilename\fR" is absent, the remembered file name does not change. +.nr Hu 1 +.NH 1 +Implicit Declarations +.PP +It is not always necessary to specify +both the storage class and the type +of identifiers in a declaration. +The storage class is supplied by +the context in external definitions +and in declarations of formal parameters +and structure members. +In a declaration inside a function, +if a storage class but no type +is given, the identifier is assumed +to be +\fBint\fR; +if a type but no storage class is indicated, +the identifier is assumed to +be +.B +auto\fR. +.R +An exception to the latter rule is made for +functions because +.B +auto +.R +functions do not exist. +If the type of an identifier is ``function returning .\|.\|.\|,'' +it is implicitly declared to be +.B +extern\fR. 
+.R +.PP +In an expression, an identifier +followed by +.B +( +.R +and not already declared +is contextually +declared to be ``function returning +.B +int\fR.'' +.nr Hu 1 +.NH 1 +Types Revisited +.PP +This part summarizes the operations +which can be performed on objects of certain types. +.nr Hu 1 +.NH 2 +Structures and Unions +.PP +Structures and unions may be assigned, passed as arguments to functions, +and returned by functions. +Other plausible operators, such as equality comparison +and structure casts, +are not implemented. +.PP +In a reference +to a structure or union member, the +name on the right +of the \fB->\fR or the \fB.\fR +must specify a member of the aggregate +named or pointed to by the expression +on the left. +In general, a member of a union may not be inspected +unless the value of the union has been assigned using that same member. +However, one special guarantee is made by the language in order +to simplify the use of unions: +if a union contains several structures that share a common initial sequence +and if the union currently contains one of these structures, +it is permitted to inspect the common initial part of any of +the contained structures. +For example, the following is a legal fragment: +.DS +\fBunion +{ + struct + { + int type; + } n; + struct + { + int type; + int intnode; + } ni; + struct + { + int type; + float floatnode; + } nf; +} u; +\&... +u.nf.type = FLOAT; +u.nf.floatnode = 3.14; +\&... +if (u.n.type == FLOAT) + ... sin(u.nf.floatnode) ...\fR +.DE +.PP +.nr Hu 1 +.NH 2 +Functions +.PP +There are only two things that +can be done with a function: +call it or take its address. +If the name of a function appears in an +expression not in the function-name position of a call, +a pointer to the function is generated. +Thus, to pass one function to another, one +might say +.DS +\fBint f(); +\&... +g(f);\fR +.DE +.PP +.ne 8 +Then the definition of +.B +g +.R +might read +.DS +\fBg(funcp) + int (\(**funcp)(); +{ + ...
+ (\(**funcp)(); + ... +}\fR +.DE +.PP +Notice that +.B +f +.R +must be declared +explicitly in the calling routine since its appearance +in +.B +g(f) +.R +was not followed by +.B +(. +.R +.nr Hu 1 +.NH 2 +Arrays, Pointers, and Subscripting +.PP +Every time an identifier of array type appears +in an expression, it is converted into a pointer +to the first member of the array. +Because of this conversion, arrays are not +lvalues. +By definition, the subscript operator +.B +[] +.R +is interpreted +in such a way that +.B +E1[E2] +.R +is identical to +.B +\(**((E1)\(pl(E2))\fR. +.R +Because of the conversion rules +which apply to +\fB\(pl\fR, +if +.B +E1 +.R +is an array and +.B +E2 +.R +an integer, +then +.B +E1[E2] +.R +refers to the +.B +E2-th +.R +member of +.B +E1\fR. +.R +Therefore, +despite its asymmetric +appearance, subscripting is a commutative operation. +.PP +A consistent rule is followed in the case of +multidimensional arrays. +If +.B +E +.R +is an +\fIn\fR-dimensional +array +of rank +i\(muj\(mu...\(muk, +then +.B +E +.R +appearing in an expression is converted to +a pointer to an (n-1)-dimensional +array with rank +j\(mu...\(muk. +If the +.B +\(** +.R +operator, either explicitly +or implicitly as a result of subscripting, +is applied to this pointer, +the result is the pointed-to (n-1)-dimensional array, +which itself is immediately converted into a pointer. +.PP +For example, consider +.DS +\fBint x[3][5];\fR +.DE +.PP +Here +.B +x +.R +is a 3\(mu5 array of integers. +When +.B +x +.R +appears in an expression, it is converted +to a pointer to (the first of three) 5-membered arrays of integers. +In the expression +\fBx[i]\fR, +which is equivalent to +\fB\(**(x\(pli)\fR, +.B +x +.R +is first converted to a pointer as described; +then +.B +i +.R +is converted to the type of +\fBx\fR, +which involves multiplying +.B +i +.R +by the +length of the object to which the pointer points, +namely 5-integer objects.
+The results are added and indirection applied to +yield an array (of five integers) which in turn is converted to +a pointer to the first of the integers. +If there is another subscript, the same argument applies +again; this time the result is an integer. +.PP +Arrays in C are stored +row-wise (last subscript varies fastest) +and the first subscript in the declaration helps determine +the amount of storage consumed by an array. +Arrays play no other part in subscript calculations. +.nr Hu 1 +.NH 2 +Explicit Pointer Conversions +.PP +Certain conversions involving pointers are permitted +but have implementation-dependent aspects. +They are all specified by means of an explicit type-conversion +operator, see ``Unary Operators'' under ``EXPRESSIONS'' and +``Type Names'' under ``DECLARATIONS.'' +.PP +A pointer may be converted to any of the integral types large +enough to hold it. +Whether an +.B +int +.R +or +.B +long +.R +is required is machine dependent. +The mapping function is also machine dependent but is intended +to be unsurprising to those who know the addressing structure +of the machine. +Details for some particular machines are given below. +.PP +An object of integral type may be explicitly converted to a pointer. +The mapping always carries an integer converted from a pointer back to the same pointer +but is otherwise machine dependent. +.PP +A pointer to one type may be converted to a pointer to another type. +The resulting pointer may cause addressing exceptions +upon use if +the subject pointer does not refer to an object suitably aligned in storage. +It is guaranteed that +a pointer to an object of a given size may be converted to a pointer to an object +of a smaller size +and back again without change. +.PP +For example, +a storage-allocation routine +might accept a size (in bytes) +of an object to allocate, and return a +.B +char +.R +pointer; +it might be used in this way.
+.DS +\fBextern char \(**malloc(); +double \(**dp; +.sp +dp = (double \(**) malloc(sizeof(double)); +\(**dp = 22.0 / 7.0;\fR +.DE +.PP +The +.B +malloc +.R +routine must ensure (in a machine-dependent way) +that its return value is suitable for conversion to a pointer to +\fBdouble\fR; +then the +.I +use +.R +of the function is portable. +.PP +The pointer +representation on the +PDP-11 +corresponds to a 16-bit integer and +measures bytes. +The +.B +char\fR's +have no alignment requirements; everything else must have an even address. +.PP +On the +VAX-11, +pointers are 32 bits long and measure bytes. +Elementary objects are aligned on a boundary equal to their +length, except that +.B +double +.R +quantities need be aligned only on even 4-byte boundaries. +Aggregates are aligned on the strictest boundary required by +any of their constituents. +.PP +The 3B 20 computer has 24-bit pointers placed into 32-bit quantities. +Most objects are +aligned on 4-byte boundaries. \fBShort\fRs are aligned in all cases on +2-byte boundaries. Arrays of characters, all structures, +\fBint\fR\^s, \fBlong\fR\^s, \fBfloat\fR\^s, and \fBdouble\fR\^s are aligned on 4-byte +boundaries; but structure members may be packed tighter. +.nr Hu 1 +.NH 2 +CONSTANT EXPRESSIONS +.PP +In several places C requires expressions that evaluate to +a constant: +after +\fBcase\fR, +as array bounds, and in initializers. +In the first two cases, the expression can +involve only integer constants, character constants, +casts to integral types, +enumeration constants, +and +.B +sizeof +.R +expressions, possibly +connected by the binary operators +.ne 10 +.DS +\(pl \(mi \(** / % & | ^ << >> == != < > <= >= && || +.DE +.LP +or by the unary operators +.DS +\(mi \s+2~\s0 +.DE +.LP +or by the ternary operator +.DS +?: +.DE +.PP +Parentheses can be used for grouping +but not for function calls.
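The rules above for case labels and array bounds can be illustrated with a small sketch. All names here (NROWS, NCOLS, buffer, classify, buffer_size, check_classify) are hypothetical and chosen only for illustration; the code is written with modern prototypes, which is an assumption beyond the dialect described in this manual.

```c
#include <assert.h>

enum { NROWS = 3, NCOLS = 5 };

/* Array bound: enumeration constants, sizeof, and binary operators. */
static char buffer[NROWS * NCOLS + sizeof(int)];

/* Each case label below is an integral constant expression. */
int classify(int c)
{
    switch (c) {
    case 'a' + 1:                     /* character constant plus integer */
        return 1;
    case (NROWS << 2) | 1:            /* shift and bitwise OR: 13 */
        return 2;
    case NCOLS > NROWS ? 100 : 200:   /* ternary operator: 100 */
        return 3;
    default:
        return 0;
    }
}

/* Size of the array declared above, in bytes. */
unsigned long buffer_size(void)
{
    return (unsigned long) sizeof buffer;
}

/* A few checks on the labels; 'a' + 1 == 'b' is assumed (true in ASCII). */
void check_classify(void)
{
    assert(classify('b') == 1);
    assert(classify(13) == 2);
    assert(classify(100) == 3);
    assert(classify(0) == 0);
}
```

Replacing any label with a function call, such as a hypothetical case f(1):, would be rejected, since parentheses in a constant expression may group but may not call.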
+.PP +More latitude is permitted for initializers; +besides constant expressions as discussed above, +one can also use floating constants +and arbitrary casts and +can also apply the unary +.B +& +.R +operator to external or static objects +and to external or static arrays subscripted +with a constant expression. +The unary +.B +& +.R +can also +be applied implicitly +by appearance of unsubscripted arrays and functions. +The basic rule is that initializers must +evaluate either to a constant or to the address +of a previously declared external or static object plus or minus a constant. +.nr Hu 1 +.NH 1 +Portability Considerations +.PP +Certain parts of C are inherently machine dependent. +The following list of potential trouble spots +is not meant to be all-inclusive +but to point out the main ones. +.PP +Purely hardware issues like +word size and the properties of floating point arithmetic and integer division +have proven in practice to be not much of a problem. +Other facets of the hardware are reflected +in differing implementations. +Some of these, +particularly sign extension +(converting a negative character into a negative integer) +and the order in which bytes are placed in a word, +are nuisances that must be carefully watched. +Most of the others are only minor problems. +.PP +The number of +.B +register +.R +variables that can actually be placed in registers +varies from machine to machine +as does the set of valid types. +Nonetheless, the compilers all do things properly for their own machine; +excess or invalid +.B +register +.R +declarations are ignored. +.PP +Some difficulties arise only when +dubious coding practices are used. +It is exceedingly unwise to write programs +that depend +on any of these properties. +.PP +The order of evaluation of function arguments +is not specified by the language. +The order in which side effects take place +is also unspecified. 
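As a hedged sketch of why the unspecified order of argument evaluation matters, the code below contrasts a call whose result depends on that order with a version that sequences the side effects explicitly. The names counter, next_value, sub, unordered_difference, and ordered_difference are hypothetical, and the prototype syntax is modern C rather than the style used in this manual.

```c
#include <assert.h>

static int counter;

/* Side effect: each call bumps the counter and returns the new value. */
static int next_value(void)
{
    return ++counter;
}

static int sub(int a, int b)
{
    return a - b;
}

/* Unportable: the two calls may run in either order, giving 1 - 2 or 2 - 1. */
int unordered_difference(void)
{
    counter = 0;
    return sub(next_value(), next_value());
}

/* Portable: separate statements fix the order of the side effects. */
int ordered_difference(void)
{
    int first, second;

    counter = 0;
    first = next_value();
    second = next_value();
    assert(first == 1 && second == 2);
    return sub(first, second);
}
```

Only the second function has a single defined result; the first may legitimately differ between compilers or even between optimization levels of the same compiler.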
+.PP +Since character constants are really objects of type +\fBint\fR, +multicharacter character constants may be permitted. +The specific implementation +is very machine dependent +because the order in which characters +are assigned to a word +varies from one machine to another. +.PP +Fields are assigned to words and characters to integers right to left +on some machines +and left to right on other machines. +These differences are invisible to isolated programs +that do not indulge in type punning (e.g., +by converting an +.B +int +.R +pointer to a +.B +char +.R +pointer and inspecting the pointed-to storage) +but must be accounted for when conforming to externally-imposed +storage layouts. +.nr Hu 1 +.NH 1 +Syntax Summary +.PP +This summary of C syntax is intended more for aiding comprehension +than as an exact statement of the language. +.nr Hu 1 +.ne 18 +.NH 2 +Expressions +.PP +The basic expressions are: +.tr ~~ +.DS + \fIexpression: + primary + \(** expression\fR + &\fIlvalue + \(mi expression + ! expression + \s+2~\s0 expression + \(pl\(pl lvalue + \(mi\(milvalue + lvalue \(pl\(pl + lvalue \(mi\(mi + \fBsizeof\fI expression + \fBsizeof (\fItype-name\fB)\fI + ( type-name ) expression + expression binop expression + expression ? expression : expression + lvalue asgnop expression + expression , expression +.DE +.DS + \fIprimary: + identifier + constant + string + ( expression ) + primary ( expression-list\v'0.5'\s-2opt\s0\v'-0.5' ) + primary [ expression ] + primary . identifier + primary \(mi identifier +.DE +.DS + \fIlvalue: + identifier + primary [ expression ] + lvalue . identifier + primary \(mi identifier + \(** expression + ( lvalue )\fR +.DE +.PP +.PP +The primary-expression operators +.DS + () [] . \(mi +.tr ~~ +.DE +.LP +have highest priority and group left to right. +The unary operators +.DS + \(** & \(mi ! 
\s+2~\s0 \(pl\(pl \(mi\(mi \fBsizeof\fI ( type-name \fR) +.DE +.LP +have priority below the primary operators +but higher than any binary operator +and group right to left. +Binary operators +group left to right; they have priority +decreasing +as indicated below. +.DS + \fIbinop:\fR + \(** / % + \(pl \(mi + >> << + < > <= >= + == != + & + ^ + | + && + || +.DE +The conditional operator groups right to left. +.PP +Assignment operators all have the same +priority and all group right to left. +.DS + \fIasgnop:\fR + = \(pl= \(mi= \(**= /= %= >>= <<= &= ^= |= +.DE +.PP +The comma operator has the lowest priority and groups left to right. +.nr Hu 1 +.NH 2 +Declarations +.PP +.DS + \fIdeclaration: + decl-specifiers init-declarator-list\v'0.5'\s-2opt\s0\v'-0.5' ; +.DE +.DS + \fIdecl-specifiers: + type-specifier decl-specifiers\v'0.5'\s-2opt\s0\v'-0.5' + sc-specifier decl-specifiers\v'0.5'\s-2opt\s0\v'-0.5' +.DE +.DS + \fIsc-specifier:\fB + auto + static + extern + register + typedef +.DE +.DS + \fItype-specifier: + struct-or-union-specifier + typedef-name + enum-specifier + basic-type-specifier: + basic-type + basic-type basic-type-specifiers + basic-type:\fB + char + short + int + long + unsigned + float + double + void\fR +.DE +.DS +\fIenum-specifier:\fB + enum\fI { enum-list }\fB + enum \fIidentifier { enum-list }\fB + enum \fIidentifier +.DE +.DS + \fIenum-list: + enumerator + enum-list , enumerator +.DE +.DS + \fIenumerator: + identifier + identifier = constant-expression +.DE +.DS + \fIinit-declarator-list: + init-declarator + init-declarator , init-declarator-list +.DE +.DS + \fIinit-declarator: + declarator initializer\v'0.5'\s-2opt\s0\v'-0.5' +.DE +.DS + \fIdeclarator: + identifier + ( declarator ) + \(** declarator + declarator () + declarator [ constant-expression\v'0.5'\s-2opt\s0\v'-0.5' ] +.DE +.DS + \fIstruct-or-union-specifier:\fB + struct\fI { struct-decl-list }\fB + struct \fIidentifier { struct-decl-list }\fB + struct \fIidentifier\fB + union { 
\fIstruct-decl-list }\fB + union \fIidentifier { struct-decl-list }\fB + union \fIidentifier +.DE +.DS + \fIstruct-decl-list: + struct-declaration + struct-declaration struct-decl-list +.DE +.DS + \fIstruct-declaration: + type-specifier struct-declarator-list ; +.DE +.DS + \fIstruct-declarator-list: + struct-declarator + struct-declarator , struct-declarator-list +.DE +.DS + \fIstruct-declarator: + declarator + declarator : constant-expression + : constant-expression +.DE +.DS + \fIinitializer: + = expression + = { initializer-list } + = { initializer-list , } +.DE +.DS + \fIinitializer-list: + expression + initializer-list , initializer-list + { initializer-list } + { initializer-list , } +.DE +.DS + \fItype-name: + type-specifier abstract-declarator +.DE +.DS + \fIabstract-declarator: + empty + ( abstract-declarator ) + \(** abstract-declarator + abstract-declarator () + abstract-declarator [ constant-expression\v'0.5'\s-2opt\s0\v'-0.5' ] +.DE +.DS + \fItypedef-name: + identifier +.nr Hu 1 +.DE +.NH 2 +Statements +.PP +.DS + \fIcompound-statement: + { declaration-list\v'0.5'\s-2opt\s0\v'-0.5' statement-list\v'0.5'\s-2opt\s0\v'-0.5' } +.DE +.DS + \fIdeclaration-list: + declaration + declaration declaration-list +.DE +.DS + \fIstatement-list: + statement + statement statement-list +.DE +.DS + \fIstatement: + compound-statement + expression ; + \fBif\fI ( expression ) statement + \fBif\fI ( expression ) statement \fBelse\fI statement + \fBwhile\fI ( expression ) statement + \fBdo\fI statement \fBwhile\fI ( expression ) ; + \fBfor\fI (exp\v'0.3'\s-2opt\s0\v'-0.3'\fB;\fIexp\v'0.3'\s-2opt\s0\v'-0.3'\fB;\fIexp\v'0.3'\s-2opt\s0\v'-0.3'\fI) statement + \fBswitch\fI ( expression ) statement + \fBcase\fI constant-expression : statement + \fBdefault\fI : statement + \fBbreak ; + continue ; + return ; + return\fI expression ; + \fBgoto\fI identifier ; + identifier : statement + ;\fR +.nr Hu 1 +.DE +.NH 2 +External definitions +.PP +.DS + \fIprogram: + external-definition + 
external-definition program +.DE +.DS + \fIexternal-definition: + function-definition + data-definition +.DE +.DS + \fIfunction-definition: + decl-specifier\v'0.5'\s-2opt\s0\v'-0.5' function-declarator function-body +.DE +.DS + \fIfunction-declarator: + declarator ( parameter-list\v'0.5'\s-2opt\s0\v'-0.5' ) +.DE +.DS + \fIparameter-list: + identifier + identifier , parameter-list +.DE +.DS + \fIfunction-body: + declaration-list\v'0.5'\s-2opt\s0\v'-0.5' compound-statement +.DE +.DS + \fIdata-definition: + \fBextern\fI declaration\fB ; + \fBstatic\fI declaration\fB ; +.DE +.NH +Preprocessor +.DS + \fB#define\fI identifier token-string\v'0.3'\s-2opt\s0\v'-0.3'\fB + \fB#define\fI identifier\fB(\fIidentifier\fB,...)\fItoken-string\v'0.5'\s-2opt\s0\v'-0.5'\fB + \fB#undef\fI identifier\fB + \fB#include "\fIfilename\|\fB" + #include <\fIfilename\|\fB> + \fB#if\fI restricted-constant-expression\fB + \fB#ifdef\fI identifier\fB + \fB#ifndef\fI identifier\fB + \fB#else + \fB#endif + \fB#line\fI constant \fB"\fIfilename\|\fB" +.sp 5 +.DE +.\" .TC 2 1 3 0 diff --git a/share/doc/psd/06.Clang/Makefile b/share/doc/psd/06.Clang/Makefile new file mode 100644 index 0000000..877a97c --- /dev/null +++ b/share/doc/psd/06.Clang/Makefile @@ -0,0 +1,9 @@ +# @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= psd/06.Clang +SRCS= Clang.ms +MACROS= -ms +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/psd/12.make/Makefile b/share/doc/psd/12.make/Makefile new file mode 100644 index 0000000..fdc38a7 --- /dev/null +++ b/share/doc/psd/12.make/Makefile @@ -0,0 +1,9 @@ +# From: @(#)Makefile 8.1 (Berkeley) 8/14/93 +# $FreeBSD$ + +VOLUME= psd/12.make +SRCS= stubs tutorial.ms +MACROS= -ms +SRCDIR= ${.CURDIR}/../../../../usr.bin/make/PSD.doc + +.include <bsd.doc.mk> diff --git a/share/doc/psd/13.rcs/Makefile b/share/doc/psd/13.rcs/Makefile new file mode 100644 index 0000000..ed497da --- /dev/null +++ b/share/doc/psd/13.rcs/Makefile @@ -0,0 +1,6 @@ +# $FreeBSD$ + +SUBDIR= rcs rcs_func + 
+.include <bsd.subdir.mk> + diff --git a/share/doc/psd/13.rcs/Makefile.inc b/share/doc/psd/13.rcs/Makefile.inc new file mode 100644 index 0000000..666dbd8 --- /dev/null +++ b/share/doc/psd/13.rcs/Makefile.inc @@ -0,0 +1,5 @@ +# $FreeBSD$ + +VOLUME= psd/13.rcs +MACROS= -ms +SRCDIR= ${.CURDIR}/../../../../../gnu/usr.bin/rcs/doc diff --git a/share/doc/psd/13.rcs/rcs/Makefile b/share/doc/psd/13.rcs/rcs/Makefile new file mode 100644 index 0000000..6d94aed --- /dev/null +++ b/share/doc/psd/13.rcs/rcs/Makefile @@ -0,0 +1,7 @@ +# $FreeBSD$ + +SRCS= rcs.ms +USE_PIC= +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/psd/13.rcs/rcs_func/Makefile b/share/doc/psd/13.rcs/rcs_func/Makefile new file mode 100644 index 0000000..09e5a9b --- /dev/null +++ b/share/doc/psd/13.rcs/rcs_func/Makefile @@ -0,0 +1,6 @@ +# $FreeBSD$ + +DOC= rcs_func +SRCS= rcs_func.ms + +.include <bsd.doc.mk> diff --git a/share/doc/psd/15.yacc/Makefile b/share/doc/psd/15.yacc/Makefile new file mode 100644 index 0000000..4381c98 --- /dev/null +++ b/share/doc/psd/15.yacc/Makefile @@ -0,0 +1,16 @@ +# @(#)Makefile 8.1 (Berkeley) 8/14/93 +# $FreeBSD$ + +VOLUME= psd/15.yacc +SRCS= stubs ss.. ss0 ss1 ss2 ss3 ss4 ss5 ss6 ss7 ss8 ss9 \ + ssA ssB ssa ssb ssc ssd +EXTRA= ref.bib +MACROS= -ms +USE_REFER= +CLEANFILES= stubs + +stubs: + @(echo .R1; echo database ${.CURDIR}/ref.bib; \ + echo accumulate; echo .R2) > ${.TARGET} + +.include <bsd.doc.mk> diff --git a/share/doc/psd/15.yacc/ref.bib b/share/doc/psd/15.yacc/ref.bib new file mode 100644 index 0000000..a1364f6 --- /dev/null +++ b/share/doc/psd/15.yacc/ref.bib @@ -0,0 +1,71 @@ +# $FreeBSD$ + +%T The C Programming Language +%A B. W. Kernighan +%A D. M. Ritchie +%I Prentice-Hall +%C Englewood Cliffs, New Jersey +%D 1978 + +%T LR Parsing +%A A. V. Aho +%A S. C. Johnson +%J Comp. Surveys +%V 6 +%N 2 +%P 99-124 +%D June 1974 + +%T Deterministic Parsing of Ambiguous Grammars +%A A. V. Aho +%A S. C. Johnson +%A J. D. Ullman +%J Comm. Assoc. Comp. Mach. 
+%K acm cacm +%V 18 +%N 8 +%P 441-452 +%D August 1975 + +%A A. V. Aho +%A J. D. Ullman +%T Principles of Compiler Design +%I Addison-Wesley +%C Reading, Mass. +%D 1977 + +%R Comp. Sci. Tech. Rep. No. 65 +%K CSTR +%A S. C. Johnson +%T Lint, a C Program Checker +%D December 1977 +%O updated version TM 78-1273-3 +%D 1978 + +%T A Portable Compiler: Theory and Practice +%A S. C. Johnson +%J Proc. 5th ACM Symp. on Principles of Programming Languages +%P 97-104 +%D January 1978 + +%K cstr +%R Comp. Sci. Tech. Rep. No. 17 +%I Bell Laboratories +%C Murray Hill, New Jersey +%A B. W. Kernighan +%A L. L. Cherry +%T A System for Typesetting Mathematics +%d May 1974, revised April 1977 +%J Comm. Assoc. Comp. Mach. +%K acm cacm +%V 18 +%P 151-157 +%D March 1975 + +%K CSTR +%R Comp. Sci. Tech. Rep. No. 39 +%I Bell Laboratories +%C Murray Hill, New Jersey +%A M. E. Lesk +%T Lex \(em A Lexical Analyzer Generator +%D October 1975 diff --git a/share/doc/psd/15.yacc/ss.. b/share/doc/psd/15.yacc/ss.. new file mode 100644 index 0000000..7dd9ea8 --- /dev/null +++ b/share/doc/psd/15.yacc/ss.. @@ -0,0 +1,94 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. 
Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)ss.. 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.EH 'PSD:15-%''Yacc: Yet Another Compiler-Compiler' +.OH 'Yacc: Yet Another Compiler-Compiler''PSD:15-%' +.\".RP +.ND "July 31, 1978" +.TL +Yacc: +Yet Another Compiler-Compiler +.AU "MH 2C-559" 3968 +Stephen C. Johnson +AT&T Bell Laboratories +Murray Hill, New Jersey 07974 +.AI +.AB +.PP +Computer program input generally has some structure; +in fact, every computer program that does input can be thought of as defining +an ``input language'' which it accepts. +An input language may be as complex as a programming language, or as simple as +a sequence of numbers. +Unfortunately, usual input facilities +are limited, difficult to use, +and often are lax about checking their inputs for validity. +.PP +Yacc provides a general tool for describing +the input to a computer program. 
+The Yacc user specifies the structures +of his input, together with code to be invoked as +each such structure is recognized. +Yacc turns such a specification into a subroutine that +handles the input process; +frequently, it is convenient and appropriate to have most +of the flow of control in the user's application +handled by this subroutine. +.PP +The input subroutine produced by Yacc calls a user-supplied routine to +return the next basic input item. +Thus, the user can specify his input in terms of individual input characters, or +in terms of higher level constructs such as names and numbers. +The user-supplied routine may also handle idiomatic features such as +comment and continuation conventions, which typically defy easy grammatical specification. +.PP +Yacc is written in portable C. +The class of specifications accepted is a very general one: LALR(1) +grammars with disambiguating rules. +.PP +In addition to compilers for C, APL, Pascal, RATFOR, etc., Yacc +has also been used for less conventional languages, +including a phototypesetter language, several desk calculator languages, a document retrieval system, +and a Fortran debugging system. +.AE +.\" .OK +.\"Computer Languages +.\"Compilers +.\"Formal Language Theory +.\" .CS 23 11 34 0 0 8 diff --git a/share/doc/psd/15.yacc/ss0 b/share/doc/psd/15.yacc/ss0 new file mode 100644 index 0000000..223e90fb --- /dev/null +++ b/share/doc/psd/15.yacc/ss0 @@ -0,0 +1,238 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. 
+.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)ss0 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +0: Introduction +.PP +Yacc provides a general tool for imposing structure on the input to a computer program. +The Yacc user prepares a +specification of the input process; this includes rules +describing the input structure, code to be invoked when these +rules are recognized, and a low-level routine to do the +basic input. +Yacc then generates a function to control the input process. 
+This function, called a +.I parser , +calls the user-supplied low-level input routine +(the +.I "lexical analyzer" ) +to pick up the basic items +(called +.I tokens ) +from the input stream. +These tokens are organized according to the input structure rules, +called +.I "grammar rules" \|; +when one of these rules has been recognized, +then user code supplied for this rule, an +.I action , +is invoked; actions have the ability to return values and +make use of the values of other actions. +.PP +Yacc is written in a portable dialect of C +.[ +Ritchie Kernighan Language Prentice +.] +and the actions, and output subroutine, are in C as well. +Moreover, many of the syntactic conventions of Yacc follow C. +.PP +The heart of the input specification is a collection of grammar rules. +Each rule describes an allowable structure and gives it a name. +For example, one grammar rule might be +.DS +date : month\_name day \',\' year ; +.DE +Here, +.I date , +.I month\_name , +.I day , +and +.I year +represent structures of interest in the input process; +presumably, +.I month\_name , +.I day , +and +.I year +are defined elsewhere. +The comma ``,'' is enclosed in single quotes; this implies that the +comma is to appear literally in the input. +The colon and semicolon merely serve as punctuation in the rule, and have +no significance in controlling the input. +Thus, with proper definitions, the input +.DS +July 4, 1776 +.DE +might be matched by the above rule. +.PP +An important part of the input process is carried out by the +lexical analyzer. +This user routine reads the input stream, recognizing the lower level structures, +and communicates these tokens +to the parser. +For historical reasons, a structure recognized by the lexical analyzer is called a +.I "terminal symbol" , +while the structure recognized by the parser is called a +.I "nonterminal symbol" . +To avoid confusion, terminal symbols will usually be referred to as +.I tokens . 
+.PP +There is considerable leeway in deciding whether to recognize structures using the lexical +analyzer or grammar rules. +For example, the rules +.DS +month\_name : \'J\' \'a\' \'n\' ; +month\_name : \'F\' \'e\' \'b\' ; + + . . . + +month\_name : \'D\' \'e\' \'c\' ; +.DE +might be used in the above example. +The lexical analyzer would only need to recognize individual letters, and +.I month\_name +would be a nonterminal symbol. +Such low-level rules tend to waste time and space, and may +complicate the specification beyond Yacc's ability to deal with it. +Usually, the lexical analyzer would +recognize the month names, +and return an indication that a +.I month\_name +was seen; in this case, +.I month\_name +would be a token. +.PP +Literal characters such as ``,'' must also be passed through the lexical +analyzer, and are also considered tokens. +.PP +Specification files are very flexible. +It is relatively easy to add to the above example the rule +.DS +date : month \'/\' day \'/\' year ; +.DE +allowing +.DS +7 / 4 / 1776 +.DE +as a synonym for +.DS +July 4, 1776 +.DE +In most cases, this new rule could be ``slipped in'' to a working system with minimal effort, +and little danger of disrupting existing input. +.PP +The input being read may not conform to the +specifications. +These input errors are detected as early as is theoretically possible with a +left-to-right scan; +thus, not only is the chance of reading and computing with bad +input data substantially reduced, but the bad data can usually be quickly found. +Error handling, +provided as part of the input specifications, +permits the reentry of bad data, +or the continuation of the input process after skipping over the bad data. +.PP +In some cases, Yacc fails to produce a parser when given a set of +specifications. +For example, the specifications may be self contradictory, or they may +require a more powerful recognition mechanism than that available to Yacc.
+The former cases represent design errors; +the latter cases +can often be corrected +by making +the lexical analyzer +more powerful, or by rewriting some of the grammar rules. +While Yacc cannot handle all possible specifications, its power +compares favorably with similar systems; +moreover, the +constructions which are difficult for Yacc to handle are +also frequently difficult for human beings to handle. +Some users have reported that the discipline of formulating valid +Yacc specifications for their input revealed errors of +conception or design early in the program development. +.PP +The theory underlying Yacc has been described elsewhere. +.[ +Aho Johnson Surveys LR Parsing +.] +.[ +Aho Johnson Ullman Ambiguous Grammars +.] +.[ +Aho Ullman Principles Compiler Design +.] +Yacc has been extensively used in numerous practical applications, +including +.I lint , +.[ +Johnson Lint +.] +the Portable C Compiler, +.[ +Johnson Portable Compiler Theory +.] +and a system for typesetting mathematics. +.[ +Kernighan Cherry typesetting system CACM +.] +.PP +The next several sections describe the +basic process of preparing a Yacc specification; +Section 1 describes the preparation of grammar rules, +Section 2 the preparation of the user supplied actions associated with these rules, +and Section 3 the preparation of lexical analyzers. +Section 4 describes the operation of the parser. +Section 5 discusses various reasons why Yacc may be unable to produce a +parser from a specification, and what to do about it. +Section 6 describes a simple mechanism for +handling operator precedences in arithmetic expressions. +Section 7 discusses error detection and recovery. +Section 8 discusses the operating environment and special features +of the parsers Yacc produces. +Section 9 gives some suggestions which should improve the +style and efficiency of the specifications. +Section 10 discusses some advanced topics, and Section 11 gives +acknowledgements. 
+Appendix A has a brief example, and Appendix B gives a +summary of the Yacc input syntax. +Appendix C gives an example using some of the more advanced +features of Yacc, and, finally, +Appendix D describes mechanisms and syntax +no longer actively supported, but +provided for historical continuity with older versions of Yacc. diff --git a/share/doc/psd/15.yacc/ss1 b/share/doc/psd/15.yacc/ss1 new file mode 100644 index 0000000..f9369fb --- /dev/null +++ b/share/doc/psd/15.yacc/ss1 @@ -0,0 +1,175 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)ss1 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.tr *\(** +.tr |\(or +.SH +1: Basic Specifications +.PP +Names refer to either tokens or nonterminal symbols. +Yacc requires +token names to be declared as such. +In addition, for reasons discussed in Section 3, it is often desirable +to include the lexical analyzer as part of the specification file; +it may be useful to include other programs as well. +Thus, every specification file consists of three sections: +the +.I declarations , +.I "(grammar) rules" , +and +.I programs . +The sections are separated by double percent ``%%'' marks. +(The percent ``%'' is generally used in Yacc specifications as an escape character.) +.PP +In other words, a full specification file looks like +.DS +declarations +%% +rules +%% +programs +.DE +.PP +The declaration section may be empty. +Moreover, if the programs section is omitted, the second %% mark may be omitted also; +thus, the smallest legal Yacc specification is +.DS +%% +rules +.DE +.PP +Blanks, tabs, and newlines are ignored except +that they may not appear in names or multi-character reserved symbols. +Comments may appear wherever a name is legal; they are enclosed +in /* . . . */, as in C and PL/I. +.PP +The rules section is made up of one or more grammar rules. +A grammar rule has the form: +.DS +A : BODY ; +.DE +A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals. +The colon and the semicolon are Yacc punctuation. 
+.PP +Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``\_'', and +non-initial digits. +Upper and lower case letters are distinct. +The names used in the body of a grammar rule may represent tokens or nonterminal symbols. +.PP +A literal consists of a character enclosed in single quotes ``\'''. +As in C, the backslash ``\e'' is an escape character within literals, and all the C escapes +are recognized. +Thus +.DS +\'\en\' newline +\'\er\' return +\'\e\'\' single quote ``\''' +\'\e\e\' backslash ``\e'' +\'\et\' tab +\'\eb\' backspace +\'\ef\' form feed +\'\exxx\' ``xxx'' in octal +.DE +For a number of technical reasons, the +\s-2NUL\s0 +character (\'\e0\' or 0) should never +be used in grammar rules. +.PP +If there are several grammar rules with the same left hand side, the vertical bar ``|'' +can be used to avoid rewriting the left hand side. +In addition, +the semicolon at the end of a rule can be dropped before a vertical bar. +Thus the grammar rules +.DS +A : B C D ; +A : E F ; +A : G ; +.DE +can be given to Yacc as +.DS +A : B C D + | E F + | G + ; +.DE +It is not necessary that all grammar rules with the same left side appear together in the grammar rules section, +although it makes the input much more readable, and easier to change. +.PP +If a nonterminal symbol matches the empty string, this can be indicated in the obvious way: +.DS +empty : ; +.DE +.PP +Names representing tokens must be declared; this is most simply done by writing +.DS +%token name1 name2 . . . +.DE +in the declarations section. +(See Sections 3 , 5, and 6 for much more discussion). +Every name not defined in the declarations section is assumed to represent a nonterminal symbol. +Every nonterminal symbol must appear on the left side of at least one rule. +.PP +Of all the nonterminal symbols, one, called the +.I "start symbol" , +has particular importance. 
+The parser is designed to recognize the start symbol; thus, +this symbol represents the largest, +most general structure described by the grammar rules. +By default, +the start symbol is taken to be the left hand side of the first +grammar rule in the rules section. +It is possible, and in fact desirable, to declare the start +symbol explicitly in the declarations section using the %start keyword: +.DS +%start symbol +.DE +.PP +The end of the input to the parser is signaled by a special token, called the +.I endmarker . +If the tokens up to, but not including, the endmarker form a structure +which matches the start symbol, the parser function returns to its caller +after the endmarker is seen; it +.I accepts +the input. +If the endmarker is seen in any other context, it is an error. +.PP +It is the job of the user-supplied lexical analyzer +to return the endmarker when appropriate; see section 3, below. +Usually the endmarker represents some reasonably obvious +I/O status, such as ``end-of-file'' or ``end-of-record''. diff --git a/share/doc/psd/15.yacc/ss2 b/share/doc/psd/15.yacc/ss2 new file mode 100644 index 0000000..f1fdb44 --- /dev/null +++ b/share/doc/psd/15.yacc/ss2 @@ -0,0 +1,190 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. 
+.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)ss2 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +2: Actions +.PP +With each grammar rule, the user may associate actions to be performed each time +the rule is recognized in the input process. +These actions may return values, and may obtain the values returned by previous +actions. +Moreover, the lexical analyzer can return values +for tokens, if desired. +.PP +An action is an arbitrary C statement, and as such can do +input and output, call subprograms, and alter +external vectors and variables. +An action is specified by +one or more statements, enclosed in curly braces ``{'' and ``}''. 
+For example, +.DS +A : \'(\' B \')\' + { hello( 1, "abc" ); } +.DE +and +.DS +XXX : YYY ZZZ + { printf("a message\en"); + flag = 25; } +.DE +are grammar rules with actions. +.PP +To facilitate easy communication between the actions and the parser, the action statements are altered +slightly. +The symbol ``dollar sign'' ``$'' is used as a signal to Yacc in this context. +.PP +To return a value, the action normally sets the +pseudo-variable ``$$'' to some value. +For example, an action that does nothing but return the value 1 is +.DS + { $$ = 1; } +.DE +.PP +To obtain the values returned by previous actions and the lexical analyzer, the +action may use the pseudo-variables $1, $2, . . ., +which refer to the values returned by the +components of the right side of a rule, reading from left to right. +Thus, if the rule is +.DS +A : B C D ; +.DE +for example, then $2 has the value returned by C, and $3 the value returned by D. +.PP +As a more concrete example, consider the rule +.DS +expr : \'(\' expr \')\' ; +.DE +The value returned by this rule is usually the value of the +.I expr +in parentheses. +This can be indicated by +.DS +expr : \'(\' expr \')\' { $$ = $2 ; } +.DE +.PP +By default, the value of a rule is the value of the first element in it ($1). +Thus, grammar rules of the form +.DS +A : B ; +.DE +frequently need not have an explicit action. +.PP +In the examples above, all the actions came at the end of their rules. +Sometimes, it is desirable to get control before a rule is fully parsed. +Yacc permits an action to be written in the middle of a rule as well +as at the end. +This rule is assumed to return a value, accessible +.\" XXX What does this mean? Nobody seems to understand it. +.\" through the usual \$ mechanism by the actions to +through the usual mechanism by the actions to +the right of it. +In turn, it may access the values +returned by the symbols to its left. 
+Thus, in the rule +.DS +A : B + { $$ = 1; } + C + { x = $2; y = $3; } + ; +.DE +the effect is to set +.I x +to 1, and +.I y +to the value returned by C. +.PP +Actions that do not terminate a rule are actually +handled by Yacc by manufacturing a new nonterminal +symbol name, and a new rule matching this +name to the empty string. +The interior action is the action triggered off by recognizing +this added rule. +Yacc actually treats the above example as if +it had been written: +.DS +$ACT : /* empty */ + { $$ = 1; } + ; + +A : B $ACT C + { x = $2; y = $3; } + ; +.DE +.PP +In many applications, output is not done directly by the actions; +rather, a data structure, such as a parse tree, is constructed in memory, +and transformations are applied to it before output is generated. +Parse trees are particularly easy to +construct, given routines to build and maintain the tree +structure desired. +For example, suppose there is a C function +.I node , +written so that the call +.DS +node( L, n1, n2 ) +.DE +creates a node with label L, and descendants n1 and n2, and returns the index of +the newly created node. +Then a parse tree can be built by supplying actions such as: +.DS +expr : expr \'+\' expr + { $$ = node( \'+\', $1, $3 ); } +.DE +in the specification. +.PP +The user may define other variables to be used by the actions. +Declarations and definitions can appear in +the declarations section, +enclosed in the marks ``%{'' and ``%}''. +These declarations and definitions have global scope, +so they are known to the action statements and the lexical analyzer. +For example, +.DS +%{ int variable = 0; %} +.DE +could be placed in the declarations section, +making +.I variable +accessible to all of the actions. +The Yacc parser uses only names beginning with ``yy''; +the user should avoid such names. +.PP +In these examples, all the values are integers: a discussion of +values of other types will be found in Section 10.
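The node routine mentioned in the Actions section is left to the user. One way it might be written is sketched below, under assumptions that are not part of Yacc: a fixed-size table, the names MAXNODES, struct tnode, nodes, and nnodes, and the convention that \-1 means ``no descendant'' (and is also returned when the table is full). Because the routine returns a plain integer index, its result fits the integer-valued $$ used in the examples.

```c
/*
 * Hypothetical sketch of the node() routine described in the text.
 * The table size, the field names, and the use of -1 for "no child"
 * and for failure are assumptions made for illustration only.
 */
#include <assert.h>

#define MAXNODES 256

struct tnode {
    int label;   /* operator or token label, e.g. '+' */
    int left;    /* index of first descendant, or -1 */
    int right;   /* index of second descendant, or -1 */
};

static struct tnode nodes[MAXNODES];
static int nnodes = 0;   /* number of nodes created so far */

/*
 * Create a node with label L and descendants n1 and n2.
 * Returns the index of the new node, or -1 if the table is full.
 */
int node(int L, int n1, int n2)
{
    /* descendants must be existing nodes or -1 */
    assert(n1 >= -1 && n1 < nnodes);
    assert(n2 >= -1 && n2 < nnodes);

    if (nnodes >= MAXNODES)
        return -1;
    nodes[nnodes].label = L;
    nodes[nnodes].left = n1;
    nodes[nnodes].right = n2;
    return nnodes++;
}
```

With this sketch, an action such as { $$ = node( \'+\', $1, $3 ); } stores the index of the new node as the value of the rule, and leaves could be created with calls of the form node( label, \-1, \-1 ).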
diff --git a/share/doc/psd/15.yacc/ss3 b/share/doc/psd/15.yacc/ss3 new file mode 100644 index 0000000..fa06acb --- /dev/null +++ b/share/doc/psd/15.yacc/ss3 @@ -0,0 +1,141 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)ss3 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +3: Lexical Analysis +.PP +The user must supply a lexical analyzer to read the input stream and communicate tokens +(with values, if desired) to the parser. +The lexical analyzer is an integer-valued function called +.I yylex . +The function returns an integer, the +.I "token number" , +representing the kind of token read. +If there is a value associated with that token, it should be assigned +to the external variable +.I yylval . +.PP +The parser and the lexical analyzer must agree on these token numbers in order for +communication between them to take place. +The numbers may be chosen by Yacc, or chosen by the user. +In either case, the ``# define'' mechanism of C is used to allow the lexical analyzer +to return these numbers symbolically. +For example, suppose that the token name DIGIT has been defined in the declarations section of the +Yacc specification file. +The relevant portion of the lexical analyzer might look like: +.DS +yylex(){ + extern int yylval; + int c; + . . . + c = getchar(); + . . . + switch( c ) { + . . . + case \'0\': + case \'1\': + . . . + case \'9\': + yylval = c\-\'0\'; + return( DIGIT ); + . . . + } + . . . +.DE +.PP +The intent is to return a token number of DIGIT, and a value equal to the numerical value of the +digit. 
+Provided that the lexical analyzer code is placed in the programs section of the specification file, +the identifier DIGIT will be defined as the token number associated +with the token DIGIT. +.PP +This mechanism leads to clear, +easily modified lexical analyzers; the only pitfall is the need +to avoid using any token names in the grammar that are reserved +or significant in C or the parser; for example, the use of +token names +.I if +or +.I while +will almost certainly cause severe +difficulties when the lexical analyzer is compiled. +The token name +.I error +is reserved for error handling, and should not be used naively +(see Section 7). +.PP +As mentioned above, the token numbers may be chosen by Yacc or by the user. +In the default situation, the numbers are chosen by Yacc. +The default token number for a literal +character is the numerical value of the character in the local character set. +Other names are assigned token numbers +starting at 257. +.PP +To assign a token number to a token (including literals), +the first appearance of the token name or literal +.I +in the declarations section +.R +can be immediately followed by +a nonnegative integer. +This integer is taken to be the token number of the name or literal. +Names and literals not defined by this mechanism retain their default definition. +It is important that all token numbers be distinct. +.PP +For historical reasons, the endmarker must have token +number 0 or negative. +This token number cannot be redefined by the user; thus, all +lexical analyzers should be prepared to return 0 or negative as a token number +upon reaching the end of their input. +.PP +A very useful tool for constructing lexical analyzers is +the +.I Lex +program developed by Mike Lesk. +.[ +Lesk Lex +.] +These lexical analyzers are designed to work in close +harmony with Yacc parsers. +The specifications for these lexical analyzers +use regular expressions instead of grammar rules. 
+Lex can be easily used to produce quite complicated lexical analyzers, +but there remain some languages (such as FORTRAN) which do not +fit any theoretical framework, and whose lexical analyzers +must be crafted by hand. diff --git a/share/doc/psd/15.yacc/ss4 b/share/doc/psd/15.yacc/ss4 new file mode 100644 index 0000000..e548d53 --- /dev/null +++ b/share/doc/psd/15.yacc/ss4 @@ -0,0 +1,367 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)ss4 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +4: How the Parser Works +.PP +Yacc turns the specification file into a C program, which +parses the input according to the specification given. +The algorithm used to go from the +specification to the parser is complex, and will not be discussed +here (see +the references for more information). +The parser itself, however, is relatively simple, +and understanding how it works, while +not strictly necessary, will nevertheless make +treatment of error recovery and ambiguities much more +comprehensible. +.PP +The parser produced by Yacc consists +of a finite state machine with a stack. +The parser is also capable of reading and remembering the next +input token (called the +.I lookahead +token). +The +.I "current state" +is always the one on the top of the stack. +The states of the finite state machine are given +small integer labels; initially, the machine is in state 0, +the stack contains only state 0, and no lookahead token has been read. +.PP +The machine has only four actions available to it, called +.I shift , +.I reduce , +.I accept , +and +.I error . +A move of the parser is done as follows: +.IP 1. +Based on its current state, the parser decides +whether it needs a lookahead token to decide +what action should be done; if it needs one, and does +not have one, it calls +.I yylex +to obtain the next token. +.IP 2. 
+Using the current state, and the lookahead token +if needed, the parser decides on its next action, and +carries it out. +This may result in states being pushed onto the stack, or popped off of +the stack, and in the lookahead token being processed +or left alone. +.PP +The +.I shift +action is the most common action the parser takes. +Whenever a shift action is taken, there is always +a lookahead token. +For example, in state 56 there may be +an action: +.DS + IF shift 34 +.DE +which says, in state 56, if the lookahead token is IF, +the current state (56) is pushed down on the stack, +and state 34 becomes the current state (on the +top of the stack). +The lookahead token is cleared. +.PP +The +.I reduce +action keeps the stack from growing without +bounds. +Reduce actions are appropriate when the parser has seen +the right hand side of a grammar rule, +and is prepared to announce that it has seen +an instance of the rule, replacing the right hand side +by the left hand side. +It may be necessary to consult the lookahead token +to decide whether to reduce, but +usually it is not; in fact, the default +action (represented by a ``.'') is often a reduce action. +.PP +Reduce actions are associated with individual grammar rules. +Grammar rules are also given small integer +numbers, leading to some confusion. +The action +.DS + \fB.\fR reduce 18 +.DE +refers to +.I "grammar rule" +18, while the action +.DS + IF shift 34 +.DE +refers to +.I state +34. +.PP +Suppose the rule being reduced is +.DS +A \fB:\fR x y z ; +.DE +The reduce action depends on the +left hand symbol (A in this case), and the number of +symbols on the right hand side (three in this case). +To reduce, first pop off the top three states +from the stack +(In general, the number of states popped equals the number of symbols on the +right side of the rule). +In effect, these states were the ones +put on the stack while recognizing +.I x , +.I y , +and +.I z , +and no longer serve any useful purpose. 
+After popping these states, a state is uncovered +which was the state the parser was in before beginning to +process the rule. +Using this uncovered state, and the symbol +on the left side of the rule, perform what is in +effect a shift of A. +A new state is obtained, pushed onto the stack, and parsing continues. +There are significant differences between the processing of +the left hand symbol and an ordinary shift of a token, +however, so this action is called a +.I goto +action. +In particular, the lookahead token is cleared by a shift, and +is not affected by a goto. +In any case, the uncovered state contains an entry such as: +.DS + A goto 20 +.DE +causing state 20 to be pushed +onto the stack, and become the current state. +.PP +In effect, the reduce action ``turns back the clock'' in the parse, +popping the states off the stack to go back to the +state where the right hand side of the rule was first seen. +The parser then behaves as if it had seen the left side at that time. +If the right hand side of the rule is empty, +no states are popped off of the stack: the uncovered state +is in fact the current state. +.PP +The reduce action is also important in the treatment of user-supplied +actions and values. +When a rule is reduced, the code supplied with the rule is executed +before the stack is adjusted. +In addition to the stack holding the states, another stack, +running in parallel with it, holds the values returned +from the lexical analyzer and the actions. +When a shift takes place, the external variable +.I yylval +is copied onto the value stack. +After the return from the user code, the reduction is carried out. +When the +.I goto +action is done, the external variable +.I yyval +is copied onto the value stack. +The pseudo-variables $1, $2, etc., refer to the value stack. +.PP +The other two parser actions are conceptually much simpler. +The +.I accept +action indicates that the entire input has been seen and +that it matches the specification. 
+This action appears only when the lookahead token is +the endmarker, and indicates that the parser has successfully +done its job. +The +.I error +action, on the other hand, represents a place where the parser +can no longer continue parsing according to the specification. +The input tokens it has seen, together with the lookahead token, +cannot be followed by anything that would result +in a legal input. +The parser reports an error, and attempts to recover the situation and +resume parsing: the error recovery (as opposed to the detection of error) +will be covered in Section 7. +.PP +It is time for an example! +Consider the specification +.DS +%token DING DONG DELL +%% +rhyme : sound place + ; +sound : DING DONG + ; +place : DELL + ; +.DE +.PP +When Yacc is invoked with the +.B \-v +option, a file called +.I y.output +is produced, with a human-readable description of the parser. +The +.I y.output +file corresponding to the above grammar (with some statistics +stripped off the end) is: +.DS +state 0 + $accept : \_rhyme $end + + DING shift 3 + . error + + rhyme goto 1 + sound goto 2 + +state 1 + $accept : rhyme\_$end + + $end accept + . error + +state 2 + rhyme : sound\_place + + DELL shift 5 + . error + + place goto 4 + +state 3 + sound : DING\_DONG + + DONG shift 6 + . error + +state 4 + rhyme : sound place\_ (1) + + . reduce 1 + +state 5 + place : DELL\_ (3) + + . reduce 3 + +state 6 + sound : DING DONG\_ (2) + + . reduce 2 +.DE +Notice that, in addition to the actions for each state, there is a +description of the parsing rules being processed in each +state. The \_ character is used to indicate +what has been seen, and what is yet to come, in each rule. +Suppose the input is +.DS +DING DONG DELL +.DE +It is instructive to follow the steps of the parser while +processing this input. +.PP +Initially, the current state is state 0. 
+The parser needs to refer to the input in order to decide +between the actions available in state 0, so +the first token, +.I DING , +is read, becoming the lookahead token. +The action in state 0 on +.I DING +is ``shift 3'', so state 3 is pushed onto the stack, +and the lookahead token is cleared. +State 3 becomes the current state. +The next token, +.I DONG , +is read, becoming the lookahead token. +The action in state 3 on the token +.I DONG +is ``shift 6'', +so state 6 is pushed onto the stack, and the lookahead is cleared. +The stack now contains 0, 3, and 6. +In state 6, without even consulting the lookahead, +the parser reduces by rule 2. +.DS + sound : DING DONG +.DE +This rule has two symbols on the right hand side, so +two states, 6 and 3, are popped off of the stack, uncovering state 0. +Consulting the description of state 0, looking for a goto on +.I sound , +.DS + sound goto 2 +.DE +is obtained; thus state 2 is pushed onto the stack, +becoming the current state. +.PP +In state 2, the next token, +.I DELL , +must be read. +The action is ``shift 5'', so state 5 is pushed onto the stack, which now has +0, 2, and 5 on it, and the lookahead token is cleared. +In state 5, the only action is to reduce by rule 3. +This has one symbol on the right hand side, so one state, 5, +is popped off, and state 2 is uncovered. +The goto in state 2 on +.I place , +the left side of rule 3, +is state 4. +Now, the stack contains 0, 2, and 4. +In state 4, the only action is to reduce by rule 1. +There are two symbols on the right, so the top two states are popped off, +uncovering state 0 again. +In state 0, there is a goto on +.I rhyme +causing the parser to enter state 1. +In state 1, the input is read; the endmarker is obtained, +indicated by ``$end'' in the +.I y.output +file. +The action in state 1 when the endmarker is seen is to accept, +successfully ending the parse.
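The walk-through above can be replayed mechanically. What follows is a minimal sketch in C, not the code that Yacc generates: the shift, goto, and default-reduce entries are transcribed by hand from the y.output listing, while the function names (shift_to, goto_state, default_reduce, parse_rhyme), the token numbers, and the fixed stack size are assumptions made purely for illustration. Semantic values and error recovery are left out.

```c
#include <assert.h>

/* Hypothetical token numbers; the endmarker is 0, as the text requires. */
enum { END = 0, DING, DONG, DELL };
/* Hypothetical codes for the nonterminals. */
enum { RHYME, SOUND, PLACE };

/* Rule number -> left-hand symbol and right-hand-side length
   (rules 1..3 as numbered in y.output; entry 0 is unused). */
static const int rule_lhs[] = { -1, RHYME, SOUND, PLACE };
static const int rule_len[] = {  0, 2, 2, 1 };

/* Shift actions from y.output; -1 stands for the "." error entry. */
static int shift_to(int state, int tok)
{
    if (state == 0 && tok == DING) return 3;
    if (state == 2 && tok == DELL) return 5;
    if (state == 3 && tok == DONG) return 6;
    return -1;
}

/* Goto entries from y.output; -1 means no entry. */
static int goto_state(int state, int sym)
{
    if (state == 0 && sym == RHYME) return 1;
    if (state == 0 && sym == SOUND) return 2;
    if (state == 2 && sym == PLACE) return 4;
    return -1;
}

/* States whose only action is a default reduce; 0 means "none". */
static int default_reduce(int state)
{
    switch (state) {
    case 4: return 1;   /* rhyme : sound place */
    case 5: return 3;   /* place : DELL        */
    case 6: return 2;   /* sound : DING DONG   */
    default: return 0;
    }
}

/* Parse an END-terminated token array.
   Returns 1 on accept, 0 on a syntax error. */
int parse_rhyme(const int *toks)
{
    int stack[8];       /* state stack; this grammar never needs more than 3 */
    int sp = 0;         /* index of the current state */
    int la = -1;        /* lookahead token; -1 means "none read yet" */

    stack[0] = 0;       /* initially the stack holds only state 0 */
    for (;;) {
        int state = stack[sp];
        int rule = default_reduce(state);
        int next;

        if (rule != 0) {
            /* Reduce: pop one state per right-hand-side symbol,
               then take the goto found in the uncovered state.
               The lookahead is left untouched. */
            sp -= rule_len[rule];
            next = goto_state(stack[sp], rule_lhs[rule]);
            assert(next >= 0);
            stack[++sp] = next;
            continue;
        }
        if (la < 0)
            la = *toks++;           /* a lookahead is needed: read one */
        if (state == 1 && la == END)
            return 1;               /* accept */
        next = shift_to(state, la);
        if (next < 0)
            return 0;               /* error action */
        stack[++sp] = next;         /* shift: push the new state */
        la = -1;                    /* a shift clears the lookahead */
    }
}
```

Under these assumptions, calling parse_rhyme on the tokens DING, DONG, DELL followed by the endmarker goes through the same stack contents as the narrative (0 3 6, then 0 2 5, then 0 2 4, then 0 1) and accepts; an input that reaches an error entry makes the function return 0 instead.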
+.PP +The reader is urged to consider how the parser works +when confronted with such incorrect strings as +.I "DING DONG DONG" , +.I "DING DONG" , +.I "DING DONG DELL DELL" , +etc. +A few minutes spent with this and other simple examples will +probably be repaid when problems arise in more complicated contexts. diff --git a/share/doc/psd/15.yacc/ss5 b/share/doc/psd/15.yacc/ss5 new file mode 100644 index 0000000..e2c3462 --- /dev/null +++ b/share/doc/psd/15.yacc/ss5 @@ -0,0 +1,339 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC.
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)ss5 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +5: Ambiguity and Conflicts +.PP +A set of grammar rules is +.I ambiguous +if there is some input string that can be structured in two or more different ways. +For example, the grammar rule +.DS +expr : expr \'\-\' expr +.DE +is a natural way of expressing the fact that one way of forming an arithmetic expression +is to put two other expressions together with a minus sign between them. +Unfortunately, this grammar rule does not +completely specify the way that all complex inputs +should be structured. +For example, if the input is +.DS +expr \- expr \- expr +.DE +the rule allows this input to be structured as either +.DS +( expr \- expr ) \- expr +.DE +or as +.DS +expr \- ( expr \- expr ) +.DE +(The first is called +.I "left association" , +the second +.I "right association" ). +.PP +Yacc detects such ambiguities when it is attempting to build the parser. +It is instructive to consider the problem that confronts the parser when it is +given an input such as +.DS +expr \- expr \- expr +.DE +When the parser has read the second expr, the input that it has seen: +.DS +expr \- expr +.DE +matches the right side of the grammar rule above. +The parser could +.I reduce +the input by applying this rule; +after applying the rule, +the input is reduced to +.I expr (the +left side of the rule). +The parser would then read the final part of the input: +.DS +\- expr +.DE +and again reduce.
+The effect of this is to take the left associative interpretation. +.PP +Alternatively, when the parser has seen +.DS +expr \- expr +.DE +it could defer the immediate application of the rule, and continue reading +the input until it had seen +.DS +expr \- expr \- expr +.DE +It could then apply the rule to the rightmost three symbols, reducing them to +.I expr +and leaving +.DS +expr \- expr +.DE +Now the rule can be reduced once more; the effect is to +take the right associative interpretation. +Thus, having read +.DS +expr \- expr +.DE +the parser can do two legal things, a shift or a reduction, and has no way of +deciding between them. +This is called a +.I "shift / reduce conflict" . +It may also happen that the parser has a choice of two legal reductions; +this is called a +.I "reduce / reduce conflict" . +Note that there are never any ``Shift/shift'' conflicts. +.PP +When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser. +It does this by selecting one of the valid steps wherever it has a choice. +A rule describing which choice to make in a given situation is called +a +.I "disambiguating rule" . +.PP +Yacc invokes two disambiguating rules by default: +.IP 1. +In a shift/reduce conflict, the default is to do the shift. +.IP 2. +In a reduce/reduce conflict, the default is to reduce by the +.I earlier +grammar rule (in the input sequence). +.PP +Rule 1 implies that reductions are deferred whenever there is a choice, +in favor of shifts. +Rule 2 gives the user rather crude control over the behavior of the parser +in this situation, but reduce/reduce conflicts should be avoided whenever possible. +.PP +Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent, +require a more complex parser than Yacc can construct. +The use of actions within rules can also cause conflicts, if the action must +be done before the parser can be sure which rule is being recognized. 
+In these cases, the application of disambiguating rules is inappropriate, +and leads to an incorrect parser. +For this reason, Yacc +always reports the number of shift/reduce and reduce/reduce conflicts resolved by Rule 1 and Rule 2. +.PP +In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also +possible to rewrite the grammar rules so that the same inputs are read but there are no +conflicts. +For this reason, most previous parser generators +have considered conflicts to be fatal errors. +Our experience has suggested that this rewriting is somewhat unnatural, +and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts. +.PP +As an example of the power of disambiguating rules, consider a fragment from a programming +language involving an ``if-then-else'' construction: +.DS +stat : IF \'(\' cond \')\' stat + | IF \'(\' cond \')\' stat ELSE stat + ; +.DE +In these rules, +.I IF +and +.I ELSE +are tokens, +.I cond +is a nonterminal symbol describing +conditional (logical) expressions, and +.I stat +is a nonterminal symbol describing statements. +The first rule will be called the +.ul +simple-if +rule, and the +second the +.ul +if-else +rule. +.PP +These two rules form an ambiguous construction, since input of the form +.DS +IF ( C1 ) IF ( C2 ) S1 ELSE S2 +.DE +can be structured according to these rules in two ways: +.DS +IF ( C1 ) { + IF ( C2 ) S1 + } +ELSE S2 +.DE +or +.DS +IF ( C1 ) { + IF ( C2 ) S1 + ELSE S2 + } +.DE +The second interpretation is the one given in most programming languages +having this construct. +Each +.I ELSE +is associated with the last preceding +``un-\fIELSE'\fRd'' +.I IF . +In this example, consider the situation where the parser has seen +.DS +IF ( C1 ) IF ( C2 ) S1 +.DE +and is looking at the +.I ELSE . 
+It can immediately +reduce +by the simple-if rule to get +.DS +IF ( C1 ) stat +.DE +and then read the remaining input, +.DS +ELSE S2 +.DE +and reduce +.DS +IF ( C1 ) stat ELSE S2 +.DE +by the if-else rule. +This leads to the first of the above groupings of the input. +.PP +On the other hand, the +.I ELSE +may be shifted, +.I S2 +read, and then the right hand portion of +.DS +IF ( C1 ) IF ( C2 ) S1 ELSE S2 +.DE +can be reduced by the if-else rule to get +.DS +IF ( C1 ) stat +.DE +which can be reduced by the simple-if rule. +This leads to the second of the above groupings of the input, which +is usually desired. +.PP +Once again the parser can do two valid things \- there is a shift/reduce conflict. +The application of disambiguating rule 1 tells the parser to shift in this case, +which leads to the desired grouping. +.PP +This shift/reduce conflict arises only when there is a particular current input symbol, +.I ELSE , +and particular inputs already seen, such as +.DS +IF ( C1 ) IF ( C2 ) S1 +.DE +In general, there may be many conflicts, and each one +will be associated with an input symbol and +a set of previously read inputs. +The previously read inputs are characterized by the +state +of the parser. +.PP +The conflict messages of Yacc are best understood +by examining the verbose (\fB\-v\fR) option output file. +For example, the output corresponding to the above +conflict state might be: +.DS L +23: shift/reduce conflict (shift 45, reduce 18) on ELSE + +state 23 + + stat : IF ( cond ) stat\_ (18) + stat : IF ( cond ) stat\_ELSE stat + + ELSE shift 45 + . reduce 18 + +.DE +The first line describes the conflict, giving the state and the input symbol. +The ordinary state description follows, giving +the grammar rules active in the state, and the parser actions. +Recall that the underline marks the +portion of the grammar rules which has been seen. 
+Thus in the example, in state 23 the parser has seen input corresponding +to +.DS +IF ( cond ) stat +.DE +and the two grammar rules shown are active at this time. +The parser can do two possible things. +If the input symbol is +.I ELSE , +it is possible to shift into state +45. +State 45 will have, as part of its description, the line +.DS +stat : IF ( cond ) stat ELSE\_stat +.DE +since the +.I ELSE +will have been shifted in this state. +Back in state 23, the alternative action, described by ``\fB.\fR'', +is to be done if the input symbol is not mentioned explicitly in the above actions; thus, +in this case, if the input symbol is not +.I ELSE , +the parser reduces by grammar rule 18: +.DS +stat : IF \'(\' cond \')\' stat +.DE +Once again, notice that the numbers following ``shift'' commands refer to other states, +while the numbers following ``reduce'' commands refer to grammar +rule numbers. +In the +.I y.output +file, the rule numbers are printed after those rules which can be reduced. +In most states, there will be at most one reduce action possible in the +state, and this will be the default command. +The user who encounters unexpected shift/reduce conflicts will probably want to +look at the verbose output to decide whether the default actions are appropriate. +In really tough cases, the user might need to know more about +the behavior and construction of the parser than can be covered here. +In this case, one of the theoretical references +.[ +Aho Johnson Surveys Parsing +.] +.[ +Aho Johnson Ullman Deterministic Ambiguous +.] +.[ +Aho Ullman Principles Design +.] +might be consulted; the services of a local guru might also be appropriate. diff --git a/share/doc/psd/15.yacc/ss6 b/share/doc/psd/15.yacc/ss6 new file mode 100644 index 0000000..513d042 --- /dev/null +++ b/share/doc/psd/15.yacc/ss6 @@ -0,0 +1,183 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved.
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+.\" +.\" @(#)ss6 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +6: Precedence +.PP +There is one common situation +where the rules given above for resolving conflicts are not sufficient; +this is in the parsing of arithmetic expressions. +Most of the commonly used constructions for arithmetic expressions can be naturally +described by the notion of +.I precedence +levels for operators, together with information about left +or right associativity. +It turns out that ambiguous grammars with appropriate disambiguating rules +can be used to create parsers that are faster and easier to +write than parsers constructed from unambiguous grammars. +The basic notion is to write grammar rules +of the form +.DS +expr : expr OP expr +.DE +and +.DS +expr : UNARY expr +.DE +for all binary and unary operators desired. +This creates a very ambiguous grammar, with many parsing conflicts. +As disambiguating rules, the user specifies the precedence, or binding +strength, of all the operators, and the associativity +of the binary operators. +This information is sufficient to allow Yacc to resolve the parsing conflicts +in accordance with these rules, and construct a parser that realizes the desired +precedences and associativities. +.PP +The precedences and associativities are attached to tokens in the declarations section. +This is done by a series of lines beginning with a Yacc keyword: %left, %right, +or %nonassoc, followed by a list of tokens. +All of the tokens on the same line are assumed to have the same precedence level +and associativity; the lines are listed in +order of increasing precedence or binding strength. +Thus, +.DS +%left \'+\' \'\-\' +%left \'*\' \'/\' +.DE +describes the precedence and associativity of the four arithmetic operators. +Plus and minus are left associative, and have lower precedence than +star and slash, which are also left associative. 
+The keyword %right is used to describe right associative operators, +and the keyword %nonassoc is used to describe operators, like +the operator .LT. in Fortran, that may not associate with themselves; thus, +.DS +A .LT. B .LT. C +.DE +is illegal in Fortran, and such an operator would be described with the keyword +%nonassoc in Yacc. +As an example of the behavior of these declarations, the description +.DS +%right \'=\' +%left \'+\' \'\-\' +%left \'*\' \'/\' + +%% + +expr : expr \'=\' expr + | expr \'+\' expr + | expr \'\-\' expr + | expr \'*\' expr + | expr \'/\' expr + | NAME + ; +.DE +might be used to structure the input +.DS +a = b = c*d \- e \- f*g +.DE +as follows: +.DS +a = ( b = ( ((c*d)\-e) \- (f*g) ) ) +.DE +When this mechanism is used, +unary operators must, in general, be given a precedence. +Sometimes a unary operator and a binary operator +have the same symbolic representation, but different precedences. +An example is unary and binary \'\-\'; unary minus may be given the same +strength as multiplication, or even higher, while binary minus has a lower strength than +multiplication. +The keyword, %prec, changes the precedence level associated with a particular grammar rule. +%prec appears immediately after the body of the grammar rule, before the action or closing semicolon, +and is followed by a token name or literal. +It +causes the precedence of the grammar rule to become that of the following token name or literal. +For example, to make unary minus have the same precedence as multiplication the rules might resemble: +.DS +%left \'+\' \'\-\' +%left \'*\' \'/\' + +%% + +expr : expr \'+\' expr + | expr \'\-\' expr + | expr \'*\' expr + | expr \'/\' expr + | \'\-\' expr %prec \'*\' + | NAME + ; +.DE +.PP +A token declared +by %left, %right, and %nonassoc need not be, but may be, declared by %token as well. +.PP +The precedences and associativities are used by Yacc to +resolve parsing conflicts; they give rise to disambiguating rules. 
+Formally, the rules work as follows: +.IP 1. +The precedences and associativities are recorded for those tokens and literals +that have them. +.IP 2. +A precedence and associativity is associated with each grammar rule; it is the precedence +and associativity of the last token or literal in the body of the rule. +If the %prec construction is used, it overrides this default. +Some grammar rules may have no precedence and associativity associated with them. +.IP 3. +When there is a reduce/reduce conflict, or there is a shift/reduce conflict +and either the input symbol or the grammar rule has no precedence and associativity, +then the two disambiguating rules given at the beginning of the section are used, +and the conflicts are reported. +.IP 4. +If there is a shift/reduce conflict, and both the grammar rule and the input character +have precedence and associativity associated with them, then the conflict is resolved +in favor of the action (shift or reduce) associated with the higher precedence. +If the precedences are the same, then the associativity is used; left +associative implies reduce, right associative implies shift, and nonassociating +implies error. +.PP +Conflicts resolved by precedence are not counted in the number of shift/reduce and reduce/reduce +conflicts reported by Yacc. +This means that mistakes in the specification of precedences may +disguise errors in the input grammar; it is a good idea to be sparing +with precedences, and use them in an essentially ``cookbook'' fashion, +until some experience has been gained. +The +.I y.output +file +is very useful in deciding whether the parser is actually doing +what was intended. diff --git a/share/doc/psd/15.yacc/ss7 b/share/doc/psd/15.yacc/ss7 new file mode 100644 index 0000000..b6440c7 --- /dev/null +++ b/share/doc/psd/15.yacc/ss7 @@ -0,0 +1,161 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+.\" +.\" @(#)ss7 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +7: Error Handling +.PP +Error handling is an extremely difficult area, and many of the problems are semantic ones. +When an error is found, for example, it may be necessary to reclaim parse tree storage, +delete or alter symbol table entries, and, typically, set switches to avoid generating any further output. +.PP +It is seldom acceptable to stop all processing when an error is found; it is more useful to continue +scanning the input to find further syntax errors. +This leads to the problem of getting the parser ``restarted'' after an error. +A general class of algorithms to do this involves discarding a number of tokens +from the input string, and attempting to adjust the parser so that input can continue. +.PP +To allow the user some control over this process, +Yacc provides a simple, but reasonably general, feature. +The token name ``error'' is reserved for error handling. +This name can be used in grammar rules; +in effect, it suggests places where errors are expected, and recovery might take place. +The parser pops its stack until it enters a state where the token ``error'' is legal. +It then behaves as if the token ``error'' were the current lookahead token, +and performs the action encountered. +The lookahead token is then reset to the token that caused the error. +If no special error rules have been specified, the processing halts when an error is detected. +.PP +In order to prevent a cascade of error messages, the parser, after +detecting an error, remains in error state until three tokens have been successfully +read and shifted. +If an error is detected when the parser is already in error state, +no message is given, and the input token is quietly deleted. +.PP +As an example, a rule of the form +.DS +stat : error +.DE +would, in effect, mean that on a syntax error the parser would attempt to skip over the statement +in which the error was seen. 
+More precisely, the parser will +scan ahead, looking for three tokens that might legally follow +a statement, and start processing at the first of these; if +the beginnings of statements are not sufficiently distinctive, it may make a +false start in the middle of a statement, and end up reporting a +second error where there is in fact no error. +.PP +Actions may be used with these special error rules. +These actions might attempt to reinitialize tables, reclaim symbol table space, etc. +.PP +Error rules such as the above are very general, but difficult to control. +Somewhat easier are rules such as +.DS +stat : error \';\' +.DE +Here, when there is an error, the parser attempts to skip over the statement, but +will do so by skipping to the next \';\'. +All tokens after the error and before the next \';\' cannot be shifted, and are discarded. +When the \';\' is seen, this rule will be reduced, and any ``cleanup'' +action associated with it performed. +.PP +Another form of error rule arises in interactive applications, where +it may be desirable to permit a line to be reentered after an error. +A possible error rule might be +.DS +input : error \'\en\' { printf( "Reenter last line: " ); } input + { $$ = $4; } +.DE +There is one potential difficulty with this approach; +the parser must correctly process three input tokens before it +admits that it has correctly resynchronized after the error. +If the reentered line contains an error +in the first two tokens, the parser deletes the offending tokens, +and gives no message; this is clearly unacceptable. +For this reason, there is a mechanism that +can be used to force the parser +to believe that an error has been fully recovered from. +The statement +.DS +yyerrok ; +.DE +in an action +resets the parser to its normal mode. 
+The last example is better written +.DS +input : error \'\en\' + { yyerrok; + printf( "Reenter last line: " ); } + input + { $$ = $4; } + ; +.DE +.PP +As mentioned above, the token seen immediately +after the ``error'' symbol is the input token at which the +error was discovered. +Sometimes, this is inappropriate; for example, an +error recovery action might +take upon itself the job of finding the correct place to resume input. +In this case, +the previous lookahead token must be cleared. +The statement +.DS +yyclearin ; +.DE +in an action will have this effect. +For example, suppose the action after error +were to call some sophisticated resynchronization routine, +supplied by the user, that attempted to advance the input to the +beginning of the next valid statement. +After this routine was called, the next token returned by yylex would presumably +be the first token in a legal statement; +the old, illegal token must be discarded, and the error state reset. +This could be done by a rule like +.DS +stat : error + { resynch(); + yyerrok ; + yyclearin ; } + ; +.DE +.PP +These mechanisms are admittedly crude, but do allow for a simple, fairly effective recovery of the parser +from many errors; +moreover, the user can get control to deal with +the error actions required by other portions of the program. diff --git a/share/doc/psd/15.yacc/ss8 b/share/doc/psd/15.yacc/ss8 new file mode 100644 index 0000000..75c66ab --- /dev/null +++ b/share/doc/psd/15.yacc/ss8 @@ -0,0 +1,130 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. 
+.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)ss8 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +8: The Yacc Environment +.PP +When the user inputs a specification +to Yacc, the output is a file of C programs, called +.I y.tab.c +on most +systems +(due to local file system conventions, the names may differ from +installation to installation). +The function produced by Yacc is called +.I yyparse \|; +it is an integer valued function. 
+When it is called, it in turn repeatedly calls +.I yylex , +the lexical analyzer +supplied by the user (see Section 3) +to obtain input tokens. +Eventually, either an error is detected, in which case +(if no error recovery is possible) +.I yyparse +returns the value 1, +or the lexical analyzer returns the endmarker token +and the parser accepts. +In this case, +.I yyparse +returns the value 0. +.PP +The user must provide a certain amount of environment for this +parser in order to obtain a working program. +For example, as with every C program, a program called +.I main +must be defined, that eventually calls +.I yyparse . +In addition, a routine called +.I yyerror +prints a message +when a syntax error is detected. +.PP +These two routines must be supplied in one form or another by the +user. +To ease the initial effort of using Yacc, a library has been +provided with default versions of +.I main +and +.I yyerror . +The name of this library is system dependent; +on many systems the library is accessed by a +.B \-ly +argument to the loader. +To show the triviality of these default programs, the source is +given below: +.DS +main(){ + return( yyparse() ); + } +.DE +and +.DS +# include <stdio.h> + +yyerror(s) char *s; { + fprintf( stderr, "%s\en", s ); + } +.DE +The argument to +.I yyerror +is a string containing an error message, usually +the string ``syntax error''. +The average application will want to do better than this. +Ordinarily, the program should keep track of the input line number, and print it +along with the message when a syntax error is detected. +The external integer variable +.I yychar +contains the lookahead token number at the time the error was detected; +this may be of some interest in giving better diagnostics. +Since the +.I main +program is probably supplied by the user (to read arguments, etc.) +the Yacc library is useful only in small +projects, or in the earliest stages of larger ones. 
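The advice above, to print the input line number along with the message, can be sketched in C as follows. This is a minimal illustration, not the library source: the counter lineno and the helper format_syntax_error are hypothetical names introduced here, and the sketch assumes that the user's yylex increments lineno each time it consumes a newline.

```c
#include <stdio.h>

/* Hypothetical counter, not supplied by Yacc: the user's yylex is
   assumed to increment it each time it consumes a newline. */
int lineno = 1;

/* Hypothetical helper: formats "line N: message" into buf.
   Returns the value returned by snprintf. */
int format_syntax_error(char *buf, size_t size, int line, const char *msg)
{
    return snprintf(buf, size, "line %d: %s", line, msg);
}

/* A user-supplied yyerror that replaces the library default and
   prefixes the parser's message with the current line number. */
int yyerror(const char *s)
{
    char buf[256];

    format_syntax_error(buf, sizeof buf, lineno, s);
    fprintf(stderr, "%s\n", buf);
    return 0;
}
```

Keeping the formatting in a separate helper is only a convenience for checking it in isolation; a real application might equally well call fprintf directly from yyerror.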
+.PP +The external integer variable +.I yydebug +is normally set to 0. +If it is set to a nonzero value, the parser will output a +verbose description of its actions, including +a discussion of which input symbols have been read, and +what the parser actions are. +Depending on the operating environment, +it may be possible to set this variable by using a debugging system. diff --git a/share/doc/psd/15.yacc/ss9 b/share/doc/psd/15.yacc/ss9 new file mode 100644 index 0000000..9d05fec --- /dev/null +++ b/share/doc/psd/15.yacc/ss9 @@ -0,0 +1,206 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)ss9 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +9: Hints for Preparing Specifications +.PP +This section contains miscellaneous hints on preparing efficient, easy to change, +and clear specifications. +The individual subsections are more or less +independent. +.SH +Input Style +.PP +It is difficult to +provide rules with substantial actions +and still have a readable specification file. +The following style hints owe much to Brian Kernighan. +.IP a. +Use all capital letters for token names, all lower case letters for +nonterminal names. +This rule comes under the heading of ``knowing who to blame when +things go wrong.'' +.IP b. +Put grammar rules and actions on separate lines. +This allows either to be changed without +an automatic need to change the other. +.IP c. +Put all rules with the same left hand side together. +Put the left hand side in only once, and let all +following rules begin with a vertical bar. +.IP d. +Put a semicolon only after the last rule with a given left hand side, +and put the semicolon on a separate line. +This allows new rules to be easily added. +.IP e. +Indent rule bodies by two tab stops, and action bodies by three +tab stops. +.PP +The example in Appendix A is written following this style, as are +the examples in the text of this paper (where space permits). +The user must make up his own mind about these stylistic questions; +the central problem, however, is to make the rules visible through +the morass of action code. 
+.SH +Left Recursion +.PP +The algorithm used by the Yacc parser encourages so called ``left recursive'' +grammar rules: rules of the form +.DS +name : name rest_of_rule ; +.DE +These rules frequently arise when +writing specifications of sequences and lists: +.DS +list : item + | list \',\' item + ; +.DE +and +.DS +seq : item + | seq item + ; +.DE +In each of these cases, the first rule +will be reduced for the first item only, and the second rule +will be reduced for the second and all succeeding items. +.PP +With right recursive rules, such as +.DS +seq : item + | item seq + ; +.DE +the parser would be a bit bigger, and the items would be seen, and reduced, +from right to left. +More seriously, an internal stack in the parser +would be in danger of overflowing if a very long sequence were read. +Thus, the user should use left recursion wherever reasonable. +.PP +It is worth considering whether a sequence with zero +elements has any meaning, and if so, consider writing +the sequence specification with an empty rule: +.DS +seq : /* empty */ + | seq item + ; +.DE +Once again, the first rule would always be reduced exactly once, before the +first item was read, +and then the second rule would be reduced once for each item read. +Permitting empty sequences +often leads to increased generality. +However, conflicts might arise if Yacc is asked to decide +which empty sequence it has seen, when it hasn't seen enough to +know! +.SH +Lexical Tie-ins +.PP +Some lexical decisions depend on context. +For example, the lexical analyzer might want to +delete blanks normally, but not within quoted strings. +Or names might be entered into a symbol table in declarations, +but not in expressions. +.PP +One way of handling this situation is +to create a global flag that is +examined by the lexical analyzer, and set by actions. +For example, suppose a program +consists of 0 or more declarations, followed by 0 or more statements. +Consider: +.DS +%{ + int dflag; +%} + ... 
other declarations ... + +%% + +prog : decls stats + ; + +decls : /* empty */ + { dflag = 1; } + | decls declaration + ; + +stats : /* empty */ + { dflag = 0; } + | stats statement + ; + + ... other rules ... +.DE +The flag +.I dflag +is now 0 when reading statements, and 1 when reading declarations, +.ul +except for the first token in the first statement. +This token must be seen by the parser before it can tell that +the declaration section has ended and the statements have +begun. +In many cases, this single token exception does not +affect the lexical scan. +.PP +This kind of ``backdoor'' approach can be elaborated +to a noxious degree. +Nevertheless, it represents a way of doing some things +that are difficult, if not impossible, to +do otherwise. +.SH +Reserved Words +.PP +Some programming languages +permit the user to +use words like ``if'', which are normally reserved, +as label or variable names, provided that such use does not +conflict with the legal use of these names in the programming language. +This is extremely hard to do in the framework of Yacc; +it is difficult to pass information to the lexical analyzer +telling it ``this instance of `if' is a keyword, and that instance is a variable''. +The user can make a stab at it, using the +mechanism described in the last subsection, +but it is difficult. +.PP +A number of ways of making this easier are under advisement. +Until then, it is better that the keywords be +.I reserved \|; +that is, be forbidden for use as variable names. +There are powerful stylistic reasons for preferring this, anyway. diff --git a/share/doc/psd/15.yacc/ssA b/share/doc/psd/15.yacc/ssA new file mode 100644 index 0000000..f6f1702 --- /dev/null +++ b/share/doc/psd/15.yacc/ssA @@ -0,0 +1,221 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)ssA 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +10: Advanced Topics +.PP +This section discusses a number of advanced features +of Yacc. 
+.SH +Simulating Error and Accept in Actions +.PP +The parsing actions of error and accept can be simulated +in an action by use of macros YYACCEPT and YYERROR. +YYACCEPT causes +.I yyparse +to return the value 0; +YYERROR causes +the parser to behave as if the current input symbol +had been a syntax error; +.I yyerror +is called, and error recovery takes place. +These mechanisms can be used to simulate parsers +with multiple endmarkers or context-sensitive syntax checking. +.SH +Accessing Values in Enclosing Rules. +.PP +An action may refer to values +returned by actions to the left of the current rule. +The mechanism is simply the same as with ordinary actions, +a dollar sign followed by a digit, but in this case the +digit may be 0 or negative. +Consider +.DS +sent : adj noun verb adj noun + { \fIlook at the sentence\fR . . . } + ; + +adj : THE { $$ = THE; } + | YOUNG { $$ = YOUNG; } + . . . + ; + +noun : DOG + { $$ = DOG; } + | CRONE + { if( $0 == YOUNG ){ + printf( "what?\en" ); + } + $$ = CRONE; + } + ; + . . . +.DE +In the action following the word CRONE, a check is made that the +preceding token shifted was not YOUNG. +Obviously, this is only possible when a great deal is known about +what might precede the symbol +.I noun +in the input. +There is also a distinctly unstructured flavor about this. +Nevertheless, at times this mechanism will save a great +deal of trouble, especially when a few combinations are to +be excluded from an otherwise regular structure. +.SH +Support for Arbitrary Value Types +.PP +By default, the values returned by actions and the lexical analyzer are integers. +Yacc can also support +values of other types, including structures. +In addition, Yacc keeps track of the types, and inserts +appropriate union member names so that the resulting parser will +be strictly type checked. +The Yacc value stack (see Section 4) +is declared to be a +.I union +of the various types of values desired. 
+The user declares the union, and associates union member names +to each token and nonterminal symbol having a value. +When the value is referenced through a $$ or $n construction, +Yacc will automatically insert the appropriate union name, so that +no unwanted conversions will take place. +In addition, type checking commands such as +.I Lint\| +.[ +Johnson Lint Checker 1273 +.] +will be far more silent. +.PP +There are three mechanisms used to provide for this typing. +First, there is a way of defining the union; this must be +done by the user since other programs, notably the lexical analyzer, +must know about the union member names. +Second, there is a way of associating a union member name with tokens +and nonterminals. +Finally, there is a mechanism for describing the type of those +few values where Yacc can not easily determine the type. +.PP +To declare the union, the user includes in the declaration section: +.DS +%union { + body of union ... + } +.DE +This declares the Yacc value stack, +and the external variables +.I yylval +and +.I yyval , +to have type equal to this union. +If Yacc was invoked with the +.B \-d +option, the union declaration +is copied onto the +.I y.tab.h +file. +Alternatively, +the union may be declared in a header file, and a typedef +used to define the variable YYSTYPE to represent +this union. +Thus, the header file might also have said: +.DS +typedef union { + body of union ... + } YYSTYPE; +.DE +The header file must be included in the declarations +section, by use of %{ and %}. +.PP +Once YYSTYPE is defined, +the union member names must be associated +with the various terminal and nonterminal names. +The construction +.DS +< name > +.DE +is used to indicate a union member name. +If this follows +one of the +keywords %token, +%left, %right, and %nonassoc, +the union member name is associated with the tokens listed. 
+Thus, saying +.DS +%left <optype> \'+\' \'\-\' +.DE +will cause any reference to values returned by these two tokens to be +tagged with +the union member name +.I optype . +Another keyword, %type, is +used similarly to associate +union member names with nonterminals. +Thus, one might say +.DS +%type <nodetype> expr stat +.DE +.PP +There remain a couple of cases where these mechanisms are insufficient. +If there is an action within a rule, the value returned +by this action has no +.I "a priori" +type. +Similarly, reference to left context values (such as $0 \- see the +previous subsection ) leaves Yacc with no easy way of knowing the type. +In this case, a type can be imposed on the reference by inserting +a union member name, between < and >, immediately after +the first $. +An example of this usage is +.DS +rule : aaa { $<intval>$ = 3; } bbb + { fun( $<intval>2, $<other>0 ); } + ; +.DE +This syntax has little to recommend it, but the situation arises rarely. +.PP +A sample specification is given in Appendix C. +The facilities in this subsection are not triggered until they are used: +in particular, the use of %type will turn on these mechanisms. +When they are used, there is a fairly strict level of checking. +For example, use of $n or $$ to refer to something with no defined type +is diagnosed. +If these facilities are not triggered, the Yacc value stack is used to +hold +.I int' s, +as was true historically. diff --git a/share/doc/psd/15.yacc/ssB b/share/doc/psd/15.yacc/ssB new file mode 100644 index 0000000..c16bb52 --- /dev/null +++ b/share/doc/psd/15.yacc/ssB @@ -0,0 +1,63 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+.\" +.\" @(#)ssB 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +11: Acknowledgements +.PP +Yacc owes much to a +most stimulating collection of users, who have goaded +me beyond my inclination, and frequently beyond my +ability, in their endless search for ``one more feature''. +Their irritating unwillingness to learn how to +do things my way has usually led to my doing things their way; +most of the time, they have been right. +B. W. Kernighan, P. J. Plauger, S. I. Feldman, C. Imagna, +M. E. Lesk, +and A. Snyder will recognize some of their ideas in the current version +of Yacc. +C. B. Haley contributed to the error recovery algorithm. +D. M. Ritchie, B. W. Kernighan, and M. O. Harris helped translate this document into English. +Al Aho also deserves special credit for bringing +the mountain to Mohammed, and other favors. +.\" .SG "MH-1273-SCJ-unix" +.bp +.[ +$LIST$ +.] +.bp diff --git a/share/doc/psd/15.yacc/ssa b/share/doc/psd/15.yacc/ssa new file mode 100644 index 0000000..17e815e --- /dev/null +++ b/share/doc/psd/15.yacc/ssa @@ -0,0 +1,150 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. 
+.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)ssa 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +Appendix A: A Simple Example +.PP +This example gives the complete Yacc specification for a small desk calculator; +the desk calculator has 26 registers, labeled ``a'' through ``z'', and accepts +arithmetic expressions made up of the operators +, \-, *, /, +% (mod operator), & (bitwise and), | (bitwise or), and assignment. +If an expression at the top level is an assignment, the value is not +printed; otherwise it is. +As in C, an integer that begins with 0 (zero) is assumed to be octal; +otherwise, it is assumed to be decimal. +.PP +As an example of a Yacc specification, the desk calculator +does a reasonable job of showing how precedences and ambiguities +are used, and demonstrating simple error recovery. +The major oversimplifications are that the +lexical analysis phase is much simpler than for most applications, and the +output is produced immediately, line by line. 
+Note the way that decimal and octal integers are read in by the grammar rules; +This job is probably better done by the lexical analyzer. +.sp +.nf +.ta .5i 1i 1.5i 2i 2.5i + +%{ +# include <stdio.h> +# include <ctype.h> + +int regs[26]; +int base; + +%} + +%start list + +%token DIGIT LETTER + +%left \'|\' +%left \'&\' +%left \'+\' \'\-\' +%left \'*\' \'/\' \'%\' +%left UMINUS /* supplies precedence for unary minus */ + +%% /* beginning of rules section */ + +list : /* empty */ + | list stat \'\en\' + | list error \'\en\' + { yyerrok; } + ; + +stat : expr + { printf( "%d\en", $1 ); } + | LETTER \'=\' expr + { regs[$1] = $3; } + ; + +expr : \'(\' expr \')\' + { $$ = $2; } + | expr \'+\' expr + { $$ = $1 + $3; } + | expr \'\-\' expr + { $$ = $1 \- $3; } + | expr \'*\' expr + { $$ = $1 * $3; } + | expr \'/\' expr + { $$ = $1 / $3; } + | expr \'%\' expr + { $$ = $1 % $3; } + | expr \'&\' expr + { $$ = $1 & $3; } + | expr \'|\' expr + { $$ = $1 | $3; } + | \'\-\' expr %prec UMINUS + { $$ = \- $2; } + | LETTER + { $$ = regs[$1]; } + | number + ; + +number : DIGIT + { $$ = $1; base = ($1==0) ? 8 : 10; } + | number DIGIT + { $$ = base * $1 + $2; } + ; + +%% /* start of programs */ + +yylex() { /* lexical analysis routine */ + /* returns LETTER for a lower case letter, yylval = 0 through 25 */ + /* return DIGIT for a digit, yylval = 0 through 9 */ + /* all other characters are returned immediately */ + + int c; + + while( (c=getchar()) == \' \' ) { /* skip blanks */ } + + /* c is now nonblank */ + + if( islower( c ) ) { + yylval = c \- \'a\'; + return ( LETTER ); + } + if( isdigit( c ) ) { + yylval = c \- \'0\'; + return( DIGIT ); + } + return( c ); + } +.fi +.bp diff --git a/share/doc/psd/15.yacc/ssb b/share/doc/psd/15.yacc/ssb new file mode 100644 index 0000000..2dba020 --- /dev/null +++ b/share/doc/psd/15.yacc/ssb @@ -0,0 +1,147 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+.\" +.\" @(#)ssb 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +Appendix B: Yacc Input Syntax +.PP +This Appendix has a description of the Yacc input syntax, as a Yacc specification. +Context dependencies, etc., are not considered. +Ironically, the Yacc input specification language +is most naturally specified as an LR(2) grammar; the sticky +part comes when an identifier is seen in a rule, immediately +following an action. +If this identifier is followed by a colon, it is the start of the +next rule; otherwise +it is a continuation of the current rule, which just happens to have +an action embedded in it. +As implemented, the lexical analyzer looks +ahead after seeing an identifier, and +decides whether the next token (skipping blanks, newlines, comments, etc.) +is a colon. +If so, it returns the token C_IDENTIFIER. +Otherwise, it returns IDENTIFIER. +Literals (quoted strings) are also returned as IDENTIFIERs, +but never as part of C_IDENTIFIERs. +.sp +.nf +.ta .6i 1.2i 1.8i 2.4i 3i 3.6i + + /* grammar for the input to Yacc */ + + /* basic entities */ +%token IDENTIFIER /* includes identifiers and literals */ +%token C_IDENTIFIER /* identifier (but not literal) followed by colon */ +%token NUMBER /* [0-9]+ */ + + /* reserved words: %type => TYPE, %left => LEFT, etc.
*/ + +%token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION + +%token MARK /* the %% mark */ +%token LCURL /* the %{ mark */ +%token RCURL /* the %} mark */ + + /* ascii character literals stand for themselves */ + +%start spec + +%% + +spec : defs MARK rules tail + ; + +tail : MARK { \fIIn this action, eat up the rest of the file\fR } + | /* empty: the second MARK is optional */ + ; + +defs : /* empty */ + | defs def + ; + +def : START IDENTIFIER + | UNION { \fICopy union definition to output\fR } + | LCURL { \fICopy C code to output file\fR } RCURL + | ndefs rword tag nlist + ; + +rword : TOKEN + | LEFT + | RIGHT + | NONASSOC + | TYPE + ; + +tag : /* empty: union tag is optional */ + | \'<\' IDENTIFIER \'>\' + ; + +nlist : nmno + | nlist nmno + | nlist \',\' nmno + ; + +nmno : IDENTIFIER /* NOTE: literal illegal with %type */ + | IDENTIFIER NUMBER /* NOTE: illegal with %type */ + ; + + /* rules section */ + +rules : C_IDENTIFIER rbody prec + | rules rule + ; + +rule : C_IDENTIFIER rbody prec + | '|' rbody prec + ; + +rbody : /* empty */ + | rbody IDENTIFIER + | rbody act + ; + +act : \'{\' { \fICopy action, translate $$, etc.\fR } \'}\' + ; + +prec : /* empty */ + | PREC IDENTIFIER + | PREC IDENTIFIER act + | prec \';\' + ; +.fi +.bp diff --git a/share/doc/psd/15.yacc/ssc b/share/doc/psd/15.yacc/ssc new file mode 100644 index 0000000..95fca5c --- /dev/null +++ b/share/doc/psd/15.yacc/ssc @@ -0,0 +1,347 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. 
+.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" @(#)ssc 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.SH +Appendix C: An Advanced Example +.PP +This Appendix gives an example of a grammar using some +of the advanced features discussed in Section 10. +The desk calculator example in Appendix A is +modified to provide a desk calculator that +does floating point interval arithmetic. 
+The calculator understands floating point +constants, the arithmetic operations +, \-, *, /, +unary \-, and = (assignment), and has 26 floating +point variables, ``a'' through ``z''. +Moreover, it also understands +.I intervals , +written +.DS + ( x , y ) +.DE +where +.I x +is less than or equal to +.I y . +There are 26 interval valued variables ``A'' through ``Z'' +that may also be used. +The usage is similar to that in Appendix A; assignments +return no value, and print nothing, while expressions print +the (floating or interval) value. +.PP +This example explores a number of interesting features +of Yacc and C. +Intervals are represented by a structure, consisting of the +left and right endpoint values, stored as +.I double 's. +This structure is given a type name, INTERVAL, by using +.I typedef . +The Yacc value stack can also contain floating point scalars, and +integers (used to index into the arrays holding the variable values). +Notice that this entire strategy depends strongly on being able to +assign structures and unions in C. +In fact, many of the actions call functions that return structures +as well. +.PP +It is also worth noting the use of YYERROR to handle error conditions: +division by an interval containing 0, and an interval presented in +the wrong order. +In effect, the error recovery mechanism of Yacc is used to throw away the +rest of the offending line. +.PP +In addition to the mixing of types on the value stack, +this grammar also demonstrates an interesting use of syntax to +keep track of the type (e.g. scalar or interval) of intermediate +expressions. +Note that a scalar can be automatically promoted to an interval if +the context demands an interval value. +This causes a large number of conflicts when the grammar is run through +Yacc: 18 Shift/Reduce and 26 Reduce/Reduce. +The problem can be seen by looking at the two input lines: +.DS + 2.5 + ( 3.5 \- 4. ) +.DE +and +.DS + 2.5 + ( 3.5 , 4. 
) +.DE +Notice that the 2.5 is to be used in an interval valued expression +in the second example, but this fact is not known until +the ``,'' is read; by this time, 2.5 is finished, and the parser cannot go back +and change its mind. +More generally, it might be necessary to look ahead an arbitrary number of +tokens to decide whether to convert a scalar to an interval. +This problem is evaded by having two rules for each binary interval +valued operator: one when the left operand is a scalar, and one when +the left operand is an interval. +In the second case, the right operand must be an interval, +so the conversion will be applied automatically. +Despite this evasion, there are still many cases where the +conversion may be applied or not, leading to the above +conflicts. +They are resolved by listing the rules that yield scalars first +in the specification file; in this way, the conflicts will +be resolved in the direction of keeping scalar +valued expressions scalar valued until they are forced to become +intervals. +.PP +This way of handling multiple types is very instructive, but not very general. +If there were many kinds of expression types, instead of just two, +the number of rules needed would increase dramatically, and the conflicts +even more dramatically. +Thus, while this example is instructive, it is better practice in a +more normal programming language environment to +keep the type information as part of the value, and not as part +of the grammar. +.PP +Finally, a word about the lexical analysis. +The only unusual feature is the treatment of floating point constants. +The C library routine +.I atof +is used to do the actual conversion from a character string +to a double precision value. +If the lexical analyzer detects an error, +it responds by returning a token that +is illegal in the grammar, provoking a syntax error +in the parser, and thence error recovery. 
+.LD + +%{ + +# include <stdio.h> +# include <ctype.h> + +typedef struct interval { + double lo, hi; + } INTERVAL; + +INTERVAL vmul(), vdiv(); + +double atof(); + +double dreg[ 26 ]; +INTERVAL vreg[ 26 ]; + +%} + +%start lines + +%union { + int ival; + double dval; + INTERVAL vval; + } + +%token <ival> DREG VREG /* indices into dreg, vreg arrays */ + +%token <dval> CONST /* floating point constant */ + +%type <dval> dexp /* expression */ + +%type <vval> vexp /* interval expression */ + + /* precedence information about the operators */ + +%left \'+\' \'\-\' +%left \'*\' \'/\' +%left UMINUS /* precedence for unary minus */ + +%% + +lines : /* empty */ + | lines line + ; + +line : dexp \'\en\' + { printf( "%15.8f\en", $1 ); } + | vexp \'\en\' + { printf( "(%15.8f , %15.8f )\en", $1.lo, $1.hi ); } + | DREG \'=\' dexp \'\en\' + { dreg[$1] = $3; } + | VREG \'=\' vexp \'\en\' + { vreg[$1] = $3; } + | error \'\en\' + { yyerrok; } + ; + +dexp : CONST + | DREG + { $$ = dreg[$1]; } + | dexp \'+\' dexp + { $$ = $1 + $3; } + | dexp \'\-\' dexp + { $$ = $1 \- $3; } + | dexp \'*\' dexp + { $$ = $1 * $3; } + | dexp \'/\' dexp + { $$ = $1 / $3; } + | \'\-\' dexp %prec UMINUS + { $$ = \- $2; } + | \'(\' dexp \')\' + { $$ = $2; } + ; + +vexp : dexp + { $$.hi = $$.lo = $1; } + | \'(\' dexp \',\' dexp \')\' + { + $$.lo = $2; + $$.hi = $4; + if( $$.lo > $$.hi ){ + printf( "interval out of order\en" ); + YYERROR; + } + } + | VREG + { $$ = vreg[$1]; } + | vexp \'+\' vexp + { $$.hi = $1.hi + $3.hi; + $$.lo = $1.lo + $3.lo; } + | dexp \'+\' vexp + { $$.hi = $1 + $3.hi; + $$.lo = $1 + $3.lo; } + | vexp \'\-\' vexp + { $$.hi = $1.hi \- $3.lo; + $$.lo = $1.lo \- $3.hi; } + | dexp \'\-\' vexp + { $$.hi = $1 \- $3.lo; + $$.lo = $1 \- $3.hi; } + | vexp \'*\' vexp + { $$ = vmul( $1.lo, $1.hi, $3 ); } + | dexp \'*\' vexp + { $$ = vmul( $1, $1, $3 ); } + | vexp \'/\' vexp + { if( dcheck( $3 ) ) YYERROR; + $$ = vdiv( $1.lo, $1.hi, $3 ); } + | dexp \'/\' vexp + { if( dcheck( $3 ) ) YYERROR; + $$ = 
vdiv( $1, $1, $3 ); } + | \'\-\' vexp %prec UMINUS + { $$.hi = \-$2.lo; $$.lo = \-$2.hi; } + | \'(\' vexp \')\' + { $$ = $2; } + ; + +%% + +# define BSZ 50 /* buffer size for floating point numbers */ + + /* lexical analysis */ + +yylex(){ + register c; + + while( (c=getchar()) == \' \' ){ /* skip over blanks */ } + + if( isupper( c ) ){ + yylval.ival = c \- \'A\'; + return( VREG ); + } + if( islower( c ) ){ + yylval.ival = c \- \'a\'; + return( DREG ); + } + + if( isdigit( c ) || c==\'.\' ){ + /* gobble up digits, points, exponents */ + + char buf[BSZ+1], *cp = buf; + int dot = 0, exp = 0; + + for( ; (cp\-buf)<BSZ ; ++cp,c=getchar() ){ + + *cp = c; + if( isdigit( c ) ) continue; + if( c == \'.\' ){ + if( dot++ || exp ) return( \'.\' ); /* will cause syntax error */ + continue; + } + + if( c == \'e\' ){ + if( exp++ ) return( \'e\' ); /* will cause syntax error */ + continue; + } + + /* end of number */ + break; + } + *cp = \'\e0\'; + if( (cp\-buf) >= BSZ ) printf( "constant too long: truncated\en" ); + else ungetc( c, stdin ); /* push back last char read */ + yylval.dval = atof( buf ); + return( CONST ); + } + return( c ); + } + +INTERVAL hilo( a, b, c, d ) double a, b, c, d; { + /* returns the smallest interval containing a, b, c, and d */ + /* used by *, / routines */ + INTERVAL v; + + if( a>b ) { v.hi = a; v.lo = b; } + else { v.hi = b; v.lo = a; } + + if( c>d ) { + if( c>v.hi ) v.hi = c; + if( d<v.lo ) v.lo = d; + } + else { + if( d>v.hi ) v.hi = d; + if( c<v.lo ) v.lo = c; + } + return( v ); + } + +INTERVAL vmul( a, b, v ) double a, b; INTERVAL v; { + return( hilo( a*v.hi, a*v.lo, b*v.hi, b*v.lo ) ); + } + +dcheck( v ) INTERVAL v; { + if( v.hi >= 0. && v.lo <= 0. 
){ + printf( "divisor interval contains 0.\en" ); + return( 1 ); + } + return( 0 ); + } + +INTERVAL vdiv( a, b, v ) double a, b; INTERVAL v; { + return( hilo( a/v.hi, a/v.lo, b/v.hi, b/v.lo ) ); + } +.DE +.bp diff --git a/share/doc/psd/15.yacc/ssd b/share/doc/psd/15.yacc/ssd new file mode 100644 index 0000000..988e0a0 --- /dev/null +++ b/share/doc/psd/15.yacc/ssd @@ -0,0 +1,76 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" @(#)ssd 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.SH +Appendix D: Old Features Supported but not Encouraged +.PP +This Appendix mentions synonyms and features which are supported for historical +continuity, but, for various reasons, are not encouraged. +.IP 1. +Literals may also be delimited by double quotes ``"''. +.IP 2. +Literals may be more than one character long. +If all the characters are alphabetic, numeric, or \_, the type number of the literal is defined, +just as if the literal did not have the quotes around it. +Otherwise, it is difficult to find the value for such literals. +.IP +The use of multi-character literals is likely to mislead those unfamiliar with +Yacc, since it suggests that Yacc is doing a job which must be actually done by the lexical analyzer. +.IP 3. +Most places where % is legal, backslash ``\e'' may be used. +In particular, \e\e is the same as %%, \eleft the same as %left, etc. +.IP 4. +There are a number of other synonyms: +.DS +%< is the same as %left +%> is the same as %right +%binary and %2 are the same as %nonassoc +%0 and %term are the same as %token +%= is the same as %prec +.DE +.IP 5. +Actions may also have the form +.DS +={ . . . } +.DE +and the curly braces can be dropped if the action is a +single C statement. +.IP 6. +C code between %{ and %} used to be permitted at the +head of the rules section, as well as in the +declaration section. 
diff --git a/share/doc/psd/16.lex/Makefile b/share/doc/psd/16.lex/Makefile new file mode 100644 index 0000000..6dea7c0 --- /dev/null +++ b/share/doc/psd/16.lex/Makefile @@ -0,0 +1,9 @@ +# @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= psd/16.lex +SRCS= lex.ms +MACROS= -ms +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/psd/16.lex/lex.ms b/share/doc/psd/16.lex/lex.ms new file mode 100644 index 0000000..8b3c82e --- /dev/null +++ b/share/doc/psd/16.lex/lex.ms @@ -0,0 +1,2345 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)lex.ms 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.nr tm 0 +.de MH +Bell Laboratories, Murray Hill, NJ 07974. +.. +.EH 'PSD:16-%''Lex \- A Lexical Analyzer Generator' +.OH 'Lex \- A Lexical Analyzer Generator''PSD:16-%' +.hc ~ +.bd I 2 +.de TS +.br +.nf +.sp 1v +.ul 0 +.. +.de TE +.sp 1v +.fi +.. +.\".de PT +.\".if \\n%>1 'tl ''\s7LEX\s0\s9\(mi%\s0'' +.\".if \\n%>1 'sp +.\".. +.ND July 21, 1975 +.\".RP +.\".TM 75-1274-15 39199 39199-11 +.TL +Lex \- A Lexical Analyzer ~Generator~ +.AU ``MH 2C-569'' 6377 +M. E. Lesk and E. Schmidt +.AI +.MH +.AB +.ps +1 +NOTE: This document describes the historical Unix version of \fIlex\fP. +FreeBSD is supplied with \fIflex\fP\| which is a compatible replacement. +See the extensive documentation in \fIflex(1)\fP\| for details. +.ps +.sp +.bd I 2 +.\".nr PS 8 +.\".nr VS 9 +.\".ps 8 +.\".vs 9p +Lex helps write programs whose control flow +is directed by instances of regular +expressions in the input stream. +It is well suited for editor-script type transformations and +for segmenting input in preparation for +a parsing routine. +.PP +Lex source is a table of regular expressions and corresponding program fragments. +The table is translated to a program +which reads an input stream, copying it to an output stream +and partitioning the input +into strings which match the given expressions. +As each such string is recognized the corresponding +program fragment is executed. 
+The recognition of the expressions +is performed by a deterministic finite automaton +generated by Lex. +The program fragments written by the user are executed in the order in which the +corresponding regular expressions occur in the input stream. +.PP +The lexical analysis +programs written with Lex accept ambiguous specifications +and choose the longest +match possible at each input point. +If necessary, substantial look~ahead +is performed on the input, but the +input stream will be backed up to the +end of the current partition, so that the user +has general freedom to manipulate it. +.PP +Lex can generate analyzers in either C or Ratfor, a language +which can be translated automatically to portable Fortran. +It is available on the PDP-11 UNIX, Honeywell GCOS, +and IBM OS systems. +This manual, however, will only discuss generating analyzers +in C on the UNIX system, which is the only supported +form of Lex under UNIX Version 7. +Lex is designed to simplify +interfacing with Yacc, for those +with access to this compiler-compiler system. +.\".nr PS 9 +.\".nr VS 11 +.AE +.2C +.NH +Introduction. +.PP +Lex is a program generator designed for +lexical processing of character input streams. +It accepts a high-level, problem oriented specification +for character string matching, +and +produces a program in a general purpose language which recognizes +regular expressions. +The regular expressions are specified by the user in the +source specifications given to Lex. +The Lex written code recognizes these expressions +in an input stream and partitions the input stream into +strings matching the expressions. At the bound~aries +between strings +program sections +provided by the user are executed. +The Lex source file associates the regular expressions and the +program fragments. +As each expression appears in the input to the program written by Lex, +the corresponding fragment is executed. 
+.PP +The user supplies the additional code +beyond expression matching +needed to complete his tasks, possibly +including code written by other generators. +The program that recognizes the expressions is generated in the +general purpose programming language employed for the +user's program fragments. +Thus, a high level expression +language is provided to write the string expressions to be +matched while the user's freedom to write actions +is unimpaired. +This avoids forcing the user who wishes to use a string manipulation +language for input analysis to write processing programs in the same +and often inappropriate string handling language. +.PP +Lex is not a complete language, but rather a generator representing +a new language feature which can be added to +different programming languages, called ``host languages.'' +Just as general purpose languages +can produce code to run on different computer hardware, +Lex can write code in different host languages. +The host language is used for the output code generated by Lex +and also for the program fragments added by the user. +Compatible run-time libraries for the different host languages +are also provided. +This makes Lex adaptable to different environments and +different users. +Each application +may be directed to the combination of hardware and host language appropriate +to the task, the user's background, and the properties of local +implementations. +At present, the only supported host language is C, +although Fortran (in the form of Ratfor [2]) has been available +in the past. +Lex itself exists on UNIX, GCOS, and OS/370; but the +code generated by Lex may be taken anywhere the appropriate +compilers exist. +.PP +Lex turns the user's expressions and actions +(called +.ul +source +in this memo) into the host general-purpose language; +the generated program is named +.ul +yylex.
+The +.ul +yylex +program +will recognize expressions +in a stream +(called +.ul +input +in this memo) +and perform the specified actions for each expression as it is detected. +See Figure 1. +.\" .GS +.TS +center; +l _ r +l|c|r +l _ r +l _ r +l|c|r +l _ r +c s s +c s s. + +Source \(-> Lex \(-> yylex + +.sp 2 + +Input \(-> yylex \(-> Output + +.sp +An overview of Lex +Figure 1 +.TE +.\" .GE +.PP +For a trivial example, consider a program to delete +from the input +all blanks or tabs at the ends of lines. +.TS +center; +l l. +%% +[ \et]+$ ; +.TE +is all that is required. +The program +contains a %% delimiter to mark the beginning of the rules, and +one rule. +This rule contains a regular expression +which matches one or more +instances of the characters blank or tab +(written \et for visibility, in accordance with the C language convention) +just prior to the end of a line. +The brackets indicate the character +class made of blank and tab; the + indicates ``one or more ...''; +and the $ indicates ``end of line,'' as in QED. +No action is specified, +so the program generated by Lex (yylex) will ignore these characters. +Everything else will be copied. +To change any remaining +string of blanks or tabs to a single blank, +add another rule: +.TS +center; +l l. +%% +[ \et]+$ ; +[ \et]+ printf(" "); +.TE +The finite automaton generated for this +source will scan for both rules at once, +observing at +the termination of the string of blanks or tabs +whether or not there is a newline character, and executing +the desired rule action. +The first rule matches all strings of blanks or tabs +at the end of lines, and the second +rule all remaining strings of blanks or tabs. +.PP +Lex can be used alone for simple transformations, or +for analysis and statistics gathering on a lexical level. +Lex can also be used with a parser generator +to perform the lexical analysis phase; it is particularly +easy to interface Lex and Yacc [3]. 
+Lex programs recognize only regular expressions; +Yacc writes parsers that accept a large class of context free grammars, +but require a lower level analyzer to recognize input tokens. +Thus, a combination of Lex and Yacc is often appropriate. +When used as a preprocessor for a later parser generator, +Lex is used to partition the input stream, +and the parser generator assigns structure to +the resulting pieces. +The flow of control +in such a case (which might be the first half of a compiler, +for example) is shown in Figure 2. +Additional programs, +written by other generators +or by hand, can +be added easily to programs written by Lex. +.\" .BS 2 +.ps 9 +.vs 11 +.TS +center; +l c c c l +l c c c l +l c c c l +l _ c _ l +l|c|c|c|l +l _ c _ l +l c c c l +l _ c _ l +l|c|c|c|l +l _ c _ l +l c s s l +l c s s l. + lexical grammar + rules rules + \(da \(da + + Lex Yacc + + \(da \(da + +Input \(-> yylex \(-> yyparse \(-> Parsed input + +.sp + Lex with Yacc + Figure 2 +.TE +.ps 10 +.vs 12 +.\" .BE +Yacc users +will realize that the name +.ul +yylex +is what Yacc expects its lexical analyzer to be named, +so that the use of this name by Lex simplifies +interfacing. +.PP +Lex generates a deterministic finite automaton from the regular expressions +in the source [4]. +The automaton is interpreted, rather than compiled, in order +to save space. +The result is still a fast analyzer. +In particular, the time taken by a Lex program +to recognize and partition an input stream is +proportional to the length of the input. +The number of Lex rules or +the complexity of the rules is +not important in determining speed, +unless rules which include +forward context require a significant amount of re~scanning. +What does increase with the number and complexity of rules +is the size of the finite +automaton, and therefore the size of the program +generated by Lex. 
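As an illustration of this single-pass behavior, the two blank-handling rules from the introduction can be hand-coded in C. This is only a sketch under stated assumptions: squeeze_blanks is a hypothetical name, it is not the code Lex generates, and it works on a string in memory rather than on the input stream. Like the generated automaton, it looks at each input character once and defers its decision until the run of blanks or tabs has ended.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not produced by Lex): applies the two rules
 *     [ \t]+$   ;
 *     [ \t]+    printf(" ");
 * to the string src and writes the result to dst, which is assumed to
 * have room for at least strlen(src) + 1 bytes.
 * Returns the number of characters written, not counting the final '\0'.
 */
size_t squeeze_blanks(const char *src, char *dst)
{
    size_t n = 0;
    int pending = 0;    /* nonzero while inside a run of blanks or tabs */

    for (; *src != '\0'; ++src) {
        if (*src == ' ' || *src == '\t') {
            pending = 1;        /* decide only when the run ends */
            continue;
        }
        if (pending && *src != '\n')
            dst[n++] = ' ';     /* run inside a line: one blank */
        /* run just before a newline: emit nothing */
        pending = 0;
        dst[n++] = *src;
    }
    if (pending)
        dst[n++] = ' ';         /* run at end of input with no newline */
    dst[n] = '\0';
    return n;
}
```

Given the input "a  \tb \t\nc\n", this sketch should leave "a b\nc\n" in dst. The work done per character is constant, so the running time grows with the length of the input and not with the number of rules being imitated.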
+.PP +In the program written by Lex, the user's fragments +(representing the +.ul +actions +to be performed as each regular expression +is found) +are gathered +as cases of a switch. +The automaton interpreter directs the control flow. +Opportunity is provided for the user to insert either +declarations or additional statements in the routine containing +the actions, or to +add subroutines outside this action routine. +.PP +Lex is not limited to source which can +be interpreted on the basis of one character +look~ahead. +For example, +if there are two rules, one looking for +.I ab +and another for +.I abcdefg , +and the input stream is +.I abcdefh , +Lex will recognize +.I ab +and leave +the input pointer just before +.I "cd. . ." +Such backup is more costly +than the processing of simpler languages. +.2C +.NH +Lex Source. +.PP +The general format of Lex source is: +.TS +center; +l. +{definitions} +%% +{rules} +%% +{user subroutines} +.TE +where the definitions and the user subroutines +are often omitted. +The second +.I %% +is optional, but the first is required +to mark the beginning of the rules. +The absolute minimum Lex program is thus +.TS +center; +l. +%% +.TE +(no definitions, no rules) which translates into a program +which copies the input to the output unchanged. +.PP +In the outline of Lex programs shown above, the +.I +rules +.R +represent the user's control +decisions; they are a table, in which the left column +contains +.I +regular expressions +.R +(see section 3) +and the right column contains +.I +actions, +.R +program fragments to be executed when the expressions +are recognized. +Thus an individual rule might appear +.TS +center; +l l. +integer printf("found keyword INT"); +.TE +to look for the string +.I integer +in the input stream and +print the message ``found keyword INT'' whenever it appears. +In this example the host procedural language is C and +the C library function +.I +printf +.R +is used to print the string. 
+The end +of the expression is indicated by the first blank or tab character. +If the action is merely a single C expression, +it can just be given on the right side of the line; if it is +compound, or takes more than a line, it should be enclosed in +braces. +As a slightly more useful example, suppose it is desired to +change a number of words from British to American spelling. +Lex rules such as +.TS +center; +l l. +colour printf("color"); +mechanise printf("mechanize"); +petrol printf("gas"); +.TE +would be a start. These rules are not quite enough, +since +the word +.I petroleum +would become +.I gaseum ; +a way of dealing +with this will be described later. +.2C +.NH +Lex Regular Expressions. +.PP +The definitions of regular expressions are very similar to those +in QED [5]. +A regular +expression specifies a set of strings to be matched. +It contains text characters (which match the corresponding +characters in the strings being compared) +and operator characters (which specify +repetitions, choices, and other features). +The letters of the alphabet and the digits are +always text characters; thus the regular expression +.TS +center; +l l. +integer +.TE +matches the string +.ul +integer +wherever it appears +and the expression +.TS +center; +l. +a57D +.TE +looks for the string +.ul +a57D. +.PP +.I +Operators. +.R +The operator characters are +.TS +center; +l. +" \e [ ] ^ \- ? . \(** + | ( ) $ / { } % < > +.TE +and if they are to be used as text characters, an escape +should be used. +The quotation mark operator (") +indicates that whatever is contained between a pair of quotes +is to be taken as text characters. +Thus +.TS +center; +l. +xyz"++" +.TE +matches the string +.I xyz++ +when it appears. Note that a part of a string may be quoted. +It is harmless but unnecessary to quote an ordinary +text character; the expression +.TS +center; +l. +"xyz++" +.TE +is the same as the one above. 
+Thus by quoting every non-alphanumeric character +being used as a text character, the user can avoid remembering +the list above of current +operator characters, and is safe should further extensions to Lex +lengthen the list. +.PP +An operator character may also be turned into a text character +by preceding it with \e as in +.TS +center; +l. +xyz\e+\e+ +.TE +which +is another, less readable, equivalent of the above expressions. +Another use of the quoting mechanism is to get a blank into +an expression; normally, as explained above, blanks or tabs end +a rule. +Any blank character not contained within [\|] (see below) must +be quoted. +Several normal C escapes with \e +are recognized: \en is newline, \et is tab, and \eb is backspace. +To enter \e itself, use \e\e. +Since newline is illegal in an expression, \en must be used; +it is not +required to escape tab and backspace. +Every character but blank, tab, newline and the list above is always +a text character. +.PP +.I +Character classes. +.R +Classes of characters can be specified using the operator pair [\|]. +The construction +.I [abc] +matches a +single character, which may be +.I a , +.I b , +or +.I c . +Within square brackets, +most operator meanings are ignored. +Only three characters are special: +these are \e \(mi and ^. The \(mi character +indicates ranges. For example, +.TS +center; +l. +[a\(miz0\(mi9<>_] +.TE +indicates the character class containing all the lower case letters, +the digits, +the angle brackets, and underline. +Ranges may be given in either order. +Using \(mi between any pair of characters which are +not both upper case letters, both lower case letters, or both digits +is implementation dependent and will get a warning message. +(E.g., [0\-z] in ASCII is many more characters +than it is in EBCDIC). +If it is desired to include the +character \(mi in a character class, it should be first or +last; thus +.TS +center; +l. +[\(mi+0\(mi9] +.TE +matches all the digits and the two signs. 
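The range and minus-sign conventions just described can be sketched as a membership test in C. This is an illustration only, not how Lex itself stores classes: the function name in_class is hypothetical, its argument is the text between the brackets, and neither backslash escapes nor a leading ^ are handled.

```c
#include <string.h>

/* Returns 1 if character c belongs to the class whose body (the text
   between [ and ]) is cls, and 0 otherwise. A minus sign that is first
   or last in the body is taken as a text character. */
int in_class(const char *cls, int c)
{
    size_t n = strlen(cls);
    size_t i;

    for (i = 0; i < n; i++) {
        if (i + 2 < n && cls[i + 1] == '-') {
            /* A range such as a-z; either order is accepted. */
            unsigned char lo = (unsigned char)cls[i];
            unsigned char hi = (unsigned char)cls[i + 2];
            if (lo > hi) {
                unsigned char t = lo;
                lo = hi;
                hi = t;
            }
            if (c >= lo && c <= hi)
                return 1;
            i += 2;                  /* skip the rest of the range */
        } else if (c == (unsigned char)cls[i]) {
            return 1;                /* a single text character */
        }
    }
    return 0;
}
```

With this sketch, in_class("-+0-9", '-') is true because the minus sign comes first, while in "a-z0-9<>_" each minus sign stands between two characters and marks a range.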
+.PP +In character classes, +the ^ operator must appear as the first character +after the left bracket; it indicates that the resulting string +is to be complemented with respect to the computer character set. +Thus +.TS +center; +l. +[^abc] +.TE +matches all characters except a, b, or c, including +all special or control characters; or +.TS +center; +l. +[^a\-zA\-Z] +.TE +is any character which is not a letter. +The \e character provides the usual escapes within +character class brackets. +.PP +.I +Arbitrary character. +.R +To match almost any character, the operator character +.TS +center; +l. +\&. +.TE +is the class of all characters except newline. +Escaping into octal is possible although non-portable: +.TS +center; +l. +[\e40\-\e176] +.TE +matches all printable characters in the ASCII character set, from octal +40 (blank) to octal 176 (tilde). +.PP +.I +Optional expressions. +.R +The operator +.I ? +indicates +an optional element of an expression. +Thus +.TS +center; +l. +ab?c +.TE +matches either +.I ac +or +.I abc . +.PP +.I +Repeated expressions. +.R +Repetitions of classes are indicated by the operators +.I \(** +and +.I + . +.TS +center; +l. +\f2a\(**\f1 +.TE +is any number of consecutive +.I a +characters, including zero; while +.TS +center; +l. +a+ +.TE +is one or more instances of +.I a. +For example, +.TS +center; +l. +[a\-z]+ +.TE +is all strings of lower case letters. +And +.TS +center; +l. +[A\(miZa\(miz][A\(miZa\(miz0\(mi9]\(** +.TE +indicates all alphanumeric strings with a leading +alphabetic character. +This is a typical expression for recognizing identifiers in +computer languages. +.PP +.I +Alternation and Grouping. +.R +The operator | +indicates alternation: +.TS +center; +l. +(ab\||\|cd) +.TE +matches either +.ul +ab +or +.ul +cd. +Note that parentheses are used for grouping, although +they are +not necessary on the outside level; +.TS +center; +l. +ab\||\|cd +.TE +would have sufficed. 
+Parentheses +can be used for more complex expressions: +.TS +center; +l. +(ab\||\|cd+)?(ef)\(** +.TE +matches such strings as +.I abefef , +.I efefef , +.I cdef , +or +.I cddd\| ; +but not +.I abc , +.I abcd , +or +.I abcdef . +.PP +.I +Context sensitivity. +.R +Lex will recognize a small amount of surrounding +context. The two simplest operators for this are +.I ^ +and +.I $ . +If the first character of an expression is +.I ^ , +the expression will only be matched at the beginning +of a line (after a newline character, or at the beginning of +the input stream). +This can never conflict with the other meaning of +.I ^ , +complementation +of character classes, since that only applies within +the [\|] operators. +If the very last character is +.I $ , +the expression will only be matched at the end of a line (when +immediately followed by newline). +The latter operator is a special case of the +.I / +operator character, +which indicates trailing context. +The expression +.TS +center; +l. +ab/cd +.TE +matches the string +.I ab , +but only if followed by +.ul +cd. +Thus +.TS +center; +l. +ab$ +.TE +is the same as +.TS +center; +l. +ab/\en +.TE +Left context is handled in Lex by +.I +start conditions +.R +as explained in section 10. If a rule is only to be executed +when the Lex automaton interpreter is in start condition +.I +x, +.R +the rule should be prefixed by +.TS +center; +l. +<x> +.TE +using the angle bracket operator characters. +If we considered ``being at the beginning of a line'' to be +start condition +.I ONE , +then the ^ operator +would be equivalent to +.TS +center; +l. +<ONE> +.TE +Start conditions are explained more fully later. +.PP +.I +Repetitions and Definitions. +.R +The operators {} specify +either repetitions (if they enclose numbers) +or +definition expansion (if they enclose a name). For example +.TS +center; +l. +{digit} +.TE +looks for a predefined string named +.I digit +and inserts it +at that point in the expression. 
+The definitions are given in the first part of the Lex +input, before the rules. +In contrast, +.TS +center; +l. +a{1,5} +.TE +looks for 1 to 5 occurrences of +.I a . +.PP +Finally, initial +.I % +is special, being the separator +for Lex source segments. +.2C +.NH +Lex Actions. +.PP +When an expression written as above is matched, Lex +executes the corresponding action. This section describes +some features of Lex which aid in writing actions. Note +that there is a default action, which +consists of copying the input to the output. This +is performed on all strings not otherwise matched. Thus +the Lex user who wishes to absorb the entire input, without +producing any output, must provide rules to match everything. +When Lex is being used with Yacc, this is the normal +situation. +One may consider that actions are what is done instead of +copying the input to the output; thus, in general, +a rule which merely copies can be omitted. +Also, a character combination +which is omitted from the rules +and which appears as input +is likely to be printed on the output, thus calling +attention to the gap in the rules. +.PP +One of the simplest things that can be done is to ignore +the input. Specifying a C null statement, \fI;\fR as an action +causes this result. A frequent rule is +.TS +center; +l l. +[ \et\en] ; +.TE +which causes the three spacing characters (blank, tab, and newline) +to be ignored. +.PP +Another easy way to avoid writing actions is the action character +|, which indicates that the action for this rule is the action +for the next rule. +The previous example could also have been written +.TS +center, tab(#); +l l. +" "#| +"\et"#| +"\en"#; +.TE +with the same result, although in different style. +The quotes around \en and \et are not required. +.PP +In more complex actions, the user +will +often want to know the actual text that matched some expression +like +.I [a\(miz]+ . +Lex leaves this text in an external character +array named +.I +yytext. 
+.R +Thus, to print the name found, +a rule like +.TS +center; +l l. +[a\-z]+ printf("%s", yytext); +.TE +will print +the string in +.I +yytext. +.R +The C function +.I +printf +.R +accepts a format argument and data to be printed; +in this case, the format is ``print string'' (% indicating +data conversion, and +.I s +indicating string type), +and the data are the characters +in +.I +yytext. +.R +So this just places +the matched string +on the output. +This action +is so common that +it may be written as ECHO: +.TS +center; +l l. +[a\-z]+ ECHO; +.TE +is the same as the above. +Since the default action is just to +print the characters found, one might ask why +give a rule, like this one, which merely specifies +the default action? +Such rules are often required +to avoid matching some other rule +which is not desired. For example, if there is a rule +which matches +.I read +it will normally match the instances of +.I read +contained in +.I bread +or +.I readjust ; +to avoid +this, +a rule +of the form +.I [a\(miz]+ +is needed. +This is explained further below. +.PP +Sometimes it is more convenient to know the end of what +has been found; hence Lex also provides a count +.I +yyleng +.R +of the number of characters matched. +To count both the number +of words and the number of characters in words in the input, the user might write +.TS +center; +l l. +[a\-zA\-Z]+ {words++; chars += yyleng;} +.TE +which accumulates in +.ul +chars +the number +of characters in the words recognized. +The last character in the string matched can +be accessed by +.TS +center; +l. +yytext[yyleng\-1] +.TE +.PP +Occasionally, a Lex +action may decide that a rule has not recognized the correct +span of characters. +Two routines are provided to aid with this situation. +First, +.I +yymore() +.R +can be called to indicate that the next input expression recognized is to be +tacked on to the end of this input. Normally, +the next input string would overwrite the current +entry in +.I +yytext. 
+.R +Second, +.I +yyless (n) +.R +may be called to indicate that not all the characters matched +by the currently successful expression are wanted right now. +The argument +.I +n +.R +indicates the number of characters +in +.I +yytext +.R +to be retained. +Further characters previously matched +are +returned to the input. This provides the same sort of +look~ahead offered by the / operator, +but in a different form. +.PP +.I +Example: +.R +Consider a language which defines +a string as a set of characters between quotation (") marks, and provides that +to include a " in a string it must be preceded by a \e. The +regular expression which matches that is somewhat confusing, +so that it might be preferable to write +.TS +center; +l l. +\e"[^"]\(** { + if (yytext[yyleng\-1] == \(fm\e\e\(fm) + yymore(); + else + ... normal user processing + } +.TE +which will, when faced with a string such as +.I +"abc\e"def\|" +.R +first match +the five characters +\fI"abc\e\|\fR; +then +the call to +.I yymore() +will +cause the next part of the string, +\fI"def\|\fR, +to be tacked on the end. +Note that the final quote terminating the string should be picked +up in the code labeled ``normal processing''. +.PP +The function +.I +yyless() +.R +might be used to reprocess +text in various circumstances. Consider the C problem of distinguishing +the ambiguity of ``=\(mia''. +Suppose it is desired to treat this as ``=\(mi a'' +but print a message. A rule might be +.ps 9 +.vs 11 +.TS +center; +l l. +=\(mi[a\-zA\-Z] { + printf("Op (=\(mi) ambiguous\en"); + yyless(yyleng\-1); + ... action for =\(mi ... + } +.TE +.ps 10 +.vs 12 +which prints a message, returns the letter after the +operator to the input stream, and treats the operator as ``=\(mi''. +Alternatively it might be desired to treat this as ``= \(mia''. +To do this, just return the minus +sign as well as the letter to the input: +.ps 9 +.vs 11 +.TS +center; +l l. 
+=\(mi[a\-zA\-Z] { + printf("Op (=\(mi) ambiguous\en"); + yyless(yyleng\-2); + ... action for = ... + } +.TE +.ps 10 +.vs 12 +will perform the other interpretation. +Note that the expressions for the two cases might more easily +be written +.TS +center; +l l. +=\(mi/[A\-Za\-z] +.TE +in the first case and +.TS +center; +l. +=/\-[A\-Za\-z] +.TE +in the second; +no backup would be required in the rule action. +It is not necessary to recognize the whole identifier +to observe the ambiguity. +The +possibility of ``=\(mi3'', however, makes +.TS +center; +l. +=\(mi/[^ \et\en] +.TE +a still better rule. +.PP +In addition to these routines, Lex also permits +access to the I/O routines +it uses. +They are: +.IP 1) +.I +input() +.R +which returns the next input character; +.IP 2) +.I +output(c) +.R +which writes the character +.I +c +.R +on the output; and +.IP 3) +.I +unput(c) +.R +pushes the character +.I +c +.R +back onto the input stream to be read later by +.I +input(). +.R +.LP +By default these routines are provided as macro definitions, +but the user can override them and supply private versions. +These routines +define the relationship between external files and +internal characters, and must all be retained +or modified consistently. +They may be redefined, to +cause input or output to be transmitted to or from strange +places, including other programs or internal memory; +but the character set used must be consistent in all routines; +a value of zero returned by +.I +input +.R +must mean end of file; and +the relationship between +.I +unput +.R +and +.I +input +.R +must be retained +or the Lex look~ahead will not work. +Lex does not look ahead at all if it does not have to, +but every rule ending in +.ft I ++ \(** ? +.ft R +or +.ft I +$ +.ft R +or containing +.ft I +/ +.ft R +implies look~ahead. +Look~ahead is also necessary to match an expression that is a prefix +of another expression. +See below for a discussion of the character set used by Lex. 
+The standard Lex library imposes +a 100 character limit on backup. +.PP +Another Lex library routine that the user will sometimes want +to redefine is +.I +yywrap() +.R +which is called whenever Lex reaches an end-of-file. +If +.I +yywrap +.R +returns a 1, Lex continues with the normal wrapup on end of input. +Sometimes, however, it is convenient to arrange for more +input to arrive +from a new source. +In this case, the user should provide +a +.I +yywrap +.R +which +arranges for new input and +returns 0. This instructs Lex to continue processing. +The default +.I +yywrap +.R +always returns 1. +.PP +This routine is also a convenient place +to print tables, summaries, etc. at the end +of a program. Note that it is not +possible to write a normal rule which recognizes +end-of-file; the only access to this condition is +through +.I +yywrap. +.R +In fact, unless a private version of +.I +input() +.R +is supplied +a file containing nulls +cannot be handled, +since a value of 0 returned by +.I +input +.R +is taken to be end-of-file. +.PP +.2C +.NH +Ambiguous Source Rules. +.PP +Lex can handle ambiguous specifications. +When more than one expression can match the +current input, Lex chooses as follows: +.IP 1) +The longest match is preferred. +.IP 2) +Among rules which matched the same number of characters, +the rule given first is preferred. +.LP +Thus, suppose the rules +.TS +center; +l l. +integer keyword action ...; +[a\-z]+ identifier action ...; +.TE +to be given in that order. If the input is +.I integers , +it is taken as an identifier, because +.I [a\-z]+ +matches 8 characters while +.I integer +matches only 7. +If the input is +.I integer , +both rules match 7 characters, and +the keyword rule is selected because it was given first. +Anything shorter (e.g. \fIint\fR\|) will +not match the expression +.I integer +and so the identifier interpretation is used. 
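The two preference rules can be written out as a short C sketch for the keyword and identifier rules above. It is an illustration under stated assumptions rather than Lex's own method: the matcher functions and choose_rule are hypothetical names, each matcher is hand coded instead of being driven by an automaton, and ASCII is assumed.

```c
#include <string.h>

/* Rule 0: the keyword "integer". Returns the match length, or 0. */
static int match_keyword(const char *s)
{
    return strncmp(s, "integer", 7) == 0 ? 7 : 0;
}

/* Rule 1: [a-z]+ . Returns the number of leading lower case letters. */
static int match_identifier(const char *s)
{
    int n = 0;

    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/* Returns the index of the winning rule, or -1 if no rule matches.
   The length of the winning match is stored in *len. */
int choose_rule(const char *s, int *len)
{
    int (*rules[])(const char *) = { match_keyword, match_identifier };
    int best = -1;
    int i;

    *len = 0;
    for (i = 0; i < 2; i++) {
        int n = rules[i](s);
        if (n > *len) {      /* strictly longer only, so ties keep the earlier rule */
            *len = n;
            best = i;
        }
    }
    return best;
}
```

Because a later rule wins only when its match is strictly longer, the input integer goes to the keyword rule, and integers and int go to the identifier rule.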
+.PP +The principle of preferring the longest +match makes rules containing +expressions like +.I \&.\(** +dangerous. +For example, +.TS +center; +l. +\&\(fm.\(**\(fm +.TE +might seem a good way of recognizing +a string in single quotes. +But it is an invitation for the program to read far +ahead, looking for a distant +single quote. +Presented with the input +.TS +center; +l l. +\&\(fmfirst\(fm quoted string here, \(fmsecond\(fm here +.TE +the above expression will match +.TS +center; +l l. +\&\(fmfirst\(fm quoted string here, \(fmsecond\(fm +.TE +which is probably not what was wanted. +A better rule is of the form +.TS +center; +l. +\&\(fm[^\(fm\en]\(**\(fm +.TE +which, on the above input, will stop +after +.I \(fmfirst\(fm . +The consequences +of errors like this are mitigated by the fact +that the +.I \&. +operator will not match newline. +Thus expressions like +.I \&.\(** +stop on the +current line. +Don't try to defeat this with expressions like +.I (.|\en)+ +or +equivalents; +the Lex generated program will try to read +the entire input file, causing +internal buffer overflows. +.PP +Note that Lex is normally partitioning +the input stream, not searching for all possible matches +of each expression. +This means that each character is accounted for +once and only once. +For example, suppose it is desired to +count occurrences of both \fIshe\fR and \fIhe\fR in an input text. +Some Lex rules to do this might be +.TS +center; +l l. +she s++; +he h++; +\en | +\&. ; +.TE +where the last two rules ignore everything besides \fIhe\fR and \fIshe\fR. +Remember that . does not include newline. +Since \fIshe\fR includes \fIhe\fR, Lex will normally +.I +not +.R +recognize +the instances of \fIhe\fR included in \fIshe\fR, +since once it has passed a \fIshe\fR those characters are gone. +.PP +Sometimes the user would like to override this choice. 
The action +REJECT +means ``go do the next alternative.'' +It causes whatever rule was second choice after the current +rule to be executed. +The position of the input pointer is adjusted accordingly. +Suppose the user really wants to count the included instances of \fIhe\fR: +.TS +center; +l l. +she {s++; REJECT;} +he {h++; REJECT;} +\en | +\&. ; +.TE +these rules are one way of changing the previous example +to do just that. +After counting each expression, it is rejected; whenever appropriate, +the other expression will then be counted. In this example, of course, +the user could note that \fIshe\fR includes \fIhe\fR but not +vice versa, and omit the REJECT action on \fIhe\fR; +in other cases, however, it +would not be possible a priori to tell +which input characters +were in both classes. +.PP +Consider the two rules +.TS +center; +l l. +a[bc]+ { ... ; REJECT;} +a[cd]+ { ... ; REJECT;} +.TE +If the input is +.I ab , +only the first rule matches, +and on +.I ad +only the second matches. +The input string +.I accb +matches the first rule for four characters +and then the second rule for three characters. +In contrast, the input +.I accd +agrees with +the second rule for four characters and then the first +rule for three. +.PP +In general, REJECT is useful whenever +the purpose of Lex is not to partition the input +stream but to detect all examples of some items +in the input, and the instances of these items +may overlap or include each other. +Suppose a digram table of the input is desired; +normally the digrams overlap, that is the word +.I the +is considered to contain +both +.I th +and +.I he . +Assuming a two-dimensional array named +.ul +digram +to be incremented, the appropriate +source is +.TS +center; +l l. +%% +[a\-z][a\-z] { + digram[yytext[0]][yytext[1]]++; + REJECT; + } +\&. ; +\en ; +.TE +where the REJECT is necessary to pick up +a letter pair beginning at every character, rather than at every +other character. +.2C +.NH +Lex Source Definitions. 
+.PP +Remember the format of the Lex +source: +.TS +center; +l. +{definitions} +%% +{rules} +%% +{user routines} +.TE +So far only the rules have been described. The user needs +additional options, +though, to define variables for use in his program and for use +by Lex. +These can go either in the definitions section +or in the rules section. +.PP +Remember that Lex is turning the rules into a program. +Any source not intercepted by Lex is copied +into the generated program. There are three classes +of such things. +.IP 1) +Any line which is not part of a Lex rule or action +which begins with a blank or tab is copied into +the Lex generated program. +Such source input prior to the first %% delimiter will be external +to any function in the code; if it appears immediately after the first +%%, +it appears in an appropriate place for declarations +in the function written by Lex which contains the actions. +This material must look like program fragments, +and should precede the first Lex rule. +.IP +As a side effect of the above, lines which begin with a blank +or tab, and which contain a comment, +are passed through to the generated program. +This can be used to include comments in either the Lex source or +the generated code. The comments should follow the host +language convention. +.IP 2) +Anything included between lines containing +only +.I %{ +and +.I %} +is +copied out as above. The delimiters are discarded. +This format permits entering text like preprocessor statements that +must begin in column 1, +or copying lines that do not look like programs. +.IP 3) +Anything after the third %% delimiter, regardless of formats, etc., +is copied out after the Lex output. +.PP +Definitions intended for Lex are given +before the first %% delimiter. Any line in this section +not contained between %{ and %}, and beginning +in column 1, is assumed to define Lex substitution strings. +The format of such lines is +.TS +center; +l l.
+name translation +.TE +and it +causes the string given as a translation to +be associated with the name. +The name and translation +must be separated by at least one blank or tab, and the name must begin with a letter. +The translation can then be called out +by the {name} syntax in a rule. +Using {D} for the digits and {E} for an exponent field, +for example, might abbreviate rules to recognize numbers: +.TS +center, tab(#); +l l. +D#[0\-9] +E#[DEde][\-+]?{D}+ +%% +{D}+#printf("integer"); +{D}+"."{D}\(**({E})?#| +{D}\(**"."{D}+({E})?#| +{D}+{E}#printf("real"); +.TE +Note the first two rules for real numbers; +both require a decimal point and contain +an optional exponent field, +but the first requires at least one digit before the +decimal point and the second requires at least one +digit after the decimal point. +To correctly handle the problem +posed by a Fortran expression such as +.I 35.EQ.I , +which does not contain a real number, a context-sensitive +rule such as +.TS +center; +l l. +[0\-9]+/"."EQ printf("integer"); +.TE +could be used in addition to the normal rule for integers. +.PP +The definitions +section may also contain other commands, including the +selection of a host language, a character set table, +a list of start conditions, or adjustments to the default +size of arrays within Lex itself for larger source programs. +These possibilities +are discussed below under ``Summary of Source Format,'' +section 12. +.2C +.NH +Usage. +.PP +There are two steps in +compiling a Lex source program. +First, the Lex source must be turned into a generated program +in the host general purpose language. +Then this program must be compiled and loaded, usually with +a library of Lex subroutines. +The generated program +is on a file named +.I lex.yy.c . +The I/O library is defined in terms of the C standard +library [6]. 
+.PP +The C programs generated by Lex are slightly different +on OS/370, because the +OS compiler is less powerful than the UNIX or GCOS compilers, +and does less at compile time. +C programs generated on GCOS and UNIX are the same. +.PP +.I +UNIX. +.R +The library is accessed by the loader flag +.I \-ll . +So an appropriate +set of commands is +.KS +.in 5 +lex source +cc lex.yy.c \-ll +.in 0 +.KE +The resulting program is placed on the usual file +.I +a.out +.R +for later execution. +To use Lex with Yacc see below. +Although the default Lex I/O routines use the C standard library, +the Lex automata themselves do not do so; +if private versions of +.I +input, +output +.R +and +.I unput +are given, the library can be avoided. +.PP +.2C +.NH +Lex and Yacc. +.PP +If you want to use Lex with Yacc, note that what Lex writes is a program +named +.I +yylex(), +.R +the name required by Yacc for its analyzer. +Normally, the default main program on the Lex library +calls this routine, but if Yacc is loaded, and its main +program is used, Yacc will call +.I +yylex(). +.R +In this case each Lex rule should end with +.TS +center; +l. +return(token); +.TE +where the appropriate token value is returned. +An easy way to get access +to Yacc's names for tokens is to +compile the Lex output file as part of +the Yacc output file by placing the line +.TS +center; +l. +# include "lex.yy.c" +.TE +in the last section of Yacc input. +Supposing the grammar to be +named ``good'' and the lexical rules to be named ``better'' +the UNIX command sequence can just be: +.TS +center; +l. +yacc good +lex better +cc y.tab.c \-ly \-ll +.TE +The Yacc library (\-ly) should be loaded before the Lex library, +to obtain a main program which invokes the Yacc parser. +The generations of Lex and Yacc programs can be done in +either order. +.2C +.NH +Examples. +.PP +As a trivial problem, consider copying an input file while +adding 3 to every positive number divisible by 7. 
+Here is a suitable Lex source program +.TS +center; +l l. +%% + int k; +[0\-9]+ { + k = atoi(yytext); + if (k%7 == 0) + printf("%d", k+3); + else + printf("%d",k); + } +.TE +to do just that. +The rule [0\-9]+ recognizes strings of digits; +.I +atoi +.R +converts the digits to binary +and stores the result in +.ul +k. +The operator % (remainder) is used to check whether +.ul +k +is divisible by 7; if it is, +it is incremented by 3 as it is written out. +It may be objected that this program will alter such +input items as +.I 49.63 +or +.I X7 . +Furthermore, it increments the absolute value +of all negative numbers divisible by 7. +To avoid this, just add a few more rules after the active one, +as here: +.TS +center; +l l. +%% + int k; +\-?[0\-9]+ { + k = atoi(yytext); + printf("%d", + k%7 == 0 ? k+3 : k); + } +\-?[0\-9.]+ ECHO; +[A-Za-z][A-Za-z0-9]+ ECHO; +.TE +Numerical strings containing +a ``.'' or preceded by a letter will be picked up by +one of the last two rules, and not changed. +The +.I if\-else +has been replaced by +a C conditional expression to save space; +the form +.ul +a?b:c +means ``if +.I a +then +.I b +else +.I c ''. +.PP +For an example of statistics gathering, here +is a program which histograms the lengths +of words, where a word is defined as a string of letters. +.TS +center; +l l. + int lengs[100]; +%% +[a\-z]+ lengs[yyleng]++; +\&. | +\en ; +%% +.T& +l s. +yywrap() +{ +int i; +printf("Length No. words\en"); +for(i=0; i<100; i++) + if (lengs[i] > 0) + printf("%5d%10d\en",i,lengs[i]); +return(1); +} +.TE +This program +accumulates the histogram, while producing no output. At the end +of the input it prints the table. +The final statement +.I +return(1); +.R +indicates that Lex is to perform wrapup. If +.I +yywrap +.R +returns zero (false) +it implies that further input is available +and the program is +to continue reading and processing. +To provide a +.I +yywrap +.R +that never +returns true causes an infinite loop. 
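For comparison, the counting done by the rule [a-z]+ in the histogram program can be sketched in plain C, without Lex. The function name histogram and the constant MAXLEN are hypothetical, ASCII is assumed, and words of MAXLEN or more letters are ignored here so that the array cannot be overrun. This guard is an addition of the sketch.

```c
#define MAXLEN 100   /* size of the table, as in the lengs array above */

/* For each maximal run of lower case letters in text, adds one to
   lengs[n], where n is the length of the run. Runs of MAXLEN or more
   letters are skipped. */
void histogram(const char *text, int lengs[MAXLEN])
{
    int run = 0;     /* letters seen so far in the current word */

    for (;; text++) {
        if (*text >= 'a' && *text <= 'z') {
            run++;
        } else {
            if (run > 0 && run < MAXLEN)
                lengs[run]++;
            run = 0;
            if (*text == '\0')
                break;           /* end of input: the table is complete */
        }
    }
}
```

The point at which the loop ends corresponds to the call of yywrap in the Lex version; a caller would print the table after histogram returns.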
+.PP +As a larger example, +here are some parts of a program written by N. L. Schryer +to convert double precision Fortran to single precision Fortran. +Because Fortran does not distinguish upper and lower case letters, +this routine begins by defining a set of classes including +both cases of each letter: +.TS +center; +l l. +a [aA] +b [bB] +c [cC] +\&... +z [zZ] +.TE +An additional class recognizes white space: +.TS +center; +l l. +W [ \et]\(** +.TE +The first rule changes +``double precision'' to ``real'', or ``DOUBLE PRECISION'' to ``REAL''. +.TS +center; +l. +{d}{o}{u}{b}{l}{e}{W}{p}{r}{e}{c}{i}{s}{i}{o}{n} { + printf(yytext[0]==\(fmd\(fm? "real" : "REAL"); + } +.TE +Care is taken throughout this program to preserve the case +(upper or lower) +of the original program. +The conditional operator is used to +select the proper form of the keyword. +The next rule copies continuation card indications to +avoid confusing them with constants: +.TS +center; +l l. +^" "[^ 0] ECHO; +.TE +In the regular expression, the quotes surround the +blanks. +It is interpreted as +``beginning of line, then five blanks, then +anything but blank or zero.'' +Note the two different meanings of +.I ^ . +There follow some rules to change double precision +constants to ordinary floating constants. +.TS +center; +l. +[0\-9]+{W}{d}{W}[+\-]?{W}[0\-9]+ | +[0\-9]+{W}"."{W}{d}{W}[+\-]?{W}[0\-9]+ | +"."{W}[0\-9]+{W}{d}{W}[+\-]?{W}[0\-9]+ { + /\(** convert constants \(**/ + for(p=yytext; \(**p != 0; p++) + { + if (\(**p == \(fmd\(fm || \(**p == \(fmD\(fm) + \(**p=+ \(fme\(fm\- \(fmd\(fm; + ECHO; + } +.TE +After the floating point constant is recognized, it is +scanned by the +.ul +for +loop +to find the letter +.I d +or +.I D . +The program then adds +.I \(fme\(fm\-\(fmd\(fm , +which converts +it to the next letter of the alphabet. +The modified constant, now single-precision, +is written out again. +There follow a series of names which must be respelled to remove +their initial \fId\fR. 
+By using the +array +.I +yytext +.R +the same action suffices for all the names (only a sample of +a rather long list is given here). +.TS +center; +l l. +{d}{s}{i}{n} | +{d}{c}{o}{s} | +{d}{s}{q}{r}{t} | +{d}{a}{t}{a}{n} | +\&... +{d}{f}{l}{o}{a}{t} printf("%s",yytext+1); +.TE +Another list of names must have initial \fId\fR changed to initial \fIa\fR: +.TS +center; +l l. +{d}{l}{o}{g} | +{d}{l}{o}{g}10 | +{d}{m}{i}{n}1 | +{d}{m}{a}{x}1 { + yytext[0] =+ \(fma\(fm \- \(fmd\(fm; + ECHO; + } +.TE +And one routine +must have initial \fId\fR changed to initial \fIr\fR: +.TS +center, tab(#); +l l. +{d}1{m}{a}{c}{h}#{yytext[0] =+ \(fmr\(fm \- \(fmd\(fm; +#ECHO; +#} +.TE +To avoid such names as \fIdsinx\fR being detected as instances +of \fIdsin\fR, some final rules pick up longer words as identifiers +and copy some surviving characters: +.TS +center; +l l. +[A\-Za\-z][A\-Za\-z0\-9]\(** | +[0\-9]+ | +\en | +\&. ECHO; +.TE +Note that this program is not complete; it +does not deal with the spacing problems in Fortran or +with the use of keywords as identifiers. +.br +.2C +.NH +Left Context Sensitivity. +.PP +Sometimes +it is desirable to have several sets of lexical rules +to be applied at different times in the input. +For example, a compiler preprocessor might distinguish +preprocessor statements and analyze them differently +from ordinary statements. +This requires +sensitivity +to prior context, and there are several ways of handling +such problems. +The \fI^\fR operator, for example, is a prior context operator, +recognizing immediately preceding left context just as \fI$\fR recognizes +immediately following right context. +Adjacent left context could be extended, to produce a facility similar to +that for adjacent right context, but it is unlikely +to be as useful, since often the relevant left context +appeared some time earlier, such as at the beginning of a line. 
+.PP +This section describes three means of dealing +with different environments: a simple use of flags, +when only a few rules change from one environment to another, +the use of +.I +start conditions +.R +on rules, +and the possibility of making multiple lexical analyzers all run +together. +In each case, there are rules which recognize the need to change the +environment in which the +following input text is analyzed, and set some parameter +to reflect the change. This may be a flag explicitly tested by +the user's action code; such a flag is the simplest way of dealing +with the problem, since Lex is not involved at all. +It may be more convenient, +however, +to have Lex remember the flags as initial conditions on the rules. +Any rule may be associated with a start condition. It will only +be recognized when Lex is in +that start condition. +The current start condition may be changed at any time. +Finally, if the sets of rules for the different environments +are very dissimilar, +clarity may be best achieved by writing several distinct lexical +analyzers, and switching from one to another as desired. +.PP +Consider the following problem: copy the input to the output, +changing the word \fImagic\fR to \fIfirst\fR on every line which began +with the letter \fIa\fR, changing \fImagic\fR to \fIsecond\fR on every line +which began with the letter \fIb\fR, and changing +\fImagic\fR to \fIthird\fR on every line which began +with the letter \fIc\fR. All other words and all other lines +are left unchanged. +.PP +These rules are so simple that the easiest way +to do this job is with a flag: +.TS +center; +l l. + int flag; +%% +^a {flag = \(fma\(fm; ECHO;} +^b {flag = \(fmb\(fm; ECHO;} +^c {flag = \(fmc\(fm; ECHO;} +\en {flag = 0 ; ECHO;} +magic { + switch (flag) + { + case \(fma\(fm: printf("first"); break; + case \(fmb\(fm: printf("second"); break; + case \(fmc\(fm: printf("third"); break; + default: ECHO; break; + } + } +.TE +should be adequate. 
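For comparison, the flag technique above can be sketched as a hand-written C routine. This is only an illustration under stated assumptions: the name magic_filter, its buffer-based interface, and its return convention are hypothetical and are not part of Lex or of the original paper. It mirrors the Lex rules: the first character of a line sets the flag, a newline clears it, and each occurrence of "magic" is replaced according to the current flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not generated by Lex): copy `in` to `out`,
 * replacing "magic" according to the first letter of the current line.
 * Returns 0 on success, -1 if `out` (of size `outsz`) is too small.
 */
int magic_filter(const char *in, char *out, size_t outsz)
{
    int flag = 0;   /* plays the role of `flag` in the Lex source */
    int bol = 1;    /* nonzero while at the beginning of a line */
    size_t o = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    while (*in != '\0') {
        /* Rules ^a, ^b, ^c: remember the letter; it is copied below. */
        if (bol && (*in == 'a' || *in == 'b' || *in == 'c'))
            flag = *in;
        bol = 0;

        /* Rule "magic": choose the replacement from the flag. */
        if (strncmp(in, "magic", 5) == 0) {
            const char *rep = flag == 'a' ? "first"
                            : flag == 'b' ? "second"
                            : flag == 'c' ? "third"
                            : "magic";      /* default: ECHO */
            size_t n = strlen(rep);

            if (o + n >= outsz)
                return -1;
            memcpy(out + o, rep, n);
            o += n;
            in += 5;
            continue;
        }

        /* Rule \n: reset the flag; the next character starts a line. */
        if (*in == '\n') {
            flag = 0;
            bol = 1;
        }

        /* Default rule: copy one character. */
        if (o + 1 >= outsz)
            return -1;
        out[o++] = *in++;
    }
    out[o] = '\0';
    return 0;
}
```

A caller would pass a buffer at least as large as the input plus one byte for each "magic" that may become "second". The Lex version needs no such bookkeeping, which is part of its appeal.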
+.PP +To handle the same problem with start conditions, each +start condition must be introduced to Lex in the definitions section +with a line reading +.TS +center; +l l. +%Start name1 name2 ... +.TE +where the conditions may be named in any order. +The word \fIStart\fR may be abbreviated to \fIs\fR or \fIS\fR. +The conditions may be referenced at the +head of a rule with the <> brackets: +.TS +center; +l. +<name1>expression +.TE +is a rule which is only recognized when Lex is in the +start condition \fIname1\fR. +To enter a start condition, +execute the action statement +.TS +center; +l. +BEGIN name1; +.TE +which changes the start condition to \fIname1\fR. +To resume the normal state, +.TS +center; +l. +BEGIN 0; +.TE +resets the initial condition +of the Lex automaton interpreter. +A rule may be active in several +start conditions: +.TS +center; +l. +<name1,name2,name3> +.TE +is a legal prefix. Any rule not beginning with the +<> prefix operator is always active. +.PP +The same example as before can be written: +.TS +center; +l l. +%START AA BB CC +%% +^a {ECHO; BEGIN AA;} +^b {ECHO; BEGIN BB;} +^c {ECHO; BEGIN CC;} +\en {ECHO; BEGIN 0;} +<AA>magic printf("first"); +<BB>magic printf("second"); +<CC>magic printf("third"); +.TE +where the logic is exactly the same as in the previous +method of handling the problem, but Lex does the work +rather than the user's code. +.2C +.NH +Character Set. +.PP +The programs generated by Lex handle +character I/O only through the routines +.I +input, +output, +.R +and +.I +unput. +.R +Thus the character representation +provided in these routines +is accepted by Lex and employed to return +values in +.I +yytext. +.R +For internal use +a character is represented as a small integer +which, if the standard library is used, +has a value equal to the integer value of the bit +pattern representing the character on the host computer. +Normally, the letter +.I a +is represented as the same form as the character constant +.I \(fma\(fm . 
+If this interpretation is changed, by providing I/O +routines which translate the characters, +Lex must be told about +it, by giving a translation table. +This table must be in the definitions section, +and must be bracketed by lines containing only +``%T''. +The table contains lines of the form +.TS +center; +l. +{integer} {character string} +.TE +which indicate the value associated with each character. +Thus the next example +.\" .GS 2 +.TS +center; +l l. +%T + 1 Aa + 2 Bb +\&... +26 Zz +27 \en +28 + +29 \- +30 0 +31 1 +\&... +39 9 +%T +.TE +.sp +.ce 1 +Sample character table. +.\" .GE +maps the lower and upper case letters together into the integers 1 through 26, +newline into 27, + and \- into 28 and 29, and the +digits into 30 through 39. +Note the escape for newline. +If a table is supplied, every character that is to appear either +in the rules or in any valid input must be included +in the table. +No character +may be assigned the number 0, and no character may be +assigned a bigger number than the size of the hardware character set. +.2C +.NH +Summary of Source Format. +.PP +The general form of a Lex source file is: +.TS +center; +l. +{definitions} +%% +{rules} +%% +{user subroutines} +.TE +The definitions section contains +a combination of +.IP 1) +Definitions, in the form ``name space translation''. +.IP 2) +Included code, in the form ``space code''. +.IP 3) +Included code, in the form +.TS +center; +l. +%{ +code +%} +.TE +.ns +.IP 4) +Start conditions, given in the form +.TS +center; +l. +%S name1 name2 ... +.TE +.ns +.IP 5) +Character set tables, in the form +.TS +center; +l. +%T +number space character-string +\&... +%T +.TE +.ns +.IP 6) +Changes to internal array sizes, in the form +.TS +center; +l. +%\fIx\fR\0\0\fInnn\fR +.TE +where \fInnn\fR is a decimal integer representing an array size +and \fIx\fR selects the parameter as follows: +.TS +center; +c c +c l. 
+Letter Parameter +p positions +n states +e tree nodes +a transitions +k packed character classes +o output array size +.TE +.LP +Lines in the rules section have the form ``expression action'' +where the action may be continued on succeeding +lines by using braces to delimit it. +.PP +Regular expressions in Lex use the following +operators: +.br +.TS +center; +l l. +x the character "x" +"x" an "x", even if x is an operator. +\ex an "x", even if x is an operator. +[xy] the character x or y. +[x\-z] the characters x, y or z. +[^x] any character but x. +\&. any character but newline. +^x an x at the beginning of a line. +<y>x an x when Lex is in start condition y. +x$ an x at the end of a line. +x? an optional x. +x\(** 0,1,2, ... instances of x. +x+ 1,2,3, ... instances of x. +x|y an x or a y. +(x) an x. +x/y an x but only if followed by y. +{xx} the translation of xx from the + definitions section. +x{m,n} \fIm\fR through \fIn\fR occurrences of x +.TE +.NH +Caveats and Bugs. +.PP +There are pathological expressions which +produce exponential growth of the tables when +converted to deterministic machines; +fortunately, they are rare. +.PP +REJECT does not rescan the input; instead it remembers the results of the previous +scan. This means that if a rule with trailing context is found, and +REJECT executed, the user +must not have used +.ul +unput +to change the characters forthcoming +from the input stream. +This is the only restriction on the user's ability to manipulate +the not-yet-processed input. +.PP +.2C +.NH +Acknowledgments. +.PP +As should +be obvious from the above, the outside of Lex +is patterned +on Yacc and the inside on Aho's string matching routines. +Therefore, both S. C. Johnson and A. V. Aho +are really originators +of much of Lex, +as well as debuggers of it. +Many thanks are due to both. +.PP +The code of the current version of Lex was designed, written, +and debugged by Eric Schmidt. +.if 0 .SG MH-1274-MEL-unix +.sp 1 +.2C +.NH +References. 
+.sp 1v +.IP 1. +B. W. Kernighan and D. M. Ritchie, +.I +The C Programming Language, +.R +Prentice-Hall, N. J. (1978). +.IP 2. +B. W. Kernighan, +.I +Ratfor: A Preprocessor for a Rational Fortran, +.R +Software \- Practice and Experience, +\fB5\fR, pp. 395-406 (1975). +.IP 3. +S. C. Johnson, +.I +Yacc: Yet Another Compiler Compiler, +.R +Computing Science Technical Report No. 32, +1975, +.MH +.if \n(tm (also TM 75-1273-6) +.IP 4. +A. V. Aho and M. J. Corasick, +.I +Efficient String Matching: An Aid to Bibliographic Search, +.R +Comm. ACM +.B +18, +.R +333-340 (1975). +.IP 5. +B. W. Kernighan, D. M. Ritchie and K. L. Thompson, +.I +QED Text Editor, +.R +Computing Science Technical Report No. 5, +1972, +.MH +.IP 6. +D. M. Ritchie, +private communication. +See also +M. E. Lesk, +.I +The Portable C Library, +.R +Computing Science Technical Report No. 31, +.MH +.if \n(tm (also TM 75-1274-11) diff --git a/share/doc/psd/17.m4/Makefile b/share/doc/psd/17.m4/Makefile new file mode 100644 index 0000000..c48921f --- /dev/null +++ b/share/doc/psd/17.m4/Makefile @@ -0,0 +1,8 @@ +# @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= psd/17.m4 +SRCS= m4.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/psd/17.m4/m4.ms b/share/doc/psd/17.m4/m4.ms new file mode 100644 index 0000000..c7a2fd9 --- /dev/null +++ b/share/doc/psd/17.m4/m4.ms @@ -0,0 +1,973 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer.
+.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)m4.ms 6.3 (Berkeley) 6/5/93 +.\" +.\" $FreeBSD$ +.de MH +Bell Laboratories, Murray Hill, NJ 07974. +.. +.EH 'PSD:17-%''The M4 Macro Processor' +.OH 'The M4 Macro Processor''PSD:17-%' +.if n .ls 2 +.tr _\(em +.tr *\(** +.de UC +\&\\$3\s-1\\$1\\s0\&\\$2 +.. +.de IT +.if n .ul +\&\\$3\f2\\$1\fP\&\\$2 +.. +.de UL +.if n .ul +\&\\$3\f3\\$1\fP\&\\$2 +.. 
+.de P1 +.DS I 3n +.if n .ls 2 +.nf +.if n .ta 5 10 15 20 25 30 35 40 45 50 55 60 +.if t .ta .4i .8i 1.2i 1.6i 2i 2.4i 2.8i 3.2i 3.6i 4i 4.4i 4.8i 5.2i 5.6i +.if t .tr -\(mi|\(bv'\(fm^\(no*\(** +.tr `\(ga'\(aa +.if t .tr _\(ul +.ft 3 +.lg 0 +.. +.de P2 +.ps \\n(PS +.vs \\n(VSp +.ft R +.if n .ls 2 +.tr --||''^^!! +.if t .tr _\(em +.fi +.lg +.DE +.if t .tr _\(em +.. +.hw semi-colon +.hw estab-lished +.hy 14 +. \"2=not last lines; 4= no -xx; 8=no xx- +. \"special chars in programs +. \" start of text +.\".RP +.\" .....TR 59 +.\" .....TM 77-1273-6 39199 39199-11 +.ND "July 1, 1977" +.TL +The M4 Macro Processor +.AU "MH 2C-518" 6021 +Brian W. Kernighan +.AU "MH 2C-517" 3770 +Dennis M. Ritchie +.AI +.MH +.AB +.PP +M4 is a macro processor available on +.UX +and +.UC GCOS . +Its primary use has been as a +front end for Ratfor for those +cases where parameterless macros +are not adequately powerful. +It has also been used for languages as disparate as C and Cobol. +M4 is particularly suited for functional languages like Fortran, PL/I and C +since macros are specified in a functional notation. +.PP +M4 provides features seldom found even in much larger +macro processors, +including +.IP " \(bu" +arguments +.IP " \(bu" +condition testing +.IP " \(bu" +arithmetic capabilities +.IP " \(bu" +string and substring functions +.IP " \(bu" +file manipulation +.LP +.PP +This paper is a user's manual for M4. +.AE +.\" .CS 6 0 6 0 0 1 +.if t .2C +.SH +Introduction +.PP +A macro processor is a useful way to enhance a programming language, +to make it more palatable +or more readable, +or to tailor it to a particular application. +The +.UL #define +statement in C +and the analogous +.UL define +in Ratfor +are examples of the basic facility provided by +any macro processor _ +replacement of text by other text. +.PP +The M4 macro processor is an extension of a macro processor called M3 +which was written by D. M. 
Ritchie +for the AP-3 minicomputer; +M3 was in turn based on a macro processor implemented for [1]. +Readers unfamiliar with the basic ideas of macro processing +may wish to read some of the discussion there. +.PP +M4 is a suitable front end for Ratfor and C, +and has also been used successfully with Cobol. +Besides the straightforward replacement of one string of text by another, +it provides +macros with arguments, +conditional macro expansion, +arithmetic, +file manipulation, +and some specialized string processing functions. +.PP +The basic operation of M4 +is to copy its input to its output. +As the input is read, however, each alphanumeric ``token'' +(that is, string of letters and digits) is checked. +If it is the name of a macro, +then the name of the macro is replaced by its defining text, +and the resulting string is pushed back onto the +input to be rescanned. +Macros may be called with arguments, in which case the arguments are collected +and substituted into the right places in the defining text +before it is rescanned. +.PP +M4 provides a collection of about twenty built-in +macros +which perform various useful operations; +in addition, the user can define new macros. +Built-ins and user-defined macros work exactly the same way, except that +some of the built-in macros have side effects +on the state of the process. +.SH +Usage +.PP +On +.UC UNIX , +use +.P1 +m4 [files] +.P2 +Each argument file is processed in order; +if there are no arguments, or if an argument +is `\-', +the standard input is read at that point. +The processed text is written on the standard output, +which may be captured for subsequent processing with +.P1 +m4 [files] >outputfile +.P2 +On +.UC GCOS , +usage is identical, but the program is called +.UL \&./m4 . +.SH +Defining Macros +.PP +The primary built-in function of M4 +is +.UL define , +which is used to define new macros. +The input +.P1 +define(name, stuff) +.P2 +causes the string +.UL name +to be defined as +.UL stuff . 
+All subsequent occurrences of +.UL name +will be replaced by +.UL stuff . +.UL name +must be alphanumeric and must begin with a letter +(the underscore \(ul counts as a letter). +.UL stuff +is any text that contains balanced parentheses; +it may stretch over multiple lines. +.PP +Thus, as a typical example, +.P1 +define(N, 100) + ... +if (i > N) +.P2 +defines +.UL N +to be 100, and uses this ``symbolic constant'' in a later +.UL if +statement. +.PP +The left parenthesis must immediately follow the word +.UL define , +to signal that +.UL define +has arguments. +If a macro or built-in name is not followed immediately by `(', +it is assumed to have no arguments. +This is the situation for +.UL N +above; +it is actually a macro with no arguments, +and thus when it is used there need be no (...) following it. +.PP +You should also notice that a macro name is only recognized as such +if it appears surrounded by non-alphanumerics. +For example, in +.P1 +define(N, 100) + ... +if (NNN > 100) +.P2 +the variable +.UL NNN +is absolutely unrelated to the defined macro +.UL N , +even though it contains a lot of +.UL N 's. +.PP +Things may be defined in terms of other things. +For example, +.P1 +define(N, 100) +define(M, N) +.P2 +defines both M and N to be 100. +.PP +What happens if +.UL N +is redefined? +Or, to say it another way, is +.UL M +defined as +.UL N +or as 100? +In M4, +the latter is true _ +.UL M +is 100, so even if +.UL N +subsequently changes, +.UL M +does not. +.PP +This behavior arises because +M4 expands macro names into their defining text as soon as it possibly can. +Here, that means that when the string +.UL N +is seen as the arguments of +.UL define +are being collected, it is immediately replaced by 100; +it's just as if you had said +.P1 +define(M, 100) +.P2 +in the first place. +.PP +If this isn't what you really want, there are two ways out of it. 
+The first, which is specific to this situation, +is to interchange the order of the definitions: +.P1 +define(M, N) +define(N, 100) +.P2 +Now +.UL M +is defined to be the string +.UL N , +so when you ask for +.UL M +later, you'll always get the value of +.UL N +at that time +(because the +.UL M +will be replaced by +.UL N +which will be replaced by 100). +.SH +Quoting +.PP +The more general solution is to delay the expansion of +the arguments of +.UL define +by +.ul +quoting +them. +Any text surrounded by the single quotes \(ga and \(aa +is not expanded immediately, but has the quotes stripped off. +If you say +.P1 +define(N, 100) +define(M, `N') +.P2 +the quotes around the +.UL N +are stripped off as the argument is being collected, +but they have served their purpose, and +.UL M +is defined as +the string +.UL N , +not 100. +The general rule is that M4 always strips off +one level of single quotes whenever it evaluates +something. +This is true even outside of +macros. +If you want the word +.UL define +to appear in the output, +you have to quote it in the input, +as in +.P1 + `define' = 1; +.P2 +.PP +As another instance of the same thing, which is a bit more surprising, +consider redefining +.UL N : +.P1 +define(N, 100) + ... +define(N, 200) +.P2 +Perhaps regrettably, the +.UL N +in the second definition is +evaluated as soon as it's seen; +that is, it is +replaced by +100, so it's as if you had written +.P1 +define(100, 200) +.P2 +This statement is ignored by M4, since you can only define things that look +like names, but it obviously doesn't have the effect you wanted. +To really redefine +.UL N , +you must delay the evaluation by quoting: +.P1 +define(N, 100) + ... +define(`N', 200) +.P2 +In M4, +it is often wise to quote the first argument of a macro. 
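The rule that one level of quotes is stripped per evaluation can be sketched in C. The function below is a hypothetical illustration, not code from M4 itself: the name strip_quote_level and its interface are assumptions, and it performs no macro expansion at all. It only shows how a nesting counter removes the outermost \(ga and \(aa pair while keeping any inner pairs for a later evaluation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: copy `in` to `out`, removing one level of
 * M4-style quotes (` opens, ' closes).  Nested quotes are kept.
 * An apostrophe outside any quotes is copied unchanged.
 * Returns 0 on success, -1 if `out` (of size `outsz`) is too small.
 */
int strip_quote_level(const char *in, char *out, size_t outsz)
{
    int depth = 0;  /* current quote nesting depth */
    size_t o = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    for (; *in != '\0'; in++) {
        int keep = 1;

        if (*in == '`') {
            keep = depth > 0;   /* drop only the outermost opener */
            depth++;
        } else if (*in == '\'' && depth > 0) {
            depth--;
            keep = depth > 0;   /* drop only the outermost closer */
        }
        if (keep) {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *in;
        }
    }
    out[o] = '\0';
    return 0;
}
```

Under these assumptions, the second argument of define(M, \(gaN\(aa) is collected as the bare string N, while a doubly quoted name loses only its outer pair and so survives one more evaluation.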
+.PP +If \` and \' are not convenient for some reason, +the quote characters can be changed with the built-in +.UL changequote : +.P1 +changequote([, ]) +.P2 +makes the new quote characters the left and right brackets. +You can restore the original characters with just +.P1 +changequote +.P2 +.PP +There are two additional built-ins related to +.UL define . +.UL undefine +removes the definition of some macro or built-in: +.P1 +undefine(`N') +.P2 +removes the definition of +.UL N . +(Why are the quotes absolutely necessary?) +Built-ins can be removed with +.UL undefine , +as in +.P1 +undefine(`define') +.P2 +but once you remove one, you can never get it back. +.PP +The built-in +.UL ifdef +provides a way to determine if a macro is currently defined. +In particular, M4 has pre-defined the names +.UL unix +and +.UL gcos +on the corresponding systems, so you can +tell which one you're using: +.P1 +ifdef(`unix', `define(wordsize,16)' ) +ifdef(`gcos', `define(wordsize,36)' ) +.P2 +makes a definition appropriate for the particular machine. +Don't forget the quotes! +.PP +.UL ifdef +actually permits three arguments; +if the name is undefined, the value of +.UL ifdef +is then the third argument, as in +.P1 +ifdef(`unix', on UNIX, not on UNIX) +.P2 +.SH +Arguments +.PP +So far we have discussed the simplest form of macro processing _ +replacing one string by another (fixed) string. +User-defined macros may also have arguments, so different invocations +can have different results. +Within the replacement text for a macro +(the second argument of its +.UL define ) +any occurrence of +.UL $n +will be replaced by the +.UL n th +argument when the macro +is actually used. +Thus, the macro +.UL bump , +defined as +.P1 +define(bump, $1 = $1 + 1) +.P2 +generates code to increment its argument by 1: +.P1 +bump(x) +.P2 +is +.P1 +x = x + 1 +.P2 +.PP +A macro can have as many arguments as you want, +but only the first nine are accessible, +through +.UL $1 +to +.UL $9 . 
+(The macro name itself is +.UL $0 , +although that is less commonly used.) +Arguments that are not supplied are replaced by null strings, +so +we can define a macro +.UL cat +which simply concatenates its arguments, like this: +.P1 +define(cat, $1$2$3$4$5$6$7$8$9) +.P2 +Thus +.P1 +cat(x, y, z) +.P2 +is equivalent to +.P1 +xyz +.P2 +.UL $4 +through +.UL $9 +are null, since no corresponding arguments were provided. +.PP +.PP +Leading unquoted blanks, tabs, or newlines that occur during argument collection +are discarded. +All other white space is retained. +Thus +.P1 +define(a, b c) +.P2 +defines +.UL a +to be +.UL b\ \ \ c . +.PP +Arguments are separated by commas, but parentheses are counted properly, +so a comma ``protected'' by parentheses does not terminate an argument. +That is, in +.P1 +define(a, (b,c)) +.P2 +there are only two arguments; +the second is literally +.UL (b,c) . +And of course a bare comma or parenthesis can be inserted by quoting it. +.SH +Arithmetic Built-ins +.PP +M4 provides two built-in functions for doing arithmetic +on integers (only). +The simplest is +.UL incr , +which increments its numeric argument by 1. +Thus to handle the common programming situation +where you want a variable to be defined as ``one more than N'', +write +.P1 +define(N, 100) +define(N1, `incr(N)') +.P2 +Then +.UL N1 +is defined as one more than the current value of +.UL N . +.PP +The more general mechanism for arithmetic is a built-in +called +.UL eval , +which is capable of arbitrary arithmetic on integers. +It provides the operators +(in decreasing order of precedence) +.DS +unary + and \(mi +** or ^ (exponentiation) +* / % (modulus) ++ \(mi +== != < <= > >= +! (not) +& or && (logical and) +\(or or \(or\(or (logical or) +.DE +Parentheses may be used to group operations where needed. +All the operands of +an expression given to +.UL eval +must ultimately be numeric. +The numeric value of a true relation +(like 1>0) +is 1, and false is 0. 
+The precision in +.UL eval +is +32 bits on +.UC UNIX +and 36 bits on +.UC GCOS . +.PP +As a simple example, suppose we want +.UL M +to be +.UL 2**N+1 . +Then +.P1 +define(N, 3) +define(M, `eval(2**N+1)') +.P2 +As a matter of principle, it is advisable +to quote the defining text for a macro +unless it is very simple indeed +(say just a number); +it usually gives the result you want, +and is a good habit to get into. +.SH +File Manipulation +.PP +You can include a new file in the input at any time by +the built-in function +.UL include : +.P1 +include(filename) +.P2 +inserts the contents of +.UL filename +in place of the +.UL include +command. +The contents of the file is often a set of definitions. +The value +of +.UL include +(that is, its replacement text) +is the contents of the file; +this can be captured in definitions, etc. +.PP +It is a fatal error if the file named in +.UL include +cannot be accessed. +To get some control over this situation, the alternate form +.UL sinclude +can be used; +.UL sinclude +(``silent include'') +says nothing and continues if it can't access the file. +.PP +It is also possible to divert the output of M4 to temporary files during processing, +and output the collected material upon command. +M4 maintains nine of these diversions, numbered 1 through 9. +If you say +.P1 +divert(n) +.P2 +all subsequent output is put onto the end of a temporary file +referred to as +.UL n . +Diverting to this file is stopped by another +.UL divert +command; +in particular, +.UL divert +or +.UL divert(0) +resumes the normal output process. +.PP +Diverted text is normally output all at once +at the end of processing, +with the diversions output in numeric order. +It is possible, however, to bring back diversions +at any time, +that is, to append them to the current diversion. +.P1 +undivert +.P2 +brings back all diversions in numeric order, and +.UL undivert +with arguments brings back the selected diversions +in the order given. 
+The act of undiverting discards the diverted stuff, +as does diverting into a diversion +whose number is not between 0 and 9 inclusive. +.PP +The value of +.UL undivert +is +.ul +not +the diverted stuff. +Furthermore, the diverted material is +.ul +not +rescanned for macros. +.PP +The built-in +.UL divnum +returns the number of the currently active diversion. +This is zero during normal processing. +.SH +System Command +.PP +You can run any program in the local operating system +with the +.UL syscmd +built-in. +For example, +.P1 +syscmd(date) +.P2 +on +.UC UNIX +runs the +.UL date +command. +Normally +.UL syscmd +would be used to create a file +for a subsequent +.UL include . +.PP +To facilitate making unique file names, the built-in +.UL maketemp +is provided, with specifications identical to the system function +.ul +mktemp: +a string of XXXXX in the argument is replaced +by the process id of the current process. +.SH +Conditionals +.PP +There is a built-in called +.UL ifelse +which enables you to perform arbitrary conditional testing. +In the simplest form, +.P1 +ifelse(a, b, c, d) +.P2 +compares the two strings +.UL a +and +.UL b . +If these are identical, +.UL ifelse +returns +the string +.UL c ; +otherwise it returns +.UL d . +Thus we might define a macro called +.UL compare +which compares two strings and returns ``yes'' or ``no'' +if they are the same or different. +.P1 +define(compare, `ifelse($1, $2, yes, no)') +.P2 +Note the quotes, +which prevent too-early evaluation of +.UL ifelse . +.PP +If the fourth argument is missing, it is treated as empty. +.PP +.UL ifelse +can actually have any number of arguments, +and thus provides a limited form of multi-way decision capability. +In the input +.P1 +ifelse(a, b, c, d, e, f, g) +.P2 +if the string +.UL a +matches the string +.UL b , +the result is +.UL c . +Otherwise, if +.UL d +is the same as +.UL e , +the result is +.UL f . +Otherwise the result is +.UL g . 
+If the final argument +is omitted, the result is null, +so +.P1 +ifelse(a, b, c) +.P2 +is +.UL c +if +.UL a +matches +.UL b , +and null otherwise. +.SH +String Manipulation +.PP +The built-in +.UL len +returns the length of the string that makes up its argument. +Thus +.P1 +len(abcdef) +.P2 +is 6, and +.UL len((a,b)) +is 5. +.PP +The built-in +.UL substr +can be used to produce substrings of strings. +.UL substr(s,\ i,\ n) +returns the substring of +.UL s +that starts at the +.UL i th +position +(origin zero), +and is +.UL n +characters long. +If +.UL n +is omitted, the rest of the string is returned, +so +.P1 +substr(`now is the time', 1) +.P2 +is +.P1 +ow is the time +.P2 +If +.UL i +or +.UL n +are out of range, various sensible things happen. +.PP +.UL index(s1,\ s2) +returns the index (position) in +.UL s1 +where the string +.UL s2 +occurs, or \-1 +if it doesn't occur. +As with +.UL substr , +the origin for strings is 0. +.PP +The built-in +.UL translit +performs character transliteration. +.P1 +translit(s, f, t) +.P2 +modifies +.UL s +by replacing any character found in +.UL f +by the corresponding character of +.UL t . +That is, +.P1 +translit(s, aeiou, 12345) +.P2 +replaces the vowels by the corresponding digits. +If +.UL t +is shorter than +.UL f , +characters which don't have an entry in +.UL t +are deleted; as a limiting case, +if +.UL t +is not present at all, +characters from +.UL f +are deleted from +.UL s . +So +.P1 +translit(s, aeiou) +.P2 +deletes vowels from +.UL s . +.PP +There is also a built-in called +.UL dnl +which deletes all characters that follow it up to +and including the next newline; +it is useful mainly for throwing away +empty lines that otherwise tend to clutter up M4 output. +For example, if you say +.P1 +define(N, 100) +define(M, 200) +define(L, 300) +.P2 +the newline at the end of each line is not part of the definition, +so it is copied into the output, where it may not be wanted. 
+If you add +.UL dnl +to each of these lines, the newlines will disappear. +.PP +Another way to achieve this, due to J. E. Weythman, +is +.P1 +divert(-1) + define(...) + ... +divert +.P2 +.SH +Printing +.PP +The built-in +.UL errprint +writes its arguments out on the standard error file. +Thus you can say +.P1 +errprint(`fatal error') +.P2 +.PP +.UL dumpdef +is a debugging aid which +dumps the current definitions of defined terms. +If there are no arguments, you get everything; +otherwise you get the ones you name as arguments. +Don't forget to quote the names! +.SH +Summary of Built-ins +.PP +Each entry is preceded by the +page number where it is described. +.DS +.tr '\'`\` +.ta .25i +3 changequote(L, R) +1 define(name, replacement) +4 divert(number) +4 divnum +5 dnl +5 dumpdef(`name', `name', ...) +5 errprint(s, s, ...) +4 eval(numeric expression) +3 ifdef(`name', this if true, this if false) +5 ifelse(a, b, c, d) +4 include(file) +3 incr(number) +5 index(s1, s2) +5 len(string) +4 maketemp(...XXXXX...) +4 sinclude(file) +5 substr(string, position, number) +4 syscmd(s) +5 translit(str, from, to) +3 undefine(`name') +4 undivert(number,number,...) +.DE +.SH +Acknowledgements +.PP +We are indebted to Rick Becker, John Chambers, +Doug McIlroy, +and especially Jim Weythman, +whose pioneering use of M4 has led to several valuable improvements. +We are also deeply grateful to Weythman for several substantial contributions +to the code. +.\" .SG +.SH +References +.LP +.IP [1] +B. W. Kernighan and P. J. Plauger, +.ul +Software Tools, +Addison-Wesley, Inc., 1976. 
diff --git a/share/doc/psd/18.gprof/Makefile b/share/doc/psd/18.gprof/Makefile new file mode 100644 index 0000000..1097072 --- /dev/null +++ b/share/doc/psd/18.gprof/Makefile @@ -0,0 +1,15 @@ +# From: @(#)Makefile 8.1 (Berkeley) 8/14/93 +# $FreeBSD$ + +VOLUME= psd/18.gprof +SRCS= header.me abstract.me intro.me profiling.me gathering.me \ + postp.me present.me refs.me +EXTRA= postp1.pic postp2.pic postp3.pic pres1.pic pres2.pic +MACROS= -me +USE_SOELIM= +USE_PIC= +USE_TBL= +USE_EQN= +SRCDIR= ${.CURDIR}/../../../../usr.bin/gprof/PSD.doc + +.include <bsd.doc.mk> diff --git a/share/doc/psd/20.ipctut/Makefile b/share/doc/psd/20.ipctut/Makefile new file mode 100644 index 0000000..934cdea --- /dev/null +++ b/share/doc/psd/20.ipctut/Makefile @@ -0,0 +1,14 @@ +# From: @(#)Makefile 8.1 (Berkeley) 8/14/93 +# $FreeBSD$ + +VOLUME= psd/20.ipctut +SRCS= tutor.me +MACROS= -me +EXTRA= dgramread.c dgramsend.c fig2.pic fig3.pic fig8.pic pipe.c \ + socketpair.c strchkread.c streamread.c streamwrite.c \ + udgramread.c udgramsend.c ustreamread.c ustreamwrite.c +USE_SOELIM= +USE_PIC= +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/psd/20.ipctut/dgramread.c b/share/doc/psd/20.ipctut/dgramread.c new file mode 100644 index 0000000..193fca9 --- /dev/null +++ b/share/doc/psd/20.ipctut/dgramread.c @@ -0,0 +1,83 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)dgramread.c 8.1 (Berkeley) 6/8/93 +.\" +#include <sys/types.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <stdio.h> + +/* + * In the included file <netinet/in.h> a sockaddr_in is defined as follows: + * struct sockaddr_in { + * short sin_family; + * u_short sin_port; + * struct in_addr sin_addr; + * char sin_zero[8]; + * }; + * + * This program creates a datagram socket, binds a name to it, then reads + * from the socket. + */ +main() +{ + int sock, length; + struct sockaddr_in name; + char buf[1024]; + + /* Create socket from which to read. */ + sock = socket(AF_INET, SOCK_DGRAM, 0); + if (sock < 0) { + perror("opening datagram socket"); + exit(1); + } + /* Create name with wildcards. 
*/ + name.sin_family = AF_INET; + name.sin_addr.s_addr = INADDR_ANY; + name.sin_port = 0; + if (bind(sock, &name, sizeof(name))) { + perror("binding datagram socket"); + exit(1); + } + /* Find assigned port value and print it out. */ + length = sizeof(name); + if (getsockname(sock, &name, &length)) { + perror("getting socket name"); + exit(1); + } + printf("Socket has port #%d\en", ntohs(name.sin_port)); + /* Read from the socket */ + if (read(sock, buf, 1024) < 0) + perror("receiving datagram packet"); + printf("-->%s\en", buf); + close(sock); +} diff --git a/share/doc/psd/20.ipctut/dgramsend.c b/share/doc/psd/20.ipctut/dgramsend.c new file mode 100644 index 0000000..4bd1e5a --- /dev/null +++ b/share/doc/psd/20.ipctut/dgramsend.c @@ -0,0 +1,80 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)dgramsend.c 8.1 (Berkeley) 6/8/93 +.\" +#include <sys/types.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <netdb.h> +#include <stdio.h> + +#define DATA "The sea is calm tonight, the tide is full . . ." + +/* + * Here I send a datagram to a receiver whose name I get from the command + * line arguments. The form of the command line is dgramsend hostname + * portnumber + */ + +main(argc, argv) + int argc; + char *argv[]; +{ + int sock; + struct sockaddr_in name; + struct hostent *hp, *gethostbyname(); + + /* Create socket on which to send. */ + sock = socket(AF_INET, SOCK_DGRAM, 0); + if (sock < 0) { + perror("opening datagram socket"); + exit(1); + } + /* + * Construct name, with no wildcards, of the socket to send to. + * Gethostbyname() returns a structure including the network address + * of the specified host. The port number is taken from the command + * line. + */ + hp = gethostbyname(argv[1]); + if (hp == 0) { + fprintf(stderr, "%s: unknown host\en", argv[1]); + exit(2); + } + bcopy(hp->h_addr, &name.sin_addr, hp->h_length); + name.sin_family = AF_INET; + name.sin_port = htons(atoi(argv[2])); + /* Send message. 
*/ + if (sendto(sock, DATA, sizeof(DATA), 0, &name, sizeof(name)) < 0) + perror("sending datagram message"); + close(sock); +} diff --git a/share/doc/psd/20.ipctut/fig2.pic b/share/doc/psd/20.ipctut/fig2.pic new file mode 100644 index 0000000..ffbc193 --- /dev/null +++ b/share/doc/psd/20.ipctut/fig2.pic @@ -0,0 +1,77 @@ +.\" Copyright (c) 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" @(#)fig2.pic 8.1 (Berkeley) 8/14/93 +.PS +.ps +.ps 10 +arc at 5.407,4.723 from 5.375,4.838 to 5.362,4.612 cw +arc at 5.907,7.973 from 5.875,8.088 to 5.862,7.862 cw +line from 5.963,5.513 to 6.925,5.513 +line from 5.963,5.650 to 6.925,5.650 +line from 5.963,5.787 to 6.925,5.787 +line from 5.963,5.912 to 6.925,5.912 +line from 5.963,6.050 to 6.925,6.050 +line from 5.963,6.200 to 6.925,6.200 to 6.925,5.375 to 5.963,5.375 to 5.963,6.200 +ellipse at 6.388,6.713 wid 0.475 ht 0.475 +line from 6.388,6.463 to 6.388,6.200 +line from 3.150,6.200 to 4.112,6.200 to 4.112,5.375 to 3.150,5.375 to 3.150,6.200 +line from 3.150,6.050 to 4.112,6.050 +line from 3.150,5.912 to 4.112,5.912 +line from 3.150,5.787 to 4.112,5.787 +line from 3.150,5.650 to 4.112,5.650 +line from 3.150,5.513 to 4.112,5.513 +ellipse at 3.575,6.713 wid 0.475 ht 0.475 +line from 3.575,6.463 to 3.575,6.200 +line from 3.650,8.762 to 4.612,8.762 +line from 3.650,8.900 to 4.612,8.900 +line from 3.650,9.037 to 4.612,9.037 +line from 3.650,9.162 to 4.612,9.162 +line from 3.650,9.300 to 4.612,9.300 +line from 3.650,9.450 to 4.612,9.450 to 4.612,8.625 to 3.650,8.625 to 3.650,9.450 +ellipse at 4.075,9.963 wid 0.475 ht 0.475 +ellipse at 3.950,4.725 wid 0.225 ht 0.225 +ellipse at 4.450,7.975 wid 0.225 ht 0.225 +dashwid = 0.037i +line dotted from 1.925,7.513 to 8.238,7.513 +line from 6.050,6.138 to 5.737,6.138 to 5.737,4.700 to 5.550,4.700 +line from 5.650,4.725 to 5.550,4.700 to 5.650,4.675 +line from 6.050,6.013 to 4.050,4.888 +line from 4.125,4.958 to 4.050,4.888 to 4.149,4.915 +line from 3.975,6.000 
to 4.525,5.987 to 3.925,4.875 +line from 3.950,4.975 to 3.925,4.875 to 3.994,4.951 +line from 3.975,6.112 to 5.650,6.112 to 5.650,4.750 to 5.550,4.763 +line from 5.652,4.775 to 5.550,4.763 to 5.646,4.725 +line from 4.075,9.713 to 4.075,9.450 +line from 4.475,9.363 to 6.150,9.363 to 6.150,8.000 to 6.050,8.012 +line from 6.152,8.025 to 6.050,8.012 to 6.146,7.975 +line from 4.475,9.250 to 5.025,9.238 to 4.425,8.125 +line from 4.450,8.225 to 4.425,8.125 to 4.494,8.201 +.ps +.ps 20 +line from 4.362,4.775 to 4.162,4.725 to 4.362,4.675 +line from 4.162,4.725 to 4.838,4.725 +.ps +.ps 10 +line from 3.962,4.600 to 5.375,4.600 +line from 3.950,4.838 to 5.375,4.838 +line from 4.450,8.088 to 5.875,8.088 +line from 4.463,7.850 to 5.875,7.850 +.ps +.ps 20 +line from 4.862,8.025 to 4.662,7.975 to 4.862,7.925 +line from 4.662,7.975 to 5.338,7.975 +.ps +.ps 11 +.ft +.ft R +"Child" at 6.362,7.106 +.ps +.ps 12 +"Parent" at 3.362,7.096 ljust +"Parent" at 3.862,10.346 ljust +"PIPE" at 4.987,4.671 ljust +"PIPE" at 5.425,7.921 ljust +.ps +.ft +.PE diff --git a/share/doc/psd/20.ipctut/fig2.xfig b/share/doc/psd/20.ipctut/fig2.xfig new file mode 100644 index 0000000..59b46be --- /dev/null +++ b/share/doc/psd/20.ipctut/fig2.xfig @@ -0,0 +1,100 @@ +#FIG 2.0 +80 2 +5 1 0 1 0 0 0 0 0.000 0 0 0 432.554 462.170 430 453 442 461 429 471 +5 1 0 1 0 0 0 0 0.000 0 0 0 472.554 202.170 470 193 482 201 469 211 +6 414 279 589 424 +6 473 340 557 414 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 399 554 399 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 388 554 388 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 377 554 377 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 367 554 367 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 356 554 356 9999 9999 +2 2 0 1 0 0 0 0 0.000 0 0 + 477 344 554 344 554 410 477 410 477 344 9999 9999 +-6 +1 3 0 1 0 0 0 0 0.000 1 0.000 511 303 19 19 511 303 525 317 +2 1 0 1 0 0 0 0 0.000 0 0 + 511 323 511 344 9999 9999 +-6 +6 189 279 364 424 +6 248 340 332 414 +2 2 0 1 0 0 0 0 0.000 0 0 + 252 344 329 344 329 410 
252 410 252 344 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 356 329 356 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 367 329 367 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 377 329 377 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 388 329 388 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 399 329 399 9999 9999 +-6 +1 3 0 1 0 0 0 0 0.000 1 0.000 286 303 19 19 286 303 300 317 +2 1 0 1 0 0 0 0 0.000 0 0 + 286 323 286 344 9999 9999 +-6 +6 288 80 372 154 +2 1 0 1 0 0 0 0 0.000 0 0 + 292 139 369 139 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 292 128 369 128 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 292 117 369 117 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 292 107 369 107 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 292 96 369 96 9999 9999 +2 2 0 1 0 0 0 0 0.000 0 0 + 292 84 369 84 369 150 292 150 292 84 9999 9999 +-6 +1 3 0 1 0 0 0 0 0.000 1 0.000 326 43 19 19 326 43 340 57 +1 3 0 1 0 0 0 0 0.000 1 0.000 316 462 9 9 316 462 322 469 +1 3 0 1 0 0 0 0 0.000 1 0.000 356 202 9 9 356 202 362 209 +2 1 2 1 0 0 0 0 3.000 0 0 + 154 239 659 239 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 484 349 459 349 459 464 444 464 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 484 359 324 449 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 318 360 362 361 314 450 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 318 351 452 351 452 460 444 459 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 326 63 326 84 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 358 91 492 91 492 200 484 199 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 358 100 402 101 354 190 9999 9999 +2 1 0 2 0 0 0 0 0.000 0 1 + 0 0 2.000 8.000 16.000 + 333 462 387 462 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 317 472 430 472 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 316 453 430 453 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 356 193 470 193 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 357 212 470 212 9999 9999 +2 1 0 2 0 0 0 0 0.000 0 1 + 0 0 2.000 8.000 16.000 + 373 202 427 202 9999 9999 +4 
1 0 11 0 0 0 0.000 1 7 24 509 274 Child +4 0 0 12 0 0 0 0.000 1 9 33 269 275 Parent +4 0 0 12 0 0 0 0.000 1 9 33 309 15 Parent +4 0 0 12 0 0 0 0.000 1 9 26 399 469 PIPE +4 0 0 12 0 0 0 0.000 1 9 26 434 209 PIPE diff --git a/share/doc/psd/20.ipctut/fig3.pic b/share/doc/psd/20.ipctut/fig3.pic new file mode 100644 index 0000000..15a4a73 --- /dev/null +++ b/share/doc/psd/20.ipctut/fig3.pic @@ -0,0 +1,69 @@ +.\" Copyright (c) 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" @(#)fig3.pic 8.1 (Berkeley) 8/14/93 +.PS +.ps +.ps 10 +ellipse at 5.787,8.012 wid 0.275 ht 0.275 +ellipse at 4.175,8.012 wid 0.275 ht 0.275 +dashwid = 0.037i +line dotted from 5.550,8.012 to 4.362,8.012 +line from 4.462,8.037 to 4.362,8.012 to 4.462,7.987 +line dotted from 4.362,7.950 to 5.550,7.950 +line from 5.450,7.925 to 5.550,7.950 to 5.450,7.975 +ellipse at 3.737,4.763 wid 0.275 ht 0.275 +ellipse at 5.350,4.763 wid 0.275 ht 0.275 +line dotted from 3.925,4.700 to 5.112,4.700 +line from 5.013,4.675 to 5.112,4.700 to 5.013,4.725 +line dotted from 5.112,4.763 to 3.925,4.763 +line from 4.025,4.788 to 3.925,4.763 to 4.025,4.737 +line from 5.963,5.513 to 6.925,5.513 +line from 5.963,5.650 to 6.925,5.650 +line from 5.963,5.787 to 6.925,5.787 +line from 5.963,5.912 to 6.925,5.912 +line from 5.963,6.050 to 6.925,6.050 +line from 5.963,6.200 to 6.925,6.200 to 6.925,5.375 to 5.963,5.375 to 5.963,6.200 +ellipse at 6.388,6.713 wid 0.475 ht 0.475 +line from 6.388,6.463 to 6.388,6.200 +line from 3.150,6.200 to 4.112,6.200 to 4.112,5.375 to 3.150,5.375 to 3.150,6.200 +line from 3.150,6.050 to 4.112,6.050 +line from 3.150,5.912 to 4.112,5.912 +line from 3.150,5.787 to 4.112,5.787 +line from 3.150,5.650 to 4.112,5.650 +line from 3.150,5.513 to 4.112,5.513 +ellipse at 3.575,6.713 wid 0.475 ht 0.475 +line from 3.575,6.463 to 3.575,6.200 +line from 3.650,8.762 to 4.612,8.762 +line from 3.650,8.900 to 4.612,8.900 +line from 3.650,9.037 to 4.612,9.037 +line from 3.650,9.162 to 
4.612,9.162 +line from 3.650,9.300 to 4.612,9.300 +line from 3.650,9.450 to 4.612,9.450 to 4.612,8.625 to 3.650,8.625 to 3.650,9.450 +ellipse at 4.075,9.963 wid 0.475 ht 0.475 +line from 3.975,6.112 to 5.650,6.112 to 5.650,4.750 to 5.550,4.763 +line from 5.652,4.775 to 5.550,4.763 to 5.646,4.725 +line from 6.050,6.138 to 5.737,6.138 to 5.737,4.700 to 5.550,4.700 +line from 5.650,4.725 to 5.550,4.700 to 5.650,4.675 +line dotted from 1.925,7.513 to 8.238,7.513 +line from 6.050,6.013 to 4.050,4.888 +line from 4.125,4.958 to 4.050,4.888 to 4.149,4.915 +line from 3.975,6.000 to 4.525,5.987 to 3.925,4.875 +line from 3.950,4.975 to 3.925,4.875 to 3.994,4.951 +line from 4.075,9.713 to 4.075,9.450 +line from 4.475,9.363 to 6.150,9.363 to 6.150,8.000 to 6.050,8.012 +line from 6.152,8.025 to 6.050,8.012 to 6.146,7.975 +line from 4.475,9.250 to 5.025,9.238 to 4.425,8.125 +line from 4.450,8.225 to 4.425,8.125 to 4.494,8.201 +.ps +.ps 11 +.ft +.ft R +"Child" at 6.362,7.106 +.ps +.ps 12 +"Parent" at 3.362,7.096 ljust +"Parent" at 3.862,10.346 ljust +.ps +.ft +.PE diff --git a/share/doc/psd/20.ipctut/fig3.xfig b/share/doc/psd/20.ipctut/fig3.xfig new file mode 100644 index 0000000..ed65b70 --- /dev/null +++ b/share/doc/psd/20.ipctut/fig3.xfig @@ -0,0 +1,100 @@ +#FIG 2.0 +80 2 +6 309 184 479 214 +1 3 0 1 0 0 0 0 0.000 1 0.000 463 199 11 11 463 199 468 209 +1 3 0 1 0 0 0 0 0.000 1 0.000 334 199 11 11 334 199 339 209 +2 1 2 1 0 0 0 0 3.000 1 0 + 0 0 1.000 4.000 8.000 + 444 199 349 199 9999 9999 +2 1 2 1 0 0 0 0 3.000 1 0 + 0 0 1.000 4.000 8.000 + 349 204 444 204 9999 9999 +-6 +6 274 444 444 474 +1 3 0 1 0 0 0 0 0.000 1 0.000 299 459 11 11 299 459 304 469 +1 3 0 1 0 0 0 0 0.000 1 0.000 428 459 11 11 428 459 433 469 +2 1 2 1 0 0 0 0 3.000 1 0 + 0 0 1.000 4.000 8.000 + 314 464 409 464 9999 9999 +2 1 2 1 0 0 0 0 3.000 1 0 + 0 0 1.000 4.000 8.000 + 409 459 314 459 9999 9999 +-6 +6 414 279 589 424 +6 473 340 557 414 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 399 554 399 9999 9999 +2 1 0 1 0 0 0 0 
0.000 0 0 + 477 388 554 388 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 377 554 377 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 367 554 367 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 356 554 356 9999 9999 +2 2 0 1 0 0 0 0 0.000 0 0 + 477 344 554 344 554 410 477 410 477 344 9999 9999 +-6 +1 3 0 1 0 0 0 0 0.000 1 0.000 511 303 19 19 511 303 525 317 +2 1 0 1 0 0 0 0 0.000 0 0 + 511 323 511 344 9999 9999 +-6 +6 189 279 364 424 +6 248 340 332 414 +2 2 0 1 0 0 0 0 0.000 0 0 + 252 344 329 344 329 410 252 410 252 344 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 356 329 356 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 367 329 367 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 377 329 377 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 388 329 388 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 399 329 399 9999 9999 +-6 +1 3 0 1 0 0 0 0 0.000 1 0.000 286 303 19 19 286 303 300 317 +2 1 0 1 0 0 0 0 0.000 0 0 + 286 323 286 344 9999 9999 +-6 +6 288 80 372 154 +2 1 0 1 0 0 0 0 0.000 0 0 + 292 139 369 139 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 292 128 369 128 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 292 117 369 117 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 292 107 369 107 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 292 96 369 96 9999 9999 +2 2 0 1 0 0 0 0 0.000 0 0 + 292 84 369 84 369 150 292 150 292 84 9999 9999 +-6 +1 3 0 1 0 0 0 0 0.000 1 0.000 326 43 19 19 326 43 340 57 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 318 351 452 351 452 460 444 459 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 484 349 459 349 459 464 444 464 9999 9999 +2 1 2 1 0 0 0 0 3.000 0 0 + 154 239 659 239 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 484 359 324 449 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 318 360 362 361 314 450 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 326 63 326 84 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 358 91 492 91 492 200 484 199 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 358 100 402 101 354 190 9999 9999 +4 1 
0 11 0 0 0 0.000 1 7 24 509 274 Child +4 0 0 12 0 0 0 0.000 1 9 33 269 275 Parent +4 0 0 12 0 0 0 0.000 1 9 33 309 15 Parent diff --git a/share/doc/psd/20.ipctut/fig8.pic b/share/doc/psd/20.ipctut/fig8.pic new file mode 100644 index 0000000..92b8833 --- /dev/null +++ b/share/doc/psd/20.ipctut/fig8.pic @@ -0,0 +1,79 @@ +.\" Copyright (c) 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" @(#)fig8.pic 8.1 (Berkeley) 8/14/93 +.PS +.ps +.ps 11 +.ft +.ft R +"Process 1" at 3.800,7.106 rjust +"Process 2" at 6.612,7.106 rjust +.ps +.ps 10 +line from 3.150,6.200 to 4.112,6.200 to 4.112,5.375 to 3.150,5.375 to 3.150,6.200 +line from 3.150,6.050 to 4.112,6.050 +line from 3.150,5.912 to 4.112,5.912 +line from 3.150,5.787 to 4.112,5.787 +line from 3.150,5.650 to 4.112,5.650 +line from 3.150,5.513 to 4.112,5.513 +ellipse at 3.575,6.713 wid 0.475 ht 0.475 +line from 3.575,6.463 to 3.575,6.200 +line from 5.963,5.513 to 6.925,5.513 +line from 5.963,5.650 to 6.925,5.650 +line from 5.963,5.787 to 6.925,5.787 +line from 5.963,5.912 to 6.925,5.912 +line from 5.963,6.050 to 6.925,6.050 +line from 5.963,6.200 to 6.925,6.200 to 6.925,5.375 to 5.963,5.375 to 5.963,6.200 +ellipse at 6.388,6.713 wid 0.475 ht 0.475 +line from 6.388,6.463 to 6.388,6.200 +line from 3.087,8.637 to 4.050,8.637 +line from 3.087,8.775 to 4.050,8.775 +line from 3.087,8.912 to 4.050,8.912 +line from 3.087,9.037 to 4.050,9.037 +line from 3.087,9.175 to 4.050,9.175 +line from 3.087,9.325 to 4.050,9.325 to 4.050,8.500 to 3.087,8.500 to 3.087,9.325 +ellipse at 3.513,9.838 wid 0.475 ht 0.475 +line from 3.513,9.588 to 3.513,9.325 +line from 5.900,9.325 to 6.862,9.325 to 6.862,8.500 to 5.900,8.500 to 5.900,9.325 +line from 5.900,9.175 to 6.862,9.175 +line from 5.900,9.037 to 6.862,9.037 +line from 5.900,8.912 to 6.862,8.912 +line from 5.900,8.775 to 6.862,8.775 +line from 5.900,8.637 to 6.862,8.637 +ellipse at 6.325,9.838 wid 0.475 ht 0.475 +line from 6.325,9.588 to 6.325,9.325 +.ps +.ps 
11 +"Process 2" at 6.550,10.231 rjust +"Process 1" at 3.737,10.231 rjust +.ps +.ps 10 +ellipse at 6.112,4.888 wid 0.275 ht 0.275 +ellipse at 5.350,4.763 wid 0.275 ht 0.275 +ellipse at 3.737,4.763 wid 0.275 ht 0.275 +ellipse at 4.550,7.950 wid 0.275 ht 0.275 +ellipse at 5.487,7.950 wid 0.275 ht 0.275 +line from 6.050,6.013 to 5.175,6.013 to 5.987,5.013 +line from 5.905,5.074 to 5.987,5.013 to 5.944,5.106 +line from 6.050,6.138 to 5.737,6.138 to 5.737,4.700 to 5.550,4.700 +line from 5.650,4.725 to 5.550,4.700 to 5.650,4.675 +dashwid = 0.037i +line dotted from 1.925,7.513 to 8.238,7.513 +line from 3.975,6.000 to 4.525,5.987 to 3.925,4.875 +line from 3.950,4.975 to 3.925,4.875 to 3.994,4.951 +line dotted from 5.112,4.763 to 3.925,4.763 +line from 4.025,4.788 to 3.925,4.763 to 4.025,4.737 +line dotted from 3.925,4.700 to 5.112,4.700 +line from 5.013,4.675 to 5.112,4.700 to 5.013,4.725 +line from 6.050,9.012 to 5.487,9.012 to 5.487,8.137 +line from 5.462,8.237 to 5.487,8.137 to 5.513,8.237 +line from 3.737,9.137 to 4.550,9.137 to 4.550,8.137 +line from 4.525,8.237 to 4.550,8.137 to 4.575,8.237 +.ps +.ps 11 +"NAME" at 6.737,4.918 rjust +"NAME" at 6.112,8.043 rjust +.ps +.ft +.PE diff --git a/share/doc/psd/20.ipctut/fig8.xfig b/share/doc/psd/20.ipctut/fig8.xfig new file mode 100644 index 0000000..f1a5257 --- /dev/null +++ b/share/doc/psd/20.ipctut/fig8.xfig @@ -0,0 +1,116 @@ +#FIG 2.0 +80 2 +6 224 254 589 279 +4 2 0 11 0 0 0 0.000 1 7 38 304 274 Process 1 +4 2 0 11 0 0 0 0.000 1 7 38 529 274 Process 2 +-6 +6 189 279 364 424 +6 248 340 332 414 +2 2 0 1 0 0 0 0 0.000 0 0 + 252 344 329 344 329 410 252 410 252 344 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 356 329 356 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 367 329 367 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 377 329 377 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 388 329 388 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 252 399 329 399 9999 9999 +-6 +1 3 0 1 0 0 0 0 0.000 1 0.000 286 303 19 19 286 303 300 317 +2 1 0 1 0 0 0 0 
0.000 0 0 + 286 323 286 344 9999 9999 +-6 +6 414 279 589 424 +6 473 340 557 414 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 399 554 399 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 388 554 388 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 377 554 377 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 367 554 367 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 477 356 554 356 9999 9999 +2 2 0 1 0 0 0 0 0.000 0 0 + 477 344 554 344 554 410 477 410 477 344 9999 9999 +-6 +1 3 0 1 0 0 0 0 0.000 1 0.000 511 303 19 19 511 303 525 317 +2 1 0 1 0 0 0 0 0.000 0 0 + 511 323 511 344 9999 9999 +-6 +6 184 29 359 174 +6 243 90 327 164 +2 1 0 1 0 0 0 0 0.000 0 0 + 247 149 324 149 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 247 138 324 138 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 247 127 324 127 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 247 117 324 117 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 247 106 324 106 9999 9999 +2 2 0 1 0 0 0 0 0.000 0 0 + 247 94 324 94 324 160 247 160 247 94 9999 9999 +-6 +1 3 0 1 0 0 0 0 0.000 1 0.000 281 53 19 19 281 53 295 67 +2 1 0 1 0 0 0 0 0.000 0 0 + 281 73 281 94 9999 9999 +-6 +6 409 29 584 174 +6 468 90 552 164 +2 2 0 1 0 0 0 0 0.000 0 0 + 472 94 549 94 549 160 472 160 472 94 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 472 106 549 106 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 472 117 549 117 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 472 127 549 127 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 472 138 549 138 9999 9999 +2 1 0 1 0 0 0 0 0.000 0 0 + 472 149 549 149 9999 9999 +-6 +1 3 0 1 0 0 0 0 0.000 1 0.000 506 53 19 19 506 53 520 67 +2 1 0 1 0 0 0 0 0.000 0 0 + 506 73 506 94 9999 9999 +-6 +6 219 4 584 29 +4 2 0 11 0 0 0 0.000 1 7 38 524 24 Process 2 +4 2 0 11 0 0 0 0.000 1 7 38 299 24 Process 1 +-6 +1 3 0 1 0 0 0 0 0.000 1 0.000 489 449 11 11 489 449 494 459 +1 3 0 1 0 0 0 0 0.000 1 0.000 428 459 11 11 428 459 433 469 +1 3 0 1 0 0 0 0 0.000 1 0.000 299 459 11 11 299 459 304 469 +1 3 0 1 0 0 0 0 0.000 1 0.000 364 204 11 11 364 204 369 214 +1 3 0 1 0 0 0 0 0.000 1 0.000 439 204 11 11 439 204 444 214 +2 1 0 1 0 0 
0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 484 359 414 359 479 439 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 484 349 459 349 459 464 444 464 9999 9999 +2 1 2 1 0 0 0 0 3.000 0 0 + 154 239 659 239 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 318 360 362 361 314 450 9999 9999 +2 1 2 1 0 0 0 0 3.000 1 0 + 0 0 1.000 4.000 8.000 + 409 459 314 459 9999 9999 +2 1 2 1 0 0 0 0 3.000 1 0 + 0 0 1.000 4.000 8.000 + 314 464 409 464 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 484 119 439 119 439 189 9999 9999 +2 1 0 1 0 0 0 0 0.000 1 0 + 0 0 1.000 4.000 8.000 + 299 109 364 109 364 189 9999 9999 +4 2 0 11 0 0 0 0.000 1 7 32 539 449 NAME +4 2 0 11 0 0 0 0.000 1 7 32 489 199 NAME diff --git a/share/doc/psd/20.ipctut/pipe.c b/share/doc/psd/20.ipctut/pipe.c new file mode 100644 index 0000000..86cb663 --- /dev/null +++ b/share/doc/psd/20.ipctut/pipe.c @@ -0,0 +1,74 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)pipe.c 8.1 (Berkeley) 6/8/93 +.\" +#include <stdio.h> + +#define DATA "Bright star, would I were steadfast as thou art . . ." + +/* + * This program creates a pipe, then forks. The child communicates to the + * parent over the pipe. Notice that a pipe is a one-way communications + * device. I can write to the output socket (sockets[1], the second socket + * of the array returned by pipe()) and read from the input socket + * (sockets[0]), but not vice versa. + */ + +main() +{ + int sockets[2], child; + + /* Create a pipe */ + if (pipe(sockets) < 0) { + perror("opening stream socket pair"); + exit(10); + } + + if ((child = fork()) == -1) + perror("fork"); + else if (child) { + char buf[1024]; + + /* This is still the parent. It reads the child's message. */ + close(sockets[1]); + if (read(sockets[0], buf, 1024) < 0) + perror("reading message"); + printf("-->%s\en", buf); + close(sockets[0]); + } else { + /* This is the child. It writes a message to its parent. 
*/ + close(sockets[0]); + if (write(sockets[1], DATA, sizeof(DATA)) < 0) + perror("writing message"); + close(sockets[1]); + } +} diff --git a/share/doc/psd/20.ipctut/socketpair.c b/share/doc/psd/20.ipctut/socketpair.c new file mode 100644 index 0000000..f525c76 --- /dev/null +++ b/share/doc/psd/20.ipctut/socketpair.c @@ -0,0 +1,77 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)socketpair.c 8.1 (Berkeley) 6/8/93 +.\" +#include <sys/types.h> +#include <sys/socket.h> +#include <stdio.h> + +#define DATA1 "In Xanadu, did Kublai Khan . . ." +#define DATA2 "A stately pleasure dome decree . . ." + +/* + * This program creates a pair of connected sockets, then forks and + * communicates over them. This is very similar to communication with pipes; + * however, socketpairs are two-way communications objects. Therefore I can + * send messages in both directions. + */ + +main() +{ + int sockets[2], child; + char buf[1024]; + + if (socketpair(AF_UNIX, SOCK_STREAM, 0, sockets) < 0) { + perror("opening stream socket pair"); + exit(1); + } + + if ((child = fork()) == -1) + perror("fork"); + else if (child) { /* This is the parent. */ + close(sockets[0]); + if (read(sockets[1], buf, 1024) < 0) + perror("reading stream message"); + printf("-->%s\en", buf); + if (write(sockets[1], DATA2, sizeof(DATA2)) < 0) + perror("writing stream message"); + close(sockets[1]); + } else { /* This is the child.
*/ + close(sockets[1]); + if (write(sockets[0], DATA1, sizeof(DATA1)) < 0) + perror("writing stream message"); + if (read(sockets[0], buf, 1024) < 0) + perror("reading stream message"); + printf("-->%s\en", buf); + close(sockets[0]); + } +} diff --git a/share/doc/psd/20.ipctut/strchkread.c b/share/doc/psd/20.ipctut/strchkread.c new file mode 100644 index 0000000..a1e148b --- /dev/null +++ b/share/doc/psd/20.ipctut/strchkread.c @@ -0,0 +1,106 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)strchkread.c 8.1 (Berkeley) 6/8/93 +.\" +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/time.h> +#include <netinet/in.h> +#include <netdb.h> +#include <stdio.h> +#define TRUE 1 + +/* + * This program uses select() to check that someone is trying to connect + * before calling accept(). + */ + +main() +{ + int sock, length; + struct sockaddr_in server; + int msgsock; + char buf[1024]; + int rval; + fd_set ready; + struct timeval to; + + /* Create socket */ + sock = socket(AF_INET, SOCK_STREAM, 0); + if (sock < 0) { + perror("opening stream socket"); + exit(1); + } + /* Name socket using wildcards */ + server.sin_family = AF_INET; + server.sin_addr.s_addr = INADDR_ANY; + server.sin_port = 0; + if (bind(sock, &server, sizeof(server))) { + perror("binding stream socket"); + exit(1); + } + /* Find out assigned port number and print it out */ + length = sizeof(server); + if (getsockname(sock, &server, &length)) { + perror("getting socket name"); + exit(1); + } + printf("Socket has port #%d\en", ntohs(server.sin_port)); + + /* Start accepting connections */ + listen(sock, 5); + do { + FD_ZERO(&ready); + FD_SET(sock, &ready); + to.tv_sec = 5; + to.tv_usec = 0; + if (select(sock + 1, &ready, 0, 0, &to) < 0) { + perror("select"); + continue; + } + if (FD_ISSET(sock, &ready)) { + msgsock = accept(sock, (struct sockaddr *)0, (int *)0); + if (msgsock == -1) + perror("accept"); + else do { + bzero(buf, sizeof(buf)); + if ((rval =
read(msgsock, buf, 1024)) < 0) + perror("reading stream message"); + else if (rval == 0) + printf("Ending connection\en"); + else + printf("-->%s\en", buf); + } while (rval > 0); + close(msgsock); + } else + printf("Do something else\en"); + } while (TRUE); +} diff --git a/share/doc/psd/20.ipctut/streamread.c b/share/doc/psd/20.ipctut/streamread.c new file mode 100644 index 0000000..ffad802 --- /dev/null +++ b/share/doc/psd/20.ipctut/streamread.c @@ -0,0 +1,102 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)streamread.c 8.1 (Berkeley) 6/8/93 +.\" +#include <sys/types.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <netdb.h> +#include <stdio.h> +#define TRUE 1 + +/* + * This program creates a socket and then begins an infinite loop. Each time + * through the loop it accepts a connection and prints out messages from it. + * When the connection breaks, or a termination message comes through, the + * program accepts a new connection. + */ + +main() +{ + int sock, length; + struct sockaddr_in server; + int msgsock; + char buf[1024]; + int rval; + int i; + + /* Create socket */ + sock = socket(AF_INET, SOCK_STREAM, 0); + if (sock < 0) { + perror("opening stream socket"); + exit(1); + } + /* Name socket using wildcards */ + server.sin_family = AF_INET; + server.sin_addr.s_addr = INADDR_ANY; + server.sin_port = 0; + if (bind(sock, &server, sizeof(server))) { + perror("binding stream socket"); + exit(1); + } + /* Find out assigned port number and print it out */ + length = sizeof(server); + if (getsockname(sock, &server, &length)) { + perror("getting socket name"); + exit(1); + } + printf("Socket has port #%d\en", ntohs(server.sin_port)); + + /* Start accepting connections */ + listen(sock, 5); + do { + msgsock = accept(sock, 0, 0); + if (msgsock == -1) + perror("accept"); + else do { + bzero(buf, sizeof(buf)); + if ((rval = read(msgsock, buf, 1024)) < 0) + perror("reading stream message"); + i = 0; + if (rval == 0) + 
printf("Ending connection\en"); + else + printf("-->%s\en", buf); + } while (rval > 0); + close(msgsock); + } while (TRUE); + /* + * Since this program has an infinite loop, the socket "sock" is + * never explicitly closed. However, all sockets will be closed + * automatically when a process is killed or terminates normally. + */ +} diff --git a/share/doc/psd/20.ipctut/streamwrite.c b/share/doc/psd/20.ipctut/streamwrite.c new file mode 100644 index 0000000..6205f13 --- /dev/null +++ b/share/doc/psd/20.ipctut/streamwrite.c @@ -0,0 +1,81 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)streamwrite.c 8.1 (Berkeley) 6/8/93 +.\" +#include <sys/types.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <netdb.h> +#include <stdio.h> + +#define DATA "Half a league, half a league . . ." + +/* + * This program creates a socket and initiates a connection with the socket + * given in the command line. One message is sent over the connection and + * then the socket is closed, ending the connection. The form of the command + * line is streamwrite hostname portnumber + */ + +main(argc, argv) + int argc; + char *argv[]; +{ + int sock; + struct sockaddr_in server; + struct hostent *hp, *gethostbyname(); + char buf[1024]; + + /* Create socket */ + sock = socket(AF_INET, SOCK_STREAM, 0); + if (sock < 0) { + perror("opening stream socket"); + exit(1); + } + /* Connect socket using name specified by command line. 
*/ + server.sin_family = AF_INET; + hp = gethostbyname(argv[1]); + if (hp == 0) { + fprintf(stderr, "%s: unknown host\en", argv[1]); + exit(2); + } + bcopy(hp->h_addr, &server.sin_addr, hp->h_length); + server.sin_port = htons(atoi(argv[2])); + + if (connect(sock, &server, sizeof(server)) < 0) { + perror("connecting stream socket"); + exit(1); + } + if (write(sock, DATA, sizeof(DATA)) < 0) + perror("writing on stream socket"); + close(sock); +} diff --git a/share/doc/psd/20.ipctut/tutor.me b/share/doc/psd/20.ipctut/tutor.me new file mode 100644 index 0000000..fba4583 --- /dev/null +++ b/share/doc/psd/20.ipctut/tutor.me @@ -0,0 +1,939 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)tutor.me 8.1 (Berkeley) 8/14/93 +.\" +.oh 'Introductory 4.4BSD IPC''PSD:20-%' +.eh 'PSD:20-%''Introductory 4.4BSD IPC' +.rs +.sp 2 +.sz 14 +.ft B +.ce 2 +An Introductory 4.4BSD +Interprocess Communication Tutorial +.sz 10 +.sp 2 +.ce +.i "Stuart Sechrest" +.ft +.sp +.ce 4 +Computer Science Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +.sp 2 +.ce +.i ABSTRACT +.sp +.(c +.pp +Berkeley UNIX\(dg 4.4BSD offers several choices for interprocess communication. +To aid the programmer in developing programs which are comprised of +cooperating +processes, the different choices are discussed and a series of example +programs are presented. These programs +demonstrate in a simple way the use of pipes, socketpairs, sockets +and the use of datagram and stream communication. The intent of this +document is to present a few simple example programs, not to describe the +networking system in full. +.)c +.sp 2 +.(f +\(dg\|UNIX is a trademark of AT&T Bell Laboratories. +.)f +.b +.sh 1 "Goals" +.r +.pp +Facilities for interprocess communication (IPC) and networking +were a major addition to UNIX in the Berkeley UNIX 4.2BSD release. +These facilities required major additions and some changes +to the system interface. +The basic idea of this interface is to make IPC similar to file I/O. 
+In UNIX a process has a set of I/O descriptors, from which one reads +and to which one writes. +Descriptors may refer to normal files, to devices (including terminals), +or to communication channels. +The use of a descriptor has three phases: its creation, +its use for reading and writing, and its destruction. By using descriptors +to write files, rather than simply naming the target file in the write +call, one gains a surprising amount of flexibility. Often, the program that +creates a descriptor will be different from the program that uses the +descriptor. For example the shell can create a descriptor for the output +of the `ls' +command that will cause the listing to appear in a file rather than +on a terminal. +Pipes are another form of descriptor that have been used in UNIX +for some time. +Pipes allow one-way data transmission from one process +to another; the two processes and the pipe must be set up by a common +ancestor. +.pp +The use of descriptors is not the only communication interface +provided by UNIX. +The signal mechanism sends a tiny amount of information from one +process to another. +The signaled process receives only the signal type, +not the identity of the sender, +and the number of possible signals is small. +The signal semantics limit the flexibility of the signaling mechanism +as a means of interprocess communication. +.pp +The identification of IPC with I/O is quite longstanding in UNIX and +has proved quite successful. At first, however, IPC was limited to +processes communicating within a single machine. With Berkeley UNIX +4.2BSD this expanded to include IPC between machines. This expansion +has necessitated some change in the way that descriptors are created. +Additionally, new possibilities for the meaning of read and write have +been admitted. Originally the meanings, or semantics, of these terms +were fairly simple. When you wrote something it was delivered. When +you read something, you were blocked until the data arrived. 
+Other possibilities exist, +however. One can write without full assurance of delivery if one can +check later to catch occasional failures. Messages can be kept as +discrete units or merged into a stream. +One can ask to read, but insist on not waiting if nothing is immediately +available. These new possibilities are allowed in the Berkeley UNIX IPC +interface. +.pp +Thus Berkeley UNIX 4.4BSD offers several choices for IPC. +This paper presents simple examples that illustrate some of +the choices. +The reader is presumed to be familiar with the C programming language +[Kernighan & Ritchie 1978], +but not necessarily with the system calls of the UNIX system or with +processes and interprocess communication. +The paper reviews the notion of a process and the types of +communication that are supported by Berkeley UNIX 4.4BSD. +A series of examples are presented that create processes that communicate +with one another. The programs show different ways of establishing +channels of communication. +Finally, the calls that actually transfer data are reviewed. +To clearly present how communication can take place, +the example programs have been cleared of anything that +might be construed as useful work. +They can, therefore, serve as models +for the programmer trying to construct programs which are comprised of +cooperating processes. +.b +.sh 1 "Processes" +.pp +A \fIprogram\fP is both a sequence of statements and a rough way of referring +to the computation that occurs when the compiled statements are run. +A \fIprocess\fP can be thought of as a single line of control in a program. +Most programs execute some statements, go through a few loops, branch in +various directions and then end. These are single process programs. +Programs can also have a point where control splits into two independent lines, +an action called \fIforking.\fP +In UNIX these lines can never join again. A call to the system routine +\fIfork()\fP, causes a process to split in this way. 
+The result of this call is that two independent processes will be +running, executing exactly the same code. +Memory values will be the same for all values set before the fork, but, +subsequently, each version will be able to change only the +value of its own copy of each variable. +Initially, the only difference between the two will be the value returned by +\fIfork().\fP The parent will receive a process id for the child, +the child will receive a zero. +Calls to \fIfork(),\fP +therefore, typically precede, or are included in, an if-statement. +.pp +A process views the rest of the system through a private table of descriptors. +The descriptors can represent open files or sockets (sockets are communication +objects that will be discussed below). Descriptors are referred to +by their index numbers in the table. The first three descriptors are often +known by special names, \fI stdin, stdout\fP and \fIstderr\fP. +These are the standard input, output and error. +When a process forks, its descriptor table is copied to the child. +Thus, if the parent's standard input is being taken from a terminal +(devices are also treated as files in UNIX), the child's input will +be taken from the +same terminal. Whoever reads first will get the input. If, before forking, +the parent changes its standard input so that it is reading from a +new file, the child will take its input from the new file. It is +also possible to take input from a socket, rather than from a file. +.b +.sh 1 "Pipes" +.r +.pp +Most users of UNIX know that they can pipe the output of a +program ``prog1'' to the input of another, ``prog2,'' by typing the command +\fI``prog1 | prog2.''\fP +This is called ``piping'' the output of one program +to another because the mechanism used to transfer the output is called a +pipe. +When the user types a command, the command is read by the shell, which +decides how to execute it. 
If the command is simple, for example, +.i "``prog1,''" +the shell forks a process, which executes the program, prog1, and then dies. +The shell waits for this termination and then prompts for the next +command. +If the command is a compound command, +.i "``prog1 | prog2,''" +the shell creates two processes connected by a pipe. One process +runs the program, prog1, the other runs prog2. The pipe is an I/O +mechanism with two ends, or sockets. Data that is written into one socket +can be read from the other. +.(z +.ft CW +.so pipe.c +.ft +.ce 1 +Figure 1\ \ Use of a pipe +.)z +.pp +Since a program specifies its input and output only by the descriptor table +indices, which appear as variables or constants, +the input source and output destination can be changed without +changing the text of the program. +It is in this way that the shell is able to set up pipes. Before executing +prog1, the process can close whatever is at \fIstdout\fP +and replace it with one +end of a pipe. Similarly, the process that will execute prog2 can substitute +the opposite end of the pipe for +\fIstdin.\fP +.pp +Let us now examine a program that creates a pipe for communication between +its child and itself (Figure 1). +A pipe is created by a parent process, which then forks. +When a process forks, the parent's descriptor table is copied into +the child's. +.pp +In Figure 1, the parent process makes a call to the system routine +\fIpipe().\fP +This routine creates a pipe and places descriptors for the sockets +for the two ends of the pipe in the process's descriptor table. +\fIPipe()\fP +is passed an array into which it places the index numbers of the +sockets it created. +The two ends are not equivalent. The socket whose index is +returned in the low word of the array is opened for reading only, +while the socket in the high end is opened only for writing. 
+This corresponds to the fact that the standard input is the first +descriptor of a process's descriptor table and the standard output +is the second. After creating the pipe, the parent creates the child +with which it will share the pipe by calling \fIfork().\fP +Figure 2 illustrates the effect of a fork. +The parent process's descriptor table points to both ends of the pipe. +After the fork, both parent's and child's descriptor tables point to +the pipe. +The child can then use the pipe to send a message to the parent. +.(z +.so fig2.pic +.ce 2 +Figure 2\ \ Sharing a pipe between parent and child +.ce 0 +.)z +.pp +Just what is a pipe? +It is a one-way communication mechanism, with one end opened +for reading and the other end for writing. +Therefore, parent and child need to agree on which way to turn +the pipe, from parent to child or the other way around. +Using the same pipe for communication both from parent to child and +from child to parent would be possible (since both processes have +references to both ends), but very complicated. +If the parent and child are to have a two-way conversation, +the parent creates two pipes, one for use in each direction. +(In accordance with their plans, both parent and child in the example above +close the socket that they will not use. It is not required that unused +descriptors be closed, but it is good practice.) +A pipe is also a \fIstream\fP communication mechanism; that +is, all messages sent through the pipe are placed in order +and reliably delivered. When the reader asks for a certain +number of bytes from this +stream, he is given as many bytes as are available, up +to the amount of the request. Note that these bytes may have come from +the same call to \fIwrite()\fR or from several calls to \fIwrite()\fR +which were concatenated. +.b +.sh 1 "Socketpairs" +.r +.pp +Berkeley UNIX 4.4BSD provides a slight generalization of pipes. A pipe is a +pair of connected sockets for one-way stream communication. 
One may +obtain a pair of connected sockets for two-way stream communication +by calling the routine \fIsocketpair().\fP +The program in Figure 3 calls \fIsocketpair()\fP +to create such a connection. The program uses the link for +communication in both directions. Since socketpairs are +an extension of pipes, their use resembles that of pipes. +Figure 4 illustrates the result of a fork following a call to +\fIsocketpair().\fP +.pp +\fISocketpair()\fP +takes as +arguments a specification of a domain, a style of communication, and a +protocol. +These are the parameters shown in the example. +Domains and protocols will be discussed in the next section. +Briefly, +a domain is a space of names that may be bound +to sockets and implies certain other conventions. +Currently, socketpairs have only been implemented for one +domain, called the UNIX domain. +The UNIX domain uses UNIX path names for naming sockets. +It only allows communication +between sockets on the same machine. +.pp +Note that the header files +.i "<sys/socket.h>" +and +.i "<sys/types.h>" +are required in this program. +The constants AF_UNIX and SOCK_STREAM are defined in +.i "<sys/socket.h>," +which in turn requires the file +.i "<sys/types.h>" +for some of its definitions. +.(z +.ft CW +.so socketpair.c +.ft +.ce 1 +Figure 3\ \ Use of a socketpair +.)z +.(z +.so fig3.pic +.ce 1 +Figure 4\ \ Sharing a socketpair between parent and child +.)z +.b +.sh 1 "Domains and Protocols" +.r +.pp +Pipes and socketpairs are a simple solution for communicating between +a parent and child or between child processes. +What if we wanted to have processes that have no common ancestor +with whom to set up communication? +Neither standard UNIX pipes nor socketpairs are +the answer here, since both mechanisms require a common ancestor to +set up the communication. +We would like to have two processes separately create sockets +and then have messages sent between them.
This is often the +case when providing or using a service in the system. This is +also the case when the communicating processes are on separate machines. +In Berkeley UNIX 4.4BSD one can create individual sockets, give them names and +send messages between them. +.pp +Sockets created by different programs use names to refer to one another; +names generally must be translated into addresses for use. +The space from which an address is drawn is referred to as a +.i domain. +There are several domains for sockets. +Two that will be used in the examples here are the UNIX domain (or AF_UNIX, +for Address Family UNIX) and the Internet domain (or AF_INET). +UNIX domain IPC is an experimental facility in 4.2BSD and 4.3BSD. +In the UNIX domain, a socket is given a path name within the file system +name space. +A file system node is created for the socket and other processes may +then refer to the socket by giving the proper pathname. +UNIX domain names, therefore, allow communication between any two processes +that work in the same file system. +The Internet domain is the UNIX implementation of the DARPA Internet +standard protocols IP/TCP/UDP. +Addresses in the Internet domain consist of a machine network address +and an identifying number, called a port. +Internet domain names allow communication between machines. +.pp +Communication follows some particular ``style.'' +Currently, communication is either through a \fIstream\fP +or by \fIdatagram.\fP +Stream communication implies several things. Communication takes +place across a connection between two sockets. The communication +is reliable, error-free, and, as in pipes, no message boundaries are +kept. Reading from a stream may result in reading the data sent from +one or several calls to \fIwrite()\fP +or only part of the data from a single call, if there is not enough room +for the entire message, or if not all the data from a large message +has been transferred.
+The protocol implementing such a style will retransmit messages +received with errors. It will also return error messages if one tries to +send a message after the connection has been broken. +Datagram communication does not use connections. Each message is +addressed individually. If the address is correct, it will generally +be received, although this is not guaranteed. Often datagrams are +used for requests that require a response from the +recipient. If no response +arrives in a reasonable amount of time, the request is repeated. +The individual datagrams will be kept separate when they are read, that +is, message boundaries are preserved. +.pp +The difference in performance between the two styles of communication is +generally less important than the difference in semantics. The +performance gain that one might find in using datagrams must be weighed +against the increased complexity of the program, which must now concern +itself with lost or out-of-order messages. If lost messages may simply be +ignored, the quantity of traffic may be a consideration. The expense +of setting up a connection is best justified by frequent use of the connection. +Since the performance of a protocol changes as it is tuned for different +situations, it is best to seek the most up-to-date information when +making choices for a program in which performance is crucial. +.pp +A protocol is a set of rules, data formats and conventions that regulate the +transfer of data between participants in the communication. +In general, there is one protocol for each socket type (stream, +datagram, etc.) within each domain. +The code that implements a protocol +keeps track of the names that are bound to sockets, +sets up connections and transfers data between sockets, +perhaps sending the data across a network.
+It is possible for several protocols, differing only in low level +details, to implement the same style of communication within +a particular domain. Although it is possible to select +which protocol should be used, for nearly all uses it is sufficient to +request the default protocol. This has been done in all of the example +programs. +.pp +One specifies the domain, style and protocol of a socket when +it is created. For example, in Figure 5a the call to \fIsocket()\fP +causes the creation of a datagram socket with the default protocol +in the UNIX domain. +.b +.sh 1 "Datagrams in the UNIX Domain" +.r +.(z +.ft CW +.so udgramread.c +.ft +.ce 1 +Figure 5a\ \ Reading UNIX domain datagrams +.)z +.pp +Let us now look at two programs that create sockets separately. +The programs in Figures 5a and 5b use datagram communication +rather than a stream. +The structure used to name UNIX domain sockets is defined +in the file \fI<sys/un.h>.\fP +The definition has also been included in the example for clarity. +.pp +Each program creates a socket with a call to \fIsocket().\fP +These sockets are in the UNIX domain. +Once a name has been decided upon it is attached to a socket by the +system call \fIbind().\fP +The program in Figure 5a uses the name ``socket'', +which it binds to its socket. +This name will appear in the working directory of the program. +The program in Figure 5b uses its +socket only for sending messages. It does not create a name for +the socket because no other process has to refer to it. +.(z +.ft CW +.so udgramsend.c +.ft +.ce 1 +Figure 5b\ \ Sending UNIX domain datagrams +.)z +.pp +Names in the UNIX domain are path names. Like file path names they may +be either absolute (e.g. ``/dev/imaginary'') or relative (e.g. ``socket''). +Because these names are used to allow processes to rendezvous, relative +path names can pose difficulties and should be used with care. +When a name is bound into the name space, a file (inode) is allocated in the +file system.
If +the inode is not deallocated, the name will continue to exist even after +the bound socket is closed. This can cause subsequent runs of a program +to find that a name is unavailable, and can cause +directories to fill up with these +objects. The names are removed by calling \fIunlink()\fP or using +the \fIrm\fP\|(1) command. +Names in the UNIX domain are only used for rendezvous. They are not used +for message delivery once a connection is established. Therefore, in +contrast with the Internet domain, unbound sockets need not be (and are +not) automatically given addresses when they are connected. +.pp +There is no established means of communicating names to interested +parties. In the example, the program in Figure 5b gets the +name of the socket to which it will send its message through its +command line arguments. Once a line of communication has been created, +one can send the names of additional, perhaps new, sockets over the link. +Facilities will have to be built that will make the distribution of +names less of a problem than it now is. +.b +.sh 1 "Datagrams in the Internet Domain" +.r +.(z +.ft CW +.so dgramread.c +.ft +.ce 1 +Figure 6a\ \ Reading Internet domain datagrams +.)z +.pp +The examples in Figure 6a and 6b are very close to the previous example +except that the socket is in the Internet domain. +The structure of Internet domain addresses is defined in the file +\fI<netinet/in.h>\fP. +Internet addresses specify a host address (a 32-bit number) +and a delivery slot, or port, on that +machine. These ports are managed by the system routines that implement +a particular protocol. +Unlike UNIX domain names, Internet socket names are not entered into +the file system and, therefore, +they do not have to be unlinked after the socket has been closed. +When a message must be sent between machines it is sent to +the protocol routine on the destination machine, which interprets the +address to determine to which socket the message should be delivered. 
+Several different protocols may be active on +the same machine, but, in general, they will not communicate with one another. +As a result, different protocols are allowed to use the same port numbers. +Thus, implicitly, an Internet address is a triple including a protocol as +well as the port and machine address. +An \fIassociation\fP is a temporary or permanent specification +of a pair of communicating sockets. +An association is thus identified by the tuple +<\fIprotocol, local machine address, local port, +remote machine address, remote port\fP>. +An association may be transient when using datagram sockets; +the association actually exists during a \fIsend\fP operation. +.(z +.ft CW +.so dgramsend.c +.ft +.ce 1 +Figure 6b\ \ Sending an Internet domain datagram +.)z +.pp +The protocol for a socket is chosen when the socket is created. The +local machine address for a socket can be any valid network address of the +machine, if it has more than one, or it can be the wildcard value +INADDR_ANY. +The wildcard value is used in the program in Figure 6a. +If a machine has several network addresses, it is likely +that messages sent to any of the addresses should be deliverable to +a socket. This will be the case if the wildcard value has been chosen. +Note that even if the wildcard value is chosen, a program sending messages +to the named socket must specify a valid network address. One can be willing +to receive from ``anywhere,'' but one cannot send a message ``anywhere.'' +The program in Figure 6b is given the destination host name as a command +line argument. +To determine a network address to which it can send the message, it looks +up +the host address by the call to \fIgethostbyname()\fP. +The returned structure includes the host's network address, +which is copied into the structure specifying the +destination of the message. +.pp +The port number can be thought of as the number of a mailbox, into +which the protocol places one's messages. 
Certain daemons, offering +certain advertised services, have reserved +or ``well-known'' port numbers. These fall in the range +from 1 to 1023. Higher numbers are available to general users. +Only servers need to ask for a particular number. +The system will assign an unused port number when an address +is bound to a socket. +This may happen when an explicit \fIbind\fP +call is made with a port number of 0, or +when a \fIconnect\fP or \fIsend\fP +is performed on an unbound socket. +Note that port numbers are not automatically reported back to the user. +After calling \fIbind(),\fP asking for port 0, one may call +\fIgetsockname()\fP to discover what port was actually assigned. +The routine \fIgetsockname()\fP +will not work for names in the UNIX domain. +.pp +The format of the socket address is specified in part by standards within the +Internet domain. The specification includes the order of the bytes in +the address. Because machines differ in the internal representation +they ordinarily use +to represent integers, printing out the port number as returned by +\fIgetsockname()\fP may result in a misinterpretation. To +print out the number, it is necessary to use the routine \fIntohs()\fP +(for \fInetwork to host: short\fP) to convert the number from the +network representation to the host's representation. On some machines, +such as 68000-based machines, this is a null operation. On others, +such as VAXes, this results in a swapping of bytes. Another routine +exists to convert a short integer from the host format to the network format, +called \fIhtons()\fP; similar routines exist for long integers. +For further information, refer to the +entry for \fIbyteorder\fP in section 3 of the manual. +.b +.sh 1 "Connections" +.r +.pp +To send data between stream sockets (having communication style SOCK_STREAM), +the sockets must be connected. +Figures 7a and 7b show two programs that create such a connection. +The program in 7a is relatively simple. 
+To initiate a connection, this program simply creates +a stream socket, then calls \fIconnect()\fP, +specifying the address of the socket to which +it wishes its socket connected. Provided that the target socket exists and +is prepared to handle a connection, connection will be complete, +and the program can begin to send +messages. Messages will be delivered in order without message +boundaries, as with pipes. The connection is destroyed when either +socket is closed (or soon thereafter). If a process persists +in sending messages after the connection is closed, a SIGPIPE signal +is sent to the process by the operating system. Unless explicit action +is taken to handle the signal (see the manual page for \fIsignal\fP +or \fIsigvec\fP), +the process will terminate and the shell +will print the message ``broken pipe.'' +.(z +.ft CW +.so streamwrite.c +.ft +.ce 1 +Figure 7a\ \ Initiating an Internet domain stream connection +.)z +.(z +.ft CW +.so streamread.c +.ft +.ce 1 +Figure 7b\ \ Accepting an Internet domain stream connection +.sp 2 +.ft CW +.so strchkread.c +.ft +.ce 1 +Figure 7c\ \ Using select() to check for pending connections +.)z +.(z +.so fig8.pic +.sp +.ce 1 +Figure 8\ \ Establishing a stream connection +.)z +.pp +Forming a connection is asymmetrical; one process, such as the +program in Figure 7a, requests a connection with a particular socket, +the other process accepts connection requests. +Before a connection can be accepted a socket must be created and an address +bound to it. This +situation is illustrated in the top half of Figure 8. Process 2 +has created a socket and bound a port number to it. Process 1 has created an +unnamed socket. +The address bound to process 2's socket is then made known to process 1 and, +perhaps to several other potential communicants as well. +If there are several possible communicants, +this one socket might receive several requests for connections. +As a result, a new socket is created for each connection. 
This new socket +is the endpoint for communication within this process for this connection. +A connection may be destroyed by closing the corresponding socket. +.pp +The program in Figure 7b is a rather trivial example of a server. It +creates a socket to which it binds a name, which it then advertises. +(In this case it prints out the socket number.) The program then calls +\fIlisten()\fP for this socket. +Since several clients may attempt to connect more or less +simultaneously, a queue of pending connections is maintained in the system +address space. \fIListen()\fP +marks the socket as willing to accept connections and initializes the queue. +When a connection is requested, it is listed in the queue. If the +queue is full, an error status may be returned to the requester. +The maximum length of this queue is specified by the second argument of +\fIlisten()\fP; the maximum length is limited by the system. +Once the listen call has been completed, the program enters +an infinite loop. On each pass through the loop, a new connection is +accepted and removed from the queue, and, hence, a new socket for the +connection is created. The bottom half of Figure 8 shows the result of +Process 1 connecting with the named socket of Process 2, and Process 2 +accepting the connection. After the connection is created, the +service, in this case printing out the messages, is performed and the +connection socket closed. The \fIaccept()\fP +call will take a pending connection +request from the queue if one is available, or block waiting for a request. +Messages are read from the connection socket. +Reads from an active connection will normally block until data is available. +The number of bytes read is returned. When a connection is destroyed, +the read call returns immediately. The number of bytes returned will +be zero. +.pp +The program in Figure 7c is a slight variation on the server in Figure 7b. 
+It avoids blocking when there are no pending connection requests by +calling \fIselect()\fP +to check for pending requests before calling \fIaccept().\fP +This strategy is useful when connections may be received +on more than one socket, or when data may arrive on other connected +sockets before another connection request. +.pp +The programs in Figures 9a and 9b show a program using stream communication +in the UNIX domain. Streams in the UNIX domain can be used for this sort +of program in exactly the same way as Internet domain streams, except for +the form of the names and the restriction of the connections to a single +file system. There are some differences, however, in the functionality of +streams in the two domains, notably in the handling of +\fIout-of-band\fP data (discussed briefly below). These differences +are beyond the scope of this paper. +.(z +.ft CW +.so ustreamwrite.c +.ft +.ce 1 +Figure 9a\ \ Initiating a UNIX domain stream connection +.sp 2 +.ft CW +.so ustreamread.c +.ft +.ce 1 +Figure 9b\ \ Accepting a UNIX domain stream connection +.)z +.b +.sh 1 "Reads, Writes, Recvs, etc." +.r +.pp +UNIX 4.4BSD has several system calls for reading and writing information. +The simplest calls are \fIread() \fP and \fIwrite().\fP \fIWrite()\fP +takes as arguments the index of a descriptor, a pointer to a buffer +containing the data and the size of the data. +The descriptor may indicate either a file or a connected socket. +``Connected'' can mean either a connected stream socket (as described +in Section 8) or a datagram socket for which a \fIconnect()\fP +call has provided a default destination (see the \fIconnect()\fP manual page). +\fIRead()\fP also takes a descriptor that indicates either a file or a socket. +\fIWrite()\fP requires a connected socket since no destination is +specified in the parameters of the system call. +\fIRead()\fP can be used for either a connected or an unconnected socket. 
+These calls are, therefore, quite flexible and may be used to +write applications that require no assumptions about the source of +their input or the destination of their output. +There are variations on \fIread() \fP and \fIwrite()\fP +that allow the source and destination of the input and output to use +several separate buffers, while retaining the flexibility to handle +both files and sockets. These are \fIreadv()\fP and \fI writev(),\fP +for read and write \fIvector.\fP +.pp +It is sometimes necessary to send high priority data over a +connection that may have unread low priority data at the +other end. For example, a user interface process may be interpreting +commands and sending them on to another process through a stream connection. +The user interface may have filled the stream with as yet unprocessed +requests when the user types +a command to cancel all outstanding requests. +Rather than have the high priority data wait +to be processed after the low priority data, it is possible to +send it as \fIout-of-band\fP +(OOB) data. The notification of pending OOB data results in the generation of +a SIGURG signal, if this signal has been enabled (see the manual +page for \fIsignal\fP or \fIsigvec\fP). +See [Leffler 1986] for a more complete description of the OOB mechanism. +There are a pair of calls similar to \fIread\fP and \fIwrite\fP +that allow options, including sending +and receiving OOB information; these are \fI send()\fP +and \fIrecv().\fP +These calls are used only with sockets; specifying a descriptor for a file will +result in the return of an error status. These calls also allow +\fIpeeking\fP at data in a stream. +That is, they allow a process to read data without removing the data from +the stream. One use of this facility is to read ahead in a stream +to determine the size of the next item to be read. 
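The read-ahead use of peeking can be sketched as follows. This is an illustration under stated assumptions, not part of the tutorial's programs: the framing (a 2-byte length in network byte order followed by the data) and the helper names send_item(), peek_item_size() and recv_item() are hypothetical.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/*
 * Hypothetical framing, assumed for this sketch only: each item is a
 * 2-byte length in network byte order followed by that many bytes.
 */
static int
send_item(int sock, const char *data, uint16_t len)
{
	uint16_t nlen = htons(len);

	if (send(sock, &nlen, sizeof(nlen), 0) != (ssize_t)sizeof(nlen))
		return (-1);
	if (send(sock, data, len, 0) != (ssize_t)len)
		return (-1);
	return (0);
}

/*
 * Look at the length header without removing it from the stream.
 * Returns the size of the next item, or -1 on error or if the whole
 * header has not arrived yet.
 */
static int
peek_item_size(int sock)
{
	uint16_t nlen;

	if (recv(sock, &nlen, sizeof(nlen), MSG_PEEK) != (ssize_t)sizeof(nlen))
		return (-1);
	return (ntohs(nlen));
}

/*
 * Read one item: peek at its size first, so that an item too large for
 * the buffer is left untouched in the stream, then consume the header
 * and the data.  Returns the number of data bytes, or -1 on error.
 */
static int
recv_item(int sock, char *buf, size_t bufsize)
{
	uint16_t nlen;
	int size;

	size = peek_item_size(sock);
	if (size < 0 || (size_t)size > bufsize)
		return (-1);
	if (recv(sock, &nlen, sizeof(nlen), MSG_WAITALL) != (ssize_t)sizeof(nlen))
		return (-1);
	if (size > 0 &&
	    recv(sock, buf, (size_t)size, MSG_WAITALL) != (ssize_t)size)
		return (-1);
	return (size);
}
```

Because the peek leaves the header in place, a caller could instead use the reported size to allocate a buffer before reading; the fixed buffer here is only a simplification.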
+When not using these options, these calls have the same functions as +\fIread()\fP and \fIwrite().\fP +.pp +To send datagrams, one must be allowed to specify the destination. +The call \fIsendto()\fP +takes a destination address as an argument and is therefore used for +sending datagrams. The call \fIrecvfrom()\fP +is often used to read datagrams, since this call returns the address +of the sender, if it is available, along with the data. +If the identity of the sender does not matter, one may use \fIread()\fP +or \fIrecv().\fP +.pp +Finally, there are a pair of calls that allow the sending and +receiving of messages from multiple buffers, when the address of the +recipient must be specified. These are \fIsendmsg()\fP and +\fIrecvmsg().\fP +These calls are actually quite general and have other uses, +including, in the UNIX domain, the transmission of a file descriptor from one +process to another. +.pp +The various options for reading and writing are shown in Figure 10, +together with their parameters. The parameters for each system call +reflect the differences in function of the different calls. +In the examples given in this paper, the calls \fIread()\fP and +\fIwrite()\fP have been used whenever possible. +.(z +.ft CW + /* + * The variable descriptor may be the descriptor of either a file + * or of a socket. + */ + cc = read(descriptor, buf, nbytes) + int cc, descriptor; char *buf; int nbytes; + + /* + * An iovec can include several source buffers. + */ + cc = readv(descriptor, iov, iovcnt) + int cc, descriptor; struct iovec *iov; int iovcnt; + + cc = write(descriptor, buf, nbytes) + int cc, descriptor; char *buf; int nbytes; + + cc = writev(descriptor, iovec, ioveclen) + int cc, descriptor; struct iovec *iovec; int ioveclen; + + /* + * The variable ``sock'' must be the descriptor of a socket. + * Flags may include MSG_OOB and MSG_PEEK. 
+ */ + cc = send(sock, msg, len, flags) + int cc, sock; char *msg; int len, flags; + + cc = sendto(sock, msg, len, flags, to, tolen) + int cc, sock; char *msg; int len, flags; + struct sockaddr *to; int tolen; + + cc = sendmsg(sock, msg, flags) + int cc, sock; struct msghdr msg[]; int flags; + + cc = recv(sock, buf, len, flags) + int cc, sock; char *buf; int len, flags; + + cc = recvfrom(sock, buf, len, flags, from, fromlen) + int cc, sock; char *buf; int len, flags; + struct sockaddr *from; int *fromlen; + + cc = recvmsg(sock, msg, flags) + int cc, sock; struct msghdr msg[]; int flags; +.ft +.sp 1 +.ce 1 +Figure 10\ \ Varieties of read and write commands +.)z +.b +.sh 1 "Choices" +.r +.pp +This paper has presented examples of some of the forms +of communication supported by +Berkeley UNIX 4.4BSD. These have been presented in an order chosen for +ease of presentation. It is useful to review these options emphasizing the +factors that make each attractive. +.pp +Pipes have the advantage of portability, in that they are supported in all +UNIX systems. They also are relatively +simple to use. Socketpairs share this simplicity and have the additional +advantage of allowing bidirectional communication. The major shortcoming +of these mechanisms is that they require communicating processes to be +descendants of a common process. They do not allow intermachine communication. +.pp +The two communication domains, UNIX and Internet, allow processes with no common +ancestor to communicate. +Of the two, only the Internet domain allows +communication between machines. +This makes the Internet domain a necessary +choice for processes running on separate machines. +.pp +The choice between datagrams and stream communication is best made by +carefully considering the semantic and performance +requirements of the application. +Streams can be both advantageous and disadvantageous. 
One disadvantage +is that a process is only allowed a limited number of open streams, +as there are usually only 64 entries available in the open descriptor +table. This can cause problems if a single server must talk with a large +number of clients. +Another is that for delivering a short message the stream setup and +teardown time can be unnecessarily long. Weighed against this is +the reliability built into the streams. This will often be the +deciding factor in favor of streams. +.b +.sh 1 "What to do Next" +.r +.pp +Many of the examples presented here can serve as models for multiprocess +programs and for programs distributed across several machines. +In developing a new multiprocess program, it is often easiest to +first write the code to create the processes and communication paths. +After this code is debugged, the code specific to the application can +be added. +.pp +An introduction to the UNIX system and programming using UNIX system calls +can be found in [Kernighan and Pike 1984]. +Further documentation of the Berkeley UNIX 4.4BSD IPC mechanisms can be +found in [Leffler et al. 1986]. +More detailed information about particular calls and protocols +is provided in sections +2, 3 and 4 of the +UNIX Programmer's Manual [CSRG 1986]. +In particular the following manual pages are relevant: +.(b +.TS +l l. +creating and naming sockets socket(2), bind(2) +establishing connections listen(2), accept(2), connect(2) +transferring data read(2), write(2), send(2), recv(2) +addresses inet(4F) +protocols tcp(4P), udp(4P). +.TE +.)b +.(b +.sp +.b +Acknowledgements +.pp +I would like to thank Sam Leffler and Mike Karels for their help in +understanding the IPC mechanisms and all the people whose comments +have helped in writing and improving this report. +.pp +This work was sponsored by the Defense Advanced Research Projects Agency +(DoD), ARPA Order No. 4031, monitored by the Naval Electronics Systems +Command under contract No. N00039-C-0235. 
+The views and conclusions contained in this document are those of the +author and should not be interpreted as representing official policies, +either expressed or implied, of the Defense Advanced Research Projects Agency +or of the US Government. +.)b +.(b +.sp +.b +References +.r +.sp +.ls 1 +B.W. Kernighan & R. Pike, 1984, +.i "The UNIX Programming Environment." +Englewood Cliffs, N.J.: Prentice-Hall. +.sp +.ls 1 +B.W. Kernighan & D.M. Ritchie, 1978, +.i "The C Programming Language," +Englewood Cliffs, N.J.: Prentice-Hall. +.sp +.ls 1 +S.J. Leffler, R.S. Fabry, W.N. Joy, P. Lapsley, S. Miller & C. Torek, 1986, +.i "An Advanced 4.4BSD Interprocess Communication Tutorial." +Computer Systems Research Group, +Department of Electrical Engineering and Computer Science, +University of California, Berkeley. +.sp +.ls 1 +Computer Systems Research Group, 1986, +.i "UNIX Programmer's Manual, 4.4 Berkeley Software Distribution." +Computer Systems Research Group, +Department of Electrical Engineering and Computer Science, +University of California, Berkeley. +.)b diff --git a/share/doc/psd/20.ipctut/udgramread.c b/share/doc/psd/20.ipctut/udgramread.c new file mode 100644 index 0000000..2cb605d --- /dev/null +++ b/share/doc/psd/20.ipctut/udgramread.c @@ -0,0 +1,80 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)udgramread.c 8.1 (Berkeley) 6/8/93 +.\" +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/un.h> + +/* + * In the included file <sys/un.h> a sockaddr_un is defined as follows + * struct sockaddr_un { + * short sun_family; + * char sun_path[108]; + * }; + */ + +#include <stdio.h> + +#define NAME "socket" + +/* + * This program creates a UNIX domain datagram socket, binds a name to it, + * then reads from the socket. + */ +main() +{ + int sock, length; + struct sockaddr_un name; + char buf[1024]; + + /* Create socket from which to read. */ + sock = socket(AF_UNIX, SOCK_DGRAM, 0); + if (sock < 0) { + perror("opening datagram socket"); + exit(1); + } + /* Create name. 
*/ + name.sun_family = AF_UNIX; + strcpy(name.sun_path, NAME); + if (bind(sock, &name, sizeof(struct sockaddr_un))) { + perror("binding name to datagram socket"); + exit(1); + } + printf("socket -->%s\en", NAME); + /* Read from the socket */ + if (read(sock, buf, 1024) < 0) + perror("receiving datagram packet"); + printf("-->%s\en", buf); + close(sock); + unlink(NAME); +} diff --git a/share/doc/psd/20.ipctut/udgramsend.c b/share/doc/psd/20.ipctut/udgramsend.c new file mode 100644 index 0000000..3e3ba93 --- /dev/null +++ b/share/doc/psd/20.ipctut/udgramsend.c @@ -0,0 +1,68 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)udgramsend.c 8.1 (Berkeley) 6/8/93 +.\" +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/un.h> +#include <stdio.h> + +#define DATA "The sea is calm tonight, the tide is full . . ." + +/* + * Here I send a datagram to a receiver whose name I get from the command + * line arguments. The form of the command line is udgramsend pathname + */ + +main(argc, argv) + int argc; + char *argv[]; +{ + int sock; + struct sockaddr_un name; + + /* Create socket on which to send. */ + sock = socket(AF_UNIX, SOCK_DGRAM, 0); + if (sock < 0) { + perror("opening datagram socket"); + exit(1); + } + /* Construct name of socket to send to. */ + name.sun_family = AF_UNIX; + strcpy(name.sun_path, argv[1]); + /* Send message. */ + if (sendto(sock, DATA, sizeof(DATA), 0, + &name, sizeof(struct sockaddr_un)) < 0) { + perror("sending datagram message"); + } + close(sock); +} diff --git a/share/doc/psd/20.ipctut/ustreamread.c b/share/doc/psd/20.ipctut/ustreamread.c new file mode 100644 index 0000000..97fadb9 --- /dev/null +++ b/share/doc/psd/20.ipctut/ustreamread.c @@ -0,0 +1,96 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. 
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)ustreamread.c 8.1 (Berkeley) 6/8/93 +.\" +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/un.h> +#include <stdio.h> + +#define NAME "socket" + +/* + * This program creates a socket in the UNIX domain and binds a name to it. + * After printing the socket's name it begins a loop. Each time through the + * loop it accepts a connection and prints out messages from it. 
When the + * connection breaks, or a termination message comes through, the program + * accepts a new connection. + */ +main() +{ + int sock, msgsock, rval; + struct sockaddr_un server; + char buf[1024]; + + /* Create socket */ + sock = socket(AF_UNIX, SOCK_STREAM, 0); + if (sock < 0) { + perror("opening stream socket"); + exit(1); + } + /* Name socket using file system name */ + server.sun_family = AF_UNIX; + strcpy(server.sun_path, NAME); + if (bind(sock, &server, sizeof(struct sockaddr_un))) { + perror("binding stream socket"); + exit(1); + } + printf("Socket has name %s\en", server.sun_path); + /* Start accepting connections */ + listen(sock, 5); + for (;;) { + msgsock = accept(sock, 0, 0); + if (msgsock == -1) + perror("accept"); + else do { + bzero(buf, sizeof(buf)); + if ((rval = read(msgsock, buf, 1024)) < 0) + perror("reading stream message"); + else if (rval == 0) + printf("Ending connection\en"); + else + printf("-->%s\en", buf); + } while (rval > 0); + close(msgsock); + } + /* + * The following statements are not executed, because they follow an + * infinite loop. However, most ordinary programs will not run + * forever. In the UNIX domain it is necessary to tell the file + * system that one is through using NAME. In most programs one uses + * the call unlink() as below. Since the user will have to kill this + * program, it will be necessary to remove the name by a command from + * the shell. + */ + close(sock); + unlink(NAME); +} diff --git a/share/doc/psd/20.ipctut/ustreamwrite.c b/share/doc/psd/20.ipctut/ustreamwrite.c new file mode 100644 index 0000000..bdc0b95 --- /dev/null +++ b/share/doc/psd/20.ipctut/ustreamwrite.c @@ -0,0 +1,71 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. 
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)ustreamwrite.c 8.1 (Berkeley) 6/8/93 +.\" +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/un.h> +#include <stdio.h> + +#define DATA "Half a league, half a league . . ." + +/* + * This program connects to the socket named in the command line and sends a + * one line message to that socket. 
The form of the command line is + * ustreamwrite pathname + */ +main(argc, argv) + int argc; + char *argv[]; +{ + int sock; + struct sockaddr_un server; + char buf[1024]; + + /* Create socket */ + sock = socket(AF_UNIX, SOCK_STREAM, 0); + if (sock < 0) { + perror("opening stream socket"); + exit(1); + } + /* Connect socket using name specified by command line. */ + server.sun_family = AF_UNIX; + strcpy(server.sun_path, argv[1]); + + if (connect(sock, &server, sizeof(struct sockaddr_un)) < 0) { + close(sock); + perror("connecting stream socket"); + exit(1); + } + if (write(sock, DATA, sizeof(DATA)) < 0) + perror("writing on stream socket"); +} diff --git a/share/doc/psd/21.ipc/0.t b/share/doc/psd/21.ipc/0.t new file mode 100644 index 0000000..d28199a --- /dev/null +++ b/share/doc/psd/21.ipc/0.t @@ -0,0 +1,93 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)0.t 8.1 (Berkeley) 6/8/93 +.\" +.EH 'PSD:21-%''Advanced 4.4BSD IPC Tutorial' +.OH 'Advanced 4.4BSD IPC Tutorial''PSD:21-%' +.ds lq `` +.ds rq '' +.de DT +.if t .ta .5i 1.25i 2.5i 3.75i +.\" 3.5i went to 3.8i +.if n .ta .7i 1.75i 3.8i +.. +.bd S B 3 +.TL +An Advanced 4.4BSD Interprocess Communication Tutorial +.AU +Samuel J. Leffler +.AU +Robert S. Fabry +.AU +William N. Joy +.AU +Phil Lapsley +.AI +Computer Systems Research Group +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, California 94720 +.sp 2 +.AU +Steve Miller +.AU +Chris Torek +.AI +Heterogeneous Systems Laboratory +Department of Computer Science +University of Maryland, College Park +College Park, Maryland 20742 +.de IR +\fI\\$1\fP\\$2 +.. +.de UX +UNIX\\$1 +.. +.AB +.PP +.FS +* \s-2UNIX\s0 is a trademark of UNIX System Laboratories, Inc. +in the US and some other countries. +.FE +This document provides an introduction to the interprocess +communication facilities included in the +4.4BSD release of the +.UX * +system. 
+.PP +It discusses the overall model for interprocess communication +and introduces the interprocess communication primitives +which have been added to the system. The majority of the +document considers the use of these primitives in developing +applications. The reader is expected to be familiar with +the C programming language as all examples are written in C. +.AE diff --git a/share/doc/psd/21.ipc/1.t b/share/doc/psd/21.ipc/1.t new file mode 100644 index 0000000..f4e48ff --- /dev/null +++ b/share/doc/psd/21.ipc/1.t @@ -0,0 +1,106 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.t 8.1 (Berkeley) 8/14/93 +.\" +.\".ds LH "4.4BSD IPC Primer +.\".ds RH Introduction +.\".ds RF "Leffler/Fabry/Joy +.\".ds LF "\*(DY +.\".ds CF " +.nr H1 1 +.LP +.bp +.LG +.B +.ce +1. INTRODUCTION +.sp 2 +.R +.NL +One of the most important additions to UNIX in 4.2BSD was interprocess +communication. +These facilities were the result of +more than two years of discussion and research. The facilities +provided in 4.2BSD incorporated many of the ideas from current +research, while trying to maintain the UNIX philosophy of +simplicity and conciseness. +The 4.3BSD release of Berkeley UNIX +improved upon some of the IPC facilities +while providing an upward-compatible interface. +4.4BSD adds support for ISO protocols and IP multicasting. +The BSD interprocess communication +facilities have become a de facto standard for UNIX. +.PP +UNIX has previously been very weak in the area of interprocess +communication. Prior to the 4BSD facilities, the only +standard mechanism which allowed two processes to communicate was +pipes (the mpx files which were part of Version 7 were +experimental). Unfortunately, pipes are very restrictive +in that +the two communicating processes must be related through a +common ancestor. +Further, the semantics of pipes makes them almost impossible +to maintain in a distributed environment. +.PP +Earlier attempts at extending the IPC facilities of UNIX have +met with mixed reaction.
The majority of the problems have +been related to the fact that these facilities have been tied to +the UNIX file system, either through naming or implementation. +Consequently, the IPC facilities provided in 4.2BSD were +designed as a totally independent subsystem. The BSD IPC +allows processes to rendezvous in many ways. +Processes may rendezvous through a UNIX file system-like +name space (a space where all names are path names) +as well as through a +network name space. In fact, new name spaces may +be added at a future time with only minor changes visible +to users. Further, the communication facilities +have been extended to include more than the simple byte stream +provided by a pipe. These extensions have resulted +in a completely new part of the system which users will need +time to familiarize themselves with. It is likely that as +more use is made of these facilities they will be refined; +only time will tell. +.PP +This document provides a high-level description +of the IPC facilities in 4.4BSD and their use. +It is designed to complement the manual pages for the IPC primitives +by examples of their use. +The remainder of this document is organized in four sections. +Section 2 introduces the IPC-related system calls and the basic model +of communication. Section 3 describes some of the supporting +library routines users may find useful in constructing distributed +applications. Section 4 is concerned with the client/server model +used in developing applications and includes examples of the +two major types of servers. Section 5 delves into advanced topics +which sophisticated users are likely to encounter when using +the IPC facilities. diff --git a/share/doc/psd/21.ipc/2.t b/share/doc/psd/21.ipc/2.t new file mode 100644 index 0000000..6f08454 --- /dev/null +++ b/share/doc/psd/21.ipc/2.t @@ -0,0 +1,714 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)2.t 8.1 (Berkeley) 8/14/93 +.\" +.\".ds RH "Basics +.bp +.nr H1 2 +.nr H2 0 +.\" The next line is a major hack to get around internal changes in the groff +.\" implementation of .NH. +.nr nh*hl 1 +.bp +.LG +.B +.ce +2. 
BASICS +.sp 2 +.R +.NL +.PP +The basic building block for communication is the \fIsocket\fP. +A socket is an endpoint of communication to which a name may +be \fIbound\fP. Each socket in use has a \fItype\fP +and one or more associated processes. Sockets exist within +\fIcommunication domains\fP. +A communication domain is an +abstraction introduced to bundle common properties of +processes communicating through sockets. +One such property is the scheme used to name sockets. For +example, in the UNIX communication domain sockets are +named with UNIX path names; e.g. a +socket may be named \*(lq/dev/foo\*(rq. Sockets normally +exchange data only with +sockets in the same domain (it may be possible to cross domain +boundaries, but only if some translation process is +performed). The +4.4BSD IPC facilities support four separate communication domains: +the UNIX domain, for on-system communication; +the Internet domain, which is used by +processes which communicate +using the Internet standard communication protocols; +the NS domain, which is used by processes which +communicate using the Xerox standard communication +protocols*; +.FS +* See \fIInternet Transport Protocols\fP, Xerox System Integration +Standard (XSIS)028112 for more information. This document is +almost a necessity for one trying to write NS applications. +.FE +and the ISO OSI protocols, which are not documented in this tutorial. +The underlying communication +facilities provided by these domains have a significant influence +on the internal system implementation as well as the interface to +socket facilities available to a user. An example of the +latter is that a socket \*(lqoperating\*(rq in the UNIX domain +sees a subset of the error conditions which are possible +when operating in the Internet (or NS) domain. +.NH 2 +Socket types +.PP +Sockets are +typed according to the communication properties visible to a +user. 
+Processes are presumed to communicate only between sockets of +the same type, although there is +nothing that prevents communication between sockets of different +types should the underlying communication +protocols support this. +.PP +Four types of sockets currently are available to a user. +A \fIstream\fP socket provides for the bidirectional, reliable, +sequenced, and unduplicated flow of data without record boundaries. +Aside from the bidirectionality of data flow, a pair of connected +stream sockets provides an interface nearly identical to that of pipes\(dg. +.FS +\(dg In the UNIX domain, in fact, the semantics are identical and, +as one might expect, pipes have been implemented internally +as simply a pair of connected stream sockets. +.FE +.PP +A \fIdatagram\fP socket supports bidirectional flow of data which +is not promised to be sequenced, reliable, or unduplicated. +That is, a process +receiving messages on a datagram socket may find messages duplicated, +and, possibly, +in an order different from the order in which they were sent. +An important characteristic of a datagram +socket is that record boundaries in data are preserved. Datagram +sockets closely model the facilities found in many contemporary +packet switched networks such as the Ethernet. +.PP +A \fIraw\fP socket provides users access to +the underlying communication +protocols which support socket abstractions. +These sockets are normally datagram oriented, though their +exact characteristics are dependent on the interface provided by +the protocol. Raw sockets are not intended for the general user; they +have been provided mainly for those interested in developing new +communication protocols, or for gaining access to some of the more +esoteric facilities of an existing protocol. The use of raw sockets +is considered in section 5. +.PP +A \fIsequenced packet\fP socket is similar to a stream socket, +with the exception that record boundaries are preserved.
This +interface is provided only as part of the NS socket abstraction, +and is very important in most serious NS applications. +Sequenced-packet sockets allow the user to manipulate the +SPP or IDP headers on a packet or a group of packets either +by writing a prototype header along with whatever data is +to be sent, or by specifying a default header to be used with +all outgoing data, and allow the user to receive the headers +on incoming packets. The use of these options is considered in +section 5. +.PP +Another potential socket type which has interesting properties is +the \fIreliably delivered +message\fP socket. +The reliably delivered message socket has +similar properties to a datagram socket, but with +reliable delivery. There is currently no support for this +type of socket, but a reliably delivered message protocol +similar to Xerox's Packet Exchange Protocol (PEX) may be +simulated at the user level. More information on this topic +can be found in section 5. +.NH 2 +Socket creation +.PP +To create a socket the \fIsocket\fP system call is used: +.DS +s = socket(domain, type, protocol); +.DE +This call requests that the system create a socket in the specified +\fIdomain\fP and of the specified \fItype\fP. A particular protocol may +also be requested. If the protocol is left unspecified (a value +of 0), the system will select an appropriate protocol from those +protocols which comprise the communication domain and which +may be used to support the requested socket type. The user is +returned a descriptor (a small integer number) which may be used +in later system calls which operate on sockets. The domain is specified as +one of the manifest constants defined in the file <\fIsys/socket.h\fP>. +For the UNIX domain the constant is AF_UNIX*; for the Internet +.FS +* The manifest constants are named AF_whatever as they indicate +the ``address format'' to use in interpreting names. +.FE +domain AF_INET; and for the NS domain, AF_NS.
+The socket types are also defined in this file +and one of SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, or SOCK_SEQPACKET +must be specified. +To create a stream socket in the Internet domain the following +call might be used: +.DS +s = socket(AF_INET, SOCK_STREAM, 0); +.DE +This call would result in a stream socket being created with the TCP +protocol providing the underlying communication support. To +create a datagram socket for on-machine use the call might +be: +.DS +s = socket(AF_UNIX, SOCK_DGRAM, 0); +.DE +.PP +The default protocol (used when the \fIprotocol\fP argument to the +\fIsocket\fP call is 0) should be correct for most every +situation. However, it is possible to specify a protocol +other than the default; this will be covered in +section 5. +.PP +There are several reasons a socket call may fail. Aside from +the rare occurrence of lack of memory (ENOBUFS), a socket +request may fail due to a request for an unknown protocol +(EPROTONOSUPPORT), or a request for a type of socket for +which there is no supporting protocol (EPROTOTYPE). +.NH 2 +Binding local names +.PP +A socket is created without a name. Until a name is bound +to a socket, processes have no way to reference it and, consequently, +no messages may be received on it. +Communicating processes are bound +by an \fIassociation\fP. In the Internet and NS domains, +an association +is composed of local and foreign +addresses, and local and foreign ports, +while in the UNIX domain, an association is composed of +local and foreign path names (the phrase ``foreign pathname'' +means a pathname created by a foreign process, not a pathname +on a foreign system). +In most domains, associations must be unique. +In the Internet domain there +may never be duplicate <protocol, local address, local port, foreign +address, foreign port> tuples. UNIX domain sockets need not always +be bound to a name, but when bound +there may never be duplicate <protocol, local pathname, foreign +pathname> tuples. 
+The pathnames may not refer to files +already existing on the system +in 4.3; the situation may change in future releases. +.PP +The \fIbind\fP system call allows a process to specify half of +an association, <local address, local port> +(or <local pathname>), while the \fIconnect\fP +and \fIaccept\fP primitives are used to complete a socket's association. +.PP +In the Internet domain, +binding names to sockets can be fairly complex. +Fortunately, it is usually not necessary to specifically bind an +address and port number to a socket, because the +\fIconnect\fP and \fIsend\fP calls will automatically +bind an appropriate address if they are used with an +unbound socket. The process of binding names to NS +sockets is similar in most ways to that of +binding names to Internet sockets. +.PP +The \fIbind\fP system call is used as follows: +.DS +bind(s, name, namelen); +.DE +The bound name is a variable length byte string which is interpreted +by the supporting protocol(s). Its interpretation may vary from +communication domain to communication domain (this is one of +the properties which comprise the \*(lqdomain\*(rq). +As mentioned, in the +Internet domain names contain an Internet address and port +number. NS domain names contain an NS address and +port number. In the UNIX domain, names contain a path name and +a family, which is always AF_UNIX. If one wanted to bind +the name \*(lq/tmp/foo\*(rq to a UNIX domain socket, the +following code would be used*: +.FS +* Note that, although the tendency here is to call the \*(lqaddr\*(rq +structure \*(lqsun\*(rq, doing so would cause problems if the code +were ever ported to a Sun workstation. +.FE +.DS +#include <sys/un.h> + ... +struct sockaddr_un addr; + ... 
+strcpy(addr.sun_path, "/tmp/foo"); +addr.sun_family = AF_UNIX; +bind(s, (struct sockaddr *) &addr, strlen(addr.sun_path) + + sizeof (addr.sun_len) + sizeof (addr.sun_family)); +.DE +Note that in determining the size of a UNIX domain address null +bytes are not counted, which is why \fIstrlen\fP is used. In +the current implementation of UNIX domain IPC, +the file name +referred to in \fIaddr.sun_path\fP is created as a socket +in the system file space. +The caller must, therefore, have +write permission in the directory where +\fIaddr.sun_path\fP is to reside, and this file should be deleted by the +caller when it is no longer needed. Future versions of 4BSD +may not create this file. +.PP +In binding an Internet address things become more +complicated. The actual call is similar, +.DS +#include <sys/types.h> +#include <netinet/in.h> + ... +struct sockaddr_in sin; + ... +bind(s, (struct sockaddr *) &sin, sizeof (sin)); +.DE +but the selection of what to place in the address \fIsin\fP +requires some discussion. We will come back to the problem +of formulating Internet addresses in section 3 when +the library routines used in name resolution are discussed. +.PP +Binding an NS address to a socket is even more +difficult, +especially since the Internet library routines do not +work with NS hostnames. The actual call is again similar: +.DS +#include <sys/types.h> +#include <netns/ns.h> + ... +struct sockaddr_ns sns; + ... +bind(s, (struct sockaddr *) &sns, sizeof (sns)); +.DE +Again, discussion of what to place in a \*(lqstruct sockaddr_ns\*(rq +will be deferred to section 3. +.NH 2 +Connection establishment +.PP +Connection establishment is usually asymmetric, +with one process a \*(lqclient\*(rq and the other a \*(lqserver\*(rq. +The server, when willing to offer its advertised services, +binds a socket to a well-known address associated with the service +and then passively \*(lqlistens\*(rq on its socket. 
+It is then possible for an unrelated process to rendezvous +with the server. +The client requests services from the server by initiating a +\*(lqconnection\*(rq to the server's socket. +On the client side the \fIconnect\fP call is +used to initiate a connection. Using the UNIX domain, this +might appear as, +.DS +struct sockaddr_un server; + ... +connect(s, (struct sockaddr *)&server, strlen(server.sun_path) + + sizeof (server.sun_len) + sizeof (server.sun_family)); +.DE +while in the Internet domain, +.DS +struct sockaddr_in server; + ... +connect(s, (struct sockaddr *)&server, sizeof (server)); +.DE +and in the NS domain, +.DS +struct sockaddr_ns server; + ... +connect(s, (struct sockaddr *)&server, sizeof (server)); +.DE +where \fIserver\fP in the example above would contain either the UNIX +pathname, Internet address and port number, or NS address and +port number of the server to which the +client process wishes to speak. +If the client process's socket is unbound at the time of +the connect call, +the system will automatically select and bind a name to +the socket if necessary; cf. section 5.4. +This is the usual way that local addresses are bound +to a socket. +.PP +An error is returned if the connection was unsuccessful +(any name automatically bound by the system, however, remains). +Otherwise, the socket is associated with the server and +data transfer may begin. Some of the more common errors returned +when a connection attempt fails are: +.IP ETIMEDOUT +.br +After failing to establish a connection for a period of time, +the system decided there was no point in retrying the +connection attempt any more. This usually occurs because +the destination host is down, or because problems in +the network resulted in transmissions being lost. +.IP ECONNREFUSED +.br +The host refused service for some reason. +This is usually +due to a server process +not being present at the requested name.
+.IP "ENETDOWN or EHOSTDOWN" +.br +These operational errors are +returned based on status information delivered to +the client host by the underlying communication services. +.IP "ENETUNREACH or EHOSTUNREACH" +.br +These operational errors can occur either because the network +or host is unknown (no route to the network or host is present), +or because of status information returned by intermediate +gateways or switching nodes. Many times the status returned +is not sufficient to distinguish a network being down from a +host being down, in which case the system +indicates the entire network is unreachable. +.PP +For the server to receive a client's connection it must perform +two steps after binding its socket. +The first is to indicate a willingness to listen for +incoming connection requests: +.DS +listen(s, 5); +.DE +The second parameter to the \fIlisten\fP call specifies the maximum +number of outstanding connections which may be queued awaiting +acceptance by the server process; this number +may be limited by the system. Should a connection be +requested while the queue is full, the connection will not be +refused, but rather the individual messages which comprise the +request will be ignored. This gives a harried server time to +make room in its pending connection queue while the client +retries the connection request. Had the connection been returned +with the ECONNREFUSED error, the client would be unable to tell +if the server was up or not. As it is now it is still possible +to get the ETIMEDOUT error back, though this is unlikely. The +backlog figure supplied with the listen call is currently limited +by the system to a maximum of 5 pending connections on any +one queue. This avoids the problem of processes hogging system +resources by setting an infinite backlog, then ignoring +all connection requests. +.PP +With a socket marked as listening, a server may \fIaccept\fP +a connection: +.DS +struct sockaddr_in from; + ... 
+fromlen = sizeof (from); +newsock = accept(s, (struct sockaddr *)&from, &fromlen); +.DE +(For the UNIX domain, \fIfrom\fP would be declared as a +\fIstruct sockaddr_un\fP, and for the NS domain, \fIfrom\fP +would be declared as a \fIstruct sockaddr_ns\fP, +but nothing different would need +to be done as far as \fIfromlen\fP is concerned. In the examples +which follow, only Internet routines will be discussed.) A new +descriptor is returned on receipt of a connection (along with +a new socket). If the server wishes to find out who its client is, +it may supply a buffer for the client socket's name. The value-result +parameter \fIfromlen\fP is initialized by the server to indicate how +much space is associated with \fIfrom\fP, then modified on return +to reflect the true size of the name. If the client's name is not +of interest, the second parameter may be a null pointer. +.PP +\fIAccept\fP normally blocks. That is, \fIaccept\fP +will not return until a connection is available or the system call +is interrupted by a signal to the process. Further, there is no +way for a process to indicate it will accept connections from only +a specific individual, or individuals. It is up to the user process +to consider who the connection is from and close down the connection +if it does not wish to speak to the process. If the server process +wants to accept connections on more than one socket, or wants to avoid blocking +on the accept call, there are alternatives; they will be considered +in section 5. +.NH 2 +Data transfer +.PP +With a connection established, data may begin to flow. To send +and receive data there are a number of possible calls. +With the peer entity at each end of a connection +anchored, a user can send or receive a message without specifying +the peer. 
As one might expect, in this case +the normal \fIread\fP and \fIwrite\fP system calls are usable, +.DS +write(s, buf, sizeof (buf)); +read(s, buf, sizeof (buf)); +.DE +In addition to \fIread\fP and \fIwrite\fP, +the new calls \fIsend\fP and \fIrecv\fP +may be used: +.DS +send(s, buf, sizeof (buf), flags); +recv(s, buf, sizeof (buf), flags); +.DE +While \fIsend\fP and \fIrecv\fP are virtually identical to +\fIwrite\fP and \fIread\fP, +the extra \fIflags\fP argument is important. The flags, +defined in \fI<sys/socket.h>\fP, may be +specified as a non-zero value if one or more +of the following is required: +.DS +.TS +l l. +MSG_OOB send/receive out of band data +MSG_PEEK look at data without reading +MSG_DONTROUTE send data without routing packets +.TE +.DE +Out of band data is a notion specific to stream sockets, and one +which we will not immediately consider. The option to have data +sent without routing applied to the outgoing packets is currently +used only by the routing table management process, and is +unlikely to be of interest to the casual user. The ability +to preview data is, however, of interest. When MSG_PEEK +is specified with a \fIrecv\fP call, any data present is returned +to the user, but treated as still \*(lqunread\*(rq. That +is, the next \fIread\fP or \fIrecv\fP call applied to the socket will +return the data previously previewed. +.NH 2 +Discarding sockets +.PP +Once a socket is no longer of interest, it may be discarded +by applying a \fIclose\fP to the descriptor, +.DS +close(s); +.DE +If data is associated with a socket which promises reliable delivery +(e.g. a stream socket) when a close takes place, the system will +continue to attempt to transfer the data. +However, after a fairly long period of +time, if the data is still undelivered, it will be discarded. +Should a user have no use for any pending data, it may +perform a \fIshutdown\fP on the socket prior to closing it.
+This call is of the form: +.DS +shutdown(s, how); +.DE +where \fIhow\fP is 0 if the user is no longer interested in reading +data, 1 if no more data will be sent, or 2 if no data is to +be sent or received. +.NH 2 +Connectionless sockets +.PP +To this point we have been concerned mostly with sockets which +follow a connection oriented model. However, there is also +support for connectionless interactions typical of the datagram +facilities found in contemporary packet switched networks. +A datagram socket provides a symmetric interface to data +exchange. While processes are still likely to be client +and server, there is no requirement for connection establishment. +Instead, each message includes the destination address. +.PP +Datagram sockets are created as before. +If a particular local address is needed, +the \fIbind\fP operation must precede the first data transmission. +Otherwise, the system will set the local address and/or port +when data is first sent. +To send data, the \fIsendto\fP primitive is used, +.DS +sendto(s, buf, buflen, flags, (struct sockaddr *)&to, tolen); +.DE +The \fIs\fP, \fIbuf\fP, \fIbuflen\fP, and \fIflags\fP +parameters are used as before. +The \fIto\fP and \fItolen\fP +values are used to indicate the address of the intended recipient of the +message. When +using an unreliable datagram interface, it is +unlikely that any errors will be reported to the sender. When +information is present locally to recognize a message that can +not be delivered (for instance when a network is unreachable), +the call will return \-1 and the global value \fIerrno\fP will +contain an error number. 
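The \fIsendto\fP call described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the 4.4BSD sources: the helper names send_text and loopback_addr are hypothetical, and the example assumes an Internet domain datagram socket and the POSIX types ssize_t and socklen_t.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * loopback_addr is a hypothetical helper (not part of the socket API).
 * It fills *sin with the Internet loopback address and the given port,
 * which is supplied in host byte order.
 */
void
loopback_addr(struct sockaddr_in *sin, unsigned short port)
{
	memset(sin, 0, sizeof (*sin));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(port);
	sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/*
 * send_text is a hypothetical helper.  It sends the string msg as a
 * single datagram to the address in *to.  It returns the number of
 * bytes sent, or -1 with errno set when the failure can be detected
 * locally (for instance, when the network is unreachable).
 */
ssize_t
send_text(int s, const char *msg, const struct sockaddr_in *to)
{
	ssize_t n;

	n = sendto(s, msg, strlen(msg), 0,
	    (const struct sockaddr *)to, sizeof (*to));
	if (n < 0)
		fprintf(stderr, "sendto: %s\n", strerror(errno));
	return (n);
}
```

A successful return only means the datagram was handed to the local system; as noted above, it says nothing about whether the recipient received it.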
+.PP +To receive messages on an unconnected datagram socket, the +\fIrecvfrom\fP primitive is provided: +.DS +recvfrom(s, buf, buflen, flags, (struct sockaddr *)&from, &fromlen); +.DE +Once again, the \fIfromlen\fP parameter is handled in +a value-result fashion, initially containing the size of +the \fIfrom\fP buffer, and modified on return to indicate +the actual size of the address from which the datagram was received. +.PP +In addition to the two calls mentioned above, datagram +sockets may also use the \fIconnect\fP call to associate +a socket with a specific destination address. In this case, any +data sent on the socket will automatically be addressed +to the connected peer, and only data received from that +peer will be delivered to the user. Only one connected +address is permitted for each socket at one time; +a second connect will change the destination address, +and a connect to a null address (family AF_UNSPEC) +will disconnect. +Connect requests on datagram sockets return immediately, +as this simply results in the system recording +the peer's address (as compared to a stream socket, where a +connect request initiates establishment of an end to end +connection). \fIAccept\fP and \fIlisten\fP are not +used with datagram sockets. +.PP +While a datagram socket is connected, +errors from recent \fIsend\fP calls may be returned +asynchronously. +These errors may be reported on subsequent operations +on the socket, +or a special socket option used with \fIgetsockopt\fP, SO_ERROR, +may be used to interrogate the error status. +A \fIselect\fP for reading or writing will return true +when an error indication has been received. +The next operation will return the error, and the error status is cleared. +Other, less +important details of datagram sockets are described +in section 5.
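The value-result handling of \fIfromlen\fP and the SO_ERROR query described above might look like the following in C. This is a sketch only: recv_text and pending_error are hypothetical helper names, the socket is assumed to be an Internet domain datagram socket, and socklen_t is the POSIX type for the length argument (the original interface used a plain int).

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>

/*
 * recv_text is a hypothetical helper.  It receives one datagram into
 * buf, terminates it with a null byte, and stores the sender's address
 * in *from.  fromlen starts as the size of the address buffer and is
 * rewritten by recvfrom with the size of the address actually stored.
 * Returns the number of bytes received, or -1 on error.
 */
ssize_t
recv_text(int s, char *buf, size_t buflen, struct sockaddr_in *from)
{
	socklen_t fromlen = sizeof (*from);
	ssize_t n;

	if (buflen == 0)
		return (-1);
	n = recvfrom(s, buf, buflen - 1, 0,
	    (struct sockaddr *)from, &fromlen);
	if (n >= 0)
		buf[n] = '\0';
	return (n);
}

/*
 * pending_error is a hypothetical helper.  It fetches the error status
 * recorded on the socket with the SO_ERROR option; reading the option
 * also clears it.  Returns 0 when no error is pending.
 */
int
pending_error(int s)
{
	int err = 0;
	socklen_t len = sizeof (err);

	if (getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return (errno);
	return (err);
}
```

A sender that has used \fIconnect\fP on its datagram socket could call pending_error after a \fIsend\fP to check for an asynchronously reported failure without waiting for the next operation to return it.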
+.NH 2 +Input/Output multiplexing +.PP +One last facility often used in developing applications +is the ability to multiplex I/O requests among multiple +sockets and/or files. This is done using the \fIselect\fP +call: +.DS +#include <sys/time.h> +#include <sys/types.h> + ... + +fd_set readmask, writemask, exceptmask; +struct timeval timeout; + ... +select(nfds, &readmask, &writemask, &exceptmask, &timeout); +.DE +\fISelect\fP takes as arguments pointers to three sets, one for +the set of file descriptors from which the caller wishes to +be able to read data, one for those descriptors to which +data is to be written, and one for which exceptional conditions +are pending; out-of-band data is the only +exceptional condition currently implemented by the socket abstraction. +If the user is not interested +in certain conditions (i.e., read, write, or exceptions), +the corresponding argument to the \fIselect\fP should +be a null pointer. +.PP +Each set is actually a structure containing an array of +long integer bit masks; the size of the array is set +by the definition FD_SETSIZE. +The array is +long enough to hold one bit for each of FD_SETSIZE file descriptors. +.PP +The macros FD_SET(\fIfd, &mask\fP) and +FD_CLR(\fIfd, &mask\fP) +have been provided for adding and removing file descriptor +\fIfd\fP in the set \fImask\fP. The +set should be zeroed before use, and +the macro FD_ZERO(\fI&mask\fP) has been provided +to clear the set \fImask\fP. +The parameter \fInfds\fP in the \fIselect\fP call specifies the range +of file descriptors (i.e., one plus the value of the largest +descriptor) to be examined in a set. +.PP +A timeout value may be specified if the selection +is not to last more than a predetermined period of time. If +the fields in \fItimeout\fP are set to 0, the selection takes +the form of a +\fIpoll\fP, returning immediately. If the last parameter is +a null pointer, the selection will block indefinitely*. 
+.FS +* To be more specific, a return takes place only when a +descriptor is selectable, or when a signal is received by +the caller, interrupting the system call. +.FE +\fISelect\fP normally returns the number of file descriptors selected; +if the \fIselect\fP call returns due to the timeout expiring, then +the value 0 is returned. +If the \fIselect\fP terminates because of an error or interruption, +a \-1 is returned with the error number in \fIerrno\fP, +and with the file descriptor masks unchanged. +.PP +Assuming a successful return, the three sets will +indicate which +file descriptors are ready to be read from, written to, or +have exceptional conditions pending. +The status of a file descriptor in a select mask may be +tested with the \fIFD_ISSET(fd, &mask)\fP macro, which +returns a non-zero value if \fIfd\fP is a member of the set +\fImask\fP, and 0 if it is not. +.PP +To determine if there are connections waiting +on a socket to be used with an \fIaccept\fP call, +\fIselect\fP can be used, followed by +a \fIFD_ISSET(fd, &mask)\fP macro to check for read +readiness on the appropriate socket. If \fIFD_ISSET\fP +returns a non-zero value, indicating permission to read, then a +connection is pending on the socket. +.PP +As an example, to read data from two sockets, \fIs1\fP and +\fIs2\fP as it is available from each and with a one-second +timeout, the following code +might be used: +.DS +#include <sys/time.h> +#include <sys/types.h> + ... +fd_set read_template; +struct timeval wait; + ... 
+for (;;) { + wait.tv_sec = 1; /* one second */ + wait.tv_usec = 0; + + FD_ZERO(&read_template); + + FD_SET(s1, &read_template); + FD_SET(s2, &read_template); + + nb = select(FD_SETSIZE, &read_template, (fd_set *) 0, (fd_set *) 0, &wait); + if (nb <= 0) { + \fIAn error occurred during the \fPselect\fI, or + the \fPselect\fI timed out.\fP + } + + if (FD_ISSET(s1, &read_template)) { + \fISocket #1 is ready to be read from.\fP + } + + if (FD_ISSET(s2, &read_template)) { + \fISocket #2 is ready to be read from.\fP + } +} +.DE +.PP +In 4.2, the arguments to \fIselect\fP were pointers to integers +instead of pointers to \fIfd_set\fPs. This type of call +will still work as long as the number of file descriptors +being examined is less than the number of bits in an +integer; however, the methods illustrated above should +be used in all current programs. +.PP +\fISelect\fP provides a synchronous multiplexing scheme. +Asynchronous notification of output completion, input availability, +and exceptional conditions is possible through use of the +SIGIO and SIGURG signals described in section 5. diff --git a/share/doc/psd/21.ipc/3.t b/share/doc/psd/21.ipc/3.t new file mode 100644 index 0000000..6e7eb06 --- /dev/null +++ b/share/doc/psd/21.ipc/3.t @@ -0,0 +1,411 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)3.t 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.\" +.\".ds RH "Network Library Routines +.bp +.nr H1 3 +.nr H2 0 +.bp +.LG +.B +.ce +3. NETWORK LIBRARY ROUTINES +.sp 2 +.R +.NL +.PP +The discussion in section 2 indicated the possible need to +locate and construct network addresses when using the +interprocess communication facilities in a distributed +environment. To aid in this task a number of routines +have been added to the standard C run-time library. +In this section we will consider the new routines provided +to manipulate network addresses. While the 4.4BSD networking +facilities support the Internet protocols +and the Xerox NS protocols, +most of the routines presented +in this section do not apply to the NS domain. 
Unless otherwise +stated, it should be assumed that the routines presented in this +section do not apply to the NS domain. +.PP +Locating a service on a remote host requires many levels of +mapping before client and server may +communicate. A service is assigned a name which is intended +for human consumption; e.g. \*(lqthe \fIlogin server\fP on host +monet\*(rq. +This name, and the name of the peer host, must then be translated +into network \fIaddresses\fP which are not necessarily suitable +for human consumption. Finally, the address must then be used in locating +a physical \fIlocation\fP and \fIroute\fP to the service. The +specifics of these three mappings are likely to vary between +network architectures. For instance, it is desirable for a network +to not require hosts to +be named in such a way that their physical location is known by +the client host. Instead, underlying services in the network +may discover the actual location of the host at the time a client +host wishes to communicate. This ability to have hosts named in +a location independent manner may induce overhead in connection +establishment, as a discovery process must take place, +but allows a host to be physically mobile without requiring it to +notify its clientele of its current location. +.PP +Standard routines are provided for: mapping host names +to network addresses, network names to network numbers, +protocol names to protocol numbers, and service names +to port numbers and the appropriate protocol to +use in communicating with the server process. The +file <\fInetdb.h\fP> must be included when using any of these +routines. 
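As one instance of the mappings listed above, the service-name-to-port lookup can be sketched in a few lines. The helper name service_port is hypothetical; the sketch assumes only the standard getservbyname routine declared in <netdb.h>.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>

/*
 * Return the port for service/proto in host byte order, or -1 if the
 * service is unknown. service_port() is a hypothetical helper.
 */
int
service_port(const char *service, const char *proto)
{
	struct servent *sp;

	sp = getservbyname(service, proto);
	if (sp == NULL)
		return (-1);
	/* s_port holds a 16-bit port number in network byte order. */
	return (ntohs((unsigned short)sp->s_port));
}
```

On a system whose services database lists telnet, service_port("telnet", "tcp") would be expected to return 23; the result depends on the local database, so an unknown name simply yields \-1.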
+.NH 2 +Host names +.PP +An Internet host name to address mapping is represented by +the \fIhostent\fP structure: +.DS +.if t .ta 0.6i 1.1i 2.6i +struct hostent { + char *h_name; /* official name of host */ + char **h_aliases; /* alias list */ + int h_addrtype; /* host address type (e.g., AF_INET) */ + int h_length; /* length of address */ + char **h_addr_list; /* list of addresses, null terminated */ +}; + +#define h_addr h_addr_list[0] /* first address, network byte order */ +.DE +The routine \fIgethostbyname\fP(3N) takes an Internet host name +and returns a \fIhostent\fP structure, +while the routine \fIgethostbyaddr\fP(3N) +maps Internet host addresses into a \fIhostent\fP structure. +.PP +The official name of the host and its public aliases are +returned by these routines, +along with the address type (family) and a null terminated list of +variable length addresses. This list of addresses is +required because it is possible +for a host to have many addresses, all having the same name. +The \fIh_addr\fP definition is provided for backward compatibility, +and is defined to be the first address in the list of addresses +in the \fIhostent\fP structure. +.PP +The database for these calls is provided either by the +file \fI/etc/hosts\fP (\fIhosts\fP\|(5)), +or by use of a nameserver, \fInamed\fP\|(8). +Because of the differences in these databases and their access protocols, +the information returned may differ. +When using the host table version of \fIgethostbyname\fP, +only one address will be returned, but all listed aliases will be included. +The nameserver version may return alternate addresses, +but will not provide any aliases other than the one given as argument. +.PP +Unlike Internet names, NS names are always mapped into host +addresses by the use of a standard NS \fIClearinghouse service\fP, +a distributed name and authentication server. 
The algorithms +for mapping NS names to addresses via a Clearinghouse are +rather complicated, and the routines are not part of the +standard libraries. The user-contributed Courier (Xerox +remote procedure call protocol) compiler contains routines +to accomplish this mapping; see the documentation and +examples provided therein for more information. It is +expected that almost all software that has to communicate +using NS will need to use the facilities of +the Courier compiler. +.PP +An NS host address is represented by the following: +.DS +union ns_host { + u_char c_host[6]; + u_short s_host[3]; +}; + +union ns_net { + u_char c_net[4]; + u_short s_net[2]; +}; + +struct ns_addr { + union ns_net x_net; + union ns_host x_host; + u_short x_port; +}; +.DE +The following code fragment inserts a known NS address into +a \fIns_addr\fP: +.DS +#include <sys/types.h> +#include <sys/socket.h> +#include <netns/ns.h> + ... +u_long netnum; +struct sockaddr_ns dst; + ... +bzero((char *)&dst, sizeof(dst)); + +/* + * There is no convenient way to assign a long + * integer to a ``union ns_net'' at present; in + * the future, something will hopefully be provided, + * but this is the portable way to go for now. + * The network number below is the one for the NS net + * that the desired host (gyre) is on. + */ +netnum = htonl(2266); +dst.sns_addr.x_net = *(union ns_net *) &netnum; +dst.sns_family = AF_NS; + +/* + * host 2.7.1.0.2a.18 == "gyre:Computer Science:UofMaryland" + */ +dst.sns_addr.x_host.c_host[0] = 0x02; +dst.sns_addr.x_host.c_host[1] = 0x07; +dst.sns_addr.x_host.c_host[2] = 0x01; +dst.sns_addr.x_host.c_host[3] = 0x00; +dst.sns_addr.x_host.c_host[4] = 0x2a; +dst.sns_addr.x_host.c_host[5] = 0x18; +dst.sns_addr.x_port = htons(75); +.DE +.NH 2 +Network names +.PP +As for host names, routines for mapping network names to numbers, +and back, are provided. 
These routines return a \fInetent\fP +structure: +.DS +.DT +/* + * Assumption here is that a network number + * fits in 32 bits -- probably a poor one. + */ +struct netent { + char *n_name; /* official name of net */ + char **n_aliases; /* alias list */ + int n_addrtype; /* net address type */ + int n_net; /* network number, host byte order */ +}; +.DE +The routines \fIgetnetbyname\fP(3N), \fIgetnetbynumber\fP(3N), +and \fIgetnetent\fP(3N) are the network counterparts to the +host routines described above. The routines extract their +information from \fI/etc/networks\fP. +.PP +NS network numbers are determined either by asking your local +Xerox Network Administrator (and hardcoding the information +into your code), or by querying the Clearinghouse for addresses. +The internetwork router is the only process +that needs to manipulate network numbers on a regular basis; if +a process wishes to communicate with a machine, it should ask the +Clearinghouse for that machine's address (which will include +the net number). +.NH 2 +Protocol names +.PP +For protocols, which are defined in \fI/etc/protocols\fP, +the \fIprotoent\fP structure defines the +protocol-name mapping +used with the routines \fIgetprotobyname\fP(3N), +\fIgetprotobynumber\fP(3N), +and \fIgetprotoent\fP(3N): +.DS +.DT +struct protoent { + char *p_name; /* official protocol name */ + char **p_aliases; /* alias list */ + int p_proto; /* protocol number */ +}; +.DE +.PP +In the NS domain, protocols are indicated by the "client type" +field of an IDP header. No protocol database exists; see section +5 for more information. +.NH 2 +Service names +.PP +Information regarding services is a bit more complicated. A service +is expected to reside at a specific \*(lqport\*(rq and employ +a particular communication protocol. This view is consistent with +the Internet domain, but inconsistent with other network architectures. +Further, a service may reside on multiple ports. 
+If this occurs, the higher level library routines +will have to be bypassed or extended. +Services available are contained in the file \fI/etc/services\fP. +A service mapping is described by the \fIservent\fP structure, +.DS +.DT +struct servent { + char *s_name; /* official service name */ + char **s_aliases; /* alias list */ + int s_port; /* port number, network byte order */ + char *s_proto; /* protocol to use */ +}; +.DE +The routine \fIgetservbyname\fP(3N) maps service +names to a servent structure by specifying a service name and, +optionally, a qualifying protocol. Thus the call +.DS +sp = getservbyname("telnet", (char *) 0); +.DE +returns the service specification for a telnet server using +any protocol, while the call +.DS +sp = getservbyname("telnet", "tcp"); +.DE +returns only that telnet server which uses the TCP protocol. +The routines \fIgetservbyport\fP(3N) and \fIgetservent\fP(3N) are +also provided. The \fIgetservbyport\fP routine has an interface similar +to that provided by \fIgetservbyname\fP; an optional protocol name may +be specified to qualify lookups. +.PP +In the NS domain, services are handled by a central dispatcher +provided as part of the Courier remote procedure call facilities. +Again, the reader is referred to the Courier compiler documentation +and to the Xerox standard* +.FS +* \fICourier: The Remote Procedure Call Protocol\fP, XSIS 038112. +.FE +for further details. +.NH 2 +Miscellaneous +.PP +With the support routines described above, an Internet application program +should rarely have to deal directly +with addresses. This allows +services to be developed as much as possible in a network independent +fashion. It is clear, however, that purging all network dependencies +is very difficult. So long as the user is required to supply network +addresses when naming services and sockets there will always be some +network dependency in a program. 
For example, the normal +code included in client programs, such as the remote login program, +is of the form shown in Figure 1. +(This example will be considered in more detail in section 4.) +.PP +If we wanted to make the remote login program independent of the +Internet protocols and addressing scheme we would be forced to add +a layer of routines which masked the network dependent aspects from +the mainstream login code. For the current facilities available in +the system this does not appear to be worthwhile. +.PP +Aside from the address-related data base routines, there are several +other routines available in the run-time library which are of interest +to users. These are intended mostly to simplify manipulation of +names and addresses. Table 1 summarizes the routines +for manipulating variable length byte strings and handling byte +swapping of network addresses and values. +.KF +.DS B +.TS +box; +l | l +l | l. +Call Synopsis +_ +bcmp(s1, s2, n) compare byte-strings; 0 if same, not 0 otherwise +bcopy(s1, s2, n) copy n bytes from s1 to s2 +bzero(base, n) zero-fill n bytes starting at base +htonl(val) convert 32-bit quantity from host to network byte order +htons(val) convert 16-bit quantity from host to network byte order +ntohl(val) convert 32-bit quantity from network to host byte order +ntohs(val) convert 16-bit quantity from network to host byte order +.TE +.DE +.ce +Table 1. C run-time routines. +.KE +.PP +The byte swapping routines are provided because the operating +system expects addresses to be supplied in network order (aka ``big-endian'' order). On +``little-endian'' architectures, such as Intel x86 and VAX, +host byte ordering is different than +network byte ordering. Consequently, +programs are sometimes required to byte swap quantities. The +library routines which return network addresses provide them +in network order so that they may simply be copied into the structures +provided to the system. 
This implies users should encounter the +byte swapping problem only when \fIinterpreting\fP network addresses. +For example, if an Internet port is to be printed out the following +code would be required: +.DS +printf("port number %d\en", ntohs(sp->s_port)); +.DE +On machines where they are unneeded, these routines are defined as null +macros. +.DS +.if t .ta .5i 1.0i 1.5i 2.0i +.if n .ta .7i 1.4i 2.1i 2.8i +#include <sys/types.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <stdio.h> +#include <netdb.h> + ... +main(argc, argv) + int argc; + char *argv[]; +{ + struct sockaddr_in server; + struct servent *sp; + struct hostent *hp; + int s; + ... + sp = getservbyname("login", "tcp"); + if (sp == NULL) { + fprintf(stderr, "rlogin: login/tcp: unknown service\en"); + exit(1); + } + hp = gethostbyname(argv[1]); + if (hp == NULL) { + fprintf(stderr, "rlogin: %s: unknown host\en", argv[1]); + exit(2); + } + bzero((char *)&server, sizeof (server)); + bcopy(hp->h_addr, (char *)&server.sin_addr, hp->h_length); + server.sin_family = hp->h_addrtype; + server.sin_port = sp->s_port; + s = socket(AF_INET, SOCK_STREAM, 0); + if (s < 0) { + perror("rlogin: socket"); + exit(3); + } + ... + /* Connect does the bind() for us */ + + if (connect(s, (struct sockaddr *)&server, sizeof (server)) < 0) { + perror("rlogin: connect"); + exit(5); + } + ... +} +.DE +.ce +Figure 1. Remote login client code. diff --git a/share/doc/psd/21.ipc/4.t b/share/doc/psd/21.ipc/4.t new file mode 100644 index 0000000..22e6836 --- /dev/null +++ b/share/doc/psd/21.ipc/4.t @@ -0,0 +1,515 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)4.t 8.1 (Berkeley) 6/8/93 +.\" $FreeBSD$ +.\" +.\".ds RH "Client/Server Model +.bp +.nr H1 4 +.nr H2 0 +.sp 8i +.bp +.LG +.B +.ce +4. CLIENT/SERVER MODEL +.sp 2 +.R +.NL +.PP +The most commonly used paradigm in constructing distributed applications +is the client/server model. In this scheme client applications request +services from a server process. This implies an asymmetry in establishing +communication between the client and server which has been examined +in section 2. 
In this section we will look more closely at the interactions +between client and server, and consider some of the problems in developing +client and server applications. +.PP +The client and server require a well known set of conventions before +service may be rendered (and accepted). This set of conventions +comprises a protocol which must be implemented at both ends of a +connection. Depending on the situation, the protocol may be symmetric +or asymmetric. In a symmetric protocol, either side may play the +master or slave roles. In an asymmetric protocol, one side is +immutably recognized as the master, with the other as the slave. +An example of a symmetric protocol is the TELNET protocol used in +the Internet for remote terminal emulation. An example +of an asymmetric protocol is the Internet file transfer protocol, +FTP. No matter whether the specific protocol used in obtaining +a service is symmetric or asymmetric, when accessing a service there +is a \*(lqclient process\*(rq and a \*(lqserver process\*(rq. We +will first consider the properties of server processes, then +client processes. +.PP +A server process normally listens at a well known address for +service requests. That is, the server process remains dormant +until a connection is requested by a client's connection +to the server's address. At such a time +the server process ``wakes up'' and services the client, +performing whatever appropriate actions the client requests of it. +.PP +Alternative schemes which use a service server +may be used to eliminate a flock of server processes clogging the +system while remaining dormant most of the time. For Internet +servers in 4.4BSD, +this scheme has been implemented via \fIinetd\fP, the so called +``internet super-server.'' \fIInetd\fP listens at a variety +of ports, determined at start-up by reading a configuration file. 
+When a connection is requested to a port on which \fIinetd\fP is +listening, \fIinetd\fP executes the appropriate server program to handle the +client. With this method, clients are unaware that an +intermediary such as \fIinetd\fP has played any part in the +connection. \fIInetd\fP will be described in more detail in +section 5. +.PP +A similar alternative scheme is used by most Xerox services. In general, +the Courier dispatch process (if used) accepts connections from +processes requesting services of some sort or another. The client +processes request a particular <program number, version number, procedure +number> triple. If the dispatcher knows of such a program, it is +started to handle the request; if not, an error is reported to the +client. In this way, only one port is required to service a large +variety of different requests. Again, the Courier facilities are +not available without the use and installation of the Courier +compiler. The information presented in this section applies only +to NS clients and services that do not use Courier. +.NH 2 +Servers +.PP +In 4.4BSD most servers are accessed at well known Internet addresses +or UNIX domain names. For +example, the remote login server's main loop is of the form shown +in Figure 2. +.KF +.if t .ta .5i 1.0i 1.5i 2.0i 2.5i 3.0i 3.5i +.if n .ta .7i 1.4i 2.1i 2.8i 3.5i 4.2i 4.9i +.sp 0.5i +.DS +main(argc, argv) + int argc; + char *argv[]; +{ + int f; + struct sockaddr_in from; + struct servent *sp; + + sp = getservbyname("login", "tcp"); + if (sp == NULL) { + fprintf(stderr, "rlogind: login/tcp: unknown service\en"); + exit(1); + } + ... +#ifndef DEBUG + /* Disassociate server from controlling terminal */ + ... +#endif + + sin.sin_port = sp->s_port; /* Restricted port -- see section 5 */ + ... + f = socket(AF_INET, SOCK_STREAM, 0); + ... + if (bind(f, (struct sockaddr *) &sin, sizeof (sin)) < 0) { + ... + } + ... 
+ listen(f, 5); + for (;;) { + int g, len = sizeof (from); + + g = accept(f, (struct sockaddr *) &from, &len); + if (g < 0) { + if (errno != EINTR) + syslog(LOG_ERR, "rlogind: accept: %m"); + continue; + } + if (fork() == 0) { + close(f); + doit(g, &from); + } + close(g); + } +} +.DE +.ce +Figure 2. Remote login server. +.sp 0.5i +.KE +.PP +The first step taken by the server is to look up its service +definition: +.sp 1 +.nf +.in +5 +.if t .ta .5i 1.0i 1.5i 2.0i +.if n .ta .7i 1.4i 2.1i 2.8i +sp = getservbyname("login", "tcp"); +if (sp == NULL) { + fprintf(stderr, "rlogind: login/tcp: unknown service\en"); + exit(1); +} +.sp 1 +.in -5 +.fi +The result of the \fIgetservbyname\fP call +is used in later portions of the code to +define the Internet port at which it listens for service +requests (indicated by a connection). +.KS +.PP +Step two is to disassociate the server from the controlling +terminal of its invoker: +.DS + for (i = 0; i < 3; ++i) + close(i); + + open("/", O_RDONLY); + dup2(0, 1); + dup2(0, 2); + + i = open("/dev/tty", O_RDWR); + if (i >= 0) { + ioctl(i, TIOCNOTTY, 0); + close(i); + } +.DE +.KE +This step is important as the server will +likely not want to receive signals delivered to the process +group of the controlling terminal. Note, however, that +once a server has disassociated itself it can no longer +send reports of errors to a terminal, and must log errors +via \fIsyslog\fP. +.PP +Once a server has established a pristine environment, it +creates a socket and begins accepting service requests. +The \fIbind\fP call is required to insure the server listens +at its expected location. It should be noted that the +remote login server listens at a restricted port number, and must +therefore be run +with a user-id of root. +This concept of a ``restricted port number'' is 4BSD +specific, and is covered in section 5. 
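The socket, bind, and listen steps that Figure 2 elides can be sketched as one helper with explicit error checks. The name make_listener is hypothetical. The wildcard local address and the backlog of 5 are carried over from the figure as assumptions, not as a description of the actual rlogind source.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a TCP socket bound to port_net on any local address and
 * start listening. port_net is in network byte order, as found in
 * servent.s_port; 0 lets the system choose a port.
 * Returns the descriptor, or -1 with errno set.
 * make_listener() is a hypothetical helper for illustration.
 */
int
make_listener(unsigned short port_net)
{
	struct sockaddr_in sin;
	int f, saved;

	f = socket(AF_INET, SOCK_STREAM, 0);
	if (f < 0)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = port_net;
	if (bind(f, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(f, 5) < 0) {
		saved = errno;		/* close() may overwrite errno */
		close(f);
		errno = saved;
		return (-1);
	}
	return (f);
}
```

A server would pass sp->s_port from its getservbyname result. When that port is restricted and the process is not running as root, the bind is expected to fail, which the helper reports as \-1.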
+.PP +The main body of the loop is fairly simple: +.DS +.if t .ta .5i 1.0i 1.5i 2.0i +.if n .ta .7i 1.4i 2.1i 2.8i +for (;;) { + int g, len = sizeof (from); + + g = accept(f, (struct sockaddr *)&from, &len); + if (g < 0) { + if (errno != EINTR) + syslog(LOG_ERR, "rlogind: accept: %m"); + continue; + } + if (fork() == 0) { /* Child */ + close(f); + doit(g, &from); + } + close(g); /* Parent */ +} +.DE +An \fIaccept\fP call blocks the server until +a client requests service. This call could return a +failure status if the call is interrupted by a signal +such as SIGCHLD (to be discussed in section 5). Therefore, +the return value from \fIaccept\fP is checked to insure +a connection has actually been established, and +an error report is logged via \fIsyslog\fP if an error +has occurred. +.PP +With a connection +in hand, the server then forks a child process and invokes +the main body of the remote login protocol processing. Note +how the socket used by the parent for queuing connection +requests is closed in the child, while the socket created as +a result of the \fIaccept\fP is closed in the parent. The +address of the client is also handed to the \fIdoit\fP routine +because it requires it in authenticating clients. +.NH 2 +Clients +.PP +The client side of the remote login service was shown +earlier in Figure 1. +One can see the separate, asymmetric roles of the client +and server clearly in the code. The server is a passive entity, +listening for client connections, while the client process is +an active entity, initiating a connection when invoked. +.PP +Let us consider more closely the steps taken +by the client remote login process. 
As in the server process, +the first step is to locate the service definition for a remote +login: +.DS +sp = getservbyname("login", "tcp"); +if (sp == NULL) { + fprintf(stderr, "rlogin: login/tcp: unknown service\en"); + exit(1); +} +.DE +Next the destination host is looked up with a +\fIgethostbyname\fP call: +.DS +hp = gethostbyname(argv[1]); +if (hp == NULL) { + fprintf(stderr, "rlogin: %s: unknown host\en", argv[1]); + exit(2); +} +.DE +With this accomplished, all that is required is to establish a +connection to the server at the requested host and start up the +remote login protocol. The address buffer is cleared, then filled +in with the Internet address of the foreign host and the port +number at which the login process resides on the foreign host: +.DS +bzero((char *)&server, sizeof (server)); +bcopy(hp->h_addr, (char *) &server.sin_addr, hp->h_length); +server.sin_family = hp->h_addrtype; +server.sin_port = sp->s_port; +.DE +A socket is created, and a connection initiated. Note +that \fIconnect\fP implicitly performs a \fIbind\fP +call, since \fIs\fP is unbound. +.DS +s = socket(hp->h_addrtype, SOCK_STREAM, 0); +if (s < 0) { + perror("rlogin: socket"); + exit(3); +} + ... +if (connect(s, (struct sockaddr *) &server, sizeof (server)) < 0) { + perror("rlogin: connect"); + exit(4); +} +.DE +The details of the remote login protocol will not be considered here. +.NH 2 +Connectionless servers +.PP +While connection-based services are the norm, some services +are based on the use of datagram sockets. One, in particular, +is the \*(lqrwho\*(rq service which provides users with status +information for hosts connected to a local area +network. This service, while predicated on the ability to +\fIbroadcast\fP information to all hosts connected to a particular +network, is of interest as an example usage of datagram sockets. +.PP +A user on any machine running the rwho server may find out +the current status of a machine with the \fIruptime\fP(1) program. 
+The output generated is illustrated in Figure 3. +.KF +.DS B +.TS +l r l l l l l. +arpa up 9:45, 5 users, load 1.15, 1.39, 1.31 +cad up 2+12:04, 8 users, load 4.67, 5.13, 4.59 +calder up 10:10, 0 users, load 0.27, 0.15, 0.14 +dali up 2+06:28, 9 users, load 1.04, 1.20, 1.65 +degas up 25+09:48, 0 users, load 1.49, 1.43, 1.41 +ear up 5+00:05, 0 users, load 1.51, 1.54, 1.56 +ernie down 0:24 +esvax down 17:04 +ingres down 0:26 +kim up 3+09:16, 8 users, load 2.03, 2.46, 3.11 +matisse up 3+06:18, 0 users, load 0.03, 0.03, 0.05 +medea up 3+09:39, 2 users, load 0.35, 0.37, 0.50 +merlin down 19+15:37 +miro up 1+07:20, 7 users, load 4.59, 3.28, 2.12 +monet up 1+00:43, 2 users, load 0.22, 0.09, 0.07 +oz down 16:09 +statvax up 2+15:57, 3 users, load 1.52, 1.81, 1.86 +ucbvax up 9:34, 2 users, load 6.08, 5.16, 3.28 +.TE +.DE +.ce +Figure 3. ruptime output. +.sp +.KE +.PP +Status information for each host is periodically broadcast +by rwho server processes on each machine. The same server +process also receives the status information and uses it +to update a database. This database is then interpreted +to generate the status information for each host. Servers +operate autonomously, coupled only by the local network and +its broadcast capabilities. +.PP +Note that the use of broadcast for such a task is fairly inefficient, +as all hosts must process each message, whether or not using an rwho server. +Unless such a service is sufficiently universal and is frequently used, +the expense of periodic broadcasts outweighs the simplicity. +.PP +Multicasting is an alternative to broadcasting. +Setting up multicast sockets is described in Section 5.10. +.PP +The rwho server, in a simplified form, is pictured in Figure +4. There are two separate tasks performed by the server. The +first task is to act as a receiver of status information broadcast +by other hosts on the network. This job is carried out in the +main loop of the program. 
Packets received at the rwho port +are interrogated to insure they've been sent by another rwho +server process, then are time stamped with their arrival time +and used to update a file indicating the status of the host. +When a host has not been heard from for an extended period of +time, the database interpretation routines assume the host is +down and indicate such on the status reports. This algorithm +is prone to error as a server may be down while a host is actually +up, but serves our current needs. +.KF +.DS +.if t .ta .5i 1.0i 1.5i 2.0i +.if n .ta .7i 1.4i 2.1i 2.8i +main() +{ + ... + sp = getservbyname("who", "udp"); + net = getnetbyname("localnet"); + sin.sin_addr = inet_makeaddr(INADDR_ANY, net); + sin.sin_port = sp->s_port; + ... + s = socket(AF_INET, SOCK_DGRAM, 0); + ... + on = 1; + if (setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on)) < 0) { + syslog(LOG_ERR, "setsockopt SO_BROADCAST: %m"); + exit(1); + } + bind(s, (struct sockaddr *) &sin, sizeof (sin)); + ... + signal(SIGALRM, onalrm); + onalrm(); + for (;;) { + struct whod wd; + int cc, whod, len = sizeof (from); + + cc = recvfrom(s, (char *)&wd, sizeof (struct whod), 0, + (struct sockaddr *)&from, &len); + if (cc <= 0) { + if (cc < 0 && errno != EINTR) + syslog(LOG_ERR, "rwhod: recv: %m"); + continue; + } + if (from.sin_port != sp->s_port) { + syslog(LOG_ERR, "rwhod: %d: bad from port", + ntohs(from.sin_port)); + continue; + } + ... + if (!verify(wd.wd_hostname)) { + syslog(LOG_ERR, "rwhod: malformed host name from %x", + ntohl(from.sin_addr.s_addr)); + continue; + } + (void) sprintf(path, "%s/whod.%s", RWHODIR, wd.wd_hostname); + whod = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666); + ... + (void) time(&wd.wd_recvtime); + (void) write(whod, (char *)&wd, cc); + (void) close(whod); + } +} +.DE +.ce +Figure 4. rwho server. +.sp +.KE +.PP +The second task performed by the server is to supply information +regarding the status of its host. 
This involves periodically +acquiring system status information, packaging it up in a message +and broadcasting it on the local network for other rwho servers +to hear. The supply function is triggered by a timer and +runs off a signal. Locating the system status +information is somewhat involved, but uninteresting. Deciding +where to transmit the resultant packet +is somewhat problematical, however. +.PP +Status information must be broadcast on the local network. +For networks which do not support the notion of broadcast another +scheme must be used to simulate or +replace broadcasting. One possibility is to enumerate the +known neighbors (based on the status messages received +from other rwho servers). This, unfortunately, +requires some bootstrapping information, +for a server will have no idea what machines are its +neighbors until it receives status messages from them. +Therefore, if all machines on a net are freshly booted, +no machine will have any +known neighbors and thus never receive, or send, any status information. +This is the identical problem faced by the routing table management +process in propagating routing status information. The standard +solution, unsatisfactory as it may be, is to inform one or more servers +of known neighbors and request that they always communicate with +these neighbors. If each server has at least one neighbor supplied +to it, status information may then propagate through +a neighbor to hosts which +are not (possibly) directly neighbors. If the server is able to +support networks which provide a broadcast capability, as well as +those which do not, then networks with an +arbitrary topology may share status information*. +.FS +* One must, however, be concerned about \*(lqloops\*(rq. +That is, if a host is connected to multiple networks, it +will receive status information from itself. This can lead +to an endless, wasteful, exchange of information. 
+.FE +.PP +It is important that software operating in a distributed +environment not have any site-dependent information compiled into it. +This would require a separate copy of the server at each host and +make maintenance a severe headache. 4.4BSD attempts to isolate +host-specific information from applications by providing system +calls which return the necessary information*. +.FS +* An example of such a system call is the \fIgethostname\fP(2) +call which returns the host's \*(lqofficial\*(rq name. +.FE +A mechanism exists, in the form of an \fIioctl\fP call, +for finding the collection +of networks to which a host is directly connected. +Further, a local network broadcasting mechanism +has been implemented at the socket level. +Combining these two features allows a process +to broadcast on any directly connected local +network which supports the notion of broadcasting +in a site independent manner. This allows 4.4BSD +to solve the problem of deciding how to propagate +status information in the case of \fIrwho\fP, or +more generally in broadcasting: +Such status information is broadcast to connected +networks at the socket level, where the connected networks +have been obtained via the appropriate \fIioctl\fP +calls. +The specifics of +such broadcastings are complex, however, and will +be covered in section 5. diff --git a/share/doc/psd/21.ipc/5.t b/share/doc/psd/21.ipc/5.t new file mode 100644 index 0000000..8ce44b2 --- /dev/null +++ b/share/doc/psd/21.ipc/5.t @@ -0,0 +1,1668 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)5.t 8.1 (Berkeley) 8/14/93 +.\" $FreeBSD$ +.\" +.\".ds RH "Advanced Topics +.bp +.nr H1 5 +.nr H2 0 +.LG +.B +.ce +5. ADVANCED TOPICS +.sp 2 +.R +.NL +.PP +A number of facilities have yet to be discussed. For most users +of the IPC the mechanisms already +described will suffice in constructing distributed +applications. However, others will find the need to utilize some +of the features which we consider in this section. 
+.NH 2 +Out of band data +.PP +The stream socket abstraction includes the notion of \*(lqout +of band\*(rq data. Out of band data is a logically independent +transmission channel associated with each pair of connected +stream sockets. Out of band data is delivered to the user +independently of normal data. +The abstraction defines that the out of band data facilities +must support the reliable delivery of at least one +out of band message at a time. This message may contain at least one +byte of data, and at least one message may be pending delivery +to the user at any one time. For communications protocols which +support only in-band signaling (i.e. the urgent data is +delivered in sequence with the normal data), the system normally extracts +the data from the normal data stream and stores it separately. +This allows users to choose between receiving the urgent data +in order and receiving it out of sequence without having to +buffer all the intervening data. It is possible +to ``peek'' (via MSG_PEEK) at out of band data. +If the socket has a process group, a SIGURG signal is generated +when the protocol is notified of its existence. +A process can set the process group +or process id to be informed by the SIGURG signal via the +appropriate \fIfcntl\fP call, as described below for +SIGIO. +If multiple sockets may have out of band data awaiting +delivery, a \fIselect\fP call for exceptional conditions +may be used to determine those sockets with such data pending. +Neither the signal nor the select indicate the actual arrival +of the out-of-band data, but only notification that it is pending. +.PP +In addition to the information passed, a logical mark is placed in +the data stream to indicate the point at which the out +of band data was sent. The remote login and remote shell +applications use this facility to propagate signals between +client and server processes. 
When a signal +flushes any pending output from the remote process(es), all +data up to the mark in the data stream is discarded. +.PP +To send an out of band message the MSG_OOB flag is supplied to +a \fIsend\fP or \fIsendto\fP call, +while to receive out of band data MSG_OOB should be indicated +when performing a \fIrecvfrom\fP or \fIrecv\fP call. +To find out if the read pointer is currently pointing at +the mark in the data stream, the SIOCATMARK ioctl is provided: +.DS +ioctl(s, SIOCATMARK, &yes); +.DE +If \fIyes\fP is a 1 on return, the next read will return data +after the mark. Otherwise (assuming out of band data has arrived), +the next read will provide data sent by the client prior +to transmission of the out of band signal. The routine used +in the remote login process to flush output on receipt of an +interrupt or quit signal is shown in Figure 5. +It reads the normal data up to the mark (to discard it), +then reads the out-of-band byte. +.KF +.DS +#include <sys/ioctl.h> +#include <sys/file.h> + ... +oob() +{ + int out = FWRITE, mark; + char waste[BUFSIZ]; + + /* flush local terminal output */ + ioctl(1, TIOCFLUSH, (char *)&out); + for (;;) { + if (ioctl(rem, SIOCATMARK, &mark) < 0) { + perror("ioctl"); + break; + } + if (mark) + break; + (void) read(rem, waste, sizeof (waste)); + } + if (recv(rem, &mark, 1, MSG_OOB) < 0) { + perror("recv"); + ... + } + ... +} +.DE +.ce +Figure 5. Flushing terminal I/O on receipt of out of band data. +.sp +.KE +.PP +A process may also read or peek at the out-of-band data +without first reading up to the mark. +This is more difficult when the underlying protocol delivers +the urgent data in-band with the normal data, and only sends +notification of its presence ahead of time (e.g., the TCP protocol +used to implement streams in the Internet domain). +With such protocols, the out-of-band byte may not yet have arrived +when a \fIrecv\fP is done with the MSG_OOB flag. 
+In that case, the call will return an error of EWOULDBLOCK. +Worse, there may be enough in-band data in the input buffer +that normal flow control prevents the peer from sending the urgent data +until the buffer is cleared. +The process must then read enough of the queued data +that the urgent data may be delivered. +.PP +Certain programs that use multiple bytes of urgent data and must +handle multiple urgent signals (e.g., \fItelnet\fP\|(1C)) +need to retain the position of urgent data within the stream. +This treatment is available as a socket-level option, SO_OOBINLINE; +see \fIsetsockopt\fP\|(2) for usage. +With this option, the position of urgent data (the \*(lqmark\*(rq) +is retained, but the urgent data immediately follows the mark +within the normal data stream returned without the MSG_OOB flag. +Reception of multiple urgent indications causes the mark to move, +but no out-of-band data are lost. +.NH 2 +Non-Blocking Sockets +.PP +It is occasionally convenient to make use of sockets +which do not block; that is, I/O requests which +cannot complete immediately and +would therefore cause the process to be suspended awaiting completion are +not executed, and an error code is returned. +Once a socket has been created via +the \fIsocket\fP call, it may be marked as non-blocking +by \fIfcntl\fP as follows: +.DS +#include <fcntl.h> + ... +int s; + ... +s = socket(AF_INET, SOCK_STREAM, 0); + ... +if (fcntl(s, F_SETFL, FNDELAY) < 0) { + perror("fcntl F_SETFL, FNDELAY"); + exit(1); +} + ... +.DE +.PP +When performing non-blocking I/O on sockets, one must be +careful to check for the error EWOULDBLOCK (stored in the +global variable \fIerrno\fP), which occurs when +an operation would normally block, but the socket it +was performed on is marked as non-blocking. +In particular, \fIaccept\fP, \fIconnect\fP, \fIsend\fP, \fIrecv\fP, +\fIread\fP, and \fIwrite\fP can +all return EWOULDBLOCK, and processes should be prepared +to deal with such return codes. 
+If an operation such as a \fIsend\fP cannot be done in its entirety, +but partial writes are sensible (for example, when using a stream socket), +the data that can be sent immediately will be processed, +and the return value will indicate the amount actually sent. +.NH 2 +Interrupt driven socket I/O +.PP +The SIGIO signal allows a process to be notified +via a signal when a socket (or more generally, a file +descriptor) has data waiting to be read. Use of +the SIGIO facility requires three steps: First, +the process must set up a SIGIO signal handler +by use of the \fIsignal\fP or \fIsigvec\fP calls. Second, +it must set the process id or process group id which is to receive +notification of pending input to its own process id, +or the process group id of its process group (note that +the default process group of a socket is group zero). +This is accomplished by use of an \fIfcntl\fP call. +Third, it must enable asynchronous notification of pending I/O requests +with another \fIfcntl\fP call. Sample code to +allow a given process to receive information on +pending I/O requests as they occur for a socket \fIs\fP +is given in Figure 6. With the addition of a handler for SIGURG, +this code can also be used to prepare for receipt of SIGURG signals. +.KF +.DS +#include <fcntl.h> + ... +int io_handler(); + ... +signal(SIGIO, io_handler); + +/* Set the process receiving SIGIO/SIGURG signals to us */ + +if (fcntl(s, F_SETOWN, getpid()) < 0) { + perror("fcntl F_SETOWN"); + exit(1); +} + +/* Allow receipt of asynchronous I/O signals */ + +if (fcntl(s, F_SETFL, FASYNC) < 0) { + perror("fcntl F_SETFL, FASYNC"); + exit(1); +} +.DE +.ce +Figure 6. Use of asynchronous notification of I/O requests. +.sp +.KE +.NH 2 +Signals and process groups +.PP +Due to the existence of the SIGURG and SIGIO signals each socket has an +associated process number, just as is done for terminals. 
+This value is initialized to zero, +but may be redefined at a later time with the F_SETOWN +\fIfcntl\fP, such as was done in the code above for SIGIO. +To set the socket's process id for signals, positive arguments +should be given to the \fIfcntl\fP call. To set the socket's +process group for signals, negative arguments should be +passed to \fIfcntl\fP. Note that the process number indicates +either the associated process id or the associated process +group; it is impossible to specify both at the same time. +A similar \fIfcntl\fP, F_GETOWN, is available for determining the +current process number of a socket. +.PP +Another signal which is useful when constructing server processes +is SIGCHLD. This signal is delivered to a process when any +child processes have changed state. Normally servers use +the signal to \*(lqreap\*(rq child processes that have exited +without explicitly awaiting their termination +or periodically polling for exit status. +For example, the remote login server loop shown in Figure 2 +may be augmented as shown in Figure 7. +.KF +.DS +int reaper(); + ... +signal(SIGCHLD, reaper); +listen(f, 5); +for (;;) { + int g, len = sizeof (from); + + g = accept(f, (struct sockaddr *)&from, &len); + if (g < 0) { + if (errno != EINTR) + syslog(LOG_ERR, "rlogind: accept: %m"); + continue; + } + ... +} + ... +#include <sys/wait.h> +reaper() +{ + union wait status; + + while (wait3(&status, WNOHANG, 0) > 0) + ; +} +.DE +.sp +.ce +Figure 7. Use of the SIGCHLD signal. +.sp +.KE +.PP +If the parent server process fails to reap its children, +a large number of \*(lqzombie\*(rq processes may be created. +.NH 2 +Pseudo terminals +.PP +Many programs will not function properly without a terminal +for standard input and output. Since sockets do not provide +the semantics of terminals, +it is often necessary to have a process communicating over +the network do so through a \fIpseudo-terminal\fP. 
A pseudo- +terminal is actually a pair of devices, master and slave, +which allow a process to serve as an active agent in communication +between processes and users. Data written on the slave side +of a pseudo-terminal is supplied as input to a process reading +from the master side, while data written on the master side are +processed as terminal input for the slave. +In this way, the process manipulating +the master side of the pseudo-terminal has control over the +information read and written on the slave side +as if it were manipulating the keyboard and reading the screen +on a real terminal. +The purpose of this abstraction is to +preserve terminal semantics over a network connection\(em +that is, the slave side appears as a normal terminal to +any process reading from or writing to it. +.PP +For example, the remote +login server uses pseudo-terminals for remote login sessions. +A user logging in to a machine across the network is provided +a shell with a slave pseudo-terminal as standard input, output, +and error. The server process then handles the communication +between the programs invoked by the remote shell and the user's +local client process. +When a user sends a character that generates an interrupt +on the remote machine that flushes terminal output, +the pseudo-terminal generates a control message for the server process. +The server then sends an out of band message +to the client process to signal a flush of data at the real terminal +and on the intervening data buffered in the network. +.PP +Under 4.4BSD, the name of the slave side of a pseudo-terminal is of the form +\fI/dev/ttyxy\fP, where \fIx\fP is a single letter +starting at `p' and continuing to `t'. +\fIy\fP is a hexadecimal digit (i.e., a single +character in the range 0 through 9 or `a' through `f'). +The master side of a pseudo-terminal is \fI/dev/ptyxy\fP, +where \fIx\fP and \fIy\fP correspond to the +slave side of the pseudo-terminal. 
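The naming scheme just described can be sketched as a small helper that builds both device names from the letter and the unit number. This is an illustration only: the function name pty_names and the macro PTY_NAME_LEN are hypothetical and are not part of 4.4BSD, and the sketch assumes a character set in which the letters `p' through `t' are contiguous.

```c
#include <assert.h>
#include <stdio.h>

/* Room for "/dev/ptyXX" plus the terminating NUL (hypothetical macro). */
#define PTY_NAME_LEN	sizeof ("/dev/ptyXX")

/*
 * Hypothetical helper: fill in the master and slave device names for
 * the pseudo-terminal identified by letter x ('p' through 't') and
 * unit number 0 through 15.  Both buffers must hold PTY_NAME_LEN bytes.
 * Returns 0 on success, -1 if x or unit is out of range.
 */
int
pty_names(char x, int unit, char *master, char *slave)
{
	static const char hex[] = "0123456789abcdef";

	assert(master != NULL && slave != NULL);
	if (x < 'p' || x > 't' || unit < 0 || unit > 15)
		return (-1);
	/* The two names differ only in the letter after "/dev/". */
	snprintf(master, PTY_NAME_LEN, "/dev/pty%c%c", x, hex[unit]);
	snprintf(slave, PTY_NAME_LEN, "/dev/tty%c%c", x, hex[unit]);
	return (0);
}
```

Because the two names differ in a single character, code that already holds the master name can also derive the slave name by overwriting that one character in place.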
+.PP +In general, the method of obtaining a pair of master and +slave pseudo-terminals is to +find a pseudo-terminal which +is not currently in use. +The master half of a pseudo-terminal is a single-open device; +thus, each master may be opened in turn until an open succeeds. +The slave side of the pseudo-terminal is then opened, +and is set to the proper terminal modes if necessary. +The process then \fIfork\fPs; the child closes +the master side of the pseudo-terminal, and \fIexec\fPs the +appropriate program. Meanwhile, the parent closes the +slave side of the pseudo-terminal and begins reading and +writing from the master side. Sample code making use of +pseudo-terminals is given in Figure 8; this code assumes +that a connection on a socket \fIs\fP exists, connected +to a peer who wants a service of some kind, and that the +process has disassociated itself from any previous controlling terminal. +.KF +.DS +gotpty = 0; +for (c = 'p'; !gotpty && c <= 's'; c++) { + line = "/dev/ptyXX"; + line[sizeof("/dev/pty")-1] = c; + line[sizeof("/dev/ptyp")-1] = '0'; + if (stat(line, &statbuf) < 0) + break; + for (i = 0; i < 16; i++) { + line[sizeof("/dev/ptyp")-1] = "0123456789abcdef"[i]; + master = open(line, O_RDWR); + if (master > 0) { + gotpty = 1; + break; + } + } +} +if (!gotpty) { + syslog(LOG_ERR, "All network ports in use"); + exit(1); +} + +line[sizeof("/dev/")-1] = 't'; +slave = open(line, O_RDWR); /* \fIslave\fP is now slave side */ +if (slave < 0) { + syslog(LOG_ERR, "Cannot open slave pty %s", line); + exit(1); +} + +ioctl(slave, TIOCGETP, &b); /* Set slave tty modes */ +b.sg_flags = CRMOD|XTABS|ANYP; +ioctl(slave, TIOCSETP, &b); + +i = fork(); +if (i < 0) { + syslog(LOG_ERR, "fork: %m"); + exit(1); +} else if (i) { /* Parent */ + close(slave); + ... +} else { /* Child */ + (void) close(s); + (void) close(master); + dup2(slave, 0); + dup2(slave, 1); + dup2(slave, 2); + if (slave > 2) + (void) close(slave); + ... +} +.DE +.ce +Figure 8. 
Creation and use of a pseudo terminal +.sp +.KE +.NH 2 +Selecting specific protocols +.PP +If the third argument to the \fIsocket\fP call is 0, +\fIsocket\fP will select a default protocol to use with +the returned socket of the type requested. +The default protocol is usually correct, and alternate choices are not +usually available. +However, when using ``raw'' sockets to communicate directly with +lower-level protocols or hardware interfaces, +the protocol argument may be important for setting up demultiplexing. +For example, raw sockets in the Internet family may be used to implement +a new protocol above IP, and the socket will receive packets +only for the protocol specified. +To obtain a particular protocol one determines the protocol number +as defined within the communication domain. For the Internet +domain one may use one of the library routines +discussed in section 3, such as \fIgetprotobyname\fP: +.DS +#include <sys/types.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <netdb.h> + ... +pp = getprotobyname("newtcp"); +s = socket(AF_INET, SOCK_STREAM, pp->p_proto); +.DE +This would result in a socket \fIs\fP using a stream +based connection, but with protocol type of ``newtcp'' +instead of the default ``tcp.'' +.PP +In the NS domain, the available socket protocols are defined in +<\fInetns/ns.h\fP>. To create a raw socket for Xerox Error Protocol +messages, one might use: +.DS +#include <sys/types.h> +#include <sys/socket.h> +#include <netns/ns.h> + ... +s = socket(AF_NS, SOCK_RAW, NSPROTO_ERROR); +.DE +.NH 2 +Address binding +.PP +As was mentioned in section 2, +binding addresses to sockets in the Internet and NS domains can be +fairly complex. As a brief reminder, these associations +are composed of local and foreign +addresses, and local and foreign ports. Port numbers are +allocated out of separate spaces, one for each system and one +for each domain on that system. 
+Through the \fIbind\fP system call, a +process may specify half of an association, the +<local address, local port> part, while the +\fIconnect\fP +and \fIaccept\fP +primitives are used to complete a socket's association by +specifying the <foreign address, foreign port> part. +Since the association is created in two steps the association +uniqueness requirement indicated previously could be violated unless +care is taken. Further, it is unrealistic to expect user +programs to always know proper values to use for the local address +and local port since a host may reside on multiple networks and +the set of allocated port numbers is not directly accessible +to a user. +.PP +To simplify local address binding in the Internet domain the notion of a +\*(lqwildcard\*(rq address has been provided. When an address +is specified as INADDR_ANY (a manifest constant defined in +<netinet/in.h>), the system interprets the address as +\*(lqany valid address\*(rq. For example, to bind a specific +port number to a socket, but leave the local address unspecified, +the following code might be used: +.DS +#include <sys/types.h> +#include <netinet/in.h> + ... +struct sockaddr_in sin; + ... +s = socket(AF_INET, SOCK_STREAM, 0); +sin.sin_family = AF_INET; +sin.sin_addr.s_addr = htonl(INADDR_ANY); +sin.sin_port = htons(MYPORT); +bind(s, (struct sockaddr *) &sin, sizeof (sin)); +.DE +Sockets with wildcarded local addresses may receive messages +directed to the specified port number, and sent to any +of the possible addresses assigned to a host. For example, +if a host has addresses 128.32.0.4 and 10.0.0.78, and a socket is bound as +above, the process will be +able to accept connection requests which are addressed to +128.32.0.4 or 10.0.0.78. +If a server process wished to only allow hosts on a +given network connect to it, it would bind +the address of the host on the appropriate network. 
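As a sketch of the last point, a server that should accept connections on only one of its addresses binds that address instead of INADDR_ANY. The helper name bind_to_address is hypothetical, and the use of inet_pton to parse the dotted-quad string is an assumption about the target system (the routine postdates 4.4BSD).

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: create a stream socket bound to one specific
 * local address (a dotted-quad string) and port.  Returns the socket
 * descriptor, or -1 on failure.
 */
int
bind_to_address(const char *addr, unsigned short port)
{
	struct sockaddr_in sin;
	int s;

	assert(addr != NULL);
	memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	/* Reject anything that is not a valid IPv4 address string. */
	if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1)
		return (-1);
	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s < 0)
		return (-1);
	/* Only connections addressed to this address will be accepted. */
	if (bind(s, (struct sockaddr *) &sin, sizeof (sin)) < 0) {
		(void) close(s);
		return (-1);
	}
	return (s);
}
```

On the two-address host of the example above, bind_to_address("128.32.0.4", MYPORT) would leave connection requests addressed to 10.0.0.78 unanswered by this socket.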
+.PP +In a similar fashion, a local port may be left unspecified +(specified as zero), in which case the system will select an +appropriate port number for it. This shortcut will work +both in the Internet and NS domains. For example, to +bind a specific local address to a socket, but to leave the +local port number unspecified: +.DS +hp = gethostbyname(hostname); +if (hp == NULL) { + ... +} +bcopy(hp->h_addr, (char *) &sin.sin_addr, hp->h_length); +sin.sin_port = htons(0); +bind(s, (struct sockaddr *) &sin, sizeof (sin)); +.DE +The system selects the local port number based on two criteria. +The first is that on 4BSD systems, +Internet ports below IPPORT_RESERVED (1024) (for the Xerox domain, +0 through 3000) are reserved +for privileged users (i.e., the super user); +Internet ports above IPPORT_USERRESERVED (50000) are reserved +for non-privileged servers. The second is +that the port number is not currently bound to some other +socket. In order to find a free Internet port number in the privileged +range the \fIrresvport\fP library routine may be used as follows +to return a stream socket with a privileged port number: +.DS +int lport = IPPORT_RESERVED \- 1; +int s; +\&... +s = rresvport(&lport); +if (s < 0) { + if (errno == EAGAIN) + fprintf(stderr, "socket: all ports in use\en"); + else + perror("rresvport: socket"); + ... +} +.DE +The restriction on allocating ports was imposed to allow processes +executing in a \*(lqsecure\*(rq environment to perform authentication +based on the originating address and port number. 
For example, +the \fIrlogin\fP(1) command allows users to log in across a network +without being asked for a password, if two conditions hold: +First, the name of the system the user +is logging in from is in the file +\fI/etc/hosts.equiv\fP on the system he is logging +in to (or the system name and the user name are in +the user's \fI.rhosts\fP file in the user's home +directory), and second, that the user's rlogin +process is coming from a privileged port on the machine from which he is +logging. The port number and network address of the +machine from which the user is logging in can be determined either +by the \fIfrom\fP result of the \fIaccept\fP call, or +from the \fIgetpeername\fP call. +.PP +In certain cases the algorithm used by the system in selecting +port numbers is unsuitable for an application. This is because +associations are created in a two step process. For example, +the Internet file transfer protocol, FTP, specifies that data +connections must always originate from the same local port. However, +duplicate associations are avoided by connecting to different foreign +ports. In this situation the system would disallow binding the +same local address and port number to a socket if a previous data +connection's socket still existed. To override the default port +selection algorithm, an option call must be performed prior +to address binding: +.DS + ... +int on = 1; + ... +setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)); +bind(s, (struct sockaddr *) &sin, sizeof (sin)); +.DE +With the above call, local addresses may be bound which +are already in use. This does not violate the uniqueness +requirement as the system still checks at connect time to +be sure any other sockets with the same local address and +port do not have the same foreign address and port. +If the association already exists, the error EADDRINUSE is returned. 
+A related socket option, SO_REUSEPORT, which allows completely +duplicate bindings, is described in the IP multicasting section. +.NH 2 +Socket Options +.PP +It is possible to set and get a number of options on sockets +via the \fIsetsockopt\fP and \fIgetsockopt\fP system calls. +These options include such things as marking a socket for +broadcasting, not to route, to linger on close, etc. +In addition, there are protocol-specific options for IP and TCP, +as described in +.IR ip (4), +.IR tcp (4), +and in the section on multicasting below. +.PP +The general forms of the calls are: +.DS +setsockopt(s, level, optname, optval, optlen); +.DE +and +.DS +getsockopt(s, level, optname, optval, optlen); +.DE +.PP +The parameters to the calls are as follows: \fIs\fP +is the socket on which the option is to be applied. +\fILevel\fP specifies the protocol layer on which the +option is to be applied; in most cases this is +the ``socket level'', indicated by the symbolic constant +SOL_SOCKET, defined in \fI<sys/socket.h>.\fP +The actual option is specified in \fIoptname\fP, and is +a symbolic constant also defined in \fI<sys/socket.h>\fP. +\fIOptval\fP and \fIOptlen\fP point to the value of the +option (in most cases, whether the option is to be turned +on or off), and the length of the value of the option, +respectively. +For \fIgetsockopt\fP, \fIoptlen\fP is +a value-result parameter, initially set to the size of +the storage area pointed to by \fIoptval\fP, and modified +upon return to indicate the actual amount of storage used. +.PP +An example should help clarify things. It is sometimes +useful to determine the type (e.g., stream, datagram, etc.) +of an existing socket; programs +under \fIinetd\fP (described below) may need to perform this +task. 
This can be accomplished as follows via the +SO_TYPE socket option and the \fIgetsockopt\fP call: +.DS +#include <sys/types.h> +#include <sys/socket.h> + +int type, size; + +size = sizeof (int); + +if (getsockopt(s, SOL_SOCKET, SO_TYPE, (char *) &type, &size) < 0) { + ... +} +.DE +After the \fIgetsockopt\fP call, \fItype\fP will be set +to the value of the socket type, as defined in +\fI<sys/socket.h>\fP. If, for example, the socket were +a datagram socket, \fItype\fP would have the value +corresponding to SOCK_DGRAM. +.NH 2 +Broadcasting and determining network configuration +.PP +By using a datagram socket, it is possible to send broadcast +packets on many networks supported by the system. +The network itself must support broadcast; the system +provides no simulation of broadcast in software. +Broadcast messages can place a high load on a network since they force +every host on the network to service them. Consequently, +the ability to send broadcast packets has been limited +to sockets which are explicitly marked as allowing broadcasting. +Broadcast is typically used for one of two reasons: +it is desired to find a resource on a local network without prior +knowledge of its address, +or important functions such as routing require that information +be sent to all accessible neighbors. +.PP +Multicasting is an alternative to broadcasting. +Setting up IP multicast sockets is described in the next section. 
+.PP +To send a broadcast message, a datagram socket +should be created: +.DS +s = socket(AF_INET, SOCK_DGRAM, 0); +.DE +or +.DS +s = socket(AF_NS, SOCK_DGRAM, 0); +.DE +The socket is marked as allowing broadcasting, +.DS +int on = 1; + +setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof (on)); +.DE +and at least a port number should be bound to the socket: +.DS +sin.sin_family = AF_INET; +sin.sin_addr.s_addr = htonl(INADDR_ANY); +sin.sin_port = htons(MYPORT); +bind(s, (struct sockaddr *) &sin, sizeof (sin)); +.DE +or, for the NS domain, +.DS +sns.sns_family = AF_NS; +netnum = htonl(net); +sns.sns_addr.x_net = *(union ns_net *) &netnum; /* insert net number */ +sns.sns_addr.x_port = htons(MYPORT); +bind(s, (struct sockaddr *) &sns, sizeof (sns)); +.DE +The destination address of the message to be broadcast +depends on the network(s) on which the message is to be broadcast. +The Internet domain supports a shorthand notation for broadcast +on the local network, the address INADDR_BROADCAST (defined in +<\fInetinet/in.h\fP>). +Determining the list of addresses for all reachable neighbors +requires knowledge of the networks to which the host is connected. +Since this information should +be obtained in a host-independent fashion and may be impossible +to derive, 4.4BSD provides a method of +retrieving this information from the system data structures. +The SIOCGIFCONF \fIioctl\fP call returns the interface +configuration of a host in the form of a +single \fIifconf\fP structure; this structure contains +a ``data area'' which is made up of an array +of \fIifreq\fP structures, one for each network interface +to which the host is connected. 
+These structures are defined in +\fI<net/if.h>\fP as follows: +.DS +.if t .ta .5i 1.0i 1.5i 3.5i +.if n .ta .7i 1.4i 2.1i 3.4i +struct ifconf { + int ifc_len; /* size of associated buffer */ + union { + caddr_t ifcu_buf; + struct ifreq *ifcu_req; + } ifc_ifcu; +}; + +#define ifc_buf ifc_ifcu.ifcu_buf /* buffer address */ +#define ifc_req ifc_ifcu.ifcu_req /* array of structures returned */ + +#define IFNAMSIZ 16 + +struct ifreq { + char ifr_name[IFNAMSIZ]; /* if name, e.g. "en0" */ + union { + struct sockaddr ifru_addr; + struct sockaddr ifru_dstaddr; + struct sockaddr ifru_broadaddr; + short ifru_flags; + caddr_t ifru_data; + } ifr_ifru; +}; + +.if t .ta \w' #define'u +\w' ifr_broadaddr'u +\w' ifr_ifru.ifru_broadaddr'u +#define ifr_addr ifr_ifru.ifru_addr /* address */ +#define ifr_dstaddr ifr_ifru.ifru_dstaddr /* other end of p-to-p link */ +#define ifr_broadaddr ifr_ifru.ifru_broadaddr /* broadcast address */ +#define ifr_flags ifr_ifru.ifru_flags /* flags */ +#define ifr_data ifr_ifru.ifru_data /* for use by interface */ +.DE +The actual call which obtains the +interface configuration is +.DS +struct ifconf ifc; +char buf[BUFSIZ]; + +ifc.ifc_len = sizeof (buf); +ifc.ifc_buf = buf; +if (ioctl(s, SIOCGIFCONF, (char *) &ifc) < 0) { + ... +} +.DE +After this call \fIbuf\fP will contain one \fIifreq\fP structure for +each network to which the host is connected, and +\fIifc.ifc_len\fP will have been modified to reflect the number +of bytes used by the \fIifreq\fP structures. +.PP +For each structure +there exists a set of ``interface flags'' which tell +whether the network corresponding to that interface is +up or down, point to point or broadcast, etc. 
The +SIOCGIFFLAGS \fIioctl\fP retrieves these +flags for an interface specified by an \fIifreq\fP +structure as follows: +.DS +struct ifreq *ifr; + +ifr = ifc.ifc_req; + +for (n = ifc.ifc_len / sizeof (struct ifreq); --n >= 0; ifr++) { + /* + * We must be careful that we don't use an interface + * devoted to an address family other than those intended; + * if we were interested in NS interfaces, the + * AF_INET would be AF_NS. + */ + if (ifr->ifr_addr.sa_family != AF_INET) + continue; + if (ioctl(s, SIOCGIFFLAGS, (char *) ifr) < 0) { + ... + } + /* + * Skip boring cases. + */ + if ((ifr->ifr_flags & IFF_UP) == 0 || + (ifr->ifr_flags & IFF_LOOPBACK) || + (ifr->ifr_flags & (IFF_BROADCAST | IFF_POINTTOPOINT)) == 0) + continue; +.DE +.PP +Once the flags have been obtained, the broadcast address +must be obtained. In the case of broadcast networks this is +done via the SIOCGIFBRDADDR \fIioctl\fP, while for point-to-point networks +the address of the destination host is obtained with SIOCGIFDSTADDR. +.DS +struct sockaddr dst; + +if (ifr->ifr_flags & IFF_POINTTOPOINT) { + if (ioctl(s, SIOCGIFDSTADDR, (char *) ifr) < 0) { + ... + } + bcopy((char *) &ifr->ifr_dstaddr, (char *) &dst, sizeof (ifr->ifr_dstaddr)); +} else if (ifr->ifr_flags & IFF_BROADCAST) { + if (ioctl(s, SIOCGIFBRDADDR, (char *) ifr) < 0) { + ... + } + bcopy((char *) &ifr->ifr_broadaddr, (char *) &dst, sizeof (ifr->ifr_broadaddr)); +} +.DE +.PP +After the appropriate \fIioctl\fP's have obtained the broadcast +or destination address (now in \fIdst\fP), the \fIsendto\fP call may be +used: +.DS + sendto(s, buf, buflen, 0, (struct sockaddr *)&dst, sizeof (dst)); +} +.DE +In the above loop one \fIsendto\fP occurs for every +interface to which the host is connected that supports the notion of +broadcast or point-to-point addressing. 
+If a process only wished to send broadcast +messages on a given network, code similar to that outlined above +would be used, but the loop would need to find the +correct destination address. +.PP +Received broadcast messages contain the sender's address +and port, as datagram sockets are bound before +a message is allowed to go out. +.NH 2 +IP Multicasting +.PP +IP multicasting is the transmission of an IP datagram to a "host +group", a set of zero or more hosts identified by a single IP +destination address. A multicast datagram is delivered to all +members of its destination host group with the same "best-efforts" +reliability as regular unicast IP datagrams, i.e., the datagram is +not guaranteed to arrive intact at all members of the destination +group or in the same order relative to other datagrams. +.PP +The membership of a host group is dynamic; that is, hosts may join +and leave groups at any time. There is no restriction on the +location or number of members in a host group. A host may be a +member of more than one group at a time. A host need not be a member +of a group to send datagrams to it. +.PP +A host group may be permanent or transient. A permanent group has a +well-known, administratively assigned IP address. It is the address, +not the membership of the group, that is permanent; at any time a +permanent group may have any number of members, even zero. Those IP +multicast addresses that are not reserved for permanent groups are +available for dynamic assignment to transient groups which exist only +as long as they have members. +.PP +In general, a host cannot assume that datagrams sent to any host +group address will reach only the intended hosts, or that datagrams +received as a member of a transient host group are intended for the +recipient. Misdelivery must be detected at a level above IP, using +higher-level identifiers or authentication tokens. 
Information +transmitted to a host group address should be encrypted or governed +by administrative routing controls if the sender is concerned about +unwanted listeners. +.PP +IP multicasting is currently supported only on AF_INET sockets of type +SOCK_DGRAM and SOCK_RAW, and only on subnetworks for which the interface +driver has been modified to support multicasting. +.PP +The next subsections describe how to send and receive multicast datagrams. +.NH 3 +Sending IP Multicast Datagrams +.PP +To send a multicast datagram, specify an IP multicast address in the range +224.0.0.0 to 239.255.255.255 as the destination address +in a +.IR sendto (2) +call. +.PP +The definitions required for the multicast-related socket options are +found in \fI<netinet/in.h>\fP. +All IP addresses are passed in network byte-order. +.PP +By default, IP multicast datagrams are sent with a time-to-live (TTL) of 1, +which prevents them from being forwarded beyond a single subnetwork. A new +socket option allows the TTL for subsequent multicast datagrams to be set to +any value from 0 to 255, in order to control the scope of the multicasts: +.DS +u_char ttl; +setsockopt(sock, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)); +.DE +Multicast datagrams with a TTL of 0 will not be transmitted on any subnet, +but may be delivered locally if the sending host belongs to the destination +group and if multicast loopback has not been disabled on the sending socket +(see below). Multicast datagrams with TTL greater than one may be delivered +to more than one subnet if there are one or more multicast routers attached +to the first-hop subnet. To provide meaningful scope control, the multicast +routers support the notion of TTL "thresholds", which prevent datagrams with +less than a certain TTL from traversing certain subnets. The thresholds +enforce the following convention: +.TS +center; +l | l +l | n. 
+_ +Scope Initial TTL += +restricted to the same host 0 +restricted to the same subnet 1 +restricted to the same site 32 +restricted to the same region 64 +restricted to the same continent 128 +unrestricted 255 +_ +.TE +"Sites" and "regions" are not strictly defined, and sites may be further +subdivided into smaller administrative units, as a local matter. +.PP +An application may choose an initial TTL other than the ones listed above. +For example, an application might perform an "expanding-ring search" for a +network resource by sending a multicast query, first with a TTL of 0, and +then with larger and larger TTLs, until a reply is received, perhaps using +the TTL sequence 0, 1, 2, 4, 8, 16, 32. +.PP +The multicast router +.IR mrouted (8), +refuses to forward any +multicast datagram with a destination address between 224.0.0.0 and +224.0.0.255, inclusive, regardless of its TTL. This range of addresses is +reserved for the use of routing protocols and other low-level topology +discovery or maintenance protocols, such as gateway discovery and group +membership reporting. +.PP +The address 224.0.0.0 is +guaranteed not to be assigned to any group, and 224.0.0.1 is assigned +to the permanent group of all IP hosts (including gateways). This is +used to address all multicast hosts on the directly connected +network. There is no multicast address (or any other IP address) for +all hosts on the total Internet. The addresses of other well-known, +permanent groups are published in the "Assigned Numbers" RFC, +which is available from the InterNIC. +.PP +Each multicast transmission is sent from a single network interface, even if +the host has more than one multicast-capable interface. (If the host is +also serving as a multicast router, +a multicast may be \fIforwarded\fP to interfaces +other than originating interface, provided that the TTL is greater than 1.) +The default interface to be used for multicasting is the primary network +interface on the system. 
+A socket option +is available to override the default for subsequent transmissions from a +given socket: +.DS +struct in_addr addr; +setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF, &addr, sizeof(addr)); +.DE +where "addr" is the local IP address of the desired outgoing interface. +An address of INADDR_ANY may be used to revert to the default interface. +The local IP address of an interface can be obtained via the SIOCGIFCONF +ioctl. To determine if an interface supports multicasting, fetch the +interface flags via the SIOCGIFFLAGS ioctl and see if the IFF_MULTICAST +flag is set. (Normal applications should not need to use this option; it +is intended primarily for multicast routers and other system services +specifically concerned with internet topology.) +The SIOCGIFCONF and SIOCGIFFLAGS ioctls are described in the previous section. +.PP +If a multicast datagram is sent to a group to which the sending host itself +belongs (on the outgoing interface), a copy of the datagram is, by default, +looped back by the IP layer for local delivery. Another socket option gives +the sender explicit control over whether or not subsequent datagrams are +looped back: +.DS +u_char loop; +setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop)); +.DE +where \f2loop\f1 is set to 0 to disable loopback, +and set to 1 to enable loopback. +This option +improves performance for applications that may have no more than one +instance on a single host (such as a router demon), by eliminating +the overhead of receiving their own transmissions. It should generally not +be used by applications for which there may be more than one instance on a +single host (such as a conferencing program) or for which the sender does +not belong to the destination group (such as a time querying program). 
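The sending-side options described above (IP_MULTICAST_TTL, IP_MULTICAST_IF and IP_MULTICAST_LOOP) are usually set together before the first \fIsendto\fP. The sketch below is illustrative only and is not taken from the original text; the function names is_multicast, is_reserved_local_group and mcast_sender_setup are hypothetical. It also shows how the address ranges mentioned earlier (224.0.0.0 to 239.255.255.255 for multicast in general, 224.0.0.0 to 224.0.0.255 for groups that are never forwarded) might be tested.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Nonzero if a (network byte order) lies in 224.0.0.0 - 239.255.255.255. */
int
is_multicast(struct in_addr a)
{
	return ((ntohl(a.s_addr) & 0xf0000000UL) == 0xe0000000UL);
}

/* Nonzero if a lies in 224.0.0.0 - 224.0.0.255, the range that is never forwarded. */
int
is_reserved_local_group(struct in_addr a)
{
	return ((ntohl(a.s_addr) & 0xffffff00UL) == 0xe0000000UL);
}

/*
 * Hypothetical helper: apply the three sending-side options to sock.
 * ifaddr may be INADDR_ANY (network byte order) to keep the default
 * interface; loop is 0 to disable loopback and 1 to enable it.
 * Returns 0 on success, -1 if any setsockopt call fails.
 */
int
mcast_sender_setup(int sock, unsigned char ttl, struct in_addr ifaddr,
    unsigned char loop)
{
	if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof (ttl)) < 0)
		return (-1);
	if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF, &ifaddr, sizeof (ifaddr)) < 0)
		return (-1);
	if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof (loop)) < 0)
		return (-1);
	return (0);
}
```

Keeping the range checks separate from the socket calls lets a program reject a bad destination address before any socket has been created.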
+.PP +A multicast datagram sent with an initial TTL greater than 1 may be delivered +to the sending host on a different interface from that on which it was sent, +if the host belongs to the destination group on that other interface. The +loopback control option has no effect on such delivery. +.NH 3 +Receiving IP Multicast Datagrams +.PP +Before a host can receive IP multicast datagrams, it must become a member +of one or more IP multicast groups. A process can ask the host to join +a multicast group by using the following socket option: +.DS +struct ip_mreq mreq; +setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)); +.DE +where "mreq" is the following structure: +.DS +struct ip_mreq { + struct in_addr imr_multiaddr; /* \fImulticast group to join\fP */ + struct in_addr imr_interface; /* \fIinterface to join on\fP */ +}; +.DE +Every membership is associated with a single interface, and it is possible +to join the same group on more than one interface. "imr_interface" should +be INADDR_ANY to choose the default multicast interface, or one of the +host's local addresses to choose a particular (multicast-capable) interface. +Up to IP_MAX_MEMBERSHIPS (currently 20) memberships may be added on a +single socket. +.PP +To drop a membership, use: +.DS +struct ip_mreq mreq; +setsockopt(sock, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq)); +.DE +where "mreq" contains the same values as used to add the membership. The +memberships associated with a socket are also dropped when the socket is +closed or the process holding the socket is killed. However, more than +one socket may claim a membership in a particular group, and the host +will remain a member of that group until the last claim is dropped. +.PP +The memberships associated with a socket do not necessarily determine which +datagrams are received on that socket.
Incoming multicast packets are +accepted by the kernel IP layer if any socket has claimed a membership in the +destination group of the datagram; however, delivery of a multicast datagram +to a particular socket is based on the destination port (or protocol type, for +raw sockets), just as with unicast datagrams. +To receive multicast datagrams +sent to a particular port, it is necessary to bind to that local port, +leaving the local address unspecified (i.e., INADDR_ANY). +To receive multicast datagrams +sent to a particular group and port, bind to the local port, with +the local address set to the multicast group address. +Once bound to a multicast address, the socket cannot be used for sending data. +.PP +More than one process may bind to the same SOCK_DGRAM UDP port +or the same multicast group and port if the +.I bind +call is preceded by: +.DS +int on = 1; +setsockopt(sock, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)); +.DE +All processes sharing the port must enable this option. +Every incoming multicast or broadcast UDP datagram destined to +the shared port is delivered to all sockets bound to the port. +For backwards compatibility reasons, this does not apply to incoming +unicast datagrams. Unicast +datagrams are never delivered to more than one socket, regardless of +how many sockets are bound to the datagram's destination port. +.PP +A final multicast-related extension is independent of IP: two new ioctls, +SIOCADDMULTI and SIOCDELMULTI, are available to add or delete link-level +(e.g., Ethernet) multicast addresses accepted by a particular interface. +The address to be added or deleted is passed as a sockaddr structure of +family AF_UNSPEC, within the standard ifreq structure. +.PP +These ioctls are +for the use of protocols other than IP, and require superuser privileges. +A link-level multicast address added via SIOCADDMULTI is not automatically +deleted when the socket used to add it goes away; it must be explicitly +deleted. 
It is inadvisable to delete a link-level address that may be +in use by IP. +.NH 3 +Sample Multicast Program +.PP +The following program sends or receives multicast packets. +If invoked with one argument, it sends a packet containing the current +time to an arbitrarily-chosen multicast group and UDP port. +If invoked with no arguments, it receives and prints these packets. +Start it as a sender on just one host and as a receiver on all the other hosts. +.DS +#include <sys/types.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <arpa/inet.h> +#include <time.h> +#include <stdio.h> + +#define EXAMPLE_PORT 60123 +#define EXAMPLE_GROUP "224.0.0.250" + +main(argc) + int argc; +{ + struct sockaddr_in addr; + int addrlen, fd, cnt; + struct ip_mreq mreq; + char message[50]; + + fd = socket(AF_INET, SOCK_DGRAM, 0); + if (fd < 0) { + perror("socket"); + exit(1); + } + + bzero(&addr, sizeof(addr)); + addr.sin_family = AF_INET; + addr.sin_addr.s_addr = htonl(INADDR_ANY); + addr.sin_port = htons(EXAMPLE_PORT); + addrlen = sizeof(addr); + + if (argc > 1) { /* Send */ + addr.sin_addr.s_addr = inet_addr(EXAMPLE_GROUP); + while (1) { + time_t t = time(0); + sprintf(message, "time is %-24.24s", ctime(&t)); + cnt = sendto(fd, message, sizeof(message), 0, + (struct sockaddr *)&addr, addrlen); + if (cnt < 0) { + perror("sendto"); + exit(1); + } + sleep(5); + } + } else { /* Receive */ + if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { + perror("bind"); + exit(1); + } + + mreq.imr_multiaddr.s_addr = inet_addr(EXAMPLE_GROUP); + mreq.imr_interface.s_addr = htonl(INADDR_ANY); + if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, + &mreq, sizeof(mreq)) < 0) { + perror("setsockopt mreq"); + exit(1); + } + + while (1) { + cnt = recvfrom(fd, message, sizeof(message), 0, + (struct sockaddr *)&addr, &addrlen); + if (cnt <= 0) { + if (cnt == 0) { + break; + } + perror("recvfrom"); + exit(1); + } + printf("%s: message = \e"%s\e"\en", + inet_ntoa(addr.sin_addr), message); + } + 
} +} +.DE +.\"---------------------------------------------------------------------- +.NH 2 +NS Packet Sequences +.PP +The semantics of NS connections demand that +the user both be able to look inside the network header associated +with any incoming packet and be able to specify what should go +in certain fields of an outgoing packet. +Using different calls to \fIsetsockopt\fP, it is possible +to indicate whether prototype headers will be associated by +the user with each outgoing packet (SO_HEADERS_ON_OUTPUT), +to indicate whether the headers received by the system should be +delivered to the user (SO_HEADERS_ON_INPUT), or to indicate +default information that should be associated with all +outgoing packets on a given socket (SO_DEFAULT_HEADERS). +.PP +The contents of a SPP header (minus the IDP header) are: +.DS +.if t .ta \w" #define"u +\w" u_short"u +2.0i +struct sphdr { + u_char sp_cc; /* connection control */ +#define SP_SP 0x80 /* system packet */ +#define SP_SA 0x40 /* send acknowledgement */ +#define SP_OB 0x20 /* attention (out of band data) */ +#define SP_EM 0x10 /* end of message */ + u_char sp_dt; /* datastream type */ + u_short sp_sid; /* source connection identifier */ + u_short sp_did; /* destination connection identifier */ + u_short sp_seq; /* sequence number */ + u_short sp_ack; /* acknowledge number */ + u_short sp_alo; /* allocation number */ +}; +.DE +Here, the items of interest are the \fIdatastream type\fP and +the \fIconnection control\fP fields. The semantics of the +datastream type are defined by the application(s) in question; +the value of this field is, by default, zero, but it can be +used to indicate things such as Xerox's Bulk Data Transfer +Protocol (in which case it is set to one). The connection control +field is a mask of the flags defined just below it. 
The user may +set or clear the end-of-message bit to indicate +that a given message is the last of a given substream type, +or may set/clear the attention bit as an alternate way to +indicate that a packet should be sent out-of-band. +As an example, to associate prototype headers with outgoing +SPP packets, consider: +.DS +#include <sys/types.h> +#include <sys/socket.h> +#include <netns/ns.h> +#include <netns/sp.h> + ... +struct sockaddr_ns sns, to; +int s, on = 1; +struct databuf { + struct sphdr proto_spp; /* prototype header */ + char buf[534]; /* max. possible data by Xerox std. */ +} buf; + ... +s = socket(AF_NS, SOCK_SEQPACKET, 0); + ... +bind(s, (struct sockaddr *) &sns, sizeof (sns)); +setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_OUTPUT, &on, sizeof(on)); + ... +buf.proto_spp.sp_dt = 1; /* bulk data */ +buf.proto_spp.sp_cc = SP_EM; /* end-of-message */ +strcpy(buf.buf, "hello world\en"); +sendto(s, (char *) &buf, sizeof(struct sphdr) + strlen("hello world\en"), 0, + (struct sockaddr *) &to, sizeof(to)); + ... +.DE +Note that one must be careful when writing headers; if the prototype +header is not written with the data with which it is to be associated, +the kernel will treat the first few bytes of the data as the +header, with unpredictable results. +To turn off the above association, and to indicate that packet +headers received by the system should be passed up to the user, +one might use: +.DS +#include <sys/types.h> +#include <sys/socket.h> +#include <netns/ns.h> +#include <netns/sp.h> + ... +struct sockaddr sns; +int s, on = 1, off = 0; + ... +s = socket(AF_NS, SOCK_SEQPACKET, 0); + ... +bind(s, (struct sockaddr *) &sns, sizeof (sns)); +setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_OUTPUT, &off, sizeof(off)); +setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_INPUT, &on, sizeof(on)); + ... +.DE +.PP +Output is handled somewhat differently in the IDP world. 
+The header of an IDP-level packet looks like: +.DS +.if t .ta \w'struct 'u +\w" struct ns_addr"u +2.0i +struct idp { + u_short idp_sum; /* Checksum */ + u_short idp_len; /* Length, in bytes, including header */ + u_char idp_tc; /* Transport Control (i.e., hop count) */ + u_char idp_pt; /* Packet Type (i.e., level 2 protocol) */ + struct ns_addr idp_dna; /* Destination Network Address */ + struct ns_addr idp_sna; /* Source Network Address */ +}; +.DE +The primary field of interest in an IDP header is the \fIpacket type\fP +field. The standard values for this field are (as defined +in <\fInetns/ns.h\fP>): +.DS +.if t .ta \w" #define"u +\w" NSPROTO_ERROR"u +1.0i +#define NSPROTO_RI 1 /* Routing Information */ +#define NSPROTO_ECHO 2 /* Echo Protocol */ +#define NSPROTO_ERROR 3 /* Error Protocol */ +#define NSPROTO_PE 4 /* Packet Exchange */ +#define NSPROTO_SPP 5 /* Sequenced Packet */ +.DE +For SPP connections, the contents of this field are +automatically set to NSPROTO_SPP; for IDP packets, +this value defaults to zero, which means ``unknown''. +.PP +Setting the value of that field with SO_DEFAULT_HEADERS is +easy: +.DS +#include <sys/types.h> +#include <sys/socket.h> +#include <netns/ns.h> +#include <netns/idp.h> + ... +struct sockaddr sns; +struct idp proto_idp; /* prototype header */ +int s, on = 1; + ... +s = socket(AF_NS, SOCK_DGRAM, 0); + ... +bind(s, (struct sockaddr *) &sns, sizeof (sns)); +proto_idp.idp_pt = NSPROTO_PE; /* packet exchange */ +setsockopt(s, NSPROTO_IDP, SO_DEFAULT_HEADERS, (char *) &proto_idp, + sizeof(proto_idp)); + ... +.DE +.PP +Using SO_HEADERS_ON_OUTPUT is somewhat more difficult. When +SO_HEADERS_ON_OUTPUT is turned on for an IDP socket, the socket +becomes (for all intents and purposes) a raw socket. In this +case, all the fields of the prototype header (except the +length and checksum fields, which are computed by the kernel) +must be filled in correctly in order for the socket to send and +receive data in a sensible manner. 
To be more specific, the +source address must be set to that of the host sending the +data; the destination address must be set to that of the +host for whom the data is intended; the packet type must be +set to whatever value is desired; and the hopcount must be +set to some reasonable value (almost always zero). It should +also be noted that simply sending data using \fIwrite\fP +will not work unless a \fIconnect\fP or \fIsendto\fP call +is used, in spite of the fact that it is the destination +address in the prototype header that is used, not the one +given in either of those calls. For almost +all IDP applications, using SO_DEFAULT_HEADERS is easier and +more desirable than writing headers. +.NH 2 +Three-way Handshake +.PP +The semantics of SPP connections indicate that a three-way +handshake, involving changes in the datastream type, should \(em +but is not absolutely required to \(em take place before a SPP +connection is closed. Almost all SPP connections are +``well-behaved'' in this manner; when communicating with +any process, it is best to assume that the three-way handshake +is required unless it is known for certain that it is not +required. In a three-way close, the closing process +indicates that it wishes to close the connection by sending +a zero-length packet with end-of-message set and with +datastream type 254. The other side of the connection +indicates that it is OK to close by sending a zero-length +packet with end-of-message set and datastream type 255. Finally, +the closing process replies with a zero-length packet with +datastream type 255; at this point, the connection is considered +closed. The following code fragments are simplified examples +of how one might handle this three-way handshake at the user +level; in the future, support for this type of close will +probably be provided as part of the C library or as part of +the kernel.
The first code fragment below illustrates how a process +might handle the three-way handshake if it sees that the process it +is communicating with wants to close the connection: +.DS +#include <sys/types.h> +#include <sys/socket.h> +#include <netns/ns.h> +#include <netns/sp.h> + ... +#ifndef SPPSST_END +#define SPPSST_END 254 +#define SPPSST_ENDREPLY 255 +#endif +struct sphdr proto_sp; +int s; + ... +read(s, buf, BUFSIZE); +if (((struct sphdr *)buf)->sp_dt == SPPSST_END) { + /* + * SPPSST_END indicates that the other side wants to + * close. + */ + proto_sp.sp_dt = SPPSST_ENDREPLY; + proto_sp.sp_cc = SP_EM; + setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp, + sizeof(proto_sp)); + write(s, buf, 0); + /* + * Write a zero-length packet with datastream type = SPPSST_ENDREPLY + * to indicate that the close is OK with us. The packet that we + * don't see (because we don't look for it) is another packet + * from the other side of the connection, with SPPSST_ENDREPLY + * on it, too. Once that packet is sent, the connection is + * considered closed; note that we really ought to retransmit + * the close for some time if we do not get a reply. + */ + close(s); +} + ... +.DE +To indicate to another process that we would like to close the +connection, the following code would suffice: +.DS +#include <sys/types.h> +#include <sys/socket.h> +#include <netns/ns.h> +#include <netns/sp.h> + ... +#ifndef SPPSST_END +#define SPPSST_END 254 +#define SPPSST_ENDREPLY 255 +#endif +struct sphdr proto_sp; +int s; + ... 
+proto_sp.sp_dt = SPPSST_END; +proto_sp.sp_cc = SP_EM; +setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp, + sizeof(proto_sp)); +write(s, buf, 0); /* send the end request */ +proto_sp.sp_dt = SPPSST_ENDREPLY; +setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp, + sizeof(proto_sp)); +/* + * We assume (perhaps unwisely) + * that the other side will send the + * ENDREPLY, so we'll just send our final ENDREPLY + * as if we'd seen theirs already. + */ +write(s, buf, 0); +close(s); + ... +.DE +.NH 2 +Packet Exchange +.PP +The Xerox standard protocols include a protocol that is both +reliable and datagram-oriented. This protocol is known as +Packet Exchange (PEX or PE) and, like SPP, is layered on top +of IDP. PEX is important for a number of things: Courier +remote procedure calls may be expedited through the use +of PEX, and many Xerox servers are located by doing a PEX +``BroadcastForServers'' operation. Although there is no +implementation of PEX in the kernel, +it may be simulated at the user level with some clever coding +and the use of one peculiar \fIgetsockopt\fP. A PEX packet +looks like: +.DS +.if t .ta \w'struct 'u +\w" struct idp"u +2.0i +/* + * The packet-exchange header shown here is not defined + * as part of any of the system include files. + */ +struct pex { + struct idp p_idp; /* idp header */ + u_short ph_id[2]; /* unique transaction ID for pex */ + u_short ph_client; /* client type field for pex */ +}; +.DE +The \fIph_id\fP field is used to hold a ``unique id'' that +is used in duplicate suppression; the \fIph_client\fP +field indicates the PEX client type (similar to the packet +type field in the IDP header). PEX reliability stems from the +fact that it is an idempotent (``I send a packet to you, you +send a packet to me'') protocol. 
Processes on each side of +the connection may use the unique id to determine if they have +seen a given packet before (the unique id field differs on each +packet sent) so that duplicates may be detected, and to indicate +which message a given packet is in response to. If a packet with +a given unique id is sent and no response is received in a given +amount of time, the packet is retransmitted until it is decided +that no response will ever be received. To simulate PEX, one +must be able to generate unique ids -- something that is hard to +do at the user level with any real guarantee that the id is really +unique. Therefore, a means (via \fIgetsockopt\fP) has been provided +for getting unique ids from the kernel. The following code fragment +indicates how to get a unique id: +.DS +long uniqueid; +int s, idsize = sizeof(uniqueid); + ... +s = socket(AF_NS, SOCK_DGRAM, 0); + ... +/* get id from the kernel -- only on IDP sockets */ +getsockopt(s, NSPROTO_PE, SO_SEQNO, (char *)&uniqueid, &idsize); + ... +.DE +The retransmission and duplicate suppression code required to +simulate PEX fully is left as an exercise for the reader. +.NH 2 +Inetd +.PP +One of the daemons provided with 4.4BSD is \fIinetd\fP, the +so-called ``internet super-server.'' +Having one daemon listen for requests for many daemons +instead of having each daemon listen for its own requests +reduces the number of idle daemons and simplifies their implementation. +.I Inetd +handles +two types of services: standard and TCPMUX. +A standard service has a well-known port assigned to it and +is listed in +.I /etc/services +(see \f2services\f1(5)); +it may be a service that implements an official Internet standard or is a +BSD-specific service. +TCPMUX services are nonstandard and do not have a +well-known port assigned to them. +They are invoked from +.I inetd +when a program connects to the "tcpmux" well-known port and specifies +the service name. +This is useful for adding locally-developed servers. 
+.PP +\fIInetd\fP is invoked at boot +time, and determines from the file \fI/etc/inetd.conf\fP the +servers for which it is to listen. Once this information has been +read and a pristine environment created, \fIinetd\fP proceeds +to create one socket for each service it is to listen for, +binding the appropriate port number to each socket. +.PP +\fIInetd\fP then performs a \fIselect\fP on all these +sockets for read availability, waiting for somebody wishing +a connection to the service corresponding to +that socket. \fIInetd\fP then performs an \fIaccept\fP on +the socket in question, \fIfork\fPs, \fIdup\fPs the new +socket to file descriptors 0 and 1 (stdin and +stdout), closes other open file +descriptors, and \fIexec\fPs the appropriate server. +.PP +Servers making use of \fIinetd\fP are considerably simplified, +as \fIinetd\fP takes care of the majority of the IPC work +required in establishing a connection. The server invoked +by \fIinetd\fP expects the socket connected to its client +on file descriptors 0 and 1, and may immediately perform +any operations such as \fIread\fP, \fIwrite\fP, \fIsend\fP, +or \fIrecv\fP. Indeed, servers may use +buffered I/O as provided by the ``stdio'' conventions, as +long as they remember to use \fIfflush\fP when appropriate. +.PP +One call which may be of interest to individuals writing +servers under \fIinetd\fP is the \fIgetpeername\fP call, +which returns the address of the peer (process) connected +on the other end of the socket. For example, to log the +Internet address in ``dot notation'' (e.g., ``128.32.0.4'') +of a client connected to a server under +\fIinetd\fP, the following code might be used: +.DS +struct sockaddr_in name; +int namelen = sizeof (name); + ... +if (getpeername(0, (struct sockaddr *)&name, &namelen) < 0) { + syslog(LOG_ERR, "getpeername: %m"); + exit(1); +} else + syslog(LOG_INFO, "Connection from %s", inet_ntoa(name.sin_addr)); + ... 
+.DE +While the \fIgetpeername\fP call is especially useful when +writing programs to run with \fIinetd\fP, it can be used +under other circumstances. Be warned, however, that \fIgetpeername\fP will +fail on UNIX domain sockets. +.PP +Standard TCP +services are assigned unique well-known port numbers in the range of +0 to 1023 by the +Internet Assigned Numbers Authority (IANA@ISI.EDU). +The limited number of ports in this range are +assigned to official Internet protocols. +The TCPMUX service allows you to add +locally-developed protocols without needing an official TCP port assignment. +The TCPMUX protocol described in RFC-1078 is simple: +.QP +``A TCP client connects to a foreign host on TCP port 1. It sends the +service name followed by a carriage-return line-feed <CRLF>. +The service name is never case sensitive. +The server replies with a +single character indicating positive ("+") or negative ("\-") +acknowledgment, immediately followed by an optional message of +explanation, terminated with a <CRLF>. If the reply was positive, +the selected protocol begins; otherwise the connection is closed.'' +.LP +In 4.4BSD, the TCPMUX service is built into +.IR inetd , +that is, +.IR inetd +listens on TCP port 1 for requests for TCPMUX services listed +in \f2inetd.conf\f1. +.IR inetd (8) +describes the format of TCPMUX entries for \f2inetd.conf\f1. +.PP +The following is an example TCPMUX server and its \f2inetd.conf\f1 entry. +More sophisticated servers may want to do additional processing +before returning the positive or negative acknowledgement. +.DS +#include <sys/types.h> +#include <stdio.h> +#include <time.h> + +main() +{ + time_t t; + + printf("+Go\er\en"); + fflush(stdout); + time(&t); + printf("%ld = %s", (long) t, ctime(&t)); + fflush(stdout); +} +.DE +The \f2inetd.conf\f1 entry is: +.DS +tcpmux/current_time stream tcp nowait nobody /d/curtime curtime +.DE +Here's the portion of the client code that handles the TCPMUX handshake: +.DS +char line[BUFSIZ]; +char *lp; +FILE *fp; + ... 
+ +/* Use stdio for reading data from the server */ +fp = fdopen(sock, "r"); +if (fp == NULL) { + fprintf(stderr, "Can't create file pointer\en"); + exit(1); +} + +/* Send service request */ +sprintf(line, "%s\er\en", "current_time"); +if (write(sock, line, strlen(line)) < 0) { + perror("write"); + exit(1); +} + +/* Get ACK/NAK response from the server */ +if (fgets(line, sizeof(line), fp) == NULL) { + if (feof(fp)) { + die(); + } else { + fprintf(stderr, "Error reading response\en"); + exit(1); + } +} + +/* Delete <CR> */ +if ((lp = index(line, '\er')) != NULL) { + *lp = '\e0'; +} + +switch (line[0]) { + case '+': + printf("Got ACK: %s\en", &line[1]); + break; + case '-': + printf("Got NAK: %s\en", &line[1]); + exit(0); + default: + printf("Got unknown response: %s\en", line); + exit(1); +} + +/* Get rest of data from the server */ +while ((fgets(line, sizeof(line), fp)) != NULL) { + fputs(line, stdout); +} +.DE diff --git a/share/doc/psd/21.ipc/Makefile b/share/doc/psd/21.ipc/Makefile new file mode 100644 index 0000000..67c3d6c --- /dev/null +++ b/share/doc/psd/21.ipc/Makefile @@ -0,0 +1,9 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= psd/21.ipc +SRCS= 0.t 1.t 2.t 3.t 4.t 5.t +MACROS= -ms +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/psd/21.ipc/spell.ok b/share/doc/psd/21.ipc/spell.ok new file mode 100644 index 0000000..02b45d4 --- /dev/null +++ b/share/doc/psd/21.ipc/spell.ok @@ -0,0 +1,347 @@ +4.2bsd +AF +ANYP +BUFSIZ +BUFSIZE +BroadcastForServers +CF +CLR +CRMOD +Clearinghouse +DARPA +DESTPORT +DGRAM +DONTROUTE +Datagram +EADDRINUSE +EADDRNOTAVAIL +EAGAIN +ECONNREFUSED +EHOSTDOWN +EHOSTUNREACH +EINTR +ENDREPLY +ENETDOWN +ENETUNREACH +ENOBUFS +EPROTONOSUPPORT +EPROTOTYPE +ETIMEDOUT +EWOULDBLOCK +Ethernet +FASYNC +FCREATE +FD +FNDELAY +FTP +FTRUNCATE +FWRITE +FWRONLY +Fabry +GETOWN +Gethostybyname +IDP +IFF +IFNAMSIZ +INADDR +INET +INFO +IP +IPC +IPPORT +ISSET +Inetd +LF +LH +LOOPBACK +Lapsley +Leffler +MSG +MYADDRESS +MYPORT 
+NS +NSPROTO +OB +OOB +OOBINLINE +Optlen +Optval +PE +PEX +POINTTOPOINT +PS1:8 +RDONLY +RDWR +REUSEADDR +RF +RH +RWHODIR +SEQNO +SEQPACKET +SETFL +SETOWN +SETSIZE +SIGALRM +SIGCHLD +SIGIO +SIGURG +SIOCATMARK +SIOCGIFBRDADDR +SIOCGIFCONF +SIOCGIFDSTADDR +SIOCGIFFLAGS +SIOCGPGRP +SIOCSPGRP +SOF +SP +SPP +SPPSST +Science:UofMaryland +TCP +TELNET +TIOCFLUSH +TIOCGETP +TIOCNOTTY +TIOCSETP +TRUNC +Torek +Tutorial''PS1:8 +USERRESERVED +VAX +WNOHANG +WRONLY +XSIS +XTABS +ack +addr +addr.s +addr.sa +addr.sun +addr.x +addrtype +alo +argc +argv +arpa +b.sg +bcmp +bcopy +broadaddr +buf +buf.buf +buf.proto +buflen +bzero +c.f +cad +caddr +calder +daemons +dali +databuf +datagram +datastream +dev +dna +doit +dst +dst.sin +dst.sns +dstaddr +dt +dup2 +en0 +endhostent +endif +ernie +errno +es +esvax +exceptmask +execptfds +fcntl +fcntl.h +fd +fflush +file.h +foo +fprintf +from.sin +fromlen +gethostbyaddr +gethostbyname +gethostbynameandnet +gethostent +gethostname +getnetbyname +getnetbynumber +getnetent +getpeername +getprotobyname +getprotobynumber +getprotoent +getservbyname +getservbyport +getservent +getsockopt +goto +gotpty +gyre +gyre:Computer +hardcoding +hopcount +host.c +hostent +hostname +hostnames +hosts.equiv +htonl +htons +idp +idp.h +idp.idp +idsize +if.h +ifc +ifc.ifc +ifconf +ifcu +ifcu.ifcu +ifndef +ifr +ifreq +ifru +ifru.ifru +in.h +inet +inetd +inetd.conf +ing +ingres +io +ioctl.h +ipc +kim +len +localnet +lport +lq +makeaddr +matisse +medea +miro +monet +name.sin +namelen +nameserver +nb +netdb.h +netent +netinet +netns +netnum +netof +newsock +newtcp +nfds +ns +ns.h +ntoa +ntohl +ntohs +onalrm +oob +optlen +optname +optval +oz +pathname +pathnames +pex +pgrp +ph +pp +proto +protoent +pt +pty +ptyXX +ptyp +ptyxy +queueing +readfds +readmask +recv +recvfrom +recvtime +rem +req +rhosts +rlogin +rlogind +rq +rresvport +ruptime +rwho +rwhod +sendto +servent +server.sin +server.sun +sethostent +setsockopt +sid +sigvec +sin.sin +sizeof +sna +snew +sns +sns.sns 
+sockaddr +socket.h +sp +sp.h +sp.sp +sphdr +spp +spp.sp +sprintf +statbuf +statvax +std +stderr +stdin +stdio.h +stdout +strcmp +strcpy +strlen +syslog +ta +tcp +telnet +time.h +timeval +tmp +tolen +ttyxy +tuples +types.h +ucbvax +udp +un +un.h +uniqueid +useable +usec +val +wait.h +wait.tv +wd +wd.wd +whod +wildcard +wildcarded +writefds +writemask diff --git a/share/doc/psd/22.rpcgen/Makefile b/share/doc/psd/22.rpcgen/Makefile new file mode 100644 index 0000000..4c38add --- /dev/null +++ b/share/doc/psd/22.rpcgen/Makefile @@ -0,0 +1,9 @@ +# $FreeBSD$ + +VOLUME= psd/22.rpcgen +SRCS= stubs rpcgen.ms +MACROS= -ms +USE_TBL= +SRCDIR= ${.CURDIR}/../../../../lib/libc/rpc/PSD.doc + +.include <bsd.doc.mk> diff --git a/share/doc/psd/23.rpc/Makefile b/share/doc/psd/23.rpc/Makefile new file mode 100644 index 0000000..77849b6 --- /dev/null +++ b/share/doc/psd/23.rpc/Makefile @@ -0,0 +1,10 @@ +# $FreeBSD$ + +VOLUME= psd/23.rpc +SRCS= stubs rpc.prog.ms +MACROS= -ms +USE_TBL= +USE_PIC= +SRCDIR= ${.CURDIR}/../../../../lib/libc/rpc/PSD.doc + +.include <bsd.doc.mk> diff --git a/share/doc/psd/24.xdr/Makefile b/share/doc/psd/24.xdr/Makefile new file mode 100644 index 0000000..878dca1 --- /dev/null +++ b/share/doc/psd/24.xdr/Makefile @@ -0,0 +1,9 @@ +# $FreeBSD$ + +VOLUME= psd/24.xdr +SRCS= stubs xdr.nts.ms +MACROS= -ms +USE_EQN= +SRCDIR= ${.CURDIR}/../../../../lib/libc/rpc/PSD.doc + +.include <bsd.doc.mk> diff --git a/share/doc/psd/25.xdrrfc/Makefile b/share/doc/psd/25.xdrrfc/Makefile new file mode 100644 index 0000000..105135e --- /dev/null +++ b/share/doc/psd/25.xdrrfc/Makefile @@ -0,0 +1,9 @@ +# $FreeBSD$ + +VOLUME= psd/25.xdrrfc +SRCS= stubs xdr.rfc.ms +MACROS= -ms +USE_TBL= +SRCDIR= ${.CURDIR}/../../../../lib/libc/rpc/PSD.doc + +.include <bsd.doc.mk> diff --git a/share/doc/psd/26.rpcrfc/Makefile b/share/doc/psd/26.rpcrfc/Makefile new file mode 100644 index 0000000..79214f1 --- /dev/null +++ b/share/doc/psd/26.rpcrfc/Makefile @@ -0,0 +1,9 @@ +# $FreeBSD$ + +VOLUME= psd/26.rpcrfc 
+SRCS= stubs rpc.rfc.ms +MACROS= -ms +USE_TBL= +SRCDIR= ${.CURDIR}/../../../../lib/libc/rpc/PSD.doc + +.include <bsd.doc.mk> diff --git a/share/doc/psd/27.nfsrpc/Makefile b/share/doc/psd/27.nfsrpc/Makefile new file mode 100644 index 0000000..5904787 --- /dev/null +++ b/share/doc/psd/27.nfsrpc/Makefile @@ -0,0 +1,9 @@ +# $FreeBSD$ + +VOLUME= psd/27.nfsrpc +SRCS= stubs nfs.rfc.ms +MACROS= -ms +USE_TBL= +SRCDIR= ${.CURDIR}/../../../../lib/libc/rpc/PSD.doc + +.include <bsd.doc.mk> diff --git a/share/doc/psd/28.cvs/Makefile b/share/doc/psd/28.cvs/Makefile new file mode 100644 index 0000000..a624732 --- /dev/null +++ b/share/doc/psd/28.cvs/Makefile @@ -0,0 +1,10 @@ +# $FreeBSD$ + +VOLUME= psd/28.cvs +SRCS= cvs-paper.ms +MACROS= -ms +USE_PIC= +USE_TBL= +SRCDIR= ${.CURDIR}/../../../../contrib/cvs/doc + +.include <bsd.doc.mk> diff --git a/share/doc/psd/Makefile b/share/doc/psd/Makefile new file mode 100644 index 0000000..d50f05b --- /dev/null +++ b/share/doc/psd/Makefile @@ -0,0 +1,41 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +# The following modules do not build/install: +# 10.gdb + +# The following modules do not apply to FreeBSD: +# 07.pascal 08.f77 09.f77io + +# The following previously encumbered files have not yet been added to +# the tree: +# 11.adb + +SUBDIR= title \ + contents \ + 01.cacm \ + 02.implement \ + 03.iosys \ + 04.uprog \ + 05.sysman \ + 06.Clang \ + 12.make \ + 13.rcs \ + 15.yacc \ + 16.lex \ + 17.m4 \ + 18.gprof \ + 20.ipctut \ + 21.ipc + +# The following modules don't appear in the O'Reilly book, but +# are in the 4.4BSD distribution. 
+SUBDIR+=22.rpcgen \ + 23.rpc \ + 24.xdr \ + 25.xdrrfc \ + 26.rpcrfc \ + 27.nfsrpc \ + 28.cvs + +.include <bsd.subdir.mk> diff --git a/share/doc/psd/contents/Makefile b/share/doc/psd/contents/Makefile new file mode 100644 index 0000000..38864e9 --- /dev/null +++ b/share/doc/psd/contents/Makefile @@ -0,0 +1,8 @@ +# $FreeBSD$ + +VOLUME= psd +DOC= contents +SRCS= contents.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/psd/contents/contents.ms b/share/doc/psd/contents/contents.ms new file mode 100644 index 0000000..9f374dc --- /dev/null +++ b/share/doc/psd/contents/contents.ms @@ -0,0 +1,289 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)00.contents 8.1 (Berkeley) 6/8/93 +.\" $FreeBSD$ +.\" +.OH '''PSD Contents' +.EH 'PSD Contents''' +.TL +UNIX Programmer's Supplementary Documents (PSD) +.if !r.U .nr .U 0 +.if \n(.U \{\ +.br +.>> <a href="Title.html">Title.html</a> +.\} +.sp +\s-2 4.4 Berkeley Software Distribution\s+2 +.sp +\fRJune, 1993\fR +.PP +This volume contains documents which supplement the manual pages in +.I +The +.UX +Programmer's Reference Manual +.R +for the FreeBSD system as distributed by the FreeBSD Project. +.SH +Documents of Historical Interest +.IP +.tl 'The Unix Time\-Sharing System''PSD:1' +.QP +Dennis Ritchie and Ken Thompson's original paper about UNIX, reprinted +from Communications of the ACM. +.sp +.IP +.tl 'Unix Implementation''PSD:2' +.QP +Ken Thompson's description of the implementation of the Version 7 +kernel and file system. +.sp +.IP +.tl 'The Unix I/O System''PSD:3' +.QP +Dennis Ritchie's overview of the I/O System of Version 7; still helpful for +those writing device drivers. +.sp +.IP +.tl 'Unix Programming \- Second Edition ''PSD:4' +.QP +Describes the programming interface to the UNIX version 7 operating +system and the standard I/O library. Should be supplemented by +Kernighan and Pike, ``The UNIX Programming Environment'', +Prentice-Hall, 1984 and especially by the Programmer Reference Manual +section 2 (system calls) and 3 (library routines). 
+.sp +.IP +.tl 'Berkeley Software Architecture Manual (4.4 Edition)''PSD:5' +.if \n(.U \{\ +.br +.>> <a href="05.sysman/paper.html">05.sysman/paper.html</a> +.\} +.QP +A concise and terse description of the system call interface +provided in Berkeley Unix, as revised for 4.4BSD. +This will never be a best seller. + +.SH +Languages in common use +.IP +.tl 'The C Programming Language \- Reference Manual''PSD:6' +.QP +Official statement of the syntax of C. +Should be supplemented by ``The C Programming Language,'' +B.W. Kernighan and D.M. Ritchie, Prentice-Hall, 1978, that +contains a tutorial introduction and many examples. +.sp +.IP +.tl 'Berkeley Pascal User\'s Manual''PSD:7' +.QP +An implementation of this language popular for learning to program. +(Not provided in FreeBSD.) +.sp +.IP +.tl 'A Portable Fortran 77 Compiler''PSD:8' +.QP +A revised version of the document which originally appeared in +Volume 2b of the Bell Labs documentation; +this version reflects the work done at Berkeley. +(Not provided in FreeBSD.) +.sp +.IP +.tl 'Introduction to the f77 I/O Library''PSD:9' +.QP +A description of the revised input/output library for Fortran 77, +reflecting work carried out at Berkeley. (Not provided in FreeBSD.) + +.SH +Programming Tools +.IP +.tl 'Debugging with GDB: The GNU Source-Level Debugger''PSD:10' +.QP +How to debug programs using the source level \fIgdb\fP debugger +(or how to debug programs without having to know much about machine language). +(A TeXinfo version is provided separately.) +.sp +.IP +.tl 'A Tutorial Introduction to ADB''PSD:11' +.QP +How to debug programs using the assembly-language level \fIadb\fP debugger. +(Not provided in FreeBSD.) +.sp +.IP +.tl 'Make \- A Program for Maintaining Computer Programs''PSD:12' +.if \n(.U \{\ +.br +.>> <a href="12.make/paper.html">12.make/paper.html</a> +.\} +.QP +Indispensable tool for making sure large programs are properly +compiled with minimal effort. 
+.sp +.IP +.tl 'An Introduction to the Revision Control System''PSD:13' +.if \n(.U \{\ +.br +.>> <a href="13.rcs/paper.html">13.rcs/paper.html</a> +.\} +.QP +RCS is a user-contributed tool for working together with other people +without stepping on each other's toes. +An alternative to \fIsccs\fR for controlling software changes. +.sp +.IP +.tl 'An Introduction to the Source Code Control System''PSD:14' +.QP +A useful introductory article for those users with +installations licensed for SCCS. +.sp +.IP +.tl 'YACC: Yet Another Compiler-Compiler''PSD:15' +.QP +Converts a BNF specification of a language and semantic actions +written in C into a compiler for that language. +.sp +.IP +.tl 'LEX \- A Lexical Analyzer Generator''PSD:16' +.QP +Creates a recognizer for a set of regular expressions: +each regular expression can be followed by arbitrary C code +to be executed upon finding the regular expression. +.sp +.IP +.tl 'The M4 Macro Processor''PSD:17' +.QP +M4 is a macro processor useful in its own right and as a +front-end for C, Ratfor, and Cobol. +.sp +.IP +.tl 'gprof: a Call Graph Execution Profiler''PSD:18' +.if \n(.U \{\ +.br +.>> <a href="18.gprof/paper.html">18.gprof/paper.html</a> +.\} +.QP +A program to show the call graph and execution time of a program. +Indispensable aid for improving the running time of almost everything. + +.SH +General Reference +.IP +.tl 'An Introductory 4.4BSD Interprocess Communication Tutorial''PSD:20' +.if \n(.U \{\ +.br +.>> <a href="20.ipctut/paper.html">20.ipctut/paper.html</a> +.\} +.QP +How to write programs that use the Interprocess Communication Facilities +of 4.4BSD. +.sp +.IP +.tl 'An Advanced 4.4BSD Interprocess Communication Tutorial''PSD:21' +.if \n(.U \{\ +.br +.>> <a href="21.ipc/paper.html">21.ipc/paper.html</a> +.\} +.QP +The reference document (with some examples) for the Interprocess Communication +Facilities of 4.4BSD. 
+.sp +.IP +.tl 'RPCGEN Programming Guide''PSD:22' +.if \n(.U \{\ +.br +.>> <a href="22.rpcgen/paper.html">22.rpcgen/paper.html</a> +.\} +.QP +Manual for the ONC RPC stub-generating program, provided by Sun Microsystems. +.sp +.IP +.tl 'Remote Procedure Call Programming Guide''PSD:23' +.if \n(.U \{\ +.br +.>> <a href="23.rpc/paper.html">23.rpc/paper.html</a> +.\} +.QP +A tutorial introduction to programming the ONC RPC system, provided by +Sun Microsystems. +.sp +.IP +.tl 'External Data Representation: Sun Technical Notes''PSD:24' +.if \n(.U \{\ +.br +.>> <a href="24.xdr/paper.html">24.xdr/paper.html</a> +.\} +.QP +Technical details about the design of the XDR component of ONC RPC, +provided by Sun Microsystems. +.sp +.IP +.tl 'External Data Representation Standard: Protocol Specification''PSD:25' +.if \n(.U \{\ +.br +.>> <a href="25.xdrrfc/paper.html">25.xdrrfc/paper.html</a> +.\} +.QP +The Internet RFC specifying ONC XDR, provided by Sun Microsystems. +.sp +.IP +.tl 'Remote Procedure Calls: Protocol Specification''PSD:26' +.if \n(.U \{\ +.br +.>> <a href="26.rpcrfc/paper.html">26.rpcrfc/paper.html</a> +.\} +.QP +The Internet RFC specifying ONC RPC, RFC 1050, as provided by Sun +Microsystems. +.sp +.IP +.tl 'Network File System: Version 2 Protocol Specification''PSD:27' +.if \n(.U \{\ +.br +.>> <a href="27.nfsrpc/paper.html">27.nfsrpc/paper.html</a> +.\} +.QP +The Internet RFC specifying NFS, as provided by Sun Microsystems. +Note that the NFS-compatible filesystem itself, while +compliant with this specification, was not provided by Sun. +.sp +.IP +.tl 'CVS II: Parallelizing Software Development''PSD:28' +.if \n(.U \{\ +.br +.>> <a href="28.cvs/paper.html">28.cvs/paper.html</a> +.\} +.QP +CVS (Concurrent Versions System) is a front end to the +RCS revision control system which extends the notion of +revision control from a collection of files in a single +directory to a hierarchical collection of directories each +containing revision controlled files. 
diff --git a/share/doc/psd/title/Makefile b/share/doc/psd/title/Makefile new file mode 100644 index 0000000..d073730 --- /dev/null +++ b/share/doc/psd/title/Makefile @@ -0,0 +1,7 @@ +# $FreeBSD$ + +VOLUME= psd +DOC= Title +SRCS= Title + +.include <bsd.doc.mk> diff --git a/share/doc/psd/title/Title b/share/doc/psd/title/Title new file mode 100644 index 0000000..014b3d5 --- /dev/null +++ b/share/doc/psd/title/Title @@ -0,0 +1,132 @@ +.\" Copyright (c) 1986, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)Title 8.2 (Berkeley) 4/19/94 +.\" $FreeBSD$ +.\" +.ps 18 +.vs 22 +.sp 2.75i +.ft B +.ce 3 +UNIX Programmer's Supplementary Documents +(PSD) +.ps 14 +.vs 16 +.sp |4i +.ce 2 +4.4 Berkeley Software Distribution +.sp |5.75i +.ft R +.ps 12 +.vs 16 +.ce +June, 1993 +.sp |8.2i +.ce 5 +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California +Berkeley, California 94720 +.bp +\& +.sp |1i +.hy 0 +.ps 10 +.vs 12p +Copyright 1979, 1980, 1983, 1986, 1993 +The Regents of the University of California. All rights reserved. +.sp 2 +Other than the specific documents listed below as copyrighted by AT&T, +redistribution and use of this manual in source and binary forms, +with or without modification, are permitted provided that the +following conditions are met: +.sp 0.5 +.in +0.2i +.ta 0.2i +.ti -0.2i +1) Redistributions of this manual must retain the copyright +notices on this page, this list of conditions and the following disclaimer. +.ti -0.2i +2) Software or documentation that incorporates part of this manual must +reproduce the copyright notices on this page, this list of conditions and +the following disclaimer in the documentation and/or other materials +provided with the distribution. 
+.ti -0.2i +3) All advertising materials mentioning features or use of this software +must display the following acknowledgement: +``This product includes software developed by the University of +California, Berkeley and its contributors.'' +.ti -0.2i +4) Neither the name of the University nor the names of its contributors +may be used to endorse or promote products derived from this software +without specific prior written permission. +.in -0.2i +.sp +\fB\s-1THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +SUCH DAMAGE.\s+1\fP +.sp 2 +Documents PSD:1, 2, 3, 4, 6, 11, 15, 16, and 17 +are copyright 1979, AT&T Bell Laboratories, Incorporated. +Document PSD:8 is a modification of an earlier document that +is copyrighted 1979 by AT&T Bell Laboratories, Incorporated. +Holders of \x'-1p'UNIX\v'-4p'\s-3TM\s0\v'4p'/32V, +System III, or System V software licenses are +permitted to copy these documents, or any portion of them, +as necessary for licensed use of the software, +provided this copyright notice and statement of permission +are included. +.sp 2 +Document PSD:10 is part of the user contributed software and is +copyright 1992 by the Free Software Foundation, Inc. 
+Permission is granted to make and distribute verbatim copies of +this document provided the copyright notice and this permission notice +are preserved on all copies. +.sp 2 +Document PSD:13 is part of the user contributed software and is +copyright 1983 by Walter F. Tichy. +Permission to copy the RCS documentation or any portion thereof as +necessary for licensed use of the software is granted to licensees +of this software, provided this copyright notice is included. +.sp 2 +The views and conclusions contained in this manual are those of the +authors and should not be interpreted as representing official policies, +either expressed or implied, of the Regents of the University of California. diff --git a/share/doc/smm/01.setup/0.t b/share/doc/smm/01.setup/0.t new file mode 100644 index 0000000..1951cd0 --- /dev/null +++ b/share/doc/smm/01.setup/0.t @@ -0,0 +1,133 @@ +.\" Copyright (c) 1988, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)0.t 8.1 (Berkeley) 7/27/93 +.\" $FreeBSD$ +.\" +.ds Ux \s-1UNIX\s0 +.ds Bs \s-1BSD\s0 +.\" Current version: +.ds 4B 4.4\*(Bs +.ds Ps 4.3\*(Bs +.\" tape and disk naming +.ds Mt mt +.ds Dk sd +.ds Dn disk +.ds Pa c +.\" block size used on the tape +.ds Bb 10240 +.ds Bz 20 +.\" document date +.ds Dy July 27, 1993 +.de Sm +\s-1\\$1\s0\\$2 +.. +.de Pn \" pathname +.ie n \fI\\$1\fP\\$2 +.el \f(CW\\$1\fP\\$2 +.. +.de Li \" literal +\f(CW\\$1\fP\\$2 +.. +.de I \" italicize first arg +\fI\\$1\fP\^\\$2 +.. +.de Xr \" manual reference +\fI\\$1\fP\^\\$2 +.. +.de Fn \" function +\fI\\$1\fP\^()\\$2 +.. +.bd S B 3 +.EH 'SMM:1-%''Installing and Operating \*(4B UNIX' +.OH 'Installing and Operating \*(4B UNIX''SMM:1-%' +.de Sh +.NH \\$1 +\\$2 +.nr PD .1v +.XS \\n% +.ta 0.6i +\\*(SN \\$2 +.XE +.nr PD .3v +.. +.TL +Installing and Operating \*(4B UNIX +.br +\*(Dy +.AU +Marshall Kirk McKusick +.AU +Keith Bostic +.AU +Michael J. Karels +.AU +Samuel J. 
Leffler +.AI +Computer Systems Research Group +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, California 94720 +(415) 642-7780 +.AU +Mike Hibler +.AI +Center for Software Science +Department of Computer Science +University of Utah +Salt Lake City, Utah 84112 +(801) 581-5017 +.AB +.PP +This document contains instructions for the +installation and operation of the +\*(4B release of UNIX\** +as distributed by The University of California at Berkeley. +.FS +UNIX is a registered trademark of USL in the USA and some other countries. +.FE +.PP +It discusses procedures for installing UNIX on a new machine, +and for upgrading an existing \*(Ps UNIX system to the new release. +An explanation of how to lay out filesystems on available disks +and the space requirements for various parts of the system are given. +A brief overview of the major changes to +the system between \*(Ps and \*(4B is presented. +An explanation of how to set up terminal lines and user accounts, +and how to do system-specific tailoring is provided. +A description of how to install and configure the \*(4B networking +facilities is included. +Finally, the document details system operation procedures: +shutdown and startup, filesystem backup procedures, +resource control, performance monitoring, and procedures for recompiling +and reinstalling system software. +.AE +.bp +3 diff --git a/share/doc/smm/01.setup/1.t b/share/doc/smm/01.setup/1.t new file mode 100644 index 0000000..2f71b77 --- /dev/null +++ b/share/doc/smm/01.setup/1.t @@ -0,0 +1,172 @@ +.\" Copyright (c) 1988, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.t 8.1 (Berkeley) 7/27/93 +.\" +.ds lq `` +.ds rq '' +.ds LH "Installing/Operating \*(4B +.ds RH Introduction +.ds CF \*(Dy +.LP +.bp +.Sh 1 "Introduction" +.PP +This document explains how to install the \*(4B Berkeley +version of UNIX on your system. +The filesystem format is compatible with \*(Ps +and it will only be necessary for you to do a full bootstrap +procedure if you are installing the release on a new machine. 
+The object file formats are completely different from the System +V release, so the most straightforward procedure for upgrading +a System V system is to do a full bootstrap. +.PP +The full bootstrap procedure +is outlined in section 2; the process starts with copying a filesystem +image onto a new disk. +This filesystem is then booted and used to extract the remainder of the +system binaries and sources from the archives on the tape(s). +.PP +The technique for upgrading a \*(Ps system is described +in section 3 of this document. +The upgrade procedure involves extracting system binaries +onto new root and +.Pn /usr +filesystems and merging local +configuration files into the new system. +User filesystems may be upgraded in place. +Most \*(Ps binaries may be used with \*(4B in the course +of the conversion. +It is desirable to recompile local sources after the conversion, +as the new compiler (GCC) provides superior code optimization. +Consult section 3.5 for a description of some of the differences +between \*(Ps and \*(4B. +.Sh 2 "Distribution format" +.PP +The distribution comes in two formats: +.DS +(3)\0\0 6250bpi 2400' 9-track magnetic tapes, or +(1)\0\0 8mm Exabyte tape +.DE +.PP +If you have the facilities, we \fBstrongly\fP recommend copying the +magnetic tape(s) in the distribution kit to guard against disaster. +The tapes contain \*(Bb-byte records. +There are interspersed tape marks; +end-of-tape is signaled by a double end-of-file. +The first file on the tape is architecture dependent. +Additional files on the tape(s) +contain tape archive images of the system binaries and sources (see +.Xr tar (1)\**). +.FS +References of the form \fIX\fP(Y) mean the entry named +\fIX\fP in section Y of the ``UNIX Programmer's Manual''. +.FE +See the tape label for a description of the contents +and format of each individual tape. 
+.Sh 2 "UNIX device naming" +.PP +Device names have a different syntax depending on whether you are talking +to the standalone system or a running UNIX kernel. +The standalone system syntax is currently architecture dependent and is +described in the various architecture specific sections as applicable. +When not running standalone, devices are available via files in the +.Pn /dev/ +directory. +The file name typically encodes the device type, its logical unit and +a partition within that unit. +For example, +.Pn /dev/sd2b +refers to the second partition (``b'') of +SCSI (``sd'') drive number ``2'', while +.Pn /dev/rmt0 +refers to the raw (``r'') interface of 9-track tape (``mt'') unit ``0''. +.PP +The mapping of physical addressing information (e.g. controller, target) +to a logical unit number is dependent on the system configuration. +In all simple cases, where only a single controller is present, a drive +with physical unit number 0 (e.g., as determined by its unit +specification, either unit plug or other selection mechanism) +will be called unit 0 in its UNIX file name. +This is not, however, strictly +necessary, since the system has a level of indirection in this naming. +If there are multiple controllers, the disk unit numbers will normally +be counted sequentially across controllers. This can be used +to make the system less dependent on the interconnect +topology, and to make reconfiguration after hardware failure easier. +.PP +Each UNIX physical disk is divided into at most 8 logical disk partitions, +each of which may occupy any consecutive cylinder range on the physical +device. The cylinders occupied by the 8 partitions for each drive type +are specified initially in the disk description file +.Pn /etc/disktab +(cf. +.Xr disktab (5)). +The partition information and description of the +drive geometry are written in one of the first sectors of each disk with the +.Xr disklabel (8) +program.
Each partition may be used for either a +raw data area such as a paging area or to store a UNIX filesystem. +It is conventional for the first partition on a disk to be used +to store a root filesystem, from which UNIX may be bootstrapped. +The second partition is traditionally used as a paging area, and the +rest of the disk is divided into spaces for additional ``mounted +filesystems'' by use of one or more additional partitions. +.Sh 2 "UNIX devices: block and raw" +.PP +UNIX makes a distinction between ``block'' and ``raw'' (character) +devices. Each disk has a block device interface where +the system makes the device byte addressable and you can write +a single byte in the middle of the disk. The system will read +out the data from the disk sector, insert the byte you gave it +and put the modified data back. The disks with the names +.Pn /dev/xx0[a-h] , +etc., are block devices. +There are also raw devices available. +These have names like +.Pn /dev/rxx0[a-h] , +the ``r'' here standing for ``raw''. +Raw devices bypass the buffer cache and use DMA directly to/from +the program's I/O buffers; +they are normally restricted to full-sector transfers. +In the bootstrap procedures we +will often suggest using the raw devices, because these tend +to work faster. +Raw devices are used when making new filesystems, +when checking unmounted filesystems, +or for copying quiescent filesystems. +The block devices are used to mount filesystems. +.PP +You should be aware that it is sometimes important whether to use +the character device (for efficiency) or not (because it would not +work, e.g. to write a single byte in the middle of a sector). +Do not change the instructions by using the wrong type of device +indiscriminately. diff --git a/share/doc/smm/01.setup/2.t b/share/doc/smm/01.setup/2.t new file mode 100644 index 0000000..4220a6d --- /dev/null +++ b/share/doc/smm/01.setup/2.t @@ -0,0 +1,1659 @@ +.\" Copyright (c) 1988, 1993 The Regents of the University of California. 
+.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)2.t 8.1 (Berkeley) 7/27/93 +.\" $FreeBSD$ +.\" +.ds lq `` +.ds rq '' +.ds LH "Installing/Operating \*(4B +.ds RH Bootstrapping +.ds CF \*(Dy +.Sh 1 "Bootstrap procedure" +.PP +This section explains the bootstrap procedure that can be used +to get the kernel supplied with this distribution running on your machine. +If you are not currently running \*(Ps you will +have to do a full bootstrap. +Section 3 describes how to upgrade a \*(Ps system. +An understanding of the operations used in a full bootstrap +is helpful in doing an upgrade as well. +In either case, it is highly desirable to read and understand +the remainder of this document before proceeding. +.PP +The distribution supports a somewhat wider set of machines than +those for which we have built binaries. +The architectures that are supported only in source form include: +.IP \(bu +Intel 386/486-based machines (ISA/AT or EISA bus only) +.IP \(bu +Sony News MIPS-based workstations +.IP \(bu +Omron Luna 68000-based workstations +.LP +If you wish to run one of these architectures, +you will have to build a cross compilation environment. +Note that the distribution does +.B not +include the machine support for the Tahoe and VAX architectures +found in previous BSD distributions. +Our primary development environment is the HP9000/300 series machines. +The other architectures are developed and supported by +people outside the university. +Consequently, we are not able to directly test or maintain these +other architectures, so cannot comment on their robustness, +reliability, or completeness. 
+.Sh 2 "Bootstrapping from the tape" +.LP +The files on the distribution tape are as follows: +.IP 1) +A +.Xr dd (1) +(HP300), +.Xr tar (1) +(DECstation), or +.Xr dump (8) +(SPARC) image of the root filesystem +.IP 2) +A +.Xr tar +image of the +.Pn /var +filesystem +.IP 3) +A +.Xr tar +image of the +.Pn /usr +filesystem +.IP 4) +A +.Xr tar +image of +.Pn /usr/src/sys +.IP 5) +A +.Xr tar +image of +.Pn /usr/src +except sys and contrib +.IP 6) +A +.Xr tar +image of +.Pn /usr/src/contrib +.IP 7) +(8mm Exabyte tape distributions only) +A +.Xr tar +image of +.Pn /usr/src/X11R5 +.LP +The tape bootstrap procedure used to create a +working system involves the following major steps: +.IP 1) +Transfer a bootable root filesystem from the tape to a disk +and get it booted and running. +.IP 2) +Build and restore the +.Pn /var +and +.Pn /usr +filesystems from tape with +.Xr tar (1). +.IP 3) +Extract the system and utility source files as desired. +.PP +The following sections describe the above steps in detail. +The details of the first step vary between architectures. +The specific steps for the HP300, SPARC, and DECstation are +given in the next three sections respectively. +You should follow the instructions for your particular architecture. +In all sections, +commands you are expected to type are shown in italics, while +information printed by the system is shown in bold. +.Sh 2 "Booting the HP300" +.Sh 3 "Supported hardware" +.LP +The hardware supported by \*(4B for the HP300/400 is as follows: +.TS +center box; +lw(1i) lw(4i). +CPU's T{ +68020 based (318, 319, 320, 330 and 350), +68030 based (340, 345, 360, 370, 375, 400) and +68040 based (380, 425, 433). +T} +_ +DISK's T{ +HP-IB/CS80 (7912, 7914, 7933, 7936, 7945, 7957, 7958, 7959, 2200, 2203) +and SCSI-I (including magneto-optical). +T} +_ +TAPE's T{ +Low-density CS80 cartridge (7914, 7946, 9144), +high-density CS80 cartridge (9145), +HP SCSI DAT and +SCSI Exabyte.
+T} +_ +RS232 T{ +98644 built-in single-port, 98642 4-port and 98638 8-port interfaces. +T} +_ +NETWORK T{ +98643 internal and external LAN cards. +T} +_ +GRAPHICS T{ +Terminal emulation and raw frame buffer support for +98544 / 98545 / 98547 (Topcat color & monochrome), +98548 / 98549 / 98550 (Catseye color & monochrome), +98700 / 98710 (Gatorbox), +98720 / 98721 (Renaissance), +98730 / 98731 (DaVinci) and +A1096A (Hyperion monochrome). +T} +_ +INPUT T{ +General interface supporting all HIL devices. +(e.g. keyboard, 2 and 3 button mice, ID module, ...) +T} +_ +MISC T{ +Battery-backed real time clock, +builtin and 98625A/B HP-IB interfaces, +builtin and 98658A SCSI interfaces, +serial printers and plotters on HP-IB, +and SCSI autochanger device. +T} +.TE +.LP +Major items that are not supported +include the 310 and 332 CPU's, 400 series machines +configured for Domain/OS, EISA and VME bus adaptors, audio, the centronics +port, 1/2" tape drives (7980), CD-ROM, and the PVRX/TVRX 3D graphics displays. +.Sh 3 "Standalone device file naming" +.LP +The standalone system device name syntax on the HP300 is of the form: +.DS +xx(a,c,u,p) +.DE +where +\fIxx\fP is the device type, +\fIa\fP specifies the adaptor to use, +\fIc\fP the controller, +\fIu\fP the unit, and +\fIp\fP a partition. +The \fIdevice type\fP differentiates the various disks and tapes and is one of: +``rd'' for HP-IB CS80 disks, +``ct'' for HP-IB CS80 cartridge tapes, or +``sd'' for SCSI-I disks +(SCSI-I tapes are currently not supported). +The \fIadaptor\fP field is a logical HP-IB or SCSI bus adaptor card number. +This will typically be +0 for SCSI disks, +0 for devices on the ``slow'' HP-IB interface (usually tapes) and +1 for devices on the ``fast'' HP-IB interface (usually disks). +To get a complete mapping of physical (select-code) to logical card numbers +just type a ^C at the standalone prompt. +The \fIcontroller\fP field is the disk or tape's target number on the +HP-IB or SCSI bus. 
+For SCSI the range is 0 to 6 (7 is the adaptor address) and +for HP-IB the range is 0 to 7. +The \fIunit\fP field is unused and should be 0. +The \fIpartition\fP field is interpreted differently for tapes +and disks: for disks it is a disk partition (in the range 0-7), +and for tapes it is a file number offset on the tape. +Thus, partition 2 of a SCSI disk drive at target 3 on SCSI bus 1 +would be ``sd(1,3,0,2)''. +If you have only one of any type bus adaptor, you may omit the adaptor +and controller numbers; +e.g. ``sd(0,2)'' could be used instead of ``sd(0,0,0,2)''. +The following examples always use the full syntax for clarity. +.Sh 3 "The procedure" +.LP +The basic steps involved in bringing up the HP300 are as follows: +.IP 1) +Obtain a second disk and format it, if necessary. +.IP 2) +Copy a root filesystem from the +tape onto the beginning of the disk. +.IP 3) +Boot the UNIX system on the new disk. +.IP 4) +(Optional) Build a root filesystem optimized for your disk. +.IP 5) +Label the disks with the +.Xr disklabel (8) +program. +.Sh 4 "Step 1: selecting and formatting a disk" +.PP +For your first system you will have to obtain a formatted disk +of a type given in the ``supported hardware'' list above. +If you want to load an entire binary system +(i.e., everything except +.Pn /usr/src ), +on the single disk you will need a minimum of 290MB, +ruling out anything smaller than a 7959B/S disk. +The disklabel included in the bootstrap root image is laid out +to accommodate this scenario. +Note that an HP SCSI magneto-optical disk will work fine for this case. +\*(4B will boot and run (albeit slowly) using one. +If you want to load source on a single disk system, +you will need at least 640MB (at least a 2213A SCSI or 2203A HP-IB disk). +A disk as small as the 7945A (54MB) can be used for the bootstrap +procedure but will hold only the root and primary swap partitions. +If you plan to use multiple disks, +refer to section 2.5 for suggestions on partitioning. 
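The steps that follow use HP300 standalone device names of the form xx(a,c,u,p), as described under ``Standalone device file naming'' above. As a rough illustration only, the rules given there can be sketched in Python. The function and class names are hypothetical and not part of the distribution, and treating the two-field short form as (unit, partition) with adaptor and controller set to 0 is an assumption drawn from the single ``sd(0,2)'' example.

```python
import re
from typing import NamedTuple


class StandaloneDevice(NamedTuple):
    """Fields of an HP300 standalone device name xx(a,c,u,p)."""
    dev_type: str    # "rd" (HP-IB disk), "ct" (HP-IB tape) or "sd" (SCSI disk)
    adaptor: int     # logical HP-IB or SCSI bus adaptor card number
    controller: int  # target number on the HP-IB or SCSI bus
    unit: int        # unused on the HP300; the text says it should be 0
    partition: int   # disk partition, or file number offset for tapes


# Hypothetical pattern: a known device type followed by comma-separated integers.
_NAME = re.compile(r"^(rd|ct|sd)\((\d+(?:,\d+)*)\)$")


def parse_standalone(name: str) -> StandaloneDevice:
    """Parse a standalone device name; raise ValueError if it is malformed."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"not a standalone device name: {name!r}")
    dev_type = m.group(1)
    fields = [int(f) for f in m.group(2).split(",")]
    if len(fields) == 2:
        # Short form (assumption): adaptor and controller omitted, both 0.
        fields = [0, 0] + fields
    elif len(fields) != 4:
        raise ValueError(f"expected 2 or 4 fields in {name!r}")
    adaptor, controller, unit, partition = fields
    # SCSI targets run 0 to 6 (7 is the adaptor); HP-IB targets run 0 to 7.
    max_target = 6 if dev_type == "sd" else 7
    if controller > max_target:
        raise ValueError(f"target {controller} out of range for {dev_type!r}")
    # Disk partitions are limited to 0-7; tape file offsets are not checked.
    if dev_type in ("rd", "sd") and partition > 7:
        raise ValueError(f"partition {partition} out of range")
    return StandaloneDevice(dev_type, adaptor, controller, unit, partition)
```

For instance, parse_standalone("sd(1,3,0,2)") yields adaptor 1, target 3, unit 0, partition 2, matching the example in the text.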
+.PP +After selecting a disk, you may need to format it. +Since most HP disk drives come pre-formatted +(except optical media) +you probably will not, but if necessary, +you can format a disk under HP-UX using the +.Xr mediainit (1m) +program. +Once you have \*(4B up and running on one machine you can use the +.Xr scsiformat (8) +program to format additional SCSI disks. +Any additional HP-IB disks will have to be formatted using HP-UX. +.Sh 4 "Step 2: copying the root filesystem from tape to disk" +.PP +Once you have a formatted second disk you can use the +.Xr dd (1) +command under HP-UX to copy the root filesystem image from +the tape to the beginning of the second disk. +For HP's, the root filesystem image is the first file on the tape. +It includes a disklabel and bootblock along with the root filesystem. +An example command to copy the image from tape to the beginning of a disk is: +.DS +.ft CW +dd if=/dev/rmt/0m of=/dev/rdsk/1s0 bs=\*(Bzb +.DE +The actual special file syntax may vary depending on unit numbers and +the version of HP-UX that is running. +Consult the HP-UX +.Xr mt (7) +and +.Xr disk (7) +man pages for details. +.PP +Note that if you have a SCSI disk, you don't necessarily have to use +HP-UX (or an HP) to create the boot disk. +Any machine and operating system that will allow you to copy the +raw disk image out to block 0 of the disk will do. +.PP +If you have only a single machine with a single disk, +you may still be able to install and boot \*(4B if you have an +HP-IB cartridge tape drive. +If so, you can use a more difficult approach of booting a +standalone copy program from the tape, and using that to copy the +root filesystem image from the tape to the disk. +To do this, you need to extract the first file of the distribution tape +(the root image), copy it over to a machine with a cartridge drive +and then copy the image onto tape. 
+For example: +.DS +.ft CW +dd if=/dev/rst0 of=bootimage bs=\*(Bzb +rcp bootimage foo:/tmp/bootimage +<login to foo> +dd if=/tmp/bootimage of=/dev/rct/0m bs=\*(Bzb +.DE +Once this tape is created you can boot and run the standalone tape +copy program from it. +The copy program is loaded just as any other program would be loaded +by the bootrom in ``attended'' mode: +reset the CPU, +hold down the space bar until the word ``Keyboard'' appears in the +installed interface list, and +enter the menu selection for SYS_TCOPY. +Once loaded and running: +.DS +.TS +lw(2i) l. +\fBFrom:\fP \fI^C\fP (control-C to see logical adaptor assignments) +\fBhpib0 at sc7\fP +\fBscsi0 at sc14\fP +\fBFrom:\fP \fIct(0,7,0,0)\fP (HP-IB tape, target 7, first tape file) +\fBTo:\fP \fIsd(0,0,0,2)\fP (SCSI disk, target 0, third partition) +\fBCopy completed: 1728 records copied\fP +.TE +.DE +.LP +This copy will likely take 30 minutes or more. +.Sh 4 "Step 3: booting the root filesystem" +.PP +You now have a bootable root filesystem on the disk. +If you were previously running with two disks, +it would be best if you shut down the machine and turn off power on +the HP-UX drive. +It will be less confusing and it will eliminate any chance of accidentally +destroying the HP-UX disk. +If you used a cartridge tape for booting you should also unload the tape +at this point. +Whether you booted from tape or copied from disk you should now reboot +the machine and do another attended boot (see previous section), +this time with SYS_TBOOT. +Once loaded and running the boot program will display the CPU type and +prompt for a kernel file to boot: +.DS +.B +HP433 CPU +Boot +.R +\fB:\fP \fI/kernel\fP +.DE +.LP +After providing the kernel name, the machine will boot \*(4B with +output that looks about like this: +.DS +.B +597480+34120+139288 start 0xfe8019ec +Copyright (c) 1982, 1986, 1989, 1991, 1993 + The Regents of the University of California. 
+Copyright (c) 1992 Hewlett-Packard Company +Copyright (c) 1992 Motorola Inc. +All rights reserved. + +4.4BSD UNIX #1: Tue Jul 20 11:40:36 PDT 1993 + mckusick@vangogh.CS.Berkeley.EDU:/usr/obj/sys/compile/GENERIC.hp300 +HP9000/433 (33MHz MC68040 CPU+MMU+FPU, 4k on-chip physical I/D caches) +real mem = xxx +avail mem = ### +using ### buffers containing ### bytes of memory +(... information about available devices ...) +root device? +.R +.DE +.PP +The first three numbers are printed out by the bootstrap program and +are the sizes of different parts of the system (text, initialized and +uninitialized data). The system also allocates several system data +structures after it starts running. The sizes of these structures are +based on the amount of available memory and the maximum count of active +users expected, as declared in a system configuration description. This +will be discussed later. +.PP +UNIX itself then runs for the first time and begins by printing out a banner +identifying the release and +version of the system that is in use and the date that it was compiled. +.PP +Next the +.I mem +messages give the +amount of real (physical) memory and the +memory available to user programs +in bytes. +For example, if your machine has 16 megabytes of memory, then +\fBxxx\fP will be 16777216. +.PP +The messages that come out next show what devices were found on +the current processor. These messages are described in +.Xr autoconf (4). +The distributed system may not have +found all the communications devices you have +or all the mass storage peripherals you have, especially +if you have more than +two of anything. You will correct this when you create +a description of your machine from which to configure a site-dependent +version of UNIX. +The messages printed at boot here contain much of the information +that will be used in creating the configuration.
+In a correctly configured system most of the information +present in the configuration description +is printed out at boot time as the system verifies that each device +is present. +.PP +The \*(lqroot device?\*(rq prompt was printed by the system +to ask you for the name of the root filesystem to use. +This happens because the distribution system is a \fIgeneric\fP +system, i.e., it can be bootstrapped on a cpu with its root device +and paging area on any available disk drive. +You will most likely respond to the root device question with ``sd0'' +if you are booting from a SCSI disk, +or with ``rd0'' if you are booting from an HP-IB disk. +This response shows that the disk it is running +on is drive 0 of type ``sd'' or ``rd'' respectively. +If you have other disks attached to the system, +it is possible that the drive you are using will not be configured +as logical drive 0. +Check the autoconfiguration messages printed out by the kernel to +make sure. +These messages will show the type of every logical drive +and their associated controller and slave addresses. +You will later build a system tailored to your configuration +that will not prompt you for a root device when it is bootstrapped. +.DS +\fBroot device?\fP \fI\*(Dk0\fP +\fBWARNING: preposterous time in filesystem \-\- CHECK AND RESET THE DATE!\fP +\fBerase ^?, kill ^U, intr ^C\fP +\fB#\fP +.DE +.PP +The \*(lqerase ...\*(rq message is part of the +.Pn /.profile +that was executed by the root shell when it started. This message +tells you about the settings of the character erase, +line erase, and interrupt characters. +.PP +UNIX is now running, +and the \fIUNIX Programmer's Manual\fP applies. The ``#'' is the prompt +from the Bourne shell, and lets you know that you are the super-user, +whose login name is \*(lqroot\*(rq. +.PP +At this point, the root filesystem is mounted read-only. 
+Before continuing the installation, the filesystem needs to be ``updated'' +to allow writing, and device special files for the following steps need +to be created. +This is done as follows: +.DS +.TS +lw(2i) l. +\fB#\fP \fImount_mfs -s 1000 -T type /dev/null /tmp\fP (create a writable filesystem) +(\fItype\fP is the disk type as determined from /etc/disktab) +\fB#\fP \fIcd /tmp\fP (connect to that directory) +\fB#\fP \fI../dev/MAKEDEV \*(Dk#\fP (create special files for root disk) +(\fI\*(Dk\fP is the disk type, \fI#\fP is the unit number) +(ignore warning from ``sh'') +\fB#\fP \fImount \-uw /tmp/\*(Dk#a /\fP (read-write mount root filesystem) +\fB#\fP \fIcd /dev\fP (go to device directory) +\fB#\fP \fI./MAKEDEV \*(Dk#\fP (create permanent special files for root disk) +(again, ignore warning from ``sh'') +.TE +.DE +.Sh 4 "Step 4: (optional) restoring the root filesystem" +.PP +The root filesystem that you are currently running on is complete; +however, it probably is not optimally laid out for the disk on +which you are running. +If you will be cloning copies of the system onto multiple disks for +other machines, you are advised to connect one of these disks to +this machine, and build and restore a properly laid out root filesystem +onto it. +If this is the only machine on which you will be running \*(4B +or peak performance is not an issue, you can skip this step and +proceed directly to step 5. +.PP +Connect a second disk to your machine. +If you bootstrapped using the two disk method, you can +overwrite your initial HP-UX disk, as it will no longer +be needed (assuming you have no plans to run HP-UX again). +.PP +To create the root filesystem on drive 1, +you should first label the disk as described in step 5 below.
+Then run the following commands: +.DS +\fB#\fP \fIcd /dev\fP +\fB#\fP \fI./MAKEDEV \*(Dk1a\fP +\fB#\fP\|\fInewfs /dev/r\*(Dk1a\fP +\fB#\fP\|\fImount /dev/\*(Dk1a /mnt\fP +\fB#\fP\|\fIcd /mnt\fP +\fB#\fP\|\fIdump 0f \- /dev/r\*(Dk0a | restore xf \-\fP +(Note: restore will ask if you want to ``set owner/mode for '.''' +to which you should reply ``yes''.) +.DE +.PP +When this completes, +you should then shut down the system, and boot on the disk that +you just created following the procedure in step (3) above. +.Sh 4 "Step 5: placing labels on the disks" +.PP +For each disk on the HP300, \*(4B places information about the geometry +of the drive and the partition layout at byte offset 1024. +This information is written with +.Xr disklabel (8). +.PP +The root image just loaded includes a ``generic'' label intended to allow +easy installation of the root and +.Pn /usr +and may not be suitable for the actual +disk on which it was installed. +In particular, +it may make your disk appear larger or smaller than its real size. +In the former case, you lose some capacity. +In the latter, some of the partitions may map non-existent sectors +leading to errors if those partitions are used. +It is also possible that the defined geometry will interact poorly with +the filesystem code resulting in reduced performance. +However, as long as you are willing to give up a little space, +not use certain partitions or suffer minor performance degradation, +you might want to avoid this step; +especially if you do not know how to use +.Xr ed (1). +.PP +If you choose to edit this label, +you can fill in correct geometry information from +.Pn /etc/disktab . +You may also want to rework the ``e'' and ``f'' partitions used for loading +.Pn /usr +and +.Pn /var . +You should not attempt to, and +.Xr disklabel +will not let you, modify the ``a'', ``b'' and ``d'' partitions. 
+To edit a label: +.DS +\fB#\fP \fIEDITOR=ed\fP +\fB#\fP \fIexport EDITOR\fP +\fB#\fP \fIdisklabel -r -e /dev/r\fBXX#\fPd +.DE +where \fBXX\fP is the type and \fB#\fP is the logical drive number; e.g. +.Pn /dev/rsd0d +or +.Pn /dev/rrd0d . +Note the explicit use of the ``d'' partition. +This partition includes the bootblock as does ``c'' +and using it allows you to change the size of ``c''. +.PP +If you wish to label any additional disks, run the following command for each: +.DS +\fB#\|\fP\fIdisklabel -rw \fBXX# type\fP \fI"optional_pack_name"\fP +.DE +where \fBXX#\fP is the same as in the previous command +and \fBtype\fP is the HP300 disk device name as listed in +.Pn /etc/disktab . +The optional information may contain any descriptive name for the +contents of a disk, and may be up to 16 characters long. This procedure +will place the label on the disk using the information found in +.Pn /etc/disktab +for the disk type named. +If you have changed the disk partition sizes, +you may wish to add entries for the modified configuration in +.Pn /etc/disktab +before labeling the affected disks. +.PP +You have now completed the HP300 specific part of the installation. +Now proceed to the generic part of the installation +described starting in section 2.5 below. +Note that where the disk name ``sd'' is used throughout section 2.5, +you should substitute the name ``rd'' if you are running on an HP-IB disk. +Also, if you are loading on a single disk with the default disklabel, +.Pn /var +should be restored to the ``f'' partition and +.Pn /usr +to the ``e'' partition. +.Sh 2 "Booting the SPARC" +.Sh 3 "Supported hardware" +.LP +The hardware supported by \*(4B for the SPARC is as follows: +.TS +center box; +lw(1i) lw(4i). +CPU's T{ +SPARCstation 1 series (1, 1+, SLC, IPC) and +SPARCstation 2 series (2, IPX). +T} +_ +DISK's T{ +SCSI. +T} +_ +TAPE's T{ +none. +T} +_ +NETWORK T{ +SPARCstation Lance (le). +T} +_ +GRAPHICS T{ +bwtwo and cgthree. 
+T} +_ +INPUT T{ +Keyboard and mouse. +T} +_ +MISC T{ +Battery-backed real time clock, +built-in serial devices, +Sbus SCSI controller, +and audio device. +T} +.TE +.LP +Major items that are not supported include +anything VME-based, +the GX (cgsix) display, +the floppy disk, and SCSI tapes. +.Sh 3 "Limitations" +.LP +There are several important limitations on the \*(4B distribution +for the SPARC: +.IP 1) +You +.B must +have SunOS 4.1.x or Solaris to bring up \*(4B. +There is no SPARCstation bootstrap code in this distribution. The +Sun-supplied boot loader will be used to boot \*(4B; you must copy +this from your SunOS distribution. This imposes several +restrictions on the system, as detailed below. +.IP 2) +The \*(4B SPARC kernel does not remap SCSI IDs. A SCSI disk at +target 0 will become ``sd0'', where in SunOS the same disk will +normally be called ``sd3''. If your existing SunOS system is +diskful, it will be least painful to have SunOS running on the disk +on target 0 lun 0 and put \*(4B on the disk on target 3 lun 0. Both +systems will then think they are running on ``sd0'', and you can +boot either system as needed simply by changing the EEPROM's boot +device. +.IP 3) +There is no SCSI tape driver. +You must have another system for tape reading and backups. +.IP 4) +Although the \*(4B SPARC kernel will handle existing SunOS shared +libraries, it does not use or create them itself, and therefore +requires much more disk space than SunOS does. +.IP 5) +It is currently difficult (though not completely impossible) to +run \*(4B diskless. These instructions assume you will have a local +boot, swap, and root filesystem. +.IP 6) +When using a serial port rather than a graphics display as the console, +only port +.Pn ttya +can be used. +Attempts to use port +.Pn ttyb +will fail when the kernel tries +to print the boot up messages to the console. +.Sh 3 "The procedure" +.PP +You must have a spare disk on which to place \*(4B. 
+The steps involved in bootstrapping this tape are as follows: +.IP 1) +Bring up SunOS (preferably SunOS 4.1.x or Solaris 1.x, although +Solaris 2 may work \(em this is untested). +.IP 2) +Attach auxiliary SCSI disk(s). Format and label using the +SunOS formatting and labeling programs as needed. +Note that the root filesystem currently requires at least 10 MB; 16 MB +or more is recommended. The b partition will be used for swap; +this should be at least 32 MB. +.IP 3) +Use the SunOS +.Xr newfs +to build the root filesystem. You may also +want to build other filesystems at the same time. (By default, the +\*(4B +.Xr newfs +builds a filesystem that SunOS will not handle; if you +plan to switch OSes back and forth you may want to sacrifice the +performance gain from the new filesystem format for compatibility.) +You can build an old-format filesystem on \*(4B by giving the \-O +option to +.Xr newfs (8). +.Xr Fsck (8) +can convert old format filesystems to new format +filesystems, but not vice versa, +so you may want to initially build old format filesystems so that they +can be mounted under SunOS, +and then later convert them to new format filesystems when you are +satisfied that \*(4B is running properly. +In any case, +.B +you must build an old-style root filesystem +.R +so that the SunOS boot program will work. +.IP 4) +Mount the new root, then copy the SunOS +.Pn /boot +into place and use the SunOS ``installboot'' program +to enable disk-based booting. +Note that the filesystem must be mounted when you do the ``installboot'': +.DS +.ft CW +# mount /dev/sd3a /mnt +# cp /boot /mnt/boot +# cd /usr/kvm/mdec +# installboot /mnt/boot bootsd /dev/rsd3a +.DE +The SunOS +.Pn /boot +will load \*(4B kernels; there is no SPARCstation +bootstrap code on the distribution. Note that the SunOS +.Pn /boot +does not handle the new \*(4B filesystem format. +.IP 5) +Restore the contents of the \*(4B root filesystem. 
+.DS +.ft CW +# cd /mnt +# rrestore xf tapehost:/dev/nrst0 +.DE +.IP 6) +Boot the supplied kernel: +.DS +.ft CW +# halt +ok boot sd(0,3)kernel -s [for old proms] OR +ok boot disk3 -s [for new proms] +\&... [\*(4B boot messages] +.DE +.LP +To install the remaining filesystems, use the procedure described +starting in section 2.5. +In these instructions, +.Pn /usr +should be loaded into the ``e'' partition and +.Pn /var +in the ``f'' partition. +.LP +After completing the filesystem installation you may want +to set up \*(4B to reboot automatically: +.DS +.ft CW +# halt +ok setenv boot-from sd(0,3)kernel [for old proms] OR +ok setenv boot-device disk3 [for new proms] +.DE +If you build backwards-compatible filesystems, either with the SunOS +newfs or with the \*(4B ``\-O'' option, you can mount these under +SunOS. The SunOS fsck will, however, always think that these filesystems +are corrupted, as there are several new (previously unused) +superblock fields that are updated in \*(4B. Running ``fsck \-b32'' +and letting it ``fix'' the superblock will take care of this. +.sp 0.5 +If you wish to run SunOS binaries that use SunOS shared libraries, you +simply need to copy all the dynamic linker files from an existing +SunOS system: +.DS +.ft CW +# rcp sunos-host:/etc/ld.so.cache /etc/ +# rcp sunos-host:'/usr/lib/*.so*' /usr/lib/ +.DE +The SunOS compiler and linker should be able to produce SunOS binaries +under \*(4B, but this has not been tested. If you plan to try it you +will need the appropriate .sa files as well. +.Sh 2 "Booting the DECstation" +.Sh 3 "Supported hardware" +.LP +The hardware supported by \*(4B for the DECstation is as follows: +.TS +center box; +lw(1i) lw(4i). +CPU's T{ +R2000 based (3100) and +R3000 based (5000/200, 5000/20, 5000/25, 5000/1xx). +T} +_ +DISK's T{ +SCSI-I (tested RZ23, RZ55, RZ57, Maxtor 8760S). +T} +_ +TAPE's T{ +SCSI-I (tested DEC TK50, Archive DAT, Emulex MT02). +T} +_ +RS232 T{ +Internal DEC dc7085 and AMD 8530 based interfaces. 
+T} +_ +NETWORK T{ +TURBOchannel PMAD-AA and internal LANCE based interfaces. +T} +_ +GRAPHICS T{ +Terminal emulation and raw frame buffer support for +3100 (color & monochrome), +TURBOchannel PMAG-AA, PMAG-BA, PMAG-DV. +T} +_ +INPUT T{ +Standard DEC keyboard (LK201) and mouse. +T} +_ +MISC T{ +Battery-backed real time clock, +internal and TURBOchannel PMAZ-AA SCSI interfaces. +T} +.TE +.LP +Major items that are not supported include the 5000/240 +(there is code but not compiled in or tested), +R4000 based machines, FDDI and audio interfaces. +Diskless machines are not supported but booting kernels and bootstrapping +over the network is supported on the 5000 series. +.Sh 3 "The procedure" +.PP +The first file on the distribution tape is a tar file that contains +four files. +The first step requires a running UNIX (or ULTRIX) system that can +be used to extract the tar archive from the first file on the tape. +The command: +.DS +.ft CW +tar xf /dev/rmt0 +.DE +will extract the following four files: +.DS +A) root.image: \fIdd\fP image of the root filesystem +B) kernel.tape: \fIdd\fP image for creating boot tapes +C) kernel.net: file for booting over the network +D) root.dump: \fIdump\fP image of the root filesystem +.DE +There are three basic ways a system can be bootstrapped corresponding to the +first three files. +You may want to read the section on bootstrapping the HP300 +since many of the steps are similar. +A spare, formatted SCSI disk is also useful. +.Sh 4 "Procedure A: copy root filesystem to disk" +.PP +This procedure is similar to the HP300. +If you have an extra disk, the easiest approach is to use \fIdd\fP\|(1) +under ULTRIX to copy the root filesystem image to the beginning +of the spare disk. +The root filesystem image includes a disklabel and bootblock along with the +root filesystem. 
+An example command to copy the image to the beginning of a disk is: +.DS +.ft CW +dd if=root.image of=/dev/rz1c bs=\*(Bzb +.DE +The actual special file syntax will vary depending on unit numbers and +the version of ULTRIX that is running. +This system is now ready to boot. You can boot the kernel with one of the +following PROM commands. If you are booting on a 3100, the disk must be SCSI +id zero because of a bug. +.DS +.ft CW +DEC 3100: boot \-f rz(0,0,0)kernel +DEC 5000: boot 5/rz0/kernel +.DE +You can then proceed to section 2.5 +to create reasonable disk partitions for your machine +and then install the rest of the system. +.Sh 4 "Procedure B: bootstrap from tape" +.PP +If you have only a single machine with a single disk, +you need to use the more difficult approach of booting a +kernel and mini-root from tape or the network, and using it to restore +the root filesystem. +.PP +First, you will need to create a boot tape. This can be done using +\fIdd\fP as in the following example. +.DS +.ft CW +dd if=kernel.tape of=/dev/nrmt0 bs=1b +dd if=root.dump of=/dev/nrmt0 bs=\*(Bzb +.DE +The actual special file syntax for the tape drive will vary depending on +unit numbers, tape device and the version of ULTRIX that is running. +.PP +The first file on the boot tape contains a boot header, kernel, and +mini-root filesystem that the PROM can copy into memory. +Installing from tape has only been tested +on a 3100 and a 5000/200 using a TK50 tape drive. Here are two example +PROM commands to boot from tape. +.DS +.ft CW +DEC 3100: boot \-f tz(0,5,0) m # 5 is the SCSI id of the TK50 +DEC 5000: boot 5/tz6 m # 6 is the SCSI id of the TK50 +.DE +The `m' argument tells the kernel to look for a root filesystem in memory. +Next you should proceed to section 2.4.3 to build a disk-based root filesystem. 
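As a sketch of the tape-writing step in Procedure B, the two dd commands can be produced by a small dry-run helper. The function name boot_tape_cmds and its arguments are hypothetical and not part of the distribution. The block count is the value the text writes as \*(Bz; it is not stated in this section, so it must be passed explicitly. The no-rewind device matters here: it leaves the tape positioned after the first file, so root.dump is written as the second tape file, which is where the later restore (file 2) looks for it.

```shell
# Print (do not run) the two dd commands that write a boot tape.
# $1: no-rewind tape device (defaults to /dev/nrmt0, the device used in the example)
# $2: block count for the root dump, i.e. the value the text writes as \*(Bz
boot_tape_cmds() {
    tape=${1:-/dev/nrmt0}
    bs=$2
    # Refuse to guess the block count; it is site/distribution specific.
    [ -n "$bs" ] || { echo "usage: boot_tape_cmds tape blocks" >&2; return 1; }
    # First tape file: boot header, kernel and mini-root.
    echo "dd if=kernel.tape of=$tape bs=1b"
    # Second tape file: dump image of the root filesystem.
    echo "dd if=root.dump of=$tape bs=${bs}b"
}
```

Reviewing the printed commands before running them is a cheap guard against writing to a rewinding device by mistake, which would overwrite the first tape file with the second.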
+.Sh 4 "Procedure C: bootstrap over the network" +.PP +You will need a host machine that is running the \fIbootp\fP server +with the +.Pn kernel.net +file installed in the default directory defined by the +configuration file for +.Xr bootp . +Here are two example PROM commands to boot across the net: +.DS +.ft CW +DEC 3100: boot \-f tftp()kernel.net m +DEC 5000: boot 6/tftp/kernel.net m +.DE +This command should load the kernel and mini-root into memory and +run the same as the tape install (procedure B). +The rest of the steps are the same, except that +you will need to start the network +(if you are unsure how to fill in the <name> fields below, +see sections 4.4 and 5). +Execute the following to start the networking: +.DS +.ft CW +# mount \-uw / +# echo 127.0.0.1 localhost >> /etc/hosts +# echo <your.host.inet.number> myname.my.domain myname >> /etc/hosts +# echo <friend.host.inet.number> myfriend.my.domain myfriend >> /etc/hosts +# ifconfig le0 inet myname +.DE +Next you should proceed to section 2.4.3 to build a disk-based root filesystem. +.Sh 3 "Label disk and create the root filesystem" +.LP +There are four steps to create a disk-based root filesystem. +.IP 1) +Label the disk. +.DS +.ft CW +# disklabel -W /dev/rrz?c # This enables writing the label +# disklabel -w -r -B /dev/rrz?c $DISKTYPE +# newfs /dev/rrz?a +\&... +# fsck /dev/rrz?a +\&... +.DE +Supported disk types are listed in +.Pn /etc/disktab . +.IP 2) +Restore the root filesystem. +.DS +.ft CW +# mount \-uw / +# mount /dev/rz?a /a +# cd /a +.DE +.ti +0.4i +If you are restoring locally (procedure B), run: +.DS +.ft CW +# mt \-f /dev/nrmt0 rew +# restore \-xsf 2 /dev/rmt0 +.DE +.ti +0.4i +If you are restoring across the net (procedure C), run: +.DS +.ft CW +# rrestore xf myfriend:/path/to/root.dump +.DE +.ti +0.4i +When the restore finishes, clean up with: +.DS +.ft CW +# cd / +# sync +# umount /a +# fsck /dev/rz?a +.DE +.IP 3) +Reset the system and initialize the PROM monitor to boot automatically.
+.DS +.ft CW +DEC 3100: setenv bootpath boot \-f rz(0,?,0)kernel +DEC 5000: setenv bootpath 5/rz?/kernel -a +.DE +.IP 4) +After booting UNIX, you will need to create +.Pn /dev/mouse +to run the X Window System, as in the following example. +.DS +.ft CW +rm /dev/mouse +ln /dev/xx /dev/mouse +.DE +The 'xx' should be one of the following: +.DS +pm0 raw interface to PMAX graphics devices +cfb0 raw interface to TURBOchannel PMAG-BA color frame buffer +xcfb0 raw interface to maxine graphics devices +mfb0 raw interface to mono graphics devices +.DE +You can then proceed to section 2.5 to install the rest of the system. +Note that where the disk name ``sd'' is used throughout section 2.5, +you should substitute the name ``rz''. +.Sh 2 "Disk configuration" +.PP +All architectures now have a root filesystem up and running; +from this point you lay out filesystems to make use +of the available space and to balance disk load for better system +performance. +.Sh 3 "Disk naming and divisions" +.PP +Each physical disk drive can be divided into up to 8 partitions; +UNIX typically uses only 3 or 4 partitions. +For instance, the first partition, \*(Dk0a, +is used for a root filesystem, a backup thereof, +or a small filesystem such as +.Pn /var/tmp ; +the second partition, \*(Dk0b, +is used for paging and swapping; and +a third partition, typically \*(Dk0e, +holds a user filesystem. +.PP +The space available on a disk varies per device. +Each disk typically has a paging area of 30 to 100 megabytes +and a root filesystem of about 17 megabytes. +.\" XXX check +The distributed system binaries occupy about 150 (180 with X11R5) megabytes +.\" XXX check +while the major sources occupy another 250 (340 with X11R5) megabytes. +The +.Pn /var +filesystem as delivered on the tape is only 2Mb; +however, it should have at least 50Mb allocated to it just for +normal system activity.
+Usually it is allocated the last partition on the disk +so that it can provide as much space as possible to the +.Pn /var/users +filesystem. +See section 2.5.4 for further details on disk layouts. +.PP +Be aware that the disks have their sizes +measured in disk sectors (usually 512 bytes), while the UNIX filesystem +blocks are variable sized. +If +.Sm BLOCKSIZE=1k +is set in the user's environment, all user programs report +disk space in kilobytes, otherwise, +disk sizes are always reported in units of 512-byte sectors\**. +.FS +You can thank System V intransigence and POSIX duplicity for +requiring that 512-byte blocks be the units that programs report. +.FE +The +.Pn /etc/disktab +file used in labelling disks and making filesystems +specifies disk partition sizes in sectors. +.Sh 3 "Layout considerations" +.PP +There are several considerations in deciding how +to adjust the arrangement of things on your disks. +The most important is making sure that there is adequate space +for what is required; secondarily, throughput should be maximized. +Paging space is an important parameter. +The system, as distributed, sizes the configured +paging areas each time the system is booted. Further, +multiple paging areas of different sizes may be interleaved. +.PP +Many common system programs (C, the editor, the assembler etc.) +create intermediate files in the +.Pn /tmp +directory, so the filesystem where this is stored also should be made +large enough to accommodate most high-water marks. +Typically, +.Pn /tmp +is constructed from a memory-based filesystem (see +.Xr mount_mfs (8)). +Programs that want their temporary files to persist +across system reboots (such as editors) should use +.Pn /var/tmp . +If you plan to use a disk-based +.Pn /tmp +filesystem to avoid loss across system reboots, it makes +sense to mount this in a ``root'' (i.e. first partition) +filesystem on another disk. 
+All the programs that create files in +.Pn /tmp +take care to delete them, but are not immune to rare events +and can leave dregs. +The directory should be examined every so often and the old +files deleted. +.PP +The efficiency with which UNIX is able to use the CPU +is often strongly affected by the configuration of disk controllers; +it is critical for good performance to balance disk load. +There are at least five components of the disk load that you can +divide between the available disks: +.IP 1) +The root filesystem. +.IP 2) +The +.Pn /var +and +.Pn /var/tmp +filesystems. +.IP 3) +The +.Pn /usr +filesystem. +.IP 4) +The user filesystems. +.IP 5) +The paging activity. +.LP +The following possibilities are ones we have used at times +when we had 2, 3 and 4 disks: +.TS +center doublebox; +l | c s s +l | lw(5) | lw(5) | lw(5). + disks +what 2 3 4 +_ +root 0 0 0 +var 1 2 3 +usr 1 1 1 +paging 0+1 0+2 0+2+3 +users 0 0+2 0+2 +archive x x 3 +.TE +.PP +The most important things to consider are to +even out the disk load as much as possible, and to do this by +decoupling filesystems (on separate arms) between which heavy copying occurs. +Note that a long term average balanced load is not important; it is +much more important to have an instantaneously balanced +load when the system is busy. +.PP +Intelligent experimentation with a few filesystem arrangements can +pay off in much improved performance. It is particularly easy to +move the root, the +.Pn /var +and +.Pn /var/tmp +filesystems and the paging areas. Place the +user files and the +.Pn /usr +directory as space needs dictate and experiment +with the other, more easily moved filesystems. +.Sh 3 "Filesystem parameters" +.PP +Each filesystem is parameterized according to its block size, +fragment size, and the disk geometry characteristics of the +medium on which it resides. 
Inaccurate specification of the disk +characteristics or haphazard choice of the filesystem parameters +can result in substantial throughput degradation or significant +waste of disk space. As distributed, +filesystems are configured according to the following table. +.DS +.TS +center; +l l l. +Filesystem Block size Fragment size +_ +root 8 kbytes 1 kbytes +usr 8 kbytes 1 kbytes +users 4 kbytes 512 bytes +.TE +.DE +.PP +The root filesystem block size is +made large to optimize bandwidth to the associated disk. +The large block size is important as many of the most +heavily used programs are demand paged out of the +.Pn /bin +directory. +The fragment size of 1 kbyte is a ``nominal'' value to use +with a filesystem. With a 1 kbyte fragment size +disk space utilization is about the same +as with the earlier versions of the filesystem. +.PP +The filesystems for users have a 4 kbyte block +size with 512 byte fragment size. These parameters +have been selected based on observations of the +performance of our user filesystems. The 4 kbyte +block size provides adequate bandwidth while the +512 byte fragment size provides acceptable space compaction +and disk fragmentation. +.PP +Other parameters may be chosen in constructing filesystems, +but the factors involved in choosing a block +size and fragment size are many and interact in complex +ways. Larger block sizes result in better +throughput to large files in the filesystem as +larger I/O requests will then be done by the +system. However, +consideration must be given to the average file sizes +found in the filesystem and the performance of the +internal system buffer cache. The system +currently provides space in the inode for +12 direct block pointers, 1 single indirect block +pointer, 1 double indirect block pointer, +and 1 triple indirect block pointer. +If a file uses only direct blocks, access time to +it will be optimized by maximizing the block size. 
+If a file spills over into an indirect block, +increasing the block size of the filesystem may +decrease the amount of space used +by eliminating the need to allocate an indirect block. +However, if the block size is increased and an indirect +block is still required, then more disk space will be +used by the file because indirect blocks are allocated +according to the block size of the filesystem. +.PP +In selecting a fragment size for a filesystem, at least +two considerations should be weighed. The major performance +tradeoffs observed are between an 8 kbyte block filesystem +and a 4 kbyte block filesystem. Because of implementation +constraints, the block size versus fragment size ratio cannot +be greater than 8. This means that an 8 kbyte filesystem +will always have a fragment size of at least 1 kbyte. If +a filesystem is created with a 4 kbyte block size and a +1 kbyte fragment size, then upgraded to an 8 kbyte block size +and 1 kbyte fragment size, identical space compaction will be +observed. However, if a filesystem has a 4 kbyte block size +and 512 byte fragment size, converting it to an 8K/1K +filesystem will result in 4-8% more space being +used. This implies that 4 kbyte block filesystems that +might be upgraded to 8 kbyte blocks for higher performance should +use fragment sizes of at least 1 kbyte to minimize the amount +of work required in conversion. +.PP +A second, more important, consideration when selecting the +fragment size for a filesystem is the level of fragmentation +on the disk. With eight fragments per block, storage fragmentation +occurs much sooner, particularly with a busy filesystem running +near full capacity. By comparison, the level of fragmentation in a +filesystem with four fragments per block is one tenth as severe. This +means that on filesystems where many files are created and +deleted, the 512 byte fragment size is more likely to result in apparent +space exhaustion because of fragmentation.
That is, when the filesystem +is nearly full, file expansion that requires locating a +contiguous area of disk space is more likely to fail on a 512 +byte filesystem than on a 1 kbyte filesystem. To minimize +fragmentation problems of this sort, a parameter in the super +block specifies a minimum acceptable free space threshold. When +normal users (i.e. anyone but the super-user) attempt to allocate +disk space and the free space threshold is exceeded, the user is +returned an error as if the filesystem were really full. This +parameter is nominally set to 5%; it may be changed by supplying +a parameter to +.Xr newfs (8), +or by updating the super block of an existing filesystem using +.Xr tunefs (8). +.PP +Finally, a third, less common consideration is the attributes of +the disk itself. The fragment size should not be smaller than the +physical sector size of the disk. As an example, the HP magneto-optical +disks have 1024 byte physical sectors. Using a 512 byte fragment size +on such disks will work but is extremely inefficient. +.PP +Note that the above discussion considers block sizes of up to only 8k. +As of the 4.4 release, the maximum block size has been increased to 64k. +This allows an entirely new set of block/fragment combinations for which +there is little experience to date. +In general though, unless a filesystem is to be used +for a special purpose application (for example, storing +image processing data), we recommend using the +values supplied above. +Remember that the current +implementation limits the block size to at most 64 kbytes +and the ratio of block size versus fragment size must be 1, 2, 4, or 8. +.PP +The disk geometry information used by the filesystem +affects the block layout policies employed. The file +.Pn /etc/disktab , +as supplied, contains the data for most +all drives supported by the system. 
Before constructing +a filesystem with +.Xr newfs (8) +you should label the disk (if it has not yet been labeled, +and the driver supports labels). +If labels cannot be used, you must instead +specify the type of disk on which the filesystem resides; +.Xr newfs +then reads +.Pn /etc/disktab +instead of the pack label. +This file also contains the default +filesystem partition +sizes, and default block and fragment sizes. To +override any of the default values you can modify the file, +edit the disk label, +or use an option to +.Xr newfs . +.Sh 3 "Implementing a layout" +.PP +To put a chosen disk layout into effect, you should use the +.Xr newfs (8) +command to create each new filesystem. +Each filesystem must also be added to the file +.Pn /etc/fstab +so that it will be checked and mounted when the system is bootstrapped. +.PP +First we will consider a system with a single disk. +There is little real choice on how to do the layout; +the root filesystem goes in the ``a'' partition, +.Pn /usr +goes in the ``e'' partition, and +.Pn /var +fills out the remainder of the disk in the ``f'' partition. +This is the organization used if you loaded the disk-image root filesystem. +With the addition of a memory-based +.Pn /tmp +filesystem, the fstab entries would be as follows: +.TS +center; +lfC lfC l l n n. +/dev/\*(Dk0a / ufs rw 1 1 +/dev/\*(Dk0b none swap sw 0 0 +/dev/\*(Dk0b /tmp mfs rw,-s=14000,-b=8192,-f=1024,-T=sd660 0 0 +/dev/\*(Dk0e /usr ufs ro 1 2 +/dev/\*(Dk0f /var ufs rw 1 2 +.TE +.PP +If we had a second disk, we would split the load between the drives. +On the second disk, we place the +.Pn /usr +and +.Pn /var +filesystems in their usual \*(Dk1e and \*(Dk1f +partitions respectively. +The \*(Dk1b partition would be used as a second paging area, +and the \*(Dk1a partition left as a spare root filesystem +(alternatively \*(Dk1a could be used for +.Pn /var/tmp ). +The first disk still holds +the root filesystem in \*(Dk0a, and the primary swap area in \*(Dk0b.
+The \*(Dk0e partition is used to hold home directories in +.Pn /var/users . +The \*(Dk0f partition can be used for +.Pn /usr/src +or, alternatively, the \*(Dk0e partition can be extended to cover +the rest of the disk with +.Xr disklabel (8). +As before, the +.Pn /tmp +directory is a memory-based filesystem. +Note that to interleave the paging between the two disks +you must build a system configuration that specifies: +.DS +config kernel root on \*(Dk0 swap on \*(Dk0 and \*(Dk1 +.DE +The +.Pn /etc/fstab +file would then contain: +.TS +center; +lfC lfC l l n n. +/dev/\*(Dk0a / ufs rw 1 1 +/dev/\*(Dk0b none swap sw 0 0 +/dev/\*(Dk1b none swap sw 0 0 +/dev/\*(Dk0b /tmp mfs rw,-s=14000,-b=8192,-f=1024,-T=sd660 0 0 +/dev/\*(Dk1e /usr ufs ro 1 2 +/dev/\*(Dk0f /usr/src ufs rw 1 2 +/dev/\*(Dk1f /var ufs rw 1 2 +/dev/\*(Dk0e /var/users ufs rw 1 2 +.TE +.PP +To make the +.Pn /var +filesystem we would do: +.DS +\fB#\fP \fIcd /dev\fP +\fB#\fP \fIMAKEDEV \*(Dk1\fP +\fB#\fP \fIdisklabel -wr \*(Dk1 "disk type" "disk name"\fP +\fB#\fP \fInewfs \*(Dk1f\fP +(information about filesystem prints out) +\fB#\fP \fImkdir /var\fP +\fB#\fP \fImount /dev/\*(Dk1f /var\fP +.DE +.Sh 2 "Installing the rest of the system" +.PP +At this point you should have your disks partitioned. +The next step is to extract the rest of the data from the tape. +At a minimum you need to set up the +.Pn /var +and +.Pn /usr +filesystems. +You may also want to extract some or all of the program sources. +Since some architectures do not support tape drives, or do not support the +correct ones, you may need to extract the files indirectly using +.Xr rsh (1).
+For example, for a directly connected tape drive you might do: +.DS +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP +\fB#\fP \fItar xbpf \*(Bz /dev/nr\*(Mt0\fP +.DE +The equivalent indirect procedure (where the tape drive is on machine ``foo'') +is: +.DS +\fB#\fP \fIrsh foo mt -f /dev/nr\*(Mt0 fsf\fP +\fB#\fP \fIrsh foo dd if=/dev/nr\*(Mt0 bs=\*(Bzb | tar xbpf \*(Bz -\fP +.DE +Obviously, the target machine must be connected to the local network +for this to work. +To do this: +.DS +\fB#\fP \fIecho 127.0.0.1 localhost >> /etc/hosts\fP +\fB#\fP \fIecho \fPyour.host.inet.number myname.my.domain myname\fI >> /etc/hosts\fP +\fB#\fP \fIecho \fPfriend.host.inet.number myfriend.my.domain myfriend\fI >> /etc/hosts\fP +\fB#\fP \fIifconfig le0 inet \fPmyname +.DE +where the ``host.inet.number'' fields are the IP addresses for your host and +the host with the tape drive +and the ``my.domain'' fields are the names of your machine and the tape-hosting +machine. +See sections 4.4 and 5 for more information on setting up the network. +.PP +Assuming a directly connected tape drive, here is how to extract and +install +.Pn /var +and +.Pn /usr : +.br +.ne 5 +.TS +lw(2i) l. +\fB#\fP \fImount \-uw /dev/\*(Dk#a /\fP (read-write mount root filesystem) +\fB#\fP \fIdate yymmddhhmm\fP (set date, see \fIdate\fP\|(1)) +\&.... 
+\fB#\fP \fIpasswd -l root\fP (set password for super-user) +\fBNew password:\fP (password will not echo) +\fBRetype new password:\fP +\fB#\fP \fIpasswd -l toor\fP (set password for super-user) +\fBNew password:\fP (password will not echo) +\fBRetype new password:\fP +\fB#\fP \fIhostname mysitename\fP (set your hostname) +\fB#\fP \fInewfs r\*(Dk#p\fP (create empty user filesystem) +(\fI\*(Dk\fP is the disk type, \fI#\fP is the unit number, +\fIp\fP is the partition; this takes a few minutes) +\fB#\fP \fImount /dev/\*(Dk#p /var\fP (mount the var filesystem) +\fB#\fP \fIcd /var\fP (make /var the current directory) +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +\fB#\fP \fItar xbpf \*(Bz /dev/nr\*(Mt0\fP (extract all of var) +(this takes a few minutes) +\fB#\fP \fInewfs r\*(Dk#p\fP (create empty user filesystem) +(as before \fI\*(Dk\fP is the disk type, \fI#\fP is the unit number, +\fIp\fP is the partition) +\fB#\fP \fImount /dev/\*(Dk#p /mnt\fP (mount the new /usr in temporary location) +\fB#\fP \fIcd /mnt\fP (make /mnt the current directory) +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +\fB#\fP \fItar xbpf \*(Bz /dev/nr\*(Mt0\fP (extract all of usr except usr/src) +(this takes about 15-20 minutes) +\fB#\fP \fIcd /\fP (make / the current directory) +\fB#\fP \fIumount /mnt\fP (unmount from temporary mount point) +\fB#\fP \fIrm -r /usr/*\fP (remove excess bootstrap binaries) +\fB#\fP \fImount /dev/\*(Dk#p /usr\fP (remount /usr) +.TE +If no disk label has been installed on the disk, the +.Xr newfs +command will require a third argument to specify the disk type, +using one of the names in +.Pn /etc/disktab . +If the tape had been rewound or positioned incorrectly before the +.Xr tar , +to extract +.Pn /var +it may be repositioned by the following commands. +.DS +\fB#\fP \fImt -f /dev/nr\*(Mt0 rew\fP +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf 1\fP +.DE +The data on the second and third tape files has now been extracted. 
+If you are using 6250bpi tapes, the first reel of the +distribution is no longer needed; you should now mount the second +reel instead. The installation procedure continues from this +point on the 8mm tape. +The next step is to extract the sources. +As previously noted, +.Pn /usr/src +.\" XXX Check +requires about 250-340Mb of space. +Ideally sources should be in a separate filesystem; +if you plan to put them into your +.Pn /usr +filesystem, it will need at least 500Mb of space. +Assuming that you will be using a separate filesystem on \*(Dk0f for +.Pn /usr/src , +you will start by creating and mounting it: +.DS +\fB#\fP \fInewfs \*(Dk0f\fP +(information about filesystem prints out) +\fB#\fP \fImkdir /usr/src\fP +\fB#\fP \fImount /dev/\*(Dk0f /usr/src\fP +.DE +.LP +First you will extract the kernel source: +.DS +.TS +lw(2i) l. +\fB#\fP \fIcd /usr/src\fP +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +(this should only be done on Exabyte distributions) +\fB#\fP \fItar xpbf \*(Bz /dev/nr\*(Mt0\fP (extract the kernel sources) +(this takes about 15-30 minutes) +.TE +.DE +.LP +The next tar file contains the sources for the utilities. +It is extracted as follows: +.DS +.TS +lw(2i) l. +\fB#\fP \fIcd /usr/src\fP +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +\fB#\fP \fItar xpbf \*(Bz /dev/rmt12\fP (extract the utility source) +(this takes about 30-60 minutes) +.TE +.DE +.PP +If you are using 6250bpi tapes, the second reel of the +distribution is no longer needed; you should now mount the third +reel instead. The installation procedure continues from this +point on the 8mm tape. +.PP +The next tar file contains the sources for the contributed software. +It is extracted as follows: +.DS +.TS +lw(2i) l. 
+\fB#\fP \fIcd /usr/src\fP +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +(this should only be done on Exabyte distributions) +\fB#\fP \fItar xpbf \*(Bz /dev/rmt12\fP (extract the contributed software source) +(this takes about 30-60 minutes) +.TE +.DE +.PP +If you received a distribution on 8mm Exabyte tape, +there is one additional tape file on the distribution tape +that has not been installed to this point; it contains the +sources for X11R5 in +.Xr tar (1) +format. As distributed, X11R5 should be placed in +.Pn /usr/src/X11R5 . +.DS +.TS +lw(2i) l. +\fB#\fP \fIcd /usr/src\fP +\fB#\fP \fImt -f /dev/nr\*(Mt0 fsf\fP (space to end of previous tape file) +\fB#\fP \fItar xpbf \*(Bz /dev/nr\*(Mt0\fP (extract the X11R5 source) +(this takes about 30-60 minutes) +.TE +.DE +Many of the X11 utilities search using the path +.Pn /usr/X11 , +so be sure that you have a symbolic link that points at +the location of your X11 binaries (here, X11R5). +.PP +Having now completed the extraction of the sources, +you may want to verify that your +.Pn /usr/src +filesystem is consistent. +To do so, you must unmount it, and run +.Xr fsck (8); +assuming that you used \*(Dk0f you would proceed as follows: +.DS +.TS +lw(2i) l. +\fB#\fP \fIcd /\fP (change directory, back to the root) +\fB#\fP \fIumount /usr/src\fP (unmount /usr/src) +\fB#\fP \fIfsck /dev/r\*(Dk0f\fP +.TE +.DE +The output from +.Xr fsck +should look something like: +.DS +.B +** /dev/r\*(Dk0f +** Last Mounted on /usr/src +** Phase 1 - Check Blocks and Sizes +** Phase 2 - Check Pathnames +** Phase 3 - Check Connectivity +** Phase 4 - Check Reference Counts +** Phase 5 - Check Cyl groups +23000 files, 261000 used, 39000 free (2200 frags, 4600 blocks) +.R +.DE +.PP +If there are inconsistencies in the filesystem, you may be prompted +to apply corrective action; see the +.Xr fsck (8) +or \fIFsck \(en The UNIX File System Check Program\fP (SMM:3) for more details. 
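The summary line in the sample fsck output above can be sanity-checked by hand. Assuming the counts are reported in fragment units and the filesystem has 8 fragments per block (as with an 8 kbyte block and 1 kbyte fragment layout), the free count should equal frags + 8 x blocks, and indeed 2200 + 8 x 4600 = 39000. A minimal sketch of that check follows; the function name check_fsck_summary is hypothetical, and the fragments-per-block value is an assumption that must match how the filesystem was actually built.

```shell
# Check that an fsck summary line is self-consistent:
#   free == frags + blocks * frags_per_block   (all counts in fragment units)
# $1: the summary line, e.g. "N files, N used, N free (N frags, N blocks)"
# $2: fragments per block (assumed 8 if omitted: 8K blocks, 1K fragments)
check_fsck_summary() {
    line=$1
    frags_per_block=${2:-8}
    # Turn every non-digit into a space, leaving the five numbers in order:
    # files, used, free, frags, blocks.
    set -- $(printf '%s\n' "$line" | tr -c '0-9\n' ' ')
    free=$3
    frags=$4
    blocks=$5
    [ "$free" -eq $((frags + blocks * frags_per_block)) ]
}
```

The function returns success when the numbers agree and failure otherwise, so it can be used directly in an if statement. A mismatch most likely means the assumed fragments-per-block value is wrong for that filesystem rather than that fsck miscounted.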
+.PP +To use the +.Pn /usr/src +filesystem, you should now remount it with: +.DS +\fB#\fP \fImount /dev/\*(Dk0f /usr/src\fP +.DE +or, if you have made an entry for it in +.Pn /etc/fstab , +you can remount it with: +.DS +\fB#\fP \fImount /usr/src\fP +.DE +.Sh 2 "Additional conversion information" +.PP +After setting up the new \*(4B filesystems, you may restore the user +files that were saved on tape before beginning the conversion. +Note that the \*(4B +.Xr restore +program does its work on a mounted filesystem using normal system operations. +This means that filesystem dumps may be restored even +if the characteristics of the filesystem changed. +To restore a dump tape for, say, the +.Pn /a +filesystem something like the following would be used: +.DS +\fB#\fP \fImkdir /a\fP +\fB#\fP \fInewfs \*(Dk#p\fP +\fB#\fP \fImount /dev/\*(Dk#p /a\fP +\fB#\fP \fIcd /a\fP +\fB#\fP \fIrestore x\fP +.DE +.PP +If +.Xr tar +images were written instead of doing a dump, you should +be sure to use its `\-p' option when reading the files back. No matter +how you restore a filesystem, be sure to unmount it and check its +integrity with +.Xr fsck (8) +when the job is complete. diff --git a/share/doc/smm/01.setup/3.t b/share/doc/smm/01.setup/3.t new file mode 100644 index 0000000..5b0afd4 --- /dev/null +++ b/share/doc/smm/01.setup/3.t @@ -0,0 +1,1996 @@ +.\" Copyright (c) 1980, 1986, 1988, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3.
All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" @(#)3.t 8.1 (Berkeley) 7/27/93 +.\" +.ds lq `` +.ds rq '' +.ds RH "Upgrading a \*(Ps System +.ds CF \*(Dy +.Sh 1 "Upgrading a \*(Ps system" +.PP +This section describes the procedure for upgrading a \*(Ps +system to \*(4B. This procedure may vary according to the version of +the system running before conversion. +If you are converting from a +System V system, some of this section will still apply (in particular, +the filesystem conversion). However, many of the system configuration +files are different, and the executable file formats are completely +incompatible. +.PP +In particular be wary when using this information to upgrade +a \*(Ps HP300 system. 
+There are at least four different versions of ``\*(Ps'' out there: +.IP 1) +HPBSD 1.x from Utah. +.br +This was the original version of \*(Ps for HP300s from which the +other variants (and \*(4B) are derived. +It is largely a \*(Ps system with Sun's NFS 3.0 filesystem code and +some \*(Ps-Tahoe features (e.g. networking code). +Since the filesystem code is 4.2/4.3 vintage and the filesystem +hierarchy is largely \*(Ps, most of this section should apply. +.IP 2) +MORE/bsd from Mt. Xinu. +.br +This is a \*(Ps-Tahoe vintage system with Sun's NFS 4.0 filesystem code +upgraded with Tahoe UFS features. +The instructions for \*(Ps-Tahoe should largely apply. +.IP 3) +\*(Ps-Reno from CSRG. +.br +At least one site bootstrapped HP300 support from the Reno distribution. +The Reno filesystem code was somewhere between \*(Ps and \*(4B: the VFS switch +had been added but many of the UFS features (e.g. ``inline'' symlinks) +were missing. +The filesystem hierarchy reorganization first appeared in this release. +Be extremely careful following these instructions if you are +upgrading from the Reno distribution. +.IP 4) +HPBSD 2.0 from Utah. +.br +As if things were not bad enough already, +this release has the \*(4B filesystem and networking code +as well as some utilities, but still has a \*(Ps hierarchy. +No filesystem conversions are necessary for this upgrade, +but files will still need to be moved around. +.Sh 2 "Installation overview" +.PP +If you are running \*(Ps, upgrading your system +involves replacing your kernel and system utilities. 
+In general, there are three possible ways to install a new \*(Bs distribution: +(1) boot directly from the distribution tape, use it to load new binaries +onto empty disks, and then merge or restore any existing configuration files +and filesystems; +(2) use an existing \*(Ps or later system to extract the root and +.Pn /usr +filesystems from the distribution tape, +boot from the new system, then merge or restore existing +configuration files and filesystems; or +(3) extract the sources from the distribution tape onto an existing system, +and use that system to cross-compile and install \*(4B. +For this release, the second alternative is strongly advised, +with the third alternative reserved as a last resort. +In general, older binaries will continue to run under \*(4B, +but there are many exceptions that are on the critical path +for getting the system running. +Ideally, the new system binaries (root and +.Pn /usr +filesystems) should be installed on spare disk partitions, +then site-specific files should be merged into them. +Once the new system is up and fully merged, the previous root and +.Pn /usr +filesystems can be reused. +Other existing filesystems can be retained and used, +except that (as usual) the new +.Xr fsck +should be run before they are mounted. +.PP +It is \fBSTRONGLY\fP advised that you make full dumps of each filesystem +before beginning, especially any that you intend to modify in place +during the merge. +It is also desirable to run filesystem checks +of all filesystems to be converted to \*(4B before shutting down. +This is an excellent time to review your disk configuration +for possible tuning of the layout. +Most systems will need to provide a new filesystem for system use +mounted on +.Pn /var +(see below). +However, the +.Pn /tmp +filesystem can be an MFS virtual-memory-resident filesystem, +potentially freeing an existing disk partition. +(Additional swap space may be desirable as a consequence.) +See +.Xr mount_mfs (8). 
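The advice above to make full dumps of each filesystem before beginning can be sketched as a small dry-run helper. This is only an illustration: the function name plan_dumps and the tape device /dev/nrmt0 are assumptions, not names taken from this document, and the commands are printed for review rather than executed.

```shell
#!/bin/sh
# Dry-run sketch: print a level-0 dump command for each filesystem.
# TAPE is a hypothetical tape device; substitute the local one.
TAPE=${TAPE:-/dev/nrmt0}

plan_dumps() {
    for fs in "$@"; do
        # "0" requests a full (level 0) dump, "f" names the output device.
        echo "dump 0f $TAPE $fs"
    done
}

plan_dumps / /usr
```

Printing the commands first makes it easy to check that every filesystem to be modified in place appears in the list before any tape is written.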
+.PP +The recommended installation procedure includes the following steps. +The order of these steps will probably vary according to local needs. +.IP \(bu +Extract root and +.Pn /usr +filesystems from the distribution tapes. +.IP \(bu +Extract kernel and/or user-level sources from the distribution tape +if space permits. +This can serve as the backup documentation as needed. +.IP \(bu +Configure and boot a kernel for the local system. +This can be delayed if the generic kernel from the distribution +supports enough hardware to proceed. +.IP \(bu +Build a skeletal +.Pn /var +filesystem (see +.Xr mtree (8)). +.IP \(bu +Merge site-dependent configuration files from +.Pn /etc +and +.Pn /usr/lib +into the new +.Pn /etc +directory. +Note that many file formats and contents have changed; see section 3.4 +of this document. +.IP \(bu +Copy or merge files from +.Pn /usr/adm , +.Pn /usr/spool , +.Pn /usr/preserve , +.Pn /usr/lib , +and other locations into +.Pn /var . +.IP \(bu +Merge local macros, dictionaries, etc. into +.Pn /usr/share . +.IP \(bu +Merge and update local software to reflect the system changes. +.IP \(bu +Take off the rest of the morning, you've earned it! +.PP +Section 3.2 lists the files to be saved as part of the conversion process. +Section 3.3 describes the bootstrap process. +Section 3.4 discusses the merger of the saved files back into the new system. +Section 3.5 gives an overview of the major +bug fixes and changes between \*(Ps and \*(4B. +Section 3.6 provides general hints on possible problems to be +aware of when converting from \*(Ps to \*(4B. +.Sh 2 "Files to save" +.PP +The following list enumerates the standard set of files you will want to +save and suggests directories in which site-specific files should be present. +This list will likely be augmented with non-standard files you +have added to your system. 
+If you do not have enough space to create parallel +filesystems, you should create a +.Xr tar +image of the following files before the new filesystems are created. +The rest of this subsection describes where these files +have moved and how they have changed. +.TS +lfC c l. +/.cshrc \(dg root csh startup script (moves to \f(CW/root/.cshrc\fP) +/.login \(dg root csh login script (moves to \f(CW/root/.login\fP) +/.profile \(dg root sh startup script (moves to \f(CW/root/.profile\fP) +/.rhosts \(dg for trusted machines and users (moves to \f(CW/root/.rhosts\fP) +/etc/disktab \(dd in case you changed disk partition sizes +/etc/fstab * disk configuration data +/etc/ftpusers \(dg for local additions +/etc/gettytab \(dd getty database +/etc/group * group data base +/etc/hosts \(dg for local host information +/etc/hosts.equiv \(dg for local host equivalence information +/etc/hosts.lpd \(dg printer access file +/etc/inetd.conf * Internet services configuration data +/etc/named* \(dg named configuration files +/etc/netstart \(dg network initialization +/etc/networks \(dg for local network information +/etc/passwd * user data base +/etc/printcap * line printer database +/etc/protocols \(dd in case you added any local protocols +/etc/rc * for any local additions +/etc/rc.local * site specific system startup commands +/etc/remote \(dg auto-dialer configuration +/etc/services \(dd for local additions +/etc/shells \(dd list of valid shells +/etc/syslog.conf * system logger configuration +/etc/securettys * merged into ttys +/etc/ttys * terminal line configuration data +/etc/ttytype * merged into ttys +/etc/termcap \(dd for any local entries that may have been added +/lib \(dd for any locally developed language processors +/usr/dict/* \(dd for local additions to words and papers +/usr/include/* \(dd for local additions +/usr/lib/aliases * mail forwarding data base (moves to \f(CW/etc/aliases\fP) +/usr/lib/crontab * cron daemon data base (moves to \f(CW/etc/crontab\fP) 
+/usr/lib/crontab.local * local cron daemon data base (moves to \f(CW/etc/crontab.local\fP) +/usr/lib/lib*.a \(dg for local libraries +/usr/lib/mail.rc \(dg system-wide mail(1) initialization (moves to \f(CW/etc/mail.rc\fP) +/usr/lib/sendmail.cf * sendmail configuration (moves to \f(CW/etc/sendmail.cf\fP) +/usr/lib/tmac/* \(dd for locally developed troff/nroff macros (moves to \f(CW/usr/share/tmac/*\fP) +/usr/lib/uucp/* \(dg for local uucp configuration files +/usr/man/manl * for manual pages for locally developed programs (moves to \f(CW/usr/local/man\fP) +/usr/spool/* \(dg for current mail, news, uucp files, etc. (moves to \f(CW/var/spool\fP) +/usr/src/local \(dg for source for locally developed programs +/sys/conf/HOST \(dg configuration file for your machine (moves to \f(CW/sys/<arch>/conf\fP) +/sys/conf/files.HOST \(dg list of special files in your kernel (moves to \f(CW/sys/<arch>/conf\fP) +/*/quotas * filesystem quota files (moves to \f(CW/*/quotas.user\fP) +.TE +.DS +\(dg\|Files that can be used from \*(Ps without change. +\(dd\|Files that need local changes merged into \*(4B files. +*\|Files that require special work to merge and are discussed in section 3.4. +.DE +.Sh 2 "Installing \*(4B" +.PP +The next step is to build a working \*(4B system. +This can be done by following the steps in section 2 of +this document for extracting the root and +.Pn /usr +filesystems from the distribution tape onto unused disk partitions. +For the SPARC, the root filesystem dump on the tape could also be +extracted directly. +For the HP300 and DECstation, the raw disk image can be copied +into an unused partition and this partition can then be dumped +to create an image that can be restored. +The exact procedure chosen will depend on the disk configuration +and the number of suitable disk partitions that may be used. +It is also desirable to run filesystem checks +of all filesystems to be converted to \*(4B before shutting down. 
+In any case, this is an excellent time to review your disk configuration +for possible tuning of the layout. +Section 2.5 and +.Xr config (8) +are required reading. +.LP +The filesystem in \*(4B has been reorganized in an effort to +meet several goals: +.IP 1) +The root filesystem should be small. +.IP 2) +There should be a per-architecture centrally-shareable read-only +.Pn /usr +filesystem. +.IP 3) +Variable per-machine directories should be concentrated below +a single mount point named +.Pn /var . +.IP 4) +Site-wide machine independent shareable text files should be separated +from architecture specific binary files and should be concentrated below +a single mount point named +.Pn /usr/share . +.LP +These goals are realized with the following general layouts. +The reorganized root filesystem has the following directories: +.TS +lfC l. +/etc (config files) +/bin (user binaries needed when single-user) +/sbin (root binaries needed when single-user) +/local (locally added binaries used only by this machine) +/tmp (mount point for memory based filesystem) +/dev (local devices) +/home (mount point for AMD) +/var (mount point for per-machine variable directories) +/usr (mount point for multiuser binaries and files) +.TE +.LP +The reorganized +.Pn /usr +filesystem has the following directories: +.TS +lfC l. +/usr/bin (user binaries) +/usr/contrib (software contributed to \*(4B) +/usr/games (binaries for games, score files in \f(CW/var\fP) +/usr/include (standard include files) +/usr/lib (lib*.a from old \f(CW/usr/lib\fP) +/usr/libdata (databases from old \f(CW/usr/lib\fP) +/usr/libexec (executables from old \f(CW/usr/lib\fP) +/usr/local (locally added binaries used site-wide) +/usr/old (deprecated binaries) +/usr/sbin (root binaries) +/usr/share (mount point for site-wide shared text) +/usr/src (mount point for sources) +.TE +.LP +The reorganized +.Pn /usr/share +filesystem has the following directories: +.TS +lfC l. 
+/usr/share/calendar (various useful calendar files) +/usr/share/dict (dictionaries) +/usr/share/doc (\*(4B manual sources) +/usr/share/games (games text files) +/usr/share/groff_font (groff font information) +/usr/share/man (typeset manual pages) +/usr/share/misc (dumping ground for random text files) +/usr/share/mk (templates for \*(4B makefiles) +/usr/share/skel (template user home directory files) +/usr/share/tmac (various groff macro packages) +/usr/share/zoneinfo (information on time zones) +.TE +.LP +The reorganized +.Pn /var +filesystem has the following directories: +.TS +lfC l. +/var/account (accounting files, formerly \f(CW/usr/adm\fP) +/var/at (\fIat\fP\|(1) spooling area) +/var/backups (backups of system files) +/var/crash (crash dumps) +/var/db (system-wide databases, e.g. tags) +/var/games (score files) +/var/log (log files) +/var/mail (users mail) +/var/obj (hierarchy to build \f(CW/usr/src\fP) +/var/preserve (preserve area for vi) +/var/quotas (directory to store quota files) +/var/run (directory to store *.pid files) +/var/rwho (rwho databases) +/var/spool/ftp (home directory for anonymous ftp) +/var/spool/mqueue (sendmail spooling directory) +/var/spool/news (news spooling area) +/var/spool/output (printer spooling area) +/var/spool/uucp (uucp spooling area) +/var/tmp (disk-based temporary directory) +/var/users (root of per-machine user home directories) +.TE +.PP +The \*(4B bootstrap routines pass the identity of the boot device +through to the kernel. +The kernel then uses that device as its root filesystem. +Thus, for example, if you boot from +.Pn /dev/\*(Dk1a , +the kernel will use +.Pn \*(Dk1a +as its root filesystem. If +.Pn /dev/\*(Dk1b +is configured as a swap partition, +it will be used as the initial swap area, +otherwise the normal primary swap area (\c +.Pn /dev/\*(Dk0b ) +will be used. +The \*(4B bootstrap is backward compatible with \*(Ps, +so you can replace your old bootstrap if you use it +to boot your first \*(4B kernel. 
+However, the \*(Ps bootstrap cannot access \*(4B filesystems, +so if you plan to convert your filesystems to \*(4B, +you must install a new bootstrap \fIbefore\fP doing the conversion. +Note that SPARC users cannot build a \*(4B compatible version +of the bootstrap, so must \fInot\fP convert their root filesystem +to the new \*(4B format. +.PP +Once you have extracted the \*(4B system and booted from it, +you will have to build a kernel customized for your configuration. +If you have any local device drivers, +they will have to be incorporated into the new kernel. +See section 4.1.3 and ``Building 4.3BSD UNIX Systems with Config'' (SMM:2). +.PP +If converting from \*(Ps, your old filesystems should be converted. +If you've modified the partition +sizes from the original \*(Ps ones, and are not already using the +\*(4B disk labels, you will have to modify the default disk partition +tables in the kernel. Make the necessary table changes and boot +your custom kernel \fBBEFORE\fP trying to access any of your old +filesystems! After doing this, if necessary, the remaining filesystems +may be converted in place by running the \*(4B version of +.Xr fsck (8) +on each filesystem and allowing it to make the necessary corrections. +The new version of +.Xr fsck +is more strict about the size of directories than +the version supplied with \*(Ps. +Thus the first time that it is run on a \*(Ps filesystem, +it will produce messages of the form: +.DS +\fBDIRECTORY ...: LENGTH\fP xx \fBNOT MULTIPLE OF 512 (ADJUSTED)\fP +.DE +Length ``xx'' will be the size of the directory; +it will be expanded to the next multiple of 512 bytes. 
+The new +.Xr fsck +will also set default \fIinterleave\fP and +\fInpsect\fP (number of physical sectors per track) values on older +filesystems, in which these fields were unused spares; this correction +will produce messages of the form: +.DS +\fBIMPOSSIBLE INTERLEAVE=0 IN SUPERBLOCK (SET TO DEFAULT)\fP\** +\fBIMPOSSIBLE NPSECT=0 IN SUPERBLOCK (SET TO DEFAULT)\fP +.DE +.FS +The defaults are to set \fIinterleave\fP to 1 and +\fInpsect\fP to \fInsect\fP. +This is correct on most drives; +it affects only performance (usually virtually unmeasurably). +.FE +Filesystems that have had their interleave and npsect values +set will be diagnosed by the old +.Xr fsck +as having a bad superblock; the old +.Xr fsck +will run only if given an alternate superblock +(\fIfsck \-b32\fP), +in which case it will re-zero these fields. +The \*(4B kernel will internally set these fields to their defaults +if fsck has not done so; again, the \fI\-b32\fP option may be +necessary for running the old +.Xr fsck . +.PP +In addition, \*(4B removes several limits on filesystem sizes +that were present in \*(Ps. +The limited filesystems +continue to work in \*(4B, but should be converted +as soon as it is convenient +by running +.Xr fsck +with the \fI\-c 2\fP option. +The sequence \fIfsck \-p \-c 2\fP will update them all, +fix the interleave and npsect fields, +fix any incorrect directory lengths, +expand maximum uid's and gid's to 32-bits, +place symbolic links less than 60 bytes into their inode, +and fill in directory type fields all at once. +The new filesystem formats are incompatible with older systems. +If you wish to continue using these filesystems with the older +systems you should make only the compatible changes using +\fIfsck \-c 1\fP. 
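The two conversion paths described above can be summarized in a short dry-run sketch. The function name fsck_plan and its mode arguments are hypothetical; only the fsck option strings come from the text. The commands are printed, not run, so that they can be reviewed first.

```shell
#!/bin/sh
# Dry-run sketch: print the fsck invocation for a conversion mode.
# fsck_plan and the mode names "full" and "compat" are illustrative only.
fsck_plan() {
    case "$1" in
        full)   echo "fsck -p -c 2" ;;  # all changes; result is not usable by older systems
        compat) echo "fsck -c 1" ;;     # only the changes older systems can still read
        *)      echo "usage: fsck_plan full|compat" >&2; return 1 ;;
    esac
}

fsck_plan full
fsck_plan compat
```

Choose the compat form only for filesystems that must still be mounted by an older system; everything else can take the full conversion in one pass.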
+.Sh 2 "Merging your files from \*(Ps into \*(4B" +.PP +When your system is booting reliably and you have the \*(4B root and +.Pn /usr +filesystems fully installed, you will be ready +to continue with the next step in the conversion process, +merging your old files into the new system. +.PP +If you saved the files on a +.Xr tar +tape, extract them into a scratch directory, say +.Pn /usr/convert : +.DS +\fB#\fP \fImkdir /usr/convert\fP +\fB#\fP \fIcd /usr/convert\fP +\fB#\fP \fItar xp\fP +.DE +.PP +The data files marked in the previous table with a dagger (\(dg) +may be used without change from the previous system. +Those data files marked with a double dagger (\(dd) have syntax +changes or substantial enhancements. +You should start with the \*(4B version and carefully +integrate any local changes into the new file. +Usually these local changes can be incorporated +without conflict into the new file; +some exceptions are noted below. +The files marked with an asterisk (*) require +particular attention and are discussed below. +.PP +As described in section 3.3, +the most immediately obvious change in \*(4B is the reorganization +of the system filesystems. +Users of certain recent vendor releases have seen this general organization, +although \*(4B takes the reorganization a bit further. +The directories most affected are +.Pn /etc , +which now contains only system configuration files; +.Pn /var , +a new filesystem containing per-system spool and log files; and +.Pn /usr/share , +which contains most of the text files shareable across architectures +such as documentation and macros. +System administration programs formerly in +.Pn /etc +are now found in +.Pn /sbin +and +.Pn /usr/sbin . +Various programs and data files formerly in +.Pn /usr/lib +are now found in +.Pn /usr/libexec +and +.Pn /usr/libdata , +respectively. +Administrative files formerly in +.Pn /usr/adm +are in +.Pn /var/account +and, similarly, log files are now in +.Pn /var/log . 
+The directory +.Pn /usr/ucb +has been merged into +.Pn /usr/bin , +and the sources for programs in +.Pn /usr/bin +are in +.Pn /usr/src/usr.bin . +Other source directories parallel the destination directories; +.Pn /usr/src/etc +has been greatly expanded, and +.Pn /usr/src/share +is new. +The sources for the manual pages are, in general, kept with the source +code for the applications they document. +Manual pages not closely corresponding to an application program +are found in +.Pn /usr/src/share/man . +The locations of all man pages are listed in +.Pn /usr/src/share/man/man0/man[1-8] . +The manual page +.Xr hier (7) +has been updated and made more detailed; +it is included in the printed documentation. +You should review it to familiarize yourself with the new layout. +.PP +A new utility, +.Xr mtree (8), +is provided to build and check filesystem hierarchies +with the proper contents, owners and permissions. +Scripts are provided in +.Pn /etc/mtree +(and +.Pn /usr/src/etc/mtree ) +for the root, +.Pn /usr +and +.Pn /var +filesystems. +Once a filesystem has been made for +.Pn /var , +.Xr mtree +can be used to create a directory hierarchy there +or you can simply use tar to extract the prototype from +the second file of the distribution tape. +.Sh 3 "Changes in the \f(CW/etc\fP directory" +.PP +The +.Pn /etc +directory now contains nearly all the host-specific configuration +files. +Note that some file formats have changed, +and those configuration files containing pathnames are nearly all affected +by the reorganization. +See the examples provided in +.Pn /etc +(installed from +.Pn /usr/src/etc ) +as a guide. +The following table lists some of the local configuration files +whose locations and/or contents have changed. +.TS +l l l +lfC lfC l. 
+\*(Ps and Earlier \*(4B Comments +_ _ _ +/etc/fstab /etc/fstab new format; see below +/etc/inetd.conf /etc/inetd.conf pathnames of executables changed +/etc/printcap /etc/printcap pathnames changed +/etc/syslog.conf /etc/syslog.conf pathnames of log files changed +/etc/ttys /etc/ttys pathnames of executables changed +/etc/passwd /etc/master.passwd new format; see below +/usr/lib/sendmail.cf /etc/sendmail.cf changed pathnames +/usr/lib/aliases /etc/aliases may contain changed pathnames +/etc/*.pid /var/run/*.pid + +.T& +l l l +lfC lfC l. +New in \*(Ps-Tahoe \*(4B Comments +_ _ _ +/usr/games/dm.config /etc/dm.conf configuration for games (see \fIdm\fP\|(8)) +/etc/zoneinfo/localtime /etc/localtime timezone configuration +/etc/zoneinfo /usr/share/zoneinfo timezone configuration +.TE +.ne 1.5i +.TS +l l l +lfC lfC l. + New in \*(4B Comments +_ _ _ + /etc/aliases.db database version of the aliases file + /etc/amd-home location database of home directories + /etc/amd-vol location database of exported filesystems + /etc/changelist \f(CW/etc/security\fP files to back up + /etc/csh.cshrc system-wide csh(1) initialization file + /etc/csh.login system-wide csh(1) login file + /etc/csh.logout system-wide csh(1) logout file + /etc/disklabels directory for saving disklabels + /etc/exports NFS list of export permissions + /etc/ftpwelcome message displayed for ftp users; see ftpd(8) + /etc/man.conf lists directories searched by \fIman\fP\|(1) + /etc/mtree directory for local mtree files; see mtree(8) + /etc/netgroup NFS group list used in \f(CW/etc/exports\fP + /etc/pwd.db non-secure hashed user data base file + /etc/spwd.db secure hashed user data base file + /etc/security daily system security checker +.TE +.PP +System security changes require adding several new ``well-known'' groups to +.Pn /etc/group . +The groups that are needed by the system as distributed are: +.TS +l n l. 
+name number purpose +_ +wheel 0 users allowed superuser privilege +daemon 1 processes that need less than wheel privilege +kmem 2 read access to kernel memory +sys 3 access to kernel sources +tty 4 access to terminals +operator 5 read access to raw disks +bin 7 group for system binaries +news 8 group for news +wsrc 9 write access to sources +games 13 access to games +staff 20 system staff +guest 31 system guests +nobody 39 the least privileged group +utmp 45 access to utmp files +dialer 117 access to remote ports and dialers +.TE +Only users in the ``wheel'' group are permitted to +.Xr su +to ``root''. +Most programs that manage directories in +.Pn /var/spool +now run set-group-id to ``daemon'' so that users cannot +directly access the files in the spool directories. +The special files that access kernel memory, +.Pn /dev/kmem +and +.Pn /dev/mem , +are made readable only by group ``kmem''. +Standard system programs that require this access are +made set-group-id to that group. +The group ``sys'' is intended to control access to kernel sources, +and other sources belong to group ``wsrc.'' +Rather than make user terminals writable by all users, +they are now placed in group ``tty'' and made only group writable. +Programs that should legitimately have access to write on user terminals +such as +.Xr talkd +and +.Xr write +now run set-group-id to ``tty''. +The ``operator'' group controls access to disks. +By default, disks are readable by group ``operator'', +so that programs such as +.Xr dump +can access the filesystem information without being set-user-id to ``root''. +The +.Xr shutdown (8) +program is executable only by group operator +and is setuid to root so that members of group operator may shut down +the system without root access. +.PP +The ownership and modes of some directories have changed. 
+The +.Xr at +programs now run set-user-id ``root'' instead of ``daemon.'' +Also, the uucp directory no longer needs to be publicly writable, +as +.Xr tip +reverts to privileged status to remove its lock files. +After copying your version of +.Pn /var/spool , +you should do: +.DS +\fB#\fP \fIchown \-R root /var/spool/at\fP +\fB#\fP \fIchown \-R uucp:daemon /var/spool/uucp\fP +\fB#\fP \fIchmod \-R o\-w /var/spool/uucp\fP +.DE +.PP +The format of the cron table, +.Pn /etc/crontab , +has been changed to specify the user-id that should be used to run a process. +The userid ``nobody'' is frequently useful for non-privileged programs. +Local changes are now put in a separate file, +.Pn /etc/crontab.local . +.PP +Some of the commands previously in +.Pn /etc/rc.local +have been moved to +.Pn /etc/rc ; +several new functions are now handled by +.Pn /etc/rc , +.Pn /etc/netstart +and +.Pn /etc/rc.local . +You should look closely at the prototype version of these files +and read the manual pages for the commands contained in it +before trying to merge your local copy. +Note in particular that +.Xr ifconfig +has had many changes, +and that host names are now fully specified as domain-style names +(e.g., vangogh.CS.Berkeley.EDU) for the benefit of the name server. +.PP +Some of the commands previously in +.Pn /etc/daily +have been moved to +.Pn /etc/security , +and several new functions have been added to +.Pn /etc/security +to do nightly security checks on the system. +The script +.Pn /etc/daily +runs +.Pn /etc/security +each night, and mails the output to the super-user. +Some of the checks done by +.Pn /etc/security +are: +.DS +\(bu Syntax errors in the password and group files. +\(bu Duplicate user and group names and id's. +\(bu Dangerous search paths and umask values for the superuser. +\(bu Dangerous values in various initialization files. +\(bu Dangerous .rhosts files. +\(bu Dangerous directory and file ownership or permissions. +\(bu Globally exported filesystems. 
+\(bu Dangerous owners or permissions for special devices. +.DE +In addition, it reports any changes to setuid and setgid files, special +devices, or the files in +.Pn /etc/changelist +since the last run of +.Pn /etc/security . +Backup copies of the files are saved in +.Pn /var/backups . +Finally, the system binaries are checksummed and their permissions +validated against the +.Xr mtree (8) +specifications in +.Pn /etc/mtree . +.PP +The C-library and system binaries on the distribution tape +are compiled with new versions of +.Xr gethostbyname +and +.Xr gethostbyaddr +that use the name server, +.Xr named (8). +If you have only a small network and are not connected +to a large network, you can use the distributed library routines without +any problems; they use a linear scan of the host table +.Pn /etc/hosts +if the name server is not running. +If you are on the Internet or have a large local network, +it is recommended that you set up +and use the name server. +For instructions on how to set up the necessary configuration files, +refer to ``Name Server Operations Guide for BIND'' (SMM:10). +Several programs rely on the host name returned by +.Xr gethostname +to determine the local domain name. +.PP +If you are using the name server, your +.Xr sendmail +configuration file will need some updates to accommodate it. +See the ``Sendmail Installation and Operation Guide'' (SMM:8) and +the sample +.Xr sendmail +configuration files in +.Pn /usr/src/usr.sbin/sendmail/cf . +The aliases file, +.Pn /etc/aliases , +has also been changed to add certain well-known addresses. +.Sh 3 "Shadow password files" +.PP +The password file format adds change and expiration fields +and its location has changed to protect +the encrypted passwords stored there. +The actual password file is now stored in +.Pn /etc/master.passwd . 
+The hashed dbm password files do not contain encrypted passwords, +but contain the file offset to the entry with the password in +.Pn /etc/master.passwd +(that is readable only by root). +Thus, the +.Fn getpwnam +and +.Fn getpwuid +functions will no longer return an encrypted password string to non-root +callers. +An old-style passwd file is created in +.Pn /etc/passwd +by the +.Xr vipw (8) +and +.Xr pwd_mkdb (8) +programs. +See also +.Xr passwd (5). +.PP +Several new users have also been added to the group of ``well-known'' users in +.Pn /etc/passwd . +The current list is: +.DS +.TS +l c. +name number +_ +root 0 +daemon 1 +operator 2 +bin 3 +games 7 +uucp 66 +nobody 32767 +.TE +.DE +The ``daemon'' user is used for daemon processes that +do not need root privileges. +The ``operator'' user-id is used as an account for dumpers +so that they can log in without having the root password. +By placing them in the ``operator'' group, +they can get read access to the disks. +The ``uucp'' login has existed long before \*(4B, +and is noted here just to provide a common user-id. +The password entry ``nobody'' has been added to specify +the user with least privilege. The ``games'' user is a pseudo-user +that controls access to game programs. +.PP +After installing your updated password file, you must run +.Xr pwd_mkdb (8) +to create the password database. +Note that +.Xr pwd_mkdb (8) +is run whenever +.Xr vipw (8) +is run. +.Sh 3 "The \f(CW/var\fP filesystem" +.PP +The spooling directories saved on tape may be restored in their +eventual resting places without too much concern. Be sure to +use the `\-p' option to +.Xr tar (1) +so that files are recreated with the same file modes. +The following commands provide a guide for copying spool and log files from +an existing system into a new +.Pn /var +filesystem. +At least the following directories should already exist on +.Pn /var : +.Pn output , +.Pn log , +.Pn backups +and +.Pn db . 
+.LP +.DS +.ft CW +SRC=/oldroot/usr + +cd $SRC; tar cf - msgs preserve | (cd /var && tar xpf -) +.DE +.DS +.ft CW +# copy $SRC/spool to /var +cd $SRC/spool +tar cf - at mail rwho | (cd /var && tar xpf -) +tar cf - ftp mqueue news secretmail uucp uucppublic | \e + (cd /var/spool && tar xpf -) +.DE +.DS +.ft CW +# everything else in spool is probably a printer area +mkdir .save +mv at ftp mail mqueue rwho secretmail uucp uucppublic .save +tar cf - * | (cd /var/spool/output && tar xpf -) +mv .save/* . +rmdir .save +.DE +.DS +.ft CW +cd /var/spool/mqueue +mv syslog.7 /var/log/maillog.7 +mv syslog.6 /var/log/maillog.6 +mv syslog.5 /var/log/maillog.5 +mv syslog.4 /var/log/maillog.4 +mv syslog.3 /var/log/maillog.3 +mv syslog.2 /var/log/maillog.2 +mv syslog.1 /var/log/maillog.1 +mv syslog.0 /var/log/maillog.0 +mv syslog /var/log/maillog +.DE +.DS +.ft CW +# move $SRC/adm to /var +cd $SRC/adm +tar cf - . | (cd /var/account && tar xpf -) +cd /var/account +rm -f msgbuf +mv messages messages.[0-9] ../log +mv wtmp wtmp.[0-9] ../log +mv lastlog ../log +.DE +.Sh 2 "Bug fixes and changes between \*(Ps and \*(4B" +.PP +The major new facilities available in the \*(4B release are +a new virtual memory system, +the addition of ISO/OSI networking support, +a new virtual filesystem interface supporting filesystem stacking, +a freely redistributable implementation of NFS, +a log-structured filesystem, +enhancement of the local filesystems to support +files and filesystems that are up to 2^63 bytes in size, +enhanced security and system management support, +and the conversion to and addition of the IEEE Std1003.1 (``POSIX'') +facilities and many of the IEEE Std1003.2 facilities. +In addition, many new utilities and additions to the C +library are present as well. +The kernel sources have been reorganized to collect all machine-dependent +files for each architecture under one directory, +and most of the machine-independent code is now free of code +conditional on specific machines. 
+The user structure and process structure have been reorganized +to eliminate the statically-mapped user structure and to make most +of the process resources shareable by multiple processes. +The system and include files have been converted to be compatible +with ANSI C, including function prototypes for most of the exported +functions. +There are numerous other changes throughout the system. +.Sh 3 "Changes to the kernel" +.PP +This release includes several important structural kernel changes. +The kernel uses a new internal system call convention; +the use of global (``u-dot'') variables for parameters and error returns +has been eliminated, +and interrupted system calls no longer abort using non-local goto's (longjmp's). +A new sleep interface separates signal handling from scheduling priority, +returning characteristic errors to abort or restart the current system call. +This sleep call also passes a string describing the process state, +which is used by the ps(1) program. +The old sleep interface can be used only for non-interruptible sleeps. +The new sleep interface (\fItsleep\fP) can be used at any priority, +but is only interruptible if the PCATCH flag is set. +When interrupted, \fItsleep\fP returns EINTR or ERESTART. +.PP +Many data structures that were previously statically allocated +are now allocated dynamically. +These structures include mount entries, file entries, +user open file descriptors, the process entries, the vnode table, +the name cache, and the quota structures. +.PP +To protect against indiscriminate reading or writing of kernel +memory, all writing and most reading of kernel data structures +must be done using a new ``sysctl'' interface. +The information to be accessed is described through an extensible +``Management Information Base'' (MIB) style name, +described as a dotted set of components. +A new utility, +.Xr sysctl (8), +retrieves kernel state and allows processes with appropriate +privilege to set kernel state. 
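As a rough illustration of the dotted naming scheme, the sketch below splits a MIB-style name into its components. The helper mib_components is hypothetical, and kern.securelevel is used only as an example name, on the assumption that the security level is exported under it; see sysctl(8) for the names a given kernel actually provides.

```shell
#!/bin/sh
# Sketch: split a dotted MIB-style name into its components, one per line.
# mib_components is an illustrative helper, not part of the system.
mib_components() {
    printf '%s\n' "$1" | tr '.' '\n'
}

# Example name only; the top-level component comes first.
mib_components kern.securelevel
```

Each component narrows the lookup, so names sharing a leading component belong to the same subsystem.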
+.Sh 3 "Security" +.PP +The kernel runs with four different levels of security. +Any superuser process can raise the security level, but only +.Xr init (8) +can lower it. +Security levels are defined as follows: +.IP \-1 +Permanently insecure mode \- always run the system in level 0 mode. +.IP " 0" +Insecure mode \- immutable and append-only flags may be turned off. +All devices may be read or written subject to their permissions. +.IP " 1" +Secure mode \- immutable and append-only flags may not be cleared; +disks for mounted filesystems, +.Pn /dev/mem , +and +.Pn /dev/kmem +are read-only. +.IP " 2" +Highly secure mode \- same as secure mode, plus disks are always +read-only whether mounted or not. +This level precludes tampering with filesystems by unmounting them, +but also inhibits running +.Xr newfs (8) +while the system is multiuser. +See +.Xr chflags (1) +and the \-\fBo\fP option to +.Xr ls (1) +for information on setting and displaying the immutable and append-only +flags. +.PP +Normally, the system runs in level 0 mode while single user +and in level 1 mode while multiuser. +If the level 2 mode is desired while running multiuser, +it can be set in the startup script +.Pn /etc/rc +using +.Xr sysctl (8). +If it is desired to run the system in level 0 mode while multiuser, +the administrator must build a kernel with the variable +.Li securelevel +in the kernel source file +.Pn /sys/kern/kern_sysctl.c +initialized to \-1. +.Sh 4 "Virtual memory changes" +.PP +The new virtual memory implementation is derived from the Mach +operating system developed at Carnegie-Mellon, +and was ported to the BSD kernel at the University of Utah.
+It is based on the 2.0 release of Mach +(with some bug fixes from the 2.5 and 3.0 releases) +and retains many of its essential features such as +the separation of the machine dependent and independent layers +(the ``pmap'' interface), +efficient memory utilization using copy-on-write +and other lazy-evaluation techniques, +and support for large, sparse address spaces. +It does not include the ``external pager'' interface instead using +a primitive internal pager interface. +The Mach virtual memory system call interface has been replaced with the +``mmap''-based interface described in the ``Berkeley Software +Architecture Manual'' (see UNIX Programmer's Manual, +Supplementary Documents, PSD:5). +The interface is similar to the interfaces shipped +by several commercial vendors such as Sun, USL, and Convex Computer Corp. +The integration of the new virtual memory is functionally complete, +but still has serious performance problems under heavy memory load. +The internal kernel interfaces have not yet been completed +and the memory pool and buffer cache have not been merged. +Some additional caveats: +.IP \(bu +Since the code is based on the 2.0 release of Mach, +bugs and misfeatures of the BSD version should not be considered +short-comings of the current Mach virtual memory system. +.IP \(bu +Because of the disjoint virtual memory (page) and IO (buffer) caches, +it is possible to see inconsistencies if using both the mmap and +read/write interfaces on the same file simultaneously. +.IP \(bu +Swap space is allocated on-demand rather than up front and no +allocation checks are performed so it is possible to over-commit +memory and eventually deadlock. +.IP \(bu +The semantics of the +.Xr vfork (2) +system call are slightly different. +The synchronization between parent and child is preserved, +but the memory sharing aspect is not. +In practice this has been enough for backward compatibility, +but newer code should just use +.Xr fork (2). 
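.PP
As a hedged illustration of the ``mmap''-based interface mentioned earlier
in this section, the following sketch maps a file read-only and scans it.
It uses only the standard mmap(2), fstat(2), and munmap(2) calls;
the helper name count_newlines_mmap is hypothetical and is not part of
any system library.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Count newline characters in a file by mapping it read-only.
 * Returns the count, or -1 on error.
 * (count_newlines_mmap is a hypothetical helper for illustration.)
 */
long
count_newlines_mmap(const char *path)
{
	struct stat st;
	const char *p;
	long count = 0;
	off_t i;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	if (st.st_size == 0) {		/* zero-length mappings are rejected */
		close(fd);
		return 0;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);			/* the mapping remains valid after close */
	if (p == MAP_FAILED)
		return -1;
	for (i = 0; i < st.st_size; i++)
		if (p[i] == '\n')
			count++;
	munmap((void *)p, (size_t)st.st_size);
	return count;
}
```

Given the caveat above about the disjoint page and buffer caches,
a program using a mapping like this one should avoid modifying the same
file through write(2) while the mapping is in use.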
+.Sh 4 "Networking additions and changes" +.PP +The ISO/OSI Networking consists of a kernel implementation of +transport class 4 (TP-4), +connectionless networking protocol (CLNP), +and 802.3-based link-level support (hardware-compatible with Ethernet\**). +.FS +Ethernet is a trademark of the Xerox Corporation. +.FE +We also include support for ISO Connection-Oriented Network Service, +X.25, TP-0. +The session and presentation layers are provided outside +the kernel using the ISO Development Environment by Marshall Rose, +that is available via anonymous FTP +(but is not included on the distribution tape). +Included in this development environment are file +transfer and management (FTAM), virtual terminals (VT), +a directory services implementation (X.500), +and miscellaneous other utilities. +.PP +Kernel support for the ISO OSI protocols is enabled with the ISO option +in the kernel configuration file. +The +.Xr iso (4) +manual page describes the protocols and addressing; +see also +.Xr clnp (4), +.Xr tp (4) +and +.Xr cltp (4). +The OSI equivalent to ARP is ESIS (End System to Intermediate System Routing +Protocol); running this protocol is mandatory, however one can manually add +translations for machines that do not participate by use of the +.Xr route (8) +command. +Additional information is provided in the manual page describing +.Xr esis (4). +.PP +The command +.Xr route (8) +has a new syntax and several new capabilities: +it can install routes with a specified destination and mask, +and can change route characteristics such as hop count, packet size +and window size. +.PP +Several important enhancements have been added to the TCP/IP +protocols including TCP header prediction and +serial line IP (SLIP) with header compression. +The routing implementation has been completely rewritten +to use a hierarchical routing tree with a mask per route +to support the arbitrary levels of routing found in the ISO protocols. 
+The routing table also stores and caches route characteristics +to speed the adaptation of the throughput and congestion avoidance +algorithms. +.PP +The format of the +.I sockaddr +structure (the structure used to describe a generic network address with an +address family and family-specific data) +has changed from previous releases, +as have the address family-specific versions of this structure. +The +.I sa_family +field has been split into a length, +.Pn sa_len , +and a family, +.Pn sa_family . +System calls that pass a +.I sockaddr +structure into the kernel (e.g. +.Fn sendto +and +.Fn connect ) +have a separate parameter that specifies the +.I sockaddr +length, and thus it is not necessary to fill in the +.I sa_len +field for those system calls. +System calls that pass a +.I sockaddr +structure back from the kernel (e.g. +.Fn recvfrom +and +.Fn accept ) +receive a completely filled-in +.I sockaddr +structure, thus the length field is valid. +Because this would not work for old binaries, +the new library uses a different system call number. +Thus, most networking programs compiled under \*(4B are incompatible +with older systems. +.PP +Although this change is mostly source and binary compatible +with old programs, there are three exceptions. +Programs with statically initialized +.I sockaddr +structures +(usually the Internet form, a +.I sockaddr_in ) +are not compatible. +Generally, such programs should be changed to fill in the structure +at run time, as C allows no way to initialize a structure without +assuming the order and number of fields. +Also, programs that use structures to describe a network packet format +that contain embedded +.I sockaddr +structures require change; a definition of an +.I osockaddr +structure is provided for this purpose.
+Finally, programs that use the +.Sm SIOCGIFCONF +ioctl to get a complete list of interface addresses +need to check the +.I sa_len +field when iterating through the array of addresses returned, +as not all the structures returned have the same length +(this variance in length is nearly guaranteed by the presence of link-layer +address structures). +.Sh 4 "Additions and changes to filesystems" +.PP +The \*(4B distribution contains most of the interfaces +specified in the IEEE Std1003.1 system interface standard. +Filesystem additions include IEEE Std1003.1 FIFOs, +byte-range file locking, and saved user and group identifiers. +.PP +A new virtual filesystem interface has been added to the +kernel to support multiple filesystems. +In comparison with other interfaces, +the Berkeley interface has been structured for more efficient support +of filesystems that maintain state (such as the local filesystem). +The interface has been extended with support for stackable +filesystems done at UCLA. +These extensions allow for filesystems to be layered on top of each +other and allow new vnode operations to be added without requiring +changes to existing filesystem implementations. +For example, +the umap filesystem (see +.Xr mount_umap (8)) +is used to mount a sub-tree of an existing filesystem +that uses a different set of uids and gids than the local system. +Such a filesystem could be mounted from a remote site via NFS or it +could be a filesystem on removable media brought from some foreign +location that uses a different password file. +.PP +Other new filesystems that may be stacked include the loopback filesystem +.Xr mount_lofs (8), +the kernel filesystem +.Xr mount_kernfs (8), +and the portal filesystem +.Xr mount_portal (8). +.PP +The buffer cache in the kernel is now organized as a file block cache +rather than a device block cache. +As a consequence, cached blocks from a file +and from the corresponding block device would no longer be kept consistent. 
+The block device thus has little remaining value. +Three changes have been made for these reasons: +.IP 1) +block devices may not be opened while they are mounted, +and may not be mounted while open, so that the two versions of cached +file blocks cannot be created, +.IP 2) +filesystem checks of the root now use the raw device +to access the root filesystem, and +.IP 3) +the root filesystem is initially mounted read-only +so that nothing can be written back to disk during or after changes to +the raw filesystem by +.Xr fsck . +.LP +The root filesystem may be made writable while in single-user mode +with the command: +.DS +.ft CW +mount \-uw / +.DE +The mount command has an option to update the flags on a mounted filesystem, +including the ability to upgrade a filesystem from read-only to read-write +or downgrade it from read-write to read-only. +.PP +In addition to the local ``fast filesystem'', +we have added an implementation of the network filesystem (NFS) +that fully interoperates with the NFS shipped by Sun and its licensees. +Because our NFS implementation was written +by Rick Macklem of the University of Guelph +using only the publicly available NFS specification, +it does not require a license from Sun to use in source or binary form. +By default it runs over UDP to be compatible with Sun's implementation. +However, it can be configured on a per-mount basis to run over TCP. +Using TCP allows it to be used quickly and efficiently through +gateways and over long-haul networks. +Using an extended protocol, it supports Leases to allow a limited +callback mechanism that greatly reduces the network traffic necessary +to maintain cache consistency between the server and its clients. +Its use will be familiar to users of other implementations of NFS. +See the manual pages +.Xr mount (8), +.Xr mountd (8), +.Xr fstab (5), +.Xr exports (5), +.Xr netgroup (5), +.Xr nfsd (8), +.Xr nfsiod (8), +and +.Xr nfssvc (8)
+and the document ``The 4.4BSD NFS Implementation'' (SMM:6) +for further information. +The format of +.Pn /etc/fstab +has changed from previous \*(Bs releases +to a blank-separated format to allow colons in pathnames. +.PP +A new local filesystem, the log-structured filesystem (LFS), +has been added to the system. +It provides near disk-speed output and fast crash recovery. +This work is based, in part, on the LFS filesystem created +for the Sprite operating system at Berkeley. +While the kernel implementation is almost complete, +only some of the utilities to support the +filesystem have been written, +so we do not recommend it for production use. +See +.Xr newlfs (8), +.Xr mount_lfs (8) +and +.Xr lfs_cleanerd (8) +for more information. +For an in-depth description of the implementation and performance +characteristics of log-structured filesystems in general, +and this one in particular, see Dr. Margo Seltzer's doctoral thesis, +available from the University of California Computer Science Department. +.PP +We have also added a memory-based filesystem that runs in +pageable memory, allowing large temporary filesystems without +requiring dedicated physical memory. +.PP +The local ``fast filesystem'' has been enhanced to do +clustering that allows large pieces of files to be +allocated contiguously resulting in near doubling +of filesystem throughput. +The filesystem interface has been extended to allow +files and filesystems to grow to 2^63 bytes in size. +The quota system has been rewritten to support both +user and group quotas (simultaneously if desired). +Quota expiration is based on time rather than +the previous metric of number of logins over quota. +This change makes quotas more useful on fileservers +onto which users seldom login. +.PP +The system security has been greatly enhanced by the +addition of additional file flags that permit a file to be +marked as immutable or append only. 
+Once set, these flags can only be cleared by the super-user +when the system is running in insecure mode (normally, single-user). +In addition to the immutable and append-only flags, +the filesystem supports a new user-settable flag ``nodump''. +(File flags are set using the +.Xr chflags (1) +utility.) +When the flag is set on a file, +.Xr dump (8) +will omit the file from incremental backups +but retain it on full backups. +See the ``-h'' flag to +.Xr dump (8) +for details on how to change this default. +The ``nodump'' flag is usually set on core dumps, +system crash dumps, and object files generated by the compiler. +Note that the flag is not preserved when files are copied, +so an installed copy of an object file will still be backed up. +.PP +The filesystem format used in \*(4B has several additions. +Directory entries have an additional field, +.Pn d_type , +that identifies the type of the entry +(normally found in the +.Pn st_mode +field of the +.Pn stat +structure). +This field is particularly useful for identifying +directories without the need to use +.Xr stat (2). +.PP +Short (less than sixty byte) symbolic links are now stored +in the inode itself rather than in a separate data block. +This saves disk space and makes access to symbolic links faster. +Short symbolic links are not given a special type, +so a user-level application is unaware of their special treatment. +Unlike pre-\*(4B systems, symbolic links do +not have an owner, group, access mode, times, etc. +Instead, these attributes are taken from the directory that contains the link. +The only attributes returned by +.Xr lstat (2) +that refer to the symbolic link itself are the file type (S_IFLNK), +size, blocks, and link count (always 1). +.PP +An implementation of an auto-mounter daemon, +.Xr amd , +was contributed by Jan-Simon Pendry of the +Imperial College of Science, Technology & Medicine. +See the document ``AMD \- The 4.4BSD Automounter'' (SMM:13) +for further information.
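.PP
Returning to the symbolic-link attributes described above,
the file type is one of the few fields that lstat(2) reports for the
link itself, so it is the reliable one to test.
The following is a minimal sketch; the helper name is_symlink is
hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Return 1 if path names a symbolic link, 0 if it names anything else,
 * and -1 if lstat(2) fails (for example, the path does not exist).
 * (is_symlink is a hypothetical helper for illustration.)
 */
int
is_symlink(const char *path)
{
	struct stat sb;

	if (lstat(path, &sb) < 0)
		return -1;
	/* S_ISLNK tests the S_IFLNK file type bits in st_mode. */
	return S_ISLNK(sb.st_mode) ? 1 : 0;
}
```

Code that needs the owner or mode of a link should not rely on the
values returned here, since on \*(4B they come from the containing directory.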
+.PP +The directory +.Pn /dev/fd +contains special files +.Pn 0 +through +.Pn 63 +that, when opened, duplicate the corresponding file descriptor. +The names +.Pn /dev/stdin , +.Pn /dev/stdout +and +.Pn /dev/stderr +refer to file descriptors 0, 1 and 2, respectively. +See +.Xr fd (4) +and +.Xr mount_fdesc (8) +for more information. +.Sh 4 "POSIX terminal driver changes" +.PP +The \*(4B system uses the IEEE P1003.1 (POSIX.1) terminal interface +rather than the previous \*(Bs terminal interface. +The terminal driver is similar to the System V terminal driver +with the addition of the necessary extensions to get the +functionality previously available in the \*(Ps terminal driver. +Both the old +.Xr ioctl +calls and old options to +.Xr stty (1) +are emulated. +This emulation is expected to be unavailable in many vendors' releases, +so conversion to the new interface is encouraged. +.PP +\*(4B also adds the IEEE Std1003.1 job control interface, +which is similar to the \*(Ps job control interface, +but adds a security model that was missing in the +\*(Ps job control implementation. +A new system call, +.Fn setsid , +creates a job-control session consisting of a single process +group with one member, the caller, that becomes a session leader. +Only a session leader may acquire a controlling terminal. +This is done explicitly via a +.Sm TIOCSCTTY +.Fn ioctl +call, not implicitly by an +.Fn open +call. +The call fails if the terminal is in use. +Programs that allocate controlling terminals (or pseudo-terminals) +require change to work in this environment. +The version of +.Xr xterm +provided in the X11R5 release includes the necessary changes. +New library routines are available for allocating and initializing +pseudo-terminals and other terminals as controlling terminals; see +.Pn /usr/src/lib/libutil/pty.c +and +.Pn /usr/src/lib/libutil/login_tty.c . +.PP +The POSIX job control model formalizes the previous conventions +used in setting up a process group.
+Unfortunately, this requires that changes be made in a defined order +and with some synchronization that were not necessary in the past. +Older job control shells (csh, ksh) will generally not operate correctly +with the new system. +.PP +Most of the other kernel interfaces have been changed to correspond +with the POSIX.1 interface, although that work is not complete. +See the relevant manual pages and the IEEE POSIX standard. +.Sh 4 "Native operating system compatibility" +.PP +Both the HP300 and SPARC ports feature the ability to run binaries +built for the native operating system (HP-UX or SunOS) by emulating +their system calls. +Building an HP300 kernel with the HPUXCOMPAT and COMPAT_OHPUX options +or a SPARC kernel with the COMPAT_SUNOS option will enable this feature +(on by default in the generic kernel provided in the root filesystem image). +Though this native operating system compatibility was provided by the +developers as needed for their purposes and is by no means complete, +it is complete enough to run several non-trivial applications including +those that require HP-UX or SunOS shared libraries. +For example, the vendor supplied X11 server and windowing environment +can be used on both the HP300 and SPARC. +.PP +It is important to remember that merely copying over a native binary +and executing it (or executing it directly across NFS) does not imply +that it will run. +All but the most trivial of applications are likely to require access +to auxiliary files that do not exist under \*(4B (e.g. +.Pn /etc/ld.so.cache ) +or have a slightly different format (e.g. +.Pn /etc/passwd ). +However, by using system call tracing and +through creative use of symlinks, +many problems can be tracked down and corrected. +.PP +The DECstation port also has code for ULTRIX emulation +(kernel option ULTRIXCOMPAT, not compiled into the generic kernel) +but it was used primarily for initially bootstrapping the port and +has not been used since. 
+Hence, some work may be required to make it generally useful. +.Sh 3 "Changes to the utilities" +.PP +We have been tracking the IEEE Std1003.2 shell and utility work +and have included prototypes of many of the proposed utilities +based on draft 12 of the POSIX.2 Shell and Utilities document. +Because most of the traditional utilities have been replaced +with implementations conformant to the POSIX standards, +you should realize that the utility software may not be as stable, +reliable or well documented as in traditional Berkeley releases. +In particular, almost the entire manual suite has been rewritten to +reflect the POSIX defined interfaces, and in some instances +it does not correctly reflect the current state of the software. +It is also worth noting that, in rewriting this software, we have generally +been rewarded with significant performance improvements. +Most of the libraries and header files have been converted +to be compliant with ANSI C. +The shipped compiler (gcc) is a superset of ANSI C, +but supports traditional C as a command-line option. +The system libraries and utilities all compile +with either ANSI or traditional C. +.Sh 4 "Make and Makefiles" +.PP +This release uses a completely new version of the +.Xr make +program derived from the +.Xr pmake +program developed by the Sprite project at Berkeley. +It supports existing makefiles, although certain incorrect makefiles +may fail. +The makefiles for the \*(4B sources make extensive use of the new +facilities, especially conditionals and file inclusion, and are thus +completely incompatible with older versions of +.Xr make +(but nearly all the makefiles are now trivial!). +The standard include files for +.Xr make +are in +.Pn /usr/share/mk . +There is a +.Pn bsd.README +file in +.Pn /usr/src/share/mk . +.PP +Another global change supported by the new +.Xr make +is designed to allow multiple architectures to share a copy of the sources. 
+If a subdirectory named +.Pn obj +is present in the current directory, +.Xr make +descends into that directory and creates all object and other files there. +We use this feature by building a directory hierarchy in +.Pn /var/obj +that parallels +.Pn /usr/src . +We then create the +.Pn obj +subdirectories in +.Pn /usr/src +as symbolic links to the corresponding directories in +.Pn /var/obj . +(This step is automated. +The command ``make obj'' in +.Pn /usr/src +builds both the local symlink and the shadow directory, +using +.Pn /usr/obj , +which may be a symbolic link, as the root of the shadow tree. +The use of +.Pn /usr/obj +is for historic reasons only, and the system make configuration files in +.Pn /usr/share/mk +can trivially be modified to use +.Pn /var/obj +instead.) +We have one +.Pn /var/obj +hierarchy on the local system, and another on each +system that shares the source filesystem. +All the sources in +.Pn /usr/src +except for +.Pn /usr/src/contrib +and portions of +.Pn /usr/src/old +have been converted to use the new make and +.Pn obj +subdirectories; +this change allows compilation for multiple +architectures from the same source tree +(which may be mounted read-only). +.Sh 4 "Kerberos" +.PP +The Kerberos authentication system designed by MIT (version 5) +is included in this release. +See +.Xr kerberos (8) +for a general introduction. +Pluggable Authentication Modules (PAM) can use Kerberos +at the system administrator's discretion. +If it is configured, +applications such as +.Xr login (1), +.Xr passwd (1), +.Xr ftp (1) +and +.Xr ssh (1) +can use it automatically. +Each system needs the file +.Pn /etc/krb5.conf +to set its realm and local servers, +and a private key stored in +.Pn /etc/krb5.keytab +(see +.Xr ktutil (8)). +The Kerberos server should be set up on a single, +physically secure, +server machine. +Users and hosts may be added and modified with +.Xr kadmin (8).
+.PP +Note that the password-changing program +.Xr passwd (1) +can change the Kerberos password, +if configured by the administrator using PAM. +The +.Li \-l +option to +.Xr passwd (1) +changes the ``local'' password if one exists. +.Sh 4 "Timezone support" +.PP +The timezone conversion code in the C library uses data files installed in +.Pn /usr/share/zoneinfo +to convert from ``GMT'' to various timezones. The data file for the default +timezone for the system should be copied to +.Pn /etc/localtime . +Other timezones can be selected by setting the TZ environment variable. +.PP +The data files initially installed in +.Pn /usr/share/zoneinfo +include corrections for leap seconds since the beginning of 1970. +Thus, they assume that the +kernel will increment the time at a constant rate during a leap second; +that is, time just keeps on ticking. The conversion routines will then +name a leap second 23:59:60. For purists, this effectively means that +the kernel maintains TAI (International Atomic Time) rather than UTC +(Coordinated Universal Time, aka GMT). +.PP +For systems that run current NTP (Network Time Protocol) implementations +or that wish to conform to the letter of the POSIX.1 law, it is possible +to rebuild the timezone data files so that leap seconds are not counted. +(NTP causes the time to jump over a leap second, and POSIX effectively +requires the clock to be reset by hand when a leap second occurs. +In this mode, the kernel effectively runs UTC rather than TAI.) +.PP +The data files without leap second information +are constructed from the source directory, +.Pn /usr/src/share/zoneinfo . 
+Change the variable REDO in Makefile +from ``right'' to ``posix'', and then do +.DS +make obj (if necessary) +make +make install +.DE +.PP +You will then need to copy the correct default zone file to +.Pn /etc/localtime , +as the old one would still have used leap seconds, and because the Makefile +installs a default +.Pn /etc/localtime +each time ``make install'' is done. +.PP +It is possible to install both sets of timezone data files. This results +in subdirectories +.Pn /usr/share/zoneinfo/right +and +.Pn /usr/share/zoneinfo/posix . +Each contains a complete set of zone files. +See +.Pn /usr/src/share/zoneinfo/Makefile +for details. +.Sh 4 "Additions and changes to the libraries" +.PP +Notable additions to the libraries include functions to traverse a +filesystem hierarchy, database interfaces to btree and hashing functions, +a new, faster implementation of stdio, and radix and merge sort +functions. +.PP +The +.Xr fts (3) +functions will do either physical or logical traversal of +a file hierarchy as well as handle essentially infinite depth +filesystems and filesystems with cycles. +All the utilities in \*(4B that traverse file hierarchies +have been converted to use +.Xr fts (3). +The conversion has always resulted in a significant performance +gain, often of four or five to one in system time. +.PP +The +.Xr dbopen (3) +functions are intended to be a family of database access methods. +Currently, they consist of +.Xr hash (3), +an extensible, dynamic hashing scheme, +.Xr btree (3), +a sorted, balanced tree structure (B+trees), and +.Xr recno (3), +a flat-file interface for fixed or variable length records +referenced by logical record number. +Each of the access methods stores associated key/data pairs and +uses the same record oriented interface for access. +.PP +The +.Xr qsort (3) +function has been rewritten for additional performance.
+In addition, three new types of sorting functions, +.Xr heapsort (3), +.Xr mergesort (3) +and +.Xr radixsort (3) +have been added to the system. +The +.Xr mergesort +function is optimized for data with pre-existing order, +in which case it usually significantly outperforms +.Xr qsort . +The +.Xr radixsort (3) +functions are variants of most-significant-byte radix sorting. +They take time linear to the number of bytes to be +sorted, usually significantly outperforming +.Xr qsort +on data that can be sorted in this fashion. +An implementation of the POSIX 1003.2 standard +.Xr sort (1), +based on +.Xr radixsort , +is included in +.Pn /usr/src/contrib/sort . +.PP +Some additional comments about the \*(4B C library: +.IP \(bu +The floating point support in the C library has been replaced +and is now accurate. +.IP \(bu +The C functions specified by both ANSI C, POSIX 1003.1 and +1003.2 are now part of the C library. +This includes support for file name matching, shell globbing +and both basic and extended regular expressions. +.IP \(bu +ANSI C multibyte and wide character support has been integrated. +The rune functionality from the Bell Labs' Plan 9 system is provided +as well. +.IP \(bu +The +.Xr termcap (3) +functions have been generalized and replaced with a general +purpose interface named +.Xr getcap (3). +.IP \(bu +The +.Xr stdio (3) +routines have been replaced, and are usually much faster. +In addition, the +.Xr funopen (3) +interface permits applications to provide their own I/O stream +function support. +.PP +The +.Xr curses (3) +library has been largely rewritten. +Important additional features include support for scrolling and +.Xr termios (3). +.PP +An application front-end editing library, named libedit, has been +added to the system. +.PP +A superset implementation of the SunOS kernel memory interface library, +libkvm, has been integrated into the system. 
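.PP
To illustrate the fts(3) traversal functions described earlier in this
section, the following sketch performs a physical walk of a hierarchy and
counts the regular files it finds.
It assumes the conventional fts_open, fts_read, and fts_close calls from
<fts.h>; the helper name count_regular_files is hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stddef.h>

/*
 * Walk the hierarchy rooted at root without following symbolic links
 * and return the number of regular files seen, or -1 if the traversal
 * cannot be started.
 * (count_regular_files is a hypothetical helper for illustration.)
 */
long
count_regular_files(const char *root)
{
	char *paths[2];
	FTS *tree;
	FTSENT *ent;
	long count = 0;

	paths[0] = (char *)root;	/* fts_open takes a NULL-terminated list */
	paths[1] = NULL;

	/* One of FTS_PHYSICAL or FTS_LOGICAL must be given. */
	tree = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
	if (tree == NULL)
		return -1;
	while ((ent = fts_read(tree)) != NULL) {
		if (ent->fts_info == FTS_F)	/* regular file */
			count++;
	}
	fts_close(tree);
	return count;
}
```

Replacing FTS_PHYSICAL with FTS_LOGICAL would follow symbolic links instead,
which corresponds to the logical traversal mentioned above.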
+.PP +.Sh 4 "Additions and changes to other utilities" +.PP +There are many new utilities, offering many new capabilities, +in \*(4B. +Skimming through the section 1 and section 8 manual pages is sure +to be useful. +The additions to the utility suite include greatly enhanced versions of +programs that display system status information, implementations of +various traditional tools described in the IEEE Std1003.2 standard, +new tools not previously available on Berkeley UNIX systems, +and many others. +Also, with only a very few exceptions, all the utilities from +\*(Ps that included proprietary source code have been replaced, +and their \*(4B counterparts are freely redistributable. +Normally, this replacement resulted in significant performance +improvements and the increase of the limits imposed on data by +the utility as well. +.PP +A summary of specific additions and changes is as follows: +.TS +lfC l. +amd An auto-mounter implementation. +ar Replacement of the historic archive format with a new one. +awk Replaced by gawk; see /usr/src/old/awk for the historic version. +bdes Utility implementing DES modes of operation described in FIPS PUB 81. +calendar Addition of an interface for system calendars. +cap_mkdb Utility for building hashed versions of termcap style databases. +cc Replacement of pcc with gcc suite. +chflags A utility for setting the per-file user and system flags. +chfn An editor based replacement for changing user information. +chpass An editor based replacement for changing user information. +chsh An editor based replacement for changing user information. +cksum The POSIX 1003.2 checksum utility; compatible with sum. +column A columnar text formatting utility. +cp POSIX 1003.2 compatible, able to copy special files. +csh Freely redistributable and 8-bit clean. +date User specified formats added. +dd New EBCDIC conversion tables, major performance improvements. +dev_mkdb Hashed interface to devices. +dm Dungeon master.
+find Several new options and primaries, major performance improvements. +fstat Utility displaying information on files open on the system. +ftpd Connection logging added. +hexdump A binary dump utility, superseding od. +id The POSIX 1003.2 user identification utility. +inetd Tcpmux added. +jot A text formatting utility. +kdump A system-call tracing facility. +ktrace A system-call tracing facility. +kvm_mkdb Hashed interface to the kernel name list. +lam A text formatting utility. +lex A new, freely redistributable, significantly faster version. +locate A database of the system files, by name, constructed weekly. +logname The POSIX 1003.2 user identification utility. +mail.local New local mail delivery agent, replacing mail. +make Replaced with a new, more powerful make, supporting include files. +man Added support for man page location configuration. +mkdep A new utility for generating make dependency lists. +mkfifo The POSIX 1003.2 FIFO creation utility. +mtree A new utility for mapping file hierarchies to a file. +nfsstat An NFS statistics utility. +nvi A freely redistributable replacement for the ex/vi editors. +pax The POSIX 1003.2 replacement for cpio and tar. +printf The POSIX 1003.2 replacement for echo. +roff Replaced by groff; see /usr/src/old/roff for the historic versions. +rs New utility for text formatting. +shar An archive building utility. +sysctl MIB-style interface to system state. +tcopy Fast tape-to-tape copying and verification. +touch Time and file reference specifications. +tput The POSIX 1003.2 terminal display utility. +tr Addition of character classes. +uname The POSIX 1003.2 system identification utility. +vis A filter for converting and displaying non-printable characters. +xargs The POSIX 1003.2 argument list constructor utility. +yacc A new, freely redistributable, significantly faster version. 
+.TE +.PP +The new versions of +.Xr lex (1) +(``flex'') and +.Xr yacc (1) +(``zoo'') should be installed early on if attempting to +cross-compile \*(4B on another system. +Note that the new +.Xr lex +program is not completely backward compatible with historic versions of +.Xr lex , +although it is believed that all documented features are supported. +.PP +The +.Xr find +utility has two new options that are important to be aware of if you +intend to use NFS. +The ``fstype'' and ``prune'' options can be used together to prevent +find from crossing NFS mount points. +See +.Pn /etc/daily +for an example of their use. +.Sh 2 "Hints on converting from \*(Ps to \*(4B" +.PP +This section summarizes changes between +\*(Ps and \*(4B that are likely to +cause difficulty in doing the conversion. +It does not include changes in the network; +see section 5 for information on setting up the network. +.PP +Since the stat st_size field is now 64-bits instead of 32, +doing something like: +.DS +.ft CW +foo(st.st_size); +.DE +and then (improperly) defining foo with an ``int'' or ``long'' parameter: +.DS +.ft CW +foo(size) + int size; +{ + ... +} +.DE +will fail miserably (well, it might work on a little endian machine). +This problem showed up in +.Xr emacs (1) +as well as several other programs. +A related problem is improperly casting (or failing to cast) +the second argument to +.Xr lseek (2), +.Xr truncate (2), +or +.Xr ftruncate (2) +ala: +.DS +.ft CW +lseek(fd, (long)off, 0); +.DE +or +.DS +.ft CW +lseek(fd, 0, 0); +.DE +The best solution is to include +.Pn <unistd.h> +which has prototypes that catch these types of errors. +.PP +Determining the ``namelen'' parameter for a +.Xr connect (2) +call on a unix domain socket should use the ``SUN_LEN'' macro from +.Pn <sys/un.h> . +One old way that was used: +.DS +.ft CW +addrlen = strlen(unaddr.sun_path) + sizeof(unaddr.sun_family); +.DE +no longer works as there is an additional +.Pn sun_len +field. 
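The address-length calculation described above can be sketched in C as follows. This is a minimal illustration, assuming a current POSIX-like system: the helper name unix_addr_init is hypothetical (it is not part of the distribution), socklen_t postdates \*(4B, and the fallback definition of SUN_LEN is supplied only for systems whose <sys/un.h> hides the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fallback for systems whose <sys/un.h> does not expose SUN_LEN (assumption). */
#ifndef SUN_LEN
#define SUN_LEN(su) \
	(offsetof(struct sockaddr_un, sun_path) + strlen((su)->sun_path))
#endif

/*
 * Hypothetical helper: fill in *sa for a unix domain socket at `path'
 * and return the length to pass as the ``namelen'' argument of
 * connect(2).  Returns 0 if the path does not fit in sun_path.
 */
static socklen_t
unix_addr_init(struct sockaddr_un *sa, const char *path)
{
	assert(sa != NULL && path != NULL);
	if (strlen(path) >= sizeof(sa->sun_path))
		return 0;
	memset(sa, 0, sizeof(*sa));
	sa->sun_family = AF_UNIX;
	strcpy(sa->sun_path, path);
	/* SUN_LEN accounts for every field that precedes sun_path. */
	return (socklen_t)SUN_LEN(sa);
}
```

A caller would then use something like connect(s, (struct sockaddr *)&sa, len), where len is the value returned by the helper; computing the length from the offset of sun_path avoids hard-coding which fields precede it.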
+.PP +The kernel's limit on the number of open files has been +increased from 20 to 64. +It is now possible to change this limit almost arbitrarily. +The standard I/O library +autoconfigures to the kernel limit. +Note that file (``_iob'') entries may be allocated by +.Xr malloc +from +.Xr fopen ; +this allocation has been known to cause problems with programs +that use their own memory allocators. +Memory allocation does not occur until after 20 files have been opened +by the standard I/O library. +.PP +.Xr Select +can be used with more than 32 descriptors +by using arrays of \fBint\fPs for the bit fields rather than single \fBint\fPs. +Programs that used +.Xr getdtablesize +as their first argument to +.Xr select +will no longer work correctly. +Usually the program can be modified to correctly specify the number +of bits in an \fBint\fP. +Alternatively the program can be modified to use an array of \fBint\fPs. +There is a set of macros available in +.Pn <sys/types.h> +to simplify this. +See +.Xr select (2). +.PP +Old core files will not be intelligible to the current debuggers +because of numerous changes to the user structure +and because the kernel stack has been enlarged. +The +.Xr a.out +header that was in the user structure is no longer present. +Locally-written debuggers that try to check the magic number +will need to be changed. +.PP +Files may not be deleted from directories having the ``sticky'' (ISVTX) bit +set in their modes +except by the owner of the file or of the directory, or by the superuser. +This is primarily to protect users' files in publicly-writable directories +such as +.Pn /tmp +and +.Pn /var/tmp . +All publicly-writable directories should have their ``sticky'' bits set +with ``chmod +t.'' +.PP +The following two sections contain additional notes about +changes in \*(4B that affect the installation of local files; +be sure to read them as well. 
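As an illustration of the select(2) conversion hint given earlier in this section, the following sketch uses the descriptor-set macros instead of a single int bit mask. It is only an example under stated assumptions: the helper name fd_readable_now is hypothetical, and on current systems the macros are usually obtained from <sys/select.h> rather than <sys/types.h> alone.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll a single descriptor without blocking.
 * Returns 1 if fd is readable now, 0 if it is not, -1 on error.
 */
static int
fd_readable_now(int fd)
{
	fd_set rset;
	struct timeval tv = { 0, 0 };	/* zero timeout: do not block */
	int n;

	assert(fd >= 0 && fd < FD_SETSIZE);
	FD_ZERO(&rset);
	FD_SET(fd, &rset);
	/* First argument is the highest descriptor of interest plus one. */
	n = select(fd + 1, &rset, NULL, NULL, &tv);
	if (n < 0)
		return -1;
	return FD_ISSET(fd, &rset) ? 1 : 0;
}
```

Passing fd + 1 rather than the result of getdtablesize keeps the call correct however large the descriptor table is, provided the descriptor itself is below FD_SETSIZE.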
diff --git a/share/doc/smm/01.setup/4.t b/share/doc/smm/01.setup/4.t new file mode 100644 index 0000000..d26dac7 --- /dev/null +++ b/share/doc/smm/01.setup/4.t @@ -0,0 +1,713 @@ +.\" Copyright (c) 1980, 1986, 1988 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)4.t 8.1 (Berkeley) 7/29/93 +.\" +.ds LH "Installing/Operating \*(4B +.ds CF \*(Dy +.ds RH "System setup +.Sh 1 "System setup" +.PP +This section describes procedures used to set up a \*(4B UNIX system. +These procedures are used when a system is first installed +or when the system configuration changes. Procedures for normal +system operation are described in the next section. +.Sh 2 "Kernel configuration" +.PP +This section briefly describes the layout of the kernel code and +how files for devices are made. +For a full discussion of configuring +and building system images, consult the document ``Building +4.3BSD UNIX Systems with Config'' (SMM:2). +.Sh 3 "Kernel organization" +.PP +As distributed, the kernel source is in a +separate tar image. The source may be physically +located anywhere within any filesystem so long as +a symbolic link to the location is created for the file +.Pn /sys +(many files in +.Pn /usr/include +are normally symbolic links relative to +.Pn /sys ). +In further discussions of the system source all path names +will be given relative to +.Pn /sys . +.LP +The kernel is made up of several large generic parts: +.TS +l l l. 
+sys main kernel header files +kern kernel functions broken down as follows + init system startup, syscall dispatching, entry points + kern scheduling, descriptor handling and generic I/O + sys process management, signals + tty terminal handling and job control + vfs filesystem management + uipc interprocess communication (sockets) + subr miscellaneous support routines +vm virtual memory management +ufs local filesystems broken down as follows + ufs common local filesystem routines + ffs fast filesystem + lfs log-based filesystem + mfs memory based filesystem +nfs Sun-compatible network filesystem +miscfs miscellaneous filesystems broken down as follows + deadfs where rejected vnodes go to die + fdesc access to per-process file descriptors + fifofs IEEE Std1003.1 FIFOs + kernfs filesystem access to kernel data structures + lofs loopback filesystem + nullfs another loopback filesystem + portal associate processes with filesystem locations + specfs device special files + umapfs provide alternate uid/gid mappings +dev generic device drivers (SCSI, vnode, concatenated disk) +.TE +.LP +The networking code is organized by protocol +.TS +l l. +net routing and generic interface drivers +netinet Internet protocols (TCP, UDP, IP, etc) +netiso ISO protocols (TP-4, CLNP, CLTP, etc) +netns Xerox network systems protocols (IDP, SPP, etc) +netx25 CCITT X.25 protocols (X.25 Packet Level, HDLC/LAPB) +.TE +.LP +A separate subdirectory is provided for each machine architecture +.TS +l l. 
+hp300 HP 9000/300 series of Motorola 68000-based machines +hp code common to both HP 68k and (non-existent) PA-RISC ports +i386 Intel 386/486-based PC machines +luna68k Omron 68000-based workstations +news3400 Sony News MIPS-based workstations +pmax Digital 3100/5000 MIPS-based workstations +sparc Sun Microsystems SPARCstation 1, 1+, and 2 +tahoe (deprecated) CCI Power 6-series machines +vax (deprecated) Digital VAX machines +.TE +.LP +Each machine directory is subdivided by function; +for example the hp300 directory contains +.TS +l l. +include exported machine-dependent header files +hp300 machine-dependent support code and private header files +dev device drivers +conf configuration files +stand machine-dependent standalone code +.TE +.LP +Other kernel related directories +.TS +l l. +compile area to compile kernels +conf machine-independent configuration files +stand machine-independent standalone code +.TE +.Sh 3 "Devices and device drivers" +.PP +Devices supported by UNIX are implemented in the kernel +by drivers whose source is kept in +.Pn /sys/<architecture>/dev . +These drivers are loaded +into the system when included in a cpu specific configuration file +kept in the conf directory. Devices are accessed through special +files in the filesystem, made by the +.Xr mknod (8) +program and normally kept in the +.Pn /dev +directory. +For all the devices supported by the distribution system, the +files in +.Pn /dev +are created by the +.Pn /dev/MAKEDEV +shell script. +.PP +Determine the set of devices that you have and create a new +.Pn /dev +directory by running the MAKEDEV script. +First create a new directory +.Pn /newdev , +copy MAKEDEV into it, edit the file MAKEDEV.local +to provide an entry for local needs, +and run it to generate a +.Pn /newdev directory. 
+For instance, +.DS +\fB#\fP \fIcd /\fP +\fB#\fP \fImkdir newdev\fP +\fB#\fP \fIcp dev/MAKEDEV newdev/MAKEDEV\fP +\fB#\fP \fIcd newdev\fP +\fB#\fP \fIMAKEDEV \*(Dk0 pt0 std LOCAL\fP +.DE +Note the ``std'' argument causes standard devices such as +.Pn /dev/console , +the machine console, to be created. +.PP +You can then do +.DS +\fB#\fP \fIcd /\fP +\fB#\fP \fImv dev olddev ; mv newdev dev\fP +\fB#\fP \fIsync\fP +.DE +to install the new device directory. +.Sh 3 "Building new system images" +.PP +The kernel configuration of each UNIX system is described by +a single configuration file, stored in the +.Pn /sys/<architecture>/conf +directory. +To learn about the format of this file and the procedure used +to build system images, +start by reading ``Building 4.3BSD UNIX Systems with Config'' (SMM:2), +look at the manual pages in section 4 +of the UNIX manual for the devices you have, +and look at the sample configuration files in the +.Pn /sys/<architecture>/conf +directory. +.PP +The configured system image +.Pn kernel +should be copied to the root, and then booted to try it out. +It is best to name it +.Pn /newkernel +so as not to destroy the working system until you are sure it does work: +.DS +\fB#\fP \fIcp kernel /newkernel\fP +\fB#\fP \fIsync\fP +.DE +It is also a good idea to keep the previous system around under some other +name. In particular, we recommend that you save the generic distribution +version of the system permanently as +.Pn /genkernel +for use in emergencies. +To boot the new version of the system you should follow the +bootstrap procedures outlined in section 6.1. +After having booted and tested the new system, it should be installed as +.Pn /kernel +before going into multiuser operation. +A systematic scheme for numbering and saving old versions +of the system may be useful. 
+.Sh 2 "Configuring terminals" +.PP +If UNIX is to support simultaneous +access from directly-connected terminals other than the console, +the file +.Pn /etc/ttys +(see +.Xr ttys (5)) +must be edited. +.PP +To add a new terminal device, be sure the device is configured into the system +and that the special files for the device have been made by +.Pn /dev/MAKEDEV . +Then, enable the appropriate lines of +.Pn /etc/ttys +by setting the ``status'' +field to \fBon\fP (or add new lines). +Note that lines in +.Pn /etc/ttys +are one-for-one with entries in the file of current users +(see +.Pn /var/run/utmp ), +and therefore it is best to make changes +while running in single-user mode +and to add all the entries for a new device at once. +.PP +Each line in the +.Pn /etc/ttys +file is broken into four tab separated +fields (comments are shown by a `#' character and extend to +the end of the line). For each terminal line the four fields +are: +the device (without a leading +.Pn /dev ), +the program +.Pn /sbin/init +should startup to service the line +(or \fBnone\fP if the line is to be left alone), +the terminal type (found in +.Pn /usr/share/misc/termcap ), +and optional status information describing if the terminal is +enabled or not and if it is ``secure'' (i.e. the super user should +be allowed to login on the line). +If the console is marked as ``insecure'', +then the root password is required to bring the machine up single-user. +All fields are character strings +with entries requiring embedded white space enclosed in double +quotes. +Thus a newly added terminal +.Pn /dev/tty00 +could be added as +.DS +tty00 "/usr/libexec/getty std.9600" vt100 on secure # mike's office +.DE +The std.9600 parameter provided to +.Pn /usr/libexec/getty +is used in searching the file +.Pn /etc/gettytab ; +it specifies a terminal's characteristics (such as baud rate). +To make custom terminal types, consult +.Xr gettytab (5) +before modifying +.Pn /etc/gettytab . 
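The four-field layout of an /etc/ttys line described above can be sketched with a small splitter. This is a simplified illustration, not the parser used by the system: the name ttys_split is hypothetical, blanks as well as tabs are accepted as separators, and the whole status field (for example ``on secure'') is returned as one string.

```c
#include <assert.h>
#include <string.h>

#define TTYS_MAXFIELDS 4

/*
 * Hypothetical helper: split one ttys-style line in place.
 * A field may be enclosed in double quotes, `#' starts a comment,
 * and the fourth field takes the rest of the line.
 * Returns the number of fields found (0 to 4).
 */
static int
ttys_split(char *line, char *field[TTYS_MAXFIELDS])
{
	char *p = line;
	char c;
	int n = 0;

	assert(line != NULL && field != NULL);
	while (n < TTYS_MAXFIELDS) {
		while (*p == ' ' || *p == '\t')
			p++;
		if (*p == '\0' || *p == '#' || *p == '\n')
			break;
		if (n == TTYS_MAXFIELDS - 1) {
			/* Status field: up to the comment, trailing blanks trimmed. */
			field[n++] = p;
			p += strcspn(p, "#\n");
			while (p > field[n - 1] && (p[-1] == ' ' || p[-1] == '\t'))
				p--;
			*p = '\0';
			break;
		}
		if (*p == '"') {
			field[n++] = ++p;		/* skip the opening quote */
			p += strcspn(p, "\"\n");
		} else {
			field[n++] = p;
			p += strcspn(p, " \t#\n");
		}
		if (*p == '\0')
			break;
		c = *p;
		*p++ = '\0';			/* terminate this field */
		if (c == '#' || c == '\n')
			break;
	}
	return n;
}
```

Applied to the tty00 entry shown above, the sketch yields the device, the quoted getty command with its embedded space, the terminal type, and the status string.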
+.PP +Dialup terminals should be wired so that carrier is asserted only when the +phone line is dialed up. +For non-dialup terminals, from which modem control is not available, +you must wire back the signals so that +the carrier appears to always be present. For further details, +find your terminal driver in section 4 of the manual. +.PP +For network terminals (i.e. pseudo terminals), no program should +be started up on the lines. Thus, the normal entry in +.Pn /etc/ttys +would look like +.DS +ttyp0 none network +.DE +(Note, the fourth field is not needed here.) +.PP +When the system is running multi-user, all terminals that are listed in +.Pn /etc/ttys +as \fBon\fP have their line enabled. +If, during normal operations, you wish +to disable a terminal line, you can edit the file +.Pn /etc/ttys +to change the terminal's status to \fBoff\fP and +then send a hangup signal to the +.Xr init +process, by doing +.DS +\fB#\fP \fIkill \-1 1\fP +.DE +Terminals can similarly be enabled by changing the status field +from \fBoff\fP to \fBon\fP and sending a hangup signal to +.Xr init . +.PP +Note that if a special file is inaccessible when +.Xr init +tries to create a process for it, +.Xr init +will log a message to the +system error logging process (see +.Xr syslogd (8)) +and try to reopen the terminal every minute, reprinting the warning +message every 10 minutes. Messages of this sort are normally +printed on the console, though other actions may occur depending +on the configuration information found in +.Pn /etc/syslog.conf . +.PP +Finally note that you should change the names of any dialup +terminals to ttyd? +where ? is in [0-9a-zA-Z], as some programs use this property of the +names to determine if a terminal is a dialup. +Shell commands to do this should be put in the +.Pn /dev/MAKEDEV.local +script. 
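The naming convention for dialup terminals can be expressed as a short check. This is a sketch only: the function name is_dialup_name is hypothetical, and the text does not say how the programs in question actually test the name, so the exact rule here (``ttyd'' followed by one character in [0-9a-zA-Z], in the C locale) is an assumption drawn from the description above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `name' (without the leading /dev/)
 * follows the ttyd? dialup convention, 0 otherwise.
 */
static int
is_dialup_name(const char *name)
{
	assert(name != NULL);
	return strlen(name) == 5 &&
	    strncmp(name, "ttyd", 4) == 0 &&
	    isalnum((unsigned char)name[4]) != 0;
}
```

With this rule ttyd0 and ttydZ are treated as dialups, while ttyp0 (a pseudo terminal) and tty00 are not.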
+.PP +While it is possible to use truly arbitrary strings for terminal names, +the accounting and noticeably the +.Xr ps (1) +command make good use of the convention that tty names +(by default, and also after dialups are named as suggested above) +are distinct in the last 2 characters. +Change this and you may be sorry later, as the heuristic +.Xr ps (1) +uses based on these conventions will then break down and +.Xr ps +will run MUCH slower. +.Sh 2 "Adding users" +.PP +The procedure for adding a new user is described in +.Xr adduser (8). +You should add accounts for the initial user community, giving +each a directory and a password, and putting users who will wish +to share software in the same groups. +.PP +Several guest accounts have been provided on the distribution +system; these accounts are for people at Berkeley, +Bell Laboratories, and others +who have done major work on UNIX in the past. You can delete these accounts, +or leave them on the system if you expect that these people would have +occasion to login as guests on your system. +.Sh 2 "Site tailoring" +.PP +All programs that require the site's name, or some similar +characteristic, obtain the information through system calls +or from files located in +.Pn /etc . +Aside from parts of the +system related to the network, to tailor the system to your +site you must simply select a site name, then edit the file +.DS +/etc/netstart +.DE +The first lines in +.Pn /etc/netstart +use a variable to set the hostname, +.DS +hostname=\fImysitename\fP +/bin/hostname $hostname +.DE +to define the value returned by the +.Xr gethostname (2) +system call. If you are running the name server, your site +name should be your fully qualified domain name. Programs such as +.Xr getty (8), +.Xr mail (1), +.Xr wall (1), +and +.Xr uucp (1) +use this system call so that the binary images are site +independent. +.PP +You will also need to edit +.Pn /etc/netstart +to do the network interface initialization using +.Xr ifconfig (8). 
+If you are not sure how to do this, see sections 5.1, 5.2, and 5.3. +If you are not running a routing daemon and have +more than one Ethernet in your environment +you will need to set up a default route; +see section 5.4 for details. +Before bringing your system up multiuser, +you should ensure that the networking is properly configured. +The network is started by running +.Pn /etc/netstart . +Once started, you should test connectivity using +.Xr ping (8). +You should first test connectivity to yourself, +then another host on your Ethernet, +and finally a host on another Ethernet. +The +.Xr netstat (8) +program can be used to inspect and debug +your routes; see section 5.4. +.Sh 2 "Setting up the line printer system" +.PP +The line printer system consists of at least +the following files and commands: +.DS +.TS +l l. +/usr/bin/lpq spooling queue examination program +/usr/bin/lprm program to delete jobs from a queue +/usr/bin/lpr program to enter a job in a printer queue +/etc/printcap printer configuration and capability database +/usr/sbin/lpd line printer daemon, scans spooling queues +/usr/sbin/lpc line printer control program +/etc/hosts.lpd list of hosts allowed to use the printers +.TE +.DE +.PP +The file +.Pn /etc/printcap +is a master database describing line +printers directly attached to a machine and, also, printers +accessible across a network. The manual page +.Xr printcap (5) +describes the format of this database and also +shows the default values for such things as the directory +in which spooling is performed. The line printer system handles +multiple printers, multiple spooling queues, local and remote +printers, and also printers attached via serial lines that require +line initialization such as the baud rate. Raster output devices +such as a Varian or Versatec, and laser printers such as an Imagen, +are also supported by the line printer system. 
+.PP +Remote spooling via the network is handled with two spooling +queues, one on the local machine and one on the remote machine. +When a remote printer job is started with +.Xr lpr , +the job is queued locally and a daemon process created to oversee the +transfer of the job to the remote machine. If the destination +machine is unreachable, the job will remain queued until it is +possible to transfer the files to the spooling queue on the +remote machine. The +.Xr lpq +program shows the contents of spool +queues on both the local and remote machines. +.PP +To configure your line printers, consult the printcap manual page +and the accompanying document, ``4.3BSD Line Printer Spooler Manual'' (SMM:7). +A call to the +.Xr lpd +program should be present in +.Pn /etc/rc . +.Sh 2 "Setting up the mail system" +.PP +The mail system consists of the following commands: +.DS +.TS +l l. +/usr/bin/mail UCB mail program, described in \fImail\fP\|(1) +/usr/sbin/sendmail mail routing program +/var/spool/mail mail spooling directory +/var/spool/secretmail secure mail directory +/usr/bin/xsend secure mail sender +/usr/bin/xget secure mail receiver +/etc/aliases mail forwarding information +/usr/bin/newaliases command to rebuild binary forwarding database +/usr/bin/biff mail notification enabler +/usr/libexec/comsat mail notification daemon +.TE +.DE +Mail is normally sent and received using the +.Xr mail (1) +command (found in +.Pn /usr/bin/mail ), +which provides a front-end to edit the messages sent +and received, and passes the messages to +.Xr sendmail (8) +for routing. +The routing algorithm uses knowledge of the network name syntax, +aliasing and forwarding information, and network topology, as +defined in the configuration file +.Pn /usr/lib/sendmail.cf , +to process each piece of mail. 
+Local mail is delivered by giving it to the program +.Pn /usr/libexec/mail.local +that adds it to the mailboxes in the directory +.Pn /var/spool/mail/<username> , +using a locking protocol to avoid problems with simultaneous updates. +After the mail is delivered, the local mail delivery daemon +.Pn /usr/libexec/comsat +is notified, which in turn notifies users who have issued a +``\fIbiff\fP y'' command that mail has arrived. +.PP +Mail queued in the directory +.Pn /var/spool/mail +is normally readable only by the recipient. +To send mail that is secure against perusal +(except by a code-breaker) you should use the secret mail facility, +which encrypts the mail. +.PP +To set up the mail facility you should read the instructions in the +file READ_ME in the directory +.Pn /usr/src/usr.sbin/sendmail +and then adjust the necessary configuration files. +You should also set up the file +.Pn /etc/aliases +for your installation, creating mail groups as appropriate. +For more information see +``Sendmail Installation and Operation Guide'' (SMM:8) and +``Sendmail \- An Internetwork Mail Router'' (SMM:9). +.Sh 3 "Setting up a UUCP connection" +.LP +The version of +.Xr uucp +included in \*(4B has the following features: +.IP \(bu 3 +support for many auto call units and dialers +in addition to the DEC DN11, +.IP \(bu 3 +breakup of the spooling area into multiple subdirectories, +.IP \(bu 3 +addition of an +.Pn L.cmds +file to control the set +of commands that may be executed by a remote site, +.IP \(bu 3 +enhanced ``expect-send'' sequence capabilities when +logging in to a remote site, +.IP \(bu 3 +new commands to be used in polling sites and +obtaining snap shots of +.Xr uucp +activity, +.IP \(bu 3 +additional protocols for different communication media. +.LP +This section gives a brief overview of +.Xr uucp +and points out the most important steps in its installation. 
+.PP +To connect two UNIX machines with a +.Xr uucp +network link using modems, +one site must have an automatic call unit +and the other must have a dialup port. +It is better if both sites have both. +.PP +You should first read the paper in the UNIX System Manager's Manual: +``Uucp Implementation Description'' (SMM:14). +It describes in detail the file formats and conventions, +and will give you a little context. +In addition, +the document ``setup.tblms'', +located in the directory +.Pn /usr/src/usr.bin/uucp/UUAIDS , +may be of use in tailoring the software to your needs. +.PP +The +.Xr uucp +support is located in three major directories: +.Pn /usr/bin , +.Pn /usr/lib/uucp , +and +.Pn /var/spool/uucp . +User commands are kept in +.Pn /usr/bin , +operational commands in +.Pn /usr/lib/uucp , +and +.Pn /var/spool/uucp +is used as a spooling area. +The commands in +.Pn /usr/bin +are: +.DS +.TS +l l. +/usr/bin/uucp file-copy command +/usr/bin/uux remote execution command +/usr/bin/uusend binary file transfer using mail +/usr/bin/uuencode binary file encoder (for \fIuusend\fP) +/usr/bin/uudecode binary file decoder (for \fIuusend\fP) +/usr/bin/uulog scans session log files +/usr/bin/uusnap gives a snap-shot of \fIuucp\fP activity +/usr/bin/uupoll polls remote system until an answer is received +/usr/bin/uuname prints a list of known uucp hosts +/usr/bin/uuq gives information about the queue +.TE +.DE +The important files and commands in +.Pn /usr/lib/uucp +are: +.DS +.TS +l l. 
+/usr/lib/uucp/L-devices list of dialers and hard-wired lines +/usr/lib/uucp/L-dialcodes dialcode abbreviations +/usr/lib/uucp/L.aliases hostname aliases +/usr/lib/uucp/L.cmds commands remote sites may execute +/usr/lib/uucp/L.sys systems to communicate with, how to connect, and when +/usr/lib/uucp/SEQF sequence numbering control file +/usr/lib/uucp/USERFILE remote site pathname access specifications +/usr/lib/uucp/uucico \fIuucp\fP protocol daemon +/usr/lib/uucp/uuclean cleans up garbage files in spool area +/usr/lib/uucp/uuxqt \fIuucp\fP remote execution server +.TE +.DE +while the spooling area contains the following important files and directories: +.DS +.TS +l l. +/var/spool/uucp/C. directory for command, ``C.'' files +/var/spool/uucp/D. directory for data, ``D.'', files +/var/spool/uucp/X. directory for command execution, ``X.'', files +/var/spool/uucp/D.\fImachine\fP directory for local ``D.'' files +/var/spool/uucp/D.\fImachine\fPX directory for local ``X.'' files +/var/spool/uucp/TM. directory for temporary, ``TM.'', files +/var/spool/uucp/LOGFILE log file of \fIuucp\fP activity +/var/spool/uucp/SYSLOG log file of \fIuucp\fP file transfers +.TE +.DE +.PP +To install +.Xr uucp +on your system, +start by selecting a site name +(shorter than 14 characters). +A +.Xr uucp +account must be created in the password file and a password set up. +Then, +create the appropriate spooling directories with mode 755 +and owned by user +.Xr uucp , +group \fIdaemon\fP. +.PP +If you have an auto-call unit, +the L.sys, L-dialcodes, and L-devices files should be created. +The L.sys file should contain +the phone numbers and login sequences +required to establish a connection with a +.Xr uucp +daemon on another machine. 
+For example, our L.sys file looks something like: +.DS +adiron Any ACU 1200 out0123456789- ogin-EOT-ogin uucp +cbosg Never Slave 300 +cbosgd Never Slave 300 +chico Never Slave 1200 out2010123456 +.DE +The first field is the name of a site, +the second shows when the machine may be called, +the third field specifies how the host is connected +(through an ACU, a hard-wired line, etc.), +then comes the phone number to use in connecting through an auto-call unit, +and finally a login sequence. +The phone number +may contain common abbreviations that are defined in the L-dialcodes file. +The device specification should refer to devices +specified in the L-devices file. +Listing only ACU causes the +.Xr uucp +daemon, +.Xr uucico , +to search for any available auto-call unit in L-devices. +Our L-dialcodes file is of the form: +.DS +ucb 2 +out 9% +.DE +while our L-devices file is: +.DS +ACU cul0 unused 1200 ventel +.DE +Refer to the README file in the +.Xr uucp +source directory for more information about installation. +.PP +As +.Xr uucp +operates it creates (and removes) many small +files in the directories underneath +.Pn /var/spool/uucp . +Sometimes files are left undeleted; +these are most easily purged with the +.Xr uuclean +program. +The log files can grow without bound unless trimmed back; +.Xr uulog +maintains these files. +Many useful aids in maintaining your +.Xr uucp +installation are included in a subdirectory UUAIDS beneath +.Pn /usr/src/usr.bin/uucp . +Peruse this directory and read the ``setup'' instructions also located there. diff --git a/share/doc/smm/01.setup/5.t b/share/doc/smm/01.setup/5.t new file mode 100644 index 0000000..10b86dd --- /dev/null +++ b/share/doc/smm/01.setup/5.t @@ -0,0 +1,586 @@ +.\" Copyright (c) 1980, 1986, 1988, 1993 The Regents of the University of California. +.\" All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)5.t 8.1 (Berkeley) 7/27/93 +.\" +.ds lq `` +.ds rq '' +.ds LH "Installing/Operating \*(4B +.ds RH Network setup +.ds CF \*(Dy +.Sh 1 "Network setup" +.PP +\*(4B provides support for the standard Internet +protocols IP, ICMP, TCP, and UDP. These protocols may be used +on top of a variety of hardware devices ranging from +serial lines to local area network controllers +for the Ethernet. Network services are split between the +kernel (communication protocols) and user programs (user +services such as TELNET and FTP). This section describes +how to configure your system to use the Internet networking support. +\*(4B also supports the Xerox Network Systems (NS) protocols. +IDP and SPP are implemented in the kernel, +and other protocols such as Courier run at the user level. +\*(4B provides some support for the ISO OSI protocols CLNP, +TP4, and ESIS. User level processes +complete the application protocols such as X.400 and X.500. +.Sh 2 "System configuration" +.PP +To configure the kernel to include the Internet communication +protocols, define the INET option. +Xerox NS support is enabled with the NS option. +ISO OSI support is enabled with the ISO option. +In each case, include the pseudo-devices +``pty'' and ``loop'' in your machine's configuration +file. +The ``pty'' pseudo-device forces the pseudo terminal device driver +to be configured into the system, see +.Xr pty (4), +while the ``loop'' pseudo-device forces inclusion of the software loopback +interface driver. +The loop driver is used in network testing +and also by the error logging system. +.PP +If you are planning to use the Internet network facilities on a 10Mb/s +Ethernet, the pseudo-device ``ether'' should also be included +in the configuration; this forces inclusion of the Address Resolution +Protocol module used in mapping between 48-bit Ethernet +and 32-bit Internet addresses. 
+.PP +Before configuring the appropriate networking hardware, you should +consult the manual pages in section 4 of the Programmer's Manual +when selecting the appropriate interfaces for your architecture. +.PP +All network interface drivers, including the loopback interface, +require that their host address(es) be defined at boot time. +This is done with +.Xr ifconfig (8) +commands included in the +.Pn /etc/netstart +file. +Interfaces that are able to dynamically deduce the host +part of an address may check that the host part of the address is correct. +The manual page for each network interface +describes the method used to establish a host's address. +.Xr Ifconfig (8) +can also be used to set options for the interface at boot time. +Options are set independently for each interface, and +apply to all packets sent using that interface. +Alternatively, translations for such hosts may be set in advance +or ``published'' by a \*(4B host by use of the +.Xr arp (8) +command. +Note that the use of trailer link-level encapsulation is now negotiated between \*(4B hosts +using ARP, +and it is thus no longer necessary to disable the use of trailers +with +.Xr ifconfig . +.PP +The OSI equivalent to ARP is ESIS (End System to Intermediate System Routing +Protocol); running this protocol is mandatory; however, one can manually add +translations for machines that do not participate by use of the +.Xr route (8) +command. +Additional information is provided in the manual page describing +.Xr ESIS (4). +.PP +To use the pseudo terminals just configured, device +entries must be created in the +.Pn /dev +directory. To create 32 +pseudo terminals (plenty, unless you have a heavy network load) +execute the following commands: +.DS +\fB#\fP \fIcd /dev\fP +\fB#\fP \fIMAKEDEV pty0 pty1\fP +.DE +More pseudo terminals may be made by specifying +.Pn pty2 , +.Pn pty3 , +etc. The kernel normally includes support for 32 pseudo terminals +unless the configuration file specifies a different number. 
+Each pseudo terminal really consists of two files in +.Pn /dev : +a master and a slave. The master pseudo terminal file is named +.Pn /dev/ptyp? , +while the slave side is +.Pn /dev/ttyp? . +Pseudo terminals are also used by several programs not related to the network. +In addition to creating the pseudo terminals, +be sure to install them in the +.Pn /etc/ttys +file (with a `none' in the second column so no +.Xr getty +is started). +.Sh 2 "Local subnets" +.PP +In \*(4B the Internet support +includes the notion of ``subnets''. This is a mechanism +by which multiple local networks may appear as a single Internet +network to off-site hosts. Subnetworks are useful because +they allow a site to hide its local topology, requiring only a single +route in external gateways; +it also means that local network numbers may be locally administered. +The standard describing this change in Internet addressing is RFC-950. +.PP +To set up local subnets one must first decide how the available +address space (the Internet ``host part'' of the 32-bit address) +is to be partitioned. +Sites with a class A network +number have a 24-bit host address space with which to work, sites with a +class B network number have a 16-bit host address space, while sites with +a class C network number have an 8-bit host address space\**. +.FS +If you are unfamiliar with the Internet addressing structure, consult +``Address Mappings'', Internet RFC-796, J. Postel; available from +the Internet Network Information Center at SRI. +.FE +To define local subnets you must steal some bits +from the local host address space for use in extending the network +portion of the Internet address. This reinterpretation of Internet +addresses is done only for local networks; i.e. it is not visible +to hosts off-site. For example, if your site has a class B network +number, hosts on this network have an Internet address that contains +the network number, 16 bits, and the host number, another +16 bits.
To define 254 local subnets, each +possessing at most 255 hosts, 8 bits may be taken from the local part. +(The use of subnets 0 and all-1's, 255 in this example, is discouraged +to avoid confusion about broadcast addresses.) +These new network +numbers are then constructed by concatenating the original 16-bit network +number with the extra 8 bits containing the local subnet number. +.PP +The existence of local subnets is communicated to the system at the time a +network interface is configured with the +.I netmask +option to the +.Xr ifconfig +program. A ``network mask'' is specified to define the +portion of the Internet address that is to be considered the network part +for that network. +This mask normally contains the bits corresponding to the standard +network part as well as the portion of the local part +that has been assigned to subnets. +If no mask is specified when the address is set, +it will be set according to the class of the network. +For example, at Berkeley (class B network 128.32) 8 bits +of the local part have been reserved for defining subnets; +consequently the +.Pn /etc/netstart +file contains lines of the form +.DS +.ft CW +/sbin/ifconfig le0 netmask 0xffffff00 128.32.1.7 +.DE +This specifies that for interface ``le0'', the upper 24 bits of +the Internet address should be used in calculating network numbers +(netmask 0xffffff00), and the interface's Internet address is +``128.32.1.7'' (host 7 on network 128.32.1). Hosts \fIm\fP on +sub-network \fIn\fP of this network would then have addresses of +the form ``128.32.\fIn\fP.\fIm\fP''; for example, host +99 on network 129 would have an address ``128.32.129.99''. +For hosts with multiple interfaces, the network mask should +be set for each interface, +although in practice only the mask of the first interface on each network +is really used. 
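The arithmetic behind the netmask can be checked by hand. The following is a minimal sketch, not part of the distribution, that splits the example address ``128.32.129.99'' into its network and host parts using the mask 0xffffff00 from the ifconfig line above. The variable names are made up for this illustration, and the script assumes a shell whose arithmetic is wider than 32 bits.

```sh
#!/bin/sh
# Illustrative sketch only: variable names are hypothetical.
# Assumes shell arithmetic wider than 32 bits (true of common 64-bit systems).
addr=128.32.129.99      # example host address from the text
mask=0xffffff00         # netmask from the ifconfig example

# Split the dotted quad on "." and pack it into one 32-bit integer.
old_ifs=$IFS
IFS=.
set -- $addr
IFS=$old_ifs
packed=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))

# Network part: bits selected by the mask. Host part: the remaining bits.
net=$(( $packed & $mask ))
host=$(( $packed & ~$mask & 0xffffffff ))

printf 'network %d.%d.%d.%d\n' \
    $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
    $(( (net >> 8) & 255 )) $(( net & 255 ))
printf 'host %d\n' "$host"
# prints:
#   network 128.32.129.0
#   host 99
```

With the default class B mask (0xffff0000) the same address would instead yield network 128.32.0.0, so the third octet would be treated as part of the host number rather than as a subnet.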
+.Sh 2 "Internet broadcast addresses" +.PP +The address defined as the broadcast address for Internet networks +according to RFC-919 is the address with a host part of all 1's. +The address used by 4.2BSD was the address with a host part of 0. +\*(4B uses the standard broadcast address (all 1's) by default, +but allows the broadcast address to be set (with +.Xr ifconfig ) +for each interface. +This allows networks consisting of both 4.2BSD, \*(Ps and \*(4B hosts +to coexist while the upgrade process proceeds. +In the presence of subnets, the broadcast address uses the subnet field +as for normal host addresses, with the remaining host part set to 1's +(or 0's, on a network that has not yet been converted). +\*(4B hosts recognize and accept packets +sent to the logical-network broadcast address as well as those sent +to the subnet broadcast address, and when using an all-1's broadcast, +also recognize and receive packets sent to host 0 as a broadcast. +.Sh 2 "Routing" +.PP +If your environment allows access to networks not directly +attached to your host you will need to set up routing information +to allow packets to be properly routed. Two schemes are +supported by the system. The first scheme +employs a routing table management daemon. +Optimally, you should use the routing daemon +.Xr gated +available from Cornell university. +We use it on our systems and it works well, +especially for multi-homed hosts using Serial Line IP (SLIP). +Unfortunately, we were not able to obtain permission to +include it on \*(4B. +.PP +If you do not wish to or cannot obtain +.Xr gated , +the distribution does include +.Xr routed (8) +to maintain the system routing tables. The routing daemon +uses a variant of the Xerox Routing Information Protocol +to maintain up to date routing tables in a cluster of local +area networks. 
By using the +.Pn /etc/gateways +file, the routing daemon can also be used to initialize static routes +to distant networks (see the next section for further discussion). +When the routing daemon is started up +(usually from +.Pn /etc/rc ) +it reads +.Pn /etc/gateways +if it exists and installs those routes defined there, +then broadcasts on each local network +to which the host is attached to find other instances of the routing +daemon. If any responses are received, the routing daemons +cooperate in maintaining a globally consistent view of routing +in the local environment. This view can be extended to include +remote sites also running the routing daemon by setting up suitable +entries in +.Pn /etc/gateways ; +consult +.Xr routed (8) +for a more thorough discussion. +.PP +The second approach is to define a default or wildcard +route to a smart +gateway and depend on the gateway to provide ICMP routing +redirect information to dynamically create a routing data +base. This is done by adding an entry of the form +.DS +.ft CW +/sbin/route add default \fIsmart-gateway\fP 1 +.DE +to +.Pn /etc/netstart ; +see +.Xr route (8) +for more information. The default route +will be used by the system as a ``last resort'' +in routing packets to their destination. Assuming the gateway +to which packets are directed is able to generate the proper +routing redirect messages, the system will then add routing +table entries based on the information supplied. This approach +has certain advantages over the routing daemon, but is +unsuitable in an environment where there are only bridges (i.e. +pseudo gateways that, for instance, do not generate routing +redirect messages). Further, if the +smart gateway goes down there is no alternative, save manual +alteration of the routing table entry, to maintaining service. +.PP +The system always listens, and processes, routing redirect +information, so it is possible to combine both of the above +facilities. 
For example, the routing table management process +might be used to maintain up to date information about routes +to geographically local networks, while employing the wildcard +routing techniques for ``distant'' networks. The +.Xr netstat (1) +program may be used to display routing table contents as well +as various routing oriented statistics. For example, +.DS +\fB#\fP \fInetstat \-r\fP +.DE +will display the contents of the routing tables, while +.DS +\fB#\fP \fInetstat \-r \-s\fP +.DE +will show the number of routing table entries dynamically +created as a result of routing redirect messages, etc. +.Sh 2 "Use of \*(4B machines as gateways" +.PP +Several changes have been made in \*(4B in the area of gateway support +(or packet forwarding, if one prefers). +A new configuration option, GATEWAY, is used when configuring +a machine to be used as a gateway. +This option increases the size of the routing hash tables in the kernel. +Unless configured with that option, +hosts with only a single non-loopback interface never attempt +to forward packets or to respond with ICMP error messages to misdirected +packets. +This change reduces the problems that may occur when different hosts +on a network disagree on the network number or broadcast address. +Another change is that \*(4B machines that forward packets back through +the same interface on which they arrived +will send ICMP redirects to the source host if it is on the same network. +This improves the interaction of \*(4B gateways with hosts that configure +their routes via default gateways and redirects. +The generation of redirects may be disabled with the configuration option +IPSENDREDIRECTS=0 or while the system is running by using the command: +.DS +.ft CW +sysctl -w net.inet.ip.redirect=0 +.DE +in environments where it may cause difficulties. +.Sh 2 "Network databases" +.PP +Several data files are used by the network library routines +and server programs. 
Most of these files are host independent +and updated only rarely. +.br +.ne 1i +.TS +lfC l l. +File Manual reference Use +_ +/etc/hosts \fIhosts\fP\|(5) local host names +/etc/networks \fInetworks\fP\|(5) network names +/etc/services \fIservices\fP\|(5) list of known services +/etc/protocols \fIprotocols\fP\|(5) protocol names +/etc/hosts.equiv \fIrshd\fP\|(8) list of ``trusted'' hosts +/etc/netstart \fIrc\fP\|(8) command script for initializing network +/etc/rc \fIrc\fP\|(8) command script for starting standard servers +/etc/rc.local \fIrc\fP\|(8) command script for starting local servers +/etc/ftpusers \fIftpd\fP\|(8) list of ``unwelcome'' ftp users +/etc/hosts.lpd \fIlpd\fP\|(8) list of hosts allowed to access printers +/etc/inetd.conf \fIinetd\fP\|(8) list of servers started by \fIinetd\fP +.TE +The files distributed are set up for Internet hosts. +Local networks and hosts should be added to describe the local +configuration; the Berkeley entries may serve as examples +(see also the section on +.Pn /etc/hosts ). +Network numbers will have to be chosen for each Ethernet. +For sites connected to the Internet, +the normal channels should be used for allocation of network +numbers (contact hostmaster@SRI-NIC.ARPA). +For other sites, +these could be chosen more or less arbitrarily, +but it is generally better to request official numbers +to avoid conversion if a connection to the Internet (or others on the Internet) +is ever established. +.Sh 3 "Network servers" +.PP +Most network servers are automatically started up at boot time +by the command file +.Pn /etc/rc +or by the Internet daemon (see below). +These include the following: +.TS +lfC l l. 
+Program Server Started by +_ +/usr/sbin/syslogd error logging server \f(CW/etc/rc\fP +/usr/sbin/named Internet name server \f(CW/etc/rc\fP +/sbin/routed routing table management daemon \f(CW/etc/rc\fP +/usr/sbin/rwhod system status daemon \f(CW/etc/rc\fP +/usr/sbin/timed time synchronization daemon \f(CW/etc/rc\fP +/usr/sbin/sendmail SMTP server \f(CW/etc/rc\fP +/usr/libexec/rshd shell server inetd +/usr/libexec/rexecd exec server inetd +/usr/libexec/rlogind login server inetd +/usr/libexec/telnetd TELNET server inetd +/usr/libexec/ftpd FTP server inetd +/usr/libexec/fingerd Finger server inetd +/usr/libexec/tftpd TFTP server inetd +.TE +Consult the manual pages and accompanying documentation (particularly +for named and sendmail) for details about their operation. +.PP +The use of +.Xr routed +and +.Xr rwhod +is controlled by shell +variables set in +.Pn /etc/netstart . +By default, +.Xr routed +is used, but +.Xr rwhod +is not; they are enabled by setting the variables \fIroutedflags\fP and +.Xr rwhod +to strings other than ``NO.'' +The value of \fIroutedflags\fP provides host-specific options to +.Xr routed . +For example, +.DS +.ft CW +routedflags=-q +rwhod=NO +.DE +would run +.Xr "routed -q" +and would not run +.Xr rwhod . +.PP +To have other network servers started as well, +commands of the following sort should be placed in the site-dependent file +.Pn /etc/rc.local . +.DS +.ft CW +if [ -f /usr/sbin/timed ]; then + /usr/sbin/timed & echo -n ' timed' >/dev/console +f\&i +.DE +.Sh 3 "Internet daemon" +.PP +In \*(4B most of the servers for user-visible services are started up by a +``super server'', the Internet daemon. The Internet +daemon, +.Pn /usr/sbin/inetd , +acts as a master server for +programs specified in its configuration file, +.Pn /etc/inetd.conf , +listening for service requests for these servers, and starting +up the appropriate program whenever a request is received. 
+The configuration file contains lines containing a service +name (as found in +.Pn /etc/services ), +the type of socket the +server expects (e.g. stream or dgram), the protocol to be +used with the socket (as found in +.Pn /etc/protocols ), +whether to wait for each server to complete before starting up another, +the user name by which the server should run, the server +program's name, and at most five arguments to pass to the +server program. +Some trivial services are implemented internally in +.Xr inetd , +and their servers are listed as ``internal.'' +For example, an entry for the file +transfer protocol server would appear as +.DS +.ft CW +ftp stream tcp nowait root /usr/libexec/ftpd ftpd +.DE +Consult +.Xr inetd (8) +for more detail on the format of the configuration file +and the operation of the Internet daemon. +.Sh 3 "The \f(CW/etc/hosts.equiv\fP file" +.PP +The remote login and shell servers use an +authentication scheme based on trusted hosts. The +.Pn hosts.equiv +file contains a list of hosts that are considered trusted +and, under a single administrative control. When a user +contacts a remote login or shell server requesting service, +the client process passes the user's name and the official +name of the host on which the client is located. In the simple +case, if the host's name is located in +.Pn hosts.equiv +and the user has an account on the server's machine, then service +is rendered (i.e. the user is allowed to log in, or the command +is executed). Users may expand this ``equivalence'' of +machines by installing a +.Pn \&.rhosts +file in their login directory. +The root login is handled specially, bypassing the +.Pn hosts.equiv +file, and using only the +.Pn /.rhosts +file. +.PP +Thus, to create a class of equivalent machines, the +.Pn hosts.equiv +file should contain the \fIofficial\fP names for those machines. +If you are running the name server, you may omit the domain part +of the host name for machines in your local domain. 
+For example, the following machines on our local +network are considered trusted, so the +.Pn hosts.equiv +file is of the form: +.DS +.ft CW +vangogh.CS.Berkeley.EDU +picasso.CS.Berkeley.EDU +okeeffe.CS.Berkeley.EDU +.DE +.Sh 3 "The \f(CW/etc/ftpusers\fP file" +.PP +The FTP server included in the system provides support for an +anonymous FTP account. Because of the inherent security problems +with such a facility you should read this section carefully if +you consider providing such a service. +.PP +An anonymous account is enabled by creating a user +.Xr ftp . +When a client uses the anonymous account a +.Xr chroot (2) +system call is performed by the server to restrict the client +from moving outside that part of the filesystem where the +ftp user's home directory is located. Because a +.Xr chroot +call is used, certain programs and files used by the server +process must be placed in the ftp home directory. +Further, one must be +sure that all directories and executable images are unwritable. +The following directory setup is recommended. The +use of the +.Xr awk +commands to copy the +.Pn /etc/passwd +and +.Pn /etc/group +files is \fBSTRONGLY\fP recommended. +.DS +\fB#\fP \fIcd ~ftp\fP +\fB#\fP \fIchmod 555 .; chown ftp .; chgrp ftp .\fP +\fB#\fP \fImkdir bin etc pub\fP +\fB#\fP \fIchown root bin etc\fP +\fB#\fP \fIchmod 555 bin etc\fP +\fB#\fP \fIchown ftp pub\fP +\fB#\fP \fIchmod 777 pub\fP +\fB#\fP \fIcd bin\fP +\fB#\fP \fIcp /bin/sh /bin/ls .\fP +\fB#\fP \fIchmod 111 sh ls\fP +\fB#\fP \fIcd ../etc\fP +\fB#\fP \fIawk -F: '{$2="*";print$1":"$2":"$3":"$4":"$5":"$6":"}' < /etc/passwd > passwd\fP +\fB#\fP \fIawk -F: '{$2="*";print$1":"$2":"}' < /etc/group > group\fP +\fB#\fP \fIchmod 444 passwd group\fP +.DE +When local users wish to place files in the anonymous +area, they must be placed in a subdirectory. In the +setup here, the directory +.Pn ~ftp/pub +is used. 
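To see why the awk copy matters, it may help to run the passwd command from the setup above on a single entry. The sketch below does so; the entry itself is hypothetical and invented for this illustration. The encrypted password is replaced by ``*'' and the login shell field is left empty, so neither is exposed inside the anonymous area.

```sh
#!/bin/sh
# Hypothetical passwd entry, made up for this illustration.
entry='ftp:Xq3EXAMPLEhash:14:50:Anonymous FTP:/var/ftp:/bin/sh'

# Same awk program as in the setup above: replace field 2 (the
# encrypted password) with "*" and print fields 1-6 followed by an
# empty shell field.
out=$(printf '%s\n' "$entry" |
    awk -F: '{$2="*";print$1":"$2":"$3":"$4":"$5":"$6":"}')

printf '%s\n' "$out"
# prints ftp:*:14:50:Anonymous FTP:/var/ftp:
```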
+.PP +Aside from the problems of directory modes and such, +the ftp server may provide a loophole for interlopers +if certain user accounts are allowed. +The file +.Pn /etc/ftpusers +is checked on each connection. +If the requested user name is located in the file, the +request for service is denied. This file normally has +the following names on our systems. +.DS +uucp +root +.DE +Accounts without passwords need not be listed in this file as the ftp +server will refuse service to these users. +Accounts with nonstandard shells (any not listed in +.Pn /etc/shells ) +will also be denied access via ftp. diff --git a/share/doc/smm/01.setup/6.t b/share/doc/smm/01.setup/6.t new file mode 100644 index 0000000..d043474 --- /dev/null +++ b/share/doc/smm/01.setup/6.t @@ -0,0 +1,663 @@ +.\" Copyright (c) 1980, 1986, 1988, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)6.t 8.1 (Berkeley) 7/27/93 +.\" +.ds LH "Installing/Operating \*(4B +.ds CF \*(Dy +.Sh 1 "System operation" +.PP +This section describes procedures used to operate a \*(4B UNIX system. +Procedures described here are used periodically, to reboot the system, +analyze error messages from devices, do disk backups, monitor +system performance, recompile system software and control local changes. +.Sh 2 "Bootstrap and shutdown procedures" +.PP +In a normal reboot, the system checks the disks and comes up multi-user +without intervention at the console. +Such a reboot +can be stopped (after it prints the date) with a ^C (interrupt). +This will leave the system in single-user mode, with only the console +terminal active. +(If the console has been marked ``insecure'' in +.Pn /etc/ttys +you must enter the root password to bring the machine to single-user mode.) +It is also possible to allow the filesystem checks to complete +and then to return to single-user mode by signaling +.Xr fsck (8) +with a QUIT signal (^\|\e). +.PP +To bring the system up to a multi-user configuration from the single-user +status, +all you have to do is hit ^D on the console. 
 The system +will then execute +.Pn /etc/rc , +a multi-user restart script (and +.Pn /etc/rc.local ), +and come up on the terminals listed as +active in the file +.Pn /etc/ttys . +See +.Xr init (8) +and +.Xr ttys (5) +for more details. +Note, however, that this does not cause a filesystem check to be done. +Unless the system was taken down cleanly, you should run +``fsck \-p'' or force a reboot with +.Xr reboot (8) +to have the disks checked. +.PP +To take the system down to a single-user state you can use +.DS +\fB#\fP \fIkill 1\fP +.DE +or use the +.Xr shutdown (8) +command (which is much more polite, if there are other users logged in) +when you are running multi-user. +Either command will kill all processes and give you a shell on the console, +as if you had just booted. Filesystems remain mounted after the +system is taken single-user. If you wish to come up multi-user again, you +should do this by: +.DS +\fB#\fP \fIcd /\fP +\fB#\fP \fI/sbin/umount -a\fP +\fB#\fP \fI^D\fP +.DE +.PP +Each system shutdown, crash, processor halt and reboot +is recorded in the system log +with its cause. +.Sh 2 "Device errors and diagnostics" +.PP +When serious errors occur on peripherals or in the system, the system +prints a warning diagnostic on the console. +These messages are collected +by the system error logging process +.Xr syslogd (8) +and written into a system error log file +.Pn /var/log/messages . +Less serious errors are sent directly to +.Xr syslogd , +which may log them on the console. +The error priorities that are logged and the locations to which they are logged +are controlled by +.Pn /etc/syslog.conf . +See +.Xr syslogd (8) +for further details. +.PP +Error messages printed by the devices in the system are described with the +drivers for the devices in section 4 of the programmer's manual. +If errors occur suggesting hardware problems, you should contact +your hardware support group or field service.
 It is a good idea to +examine the error log file regularly +(e.g. with the command \fItail \-r /var/log/messages\fP). +.Sh 2 "Filesystem checks, backups, and disaster recovery" +.PP +Periodically (say every week or so in the absence of any problems) +and always (usually automatically) after a crash, +all the filesystems should be checked for consistency +by +.Xr fsck (8). +The procedures of +.Xr reboot (8) +should be used to get the system to a state where a filesystem +check can be done manually or automatically. +.PP +Dumping of the filesystems should be done regularly, +since once the system is going it is easy to +become complacent. +Complete and incremental dumps are easily done with +.Xr dump (8). +You should arrange to do a Towers-of-Hanoi dump sequence; we tune +ours so that almost all files are dumped on two tapes and kept for at +least a week in almost every case. We take full dumps every month (and keep +these indefinitely). +Operators can execute ``dump w'' at login, which will tell them what needs +to be dumped +(based on the +.Pn /etc/fstab +information). +Be sure to create a group +.B operator +in the file +.Pn /etc/group +so that dump can notify logged-in operators when it needs help. +.PP +More precisely, we have three sets of dump tapes: 10 daily tapes, +5 weekly sets of 2 tapes, and fresh sets of three tapes monthly. +We do daily dumps circularly on the daily tapes with sequence +`3 2 5 4 7 6 9 8 9 9 9 ...'. +Each weekly is a level 1 and the daily dump sequence level +restarts after each weekly dump. +Full dumps are level 0 and the daily sequence restarts after each full dump +also. +.PP +Thus a typical dump sequence would be: +.br +.ne 6 +.TS +center; +c c c c c +n n n l l. 
+tape name level number date opr size +_ +FULL 0 Nov 24, 1992 operator 137K +D1 3 Nov 28, 1992 operator 29K +D2 2 Nov 29, 1992 operator 34K +D3 5 Nov 30, 1992 operator 19K +D4 4 Dec 1, 1992 operator 22K +W1 1 Dec 2, 1992 operator 40K +D5 3 Dec 4, 1992 operator 15K +D6 2 Dec 5, 1992 operator 25K +D7 5 Dec 6, 1992 operator 15K +D8 4 Dec 7, 1992 operator 19K +W2 1 Dec 9, 1992 operator 118K +D9 3 Dec 11, 1992 operator 15K +D10 2 Dec 12, 1992 operator 26K +D1 5 Dec 15, 1992 operator 14K +W3 1 Dec 17, 1992 operator 71K +D2 3 Dec 18, 1992 operator 13K +FULL 0 Dec 22, 1992 operator 135K +.TE +We do weekly dumps often enough that daily dumps always fit on one tape. +.PP +Dumping of files by name is best done by +.Xr tar (1) +but the amount of data that can be moved in this way is limited +to a single tape. +Finally if there are enough drives entire +disks can be copied with +.Xr dd (1) +using the raw special files and an appropriate +blocking factor; the number of sectors per track is usually +a good value to use, consult +.Pn /etc/disktab . +.PP +It is desirable that full dumps of the root filesystem be +made regularly. +This is especially true when only one disk is available. +Then, if the +root filesystem is damaged by a hardware or software failure, you +can rebuild a workable disk doing a restore in the +same way that the initial root filesystem was created. +.PP +Exhaustion of user-file space is certain to occur +now and then; disk quotas may be imposed, or if you +prefer a less fascist approach, try using the programs +.Xr du (1), +.Xr df (1), +and +.Xr quot (8), +combined with threatening +messages of the day, and personal letters. +.Sh 2 "Moving filesystem data" +.PP +If you have the resources, +the best way to move a filesystem +is to dump it to a spare disk partition, or magtape, using +.Xr dump (8), +use +.Xr newfs (8) +to create the new filesystem, +and restore the filesystem using +.Xr restore (8). 
+Filesystems may also be moved by piping the output of +.Xr dump +to +.Xr restore . +The +.Xr restore +program uses an ``in-place'' algorithm that +allows filesystem dumps to be restored without concern for the +original size of the filesystem. Further, portions of a +filesystem may be selectively restored using a method similar +to the tape archive program. +.PP +If you have to merge a filesystem into another, existing one, +the best bet is to use +.Xr tar (1). +If you must shrink a filesystem, the best bet is to dump +the original and restore it onto the new filesystem. +If you +are playing with the root filesystem and only have one drive, +the procedure is more complicated. +If the only drive is a Winchester disk, this procedure may not be used +without overwriting the existing root or another partition. +What you do is the following: +.IP 1. +GET A SECOND PACK, OR USE ANOTHER DISK DRIVE!!!! +.IP 2. +Dump the root filesystem to tape using +.Xr dump (8). +.IP 3. +Bring the system down. +.IP 4. +Mount the new pack in the correct disk drive, if +using removable media. +.IP 5. +Load the distribution tape and install the new +root filesystem as you did when first installing the system. +Boot normally +using the newly created disk filesystem. +.PP +Note that if you change the disk partition tables or add new disk +drivers they should also be added to the standalone system in +.Pn /sys/<architecture>/stand , +and the default disk partition tables in +.Pn /etc/disktab +should be modified. +.Sh 2 "Monitoring system performance" +.PP +The +.Xr systat +program provided with the system is designed to be an aid to monitoring +systemwide activity. The default ``pigs'' mode shows a dynamic ``ps''. +By running in the ``vmstat'' mode +when the system is active you can judge the system activity in several +dimensions: job distribution, virtual memory load, paging and swapping +activity, device interrupts, and disk and cpu utilization. 
+Ideally, there should be few blocked (b) jobs, +there should be little paging or swapping activity, there should +be available bandwidth on the disk devices (most single arms peak +out at 20-30 tps in practice), and the user cpu utilization (us) should +be high (above 50%). +.PP +If the system is busy, then the count of active jobs may be large, +and several of these jobs may often be blocked (b). If the virtual +memory is active, then the paging demon will be running (sr will +be non-zero). It is healthy for the paging demon to free pages when +the virtual memory gets active; it is triggered by the amount of free +memory dropping below a threshold and increases its pace as free memory +goes to zero. +.PP +If you run in the ``vmstat'' mode +when the system is busy, you can find +imbalances by noting abnormal job distributions. If many +processes are blocked (b), then the disk subsystem +is overloaded or imbalanced. If you have several non-dma +devices or open teletype lines that are ``ringing'', or user programs +that are doing high-speed non-buffered input/output, then the system +time may go high (60-70% or higher). +It is often possible to pin down the cause of high system time by +looking to see if there is excessive context switching (cs), interrupt +activity (in) and per-device interrupt counts, +or system call activity (sy). Cumulatively on one of +our large machines we average about 60-200 context switches and interrupts +per second and about 50-500 system calls per second. +.PP +If the system is heavily loaded, or if you have little memory +for your load (2M is little in most any case), then the system +may be forced to swap. This is likely to be accompanied by a noticeable +reduction in system performance and pregnant pauses when interactive +jobs such as editors swap out. +If you expect to be in a memory-poor environment +for an extended period you might consider administratively +limiting system load. 
+.Sh 2 "Recompiling and reinstalling system software" +.PP +It is easy to regenerate either the entire system or a single utility, +and it is a good idea to try rebuilding pieces of the system to build +confidence in the procedures. +.LP +In general, there are six well-known targets supported by +all the makefiles on the system: +.IP all 9 +This entry is the default target, the same as if no target is specified. +This target builds the kernel, binary or library, as well as its +associated manual pages. +This target \fBdoes not\fP build the dependency files. +Some of the utilities require that a \fImake depend\fP be done before +a \fImake all\fP can succeed. +.IP depend +Build the include file dependency file, ``.depend'', which is +read by +.Xr make . +See +.Xr mkdep (1) +for further details. +.IP install +Install the kernel, binary or library, as well as its associated +manual pages. +See +.Xr install (1) +for further details. +.IP clean +Remove the kernel, binary or library, as well as any object files +created when building it. +.IP cleandir +The same as clean, except that the dependency files and formatted +manual pages are removed as well. +.IP obj +Build a shadow directory structure in the area referenced by +.Pn /usr/obj +and create a symbolic link in the current source directory to +reference it, named ``obj''. +Once this shadow structure has been created, all the files created by +.Xr make +will live in the shadow structure, and +.Pn /usr/src +may be mounted read-only by multiple machines. +Doing a \fImake obj\fP in +.Pn /usr/src +will build the shadow directory structure for everything on the +system except for the contributed, old, and kernel software. +.PP +The system consists of three major parts: +the kernel itself, found in +.Pn /usr/src/sys , +the libraries, found in +.Pn /usr/src/lib , +and the user programs (the rest of +.Pn /usr/src ). 
+.PP +Deprecated software, found in +.Pn /usr/src/old , +often has old style makefiles; +some of it does not compile under \*(4B at all. +.PP +Contributed software, found in +.Pn /usr/src/contrib , +usually does not support the ``cleandir'', ``depend'', or ``obj'' targets. +.PP +The kernel does not support the ``obj'' shadow structure. +All kernels are compiled in subdirectories of +.Pn /usr/src/sys/compile +which is usually abbreviated as +.Pn /sys/compile . +If you want to mount your source tree read-only, +.Pn /usr/src/sys/compile +will have to be on a separate filesystem from +.Pn /usr/src . +Separation from +.Pn /usr/src +can be done by making +.Pn /usr/src/sys/compile +a symbolic link that references +.Pn /usr/obj/sys/compile . +If it is a symbolic link, the \fIS\fP variable in the kernel +Makefile must be changed from +.Pn \&../.. +to the absolute pathname needed to locate the kernel sources, usually +.Pn /usr/src/sys . +The symbolic link created by +.Xr config (8) +for +.Pn machine +must also be manually changed to an absolute pathname. +Finally, the +.Pn /usr/src/sys/libkern/obj +directory must be located in +.Pn /usr/obj/sys/libkern . +.PP +Each of the standard utilities and libraries may be built and +installed by changing directories into the correct location and +doing: +.DS +\fB#\fP \fImake\fP +\fB#\fP \fImake install\fP +.DE +Note, if system include files have changed between compiles, +.Xr make +will not do the correct dependency checks if the dependency +files have not been built using the ``depend'' target. +.PP +The entire library and utility suite for the system may be recompiled +from scratch by changing directory to +.Pn /usr/src +and doing: +.DS +\fB#\fP \fImake build\fP +.DE +This target installs the system include files, cleans the source +tree, builds and installs the libraries, and builds and installs +the system utilities. 
+.PP +To recompile a specific program, first determine where the binary +resides with the +.Xr whereis (1) +command, then change to the corresponding source directory and build +it with the Makefile in the directory. +For instance, to recompile ``passwd'', +all one has to do is: +.DS +\fB#\fP \fIwhereis passwd\fP +\fB/usr/bin/passwd\fP +\fB#\fP \fIcd /usr/src/usr.bin/passwd\fP +\fB#\fP \fImake\fP +\fB#\fP \fImake install\fP +.DE +This will compile and install the +.Xr passwd +utility. +.PP +If you wish to recompile and install all programs into a particular +target area you can override the default path prefix by doing: +.DS +\fB#\fP \fImake\fP +\fB#\fP \fImake DESTDIR=\fPpathname \fIinstall\fP +.DE +Similarly, the mode, owner, group, and other characteristics of +the installed object can be modified by changing other default +make variables. +See +.Xr make (1), +.Pn /usr/src/share/mk/bsd.README , +and the ``.mk'' scripts in the +.Pn /usr/share/mk +directory for more information. +.PP +If you modify the C library or system include files, to change a +system call for example, and want to rebuild and install everything, +you have to be a little careful. +You must ensure that the include files are installed before anything +is compiled, and that the libraries are installed before the remainder +of the source, otherwise the loaded images will not contain the new +routine from the library. +If include files have been modified, the following commands should +be done first: +.DS +\fB#\fP \fIcd /usr/src/include\fP +\fB#\fP \fImake install\fP +.DE +Then, if, for example, C library files have been modified, the +following commands should be executed: +.DS +\fB#\fP \fIcd /usr/src/lib/libc\fP +\fB#\fP \fImake depend\fP +\fB#\fP \fImake\fP +\fB#\fP \fImake install\fP +\fB#\fP \fIcd /usr/src\fP +\fB#\fP \fImake depend\fP +\fB#\fP \fImake\fP +\fB#\fP \fImake install\fP +.DE +Alternatively, the \fImake build\fP command described above will +accomplish the same tasks. 
+This takes several hours on a reasonably configured machine. +.Sh 2 "Making local modifications" +.PP +The source for locally written commands is normally stored in +.Pn /usr/src/local , +and their binaries are kept in +.Pn /usr/local/bin . +This isolation of local binaries allows +.Pn /usr/bin +and +.Pn /bin +to correspond to the distribution tape (and to the manuals that +people can buy). +People using local commands should be made aware that they are not +in the base manual. +Manual pages for local commands should be installed in +.Pn /usr/local/man/cat[1-8] . +The +.Xr man (1) +command automatically finds manual pages placed in +/usr/local/man/cat[1-8] to encourage this practice (see +.Xr man.conf (5)). +.Sh 2 "Accounting" +.PP +UNIX optionally records two kinds of accounting information: +connect time accounting and process resource accounting. The connect +time accounting information is stored in the file +.Pn /var/log/wtmp , +which is summarized by the program +.Xr ac (8). +The process time accounting information is stored in the file +.Pn /var/account/acct +after it is enabled by +.Xr accton (8), +and is analyzed and summarized by the program +.Xr sa (8). +.PP +If you need to recharge for computing time, you can develop +procedures based on the information provided by these commands. +A convenient way to do this is to give commands to the clock daemon +.Pn /usr/sbin/cron +to be executed every day at a specified time. +This is done by adding lines to +.Pn /etc/crontab.local ; +see +.Xr cron (8) +for details. +.Sh 2 "Resource control" +.PP +Resource control in the current version of UNIX is more +elaborate than in most UNIX systems. The disk quota +facilities developed at the University of Melbourne have +been incorporated in the system and allow control over the +number of files and amount of disk space each user and/or group may use +on each filesystem.
In addition, the resources consumed +by any single process can be limited by the mechanisms of +.Xr setrlimit (2). +As distributed, the latter mechanism +is voluntary, though sites may choose to modify the login +mechanism to impose limits not covered with disk quotas. +.PP +To use the disk quota facilities, the system must be +configured with ``options QUOTA''. Filesystems may then +be placed under the quota mechanism by creating a null file +.Pn quota.user +and/or +.Pn quota.group +at the root of the filesystem, running +.Xr quotacheck (8), +and modifying +.Pn /etc/fstab +to show that the filesystem is to run +with disk quotas (options userquota and/or groupquota). +The +.Xr quotaon (8) +program may then be run to enable quotas. +.PP +Individual quotas are applied by using the quota editor +.Xr edquota (8). +Users may view their quotas (but not those of other users) with the +.Xr quota (1) +program. The +.Xr repquota (8) +program may be used to summarize the quotas and current +space usage on a particular filesystem or filesystems. +.PP +Quotas are enforced with \fIsoft\fP and \fIhard\fP limits. +When a user and/or group first reaches a soft limit on a resource, a +message is generated on their terminal. If the user and/or group fails to +lower the resource usage below the soft limit +for longer than the time limit established for that filesystem +(default seven days) the system then treats the soft limit as a +\fIhard\fP limit and disallows any allocations until enough space is +reclaimed to bring the user and/or group back below the soft limit. +Hard limits are enforced strictly resulting in errors when a user +and/or group tries to create or write a file. Each time a hard limit is +exceeded the system will generate a message on the user's terminal. +.PP +Consult the auxiliary document, ``Disc Quotas in a UNIX Environment'' (SMM:4) +and the appropriate manual entries for more information. 
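The interplay of soft limit, hard limit, and time limit described above can be sketched as a single decision function. This is an illustration only, not the kernel's code: the function name, its arguments, and the "ok"/"warn"/"deny" results are hypothetical, times are plain seconds, and treating usage exactly equal to a limit as still permitted is an assumption the text does not settle.

```python
# Default time limit from the text: seven days, expressed in seconds.
GRACE_SECONDS = 7 * 24 * 60 * 60


def check_allocation(usage, request, soft, hard, over_soft_since, now,
                     grace=GRACE_SECONDS):
    """Decide whether an allocation of `request` units may proceed.

    usage           -- current usage (blocks or files)
    over_soft_since -- time at which usage first exceeded the soft limit,
                       or None if usage is currently at or below it
    Returns "ok", "warn" (allowed, but over the soft limit), or "deny".
    """
    new_usage = usage + request
    # Hard limits are enforced strictly.
    if new_usage > hard:
        return "deny"
    # Once the time limit has run out, the soft limit acts as a hard limit
    # until usage is brought back below it.
    if (usage > soft and over_soft_since is not None
            and now - over_soft_since > grace):
        return "deny"
    # Over the soft limit but still within the time limit: allow and warn.
    if new_usage > soft:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # Over the soft limit of 100 for just longer than seven days.
    print(check_allocation(105, 1, 100, 120, 0, GRACE_SECONDS + 1))
```

Keeping the time at which the soft limit was first exceeded, rather than a countdown, means the check needs no periodic update; whether the real implementation stores it that way is not stated in the text.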
+.Sh 2 "Network troubleshooting" +.PP +If you have anything more than a trivial network configuration, +from time to time you are bound to run into problems. Before +blaming the software, first check your network connections. On +networks such as the Ethernet a +loose cable tap or misplaced power cable can result in severely +deteriorated service. The +.Xr netstat (1) +program may be of aid in tracking down hardware malfunctions. +In particular, look at the \fB\-i\fP and \fB\-s\fP options in the manual page. +.PP +Should you believe a communication protocol problem exists, +consult the protocol specifications and attempt to isolate the +problem in a packet trace. The SO_DEBUG option may be supplied +before establishing a connection on a socket, in which case the +system will trace all traffic and internal actions (such as timers +expiring) in a circular trace buffer. +This buffer may then be printed out with the +.Xr trpt (8) +program. +Most of the servers distributed with the system +accept a \fB\-d\fP option forcing +all sockets to be created with debugging turned on. +Consult the appropriate manual pages for more information. +.Sh 2 "Files that need periodic attention" +.PP +We conclude the discussion of system operations by listing +the files that require periodic attention or are system specific: +.TS +center; +lfC l. 
+/etc/fstab how disk partitions are used +/etc/disktab default disk partition sizes/labels +/etc/printcap printer database +/etc/gettytab terminal type definitions +/etc/remote names and phone numbers of remote machines for \fItip\fP(1) +/etc/group group memberships +/etc/motd message of the day +/etc/master.passwd password file; each account has a line +/etc/rc.local local system restart script; runs reboot; starts daemons +/etc/inetd.conf local internet servers +/etc/hosts local host name database +/etc/networks network name database +/etc/services network services database +/etc/hosts.equiv hosts under same administrative control +/etc/syslog.conf error log configuration for \fIsyslogd\fP\|(8) +/etc/ttys enables/disables ports +/etc/crontab commands that are run periodically +/etc/crontab.local local commands that are run periodically +/etc/aliases mail forwarding and distribution groups +/var/account/acct raw process account data +/var/log/messages system error log +/var/log/wtmp login session accounting +.TE +.pn 2 +.bp +.PX diff --git a/share/doc/smm/01.setup/Makefile b/share/doc/smm/01.setup/Makefile new file mode 100644 index 0000000..b2d9c30 --- /dev/null +++ b/share/doc/smm/01.setup/Makefile @@ -0,0 +1,9 @@ +# From: @(#)Makefile 8.1 (Berkeley) 7/27/93 +# $FreeBSD$ + +VOLUME= smm/01.setup +SRCS= stubs 0.t 1.t 2.t 3.t 4.t 5.t 6.t +MACROS= -ms +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/smm/01.setup/spell.ok b/share/doc/smm/01.setup/spell.ok new file mode 100644 index 0000000..daedb66 --- /dev/null +++ b/share/doc/smm/01.setup/spell.ok @@ -0,0 +1,618 @@ +A1096A +AA +ACU +AMD +Automounter +BA +BLOCKSIZE +BSD +Bb +Bostic +Bourne +Bs +Bz +CCI +CCITT +CLNP +CLTP +COMPAT +CPU's +CS80 +CSRG +CW +Catseye +Cyl +DAT +DECstation +DESTDIR +DISK's +DISKTYPE +DMA +DN11 +DV +DaVinci +Dk +Dn +Dy +EBCDIC +EEPROM's +EINTR +EISA +EOT +ERESTART +ESIS +Emulex +Exabyte +FDDI +FIPS +FPU +FTAM +Filesystem +Filesystems +GCC +GENERIC.hp300 +GX +Gatorbox +HDLC +HIL +HP 
+HP's +HP300 +HP300s +HP433 +HP9000 +HPBSD +HPUXCOMPAT +Hibler +IB +ICMP +IDP +IDs +IFLNK +IP +IPC +IPSENDREDIRECTS +IPX +ISA +ISO +ISVTX +Intel +Jul +Karels +Kerberos +L.aliases +L.cmds +L.sys +LAN +LAPB +LFS +LH +LK201 +LOGFILE +Leffler +Luna +MAKEDEV.local +MB +MC68040 +MFS +MIB +MIPS +MISC +MMU +MT02 +Macklem +Makefile +Makefiles +Maxtor +McKusick +NFS +NIC.ARPA +NPSECT +NTP +OHPUX +OS +OSI +OSes +Omron +PCATCH +PDT +PMAD +PMAG +PMAX +PMAZ +POSIX +POSIX.1 +POSIX.2 +PSD:5 +PVRX +Pathnames +Pendry +Postel +README +RFC +RH +RISC +ROM +RS232 +RZ23 +RZ55 +RZ57 +SCSI +SEQF +SIOCGIFCONF +SLC +SMM:1 +SMM:10 +SMM:13 +SMM:14 +SMM:2 +SMM:3 +SMM:4 +SMM:6 +SMM:7 +SMM:8 +SMM:9 +SMTP +SPARC +SPARCstation +SPP +SRC +SUNOS +Sbus +Solaris +Standalone +Std1003.1 +Std1003.2 +SunOS +TAI +TAPE's +TBOOT +TCP +TIOCSCTTY +TK50 +TM +TP4 +TURBOchannel +TVRX +TZ +Tcpmux +Topcat +Tue +UCB +UDP +UFS +ULTRIX +ULTRIXCOMPAT +UNIX''SMM:1 +USERFILE +USL +UTC +UUAIDS +UX +Ux +VAX +VFS +VME +X11R5 +XX +Xinu +a,c,u,p +a.out +adaptor +adaptors +addrlen +adiron +adm +aka +aliases.db +amd +autochanger +autoconf +autoconfiguration +autoconfigures +bdes +bootblock +bootimage +bootp +bootpath +bootrom +bootsd +bootstrapped +bs +bsd +bsd.README +btree +bwtwo +c.f +callback +cbosg +cbosgd +centronics +cfb0 +cgsix +cgthree +changelist +chflags +chico +chpass +cksum +cleandir +cleanerd +clnp +cltp +conf +conformant +contrib +cpio +crontab +crontab.local +cs +csh.cshrc +csh.login +csh.logout +cshrc +ct +cul0 +db +dbopen +dc7085 +deadfs +dev +dgram +dialcode +dialcodes +dict +disk3 +diskful +disklabel +disklabels +disktab +dm +dm.conf +dm.config +dma +doc +endian +es +esis +ext +fd +fdesc +fifofs +files.HOST +fileservers +filesystem +filesystems +foo +frags +friend.host.inet.number +fsf +fstab +fstype +ftpusers +ftpwelcome +fts +funopen +gcc +genkernel +getcap +gettytab +gid +gid's +gids +groff +groupquota +hangup +hanoi +heapsort +hexdump +hier +host.inet.number +hostmaster +hosts.equiv +hosts.lpd +hp +hp300 
+hpib0 +inetd.conf +inline +inode +int +intr +iob +iso +kbyte +kbytes +kdb +kdump +kerberos +kerberosIV +kernfs +kmem +krb.conf +ksh +ktrace +kvm +labelling +lastlog +ld.so.cache +le +le0 +lfs +lib +libc +libdata +libedit +libexec +libkern +libkvm +libutil +localhost +lofs +logname +loopback +lq +lun +luna68k +magtape +mail.local +mail.rc +maillog +maillog.0 +maillog.1 +maillog.2 +maillog.3 +maillog.4 +maillog.5 +maillog.6 +maillog.7 +makefiles +man.conf +man0 +manl +master.passwd +maxine +mckusick +mdec +mediainit +mem +mergesort +mfb0 +mfs +misc +miscfs +mk +mkdb +mkdep +mkfifo +mmap +mnt +mono +motd +mountd +mqueue +msgbuf +mtree +my.domain +myfriend +myfriend.my.domain +myname +myname.my.domain +mysitename +namelen +net.inet.ip.redirect +netgroup +netinet +netiso +netmask +netns +netstart +netx25 +newdev +newlfs +news3400 +newkernel +nfs +nfsd +nfsiod +nfsstat +nfssvc +nodump +nowait +npsect +nr +nrmt0 +nrst0 +nsect +nullfs +nvi +obj +ogin +ok +okeeffe.CS.Berkeley.EDU +olddev +oldroot +opr +osockaddr +out0123456789 +out2010123456 +pageable +pathname +pathnames +pcc +picasso.CS.Berkeley.EDU +pid +pm0 +pmake +pmap +pmax +posix +printcap +pt0 +pty +pty.c +pty0 +pty1 +pty2 +pty3 +ptyp +pwd.db +quota.group +quota.user +quotas.user +radixsort +rc.local +rct +rd +rd0 +rdsk +recno +rew +rf +rhosts +rmt0 +rmt12 +ro +roff +root.dump +root.image +routedflags +rq +rrd0d +rrz?a +rrz?c +rsd0d +rsd3a +rsf +rst0 +rw +rxx0 +rz +rz0 +rz1c +rz?a +sbin +sc14 +sc7 +scsi0 +scsiformat +sd +sd0 +sd2b +sd3 +sd3a +sd660 +secretmail +securelevel +securettys +sendmail.cf +setsid +setup.tblms +shar +shareable +sizeof +skel +sockaddr +sparc +specfs +spwd.db +sr +src +srvtab +st +st.st +standalone +std +std.9600 +stderr +stdin +stdout +subr +sunos +sw +sy +sysctl +sysctl.c +syslog.0 +syslog.1 +syslog.2 +syslog.3 +syslog.4 +syslog.5 +syslog.6 +syslog.7 +syslog.conf +tahoe +tapehost +tcp +termios +tmac +tmp +toor +tps +tput +tsleep +tty.c +tty00 +ttya +ttyb +ttyd +ttyp +ttyp0 +ttytype +types.h 
+tz +tz6 +ucb +ufs +uid +uid's +uids +uipc +umap +umapfs +un.h +unaddr.sun +uname +unistd.h +userid +username +userquota +usr.bin +usr.sbin +utmp +uucp.daemon +uucppublic +uw +vangogh.CS.Berkeley.EDU +var +vax +ventel +vfs +vm +kernel +kernel.net +kernel.tape +vnode +vnodes +vol +vt100 +wildcard +wr +wsrc +wtmp +xargs +xbpf +xcfb0 +xf +xp +xpbf +xpf +xsf +xx +xx0 +xxx +your.host.inet.number +yymmddhhmm +zA +zoneinfo diff --git a/share/doc/smm/01.setup/stubs b/share/doc/smm/01.setup/stubs new file mode 100644 index 0000000..b26f4d5 --- /dev/null +++ b/share/doc/smm/01.setup/stubs @@ -0,0 +1,6 @@ +.\" $FreeBSD$ +.\" +.if n \{\ +. ftr CW R +. ftr C R +.\} diff --git a/share/doc/smm/02.config/Makefile b/share/doc/smm/02.config/Makefile new file mode 100644 index 0000000..26ed70a --- /dev/null +++ b/share/doc/smm/02.config/Makefile @@ -0,0 +1,10 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= smm/02.config +SRCS= 0.t 1.t 2.t 3.t 4.t 5.t 6.t a.t b.t c.t d.t e.t +MACROS= -ms +USE_TBL= +SRCDIR= ${.CURDIR}/../../../../usr.sbin/config/SMM.doc + +.include <bsd.doc.mk> diff --git a/share/doc/smm/03.fsck/Makefile b/share/doc/smm/03.fsck/Makefile new file mode 100644 index 0000000..fbdc009 --- /dev/null +++ b/share/doc/smm/03.fsck/Makefile @@ -0,0 +1,9 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= smm/03.fsck +SRCS= 0.t 1.t 2.t 3.t 4.t +MACROS= -ms +SRCDIR= ${.CURDIR}/../../../../sbin/fsck_ffs/SMM.doc + +.include <bsd.doc.mk> diff --git a/share/doc/smm/04.quotas/Makefile b/share/doc/smm/04.quotas/Makefile new file mode 100644 index 0000000..e3c68ff --- /dev/null +++ b/share/doc/smm/04.quotas/Makefile @@ -0,0 +1,8 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= smm/04.quotas +SRCS= quotas.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/smm/04.quotas/quotas.ms b/share/doc/smm/04.quotas/quotas.ms new file mode 100644 index 0000000..3691830 --- /dev/null +++ b/share/doc/smm/04.quotas/quotas.ms @@ 
-0,0 +1,318 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)quotas.ms 8.1 (Berkeley) 6/8/93 +.\" +.EH 'SMM:4-%''Disc Quotas in a \s-2UNIX\s+2 Environment' +.OH 'Disc Quotas in a \s-2UNIX\s+2 Environment''SMM:4-%' +.ND 5th July, 1983 +.TL +Disc Quotas in a \s-2UNIX\s+2\s-3\u*\d\s0 Environment +.FS +* UNIX is a trademark of Bell Laboratories. +.FE +.AU +Robert Elz +.AI +Department of Computer Science +University of Melbourne, +Parkville, +Victoria, +Australia. +.AB +.PP +In most computing environments, disc space is not +infinite. +The disc quota system provides a mechanism +to control usage of disc space, on an +individual basis. +.PP +Quotas may be set for each individual user, on any, or +all filesystems. +.PP +The quota system will warn users when they +exceed their allotted limit, but allow some +extra space for current work. +Repeatedly remaining over quota at logout, +will cause a fatal over quota condition eventually. +.PP +The quota system is an optional part of +\s-2VMUNIX\s0 that may be included when the +system is configured. +.AE +.NH 1 +Users' view of disc quotas +.PP +To most users, disc quotas will either be of no concern, +or a fact of life that cannot be avoided. +The +\fIquota\fP\|(1) +command will provide information on any disc quotas +that may have been imposed upon a user. +.PP +There are two individual possible quotas that may be +imposed, usually if one is, both will be. +A limit can be set on the amount of space a user +can occupy, and there may be a limit on the number +of files (inodes) he can own. +.PP +.I Quota +provides information on the quotas that have +been set by the system administrators, in each +of these areas, and current usage. +.PP +There are four numbers for each limit, the current +usage, soft limit (quota), hard limit, and number +of remaining login warnings. +The soft limit is the number of 1K blocks (or files) +that the user is expected to remain below. +Each time the user's usage goes past this limit, +he will be warned. +The hard limit cannot be exceeded. 
+If a user's usage reaches this number, further +requests for space (or attempts to create a file) +will fail with an EDQUOT error, and the first time +this occurs, a message will be written to the user's +terminal. +Only one message will be output until the space occupied +is reduced below the limit and then reaches it again, +in order to avoid continual noise from those +programs that ignore write errors. +.PP +Whenever a user logs in with a usage greater than +his soft limit, he will be warned, and his login +warning count decremented. +When he logs in under quota, the counter is reset +to its maximum value (which is a system configuration +parameter that is typically 3). +If the warning count should ever reach zero (caused +by three successive logins over quota), the +particular limit that has been exceeded will be treated +as if the hard limit has been reached, and no +more resources will be allocated to the user. +The \fBonly\fP way to reset this condition is +to reduce usage below quota, then log in again. +.NH 2 +Surviving when quota limit is reached +.PP +In most cases, the only way to recover from over +quota conditions is to abort whatever activity was in progress +on the filesystem that has reached its limit, remove +sufficient files to bring usage back below quota, +and retry the failed program. +.PP +However, if you are in the editor and a write fails +because of an over quota situation, that is not +a suitable course of action, as it is most likely +that initially attempting to write the file +will have truncated its previous contents. +Should the editor be aborted without correctly writing the +file, not only will the recent changes be lost, but +possibly much, or even all, of the data +that previously existed. +.PP +There are several possible safe exits for a user +caught in this situation. +He may use the editor \fB!\fP shell escape command to +examine his file space, and remove surplus files. 
+Alternatively, using \fIcsh\fP, he may suspend the +editor, remove some files, then resume it. +A third possibility, is to write the file to +some other filesystem (perhaps to a file on /tmp) +where the user's quota has not been exceeded. +Then after rectifying the quota situation, +the file can be moved back to the filesystem +it belongs on. +.NH 1 +Administering the quota system +.PP +To set up and establish the disc quota system, +there are several steps necessary to be performed +by the system administrator. +.PP +First, the system must be configured to include +the disc quota sub-system. +This is done by including the line: +.DS +options QUOTA +.DE +in the system configuration file, then running +\fIconfig\fP\|(8) +followed by a system configuration\s-3\u*\d\s0. +.FS +* See also the document ``Building 4.2BSD UNIX Systems with Config''. +.FE +.PP +Second, a decision as to what filesystems need to have +quotas applied needs to be made. +Usually, only filesystems that house users' home directories, +or other user files, will need to be subjected to +the quota system, though it may also prove useful to +also include \fB/usr\fR. +If possible, \fB/tmp\fP should usually be free of quotas. +.PP +Having decided on which filesystems quotas need to be +set upon, the administrator should then allocate the +available space amongst the competing needs. How this +should be done is (way) beyond the scope of this document. +.PP +Then, the +\fIedquota\fP\|(8) +command can be used to actually set the limits desired upon +each user. Where a number of users are to be given the +same quotas (a common occurrence) the \fB\-p\fP switch +to edquota will allow this to be easily accomplished. +.PP +Once the quotas are set, ready to operate, the system +must be informed to enforce quotas on the desired filesystems. +This is accomplished with the +\fIquotaon\fP\|(8) +command. 
+.I Quotaon +will either enable quotas for a particular filesystem, or +with the \fB\-a\fP switch, will enable quotas for each +filesystem indicated in \fB/etc/fstab\fP as using quotas. +See +\fIfstab\fP\|(5) +for details. +Most sites using the quota system, will include the +line +.DS C +/etc/quotaon -a +.DE +in \fB/etc/rc.local\fP. +.PP +Should quotas need to be disabled, the +\fIquotaoff\fP(8) +command will do that, however, should the filesystem be +about to be dismounted, the +\fIumount\fP\|(8) +command will disable quotas immediately before the +filesystem is unmounted. +This is actually an effect of the +\fIumount\fP\|(2) +system call, and it guarantees that the quota system +will not be disabled if the umount would fail +because the filesystem is not idle. +.PP +Periodically (certainly after each reboot, and when quotas +are first enabled for a filesystem), the records retained +in the quota file should be checked for consistency with +the actual number of blocks and files allocated to +the user. +The +\fIquotacheck\fP\|(8) +command can be used to accomplish this. +It is not necessary to dismount the filesystem, or disable +the quota system to run this command, though on +active filesystems inaccurate results may occur. +This does no real harm in most cases, another run of +.I quotacheck +when the filesystem is idle will certainly correct any inaccuracy. +.PP +The super-user may use the +\fIquota\fP\|(1) +command to examine the usage and quotas of any user, and +the +\fIrepquota\fP\|(8) +command may be used to check the usages and limits for +all users on a filesystem. +.NH 1 +Some implementation detail. +.PP +Disc quota usage and information is stored in a file on the +filesystem that the quotas are to be applied to. +Conventionally, this file is \fBquotas\fR in the root of +the filesystem. +While this name is not known to the system in any way, +several of the user level utilities "know" it, and +choosing any other name would not be wise. 
+.PP +The data in the file comprises an array of structures, indexed +by uid, one structure for each user on the system (whether +the user has a quota on this filesystem or not). +If the uid space is sparse, then the file may have holes +in it, which would be lost by copying, so it is best to +avoid this. +.PP +The system is informed of the existence of the quota +file by the +\fIsetquota\fP\|(2) +system call. +It then reads the quota entries for each user currently +active, then for any files open owned by users who +are not currently active. +Each subsequent open of a file on the filesystem, will +be accompanied by a pairing with its quota information. +In most cases this information will be retained in core, +either because the user who owns the file is running some +process, because other files are open owned by the same +user, or because some file (perhaps this one) was recently +accessed. +In memory, the quota information is kept hashed by user-id +and filesystem, and retained in an LRU chain so recently +released data can be easily reclaimed. +Information about those users whose last process has +recently terminated is also retained in this way. +.PP +Each time a block is accessed or released, and each time an inode +is allocated or freed, the quota system gets told +about it, and in the case of allocations, gets the +opportunity to object. +.PP +Measurements have shown +that the quota code uses a very small percentage of the system +cpu time consumed in writing a new block to disc. +.NH 1 +Acknowledgments +.PP +The current disc quota system is loosely based upon a very +early scheme implemented at the University of New South +Wales, and Sydney University in the mid 70's. That system +implemented a single combined limit for both files and blocks +on all filesystems. 
+.PP +A later system was implemented at the University of Melbourne +by the author, but was not kept highly accurately, eg: +chown's (etc) did not affect quotas, nor did i/o to a file +other than one owned by the instigator. +.PP +The current system has been running (with only minor modifications) +since January 82 at Melbourne. +It is actually just a small part of a much broader resource +control scheme, which is capable of controlling almost +anything that is usually uncontrolled in unix. The rest +of this is, as yet, still in a state where it is far too +subject to change to be considered for distribution. +.PP +For the 4.2BSD release, much work has been done to clean +up and sanely incorporate the quota code by Sam Leffler and +Kirk McKusick at The University of California at Berkeley. diff --git a/share/doc/smm/05.fastfs/0.t b/share/doc/smm/05.fastfs/0.t new file mode 100644 index 0000000..9cc759b --- /dev/null +++ b/share/doc/smm/05.fastfs/0.t @@ -0,0 +1,159 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. 
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)0.t 8.1 (Berkeley) 6/8/93 +.\" +.EQ +delim $$ +.EN +.if n .ND +.TL +A Fast File System for UNIX* +.EH 'SMM:05-%''A Fast File System for \s-2UNIX\s+2' +.OH 'A Fast File System for \s-2UNIX\s+2''SMM:05-%' +.AU +Marshall Kirk McKusick, William N. Joy\(dg, +Samuel J. Leffler\(dd, Robert S. Fabry +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, CA 94720 +.AB +.FS +* UNIX is a trademark of Bell Laboratories. +.FE +.FS +\(dg William N. Joy is currently employed by: +Sun Microsystems, Inc, 2550 Garcia Avenue, Mountain View, CA 94043 +.FE +.FS +\(dd Samuel J. Leffler is currently employed by: +Lucasfilm Ltd., PO Box 2009, San Rafael, CA 94912 +.FE +.FS +This work was done under grants from +the National Science Foundation under grant MCS80-05144, +and the Defense Advance Research Projects Agency (DoD) under +ARPA Order No. 
4031 monitored by Naval Electronic System Command under +Contract No. N00039-82-C-0235. +.FE +A reimplementation of the UNIX file system is described. +The reimplementation provides substantially higher throughput +rates by using more flexible allocation policies +that allow better locality of reference and can +be adapted to a wide range of peripheral and processor characteristics. +The new file system clusters data that is sequentially accessed +and provides two block sizes to allow fast access to large files +while not wasting large amounts of space for small files. +File access rates of up to ten times faster than the traditional +UNIX file system are experienced. +Long needed enhancements to the programmers' +interface are discussed. +These include a mechanism to place advisory locks on files, +extensions of the name space across file systems, +the ability to use long file names, +and provisions for administrative control of resource usage. +.sp +.LP +Revised February 18, 1984 +.AE +.LP +.sp 2 +CR Categories and Subject Descriptors: +D.4.3 +.B "[Operating Systems]": +File Systems Management \- +.I "file organization, directory structures, access methods"; +D.4.2 +.B "[Operating Systems]": +Storage Management \- +.I "allocation/deallocation strategies, secondary storage devices"; +D.4.8 +.B "[Operating Systems]": +Performance \- +.I "measurements, operational analysis"; +H.3.2 +.B "[Information Systems]": +Information Storage \- +.I "file organization" +.sp +Additional Keywords and Phrases: +UNIX, +file system organization, +file system performance, +file system design, +application program interface. +.sp +General Terms: +file system, +measurement, +performance. +.bp +.ce +.B "TABLE OF CONTENTS" +.LP +.sp 1 +.nf +.B "1. Introduction" +.LP +.sp .5v +.nf +.B "2. Old file system +.LP +.sp .5v +.nf +.B "3. New file system organization +3.1. Optimizing storage utilization +3.2. File system parameterization +3.3. Layout policies +.LP +.sp .5v +.nf +.B "4. 
Performance +.LP +.sp .5v +.nf +.B "5. File system functional enhancements +5.1. Long file names +5.2. File locking +5.3. Symbolic links +5.4. Rename +5.5. Quotas +.LP +.sp .5v +.nf +.B Acknowledgements +.LP +.sp .5v +.nf +.B References diff --git a/share/doc/smm/05.fastfs/1.t b/share/doc/smm/05.fastfs/1.t new file mode 100644 index 0000000..dbdafc4 --- /dev/null +++ b/share/doc/smm/05.fastfs/1.t @@ -0,0 +1,112 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.t 8.1 (Berkeley) 6/8/93 +.\" +.ds RH Introduction +.NH +Introduction +.PP +This paper describes the changes from the original 512 byte UNIX file +system to the new one released with the 4.2 Berkeley Software Distribution. +It presents the motivations for the changes, +the methods used to effect these changes, +the rationale behind the design decisions, +and a description of the new implementation. +This discussion is followed by a summary of +the results that have been obtained, +directions for future work, +and the additions and changes +that have been made to the facilities that are +available to programmers. +.PP +The original UNIX system that runs on the PDP-11\(dg +.FS +\(dg DEC, PDP, VAX, MASSBUS, and UNIBUS are +trademarks of Digital Equipment Corporation. +.FE +has simple and elegant file system facilities. File system input/output +is buffered by the kernel; +there are no alignment constraints on +data transfers and all operations are made to appear synchronous. +All transfers to the disk are in 512 byte blocks, which can be placed +arbitrarily within the data area of the file system. Virtually +no constraints other than available disk space are placed on file growth +[Ritchie74], [Thompson78].* +.FS +* In practice, a file's size is constrained to be less than about +one gigabyte. 
+.FE +.PP +When used on the VAX-11 together with other UNIX enhancements, +the original 512 byte UNIX file +system is incapable of providing the data throughput rates +that many applications require. +For example, +applications +such as VLSI design and image processing +do a small amount of processing +on large quantities of data and +need to have a high throughput from the file system. +High throughput rates are also needed by programs +that map files from the file system into large virtual +address spaces. +Paging data in and out of the file system is likely +to occur frequently [Ferrin82b]. +This requires a file system providing +higher bandwidth than the original 512 byte UNIX +one that provides only about +two percent of the maximum disk bandwidth or about +20 kilobytes per second per arm [White80], [Smith81b]. +.PP +Modifications have been made to the UNIX file system to improve +its performance. +Since the UNIX file system interface +is well understood and not inherently slow, +this development retained the abstraction and simply changed +the underlying implementation to increase its throughput. +Consequently, users of the system have not been faced with +massive software conversion. +.PP +Problems with file system performance have been dealt with +extensively in the literature; see [Smith81a] for a survey. +Previous work to improve the UNIX file system performance has been +done by [Ferrin82a]. +The UNIX operating system drew many of its ideas from Multics, +a large, high performance operating system [Feiertag71]. +Other work includes Hydra [Almes78], +Spice [Thompson80], +and a file system for a LISP environment [Symbolics81]. +A good introduction to the physical latencies of disks is +given in [Pechura83].
+.ds RH Old file system +.sp 2 +.ne 1i diff --git a/share/doc/smm/05.fastfs/2.t b/share/doc/smm/05.fastfs/2.t new file mode 100644 index 0000000..33d9ade --- /dev/null +++ b/share/doc/smm/05.fastfs/2.t @@ -0,0 +1,143 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)2.t 8.1 (Berkeley) 6/8/93 +.\" +.ds RH Old file system +.NH +Old File System +.PP +In the file system developed at Bell Laboratories +(the ``traditional'' file system), +each disk drive is divided into one or more +partitions. Each of these disk partitions may contain +one file system. A file system never spans multiple +partitions.\(dg +.FS +\(dg By ``partition'' here we refer to the subdivision of +physical space on a disk drive. In the traditional file +system, as in the new file system, file systems are really +located in logical disk partitions that may overlap. This +overlapping is made available, for example, +to allow programs to copy entire disk drives containing multiple +file systems. +.FE +A file system is described by its super-block, +which contains the basic parameters of the file system. +These include the number of data blocks in the file system, +a count of the maximum number of files, +and a pointer to the \fIfree list\fP, a linked +list of all the free blocks in the file system. +.PP +Within the file system are files. +Certain files are distinguished as directories and contain +pointers to files that may themselves be directories. +Every file has a descriptor associated with it called an +.I "inode". +An inode contains information describing ownership of the file, +time stamps marking last modification and access times for the file, +and an array of indices that point to the data blocks for the file. 
+For the purposes of this section, we assume that the first 8 blocks +of the file are directly referenced by values stored +in an inode itself*. +.FS +* The actual number may vary from system to system, but is usually in +the range 5-13. +.FE +An inode may also contain references to indirect blocks +containing further data block indices. +In a file system with a 512 byte block size, a singly indirect +block contains 128 further block addresses, +a doubly indirect block contains 128 addresses of further singly indirect +blocks, +and a triply indirect block contains 128 addresses of further doubly indirect +blocks. +.PP +A 150 megabyte traditional UNIX file system consists +of 4 megabytes of inodes followed by 146 megabytes of data. +This organization segregates the inode information from the data; +thus accessing a file normally incurs a long seek from the +file's inode to its data. +Files in a single directory are not typically allocated +consecutive slots in the 4 megabytes of inodes, +causing many non-consecutive blocks of inodes +to be accessed when executing +operations on the inodes of several files in a directory. +.PP +The allocation of data blocks to files is also suboptimum. +The traditional +file system never transfers more than 512 bytes per disk transaction +and often finds that the next sequential data block is not on the same +cylinder, forcing seeks between 512 byte transfers. +The combination of the small block size, +limited read-ahead in the system, +and many seeks severely limits file system throughput. +.PP +The first work at Berkeley on the UNIX file system attempted to improve both +reliability and throughput. +The reliability was improved by staging modifications +to critical file system information so that they could +either be completed or repaired cleanly by a program +after a crash [Kowalski78]. +The file system performance was improved by a factor of more than two by +changing the basic block size from 512 to 1024 bytes. 
+The increase was because of two factors: +each disk transfer accessed twice as much data, +and most files could be described without need to access +indirect blocks since the direct blocks contained twice as much data. +The file system with these changes will henceforth be referred to as the +.I "old file system." +.PP +This performance improvement gave a strong indication that +increasing the block size was a good method for improving +throughput. +Although the throughput had doubled, +the old file system was still using only about +four percent of the disk bandwidth. +The main problem was that although the free list was initially +ordered for optimal access, +it quickly became scrambled as files were created and removed. +Eventually the free list became entirely random, +causing files to have their blocks allocated randomly over the disk. +This forced a seek before every block access. +Although old file systems provided transfer rates of up +to 175 kilobytes per second when they were first created, +this rate deteriorated to 30 kilobytes per second after a +few weeks of moderate use because of this +randomization of data block placement. +There was no way of restoring the performance of an old file system +except to dump, rebuild, and restore the file system. +Another possibility, as suggested by [Maruyama76], +would be to have a process that periodically +reorganized the data on the disk to restore locality. +.ds RH New file system +.sp 2 +.ne 1i diff --git a/share/doc/smm/05.fastfs/3.t b/share/doc/smm/05.fastfs/3.t new file mode 100644 index 0000000..23db86a --- /dev/null +++ b/share/doc/smm/05.fastfs/3.t @@ -0,0 +1,598 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. 
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)3.t 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.\" +.ds RH New file system +.NH +New file system organization +.PP +In the new file system organization (as in the +old file system organization), +each disk drive contains one or more file systems. +A file system is described by its super-block, +located at the beginning of the file system's disk partition. 
+Because the super-block contains critical data, +it is replicated to protect against catastrophic loss. +This is done when the file system is created; +since the super-block data does not change, +the copies need not be referenced unless a head crash +or other hard disk error causes the default super-block +to be unusable. +.PP +To ensure that it is possible to create files as large as +.if n 2 ** 32 +.if t $2 sup 32$ +bytes with only two levels of indirection, +the minimum size of a file system block is 4096 bytes. +The size of file system blocks can be any power of two +greater than or equal to 4096. +The block size of a file system is recorded in the +file system's super-block +so it is possible for file systems with different block sizes +to be simultaneously accessible on the same system. +The block size must be decided at the time that +the file system is created; +it cannot be subsequently changed without rebuilding the file system. +.PP +The new file system organization divides a disk partition +into one or more areas called +.I "cylinder groups". +A cylinder group is comprised of one or more consecutive +cylinders on a disk. +Associated with each cylinder group is some bookkeeping information +that includes a redundant copy of the super-block, +space for inodes, +a bit map describing available blocks in the cylinder group, +and summary information describing the usage of data blocks +within the cylinder group. +The bit map of available blocks in the cylinder group replaces +the traditional file system's free list. +For each cylinder group a static number of inodes +is allocated at file system creation time. +The default policy is to allocate one inode for each 2048 +bytes of space in the cylinder group, expecting this +to be far more than will ever be needed. +.PP +All the cylinder group bookkeeping information could be +placed at the beginning of each cylinder group.
+However, if this approach were used, +all the redundant information would be on the top platter. +A single hardware failure that destroyed the top platter +could cause the loss of all redundant copies of the super-block. +Thus the cylinder group bookkeeping information +begins at a varying offset from the beginning of the cylinder group. +The offset for each successive cylinder group is calculated to be +about one track further from the beginning of the cylinder group +than the preceding cylinder group. +In this way the redundant +information spirals down into the pack so that any single track, cylinder, +or platter can be lost without losing all copies of the super-block. +Except for the first cylinder group, +the space between the beginning of the cylinder group +and the beginning of the cylinder group information +is used for data blocks.\(dg +.FS +\(dg While it appears that the first cylinder group could be laid +out with its super-block at the ``known'' location, +this would not work for file systems +with block sizes of 16 kilobytes or greater. +This is because of a requirement that the first 8 kilobytes of the disk +be reserved for a bootstrap program and a separate requirement that +the cylinder group information begin on a file system block boundary. +To start the cylinder group on a file system block boundary, +file systems with block sizes larger than 8 kilobytes +would have to leave an empty space between the end of +the boot block and the beginning of the cylinder group. +Without knowing the size of the file system blocks, +the system would not know what roundup function to use +to find the beginning of the first cylinder group. +.FE +.NH 2 +Optimizing storage utilization +.PP +Data is laid out so that larger blocks can be transferred +in a single disk transaction, greatly increasing file system throughput. +As an example, consider a file in the new file system +composed of 4096 byte data blocks.
+In the old file system this file would be composed of 1024 byte blocks. +By increasing the block size, disk accesses in the new file +system may transfer up to four times as much information per +disk transaction. +In large files, several +4096 byte blocks may be allocated from the same cylinder so that +even larger data transfers are possible before requiring a seek. +.PP +The main problem with +larger blocks is that most UNIX +file systems are composed of many small files. +A uniformly large block size wastes space. +Table 1 shows the effect of file system +block size on the amount of wasted space in the file system. +The files measured to obtain these figures reside on +one of our time sharing +systems that has roughly 1.2 gigabytes of on-line storage. +The measurements are based on the active user file systems containing +about 920 megabytes of formatted space. +.KF +.DS B +.TS +box; +l|l|l +a|n|l. +Space used % waste Organization +_ +775.2 Mb 0.0 Data only, no separation between files +807.8 Mb 4.2 Data only, each file starts on 512 byte boundary +828.7 Mb 6.9 Data + inodes, 512 byte block UNIX file system +866.5 Mb 11.8 Data + inodes, 1024 byte block UNIX file system +948.5 Mb 22.4 Data + inodes, 2048 byte block UNIX file system +1128.3 Mb 45.6 Data + inodes, 4096 byte block UNIX file system +.TE +Table 1 \- Amount of wasted space as a function of block size. +.DE +.KE +The space wasted is calculated to be the percentage of space +on the disk not containing user data. +As the block size on the disk +increases, the waste rises quickly, to an intolerable +45.6% waste with 4096 byte file system blocks. +.PP +To be able to use large blocks without undue waste, +small files must be stored in a more efficient way. +The new file system accomplishes this goal by allowing the division +of a single file system block into one or more +.I "fragments". 
+The file system fragment size is specified +at the time that the file system is created; +each file system block can optionally be broken into +2, 4, or 8 fragments, each of which is addressable. +The lower bound on the size of these fragments is constrained +by the disk sector size, +typically 512 bytes. +The block map associated with each cylinder group +records the space available in a cylinder group +at the fragment level; +to determine if a block is available, aligned fragments are examined. +Figure 1 shows a piece of a map from a 4096/1024 file system. +.KF +.DS B +.TS +box; +l|c c c c. +Bits in map XXXX XXOO OOXX OOOO +Fragment numbers 0-3 4-7 8-11 12-15 +Block numbers 0 1 2 3 +.TE +Figure 1 \- Example layout of blocks and fragments in a 4096/1024 file system. +.DE +.KE +Each bit in the map records the status of a fragment; +an ``X'' shows that the fragment is in use, +while an ``O'' shows that the fragment is available for allocation. +In this example, +fragments 0\-5, 10, and 11 are in use, +while fragments 6\-9 and 12\-15 are free. +Fragments of adjoining blocks cannot be used as a full block, +even if they are large enough. +In this example, +fragments 6\-9 cannot be allocated as a full block; +only fragments 12\-15 can be coalesced into a full block. +.PP +On a file system with a block size of 4096 bytes +and a fragment size of 1024 bytes, +a file is represented by zero or more 4096 byte blocks of data, +and possibly a single fragmented block. +If a file system block must be fragmented to obtain +space for a small amount of data, +the remaining fragments of the block are made +available for allocation to other files. +As an example consider an 11000 byte file stored on +a 4096/1024 byte file system. +This file would use two full size blocks and one +three fragment portion of another block.
+If no block with three aligned fragments is +available at the time the file is created, +a full size block is split yielding the necessary +fragments and a single unused fragment. +This remaining fragment can be allocated to another file as needed. +.PP +Space is allocated to a file when a program does a \fIwrite\fP +system call. +Each time data is written to a file, the system checks to see if +the size of the file has increased*. +.FS +* A program may be overwriting data in the middle of an existing file +in which case space would already have been allocated. +.FE +If the file needs to be expanded to hold the new data, +one of three conditions exists: +.IP 1) +There is enough space left in an already allocated +block or fragment to hold the new data. +The new data is written into the available space. +.IP 2) +The file contains no fragmented blocks (and the last +block in the file +contains insufficient space to hold the new data). +If space exists in a block already allocated, +the space is filled with new data. +If the remainder of the new data contains more than +a full block of data, a full block is allocated and +the first full block of new data is written there. +This process is repeated until less than a full block +of new data remains. +If the remaining new data to be written will +fit in less than a full block, +a block with the necessary fragments is located, +otherwise a full block is located. +The remaining new data is written into the located space. +.IP 3) +The file contains one or more fragments (and the +fragments contain insufficient space to hold the new data). +If the size of the new data plus the size of the data +already in the fragments exceeds the size of a full block, +a new block is allocated. +The contents of the fragments are copied +to the beginning of the block +and the remainder of the block is filled with new data. +The process then continues as in (2) above. 
+Otherwise, if the new data to be written will +fit in less than a full block, +a block with the necessary fragments is located, +otherwise a full block is located. +The contents of the existing fragments +appended with the new data +are written into the allocated space. +.PP +The problem with expanding a file one fragment at +a time is that data may be copied many times as a +fragmented block expands to a full block. +Fragment reallocation can be minimized +if the user program writes a full block at a time, +except for a partial block at the end of the file. +Since file systems with different block sizes may reside on +the same system, +the file system interface has been extended to provide +application programs the optimal size for a read or write. +For files the optimal size is the block size of the file system +on which the file is being accessed. +For other objects, such as pipes and sockets, +the optimal size is the underlying buffer size. +This feature is used by the Standard +Input/Output Library, +a package used by most user programs. +This feature is also used by +certain system utilities such as archivers and loaders +that do their own input and output management +and need the highest possible file system bandwidth. +.PP +The amount of wasted space in the 4096/1024 byte new file system +organization is empirically observed to be about the same as in the +1024 byte old file system organization. +A file system with 4096 byte blocks and 512 byte fragments +has about the same amount of wasted space as the 512 byte +block UNIX file system. +The new file system uses less space +than the 512 byte or 1024 byte +file systems for indexing information for +large files and the same amount of space +for small files. +These savings are offset by the need to use +more space for keeping track of available free blocks. +The net result is about the same disk utilization +when a new file system's fragment size +equals an old file system's block size.
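The block and fragment arithmetic used earlier in this section (the 11000 byte file on a 4096/1024 file system, and the map of Figure 1) can be illustrated with a small sketch. This is not the kernel's allocator; the function names `layout` and `free_full_blocks` are hypothetical, and the map is modelled as a string of ``X'' (in use) and ``O'' (free) characters purely for illustration. The sketch also assumes that a tail filling every fragment of a block is simply counted as a full block.

```python
def layout(file_size, block_size, frag_size):
    """Return (full_blocks, tail_fragments) needed to hold file_size bytes.

    Hypothetical helper: models only the space accounting described in
    the text, not the placement policy.
    """
    frags_per_block = block_size // frag_size
    full_blocks, tail = divmod(file_size, block_size)
    # Round the tail up to a whole number of fragments.
    tail_frags = -(-tail // frag_size)
    # Assumption: a tail that needs every fragment is just a full block.
    if tail_frags == frags_per_block:
        full_blocks += 1
        tail_frags = 0
    return full_blocks, tail_frags


def free_full_blocks(bitmap, frags_per_block):
    """Return the block numbers whose aligned fragments are all free.

    bitmap is a string of 'X' (in use) and 'O' (free); spaces are ignored.
    Free fragments that straddle two blocks do not count as a full block.
    """
    bits = bitmap.replace(" ", "")
    free = []
    for start in range(0, len(bits) - frags_per_block + 1, frags_per_block):
        if all(b == "O" for b in bits[start:start + frags_per_block]):
            free.append(start // frags_per_block)
    return free


if __name__ == "__main__":
    print(layout(11000, 4096, 1024))
    print(free_full_blocks("XXXX XXOO OOXX OOOO", 4))
```

For the 11000 byte example the sketch yields two full blocks and three fragments, and for the map of Figure 1 it reports only block 3 as available as a full block, matching the text's observation that fragments 6\-9 cannot be coalesced.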
+.PP +In order for the layout policies to be effective, +a file system cannot be kept completely full. +For each file system there is a parameter, termed +the free space reserve, that +gives the minimum acceptable percentage of file system +blocks that should be free. +If the number of free blocks drops below this level +only the system administrator can continue to allocate blocks. +The value of this parameter may be changed at any time, +even when the file system is mounted and active. +The transfer rates that appear in section 4 were measured on file +systems kept less than 90% full (a reserve of 10%). +If the number of free blocks falls to zero, +the file system throughput tends to be cut in half, +because of the inability of the file system to localize +blocks in a file. +If a file system's performance degrades because +of overfilling, it may be restored by removing +files until the amount of free space once again +reaches the minimum acceptable level. +Access rates for files created during periods of little +free space may be restored by moving their data once enough +space is available. +The free space reserve must be added to the +percentage of waste when comparing the organizations given +in Table 1. +Thus, the percentage of waste in +an old 1024 byte UNIX file system is roughly +comparable to a new 4096/512 byte file system +with the free space reserve set at 5%. +(Compare 11.8% wasted with the old file system +to 6.9% waste + 5% reserved space in the +new file system.) +.NH 2 +File system parameterization +.PP +Except for the initial creation of the free list, +the old file system ignores the parameters of the underlying hardware. +It has no information about either the physical characteristics +of the mass storage device, +or the hardware that interacts with it. +A goal of the new file system is to parameterize the +processor capabilities and +mass storage characteristics +so that blocks can be allocated in an +optimum configuration-dependent way. 
+Parameters used include the speed of the processor, +the hardware support for mass storage transfers, +and the characteristics of the mass storage devices. +Disk technology is constantly improving and +a given installation can have several different disk technologies +running on a single processor. +Each file system is parameterized so that it can be +adapted to the characteristics of the disk on which +it is placed. +.PP +For mass storage devices such as disks, +the new file system tries to allocate new blocks +on the same cylinder as the previous block in the same file. +Optimally, these new blocks will also be +rotationally well positioned. +The distance between ``rotationally optimal'' blocks varies greatly; +it can be a consecutive block +or a rotationally delayed block +depending on system characteristics. +On a processor with an input/output channel that does not require +any processor intervention between mass storage transfer requests, +two consecutive disk blocks can often be accessed +without suffering lost time because of an intervening disk revolution. +For processors without input/output channels, +the main processor must field an interrupt and +prepare for a new disk transfer. +The expected time to service this interrupt and +schedule a new disk transfer depends on the +speed of the main processor. +.PP +The physical characteristics of each disk include +the number of blocks per track and the rate at which +the disk spins. +The allocation routines use this information to calculate +the number of milliseconds required to skip over a block. +The characteristics of the processor include +the expected time to service an interrupt and schedule a +new disk transfer. +Given a block allocated to a file, +the allocation routines calculate the number of blocks to +skip over so that the next block in the file will +come into position under the disk head in the expected +amount of time that it takes to start a new +disk transfer operation. 
+For programs that sequentially access large amounts of data, +this strategy minimizes the amount of time spent waiting for +the disk to position itself. +.PP +To ease the calculation of finding rotationally optimal blocks, +the cylinder group summary information includes +a count of the available blocks in a cylinder +group at different rotational positions. +Eight rotational positions are distinguished, +so the resolution of the +summary information is 2 milliseconds for a typical 3600 +revolution per minute drive. +The super-block contains a vector of lists called +.I "rotational layout tables". +The vector is indexed by rotational position. +Each component of the vector +lists the index into the block map for every data block contained +in its rotational position. +When looking for an allocatable block, +the system first looks through the summary counts for a rotational +position with a non-zero block count. +It then uses the index of the rotational position to find the appropriate +list to use to index through +only the relevant parts of the block map to find a free block. +.PP +The parameter that defines the +minimum number of milliseconds between the completion of a data +transfer and the initiation of +another data transfer on the same cylinder +can be changed at any time, +even when the file system is mounted and active. +If a file system is parameterized to lay out blocks with +a rotational separation of 2 milliseconds, +and the disk pack is then moved to a system that has a +processor requiring 4 milliseconds to schedule a disk operation, +the throughput will drop precipitously because of lost disk revolutions +on nearly every block. +If the eventual target machine is known, +the file system can be parameterized for it +even though it is initially created on a different processor. 
+Even if the move is not known in advance, +the rotational layout delay can be reconfigured after the disk is moved +so that all further allocation is done based on the +characteristics of the new host. +.NH 2 +Layout policies +.PP +The file system layout policies are divided into two distinct parts. +At the top level are global policies that use file system +wide summary information to make decisions regarding +the placement of new inodes and data blocks. +These routines are responsible for deciding the +placement of new directories and files. +They also calculate rotationally optimal block layouts, +and decide when to force a long seek to a new cylinder group +because there are insufficient blocks left +in the current cylinder group to do reasonable layouts. +Below the global policy routines are +the local allocation routines that use a locally optimal scheme to +lay out data blocks. +.PP +Two methods for improving file system performance are to increase +the locality of reference to minimize seek latency +as described by [Trivedi80], and +to improve the layout of data to make larger transfers possible +as described by [Nevalainen77]. +The global layout policies try to improve performance +by clustering related information. +They cannot attempt to localize all data references, +but must also try to spread unrelated data +among different cylinder groups. +If too much localization is attempted, +the local cylinder group may run out of space +forcing the data to be scattered to non-local cylinder groups. +Taken to an extreme, +total localization can result in a single huge cluster of data +resembling the old file system. +The global policies try to balance the two conflicting +goals of localizing data that is concurrently accessed +while spreading out unrelated data. +.PP +One allocatable resource is inodes. +Inodes are used to describe both files and directories. +Inodes of files in the same directory are frequently accessed together. 
+For example, the ``list directory'' command often accesses +the inode for each file in a directory. +The layout policy tries to place all the inodes of +files in a directory in the same cylinder group. +To ensure that files are distributed throughout the disk, +a different policy is used for directory allocation. +A new directory is placed in a cylinder group that has a greater +than average number of free inodes, +and the smallest number of directories already in it. +The intent of this policy is to allow the inode clustering policy +to succeed most of the time. +The allocation of inodes within a cylinder group is done using a +next free strategy. +Although this allocates the inodes randomly within a cylinder group, +all the inodes for a particular cylinder group can be read with +8 to 16 disk transfers. +(At most 16 disk transfers are required because a cylinder +group may have no more than 2048 inodes.) +This puts a small and constant upper bound on the number of +disk transfers required to access the inodes +for all the files in a directory. +In contrast, the old file system typically requires +one disk transfer to fetch the inode for each file in a directory. +.PP +The other major resource is data blocks. +Since data blocks for a file are typically accessed together, +the policy routines try to place all data +blocks for a file in the same cylinder group, +preferably at rotationally optimal positions in the same cylinder. +The problem with allocating all the data blocks +in the same cylinder group is that large files will +quickly use up available space in the cylinder group, +forcing a spill over to other areas. +Further, using all the space in a cylinder group +causes future allocations for any file in the cylinder group +to also spill to other areas. +Ideally none of the cylinder groups should ever become completely full. 
+The heuristic solution chosen is to +redirect block allocation +to a different cylinder group +when a file exceeds 48 kilobytes, +and at every megabyte thereafter.* +.FS +* The first spill over point at 48 kilobytes is the point +at which a file on a 4096 byte block file system first +requires a single indirect block. This appears to be +a natural first point at which to redirect block allocation. +The other spillover points are chosen with the intent of +forcing block allocation to be redirected when a +file has used about 25% of the data blocks in a cylinder group. +In observing the new file system in day to day use, the heuristics appear +to work well in minimizing the number of completely filled +cylinder groups. +.FE +The newly chosen cylinder group is selected from those cylinder +groups that have a greater than average number of free blocks left. +Although big files tend to be spread out over the disk, +a megabyte of data is typically accessible before +a long seek must be performed, +and the cost of one long seek per megabyte is small. +.PP +The global policy routines call local allocation routines with +requests for specific blocks. +The local allocation routines will +always allocate the requested block +if it is free, otherwise it +allocates a free block of the requested size that is +rotationally closest to the requested block. +If the global layout policies had complete information, +they could always request unused blocks and +the allocation routines would be reduced to simple bookkeeping. +However, maintaining complete information is costly; +thus the implementation of the global layout policy +uses heuristics that employ only partial information. +.PP +If a requested block is not available, the local allocator uses +a four level allocation strategy: +.IP 1) +Use the next available block rotationally closest +to the requested block on the same cylinder. It is assumed +here that head switching time is zero. 
On disk +controllers where this is not the case, it may be possible +to incorporate the time required to switch between disk platters +when constructing the rotational layout tables. This, however, +has not yet been tried. +.IP 2) +If there are no blocks available on the same cylinder, +use a block within the same cylinder group. +.IP 3) +If that cylinder group is entirely full, +quadratically hash the cylinder group number to choose +another cylinder group to look for a free block. +.IP 4) +Finally if the hash fails, apply an exhaustive search +to all cylinder groups. +.PP +Quadratic hash is used because of its speed in finding +unused slots in nearly full hash tables [Knuth75]. +File systems that are parameterized to maintain at least +10% free space rarely use this strategy. +File systems that are run without maintaining any free +space typically have so few free blocks that almost any +allocation is random; +the most important characteristic of +the strategy used under such conditions is that the strategy be fast. +.ds RH Performance +.sp 2 +.ne 1i diff --git a/share/doc/smm/05.fastfs/4.t b/share/doc/smm/05.fastfs/4.t new file mode 100644 index 0000000..15e3923 --- /dev/null +++ b/share/doc/smm/05.fastfs/4.t @@ -0,0 +1,252 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)4.t 8.1 (Berkeley) 6/8/93 +.\" +.ds RH Performance +.NH +Performance +.PP +Ultimately, the proof of the effectiveness of the +algorithms described in the previous section +is the long term performance of the new file system. +.PP +Our empirical studies have shown that the inode layout policy has +been effective. +When running the ``list directory'' command on a large directory +that itself contains many directories (to force the system +to access inodes in multiple cylinder groups), +the number of disk accesses for inodes is cut by a factor of two. +The improvements are even more dramatic for large directories +containing only files, +disk accesses for inodes being cut by a factor of eight. 
+This is most encouraging for programs such as spooling daemons that +access many small files, +since these programs tend to flood the +disk request queue on the old file system. +.PP +Table 2 summarizes the measured throughput of the new file system. +Several comments need to be made about the conditions under which these +tests were run. +The test programs measure the rate at which user programs can transfer +data to or from a file without performing any processing on it. +These programs must read and write enough data to +insure that buffering in the +operating system does not affect the results. +They are also run at least three times in succession; +the first to get the system into a known state +and the second two to insure that the +experiment has stabilized and is repeatable. +The tests used and their results are +discussed in detail in [Kridle83]\(dg. +.FS +\(dg A UNIX command that is similar to the reading test that we used is +``cp file /dev/null'', where ``file'' is eight megabytes long. +.FE +The systems were running multi-user but were otherwise quiescent. +There was no contention for either the CPU or the disk arm. +The only difference between the UNIBUS and MASSBUS tests +was the controller. +All tests used an AMPEX Capricorn 330 megabyte Winchester disk. +As Table 2 shows, all file system test runs were on a VAX 11/750. +All file systems had been in production use for at least +a month before being measured. +The same number of system calls were performed in all tests; +the basic system call overhead was a negligible portion of +the total running time of the tests. +.KF +.DS B +.TS +box; +c c|c s s +c c|c c c. 
+Type of Processor and Read +File System Bus Measured Speed Bandwidth % CPU +_ +old 1024 750/UNIBUS 29 Kbytes/sec 29/983 3% 11% +new 4096/1024 750/UNIBUS 221 Kbytes/sec 221/983 22% 43% +new 8192/1024 750/UNIBUS 233 Kbytes/sec 233/983 24% 29% +new 4096/1024 750/MASSBUS 466 Kbytes/sec 466/983 47% 73% +new 8192/1024 750/MASSBUS 466 Kbytes/sec 466/983 47% 54% +.TE +.ce 1 +Table 2a \- Reading rates of the old and new UNIX file systems. +.TS +box; +c c|c s s +c c|c c c. +Type of Processor and Write +File System Bus Measured Speed Bandwidth % CPU +_ +old 1024 750/UNIBUS 48 Kbytes/sec 48/983 5% 29% +new 4096/1024 750/UNIBUS 142 Kbytes/sec 142/983 14% 43% +new 8192/1024 750/UNIBUS 215 Kbytes/sec 215/983 22% 46% +new 4096/1024 750/MASSBUS 323 Kbytes/sec 323/983 33% 94% +new 8192/1024 750/MASSBUS 466 Kbytes/sec 466/983 47% 95% +.TE +.ce 1 +Table 2b \- Writing rates of the old and new UNIX file systems. +.DE +.KE +.PP +Unlike the old file system, +the transfer rates for the new file system do not +appear to change over time. +The throughput rate is tied much more strongly to the +amount of free space that is maintained. +The measurements in Table 2 were based on a file system +with a 10% free space reserve. +Synthetic work loads suggest that throughput deteriorates +to about half the rates given in Table 2 when the file +systems are full. +.PP +The percentage of bandwidth given in Table 2 is a measure +of the effective utilization of the disk by the file system. +An upper bound on the transfer rate from the disk is calculated +by multiplying the number of bytes on a track by the number +of revolutions of the disk per second. +The bandwidth is calculated by comparing the data rates +the file system is able to achieve as a percentage of this rate. +Using this metric, the old file system is only +able to use about 3\-5% of the disk bandwidth, +while the new file system uses up to 47% +of the bandwidth. 
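The bandwidth calculation described above can be checked with a small sketch. The track geometry is an assumption, not a figure the paper gives for the test disk: 32 sectors of 512 bytes per track and 3600 revolutions per minute are values the paper mentions elsewhere as typical, and together they reproduce the 983 Kbytes/sec denominator used in Table 2 if a Kbyte is taken as 1000 bytes. The function names are hypothetical.

```python
def max_transfer_rate(bytes_per_track, revs_per_second):
    """Upper bound on the disk transfer rate, in bytes per second."""
    return bytes_per_track * revs_per_second


def bandwidth_percent(measured_kbytes, upper_kbytes):
    """Measured rate as a whole-number percentage of the upper bound."""
    return round(100 * measured_kbytes / upper_kbytes)


# Assumed geometry: 32 sectors * 512 bytes per track, 3600 rpm.
ASSUMED_BYTES_PER_TRACK = 32 * 512
ASSUMED_REVS_PER_SECOND = 3600 // 60

upper_bytes = max_transfer_rate(ASSUMED_BYTES_PER_TRACK,
                                ASSUMED_REVS_PER_SECOND)
upper_kbytes = upper_bytes // 1000  # assumes 1 Kbyte = 1000 bytes

# Measured read rates from Table 2a, in Kbytes/sec.
for label, rate in [("old 1024, UNIBUS", 29),
                    ("new 4096/1024, UNIBUS", 221),
                    ("new 8192/1024, MASSBUS", 466)]:
    print(label, bandwidth_percent(rate, upper_kbytes), "%")
```

Under these assumptions the percentages agree with the Bandwidth column of Table 2.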
+.PP +Both reads and writes are faster in the new system than in the old system. +The biggest factor in this speedup is the larger +block size used by the new file system. +The overhead of allocating blocks in the new system is greater +than the overhead of allocating blocks in the old system; +however, fewer blocks need to be allocated in the new system +because they are bigger. +The net effect is that the cost per byte allocated is about +the same for both systems. +.PP +In the new file system, the reading rate is always at least +as fast as the writing rate. +This is to be expected since the kernel must do more work when +allocating blocks than when simply reading them. +Note that the write rates are about the same +as the read rates in the 8192 byte block file system; +the write rates are slower than the read rates in the 4096 byte block +file system. +The slower write rates occur because +the kernel has to do twice as many disk allocations per second, +making the processor unable to keep up with the disk transfer rate. +.PP +In contrast, the old file system is about 50% +faster at writing files than reading them. +This is because the write system call is asynchronous and +the kernel can generate disk transfer +requests much faster than they can be serviced; +hence disk transfers queue up in the disk buffer cache. +Because the disk buffer cache is sorted by minimum seek distance, +the average seek between the scheduled disk writes is much +less than it would be if the data blocks were written out +in the random disk order in which they are generated. +However, when the file is read, +the read system call is processed synchronously so +the disk blocks must be retrieved from the disk in the +non-optimal seek order in which they are requested. +This forces the disk scheduler to do long +seeks resulting in a lower throughput rate. +.PP +In the new system the blocks of a file are more optimally +ordered on the disk.
+Even though reads are still synchronous, +the requests are presented to the disk in a much better order. +Even though the writes are still asynchronous, +they are already presented to the disk in minimum seek +order so there is no gain to be had by reordering them. +Hence the disk seek latencies that limited the old file system +have little effect in the new file system. +The cost of allocation is the factor in the new system that +causes writes to be slower than reads. +.PP +The performance of the new file system is currently +limited by memory to memory copy operations +required to move data from disk buffers in the +system's address space to data buffers in the user's +address space. These copy operations account for +about 40% of the time spent performing an input/output operation. +If the buffers in both address spaces were properly aligned, +this transfer could be performed without copying by +using the VAX virtual memory management hardware. +This would be especially desirable when transferring +large amounts of data. +We did not implement this because it would change the +user interface to the file system in two major ways: +user programs would be required to allocate buffers on page boundaries, +and data would disappear from buffers after being written. +.PP +Greater disk throughput could be achieved by rewriting the disk drivers +to chain together kernel buffers. +This would allow contiguous disk blocks to be read +in a single disk transaction. +Many disks used with UNIX systems contain either +32 or 48 512 byte sectors per track. +Each track holds exactly two or three 8192 byte file system blocks, +or four or six 4096 byte file system blocks. +The inability to use contiguous disk blocks +effectively limits the performance +on these disks to less than 50% of the available bandwidth. 
+If the next block for a file cannot be laid out contiguously, +then the minimum spacing to the next allocatable +block on any platter is between a sixth and a half of a revolution. +The implication of this is that the best possible layout without +contiguous blocks uses only half of the bandwidth of any given track. +If each track contains an odd number of sectors, +then it is possible to resolve the rotational delay to any number of sectors +by finding a block that begins at the desired +rotational position on another track. +Block chaining has not been implemented because it +would require rewriting all the disk drivers in the system, +and the current throughput rates are already limited by the +speed of the available processors. +.PP +Currently only one block is allocated to a file at a time. +A technique used by the DEMOS file system, +when it finds that a file is growing rapidly, +is to preallocate several blocks at once, +releasing them when the file is closed if they remain unused. +By batching up allocations, the system can reduce the +overhead of allocating at each write, +and it can cut down on the number of disk writes needed to +keep the block pointers on the disk +synchronized with the block allocation [Powell79]. +This technique was not included because block allocation +currently accounts for less than 10% of the time spent in +a write system call and, once again, the +current throughput rates are already limited by the speed +of the available processors. +.ds RH Functional enhancements +.sp 2 +.ne 1i diff --git a/share/doc/smm/05.fastfs/5.t b/share/doc/smm/05.fastfs/5.t new file mode 100644 index 0000000..96d721a --- /dev/null +++ b/share/doc/smm/05.fastfs/5.t @@ -0,0 +1,293 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved.
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)5.t 8.1 (Berkeley) 6/8/93 +.\" +.ds RH Functional enhancements +.NH +File system functional enhancements +.PP +The performance enhancements to the +UNIX file system did not require +any changes to the semantics or +data structures visible to application programs. +However, several changes had been generally desired for some +time but had not been introduced because they would require users to +dump and restore all their file systems. +Since the new file system already +required all existing file systems to +be dumped and restored, +these functional enhancements were introduced at this time. +.NH 2 +Long file names +.PP +File names can now be of nearly arbitrary length. +Only programs that read directories are affected by this change. +To promote portability to UNIX systems that +are not running the new file system, a set of directory +access routines have been introduced to provide a consistent +interface to directories on both old and new systems. +.PP +Directories are allocated in 512 byte units called chunks. +This size is chosen so that each allocation can be transferred +to disk in a single operation. +Chunks are broken up into variable length records termed +directory entries. A directory entry +contains the information necessary to map the name of a +file to its associated inode. +No directory entry is allowed to span multiple chunks. +The first three fields of a directory entry are fixed length +and contain: an inode number, the size of the entry, and the length +of the file name contained in the entry. +The remainder of an entry is variable length and contains +a null terminated file name, padded to a 4 byte boundary. +The maximum length of a file name in a directory is +currently 255 characters. +.PP +Available space in a directory is recorded by having +one or more entries accumulate the free space in their +entry size fields. This results in directory entries +that are larger than required to hold the +entry name plus fixed length fields. 
Space allocated +to a directory should always be completely accounted for +by totaling up the sizes of its entries. +When an entry is deleted from a directory, +its space is returned to a previous entry +in the same directory chunk by increasing the size of the +previous entry by the size of the deleted entry. +If the first entry of a directory chunk is free, then +the entry's inode number is set to zero to indicate +that it is unallocated. +.NH 2 +File locking +.PP +The old file system had no provision for locking files. +Processes that needed to synchronize the updates of a +file had to use a separate ``lock'' file. +A process would try to create a ``lock'' file. +If the creation succeeded, then the process +could proceed with its update; +if the creation failed, then the process would wait and try again. +This mechanism had three drawbacks. +Processes consumed CPU time by looping over attempts to create locks. +Locks left lying around because of system crashes had +to be manually removed (normally in a system startup command script). +Finally, processes running as system administrator +are always permitted to create files, +so they were forced to use a different mechanism. +While it is possible to get around all these problems, +the solutions are not straightforward, +so a mechanism for locking files has been added. +.PP +The most general schemes allow multiple processes +to concurrently update a file. +Several of these techniques are discussed in [Peterson83]. +A simpler technique is to serialize access to a file with locks. +To attain reasonable efficiency, +certain applications require the ability to lock pieces of a file. +Locking down to the byte level has been implemented in the +Onyx file system by [Bass81]. +However, for the standard system applications, +a mechanism that locks at the granularity of a file is sufficient. +.PP +Locking schemes fall into two classes: +those using hard locks and those using advisory locks.
+The primary difference between advisory locks and hard locks is the +extent of enforcement. +A hard lock is always enforced when a program tries to +access a file; +an advisory lock is only applied when it is requested by a program. +Thus advisory locks are only effective when all programs accessing +a file use the locking scheme. +With hard locks there must be some override +policy implemented in the kernel. +With advisory locks the policy is left to the user programs. +In the UNIX system, programs with system administrator +privilege are allowed to override any protection scheme. +Because many of the programs that need to use locks must +also run as the system administrator, +we chose to implement advisory locks rather than +create an additional protection scheme that was inconsistent +with the UNIX philosophy or could +not be used by system administration programs. +.PP +The file locking facilities allow cooperating programs to apply +advisory +.I shared +or +.I exclusive +locks on files. +Only one process may have an exclusive +lock on a file while multiple shared locks may be present. +Shared and exclusive locks cannot both be present on +a file at the same time. +If any lock is requested when +another process holds an exclusive lock, +or an exclusive lock is requested when another process holds any lock, +the lock request will block until the lock can be obtained. +Because shared and exclusive locks are advisory only, +even if a process has obtained a lock on a file, +another process may access the file. +.PP +Locks are applied or removed only on open files. +This means that locks can be manipulated without +needing to close and reopen a file. +This is useful, for example, when a process wishes +to apply a shared lock, read some information +and determine whether an update is required, then +apply an exclusive lock and update the file. +.PP +A request for a lock will cause a process to block if the lock +cannot be immediately obtained.
+In certain instances this is unsatisfactory. +For example, a process that +wants only to check if a lock is present would require a separate +mechanism to find out this information. +Consequently, a process may specify that its locking +request should return with an error if a lock can not be immediately +obtained. +Being able to conditionally request a lock +is useful to ``daemon'' processes +that wish to service a spooling area. +If the first instance of the +daemon locks the directory where spooling takes place, +later daemon processes can +easily check to see if an active daemon exists. +Since locks exist only while the locking processes exist, +lock files can never be left active after +the processes exit or if the system crashes. +.PP +Almost no deadlock detection is attempted. +The only deadlock detection done by the system is that the file +to which a lock is applied must not already have a +lock of the same type (i.e. the second of two successive calls +to apply a lock of the same type will fail). +.NH 2 +Symbolic links +.PP +The traditional UNIX file system allows multiple +directory entries in the same file system +to reference a single file. Each directory entry +``links'' a file's name to an inode and its contents. +The link concept is fundamental; +inodes do not reside in directories, but exist separately and +are referenced by links. +When all the links to an inode are removed, +the inode is deallocated. +This style of referencing an inode does +not allow references across physical file +systems, nor does it support inter-machine linkage. +To avoid these limitations +.I "symbolic links" +similar to the scheme used by Multics [Feiertag71] have been added. +.PP +A symbolic link is implemented as a file that contains a pathname. 
+When the system encounters a symbolic link while +interpreting a component of a pathname, +the contents of the symbolic link are prepended to the rest +of the pathname, and this name is interpreted to yield the +resulting pathname. +In UNIX, pathnames are specified relative to the root +of the file system hierarchy, or relative to a process's +current working directory. Pathnames specified relative +to the root are called absolute pathnames. Pathnames +specified relative to the current working directory are +termed relative pathnames. +If a symbolic link contains an absolute pathname, +the absolute pathname is used; +otherwise the contents of the symbolic link are evaluated +relative to the location of the link in the file hierarchy. +.PP +Normally programs do not want to be aware that there is a +symbolic link in a pathname that they are using. +However, certain system utilities +must be able to detect and manipulate symbolic links. +Three new system calls provide the ability to detect, read, and write +symbolic links; seven system utilities required changes +to use these calls. +.PP +In future Berkeley software distributions +it may be possible to reference file systems located on +remote machines using pathnames. When this occurs, +it will be possible to create symbolic links that span machines. +.NH 2 +Rename +.PP +Programs that create a new version of an existing +file typically create the +new version as a temporary file and then rename the temporary file +with the name of the target file. +In the old UNIX file system renaming required three calls to the system. +If a program were interrupted or the system crashed between these calls, +the target file could be left with only its temporary name. +To eliminate this possibility the \fIrename\fP system call +has been added. The rename call does the rename operation +in a fashion that guarantees the existence of the target name. +.PP +Rename works both on data files and directories. 
+When renaming directories, +the system must do special validation checks to ensure +that the directory tree structure is not corrupted by the creation +of loops or inaccessible directories. +Such corruption would occur if a parent directory were moved +into one of its descendants. +The validation check requires tracing the descendants of the target +directory to ensure that it does not include the directory being moved. +.NH 2 +Quotas +.PP +The UNIX system has traditionally attempted to share all available +resources to the greatest extent possible. +Thus any single user can allocate all the available space +in the file system. +In certain environments this is unacceptable. +Consequently, a quota mechanism has been added for restricting the +amount of file system resources that a user can obtain. +The quota mechanism sets limits on both the number of inodes +and the number of disk blocks that a user may allocate. +A separate quota can be set for each user on each file system. +Resources are given both a hard and a soft limit. +When a program exceeds a soft limit, +a warning is printed on the user's terminal; +the offending program is not terminated +unless it exceeds its hard limit. +The idea is that users should stay below their soft limit between +login sessions, +but they may use more resources while they are actively working. +To encourage this behavior, +users are warned when logging in if they are over +any of their soft limits. +If users fail to correct the problem for too many login sessions, +they are eventually reprimanded by having their soft limit +enforced as their hard limit. +.ds RH Acknowledgements +.sp 2 +.ne 1i diff --git a/share/doc/smm/05.fastfs/6.t b/share/doc/smm/05.fastfs/6.t new file mode 100644 index 0000000..40be6aa --- /dev/null +++ b/share/doc/smm/05.fastfs/6.t @@ -0,0 +1,159 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)6.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.ds RH Acknowledgements +.SH +\s+2Acknowledgements\s0 +.PP +We thank Robert Elz for his ongoing interest in the new file system, +and for adding disk quotas in a rational and efficient manner. +We also acknowledge Dennis Ritchie for his suggestions +on the appropriate modifications to the user interface. +We appreciate Michael Powell's explanations on how +the DEMOS file system worked; +many of his ideas were used in this implementation. +Special commendation goes to Peter Kessler and Robert Henry for acting +like real users during the early debugging stage when file systems were +less stable than they should have been. +The criticisms and suggestions by the reviewers contributed significantly +to the coherence of the paper. +Finally we thank our sponsors, +the National Science Foundation under grant MCS80-05144, +and the Defense Advanced Research Projects Agency (DoD) under +ARPA Order No. 4031 monitored by Naval Electronic System Command under +Contract No. N00039-82-C-0235. +.ds RH References +.nr H2 1 +.sp 2 +.SH +\s+2References\s0 +.LP +.IP [Almes78] 20 +Almes, G., and Robertson, G. +"An Extensible File System for Hydra" +Proceedings of the Third International Conference on Software Engineering, +IEEE, May 1978. +.IP [Bass81] 20 +Bass, J. +"Implementation Description for File Locking", +Onyx Systems Inc, 73 E. Trimble Rd, San Jose, CA 95131 +Jan 1981. +.IP [Feiertag71] 20 +Feiertag, R. J. and Organick, E. I., +"The Multics Input-Output System", +Proceedings of the Third Symposium on Operating Systems Principles, +ACM, Oct 1971. pp 35-41 +.IP [Ferrin82a] 20 +Ferrin, T.E., +"Performance and Robustness Improvements in Version 7 UNIX", +Computer Graphics Laboratory Technical Report 2, +School of Pharmacy, University of California, +San Francisco, January 1982. +Presented at the 1982 Winter Usenix Conference, Santa Monica, California. 
+.IP [Ferrin82b] 20 +Ferrin, T.E., +"Performance Issues of VMUNIX Revisited", +;login: (The Usenix Association Newsletter), Vol 7, #5, November 1982. pp 3-6 +.IP [Kridle83] 20 +Kridle, R., and McKusick, M., +"Performance Effects of Disk Subsystem Choices for +VAX Systems Running 4.2BSD UNIX", +Computer Systems Research Group, Dept of EECS, Berkeley, CA 94720, +Technical Report #8. +.IP [Kowalski78] 20 +Kowalski, T. +"FSCK - The UNIX System Check Program", +Bell Laboratory, Murray Hill, NJ 07974. March 1978 +.IP [Knuth75] 20 +Knuth, D. +"The Art of Computer Programming", +Volume 3 - Sorting and Searching, +Addison-Wesley Publishing Company Inc, Reading, Mass, 1975. pp 506-549 +.IP [Maruyama76] 20 +Maruyama, K., and Smith, S. +"Optimal reorganization of Distributed Space Disk Files", +CACM, 19, 11. Nov 1976. pp 634-642 +.IP [Nevalainen77] 20 +Nevalainen, O., Vesterinen, M. +"Determining Blocking Factors for Sequential Files by Heuristic Methods", +The Computer Journal, 20, 3. Aug 1977. pp 245-247 +.IP [Pechura83] 20 +Pechura, M., and Schoeffler, J. +"Estimating File Access Time of Floppy Disks", +CACM, 26, 10. Oct 1983. pp 754-763 +.IP [Peterson83] 20 +Peterson, G. +"Concurrent Reading While Writing", +ACM Transactions on Programming Languages and Systems, +ACM, 5, 1. Jan 1983. pp 46-55 +.IP [Powell79] 20 +Powell, M. +"The DEMOS File System", +Proceedings of the Sixth Symposium on Operating Systems Principles, +ACM, Nov 1977. pp 33-42 +.IP [Ritchie74] 20 +Ritchie, D. M. and Thompson, K., +"The UNIX Time-Sharing System", +CACM 17, 7. July 1974. pp 365-375 +.IP [Smith81a] 20 +Smith, A. +"Input/Output Optimization and Disk Architectures: A Survey", +Performance and Evaluation 1. Jan 1981. pp 104-117 +.IP [Smith81b] 20 +Smith, A. +"Bibliography on File and I/O System Optimization and Related Topics", +Operating Systems Review, 15, 4. Oct 1981. pp 39-54 +.IP [Symbolics81] 20 +"Symbolics File System", +Symbolics Inc, 9600 DeSoto Ave, Chatsworth, CA 91311 +Aug 1981. 
+.IP [Thompson78] 20 +Thompson, K. +"UNIX Implementation", +Bell System Technical Journal, 57, 6, part 2. pp 1931-1946 +July-August 1978. +.IP [Thompson80] 20 +Thompson, M. +"Spice File System", +Carnegie-Mellon University, +Department of Computer Science, Pittsburgh, PA 15213 +#CMU-CS-80, Sept 1980. +.IP [Trivedi80] 20 +Trivedi, K. +"Optimal Selection of CPU Speed, Device Capabilities, and File Assignments", +Journal of the ACM, 27, 3. July 1980. pp 457-473 +.IP [White80] 20 +White, R. M. +"Disk Storage Technology", +Scientific American, 243(2), August 1980. diff --git a/share/doc/smm/05.fastfs/Makefile b/share/doc/smm/05.fastfs/Makefile new file mode 100644 index 0000000..d63aae2 --- /dev/null +++ b/share/doc/smm/05.fastfs/Makefile @@ -0,0 +1,10 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= smm/05.fastfs +SRCS= 0.t 1.t 2.t 3.t 4.t 5.t 6.t +MACROS= -ms +USE_TBL= +USE_EQN= + +.include <bsd.doc.mk> diff --git a/share/doc/smm/06.nfs/0.t b/share/doc/smm/06.nfs/0.t new file mode 100644 index 0000000..4d77f56 --- /dev/null +++ b/share/doc/smm/06.nfs/0.t @@ -0,0 +1,75 @@ +.\" Copyright (c) 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" This document is derived from software contributed to Berkeley by +.\" Rick Macklem at The University of Guelph. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. 
All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)0.t 8.1 (Berkeley) 6/8/93 +.\" +.(l C +.sz 14 +.b "The 4.4BSD NFS Implementation" +.sp +.sz 10 +Rick Macklem +.i "University of Guelph" +.)l +.sp 2 +.ce 1 +.sz 12 +.b "ABSTRACT" +.eh 'SMM:06-%''The 4.4BSD NFS Implementation' +.oh 'The 4.4BSD NFS Implementation''SMM:06-%' +.pp +The 4.4BSD implementation of the Network File System (NFS)\** is +intended to interoperate with +.(f +\**Network File System (NFS) is believed to be a registered trademark of +Sun Microsystems Inc. +.)f +other NFS Version 2 Protocol (RFC1094) implementations but also +allows use of an alternate protocol that is hoped to provide better +performance in certain environments. +This paper will informally discuss these various protocol features and +their use. 
+There is a brief overview of the implementation followed +by several sections on various problem areas related to NFS +and some hints on how to deal with them. +.pp +Not Quite NFS (NQNFS) is an NFS like protocol designed to maintain full cache +consistency between clients in a crash tolerant manner. It is an adaptation +of the NFS protocol such that the server supports both NFS +and NQNFS clients while maintaining full consistency between the server and +NQNFS clients. +It borrows heavily from work done on Spritely-NFS [Srinivasan89], but uses +Leases [Gray89] to avoid the need to recover server state information +after a crash. +.sp diff --git a/share/doc/smm/06.nfs/1.t b/share/doc/smm/06.nfs/1.t new file mode 100644 index 0000000..96415da --- /dev/null +++ b/share/doc/smm/06.nfs/1.t @@ -0,0 +1,555 @@ +.\" Copyright (c) 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" This document is derived from software contributed to Berkeley by +.\" Rick Macklem at The University of Guelph. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.t 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.\" +.sh 1 "NFS Implementation" +.pp +The 4.4BSD implementation of NFS and the alternate protocol nicknamed +Not Quite NFS (NQNFS) are kernel resident, but make use of a few system +daemons. +The kernel implementation does not use an RPC library, handling the RPC +request and reply messages directly in \fImbuf\fR data areas. NFS +interfaces to the network using +sockets via. the kernel interface available in +\fIsys/kern/uipc_syscalls.c\fR as \fIsosend(), soreceive(),\fR... +There are connection management routines for support of sockets for connection +oriented protocols and timeout/retransmit support for datagram sockets on +the client side. +For connection oriented transport protocols, +such as TCP/IP, there is one connection +for each client to server mount point that is maintained until an umount. +If the connection breaks, the client will attempt a reconnect with a new +socket. +The client side can operate without any daemons running, but performance +will be improved by running nfsiod daemons that perform read-aheads +and write-behinds. 
+For the server side to function, the daemons portmap, mountd and +nfsd must be running. +The mountd daemon performs two important functions. +.ip 1) +Upon startup and after a hangup signal, mountd reads the exports +file and pushes the export information for each local file system down +into the kernel via. the mount system call. +.ip 2) +Mountd handles remote mount protocol (RFC1094, Appendix A) requests. +.lp +The nfsd master daemon forks off children that enter the kernel +via. the nfssvc system call. The children normally remain kernel +resident, providing a process context for the NFS RPC servers. +Meanwhile, the master nfsd waits to accept new connections from clients +using connection oriented transport protocols and passes the new sockets down +into the kernel. +The client side mount_nfs along with portmap and +mountd are the only parts of the NFS subsystem that make any +use of the Sun RPC library. +.sh 1 "Mount Problems" +.pp +There are several problems that can be encountered at the time of an NFS +mount, ranging from an unresponsive NFS server (crashed, network partitioned +from client, etc.) to various interoperability problems between different +NFS implementations. +.pp +On the server side, +if the 4.4BSD NFS server will be handling any PC clients, mountd will +require the \fB-n\fR option to enable non-root mount request servicing. +Running of a pcnfsd\** daemon will also be necessary. +.(f +\** Pcnfsd is available in source form from Sun Microsystems and many +anonymous ftp sites. +.)f +The server side requires that the daemons +mountd and nfsd be running and that +they be registered with portmap properly. +If problems are encountered, +the safest fix is to kill all the daemons and then restart them in +the order portmap, mountd and nfsd. +Other server side problems are normally caused by problems with the format +of the exports file, which is covered under +Security and in the exports man page. 
+.pp +On the client side, there are several mount options useful for dealing +with server problems. +In cases where a file system is not critical for system operation, the +\fB-b\fR +mount option may be specified so that mount_nfs will go into the +background for a mount attempt on an unresponsive server. +This is useful for mounts specified in +\fIfstab(5)\fR, +so that the system will not get hung while booting doing +\fBmount -a\fR +because a file server is not responsive. +On the other hand, if the file system is critical to system operation, this +option should not be used so that the client will wait for the server to +come up before completing bootstrapping. +There are also three mount options to help deal with interoperability issues +with various non-BSD NFS servers. The +\fB-P\fR +option specifies that the NFS +client use a reserved IP port number to satisfy some servers' security +requirements.\** +.(f +\**Any security benefit of this is highly questionable and as +such the BSD server does not require a client to use a reserved port number. +.)f +The +\fB-c\fR +option stops the NFS client from doing a \fIconnect\fR on the UDP +socket, so that the mount works with servers that send NFS replies from +port numbers other than the standard 2049.\** +.(f +\**The Encore Multimax is known +to require this. +.)f +Finally, the +\fB-g=\fInum\fR +option sets the maximum size of the group list in the credentials passed +to an NFS server in every RPC request. Although RFC1057 specifies a maximum +size of 16 for the group list, some servers can't handle that many. +If a user, particularly root doing a mount, +keeps getting access denied from a file server, try temporarily +reducing the number of groups that user is in to less than 5 +by editing /etc/group. If the user can then access the file system, slowly +increase the number of groups for that user until the limit is found and +then peg the limit there with the +\fB-g=\fInum\fR +option. 
+This implies that the server will only see the first \fInum\fR +groups that the user is in, which can cause some accessibility problems. +.pp +For sites that have many NFS servers, amd [Pendry93] +is a useful administration tool. +It also reduces the number of actual NFS mount points, alleviating problems +with commands such as df(1) that hang when any of the NFS servers is +unreachable. +.sh 1 "Dealing with Hung Servers" +.pp +There are several mount options available to help a client deal with +being hung waiting for response from a crashed or unreachable\** server. +.(f +\**Due to a network partitioning or similar. +.)f +By default, a hard mount will continue to try to contact the server +``forever'' to complete the system call. This type of mount is appropriate +when processes on the client that access files in the file system do not +tolerate file I/O system calls that return -1 with \fIerrno == EINTR\fR +and/or access to the file system is critical for normal system operation. +.lp +There are two other alternatives: +.ip 1) +A soft mount (\fB-s\fR option) retries an RPC \fIn\fR +times and then the corresponding +system call returns -1 with errno set to EINTR. +For TCP transport, the actual RPC request is not retransmitted, but the +timeout intervals waiting for a reply from the server are done +in the same manner as UDP for this purpose. +The problem with this type of mount is that most applications do not +expect an EINTR error return from file I/O system calls (since it never +occurs for a local file system) and get confused by the error return +from the I/O system call. +The option +\fB-x=\fInum\fR +is used to set the RPC retry limit and if set too low, the error returns +will start occurring whenever the NFS server is slow due to heavy load. +Alternatively, a large retry limit can result in a process hung for a long +time, due to a crashed server or network partitioning. 
+.ip 2) +An interruptible mount (\fB-i\fR option) checks to see if a termination signal +is pending for the process when waiting for server response and if it is, +the I/O system call posts an EINTR. Normally this results in the process +being terminated by the signal when returning from the system call. +This feature allows you to ``^C'' out of processes that are hung +due to unresponsive servers. +The problem with this approach is that signals that are caught by +a process are not recognized as termination signals +and the process will remain hung.\** +.(f +\**Unfortunately, there are also some resource allocation situations in the +BSD kernel where the termination signal will be ignored and the process +will not terminate. +.)f +.sh 1 "RPC Transport Issues" +.pp +The NFS Version 2 protocol runs over UDP/IP transport by +sending each Sun Remote Procedure Call (RFC1057) +request/reply message in a single UDP +datagram. Since UDP does not guarantee datagram delivery, the +Remote Procedure Call (RPC) layer +times out and retransmits an RPC request if +no RPC reply has been received. Since this round trip timeout (RTO) value +is for the entire RPC operation, including RPC message transmission to the +server, queuing at the server for an nfsd, performing the RPC and +sending the RPC reply message back to the client, it can be highly variable +for even a moderately loaded NFS server. +As a result, the RTO interval must be a conservative (large) estimate, in +order to avoid extraneous RPC request retransmits.\** +.(f +\**At best, an extraneous RPC request retransmit increases +the load on the server and at worst can result in damaged files +on the server when non-idempotent RPCs are redone [Juszczak89]. 
+.)f +Also, with an 8Kbyte read/write data size +(the default), the read/write reply/request will be an 8+Kbyte UDP datagram +that must normally be fragmented at the IP layer for transmission.\** +.(f +\**6 IP fragments for an Ethernet, +which has a maximum transmission unit of 1500 bytes. +.)f +For IP fragments to be successfully reassembled into +the IP datagram at the receive end, all +fragments must be received within a fairly short ``time to live''. +If one fragment is lost/damaged in transit, +the entire RPC must be retransmitted and redone. +This problem can be exacerbated by a network interface on the receiver that +cannot handle the reception of back to back network packets. [Kent87a] +.pp +There are several tuning mount +options on the client side that can prove useful when trying to +alleviate performance problems related to UDP RPC transport. +The options +\fB-r=\fInum\fR +and +\fB-w=\fInum\fR +specify the maximum read or write data size respectively. +The size \fInum\fR +should be a power of 2 (4K, 2K, 1K) and adjusted downward from the +maximum of 8Kbytes +whenever IP fragmentation is causing problems. The best indicator of +IP fragmentation problems is a significant number of +\fIfragments dropped after timeout\fR +reported by the \fIip:\fR section of a \fBnetstat -s\fR +command on either the client or server. +Of course, if the fragments are being dropped at the server, it can be +fun figuring out which client(s) are involved. +The most likely candidates are clients that are not +on the same local area network as the +server or have network interfaces that do not receive several +back to back network packets properly. +.pp +By default, the 4.4BSD NFS client dynamically estimates the retransmit +timeout interval for the RPC and this appears to work reasonably well for +many environments. 
However, the +\fB-d\fR +flag can be specified to turn off +the dynamic estimation of retransmit timeout, so that the client will +use a static initial timeout interval.\** +.(f +\**After the first retransmit timeout, the initial interval is backed off +exponentially. +.)f +The +\fB-t=\fInum\fR +option can be used with +\fB-d\fR +to set the initial timeout interval to other than the default of 2 seconds. +The best indicator that dynamic estimation should be turned off would +be a significant number\** in the \fIX Replies\fR field and a +.(f +\**Even 0.1% of the total RPCs is probably significant. +.)f +large number in the \fIRetries\fR field +in the \fIRpc Info:\fR section as reported +by the \fBnfsstat\fR command. +On the server, there would be significant numbers of \fIInprog\fR recent +request cache hits in the \fIServer Cache Stats:\fR section as reported +by the \fBnfsstat\fR command, when run on the server. +.pp +The tradeoff is that a smaller timeout interval results in a better +average RPC response time, but increases the risk of extraneous retries +that in turn increase server load and the possibility of damaged files +on the server. It is probably best to err on the safe side and use a large +(>= 2sec) fixed timeout if the dynamic retransmit timeout estimation +seems to be causing problems. +.pp +An alternative to all this fiddling is to run NFS over TCP transport instead +of UDP. +Since the 4.4BSD TCP implementation provides reliable +delivery with congestion control, it avoids all of the above problems. +It also permits the use of read and write data sizes greater than the 8Kbyte +limit for UDP transport.\** +.(f +\**Read/write data sizes greater than 8Kbytes will not normally improve +performance unless the kernel constant MAXBSIZE is increased and the +file system on the server has a block size greater than 8Kbytes. 
+.)f +NFS over TCP usually delivers comparable to significantly better performance +than NFS over UDP +unless the client or server processor runs at less than 5-10MIPS. For a +slow processor, the extra CPU overhead of using TCP transport will become +significant and TCP transport may only be useful when the client +to server interconnect traverses congested gateways. +The main problem with using TCP transport is that it is only supported +between BSD clients and servers.\** +.(f +\**There are rumors of commercial NFS over TCP implementations on the horizon +and these may well be worth exploring. +.)f +.sh 1 "Other Tuning Tricks" +.pp +Another mount option that may improve performance over +certain network interconnects is \fB-a=\fInum\fR +which sets the number of blocks that the system will +attempt to read-ahead during sequential reading of a file. The default value +of 1 seems to be appropriate for most situations, but a larger value might +achieve better performance for some environments, such as a mount to a server +across a ``high bandwidth * round trip delay'' interconnect. +.pp +For the adventurous, playing with the size of the buffer cache +can also improve performance for some environments that use NFS heavily. +Under some workloads, a buffer cache of 4-6Mbytes can result in significant +performance improvements over 1-2Mbytes, both in client side system call +response time and reduced server RPC load. +The buffer cache size defaults to 10% of physical memory, +but this can be overridden by specifying the BUFPAGES option +in the machine's config file.\** +.(f +BUFPAGES is the number of physical machine pages allocated to the buffer cache. +ie. BUFPAGES * NBPG = buffer cache size in bytes +.)f +When increasing the size of BUFPAGES, it is also advisable to increase the +number of buffers NBUF by a corresponding amount. 
+Note that there is a tradeoff of memory allocated to the buffer cache versus +available for paging, which implies that making the buffer cache larger +will increase paging rate, with possibly disastrous results. +.sh 1 "Security Issues" +.pp +When a machine is running an NFS server it opens up a great big security hole. +For ordinary NFS, the server receives client credentials +in the RPC request as a user id +and a list of group ids and trusts them to be authentic! +The only tool available for restricting remote access to +file systems is the exports(5) file, +so file systems should be exported with great care. +The exports file is read by mountd upon startup and after a hangup signal +is posted for it and then as much of the access specifications as possible are +pushed down into the kernel for use by the nfsd(s). +The trick here is that the kernel information is stored on a per +local file system mount point and client host address basis and cannot refer to +individual directories within the local server file system. +It is best to think of the exports file as referring to the various local +file systems and not just directory paths as mount points. +A local file system may be exported to a specific host, all hosts that +match a subnet mask or all other hosts (the world). The latter is very +dangerous and should only be used for public information. It is also +strongly recommended that file systems exported to ``the world'' be exported +read-only. +For each host or group of hosts, the file system can be exported read-only or +read/write. +You can also define one of three client user id to server credential +mappings to help control access. +Root (user id == 0) can be mapped to some default credentials while all other +user ids are accepted as given. +If the default credentials for user id equal zero +are root, then there is essentially no remapping. 
+Most NFS file systems are exported this way, most commonly mapping +user id == 0 to the credentials for the user nobody. +Since the client user id and group id list is used unchanged on the server +(except for root), this also implies that +the user id and group id space must be common between the client and server. +(ie. user id N on the client must refer to the same user on the server) +All user ids can be mapped to a default set of credentials, typically that of +the user nobody. This essentially gives world access to all +users on the corresponding hosts. +.pp +As well as the standard NFS Version 2 protocol (RFC1094) implementation, BSD +systems can use a variant of the protocol called Not Quite NFS (NQNFS) that +supports a variety of protocol extensions. +This protocol uses 64bit file offsets +and sizes, an \fIaccess rpc\fR, an \fIappend\fR option on the write rpc +and extended file attributes to support 4.4BSD file system functionality +more fully. +It also makes use of a variant of short term +\fIleases\fR [Gray89] with delayed write client caching, +in an effort to provide full cache consistency and better performance. +This protocol is available between 4.4BSD systems only and is used when +the \fB-q\fR mount option is specified. +It can be used with any of the aforementioned options for NFS, such as TCP +transport (\fB-T\fR). +Although this protocol is experimental, it is recommended over NFS for +mounts between 4.4BSD systems.\** +.(f +\**I would appreciate email from anyone who can provide +NFS vs. NQNFS performance measurements, +particularly fast clients, many clients or over an internetwork +connection with a large ``bandwidth * RTT'' product. +.)f +.sh 1 "Monitoring NFS Activity" +.pp +The basic command for monitoring NFS activity on clients and servers is +nfsstat. It reports cumulative statistics of various NFS activities, +such as counts of the various different RPCs and cache hit rates on the client +and server. 
Of particular interest on the server are the fields in the +\fIServer Cache Stats:\fR section, which gives numbers for RPC retries received +in the first three fields and total RPCs in the fourth. The first three fields +should remain a very small percentage of the total. If not, it +would indicate one or more clients doing retries too aggressively and the fix +would be to isolate these clients, +disable the dynamic RTO estimation on them and +make their initial timeout interval a conservative (ie. large) value. +.pp +On the client side, the fields in the \fIRpc Info:\fR section are of particular +interest, as they give an overall picture of NFS activity. +The \fITimedOut\fR field is the number of I/O system calls that returned -1 +for ``soft'' mounts and can be reduced +by increasing the retry limit or changing +the mount type to ``intr'' or ``hard''. +The \fIInvalid\fR field is a count of trashed RPC replies that are received +and should remain zero.\** +.(f +\**Some NFS implementations run with UDP checksums disabled, so garbage RPC +messages can be received. +.)f +The \fIX Replies\fR field counts the number of repeated RPC replies received +from the server and is a clear indication of a too aggressive RTO estimate. +Unfortunately, a good NFS server implementation will use a ``recent request +cache'' [Juszczak89] that will suppress the extraneous replies. +A large value for \fIRetries\fR indicates a problem, but +it could be any of: +.ip \(bu +a too aggressive RTO estimate +.ip \(bu +an overloaded NFS server +.ip \(bu +IP fragments being dropped (gateway, client or server) +.lp +and requires further investigation. +The \fIRequests\fR field is the total count of RPCs done on all servers. +.pp +The \fBnetstat -s\fR comes in useful during investigation of RPC transport +problems. 
+The field \fIfragments dropped after timeout\fR in +the \fIip:\fR section indicates IP fragments are +being lost and a significant number of these occurring indicates that the +use of TCP transport or a smaller read/write data size is in order. +A significant number of \fIbad checksums\fR reported in the \fIudp:\fR +section would suggest network problems of a more generic sort. +(cabling, transceiver or network hardware interface problems or similar) +.pp +There is a RPC activity logging facility for both the client and +server side in the kernel. +When logging is enabled by setting the kernel variable nfsrtton to +one, the logs in the kernel structures nfsrtt (for the client side) +and nfsdrt (for the server side) are updated upon the completion +of each RPC in a circular manner. +The pos element of the structure is the index of the next element +of the log array to be updated. +In other words, elements of the log array from \fIlog\fR[pos] to +\fIlog\fR[pos - 1] are in chronological order. +The include file <sys/nfsrtt.h> should be consulted for details on the +fields in the two log structures.\** +.(f +\**Unfortunately, a monitoring tool that uses these logs is still in the +planning (dreaming) stage. +.)f +.sh 1 "Diskless Client Support" +.pp +The NFS client does include kernel support for diskless/dataless operation +where the root file system and optionally the swap area is remote NFS mounted. +A diskless/dataless client is configured using a version of the +``swapkernel.c'' file as provided in the directory \fIcontrib/diskless.nfs\fR. +If the swap device == NODEV, it specifies an NFS mounted swap area and should +be configured the same size as set up by diskless_setup when run on the server. +This file must be put in the \fIsys/compile/<machine_name>\fR kernel build +directory after the config command has been run, since config does +not know about specifying NFS root and swap areas. 
+The kernel variable mountroot must be set to nfs_mountroot instead of +ffs_mountroot and the kernel structure nfs_diskless must be filled in +properly. +There are some primitive system administration tools in the \fIcontrib/diskless.nfs\fR directory to assist in filling in +the nfs_diskless structure and in setting up an NFS server for +diskless/dataless clients. +The tools were designed to provide a bare bones capability, to allow maximum +flexibility when setting up different servers. +.lp +The tools are as follows: +.ip \(bu +diskless_offset.c - This little program reads a ``kernel'' object file and +writes the file byte offset of the nfs_diskless structure in it to +standard out. It was kept separate because it sometimes has to +be compiled/linked in funny ways depending on the client architecture. +(See the comment at the beginning of it.) +.ip \(bu +diskless_setup.c - This program is run on the server and sets up files for a +given client. It mostly just fills in an nfs_diskless structure and +writes it out to either the "kernel" file or a separate file called +/var/diskless/setup.<official-hostname> +.ip \(bu +diskless_boot.c - There are two functions in here that may be used +by a bootstrap server such as tftpd to permit sharing of the ``kernel'' +object file for similar clients. This saves disk space on the bootstrap +server and simplifies organization, but is not critical for correct operation. +They read the ``kernel'' +file, but optionally fill in the nfs_diskless structure from a +separate "setup.<official-hostname>" file so that there is only +one copy of "kernel" for all similar (same arch etc.) clients. +These functions use a text file called +/var/diskless/boot.<official-hostname> to control the netboot.
+.lp +The basic setup steps are: +.ip \(bu +make a "kernel" for the client(s) with mountroot() == nfs_mountroot() +and swdevt[0].sw_dev == NODEV if it is to do nfs swapping as well +(See the same swapkernel.c file) +.ip \(bu +run diskless_offset on the kernel file to find out the byte offset +of the nfs_diskless structure +.ip \(bu +Run diskless_setup on the server to set up the server and fill in the +nfs_diskless structure for that client. +The nfs_diskless structure can either be written into the +kernel file (the -x option) or +saved in /var/diskless/setup.<official-hostname>. +.ip \(bu +Set up the bootstrap server. If the nfs_diskless structure was written into +the ``kernel'' file, any vanilla bootstrap protocol such as bootp/tftp can +be used. If the bootstrap server has been modified to use the functions in +diskless_boot.c, then a +file called /var/diskless/boot.<official-hostname> +must be created. +It is simply a two line text file, where the first line is the pathname +of the correct ``kernel'' file and the second line has the pathname of +the nfs_diskless structure file and its byte offset in it. +For example: +.br + /var/diskless/kernel.pmax +.br + /var/diskless/setup.rickers.cis.uoguelph.ca 642308 +.br +.ip \(bu +Create a /var subtree for each client in an appropriate place on the server, +such as /var/diskless/var/<client-hostname>/... +By using the <client-hostname> to differentiate /var for each host, +/etc/rc can be modified to mount the correct /var from the server. diff --git a/share/doc/smm/06.nfs/2.t b/share/doc/smm/06.nfs/2.t new file mode 100644 index 0000000..85e2896 --- /dev/null +++ b/share/doc/smm/06.nfs/2.t @@ -0,0 +1,532 @@ +.\" Copyright (c) 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" This document is derived from software contributed to Berkeley by +.\" Rick Macklem at The University of Guelph. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)2.t 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.\" +.sh 1 "Not Quite NFS, Crash Tolerant Cache Consistency for NFS" +.pp +Not Quite NFS (NQNFS) is an NFS like protocol designed to maintain full cache +consistency between clients in a crash tolerant manner. +It is an adaptation of the NFS protocol such that the server supports both NFS +and NQNFS clients while maintaining full consistency between the server and +NQNFS clients. +This section borrows heavily from work done on Spritely-NFS [Srinivasan89], +but uses Leases [Gray89] to avoid the need to recover server state information +after a crash. +The reader is strongly encouraged to read these references before +trying to grasp the material presented here. +.sh 2 "Overview" +.pp +The protocol maintains cache consistency by using a somewhat +Sprite [Nelson88] like protocol, +but is based on short term leases\** instead of hard state information +about open files. +.(f +\** A lease is a ticket permitting an activity that is +valid until some expiry time. +.)f +The basic principle is that the protocol will disable client caching of a +file whenever that file is write shared\**. +.(f +\** Write sharing occurs when at least one client is modifying a file while +other client(s) are reading the file. +.)f +Whenever a client wishes to cache data for a file it must hold a valid lease. +There are three types of leases: read caching, write caching and non-caching. +The latter type requires that all file operations be done synchronously with +the server via RPCs. +A read caching lease allows for client data caching, but no file modifications +may be done. +A write caching lease allows for client caching of writes, +but requires that all writes be pushed to the server when the lease expires. +If a client has dirty buffers\** +.(f +\** Cached write data is not yet pushed (written) to the server.
+.)f +when a write cache lease has almost expired, it will attempt to +extend the lease but is required to push the dirty buffers if extension fails. +A client gets leases by either doing a \fBGetLease RPC\fR or by piggybacking +a \fBGetLease Request\fR onto another RPC. Piggybacking is supported for the +frequent RPCs Getattr, Setattr, Lookup, Readlink, Read, Write and Readdir +in an effort to minimize the number of \fBGetLease RPCs\fR required. +All leases are at the granularity of a file, since all NFS RPCs operate on +individual files and NFS has no intrinsic notion of a file hierarchy. +Directories, symbolic links and file attributes may be read cached but +are not write cached. +The exception here is the attribute file_size, which is updated during cached +writing on the client to reflect a growing file. +.pp +It is the server's responsibility to ensure that consistency is maintained +among the NQNFS clients by disabling client caching whenever a server file +operation would cause inconsistencies. +The possibility of inconsistencies occurs whenever a client has +a write caching lease and any other client, +or local operations on the server, +tries to access the file or when +a modify operation is attempted on a file being read cached by client(s). +At this time, the server sends an \fBeviction notice\fR to all clients holding +the lease and then waits for lease termination. +Lease termination occurs when a \fBvacated the premises\fR message has been +received from all the clients that have signed the lease or when the lease +expires via timeout. +The message pair \fBeviction notice\fR and \fBvacated the premises\fR roughly +correspond to a Sprite server\(->client callback, but are not implemented as an +actual RPC, to avoid the server waiting indefinitely for a reply from a dead +client. +.pp +Server consistency checking can be viewed as issuing intrinsic leases for a +file operation for the duration of the operation only.
For example, the +\fBCreate RPC\fR will get an intrinsic write lease on the directory in which +the file is being created, disabling client read caches for that directory. +.pp +By relegating this responsibility to the server, consistency between the +server and NQNFS clients is maintained when NFS clients are modifying the +file system as well.\** +.(f +\** The NFS clients will continue to be \fIapproximately\fR consistent with +the server. +.)f +.pp +The leases are issued as time intervals to avoid the requirement of time of day +clock synchronization. There are three important time constants known to +the server. The \fBmaximum_lease_term\fR sets an upper bound on lease duration. +The \fBclock_skew\fR is added to all lease terms on the server to correct for +differing clock speeds between the client and server and \fBwrite_slack\fR is +the number of seconds the server is willing to wait for a client with +an expired write caching lease to push dirty writes. +.pp +The server maintains a \fBmodify_revision\fR number for each file. It is +defined as an unsigned quadword integer that is never zero and that must +increase whenever the corresponding file is modified on the server. +It is used +by the client to determine whether or not cached data for the file is +stale. +Generating this value is easier said than done. The current implementation +uses the following technique, which is believed to be adequate. +The high order longword is stored in the ufs inode and is initialized to one +when an inode is first allocated. +The low order longword is stored in main memory only and is initialized to +zero when an inode is read in from disk. +When the file is modified for the first time within a given second of +wall clock time, the high order longword is incremented by one and +the low order longword reset to zero. +For subsequent modifications within the same second of wall clock +time, the low order longword is incremented. 
If the low order longword wraps +around to zero, the high order longword is incremented again. +Since the high order longword only increments once per second and the inode +is pushed to disk frequently during file modification, this implies +0 \(<= Current\(miDisk \(<= 5. +When the inode is read in from disk, 10 +is added to the high order longword, which ensures that the quadword +is greater than any value it could have had before a crash. +This introduces apparent modifications every time the inode falls out of +the LRU inode cache, but this should only reduce the client caching performance +by a (hopefully) small margin. +.sh 2 "Crash Recovery and other Failure Scenarios" +.pp +The server must maintain the state of all the current leases held by clients. +The nice thing about short term leases is that maximum_lease_term seconds +after the server stops issuing leases, there are no current leases left. +As such, server crash recovery does not require any state recovery. After +rebooting, the server refuses to service any RPCs except for writes until +write_slack seconds after the last lease would have expired\**. +.(f +\** The last lease expiry time may be safely estimated as +"boottime+maximum_lease_term+clock_skew" for machines that cannot store +it in nonvolatile RAM. +.)f +By then, the server would not have any outstanding leases to recover the +state of and the clients have had at least write_slack seconds to push dirty +writes to the server and get the server sync'd up to date. After this, the +server simply services requests in a manner similar to NFS. +In an effort to minimize the effect of "recovery storms" [Baker91], +the server replies \fBtry_again_later\fR to the RPCs it is not +yet ready to service. +.pp +After a client crashes, the server may have to wait for a lease to timeout +before servicing a request if write sharing of a file with a cachable lease +on the client is about to occur. 
+As for the client, it simply starts up getting any leases it now needs. Any +outstanding leases for that client on the server prior to the crash will either be renewed or expire +via timeout. +.pp +Certain network partitioning failures are more problematic. If a client to +server network connection is severed just before a write caching lease expires, +the client cannot push the dirty writes to the server. After the lease expires +on the server, the server permits other clients to access the file with the +potential of getting stale data. Unfortunately I believe this failure scenario +is intrinsic in any delayed write caching scheme unless the server is required to +wait \fBforever\fR for a client to regain contact\**. +.(f +\** Gray and Cheriton avoid this problem by using a \fBwrite through\fR policy. +.)f +Since the write caching lease has expired on the client, +it will sync up with the +server as soon as the network connection has been re-established. +.pp +There is another failure condition that can occur when the server is congested. +The worst case scenario would have the client pushing dirty writes to the server +but a large request queue on the server delays these writes for more than +\fBwrite_slack\fR seconds. It is hoped that a congestion control scheme using +the \fBtry_again_later\fR RPC reply after booting combined with +the following lease termination rule for write caching leases +can minimize the risk of this occurrence. +A write caching lease is only terminated on the server when there have +been no writes to the file and the server has not been overloaded during +the previous write_slack seconds. The condition that the server has not been +overloaded is approximated by a test for sleeping nfsd(s) at the end of the write_slack +period. +.sh 2 "Server Disk Full" +.pp +There is a serious unresolved problem for delayed write caching with respect to +server disk space allocation.
+When the disk on the file server is full, delayed write RPCs can fail +due to "out of space". +For NFS, this occurrence results in an error return from the close system +call on the file, since the dirty blocks are pushed on close. +Processes writing important files can check for this error return +to ensure that the file was written successfully. +For NQNFS, the dirty blocks are not pushed on close and as such the client +may not attempt the write RPC until after the process has done the close +which implies no error return from the close. +For the current prototype, +the only solution is to modify programs writing important +file(s) to call fsync and check for an error return from it instead of close. +.sh 2 "Protocol Details" +.pp +The protocol specification is identical to that of NFS [Sun89] except for +the following changes. +.ip \(bu +RPC Information +.(l + Program Number 300105 + Version Number 1 +.)l +.ip \(bu +Readdir_and_Lookup RPC +.(l + struct readdirlookargs { + fhandle file; + nfscookie cookie; + unsigned count; + unsigned duration; + }; + + struct entry { + unsigned cachable; + unsigned duration; + modifyrev rev; + fhandle entry_fh; + nqnfs_fattr entry_attrib; + unsigned fileid; + filename name; + nfscookie cookie; + entry *nextentry; + }; + + union readdirlookres switch (stat status) { + case NFS_OK: + struct { + entry *entries; + bool eof; + } readdirlookok; + default: + void; + }; + + readdirlookres + NQNFSPROC_READDIRLOOK(readdirlookargs) = 18; +.)l +Reads entries in a directory in a manner analogous to the NFSPROC_READDIR RPC +in NFS, but returns the file handle and attributes of each entry as well. +This allows the attribute and lookup caches to be primed. 
+.ip \(bu +Get Lease RPC +.(l + struct getleaseargs { + fhandle file; + cachetype readwrite; + unsigned duration; + }; + + union getleaseres switch (stat status) { + case NFS_OK: + bool cachable; + unsigned duration; + modifyrev rev; + nqnfs_fattr attributes; + default: + void; + }; + + getleaseres + NQNFSPROC_GETLEASE(getleaseargs) = 19; +.)l +Gets a lease for "file" valid for "duration" seconds from when the lease +was issued on the server\**. +.(f +\** To be safe, the client may only assume that the lease is valid +for ``duration'' seconds from when the RPC request was sent to the server. +.)f +The lease permits client caching if "cachable" is true. +The modify revision level and attributes for the file are also returned. +.ip \(bu +Eviction Message +.(l + void + NQNFSPROC_EVICTED (fhandle) = 21; +.)l +This message is sent from the server to the client. When the client receives +the message, it should flush data associated with the file represented by +"fhandle" from its caches and then send the \fBVacated Message\fR back to +the server. Flushing includes pushing any dirty writes via write RPCs. +.ip \(bu +Vacated Message +.(l + void + NQNFSPROC_VACATED (fhandle) = 20; +.)l +This message is sent from the client to the server in response to the +\fBEviction Message\fR. See above. +.ip \(bu +Access RPC +.(l + struct accessargs { + fhandle file; + bool read_access; + bool write_access; + bool exec_access; + }; + + stat + NQNFSPROC_ACCESS(accessargs) = 22; +.)l +The access RPC does permission checking on the server for the given type +of access required by the client for the file. +Use of this RPC avoids accessibility problems caused by client->server uid +mapping. +.ip \(bu +Piggybacked Get Lease Request +.pp +The piggybacked get lease request is functionally equivalent to the Get Lease +RPC except that it is attached to one of the other NQNFS RPC requests as follows.
+A getleaserequest is prepended to all of the request arguments for NQNFS +and a getleaserequestres is inserted in all NFS result structures just after +the "stat" field only if "stat == NFS_OK". +.(l + union getleaserequest switch (cachetype type) { + case NQLREAD: + case NQLWRITE: + unsigned duration; + default: + void; + }; + + union getleaserequestres switch (cachetype type) { + case NQLREAD: + case NQLWRITE: + bool cachable; + unsigned duration; + modifyrev rev; + default: + void; + }; +.)l +The get lease request applies to the file that the attached RPC operates on +and the file attributes remain in the same location as for the NFS RPC reply +structure. +.ip \(bu +Three additional "stat" values +.pp +Three additional values have been added to the enumerated type "stat". +.(l + NQNFS_EXPIRED=500 + NQNFS_TRYLATER=501 + NQNFS_AUTHERR=502 +.)l +The "expired" value indicates that a lease has expired. +The "try later" +value is returned by the server when it wishes the client to retry the +RPC request after a short delay. It is used during crash recovery (Section 2) +and may also be useful for server congestion control. +The "authentication error" value is returned for kerberized mount points to +indicate that there is no cached authentication mapping and a Kerberos ticket +for the principal is required. +.sh 2 "Data Types" +.ip \(bu +cachetype +.(l + enum cachetype { + NQLNONE = 0, + NQLREAD = 1, + NQLWRITE = 2 + }; +.)l +Type of lease requested. NQLNONE is used to indicate no piggybacked lease +request. +.ip \(bu +modifyrev +.(l + typedef unsigned hyper modifyrev; +.)l +The "modifyrev" is an unsigned quadword integer value that is never zero +and increases every time the corresponding file is modified on the server. +.ip \(bu +nqnfs_time +.(l + struct nqnfs_time { + unsigned seconds; + unsigned nano_seconds; + }; +.)l +For NQNFS, times are handled at nanosecond resolution instead of the microsecond +resolution used for NFS.
+.ip \(bu +nqnfs_fattr +.(l + struct nqnfs_fattr { + ftype type; + unsigned mode; + unsigned nlink; + unsigned uid; + unsigned gid; + unsigned hyper size; + unsigned blocksize; + unsigned rdev; + unsigned hyper bytes; + unsigned fsid; + unsigned fileid; + nqnfs_time atime; + nqnfs_time mtime; + nqnfs_time ctime; + unsigned flags; + unsigned generation; + modifyrev rev; + }; +.)l +The nqnfs_fattr structure is modified from the NFS fattr so that it stores +the file size as a 64bit quantity and the storage occupied as a 64bit number +of bytes. It also has fields added for the 4.4BSD va_flags and va_gen fields +as well as the file's modify rev level. +.ip \(bu +nqnfs_sattr +.(l + struct nqnfs_sattr { + unsigned mode; + unsigned uid; + unsigned gid; + unsigned hyper size; + nqnfs_time atime; + nqnfs_time mtime; + unsigned flags; + unsigned rdev; + }; +.)l +The nqnfs_sattr structure is modified from the NFS sattr structure in the +same manner as fattr. +.lp +The arguments to several of the NFS RPCs have been modified as well. Mostly, +these are minor changes to use 64bit file offsets or similar. The modified +argument structures follow. +.ip \(bu +Lookup RPC +.(l + struct lookup_diropargs { + unsigned duration; + fhandle dir; + filename name; + }; + + union lookup_diropres switch (stat status) { + case NFS_OK: + struct { + union getleaserequestres lookup_lease; + fhandle file; + nqnfs_fattr attributes; + } lookup_diropok; + default: + void; + }; + +.)l +The additional "duration" argument tells the server to get a lease for the +name being looked up if it is non-zero and the lease is specified +in "lookup_lease". +.ip \(bu +Read RPC +.(l + struct nqnfs_readargs { + fhandle file; + unsigned hyper offset; + unsigned count; + }; +.)l +.ip \(bu +Write RPC +.(l + struct nqnfs_writeargs { + fhandle file; + unsigned hyper offset; + bool append; + nfsdata data; + }; +.)l +The "append" argument is true for append-only write operations.
+.ip \(bu +Get Filesystem Attributes RPC +.(l + union nqnfs_statfsres switch (stat status) { + case NFS_OK: + struct { + unsigned tsize; + unsigned bsize; + unsigned blocks; + unsigned bfree; + unsigned bavail; + unsigned files; + unsigned files_free; + } info; + default: + void; + }; +.)l +The "files" field is the number of files in the file system and the "files_free" +is the number of additional files that can be created. +.sh 1 "Summary" +.pp +The configuration and tuning of an NFS environment tends to be a bit of a +mystic art, but hopefully this paper along with the man pages and other +reading will be helpful. Good Luck. diff --git a/share/doc/smm/06.nfs/Makefile b/share/doc/smm/06.nfs/Makefile new file mode 100644 index 0000000..99f0762 --- /dev/null +++ b/share/doc/smm/06.nfs/Makefile @@ -0,0 +1,8 @@ +# @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= smm/06.nfs +SRCS= 0.t 1.t 2.t ref.t +MACROS= -me + +.include <bsd.doc.mk> diff --git a/share/doc/smm/06.nfs/ref.t b/share/doc/smm/06.nfs/ref.t new file mode 100644 index 0000000..039363b --- /dev/null +++ b/share/doc/smm/06.nfs/ref.t @@ -0,0 +1,123 @@ +.\" Copyright (c) 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" This document is derived from software contributed to Berkeley by +.\" Rick Macklem at The University of Guelph. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3.
All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)ref.t 8.1 (Berkeley) 6/8/93 +.\" +.sh 1 "Bibliography" +.ip [Baker91] 16 +Mary Baker and John Ousterhout, Availability in the Sprite Distributed +File System, In \fIOperating System Review\fR, (25)2, pg. 95-98, +April 1991. +.ip [Baker91a] 16 +Mary Baker, Private Email Communication, May 1991. +.ip [Burrows88] 16 +Michael Burrows, Efficient Data Sharing, Technical Report #153, +Computer Laboratory, University of Cambridge, Dec. 1988. +.ip [Gray89] 16 +Cary G. Gray and David R. Cheriton, Leases: An Efficient Fault-Tolerant +Mechanism for Distributed File Cache Consistency, In \fIProc. of the +Twelfth ACM Symposium on Operating Systems Principles\fR, Litchfield Park, +AZ, Dec. 1989. +.ip [Howard88] 16 +John H. Howard, Michael L.
Kazar, Sherri G. Menees, David A. Nichols, +M. Satyanarayanan, Robert N. Sidebotham and Michael J. West, +Scale and Performance in a Distributed File System, \fIACM Trans. on +Computer Systems\fR, (6)1, pg. 51-81, Feb. 1988. +.ip [Juszczak89] 16 +Chet Juszczak, Improving the Performance and Correctness of an NFS Server, +In \fIProc. Winter 1989 USENIX Conference\fR, pg. 53-63, San Diego, CA, January 1989. +.ip [Keith90] 16 +Bruce E. Keith, Perspectives on NFS File Server Performance Characterization, +In \fIProc. Summer 1990 USENIX Conference\fR, pg. 267-277, Anaheim, CA, +June 1990. +.ip [Kent87] 16 +Christopher A. Kent, \fICache Coherence in Distributed Systems\fR, +Research Report 87/4, +Digital Equipment Corporation Western Research Laboratory, April 1987. +.ip [Kent87a] 16 +Christopher A. Kent and Jeffrey C. Mogul, +\fIFragmentation Considered Harmful\fR, Research Report 87/3, +Digital Equipment Corporation Western Research Laboratory, Dec. 1987. +.ip [Macklem91] 16 +Rick Macklem, Lessons Learned Tuning the 4.3BSD Reno Implementation of the +NFS Protocol, In \fIProc. Winter USENIX Conference\fR, pg. 53-64, +Dallas, TX, January 1991. +.ip [Nelson88] 16 +Michael N. Nelson, Brent B. Welch, and John K. Ousterhout, Caching in the +Sprite Network File System, \fIACM Transactions on Computer Systems\fR, (6)1, +pg. 134-154, February 1988. +.ip [Nowicki89] 16 +Bill Nowicki, Transport Issues in the Network File System, In +\fIComputer Communication Review\fR, pg. 16-20, Vol. 19, Number 2, April 1989. +.ip [Ousterhout90] 16 +John K. Ousterhout, Why Aren't Operating Systems Getting Faster As Fast as +Hardware? In \fIProc. Summer 1990 USENIX Conference\fR, pg. 247-256, Anaheim, +CA, June 1990. +.ip [Pendry93] 16 +Jan-Simon Pendry, 4.4 BSD Automounter Reference Manual, In +\fIsrc/usr.sbin/amd/doc directory of 4.4 BSD distribution tape\fR. +.ip [Reid90] 16 +Jim Reid, N(e)FS: the Protocol is the Problem, In +\fIProc. 
Summer 1990 UKUUG Conference\fR, +London, England, July 1990. +.ip [Sandberg85] 16 +Russel Sandberg, David Goldberg, Steve Kleiman, Dan Walsh, and Bob Lyon, +Design and Implementation of the Sun Network Filesystem, In \fIProc. Summer +1985 USENIX Conference\fR, pages 119-130, Portland, OR, June 1985. +.ip [Schroeder85] 16 +Michael D. Schroeder, David K. Gifford and Roger M. Needham, A Caching +File System For A Programmer's Workstation, In \fIProc. of the Tenth +ACM Symposium on Operating Systems Principles\fR, pg. 25-34, Orcas Island, +WA, Dec. 1985. +.ip [Srinivasan89] 16 +V. Srinivasan and Jeffrey C. Mogul, \fISpritely NFS: Implementation and +Performance of Cache-Consistency Protocols\fR, Research Report 89/5, +Digital Equipment Corporation Western Research Laboratory, May 1989. +.ip [Steiner88] 16 +Jennifer G. Steiner, Clifford Neuman and Jeffrey I. Schiller, +Kerberos: An Authentication Service for Open Network Systems, In +\fIProc. Winter 1988 USENIX Conference\fR, Dallas, TX, February 1988. +.ip [Stern] 16 +Hal Stern, \fIManaging NFS and NIS\fR, O'Reilly and Associates, +ISBN 0-937175-75-7. +.ip [Sun87] 16 +Sun Microsystems Inc., \fIXDR: External Data Representation Standard\fR, +RFC1014, Network Information Center, SRI International, June 1987. +.ip [Sun88] 16 +Sun Microsystems Inc., \fIRPC: Remote Procedure Call Protocol Specification Version 2\fR, +RFC1057, Network Information Center, SRI International, June 1988. +.ip [Sun89] 16 +Sun Microsystems Inc., \fINFS: Network File System Protocol Specification\fR, +ARPANET Working Group Requests for Comment, DDN Network Information Center, +SRI International, Menlo Park, CA, March 1989, RFC-1094. 
diff --git a/share/doc/smm/08.sendmailop/Makefile b/share/doc/smm/08.sendmailop/Makefile new file mode 100644 index 0000000..482ed79 --- /dev/null +++ b/share/doc/smm/08.sendmailop/Makefile @@ -0,0 +1,11 @@ +# From: @(#)Makefile 8.2 (Berkeley) 2/28/94 +# $FreeBSD$ + +VOLUME= smm/08.sendmailop +SRCS= op.me +MACROS= -me +USE_PIC= +USE_EQN= +SRCDIR= ${.CURDIR}/../../../../contrib/sendmail/doc/op + +.include <bsd.doc.mk> diff --git a/share/doc/smm/11.timedop/Makefile b/share/doc/smm/11.timedop/Makefile new file mode 100644 index 0000000..ad09e78 --- /dev/null +++ b/share/doc/smm/11.timedop/Makefile @@ -0,0 +1,9 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= smm/11.timedop +SRCS= timed.ms +MACROS= -ms +SRCDIR= ${.CURDIR}/../../../../usr.sbin/timed/SMM.doc/timedop + +.include <bsd.doc.mk> diff --git a/share/doc/smm/12.timed/Makefile b/share/doc/smm/12.timed/Makefile new file mode 100644 index 0000000..1d9ed5c --- /dev/null +++ b/share/doc/smm/12.timed/Makefile @@ -0,0 +1,12 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= smm/12.timed +SRCS= timed.ms +EXTRA= date loop time unused +MACROS= -ms +USE_SOELIM= +USE_TBL= +SRCDIR= ${.CURDIR}/../../../../usr.sbin/timed/SMM.doc/timed + +.include <bsd.doc.mk> diff --git a/share/doc/smm/18.net/0.t b/share/doc/smm/18.net/0.t new file mode 100644 index 0000000..d16e56f --- /dev/null +++ b/share/doc/smm/18.net/0.t @@ -0,0 +1,184 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)0.t 8.1 (Berkeley) 6/10/93 +.\" +.de IR +\fI\\$1\fP\\$2 +.. +.if n .ND +.TL +Networking Implementation Notes +.br +4.4BSD Edition +.AU +Samuel J. Leffler, William N. Joy, Robert S. Fabry, and Michael J. Karels +.AI +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California, Berkeley +Berkeley, CA 94720 +.AB +.FS +* UNIX is a trademark of Bell Laboratories. 
+.FE +This report describes the internal structure of the +networking facilities developed for the 4.4BSD version +of the UNIX* operating system +for the VAX\(dg. These facilities +.FS +\(dg DEC, VAX, DECnet, and UNIBUS are trademarks of +Digital Equipment Corporation. +.FE +are based on several central abstractions which +structure the external (user) view of network communication +as well as the internal (system) implementation. +.PP +The report documents the internal structure of the networking system. +The ``Berkeley Software Architecture Manual, 4.4BSD Edition'' (PSD:5) +provides a description of the user interface to the networking facilities. +.sp +.LP +Revised June 10, 1993 +.AE +.LP +.\".de PT +.\".lt \\n(LLu +.\".pc % +.\".nr PN \\n% +.\".tl '\\*(LH'\\*(CH'\\*(RH' +.\".lt \\n(.lu +.\".. +.\".ds RH Contents +.OH 'Networking Implementation Notes''SMM:18-%' +.EH 'SMM:18-%''Networking Implementation Notes' +.bp +.ce +.B "TABLE OF CONTENTS" +.LP +.sp 1 +.nf +.B "1. Introduction" +.LP +.sp .5v +.nf +.B "2. Overview" +.LP +.sp .5v +.nf +.B "3. Goals +.LP +.sp .5v +.nf +.B "4. Internal address representation" +.LP +.sp .5v +.nf +.B "5. Memory management" +.LP +.sp .5v +.nf +.B "6. Internal layering +6.1. Socket layer +6.1.1. Socket state +6.1.2. Socket data queues +6.1.3. Socket connection queuing +6.2. Protocol layer(s) +6.3. Network-interface layer +6.3.1. UNIBUS interfaces +.LP +.sp .5v +.nf +.B "7. Socket/protocol interface" +.LP +.sp .5v +.nf +.B "8. Protocol/protocol interface" +8.1. pr_output +8.2. pr_input +8.3. pr_ctlinput +8.4. pr_ctloutput +.LP +.sp .5v +.nf +.B "9. Protocol/network-interface interface" +9.1. Packet transmission +9.2. Packet reception +.LP +.sp .5v +.nf +.B "10. Gateways and routing issues +10.1. Routing tables +10.2. Routing table interface +10.3. User level routing policies +.LP +.sp .5v +.nf +.B "11. Raw sockets" +11.1. Control blocks +11.2. Input processing +11.3. Output processing +.LP +.sp .5v +.nf +.B "12. 
Buffering and congestion control" +12.1. Memory management +12.2. Protocol buffering policies +12.3. Queue limiting +12.4. Packet forwarding +.LP +.sp .5v +.nf +.B "13. Out of band data" +.LP +.sp .5v +.nf +.B "14. Trailer protocols" +.LP +.sp .5v +.nf +.B Acknowledgements +.LP +.sp .5v +.nf +.B References +.bp +.de _d +.if t .ta .6i 2.1i 2.6i +.\" 2.94 went to 2.6, 3.64 to 3.30 +.if n .ta .84i 2.6i 3.30i +.. +.de _f +.if t .ta .5i 1.25i 2.5i +.\" 3.5i went to 3.8i +.if n .ta .7i 1.75i 3.8i +.. diff --git a/share/doc/smm/18.net/1.t b/share/doc/smm/18.net/1.t new file mode 100644 index 0000000..ba5adb5 --- /dev/null +++ b/share/doc/smm/18.net/1.t @@ -0,0 +1,66 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)1.t 8.1 (Berkeley) 6/8/93 +.\" +.\".ds RH Introduction +.br +.ne 2i +.NH +\s+2Introduction\s0 +.PP +This report describes the internal structure of +facilities added to the +4.2BSD version of the UNIX operating system for +the VAX, +as modified in the 4.4BSD release. +The system facilities provide +a uniform user interface to networking +within UNIX. In addition, the implementation +introduces a structure for network communications which may be +used by system implementors in adding new networking +facilities. The internal structure is not visible +to the user, rather it is intended to aid implementors +of communication protocols and network services by +providing a framework which +promotes code sharing and minimizes implementation effort. +.PP +The reader is expected to be familiar with the C programming +language and system interface, as described in the +\fIBerkeley Software Architecture Manual, 4.4BSD Edition\fP [Joy86]. +Basic understanding of network +communication concepts is assumed; where required +any additional ideas are introduced. +.PP +The remainder of this document +provides a description of the system internals, +avoiding, when possible, those portions which are utilized only +by the interprocess communication facilities. 
diff --git a/share/doc/smm/18.net/2.t b/share/doc/smm/18.net/2.t new file mode 100644 index 0000000..f504889 --- /dev/null +++ b/share/doc/smm/18.net/2.t @@ -0,0 +1,85 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)2.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH Overview +.br +.ne 2i +.NH +\s+2Overview\s0 +.PP +If we consider +the International Standards Organization's (ISO) +Open System Interconnection (OSI) model of +network communication [ISO81] [Zimmermann80], +the networking facilities +described here correspond to a portion of the +session layer (layer 5) and all of the transport and +network layers (layers 4 and 3, respectively). +.PP +The network layer provides possibly imperfect +data transport services with minimal addressing +structure. +Addressing at this level is normally host to host, +with implicit or explicit routing optionally supported +by the communicating agents. +.PP +At the transport +layer the notions of reliable transfer, data sequencing, +flow control, and service addressing are normally +included. Reliability is usually managed by +explicit acknowledgement of data delivered. Failure +to acknowledge a transfer results in retransmission of +the data. Sequencing may be handled by tagging +each message handed to the network layer with a +\fIsequence number\fP and maintaining +state at the endpoints of communication to utilize +received sequence numbers in reordering data which +arrives out of order. +.PP +The session layer facilities may provide forms of +addressing which are mapped into formats required +by the transport layer, service authentication +and client authentication, etc. 
Various systems +also provide services such as data encryption and +address and protocol translation. +.PP +The following sections begin by describing some of the common +data structures and utility routines, then examine +the internal layering. The contents of each layer +and its interface are considered. Certain of the +interfaces are protocol implementation specific. For +these cases examples have been drawn from the Internet [Cerf78] +protocol family. Later sections cover routing issues, +the design of the raw socket interface and other +miscellaneous topics. diff --git a/share/doc/smm/18.net/3.t b/share/doc/smm/18.net/3.t new file mode 100644 index 0000000..1d1fddd --- /dev/null +++ b/share/doc/smm/18.net/3.t @@ -0,0 +1,59 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)3.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH Goals +.br +.ne 2i +.NH +\s+2Goals\s0 +.PP +The networking system was designed with the goal of supporting +multiple \fIprotocol families\fP and addressing styles. This required +information to be ``hidden'' in common data structures which +could be manipulated by all the pieces of the system, but which +required interpretation only by the protocols which ``controlled'' +it. The system described here attempts to restrict +the use of shared data structures to those kept by a suite of +protocols (a \fIprotocol family\fP), and those used for rendezvous +between ``synchronous'' and ``asynchronous'' portions of the +system (e.g. queues of data packets are filled at interrupt +time and emptied based on user requests). +.PP +A major goal of the system was to provide a framework within +which new protocols and hardware could easily be supported. +To this end, a great deal of effort has been expended to +create utility routines which hide many of the more +complex and/or hardware dependent chores of networking. +Later sections describe the utility routines and the underlying +data structures they manipulate. 
diff --git a/share/doc/smm/18.net/4.t b/share/doc/smm/18.net/4.t new file mode 100644 index 0000000..afa6913 --- /dev/null +++ b/share/doc/smm/18.net/4.t @@ -0,0 +1,67 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)4.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH "Address representation +.br +.ne 2i +.NH +\s+2Internal address representation\s0 +.PP +Common to all portions of the system are two data structures. +These structures are used to represent +addresses and various data objects. +Internally, addresses are described by the \fIsockaddr\fP structure, +.DS +._f +struct sockaddr { + short sa_family; /* data format identifier */ + char sa_data[14]; /* address */ +}; +.DE +All addresses belong to one or more \fIaddress families\fP +which define their format and interpretation. +The \fIsa_family\fP field indicates the address family to which the address +belongs, and the \fIsa_data\fP field contains the actual data value. +The size of the data field, 14 bytes, was selected based on a study +of current address formats.* +Specific address formats use private structure definitions +that define the format of the data field. +The system interface supports larger address structures, +although address-family-independent support facilities, for example routing +and raw socket interfaces, provide only 14 bytes for address storage. +Protocols that do not use those facilities (e.g., the current Unix domain) +may use larger data areas. +.FS +* Later versions of the system may support variable length addresses. 
+.FE diff --git a/share/doc/smm/18.net/5.t b/share/doc/smm/18.net/5.t new file mode 100644 index 0000000..d4fb8e3 --- /dev/null +++ b/share/doc/smm/18.net/5.t @@ -0,0 +1,184 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)5.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH "Memory management +.br +.ne 2i +.NH +\s+2Memory management\s0 +.PP +A single mechanism is used for data storage: memory buffers, or +\fImbuf\fP's. An mbuf is a structure of the form: +.DS +._f +struct mbuf { + struct mbuf *m_next; /* next buffer in chain */ + u_long m_off; /* offset of data */ + short m_len; /* amount of data in this mbuf */ + short m_type; /* mbuf type (accounting) */ + u_char m_dat[MLEN]; /* data storage */ + struct mbuf *m_act; /* link in higher-level mbuf list */ +}; +.DE +The \fIm_next\fP field is used to chain mbufs together on linked +lists, while the \fIm_act\fP field allows lists of mbuf chains to be +accumulated. By convention, the mbufs common to a single object +(for example, a packet) are chained together with the \fIm_next\fP +field, while groups of objects are linked via the \fIm_act\fP +field (possibly when in a queue). +.PP +Each mbuf has a small data area for storing information, \fIm_dat\fP. +The \fIm_len\fP field indicates the amount of data, while the \fIm_off\fP +field is an offset to the beginning of the data from the base of the +mbuf. 
Thus, for example, the macro \fImtod\fP, which converts a pointer +to an mbuf to a pointer to the data stored in the mbuf, has the form +.DS +._d +#define mtod(\fIx\fP,\fIt\fP) ((\fIt\fP)((int)(\fIx\fP) + (\fIx\fP)->m_off)) +.DE +(note the \fIt\fP parameter, a C type cast, which is used to cast +the resultant pointer for proper assignment). +.PP +In addition to storing data directly in the mbuf's data area, data +of page size may also be stored in a separate area of memory. +The mbuf utility routines maintain +a pool of pages for this purpose and manipulate a private page map +for such pages. +An mbuf with an external data area may be recognized by the larger +offset to the data area; +this is formalized by the macro M_HASCL(\fIm\fP), which is true +if the mbuf whose address is \fIm\fP has an external page cluster. +An array of reference counts on pages is also maintained +so that copies of pages may be made without core to core +copying (copies are created simply by duplicating the reference to the data +and incrementing the associated reference counts for the pages). +Separate data pages are currently used only +when copying data from a user process into the kernel, +and when bringing data in at the hardware level. Routines which +manipulate mbufs are not normally aware whether data is stored directly in +the mbuf data array, or if it is kept in separate pages. +.PP +The following may be used to allocate and free mbufs: +.LP +m = m_get(wait, type); +.br +MGET(m, wait, type); +.IP +The subroutine \fIm_get\fP and the macro \fIMGET\fP +each allocate an mbuf, placing its address in \fIm\fP. +The argument \fIwait\fP is either M_WAIT or M_DONTWAIT according +to whether allocation should block or fail if no mbuf is available. +The \fItype\fP is one of the predefined mbuf types for use in accounting +of mbuf allocation. +.IP "MCLGET(m);" +This macro attempts to allocate an mbuf page cluster +to associate with the mbuf \fIm\fP. 
+If successful, the length of the mbuf is set to CLSIZE, +the size of the page cluster. +.LP +n = m_free(m); +.br +MFREE(m,n); +.IP +The routine \fIm_free\fP and the macro \fIMFREE\fP +each free a single mbuf, \fIm\fP, and any associated external storage area, +placing a pointer to its successor in the chain it heads, if any, in \fIn\fP. +.IP "m_freem(m);" +This routine frees an mbuf chain headed by \fIm\fP. +.PP +The following utility routines are available for manipulating mbuf +chains: +.IP "m = m_copy(m0, off, len);" +.br +The \fIm_copy\fP routine creates a copy of all, or part, of a +list of the mbufs in \fIm0\fP. \fILen\fP bytes of data, starting +\fIoff\fP bytes from the front of the chain, are copied. +Where possible, reference counts on pages are used instead +of core to core copies. The original mbuf chain must have at +least \fIoff\fP + \fIlen\fP bytes of data. If \fIlen\fP is +specified as M_COPYALL, all the data present, offset +as before, is copied. +.IP "m_cat(m, n);" +.br +The mbuf chain, \fIn\fP, is appended to the end of \fIm\fP. +Where possible, compaction is performed. +.IP "m_adj(m, diff);" +.br +The mbuf chain, \fIm\fP, is adjusted in size by \fIdiff\fP +bytes. If \fIdiff\fP is non-negative, \fIdiff\fP bytes +are shaved off the front of the mbuf chain. If \fIdiff\fP +is negative, the alteration is performed from back to front. +No space is reclaimed in this operation; alterations are +accomplished by changing the \fIm_len\fP and \fIm_off\fP +fields of mbufs. +.IP "m = m_pullup(m0, size);" +.br +After a successful call to \fIm_pullup\fP, the mbuf at +the head of the returned list, \fIm\fP, is guaranteed +to have at least \fIsize\fP +bytes of data in contiguous memory within the data area of the mbuf +(allowing access via a pointer, obtained using the \fImtod\fP macro, +and allowing the mbuf to be located from a pointer to the data area +using \fIdtom\fP, defined below). 
+If the original data was less than \fIsize\fP bytes long, +\fIsize\fP was greater than the size of an mbuf data +area (112 bytes), or required resources were unavailable, +\fIm\fP is 0 and the original mbuf chain is deallocated. +.IP +This routine is particularly useful when verifying packet +header lengths on reception. For example, if a packet is +received and only 8 of the necessary 16 bytes required +for a valid packet header are present at the head of the list +of mbufs representing the packet, the remaining 8 bytes +may be ``pulled up'' with a single \fIm_pullup\fP call. +If the call fails the invalid packet will have been discarded. +.PP +By ensuring that mbufs always reside on 128 byte boundaries, +it is always possible to locate the mbuf associated with a data +area by masking off the low bits of the virtual address. +This allows modules to store data structures in mbufs and +pass them around without concern for locating the original +mbuf when it comes time to free the structure. +Note that this works only with objects stored in the internal data +buffer of the mbuf. +The \fIdtom\fP macro is used to convert a pointer into an mbuf's +data area to a pointer to the mbuf, +.DS +#define dtom(x) ((struct mbuf *)((int)x & ~(MSIZE-1))) +.DE +.PP +Mbufs are used for dynamically allocated data structures such as +sockets as well as memory allocated for packets and headers. Statistics are +maintained on mbuf usage and can be viewed by users using the +\fInetstat\fP\|(1) program. diff --git a/share/doc/smm/18.net/6.t b/share/doc/smm/18.net/6.t new file mode 100644 index 0000000..601988c --- /dev/null +++ b/share/doc/smm/18.net/6.t @@ -0,0 +1,664 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. 
Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)6.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH "Internal layering +.br +.ne 2i +.NH +\s+2Internal layering\s0 +.PP +The internal structure of the network system is divided into +three layers. These +layers correspond to the services provided by the socket +abstraction, those provided by the communication protocols, +and those provided by the hardware interfaces. 
The communication +protocols are normally layered into two or more individual +cooperating layers, though they are collectively viewed +in the system as one layer providing services supportive +of the appropriate socket abstraction. +.PP +The following sections describe the properties of each layer +in the system and the interfaces to which each must conform. +.NH 2 +Socket layer +.PP +The socket layer deals with the interprocess communication +facilities provided by the system. A socket is a bidirectional +endpoint of communication which is ``typed'' by the semantics +of communication it supports. The system calls described in +the \fIBerkeley Software Architecture Manual\fP [Joy86] +are used to manipulate sockets. +.PP +A socket consists of the following data structure: +.DS +._f +struct socket { + short so_type; /* generic type */ + short so_options; /* from socket call */ + short so_linger; /* time to linger while closing */ + short so_state; /* internal state flags */ + caddr_t so_pcb; /* protocol control block */ + struct protosw *so_proto; /* protocol handle */ + struct socket *so_head; /* back pointer to accept socket */ + struct socket *so_q0; /* queue of partial connections */ + short so_q0len; /* partials on so_q0 */ + struct socket *so_q; /* queue of incoming connections */ + short so_qlen; /* number of connections on so_q */ + short so_qlimit; /* max number queued connections */ + struct sockbuf so_rcv; /* receive queue */ + struct sockbuf so_snd; /* send queue */ + short so_timeo; /* connection timeout */ + u_short so_error; /* error affecting connection */ + u_short so_oobmark; /* chars to oob mark */ + short so_pgrp; /* pgrp for signals */ +}; +.DE +.PP +Each socket contains two data queues, \fIso_rcv\fP and \fIso_snd\fP, +and a pointer to routines which provide supporting services. +The type of the socket, +\fIso_type\fP is defined at socket creation time and used in selecting +those services which are appropriate to support it. 
The supporting +protocol is selected at socket creation time and recorded in +the socket data structure for later use. Protocols are defined +by a table of procedures, the \fIprotosw\fP structure, which will +be described in detail later. A pointer to a protocol-specific +data structure, +the ``protocol control block,'' is also present in the socket structure. +Protocols control this data structure, which normally includes a +back pointer to the parent socket structure to allow easy +lookup when returning information to a user +(for example, placing an error number in the \fIso_error\fP +field). The other entries in the socket structure are used in +queuing connection requests, validating user requests, storing +socket characteristics (e.g. +options supplied at the time a socket is created), and maintaining +a socket's state. +.PP +Processes ``rendezvous at a socket'' in many instances. For instance, +when a process wishes to extract data from a socket's receive queue +and it is empty, or lacks sufficient data to satisfy the request, +the process blocks, supplying the address of the receive queue as +a ``wait channel'' to be used in notification.
When data arrives +for the process and is placed in the socket's queue, the blocked +process is identified by the fact it is waiting ``on the queue.'' +.NH 3 +Socket state +.PP +A socket's state is defined from the following: +.DS +.ta \w'#define 'u +\w'SS_ISDISCONNECTING 'u +\w'0x000 'u +#define SS_NOFDREF 0x001 /* no file table ref any more */ +#define SS_ISCONNECTED 0x002 /* socket connected to a peer */ +#define SS_ISCONNECTING 0x004 /* in process of connecting to peer */ +#define SS_ISDISCONNECTING 0x008 /* in process of disconnecting */ +#define SS_CANTSENDMORE 0x010 /* can't send more data to peer */ +#define SS_CANTRCVMORE 0x020 /* can't receive more data from peer */ +#define SS_RCVATMARK 0x040 /* at mark on input */ + +#define SS_PRIV 0x080 /* privileged */ +#define SS_NBIO 0x100 /* non-blocking ops */ +#define SS_ASYNC 0x200 /* async i/o notify */ +.DE +.PP +The state of a socket is manipulated both by the protocols +and the user (through system calls). +When a socket is created, the state is defined based on the type of socket. +It may change as control actions are performed, for example connection +establishment. +It may also change according to the type of +input/output the user wishes to perform, as indicated by options +set with \fIfcntl\fP. ``Non-blocking'' I/O implies that +a process should never be blocked to await resources. Instead, any +call which would block returns prematurely +with the error EWOULDBLOCK, or the service request may be partially +fulfilled, e.g. a request for more data than is present. +.PP +If a process requested ``asynchronous'' notification of events +related to the socket, the SIGIO signal is posted to the process +when such events occur. +An event is a change in the socket's state; +examples of such occurrences are: space +becoming available in the send queue, new data available in the +receive queue, connection establishment or disestablishment, etc. 
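The interaction between the state bits and a receive request can be sketched in C as follows. This is a minimal user-space illustration, not the kernel's code: the flag values are copied from the table above, while \fImini_socket\fP, \fIrcv_action\fP, \fIrcv_decide\fP and \fIrcv_errno\fP are hypothetical names introduced only for this example, and the order of the checks is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* State bits copied from the table above. */
#define SS_ISCONNECTED  0x002   /* socket connected to a peer */
#define SS_CANTRCVMORE  0x020   /* can't receive more data from peer */
#define SS_NBIO         0x100   /* non-blocking ops */
#define SS_ASYNC        0x200   /* async i/o notify */

/* Hypothetical, much-reduced stand-in for struct socket. */
struct mini_socket {
    short so_state;     /* internal state flags */
    int   rcv_cc;       /* characters waiting in the receive queue */
};

/* Hypothetical outcomes of a receive request. */
enum rcv_action { RCV_COPY, RCV_EOF, RCV_WOULDBLOCK, RCV_SLEEP };

/*
 * Decide what a receive request should do (assumed ordering):
 * queued data is returned first, then end-of-data is reported,
 * then the non-blocking flag chooses between failing and sleeping.
 */
static enum rcv_action
rcv_decide(const struct mini_socket *so)
{
    if (so->rcv_cc > 0)
        return RCV_COPY;
    if (so->so_state & SS_CANTRCVMORE)
        return RCV_EOF;
    if (so->so_state & SS_NBIO)
        return RCV_WOULDBLOCK;
    return RCV_SLEEP;           /* would block on the queue's wait channel */
}

/* Map an action to the error number handed back to the caller. */
static int
rcv_errno(enum rcv_action a)
{
    return (a == RCV_WOULDBLOCK) ? EWOULDBLOCK : 0;
}
```

A socket whose state is SS_ISCONNECTED|SS_NBIO and whose receive queue is empty yields RCV_WOULDBLOCK, which the caller would see as EWOULDBLOCK; clearing SS_NBIO turns the same request into a sleep.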
+.PP +A socket may be marked ``privileged'' if it was created by the +super-user. Only privileged sockets may +bind addresses in privileged portions of an address space +or use ``raw'' sockets to access lower levels of the network. +.NH 3 +Socket data queues +.PP +A socket's data queue contains a pointer to the data stored in +the queue and other entries related to the management of +the data. The following structure defines a data queue: +.DS +._f +struct sockbuf { + u_short sb_cc; /* actual chars in buffer */ + u_short sb_hiwat; /* max actual char count */ + u_short sb_mbcnt; /* chars of mbufs used */ + u_short sb_mbmax; /* max chars of mbufs to use */ + u_short sb_lowat; /* low water mark */ + short sb_timeo; /* timeout */ + struct mbuf *sb_mb; /* the mbuf chain */ + struct proc *sb_sel; /* process selecting read/write */ + short sb_flags; /* flags, see below */ +}; +.DE +.PP +Data is stored in a queue as a chain of mbufs. +The actual count of data characters as well as high and low water marks are +used by the protocols in controlling the flow of data. +The amount of buffer space (characters of mbufs and associated data pages) +is also recorded along with the limit on buffer allocation. +The socket routines cooperate in implementing the flow control +policy by blocking a process when it requests to send data and +the high water mark has been reached, or when it requests to +receive data and less than the low water mark is present +(assuming non-blocking I/O has not been specified).* +.FS +* The low-water mark is always presumed to be 0 +in the current implementation. +.FE +.PP +When a socket is created, the supporting protocol ``reserves'' space +for the send and receive queues of the socket. +The limit on buffer allocation is set somewhat higher than the limit +on data characters +to account for the granularity of buffer allocation. 
+The actual storage associated with a +socket queue may fluctuate during a socket's lifetime, but it is assumed +that this reservation will always allow a protocol to acquire enough memory +to satisfy the high water marks. +.PP +The timeout and select values are manipulated by the socket routines +in implementing various portions of the interprocess communications +facilities and will not be described here. +.PP +Data queued at a socket is stored in one of two styles. +Stream-oriented sockets queue data with no addresses, headers +or record boundaries. +The data are in mbufs linked through the \fIm_next\fP field. +Buffers containing access rights may be present within the chain +if the underlying protocol supports passage of access rights. +Record-oriented sockets, including datagram sockets, +queue data as a list of packets; the sections of packets are distinguished +by the types of the mbufs containing them. +The mbufs which comprise a record are linked through the \fIm_next\fP field; +records are linked from the \fIm_act\fP field of the first mbuf +of one packet to the first mbuf of the next. +Each packet begins with an mbuf containing the ``from'' address +if the protocol provides it, +then any buffers containing access rights, and finally any buffers +containing data. +If a record contains no data, +no data buffers are required unless neither address nor access rights +are present. +.PP +A socket queue has a number of flags used in synchronizing access +to the data and in acquiring resources: +.DS +._d +#define SB_LOCK 0x01 /* lock on data queue (so_rcv only) */ +#define SB_WANT 0x02 /* someone is waiting to lock */ +#define SB_WAIT 0x04 /* someone is waiting for data/space */ +#define SB_SEL 0x08 /* buffer is selected */ +#define SB_COLL 0x10 /* collision selecting */ +.DE +The last two flags are manipulated by the system in implementing +the select mechanism. +.NH 3 +Socket connection queuing +.PP +In dealing with connection oriented sockets (e.g. 
SOCK_STREAM) +the two ends are considered distinct. One end is termed +\fIactive\fP, and generates connection requests. The other +end is called \fIpassive\fP and accepts connection requests. +.PP +From the passive side, a socket is marked with +SO_ACCEPTCONN when a \fIlisten\fP call is made, +creating two queues of sockets: \fIso_q0\fP for connections +in progress and \fIso_q\fP for connections already made and +awaiting user acceptance. +As a protocol is preparing incoming connections, it creates +a socket structure queued on \fIso_q0\fP by calling the routine +\fIsonewconn\fP(). When the connection +is established, the socket structure is then transferred +to \fIso_q\fP, making it available for an \fIaccept\fP. +.PP +If an SO_ACCEPTCONN socket is closed with sockets on either +\fIso_q0\fP or \fIso_q\fP, these sockets are dropped, +with notification to the peers as appropriate. +.NH 2 +Protocol layer(s) +.PP +Each socket is created in a communications domain, +which usually implies both an addressing structure (address family) +and a set of protocols which implement various socket types within the domain +(protocol family). +Each domain is defined by the following structure: +.DS +.ta .5i +\w'struct 'u +\w'(*dom_externalize)(); 'u +struct domain { + int dom_family; /* PF_xxx */ + char *dom_name; + int (*dom_init)(); /* initialize domain data structures */ + int (*dom_externalize)(); /* externalize access rights */ + int (*dom_dispose)(); /* dispose of internalized rights */ + struct protosw *dom_protosw, *dom_protoswNPROTOSW; + struct domain *dom_next; +}; +.DE +.PP +At boot time, each domain configured into the kernel +is added to a linked list of domains. +The initialization procedure of each domain is then called. +After that time, the domain structure is used to locate protocols +within the protocol family.
+It may also contain procedure references +for externalization of access rights at the receiving socket +and the disposal of access rights that are not received. +.PP +Protocols are described by a set of entry points and certain +socket-visible characteristics, some of which are used in +deciding which socket type(s) they may support. +.PP +An entry in the ``protocol switch'' table exists for each +protocol module configured into the system. It has the following form: +.DS +.ta .5i +\w'struct 'u +\w'domain *pr_domain; 'u +struct protosw { + short pr_type; /* socket type used for */ + struct domain *pr_domain; /* domain protocol a member of */ + short pr_protocol; /* protocol number */ + short pr_flags; /* socket visible attributes */ +/* protocol-protocol hooks */ + int (*pr_input)(); /* input to protocol (from below) */ + int (*pr_output)(); /* output to protocol (from above) */ + int (*pr_ctlinput)(); /* control input (from below) */ + int (*pr_ctloutput)(); /* control output (from above) */ +/* user-protocol hook */ + int (*pr_usrreq)(); /* user request */ +/* utility hooks */ + int (*pr_init)(); /* initialization routine */ + int (*pr_fasttimo)(); /* fast timeout (200ms) */ + int (*pr_slowtimo)(); /* slow timeout (500ms) */ + int (*pr_drain)(); /* flush any excess space possible */ +}; +.DE +.PP +A protocol is called through the \fIpr_init\fP entry before any other. +Thereafter it is called every 200 milliseconds through the +\fIpr_fasttimo\fP entry and +every 500 milliseconds through the \fIpr_slowtimo\fP for timer based actions. +The system will call the \fIpr_drain\fP entry if it is low on space and +this should throw away any non-critical data. +.PP +Protocols pass data between themselves as chains of mbufs using +the \fIpr_input\fP and \fIpr_output\fP routines. \fIPr_input\fP +passes data up (towards +the user) and \fIpr_output\fP passes it down (towards the network); control +information passes up and down on \fIpr_ctlinput\fP and \fIpr_ctloutput\fP. 
+The protocol is responsible for the space occupied by any of the +arguments to these entries and must either pass it onward or dispose of it. +(On output, the lowest level reached must free buffers storing the arguments; +on input, the highest level is responsible for freeing buffers.) +.PP +The \fIpr_usrreq\fP routine interfaces protocols to the socket +code and is described below. +.PP +The \fIpr_flags\fP field is constructed from the following values: +.DS +.ta \w'#define 'u +\w'PR_CONNREQUIRED 'u +8n +#define PR_ATOMIC 0x01 /* exchange atomic messages only */ +#define PR_ADDR 0x02 /* addresses given with messages */ +#define PR_CONNREQUIRED 0x04 /* connection required by protocol */ +#define PR_WANTRCVD 0x08 /* want PRU_RCVD calls */ +#define PR_RIGHTS 0x10 /* passes capabilities */ +.DE +Protocols which are connection-based specify the PR_CONNREQUIRED +flag so that the socket routines will never attempt to send data +before a connection has been established. If the PR_WANTRCVD flag +is set, the socket routines will notify the protocol when the user +has removed data from the socket's receive queue. This allows +the protocol to implement acknowledgement on user receipt, and +also update windowing information based on the amount of space +available in the receive queue. The PR_ADDR field indicates that any +data placed in the socket's receive queue will be preceded by the +address of the sender. The PR_ATOMIC flag specifies that each \fIuser\fP +request to send data must be performed in a single \fIprotocol\fP send +request; it is the protocol's responsibility to maintain record +boundaries on data to be sent. The PR_RIGHTS flag indicates that the +protocol supports the passing of capabilities; this is currently +used only by the protocols in the UNIX protocol family. +.PP +When a socket is created, the socket routines scan the protocol +table for the domain +looking for an appropriate protocol to support the type of +socket being created. 
The \fIpr_type\fP field contains one of the +possible socket types (e.g. SOCK_STREAM), while the \fIpr_domain\fP +is a back pointer to the domain structure. +The \fIpr_protocol\fP field contains the protocol number of the +protocol, normally a well-known value. +.NH 2 +Network-interface layer +.PP +Each network-interface configured into a system defines a +path through which packets may be sent and received. +Normally a hardware device is associated with this interface, +though there is no requirement for this (for example, all +systems have a software ``loopback'' interface used for +debugging and performance analysis). +In addition to manipulating the hardware device, an interface +module is responsible +for encapsulation and decapsulation of any link-layer header +information required to deliver a message to its destination. +The selection of which interface to use in delivering packets +is a routing decision carried out at a +higher level than the network-interface layer. +An interface may have addresses in one or more address families. +The address is set at boot time using an \fIioctl\fP on a socket +in the appropriate domain; this operation is implemented by the protocol +family, after verifying the operation through the device \fIioctl\fP entry. +.PP +An interface is defined by the following structure, +.DS +.ta .5i +\w'struct 'u +\w'ifaddr *if_addrlist; 'u +struct ifnet { + char *if_name; /* name, e.g. ``en'' or ``lo'' */ + short if_unit; /* sub-unit for lower level driver */ + short if_mtu; /* maximum transmission unit */ + short if_flags; /* up/down, broadcast, etc. 
*/ + short if_timer; /* time 'til if_watchdog called */ + struct ifaddr *if_addrlist; /* list of addresses of interface */ + struct ifqueue if_snd; /* output queue */ + int (*if_init)(); /* init routine */ + int (*if_output)(); /* output routine */ + int (*if_ioctl)(); /* ioctl routine */ + int (*if_reset)(); /* bus reset routine */ + int (*if_watchdog)(); /* timer routine */ + int if_ipackets; /* packets received on interface */ + int if_ierrors; /* input errors on interface */ + int if_opackets; /* packets sent on interface */ + int if_oerrors; /* output errors on interface */ + int if_collisions; /* collisions on csma interfaces */ + struct ifnet *if_next; +}; +.DE +Each interface address has the following form: +.DS +.ta \w'#define 'u +\w'struct 'u +\w'struct 'u +\w'sockaddr ifa_addr; 'u-\w'struct 'u +struct ifaddr { + struct sockaddr ifa_addr; /* address of interface */ + union { + struct sockaddr ifu_broadaddr; + struct sockaddr ifu_dstaddr; + } ifa_ifu; + struct ifnet *ifa_ifp; /* back-pointer to interface */ + struct ifaddr *ifa_next; /* next address for interface */ +}; +.ta \w'#define 'u +\w'ifa_broadaddr 'u +\w'ifa_ifu.ifu_broadaddr 'u +#define ifa_broadaddr ifa_ifu.ifu_broadaddr /* broadcast address */ +#define ifa_dstaddr ifa_ifu.ifu_dstaddr /* other end of p-to-p link */ +.DE +The protocol generally maintains this structure as part of a larger +structure containing additional information concerning the address. +.PP +Each interface has a send queue and routines used for +initialization, \fIif_init\fP, and output, \fIif_output\fP. +If the interface resides on a system bus, the routine \fIif_reset\fP +will be called after a bus reset has been performed. +An interface may also +specify a timer routine, \fIif_watchdog\fP; +if \fIif_timer\fP is non-zero, it is decremented once per second +until it reaches zero, at which time the watchdog routine is called. +.PP +The state of an interface and certain characteristics are stored in +the \fIif_flags\fP field. 
The following values are possible: +.DS +._d +#define IFF_UP 0x1 /* interface is up */ +#define IFF_BROADCAST 0x2 /* broadcast is possible */ +#define IFF_DEBUG 0x4 /* turn on debugging */ +#define IFF_LOOPBACK 0x8 /* is a loopback net */ +#define IFF_POINTOPOINT 0x10 /* interface is point-to-point link */ +#define IFF_NOTRAILERS 0x20 /* avoid use of trailers */ +#define IFF_RUNNING 0x40 /* resources allocated */ +#define IFF_NOARP 0x80 /* no address resolution protocol */ +.DE +If the interface is connected to a network which supports transmission +of \fIbroadcast\fP packets, the IFF_BROADCAST flag will be set and +the \fIifa_broadaddr\fP field will contain the address to be used in +sending or accepting a broadcast packet. If the interface is associated +with a point-to-point hardware link (for example, a DEC DMR-11), the +IFF_POINTOPOINT flag will be set and \fIifa_dstaddr\fP will contain the +address of the host on the other side of the connection. These addresses +and the local address of the interface, \fIif_addr\fP, are used in +filtering incoming packets. The interface sets IFF_RUNNING after +it has allocated system resources and posted an initial read on the +device it manages. This state bit is used to avoid multiple allocation +requests when an interface's address is changed. The IFF_NOTRAILERS +flag indicates the interface should refrain from using a \fItrailer\fP +encapsulation on outgoing packets, or (where per-host negotiation +of trailers is possible) that trailer encapsulations should not be requested; +\fItrailer\fP protocols are described +in section 14. The IFF_NOARP flag indicates the interface should not +use an ``address resolution protocol'' in mapping internetwork addresses +to local network addresses. +.PP +Various statistics are also stored in the interface structure. These +may be viewed by users using the \fInetstat\fP(1) program. +.PP +The interface address and flags may be set with the SIOCSIFADDR and +SIOCSIFFLAGS \fIioctl\fP\^s. 
SIOCSIFADDR is used initially to define each +interface's address; SIOCSIFFLAGS can be used to mark +an interface down and perform site-specific configuration. +The destination address of a point-to-point link is set with SIOCSIFDSTADDR. +Corresponding operations exist to read each value. +Protocol families may also support operations to set and read the broadcast +address. +In addition, the SIOCGIFCONF \fIioctl\fP retrieves a list of interface +names and addresses for all interfaces and protocols on the host. +.NH 3 +UNIBUS interfaces +.PP +All hardware related interfaces currently reside on the UNIBUS. +Consequently a common set of utility routines for dealing +with the UNIBUS has been developed. Each UNIBUS interface +utilizes a structure of the following form: +.DS +.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR]; 'u +struct ifubinfo { + short iff_uban; /* uba number */ + short iff_hlen; /* local net header length */ + struct uba_regs *iff_uba; /* uba regs, in vm */ + short iff_flags; /* used during uballoc's */ +}; +.DE +Additional structures are associated with each receive and transmit buffer, +normally one each per interface; for read, +.DS +.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR]; 'u +struct ifrw { + caddr_t ifrw_addr; /* virt addr of header */ + short ifrw_bdp; /* unibus bdp */ + short ifrw_flags; /* type, etc.
*/ +#define IFRW_W 0x01 /* is a transmit buffer */ + int ifrw_info; /* value from ubaalloc */ + int ifrw_proto; /* map register prototype */ + struct pte *ifrw_mr; /* base of map registers */ +}; +.DE +and for write, +.DS +.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR]; 'u +struct ifxmt { + struct ifrw ifrw; + caddr_t ifw_base; /* virt addr of buffer */ + struct pte ifw_wmap[IF_MAXNUBAMR]; /* base pages for output */ + struct mbuf *ifw_xtofree; /* pages being dma'd out */ + short ifw_xswapd; /* mask of clusters swapped */ + short ifw_nmr; /* number of entries in wmap */ +}; +.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR]; 'u +#define ifw_addr ifrw.ifrw_addr +#define ifw_bdp ifrw.ifrw_bdp +#define ifw_flags ifrw.ifrw_flags +#define ifw_info ifrw.ifrw_info +#define ifw_proto ifrw.ifrw_proto +#define ifw_mr ifrw.ifrw_mr +.DE +One of each of these structures is conveniently packaged for interfaces +with single buffers for each direction, as follows: +.DS +.ta \w'#define 'u +\w'ifw_xtofree 'u +\w'pte ifu_wmap[IF_MAXNUBAMR]; 'u +struct ifuba { + struct ifubinfo ifu_info; + struct ifrw ifu_r; + struct ifxmt ifu_xmt; +}; +.ta \w'#define 'u +\w'ifw_xtofree 'u +#define ifu_uban ifu_info.iff_uban +#define ifu_hlen ifu_info.iff_hlen +#define ifu_uba ifu_info.iff_uba +#define ifu_flags ifu_info.iff_flags +#define ifu_w ifu_xmt.ifrw +#define ifu_xtofree ifu_xmt.ifw_xtofree +.DE +.PP +The \fIif_ubinfo\fP structure contains the general information needed +to characterize the I/O-mapped buffers for the device. +In addition, there is a structure describing each buffer, including +UNIBUS resources held by the interface. +Sufficient memory pages and bus map registers are allocated to each buffer +upon initialization according to the maximum packet size and header length. +The kernel virtual address of the buffer is held in \fIifrw_addr\fP, +and the map registers begin +at \fIifrw_mr\fP. 
UNIBUS map register \fIifrw_mr\fP\^[\-1] +maps the local network header +ending on a page boundary. UNIBUS data paths are +reserved for read and for +write, given by \fIifrw_bdp\fP. The prototype of the map +registers for read and for write is saved in \fIifrw_proto\fP. +.PP +When write transfers are not at least half-full pages on page boundaries, +the data are just copied into the pages mapped on the UNIBUS +and the transfer is started. +If a write transfer is at least half a page long and on a page +boundary, UNIBUS page table entries are swapped to reference +the pages, and then the initial pages are +remapped from \fIifw_wmap\fP when the transfer completes. +The mbufs containing the mapped pages are placed on the \fIifw_xtofree\fP +queue to be freed after transmission. +.PP +When read transfers give at least half a page of data to be input, page +frames are allocated from a network page list and traded +with the pages already containing the data, mapping the allocated +pages to replace the input pages for the next UNIBUS data input. +.PP +The following utility routines are available for use in +writing network interface drivers; all use the +structures described above. +.LP +if_ubaminit(ifubinfo, uban, hlen, nmr, ifr, nr, ifx, nx); +.br +if_ubainit(ifuba, uban, hlen, nmr); +.IP +\fIif_ubaminit\fP allocates resources on UNIBUS adapter \fIuban\fP, +storing the information in the \fIifubinfo\fP, \fIifrw\fP and \fIifxmt\fP +structures referenced. +The \fIifr\fP and \fIifx\fP parameters are pointers to arrays +of \fIifrw\fP and \fIifxmt\fP structures whose dimensions +are \fInr\fP and \fInx\fP, respectively. +\fIif_ubainit\fP is a simpler, backwards-compatible interface used +for hardware with single buffers of each type. +They are called only at boot time or after a UNIBUS reset. +One data path (buffered or unbuffered, +depending on the \fIifu_flags\fP field) is allocated for each buffer. 
+The \fInmr\fP parameter indicates +the number of UNIBUS mapping registers required to map a maximal +sized packet onto the UNIBUS, while \fIhlen\fP specifies the size +of a local network header, if any, which should be mapped separately +from the data (see the description of trailer protocols in chapter 14). +Sufficient UNIBUS mapping registers and pages of memory are allocated +to initialize the input data path for an initial read. For the output +data path, mapping registers and pages of memory are also allocated +and mapped onto the UNIBUS. The pages associated with the output +data path are held in reserve in the event a write requires copying +non-page-aligned data (see \fIif_wubaput\fP below). +If \fIif_ubainit\fP is called with memory pages already allocated, +they will be used instead of allocating new ones (this normally +occurs after a UNIBUS reset). +A 1 is returned when allocation and initialization are successful, +0 otherwise. +.LP +m = if_ubaget(ifubinfo, ifr, totlen, off0, ifp); +.br +m = if_rubaget(ifuba, totlen, off0, ifp); +.IP +\fIif_ubaget\fP and \fIif_rubaget\fP pull input data +out of an interface receive buffer and into an mbuf chain. +The first interface passes pointers to the \fIifubinfo\fP structure +for the interface and the \fIifrw\fP structure for the receive buffer; +the second call may be used for single-buffered devices. +\fItotlen\fP specifies the length of data to be obtained, not counting the +local network header. If \fIoff0\fP is non-zero, it indicates +a byte offset to a trailing local network header which should be +copied into a separate mbuf and prepended to the front of the resultant mbuf +chain. When the data amount to at least a half a page, +the previously mapped data pages are remapped +into the mbufs and swapped with fresh pages, thus avoiding +any copy. +The receiving interface is recorded as \fIifp\fP, a pointer to an \fIifnet\fP +structure, for the use of the receiving network protocol. 
+A 0 return value indicates a failure to allocate resources. +.LP +if_ubaput(ifubinfo, ifx, m); +.br +if_wubaput(ifuba, m); +.IP +\fIif_ubaput\fP and \fIif_wubaput\fP map a chain of mbufs +onto a network interface in preparation for output. +The first interface is used by devices with multiple transmit buffers. +The chain includes any local network +header, which is copied so that it resides in the mapped and +aligned I/O space. +Page-sized data that are page-aligned in the output buffer +are mapped to the UNIBUS in place of the normal buffer page, +and the corresponding mbuf is placed on a queue to be freed after transmission. +Any other mbufs which contained non-page-sized +data portions are copied to the I/O space and then freed. +Pages mapped from a previous output operation (no longer needed) +are unmapped. diff --git a/share/doc/smm/18.net/7.t b/share/doc/smm/18.net/7.t new file mode 100644 index 0000000..e165de0 --- /dev/null +++ b/share/doc/smm/18.net/7.t @@ -0,0 +1,258 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4.
Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)7.t 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.\" +.nr H2 1 +.br +.ne 30v +.\".ds RH "Socket/protocol interface +.NH +\s+2Socket/protocol interface\s0 +.PP +The interface between the socket routines and the communication +protocols is through the \fIpr_usrreq\fP routine defined in the +protocol switch table. 
The following requests to a protocol +module are possible: +.DS +._d +#define PRU_ATTACH 0 /* attach protocol */ +#define PRU_DETACH 1 /* detach protocol */ +#define PRU_BIND 2 /* bind socket to address */ +#define PRU_LISTEN 3 /* listen for connection */ +#define PRU_CONNECT 4 /* establish connection to peer */ +#define PRU_ACCEPT 5 /* accept connection from peer */ +#define PRU_DISCONNECT 6 /* disconnect from peer */ +#define PRU_SHUTDOWN 7 /* won't send any more data */ +#define PRU_RCVD 8 /* have taken data; more room now */ +#define PRU_SEND 9 /* send this data */ +#define PRU_ABORT 10 /* abort (fast DISCONNECT, DETATCH) */ +#define PRU_CONTROL 11 /* control operations on protocol */ +#define PRU_SENSE 12 /* return status into m */ +#define PRU_RCVOOB 13 /* retrieve out of band data */ +#define PRU_SENDOOB 14 /* send out of band data */ +#define PRU_SOCKADDR 15 /* fetch socket's address */ +#define PRU_PEERADDR 16 /* fetch peer's address */ +#define PRU_CONNECT2 17 /* connect two sockets */ +/* begin for protocols internal use */ +#define PRU_FASTTIMO 18 /* 200ms timeout */ +#define PRU_SLOWTIMO 19 /* 500ms timeout */ +#define PRU_PROTORCV 20 /* receive from below */ +#define PRU_PROTOSEND 21 /* send to below */ +.DE +A call on the user request routine is of the form, +.DS +._f +error = (*protosw[].pr_usrreq)(so, req, m, addr, rights); +int error; struct socket *so; int req; struct mbuf *m, *addr, *rights; +.DE +The mbuf data chain \fIm\fP is supplied for output operations +and for certain other operations where it is to receive a result. +The address \fIaddr\fP is supplied for address-oriented requests +such as PRU_BIND and PRU_CONNECT. +The \fIrights\fP parameter is an optional pointer to an mbuf +chain containing user-specified capabilities (see the \fIsendmsg\fP +and \fIrecvmsg\fP system calls). The protocol is responsible for +disposal of the data mbuf chains on output operations. 
+A non-zero return value gives a +UNIX error number which should be passed to higher level software. +The following paragraphs describe each +of the requests possible. +.IP PRU_ATTACH +.br +When a protocol is bound to a socket (with the \fIsocket\fP +system call) the protocol module is called with this +request. It is the responsibility of the protocol module to +allocate any resources necessary. +The ``attach'' request +will always precede any of the other requests, and should not +occur more than once. +.IP PRU_DETACH +.br +This is the antithesis of the attach request, and is used +at the time a socket is deleted. The protocol module may +deallocate any resources assigned to the socket. +.IP PRU_BIND +.br +When a socket is initially created it has no address bound +to it. This request indicates that an address should be bound to +an existing socket. The protocol module must verify that the +requested address is valid and available for use. +.IP PRU_LISTEN +.br +The ``listen'' request indicates the user wishes to listen +for incoming connection requests on the associated socket. +The protocol module should perform any state changes needed +to carry out this request (if possible). A ``listen'' request +always precedes a request to accept a connection. +.IP PRU_CONNECT +.br +The ``connect'' request indicates the user wants to establish +an association. The \fIaddr\fP parameter supplied describes +the peer to be connected to. The effect of a connect request +may vary depending on the protocol. Virtual circuit protocols, +such as TCP [Postel81b], use this request to initiate establishment of a +TCP connection. Datagram protocols, such as UDP [Postel80], simply +record the peer's address in a private data structure and use +it to tag all outgoing packets. There are no restrictions +on how many times a connect request may be used after an attach. 
+If a protocol supports the notion of \fImulti-casting\fP, it +is possible to use multiple connects to establish a multi-cast +group. Alternatively, an association may be broken by a +PRU_DISCONNECT request, and a new association created with a +subsequent connect request; all without destroying and creating +a new socket. +.IP PRU_ACCEPT +.br +Following a successful PRU_LISTEN request and the arrival +of one or more connections, this request is made to +indicate the user +has accepted the first connection on the queue of +pending connections. The protocol module should fill +in the supplied address buffer with the address of the +connected party. +.IP PRU_DISCONNECT +.br +Eliminate an association created with a PRU_CONNECT request. +.IP PRU_SHUTDOWN +.br +This call is used to indicate no more data will be sent and/or +received (the \fIaddr\fP parameter indicates the direction of +the shutdown, as encoded in the \fIsoshutdown\fP system call). +The protocol may, at its discretion, deallocate any data +structures related to the shutdown and/or notify a connected peer +of the shutdown. +.IP PRU_RCVD +.br +This request is made only if the protocol entry in the protocol +switch table includes the PR_WANTRCVD flag. +When a user removes data from the receive queue this request +will be sent to the protocol module. It may be used to trigger +acknowledgements, refresh windowing information, initiate +data transfer, etc. +.IP PRU_SEND +.br +Each user request to send data is translated into one or more +PRU_SEND requests (a protocol may indicate that a single user +send request must be translated into a single PRU_SEND request by +specifying the PR_ATOMIC flag in its protocol description). +The data to be sent is presented to the protocol as a list of +mbufs and an address is, optionally, supplied in the \fIaddr\fP +parameter. 
The protocol is responsible for preserving the data +in the socket's send queue if it is not able to send it immediately, +or if it may need it at some later time (e.g. for retransmission). +.IP PRU_ABORT +.br +This request indicates an abnormal termination of service. The +protocol should delete any existing association(s). +.IP PRU_CONTROL +.br +The ``control'' request is generated when a user performs a +UNIX \fIioctl\fP system call on a socket (and the ioctl is not +intercepted by the socket routines). It allows protocol-specific +operations to be provided outside the scope of the common socket +interface. The \fIaddr\fP parameter contains a pointer to a static +kernel data area where relevant information may be obtained or returned. +The \fIm\fP parameter contains the actual \fIioctl\fP request code +(note the non-standard calling convention). +The \fIrights\fP parameter contains a pointer to an \fIifnet\fP structure +if the \fIioctl\fP operation pertains to a particular network interface. +.IP PRU_SENSE +.br +The ``sense'' request is generated when the user makes an \fIfstat\fP +system call on a socket; it requests status of the associated socket. +This currently returns a standard \fIstat\fP structure. +It typically contains only the +optimal transfer size for the connection (based on buffer size, +windowing information and maximum packet size). +The \fIm\fP parameter contains a pointer +to a static kernel data area where the status buffer should be placed. +.IP PRU_RCVOOB +.br +Any ``out-of-band'' data presently available is to be returned. An +mbuf is passed to the protocol module, and the protocol +should either place +data in the mbuf or attach new mbufs to the one supplied if there is +insufficient space in the single mbuf. +An error may be returned if out-of-band data is not (yet) available +or has already been consumed. +The \fIaddr\fP parameter contains any options such as MSG_PEEK +to examine data without consuming it. 
+.IP PRU_SENDOOB +.br +Like PRU_SEND, but for out-of-band data. +.IP PRU_SOCKADDR +.br +The local address of the socket is returned, if any is currently +bound to it. The address (with protocol specific format) is returned +in the \fIaddr\fP parameter. +.IP PRU_PEERADDR +.br +The address of the peer to which the socket is connected is returned. +The socket must be in an SS_ISCONNECTED state for this request to +be made to the protocol. The address (with protocol specific format) is +returned in the \fIaddr\fP parameter. +.IP PRU_CONNECT2 +.br +The protocol module is supplied two sockets and requested to +establish a connection between the two without binding any +addresses, if possible. This call is used in implementing +the +.IR socketpair (2) +system call. +.PP +The following requests are used internally by the protocol modules +and are never generated by the socket routines. In certain instances, +they are handed to the \fIpr_usrreq\fP routine solely for convenience +in tracing a protocol's operation (e.g. PRU_SLOWTIMO). +.IP PRU_FASTTIMO +.br +A ``fast timeout'' has occurred. This request is made when a timeout +occurs in the protocol's \fIpr_fasttimo\fP routine. The \fIaddr\fP +parameter indicates which timer expired. +.IP PRU_SLOWTIMO +.br +A ``slow timeout'' has occurred. This request is made when a timeout +occurs in the protocol's \fIpr_slowtimo\fP routine. The \fIaddr\fP +parameter indicates which timer expired. +.IP PRU_PROTORCV +.br +This request is used in the protocol-protocol interface, not by the +socket routines. It requests reception of data destined for the protocol and +not the user. No protocols currently use this facility. +.IP PRU_PROTOSEND +.br +This request allows a protocol to send data destined for another +protocol module, not a user. The details of how data is marked +``addressed to protocol'' instead of ``addressed to user'' are +left to the protocol modules. No protocols currently use this facility.
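The request dispatch described in this section can be sketched as a single switch on the request code. The following is a minimal user-space model, not the kernel routine: the names toy_socket, toy_pcb and toy_usrreq are hypothetical, addresses are reduced to plain integers, and the mbuf and rights arguments of the real \fIpr_usrreq\fP call are omitted. Only the PRU_ request numbers are taken from the table above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Request codes copied from the table in this section. */
#define PRU_ATTACH      0
#define PRU_DETACH      1
#define PRU_BIND        2
#define PRU_CONNECT     4
#define PRU_DISCONNECT  6

/* Hypothetical per-socket protocol state for a datagram-style protocol. */
struct toy_pcb {
    int bound;      /* non-zero once a local address is bound */
    int connected;  /* non-zero while a peer address is recorded */
    int laddr;      /* local address (an int stands in for a sockaddr) */
    int faddr;      /* peer address */
};

/* Hypothetical stand-in for struct socket. */
struct toy_socket {
    struct toy_pcb *so_pcb;
};

/* Returns 0 on success or a UNIX error number, as the text requires. */
int
toy_usrreq(struct toy_socket *so, int req, const int *addr)
{
    struct toy_pcb *pcb;

    assert(so != NULL);
    pcb = so->so_pcb;

    /* The attach request must precede every other request. */
    if (pcb == NULL && req != PRU_ATTACH)
        return EINVAL;

    switch (req) {
    case PRU_ATTACH:
        if (pcb != NULL)            /* attach may not occur twice */
            return EINVAL;
        pcb = calloc(1, sizeof(*pcb));
        if (pcb == NULL)
            return ENOBUFS;
        so->so_pcb = pcb;
        return 0;
    case PRU_DETACH:                /* release what attach allocated */
        free(pcb);
        so->so_pcb = NULL;
        return 0;
    case PRU_BIND:
        if (addr == NULL || pcb->bound)
            return EINVAL;
        pcb->laddr = *addr;
        pcb->bound = 1;
        return 0;
    case PRU_CONNECT:               /* datagram style: just record the peer */
        if (addr == NULL)
            return EINVAL;
        pcb->faddr = *addr;
        pcb->connected = 1;
        return 0;
    case PRU_DISCONNECT:
        if (!pcb->connected)
            return ENOTCONN;
        pcb->connected = 0;
        return 0;
    default:                        /* request not handled by this sketch */
        return EOPNOTSUPP;
    }
}
```

A repeated PRU_CONNECT simply overwrites the recorded peer here, which mirrors the datagram behavior described under PRU_CONNECT. A virtual circuit protocol would instead begin connection establishment at that point.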
diff --git a/share/doc/smm/18.net/8.t b/share/doc/smm/18.net/8.t new file mode 100644 index 0000000..e65e656 --- /dev/null +++ b/share/doc/smm/18.net/8.t @@ -0,0 +1,166 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)8.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH "Protocol/protocol interface +.br +.ne 2i +.NH +\s+2Protocol/protocol interface\s0 +.PP +The interface between protocol modules is through the \fIpr_usrreq\fP, +\fIpr_input\fP, \fIpr_output\fP, \fIpr_ctlinput\fP, and +\fIpr_ctloutput\fP routines. The calling conventions for all +but the \fIpr_usrreq\fP routine are expected to be specific to +the protocol +modules and are not guaranteed to be consistent across protocol +families. We +will examine the conventions used for some of the Internet +protocols in this section as an example. +.NH 2 +pr_output +.PP +The Internet protocol UDP uses the convention, +.DS +error = udp_output(inp, m); +int error; struct inpcb *inp; struct mbuf *m; +.DE +where the \fIinp\fP, ``\fIin\fP\^ternet +\fIp\fP\^rotocol \fIc\fP\^ontrol \fIb\fP\^lock'', +passed between modules conveys per connection state information, and +the mbuf chain contains the data to be sent. UDP +performs consistency checks, appends its header, calculates a +checksum, etc. before passing the packet on. +UDP is based on the Internet Protocol, IP [Postel81a], as its transport. 
+UDP passes a packet to the IP module for output as follows: +.DS +error = ip_output(m, opt, ro, flags); +int error; struct mbuf *m, *opt; struct route *ro; int flags; +.DE +.PP +The call to IP's output routine is more complicated than that for +UDP, as befits the additional work the IP module must do. +The \fIm\fP parameter is the data to be sent, and the \fIopt\fP +parameter is an optional list of IP options which should +be placed in the IP packet header. The \fIro\fP parameter is +used in making routing decisions (and passing them back to the +caller for use in subsequent calls). The +final parameter, \fIflags\fP, contains flags indicating whether the +user is allowed to transmit a broadcast packet +and whether routing is to be performed. The broadcast flag may +be inconsequential if the underlying hardware does not support the +notion of broadcasting. +.PP +All output routines return 0 on success and a UNIX error number +if a failure occurred which could be detected immediately +(no buffer space available, no route to destination, etc.). +.NH 2 +pr_input +.PP +Both UDP and TCP use the following calling convention, +.DS +(void) (*protosw[].pr_input)(m, ifp); +struct mbuf *m; struct ifnet *ifp; +.DE +Each mbuf list passed is a single packet to be processed by +the protocol module. +The interface from which the packet was received is passed as the second +parameter. +.PP +The IP input routine is a VAX software interrupt level routine, +and so is not called with any parameters. It instead communicates +with network interfaces through a queue, \fIipintrq\fP, which is +identical in structure to the queues used by the network interfaces +for storing packets awaiting transmission. +The software interrupt is enabled by the network interfaces +when they place input data on the input queue. +.NH 2 +pr_ctlinput +.PP +This routine is used to convey ``control'' information to a +protocol module (i.e. information which might be passed to the +user, but is not data).
+.PP +The common calling convention for this routine is, +.DS +(void) (*protosw[].pr_ctlinput)(req, addr); +int req; struct sockaddr *addr; +.DE +The \fIreq\fP parameter is one of the following, +.DS +.ta \w'#define 'u +\w'PRC_UNREACH_NEEDFRAG 'u +8n +#define PRC_IFDOWN 0 /* interface transition */ +#define PRC_ROUTEDEAD 1 /* select new route if possible */ +#define PRC_QUENCH 4 /* some one said to slow down */ +#define PRC_MSGSIZE 5 /* message size forced drop */ +#define PRC_HOSTDEAD 6 /* normally from IMP */ +#define PRC_HOSTUNREACH 7 /* ditto */ +#define PRC_UNREACH_NET 8 /* no route to network */ +#define PRC_UNREACH_HOST 9 /* no route to host */ +#define PRC_UNREACH_PROTOCOL 10 /* dst says bad protocol */ +#define PRC_UNREACH_PORT 11 /* bad port # */ +#define PRC_UNREACH_NEEDFRAG 12 /* IP_DF caused drop */ +#define PRC_UNREACH_SRCFAIL 13 /* source route failed */ +#define PRC_REDIRECT_NET 14 /* net routing redirect */ +#define PRC_REDIRECT_HOST 15 /* host routing redirect */ +#define PRC_REDIRECT_TOSNET 16 /* redirect for type of service & net */ +#define PRC_REDIRECT_TOSHOST 17 /* redirect for tos & host */ +#define PRC_TIMXCEED_INTRANS 18 /* packet lifetime expired in transit */ +#define PRC_TIMXCEED_REASS 19 /* lifetime expired on reass q */ +#define PRC_PARAMPROB 20 /* header incorrect */ +.DE +while the \fIaddr\fP parameter is the address to which the condition applies. +Many of the requests have obviously been +derived from ICMP (the Internet Control Message Protocol [Postel81c]), +and from error messages defined in the 1822 host/IMP convention +[BBN78]. Mapping tables exist to convert +control requests to UNIX error codes which are delivered +to a user. +.NH 2 +pr_ctloutput +.PP +This is the routine that implements per-socket options at the protocol +level for \fIgetsockopt\fP and \fIsetsockopt\fP.
+The calling convention is, +.DS +error = (*protosw[].pr_ctloutput)(op, so, level, optname, mp); +int op; struct socket *so; int level, optname; struct mbuf **mp; +.DE +where \fIop\fP is one of PRCO_SETOPT or PRCO_GETOPT, +\fIso\fP is the socket from whence the call originated, +and \fIlevel\fP and \fIoptname\fP are the protocol level and option name +supplied by the user. +The results of a PRCO_GETOPT call are returned in an mbuf whose address +is placed in \fImp\fP before return. +On a PRCO_SETOPT call, \fImp\fP contains the address of an mbuf +containing the option data; the mbuf should be freed before return. diff --git a/share/doc/smm/18.net/9.t b/share/doc/smm/18.net/9.t new file mode 100644 index 0000000..506037a --- /dev/null +++ b/share/doc/smm/18.net/9.t @@ -0,0 +1,124 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)9.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH "Protocol/network-interface +.br +.ne 2i +.NH +\s+2Protocol/network-interface interface\s0 +.PP +The lowest layer in the set of protocols which comprise a +protocol family must interface itself to one or more network +interfaces in order to transmit and receive +packets. It is assumed that +any routing decisions have been made before handing a packet +to a network interface, in fact this is absolutely necessary +in order to locate any interface at all (unless, of course, +one uses a single ``hardwired'' interface). There are two +cases with which to be concerned, transmission of a packet +and receipt of a packet; each will be considered separately. +.NH 2 +Packet transmission +.PP +Assuming a protocol has a handle on an interface, \fIifp\fP, +a (struct ifnet\ *), +it transmits a fully formatted packet with the following call, +.DS +error = (*ifp->if_output)(ifp, m, dst) +int error; struct ifnet *ifp; struct mbuf *m; struct sockaddr *dst; +.DE +The output routine for the network interface transmits the packet +\fIm\fP to the \fIdst\fP address, or returns an error indication +(a UNIX error number). 
In reality transmission may +not be immediate or successful; normally the output +routine simply queues the packet on its send queue and primes +an interrupt driven routine to actually transmit the packet. +For unreliable media, such as the Ethernet, ``successful'' +transmission simply means that the packet has been placed on the cable +without a collision. On the other hand, an 1822 interface guarantees +proper delivery or an error indication for each message transmitted. +The model employed in the networking system attaches no promises +of delivery to the packets handed to a network interface, and thus +corresponds more closely to the Ethernet. Errors returned by the +output routine are only those that can be detected immediately, +and are normally trivial in nature (no buffer space, +address format not handled, etc.). +No indication is received if errors are detected after the call has returned. +.NH 2 +Packet reception +.PP +Each protocol family must have one or more ``lowest level'' protocols. +These protocols deal with internetwork addressing and are responsible +for the delivery of incoming packets to the proper protocol processing +modules. In the PUP model [Boggs78] these protocols are termed Level +1 protocols; +in the ISO model, network layer protocols. In this system each such +protocol module has an input packet queue assigned to it. Incoming +packets received by a network interface are queued for the protocol +module, and a VAX software interrupt is posted to initiate processing. +.PP +Four macros are available for queuing and dequeuing packets: +.IP "IF_ENQUEUE(ifq, m)" +.br +This places the packet \fIm\fP at the tail of the queue \fIifq\fP. +.IP "IF_DEQUEUE(ifq, m)" +.br +This places a pointer to the packet at the head of queue \fIifq\fP +in \fIm\fP +and removes the packet from the queue. +A zero value will be returned in \fIm\fP if the queue is empty.
+.IP "IF_DEQUEUEIF(ifq, m, ifp)" +.br +Like IF_DEQUEUE, this removes the next packet from the head of a queue +and returns it in \fIm\fP. +A pointer to the interface on which the packet was received +is placed in \fIifp\fP, a (struct ifnet\ *). +.IP "IF_PREPEND(ifq, m)" +.br +This places the packet \fIm\fP at the head of the queue \fIifq\fP. +.PP +Each queue has a maximum length associated with it as a simple form +of congestion control. The macro IF_QFULL(ifq) returns 1 if the queue +is filled, in which case the macro IF_DROP(ifq) should be used to +increment the count of the number of packets dropped, and the offending +packet is dropped. For example, the following code fragment is commonly +found in a network interface's input routine, +.DS +._f +if (IF_QFULL(inq)) { + IF_DROP(inq); + m_freem(m); +} else + IF_ENQUEUE(inq, m); +.DE diff --git a/share/doc/smm/18.net/Makefile b/share/doc/smm/18.net/Makefile new file mode 100644 index 0000000..47e7e11 --- /dev/null +++ b/share/doc/smm/18.net/Makefile @@ -0,0 +1,8 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/10/93 +# $FreeBSD$ + +VOLUME= smm/18.net +SRCS= 0.t 1.t 2.t 3.t 4.t 5.t 6.t 7.t 8.t 9.t a.t b.t c.t d.t e.t f.t +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/smm/18.net/a.t b/share/doc/smm/18.net/a.t new file mode 100644 index 0000000..dddba57 --- /dev/null +++ b/share/doc/smm/18.net/a.t @@ -0,0 +1,219 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. 
Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)a.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH "Gateways and routing +.br +.ne 2i +.NH +\s+2Gateways and routing issues\s0 +.PP +The system has been designed with the expectation that it will +be used in an internetwork environment. The ``canonical'' +environment was envisioned to be a collection of local area +networks connected at one or more points through hosts with +multiple network interfaces (one on each local area network), +and possibly a connection to a long haul network (for example, +the ARPANET). 
In such an environment, issues of +gatewaying and packet routing become very important. Certain +of these issues, such as congestion +control, have been handled in a simplistic manner or specifically +not addressed. +Instead, where possible, the network system +attempts to provide simple mechanisms upon which more involved +policies may be implemented. As some of these problems become +better understood, the solutions developed will be incorporated +into the system. +.PP +This section will describe the facilities provided for packet +routing. The simplistic mechanisms provided for congestion +control are described in chapter 12. +.NH 2 +Routing tables +.PP +The network system maintains a set of routing tables for +selecting a network interface to use in delivering a +packet to its destination. These tables are of the form: +.DS +.ta \w'struct 'u +\w'u_long 'u +\w'sockaddr rt_gateway; 'u +struct rtentry { + u_long rt_hash; /* hash key for lookups */ + struct sockaddr rt_dst; /* destination net or host */ + struct sockaddr rt_gateway; /* forwarding agent */ + short rt_flags; /* see below */ + short rt_refcnt; /* no. of references to structure */ + u_long rt_use; /* packets sent using route */ + struct ifnet *rt_ifp; /* interface to give packet to */ +}; +.DE +.PP +The routing information is organized in two separate tables, one +for routes to a host and one for routes to a network. The +distinction between hosts and networks is necessary so +that a single mechanism may be used +for both broadcast and multi-drop type networks, and +also for networks built from point-to-point links (e.g. +DECnet [DEC80]). +.PP +Each table is organized as a hashed set of linked lists. +Two 32-bit hash values are calculated by routines defined for +each address family; one based on the destination being +a host, and one assuming the target is the network portion +of the address.
Each hash value is used to +locate a hash chain to search (by taking the value modulo the +hash table size) and the entire 32-bit value is then +used as a key in scanning the list of routes. Lookups are +applied first to the routing +table for hosts, then to the routing table for networks. +If both lookups fail, a final lookup is made for a ``wildcard'' +route (by convention, network 0). +The first appropriate route discovered is used. +By doing this, routes to a specific host on a network may be +present as well as routes to the network. This also allows a +``fall back'' network route to be defined to a ``smart'' gateway +which may then perform more intelligent routing. +.PP +Each routing table entry contains a destination (the desired final destination), +a gateway to which to send the packet, +and various flags which indicate the route's status and type (host or +network). A count +of the number of packets sent using the route is kept, along +with a count of ``held references'' to the dynamically +allocated structure to ensure that memory reclamation +occurs only when the route is not in use. Finally, a pointer to +a network interface is kept; packets sent using +the route should be handed to this interface. +.PP +Routes are typed in two ways: either as host or network, and as +``direct'' or ``indirect''. The host/network +distinction determines how to compare the \fIrt_dst\fP field +during lookup. If the route is to a network, only a packet's +destination network is compared to the \fIrt_dst\fP entry stored +in the table. If the route is to a host, the addresses must +match bit for bit. +.PP +The distinction between ``direct'' and ``indirect'' routes indicates +whether the destination is directly connected to the source. +This is needed when performing local network encapsulation.
If +a packet is destined for a peer at a host or network which is +not directly connected to the source, the internetwork packet +header will +contain the address of the eventual destination, while +the local network header will address the intervening +gateway. Should the destination be directly connected, these addresses +are likely to be identical, or a mapping between the two exists. +The RTF_GATEWAY flag indicates that the route is to an ``indirect'' +gateway agent, and that the local network header should be filled in +from the \fIrt_gateway\fP field instead of +from the final internetwork destination address. +.PP +It is assumed that multiple routes to the same destination will not +be present; only one of multiple routes, that most recently installed, +will be used. +.PP +Routing redirect control messages are used to dynamically +modify existing routing table entries as well as dynamically +create new routing table entries. On hosts where exhaustive +routing information is too expensive to maintain (e.g. work +stations), the +combination of wildcard routing entries and routing redirect +messages can be used to provide a simple routing management +scheme without the use of a higher level policy process. +Current connections may be rerouted after notification of the protocols +by means of their \fIpr_ctlinput\fP entries. +Statistics are kept by the routing table routines +on the use of routing redirect messages and their +effect on the routing tables. These statistics may be viewed using +.IR netstat (1). +.PP +Status information other than routing redirect control messages +may be used in the future, but at present they are ignored. +Likewise, more intelligent ``metrics'' may be used to describe +routes in the future, possibly based on bandwidth and monetary +costs.
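The lookup order described in this section (host table, then network table, then the wildcard route) can be sketched in a few lines. This is a simplified user-space model under stated assumptions, not the kernel code: addresses are 32-bit integers, the upper 24 bits are assumed to name the network, and the names sk_route, sk_tables, sk_add, sk_lookup, SK_HASHSIZ and SK_HOST, as well as the hash functions, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_HASHSIZ 8        /* hypothetical number of hash chains per table */
#define SK_HOST    0x1      /* hypothetical flag: route is to a host */

struct sk_route {
    uint32_t hash;          /* full 32-bit hash, used as the search key */
    uint32_t dst;           /* destination host, or network number */
    int flags;
    struct sk_route *next;  /* next route on the same hash chain */
};

struct sk_tables {
    struct sk_route *host[SK_HASHSIZ];  /* routes to hosts */
    struct sk_route *net[SK_HASHSIZ];   /* routes to networks */
};

/* Assumption: the upper 24 bits of an address name its network. */
static uint32_t sk_netof(uint32_t addr) { return addr & 0xffffff00u; }

/* Hypothetical hashes: one treats the address as a host, one as a network. */
static uint32_t sk_hosthash(uint32_t addr) { return addr * 2654435761u; }
static uint32_t sk_nethash(uint32_t addr) { return sk_netof(addr) * 2654435761u; }

/* Install a route at the head of its chain, so the newest entry wins. */
void
sk_add(struct sk_tables *t, struct sk_route *r)
{
    struct sk_route **head;

    assert(t != NULL && r != NULL);
    if (r->flags & SK_HOST) {
        r->hash = sk_hosthash(r->dst);
        head = &t->host[r->hash % SK_HASHSIZ];
    } else {
        r->hash = sk_nethash(r->dst);
        head = &t->net[r->hash % SK_HASHSIZ];
    }
    r->next = *head;
    *head = r;
}

/* Pick the chain by hash modulo table size, then match on the full hash. */
static struct sk_route *
sk_scan(struct sk_route *const *table, uint32_t hash, uint32_t key)
{
    struct sk_route *r;

    for (r = table[hash % SK_HASHSIZ]; r != NULL; r = r->next)
        if (r->hash == hash && r->dst == key)
            return r;
    return NULL;
}

/* Host table first, then network table, then the wildcard (network 0). */
struct sk_route *
sk_lookup(const struct sk_tables *t, uint32_t addr)
{
    struct sk_route *r;

    assert(t != NULL);
    if ((r = sk_scan(t->host, sk_hosthash(addr), addr)) != NULL)
        return r;
    if ((r = sk_scan(t->net, sk_nethash(addr), sk_netof(addr))) != NULL)
        return r;
    return sk_scan(t->net, sk_nethash(0), 0);
}
```

Because sk_add inserts at the head of a chain, a second route to the same destination shadows the first, which matches the statement that only the most recently installed of multiple routes is used. Reference counting and the gateway field are left out of the sketch.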
+.NH 2 +Routing table interface +.PP +A protocol accesses the routing tables through +three routines, +one to allocate a route, one to free a route, and one +to process a routing redirect control message. +The routine \fIrtalloc\fP performs route allocation; it is +called with a pointer to the following structure containing +the desired destination: +.DS +._f +struct route { + struct rtentry *ro_rt; + struct sockaddr ro_dst; +}; +.DE +The route returned is assumed ``held'' by the caller until +released with an \fIrtfree\fP call. Protocols which implement +virtual circuits, such as TCP, hold onto routes for the duration +of the circuit's lifetime, while connection-less protocols, +such as UDP, allocate and free routes whenever their destination address +changes. +.PP +The routine \fIrtredirect\fP is called to process a routing redirect +control message. It is called with a destination address, +the new gateway to that destination, and the source of the redirect. +Redirects are accepted only from the current router for the destination. +If a non-wildcard route +exists to the destination, the gateway entry in the route is modified +to point at the new gateway supplied. Otherwise, a new routing +table entry is inserted reflecting the information supplied. Routes +to interfaces and routes to gateways which are not directly accessible +from the host are ignored. +.NH 2 +User level routing policies +.PP +Routing policies implemented in user processes manipulate the +kernel routing tables through two \fIioctl\fP calls. The +commands SIOCADDRT and SIOCDELRT add and delete routing entries, +respectively; the tables are read through the /dev/kmem device. +The decision to place policy decisions in a user process implies +that routing table updates may lag a bit behind the identification of +new routes, or the failure of existing routes, but this period +of instability is normally very small with proper implementation +of the routing process. 
Advisory information, such as ICMP +error messages and IMP diagnostic messages, may be read from +raw sockets (described in the next section). +.PP +Several routing policy processes have already been implemented. The +system standard +``routing daemon'' uses a variant of the Xerox NS Routing Information +Protocol [Xerox81] to maintain up-to-date routing tables in our local +environment. Interaction with other existing routing protocols, +such as the Internet EGP (Exterior Gateway Protocol), has been +accomplished using a similar process. diff --git a/share/doc/smm/18.net/b.t b/share/doc/smm/18.net/b.t new file mode 100644 index 0000000..2e39a8a --- /dev/null +++ b/share/doc/smm/18.net/b.t @@ -0,0 +1,145 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission.
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)b.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH "Raw sockets +.br +.ne 2i +.NH +\s+2Raw sockets\s0 +.PP +A raw socket is an object which allows users direct access +to a lower-level protocol. Raw sockets are intended for knowledgeable +processes which wish to take advantage of some protocol +feature not directly accessible through the normal interface, or +for the development of new protocols built atop existing lower level +protocols. For example, a new version of TCP might be developed at the +user level by utilizing a raw IP socket for delivery of packets. +The raw IP socket interface attempts to provide an identical interface +to the one a protocol would have if it were resident in the kernel. +.PP +The raw socket support is built around a generic raw socket interface, +(possibly) augmented by protocol-specific processing routines. +This section will describe the core of the raw socket interface. 
+.NH 2 +Control blocks +.PP +Every raw socket has a protocol control block of the following form: +.DS +.ta \w'struct 'u +\w'caddr_t 'u +\w'sockproto rcb_proto; 'u +struct rawcb { + struct rawcb *rcb_next; /* doubly linked list */ + struct rawcb *rcb_prev; + struct socket *rcb_socket; /* back pointer to socket */ + struct sockaddr rcb_faddr; /* destination address */ + struct sockaddr rcb_laddr; /* socket's address */ + struct sockproto rcb_proto; /* protocol family, protocol */ + caddr_t rcb_pcb; /* protocol specific stuff */ + struct mbuf *rcb_options; /* protocol specific options */ + struct route rcb_route; /* routing information */ + short rcb_flags; +}; +.DE +All the control blocks are kept on a doubly linked list for +performing lookups during packet dispatch. Associations may +be recorded in the control block and used by the output routine +in preparing packets for transmission. +The \fIrcb_proto\fP structure contains the protocol family and protocol +number with which the raw socket is associated. +The protocol, family and addresses are +used to filter packets on input; this will be described in more +detail shortly. If any protocol-specific information is required, +it may be attached to the control block using the \fIrcb_pcb\fP +field. +Protocol-specific options for transmission in outgoing packets +may be stored in \fIrcb_options\fP. +.PP +A raw socket interface is datagram oriented. That is, each send +or receive on the socket requires a destination address. This +address may be supplied by the user or stored in the control block +and automatically installed in the outgoing packet by the output +routine. Since it is not possible to determine whether an address +is present or not in the control block, two flags, RAW_LADDR and +RAW_FADDR, indicate if a local and foreign address are present. +Routing is expected to be performed by the underlying protocol +if necessary. 
+.NH 2 +Input processing +.PP +Input packets are ``assigned'' to raw sockets based on a simple +pattern matching scheme. Each network interface or protocol +gives unassigned packets +to the raw input routine with the call: +.DS +raw_input(m, proto, src, dst) +struct mbuf *m; struct sockproto *proto; struct sockaddr *src, *dst; +.DE +The data packet then has a generic header prepended to it of the +form +.DS +._f +struct raw_header { + struct sockproto raw_proto; + struct sockaddr raw_dst; + struct sockaddr raw_src; +}; +.DE +and it is placed in a packet queue for the ``raw input protocol'' module. +Packets taken from this queue are copied into any raw sockets that +match the header according to the following rules: +.IP 1) +The protocol family of the socket and header agree. +.IP 2) +If the protocol number in the socket is non-zero, then it agrees +with that found in the packet header. +.IP 3) +If a local address is defined for the socket, the address format +of the local address is the same as the destination address's and +the two addresses agree bit for bit. +.IP 4) +The rules of 3) are applied to the socket's foreign address and the packet's +source address. +.LP +A basic assumption is that addresses present in the +control block and packet header (as constructed by the network +interface and any raw input protocol module) are in a canonical +form which may be ``block compared''. +.NH 2 +Output processing +.PP +On output the raw \fIpr_usrreq\fP routine +passes the packet and a pointer to the raw control block to the +raw protocol output routine for any processing required before +it is delivered to the appropriate network interface. The +output routine is normally the only code required to implement +a raw socket interface.
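The four matching rules given under ``Input processing'' above can be sketched as a single predicate. This is a minimal sketch under stated assumptions, not the system's raw input code: the `sk_`-prefixed structures are cut-down, hypothetical versions of the ones shown in the text, the helper names `addr_equal` and `raw_match` are invented for the example, and the flag values are assumed.

```c
#include <string.h>

#define RAW_LADDR 01   /* local address present (value assumed) */
#define RAW_FADDR 02   /* foreign address present (value assumed) */

/* Cut-down, hypothetical versions of the structures in the text. */
struct sk_sockaddr  { unsigned short sa_family; char sa_data[14]; };
struct sk_sockproto { unsigned short sp_family; unsigned short sp_protocol; };

struct sk_rawcb {
    struct sk_sockaddr  rcb_faddr;   /* destination address */
    struct sk_sockaddr  rcb_laddr;   /* socket's address */
    struct sk_sockproto rcb_proto;   /* protocol family, protocol */
    short               rcb_flags;
};

struct sk_raw_header {
    struct sk_sockproto raw_proto;
    struct sk_sockaddr  raw_dst;
    struct sk_sockaddr  raw_src;
};

/* Same address format and identical bytes: the ``block compare''. */
static int
addr_equal(const struct sk_sockaddr *a, const struct sk_sockaddr *b)
{
    return a->sa_family == b->sa_family &&
        memcmp(a->sa_data, b->sa_data, sizeof a->sa_data) == 0;
}

/* Return 1 if the packet described by rh should be copied to rp. */
static int
raw_match(const struct sk_rawcb *rp, const struct sk_raw_header *rh)
{
    /* Rule 1: protocol families agree. */
    if (rp->rcb_proto.sp_family != rh->raw_proto.sp_family)
        return 0;
    /* Rule 2: a non-zero socket protocol number must agree. */
    if (rp->rcb_proto.sp_protocol != 0 &&
        rp->rcb_proto.sp_protocol != rh->raw_proto.sp_protocol)
        return 0;
    /* Rule 3: local address, if set, equals the packet destination. */
    if ((rp->rcb_flags & RAW_LADDR) &&
        !addr_equal(&rp->rcb_laddr, &rh->raw_dst))
        return 0;
    /* Rule 4: foreign address, if set, equals the packet source. */
    if ((rp->rcb_flags & RAW_FADDR) &&
        !addr_equal(&rp->rcb_faddr, &rh->raw_src))
        return 0;
    return 1;
}
```

The sketch compares the family and data fields separately so that structure padding cannot affect the result; a socket with neither flag set and protocol number zero accepts every packet of its family.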
diff --git a/share/doc/smm/18.net/c.t b/share/doc/smm/18.net/c.t new file mode 100644 index 0000000..2c7f752 --- /dev/null +++ b/share/doc/smm/18.net/c.t @@ -0,0 +1,151 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)c.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH "Buffering and congestion control +.br +.ne 2i +.NH +\s+2Buffering and congestion control\s0 +.PP +One of the major factors in the performance of a protocol is +the buffering policy used. Lack of a proper buffering policy +can force packets to be dropped, cause falsified windowing +information to be emitted by protocols, fragment host memory, +degrade the overall host performance, etc. Due to problems +such as these, most systems allocate a fixed pool of memory +to the networking system and impose +a policy optimized for ``normal'' network operation. +.PP +The networking system developed for UNIX is little different in this +respect. At boot time a fixed amount of memory is allocated by +the networking system. At later times more system memory +may be requested as the need arises, but at no time is +memory ever returned to the system. It is possible to +garbage collect memory from the network, but difficult. In +order to perform this garbage collection properly, some +portion of the network will have to be ``turned off'' as +data structures are updated. The interval over which this +occurs must be kept small compared to the average inter-packet +arrival time, or too much traffic may +be lost, impacting other hosts on the network, as well as +increasing load on the interconnecting mediums.
In our +environment we have not experienced a need for such compaction, +and thus have left the problem unresolved. +.PP +The mbuf structure was introduced in chapter 5. In this +section a brief description will be given of the allocation +mechanisms, and policies used by the protocols in performing +connection level buffering. +.NH 2 +Memory management +.PP +The basic memory allocation routines manage a private page map, +the size of which determines the maximum amount of memory +that may be allocated by the network. +A small amount of memory is allocated at boot time +to initialize the mbuf and mbuf page cluster free lists. +When the free lists are exhausted, more memory is requested +from the system memory allocator if space remains in the map. +If memory cannot be allocated, +callers may block awaiting free memory, +or the failure may be reflected to the caller immediately. +The allocator will not block awaiting free map entries, however, +as exhaustion of the page map usually indicates that buffers have been lost +due to a ``leak.'' +The private page table is used by the network buffer management +routines in remapping pages to +be logically contiguous as the need arises. In addition, an +array of reference counts parallels the page table and is used +when multiple references to a page are present. +.PP +Mbufs are 128 byte structures, 8 fitting in a 1Kbyte +page of memory. When data is placed in mbufs, +it is copied or remapped into logically contiguous pages of +memory from the network page pool if possible. +Data smaller than half of the size +of a page is copied into one or more 112 byte mbuf data areas. +.NH 2 +Protocol buffering policies +.PP +Protocols reserve fixed amounts of +buffering for send and receive queues at socket creation time. These +amounts define the high and low water marks used by the socket routines +in deciding when to block and unblock a process. 
The reservation +of space does not currently +result in any action by the memory management +routines. +.PP +Protocols which provide connection level flow control do this +based on the amount of space in the associated socket queues. That +is, send windows are calculated based on the amount of free space +in the socket's receive queue, while receive windows are adjusted +based on the amount of data awaiting transmission in the send queue. +Care has been taken to avoid the ``silly window syndrome'' described +in [Clark82] at both the sending and receiving ends. +.NH 2 +Queue limiting +.PP +Incoming packets from the network are always received unless +memory allocation fails. However, each Level 1 protocol +input queue +has an upper bound on the queue's length, and any packets +exceeding that bound are discarded. It is possible for a host to be +overwhelmed by excessive network traffic (for instance a host +acting as a gateway from a high bandwidth network to a low bandwidth +network). As a ``defensive'' mechanism the queue limits may be +adjusted to throttle network traffic load on a host. +Consider a host willing to devote some percentage of +its machine to handling network traffic. +If the cost of handling an +incoming packet can be calculated so that an acceptable +``packet handling rate'' +can be determined, then input queue lengths may be dynamically +adjusted based on a host's network load and the number of packets +awaiting processing. Obviously, discarding packets is +not a satisfactory solution to a problem such as this +(simply dropping packets is likely to increase the load on a network); +the queue lengths were incorporated mainly as a safeguard mechanism. +.NH 2 +Packet forwarding +.PP +When packets can not be forwarded because of memory limitations, +the system attempts to generate a ``source quench'' message. In addition, +any other problems encountered during packet forwarding are also +reflected back to the sender in the form of ICMP packets. 
This +helps hosts avoid unneeded retransmissions. +.PP +Broadcast packets are never forwarded due to possible dire +consequences. In an early stage of network development, broadcast +packets were forwarded and a ``routing loop'' resulted in network +saturation and every host on the network crashing. diff --git a/share/doc/smm/18.net/d.t b/share/doc/smm/18.net/d.t new file mode 100644 index 0000000..675bece --- /dev/null +++ b/share/doc/smm/18.net/d.t @@ -0,0 +1,73 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)d.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH "Out of band data +.br +.ne 2i +.NH +\s+2Out of band data\s0 +.PP +Out of band data is a facility peculiar to the stream socket +abstraction defined. Little agreement appears to exist as +to what its semantics should be. TCP defines the notion of +``urgent data'' as in-line, while the NBS protocols [Burruss81] +and numerous others provide a fully independent logical +transmission channel along which out of band data is to be +sent. +In addition, the amount of the data which may be sent as an out +of band message varies from protocol to protocol; everything +from 1 bit to 16 bytes or more. +.PP +A stream socket's notion of out of band data has been defined +as the lowest reasonable common denominator (at least reasonable +in our minds); +clearly this is subject to debate. Out of band data is expected +to be transmitted out of the normal sequencing and flow control +constraints of the data stream. A minimum of 1 byte of out of +band data and one outstanding out of band message are expected to +be supported by the protocol supporting a stream socket. +It is a protocol's prerogative to support larger-sized messages, or +more than one outstanding out of band message at a time. +.PP +Out of band data is maintained by the protocol and is usually not +stored in the socket's receive queue. 
+A socket-level option, SO_OOBINLINE, +is provided to force out-of-band data to be placed in the normal +receive queue when urgent data is received; +this sometimes ameliorates problems due to loss of data +when multiple out-of-band +segments are received before the first has been passed to the user. +The PRU_SENDOOB and PRU_RCVOOB +requests to the \fIpr_usrreq\fP routine are used in sending and +receiving data. diff --git a/share/doc/smm/18.net/e.t b/share/doc/smm/18.net/e.t new file mode 100644 index 0000000..77e8a2a --- /dev/null +++ b/share/doc/smm/18.net/e.t @@ -0,0 +1,129 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)e.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH "Trailer protocols +.br +.ne 2i +.NH +\s+2Trailer protocols\s0 +.PP +Core to core copies can be expensive. +Consequently, a great deal of effort was spent +in minimizing such operations. The VAX architecture +provides virtual memory hardware organized in +page units. To cut down on copy operations, data +is kept in page-sized units on page-aligned +boundaries whenever possible. This allows data +to be moved in memory simply by remapping the page +instead of copying. The mbuf and network +interface routines perform page table manipulations +where needed, hiding the complexities of the VAX +virtual memory hardware from higher level code. +.PP +Data enters the system in two ways: from the user, +or from the network (hardware interface). When data +is copied from the user's address space +into the system it is deposited in pages (if sufficient +data is present). +This encourages the user to transmit information in +messages which are a multiple of the system page size. +.PP +Unfortunately, performing a similar operation when taking +data from the network is very difficult. +Consider the format of an incoming packet. A packet +usually contains a local network header followed by +one or more headers used by the high level protocols. +Finally, the data, if any, follows these headers. 
Since +the header information may be variable length, DMA'ing the eventual +data for the user into a page aligned area of +memory is impossible without +\fIa priori\fP knowledge of the format (e.g., by supporting +only a single protocol header format). +.PP +To allow variable length header information to +be present and still ensure page alignment of data, +a special local network encapsulation may be used. +This encapsulation, termed a \fItrailer protocol\fP [Leffler84], +places the variable length header information after +the data. A fixed size local network +header is then prepended to the resultant packet. +The local network header contains the size of the +data portion (in units of 512 bytes), and a new \fItrailer protocol +header\fP, inserted before the variable length +information, contains the size of the variable length +header information. The following trailer +protocol header is used to store information +regarding the variable length protocol header: +.DS +._f +struct { + short protocol; /* original protocol no. */ + short length; /* length of trailer */ +}; +.DE +.PP +The processing of the trailer protocol is very +simple. On output, the local network header indicates that +a trailer encapsulation is being used. +The header also includes an indication +of the number of data pages present before the trailer +protocol header. The trailer protocol header is +initialized to contain the actual protocol identifier and the +variable length header size, and is appended to the data +along with the variable length header information. +.PP +On input, the interface routines identify the +trailer encapsulation +by the protocol type stored in the local network header, +then calculate the number of +pages of data to find the beginning of the trailer. +The trailing information is copied into a separate +mbuf and linked to the front of the resultant packet. +.PP +Clearly, trailer protocols require cooperation between +source and destination. 
In addition, they are normally +cost effective only when sizable packets are used. The +current scheme works because the local network encapsulation +header is a fixed size, allowing DMA operations +to be performed at a known offset from the first data page +being received. Should the local network header be +variable length this scheme fails. +.PP +Statistics collected indicate that as much as 200Kb/s +can be gained by using a trailer protocol with +1Kbyte packets. The average size of the variable +length header was 40 bytes (the size of a +minimal TCP/IP packet header). If hardware +supports larger sized packets, even greater gains +may be realized. diff --git a/share/doc/smm/18.net/f.t b/share/doc/smm/18.net/f.t new file mode 100644 index 0000000..18995fd --- /dev/null +++ b/share/doc/smm/18.net/f.t @@ -0,0 +1,117 @@ +.\" Copyright (c) 1983, 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)f.t 8.1 (Berkeley) 6/8/93 +.\" +.nr H2 1 +.\".ds RH Acknowledgements +.br +.ne 2i +.SH +\s+2Acknowledgements\s0 +.PP +The internal structure of the system is patterned +after the Xerox PUP architecture [Boggs79], while in certain +places the Internet +protocol family has had a great deal of influence in the design. +The use of software interrupts for process invocation +is based on similar facilities found in +the VMS operating system. +Many of the +ideas related to protocol modularity, memory management, and network +interfaces are based on Rob Gurwitz's TCP/IP implementation for the +4.1BSD version of UNIX on the VAX [Gurwitz81]. +Greg Chesson explained his use of trailer encapsulations in Datakit, +instigating their use in our system. +.\".ds RH References +.nr H2 1 +.sp 2 +.ne 2i +.SH +\s+2References\s0 +.LP +.IP [Boggs79] 20 +Boggs, D. R., J. F. Shoch, E. A. Taft, and R. M. Metcalfe; +\fIPUP: An Internetwork Architecture\fP. Report CSL-79-10. +XEROX Palo Alto Research Center, July 1979. +.IP [BBN78] 20 +Bolt Beranek and Newman; +Specification for the Interconnection of Host and IMP. +BBN Technical Report 1822. May 1978. +.IP [Cerf78] 20 +Cerf, V. 
G.; The Catenet Model for Internetworking. +Internet Working Group, IEN 48. July 1978. +.IP [Clark82] 20 +Clark, D. D.; Window and Acknowledgement Strategy in TCP, RFC-813. +Network Information Center, SRI International. July 1982. +.IP [DEC80] 20 +Digital Equipment Corporation; \fIDECnet DIGITAL Network +Architecture \- General Description\fP. Order No. +AA-K179A-TK. October 1980. +.IP [Gurwitz81] 20 +Gurwitz, R. F.; VAX-UNIX Networking Support Project \- Implementation +Description. Internetwork Working Group, IEN 168. +January 1981. +.IP [ISO81] 20 +International Organization for Standardization. +\fIISO Open Systems Interconnection \- Basic Reference Model\fP. +ISO/TC 97/SC 16 N 719. August 1981. +.IP [Joy86] 20 +Joy, W.; Fabry, R.; Leffler, S.; McKusick, M.; and Karels, M.; +Berkeley Software Architecture Manual, 4.4BSD Edition. +\fIUNIX Programmer's Supplementary Documents\fP, Vol. 1 (PSD:5). +Computer Systems Research Group, +University of California, Berkeley. +May, 1986. +.IP [Leffler84] 20 +Leffler, S.J. and Karels, M.J.; Trailer Encapsulations, RFC-893. +Network Information Center, SRI International. +April 1984. +.IP [Postel80] 20 +Postel, J. User Datagram Protocol, RFC-768. +Network Information Center, SRI International. May 1980. +.IP [Postel81a] 20 +Postel, J., ed. Internet Protocol, RFC-791. +Network Information Center, SRI International. September 1981. +.IP [Postel81b] 20 +Postel, J., ed. Transmission Control Protocol, RFC-793. +Network Information Center, SRI International. September 1981. +.IP [Postel81c] 20 +Postel, J. Internet Control Message Protocol, RFC-792. +Network Information Center, SRI International. September 1981. +.IP [Xerox81] 20 +Xerox Corporation. \fIInternet Transport Protocols\fP. +Xerox System Integration Standard 028112. December 1981. +.IP [Zimmermann80] 20 +Zimmermann, H. OSI Reference Model \- The ISO Model of +Architecture for Open Systems Interconnection. +\fIIEEE Transactions on Communications\fP. Com-28(4); 425-432. 
+April 1980. diff --git a/share/doc/smm/18.net/spell.ok b/share/doc/smm/18.net/spell.ok new file mode 100644 index 0000000..f9a387b --- /dev/null +++ b/share/doc/smm/18.net/spell.ok @@ -0,0 +1,307 @@ +A,1986A +AA +ACCEPTCONN +ADDR +ARPANET +ASYNC +BBN +BBN78 +Beranek +Boggs +Boggs78 +Boggs79 +Burruss81 +CANTRCVMORE +CANTSENDMORE +CLSIZE +COLL +CONNECT2 +CONNREQUIRED +COPYALL +CSL +Catenet +Cerf +Cerf78 +Chesson +Clark82 +Com +DEC80 +DECnet +DEQUEUE +DEQUEUEIF +DETATCH +DMA +DMA'ing +DMR +DONTWAIT +Datagram +Datakit +EGP +EWOULDBLOCK +Ethernet +FADDR +FASTTIMO +Fabry +GETOPT +Gurwitz +Gurwitz's +Gurwitz81 +HASCL +HOSTDEAD +HOSTUNREACH +ICMP +IEN +IFDOWN +IFF +IFRW +INTRANS +IP +IP's +ISCONNECTED +ISCONNECTING +ISDISCONNECTING +ISO +ISO81 +Ircb +Joy86 +K179A +Karels +LADDR +LOOPBACK +Leffler +Leffler84 +M.J +MAXNUBAMR +MCLGET +MFREE +MGET +MLEN +MSG +MSGSIZE +MSIZE +Mbufs +McKusick +Metcalfe +NBIO +NEEDFRAG +NOARP +NOFDREF +NOTRAILERS +NS +Notes''SMM:15 +OOBINLINE +OSI +PARAMPROB +PEERADDR +PF +POINTOPOINT +PRC +PRCO +PRIV +PROTORCV +PROTOSEND +PRU +PS1:6 +Postel +Postel80 +Postel81a +Postel81b +Postel81c +QFULL +RCVATMARK +RCVD +RCVOOB +RFC +RH +ROUTEDEAD +RTF +S.J +SB +SEL +SENDOOB +SETOPT +SIGIO +SIOCADDRT +SIOCDELRT +SIOCGIFCONF +SIOCSIFADDR +SIOCSIFDSTADDR +SIOCSIFFLAGS +SIOGSIFFLAGS +SLOWTIMO +SMM:15 +SOCKADDR +SRCFAIL +SS +Shoch +TCP +TIMXCEED +TOSHOST +TOSNET +UDP +UNIBUS +VAX +VMS +Vol +WANTRCVD +Xerox81 +Xerox82 +Zimmermann +Zimmermann80 +addr +addrlist +adj +amelioriates +async +bdp +broadaddr +caddr +csma +ctlinput +ctloutput +daemon +dat +datagram +decapsulation +dequeuing +dev +dma'd +dom +dst +dstaddr +dtom +faddr +fastimo +fasttimo +fcntl +freem +fstat +getsockopt +hardwired +hiwat +hlen +ierrors +ifa +ifaddr +iff +ifnet +ifp +ifq +ifqueue +ifr +ifrw +ifrw.ifrw +ifu +ifu.ifu +ifuba +ifubinfo +ifw +ifx +ifxmt +info +info.iff +init +inp +inpcb +inq +ip +ipackets +ipintrq +laddr +len +loopback +lowat +m,n +mb +mbcnt +mbmax +mbuf +mbuf's +mbufs +mp +mr 
+mtod +mtu +netstat +nmr +nr +nx +oerrors +off0 +ontrol +oob +oobmark +op +opackets +ops +optname +pcb +pgrp +prev +proc +proto +protosw +protoswNPROTOSW +pte +pullup +q0len +qlen +qlimit +rawcb +rcb +rcv +recvmsg +ref +refcnt +regs +req +rerouted +ro +rotocol +rt +rtalloc +rtentry +rtfree +rtredirect +rubaget +sb +sel +sendmsg +setsockopt +slowtimo +snd +sockaddr +sockbuf +socketpair +sockproto +sonewconn +soshutdown +src +ta +ternet +timeo +totlen +uba +ubaalloc +ubaget +ubainit +uballoc's +ubaminit +uban +ubaput +ubinfo +udp +unibus +usrreq +virt +vm +wildcard +wmap +wubaput +x,t +xmt +xmt.ifrw +xmt.ifw +xswapd +xtofree +xxx diff --git a/share/doc/smm/Makefile b/share/doc/smm/Makefile new file mode 100644 index 0000000..e6ceda1 --- /dev/null +++ b/share/doc/smm/Makefile @@ -0,0 +1,33 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/10/93 +# $FreeBSD$ + +.include <bsd.own.mk> + +# The following modules do not describe FreeBSD: +# 14.uucpimpl, 15.uucpnet + +# The following modules do not build/install: +# 13.amd (documentation is TeXinfo) +# 16.security 17.password (encumbered) + +# The following modules are built with their programs: +# 07.lpd + +SUBDIR= title \ + contents \ + 01.setup \ + 02.config \ + 03.fsck \ + 04.quotas \ + 05.fastfs \ + 06.nfs \ + ${_08.sendmailop} \ + 11.timedop \ + 12.timed \ + 18.net + +.if ${MK_SENDMAIL} != "no" +_08.sendmailop= 08.sendmailop +.endif + +.include <bsd.subdir.mk> diff --git a/share/doc/smm/contents/Makefile b/share/doc/smm/contents/Makefile new file mode 100644 index 0000000..aa73faa --- /dev/null +++ b/share/doc/smm/contents/Makefile @@ -0,0 +1,8 @@ +# $FreeBSD$ + +VOLUME= smm +DOC= contents +SRCS= contents.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/smm/contents/contents.ms b/share/doc/smm/contents/contents.ms new file mode 100644 index 0000000..1b8038a --- /dev/null +++ b/share/doc/smm/contents/contents.ms @@ -0,0 +1,195 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. 
All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)00.contents 8.1 (Berkeley) 7/5/93 +.\" $FreeBSD$ +.\" +.OH '''SMM Contents' +.EH 'SMM Contents''' +.TL +UNIX System Manager's Manual (SMM) +.if !r.U .nr .U 0 +.if \n(.U \{\ +.br +.>> <a href="Title.html">Title.html</a> +.\} +.sp +\s-2 4.4 Berkeley Software Distribution\s+2 +.sp +\fRJune, 1993\fR +.PP +This volume contains manual pages and supplementary documents useful to system +administrators. +The information in these documents applies to +the 4.4BSD system as distributed by U.C. Berkeley. +.SH +Reference Manual \- Section 8 +.tl '''(8)' +.IP +Section 8 of the UNIX Programmer's Manual contains information related to +system operation, administration, and maintenance. +.SH +System Installation and Administration +.IP +.tl 'Installing and Operating 4.4BSD''SMM:1' +.if \n(.U \{\ +.br +.>> <a href="01.setup/paper.html">01.setup/paper.html</a> +.\} +.QP +The definitive reference document for those occasions when +you find you need to start over again. + +.IP +.tl 'Building 4.4BSD Kernels with \fIConfig\fP''SMM:2' +.if \n(.U \{\ +.br +.>> <a href="02.config/paper.html">02.config/paper.html</a> +.\} +.QP +In-depth discussions of the use and operation of the \fIconfig\fP +program, and how to build your very own Unix kernel. + +.IP +.tl 'Fsck \- The UNIX File System Check Program''SMM:3' +.if \n(.U \{\ +.br +.>> <a href="03.fsck/paper.html">03.fsck/paper.html</a> +.\} +.QP +A reference document for using the \fIfsck\fP program during +times of file system distress. + +.IP +.tl 'Disc Quotas in a UNIX Environment''SMM:4' +.if \n(.U \{\ +.br +.>> <a href="04.quotas/paper.html">04.quotas/paper.html</a> +.\} +.QP +A light introduction to the techniques +for limiting the use of disc resources. + +.IP +.tl 'A Fast File System for UNIX''SMM:5' +.if \n(.U \{\ +.br +.>> <a href="05.fastfs/paper.html">05.fastfs/paper.html</a> +.\} +.QP +A description of the 4.4BSD file system organization, +design and implementation. 
+ +.IP +.tl 'The 4.4BSD NFS Implementation''SMM:6' +.if \n(.U \{\ +.br +.>> <a href="06.nfs/paper.html">06.nfs/paper.html</a> +.\} +.QP +An overview of the design, implementation, and use of NFS on 4.4BSD. + +.IP +.tl 'Line Printer Spooler Manual''SMM:7' +.QP +This document describes the structure and installation procedure +for the line printer spooling system. + +.IP +.tl 'Sendmail Installation and Operation Guide''SMM:8' +.if \n(.U \{\ +.br +.>> <a href="08.sendmailop/paper.html">08.sendmailop/paper.html</a> +.\} +.QP +The last word in installing and operating the \fIsendmail\fP program. + +.IP +.tl 'Timed Installation and Operation Guide''SMM:11' +.if \n(.U \{\ +.br +.>> <a href="11.timedop/paper.html">11.timedop/paper.html</a> +.\} +.QP +Describes how to maintain time synchronization between machines +in a local network. + +.IP +.tl 'The Berkeley UNIX Time Synchronization Protocol''SMM:12' +.if \n(.U \{\ +.br +.>> <a href="12.timed/paper.html">12.timed/paper.html</a> +.\} +.QP +The protocols and algorithms used by timed, +the network time synchronization daemon. + +.IP +.tl 'AMD \- The 4.4BSD Automounter''SMM:13' +.QP +Automatically mounting file systems on demand. + +.IP +.tl 'Installation and Operation of UUCP''SMM:14' +.QP +Describes the implementation of uucp; for the installer and administrator. + +.IP +.tl 'A Dial\-Up Network of UNIX Systems''SMM:15' +.QP +Describes UUCP, a program for communicating files between UNIX systems. + +.IP +.tl 'On the Security of UNIX''SMM:16' +.QP +Hints on how to break UNIX, and how to avoid your system being broken. + +.IP +.tl 'Password Security \- A Case History''SMM:17' +.QP +How the bad guys used to be able to break the password algorithm, and why +they cannot now (at least not so easily). 
+ +.IP +.tl 'Networking Implementation Notes, 4.4BSD Edition''SMM:18' +.if \n(.U \{\ +.br +.>> <a href="18.net/paper.html">18.net/paper.html</a> +.\} +.QP +A concise description of the system interfaces used within the +networking subsystem. + +.IP +.tl 'The PERL Programming Language''SMM:19' +.QP +The Practical Extraction and Report Language is ideal for +writing those pesky administration scripts. diff --git a/share/doc/smm/title/Makefile b/share/doc/smm/title/Makefile new file mode 100644 index 0000000..c1f1c9b --- /dev/null +++ b/share/doc/smm/title/Makefile @@ -0,0 +1,7 @@ +# $FreeBSD$ + +VOLUME= smm +DOC= Title +SRCS= Title + +.include <bsd.doc.mk> diff --git a/share/doc/smm/title/Title b/share/doc/smm/title/Title new file mode 100644 index 0000000..5faa11f --- /dev/null +++ b/share/doc/smm/title/Title @@ -0,0 +1,146 @@ +.\" Copyright (c) 1986, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission.
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)Title 8.2 (Berkeley) 4/19/94 +.\" $FreeBSD$ +.\" +.ps 18 +.vs 22 +.sp 2.75i +.ft B +.ce 2 +UNIX System Manager's Manual +(SMM) +.ps 14 +.vs 16 +.sp |4i +.ce 2 +4.4 Berkeley Software Distribution +.sp |5.75i +.ft R +.ps 12 +.vs 16 +.ce +June, 1993 +.sp |8.2i +.ce 5 +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California +Berkeley, California 94720 +.bp +\& +.sp |1i +.hy 0 +.ps 10 +.vs 12p +Copyright 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993 +The Regents of the University of California. All rights reserved. +.sp 2 +Other than the specific manual pages and documents listed below +as copyrighted by AT&T, +redistribution and use of this manual in source and binary forms, +with or without modification, are permitted provided that the +following conditions are met: +.sp 0.5 +.in +0.2i +.ta 0.2i +.ti -0.2i +1) Redistributions of this manual must retain the copyright +notices on this page, this list of conditions and the following disclaimer. 
+.ti -0.2i +2) Software or documentation that incorporates part of this manual must +reproduce the copyright notices on this page, this list of conditions and +the following disclaimer in the documentation and/or other materials +provided with the distribution. +.ti -0.2i +3) All advertising materials mentioning features or use of this software +must display the following acknowledgement: +``This product includes software developed by the University of +California, Berkeley and its contributors.'' +.ti -0.2i +4) Neither the name of the University nor the names of its contributors +may be used to endorse or promote products derived from this software +without specific prior written permission. +.in -0.2i +.sp +\fB\s-1THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +SUCH DAMAGE.\s+1\fP +.sp 2 +The Institute of Electrical and Electronics Engineers and the American +National Standards Committee X3, on Information Processing Systems have +given us permission to reprint portions of their documentation. +.sp +In the following statement, the phrase ``this text'' refers to portions +of the system documentation. 
+.sp 0.5 +``Portions of this text are reprinted and reproduced in +electronic form in 4.4BSD from IEEE Std 1003.1-1988, IEEE +Standard Portable Operating System Interface for Computer Environments +(POSIX), copyright 1988 by the Institute of Electrical and Electronics +Engineers, Inc. In the event of any discrepancy between these versions +and the original IEEE Standard, the original IEEE Standard is the referee +document.'' +.sp +In the following statement, the phrase ``This material'' refers to portions +of the system documentation. +.sp 0.5 +``This material is reproduced with permission from American National +Standards Committee X3, on Information Processing Systems. Computer and +Business Equipment Manufacturers Association (CBEMA), 311 First St., NW, +Suite 500, Washington, DC 20001-2178. The developmental work of +Programming Language C was completed by the X3J11 Technical Committee.'' +.sp 2 +Manual pages cron.8, icheck.8, ncheck.8, and sa.8 +and documents SMM:15, 16, and 17 +are copyright 1979, AT&T Bell Laboratories, Incorporated. +Document SMM:14 is a modification of an earlier document that +is copyrighted 1979 by AT&T Bell Laboratories, Incorporated. +Holders of \x'-1p'UNIX\v'-4p'\s-3TM\s0\v'4p'/32V, +System III, or System V software licenses are +permitted to copy these documents, or any portion of them, +as necessary for licensed use of the software, +provided this copyright notice and statement of permission +are included. +.sp 2 +The views and conclusions contained in this manual are those of the +authors and should not be interpreted as representing official policies, +either expressed or implied, of the Regents of the University of California. 
diff --git a/share/doc/usd/04.csh/Makefile b/share/doc/usd/04.csh/Makefile new file mode 100644 index 0000000..d22a7b9 --- /dev/null +++ b/share/doc/usd/04.csh/Makefile @@ -0,0 +1,10 @@ +# @(#)Makefile 8.1 (Berkeley) 8/14/93 +# $FreeBSD$ + +VOLUME= usd/04.csh +SRCS= tabs csh.1 csh.2 csh.3 csh.4 csh.a csh.g +MACROS= -ms +USE_SOELIM= +SRCDIR= ${.CURDIR}/../../../../bin/csh/USD.doc + +.include <bsd.doc.mk> diff --git a/share/doc/usd/07.mail/Makefile b/share/doc/usd/07.mail/Makefile new file mode 100644 index 0000000..d5a6d3c --- /dev/null +++ b/share/doc/usd/07.mail/Makefile @@ -0,0 +1,11 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= usd/07.mail +SRCS= mail0.nr mail1.nr mail2.nr mail3.nr mail4.nr mail5.nr mail6.nr \ + mail7.nr mail8.nr mail9.nr maila.nr +MACROS= -me +USE_TBL= +SRCDIR= ${.CURDIR}/../../../../usr.bin/mail/USD.doc + +.include <bsd.doc.mk> diff --git a/share/doc/usd/10.exref/Makefile b/share/doc/usd/10.exref/Makefile new file mode 100644 index 0000000..8df4f72 --- /dev/null +++ b/share/doc/usd/10.exref/Makefile @@ -0,0 +1,6 @@ +# $FreeBSD$ + +SUBDIR= exref summary + +.include <bsd.subdir.mk> + diff --git a/share/doc/usd/10.exref/Makefile.inc b/share/doc/usd/10.exref/Makefile.inc new file mode 100644 index 0000000..c5c2f55 --- /dev/null +++ b/share/doc/usd/10.exref/Makefile.inc @@ -0,0 +1,5 @@ +# $FreeBSD$ + +VOLUME= usd/10.exref +MACROS= -ms +SRCDIR= ${.CURDIR}/../../../../../contrib/nvi/docs/USD.doc/exref diff --git a/share/doc/usd/10.exref/exref/Makefile b/share/doc/usd/10.exref/exref/Makefile new file mode 100644 index 0000000..7af18f9 --- /dev/null +++ b/share/doc/usd/10.exref/exref/Makefile @@ -0,0 +1,5 @@ +# $FreeBSD$ + +SRCS= ex.rm + +.include <bsd.doc.mk> diff --git a/share/doc/usd/10.exref/summary/Makefile b/share/doc/usd/10.exref/summary/Makefile new file mode 100644 index 0000000..143333f --- /dev/null +++ b/share/doc/usd/10.exref/summary/Makefile @@ -0,0 +1,7 @@ +# $FreeBSD$ + +DOC= summary +SRCS= ex.summary +USE_TBL= 
+ +.include <bsd.doc.mk> diff --git a/share/doc/usd/11.vitut/Makefile b/share/doc/usd/11.vitut/Makefile new file mode 100644 index 0000000..f412b62 --- /dev/null +++ b/share/doc/usd/11.vitut/Makefile @@ -0,0 +1,18 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= usd/11.edit +SRCS= edittut.ms +MACROS= -ms +USE_TBL= +SRCDIR= ${.CURDIR}/../../../../contrib/nvi/docs/USD.doc/edit + +# index for versatec is different from the one in edit.tut +# because the fonts are different and entries reference page +# rather than section numbers. if you have a typesetter +# you should just use the index in edit.tut, and ignore editvindex. + +#editvindex: +# ${TROFF} ${MACROS} -n22 edit.vindex + +.include <bsd.doc.mk> diff --git a/share/doc/usd/12.vi/Makefile b/share/doc/usd/12.vi/Makefile new file mode 100644 index 0000000..7b2c080 --- /dev/null +++ b/share/doc/usd/12.vi/Makefile @@ -0,0 +1,6 @@ +# $FreeBSD$ + +SUBDIR= vi viapwh summary + +.include <bsd.subdir.mk> + diff --git a/share/doc/usd/12.vi/Makefile.inc b/share/doc/usd/12.vi/Makefile.inc new file mode 100644 index 0000000..42d417f --- /dev/null +++ b/share/doc/usd/12.vi/Makefile.inc @@ -0,0 +1,5 @@ +# $FreeBSD$ + +VOLUME= usd/12.vi +MACROS= -ms +SRCDIR= ${.CURDIR}/../../../../../contrib/nvi/docs/USD.doc/vitut diff --git a/share/doc/usd/12.vi/summary/Makefile b/share/doc/usd/12.vi/summary/Makefile new file mode 100644 index 0000000..425536d --- /dev/null +++ b/share/doc/usd/12.vi/summary/Makefile @@ -0,0 +1,7 @@ +# $FreeBSD$ + +DOC= summary +SRCS= vi.summary +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/usd/12.vi/vi/Makefile b/share/doc/usd/12.vi/vi/Makefile new file mode 100644 index 0000000..6021b09 --- /dev/null +++ b/share/doc/usd/12.vi/vi/Makefile @@ -0,0 +1,6 @@ +# $FreeBSD$ + +SRCS= vi.in vi.chars +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/usd/12.vi/viapwh/Makefile b/share/doc/usd/12.vi/viapwh/Makefile new file mode 100644 index 0000000..f20582d --- /dev/null +++ 
b/share/doc/usd/12.vi/viapwh/Makefile @@ -0,0 +1,6 @@ +# $FreeBSD$ + +DOC= viapwh +SRCS= vi.apwh.ms + +.include <bsd.doc.mk> diff --git a/share/doc/usd/13.viref/Makefile b/share/doc/usd/13.viref/Makefile new file mode 100644 index 0000000..d7bb392 --- /dev/null +++ b/share/doc/usd/13.viref/Makefile @@ -0,0 +1,35 @@ +# From: @(#)Makefile 8.16 (Berkeley) 8/15/94 +# $FreeBSD$ + +VOLUME= usd/13.viref +SRCS= vi.ref-patched +EXTRA= ex.cmd.roff ref.so set.opt.roff vi.cmd.roff +MACROS= -me +CLEANFILES= vi.ref-patched index +TRFLAGS= -U # this is to hide warnings only +USE_SOELIM= +USE_TBL= +SRCDIR= ${.CURDIR}/../../../../contrib/nvi/docs/USD.doc/vi.ref + +vi.ref-patched: vi.ref + sed -e 's:^\.so index.so$$:&.\\*[.T]:' ${.ALLSRC} > ${.TARGET} + +PRINTERDEVICE?= ascii +.for _dev in ${PRINTERDEVICE} +EXTRA+= index.so.${_dev} +CLEANFILES+= index.so.${_dev} + +# Build index.so as a side-effect of building the paper. +index.so.${_dev}: ${SRCS} ${EXTRA:Nindex.so.${_dev}} + sed -e 's:^\.so index\.so\.\\\*\[\.T\]$$::' vi.ref-patched | \ + ${ROFF.${_dev}} -U -z + sed -e 's/MINUSSIGN/-/' \ + -e 's/DOUBLEQUOTE/""/' \ + -e "s/SQUOTE/'/" \ + -e 's/ /__SPACE/g' < index | \ + sort -u '-t ' -k 1,1 -k 2n | awk -f ${SRCDIR}/merge.awk | \ + sed -e 's/__SPACE/ /g' \ + -e "s/^\\(['\\.]\\)/\\\\\&\\1/" > ${.TARGET} +.endfor + +.include <bsd.doc.mk> diff --git a/share/doc/usd/18.msdiffs/Makefile b/share/doc/usd/18.msdiffs/Makefile new file mode 100644 index 0000000..faf76bb --- /dev/null +++ b/share/doc/usd/18.msdiffs/Makefile @@ -0,0 +1,8 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= usd/18.msdiffs +SRCS= ms.diffs +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/usd/18.msdiffs/ms.diffs b/share/doc/usd/18.msdiffs/ms.diffs new file mode 100644 index 0000000..be6883f --- /dev/null +++ b/share/doc/usd/18.msdiffs/ms.diffs @@ -0,0 +1,288 @@ +.\" Copyright (c) 1983, 1993 +.\" The Regents of the University of California. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" @(#)ms.diffs 8.1 (Berkeley) 6/8/93 +.\" +.\" $FreeBSD$ +.\" +.nr LL 6.5i +.nr FL 6.0i +.if t .nr PD .5v +.if t .ds m \u\(ul\dm +.if n .ds m -m +.AM +.OH 'A Revised Version of \*ms''USD:18-%' +.EH 'USD:18-%''A Revised Version of \*ms' +.TL +A Revised Version of \*ms +.AU +Bill Tuthill +.AI +Computing Services +University of California +Berkeley, CA 94720 +.PP +The \*ms macros have been slightly revised and re\%arranged for the +Berkeley Unix distribution. +Because of the rearrangement, +the new macros can be read by the computer +in about half the time required by the previous version of \*ms. +This means that output will begin to appear between ten seconds +and several minutes more quickly, depending on the system load. +On long files, however, the savings in total time are not substantial. +The old version of \*ms is still available as \*mos. +.PP +Several bugs in \*ms have been fixed, including +a bad problem with the .1C macro, +minor difficulties with boxed text, +a break induced by .EQ before initialization, +the failure to set tab stops in displays, +and several bothersome errors in the \fBrefer\fP macros. +Macros used only at Bell Laboratories have been removed. +There are a few extensions to previous \*ms macros, +and a number of new macros, but all the documented \*ms macros +still work exactly as they did before, and have the same names as before. +Output produced with \*ms should look like output produced with \*mos. +.PP +One important new feature is automatically numbered footnotes. +Footnote numbers are printed by means of a pre-defined string +(\e\(**\(**), which you invoke separately from .FS and .FE. +Each time it is used, this string increases the footnote number by one, +whether or not you use .FS and .FE in your text. +Footnote numbers will be superscripted on the phototypesetter +and on daisy-wheel terminals, but on low-resolution devices +(such as the lpr and a crt), they will be bracketed. 
+If you use \e\(**\(** to indicate numbered footnotes, +then the .FS macro will automatically include +the footnote number at the bottom of the page. +This footnote, for example, was produced as follows:\** +.DS +This footnote, for example, was produced as follows:\e\(**\(** +\&.FS +.sp -.2 + ... +\&.FE +.DE +.FS +If you never use the ``\e\(**\(**'' string, +no footnote numbers will appear anywhere in the text, +including down here. +The output footnotes will look exactly like +footnotes produced with \*mos. +.FE +If you are using \e\(**\(** to number footnotes, +but want a particular footnote to be marked with an asterisk or a dagger, +then give that mark as the first argument to .FS: \(dg +.DS +then give that mark as the first argument to .FS: \e(dg +\&.FS \e(dg +.sp -.2 + ... +\&.FE +.DE +.FS \(dg +In the footnote, the dagger will appear where the footnote +number would otherwise appear, as on the left. +.FE +Footnote numbering will be temporarily suspended, +because the \e\(**\(** string is not used. +Instead of a dagger, you could use an asterisk * +or double dagger \(dd, represented as \|\e(dd. +.PP +Another new feature is a macro for printing theses +according to Berkeley standards. +This macro is called .TM, which stands for thesis mode. +(It is much like the .th macro in \*me.) +It will put page numbers in the upper right-hand corner; +number the first page; suppress the date; +and doublespace everything except quotes, displays, and keeps. +Use it at the top of each file making up your thesis. +Calling .TM defines the .CT macro for chapter titles, +which skips to a new page and moves the pagenumber to the center footer. +The .P1 (P one) macro can be used even without thesis mode +to print the header on page 1, +which is suppressed except in thesis mode. +If you want roman numeral page numbering, +use an ``.af\0PN\0i'' request. +.PP +There is a new macro especially for bibliography entries, +called .XP, which stands for exdented paragraph. 
+It will exdent the first line of the paragraph by \en(PI units, +usually 5n (the same as the indent for the first line of a .PP). +Most bibliographies are printed this way. +Here are some examples of exdented paragraphs: +.XP +Lumley, Lyle S., \fISex in Crustaceans: Shell Fish Habits,\fP\| +Harbinger Press, Tampa Bay and San Diego, October 1979. +243 pages. +The pioneering work in this field. +.XP +Leffadinger, Harry A., ``Mollusk Mating Season: 52 Weeks, or All Year?'' +in \fIActa Biologica,\fP\| vol. 42, no. 11, November 1980. +A provocative thesis, but the conclusions are wrong. +.LP +Of course, you will have to take care of +italicizing the book title and journal, +and quoting the title of the journal article. +Indentation or exdentation can be changed +by setting the value of number register PI. +.PP +If you need to produce endnotes rather than footnotes, +put the references in a file of their own. +This is similar to what you would do if you were +typing the paper on a conventional typewriter. +Note that you can use automatic footnote numbering +without actually having .FS and .FE pairs in your text. +If you place footnotes in a separate file, +you can use .IP macros with \e\(**\(**\| as a hanging tag; +this will give you numbers at the left-hand margin. +With some styles of endnotes, +you would want to use .PP rather than .IP macros, +and specify \e\(**\(** before the reference begins. +.PP +There are four new macros to help produce a table of contents. +Table of contents entries must be enclosed in .XS and .XE pairs, +with optional .XA macros for additional entries; +arguments to .XS and .XA specify the page number, +to be printed at the right. +A final .PX macro prints out the table of contents.
+Here is a sample of typical input and output text: +.DS +\&.XS ii +Introduction +\&.XA 1 +Chapter 1: Review of the Literature +\&.XA 23 +Chapter 2: Experimental Evidence +\&.XE +\&.PX +.sp .5 +.lt 5.5i +.tl ''\fBTable of Contents\fP'' +.ta 5i 5.5iR +.sp +Introduction ii\| +Chapter 1: Review of the Literature 1 +Chapter 2: Experimental Evidence 23 +.sp .5 +.DE +The .XS and .XE pairs may also be used in the text, +after a section header for instance, +in which case page numbers are supplied automatically. +However, most documents that require a table of contents +are too long to produce in one run, +which is necessary if this method is to work. +It is recommended that you do a table of contents +after finishing your document. +To print out the table of contents, use the .PX macro; +if you forget it, nothing will happen. +.PP +As an aid in producing text that will format correctly +with both \fBnroff\fP and \fBtroff\fP, +there are some new string definitions that define quotation marks +and dashes for each of these two formatting programs. +The \e\(**\^\u_\d string will yield two hyphens in \fBnroff\fP, +but in \fBtroff\fP it will produce an em dash\*- +like this one. +The \e\(**Q and \e\(**U strings will produce +`` and '' in \fBtroff\fP, but " in \fBnroff\fP. +(In typesetting, the double quote is traditionally considered bad form.) +.PP +There are now a large number of optional +foreign accent marks defined by the \*ms macros. +All the accent marks available in \*mos are present, +and they all work just as they always did. +However, there are better definitions available +by placing .AM at the beginning of your document. +Unlike the \*mos accent marks, +the accent strings should come \fIafter\fP\| the letter being accented. +Here is a list of the diacritical marks, +with examples of what they look like. 
+.DS +.ta 2i 3i +name of accent input output +\l'3.5i' +acute accent e\e\(**\' e\*' +grave accent e\e\(**\` e\*` +circumflex o\e\(**\d^\u o\*^ +cedilla c\e\(**, c\*, +tilde n\e\(**\d~\u n\*~ +question \e\(**? \*? +exclamation \e\(**! \*! +umlaut u\e\(**: u\*: +digraph s \e\(**8 \*8 +hac\*vek c\e\(**v c\*v +macron a\e\(**_ a\*_ +underdot s\e\(**. s\*. +o-slash o\e\(**/ o\*/ +angstrom a\e\(**o a\*o +yogh kni\e\(**3t kni\*3t +Thorn \e\(**(Th \*(Th +thorn \e\(**(th \*(th +Eth \e\(**(D- \*(D- +eth \e\(**(d- \*(d- +hooked o \e\(**q \*q +ae ligature \e\(**(ae \*(ae +AE ligature \e\(**(Ae \*(Ae +oe ligature \e\(**(oe \*(oe +OE ligature \e\(**(Oe \*(Oe +.DE +If you want to use these new diacritical marks, +don't forget the .AM at the top of your file. +Without it, some will not print at all, +and others will be placed on the wrong letter. +.PP +It is also possible to produce custom headers and footers +that are different on even and odd pages. +The .OH and .EH macros define odd and even headers, +while .OF and .EF define odd and even footers. +Arguments to these four macros are specified as with .tl. +This document was produced with: +.DS +\&.OH \'\ef\^IThe -mx Macros\'\'Page %\ef\^P\' +\&.EH \'\ef\^IPage %\'\'The -mx Macros\ef\^P\' +.DE +Note that it would be an error to have an apostrophe in the header text; +if you need one, you will have to use a different delimiter +around the left, center, and right portions of the title. +You can use any character as a delimiter, provided it doesn't appear +elsewhere in the argument to .OH, .EH, .OF, or .EF. +.PP +The \*ms macros work in conjunction with +the \fBtbl\fR, \fBeqn\fR, and \fBrefer\fR preprocessors. +Macros to deal with these items are read in only as needed, +as are the thesis macros (.TM), +the special accent mark definitions (.AM), +table of contents macros (.XS and .XE), +and macros to format the optional cover page.
+The code for the \*ms package lives in /usr/lib/tmac/tmac.s, +and sourced files reside in the directory /usr/ucb/lib/ms. +.sp diff --git a/share/doc/usd/19.memacros/Makefile b/share/doc/usd/19.memacros/Makefile new file mode 100644 index 0000000..4966e36 --- /dev/null +++ b/share/doc/usd/19.memacros/Makefile @@ -0,0 +1,18 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= usd/19.memacros +SRCS= meintro.me-sed +MACROS= -me +GROFFDIR= ${.CURDIR}/../../../../contrib/groff +SRCDIR= ${GROFFDIR}/doc + +version=`cat ${GROFFDIR}/VERSION` +revision=`sed -e 's/^0$$//' -e 's/^[1-9].*$$/.&/' ${GROFFDIR}/REVISION` + +meintro.me-sed: meintro.me + sed -e "s;@VERSION@;$(version)$(revision);" ${.ALLSRC} > ${.TARGET} + +CLEANFILES= ${SRCS} + +.include <bsd.doc.mk> diff --git a/share/doc/usd/20.meref/Makefile b/share/doc/usd/20.meref/Makefile new file mode 100644 index 0000000..af30e9b --- /dev/null +++ b/share/doc/usd/20.meref/Makefile @@ -0,0 +1,18 @@ +# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 +# $FreeBSD$ + +VOLUME= usd/20.meref +SRCS= meref.me-sed +MACROS= -me +GROFFDIR= ${.CURDIR}/../../../../contrib/groff +SRCDIR= ${GROFFDIR}/doc + +version=`cat ${GROFFDIR}/VERSION` +revision=`sed -e 's/^0$$//' -e 's/^[1-9].*$$/.&/' ${GROFFDIR}/REVISION` + +meref.me-sed: meref.me + sed -e "s;@VERSION@;$(version)$(revision);" ${.ALLSRC} > ${.TARGET} + +CLEANFILES= ${SRCS} + +.include <bsd.doc.mk> diff --git a/share/doc/usd/21.troff/Makefile b/share/doc/usd/21.troff/Makefile new file mode 100644 index 0000000..92ed429 --- /dev/null +++ b/share/doc/usd/21.troff/Makefile @@ -0,0 +1,8 @@ +# @(#)Makefile 8.1 (Berkeley) 8/14/93 +# $FreeBSD$ + +VOLUME= usd/21.troff +SRCS= m.mac m0 m0a m1 m2 m3 m4 m5 table1 table2 +USE_TBL= + +.include <bsd.doc.mk> diff --git a/share/doc/usd/21.troff/m.mac b/share/doc/usd/21.troff/m.mac new file mode 100644 index 0000000..6e68c4e --- /dev/null +++ b/share/doc/usd/21.troff/m.mac @@ -0,0 +1,288 @@ +.\" Copyright (C) Caldera International Inc. 
2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+.\" +.\" @(#)m.mac 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.if \n(mo=1 .ds mo January +.if \n(mo=2 .ds mo February +.if \n(mo=3 .ds mo March +.if \n(mo=4 .ds mo April +.if \n(mo=5 .ds mo May +.if \n(mo=6 .ds mo June +.if \n(mo=7 .ds mo July +.if \n(mo=8 .ds mo August +.if \n(mo=9 .ds mo September +.if \n(mo=10 .ds mo October +.if \n(mo=11 .ds mo November +.if \n(mo=12 .ds mo December +.if \n(dw=1 .ds dw Sunday +.if \n(dw=2 .ds dw Monday +.if \n(dw=3 .ds dw Tuesday +.if \n(dw=4 .ds dw Wednesday +.if \n(dw=5 .ds dw Thursday +.if \n(dw=6 .ds dw Friday +.if \n(dw=7 .ds dw Saturday +.\" +.bd S B 3 +.ds NR "\s-1NROFF\s+1 +.ds TR "\s-1TROFF\s+1 +.ds Nr "N\s-2ROFF\s+2 +.ds Tr "T\s-2ROFF\s+2 +.nr PS 10 +.hy 14 +.ds u \v'-0.3m'\s-2 +.ds d \s0\v'0.3m' +.nr 2C 0 +.ds H +.nr a .8i +.nr b 1.6i +.nr c 2.4i +.nr d 2.9i +.nr e 0.25i +.nr p 0 1 +.nr s 0 1 +.af p 1 +.af s 1 +.nr m -1i +.nr x 0 1 +.nr y 0+\nmu +.ev 1 +.ps \n(PS-2 +.vs \n(PS +.ll 6.5i +'in 0 +.ev +.tr &. +.de xx +.sp 0.4 +.. +.de ht +.tl `\*(Nr/\*(Tr User's Manual``USD:21-%` +.\" .tl 'updated to May 15, 1977'''\".tl 'Version \n(mo/\n(dy/\n(yr''' +.. +.de he +.tl `USD:21-%``\*(Nr/\*(Tr User's Manual` +.\" .tl 'updated to May 15, 1977'''\".tl 'Version \n(mo/\n(dy/\n(yr''' +.. +.de hd +.\".tl '\(rn''' +.if \\n%>1 \{'sp |.30i +.if e .he +.if o .ht +.ps \\n(S2 +.ps \\n(S1 +.ft +'sp |.9i\} +.nr x 0 1 +.nr y 0+\\nmu +.ch fo \\nmu +.if \\n(dn .fz +.ns +.if dmx .mx +.nr cl 0 1 +.mk +.. +.de fz +.fn +.nf +.fy +.fi +.ef +.. +.de fx +.if \\nx .di fy +.. +.de fo +.if dcx .cx +.nr dn 0 +.if \\nx .xf +.nr x 0 \"disable fx +.ie \\n(2C&(\\n+(cl<2) \{\ +.po +3.4i +.rt +.nr y 0+\\nmu +.ch fo \\nmu +.if \\n(dn .fz +.ns \} +.el \{\ +.po 26i/27u +.nr S1 \\n(.s +.ps +.nr S2 \\n(.s +.ps 10 +'bp +.\} +.. +.de 2C +.br +.mk +.nr 2C 1 +.ll 3.1i +.ev 1 +.ll 3.1i +.ev +.. +.de 1C +.br +.nr 2C 0 +.ll 6.5i +.ev 1 +.ll 6.5i +.ev +.. +.de co +.de cx +.br +\fI(Continued next page.)\fP +.br +.rm cx +\\.. +.. 
+.de pp +'ps \\n(PS +.ft R +.\"'tl ''- % -'' +'bp +.. +.wh 0 hd +.wh 12i fo +.wh \nmu fx +.ch fo \nmu +.de fn +.da FN +.ev 1 +.if \\n+x=1 .fs +.fi +.ti 0 +.. +.de xf +.ev 1 +.nf +.FN +.rm FN +.di +.ev +.. +.de fs +.ti 0 +\l'1i' +.br +.. +.de ef +.br +.ev +.di +.nr y -\\n(dn +.if \\nx=1 .nr y -2p +.ch fo \\nyu +.if \\n(nl+\\n(.v-\\n(.p-\\ny .ch fo \\n(nlu+\\n(.vu +.. +.wh -.6i pp +.de h1 +.xx +.ne 5 +.nf +.ta \\nau \\nbu \\ncu \\ndu +\\neu +.ft I +.bd I 3 +Request Initial If No +Form Value\\$2 Argument Notes\\$1 Explanation +.bd I +.ft R +.ft +.fi +.in \\ndu +.. +.de bt +.ft R +.xx +.ne 1.1 +.ti 0 +.. +.de b1 +.br +.ti 0 +.. +.de pg +.ft R +.fi +.in 0 +.xx +.ne 1.1 +.. +.de sc +.pg +\fI\\*H\\np.\\n+s.\|\\c +.ft R +.ul +.. +.de mh +.nr s 0 +.in 0 +.xx +.ne 2.5 +.ft B +\\*H\\n+p. +.. +.de x1 +.xx +.in .5i +.nf +.. +.de x2 +.xx +.in 0 +.fi +.. +.de EM +.br +\&\c +.pl 2i +.. +.em EM +.de TS +.sp +.. +.de TE +.sp +.ce 0 +.ft R +.ps \n(PS +.ta \\nau \\nbu \\ncu \\ndu +\\neu +.. +.de T& +.. diff --git a/share/doc/usd/21.troff/m0 b/share/doc/usd/21.troff/m0 new file mode 100644 index 0000000..d91c08f --- /dev/null +++ b/share/doc/usd/21.troff/m0 @@ -0,0 +1,290 @@ +.\" Hey, Emacs, edit this file in -*- nroff-fill -*- mode! +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. 
+.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)m0 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.br +.rs +.sp |1.0i +.ce 1000 +.ps 12 +.ft B +\*(Nr\(sl\*(Tr User's Manual +.sp .2i +.ft I +.ps 10 +Joseph F. Ossanna +(updated for 4.3BSD by Mark Seiden) +.ft R +.sp +Bell Laboratories +Murray Hill, New Jersey 07974 +.ce 0 +.sp 2 +.ps \n(PS +.fi +.ft B +.ps +1 +NOTE: This document in its current form describes the \fItroff\fP\| program +supplied with 4.4BSD. The \fIgroff\fP\| program supplied with FreeBSD has a +number of additional features and a couple of small incompatibilities. See +\fIgroff(1)\fP\| for more details. 
+.ps +.sp 1 +Introduction +.pg +\*(NR and \*(TR are text processors under +the \s-1UNIX\s+1 Time-Sharing System +that format text for typewriter-like terminals and +for a \%Graphic Systems phototypesetter, respectively. +(Device-independent \*(TR, part of the Documenter's Workbench, +supports additional output devices.) +They accept lines of text interspersed with lines of +format control information and +format the text into a printable, paginated document +having a user-designed style. +\*(NR and \*(TR offer +unusual freedom in document styling, +including: +arbitrary style headers and footers; +arbitrary style footnotes; +multiple automatic sequence numbering for paragraphs, sections, etc; +multiple column output; +dynamic font and point-size control; +arbitrary horizontal and vertical local motions at any point; +and +a family of automatic overstriking, bracket construction, and +line drawing functions. +.pg +\*(NR and \*(TR are highly compatible with each other and it is almost always +possible to prepare input acceptable to both. +Conditional input is provided that enables +the user to embed input expressly destined for either program. +\*(NR can prepare output directly for a variety of terminal types and +is capable of utilizing the full resolution of each terminal. +.pg +.ft B +Usage +.pg +The general form of invoking \*(NR (or \*(TR) at \s-1UNIX\s+1 command level is +.x1 +\fBnroff \fIoptions files\fR\ +\h'|2i'(or \fBtroff \fIoptions files\fR) +.x2 +where \fIoptions\fR represents any of a number of option arguments +and \fIfiles\fR represents the list of files containing the document +to be formatted. +An argument consisting of a single minus (\fB\-\fR) is taken to be +a file name corresponding to the standard input. +If no file names are given input is taken from the standard input. 
+The options, which may appear in any order so long as they appear +before the files, are: +.sp +.ta .2i 1.0i +.ft I +.bd I 3 + Option Effect +.br +.bd I +.ft R +.ta .3i 1.0i +.in 1.0i +.ll -.3i +.bt + \fB\-i\fP Read standard input after the input files are exhausted. +.bt + \fB\-m\fIname\fR Prepends the macro file +\fB\(slusr\(sllib\(sltmac.\fIname\fR +to the input \fIfiles\fR. +.bt + \fB\-n\fIN\fR Number first generated page \fIN\fR. +.bt + \fB\-o\fIlist\fR \ +Print only pages whose page numbers appear in \fIlist\fR, +which consists of comma-separated numbers and number ranges. +A number range has the form \fIN\-M\fR +and means pages \fIN\fR through \fIM;\fR +an initial \fI\-N\fR means +from the beginning to page \fIN;\fR and a final \fIN\-\fR means +from \fIN\fR to the end. +.bt + \fB\-q\fR \ +Invoke the simultaneous input-output mode of the \fBrd\fR request. +.bt + \fB\-r\fIaN\fR Number register \fIa\fR (one-character) is set to \fIN\fR. +.bt + \fB\-s\fIN\fR Stop every \fIN\fR pages. +\*(NR will halt prior to every \fIN\fR pages (default \fIN\fR=1) +to allow paper loading or +changing, and will resume upon receipt of a newline. +\*(TR will stop the phototypesetter every \fIN\fR pages, +produce a trailer to allow changing cassettes, +and will resume after the phototypesetter \s-1START\s+1 button is pressed. +.bt + \fB\-z\fR Efficiently suppress formatted output. +Only produce output to standard error (from \fBtm\fP requests or +diagnostics). +.sp +.ne 5 +.ft I +.bd I 3 + \*(NR Only +.br +.bd I +.ft +.bt + \fB\-T\fIname\fR Specifies +the name of the output terminal type. +Currently defined names are \fB37\fR for the (default) Model 37 Teletype\(rg, +\fBtn300\fR for the GE TermiNet\ 300 (or any terminal without half-line +capabilities), +\fB300S\fR for the \s-1DASI\s+1-300S, +\fB300\fR for the \s-1DASI\s+1-300, +and +\fB450\fR for the \s-1DASI\s+1-450 (Diablo Hyterm).
+.bt + \fB\-e\fR \ +Produce equally-spaced words in adjusted +lines, using full terminal resolution. +.bt + \fB\-h\fR \ +On output, use tabs during horizontal spacing to increase speed. +Device tab settings are assumed to be (and input tabs are initially +set to) every 8 character widths. +.sp +.ne 3 +.ft I +.bd I 3 + \*(TR Only +.br +.bd I +.ft +.bt + \fB\-a\fP Send a printable \s-1(ASCII)\s+1 approximation +of the results to the standard output. +.bt + \fB\-b\fR \*(TR will report whether the phototypesetter +is busy or available. +No text processing is done. +.bt + \fB\-f\fP Refrain from feeding out paper and stopping +phototypesetter at the end of the run. +.bt + \fB\-t\fP Direct output to the standard output instead +of the phototypesetter. +.bt + \fB\-w\fP Wait until phototypesetter is available, if +currently busy. +.ll +.in 0 +.xx +.pg +Each option is invoked as a separate argument; +for example, +.x1 +\fBnroff \-o\fI4,8\-10 \fB\-T\fI300S \fB\-m\fIabc file1 file2\fR +.x2 +requests formatting of pages 4, 8, 9, and 10 of a document contained in the files +named \fIfile1\fR and \fIfile2\fR, +specifies the output terminal as a \s-1DASI\s+1-300S, +and invokes the macro package \fIabc\fR. +.pg +Various pre- and post-processors are available for use with \*(NR and \*(TR. +These include the equation preprocessors \s-1NEQN\s+1 and \s-1EQN\s+1\*u1\*d +(for \*(NR and \*(TR respectively), +and the table-construction preprocessor \s-1TBL\s+1\*u2\*d. +A reverse-line postprocessor \s-1COL\s+1\*u3\*d +is available for multiple-column \*(NR output on terminals without reverse-line ability; +\s-1COL\s+1 expects the Model 37 Teletype +escape sequences that \*(NR produces by default. +\s-1TK\s+1\*u3\*d +is a 37 Teletype simulator postprocessor for printing \*(NR output on a Tektronix 4014. +\s-1TC\s+1\*u5\*d +is a phototypesetter-simulator postprocessor +for \*(TR that produces an approximation of phototypesetter output +on a Tektronix 4014.
+For example, in +.x1 +\fBtbl \fIfiles \fB| eqn | troff \-t \fIoptions \fB| tc\fR +.x2 +the first \|\fB|\fR\| indicates the piping of \s-1TBL\s+1's output to \s-1EQN\s+1's input; +the second the piping of \s-1EQN\s+1's output to \*(TR's input; +and the third indicates the piping of \*(TR's output to \s-1TC\s+1. +.br +.pg +The remainder of this manual consists of: +a Summary and outline; +a Reference Manual keyed to the outline; +and +a set of Tutorial Examples. +Another tutorial is [4]. +.sp .4 +.ps -1 +.vs -1p +.pg +.ft B +References +.pg +.ta .3i +.in .3i +.ti 0 +[1] B. W. Kernighan, L. L. Cherry, +.ul +Typesetting Mathematics \(em User's Guide (Second Edition), +Bell Laboratories. +.sp .4 +.ti 0 +[2] M. E. Lesk, +.ul +Tbl \(em A Program to Format Tables, +Bell Laboratories internal memorandum. +.sp .4 +.ti 0 +[3] Internal on-line documentation (\fIman\fP pages) on \s-1UNIX\s+1. +.sp .4 +.ti 0 +[4] B. W. Kernighan, \fIA TROFF Tutorial\fR, +Bell Laboratories. +.sp .4 +.ti 0 +[5] Your site may have similar programs for more modern displays. +.in 0 +.ps 10 +.vs 12 +.ft R +.bp diff --git a/share/doc/usd/21.troff/m0a b/share/doc/usd/21.troff/m0a new file mode 100644 index 0000000..1a0be6c --- /dev/null +++ b/share/doc/usd/21.troff/m0a @@ -0,0 +1,607 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution.
+.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)m0a 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.br +.tr | +.ce +.ft B +SUMMARY OF REQUESTS AND OUTLINE OF THIS MANUAL +.ft R +.de mx +.ev 2 +.nf +.h1 +.in +.sp +.fi +.ev +.ns +.. +.xx +.h1 \s-1#\s+1 * +.fn +.sp .3 +*Values separated by "\fB;\fR" are for \*(NR and \*(TR respectively. +.sp .2 +\s-1#\s+1Notes are explained at the end of this Summary and Index +.ef +.mh +General Explanation +.mh +Font and Character Size Control +.bt +\fB&ps\fI\|\(+-N\fR 10\|point previous E Point size; also \fB\es\fI\(+-N\fR.\(dg +.b1 +\fB&fz\fI|F|\(+-N\fR off - E font \fIF\fR to point size \fI\(+-N\fR. +.b1 +\fB&fz|S|\fIF|\(+-N\fR off - E Special Font characters to point size \fI\(+-N\fR. 
+.b1 +\fB&ss\fI|N\fR 12\(sl36\|em ignored E Space-character size +set to \fIN\fR\(sl36\|em.\(dg +.b1 +\fB&cs\fI|F\|N\|M\fR off - P Constant character +space (width) +mode (font \fIF\^\fR\^).\(dg +.b1 +\fB&bd\fI|F|N\fR off - P Embolden font \fIF\fR by \fIN\fR\(mi1 units.\(dg +.b1 +\fB&bd|S|\fIF|N\fR off - P Embolden Special Font when current font is \fIF\fR.\(dg +.fn +.sp .2 +\(dgNo effect in \*(NR. +.ef +.b1 +\fB&ft\fI|F\fR Roman previous E Change to font +\fIF\fR|= \fIx\fR, \fIxx\fR, or 1-4. +Also \fB\ef\fIx\fR,\|\fB\ef(\fIxx\fR,\|\fB\ef\fIN\fR. +.b1 +\fB&fp\fI|N|F\fR R,I,B,S ignored - Font named \fIF\fR mounted on physical position 1\(<=\fIN\fR\(<=4. +.mh +Page Control +.bt +\fB&pl\fI|\(+-N\fR 11\|in 11\|in \fBv\fR Page length. +.b1 +\fB&bp|\fI\(+-N\fR \fIN\(eq\fR1 - B\(dd,\fBv\fR \ +Eject current page; next page number \fIN\fR. +.fn +.sp .2 +\(ddThe use of "\ \fB\'\fR\ " as control character (instead of "\fB.\fR") +suppresses the break function. +.ef +.b1 +\fB&pn\fI|\(+-N N\(eq\fR1 ignored - Next page number \fIN\fR. +.b1 +\fB&po\fI|\(+-N\fR 0;|26\(sl27\|in previous \fBv\fR Page offset. +.b1 +\fB&ne\fI|N\fR - \fIN\(eq\fR1\fIV\fR D,\fBv\fR Need \fIN\fR vertical space (\fIV\fR = vertical spacing). +.b1 +\fB&mk|\fIR\fR none internal D Mark current vertical place in register \fIR\fR. +.b1 +\fB&rt\fI|\(+-N\fR none internal D,\fBv\fR Return \fI(upward only)\fR to marked vertical place. +.mh +Text Filling, Adjusting, and Centering +.bt +\fB&br\fR - - B Break. +.b1 +.lg 0 +\fB&fi\fR \(fill - B,E Fill output lines. +.lg +.b1 +\fB&nf\fR fill - B,E No filling or adjusting of output lines. +.b1 +\fB&ad\fI|c\fR adj,both adjust E Adjust output lines with mode \fIc\fR. +.b1 +\fB&na\fR adjust - E No output line adjusting. +.b1 +\fB&ce\fI|N\fR off \fIN\(eq\fR1 B,E Center following \fIN\fR input text lines. +.mh +Vertical Spacing +.bt +\fB&vs\fI|N\fR 1\(sl6in;12pts previous E,\fBp\fR Vertical base line spacing (\fIV\fR\^). 
+.b1 +\fB&ls\fI|N N\(eq\fR1 previous E Output \fIN\(mi\fR1 \fIV\^\fRs after each text output line. +.b1 +\fB&sp\fI|N\fR - \fIN\(eq\fR1\fIV\fR B,\fBv\fR Space \ +vertical distance \fIN\fR \fIin either direction\fR. +.b1 +\fB&sv\fI|N\fR - \fIN\(eq\fR1\fIV\fR \fBv\fR Save vertical distance \fIN\fR. +.b1 +\fB&os\fR - - - Output saved vertical distance. +.b1 +\fB&ns\fR space - D Turn no-space mode on. +.b1 +\fB&rs\fR - - D Restore spacing; turn no-space mode off. +.mh +Line Length and Indenting +.bt +\fB&ll\fI|\(+-N\fR 6.5\|in previous E,\fBm\fR Line length. +.b1 +\fB&in\fI|\(+-N\fR \fIN\(eq\fR\^0 previous B,E,\fBm\fR Indent. +.b1 +\fB&ti\fI|\(+-N\fR - ignored B,E,\fBm\fR Temporary indent. +.mh +Macros, Strings, Diversion, and Position Traps +.bt +\fB&de\fI|xx|yy\fR - \fI.yy=\fB..\fR - Define or redefine macro \fIxx;\fR end at call of \fIyy\fR. +.b1 +\fB&am\fI|xx|yy\fR - \fI.yy=\fB..\fR - Append to a macro. +.b1 +\fB&ds\fI|xx|string\fR - ignored - Define a string \fIxx\fR containing \fIstring\fR. +.b1 +\fB&as\fI|xx|string\fR - ignored - Append \fIstring\fR to string \fIxx\fR. +.b1 +\fB&rm\fI|xx\fR - ignored - Remove request, macro, or string. +.b1 +\fB&rn\fI|xx|yy\fR - ignored - Rename request, macro, or string \fIxx\fR to \fIyy\fR. +.b1 +\fB&di\fI|xx\fR - end D Divert output to macro \fIxx\fR. +.b1 +\fB&da\fI|xx\fR - end D Divert and append to \fIxx\fR. +.b1 +\fB&wh\fI|N|xx\fR - - \fBv\fR Set location trap; negative is w.r.t. page bottom. +.b1 +\fB&ch\fI|xx|N\fR - - \fBv\fR Change trap location. +.b1 +\fB&dt\fI|N|xx\fR - off D,\fBv\fR Set a diversion trap. +.b1 +\fB&it\fI|N|xx\fR - off E Set an input-line count trap. +.b1 +\fB&em\fI|xx\fR none none - End macro is \fIxx\fR. +.mh +Number Registers +.bt +\fB&nr\fI|R|\(+-N|M\fR - - \fBu\fR Define and set number register \fIR\fR; auto-increment by \fIM\fR. +.b1 +\fB&af\fI|R|c\fR arabic - - Assign format to register \fIR\fR (\fIc=\fB1\fR, \fBi\fR, \fBI\fR, \fBa\fR, \fBA\fR).
+.b1 +\fB&rr\fI|R\fR - - - Remove register \fIR\fR. +.mh +Tabs, Leaders, and Fields +.bt +\fB&ta\fI|Nt|...\fR 0.8;|0.5in none E,\fBm\fR Tab settings; \fIleft\fR type, unless \fIt=\fBR\fR(right), \fBC\fR(centered). +.b1 +\fB&tc\fI|c\fR none none E Tab repetition character. +.b1 +\fB&lc\fI|c\fR \fB.\fR none E Leader repetition character. +.b1 +\fB&fc\fI|a|b\fR off off - Set field delimiter \fIa\fR and pad character \fIb\fR. +.mh +Input and Output Conventions and Character Translations +.bt +\fB&ec\fI|c\fR \e \e - Set escape character. +.b1 +\fB&eo\fR on - - Turn off escape character mechanism. +.b1 +\fB&lg\fI|N\fR -;\|on on - Ligature mode +on if \fIN\fR>0. +.b1 +\fB&ul\fI|N\fR off \fIN\(eq\fR1 E Underline (italicize in \*(TR) \fIN\fR input lines. +.b1 +\fB&cu\fI|N\fR off \fIN\(eq\fR1 E Continuous underline in \*(NR; like \fBul\fR in \*(TR. +.b1 +\fB&uf\fI|F\fR Italic Italic - Underline font set to \fIF\fR (to be switched to by \fBul\fR). +.b1 +\fB&cc\fI|c\fR \fB. .\fR E Set control character to \fIc\fR. +.b1 +\fB&c2\fI|c\fR \fB\' \'\fR E Set nobreak control character to \fIc\fR. +.b1 +\fB&tr\fI|abcd....\fR none - O Translate \fIa\fR to \fIb\fR, etc. on output. +.mh +Local Horizontal and Vertical Motions, and the Width Function +.mh +Overstrike, Bracket, Line-drawing, and Zero-width Functions +.mh +Hyphenation. +.bt +\fB&nh\fR hyphenate - E No hyphenation. +.b1 +\fB&hy\fI|N\fR hyphenate hyphenate E Hyphenate; \fIN =\fR mode. +.b1 +\fB&hc\fI|c\fR \fB\e% \e%\fR E Hyphenation indicator character \fIc\fR. +.b1 +\fB&hw\fI|word1|...\fR ignored - Exception words. +.mh +Three Part Titles. +.bt +\fB&tl\fI|\'left\|\'center\|\'right\|\'\fR - - Three part title. +.b1 +\fB&pc\fI|c\fR \fB%\fR off - Page number character. +.b1 +\fB<\fI|\(+-N\fR 6.5\|in previous E,\fBm\fR Length of title. +.mh +Output Line Numbering. +.bt +\fB&nm\fI|\(+-N|M|S|I\fR off E Number mode on or off, set parameters. +.b1 +\fB&nn\fI|N\fR - \fIN\(eq\fR1 E Do not number next \fIN\fR lines. 
+.mh +Conditional Acceptance of Input +.bt +\fB&if\fI|c|anything\fR - - If condition \fIc\fR true, accept \fIanything\fR as input, +.b1 + for multi-line use \fI\e{anything\|\e}\fR. +.b1 +\fB&if|!\fIc|anything\fR - - If condition \fIc\fR false, accept \fIanything\fR. +.b1 +\fB&if\fI|N|anything\fR - \fBu\fR If expression \fIN\fR > 0, accept \fIanything\fR. +.b1 +\fB&if|!\fIN|anything\fR - \fBu\fR If expression \fIN\fR \(<= 0, accept \fIanything\fR. +.b1 +\fB&if\fI|\|\'string1\|\'string2\|\'|anything\fR - If \fIstring1\fR identical to \fIstring2\fR, +accept \fIanything\fR. +.b1 +\fB&if|!\fI\|\'string1\|\'string2\|\'|anything\fR - If \fIstring1\fR not identical to \fIstring2\fR, +accept \fIanything\fR. +.b1 +\fB&ie\fI|c|anything\fR - \fBu\fR If portion of if-else; all above forms (like \fBif\fR). +.b1 +\fB&el\fI|anything\fR - - Else portion of if-else. +.mh +Environment Switching. +.bt +\fB&ev\fI|N\fR \fIN\(eq\fR0 previous - Environment switched (\fIpush down\fR). +.mh +Insertions from the Standard Input +.bt +\fB&rd\fI|prompt\fR\fR - \fIprompt=\s-1\fRBEL\s+1 Read insertion. +.b1 +\fB&ex\fR - - - \ +Exit from \*(NR\(sl\*(TR. +.mh +Input\(slOutput File Switching +.bt +\fB&so\fI|filename\fR - - Switch source file \fI(push down)\fR. +.b1 +\fB&nx\fI|filename\fR end-of-file - Next file. +.b1 +\fB&pi\fI|program\fR - - Pipe output to \fIprogram\fR (\*(NR only). +.mh +Miscellaneous +.bt +\fB&mc\fI|c|N\fR - off E,\fBm\fR Set margin character \fIc\fR and separation \fIN\fR. +.b1 +\fB&tm\fI|string\fR - newline - Print \fIstring\fR on terminal \ +(\s-1UNIX\s+1 standard error output). +.b1 +\fB&ig\fI|yy\fR - \fI.yy=\fB..\fR - Ignore till call of \fIyy\fR. +.b1 +\fB&pm\fI|t\fR - all - Print macro names and sizes; +.b1 + if \fIt\fR present, print only total of sizes. +.b1 +\fB&ab\fI|string\fR - - - Print a message and abort. +.b1 +.lg 0 +\fB&fl\fR - - B Flush output buffer. 
+.lg +.mh +Output and Error Messages +.xx +.nf +.rm mx +.ft R +\l'\n(.lu' +.ft B +.xx +.ta .3iC .6i + Notes- +.xx +.ft R + B Request normally causes a break. + D Mode or relevant parameters associated with current diversion level. + E Relevant parameters are a part of the current environment. + O Must stay in effect until logical output. + P Mode must be still or again in effect at the time of physical output. + \fBv\fR,\fBp\fR,\fBm\fR,\fBu\fR Default scale indicator; if not specified, scale indicators are \fIignored\fR. +.br +.nr zz 11 +.de cl +.ie \\n+(cl<\n(zz \{\ +. po +\\n(.lu/\n(zzu +. rt +.\} +.el \{\ +.po 26i/27u +.\} +.. +.nr cl 0 1 +.di zz +.ta .3iR +.nf +.ps 8 +.vs 10 +ab 20 +ad 4 +af 8 +am 7 +as 7 +bd 2 +bp 3 +br 4 +c2 10 +cc 10 +ce 4 +ch 7 +cs 2 +cu 10 +da 7 +de 7 +di 7 +ds 7 +dt 7 +ec 10 +el 16 +em 7 +eo 10 +ev 17 +ex 18 +fc 9 +fi 4 +fl 20 +fp 2 +ft 2 +fz 2 +hc 13 +hw 13 +hy 13 +ie 16 +if 16 +ig 20 +in 6 +it 7 +lc 9 +lg 10 +li 10 +ll 6 +ls 5 +lt 14 +mc 20 +mk 3 +na 4 +ne 3 +nf 4 +nh 13 +nm 15 +nn 15 +nr 8 +ns 5 +nx 19 +os 5 +pc 14 +pi 19 +pl 3 +pm 20 +pn 3 +po 3 +ps 2 +rd 18 +rm 7 +rn 7 +rr 8 +rs 5 +rt 3 +so 19 +sp 5 +ss 2 +sv 5 +ta 9 +tc 9 +ti 6 +tl 14 +tm 20 +tr 10 +uf 10 +ul 10 +vs 5 +wh 7 +.di +.nr aa \n(dn/\n(zz +.ne \n(aau+10p +.sp +.ft B +Alphabetical Request and Section Number Cross Reference +.ft +.sp .3 +.wh \n(nlu+\n(aau cl +.nr qq \n(nlu+\n(aau +.ps +.vs +.mk +.zz +.rt +.sp \n(.tu +.ch cl 12i +.sp +.bp +.nf +.ft B +Escape Sequences for Characters, Indicators, and Functions +.ft R +.xx +.TS +c2l +c2l2l +n2l2l. +.ft I +.bd I 3 +Section Escape +Reference Sequence Meaning +.ft R +.bd I +.xx +10.1 \fB\e\e\fR \e (to prevent or delay the interpretation of \e\|) +10.1 \fB\ee\fR Printable version of the \fIcurrent\fR escape character. 
+2.1 \fB\e\'\fR \' (acute accent); equivalent to \fB\e(aa\fR +2.1 \fB\e\`\fR \` (grave accent); equivalent to \fB\e(ga\fR +2.1 \fB\e\-\fR \- Minus sign in the \fIcurrent\fR font +7 \fB\e\^.\fR Period (dot) (see \fBde\fR) +11.1 \fB\e\fR(space) Unpaddable space-size space character +11.1 \fB\e0\fR Digit width space +.tr || +11.1 \fB\e\||\fR 1\(sl6\|em narrow space character (zero width in \*(NR) +.tr | +11.1 \fB\e^\fR 1\(sl12\|em half-narrow space character (zero width in \*(NR) +.tr && +4.1 \fB\e&\fR Non-printing, zero width character +.tr &. +10.6 \fB\e!\fR Transparent line indicator +10.7 \fB\e"\fR Beginning of comment +7.3 \fB\e$\fIN\fR Interpolate argument 1\(<=\fIN\fR\(<=9 +13 \fB\e%\fR Default optional hyphenation character +2.1 \fB\e(\fIxx\fR Character named \fIxx\fR +7.1 \fB\e\(**\fIx\fR,|\fB\e\(**(\fIxx\fR Interpolate string \fIx\fR or \fIxx\fR +9.1 \fB\ea\fR Non-interpreted leader character +12.3 \fB\eb\fI\'abc...\|\'\fR Bracket building function +4.2 \fB\ec\fR Interrupt text processing +11.1 \fB\ed\fR Forward (down) 1\(sl2\|em vertical motion (1\(sl2 line in \*(NR) +2.2 \fB\ef\fIx\fR,\fB\ef(\fIxx\fR,\fB\ef\fIN\fR Change to font named \fIx\fR or \fIxx\fR, or position \fIN\fR +11.1 \fB\eh\fI\'N|\'\fR Local horizontal motion; move right \fIN\fR \fI(negative left)\fR +11.3 \fB\ek\fIx\fR Mark horizontal \fIinput\fR place in register \fIx\fR +12.4 \fB\el\fI\|\'Nc\|\'\fR Horizontal line drawing function (optionally with \fIc\fR\|) +12.4 \fB\eL\fI\'Nc\|\'\fR Vertical line drawing function (optionally with \fIc\fR\|) +8 \fB\en\fIx\fR,\fB\en(\fIxx\fR Interpolate number register \fIx\fR or \fIxx\fR +12.1 \fB\eo\fI\'abc...\|\'\fR Overstrike characters \fIa, b, c, ...\fR +4.1 \fB\ep\fR Break and spread output line +11.1 \fB\er\fR Reverse 1\|em vertical motion (reverse line in \*(NR) +2.3 \fB\es\fIN\fR,\|\fB\es\fI\(+-N\fR Point-size change function +9.1 \fB\et\fR Non-interpreted horizontal tab +11.1 \fB\eu\fR Reverse (up) 1\(sl2\|em vertical motion (1\(sl2 line in 
\*(NR) +11.1 \fB\ev\fI\'N\|\|\'\fR Local vertical motion; move down \fIN\fR \fI(negative up)\fR +11.2 \fB\ew\fI\'string\|\'\fR Interpolate width of \fIstring\fR +5.2 \fB\ex\fI\'N\|\|\'\fR Extra line-space function \fI(negative before, positive after)\fR +12.2 \fB\ez\fIc\fR Print \fIc\fR with zero width (without spacing) +16 \fB\e{\fR Begin conditional input +16 \fB\e}\fR End conditional input +10.7 \fB\e\fR(newline) Concealed (ignored) newline +- \fB\e\fIX\fR \fIX\fR, any character \fInot\fR listed above +.TE +.fi +.sp +The escape sequences +\fB\e\e\fR, +\fB\e\^.\fR, +\fB\e"\fR, +\fB\e$\fR, +\fB\e\(**\fR, +\fB\ea\fR, +\fB\en\fR, +\fB\et\fR, +and +\fB\e\fR(newline) are interpreted in \fIcopy mode\fR (\(sc7.2). +.bp +.ft B +.nf +Predefined General Number Registers +.ft +.TS +c2l +c2l2l +n2l2l. +.ft I +.bd I 3 +Section Register +Reference Name Description +.ft R +.bd I +.xx +3 \fB%\fR Current page number. +19 \fBc&\fR Number of \fIlines\fR read from current input file. +11.2 \fBct\fR Character type (set by \fIwidth\fR function). +7.4 \fBdl\fR Width (maximum) of last completed diversion. +7.4 \fBdn\fR Height (vertical size) of last completed diversion. +- \fBdw\fR Current day of the week (1-7). +- \fBdy\fR Current day of the month (1-31). +11.3 \fBhp\fR Current horizontal place on \fIinput\fR line (not in ditroff) +15 \fBln\fR Output line number. +- \fBmo\fR Current month (1-12). +4.1 \fBnl\fR Vertical position of last printed text base-line. +11.2 \fBsb\fR Depth of string below base line (generated by \fIwidth\fR function). +11.2 \fBst\fR Height of string above base line (generated by \fIwidth\fR function). +- \fByr\fR Last two digits of current year. +.TE +.sp +.ft B +Predefined Read-Only Number Registers +.ft R +.TS +c2l +c2l2l +n2l2l. +.ft I +.bd I 3 +Section Register +Reference Name Description +.ft R +.bd I +.xx +7.3 \fB&$\fR Number of arguments available at the current macro level. +- \fB&A\fR Set to 1 in \*(TR, if \fB\-a\fR option used; always 1 in \*(NR. 
+11.1 \fB&H\fR Available horizontal resolution in basic units. +5.3 \fB&L\fR Set to current \fIline-spacing\fR (\fBls\fR) parameter +- \fB&P\fR Set to 1 if the current page is being printed; otherwise 0. +- \fB&T\fR Set to 1 in \*(NR, if \fB\-T\fR option used; always 0 in \*(TR. +11.1 \fB&V\fR Available vertical resolution in basic units. +5.2 \fB&a\fR Post-line extra line-space most recently utilized \ +using \fB\ex\fI\'N\|\'\fR. +19 \fB&c\fR Number of \fIlines\fR read from current input file. +7.4 \fB&d\fR Current vertical place in current diversion; equal to \fBnl\fR, if no diversion. +2.2 \fB&f\fR Current font as physical quadrant (1-4). +4 \fB&h\fR Text base-line high-water mark on current page or diversion. +6 \fB&i\fR Current indent. +4.2 \fB&j\fR Current adjustment mode and type. +4.1 \fB&k\fR Length of text portion on current partial output line. +6 \fB&l\fR Current line length. +4 \fB&n\fR Length of text portion on previous output line. +3 \fB&o\fR Current page offset. +3 \fB&p\fR Current page length. +2.3 \fB&s\fR Current point size. +7.5 \fB&t\fR Distance to the next trap. +4.1 \fB&u\fR Equal to 1 in fill mode and 0 in nofill mode. +5.1 \fB&v\fR Current vertical line spacing. +11.2 \fB&w\fR Width of previous character. +- \fB&x\fR Reserved version-dependent register. +- \fB&y\fR Reserved version-dependent register. +7.4 \fB&z\fR Name of current diversion. +.TE +.in 0 +.fi +.ps 10 +.vs 12 +.ft R +.bp diff --git a/share/doc/usd/21.troff/m1 b/share/doc/usd/21.troff/m1 new file mode 100644 index 0000000..0df1d52 --- /dev/null +++ b/share/doc/usd/21.troff/m1 @@ -0,0 +1,746 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+.\" +.\" @(#)m1 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.nr p 0 1 +.tr | +.tr ~| +.rm mx +.br +.ce +.ft B +.ps +2 +.rs +.\".sp1.0i +REFERENCE MANUAL +.ft R +.ps -2 +.sp +.mh +General Explanation +.sc +Form of input. +Input consists of \fItext lines\fR, which are destined to be printed, +interspersed with \fIcontrol lines\fR, +which set parameters or otherwise control subsequent processing. +Control lines begin with a \fIcontrol character\fR\(em\ +normally \fB.\fR (period) or \fB\'\fR (acute accent)\(em\ +followed by a one or two character name that specifies +a basic \fIrequest\fR or the substitution of +a user-defined \fImacro\fR in place of the control line. +The control character \fB\'\fR suppresses the \fIbreak\fR function\(em\ +the forced output of a partially filled line\(em\ +caused by certain requests. +The control character may be separated from the request/macro name by +white space (spaces and/or tabs) for \(aesthetic reasons. +Names must be followed by either +space or newline. +Control lines with unrecognized names are ignored. +.pg +Various special functions may be introduced anywhere in the input by +means of an \fIescape\fR character, normally \fB\e\fR. +For example, the function +\fB\en\fIR\fR +causes the interpolation (insertion in place) of the contents of the +\fInumber register R\fR +in place of the function; +here \fIR\fR is either a single character name +as in \fB\en\fIx\fR, +or a left-parenthesis-introduced, two-character name as in \fB\en(\fIxx\fR. +.sc +Formatter and device resolution. +\*(TR internally uses 432 units\(slinch (for historical reasons, corresponding to +the Graphic Systems phototypesetter, +which had a horizontal resolution of +1\(sl432 inch and a vertical resolution +of 1\(sl144 inch). +\*(NR internally uses 240 units\(slinch, +corresponding to the least common multiple of the +horizontal and vertical resolutions of various +typewriter-like output devices.
+\*(TR rounds horizontal\(slvertical numerical parameter input to its own +internal horizontal\(slvertical resolution. +\*(NR similarly rounds numerical input to the actual resolution +of the output device indicated by the \fB\(miT\fR option +(default Model 37 Teletype). +.sc +Numerical parameter input. +Both \*(NR and \*(TR +accept numerical input with the scale +indicator suffixes +shown in the following table, +where +\fIS\fR is the current type size in points, +\fIV\fR is the current vertical line spacing in +basic units, +and +\fIC\fR is a \fInominal character width\fR in basic units. +.TS +center box; +c|c|ls +c|c|ll +c|l|l|l. +Scale Number of basic units +Indicator Meaning \*(TR \*(NR +_ +\fBi\fR Inch 432 240 +\fBc\fR Centimeter 432\(mu50\(sl127 240\(mu50\(sl127 +\fBP\fR Pica = 1\(sl6 inch 72 240\(sl6 +\fBm\fR Em = \fIS\fR points 6\(mu\fIS\fR \fIC\fR +\fBn\fR En = Em\(sl2 3\(mu\fIS\fR \fIC, same as Em\fR +\fBp\fR Point = 1\(sl72 inch 6 240\(sl72 +\fBu\fR Basic unit 1 1 +\fBv\fR Vertical line space \fIV\fR \fIV\fR +none Default, see below +.TE +In \*(NR, \fIboth\fR the em and the en are taken to be equal to the \fIC\fR, +which is output-device dependent; +common values are 1\(sl10 and 1\(sl12 inch. +Actual character widths in \*(NR need not be all the same and constructed characters +such as \(mi> (\(->) are often extra wide. +The default scaling is ems for the horizontally-oriented requests +and functions +\fBll\fR, +\fBin\fR, +\fBti\fR, +\fBta\fR, +\fBlt\fR, +\fBpo\fR, +\fBmc\fR, +\fB\eh\fR, +and +\fB\el\fR; +\fIV\^\fRs +for the vertically-oriented requests and functions +\fBpl\fR, +\fBwh\fR, +\fBch\fR, +\fBdt\fR, +\fBsp\fR, +\fBsv\fR, +\fBne\fR, +\fBrt\fR, +\fB\ev\fR, +\fB\ex\fR, +and +\fB\eL\fR; +\fBp\fR for the \fBvs\fR request; +and \fBu\fR for the requests +\fBnr\fR, +\fBif\fR, +and +\fBie\fR. +\fIAll\fR other requests ignore any scale indicators. 
+When a number register containing an already appropriately scaled number +is interpolated to provide numerical input, +the unit scale indicator +\fBu\fR may need to be appended to prevent +an additional inappropriate default scaling. +The number, \fIN\fR, may be specified in decimal-fraction form +but the parameter finally stored is rounded to an integer number of basic units. +.pg +The \fIabsolute position\fR indicator \fB~\fR may be prefixed +to a number \fIN\fR +to generate the distance to the vertical or horizontal place \fIN\fR. +For vertically-oriented requests and functions, \fB~\|\fIN\fR +becomes the distance in basic units from the current vertical place on the page or in a \fIdiversion\fR (\(sc7.4) +to the vertical place \fIN\fR. +For \fIall\fR other requests and functions, +\fB~\|\fIN\fR +becomes the distance from +the current horizontal place on the \fIinput\fR line to the horizontal place \fIN\fR. +For example, +.x1 +\&\fB.sp ~\|3.2c\fR +.x2 +will space \fIin the required direction\fR to 3.2 centimeters from the top of the page. +.sc +.tr && +Numerical expressions. +Wherever numerical input is expected, an expression involving parentheses, +the arithmetic operators \fB\(pl\fR, \fB\(mi\fR, \fB\(sl\fR, \fB\(**\fR, \fB%\fR (mod), +and the logical operators +\fB<\fR, +\fB>\fR, +\fB<\(eq\fR, +\fB>\(eq\fR, +\fB\(eq\fR (or \fB\(eq\(eq\fR), +\fB&\fR\ (and), +\fB:\fR\ (or) +may be used. +Except where controlled by parentheses, evaluation of expressions is left-to-right; +there is no operator precedence. +In the case of certain requests, an initial \fB\(pl\fR or \fB\(mi\fR is stripped +and interpreted as an increment or decrement indicator respectively. +In the presence of default scaling, the desired scale indicator must be +attached to \fIevery\fR number in an expression +for which the desired and default scaling differ. +For example, +if the number register \fBx\fR contains 2 +and the current point size is 10, +then +.br +.tr &. 
+.x1 +.ft B +\&.ll (4.25i\(pl\enxP\(pl3)\(sl2u +.ft R +.x2 +will set the line length to 1\(sl2 the sum of 4.25 inches \(pl 2 picas \(pl 30 points. +.sc +Notation. +Numerical parameters are indicated in this manual in two ways. +\(+-\fIN\fR means that the argument may take the forms \fIN\fR, \(pl\fIN\fR, or \(mi\fIN\fR and +that the corresponding effect is to set the affected parameter +to \fIN\fR, to increment it by \fIN\fR, or to decrement it by \fIN\fR respectively. +Plain \fIN\fR means that an initial algebraic sign is \fInot\fR +an increment indicator, +but merely the sign of \fIN\fR. +Generally, unreasonable numerical input is either ignored +or truncated to a reasonable value. +For example, +most requests expect to set parameters to non-negative +values; +exceptions are +\fBsp\fR, +\fBwh\fR, +\fBch\fR, +\fBnr\fR, +and +\fBif\fR. +The requests +\fBps\fR, +\fBft\fR, +\fBpo\fR, +\fBvs\fR, +\fBls\fR, +\fBll\fR, +\fBin\fR, +and +\fBlt\fR +restore the \fIprevious\fR parameter value in the \fIabsence\fR +of an argument. +.pg +Single character arguments are indicated by single lower case letters +and +one/two character arguments are indicated by a pair of lower case letters. +Character string arguments are indicated by multi-character mnemonics. +.mh +Font and Character Size Control +.sc +Character set. +The \*(TR character set consists of a typesetter-dependent basic +character set plus a Special Mathematical Font character +set\(emeach having 102 characters. +An example of these character sets is shown in the Appendix Table|I. +All printable \s-1ASCII\s+1 characters are included, +with some on the Special Font. +With three exceptions, these \s-1ASCII\s+1 characters are input as themselves, +and non-\s-1ASCII\s+1 characters are input in the form \fB\e(\fIxx\fR where +\fIxx\fR is a two-character name given in the Appendix Table|II. +The three \s-1ASCII\s+1 exceptions are mapped as follows: +.TS +center box; +cs|cs +cc|cc +cl|cl. 
+\s-1ASCII\s+1 Input Printed by \*(TR +Character Name Character Name +_ +\' acute accent ' close quote +\` grave accent ` open quote +\(mi minus - hyphen +.TE +.tr ~~ +The characters +\fB\'\fR, +\fB\`\fR, +and +\fB\-\fR +may be input +by \fB\e\'\fR, \fB\e\`\fR, and \fB\e\-\fR respectively or by their names (Table II). +The \s-1ASCII\s+1 characters \fB@\fR, \fB#\fR, \fB"\fR, \fB\(aa\fR, \fB\(ga\fR, \fB<\fR, \fB>\fR, \fB\e\fR, \fB{\fR, \fB}\fR, \fB~\fR, \fB^\fR, and \fB\(ul\fR exist +only on the Special Font and are printed as a 1-em space if that font +is not mounted. +.pg +.tr ~| +\*(NR understands the entire \*(TR character set, +but can in general print only \s-1ASCII\s+1 +characters, +additional characters as may be available on +the output device, +such characters as may be able to be constructed +by overstriking or other combination, +and those that can reasonably be mapped +into other printable characters. +The exact behavior is determined by a driving +table prepared for each device. +The characters +\fB\'\fR, +\fB\`\fR, +and +\fB\(ul\fR +print +as themselves. +.sc +Fonts. +The default mounted fonts are +Times Roman (\fBR\fR), +Times Italic (\fBI\fR), +Times Bold (\fBB\fR), +and the Special Mathematical Font (\fBS\fR) +on physical typesetter positions 1, 2, 3, and 4 respectively. +These fonts are used in this document. +The \fIcurrent\fR font, initially Roman, may be changed +(among the mounted fonts) +by use of the \fBft\fR request, +or by imbedding at any desired point +either \fB\ef\fIx\fR, \fB\ef(\fIxx\fR, or \fB\ef\fIN\fR +where +\fIx\fR and \fIxx\fR are the name of a mounted font +and \fIN\fR is a numerical font position. +It is \fInot\fR necessary to change to the Special Font; +characters on that font are automatically handled. +A request for a named but not-mounted font is \fIignored\fR. +\*(TR can be informed that any particular font is mounted +by use of the \fBfp\fR request. +The list of known fonts is installation dependent. 
+In the subsequent discussion of font-related requests, +\fIF\fR represents either a one\(sltwo-character +font name or the numerical font position, 1-4. +The current font is available (as numerical position) in the read-only number register \fB.f\fR. +.pg +\*(NR understands font control +and normally underlines Italic characters (see \(sc10.5). +.sc +Character size. +Character point sizes available are typesetter dependent, but often include +6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24, 28, and 36. +This is a range of 1\(sl12 inch to 1\(sl2 inch. +The \fBps\fR request is used to change or restore the point size. +Alternatively the point size may be changed between any two characters +by imbedding a \fB\es\fIN\fR +at the desired point +to set the size to \fIN\fR, +or a \fB\es\fI\(+-N\fR (1\(<=\fIN\fR\(<=9) +to increment\(sldecrement the size by \fIN\fR; +\fB\es0\fR restores the \fIprevious\fR size. +Requested point size values that are between two valid +sizes yield the larger of the two. +The current size is available in the \fB.s\fR register. +\*(NR ignores type size control. +.h1 * +.fn +.xx +*Notes are explained at the end of the Summary and Index above. +.ef +.bt +\fB&ps\fI|\(+-N\fR 10\|point previous E Point size +set to \(+-\fIN\fR. +Alternatively imbed \fB\es\fIN\fR or \fB\es\fI\(+-N\fR. +Any positive size value may be requested; +if invalid, the next larger valid size will result, with a +maximum of 36. +A paired sequence +\(pl\fIN\fR,\|\(mi\fIN\fR +will work because the previous requested value is also remembered. +Ignored in \*(NR. +.bt +\fB&fz\fI|F|\(+-N\fR off - E The characters in font \fIF\fR will be adjusted to +be in size \(+-\fIN\fR. Characters in the Special Font encountered during the +use of font \fIF\fR will have the same size modification. (Use the \fB&fz S\fR +request if different treatment of Special Font characters is required). \fB&fz\fR +must follow any \fB&fp\fR request for the position. 
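A short fragment may make the point-size rules above more concrete. It is only a sketch: it assumes a device offering the typesetter-dependent sizes quoted above (6 through 36), and the text lines are invented for illustration.

```troff
.\" Sketch only; assumes the size list 6, 7, ..., 28, 36 quoted above.
.ps 10
Body text in 10 point,
\s+2two points larger\s0, then back to the previous size.
.\" 13 is not in the assumed list, so by the rule above the
.\" next larger valid size (14) should result.
.ps 13
.\" With no argument, ps restores the previous size.
.ps
```

Under nroff the same input should format without any size changes, since type size control is ignored there.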
+.bt +\fB&fz|S|\fIF|\(+-N\fR off - E The characters in the Special Font +will be in size \(+-\fIN\fR independent of previous \fB&fz\fR requests. +.bt +\fB&ss\fI|N\fR 12\(sl36\|em ignored E Space-character size +is set to \fIN\fR\(sl36\|ems. +This size is the minimum word spacing in adjusted text. +Ignored in \*(NR. +.bt +\fB&cs|\fIF\|N\|M\fR off - P Constant character space +(width) mode is +set on for font \fIF\fR (if mounted); the width of every character will be +taken to be \fIN\fR\(sl36 ems. +If \fIM\fR is absent, +the em is that of the character's point size; +if \fIM\fR is given, +the em is \fIM\fR-points. +All affected characters +are centered in this space, including those with an actual width +larger than this space. +Special Font characters occurring while the current font +is \fIF\fR are also so treated. +If \fIN\fR is absent, the mode is turned off. +The mode must be still or again in effect when the characters are physically printed. +Ignored in \*(NR. +.bt +\fB&bd\fI|F|N\fR off - P The characters in font \fIF\fR will be artificially +emboldened by printing each one twice, separated by \fIN\fR\^\(mi1 basic units. +A reasonable value for \fIN\fR is 3 when the character size is in the vicinity +of 10 points. +If \fIN\fR is missing the embolden mode is turned off. +The column heads above were printed with \fB.bd|I|3\fR. +The mode must be still or again in effect when the characters are physically printed. +Ignored in \*(NR. +.bt +\fB&bd|S|\fIF|N\fR off - P The characters in the Special Font +will be emboldened whenever the current font is \fIF\fR. +This manual was printed with \fB.bd\|S\|B\|3\fR. +The mode must be still or again in effect when the characters are physically printed. +.bt +\fB&ft|\fIF\fR Roman previous E Font changed to +\fIF\fR. +Alternatively, imbed \fB\ef\fIF\fR. +The font name \fBP\fR is reserved to mean the previous font. +.bt +\fB&fp|\fIN|F\fR R,I,B,S ignored - Font position. 
+This is a statement +that a font named \fIF\fR is mounted on position \fIN\fR (1-4). +It is a fatal error if \fIF\fR is not known. +The phototypesetter has four fonts physically mounted. +Each font consists of a film strip which can be mounted on a numbered +quadrant of a wheel. +The default mounting sequence assumed by \*(TR is +R, I, B, and S on positions 1, 2, 3 and 4. +.mh +Page control +.pg +Top and bottom margins are \fInot\fR automatically provided; +it is conventional to define two \fImacros\fR and to set \fItraps\fR +for them at vertical positions 0 (top) and \fI\(miN\fR (\fIN\fR from the bottom). +See \(sc7 and Tutorial Examples \(scT2. +A pseudo-page transition onto the \fIfirst\fR page occurs +either when the first \fIbreak\fR occurs or +when the first \fInon-diverted\fR text processing occurs. +Arrangements +for a trap to occur at the top of the first page +must be completed before this transition. +In the following, references to the \fIcurrent diversion\fR (\(sc7.4) +mean that the mechanism being described works during both +ordinary and diverted output (the former considered as the top diversion level). +.pg +The usable page width on the Graphic Systems phototypesetter +was about 7.54|inches, +beginning about 1\(sl27|inch from the left edge of the +8|inch wide, continuous roll paper, but these characteristics are typesetter- +dependent. +The physical limitations on \*(NR output +are output-device dependent. +.h1 +.bt +\fB&pl\fI|\(+-N\fR 11\|in 11\|in \fBv\fR Page length set to \fI\(+-N\fR. +The internal limitation is about 75|inches in \*(TR and +about 136|inches in \*(NR. +The current page length is available in the \fB.p\fR register. +.bt +\fB&bp\fI|\(+-N\fR \fIN\(eq\fR1 - B*,\fBv\fR Begin page. +.fn +.xx +*The use of "\ \fB\'\fR\ " as control character (instead of "\fB.\fR") +suppresses the break function. +.ef +The current page is ejected and a new page is begun. +If \fI\(+-N\fR is given, the new page number will be \fI\(+-N\fR. 
+Also see request \fBns\fR. +.bt +\fB&pn\fI|\(+-N\fR \fIN\fR\(eq1 ignored - Page number. +The next page (when it occurs) will have the page number \fI\(+-N\fR. +A \fBpn\fR must occur before the initial pseudo-page transition +to affect the page number of the first page. +The current page number is in the \fB%\fR register. +.bt +\fB&po\fI|\(+-N\fR 0;|26\(sl27\|in\(dg previous \fBv\fR Page offset. +.fn +.xx +\(dgValues separated by ";" are for \*(NR and \*(TR respectively. +.ef +The current \fIleft margin\fR is set to \fI\(+-N\fR. +The \*(TR initial value provides about 1|inch of paper margin +including the physical typesetter margin of 1\(sl27|inch. +In \*(TR the maximum (line-length)+(page-offset) is about 7.54 inches. +See \(sc6. +The current page offset is available in the \fB.o\fR register. +.bt +\fB&ne\fI|N\fR - \fIN\(eq\fR1\|\fIV\fR D,\fBv\fR Need \fIN\fR vertical space. +If the distance, \fID\fR, to the next trap position (see \(sc7.5) is less than \fIN\fR, +a forward vertical space of size \fID\fR occurs, +which will spring the trap. +If there are no remaining +traps on the page, +\fID\fR is the distance to the bottom of the page. +If \fID\|<\|V\fR, another line could still be output +and spring the trap. +In a diversion, \fID\fR is the distance to the \fIdiversion trap\fR, if any, +or is very large. +.bt +\fB&mk\fI|R\fR none internal D Mark the \fIcurrent\fR vertical place +in an internal register (both associated with the current diversion level), +or in register \fIR\fR, if given. +See \fBrt\fR request. +.bt +\fB&rt\fI|\(+-N\fR none internal D,\fBv\fR Return \fIupward only\fR to a marked vertical place +in the current diversion. +If \fI\(+-N\fR (w.r.t. current place) is given, +the place is \fI\(+-N\fR from the top of the page or diversion +or, if \fIN\fR is absent, to a +place marked by a previous \fBmk\fR. 
+Note that the \fBsp\fR request (\(sc5.3) may be used +in all cases instead of \fBrt\fR +by spacing to the absolute place stored in an explicit register; +e.|g. using the sequence \fB.mk|\fIR\fR ... \fB.sp|~\|\en\fIR\fBu\fR. +.mh +Text Filling, Adjusting, and Centering +.sc +Filling and adjusting. +Normally, +words are collected from input text lines +and assembled into an output text line +until some word doesn't fit. +An attempt is then made +to hyphenate the word to assemble a part +of it into the output line. +The spaces between the words on the output line +are then increased to spread out the line +to the current \fIline length\fR +minus any current \fIindent\fR. +A \fIword\fR is any string of characters delimited by +the \fIspace\fR character or the beginning/end of the input line. +Any adjacent pair of words that must be kept together +(neither split across output lines nor spread apart +in the adjustment process) +can be tied together by separating them with the +\fIunpaddable space\fR character +"\fB\e\ \ \fR" (backslash-space). +The adjusted word spacings are uniform in \*(TR +and the minimum interword spacing can be controlled +with the \fBss\fR request (\(sc2). +In \*(NR, they are normally nonuniform because of +quantization to character-size spaces; +however, +the command line option \fB\-e\fR causes uniform +spacing with full output device resolution. +Filling, adjustment, and hyphenation (\(sc13) can all be +prevented or controlled. +The \fItext length\fR on the last line output is available in the \fB.n\fR register, +and text base-line position on the page for this line is in the \fBnl\fR register. +The text base-line high-water mark (lowest place) on the current page is in +the \fB.h\fR register. The \fB.k\fR register (read-only) contains the horizontal size of +the text portion (without indent) of the current partially-collected output +line (if any) in the current environment. 
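The behavior described in the paragraph above can be tried with a fragment along these lines. It is a hedged sketch rather than part of the manual: the sentence and the register name Tl are hypothetical.

```troff
.\" Sketch only; the text and the register name Tl are made up.
.fi
.ad b
The draft was reviewed by Dr.\ Smith before it was printed;
the backslash-space keeps the title and the name together on one
output line and keeps that gap from being widened during adjustment.
.br
.\" After the break, .n holds the text length of the last output line.
.\" nr scales in basic units by default, so no scale indicator is needed.
.nr Tl \n(.n
```

The saved value is in basic units, so any later use in a horizontally-oriented request would need an explicit u appended to avoid the default em scaling.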
+.pg +An input text line ending with \fB.\fR\^, \fB?\fR, or \fB!\fR is taken +to be the end of a \fIsentence\fR, and an additional space character is +automatically provided during filling. +Multiple inter-word space characters found in the input are retained, +except for trailing spaces; +initial spaces also cause a \fIbreak\fR. +.pg +When filling is in effect, a \fB\ep\fR may be imbedded or attached to a word to +cause a \fIbreak\fR at the \fIend\fR of the word and have the resulting output +line \fIspread out\fR to fill the current line length. +.pg +.tr && +A text input line that happens to begin +with a control character (\(sc10.4) can +be made to not look like a control line +by preceding it by +the non-printing, zero-width filler character \fB\e&\fR. +Still another way is to specify output translation of some +convenient character into the control character +using \fBtr\fR (\(sc10.5). +.tr &. +.sc +Interrupted text. +The copying of an input line in \fInofill\fR +(non-fill) mode can be \fIinterrupted\fR by terminating +the partial line with a \fB\ec\fR. +The \fInext\fR encountered input text line will be considered to be a continuation +of the same line of input text. +Similarly, +a word within \fIfilled\fR text may be interrupted by terminating the +word (and line) with \fB\ec\fR; +the next encountered text will be taken as a continuation of the +interrupted word. +If the intervening control lines cause a break, +any partial line will be forced out along with any partial word. +.h1 +.bt +\fB&br\fR - - B Break. +The filling of the line currently +being collected is stopped and +the line is output without adjustment. +Text lines beginning with space characters +and empty text lines (blank lines) also cause a break. +.bt +.lg 0 +\fB&fi\fR \(fill|on - B,E Fill subsequent output lines. +.lg +The register \fB.u\fR is 1 in fill mode and 0 in nofill mode. +.bt +\fB&nf\fR fill|on - B,E Nofill. +Subsequent output lines are \fIneither\fR filled \fInor\fR adjusted. 
+Input text lines are copied directly to output lines +\fIwithout regard\fR for the current line length. +.bt +\fB&ad\fI|c\fR adj,both adjust E \ +Line adjustment is begun. +If fill mode is not on, adjustment will be deferred until +fill mode is back on. +If the type indicator \fIc\fR is present, +the adjustment type is changed as shown in the following table. +The type indicator can also be a value saved from the read-only \fB.j\fR number +register, which is set to contain the current adjustment mode and type. +.TS +center box; +c|c +c|l. +Indicator Adjust Type +_ +\fBl\fR adjust left margin only +\fBr\fR adjust right margin only +\fBc\fR center +\fBb\fR or \fBn\fR adjust both margins +absent unchanged +.TE +.bt +\fB&na\fR adjust - E Noadjust. +Adjustment is turned off; +the right margin will be ragged. +The adjustment type for \fBad\fR is not changed. +Output line filling still occurs if fill mode is on. +.bt +\fB&ce\fI|N\fR off \fIN\fR\(eq1 B,E Center the next \fIN\fR input text lines +within the current (line-length minus indent). +If \fIN\fR\(eq\^0, any residual count is cleared. +A break occurs after each of the \fIN\fR input lines. +If the input line is too long, +it will be left adjusted. +.mh +Vertical Spacing +.sc +Base-line spacing. +The vertical spacing \fI(V)\fR between the base-lines of successive +output lines can be set +using the \fBvs\fR request +with a resolution of 1\(sl144\|inch\|\(eq\|1\(sl2|point +in \*(TR, +and to the output device resolution in \*(NR. +\fIV\fR must be large enough to accommodate the character sizes +on the affected output lines. +For the common type sizes (9-12 points), +usual typesetting practice is to set \fIV\fR to 2\ points greater than the +point size; +\*(TR default is 10-point type on a 12-point spacing +(as in this document). +The current \fIV\fR is available in the \fB.v\fR register. +Multiple-\fIV\|\fR line separation (e.\|g. double spacing) may be requested +with \fBls\fR. +.sc +Extra line-space. 
+If a word contains a vertically tall construct requiring +the output line containing it to have extra vertical space +before and\(slor after it, +the \fIextra-line-space\fR function \fB\ex\fI\'N\|\|\'\fR +can be imbedded in or attached to that word. +In this and other functions having a pair of delimiters around +their parameter (here \fB\'\fR\|), +the delimiter choice is arbitrary, +except that it can't look like the continuation of a number expression for \fIN\fR. +If \fIN\fR is negative, +the output line containing the word will +be preceded by \fIN\fR extra vertical space; +if \fIN\fR is positive, +the output line containing the word +will be followed by \fIN\fR extra vertical space. +If successive requests for extra space apply to the same line, +the maximum values are used. +The most recently utilized post-line extra line-space is available in the \fB.a\fR register. +.sc +Blocks of vertical space. +A block of vertical space is ordinarily requested using \fBsp\fR, +which honors the \fIno-space\fR mode and which does +not space \fIpast\fR a trap. +A contiguous block of vertical space may be reserved using \fBsv\fR. +.h1 +.bt +\fB&vs\fI|N\fR 1\(sl6in;12pts previous E,\fBp\fR Set vertical base-line spacing size \fIV\fR. +Transient \fIextra\fR vertical space available with \fB\ex\fI\'N\|\|\'\fR (see above). +.bt +\fB&ls\fI|N\fR \fIN\(eq\^\fR1 previous E \fILine\fR spacing +set to \fI\(+-N\fR. +\fIN\(mi\fR1 \fIV\fR\^s \fI(blank lines)\fR are +appended to each output text line. The (read-only) number register \fB.L\fR +is set to contain the current line-spacing value. +Appended blank lines are omitted, if the text or previous appended blank line reached a trap position. +.bt +\fB&sp\fI|N\fR - \fIN\fR\(eq1\fIV\fR B,\fBv\fR Space vertically in \fIeither\fR direction. +If \fIN\fR is negative, the motion is \fIbackward\fR (upward) +and is limited to the distance to the top of the page. +Forward (downward) motion is truncated to the distance to the +nearest trap. 
+If the no-space mode is on, +no spacing occurs (see \fBns\fR, and \fBrs\fR below). +.bt +\fB&sv\fI|N\fR - \fIN\(eq\fR1\fIV\fR \fBv\fR Save a contiguous vertical block of size \fIN\fR. +If the distance to the next trap is greater +than \fIN\fR, \fIN\fR vertical space is output. +No-space mode has \fIno\fR effect. +If this distance is less than \fIN\fR, +no vertical space is immediately output, +but \fIN\fR is remembered for later output (see \fBos\fR). +Subsequent \fBsv\fR requests will overwrite any still remembered \fIN\fR. +.bt +\fB&os\fR - - - Output saved vertical space. +No-space mode has \fIno\fR effect. +Used to finally output a block of vertical space requested +by an earlier \fBsv\fR request. +.bt +\fB&ns\fR space - D No-space mode turned on. +When on, the no-space mode inhibits \fBsp\fR requests and +\fBbp\fR requests \fIwithout\fR a next page number. +The no-space mode is turned off when a line of +output occurs, or with \fBrs\fR. +.bt +\fB&rs\fR space - D Restore spacing. +The no-space mode is turned off. +.bt +Blank|text|line. - B Causes a break and +outputs a blank line just like \fBsp|1\fR. diff --git a/share/doc/usd/21.troff/m2 b/share/doc/usd/21.troff/m2 new file mode 100644 index 0000000..b94f478 --- /dev/null +++ b/share/doc/usd/21.troff/m2 @@ -0,0 +1,400 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. 
+.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)m2 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.tr | +.rm mx +.br +.mh +Line Length and Indenting +.pg +The maximum line length for fill mode may be set with \fBll\fR. +The indent may be set with \fBin\fR; +an indent applicable to \fIonly\fR the \fInext\fR output line may be set with \fBti\fR. +The line length includes indent space but \fInot\fR +page offset space. +The line-length minus the indent is the basis for centering with \fBce\fR. +The effect of \fBll\fR, \fBin\fR, or \fBti\fR +is delayed, if a partially collected line exists, +until after that line is output. +In fill mode the length of text on an output line is less than or equal to +the line length minus the indent. 
+The current line length and indent are available in registers \fB.l\fR and \fB.i\fR respectively. +The length of \fIthree-part titles\fR produced by \fBtl\fR +(see \(sc14) is \fIindependently\fR set by \fBlt\fR. +.h1 +.bt +\fB&ll\fI|\(+-N\fR 6.5\|in previous E,\fBm\fR Line length is set to \(+-\fIN\fR. +In \*(TR the maximum (line-length)+(page-offset) is about 7.54 inches. +.bt +\fB&in\fI|\(+-N\fR \fIN\(eq\^\fR0 previous B,E,\fBm\fR Indent is set to \fI\(+-N\fR. +The indent is prepended to each output line. +.bt +\fB&ti\fI|\(+-N\fR - ignored B,E,\fBm\fR Temporary indent. +The \fInext\fR output text line will be indented a distance \fI\(+-N\fR +with respect to the current indent. +The resulting total indent may not be negative. +The current indent is not changed. +.mh +Macros, Strings, Diversion, and Position Traps +.sc +Macros and strings. +A \fImacro\fR is a named set of arbitrary \fIlines\fR that may be invoked by name or +with a \fItrap\fR. +A \fIstring\fR is a named string of \fIcharacters\fR, +\fInot\fR including a newline character, +that may be interpolated by name at any point. +Request, macro, and string names share the \fIsame\fR name list. +Macro and string names +may be one or two characters long and may usurp previously defined +request, macro, or string names. +Any of these entities may be renamed with \fBrn\fR +or removed with \fBrm\fR. +Macros are created by \fBde\fR and \fBdi\fR, and appended to by \fBam\fR and \fBda\fR; +\fBdi\fR and \fBda\fR cause normal output to be stored in a macro. +Strings are created by \fBds\fR and appended to by \fBas\fR. +A macro is invoked in the same way as a request; +a control line beginning \fB.\fIxx\fR will interpolate the contents of macro \fIxx\fR. +The remainder of the line may contain up to nine \fIarguments\fR. +The strings \fIx\fR and \fIxx\fR are interpolated at any desired point with +\fB\e\(**\fIx\fR and \fB\e\(**(\fIxx\fR respectively. +String references and macro invocations may be nested. 
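The requests named in the section above can be combined as in the following sketch. The names Ti, Pa, and Pb are hypothetical and chosen only for illustration.

```troff
.\" Sketch only; Ti, Pa and Pb are made-up names.
.\" Create a string, then append to it.
.ds Ti Reference Manual
.as Ti , Draft
.\" Create a macro; the definition ends at the line containing "..".
.de Pa
.sp
.ti +2m
..
.\" Append one more line to the same macro.
.am Pa
.ne 2
..
.\" Rename the macro, invoke it like a request, then remove it.
.rn Pa Pb
.Pb
This paragraph names the \*(Ti by interpolating string Ti.
.rm Pb
```

Because requests, macros, and strings share one name list, short names like these are worth checking against existing request names before use; defining a macro named sp, for instance, would usurp the request.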
+.sc +Copy mode input interpretation. +During the definition and extension +of strings and macros (not by diversion) +the input is read in \fIcopy mode\fR. +The input is copied without interpretation +\fIexcept\fR that: +.x1 +.ds + \v'-.1m'\s-4\(bu\s+4\v'+.1m' +\(bu The contents of number registers indicated by \fB\en\fR are interpolated. +\(bu Strings indicated by \fB\e\(**\fR are interpolated. +\(bu Arguments indicated by \fB\e$\fR are interpolated. +\(bu Concealed newlines indicated by \fB\e\fR(newline) are eliminated. +\(bu Comments indicated by \fB\e"\fR are eliminated. +\(bu \fB\et\fR and \fB\ea\fR are interpreted as \s-1ASCII\s+1 horizontal tab and \s-1SOH\s+1 respectively (\(sc9). +\(bu \fB\e\e\fR is interpreted as \fB\e\fR. +\(bu \fB\e.\fR is interpreted as "\fB.\fR". +.x2 +These interpretations can be suppressed by +prepending +a \fB\e\fR. +For example, since \fB\e\e\fR maps into a \fB\e\fR, \fB\e\en\fR will copy as \fB\en\fR which +will be interpreted as a number register indicator when the +macro or string is reread. +.sc +Arguments. +When a macro is invoked by name, the remainder of the line is +taken to contain up to nine arguments. +The argument separator is the space character, and arguments +may be surrounded by double-quotes to permit imbedded space characters. +Pairs of double-quotes may be imbedded in double-quoted arguments to +represent a single double-quote. +If the desired arguments won't fit on a line, +a concealed newline may be used to continue on the next line. +.pg +When a macro is invoked the \fIinput level\fR is \fIpushed down\fR and +any arguments available at the previous level become unavailable +until the macro is completely read and the previous level is restored. +A macro's own arguments can be interpolated at \fIany\fR point +within the macro with \fB\e$\fIN\fR, which interpolates the \fIN\fR\^th +argument +(1\(<=\fIN\fR\^\(<=9). +If an invoked argument doesn't exist, +a null string results. 
+For example, the macro \fIxx\fR may be defined by +.x1 +.ft B +.ta .75i +&de xx \e"begin definition +Today is \e\e$1 the \e\e$2. +&. \e"end definition +.ft R +.x2 +and called by +.x1 +.ft B +&xx Monday 14th +.ft R +.x2 +to produce the text +.x1 +.ft B +Today is Monday the 14th. +.ft R +.x2 +Note that the \fB\e$\fR +was concealed in the definition with a prepended \fB\e\fR. +The number of currently available +arguments is in the \fB.$\fR register. +.pg +No arguments are available at the top (non-macro) level +in this implementation. +Because string referencing is implemented +as an input-level push down, +no arguments are available from \fIwithin\fR a string. +No arguments are available within a trap-invoked macro. +.pg +Arguments are copied in \fIcopy mode\fR onto a stack +where they are available for reference. +The mechanism does not allow an argument to contain +a direct reference to a \fIlong\fR string +(interpolated at copy time) and it is advisable to +conceal string references (with an extra \fB\e\fR\|) +to delay interpolation until argument reference time. +.sc +Diversions. +Processed output may be diverted into a macro for purposes +such as footnote processing (see Tutorial \(scT5) +or determining the horizontal and vertical size of some text for +conditional changing of pages or columns. +A single diversion trap may be set at a specified vertical position. +The number registers \fBdn\fR and \fBdl\fR respectively contain the +vertical and horizontal size of the most +recently ended diversion. +Processed text that is diverted into a macro +retains the vertical size of each of its lines when reread +in \fInofill\fR mode +regardless of the current \fIV\fR. +Constant-spaced (\fBcs\fR) or emboldened (\fBbd\fR) text that is diverted +can be reread correctly only if these modes are again or still in effect +at reread time. 
+One way to do this is to imbed in the diversion the appropriate +\fBcs\fR or \fBbd\fR requests with the \fItransparent\fR +mechanism described in \(sc10.6. +.pg +Diversions may be nested +and certain parameters and registers +are associated +with the current diversion level +(the top non-diversion level may be thought of as the +0th diversion level). +These are the diversion trap and associated macro, +no-space mode, +the internally-saved marked place (see \fBmk\fR and \fBrt\fR), +the current vertical place (\fB.d\fR register), +the current high-water text base-line (\fB.h\fR register), +and the current diversion name (\fB.z\fR register). +.sc +Traps. +Three types of trap mechanisms are available\(empage traps, a diversion trap, and +an input-line-count trap. +Macro-invocation traps may be planted using \fBwh\fR at any page position including the top. +This trap position may be changed using \fBch\fR. +Trap positions at or below the bottom of the page +have no effect unless or until +moved to within the page or rendered effective by an increase in page length. +Two traps may be planted at the \fIsame\fR position only by first planting them at different +positions and then moving one of the traps; +the first planted trap will conceal the second unless and until the first one is moved +(see Tutorial Examples \(scT5). +If the first one is moved back, it again conceals the second trap. +The macro associated with a page trap is automatically +invoked when a line of text is output whose vertical size \fIreaches\fR +or \fIsweeps past\fR the trap position. +Reaching the bottom of a page springs the top-of-page trap, if any, +provided there is a next page. +The distance to the next trap position is available in the \fB.t\fR register; +if there are no traps between the current position and the bottom of the page, +the distance returned is the distance to the page bottom. +.pg +A macro-invocation trap effective in the current diversion may be planted using \fBdt\fR. 
+The \fB.t\fR register works in a diversion; if there is no subsequent trap a \fIlarge\fR +distance is returned. +For a description of input-line-count traps, see the \fBit\fR request below. +.h1 +.bt +\fB&de\fI|xx|yy\fR - \fI.yy=\fB..\fR - Define or redefine the macro \fIxx\fR. +The contents of the macro begin on the next input line. +Input lines are copied in \fIcopy mode\fR until the definition is terminated by a +line beginning with \fB.\fIyy\fR, +whereupon the macro \fIyy\fR is called. +In the absence of \fIyy\fR, the definition +is terminated by a +line beginning with "\fB..\fR". +A macro may contain \fBde\fR requests +provided the terminating macros differ +or the contained definition terminator is concealed. +\&"\fB..\fR" can be concealed as +\fB\e\e..\fR which will copy as \fB\e..\fR and be reread as "\fB..\fR". +.bt +\fB&am\fI|xx|yy\fR - \fI.yy=\fB..\fR - Append to macro (append version of \fBde\fR). +.bt +\fB&ds\fI|xx|string\fR - ignored - Define a string +\fIxx\fR containing \fIstring\fR. +Any initial double-quote in \fIstring\fR is stripped off to permit +initial blanks. +.bt +\fB&as\fI|xx|string\fR - ignored - Append +\fIstring\fR to string \fIxx\fR +(append version of \fBds\fR). +.bt +\fB&rm\fI|xx\fR - ignored - Remove +request, macro, or string. +The name \fIxx\fR is removed from the name list and +any related storage space is freed. +Subsequent references will have no effect. +.bt +\fB&rn\fI|xx|yy\fR - ignored - Rename request, macro, or string +\fIxx\fR to \fIyy\fR. +If \fIyy\fR exists, it is first removed. +.bt +\fB&di|\fIxx\fR - end D Divert output to macro \fIxx\fR. +Normal text processing occurs during diversion +except that page offsetting is not done. +The diversion ends when the request \fBdi\fR or \fBda\fR is encountered without an argument; +extraneous +requests of this type should not appear when nested diversions are being used. +.bt +\fB&da|\fIxx\fR - end D Divert, appending to \fIxx\fR +(append version of \fBdi\fR). 
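The diversion requests above can be sketched as a simple "keep" that measures a block before committing it to the page. This is an illustrative fragment, not a complete macro package; the macro name Kp is hypothetical, and fill mode is assumed to be in effect before and after the block.

```roff
.\" Divert a short block into the macro Kp.
.di Kp
These lines are processed normally,
but are stored in Kp instead of being output.
.br
.di
.\" Register dn now holds the vertical size of the diversion.
.\" Ask for that much space on the page, then reread the
.\" diversion in nofill mode so that its lines are kept intact.
.ne \n(dnu
.nf
.Kp
.fi
```

The .br before the closing .di forces out any partially collected line so that it is included in the diversion and in the size recorded in dn.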
+.bt +\fB&wh\fI|N|xx\fR - - \fBv\fR Install +a trap to invoke \fIxx\fR at page position \fIN;\fR +a \fInegative N\fR will be interpreted with respect to the +page \fIbottom\fR. +Any macro previously planted at \fIN\fR is replaced by \fIxx\fR. +A zero \fIN\fR refers to the \fItop\fR of a page. +In the absence of \fIxx\fR, the first found trap at \fIN\fR, if any, is removed. +.bt +\fB&ch\fI|xx|N\fR - - \fBv\fR Change +the trap position for macro \fIxx\fR to be \fIN\fR. +In the absence of \fIN\fR, the trap, if any, is removed. +.bt +\fB&dt\fI|N|xx\fR - off D,\fBv\fR Install a diversion trap +at position \fIN\fR in the \fIcurrent\fR diversion to invoke +macro \fIxx\fR. +Another \fBdt\fR will redefine the diversion trap. +If no arguments are given, the diversion trap is removed. +.bt +\fB&it\fI|N|xx\fR - off E Set an input-line-count trap +to invoke the macro \fIxx\fR after \fIN\fR lines of \fItext\fR input +have been read +(control or request lines don't count). +The text may be in-line text or +text interpolated by inline or trap-invoked macros. +.bt +\fB&em\fI|xx\fR none none - The +macro \fIxx\fR will be invoked +when all input has ended. +The effect is the same as if the contents of \fIxx\fR had been at the end +of the last file processed. +.mh +Number Registers +.pg +A variety of parameters are available to the user as +predefined, named \fInumber registers\fR (see Summary and Index, page 7). +In addition, the user may define his own named registers. +Register names are one or two characters long and \fIdo not\fR conflict +with request, macro, or string names. +Except for certain predefined read-only registers, +a number register can be read, written, automatically +incremented or decremented, and interpolated +into the input in a variety of formats. +One common use of user-defined registers is to +automatically number sections, paragraphs, lines, etc. 
+A number register may be used any time numerical input is expected or desired +and may be used in numerical \fIexpressions\fR (\(sc1.4). +.pg +Number registers are created and modified using \fBnr\fR, which +specifies the name, numerical value, and the auto-increment size. +Registers are also modified, if accessed +with an auto-incrementing sequence. +If the registers \fIx\fR and \fIxx\fR both contain +\fIN\fR and have the auto-increment size \fIM\fR, +the following access sequences have the effect shown: +.TS +center box; +c2|c2|c +c2|c2|c +l2|c2|c +l2|c2|c +l2|l2|c. + Effect on Value +Sequence Register Interpolated +_ +\fB\en\fIx\fR none \fIN\fR +\fB\en(\fIxx\fR none \fIN\fR +\fB\en+\fIx\fR \fIx\fR incremented by \fIM\fR \fIN+M\fR +\fB\en\-\fIx\fR \fIx\fR decremented by \fIM\fR \fIN\-M\fR +\fB\en+(\fIxx\fR \fIxx\fR incremented by \fIM\fR \fIN+M\fR +\fB\en\-(\fIxx\fR \fIxx\fR decremented by \fIM\fR \fIN\-M\fR +.TE +When interpolated, a number register is converted to +decimal (default), +decimal with leading zeros, +lower-case Roman, +upper-case Roman, +lower-case sequential alphabetic, +or +upper-case sequential alphabetic +according to the format specified by \fBaf\fR. +.h1 +.bt +\fB&nr\fI|R|\(+-N|M\fR - - \fBu\fR \ +The number register \fIR\fR is assigned the value \fI\(+-N\fR +with respect to the previous value, if any. +The increment for auto-incrementing is set to \fIM\fR. +.bt +\fB&af\fI|R|c\fR arabic - - Assign format \fIc\fR to register \fIR\fR. +The available formats are: +.TS +center box; +c2|c +c2|c +c2|l. + Numbering +Format Sequence +_ +\fB1\fR 0,1,2,3,4,5,... +\fB001\fR 000,001,002,003,004,005,... +\fBi\fR 0,i,ii,iii,iv,v,... +\fBI\fR 0,I,II,III,IV,V,... +\fBa\fR 0,a,b,c,...,z,aa,ab,...,zz,aaa,... +\fBA\fR 0,A,B,C,...,Z,AA,AB,...,ZZ,AAA,... +.TE +An arabic format having \fIN\fR digits +specifies a field width of \fIN\fR digits (example 2 above). +The read-only registers and the \fIwidth\fR function (\(sc11.2) +are always arabic. 
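The auto-increment sequences and the af formats can be combined to number headings automatically. The following is a minimal sketch under stated assumptions: the register name Sn and the macro name Sh are hypothetical, and the doubled backslashes delay interpolation until the macro is invoked.

```roff
.\" Create register Sn with value 0 and auto-increment size 1.
.nr Sn 0 1
.\" Interpolate Sn in upper-case Roman.
.af Sn I
.\" Each call increments Sn, then prints it before the heading text.
.de Sh
.sp
\\n+(Sn. \\$1
..
.Sh Introduction
.Sh Background
```

With these definitions the two calls produce headings beginning "I. Introduction" and "II. Background". Changing the format to 001 with af would give "001." and "002." instead, without touching the macro.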
+.bt +\fB&rr\fI|R\fR - ignored - Remove register \fIR\fR. +If many registers are being created dynamically, it +may become necessary to remove no longer used registers +to recapture internal storage space for newer registers. diff --git a/share/doc/usd/21.troff/m3 b/share/doc/usd/21.troff/m3 new file mode 100644 index 0000000..01e0023 --- /dev/null +++ b/share/doc/usd/21.troff/m3 @@ -0,0 +1,521 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)m3 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.tr | +.rm mx +.mh +Tabs, Leaders, and Fields +.sc +Tabs and leaders. +The \s-1ASCII\s+1 horizontal tab character and the \s-1ASCII\s+1 +\s-1SOH\s+1 (hereafter known as the \fIleader\fR character) +can both be used to generate either horizontal motion or +a string of repeated characters. +The length of the generated entity is governed +by internal \fItab stops\fR specifiable +with \fBta\fR. +The default difference is that tabs generate motion and leaders generate +a string of periods; +\fBtc\fR and \fBlc\fR +offer the choice of repeated character or motion. +There are three types of internal tab stops\(em\ +\fIleft\fR adjusting, \fIright\fR adjusting, +and \fIcentering\fR. +In the following table: +\fID\fR is the distance from the current position on the \fIinput\fR line +(where a tab or leader was found) +to the next tab stop; +\fInext-string\fR consists +of the input characters following the tab (or leader) up to the next tab (or leader) or end of line; +and +\fIW\fR is the width of \fInext-string\fR. +.TS +center box; +c2|c2|c +c2|c2|c +c2|c2|l. +Tab Length of motion or Location of +type repeated characters \fInext-string\fR +_ +Left \fID\fR Following \fID\fR +Right \fID\-W\fR Right adjusted within \fID\fR +Centered \fID\-W\(sl\fR2 Centered on right end of \fID\fR +.TE +The length of generated motion is allowed to be negative, but +that of a repeated character string cannot be. 
+Repeated character strings contain an integer number of characters, and +any residual distance is prepended as motion. +Tabs or leaders found after the last tab stop are ignored, but may be used +as \fInext-string\fR terminators. +.pg +Tabs and leaders are not interpreted in \fIcopy mode\fR. +\fB\et\fR and \fB\ea\fR always generate a non-interpreted +tab and leader respectively, and +are equivalent to actual tabs and leaders in \fIcopy mode\fR. +.sc +Fields. +A \fIfield\fR is contained between +a \fIpair\fR of \fIfield delimiter\fR characters, +and consists of sub-strings +separated by \fIpadding\fR indicator characters. +The field length is the distance on the +\fIinput\fR line from the position where the field begins to the next tab stop. +The difference between the total length of all the sub-strings +and the field length is incorporated as horizontal +padding space that is divided among the indicated +padding places. +The incorporated padding is allowed to be negative. +For example, +if the field delimiter is \fB#\fR and the padding indicator is \fB^\fR, +\fB#^\fIxxx\fB^\fIright\|\fB#\fR +specifies a right-adjusted string with the string \fIxxx\fR centered +in the remaining space. +.h1 +.bt +\fB&ta\fI|Nt|...\fR 8n;|0.5in none E,\fBm\fR \ +Set tab stops and types. +\fIt=\fBR\fR, right adjusting; +\fIt=\fBC\fR, centering; +\fIt\fR absent, left adjusting. +\*(TR tab stops are preset every 0.5in.; +\*(NR every 8 character widths. +The stop values are separated by spaces, and +a value preceded by \fB+\fR +is treated as an increment to the previous stop value. +.bt +\fB&tc\fI|c\fR none none E \ +The tab repetition character becomes \fIc\fR, +or is removed specifying motion. +.bt +\fB&lc\fI|c\fR \fB.\fR none E \ +The leader repetition character becomes \fIc\fR, +or is removed specifying motion. +.bt +\fB&fc\fI|a|b\fR off off - \ +The field delimiter is set to \fIa\fR; +the padding indicator is set to the \fIspace\fR character or to +\fIb\fR, if given. 
+In the absence of arguments the field mechanism is turned off. +.mh +Input and Output Conventions and Character Translations +.sc +Input character translations. +Ways of inputting the graphic character set were +discussed in \(sc2.1. +The \s-1ASCII\s+1 control characters horizontal tab (\(sc9.1), +\s-1SOH\s+1 (\(sc9.1), and backspace (\(sc10.3) are discussed elsewhere. +The newline delimits input lines. +In addition, +\s-1STX\s+1, \s-1ETX\s+1, \s-1ENQ\s+1, \s-1ACK\s+1, and \s-1BEL\s+1 +are accepted, +and may be used as delimiters or translated into a graphic with \fBtr\fR (\(sc10.5). +\fIAll\fR others are ignored. +.pg +The \fIescape\fR character \fB\e\fR +introduces \fIescape sequences\fR\(em\ +causes the following character to mean +another character, or to indicate +some function. +A complete list of such sequences is given in the Summary and Index on page 6. +\fB\e\fR +should not be confused with the \s-1ASCII\s+1 control character \s-1ESC\s+1 of the +same name. +The escape character \fB\e\fR can be input with the sequence \fB\e\e\fR. +The escape character can be changed with \fBec\fR, +and all that has been said about the default \fB\e\fR becomes true +for the new escape character. +\fB\ee\fR can be used to print whatever the current escape character is. +If necessary or convenient, the escape mechanism may be turned off with \fBeo\fR, +and restored with \fBec\fR. +.h1 +.bt +\fB&ec\fI|c\fR \fB\e\fR \fB\e\fR - \ +Set escape character to \fB\e\fR, or to \fIc\fR, if given. +.bt +\fB&eo\fR on - - Turn escape mechanism off. +.sc +Ligatures. +.lg 0 +Five ligatures are available +in the current \*(TR character set \(em +\fB\(fi\fR, \fB\(fl\fR, \fB\(ff\fR, \fB\(Fi\fR, and \fB\(Fl\fR. +They may be input (even in \*(NR) by +\fB\e(fi\fR, \fB\e(fl\fR, \fB\e(ff\fR, \fB\e(Fi\fR, and \fB\e(Fl\fR respectively. +.lg +The ligature mode is normally on in \*(TR, and \fIautomatically\fR invokes +ligatures during input. 
+.h1 +.bt +\fB&lg\fI|N\fR off;|on on - Ligature mode +is turned on if \fIN\fR is absent or non-zero, +and turned off if \fIN\(eq\^\fR0. +If \fIN\fR\(eq\^2, only the two-character ligatures are automatically invoked. +Ligature mode is inhibited for +request, macro, string, register, or file names, +and in \fIcopy mode\fR. +No effect in \*(NR. +.sc +Backspacing, underlining, overstriking, etc. +Unless in \fIcopy mode\fR, the \s-1ASCII\s+1 backspace character is replaced +by a backward horizontal motion having the width of the +space character. +Underlining as a form of line-drawing is discussed in \(sc12.4. +A generalized overstriking function is described in \(sc12.1. +.pg +\*(NR automatically underlines +characters in the \fIunderline\fR font, +specifiable with \fBuf\fR, +normally Times Italic on font position 2 (see \(sc2.2). +In addition to \fBft\fR and \fB\ef\fIF\fR, +the underline font may be selected by \fBul\fR and \fBcu\fR. +Underlining is restricted to an output-device-dependent +subset of \fIreasonable\fR characters. +.h1 +.bt +\fB&ul\fI|N\fR off \fIN\(eq\fR1 E \ +Underline in \*(NR (italicize in \*(TR) the next \fIN\fR +input text lines. +Actually, switch to \fIunderline\fR font, saving the +current font for later restoration; +\fIother\fR font changes within the span of a \fBul\fR +will take effect, +but the restoration will undo the last change. +Output generated by \fBtl\fR (\(sc14) \fIis\fR affected by the +font change, but does \fInot\fR decrement \fIN\fR. +If \fIN\fR\^>\^1, there is the risk that +a trap interpolated macro may provide text +lines within the span; +environment switching can prevent this. +.bt +\fB&cu\fI|N\fR off \fIN\(eq\fR1 E \ +A variant of \fBul\fR that causes \fIevery\fR character to be underlined in \*(NR. +Identical to \fBul\fR in \*(TR. +.bt +\fB&uf\fI|F\fR Italic Italic - \ +Underline font set to \fIF\fR. +In \*(NR, +\fIF\fR may \fInot\fR be on position 1 (initially Times Roman). +.sc +Control characters. 
+Both the control character \fB.\fR and the \fIno-break\fR +control character \fB\'\fR may be changed, if desired. +Such a change must be compatible with the design +of any macros used in the span of the change, +and +particularly of any trap-invoked macros. +.h1 +.bt +\fB&cc\fI|c\fR \fB.\fR \fB.\fR E \ +The basic control character is set to \fIc\fR, +or reset to "\fB.\fR". +.bt +\fB&c2\fI|c\fR \fB\' \'\fR E The \fInobreak\fR control character is set +to \fIc\fR, or reset to "\fB\'\fR". +.sc +Output translation. +One character can be made a stand-in for another character using \fBtr\fR. +All text processing (e. g. character comparisons) takes place +with the input (stand-in) character which appears to have the width of the final +character. +The graphic translation occurs at the moment of output +(including diversion). +.h1 +.bt +\fB&tr\fI|abcd....\fR none - O Translate \ +\fIa\fR into \fIb\fR, \fIc\fR into \fId\fR, etc. +If an odd number of characters is given, +the last one will be mapped into the space character. +To be consistent, a particular translation +must stay in effect from \fIinput\fR to \fIoutput\fR time. +.sc +Transparent throughput. +An input line beginning with a \fB\e!\fR is read in \fIcopy mode\fR and \fItransparently\fR output +(without the initial \fB\e!\fR); +the text processor is otherwise unaware of the line's presence. +This mechanism may be used to pass control information to a post-processor +or to imbed control lines in a macro created by a diversion. +.sc +Comments and concealed newlines. +An uncomfortably long input line that must stay +one line (e. g. a string definition, or nofilled text) +can be split into many physical lines by ending all but +the last one with the escape \fB\e\fR. +The sequence \fB\e\fR(newline) is \fIalways\fR ignored\(em\ +except in a comment. +Comments may be imbedded at the \fIend\fR of any line by +prefacing them with \fB\e"\fR. +The newline at the end of a comment cannot be concealed. 
+A line beginning with \fB\e"\fR will appear as a blank line and +behave like \fB.sp|1\fR; +a comment can be on a line by itself by beginning the line with \fB.\e"\fR. +.mh +Local Horizontal and Vertical Motions, and the Width Function +.sc +Local Motions. +The functions \fB\ev\'\fIN\fB\|\'\fR and +\fB\eh\'\fIN\fB\|\'\fR +can be used for \fIlocal\fR vertical and horizontal motion respectively. +The distance \fIN\fR may be negative; the \fIpositive\fR directions +are \fIrightward\fR and \fIdownward\fR. +A \fIlocal\fR motion is one contained \fIwithin\fR a line. +To avoid unexpected vertical dislocations, it is necessary that +the \fInet\fR vertical local motion within a word in filled text +and otherwise within a line balance to zero. +The above and certain other escape sequences providing local motion are +summarized in the following table. +.tr || +.ds X \0\0\0 +.TS +center box; +c2|cs2||c2|cs +c1|c2c2||c2|c2c. +Vertical Effect in Horizontal Effect in +Local Motion \*(TR \*(NR Local Motion \*(TR \*(NR +_ +.sp .4 +.T& +l2|ls2||l2|ls. +\fB\*X\ev\'\fIN\|\^\fB\'\fR Move distance \fIN\fR \ +\fB\*X\eh\'\fIN\|\^\fB\'\fR Move distance \fIN\fR +.T& +_2|_2_2||l2|ls. + \fB\*X\e\fR(space) Unpaddable space-size space +.T& +l2|l2|l2||l2|ls. +\fB\*X\eu\fR \(12 em up \(12 line up \fB\*X\e0\fR Digit-size space +.T& +l2|l2|l2||_2|_2_. +\fB\*X\ed\fR \(12 em down \(12 line down +.T& +l2|l2|l2||l2|l2|l. +\fB\*X\er\fR 1 em up 1 line up \fB\*X\e\||\fR 1\(sl6 em space ignored + \fB\*X\e^\fR 1\(sl12 em space ignored +.sp .4 +.TE +.rm X +.tr | +As an example, +\fBE\s-2\v'-.4m'2\v'.4m'\s+2\fR +could be generated by the sequence +\fBE\es\-2\ev\'\-0.4m\'2\ev\'0.4m\'\es+2\fR; +it should be noted in this example that +the 0.4|em vertical motions are at the smaller size. +.sc +Width Function. +The \fIwidth\fR function \fB\ew\'\fIstring\fB\|\'\fR +generates the numerical width of \fIstring\fR (in basic units). 
+Size and font changes may be safely imbedded in \fIstring\fR, +and will not affect the current environment. +For example, +\&\fB.ti|\-\\w\'1.|\'u\fR could be used to +temporarily indent leftward a distance equal to the +size of the string "\fB1.|\fR". +.pg +The width function also sets three number registers. +The registers \fBst\fR and \fBsb\fR are set respectively to the highest and +lowest extent of \fIstring\fR relative to the baseline; +then, for example, +the total \fIheight\fR of the string is \fB\en(stu\-\en(sbu\fR. +In \*(TR the number register \fBct\fR is set to a value +between 0|and|3: +0 means that all of the characters in \fIstring\fR were short lower +case characters without descenders (like \fBe\fR); +1 means that at least one character has a descender (like \fBy\fR); +2 means that at least one character is tall (like \fBH\fR); +and 3 means that both tall characters and characters with +descenders are present. +.sc +Mark horizontal place. +The escape sequence \fB\ek\fIx\fR will cause the \fIcurrent\fR horizontal +position in the \fIinput line\fR to be stored in register \fIx\fR. +As an example, +the construction \fB\ekx\fIword\|\fB\eh\'\|~\|\enxu+2u\'\fIword\fB\fR +will embolden \fIword\fR by backing up to almost its beginning and overprinting it, +resulting in \kz\fIword\fR\h'|\nzu+2u'\fIword\fR. +.mh +Overstrike, Bracket, Line-drawing, and Zero-width Functions +.sc +Overstriking. +Automatically centered overstriking of up to nine characters +is provided by the \fIoverstrike\fR function +\fB\eo\'\fIstring\fB\|\'\fR. +The characters in \fIstring\fR are overprinted with centers aligned; the total width +is that of the widest character. +\fIstring\fR should \fInot\fR contain local vertical motion. +As examples, +\fB\eo\'e\e\'\'\fR produces \fB\o'e\''\fR, and +\fB\eo\'\e(mo\e(sl\'\fR produces \fB\o'\(mo\(sl'\fR. +.sc +Zero-width characters. 
+The function \fB\ez\fIc\fR will output \fIc\fR without spacing over +it, and can be used to produce left-aligned overstruck +combinations. +As examples, +\fB\ez\e(ci\e(pl\fR will produce \fB\z\(ci\(pl\fR, and +\fB\e(br\ez\e(rn\e(ul\e(br\fR will produce the smallest possible +constructed box \fB\(br\z\(rn\(ul\(br\fR\|. +.sc +Large Brackets. +The Special Mathematical Font contains a number of bracket construction pieces +(\|\|\|\(lt\|\|\|\(lb\|\|\|\(rt\|\|\|\(rb\|\|\|\(lk\|\|\|\(rk\|\|\|\(bv\|\|\|\(lf\|\|\|\(rf\|\|\|\(lc\|\|\|\(rc\|\|) +that can be combined into various bracket styles. +The function \fB\eb\'\fIstring\fB\|\'\fR may be used to pile +up vertically the characters in \fIstring\fR +(the first character on top and the last at the bottom); +the characters are vertically separated by 1|em and the total +pile is centered 1\(sl2\|em above the current baseline +(\(12 line in \*(NR). +For example, +\fB\eb\'\|\e(lc\e(lf\|\'E\e\|~\|\eb\'\|\e(rc\e(rf\|\'\|\ex\'\|\-0.5m\'\|\ex\'0.5m\'\|\fR +produces +\x'-.5m'\x'.5m'\fB\b'\(lc\(lf'E\|\b'\(rc\(rf'\fR. +.sc +Line drawing. +.tr && +The function \fB\e\|l\|\'\fINc\fB\|\'\fR will draw a string of repeated \fIc\fR\|'s towards the right for a distance \fIN\fR. +(\|\fB\el\fR is \fB\e\fR(lower case L). +If \fIc\fR looks like a continuation of +an expression for \fIN\fR, it may be insulated from \fIN\fR with a \fB\e&\fR. +If \fIc\fR is not specified, the \fB\(ru\fR (baseline rule) is used +(underline character in \*(NR). +If \fIN\fR is negative, a backward horizontal motion +of size \fIN\fR is made \fIbefore\fR drawing the string. +Any space resulting from \fIN\fR\|\(sl(size of \fIc\fR) having a remainder is put at the beginning (left end) +of the string. +In the case of characters +that are designed to be connected such as +baseline-rule\ \fB\(ru\fR\|, +underrule\ \fB\(ul\fR\|, +and +root-en\ \fB\(rn\fR\|, +the remainder space is covered by over-lapping.
+If \fIN\fR is \fIless\fR than the width of \fIc\fR, +a single \fIc\fR is centered on a distance \fIN\fR. +As an example, a macro to underscore a string can be written +.br +.tr &. +.x1 +.ft B +.ne 2.1 +&de us +\e\e$1\e\|l\|\'\|~\|0\e(ul\' +&& +.ft R +.x2 +.ne 2.1 +.de xu +\\$1\l'|0\(ul' +.. +or one to draw a box around a string +.x1 +.ft B +&de bx +\e(br\e\|~\|\e\e$1\e\|~\|\e(br\e\|l\|\'\|~\|0\e(rn\'\e\|l\|\'\|~\|0\e(ul\' +&& +.ft R +.x2 +.de bx +\(br\|\\$1\|\(br\l'|0\(rn'\l'|0\(ul' +.. +such that +.x1 +.ft B +&us "underlined words" +.ft R +.x2 +and +.x1 +.ft B +&bx "words in a box" +.ft R +.x2 +yield +.xu "underlined words" +and +.bx "words in a box" +\h'-\w'.'u'. +.pg +The function \fB\eL\'\|\fINc\fB\|\'\fR will draw a vertical line consisting +of the (optional) character \fIc\fR stacked vertically apart 1\|em +(1 line in \*(NR), +with the first two characters overlapped, +if necessary, to form a continuous line. +The default character is the \fIbox rule\fR |\(br| (\fB\|\e(br\fR); +the other suitable character is the \fIbold vertical\fR \|\(bv\| (\fB\|\e(bv\fR). +The line is begun without any initial motion relative to the +current base line. +A positive \fIN\fR specifies a line drawn downward and +a negative \fIN\fR specifies a line drawn upward. +After the line is drawn \fIno\fR compensating +motions are made; +the instantaneous baseline is at the \fIend\fR of the line. +.pg +.de eb +.sp -1 +.nf +\h'-.5n'\L'|\\nzu-1'\l'\\n(.lu+1n\(ul'\L'-|\\nzu+1'\l'|0u-.5n\(ul' +.fi +.. +.ne 2i +.mk z +The horizontal and vertical line drawing functions may be used +in combination to produce large boxes. +The zero-width \fIbox-rule\fR and the \(12-em wide \fIunderrule\fR +were \fIdesigned\fR to form corners when using 1-em vertical +spacings. 
+For example the macro +.x1 +.ft B +\&.de eb +\&.sp \-1 \e"compensate for next automatic base-line spacing +\&.nf \e"avoid possibly overflowing word buffer +.tr || +\&\eh\'\-.5n\'\eL\'\||\|\e\enau\-1\'\el\'\e\en(.lu+1n\e(ul\'\eL\'\-\||\|\e\enau+1\'\el\'\||\|0u\-.5n\e(ul\' \e"draw box +.tr | +.lg 0 +\&.fi +.lg +\&.. +.ft R +.x2 +will draw a box around some text whose beginning vertical place was +saved in number register \fIa\fR +(e. g. using \fB.mk|a\fR) +as done for this paragraph. +.eb diff --git a/share/doc/usd/21.troff/m4 b/share/doc/usd/21.troff/m4 new file mode 100644 index 0000000..931ac48 --- /dev/null +++ b/share/doc/usd/21.troff/m4 @@ -0,0 +1,416 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. 
AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)m4 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.tr | +.mh +Hyphenation. +.pg +The automatic hyphenation may be switched off and on. +When switched on with \fBhy\fR, +several variants may be set. +A \fIhyphenation indicator\fR character may be imbedded in a word to +specify desired hyphenation points, +or may be prepended to suppress hyphenation. +In addition, +the user may specify a small exception word list. +.pg +Only words that consist of a central alphabetic string +surrounded by (usually null) non-alphabetic strings +are considered candidates for automatic hyphenation. +Words that were input containing hyphens +(minus), +em-dashes (\fB\e(em\fR), +or hyphenation indicator characters\ +\(emsuch as mother-in-law\(em\ +are \fIalways\fR subject to splitting after those characters, +whether or not automatic hyphenation is on or off. +.h1 +.bt +\fB&nh\fR hyphenate - E \ +Automatic hyphenation is turned off. +.bt +\fB&hy\fIN\fR on,\fIN=\fR1 on,\fIN=\fR1 E \ +Automatic hyphenation is turned on +for \fIN\fR\|\(>=1, or off for \fIN=\fR\|0. +If \fIN=\fR\|2, \fIlast\fR lines (ones that will cause a trap) +are not hyphenated. +For \fIN=\fR\|4 and 8, the last and first two characters +respectively of a word are not split off. 
+These values are additive; +i.|e. \fIN=\fR\|14 will invoke all three restrictions. +.bt +\fB&hc\fI|c\fR \fB\e% \e%\fR E Hyphenation indicator character is set +to \fIc\fR or to the default \fB\e%\fR. +The indicator does not appear in the output. +.bt +\fB&hw\fI|word1|...\fR ignored - Specify hyphenation points in words +with imbedded minus signs. +Versions of a word with terminal \fIs\fR are implied; +i.|e. \fIdig\-it\fR implies \fIdig\-its\fR. +This list is examined initially \fIand\fR after +each suffix stripping. +The space available is small\(emabout 128 characters. +.mh +Three Part Titles. +.pg +The titling function \fBtl\fR provides for automatic placement +of three fields at the left, center, and right of a line +with a title-length +specifiable with \fBlt\fR. +\fBtl\fR may be used anywhere, and is independent of the +normal text collecting process. +A common use is in header and footer macros. +.h1 +.bt +\fB&tl\fI|\'left\|\'center\|\'right\|\'\fR - - \ +The strings \fIleft\fR, \fIcenter\fR, and \fIright\fR are +respectively left-adjusted, centered, and right-adjusted +in the current title-length. +Any of the strings may be empty, +and overlapping is permitted. +If the page-number character (initially \fB%\fR) is found within any of the fields it is replaced +by the current page number having the format assigned to register \fB%\fR. +Any character may be used as the string delimiter. +.bt +\fB&pc\fI|c\fR \fB%\fR off - The page number character is set to \fIc\fR, +or removed. +The page-number register remains \fB%\fR. +.bt +\fB<\fI|\(+-N\fR 6.5\|in previous E,\fBm\fR Length of title set to \fI\(+-N\fR. +The line-length and the title-length are \fIindependent\fR. +Indents do not apply to titles; page-offsets do. +.mh +Output Line Numbering. +.pg +.ll -\w'0000'u +.nm 1 3 +Automatic sequence numbering of output lines may be +requested with \fBnm\fR. +When in effect, +a three-digit, arabic number plus a digit-space +is prepended to output text lines. 
+The text lines are thus offset by four digit-spaces, +and otherwise retain their line length; +a reduction in line length may be desired to keep the right margin +aligned with an earlier margin. +Blank lines, other vertical spaces, and lines generated by \fBtl\fR +are \fInot\fR numbered. +Numbering can be temporarily suspended with \fBnn\fR, +or with an \fB.nm\fR followed by a later \fB.nm|+0\fR. +In addition, +a line number indent \fII\fR, and the number-text separation \fIS\fR +may be specified in digit-spaces. +Further, it can be specified that only those line numbers that are +multiples of some number \fIM\fR are to be printed (the others will appear +as blank number fields). +.br +.nm +.ll +.h1 +.bt +\fB&nm\fI|\(+-N|M|S|I\fR off E \ +Line number mode. +If \fI\(+-N\fR is given, +line numbering is turned on, +and the next output line numbered is numbered \fI\(+-N\fR. +Default values are \fIM=\fR\|1, \fIS=\fR\|1, and \fII=\fR\|0. +Parameters corresponding to missing arguments are unaffected; +a non-numeric argument is considered missing. +In the absence of all arguments, numbering is turned off; +the next line number is preserved for possible further use +in number register \fBln\fR. +.bt +\fB&nn\fI|N\fR - \fIN=\fR1 E The next \fIN\fR text output lines are not +numbered. +.pg +.ll -\w'0000'u +.nm +0 +As an example, the paragraph portions of this section +are numbered with \fIM=\fR\|3: +\&\fB.nm|1|3\fR was placed at the beginning; +\&\fB.nm\fR was placed at the end of the first paragraph; +and \fB.nm|+0\fR was placed in front of this paragraph; +and \fB.nm\fR finally placed at the end. +Line lengths were also changed (by \fB\ew\'0000\'u\fR) to keep the right side aligned. +Another example is +\&\fB.nm|+5|5|x|3\fR which turns on numbering with the line number of the next +line to be five greater than the last numbered line, +with \fIM=\fR\|5, with spacing \fIS\fR untouched, and with the indent \fII\fR set to 3. 
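The numbering requests described above can be sketched as follows. This is only an illustrative fragment, not part of the original manual: the values chosen for \fIN\fR and \fIM\fR and the sample text are assumptions.

```troff
.\" Sketch only: number output lines, printing every fifth number.
.nm 1 5      \" numbering on; next line is 1, and M=5 prints only multiples of 5
These text lines are numbered as they are output.
.nn 2        \" the next 2 text output lines are not numbered
.nm          \" numbering off; the next line number is kept in register ln
.nm +0       \" numbering back on, continuing from the preserved number
```

Because missing arguments leave their parameters unaffected, the final \fB.nm|+0\fR should keep \fIM=\fR\|5 from the first request.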
+.br +.ll +.nm +.mh +Conditional Acceptance of Input +.pg +In the following, +\fIc\fR is a one-character, built-in \fIcondition\fR name, +\fB!\fR signifies \fInot\fR, +\fIN\fR is a numerical expression, +\fIstring1\fR and \fIstring2\fR are strings delimited by any non-blank, non-numeric character \fInot\fR in the strings, +and +\fIanything\fR represents what is conditionally accepted. +.h1 +.bt +\fB&if\fI|c|anything\fR - - If condition \fIc\fR true, accept \fIanything\fR as input; +in multi-line case use \fI\e{anything\|\e}\fR. +.bt +\fB&if|!\fIc|anything\fR - - If condition \fIc\fR false, accept \fIanything\fR. +.bt +\fB&if\fI|N|anything\fR - \fBu\fR If expression \fIN\fR > 0, accept \fIanything\fR. +.bt +\fB&if|!\fIN|anything\fR - \fBu\fR If expression \fIN\fR \(<= 0, accept \fIanything\fR. +.bt +\fB&if\fI|\|\'string1\|\'string2\|\'|anything\fR - If \fIstring1\fR identical to \fIstring2\fR, +accept \fIanything\fR. +.bt +\fB&if|!\fI\|\'string1\|\'string2\|\'|anything\fR - If \fIstring1\fR not identical to \fIstring2\fR, +accept \fIanything\fR. +.bt +\fB&ie\fI|c|anything\fR - \fBu\fR If portion of if-else; all above forms (like \fBif\fR). +.bt +\fB&el\fI|anything\fR - - Else portion of if-else. +.pg +The built-in condition names are: +.TS +center box; +c2|c +c2|c +c2|l. +Condition +Name True If +_ +\fBo\fR Current page number is odd +\fBe\fR Current page number is even +\fBt\fR Formatter is \*(TR +\fBn\fR Formatter is \*(NR +.TE +If the condition \fIc\fR is \fItrue\fR, or if the number \fIN\fR is greater than zero, +or if the strings compare identically (including motions and character size and font), +\fIanything\fR is accepted as input. +If a \fB!\fR precedes the condition, number, or string comparison, +the sense of the acceptance is reversed. +.pg +Any spaces between the condition and the beginning of \fIanything\fR are skipped over. +The \fIanything\fR can be either a single input line (text, macro, or whatever) +or a number of input lines. 
+In the multi-line case, +the first line must begin with a left delimiter \fB\e{\fR and +the last line must end with a right delimiter \fB\e}\fR. +.pg +The request \fBie\fR (if-else) is identical to \fBif\fR +except that the acceptance state is remembered. +A subsequent and matching \fBel\fR (else) request then uses the reverse sense of that state. +\fBie\fR|-|\fBel\fR pairs may be nested. +.pg +Some examples are: +.x1 +.ft B +.ne 1 +&if e .tl \'\|Even Page %\'\'\' +.ft R +.x2 +which outputs a title if the page number is even; and +.x1 +.ft B +.ne 3.1 +&ie \en%>1 \e{\e +\&\'sp 0.5i +&tl \'\|Page %\'\'\' +\&\'sp ~\|1.2i|\e} +&el .sp ~\|2.5i +.ft R +.x2 +which treats page 1 differently from other pages. +.mh +Environment Switching. +.pg +A number of the parameters that +control the text processing are gathered together into an +\fIenvironment\fR, which can be switched by the user. +The environment parameters are those associated +with requests noting E in their \fINotes\fR column; +in addition, partially collected lines and words are in the environment. +Everything else is global; examples are page-oriented parameters, +diversion-oriented parameters, number registers, and macro and string definitions. +All environments are initialized with default parameter values. +.h1 +.bt +\fB&ev\fI|N\fR \fIN\(eq\fR0 previous - Environment switched to +environment 0\(<=\fIN\fR\(<=2. +Switching is done in push-down fashion so that +restoring a previous environment \fImust\fR be done with \fB.ev\fR +rather than specific reference. +.mh +Insertions from the Standard Input +.pg +The input can be temporarily switched to the system \fIstandard input\fR +with \fBrd\fR, +which will switch back when \fItwo\fR newlines +in a row are found (the \fIextra\fR blank line is not used). +This mechanism is intended for insertions in form-letter-like documentation. +On \s-1UNIX\s+1, the \fIstandard input\fR can be the user's keyboard, +a \fIpipe\fR, or a \fIfile\fR. 
+.h1 +.bt +\fB&rd\fI|prompt\fR - \fIprompt=\fR\s-1BEL\s+1 \ +Read insertion from the standard input until two newlines in a row are found. +If the standard input is the user's keyboard, \fIprompt\fR (or a \s-1BEL\s+1) +is written onto the user's terminal. +\fBrd\fR behaves like a macro, +and arguments may be placed after \fIprompt\fR. +.bt +\fB&ex\fR - - - Exit from \*(NR\(sl\*(TR. +Text processing is terminated exactly as if all input had ended. +.pg +If insertions are to be +taken from the terminal keyboard \fIwhile\fR output is being printed +on the terminal, the command line option \fB\-q\fR will turn off the echoing +of keyboard input and prompt only with \s-1BEL\s+1. +The regular input and insertion input \fIcannot\fR +simultaneously come from the standard input. +.pg +As an example, +multiple copies of a form letter may be prepared by entering the insertions +for all the copies in one file to be used as the standard input, +and causing the file containing the letter to reinvoke itself using \fBnx\fR (\(sc19); +the process would ultimately be ended by an \fBex\fR in the insertion file. +.mh +Input\(slOutput File Switching +.pg +The (read-only) number register \fB.c\fR contains the input line number in +the current input file. The number register \fBc.\fR is a general register +serving the same purpose. +.h1 +.bt +\fB&so\fI|filename\fR - - Switch source file. +The top input (file reading) level is switched to \fIfilename\fR. +The effect of an \fBso\fR encountered in a macro +occurs immediately. +When the new file ends, +input is again taken from the original file. +\fBso\fR's may be nested. +.bt +\fB&nx\fI|filename\fR end-of-file - Next file is \fIfilename\fR. +The current file is considered ended, and the input is immediately switched +to \fIfilename\fR. +.bt +\fB&pi\fI|program\fR - - Pipe output to \fIprogram\fR (\*(NR only). +This request must occur \fIbefore\fR any printing occurs. +No arguments are transmitted to \fIprogram\fR. 
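A minimal sketch of the file-switching requests described above. The file names are hypothetical and are not taken from the original text.

```troff
.\" Sketch only; macros.local and chapter2 are hypothetical file names.
.so macros.local   \" read macros.local now, then resume this file
Text of the current file continues here.
.nx chapter2       \" consider this file ended; switch input to chapter2
```

Anything placed after the \fBnx\fR line in the current file would not be read, since the current file is considered ended at that point.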
+.mh +Miscellaneous +.pg +.h1 +.bt +.mc \s12\(br\s0 +\fB&mc\fI|c|N\fR - off E,\fBm\fR \ +Specifies that a \fImargin\fR character \fIc\fR appear a distance +\fIN\fR to the right of the right margin +after each non-empty text line (except those produced by \fBtl\fR). +If the output line is too long (as can happen in nofill mode) +the character will be appended to the line. +If \fIN\fR is not given, the previous \fIN\fR is used; the initial \fIN\fR is +0.2|inches in \*(NR and 1\|em in \*(TR. +The margin character used with this paragraph was a 12-point box-rule. +.br +.mc +.bt +\fB&tm\fI|string\fR - newline - \ +After skipping initial blanks, \fIstring\fR (rest of the line) is read in \fIcopy mode\fR +and written on the user's terminal (see \(sc21). +.bt +\fB&ig\fI|yy\fR - \fI.yy=\fB..\fR - Ignore \ +input lines. +\fBig\fR behaves exactly like \fBde\fR (\(sc7) except that the +input is discarded. +The input is read in \fIcopy mode\fR, and any auto-incremented +registers will be affected. +.bt +\fB&pm\fI|t\fR - all - \ +Print macros. +The names and sizes of all of the defined macros and strings are printed +on the user's terminal; +if \fIt\fR is given, only the total of the sizes is printed. +The size is given in \fIblocks\fR +of 128 characters. +.bt +\fB&ab\fI|string\fR - - - \ +Print \fIstring\fR on standard error and terminate immediately. The +default \fIstring\fR is "User Abort". Does not cause a break. Only output +preceding the last break is written. +.bt +.lg 0 +\fB&fl\fR - - B \c +.lg +Flush output buffer. +Used in interactive debugging to force output. +.mh +Output and Error Messages. +.pg +The output from \fBtm\fR, \fBpm\fR, \fBab\fR and the prompt from \fBrd\fR, +as well as various \fIerror\fR messages are written onto +\s-1UNIX\s+1's \fIstandard error\fR output. +The latter is different from the \fIstandard output\fR, +where \*(NR formatted output goes. +By default, both are written onto the user's terminal, +but they can be independently redirected.
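As a hedged illustration of the requests that write to the standard error output, the fragment below reports progress with \fBtm\fR and stops with \fBab\fR. The message texts and the 100-page limit are assumptions made for this sketch.

```troff
.\" Sketch only: the messages and the page limit are illustrative.
.tm reached page \n%
.if \n%>100 .ab page limit exceeded
```

Since \fBtm\fR reads its argument in copy mode, the page-number register should be interpolated before the message is written, and neither message becomes part of the formatted output.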
+.pg +Various \fIerror\fR conditions may occur during +the operation of \*(NR and \*(TR. +Certain less serious errors having only local impact do not +cause processing to terminate. +Two examples are \fIword overflow\fR, caused by a word that is too large +to fit into the word buffer (in fill mode), and +\fIline overflow\fR, caused by an output line that grew too large +to fit in the line buffer; +in both cases, a message is printed, the offending excess +is discarded, +and the affected word or line is marked at the point of truncation +with a \(** in \*(NR and a \(lh in \*(TR. +The philosophy is to continue processing, if possible, +on the grounds that output useful for debugging may be produced. +If a serious error occurs, processing terminates, +and an appropriate message is printed. +Examples are the inability to create, read, or write files, +and the exceeding of certain internal limits that +make future output unlikely to be useful. +.ps 10 +.vs 12 +.ft R +.bp diff --git a/share/doc/usd/21.troff/m5 b/share/doc/usd/21.troff/m5 new file mode 100644 index 0000000..692db28 --- /dev/null +++ b/share/doc/usd/21.troff/m5 @@ -0,0 +1,462 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. 
+.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)m5 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.ds H T +.tr | +.tr ~| +.de x1 +.xx +.ft B +.in .2i +.nf +.ne 2.1 +.ta 1i +.. +.de x2 +.fi +.in 0 +.ft R +.xx +.. +.br +.ce +.ft B +.rs +.sp 0.5i +TUTORIAL EXAMPLES +.ft R +.sp 2 +.nr p 0 +.2C +.ns +.mh +.mk +Introduction +.pg +Although \*(NR and \*(TR +have by design a syntax reminiscent +of earlier text processors* +.fn +.xx +*For example: +P.|A.|Crisman, Ed., +.ul +The Compatible Time-Sharing System, +MIT Press, 1965, Section|AH9.01 +(Description of RUNOFF program on MIT's CTSS system). +.ef +with the intent of easing their use, +it is almost always necessary to +prepare at least a small set of macro definitions +to describe most documents. 
+Such common formatting needs +as page margins and footnotes +are deliberately not built into \*(NR and \*(TR. +Instead, +the macro and string definition, number register, diversion, +environment switching, page-position trap, and conditional input mechanisms +provide the basis for user-defined implementations. +.pg +The examples to be discussed are intended to be useful and somewhat realistic, +but won't necessarily cover all relevant contingencies. +Explicit numerical parameters are used +in the examples +to make them easier to read and to +illustrate typical values. +In many cases, number registers would really be used +to reduce the number of places where numerical +information is kept, +and to concentrate conditional parameter initialization +like that which depends on whether \*(TR or \*(NR is being used. +.mh +Page Margins +.pg +As discussed in \(sc3, +\fIheader\fR and \fIfooter\fR macros are usually defined +to describe the top and bottom page margin areas respectively. +A trap is planted at page position 0 for the header, and at +\fI\-N\fR (\fIN\fR from the page bottom) for the footer. +The simplest such definitions might be +.x1 +&de hd \e"define header +\'sp 1i +&& \e"end definition +&de fo \e"define footer +\'bp +&& \e"end definition +&wh 0 hd +&wh \-1i fo +.x2 +which provide blank 1|inch top and bottom margins. +The header will occur on the \fIfirst\fR page, +only if the definition and trap exist prior to +the initial pseudo-page transition (\(sc3). +In fill mode, the output line that springs the footer trap +was typically forced out because some part or whole word didn't fit on it. +If anything in the footer and header that follows causes a \fIbreak\fR, +that word or part word will be forced out. +In this and other examples, +requests like \fBbp\fR and \fBsp\fR that normally cause breaks are invoked using +the \fIno-break\fR control character \fB\'\fR +to avoid this. 
+When the header\(slfooter design contains material +requiring independent text processing, the +environment may be switched, avoiding +most interaction with the running text. +.pg +A more realistic example would be +.x1 +&de hd \e"header +&if t .tl \'\|\e(rn\'\'\e(rn\' \e"troff cut mark +&if \e\en%>1 \e{\e +\'sp ~\|0.5i\-1 \e"tl base at 0.5i +&tl \'\'\- % \-\'\' \e"centered page number +&ps \e"restore size +&ft \e"restore font +&vs \e} \e"restore vs +\'sp ~\|1.0i \e"space to 1.0i +&ns \e"turn on no-space mode +&& +&de fo \e"footer +&ps 10 \e"set footer\(slheader size +&ft R \e"set font +&vs 12p \e"set base-line spacing +&if \e\en%=1 \e{\e +\'sp ~\|\e\en(.pu\-0.5i\-1 \e"tl base 0.5i up +&tl \'\'\- % \-\'\' \e} \e"first page number +\'bp +&& +&wh 0 hd +&wh \-1i fo +.x2 +which sets the size, font, and base-line spacing for the +header\(slfooter material, and ultimately restores them. +The material in this case is a page number at the bottom of the +first page and at the top of the remaining pages. +If \*(TR is used, a \fIcut mark\fR is drawn in the form +of \fIroot-en\fR's at each margin. +The \fBsp\fR's refer to absolute positions to avoid +dependence on the base-line spacing. +Another reason for this in the footer +is that the footer is invoked by printing a line whose +vertical spacing swept past the trap position by possibly +as much as the base-line spacing. +The \fIno-space\fR mode is turned on at the end of \fBhd\fR +to render ineffective +accidental occurrences of \fBsp\fR at the top of the running text. +.pg +The above method of restoring size, font, etc. presupposes +that such requests (that set \fIprevious\fR value) are \fInot\fR +used in the running text. 
+A better scheme is to save and restore both the current \fIand\fR +previous values as shown for size in the following: +.x1 +&de fo +&nr s1 \e\en(.s \e"current size +&ps +&nr s2 \e\en(.s \e"previous size +& --- \e"rest of footer +&& +&de hd +& --- \e"header stuff +&ps \e\en(s2 \e"restore previous size +&ps \e\en(s1 \e"restore current size +&& +.x2 +Page numbers may be printed in the bottom margin +by a separate macro triggered during the footer's +page ejection: +.x1 +&de bn \e"bottom number +&tl \'\'\- % \-\'\' \e"centered page number +&& +&wh \-0.5i\-1v bn \e"tl base 0.5i up +.x2 +.mh +Paragraphs and Headings +.pg +The housekeeping +associated with starting a new paragraph should be collected +in a paragraph macro +that, for example, +does the desired preparagraph spacing, +forces the correct font, size, base-line spacing, and indent, +checks that enough space remains for \fImore than one\fR line, +and +requests a temporary indent. +.x1 +&de pg \e"paragraph +&br \e"break +&ft R \e"force font, +&ps 10 \e"size, +&vs 12p \e"spacing, +&in 0 \e"and indent +&sp 0.4 \e"prespace +&ne 1+\e\en(.Vu \e"want more than 1 line +&ti 0.2i \e"temp indent +&& +.x2 +The first break in \fBpg\fR +will force out any previous partial lines, +and must occur before the \fBvs\fR. +The forcing of font, etc. is +partly a defense against prior error and +partly to permit +things like section heading macros to +set parameters only once. +The prespacing parameter is suitable for \*(TR; +a larger space, at least as big as the output device vertical resolution, would be +more suitable in \*(NR. +The choice of remaining space to test for in the \fBne\fR +is the smallest amount greater than one line +(the \fB.V\fR is the available vertical resolution). +.pg +A macro to automatically number section headings +might look like: +.x1 +&de sc \e"section +& --- \e"force font, etc. +&sp 0.4 \e"prespace +&ne 2.4+\e\en(.Vu \e"want 2.4+ lines +.lg 0 +&fi +.lg +\e\en+S.
+&& +&nr S 0 1 \e"init S +.x2 +The usage is \fB.sc\fR, +followed by the section heading text, +followed by \fB.pg\fR. +The \fBne\fR test value includes one line of heading, +0.4 line in the following \fBpg\fR, and +one line of the paragraph text. +A word consisting of the next section number and a period is +produced to begin the heading line. +The format of the number may be set by \fBaf\fR (\(sc8). +.pg +Another common form is the labeled, indented paragraph, +where the label protrudes left into the indent space. +.x1 +&de lp \e"labeled paragraph +&pg +&in 0.5i \e"paragraph indent +&ta 0.2i 0.5i \e"label, paragraph +&ti 0 +\et\e\e$1\et\ec \e"flow into paragraph +&& +.x2 +The intended usage is "\fB.lp\fR \fIlabel\fR\|"; +\fIlabel\fR will begin at 0.2\|inch, and +cannot exceed a length of 0.3\|inch without intruding into +the paragraph. +The label could be right adjusted against 0.4\|inch by +setting the tabs instead with \fB.ta|0.4iR|0.5i\fR. +The last line of \fBlp\fR ends with \fB\ec\fR so that +it will become a part of the first line of the text +that follows. +.mh +Multiple Column Output +.pg +The production of multiple column pages requires +the footer macro to decide whether it was +invoked by other than the last column, +so that it will begin a new column rather than +produce the bottom margin. +The header can initialize a column register that +the footer will increment and test. +The following is arranged for two columns, but +is easily modified for more. 
+.x1 +&de hd \e"header +& --- +&nr cl 0 1 \e"init column count +&mk \e"mark top of text +&& +&de fo \e"footer +&ie \e\en+(cl<2 \e{\e +&po +3.4i \e"next column; 3.1+0.3 +&rt \e"back to mark +&ns \e} \e"no-space mode +&el \e{\e +&po \e\enMu \e"restore left margin +& --- +\'bp \e} +&& +&ll 3.1i \e"column width +&nr M \e\en(.o \e"save left margin +.x2 +Typically a portion of the top of the first page +contains full width text; +the request for the narrower line length, +as well as another \fB.mk\fR would +be made where the two column output was to begin. +.mh +Footnote Processing +.pg +The footnote mechanism to be described is used by +imbedding the footnotes in the input text at the +point of reference, +demarcated by an initial \fB.fn\fR and a terminal \fB.ef\fR: +.x1 +&fn +\fIFootnote text and control lines...\fP +&ef +.x2 +In the following, +footnotes are processed in a separate environment and diverted +for later printing in the space immediately prior to the bottom +margin. +There is provision for the case where the last collected +footnote doesn't completely fit in the available space. 
+.x1 +&de hd \e"header +& --- +&nr x 0 1 \e"init footnote count +&nr y 0\-\e\enb \e"current footer place +&ch fo \-\e\enbu \e"reset footer trap +&if \e\en(dn .fz \e"leftover footnote +&& +&de fo \e"footer +&nr dn 0 \e"zero last diversion size +&if \e\enx \e{\e +&ev 1 \e"expand footnotes in ev1 +&nf \e"retain vertical size +&FN \e"footnotes +&rm FN \e"delete it +&if "\e\en(.z"fy" .di \e"end overflow diversion +&nr x 0 \e"disable fx +&ev \e} \e"pop environment +& --- +\'bp +&& +&de fx \e"process footnote overflow +&if \e\enx .di fy \e"divert overflow +&& +&de fn \e"start footnote +&da FN \e"divert (append) footnote +&ev 1 \e"in environment 1 +&if \e\en+x=1 .fs \e"if first, include separator +.lg 0 +&fi \e"fill mode +.lg +&& +&de ef \e"end footnote +&br \e"finish output +&nr z \e\en(.v \e"save spacing +&ev \e"pop ev +&di \e"end diversion +&nr y \-\e\en(dn \e"new footer position, +&if \e\enx=1 .nr y \-(\e\en(.v\-\e\enz) \e + \e"uncertainty correction +&ch fo \e\enyu \e"y is negative +&if (\|\e\en(nl+1v)>(\|\e\en(.p+\e\eny) \e +&ch fo \e\en(nlu+1v \e"it didn't fit +&& +&de fs \e"separator +\el\'\|1i\' \e"1 inch rule +&br +&& +&de fz \e"get leftover footnote +&fn +&nf \e"retain vertical size +&fy \e"where fx put it +&ef +&& +&nr b 1.0i \e"bottom margin size +&wh 0 hd \e"header trap +&wh 12i fo \e"footer trap, temp position +&wh \-\e\enbu fx \e"fx at footer position +&ch fo \-\e\enbu \e"conceal fx with fo +.x2 +The header \fBhd\fR initializes a footnote count register \fBx\fR, +and sets both the current footer trap position register \fBy\fR and +the footer trap itself to a nominal position specified in +register \fBb\fR. +In addition, if the register \fBdn\fR indicates a leftover footnote, +\fBfz\fR is invoked to reprocess it. +The footnote start macro \fBfn\fR begins a diversion (append) in environment 1, +and increments the count \fBx\fR; if the count is one, the footnote separator \fBfs\fR +is interpolated. 
+The separator is kept in a separate macro to permit user redefinition. +The footnote end macro \fBef\fR restores +the previous environment and ends the diversion after saving the spacing size in register \fBz\fR. +\fBy\fR is then decremented by the size of the footnote, available in \fBdn\fR; +then on the first footnote, \fBy\fR is further decremented by the difference +in vertical base-line spacings of the two environments, to +prevent the late triggering of the footer trap from causing the last +line of the combined footnotes to overflow. +The footer trap is then set to the lower (on the page) of \fBy\fR or the current page position (\fBnl\fR) +plus one line, to allow for printing the reference line. +If indicated by \fBx\fR, the footer \fBfo\fR rereads the footnotes from \fBFN\fR in nofill mode +in environment 1, +and deletes \fBFN\fR. +If the footnotes were too large to fit, the macro \fBfx\fR will be trap-invoked to redivert +the overflow into \fBfy\fR, +and the register \fBdn\fR will later indicate to the header whether \fBfy\fR is empty. +Both \fBfo\fR and \fBfx\fR are planted in the nominal footer trap position in an order +that causes \fBfx\fR to be concealed unless the \fBfo\fR trap is moved. +The footer then terminates the overflow diversion, if necessary, and +zeros \fBx\fR to disable \fBfx\fR, +because the uncertainty correction +together with a not-too-late triggering of the footer can result +in the footnote rereading finishing before reaching the \fBfx\fR trap. +.pg +A good exercise for the student is to combine the multiple-column and footnote mechanisms. +.mh +The Last Page +.pg +After the last input file has ended, \*(NR and \*(TR +invoke the \fIend macro\fR (\(sc7), if any, +and when it finishes, eject the remainder of the page. +During the eject, any traps encountered are processed normally. +At the \fIend\fR of this last page, processing terminates +\fIunless\fR a partial line, word, or partial word remains.
+If it is desired that another page be started, the end-macro +.x1 +&de en \e"end-macro +\ec +\'bp +&& +&em en +.x2 +will deposit a null partial word, +and effect another last page. +.1C +'bp diff --git a/share/doc/usd/21.troff/table1 b/share/doc/usd/21.troff/table1 new file mode 100644 index 0000000..d0fad8f --- /dev/null +++ b/share/doc/usd/21.troff/table1 @@ -0,0 +1,129 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)table1 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.pn 30 +.rm mx +.br +.tr && +.tr || +.tr ~~ +.de aa +.nf +abcdefghijklmnopqrstuvwxyz +ABCDEFGHIJKLMNOPQRSTUVWXYZ +1234567890 +.ss 9 +! $ % & ( ) ` ' * + \- . , / : ; = ? [ ] | +.fi +\(bu \(sq \(em \(hy \(ru \(14 \(12 \(34 \(fi \(fl \(ff +\(Fi \(Fl +\(de \(dg \(fm +\(ct \(rg \(co +.ss 12 +.. +.de bb +.ss 9 +.fi +.ll 5i +" \' \e ^ \_ \` ~ \(sl < > { } # @ \(pl \(mi \(eq \(** +.br +\(*a \(*b \(*g \(*d \(*e \(*z \(*y \(*h \(*i \(*k \(*l \(*m +\(*n \(*c \(*o \(*p \(*r \(*s \(ts \(*t \(*u \(*f \(*x \(*q \(*w +.br +\(*G \(*D \(*H \(*L \(*C \(*P \(*S \(*U \(*F \(*Q \(*W +.br +\(sr \(rn \(>= \(<= \(== \(ap \(~= \(!= +\(-> \(<- \(ua \(da \(mu +\(di \(+- \(cu \(ca \(sb \(sp \(ib \(ip \(if \(pd +.br +\(sc \(gr \(no \(is \(pt \(es \(mo +\(dd \(rh \(lh \(or \(ci +\(lt \(lb \(rt \(rb \(lk \(rk \(bv \(lf \(rf \(lc \(rc +\(br +.br +.ss 12 +.nf +.. +.nf +.ps 12 +.vs 14p +.ft B +.ce +.sp 3 +Table I +.sp +.ce +Font Style Examples +.sp .5i +.ft R +.ps 10 +.fi +.vs 12p +.na +The following fonts are printed in 12-point, with a vertical spacing of 14-point, +and with +non-alphanumeric characters separated by \(14\|em space (all measurements +on 8.5 \(mu 11 inch paper prior to photoreduction). +This font sample is printed on an Apple Laserwriter +at University of California, Berkeley. 
+.sp .5i +.ps 12 +.vs 14p +.ft R +Times Roman +.sp .5 +.aa +.sp +.ft I +Times Italic +.sp .5 +.aa +.sp +.ft B +Times Bold +.sp .5 +.aa +.sp +.ft R +Special Mathematical Font +.sp .5 +.fi +.ll 5i +.bb +.bp diff --git a/share/doc/usd/21.troff/table2 b/share/doc/usd/21.troff/table2 new file mode 100644 index 0000000..7d212c0 --- /dev/null +++ b/share/doc/usd/21.troff/table2 @@ -0,0 +1,253 @@ +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" @(#)table2 8.1 (Berkeley) 8/14/93 +.\" +.\" $FreeBSD$ +.sp 100 +.br +.de mx +.nf +.ft I +.ta .25iC .5i +.45i 3.25iC +.25i +.45i + Input Character Input Character + Char Name Name Char Name Name +.ft R +.sp .2 +.nr cl 0 +.mk +.. +.br +.tr ~~ +.nf +.ps 12 +.vs 14p +.ft B +.ce +Table II +.sp +.ce 2 +Input Naming Conventions for \', \`, and \- +and for Non-ASCII Special Characters +.sp .5i +.ft R +.ps 10 +.vs 12p +.ft B +.bd I 3 +Non-\s-1ASCII\s+1 characters and \fIminus\fP on the standard fonts. +.sp +.ft R +.de cl +.ie \\n+(cl<2 \{\ +.po +3.0i +.rt +.\} +.el .sc +.. +.de sc +.po 26i/27u +.nr cl 0 +.. +.nr cl 0 1 +.de qq + \&' \' close quote + ` \` open quote + \(em \e\|(em 3\(sl4 Em dash + - \- hyphen or + \(hy \e\|(hy hyphen + \- \e\- current font minus + \(bu \e\|(bu bullet + \(sq \e\|(sq square + \(ru \e\|(ru rule + \(14 \e\|(14 1\(sl4 + \(12 \e\|(12 1\(sl2 + \(34 \e\|(34 3\(sl4 + \(fi \e\|(fi fi + \(fl \e\|(fl fl + \(ff \e\|(ff ff + \(Fi \e\|(Fi ffi + \(Fl \e\|(Fl ffl + \(de \e\|(de degree + \(dg \e\|(dg dagger + \(fm \e\|(fm foot mark + \(ct \e\|(ct cent sign + \(rg \e\|(rg registered + \(co \e\|(co copyright +.. +.di zz +.lg 0 +.qq +.di +.lg +.mx +.nr aa \n(dn/2 +.ne \n(aau+1 +.nr bb \n(nl+\n(aa +.wh \n(bbu cl +.qq +.sp |\n(bbu +.ch cl 12i +.fi +.sp 2 +.ft B +.bd I +Non-\s-1ASCII\s+1 characters and \', \`, \_\|, \(pl, \(mi, \(eq, and \(** on the special font. 
+.sp .4 +.ft R +.fi +.ps 10 +The ASCII characters @, #, ", \', \`, <, >, \\, {, }, ~, ^, and \(ul exist +\fIonly\fR on the special font and are printed as a 1-em space if that font +is not mounted. +The following characters exist only on the special font except +for the upper case Greek letter names followed by \(dg which are mapped into +upper case English letters in +whatever font is mounted on font position one (default Times Roman). +The special math plus, minus, and equals are provided to +insulate the appearance of equations from the choice of standard fonts. +.bd I 3 +.nf +.ps 10 +.sp +.ch cl \nmu-\n(.vu-1u +.mx +.lg 0 + \(pl \e\|(pl math plus + \(mi \e\|(mi math minus + \(eq \e\|(eq math equals + \(** \e\|(** math star + \(sc \e\|(sc section + \(aa \e\|(aa acute accent + \(ga \e\|(ga grave accent + \(ul \e\|(ul underrule + \(sl \e\|(sl slash (matching backslash) + \(*a \e\|(*a alpha + \(*b \e\|(*b beta + \(*g \e\|(*g gamma + \(*d \e\|(*d delta + \(*e \e\|(*e epsilon + \(*z \e\|(*z zeta + \(*y \e\|(*y eta + \(*h \e\|(*h theta + \(*i \e\|(*i iota + \(*k \e\|(*k kappa + \(*l \e\|(*l lambda + \(*m \e\|(*m mu + \(*n \e\|(*n nu + \(*c \e\|(*c xi + \(*o \e\|(*o omicron + \(*p \e\|(*p pi + \(*r \e\|(*r rho + \(*s \e\|(*s sigma + \(ts \e\|(ts terminal sigma + \(*t \e\|(*t tau + \(*u \e\|(*u upsilon + \(*f \e\|(*f phi + \(*x \e\|(*x chi + \(*q \e\|(*q psi + \(*w \e\|(*w omega + \(*A \e\|(*A Alpha\(dg + \(*B \e\|(*B Beta\(dg + \(*G \e\|(*G Gamma + \(*D \e\|(*D Delta + \(*E \e\|(*E Epsilon\(dg + \(*Z \e\|(*Z Zeta\(dg + \(*Y \e\|(*Y Eta\(dg + \(*H \e\|(*H Theta + \(*I \e\|(*I Iota\(dg + \(*K \e\|(*K Kappa\(dg + \(*L \e\|(*L Lambda + \(*M \e\|(*M Mu\(dg + \(*N \e\|(*N Nu\(dg + \(*C \e\|(*C Xi + \(*O \e\|(*O Omicron\(dg + \(*P \e\|(*P Pi + \(*R \e\|(*R Rho\(dg + \(*S \e\|(*S Sigma + \(*T \e\|(*T Tau\(dg + \(*U \e\|(*U Upsilon + \(*F \e\|(*F Phi + \(*X \e\|(*X Chi\(dg + \(*Q \e\|(*Q Psi + \(*W \e\|(*W Omega + \(sr \e\|(sr square root + \(rn \e\|(rn root en extender + \(>= 
\e\|(>= >= + \(<= \e\|(<= <= + \(== \e\|(== identically equal + \(~= \e\|(~= approx = + \(ap \e\|(ap approximates + \(!= \e\|(!= not equal + \(-> \e\|(\(mi> right arrow + \(<- \e\|(<\(mi left arrow + \(ua \e\|(ua up arrow + \(da \e\|(da down arrow + \(mu \e\|(mu multiply + \(di \e\|(di divide + \(+- \e\|(+\(mi plus-minus + \(cu \e\|(cu cup (union) + \(ca \e\|(ca cap (intersection) + \(sb \e\|(sb subset of + \(sp \e\|(sp superset of + \(ib \e\|(ib improper subset + \(ip \e\|(ip improper superset + \(if \e\|(if infinity + \(pd \e\|(pd partial derivative + \(gr \e\|(gr gradient + \(no \e\|(no not + \(is \e\|(is integral sign + \(pt \e\|(pt proportional to + \(es \e\|(es empty set + \(mo \e\|(mo member of + \(br \e\|(br box vertical rule + \(dd \e\|(dd double dagger + \(rh \e\|(rh right hand + \(lh \e\|(lh left hand + \(or \e\|(or or + \(ci \e\|(ci circle + \(lt \e\|(lt left top of big curly bracket + \(lb \e\|(lb left bottom + \(rt \e\|(rt right top + \(rb \e\|(rb right bot + \(lk \e\|(lk left center of big curly bracket + \(rk \e\|(rk right center of big curly bracket + \(bv \e\|(bv bold vertical + \(lf \e\|(lf left floor (left bottom of big + square bracket) + \(rf \e\|(rf right floor (right bottom) + \(lc \e\|(lc left ceiling (left top) + \(rc \e\|(rc right ceiling (right top) diff --git a/share/doc/usd/22.trofftut/Makefile b/share/doc/usd/22.trofftut/Makefile new file mode 100644 index 0000000..a3f32fd --- /dev/null +++ b/share/doc/usd/22.trofftut/Makefile @@ -0,0 +1,45 @@ +# @(#)Makefile 8.1 (Berkeley) 6/8/93 +# Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# Redistributions of source code and documentation must retain the above +# copyright notice, this list of conditions and the following +# disclaimer. 
+# +# Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# All advertising materials mentioning features or use of this software +# must display the following acknowledgement: +# +# This product includes software developed or owned by Caldera +# International, Inc. Neither the name of Caldera International, Inc. +# nor the names of other contributors may be used to endorse or promote +# products derived from this software without specific prior written +# permission. +# +# USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +# INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +# DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +# FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +# BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +# OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +# IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +# +# $FreeBSD$ + +VOLUME= usd/22.trofftut +SRCS= tt.mac tt00 tt01 tt02 tt03 tt04 tt05 tt06 tt07 tt08 tt09 tt10 \ + tt11 tt12 tt13 tt14 ttack ttcharset ttindex +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/usd/22.trofftut/tt.mac b/share/doc/usd/22.trofftut/tt.mac new file mode 100644 index 0000000..4bc3bb7 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt.mac @@ -0,0 +1,111 @@ +.\" This module is believed to contain source code proprietary to AT&T. 
+.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt.mac 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.tr _\(em +.tr *\(** +.de UL +.if n .ul +.if n \\$3\\$1\\$2 +.if t \\$3\f3\\$1\fP\\$2 +.. +.de UC +\\$3\s-1\\$1\s+1\\$2 +.. +.de C +.if n .ul +.if n \\$3\\$1\\$2 +.if t \\$3\f3\\$1\fP\\$2 +.. +.de IT +.if t \\$3\f2\\$1\fP\\$2 +.if n .ul +.if n \\$3\\$1\\$2 +.. +.de UI +\f3\\$1\fI\\$2\fR\\$3 +.. +.de P1 +.if n .ls 1 +.nf +.\" use first argument as indent if present +.if \\n(.$ .DS I \\$1 +.if !\\n(.$ .DS I 5 +.ta .75i 1.5i 2.25i 3i 3.75i +.tr '\' +.. +.de P2 +.tr '' +.DE +.if n .ls 2 +.lg +.. +.if t .ds m \(mi +.if n .ds m - +.if t .ds n \(no +.if n .ds n - +.if t .ds s \v'.41m'\s+4*\s-4\v'-.41m' +.if n .ds s * +.if t .ds S \(sl +.if n .ds S / +.if t .ds d \s+4\&.\&\s-4 +.if n .ds d \&.\& +.if t .ds a \z@@ +.if n .ds a @ +.hy 14 +.\" XXX is this supposed to be a comment? . 2=not last lines; 4= no -xx; 8=no xx- +.de WS +.sp \\$1 +.. +.\" ACCENTS say \*'e or \*`e to get e acute or e grave +.ds ' \h'\w'e'u*4/10'\z\(aa\h'-\w'e'u*4/10' +.ds e \o"e\'" +.ds ` \h'\w'e'u*4/10'\z\(ga\h'-\w'e'u*4/10' +.\" UMLAUT \*:u, etc. +.ds : \v'-0.6m'\h'(1u-(\\n(.fu%2u))*0.13m+0.06m'\z.\h'0.2m'\z.\h'-((1u-(\\n(.fu%2u))*0.13m+0.26m)'\v'0.6m' +.\" TILDE and CIRCUMFLEX +.ds ^ \\k:\h'-\\n(.fu+1u/2u*2u+\\n(.fu-1u*0.13m+0.06m'\z^\h'|\\n:u' +.ds ~ \\k:\h'-\\n(.fu+1u/2u*2u+\\n(.fu-1u*0.13m+0.06m'\z~\h'|\\n:u' +.de BD +\&\\$3\f1\\$1\h\(ts-\w\(ts\\$1\(tsu+1u\(ts\\$1\fP\\$2\& +.. 
+.hw semi-colon diff --git a/share/doc/usd/22.trofftut/tt00 b/share/doc/usd/22.trofftut/tt00 new file mode 100644 index 0000000..f8c5ea7 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt00 @@ -0,0 +1,122 @@ +.\" Hey, Emacs, edit this file in -*- nroff-fill -*- mode! +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt00 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.EH 'USD:22-%''A TROFF Tutorial' +.OH 'A TROFF Tutorial''USD:22-%' +.\".RP +.\" .....TM 76-1273-7 39199 39199-11 +.TL +A TROFF Tutorial +.AU "MH 2C-518" 6021 +Brian W. Kernighan +(updated for 4.3BSD by Mark Seiden) +.AI +.\" What's this? .MH +.\" And this? .OK +\"Typesetting +\"Text formatting +\"NROFF +.AB +.PP +.UL troff +is a text-formatting program for +typesetting on the +.UX +operating system. +This device is capable of producing high quality +text; +this paper is an example of +.UL troff +output. +.PP +The phototypesetter itself normally runs with four fonts, +containing roman, italic and bold letters +(as on this page), +a full greek alphabet, and a substantial number of +special characters and mathematical symbols. +Characters can be printed in a range of sizes, +and placed anywhere on the page. +.PP +.UL troff +allows the user full control over fonts, +sizes, and character positions, +as well as the usual features of a formatter _ +right-margin justification, automatic hyphenation, +page titling and numbering, and so on. +It also provides macros, arithmetic variables and operations, +and conditional testing, for complicated formatting tasks. +.PP +This document is an introduction to the most basic use of +.UL troff . +It presents just enough information to enable the user +to do simple formatting +tasks like making viewgraphs, +and to make incremental changes to existing packages +of +.UL troff +commands. 
+In most respects, the +.UC UNIX +formatter +.UL nroff +and a more recent version +.ul +(device-independent +.UL troff) +are identical to +the version described here, so this document also serves as a tutorial for +them as well. +.PP +.vs 12p +\fB\s+1NOTE: This document refers to the historical \f(BItroff\fB program, and +not to \f(BIgroff\fB. This is a first cut at importing the tutorial from +4.4BSD, now that the code has been released. It should at some time be modified +to describe \f(BIgroff\fR.\s0 +.AE +.nr LL 6.5i +.nr LT 6.5i +.\" Unknown macro .CS 13 1 14 0 0 5 +.if t .2C +.nr PS 9 +.nr VS 11 diff --git a/share/doc/usd/22.trofftut/tt01 b/share/doc/usd/22.trofftut/tt01 new file mode 100644 index 0000000..ff6fb83 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt01 @@ -0,0 +1,223 @@ +.\" Hey, Emacs, edit this file in -*- nroff-fill -*- mode! +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt01 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. 
Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Introduction +.tr ^. +.PP +.UL troff +[1] +is a text-formatting program, +written originally by J. F. Ossanna, +for producing +high-quality printed output from the phototypesetter +on the +.UC UNIX +operating system. +This document is an example of +.UL troff +output. +.PP +The single most important rule +of using +.UL troff +is +not to use it directly, but through some intermediary. +In many ways, +.UL troff +resembles an assembly language _ +a remarkably powerful and flexible one _ +but nonetheless such that many operations must be specified +at a level of detail and in a form that is too hard +for most people to use effectively. +.PP +For two special applications, there are programs that provide +an interface to +.UL troff +for the majority of users. 
+.UL eqn +[2] +provides an easy to learn language for typesetting mathematics; +the +.UL eqn +user +need know no +.UL troff +whatsoever +to typeset mathematics. +.UL tbl +[3] +provides the same convenience for producing tables of arbitrary +complexity. +.PP +For producing straight text (which may well contain mathematics or tables), there are a number of `macro packages' +that define formatting rules and operations for specific styles +of documents, +and reduce the amount of +direct contact with +.UL troff . +In particular, the `\-ms' +[4], +PWB/MM [5], and `\-me' [6] +packages +for internal memoranda and external papers +provide most of the facilities needed +for a wide range of document preparation.\(dg +.FS +\(dg Most Berkeley Unix sites only have \-ms and \-me. +.FE +(This memo was prepared with `\-ms'.) +There are also packages for viewgraphs, +for simulating the older +.UL roff +formatters, +and for other special applications. +Typically you will find these packages easier to use +than +.UL troff +once you get beyond the most trivial operations; +you should always consider them first. +.PP +In the few cases where existing packages don't do the whole job, +the solution is +.ul +not +to write an entirely new set of +.UL troff +instructions from scratch, but to make small changes +to adapt packages that already exist. +.WS +.PP +In accordance with this philosophy of letting someone else +do the work, +the part of +.UL troff +described here is only a small part of the whole, +although it tries to concentrate on the more useful parts. +In any case, there is no attempt to be complete. +Rather, the emphasis is on showing how to do simple things, +and how to make incremental changes to what already exists. +The contents of the remaining sections are: +.sp +.nf +.in .1i +.ta .3i +\02. Point sizes and line spacing +\03. Fonts and special characters +\04. Indents and line length +\05. Tabs +\06. Local motions: Drawing lines and characters +\07. Strings +\08. 
Introduction to macros +\09. Titles, pages and numbering +10. Number registers and arithmetic +11. Macros with arguments +12. Conditionals +13. Environments +14. Diversions + Appendix: Typesetter character set +.sp +.in 0 +.fi +The +.UL troff +described here is the C-language version supplied with +.UC UNIX +Version 7 and 32V as documented in [1]. +.WS +.PP +To use +.UL troff +you have to prepare not only the actual text you want printed, +but some information that tells +.ul +how +you want it printed. +(Readers who use +.UL roff +will find the approach familiar.) +For +.UL troff +the text +and +the formatting information are often intertwined quite intimately. +Most commands to +.UL troff +are placed on a line separate from the text itself, +beginning with a period (one command per line). +For example, +.P1 +Some text. +^ps 14 +Some more text. +.P2 +will change the `point size', +that is, +the size of the letters being printed, +to `14 point' (one point is 1/72 inch) like this: +.P1 +.fi +Some text. +.ps 14 +Some more text. +.ps 10 +.P2 +.PP +Occasionally, though, +something special occurs in the middle of a line _ +to produce +.P1 +Area = \(*p\fIr\fR\|\s8\u2\d\s0 +.P2 +you have to type +.P1 +Area = \e(*p\efIr\efR\e\^|\^\es8\eu2\ed\es0 +.P2 +(which we will explain shortly). +The backslash character +.BD \e +is used +to introduce +.UL troff +commands and special characters within a line of text. diff --git a/share/doc/usd/22.trofftut/tt02 b/share/doc/usd/22.trofftut/tt02 new file mode 100644 index 0000000..4d14a52 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt02 @@ -0,0 +1,244 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt02 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Point Sizes; Line Spacing +.PP +As mentioned above, +the command +.BD .ps +sets the point size. 
+One point is 1/72 inch, +so 6-point characters are at most 1/12 inch high, +and 36-point characters are \(12 inch. +There are 15 point sizes, listed below. +.P1 1 +.ps 6 +6 point: Pack my box with five dozen liquor jugs. +.ps 7 +.vs 8p +7 point: Pack my box with five dozen liquor jugs. +.vs 9p +.ps 8 +8 point: Pack my box with five dozen liquor jugs. +.vs 10p +.ps 9 +9 point: Pack my box with five dozen liquor jugs. +.vs 11p +.ps 10 +10 point: Pack my box with five dozen liquor +.vs 12p +.ps 11 +11 point: Pack my box with five dozen +.vs 14p +.ps 12 +12 point: Pack my box with five dozen +.vs 16p +.ps 14 +14 point: Pack my box with five +.vs 24p +\s1616 point\s18 18 point\s20 20 point +.vs 40p +\s2222\s24 24\s28 28\s36 36 +.ps 10 +.vs 12p +.P2 +.PP +If the number after +.BD .ps +is not one of these +legal sizes, +it is rounded up to the next valid value, +with a maximum of 36. +If no number follows +.BD .ps , +.UL troff +reverts to the previous size, whatever it was. +.UL troff +begins with point size 10, +which is usually fine. +The original of this document (on 8.5 by 11 inch paper) is in 9 point. +.PP +The point size can also be changed in the middle of a line +or even a word +with the in-line command +.BD \es . +To produce +.P1 +\s8UNIX\s10 runs on a \s8PDP-\s1011/45 +.P2 +type +.P1 +\es8UNIX\es10 runs on a \es8PDP-\es1011/45 +.P2 +As above, +.BD \es +should be followed by a legal point size, +except that +.BD \es0 +causes the size to revert to +its previous value. +Notice that +.BD \es1011 +can be understood correctly as `size 10, followed by an 11', if the size is legal, +but not otherwise. +Be cautious with similar constructions. +.PP +Relative size changes are also legal and useful: +.P1 +\es\-2UNIX\es+2 +.P2 +temporarily decreases the size, whatever it is, by two points, then +restores it. +Relative size changes have the advantage that the size difference +is independent of the starting size of the document. 
+The amount of the relative change is restricted +to a single digit. +.WS +.PP +The other parameter that determines what the type looks like +is the spacing between lines, +which is set independently of the point size. +Vertical spacing is measured from the bottom of one line to +the bottom of the next. +The command to control vertical spacing is +.BD .vs . +For running text, it is usually best to set the vertical spacing +about 20% bigger than the character size. +For example, so far in this document, we have used +``9 on 11'', that is, +.P1 +^ps 9 +^vs 11p +.P2 +If we changed to +.P1 +^ps 9 +^vs 9p +.P2 +.vs 9p +.ne 3 +the running text would look like this. +After a few lines, you will agree it looks a little cramped. +The right vertical spacing is partly a matter of taste, depending on how +much text you want to squeeze into a given space, +and partly a matter of traditional printing style. +By default, +.UL troff +uses 10 on 12. +.PP +.vs 14p +.ps 12 +Point size and vertical spacing make a substantial difference in the amount of text +per square inch. +This is 12 on 14. +.ne 2 +.PP +.ne 2 +.ps 6 +.vs 7p +Point size and vertical spacing make a substantial difference in the amount of text +per square inch. +For example, +10 on 12 uses about twice as much space as 7 on 8. +This is 6 on 7, which is even smaller. +It packs a lot more words per line, +but you can go blind trying to read it. +.PP +When used without arguments, +.BD .ps +and +.BD .vs +revert to the previous size and vertical spacing +respectively. +.WS +.PP +The command +.BD .sp +is used to get extra vertical space. +Unadorned, +it gives you one extra blank line (one +.BD .vs , +whatever that has been set to). +Typically, that's more or less than you want, +so +.BD .sp +can be followed by +information about how much space you want _ +.P1 +^sp 2i +.P2 +means `two inches of vertical space'. 
+.P1 +^sp 2p +.P2 +means `two points of vertical space'; +and +.P1 +^sp 2 +.P2 +means `two vertical spaces' _ two of whatever +.BD .vs +is set to +(this can also be made explicit with +.BD .sp\ 2v ); +.UL troff +also understands decimal fractions in most places, +so +.P1 +^sp 1.5i +.P2 +is a space of 1.5 inches. +These same scale factors can be used after +.BD .vs +to define line spacing, and in fact after most commands +that deal with physical dimensions. +.PP +It should be noted that all size numbers are converted internally +to `machine units', which are 1/432 inch +(1/6 point). +For most purposes, this is enough resolution +that you don't have to worry about the accuracy of the representation. +The situation is not quite so good vertically, +where resolution is 1/144 inch +(1/2 point). diff --git a/share/doc/usd/22.trofftut/tt03 b/share/doc/usd/22.trofftut/tt03 new file mode 100644 index 0000000..e475d45 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt03 @@ -0,0 +1,240 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt03 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. 
+.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Fonts and Special Characters +.PP +.UL troff +and the typesetter allow four different fonts at any one time. +Normally three fonts (Times roman, italic and bold) and one collection of special characters +are permanently +mounted. +.P1 2 +.ft R +abcdefghijklmnopqrstuvwxyz 0123456789 +ABCDEFGHIJKLMNOPQRSTUVWXYZ +.ft I +abcdefghijklmnopqrstuvwxyz 0123456789 +ABCDEFGHIJKLMNOPQRSTUVWXYZ +.ft B +abcdefghijklmnopqrstuvwxyz 0123456789 +ABCDEFGHIJKLMNOPQRSTUVWXYZ +.ft R +.P2 +The +greek, mathematical symbols and miscellany +of the special font are +listed in Appendix A. +.PP +.UL troff +prints in roman unless told otherwise. 
+To switch into bold, use +the +.BD .ft +command +.P1 +^ft B +.P2 +and for italics, +.P1 +^ft I +.P2 +To return to roman, use +.BD .ft\ R ; +to return to the previous font, +whatever it was, +use either +.BD .ft\ P +or just +.BD .ft . +The `underline' command +.P1 +^ul +.P2 +causes the next input line to print in italics. +.BD .ul +can be followed by a count to +indicate that more than one line is to be italicized. +.PP +Fonts can also be changed within a line or word +with the in-line command +.BD \ef : +.P1 +\fBbold\fIface\fR text +.P2 +is produced by +.P1 +\efBbold\efIface\efR text +.P2 +If you want to do this so the previous font, whatever it was, +is left undisturbed, insert extra +.BD \efP +commands, like this: +.P1 +\efBbold\efP\efIface\efP\efR text\efP +.P2 +Because only the immediately previous font is remembered, +you have to restore the previous font after each change +or you can lose it. +The same is true of +.BD .ps +and +.BD .vs +when used without an argument. +.PP +There are other fonts available besides the standard set, +although you can still use only four at any given time. +The command +.BD .fp +tells +.UL troff +what fonts are physically mounted on the typesetter: +.P1 +^fp 3 H +.P2 +says that the Helvetica font is mounted on position 3. +(The complete list of font sizes and styles depends on +your typesetter or laser printer.) +Appropriate +.BD .fp +commands should appear at the beginning of your document +if you do not use the standard fonts. +.PP +It is possible to make a document relatively independent +of the actual fonts used to print it +by using font numbers instead of names; +for example, +.BD \ef3 +and +.BD .ft\ 3 +mean `whatever font is mounted at position 3', +and thus work for any setting. +Normal settings are roman font on 1, italic on 2, +bold on 3, +and special on 4. +.PP +There is also a way to get `synthetic' bold fonts +by overstriking letters with a slight offset. +Look at the +.BD .bd +command in [1]. 
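+.PP
+As a small sketch of the font-position style described earlier in this section
+(assuming, as in the
+.BD .fp
+example, that a font named
+.BD H
+exists on your device),
+a heading can be set without naming the font at the place it is used:
+.P1
+^fp 3 H \e" mount Helvetica on position 3
+^ft 3 \e" switch to whatever is on 3
+A Heading
+^ft P \e" back to the previous font
+.P2
+Only the
+.BD .fp
+line would need to change to set every such heading in a different face,
+at the cost of the bold font normally found on position 3.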
+.WS +.PP +Special characters have four-character names beginning with +.BD \e( , +and they may be inserted anywhere. +For example, +.P1 +\(14 + \(12 = \(34 +.P2 +is produced by +.P1 +\e(14 + \e(12 = \e(34 +.P2 +In particular, +greek letters are all of the form +.BD \e(*\- , +where +.BD \- +is an upper or lower case roman letter +reminiscent of the greek. +Thus +to get +.P1 +\(*S(\(*a\(mu\(*b) \(-> \(if +.P2 +in bare +.UL troff +we have to type +.P1 +\e(*S(\e(*a\e(mu\e(*b) \e(\(mi> \e(if +.P2 +That line is unscrambled as follows: +.P1 +.ta 1i 2i 3i +\e(*S \(*S +( ( +\e(*a \(*a +\e(mu \(mu +\e(*b \(*b +) ) +\e(\(mi> \(-> +\e(if \(if +.P2 +A complete list of these special names occurs in Appendix A. +.PP +In +.UL eqn +[2] +the same effect can be achieved with the input +.P1 +SIGMA ( alpha times beta ) \-> inf +.P2 +which is less concise, but clearer to the uninitiated. +.PP +Notice that +each +four-character name is a single character +as far as +.UL troff +is concerned _ +the +`translate' command +.P1 +^tr \e(mi\e(em +.P2 +is perfectly clear, meaning +.P1 +^tr \(mi\(em +.P2 +that is, to translate \(mi into \(em. +.PP +Some characters are automatically translated into others: +grave \(ga and acute \(aa accents (apostrophes) become open and close single quotes +`\|'\|; +the combination of ``...'' is generally preferable to the double quotes "...". +Similarly a typed minus sign becomes a hyphen -. +To print an explicit \- sign, use +.BD \e\|- . +To get a backslash printed, use +.BD \ee . diff --git a/share/doc/usd/22.trofftut/tt04 b/share/doc/usd/22.trofftut/tt04 new file mode 100644 index 0000000..c44c94b --- /dev/null +++ b/share/doc/usd/22.trofftut/tt04 @@ -0,0 +1,189 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). 
+.\" +.\" @(#)tt04 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+.\" +.\" $FreeBSD$ +.\" +.NH +Indents and Line Lengths +.PP +.UL troff +starts with a line length of 6.5 inches, +which some people think is too wide for 8\(12\(mu11 paper. +To reset the line length, +use +the +.BD .ll +command, as in +.P1 +^ll 6i +.P2 +As with +.BD .sp , +the actual length can be specified in several ways; +inches are probably the most intuitive. +.PP +The maximum line length provided by the typesetter is 7.5 inches, by the way. +To use the full width, you will have to reset the default physical left margin (``page offset''), +which is normally slightly less than one inch from the left edge +of the paper. +This is done by the +.BD .po +command. +.P1 +^po 0 +.P2 +sets the offset as far to the left as it will go. +.WS +.PP +The indent command +.BD .in +causes the left margin to be indented +by some specified amount from the page offset. +If we use +.BD .in +to move the left margin in, +and +.BD .ll +to move the right margin to the left, +we can +make offset blocks of text: +.P1 +^in 0.3i +^ll \(mi0.3i +text to be set into a block +^ll +0.3i +^in \(mi0.3i +.P2 +will create a block that looks like this: +.P1 +.fi +.ll -0.3i +Pater noster qui est in caelis sanctificetur nomen tuum; +adveniat regnum tuum; fiat voluntas tua, sicut in caelo, +et in terra. ... +Amen. +.ll +0.3i +.P2 +Notice the use of `+' and `\(mi' +to specify the amount of change. +These change the previous setting by the specified amount, +rather than just overriding it. +The distinction is quite important: +.BD .ll\ +1i +makes lines one inch longer; +.BD .ll\ 1i +makes them one inch +.ul +long. +.PP +With +.BD .in , +.BD .ll +and +.BD .po , +the previous value is used if no argument is specified. +.PP +To indent a single line, use the `temporary indent' +command +.BD .ti . +For example, all paragraphs in this memo +effectively begin with the command +.P1 +^ti 3 +.P2 +Three of what? 
+The default unit for +.BD .ti , +as for most horizontally oriented commands +.BD .ll , ( +.BD .in , +.BD .po ), +is ems; +an em is roughly the width of the letter `m' +in the current point size. +(Precisely, an em in size +.ul +p +is +.ul +p +points.) +Although inches are usually clearer than ems to people who don't set type +for a living, +ems have a place: +they are a measure of size that is proportional to the current point size. +If you want to make text that keeps its proportions +regardless of point size, +you should use ems for all dimensions. +Ems can be specified as scale factors directly, +as in +.BD .ti\ 2.5m . +.PP +Lines can also be indented negatively +if the indent is already positive: +.P1 +^ti \(mi0.3i +.P2 +causes the next line to be moved back three tenths of an inch. +Thus to make a decorative initial capital, +we indent the whole paragraph, then move the letter `P' back with +a +.BD .ti +command: +.P1 +.ll -0.3i +.fi +.in +.3i +.ti -0.3i +\s36\v'2'P\v'-2'\s0ater noster qui est in caelis sanctificetur +nomen tuum; +adveniat regnum tuum; +'in -.3i +fiat voluntas tua, +sicut in caelo, et in terra. ... +Amen. +.ll +0.3i +.P2 +Of course, there is also some trickery to make the `P' +bigger (just a `\es36P\es0'), +and to move it +down from its normal position +(see the section on local motions). diff --git a/share/doc/usd/22.trofftut/tt05 b/share/doc/usd/22.trofftut/tt05 new file mode 100644 index 0000000..b7bff82 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt05 @@ -0,0 +1,130 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt05 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+.\" +.\" $FreeBSD$ +.\" +.NH +Tabs +.PP +Tabs +(the \s8ASCII\s0 `horizontal tab' character) +can be used to produce output in columns, +or to set the horizontal position of output. +Typically +tabs are used only in unfilled text. +Tab stops are set by default every half inch from the +current indent, +but +can be changed by the +.BD .ta +command. +To set stops every inch, for example, +.P1 +^ta 1i 2i 3i 4i 5i 6i +.P2 +.PP +Unfortunately the stops are left-justified only +(as on a typewriter), +so lining up columns of right-justified numbers can be painful. +If you have many numbers, +or if you need more complicated table layout, +.ul +don't +use +.UL troff +directly; +use the +.UL tbl +program described in [3]. +.PP +For a handful of numeric columns, you can do it this way: +Precede every number by enough blanks to make it line up +when typed. +.P1 +^nf +^ta 1i 2i 3i +\0\01\0\fItab\fR\0\0\02\0\fItab\fR\0\0\03 +\040\0\fItab\fR\0\050\0\fItab\fR\0\060 +700\0\fItab\fR\0800\0\fItab\fR\0900 +^fi +.P2 +Then change each leading blank into the string +.BD \e0 . +This is a character that does not print, but that has +the same width as a digit. +When printed, this will produce +.P1 +.ta 1i 2i 3i +\0\01 \0\02 \0\03 +\040 \050 \060 +700 800 900 +.P2 +.PP +It is also possible to fill up tabbed-over space with +some character other than blanks by setting the `tab replacement character' +with the +.BD .tc +command: +.P1 +^ta 1.5i 2.5i +^tc \e(ru (\e(ru is "\(ru") +Name \fItab\fR Age \fItab\fR +.P2 +produces +.P1 3 +.ta 1.5i 2.5i +.tc \(ru +Name Age +.tc +.P2 +To reset the tab replacement character to a blank, use +.BD .tc +with no argument. +(Lines can also be drawn with the +.BD \el +command, described in Section 6.) +.PP +.UL troff +also provides a very general mechanism called `fields' +for setting up complicated columns. +(This is used by +.UL tbl ). +We will not go into it in this paper. 
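+.PP
+As a further sketch of
+.BD .tc
+(the entries and the stop position are made up for illustration),
+a row of dots can lead across to a second column:
+.P1
+^nf
+^ta 3i
+^tc .
+Fonts \fItab\fR 3
+Tabs \fItab\fR 5
+^tc
+^fi
+.P2
+The final
+.BD .tc
+with no argument puts the replacement character back to a blank,
+so later tabs are not filled with dots.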
diff --git a/share/doc/usd/22.trofftut/tt06 b/share/doc/usd/22.trofftut/tt06 new file mode 100644 index 0000000..3f73958 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt06 @@ -0,0 +1,351 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt06 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Local Motions: Drawing lines and characters +.PP +Remember `Area = \(*pr\u2\d' and the big `P' +in the Paternoster. +How are they done? +.UL troff +provides a host of commands for placing characters of any size +at any place. +You can use them to draw special characters +or to tune your output for a particular appearance. +Most of these commands are straightforward, but messy to read +and tough to type correctly. +.PP +If you won't use +.UL eqn , +subscripts and superscripts are most easily done with +the half-line local motions +.BD \eu +and +.BD \ed . +To go back up the page half a point-size, insert a +.BD \eu +at the desired place; +to go down, insert a +.BD \ed . +.BD \eu \& ( +and +.BD \ed +should always +be used in pairs, as explained below.) +Thus +.P1 +Area = \e(*pr\eu2\ed +.P2 +produces +.P1 +Area = \(*pr\u2\d +.P2 +To make the `2' smaller, bracket it with +.BD \es\-2...\es0 . +Since +.BD \eu +and +.BD \ed +refer to the current point size, +be sure to put them either both inside or both outside +the size changes, +or you will get an unbalanced vertical motion. +.PP +Sometimes the space given by +.BD \eu +and +.BD \ed +isn't the right amount. +The +.BD \ev +command can be used to request an arbitrary amount of vertical motion. +The in-line command +.P1 +\ev'(amount)' +.P2 +causes motion up or down the page by the amount specified in +`(amount)'. 
+For example, to move the `P' down, we used +.P1 2 +.ta 1i +^in +0.6i (move paragraph in) +^ll \-0.3i (shorten lines) +^ti \-0.3i (move P back) +\ev'2'\es36P\es0\ev'\-2'ater noster qui est +in caelis ... +.P2 +A minus sign causes upward motion, while +no sign or a plus sign means down the page. +Thus +.BD \ev\(fm\-2\(fm +causes an upward vertical motion +of two line spaces. +.PP +There are many other ways to specify the amount of motion _ +.P1 +\ev'0.1i' +\ev'3p' +\ev'\-0.5m' +.P2 +and so on are all legal. +Notice that the scale specifier +.BD i +or +.BD p +or +.BD m +goes inside the quotes. +Any character can be used in place of the quotes; +this is also true of all other +.UL troff +commands described in this section. +.PP +Since +.UL troff +does not take within-the-line vertical motions into account +when figuring out where it is on the page, +output lines can have unexpected positions +if the left and right ends aren't at the same +vertical position. +Thus +.BD \ev , +like +.BD \eu +and +.BD \ed , +should always balance upward vertical motion in a line with +the same amount in the downward direction. +.PP +Arbitrary horizontal motions are also available _ +.BD \eh +is quite analogous to +.BD \ev , +except that the default scale factor is ems instead of line spaces. +As an example, +.P1 +\eh'\-0.1i' +.P2 +causes a backwards motion of a tenth of an inch. +As a practical matter, consider printing the mathematical symbol +`>>'. +The default spacing is too wide, so +.UL eqn +replaces this by +.P1 +>\eh'\-0.3m'> +.P2 +to produce >\h'-.3m'>. +.PP +Frequently +.BD \eh +is used with the `width function' +.BD \ew +to generate motions equal to the width +of some character string. +The construction +.P1 +\ew'thing' +.P2 +is a number equal to the width of `thing' in machine units +(1/432 inch). +All +.UL troff +computations are ultimately done in these units. 
+To move horizontally the width of an `x', +we can say +.P1 +\eh'\ew'x'u' +.P2 +As we mentioned above, +the default scale factor for +all horizontal dimensions is +.BD m , +ems, so here we must have the +.BD u +for machine units, +or the motion produced will be far too large. +.UL troff +is quite happy with the nested quotes, by the way, +so long as you don't leave any out. +.PP +As a live example of this kind of construction, +all of the command names in the text, like +.BD .sp , +were done by overstriking with a slight offset. +The commands for +.BD .sp +are +.P1 +^sp\eh'\-\ew'.sp'u'\eh'1u'.sp +.P2 +That is, put out `.sp', move left by the width of `.sp', +move right 1 unit, and print +`.sp' again. +(Of course there is a way to avoid typing that much input +for each command name, which we will discuss in Section 11.) +.WS +.PP +There are also several special-purpose +.UL troff +commands for local motion. +We have already seen +.BD \e0 , +which is an unpaddable white space +of the same width as a digit. +`Unpaddable' means that it will never be widened +or split across a line by line justification and filling. +There is also +.BD \e (blank), +.tr ^^ +which is an unpaddable character the width of a space, +.BD \e| , +which is half that width, +.BD \e^ , +which is one quarter of the width of a space, +and +.BD \e& , +which has zero width. +.tr ^. +(This last one is useful, for example, in entering +a text line which would otherwise begin with a `.'.) +.PP +The command +.BD \eo , +used like +.P1 +\eo'set of characters' +.P2 +causes (up to 9) +characters to be overstruck, +centered on the widest. +This is nice for accents, as in +.P1 2 +syst\eo"e\e(ga"me t\eo"e\e(aa"l\eo"e\e(aa"phonique +.P2 +which makes +.P1 +syst\o"e\(ga"me t\o"e\(aa"l\o"e\(aa"phonique +.P2 +The accents are +.BD \e(ga +and +.BD \e(aa , +or +.BD \e\` +and +.BD \e\' ; +remember that each is just one character to +.UL troff . 
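+.PP
+To illustrate the zero-width
+.BD \e&
+mentioned above with a made-up text line:
+input that must begin with a period, such as
+.P1
+\e&.profile is read at login
+.P2
+is set as ordinary text rather than being mistaken for a command,
+because the first character on the line is no longer the `.'.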
+.PP +You can make your own overstrikes with another special convention, +.BD \ez , +the zero-motion command. +.BD \ezx +suppresses the normal horizontal motion +after printing the single character +.BD x , +so another character can be laid on top of it. +Although sizes can be changed within +.BD \eo , +it centers the characters on the widest, +and +there can be no horizontal or vertical motions, +so +.BD \ez +may be the only way to get what you want: +.P1 +.sp 2 +\s8\z\(sq\s14\z\(sq\s22\z\(sq\s36\(sq +.P2 +is produced by +.P1 +^sp 2 +\es8\ez\e(sq\es14\ez\e(sq\es22\ez\e(sq\es36\e(sq +.P2 +The +.BD .sp +is needed to leave room for the result. +.PP +As another example, an extra-heavy semicolon +that looks like +.P1 +\s+6\z,\v'-0.25m'.\v'0.25m'\s0 instead of ; or \s+6;\s0 +.P2 +can be constructed with a big comma and a big period above it: +.P1 +\es+6\ez,\ev'\(mi0.25m'.\ev'0.25m'\es0 +.P2 +`0.25m' is an experimentally-derived constant. +.PP +A more ornate overstrike is given by the bracketing function +.BD \eb , +which piles up characters vertically, +centered on the current baseline. +Thus we can get big brackets, +constructing them with piled-up smaller pieces: +.P1 +.sp +.ne 3 +\b'\(lt\(lk\(lb' \b'\(lc\(lf' x \b'\(rc\(rf' \b'\(rt\(rk\(rb' +.sp +.P2 +by typing in only this: +.P1 0 +\&^sp +\eb\(fm\e(lt\e(lk\e(lb\(fm \eb\(fm\e(lc\e(lf\(fm x \eb\(fm\e(rc\e(rf\(fm \eb\(fm\e(rt\e(rk\e(rb\(fm +.P2 +.PP +.UL troff +also provides a convenient facility for drawing horizontal and vertical +lines of arbitrary length with arbitrary characters. +.BD \el\(fm1i\(fm +draws a line one inch long, like this: +\l'1i'\|. +The length can be followed by +the character to use if the \(ru isn't appropriate; +.BD \el\(fm0.5i.\(fm +draws a half-inch line of dots: \l'.5i.'. +The construction +.BD \eL +is entirely analogous, +except that it draws a vertical line instead of horizontal. 
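+.PP
+For instance, the blanks on a fill-in form might be sketched as
+.P1
+Name: \el'2i'   Date: \el'1i'
+.P2
+where the labels and the two lengths are arbitrary choices for this example.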
diff --git a/share/doc/usd/22.trofftut/tt07 b/share/doc/usd/22.trofftut/tt07 new file mode 100644 index 0000000..3a8cf57 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt07 @@ -0,0 +1,124 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt07 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Strings +.PP +Obviously if a paper contains a large number of occurrences +of an acute accent over a letter `e', +typing +.BD \eo"e\e\'" +for each \*e +would be a great nuisance. +.PP +Fortunately, +.UL troff +provides a way in which you can store an arbitrary +collection of text in a `string', +and thereafter use the string name as a shorthand +for its contents. +Strings are one of several +.UL troff +mechanisms whose judicious use +lets you type a document +with less effort and organize +it +so that extensive format changes +can be made with few editing changes. +.PP +A reference to a string is replaced by whatever +text +the string was defined as. +Strings are defined with the command +.BD .ds . +The line +.P1 +\&^ds e \eo"e\e'" +.P2 +defines the string +.BD e +to have the value +.BD \eo"e\e\'" +.PP +String names may be either one or two characters long, +and are referred to by +.BD \e*x +for one character names or +.BD \e*(xy +for two character names. +Thus to get +t\*el\*ephone, +given the definition of the string +.BD e +as above, +we can say +t\e*el\e*ephone. +.PP +If a string must begin with blanks, define it as +.P1 +\&.ds xx " text +.P2 +The double quote signals the beginning of the definition. +There is no trailing quote; +the end of the line terminates the string. 
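+.PP
+As a sketch of the two-character form
+(the name
+.BD Un
+and its text are invented for this example),
+.P1
+\&^ds Un \es\-2UNIX\es0
+The \e*(Un system ...
+.P2
+sets each `UNIX' two points smaller than the surrounding text;
+changing the one
+.BD .ds
+line changes every occurrence.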
+.PP +A string may actually be several lines long; +if +.UL troff +encounters a +.BD \e +at the end of +.ul +any +line, it is thrown away and the next line +added to the current one. +So you can make a long string simply by ending each line +but the last with a backslash: +.P1 +\&^ds xx this \e +is a very \e +long string +.P2 +.PP +Strings may be defined in terms of other strings, or even in terms of themselves; +we will discuss some of these possibilities later. diff --git a/share/doc/usd/22.trofftut/tt08 b/share/doc/usd/22.trofftut/tt08 new file mode 100644 index 0000000..8f5075d9 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt08 @@ -0,0 +1,199 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt08 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. 
+.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Introduction to Macros +.PP +Before we can go much further in +.UL troff , +we need to learn a bit about the +macro +facility. +In its simplest form, a macro is just a shorthand notation +quite similar to a string. +Suppose we want every paragraph to start +in exactly the same way _ +with a space and a temporary indent of two ems: +.P1 +^sp +^ti +2m +.P2 +Then to save typing, we would like to collapse these into +one shorthand line, +a +.UL troff +`command' like +.P1 +^PP +.P2 +that would be treated by +.UL troff +exactly as +.P1 +^sp +^ti +2m +.P2 +.BD .PP +is called a +.ul +macro. +The way we tell +.UL troff +what +.BD .PP +means is to +.ul +define +it with the +.BD .de +command: +.P1 +^de PP +^sp +^ti +2m +^^ +.P2 +The first line names the macro +(we used +.BD .PP ' ` +for `paragraph', +and upper case so it wouldn't conflict with +any name that +.UL troff +might +already know about). +The last line +.BD .. +marks the end of the definition. 
+In between is the text, +which is simply inserted whenever +.UL troff +sees the `command' +or macro call +.P1 +^PP +.P2 +A macro +can contain any mixture of text and formatting commands. +.PP +The definition of +.BD .PP +has to precede its first use; +undefined macros are simply ignored. +Names are restricted to one or two characters. +.PP +Using macros for commonly occurring sequences of commands +is critically important. +Not only does it save typing, +but it makes later changes much easier. +Suppose we decide that the paragraph indent is too small, +the vertical space is much too big, +and roman font should be forced. +Instead of changing the whole document, +we need only change the definition of +.BD .PP +to +something like +.P1 +^de PP \e" paragraph macro +^sp 2p +^ti +3m +^ft R +^^ +.P2 +and the change takes +effect everywhere we used +.BD .PP . +.PP +.BD \e" +is a +.UL troff +command that causes the rest of the line to be ignored. +We use it here to add comments to the macro +definition +(a wise idea once definitions get complicated). +.PP +As another example of macros, +consider these two which start and end a block of offset, +unfilled text, like most of the examples in this paper: +.P1 +^de BS \e" start indented block +^sp +^nf +^in +0.3i +^^ +^de BE \e" end indented block +^sp +^fi +^in \(mi0.3i +^^ +.P2 +Now we can surround text like +.P1 +Copy to +John Doe +Richard Roberts +Stanley Smith +.P2 +by the commands +.BD .BS +and +.BD .BE , +and it will come out as it did above. +Notice that we indented by +.BD .in\ +0.3i +instead of +.BD .in\ 0.3i . +This way we can nest our uses of +.BD .BS +and +.BD .BE +to get blocks within blocks. +.PP +If later on we decide that the indent +should be 0.5i, then it is only necessary to +change the definitions of +.BD .BS +and +.BD .BE , +not the whole paper.
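+.PP
+A sketch of such nesting, using the definitions of
+.BD .BS
+and
+.BD .BE
+given earlier
+(the text lines are placeholders):
+.P1
+^BS
+outer block, indented 0.3i
+^BS
+inner block, indented 0.6i
+^BE
+outer block again
+^BE
+.P2
+Each
+.BD .BS
+adds 0.3i to whatever indent is in force and each
+.BD .BE
+takes it off again;
+had the macros set an absolute indent,
+the inner block would not have moved any further in.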
diff --git a/share/doc/usd/22.trofftut/tt09 b/share/doc/usd/22.trofftut/tt09 new file mode 100644 index 0000000..4a44d34 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt09 @@ -0,0 +1,322 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt09 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Titles, Pages and Numbering +.PP +This is an area where things get tougher, +because nothing is done for you automatically. +Of necessity, some of this section is a cookbook, +to be copied literally until you get some experience. +.PP +Suppose you want a title at the top of each page, +saying just +.sp 3p +.lt 2.8i +.tl 'left top'center top'right top' +.lt +.sp 3p +In +.UL roff , +one can say +.P1 2 +^he 'left top'center top'right top' +^fo 'left bottom'center bottom'right bottom' +.P2 +to get headers and footers automatically on every page. +Alas, this doesn't work so easily in +.UL troff , +a serious hardship for the novice. +Instead you have to do a lot of specification (or use +a macro package, which makes it effortless). +.PP +You have to say what the actual title is (easy); +when to print it (easy enough); +and what to do at and around the title line (harder). +Taking these in reverse order, +first we define a macro +.BD .NP +(for `new page') to process +titles and the like at the end of one page +and the beginning of the next: +.P1 +^de NP +\(fmbp +\(fmsp 0.5i +\&.tl 'left top'center top'right top' +\(fmsp 0.3i +^^ +.P2 +To make sure we're at the top of a page, +we issue a `begin page' command +.BD \(fmbp , +which causes a skip to top-of-page +(we'll explain the +.BD \(fm +shortly). 
+Then we space down half an inch, +print the title +(the use of +.BD .tl +should be self explanatory; later we will discuss parameterizing the titles), +space another 0.3 inches, +and we're done. +.PP +To ask for +.BD .NP +at the bottom of each page, +we have to say something like +`when the text is within an inch +of the bottom of the page, +start the processing +for a new page.' +This is done with a `when' command +.BD .wh : +.P1 +^wh \-1i NP +.P2 +(No `.' is used before NP; +this is simply the name of a macro, not a macro call.) +The minus sign means +`measure up from the bottom of the page', +so +`\-1i' means `one inch from the bottom'. +.PP +The +.BD .wh +command appears in the input outside the definition of +.BD .NP ; +typically the input would be +.P1 +^de NP +^^^ +^^ +^wh \-1i NP +.P2 +.PP +Now what happens? +As text is actually being output, +.UL troff +keeps track of its vertical position on the page, +and after a line is printed within one inch from the bottom, +the +.BD .NP +macro is activated. +(In the jargon, the +.BD .wh +command sets a +.ul +trap +at the specified place, +which is `sprung' when that point is passed.) +.BD .NP +causes a skip to the top of the next page +(that's what the +.BD \(fmbp +was for), +then prints the title with the appropriate margins. +.PP +Why +.BD \(fmbp +and +.BD \(fmsp +instead of +.BD .bp +and +.BD .sp ? +The answer is that +.BD .sp +and +.BD .bp , +like several other commands, +cause a +.ul +break +to take place. +That is, all the input text collected but not yet printed +is flushed out as soon as possible, +and the next input line is guaranteed to start +a new line of output. +If we had used +.BD .sp +or +.BD .bp +in the +.BD .NP +macro, +this would cause a break in the middle +of the current output line when a new page is started. +The effect would be to print the left-over part of that line +at the top of the page, followed by the next input line on a new output line. +This is +.ul +not +what we want. 
+Using +.BD \(fm +instead of +.BD . +for a command +tells +.UL troff +that +no break is to take place _ +the output line +currently being filled +should +.ul +not +be forced out before the space or new page. +.PP +The list of commands that cause a break +is short and natural: +.P1 +^bp ^br ^ce ^fi ^nf ^sp ^in ^ti +.P2 +All others cause +.ul +no +break, +regardless of whether you use a +.BD . +or a +.BD \(fm . +If you really need a break, add a +.BD .br +command at the appropriate place. +.PP +One other thing to beware of _ +if you're changing fonts or point sizes a lot, +you may find that +if you cross a page boundary +in an unexpected font or size, +your titles come out in that size and font +instead of what you intended. +Furthermore, the length of a title is independent of the current line length, +so titles will come out at the default length of 6.5 inches +unless you change it, +which is done with the +.BD .lt +command. +.PP +There are several ways to fix the problems of point sizes +and fonts in titles. +For the simplest applications, we can change +.BD .NP +to set the proper size and font for the title, +then restore the previous values, like this: +.P1 2 +.ta .8i +^de NP +\(fmbp +\(fmsp 0.5i +^ft R \e" set title font to roman +^ps 10 \e" and size to 10 point +^lt 6i \e" and length to 6 inches +^tl 'left'center'right' +^ps \e" revert to previous size +^ft P \e" and to previous font +\(fmsp 0.3i +^^ +.P2 +.PP +This version of +.BD .NP +does +.ul +not +work if the fields in the +.BD .tl +command contain size or font changes. +To cope with that +requires +.UL troff 's +`environment' mechanism, +which we will discuss in Section 13. +.PP +To get a footer at the bottom of a page, +you can modify +.BD .NP +so it does +some processing before +the +.BD \(fmbp +command, +or split the job into a footer macro invoked +at the bottom margin and a header macro invoked +at the top of the page. +These variations are left as exercises. 
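One possible shape for the second variation, splitting the job into a footer macro and a header macro, is sketched below in raw input. The macro names HD and FO, the title fields, and the spacing values are assumptions chosen for illustration; the no-break control character is written as a plain apostrophe.

```troff
.de HD \" header: sprung at the top of each page
'sp 0.5i
.tl 'left top'center top'right top'
'sp 0.3i
..
.de FO \" footer: sprung one inch above the bottom
'sp 0.5i
.tl ''- % -''
'bp
..
.wh 0 HD \" trap at the top of the page
.wh -1i FO \" trap one inch from the bottom
```

The footer does its own processing first and only then issues the no-break `'bp`; the start of the new page springs the trap at position 0, which runs the header.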
+.WS +.PP +Output page numbers are computed automatically +as each page is produced (starting at 1), +but no numbers are printed unless you ask for them explicitly. +To get page numbers printed, +include the character +.BD % +in the +.BD .tl +line at +the position where you want the number to appear. +For example +.P1 +^tl ''- % -'' +.P2 +centers the page number inside hyphens, as on this page. +You can set the page number at any time +with either +.BD .bp\ n , +which immediately starts a new page numbered +.BD n , +or with +.BD .pn\ n , +which sets the page number for the next page +but doesn't cause a skip to the new page. +Again, +.BD .bp\ +n +sets the page number to +.BD n +more than its current value; +.BD .bp +means +.BD .bp\ +1 . diff --git a/share/doc/usd/22.trofftut/tt10 b/share/doc/usd/22.trofftut/tt10 new file mode 100644 index 0000000..a63bebd --- /dev/null +++ b/share/doc/usd/22.trofftut/tt10 @@ -0,0 +1,256 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt10 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. 
+.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Number Registers and Arithmetic +.PP +.UL troff +has a facility for doing arithmetic, +and for defining and using variables with numeric values, +called +.ul +number registers. +Number registers, like strings and macros, can be useful in setting up a document +so it is easy to change later. +And of course they serve for any sort of arithmetic computation. +.PP +Like strings, number registers have one or two character names. +They are set by the +.BD .nr +command, +and are referenced anywhere by +.BD \enx +(one character name) or +.BD \en(xy +(two character name). 
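As a small illustration of setting and referencing registers, the following raw input defines one register of each name length and interpolates both in running text. The register names `x` and `PS` and the values are chosen only for the example.

```troff
.nr x 3 \" one-character name, referenced as \nx
.nr PS 9 \" two-character name, referenced as \n(PS
Register x holds \nx, and register PS holds \n(PS.
```

The text line should come out with 3 and 9 in place of the two references.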
+.PP +There are quite a few pre-defined number registers maintained by +.UL troff , +among them +.BD % +for the current page number; +.BD nl +for the current vertical position on the page; +.BD dy , +.BD mo +and +.BD yr +for the current day, month and year; and +.BD .s +and +.BD .f +for the current size and font. +(The font is a number from 1 to 4.) +Any of these can be used in computations like any other register, +but some, like +.BD .s +and +.BD .f , +cannot be changed with +.BD .nr . +.PP +As an example of the use of number registers, +in the +.BD \-ms +macro package [4], +most significant parameters are defined in terms of the values +of a handful of number registers. +These include the point size for text, the vertical spacing, +and the line and title lengths. +To set the point size and vertical spacing for the following paragraphs, for example, a user may say +.P1 +^nr PS 9 +^nr VS 11 +.P2 +The paragraph macro +.BD .PP +is defined (roughly) as follows: +.P1 +.ta 1i +^de PP +^ps \e\en(PS \e" reset size +^vs \e\en(VSp \e" spacing +^ft R \e" font +^sp 0.5v \e" half a line +^ti +3m +^^ +.P2 +This sets the font to Roman and the point size and line spacing +to whatever values are stored in the number registers +.BD PS +and +.BD VS . +.PP +Why are there two backslashes? +This is the eternal problem of how to quote a quote. +When +.UL troff +originally reads the macro definition, +it peels off one backslash +to see what's coming next. +To ensure that another is left in the definition when the +macro is +.ul +used, +we have to put in two backslashes in the definition. +If only one backslash is used, +point size and vertical spacing will be frozen at the time the macro +is defined, not when it is used. +.PP +Protecting by an extra layer of backslashes +is only needed for +.BD \en , +.BD \e* , +.BD \e$ +(which we haven't come to yet), +and +.BD \e +itself. 
+Things like +.BD \es , +.BD \ef , +.BD \eh , +.BD \ev , +and so on do not need an extra backslash, +since they are converted by +.UL troff +to an internal code immediately upon being seen. +.WS +.PP +Arithmetic expressions can appear anywhere that +a number is expected. +As a trivial example, +.P1 +^nr PS \e\en(PS\-2 +.P2 +decrements PS by 2. +Expressions can use the arithmetic operators +, \-, *, /, % (mod), +the relational operators >, >=, <, <=, =, and != (not equal), +and parentheses. +.PP +Although the arithmetic we have done so far +has been straightforward, +more complicated things are somewhat tricky. +First, +number registers hold only integers. +.UL troff +arithmetic uses truncating integer division, just like Fortran. +Second, in the absence of parentheses, +evaluation is done left-to-right +without any operator precedence +(including relational operators). +Thus +.P1 +7*\-4+3/13 +.P2 +becomes `\-1'. +Number registers can occur anywhere in an expression, +and so can scale indicators like +.BD p , +.BD i , +.BD m , +and so on (but no spaces). +Although integer division causes truncation, +each number and its scale indicator is converted +to machine units (1/432 inch) before any arithmetic is done, +so +1i/2u +evaluates to +0.5i +correctly. +.PP +The scale indicator +.BD u +often has to appear +when you wouldn't expect it _ +in particular, when arithmetic is being done +in a context that implies horizontal or vertical dimensions. +For example, +.P1 +^ll 7/2i +.P2 +would seem obvious enough _ +3\(12 inches. +Sorry. +Remember that the default units for horizontal parameters like +.BD .ll +are ems. +That's really `7 ems / 2 inches', +and when translated into machine units, it becomes zero. +How about +.P1 +^ll 7i/2 +.P2 +Sorry, still no good _ +the `2' is `2 ems', so `7i/2' is small, +although not zero. +You +.ul +must +use +.P1 +^ll 7i/2u +.P2 +So again, a safe rule is to +attach a scale indicator to every number, +even constants. 
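The left-to-right rule is easy to check directly. The sketch below stores the expression from the text in a register, and then a parenthesized variant for comparison; the register names `a` and `b` are arbitrary. Inside `.nr` the default scale is plain units, so no scale indicators are needed.

```troff
.nr a 7*-4+3/13 \" left to right: -28, then -25, then -25/13 truncates to -1
.nr b (7*-4)+(3/13) \" parentheses force grouping: -28 + 0 = -28
Without parentheses the value is \na; with them it is \nb.
```

When the intended grouping matters, parenthesize it explicitly rather than rely on the usual precedence from other languages.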
+.PP +For arithmetic done within a +.BD .nr +command, +there is no implication of horizontal or vertical dimension, +so the default units are `units', +and 7i/2 and 7i/2u +mean the same thing. +Thus +.P1 +^nr ll 7i/2 +^ll \e\en(llu +.P2 +does just what you want, +so long as you +don't forget the +.BD u +on the +.BD .ll +command. diff --git a/share/doc/usd/22.trofftut/tt11 b/share/doc/usd/22.trofftut/tt11 new file mode 100644 index 0000000..538fac7 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt11 @@ -0,0 +1,233 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt11 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. 
AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Macros with arguments +.PP +The next step is to define macros that can change from one +use to the next +according to parameters supplied as arguments. +To make this work, we need two things: +first, when we define the macro, we have to indicate that some +parts of it will be provided as arguments when the macro is called. +Then when the macro is +called +we have to provide actual arguments +to be plugged into the definition. +.PP +Let us illustrate by defining a macro +.BD .SM +that will print its argument two points +smaller than the surrounding text. +That is, the macro call +.P1 +^SM TROFF +.P2 +will produce +.UC TROFF . +.PP +The definition of +.BD .SM +is +.P1 +^de SM +\es\-2\e\e$1\es+2 +^^ +.P2 +Within a macro definition, +the symbol +.BD \e\e$n +refers to the +.BD n th +argument +that the macro was called with. +Thus +.BD \e\e$1 +is the string to be placed in a smaller point +size when +.BD .SM +is called. 
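Written as raw input rather than in the display notation used above (assuming `^` stands for a leading period and `\e` for a backslash), the one-argument macro and a call to it might look like this. The surrounding sentence is invented for the example.

```troff
.de SM
\s-2\\$1\s+2
..
The formatter is called
.SM TROFF
throughout this paper.
```

The doubled backslash in `\\$1` is needed only inside the definition; the size changes `\s-2` and `\s+2` take a single backslash.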
+.PP +As a slightly more complicated version, the following definition of +.BD .SM +permits optional second and third arguments +that will be printed in the normal size: +.P1 +^de SM +\e\e$3\es\-2\e\e$1\es+2\e\e$2 +^^ +.P2 +Arguments not provided when the macro is called are treated +as empty, +so +.P1 +^SM TROFF ), +.P2 +produces +.UC TROFF ), +while +.P1 +^SM TROFF ). ( +.P2 +produces +.UC TROFF ). ( +It is convenient to reverse +the order of arguments because trailing punctuation +is much more common than leading. +.PP +By the way, the number of arguments that a macro was called with +is available in number register +.BD .$ . +.PP +The following macro +.BD ^BD +is the one used to make the +`bold roman' we have been using for +.UL troff +command names in text. +It combines horizontal motions, width computations, +and argument rearrangement. +.P1 2 +\&.de BD +\e&\e\e$3\ef1\e\e$1\eh'\-\ew'\e\e$1'u+1u'\e\e$1\efP\e\e$2 +\&.. +.P2 +The +.BD \eh +and +.BD \ew +commands need no extra backslash, as we discussed above. +The +.BD \e& +is there in case the argument begins with a period. +.WS +.PP +Two backslashes are needed with the +.BD \e\e$n +commands, though, +to protect one of them when the macro is +being defined. +Perhaps a second example will make this clearer. +Consider a macro called +.BD .SH +which +produces section headings rather like those in this paper, +with the sections numbered automatically, +and the title in bold in a smaller size. +The use is +.P1 +^SH "Section title ..." +.P2 +(If the argument to a macro is to contain blanks, +then it must be +.ul +surrounded +by double quotes, +unlike a string, where only one leading quote is permitted.) +.PP +Here is the definition of the +.BD .SH +macro: +.P1 +.ta .75i 1.15i +^nr SH 0 \e" initialize section number +^de SH +^sp 0.3i +^ft B +^nr SH \e\en(SH+1 \e" increment number +^ps \e\en(PS\-1 \e" decrease PS +\e\en(SH. \e\e$1 \e" number. 
title +^ps \e\en(PS \e" restore PS +^sp 0.3i +^ft R +^^ +.P2 +The section number is kept in number register SH, which is incremented each +time just before it is used. +(A number register may have the same name as a macro +without conflict but a string may not.) +.PP +We used +.BD \e\en(SH +instead of +.BD \en(SH +and +.BD \e\en(PS +instead of +.BD \en(PS . +If we had used +.BD \en(SH , +we would get the value of the register at the time the macro was +.ul +defined, +not at the time it was +.ul +used. +If that's what you want, fine, +but not here. +Similarly, +by using +.BD \e\en(PS , +we get the point size at the time the macro is called. +.WS +.PP +As an example that does not involve numbers, +recall our +.BD .NP +macro which had a +.P1 +^tl 'left'center'right' +.P2 +We could make these into parameters by using instead +.P1 +^tl '\e\e*(LT'\e\e*(CT'\e\e*(RT' +.P2 +so the title comes from three strings called LT, CT and RT. +If these are empty, then the title will be a blank line. +Normally CT would be set with something like +.P1 +\&^ds CT - % - +.P2 +to give just the page number between hyphens (as on the top of this page), +but a user could supply private definitions for +any of the strings. diff --git a/share/doc/usd/22.trofftut/tt12 b/share/doc/usd/22.trofftut/tt12 new file mode 100644 index 0000000..8cb3893 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt12 @@ -0,0 +1,164 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt12 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. 
+.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Conditionals +.PP +Suppose we want the +.BD .SH +macro to leave two extra inches of space just before section 1, +but nowhere else. 
+The cleanest way to do that is to test inside the +.BD .SH +macro +whether +the section number is 1, +and add some space if it is. +The +.BD .if +command provides the conditional test +that we can add +just before the heading line is output: +.P1 4 +^if \e\en(SH=1 ^sp 2i \e" first section only +.P2 +.PP +The condition after the +.BD .if +can be any arithmetic or logical expression. +If the condition is logically true, or arithmetically greater than zero, +the rest of the line is treated as if +it were text _ +here a command. +If the condition is false, or zero or negative, +the rest of the line is skipped. +.PP +It is possible to do more than one command if a condition is true. +Suppose several operations are to be done before section 1. +One possibility is to define a macro +.BD .S1 +and invoke it +if we are about to do section 1 +(as determined by an +.BD .if ). +.P1 +^de S1 +--- processing for section 1 --- +^^ +^de SH +^^^ +^if \e\en(SH=1 ^S1 +^^^ +^^ +.P2 +.PP +An alternate way is to use the +extended form of the +.BD .if , +like this: +.P1 +^if \e\en(SH=1 \e{--- processing +for section 1 ----\e} +.P2 +The braces +.BD \e{ +and +.BD \e} +must occur in the positions shown +or you will get unexpected extra lines in your output. +.UL troff +also provides +an `if-else' construction, +which we will not go into here. +.PP +A condition can be negated by preceding it with +.BD ! ; +we get the same effect as above (but less clearly) by using +.P1 +^if !\e\en(SH>1 ^S1 +.P2 +.PP +There are a handful of +other conditions that can be tested with +.BD .if . +For example, is the current page even or odd? +.P1 +^if o ^tl 'odd page title''- % -' +^if e ^tl '- % -''even page title' +.P2 +gives facing pages different titles and page numbers on the +outside edge when used inside an appropriate new page macro. +.PP +Two other conditions +are +.BD t +and +.BD n , +which tell you whether the formatter is +.UL troff +or +.UL nroff . +.P1 +^if t troff stuff ... +^if n nroff stuff ... 
+.P2 +.PP +Finally, string comparisons may be made in an +.BD .if : +.P1 +^if 'string1'string2' stuff +.P2 +does `stuff' if +.ul +string1 +is the same as +.ul +string2. +The character separating the strings can be anything +reasonable that is +not contained in either string. +The strings themselves can reference strings with +.BD \e* , +arguments with +.BD \e$ , +and so on. diff --git a/share/doc/usd/22.trofftut/tt13 b/share/doc/usd/22.trofftut/tt13 new file mode 100644 index 0000000..67c97e2 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt13 @@ -0,0 +1,99 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt13 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. 
AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Environments +.PP +As we mentioned, there is a potential problem +when going across a page boundary: +parameters like size and font +for a page title may well be different from those +in effect in the text when the page boundary occurs. +.UL troff +provides a very general way to deal with this and +similar situations. +There are three `environments', +each of which has independently settable versions of +many of the parameters associated with processing, +including size, font, line and title lengths, +fill/nofill mode, tab stops, and even partially collected lines. +Thus the titling problem may be readily solved by processing the main text +in one environment and titles in a separate one +with its own suitable parameters. +.PP +The command +.BD .ev\ n +shifts to environment +.BD n ; +.BD n +must be 0, 1 or 2. +The command +.BD .ev +with no argument returns to the +previous environment. +Environment names are maintained in a stack, so calls +for different environments may be nested and unwound consistently. +.PP +Suppose we say that the main text is processed in environment 0, +which is where +.UL troff +begins by default. 
+Then we can modify the new page macro +.BD .NP +to process titles in environment 1 like this: +.P1 2 +^de NP +^ev 1 \e" shift to new environment +^lt 6i \e" set parameters here +^ft R +^ps 10 +\&... any other processing ... +^ev \e" return to previous environment +^^ +.P2 +It is also possible to initialize the parameters for an environment +outside the +.BD .NP +macro, +but the version shown keeps all the processing in one place +and is thus easier to understand and change. diff --git a/share/doc/usd/22.trofftut/tt14 b/share/doc/usd/22.trofftut/tt14 new file mode 100644 index 0000000..6a83f51 --- /dev/null +++ b/share/doc/usd/22.trofftut/tt14 @@ -0,0 +1,155 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)tt14 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. 
+.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.NH +Diversions +.PP +There are numerous occasions in page layout when it is necessary to store some text +for a period of time without actually printing it. +Footnotes are the most obvious example: +the text of the footnote usually appears in the input well before the place +on the page where it is to be printed is reached. +In fact, +the place where it is output normally depends on how big it is, +which implies that there must be a way +to process the footnote at least +enough to decide its size +without printing it. +.PP +.UL troff +provides a mechanism called a diversion +for doing this processing. +Any part of the output may be diverted into a macro instead +of being printed, +and then at some convenient time the macro may be put back into +the input. +.PP +The command +.BD .di\ xy +begins a diversion _ all subsequent output is collected into the macro +.BD xy +until the command +.BD .di +with no arguments is encountered. +This terminates the diversion. 
+The processed text is available at any time thereafter, simply +by giving the command +.P1 +^xy +.P2 +The vertical size of the last finished diversion is contained in +the built-in number register +.BD dn . +.PP +As a simple example, +suppose we want to implement a `keep-release' +operation, +so that text between the commands +.BD .KS +and +.BD .KE +will not be split across a page boundary +(as for a figure or table). +Clearly, when a +.BD .KS +is encountered, we have to begin diverting +the output so we can find out how big it is. +Then when a +.BD .KE +is seen, we decide +whether the diverted text will fit on the current page, +and print it either there if it fits, or at the top of the next page if it doesn't. +So: +.P1 2 +.ta .6i +^de KS \e" start keep +^br \e" start fresh line +^ev 1 \e" collect in new environment +^fi \e" make it filled text +^di XX \e" collect in XX +^^ +.P2 +.P1 2 +.ta .6i +^de KE \e" end keep +^br \e" get last partial line +^di \e" end diversion +^if \e\en(dn>=\e\en(.t .bp \e" bp if doesn't fit +^nf \e" bring it back in no-fill +^XX \e" text +^ev \e" return to normal environment +^^ +.P2 +Recall that number register +.BD nl +is the current position +on the output page. +Since output was being diverted, this remains +at its value when the diversion started. +.BD dn +is the amount of text in the diversion; +.BD .t +(another built-in register) +is the distance to the next trap, +which we assume is at the bottom margin of the page. +If the diversion is large enough to go past the trap, +the +.BD .if +is satisfied, and +a +.BD .bp +is issued. +In either case, the diverted output is then brought back with +.BD .XX . +It is essential to bring it back in no-fill mode so +.UL troff +will do no further processing on it. +.PP +This is not the most general keep-release, +nor is it robust in the face of all conceivable inputs, +but it would require more space than we have here to write it +in full generality. 
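+.PP
+As a minimal sketch of how the two macros might be called,
+assuming they are defined exactly as shown above
+(the figure text itself is made up for illustration),
+a short figure could be kept on one page like this:
+.P1 2
+^KS
+^ce
+Figure 1. Sample layout
+^sp
+The text of the figure goes here;
+it is collected in fill mode inside the diversion.
+^KE
+.P2
+If the collected text is shorter than the distance to the next trap,
+it is printed on the current page;
+otherwise
+.BD .KE
+issues a
+.BD .bp
+first, so the figure starts at the top of the next page.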
+This section is not intended +to teach everything about diversions, +but to sketch out enough that you can read +existing macro packages with some comprehension. diff --git a/share/doc/usd/22.trofftut/ttack b/share/doc/usd/22.trofftut/ttack new file mode 100644 index 0000000..ee633f7 --- /dev/null +++ b/share/doc/usd/22.trofftut/ttack @@ -0,0 +1,100 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ttack 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. 
IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.SH +Acknowledgements +.PP +I am deeply indebted to J. F. Ossanna, +the author of +.UL troff , +for his repeated patient explanations +of +fine points, +and for his continuing willingness to +adapt +.UL troff +to make other uses easier. +I am also grateful to Jim Blinn, Ted Dolotta, +Doug McIlroy, Mike Lesk and Joel Sturman +for helpful comments on this paper. +.SH +References +.LP +.IP [1] +J. F. Ossanna, +.ul +.UC NROFF/TROFF +User's Manual, +Bell Laboratories +Computing Science Technical Report 54, 1976. +.IP [2] +B. W. Kernighan, +.ul +A System for Typesetting Mathematics _ User's Guide +.ul +(Second Edition), +Bell Laboratories +Computing Science Technical Report 17, 1977. +.IP [3] +M. E. Lesk, +.ul +TBL _ A Program to Format Tables, +Bell Laboratories +Computing Science Technical Report 49, 1976. +.IP [4] +M. E. Lesk, +.ul +Typing Documents on UNIX, +Bell Laboratories, 1978. +.IP [5] +J. R. Mashey and D. W. Smith, +.ul +PWB/MM _ +.ul +Programmer's Workbench Memorandum Macros, +Bell Laboratories internal memorandum. +.IP [6] +Eric P. Allman, +.ul +Writing Papers with NROFF using -me, +University of California, Berkeley. diff --git a/share/doc/usd/22.trofftut/ttcharset b/share/doc/usd/22.trofftut/ttcharset new file mode 100644 index 0000000..551e79d --- /dev/null +++ b/share/doc/usd/22.trofftut/ttcharset @@ -0,0 +1,135 @@ +.\" Hey, Emacs, edit this file in -*- nroff-fill -*- mode! 
+.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ttcharset 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. 
BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.bp +.tr __ +.nr VS 12 +.vs 12p +.1C +.SH +Appendix A: Phototypesetter Character Set (APS-5) +.LP +These characters exist in roman, italic, and bold. +To get the one on the left, type the +four-character name on the right. +.sp +.ta .2i .8i 1i 1.6i 1.8i 2.4i 2.6i 3.2i 3.4i 4.0i 4.2i 4.8i 5i 5.6i 5.8i +.nf +.in 0.5i +\(ff \\(ff \(fi \\(fi \(fl \\(fl \(Fi \\(Fi \(Fl \\(Fl +\(ru \\(ru \(em \\(em \(14 \\(14 \(12 \\(12 \(34 \\(34 +\(co \\(co \(de \\(de \(dg \\(dg \(fm \\(fm \(ct \\(ct +\(rg \\(rg \(bu \\(bu \(sq \\(sq \(hy \\(hy + (In bold, \e(sq is \fB\(sq\fP.) +.sp +.in 0 +.tr ~~ +.ps 9 +.fi +The following are special-font characters: +.sp +.in 0.5i +.tr ~~ +.nf +.ta .3i 1i 1.3i 2i 2.3i 3i 3.3i +\(pl \\(pl \(mi \\(mi \(mu \\(mu \(di \\(di +\(eq \\(eq \(== \\(== \(>= \\(>= \(<= \\(<= +\(!= \\(!= \(+- \\(+- \(no \\(no \(sl \\(sl +\(ap \\(ap \(~= \\(~= \(pt \\(pt \(gr \\(gr +\(-> \\(-> \(<- \\(<- \(ua \\(ua \(da \\(da +\(is \\(is \(pd \\(pd \(if \\(if \(sr \\(sr +\(sb \\(sb \(sp \\(sp \(cu \\(cu \(ca \\(ca +\(ib \\(ib \(ip \\(ip \(mo \\(mo \(es \\(es +\(aa \\(aa \(ga \\(ga \(ci \\(ci (gone) \\(bs +\(sc \\(sc \(dd \\(dd \(lh \\(lh \(rh \\(rh +\(lt \\(lt \(rt \\(rt \(lc \\(lc \(rc \\(rc +\(lb \\(lb \(rb \\(rb \(lf \\(lf \(rf \\(rf +\(lk \\(lk \(rk \\(rk \(bv \\(bv \(ts \\(ts +\(br \\(br \(or \\(or \(ul \\(ul \(rn \\(rn +\(** \\(** +.sp +.in 0 +.ps 9 +.fi +These +four +characters also have two-character names. 
+The \' is the apostrophe on terminals; +the \` is the other quote mark. +.sp +.in .5i +\' \e\(aa \` \e\(ga \(mi \e\(mi \_ \e\_ +.sp +.in 0 +These +characters exist only on the special font, +but they do not have four-character names: +.sp +.in .5i +.nf +.tr ^^ +" { } < > ~ ^ \e # @ +.sp +.in 0 +.fi +For greek, precede the roman letter by +.BD \e(* +to get the corresponding greek; +for example, +.BD \e(*a +is +\(*a. +.sp +.in 0.5i +.nf +.cs R 36 +abgdezyhiklmncoprstufxqw +\(*a\(*b\(*g\(*d\(*e\(*z\(*y\(*h\(*i\(*k\(*l\(*m\(*n\(*c\(*o\(*p\(*r\(*s\(*t\(*u\(*f\(*x\(*q\(*w +.sp +ABGDEZYHIKLMNCOPRSTUFXQW +\(*A\(*B\(*G\(*D\(*E\(*Z\(*Y\(*H\(*I\(*K\(*L\(*M\(*N\(*C\(*O\(*P\(*R\(*S\(*T\(*U\(*F\(*X\(*Q\(*W +.ps 9 +.cs R +.in 0 +.fi diff --git a/share/doc/usd/22.trofftut/ttindex b/share/doc/usd/22.trofftut/ttindex new file mode 100644 index 0000000..ca032e8 --- /dev/null +++ b/share/doc/usd/22.trofftut/ttindex @@ -0,0 +1,200 @@ +.\" This module is believed to contain source code proprietary to AT&T. +.\" Use and redistribution is subject to the Berkeley Software License +.\" Agreement and your Software Agreement with AT&T (Western Electric). +.\" +.\" @(#)ttindex 8.1 (Berkeley) 6/8/93 +.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions are +.\" met: +.\" +.\" Redistributions of source code and documentation must retain the above +.\" copyright notice, this list of conditions and the following +.\" disclaimer. +.\" +.\" Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. 
+.\" +.\" All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" +.\" This product includes software developed or owned by Caldera +.\" International, Inc. Neither the name of Caldera International, Inc. +.\" nor the names of other contributors may be used to endorse or promote +.\" products derived from this software without specific prior written +.\" permission. +.\" +.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA +.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE +.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR +.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, +.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE +.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN +.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.bp +.2C +.SH +Index +.LP +.nf +.ps 8 +.vs 9p +! 
(negating conditionals) 17 +#$ (macro argument) 16 +#*x, #(xy (invoke string macro) 14 +#b (bracketing function) 13 +#d (subscript) 11 +#f (font change) 5 +#h (horizontal motion) 12 +#nx, #n(xy (number register) 15 +#o (overstrike) 13 +#s (size change) 3 +#u (superscript) 11 +#v (vertical motion) 11 +#w (width function) 12 +#z (zero motion) 13 +\(fmcommand instead of ^command 9 +% (page number register) 10,15 +^^ (end of macro definition) 7 +^bp 9,10 +^br (break) 9 +^ce (center) 2 +^ds (define string macro) 7,14 +^fi (fill) 2 +^ft (change font) 5 +^if (conditional test) 16 +^in (indent) 6 +^lg (set ligatures) 5 +^ll (line length) 6 +^nf (nofill) 2 +^nr (set number register) 14 +^pn (page number) 10 +^ps (change point size) 1,3 +^sp (space) 4 +^ss (set space size) 10 +^ta (set tab stops) 11 +^tc (set tab character) 10 +^tl (title) 9 +^tr (translate characters) 2,6 +^ul (italicize) 6 +^vs (vertical spacing) 3 +^wh (when conditional) 9,17 +accents 6,13 +apostrophes 6 +arithmetic 15 +backslash 1,3,5,14,16 +begin page (^bp) 9 +block macros (B1,B2) 8 +bold font (.ft B) 5 +boustrophedon 12 +bracketing function (##b) 13 +break (^br) 9 +break-causing commands 9 +centering (^ce) 2 +changing fonts (^ft, #f) 5 +changing macros 15 +character set 4,5,19 +character translation (^tr) 2,6 +columnated output 10 +commands 1 +commands that cause break 9 +conditionals (^if) 16 +constant proportion 7 +default break list 9 +define macro (^de) 7 +define string macro (^ds) 14 +drawing lines 11 +em 7,11 +end of macro (^^) 7 +even page test (e) 17 +fill (^fi) 2 +fonts (^ft) 4,19 +Greek (#(*-) 5,19 +hanging indent (^ti) 12 +hints 20 +horizontal motion (#h) 12 +hp (horizontal position register) 15 +hyphen 6 +i scale indicator 4 +indent (^in) 6 +index 21 +italic font (.ft I) 4 +italicize (^ul) 6 +legal point sizes 3 +ligatures (ff,fi,fl; ^lg) 5 +line length (^ll) 6 +line spacing (^vs) 3 +local motions (#u,#d,#v,#h,#w,#o,#z,#b) 11 ff +m scale indicator 7 +machine units 4,12 +macro arguments 15 
+macros 7 +macros that change 15 +multiple backslashes 14 +negating conditionals (!) 17 +new page macro (NP) 8 +nl (current vertical position register) 15 +nofill (^nf) 2 +NROFF test (n) 17 +nested quotes 12 +number registers (^nr,#n) 14 +numbered paragraphs 12 +odd page test (o) 17 +order of evaluation 14 +overstrike (#o) 13 +p scale indicator 3 +page number register (%) 10 +page numbers (^pn, ^bp) 10 +paragraph macro (PG) 7 +Paternoster 6 +point size (^ps) 1,3 +previous font (#fP, ^ft P) 5 +previous point size (#s0,^ps) 3 +quotes 6 +relative change (\(+-) 6 +ROFF 1 +ROFF header and footer 8 +Roman font (.ft R) 4 +scale indicator i 4 +scale indicator m 7 +scale indicator p 3 +scale indicator u 12 +scale indicators in arithmetic 15 +section heading macro (SC) 15 +set space size (^ss) 10 +size _ see point size +space (^sp) 4 +space between lines (^vs) 3 +special characters (#(xx) 5,19 +string macros (^ds,#*) 14 +subscripts (#d) 11 +superscripts (#u) 11 +tab character (^tc) 11 +tabs (^ta) 10 +temporary indent (^ti) 7 +titles (^tl) 8 +translate (^tr) 2,6,12 +TROFF examples 19 +TROFF test (t) 17 +truncating division 15 +type faces _ see fonts +u scale indicator 12 +underline (^ul) 6 +valid point sizes 3 +vertical motion (#v) 11 +vertical position on page 9 +vertical spacing (^vs) 3 +when (^wh) 9,17 +width function (#w) 12 +width of digits 10 +zero motion (#z) 13 diff --git a/share/doc/usd/Makefile b/share/doc/usd/Makefile new file mode 100644 index 0000000..e7939fe --- /dev/null +++ b/share/doc/usd/Makefile @@ -0,0 +1,21 @@ +# From: @(#)Makefile 8.2 (Berkeley) 4/20/94 +# $FreeBSD$ + +# The following modules are not provided: +# 14.jove + +SUBDIR= title \ + contents \ + 04.csh \ + 07.mail \ + 10.exref \ + 11.vitut \ + 12.vi \ + 13.viref \ + 18.msdiffs \ + 19.memacros \ + 20.meref \ + 21.troff \ + 22.trofftut + +.include <bsd.subdir.mk> diff --git a/share/doc/usd/contents/Makefile b/share/doc/usd/contents/Makefile new file mode 100644 index 0000000..ec0c508 --- /dev/null 
+++ b/share/doc/usd/contents/Makefile @@ -0,0 +1,8 @@ +# $FreeBSD$ + +VOLUME= usd +DOC= contents +SRCS= contents.ms +MACROS= -ms + +.include <bsd.doc.mk> diff --git a/share/doc/usd/contents/contents.ms b/share/doc/usd/contents/contents.ms new file mode 100644 index 0000000..15bc07a --- /dev/null +++ b/share/doc/usd/contents/contents.ms @@ -0,0 +1,312 @@ +.\" Copyright (c) 1986, 1993 +.\" The Regents of the University of California. All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)00.contents 8.2 (Berkeley) 4/20/94 +.\" $FreeBSD$ +.\" +.de ND +.KE +.sp +.KS +.. +.OH '''USD Contents' +.EH 'USD Contents''' +.TL +UNIX User's Supplementary Documents (USD) +.if !r.U .nr .U 0 +.if \n(.U \{\ +.br +.>> <a href="Title.html">Title.html</a> +.\} +.sp +\s-2 4.4 Berkeley Software Distribution\s+2 +.sp +\fRJune, 1993\fR +.PP +This volume contains documents which supplement the manual pages in +.I +The Unix User's Reference Manual +.R +for the 4.4BSD system as distributed by U.C. Berkeley. +.sp +.KS +.SH +Getting Started +.ND +.IP +.tl 'Unix for Beginners \- Second Edition''USD:1' +.QP +An introduction to the most basic uses of the system. +.ND +.IP +.tl 'Learn \- Computer\-Aided Instruction on UNIX (Second Edition)''USD:2' +.QP +Describes a computer-aided instruction program that walks new users through +the basics of files, the editor, and document preparation software. +.ND +.SH +Basic Utilities +.ND +.IP +.tl 'An Introduction to the UNIX Shell''USD:3' +.QP +Steve Bourne's introduction to the capabilities of +.I sh, +a command interpreter especially popular for writing shell scripts. 
+.ND +.IP +.tl 'An Introduction to the C shell''USD:4' +.if \n(.U \{\ +.br +.>> <a href="04.csh/paper.html">04.csh/paper.html</a> +.\} +.QP +This introduction to +.I csh, +(a command interpreter popular for interactive work) describes many +commonly used UNIX commands, assumes little prior knowledge of UNIX, +and has a glossary useful for beginners. +.ND +.IP +.tl 'DC \- An Interactive Desk Calculator''USD:5' +.QP +A super HP calculator, if you do not need floating point. +.ND +.IP +.tl 'BC \- An Arbitrary Precision Desk-Calculator Language''USD:6' +.QP +A front end for DC that provides infix notation, control flow, and +built\-in functions. +.ND +.SH +Communicating with the World +.ND +.IP +.tl 'Mail Reference Manual''USD:7' +.if \n(.U \{\ +.br +.>> <a href="07.mail/paper.html">07.mail/paper.html</a> +.\} +.QP +Complete details on one of the programs for sending and reading your mail. +.ND +.IP +.tl 'The Rand MH Message Handling System''USD:8' +.QP +This system for managing your computer mail uses lots of small programs, +instead of one large one. +.ND +.SH +Text Editing +.ND +.IP +.tl 'A Tutorial Introduction to the Unix Text Editor''USD:9' +.QP +An easy way to get started with the line editor, +.I ed. +.ND +.IP +.tl 'Advanced Editing on Unix''USD:10' +.if \n(.U \{\ +.br +.>> <a href="10.exref/paper.html">10.exref/paper.html</a> +.\} +.QP +The next step. +.ND +.IP +.tl 'An Introduction to Display Editing with Vi''USD:11' +.if \n(.U \{\ +.br +.>> <a href="11.vitut/paper.html">11.vitut/paper.html</a> +.\} +.QP +The document to learn to use the \fIvi\fR screen editor. +.ND +.IP +.tl 'Ex Reference Manual (Version 3.7)''USD:12' +.if \n(.U \{\ +.br +.>> <a href="12.vi/paper.html">12.vi/paper.html</a> +.\} +.QP +The final reference for the \fIex\fR editor. +.ND +.IP +.tl 'Vi Reference Manual''USD:13' +.if \n(.U \{\ +.br +.>> <a href="13.viref/paper.html">13.viref/paper.html</a> +.\} +.QP +The definitive reference for the \fInvi\fR editor. 
+.ND +.IP +.tl 'Jove Manual for UNIX Users''USD:14' +.QP +Jove is a small, self-documenting, customizable display editor, based on +EMACS. A plausible alternative to +.I vi. +.ND +.IP +.tl 'SED \- A Non-interactive Text Editor''USD:15' +.QP +Describes a one-pass variant of +.I ed +useful as a filter for processing large files. +.ND +.IP +.tl 'AWK \- A Pattern Scanning and Processing Language (Second Edition)''USD:16' +.QP +A program for data selection and transformation. +.ND +.SH +Document Preparation +.ND +.IP +.tl 'Typing Documents on UNIX: Using the \-ms Macros with Troff and Nroff''USD:17' +.QP +Describes and gives examples of the basic use of the typesetting tools and +``-ms'', a frequently used package of formatting requests that make it easier +to lay out most documents. +.ND +.IP +.tl 'A Revised Version of \-ms''USD:18' +.if \n(.U \{\ +.br +.>> <a href="18.msdiffs/paper.html">18.msdiffs/paper.html</a> +.\} +.QP +A brief description of the Berkeley revisions made to the \-ms formatting +macros for nroff and troff. +.ND +.IP +.tl 'Writing Papers with \fInroff\fR using \-me''USD:19' +.if \n(.U \{\ +.br +.>> <a href="19.memacros/paper.html">19.memacros/paper.html</a> +.\} +.QP +Another popular macro package for +.I nroff. +.ND +.IP +.tl '\-me Reference Manual''USD:20' +.if \n(.U \{\ +.br +.>> <a href="20.meref/paper.html">20.meref/paper.html</a> +.\} +.QP +The final word on \-me. +.ND +.IP +.tl 'NROFF/TROFF User\'s Manual''USD:21' +.QP +Extremely detailed information about these document formatting programs. +.ND +.IP +.tl 'A TROFF Tutorial''USD:22' +.QP +An introduction to the most basic uses of +.I troff +for those who really want to know such things, or want to write their +own macros. +.ND +.IP +.tl 'A System for Typesetting Mathematics''USD:23' +.QP +Describes +.I eqn, +an easy-to-learn language for high-quality mathematical typesetting. 
+.ND +.IP +.tl 'Typesetting Mathematics \- User\'s Guide (Second Edition)''USD:24' +.QP +More details about how to use +.I eqn. +.ND +.IP +.tl 'Tbl \- A Program to Format Tables''USD:25' +.QP +A program for easily typesetting tabular material. +.ND +.IP +.tl 'Refer \- A Bibliography System''USD:26' +.QP +An introduction to one set of tools used to maintain bibliographic databases. +The major program, +.I refer, +is used to automatically retrieve and format the references +based on document citations. +.ND +.IP +.tl 'Some Applications of Inverted Indexes on the UNIX System''USD:27' +.QP +Mike Lesk's paper describes the +.I refer +programs in a somewhat larger context. +.ND +.IP +.tl 'BIB \- A Program for Formatting Bibliographies''USD:28' +.QP +This is an alternative to +.I refer +for expanding citations in documents. +.ND +.IP +.tl 'Writing Tools \- The STYLE and DICTION Programs''USD:29' +.QP +These are programs which can help you understand and improve your +writing style. +.ND +.SH +Amusements +.ND +.IP +.tl 'A Guide to the Dungeons of Doom''USD:30' +.if \n(.U \{\ +.br +.>> <a href="30.rogue/paper.html">30.rogue/paper.html</a> +.\} +.QP +An introduction to the popular game of \fIrogue\fP, a fantasy game +which is one of the biggest known users of VAX cycles. +.ND +.IP +.tl 'Star Trek''USD:31' +.if \n(.U \{\ +.br +.>> <a href="31.trek/paper.html">31.trek/paper.html</a> +.\} +.QP +You are the Captain of the Starship Enterprise. Wipe out the +Klingons and save the Federation. 
+.KE diff --git a/share/doc/usd/title/Makefile b/share/doc/usd/title/Makefile new file mode 100644 index 0000000..b773fc3 --- /dev/null +++ b/share/doc/usd/title/Makefile @@ -0,0 +1,7 @@ +# $FreeBSD$ + +VOLUME= usd +DOC= Title +SRCS= Title + +.include <bsd.doc.mk> diff --git a/share/doc/usd/title/Title b/share/doc/usd/title/Title new file mode 100644 index 0000000..1de7297 --- /dev/null +++ b/share/doc/usd/title/Title @@ -0,0 +1,121 @@ +.\" Copyright (c) 1986, 1993 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" @(#)Title 8.2 (Berkeley) 4/19/94 +.\" $FreeBSD$ +.\" +.ps 18 +.vs 22 +.sp 2.75i +.ft B +.ce 2 +UNIX User's Supplementary Documents +(USD) +.ps 14 +.vs 16 +.sp |4i +.ce 2 +4.4 Berkeley Software Distribution +.sp |5.75i +.ft R +.ps 12 +.vs 16 +.ce +June, 1993 +.sp |8.2i +.ce 5 +Computer Systems Research Group +Computer Science Division +Department of Electrical Engineering and Computer Science +University of California +Berkeley, California 94720 +.bp +\& +.sp |1i +.hy 0 +.ps 10 +.vs 12p +Copyright 1979, 1980, 1983, 1986, 1993 +The Regents of the University of California. All rights reserved. +.sp 2 +Other than the specific documents listed below as copyrighted by AT&T, +redistribution and use of this manual in source and binary forms, +with or without modification, are permitted provided that the +following conditions are met: +.sp 0.5 +.in +0.2i +.ta 0.2i +.ti -0.2i +1) Redistributions of this manual must retain the copyright +notices on this page, this list of conditions and the following disclaimer. +.ti -0.2i +2) Software or documentation that incorporates part of this manual must +reproduce the copyright notices on this page, this list of conditions and +the following disclaimer in the documentation and/or other materials +provided with the distribution. 
+.ti -0.2i +3) All advertising materials mentioning features or use of this software +must display the following acknowledgement: +``This product includes software developed by the University of +California, Berkeley and its contributors.'' +.ti -0.2i +4) Neither the name of the University nor the names of its contributors +may be used to endorse or promote products derived from this software +without specific prior written permission. +.in -0.2i +.sp +\fB\s-1THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +SUCH DAMAGE.\s+1\fP +.sp 2 +Documents USD:1, 2, 3, 5, 6, 9, 10, 15, 16, 17, 21, 22, 23, 24, 25, 26, 27, +and 29 are copyright 1979, AT&T Bell Laboratories, Incorporated. +Holders of \x'-1p'UNIX\v'-4p'\s-3TM\s0\v'4p'/32V, +System III, or System V software licenses are +permitted to copy these documents, or any portion of them, +as necessary for licensed use of the software, +provided this copyright notice and statement of permission +are included. +.sp 2 +Documents USD:8, 14, and 28 are part of the +user contributed software. +.sp 2 +The views and conclusions contained in this manual are those of the +authors and should not be interpreted as representing official policies, +either expressed or implied, of the Regents of the University of California. |