diff options
Diffstat (limited to 'contrib/bind9/doc/rfc/rfc1122.txt')
-rw-r--r-- | contrib/bind9/doc/rfc/rfc1122.txt | 6844 |
1 files changed, 0 insertions, 6844 deletions
diff --git a/contrib/bind9/doc/rfc/rfc1122.txt b/contrib/bind9/doc/rfc/rfc1122.txt deleted file mode 100644 index c14f2e5..0000000 --- a/contrib/bind9/doc/rfc/rfc1122.txt +++ /dev/null @@ -1,6844 +0,0 @@ - - - - - - -Network Working Group Internet Engineering Task Force -Request for Comments: 1122 R. Braden, Editor - October 1989 - - - Requirements for Internet Hosts -- Communication Layers - - -Status of This Memo - - This RFC is an official specification for the Internet community. It - incorporates by reference, amends, corrects, and supplements the - primary protocol standards documents relating to hosts. Distribution - of this document is unlimited. - -Summary - - This is one RFC of a pair that defines and discusses the requirements - for Internet host software. This RFC covers the communications - protocol layers: link layer, IP layer, and transport layer; its - companion RFC-1123 covers the application and support protocols. - - - - Table of Contents - - - - - 1. INTRODUCTION ............................................... 5 - 1.1 The Internet Architecture .............................. 6 - 1.1.1 Internet Hosts .................................... 6 - 1.1.2 Architectural Assumptions ......................... 7 - 1.1.3 Internet Protocol Suite ........................... 8 - 1.1.4 Embedded Gateway Code ............................. 10 - 1.2 General Considerations ................................. 12 - 1.2.1 Continuing Internet Evolution ..................... 12 - 1.2.2 Robustness Principle .............................. 12 - 1.2.3 Error Logging ..................................... 13 - 1.2.4 Configuration ..................................... 14 - 1.3 Reading this Document .................................. 15 - 1.3.1 Organization ...................................... 15 - 1.3.2 Requirements ...................................... 16 - 1.3.3 Terminology ....................................... 17 - 1.4 Acknowledgments ........................................ 20 - - 2. LINK LAYER .................................................. 21 - 2.1 INTRODUCTION ........................................... 21 - - - -Internet Engineering Task Force [Page 1] - - - - -RFC1122 INTRODUCTION October 1989 - - - 2.2 PROTOCOL WALK-THROUGH .................................. 21 - 2.3 SPECIFIC ISSUES ........................................ 21 - 2.3.1 Trailer Protocol Negotiation ...................... 21 - 2.3.2 Address Resolution Protocol -- ARP ................ 22 - 2.3.2.1 ARP Cache Validation ......................... 22 - 2.3.2.2 ARP Packet Queue ............................. 24 - 2.3.3 Ethernet and IEEE 802 Encapsulation ............... 24 - 2.4 LINK/INTERNET LAYER INTERFACE .......................... 25 - 2.5 LINK LAYER REQUIREMENTS SUMMARY ........................ 26 - - 3. INTERNET LAYER PROTOCOLS .................................... 27 - 3.1 INTRODUCTION ............................................ 27 - 3.2 PROTOCOL WALK-THROUGH .................................. 29 - 3.2.1 Internet Protocol -- IP ............................ 29 - 3.2.1.1 Version Number ............................... 29 - 3.2.1.2 Checksum ..................................... 29 - 3.2.1.3 Addressing ................................... 29 - 3.2.1.4 Fragmentation and Reassembly ................. 32 - 3.2.1.5 Identification ............................... 32 - 3.2.1.6 Type-of-Service .............................. 33 - 3.2.1.7 Time-to-Live ................................. 34 - 3.2.1.8 Options ...................................... 35 - 3.2.2 Internet Control Message Protocol -- ICMP .......... 38 - 3.2.2.1 Destination Unreachable ...................... 39 - 3.2.2.2 Redirect ..................................... 40 - 3.2.2.3 Source Quench ................................ 41 - 3.2.2.4 Time Exceeded ................................ 41 - 3.2.2.5 Parameter Problem ............................ 42 - 3.2.2.6 Echo Request/Reply ........................... 42 - 3.2.2.7 Information Request/Reply .................... 43 - 3.2.2.8 Timestamp and Timestamp Reply ................ 43 - 3.2.2.9 Address Mask Request/Reply ................... 45 - 3.2.3 Internet Group Management Protocol IGMP ........... 47 - 3.3 SPECIFIC ISSUES ........................................ 47 - 3.3.1 Routing Outbound Datagrams ........................ 47 - 3.3.1.1 Local/Remote Decision ........................ 47 - 3.3.1.2 Gateway Selection ............................ 48 - 3.3.1.3 Route Cache .................................. 49 - 3.3.1.4 Dead Gateway Detection ....................... 51 - 3.3.1.5 New Gateway Selection ........................ 55 - 3.3.1.6 Initialization ............................... 56 - 3.3.2 Reassembly ........................................ 56 - 3.3.3 Fragmentation ..................................... 58 - 3.3.4 Local Multihoming ................................. 60 - 3.3.4.1 Introduction ................................. 60 - 3.3.4.2 Multihoming Requirements ..................... 61 - 3.3.4.3 Choosing a Source Address .................... 64 - 3.3.5 Source Route Forwarding ........................... 65 - - - -Internet Engineering Task Force [Page 2] - - - - -RFC1122 INTRODUCTION October 1989 - - - 3.3.6 Broadcasts ........................................ 66 - 3.3.7 IP Multicasting ................................... 67 - 3.3.8 Error Reporting ................................... 69 - 3.4 INTERNET/TRANSPORT LAYER INTERFACE ..................... 69 - 3.5 INTERNET LAYER REQUIREMENTS SUMMARY .................... 72 - - 4. TRANSPORT PROTOCOLS ......................................... 77 - 4.1 USER DATAGRAM PROTOCOL -- UDP .......................... 77 - 4.1.1 INTRODUCTION ...................................... 77 - 4.1.2 PROTOCOL WALK-THROUGH ............................. 77 - 4.1.3 SPECIFIC ISSUES ................................... 77 - 4.1.3.1 Ports ........................................ 77 - 4.1.3.2 IP Options ................................... 77 - 4.1.3.3 ICMP Messages ................................ 78 - 4.1.3.4 UDP Checksums ................................ 78 - 4.1.3.5 UDP Multihoming .............................. 79 - 4.1.3.6 Invalid Addresses ............................ 79 - 4.1.4 UDP/APPLICATION LAYER INTERFACE ................... 79 - 4.1.5 UDP REQUIREMENTS SUMMARY .......................... 80 - 4.2 TRANSMISSION CONTROL PROTOCOL -- TCP ................... 82 - 4.2.1 INTRODUCTION ...................................... 82 - 4.2.2 PROTOCOL WALK-THROUGH ............................. 82 - 4.2.2.1 Well-Known Ports ............................. 82 - 4.2.2.2 Use of Push .................................. 82 - 4.2.2.3 Window Size .................................. 83 - 4.2.2.4 Urgent Pointer ............................... 84 - 4.2.2.5 TCP Options .................................. 85 - 4.2.2.6 Maximum Segment Size Option .................. 85 - 4.2.2.7 TCP Checksum ................................. 86 - 4.2.2.8 TCP Connection State Diagram ................. 86 - 4.2.2.9 Initial Sequence Number Selection ............ 87 - 4.2.2.10 Simultaneous Open Attempts .................. 87 - 4.2.2.11 Recovery from Old Duplicate SYN ............. 87 - 4.2.2.12 RST Segment ................................. 87 - 4.2.2.13 Closing a Connection ........................ 87 - 4.2.2.14 Data Communication .......................... 89 - 4.2.2.15 Retransmission Timeout ...................... 90 - 4.2.2.16 Managing the Window ......................... 91 - 4.2.2.17 Probing Zero Windows ........................ 92 - 4.2.2.18 Passive OPEN Calls .......................... 92 - 4.2.2.19 Time to Live ................................ 93 - 4.2.2.20 Event Processing ............................ 93 - 4.2.2.21 Acknowledging Queued Segments ............... 94 - 4.2.3 SPECIFIC ISSUES ................................... 95 - 4.2.3.1 Retransmission Timeout Calculation ........... 95 - 4.2.3.2 When to Send an ACK Segment .................. 96 - 4.2.3.3 When to Send a Window Update ................. 97 - 4.2.3.4 When to Send Data ............................ 98 - - - -Internet Engineering Task Force [Page 3] - - - - -RFC1122 INTRODUCTION October 1989 - - - 4.2.3.5 TCP Connection Failures ...................... 100 - 4.2.3.6 TCP Keep-Alives .............................. 101 - 4.2.3.7 TCP Multihoming .............................. 103 - 4.2.3.8 IP Options ................................... 103 - 4.2.3.9 ICMP Messages ................................ 103 - 4.2.3.10 Remote Address Validation ................... 104 - 4.2.3.11 TCP Traffic Patterns ........................ 104 - 4.2.3.12 Efficiency .................................. 105 - 4.2.4 TCP/APPLICATION LAYER INTERFACE ................... 106 - 4.2.4.1 Asynchronous Reports ......................... 106 - 4.2.4.2 Type-of-Service .............................. 107 - 4.2.4.3 Flush Call ................................... 107 - 4.2.4.4 Multihoming .................................. 108 - 4.2.5 TCP REQUIREMENT SUMMARY ........................... 108 - - 5. REFERENCES ................................................. 112 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Internet Engineering Task Force [Page 4] - - - - -RFC1122 INTRODUCTION October 1989 - - -1. INTRODUCTION - - This document is one of a pair that defines and discusses the - requirements for host system implementations of the Internet protocol - suite. This RFC covers the communication protocol layers: link - layer, IP layer, and transport layer. Its companion RFC, - "Requirements for Internet Hosts -- Application and Support" - [INTRO:1], covers the application layer protocols. This document - should also be read in conjunction with "Requirements for Internet - Gateways" [INTRO:2]. - - These documents are intended to provide guidance for vendors, - implementors, and users of Internet communication software. They - represent the consensus of a large body of technical experience and - wisdom, contributed by the members of the Internet research and - vendor communities. - - This RFC enumerates standard protocols that a host connected to the - Internet must use, and it incorporates by reference the RFCs and - other documents describing the current specifications for these - protocols. It corrects errors in the referenced documents and adds - additional discussion and guidance for an implementor. - - For each protocol, this document also contains an explicit set of - requirements, recommendations, and options. The reader must - understand that the list of requirements in this document is - incomplete by itself; the complete set of requirements for an - Internet host is primarily defined in the standard protocol - specification documents, with the corrections, amendments, and - supplements contained in this RFC. - - A good-faith implementation of the protocols that was produced after - careful reading of the RFC's and with some interaction with the - Internet technical community, and that followed good communications - software engineering practices, should differ from the requirements - of this document in only minor ways. Thus, in many cases, the - "requirements" in this RFC are already stated or implied in the - standard protocol documents, so that their inclusion here is, in a - sense, redundant. However, they were included because some past - implementation has made the wrong choice, causing problems of - interoperability, performance, and/or robustness. - - This document includes discussion and explanation of many of the - requirements and recommendations. A simple list of requirements - would be dangerous, because: - - o Some required features are more important than others, and some - features are optional. - - - -Internet Engineering Task Force [Page 5] - - - - -RFC1122 INTRODUCTION October 1989 - - - o There may be valid reasons why particular vendor products that - are designed for restricted contexts might choose to use - different specifications. - - However, the specifications of this document must be followed to meet - the general goal of arbitrary host interoperation across the - diversity and complexity of the Internet system. Although most - current implementations fail to meet these requirements in various - ways, some minor and some major, this specification is the ideal - towards which we need to move. - - These requirements are based on the current level of Internet - architecture. This document will be updated as required to provide - additional clarifications or to include additional information in - those areas in which specifications are still evolving. - - This introductory section begins with a brief overview of the - Internet architecture as it relates to hosts, and then gives some - general advice to host software vendors. Finally, there is some - guidance on reading the rest of the document and some terminology. - - 1.1 The Internet Architecture - - General background and discussion on the Internet architecture and - supporting protocol suite can be found in the DDN Protocol - Handbook [INTRO:3]; for background see for example [INTRO:9], - [INTRO:10], and [INTRO:11]. Reference [INTRO:5] describes the - procedure for obtaining Internet protocol documents, while - [INTRO:6] contains a list of the numbers assigned within Internet - protocols. - - 1.1.1 Internet Hosts - - A host computer, or simply "host," is the ultimate consumer of - communication services. A host generally executes application - programs on behalf of user(s), employing network and/or - Internet communication services in support of this function. - An Internet host corresponds to the concept of an "End-System" - used in the OSI protocol suite [INTRO:13]. - - An Internet communication system consists of interconnected - packet networks supporting communication among host computers - using the Internet protocols. The networks are interconnected - using packet-switching computers called "gateways" or "IP - routers" by the Internet community, and "Intermediate Systems" - by the OSI world [INTRO:13]. The RFC "Requirements for - Internet Gateways" [INTRO:2] contains the official - specifications for Internet gateways. That RFC together with - - - -Internet Engineering Task Force [Page 6] - - - - -RFC1122 INTRODUCTION October 1989 - - - the present document and its companion [INTRO:1] define the - rules for the current realization of the Internet architecture. - - Internet hosts span a wide range of size, speed, and function. - They range in size from small microprocessors through - workstations to mainframes and supercomputers. In function, - they range from single-purpose hosts (such as terminal servers) - to full-service hosts that support a variety of online network - services, typically including remote login, file transfer, and - electronic mail. - - A host is generally said to be multihomed if it has more than - one interface to the same or to different networks. See - Section 1.1.3 on "Terminology". - - 1.1.2 Architectural Assumptions - - The current Internet architecture is based on a set of - assumptions about the communication system. The assumptions - most relevant to hosts are as follows: - - (a) The Internet is a network of networks. - - Each host is directly connected to some particular - network(s); its connection to the Internet is only - conceptual. Two hosts on the same network communicate - with each other using the same set of protocols that they - would use to communicate with hosts on distant networks. - - (b) Gateways don't keep connection state information. - - To improve robustness of the communication system, - gateways are designed to be stateless, forwarding each IP - datagram independently of other datagrams. As a result, - redundant paths can be exploited to provide robust service - in spite of failures of intervening gateways and networks. - - All state information required for end-to-end flow control - and reliability is implemented in the hosts, in the - transport layer or in application programs. All - connection control information is thus co-located with the - end points of the communication, so it will be lost only - if an end point fails. - - (c) Routing complexity should be in the gateways. - - Routing is a complex and difficult problem, and ought to - be performed by the gateways, not the hosts. An important - - - -Internet Engineering Task Force [Page 7] - - - - -RFC1122 INTRODUCTION October 1989 - - - objective is to insulate host software from changes caused - by the inevitable evolution of the Internet routing - architecture. - - (d) The System must tolerate wide network variation. - - A basic objective of the Internet design is to tolerate a - wide range of network characteristics -- e.g., bandwidth, - delay, packet loss, packet reordering, and maximum packet - size. Another objective is robustness against failure of - individual networks, gateways, and hosts, using whatever - bandwidth is still available. Finally, the goal is full - "open system interconnection": an Internet host must be - able to interoperate robustly and effectively with any - other Internet host, across diverse Internet paths. - - Sometimes host implementors have designed for less - ambitious goals. For example, the LAN environment is - typically much more benign than the Internet as a whole; - LANs have low packet loss and delay and do not reorder - packets. Some vendors have fielded host implementations - that are adequate for a simple LAN environment, but work - badly for general interoperation. The vendor justifies - such a product as being economical within the restricted - LAN market. However, isolated LANs seldom stay isolated - for long; they are soon gatewayed to each other, to - organization-wide internets, and eventually to the global - Internet system. In the end, neither the customer nor the - vendor is served by incomplete or substandard Internet - host software. - - The requirements spelled out in this document are designed - for a full-function Internet host, capable of full - interoperation over an arbitrary Internet path. - - - 1.1.3 Internet Protocol Suite - - To communicate using the Internet system, a host must implement - the layered set of protocols comprising the Internet protocol - suite. A host typically must implement at least one protocol - from each layer. - - The protocol layers used in the Internet architecture are as - follows [INTRO:4]: - - - o Application Layer - - - -Internet Engineering Task Force [Page 8] - - - - -RFC1122 INTRODUCTION October 1989 - - - The application layer is the top layer of the Internet - protocol suite. The Internet suite does not further - subdivide the application layer, although some of the - Internet application layer protocols do contain some - internal sub-layering. The application layer of the - Internet suite essentially combines the functions of the - top two layers -- Presentation and Application -- of the - OSI reference model. - - We distinguish two categories of application layer - protocols: user protocols that provide service directly - to users, and support protocols that provide common system - functions. Requirements for user and support protocols - will be found in the companion RFC [INTRO:1]. - - The most common Internet user protocols are: - - o Telnet (remote login) - o FTP (file transfer) - o SMTP (electronic mail delivery) - - There are a number of other standardized user protocols - [INTRO:4] and many private user protocols. - - Support protocols, used for host name mapping, booting, - and management, include SNMP, BOOTP, RARP, and the Domain - Name System (DNS) protocols. - - - o Transport Layer - - The transport layer provides end-to-end communication - services for applications. There are two primary - transport layer protocols at present: - - o Transmission Control Protocol (TCP) - o User Datagram Protocol (UDP) - - TCP is a reliable connection-oriented transport service - that provides end-to-end reliability, resequencing, and - flow control. UDP is a connectionless ("datagram") - transport service. - - Other transport protocols have been developed by the - research community, and the set of official Internet - transport protocols may be expanded in the future. - - Transport layer protocols are discussed in Chapter 4. - - - -Internet Engineering Task Force [Page 9] - - - - -RFC1122 INTRODUCTION October 1989 - - - o Internet Layer - - All Internet transport protocols use the Internet Protocol - (IP) to carry data from source host to destination host. - IP is a connectionless or datagram internetwork service, - providing no end-to-end delivery guarantees. Thus, IP - datagrams may arrive at the destination host damaged, - duplicated, out of order, or not at all. The layers above - IP are responsible for reliable delivery service when it - is required. The IP protocol includes provision for - addressing, type-of-service specification, fragmentation - and reassembly, and security information. - - The datagram or connectionless nature of the IP protocol - is a fundamental and characteristic feature of the - Internet architecture. Internet IP was the model for the - OSI Connectionless Network Protocol [INTRO:12]. - - ICMP is a control protocol that is considered to be an - integral part of IP, although it is architecturally - layered upon IP, i.e., it uses IP to carry its data end- - to-end just as a transport protocol like TCP or UDP does. - ICMP provides error reporting, congestion reporting, and - first-hop gateway redirection. - - IGMP is an Internet layer protocol used for establishing - dynamic host groups for IP multicasting. - - The Internet layer protocols IP, ICMP, and IGMP are - discussed in Chapter 3. - - - o Link Layer - - To communicate on its directly-connected network, a host - must implement the communication protocol used to - interface to that network. We call this a link layer or - media-access layer protocol. - - There is a wide variety of link layer protocols, - corresponding to the many different types of networks. - See Chapter 2. - - - 1.1.4 Embedded Gateway Code - - Some Internet host software includes embedded gateway - functionality, so that these hosts can forward packets as a - - - -Internet Engineering Task Force [Page 10] - - - - -RFC1122 INTRODUCTION October 1989 - - - gateway would, while still performing the application layer - functions of a host. - - Such dual-purpose systems must follow the Gateway Requirements - RFC [INTRO:2] with respect to their gateway functions, and - must follow the present document with respect to their host - functions. In all overlapping cases, the two specifications - should be in agreement. - - There are varying opinions in the Internet community about - embedded gateway functionality. The main arguments are as - follows: - - o Pro: in a local network environment where networking is - informal, or in isolated internets, it may be convenient - and economical to use existing host systems as gateways. - - There is also an architectural argument for embedded - gateway functionality: multihoming is much more common - than originally foreseen, and multihoming forces a host to - make routing decisions as if it were a gateway. If the - multihomed host contains an embedded gateway, it will - have full routing knowledge and as a result will be able - to make more optimal routing decisions. - - o Con: Gateway algorithms and protocols are still changing, - and they will continue to change as the Internet system - grows larger. Attempting to include a general gateway - function within the host IP layer will force host system - maintainers to track these (more frequent) changes. Also, - a larger pool of gateway implementations will make - coordinating the changes more difficult. Finally, the - complexity of a gateway IP layer is somewhat greater than - that of a host, making the implementation and operation - tasks more complex. - - In addition, the style of operation of some hosts is not - appropriate for providing stable and robust gateway - service. - - There is considerable merit in both of these viewpoints. One - conclusion can be drawn: an host administrator must have - conscious control over whether or not a given host acts as a - gateway. See Section 3.1 for the detailed requirements. - - - - - - - -Internet Engineering Task Force [Page 11] - - - - -RFC1122 INTRODUCTION October 1989 - - - 1.2 General Considerations - - There are two important lessons that vendors of Internet host - software have learned and which a new vendor should consider - seriously. - - 1.2.1 Continuing Internet Evolution - - The enormous growth of the Internet has revealed problems of - management and scaling in a large datagram-based packet - communication system. These problems are being addressed, and - as a result there will be continuing evolution of the - specifications described in this document. These changes will - be carefully planned and controlled, since there is extensive - participation in this planning by the vendors and by the - organizations responsible for operations of the networks. - - Development, evolution, and revision are characteristic of - computer network protocols today, and this situation will - persist for some years. A vendor who develops computer - communication software for the Internet protocol suite (or any - other protocol suite!) and then fails to maintain and update - that software for changing specifications is going to leave a - trail of unhappy customers. The Internet is a large - communication network, and the users are in constant contact - through it. Experience has shown that knowledge of - deficiencies in vendor software propagates quickly through the - Internet technical community. - - 1.2.2 Robustness Principle - - At every layer of the protocols, there is a general rule whose - application can lead to enormous benefits in robustness and - interoperability [IP:1]: - - "Be liberal in what you accept, and - conservative in what you send" - - Software should be written to deal with every conceivable - error, no matter how unlikely; sooner or later a packet will - come in with that particular combination of errors and - attributes, and unless the software is prepared, chaos can - ensue. In general, it is best to assume that the network is - filled with malevolent entities that will send in packets - designed to have the worst possible effect. This assumption - will lead to suitable protective design, although the most - serious problems in the Internet have been caused by - unenvisaged mechanisms triggered by low-probability events; - - - -Internet Engineering Task Force [Page 12] - - - - -RFC1122 INTRODUCTION October 1989 - - - mere human malice would never have taken so devious a course! - - Adaptability to change must be designed into all levels of - Internet host software. As a simple example, consider a - protocol specification that contains an enumeration of values - for a particular header field -- e.g., a type field, a port - number, or an error code; this enumeration must be assumed to - be incomplete. Thus, if a protocol specification defines four - possible error codes, the software must not break when a fifth - code shows up. An undefined code might be logged (see below), - but it must not cause a failure. - - The second part of the principle is almost as important: - software on other hosts may contain deficiencies that make it - unwise to exploit legal but obscure protocol features. It is - unwise to stray far from the obvious and simple, lest untoward - effects result elsewhere. A corollary of this is "watch out - for misbehaving hosts"; host software should be prepared, not - just to survive other misbehaving hosts, but also to cooperate - to limit the amount of disruption such hosts can cause to the - shared communication facility. - - 1.2.3 Error Logging - - The Internet includes a great variety of host and gateway - systems, each implementing many protocols and protocol layers, - and some of these contain bugs and mis-features in their - Internet protocol software. As a result of complexity, - diversity, and distribution of function, the diagnosis of - Internet problems is often very difficult. - - Problem diagnosis will be aided if host implementations include - a carefully designed facility for logging erroneous or - "strange" protocol events. It is important to include as much - diagnostic information as possible when an error is logged. In - particular, it is often useful to record the header(s) of a - packet that caused an error. However, care must be taken to - ensure that error logging does not consume prohibitive amounts - of resources or otherwise interfere with the operation of the - host. - - There is a tendency for abnormal but harmless protocol events - to overflow error logging files; this can be avoided by using a - "circular" log, or by enabling logging only while diagnosing a - known failure. It may be useful to filter and count duplicate - successive messages. One strategy that seems to work well is: - (1) always count abnormalities and make such counts accessible - through the management protocol (see [INTRO:1]); and (2) allow - - - -Internet Engineering Task Force [Page 13] - - - - -RFC1122 INTRODUCTION October 1989 - - - the logging of a great variety of events to be selectively - enabled. For example, it might useful to be able to "log - everything" or to "log everything for host X". - - Note that different managements may have differing policies - about the amount of error logging that they want normally - enabled in a host. Some will say, "if it doesn't hurt me, I - don't want to know about it", while others will want to take a - more watchful and aggressive attitude about detecting and - removing protocol abnormalities. - - 1.2.4 Configuration - - It would be ideal if a host implementation of the Internet - protocol suite could be entirely self-configuring. This would - allow the whole suite to be implemented in ROM or cast into - silicon, it would simplify diskless workstations, and it would - be an immense boon to harried LAN administrators as well as - system vendors. We have not reached this ideal; in fact, we - are not even close. - - At many points in this document, you will find a requirement - that a parameter be a configurable option. There are several - different reasons behind such requirements. In a few cases, - there is current uncertainty or disagreement about the best - value, and it may be necessary to update the recommended value - in the future. In other cases, the value really depends on - external factors -- e.g., the size of the host and the - distribution of its communication load, or the speeds and - topology of nearby networks -- and self-tuning algorithms are - unavailable and may be insufficient. In some cases, - configurability is needed because of administrative - requirements. - - Finally, some configuration options are required to communicate - with obsolete or incorrect implementations of the protocols, - distributed without sources, that unfortunately persist in many - parts of the Internet. To make correct systems coexist with - these faulty systems, administrators often have to "mis- - configure" the correct systems. This problem will correct - itself gradually as the faulty systems are retired, but it - cannot be ignored by vendors. - - When we say that a parameter must be configurable, we do not - intend to require that its value be explicitly read from a - configuration file at every boot time. We recommend that - implementors set up a default for each parameter, so a - configuration file is only necessary to override those defaults - - - -Internet Engineering Task Force [Page 14] - - - - -RFC1122 INTRODUCTION October 1989 - - - that are inappropriate in a particular installation. Thus, the - configurability requirement is an assurance that it will be - POSSIBLE to override the default when necessary, even in a - binary-only or ROM-based product. - - This document requires a particular value for such defaults in - some cases. The choice of default is a sensitive issue when - the configuration item controls the accommodation to existing - faulty systems. If the Internet is to converge successfully to - complete interoperability, the default values built into - implementations must implement the official protocol, not - "mis-configurations" to accommodate faulty implementations. - Although marketing considerations have led some vendors to - choose mis-configuration defaults, we urge vendors to choose - defaults that will conform to the standard. - - Finally, we note that a vendor needs to provide adequate - documentation on all configuration parameters, their limits and - effects. - - - 1.3 Reading this Document - - 1.3.1 Organization - - Protocol layering, which is generally used as an organizing - principle in implementing network software, has also been used - to organize this document. In describing the rules, we assume - that an implementation does strictly mirror the layering of the - protocols. Thus, the following three major sections specify - the requirements for the link layer, the internet layer, and - the transport layer, respectively. A companion RFC [INTRO:1] - covers application level software. This layerist organization - was chosen for simplicity and clarity. - - However, strict layering is an imperfect model, both for the - protocol suite and for recommended implementation approaches. - Protocols in different layers interact in complex and sometimes - subtle ways, and particular functions often involve multiple - layers. There are many design choices in an implementation, - many of which involve creative "breaking" of strict layering. - Every implementor is urged to read references [INTRO:7] and - [INTRO:8]. - - This document describes the conceptual service interface - between layers using a functional ("procedure call") notation, - like that used in the TCP specification [TCP:1]. A host - implementation must support the logical information flow - - - -Internet Engineering Task Force [Page 15] - - - - -RFC1122 INTRODUCTION October 1989 - - - implied by these calls, but need not literally implement the - calls themselves. For example, many implementations reflect - the coupling between the transport layer and the IP layer by - giving them shared access to common data structures. These - data structures, rather than explicit procedure calls, are then - the agency for passing much of the information that is - required. - - In general, each major section of this document is organized - into the following subsections: - - (1) Introduction - - (2) Protocol Walk-Through -- considers the protocol - specification documents section-by-section, correcting - errors, stating requirements that may be ambiguous or - ill-defined, and providing further clarification or - explanation. - - (3) Specific Issues -- discusses protocol design and - implementation issues that were not included in the walk- - through. - - (4) Interfaces -- discusses the service interface to the next - higher layer. - - (5) Summary -- contains a summary of the requirements of the - section. - - - Under many of the individual topics in this document, there is - parenthetical material labeled "DISCUSSION" or - "IMPLEMENTATION". This material is intended to give - clarification and explanation of the preceding requirements - text. It also includes some suggestions on possible future - directions or developments. The implementation material - contains suggested approaches that an implementor may want to - consider. - - The summary sections are intended to be guides and indexes to - the text, but are necessarily cryptic and incomplete. The - summaries should never be used or referenced separately from - the complete RFC. - - 1.3.2 Requirements - - In this document, the words that are used to define the - significance of each particular requirement are capitalized. - - - -Internet Engineering Task Force [Page 16] - - - - -RFC1122 INTRODUCTION October 1989 - - - These words are: - - * "MUST" - - This word or the adjective "REQUIRED" means that the item - is an absolute requirement of the specification. - - * "SHOULD" - - This word or the adjective "RECOMMENDED" means that there - may exist valid reasons in particular circumstances to - ignore this item, but the full implications should be - understood and the case carefully weighed before choosing - a different course. - - * "MAY" - - This word or the adjective "OPTIONAL" means that this item - is truly optional. One vendor may choose to include the - item because a particular marketplace requires it or - because it enhances the product, for example; another - vendor may omit the same item. - - - An implementation is not compliant if it fails to satisfy one - or more of the MUST requirements for the protocols it - implements. An implementation that satisfies all the MUST and - all the SHOULD requirements for its protocols is said to be - "unconditionally compliant"; one that satisfies all the MUST - requirements but not all the SHOULD requirements for its - protocols is said to be "conditionally compliant". - - 1.3.3 Terminology - - This document uses the following technical terms: - - Segment - A segment is the unit of end-to-end transmission in the - TCP protocol. A segment consists of a TCP header followed - by application data. A segment is transmitted by - encapsulation inside an IP datagram. - - Message - In this description of the lower-layer protocols, a - message is the unit of transmission in a transport layer - protocol. In particular, a TCP segment is a message. A - message consists of a transport protocol header followed - by application protocol data. To be transmitted end-to- - - - -Internet Engineering Task Force [Page 17] - - - - -RFC1122 INTRODUCTION October 1989 - - - end through the Internet, a message must be encapsulated - inside a datagram. - - IP Datagram - An IP datagram is the unit of end-to-end transmission in - the IP protocol. An IP datagram consists of an IP header - followed by transport layer data, i.e., of an IP header - followed by a message. - - In the description of the internet layer (Section 3), the - unqualified term "datagram" should be understood to refer - to an IP datagram. - - Packet - A packet is the unit of data passed across the interface - between the internet layer and the link layer. It - includes an IP header and data. A packet may be a - complete IP datagram or a fragment of an IP datagram. - - Frame - A frame is the unit of transmission in a link layer - protocol, and consists of a link-layer header followed by - a packet. - - Connected Network - A network to which a host is interfaced is often known as - the "local network" or the "subnetwork" relative to that - host. However, these terms can cause confusion, and - therefore we use the term "connected network" in this - document. - - Multihomed - A host is said to be multihomed if it has multiple IP - addresses. For a discussion of multihoming, see Section - 3.3.4 below. - - Physical network interface - This is a physical interface to a connected network and - has a (possibly unique) link-layer address. Multiple - physical network interfaces on a single host may share the - same link-layer address, but the address must be unique - for different hosts on the same physical network. - - Logical [network] interface - We define a logical [network] interface to be a logical - path, distinguished by a unique IP address, to a connected - network. See Section 3.3.4. - - - - -Internet Engineering Task Force [Page 18] - - - - -RFC1122 INTRODUCTION October 1989 - - - Specific-destination address - This is the effective destination address of a datagram, - even if it is broadcast or multicast; see Section 3.2.1.3. - - Path - At a given moment, all the IP datagrams from a particular - source host to a particular destination host will - typically traverse the same sequence of gateways. We use - the term "path" for this sequence. Note that a path is - uni-directional; it is not unusual to have different paths - in the two directions between a given host pair. - - MTU - The maximum transmission unit, i.e., the size of the - largest packet that can be transmitted. - - - The terms frame, packet, datagram, message, and segment are - illustrated by the following schematic diagrams: - - A. Transmission on connected network: - _______________________________________________ - | LL hdr | IP hdr | (data) | - |________|________|_____________________________| - - <---------- Frame -----------------------------> - <----------Packet --------------------> - - - B. Before IP fragmentation or after IP reassembly: - ______________________________________ - | IP hdr | transport| Application Data | - |________|____hdr___|__________________| - - <-------- Datagram ------------------> - <-------- Message -----------> - or, for TCP: - ______________________________________ - | IP hdr | TCP hdr | Application Data | - |________|__________|__________________| - - <-------- Datagram ------------------> - <-------- Segment -----------> - - - - - - - - -Internet Engineering Task Force [Page 19] - - - - -RFC1122 INTRODUCTION October 1989 - - - 1.4 Acknowledgments - - This document incorporates contributions and comments from a large - group of Internet protocol experts, including representatives of - university and research labs, vendors, and government agencies. - It was assembled primarily by the Host Requirements Working Group - of the Internet Engineering Task Force (IETF). - - The Editor would especially like to acknowledge the tireless - dedication of the following people, who attended many long - meetings and generated 3 million bytes of electronic mail over the - past 18 months in pursuit of this document: Philip Almquist, Dave - Borman (Cray Research), Noel Chiappa, Dave Crocker (DEC), Steve - Deering (Stanford), Mike Karels (Berkeley), Phil Karn (Bellcore), - John Lekashman (NASA), Charles Lynn (BBN), Keith McCloghrie (TWG), - Paul Mockapetris (ISI), Thomas Narten (Purdue), Craig Partridge - (BBN), Drew Perkins (CMU), and James Van Bokkelen (FTP Software). - - In addition, the following people made major contributions to the - effort: Bill Barns (Mitre), Steve Bellovin (AT&T), Mike Brescia - (BBN), Ed Cain (DCA), Annette DeSchon (ISI), Martin Gross (DCA), - Phill Gross (NRI), Charles Hedrick (Rutgers), Van Jacobson (LBL), - John Klensin (MIT), Mark Lottor (SRI), Milo Medin (NASA), Bill - Melohn (Sun Microsystems), Greg Minshall (Kinetics), Jeff Mogul - (DEC), John Mullen (CMC), Jon Postel (ISI), John Romkey (Epilogue - Technology), and Mike StJohns (DCA). The following also made - significant contributions to particular areas: Eric Allman - (Berkeley), Rob Austein (MIT), Art Berggreen (ACC), Keith Bostic - (Berkeley), Vint Cerf (NRI), Wayne Hathaway (NASA), Matt Korn - (IBM), Erik Naggum (Naggum Software, Norway), Robert Ullmann - (Prime Computer), David Waitzman (BBN), Frank Wancho (USA), Arun - Welch (Ohio State), Bill Westfield (Cisco), and Rayan Zachariassen - (Toronto). - - We are grateful to all, including any contributors who may have - been inadvertently omitted from this list. - - - - - - - - - - - - - - - -Internet Engineering Task Force [Page 20] - - - - -RFC1122 LINK LAYER October 1989 - - -2. LINK LAYER - - 2.1 INTRODUCTION - - All Internet systems, both hosts and gateways, have the same - requirements for link layer protocols. These requirements are - given in Chapter 3 of "Requirements for Internet Gateways" - [INTRO:2], augmented with the material in this section. - - 2.2 PROTOCOL WALK-THROUGH - - None. - - 2.3 SPECIFIC ISSUES - - 2.3.1 Trailer Protocol Negotiation - - The trailer protocol [LINK:1] for link-layer encapsulation MAY - be used, but only when it has been verified that both systems - (host or gateway) involved in the link-layer communication - implement trailers. If the system does not dynamically - negotiate use of the trailer protocol on a per-destination - basis, the default configuration MUST disable the protocol. - - DISCUSSION: - The trailer protocol is a link-layer encapsulation - technique that rearranges the data contents of packets - sent on the physical network. In some cases, trailers - improve the throughput of higher layer protocols by - reducing the amount of data copying within the operating - system. Higher layer protocols are unaware of trailer - use, but both the sending and receiving host MUST - understand the protocol if it is used. - - Improper use of trailers can result in very confusing - symptoms. Only packets with specific size attributes are - encapsulated using trailers, and typically only a small - fraction of the packets being exchanged have these - attributes. Thus, if a system using trailers exchanges - packets with a system that does not, some packets - disappear into a black hole while others are delivered - successfully. - - IMPLEMENTATION: - On an Ethernet, packets encapsulated with trailers use a - distinct Ethernet type [LINK:1], and trailer negotiation - is performed at the time that ARP is used to discover the - link-layer address of a destination system. - - - -Internet Engineering Task Force [Page 21] - - - - -RFC1122 LINK LAYER October 1989 - - - Specifically, the ARP exchange is completed in the usual - manner using the normal IP protocol type, but a host that - wants to speak trailers will send an additional "trailer - ARP reply" packet, i.e., an ARP reply that specifies the - trailer encapsulation protocol type but otherwise has the - format of a normal ARP reply. If a host configured to use - trailers receives a trailer ARP reply message from a - remote machine, it can add that machine to the list of - machines that understand trailers, e.g., by marking the - corresponding entry in the ARP cache. - - Hosts wishing to receive trailer encapsulations send - trailer ARP replies whenever they complete exchanges of - normal ARP messages for IP. Thus, a host that received an - ARP request for its IP protocol address would send a - trailer ARP reply in addition to the normal IP ARP reply; - a host that sent the IP ARP request would send a trailer - ARP reply when it received the corresponding IP ARP reply. - In this way, either the requesting or responding host in - an IP ARP exchange may request that it receive trailer - encapsulations. - - This scheme, using extra trailer ARP reply packets rather - than sending an ARP request for the trailer protocol type, - was designed to avoid a continuous exchange of ARP packets - with a misbehaving host that, contrary to any - specification or common sense, responded to an ARP reply - for trailers with another ARP reply for IP. This problem - is avoided by sending a trailer ARP reply in response to - an IP ARP reply only when the IP ARP reply answers an - outstanding request; this is true when the hardware - address for the host is still unknown when the IP ARP - reply is received. A trailer ARP reply may always be sent - along with an IP ARP reply responding to an IP ARP - request. - - 2.3.2 Address Resolution Protocol -- ARP - - 2.3.2.1 ARP Cache Validation - - An implementation of the Address Resolution Protocol (ARP) - [LINK:2] MUST provide a mechanism to flush out-of-date cache - entries. If this mechanism involves a timeout, it SHOULD be - possible to configure the timeout value. - - A mechanism to prevent ARP flooding (repeatedly sending an - ARP Request for the same IP address, at a high rate) MUST be - included. The recommended maximum rate is 1 per second per - - - -Internet Engineering Task Force [Page 22] - - - - -RFC1122 LINK LAYER October 1989 - - - destination. - - DISCUSSION: - The ARP specification [LINK:2] suggests but does not - require a timeout mechanism to invalidate cache entries - when hosts change their Ethernet addresses. The - prevalence of proxy ARP (see Section 2.4 of [INTRO:2]) - has significantly increased the likelihood that cache - entries in hosts will become invalid, and therefore - some ARP-cache invalidation mechanism is now required - for hosts. Even in the absence of proxy ARP, a long- - period cache timeout is useful in order to - automatically correct any bad ARP data that might have - been cached. - - IMPLEMENTATION: - Four mechanisms have been used, sometimes in - combination, to flush out-of-date cache entries. - - (1) Timeout -- Periodically time out cache entries, - even if they are in use. Note that this timeout - should be restarted when the cache entry is - "refreshed" (by observing the source fields, - regardless of target address, of an ARP broadcast - from the system in question). For proxy ARP - situations, the timeout needs to be on the order - of a minute. - - (2) Unicast Poll -- Actively poll the remote host by - periodically sending a point-to-point ARP Request - to it, and delete the entry if no ARP Reply is - received from N successive polls. Again, the - timeout should be on the order of a minute, and - typically N is 2. - - (3) Link-Layer Advice -- If the link-layer driver - detects a delivery problem, flush the - corresponding ARP cache entry. - - (4) Higher-layer Advice -- Provide a call from the - Internet layer to the link layer to indicate a - delivery problem. The effect of this call would - be to invalidate the corresponding cache entry. - This call would be analogous to the - "ADVISE_DELIVPROB()" call from the transport layer - to the Internet layer (see Section 3.4), and in - fact the ADVISE_DELIVPROB routine might in turn - call the link-layer advice routine to invalidate - - - -Internet Engineering Task Force [Page 23] - - - - -RFC1122 LINK LAYER October 1989 - - - the ARP cache entry. - - Approaches (1) and (2) involve ARP cache timeouts on - the order of a minute or less. In the absence of proxy - ARP, a timeout this short could create noticeable - overhead traffic on a very large Ethernet. Therefore, - it may be necessary to configure a host to lengthen the - ARP cache timeout. - - 2.3.2.2 ARP Packet Queue - - The link layer SHOULD save (rather than discard) at least - one (the latest) packet of each set of packets destined to - the same unresolved IP address, and transmit the saved - packet when the address has been resolved. - - DISCUSSION: - Failure to follow this recommendation causes the first - packet of every exchange to be lost. Although higher- - layer protocols can generally cope with packet loss by - retransmission, packet loss does impact performance. - For example, loss of a TCP open request causes the - initial round-trip time estimate to be inflated. UDP- - based applications such as the Domain Name System are - more seriously affected. - - 2.3.3 Ethernet and IEEE 802 Encapsulation - - The IP encapsulation for Ethernets is described in RFC-894 - [LINK:3], while RFC-1042 [LINK:4] describes the IP - encapsulation for IEEE 802 networks. RFC-1042 elaborates and - replaces the discussion in Section 3.4 of [INTRO:2]. - - Every Internet host connected to a 10Mbps Ethernet cable: - - o MUST be able to send and receive packets using RFC-894 - encapsulation; - - o SHOULD be able to receive RFC-1042 packets, intermixed - with RFC-894 packets; and - - o MAY be able to send packets using RFC-1042 encapsulation. - - - An Internet host that implements sending both the RFC-894 and - the RFC-1042 encapsulations MUST provide a configuration switch - to select which is sent, and this switch MUST default to RFC- - 894. - - - -Internet Engineering Task Force [Page 24] - - - - -RFC1122 LINK LAYER October 1989 - - - Note that the standard IP encapsulation in RFC-1042 does not - use the protocol id value (K1=6) that IEEE reserved for IP; - instead, it uses a value (K1=170) that implies an extension - (the "SNAP") which can be used to hold the Ether-Type field. - An Internet system MUST NOT send 802 packets using K1=6. - - Address translation from Internet addresses to link-layer - addresses on Ethernet and IEEE 802 networks MUST be managed by - the Address Resolution Protocol (ARP). - - The MTU for an Ethernet is 1500 and for 802.3 is 1492. - - DISCUSSION: - The IEEE 802.3 specification provides for operation over a - 10Mbps Ethernet cable, in which case Ethernet and IEEE - 802.3 frames can be physically intermixed. A receiver can - distinguish Ethernet and 802.3 frames by the value of the - 802.3 Length field; this two-octet field coincides in the - header with the Ether-Type field of an Ethernet frame. In - particular, the 802.3 Length field must be less than or - equal to 1500, while all valid Ether-Type values are - greater than 1500. - - Another compatibility problem arises with link-layer - broadcasts. A broadcast sent with one framing will not be - seen by hosts that can receive only the other framing. - - The provisions of this section were designed to provide - direct interoperation between 894-capable and 1042-capable - systems on the same cable, to the maximum extent possible. - It is intended to support the present situation where - 894-only systems predominate, while providing an easy - transition to a possible future in which 1042-capable - systems become common. - - Note that 894-only systems cannot interoperate directly - with 1042-only systems. If the two system types are set - up as two different logical networks on the same cable, - they can communicate only through an IP gateway. - Furthermore, it is not useful or even possible for a - dual-format host to discover automatically which format to - send, because of the problem of link-layer broadcasts. - - 2.4 LINK/INTERNET LAYER INTERFACE - - The packet receive interface between the IP layer and the link - layer MUST include a flag to indicate whether the incoming packet - was addressed to a link-layer broadcast address. - - - -Internet Engineering Task Force [Page 25] - - - - -RFC1122 LINK LAYER October 1989 - - - DISCUSSION - Although the IP layer does not generally know link layer - addresses (since every different network medium typically has - a different address format), the broadcast address on a - broadcast-capable medium is an important special case. See - Section 3.2.2, especially the DISCUSSION concerning broadcast - storms. - - The packet send interface between the IP and link layers MUST - include the 5-bit TOS field (see Section 3.2.1.6). - - The link layer MUST NOT report a Destination Unreachable error to - IP solely because there is no ARP cache entry for a destination. - - 2.5 LINK LAYER REQUIREMENTS SUMMARY - - | | | | |S| | - | | | | |H| |F - | | | | |O|M|o - | | |S| |U|U|o - | | |H| |L|S|t - | |M|O| |D|T|n - | |U|U|M| | |o - | |S|L|A|N|N|t - | |T|D|Y|O|O|t -FEATURE |SECTION| | | |T|T|e ---------------------------------------------------|-------|-|-|-|-|-|-- - | | | | | | | -Trailer encapsulation |2.3.1 | | |x| | | -Send Trailers by default without negotiation |2.3.1 | | | | |x| -ARP |2.3.2 | | | | | | - Flush out-of-date ARP cache entries |2.3.2.1|x| | | | | - Prevent ARP floods |2.3.2.1|x| | | | | - Cache timeout configurable |2.3.2.1| |x| | | | - Save at least one (latest) unresolved pkt |2.3.2.2| |x| | | | -Ethernet and IEEE 802 Encapsulation |2.3.3 | | | | | | - Host able to: |2.3.3 | | | | | | - Send & receive RFC-894 encapsulation |2.3.3 |x| | | | | - Receive RFC-1042 encapsulation |2.3.3 | |x| | | | - Send RFC-1042 encapsulation |2.3.3 | | |x| | | - Then config. sw. to select, RFC-894 dflt |2.3.3 |x| | | | | - Send K1=6 encapsulation |2.3.3 | | | | |x| - Use ARP on Ethernet and IEEE 802 nets |2.3.3 |x| | | | | -Link layer report b'casts to IP layer |2.4 |x| | | | | -IP layer pass TOS to link layer |2.4 |x| | | | | -No ARP cache entry treated as Dest. Unreach. |2.4 | | | | |x| - - - - - -Internet Engineering Task Force [Page 26] - - - - -RFC1122 INTERNET LAYER October 1989 - - -3. INTERNET LAYER PROTOCOLS - - 3.1 INTRODUCTION - - The Robustness Principle: "Be liberal in what you accept, and - conservative in what you send" is particularly important in the - Internet layer, where one misbehaving host can deny Internet - service to many other hosts. - - The protocol standards used in the Internet layer are: - - o RFC-791 [IP:1] defines the IP protocol and gives an - introduction to the architecture of the Internet. - - o RFC-792 [IP:2] defines ICMP, which provides routing, - diagnostic and error functionality for IP. Although ICMP - messages are encapsulated within IP datagrams, ICMP - processing is considered to be (and is typically implemented - as) part of the IP layer. See Section 3.2.2. - - o RFC-950 [IP:3] defines the mandatory subnet extension to the - addressing architecture. - - o RFC-1112 [IP:4] defines the Internet Group Management - Protocol IGMP, as part of a recommended extension to hosts - and to the host-gateway interface to support Internet-wide - multicasting at the IP level. See Section 3.2.3. - - The target of an IP multicast may be an arbitrary group of - Internet hosts. IP multicasting is designed as a natural - extension of the link-layer multicasting facilities of some - networks, and it provides a standard means for local access - to such link-layer multicasting facilities. - - Other important references are listed in Section 5 of this - document. - - The Internet layer of host software MUST implement both IP and - ICMP. See Section 3.3.7 for the requirements on support of IGMP. - - The host IP layer has two basic functions: (1) choose the "next - hop" gateway or host for outgoing IP datagrams and (2) reassemble - incoming IP datagrams. The IP layer may also (3) implement - intentional fragmentation of outgoing datagrams. Finally, the IP - layer must (4) provide diagnostic and error functionality. We - expect that IP layer functions may increase somewhat in the - future, as further Internet control and management facilities are - developed. - - - -Internet Engineering Task Force [Page 27] - - - - -RFC1122 INTERNET LAYER October 1989 - - - For normal datagrams, the processing is straightforward. For - incoming datagrams, the IP layer: - - (1) verifies that the datagram is correctly formatted; - - (2) verifies that it is destined to the local host; - - (3) processes options; - - (4) reassembles the datagram if necessary; and - - (5) passes the encapsulated message to the appropriate - transport-layer protocol module. - - For outgoing datagrams, the IP layer: - - (1) sets any fields not set by the transport layer; - - (2) selects the correct first hop on the connected network (a - process called "routing"); - - (3) fragments the datagram if necessary and if intentional - fragmentation is implemented (see Section 3.3.3); and - - (4) passes the packet(s) to the appropriate link-layer driver. - - - A host is said to be multihomed if it has multiple IP addresses. - Multihoming introduces considerable confusion and complexity into - the protocol suite, and it is an area in which the Internet - architecture falls seriously short of solving all problems. There - are two distinct problem areas in multihoming: - - (1) Local multihoming -- the host itself is multihomed; or - - (2) Remote multihoming -- the local host needs to communicate - with a remote multihomed host. - - At present, remote multihoming MUST be handled at the application - layer, as discussed in the companion RFC [INTRO:1]. A host MAY - support local multihoming, which is discussed in this document, - and in particular in Section 3.3.4. - - Any host that forwards datagrams generated by another host is - acting as a gateway and MUST also meet the specifications laid out - in the gateway requirements RFC [INTRO:2]. An Internet host that - includes embedded gateway code MUST have a configuration switch to - disable the gateway function, and this switch MUST default to the - - - -Internet Engineering Task Force [Page 28] - - - - -RFC1122 INTERNET LAYER October 1989 - - - non-gateway mode. In this mode, a datagram arriving through one - interface will not be forwarded to another host or gateway (unless - it is source-routed), regardless of whether the host is single- - homed or multihomed. The host software MUST NOT automatically - move into gateway mode if the host has more than one interface, as - the operator of the machine may neither want to provide that - service nor be competent to do so. - - In the following, the action specified in certain cases is to - "silently discard" a received datagram. This means that the - datagram will be discarded without further processing and that the - host will not send any ICMP error message (see Section 3.2.2) as a - result. However, for diagnosis of problems a host SHOULD provide - the capability of logging the error (see Section 1.2.3), including - the contents of the silently-discarded datagram, and SHOULD record - the event in a statistics counter. - - DISCUSSION: - Silent discard of erroneous datagrams is generally intended - to prevent "broadcast storms". - - 3.2 PROTOCOL WALK-THROUGH - - 3.2.1 Internet Protocol -- IP - - 3.2.1.1 Version Number: RFC-791 Section 3.1 - - A datagram whose version number is not 4 MUST be silently - discarded. - - 3.2.1.2 Checksum: RFC-791 Section 3.1 - - A host MUST verify the IP header checksum on every received - datagram and silently discard every datagram that has a bad - checksum. - - 3.2.1.3 Addressing: RFC-791 Section 3.2 - - There are now five classes of IP addresses: Class A through - Class E. Class D addresses are used for IP multicasting - [IP:4], while Class E addresses are reserved for - experimental use. - - A multicast (Class D) address is a 28-bit logical address - that stands for a group of hosts, and may be either - permanent or transient. Permanent multicast addresses are - allocated by the Internet Assigned Number Authority - [INTRO:6], while transient addresses may be allocated - - - -Internet Engineering Task Force [Page 29] - - - - -RFC1122 INTERNET LAYER October 1989 - - - dynamically to transient groups. Group membership is - determined dynamically using IGMP [IP:4]. - - We now summarize the important special cases for Class A, B, - and C IP addresses, using the following notation for an IP - address: - - { <Network-number>, <Host-number> } - - or - { <Network-number>, <Subnet-number>, <Host-number> } - - and the notation "-1" for a field that contains all 1 bits. - This notation is not intended to imply that the 1-bits in an - address mask need be contiguous. - - (a) { 0, 0 } - - This host on this network. MUST NOT be sent, except as - a source address as part of an initialization procedure - by which the host learns its own IP address. - - See also Section 3.3.6 for a non-standard use of {0,0}. - - (b) { 0, <Host-number> } - - Specified host on this network. It MUST NOT be sent, - except as a source address as part of an initialization - procedure by which the host learns its full IP address. - - (c) { -1, -1 } - - Limited broadcast. It MUST NOT be used as a source - address. - - A datagram with this destination address will be - received by every host on the connected physical - network but will not be forwarded outside that network. - - (d) { <Network-number>, -1 } - - Directed broadcast to the specified network. It MUST - NOT be used as a source address. - - (e) { <Network-number>, <Subnet-number>, -1 } - - Directed broadcast to the specified subnet. It MUST - NOT be used as a source address. - - - -Internet Engineering Task Force [Page 30] - - - - -RFC1122 INTERNET LAYER October 1989 - - - (f) { <Network-number>, -1, -1 } - - Directed broadcast to all subnets of the specified - subnetted network. It MUST NOT be used as a source - address. - - (g) { 127, <any> } - - Internal host loopback address. Addresses of this form - MUST NOT appear outside a host. - - The <Network-number> is administratively assigned so that - its value will be unique in the entire world. - - IP addresses are not permitted to have the value 0 or -1 for - any of the <Host-number>, <Network-number>, or <Subnet- - number> fields (except in the special cases listed above). - This implies that each of these fields will be at least two - bits long. - - For further discussion of broadcast addresses, see Section - 3.3.6. - - A host MUST support the subnet extensions to IP [IP:3]. As - a result, there will be an address mask of the form: - {-1, -1, 0} associated with each of the host's local IP - addresses; see Sections 3.2.2.9 and 3.3.1.1. - - When a host sends any datagram, the IP source address MUST - be one of its own IP addresses (but not a broadcast or - multicast address). - - A host MUST silently discard an incoming datagram that is - not destined for the host. An incoming datagram is destined - for the host if the datagram's destination address field is: - - (1) (one of) the host's IP address(es); or - - (2) an IP broadcast address valid for the connected - network; or - - (3) the address for a multicast group of which the host is - a member on the incoming physical interface. - - For most purposes, a datagram addressed to a broadcast or - multicast destination is processed as if it had been - addressed to one of the host's IP addresses; we use the term - "specific-destination address" for the equivalent local IP - - - -Internet Engineering Task Force [Page 31] - - - - -RFC1122 INTERNET LAYER October 1989 - - - address of the host. The specific-destination address is - defined to be the destination address in the IP header - unless the header contains a broadcast or multicast address, - in which case the specific-destination is an IP address - assigned to the physical interface on which the datagram - arrived. - - A host MUST silently discard an incoming datagram containing - an IP source address that is invalid by the rules of this - section. This validation could be done in either the IP - layer or by each protocol in the transport layer. - - DISCUSSION: - A mis-addressed datagram might be caused by a link- - layer broadcast of a unicast datagram or by a gateway - or host that is confused or mis-configured. - - An architectural goal for Internet hosts was to allow - IP addresses to be featureless 32-bit numbers, avoiding - algorithms that required a knowledge of the IP address - format. Otherwise, any future change in the format or - interpretation of IP addresses will require host - software changes. However, validation of broadcast and - multicast addresses violates this goal; a few other - violations are described elsewhere in this document. - - Implementers should be aware that applications - depending upon the all-subnets directed broadcast - address (f) may be unusable on some networks. All- - subnets broadcast is not widely implemented in vendor - gateways at present, and even when it is implemented, a - particular network administration may disable it in the - gateway configuration. - - 3.2.1.4 Fragmentation and Reassembly: RFC-791 Section 3.2 - - The Internet model requires that every host support - reassembly. See Sections 3.3.2 and 3.3.3 for the - requirements on fragmentation and reassembly. - - 3.2.1.5 Identification: RFC-791 Section 3.2 - - When sending an identical copy of an earlier datagram, a - host MAY optionally retain the same Identification field in - the copy. - - - - - - -Internet Engineering Task Force [Page 32] - - - - -RFC1122 INTERNET LAYER October 1989 - - - DISCUSSION: - Some Internet protocol experts have maintained that - when a host sends an identical copy of an earlier - datagram, the new copy should contain the same - Identification value as the original. There are two - suggested advantages: (1) if the datagrams are - fragmented and some of the fragments are lost, the - receiver may be able to reconstruct a complete datagram - from fragments of the original and the copies; (2) a - congested gateway might use the IP Identification field - (and Fragment Offset) to discard duplicate datagrams - from the queue. - - However, the observed patterns of datagram loss in the - Internet do not favor the probability of retransmitted - fragments filling reassembly gaps, while other - mechanisms (e.g., TCP repacketizing upon - retransmission) tend to prevent retransmission of an - identical datagram [IP:9]. Therefore, we believe that - retransmitting the same Identification field is not - useful. Also, a connectionless transport protocol like - UDP would require the cooperation of the application - programs to retain the same Identification value in - identical datagrams. - - 3.2.1.6 Type-of-Service: RFC-791 Section 3.2 - - The "Type-of-Service" byte in the IP header is divided into - two sections: the Precedence field (high-order 3 bits), and - a field that is customarily called "Type-of-Service" or - "TOS" (low-order 5 bits). In this document, all references - to "TOS" or the "TOS field" refer to the low-order 5 bits - only. - - The Precedence field is intended for Department of Defense - applications of the Internet protocols. The use of non-zero - values in this field is outside the scope of this document - and the IP standard specification. Vendors should consult - the Defense Communication Agency (DCA) for guidance on the - IP Precedence field and its implications for other protocol - layers. However, vendors should note that the use of - precedence will most likely require that its value be passed - between protocol layers in just the same way as the TOS - field is passed. - - The IP layer MUST provide a means for the transport layer to - set the TOS field of every datagram that is sent; the - default is all zero bits. The IP layer SHOULD pass received - - - -Internet Engineering Task Force [Page 33] - - - - -RFC1122 INTERNET LAYER October 1989 - - - TOS values up to the transport layer. - - The particular link-layer mappings of TOS contained in RFC- - 795 SHOULD NOT be implemented. - - DISCUSSION: - While the TOS field has been little used in the past, - it is expected to play an increasing role in the near - future. The TOS field is expected to be used to - control two aspects of gateway operations: routing and - queueing algorithms. See Section 2 of [INTRO:1] for - the requirements on application programs to specify TOS - values. - - The TOS field may also be mapped into link-layer - service selectors. This has been applied to provide - effective sharing of serial lines by different classes - of TCP traffic, for example. However, the mappings - suggested in RFC-795 for networks that were included in - the Internet as of 1981 are now obsolete. - - 3.2.1.7 Time-to-Live: RFC-791 Section 3.2 - - A host MUST NOT send a datagram with a Time-to-Live (TTL) - value of zero. - - A host MUST NOT discard a datagram just because it was - received with TTL less than 2. - - The IP layer MUST provide a means for the transport layer to - set the TTL field of every datagram that is sent. When a - fixed TTL value is used, it MUST be configurable. The - current suggested value will be published in the "Assigned - Numbers" RFC. - - DISCUSSION: - The TTL field has two functions: limit the lifetime of - TCP segments (see RFC-793 [TCP:1], p. 28), and - terminate Internet routing loops. Although TTL is a - time in seconds, it also has some attributes of a hop- - count, since each gateway is required to reduce the TTL - field by at least one. - - The intent is that TTL expiration will cause a datagram - to be discarded by a gateway but not by the destination - host; however, hosts that act as gateways by forwarding - datagrams must follow the gateway rules for TTL. - - - - -Internet Engineering Task Force [Page 34] - - - - -RFC1122 INTERNET LAYER October 1989 - - - A higher-layer protocol may want to set the TTL in - order to implement an "expanding scope" search for some - Internet resource. This is used by some diagnostic - tools, and is expected to be useful for locating the - "nearest" server of a given class using IP - multicasting, for example. A particular transport - protocol may also want to specify its own TTL bound on - maximum datagram lifetime. - - A fixed value must be at least big enough for the - Internet "diameter," i.e., the longest possible path. - A reasonable value is about twice the diameter, to - allow for continued Internet growth. - - 3.2.1.8 Options: RFC-791 Section 3.2 - - There MUST be a means for the transport layer to specify IP - options to be included in transmitted IP datagrams (see - Section 3.4). - - All IP options (except NOP or END-OF-LIST) received in - datagrams MUST be passed to the transport layer (or to ICMP - processing when the datagram is an ICMP message). The IP - and transport layer MUST each interpret those IP options - that they understand and silently ignore the others. - - Later sections of this document discuss specific IP option - support required by each of ICMP, TCP, and UDP. - - DISCUSSION: - Passing all received IP options to the transport layer - is a deliberate "violation of strict layering" that is - designed to ease the introduction of new transport- - relevant IP options in the future. Each layer must - pick out any options that are relevant to its own - processing and ignore the rest. For this purpose, - every IP option except NOP and END-OF-LIST will include - a specification of its own length. - - This document does not define the order in which a - receiver must process multiple options in the same IP - header. Hosts sending multiple options must be aware - that this introduces an ambiguity in the meaning of - certain options when combined with a source-route - option. - - IMPLEMENTATION: - The IP layer must not crash as the result of an option - - - -Internet Engineering Task Force [Page 35] - - - - -RFC1122 INTERNET LAYER October 1989 - - - length that is outside the possible range. For - example, erroneous option lengths have been observed to - put some IP implementations into infinite loops. - - Here are the requirements for specific IP options: - - - (a) Security Option - - Some environments require the Security option in every - datagram; such a requirement is outside the scope of - this document and the IP standard specification. Note, - however, that the security options described in RFC-791 - and RFC-1038 are obsolete. For DoD applications, - vendors should consult [IP:8] for guidance. - - - (b) Stream Identifier Option - - This option is obsolete; it SHOULD NOT be sent, and it - MUST be silently ignored if received. - - - (c) Source Route Options - - A host MUST support originating a source route and MUST - be able to act as the final destination of a source - route. - - If host receives a datagram containing a completed - source route (i.e., the pointer points beyond the last - field), the datagram has reached its final destination; - the option as received (the recorded route) MUST be - passed up to the transport layer (or to ICMP message - processing). This recorded route will be reversed and - used to form a return source route for reply datagrams - (see discussion of IP Options in Section 4). When a - return source route is built, it MUST be correctly - formed even if the recorded route included the source - host (see case (B) in the discussion below). - - An IP header containing more than one Source Route - option MUST NOT be sent; the effect on routing of - multiple Source Route options is implementation- - specific. - - Section 3.3.5 presents the rules for a host acting as - an intermediate hop in a source route, i.e., forwarding - - - -Internet Engineering Task Force [Page 36] - - - - -RFC1122 INTERNET LAYER October 1989 - - - a source-routed datagram. - - DISCUSSION: - If a source-routed datagram is fragmented, each - fragment will contain a copy of the source route. - Since the processing of IP options (including a - source route) must precede reassembly, the - original datagram will not be reassembled until - the final destination is reached. - - Suppose a source routed datagram is to be routed - from host S to host D via gateways G1, G2, ... Gn. - There was an ambiguity in the specification over - whether the source route option in a datagram sent - out by S should be (A) or (B): - - (A): {>>G2, G3, ... Gn, D} <--- CORRECT - - (B): {S, >>G2, G3, ... Gn, D} <---- WRONG - - (where >> represents the pointer). If (A) is - sent, the datagram received at D will contain the - option: {G1, G2, ... Gn >>}, with S and D as the - IP source and destination addresses. If (B) were - sent, the datagram received at D would again - contain S and D as the same IP source and - destination addresses, but the option would be: - {S, G1, ...Gn >>}; i.e., the originating host - would be the first hop in the route. - - - (d) Record Route Option - - Implementation of originating and processing the Record - Route option is OPTIONAL. - - - (e) Timestamp Option - - Implementation of originating and processing the - Timestamp option is OPTIONAL. If it is implemented, - the following rules apply: - - o The originating host MUST record a timestamp in a - Timestamp option whose Internet address fields are - not pre-specified or whose first pre-specified - address is the host's interface address. - - - - -Internet Engineering Task Force [Page 37] - - - - -RFC1122 INTERNET LAYER October 1989 - - - o The destination host MUST (if possible) add the - current timestamp to a Timestamp option before - passing the option to the transport layer or to - ICMP for processing. - - o A timestamp value MUST follow the rules given in - Section 3.2.2.8 for the ICMP Timestamp message. - - - 3.2.2 Internet Control Message Protocol -- ICMP - - ICMP messages are grouped into two classes. - - * - ICMP error messages: - - Destination Unreachable (see Section 3.2.2.1) - Redirect (see Section 3.2.2.2) - Source Quench (see Section 3.2.2.3) - Time Exceeded (see Section 3.2.2.4) - Parameter Problem (see Section 3.2.2.5) - - - * - ICMP query messages: - - Echo (see Section 3.2.2.6) - Information (see Section 3.2.2.7) - Timestamp (see Section 3.2.2.8) - Address Mask (see Section 3.2.2.9) - - - If an ICMP message of unknown type is received, it MUST be - silently discarded. - - Every ICMP error message includes the Internet header and at - least the first 8 data octets of the datagram that triggered - the error; more than 8 octets MAY be sent; this header and data - MUST be unchanged from the received datagram. - - In those cases where the Internet layer is required to pass an - ICMP error message to the transport layer, the IP protocol - number MUST be extracted from the original header and used to - select the appropriate transport protocol entity to handle the - error. - - An ICMP error message SHOULD be sent with normal (i.e., zero) - TOS bits. - - - -Internet Engineering Task Force [Page 38] - - - - -RFC1122 INTERNET LAYER October 1989 - - - An ICMP error message MUST NOT be sent as the result of - receiving: - - * an ICMP error message, or - - * a datagram destined to an IP broadcast or IP multicast - address, or - - * a datagram sent as a link-layer broadcast, or - - * a non-initial fragment, or - - * a datagram whose source address does not define a single - host -- e.g., a zero address, a loopback address, a - broadcast address, a multicast address, or a Class E - address. - - NOTE: THESE RESTRICTIONS TAKE PRECEDENCE OVER ANY REQUIREMENT - ELSEWHERE IN THIS DOCUMENT FOR SENDING ICMP ERROR MESSAGES. - - DISCUSSION: - These rules will prevent the "broadcast storms" that have - resulted from hosts returning ICMP error messages in - response to broadcast datagrams. For example, a broadcast - UDP segment to a non-existent port could trigger a flood - of ICMP Destination Unreachable datagrams from all - machines that do not have a client for that destination - port. On a large Ethernet, the resulting collisions can - render the network useless for a second or more. - - Every datagram that is broadcast on the connected network - should have a valid IP broadcast address as its IP - destination (see Section 3.3.6). However, some hosts - violate this rule. To be certain to detect broadcast - datagrams, therefore, hosts are required to check for a - link-layer broadcast as well as an IP-layer broadcast - address. - - IMPLEMENTATION: - This requires that the link layer inform the IP layer when - a link-layer broadcast datagram has been received; see - Section 2.4. - - 3.2.2.1 Destination Unreachable: RFC-792 - - The following additional codes are hereby defined: - - 6 = destination network unknown - - - -Internet Engineering Task Force [Page 39] - - - - -RFC1122 INTERNET LAYER October 1989 - - - 7 = destination host unknown - - 8 = source host isolated - - 9 = communication with destination network - administratively prohibited - - 10 = communication with destination host - administratively prohibited - - 11 = network unreachable for type of service - - 12 = host unreachable for type of service - - A host SHOULD generate Destination Unreachable messages with - code: - - 2 (Protocol Unreachable), when the designated transport - protocol is not supported; or - - 3 (Port Unreachable), when the designated transport - protocol (e.g., UDP) is unable to demultiplex the - datagram but has no protocol mechanism to inform the - sender. - - A Destination Unreachable message that is received MUST be - reported to the transport layer. The transport layer SHOULD - use the information appropriately; for example, see Sections - 4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol - that has its own mechanism for notifying the sender that a - port is unreachable (e.g., TCP, which sends RST segments) - MUST nevertheless accept an ICMP Port Unreachable for the - same purpose. - - A Destination Unreachable message that is received with code - 0 (Net), 1 (Host), or 5 (Bad Source Route) may result from a - routing transient and MUST therefore be interpreted as only - a hint, not proof, that the specified destination is - unreachable [IP:11]. For example, it MUST NOT be used as - proof of a dead gateway (see Section 3.3.1). - - 3.2.2.2 Redirect: RFC-792 - - A host SHOULD NOT send an ICMP Redirect message; Redirects - are to be sent only by gateways. - - A host receiving a Redirect message MUST update its routing - information accordingly. Every host MUST be prepared to - - - -Internet Engineering Task Force [Page 40] - - - - -RFC1122 INTERNET LAYER October 1989 - - - accept both Host and Network Redirects and to process them - as described in Section 3.3.1.2 below. - - A Redirect message SHOULD be silently discarded if the new - gateway address it specifies is not on the same connected - (sub-) net through which the Redirect arrived [INTRO:2, - Appendix A], or if the source of the Redirect is not the - current first-hop gateway for the specified destination (see - Section 3.3.1). - - 3.2.2.3 Source Quench: RFC-792 - - A host MAY send a Source Quench message if it is - approaching, or has reached, the point at which it is forced - to discard incoming datagrams due to a shortage of - reassembly buffers or other resources. See Section 2.2.3 of - [INTRO:2] for suggestions on when to send Source Quench. - - If a Source Quench message is received, the IP layer MUST - report it to the transport layer (or ICMP processing). In - general, the transport or application layer SHOULD implement - a mechanism to respond to Source Quench for any protocol - that can send a sequence of datagrams to the same - destination and which can reasonably be expected to maintain - enough state information to make this feasible. See Section - 4 for the handling of Source Quench by TCP and UDP. - - DISCUSSION: - A Source Quench may be generated by the target host or - by some gateway in the path of a datagram. The host - receiving a Source Quench should throttle itself back - for a period of time, then gradually increase the - transmission rate again. The mechanism to respond to - Source Quench may be in the transport layer (for - connection-oriented protocols like TCP) or in the - application layer (for protocols that are built on top - of UDP). - - A mechanism has been proposed [IP:14] to make the IP - layer respond directly to Source Quench by controlling - the rate at which datagrams are sent, however, this - proposal is currently experimental and not currently - recommended. - - 3.2.2.4 Time Exceeded: RFC-792 - - An incoming Time Exceeded message MUST be passed to the - transport layer. - - - -Internet Engineering Task Force [Page 41] - - - - -RFC1122 INTERNET LAYER October 1989 - - - DISCUSSION: - A gateway will send a Time Exceeded Code 0 (In Transit) - message when it discards a datagram due to an expired - TTL field. This indicates either a gateway routing - loop or too small an initial TTL value. - - A host may receive a Time Exceeded Code 1 (Reassembly - Timeout) message from a destination host that has timed - out and discarded an incomplete datagram; see Section - 3.3.2 below. In the future, receipt of this message - might be part of some "MTU discovery" procedure, to - discover the maximum datagram size that can be sent on - the path without fragmentation. - - 3.2.2.5 Parameter Problem: RFC-792 - - A host SHOULD generate Parameter Problem messages. An - incoming Parameter Problem message MUST be passed to the - transport layer, and it MAY be reported to the user. - - DISCUSSION: - The ICMP Parameter Problem message is sent to the - source host for any problem not specifically covered by - another ICMP message. Receipt of a Parameter Problem - message generally indicates some local or remote - implementation error. - - A new variant on the Parameter Problem message is hereby - defined: - Code 1 = required option is missing. - - DISCUSSION: - This variant is currently in use in the military - community for a missing security option. - - 3.2.2.6 Echo Request/Reply: RFC-792 - - Every host MUST implement an ICMP Echo server function that - receives Echo Requests and sends corresponding Echo Replies. - A host SHOULD also implement an application-layer interface - for sending an Echo Request and receiving an Echo Reply, for - diagnostic purposes. - - An ICMP Echo Request destined to an IP broadcast or IP - multicast address MAY be silently discarded. - - - - - - -Internet Engineering Task Force [Page 42] - - - - -RFC1122 INTERNET LAYER October 1989 - - - DISCUSSION: - This neutral provision results from a passionate debate - between those who feel that ICMP Echo to a broadcast - address provides a valuable diagnostic capability and - those who feel that misuse of this feature can too - easily create packet storms. - - The IP source address in an ICMP Echo Reply MUST be the same - as the specific-destination address (defined in Section - 3.2.1.3) of the corresponding ICMP Echo Request message. - - Data received in an ICMP Echo Request MUST be entirely - included in the resulting Echo Reply. However, if sending - the Echo Reply requires intentional fragmentation that is - not implemented, the datagram MUST be truncated to maximum - transmission size (see Section 3.3.3) and sent. - - Echo Reply messages MUST be passed to the ICMP user - interface, unless the corresponding Echo Request originated - in the IP layer. - - If a Record Route and/or Time Stamp option is received in an - ICMP Echo Request, this option (these options) SHOULD be - updated to include the current host and included in the IP - header of the Echo Reply message, without "truncation". - Thus, the recorded route will be for the entire round trip. - - If a Source Route option is received in an ICMP Echo - Request, the return route MUST be reversed and used as a - Source Route option for the Echo Reply message. - - 3.2.2.7 Information Request/Reply: RFC-792 - - A host SHOULD NOT implement these messages. - - DISCUSSION: - The Information Request/Reply pair was intended to - support self-configuring systems such as diskless - workstations, to allow them to discover their IP - network numbers at boot time. However, the RARP and - BOOTP protocols provide better mechanisms for a host to - discover its own IP address. - - 3.2.2.8 Timestamp and Timestamp Reply: RFC-792 - - A host MAY implement Timestamp and Timestamp Reply. If they - are implemented, the following rules MUST be followed. - - - - -Internet Engineering Task Force [Page 43] - - - - -RFC1122 INTERNET LAYER October 1989 - - - o The ICMP Timestamp server function returns a Timestamp - Reply to every Timestamp message that is received. If - this function is implemented, it SHOULD be designed for - minimum variability in delay (e.g., implemented in the - kernel to avoid delay in scheduling a user process). - - The following cases for Timestamp are to be handled - according to the corresponding rules for ICMP Echo: - - o An ICMP Timestamp Request message to an IP broadcast or - IP multicast address MAY be silently discarded. - - o The IP source address in an ICMP Timestamp Reply MUST - be the same as the specific-destination address of the - corresponding Timestamp Request message. - - o If a Source-route option is received in an ICMP Echo - Request, the return route MUST be reversed and used as - a Source Route option for the Timestamp Reply message. - - o If a Record Route and/or Timestamp option is received - in a Timestamp Request, this (these) option(s) SHOULD - be updated to include the current host and included in - the IP header of the Timestamp Reply message. - - o Incoming Timestamp Reply messages MUST be passed up to - the ICMP user interface. - - The preferred form for a timestamp value (the "standard - value") is in units of milliseconds since midnight Universal - Time. However, it may be difficult to provide this value - with millisecond resolution. For example, many systems use - clocks that update only at line frequency, 50 or 60 times - per second. Therefore, some latitude is allowed in a - "standard value": - - (a) A "standard value" MUST be updated at least 15 times - per second (i.e., at most the six low-order bits of the - value may be undefined). - - (b) The accuracy of a "standard value" MUST approximate - that of operator-set CPU clocks, i.e., correct within a - few minutes. - - - - - - - - -Internet Engineering Task Force [Page 44] - - - - -RFC1122 INTERNET LAYER October 1989 - - - 3.2.2.9 Address Mask Request/Reply: RFC-950 - - A host MUST support the first, and MAY implement all three, - of the following methods for determining the address mask(s) - corresponding to its IP address(es): - - (1) static configuration information; - - (2) obtaining the address mask(s) dynamically as a side- - effect of the system initialization process (see - [INTRO:1]); and - - (3) sending ICMP Address Mask Request(s) and receiving ICMP - Address Mask Reply(s). - - The choice of method to be used in a particular host MUST be - configurable. - - When method (3), the use of Address Mask messages, is - enabled, then: - - (a) When it initializes, the host MUST broadcast an Address - Mask Request message on the connected network - corresponding to the IP address. It MUST retransmit - this message a small number of times if it does not - receive an immediate Address Mask Reply. - - (b) Until it has received an Address Mask Reply, the host - SHOULD assume a mask appropriate for the address class - of the IP address, i.e., assume that the connected - network is not subnetted. - - (c) The first Address Mask Reply message received MUST be - used to set the address mask corresponding to the - particular local IP address. This is true even if the - first Address Mask Reply message is "unsolicited", in - which case it will have been broadcast and may arrive - after the host has ceased to retransmit Address Mask - Requests. Once the mask has been set by an Address - Mask Reply, later Address Mask Reply messages MUST be - (silently) ignored. - - Conversely, if Address Mask messages are disabled, then no - ICMP Address Mask Requests will be sent, and any ICMP - Address Mask Replies received for that local IP address MUST - be (silently) ignored. - - A host SHOULD make some reasonableness check on any address - - - -Internet Engineering Task Force [Page 45] - - - - -RFC1122 INTERNET LAYER October 1989 - - - mask it installs; see IMPLEMENTATION section below. - - A system MUST NOT send an Address Mask Reply unless it is an - authoritative agent for address masks. An authoritative - agent may be a host or a gateway, but it MUST be explicitly - configured as a address mask agent. Receiving an address - mask via an Address Mask Reply does not give the receiver - authority and MUST NOT be used as the basis for issuing - Address Mask Replies. - - With a statically configured address mask, there SHOULD be - an additional configuration flag that determines whether the - host is to act as an authoritative agent for this mask, - i.e., whether it will answer Address Mask Request messages - using this mask. - - If it is configured as an agent, the host MUST broadcast an - Address Mask Reply for the mask on the appropriate interface - when it initializes. - - See "System Initialization" in [INTRO:1] for more - information about the use of Address Mask Request/Reply - messages. - - DISCUSSION - Hosts that casually send Address Mask Replies with - invalid address masks have often been a serious - nuisance. To prevent this, Address Mask Replies ought - to be sent only by authoritative agents that have been - selected by explicit administrative action. - - When an authoritative agent receives an Address Mask - Request message, it will send a unicast Address Mask - Reply to the source IP address. If the network part of - this address is zero (see (a) and (b) in 3.2.1.3), the - Reply will be broadcast. - - Getting no reply to its Address Mask Request messages, - a host will assume there is no agent and use an - unsubnetted mask, but the agent may be only temporarily - unreachable. An agent will broadcast an unsolicited - Address Mask Reply whenever it initializes, in order to - update the masks of all hosts that have initialized in - the meantime. - - IMPLEMENTATION: - The following reasonableness check on an address mask - is suggested: the mask is not all 1 bits, and it is - - - -Internet Engineering Task Force [Page 46] - - - - -RFC1122 INTERNET LAYER October 1989 - - - either zero or else the 8 highest-order bits are on. - - 3.2.3 Internet Group Management Protocol IGMP - - IGMP [IP:4] is a protocol used between hosts and gateways on a - single network to establish hosts' membership in particular - multicast groups. The gateways use this information, in - conjunction with a multicast routing protocol, to support IP - multicasting across the Internet. - - At this time, implementation of IGMP is OPTIONAL; see Section - 3.3.7 for more information. Without IGMP, a host can still - participate in multicasting local to its connected networks. - - 3.3 SPECIFIC ISSUES - - 3.3.1 Routing Outbound Datagrams - - The IP layer chooses the correct next hop for each datagram it - sends. If the destination is on a connected network, the - datagram is sent directly to the destination host; otherwise, - it has to be routed to a gateway on a connected network. - - 3.3.1.1 Local/Remote Decision - - To decide if the destination is on a connected network, the - following algorithm MUST be used [see IP:3]: - - (a) The address mask (particular to a local IP address for - a multihomed host) is a 32-bit mask that selects the - network number and subnet number fields of the - corresponding IP address. - - (b) If the IP destination address bits extracted by the - address mask match the IP source address bits extracted - by the same mask, then the destination is on the - corresponding connected network, and the datagram is to - be transmitted directly to the destination host. - - (c) If not, then the destination is accessible only through - a gateway. Selection of a gateway is described below - (3.3.1.2). - - A special-case destination address is handled as follows: - - * For a limited broadcast or a multicast address, simply - pass the datagram to the link layer for the appropriate - interface. - - - -Internet Engineering Task Force [Page 47] - - - - -RFC1122 INTERNET LAYER October 1989 - - - * For a (network or subnet) directed broadcast, the - datagram can use the standard routing algorithms. - - The host IP layer MUST operate correctly in a minimal - network environment, and in particular, when there are no - gateways. For example, if the IP layer of a host insists on - finding at least one gateway to initialize, the host will be - unable to operate on a single isolated broadcast net. - - 3.3.1.2 Gateway Selection - - To efficiently route a series of datagrams to the same - destination, the source host MUST keep a "route cache" of - mappings to next-hop gateways. A host uses the following - basic algorithm on this cache to route a datagram; this - algorithm is designed to put the primary routing burden on - the gateways [IP:11]. - - (a) If the route cache contains no information for a - particular destination, the host chooses a "default" - gateway and sends the datagram to it. It also builds a - corresponding Route Cache entry. - - (b) If that gateway is not the best next hop to the - destination, the gateway will forward the datagram to - the best next-hop gateway and return an ICMP Redirect - message to the source host. - - (c) When it receives a Redirect, the host updates the - next-hop gateway in the appropriate route cache entry, - so later datagrams to the same destination will go - directly to the best gateway. - - Since the subnet mask appropriate to the destination address - is generally not known, a Network Redirect message SHOULD be - treated identically to a Host Redirect message; i.e., the - cache entry for the destination host (only) would be updated - (or created, if an entry for that host did not exist) for - the new gateway. - - DISCUSSION: - This recommendation is to protect against gateways that - erroneously send Network Redirects for a subnetted - network, in violation of the gateway requirements - [INTRO:2]. - - When there is no route cache entry for the destination host - address (and the destination is not on the connected - - - -Internet Engineering Task Force [Page 48] - - - - -RFC1122 INTERNET LAYER October 1989 - - - network), the IP layer MUST pick a gateway from its list of - "default" gateways. The IP layer MUST support multiple - default gateways. - - As an extra feature, a host IP layer MAY implement a table - of "static routes". Each such static route MAY include a - flag specifying whether it may be overridden by ICMP - Redirects. - - DISCUSSION: - A host generally needs to know at least one default - gateway to get started. This information can be - obtained from a configuration file or else from the - host startup sequence, e.g., the BOOTP protocol (see - [INTRO:1]). - - It has been suggested that a host can augment its list - of default gateways by recording any new gateways it - learns about. For example, it can record every gateway - to which it is ever redirected. Such a feature, while - possibly useful in some circumstances, may cause - problems in other cases (e.g., gateways are not all - equal), and it is not recommended. - - A static route is typically a particular preset mapping - from destination host or network into a particular - next-hop gateway; it might also depend on the Type-of- - Service (see next section). Static routes would be set - up by system administrators to override the normal - automatic routing mechanism, to handle exceptional - situations. However, any static routing information is - a potential source of failure as configurations change - or equipment fails. - - 3.3.1.3 Route Cache - - Each route cache entry needs to include the following - fields: - - (1) Local IP address (for a multihomed host) - - (2) Destination IP address - - (3) Type(s)-of-Service - - (4) Next-hop gateway IP address - - Field (2) MAY be the full IP address of the destination - - - -Internet Engineering Task Force [Page 49] - - - - -RFC1122 INTERNET LAYER October 1989 - - - host, or only the destination network number. Field (3), - the TOS, SHOULD be included. - - See Section 3.3.4.2 for a discussion of the implications of - multihoming for the lookup procedure in this cache. - - DISCUSSION: - Including the Type-of-Service field in the route cache - and considering it in the host route algorithm will - provide the necessary mechanism for the future when - Type-of-Service routing is commonly used in the - Internet. See Section 3.2.1.6. - - Each route cache entry defines the endpoints of an - Internet path. Although the connecting path may change - dynamically in an arbitrary way, the transmission - characteristics of the path tend to remain - approximately constant over a time period longer than a - single typical host-host transport connection. - Therefore, a route cache entry is a natural place to - cache data on the properties of the path. Examples of - such properties might be the maximum unfragmented - datagram size (see Section 3.3.3), or the average - round-trip delay measured by a transport protocol. - This data will generally be both gathered and used by a - higher layer protocol, e.g., by TCP, or by an - application using UDP. Experiments are currently in - progress on caching path properties in this manner. - - There is no consensus on whether the route cache should - be keyed on destination host addresses alone, or allow - both host and network addresses. Those who favor the - use of only host addresses argue that: - - (1) As required in Section 3.3.1.2, Redirect messages - will generally result in entries keyed on - destination host addresses; the simplest and most - general scheme would be to use host addresses - always. - - (2) The IP layer may not always know the address mask - for a network address in a complex subnetted - environment. - - (3) The use of only host addresses allows the - destination address to be used as a pure 32-bit - number, which may allow the Internet architecture - to be more easily extended in the future without - - - -Internet Engineering Task Force [Page 50] - - - - -RFC1122 INTERNET LAYER October 1989 - - - any change to the hosts. - - The opposing view is that allowing a mixture of - destination hosts and networks in the route cache: - - (1) Saves memory space. - - (2) Leads to a simpler data structure, easily - combining the cache with the tables of default and - static routes (see below). - - (3) Provides a more useful place to cache path - properties, as discussed earlier. - - - IMPLEMENTATION: - The cache needs to be large enough to include entries - for the maximum number of destination hosts that may be - in use at one time. - - A route cache entry may also include control - information used to choose an entry for replacement. - This might take the form of a "recently used" bit, a - use count, or a last-used timestamp, for example. It - is recommended that it include the time of last - modification of the entry, for diagnostic purposes. - - An implementation may wish to reduce the overhead of - scanning the route cache for every datagram to be - transmitted. This may be accomplished with a hash - table to speed the lookup, or by giving a connection- - oriented transport protocol a "hint" or temporary - handle on the appropriate cache entry, to be passed to - the IP layer with each subsequent datagram. - - Although we have described the route cache, the lists - of default gateways, and a table of static routes as - conceptually distinct, in practice they may be combined - into a single "routing table" data structure. - - 3.3.1.4 Dead Gateway Detection - - The IP layer MUST be able to detect the failure of a "next- - hop" gateway that is listed in its route cache and to choose - an alternate gateway (see Section 3.3.1.5). - - Dead gateway detection is covered in some detail in RFC-816 - [IP:11]. Experience to date has not produced a complete - - - -Internet Engineering Task Force [Page 51] - - - - -RFC1122 INTERNET LAYER October 1989 - - - algorithm which is totally satisfactory, though it has - identified several forbidden paths and promising techniques. - - * A particular gateway SHOULD NOT be used indefinitely in - the absence of positive indications that it is - functioning. - - * Active probes such as "pinging" (i.e., using an ICMP - Echo Request/Reply exchange) are expensive and scale - poorly. In particular, hosts MUST NOT actively check - the status of a first-hop gateway by simply pinging the - gateway continuously. - - * Even when it is the only effective way to verify a - gateway's status, pinging MUST be used only when - traffic is being sent to the gateway and when there is - no other positive indication to suggest that the - gateway is functioning. - - * To avoid pinging, the layers above and/or below the - Internet layer SHOULD be able to give "advice" on the - status of route cache entries when either positive - (gateway OK) or negative (gateway dead) information is - available. - - - DISCUSSION: - If an implementation does not include an adequate - mechanism for detecting a dead gateway and re-routing, - a gateway failure may cause datagrams to apparently - vanish into a "black hole". This failure can be - extremely confusing for users and difficult for network - personnel to debug. - - The dead-gateway detection mechanism must not cause - unacceptable load on the host, on connected networks, - or on first-hop gateway(s). The exact constraints on - the timeliness of dead gateway detection and on - acceptable load may vary somewhat depending on the - nature of the host's mission, but a host generally - needs to detect a failed first-hop gateway quickly - enough that transport-layer connections will not break - before an alternate gateway can be selected. - - Passing advice from other layers of the protocol stack - complicates the interfaces between the layers, but it - is the preferred approach to dead gateway detection. - Advice can come from almost any part of the IP/TCP - - - -Internet Engineering Task Force [Page 52] - - - - -RFC1122 INTERNET LAYER October 1989 - - - architecture, but it is expected to come primarily from - the transport and link layers. Here are some possible - sources for gateway advice: - - o TCP or any connection-oriented transport protocol - should be able to give negative advice, e.g., - triggered by excessive retransmissions. - - o TCP may give positive advice when (new) data is - acknowledged. Even though the route may be - asymmetric, an ACK for new data proves that the - acknowleged data must have been transmitted - successfully. - - o An ICMP Redirect message from a particular gateway - should be used as positive advice about that - gateway. - - o Link-layer information that reliably detects and - reports host failures (e.g., ARPANET Destination - Dead messages) should be used as negative advice. - - o Failure to ARP or to re-validate ARP mappings may - be used as negative advice for the corresponding - IP address. - - o Packets arriving from a particular link-layer - address are evidence that the system at this - address is alive. However, turning this - information into advice about gateways requires - mapping the link-layer address into an IP address, - and then checking that IP address against the - gateways pointed to by the route cache. This is - probably prohibitively inefficient. - - Note that positive advice that is given for every - datagram received may cause unacceptable overhead in - the implementation. - - While advice might be passed using required arguments - in all interfaces to the IP layer, some transport and - application layer protocols cannot deduce the correct - advice. These interfaces must therefore allow a - neutral value for advice, since either always-positive - or always-negative advice leads to incorrect behavior. - - There is another technique for dead gateway detection - that has been commonly used but is not recommended. - - - -Internet Engineering Task Force [Page 53] - - - - -RFC1122 INTERNET LAYER October 1989 - - - This technique depends upon the host passively - receiving ("wiretapping") the Interior Gateway Protocol - (IGP) datagrams that the gateways are broadcasting to - each other. This approach has the drawback that a host - needs to recognize all the interior gateway protocols - that gateways may use (see [INTRO:2]). In addition, it - only works on a broadcast network. - - At present, pinging (i.e., using ICMP Echo messages) is - the mechanism for gateway probing when absolutely - required. A successful ping guarantees that the - addressed interface and its associated machine are up, - but it does not guarantee that the machine is a gateway - as opposed to a host. The normal inference is that if - a Redirect or other evidence indicates that a machine - was a gateway, successful pings will indicate that the - machine is still up and hence still a gateway. - However, since a host silently discards packets that a - gateway would forward or redirect, this assumption - could sometimes fail. To avoid this problem, a new - ICMP message under development will ask "are you a - gateway?" - - IMPLEMENTATION: - The following specific algorithm has been suggested: - - o Associate a "reroute timer" with each gateway - pointed to by the route cache. Initialize the - timer to a value Tr, which must be small enough to - allow detection of a dead gateway before transport - connections time out. - - o Positive advice would reset the reroute timer to - Tr. Negative advice would reduce or zero the - reroute timer. - - o Whenever the IP layer used a particular gateway to - route a datagram, it would check the corresponding - reroute timer. If the timer had expired (reached - zero), the IP layer would send a ping to the - gateway, followed immediately by the datagram. - - o The ping (ICMP Echo) would be sent again if - necessary, up to N times. If no ping reply was - received in N tries, the gateway would be assumed - to have failed, and a new first-hop gateway would - be chosen for all cache entries pointing to the - failed gateway. - - - -Internet Engineering Task Force [Page 54] - - - - -RFC1122 INTERNET LAYER October 1989 - - - Note that the size of Tr is inversely related to the - amount of advice available. Tr should be large enough - to insure that: - - * Any pinging will be at a low level (e.g., <10%) of - all packets sent to a gateway from the host, AND - - * pinging is infrequent (e.g., every 3 minutes) - - Since the recommended algorithm is concerned with the - gateways pointed to by route cache entries, rather than - the cache entries themselves, a two level data - structure (perhaps coordinated with ARP or similar - caches) may be desirable for implementing a route - cache. - - 3.3.1.5 New Gateway Selection - - If the failed gateway is not the current default, the IP - layer can immediately switch to a default gateway. If it is - the current default that failed, the IP layer MUST select a - different default gateway (assuming more than one default is - known) for the failed route and for establishing new routes. - - DISCUSSION: - When a gateway does fail, the other gateways on the - connected network will learn of the failure through - some inter-gateway routing protocol. However, this - will not happen instantaneously, since gateway routing - protocols typically have a settling time of 30-60 - seconds. If the host switches to an alternative - gateway before the gateways have agreed on the failure, - the new target gateway will probably forward the - datagram to the failed gateway and send a Redirect back - to the host pointing to the failed gateway (!). The - result is likely to be a rapid oscillation in the - contents of the host's route cache during the gateway - settling period. It has been proposed that the dead- - gateway logic should include some hysteresis mechanism - to prevent such oscillations. However, experience has - not shown any harm from such oscillations, since - service cannot be restored to the host until the - gateways' routing information does settle down. - - IMPLEMENTATION: - One implementation technique for choosing a new default - gateway is to simply round-robin among the default - gateways in the host's list. Another is to rank the - - - -Internet Engineering Task Force [Page 55] - - - - -RFC1122 INTERNET LAYER October 1989 - - - gateways in priority order, and when the current - default gateway is not the highest priority one, to - "ping" the higher-priority gateways slowly to detect - when they return to service. This pinging can be at a - very low rate, e.g., 0.005 per second. - - 3.3.1.6 Initialization - - The following information MUST be configurable: - - (1) IP address(es). - - (2) Address mask(s). - - (3) A list of default gateways, with a preference level. - - A manual method of entering this configuration data MUST be - provided. In addition, a variety of methods can be used to - determine this information dynamically; see the section on - "Host Initialization" in [INTRO:1]. - - DISCUSSION: - Some host implementations use "wiretapping" of gateway - protocols on a broadcast network to learn what gateways - exist. A standard method for default gateway discovery - is under development. - - 3.3.2 Reassembly - - The IP layer MUST implement reassembly of IP datagrams. - - We designate the largest datagram size that can be reassembled - by EMTU_R ("Effective MTU to receive"); this is sometimes - called the "reassembly buffer size". EMTU_R MUST be greater - than or equal to 576, SHOULD be either configurable or - indefinite, and SHOULD be greater than or equal to the MTU of - the connected network(s). - - DISCUSSION: - A fixed EMTU_R limit should not be built into the code - because some application layer protocols require EMTU_R - values larger than 576. - - IMPLEMENTATION: - An implementation may use a contiguous reassembly buffer - for each datagram, or it may use a more complex data - structure that places no definite limit on the reassembled - datagram size; in the latter case, EMTU_R is said to be - - - -Internet Engineering Task Force [Page 56] - - - - -RFC1122 INTERNET LAYER October 1989 - - - "indefinite". - - Logically, reassembly is performed by simply copying each - fragment into the packet buffer at the proper offset. - Note that fragments may overlap if successive - retransmissions use different packetizing but the same - reassembly Id. - - The tricky part of reassembly is the bookkeeping to - determine when all bytes of the datagram have been - reassembled. We recommend Clark's algorithm [IP:10] that - requires no additional data space for the bookkeeping. - However, note that, contrary to [IP:10], the first - fragment header needs to be saved for inclusion in a - possible ICMP Time Exceeded (Reassembly Timeout) message. - - There MUST be a mechanism by which the transport layer can - learn MMS_R, the maximum message size that can be received and - reassembled in an IP datagram (see GET_MAXSIZES calls in - Section 3.4). If EMTU_R is not indefinite, then the value of - MMS_R is given by: - - MMS_R = EMTU_R - 20 - - since 20 is the minimum size of an IP header. - - There MUST be a reassembly timeout. The reassembly timeout - value SHOULD be a fixed value, not set from the remaining TTL. - It is recommended that the value lie between 60 seconds and 120 - seconds. If this timeout expires, the partially-reassembled - datagram MUST be discarded and an ICMP Time Exceeded message - sent to the source host (if fragment zero has been received). - - DISCUSSION: - The IP specification says that the reassembly timeout - should be the remaining TTL from the IP header, but this - does not work well because gateways generally treat TTL as - a simple hop count rather than an elapsed time. If the - reassembly timeout is too small, datagrams will be - discarded unnecessarily, and communication may fail. The - timeout needs to be at least as large as the typical - maximum delay across the Internet. A realistic minimum - reassembly timeout would be 60 seconds. - - It has been suggested that a cache might be kept of - round-trip times measured by transport protocols for - various destinations, and that these values might be used - to dynamically determine a reasonable reassembly timeout - - - -Internet Engineering Task Force [Page 57] - - - - -RFC1122 INTERNET LAYER October 1989 - - - value. Further investigation of this approach is - required. - - If the reassembly timeout is set too high, buffer - resources in the receiving host will be tied up too long, - and the MSL (Maximum Segment Lifetime) [TCP:1] will be - larger than necessary. The MSL controls the maximum rate - at which fragmented datagrams can be sent using distinct - values of the 16-bit Ident field; a larger MSL lowers the - maximum rate. The TCP specification [TCP:1] arbitrarily - assumes a value of 2 minutes for MSL. This sets an upper - limit on a reasonable reassembly timeout value. - - 3.3.3 Fragmentation - - Optionally, the IP layer MAY implement a mechanism to fragment - outgoing datagrams intentionally. - - We designate by EMTU_S ("Effective MTU for sending") the - maximum IP datagram size that may be sent, for a particular - combination of IP source and destination addresses and perhaps - TOS. - - A host MUST implement a mechanism to allow the transport layer - to learn MMS_S, the maximum transport-layer message size that - may be sent for a given {source, destination, TOS} triplet (see - GET_MAXSIZES call in Section 3.4). If no local fragmentation - is performed, the value of MMS_S will be: - - MMS_S = EMTU_S - <IP header size> - - and EMTU_S must be less than or equal to the MTU of the network - interface corresponding to the source address of the datagram. - Note that <IP header size> in this equation will be 20, unless - the IP reserves space to insert IP options for its own purposes - in addition to any options inserted by the transport layer. - - A host that does not implement local fragmentation MUST ensure - that the transport layer (for TCP) or the application layer - (for UDP) obtains MMS_S from the IP layer and does not send a - datagram exceeding MMS_S in size. - - It is generally desirable to avoid local fragmentation and to - choose EMTU_S low enough to avoid fragmentation in any gateway - along the path. In the absence of actual knowledge of the - minimum MTU along the path, the IP layer SHOULD use - EMTU_S <= 576 whenever the destination address is not on a - connected network, and otherwise use the connected network's - - - -Internet Engineering Task Force [Page 58] - - - - -RFC1122 INTERNET LAYER October 1989 - - - MTU. - - The MTU of each physical interface MUST be configurable. - - A host IP layer implementation MAY have a configuration flag - "All-Subnets-MTU", indicating that the MTU of the connected - network is to be used for destinations on different subnets - within the same network, but not for other networks. Thus, - this flag causes the network class mask, rather than the subnet - address mask, to be used to choose an EMTU_S. For a multihomed - host, an "All-Subnets-MTU" flag is needed for each network - interface. - - DISCUSSION: - Picking the correct datagram size to use when sending data - is a complex topic [IP:9]. - - (a) In general, no host is required to accept an IP - datagram larger than 576 bytes (including header and - data), so a host must not send a larger datagram - without explicit knowledge or prior arrangement with - the destination host. Thus, MMS_S is only an upper - bound on the datagram size that a transport protocol - may send; even when MMS_S exceeds 556, the transport - layer must limit its messages to 556 bytes in the - absence of other knowledge about the destination - host. - - (b) Some transport protocols (e.g., TCP) provide a way to - explicitly inform the sender about the largest - datagram the other end can receive and reassemble - [IP:7]. There is no corresponding mechanism in the - IP layer. - - A transport protocol that assumes an EMTU_R larger - than 576 (see Section 3.3.2), can send a datagram of - this larger size to another host that implements the - same protocol. - - (c) Hosts should ideally limit their EMTU_S for a given - destination to the minimum MTU of all the networks - along the path, to avoid any fragmentation. IP - fragmentation, while formally correct, can create a - serious transport protocol performance problem, - because loss of a single fragment means all the - fragments in the segment must be retransmitted - [IP:9]. - - - - -Internet Engineering Task Force [Page 59] - - - - -RFC1122 INTERNET LAYER October 1989 - - - Since nearly all networks in the Internet currently - support an MTU of 576 or greater, we strongly recommend - the use of 576 for datagrams sent to non-local networks. - - It has been suggested that a host could determine the MTU - over a given path by sending a zero-offset datagram - fragment and waiting for the receiver to time out the - reassembly (which cannot complete!) and return an ICMP - Time Exceeded message. This message would include the - largest remaining fragment header in its body. More - direct mechanisms are being experimented with, but have - not yet been adopted (see e.g., RFC-1063). - - 3.3.4 Local Multihoming - - 3.3.4.1 Introduction - - A multihomed host has multiple IP addresses, which we may - think of as "logical interfaces". These logical interfaces - may be associated with one or more physical interfaces, and - these physical interfaces may be connected to the same or - different networks. - - Here are some important cases of multihoming: - - (a) Multiple Logical Networks - - The Internet architects envisioned that each physical - network would have a single unique IP network (or - subnet) number. However, LAN administrators have - sometimes found it useful to violate this assumption, - operating a LAN with multiple logical networks per - physical connected network. - - If a host connected to such a physical network is - configured to handle traffic for each of N different - logical networks, then the host will have N logical - interfaces. These could share a single physical - interface, or might use N physical interfaces to the - same network. - - (b) Multiple Logical Hosts - - When a host has multiple IP addresses that all have the - same <Network-number> part (and the same <Subnet- - number> part, if any), the logical interfaces are known - as "logical hosts". These logical interfaces might - share a single physical interface or might use separate - - - -Internet Engineering Task Force [Page 60] - - - - -RFC1122 INTERNET LAYER October 1989 - - - physical interfaces to the same physical network. - - (c) Simple Multihoming - - In this case, each logical interface is mapped into a - separate physical interface and each physical interface - is connected to a different physical network. The term - "multihoming" was originally applied only to this case, - but it is now applied more generally. - - A host with embedded gateway functionality will - typically fall into the simple multihoming case. Note, - however, that a host may be simply multihomed without - containing an embedded gateway, i.e., without - forwarding datagrams from one connected network to - another. - - This case presents the most difficult routing problems. - The choice of interface (i.e., the choice of first-hop - network) may significantly affect performance or even - reachability of remote parts of the Internet. - - - Finally, we note another possibility that is NOT - multihoming: one logical interface may be bound to multiple - physical interfaces, in order to increase the reliability or - throughput between directly connected machines by providing - alternative physical paths between them. For instance, two - systems might be connected by multiple point-to-point links. - We call this "link-layer multiplexing". With link-layer - multiplexing, the protocols above the link layer are unaware - that multiple physical interfaces are present; the link- - layer device driver is responsible for multiplexing and - routing packets across the physical interfaces. - - In the Internet protocol architecture, a transport protocol - instance ("entity") has no address of its own, but instead - uses a single Internet Protocol (IP) address. This has - implications for the IP, transport, and application layers, - and for the interfaces between them. In particular, the - application software may have to be aware of the multiple IP - addresses of a multihomed host; in other cases, the choice - can be made within the network software. - - 3.3.4.2 Multihoming Requirements - - The following general rules apply to the selection of an IP - source address for sending a datagram from a multihomed - - - -Internet Engineering Task Force [Page 61] - - - - -RFC1122 INTERNET LAYER October 1989 - - - host. - - (1) If the datagram is sent in response to a received - datagram, the source address for the response SHOULD be - the specific-destination address of the request. See - Sections 4.1.3.5 and 4.2.3.7 and the "General Issues" - section of [INTRO:1] for more specific requirements on - higher layers. - - Otherwise, a source address must be selected. - - (2) An application MUST be able to explicitly specify the - source address for initiating a connection or a - request. - - (3) In the absence of such a specification, the networking - software MUST choose a source address. Rules for this - choice are described below. - - - There are two key requirement issues related to multihoming: - - (A) A host MAY silently discard an incoming datagram whose - destination address does not correspond to the physical - interface through which it is received. - - (B) A host MAY restrict itself to sending (non-source- - routed) IP datagrams only through the physical - interface that corresponds to the IP source address of - the datagrams. - - - DISCUSSION: - Internet host implementors have used two different - conceptual models for multihoming, briefly summarized - in the following discussion. This document takes no - stand on which model is preferred; each seems to have a - place. This ambivalence is reflected in the issues (A) - and (B) being optional. - - o Strong ES Model - - The Strong ES (End System, i.e., host) model - emphasizes the host/gateway (ES/IS) distinction, - and would therefore substitute MUST for MAY in - issues (A) and (B) above. It tends to model a - multihomed host as a set of logical hosts within - the same physical host. - - - -Internet Engineering Task Force [Page 62] - - - - -RFC1122 INTERNET LAYER October 1989 - - - With respect to (A), proponents of the Strong ES - model note that automatic Internet routing - mechanisms could not route a datagram to a - physical interface that did not correspond to the - destination address. - - Under the Strong ES model, the route computation - for an outgoing datagram is the mapping: - - route(src IP addr, dest IP addr, TOS) - -> gateway - - Here the source address is included as a parameter - in order to select a gateway that is directly - reachable on the corresponding physical interface. - Note that this model logically requires that in - general there be at least one default gateway, and - preferably multiple defaults, for each IP source - address. - - o Weak ES Model - - This view de-emphasizes the ES/IS distinction, and - would therefore substitute MUST NOT for MAY in - issues (A) and (B). This model may be the more - natural one for hosts that wiretap gateway routing - protocols, and is necessary for hosts that have - embedded gateway functionality. - - The Weak ES Model may cause the Redirect mechanism - to fail. If a datagram is sent out a physical - interface that does not correspond to the - destination address, the first-hop gateway will - not realize when it needs to send a Redirect. On - the other hand, if the host has embedded gateway - functionality, then it has routing information - without listening to Redirects. - - In the Weak ES model, the route computation for an - outgoing datagram is the mapping: - - route(dest IP addr, TOS) -> gateway, interface - - - - - - - - - -Internet Engineering Task Force [Page 63] - - - - -RFC1122 INTERNET LAYER October 1989 - - - 3.3.4.3 Choosing a Source Address - - DISCUSSION: - When it sends an initial connection request (e.g., a - TCP "SYN" segment) or a datagram service request (e.g., - a UDP-based query), the transport layer on a multihomed - host needs to know which source address to use. If the - application does not specify it, the transport layer - must ask the IP layer to perform the conceptual - mapping: - - GET_SRCADDR(remote IP addr, TOS) - -> local IP address - - Here TOS is the Type-of-Service value (see Section - 3.2.1.6), and the result is the desired source address. - The following rules are suggested for implementing this - mapping: - - (a) If the remote Internet address lies on one of the - (sub-) nets to which the host is directly - connected, a corresponding source address may be - chosen, unless the corresponding interface is - known to be down. - - (b) The route cache may be consulted, to see if there - is an active route to the specified destination - network through any network interface; if so, a - local IP address corresponding to that interface - may be chosen. - - (c) The table of static routes, if any (see Section - 3.3.1.2) may be similarly consulted. - - (d) The default gateways may be consulted. If these - gateways are assigned to different interfaces, the - interface corresponding to the gateway with the - highest preference may be chosen. - - In the future, there may be a defined way for a - multihomed host to ask the gateways on all connected - networks for advice about the best network to use for a - given destination. - - IMPLEMENTATION: - It will be noted that this process is essentially the - same as datagram routing (see Section 3.3.1), and - therefore hosts may be able to combine the - - - -Internet Engineering Task Force [Page 64] - - - - -RFC1122 INTERNET LAYER October 1989 - - - implementation of the two functions. - - 3.3.5 Source Route Forwarding - - Subject to restrictions given below, a host MAY be able to act - as an intermediate hop in a source route, forwarding a source- - routed datagram to the next specified hop. - - However, in performing this gateway-like function, the host - MUST obey all the relevant rules for a gateway forwarding - source-routed datagrams [INTRO:2]. This includes the following - specific provisions, which override the corresponding host - provisions given earlier in this document: - - (A) TTL (ref. Section 3.2.1.7) - - The TTL field MUST be decremented and the datagram perhaps - discarded as specified for a gateway in [INTRO:2]. - - (B) ICMP Destination Unreachable (ref. Section 3.2.2.1) - - A host MUST be able to generate Destination Unreachable - messages with the following codes: - - 4 (Fragmentation Required but DF Set) when a source- - routed datagram cannot be fragmented to fit into the - target network; - - 5 (Source Route Failed) when a source-routed datagram - cannot be forwarded, e.g., because of a routing - problem or because the next hop of a strict source - route is not on a connected network. - - (C) IP Source Address (ref. Section 3.2.1.3) - - A source-routed datagram being forwarded MAY (and normally - will) have a source address that is not one of the IP - addresses of the forwarding host. - - (D) Record Route Option (ref. Section 3.2.1.8d) - - A host that is forwarding a source-routed datagram - containing a Record Route option MUST update that option, - if it has room. - - (E) Timestamp Option (ref. Section 3.2.1.8e) - - A host that is forwarding a source-routed datagram - - - -Internet Engineering Task Force [Page 65] - - - - -RFC1122 INTERNET LAYER October 1989 - - - containing a Timestamp Option MUST add the current - timestamp to that option, according to the rules for this - option. - - To define the rules restricting host forwarding of source- - routed datagrams, we use the term "local source-routing" if the - next hop will be through the same physical interface through - which the datagram arrived; otherwise, it is "non-local - source-routing". - - o A host is permitted to perform local source-routing - without restriction. - - o A host that supports non-local source-routing MUST have a - configurable switch to disable forwarding, and this switch - MUST default to disabled. - - o The host MUST satisfy all gateway requirements for - configurable policy filters [INTRO:2] restricting non- - local forwarding. - - If a host receives a datagram with an incomplete source route - but does not forward it for some reason, the host SHOULD return - an ICMP Destination Unreachable (code 5, Source Route Failed) - message, unless the datagram was itself an ICMP error message. - - 3.3.6 Broadcasts - - Section 3.2.1.3 defined the four standard IP broadcast address - forms: - - Limited Broadcast: {-1, -1} - - Directed Broadcast: {<Network-number>,-1} - - Subnet Directed Broadcast: - {<Network-number>,<Subnet-number>,-1} - - All-Subnets Directed Broadcast: {<Network-number>,-1,-1} - - A host MUST recognize any of these forms in the destination - address of an incoming datagram. - - There is a class of hosts* that use non-standard broadcast - address forms, substituting 0 for -1. All hosts SHOULD -_________________________ -*4.2BSD Unix and its derivatives, but not 4.3BSD. - - - - -Internet Engineering Task Force [Page 66] - - - - -RFC1122 INTERNET LAYER October 1989 - - - recognize and accept any of these non-standard broadcast - addresses as the destination address of an incoming datagram. - A host MAY optionally have a configuration option to choose the - 0 or the -1 form of broadcast address, for each physical - interface, but this option SHOULD default to the standard (-1) - form. - - When a host sends a datagram to a link-layer broadcast address, - the IP destination address MUST be a legal IP broadcast or IP - multicast address. - - A host SHOULD silently discard a datagram that is received via - a link-layer broadcast (see Section 2.4) but does not specify - an IP multicast or broadcast destination address. - - Hosts SHOULD use the Limited Broadcast address to broadcast to - a connected network. - - - DISCUSSION: - Using the Limited Broadcast address instead of a Directed - Broadcast address may improve system robustness. Problems - are often caused by machines that do not understand the - plethora of broadcast addresses (see Section 3.2.1.3), or - that may have different ideas about which broadcast - addresses are in use. The prime example of the latter is - machines that do not understand subnetting but are - attached to a subnetted net. Sending a Subnet Broadcast - for the connected network will confuse those machines, - which will see it as a message to some other host. - - There has been discussion on whether a datagram addressed - to the Limited Broadcast address ought to be sent from all - the interfaces of a multihomed host. This specification - takes no stand on the issue. - - 3.3.7 IP Multicasting - - A host SHOULD support local IP multicasting on all connected - networks for which a mapping from Class D IP addresses to - link-layer addresses has been specified (see below). Support - for local IP multicasting includes sending multicast datagrams, - joining multicast groups and receiving multicast datagrams, and - leaving multicast groups. This implies support for all of - [IP:4] except the IGMP protocol itself, which is OPTIONAL. - - - - - - -Internet Engineering Task Force [Page 67] - - - - -RFC1122 INTERNET LAYER October 1989 - - - DISCUSSION: - IGMP provides gateways that are capable of multicast - routing with the information required to support IP - multicasting across multiple networks. At this time, - multicast-routing gateways are in the experimental stage - and are not widely available. For hosts that are not - connected to networks with multicast-routing gateways or - that do not need to receive multicast datagrams - originating on other networks, IGMP serves no purpose and - is therefore optional for now. However, the rest of - [IP:4] is currently recommended for the purpose of - providing IP-layer access to local network multicast - addressing, as a preferable alternative to local broadcast - addressing. It is expected that IGMP will become - recommended at some future date, when multicast-routing - gateways have become more widely available. - - If IGMP is not implemented, a host SHOULD still join the "all- - hosts" group (224.0.0.1) when the IP layer is initialized and - remain a member for as long as the IP layer is active. - - DISCUSSION: - Joining the "all-hosts" group will support strictly local - uses of multicasting, e.g., a gateway discovery protocol, - even if IGMP is not implemented. - - The mapping of IP Class D addresses to local addresses is - currently specified for the following types of networks: - - o Ethernet/IEEE 802.3, as defined in [IP:4]. - - o Any network that supports broadcast but not multicast, - addressing: all IP Class D addresses map to the local - broadcast address. - - o Any type of point-to-point link (e.g., SLIP or HDLC - links): no mapping required. All IP multicast datagrams - are sent as-is, inside the local framing. - - Mappings for other types of networks will be specified in the - future. - - A host SHOULD provide a way for higher-layer protocols or - applications to determine which of the host's connected - network(s) support IP multicast addressing. - - - - - - -Internet Engineering Task Force [Page 68] - - - - -RFC1122 INTERNET LAYER October 1989 - - - 3.3.8 Error Reporting - - Wherever practical, hosts MUST return ICMP error datagrams on - detection of an error, except in those cases where returning an - ICMP error message is specifically prohibited. - - DISCUSSION: - A common phenomenon in datagram networks is the "black - hole disease": datagrams are sent out, but nothing comes - back. Without any error datagrams, it is difficult for - the user to figure out what the problem is. - - 3.4 INTERNET/TRANSPORT LAYER INTERFACE - - The interface between the IP layer and the transport layer MUST - provide full access to all the mechanisms of the IP layer, - including options, Type-of-Service, and Time-to-Live. The - transport layer MUST either have mechanisms to set these interface - parameters, or provide a path to pass them through from an - application, or both. - - DISCUSSION: - Applications are urged to make use of these mechanisms where - applicable, even when the mechanisms are not currently - effective in the Internet (e.g., TOS). This will allow these - mechanisms to be immediately useful when they do become - effective, without a large amount of retrofitting of host - software. - - We now describe a conceptual interface between the transport layer - and the IP layer, as a set of procedure calls. This is an - extension of the information in Section 3.3 of RFC-791 [IP:1]. - - - * Send Datagram - - SEND(src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt - => result ) - - where the parameters are defined in RFC-791. Passing an Id - parameter is optional; see Section 3.2.1.5. - - - * Receive Datagram - - RECV(BufPTR, prot - => result, src, dst, SpecDest, TOS, len, opt) - - - - -Internet Engineering Task Force [Page 69] - - - - -RFC1122 INTERNET LAYER October 1989 - - - All the parameters are defined in RFC-791, except for: - - SpecDest = specific-destination address of datagram - (defined in Section 3.2.1.3) - - The result parameter dst contains the datagram's destination - address. Since this may be a broadcast or multicast address, - the SpecDest parameter (not shown in RFC-791) MUST be passed. - The parameter opt contains all the IP options received in the - datagram; these MUST also be passed to the transport layer. - - - * Select Source Address - - GET_SRCADDR(remote, TOS) -> local - - remote = remote IP address - TOS = Type-of-Service - local = local IP address - - See Section 3.3.4.3. - - - * Find Maximum Datagram Sizes - - GET_MAXSIZES(local, remote, TOS) -> MMS_R, MMS_S - - MMS_R = maximum receive transport-message size. - MMS_S = maximum send transport-message size. - (local, remote, TOS defined above) - - See Sections 3.3.2 and 3.3.3. - - - * Advice on Delivery Success - - ADVISE_DELIVPROB(sense, local, remote, TOS) - - Here the parameter sense is a 1-bit flag indicating whether - positive or negative advice is being given; see the - discussion in Section 3.3.1.4. The other parameters were - defined earlier. - - - * Send ICMP Message - - SEND_ICMP(src, dst, TOS, TTL, BufPTR, len, Id, DF, opt) - -> result - - - -Internet Engineering Task Force [Page 70] - - - - -RFC1122 INTERNET LAYER October 1989 - - - (Parameters defined in RFC-791). - - Passing an Id parameter is optional; see Section 3.2.1.5. - The transport layer MUST be able to send certain ICMP - messages: Port Unreachable or any of the query-type - messages. This function could be considered to be a special - case of the SEND() call, of course; we describe it separately - for clarity. - - - * Receive ICMP Message - - RECV_ICMP(BufPTR ) -> result, src, dst, len, opt - - (Parameters defined in RFC-791). - - The IP layer MUST pass certain ICMP messages up to the - appropriate transport-layer routine. This function could be - considered to be a special case of the RECV() call, of - course; we describe it separately for clarity. - - For an ICMP error message, the data that is passed up MUST - include the original Internet header plus all the octets of - the original message that are included in the ICMP message. - This data will be used by the transport layer to locate the - connection state information, if any. - - In particular, the following ICMP messages are to be passed - up: - - o Destination Unreachable - - o Source Quench - - o Echo Reply (to ICMP user interface, unless the Echo - Request originated in the IP layer) - - o Timestamp Reply (to ICMP user interface) - - o Time Exceeded - - - DISCUSSION: - In the future, there may be additions to this interface to - pass path data (see Section 3.3.1.3) between the IP and - transport layers. - - - - - -Internet Engineering Task Force [Page 71] - - - - -RFC1122 INTERNET LAYER October 1989 - - - 3.5 INTERNET LAYER REQUIREMENTS SUMMARY - - - | | | | |S| | - | | | | |H| |F - | | | | |O|M|o - | | |S| |U|U|o - | | |H| |L|S|t - | |M|O| |D|T|n - | |U|U|M| | |o - | |S|L|A|N|N|t - | |T|D|Y|O|O|t -FEATURE |SECTION | | | |T|T|e --------------------------------------------------|--------|-|-|-|-|-|-- - | | | | | | | -Implement IP and ICMP |3.1 |x| | | | | -Handle remote multihoming in application layer |3.1 |x| | | | | -Support local multihoming |3.1 | | |x| | | -Meet gateway specs if forward datagrams |3.1 |x| | | | | -Configuration switch for embedded gateway |3.1 |x| | | | |1 - Config switch default to non-gateway |3.1 |x| | | | |1 - Auto-config based on number of interfaces |3.1 | | | | |x|1 -Able to log discarded datagrams |3.1 | |x| | | | - Record in counter |3.1 | |x| | | | - | | | | | | | -Silently discard Version != 4 |3.2.1.1 |x| | | | | -Verify IP checksum, silently discard bad dgram |3.2.1.2 |x| | | | | -Addressing: | | | | | | | - Subnet addressing (RFC-950) |3.2.1.3 |x| | | | | - Src address must be host's own IP address |3.2.1.3 |x| | | | | - Silently discard datagram with bad dest addr |3.2.1.3 |x| | | | | - Silently discard datagram with bad src addr |3.2.1.3 |x| | | | | -Support reassembly |3.2.1.4 |x| | | | | -Retain same Id field in identical datagram |3.2.1.5 | | |x| | | - | | | | | | | -TOS: | | | | | | | - Allow transport layer to set TOS |3.2.1.6 |x| | | | | - Pass received TOS up to transport layer |3.2.1.6 | |x| | | | - Use RFC-795 link-layer mappings for TOS |3.2.1.6 | | | |x| | -TTL: | | | | | | | - Send packet with TTL of 0 |3.2.1.7 | | | | |x| - Discard received packets with TTL < 2 |3.2.1.7 | | | | |x| - Allow transport layer to set TTL |3.2.1.7 |x| | | | | - Fixed TTL is configurable |3.2.1.7 |x| | | | | - | | | | | | | -IP Options: | | | | | | | - Allow transport layer to send IP options |3.2.1.8 |x| | | | | - Pass all IP options rcvd to higher layer |3.2.1.8 |x| | | | | - - - -Internet Engineering Task Force [Page 72] - - - - -RFC1122 INTERNET LAYER October 1989 - - - IP layer silently ignore unknown options |3.2.1.8 |x| | | | | - Security option |3.2.1.8a| | |x| | | - Send Stream Identifier option |3.2.1.8b| | | |x| | - Silently ignore Stream Identifer option |3.2.1.8b|x| | | | | - Record Route option |3.2.1.8d| | |x| | | - Timestamp option |3.2.1.8e| | |x| | | -Source Route Option: | | | | | | | - Originate & terminate Source Route options |3.2.1.8c|x| | | | | - Datagram with completed SR passed up to TL |3.2.1.8c|x| | | | | - Build correct (non-redundant) return route |3.2.1.8c|x| | | | | - Send multiple SR options in one header |3.2.1.8c| | | | |x| - | | | | | | | -ICMP: | | | | | | | - Silently discard ICMP msg with unknown type |3.2.2 |x| | | | | - Include more than 8 octets of orig datagram |3.2.2 | | |x| | | - Included octets same as received |3.2.2 |x| | | | | - Demux ICMP Error to transport protocol |3.2.2 |x| | | | | - Send ICMP error message with TOS=0 |3.2.2 | |x| | | | - Send ICMP error message for: | | | | | | | - - ICMP error msg |3.2.2 | | | | |x| - - IP b'cast or IP m'cast |3.2.2 | | | | |x| - - Link-layer b'cast |3.2.2 | | | | |x| - - Non-initial fragment |3.2.2 | | | | |x| - - Datagram with non-unique src address |3.2.2 | | | | |x| - Return ICMP error msgs (when not prohibited) |3.3.8 |x| | | | | - | | | | | | | - Dest Unreachable: | | | | | | | - Generate Dest Unreachable (code 2/3) |3.2.2.1 | |x| | | | - Pass ICMP Dest Unreachable to higher layer |3.2.2.1 |x| | | | | - Higher layer act on Dest Unreach |3.2.2.1 | |x| | | | - Interpret Dest Unreach as only hint |3.2.2.1 |x| | | | | - Redirect: | | | | | | | - Host send Redirect |3.2.2.2 | | | |x| | - Update route cache when recv Redirect |3.2.2.2 |x| | | | | - Handle both Host and Net Redirects |3.2.2.2 |x| | | | | - Discard illegal Redirect |3.2.2.2 | |x| | | | - Source Quench: | | | | | | | - Send Source Quench if buffering exceeded |3.2.2.3 | | |x| | | - Pass Source Quench to higher layer |3.2.2.3 |x| | | | | - Higher layer act on Source Quench |3.2.2.3 | |x| | | | - Time Exceeded: pass to higher layer |3.2.2.4 |x| | | | | - Parameter Problem: | | | | | | | - Send Parameter Problem messages |3.2.2.5 | |x| | | | - Pass Parameter Problem to higher layer |3.2.2.5 |x| | | | | - Report Parameter Problem to user |3.2.2.5 | | |x| | | - | | | | | | | - ICMP Echo Request or Reply: | | | | | | | - Echo server and Echo client |3.2.2.6 |x| | | | | - - - -Internet Engineering Task Force [Page 73] - - - - -RFC1122 INTERNET LAYER October 1989 - - - Echo client |3.2.2.6 | |x| | | | - Discard Echo Request to broadcast address |3.2.2.6 | | |x| | | - Discard Echo Request to multicast address |3.2.2.6 | | |x| | | - Use specific-dest addr as Echo Reply src |3.2.2.6 |x| | | | | - Send same data in Echo Reply |3.2.2.6 |x| | | | | - Pass Echo Reply to higher layer |3.2.2.6 |x| | | | | - Reflect Record Route, Time Stamp options |3.2.2.6 | |x| | | | - Reverse and reflect Source Route option |3.2.2.6 |x| | | | | - | | | | | | | - ICMP Information Request or Reply: |3.2.2.7 | | | |x| | - ICMP Timestamp and Timestamp Reply: |3.2.2.8 | | |x| | | - Minimize delay variability |3.2.2.8 | |x| | | |1 - Silently discard b'cast Timestamp |3.2.2.8 | | |x| | |1 - Silently discard m'cast Timestamp |3.2.2.8 | | |x| | |1 - Use specific-dest addr as TS Reply src |3.2.2.8 |x| | | | |1 - Reflect Record Route, Time Stamp options |3.2.2.6 | |x| | | |1 - Reverse and reflect Source Route option |3.2.2.8 |x| | | | |1 - Pass Timestamp Reply to higher layer |3.2.2.8 |x| | | | |1 - Obey rules for "standard value" |3.2.2.8 |x| | | | |1 - | | | | | | | - ICMP Address Mask Request and Reply: | | | | | | | - Addr Mask source configurable |3.2.2.9 |x| | | | | - Support static configuration of addr mask |3.2.2.9 |x| | | | | - Get addr mask dynamically during booting |3.2.2.9 | | |x| | | - Get addr via ICMP Addr Mask Request/Reply |3.2.2.9 | | |x| | | - Retransmit Addr Mask Req if no Reply |3.2.2.9 |x| | | | |3 - Assume default mask if no Reply |3.2.2.9 | |x| | | |3 - Update address mask from first Reply only |3.2.2.9 |x| | | | |3 - Reasonableness check on Addr Mask |3.2.2.9 | |x| | | | - Send unauthorized Addr Mask Reply msgs |3.2.2.9 | | | | |x| - Explicitly configured to be agent |3.2.2.9 |x| | | | | - Static config=> Addr-Mask-Authoritative flag |3.2.2.9 | |x| | | | - Broadcast Addr Mask Reply when init. |3.2.2.9 |x| | | | |3 - | | | | | | | -ROUTING OUTBOUND DATAGRAMS: | | | | | | | - Use address mask in local/remote decision |3.3.1.1 |x| | | | | - Operate with no gateways on conn network |3.3.1.1 |x| | | | | - Maintain "route cache" of next-hop gateways |3.3.1.2 |x| | | | | - Treat Host and Net Redirect the same |3.3.1.2 | |x| | | | - If no cache entry, use default gateway |3.3.1.2 |x| | | | | - Support multiple default gateways |3.3.1.2 |x| | | | | - Provide table of static routes |3.3.1.2 | | |x| | | - Flag: route overridable by Redirects |3.3.1.2 | | |x| | | - Key route cache on host, not net address |3.3.1.3 | | |x| | | - Include TOS in route cache |3.3.1.3 | |x| | | | - | | | | | | | - Able to detect failure of next-hop gateway |3.3.1.4 |x| | | | | - Assume route is good forever |3.3.1.4 | | | |x| | - - - -Internet Engineering Task Force [Page 74] - - - - -RFC1122 INTERNET LAYER October 1989 - - - Ping gateways continuously |3.3.1.4 | | | | |x| - Ping only when traffic being sent |3.3.1.4 |x| | | | | - Ping only when no positive indication |3.3.1.4 |x| | | | | - Higher and lower layers give advice |3.3.1.4 | |x| | | | - Switch from failed default g'way to another |3.3.1.5 |x| | | | | - Manual method of entering config info |3.3.1.6 |x| | | | | - | | | | | | | -REASSEMBLY and FRAGMENTATION: | | | | | | | - Able to reassemble incoming datagrams |3.3.2 |x| | | | | - At least 576 byte datagrams |3.3.2 |x| | | | | - EMTU_R configurable or indefinite |3.3.2 | |x| | | | - Transport layer able to learn MMS_R |3.3.2 |x| | | | | - Send ICMP Time Exceeded on reassembly timeout |3.3.2 |x| | | | | - Fixed reassembly timeout value |3.3.2 | |x| | | | - | | | | | | | - Pass MMS_S to higher layers |3.3.3 |x| | | | | - Local fragmentation of outgoing packets |3.3.3 | | |x| | | - Else don't send bigger than MMS_S |3.3.3 |x| | | | | - Send max 576 to off-net destination |3.3.3 | |x| | | | - All-Subnets-MTU configuration flag |3.3.3 | | |x| | | - | | | | | | | -MULTIHOMING: | | | | | | | - Reply with same addr as spec-dest addr |3.3.4.2 | |x| | | | - Allow application to choose local IP addr |3.3.4.2 |x| | | | | - Silently discard d'gram in "wrong" interface |3.3.4.2 | | |x| | | - Only send d'gram through "right" interface |3.3.4.2 | | |x| | |4 - | | | | | | | -SOURCE-ROUTE FORWARDING: | | | | | | | - Forward datagram with Source Route option |3.3.5 | | |x| | |1 - Obey corresponding gateway rules |3.3.5 |x| | | | |1 - Update TTL by gateway rules |3.3.5 |x| | | | |1 - Able to generate ICMP err code 4, 5 |3.3.5 |x| | | | |1 - IP src addr not local host |3.3.5 | | |x| | |1 - Update Timestamp, Record Route options |3.3.5 |x| | | | |1 - Configurable switch for non-local SRing |3.3.5 |x| | | | |1 - Defaults to OFF |3.3.5 |x| | | | |1 - Satisfy gwy access rules for non-local SRing |3.3.5 |x| | | | |1 - If not forward, send Dest Unreach (cd 5) |3.3.5 | |x| | | |2 - | | | | | | | -BROADCAST: | | | | | | | - Broadcast addr as IP source addr |3.2.1.3 | | | | |x| - Receive 0 or -1 broadcast formats OK |3.3.6 | |x| | | | - Config'ble option to send 0 or -1 b'cast |3.3.6 | | |x| | | - Default to -1 broadcast |3.3.6 | |x| | | | - Recognize all broadcast address formats |3.3.6 |x| | | | | - Use IP b'cast/m'cast addr in link-layer b'cast |3.3.6 |x| | | | | - Silently discard link-layer-only b'cast dg's |3.3.6 | |x| | | | - Use Limited Broadcast addr for connected net |3.3.6 | |x| | | | - - - -Internet Engineering Task Force [Page 75] - - - - -RFC1122 INTERNET LAYER October 1989 - - - | | | | | | | -MULTICAST: | | | | | | | - Support local IP multicasting (RFC-1112) |3.3.7 | |x| | | | - Support IGMP (RFC-1112) |3.3.7 | | |x| | | - Join all-hosts group at startup |3.3.7 | |x| | | | - Higher layers learn i'face m'cast capability |3.3.7 | |x| | | | - | | | | | | | -INTERFACE: | | | | | | | - Allow transport layer to use all IP mechanisms |3.4 |x| | | | | - Pass interface ident up to transport layer |3.4 |x| | | | | - Pass all IP options up to transport layer |3.4 |x| | | | | - Transport layer can send certain ICMP messages |3.4 |x| | | | | - Pass spec'd ICMP messages up to transp. layer |3.4 |x| | | | | - Include IP hdr+8 octets or more from orig. |3.4 |x| | | | | - Able to leap tall buildings at a single bound |3.5 | |x| | | | - -Footnotes: - -(1) Only if feature is implemented. - -(2) This requirement is overruled if datagram is an ICMP error message. - -(3) Only if feature is implemented and is configured "on". - -(4) Unless has embedded gateway functionality or is source routed. - - - - - - - - - - - - - - - - - - - - - - - - - - -Internet Engineering Task Force [Page 76] - - - - -RFC1122 TRANSPORT LAYER -- UDP October 1989 - - -4. TRANSPORT PROTOCOLS - - 4.1 USER DATAGRAM PROTOCOL -- UDP - - 4.1.1 INTRODUCTION - - The User Datagram Protocol UDP [UDP:1] offers only a minimal - transport service -- non-guaranteed datagram delivery -- and - gives applications direct access to the datagram service of the - IP layer. UDP is used by applications that do not require the - level of service of TCP or that wish to use communications - services (e.g., multicast or broadcast delivery) not available - from TCP. - - UDP is almost a null protocol; the only services it provides - over IP are checksumming of data and multiplexing by port - number. Therefore, an application program running over UDP - must deal directly with end-to-end communication problems that - a connection-oriented protocol would have handled -- e.g., - retransmission for reliable delivery, packetization and - reassembly, flow control, congestion avoidance, etc., when - these are required. The fairly complex coupling between IP and - TCP will be mirrored in the coupling between UDP and many - applications using UDP. - - 4.1.2 PROTOCOL WALK-THROUGH - - There are no known errors in the specification of UDP. - - 4.1.3 SPECIFIC ISSUES - - 4.1.3.1 Ports - - UDP well-known ports follow the same rules as TCP well-known - ports; see Section 4.2.2.1 below. - - If a datagram arrives addressed to a UDP port for which - there is no pending LISTEN call, UDP SHOULD send an ICMP - Port Unreachable message. - - 4.1.3.2 IP Options - - UDP MUST pass any IP option that it receives from the IP - layer transparently to the application layer. - - An application MUST be able to specify IP options to be sent - in its UDP datagrams, and UDP MUST pass these options to the - IP layer. - - - -Internet Engineering Task Force [Page 77] - - - - -RFC1122 TRANSPORT LAYER -- UDP October 1989 - - - DISCUSSION: - At present, the only options that need be passed - through UDP are Source Route, Record Route, and Time - Stamp. However, new options may be defined in the - future, and UDP need not and should not make any - assumptions about the format or content of options it - passes to or from the application; an exception to this - might be an IP-layer security option. - - An application based on UDP will need to obtain a - source route from a request datagram and supply a - reversed route for sending the corresponding reply. - - 4.1.3.3 ICMP Messages - - UDP MUST pass to the application layer all ICMP error - messages that it receives from the IP layer. Conceptually - at least, this may be accomplished with an upcall to the - ERROR_REPORT routine (see Section 4.2.4.1). - - DISCUSSION: - Note that ICMP error messages resulting from sending a - UDP datagram are received asynchronously. A UDP-based - application that wants to receive ICMP error messages - is responsible for maintaining the state necessary to - demultiplex these messages when they arrive; for - example, the application may keep a pending receive - operation for this purpose. The application is also - responsible to avoid confusion from a delayed ICMP - error message resulting from an earlier use of the same - port(s). - - 4.1.3.4 UDP Checksums - - A host MUST implement the facility to generate and validate - UDP checksums. An application MAY optionally be able to - control whether a UDP checksum will be generated, but it - MUST default to checksumming on. - - If a UDP datagram is received with a checksum that is non- - zero and invalid, UDP MUST silently discard the datagram. - An application MAY optionally be able to control whether UDP - datagrams without checksums should be discarded or passed to - the application. - - DISCUSSION: - Some applications that normally run only across local - area networks have chosen to turn off UDP checksums for - - - -Internet Engineering Task Force [Page 78] - - - - -RFC1122 TRANSPORT LAYER -- UDP October 1989 - - - efficiency. As a result, numerous cases of undetected - errors have been reported. The advisability of ever - turning off UDP checksumming is very controversial. - - IMPLEMENTATION: - There is a common implementation error in UDP - checksums. Unlike the TCP checksum, the UDP checksum - is optional; the value zero is transmitted in the - checksum field of a UDP header to indicate the absence - of a checksum. If the transmitter really calculates a - UDP checksum of zero, it must transmit the checksum as - all 1's (65535). No special action is required at the - receiver, since zero and 65535 are equivalent in 1's - complement arithmetic. - - 4.1.3.5 UDP Multihoming - - When a UDP datagram is received, its specific-destination - address MUST be passed up to the application layer. - - An application program MUST be able to specify the IP source - address to be used for sending a UDP datagram or to leave it - unspecified (in which case the networking software will - choose an appropriate source address). There SHOULD be a - way to communicate the chosen source address up to the - application layer (e.g, so that the application can later - receive a reply datagram only from the corresponding - interface). - - DISCUSSION: - A request/response application that uses UDP should use - a source address for the response that is the same as - the specific destination address of the request. See - the "General Issues" section of [INTRO:1]. - - 4.1.3.6 Invalid Addresses - - A UDP datagram received with an invalid IP source address - (e.g., a broadcast or multicast address) must be discarded - by UDP or by the IP layer (see Section 3.2.1.3). - - When a host sends a UDP datagram, the source address MUST be - (one of) the IP address(es) of the host. - - 4.1.4 UDP/APPLICATION LAYER INTERFACE - - The application interface to UDP MUST provide the full services - of the IP/transport interface described in Section 3.4 of this - - - -Internet Engineering Task Force [Page 79] - - - - -RFC1122 TRANSPORT LAYER -- UDP October 1989 - - - document. Thus, an application using UDP needs the functions - of the GET_SRCADDR(), GET_MAXSIZES(), ADVISE_DELIVPROB(), and - RECV_ICMP() calls described in Section 3.4. For example, - GET_MAXSIZES() can be used to learn the effective maximum UDP - maximum datagram size for a particular {interface,remote - host,TOS} triplet. - - An application-layer program MUST be able to set the TTL and - TOS values as well as IP options for sending a UDP datagram, - and these values must be passed transparently to the IP layer. - UDP MAY pass the received TOS up to the application layer. - - 4.1.5 UDP REQUIREMENTS SUMMARY - - - | | | | |S| | - | | | | |H| |F - | | | | |O|M|o - | | |S| |U|U|o - | | |H| |L|S|t - | |M|O| |D|T|n - | |U|U|M| | |o - | |S|L|A|N|N|t - | |T|D|Y|O|O|t -FEATURE |SECTION | | | |T|T|e --------------------------------------------------|--------|-|-|-|-|-|-- - | | | | | | | - UDP | | | | | | | --------------------------------------------------|--------|-|-|-|-|-|-- - | | | | | | | -UDP send Port Unreachable |4.1.3.1 | |x| | | | - | | | | | | | -IP Options in UDP | | | | | | | - - Pass rcv'd IP options to applic layer |4.1.3.2 |x| | | | | - - Applic layer can specify IP options in Send |4.1.3.2 |x| | | | | - - UDP passes IP options down to IP layer |4.1.3.2 |x| | | | | - | | | | | | | -Pass ICMP msgs up to applic layer |4.1.3.3 |x| | | | | - | | | | | | | -UDP checksums: | | | | | | | - - Able to generate/check checksum |4.1.3.4 |x| | | | | - - Silently discard bad checksum |4.1.3.4 |x| | | | | - - Sender Option to not generate checksum |4.1.3.4 | | |x| | | - - Default is to checksum |4.1.3.4 |x| | | | | - - Receiver Option to require checksum |4.1.3.4 | | |x| | | - | | | | | | | -UDP Multihoming | | | | | | | - - Pass spec-dest addr to application |4.1.3.5 |x| | | | | - - - -Internet Engineering Task Force [Page 80] - - - - -RFC1122 TRANSPORT LAYER -- UDP October 1989 - - - - Applic layer can specify Local IP addr |4.1.3.5 |x| | | | | - - Applic layer specify wild Local IP addr |4.1.3.5 |x| | | | | - - Applic layer notified of Local IP addr used |4.1.3.5 | |x| | | | - | | | | | | | -Bad IP src addr silently discarded by UDP/IP |4.1.3.6 |x| | | | | -Only send valid IP source address |4.1.3.6 |x| | | | | -UDP Application Interface Services | | | | | | | -Full IP interface of 3.4 for application |4.1.4 |x| | | | | - - Able to spec TTL, TOS, IP opts when send dg |4.1.4 |x| | | | | - - Pass received TOS up to applic layer |4.1.4 | | |x| | | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Internet Engineering Task Force [Page 81] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - 4.2 TRANSMISSION CONTROL PROTOCOL -- TCP - - 4.2.1 INTRODUCTION - - The Transmission Control Protocol TCP [TCP:1] is the primary - virtual-circuit transport protocol for the Internet suite. TCP - provides reliable, in-sequence delivery of a full-duplex stream - of octets (8-bit bytes). TCP is used by those applications - needing reliable, connection-oriented transport service, e.g., - mail (SMTP), file transfer (FTP), and virtual terminal service - (Telnet); requirements for these application-layer protocols - are described in [INTRO:1]. - - 4.2.2 PROTOCOL WALK-THROUGH - - 4.2.2.1 Well-Known Ports: RFC-793 Section 2.7 - - DISCUSSION: - TCP reserves port numbers in the range 0-255 for - "well-known" ports, used to access services that are - standardized across the Internet. The remainder of the - port space can be freely allocated to application - processes. Current well-known port definitions are - listed in the RFC entitled "Assigned Numbers" - [INTRO:6]. A prerequisite for defining a new well- - known port is an RFC documenting the proposed service - in enough detail to allow new implementations. - - Some systems extend this notion by adding a third - subdivision of the TCP port space: reserved ports, - which are generally used for operating-system-specific - services. For example, reserved ports might fall - between 256 and some system-dependent upper limit. - Some systems further choose to protect well-known and - reserved ports by permitting only privileged users to - open TCP connections with those port values. This is - perfectly reasonable as long as the host does not - assume that all hosts protect their low-numbered ports - in this manner. - - 4.2.2.2 Use of Push: RFC-793 Section 2.8 - - When an application issues a series of SEND calls without - setting the PUSH flag, the TCP MAY aggregate the data - internally without sending it. Similarly, when a series of - segments is received without the PSH bit, a TCP MAY queue - the data internally without passing it to the receiving - application. - - - -Internet Engineering Task Force [Page 82] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - The PSH bit is not a record marker and is independent of - segment boundaries. The transmitter SHOULD collapse - successive PSH bits when it packetizes data, to send the - largest possible segment. - - A TCP MAY implement PUSH flags on SEND calls. If PUSH flags - are not implemented, then the sending TCP: (1) must not - buffer data indefinitely, and (2) MUST set the PSH bit in - the last buffered segment (i.e., when there is no more - queued data to be sent). - - The discussion in RFC-793 on pages 48, 50, and 74 - erroneously implies that a received PSH flag must be passed - to the application layer. Passing a received PSH flag to - the application layer is now OPTIONAL. - - An application program is logically required to set the PUSH - flag in a SEND call whenever it needs to force delivery of - the data to avoid a communication deadlock. However, a TCP - SHOULD send a maximum-sized segment whenever possible, to - improve performance (see Section 4.2.3.4). - - DISCUSSION: - When the PUSH flag is not implemented on SEND calls, - i.e., when the application/TCP interface uses a pure - streaming model, responsibility for aggregating any - tiny data fragments to form reasonable sized segments - is partially borne by the application layer. - - Generally, an interactive application protocol must set - the PUSH flag at least in the last SEND call in each - command or response sequence. A bulk transfer protocol - like FTP should set the PUSH flag on the last segment - of a file or when necessary to prevent buffer deadlock. - - At the receiver, the PSH bit forces buffered data to be - delivered to the application (even if less than a full - buffer has been received). Conversely, the lack of a - PSH bit can be used to avoid unnecessary wakeup calls - to the application process; this can be an important - performance optimization for large timesharing hosts. - Passing the PSH bit to the receiving application allows - an analogous optimization within the application. - - 4.2.2.3 Window Size: RFC-793 Section 3.1 - - The window size MUST be treated as an unsigned number, or - else large window sizes will appear like negative windows - - - -Internet Engineering Task Force [Page 83] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - and TCP will not work. It is RECOMMENDED that - implementations reserve 32-bit fields for the send and - receive window sizes in the connection record and do all - window computations with 32 bits. - - DISCUSSION: - It is known that the window field in the TCP header is - too small for high-speed, long-delay paths. - Experimental TCP options have been defined to extend - the window size; see for example [TCP:11]. In - anticipation of the adoption of such an extension, TCP - implementors should treat windows as 32 bits. - - 4.2.2.4 Urgent Pointer: RFC-793 Section 3.1 - - The second sentence is in error: the urgent pointer points - to the sequence number of the LAST octet (not LAST+1) in a - sequence of urgent data. The description on page 56 (last - sentence) is correct. - - A TCP MUST support a sequence of urgent data of any length. - - A TCP MUST inform the application layer asynchronously - whenever it receives an Urgent pointer and there was - previously no pending urgent data, or whenever the Urgent - pointer advances in the data stream. There MUST be a way - for the application to learn how much urgent data remains to - be read from the connection, or at least to determine - whether or not more urgent data remains to be read. - - DISCUSSION: - Although the Urgent mechanism may be used for any - application, it is normally used to send "interrupt"- - type commands to a Telnet program (see "Using Telnet - Synch Sequence" section in [INTRO:1]). - - The asynchronous or "out-of-band" notification will - allow the application to go into "urgent mode", reading - data from the TCP connection. This allows control - commands to be sent to an application whose normal - input buffers are full of unprocessed data. - - IMPLEMENTATION: - The generic ERROR-REPORT() upcall described in Section - 4.2.4.1 is a possible mechanism for informing the - application of the arrival of urgent data. - - - - - -Internet Engineering Task Force [Page 84] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - 4.2.2.5 TCP Options: RFC-793 Section 3.1 - - A TCP MUST be able to receive a TCP option in any segment. - A TCP MUST ignore without error any TCP option it does not - implement, assuming that the option has a length field (all - TCP options defined in the future will have length fields). - TCP MUST be prepared to handle an illegal option length - (e.g., zero) without crashing; a suggested procedure is to - reset the connection and log the reason. - - 4.2.2.6 Maximum Segment Size Option: RFC-793 Section 3.1 - - TCP MUST implement both sending and receiving the Maximum - Segment Size option [TCP:4]. - - TCP SHOULD send an MSS (Maximum Segment Size) option in - every SYN segment when its receive MSS differs from the - default 536, and MAY send it always. - - If an MSS option is not received at connection setup, TCP - MUST assume a default send MSS of 536 (576-40) [TCP:4]. - - The maximum size of a segment that TCP really sends, the - "effective send MSS," MUST be the smaller of the send MSS - (which reflects the available reassembly buffer size at the - remote host) and the largest size permitted by the IP layer: - - Eff.snd.MSS = - - min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize - - where: - - * SendMSS is the MSS value received from the remote host, - or the default 536 if no MSS option is received. - - * MMS_S is the maximum size for a transport-layer message - that TCP may send. - - * TCPhdrsize is the size of the TCP header; this is - normally 20, but may be larger if TCP options are to be - sent. - - * IPoptionsize is the size of any IP options that TCP - will pass to the IP layer with the current message. - - - The MSS value to be sent in an MSS option must be less than - - - -Internet Engineering Task Force [Page 85] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - or equal to: - - MMS_R - 20 - - where MMS_R is the maximum size for a transport-layer - message that can be received (and reassembled). TCP obtains - MMS_R and MMS_S from the IP layer; see the generic call - GET_MAXSIZES in Section 3.4. - - DISCUSSION: - The choice of TCP segment size has a strong effect on - performance. Larger segments increase throughput by - amortizing header size and per-datagram processing - overhead over more data bytes; however, if the packet - is so large that it causes IP fragmentation, efficiency - drops sharply if any fragments are lost [IP:9]. - - Some TCP implementations send an MSS option only if the - destination host is on a non-connected network. - However, in general the TCP layer may not have the - appropriate information to make this decision, so it is - preferable to leave to the IP layer the task of - determining a suitable MTU for the Internet path. We - therefore recommend that TCP always send the option (if - not 536) and that the IP layer determine MMS_R as - specified in 3.3.3 and 3.4. A proposed IP-layer - mechanism to measure the MTU would then modify the IP - layer without changing TCP. - - 4.2.2.7 TCP Checksum: RFC-793 Section 3.1 - - Unlike the UDP checksum (see Section 4.1.3.4), the TCP - checksum is never optional. The sender MUST generate it and - the receiver MUST check it. - - 4.2.2.8 TCP Connection State Diagram: RFC-793 Section 3.2, - page 23 - - There are several problems with this diagram: - - (a) The arrow from SYN-SENT to SYN-RCVD should be labeled - with "snd SYN,ACK", to agree with the text on page 68 - and with Figure 8. - - (b) There could be an arrow from SYN-RCVD state to LISTEN - state, conditioned on receiving a RST after a passive - open (see text page 70). - - - - -Internet Engineering Task Force [Page 86] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - (c) It is possible to go directly from FIN-WAIT-1 to the - TIME-WAIT state (see page 75 of the spec). - - - 4.2.2.9 Initial Sequence Number Selection: RFC-793 Section - 3.3, page 27 - - A TCP MUST use the specified clock-driven selection of - initial sequence numbers. - - 4.2.2.10 Simultaneous Open Attempts: RFC-793 Section 3.4, page - 32 - - There is an error in Figure 8: the packet on line 7 should - be identical to the packet on line 5. - - A TCP MUST support simultaneous open attempts. - - DISCUSSION: - It sometimes surprises implementors that if two - applications attempt to simultaneously connect to each - other, only one connection is generated instead of two. - This was an intentional design decision; don't try to - "fix" it. - - 4.2.2.11 Recovery from Old Duplicate SYN: RFC-793 Section 3.4, - page 33 - - Note that a TCP implementation MUST keep track of whether a - connection has reached SYN_RCVD state as the result of a - passive OPEN or an active OPEN. - - 4.2.2.12 RST Segment: RFC-793 Section 3.4 - - A TCP SHOULD allow a received RST segment to include data. - - DISCUSSION - It has been suggested that a RST segment could contain - ASCII text that encoded and explained the cause of the - RST. No standard has yet been established for such - data. - - 4.2.2.13 Closing a Connection: RFC-793 Section 3.5 - - A TCP connection may terminate in two ways: (1) the normal - TCP close sequence using a FIN handshake, and (2) an "abort" - in which one or more RST segments are sent and the - connection state is immediately discarded. If a TCP - - - -Internet Engineering Task Force [Page 87] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - connection is closed by the remote site, the local - application MUST be informed whether it closed normally or - was aborted. - - The normal TCP close sequence delivers buffered data - reliably in both directions. Since the two directions of a - TCP connection are closed independently, it is possible for - a connection to be "half closed," i.e., closed in only one - direction, and a host is permitted to continue sending data - in the open direction on a half-closed connection. - - A host MAY implement a "half-duplex" TCP close sequence, so - that an application that has called CLOSE cannot continue to - read data from the connection. If such a host issues a - CLOSE call while received data is still pending in TCP, or - if new data is received after CLOSE is called, its TCP - SHOULD send a RST to show that data was lost. - - When a connection is closed actively, it MUST linger in - TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime). - However, it MAY accept a new SYN from the remote TCP to - reopen the connection directly from TIME-WAIT state, if it: - - (1) assigns its initial sequence number for the new - connection to be larger than the largest sequence - number it used on the previous connection incarnation, - and - - (2) returns to TIME-WAIT state if the SYN turns out to be - an old duplicate. - - - DISCUSSION: - TCP's full-duplex data-preserving close is a feature - that is not included in the analogous ISO transport - protocol TP4. - - Some systems have not implemented half-closed - connections, presumably because they do not fit into - the I/O model of their particular operating system. On - these systems, once an application has called CLOSE, it - can no longer read input data from the connection; this - is referred to as a "half-duplex" TCP close sequence. - - The graceful close algorithm of TCP requires that the - connection state remain defined on (at least) one end - of the connection, for a timeout period of 2xMSL, i.e., - 4 minutes. During this period, the (remote socket, - - - -Internet Engineering Task Force [Page 88] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - local socket) pair that defines the connection is busy - and cannot be reused. To shorten the time that a given - port pair is tied up, some TCPs allow a new SYN to be - accepted in TIME-WAIT state. - - 4.2.2.14 Data Communication: RFC-793 Section 3.7, page 40 - - Since RFC-793 was written, there has been extensive work on - TCP algorithms to achieve efficient data communication. - Later sections of the present document describe required and - recommended TCP algorithms to determine when to send data - (Section 4.2.3.4), when to send an acknowledgment (Section - 4.2.3.2), and when to update the window (Section 4.2.3.3). - - DISCUSSION: - One important performance issue is "Silly Window - Syndrome" or "SWS" [TCP:5], a stable pattern of small - incremental window movements resulting in extremely - poor TCP performance. Algorithms to avoid SWS are - described below for both the sending side (Section - 4.2.3.4) and the receiving side (Section 4.2.3.3). - - In brief, SWS is caused by the receiver advancing the - right window edge whenever it has any new buffer space - available to receive data and by the sender using any - incremental window, no matter how small, to send more - data [TCP:5]. The result can be a stable pattern of - sending tiny data segments, even though both sender and - receiver have a large total buffer space for the - connection. SWS can only occur during the transmission - of a large amount of data; if the connection goes - quiescent, the problem will disappear. It is caused by - typical straightforward implementation of window - management, but the sender and receiver algorithms - given below will avoid it. - - Another important TCP performance issue is that some - applications, especially remote login to character-at- - a-time hosts, tend to send streams of one-octet data - segments. To avoid deadlocks, every TCP SEND call from - such applications must be "pushed", either explicitly - by the application or else implicitly by TCP. The - result may be a stream of TCP segments that contain one - data octet each, which makes very inefficient use of - the Internet and contributes to Internet congestion. - The Nagle Algorithm described in Section 4.2.3.4 - provides a simple and effective solution to this - problem. It does have the effect of clumping - - - -Internet Engineering Task Force [Page 89] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - characters over Telnet connections; this may initially - surprise users accustomed to single-character echo, but - user acceptance has not been a problem. - - Note that the Nagle algorithm and the send SWS - avoidance algorithm play complementary roles in - improving performance. The Nagle algorithm discourages - sending tiny segments when the data to be sent - increases in small increments, while the SWS avoidance - algorithm discourages small segments resulting from the - right window edge advancing in small increments. - - A careless implementation can send two or more - acknowledgment segments per data segment received. For - example, suppose the receiver acknowledges every data - segment immediately. When the application program - subsequently consumes the data and increases the - available receive buffer space again, the receiver may - send a second acknowledgment segment to update the - window at the sender. The extreme case occurs with - single-character segments on TCP connections using the - Telnet protocol for remote login service. Some - implementations have been observed in which each - incoming 1-character segment generates three return - segments: (1) the acknowledgment, (2) a one byte - increase in the window, and (3) the echoed character, - respectively. - - 4.2.2.15 Retransmission Timeout: RFC-793 Section 3.7, page 41 - - The algorithm suggested in RFC-793 for calculating the - retransmission timeout is now known to be inadequate; see - Section 4.2.3.1 below. - - Recent work by Jacobson [TCP:7] on Internet congestion and - TCP retransmission stability has produced a transmission - algorithm combining "slow start" with "congestion - avoidance". A TCP MUST implement this algorithm. - - If a retransmitted packet is identical to the original - packet (which implies not only that the data boundaries have - not changed, but also that the window and acknowledgment - fields of the header have not changed), then the same IP - Identification field MAY be used (see Section 3.2.1.5). - - IMPLEMENTATION: - Some TCP implementors have chosen to "packetize" the - data stream, i.e., to pick segment boundaries when - - - -Internet Engineering Task Force [Page 90] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - segments are originally sent and to queue these - segments in a "retransmission queue" until they are - acknowledged. Another design (which may be simpler) is - to defer packetizing until each time data is - transmitted or retransmitted, so there will be no - segment retransmission queue. - - In an implementation with a segment retransmission - queue, TCP performance may be enhanced by repacketizing - the segments awaiting acknowledgment when the first - retransmission timeout occurs. That is, the - outstanding segments that fitted would be combined into - one maximum-sized segment, with a new IP Identification - value. The TCP would then retain this combined segment - in the retransmit queue until it was acknowledged. - However, if the first two segments in the - retransmission queue totalled more than one maximum- - sized segment, the TCP would retransmit only the first - segment using the original IP Identification field. - - 4.2.2.16 Managing the Window: RFC-793 Section 3.7, page 41 - - A TCP receiver SHOULD NOT shrink the window, i.e., move the - right window edge to the left. However, a sending TCP MUST - be robust against window shrinking, which may cause the - "useable window" (see Section 4.2.3.4) to become negative. - - If this happens, the sender SHOULD NOT send new data, but - SHOULD retransmit normally the old unacknowledged data - between SND.UNA and SND.UNA+SND.WND. The sender MAY also - retransmit old data beyond SND.UNA+SND.WND, but SHOULD NOT - time out the connection if data beyond the right window edge - is not acknowledged. If the window shrinks to zero, the TCP - MUST probe it in the standard way (see next Section). - - DISCUSSION: - Many TCP implementations become confused if the window - shrinks from the right after data has been sent into a - larger window. Note that TCP has a heuristic to select - the latest window update despite possible datagram - reordering; as a result, it may ignore a window update - with a smaller window than previously offered if - neither the sequence number nor the acknowledgment - number is increased. - - - - - - - -Internet Engineering Task Force [Page 91] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - 4.2.2.17 Probing Zero Windows: RFC-793 Section 3.7, page 42 - - Probing of zero (offered) windows MUST be supported. - - A TCP MAY keep its offered receive window closed - indefinitely. As long as the receiving TCP continues to - send acknowledgments in response to the probe segments, the - sending TCP MUST allow the connection to stay open. - - DISCUSSION: - It is extremely important to remember that ACK - (acknowledgment) segments that contain no data are not - reliably transmitted by TCP. If zero window probing is - not supported, a connection may hang forever when an - ACK segment that re-opens the window is lost. - - The delay in opening a zero window generally occurs - when the receiving application stops taking data from - its TCP. For example, consider a printer daemon - application, stopped because the printer ran out of - paper. - - The transmitting host SHOULD send the first zero-window - probe when a zero window has existed for the retransmission - timeout period (see Section 4.2.2.15), and SHOULD increase - exponentially the interval between successive probes. - - DISCUSSION: - This procedure minimizes delay if the zero-window - condition is due to a lost ACK segment containing a - window-opening update. Exponential backoff is - recommended, possibly with some maximum interval not - specified here. This procedure is similar to that of - the retransmission algorithm, and it may be possible to - combine the two procedures in the implementation. - - 4.2.2.18 Passive OPEN Calls: RFC-793 Section 3.8 - - Every passive OPEN call either creates a new connection - record in LISTEN state, or it returns an error; it MUST NOT - affect any previously created connection record. - - A TCP that supports multiple concurrent users MUST provide - an OPEN call that will functionally allow an application to - LISTEN on a port while a connection block with the same - local port is in SYN-SENT or SYN-RECEIVED state. - - DISCUSSION: - - - -Internet Engineering Task Force [Page 92] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - Some applications (e.g., SMTP servers) may need to - handle multiple connection attempts at about the same - time. The probability of a connection attempt failing - is reduced by giving the application some means of - listening for a new connection at the same time that an - earlier connection attempt is going through the three- - way handshake. - - IMPLEMENTATION: - Acceptable implementations of concurrent opens may - permit multiple passive OPEN calls, or they may allow - "cloning" of LISTEN-state connections from a single - passive OPEN call. - - 4.2.2.19 Time to Live: RFC-793 Section 3.9, page 52 - - RFC-793 specified that TCP was to request the IP layer to - send TCP segments with TTL = 60. This is obsolete; the TTL - value used to send TCP segments MUST be configurable. See - Section 3.2.1.7 for discussion. - - 4.2.2.20 Event Processing: RFC-793 Section 3.9 - - While it is not strictly required, a TCP SHOULD be capable - of queueing out-of-order TCP segments. Change the "may" in - the last sentence of the first paragraph on page 70 to - "should". - - DISCUSSION: - Some small-host implementations have omitted segment - queueing because of limited buffer space. This - omission may be expected to adversely affect TCP - throughput, since loss of a single segment causes all - later segments to appear to be "out of sequence". - - In general, the processing of received segments MUST be - implemented to aggregate ACK segments whenever possible. - For example, if the TCP is processing a series of queued - segments, it MUST process them all before sending any ACK - segments. - - Here are some detailed error corrections and notes on the - Event Processing section of RFC-793. - - (a) CLOSE Call, CLOSE-WAIT state, p. 61: enter LAST-ACK - state, not CLOSING. - - (b) LISTEN state, check for SYN (pp. 65, 66): With a SYN - - - -Internet Engineering Task Force [Page 93] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - bit, if the security/compartment or the precedence is - wrong for the segment, a reset is sent. The wrong form - of reset is shown in the text; it should be: - - <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> - - - (c) SYN-SENT state, Check for SYN, p. 68: When the - connection enters ESTABLISHED state, the following - variables must be set: - SND.WND <- SEG.WND - SND.WL1 <- SEG.SEQ - SND.WL2 <- SEG.ACK - - - (d) Check security and precedence, p. 71: The first heading - "ESTABLISHED STATE" should really be a list of all - states other than SYN-RECEIVED: ESTABLISHED, FIN-WAIT- - 1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, and - TIME-WAIT. - - (e) Check SYN bit, p. 71: "In SYN-RECEIVED state and if - the connection was initiated with a passive OPEN, then - return this connection to the LISTEN state and return. - Otherwise...". - - (f) Check ACK field, SYN-RECEIVED state, p. 72: When the - connection enters ESTABLISHED state, the variables - listed in (c) must be set. - - (g) Check ACK field, ESTABLISHED state, p. 72: The ACK is a - duplicate if SEG.ACK =< SND.UNA (the = was omitted). - Similarly, the window should be updated if: SND.UNA =< - SEG.ACK =< SND.NXT. - - (h) USER TIMEOUT, p. 77: - - It would be better to notify the application of the - timeout rather than letting TCP force the connection - closed. However, see also Section 4.2.3.5. - - - 4.2.2.21 Acknowledging Queued Segments: RFC-793 Section 3.9 - - A TCP MAY send an ACK segment acknowledging RCV.NXT when a - valid segment arrives that is in the window but not at the - left window edge. - - - - -Internet Engineering Task Force [Page 94] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - DISCUSSION: - RFC-793 (see page 74) was ambiguous about whether or - not an ACK segment should be sent when an out-of-order - segment was received, i.e., when SEG.SEQ was unequal to - RCV.NXT. - - One reason for ACKing out-of-order segments might be to - support an experimental algorithm known as "fast - retransmit". With this algorithm, the sender uses the - "redundant" ACK's to deduce that a segment has been - lost before the retransmission timer has expired. It - counts the number of times an ACK has been received - with the same value of SEG.ACK and with the same right - window edge. If more than a threshold number of such - ACK's is received, then the segment containing the - octets starting at SEG.ACK is assumed to have been lost - and is retransmitted, without awaiting a timeout. The - threshold is chosen to compensate for the maximum - likely segment reordering in the Internet. There is - not yet enough experience with the fast retransmit - algorithm to determine how useful it is. - - 4.2.3 SPECIFIC ISSUES - - 4.2.3.1 Retransmission Timeout Calculation - - A host TCP MUST implement Karn's algorithm and Jacobson's - algorithm for computing the retransmission timeout ("RTO"). - - o Jacobson's algorithm for computing the smoothed round- - trip ("RTT") time incorporates a simple measure of the - variance [TCP:7]. - - o Karn's algorithm for selecting RTT measurements ensures - that ambiguous round-trip times will not corrupt the - calculation of the smoothed round-trip time [TCP:6]. - - This implementation also MUST include "exponential backoff" - for successive RTO values for the same segment. - Retransmission of SYN segments SHOULD use the same algorithm - as data segments. - - DISCUSSION: - There were two known problems with the RTO calculations - specified in RFC-793. First, the accurate measurement - of RTTs is difficult when there are retransmissions. - Second, the algorithm to compute the smoothed round- - trip time is inadequate [TCP:7], because it incorrectly - - - -Internet Engineering Task Force [Page 95] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - assumed that the variance in RTT values would be small - and constant. These problems were solved by Karn's and - Jacobson's algorithm, respectively. - - The performance increase resulting from the use of - these improvements varies from noticeable to dramatic. - Jacobson's algorithm for incorporating the measured RTT - variance is especially important on a low-speed link, - where the natural variation of packet sizes causes a - large variation in RTT. One vendor found link - utilization on a 9.6kb line went from 10% to 90% as a - result of implementing Jacobson's variance algorithm in - TCP. - - The following values SHOULD be used to initialize the - estimation parameters for a new connection: - - (a) RTT = 0 seconds. - - (b) RTO = 3 seconds. (The smoothed variance is to be - initialized to the value that will result in this RTO). - - The recommended upper and lower bounds on the RTO are known - to be inadequate on large internets. The lower bound SHOULD - be measured in fractions of a second (to accommodate high - speed LANs) and the upper bound should be 2*MSL, i.e., 240 - seconds. - - DISCUSSION: - Experience has shown that these initialization values - are reasonable, and that in any case the Karn and - Jacobson algorithms make TCP behavior reasonably - insensitive to the initial parameter choices. - - 4.2.3.2 When to Send an ACK Segment - - A host that is receiving a stream of TCP data segments can - increase efficiency in both the Internet and the hosts by - sending fewer than one ACK (acknowledgment) segment per data - segment received; this is known as a "delayed ACK" [TCP:5]. - - A TCP SHOULD implement a delayed ACK, but an ACK should not - be excessively delayed; in particular, the delay MUST be - less than 0.5 seconds, and in a stream of full-sized - segments there SHOULD be an ACK for at least every second - segment. - - DISCUSSION: - - - -Internet Engineering Task Force [Page 96] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - A delayed ACK gives the application an opportunity to - update the window and perhaps to send an immediate - response. In particular, in the case of character-mode - remote login, a delayed ACK can reduce the number of - segments sent by the server by a factor of 3 (ACK, - window update, and echo character all combined in one - segment). - - In addition, on some large multi-user hosts, a delayed - ACK can substantially reduce protocol processing - overhead by reducing the total number of packets to be - processed [TCP:5]. However, excessive delays on ACK's - can disturb the round-trip timing and packet "clocking" - algorithms [TCP:7]. - - 4.2.3.3 When to Send a Window Update - - A TCP MUST include a SWS avoidance algorithm in the receiver - [TCP:5]. - - IMPLEMENTATION: - The receiver's SWS avoidance algorithm determines when - the right window edge may be advanced; this is - customarily known as "updating the window". This - algorithm combines with the delayed ACK algorithm (see - Section 4.2.3.2) to determine when an ACK segment - containing the current window will really be sent to - the receiver. We use the notation of RFC-793; see - Figures 4 and 5 in that document. - - The solution to receiver SWS is to avoid advancing the - right window edge RCV.NXT+RCV.WND in small increments, - even if data is received from the network in small - segments. - - Suppose the total receive buffer space is RCV.BUFF. At - any given moment, RCV.USER octets of this total may be - tied up with data that has been received and - acknowledged but which the user process has not yet - consumed. When the connection is quiescent, RCV.WND = - RCV.BUFF and RCV.USER = 0. - - Keeping the right window edge fixed as data arrives and - is acknowledged requires that the receiver offer less - than its full buffer space, i.e., the receiver must - specify a RCV.WND that keeps RCV.NXT+RCV.WND constant - as RCV.NXT increases. Thus, the total buffer space - RCV.BUFF is generally divided into three parts: - - - -Internet Engineering Task Force [Page 97] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - - |<------- RCV.BUFF ---------------->| - 1 2 3 - ----|---------|------------------|------|---- - RCV.NXT ^ - (Fixed) - - 1 - RCV.USER = data received but not yet consumed; - 2 - RCV.WND = space advertised to sender; - 3 - Reduction = space available but not yet - advertised. - - - The suggested SWS avoidance algorithm for the receiver - is to keep RCV.NXT+RCV.WND fixed until the reduction - satisfies: - - RCV.BUFF - RCV.USER - RCV.WND >= - - min( Fr * RCV.BUFF, Eff.snd.MSS ) - - where Fr is a fraction whose recommended value is 1/2, - and Eff.snd.MSS is the effective send MSS for the - connection (see Section 4.2.2.6). When the inequality - is satisfied, RCV.WND is set to RCV.BUFF-RCV.USER. - - Note that the general effect of this algorithm is to - advance RCV.WND in increments of Eff.snd.MSS (for - realistic receive buffers: Eff.snd.MSS < RCV.BUFF/2). - Note also that the receiver must use its own - Eff.snd.MSS, assuming it is the same as the sender's. - - 4.2.3.4 When to Send Data - - A TCP MUST include a SWS avoidance algorithm in the sender. - - A TCP SHOULD implement the Nagle Algorithm [TCP:9] to - coalesce short segments. However, there MUST be a way for - an application to disable the Nagle algorithm on an - individual connection. In all cases, sending data is also - subject to the limitation imposed by the Slow Start - algorithm (Section 4.2.2.15). - - DISCUSSION: - The Nagle algorithm is generally as follows: - - If there is unacknowledged data (i.e., SND.NXT > - SND.UNA), then the sending TCP buffers all user - - - -Internet Engineering Task Force [Page 98] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - data (regardless of the PSH bit), until the - outstanding data has been acknowledged or until - the TCP can send a full-sized segment (Eff.snd.MSS - bytes; see Section 4.2.2.6). - - Some applications (e.g., real-time display window - updates) require that the Nagle algorithm be turned - off, so small data segments can be streamed out at the - maximum rate. - - IMPLEMENTATION: - The sender's SWS avoidance algorithm is more difficult - than the receivers's, because the sender does not know - (directly) the receiver's total buffer space RCV.BUFF. - An approach which has been found to work well is for - the sender to calculate Max(SND.WND), the maximum send - window it has seen so far on the connection, and to use - this value as an estimate of RCV.BUFF. Unfortunately, - this can only be an estimate; the receiver may at any - time reduce the size of RCV.BUFF. To avoid a resulting - deadlock, it is necessary to have a timeout to force - transmission of data, overriding the SWS avoidance - algorithm. In practice, this timeout should seldom - occur. - - The "useable window" [TCP:5] is: - - U = SND.UNA + SND.WND - SND.NXT - - i.e., the offered window less the amount of data sent - but not acknowledged. If D is the amount of data - queued in the sending TCP but not yet sent, then the - following set of rules is recommended. - - Send data: - - (1) if a maximum-sized segment can be sent, i.e, if: - - min(D,U) >= Eff.snd.MSS; - - - (2) or if the data is pushed and all queued data can - be sent now, i.e., if: - - [SND.NXT = SND.UNA and] PUSHED and D <= U - - (the bracketed condition is imposed by the Nagle - algorithm); - - - -Internet Engineering Task Force [Page 99] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - (3) or if at least a fraction Fs of the maximum window - can be sent, i.e., if: - - [SND.NXT = SND.UNA and] - - min(D.U) >= Fs * Max(SND.WND); - - - (4) or if data is PUSHed and the override timeout - occurs. - - Here Fs is a fraction whose recommended value is 1/2. - The override timeout should be in the range 0.1 - 1.0 - seconds. It may be convenient to combine this timer - with the timer used to probe zero windows (Section - 4.2.2.17). - - Finally, note that the SWS avoidance algorithm just - specified is to be used instead of the sender-side - algorithm contained in [TCP:5]. - - 4.2.3.5 TCP Connection Failures - - Excessive retransmission of the same segment by TCP - indicates some failure of the remote host or the Internet - path. This failure may be of short or long duration. The - following procedure MUST be used to handle excessive - retransmissions of data segments [IP:11]: - - (a) There are two thresholds R1 and R2 measuring the amount - of retransmission that has occurred for the same - segment. R1 and R2 might be measured in time units or - as a count of retransmissions. - - (b) When the number of transmissions of the same segment - reaches or exceeds threshold R1, pass negative advice - (see Section 3.3.1.4) to the IP layer, to trigger - dead-gateway diagnosis. - - (c) When the number of transmissions of the same segment - reaches a threshold R2 greater than R1, close the - connection. - - (d) An application MUST be able to set the value for R2 for - a particular connection. For example, an interactive - application might set R2 to "infinity," giving the user - control over when to disconnect. - - - - -Internet Engineering Task Force [Page 100] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - (d) TCP SHOULD inform the application of the delivery - problem (unless such information has been disabled by - the application; see Section 4.2.4.1), when R1 is - reached and before R2. This will allow a remote login - (User Telnet) application program to inform the user, - for example. - - The value of R1 SHOULD correspond to at least 3 - retransmissions, at the current RTO. The value of R2 SHOULD - correspond to at least 100 seconds. - - An attempt to open a TCP connection could fail with - excessive retransmissions of the SYN segment or by receipt - of a RST segment or an ICMP Port Unreachable. SYN - retransmissions MUST be handled in the general way just - described for data retransmissions, including notification - of the application layer. - - However, the values of R1 and R2 may be different for SYN - and data segments. In particular, R2 for a SYN segment MUST - be set large enough to provide retransmission of the segment - for at least 3 minutes. The application can close the - connection (i.e., give up on the open attempt) sooner, of - course. - - DISCUSSION: - Some Internet paths have significant setup times, and - the number of such paths is likely to increase in the - future. - - 4.2.3.6 TCP Keep-Alives - - Implementors MAY include "keep-alives" in their TCP - implementations, although this practice is not universally - accepted. If keep-alives are included, the application MUST - be able to turn them on or off for each TCP connection, and - they MUST default to off. - - Keep-alive packets MUST only be sent when no data or - acknowledgement packets have been received for the - connection within an interval. This interval MUST be - configurable and MUST default to no less than two hours. - - It is extremely important to remember that ACK segments that - contain no data are not reliably transmitted by TCP. - Consequently, if a keep-alive mechanism is implemented it - MUST NOT interpret failure to respond to any specific probe - as a dead connection. - - - -Internet Engineering Task Force [Page 101] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - An implementation SHOULD send a keep-alive segment with no - data; however, it MAY be configurable to send a keep-alive - segment containing one garbage octet, for compatibility with - erroneous TCP implementations. - - DISCUSSION: - A "keep-alive" mechanism periodically probes the other - end of a connection when the connection is otherwise - idle, even when there is no data to be sent. The TCP - specification does not include a keep-alive mechanism - because it could: (1) cause perfectly good connections - to break during transient Internet failures; (2) - consume unnecessary bandwidth ("if no one is using the - connection, who cares if it is still good?"); and (3) - cost money for an Internet path that charges for - packets. - - Some TCP implementations, however, have included a - keep-alive mechanism. To confirm that an idle - connection is still active, these implementations send - a probe segment designed to elicit a response from the - peer TCP. Such a segment generally contains SEG.SEQ = - SND.NXT-1 and may or may not contain one garbage octet - of data. Note that on a quiet connection SND.NXT = - RCV.NXT, so that this SEG.SEQ will be outside the - window. Therefore, the probe causes the receiver to - return an acknowledgment segment, confirming that the - connection is still live. If the peer has dropped the - connection due to a network partition or a crash, it - will respond with a RST instead of an acknowledgment - segment. - - Unfortunately, some misbehaved TCP implementations fail - to respond to a segment with SEG.SEQ = SND.NXT-1 unless - the segment contains data. Alternatively, an - implementation could determine whether a peer responded - correctly to keep-alive packets with no garbage data - octet. - - A TCP keep-alive mechanism should only be invoked in - server applications that might otherwise hang - indefinitely and consume resources unnecessarily if a - client crashes or aborts a connection during a network - failure. - - - - - - - -Internet Engineering Task Force [Page 102] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - 4.2.3.7 TCP Multihoming - - If an application on a multihomed host does not specify the - local IP address when actively opening a TCP connection, - then the TCP MUST ask the IP layer to select a local IP - address before sending the (first) SYN. See the function - GET_SRCADDR() in Section 3.4. - - At all other times, a previous segment has either been sent - or received on this connection, and TCP MUST use the same - local address is used that was used in those previous - segments. - - 4.2.3.8 IP Options - - When received options are passed up to TCP from the IP - layer, TCP MUST ignore options that it does not understand. - - A TCP MAY support the Time Stamp and Record Route options. - - An application MUST be able to specify a source route when - it actively opens a TCP connection, and this MUST take - precedence over a source route received in a datagram. - - When a TCP connection is OPENed passively and a packet - arrives with a completed IP Source Route option (containing - a return route), TCP MUST save the return route and use it - for all segments sent on this connection. If a different - source route arrives in a later segment, the later - definition SHOULD override the earlier one. - - 4.2.3.9 ICMP Messages - - TCP MUST act on an ICMP error message passed up from the IP - layer, directing it to the connection that created the - error. The necessary demultiplexing information can be - found in the IP header contained within the ICMP message. - - o Source Quench - - TCP MUST react to a Source Quench by slowing - transmission on the connection. The RECOMMENDED - procedure is for a Source Quench to trigger a "slow - start," as if a retransmission timeout had occurred. - - o Destination Unreachable -- codes 0, 1, 5 - - Since these Unreachable messages indicate soft error - - - -Internet Engineering Task Force [Page 103] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - conditions, TCP MUST NOT abort the connection, and it - SHOULD make the information available to the - application. - - DISCUSSION: - TCP could report the soft error condition directly - to the application layer with an upcall to the - ERROR_REPORT routine, or it could merely note the - message and report it to the application only when - and if the TCP connection times out. - - o Destination Unreachable -- codes 2-4 - - These are hard error conditions, so TCP SHOULD abort - the connection. - - o Time Exceeded -- codes 0, 1 - - This should be handled the same way as Destination - Unreachable codes 0, 1, 5 (see above). - - o Parameter Problem - - This should be handled the same way as Destination - Unreachable codes 0, 1, 5 (see above). - - - 4.2.3.10 Remote Address Validation - - A TCP implementation MUST reject as an error a local OPEN - call for an invalid remote IP address (e.g., a broadcast or - multicast address). - - An incoming SYN with an invalid source address must be - ignored either by TCP or by the IP layer (see Section - 3.2.1.3). - - A TCP implementation MUST silently discard an incoming SYN - segment that is addressed to a broadcast or multicast - address. - - 4.2.3.11 TCP Traffic Patterns - - IMPLEMENTATION: - The TCP protocol specification [TCP:1] gives the - implementor much freedom in designing the algorithms - that control the message flow over the connection -- - packetizing, managing the window, sending - - - -Internet Engineering Task Force [Page 104] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - acknowledgments, etc. These design decisions are - difficult because a TCP must adapt to a wide range of - traffic patterns. Experience has shown that a TCP - implementor needs to verify the design on two extreme - traffic patterns: - - o Single-character Segments - - Even if the sender is using the Nagle Algorithm, - when a TCP connection carries remote login traffic - across a low-delay LAN the receiver will generally - get a stream of single-character segments. If - remote terminal echo mode is in effect, the - receiver's system will generally echo each - character as it is received. - - o Bulk Transfer - - When TCP is used for bulk transfer, the data - stream should be made up (almost) entirely of - segments of the size of the effective MSS. - Although TCP uses a sequence number space with - byte (octet) granularity, in bulk-transfer mode - its operation should be as if TCP used a sequence - space that counted only segments. - - Experience has furthermore shown that a single TCP can - effectively and efficiently handle these two extremes. - - The most important tool for verifying a new TCP - implementation is a packet trace program. There is a - large volume of experience showing the importance of - tracing a variety of traffic patterns with other TCP - implementations and studying the results carefully. - - - 4.2.3.12 Efficiency - - IMPLEMENTATION: - Extensive experience has led to the following - suggestions for efficient implementation of TCP: - - (a) Don't Copy Data - - In bulk data transfer, the primary CPU-intensive - tasks are copying data from one place to another - and checksumming the data. It is vital to - minimize the number of copies of TCP data. Since - - - -Internet Engineering Task Force [Page 105] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - the ultimate speed limitation may be fetching data - across the memory bus, it may be useful to combine - the copy with checksumming, doing both with a - single memory fetch. - - (b) Hand-Craft the Checksum Routine - - A good TCP checksumming routine is typically two - to five times faster than a simple and direct - implementation of the definition. Great care and - clever coding are often required and advisable to - make the checksumming code "blazing fast". See - [TCP:10]. - - (c) Code for the Common Case - - TCP protocol processing can be complicated, but - for most segments there are only a few simple - decisions to be made. Per-segment processing will - be greatly speeded up by coding the main line to - minimize the number of decisions in the most - common case. - - - 4.2.4 TCP/APPLICATION LAYER INTERFACE - - 4.2.4.1 Asynchronous Reports - - There MUST be a mechanism for reporting soft TCP error - conditions to the application. Generically, we assume this - takes the form of an application-supplied ERROR_REPORT - routine that may be upcalled [INTRO:7] asynchronously from - the transport layer: - - ERROR_REPORT(local connection name, reason, subreason) - - The precise encoding of the reason and subreason parameters - is not specified here. However, the conditions that are - reported asynchronously to the application MUST include: - - * ICMP error message arrived (see 4.2.3.9) - - * Excessive retransmissions (see 4.2.3.5) - - * Urgent pointer advance (see 4.2.2.4). - - However, an application program that does not want to - receive such ERROR_REPORT calls SHOULD be able to - - - -Internet Engineering Task Force [Page 106] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - effectively disable these calls. - - DISCUSSION: - These error reports generally reflect soft errors that - can be ignored without harm by many applications. It - has been suggested that these error report calls should - default to "disabled," but this is not required. - - 4.2.4.2 Type-of-Service - - The application layer MUST be able to specify the Type-of- - Service (TOS) for segments that are sent on a connection. - It not required, but the application SHOULD be able to - change the TOS during the connection lifetime. TCP SHOULD - pass the current TOS value without change to the IP layer, - when it sends segments on the connection. - - The TOS will be specified independently in each direction on - the connection, so that the receiver application will - specify the TOS used for ACK segments. - - TCP MAY pass the most recently received TOS up to the - application. - - DISCUSSION - Some applications (e.g., SMTP) change the nature of - their communication during the lifetime of a - connection, and therefore would like to change the TOS - specification. - - Note also that the OPEN call specified in RFC-793 - includes a parameter ("options") in which the caller - can specify IP options such as source route, record - route, or timestamp. - - 4.2.4.3 Flush Call - - Some TCP implementations have included a FLUSH call, which - will empty the TCP send queue of any data for which the user - has issued SEND calls but which is still to the right of the - current send window. That is, it flushes as much queued - send data as possible without losing sequence number - synchronization. This is useful for implementing the "abort - output" function of Telnet. - - - - - - - -Internet Engineering Task Force [Page 107] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - 4.2.4.4 Multihoming - - The user interface outlined in sections 2.7 and 3.8 of RFC- - 793 needs to be extended for multihoming. The OPEN call - MUST have an optional parameter: - - OPEN( ... [local IP address,] ... ) - - to allow the specification of the local IP address. - - DISCUSSION: - Some TCP-based applications need to specify the local - IP address to be used to open a particular connection; - FTP is an example. - - IMPLEMENTATION: - A passive OPEN call with a specified "local IP address" - parameter will await an incoming connection request to - that address. If the parameter is unspecified, a - passive OPEN will await an incoming connection request - to any local IP address, and then bind the local IP - address of the connection to the particular address - that is used. - - For an active OPEN call, a specified "local IP address" - parameter will be used for opening the connection. If - the parameter is unspecified, the networking software - will choose an appropriate local IP address (see - Section 3.3.4.2) for the connection - - 4.2.5 TCP REQUIREMENT SUMMARY - - | | | | |S| | - | | | | |H| |F - | | | | |O|M|o - | | |S| |U|U|o - | | |H| |L|S|t - | |M|O| |D|T|n - | |U|U|M| | |o - | |S|L|A|N|N|t - | |T|D|Y|O|O|t -FEATURE |SECTION | | | |T|T|e --------------------------------------------------|--------|-|-|-|-|-|-- - | | | | | | | -Push flag | | | | | | | - Aggregate or queue un-pushed data |4.2.2.2 | | |x| | | - Sender collapse successive PSH flags |4.2.2.2 | |x| | | | - SEND call can specify PUSH |4.2.2.2 | | |x| | | - - - -Internet Engineering Task Force [Page 108] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - If cannot: sender buffer indefinitely |4.2.2.2 | | | | |x| - If cannot: PSH last segment |4.2.2.2 |x| | | | | - Notify receiving ALP of PSH |4.2.2.2 | | |x| | |1 - Send max size segment when possible |4.2.2.2 | |x| | | | - | | | | | | | -Window | | | | | | | - Treat as unsigned number |4.2.2.3 |x| | | | | - Handle as 32-bit number |4.2.2.3 | |x| | | | - Shrink window from right |4.2.2.16| | | |x| | - Robust against shrinking window |4.2.2.16|x| | | | | - Receiver's window closed indefinitely |4.2.2.17| | |x| | | - Sender probe zero window |4.2.2.17|x| | | | | - First probe after RTO |4.2.2.17| |x| | | | - Exponential backoff |4.2.2.17| |x| | | | - Allow window stay zero indefinitely |4.2.2.17|x| | | | | - Sender timeout OK conn with zero wind |4.2.2.17| | | | |x| - | | | | | | | -Urgent Data | | | | | | | - Pointer points to last octet |4.2.2.4 |x| | | | | - Arbitrary length urgent data sequence |4.2.2.4 |x| | | | | - Inform ALP asynchronously of urgent data |4.2.2.4 |x| | | | |1 - ALP can learn if/how much urgent data Q'd |4.2.2.4 |x| | | | |1 - | | | | | | | -TCP Options | | | | | | | - Receive TCP option in any segment |4.2.2.5 |x| | | | | - Ignore unsupported options |4.2.2.5 |x| | | | | - Cope with illegal option length |4.2.2.5 |x| | | | | - Implement sending & receiving MSS option |4.2.2.6 |x| | | | | - Send MSS option unless 536 |4.2.2.6 | |x| | | | - Send MSS option always |4.2.2.6 | | |x| | | - Send-MSS default is 536 |4.2.2.6 |x| | | | | - Calculate effective send seg size |4.2.2.6 |x| | | | | - | | | | | | | -TCP Checksums | | | | | | | - Sender compute checksum |4.2.2.7 |x| | | | | - Receiver check checksum |4.2.2.7 |x| | | | | - | | | | | | | -Use clock-driven ISN selection |4.2.2.9 |x| | | | | - | | | | | | | -Opening Connections | | | | | | | - Support simultaneous open attempts |4.2.2.10|x| | | | | - SYN-RCVD remembers last state |4.2.2.11|x| | | | | - Passive Open call interfere with others |4.2.2.18| | | | |x| - Function: simultan. LISTENs for same port |4.2.2.18|x| | | | | - Ask IP for src address for SYN if necc. |4.2.3.7 |x| | | | | - Otherwise, use local addr of conn. |4.2.3.7 |x| | | | | - OPEN to broadcast/multicast IP Address |4.2.3.14| | | | |x| - Silently discard seg to bcast/mcast addr |4.2.3.14|x| | | | | - - - -Internet Engineering Task Force [Page 109] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - | | | | | | | -Closing Connections | | | | | | | - RST can contain data |4.2.2.12| |x| | | | - Inform application of aborted conn |4.2.2.13|x| | | | | - Half-duplex close connections |4.2.2.13| | |x| | | - Send RST to indicate data lost |4.2.2.13| |x| | | | - In TIME-WAIT state for 2xMSL seconds |4.2.2.13|x| | | | | - Accept SYN from TIME-WAIT state |4.2.2.13| | |x| | | - | | | | | | | -Retransmissions | | | | | | | - Jacobson Slow Start algorithm |4.2.2.15|x| | | | | - Jacobson Congestion-Avoidance algorithm |4.2.2.15|x| | | | | - Retransmit with same IP ident |4.2.2.15| | |x| | | - Karn's algorithm |4.2.3.1 |x| | | | | - Jacobson's RTO estimation alg. |4.2.3.1 |x| | | | | - Exponential backoff |4.2.3.1 |x| | | | | - SYN RTO calc same as data |4.2.3.1 | |x| | | | - Recommended initial values and bounds |4.2.3.1 | |x| | | | - | | | | | | | -Generating ACK's: | | | | | | | - Queue out-of-order segments |4.2.2.20| |x| | | | - Process all Q'd before send ACK |4.2.2.20|x| | | | | - Send ACK for out-of-order segment |4.2.2.21| | |x| | | - Delayed ACK's |4.2.3.2 | |x| | | | - Delay < 0.5 seconds |4.2.3.2 |x| | | | | - Every 2nd full-sized segment ACK'd |4.2.3.2 |x| | | | | - Receiver SWS-Avoidance Algorithm |4.2.3.3 |x| | | | | - | | | | | | | -Sending data | | | | | | | - Configurable TTL |4.2.2.19|x| | | | | - Sender SWS-Avoidance Algorithm |4.2.3.4 |x| | | | | - Nagle algorithm |4.2.3.4 | |x| | | | - Application can disable Nagle algorithm |4.2.3.4 |x| | | | | - | | | | | | | -Connection Failures: | | | | | | | - Negative advice to IP on R1 retxs |4.2.3.5 |x| | | | | - Close connection on R2 retxs |4.2.3.5 |x| | | | | - ALP can set R2 |4.2.3.5 |x| | | | |1 - Inform ALP of R1<=retxs<R2 |4.2.3.5 | |x| | | |1 - Recommended values for R1, R2 |4.2.3.5 | |x| | | | - Same mechanism for SYNs |4.2.3.5 |x| | | | | - R2 at least 3 minutes for SYN |4.2.3.5 |x| | | | | - | | | | | | | -Send Keep-alive Packets: |4.2.3.6 | | |x| | | - - Application can request |4.2.3.6 |x| | | | | - - Default is "off" |4.2.3.6 |x| | | | | - - Only send if idle for interval |4.2.3.6 |x| | | | | - - Interval configurable |4.2.3.6 |x| | | | | - - - -Internet Engineering Task Force [Page 110] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - - Default at least 2 hrs. |4.2.3.6 |x| | | | | - - Tolerant of lost ACK's |4.2.3.6 |x| | | | | - | | | | | | | -IP Options | | | | | | | - Ignore options TCP doesn't understand |4.2.3.8 |x| | | | | - Time Stamp support |4.2.3.8 | | |x| | | - Record Route support |4.2.3.8 | | |x| | | - Source Route: | | | | | | | - ALP can specify |4.2.3.8 |x| | | | |1 - Overrides src rt in datagram |4.2.3.8 |x| | | | | - Build return route from src rt |4.2.3.8 |x| | | | | - Later src route overrides |4.2.3.8 | |x| | | | - | | | | | | | -Receiving ICMP Messages from IP |4.2.3.9 |x| | | | | - Dest. Unreach (0,1,5) => inform ALP |4.2.3.9 | |x| | | | - Dest. Unreach (0,1,5) => abort conn |4.2.3.9 | | | | |x| - Dest. Unreach (2-4) => abort conn |4.2.3.9 | |x| | | | - Source Quench => slow start |4.2.3.9 | |x| | | | - Time Exceeded => tell ALP, don't abort |4.2.3.9 | |x| | | | - Param Problem => tell ALP, don't abort |4.2.3.9 | |x| | | | - | | | | | | | -Address Validation | | | | | | | - Reject OPEN call to invalid IP address |4.2.3.10|x| | | | | - Reject SYN from invalid IP address |4.2.3.10|x| | | | | - Silently discard SYN to bcast/mcast addr |4.2.3.10|x| | | | | - | | | | | | | -TCP/ALP Interface Services | | | | | | | - Error Report mechanism |4.2.4.1 |x| | | | | - ALP can disable Error Report Routine |4.2.4.1 | |x| | | | - ALP can specify TOS for sending |4.2.4.2 |x| | | | | - Passed unchanged to IP |4.2.4.2 | |x| | | | - ALP can change TOS during connection |4.2.4.2 | |x| | | | - Pass received TOS up to ALP |4.2.4.2 | | |x| | | - FLUSH call |4.2.4.3 | | |x| | | - Optional local IP addr parm. in OPEN |4.2.4.4 |x| | | | | --------------------------------------------------|--------|-|-|-|-|-|-- --------------------------------------------------|--------|-|-|-|-|-|-- - -FOOTNOTES: - -(1) "ALP" means Application-Layer program. - - - - - - - - - - -Internet Engineering Task Force [Page 111] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - -5. REFERENCES - -INTRODUCTORY REFERENCES - - -[INTRO:1] "Requirements for Internet Hosts -- Application and Support," - IETF Host Requirements Working Group, R. Braden, Ed., RFC-1123, - October 1989. - -[INTRO:2] "Requirements for Internet Gateways," R. Braden and J. - Postel, RFC-1009, June 1987. - -[INTRO:3] "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006, - (three volumes), SRI International, December 1985. - -[INTRO:4] "Official Internet Protocols," J. Reynolds and J. Postel, - RFC-1011, May 1987. - - This document is republished periodically with new RFC numbers; the - latest version must be used. - -[INTRO:5] "Protocol Document Order Information," O. Jacobsen and J. - Postel, RFC-980, March 1986. - -[INTRO:6] "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010, May - 1987. - - This document is republished periodically with new RFC numbers; the - latest version must be used. - -[INTRO:7] "Modularity and Efficiency in Protocol Implementations," D. - Clark, RFC-817, July 1982. - -[INTRO:8] "The Structuring of Systems Using Upcalls," D. Clark, 10th ACM - SOSP, Orcas Island, Washington, December 1985. - - -Secondary References: - - -[INTRO:9] "A Protocol for Packet Network Intercommunication," V. Cerf - and R. Kahn, IEEE Transactions on Communication, May 1974. - -[INTRO:10] "The ARPA Internet Protocol," J. Postel, C. Sunshine, and D. - Cohen, Computer Networks, Vol. 5, No. 4, July 1981. - -[INTRO:11] "The DARPA Internet Protocol Suite," B. Leiner, J. Postel, - R. Cole and D. Mills, Proceedings INFOCOM 85, IEEE, Washington DC, - - - -Internet Engineering Task Force [Page 112] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - March 1985. Also in: IEEE Communications Magazine, March 1985. - Also available as ISI-RS-85-153. - -[INTRO:12] "Final Text of DIS8473, Protocol for Providing the - Connectionless Mode Network Service," ANSI, published as RFC-994, - March 1986. - -[INTRO:13] "End System to Intermediate System Routing Exchange - Protocol," ANSI X3S3.3, published as RFC-995, April 1986. - - -LINK LAYER REFERENCES - - -[LINK:1] "Trailer Encapsulations," S. Leffler and M. Karels, RFC-893, - April 1984. - -[LINK:2] "An Ethernet Address Resolution Protocol," D. Plummer, RFC-826, - November 1982. - -[LINK:3] "A Standard for the Transmission of IP Datagrams over Ethernet - Networks," C. Hornig, RFC-894, April 1984. - -[LINK:4] "A Standard for the Transmission of IP Datagrams over IEEE 802 - "Networks," J. Postel and J. Reynolds, RFC-1042, February 1988. - - This RFC contains a great deal of information of importance to - Internet implementers planning to use IEEE 802 networks. - - -IP LAYER REFERENCES - - -[IP:1] "Internet Protocol (IP)," J. Postel, RFC-791, September 1981. - -[IP:2] "Internet Control Message Protocol (ICMP)," J. Postel, RFC-792, - September 1981. - -[IP:3] "Internet Standard Subnetting Procedure," J. Mogul and J. Postel, - RFC-950, August 1985. - -[IP:4] "Host Extensions for IP Multicasting," S. Deering, RFC-1112, - August 1989. - -[IP:5] "Military Standard Internet Protocol," MIL-STD-1777, Department - of Defense, August 1983. - - This specification, as amended by RFC-963, is intended to describe - - - -Internet Engineering Task Force [Page 113] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - - the Internet Protocol but has some serious omissions (e.g., the - mandatory subnet extension [IP:3] and the optional multicasting - extension [IP:4]). It is also out of date. If there is a - conflict, RFC-791, RFC-792, and RFC-950 must be taken as - authoritative, while the present document is authoritative over - all. - -[IP:6] "Some Problems with the Specification of the Military Standard - Internet Protocol," D. Sidhu, RFC-963, November 1985. - -[IP:7] "The TCP Maximum Segment Size and Related Topics," J. Postel, - RFC-879, November 1983. - - Discusses and clarifies the relationship between the TCP Maximum - Segment Size option and the IP datagram size. - -[IP:8] "Internet Protocol Security Options," B. Schofield, RFC-1108, - October 1989. - -[IP:9] "Fragmentation Considered Harmful," C. Kent and J. Mogul, ACM - SIGCOMM-87, August 1987. Published as ACM Comp Comm Review, Vol. - 17, no. 5. - - This useful paper discusses the problems created by Internet - fragmentation and presents alternative solutions. - -[IP:10] "IP Datagram Reassembly Algorithms," D. Clark, RFC-815, July - 1982. - - This and the following paper should be read by every implementor. - -[IP:11] "Fault Isolation and Recovery," D. Clark, RFC-816, July 1982. - -SECONDARY IP REFERENCES: - - -[IP:12] "Broadcasting Internet Datagrams in the Presence of Subnets," J. - Mogul, RFC-922, October 1984. - -[IP:13] "Name, Addresses, Ports, and Routes," D. Clark, RFC-814, July - 1982. - -[IP:14] "Something a Host Could Do with Source Quench: The Source Quench - Introduced Delay (SQUID)," W. Prue and J. Postel, RFC-1016, July - 1987. - - This RFC first described directed broadcast addresses. However, - the bulk of the RFC is concerned with gateways, not hosts. - - - -Internet Engineering Task Force [Page 114] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - -UDP REFERENCES: - - -[UDP:1] "User Datagram Protocol," J. Postel, RFC-768, August 1980. - - -TCP REFERENCES: - - -[TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September - 1981. - - -[TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of - Defense, August 1984. - - This specification as amended by RFC-964 is intended to describe - the same protocol as RFC-793 [TCP:1]. If there is a conflict, - RFC-793 takes precedence, and the present document is authoritative - over both. - - -[TCP:3] "Some Problems with the Specification of the Military Standard - Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964, - November 1985. - - -[TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel, - RFC-879, November 1983. - - -[TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813, - July 1982. - - -[TCP:6] "Round Trip Time Estimation," P. Karn & C. Partridge, ACM - SIGCOMM-87, August 1987. - - -[TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM SIGCOMM-88, - August 1988. - - -SECONDARY TCP REFERENCES: - - -[TCP:8] "Modularity and Efficiency in Protocol Implementation," D. - Clark, RFC-817, July 1982. - - - -Internet Engineering Task Force [Page 115] - - - - -RFC1122 TRANSPORT LAYER -- TCP October 1989 - - -[TCP:9] "Congestion Control in IP/TCP," J. Nagle, RFC-896, January 1984. - - -[TCP:10] "Computing the Internet Checksum," R. Braden, D. Borman, and C. - Partridge, RFC-1071, September 1988. - - -[TCP:11] "TCP Extensions for Long-Delay Paths," V. Jacobson & R. Braden, - RFC-1072, October 1988. - - -Security Considerations - - There are many security issues in the communication layers of host - software, but a full discussion is beyond the scope of this RFC. - - The Internet architecture generally provides little protection - against spoofing of IP source addresses, so any security mechanism - that is based upon verifying the IP source address of a datagram - should be treated with suspicion. However, in restricted - environments some source-address checking may be possible. For - example, there might be a secure LAN whose gateway to the rest of the - Internet discarded any incoming datagram with a source address that - spoofed the LAN address. In this case, a host on the LAN could use - the source address to test for local vs. remote source. This problem - is complicated by source routing, and some have suggested that - source-routed datagram forwarding by hosts (see Section 3.3.5) should - be outlawed for security reasons. - - Security-related issues are mentioned in sections concerning the IP - Security option (Section 3.2.1.8), the ICMP Parameter Problem message - (Section 3.2.2.5), IP options in UDP datagrams (Section 4.1.3.2), and - reserved TCP ports (Section 4.2.2.1). - -Author's Address - - Robert Braden - USC/Information Sciences Institute - 4676 Admiralty Way - Marina del Rey, CA 90292-6695 - - Phone: (213) 822 1511 - - EMail: Braden@ISI.EDU - - - - - - - -Internet Engineering Task Force [Page 116] - |