Diffstat (limited to 'contrib/bind9/doc/rfc/rfc1122.txt')
-rw-r--r-- contrib/bind9/doc/rfc/rfc1122.txt | 6844
1 files changed, 0 insertions, 6844 deletions
diff --git a/contrib/bind9/doc/rfc/rfc1122.txt b/contrib/bind9/doc/rfc/rfc1122.txt
deleted file mode 100644
index c14f2e5..0000000
--- a/contrib/bind9/doc/rfc/rfc1122.txt
+++ /dev/null
@@ -1,6844 +0,0 @@
-
-
-
-
-
-
-Network Working Group Internet Engineering Task Force
-Request for Comments: 1122 R. Braden, Editor
- October 1989
-
-
- Requirements for Internet Hosts -- Communication Layers
-
-
-Status of This Memo
-
- This RFC is an official specification for the Internet community. It
- incorporates by reference, amends, corrects, and supplements the
- primary protocol standards documents relating to hosts. Distribution
- of this document is unlimited.
-
-Summary
-
- This is one RFC of a pair that defines and discusses the requirements
- for Internet host software. This RFC covers the communications
- protocol layers: link layer, IP layer, and transport layer; its
- companion RFC-1123 covers the application and support protocols.
-
-
-
- Table of Contents
-
-
-
-
- 1. INTRODUCTION ............................................... 5
- 1.1 The Internet Architecture .............................. 6
- 1.1.1 Internet Hosts .................................... 6
- 1.1.2 Architectural Assumptions ......................... 7
- 1.1.3 Internet Protocol Suite ........................... 8
- 1.1.4 Embedded Gateway Code ............................. 10
- 1.2 General Considerations ................................. 12
- 1.2.1 Continuing Internet Evolution ..................... 12
- 1.2.2 Robustness Principle .............................. 12
- 1.2.3 Error Logging ..................................... 13
- 1.2.4 Configuration ..................................... 14
- 1.3 Reading this Document .................................. 15
- 1.3.1 Organization ...................................... 15
- 1.3.2 Requirements ...................................... 16
- 1.3.3 Terminology ....................................... 17
- 1.4 Acknowledgments ........................................ 20
-
- 2. LINK LAYER .................................................. 21
- 2.1 INTRODUCTION ........................................... 21
-
-
-
-Internet Engineering Task Force [Page 1]
-
-
-
-
-RFC1122 INTRODUCTION October 1989
-
-
- 2.2 PROTOCOL WALK-THROUGH .................................. 21
- 2.3 SPECIFIC ISSUES ........................................ 21
- 2.3.1 Trailer Protocol Negotiation ...................... 21
- 2.3.2 Address Resolution Protocol -- ARP ................ 22
- 2.3.2.1 ARP Cache Validation ......................... 22
- 2.3.2.2 ARP Packet Queue ............................. 24
- 2.3.3 Ethernet and IEEE 802 Encapsulation ............... 24
- 2.4 LINK/INTERNET LAYER INTERFACE .......................... 25
- 2.5 LINK LAYER REQUIREMENTS SUMMARY ........................ 26
-
- 3. INTERNET LAYER PROTOCOLS .................................... 27
- 3.1 INTRODUCTION ............................................ 27
- 3.2 PROTOCOL WALK-THROUGH .................................. 29
- 3.2.1 Internet Protocol -- IP ............................ 29
- 3.2.1.1 Version Number ............................... 29
- 3.2.1.2 Checksum ..................................... 29
- 3.2.1.3 Addressing ................................... 29
- 3.2.1.4 Fragmentation and Reassembly ................. 32
- 3.2.1.5 Identification ............................... 32
- 3.2.1.6 Type-of-Service .............................. 33
- 3.2.1.7 Time-to-Live ................................. 34
- 3.2.1.8 Options ...................................... 35
- 3.2.2 Internet Control Message Protocol -- ICMP .......... 38
- 3.2.2.1 Destination Unreachable ...................... 39
- 3.2.2.2 Redirect ..................................... 40
- 3.2.2.3 Source Quench ................................ 41
- 3.2.2.4 Time Exceeded ................................ 41
- 3.2.2.5 Parameter Problem ............................ 42
- 3.2.2.6 Echo Request/Reply ........................... 42
- 3.2.2.7 Information Request/Reply .................... 43
- 3.2.2.8 Timestamp and Timestamp Reply ................ 43
- 3.2.2.9 Address Mask Request/Reply ................... 45
- 3.2.3 Internet Group Management Protocol -- IGMP ........ 47
- 3.3 SPECIFIC ISSUES ........................................ 47
- 3.3.1 Routing Outbound Datagrams ........................ 47
- 3.3.1.1 Local/Remote Decision ........................ 47
- 3.3.1.2 Gateway Selection ............................ 48
- 3.3.1.3 Route Cache .................................. 49
- 3.3.1.4 Dead Gateway Detection ....................... 51
- 3.3.1.5 New Gateway Selection ........................ 55
- 3.3.1.6 Initialization ............................... 56
- 3.3.2 Reassembly ........................................ 56
- 3.3.3 Fragmentation ..................................... 58
- 3.3.4 Local Multihoming ................................. 60
- 3.3.4.1 Introduction ................................. 60
- 3.3.4.2 Multihoming Requirements ..................... 61
- 3.3.4.3 Choosing a Source Address .................... 64
- 3.3.5 Source Route Forwarding ........................... 65
- 3.3.6 Broadcasts ........................................ 66
- 3.3.7 IP Multicasting ................................... 67
- 3.3.8 Error Reporting ................................... 69
- 3.4 INTERNET/TRANSPORT LAYER INTERFACE ..................... 69
- 3.5 INTERNET LAYER REQUIREMENTS SUMMARY .................... 72
-
- 4. TRANSPORT PROTOCOLS ......................................... 77
- 4.1 USER DATAGRAM PROTOCOL -- UDP .......................... 77
- 4.1.1 INTRODUCTION ...................................... 77
- 4.1.2 PROTOCOL WALK-THROUGH ............................. 77
- 4.1.3 SPECIFIC ISSUES ................................... 77
- 4.1.3.1 Ports ........................................ 77
- 4.1.3.2 IP Options ................................... 77
- 4.1.3.3 ICMP Messages ................................ 78
- 4.1.3.4 UDP Checksums ................................ 78
- 4.1.3.5 UDP Multihoming .............................. 79
- 4.1.3.6 Invalid Addresses ............................ 79
- 4.1.4 UDP/APPLICATION LAYER INTERFACE ................... 79
- 4.1.5 UDP REQUIREMENTS SUMMARY .......................... 80
- 4.2 TRANSMISSION CONTROL PROTOCOL -- TCP ................... 82
- 4.2.1 INTRODUCTION ...................................... 82
- 4.2.2 PROTOCOL WALK-THROUGH ............................. 82
- 4.2.2.1 Well-Known Ports ............................. 82
- 4.2.2.2 Use of Push .................................. 82
- 4.2.2.3 Window Size .................................. 83
- 4.2.2.4 Urgent Pointer ............................... 84
- 4.2.2.5 TCP Options .................................. 85
- 4.2.2.6 Maximum Segment Size Option .................. 85
- 4.2.2.7 TCP Checksum ................................. 86
- 4.2.2.8 TCP Connection State Diagram ................. 86
- 4.2.2.9 Initial Sequence Number Selection ............ 87
- 4.2.2.10 Simultaneous Open Attempts .................. 87
- 4.2.2.11 Recovery from Old Duplicate SYN ............. 87
- 4.2.2.12 RST Segment ................................. 87
- 4.2.2.13 Closing a Connection ........................ 87
- 4.2.2.14 Data Communication .......................... 89
- 4.2.2.15 Retransmission Timeout ...................... 90
- 4.2.2.16 Managing the Window ......................... 91
- 4.2.2.17 Probing Zero Windows ........................ 92
- 4.2.2.18 Passive OPEN Calls .......................... 92
- 4.2.2.19 Time to Live ................................ 93
- 4.2.2.20 Event Processing ............................ 93
- 4.2.2.21 Acknowledging Queued Segments ............... 94
- 4.2.3 SPECIFIC ISSUES ................................... 95
- 4.2.3.1 Retransmission Timeout Calculation ........... 95
- 4.2.3.2 When to Send an ACK Segment .................. 96
- 4.2.3.3 When to Send a Window Update ................. 97
- 4.2.3.4 When to Send Data ............................ 98
- 4.2.3.5 TCP Connection Failures ...................... 100
- 4.2.3.6 TCP Keep-Alives .............................. 101
- 4.2.3.7 TCP Multihoming .............................. 103
- 4.2.3.8 IP Options ................................... 103
- 4.2.3.9 ICMP Messages ................................ 103
- 4.2.3.10 Remote Address Validation ................... 104
- 4.2.3.11 TCP Traffic Patterns ........................ 104
- 4.2.3.12 Efficiency .................................. 105
- 4.2.4 TCP/APPLICATION LAYER INTERFACE ................... 106
- 4.2.4.1 Asynchronous Reports ......................... 106
- 4.2.4.2 Type-of-Service .............................. 107
- 4.2.4.3 Flush Call ................................... 107
- 4.2.4.4 Multihoming .................................. 108
- 4.2.5 TCP REQUIREMENT SUMMARY ........................... 108
-
- 5. REFERENCES ................................................. 112
-
-1. INTRODUCTION
-
- This document is one of a pair that defines and discusses the
- requirements for host system implementations of the Internet protocol
- suite. This RFC covers the communication protocol layers: link
- layer, IP layer, and transport layer. Its companion RFC,
- "Requirements for Internet Hosts -- Application and Support"
- [INTRO:1], covers the application layer protocols. This document
- should also be read in conjunction with "Requirements for Internet
- Gateways" [INTRO:2].
-
- These documents are intended to provide guidance for vendors,
- implementors, and users of Internet communication software. They
- represent the consensus of a large body of technical experience and
- wisdom, contributed by the members of the Internet research and
- vendor communities.
-
- This RFC enumerates standard protocols that a host connected to the
- Internet must use, and it incorporates by reference the RFCs and
- other documents describing the current specifications for these
- protocols. It corrects errors in the referenced documents and adds
- additional discussion and guidance for an implementor.
-
- For each protocol, this document also contains an explicit set of
- requirements, recommendations, and options. The reader must
- understand that the list of requirements in this document is
- incomplete by itself; the complete set of requirements for an
- Internet host is primarily defined in the standard protocol
- specification documents, with the corrections, amendments, and
- supplements contained in this RFC.
-
- A good-faith implementation of the protocols that was produced after
- careful reading of the RFCs and with some interaction with the
- Internet technical community, and that followed good communications
- software engineering practices, should differ from the requirements
- of this document in only minor ways. Thus, in many cases, the
- "requirements" in this RFC are already stated or implied in the
- standard protocol documents, so that their inclusion here is, in a
- sense, redundant. However, they were included because some past
- implementations have made the wrong choice, causing problems of
- interoperability, performance, and/or robustness.
-
- This document includes discussion and explanation of many of the
- requirements and recommendations. A simple list of requirements
- would be dangerous, because:
-
- o Some required features are more important than others, and some
- features are optional.
-
- o There may be valid reasons why particular vendor products that
- are designed for restricted contexts might choose to use
- different specifications.
-
- However, the specifications of this document must be followed to meet
- the general goal of arbitrary host interoperation across the
- diversity and complexity of the Internet system. Although most
- current implementations fail to meet these requirements in various
- ways, some minor and some major, this specification is the ideal
- towards which we need to move.
-
- These requirements are based on the current level of Internet
- architecture. This document will be updated as required to provide
- additional clarifications or to include additional information in
- those areas in which specifications are still evolving.
-
- This introductory section begins with a brief overview of the
- Internet architecture as it relates to hosts, and then gives some
- general advice to host software vendors. Finally, there is some
- guidance on reading the rest of the document and some terminology.
-
- 1.1 The Internet Architecture
-
- General background and discussion on the Internet architecture and
- supporting protocol suite can be found in the DDN Protocol
- Handbook [INTRO:3]; for background see for example [INTRO:9],
- [INTRO:10], and [INTRO:11]. Reference [INTRO:5] describes the
- procedure for obtaining Internet protocol documents, while
- [INTRO:6] contains a list of the numbers assigned within Internet
- protocols.
-
- 1.1.1 Internet Hosts
-
- A host computer, or simply "host," is the ultimate consumer of
- communication services. A host generally executes application
- programs on behalf of user(s), employing network and/or
- Internet communication services in support of this function.
- An Internet host corresponds to the concept of an "End-System"
- used in the OSI protocol suite [INTRO:13].
-
- An Internet communication system consists of interconnected
- packet networks supporting communication among host computers
- using the Internet protocols. The networks are interconnected
- using packet-switching computers called "gateways" or "IP
- routers" by the Internet community, and "Intermediate Systems"
- by the OSI world [INTRO:13]. The RFC "Requirements for
- Internet Gateways" [INTRO:2] contains the official
- specifications for Internet gateways. That RFC together with
- the present document and its companion [INTRO:1] define the
- rules for the current realization of the Internet architecture.
-
- Internet hosts span a wide range of size, speed, and function.
- They range in size from small microprocessors through
- workstations to mainframes and supercomputers. In function,
- they range from single-purpose hosts (such as terminal servers)
- to full-service hosts that support a variety of online network
- services, typically including remote login, file transfer, and
- electronic mail.
-
- A host is generally said to be multihomed if it has more than
- one interface to the same or to different networks. See
- Section 1.3.3 on "Terminology".
-
- 1.1.2 Architectural Assumptions
-
- The current Internet architecture is based on a set of
- assumptions about the communication system. The assumptions
- most relevant to hosts are as follows:
-
- (a) The Internet is a network of networks.
-
- Each host is directly connected to some particular
- network(s); its connection to the Internet is only
- conceptual. Two hosts on the same network communicate
- with each other using the same set of protocols that they
- would use to communicate with hosts on distant networks.
-
- (b) Gateways don't keep connection state information.
-
- To improve robustness of the communication system,
- gateways are designed to be stateless, forwarding each IP
- datagram independently of other datagrams. As a result,
- redundant paths can be exploited to provide robust service
- in spite of failures of intervening gateways and networks.
-
- All state information required for end-to-end flow control
- and reliability is implemented in the hosts, in the
- transport layer or in application programs. All
- connection control information is thus co-located with the
- end points of the communication, so it will be lost only
- if an end point fails.
-
- (c) Routing complexity should be in the gateways.
-
- Routing is a complex and difficult problem, and ought to
- be performed by the gateways, not the hosts. An important
- objective is to insulate host software from changes caused
- by the inevitable evolution of the Internet routing
- architecture.
-
- (d) The system must tolerate wide network variation.
-
- A basic objective of the Internet design is to tolerate a
- wide range of network characteristics -- e.g., bandwidth,
- delay, packet loss, packet reordering, and maximum packet
- size. Another objective is robustness against failure of
- individual networks, gateways, and hosts, using whatever
- bandwidth is still available. Finally, the goal is full
- "open system interconnection": an Internet host must be
- able to interoperate robustly and effectively with any
- other Internet host, across diverse Internet paths.
-
- Sometimes host implementors have designed for less
- ambitious goals. For example, the LAN environment is
- typically much more benign than the Internet as a whole;
- LANs have low packet loss and delay and do not reorder
- packets. Some vendors have fielded host implementations
- that are adequate for a simple LAN environment, but work
- badly for general interoperation. The vendor justifies
- such a product as being economical within the restricted
- LAN market. However, isolated LANs seldom stay isolated
- for long; they are soon gatewayed to each other, to
- organization-wide internets, and eventually to the global
- Internet system. In the end, neither the customer nor the
- vendor is served by incomplete or substandard Internet
- host software.
-
- The requirements spelled out in this document are designed
- for a full-function Internet host, capable of full
- interoperation over an arbitrary Internet path.
-
-
- 1.1.3 Internet Protocol Suite
-
- To communicate using the Internet system, a host must implement
- the layered set of protocols comprising the Internet protocol
- suite. A host typically must implement at least one protocol
- from each layer.
-
- The protocol layers used in the Internet architecture are as
- follows [INTRO:4]:
-
-
- o Application Layer
-
- The application layer is the top layer of the Internet
- protocol suite. The Internet suite does not further
- subdivide the application layer, although some of the
- Internet application layer protocols do contain some
- internal sub-layering. The application layer of the
- Internet suite essentially combines the functions of the
- top two layers -- Presentation and Application -- of the
- OSI reference model.
-
- We distinguish two categories of application layer
- protocols: user protocols that provide service directly
- to users, and support protocols that provide common system
- functions. Requirements for user and support protocols
- will be found in the companion RFC [INTRO:1].
-
- The most common Internet user protocols are:
-
- o Telnet (remote login)
- o FTP (file transfer)
- o SMTP (electronic mail delivery)
-
- There are a number of other standardized user protocols
- [INTRO:4] and many private user protocols.
-
- Support protocols, used for host name mapping, booting,
- and management, include SNMP, BOOTP, RARP, and the Domain
- Name System (DNS) protocols.
-
-
- o Transport Layer
-
- The transport layer provides end-to-end communication
- services for applications. There are two primary
- transport layer protocols at present:
-
- o Transmission Control Protocol (TCP)
- o User Datagram Protocol (UDP)
-
- TCP is a reliable connection-oriented transport service
- that provides end-to-end reliability, resequencing, and
- flow control. UDP is a connectionless ("datagram")
- transport service.
-
- Other transport protocols have been developed by the
- research community, and the set of official Internet
- transport protocols may be expanded in the future.
-
- Transport layer protocols are discussed in Chapter 4.
-
-
- o Internet Layer
-
- All Internet transport protocols use the Internet Protocol
- (IP) to carry data from source host to destination host.
- IP is a connectionless or datagram internetwork service,
- providing no end-to-end delivery guarantees. Thus, IP
- datagrams may arrive at the destination host damaged,
- duplicated, out of order, or not at all. The layers above
- IP are responsible for reliable delivery service when it
- is required. The IP protocol includes provision for
- addressing, type-of-service specification, fragmentation
- and reassembly, and security information.
-
- The datagram or connectionless nature of the IP protocol
- is a fundamental and characteristic feature of the
- Internet architecture. Internet IP was the model for the
- OSI Connectionless Network Protocol [INTRO:12].
-
- ICMP is a control protocol that is considered to be an
- integral part of IP, although it is architecturally
- layered upon IP, i.e., it uses IP to carry its data end-
- to-end just as a transport protocol like TCP or UDP does.
- ICMP provides error reporting, congestion reporting, and
- first-hop gateway redirection.
-
- IGMP is an Internet layer protocol used for establishing
- dynamic host groups for IP multicasting.
-
- The Internet layer protocols IP, ICMP, and IGMP are
- discussed in Chapter 3.
-
-
- o Link Layer
-
- To communicate on its directly-connected network, a host
- must implement the communication protocol used to
- interface to that network. We call this a link layer or
- media-access layer protocol.
-
- There is a wide variety of link layer protocols,
- corresponding to the many different types of networks.
- See Chapter 2.
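The layer descriptions above say that TCP, UDP, ICMP, and IGMP are all carried inside IP datagrams. A receiving host tells them apart by reading the 8-bit Protocol field at byte offset 9 of the IPv4 header. The values 1 (ICMP), 2 (IGMP), 6 (TCP), and 17 (UDP) are the IANA assignments.

The Python sketch below is a minimal illustration under the following assumptions:

- An IPv4 datagram has already been handed up by the link layer.
- `demux` and `PROTO_NAMES` are hypothetical names.
- Checksum verification, options processing, and fragment reassembly are deliberately omitted.

```python
import struct

# IANA-assigned values of the IPv4 header "Protocol" field.
PROTO_NAMES = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP"}


def demux(datagram: bytes) -> tuple[str, bytes]:
    """Return (upper-layer protocol name, payload) for an IPv4 datagram.

    Hypothetical sketch: no checksum check, no options, no reassembly.
    """
    if len(datagram) < 20:
        raise ValueError("shorter than a minimal IPv4 header")
    version_ihl, _tos, total_length = struct.unpack("!BBH", datagram[:4])
    version, ihl = version_ihl >> 4, version_ihl & 0x0F
    if version != 4 or ihl < 5:
        raise ValueError("not a well-formed IPv4 header")
    header_len = ihl * 4  # IHL counts 32-bit words
    if total_length < header_len or total_length > len(datagram):
        raise ValueError("inconsistent Total Length field")
    protocol = datagram[9]
    # Cut at Total Length so any trailing link-layer padding is dropped.
    payload = datagram[header_len:total_length]
    # An unrecognized protocol number is reported, not treated as fatal.
    return PROTO_NAMES.get(protocol, f"unknown({protocol})"), payload
```

The payload is sliced at Total Length rather than at the end of the buffer. This discards any padding the link layer may have appended to a short frame.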
-
-
- 1.1.4 Embedded Gateway Code
-
- Some Internet host software includes embedded gateway
- functionality, so that these hosts can forward packets as a
- gateway would, while still performing the application layer
- functions of a host.
-
- Such dual-purpose systems must follow the Gateway Requirements
- RFC [INTRO:2] with respect to their gateway functions, and
- must follow the present document with respect to their host
- functions. In all overlapping cases, the two specifications
- should be in agreement.
-
- There are varying opinions in the Internet community about
- embedded gateway functionality. The main arguments are as
- follows:
-
- o Pro: in a local network environment where networking is
- informal, or in isolated internets, it may be convenient
- and economical to use existing host systems as gateways.
-
- There is also an architectural argument for embedded
- gateway functionality: multihoming is much more common
- than originally foreseen, and multihoming forces a host to
- make routing decisions as if it were a gateway. If the
- multihomed host contains an embedded gateway, it will
- have full routing knowledge and as a result will be able
- to make more optimal routing decisions.
-
- o Con: Gateway algorithms and protocols are still changing,
- and they will continue to change as the Internet system
- grows larger. Attempting to include a general gateway
- function within the host IP layer will force host system
- maintainers to track these (more frequent) changes. Also,
- a larger pool of gateway implementations will make
- coordinating the changes more difficult. Finally, the
- complexity of a gateway IP layer is somewhat greater than
- that of a host, making the implementation and operation
- tasks more complex.
-
- In addition, the style of operation of some hosts is not
- appropriate for providing stable and robust gateway
- service.
-
- There is considerable merit in both of these viewpoints. One
- conclusion can be drawn: a host administrator must have
- conscious control over whether or not a given host acts as a
- gateway. See Section 3.1 for the detailed requirements.
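One way to give the administrator that conscious control is a single switch, consulted for every incoming datagram, that stays off unless someone deliberately turns it on.

The sketch below assumes such a switch:

- `HostConfig`, `forward_datagrams`, and `route_incoming` are hypothetical names.
- The off-by-default choice is this sketch's assumption.
- Broadcast and multicast destinations are left out for brevity.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address


@dataclass
class HostConfig:
    # Hypothetical administrator-controlled switch; off unless someone
    # deliberately configures this host to act as a gateway.
    forward_datagrams: bool = False
    local_addresses: set[IPv4Address] = field(default_factory=set)


def route_incoming(cfg: HostConfig, dst: IPv4Address) -> str:
    """Decide what to do with an incoming datagram addressed to dst.

    Simplified: broadcast and multicast destinations are not handled.
    """
    if dst in cfg.local_addresses:
        return "deliver"   # hand up to the transport layer
    if cfg.forward_datagrams:
        return "forward"   # gateway behaviour, explicitly enabled
    return "discard"       # a plain host does not relay others' traffic
```

Keeping the decision in one flag makes the host's role visible and auditable. It also keeps gateway behaviour from creeping in as a side effect of multihoming.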
-
-
- 1.2 General Considerations
-
- There are two important lessons that vendors of Internet host
- software have learned and which a new vendor should consider
- seriously.
-
- 1.2.1 Continuing Internet Evolution
-
- The enormous growth of the Internet has revealed problems of
- management and scaling in a large datagram-based packet
- communication system. These problems are being addressed, and
- as a result there will be continuing evolution of the
- specifications described in this document. These changes will
- be carefully planned and controlled, since there is extensive
- participation in this planning by the vendors and by the
- organizations responsible for operations of the networks.
-
- Development, evolution, and revision are characteristic of
- computer network protocols today, and this situation will
- persist for some years. A vendor who develops computer
- communication software for the Internet protocol suite (or any
- other protocol suite!) and then fails to maintain and update
- that software for changing specifications is going to leave a
- trail of unhappy customers. The Internet is a large
- communication network, and the users are in constant contact
- through it. Experience has shown that knowledge of
- deficiencies in vendor software propagates quickly through the
- Internet technical community.
-
- 1.2.2 Robustness Principle
-
- At every layer of the protocols, there is a general rule whose
- application can lead to enormous benefits in robustness and
- interoperability [IP:1]:
-
- "Be liberal in what you accept, and
- conservative in what you send"
-
- Software should be written to deal with every conceivable
- error, no matter how unlikely; sooner or later a packet will
- come in with that particular combination of errors and
- attributes, and unless the software is prepared, chaos can
- ensue. In general, it is best to assume that the network is
- filled with malevolent entities that will send in packets
- designed to have the worst possible effect. This assumption
- will lead to suitable protective design, although the most
- serious problems in the Internet have been caused by
- unenvisaged mechanisms triggered by low-probability events;
- mere human malice would never have taken so devious a course!
-
- Adaptability to change must be designed into all levels of
- Internet host software. As a simple example, consider a
- protocol specification that contains an enumeration of values
- for a particular header field -- e.g., a type field, a port
- number, or an error code; this enumeration must be assumed to
- be incomplete. Thus, if a protocol specification defines four
- possible error codes, the software must not break when a fifth
- code shows up. An undefined code might be logged (see below),
- but it must not cause a failure.
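The fifth-error-code example can be made concrete. This Python sketch assumes a made-up protocol with exactly four defined codes; `ErrorCode` and `classify` are hypothetical names. A received value outside the enumeration is logged and mapped to a generic action instead of raising an exception.

```python
import logging
from enum import IntEnum

log = logging.getLogger("proto")


class ErrorCode(IntEnum):
    # Hypothetical four-value enumeration from some protocol specification.
    NET_DOWN = 0
    HOST_DOWN = 1
    BAD_PORT = 2
    REFUSED = 3


def classify(raw_code: int) -> str:
    """Map a received error code to a handling action without ever raising."""
    try:
        code = ErrorCode(raw_code)
    except ValueError:
        # The enumeration is assumed incomplete: log the new code
        # and fall back to generic handling rather than failing.
        log.info("unrecognized error code %d", raw_code)
        return "generic-error"
    if code in (ErrorCode.NET_DOWN, ErrorCode.HOST_DOWN):
        return "retry-later"
    return "abort"
```

Calling `classify(4)` returns `"generic-error"` and leaves a log entry, where a naive lookup table would have raised.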
-
- The second part of the principle is almost as important:
- software on other hosts may contain deficiencies that make it
- unwise to exploit legal but obscure protocol features. It is
- unwise to stray far from the obvious and simple, lest untoward
- effects result elsewhere. A corollary of this is "watch out
- for misbehaving hosts"; host software should be prepared, not
- just to survive other misbehaving hosts, but also to cooperate
- to limit the amount of disruption such hosts can cause to the
- shared communication facility.
-
- 1.2.3 Error Logging
-
- The Internet includes a great variety of host and gateway
- systems, each implementing many protocols and protocol layers,
- and some of these contain bugs and mis-features in their
- Internet protocol software. As a result of complexity,
- diversity, and distribution of function, the diagnosis of
- Internet problems is often very difficult.
-
- Problem diagnosis will be aided if host implementations include
- a carefully designed facility for logging erroneous or
- "strange" protocol events. It is important to include as much
- diagnostic information as possible when an error is logged. In
- particular, it is often useful to record the header(s) of a
- packet that caused an error. However, care must be taken to
- ensure that error logging does not consume prohibitive amounts
- of resources or otherwise interfere with the operation of the
- host.
-
- There is a tendency for abnormal but harmless protocol events
- to overflow error logging files; this can be avoided by using a
- "circular" log, or by enabling logging only while diagnosing a
- known failure. It may be useful to filter and count duplicate
- successive messages. One strategy that seems to work well is:
- (1) always count abnormalities and make such counts accessible
- through the management protocol (see [INTRO:1]); and (2) allow
- the logging of a great variety of events to be selectively
- enabled. For example, it might be useful to be able to "log
- everything" or to "log everything for host X".
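The strategy above can be sketched as a small class. It always counts events and logs them selectively, keeps entries in a circular log, and folds duplicate successive messages into one entry.

These details are assumptions made for illustration, not part of any specified interface:

- The names `EventLog`, `log_everything`, and `watched_hosts`.
- The rule that a duplicate means the same event from the same host.

```python
from collections import Counter, deque


class EventLog:
    """Circular diagnostic log with always-on abnormality counters.

    Hypothetical design: counts are always kept; message text is stored
    only when logging is enabled globally or for the specific host.
    """

    def __init__(self, capacity: int = 100) -> None:
        self.counts = Counter()               # always-on per-event counts
        self.entries = deque(maxlen=capacity) # oldest entries fall off
        self.log_everything = False           # "log everything"
        self.watched_hosts = set()            # "log everything for host X"
        self._last = None                     # (event, host) of last entry

    def record(self, event: str, host: str, detail: str = "") -> None:
        self.counts[event] += 1
        if not (self.log_everything or host in self.watched_hosts):
            return
        key = (event, host)
        if key == self._last and self.entries:
            # Fold a duplicate successive message into a repeat count.
            text, repeats = self.entries[-1]
            self.entries[-1] = (text, repeats + 1)
        else:
            self.entries.append((f"{event} from {host}: {detail}", 1))
        self._last = key
```

The counters could be exported through the management protocol. The bounded `deque` keeps a flood of harmless events from growing the log without limit.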
-
- Note that different managements may have differing policies
- about the amount of error logging that they want normally
- enabled in a host. Some will say, "if it doesn't hurt me, I
- don't want to know about it", while others will want to take a
- more watchful and aggressive attitude about detecting and
- removing protocol abnormalities.
-
- 1.2.4 Configuration
-
- It would be ideal if a host implementation of the Internet
- protocol suite could be entirely self-configuring. This would
- allow the whole suite to be implemented in ROM or cast into
- silicon, it would simplify diskless workstations, and it would
- be an immense boon to harried LAN administrators as well as
- system vendors. We have not reached this ideal; in fact, we
- are not even close.
-
- At many points in this document, you will find a requirement
- that a parameter be a configurable option. There are several
- different reasons behind such requirements. In a few cases,
- there is current uncertainty or disagreement about the best
- value, and it may be necessary to update the recommended value
- in the future. In other cases, the value really depends on
- external factors -- e.g., the size of the host and the
- distribution of its communication load, or the speeds and
- topology of nearby networks -- and self-tuning algorithms are
- unavailable and may be insufficient. In some cases,
- configurability is needed because of administrative
- requirements.
-
- Finally, some configuration options are required to communicate
- with obsolete or incorrect implementations of the protocols,
- distributed without sources, that unfortunately persist in many
- parts of the Internet. To make correct systems coexist with
- these faulty systems, administrators often have to "mis-
- configure" the correct systems. This problem will correct
- itself gradually as the faulty systems are retired, but it
- cannot be ignored by vendors.
-
- When we say that a parameter must be configurable, we do not
- intend to require that its value be explicitly read from a
- configuration file at every boot time. We recommend that
- implementors set up a default for each parameter, so a
- configuration file is only necessary to override those defaults
- that are inappropriate in a particular installation. Thus, the
- configurability requirement is an assurance that it will be
- POSSIBLE to override the default when necessary, even in a
- binary-only or ROM-based product.
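
A minimal C sketch of the default-plus-override approach described above. It assumes a simple "name=value" override syntax: compiled-in defaults make a configuration file unnecessary, and an override is applied only when one is supplied. The parameter names, the syntax, and the 60-second ARP timeout are hypothetical. The boolean defaults follow Sections 2.3.1, 2.3.3, and 3.1.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical configurable parameters.  The names are illustrative;
   the boolean defaults follow Sections 2.3.1, 2.3.3 and 3.1. */
struct host_config {
    int  arp_cache_timeout_sec;  /* illustrative default: one minute */
    bool trailers_enabled;       /* off unless negotiated (2.3.1) */
    bool send_rfc1042;           /* false means send RFC-894 (2.3.3) */
    bool gateway_mode;           /* non-gateway by default (3.1) */
};

/* Compiled-in defaults: the host works with no configuration file. */
static const struct host_config DEFAULT_CONFIG = {
    .arp_cache_timeout_sec = 60,
    .trailers_enabled      = false,
    .send_rfc1042          = false,
    .gateway_mode          = false,
};

/* Apply one "name=value" override.  Returns false for unknown names
   or bad values, leaving the configuration unchanged. */
static bool config_override(struct host_config *cfg, const char *line)
{
    char name[64];
    int value;

    if (sscanf(line, "%63[^=]=%d", name, &value) != 2)
        return false;
    if (strcmp(name, "arp_cache_timeout_sec") == 0) {
        if (value <= 0)
            return false;        /* reject nonsensical timeouts */
        cfg->arp_cache_timeout_sec = value;
    } else if (strcmp(name, "trailers_enabled") == 0) {
        cfg->trailers_enabled = (value != 0);
    } else if (strcmp(name, "send_rfc1042") == 0) {
        cfg->send_rfc1042 = (value != 0);
    } else if (strcmp(name, "gateway_mode") == 0) {
        cfg->gateway_mode = (value != 0);
    } else {
        return false;            /* unknown option name */
    }
    return true;
}
```

A host that never reads a configuration file simply runs with DEFAULT_CONFIG, which is the binary-only or ROM-based case the text describes. Rejecting unknown names is a design choice in this sketch. It keeps a misspelled option from silently leaving a default in force.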
-
- This document requires a particular value for such defaults in
- some cases. The choice of default is a sensitive issue when
- the configuration item controls the accommodation to existing
- faulty systems. If the Internet is to converge successfully to
- complete interoperability, the default values built into
- implementations must implement the official protocol, not
- "mis-configurations" to accommodate faulty implementations.
- Although marketing considerations have led some vendors to
- choose mis-configuration defaults, we urge vendors to choose
- defaults that will conform to the standard.
-
- Finally, we note that a vendor needs to provide adequate
- documentation on all configuration parameters, their limits and
- effects.
-
-
- 1.3 Reading this Document
-
- 1.3.1 Organization
-
- Protocol layering, which is generally used as an organizing
- principle in implementing network software, has also been used
- to organize this document. In describing the rules, we assume
- that an implementation does strictly mirror the layering of the
- protocols. Thus, the following three major sections specify
- the requirements for the link layer, the internet layer, and
- the transport layer, respectively. A companion RFC [INTRO:1]
- covers application level software. This layerist organization
- was chosen for simplicity and clarity.
-
- However, strict layering is an imperfect model, both for the
- protocol suite and for recommended implementation approaches.
- Protocols in different layers interact in complex and sometimes
- subtle ways, and particular functions often involve multiple
- layers. There are many design choices in an implementation,
- many of which involve creative "breaking" of strict layering.
- Every implementor is urged to read references [INTRO:7] and
- [INTRO:8].
-
- This document describes the conceptual service interface
- between layers using a functional ("procedure call") notation,
- like that used in the TCP specification [TCP:1]. A host
- implementation must support the logical information flow
- implied by these calls, but need not literally implement the
- calls themselves. For example, many implementations reflect
- the coupling between the transport layer and the IP layer by
- giving them shared access to common data structures. These
- data structures, rather than explicit procedure calls, are then
- the agency for passing much of the information that is
- required.
-
- In general, each major section of this document is organized
- into the following subsections:
-
- (1) Introduction
-
- (2) Protocol Walk-Through -- considers the protocol
- specification documents section-by-section, correcting
- errors, stating requirements that may be ambiguous or
- ill-defined, and providing further clarification or
- explanation.
-
- (3) Specific Issues -- discusses protocol design and
- implementation issues that were not included in the walk-
- through.
-
- (4) Interfaces -- discusses the service interface to the next
- higher layer.
-
- (5) Summary -- contains a summary of the requirements of the
- section.
-
-
- Under many of the individual topics in this document, there is
- parenthetical material labeled "DISCUSSION" or
- "IMPLEMENTATION". This material is intended to give
- clarification and explanation of the preceding requirements
- text. It also includes some suggestions on possible future
- directions or developments. The implementation material
- contains suggested approaches that an implementor may want to
- consider.
-
- The summary sections are intended to be guides and indexes to
- the text, but are necessarily cryptic and incomplete. The
- summaries should never be used or referenced separately from
- the complete RFC.
-
- 1.3.2 Requirements
-
- In this document, the words that are used to define the
- significance of each particular requirement are capitalized.
- These words are:
-
- * "MUST"
-
- This word or the adjective "REQUIRED" means that the item
- is an absolute requirement of the specification.
-
- * "SHOULD"
-
- This word or the adjective "RECOMMENDED" means that there
- may exist valid reasons in particular circumstances to
- ignore this item, but the full implications should be
- understood and the case carefully weighed before choosing
- a different course.
-
- * "MAY"
-
- This word or the adjective "OPTIONAL" means that this item
- is truly optional. One vendor may choose to include the
- item because a particular marketplace requires it or
- because it enhances the product, for example; another
- vendor may omit the same item.
-
-
- An implementation is not compliant if it fails to satisfy one
- or more of the MUST requirements for the protocols it
- implements. An implementation that satisfies all the MUST and
- all the SHOULD requirements for its protocols is said to be
- "unconditionally compliant"; one that satisfies all the MUST
- requirements but not all the SHOULD requirements for its
- protocols is said to be "conditionally compliant".
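
The compliance rules reduce to a simple decision. The sketch below encodes them and takes hypothetical counts of satisfied requirements as input.

```c
/* Compliance levels from Section 1.3.2. */
enum compliance {
    NOT_COMPLIANT,
    CONDITIONALLY_COMPLIANT,
    UNCONDITIONALLY_COMPLIANT
};

/* Classify an implementation from how many of the MUST and SHOULD
   requirements for the protocols it implements are satisfied.
   MAY items never affect the result. */
static enum compliance classify_compliance(int must_total, int must_met,
                                           int should_total, int should_met)
{
    if (must_met < must_total)
        return NOT_COMPLIANT;
    if (should_met < should_total)
        return CONDITIONALLY_COMPLIANT;
    return UNCONDITIONALLY_COMPLIANT;
}
```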
-
- 1.3.3 Terminology
-
- This document uses the following technical terms:
-
- Segment
- A segment is the unit of end-to-end transmission in the
- TCP protocol. A segment consists of a TCP header followed
- by application data. A segment is transmitted by
- encapsulation inside an IP datagram.
-
- Message
- In this description of the lower-layer protocols, a
- message is the unit of transmission in a transport layer
- protocol. In particular, a TCP segment is a message. A
- message consists of a transport protocol header followed
- by application protocol data. To be transmitted
- end-to-end through the Internet, a message must be encapsulated
- inside a datagram.
-
- IP Datagram
- An IP datagram is the unit of end-to-end transmission in
- the IP protocol. An IP datagram consists of an IP header
- followed by transport layer data, i.e., of an IP header
- followed by a message.
-
- In the description of the internet layer (Section 3), the
- unqualified term "datagram" should be understood to refer
- to an IP datagram.
-
- Packet
- A packet is the unit of data passed across the interface
- between the internet layer and the link layer. It
- includes an IP header and data. A packet may be a
- complete IP datagram or a fragment of an IP datagram.
-
- Frame
- A frame is the unit of transmission in a link layer
- protocol, and consists of a link-layer header followed by
- a packet.
-
- Connected Network
- A network to which a host is interfaced is often known as
- the "local network" or the "subnetwork" relative to that
- host. However, these terms can cause confusion, and
- therefore we use the term "connected network" in this
- document.
-
- Multihomed
- A host is said to be multihomed if it has multiple IP
- addresses. For a discussion of multihoming, see Section
- 3.3.4 below.
-
- Physical network interface
- This is a physical interface to a connected network and
- has a (possibly unique) link-layer address. Multiple
- physical network interfaces on a single host may share the
- same link-layer address, but the address must be unique
- for different hosts on the same physical network.
-
- Logical [network] interface
- We define a logical [network] interface to be a logical
- path, distinguished by a unique IP address, to a connected
- network. See Section 3.3.4.
-
- Specific-destination address
- This is the effective destination address of a datagram,
- even if it is broadcast or multicast; see Section 3.2.1.3.
-
- Path
- At a given moment, all the IP datagrams from a particular
- source host to a particular destination host will
- typically traverse the same sequence of gateways. We use
- the term "path" for this sequence. Note that a path is
- uni-directional; it is not unusual to have different paths
- in the two directions between a given host pair.
-
- MTU
- The maximum transmission unit, i.e., the size of the
- largest packet that can be transmitted.
-
-
- The terms frame, packet, datagram, message, and segment are
- illustrated by the following schematic diagrams:
-
- A. Transmission on connected network:
-     _______________________________________________
-    | LL hdr | IP hdr |         (data)              |
-    |________|________|_____________________________|
-
-     <---------- Frame ---------------------------->
-              <----------Packet ------------------->
-
-
- B. Before IP fragmentation or after IP reassembly:
-     ______________________________________
-    | IP hdr | transport| Application Data |
-    |________|____hdr___|__________________|
-
-     <-------- Datagram ------------------>
-              <-------- Message ---------->
- or, for TCP:
-     ______________________________________
-    | IP hdr | TCP hdr  | Application Data |
-    |________|__________|__________________|
-
-     <-------- Datagram ------------------>
-              <-------- Segment ---------->
-
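
The nesting shown in the diagrams can be checked with a little arithmetic. This sketch assumes RFC-894 framing and IP and TCP headers without options (20 octets each). Those header sizes come from the IP and TCP specifications rather than from this section.

```c
#include <stddef.h>

/* Typical header sizes, assuming no IP or TCP options and RFC-894
   framing; the link-layer trailer (FCS) is not counted. */
enum {
    ETH_HDR_LEN = 14,   /* destination, source, Ether-Type */
    IP_HDR_LEN  = 20,
    TCP_HDR_LEN = 20,
    ETH_MTU     = 1500  /* Section 2.3.3 */
};

/* Frame = link-layer header + packet; a packet may be up to the MTU. */
static size_t max_frame_len(size_t mtu)
{
    return ETH_HDR_LEN + mtu;
}

/* In an unfragmented datagram, the message (here a TCP segment) is
   whatever follows the IP header. */
static size_t max_segment_len(size_t mtu)
{
    return mtu - IP_HDR_LEN;
}

/* Application data that fits in one segment without fragmentation. */
static size_t max_app_data(size_t mtu)
{
    return max_segment_len(mtu) - TCP_HDR_LEN;
}
```

With the 1500-octet Ethernet MTU, this gives a 1514-octet frame (FCS excluded), a 1480-octet segment, and 1460 octets of application data. With the 1492-octet 802.3 MTU, the application data drops to 1452 octets.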
-
- 1.4 Acknowledgments
-
- This document incorporates contributions and comments from a large
- group of Internet protocol experts, including representatives of
- university and research labs, vendors, and government agencies.
- It was assembled primarily by the Host Requirements Working Group
- of the Internet Engineering Task Force (IETF).
-
- The Editor would especially like to acknowledge the tireless
- dedication of the following people, who attended many long
- meetings and generated 3 million bytes of electronic mail over the
- past 18 months in pursuit of this document: Philip Almquist, Dave
- Borman (Cray Research), Noel Chiappa, Dave Crocker (DEC), Steve
- Deering (Stanford), Mike Karels (Berkeley), Phil Karn (Bellcore),
- John Lekashman (NASA), Charles Lynn (BBN), Keith McCloghrie (TWG),
- Paul Mockapetris (ISI), Thomas Narten (Purdue), Craig Partridge
- (BBN), Drew Perkins (CMU), and James Van Bokkelen (FTP Software).
-
- In addition, the following people made major contributions to the
- effort: Bill Barns (Mitre), Steve Bellovin (AT&T), Mike Brescia
- (BBN), Ed Cain (DCA), Annette DeSchon (ISI), Martin Gross (DCA),
- Phill Gross (NRI), Charles Hedrick (Rutgers), Van Jacobson (LBL),
- John Klensin (MIT), Mark Lottor (SRI), Milo Medin (NASA), Bill
- Melohn (Sun Microsystems), Greg Minshall (Kinetics), Jeff Mogul
- (DEC), John Mullen (CMC), Jon Postel (ISI), John Romkey (Epilogue
- Technology), and Mike StJohns (DCA). The following also made
- significant contributions to particular areas: Eric Allman
- (Berkeley), Rob Austein (MIT), Art Berggreen (ACC), Keith Bostic
- (Berkeley), Vint Cerf (NRI), Wayne Hathaway (NASA), Matt Korn
- (IBM), Erik Naggum (Naggum Software, Norway), Robert Ullmann
- (Prime Computer), David Waitzman (BBN), Frank Wancho (USA), Arun
- Welch (Ohio State), Bill Westfield (Cisco), and Rayan Zachariassen
- (Toronto).
-
- We are grateful to all, including any contributors who may have
- been inadvertently omitted from this list.
-
-
-2. LINK LAYER
-
- 2.1 INTRODUCTION
-
- All Internet systems, both hosts and gateways, have the same
- requirements for link layer protocols. These requirements are
- given in Chapter 3 of "Requirements for Internet Gateways"
- [INTRO:2], augmented with the material in this section.
-
- 2.2 PROTOCOL WALK-THROUGH
-
- None.
-
- 2.3 SPECIFIC ISSUES
-
- 2.3.1 Trailer Protocol Negotiation
-
- The trailer protocol [LINK:1] for link-layer encapsulation MAY
- be used, but only when it has been verified that both systems
- (host or gateway) involved in the link-layer communication
- implement trailers. If the system does not dynamically
- negotiate use of the trailer protocol on a per-destination
- basis, the default configuration MUST disable the protocol.
-
- DISCUSSION:
- The trailer protocol is a link-layer encapsulation
- technique that rearranges the data contents of packets
- sent on the physical network. In some cases, trailers
- improve the throughput of higher layer protocols by
- reducing the amount of data copying within the operating
- system. Higher layer protocols are unaware of trailer
- use, but both the sending and receiving host MUST
- understand the protocol if it is used.
-
- Improper use of trailers can result in very confusing
- symptoms. Only packets with specific size attributes are
- encapsulated using trailers, and typically only a small
- fraction of the packets being exchanged have these
- attributes. Thus, if a system using trailers exchanges
- packets with a system that does not, some packets
- disappear into a black hole while others are delivered
- successfully.
-
- IMPLEMENTATION:
- On an Ethernet, packets encapsulated with trailers use a
- distinct Ethernet type [LINK:1], and trailer negotiation
- is performed at the time that ARP is used to discover the
- link-layer address of a destination system.
-
- Specifically, the ARP exchange is completed in the usual
- manner using the normal IP protocol type, but a host that
- wants to speak trailers will send an additional "trailer
- ARP reply" packet, i.e., an ARP reply that specifies the
- trailer encapsulation protocol type but otherwise has the
- format of a normal ARP reply. If a host configured to use
- trailers receives a trailer ARP reply message from a
- remote machine, it can add that machine to the list of
- machines that understand trailers, e.g., by marking the
- corresponding entry in the ARP cache.
-
- Hosts wishing to receive trailer encapsulations send
- trailer ARP replies whenever they complete exchanges of
- normal ARP messages for IP. Thus, a host that received an
- ARP request for its IP protocol address would send a
- trailer ARP reply in addition to the normal IP ARP reply;
- a host that sent the IP ARP request would send a trailer
- ARP reply when it received the corresponding IP ARP reply.
- In this way, either the requesting or responding host in
- an IP ARP exchange may request that it receive trailer
- encapsulations.
-
- This scheme, using extra trailer ARP reply packets rather
- than sending an ARP request for the trailer protocol type,
- was designed to avoid a continuous exchange of ARP packets
- with a misbehaving host that, contrary to any
- specification or common sense, responded to an ARP reply
- for trailers with another ARP reply for IP. This problem
- is avoided by sending a trailer ARP reply in response to
- an IP ARP reply only when the IP ARP reply answers an
- outstanding request; this is true when the hardware
- address for the host is still unknown when the IP ARP
- reply is received. A trailer ARP reply may always be sent
- along with an IP ARP reply responding to an IP ARP
- request.
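
The trailer negotiation rules above can be sketched as a few event handlers on a per-destination ARP cache entry. The structure and function names are hypothetical. The sketch only decides when a trailer ARP reply is sent and when trailers may be used, and it leaves packet formats aside.

```c
#include <stdbool.h>

/* Hypothetical per-destination ARP cache entry. */
struct arp_entry {
    bool hw_known;        /* link-layer address already resolved */
    bool request_out;     /* an IP ARP request is outstanding */
    bool peer_trailers;   /* peer sent us a trailer ARP reply */
};

/* On receiving an IP ARP request for our address: a trailer ARP reply
   may always accompany the normal IP ARP reply. */
static bool trailer_reply_on_request(bool want_trailers)
{
    return want_trailers;
}

/* On receiving an IP ARP reply: send a trailer ARP reply only if the
   reply answers our outstanding request, i.e. the hardware address
   was still unknown.  This avoids an endless reply/reply exchange
   with a misbehaving peer. */
static bool trailer_reply_on_reply(struct arp_entry *e, bool want_trailers)
{
    bool answers_request = e->request_out && !e->hw_known;
    e->hw_known = true;
    e->request_out = false;
    return want_trailers && answers_request;
}

/* On receiving a trailer ARP reply: mark the peer as able to receive
   trailer encapsulations. */
static void on_trailer_reply(struct arp_entry *e)
{
    e->peer_trailers = true;
}

/* Trailers go to a destination only after it has said it understands
   them; otherwise normal encapsulation is used. */
static bool use_trailers(const struct arp_entry *e, bool want_trailers)
{
    return want_trailers && e->peer_trailers;
}
```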
-
- 2.3.2 Address Resolution Protocol -- ARP
-
- 2.3.2.1 ARP Cache Validation
-
- An implementation of the Address Resolution Protocol (ARP)
- [LINK:2] MUST provide a mechanism to flush out-of-date cache
- entries. If this mechanism involves a timeout, it SHOULD be
- possible to configure the timeout value.
-
- A mechanism to prevent ARP flooding (repeatedly sending an
- ARP Request for the same IP address, at a high rate) MUST be
- included. The recommended maximum rate is 1 per second per
- destination.
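
One way to meet the flooding requirement is to remember, for each destination, when the last ARP Request went out. This is a minimal sketch. It assumes the caller supplies the current time and uses the recommended maximum of one request per second. The names are hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/* Recommended maximum: one ARP Request per second per destination. */
#define ARP_MIN_INTERVAL_SEC 1

/* Hypothetical per-destination state for an unresolved address. */
struct arp_pending {
    bool   ever_sent;
    time_t last_request;   /* when the last ARP Request was sent */
};

/* Returns true, and records the send, if a new ARP Request for this
   destination may go out at time `now`; returns false if sending it
   would exceed the rate limit. */
static bool arp_may_send_request(struct arp_pending *p, time_t now)
{
    if (p->ever_sent && difftime(now, p->last_request) < ARP_MIN_INTERVAL_SEC)
        return false;
    p->ever_sent = true;
    p->last_request = now;
    return true;
}
```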
-
- DISCUSSION:
- The ARP specification [LINK:2] suggests but does not
- require a timeout mechanism to invalidate cache entries
- when hosts change their Ethernet addresses. The
- prevalence of proxy ARP (see Section 2.4 of [INTRO:2])
- has significantly increased the likelihood that cache
- entries in hosts will become invalid, and therefore
- some ARP-cache invalidation mechanism is now required
- for hosts. Even in the absence of proxy ARP, a long-
- period cache timeout is useful in order to
- automatically correct any bad ARP data that might have
- been cached.
-
- IMPLEMENTATION:
- Four mechanisms have been used, sometimes in
- combination, to flush out-of-date cache entries.
-
- (1) Timeout -- Periodically time out cache entries,
- even if they are in use. Note that this timeout
- should be restarted when the cache entry is
- "refreshed" (by observing the source fields,
- regardless of target address, of an ARP broadcast
- from the system in question). For proxy ARP
- situations, the timeout needs to be on the order
- of a minute.
-
- (2) Unicast Poll -- Actively poll the remote host by
- periodically sending a point-to-point ARP Request
- to it, and delete the entry if no ARP Reply is
- received from N successive polls. Again, the
- timeout should be on the order of a minute, and
- typically N is 2.
-
- (3) Link-Layer Advice -- If the link-layer driver
- detects a delivery problem, flush the
- corresponding ARP cache entry.
-
- (4) Higher-layer Advice -- Provide a call from the
- Internet layer to the link layer to indicate a
- delivery problem. The effect of this call would
- be to invalidate the corresponding cache entry.
- This call would be analogous to the
- "ADVISE_DELIVPROB()" call from the transport layer
- to the Internet layer (see Section 3.4), and in
- fact the ADVISE_DELIVPROB routine might in turn
- call the link-layer advice routine to invalidate
- the ARP cache entry.
-
- Approaches (1) and (2) involve ARP cache timeouts on
- the order of a minute or less. In the absence of proxy
- ARP, a timeout this short could create noticeable
- overhead traffic on a very large Ethernet. Therefore,
- it may be necessary to configure a host to lengthen the
- ARP cache timeout.
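
Mechanisms (1) and (2) can be combined on a single cache entry, with the timeout kept configurable as required above. The following is a sketch under those assumptions. The names are hypothetical, and the caller supplies the current time.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical cache entry combining mechanisms (1) and (2). */
struct arp_cache_entry {
    bool   valid;
    time_t refreshed;      /* last time the entry was confirmed */
    int    unanswered;     /* consecutive unicast polls with no reply */
};

/* Configurable parameters. */
struct arp_timers {
    int timeout_sec;       /* "on the order of a minute" for proxy ARP */
    int max_polls;         /* N; typically 2 */
};

/* Refresh: the source fields of an ARP packet from this host were
   observed, or a reply to our unicast poll arrived. */
static void arp_refresh(struct arp_cache_entry *e, time_t now)
{
    e->valid = true;
    e->refreshed = now;
    e->unanswered = 0;
}

/* Mechanism (1): flush the entry once it is older than the timeout,
   even if it is in use. */
static void arp_age(struct arp_cache_entry *e, const struct arp_timers *t,
                    time_t now)
{
    if (e->valid && difftime(now, e->refreshed) >= t->timeout_sec)
        e->valid = false;
}

/* Mechanism (2): called when a unicast poll goes unanswered; delete
   the entry after N successive misses. */
static void arp_poll_missed(struct arp_cache_entry *e,
                            const struct arp_timers *t)
{
    if (e->valid && ++e->unanswered >= t->max_polls)
        e->valid = false;
}
```

For proxy ARP, timeout_sec would be on the order of a minute and max_polls would be 2, as suggested above. A host on a very large Ethernet without proxy ARP might be configured with a much longer timeout.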
-
- 2.3.2.2 ARP Packet Queue
-
- The link layer SHOULD save (rather than discard) at least
- one (the latest) packet of each set of packets destined to
- the same unresolved IP address, and transmit the saved
- packet when the address has been resolved.
-
- DISCUSSION:
- Failure to follow this recommendation causes the first
- packet of every exchange to be lost. Although higher-
- layer protocols can generally cope with packet loss by
- retransmission, packet loss does impact performance.
- For example, loss of a TCP open request causes the
- initial round-trip time estimate to be inflated. UDP-
- based applications such as the Domain Name System are
- more seriously affected.
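
A single-slot hold area for each unresolved address is enough to meet this recommendation, provided that each new packet replaces the one already held, so that the latest is kept. This is a minimal sketch with hypothetical names. The caller is assumed to transmit and free the returned packet once ARP completes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-slot hold area for one unresolved IP address. */
struct arp_hold {
    unsigned char *pkt;
    size_t len;
};

/* Save a copy of the packet, replacing (and freeing) any older one,
   so that the latest packet is the one kept. */
static bool arp_hold_packet(struct arp_hold *h, const void *pkt, size_t len)
{
    unsigned char *copy = malloc(len);
    if (copy == NULL)
        return false;          /* caller falls back to dropping it */
    memcpy(copy, pkt, len);
    free(h->pkt);
    h->pkt = copy;
    h->len = len;
    return true;
}

/* Once ARP resolves the address, hand the held packet (NULL if none)
   to the caller, which transmits it and then frees it. */
static unsigned char *arp_take_held(struct arp_hold *h, size_t *len)
{
    unsigned char *p = h->pkt;
    *len = h->len;
    h->pkt = NULL;
    h->len = 0;
    return p;
}
```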
-
- 2.3.3 Ethernet and IEEE 802 Encapsulation
-
- The IP encapsulation for Ethernets is described in RFC-894
- [LINK:3], while RFC-1042 [LINK:4] describes the IP
- encapsulation for IEEE 802 networks. RFC-1042 elaborates and
- replaces the discussion in Section 3.4 of [INTRO:2].
-
- Every Internet host connected to a 10Mbps Ethernet cable:
-
- o MUST be able to send and receive packets using RFC-894
- encapsulation;
-
- o SHOULD be able to receive RFC-1042 packets, intermixed
- with RFC-894 packets; and
-
- o MAY be able to send packets using RFC-1042 encapsulation.
-
-
- An Internet host that implements sending both the RFC-894 and
- the RFC-1042 encapsulations MUST provide a configuration switch
- to select which is sent, and this switch MUST default to RFC-
- 894.
-
- Note that the standard IP encapsulation in RFC-1042 does not
- use the protocol id value (K1=6) that IEEE reserved for IP;
- instead, it uses a value (K1=170) that implies an extension
- (the "SNAP") which can be used to hold the Ether-Type field.
- An Internet system MUST NOT send 802 packets using K1=6.
-
- Address translation from Internet addresses to link-layer
- addresses on Ethernet and IEEE 802 networks MUST be managed by
- the Address Resolution Protocol (ARP).
-
- The MTU for an Ethernet is 1500 and for 802.3 is 1492.
-
- DISCUSSION:
- The IEEE 802.3 specification provides for operation over a
- 10Mbps Ethernet cable, in which case Ethernet and IEEE
- 802.3 frames can be physically intermixed. A receiver can
- distinguish Ethernet and 802.3 frames by the value of the
- 802.3 Length field; this two-octet field coincides in the
- header with the Ether-Type field of an Ethernet frame. In
- particular, the 802.3 Length field must be less than or
- equal to 1500, while all valid Ether-Type values are
- greater than 1500.
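
The Length/Ether-Type test, together with the RFC-1042 SNAP check described earlier in this section, can be sketched as follows. The function and enumeration names are hypothetical. Only IP (Ether-Type 0x0800) is recognized inside SNAP, and frames using K1=6 are not treated as IP.

```c
#include <stddef.h>
#include <stdint.h>

enum frame_kind {
    FRAME_TOO_SHORT,
    FRAME_ETHERNET,       /* RFC-894: field at offset 12 is Ether-Type */
    FRAME_802_SNAP_IP,    /* RFC-1042: 802.3 + LLC/SNAP carrying IP */
    FRAME_802_OTHER       /* any other 802.3 frame */
};

/* Classify a received frame (without preamble or FCS).  The two
   octets after the 6-octet destination and source addresses hold
   either an 802.3 Length (<= 1500) or an Ether-Type (> 1500). */
static enum frame_kind classify_frame(const uint8_t *f, size_t len)
{
    if (len < 14)
        return FRAME_TOO_SHORT;

    unsigned type_or_len = ((unsigned)f[12] << 8) | f[13];
    if (type_or_len > 1500)
        return FRAME_ETHERNET;

    /* 802.2 LLC with K1 = 170 (0xAA) as DSAP and SSAP, control 3
       (UI), then the SNAP header: a zero organization code followed
       by the Ether-Type, 0x0800 for IP. */
    if (len < 22)
        return FRAME_802_OTHER;
    const uint8_t *llc = f + 14;
    if (llc[0] == 0xAA && llc[1] == 0xAA && llc[2] == 0x03 &&
        llc[3] == 0x00 && llc[4] == 0x00 && llc[5] == 0x00 &&
        llc[6] == 0x08 && llc[7] == 0x00)
        return FRAME_802_SNAP_IP;
    return FRAME_802_OTHER;
}
```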
-
- Another compatibility problem arises with link-layer
- broadcasts. A broadcast sent with one framing will not be
- seen by hosts that can receive only the other framing.
-
- The provisions of this section were designed to provide
- direct interoperation between 894-capable and 1042-capable
- systems on the same cable, to the maximum extent possible.
- It is intended to support the present situation where
- 894-only systems predominate, while providing an easy
- transition to a possible future in which 1042-capable
- systems become common.
-
- Note that 894-only systems cannot interoperate directly
- with 1042-only systems. If the two system types are set
- up as two different logical networks on the same cable,
- they can communicate only through an IP gateway.
- Furthermore, it is not useful or even possible for a
- dual-format host to discover automatically which format to
- send, because of the problem of link-layer broadcasts.
-
- 2.4 LINK/INTERNET LAYER INTERFACE
-
- The packet receive interface between the IP layer and the link
- layer MUST include a flag to indicate whether the incoming packet
- was addressed to a link-layer broadcast address.
-
- DISCUSSION:
- Although the IP layer does not generally know link layer
- addresses (since every different network medium typically has
- a different address format), the broadcast address on a
- broadcast-capable medium is an important special case. See
- Section 3.2.2, especially the DISCUSSION concerning broadcast
- storms.
-
- The packet send interface between the IP and link layers MUST
- include the 5-bit TOS field (see Section 3.2.1.6).
-
- The link layer MUST NOT report a Destination Unreachable error to
- IP solely because there is no ARP cache entry for a destination.
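
The three requirements of this section can be pictured as fields of the two interface records plus one rule in the send path. The names below are hypothetical. As Section 1.3.1 allows, this is only one way to realize the required information flow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Receive indication passed from the link layer up to IP. */
struct ll_rx_info {
    const uint8_t *packet;
    size_t len;
    bool link_broadcast;   /* frame was sent to the link broadcast address */
};

/* Send request passed from IP down to the link layer. */
struct ll_tx_request {
    const uint8_t *packet;
    size_t len;
    uint8_t tos;           /* the 5-bit TOS field, Precedence removed */
};

enum ll_status { LL_SENT, LL_HELD_FOR_ARP };

/* The second octet of the IP header carries Precedence in its high
   3 bits and TOS in its low 5 bits (Section 3.2.1.6). */
static struct ll_tx_request ll_make_tx(const uint8_t *pkt, size_t len)
{
    struct ll_tx_request r = { pkt, len, (uint8_t)(pkt[1] & 0x1F) };
    return r;
}

/* Stand-in for the driver's send path: with no ARP cache entry, the
   packet is held while ARP runs (Section 2.3.2.2); this is never
   reported to IP as Destination Unreachable. */
static enum ll_status ll_send(const struct ll_tx_request *r,
                              bool arp_entry_present)
{
    (void)r;   /* a real driver would build and transmit the frame */
    return arp_entry_present ? LL_SENT : LL_HELD_FOR_ARP;
}
```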
-
- 2.5 LINK LAYER REQUIREMENTS SUMMARY
-
-                                                 |       | | | |S| |
-                                                 |       | | | |H| |
-                                                 |       | | | |O|M|F
-                                                 |       | |S| |U|U|o
-                                                 |       | |H| |L|S|o
-                                                 |       |M|O| |D|T|t
-                                                 |       |U|U|M| | |n
-                                                 |       |S|L|A|N|N|o
-                                                 |       |T|D|Y|O|O|t
-FEATURE                                          |SECTION| | | |T|T|e
--------------------------------------------------|-------|-|-|-|-|-|--
-                                                 |       | | | | | |
-Trailer encapsulation                            |2.3.1  | | |x| | |
-Send Trailers by default without negotiation     |2.3.1  | | | | |x|
-ARP                                              |2.3.2  | | | | | |
-  Flush out-of-date ARP cache entries            |2.3.2.1|x| | | | |
-  Prevent ARP floods                             |2.3.2.1|x| | | | |
-  Cache timeout configurable                     |2.3.2.1| |x| | | |
-  Save at least one (latest) unresolved pkt      |2.3.2.2| |x| | | |
-Ethernet and IEEE 802 Encapsulation              |2.3.3  | | | | | |
-  Host able to:                                  |2.3.3  | | | | | |
-    Send & receive RFC-894 encapsulation         |2.3.3  |x| | | | |
-    Receive RFC-1042 encapsulation               |2.3.3  | |x| | | |
-    Send RFC-1042 encapsulation                  |2.3.3  | | |x| | |
-      Then config. sw. to select, RFC-894 dflt   |2.3.3  |x| | | | |
-  Send K1=6 encapsulation                        |2.3.3  | | | | |x|
-  Use ARP on Ethernet and IEEE 802 nets          |2.3.3  |x| | | | |
-Link layer report b'casts to IP layer            |2.4    |x| | | | |
-IP layer pass TOS to link layer                  |2.4    |x| | | | |
-No ARP cache entry treated as Dest. Unreach.     |2.4    | | | | |x|
-
-
-3. INTERNET LAYER PROTOCOLS
-
- 3.1 INTRODUCTION
-
- The Robustness Principle: "Be liberal in what you accept, and
- conservative in what you send" is particularly important in the
- Internet layer, where one misbehaving host can deny Internet
- service to many other hosts.
-
- The protocol standards used in the Internet layer are:
-
- o RFC-791 [IP:1] defines the IP protocol and gives an
- introduction to the architecture of the Internet.
-
- o RFC-792 [IP:2] defines ICMP, which provides routing,
- diagnostic and error functionality for IP. Although ICMP
- messages are encapsulated within IP datagrams, ICMP
- processing is considered to be (and is typically implemented
- as) part of the IP layer. See Section 3.2.2.
-
- o RFC-950 [IP:3] defines the mandatory subnet extension to the
- addressing architecture.
-
- o RFC-1112 [IP:4] defines the Internet Group Management
- Protocol IGMP, as part of a recommended extension to hosts
- and to the host-gateway interface to support Internet-wide
- multicasting at the IP level. See Section 3.2.3.
-
- The target of an IP multicast may be an arbitrary group of
- Internet hosts. IP multicasting is designed as a natural
- extension of the link-layer multicasting facilities of some
- networks, and it provides a standard means for local access
- to such link-layer multicasting facilities.
-
- Other important references are listed in Section 5 of this
- document.
-
- The Internet layer of host software MUST implement both IP and
- ICMP. See Section 3.3.7 for the requirements on support of IGMP.
-
- The host IP layer has two basic functions: (1) choose the "next
- hop" gateway or host for outgoing IP datagrams and (2) reassemble
- incoming IP datagrams. The IP layer may also (3) implement
- intentional fragmentation of outgoing datagrams. Finally, the IP
- layer must (4) provide diagnostic and error functionality. We
- expect that IP layer functions may increase somewhat in the
- future, as further Internet control and management facilities are
- developed.
-
- For normal datagrams, the processing is straightforward. For
- incoming datagrams, the IP layer:
-
- (1) verifies that the datagram is correctly formatted;
-
- (2) verifies that it is destined to the local host;
-
- (3) processes options;
-
- (4) reassembles the datagram if necessary; and
-
- (5) passes the encapsulated message to the appropriate
- transport-layer protocol module.
-
- For outgoing datagrams, the IP layer:
-
- (1) sets any fields not set by the transport layer;
-
- (2) selects the correct first hop on the connected network (a
- process called "routing");
-
- (3) fragments the datagram if necessary and if intentional
- fragmentation is implemented (see Section 3.3.3); and
-
- (4) passes the packet(s) to the appropriate link-layer driver.
-
-
- A host is said to be multihomed if it has multiple IP addresses.
- Multihoming introduces considerable confusion and complexity into
- the protocol suite, and it is an area in which the Internet
- architecture falls seriously short of solving all problems. There
- are two distinct problem areas in multihoming:
-
- (1) Local multihoming -- the host itself is multihomed; or
-
- (2) Remote multihoming -- the local host needs to communicate
- with a remote multihomed host.
-
- At present, remote multihoming MUST be handled at the application
- layer, as discussed in the companion RFC [INTRO:1]. A host MAY
- support local multihoming, which is discussed in this document,
- and in particular in Section 3.3.4.
-
- Any host that forwards datagrams generated by another host is
- acting as a gateway and MUST also meet the specifications laid out
- in the gateway requirements RFC [INTRO:2]. An Internet host that
- includes embedded gateway code MUST have a configuration switch to
- disable the gateway function, and this switch MUST default to the
- non-gateway mode. In this mode, a datagram arriving through one
- interface will not be forwarded to another host or gateway (unless
- it is source-routed), regardless of whether the host is single-
- homed or multihomed. The host software MUST NOT automatically
- move into gateway mode if the host has more than one interface, as
- the operator of the machine may neither want to provide that
- service nor be competent to do so.
-
- In the following, the action specified in certain cases is to
- "silently discard" a received datagram. This means that the
- datagram will be discarded without further processing and that the
- host will not send any ICMP error message (see Section 3.2.2) as a
- result. However, for diagnosis of problems a host SHOULD provide
- the capability of logging the error (see Section 1.2.3), including
- the contents of the silently-discarded datagram, and SHOULD record
- the event in a statistics counter.
-
- DISCUSSION:
- Silent discard of erroneous datagrams is generally intended
- to prevent "broadcast storms".
-
- 3.2 PROTOCOL WALK-THROUGH
-
- 3.2.1 Internet Protocol -- IP
-
- 3.2.1.1 Version Number: RFC-791 Section 3.1
-
- A datagram whose version number is not 4 MUST be silently
- discarded.
-
- 3.2.1.2 Checksum: RFC-791 Section 3.1
-
- A host MUST verify the IP header checksum on every received
- datagram and silently discard every datagram that has a bad
- checksum.
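
The version and checksum tests, combined with the silent-discard behavior defined in Section 3.1, might look like the following minimal sketch. The counter structure is hypothetical. The checksum is the standard Internet checksum, computed over the header only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters for silently discarded datagrams. */
struct ip_stats {
    unsigned long malformed;
    unsigned long bad_version;
    unsigned long bad_checksum;
};

/* Internet checksum over `len` octets (assumed even, as for any IP
   header): the one's complement of the one's complement sum of the
   16-bit words.  Over a header whose checksum field is correct, the
   result is 0. */
static uint16_t ip_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Apply the checks of Sections 3.2.1.1 and 3.2.1.2.  On failure the
   datagram is silently discarded: no ICMP message is generated, but
   the event is counted. */
static bool ip_input_checks(const uint8_t *dg, size_t len, struct ip_stats *st)
{
    if (len < 20) {
        st->malformed++;
        return false;
    }
    if ((dg[0] >> 4) != 4) {                    /* version must be 4 */
        st->bad_version++;
        return false;
    }
    size_t hlen = (size_t)(dg[0] & 0x0F) * 4;   /* IHL, 32-bit words */
    if (hlen < 20 || hlen > len) {
        st->malformed++;
        return false;
    }
    if (ip_checksum(dg, hlen) != 0) {
        st->bad_checksum++;
        return false;
    }
    return true;
}
```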
-
- 3.2.1.3 Addressing: RFC-791 Section 3.2
-
- There are now five classes of IP addresses: Class A through
- Class E. Class D addresses are used for IP multicasting
- [IP:4], while Class E addresses are reserved for
- experimental use.
-
- A multicast (Class D) address is a 28-bit logical address
- that stands for a group of hosts, and may be either
- permanent or transient. Permanent multicast addresses are
- allocated by the Internet Assigned Number Authority
- [INTRO:6], while transient addresses may be allocated
- dynamically to transient groups. Group membership is
- determined dynamically using IGMP [IP:4].
-
- We now summarize the important special cases for Class A, B,
- and C IP addresses, using the following notation for an IP
- address:
-
- { <Network-number>, <Host-number> }
-
- or
- { <Network-number>, <Subnet-number>, <Host-number> }
-
- and the notation "-1" for a field that contains all 1 bits.
- This notation is not intended to imply that the 1-bits in an
- address mask need be contiguous.
-
- (a) { 0, 0 }
-
- This host on this network. MUST NOT be sent, except as
- a source address as part of an initialization procedure
- by which the host learns its own IP address.
-
- See also Section 3.3.6 for a non-standard use of {0,0}.
-
- (b) { 0, <Host-number> }
-
- Specified host on this network. It MUST NOT be sent,
- except as a source address as part of an initialization
- procedure by which the host learns its full IP address.
-
- (c) { -1, -1 }
-
- Limited broadcast. It MUST NOT be used as a source
- address.
-
- A datagram with this destination address will be
- received by every host on the connected physical
- network but will not be forwarded outside that network.
-
- (d) { <Network-number>, -1 }
-
- Directed broadcast to the specified network. It MUST
- NOT be used as a source address.
-
- (e) { <Network-number>, <Subnet-number>, -1 }
-
- Directed broadcast to the specified subnet. It MUST
- NOT be used as a source address.
-
- (f) { <Network-number>, -1, -1 }
-
- Directed broadcast to all subnets of the specified
- subnetted network. It MUST NOT be used as a source
- address.
-
- (g) { 127, <any> }
-
- Internal host loopback address. Addresses of this form
- MUST NOT appear outside a host.
-
- The <Network-number> is administratively assigned so that
- its value will be unique in the entire world.
-
- IP addresses are not permitted to have the value 0 or -1 for
- any of the <Host-number>, <Network-number>, or <Subnet-
- number> fields (except in the special cases listed above).
- This implies that each of these fields will be at least two
- bits long.
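The special cases (a) through (g) can be recognized mechanically. The Python sketch below is illustrative only. It assumes classful network numbers (Class A, B, C) plus an optional subnet mask, and the function names and the short labels it returns are hypothetical. Following the rule just stated, a 0 or -1 field outside the listed cases is reported as "invalid". Like the notation above, it does not assume that the mask's 1-bits are contiguous.

```python
import ipaddress
from typing import Optional

ALL_ONES = 0xFFFFFFFF


def classful_mask(a: int) -> int:
    """Network mask implied by the Class A, B or C leading bits of a 32-bit address."""
    if a >> 31 == 0b0:
        return 0xFF000000   # Class A: 8-bit network number
    if a >> 30 == 0b10:
        return 0xFFFF0000   # Class B: 16-bit network number
    if a >> 29 == 0b110:
        return 0xFFFFFF00   # Class C: 24-bit network number
    raise ValueError("not a Class A, B or C address")


def classify(address: str, subnet_mask: Optional[str] = None) -> str:
    """Label an address with its special case "a".."g", or "invalid", "unicast", etc."""
    a = int(ipaddress.IPv4Address(address))
    if a == ALL_ONES:
        return "c"                                  # { -1, -1 }: limited broadcast
    if a >> 28 == 0b1110:
        return "multicast"                          # Class D
    if a >> 28 == 0b1111:
        return "class-e"
    if a >> 24 == 127:
        return "g"                                  # { 127, <any> }: internal loopback
    net_mask = classful_mask(a)
    sub_mask = int(ipaddress.IPv4Address(subnet_mask)) if subnet_mask else net_mask
    host_bits = ~sub_mask & ALL_ONES                # the Host-number field
    subnet_bits = sub_mask & ~net_mask & ALL_ONES   # the Subnet-number field (0 if unsubnetted)
    if (a & net_mask) == 0:
        return "a" if a == 0 else "b"               # Network-number 0: "this network"
    host, subnet = a & host_bits, a & subnet_bits
    if host == host_bits:                           # Host-number is -1: a directed broadcast
        if not subnet_bits:
            return "d"                              # { <Network-number>, -1 }
        if subnet == subnet_bits:
            return "f"                              # { <Network-number>, -1, -1 }
        return "invalid" if subnet == 0 else "e"    # { <Network>, <Subnet>, -1 }
    if host == 0 or (subnet_bits and subnet in (0, subnet_bits)):
        return "invalid"                            # 0 or -1 outside the special cases
    return "unicast"
```

For example, with mask 255.255.255.0, `classify("128.10.3.255", "255.255.255.0")` yields the subnet-directed broadcast case "e".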
-
- For further discussion of broadcast addresses, see Section
- 3.3.6.
-
- A host MUST support the subnet extensions to IP [IP:3]. As
- a result, there will be an address mask of the form:
- {-1, -1, 0} associated with each of the host's local IP
- addresses; see Sections 3.2.2.9 and 3.3.1.1.
-
- When a host sends any datagram, the IP source address MUST
- be one of its own IP addresses (but not a broadcast or
- multicast address).
-
- A host MUST silently discard an incoming datagram that is
- not destined for the host. An incoming datagram is destined
- for the host if the datagram's destination address field is:
-
- (1) (one of) the host's IP address(es); or
-
- (2) an IP broadcast address valid for the connected
- network; or
-
- (3) the address for a multicast group of which the host is
- a member on the incoming physical interface.
-
- For most purposes, a datagram addressed to a broadcast or
- multicast destination is processed as if it had been
- addressed to one of the host's IP addresses; we use the term
- "specific-destination address" for the equivalent local IP
- address of the host. The specific-destination address is
- defined to be the destination address in the IP header
- unless the header contains a broadcast or multicast address,
- in which case the specific-destination is an IP address
- assigned to the physical interface on which the datagram
- arrived.
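A minimal sketch of the acceptance test and the specific-destination rule, assuming the host tracks, per interface, the IP broadcast addresses valid on the connected network and the multicast groups joined there. The Interface record and the function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Interface:
    """Hypothetical per-interface state; the field names are illustrative only."""
    address: str                                        # IP address assigned to this interface
    broadcasts: Set[str] = field(default_factory=set)   # IP broadcast addresses valid here
    groups: Set[str] = field(default_factory=set)       # multicast groups joined on this interface


def specific_destination(dst: str, arrived_on: Interface,
                         host_addresses: Set[str]) -> Optional[str]:
    """Return the specific-destination address, or None if the datagram is not for this host."""
    if dst in host_addresses:
        return dst                      # rule (1): one of the host's own IP addresses
    if dst in arrived_on.broadcasts or dst in arrived_on.groups:
        return arrived_on.address       # rules (2)/(3): the arrival interface's address
    return None                         # not destined for this host: discard silently
```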
-
- A host MUST silently discard an incoming datagram containing
- an IP source address that is invalid by the rules of this
- section. This validation could be done either in the IP
- layer or by each protocol in the transport layer.
-
- DISCUSSION:
- A mis-addressed datagram might be caused by a link-
- layer broadcast of a unicast datagram or by a gateway
- or host that is confused or mis-configured.
-
- An architectural goal for Internet hosts was to allow
- IP addresses to be featureless 32-bit numbers, avoiding
- algorithms that required a knowledge of the IP address
- format. Otherwise, any future change in the format or
- interpretation of IP addresses will require host
- software changes. However, validation of broadcast and
- multicast addresses violates this goal; a few other
- violations are described elsewhere in this document.
-
- Implementers should be aware that applications
- depending upon the all-subnets directed broadcast
- address (f) may be unusable on some networks. All-
- subnets broadcast is not widely implemented in vendor
- gateways at present, and even when it is implemented, a
- particular network administration may disable it in the
- gateway configuration.
-
- 3.2.1.4 Fragmentation and Reassembly: RFC-791 Section 3.2
-
- The Internet model requires that every host support
- reassembly. See Sections 3.3.2 and 3.3.3 for the
- requirements on fragmentation and reassembly.
-
- 3.2.1.5 Identification: RFC-791 Section 3.2
-
- When sending an identical copy of an earlier datagram, a
- host MAY optionally retain the same Identification field in
- the copy.
-
- DISCUSSION:
- Some Internet protocol experts have maintained that
- when a host sends an identical copy of an earlier
- datagram, the new copy should contain the same
- Identification value as the original. There are two
- suggested advantages: (1) if the datagrams are
- fragmented and some of the fragments are lost, the
- receiver may be able to reconstruct a complete datagram
- from fragments of the original and the copies; (2) a
- congested gateway might use the IP Identification field
- (and Fragment Offset) to discard duplicate datagrams
- from the queue.
-
- However, the observed patterns of datagram loss in the
- Internet do not favor the probability of retransmitted
- fragments filling reassembly gaps, while other
- mechanisms (e.g., TCP repacketizing upon
- retransmission) tend to prevent retransmission of an
- identical datagram [IP:9]. Therefore, we believe that
- retransmitting the same Identification field is not
- useful. Also, a connectionless transport protocol like
- UDP would require the cooperation of the application
- programs to retain the same Identification value in
- identical datagrams.
-
- 3.2.1.6 Type-of-Service: RFC-791 Section 3.2
-
- The "Type-of-Service" byte in the IP header is divided into
- two sections: the Precedence field (high-order 3 bits), and
- a field that is customarily called "Type-of-Service" or
- "TOS" (low-order 5 bits). In this document, all references
- to "TOS" or the "TOS field" refer to the low-order 5 bits
- only.
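Splitting the byte into its two parts is simple bit arithmetic. A sketch, with hypothetical helper names:

```python
from typing import Tuple


def split_tos_byte(byte: int) -> Tuple[int, int]:
    """Split the IP header's Type-of-Service byte into (Precedence, TOS)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("the Type-of-Service byte is 8 bits")
    precedence = byte >> 5      # high-order 3 bits
    tos = byte & 0x1F           # low-order 5 bits: "TOS" in this document
    return precedence, tos


def join_tos_byte(precedence: int, tos: int) -> int:
    """Inverse of split_tos_byte."""
    if not (0 <= precedence <= 0x7 and 0 <= tos <= 0x1F):
        raise ValueError("Precedence is 3 bits and TOS is 5 bits")
    return (precedence << 5) | tos
```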
-
- The Precedence field is intended for Department of Defense
- applications of the Internet protocols. The use of non-zero
- values in this field is outside the scope of this document
- and the IP standard specification. Vendors should consult
- the Defense Communications Agency (DCA) for guidance on the
- IP Precedence field and its implications for other protocol
- layers. However, vendors should note that the use of
- precedence will most likely require that its value be passed
- between protocol layers in just the same way as the TOS
- field is passed.
-
- The IP layer MUST provide a means for the transport layer to
- set the TOS field of every datagram that is sent; the
- default is all zero bits. The IP layer SHOULD pass received
- TOS values up to the transport layer.
-
- The particular link-layer mappings of TOS contained in RFC-
- 795 SHOULD NOT be implemented.
-
- DISCUSSION:
- While the TOS field has been little used in the past,
- it is expected to play an increasing role in the near
- future. The TOS field is expected to be used to
- control two aspects of gateway operations: routing and
- queueing algorithms. See Section 2 of [INTRO:1] for
- the requirements on application programs to specify TOS
- values.
-
- The TOS field may also be mapped into link-layer
- service selectors. This has been applied to provide
- effective sharing of serial lines by different classes
- of TCP traffic, for example. However, the mappings
- suggested in RFC-795 for networks that were included in
- the Internet as of 1981 are now obsolete.
-
- 3.2.1.7 Time-to-Live: RFC-791 Section 3.2
-
- A host MUST NOT send a datagram with a Time-to-Live (TTL)
- value of zero.
-
- A host MUST NOT discard a datagram just because it was
- received with TTL less than 2.
-
- The IP layer MUST provide a means for the transport layer to
- set the TTL field of every datagram that is sent. When a
- fixed TTL value is used, it MUST be configurable. The
- current suggested value will be published in the "Assigned
- Numbers" RFC.
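With a BSD-style sockets API, this means of setting the TTL is commonly exposed to applications as the IP_TTL socket option. A minimal Python sketch, assuming a platform that defines `socket.IP_TTL` (Linux and the BSDs do). The helper name is hypothetical, and the range check reflects the 8-bit field plus the rule against sending a TTL of zero.

```python
import socket


def udp_socket_with_ttl(ttl: int) -> socket.socket:
    """Open a UDP socket whose outgoing datagrams carry the given TTL."""
    if not 1 <= ttl <= 255:
        # the TTL field is 8 bits, and a host MUST NOT send a TTL of zero
        raise ValueError("TTL must be in 1..255")
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    return s
```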
-
- DISCUSSION:
- The TTL field has two functions: limit the lifetime of
- TCP segments (see RFC-793 [TCP:1], p. 28), and
- terminate Internet routing loops. Although TTL is a
- time in seconds, it also has some attributes of a hop-
- count, since each gateway is required to reduce the TTL
- field by at least one.
-
- The intent is that TTL expiration will cause a datagram
- to be discarded by a gateway but not by the destination
- host; however, hosts that act as gateways by forwarding
- datagrams must follow the gateway rules for TTL.
-
- A higher-layer protocol may want to set the TTL in
- order to implement an "expanding scope" search for some
- Internet resource. This is used by some diagnostic
- tools, and is expected to be useful for locating the
- "nearest" server of a given class using IP
- multicasting, for example. A particular transport
- protocol may also want to specify its own TTL bound on
- maximum datagram lifetime.
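An expanding-scope search can be sketched as a loop over increasing TTL values. Both the doubling schedule and the probe callback, which stands in for "send a query with this TTL and wait briefly for an answer", are assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional


def expanding_scope_search(probe: Callable[[int], bool],
                           ttls: Iterable[int] = (1, 2, 4, 8, 16, 32)) -> Optional[int]:
    """Return the first TTL for which the (hypothetical) probe gets an answer, else None."""
    for ttl in ttls:
        if probe(ttl):      # a responder lies within `ttl` hops
            return ttl
    return None             # nothing answered within the largest scope tried
```

Because the scope only grows, the first answer tends to come from a nearby server, which is the "nearest server" behavior described above.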
-
- A fixed value must be at least big enough for the
- Internet "diameter," i.e., the longest possible path.
- A reasonable value is about twice the diameter, to
- allow for continued Internet growth.
-
- 3.2.1.8 Options: RFC-791 Section 3.2
-
- There MUST be a means for the transport layer to specify IP
- options to be included in transmitted IP datagrams (see
- Section 3.4).
-
- All IP options (except NOP or END-OF-LIST) received in
- datagrams MUST be passed to the transport layer (or to ICMP
- processing when the datagram is an ICMP message). The IP
- and transport layers MUST each interpret those IP options
- that they understand and silently ignore the others.
-
- Later sections of this document discuss specific IP option
- support required by each of ICMP, TCP, and UDP.
-
- DISCUSSION:
- Passing all received IP options to the transport layer
- is a deliberate "violation of strict layering" that is
- designed to ease the introduction of new transport-
- relevant IP options in the future. Each layer must
- pick out any options that are relevant to its own
- processing and ignore the rest. For this purpose,
- every IP option except NOP and END-OF-LIST will include
- a specification of its own length.
-
- This document does not define the order in which a
- receiver must process multiple options in the same IP
- header. Hosts sending multiple options must be aware
- that this introduces an ambiguity in the meaning of
- certain options when combined with a source-route
- option.
-
- IMPLEMENTATION:
- The IP layer must not crash as the result of an option
- length that is outside the possible range. For
- example, erroneous option lengths have been observed to
- put some IP implementations into infinite loops.
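One way to meet this requirement is to validate every length octet before using it. The sketch below assumes the caller passes only the options area of the header. It skips NOP, stops at END-OF-LIST, returns (type, data) pairs, and raises ValueError on a malformed length so that the caller can discard the datagram. The function name and that error policy are illustrative, not prescribed here.

```python
from typing import List, Tuple

END_OF_LIST, NOP = 0, 1


def parse_ip_options(options: bytes) -> List[Tuple[int, bytes]]:
    """Split the options area of an IP header into (option type, option data) pairs."""
    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == END_OF_LIST:
            break                              # no more options
        if kind == NOP:
            i += 1                             # single-octet padding
            continue
        if i + 1 >= len(options):
            raise ValueError(f"option {kind} has no length octet")
        length = options[i + 1]                # counts the type and length octets too
        if length < 2 or i + length > len(options):
            raise ValueError(f"option {kind} has impossible length {length}")
        parsed.append((kind, bytes(options[i + 2:i + length])))
        i += length                            # advances by at least 2, so the loop terminates
    return parsed
```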
-
- Here are the requirements for specific IP options:
-
-
- (a) Security Option
-
- Some environments require the Security option in every
- datagram; such a requirement is outside the scope of
- this document and the IP standard specification. Note,
- however, that the security options described in RFC-791
- and RFC-1038 are obsolete. For DoD applications,
- vendors should consult [IP:8] for guidance.
-
-
- (b) Stream Identifier Option
-
- This option is obsolete; it SHOULD NOT be sent, and it
- MUST be silently ignored if received.
-
-
- (c) Source Route Options
-
- A host MUST support originating a source route and MUST
- be able to act as the final destination of a source
- route.
-
- If a host receives a datagram containing a completed
- source route (i.e., the pointer points beyond the last
- field), the datagram has reached its final destination;
- the option as received (the recorded route) MUST be
- passed up to the transport layer (or to ICMP message
- processing). This recorded route will be reversed and
- used to form a return source route for reply datagrams
- (see discussion of IP Options in Section 4). When a
- return source route is built, it MUST be correctly
- formed even if the recorded route included the source
- host (see case (B) in the discussion below).
-
- An IP header containing more than one Source Route
- option MUST NOT be sent; the effect on routing of
- multiple Source Route options is implementation-
- specific.
-
- Section 3.3.5 presents the rules for a host acting as
- an intermediate hop in a source route, i.e., forwarding
- a source-routed datagram.
-
- DISCUSSION:
- If a source-routed datagram is fragmented, each
- fragment will contain a copy of the source route.
- Since the processing of IP options (including a
- source route) must precede reassembly, the
- original datagram will not be reassembled until
- the final destination is reached.
-
- Suppose a source routed datagram is to be routed
- from host S to host D via gateways G1, G2, ... Gn.
- There was an ambiguity in the specification over
- whether the source route option in a datagram sent
- out by S should be (A) or (B):
-
- (A): {>>G2, G3, ... Gn, D} <--- CORRECT
-
- (B): {S, >>G2, G3, ... Gn, D} <---- WRONG
-
- (where >> represents the pointer). If (A) is
- sent, the datagram received at D will contain the
- option: {G1, G2, ... Gn >>}, with S and D as the
- IP source and destination addresses. If (B) were
- sent, the datagram received at D would again
- contain S and D as the same IP source and
- destination addresses, but the option would be:
- {S, G1, ...Gn >>}; i.e., the originating host
- would be the first hop in the route.
-
-
- (d) Record Route Option
-
- Implementation of originating and processing the Record
- Route option is OPTIONAL.
-
-
- (e) Timestamp Option
-
- Implementation of originating and processing the
- Timestamp option is OPTIONAL. If it is implemented,
- the following rules apply:
-
- o The originating host MUST record a timestamp in a
- Timestamp option whose Internet address fields are
- not pre-specified or whose first pre-specified
- address is the host's interface address.
-
- o The destination host MUST (if possible) add the
- current timestamp to a Timestamp option before
- passing the option to the transport layer or to
- ICMP for processing.
-
- o A timestamp value MUST follow the rules given in
- Section 3.2.2.8 for the ICMP Timestamp message.
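The return-route construction required in (c) can be sketched with the symbolic names S and G1 ... Gn from its discussion. The sketch assumes the recorded route has already been extracted from the option as a list of addresses. Pointer handling and the wire format are omitted, and the function name is hypothetical. A leading copy of the source host (form (B)) is dropped so that the reply route stays correctly formed.

```python
from typing import List, Tuple


def return_source_route(ip_source: str, recorded: List[str]) -> Tuple[str, List[str]]:
    """Build (first hop, option addresses) for a reply that retraces a recorded route."""
    hops = list(recorded)
    if hops and hops[0] == ip_source:
        hops = hops[1:]                  # tolerate form (B): S is not a hop
    if not hops:
        return ip_source, []             # no gateways were traversed: reply directly
    reverse = hops[::-1]                 # [Gn, ..., G1]
    # Form (A): the first hop goes in the IP destination field, and the option
    # lists the remaining hops, ending with the final destination (S).
    return reverse[0], reverse[1:] + [ip_source]
```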
-
-
- 3.2.2 Internet Control Message Protocol -- ICMP
-
- ICMP messages are grouped into two classes.
-
- * ICMP error messages:
-
- Destination Unreachable (see Section 3.2.2.1)
- Redirect (see Section 3.2.2.2)
- Source Quench (see Section 3.2.2.3)
- Time Exceeded (see Section 3.2.2.4)
- Parameter Problem (see Section 3.2.2.5)
-
-
- * ICMP query messages:
-
- Echo (see Section 3.2.2.6)
- Information (see Section 3.2.2.7)
- Timestamp (see Section 3.2.2.8)
- Address Mask (see Section 3.2.2.9)
-
-
- If an ICMP message of unknown type is received, it MUST be
- silently discarded.
-
- Every ICMP error message includes the Internet header and at
- least the first 8 data octets of the datagram that triggered
- the error; more than 8 octets MAY be sent; this header and data
- MUST be unchanged from the received datagram.
-
- In those cases where the Internet layer is required to pass an
- ICMP error message to the transport layer, the IP protocol
- number MUST be extracted from the original header and used to
- select the appropriate transport protocol entity to handle the
- error.
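Because the quoted header carries its own IHL and Protocol fields, the demultiplexing step can be sketched as below. The sketch assumes an IPv4 header is quoted immediately after the 8-octet ICMP header, as in the RFC-792 error formats. The function name is hypothetical.

```python
from typing import Tuple

ICMP_HEADER_LEN = 8   # type, code, checksum, and 4 type-specific octets


def offending_datagram(icmp_error: bytes) -> Tuple[int, bytes]:
    """Return (IP protocol number, first 8 data octets) of the datagram quoted in an ICMP error."""
    inner = icmp_error[ICMP_HEADER_LEN:]
    if len(inner) < 20 or inner[0] >> 4 != 4:
        raise ValueError("no IPv4 header quoted in the ICMP error")
    ihl = (inner[0] & 0x0F) * 4             # quoted header length, options included
    if ihl < 20 or len(inner) < ihl + 8:
        raise ValueError("quoted header or its first 8 data octets are truncated")
    protocol = inner[9]                     # selects the transport entity (6 = TCP, 17 = UDP)
    return protocol, inner[ihl:ihl + 8]     # for TCP and UDP these octets hold the ports
```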
-
- An ICMP error message SHOULD be sent with normal (i.e., zero)
- TOS bits.
-
- An ICMP error message MUST NOT be sent as the result of
- receiving:
-
- * an ICMP error message, or
-
- * a datagram destined to an IP broadcast or IP multicast
- address, or
-
- * a datagram sent as a link-layer broadcast, or
-
- * a non-initial fragment, or
-
- * a datagram whose source address does not define a single
- host -- e.g., a zero address, a loopback address, a
- broadcast address, a multicast address, or a Class E
- address.
-
- NOTE: THESE RESTRICTIONS TAKE PRECEDENCE OVER ANY REQUIREMENT
- ELSEWHERE IN THIS DOCUMENT FOR SENDING ICMP ERROR MESSAGES.
-
- DISCUSSION:
- These rules will prevent the "broadcast storms" that have
- resulted from hosts returning ICMP error messages in
- response to broadcast datagrams. For example, a broadcast
- UDP segment to a non-existent port could trigger a flood
- of ICMP Destination Unreachable datagrams from all
- machines that do not have a client for that destination
- port. On a large Ethernet, the resulting collisions can
- render the network useless for a second or more.
-
- Every datagram that is broadcast on the connected network
- should have a valid IP broadcast address as its IP
- destination (see Section 3.3.6). However, some hosts
- violate this rule. To be certain to detect broadcast
- datagrams, therefore, hosts are required to check for a
- link-layer broadcast as well as an IP-layer broadcast
- address.
-
- IMPLEMENTATION:
- This requires that the link layer inform the IP layer when
- a link-layer broadcast datagram has been received; see
- Section 2.4.
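The restrictions above can be collected into one predicate. A sketch, assuming the IP layer already knows the directed-broadcast addresses of its connected networks and receives the link-layer broadcast flag described in the IMPLEMENTATION note. Treating all of 0.0.0.0/8 as a "zero address" is an interpretation, and the record and function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Set

ICMP = 1                                # IP protocol number of ICMP
ICMP_ERROR_TYPES = {3, 4, 5, 11, 12}    # Dest Unreachable, Source Quench, Redirect,
                                        # Time Exceeded, Parameter Problem (RFC-792 types)
LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")
NOT_A_SINGLE_HOST = [ipaddress.IPv4Network(n) for n in
                     ("0.0.0.0/8", "127.0.0.0/8", "224.0.0.0/4", "240.0.0.0/4")]


@dataclass
class Received:
    """Hypothetical summary of a received datagram, as seen by the IP layer."""
    src: str
    dst: str
    protocol: int = 17
    icmp_type: Optional[int] = None
    fragment_offset: int = 0
    link_broadcast: bool = False        # flag supplied by the link layer


def may_send_icmp_error(d: Received, ip_broadcasts: Set[str]) -> bool:
    """Apply the "MUST NOT send an ICMP error" rules; ip_broadcasts lists directed broadcasts."""
    src, dst = ipaddress.IPv4Address(d.src), ipaddress.IPv4Address(d.dst)
    if d.protocol == ICMP and d.icmp_type in ICMP_ERROR_TYPES:
        return False                    # never answer an ICMP error with another one
    if dst == LIMITED_BROADCAST or dst.is_multicast or d.dst in ip_broadcasts:
        return False                    # IP broadcast or multicast destination
    if d.link_broadcast:
        return False                    # link-layer broadcast, whatever the IP destination
    if d.fragment_offset != 0:
        return False                    # non-initial fragment
    if d.src in ip_broadcasts or any(src in net for net in NOT_A_SINGLE_HOST):
        return False                    # zero, loopback, broadcast, multicast or Class E source
    return True
```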
-
- 3.2.2.1 Destination Unreachable: RFC-792
-
- The following additional codes are hereby defined:
-
- 6 = destination network unknown
-
- 7 = destination host unknown
-
- 8 = source host isolated
-
- 9 = communication with destination network
- administratively prohibited
-
- 10 = communication with destination host
- administratively prohibited
-
- 11 = network unreachable for type of service
-
- 12 = host unreachable for type of service
-
- A host SHOULD generate Destination Unreachable messages with
- code:
-
- 2 (Protocol Unreachable), when the designated transport
- protocol is not supported; or
-
- 3 (Port Unreachable), when the designated transport
- protocol (e.g., UDP) is unable to demultiplex the
- datagram but has no protocol mechanism to inform the
- sender.
-
- A Destination Unreachable message that is received MUST be
- reported to the transport layer. The transport layer SHOULD
- use the information appropriately; for example, see Sections
- 4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol
- that has its own mechanism for notifying the sender that a
- port is unreachable (e.g., TCP, which sends RST segments)
- MUST nevertheless accept an ICMP Port Unreachable for the
- same purpose.
-
- A Destination Unreachable message that is received with code
- 0 (Net), 1 (Host), or 5 (Bad Source Route) may result from a
- routing transient and MUST therefore be interpreted as only
- a hint, not proof, that the specified destination is
- unreachable [IP:11]. For example, it MUST NOT be used as
- proof of a dead gateway (see Section 3.3.1).
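The full code list (codes 0 through 5 from RFC-792, 6 through 12 from this section) and the hint-only rule fit in a small table. The identifiers are hypothetical names, not taken from any particular implementation.

```python
from enum import IntEnum


class DestUnreachable(IntEnum):
    """ICMP Destination Unreachable codes."""
    NET_UNREACHABLE = 0
    HOST_UNREACHABLE = 1
    PROTOCOL_UNREACHABLE = 2
    PORT_UNREACHABLE = 3
    FRAGMENTATION_NEEDED = 4        # RFC-792: fragmentation needed and DF set
    SOURCE_ROUTE_FAILED = 5
    DEST_NETWORK_UNKNOWN = 6
    DEST_HOST_UNKNOWN = 7
    SOURCE_HOST_ISOLATED = 8
    NET_ADMIN_PROHIBITED = 9
    HOST_ADMIN_PROHIBITED = 10
    NET_UNREACHABLE_FOR_TOS = 11
    HOST_UNREACHABLE_FOR_TOS = 12


# Codes that may stem from a routing transient: a hint, never proof of unreachability.
HINT_ONLY = {DestUnreachable.NET_UNREACHABLE,
             DestUnreachable.HOST_UNREACHABLE,
             DestUnreachable.SOURCE_ROUTE_FAILED}


def is_hint_only(code: int) -> bool:
    """True for codes that MUST NOT be taken as proof (e.g. of a dead gateway)."""
    return code in HINT_ONLY
```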
-
- 3.2.2.2 Redirect: RFC-792
-
- A host SHOULD NOT send an ICMP Redirect message; Redirects
- are to be sent only by gateways.
-
- A host receiving a Redirect message MUST update its routing
- information accordingly. Every host MUST be prepared to
-
- accept both Host and Network Redirects and to process them
- as described in Section 3.3.1.2 below.
-
- A Redirect message SHOULD be silently discarded if the new
- gateway address it specifies is not on the same connected
- (sub-) net through which the Redirect arrived [INTRO:2,
- Appendix A], or if the source of the Redirect is not the
- current first-hop gateway for the specified destination (see
- Section 3.3.1).
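The two discard conditions can be sketched as a predicate. It assumes the receiving (sub)net is known as an address/mask pair and that the host can look up its current first-hop gateway for the destination. All names are hypothetical.

```python
import ipaddress


def redirect_is_acceptable(new_gateway: str, redirect_source: str,
                           current_first_hop: str, arrival_subnet: str) -> bool:
    """Return False if the Redirect should be silently discarded.

    arrival_subnet is the connected (sub)net the Redirect arrived on, written
    as "address/mask", e.g. "10.1.2.0/255.255.255.0".
    """
    subnet = ipaddress.IPv4Network(arrival_subnet)
    if ipaddress.IPv4Address(new_gateway) not in subnet:
        return False    # proposed gateway is not on the (sub)net the Redirect came from
    if ipaddress.IPv4Address(redirect_source) != ipaddress.IPv4Address(current_first_hop):
        return False    # only the current first-hop gateway may redirect this host
    return True
```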
-
- 3.2.2.3 Source Quench: RFC-792
-
- A host MAY send a Source Quench message if it is
- approaching, or has reached, the point at which it is forced
- to discard incoming datagrams due to a shortage of
- reassembly buffers or other resources. See Section 2.2.3 of
- [INTRO:2] for suggestions on when to send Source Quench.
-
- If a Source Quench message is received, the IP layer MUST
- report it to the transport layer (or ICMP processing). In
- general, the transport or application layer SHOULD implement
- a mechanism to respond to Source Quench for any protocol
- that can send a sequence of datagrams to the same
- destination and which can reasonably be expected to maintain
- enough state information to make this feasible. See Section
- 4 for the handling of Source Quench by TCP and UDP.
-
- DISCUSSION:
- A Source Quench may be generated by the target host or
- by some gateway in the path of a datagram. The host
- receiving a Source Quench should throttle itself back
- for a period of time, then gradually increase the
- transmission rate again. The mechanism to respond to
- Source Quench may be in the transport layer (for
- connection-oriented protocols like TCP) or in the
- application layer (for protocols that are built on top
- of UDP).
-
- A mechanism has been proposed [IP:14] to make the IP
- layer respond directly to Source Quench by controlling
- the rate at which datagrams are sent; however, this
- proposal is experimental and is not currently
- recommended.
-
- 3.2.2.4 Time Exceeded: RFC-792
-
- An incoming Time Exceeded message MUST be passed to the
- transport layer.
-
- DISCUSSION:
- A gateway will send a Time Exceeded Code 0 (In Transit)
- message when it discards a datagram due to an expired
- TTL field. This indicates either a gateway routing
- loop or too small an initial TTL value.
-
- A host may receive a Time Exceeded Code 1 (Reassembly
- Timeout) message from a destination host that has timed
- out and discarded an incomplete datagram; see Section
- 3.3.2 below. In the future, receipt of this message
- might be part of some "MTU discovery" procedure, to
- discover the maximum datagram size that can be sent on
- the path without fragmentation.
-
- 3.2.2.5 Parameter Problem: RFC-792
-
- A host SHOULD generate Parameter Problem messages. An
- incoming Parameter Problem message MUST be passed to the
- transport layer, and it MAY be reported to the user.
-
- DISCUSSION:
- The ICMP Parameter Problem message is sent to the
- source host for any problem not specifically covered by
- another ICMP message. Receipt of a Parameter Problem
- message generally indicates some local or remote
- implementation error.
-
- A new variant on the Parameter Problem message is hereby
- defined:
- Code 1 = required option is missing.
-
- DISCUSSION:
- This variant is currently in use in the military
- community for a missing security option.
-
- 3.2.2.6 Echo Request/Reply: RFC-792
-
- Every host MUST implement an ICMP Echo server function that
- receives Echo Requests and sends corresponding Echo Replies.
- A host SHOULD also implement an application-layer interface
- for sending an Echo Request and receiving an Echo Reply, for
- diagnostic purposes.
-
- An ICMP Echo Request destined to an IP broadcast or IP
- multicast address MAY be silently discarded.
-
- DISCUSSION:
- This neutral provision results from a passionate debate
- between those who feel that ICMP Echo to a broadcast
- address provides a valuable diagnostic capability and
- those who feel that misuse of this feature can too
- easily create packet storms.
-
- The IP source address in an ICMP Echo Reply MUST be the same
- as the specific-destination address (defined in Section
- 3.2.1.3) of the corresponding ICMP Echo Request message.
-
- Data received in an ICMP Echo Request MUST be entirely
- included in the resulting Echo Reply. However, if sending
- the Echo Reply requires intentional fragmentation that is
- not implemented, the datagram MUST be truncated to maximum
- transmission size (see Section 3.3.3) and sent.
-
- Echo Reply messages MUST be passed to the ICMP user
- interface, unless the corresponding Echo Request originated
- in the IP layer.
-
- If a Record Route and/or Time Stamp option is received in an
- ICMP Echo Request, this option (these options) SHOULD be
- updated to include the current host and included in the IP
- header of the Echo Reply message, without "truncation".
- Thus, the recorded route will be for the entire round trip.
-
- If a Source Route option is received in an ICMP Echo
- Request, the return route MUST be reversed and used as a
- Source Route option for the Echo Reply message.
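The ICMP portion of an Echo Reply can be derived from the request by changing the type and recomputing the checksum, which RFC-792 defines as the 16-bit one's complement of the one's complement sum of the message. The sketch keeps the identifier, sequence number, and all data unchanged. Choosing the reply's IP source address (the specific-destination address) and reversing any source route happen in the IP layer and are not shown. Function names are hypothetical.

```python
ECHO_REPLY, ECHO_REQUEST = 0, 8


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                           # pad an odd-length message with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF


def echo_reply_for(request: bytes) -> bytes:
    """Turn an ICMP Echo Request message into the matching Echo Reply message."""
    if len(request) < 8 or request[0] != ECHO_REQUEST:
        raise ValueError("not an ICMP Echo Request")
    reply = bytearray(request)                    # identifier, sequence and ALL data are kept
    reply[0] = ECHO_REPLY
    reply[2:4] = b"\x00\x00"                      # checksum is computed with the field zeroed
    reply[2:4] = internet_checksum(bytes(reply)).to_bytes(2, "big")
    return bytes(reply)
```

A receiver can check any such message by confirming that `internet_checksum` over the whole message, checksum included, returns 0.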
-
- 3.2.2.7 Information Request/Reply: RFC-792
-
- A host SHOULD NOT implement these messages.
-
- DISCUSSION:
- The Information Request/Reply pair was intended to
- support self-configuring systems such as diskless
- workstations, to allow them to discover their IP
- network numbers at boot time. However, the RARP and
- BOOTP protocols provide better mechanisms for a host to
- discover its own IP address.
-
- 3.2.2.8 Timestamp and Timestamp Reply: RFC-792
-
- A host MAY implement Timestamp and Timestamp Reply. If they
- are implemented, the following rules MUST be followed.
-
- o The ICMP Timestamp server function returns a Timestamp
- Reply to every Timestamp message that is received. If
- this function is implemented, it SHOULD be designed for
- minimum variability in delay (e.g., implemented in the
- kernel to avoid delay in scheduling a user process).
-
- The following cases for Timestamp are to be handled
- according to the corresponding rules for ICMP Echo:
-
- o An ICMP Timestamp Request message to an IP broadcast or
- IP multicast address MAY be silently discarded.
-
- o The IP source address in an ICMP Timestamp Reply MUST
- be the same as the specific-destination address of the
- corresponding Timestamp Request message.
-
- o If a Source Route option is received in an ICMP Timestamp
- Request, the return route MUST be reversed and used as
- a Source Route option for the Timestamp Reply message.
-
- o If a Record Route and/or Timestamp option is received
- in a Timestamp Request, this (these) option(s) SHOULD
- be updated to include the current host and included in
- the IP header of the Timestamp Reply message.
-
- o Incoming Timestamp Reply messages MUST be passed up to
- the ICMP user interface.
-
- The preferred form for a timestamp value (the "standard
- value") is in units of milliseconds since midnight Universal
- Time. However, it may be difficult to provide this value
- with millisecond resolution. For example, many systems use
- clocks that update only at line frequency, 50 or 60 times
- per second. Therefore, some latitude is allowed in a
- "standard value":
-
- (a) A "standard value" MUST be updated at least 15 times
- per second (i.e., at most the six low-order bits of the
- value may be undefined).
-
- (b) The accuracy of a "standard value" MUST approximate
- that of operator-set CPU clocks, i.e., correct within a
- few minutes.
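On a system whose clock keeps POSIX time, in which every day has exactly 86400 seconds, the standard value reduces to a modulo operation. A sketch under that assumption, with a hypothetical function name. Meeting rules (a) and (b) still depends on the resolution and setting of the underlying clock.

```python
import time
from typing import Optional

MS_PER_DAY = 24 * 60 * 60 * 1000


def standard_timestamp(now: Optional[float] = None) -> int:
    """Milliseconds since midnight Universal Time (the "standard value")."""
    if now is None:
        now = time.time()       # POSIX time: seconds since the epoch, no leap seconds
    return int(now * 1000) % MS_PER_DAY
```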
-
- 3.2.2.9 Address Mask Request/Reply: RFC-950
-
- A host MUST support the first, and MAY implement all three,
- of the following methods for determining the address mask(s)
- corresponding to its IP address(es):
-
- (1) static configuration information;
-
- (2) obtaining the address mask(s) dynamically as a side-
- effect of the system initialization process (see
- [INTRO:1]); and
-
- (3) sending ICMP Address Mask Request(s) and receiving ICMP
- Address Mask Reply(ies).
-
- The choice of method to be used in a particular host MUST be
- configurable.
-
- When method (3), the use of Address Mask messages, is
- enabled, then:
-
- (a) When it initializes, the host MUST broadcast an Address
- Mask Request message on the connected network
- corresponding to the IP address. It MUST retransmit
- this message a small number of times if it does not
- receive an immediate Address Mask Reply.
-
- (b) Until it has received an Address Mask Reply, the host
- SHOULD assume a mask appropriate for the address class
- of the IP address, i.e., assume that the connected
- network is not subnetted.
-
- (c) The first Address Mask Reply message received MUST be
- used to set the address mask corresponding to the
- particular local IP address. This is true even if the
- first Address Mask Reply message is "unsolicited", in
- which case it will have been broadcast and may arrive
- after the host has ceased to retransmit Address Mask
- Requests. Once the mask has been set by an Address
- Mask Reply, later Address Mask Reply messages MUST be
- (silently) ignored.
-
- Conversely, if Address Mask messages are disabled, then no
- ICMP Address Mask Requests will be sent, and any ICMP
- Address Mask Replies received for that local IP address MUST
- be (silently) ignored.
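Rules (b) and (c) and the disabled case amount to a small per-address state machine. A sketch with hypothetical names, assuming the configuration states whether method (3) is enabled and may supply a statically configured mask. Without one, the classful mask of rule (b) is assumed.

```python
import ipaddress
from typing import Optional


def classful_mask(address: str) -> str:
    """Mask of an unsubnetted Class A, B or C network containing `address`."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        return "255.0.0.0"
    if first_octet < 192:
        return "255.255.0.0"
    if first_octet < 224:
        return "255.255.255.0"
    raise ValueError("not a Class A, B or C address")


class AddressMaskState:
    """Address-mask bookkeeping for one local IP address (hypothetical structure)."""

    def __init__(self, address: str, use_mask_messages: bool,
                 configured_mask: Optional[str] = None):
        self.use_mask_messages = use_mask_messages      # is method (3) enabled?
        # a static mask if configured; otherwise, per rule (b), assume no subnetting
        self.mask = configured_mask or classful_mask(address)
        self.set_by_reply = False

    def on_address_mask_reply(self, mask: str) -> bool:
        """Handle a received Address Mask Reply; return True if its mask was installed."""
        if not self.use_mask_messages or self.set_by_reply:
            return False            # disabled, or not the first Reply: silently ignore
        self.mask = mask            # the first Reply wins, even if it was unsolicited
        self.set_by_reply = True
        return True
```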
-
- A host SHOULD make some reasonableness check on any address
- mask it installs; see IMPLEMENTATION section below.
-
- A system MUST NOT send an Address Mask Reply unless it is an
- authoritative agent for address masks. An authoritative
- agent may be a host or a gateway, but it MUST be explicitly
- configured as an address mask agent. Receiving an address
- mask via an Address Mask Reply does not give the receiver
- authority and MUST NOT be used as the basis for issuing
- Address Mask Replies.
-
- With a statically configured address mask, there SHOULD be
- an additional configuration flag that determines whether the
- host is to act as an authoritative agent for this mask,
- i.e., whether it will answer Address Mask Request messages
- using this mask.
-
- If it is configured as an agent, the host MUST broadcast an
- Address Mask Reply for the mask on the appropriate interface
- when it initializes.
-
- See "System Initialization" in [INTRO:1] for more
- information about the use of Address Mask Request/Reply
- messages.
-
- DISCUSSION:
- Hosts that casually send Address Mask Replies with
- invalid address masks have often been a serious
- nuisance. To prevent this, Address Mask Replies ought
- to be sent only by authoritative agents that have been
- selected by explicit administrative action.
-
- When an authoritative agent receives an Address Mask
- Request message, it will send a unicast Address Mask
- Reply to the source IP address. If the network part of
- this address is zero (see (a) and (b) in 3.2.1.3), the
- Reply will be broadcast.
-
- Getting no reply to its Address Mask Request messages,
- a host will assume there is no agent and use an
- unsubnetted mask, but the agent may be only temporarily
- unreachable. An agent will broadcast an unsolicited
- Address Mask Reply whenever it initializes, in order to
- update the masks of all hosts that have initialized in
- the meantime.
-
- IMPLEMENTATION:
- The following reasonableness check on an address mask
- is suggested: the mask is not all 1 bits, and it is
- either zero or else the 8 highest-order bits are on.
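The suggested check translates directly into code. A sketch with a hypothetical function name. Note that, like the notation in Section 3.2.1.3, it does not require the mask's 1-bits to be contiguous.

```python
import ipaddress


def mask_is_reasonable(mask: str) -> bool:
    """Not all 1 bits, and either zero or the 8 highest-order bits are on."""
    m = int(ipaddress.IPv4Address(mask))
    if m == 0xFFFFFFFF:
        return False
    return m == 0 or (m >> 24) == 0xFF
```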
-
- 3.2.3 Internet Group Management Protocol -- IGMP
-
- IGMP [IP:4] is a protocol used between hosts and gateways on a
- single network to establish hosts' membership in particular
- multicast groups. The gateways use this information, in
- conjunction with a multicast routing protocol, to support IP
- multicasting across the Internet.
-
- At this time, implementation of IGMP is OPTIONAL; see Section
- 3.3.7 for more information. Without IGMP, a host can still
- participate in multicasting local to its connected networks.
-
- 3.3 SPECIFIC ISSUES
-
- 3.3.1 Routing Outbound Datagrams
-
- The IP layer chooses the correct next hop for each datagram it
- sends. If the destination is on a connected network, the
- datagram is sent directly to the destination host; otherwise,
- it has to be routed to a gateway on a connected network.
-
- 3.3.1.1 Local/Remote Decision
-
- To decide if the destination is on a connected network, the
- following algorithm MUST be used [see IP:3]:
-
- (a) The address mask (particular to a local IP address for
- a multihomed host) is a 32-bit mask that selects the
- network number and subnet number fields of the
- corresponding IP address.
-
- (b) If the IP destination address bits extracted by the
- address mask match the IP source address bits extracted
- by the same mask, then the destination is on the
- corresponding connected network, and the datagram is to
- be transmitted directly to the destination host.
-
- (c) If not, then the destination is accessible only through
- a gateway. Selection of a gateway is described below
- (3.3.1.2).
-
- A special-case destination address is handled as follows:
-
- * For a limited broadcast or a multicast address, simply
- pass the datagram to the link layer for the appropriate
- interface.
-
- * For a (network or subnet) directed broadcast, the
- datagram can use the standard routing algorithms.
-
- The host IP layer MUST operate correctly in a minimal
- network environment, and in particular, when there are no
- gateways. For example, if the IP layer of a host insists on
- finding at least one gateway to initialize, the host will be
- unable to operate on a single isolated broadcast net.
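The local/remote decision and the special-case addresses can be sketched as below, assuming a single local address and mask (the multihomed case would repeat the test per local address). The function name and return labels are hypothetical. Note that no gateway is consulted on the "direct" path, so a host on an isolated network with no gateways still works.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def local_remote_decision(src: str, mask: str, dst: str) -> str:
    """Hypothetical helper returning 'link', 'direct', or 'gateway'."""
    d = ipaddress.IPv4Address(dst)
    # Limited broadcast or multicast: pass straight to the link layer.
    if d == LIMITED_BROADCAST or d.is_multicast:
        return "link"
    m = int(ipaddress.IPv4Address(mask))
    s = int(ipaddress.IPv4Address(src))
    # (b) Network/subnet bits match under the mask: send directly.
    # A directed broadcast to the local subnet also lands here.
    if (int(d) & m) == (s & m):
        return "direct"
    # (c) Otherwise only a gateway can reach it (including directed
    # broadcasts to remote networks, which use normal routing).
    return "gateway"


print(local_remote_decision("10.1.2.5", "255.255.255.0", "10.1.2.9"))   # direct
print(local_remote_decision("10.1.2.5", "255.255.255.0", "10.1.3.9"))   # gateway
print(local_remote_decision("10.1.2.5", "255.255.255.0", "224.0.0.1"))  # link
```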
-
- 3.3.1.2 Gateway Selection
-
- To efficiently route a series of datagrams to the same
- destination, the source host MUST keep a "route cache" of
- mappings to next-hop gateways. A host uses the following
- basic algorithm on this cache to route a datagram; this
- algorithm is designed to put the primary routing burden on
- the gateways [IP:11].
-
- (a) If the route cache contains no information for a
- particular destination, the host chooses a "default"
- gateway and sends the datagram to it. It also builds a
- corresponding Route Cache entry.
-
- (b) If that gateway is not the best next hop to the
- destination, the gateway will forward the datagram to
- the best next-hop gateway and return an ICMP Redirect
- message to the source host.
-
- (c) When it receives a Redirect, the host updates the
- next-hop gateway in the appropriate route cache entry,
- so later datagrams to the same destination will go
- directly to the best gateway.
-
- Since the subnet mask appropriate to the destination address
- is generally not known, a Network Redirect message SHOULD be
- treated identically to a Host Redirect message; i.e., the
- cache entry for the destination host (only) would be updated
- (or created, if an entry for that host did not exist) for
- the new gateway.
-
- DISCUSSION:
- This recommendation is to protect against gateways that
- erroneously send Network Redirects for a subnetted
- network, in violation of the gateway requirements
- [INTRO:2].
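Steps (a) to (c), together with the rule that a Network Redirect is treated like a Host Redirect, might look like the following sketch. It assumes a cache keyed only on destination host address and simply takes the first listed default gateway; the class and method names are hypothetical.

```python
from __future__ import annotations


class RouteCache:
    """Hypothetical host route cache keyed on destination host address."""

    def __init__(self, default_gateways: list[str]):
        self.defaults = list(default_gateways)  # more than one default allowed
        self.entries: dict[str, str] = {}       # destination host -> next hop

    def next_hop(self, dst: str) -> str | None:
        # (a) No entry yet: choose a default gateway and record it.
        if dst not in self.entries:
            if not self.defaults:
                return None                     # no gateway known at all
            self.entries[dst] = self.defaults[0]
        return self.entries[dst]

    def on_redirect(self, dst: str, new_gateway: str,
                    network_redirect: bool = False) -> None:
        # (c) Point this destination host (only) at the better gateway.
        # network_redirect is deliberately ignored: a Network Redirect is
        # handled exactly like a Host Redirect, creating the entry if needed.
        self.entries[dst] = new_gateway


cache = RouteCache(["10.0.0.1", "10.0.0.2"])
print(cache.next_hop("192.0.2.7"))                  # 10.0.0.1
cache.on_redirect("192.0.2.7", "10.0.0.2", network_redirect=True)
print(cache.next_hop("192.0.2.7"))                  # 10.0.0.2
print(cache.next_hop("192.0.2.8"))                  # 10.0.0.1
```

Other hosts on the redirected destination's network keep their own entries, which is the point of treating the Redirect as host-specific.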
-
- When there is no route cache entry for the destination host
- address (and the destination is not on the connected
- network), the IP layer MUST pick a gateway from its list of
- "default" gateways. The IP layer MUST support multiple
- default gateways.
-
- As an extra feature, a host IP layer MAY implement a table
- of "static routes". Each such static route MAY include a
- flag specifying whether it may be overridden by ICMP
- Redirects.
-
- DISCUSSION:
- A host generally needs to know at least one default
- gateway to get started. This information can be
- obtained from a configuration file or else from the
- host startup sequence, e.g., the BOOTP protocol (see
- [INTRO:1]).
-
- It has been suggested that a host can augment its list
- of default gateways by recording any new gateways it
- learns about. For example, it can record every gateway
- to which it is ever redirected. Such a feature, while
- possibly useful in some circumstances, may cause
- problems in other cases (e.g., gateways are not all
- equal), and it is not recommended.
-
- A static route is typically a particular preset mapping
- from destination host or network into a particular
- next-hop gateway; it might also depend on the Type-of-
- Service (see next section). Static routes would be set
- up by system administrators to override the normal
- automatic routing mechanism, to handle exceptional
- situations. However, any static routing information is
- a potential source of failure as configurations change
- or equipment fails.
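One way to combine static routes carrying an "overridable by Redirect" flag with Redirect-learned routes and the default list is sketched below. The most-specific-match rule and all names are assumptions for illustration, not requirements of this document.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass
class StaticRoute:
    network: ipaddress.IPv4Network  # destination host (/32) or network
    gateway: str
    overridable: bool = True        # may an ICMP Redirect replace this route?


class StaticRouteTable:
    """Hypothetical static routes layered over Redirect-learned routes."""

    def __init__(self, routes: list[StaticRoute], defaults: list[str]):
        self.routes = list(routes)
        self.defaults = list(defaults)
        self.redirected: dict[str, str] = {}

    def _static_for(self, dst: str) -> StaticRoute | None:
        addr = ipaddress.IPv4Address(dst)
        hits = [r for r in self.routes if addr in r.network]
        # Assumption: the most specific static route wins.
        return max(hits, key=lambda r: r.network.prefixlen, default=None)

    def on_redirect(self, dst: str, gateway: str) -> None:
        route = self._static_for(dst)
        if route is not None and not route.overridable:
            return                      # pinned route: ignore the Redirect
        self.redirected[dst] = gateway

    def next_hop(self, dst: str) -> str | None:
        if dst in self.redirected:
            return self.redirected[dst]
        route = self._static_for(dst)
        if route is not None:
            return route.gateway
        return self.defaults[0] if self.defaults else None


table = StaticRouteTable(
    [StaticRoute(ipaddress.IPv4Network("192.0.2.0/24"), "10.0.0.9", overridable=False)],
    ["10.0.0.1"])
table.on_redirect("192.0.2.7", "10.0.0.2")
print(table.next_hop("192.0.2.7"))   # 10.0.0.9 (Redirect ignored)
```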
-
- 3.3.1.3 Route Cache
-
- Each route cache entry needs to include the following
- fields:
-
- (1) Local IP address (for a multihomed host)
-
- (2) Destination IP address
-
- (3) Type(s)-of-Service
-
- (4) Next-hop gateway IP address
-
- Field (2) MAY be the full IP address of the destination
- host, or only the destination network number. Field (3),
- the TOS, SHOULD be included.
-
- See Section 3.3.4.2 for a discussion of the implications of
- multihoming for the lookup procedure in this cache.
-
- DISCUSSION:
- Including the Type-of-Service field in the route cache
- and considering it in the host route algorithm will
- provide the necessary mechanism for the future when
- Type-of-Service routing is commonly used in the
- Internet. See Section 3.2.1.6.
-
- Each route cache entry defines the endpoints of an
- Internet path. Although the connecting path may change
- dynamically in an arbitrary way, the transmission
- characteristics of the path tend to remain
- approximately constant over a time period longer than a
- single typical host-host transport connection.
- Therefore, a route cache entry is a natural place to
- cache data on the properties of the path. Examples of
- such properties might be the maximum unfragmented
- datagram size (see Section 3.3.3), or the average
- round-trip delay measured by a transport protocol.
- This data will generally be both gathered and used by a
- higher layer protocol, e.g., by TCP, or by an
- application using UDP. Experiments are currently in
- progress on caching path properties in this manner.
-
- There is no consensus on whether the route cache should
- be keyed on destination host addresses alone, or allow
- both host and network addresses. Those who favor the
- use of only host addresses argue that:
-
- (1) As required in Section 3.3.1.2, Redirect messages
- will generally result in entries keyed on
- destination host addresses; the simplest and most
- general scheme would be to use host addresses
- always.
-
- (2) The IP layer may not always know the address mask
- for a network address in a complex subnetted
- environment.
-
- (3) The use of only host addresses allows the
- destination address to be used as a pure 32-bit
- number, which may allow the Internet architecture
- to be more easily extended in the future without
- any change to the hosts.
-
- The opposing view is that allowing a mixture of
- destination hosts and networks in the route cache:
-
- (1) Saves memory space.
-
- (2) Leads to a simpler data structure, easily
- combining the cache with the tables of default and
- static routes (see below).
-
- (3) Provides a more useful place to cache path
- properties, as discussed earlier.
-
-
- IMPLEMENTATION:
- The cache needs to be large enough to include entries
- for the maximum number of destination hosts that may be
- in use at one time.
-
- A route cache entry may also include control
- information used to choose an entry for replacement.
- This might take the form of a "recently used" bit, a
- use count, or a last-used timestamp, for example. It
- is recommended that it include the time of last
- modification of the entry, for diagnostic purposes.
-
- An implementation may wish to reduce the overhead of
- scanning the route cache for every datagram to be
- transmitted. This may be accomplished with a hash
- table to speed the lookup, or by giving a connection-
- oriented transport protocol a "hint" or temporary
- handle on the appropriate cache entry, to be passed to
- the IP layer with each subsequent datagram.
-
- Although we have described the route cache, the lists
- of default gateways, and a table of static routes as
- conceptually distinct, in practice they may be combined
- into a single "routing table" data structure.
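The four route cache fields, the suggested replacement and diagnostic fields, and a hashed lookup can be sketched as follows. The sketch assumes least-recently-used replacement driven by a logical use counter; names are hypothetical.

```python
from __future__ import annotations

import itertools
import time
from dataclasses import dataclass, field


@dataclass
class RouteCacheEntry:
    local_addr: str    # (1) local IP address (multihomed host)
    dest: str          # (2) destination host (or network) address
    tos: int           # (3) Type-of-Service
    next_hop: str      # (4) next-hop gateway IP address
    last_used: int = 0                                       # replacement control
    last_modified: float = field(default_factory=time.time)  # for diagnostics

    def update_next_hop(self, gateway: str) -> None:
        self.next_hop = gateway
        self.last_modified = time.time()   # record the modification time


class RouteCacheTable:
    """Hypothetical hash-table route cache with least-recently-used replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: dict[tuple[str, str, int], RouteCacheEntry] = {}
        self._clock = itertools.count()    # logical clock for last_used

    def lookup(self, local_addr: str, dest: str, tos: int) -> RouteCacheEntry | None:
        entry = self.entries.get((local_addr, dest, tos))
        if entry is not None:
            entry.last_used = next(self._clock)
        return entry

    def insert(self, entry: RouteCacheEntry) -> None:
        key = (entry.local_addr, entry.dest, entry.tos)
        if key not in self.entries and len(self.entries) >= self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k].last_used)
            del self.entries[victim]
        entry.last_used = next(self._clock)
        self.entries[key] = entry
```

A connection-oriented transport could keep the entry object returned by `lookup` as the "hint" handle and pass it back to the IP layer with each datagram, skipping the hash lookup.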
-
- 3.3.1.4 Dead Gateway Detection
-
- The IP layer MUST be able to detect the failure of a "next-
- hop" gateway that is listed in its route cache and to choose
- an alternate gateway (see Section 3.3.1.5).
-
- Dead gateway detection is covered in some detail in RFC-816
- [IP:11]. Experience to date has not produced a complete
- algorithm which is totally satisfactory, though it has
- identified several forbidden paths and promising techniques.
-
- * A particular gateway SHOULD NOT be used indefinitely in
- the absence of positive indications that it is
- functioning.
-
- * Active probes such as "pinging" (i.e., using an ICMP
- Echo Request/Reply exchange) are expensive and scale
- poorly. In particular, hosts MUST NOT actively check
- the status of a first-hop gateway by simply pinging the
- gateway continuously.
-
- * Even when it is the only effective way to verify a
- gateway's status, pinging MUST be used only when
- traffic is being sent to the gateway and when there is
- no other positive indication to suggest that the
- gateway is functioning.
-
- * To avoid pinging, the layers above and/or below the
- Internet layer SHOULD be able to give "advice" on the
- status of route cache entries when either positive
- (gateway OK) or negative (gateway dead) information is
- available.
-
-
- DISCUSSION:
- If an implementation does not include an adequate
- mechanism for detecting a dead gateway and re-routing,
- a gateway failure may cause datagrams to apparently
- vanish into a "black hole". This failure can be
- extremely confusing for users and difficult for network
- personnel to debug.
-
- The dead-gateway detection mechanism must not cause
- unacceptable load on the host, on connected networks,
- or on first-hop gateway(s). The exact constraints on
- the timeliness of dead gateway detection and on
- acceptable load may vary somewhat depending on the
- nature of the host's mission, but a host generally
- needs to detect a failed first-hop gateway quickly
- enough that transport-layer connections will not break
- before an alternate gateway can be selected.
-
- Passing advice from other layers of the protocol stack
- complicates the interfaces between the layers, but it
- is the preferred approach to dead gateway detection.
- Advice can come from almost any part of the IP/TCP
- architecture, but it is expected to come primarily from
- the transport and link layers. Here are some possible
- sources for gateway advice:
-
- o TCP or any connection-oriented transport protocol
- should be able to give negative advice, e.g.,
- triggered by excessive retransmissions.
-
- o TCP may give positive advice when (new) data is
- acknowledged. Even though the route may be
- asymmetric, an ACK for new data proves that the
-                   acknowledged data must have been transmitted
- successfully.
-
- o An ICMP Redirect message from a particular gateway
- should be used as positive advice about that
- gateway.
-
- o Link-layer information that reliably detects and
- reports host failures (e.g., ARPANET Destination
- Dead messages) should be used as negative advice.
-
- o Failure to ARP or to re-validate ARP mappings may
- be used as negative advice for the corresponding
- IP address.
-
- o Packets arriving from a particular link-layer
- address are evidence that the system at this
- address is alive. However, turning this
- information into advice about gateways requires
- mapping the link-layer address into an IP address,
- and then checking that IP address against the
- gateways pointed to by the route cache. This is
- probably prohibitively inefficient.
-
- Note that positive advice that is given for every
- datagram received may cause unacceptable overhead in
- the implementation.
-
- While advice might be passed using required arguments
- in all interfaces to the IP layer, some transport and
- application layer protocols cannot deduce the correct
- advice. These interfaces must therefore allow a
- neutral value for advice, since either always-positive
- or always-negative advice leads to incorrect behavior.
-
- There is another technique for dead gateway detection
- that has been commonly used but is not recommended.
- This technique depends upon the host passively
- receiving ("wiretapping") the Interior Gateway Protocol
- (IGP) datagrams that the gateways are broadcasting to
- each other. This approach has the drawback that a host
- needs to recognize all the interior gateway protocols
- that gateways may use (see [INTRO:2]). In addition, it
- only works on a broadcast network.
-
- At present, pinging (i.e., using ICMP Echo messages) is
- the mechanism for gateway probing when absolutely
- required. A successful ping guarantees that the
- addressed interface and its associated machine are up,
- but it does not guarantee that the machine is a gateway
- as opposed to a host. The normal inference is that if
- a Redirect or other evidence indicates that a machine
- was a gateway, successful pings will indicate that the
- machine is still up and hence still a gateway.
- However, since a host silently discards packets that a
- gateway would forward or redirect, this assumption
- could sometimes fail. To avoid this problem, a new
- ICMP message under development will ask "are you a
- gateway?"
-
- IMPLEMENTATION:
- The following specific algorithm has been suggested:
-
- o Associate a "reroute timer" with each gateway
- pointed to by the route cache. Initialize the
- timer to a value Tr, which must be small enough to
- allow detection of a dead gateway before transport
- connections time out.
-
- o Positive advice would reset the reroute timer to
- Tr. Negative advice would reduce or zero the
- reroute timer.
-
- o Whenever the IP layer used a particular gateway to
- route a datagram, it would check the corresponding
- reroute timer. If the timer had expired (reached
- zero), the IP layer would send a ping to the
- gateway, followed immediately by the datagram.
-
- o The ping (ICMP Echo) would be sent again if
- necessary, up to N times. If no ping reply was
- received in N tries, the gateway would be assumed
- to have failed, and a new first-hop gateway would
- be chosen for all cache entries pointing to the
- failed gateway.
-
- Note that the size of Tr is inversely related to the
- amount of advice available. Tr should be large enough
-                 to ensure that:
-
- * Any pinging will be at a low level (e.g., <10%) of
- all packets sent to a gateway from the host, AND
-
- * pinging is infrequent (e.g., every 3 minutes)
-
- Since the recommended algorithm is concerned with the
- gateways pointed to by route cache entries, rather than
- the cache entries themselves, a two level data
- structure (perhaps coordinated with ARP or similar
- caches) may be desirable for implementing a route
- cache.
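The suggested reroute-timer algorithm, including a neutral advice value, can be sketched per gateway as below. For testability the clock is passed in explicitly, and `ping` is a hypothetical callback that sends one ICMP Echo and reports whether a reply arrived. Unlike the text, this sketch pings synchronously rather than sending the datagram immediately after the first ping.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class Advice(Enum):
    POSITIVE = "gateway OK"
    NEGATIVE = "gateway dead"
    NEUTRAL = "no opinion"      # for layers that cannot deduce advice


class GatewayMonitor:
    """Hypothetical per-gateway reroute timer, driven by an explicit clock."""

    def __init__(self, tr: float, max_pings: int,
                 ping: Callable[[], bool], now: float = 0.0):
        self.tr = tr                # reload value Tr, in seconds
        self.max_pings = max_pings  # N
        self.ping = ping            # sends one ICMP Echo; True if a reply came back
        self.expires_at = now + tr

    def advise(self, advice: Advice, now: float) -> None:
        if advice is Advice.POSITIVE:
            self.expires_at = now + self.tr   # reset the timer to Tr
        elif advice is Advice.NEGATIVE:
            self.expires_at = now             # zero the timer
        # NEUTRAL leaves the timer untouched.

    def usable(self, now: float) -> bool:
        """Call each time the gateway is used; False means 'assume it failed'."""
        if now < self.expires_at:
            return True
        for _ in range(self.max_pings):       # up to N Echo Requests
            if self.ping():
                self.advise(Advice.POSITIVE, now)
                return True
        return False
```

When `usable` returns False, the caller would choose a new first-hop gateway for every cache entry that pointed at this one.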
-
- 3.3.1.5 New Gateway Selection
-
- If the failed gateway is not the current default, the IP
- layer can immediately switch to a default gateway. If it is
- the current default that failed, the IP layer MUST select a
- different default gateway (assuming more than one default is
- known) for the failed route and for establishing new routes.
-
- DISCUSSION:
- When a gateway does fail, the other gateways on the
- connected network will learn of the failure through
- some inter-gateway routing protocol. However, this
- will not happen instantaneously, since gateway routing
- protocols typically have a settling time of 30-60
- seconds. If the host switches to an alternative
- gateway before the gateways have agreed on the failure,
- the new target gateway will probably forward the
- datagram to the failed gateway and send a Redirect back
- to the host pointing to the failed gateway (!). The
- result is likely to be a rapid oscillation in the
- contents of the host's route cache during the gateway
- settling period. It has been proposed that the dead-
- gateway logic should include some hysteresis mechanism
- to prevent such oscillations. However, experience has
- not shown any harm from such oscillations, since
- service cannot be restored to the host until the
- gateways' routing information does settle down.
-
- IMPLEMENTATION:
- One implementation technique for choosing a new default
- gateway is to simply round-robin among the default
- gateways in the host's list. Another is to rank the
- gateways in priority order, and when the current
- default gateway is not the highest priority one, to
- "ping" the higher-priority gateways slowly to detect
- when they return to service. This pinging can be at a
- very low rate, e.g., 0.005 per second.
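The priority-ranking technique might be sketched as follows. The sketch assumes index 0 is the highest-priority default and that the caller pings the gateways returned by `to_probe` on a slow timer. All names are hypothetical.

```python
from __future__ import annotations

PROBE_INTERVAL_S = 200.0   # 0.005 pings per second, the very low rate mentioned above


class DefaultGatewayList:
    """Hypothetical priority-ordered default gateways (index 0 = highest)."""

    def __init__(self, gateways: list[str]):
        self.gateways = list(gateways)
        self.failed: set[str] = set()
        self.current = 0

    def current_gateway(self) -> str:
        return self.gateways[self.current]

    def mark_failed(self, gw: str) -> None:
        self.failed.add(gw)
        if gw != self.current_gateway():
            return                       # a non-current gateway failed
        for i, candidate in enumerate(self.gateways):
            if candidate not in self.failed:
                self.current = i         # highest-priority survivor
                return
        # Every known default has failed: keep the current one.

    def to_probe(self) -> list[str]:
        """Higher-priority gateways to ping every PROBE_INTERVAL_S seconds."""
        return self.gateways[: self.current]

    def mark_recovered(self, gw: str) -> None:
        self.failed.discard(gw)
        i = self.gateways.index(gw)
        if i < self.current:
            self.current = i             # switch back to the preferred gateway
```

The round-robin alternative would replace the search in `mark_failed` with `self.current = (self.current + 1) % len(self.gateways)` and need no probing.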
-
- 3.3.1.6 Initialization
-
- The following information MUST be configurable:
-
- (1) IP address(es).
-
- (2) Address mask(s).
-
- (3) A list of default gateways, with a preference level.
-
- A manual method of entering this configuration data MUST be
- provided. In addition, a variety of methods can be used to
- determine this information dynamically; see the section on
- "Host Initialization" in [INTRO:1].
-
- DISCUSSION:
- Some host implementations use "wiretapping" of gateway
- protocols on a broadcast network to learn what gateways
- exist. A standard method for default gateway discovery
- is under development.
-
- 3.3.2 Reassembly
-
- The IP layer MUST implement reassembly of IP datagrams.
-
- We designate the largest datagram size that can be reassembled
- by EMTU_R ("Effective MTU to receive"); this is sometimes
- called the "reassembly buffer size". EMTU_R MUST be greater
- than or equal to 576, SHOULD be either configurable or
- indefinite, and SHOULD be greater than or equal to the MTU of
- the connected network(s).
-
- DISCUSSION:
- A fixed EMTU_R limit should not be built into the code
- because some application layer protocols require EMTU_R
- values larger than 576.
-
- IMPLEMENTATION:
- An implementation may use a contiguous reassembly buffer
- for each datagram, or it may use a more complex data
- structure that places no definite limit on the reassembled
- datagram size; in the latter case, EMTU_R is said to be
- "indefinite".
-
- Logically, reassembly is performed by simply copying each
- fragment into the packet buffer at the proper offset.
- Note that fragments may overlap if successive
- retransmissions use different packetizing but the same
- reassembly Id.
-
- The tricky part of reassembly is the bookkeeping to
- determine when all bytes of the datagram have been
- reassembled. We recommend Clark's algorithm [IP:10] that
- requires no additional data space for the bookkeeping.
- However, note that, contrary to [IP:10], the first
- fragment header needs to be saved for inclusion in a
- possible ICMP Time Exceeded (Reassembly Timeout) message.
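The hole-descriptor bookkeeping of Clark's algorithm can be sketched as below. Two simplifications apply: offsets are in octets (real IP fragment offsets count 8-octet units), and the hole list is an ordinary Python list rather than being stored inside the reassembly buffer as [IP:10] does. The buffer grows as needed, so EMTU_R is effectively indefinite here.

```python
from __future__ import annotations


class Reassembler:
    """Hypothetical hole-list reassembly in the style of Clark's algorithm."""

    def __init__(self) -> None:
        self.buffer = bytearray()
        self.holes: list[tuple[int, float]] = [(0, float("inf"))]  # inclusive
        self.first_header: bytes | None = None  # kept for ICMP Time Exceeded

    def add_fragment(self, offset: int, data: bytes, more_fragments: bool,
                     header: bytes = b"") -> bool:
        """Add one fragment; return True once the datagram is complete."""
        first, last = offset, offset + len(data) - 1
        if offset == 0:
            self.first_header = header
        new_holes = []
        for h_first, h_last in self.holes:
            if first > h_last or last < h_first:
                new_holes.append((h_first, h_last))   # hole untouched
                continue
            if first > h_first:
                new_holes.append((h_first, first - 1))
            if last < h_last and more_fragments:
                new_holes.append((last + 1, h_last))
        self.holes = new_holes
        # Copy the data at its offset; overlapping fragments simply overwrite.
        end = offset + len(data)
        if len(self.buffer) < end:
            self.buffer.extend(bytes(end - len(self.buffer)))
        self.buffer[offset:end] = data
        return not self.holes


r = Reassembler()
r.add_fragment(8, b"BBBBBBBB", more_fragments=True)
r.add_fragment(16, b"CC", more_fragments=False)
print(r.add_fragment(0, b"AAAAAAAA", more_fragments=True, header=b"hdr"))  # True
```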
-
- There MUST be a mechanism by which the transport layer can
- learn MMS_R, the maximum message size that can be received and
- reassembled in an IP datagram (see GET_MAXSIZES calls in
- Section 3.4). If EMTU_R is not indefinite, then the value of
- MMS_R is given by:
-
- MMS_R = EMTU_R - 20
-
- since 20 is the minimum size of an IP header.
-
- There MUST be a reassembly timeout. The reassembly timeout
- value SHOULD be a fixed value, not set from the remaining TTL.
- It is recommended that the value lie between 60 seconds and 120
- seconds. If this timeout expires, the partially-reassembled
- datagram MUST be discarded and an ICMP Time Exceeded message
- sent to the source host (if fragment zero has been received).
-
- DISCUSSION:
- The IP specification says that the reassembly timeout
- should be the remaining TTL from the IP header, but this
- does not work well because gateways generally treat TTL as
- a simple hop count rather than an elapsed time. If the
- reassembly timeout is too small, datagrams will be
- discarded unnecessarily, and communication may fail. The
- timeout needs to be at least as large as the typical
- maximum delay across the Internet. A realistic minimum
- reassembly timeout would be 60 seconds.
-
- It has been suggested that a cache might be kept of
- round-trip times measured by transport protocols for
- various destinations, and that these values might be used
- to dynamically determine a reasonable reassembly timeout
- value. Further investigation of this approach is
- required.
-
- If the reassembly timeout is set too high, buffer
- resources in the receiving host will be tied up too long,
- and the MSL (Maximum Segment Lifetime) [TCP:1] will be
- larger than necessary. The MSL controls the maximum rate
- at which fragmented datagrams can be sent using distinct
- values of the 16-bit Ident field; a larger MSL lowers the
- maximum rate. The TCP specification [TCP:1] arbitrarily
- assumes a value of 2 minutes for MSL. This sets an upper
- limit on a reasonable reassembly timeout value.
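The MMS_R formula and a fixed reassembly timeout can be sketched as follows. It assumes each pending reassembly is recorded with its start time and, once fragment zero has arrived, that fragment's saved header. The 60-second constant is one choice within the recommended 60 to 120 second range; the function names are hypothetical.

```python
from __future__ import annotations

IP_MIN_HEADER = 20           # minimum IP header size, in octets
REASSEMBLY_TIMEOUT_S = 60.0  # fixed; the text recommends 60 to 120 seconds


def mms_r(emtu_r: int | None) -> int | None:
    """MMS_R = EMTU_R - 20; None stands for an 'indefinite' EMTU_R."""
    return None if emtu_r is None else emtu_r - IP_MIN_HEADER


def expire_reassemblies(pending: dict, now: float,
                        timeout: float = REASSEMBLY_TIMEOUT_S) -> list[bytes]:
    """Drop timed-out partial datagrams.

    pending maps a reassembly key to (start_time, first_fragment_header),
    where the header is None until fragment zero has arrived. Returns the
    saved headers for which an ICMP Time Exceeded should be sent.
    """
    expired = [key for key, (start, _) in pending.items() if now - start >= timeout]
    to_report = []
    for key in expired:
        _, header = pending.pop(key)    # discard the partial datagram
        if header is not None:          # only if fragment zero was received
            to_report.append(header)
    return to_report


print(mms_r(576))   # 556
```

The timeout is deliberately independent of the TTL in the fragments, for the reason given in the discussion above.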
-
- 3.3.3 Fragmentation
-
- Optionally, the IP layer MAY implement a mechanism to fragment
- outgoing datagrams intentionally.
-
- We designate by EMTU_S ("Effective MTU for sending") the
- maximum IP datagram size that may be sent, for a particular
- combination of IP source and destination addresses and perhaps
- TOS.
-
- A host MUST implement a mechanism to allow the transport layer
- to learn MMS_S, the maximum transport-layer message size that
- may be sent for a given {source, destination, TOS} triplet (see
- GET_MAXSIZES call in Section 3.4). If no local fragmentation
- is performed, the value of MMS_S will be:
-
- MMS_S = EMTU_S - <IP header size>
-
- and EMTU_S must be less than or equal to the MTU of the network
- interface corresponding to the source address of the datagram.
- Note that <IP header size> in this equation will be 20, unless
- the IP reserves space to insert IP options for its own purposes
- in addition to any options inserted by the transport layer.
-
- A host that does not implement local fragmentation MUST ensure
- that the transport layer (for TCP) or the application layer
- (for UDP) obtains MMS_S from the IP layer and does not send a
- datagram exceeding MMS_S in size.
-
- It is generally desirable to avoid local fragmentation and to
- choose EMTU_S low enough to avoid fragmentation in any gateway
- along the path. In the absence of actual knowledge of the
- minimum MTU along the path, the IP layer SHOULD use
- EMTU_S <= 576 whenever the destination address is not on a
- connected network, and otherwise use the connected network's
- MTU.
-
- The MTU of each physical interface MUST be configurable.
-
- A host IP layer implementation MAY have a configuration flag
- "All-Subnets-MTU", indicating that the MTU of the connected
- network is to be used for destinations on different subnets
- within the same network, but not for other networks. Thus,
- this flag causes the network class mask, rather than the subnet
- address mask, to be used to choose an EMTU_S. For a multihomed
- host, an "All-Subnets-MTU" flag is needed for each network
- interface.
-
- DISCUSSION:
- Picking the correct datagram size to use when sending data
- is a complex topic [IP:9].
-
- (a) In general, no host is required to accept an IP
- datagram larger than 576 bytes (including header and
- data), so a host must not send a larger datagram
- without explicit knowledge or prior arrangement with
- the destination host. Thus, MMS_S is only an upper
- bound on the datagram size that a transport protocol
- may send; even when MMS_S exceeds 556, the transport
- layer must limit its messages to 556 bytes in the
- absence of other knowledge about the destination
- host.
-
- (b) Some transport protocols (e.g., TCP) provide a way to
- explicitly inform the sender about the largest
- datagram the other end can receive and reassemble
- [IP:7]. There is no corresponding mechanism in the
- IP layer.
-
-                  A transport protocol that assumes an EMTU_R larger
-                  than 576 (see Section 3.3.2) can send a datagram of
-                  this larger size to another host that implements the
-                  same protocol.
-
- (c) Hosts should ideally limit their EMTU_S for a given
- destination to the minimum MTU of all the networks
- along the path, to avoid any fragmentation. IP
- fragmentation, while formally correct, can create a
- serious transport protocol performance problem,
- because loss of a single fragment means all the
- fragments in the segment must be retransmitted
- [IP:9].
-
- Since nearly all networks in the Internet currently
- support an MTU of 576 or greater, we strongly recommend
- the use of 576 for datagrams sent to non-local networks.
-
- It has been suggested that a host could determine the MTU
- over a given path by sending a zero-offset datagram
- fragment and waiting for the receiver to time out the
- reassembly (which cannot complete!) and return an ICMP
- Time Exceeded message. This message would include the
- largest remaining fragment header in its body. More
- direct mechanisms are being experimented with, but have
- not yet been adopted (see e.g., RFC-1063).
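The EMTU_S choice and the MMS_S formula above can be sketched for a host that does no local fragmentation. The sketch assumes a single interface and classful addresses (class D and E are not handled), and the function names are hypothetical. The "All-Subnets-MTU" flag switches the locality test from the subnet mask to the network class mask.

```python
import ipaddress

NON_LOCAL_EMTU_S = 576
IP_HEADER = 20


def classful_mask(addr: int) -> int:
    """Network class mask for class A, B, or C addresses (D/E not handled)."""
    first_octet = addr >> 24
    if first_octet < 128:
        return 0xFF000000    # class A
    if first_octet < 192:
        return 0xFFFF0000    # class B
    return 0xFFFFFF00        # class C


def choose_emtu_s(src: str, dst: str, subnet_mask: str, interface_mtu: int,
                  all_subnets_mtu: bool = False) -> int:
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    # "All-Subnets-MTU" widens "local" from the subnet to the whole class network.
    mask = classful_mask(s) if all_subnets_mtu else int(ipaddress.IPv4Address(subnet_mask))
    if (s & mask) == (d & mask):
        return interface_mtu
    # No path MTU knowledge: use 576, never more than the interface MTU.
    return min(NON_LOCAL_EMTU_S, interface_mtu)


def mms_s(emtu_s: int, ip_header_size: int = IP_HEADER) -> int:
    """MMS_S = EMTU_S - <IP header size>, assuming no local fragmentation."""
    return emtu_s - ip_header_size


print(mms_s(choose_emtu_s("10.1.2.5", "192.0.2.1", "255.255.255.0", 1500)))  # 556
```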
-
- 3.3.4 Local Multihoming
-
- 3.3.4.1 Introduction
-
- A multihomed host has multiple IP addresses, which we may
- think of as "logical interfaces". These logical interfaces
- may be associated with one or more physical interfaces, and
- these physical interfaces may be connected to the same or
- different networks.
-
- Here are some important cases of multihoming:
-
- (a) Multiple Logical Networks
-
- The Internet architects envisioned that each physical
- network would have a single unique IP network (or
- subnet) number. However, LAN administrators have
- sometimes found it useful to violate this assumption,
- operating a LAN with multiple logical networks per
- physical connected network.
-
- If a host connected to such a physical network is
- configured to handle traffic for each of N different
- logical networks, then the host will have N logical
- interfaces. These could share a single physical
- interface, or might use N physical interfaces to the
- same network.
-
- (b) Multiple Logical Hosts
-
- When a host has multiple IP addresses that all have the
- same <Network-number> part (and the same <Subnet-
- number> part, if any), the logical interfaces are known
- as "logical hosts". These logical interfaces might
- share a single physical interface or might use separate
- physical interfaces to the same physical network.
-
- (c) Simple Multihoming
-
- In this case, each logical interface is mapped into a
- separate physical interface and each physical interface
- is connected to a different physical network. The term
- "multihoming" was originally applied only to this case,
- but it is now applied more generally.
-
- A host with embedded gateway functionality will
- typically fall into the simple multihoming case. Note,
- however, that a host may be simply multihomed without
- containing an embedded gateway, i.e., without
- forwarding datagrams from one connected network to
- another.
-
- This case presents the most difficult routing problems.
- The choice of interface (i.e., the choice of first-hop
- network) may significantly affect performance or even
- reachability of remote parts of the Internet.
-
-
- Finally, we note another possibility that is NOT
- multihoming: one logical interface may be bound to multiple
- physical interfaces, in order to increase the reliability or
- throughput between directly connected machines by providing
- alternative physical paths between them. For instance, two
- systems might be connected by multiple point-to-point links.
- We call this "link-layer multiplexing". With link-layer
- multiplexing, the protocols above the link layer are unaware
- that multiple physical interfaces are present; the link-
- layer device driver is responsible for multiplexing and
- routing packets across the physical interfaces.
-
- In the Internet protocol architecture, a transport protocol
- instance ("entity") has no address of its own, but instead
- uses a single Internet Protocol (IP) address. This has
- implications for the IP, transport, and application layers,
- and for the interfaces between them. In particular, the
- application software may have to be aware of the multiple IP
- addresses of a multihomed host; in other cases, the choice
- can be made within the network software.
-
- 3.3.4.2 Multihoming Requirements
-
- The following general rules apply to the selection of an IP
- source address for sending a datagram from a multihomed
- host.
-
- (1) If the datagram is sent in response to a received
- datagram, the source address for the response SHOULD be
- the specific-destination address of the request. See
- Sections 4.1.3.5 and 4.2.3.7 and the "General Issues"
- section of [INTRO:1] for more specific requirements on
- higher layers.
-
- Otherwise, a source address must be selected.
-
- (2) An application MUST be able to explicitly specify the
- source address for initiating a connection or a
- request.
-
- (3) In the absence of such a specification, the networking
- software MUST choose a source address. Rules for this
- choice are described below.
-
-
- There are two key requirement issues related to multihoming:
-
- (A) A host MAY silently discard an incoming datagram whose
- destination address does not correspond to the physical
- interface through which it is received.
-
- (B) A host MAY restrict itself to sending (non-source-
- routed) IP datagrams only through the physical
- interface that corresponds to the IP source address of
- the datagrams.
-
-
- DISCUSSION:
- Internet host implementors have used two different
- conceptual models for multihoming, briefly summarized
- in the following discussion. This document takes no
- stand on which model is preferred; each seems to have a
- place. This ambivalence is reflected in the issues (A)
- and (B) being optional.
-
- o Strong ES Model
-
- The Strong ES (End System, i.e., host) model
- emphasizes the host/gateway (ES/IS) distinction,
- and would therefore substitute MUST for MAY in
- issues (A) and (B) above. It tends to model a
- multihomed host as a set of logical hosts within
- the same physical host.
-
- With respect to (A), proponents of the Strong ES
- model note that automatic Internet routing
- mechanisms could not route a datagram to a
- physical interface that did not correspond to the
- destination address.
-
- Under the Strong ES model, the route computation
- for an outgoing datagram is the mapping:
-
- route(src IP addr, dest IP addr, TOS)
- -> gateway
-
- Here the source address is included as a parameter
- in order to select a gateway that is directly
- reachable on the corresponding physical interface.
- Note that this model logically requires that in
- general there be at least one default gateway, and
- preferably multiple defaults, for each IP source
- address.
-
- o Weak ES Model
-
- This view de-emphasizes the ES/IS distinction, and
- would therefore substitute MUST NOT for MAY in
- issues (A) and (B). This model may be the more
- natural one for hosts that wiretap gateway routing
- protocols, and is necessary for hosts that have
- embedded gateway functionality.
-
- The Weak ES Model may cause the Redirect mechanism
- to fail. If a datagram is sent out a physical
- interface that does not correspond to the
- destination address, the first-hop gateway will
- not realize when it needs to send a Redirect. On
- the other hand, if the host has embedded gateway
- functionality, then it has routing information
- without listening to Redirects.
-
- In the Weak ES model, the route computation for an
- outgoing datagram is the mapping:
-
- route(dest IP addr, TOS) -> gateway, interface
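A minimal Python sketch can make the contrast between the two route computations concrete. It assumes a toy table in which each entry names a destination network, a first-hop gateway, and the local address of the interface that reaches it. `Route`, `ROUTES`, `weak_es_route`, `strong_es_route`, and `strong_es_accept` are hypothetical names, and TOS is omitted for brevity.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network, ip_address, ip_network
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Route:
    dest: IPv4Network        # destination (sub)net; 0.0.0.0/0 is a default
    gateway: IPv4Address     # first-hop gateway
    iface_addr: IPv4Address  # local IP address of the outgoing interface


# Hypothetical two-interface host with one default gateway per source
# address, as the Strong ES model logically requires.
ROUTES = [
    Route(ip_network("0.0.0.0/0"), ip_address("10.0.0.1"), ip_address("10.0.0.5")),
    Route(ip_network("0.0.0.0/0"), ip_address("192.168.1.1"), ip_address("192.168.1.7")),
]


def weak_es_route(dst: str) -> Optional[Tuple[IPv4Address, IPv4Address]]:
    """Weak ES: route(dest, TOS) -> gateway, interface."""
    d = ip_address(dst)
    for r in ROUTES:  # first matching entry wins in this toy table
        if d in r.dest:
            return r.gateway, r.iface_addr
    return None


def strong_es_route(src: str, dst: str) -> Optional[IPv4Address]:
    """Strong ES: route(src, dest, TOS) -> gateway, restricted to gateways
    reachable through the interface that owns the source address."""
    s, d = ip_address(src), ip_address(dst)
    for r in ROUTES:
        if r.iface_addr == s and d in r.dest:
            return r.gateway
    return None


def strong_es_accept(dst: str, arriving_iface_addrs: Iterable[str]) -> bool:
    """Issue (A) under Strong ES: accept a unicast datagram only if its
    destination is an address of the interface it arrived on."""
    return ip_address(dst) in {ip_address(a) for a in arriving_iface_addrs}
```

Under the Weak ES model, the first matching default fixes both the gateway and the interface. Under the Strong ES model, the source address selects which default applies.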
-
- 3.3.4.3 Choosing a Source Address
-
- DISCUSSION:
- When it sends an initial connection request (e.g., a
- TCP "SYN" segment) or a datagram service request (e.g.,
- a UDP-based query), the transport layer on a multihomed
- host needs to know which source address to use. If the
- application does not specify it, the transport layer
- must ask the IP layer to perform the conceptual
- mapping:
-
- GET_SRCADDR(remote IP addr, TOS)
- -> local IP address
-
- Here TOS is the Type-of-Service value (see Section
- 3.2.1.6), and the result is the desired source address.
- The following rules are suggested for implementing this
- mapping:
-
- (a) If the remote Internet address lies on one of the
- (sub-) nets to which the host is directly
- connected, a corresponding source address may be
- chosen, unless the corresponding interface is
- known to be down.
-
- (b) The route cache may be consulted, to see if there
- is an active route to the specified destination
- network through any network interface; if so, a
- local IP address corresponding to that interface
- may be chosen.
-
- (c) The table of static routes, if any (see Section
- 3.3.1.2), may be similarly consulted.
-
- (d) The default gateways may be consulted. If these
- gateways are assigned to different interfaces, the
- interface corresponding to the gateway with the
- highest preference may be chosen.
-
- In the future, there may be a defined way for a
- multihomed host to ask the gateways on all connected
- networks for advice about the best network to use for a
- given destination.
-
- IMPLEMENTATION:
- It will be noted that this process is essentially the
- same as datagram routing (see Section 3.3.1), and
- therefore hosts may be able to combine the
- implementation of the two functions.
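The suggested rules (a) through (d) can be read as an ordered fallback. The following minimal Python sketch assumes simple in-memory tables. `Interface`, `HostState`, and `get_srcaddr` are hypothetical names. The route cache is assumed to be keyed by remote address, and a larger number is assumed to mean a more preferred default gateway.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Dict, List, Optional, Tuple


@dataclass
class Interface:
    addr: str        # local IP address of this interface
    net: str         # directly connected (sub)net, e.g. "10.0.0.0/24"
    up: bool = True  # False if the interface is known to be down


@dataclass
class HostState:
    interfaces: List[Interface]
    # Hypothetical route cache: remote address -> local address to use.
    route_cache: Dict[str, str] = field(default_factory=dict)
    # Static routes as (network, local address of the interface used).
    static_routes: List[Tuple[str, str]] = field(default_factory=list)
    # Default gateways as (preference, local address); larger is preferred.
    default_gateways: List[Tuple[int, str]] = field(default_factory=list)


def get_srcaddr(state: HostState, remote: str, tos: int = 0) -> Optional[str]:
    """Apply suggested rules (a)-(d) in order; tos is accepted but unused."""
    r = ip_address(remote)
    # (a) remote is on a directly connected (sub)net whose interface is up
    for iface in state.interfaces:
        if iface.up and r in ip_network(iface.net):
            return iface.addr
    # (b) an active route-cache entry for this destination
    if remote in state.route_cache:
        return state.route_cache[remote]
    # (c) the table of static routes, if any
    for net, local in state.static_routes:
        if r in ip_network(net):
            return local
    # (d) the interface of the most preferred default gateway
    #     (ties broken by address string in this sketch)
    if state.default_gateways:
        return max(state.default_gateways)[1]
    return None
```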
-
- 3.3.5 Source Route Forwarding
-
- Subject to restrictions given below, a host MAY be able to act
- as an intermediate hop in a source route, forwarding a source-
- routed datagram to the next specified hop.
-
- However, in performing this gateway-like function, the host
- MUST obey all the relevant rules for a gateway forwarding
- source-routed datagrams [INTRO:2]. This includes the following
- specific provisions, which override the corresponding host
- provisions given earlier in this document:
-
- (A) TTL (ref. Section 3.2.1.7)
-
- The TTL field MUST be decremented and the datagram perhaps
- discarded as specified for a gateway in [INTRO:2].
-
- (B) ICMP Destination Unreachable (ref. Section 3.2.2.1)
-
- A host MUST be able to generate Destination Unreachable
- messages with the following codes:
-
- 4 (Fragmentation Required but DF Set) when a source-
- routed datagram cannot be fragmented to fit into the
- target network;
-
- 5 (Source Route Failed) when a source-routed datagram
- cannot be forwarded, e.g., because of a routing
- problem or because the next hop of a strict source
- route is not on a connected network.
-
- (C) IP Source Address (ref. Section 3.2.1.3)
-
- A source-routed datagram being forwarded MAY (and normally
- will) have a source address that is not one of the IP
- addresses of the forwarding host.
-
- (D) Record Route Option (ref. Section 3.2.1.8d)
-
- A host that is forwarding a source-routed datagram
- containing a Record Route option MUST update that option,
- if it has room.
-
- (E) Timestamp Option (ref. Section 3.2.1.8e)
-
- A host that is forwarding a source-routed datagram
- containing a Timestamp Option MUST add the current
- timestamp to that option, according to the rules for this
- option.
-
- To define the rules restricting host forwarding of source-
- routed datagrams, we use the term "local source-routing" if the
- next hop will be through the same physical interface through
- which the datagram arrived; otherwise, it is "non-local
- source-routing".
-
- o A host is permitted to perform local source-routing
- without restriction.
-
- o A host that supports non-local source-routing MUST have a
- configurable switch to disable forwarding, and this switch
- MUST default to disabled.
-
- o The host MUST satisfy all gateway requirements for
- configurable policy filters [INTRO:2] restricting non-
- local forwarding.
-
- If a host receives a datagram with an incomplete source route
- but does not forward it for some reason, the host SHOULD return
- an ICMP Destination Unreachable (code 5, Source Route Failed)
- message, unless the datagram was itself an ICMP error message.
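The forwarding restrictions above amount to a small decision procedure. This Python sketch assumes that the caller has already looked up the outgoing interface for the next hop and evaluated any configured policy filters. The names `SourceRouteConfig` and `source_route_decision` are hypothetical. TTL handling and option updates (items (A), (D), and (E)) are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ICMP_DEST_UNREACHABLE = 3
CODE_SOURCE_ROUTE_FAILED = 5


@dataclass
class SourceRouteConfig:
    # The non-local switch MUST default to disabled.
    nonlocal_enabled: bool = False


def source_route_decision(
    in_iface: str,
    out_iface: Optional[str],  # None: no usable route to the next hop
    cfg: SourceRouteConfig,
    policy_allows: bool,       # result of the gateway-style policy filters
    is_icmp_error: bool,
) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Return ("forward", None) or ("drop", icmp), where icmp is the
    (type, code) of the ICMP message to return, or None."""
    if out_iface is not None:
        if out_iface == in_iface:
            return ("forward", None)  # local source-routing: unrestricted
        if cfg.nonlocal_enabled and policy_allows:
            return ("forward", None)  # permitted non-local source-routing
    # Not forwarded: SHOULD return Source Route Failed, except for ICMP errors.
    if is_icmp_error:
        return ("drop", None)
    return ("drop", (ICMP_DEST_UNREACHABLE, CODE_SOURCE_ROUTE_FAILED))
```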
-
- 3.3.6 Broadcasts
-
- Section 3.2.1.3 defined the four standard IP broadcast address
- forms:
-
- Limited Broadcast: {-1, -1}
-
- Directed Broadcast: {<Network-number>,-1}
-
- Subnet Directed Broadcast:
- {<Network-number>,<Subnet-number>,-1}
-
- All-Subnets Directed Broadcast: {<Network-number>,-1,-1}
-
- A host MUST recognize any of these forms in the destination
- address of an incoming datagram.
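On a subnetted network, recognizing these forms means comparing the destination against a handful of addresses. The Python sketch below assumes that the host knows its classful network and its own subnet. `is_broadcast_dest` is a hypothetical helper. The `accept_zero_forms` flag also accepts the non-standard 0 forms described in the next paragraph. Note that {<Network-number>,-1} and {<Network-number>,-1,-1} have the same bit pattern.

```python
from ipaddress import ip_address, ip_network


def is_broadcast_dest(dst: str, classful_net: str, subnet: str,
                      accept_zero_forms: bool = True) -> bool:
    """classful_net, e.g. "128.10.0.0/16"; subnet, e.g. "128.10.3.0/24"."""
    d = ip_address(dst)
    net, sub = ip_network(classful_net), ip_network(subnet)
    minus_one_forms = {
        ip_address("255.255.255.255"),  # Limited Broadcast {-1, -1}
        net.broadcast_address,          # {<Network>,-1} and {<Network>,-1,-1}
        sub.broadcast_address,          # {<Network>,<Subnet>,-1}
    }
    zero_forms = {
        ip_address("0.0.0.0"),          # {0, 0}
        net.network_address,            # {<Network>,0} and {<Network>,0,0}
        sub.network_address,            # {<Network>,<Subnet>,0}
    }
    return d in minus_one_forms or (accept_zero_forms and d in zero_forms)
```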
-
- There is a class of hosts* that use non-standard broadcast
- address forms, substituting 0 for -1. All hosts SHOULD
- recognize and accept any of these non-standard broadcast
- addresses as the destination address of an incoming datagram.
- A host MAY optionally have a configuration option to choose the
- 0 or the -1 form of broadcast address, for each physical
- interface, but this option SHOULD default to the standard (-1)
- form.
-
-_________________________
-*4.2BSD Unix and its derivatives, but not 4.3BSD.
-
- When a host sends a datagram to a link-layer broadcast address,
- the IP destination address MUST be a legal IP broadcast or IP
- multicast address.
-
- A host SHOULD silently discard a datagram that is received via
- a link-layer broadcast (see Section 2.4) but does not specify
- an IP multicast or broadcast destination address.
-
- Hosts SHOULD use the Limited Broadcast address to broadcast to
- a connected network.
-
-
- DISCUSSION:
- Using the Limited Broadcast address instead of a Directed
- Broadcast address may improve system robustness. Problems
- are often caused by machines that do not understand the
- plethora of broadcast addresses (see Section 3.2.1.3), or
- that may have different ideas about which broadcast
- addresses are in use. The prime example of the latter is
- machines that do not understand subnetting but are
- attached to a subnetted net. Sending a Subnet Broadcast
- for the connected network will confuse those machines,
- which will see it as a message to some other host.
-
- There has been discussion on whether a datagram addressed
- to the Limited Broadcast address ought to be sent from all
- the interfaces of a multihomed host. This specification
- takes no stand on the issue.
-
- 3.3.7 IP Multicasting
-
- A host SHOULD support local IP multicasting on all connected
- networks for which a mapping from Class D IP addresses to
- link-layer addresses has been specified (see below). Support
- for local IP multicasting includes sending multicast datagrams,
- joining multicast groups and receiving multicast datagrams, and
- leaving multicast groups. This implies support for all of
- [IP:4] except the IGMP protocol itself, which is OPTIONAL.
-
- DISCUSSION:
- IGMP provides gateways that are capable of multicast
- routing with the information required to support IP
- multicasting across multiple networks. At this time,
- multicast-routing gateways are in the experimental stage
- and are not widely available. For hosts that are not
- connected to networks with multicast-routing gateways or
- that do not need to receive multicast datagrams
- originating on other networks, IGMP serves no purpose and
- is therefore optional for now. However, the rest of
- [IP:4] is currently recommended for the purpose of
- providing IP-layer access to local network multicast
- addressing, as a preferable alternative to local broadcast
- addressing. It is expected that IGMP will become
- recommended at some future date, when multicast-routing
- gateways have become more widely available.
-
- If IGMP is not implemented, a host SHOULD still join the "all-
- hosts" group (224.0.0.1) when the IP layer is initialized and
- remain a member for as long as the IP layer is active.
-
- DISCUSSION:
- Joining the "all-hosts" group will support strictly local
- uses of multicasting, e.g., a gateway discovery protocol,
- even if IGMP is not implemented.
-
- The mapping of IP Class D addresses to local addresses is
- currently specified for the following types of networks:
-
- o Ethernet/IEEE 802.3, as defined in [IP:4].
-
- o Any network that supports broadcast but not multicast
- addressing: all IP Class D addresses map to the local
- broadcast address.
-
- o Any type of point-to-point link (e.g., SLIP or HDLC
- links): no mapping required. All IP multicast datagrams
- are sent as-is, inside the local framing.
-
- Mappings for other types of networks will be specified in the
- future.
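For Ethernet, [IP:4] (RFC-1112) maps a Class D address by placing its low-order 23 bits into the low-order 23 bits of the Ethernet multicast address 01-00-5E-00-00-00. A minimal Python sketch of the three mappings listed above follows. `class_d_to_link` and its `link_type` strings are hypothetical, and the broadcast-only case assumes a 48-bit IEEE 802 broadcast address.

```python
from ipaddress import ip_address
from typing import Optional


def class_d_to_link(group: str, link_type: str) -> Optional[str]:
    g = ip_address(group)
    if not g.is_multicast:
        raise ValueError(f"{g} is not a Class D address")
    if link_type == "ethernet":
        # Low-order 23 bits of the group go into 01-00-5E-00-00-00.
        mac = (0x01005E << 24) | (int(g) & 0x7FFFFF)
        return ":".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))
    if link_type == "broadcast-only":
        return "ff:ff:ff:ff:ff:ff"  # assumed 48-bit link broadcast address
    if link_type == "point-to-point":
        return None  # no mapping: sent as-is inside the local framing
    raise ValueError(f"no mapping specified for {link_type!r}")
```

Because 5 bits of the group address are dropped, 32 Class D addresses share each Ethernet address. For example, 224.0.0.1 and 225.0.0.1 both map to 01:00:5e:00:00:01, so a receiving host still needs to check the full IP destination address.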
-
- A host SHOULD provide a way for higher-layer protocols or
- applications to determine which of the host's connected
- network(s) support IP multicast addressing.
-
- 3.3.8 Error Reporting
-
- Wherever practical, hosts MUST return ICMP error datagrams on
- detection of an error, except in those cases where returning an
- ICMP error message is specifically prohibited.
-
- DISCUSSION:
- A common phenomenon in datagram networks is the "black
- hole disease": datagrams are sent out, but nothing comes
- back. Without any error datagrams, it is difficult for
- the user to figure out what the problem is.
-
- 3.4 INTERNET/TRANSPORT LAYER INTERFACE
-
- The interface between the IP layer and the transport layer MUST
- provide full access to all the mechanisms of the IP layer,
- including options, Type-of-Service, and Time-to-Live. The
- transport layer MUST either have mechanisms to set these interface
- parameters, or provide a path to pass them through from an
- application, or both.
-
- DISCUSSION:
- Applications are urged to make use of these mechanisms where
- applicable, even when the mechanisms are not currently
- effective in the Internet (e.g., TOS). This will allow these
- mechanisms to be immediately useful when they do become
- effective, without a large amount of retrofitting of host
- software.
-
- We now describe a conceptual interface between the transport layer
- and the IP layer, as a set of procedure calls. This is an
- extension of the information in Section 3.3 of RFC-791 [IP:1].
-
-
- * Send Datagram
-
- SEND(src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt
- => result )
-
- where the parameters are defined in RFC-791. Passing an Id
- parameter is optional; see Section 3.2.1.5.
-
-
- * Receive Datagram
-
- RECV(BufPTR, prot
- => result, src, dst, SpecDest, TOS, len, opt)
-
- All the parameters are defined in RFC-791, except for:
-
- SpecDest = specific-destination address of datagram
- (defined in Section 3.2.1.3)
-
- The result parameter dst contains the datagram's destination
- address. Since this may be a broadcast or multicast address,
- the SpecDest parameter (not shown in RFC-791) MUST be passed.
- The parameter opt contains all the IP options received in the
- datagram; these MUST also be passed to the transport layer.
-
-
- * Select Source Address
-
- GET_SRCADDR(remote, TOS) -> local
-
- remote = remote IP address
- TOS = Type-of-Service
- local = local IP address
-
- See Section 3.3.4.3.
-
-
- * Find Maximum Datagram Sizes
-
- GET_MAXSIZES(local, remote, TOS) -> MMS_R, MMS_S
-
- MMS_R = maximum receive transport-message size.
- MMS_S = maximum send transport-message size.
- (local, remote, TOS defined above)
-
- See Sections 3.3.2 and 3.3.3.
-
-
- * Advice on Delivery Success
-
- ADVISE_DELIVPROB(sense, local, remote, TOS)
-
- Here the parameter sense is a 1-bit flag indicating whether
- positive or negative advice is being given; see the
- discussion in Section 3.3.1.4. The other parameters were
- defined earlier.
-
-
- * Send ICMP Message
-
- SEND_ICMP(src, dst, TOS, TTL, BufPTR, len, Id, DF, opt)
- -> result
-
- (Parameters defined in RFC-791).
-
- Passing an Id parameter is optional; see Section 3.2.1.5.
- The transport layer MUST be able to send certain ICMP
- messages: Port Unreachable or any of the query-type
- messages. This function could be considered to be a special
- case of the SEND() call, of course; we describe it separately
- for clarity.
-
-
- * Receive ICMP Message
-
- RECV_ICMP(BufPTR) -> result, src, dst, len, opt
-
- (Parameters defined in RFC-791).
-
- The IP layer MUST pass certain ICMP messages up to the
- appropriate transport-layer routine. This function could be
- considered to be a special case of the RECV() call, of
- course; we describe it separately for clarity.
-
- For an ICMP error message, the data that is passed up MUST
- include the original Internet header plus all the octets of
- the original message that are included in the ICMP message.
- This data will be used by the transport layer to locate the
- connection state information, if any.
-
- In particular, the following ICMP messages are to be passed
- up:
-
- o Destination Unreachable
-
- o Source Quench
-
- o Echo Reply (to ICMP user interface, unless the Echo
- Request originated in the IP layer)
-
- o Timestamp Reply (to ICMP user interface)
-
- o Time Exceeded
-
-
- DISCUSSION:
- In the future, there may be additions to this interface to
- pass path data (see Section 3.3.1.3) between the IP and
- transport layers.
-
- 3.5 INTERNET LAYER REQUIREMENTS SUMMARY
-
-
- | | | | |S| |
- | | | | |H| |F
- | | | | |O|M|o
- | | |S| |U|U|o
- | | |H| |L|S|t
- | |M|O| |D|T|n
- | |U|U|M| | |o
- | |S|L|A|N|N|t
- | |T|D|Y|O|O|t
-FEATURE |SECTION | | | |T|T|e
--------------------------------------------------|--------|-|-|-|-|-|--
- | | | | | | |
-Implement IP and ICMP |3.1 |x| | | | |
-Handle remote multihoming in application layer |3.1 |x| | | | |
-Support local multihoming |3.1 | | |x| | |
-Meet gateway specs if forward datagrams |3.1 |x| | | | |
-Configuration switch for embedded gateway |3.1 |x| | | | |1
- Config switch default to non-gateway |3.1 |x| | | | |1
- Auto-config based on number of interfaces |3.1 | | | | |x|1
-Able to log discarded datagrams |3.1 | |x| | | |
- Record in counter |3.1 | |x| | | |
- | | | | | | |
-Silently discard Version != 4 |3.2.1.1 |x| | | | |
-Verify IP checksum, silently discard bad dgram |3.2.1.2 |x| | | | |
-Addressing: | | | | | | |
- Subnet addressing (RFC-950) |3.2.1.3 |x| | | | |
- Src address must be host's own IP address |3.2.1.3 |x| | | | |
- Silently discard datagram with bad dest addr |3.2.1.3 |x| | | | |
- Silently discard datagram with bad src addr |3.2.1.3 |x| | | | |
-Support reassembly |3.2.1.4 |x| | | | |
-Retain same Id field in identical datagram |3.2.1.5 | | |x| | |
- | | | | | | |
-TOS: | | | | | | |
- Allow transport layer to set TOS |3.2.1.6 |x| | | | |
- Pass received TOS up to transport layer |3.2.1.6 | |x| | | |
- Use RFC-795 link-layer mappings for TOS |3.2.1.6 | | | |x| |
-TTL: | | | | | | |
- Send packet with TTL of 0 |3.2.1.7 | | | | |x|
- Discard received packets with TTL < 2 |3.2.1.7 | | | | |x|
- Allow transport layer to set TTL |3.2.1.7 |x| | | | |
- Fixed TTL is configurable |3.2.1.7 |x| | | | |
- | | | | | | |
-IP Options: | | | | | | |
- Allow transport layer to send IP options |3.2.1.8 |x| | | | |
- Pass all IP options rcvd to higher layer |3.2.1.8 |x| | | | |
- IP layer silently ignore unknown options |3.2.1.8 |x| | | | |
- Security option |3.2.1.8a| | |x| | |
- Send Stream Identifier option |3.2.1.8b| | | |x| |
- Silently ignore Stream Identifier option |3.2.1.8b|x| | | | |
- Record Route option |3.2.1.8d| | |x| | |
- Timestamp option |3.2.1.8e| | |x| | |
-Source Route Option: | | | | | | |
- Originate & terminate Source Route options |3.2.1.8c|x| | | | |
- Datagram with completed SR passed up to TL |3.2.1.8c|x| | | | |
- Build correct (non-redundant) return route |3.2.1.8c|x| | | | |
- Send multiple SR options in one header |3.2.1.8c| | | | |x|
- | | | | | | |
-ICMP: | | | | | | |
- Silently discard ICMP msg with unknown type |3.2.2 |x| | | | |
- Include more than 8 octets of orig datagram |3.2.2 | | |x| | |
- Included octets same as received |3.2.2 |x| | | | |
- Demux ICMP Error to transport protocol |3.2.2 |x| | | | |
- Send ICMP error message with TOS=0 |3.2.2 | |x| | | |
- Send ICMP error message for: | | | | | | |
- - ICMP error msg |3.2.2 | | | | |x|
- - IP b'cast or IP m'cast |3.2.2 | | | | |x|
- - Link-layer b'cast |3.2.2 | | | | |x|
- - Non-initial fragment |3.2.2 | | | | |x|
- - Datagram with non-unique src address |3.2.2 | | | | |x|
- Return ICMP error msgs (when not prohibited) |3.3.8 |x| | | | |
- | | | | | | |
- Dest Unreachable: | | | | | | |
- Generate Dest Unreachable (code 2/3) |3.2.2.1 | |x| | | |
- Pass ICMP Dest Unreachable to higher layer |3.2.2.1 |x| | | | |
- Higher layer act on Dest Unreach |3.2.2.1 | |x| | | |
- Interpret Dest Unreach as only hint |3.2.2.1 |x| | | | |
- Redirect: | | | | | | |
- Host send Redirect |3.2.2.2 | | | |x| |
- Update route cache when recv Redirect |3.2.2.2 |x| | | | |
- Handle both Host and Net Redirects |3.2.2.2 |x| | | | |
- Discard illegal Redirect |3.2.2.2 | |x| | | |
- Source Quench: | | | | | | |
- Send Source Quench if buffering exceeded |3.2.2.3 | | |x| | |
- Pass Source Quench to higher layer |3.2.2.3 |x| | | | |
- Higher layer act on Source Quench |3.2.2.3 | |x| | | |
- Time Exceeded: pass to higher layer |3.2.2.4 |x| | | | |
- Parameter Problem: | | | | | | |
- Send Parameter Problem messages |3.2.2.5 | |x| | | |
- Pass Parameter Problem to higher layer |3.2.2.5 |x| | | | |
- Report Parameter Problem to user |3.2.2.5 | | |x| | |
- | | | | | | |
- ICMP Echo Request or Reply: | | | | | | |
- Echo server and Echo client |3.2.2.6 |x| | | | |
- Echo client |3.2.2.6 | |x| | | |
- Discard Echo Request to broadcast address |3.2.2.6 | | |x| | |
- Discard Echo Request to multicast address |3.2.2.6 | | |x| | |
- Use specific-dest addr as Echo Reply src |3.2.2.6 |x| | | | |
- Send same data in Echo Reply |3.2.2.6 |x| | | | |
- Pass Echo Reply to higher layer |3.2.2.6 |x| | | | |
- Reflect Record Route, Time Stamp options |3.2.2.6 | |x| | | |
- Reverse and reflect Source Route option |3.2.2.6 |x| | | | |
- | | | | | | |
- ICMP Information Request or Reply: |3.2.2.7 | | | |x| |
- ICMP Timestamp and Timestamp Reply: |3.2.2.8 | | |x| | |
- Minimize delay variability |3.2.2.8 | |x| | | |1
- Silently discard b'cast Timestamp |3.2.2.8 | | |x| | |1
- Silently discard m'cast Timestamp |3.2.2.8 | | |x| | |1
- Use specific-dest addr as TS Reply src |3.2.2.8 |x| | | | |1
- Reflect Record Route, Time Stamp options |3.2.2.8 | |x| | | |1
- Reverse and reflect Source Route option |3.2.2.8 |x| | | | |1
- Pass Timestamp Reply to higher layer |3.2.2.8 |x| | | | |1
- Obey rules for "standard value" |3.2.2.8 |x| | | | |1
- | | | | | | |
- ICMP Address Mask Request and Reply: | | | | | | |
- Addr Mask source configurable |3.2.2.9 |x| | | | |
- Support static configuration of addr mask |3.2.2.9 |x| | | | |
- Get addr mask dynamically during booting |3.2.2.9 | | |x| | |
- Get addr mask via ICMP Addr Mask Request/Reply |3.2.2.9 | | |x| | |
- Retransmit Addr Mask Req if no Reply |3.2.2.9 |x| | | | |3
- Assume default mask if no Reply |3.2.2.9 | |x| | | |3
- Update address mask from first Reply only |3.2.2.9 |x| | | | |3
- Reasonableness check on Addr Mask |3.2.2.9 | |x| | | |
- Send unauthorized Addr Mask Reply msgs |3.2.2.9 | | | | |x|
- Explicitly configured to be agent |3.2.2.9 |x| | | | |
- Static config=> Addr-Mask-Authoritative flag |3.2.2.9 | |x| | | |
- Broadcast Addr Mask Reply when init. |3.2.2.9 |x| | | | |3
- | | | | | | |
-ROUTING OUTBOUND DATAGRAMS: | | | | | | |
- Use address mask in local/remote decision |3.3.1.1 |x| | | | |
- Operate with no gateways on conn network |3.3.1.1 |x| | | | |
- Maintain "route cache" of next-hop gateways |3.3.1.2 |x| | | | |
- Treat Host and Net Redirect the same |3.3.1.2 | |x| | | |
- If no cache entry, use default gateway |3.3.1.2 |x| | | | |
- Support multiple default gateways |3.3.1.2 |x| | | | |
- Provide table of static routes |3.3.1.2 | | |x| | |
- Flag: route overridable by Redirects |3.3.1.2 | | |x| | |
- Key route cache on host, not net address |3.3.1.3 | | |x| | |
- Include TOS in route cache |3.3.1.3 | |x| | | |
- | | | | | | |
- Able to detect failure of next-hop gateway |3.3.1.4 |x| | | | |
- Assume route is good forever |3.3.1.4 | | | |x| |
- Ping gateways continuously |3.3.1.4 | | | | |x|
- Ping only when traffic being sent |3.3.1.4 |x| | | | |
- Ping only when no positive indication |3.3.1.4 |x| | | | |
- Higher and lower layers give advice |3.3.1.4 | |x| | | |
- Switch from failed default g'way to another |3.3.1.5 |x| | | | |
- Manual method of entering config info |3.3.1.6 |x| | | | |
- | | | | | | |
-REASSEMBLY and FRAGMENTATION: | | | | | | |
- Able to reassemble incoming datagrams |3.3.2 |x| | | | |
- At least 576 byte datagrams |3.3.2 |x| | | | |
- EMTU_R configurable or indefinite |3.3.2 | |x| | | |
- Transport layer able to learn MMS_R |3.3.2 |x| | | | |
- Send ICMP Time Exceeded on reassembly timeout |3.3.2 |x| | | | |
- Fixed reassembly timeout value |3.3.2 | |x| | | |
- | | | | | | |
- Pass MMS_S to higher layers |3.3.3 |x| | | | |
- Local fragmentation of outgoing packets |3.3.3 | | |x| | |
- Else don't send bigger than MMS_S |3.3.3 |x| | | | |
- Send max 576 to off-net destination |3.3.3 | |x| | | |
- All-Subnets-MTU configuration flag |3.3.3 | | |x| | |
- | | | | | | |
-MULTIHOMING: | | | | | | |
- Reply with same addr as spec-dest addr |3.3.4.2 | |x| | | |
- Allow application to choose local IP addr |3.3.4.2 |x| | | | |
- Silently discard d'gram in "wrong" interface |3.3.4.2 | | |x| | |
- Only send d'gram through "right" interface |3.3.4.2 | | |x| | |4
- | | | | | | |
-SOURCE-ROUTE FORWARDING: | | | | | | |
- Forward datagram with Source Route option |3.3.5 | | |x| | |1
- Obey corresponding gateway rules |3.3.5 |x| | | | |1
- Update TTL by gateway rules |3.3.5 |x| | | | |1
- Able to generate ICMP err code 4, 5 |3.3.5 |x| | | | |1
- IP src addr not local host |3.3.5 | | |x| | |1
- Update Timestamp, Record Route options |3.3.5 |x| | | | |1
- Configurable switch for non-local SRing |3.3.5 |x| | | | |1
- Defaults to OFF |3.3.5 |x| | | | |1
- Satisfy gwy access rules for non-local SRing |3.3.5 |x| | | | |1
- If not forward, send Dest Unreach (cd 5) |3.3.5 | |x| | | |2
- | | | | | | |
-BROADCAST: | | | | | | |
- Broadcast addr as IP source addr |3.2.1.3 | | | | |x|
- Receive 0 or -1 broadcast formats OK |3.3.6 | |x| | | |
- Config'ble option to send 0 or -1 b'cast |3.3.6 | | |x| | |
- Default to -1 broadcast |3.3.6 | |x| | | |
- Recognize all broadcast address formats |3.3.6 |x| | | | |
- Use IP b'cast/m'cast addr in link-layer b'cast |3.3.6 |x| | | | |
- Silently discard link-layer-only b'cast dg's |3.3.6 | |x| | | |
- Use Limited Broadcast addr for connected net |3.3.6 | |x| | | |
- | | | | | | |
-MULTICAST: | | | | | | |
- Support local IP multicasting (RFC-1112) |3.3.7 | |x| | | |
- Support IGMP (RFC-1112) |3.3.7 | | |x| | |
- Join all-hosts group at startup |3.3.7 | |x| | | |
- Higher layers learn i'face m'cast capability |3.3.7 | |x| | | |
- | | | | | | |
-INTERFACE: | | | | | | |
- Allow transport layer to use all IP mechanisms |3.4 |x| | | | |
- Pass interface ident up to transport layer |3.4 |x| | | | |
- Pass all IP options up to transport layer |3.4 |x| | | | |
- Transport layer can send certain ICMP messages |3.4 |x| | | | |
- Pass spec'd ICMP messages up to transp. layer |3.4 |x| | | | |
- Include IP hdr+8 octets or more from orig. |3.4 |x| | | | |
- Able to leap tall buildings at a single bound |3.5 | |x| | | |
-
-Footnotes:
-
-(1) Only if feature is implemented.
-
-(2) This requirement is overruled if datagram is an ICMP error message.
-
-(3) Only if feature is implemented and is configured "on".
-
-(4) Unless has embedded gateway functionality or is source routed.
-
-4. TRANSPORT PROTOCOLS
-
- 4.1 USER DATAGRAM PROTOCOL -- UDP
-
- 4.1.1 INTRODUCTION
-
- The User Datagram Protocol UDP [UDP:1] offers only a minimal
- transport service -- non-guaranteed datagram delivery -- and
- gives applications direct access to the datagram service of the
- IP layer. UDP is used by applications that do not require the
- level of service of TCP or that wish to use communications
- services (e.g., multicast or broadcast delivery) not available
- from TCP.
-
- UDP is almost a null protocol; the only services it provides
- over IP are checksumming of data and multiplexing by port
- number. Therefore, an application program running over UDP
- must deal directly with end-to-end communication problems that
- a connection-oriented protocol would have handled -- e.g.,
- retransmission for reliable delivery, packetization and
- reassembly, flow control, congestion avoidance, etc., when
- these are required. The fairly complex coupling between IP and
- TCP will be mirrored in the coupling between UDP and many
- applications using UDP.
-
- 4.1.2 PROTOCOL WALK-THROUGH
-
- There are no known errors in the specification of UDP.
-
- 4.1.3 SPECIFIC ISSUES
-
- 4.1.3.1 Ports
-
- UDP well-known ports follow the same rules as TCP well-known
- ports; see Section 4.2.2.1 below.
-
- If a datagram arrives addressed to a UDP port for which
- there is no pending LISTEN call, UDP SHOULD send an ICMP
- Port Unreachable message.
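Port demultiplexing with a Port Unreachable fallback can be sketched as follows. The listener table (a `dict` from port to a receive queue) and the `udp_deliver` name are hypothetical. As the Internet-layer summary in Section 3.5 requires, the sketch sends no ICMP error for a datagram addressed to an IP multicast or limited broadcast address, or received as a link-layer broadcast. A full implementation would also test the directed-broadcast forms of Section 3.3.6.

```python
from ipaddress import ip_address
from typing import Dict, List, Optional, Tuple

ICMP_DEST_UNREACHABLE = 3
CODE_PORT_UNREACHABLE = 3


def udp_deliver(listeners: Dict[int, List[bytes]], dst: str, dst_port: int,
                data: bytes, link_broadcast: bool = False
                ) -> Optional[Tuple[int, int]]:
    """Queue data for a listening port; otherwise return the (type, code)
    of the ICMP message to send, or None if none may be sent."""
    if dst_port in listeners:
        listeners[dst_port].append(data)
        return None
    d = ip_address(dst)
    # No ICMP error for IP broadcast/multicast or link-layer broadcast.
    if link_broadcast or d.is_multicast or d == ip_address("255.255.255.255"):
        return None
    return (ICMP_DEST_UNREACHABLE, CODE_PORT_UNREACHABLE)
```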
-
- 4.1.3.2 IP Options
-
- UDP MUST pass any IP option that it receives from the IP
- layer transparently to the application layer.
-
- An application MUST be able to specify IP options to be sent
- in its UDP datagrams, and UDP MUST pass these options to the
- IP layer.
-
- DISCUSSION:
- At present, the only options that need be passed
- through UDP are Source Route, Record Route, and Time
- Stamp. However, new options may be defined in the
- future, and UDP need not and should not make any
- assumptions about the format or content of options it
- passes to or from the application; an exception to this
- might be an IP-layer security option.
-
- An application based on UDP will need to obtain a
- source route from a request datagram and supply a
- reversed route for sending the corresponding reply.
-
- 4.1.3.3 ICMP Messages
-
- UDP MUST pass to the application layer all ICMP error
- messages that it receives from the IP layer. Conceptually
- at least, this may be accomplished with an upcall to the
- ERROR_REPORT routine (see Section 4.2.4.1).
-
- DISCUSSION:
- Note that ICMP error messages resulting from sending a
- UDP datagram are received asynchronously. A UDP-based
- application that wants to receive ICMP error messages
- is responsible for maintaining the state necessary to
- demultiplex these messages when they arrive; for
- example, the application may keep a pending receive
- operation for this purpose. The application is also
- responsible for avoiding confusion from a delayed ICMP
- error message resulting from an earlier use of the same
- port(s).
-
- 4.1.3.4 UDP Checksums
-
- A host MUST implement the facility to generate and validate
- UDP checksums. An application MAY optionally be able to
- control whether a UDP checksum will be generated, but it
- MUST default to checksumming on.
-
- If a UDP datagram is received with a checksum that is non-
- zero and invalid, UDP MUST silently discard the datagram.
- An application MAY optionally be able to control whether UDP
- datagrams without checksums should be discarded or passed to
- the application.
-
- DISCUSSION:
- Some applications that normally run only across local
- area networks have chosen to turn off UDP checksums for
- efficiency. As a result, numerous cases of undetected
- errors have been reported. The advisability of ever
- turning off UDP checksumming is very controversial.
-
- IMPLEMENTATION:
- There is a common implementation error in UDP
- checksums. Unlike the TCP checksum, the UDP checksum
- is optional; the value zero is transmitted in the
- checksum field of a UDP header to indicate the absence
- of a checksum. If the transmitter really calculates a
- UDP checksum of zero, it must transmit the checksum as
- all 1's (65535). No special action is required at the
- receiver, since zero and 65535 are equivalent in 1's
- complement arithmetic.
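The checksum is the 16-bit one's complement of the one's complement sum over three parts: a pseudo-header, the UDP header, and the data, padded to an even length. The pseudo-header consists of the source address, the destination address, a zero octet, the protocol number 17, and the UDP length. The Python sketch below follows that definition from [UDP:1] (RFC-768) and shows the zero-to-all-ones substitution. The function names are hypothetical.

```python
import struct
from ipaddress import IPv4Address


def ones_complement_sum16(data: bytes) -> int:
    """One's complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero octet
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def pseudo_header(src: str, dst: str, udp_length: int) -> bytes:
    return (IPv4Address(src).packed + IPv4Address(dst).packed
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_checksum(src: str, dst: str, segment: bytes) -> int:
    """segment: UDP header with a zeroed checksum field, plus data."""
    total = ones_complement_sum16(pseudo_header(src, dst, len(segment)) + segment)
    csum = ~total & 0xFFFF
    # A computed 0 is sent as all ones (65535); 0 on the wire means "no checksum".
    return 0xFFFF if csum == 0 else csum


def udp_checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """Receiver check; 0 and 65535 need no special handling here."""
    if segment[6:8] == b"\x00\x00":
        return True  # no checksum sent; accepting it is an application policy
    total = ones_complement_sum16(pseudo_header(src, dst, len(segment)) + segment)
    return total == 0xFFFF
```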
-
- 4.1.3.5 UDP Multihoming
-
- When a UDP datagram is received, its specific-destination
- address MUST be passed up to the application layer.
-
- An application program MUST be able to specify the IP source
- address to be used for sending a UDP datagram or to leave it
- unspecified (in which case the networking software will
- choose an appropriate source address). There SHOULD be a
- way to communicate the chosen source address up to the
- application layer (e.g., so that the application can later
- receive a reply datagram only from the corresponding
- interface).
-
- DISCUSSION:
- A request/response application that uses UDP should use
- a source address for the response that is the same as
- the specific destination address of the request. See
- the "General Issues" section of [INTRO:1].
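The BSD socket interface, shown here through Python's `socket` module, illustrates these requirements:

* An application specifies the source address by binding to it.
* An application leaves the source address unspecified by binding to the wildcard address.
* An application can learn the address the stack chose with `getsockname()`.

The helper names below are hypothetical, and port 9 is only a placeholder peer port. Learning the specific-destination address of a datagram received on a wildcard socket requires a platform-specific option, such as `IP_PKTINFO` on Linux, which is not shown.

```python
import socket


def chosen_source_address(remote_ip: str, remote_port: int = 9) -> str:
    """Ask the stack which source address it would use toward remote_ip.

    connect() on a UDP socket sends nothing; it fixes the peer, which makes
    the stack choose a local address that getsockname() then reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((remote_ip, remote_port))
        return s.getsockname()[0]


def open_udp(local_ip: str = "0.0.0.0", port: int = 0) -> socket.socket:
    """Bind to a specific local address, or to the wildcard "0.0.0.0" to
    leave the source address unspecified."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind((local_ip, port))
    return s
```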
-
- 4.1.3.6 Invalid Addresses
-
- A UDP datagram received with an invalid IP source address
- (e.g., a broadcast or multicast address) must be discarded
- by UDP or by the IP layer (see Section 3.2.1.3).
-
- When a host sends a UDP datagram, the source address MUST be
- (one of) the IP address(es) of the host.
-
- 4.1.4 UDP/APPLICATION LAYER INTERFACE
-
- The application interface to UDP MUST provide the full services
- of the IP/transport interface described in Section 3.4 of this
- document. Thus, an application using UDP needs the functions
- of the GET_SRCADDR(), GET_MAXSIZES(), ADVISE_DELIVPROB(), and
- RECV_ICMP() calls described in Section 3.4. For example,
- GET_MAXSIZES() can be used to learn the effective maximum UDP
- datagram size for a particular {interface, remote host, TOS}
- triplet.
-
- An application-layer program MUST be able to set the TTL and
- TOS values as well as IP options for sending a UDP datagram,
- and these values must be passed transparently to the IP layer.
- UDP MAY pass the received TOS up to the application layer.
-
- 4.1.5 UDP REQUIREMENTS SUMMARY
-
-
-FEATURE                                        |SECTION |MUST|SHOULD|MAY|SHOULD NOT|MUST NOT|FOOTNOTE
------------------------------------------------|--------|----|------|---|----------|--------|--------
-                                               |        |    |      |   |          |        |
-                      UDP                      |        |    |      |   |          |        |
------------------------------------------------|--------|----|------|---|----------|--------|--------
-                                               |        |    |      |   |          |        |
-UDP send Port Unreachable                      |4.1.3.1 |    |  x   |   |          |        |
-                                               |        |    |      |   |          |        |
-IP Options in UDP                              |        |    |      |   |          |        |
- - Pass rcv'd IP options to applic layer       |4.1.3.2 | x  |      |   |          |        |
- - Applic layer can specify IP options in Send |4.1.3.2 | x  |      |   |          |        |
- - UDP passes IP options down to IP layer      |4.1.3.2 | x  |      |   |          |        |
-                                               |        |    |      |   |          |        |
-Pass ICMP msgs up to applic layer              |4.1.3.3 | x  |      |   |          |        |
-                                               |        |    |      |   |          |        |
-UDP checksums:                                 |        |    |      |   |          |        |
- - Able to generate/check checksum             |4.1.3.4 | x  |      |   |          |        |
- - Silently discard bad checksum               |4.1.3.4 | x  |      |   |          |        |
- - Sender Option to not generate checksum      |4.1.3.4 |    |      | x |          |        |
- - Default is to checksum                      |4.1.3.4 | x  |      |   |          |        |
- - Receiver Option to require checksum         |4.1.3.4 |    |      | x |          |        |
-                                               |        |    |      |   |          |        |
-UDP Multihoming                                |        |    |      |   |          |        |
- - Pass spec-dest addr to application          |4.1.3.5 | x  |      |   |          |        |
- - Applic layer can specify Local IP addr      |4.1.3.5 | x  |      |   |          |        |
- - Applic layer can specify wild Local IP addr |4.1.3.5 | x  |      |   |          |        |
- - Applic layer notified of Local IP addr used |4.1.3.5 |    |  x   |   |          |        |
-                                               |        |    |      |   |          |        |
-Bad IP src addr silently discarded by UDP/IP   |4.1.3.6 | x  |      |   |          |        |
-Only send valid IP source address              |4.1.3.6 | x  |      |   |          |        |
-UDP Application Interface Services             |        |    |      |   |          |        |
-Full IP interface of 3.4 for application       |4.1.4   | x  |      |   |          |        |
- - Able to spec TTL, TOS, IP opts when send dg |4.1.4   | x  |      |   |          |        |
- - Pass received TOS up to applic layer        |4.1.4   |    |      | x |          |        |
-
- 4.2 TRANSMISSION CONTROL PROTOCOL -- TCP
-
- 4.2.1 INTRODUCTION
-
- The Transmission Control Protocol TCP [TCP:1] is the primary
- virtual-circuit transport protocol for the Internet suite. TCP
- provides reliable, in-sequence delivery of a full-duplex stream
- of octets (8-bit bytes). TCP is used by those applications
- needing reliable, connection-oriented transport service, e.g.,
- mail (SMTP), file transfer (FTP), and virtual terminal service
- (Telnet); requirements for these application-layer protocols
- are described in [INTRO:1].
-
- 4.2.2 PROTOCOL WALK-THROUGH
-
- 4.2.2.1 Well-Known Ports: RFC-793 Section 2.7
-
- DISCUSSION:
- TCP reserves port numbers in the range 0-255 for
- "well-known" ports, used to access services that are
- standardized across the Internet. The remainder of the
- port space can be freely allocated to application
- processes. Current well-known port definitions are
- listed in the RFC entitled "Assigned Numbers"
- [INTRO:6]. A prerequisite for defining a new well-
- known port is an RFC documenting the proposed service
- in enough detail to allow new implementations.
-
- Some systems extend this notion by adding a third
- subdivision of the TCP port space: reserved ports,
- which are generally used for operating-system-specific
- services. For example, reserved ports might fall
- between 256 and some system-dependent upper limit.
- Some systems further choose to protect well-known and
- reserved ports by permitting only privileged users to
- open TCP connections with those port values. This is
- perfectly reasonable as long as the host does not
- assume that all hosts protect their low-numbered ports
- in this manner.
-
- 4.2.2.2 Use of Push: RFC-793 Section 2.8
-
- When an application issues a series of SEND calls without
- setting the PUSH flag, the TCP MAY aggregate the data
- internally without sending it. Similarly, when a series of
- segments is received without the PSH bit, a TCP MAY queue
- the data internally without passing it to the receiving
- application.
-
- The PSH bit is not a record marker and is independent of
- segment boundaries. The transmitter SHOULD collapse
- successive PSH bits when it packetizes data, to send the
- largest possible segment.
-
- A TCP MAY implement PUSH flags on SEND calls. If PUSH flags
- are not implemented, then the sending TCP: (1) must not
- buffer data indefinitely, and (2) MUST set the PSH bit in
- the last buffered segment (i.e., when there is no more
- queued data to be sent).
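-
- The Python sketch below shows one way a sender might packetize queued
- SEND data under these rules. Three details are illustrative only: the
- (data, push) tuples, the function name, and the assumption that the
- whole queue may be sent at once (no window limit).
-
```python
from typing import List, Tuple


def packetize(sends: List[Tuple[bytes, bool]], mss: int,
              push_implemented: bool = True) -> List[Tuple[bytes, bool]]:
    """Split queued SEND data into (payload, psh) segments of <= mss octets.

    Segment boundaries ignore push points, so successive PSH bits collapse
    into whichever segment carries the last octet of a pushed SEND.
    """
    stream = b"".join(data for data, _ in sends)
    push_points = set()  # stream offsets just past each pushed SEND
    offset = 0
    for data, push in sends:
        offset += len(data)
        if push and push_implemented:
            push_points.add(offset)
    segments = []
    for start in range(0, len(stream), mss):
        end = min(start + mss, len(stream))
        psh = any(start < p <= end for p in push_points)
        segments.append((stream[start:end], psh))
    if segments and not push_implemented:
        # No PUSH flags on SEND: set PSH on the segment that empties the queue.
        segments[-1] = (segments[-1][0], True)
    return segments
```
-
- For example, consider three SENDs: b"ab" (pushed), b"cd" (pushed), and
- b"efg". With mss=4 they yield one PSH segment b"abcd", followed by
- b"efg" without PSH.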
-
- The discussion in RFC-793 on pages 48, 50, and 74
- erroneously implies that a received PSH flag must be passed
- to the application layer. Passing a received PSH flag to
- the application layer is now OPTIONAL.
-
- An application program is logically required to set the PUSH
- flag in a SEND call whenever it needs to force delivery of
- the data to avoid a communication deadlock. However, a TCP
- SHOULD send a maximum-sized segment whenever possible, to
- improve performance (see Section 4.2.3.4).
-
- DISCUSSION:
- When the PUSH flag is not implemented on SEND calls,
- i.e., when the application/TCP interface uses a pure
- streaming model, responsibility for aggregating any
- tiny data fragments to form reasonable sized segments
- is partially borne by the application layer.
-
- Generally, an interactive application protocol must set
- the PUSH flag at least in the last SEND call in each
- command or response sequence. A bulk transfer protocol
- like FTP should set the PUSH flag on the last segment
- of a file or when necessary to prevent buffer deadlock.
-
- At the receiver, the PSH bit forces buffered data to be
- delivered to the application (even if less than a full
- buffer has been received). Conversely, the lack of a
- PSH bit can be used to avoid unnecessary wakeup calls
- to the application process; this can be an important
- performance optimization for large timesharing hosts.
- Passing the PSH bit to the receiving application allows
- an analogous optimization within the application.
-
- 4.2.2.3 Window Size: RFC-793 Section 3.1
-
- The window size MUST be treated as an unsigned number, or
- else large window sizes will be misinterpreted as negative windows
- and TCP will not work. It is RECOMMENDED that
- implementations reserve 32-bit fields for the send and
- receive window sizes in the connection record and do all
- window computations with 32 bits.
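-
- As an illustration (Python, hypothetical helper name), take a raw TCP
- header whose Window field holds 65535. Reading that field with a
- signed format yields -1. The unsigned read gives the intended value.
-
```python
import struct


def window_field(header: bytes) -> int:
    """Read the 16-bit Window field (octets 14-15) of a raw TCP header.

    "!H" is an unsigned big-endian read; "!h" would turn any window above
    32767 into a negative number.
    """
    (window,) = struct.unpack_from("!H", header, 14)
    return window
```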
-
- DISCUSSION:
- It is known that the window field in the TCP header is
- too small for high-speed, long-delay paths.
- Experimental TCP options have been defined to extend
- the window size; see for example [TCP:11]. In
- anticipation of the adoption of such an extension, TCP
- implementors should treat windows as 32 bits.
-
- 4.2.2.4 Urgent Pointer: RFC-793 Section 3.1
-
- The second sentence of the RFC-793 Urgent Pointer description
- is in error: the urgent pointer points
- to the sequence number of the LAST octet (not LAST+1) in a
- sequence of urgent data. The description on page 56 (last
- sentence) is correct.
-
- A TCP MUST support a sequence of urgent data of any length.
-
- A TCP MUST inform the application layer asynchronously
- whenever it receives an Urgent pointer and there was
- previously no pending urgent data, or whenever the Urgent
- pointer advances in the data stream. There MUST be a way
- for the application to learn how much urgent data remains to
- be read from the connection, or at least to determine
- whether or not more urgent data remains to be read.
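-
- Here is a small Python sketch of the bookkeeping behind the last
- requirement. It uses the corrected interpretation (LAST octet, not
- LAST+1) and modulo 2**32 sequence arithmetic. The helper names are
- assumptions, as is tracking the next sequence number the application
- will read.
-
```python
MOD = 2 ** 32


def last_urgent_seq(seg_seq: int, seg_up: int) -> int:
    """Sequence number of the LAST urgent octet: SEG.SEQ + SEG.UP."""
    return (seg_seq + seg_up) % MOD


def urgent_remaining(last_urgent: int, next_read_seq: int) -> int:
    """Urgent octets not yet read by the application (0 if none remain)."""
    diff = (last_urgent - next_read_seq) % MOD
    # A difference in the upper half of sequence space means the reader
    # has already moved past the last urgent octet.
    return diff + 1 if diff < MOD // 2 else 0
```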
-
- DISCUSSION:
- Although the Urgent mechanism may be used for any
- application, it is normally used to send "interrupt"-
- type commands to a Telnet program (see "Using Telnet
- Synch Sequence" section in [INTRO:1]).
-
- The asynchronous or "out-of-band" notification will
- allow the application to go into "urgent mode", reading
- data from the TCP connection. This allows control
- commands to be sent to an application whose normal
- input buffers are full of unprocessed data.
-
- IMPLEMENTATION:
- The generic ERROR-REPORT() upcall described in Section
- 4.2.4.1 is a possible mechanism for informing the
- application of the arrival of urgent data.
-
- 4.2.2.5 TCP Options: RFC-793 Section 3.1
-
- A TCP MUST be able to receive a TCP option in any segment.
- A TCP MUST ignore without error any TCP option it does not
- implement, assuming that the option has a length field (all
- TCP options defined in the future will have length fields).
- TCP MUST be prepared to handle an illegal option length
- (e.g., zero) without crashing; a suggested procedure is to
- reset the connection and log the reason.
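-
- Below is a hedged Python sketch of an options parser that follows
- these rules. Two choices are assumptions of the sketch: only the MSS
- option (kind 2) counts as implemented, and malformed input raises an
- exception so the caller can reset the connection and log the reason.
-
```python
from typing import Dict

IMPLEMENTED_KINDS = {2}  # Maximum Segment Size only; an assumed minimal set


class IllegalOptionLength(ValueError):
    """Malformed option; a caller might reset the connection and log it."""


def parse_tcp_options(options: bytes) -> Dict[int, bytes]:
    """Return {kind: value} for implemented options, skipping all others."""
    parsed: Dict[int, bytes] = {}
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # No-Operation (single octet)
            i += 1
            continue
        if i + 1 >= len(options):
            raise IllegalOptionLength(f"kind {kind}: missing length octet")
        length = options[i + 1]  # counts the kind and length octets too
        if length < 2 or i + length > len(options):
            raise IllegalOptionLength(f"kind {kind}: illegal length {length}")
        if kind in IMPLEMENTED_KINDS:
            parsed[kind] = options[i + 2:i + length]
        # Unimplemented kinds are skipped silently via their length field.
        i += length
    return parsed
```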
-
- 4.2.2.6 Maximum Segment Size Option: RFC-793 Section 3.1
-
- TCP MUST implement both sending and receiving the Maximum
- Segment Size option [TCP:4].
-
- TCP SHOULD send an MSS (Maximum Segment Size) option in
- every SYN segment when its receive MSS differs from the
- default 536, and MAY send it always.
-
- If an MSS option is not received at connection setup, TCP
- MUST assume a default send MSS of 536 (576-40) [TCP:4].
-
- The maximum size of a segment that TCP really sends, the
- "effective send MSS," MUST be the smaller of the send MSS
- (which reflects the available reassembly buffer size at the
- remote host) and the largest size permitted by the IP layer:
-
- Eff.snd.MSS =
-
- min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize
-
- where:
-
- * SendMSS is the MSS value received from the remote host,
- or the default 536 if no MSS option is received.
-
- * MMS_S is the maximum size for a transport-layer message
- that TCP may send.
-
- * TCPhdrsize is the size of the TCP header; this is
- normally 20, but may be larger if TCP options are to be
- sent.
-
- * IPoptionsize is the size of any IP options that TCP
- will pass to the IP layer with the current message.
-
-
- The MSS value to be sent in an MSS option must be less than
- or equal to:
-
- MMS_R - 20
-
- where MMS_R is the maximum size for a transport-layer
- message that can be received (and reassembled). TCP obtains
- MMS_R and MMS_S from the IP layer; see the generic call
- GET_MAXSIZES in Section 3.4.
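-
- The two formulas can be written directly as functions (hypothetical
- names). The example values assume MMS_S = MMS_R = 1480, that is, a
- 1500-octet MTU less a 20-octet IP header.
-
```python
DEFAULT_SEND_MSS = 536  # 576 - 40, used when no MSS option is received


def effective_send_mss(mms_s: int, send_mss: int = DEFAULT_SEND_MSS,
                       tcp_hdr_size: int = 20, ip_option_size: int = 0) -> int:
    """Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize."""
    return min(send_mss + 20, mms_s) - tcp_hdr_size - ip_option_size


def mss_option_limit(mms_r: int) -> int:
    """Largest value that may be advertised in an MSS option: MMS_R - 20."""
    return mms_r - 20
```
-
- For example, effective_send_mss(1480) gives 536 when no MSS option
- was received, and effective_send_mss(1480, 1460) gives 1460.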
-
- DISCUSSION:
- The choice of TCP segment size has a strong effect on
- performance. Larger segments increase throughput by
- amortizing header size and per-datagram processing
- overhead over more data bytes; however, if the packet
- is so large that it causes IP fragmentation, efficiency
- drops sharply if any fragments are lost [IP:9].
-
- Some TCP implementations send an MSS option only if the
- destination host is on a non-connected network.
- However, in general the TCP layer may not have the
- appropriate information to make this decision, so it is
- preferable to leave to the IP layer the task of
- determining a suitable MTU for the Internet path. We
- therefore recommend that TCP always send the option (if
- not 536) and that the IP layer determine MMS_R as
- specified in Sections 3.3.3 and 3.4. A proposed IP-layer
- mechanism to measure the MTU would then modify the IP
- layer without changing TCP.
-
- 4.2.2.7 TCP Checksum: RFC-793 Section 3.1
-
- Unlike the UDP checksum (see Section 4.1.3.4), the TCP
- checksum is never optional. The sender MUST generate it and
- the receiver MUST check it.
-
- 4.2.2.8 TCP Connection State Diagram: RFC-793 Section 3.2,
- page 23
-
- There are several problems with this diagram:
-
- (a) The arrow from SYN-SENT to SYN-RCVD should be labeled
- with "snd SYN,ACK", to agree with the text on page 68
- and with Figure 8.
-
- (b) There could be an arrow from SYN-RCVD state to LISTEN
- state, conditioned on receiving a RST after a passive
- open (see text page 70).
-
- (c) It is possible to go directly from FIN-WAIT-1 to the
- TIME-WAIT state (see page 75 of the spec).
-
-
- 4.2.2.9 Initial Sequence Number Selection: RFC-793 Section
- 3.3, page 27
-
- A TCP MUST use the specified clock-driven selection of
- initial sequence numbers.
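-
- RFC-793 binds the ISN generator to a 32-bit clock whose low-order bit
- is incremented roughly every 4 microseconds. A minimal Python sketch
- of that clock alone appears below. It uses the host's monotonic clock
- as an assumed time source. Later practice (RFC 6528) adds a keyed hash
- of the connection identifiers to make the result harder to predict.
-
```python
import time
from typing import Optional


def clock_driven_isn(now_us: Optional[int] = None) -> int:
    """ISN from a 32-bit counter that ticks once every 4 microseconds."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000  # assumed time source
    return (now_us // 4) % 2 ** 32
```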
-
- 4.2.2.10 Simultaneous Open Attempts: RFC-793 Section 3.4, page
- 32
-
- There is an error in Figure 8: the packet on line 7 should
- be identical to the packet on line 5.
-
- A TCP MUST support simultaneous open attempts.
-
- DISCUSSION:
- It sometimes surprises implementors that if two
- applications attempt to simultaneously connect to each
- other, only one connection is generated instead of two.
- This was an intentional design decision; don't try to
- "fix" it.
-
- 4.2.2.11 Recovery from Old Duplicate SYN: RFC-793 Section 3.4,
- page 33
-
- Note that a TCP implementation MUST keep track of whether a
- connection has reached SYN-RCVD state as the result of a
- passive OPEN or an active OPEN.
-
- 4.2.2.12 RST Segment: RFC-793 Section 3.4
-
- A TCP SHOULD allow a received RST segment to include data.
-
- DISCUSSION:
- It has been suggested that a RST segment could contain
- ASCII text that encoded and explained the cause of the
- RST. No standard has yet been established for such
- data.
-
- 4.2.2.13 Closing a Connection: RFC-793 Section 3.5
-
- A TCP connection may terminate in two ways: (1) the normal
- TCP close sequence using a FIN handshake, and (2) an "abort"
- in which one or more RST segments are sent and the
- connection state is immediately discarded. If a TCP
- connection is closed by the remote site, the local
- application MUST be informed whether it closed normally or
- was aborted.
-
- The normal TCP close sequence delivers buffered data
- reliably in both directions. Since the two directions of a
- TCP connection are closed independently, it is possible for
- a connection to be "half closed," i.e., closed in only one
- direction, and a host is permitted to continue sending data
- in the open direction on a half-closed connection.
-
- A host MAY implement a "half-duplex" TCP close sequence, so
- that an application that has called CLOSE cannot continue to
- read data from the connection. If such a host issues a
- CLOSE call while received data is still pending in TCP, or
- if new data is received after CLOSE is called, its TCP
- SHOULD send a RST to show that data was lost.
-
- When a connection is closed actively, it MUST linger in
- TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
- However, it MAY accept a new SYN from the remote TCP to
- reopen the connection directly from TIME-WAIT state, if it:
-
- (1) assigns its initial sequence number for the new
- connection to be larger than the largest sequence
- number it used on the previous connection incarnation,
- and
-
- (2) returns to TIME-WAIT state if the SYN turns out to be
- an old duplicate.
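-
- Condition (1) depends on comparing sequence numbers modulo 2**32. A
- minimal Python sketch of that test follows. The helper names are
- hypothetical, and condition (2) is left to the caller.
-
```python
MOD = 2 ** 32


def seq_gt(a: int, b: int) -> bool:
    """a > b in 32-bit sequence-number arithmetic."""
    return 0 < (a - b) % MOD < MOD // 2


def may_reopen_from_time_wait(new_isn: int, max_seq_prev: int) -> bool:
    """Condition (1): the new ISN exceeds every sequence number used on
    the previous incarnation of the connection."""
    return seq_gt(new_isn, max_seq_prev)
```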
-
-
- DISCUSSION:
- TCP's full-duplex data-preserving close is a feature
- that is not included in the analogous ISO transport
- protocol TP4.
-
- Some systems have not implemented half-closed
- connections, presumably because they do not fit into
- the I/O model of their particular operating system. On
- these systems, once an application has called CLOSE, it
- can no longer read input data from the connection; this
- is referred to as a "half-duplex" TCP close sequence.
-
- The graceful close algorithm of TCP requires that the
- connection state remain defined on (at least) one end
- of the connection, for a timeout period of 2xMSL, i.e.,
- 4 minutes. During this period, the (remote socket,
- local socket) pair that defines the connection is busy
- and cannot be reused. To shorten the time that a given
- port pair is tied up, some TCPs allow a new SYN to be
- accepted in TIME-WAIT state.
-
- 4.2.2.14 Data Communication: RFC-793 Section 3.7, page 40
-
- Since RFC-793 was written, there has been extensive work on
- TCP algorithms to achieve efficient data communication.
- Later sections of the present document describe required and
- recommended TCP algorithms to determine when to send data
- (Section 4.2.3.4), when to send an acknowledgment (Section
- 4.2.3.2), and when to update the window (Section 4.2.3.3).
-
- DISCUSSION:
- One important performance issue is "Silly Window
- Syndrome" or "SWS" [TCP:5], a stable pattern of small
- incremental window movements resulting in extremely
- poor TCP performance. Algorithms to avoid SWS are
- described below for both the sending side (Section
- 4.2.3.4) and the receiving side (Section 4.2.3.3).
-
- In brief, SWS is caused by the receiver advancing the
- right window edge whenever it has any new buffer space
- available to receive data and by the sender using any
- incremental window, no matter how small, to send more
- data [TCP:5]. The result can be a stable pattern of
- sending tiny data segments, even though both sender and
- receiver have a large total buffer space for the
- connection. SWS can only occur during the transmission
- of a large amount of data; if the connection goes
- quiescent, the problem will disappear. It is caused by
- typical straightforward implementations of window
- management, but the sender and receiver algorithms
- given below will avoid it.
-
- Another important TCP performance issue is that some
- applications, especially remote login to character-at-
- a-time hosts, tend to send streams of one-octet data
- segments. To avoid deadlocks, every TCP SEND call from
- such applications must be "pushed", either explicitly
- by the application or else implicitly by TCP. The
- result may be a stream of TCP segments that contain one
- data octet each, which makes very inefficient use of
- the Internet and contributes to Internet congestion.
- The Nagle Algorithm described in Section 4.2.3.4
- provides a simple and effective solution to this
- problem. It does have the effect of clumping
- characters over Telnet connections; this may initially
- surprise users accustomed to single-character echo, but
- user acceptance has not been a problem.
-
- Note that the Nagle algorithm and the send SWS
- avoidance algorithm play complementary roles in
- improving performance. The Nagle algorithm discourages
- sending tiny segments when the data to be sent
- increases in small increments, while the SWS avoidance
- algorithm discourages small segments resulting from the
- right window edge advancing in small increments.
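-
- For reference, the core test of the Nagle algorithm (specified in
- Section 4.2.3.4) can be sketched as below. This simplified Python
- version looks only at octet counts. It ignores the offered window,
- PUSH, and the send-side SWS test, and its parameter names are
- assumptions.
-
```python
def nagle_allows_send(queued: int, unacked: int, eff_mss: int) -> bool:
    """Simplified Nagle test on octet counts.

    With nothing in flight, any queued data may be sent at once; while data
    is unacknowledged, only a full-sized segment may be sent.
    """
    if unacked == 0:
        return queued > 0
    return queued >= eff_mss
```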
-
- A careless implementation can send two or more
- acknowledgment segments per data segment received. For
- example, suppose the receiver acknowledges every data
- segment immediately. When the application program
- subsequently consumes the data and increases the
- available receive buffer space again, the receiver may
- send a second acknowledgment segment to update the
- window at the sender. The extreme case occurs with
- single-character segments on TCP connections using the
- Telnet protocol for remote login service. Some
- implementations have been observed in which each
- incoming 1-character segment generates three return
- segments: (1) the acknowledgment, (2) a one-byte
- increase in the window, and (3) the echoed character,
- respectively.
-
- 4.2.2.15 Retransmission Timeout: RFC-793 Section 3.7, page 41
-
- The algorithm suggested in RFC-793 for calculating the
- retransmission timeout is now known to be inadequate; see
- Section 4.2.3.1 below.
-
- Recent work by Jacobson [TCP:7] on Internet congestion and
- TCP retransmission stability has produced a transmission
- algorithm combining "slow start" with "congestion
- avoidance". A TCP MUST implement this algorithm.
-
- If a retransmitted packet is identical to the original
- packet (which implies not only that the data boundaries have
- not changed, but also that the window and acknowledgment
- fields of the header have not changed), then the same IP
- Identification field MAY be used (see Section 3.2.1.5).
-
- IMPLEMENTATION:
- Some TCP implementors have chosen to "packetize" the
- data stream, i.e., to pick segment boundaries when
- segments are originally sent and to queue these
- segments in a "retransmission queue" until they are
- acknowledged. Another design (which may be simpler) is
- to defer packetizing until each time data is
- transmitted or retransmitted, so there will be no
- segment retransmission queue.
-
- In an implementation with a segment retransmission
- queue, TCP performance may be enhanced by repacketizing
- the segments awaiting acknowledgment when the first
- retransmission timeout occurs. That is, the
- outstanding segments that fitted would be combined into
- one maximum-sized segment, with a new IP Identification
- value. The TCP would then retain this combined segment
- in the retransmit queue until it was acknowledged.
- However, if the first two segments in the
- retransmission queue totalled more than one maximum-
- sized segment, the TCP would retransmit only the first
- segment using the original IP Identification field.
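-
- The repacketizing step described above might look like the following
- Python sketch. It assumes the retransmission queue is a list of
- segment payloads and that mss stands for the effective send MSS. The
- function name is hypothetical.
-
```python
from typing import List


def repacketize(queue: List[bytes], mss: int) -> List[bytes]:
    """On the first retransmission timeout, merge the leading queued
    segments that together fit in one maximum-sized segment.

    If even the first two exceed mss, the queue is returned unchanged, so
    the first segment can be resent with its original IP Identification.
    A merged segment would be sent with a new IP Identification value.
    """
    merged = b""
    count = 0
    for seg in queue:
        if len(merged) + len(seg) > mss:
            break
        merged += seg
        count += 1
    if count < 2:
        return list(queue)
    return [merged] + queue[count:]
```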
-
- 4.2.2.16 Managing the Window: RFC-793 Section 3.7, page 41
-
- A TCP receiver SHOULD NOT shrink the window, i.e., move the
- right window edge to the left. However, a sending TCP MUST
- be robust against window shrinking, which may cause the
- "useable window" (see Section 4.2.3.4) to become negative.
-
- If this happens, the sender SHOULD NOT send new data, but
- SHOULD retransmit normally the old unacknowledged data
- between SND.UNA and SND.UNA+SND.WND. The sender MAY also
- retransmit old data beyond SND.UNA+SND.WND, but SHOULD NOT
- time out the connection if data beyond the right window edge
- is not acknowledged. If the window shrinks to zero, the TCP
- MUST probe it in the standard way (see next Section).
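-
- The useable window is SND.UNA + SND.WND - SND.NXT. When the window
- shrinks it can go negative, but that only shows up if the difference
- is taken as a signed value modulo 2**32. A minimal Python sketch, with
- a hypothetical function name:
-
```python
MOD = 2 ** 32


def usable_window(snd_una: int, snd_wnd: int, snd_nxt: int) -> int:
    """SND.UNA + SND.WND - SND.NXT as a signed value.

    A negative result means the receiver shrank the window below data
    already sent: send no new data, but keep retransmitting the old data
    between SND.UNA and SND.UNA + SND.WND.
    """
    diff = (snd_una + snd_wnd - snd_nxt) % MOD
    return diff - MOD if diff >= MOD // 2 else diff
```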
-
- DISCUSSION:
- Many TCP implementations become confused if the window
- shrinks from the right after data has been sent into a
- larger window. Note that TCP has a heuristic to select
- the latest window update despite possible datagram
- reordering; as a result, it may ignore a window update
- with a smaller window than previously offered if
- neither the sequence number nor the acknowledgment
- number is increased.
-
- 4.2.2.17 Probing Zero Windows: RFC-793 Section 3.7, page 42
-
- Probing of zero (offered) windows MUST be supported.
-
- A TCP MAY keep its offered receive window closed
- indefinitely. As long as the receiving TCP continues to
- send acknowledgments in response to the probe segments, the
- sending TCP MUST allow the connection to stay open.
-
- DISCUSSION:
- It is extremely important to remember that ACK
- (acknowledgment) segments that contain no data are not
- reliably transmitted by TCP. If zero window probing is
- not supported, a connection may hang forever when an
- ACK segment that re-opens the window is lost.
-
- The delay in opening a zero window generally occurs
- when the receiving application stops taking data from
- its TCP. For example, consider a printer daemon
- application, stopped because the printer ran out of
- paper.
-
- The transmitting host SHOULD send the first zero-window
- probe when a zero window has existed for the retransmission
- timeout period (see Section 4.2.2.15), and SHOULD increase
- exponentially the interval between successive probes.
-
- DISCUSSION:
- This procedure minimizes delay if the zero-window
- condition is due to a lost ACK segment containing a
- window-opening update. Exponential backoff is
- recommended, possibly with some maximum interval not
- specified here. This procedure is similar to that of
- the retransmission algorithm, and it may be possible to
- combine the two procedures in the implementation.
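-
- Here is a sketch of the recommended probe timing in Python. Two
- choices are assumptions: doubling the interval, and capping it at 60
- seconds. The text asks only for exponential growth and leaves any
- maximum unspecified.
-
```python
from typing import List


def probe_schedule(rto: float, probes: int,
                   max_interval: float = 60.0) -> List[float]:
    """Delays before successive zero-window probes: the first after one
    retransmission timeout, each later one doubled, up to max_interval."""
    delays = []
    interval = rto
    for _ in range(probes):
        delays.append(min(interval, max_interval))
        interval *= 2
    return delays
```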
-
- 4.2.2.18 Passive OPEN Calls: RFC-793 Section 3.8
-
- Every passive OPEN call either creates a new connection
- record in LISTEN state, or it returns an error; it MUST NOT
- affect any previously created connection record.
-
- A TCP that supports multiple concurrent users MUST provide
- an OPEN call that will functionally allow an application to
- LISTEN on a port while a connection block with the same
- local port is in SYN-SENT or SYN-RECEIVED state.
-
- DISCUSSION:
- Some applications (e.g., SMTP servers) may need to
- handle multiple connection attempts at about the same
- time. The probability of a connection attempt failing
- is reduced by giving the application some means of
- listening for a new connection at the same time that an
- earlier connection attempt is going through the three-
- way handshake.
-
- IMPLEMENTATION:
- Acceptable implementations of concurrent opens may
- permit multiple passive OPEN calls, or they may allow
- "cloning" of LISTEN-state connections from a single
- passive OPEN call.
-
- 4.2.2.19 Time to Live: RFC-793 Section 3.9, page 52
-
- RFC-793 specified that TCP was to request the IP layer to
- send TCP segments with TTL = 60. This is obsolete; the TTL
- value used to send TCP segments MUST be configurable. See
- Section 3.2.1.7 for discussion.
-
- 4.2.2.20 Event Processing: RFC-793 Section 3.9
-
- While it is not strictly required, a TCP SHOULD be capable
- of queueing out-of-order TCP segments. Change the "may" in
- the last sentence of the first paragraph on page 70 to
- "should".
-
- DISCUSSION:
- Some small-host implementations have omitted segment
- queueing because of limited buffer space. This
- omission may be expected to adversely affect TCP
- throughput, since loss of a single segment causes all
- later segments to appear to be "out of sequence".
-
- In general, the processing of received segments MUST be
- implemented to aggregate ACK segments whenever possible.
- For example, if the TCP is processing a series of queued
- segments, it MUST process them all before sending any ACK
- segments.
-
- Here are some detailed error corrections and notes on the
- Event Processing section of RFC-793.
-
- (a) CLOSE Call, CLOSE-WAIT state, p. 61: enter LAST-ACK
- state, not CLOSING.
-
- (b) LISTEN state, check for SYN (pp. 65, 66): With a SYN
- bit, if the security/compartment or the precedence is
- wrong for the segment, a reset is sent. The wrong form
- of reset is shown in the text; it should be:
-
- <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>
-
-
- (c) SYN-SENT state, Check for SYN, p. 68: When the
- connection enters ESTABLISHED state, the following
- variables must be set:
- SND.WND <- SEG.WND
- SND.WL1 <- SEG.SEQ
- SND.WL2 <- SEG.ACK
-
-
- (d) Check security and precedence, p. 71: The first heading
- "ESTABLISHED STATE" should really be a list of all
- states other than SYN-RECEIVED: ESTABLISHED, FIN-WAIT-
- 1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, and
- TIME-WAIT.
-
- (e) Check SYN bit, p. 71: "In SYN-RECEIVED state and if
- the connection was initiated with a passive OPEN, then
- return this connection to the LISTEN state and return.
- Otherwise...".
-
- (f) Check ACK field, SYN-RECEIVED state, p. 72: When the
- connection enters ESTABLISHED state, the variables
- listed in (c) must be set.
-
- (g) Check ACK field, ESTABLISHED state, p. 72: The ACK is a
- duplicate if SEG.ACK =< SND.UNA (the = was omitted).
- Similarly, the window should be updated if: SND.UNA =<
- SEG.ACK =< SND.NXT.
-
- (h) USER TIMEOUT, p. 77:
-
- It would be better to notify the application of the
- timeout rather than letting TCP force the connection
- closed. However, see also Section 4.2.3.5.
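-
- Correction (g) relies on =< comparisons in sequence-number space,
- which wraps modulo 2**32. A minimal Python sketch of the two tests,
- with hypothetical helper names:
-
```python
MOD = 2 ** 32


def seq_leq(a: int, b: int) -> bool:
    """a =< b in 32-bit sequence-number arithmetic."""
    return (b - a) % MOD < MOD // 2


def is_duplicate_ack(seg_ack: int, snd_una: int) -> bool:
    """(g): the ACK is a duplicate if SEG.ACK =< SND.UNA."""
    return seq_leq(seg_ack, snd_una)


def ack_may_update_window(seg_ack: int, snd_una: int, snd_nxt: int) -> bool:
    """(g): the window should be updated if SND.UNA =< SEG.ACK =< SND.NXT."""
    return seq_leq(snd_una, seg_ack) and seq_leq(seg_ack, snd_nxt)
```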
-
-
- 4.2.2.21 Acknowledging Queued Segments: RFC-793 Section 3.9
-
- A TCP MAY send an ACK segment acknowledging RCV.NXT when a
- valid segment arrives that is in the window but not at the
- left window edge.
-
- DISCUSSION:
- RFC-793 (see page 74) was ambiguous about whether or
- not an ACK segment should be sent when an out-of-order
- segment was received, i.e., when SEG.SEQ was unequal to
- RCV.NXT.
-
- One reason for ACKing out-of-order segments might be to
- support an experimental algorithm known as "fast
- retransmit". With this algorithm, the sender uses the
- "redundant" ACK's to deduce that a segment has been
- lost before the retransmission timer has expired. It
- counts the number of times an ACK has been received
- with the same value of SEG.ACK and with the same right
- window edge. If more than a threshold number of such
- ACK's is received, then the segment containing the
- octets starting at SEG.ACK is assumed to have been lost
- and is retransmitted, without awaiting a timeout. The
- threshold is chosen to compensate for the maximum
- likely segment reordering in the Internet. There is
- not yet enough experience with the fast retransmit
- algorithm to determine how useful it is.
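-
- Below is a Python sketch of the duplicate-ACK counting described
- above. The class and method names are assumptions, as is the threshold
- value of 2 (the third redundant ACK triggers a retransmission).
-
```python
class DupAckCounter:
    """Count ACKs that repeat the same SEG.ACK and right window edge."""

    DUP_THRESHOLD = 2  # hypothetical; chosen to tolerate likely reordering

    def __init__(self) -> None:
        self.last = None  # (SEG.ACK, right window edge) of the previous ACK
        self.count = 0    # redundant ACKs seen for that pair

    def on_ack(self, seg_ack: int, seg_wnd: int) -> bool:
        """Return True once the number of redundant ACKs exceeds the
        threshold; the segment starting at seg_ack is then resent."""
        key = (seg_ack, (seg_ack + seg_wnd) % 2 ** 32)
        if key == self.last:
            self.count += 1
        else:
            self.last, self.count = key, 0
        return self.count == self.DUP_THRESHOLD + 1
```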
-
- 4.2.3 SPECIFIC ISSUES
-
- 4.2.3.1 Retransmission Timeout Calculation
-
- A host TCP MUST implement Karn's algorithm and Jacobson's
- algorithm for computing the retransmission timeout ("RTO").
-
- o Jacobson's algorithm for computing the smoothed round-trip
- time ("RTT") incorporates a simple measure of the variance
- [TCP:7].
-
- o Karn's algorithm for selecting RTT measurements ensures
- that ambiguous round-trip times will not corrupt the
- calculation of the smoothed round-trip time [TCP:6].
-
- This implementation also MUST include "exponential backoff"
- for successive RTO values for the same segment.
- Retransmission of SYN segments SHOULD use the same algorithm
- as data segments.
-
- DISCUSSION:
- There were two known problems with the RTO calculations
- specified in RFC-793. First, the accurate measurement
- of RTTs is difficult when there are retransmissions.
- Second, the algorithm to compute the smoothed round-
- trip time is inadequate [TCP:7], because it incorrectly
- assumed that the variance in RTT values would be small
- and constant. These problems were solved by Karn's and
- Jacobson's algorithms, respectively.
-
- The performance increase resulting from the use of
- these improvements varies from noticeable to dramatic.
- Jacobson's algorithm for incorporating the measured RTT
- variance is especially important on a low-speed link,
- where the natural variation of packet sizes causes a
- large variation in RTT. One vendor found link
- utilization on a 9.6 kb/s line went from 10% to 90% as a
- result of implementing Jacobson's variance algorithm in
- TCP.
-
- The following values SHOULD be used to initialize the
- estimation parameters for a new connection:
-
- (a) RTT = 0 seconds.
-
- (b) RTO = 3 seconds. (The smoothed variance is to be
- initialized to the value that will result in this RTO).
-
- The recommended upper and lower bounds on the RTO are known
- to be inadequate on large internets. The lower bound SHOULD
- be measured in fractions of a second (to accommodate high
- speed LANs) and the upper bound should be 2*MSL, i.e., 240
- seconds.
-
- DISCUSSION:
- Experience has shown that these initialization values
- are reasonable, and that in any case the Karn and
- Jacobson algorithms make TCP behavior reasonably
- insensitive to the initial parameter choices.
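
The estimation rules above can be sketched as a small per-connection object. This is a minimal illustration under stated assumptions, not a reference implementation. The gains ALPHA = 1/8 and BETA = 1/4 and the multiplier K = 4 are the values commonly used with Jacobson's algorithm [TCP:7] and are assumed here. The 0.2-second floor is likewise a hypothetical choice, because the text only asks for "fractions of a second". The initial values follow (a) and (b): RTT = 0, with the variance chosen so that RTO = 3 seconds. Karn's algorithm is modelled by discarding samples from retransmitted segments, which also leaves any backed-off RTO in force until a valid sample arrives.

```python
# Assumed constants: RFC 1122 does not fix these values.
ALPHA = 1 / 8      # gain applied to the smoothed RTT
BETA = 1 / 4       # gain applied to the smoothed mean deviation
K = 4              # RTO = SRTT + K * RTTVAR
MIN_RTO = 0.2      # hypothetical "fraction of a second" lower bound
MAX_RTO = 240.0    # upper bound of 2*MSL, as given in the text


class RtoEstimator:
    """Jacobson RTT/variance estimate with Karn's rule and exponential backoff."""

    def __init__(self) -> None:
        self.srtt = 0.0          # (a) RTT = 0 seconds
        self.rttvar = 3.0 / K    # chosen so that srtt + K * rttvar == 3.0
        self.rto = 3.0           # (b) RTO = 3 seconds

    def on_ack(self, sample: float, retransmitted: bool) -> None:
        """Feed one measured RTT (seconds) for a newly acknowledged segment."""
        if retransmitted:
            # Karn: the sample is ambiguous, so ignore it and keep the
            # current (possibly backed-off) RTO unchanged.
            return
        # Update the deviation using the old SRTT, then the SRTT itself.
        self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - sample)
        self.srtt = (1 - ALPHA) * self.srtt + ALPHA * sample
        self.rto = min(MAX_RTO, max(MIN_RTO, self.srtt + K * self.rttvar))

    def on_timeout(self) -> None:
        """Exponential backoff for successive retransmissions of a segment."""
        self.rto = min(MAX_RTO, self.rto * 2)
```

For example, one timeout doubles the RTO from 3 to 6 seconds. A later 1-second sample from a segment that was not retransmitted then brings it to 3.375 seconds. The same object could serve SYN segments, as the text recommends.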
-
- 4.2.3.2 When to Send an ACK Segment
-
- A host that is receiving a stream of TCP data segments can
- increase efficiency in both the Internet and the hosts by
- sending fewer than one ACK (acknowledgment) segment per data
- segment received; this is known as a "delayed ACK" [TCP:5].
-
- A TCP SHOULD implement a delayed ACK, but an ACK should not
- be excessively delayed; in particular, the delay MUST be
- less than 0.5 seconds, and in a stream of full-sized
- segments there SHOULD be an ACK for at least every second
- segment.
-
- DISCUSSION:
- A delayed ACK gives the application an opportunity to
- update the window and perhaps to send an immediate
- response. In particular, in the case of character-mode
- remote login, a delayed ACK can reduce the number of
- segments sent by the server by a factor of 3 (ACK,
- window update, and echo character all combined in one
- segment).
-
- In addition, on some large multi-user hosts, a delayed
- ACK can substantially reduce protocol processing
- overhead by reducing the total number of packets to be
- processed [TCP:5]. However, excessive delays on ACK's
- can disturb the round-trip timing and packet "clocking"
- algorithms [TCP:7].
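
The two limits in the delayed-ACK requirement can be sketched as a small per-connection state machine. The sketch assumes the caller supplies a clock value in seconds and polls a timer. ACK_DELAY = 0.2 s is a hypothetical choice that merely satisfies the 0.5-second MUST, and the class and method names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical delay; RFC 1122 only requires that it be less than 0.5 s.
ACK_DELAY = 0.2


@dataclass
class DelayedAck:
    """Receiver-side delayed-ACK decision for one connection."""
    eff_mss: int                       # size of a full-sized segment
    full_since_ack: int = 0            # full-sized segments not yet ACKed
    deadline: Optional[float] = None   # time by which a pending ACK must go

    def on_segment(self, length: int, now: float) -> bool:
        """Return True if an ACK should be sent immediately."""
        if length >= self.eff_mss:
            self.full_since_ack += 1
        if self.full_since_ack >= 2:
            # ACK at least every second full-sized segment.
            self.ack_sent()
            return True
        if self.deadline is None:
            self.deadline = now + ACK_DELAY
        return False

    def on_timer(self, now: float) -> bool:
        """Return True if the pending ACK's delay has expired."""
        if self.deadline is not None and now >= self.deadline:
            self.ack_sent()
            return True
        return False

    def ack_sent(self) -> None:
        """Call whenever an ACK goes out, including one piggybacked on data."""
        self.full_since_ack = 0
        self.deadline = None
```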
-
- 4.2.3.3 When to Send a Window Update
-
- A TCP MUST include a Silly Window Syndrome (SWS) avoidance
- algorithm in the receiver [TCP:5].
-
- IMPLEMENTATION:
- The receiver's SWS avoidance algorithm determines when
- the right window edge may be advanced; this is
- customarily known as "updating the window". This
- algorithm combines with the delayed ACK algorithm (see
- Section 4.2.3.2) to determine when an ACK segment
- containing the current window will actually be sent
- to the sender. We use the notation of RFC-793; see
- Figures 4 and 5 in that document.
-
- The solution to receiver SWS is to avoid advancing the
- right window edge RCV.NXT+RCV.WND in small increments,
- even if data is received from the network in small
- segments.
-
- Suppose the total receive buffer space is RCV.BUFF. At
- any given moment, RCV.USER octets of this total may be
- tied up with data that has been received and
- acknowledged but which the user process has not yet
- consumed. When the connection is quiescent, RCV.WND =
- RCV.BUFF and RCV.USER = 0.
-
- Keeping the right window edge fixed as data arrives and
- is acknowledged requires that the receiver offer less
- than its full buffer space, i.e., the receiver must
- specify a RCV.WND that keeps RCV.NXT+RCV.WND constant
- as RCV.NXT increases. Thus, the total buffer space
- RCV.BUFF is generally divided into three parts:
-
-            |<------- RCV.BUFF ---------------->|
-                 1              2            3
-        ----|---------|------------------|------|----
-                   RCV.NXT               ^
-                                      (Fixed)
-
- 1 - RCV.USER = data received but not yet consumed;
- 2 - RCV.WND = space advertised to sender;
- 3 - Reduction = space available but not yet
- advertised.
-
-
- The suggested SWS avoidance algorithm for the receiver
- is to keep RCV.NXT+RCV.WND fixed until the reduction
- satisfies:
-
- RCV.BUFF - RCV.USER - RCV.WND >=
-
- min( Fr * RCV.BUFF, Eff.snd.MSS )
-
- where Fr is a fraction whose recommended value is 1/2,
- and Eff.snd.MSS is the effective send MSS for the
- connection (see Section 4.2.2.6). When the inequality
- is satisfied, RCV.WND is set to RCV.BUFF-RCV.USER.
-
- Note that the general effect of this algorithm is to
- advance RCV.WND in increments of Eff.snd.MSS (for
- realistic receive buffers: Eff.snd.MSS < RCV.BUFF/2).
- Note also that the receiver must use its own
- Eff.snd.MSS, assuming it is the same as the sender's.
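
The receiver-side rule above can be sketched as follows. The sketch assumes in-order arrival and ignores 32-bit sequence-number wraparound, and the class and method names are hypothetical. RCV.NXT advances as data arrives while the right edge RCV.NXT+RCV.WND stays fixed. The window reopens to RCV.BUFF-RCV.USER only once the unadvertised reduction reaches min(Fr * RCV.BUFF, Eff.snd.MSS).

```python
class ReceiverWindow:
    """Receiver SWS avoidance using the RFC-793 variable names."""

    def __init__(self, rcv_buff: int, eff_snd_mss: int, fr: float = 0.5) -> None:
        self.rcv_buff = rcv_buff        # RCV.BUFF
        self.eff_snd_mss = eff_snd_mss  # receiver's own Eff.snd.MSS
        self.fr = fr                    # Fr, recommended value 1/2
        self.rcv_nxt = 0                # RCV.NXT (relative, no wraparound)
        self.rcv_user = 0               # RCV.USER
        self.right_edge = rcv_buff      # RCV.NXT + RCV.WND; quiescent RCV.WND = RCV.BUFF

    @property
    def rcv_wnd(self) -> int:
        return self.right_edge - self.rcv_nxt

    def on_data(self, n: int) -> None:
        """n octets arrived in order and were acknowledged."""
        if n > self.rcv_wnd:
            raise ValueError("sender overran the offered window")
        self.rcv_nxt += n     # RCV.NXT advances ...
        self.rcv_user += n    # ... but the right edge stays where it was
        self._maybe_open()

    def on_read(self, n: int) -> None:
        """The user process consumed n octets."""
        self.rcv_user -= n
        self._maybe_open()

    def _maybe_open(self) -> None:
        reduction = self.rcv_buff - self.rcv_user - self.rcv_wnd
        if reduction >= min(self.fr * self.rcv_buff, self.eff_snd_mss):
            self.right_edge = self.rcv_nxt + self.rcv_buff - self.rcv_user
```

With RCV.BUFF = 4096 and Eff.snd.MSS = 1024, consuming 100 octets leaves the offered window at 3996. Consuming a further 1000 octets pushes the reduction past 1024, and the full 4096 is offered again.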
-
- 4.2.3.4 When to Send Data
-
- A TCP MUST include a SWS avoidance algorithm in the sender.
-
- A TCP SHOULD implement the Nagle Algorithm [TCP:9] to
- coalesce short segments. However, there MUST be a way for
- an application to disable the Nagle algorithm on an
- individual connection. In all cases, sending data is also
- subject to the limitation imposed by the Slow Start
- algorithm (Section 4.2.2.15).
-
- DISCUSSION:
- The Nagle algorithm is generally as follows:
-
- If there is unacknowledged data (i.e., SND.NXT >
- SND.UNA), then the sending TCP buffers all user
- data (regardless of the PSH bit), until the
- outstanding data has been acknowledged or until
- the TCP can send a full-sized segment (Eff.snd.MSS
- bytes; see Section 4.2.2.6).
-
- Some applications (e.g., real-time display window
- updates) require that the Nagle algorithm be turned
- off, so small data segments can be streamed out at the
- maximum rate.
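
In the BSD sockets API, the per-connection switch required above is conventionally the TCP_NODELAY socket option. The sketch below uses Python's socket module, which exposes that option. It illustrates one existing interface; RFC 1122 itself does not name it.

```python
import socket


def open_unbuffered_tcp_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A nonzero TCP_NODELAY turns Nagle off for this connection only.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```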
-
- IMPLEMENTATION:
- The sender's SWS avoidance algorithm is more difficult
- than the receiver's, because the sender does not know
- (directly) the receiver's total buffer space RCV.BUFF.
- An approach which has been found to work well is for
- the sender to calculate Max(SND.WND), the maximum send
- window it has seen so far on the connection, and to use
- this value as an estimate of RCV.BUFF. Unfortunately,
- this can only be an estimate; the receiver may at any
- time reduce the size of RCV.BUFF. To avoid a resulting
- deadlock, it is necessary to have a timeout to force
- transmission of data, overriding the SWS avoidance
- algorithm. In practice, this timeout should seldom
- occur.
-
- The "useable window" [TCP:5] is:
-
- U = SND.UNA + SND.WND - SND.NXT
-
- i.e., the offered window less the amount of data sent
- but not acknowledged. If D is the amount of data
- queued in the sending TCP but not yet sent, then the
- following set of rules is recommended.
-
- Send data:
-
- (1) if a maximum-sized segment can be sent, i.e., if:
-
- min(D,U) >= Eff.snd.MSS;
-
-
- (2) or if the data is pushed and all queued data can
- be sent now, i.e., if:
-
- [SND.NXT = SND.UNA and] PUSHED and D <= U
-
- (the bracketed condition is imposed by the Nagle
- algorithm);
-
- (3) or if at least a fraction Fs of the maximum window
- can be sent, i.e., if:
-
- [SND.NXT = SND.UNA and]
-
- min(D,U) >= Fs * Max(SND.WND);
-
-
- (4) or if data is PUSHed and the override timeout
- occurs.
-
- Here Fs is a fraction whose recommended value is 1/2.
- The override timeout should be in the range 0.1 - 1.0
- seconds. It may be convenient to combine this timer
- with the timer used to probe zero windows (Section
- 4.2.2.17).
-
- Finally, note that the SWS avoidance algorithm just
- specified is to be used instead of the sender-side
- algorithm contained in [TCP:5].
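
Rules (1) through (4) can be written as a single predicate. A sender would evaluate it whenever data is queued, an ACK arrives, or the override timer fires. This is a sketch under several assumptions:

- sequence numbers are plain integers without 32-bit wraparound;
- the caller tracks Max(SND.WND) and the override timer;
- turning the Nagle algorithm off is modelled by dropping the bracketed conditions.

The function name is hypothetical.

```python
def should_send(*, d: int, snd_una: int, snd_nxt: int, snd_wnd: int,
                max_snd_wnd: int, eff_snd_mss: int, pushed: bool,
                override_timeout: bool, nagle: bool = True,
                fs: float = 0.5) -> bool:
    """Apply sender rules (1)-(4); d is the queued but unsent data (D)."""
    if d == 0:
        return False
    u = snd_una + snd_wnd - snd_nxt        # useable window U
    # Bracketed condition [SND.NXT = SND.UNA], imposed only by Nagle.
    idle = (snd_nxt == snd_una) or not nagle
    if min(d, u) >= eff_snd_mss:                  # (1) full-sized segment
        return True
    if pushed and idle and d <= u:                # (2) pushed, all fits
        return True
    if idle and min(d, u) >= fs * max_snd_wnd:    # (3) Fs of max window
        return True
    if pushed and override_timeout:              # (4) override timer fired
        return True
    return False
```

For instance, one pushed octet with 10 octets still unacknowledged is held back while Nagle is on. It goes out at once if Nagle is disabled or when the override timer fires.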
-
- 4.2.3.5 TCP Connection Failures
-
- Excessive retransmission of the same segment by TCP
- indicates some failure of the remote host or the Internet
- path. This failure may be of short or long duration. The
- following procedure MUST be used to handle excessive
- retransmissions of data segments [IP:11]:
-
- (a) There are two thresholds R1 and R2 measuring the amount
- of retransmission that has occurred for the same
- segment. R1 and R2 might be measured in time units or
- as a count of retransmissions.
-
- (b) When the number of transmissions of the same segment
- reaches or exceeds threshold R1, pass negative advice
- (see Section 3.3.1.4) to the IP layer, to trigger
- dead-gateway diagnosis.
-
- (c) When the number of transmissions of the same segment
- reaches a threshold R2 greater than R1, close the
- connection.
-
- (d) An application MUST be able to set the value for R2 for
- a particular connection. For example, an interactive
- application might set R2 to "infinity," giving the user
- control over when to disconnect.
-
- (e) TCP SHOULD inform the application of the delivery
- problem (unless such information has been disabled by
- the application; see Section 4.2.4.1), when R1 is
- reached and before R2. This will allow a remote login
- (User Telnet) application program to inform the user,
- for example.
-
- The value of R1 SHOULD correspond to at least 3
- retransmissions, at the current RTO. The value of R2 SHOULD
- correspond to at least 100 seconds.
-
- An attempt to open a TCP connection could fail with
- excessive retransmissions of the SYN segment or by receipt
- of a RST segment or an ICMP Port Unreachable. SYN
- retransmissions MUST be handled in the general way just
- described for data retransmissions, including notification
- of the application layer.
-
- However, the values of R1 and R2 may be different for SYN
- and data segments. In particular, R2 for a SYN segment MUST
- be set large enough to provide retransmission of the segment
- for at least 3 minutes. The application can close the
- connection (i.e., give up on the open attempt) sooner, of
- course.
-
- DISCUSSION:
- Some Internet paths have significant setup times, and
- the number of such paths is likely to increase in the
- future.
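
One way to organize the excessive-retransmission procedure above is sketched below. The text allows either unit for each threshold. This sketch assumes R1 is a count of retransmissions and R2 an elapsed time. Its defaults are the minimum values given above: 3 retransmissions for R1, and 100 seconds for data or 3 minutes for a SYN for R2. The action names are hypothetical hooks for the negative advice to IP and the ERROR_REPORT upcall.

```python
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    RETRANSMIT = auto()         # keep trying silently
    ADVISE_AND_REPORT = auto()  # negative advice to IP, report to the ALP
    CLOSE = auto()              # give up on the connection or open attempt


class RetransmitThresholds:
    """R1 counted in retransmissions, R2 measured in elapsed seconds."""

    def __init__(self, is_syn: bool, r1: int = 3,
                 r2: Optional[float] = None) -> None:
        self.r1 = r1
        # Defaults mirror the minimums in the text: 100 s for data and
        # 3 minutes for a SYN. An application may pass float("inf").
        self.r2 = r2 if r2 is not None else (180.0 if is_syn else 100.0)
        self.retransmissions = 0

    def on_timeout(self, elapsed: float) -> Action:
        """Called on each RTO expiry; elapsed is time since the first send."""
        self.retransmissions += 1
        if elapsed >= self.r2:
            return Action.CLOSE
        if self.retransmissions >= self.r1:
            return Action.ADVISE_AND_REPORT
        return Action.RETRANSMIT
```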
-
- 4.2.3.6 TCP Keep-Alives
-
- Implementors MAY include "keep-alives" in their TCP
- implementations, although this practice is not universally
- accepted. If keep-alives are included, the application MUST
- be able to turn them on or off for each TCP connection, and
- they MUST default to off.
-
- Keep-alive packets MUST only be sent when no data or
- acknowledgement packets have been received for the
- connection within an interval. This interval MUST be
- configurable and MUST default to no less than two hours.
-
- It is extremely important to remember that ACK segments that
- contain no data are not reliably transmitted by TCP.
- Consequently, if a keep-alive mechanism is implemented it
- MUST NOT interpret failure to respond to any specific probe
- as a dead connection.
-
- An implementation SHOULD send a keep-alive segment with no
- data; however, it MAY be configurable to send a keep-alive
- segment containing one garbage octet, for compatibility with
- erroneous TCP implementations.
-
- DISCUSSION:
- A "keep-alive" mechanism periodically probes the other
- end of a connection when the connection is otherwise
- idle, even when there is no data to be sent. The TCP
- specification does not include a keep-alive mechanism
- because it could: (1) cause perfectly good connections
- to break during transient Internet failures; (2)
- consume unnecessary bandwidth ("if no one is using the
- connection, who cares if it is still good?"); and (3)
- cost money for an Internet path that charges for
- packets.
-
- Some TCP implementations, however, have included a
- keep-alive mechanism. To confirm that an idle
- connection is still active, these implementations send
- a probe segment designed to elicit a response from the
- peer TCP. Such a segment generally contains SEG.SEQ =
- SND.NXT-1 and may or may not contain one garbage octet
- of data. Note that on a quiet connection SND.NXT =
- RCV.NXT, so that this SEG.SEQ will be outside the
- window. Therefore, the probe causes the receiver to
- return an acknowledgment segment, confirming that the
- connection is still live. If the peer has dropped the
- connection due to a network partition or a crash, it
- will respond with a RST instead of an acknowledgment
- segment.
-
- Unfortunately, some misbehaved TCP implementations fail
- to respond to a segment with SEG.SEQ = SND.NXT-1 unless
- the segment contains data. Alternatively, an
- implementation could determine whether a peer responded
- correctly to keep-alive packets with no garbage data
- octet.
-
- A TCP keep-alive mechanism should only be invoked in
- server applications that might otherwise hang
- indefinitely and consume resources unnecessarily if a
- client crashes or aborts a connection during a network
- failure.
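
With the BSD sockets API, the per-connection switch and the idle interval map onto socket options. The sketch below uses Python's socket module. SO_KEEPALIVE is widely available. TCP_KEEPIDLE and TCP_KEEPCNT are Linux names assumed here, and the code only sets them where Python exposes them. The probe count of 9 is an assumed value (the usual Linux default), chosen so that a single lost ACK does not end the connection.

```python
import socket

TWO_HOURS = 2 * 60 * 60   # minimum default idle interval required by the text


def enable_keepalive(s: socket.socket, idle_seconds: int = TWO_HOURS,
                     probes: int = 9) -> None:
    """Turn keep-alives on for one connection (they default to off)."""
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle time before the first probe; the option name is platform-specific.
    if hasattr(socket, "TCP_KEEPIDLE"):
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    # Declare the peer dead only after several unanswered probes.
    if hasattr(socket, "TCP_KEEPCNT"):
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```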
-
- 4.2.3.7 TCP Multihoming
-
- If an application on a multihomed host does not specify the
- local IP address when actively opening a TCP connection,
- then the TCP MUST ask the IP layer to select a local IP
- address before sending the (first) SYN. See the function
- GET_SRCADDR() in Section 3.4.
-
- At all other times, a previous segment has either been sent
- or received on this connection, and TCP MUST use the same
- local address that was used in those previous segments.
-
- 4.2.3.8 IP Options
-
- When received options are passed up to TCP from the IP
- layer, TCP MUST ignore options that it does not understand.
-
- A TCP MAY support the Time Stamp and Record Route options.
-
- An application MUST be able to specify a source route when
- it actively opens a TCP connection, and this MUST take
- precedence over a source route received in a datagram.
-
- When a TCP connection is OPENed passively and a packet
- arrives with a completed IP Source Route option (containing
- a return route), TCP MUST save the return route and use it
- for all segments sent on this connection. If a different
- source route arrives in a later segment, the later
- definition SHOULD override the earlier one.
-
- 4.2.3.9 ICMP Messages
-
- TCP MUST act on an ICMP error message passed up from the IP
- layer, directing it to the connection that created the
- error. The necessary demultiplexing information can be
- found in the IP header contained within the ICMP message.
-
- o Source Quench
-
- TCP MUST react to a Source Quench by slowing
- transmission on the connection. The RECOMMENDED
- procedure is for a Source Quench to trigger a "slow
- start," as if a retransmission timeout had occurred.
-
- o Destination Unreachable -- codes 0, 1, 5
-
- Since these Unreachable messages indicate soft error
- conditions, TCP MUST NOT abort the connection, and it
- SHOULD make the information available to the
- application.
-
- DISCUSSION:
- TCP could report the soft error condition directly
- to the application layer with an upcall to the
- ERROR_REPORT routine, or it could merely note the
- message and report it to the application only when
- and if the TCP connection times out.
-
- o Destination Unreachable -- codes 2-4
-
- These are hard error conditions, so TCP SHOULD abort
- the connection.
-
- o Time Exceeded -- codes 0, 1
-
- This should be handled the same way as Destination
- Unreachable codes 0, 1, 5 (see above).
-
- o Parameter Problem
-
- This should be handled the same way as Destination
- Unreachable codes 0, 1, 5 (see above).
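
The ICMP dispatch rules above can be condensed into a lookup. The type numbers and code meanings come from the ICMP specification (RFC-792). The action names are hypothetical. Messages this section does not mention are returned as NOT_COVERED rather than guessed at.

```python
from enum import Enum

# ICMP type numbers from RFC-792.
DEST_UNREACHABLE = 3
SOURCE_QUENCH = 4
TIME_EXCEEDED = 11
PARAMETER_PROBLEM = 12


class IcmpAction(Enum):
    SLOW_START = "enter slow start"
    SOFT_ERROR = "keep connection, make error available to application"
    ABORT = "abort connection"
    NOT_COVERED = "outside this sketch"


def classify_icmp(icmp_type: int, code: int) -> IcmpAction:
    """Map an ICMP error, already demultiplexed to a connection, to an action."""
    if icmp_type == SOURCE_QUENCH:
        return IcmpAction.SLOW_START
    if icmp_type == DEST_UNREACHABLE:
        if code in (0, 1, 5):   # net, host, source route failed: soft
            return IcmpAction.SOFT_ERROR
        if code in (2, 3, 4):   # protocol, port, fragmentation needed: hard
            return IcmpAction.ABORT
    if icmp_type == TIME_EXCEEDED and code in (0, 1):
        return IcmpAction.SOFT_ERROR
    if icmp_type == PARAMETER_PROBLEM:
        return IcmpAction.SOFT_ERROR
    return IcmpAction.NOT_COVERED
```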
-
-
- 4.2.3.10 Remote Address Validation
-
- A TCP implementation MUST reject as an error a local OPEN
- call for an invalid remote IP address (e.g., a broadcast or
- multicast address).
-
- An incoming SYN with an invalid source address must be
- ignored either by TCP or by the IP layer (see Section
- 3.2.1.3).
-
- A TCP implementation MUST silently discard an incoming SYN
- segment that is addressed to a broadcast or multicast
- address.
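
A check for invalid remote addresses can be sketched with Python's ipaddress module. Several details here are assumptions:

- the unspecified address 0.0.0.0 is treated as invalid alongside broadcast and multicast;
- subnet-directed broadcasts are recognised only when the caller supplies its local networks;
- the function name is hypothetical.

The same predicate could be applied to the destination of an incoming SYN to decide on silent discard.

```python
import ipaddress
from typing import Iterable

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_valid_remote(addr: str,
                    local_nets: Iterable[ipaddress.IPv4Network] = ()) -> bool:
    """Return False for remote addresses an active OPEN must reject."""
    ip = ipaddress.IPv4Address(addr)
    if ip.is_multicast or ip == LIMITED_BROADCAST or ip.is_unspecified:
        return False
    # Directed broadcasts can only be recognised with subnet knowledge.
    return all(ip != net.broadcast_address for net in local_nets)
```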
-
- 4.2.3.11 TCP Traffic Patterns
-
- IMPLEMENTATION:
- The TCP protocol specification [TCP:1] gives the
- implementor much freedom in designing the algorithms
- that control the message flow over the connection --
- packetizing, managing the window, sending
- acknowledgments, etc. These design decisions are
- difficult because a TCP must adapt to a wide range of
- traffic patterns. Experience has shown that a TCP
- implementor needs to verify the design on two extreme
- traffic patterns:
-
- o Single-character Segments
-
- Even if the sender is using the Nagle Algorithm,
- when a TCP connection carries remote login traffic
- across a low-delay LAN the receiver will generally
- get a stream of single-character segments. If
- remote terminal echo mode is in effect, the
- receiver's system will generally echo each
- character as it is received.
-
- o Bulk Transfer
-
- When TCP is used for bulk transfer, the data
- stream should be made up (almost) entirely of
- segments of the size of the effective MSS.
- Although TCP uses a sequence number space with
- byte (octet) granularity, in bulk-transfer mode
- its operation should be as if TCP used a sequence
- space that counted only segments.
-
- Experience has furthermore shown that a single TCP can
- effectively and efficiently handle these two extremes.
-
- The most important tool for verifying a new TCP
- implementation is a packet trace program. There is a
- large volume of experience showing the importance of
- tracing a variety of traffic patterns with other TCP
- implementations and studying the results carefully.
-
-
- 4.2.3.12 Efficiency
-
- IMPLEMENTATION:
- Extensive experience has led to the following
- suggestions for efficient implementation of TCP:
-
- (a) Don't Copy Data
-
- In bulk data transfer, the primary CPU-intensive
- tasks are copying data from one place to another
- and checksumming the data. It is vital to
- minimize the number of copies of TCP data. Since
- the ultimate speed limitation may be fetching data
- across the memory bus, it may be useful to combine
- the copy with checksumming, doing both with a
- single memory fetch.
-
- (b) Hand-Craft the Checksum Routine
-
- A good TCP checksumming routine is typically two
- to five times faster than a simple and direct
- implementation of the definition. Great care and
- clever coding are often required and advisable to
- make the checksumming code "blazing fast". See
- [TCP:10].
-
- (c) Code for the Common Case
-
- TCP protocol processing can be complicated, but
- for most segments there are only a few simple
- decisions to be made. Per-segment processing will
- be greatly speeded up by coding the main line to
- minimize the number of decisions in the most
- common case.
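
For reference, the definition that (b) warns against implementing naively is the 16-bit ones'-complement sum. The Python sketch below is a plain reference version rather than a fast one. It does show one common speed-up: accumulate into a wide integer and fold the carries once at the end, instead of after every addition. It does not build the TCP pseudo-header, which a real TCP checksum must also cover, and it does not combine the sum with a copy as (a) suggests.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum over data (no pseudo-header)."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"           # pad an odd final octet with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:                         # fold deferred carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For example, the octets 00 01 f2 03 f4 f5 f6 f7 give 0x220d. Appending those two checksum octets and summing again gives 0, which is how a receiver can verify the checksum.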
-
-
- 4.2.4 TCP/APPLICATION LAYER INTERFACE
-
- 4.2.4.1 Asynchronous Reports
-
- There MUST be a mechanism for reporting soft TCP error
- conditions to the application. Generically, we assume this
- takes the form of an application-supplied ERROR_REPORT
- routine that may be upcalled [INTRO:7] asynchronously from
- the transport layer:
-
- ERROR_REPORT(local connection name, reason, subreason)
-
- The precise encoding of the reason and subreason parameters
- is not specified here. However, the conditions that are
- reported asynchronously to the application MUST include:
-
- * ICMP error message arrived (see 4.2.3.9)
-
- * Excessive retransmissions (see 4.2.3.5)
-
- * Urgent pointer advance (see 4.2.2.4).
-
- However, an application program that does not want to
- receive such ERROR_REPORT calls SHOULD be able to
- effectively disable these calls.
-
- DISCUSSION:
- These error reports generally reflect soft errors that
- can be ignored without harm by many applications. It
- has been suggested that these error report calls should
- default to "disabled," but this is not required.
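
The upcall interface above can be sketched with an application-supplied callback. Everything in this sketch is hypothetical, because the text deliberately leaves the encoding of reason and subreason open:

- the Reason enumeration simply names the three mandatory conditions;
- passing no callback models an application that has disabled the reports.

```python
from enum import Enum, auto
from typing import Callable, Optional


class Reason(Enum):
    ICMP_ERROR = auto()                 # see 4.2.3.9
    EXCESSIVE_RETRANSMISSIONS = auto()  # see 4.2.3.5
    URGENT_POINTER_ADVANCE = auto()     # see 4.2.2.4


# ERROR_REPORT(local connection name, reason, subreason)
ErrorReport = Callable[[str, Reason, Optional[int]], None]


class Connection:
    """Hypothetical connection object holding the application's upcall."""

    def __init__(self, name: str,
                 error_report: Optional[ErrorReport] = None) -> None:
        self.name = name
        self.error_report = error_report  # None means reports are disabled

    def report(self, reason: Reason, subreason: Optional[int] = None) -> None:
        """Called asynchronously from the transport layer on a soft error."""
        if self.error_report is not None:
            self.error_report(self.name, reason, subreason)
```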
-
- 4.2.4.2 Type-of-Service
-
- The application layer MUST be able to specify the Type-of-
- Service (TOS) for segments that are sent on a connection.
- It is not required, but the application SHOULD be able to
- change the TOS during the connection lifetime. TCP SHOULD
- pass the current TOS value without change to the IP layer,
- when it sends segments on the connection.
-
- The TOS will be specified independently in each direction on
- the connection, so that the receiver application will
- specify the TOS used for ACK segments.
-
- TCP MAY pass the most recently received TOS up to the
- application.
-
- DISCUSSION:
- Some applications (e.g., SMTP) change the nature of
- their communication during the lifetime of a
- connection, and therefore would like to change the TOS
- specification.
-
- Note also that the OPEN call specified in RFC-793
- includes a parameter ("options") in which the caller
- can specify IP options such as source route, record
- route, or timestamp.
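
In the BSD sockets API, the per-connection TOS is usually set with the IP_TOS socket option. It can be set again at any point in the connection's lifetime, which suits an application like the SMTP case above. The sketch uses Python's socket module. The two constants are the RFC-791 "low delay" and "high throughput" TOS bits. Which value an application picks at each phase is its own policy. IP_TOS support, and how a kernel treats the low-order bits, vary by platform.

```python
import socket

# RFC-791 Type-of-Service bits (delay and throughput).
IPTOS_LOWDELAY = 0x10
IPTOS_THROUGHPUT = 0x08


def set_tos(s: socket.socket, tos: int) -> None:
    """Set the TOS used for segments subsequently sent on this connection."""
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
```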
-
- 4.2.4.3 Flush Call
-
- Some TCP implementations have included a FLUSH call, which
- will empty the TCP send queue of any data for which the user
- has issued SEND calls but which is still to the right of the
- current send window. That is, it flushes as much queued
- send data as possible without losing sequence number
- synchronization. This is useful for implementing the "abort
- output" function of Telnet.
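
A FLUSH call of this kind can be sketched as truncating the send queue at the right edge of the send window. The class and field names are hypothetical, and sequence-number wraparound is ignored. The sketch also refuses to discard any octet already transmitted (up to SND.NXT), since that is what preserves sequence-number synchronization if the window has shrunk.

```python
from dataclasses import dataclass, field


@dataclass
class SendQueue:
    """Queued send data, the first octet having sequence number SND.UNA."""
    snd_una: int
    snd_nxt: int
    snd_wnd: int
    data: bytearray = field(default_factory=bytearray)

    def flush(self) -> int:
        """Drop queued octets right of SND.UNA + SND.WND; return how many."""
        # Never drop data that has already been sent (below SND.NXT).
        keep = max(self.snd_nxt - self.snd_una, self.snd_wnd, 0)
        dropped = max(0, len(self.data) - keep)
        del self.data[keep:]
        return dropped
```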
-
- 4.2.4.4 Multihoming
-
- The user interface outlined in sections 2.7 and 3.8 of RFC-
- 793 needs to be extended for multihoming. The OPEN call
- MUST have an optional parameter:
-
- OPEN( ... [local IP address,] ... )
-
- to allow the specification of the local IP address.
-
- DISCUSSION:
- Some TCP-based applications need to specify the local
- IP address to be used to open a particular connection;
- FTP is an example.
-
- IMPLEMENTATION:
- A passive OPEN call with a specified "local IP address"
- parameter will await an incoming connection request to
- that address. If the parameter is unspecified, a
- passive OPEN will await an incoming connection request
- to any local IP address, and then bind the local IP
- address of the connection to the particular address
- that is used.
-
- For an active OPEN call, a specified "local IP address"
- parameter will be used for opening the connection. If
- the parameter is unspecified, the networking software
- will choose an appropriate local IP address (see
- Section 3.3.4.2) for the connection.
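
In the BSD sockets API, the optional local-address parameter corresponds to calling bind() before listen() or connect(). The wildcard address 0.0.0.0 plays the role of the unspecified parameter. The sketch below uses Python's socket module, and the function names are hypothetical. A connection accepted on a wildcard listener reports, through getsockname(), the particular local address that was actually used.

```python
import socket
from typing import Optional, Tuple


def passive_open(local_ip: str = "0.0.0.0", port: int = 0) -> socket.socket:
    """Listen on one local address, or on all of them for "0.0.0.0"."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((local_ip, port))
    srv.listen()
    return srv


def active_open(remote: Tuple[str, int],
                local_ip: Optional[str] = None) -> socket.socket:
    """Connect, optionally pinning the local IP address first."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if local_ip is not None:
        s.bind((local_ip, 0))  # port 0: let the stack choose the local port
    s.connect(remote)          # if unbound, the stack chooses the address
    return s
```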
-
- 4.2.5 TCP REQUIREMENT SUMMARY
-
- | | | | |S| |
- | | | | |H| |F
- | | | | |O|M|o
- | | |S| |U|U|o
- | | |H| |L|S|t
- | |M|O| |D|T|n
- | |U|U|M| | |o
- | |S|L|A|N|N|t
- | |T|D|Y|O|O|t
-FEATURE |SECTION | | | |T|T|e
--------------------------------------------------|--------|-|-|-|-|-|--
- | | | | | | |
-Push flag | | | | | | |
- Aggregate or queue un-pushed data |4.2.2.2 | | |x| | |
- Sender collapse successive PSH flags |4.2.2.2 | |x| | | |
- SEND call can specify PUSH |4.2.2.2 | | |x| | |
- If cannot: sender buffer indefinitely |4.2.2.2 | | | | |x|
- If cannot: PSH last segment |4.2.2.2 |x| | | | |
- Notify receiving ALP of PSH |4.2.2.2 | | |x| | |1
- Send max size segment when possible |4.2.2.2 | |x| | | |
- | | | | | | |
-Window | | | | | | |
- Treat as unsigned number |4.2.2.3 |x| | | | |
- Handle as 32-bit number |4.2.2.3 | |x| | | |
- Shrink window from right |4.2.2.16| | | |x| |
- Robust against shrinking window |4.2.2.16|x| | | | |
- Receiver's window closed indefinitely |4.2.2.17| | |x| | |
- Sender probe zero window |4.2.2.17|x| | | | |
- First probe after RTO |4.2.2.17| |x| | | |
- Exponential backoff |4.2.2.17| |x| | | |
- Allow window stay zero indefinitely |4.2.2.17|x| | | | |
- Sender timeout OK conn with zero wind |4.2.2.17| | | | |x|
- | | | | | | |
-Urgent Data | | | | | | |
- Pointer points to last octet |4.2.2.4 |x| | | | |
- Arbitrary length urgent data sequence |4.2.2.4 |x| | | | |
- Inform ALP asynchronously of urgent data |4.2.2.4 |x| | | | |1
- ALP can learn if/how much urgent data Q'd |4.2.2.4 |x| | | | |1
- | | | | | | |
-TCP Options | | | | | | |
- Receive TCP option in any segment |4.2.2.5 |x| | | | |
- Ignore unsupported options |4.2.2.5 |x| | | | |
- Cope with illegal option length |4.2.2.5 |x| | | | |
- Implement sending & receiving MSS option |4.2.2.6 |x| | | | |
- Send MSS option unless 536 |4.2.2.6 | |x| | | |
- Send MSS option always |4.2.2.6 | | |x| | |
- Send-MSS default is 536 |4.2.2.6 |x| | | | |
- Calculate effective send seg size |4.2.2.6 |x| | | | |
- | | | | | | |
-TCP Checksums | | | | | | |
- Sender compute checksum |4.2.2.7 |x| | | | |
- Receiver check checksum |4.2.2.7 |x| | | | |
- | | | | | | |
-Use clock-driven ISN selection |4.2.2.9 |x| | | | |
- | | | | | | |
-Opening Connections | | | | | | |
- Support simultaneous open attempts |4.2.2.10|x| | | | |
- SYN-RCVD remembers last state |4.2.2.11|x| | | | |
- Passive Open call interfere with others |4.2.2.18| | | | |x|
- Function: simultan. LISTENs for same port |4.2.2.18|x| | | | |
- Ask IP for src address for SYN if necc. |4.2.3.7 |x| | | | |
- Otherwise, use local addr of conn. |4.2.3.7 |x| | | | |
- OPEN to broadcast/multicast IP Address |4.2.3.10| | | | |x|
- Silently discard seg to bcast/mcast addr |4.2.3.10|x| | | | |
- | | | | | | |
-Closing Connections | | | | | | |
- RST can contain data |4.2.2.12| |x| | | |
- Inform application of aborted conn |4.2.2.13|x| | | | |
- Half-duplex close connections |4.2.2.13| | |x| | |
- Send RST to indicate data lost |4.2.2.13| |x| | | |
- In TIME-WAIT state for 2xMSL seconds |4.2.2.13|x| | | | |
- Accept SYN from TIME-WAIT state |4.2.2.13| | |x| | |
- | | | | | | |
-Retransmissions | | | | | | |
- Jacobson Slow Start algorithm |4.2.2.15|x| | | | |
- Jacobson Congestion-Avoidance algorithm |4.2.2.15|x| | | | |
- Retransmit with same IP ident |4.2.2.15| | |x| | |
- Karn's algorithm |4.2.3.1 |x| | | | |
- Jacobson's RTO estimation alg. |4.2.3.1 |x| | | | |
- Exponential backoff |4.2.3.1 |x| | | | |
- SYN RTO calc same as data |4.2.3.1 | |x| | | |
- Recommended initial values and bounds |4.2.3.1 | |x| | | |
- | | | | | | |
-Generating ACK's: | | | | | | |
- Queue out-of-order segments |4.2.2.20| |x| | | |
- Process all Q'd before send ACK |4.2.2.20|x| | | | |
- Send ACK for out-of-order segment |4.2.2.21| | |x| | |
- Delayed ACK's |4.2.3.2 | |x| | | |
- Delay < 0.5 seconds |4.2.3.2 |x| | | | |
- Every 2nd full-sized segment ACK'd |4.2.3.2 |x| | | | |
- Receiver SWS-Avoidance Algorithm |4.2.3.3 |x| | | | |
- | | | | | | |
-Sending data | | | | | | |
- Configurable TTL |4.2.2.19|x| | | | |
- Sender SWS-Avoidance Algorithm |4.2.3.4 |x| | | | |
- Nagle algorithm |4.2.3.4 | |x| | | |
- Application can disable Nagle algorithm |4.2.3.4 |x| | | | |
- | | | | | | |
-Connection Failures: | | | | | | |
- Negative advice to IP on R1 retxs |4.2.3.5 |x| | | | |
- Close connection on R2 retxs |4.2.3.5 |x| | | | |
- ALP can set R2 |4.2.3.5 |x| | | | |1
- Inform ALP of R1<=retxs<R2 |4.2.3.5 | |x| | | |1
- Recommended values for R1, R2 |4.2.3.5 | |x| | | |
- Same mechanism for SYNs |4.2.3.5 |x| | | | |
- R2 at least 3 minutes for SYN |4.2.3.5 |x| | | | |
- | | | | | | |
-Send Keep-alive Packets: |4.2.3.6 | | |x| | |
- - Application can request |4.2.3.6 |x| | | | |
- - Default is "off" |4.2.3.6 |x| | | | |
- - Only send if idle for interval |4.2.3.6 |x| | | | |
- - Interval configurable |4.2.3.6 |x| | | | |
- - Default at least 2 hrs. |4.2.3.6 |x| | | | |
- - Tolerant of lost ACK's |4.2.3.6 |x| | | | |
- | | | | | | |
-IP Options | | | | | | |
- Ignore options TCP doesn't understand |4.2.3.8 |x| | | | |
- Time Stamp support |4.2.3.8 | | |x| | |
- Record Route support |4.2.3.8 | | |x| | |
- Source Route: | | | | | | |
- ALP can specify |4.2.3.8 |x| | | | |1
- Overrides src rt in datagram |4.2.3.8 |x| | | | |
- Build return route from src rt |4.2.3.8 |x| | | | |
- Later src route overrides |4.2.3.8 | |x| | | |
- | | | | | | |
-Receiving ICMP Messages from IP |4.2.3.9 |x| | | | |
- Dest. Unreach (0,1,5) => inform ALP |4.2.3.9 | |x| | | |
- Dest. Unreach (0,1,5) => abort conn |4.2.3.9 | | | | |x|
- Dest. Unreach (2-4) => abort conn |4.2.3.9 | |x| | | |
- Source Quench => slow start |4.2.3.9 | |x| | | |
- Time Exceeded => tell ALP, don't abort |4.2.3.9 | |x| | | |
- Param Problem => tell ALP, don't abort |4.2.3.9 | |x| | | |
- | | | | | | |
-Address Validation | | | | | | |
- Reject OPEN call to invalid IP address |4.2.3.10|x| | | | |
- Reject SYN from invalid IP address |4.2.3.10|x| | | | |
- Silently discard SYN to bcast/mcast addr |4.2.3.10|x| | | | |
- | | | | | | |
-TCP/ALP Interface Services | | | | | | |
- Error Report mechanism |4.2.4.1 |x| | | | |
- ALP can disable Error Report Routine |4.2.4.1 | |x| | | |
- ALP can specify TOS for sending |4.2.4.2 |x| | | | |
- Passed unchanged to IP |4.2.4.2 | |x| | | |
- ALP can change TOS during connection |4.2.4.2 | |x| | | |
- Pass received TOS up to ALP |4.2.4.2 | | |x| | |
- FLUSH call |4.2.4.3 | | |x| | |
- Optional local IP addr parm. in OPEN |4.2.4.4 |x| | | | |
--------------------------------------------------|--------|-|-|-|-|-|--
--------------------------------------------------|--------|-|-|-|-|-|--
-
-FOOTNOTES:
-
-(1) "ALP" means Application-Layer program.
-
-5. REFERENCES
-
-INTRODUCTORY REFERENCES
-
-
-[INTRO:1] "Requirements for Internet Hosts -- Application and Support,"
- IETF Host Requirements Working Group, R. Braden, Ed., RFC-1123,
- October 1989.
-
-[INTRO:2] "Requirements for Internet Gateways," R. Braden and J.
- Postel, RFC-1009, June 1987.
-
-[INTRO:3] "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006,
- (three volumes), SRI International, December 1985.
-
-[INTRO:4] "Official Internet Protocols," J. Reynolds and J. Postel,
- RFC-1011, May 1987.
-
- This document is republished periodically with new RFC numbers; the
- latest version must be used.
-
-[INTRO:5] "Protocol Document Order Information," O. Jacobsen and J.
- Postel, RFC-980, March 1986.
-
-[INTRO:6] "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010, May
- 1987.
-
- This document is republished periodically with new RFC numbers; the
- latest version must be used.
-
-[INTRO:7] "Modularity and Efficiency in Protocol Implementations," D.
- Clark, RFC-817, July 1982.
-
-[INTRO:8] "The Structuring of Systems Using Upcalls," D. Clark, 10th ACM
- SOSP, Orcas Island, Washington, December 1985.
-
-
-Secondary References:
-
-
-[INTRO:9] "A Protocol for Packet Network Intercommunication," V. Cerf
- and R. Kahn, IEEE Transactions on Communication, May 1974.
-
-[INTRO:10] "The ARPA Internet Protocol," J. Postel, C. Sunshine, and D.
- Cohen, Computer Networks, Vol. 5, No. 4, July 1981.
-
-[INTRO:11] "The DARPA Internet Protocol Suite," B. Leiner, J. Postel,
- R. Cole and D. Mills, Proceedings INFOCOM 85, IEEE, Washington DC,
- March 1985. Also in: IEEE Communications Magazine, March 1985.
- Also available as ISI-RS-85-153.
-
-[INTRO:12] "Final Text of DIS8473, Protocol for Providing the
- Connectionless Mode Network Service," ANSI, published as RFC-994,
- March 1986.
-
-[INTRO:13] "End System to Intermediate System Routing Exchange
- Protocol," ANSI X3S3.3, published as RFC-995, April 1986.
-
-
-LINK LAYER REFERENCES
-
-
-[LINK:1] "Trailer Encapsulations," S. Leffler and M. Karels, RFC-893,
- April 1984.
-
-[LINK:2] "An Ethernet Address Resolution Protocol," D. Plummer, RFC-826,
- November 1982.
-
-[LINK:3] "A Standard for the Transmission of IP Datagrams over Ethernet
- Networks," C. Hornig, RFC-894, April 1984.
-
-[LINK:4] "A Standard for the Transmission of IP Datagrams over IEEE 802
- Networks," J. Postel and J. Reynolds, RFC-1042, February 1988.
-
- This RFC contains a great deal of information of importance to
- Internet implementers planning to use IEEE 802 networks.
-
-
-IP LAYER REFERENCES
-
-
-[IP:1] "Internet Protocol (IP)," J. Postel, RFC-791, September 1981.
-
-[IP:2] "Internet Control Message Protocol (ICMP)," J. Postel, RFC-792,
- September 1981.
-
-[IP:3] "Internet Standard Subnetting Procedure," J. Mogul and J. Postel,
- RFC-950, August 1985.
-
-[IP:4] "Host Extensions for IP Multicasting," S. Deering, RFC-1112,
- August 1989.
-
-[IP:5] "Military Standard Internet Protocol," MIL-STD-1777, Department
- of Defense, August 1983.
-
- This specification, as amended by RFC-963, is intended to describe
- the Internet Protocol but has some serious omissions (e.g., the
- mandatory subnet extension [IP:3] and the optional multicasting
- extension [IP:4]). It is also out of date. If there is a
- conflict, RFC-791, RFC-792, and RFC-950 must be taken as
- authoritative, while the present document is authoritative over
- all.
-
-[IP:6] "Some Problems with the Specification of the Military Standard
- Internet Protocol," D. Sidhu, RFC-963, November 1985.
-
-[IP:7] "The TCP Maximum Segment Size and Related Topics," J. Postel,
- RFC-879, November 1983.
-
- Discusses and clarifies the relationship between the TCP Maximum
- Segment Size option and the IP datagram size.
-
-[IP:8] "Internet Protocol Security Options," B. Schofield, RFC-1108,
- October 1989.
-
-[IP:9] "Fragmentation Considered Harmful," C. Kent and J. Mogul, ACM
- SIGCOMM-87, August 1987. Published in ACM Computer Communication
- Review, Vol. 17, No. 5.
-
- This useful paper discusses the problems created by Internet
- fragmentation and presents alternative solutions.
-
-[IP:10] "IP Datagram Reassembly Algorithms," D. Clark, RFC-815, July
- 1982.
-
- This and the following paper should be read by every implementor.
-
-[IP:11] "Fault Isolation and Recovery," D. Clark, RFC-816, July 1982.
-
-SECONDARY IP REFERENCES:
-
-
-[IP:12] "Broadcasting Internet Datagrams in the Presence of Subnets," J.
- Mogul, RFC-922, October 1984.
-
- This RFC first described directed broadcast addresses. However,
- the bulk of the RFC is concerned with gateways, not hosts.
-
-[IP:13] "Name, Addresses, Ports, and Routes," D. Clark, RFC-814, July
- 1982.
-
-[IP:14] "Something a Host Could Do with Source Quench: The Source Quench
- Introduced Delay (SQUID)," W. Prue and J. Postel, RFC-1016, July
- 1987.
-
-
-UDP REFERENCES:
-
-
-[UDP:1] "User Datagram Protocol," J. Postel, RFC-768, August 1980.
-
-
-TCP REFERENCES:
-
-
-[TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September
- 1981.
-
-
-[TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of
- Defense, August 1984.
-
- This specification, as amended by RFC-964, is intended to describe
- the same protocol as RFC-793 [TCP:1]. If there is a conflict,
- RFC-793 takes precedence, and the present document is authoritative
- over both.
-
-
-[TCP:3] "Some Problems with the Specification of the Military Standard
- Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964,
- November 1985.
-
-
-[TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel,
- RFC-879, November 1983.
-
-
-[TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813,
- July 1982.
-
-
-[TCP:6] "Round Trip Time Estimation," P. Karn and C. Partridge, ACM
- SIGCOMM-87, August 1987.
-
-
-[TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM SIGCOMM-88,
- August 1988.
-
-
-SECONDARY TCP REFERENCES:
-
-
-[TCP:8] "Modularity and Efficiency in Protocol Implementation," D.
- Clark, RFC-817, July 1982.
-
-
-[TCP:9] "Congestion Control in IP/TCP," J. Nagle, RFC-896, January 1984.
-
-
-[TCP:10] "Computing the Internet Checksum," R. Braden, D. Borman, and C.
- Partridge, RFC-1071, September 1988.
-
-
-[TCP:11] "TCP Extensions for Long-Delay Paths," V. Jacobson and R.
- Braden, RFC-1072, October 1988.
-
-
-Security Considerations
-
- There are many security issues in the communication layers of host
- software, but a full discussion is beyond the scope of this RFC.
-
- The Internet architecture generally provides little protection
- against spoofing of IP source addresses, so any security mechanism
- that is based upon verifying the IP source address of a datagram
- should be treated with suspicion. However, in restricted
- environments some source-address checking may be possible. For
- example, there might be a secure LAN whose gateway to the rest of the
- Internet discards any incoming datagram whose source address claims
- to be on the LAN. In this case, a host on the LAN could use the
- source address to distinguish local from remote sources. This
- problem is complicated by source routing, and some have suggested
- that source-routed datagram forwarding by hosts (see Section 3.3.5)
- should be outlawed for security reasons.
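The restricted-environment check described above can be sketched as two small predicates. This is a minimal illustration, not a mechanism defined by this RFC. It assumes a single hypothetical LAN prefix (192.0.2.0/24, a documentation range) and a gateway that knows whether a datagram arrived on its external interface. The function names `gateway_accepts` and `host_source_is_local` are hypothetical, and source routing is not modeled.

```python
import ipaddress

# Hypothetical LAN prefix, chosen for illustration only.
LAN = ipaddress.ip_network("192.0.2.0/24")


def gateway_accepts(src: str, arrived_from_outside: bool, lan=LAN) -> bool:
    """Gateway-side check: a datagram arriving from the rest of the
    Internet must not carry a source address inside the LAN prefix."""
    addr = ipaddress.ip_address(src)
    if arrived_from_outside and addr in lan:
        return False  # source address spoofs the LAN: discard
    return True


def host_source_is_local(src: str, lan=LAN) -> bool:
    """Host-side local vs. remote test. This is only meaningful if the
    gateway enforces gateway_accepts(); otherwise src may be forged.
    Source-routed datagrams are not considered here."""
    return ipaddress.ip_address(src) in lan
```

The host-side test inherits all of its trustworthiness from the gateway filter. On a LAN without such a filter, `host_source_is_local` returning True says nothing about where the datagram really came from.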
-
- Security-related issues are mentioned in sections concerning the IP
- Security option (Section 3.2.1.8), the ICMP Parameter Problem message
- (Section 3.2.2.5), IP options in UDP datagrams (Section 4.1.3.2), and
- reserved TCP ports (Section 4.2.2.1).
-
-Author's Address
-
- Robert Braden
- USC/Information Sciences Institute
- 4676 Admiralty Way
- Marina del Rey, CA 90292-6695
-
- Phone: (213) 822 1511
-
- EMail: Braden@ISI.EDU
-