Add IPv6 related docs.

Reviewed by: phantom
author: shin <shin@FreeBSD.org> 2000-02-26 19:44:12 +0000
committer: shin <shin@FreeBSD.org> 2000-02-26 19:44:12 +0000
commit: 9b8b2074975c685e02cccd4939abd1200f82d010 (patch)
tree: 8de4cc9d2457db0a62abeac5dee6fb8e363579d7 /share/doc
parent: 60843485ad65175ee23a3e9a01666a9cf467602f (diff)
download: FreeBSD-src-9b8b2074975c685e02cccd4939abd1200f82d010.zip
FreeBSD-src-9b8b2074975c685e02cccd4939abd1200f82d010.tar.gz
1 files changed, 998 insertions, 0 deletions
diff --git a/share/doc/IPv6/IMPLEMENTATION b/share/doc/IPv6/IMPLEMENTATION
new file mode 100644
index 0000000..5909a72
--- /dev/null
+++ b/share/doc/IPv6/IMPLEMENTATION
@@ -0,0 +1,998 @@
+			Implementation Note
+
+			KAME Project
+			http://www.kame.net/
+			$FreeBSD$
+
+1. IPv6
+
+1.1 Conformance
+
+The KAME kit conforms, or tries to conform, to the latest set of IPv6
+specifications.  For future reference we list some of the relevant documents
+below (NOTE: this is not a complete list - this is too hard to maintain...).
+For details please refer to specific chapter in the document, RFCs, manpages
+come with KAME, or comments in the source code.
+
+Conformance tests have been performed on the KAME STABLE kit
+at TAHI project.  Results can be viewed at http://www.tahi.org/report/KAME/.
+We also attended Univ. of New Hampshire IOL tests (http://www.iol.unh.edu/)
+in the past, with our past snapshots.
+
+RFC1639: FTP Operation Over Big Address Records (FOOBAR)
+    * RFC2428 is preferred over RFC1639.  ftp clients will first try RFC2428,
+      then RFC1639 if failed.
+RFC1886: DNS Extensions to support IPv6
+RFC1933: Transition Mechanisms for IPv6 Hosts and Routers
+    * IPv4 compatible address is not supported.
+    * automatic tunneling (4.3) is not supported.
+    * "gif" interface implements IPv[46]-over-IPv[46] tunnel in a generic way,
+      and it covers "configured tunnel" described in the spec.
+      See 1.5 in this document for details.
+RFC1981: Path MTU Discovery for IPv6
+RFC2080: RIPng for IPv6
+    * KAME-supplied route6d, bgpd and hroute6d support this.
+RFC2283: Multiprotocol Extensions for BGP-4
+    * so-called "BGP4+".
+    * KAME-supplied bgpd supports this.
+RFC2292: Advanced Sockets API for IPv6
+    * For supported library functions/kernel APIs, see sys/netinet6/ADVAPI.
+RFC2362: Protocol Independent Multicast-Sparse Mode (PIM-SM)
+    * RFC2362 defines packet formats for PIM-SM.  draft-ietf-pim-ipv6-01.txt
+      is written based on this.
+RFC2373: IPv6 Addressing Architecture
+    * KAME supports node required addresses, and conforms to the scope
+      requirement.
+RFC2374: An IPv6 Aggregatable Global Unicast Address Format
+    * KAME supports 64-bit length of Interface ID.
+RFC2375: IPv6 Multicast Address Assignments
+    * Userland applications use the well-known addresses assigned in the RFC.
+RFC2428: FTP Extensions for IPv6 and NATs
+    * RFC2428 is preferred over RFC1639.  ftp clients will first try RFC2428,
+      then RFC1639 if failed.
+RFC2460: IPv6 specification
+RFC2461: Neighbor discovery for IPv6
+    * See 1.2 in this document for details.
+RFC2462: IPv6 Stateless Address Autoconfiguration
+    * See 1.4 in this document for details.
+RFC2463: ICMPv6 for IPv6 specification
+    * See 1.8 in this document for details.
+RFC2464: Transmission of IPv6 Packets over Ethernet Networks
+RFC2465: MIB for IPv6: Textual Conventions and General Group
+    * Necessary statistics are gathered by the kernel.  Actual IPv6 MIB
+      support is provided as patchkit for ucd-snmp.
+RFC2466: MIB for IPv6: ICMPv6 group
+    * Necessary statistics are gathered by the kernel.  Actual IPv6 MIB
+      support is provided as patchkit for ucd-snmp.
+RFC2467: Transmission of IPv6 Packets over FDDI Networks
+RFC2472: IPv6 over PPP
+RFC2492: IPv6 over ATM Networks
+    * only PVC is supported.
+RFC2497: Transmission of IPv6 packet over ARCnet Networks
+RFC2545: Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing
+RFC2553: Basic Socket Interface Extensions for IPv6
+    * IPv4 mapped address (3.7) and special behavior of IPv6 wildcard bind
+      socket (3.8) are supported.
+      see 1.12 in this document for details.
+RFC2675: IPv6 Jumbograms
+    * See 1.7 in this document for details.
+RFC2710: Multicast Listener Discovery for IPv6
+RFC2711: IPv6 router alert option
+draft-ietf-ipngwg-router-renum-08: Router renumbering for IPv6
+draft-ietf-ipngwg-icmp-namelookups-02: IPv6 Name Lookups Through ICMP
+draft-ietf-ipngwg-icmp-name-lookups-03: IPv6 Name Lookups Through ICMP
+draft-ietf-pim-ipv6-01.txt: PIM for IPv6
+    * pim6dd implements dense mode.  pim6sd implements sparse mode.
+draft-ietf-dhc-dhcpv6-14.txt: DHCPv6
+draft-ietf-dhc-v6exts-11.txt: Extensions for DHCPv6
+    * kame/dhcp6 has test implementation, which will not be compiled in
+      default compilation.
+draft-itojun-ipv6-tcp-to-anycast-00:
+	Disconnecting TCP connection toward IPv6 anycast address
+draft-yamamoto-wideipv6-comm-model-00
+    * See 1.6 in this document for details.
+draft-ietf-ipngwg-scopedaddr-format-00.txt:
+	An Extension of Format for IPv6 Scoped Addresses
+
+1.2 Neighbor Discovery
+
+Neighbor Discovery is fairly stable.  Currently Address Resolution,
+Duplicated Address Detection, and Neighbor Unreachability Detection
+are supported.  In the near future we will be adding Proxy Neighbor
+Advertisement support in the kernel and Unsolicited Neighbor Advertisement
+transmission command as admin tool.
+
+If DAD fails, the address will be marked "duplicated" and message will be
+generated to syslog (and usually to console).  The "duplicated" mark
+can be checked with ifconfig.  It is administrators' responsibility to check
+for and recover from DAD failures.
+The behavior should be improved in the near future.
+
+Some of the network driver loops multicast packets back to itself,
+even if instructed not to do so (especially in promiscuous mode).
+In such cases DAD may fail, because DAD engine sees inbound NS packet
+(actually from the node itself) and considers it as a sign of duplicate.
+You may want to look at #if condition marked "heuristics" in
+sys/netinet6/nd6_nbr.c:nd6_dad_timer() as workaround (note that the code
+fragment in "heuristics" section is not spec conformant).
+
+Neighbor Discovery specification (RFC2461) does not talk about neighbor
+cache handling in the following cases:
+(1) when there was no neighbor cache entry, node received unsolicited
+    RS/NS/NA/redirect packet without link-layer address
+(2) neighbor cache handling on medium without link-layer address
+    (we need a neighbor cache entry for IsRouter bit)
+For (1), we implemented workaround based on discussions on IETF ipngwg mailing
+list.  For more details, see the comments in the source code and email
+thread started from (IPng 7155), dated Feb 6 1999.
+
+IPv6 on-link determination rule (RFC2461) is quite different from assumptions
+in BSD network code.  At this moment, KAME does not implement on-link
+determination rule when default router list is empty (RFC2461, section 5.2,
+last sentence in 2nd paragraph - note that the spec misuse the word "host"
+and "node" in several places in the section).
+
+To avoid possible DoS attacks and infinite loops, KAME stack will accept
+only 10 options on ND packet.  Therefore, if you have 20 prefix options
+attached to RA, only the first 10 prefixes will be recognized.
+If this troubles you, please contact KAME team and/or modify
+nd6_maxndopt in sys/netinet6/nd6.c.  If there are high demands we may
+provide sysctl knob for the variable.
+
+1.3 Scope Index
+
+IPv6 uses scoped addresses.  Therefore, it is very important to
+specify scope index (interface index for link-local address, or
+site index for site-local address) with an IPv6 address.  Without
+scope index, scoped IPv6 address is ambiguous to the kernel, and
+kernel will not be able to determine the outbound interface for a
+packet.
+
+Ordinary userland applications should use advanced API (RFC2292) to
+specify scope index, or interface index.  For similar purpose,
+sin6_scope_id member in sockaddr_in6 structure is defined in RFC2553.
+However, the semantics for sin6_scope_id is rather vague.  If you
+care about portability of your application, we suggest you to use
+advanced API rather than sin6_scope_id.
+
+In the kernel, an interface index for link-local scoped address is
+embedded into 2nd 16bit-word (3rd and 4th byte) in IPv6 address.
+For example, you may see something like:
+	fe80:1::200:f8ff:fe01:6317
+in the routing table and interface address structure (struct
+in6_ifaddr).  The address above is a link-local unicast address
+which belongs to a network interface whose interface identifier is 1.
+The embedded index enables us to identify IPv6 link local
+addresses over multiple interfaces effectively and with only a
+little code change.
+Routing daemons and configuration programs, like route6d and
+ifconfig, will need to manipulate the "embedded" scope index.
+These programs use routing sockets and ioctls (like SIOCGIFADDR_IN6)
+and the kernel API will return IPv6 addresses with 2nd 16bit-word
+filled in.  The APIs are for manipulating kernel internal structure.
+Programs that use these APIs have to be prepared about differences
+in kernels anyway.
+
+When you specify scoped address to the command line, NEVER write the
+embedded form (such as ff02:1::1 or fe80:2::fedc).  This is not supposed
+to work.  Always use standard form, like ff02::1 or fe80::fedc, with
+command line option for specifying interface (like "ping6 -I ne0 ff02::1).
+In general, if a command does not have command line option to specify
+outgoing interface, that command is not ready to accept scoped address.
+This may seem to be opposite from IPv6's premise to support "dentist office"
+situation.  We believe that specifications need some improvements for this.
+
+Some of the userland tools support extended numeric IPv6 syntax, as
+documented in draft-ietf-ipngwg-scopedaddr-format-00.txt.  You can specify
+outgoing link, by using name of the outgoing interface like "fe80::1%ne0".
+This way you will be able to specify link-local scoped address without much
+trouble.
+To use this extension in your program, you'll need to use getaddrinfo(3),
+and getnameinfo(3) with NI_WITHSCOPEID.
+The implementation currently assumes 1-to-1 relationship between a link and an
+interface, which is stronger than what specs say.
+
+1.4 Plug and Play
+
+The KAME kit implements most of the IPv6 stateless address
+autoconfiguration in the kernel.
+Neighbor Discovery functions are implemented in the kernel as a whole.
+Router Advertisement (RA) input for hosts is implemented in the
+kernel.  Router Solicitation (RS) output for endhosts, RS input
+for routers, and RA output for routers are implemented in the
+userland.
+
+1.4.1 Assignment of link-local, and special addresses
+
+IPv6 link-local address is generated from IEEE802 adddress (ethernet MAC
+address).  Each of interface is assigned an IPv6 link-local address
+automatically, when the interface becomes up (IFF_UP).  Also, direct route
+for the link-local address is added to routing table.
+
+Here is an output of netstat command:
+
+Internet6:
+Destination                   Gateway                   Flags      Netif Expire
+fe80:1::%ed0/64               link#1                    UC          ed0
+fe80:2::%ep0/64               link#2                    UC          ep0
+
+Interfaces that has no IEEE802 address (pseudo interfaces like tunnel
+interfaces, or ppp interfaces) will borrow IEEE802 address from other
+interfaces, such as ethernet interfaces, whenever possible.
+If there is no IEEE802 hardware attached, last-resort pseudorandom value,
+which is from MD5(hostname), will be used as source of link-local address.
+If it is not suitable for your usage, you will need to configure the
+link-local address manually.
+
+If an interface is not capable of handling IPv6 (such as lack of multicast
+support), link-local address will not be assigned to that interface.
+See section 2 for details.
+
+Each interface joins the solicited multicast address and the
+link-local all-nodes multicast addresses (e.g.  fe80::1:ff01:6317
+and ff02::1, respectively, on the link the interface is attached).
+In addition to a link-local address, the loopback address (::1) will be
+assigned to the loopback interface.  Also, ::1/128 and ff01::/32 are
+automatically added to routing table, and loopback interface joins
+node-local multicast group ff01::1.
+
+1.4.2 Stateless address autoconfiguration on hosts
+
+In IPv6 specification, nodes are separated into two categories:
+routers and hosts.  Routers forward packets addressed to others, hosts does
+not forward the packets.  net.inet6.ip6.forwarding defines whether this
+node is router or host (router if it is 1, host if it is 0).
+
+When a host hears Router Advertisement from the router, a host may
+autoconfigure itself by stateless address autoconfiguration.
+This behavior can be controlled by net.inet6.ip6.accept_rtadv
+(host autoconfigures itself if it is set to 1).
+By autoconfiguration, network address prefix for the receiving interface
+(usually global address prefix) is added.  Default route is also configured.
+Routers periodically generate Router Advertisement packets.  To request
+an adjacent router to generate RA packet, a host can transmit Router
+Solicitation.  To generate a RS packet at any time, use the "rtsol" command.
+"rtsold" daemon is also available.  "rtsold" generates Router Solicitation
+whenever necessary, and it works great for nomadic usage (notebooks/laptops).
+If one wishes to ignore Router Advertisements, use sysctl to set
+net.inet6.ip6.accept_rtadv to 0.
+
+To generate Router Advertisement from a router, use the "rtadvd" daemon.
+
+Note that, IPv6 specification assumes the following items, and nonconforming
+cases are left unspecified:
+- Only hosts will listen to router advertisements
+- Hosts have single network interface (except loopback)
+Therefore, this is unwise to enable net.inet6.ip6.accept_rtadv on routers,
+or multi-interface host.  A misconfigured node can behave strange
+(KAME code allows nonconforming configuration, for those who would like
+to do some experiments).
+
+To summarize the sysctl knob:
+	accept_rtadv	forwarding	role of the node
+	---		---		---
+	0		0		host (to be manually configured)
+	0		1		router
+	1		0		autoconfigured host
+					(spec assumes that host has single
+					interface only, autoconfigured host
+					with multiple interface is
+					out-of-scope)
+	1		1		invalid, or experimental
+					(out-of-scope of spec)
+
+RFC2462 has validation rule against incoming RA prefix information option,
+in 5.5.3 (e).  This is to protect hosts from malicious (or misconfigured)
+routers that advertise very short prefix lifetime.
+There was an update from Jim Bound to ipngwg mailing list (look
+for "(ipng 6712)" in the archive) and KAME implements Jim's update.
+
+See 1.2 in the document for relationship between DAD and autoconfiguration.
+
+1.4.3 DHCPv6 (not yet put into freebsd4.0)
+
+We supply a tiny DHCPv6 server/client in kame/dhcp6.  However, the
+implementation is very premature (for example, this does NOT
+implement address lease/release), and it is not in default compilation
+tree.  If you want to do some experiment, compile it on your own.
+
+DHCPv6 and autoconfiguration also needs more work.  "Managed" and "Other"
+bits in RA have no special effect to stateful autoconfiguration procedure
+in DHCPv6 client program ("Managed" bit actually prevents stateless
+autoconfiguration, but no special action will be taken for DHCPv6 client).
+
+1.5 Generic tunnel interface
+
+GIF (Generic InterFace) is a pseudo interface for configured tunnel.
+Details are described in gif(4) manpage.
+Currently
+	v6 in v6
+	v6 in v4
+	v4 in v6
+	v4 in v4
+are available.  Use "gifconfig" to assign physical (outer) source
+and destination address to gif interfaces.
+Configuration that uses same address family for inner and outer IP
+header (v4 in v4, or v6 in v6) is dangerous.  It is very easy to
+configure interfaces and routing tables to perform infinite level
+of tunneling.  Please be warned.
+
+gif can be configured to be ECN-friendly.  See 4.5 for ECN-friendliness
+of tunnels, and gif(4) manpage for how to configure.
+
+If you would like to configure an IPv4-in-IPv6 tunnel with gif interface,
+read gif(4) carefully.  You will need to remove IPv6 link-local address
+automatically assigned to the gif interface.
+
+1.6 Source Address Selection
+
+Source selection of KAME is scope oriented (there are some exceptions -
+see below).  For a given destination, a source IPv6 address is selected
+by the following rule:
+    1. If the source address is explicitly specified by the user
+       (e.g. via the advanced API), the specified address is used.
+    2. If there is an address assigned to the outgoing interface
+       (which is usually determined by looking up the routing table)
+       that has the same scope as the destination address, the address
+       is used.
+       This is the most typical case.
+    3. If there is no address that satisfies the above condition,
+       choose a global address assigned to one of the interfaces
+       on the sending node.
+    4. If there is no address that satisfies the above condition,
+       and destination address is site local scope,
+       choose a site local address assigned to one of the interfaces
+       on the sending node.
+    5. If there is no address that satisfies the above condition,
+       choose the address associated with the routing table
+       entry for the destination.
+       This is the last resort, which may cause scope violation.
+
+For instance, ::1 is selected for ff01::1, fe80:1::200:f8ff:fe01:6317
+for fe80:1::2a0:24ff:feab:839b (note that embedded interface index -
+described in 1.3 - helps us choose the right source address.  Those
+embedded indices will not be on the wire).
+If the outgoing interface has multiple address for the scope,
+a source is selected longest match basis (rule 3).  Suppose
+3ffe:501:808:1:200:f8ff:fe01:6317 and 3ffe:2001:9:124:200:f8ff:fe01:6317
+are given to the outgoing interface.  3ffe:501:808:1:200:f8ff:fe01:6317
+is chosen as the source for the destination 3ffe:501:800::1.
+
+Note that the above rule is not documented in the IPv6 spec.  It is
+considered "up to implementation" item.
+There are some cases where we do not use the above rule.  One
+example is connected TCP session, and we use the address kept in tcb
+as the source.
+Another example is source address for Neighbor Advertisement.
+Under the spec (RFC2461 7.2.2) NA's source should be the target
+address of the corresponding NS's target.  In this case we follow
+the spec rather than the above longest-match rule.
+
+For new connections (when rule 1 does not apply), deprecated addresses
+(addresses with preferred lifetime = 0) will not be chosen as source address
+if other choises are available.  If no other choices are available,
+deprecated address will be used as a last resort.  If there are multiple
+choice of deprecated addresses, the above scope rule will be used to choose
+from those deprecated addreses.  If you would like to prohibit the use
+of deprecated address for some reason, configure net.inet6.ip6.use_deprecated
+to 0.  The issue related to deprecated address is described in RFC2462 5.5.4
+(NOTE: there is some debate underway in IETF ipngwg on how to use
+"deprecated" address).
+
+1.7 Jumbo Payload
+
+KAME supports the Jumbo Payload hop-by-hop option used to send IPv6
+packets with payloads longer than 65,535 octets.  But since currently
+KAME does not support any physical interface whose MTU is more than
+65,535, such payloads can be seen only on the loopback interface(i.e.
+lo0).
+
+If you want to try jumbo payloads, you first have to reconfigure the
+kernel so that the MTU of the loopback interface is more than 65,535
+bytes; add the following to the kernel configuration file:
+	options		"LARGE_LOMTU"		#To test jumbo payload
+and recompile the new kernel.
+
+Then you can test jumbo payloads by the ping6 command with -b and -s
+options.  The -b option must be specified to enlarge the size of the
+socket buffer and the -s option specifies the length of the packet,
+which should be more than 65,535.  For example, type as follows;
+	% ping6 -b 70000 -s 68000 ::1
+
+The IPv6 specification requires that the Jumbo Payload option must not
+be used in a packet that carries a fragment header.  If this condition
+is broken, an ICMPv6 Parameter Problem message must be sent to the
+sender.  KAME kernel follows the specification, but you cannot usually
+see an ICMPv6 error caused by this requirement.
+
+If KAME kernel receives an IPv6 packet, it checks the frame length of
+the packet and compares it to the length specified in the payload
+length field of the IPv6 header or in the value of the Jumbo Payload
+option, if any.  If the former is shorter than the latter, KAME kernel
+discards the packet and increments the statistics. You can see the
+statistics as output of netstat command with `-s -p ip6' option:
+	% netstat -s -p ip6
+	ip6:
+		(snip)
+		1 with data size < data length
+
+So, KAME kernel does not send an ICMPv6 error unless the erroneous
+packet is an actual Jumbo Payload, that is, its packet size is more
+than 65,535 bytes.  As described above, KAME kernel currently does not
+support physical interface with such a huge MTU, so it rarely returns an
+ICMPv6 error.
+
+TCP/UDP over jumbogram is not supported at this moment.  This is because
+we have no medium (other than loopback) to test this.  Contact us if you
+need this.
+
+IPsec does not work on jumbograms.  This is due to some specification twists
+in supporting AH with jumbograms (AH header size influences payload length,
+and this makes it real hard to authenticate inbound packet with jumbo payload
+option as well as AH).
+
+There are fundamental issues in *BSD support for jumbograms.  We would like to
+address those, but we need more time to finalize these.  To name a few:
+- mbuf pkthdr.len field is typed as "int" in 4.4BSD, so it will not hold
+  jumbogram with len > 2G on 32bit architecture CPUs.  If we would like to
+  support jumbogram properly, the field must be expanded to hold 4G +
+  IPv6 header + link-layer header.  Therefore, it must be expanded to at least
+  int64_t (u_int32_t is NOT enough).
+- We mistakingly use "int" to hold packet length in many places.  We need
+  to convert them into larger integral type.  It needs a great care, as we may
+  experience overflow during packet length computation.
+- We mistakingly check for ip6_plen field of IPv6 header for packet payload
+  length in various places.  We should be checking mbuf pkthdr.len instead.
+  ip6_input() will perform sanity check on jumbo payload option on input,
+  and we can safely use mbuf pkthdr.len afterwards.
+- TCP code needs a careful update in bunch of places, of course.
+
+1.8 Loop prevention in header processing
+
+IPv6 specification allows arbitrary number of extension headers to
+be placed onto packets.  If we implement IPv6 packet processing
+code in the way BSD IPv4 code is implemented, kernel stack may
+overflow due to long function call chain.  KAME sys/netinet6 code
+is carefully designed to avoid kernel stack overflow.  Because of
+this, KAME sys/netinet6 code defines its own protocol switch
+structure, as "struct ip6protosw" (see netinet6/ip6protosw.h).
+There is no such update to IPv4 part (sys/netinet) for
+compatibility, but small change is added to its pr_input()
+prototype. So "struct ipprotosw" is also defined.
+Because of this, if you receive IPsec-over-IPv4 packet with massive
+number of IPsec headers, kernel stack may blow up.  IPsec-over-IPv6 is okay.
+(Off-course, for those all IPsec headers to be processed, each
+such IPsec header must pass each IPsec check. So an anonymous
+attacker won't be able to do such an attack.)
+
+1.9 ICMPv6
+
+After RFC2463 was published, IETF ipngwg has decided to disallow ICMPv6 error
+packet against ICMPv6 redirect, to prevent ICMPv6 storm on a network medium.
+KAME already implements this into the kernel.
+
+1.10 Applications
+
+For userland programming, we support IPv6 socket API as specified in
+RFC2553, RFC2292 and upcoming internet drafts.
+
+TCP/UDP over IPv6 is available and quite stable.  You can enjoy "telnet",
+"ftp", "rlogin", "rsh", "ssh", etc.  These applications are protocol
+independent.  That is, they automatically chooses IPv4 or IPv6
+according to DNS.
+
+1.11 Kernel Internals
+
+ (*) TCP/UDP part is handled differently between operating system platforms.
+     See 1.12 for details.
+
+The current KAME has escaped from the IPv4 netinet logic.  While
+ip_forward() calls ip_output(), ip6_forward() directly calls
+if_output() since routers must not divide IPv6 packets into fragments.
+
+ICMPv6 should contain the original packet as long as possible up to
+1280.  UDP6/IP6 port unreach, for instance, should contain all
+extension headers and the *unchanged* UDP6 and IP6 headers.
+So, all IP6 functions except TCP never convert network byte
+order into host byte order, to save the original packet.
+
+tcp_input(), udp6_input() and icmp6_input() can't assume that IP6
+header is preceding the transport headers due to extension
+headers.  So, in6_cksum() was implemented to handle packets whose IP6
+header and transport header is not continuous.  TCP/IP6 nor UDP6/IP6
+header structure don't exist for checksum calculation.
+
+To process IP6 header, extension headers and transport headers easily,
+KAME requires network drivers to store packets in one internal mbuf or
+one or more external mbufs.  A typical old driver prepares two
+internal mbufs for 96 - 204 bytes data, however, KAME's reference
+implementation stores it in one external mbuf.
+
+"netstat -s -p ip6" tells you whether or not your driver conforms
+KAME's requirement.  In the following example, "cce0" violates the
+requirement. (For more information, refer to Section 2.)
+
+        Mbuf statistics:
+                317 one mbuf
+                two or more mbuf::
+                        lo0 = 8
+			cce0 = 10
+                3282 one ext mbuf
+                0 two or more ext mbuf
+
+Each input function calls IP6_EXTHDR_CHECK in the beginning to check
+if the region between IP6 and its header is
+continuous.  IP6_EXTHDR_CHECK calls m_pullup() only if the mbuf has
+M_LOOP flag, that is, the packet comes from the loopback
+interface.  m_pullup() is never called for packets coming from physical
+network interfaces.
+
+Both IP and IP6 reassemble functions never call m_pullup().
+
+1.12 IPv4 mapped address and IPv6 wildcard socket
+
+RFC2553 describes IPv4 mapped address (3.7) and special behavior
+of IPv6 wildcard bind socket (3.8).  The spec allows you to:
+- Accept IPv4 connections by AF_INET6 wildcard bind socket.
+- Transmit IPv4 packet over AF_INET6 socket by using special form of
+  the address like ::ffff:10.1.1.1.
+but the spec itself is very complicated and does not specify how the
+socket layer should behave.
+Here we call the former one "listening side" and the latter one "initiating
+side", for reference purposes.
+
+Almost all KAME implementations treat tcp/udp port number space separately
+between IPv4 and IPv6.  You can perform wildcard bind on both of the adderss
+families, on the same port.
+
+The following table show the behavior of FreeBSD4x.
+
+		listening side		initiating side
+		(AF_INET6 wildcard	(connetion to ::ffff:10.1.1.1)
+		socket gets IPv4 conn.)
+		---			---
+FreeBSD4x	configurable		supported
+		default: enabled
+
+The following sections will give you more details, and how you can
+configure the behavior.
+
+Comments on listening side:
+
+It looks that RFC2553 talks too little on wildcard bind issue,
+especially on the port space issue, failure mode and relationship
+between AF_INET/INET6 wildcard bind.  There can be several separate
+interpretation for this RFC which conform to it but behaves differently.
+So, to implement portable application you should assume nothing
+about the behavior in the kernel.  Using getaddrinfo() is the safest way.
+Port number space and wildcard bind issues were discussed in detail
+on ipv6imp mailing list, in mid March 1999 and it looks that there's
+no concrete consensus (means, up to implementers).  You may want to
+check the mailing list archives.
+
+If a server application would like to accept IPv4 and IPv6 connections,
+there will be two alternatives.
+
+One is using AF_INET and AF_INET6 socket (you'll need two sockets).
+Use getaddrinfo() with AI_PASSIVE into ai_flags, and socket(2) and bind(2)
+to all the addresses returned.
+By opening multiple sockets, you can accept connections onto the socket with
+proper address family.  IPv4 connections will be accepted by AF_INET socket,
+and IPv6 connections will be accepted by AF_INET6 socket.
+
+Another way is using one AF_INET6 wildcard bind socket.
+Use getaddrinfo() with AI_PASSIVE into ai_flags and with
+AF_INET6 into ai_family, and set the 1st argument hostname to
+NULL. And socket(2) and bind(2) to the address returned.
+(should be IPv6 unspecified addr)
+You can accept either of IPv4 and IPv6 packet via this one socket.
+
+To support only IPv6 traffic on AF_INET6 wildcard binded socket portably,
+always check the peer address when a connection is made toward
+AF_INET6 listening socket.  If the address is IPv4 mapped address, you may
+want to reject the connection.  You can check the condition by using
+IN6_IS_ADDR_V4MAPPED() macro.
+To resolv this issue more easily, there is system dependent setsockopt()
+option, IPV6_BINDV6ONLY, used like below.
+	int on;
+
+	setsockopt(s, IPPROTO_IPV6, IPV6_BINDV6ONLY,
+		   (char *)&on, sizeof (on)) < 0));
+When this call succeed, then this socket only receive IPv6 packets.
+
+
+Comments on initiating side:
+
+Advise to application implementers: to implement a portable IPv6 application
+(which works on multiple IPv6 kernels), we believe that the following
+is the key to the success:
+- NEVER hardcode AF_INET nor AF_INET6.
+- Use getaddrinfo() and getnameinfo() throughout the system.
+  Never use gethostby*(), getaddrby*(), inet_*() or getipnodeby*().
+  (To update existing applications to be IPv6 aware easily,
+  sometime getipnodeby*() will be useful. But if possible, try to
+  rewrite the code to use getaddrinfo() and getnameinfo().)
+- If you would like to connect to destination, use getaddrinfo() and try
+  all the destination returned, like telnet does.
+- Some of the IPv6 stack is shipped with buggy getaddrinfo().  Ship a minimal
+  working version with your application and use that as last resort.
+
+If you would like to use AF_INET6 socket for both IPv4 and IPv6 outgoing
+connection, you will need to use getipnodebyname(). When you would like to
+update your existing appication to be IPv6 aware with minimal effort,
+this approach might be choosed. But please note that it is a temporal
+solution, because getipnodebyname() itself is not recommended as it does
+not handle scoped IPv6 addresses at all.  For IPv6 name resolution,
+getaddrinfo() is the preferred API. So you should rewrite your
+application to use getaddrinfo(), when you get the time to do it.
+
+When writing applications that make outgoing connections, story goes much
+simpler if you treat AF_INET and AF_INET6 as totally seaprate address family.
+{set,get}sockopt issue goes simpler, DNS issue will be made simpler.  We do
+not recommend you to rely upon IPv4 mapped address.
+
+1.12.1 FreeBSD4x
+
+FreeBSD4x uses shared tcp4/6 code (from sys/netinet/tcp*) and separete
+udp4/6 code.  It uses unified inpcb/in6pcb structure.
+
+The platform can be configured to support IPv4 mapped address.
+Kernel configuration is summarized as follows:
+- By default, AF_INET6 socket will grab IPv4 connections in certain condition,
+  and can initiate connection to IPv4 destination embedded in
+  IPv4 mapped IPv6 address.
+- You can disable it on entire system with sysctl like below.
+	sysctl -w net.inet6.ip6.mapped_addr=0
+
+1.12.1.1 FreeBSD4x, listening side
+
+Each socket can be configured to support special AF_INET6 wildcard bind
+(enabled by default).
+You can disable it on each socket basis with setsockopt() like below.
+	int on;
+
+	setsockopt(s, IPPROTO_IPV6, IPV6_BINDV6ONLY,
+		   (char *)&on, sizeof (on)) < 0));
+
+Wildcard AF_INET6 socket grabs IPv4 connection if and only if the following
+conditions are satisfied:
+- there's no AF_INET socket that matches the IPv4 connection
+- the AF_INET6 socket is configured to accept IPv4 traffic, i.e.
+  getsockopt(IPV6_BINDV6ONLY) returns 0.
+There's no problem with open/close ordering.
+
+1.12.1.2 FreeBSD4x, initiating side
+
+FreeBSD4x supports outgoing connetion to IPv4 mapped address
+(::ffff:10.1.1.1), if the node is configured to support IPv4 mapped address.
+
+1.13 sockaddr_storage
+
+When RFC2553 was about to be finalized, there was discusson on how struct
+sockaddr_storage members are named.  One proposal is to prepend "__" to the
+members (like "__ss_len") as they should not be touched.  The other proposal
+was that don't prepend it (like "ss_len") as we need to touch those members
+directly.  There was no clear consensus on it.
+
+As a result, RFC2553 defines struct sockaddr_storage as follows:
+	struct sockaddr_storage {
+		u_char	__ss_len;	/* address length */
+		u_char	__ss_family;	/* address family */
+		/* and bunch of padding */
+	};
+On the contrary, XNET draft defines as follows:
+	struct sockaddr_storage {
+		u_char	ss_len;		/* address length */
+		u_char	ss_family;	/* address family */
+		/* and bunch of padding */
+	};
+
+In December 1999, it was agreed that RFC2553bis should pick the latter (XNET)
+definition.
+
+KAME kit prior to December 1999 used RFC2553 definition.  KAME kit after
+December 1999 (including December) will conform to XNET definition,
+based on RFC2553bis discusson.
+
+If you look at multiple IPv6 implementations, you will be able to see
+both definitions.  As an userland programmer, the most portable way of
+dealing with it is to:
+(1) ensure ss_family and/or ss_len are available on the platform, by using
+    GNU autoconf,
+(2) have -Dss_family=__ss_family to unify all occurences (including header
+    file) into __ss_family, or
+(3) never touch __ss_family.  cast to sockaddr * and use sa_family like:
+	struct sockaddr_storage ss;
+	family = ((struct sockaddr *)&ss)->sa_family
+
+2. Network Drivers
+
+KAME requires two items to be added into the standard drivers:
+
+(1) mbuf clustering requirement. In this stable release, we changed
+    MINCLSIZE into MHLEN+1 for all the operating systems in order to make
+    all the drivers behave as we expect.
+
+(2) multicast.  If "ifmcstat" yields no multicast group for a
+    interface, that interface has to be patched.
+
+If any of the driver don't support the requirements, then the driver
+can't be used for IPv6 and/or IPsec communication.  If you find any
+problem with your card using IPv6/IPsec, then, please report it to
+freebsd-bugs@freebsd.org.
+
+(NOTE: In the past we required all pcmcia drivers to have a call to
+in6_ifattach().  We have no such requirement any more)
+
+3. Translator
+
+We categorize IPv4/IPv6 translator into 4 types.
+
+Translator A --- It is used in the early stage of transition to make
+it possible to establish a connection from an IPv6 host in an IPv6
+island to an IPv4 host in the IPv4 ocean.
+
+Translator B --- It is used in the early stage of transition to make
+it possible to establish a connection from an IPv4 host in the IPv4
+ocean to an IPv6 host in an IPv6 island.
+
+Translator C --- It is used in the late stage of transition to make it
+possible to establish a connection from an IPv4 host in an IPv4 island
+to an IPv6 host in the IPv6 ocean.
+
+Translator D --- It is used in the late stage of transition to make it
+possible to establish a connection from an IPv6 host in the IPv6 ocean
+to an IPv4 host in an IPv4 island.
+
+KAME provides an TCP relay translator for category A.  This is called
+"FAITH".  We also provide IP header translator for category A.
+(The latter is not yet put into FreeBSD4.x yet.)
+
+3.1 FAITH TCP relay translator
+
+FAITH system uses TCP relay daemon called "faithd" helped by the KAME kernel.
+FAITH will reserve an IPv6 address prefix, and relay TCP connection
+toward that prefix to IPv4 destination.
+
+For example, if the reserved IPv6 prefix is 3ffe:0501:0200:ffff::, and
+the IPv6 destination for TCP connection is 3ffe:0501:0200:ffff::163.221.202.12,
+the connection will be relayed toward IPv4 destination 163.221.202.12.
+
+	destination IPv4 node (163.221.202.12)
+	  ^
+	  | IPv4 tcp toward 163.221.202.12
+	FAITH-relay dual stack node
+	  ^
+	  | IPv6 TCP toward 3ffe:0501:0200:ffff::163.221.202.12
+	source IPv6 node
+
+faithd must be invoked on FAITH-relay dual stack node.
+
+For more details, consult src/usr.sbin/faithd/README.
+
+3.2 IPv6-to-IPv4 header translator
+
+(to be written)
+
+4. IPsec
+
+IPsec is mainly organized by three components.
+
+(1) Policy Management
+(2) Key Management
+(3) AH and ESP handling
+
+4.1 Policy Management
+
+The kernel implements experimental policy management code.  There are two way
+to manage security policy.  One is to configure per-socket policy using
+setsockopt(3).  In this cases, policy configuration is described in
+ipsec_set_policy(3).  The other is to configure kernel packet filter-based
+policy using PF_KEY interface, via setkey(8).
+
+The policy entry is not re-ordered with its
+indexes, so the order of entry when you add is very significant.
+
+4.2 Key Management
+
+The key management code implemented in this kit (sys/netkey) is a
+home-brew PFKEY v2 implementation.  This conforms to RFC2367.
+
+The home-brew IKE daemon, "racoon" is included in the kit
+(kame/kame/racoon).
+Basically you'll need to run racoon as daemon, then setup a policy
+to require keys (like ping -P 'out ipsec esp/transport//use').
+The kernel will contact racoon daemon as necessary to exchange keys.
+
+4.3 AH and ESP handling
+
+IPsec module is implemented as "hooks" to the standard IPv4/IPv6
+processing.  When sending a packet, ip{,6}_output() checks if ESP/AH
+processing is required by checking if a matching SPD (Security
+Policy Database) is found.  If ESP/AH is needed,
+{esp,ah}{4,6}_output() will be called and mbuf will be updated
+accordingly.  When a packet is received, {esp,ah}4_input() will be
+called based on protocol number, i.e. (*inetsw[proto])().
+{esp,ah}4_input() will decrypt/check authenticity of the packet,
+and strips off daisy-chained header and padding for ESP/AH.  It is
+safe to strip off the ESP/AH header on packet reception, since we
+will never use the received packet in "as is" form.
+
+By using ESP/AH, TCP4/6 effective data segment size will be affected by
+extra daisy-chained headers inserted by ESP/AH.  Our code takes care of
+the case.
+
+Basic crypto functions can be found in directory "sys/crypto".  ESP/AH
+transform are listed in {esp,ah}_core.c with wrapper functions.  If you
+wish to add some algorithm, add wrapper function in {esp,ah}_core.c, and
+add your crypto algorithm code into sys/crypto.
+
+Tunnel mode is partially supported in this release, with the following
+restrictions:
+- IPsec tunnel is not combined with GIF generic tunneling interface.
+  It needs a great care because we may create an infinite loop between
+  ip_output() and tunnelifp->if_output().  Opinion varies if it is better
+  to unify them, or not.
+- MTU and Don't Fragment bit (IPv4) considerations need more checking, but
+  basically works fine.
+- Authentication model for AH tunnel must be revisited.  We'll need to
+  improve the policy management engine, eventually.
+
+4.4 Conformance to RFCs and IDs
+
+The IPsec code in the kernel conforms (or, tries to conform) to the
+following standards:
+    "old IPsec" specification documented in rfc182[5-9].txt
+    "new IPsec" specification documented in rfc240[1-6].txt, rfc241[01].txt,
+	rfc2451.txt and draft-mcdonald-simple-ipsec-api-01.txt (draft expired,
+	but you can take from ftp://ftp.kame.net/pub/internet-drafts/).
+	(NOTE: IKE specifications, rfc241[7-9].txt are implemented in userland,
+	as "racoon" IKE daemon)
+
+Currently supported algorithms are:
+    old IPsec AH
+	null crypto checksum (no document, just for debugging)
+	keyed MD5 with 128bit crypto checksum (rfc1828.txt)
+	keyed SHA1 with 128bit crypto checksum (no document)
+	HMAC MD5 with 128bit crypto checksum (rfc2085.txt)
+	HMAC SHA1 with 128bit crypto checksum (no document)
+    old IPsec ESP
+	null encryption (no document, similar to rfc2410.txt)
+	DES-CBC mode (rfc1829.txt)
+    new IPsec AH
+	null crypto checksum (no document, just for debugging)
+	keyed MD5 with 96bit crypto checksum (no document)
+	keyed SHA1 with 96bit crypto checksum (no document)
+	HMAC MD5 with 96bit crypto checksum (rfc2403.txt
+	HMAC SHA1 with 96bit crypto checksum (rfc2404.txt)
+    new IPsec ESP
+	null encryption (rfc2410.txt)
+	DES-CBC with derived IV
+		(draft-ietf-ipsec-ciph-des-derived-01.txt, draft expired)
+	DES-CBC with explicit IV (rfc2405.txt)
+	3DES-CBC with explicit IV (rfc2451.txt)
+	BLOWFISH CBC (rfc2451.txt)
+	CAST128 CBC (rfc2451.txt)
+	RC5 CBC (rfc2451.txt)
+	each of the above can be combined with:
+	    ESP authentication with HMAC-MD5(96bit)
+	    ESP authentication with HMAC-SHA1(96bit)
+
+The following algorithms are NOT supported:
+    old IPsec AH
+	HMAC MD5 with 128bit crypto checksum + 64bit replay prevention
+		(rfc2085.txt)
+	keyed SHA1 with 160bit crypto checksum + 32bit padding (rfc1852.txt)
+
+IPsec (in kernel) and IKE (in userland as "racoon") has been tested
+at several interoperability test events, and it is known to interoperate
+with many other implementations well.  Also, KAME IPsec has quite wide
+coverage for IPsec crypto algorithms documented in RFC (we cover
+algorithms without intellectual property issues only).
+
+4.5 ECN consideration on IPsec tunnels
+
+KAME IPsec implements ECN-friendly IPsec tunnel, described in
+draft-ipsec-ecn-00.txt.
+Normal IPsec tunnel is described in RFC2401.  On encapsulation,
+IPv4 TOS field (or, IPv6 traffic class field) will be copied from inner
+IP header to outer IP header.  On decapsulation outer IP header
+will be simply dropped.  The decapsulation rule is not compatible
+with ECN, since ECN bit on the outer IP TOS/traffic class field will be
+lost.
+To make IPsec tunnel ECN-friendly, we should modify encapsulation
+and decapsulation procedure.  This is described in
+http://www.aciri.org/floyd/papers/draft-ipsec-ecn-00.txt, chapter 3.
+
+KAME IPsec tunnel implementation can give you three behaviors, by setting
+net.inet.ipsec.ecn (or net.inet6.ipsec6.ecn) to some value:
+- RFC2401: no consideration for ECN (sysctl value -1)
+- ECN forbidden (sysctl value 0)
+- ECN allowed (sysctl value 1)
+Note that the behavior is configurable in per-node manner, not per-SA manner
+(draft-ipsec-ecn-00 wants per-SA configuration, but it looks too much for me).
+
+The behavior is summarized as follows (see source code for more detail):
+
+		encapsulate			decapsulate
+		---				---
+RFC2401		copy all TOS bits		drop TOS bits on outer
+		from inner to outer.		(use inner TOS bits as is)
+
+ECN forbidden	copy TOS bits except for ECN	drop TOS bits on outer
+		(masked with 0xfc) from inner	(use inner TOS bits as is)
+		to outer.  set ECN bits to 0.
+
+ECN allowed	copy TOS bits except for ECN	use inner TOS bits with some
+		CE (masked with 0xfe) from	change.  if outer ECN CE bit
+		inner to outer.			is 1, enable ECN CE bit on
+		set ECN CE bit to 0.		the inner.
+
+General strategy for configuration is as follows:
+- if both IPsec tunnel endpoint are capable of ECN-friendly behavior,
+  you'd better configure both end to "ECN allowed" (sysctl value 1).
+- if the other end is very strict about TOS bit, use "RFC2401"
+  (sysctl value -1).
+- in other cases, use "ECN forbidden" (sysctl value 0).
+The default behavior is "ECN forbidden" (sysctl value 0).
+
+For more information, please refer to:
+	http://www.aciri.org/floyd/papers/draft-ipsec-ecn-00.txt
+	RFC2481 (Explicit Congestion Notification)
+	KAME sys/netinet6/{ah,esp}_input.c
+
+(Thanks goes to Kenjiro Cho <kjc@csl.sony.co.jp> for detailed analysis)
+
+4.6 Interoperability
+
+Here are (some of) platforms we have tested IPsec/IKE interoperability
+in the past.  Note that both ends (KAME and others) may have modified their
+implementation, so use the following list just for reference purposes.
+	Altiga, Ashley-laurent (vpcom.com), Data Fellows (F-Secure), Ericsson
+	ACC, FreeS/WAN, HITACHI, IBM AIX, IIJ, Intel, Microsoft WinNT, NIST
+	(linux IPsec + plutoplus), Netscreen, OpenBSD, RedCreek, Routerware,
+	SSH, Secure Computing, Soliton, Toshiba, VPNet, Yamaha RT100i
+
+5. IPComp
+(not yet put into FreeBSD4.x, due to inflate related changes in 4.x.)
+
+IPComp stands for IP payload compression protocol.  This is aimed for
+payload compression, not the header compression like PPP VJ compression.
+This may be useful when you are using slow serial link (say, cell phone)
+with powerful CPU (well, recent notebook PCs are really powerful...).
+The protocol design of IPComp is very similar to IPsec.
+
+KAME implements the following specifications:
+- RFC2393: IP Payload Compression Protocol (IPComp)
+- RFC2394: IP Payload Compression Using DEFLATE
+
+Here are some points to be noted:
+- IPComp is treated as part of IPsec protocol suite, and SPI and
+  CPI space is unified.  Spec says that there's no relationship
+  between two so they are assumed to be separate.
+- IPComp association (IPCA) is kept in SAD.
+- It is possible to use well-known CPI (CPI=2 for DEFLATE for example),
+  for outbound/inbound packet, but for indexing purposes one element from
+  SPI/CPI space will be occupied anyway.
+- pfkey is modified to support IPComp.  However, there's no official
+  SA type number assignment yet.  Portability with other IPComp
+  stack is questionable (anyway, who else implement IPComp on UN*X?).
+- Spec says that IPComp output processing must be performed before IPsec
+  output processing, to achieve better compression ratio and "stir" data
+  stream before encryption.  However, with manual SPD setting, you are able to
+  violate the ordering requirement (KAME code is too generic, maybe).
+- Though MTU can be significantly decreased by using IPComp, no special
+  consideration is made about path MTU (spec talks nothing about MTU
+  consideration).  IPComp is designed for serial links, not ethernet-like
+  medium, it seems.
+- You can change compression ratio on outbound packet, by changing
+  deflate_policy in sys/netinet6/ipcomp_core.c.  You can also change history
+  buffer size by changing deflate_window in the same source code.
+  (should it be sysctl accessible?  or per-SAD configurable?)
+- Tunnel mode IPComp is not working right.  KAME box can generate tunnelled
+  IPComp packet, however, cannot accept tunneled IPComp packet.
+
+6. ALTQ
+ (not yet put into FreeBSD4.x)
+
+						 <end of IMPLEMENTATION>
author	shin <shin@FreeBSD.org>	2000-02-26 19:44:12 +0000
committer	shin <shin@FreeBSD.org>	2000-02-26 19:44:12 +0000
commit	9b8b2074975c685e02cccd4939abd1200f82d010 (patch)
tree	8de4cc9d2457db0a62abeac5dee6fb8e363579d7 /share/doc
parent	60843485ad65175ee23a3e9a01666a9cf467602f (diff)
download	FreeBSD-src-9b8b2074975c685e02cccd4939abd1200f82d010.zip FreeBSD-src-9b8b2074975c685e02cccd4939abd1200f82d010.tar.gz