author | bms <bms@FreeBSD.org> | 2009-04-29 19:19:13 +0000 |
---|---|---|
committer | bms <bms@FreeBSD.org> | 2009-04-29 19:19:13 +0000 |
commit | 32a71137f08bc028578417de36a241d7e6011f58 (patch) | |
tree | 51d9a006ee48417962ce45f044b7e5603910fe13 | |
parent | 51a4d1c4a3d279a3638c0b40f351aa93f965c7df (diff) | |
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:

 * Connect netinet6/in6_mcast.c to build.
   The legacy KAME KPIs are mostly preserved.
 * Eliminate now dead code from ip6_output.c.
   Don't do mbuf bingo, we are not going to do RFC 2292 style
   CMSG tricks for multicast options as they are not required
   by any current IPv6 normative reference.
 * Refactor transports (UDP, raw_ip6) to do own mcast filtering.
   SCTP, TCP unaffected by this change.
 * Add ip6_msource, in6_msource structs to in6_var.h.
 * Hookup mld_ifinfo state to in6_ifextra, allocate from
   domifattach path.
 * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
   Kernel consumers which need this should use in6m_lookup().
 * Refactor IPv6 socket group memberships to use a vector (like IPv4).
 * Update ifmcstat(8) for IPv6 SSM.
 * Add witness lock order for IN6_MULTI_LOCK.
 * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
 * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
 * Update carp(4) for new IPv6 SSM KPIs.
 * Virtualize ip6_mrouter socket.
   Changes mostly localized to IPv6 MROUTING.
 * Don't do a local group lookup in MROUTING.
 * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
 * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
 * Bump __FreeBSD_version to 800084.
 * Update UPDATING.

NOTE WELL:
 * This code hasn't been tested against real MLDv2 queriers
   (yet), although the on-wire protocol has been verified in Wireshark.
 * There are a few unresolved issues in the socket layer APIs to
   do with scope ID propagation.
 * There is a LOR present in ip6_output()'s use of
   in6_setscope() which needs to be resolved. See comments in mld6.c.
   This is believed to be benign and can't be avoided for the moment
   without re-introducing an indirect netisr.

This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
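
For readers who want to exercise the new IPv6 SSM path from userland, the following is a minimal sketch and not code from this commit. It assumes the protocol-independent RFC 3678 delta API, that is, setsockopt(2) with MCAST_JOIN_SOURCE_GROUP and struct group_source_req from <netinet/in.h>. The helper names fill_ssm_req() and join_ssm() are hypothetical, and the addresses a caller passes in are the caller's own choice.

```c
#define _DEFAULT_SOURCE 1	/* glibc: expose struct group_source_req */

#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: fill an RFC 3678 (S,G) join request for IPv6.
 * Returns 0 on success, -1 if an address does not parse or the group
 * is not a multicast address.
 */
int
fill_ssm_req(struct group_source_req *gsr, unsigned int ifindex,
    const char *group, const char *source)
{
	struct sockaddr_in6 *g, *s;

	assert(gsr != NULL);
	memset(gsr, 0, sizeof(*gsr));
	g = (struct sockaddr_in6 *)&gsr->gsr_group;
	s = (struct sockaddr_in6 *)&gsr->gsr_source;

	/* Name the interface explicitly rather than passing index 0. */
	gsr->gsr_interface = ifindex;
	g->sin6_family = AF_INET6;
	s->sin6_family = AF_INET6;
#if defined(__FreeBSD__)
	/* BSD sockaddrs carry an explicit length field. */
	g->sin6_len = sizeof(*g);
	s->sin6_len = sizeof(*s);
#endif
	if (inet_pton(AF_INET6, group, &g->sin6_addr) != 1 ||
	    inet_pton(AF_INET6, source, &s->sin6_addr) != 1)
		return (-1);
	if (!IN6_IS_ADDR_MULTICAST(&g->sin6_addr))
		return (-1);
	return (0);
}

/*
 * Hypothetical helper: issue the join on an AF_INET6 datagram or raw
 * socket. Returns the setsockopt(2) result.
 */
int
join_ssm(int sock, const struct group_source_req *gsr)
{

	return (setsockopt(sock, IPPROTO_IPV6, MCAST_JOIN_SOURCE_GROUP,
	    gsr, sizeof(*gsr)));
}
```

A caller would typically obtain the interface index with if_nametoindex(3), call fill_ssm_req(), and then pass the result to join_ssm() on a socket created with socket(AF_INET6, SOCK_DGRAM, 0). Whether the join succeeds depends on the host's interfaces and kernel support.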
-rw-r--r-- | UPDATING | 86 |
-rw-r--r-- | sys/conf/files | 1 |
-rw-r--r-- | sys/kern/subr_witness.c | 12 |
-rw-r--r-- | sys/netinet/ip_carp.c | 68 |
-rw-r--r-- | sys/netinet6/icmp6.c | 64 |
-rw-r--r-- | sys/netinet6/in6.c | 43 |
-rw-r--r-- | sys/netinet6/in6_ifattach.c | 35 |
-rw-r--r-- | sys/netinet6/in6_mcast.c | 165 |
-rw-r--r-- | sys/netinet6/in6_pcb.c | 30 |
-rw-r--r-- | sys/netinet6/in6_proto.c | 1 |
-rw-r--r-- | sys/netinet6/in6_var.h | 293 |
-rw-r--r-- | sys/netinet6/ip6_input.c | 29 |
-rw-r--r-- | sys/netinet6/ip6_mroute.c | 34 |
-rw-r--r-- | sys/netinet6/ip6_output.c | 453 |
-rw-r--r-- | sys/netinet6/ip6_var.h | 24 |
-rw-r--r-- | sys/netinet6/mld6.c | 3339 |
-rw-r--r-- | sys/netinet6/mld6_var.h | 147 |
-rw-r--r-- | sys/netinet6/raw_ip6.c | 61 |
-rw-r--r-- | sys/netinet6/udp6_usrreq.c | 40 |
-rw-r--r-- | sys/netinet6/vinet6.h | 16 |
-rw-r--r-- | sys/sys/param.h | 2 |
-rw-r--r-- | usr.sbin/ifmcstat/ifmcstat.c | 180 |
22 files changed, 4005 insertions, 1118 deletions
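
The ip_carp.c and in6_pcb.c hunks in this diff replace the KAME linked list of socket memberships with a vector that is compacted in place when an interface goes away. The block below is a userland sketch of that compaction loop only, under stated assumptions: demo_group, demo_moptions and demo_purge_if() are hypothetical stand-ins for the kernel's in6_multi, ip6_moptions and in6_pcbpurgeif0(), and the point where the kernel would call in6_mc_leave() is marked with a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct in6_multi: only the joined interface. */
struct demo_group {
	int ifindex;
};

/* Hypothetical stand-in for the vector part of struct ip6_moptions. */
struct demo_moptions {
	struct demo_group **membership;		/* vector of memberships */
	unsigned short num_memberships;		/* slots currently in use */
};

/*
 * Drop every membership joined through 'ifindex' and slide the
 * survivors down so the vector stays dense. Returns the number of
 * memberships dropped.
 */
unsigned short
demo_purge_if(struct demo_moptions *mo, int ifindex)
{
	unsigned short i, gap;

	assert(mo != NULL);
	gap = 0;
	for (i = 0; i < mo->num_memberships; i++) {
		if (mo->membership[i]->ifindex == ifindex) {
			/* The kernel would call in6_mc_leave() here. */
			gap++;
		} else if (gap != 0) {
			/* Close the hole left by earlier removals. */
			mo->membership[i - gap] = mo->membership[i];
		}
	}
	mo->num_memberships -= gap;
	return (gap);
}
```

The single pass keeps the relative order of the surviving entries and touches each slot at most once, which is why no second cleanup loop is needed after the interface detaches.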
@@ -22,6 +22,92 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 8.x IS SLOW:
 	to maximize performance. (To disable malloc debugging, run
 	ln -s aj /etc/malloc.conf.)
 
+20090429:
+	MLDv2 and Source-Specific Multicast (SSM) have been merged
+	to the IPv6 stack. VIMAGE hooks are in but not yet used.
+	The implementation of SSM within FreeBSD's IPv6 stack closely
+	follows the IPv4 implementation.
+
+	For kernel developers:
+
+	* The most important changes are that the ip6_output() and
+	  ip6_input() paths no longer take the IN6_MULTI_LOCK,
+	  and this lock has been downgraded to a non-recursive mutex.
+
+	* As with the changes to the IPv4 stack to support SSM, filtering
+	  of inbound multicast traffic must now be performed by transport
+	  protocols within the IPv6 stack. This does not apply to TCP and
+	  SCTP, however, it does apply to UDP in IPv6 and raw IPv6.
+
+	* The KPIs used by IPv6 multicast are similar to those used by
+	  the IPv4 stack, with the following differences:
+	   * im6o_mc_filter() is analogous to imo_multicast_filter().
+	   * The legacy KAME entry points in6_joingroup and in6_leavegroup()
+	     are shimmed to in6_mc_join() and in6_mc_leave() respectively.
+	   * IN6_LOOKUP_MULTI() has been deprecated and removed.
+	   * IPv6 relies on MLD for the DAD mechanism. KAME's internal KPIs
+	     for MLDv1 have an additional 'timer' argument which is used to
+	     jitter the initial membership report for the solicited-node
+	     multicast membership on-link.
+	   * This is not strictly needed for MLDv2, which already jitters
+	     its report transmissions. However, the 'timer' argument is
+	     preserved in case MLDv1 is active on the interface.
+
+	* The KAME linked-list based IPv6 membership implementation has
+	  been refactored to use a vector similar to that used by the IPv4
+	  stack.
+	  Code which maintains a list of its own multicast memberships
+	  internally, e.g. carp, has been updated to reflect the new
+	  semantics.
+
+	* There is a known Lock Order Reversal (LOR) due to in6_setscope()
+	  acquiring the IF_AFDATA_LOCK and being called within ip6_output().
+	  Whilst MLDv2 tries to avoid this otherwise benign LOR, it is an
+	  implementation constraint which needs to be addressed in HEAD.
+
+	For application developers:
+
+	* The changes are broadly similar to those made for the IPv4
+	  stack.
+
+	* The use of IPv4 and IPv6 multicast socket options on the same
+	  socket, using mapped addresses, HAS NOT been tested or supported.
+
+	* There are a number of issues with the implementation of various
+	  IPv6 multicast APIs which need to be resolved in the API surface
+	  before the implementation is fully compatible with KAME userland
+	  use, and these are mostly to do with interface index treatment.
+
+	* The literature available discusses the use of either the delta / ASM
+	  API with setsockopt(2)/getsockopt(2), or the full-state / ASM API
+	  using setsourcefilter(3)/getsourcefilter(3). For more information
+	  please refer to RFC 3768, 'Socket Interface Extensions for
+	  Multicast Source Filters'.
+
+	* Applications which use the published RFC 3678 APIs should be fine.
+
+	For systems administrators:
+
+	* The mtest(8) utility has been refactored to support IPv6, in
+	  addition to IPv4. Interface addresses are no longer accepted
+	  as arguments, their names must be used instead. The utility
+	  will map the interface name to its first IPv4 address as
+	  returned by getifaddrs(3).
+
+	* The ifmcstat(8) utility has also been updated to print the MLDv2
+	  endpoint state and source filter lists via sysctl(3).
+
+	* The net.inet6.ip6.mcast.loop sysctl may be tuned to 0 to disable
+	  loopback of IPv6 multicast datagrams by default; it defaults to 1
+	  to preserve the existing behaviour. Disabling multicast loopback is
+	  recommended for optimal system performance.
+
+	* The IPv6 MROUTING code has been changed to examine this sysctl
+	  instead of attempting to perform a group lookup before looping
+	  back forwarded datagrams.
+
+	Bump __FreeBSD_version to 800084.
+
 20090422:
 	Implement low-level Bluetooth HCI API.
 	Bump __FreeBSD_version to 800083.
diff --git a/sys/conf/files b/sys/conf/files
index 68282c7..97f0cd6 100644
--- a/sys/conf/files
+++ b/sys/conf/files
@@ -2381,6 +2381,7 @@ netinet6/in6.c optional inet6
 netinet6/in6_cksum.c optional inet6
 netinet6/in6_gif.c optional gif inet6
 netinet6/in6_ifattach.c optional inet6
+netinet6/in6_mcast.c optional inet6
 netinet6/in6_pcb.c optional inet6
 netinet6/in6_proto.c optional inet6
 netinet6/in6_rmx.c optional inet6
diff --git a/sys/kern/subr_witness.c b/sys/kern/subr_witness.c
index 2db8e07..6d54be2 100644
--- a/sys/kern/subr_witness.c
+++ b/sys/kern/subr_witness.c
@@ -512,7 +512,8 @@ static struct witness_order_list_entry order_lists[] = {
 	{ "ifaddr", &lock_class_mtx_sleep },
 	{ NULL, NULL },
 	/*
-	 * Multicast - protocol locks before interface locks, after UDP locks.
+	 * IPv4 multicast:
+	 * protocol locks before interface locks, after UDP locks.
 	 */
 	{ "udpinp", &lock_class_rw },
 	{ "in_multi_mtx", &lock_class_mtx_sleep },
@@ -520,6 +521,15 @@ static struct witness_order_list_entry order_lists[] = {
 	{ "if_addr_mtx", &lock_class_mtx_sleep },
 	{ NULL, NULL },
 	/*
+	 * IPv6 multicast:
+	 * protocol locks before interface locks, after UDP locks.
+	 */
+	{ "udpinp", &lock_class_rw },
+	{ "in6_multi_mtx", &lock_class_mtx_sleep },
+	{ "mld_mtx", &lock_class_mtx_sleep },
+	{ "if_addr_mtx", &lock_class_mtx_sleep },
+	{ NULL, NULL },
+	/*
 	 * UNIX Domain Sockets
 	 */
 	{ "unp_global_rwlock", &lock_class_rw },
diff --git a/sys/netinet/ip_carp.c b/sys/netinet/ip_carp.c
index b2922fc..fa0726a 100644
--- a/sys/netinet/ip_carp.c
+++ b/sys/netinet/ip_carp.c
@@ -400,15 +400,20 @@ carp_clone_create(struct if_clone *ifc, int unit, caddr_t params)
 	sc->sc_advskew = 0;
 	sc->sc_init_counter = 1;
 	sc->sc_naddrs = sc->sc_naddrs6 = 0; /* M_ZERO? */
-#ifdef INET6
-	sc->sc_im6o.im6o_multicast_hlim = CARP_DFLTTL;
-#endif
 	sc->sc_imo.imo_membership = (struct in_multi **)malloc(
 	    (sizeof(struct in_multi *) * IP_MIN_MEMBERSHIPS), M_CARP,
 	    M_WAITOK);
 	sc->sc_imo.imo_mfilters = NULL;
 	sc->sc_imo.imo_max_memberships = IP_MIN_MEMBERSHIPS;
 	sc->sc_imo.imo_multicast_vif = -1;
+#ifdef INET6
+	sc->sc_im6o.im6o_membership = (struct in6_multi **)malloc(
+	    (sizeof(struct in6_multi *) * IPV6_MIN_MEMBERSHIPS), M_CARP,
+	    M_WAITOK);
+	sc->sc_im6o.im6o_mfilters = NULL;
+	sc->sc_im6o.im6o_max_memberships = IPV6_MIN_MEMBERSHIPS;
+	sc->sc_im6o.im6o_multicast_hlim = CARP_DFLTTL;
+#endif
 
 	callout_init(&sc->sc_ad_tmo, CALLOUT_MPSAFE);
 	callout_init(&sc->sc_md_tmo, CALLOUT_MPSAFE);
@@ -448,6 +453,9 @@ carp_clone_destroy(struct ifnet *ifp)
 	if_detach(ifp);
 	if_free_type(ifp, IFT_ETHER);
 	free(sc->sc_imo.imo_membership, M_CARP);
+#ifdef INET6
+	free(sc->sc_im6o.im6o_membership, M_CARP);
+#endif
 	free(sc, M_CARP);
 }
 
@@ -1449,14 +1457,17 @@ static void
 carp_multicast6_cleanup(struct carp_softc *sc)
 {
 	struct ip6_moptions *im6o = &sc->sc_im6o;
+	u_int16_t n = im6o->im6o_num_memberships;
 
-	while (!LIST_EMPTY(&im6o->im6o_memberships)) {
-		struct in6_multi_mship *imm =
-		    LIST_FIRST(&im6o->im6o_memberships);
-
-		LIST_REMOVE(imm, i6mm_chain);
-		in6_leavegroup(imm);
+	while (n-- > 0) {
+		if (im6o->im6o_membership[n] != NULL) {
+			in6_mc_leave(im6o->im6o_membership[n], NULL);
+			im6o->im6o_membership[n] = NULL;
+		}
 	}
+	KASSERT(im6o->im6o_mfilters == NULL,
+	    ("%s: im6o_mfilters != NULL", __func__));
+	im6o->im6o_num_memberships = 0;
 	im6o->im6o_multicast_ifp = NULL;
 }
 #endif
@@ -1635,10 +1646,11 @@ carp_set_addr6(struct carp_softc *sc, struct sockaddr_in6 *sin6)
 	struct carp_if *cif;
 	struct in6_ifaddr *ia, *ia_if;
 	struct ip6_moptions *im6o = &sc->sc_im6o;
-	struct in6_multi_mship *imm;
 	struct in6_addr in6;
 	int own, error;
 
+	error = 0;
+
 	if (IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr)) {
 		if (!(SC2IFP(sc)->if_flags & IFF_UP))
 			carp_set_state(sc, INIT);
@@ -1686,6 +1698,8 @@ carp_set_addr6(struct carp_softc *sc, struct sockaddr_in6 *sin6)
 		return (EADDRNOTAVAIL);
 
 	if (!sc->sc_naddrs6) {
+		struct in6_multi *in6m;
+
 		im6o->im6o_multicast_ifp = ifp;
 
 		/* join CARP multicast address */
@@ -1694,9 +1708,12 @@ carp_set_addr6(struct carp_softc *sc, struct sockaddr_in6 *sin6)
 		in6.s6_addr8[15] = 0x12;
 		if (in6_setscope(&in6, ifp, NULL) != 0)
 			goto cleanup;
-		if ((imm = in6_joingroup(ifp, &in6, &error, 0)) == NULL)
+		in6m = NULL;
+		error = in6_mc_join(ifp, &in6, NULL, &in6m, 0);
+		if (error)
 			goto cleanup;
-		LIST_INSERT_HEAD(&im6o->im6o_memberships, imm, i6mm_chain);
+		im6o->im6o_membership[0] = in6m;
+		im6o->im6o_num_memberships++;
 
 		/* join solicited multicast address */
 		bzero(&in6, sizeof(in6));
@@ -1707,9 +1724,12 @@ carp_set_addr6(struct carp_softc *sc, struct sockaddr_in6 *sin6)
 		in6.s6_addr8[12] = 0xff;
 		if (in6_setscope(&in6, ifp, NULL) != 0)
 			goto cleanup;
-		if ((imm = in6_joingroup(ifp, &in6, &error, 0)) == NULL)
+		in6m = NULL;
+		error = in6_mc_join(ifp, &in6, NULL, &in6m, 0);
+		if (error)
 			goto cleanup;
-		LIST_INSERT_HEAD(&im6o->im6o_memberships, imm, i6mm_chain);
+		im6o->im6o_membership[1] = in6m;
+		im6o->im6o_num_memberships++;
 	}
 
 	if (!ifp->if_carp) {
@@ -1781,14 +1801,8 @@ carp_set_addr6(struct carp_softc *sc, struct sockaddr_in6 *sin6)
 	return (0);
 
 cleanup:
-	/* clean up multicast memberships */
-	if (!sc->sc_naddrs6) {
-		while (!LIST_EMPTY(&im6o->im6o_memberships)) {
-			imm = LIST_FIRST(&im6o->im6o_memberships);
-			LIST_REMOVE(imm, i6mm_chain);
-			in6_leavegroup(imm);
-		}
-	}
+	if (!sc->sc_naddrs6)
+		carp_multicast6_cleanup(sc);
 	return (error);
 }
 
@@ -1799,21 +1813,13 @@ carp_del_addr6(struct carp_softc *sc, struct sockaddr_in6 *sin6)
 
 	if (!--sc->sc_naddrs6) {
 		struct carp_if *cif = (struct carp_if *)sc->sc_carpdev->if_carp;
-		struct ip6_moptions *im6o = &sc->sc_im6o;
 
 		CARP_LOCK(cif);
 		callout_stop(&sc->sc_ad_tmo);
 		SC2IFP(sc)->if_flags &= ~IFF_UP;
 		SC2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING;
 		sc->sc_vhid = -1;
-		while (!LIST_EMPTY(&im6o->im6o_memberships)) {
-			struct in6_multi_mship *imm =
-			    LIST_FIRST(&im6o->im6o_memberships);
-
-			LIST_REMOVE(imm, i6mm_chain);
-			in6_leavegroup(imm);
-		}
-		im6o->im6o_multicast_ifp = NULL;
+		carp_multicast6_cleanup(sc);
 		TAILQ_REMOVE(&cif->vhif_vrs, sc, sc_list);
 		if (!--cif->vhif_nvrs) {
 			CARP_LOCK_DESTROY(cif);
diff --git a/sys/netinet6/icmp6.c b/sys/netinet6/icmp6.c
index 866fee1..1c8a132 100644
--- a/sys/netinet6/icmp6.c
+++ b/sys/netinet6/icmp6.c
@@ -147,8 +147,6 @@ icmp6_init(void)
 	INIT_VNET_INET6(curvnet);
 
 	V_icmp6errpps_count = 0;
-
-	mld6_init();
 }
 
 static void
@@ -429,6 +427,23 @@ icmp6_input(struct mbuf **mp, int *offp, int proto)
 	}
 
 	/*
+	 * Check multicast group membership.
+	 * Note: SSM filters are not applied for ICMPv6 traffic.
+	 */
+	if (IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) {
+		struct ifnet *ifp;
+		struct in6_multi *inm;
+
+		ifp = m->m_pkthdr.rcvif;
+		inm = in6m_lookup(ifp, &ip6->ip6_dst);
+		if (inm == NULL) {
+			IP6STAT_INC(ip6s_notmember);
+			in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_discard);
+			goto freeit;
+		}
+	}
+
+	/*
 	 * calculate the checksum
 	 */
 #ifndef PULLDOWN_TEST
@@ -615,34 +630,20 @@ icmp6_input(struct mbuf **mp, int *offp, int proto)
 
 	case MLD_LISTENER_QUERY:
 	case MLD_LISTENER_REPORT:
-		if (icmp6len < sizeof(struct mld_hdr))
-			goto badlen;
-		if (icmp6->icmp6_type == MLD_LISTENER_QUERY) /* XXX: ugly... */
-			icmp6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_mldquery);
-		else
-			icmp6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_mldreport);
-		if ((n = m_copym(m, 0, M_COPYALL, M_DONTWAIT)) == NULL) {
-			/* give up local */
-			mld6_input(m, off);
-			m = NULL;
+	case MLD_LISTENER_DONE:
+	case MLDV2_LISTENER_REPORT:
+		/*
+		 * Drop MLD traffic which is not link-local.
+		 * XXX Should we also sanity check that these messages
+		 * were directed to a link-local multicast prefix?
+		 */
+		if (ip6->ip6_hlim != 1)
 			goto freeit;
-		}
-		mld6_input(n, off);
+		if (mld_input(m, off, icmp6len) != 0)
+			return (IPPROTO_DONE);
 		/* m stays. */
 		break;
 
-	case MLD_LISTENER_DONE:
-		icmp6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_mlddone);
-		if (icmp6len < sizeof(struct mld_hdr)) /* necessary? */
-			goto badlen;
-		break; /* nothing to be done in kernel */
-
-	case MLD_MTRACE_RESP:
-	case MLD_MTRACE:
-		/* XXX: these two are experimental. not officially defined. */
-		/* XXX: per-interface statistics? */
-		break; /* just pass it to applications */
-
 	case ICMP6_WRUREQUEST: /* ICMP6_FQDN_QUERY */
 	    {
 		enum { WRU, FQDN } mode;
@@ -2050,7 +2051,7 @@ icmp6_rip6_input(struct mbuf **mp, int off)
 		INP_RUNLOCK(last);
 	} else {
 		m_freem(m);
-		V_ip6stat.ip6s_delivered--;
+		IP6STAT_DEC(ip6s_delivered);
 	}
 	return IPPROTO_DONE;
 }
@@ -2222,7 +2223,14 @@ void
 icmp6_fasttimo(void)
 {
 
-	return;
+	mld_fasttimo();
+}
+
+void
+icmp6_slowtimo(void)
+{
+
+	mld_slowtimo();
 }
 
 static const char *
diff --git a/sys/netinet6/in6.c b/sys/netinet6/in6.c
index d0caa7e..df57bf6 100644
--- a/sys/netinet6/in6.c
+++ b/sys/netinet6/in6.c
@@ -106,8 +106,6 @@ __FBSDID("$FreeBSD$");
 #include <netinet6/in6_pcb.h>
 #include <netinet6/vinet6.h>
 
-MALLOC_DEFINE(M_IP6MADDR, "in6_multi", "internet multicast address");
-
 /*
  * Definitions of some costant IP6 addresses.
  */
@@ -119,6 +117,8 @@ const struct in6_addr in6addr_linklocal_allnodes =
 	IN6ADDR_LINKLOCAL_ALLNODES_INIT;
 const struct in6_addr in6addr_linklocal_allrouters =
 	IN6ADDR_LINKLOCAL_ALLROUTERS_INIT;
+const struct in6_addr in6addr_linklocal_allv2routers =
+	IN6ADDR_LINKLOCAL_ALLV2ROUTERS_INIT;
 
 const struct in6_addr in6mask0 = IN6MASK0;
 const struct in6_addr in6mask32 = IN6MASK32;
@@ -135,7 +135,6 @@ static int in6_ifinit __P((struct ifnet *, struct in6_ifaddr *,
 	struct sockaddr_in6 *, int));
 static void in6_unlink_ifa(struct in6_ifaddr *, struct ifnet *);
 
-struct in6_multihead in6_multihead; /* XXX BSS initialization */
 int (*faithprefix_p)(struct in6_addr *);
 
 
@@ -1110,10 +1109,12 @@ in6_update_ifa(struct ifnet *ifp, struct in6_aliasreq *ifra,
 			 * should be larger than the MLD delay (this could be
 			 * relaxed a bit, but this simple logic is at least
 			 * safe).
+			 * XXX: Break data hiding guidelines and look at
+			 * state for the solicited multicast group.
 			 */
 			mindelay = 0;
 			if (in6m_sol != NULL &&
-			    in6m_sol->in6m_state == MLD_REPORTPENDING) {
+			    in6m_sol->in6m_state == MLD_REPORTING_MEMBER) {
 				mindelay = in6m_sol->in6m_timer;
 			}
 			maxdelay = MAX_RTR_SOLICITATION_DELAY * hz;
@@ -1590,36 +1591,6 @@ in6_ifinit(struct ifnet *ifp, struct in6_ifaddr *ia,
 	return (error);
 }
 
-struct in6_multi_mship *
-in6_joingroup(struct ifnet *ifp, struct in6_addr *addr,
-    int *errorp, int delay)
-{
-	struct in6_multi_mship *imm;
-
-	imm = malloc(sizeof(*imm), M_IP6MADDR, M_NOWAIT);
-	if (!imm) {
-		*errorp = ENOBUFS;
-		return NULL;
-	}
-	imm->i6mm_maddr = in6_addmulti(addr, ifp, errorp, delay);
-	if (!imm->i6mm_maddr) {
-		/* *errorp is alrady set */
-		free(imm, M_IP6MADDR);
-		return NULL;
-	}
-	return imm;
-}
-
-int
-in6_leavegroup(struct in6_multi_mship *imm)
-{
-
-	if (imm->i6mm_maddr)
-		in6_delmulti(imm->i6mm_maddr);
-	free(imm, M_IP6MADDR);
-	return 0;
-}
-
 /*
  * Find an IPv6 interface link-local address specific to an interface.
  */
@@ -2328,6 +2299,9 @@ in6_domifattach(struct ifnet *ifp)
 		ext->lltable->llt_lookup = in6_lltable_lookup;
 		ext->lltable->llt_dump = in6_lltable_dump;
 	}
+
+	ext->mld_ifinfo = mld_domifattach(ifp);
+
 	return ext;
 }
 
@@ -2336,6 +2310,7 @@ in6_domifdetach(struct ifnet *ifp, void *aux)
 {
 	struct in6_ifextra *ext = (struct in6_ifextra *)aux;
 
+	mld_domifdetach(ifp);
 	scope6_ifdetach(ext->scope6_id);
 	nd6_ifdetach(ext->nd_ifinfo);
 	lltable_free(ext->lltable);
diff --git a/sys/netinet6/in6_ifattach.c b/sys/netinet6/in6_ifattach.c
index 80b6d6b..077014e 100644
--- a/sys/netinet6/in6_ifattach.c
+++ b/sys/netinet6/in6_ifattach.c
@@ -63,6 +63,7 @@ __FBSDID("$FreeBSD$");
 #include <netinet6/in6_ifattach.h>
 #include <netinet6/ip6_var.h>
 #include <netinet6/nd6.h>
+#include <netinet6/mld6_var.h>
 #include <netinet6/scope6_var.h>
 #include <netinet6/vinet6.h>
 
@@ -918,11 +919,35 @@ in6_tmpaddrtimer(void *ignored_arg)
 static void
 in6_purgemaddrs(struct ifnet *ifp)
 {
-	struct in6_multi *in6m;
-	struct in6_multi *oin6m;
+	INIT_VNET_INET6(ifp->if_vnet);
+	LIST_HEAD(,in6_multi) purgeinms;
+	struct in6_multi *inm, *tinm;
+	struct ifmultiaddr *ifma;
+
+	LIST_INIT(&purgeinms);
+	IN6_MULTI_LOCK();
+
+	/*
+	 * Extract list of in6_multi associated with the detaching ifp
+	 * which the PF_INET6 layer is about to release.
+	 * We need to do this as IF_ADDR_LOCK() may be re-acquired
+	 * by code further down.
+	 */
+	IF_ADDR_LOCK(ifp);
+	TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
+		if (ifma->ifma_addr->sa_family != AF_INET6 ||
+		    ifma->ifma_protospec == NULL)
+			continue;
+		inm = (struct in6_multi *)ifma->ifma_protospec;
+		LIST_INSERT_HEAD(&purgeinms, inm, in6m_entry);
+	}
+	IF_ADDR_UNLOCK(ifp);
 
-	LIST_FOREACH_SAFE(in6m, &in6_multihead, in6m_entry, oin6m) {
-		if (in6m->in6m_ifp == ifp)
-			in6_delmulti(in6m);
+	LIST_FOREACH_SAFE(inm, &purgeinms, in6m_entry, tinm) {
+		LIST_REMOVE(inm, in6m_entry);
+		in6m_release_locked(inm);
 	}
+	mld_ifdetach(ifp);
+
+	IN6_MULTI_UNLOCK();
 }
diff --git a/sys/netinet6/in6_mcast.c b/sys/netinet6/in6_mcast.c
index a4d435c..b3f272c 100644
--- a/sys/netinet6/in6_mcast.c
+++ b/sys/netinet6/in6_mcast.c
@@ -29,6 +29,7 @@
 
 /*
  * IPv6 multicast socket, group, and socket option processing module.
+ * Normative references: RFC 2292, RFC 3492, RFC 3542, RFC 3678, RFC 3810.
  */
 
 #include <sys/cdefs.h>
@@ -142,6 +143,9 @@ static struct ip6_moptions *
 static int in6p_get_source_filters(struct inpcb *, struct sockopt *);
 static int in6p_join_group(struct inpcb *, struct sockopt *);
 static int in6p_leave_group(struct inpcb *, struct sockopt *);
+static struct ifnet *
+		in6p_lookup_mcast_ifp(const struct inpcb *,
+		    const struct sockaddr_in6 *);
 static int in6p_block_unblock_source(struct inpcb *, struct sockopt *);
 static int in6p_set_multicast_if(struct inpcb *, struct sockopt *);
 static int in6p_set_source_filters(struct inpcb *, struct sockopt *);
@@ -1655,12 +1659,12 @@ int
 ip6_getmoptions(struct inpcb *inp, struct sockopt *sopt)
 {
 	INIT_VNET_INET6(curvnet);
-	struct ip6_moptions *imo;
-	int error, optval;
-	u_char coptval;
+	struct ip6_moptions *im6o;
+	int error;
+	u_int optval;
 
 	INP_WLOCK(inp);
-	imo = inp->in6p_moptions;
+	im6o = inp->in6p_moptions;
 	/*
 	 * If socket is neither of type SOCK_RAW or SOCK_DGRAM,
 	 * or is a divert socket, reject it.
@@ -1674,38 +1678,36 @@ ip6_getmoptions(struct inpcb *inp, struct sockopt *sopt)
 
 	error = 0;
 	switch (sopt->sopt_name) {
-#if 0 /* XXX FIXME */
 	case IPV6_MULTICAST_IF:
-		if (imo == NULL || imo->im6o_multicast_ifp == NULL) {
+		if (im6o == NULL || im6o->im6o_multicast_ifp == NULL) {
 			optval = 0;
 		} else {
-			optval = imo->im6o_multicast_ifp->if_index;
+			optval = im6o->im6o_multicast_ifp->if_index;
 		}
 		INP_WUNLOCK(inp);
-		error = sooptcopyout(sopt, &ifindex, sizeof(u_int));
+		error = sooptcopyout(sopt, &optval, sizeof(u_int));
 		break;
-#endif
 
 	case IPV6_MULTICAST_HOPS:
-		if (imo == 0)
-			optval = coptval = V_ip6_defmcasthlim;
+		if (im6o == NULL)
+			optval = V_ip6_defmcasthlim;
 		else
-			optval = coptval = imo->im6o_multicast_loop;
+			optval = im6o->im6o_multicast_loop;
 		INP_WUNLOCK(inp);
 		error = sooptcopyout(sopt, &optval, sizeof(u_int));
 		break;
 
 	case IPV6_MULTICAST_LOOP:
-		if (imo == 0)
-			optval = coptval = IPV6_DEFAULT_MULTICAST_LOOP;
+		if (im6o == NULL)
+			optval = in6_mcast_loop; /* XXX VIMAGE */
 		else
-			optval = coptval = imo->im6o_multicast_loop;
+			optval = im6o->im6o_multicast_loop;
 		INP_WUNLOCK(inp);
 		error = sooptcopyout(sopt, &optval, sizeof(u_int));
 		break;
 
 	case IPV6_MSFILTER:
-		if (imo == NULL) {
+		if (im6o == NULL) {
 			error = EADDRNOTAVAIL;
 			INP_WUNLOCK(inp);
 		} else {
@@ -1725,7 +1727,57 @@ ip6_getmoptions(struct inpcb *inp, struct sockopt *sopt)
 }
 
 /*
+ * Look up the ifnet to use for a multicast group membership,
+ * given the address of an IPv6 group.
+ *
+ * This routine exists to support legacy IPv6 multicast applications.
+ *
+ * If inp is non-NULL, use this socket's current FIB number for any
+ * required FIB lookup. Look up the group address in the unicast FIB,
+ * and use its ifp; usually, this points to the default next-hop.
+ * If the FIB lookup fails, return NULL.
+ *
+ * FUTURE: Support multiple forwarding tables for IPv6.
+ *
+ * Returns NULL if no ifp could be found.
+ */
+static struct ifnet *
+in6p_lookup_mcast_ifp(const struct inpcb *in6p __unused,
+    const struct sockaddr_in6 *gsin6)
+{
+	INIT_VNET_INET6(curvnet);
+	struct route_in6 ro6;
+	struct ifnet *ifp;
+
+	KASSERT(in6p->inp_vflag & INP_IPV6,
+	    ("%s: not INP_IPV6 inpcb", __func__));
+	KASSERT(gsin6->sin6_family == AF_INET6,
+	    ("%s: not AF_INET6 group", __func__));
+	KASSERT(IN6_IS_ADDR_MULTICAST(&gsin6->sin6_addr),
+	    ("%s: not multicast", __func__));
+
+	ifp = NULL;
+	memset(&ro6, 0, sizeof(struct route_in6));
+	memcpy(&ro6.ro_dst, gsin6, sizeof(struct sockaddr_in6));
+#ifdef notyet
+	rtalloc_ign_fib(&ro6, 0, inp ? inp->inp_inc.inc_fibnum : 0);
+#else
+	rtalloc_ign((struct route *)&ro6, 0);
+#endif
+	if (ro6.ro_rt != NULL) {
+		ifp = ro6.ro_rt->rt_ifp;
+		KASSERT(ifp != NULL, ("%s: null ifp", __func__));
+		RTFREE(ro6.ro_rt);
+	}
+
+	return (ifp);
+}
+
+/*
  * Join an IPv6 multicast group, possibly with a source.
+ *
+ * FIXME: The KAME use of the unspecified address (::)
+ * to join *all* multicast groups is currently unsupported.
  */
 static int
 in6p_join_group(struct inpcb *inp, struct sockopt *sopt)
@@ -1765,8 +1817,14 @@ in6p_join_group(struct inpcb *inp, struct sockopt *sopt)
 		gsa->sin6.sin6_len = sizeof(struct sockaddr_in6);
 		gsa->sin6.sin6_addr = mreq.ipv6mr_multiaddr;
 
-		ifp = ifnet_byindex(mreq.ipv6mr_interface);
-
+		if (mreq.ipv6mr_interface == 0) {
+			ifp = in6p_lookup_mcast_ifp(inp, &gsa->sin6);
+		} else {
+			if (mreq.ipv6mr_interface < 0 ||
+			    V_if_index < mreq.ipv6mr_interface)
+				return (EADDRNOTAVAIL);
+			ifp = ifnet_byindex(mreq.ipv6mr_interface);
+		}
 		CTR3(KTR_MLD, "%s: ipv6mr_interface = %d, ifp = %p",
 		    __func__, mreq.ipv6mr_interface, ifp);
 	} break;
@@ -1813,12 +1871,35 @@ in6p_join_group(struct inpcb *inp, struct sockopt *sopt)
 		break;
 	}
 
+#ifdef notyet
+	/*
+	 * FIXME: Check for unspecified address (all groups).
+	 * Do we have a normative reference for this 'feature'?
+	 *
+	 * We use the unspecified address to specify to accept
+	 * all multicast addresses. Only super user is allowed
+	 * to do this.
+	 * XXX-BZ might need a better PRIV_NETINET_x for this
+	 */
+	if (IN6_IS_ADDR_UNSPECIFIED(&gsa->sin6.sin6_addr)) {
+		error = priv_check(curthread, PRIV_NETINET_MROUTE);
+		if (error)
+			break;
+	} else
+#endif
 	if (!IN6_IS_ADDR_MULTICAST(&gsa->sin6.sin6_addr))
 		return (EINVAL);
 
 	if (ifp == NULL || (ifp->if_flags & IFF_MULTICAST) == 0)
 		return (EADDRNOTAVAIL);
 
+#ifdef notyet
+	/*
+	 * FIXME: Set interface scope in group address.
+	 */
+	(void)in6_setscope(&gsa->sin6.sin_addr, ifp, NULL);
+#endif
+
 	/*
 	 * MCAST_JOIN_SOURCE on an exclusive membership is an error.
 	 * On an existing inclusive membership, it just adds the
@@ -1987,7 +2068,23 @@ in6p_leave_group(struct inpcb *inp, struct sockopt *sopt)
 		gsa->sin6.sin6_family = AF_INET6;
 		gsa->sin6.sin6_len = sizeof(struct sockaddr_in6);
 		gsa->sin6.sin6_addr = mreq.ipv6mr_multiaddr;
-		ifp = ifnet_byindex(mreq.ipv6mr_interface);
+
+		if (mreq.ipv6mr_interface == 0) {
+#ifdef notyet
+			/*
+			 * FIXME: Resolve scope ambiguity when interface
+			 * index is unspecified.
+			 */
+			ifp = in6p_lookup_mcast_ifp(inp, &gsa->sin6);
+#else
+			return (EADDRNOTAVAIL);
+#endif
+		} else {
+			if (mreq.ipv6mr_interface < 0 ||
+			    V_if_index < mreq.ipv6mr_interface)
+				return (EADDRNOTAVAIL);
+			ifp = ifnet_byindex(mreq.ipv6mr_interface);
+		}
 
 		CTR3(KTR_MLD, "%s: ipv6mr_interface = %d, ifp = %p",
 		    __func__, mreq.ipv6mr_interface, ifp);
@@ -2033,6 +2130,15 @@ in6p_leave_group(struct inpcb *inp, struct sockopt *sopt)
 	if (!IN6_IS_ADDR_MULTICAST(&gsa->sin6.sin6_addr))
 		return (EINVAL);
 
+#ifdef notyet
+	/*
+	 * FIXME: Need to embed ifp's scope ID in the address
+	 * handed down to MLD.
+	 * See KAME IPV6_LEAVE_GROUP implementation.
+	 */
+	(void)in6_setscope(&mreq->ipv6mr_multiaddr, ifp, NULL);
+#endif
+
 	/*
 	 * Find the membership in the membership array.
 	 */
@@ -2348,7 +2454,7 @@ out_in6p_locked:
 int
 ip6_setmoptions(struct inpcb *inp, struct sockopt *sopt)
 {
-	struct ip6_moptions *imo;
+	struct ip6_moptions *im6o;
 	int error;
 
 	error = 0;
@@ -2364,7 +2470,6 @@ ip6_setmoptions(struct inpcb *inp, struct sockopt *sopt)
 
 	switch (sopt->sopt_name) {
 	case IPV6_MULTICAST_IF:
-		/* XXX in v6 this one is far more involved */
 		error = in6p_set_multicast_if(inp, sopt);
 		break;
 
@@ -2381,9 +2486,11 @@ ip6_setmoptions(struct inpcb *inp, struct sockopt *sopt)
 		if (hlim < -1 || hlim > 255) {
 			error = EINVAL;
 			break;
+		} else if (hlim == -1) {
+			hlim = V_ip6_defmcasthlim;
 		}
-		imo = in6p_findmoptions(inp);
-		imo->im6o_multicast_hlim = hlim;
+		im6o = in6p_findmoptions(inp);
+		im6o->im6o_multicast_hlim = hlim;
 		INP_WUNLOCK(inp);
 		break;
 	}
@@ -2393,9 +2500,7 @@ ip6_setmoptions(struct inpcb *inp, struct sockopt *sopt)
 
 		/*
 		 * Set the loopback flag for outgoing multicast packets.
-		 * Must be zero or one. The orimcaddrl multicast API required a
-		 * char argument, which is inconsistent with the rest
-		 * of the socket API. We allow either a char or an int.
+		 * Must be zero or one.
 		 */
 		if (sopt->sopt_valsize != sizeof(u_int)) {
 			error = EINVAL;
@@ -2404,8 +2509,12 @@ ip6_setmoptions(struct inpcb *inp, struct sockopt *sopt)
 		error = sooptcopyin(sopt, &loop, sizeof(u_int), sizeof(u_int));
 		if (error)
 			break;
-		imo = in6p_findmoptions(inp);
-		imo->im6o_multicast_loop = loop;
+		if (loop > 1) {
+			error = EINVAL;
+			break;
+		}
+		im6o = in6p_findmoptions(inp);
+		im6o->im6o_multicast_loop = loop;
 		INP_WUNLOCK(inp);
 		break;
 	}
diff --git a/sys/netinet6/in6_pcb.c b/sys/netinet6/in6_pcb.c
index 79e79cb..e446a05 100644
--- a/sys/netinet6/in6_pcb.c
+++ b/sys/netinet6/in6_pcb.c
@@ -733,36 +733,36 @@ in6_pcbpurgeif0(struct inpcbinfo *pcbinfo, struct ifnet *ifp)
 {
 	struct inpcb *in6p;
 	struct ip6_moptions *im6o;
-	struct in6_multi_mship *imm, *nimm;
+	int i, gap;
 
 	INP_INFO_RLOCK(pcbinfo);
 	LIST_FOREACH(in6p, pcbinfo->ipi_listhead, inp_list) {
 		INP_WLOCK(in6p);
 		im6o = in6p->in6p_moptions;
-		if ((in6p->inp_vflag & INP_IPV6) &&
-		    im6o) {
+		if ((in6p->inp_vflag & INP_IPV6) && im6o != NULL) {
 			/*
-			 * Unselect the outgoing interface if it is being
-			 * detached.
+			 * Unselect the outgoing ifp for multicast if it
+			 * is being detached.
 			 */
 			if (im6o->im6o_multicast_ifp == ifp)
 				im6o->im6o_multicast_ifp = NULL;
-
 			/*
 			 * Drop multicast group membership if we joined
 			 * through the interface being detached.
-			 * XXX controversial - is it really legal for kernel
-			 * to force this?
 			 */
-			for (imm = im6o->im6o_memberships.lh_first;
-			     imm != NULL; imm = nimm) {
-				nimm = imm->i6mm_chain.le_next;
-				if (imm->i6mm_maddr->in6m_ifp == ifp) {
-					LIST_REMOVE(imm, i6mm_chain);
-					in6_delmulti(imm->i6mm_maddr);
-					free(imm, M_IP6MADDR);
+			gap = 0;
+			for (i = 0; i < im6o->im6o_num_memberships; i++) {
+				if (im6o->im6o_membership[i]->in6m_ifp ==
+				    ifp) {
+					in6_mc_leave(im6o->im6o_membership[i],
+					    NULL);
+					gap++;
+				} else if (gap != 0) {
+					im6o->im6o_membership[i - gap] =
+					    im6o->im6o_membership[i];
 				}
 			}
+			im6o->im6o_num_memberships -= gap;
 		}
 		INP_WUNLOCK(in6p);
 	}
diff --git a/sys/netinet6/in6_proto.c b/sys/netinet6/in6_proto.c
index 8908d67..622777c 100644
--- a/sys/netinet6/in6_proto.c
+++ b/sys/netinet6/in6_proto.c
@@ -236,6 +236,7 @@ struct ip6protosw inet6sw[] = {
 	.pr_ctloutput = rip6_ctloutput,
 	.pr_init = icmp6_init,
 	.pr_fasttimo = icmp6_fasttimo,
+	.pr_slowtimo = icmp6_slowtimo,
 	.pr_usrreqs = &rip6_usrreqs
 },
 {
diff --git a/sys/netinet6/in6_var.h b/sys/netinet6/in6_var.h
index a472a24..9a846c3 100644
--- a/sys/netinet6/in6_var.h
+++ b/sys/netinet6/in6_var.h
@@ -64,6 +64,12 @@
 #ifndef _NETINET6_IN6_VAR_H_
 #define _NETINET6_IN6_VAR_H_
 
+#include <sys/tree.h>
+
+#ifdef _KERNEL
+#include <sys/libkern.h>
+#endif
+
 /*
  * Interface address, Internet version. One of these structures
  * is allocated for each interface with an Internet address.
@@ -89,12 +95,15 @@ struct in6_addrlifetime {
 struct nd_ifinfo;
 struct scope6_id;
 struct lltable;
+struct mld_ifinfo;
+
 struct in6_ifextra {
 	struct in6_ifstat *in6_ifstat;
 	struct icmp6_ifstat *icmp6_ifstat;
 	struct nd_ifinfo *nd_ifinfo;
 	struct scope6_id *scope6_id;
 	struct lltable *lltable;
+	struct mld_ifinfo *mld_ifinfo;
 };
 
 #define LLTABLE6(ifp) (((struct in6_ifextra *)(ifp)->if_afdata[AF_INET6])->lltable)
@@ -489,9 +498,6 @@ do { \
 extern struct in6_addr zeroin6_addr;
 extern u_char inet6ctlerrmap[];
 
-#ifdef MALLOC_DECLARE
-MALLOC_DECLARE(M_IP6MADDR);
-#endif /* MALLOC_DECLARE */
 
 /*
  * Macro for finding the internet address structure (in6_ifaddr) corresponding
@@ -514,94 +520,243 @@ do { \
 #endif /* _KERNEL */
 
 /*
- * Multi-cast membership entry. One for each group/ifp that a PCB
- * belongs to.
+ * IPv6 multicast MLD-layer source entry.
+ */
+struct ip6_msource {
+	RB_ENTRY(ip6_msource) im6s_link; /* RB tree links */
+	struct in6_addr im6s_addr;
+	struct im6s_st {
+		uint16_t ex; /* # of exclusive members */
+		uint16_t in; /* # of inclusive members */
+	} im6s_st[2]; /* state at t0, t1 */
+	uint8_t im6s_stp; /* pending query */
+};
+RB_HEAD(ip6_msource_tree, ip6_msource);
+
+/*
+ * IPv6 multicast PCB-layer source entry.
+ *
+ * NOTE: overlapping use of struct ip6_msource fields at start.
+ */
+struct in6_msource {
+	RB_ENTRY(ip6_msource) im6s_link; /* Common field */
+	struct in6_addr im6s_addr; /* Common field */
+	uint8_t im6sl_st[2]; /* state before/at commit */
+};
+
+#ifdef _KERNEL
+/*
+ * IPv6 source tree comparison function.
+ *
+ * An ordered predicate is necessary; bcmp() is not documented to return
+ * an indication of order, memcmp() is, and is an ISO C99 requirement.
+ */ +static __inline int +ip6_msource_cmp(const struct ip6_msource *a, const struct ip6_msource *b) +{ + + return (memcmp(&a->im6s_addr, &b->im6s_addr, sizeof(struct in6_addr))); +} +RB_PROTOTYPE(ip6_msource_tree, ip6_msource, im6s_link, ip6_msource_cmp); +#endif /* _KERNEL */ + +/* + * IPv6 multicast PCB-layer group filter descriptor. + */ +struct in6_mfilter { + struct ip6_msource_tree im6f_sources; /* source list for (S,G) */ + u_long im6f_nsrc; /* # of source entries */ + uint8_t im6f_st[2]; /* state before/at commit */ +}; + +/* + * Legacy KAME IPv6 multicast membership descriptor. */ struct in6_multi_mship { - struct in6_multi *i6mm_maddr; /* Multicast address pointer */ - LIST_ENTRY(in6_multi_mship) i6mm_chain; /* multicast options chain */ + struct in6_multi *i6mm_maddr; + LIST_ENTRY(in6_multi_mship) i6mm_chain; }; -struct in6_multi { +/* + * IPv6 group descriptor. + * + * For every entry on an ifnet's if_multiaddrs list which represents + * an IP multicast group, there is one of these structures. + * + * If any source filters are present, then a node will exist in the RB-tree + * to permit fast lookup by source whenever an operation takes place. + * This permits pre-order traversal when we issue reports. + * Source filter trees are kept separately from the socket layer to + * greatly simplify locking. + * + * When MLDv2 is active, in6m_timer is the response to group query timer. + * The state-change timer in6m_sctimer is separate; whenever state changes + * for the group the state change record is generated and transmitted, + * and kept if retransmissions are necessary. + * + * FUTURE: in6m_link is now only used when groups are being purged + * on a detaching ifnet. It could be demoted to a SLIST_ENTRY, but + * because it is at the very start of the struct, we can't do this + * w/o breaking the ABI for ifmcstat. 
+ */ +struct in6_multi { LIST_ENTRY(in6_multi) in6m_entry; /* list glue */ - struct in6_addr in6m_addr; /* IP6 multicast address */ + struct in6_addr in6m_addr; /* IPv6 multicast address */ struct ifnet *in6m_ifp; /* back pointer to ifnet */ struct ifmultiaddr *in6m_ifma; /* back pointer to ifmultiaddr */ - u_int in6m_refcount; /* # membership claims by sockets */ + u_int in6m_refcount; /* reference count */ u_int in6m_state; /* state of the membership */ u_int in6m_timer; /* MLD6 listener report timer */ - struct timeval in6m_timer_expire; /* when the timer expires */ - struct callout *in6m_timer_ch; -}; -#define IN6M_TIMER_UNDEF -1 + /* New fields for MLDv2 follow. */ + struct mld_ifinfo *in6m_mli; /* MLD info */ + SLIST_ENTRY(in6_multi) in6m_nrele; /* to-be-released by MLD */ + struct ip6_msource_tree in6m_srcs; /* tree of sources */ + u_long in6m_nsrc; /* # of tree entries */ -#ifdef _KERNEL -/* flags to in6_update_ifa */ -#define IN6_IFAUPDATE_DADDELAY 0x1 /* first time to configure an address */ + struct ifqueue in6m_scq; /* queue of pending + * state-change packets */ + struct timeval in6m_lastgsrtv; /* last G-S-R query */ + uint16_t in6m_sctimer; /* state-change timer */ + uint16_t in6m_scrv; /* state-change rexmit count */ -extern LIST_HEAD(in6_multihead, in6_multi) in6_multihead; + /* + * SSM state counters which track state at T0 (the time the last + * state-change report's RV timer went to zero) and T1 + * (time of pending report, i.e. now). + * Used for computing MLDv2 state-change reports. Several refcounts + * are maintained here to optimize for common use-cases. 
+ */ + struct in6m_st { + uint16_t iss_fmode; /* MLD filter mode */ + uint16_t iss_asm; /* # of ASM listeners */ + uint16_t iss_ex; /* # of exclusive members */ + uint16_t iss_in; /* # of inclusive members */ + uint16_t iss_rec; /* # of recorded sources */ + } in6m_st[2]; /* state at t0, t1 */ +}; /* - * Structure used by macros below to remember position when stepping through - * all of the in6_multi records. + * Helper function to derive the filter mode on a source entry + * from its internal counters. Predicates are: + * A source is only excluded if all listeners exclude it. + * A source is only included if no listeners exclude it, + * and at least one listener includes it. + * May be used by ifmcstat(8). */ -struct in6_multistep { - struct in6_ifaddr *i_ia; - struct in6_multi *i_in6m; -}; +static __inline uint8_t +im6s_get_mode(const struct in6_multi *inm, const struct ip6_msource *ims, + uint8_t t) +{ + + t = !!t; + if (inm->in6m_st[t].iss_ex > 0 && + inm->in6m_st[t].iss_ex == ims->im6s_st[t].ex) + return (MCAST_EXCLUDE); + else if (ims->im6s_st[t].in > 0 && ims->im6s_st[t].ex == 0) + return (MCAST_INCLUDE); + return (MCAST_UNDEFINED); +} + +#ifdef _KERNEL /* - * Macros for looking up the in6_multi record for a given IP6 multicast - * address on a given interface. If no matching record is found, "in6m" - * returns NULL. + * Lock macros for IPv6 layer multicast address lists. IPv6 lock goes + * before link layer multicast locks in the lock order. In most cases, + * consumers of IN_*_MULTI() macros should acquire the locks before + * calling them; users of the in_{add,del}multi() functions should not. 
*/ +extern struct mtx in6_multi_mtx; +#define IN6_MULTI_LOCK() mtx_lock(&in6_multi_mtx) +#define IN6_MULTI_UNLOCK() mtx_unlock(&in6_multi_mtx) +#define IN6_MULTI_LOCK_ASSERT() mtx_assert(&in6_multi_mtx, MA_OWNED) +#define IN6_MULTI_UNLOCK_ASSERT() mtx_assert(&in6_multi_mtx, MA_NOTOWNED) -#define IN6_LOOKUP_MULTI(addr, ifp, in6m) \ -/* struct in6_addr addr; */ \ -/* struct ifnet *ifp; */ \ -/* struct in6_multi *in6m; */ \ -do { \ - struct ifmultiaddr *ifma; \ - IF_ADDR_LOCK(ifp); \ - TAILQ_FOREACH(ifma, &(ifp)->if_multiaddrs, ifma_link) { \ - if (ifma->ifma_addr->sa_family == AF_INET6 \ - && IN6_ARE_ADDR_EQUAL(&((struct sockaddr_in6 *)ifma->ifma_addr)->sin6_addr, \ - &(addr))) \ - break; \ - } \ - (in6m) = (struct in6_multi *)(ifma ? ifma->ifma_protospec : 0); \ - IF_ADDR_UNLOCK(ifp); \ -} while(0) +/* + * Look up an in6_multi record for an IPv6 multicast address + * on the interface ifp. + * If no record found, return NULL. + * + * SMPng: The IN6_MULTI_LOCK and IF_ADDR_LOCK on ifp must be held. + */ +static __inline struct in6_multi * +in6m_lookup_locked(struct ifnet *ifp, const struct in6_addr *mcaddr) +{ + struct ifmultiaddr *ifma; + struct in6_multi *inm; + + IN6_MULTI_LOCK_ASSERT(); + IF_ADDR_LOCK_ASSERT(ifp); + + inm = NULL; + TAILQ_FOREACH(ifma, &((ifp)->if_multiaddrs), ifma_link) { + if (ifma->ifma_addr->sa_family == AF_INET6) { + inm = (struct in6_multi *)ifma->ifma_protospec; + if (IN6_ARE_ADDR_EQUAL(&inm->in6m_addr, mcaddr)) + break; + inm = NULL; + } + } + return (inm); +} /* - * Macro to step through all of the in6_multi records, one at a time. - * The current position is remembered in "step", which the caller must - * provide. IN6_FIRST_MULTI(), below, must be called to initialize "step" - * and get the first record. Both macros return a NULL "in6m" when there - * are no remaining records. + * Wrapper for in6m_lookup_locked(). + * + * SMPng: Assumes that neither the IN6_MULTI_LOCK() nor IF_ADDR_LOCK() is held. 
*/ -#define IN6_NEXT_MULTI(step, in6m) \ -/* struct in6_multistep step; */ \ -/* struct in6_multi *in6m; */ \ -do { \ - if (((in6m) = (step).i_in6m) != NULL) \ - (step).i_in6m = LIST_NEXT((step).i_in6m, in6m_entry); \ -} while(0) - -#define IN6_FIRST_MULTI(step, in6m) \ -/* struct in6_multistep step; */ \ -/* struct in6_multi *in6m */ \ -do { \ - (step).i_in6m = LIST_FIRST(&in6_multihead); \ - IN6_NEXT_MULTI((step), (in6m)); \ -} while(0) - -struct in6_multi *in6_addmulti __P((struct in6_addr *, struct ifnet *, - int *, int)); -void in6_delmulti __P((struct in6_multi *)); -struct in6_multi_mship *in6_joingroup(struct ifnet *, struct in6_addr *, int *, int); +static __inline struct in6_multi * +in6m_lookup(struct ifnet *ifp, const struct in6_addr *mcaddr) +{ + struct in6_multi *inm; + + IN6_MULTI_LOCK(); + IF_ADDR_LOCK(ifp); + inm = in6m_lookup_locked(ifp, mcaddr); + IF_ADDR_UNLOCK(ifp); + IN6_MULTI_UNLOCK(); + + return (inm); +} + +/* Acquire an in6_multi record. */ +static __inline void +in6m_acquire_locked(struct in6_multi *inm) +{ + + IN6_MULTI_LOCK_ASSERT(); + ++inm->in6m_refcount; +} + +struct ip6_moptions; +struct sockopt; + +/* Multicast KPIs. 
*/ +int im6o_mc_filter(const struct ip6_moptions *, const struct ifnet *, + const struct sockaddr *, const struct sockaddr *); +int in6_mc_join(struct ifnet *, const struct in6_addr *, + struct in6_mfilter *, struct in6_multi **, int); +int in6_mc_join_locked(struct ifnet *, const struct in6_addr *, + struct in6_mfilter *, struct in6_multi **, int); +int in6_mc_leave(struct in6_multi *, struct in6_mfilter *); +int in6_mc_leave_locked(struct in6_multi *, struct in6_mfilter *); +void in6m_clear_recorded(struct in6_multi *); +void in6m_commit(struct in6_multi *); +void in6m_print(const struct in6_multi *); +int in6m_record_source(struct in6_multi *, const struct in6_addr *); +void in6m_release_locked(struct in6_multi *); +void ip6_freemoptions(struct ip6_moptions *); +int ip6_getmoptions(struct inpcb *, struct sockopt *); +int ip6_setmoptions(struct inpcb *, struct sockopt *); + +/* Legacy KAME multicast KPIs. */ +struct in6_multi_mship * + in6_joingroup(struct ifnet *, struct in6_addr *, int *, int); int in6_leavegroup(struct in6_multi_mship *); + +/* flags to in6_update_ifa */ +#define IN6_IFAUPDATE_DADDELAY 0x1 /* first time to configure an address */ + int in6_mask2len __P((struct in6_addr *, u_char *)); int in6_control __P((struct socket *, u_long, caddr_t, struct ifnet *, struct thread *)); @@ -615,8 +770,6 @@ void *in6_domifattach __P((struct ifnet *)); void in6_domifdetach __P((struct ifnet *, void *)); void in6_setmaxmtu __P((void)); int in6_if2idlen __P((struct ifnet *)); -void in6_restoremkludge __P((struct in6_ifaddr *, struct ifnet *)); -void in6_purgemkludge __P((struct ifnet *)); struct in6_ifaddr *in6ifa_ifpforlinklocal __P((struct ifnet *, int)); struct in6_ifaddr *in6ifa_ifpwithaddr __P((struct ifnet *, struct in6_addr *)); char *ip6_sprintf __P((char *, const struct in6_addr *)); diff --git a/sys/netinet6/ip6_input.c b/sys/netinet6/ip6_input.c index 5654c94..69ac45c 100644 --- a/sys/netinet6/ip6_input.c +++ b/sys/netinet6/ip6_input.c @@ -555,25 
+555,12 @@ passin: } /* - * Multicast check + * Multicast check. Assume packet is for us to avoid + * prematurely taking locks. */ if (IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) { - struct in6_multi *in6m = 0; - + ours = 1; in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_mcast); - /* - * See if we belong to the destination multicast group on the - * arrival interface. - */ - IN6_LOOKUP_MULTI(ip6->ip6_dst, m->m_pkthdr.rcvif, in6m); - if (in6m) - ours = 1; - else if (!ip6_mrouter) { - V_ip6stat.ip6s_notmember++; - V_ip6stat.ip6s_cantforward++; - in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_discard); - goto bad; - } deliverifp = m->m_pkthdr.rcvif; goto hbhcheck; } @@ -823,7 +810,8 @@ passin: /* * Forward if desirable. */ - if (IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) { + if (V_ip6_mrouter && + IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) { /* * If we are acting as a multicast router, all * incoming multicast packets are passed to the @@ -832,13 +820,12 @@ passin: * ip6_mforward() returns a non-zero value, the packet * must be discarded, else it may be accepted below. 
*/ - if (ip6_mrouter && ip6_mforward && + if (ip6_mforward && ip6_mforward(ip6, m->m_pkthdr.rcvif, m)) { - V_ip6stat.ip6s_cantforward++; + IP6STAT_INC(ip6s_cantforward); + in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_discard); goto bad; } - if (!ours) - goto bad; } else if (!ours) { ip6_forward(m, srcrt); goto out; diff --git a/sys/netinet6/ip6_mroute.c b/sys/netinet6/ip6_mroute.c index 29201d6..a88a9a1 100644 --- a/sys/netinet6/ip6_mroute.c +++ b/sys/netinet6/ip6_mroute.c @@ -93,6 +93,7 @@ __FBSDID("$FreeBSD$"); #include <sys/malloc.h> #include <sys/mbuf.h> #include <sys/module.h> +#include <sys/domain.h> #include <sys/protosw.h> #include <sys/signalvar.h> #include <sys/socket.h> @@ -140,6 +141,7 @@ static int set_pim6(int *); static int socket_send(struct socket *, struct mbuf *, struct sockaddr_in6 *); +extern int in6_mcast_loop; extern struct domain inet6domain; static const struct encaptab *pim6_encap_cookie; @@ -367,7 +369,7 @@ X_ip6_mrouter_set(struct socket *so, struct sockopt *sopt) struct mf6cctl mfcc; mifi_t mifi; - if (so != ip6_mrouter && sopt->sopt_name != MRT6_INIT) + if (so != V_ip6_mrouter && sopt->sopt_name != MRT6_INIT) return (EACCES); switch (sopt->sopt_name) { @@ -432,7 +434,7 @@ X_ip6_mrouter_get(struct socket *so, struct sockopt *sopt) INIT_VNET_INET6(curvnet); int error = 0; - if (so != ip6_mrouter) + if (so != V_ip6_mrouter) return (EACCES); switch (sopt->sopt_name) { @@ -560,12 +562,12 @@ ip6_mrouter_init(struct socket *so, int v, int cmd) MROUTER6_LOCK(); - if (ip6_mrouter != NULL) { + if (V_ip6_mrouter != NULL) { MROUTER6_UNLOCK(); return (EADDRINUSE); } - ip6_mrouter = so; + V_ip6_mrouter = so; V_ip6_mrouter_ver = cmd; bzero((caddr_t)mf6ctable, sizeof(mf6ctable)); @@ -601,7 +603,7 @@ X_ip6_mrouter_done(void) MROUTER6_LOCK(); - if (ip6_mrouter == NULL) { + if (V_ip6_mrouter == NULL) { MROUTER6_UNLOCK(); return (EINVAL); } @@ -657,7 +659,7 @@ X_ip6_mrouter_done(void) multicast_register_if6 = NULL; } - ip6_mrouter = NULL; + V_ip6_mrouter = 
NULL; V_ip6_mrouter_ver = 0; MROUTER6_UNLOCK(); @@ -1293,7 +1295,7 @@ X_ip6_mforward(struct ip6_hdr *ip6, struct ifnet *ifp, struct mbuf *m) break; } - if (socket_send(ip6_mrouter, mm, &sin6) < 0) { + if (socket_send(V_ip6_mrouter, mm, &sin6) < 0) { log(LOG_WARNING, "ip6_mforward: ip6_mrouter " "socket queue full\n"); mrt6stat.mrt6s_upq_sockfull++; @@ -1531,7 +1533,7 @@ ip6_mdq(struct mbuf *m, struct ifnet *ifp, struct mf6c *rt) mrt6stat.mrt6s_upcalls++; - if (socket_send(ip6_mrouter, mm, &sin6) < 0) { + if (socket_send(V_ip6_mrouter, mm, &sin6) < 0) { #ifdef MRT6DEBUG if (V_mrt6debug) log(LOG_WARNING, "mdq, ip6_mrouter socket queue full\n"); @@ -1603,10 +1605,11 @@ phyint_send(struct ip6_hdr *ip6, struct mif6 *mifp, struct mbuf *m) struct mbuf *mb_copy; struct ifnet *ifp = mifp->m6_ifp; int error = 0; - struct in6_multi *in6m; struct sockaddr_in6 *dst6; u_long linkmtu; + dst6 = &mifp->m6_route.ro_dst; + /* * Make a new reference to the packet; make sure that * the IPv6 header is actually copied, not just referenced, @@ -1648,17 +1651,16 @@ phyint_send(struct ip6_hdr *ip6, struct mif6 *mifp, struct mbuf *m) } /* - * If we belong to the destination multicast group - * on the outgoing interface, loop back a copy. + * If configured to loop back multicasts by default, + * loop back a copy now. */ - dst6 = &mifp->m6_route.ro_dst; - IN6_LOOKUP_MULTI(ip6->ip6_dst, ifp, in6m); - if (in6m != NULL) { + if (in6_mcast_loop) { dst6->sin6_len = sizeof(struct sockaddr_in6); dst6->sin6_family = AF_INET6; dst6->sin6_addr = ip6->ip6_dst; ip6_mloopback(ifp, m, &mifp->m6_route.ro_dst); } + /* * Put the packet into the sending queue of the outgoing interface * if it would fit in the MTU of the interface. @@ -1759,7 +1761,7 @@ register_send(struct ip6_hdr *ip6, struct mif6 *mif, struct mbuf *m) /* iif info is not given for reg. 
encap.n */ mrt6stat.mrt6s_upcalls++; - if (socket_send(ip6_mrouter, mm, &sin6) < 0) { + if (socket_send(V_ip6_mrouter, mm, &sin6) < 0) { #ifdef MRT6DEBUG if (V_mrt6debug) log(LOG_WARNING, @@ -2056,7 +2058,7 @@ ip6_mroute_modevent(module_t mod, int type, void *unused) break; case MOD_UNLOAD: - if (ip6_mrouter != NULL) + if (V_ip6_mrouter != NULL) return EINVAL; if (pim6_encap_cookie) { diff --git a/sys/netinet6/ip6_output.c b/sys/netinet6/ip6_output.c index 787bc12..61804ec 100644 --- a/sys/netinet6/ip6_output.c +++ b/sys/netinet6/ip6_output.c @@ -110,7 +110,7 @@ __FBSDID("$FreeBSD$"); #include <netinet6/scope6_var.h> #include <netinet6/vinet6.h> -static MALLOC_DEFINE(M_IP6MOPTS, "ip6_moptions", "internet multicast options"); +extern int in6_mcast_loop; struct ip6_exthdrs { struct mbuf *ip6e_ip6; @@ -128,8 +128,6 @@ static int ip6_getpcbopt(struct ip6_pktopts *, int, struct sockopt *); static int ip6_setpktopt __P((int, u_char *, int, struct ip6_pktopts *, struct ucred *, int, int, int)); -static int ip6_setmoptions(int, struct ip6_moptions **, struct mbuf *); -static int ip6_getmoptions(int, struct ip6_moptions *, struct mbuf **); static int ip6_copyexthdr(struct mbuf **, caddr_t, int); static int ip6_insertfraghdr __P((struct mbuf *, struct mbuf *, int, struct ip6_frag **)); @@ -692,12 +690,8 @@ again: if (!IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) { m->m_flags &= ~(M_BCAST | M_MCAST); /* just in case */ } else { - struct in6_multi *in6m; - m->m_flags = (m->m_flags & ~M_BCAST) | M_MCAST; - in6_ifstat_inc(ifp, ifs6_out_mcast); - /* * Confirm that the outgoing interface supports multicast. 
*/ @@ -707,13 +701,14 @@ again: error = ENETUNREACH; goto bad; } - IN6_LOOKUP_MULTI(ip6->ip6_dst, ifp, in6m); - if (in6m != NULL && - (im6o == NULL || im6o->im6o_multicast_loop)) { + if ((im6o == NULL && in6_mcast_loop) || + (im6o && im6o->im6o_multicast_loop)) { /* - * If we belong to the destination multicast group - * on the outgoing interface, and the caller did not - * forbid loopback, loop back a copy. + * Loop back multicast datagram if not expressly + * forbidden to do so, even if we have not joined + * the address; protocols will filter it later, + * thus deferring a hash lookup and lock acquisition + * at the expense of an m_copym(). */ ip6_mloopback(ifp, m, dst); } else { @@ -729,7 +724,7 @@ again: * above, will be forwarded by the ip6_input() routine, * if necessary. */ - if (ip6_mrouter && (flags & IPV6_FORWARDING) == 0) { + if (V_ip6_mrouter && (flags & IPV6_FORWARDING) == 0) { /* * XXX: ip6_mforward expects that rcvif is NULL * when it is called from the originating path. @@ -1702,47 +1697,14 @@ do { \ case IPV6_MULTICAST_LOOP: case IPV6_JOIN_GROUP: case IPV6_LEAVE_GROUP: - { - if (sopt->sopt_valsize > MLEN) { - error = EMSGSIZE; - break; - } - /* XXX */ - } - /* FALLTHROUGH */ - { - struct mbuf *m; - - if (sopt->sopt_valsize > MCLBYTES) { - error = EMSGSIZE; - break; - } - /* XXX */ - MGET(m, sopt->sopt_td ? M_WAIT : M_DONTWAIT, MT_DATA); - if (m == 0) { - error = ENOBUFS; - break; - } - if (sopt->sopt_valsize > MLEN) { - MCLGET(m, sopt->sopt_td ? 
M_WAIT : M_DONTWAIT); - if ((m->m_flags & M_EXT) == 0) { - m_free(m); - error = ENOBUFS; - break; - } - } - m->m_len = sopt->sopt_valsize; - error = sooptcopyin(sopt, mtod(m, char *), - m->m_len, m->m_len); - if (error) { - (void)m_free(m); - break; - } - error = ip6_setmoptions(sopt->sopt_name, - &in6p->in6p_moptions, - m); - (void)m_free(m); - } + case IPV6_MSFILTER: + case MCAST_BLOCK_SOURCE: + case MCAST_UNBLOCK_SOURCE: + case MCAST_JOIN_GROUP: + case MCAST_LEAVE_GROUP: + case MCAST_JOIN_SOURCE_GROUP: + case MCAST_LEAVE_SOURCE_GROUP: + error = ip6_setmoptions(in6p, sopt); break; case IPV6_PORTRANGE: @@ -1974,17 +1936,8 @@ do { \ case IPV6_MULTICAST_IF: case IPV6_MULTICAST_HOPS: case IPV6_MULTICAST_LOOP: - case IPV6_JOIN_GROUP: - case IPV6_LEAVE_GROUP: - { - struct mbuf *m; - error = ip6_getmoptions(sopt->sopt_name, - in6p->in6p_moptions, &m); - if (error == 0) - error = sooptcopyout(sopt, - mtod(m, char *), m->m_len); - m_freem(m); - } + case IPV6_MSFILTER: + error = ip6_getmoptions(in6p, sopt); break; #ifdef IPSEC @@ -2405,374 +2358,6 @@ ip6_freepcbopts(struct ip6_pktopts *pktopt) } /* - * Set the IP6 multicast options in response to user setsockopt(). - */ -static int -ip6_setmoptions(int optname, struct ip6_moptions **im6op, struct mbuf *m) -{ - INIT_VNET_NET(curvnet); - INIT_VNET_INET6(curvnet); - int error = 0; - u_int loop, ifindex; - struct ipv6_mreq *mreq; - struct ifnet *ifp; - struct ip6_moptions *im6o = *im6op; - struct route_in6 ro; - struct in6_multi_mship *imm; - - if (im6o == NULL) { - /* - * No multicast option buffer attached to the pcb; - * allocate one and initialize to default values. 
- */ - im6o = (struct ip6_moptions *) - malloc(sizeof(*im6o), M_IP6MOPTS, M_WAITOK); - - if (im6o == NULL) - return (ENOBUFS); - *im6op = im6o; - im6o->im6o_multicast_ifp = NULL; - im6o->im6o_multicast_hlim = V_ip6_defmcasthlim; - im6o->im6o_multicast_loop = IPV6_DEFAULT_MULTICAST_LOOP; - LIST_INIT(&im6o->im6o_memberships); - } - - switch (optname) { - - case IPV6_MULTICAST_IF: - /* - * Select the interface for outgoing multicast packets. - */ - if (m == NULL || m->m_len != sizeof(u_int)) { - error = EINVAL; - break; - } - bcopy(mtod(m, u_int *), &ifindex, sizeof(ifindex)); - if (ifindex < 0 || V_if_index < ifindex) { - error = ENXIO; /* XXX EINVAL? */ - break; - } - ifp = ifnet_byindex(ifindex); - if (ifp == NULL || (ifp->if_flags & IFF_MULTICAST) == 0) { - error = EADDRNOTAVAIL; - break; - } - im6o->im6o_multicast_ifp = ifp; - break; - - case IPV6_MULTICAST_HOPS: - { - /* - * Set the IP6 hoplimit for outgoing multicast packets. - */ - int optval; - if (m == NULL || m->m_len != sizeof(int)) { - error = EINVAL; - break; - } - bcopy(mtod(m, u_int *), &optval, sizeof(optval)); - if (optval < -1 || optval >= 256) - error = EINVAL; - else if (optval == -1) - im6o->im6o_multicast_hlim = V_ip6_defmcasthlim; - else - im6o->im6o_multicast_hlim = optval; - break; - } - - case IPV6_MULTICAST_LOOP: - /* - * Set the loopback flag for outgoing multicast packets. - * Must be zero or one. - */ - if (m == NULL || m->m_len != sizeof(u_int)) { - error = EINVAL; - break; - } - bcopy(mtod(m, u_int *), &loop, sizeof(loop)); - if (loop > 1) { - error = EINVAL; - break; - } - im6o->im6o_multicast_loop = loop; - break; - - case IPV6_JOIN_GROUP: - /* - * Add a multicast group membership. - * Group must be a valid IP6 multicast address. 
- */ - if (m == NULL || m->m_len != sizeof(struct ipv6_mreq)) { - error = EINVAL; - break; - } - mreq = mtod(m, struct ipv6_mreq *); - - if (IN6_IS_ADDR_UNSPECIFIED(&mreq->ipv6mr_multiaddr)) { - /* - * We use the unspecified address to specify to accept - * all multicast addresses. Only super user is allowed - * to do this. - */ - /* XXX-BZ might need a better PRIV_NETINET_x for this */ - error = priv_check(curthread, PRIV_NETINET_MROUTE); - if (error) - break; - } else if (!IN6_IS_ADDR_MULTICAST(&mreq->ipv6mr_multiaddr)) { - error = EINVAL; - break; - } - - /* - * If no interface was explicitly specified, choose an - * appropriate one according to the given multicast address. - */ - if (mreq->ipv6mr_interface == 0) { - struct sockaddr_in6 *dst; - - /* - * Look up the routing table for the - * address, and choose the outgoing interface. - * XXX: is it a good approach? - */ - ro.ro_rt = NULL; - dst = (struct sockaddr_in6 *)&ro.ro_dst; - bzero(dst, sizeof(*dst)); - dst->sin6_family = AF_INET6; - dst->sin6_len = sizeof(*dst); - dst->sin6_addr = mreq->ipv6mr_multiaddr; - rtalloc((struct route *)&ro); - if (ro.ro_rt == NULL) { - error = EADDRNOTAVAIL; - break; - } - ifp = ro.ro_rt->rt_ifp; - RTFREE(ro.ro_rt); - } else { - /* - * If the interface is specified, validate it. - */ - if (mreq->ipv6mr_interface < 0 || - V_if_index < mreq->ipv6mr_interface) { - error = ENXIO; /* XXX EINVAL? */ - break; - } - ifp = ifnet_byindex(mreq->ipv6mr_interface); - if (!ifp) { - error = ENXIO; /* XXX EINVAL? */ - break; - } - } - - /* - * See if we found an interface, and confirm that it - * supports multicast - */ - if (ifp == NULL || (ifp->if_flags & IFF_MULTICAST) == 0) { - error = EADDRNOTAVAIL; - break; - } - - if (in6_setscope(&mreq->ipv6mr_multiaddr, ifp, NULL)) { - error = EADDRNOTAVAIL; /* XXX: should not happen */ - break; - } - - /* - * See if the membership already exists. 
- */ - for (imm = im6o->im6o_memberships.lh_first; - imm != NULL; imm = imm->i6mm_chain.le_next) - if (imm->i6mm_maddr->in6m_ifp == ifp && - IN6_ARE_ADDR_EQUAL(&imm->i6mm_maddr->in6m_addr, - &mreq->ipv6mr_multiaddr)) - break; - if (imm != NULL) { - error = EADDRINUSE; - break; - } - /* - * Everything looks good; add a new record to the multicast - * address list for the given interface. - */ - imm = in6_joingroup(ifp, &mreq->ipv6mr_multiaddr, &error, 0); - if (imm == NULL) - break; - LIST_INSERT_HEAD(&im6o->im6o_memberships, imm, i6mm_chain); - break; - - case IPV6_LEAVE_GROUP: - /* - * Drop a multicast group membership. - * Group must be a valid IP6 multicast address. - */ - if (m == NULL || m->m_len != sizeof(struct ipv6_mreq)) { - error = EINVAL; - break; - } - mreq = mtod(m, struct ipv6_mreq *); - - /* - * If an interface address was specified, get a pointer - * to its ifnet structure. - */ - if (mreq->ipv6mr_interface < 0 || - V_if_index < mreq->ipv6mr_interface) { - error = ENXIO; /* XXX EINVAL? */ - break; - } - if (mreq->ipv6mr_interface == 0) - ifp = NULL; - else - ifp = ifnet_byindex(mreq->ipv6mr_interface); - - /* Fill in the scope zone ID */ - if (ifp) { - if (in6_setscope(&mreq->ipv6mr_multiaddr, ifp, NULL)) { - /* XXX: should not happen */ - error = EADDRNOTAVAIL; - break; - } - } else if (mreq->ipv6mr_interface != 0) { - /* - * This case happens when the (positive) index is in - * the valid range, but the corresponding interface has - * been detached dynamically (XXX). - */ - error = EADDRNOTAVAIL; - break; - } else { /* ipv6mr_interface == 0 */ - struct sockaddr_in6 sa6_mc; - - /* - * The API spec says as follows: - * If the interface index is specified as 0, the - * system may choose a multicast group membership to - * drop by matching the multicast address only. - * On the other hand, we cannot disambiguate the scope - * zone unless an interface is provided. 
Thus, we - * check if there's ambiguity with the default scope - * zone as the last resort. - */ - bzero(&sa6_mc, sizeof(sa6_mc)); - sa6_mc.sin6_family = AF_INET6; - sa6_mc.sin6_len = sizeof(sa6_mc); - sa6_mc.sin6_addr = mreq->ipv6mr_multiaddr; - error = sa6_embedscope(&sa6_mc, V_ip6_use_defzone); - if (error != 0) - break; - mreq->ipv6mr_multiaddr = sa6_mc.sin6_addr; - } - - /* - * Find the membership in the membership list. - */ - for (imm = im6o->im6o_memberships.lh_first; - imm != NULL; imm = imm->i6mm_chain.le_next) { - if ((ifp == NULL || imm->i6mm_maddr->in6m_ifp == ifp) && - IN6_ARE_ADDR_EQUAL(&imm->i6mm_maddr->in6m_addr, - &mreq->ipv6mr_multiaddr)) - break; - } - if (imm == NULL) { - /* Unable to resolve interface */ - error = EADDRNOTAVAIL; - break; - } - /* - * Give up the multicast address record to which the - * membership points. - */ - LIST_REMOVE(imm, i6mm_chain); - in6_delmulti(imm->i6mm_maddr); - free(imm, M_IP6MADDR); - break; - - default: - error = EOPNOTSUPP; - break; - } - - /* - * If all options have default values, no need to keep the mbuf. - */ - if (im6o->im6o_multicast_ifp == NULL && - im6o->im6o_multicast_hlim == V_ip6_defmcasthlim && - im6o->im6o_multicast_loop == IPV6_DEFAULT_MULTICAST_LOOP && - im6o->im6o_memberships.lh_first == NULL) { - free(*im6op, M_IP6MOPTS); - *im6op = NULL; - } - - return (error); -} - -/* - * Return the IP6 multicast options in response to user getsockopt(). 
- */ -static int -ip6_getmoptions(int optname, struct ip6_moptions *im6o, struct mbuf **mp) -{ - INIT_VNET_INET6(curvnet); - u_int *hlim, *loop, *ifindex; - - *mp = m_get(M_WAIT, MT_HEADER); /* XXX */ - - switch (optname) { - - case IPV6_MULTICAST_IF: - ifindex = mtod(*mp, u_int *); - (*mp)->m_len = sizeof(u_int); - if (im6o == NULL || im6o->im6o_multicast_ifp == NULL) - *ifindex = 0; - else - *ifindex = im6o->im6o_multicast_ifp->if_index; - return (0); - - case IPV6_MULTICAST_HOPS: - hlim = mtod(*mp, u_int *); - (*mp)->m_len = sizeof(u_int); - if (im6o == NULL) - *hlim = V_ip6_defmcasthlim; - else - *hlim = im6o->im6o_multicast_hlim; - return (0); - - case IPV6_MULTICAST_LOOP: - loop = mtod(*mp, u_int *); - (*mp)->m_len = sizeof(u_int); - if (im6o == NULL) - *loop = V_ip6_defmcasthlim; - else - *loop = im6o->im6o_multicast_loop; - return (0); - - default: - return (EOPNOTSUPP); - } -} - -/* - * Discard the IP6 multicast options. - */ -void -ip6_freemoptions(struct ip6_moptions *im6o) -{ - struct in6_multi_mship *imm; - - if (im6o == NULL) - return; - - while ((imm = im6o->im6o_memberships.lh_first) != NULL) { - LIST_REMOVE(imm, i6mm_chain); - if (imm->i6mm_maddr) - in6_delmulti(imm->i6mm_maddr); - free(imm, M_IP6MADDR); - } - free(im6o, M_IP6MOPTS); -} - -/* * Set IPv6 outgoing packet options based on advanced API. */ int diff --git a/sys/netinet6/ip6_var.h b/sys/netinet6/ip6_var.h index 9e8476c..313b6ca 100644 --- a/sys/netinet6/ip6_var.h +++ b/sys/netinet6/ip6_var.h @@ -98,11 +98,19 @@ struct ip6asfrag { #define IP6_REASS_MBUF(ip6af) (*(struct mbuf **)&((ip6af)->ip6af_m)) -struct ip6_moptions { +/* + * Structure attached to inpcb.in6p_moptions and + * passed to ip6_output when IPv6 multicast options are in use. + * This structure is lazy-allocated. 
+ */ +struct ip6_moptions { struct ifnet *im6o_multicast_ifp; /* ifp for outgoing multicasts */ u_char im6o_multicast_hlim; /* hoplimit for outgoing multicasts */ u_char im6o_multicast_loop; /* 1 >= hear sends if a member */ - LIST_HEAD(, in6_multi_mship) im6o_memberships; + u_short im6o_num_memberships; /* no. memberships this socket */ + u_short im6o_max_memberships; /* max memberships this socket */ + struct in6_multi **im6o_membership; /* group memberships */ + struct in6_mfilter *im6o_mfilters; /* source filters */ }; /* @@ -234,6 +242,13 @@ struct ip6stat { }; #ifdef _KERNEL +#define IP6STAT_ADD(name, val) V_ip6stat.name += (val) +#define IP6STAT_SUB(name, val) V_ip6stat.name -= (val) +#define IP6STAT_INC(name) IP6STAT_ADD(name, 1) +#define IP6STAT_DEC(name) IP6STAT_SUB(name, 1) +#endif + +#ifdef _KERNEL /* * IPv6 onion peeling state. * it will be initialized when we come into ip6_input(). @@ -287,10 +302,7 @@ extern int ip6_rr_prune; /* router renumbering prefix * walk list every 5 sec. */ extern int ip6_mcast_pmtu; /* enable pMTU discovery for multicast? */ extern int ip6_v6only; -#endif /* VIMAGE_GLOBALS */ - extern struct socket *ip6_mrouter; /* multicast routing daemon */ -#ifdef VIMAGE_GLOBALS extern int ip6_sendredirects; /* send IP redirects when forwarding? 
*/ extern int ip6_maxfragpackets; /* Maximum packets in reassembly queue */ extern int ip6_maxfrags; /* Maximum fragments in reassembly queue */ @@ -330,7 +342,7 @@ void ip6_init __P((void)); void ip6_input __P((struct mbuf *)); struct in6_ifaddr *ip6_getdstifaddr __P((struct mbuf *)); void ip6_freepcbopts __P((struct ip6_pktopts *)); -void ip6_freemoptions __P((struct ip6_moptions *)); + int ip6_unknown_opt __P((u_int8_t *, struct mbuf *, int)); char * ip6_get_prevhdr __P((struct mbuf *, int)); int ip6_nexthdr __P((struct mbuf *, int, int, int *)); diff --git a/sys/netinet6/mld6.c b/sys/netinet6/mld6.c index 0d05522..17b1df8 100644 --- a/sys/netinet6/mld6.c +++ b/sys/netinet6/mld6.c @@ -1,6 +1,5 @@ /*- - * Copyright (C) 1998 WIDE Project. - * All rights reserved. + * Copyright (c) 2009 Bruce Simpson. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -10,14 +9,14 @@ * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. - * 3. Neither the name of the project nor the names of its contributors - * may be used to endorse or promote products derived from this software - * without specific prior written permission. + * 3. The name of the author may not be used to endorse or promote + * products derived from this software without specific prior written + * permission. * - * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) @@ -75,13 +74,16 @@ __FBSDID("$FreeBSD$"); #include <sys/mbuf.h> #include <sys/socket.h> #include <sys/protosw.h> -#include <sys/syslog.h> +#include <sys/sysctl.h> #include <sys/kernel.h> #include <sys/callout.h> #include <sys/malloc.h> +#include <sys/module.h> #include <sys/vimage.h> #include <net/if.h> +#include <net/route.h> +#include <net/vnet.h> #include <netinet/in.h> #include <netinet/in_var.h> @@ -90,568 +92,3113 @@ __FBSDID("$FreeBSD$"); #include <netinet6/ip6_var.h> #include <netinet6/scope6_var.h> #include <netinet/icmp6.h> +#include <netinet6/mld6.h> #include <netinet6/mld6_var.h> +#include <netinet/vinet.h> #include <netinet6/vinet6.h> +#include <security/mac/mac_framework.h> + +#ifndef KTR_MLD +#define KTR_MLD KTR_INET6 +#endif + +static struct mld_ifinfo * + mli_alloc_locked(struct ifnet *); +static void mli_delete_locked(const struct ifnet *); +static void mld_dispatch_packet(struct mbuf *); +static void mld_dispatch_queue(struct ifqueue *, int); +static void mld_final_leave(struct in6_multi *, struct mld_ifinfo *); +static void mld_fasttimo_vnet(void); +static int mld_handle_state_change(struct in6_multi *, + struct mld_ifinfo *); +static int mld_initial_join(struct in6_multi *, struct mld_ifinfo *, + const int); +#ifdef KTR +static char * mld_rec_type_to_str(const int); +#endif +static void mld_set_version(struct mld_ifinfo *, const int); +static void mld_slowtimo_vnet(void); +static void mld_sysinit(void); +static void mld_sysuninit(void); +static int mld_v1_input_query(struct ifnet *, const struct ip6_hdr *, + const struct mld_hdr *); +static int mld_v1_input_report(struct ifnet *, const struct ip6_hdr *, + const struct mld_hdr *); +static void 
mld_v1_process_group_timer(struct in6_multi *, const int); +static void mld_v1_process_querier_timers(struct mld_ifinfo *); +static int mld_v1_transmit_report(struct in6_multi *, const int); +static void mld_v1_update_group(struct in6_multi *, const int); +static void mld_v2_cancel_link_timers(struct mld_ifinfo *); +static void mld_v2_dispatch_general_query(struct mld_ifinfo *); +static struct mbuf * + mld_v2_encap_report(struct ifnet *, struct mbuf *); +static int mld_v2_enqueue_filter_change(struct ifqueue *, + struct in6_multi *); +static int mld_v2_enqueue_group_record(struct ifqueue *, + struct in6_multi *, const int, const int, const int); +static int mld_v2_input_query(struct ifnet *, const struct ip6_hdr *, + struct mbuf *, const int, const int); +static int mld_v2_merge_state_changes(struct in6_multi *, + struct ifqueue *); +static void mld_v2_process_group_timers(struct mld_ifinfo *, + struct ifqueue *, struct ifqueue *, + struct in6_multi *, const int); +static int mld_v2_process_group_query(struct in6_multi *, + struct mld_ifinfo *mli, int, struct mbuf *, const int); +static int sysctl_mld_gsr(SYSCTL_HANDLER_ARGS); +static int sysctl_mld_ifinfo(SYSCTL_HANDLER_ARGS); + +#ifdef VIMAGE +static vnet_attach_fn vnet_mld_iattach; +static vnet_detach_fn vnet_mld_idetach; +#else +static int vnet_mld_iattach(const void *); +static int vnet_mld_idetach(const void *); +#endif /* VIMAGE */ + /* - * Protocol constants + * Normative references: RFC 2710, RFC 3590, RFC 3810. + * + * Locking: + * * The MLD subsystem lock ends up being system-wide for the moment, + * but could be per-VIMAGE later on. + * * The permitted lock order is: IN6_MULTI_LOCK, MLD_LOCK, IF_ADDR_LOCK. + * Any may be taken independently; if any are held at the same + * time, the above lock order must be followed. + * * IN6_MULTI_LOCK covers in_multi. + * * MLD_LOCK covers per-link state and any global variables in this file. 
+ * * IF_ADDR_LOCK covers if_multiaddrs, which is used for a variety of + * per-link state iterators. + * + * XXX LOR PREVENTION + * A special case for IPv6 is the in6_setscope() routine. ip6_output() + * will not accept an ifp; it wants an embedded scope ID, unlike + * ip_output(), which happily takes the ifp given to it. The embedded + * scope ID is only used by MLD to select the outgoing interface. + * + * During interface attach and detach, MLD will take MLD_LOCK *after* + * the IF_AFDATA_LOCK. + * As in6_setscope() takes IF_AFDATA_LOCK then SCOPE_LOCK, we can't call + * it with MLD_LOCK held without triggering an LOR. A netisr with indirect + * dispatch could work around this, but we'd rather not do that, as it + * can introduce other races. + * + * As such, we exploit the fact that the scope ID is just the interface + * index, and embed it in the IPv6 destination address accordingly. + * This is potentially NOT VALID for MLDv1 reports, as they + * are always sent to the multicast group itself; as MLDv2 + * reports are always sent to ff02::16, this is not an issue + * when MLDv2 is in use. + * + * This does not however eliminate the LOR when ip6_output() itself + * calls in6_setscope() internally whilst MLD_LOCK is held. This will + * trigger a LOR warning in WITNESS when the ifnet is detached. + * + * The right answer is probably to make IF_AFDATA_LOCK an rwlock, given + * how it's used across the network stack. Here we're simply exploiting + * the fact that MLD runs at a similar layer in the stack to scope6.c. + * + * VIMAGE: + * * Each in6_multi corresponds to an ifp, and each ifp corresponds + * to a vnet in ifp->if_vnet. 
*/ +static struct mtx mld_mtx; +MALLOC_DEFINE(M_MLD, "mld", "mld state"); + +#define MLD_EMBEDSCOPE(pin6, zoneid) \ + (pin6)->s6_addr16[1] = htons((zoneid) & 0xFFFF) -/* denotes that the MLD max response delay field specifies time in milliseconds */ -#define MLD_TIMER_SCALE 1000 /* - * time between repetitions of a node's initial report of interest in a - * multicast address(in seconds) + * VIMAGE-wide globals. */ -#define MLD_UNSOLICITED_REPORT_INTERVAL 10 - #ifdef VIMAGE_GLOBALS -static struct ip6_pktopts ip6_opts; -#endif +struct timeval mld_gsrdelay; +LIST_HEAD(, mld_ifinfo) mli_head; +int interface_timers_running6; +int state_change_timers_running6; +int current_state_timers_running6; +#endif /* VIMAGE_GLOBALS */ -static void mld6_sendpkt(struct in6_multi *, int, const struct in6_addr *); -static void mld_starttimer(struct in6_multi *); -static void mld_stoptimer(struct in6_multi *); -static void mld_timeo(struct in6_multi *); -static u_long mld_timerresid(struct in6_multi *); +SYSCTL_DECL(_net_inet6); /* Note: Not in any common header. */ -void -mld6_init(void) +SYSCTL_NODE(_net_inet6, OID_AUTO, mld, CTLFLAG_RW, 0, + "IPv6 Multicast Listener Discovery"); + +/* + * Virtualized sysctls. + */ +SYSCTL_V_PROC(V_NET, vnet_inet6, _net_inet6_mld, OID_AUTO, gsrdelay, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, mld_gsrdelay.tv_sec, 0, + sysctl_mld_gsr, "I", + "Rate limit for MLDv2 Group-and-Source queries in seconds"); + +/* + * Non-virtualized sysctls. + */ +SYSCTL_NODE(_net_inet6_mld, OID_AUTO, ifinfo, CTLFLAG_RD | CTLFLAG_MPSAFE, + sysctl_mld_ifinfo, "Per-interface MLDv2 state"); + +/* + * Packed Router Alert option structure declaration. + */ +struct mld_raopt { + struct ip6_hbh hbh; + struct ip6_opt pad; + struct ip6_opt_router ra; +} __packed; + +/* + * Router Alert hop-by-hop option header. 
+ */ +static struct mld_raopt mld_ra = { + .hbh = { 0, 0 }, + .pad = { .ip6o_type = IP6OPT_PADN, 0 }, + .ra = { + .ip6or_type = IP6OPT_ROUTER_ALERT, + .ip6or_len = IP6OPT_RTALERT_LEN - 2, + .ip6or_value[0] = ((IP6OPT_RTALERT_MLD >> 8) & 0xFF), + .ip6or_value[1] = (IP6OPT_RTALERT_MLD & 0xFF) + } +}; +static struct ip6_pktopts mld_po; + +static __inline void +mld_save_context(struct mbuf *m, struct ifnet *ifp) +{ + +#ifdef VIMAGE + m->m_pkthdr.header = ifp->if_vnet; +#endif /* VIMAGE */ + m->m_pkthdr.flowid = ifp->if_index; +} + +static __inline void +mld_scrub_context(struct mbuf *m) +{ + + m->m_pkthdr.header = NULL; + m->m_pkthdr.flowid = 0; +} + +/* + * Restore context from a queued output chain. + * Return saved ifindex. + * + * VIMAGE: The assertion is there to make sure that we + * actually called CURVNET_SET() with what's in the mbuf chain. + */ +static __inline uint32_t +mld_restore_context(struct mbuf *m) +{ + +#ifdef notyet +#if defined(VIMAGE) && defined(INVARIANTS) + KASSERT(curvnet == (m->m_pkthdr.header), + ("%s: called when curvnet was not restored", __func__)); +#endif +#endif + return (m->m_pkthdr.flowid); +} + +/* + * Retrieve or set threshold between group-source queries in seconds. + * + * VIMAGE: Assume curvnet set by caller. + * SMPng: NOTE: Serialized by MLD lock. + */ +static int +sysctl_mld_gsr(SYSCTL_HANDLER_ARGS) { INIT_VNET_INET6(curvnet); - static u_int8_t hbh_buf[8]; - struct ip6_hbh *hbh = (struct ip6_hbh *)hbh_buf; - u_int16_t rtalert_code = htons((u_int16_t)IP6OPT_RTALERT_MLD); + int error; + int i; + + error = sysctl_wire_old_buffer(req, sizeof(int)); + if (error) + return (error); - /* ip6h_nxt will be fill in later */ - hbh->ip6h_len = 0; /* (8 >> 3) - 1 */ + MLD_LOCK(); - /* XXX: grotty hard coding... 
*/ - hbh_buf[2] = IP6OPT_PADN; /* 2 byte padding */ - hbh_buf[3] = 0; - hbh_buf[4] = IP6OPT_ROUTER_ALERT; - hbh_buf[5] = IP6OPT_RTALERT_LEN - 2; - bcopy((caddr_t)&rtalert_code, &hbh_buf[6], sizeof(u_int16_t)); + i = V_mld_gsrdelay.tv_sec; - ip6_initpktopts(&V_ip6_opts); - V_ip6_opts.ip6po_hbh = hbh; + error = sysctl_handle_int(oidp, &i, 0, req); + if (error || !req->newptr) + goto out_locked; + + if (i < -1 || i >= 60) { + error = EINVAL; + goto out_locked; + } + + CTR2(KTR_MLD, "change mld_gsrdelay from %d to %d", + V_mld_gsrdelay.tv_sec, i); + V_mld_gsrdelay.tv_sec = i; + +out_locked: + MLD_UNLOCK(); + return (error); } -static void -mld_starttimer(struct in6_multi *in6m) +/* + * Expose struct mld_ifinfo to userland, keyed by ifindex. + * For use by ifmcstat(8). + * + * SMPng: NOTE: Does an unlocked ifindex space read. + * VIMAGE: Assume curvnet set by caller. The node handler itself + * is not directly virtualized. + */ +static int +sysctl_mld_ifinfo(SYSCTL_HANDLER_ARGS) { - struct timeval now; + INIT_VNET_NET(curvnet); + INIT_VNET_INET6(curvnet); + int *name; + int error; + u_int namelen; + struct ifnet *ifp; + struct mld_ifinfo *mli; + + name = (int *)arg1; + namelen = arg2; + + if (req->newptr != NULL) + return (EPERM); - microtime(&now); - in6m->in6m_timer_expire.tv_sec = now.tv_sec + in6m->in6m_timer / hz; - in6m->in6m_timer_expire.tv_usec = now.tv_usec + - (in6m->in6m_timer % hz) * (1000000 / hz); - if (in6m->in6m_timer_expire.tv_usec > 1000000) { - in6m->in6m_timer_expire.tv_sec++; - in6m->in6m_timer_expire.tv_usec -= 1000000; + if (namelen != 1) + return (EINVAL); + + error = sysctl_wire_old_buffer(req, sizeof(struct mld_ifinfo)); + if (error) + return (error); + + IN6_MULTI_LOCK(); + MLD_LOCK(); + + if (name[0] <= 0 || name[0] > V_if_index) { + error = ENOENT; + goto out_locked; } - /* start or restart the timer */ - callout_reset(in6m->in6m_timer_ch, in6m->in6m_timer, - (void (*)(void *))mld_timeo, in6m); + error = ENOENT; + + ifp = 
ifnet_byindex(name[0]); + if (ifp == NULL) + goto out_locked; + + LIST_FOREACH(mli, &V_mli_head, mli_link) { + if (ifp == mli->mli_ifp) { + error = SYSCTL_OUT(req, mli, + sizeof(struct mld_ifinfo)); + break; + } + } + +out_locked: + MLD_UNLOCK(); + IN6_MULTI_UNLOCK(); + return (error); } +/* + * Dispatch an entire queue of pending packet chains. + * VIMAGE: Assumes the vnet pointer has been set. + */ static void -mld_stoptimer(struct in6_multi *in6m) +mld_dispatch_queue(struct ifqueue *ifq, int limit) { - if (in6m->in6m_timer == IN6M_TIMER_UNDEF) - return; + struct mbuf *m; - callout_stop(in6m->in6m_timer_ch); - in6m->in6m_timer = IN6M_TIMER_UNDEF; + for (;;) { + _IF_DEQUEUE(ifq, m); + if (m == NULL) + break; + CTR3(KTR_MLD, "%s: dispatch %p from %p", __func__, ifq, m); + mld_dispatch_packet(m); + if (--limit == 0) + break; + } } -static void -mld_timeo(struct in6_multi *in6m) +/* + * Filter outgoing MLD report state by group. + * + * Reports are ALWAYS suppressed for ALL-HOSTS (ff02::1) + * and node-local addresses. However, kernel and socket consumers + * always embed the KAME scope ID in the address provided, so strip it + * when performing comparison. + * Note: This is not the same as the *multicast* scope. + * + * Return zero if the given group is one for which MLD reports + * should be suppressed, or non-zero if reports should be issued. 
+ */ +static __inline int +mld_is_addr_reported(const struct in6_addr *addr) { - int s = splnet(); + INIT_VNET_INET6(curvnet); - in6m->in6m_timer = IN6M_TIMER_UNDEF; + KASSERT(IN6_IS_ADDR_MULTICAST(addr), ("%s: not multicast", __func__)); - callout_stop(in6m->in6m_timer_ch); + if (IPV6_ADDR_MC_SCOPE(addr) == IPV6_ADDR_SCOPE_NODELOCAL) + return (0); - switch (in6m->in6m_state) { - case MLD_REPORTPENDING: - mld6_start_listening(in6m); - break; - default: - mld6_sendpkt(in6m, MLD_LISTENER_REPORT, NULL); - break; + if (IPV6_ADDR_MC_SCOPE(addr) == IPV6_ADDR_SCOPE_LINKLOCAL) { + struct in6_addr tmp = *addr; + in6_clearscope(&tmp); + if (IN6_ARE_ADDR_EQUAL(&tmp, &in6addr_linklocal_allnodes)) + return (0); } - splx(s); + return (1); } -static u_long -mld_timerresid(struct in6_multi *in6m) +/* + * Attach MLD when PF_INET6 is attached to an interface. + * + * SMPng: Normally called with IF_AFDATA_LOCK held. + * VIMAGE: Currently we set the vnet pointer, although it is + * likely that it was already set by our caller. + */ +struct mld_ifinfo * +mld_domifattach(struct ifnet *ifp) { - struct timeval now, diff; + struct mld_ifinfo *mli; - microtime(&now); + CTR3(KTR_MLD, "%s: called for ifp %p(%s)", + __func__, ifp, ifp->if_xname); - if (now.tv_sec > in6m->in6m_timer_expire.tv_sec || - (now.tv_sec == in6m->in6m_timer_expire.tv_sec && - now.tv_usec > in6m->in6m_timer_expire.tv_usec)) { - return (0); - } - diff = in6m->in6m_timer_expire; - diff.tv_sec -= now.tv_sec; - diff.tv_usec -= now.tv_usec; - if (diff.tv_usec < 0) { - diff.tv_sec--; - diff.tv_usec += 1000000; - } + CURVNET_SET(ifp->if_vnet); + MLD_LOCK(); + + mli = mli_alloc_locked(ifp); + if (!(ifp->if_flags & IFF_MULTICAST)) + mli->mli_flags |= MLIF_SILENT; + + MLD_UNLOCK(); + CURVNET_RESTORE(); - /* return the remaining time in milliseconds */ - return (diff.tv_sec * 1000 + diff.tv_usec / 1000); + return (mli); } -void -mld6_start_listening(struct in6_multi *in6m) +/* + * VIMAGE: assume curvnet set by caller. 
+ */ +static struct mld_ifinfo * +mli_alloc_locked(/*const*/ struct ifnet *ifp) { - struct in6_addr all_in6; - int s = splnet(); + INIT_VNET_INET6(ifp->if_vnet); + struct mld_ifinfo *mli; + + MLD_LOCK_ASSERT(); + + mli = malloc(sizeof(struct mld_ifinfo), M_MLD, M_NOWAIT|M_ZERO); + if (mli == NULL) + goto out; + + mli->mli_ifp = ifp; + mli->mli_version = MLD_VERSION_2; + mli->mli_flags = 0; + mli->mli_rv = MLD_RV_INIT; + mli->mli_qi = MLD_QI_INIT; + mli->mli_qri = MLD_QRI_INIT; + mli->mli_uri = MLD_URI_INIT; + + SLIST_INIT(&mli->mli_relinmhead); /* - * RFC2710 page 10: - * The node never sends a Report or Done for the link-scope all-nodes - * address. - * MLD messages are never sent for multicast addresses whose scope is 0 - * (reserved) or 1 (node-local). + * Responses to general queries are subject to bounds. */ - all_in6 = in6addr_linklocal_allnodes; - if (in6_setscope(&all_in6, in6m->in6m_ifp, NULL)) { - /* XXX: this should not happen! */ - in6m->in6m_timer = 0; - in6m->in6m_state = MLD_OTHERLISTENER; - } - if (IN6_ARE_ADDR_EQUAL(&in6m->in6m_addr, &all_in6) || - IPV6_ADDR_MC_SCOPE(&in6m->in6m_addr) < - IPV6_ADDR_SCOPE_LINKLOCAL) { - in6m->in6m_timer = 0; - in6m->in6m_state = MLD_OTHERLISTENER; - } else { - mld6_sendpkt(in6m, MLD_LISTENER_REPORT, NULL); - in6m->in6m_timer = arc4random() % - MLD_UNSOLICITED_REPORT_INTERVAL * hz; - in6m->in6m_state = MLD_IREPORTEDLAST; + IFQ_SET_MAXLEN(&mli->mli_gq, MLD_MAX_RESPONSE_PACKETS); + + LIST_INSERT_HEAD(&V_mli_head, mli, mli_link); + + CTR2(KTR_MLD, "allocate mld_ifinfo for ifp %p(%s)", + ifp, ifp->if_xname); + +out: + return (mli); +} + +/* + * Hook for ifdetach. + * + * NOTE: Some finalization tasks need to run before the protocol domain + * is detached, but also before the link layer does its cleanup. + * Run before link-layer cleanup; cleanup groups, but do not free MLD state. + * + * SMPng: Caller must hold IN6_MULTI_LOCK(). + * Must take IF_ADDR_LOCK() to cover if_multiaddrs iterator. 
+ * XXX This routine is also bitten by unlocked ifma_protospec access. + * + * VIMAGE: curvnet should have been set by caller, but let's not assume + * that for now. + */ +void +mld_ifdetach(struct ifnet *ifp) +{ + struct mld_ifinfo *mli; + struct ifmultiaddr *ifma; + struct in6_multi *inm, *tinm; + + CTR3(KTR_MLD, "%s: called for ifp %p(%s)", __func__, ifp, + ifp->if_xname); + + CURVNET_SET(ifp->if_vnet); - mld_starttimer(in6m); + IN6_MULTI_LOCK_ASSERT(); + MLD_LOCK(); + + mli = MLD_IFINFO(ifp); + if (mli->mli_version == MLD_VERSION_2) { + IF_ADDR_LOCK(ifp); + TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { + if (ifma->ifma_addr->sa_family != AF_INET6 || + ifma->ifma_protospec == NULL) + continue; + inm = (struct in6_multi *)ifma->ifma_protospec; + if (inm->in6m_state == MLD_LEAVING_MEMBER) { + SLIST_INSERT_HEAD(&mli->mli_relinmhead, + inm, in6m_nrele); + } + in6m_clear_recorded(inm); + } + IF_ADDR_UNLOCK(ifp); + SLIST_FOREACH_SAFE(inm, &mli->mli_relinmhead, in6m_nrele, + tinm) { + SLIST_REMOVE_HEAD(&mli->mli_relinmhead, in6m_nrele); + in6m_release_locked(inm); + } } - splx(s); + + MLD_UNLOCK(); + CURVNET_RESTORE(); } +/* + * Hook for domifdetach. + * Runs after link-layer cleanup; free MLD state. + * + * SMPng: Normally called with IF_AFDATA_LOCK held. + * VIMAGE: curvnet should have been set by caller, but let's not assume + * that for now. + */ void -mld6_stop_listening(struct in6_multi *in6m) +mld_domifdetach(struct ifnet *ifp) { - struct in6_addr allnode, allrouter; - allnode = in6addr_linklocal_allnodes; - if (in6_setscope(&allnode, in6m->in6m_ifp, NULL)) { - /* XXX: this should not happen! 
*/ - return; + CTR3(KTR_MLD, "%s: called for ifp %p(%s)", + __func__, ifp, ifp->if_xname); + + CURVNET_SET(ifp->if_vnet); + + MLD_LOCK(); + mli_delete_locked(ifp); + MLD_UNLOCK(); + + CURVNET_RESTORE(); +} + +static void +mli_delete_locked(const struct ifnet *ifp) +{ + INIT_VNET_INET6(ifp->if_vnet); + struct mld_ifinfo *mli, *tmli; + + CTR3(KTR_MLD, "%s: freeing mld_ifinfo for ifp %p(%s)", + __func__, ifp, ifp->if_xname); + + MLD_LOCK_ASSERT(); + + LIST_FOREACH_SAFE(mli, &V_mli_head, mli_link, tmli) { + if (mli->mli_ifp == ifp) { + /* + * Free deferred General Query responses. + */ + _IF_DRAIN(&mli->mli_gq); + + LIST_REMOVE(mli, mli_link); + + KASSERT(SLIST_EMPTY(&mli->mli_relinmhead), + ("%s: there are dangling in_multi references", + __func__)); + + free(mli, M_MLD); + return; + } } - allrouter = in6addr_linklocal_allrouters; - if (in6_setscope(&allrouter, in6m->in6m_ifp, NULL)) { - /* XXX impossible */ - return; +#ifdef INVARIANTS + panic("%s: mld_ifinfo not found for ifp %p\n", __func__, ifp); +#endif +} + +/* + * Process a received MLDv1 general or address-specific query. + * Assumes that the query header has been pulled up to sizeof(mld_hdr). + */ +static int +mld_v1_input_query(struct ifnet *ifp, const struct ip6_hdr *ip6, + const struct mld_hdr *mld) +{ + INIT_VNET_INET6(ifp->if_vnet); + struct ifmultiaddr *ifma; + struct mld_ifinfo *mli; + struct in6_multi *inm; + uint16_t timer; +#ifdef KTR + char ip6tbuf[INET6_ADDRSTRLEN]; +#endif + + IN6_MULTI_LOCK(); + IF_ADDR_LOCK(ifp); + MLD_LOCK(); + + mli = MLD_IFINFO(ifp); + KASSERT(mli != NULL, ("%s: no mld_ifinfo for ifp %p", __func__, ifp)); + + /* + * Switch to MLDv1 host compatibility mode. + */ + mld_set_version(mli, MLD_VERSION_1); + + timer = ntohs(mld->mld_maxdelay) * PR_FASTHZ / MLD_TIMER_SCALE; + if (timer == 0) + timer = 1; + + if (!IN6_IS_ADDR_UNSPECIFIED(&mld->mld_addr)) { + /* + * MLDv1 Group-Specific Query. 
+ * If this is a group-specific MLDv1 query, we need only + * look up the single group to process it. + */ + inm = in6m_lookup_locked(ifp, &mld->mld_addr); + if (inm != NULL) { + CTR3(KTR_MLD, "process v1 query %s on ifp %p(%s)", + ip6_sprintf(ip6tbuf, &mld->mld_addr), + ifp, ifp->if_xname); + mld_v1_update_group(inm, timer); + } + } else { + /* + * MLDv1 General Query. + * If this was not sent to the all-nodes group, ignore it. + * + * XXX Do we need to check for a scope ID in the destination + * address on input and strip it? + */ + if (IN6_ARE_ADDR_EQUAL(&ip6->ip6_dst, + &in6addr_linklocal_allnodes)) { + /* + * For each reporting group joined on this + * interface, kick the report timer. + */ + CTR2(KTR_MLD, + "process v1 general query on ifp %p(%s)", + ifp, ifp->if_xname); + + TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { + if (ifma->ifma_addr->sa_family != AF_INET6 || + ifma->ifma_protospec == NULL) + continue; + inm = (struct in6_multi *)ifma->ifma_protospec; + mld_v1_update_group(inm, timer); + } + } } - if (in6m->in6m_state == MLD_IREPORTEDLAST && - !IN6_ARE_ADDR_EQUAL(&in6m->in6m_addr, &allnode) && - IPV6_ADDR_MC_SCOPE(&in6m->in6m_addr) > - IPV6_ADDR_SCOPE_INTFACELOCAL) { - mld6_sendpkt(in6m, MLD_LISTENER_DONE, &allrouter); + + MLD_UNLOCK(); + IF_ADDR_UNLOCK(ifp); + IN6_MULTI_UNLOCK(); + + return (0); +} + +/* + * Update the report timer on a group in response to an MLDv1 query. + * + * If we are becoming the reporting member for this group, start the timer. + * If we already are the reporting member for this group, and timer is + * below the threshold, reset it. + * + * We may be updating the group for the first time since we switched + * to MLDv2. If we are, then we must clear any recorded source lists, + * and transition to REPORTING state; the group timer is overloaded + * for group and group-source query responses. + * + * Unlike MLDv2, the delay per group should be jittered + * to avoid bursts of MLDv1 reports. 
+ */ +static void +mld_v1_update_group(struct in6_multi *inm, const int timer) +{ + INIT_VNET_INET6(curvnet); +#ifdef KTR + char ip6tbuf[INET6_ADDRSTRLEN]; +#endif + + CTR4(KTR_MLD, "%s: %s/%s timer=%d", __func__, + ip6_sprintf(ip6tbuf, &inm->in6m_addr), + inm->in6m_ifp->if_xname, timer); + + IN6_MULTI_LOCK_ASSERT(); + + switch (inm->in6m_state) { + case MLD_NOT_MEMBER: + case MLD_SILENT_MEMBER: + break; + case MLD_REPORTING_MEMBER: + if (inm->in6m_timer != 0 && + inm->in6m_timer <= timer) { + CTR1(KTR_MLD, "%s: REPORTING and timer running, " + "skipping.", __func__); + break; + } + /* FALLTHROUGH */ + case MLD_SG_QUERY_PENDING_MEMBER: + case MLD_G_QUERY_PENDING_MEMBER: + case MLD_IDLE_MEMBER: + case MLD_LAZY_MEMBER: + case MLD_AWAKENING_MEMBER: + CTR1(KTR_MLD, "%s: ->REPORTING", __func__); + inm->in6m_state = MLD_REPORTING_MEMBER; + inm->in6m_timer = MLD_RANDOM_DELAY(timer); + V_current_state_timers_running6 = 1; + break; + case MLD_SLEEPING_MEMBER: + CTR1(KTR_MLD, "%s: ->AWAKENING", __func__); + inm->in6m_state = MLD_AWAKENING_MEMBER; + break; + case MLD_LEAVING_MEMBER: + break; } } -void -mld6_input(struct mbuf *m, int off) +/* + * Process a received MLDv2 general, group-specific or + * group-and-source-specific query. + * + * Assumes that the query header has been pulled up to sizeof(mldv2_query). + * + * Return 0 if successful, otherwise an appropriate error code is returned. 
+ */ +static int +mld_v2_input_query(struct ifnet *ifp, const struct ip6_hdr *ip6, + struct mbuf *m, const int off, const int icmp6len) { INIT_VNET_INET6(curvnet); - struct ip6_hdr *ip6 = mtod(m, struct ip6_hdr *); - struct mld_hdr *mldh; - struct ifnet *ifp = m->m_pkthdr.rcvif; - struct in6_multi *in6m; - struct in6_addr mld_addr, all_in6; - struct in6_ifaddr *ia; - struct ifmultiaddr *ifma; - u_long timer; /* timer value in the MLD query header */ - -#ifndef PULLDOWN_TEST - IP6_EXTHDR_CHECK(m, off, sizeof(*mldh),); - mldh = (struct mld_hdr *)(mtod(m, caddr_t) + off); -#else - IP6_EXTHDR_GET(mldh, struct mld_hdr *, m, off, sizeof(*mldh)); - if (mldh == NULL) { - ICMP6STAT_INC(icp6s_tooshort); - return; + struct mld_ifinfo *mli; + struct mldv2_query *mld; + struct in6_multi *inm; + uint32_t maxdelay, nsrc, qqi; + uint16_t timer; + uint8_t qrv; + + CTR2(KTR_MLD, "process v2 query on ifp %p(%s)", ifp, ifp->if_xname); + + mld = (struct mldv2_query *)(mtod(m, uint8_t *) + off); + + maxdelay = ntohs(mld->mld_maxdelay); /* in 1/10ths of a second */ + if (maxdelay >= 32678) { + maxdelay = (MLD_MRC_MANT(mld->mld_maxdelay) | 0x1000) << + (MLD_MRC_EXP(mld->mld_maxdelay) + 3); } -#endif - /* source address validation */ - ip6 = mtod(m, struct ip6_hdr *); /* in case mpullup */ - if (!IN6_IS_ADDR_LINKLOCAL(&ip6->ip6_src)) { - char ip6bufs[INET6_ADDRSTRLEN], ip6bufg[INET6_ADDRSTRLEN]; - log(LOG_ERR, - "mld6_input: src %s is not link-local (grp=%s)\n", - ip6_sprintf(ip6bufs, &ip6->ip6_src), - ip6_sprintf(ip6bufg, &mldh->mld_addr)); + qrv = MLD_QRV(mld->mld_misc); + if (qrv < 2) { + CTR3(KTR_MLD, "%s: clamping qrv %d to %d", __func__, + qrv, MLD_RV_INIT); + qrv = MLD_RV_INIT; + } + + qqi = mld->mld_qqi; + if (qqi >= 128) { + qqi = MLD_QQIC_MANT(mld->mld_qqi) << + (MLD_QQIC_EXP(mld->mld_qqi) + 3); + } + + timer = maxdelay * PR_FASTHZ / MLD_TIMER_SCALE; + if (timer == 0) + timer = 1; + + nsrc = ntohs(mld->mld_numsrc); + if (nsrc > MLD_MAX_GS_SOURCES) + return (EMSGSIZE); + if 
(icmp6len < sizeof(struct mldv2_query) + + (nsrc * sizeof(struct in6_addr))) + return (EMSGSIZE); + + IN6_MULTI_LOCK(); + IF_ADDR_LOCK(ifp); + MLD_LOCK(); + + mli = MLD_IFINFO(ifp); + KASSERT(mli != NULL, ("%s: no mld_ifinfo for ifp %p", __func__, ifp)); + + mld_set_version(mli, MLD_VERSION_2); + + mli->mli_rv = qrv; + mli->mli_qi = qqi; + mli->mli_qri = maxdelay; + + CTR4(KTR_MLD, "%s: qrv %d qi %d qri %d", __func__, qrv, qqi, + maxdelay); + + if (IN6_IS_ADDR_UNSPECIFIED(&mld->mld_addr)) { /* - * spec (RFC2710) does not explicitly - * specify to discard the packet from a non link-local - * source address. But we believe it's expected to do so. - * XXX: do we have to allow :: as source? + * MLDv2 General Query. + * Schedule a current-state report on this ifp for + * all groups, possibly containing source lists. */ - m_freem(m); - return; + if (!IN6_ARE_ADDR_EQUAL(&in6addr_linklocal_allnodes, + &ip6->ip6_dst) || nsrc > 0) { + /* + * General Queries SHOULD be directed to ff02::1. + * A general query with a source list has undefined + * behaviour; discard it. + */ + goto out_locked; + } + + CTR2(KTR_MLD, "process v2 general query on ifp %p(%s)", + ifp, ifp->if_xname); + + /* + * If there is a pending General Query response + * scheduled earlier than the selected delay, do + * not schedule any other reports. + * Otherwise, reset the interface timer. + */ + if (mli->mli_v1_timer == 0 || mli->mli_v2_timer >= timer) { + mli->mli_v1_timer = MLD_RANDOM_DELAY(timer); + V_interface_timers_running6 = 1; + } + } else { + /* + * MLDv2 Group-specific or Group-and-source-specific Query. + * + * Group-source-specific queries are throttled on + * a per-group basis to defeat denial-of-service attempts. + * Queries for groups we are not a member of on this + * link are simply ignored. 
+ */ + inm = in6m_lookup_locked(ifp, &mld->mld_addr); + if (inm == NULL) + goto out_locked; + if (nsrc > 0) { + if (!ratecheck(&inm->in6m_lastgsrtv, + &V_mld_gsrdelay)) { + CTR1(KTR_MLD, "%s: GS query throttled.", + __func__); + goto out_locked; + } + } + CTR2(KTR_MLD, "process v2 group query on ifp %p(%s)", + ifp, ifp->if_xname); + /* + * If there is a pending General Query response + * scheduled sooner than the selected delay, no + * further report need be scheduled. + * Otherwise, prepare to respond to the + * group-specific or group-and-source query. + */ + if (mli->mli_v1_timer == 0 || mli->mli_v2_timer >= timer) + mld_v2_process_group_query(inm, mli, timer, m, off); + } + +out_locked: + MLD_UNLOCK(); + IF_ADDR_UNLOCK(ifp); + IN6_MULTI_UNLOCK(); + + return (0); +} + +/* + * Process a received MLDv2 group-specific or group-and-source-specific + * query. + * Return <0 if any error occurred. Currently this is ignored. + */ +static int +mld_v2_process_group_query(struct in6_multi *inm, struct mld_ifinfo *mli, + int timer, struct mbuf *m0, const int off) +{ + INIT_VNET_INET6(curvnet); + struct mldv2_query *mld; + int retval; + uint16_t nsrc; + + IN6_MULTI_LOCK_ASSERT(); + MLD_LOCK_ASSERT(); + + retval = 0; + mld = (struct mldv2_query *)(mtod(m0, uint8_t *) + off); + + switch (inm->in6m_state) { + case MLD_NOT_MEMBER: + case MLD_SILENT_MEMBER: + case MLD_SLEEPING_MEMBER: + case MLD_LAZY_MEMBER: + case MLD_AWAKENING_MEMBER: + case MLD_IDLE_MEMBER: + case MLD_LEAVING_MEMBER: + return (retval); + break; + case MLD_REPORTING_MEMBER: + case MLD_G_QUERY_PENDING_MEMBER: + case MLD_SG_QUERY_PENDING_MEMBER: + break; } + nsrc = ntohs(mld->mld_numsrc); + /* - * make a copy for local work (in6_setscope() may modify the 1st arg) + * Deal with group-specific queries upfront. + * If any group query is already pending, purge any recorded + * source-list state if it exists, and schedule a query response + * for this group-specific query. 
*/ - mld_addr = mldh->mld_addr; - if (in6_setscope(&mld_addr, ifp, NULL)) { - /* XXX: this should not happen! */ - m_free(m); - return; + if (nsrc == 0) { + if (inm->in6m_state == MLD_G_QUERY_PENDING_MEMBER || + inm->in6m_state == MLD_SG_QUERY_PENDING_MEMBER) { + in6m_clear_recorded(inm); + timer = min(inm->in6m_timer, timer); + } + inm->in6m_state = MLD_G_QUERY_PENDING_MEMBER; + inm->in6m_timer = MLD_RANDOM_DELAY(timer); + V_current_state_timers_running6 = 1; + return (retval); } /* - * In the MLD6 specification, there are 3 states and a flag. - * - * In Non-Listener state, we simply don't have a membership record. - * In Delaying Listener state, our timer is running (in6m->in6m_timer) - * In Idle Listener state, our timer is not running - * (in6m->in6m_timer==IN6M_TIMER_UNDEF) - * - * The flag is in6m->in6m_state, it is set to MLD_OTHERLISTENER if - * we have heard a report from another member, or MLD_IREPORTEDLAST - * if we sent the last report. + * Deal with the case where a group-and-source-specific query has + * been received but a group-specific query is already pending. */ - switch(mldh->mld_type) { - case MLD_LISTENER_QUERY: - if (ifp->if_flags & IFF_LOOPBACK) - break; - - if (!IN6_IS_ADDR_UNSPECIFIED(&mld_addr) && - !IN6_IS_ADDR_MULTICAST(&mld_addr)) - break; /* print error or log stat? */ + if (inm->in6m_state == MLD_G_QUERY_PENDING_MEMBER) { + timer = min(inm->in6m_timer, timer); + inm->in6m_timer = MLD_RANDOM_DELAY(timer); + V_current_state_timers_running6 = 1; + return (retval); + } - all_in6 = in6addr_linklocal_allnodes; - if (in6_setscope(&all_in6, ifp, NULL)) { - /* XXX: this should not happen! */ - break; + /* + * Finally, deal with the case where a group-and-source-specific + * query has been received, where a response to a previous g-s-r + * query exists, or none exists. + * In this case, we need to parse the source-list which the Querier + * has provided us with and check if we have any source list filter + * entries at T1 for these sources. 
If we do not, there is no need to + * schedule a report and the query may be dropped. + * If we do, we must record them and schedule a current-state + * report for those sources. + */ + if (inm->in6m_nsrc > 0) { + struct mbuf *m; + uint8_t *sp; + int i, nrecorded; + int soff; + + m = m0; + soff = off + sizeof(struct mldv2_query); + nrecorded = 0; + for (i = 0; i < nsrc; i++) { + sp = mtod(m, uint8_t *) + soff; + retval = in6m_record_source(inm, + (const struct in6_addr *)sp); + if (retval < 0) + break; + nrecorded += retval; + soff += sizeof(struct in6_addr); + if (soff >= m->m_len) { + soff = soff - m->m_len; + m = m->m_next; + if (m == NULL) + break; + } + } + if (nrecorded > 0) { + CTR1(KTR_MLD, + "%s: schedule response to SG query", __func__); + inm->in6m_state = MLD_SG_QUERY_PENDING_MEMBER; + inm->in6m_timer = MLD_RANDOM_DELAY(timer); + V_current_state_timers_running6 = 1; } + } + + return (retval); +} + +/* + * Process a received MLDv1 host membership report. + * Assumes mld points to mld_hdr in pulled up mbuf chain. + */ +static int +mld_v1_input_report(struct ifnet *ifp, const struct ip6_hdr *ip6, + const struct mld_hdr *mld) +{ + INIT_VNET_INET6(curvnet); + struct in6_ifaddr *ia; + struct in6_multi *inm; +#ifdef KTR + char ip6tbuf[INET6_ADDRSTRLEN]; +#endif + + if (ifp->if_flags & IFF_LOOPBACK) + return (0); + + if (!IN6_IS_ADDR_MULTICAST(&mld->mld_addr) || + !IN6_ARE_ADDR_EQUAL(&mld->mld_addr, &ip6->ip6_dst)) + return (EINVAL); + + /* + * Make sure we don't hear our own membership report, as fast + * leave requires knowing that we are the only member of a + * group. Assume we used the link-local address if available, + * otherwise look for ::. 
+ */ + ia = in6ifa_ifpforlinklocal(ifp, IN6_IFF_NOTREADY|IN6_IFF_ANYCAST); + if ((ia && IN6_ARE_ADDR_EQUAL(&ip6->ip6_src, IA6_IN6(ia))) || + (ia == NULL && IN6_IS_ADDR_UNSPECIFIED(&ip6->ip6_src))) + return (0); + + CTR3(KTR_MLD, "process v1 report %s on ifp %p(%s)", + ip6_sprintf(ip6tbuf, &mld->mld_addr), ifp, ifp->if_xname); + + IN6_MULTI_LOCK(); + IF_ADDR_LOCK(ifp); + + /* + * MLDv1 report suppression. + * If we are a member of this group, and our membership should be + * reported, and our group timer is pending or about to be reset, + * stop our group timer by transitioning to the 'lazy' state. + */ + inm = in6m_lookup_locked(ifp, &mld->mld_addr); + if (inm != NULL) { + struct mld_ifinfo *mli; + + mli = inm->in6m_mli; + KASSERT(mli != NULL, + ("%s: no mli for ifp %p", __func__, ifp)); /* - * - Start the timers in all of our membership records - * that the query applies to for the interface on - * which the query arrived excl. those that belong - * to the "all-nodes" group (ff02::1). - * - Restart any timer that is already running but has - * A value longer than the requested timeout. - * - Use the value specified in the query message as - * the maximum timeout. + * If we are in MLDv2 host mode, do not allow the + * other host's MLDv1 report to suppress our reports. 
*/ - timer = ntohs(mldh->mld_maxdelay); + if (mli->mli_version == MLD_VERSION_2) + goto out_locked; - IF_ADDR_LOCK(ifp); - IFP_TO_IA6(ifp, ia); - if (ia == NULL) { - IF_ADDR_UNLOCK(ifp); + inm->in6m_timer = 0; + + switch (inm->in6m_state) { + case MLD_NOT_MEMBER: + case MLD_SILENT_MEMBER: + case MLD_SLEEPING_MEMBER: + break; + case MLD_REPORTING_MEMBER: + case MLD_IDLE_MEMBER: + case MLD_AWAKENING_MEMBER: + CTR3(KTR_MLD, + "report suppressed for %s on ifp %p(%s)", + ip6_sprintf(ip6tbuf, &mld->mld_addr), + ifp, ifp->if_xname); + case MLD_LAZY_MEMBER: + inm->in6m_state = MLD_LAZY_MEMBER; + break; + case MLD_G_QUERY_PENDING_MEMBER: + case MLD_SG_QUERY_PENDING_MEMBER: + case MLD_LEAVING_MEMBER: break; } + } - /* - * XXX: System timer resolution is too low to handle Max - * Response Delay, so set 1 to the internal timer even if - * the calculated value equals to zero when Max Response - * Delay is positive. - */ - timer = ntohs(mldh->mld_maxdelay) * PR_FASTHZ / MLD_TIMER_SCALE; - if (timer == 0 && mldh->mld_maxdelay) - timer = 1; +out_locked: + IF_ADDR_UNLOCK(ifp); + IN6_MULTI_UNLOCK(); - TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { - if (ifma->ifma_addr->sa_family != AF_INET6) - continue; - in6m = (struct in6_multi *)ifma->ifma_protospec; + return (0); +} - if (IN6_ARE_ADDR_EQUAL(&in6m->in6m_addr, &all_in6) || - IPV6_ADDR_MC_SCOPE(&in6m->in6m_addr) < - IPV6_ADDR_SCOPE_LINKLOCAL) - continue; +/* + * MLD input path. + * + * Assume query messages which fit in a single ICMPv6 message header + * have been pulled up. + * Assume that userland will want to see the message, even if it + * otherwise fails kernel input validation; do not free it. + * Pullup may however free the mbuf chain m if it fails. + * + * Return IPPROTO_DONE if we freed m. Otherwise, return 0. 
+ */ +int +mld_input(struct mbuf *m, int off, int icmp6len) +{ + struct ifnet *ifp; + struct ip6_hdr *ip6; + struct mld_hdr *mld; + int mldlen; - if (IN6_IS_ADDR_UNSPECIFIED(&mld_addr) || - IN6_ARE_ADDR_EQUAL(&mld_addr, &in6m->in6m_addr)) { - if (timer == 0) { - /* send a report immediately */ - mld_stoptimer(in6m); - mld6_sendpkt(in6m, MLD_LISTENER_REPORT, - NULL); - in6m->in6m_timer = 0; /* reset timer */ - in6m->in6m_state = MLD_IREPORTEDLAST; - } - else if (in6m->in6m_timer == IN6M_TIMER_UNDEF || - mld_timerresid(in6m) > timer) { - in6m->in6m_timer = - 1 + (arc4random() % timer) * hz / 1000; - mld_starttimer(in6m); - } + CTR3(KTR_MLD, "%s: called w/mbuf (%p,%d)", __func__, m, off); + + ifp = m->m_pkthdr.rcvif; + INIT_VNET_INET6(ifp->if_vnet); + + ip6 = mtod(m, struct ip6_hdr *); + + /* Pullup to appropriate size. */ + mld = (struct mld_hdr *)(mtod(m, uint8_t *) + off); + if (mld->mld_type == MLD_LISTENER_QUERY && + icmp6len >= sizeof(struct mldv2_query)) { + mldlen = sizeof(struct mldv2_query); + } else { + mldlen = sizeof(struct mld_hdr); + } + IP6_EXTHDR_GET(mld, struct mld_hdr *, m, off, mldlen); + if (mld == NULL) { + ICMP6STAT_INC(icp6s_badlen); + return (IPPROTO_DONE); + } + + switch (mld->mld_type) { + case MLD_LISTENER_QUERY: + icmp6_ifstat_inc(ifp, ifs6_in_mldquery); + if (icmp6len == sizeof(struct mld_hdr)) { + if (mld_v1_input_query(ifp, ip6, mld) != 0) + return (0); + } else if (icmp6len >= sizeof(struct mldv2_query)) { + if (mld_v2_input_query(ifp, ip6, m, off, + icmp6len) != 0) + return (0); + } + break; + case MLD_LISTENER_REPORT: + icmp6_ifstat_inc(ifp, ifs6_in_mldreport); + if (mld_v1_input_report(ifp, ip6, mld) != 0) + return (0); /* Userland needs to see it. */ + break; + case MLDV2_LISTENER_REPORT: + icmp6_ifstat_inc(ifp, ifs6_in_mldreport); + break; + case MLD_LISTENER_DONE: + icmp6_ifstat_inc(ifp, ifs6_in_mlddone); + break; + default: + break; + } + + return (0); +} + +/* + * Fast timeout handler (global). 
+ * VIMAGE: Timeout handlers are expected to service all vimages. + */ +void +mld_fasttimo(void) +{ + VNET_ITERATOR_DECL(vnet_iter); + + VNET_LIST_RLOCK(); + VNET_FOREACH(vnet_iter) { + CURVNET_SET(vnet_iter); + mld_fasttimo_vnet(); + CURVNET_RESTORE(); + } + VNET_LIST_RUNLOCK(); +} + +/* + * Fast timeout handler (per-vnet). + * + * VIMAGE: Assume caller has set up our curvnet. + */ +static void +mld_fasttimo_vnet(void) +{ + INIT_VNET_INET6(curvnet); + struct ifqueue scq; /* State-change packets */ + struct ifqueue qrq; /* Query response packets */ + struct ifnet *ifp; + struct mld_ifinfo *mli; + struct ifmultiaddr *ifma, *tifma; + struct in6_multi *inm; + int uri_fasthz; + + uri_fasthz = 0; + + /* + * Quick check to see if any work needs to be done, in order to + * minimize the overhead of fasttimo processing. + * SMPng: XXX Unlocked reads. + */ + if (!V_current_state_timers_running6 && + !V_interface_timers_running6 && + !V_state_change_timers_running6) + return; + + IN6_MULTI_LOCK(); + MLD_LOCK(); + + /* + * MLDv2 General Query response timer processing. + */ + if (V_interface_timers_running6) { + CTR1(KTR_MLD, "%s: interface timers running", __func__); + + V_interface_timers_running6 = 0; + LIST_FOREACH(mli, &V_mli_head, mli_link) { + if (mli->mli_v2_timer == 0) { + /* Do nothing. */ + } else if (--mli->mli_v2_timer == 0) { + mld_v2_dispatch_general_query(mli); + } else { + V_interface_timers_running6 = 1; + } + } + } + + if (!V_current_state_timers_running6 && + !V_state_change_timers_running6) + goto out_locked; + + V_current_state_timers_running6 = 0; + V_state_change_timers_running6 = 0; + + CTR1(KTR_MLD, "%s: state change timers running", __func__); + + /* + * MLD host report and state-change timer processing. + * Note: Processing a v2 group timer may remove a node. 
+ */ + LIST_FOREACH(mli, &V_mli_head, mli_link) { + ifp = mli->mli_ifp; + + if (mli->mli_version == MLD_VERSION_2) { + uri_fasthz = MLD_RANDOM_DELAY(mli->mli_uri * + PR_FASTHZ); + + memset(&qrq, 0, sizeof(struct ifqueue)); + IFQ_SET_MAXLEN(&qrq, MLD_MAX_G_GS_PACKETS); + + memset(&scq, 0, sizeof(struct ifqueue)); + IFQ_SET_MAXLEN(&scq, MLD_MAX_STATE_CHANGE_PACKETS); + } + + IF_ADDR_LOCK(ifp); + TAILQ_FOREACH_SAFE(ifma, &ifp->if_multiaddrs, ifma_link, + tifma) { + if (ifma->ifma_addr->sa_family != AF_INET6 || + ifma->ifma_protospec == NULL) + continue; + inm = (struct in6_multi *)ifma->ifma_protospec; + switch (mli->mli_version) { + case MLD_VERSION_1: + mld_v1_process_group_timer(inm, + mli->mli_version); + break; + case MLD_VERSION_2: + mld_v2_process_group_timers(mli, &qrq, + &scq, inm, uri_fasthz); + break; } } IF_ADDR_UNLOCK(ifp); + + if (mli->mli_version == MLD_VERSION_2) { + struct in6_multi *tinm; + + mld_dispatch_queue(&qrq, 0); + mld_dispatch_queue(&scq, 0); + + /* + * Free the in_multi reference(s) for + * this lifecycle. + */ + SLIST_FOREACH_SAFE(inm, &mli->mli_relinmhead, + in6m_nrele, tinm) { + SLIST_REMOVE_HEAD(&mli->mli_relinmhead, + in6m_nrele); + in6m_release_locked(inm); + } + } + } + +out_locked: + MLD_UNLOCK(); + IN6_MULTI_UNLOCK(); +} + +/* + * Update host report group timer. + * Will update the global pending timer flags. 
+ */ +static void +mld_v1_process_group_timer(struct in6_multi *inm, const int version) +{ + INIT_VNET_INET6(curvnet); + int report_timer_expired; + + IN6_MULTI_LOCK_ASSERT(); + MLD_LOCK_ASSERT(); + + if (inm->in6m_timer == 0) { + report_timer_expired = 0; + } else if (--inm->in6m_timer == 0) { + report_timer_expired = 1; + } else { + V_current_state_timers_running6 = 1; + return; + } + + switch (inm->in6m_state) { + case MLD_NOT_MEMBER: + case MLD_SILENT_MEMBER: + case MLD_IDLE_MEMBER: + case MLD_LAZY_MEMBER: + case MLD_SLEEPING_MEMBER: + case MLD_AWAKENING_MEMBER: + break; + case MLD_REPORTING_MEMBER: + if (report_timer_expired) { + inm->in6m_state = MLD_IDLE_MEMBER; + (void)mld_v1_transmit_report(inm, + MLD_LISTENER_REPORT); + } + break; + case MLD_G_QUERY_PENDING_MEMBER: + case MLD_SG_QUERY_PENDING_MEMBER: + case MLD_LEAVING_MEMBER: break; + } +} - case MLD_LISTENER_REPORT: +/* + * Update a group's timers for MLDv2. + * Will update the global pending timer flags. + * Note: Unlocked read from mli. + */ +static void +mld_v2_process_group_timers(struct mld_ifinfo *mli, + struct ifqueue *qrq, struct ifqueue *scq, + struct in6_multi *inm, const int uri_fasthz) +{ + INIT_VNET_INET6(curvnet); + int query_response_timer_expired; + int state_change_retransmit_timer_expired; +#ifdef KTR + char ip6tbuf[INET6_ADDRSTRLEN]; +#endif + + IN6_MULTI_LOCK_ASSERT(); + MLD_LOCK_ASSERT(); + + query_response_timer_expired = 0; + state_change_retransmit_timer_expired = 0; + + /* + * During a transition from compatibility mode back to MLDv2, + * a group record in REPORTING state may still have its group + * timer active. This is a no-op in this function; it is easier + * to deal with it here than to complicate the slow-timeout path. 
+ */ + if (inm->in6m_timer == 0) { + query_response_timer_expired = 0; + } else if (--inm->in6m_timer == 0) { + query_response_timer_expired = 1; + } else { + V_current_state_timers_running6 = 1; + } + + if (inm->in6m_sctimer == 0) { + state_change_retransmit_timer_expired = 0; + } else if (--inm->in6m_sctimer == 0) { + state_change_retransmit_timer_expired = 1; + } else { + V_state_change_timers_running6 = 1; + } + + /* We are in fasttimo, so be quick about it. */ + if (!state_change_retransmit_timer_expired && + !query_response_timer_expired) + return; + + switch (inm->in6m_state) { + case MLD_NOT_MEMBER: + case MLD_SILENT_MEMBER: + case MLD_SLEEPING_MEMBER: + case MLD_LAZY_MEMBER: + case MLD_AWAKENING_MEMBER: + case MLD_IDLE_MEMBER: + break; + case MLD_G_QUERY_PENDING_MEMBER: + case MLD_SG_QUERY_PENDING_MEMBER: /* - * For fast leave to work, we have to know that we are the - * last person to send a report for this group. Reports - * can potentially get looped back if we are a multicast - * router, so discard reports sourced by me. - * Note that it is impossible to check IFF_LOOPBACK flag of - * ifp for this purpose, since ip6_mloopback pass the physical - * interface to looutput. + * Respond to a previously pending Group-Specific + * or Group-and-Source-Specific query by enqueueing + * the appropriate Current-State report for + * immediate transmission. */ - if (m->m_flags & M_LOOP) /* XXX: grotty flag, but efficient */ - break; + if (query_response_timer_expired) { + int retval; + + retval = mld_v2_enqueue_group_record(qrq, inm, 0, 1, + (inm->in6m_state == MLD_SG_QUERY_PENDING_MEMBER)); + CTR2(KTR_MLD, "%s: enqueue record = %d", + __func__, retval); + inm->in6m_state = MLD_REPORTING_MEMBER; + in6m_clear_recorded(inm); + } + /* FALLTHROUGH */ + case MLD_REPORTING_MEMBER: + case MLD_LEAVING_MEMBER: + if (state_change_retransmit_timer_expired) { + /* + * State-change retransmission timer fired. 
+ * If there are any further pending retransmissions, + * set the global pending state-change flag, and + * reset the timer. + */ + if (--inm->in6m_scrv > 0) { + inm->in6m_sctimer = uri_fasthz; + V_state_change_timers_running6 = 1; + } + /* + * Retransmit the previously computed state-change + * report. If there are no further pending + * retransmissions, the mbuf queue will be consumed. + * Update T0 state to T1 as we have now sent + * a state-change. + */ + (void)mld_v2_merge_state_changes(inm, scq); + + in6m_commit(inm); + CTR3(KTR_MLD, "%s: T1 -> T0 for %s/%s", __func__, + ip6_sprintf(ip6tbuf, &inm->in6m_addr), + inm->in6m_ifp->if_xname); + + /* + * If we are leaving the group for good, make sure + * we release MLD's reference to it. + * This release must be deferred using a SLIST, + * as we are called from a loop which traverses + * the in_ifmultiaddr TAILQ. + */ + if (inm->in6m_state == MLD_LEAVING_MEMBER && + inm->in6m_scrv == 0) { + inm->in6m_state = MLD_NOT_MEMBER; + SLIST_INSERT_HEAD(&mli->mli_relinmhead, + inm, in6m_nrele); + } + } + break; + } +} - if (!IN6_IS_ADDR_MULTICAST(&mld_addr)) - break; +/* + * Switch to a different version on the given interface, + * as per Section 9.12. + */ +static void +mld_set_version(struct mld_ifinfo *mli, const int version) +{ + MLD_LOCK_ASSERT(); + + CTR4(KTR_MLD, "%s: switching to v%d on ifp %p(%s)", __func__, + version, mli->mli_ifp, mli->mli_ifp->if_xname); + + if (version == MLD_VERSION_1) { + int old_version_timer; /* - * If we belong to the group being reported, stop - * our timer for that group. + * Compute the "Older Version Querier Present" timer as per + * Section 9.12. 
*/ - IN6_LOOKUP_MULTI(mld_addr, ifp, in6m); - if (in6m) { - in6m->in6m_timer = 0; /* transit to idle state */ - in6m->in6m_state = MLD_OTHERLISTENER; /* clear flag */ + old_version_timer = mli->mli_rv * mli->mli_qi + mli->mli_qri; + old_version_timer *= PR_SLOWHZ; + + if (version == MLD_VERSION_1) { + mli->mli_v1_timer = old_version_timer; } - break; - default: /* this is impossible */ - log(LOG_ERR, "mld6_input: illegal type(%d)", mldh->mld_type); - break; } - m_freem(m); + if (mli->mli_v1_timer > 0) { + if (mli->mli_version != MLD_VERSION_1) { + mli->mli_version = MLD_VERSION_1; + mld_v2_cancel_link_timers(mli); + } + } } +/* + * Cancel pending MLDv2 timers for the given link and all groups + * joined on it; state-change, general-query, and group-query timers. + */ static void -mld6_sendpkt(struct in6_multi *in6m, int type, const struct in6_addr *dst) +mld_v2_cancel_link_timers(struct mld_ifinfo *mli) { INIT_VNET_INET6(curvnet); - struct mbuf *mh, *md; - struct mld_hdr *mldh; - struct ip6_hdr *ip6; - struct ip6_moptions im6o; - struct in6_ifaddr *ia; - struct ifnet *ifp = in6m->in6m_ifp; - struct ifnet *outif = NULL; + struct ifmultiaddr *ifma; + struct ifnet *ifp; + struct in6_multi *inm; + + CTR3(KTR_MLD, "%s: cancel v2 timers on ifp %p(%s)", __func__, + mli->mli_ifp, mli->mli_ifp->if_xname); + + IN6_MULTI_LOCK_ASSERT(); + MLD_LOCK_ASSERT(); /* - * At first, find a link local address on the outgoing interface - * to use as the source address of the MLD packet. + * Fast-track this potentially expensive operation + * by checking all the global 'timer pending' flags. */ - if ((ia = in6ifa_ifpforlinklocal(ifp, IN6_IFF_NOTREADY|IN6_IFF_ANYCAST)) - == NULL) + if (!V_interface_timers_running6 && + !V_state_change_timers_running6 && + !V_current_state_timers_running6) return; - /* - * Allocate mbufs to store ip6 header and MLD header. - * We allocate 2 mbufs and make chain in advance because - * it is more convenient when inserting the hop-by-hop option later. 
- */ + mli->mli_v2_timer = 0; + + ifp = mli->mli_ifp; + + IF_ADDR_LOCK(ifp); + TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { + if (ifma->ifma_addr->sa_family != AF_INET6) + continue; + inm = (struct in6_multi *)ifma->ifma_protospec; + switch (inm->in6m_state) { + case MLD_NOT_MEMBER: + case MLD_SILENT_MEMBER: + case MLD_IDLE_MEMBER: + case MLD_LAZY_MEMBER: + case MLD_SLEEPING_MEMBER: + case MLD_AWAKENING_MEMBER: + break; + case MLD_LEAVING_MEMBER: + /* + * If we are leaving the group and switching + * version, we need to release the final + * reference held for issuing the INCLUDE {}. + * + * SMPNG: Must drop and re-acquire IF_ADDR_LOCK + * around in6m_release_locked(), as it is not + * a recursive mutex. + */ + IF_ADDR_UNLOCK(ifp); + in6m_release_locked(inm); + IF_ADDR_LOCK(ifp); + /* FALLTHROUGH */ + case MLD_G_QUERY_PENDING_MEMBER: + case MLD_SG_QUERY_PENDING_MEMBER: + in6m_clear_recorded(inm); + /* FALLTHROUGH */ + case MLD_REPORTING_MEMBER: + inm->in6m_sctimer = 0; + inm->in6m_timer = 0; + inm->in6m_state = MLD_REPORTING_MEMBER; + /* + * Free any pending MLDv2 state-change records. + */ + _IF_DRAIN(&inm->in6m_scq); + break; + } + } + IF_ADDR_UNLOCK(ifp); +} + +/* + * Global slowtimo handler. + * VIMAGE: Timeout handlers are expected to service all vimages. + */ +void +mld_slowtimo(void) +{ + VNET_ITERATOR_DECL(vnet_iter); + + VNET_LIST_RLOCK(); + VNET_FOREACH(vnet_iter) { + CURVNET_SET(vnet_iter); + mld_slowtimo_vnet(); + CURVNET_RESTORE(); + } + VNET_LIST_RUNLOCK(); +} + +/* + * Per-vnet slowtimo handler. + */ +static void +mld_slowtimo_vnet(void) +{ + INIT_VNET_INET6(curvnet); + struct mld_ifinfo *mli; + + MLD_LOCK(); + + LIST_FOREACH(mli, &V_mli_head, mli_link) { + mld_v1_process_querier_timers(mli); + } + + MLD_UNLOCK(); +} + +/* + * Update the Older Version Querier Present timers for a link. + * See Section 9.12 of RFC 3810. 
+ */ +static void +mld_v1_process_querier_timers(struct mld_ifinfo *mli) +{ + INIT_VNET_INET6(curvnet); + + MLD_LOCK_ASSERT(); + + if (mli->mli_v1_timer == 0) { + /* + * MLDv1 Querier Present timers expired; revert to MLDv2. + */ + if (mli->mli_version != MLD_VERSION_2) { + CTR5(KTR_MLD, + "%s: transition from v%d -> v%d on %p(%s)", + __func__, mli->mli_version, MLD_VERSION_2, + mli->mli_ifp, mli->mli_ifp->if_xname); + mli->mli_version = MLD_VERSION_2; + } + } +} + +/* + * Transmit an MLDv1 report immediately. + */ +static int +mld_v1_transmit_report(struct in6_multi *in6m, const int type) +{ + struct ifnet *ifp; + struct in6_ifaddr *ia; + struct ip6_hdr *ip6; + struct mbuf *mh, *md; + struct mld_hdr *mld; + + IN6_MULTI_LOCK_ASSERT(); + MLD_LOCK_ASSERT(); + + ifp = in6m->in6m_ifp; + ia = in6ifa_ifpforlinklocal(ifp, IN6_IFF_NOTREADY|IN6_IFF_ANYCAST); + /* ia may be NULL if link-local address is tentative. */ + MGETHDR(mh, M_DONTWAIT, MT_HEADER); if (mh == NULL) - return; + return (ENOMEM); MGET(md, M_DONTWAIT, MT_DATA); if (md == NULL) { m_free(mh); - return; + return (ENOMEM); } mh->m_next = md; - mh->m_pkthdr.rcvif = NULL; + /* + * FUTURE: Consider increasing alignment by ETHER_HDR_LEN, so + * that ether_output() does not need to allocate another mbuf + * for the header in the most common case. + */ + MH_ALIGN(mh, sizeof(struct ip6_hdr)); mh->m_pkthdr.len = sizeof(struct ip6_hdr) + sizeof(struct mld_hdr); mh->m_len = sizeof(struct ip6_hdr); - MH_ALIGN(mh, sizeof(struct ip6_hdr)); - /* fill in the ip6 header */ ip6 = mtod(mh, struct ip6_hdr *); ip6->ip6_flow = 0; ip6->ip6_vfc &= ~IPV6_VERSION_MASK; ip6->ip6_vfc |= IPV6_VERSION; - /* ip6_plen will be set later */ ip6->ip6_nxt = IPPROTO_ICMPV6; - /* ip6_hlim will be set by im6o.im6o_multicast_hlim */ - ip6->ip6_src = ia->ia_addr.sin6_addr; - ip6->ip6_dst = dst ? *dst : in6m->in6m_addr; + ip6->ip6_src = ia ? 
ia->ia_addr.sin6_addr : in6addr_any; + ip6->ip6_dst = in6m->in6m_addr; - /* fill in the MLD header */ md->m_len = sizeof(struct mld_hdr); - mldh = mtod(md, struct mld_hdr *); - mldh->mld_type = type; - mldh->mld_code = 0; - mldh->mld_cksum = 0; - /* XXX: we assume the function will not be called for query messages */ - mldh->mld_maxdelay = 0; - mldh->mld_reserved = 0; - mldh->mld_addr = in6m->in6m_addr; - in6_clearscope(&mldh->mld_addr); /* XXX */ - mldh->mld_cksum = in6_cksum(mh, IPPROTO_ICMPV6, sizeof(struct ip6_hdr), - sizeof(struct mld_hdr)); - - /* construct multicast option */ - bzero(&im6o, sizeof(im6o)); - im6o.im6o_multicast_ifp = ifp; - im6o.im6o_multicast_hlim = 1; + mld = mtod(md, struct mld_hdr *); + mld->mld_type = type; + mld->mld_code = 0; + mld->mld_cksum = 0; + mld->mld_maxdelay = 0; + mld->mld_reserved = 0; + mld->mld_addr = in6m->in6m_addr; + in6_clearscope(&mld->mld_addr); + mld->mld_cksum = in6_cksum(mh, IPPROTO_ICMPV6, + sizeof(struct ip6_hdr), sizeof(struct mld_hdr)); + + mld_save_context(mh, ifp); + mh->m_flags |= M_MLDV1; + + mld_dispatch_packet(mh); + + return (0); +} + +/* + * Process a state change from the upper layer for the given IPv6 group. + * + * Each socket holds a reference on the in_multi in its own ip_moptions. + * The socket layer will have made the necessary updates to the group + * state, it is now up to MLD to issue a state change report if there + * has been any change between T0 (when the last state-change was issued) + * and T1 (now). + * + * We use the MLDv2 state machine at group level. The MLD module + * however makes the decision as to which MLD protocol version to speak. + * A state change *from* INCLUDE {} always means an initial join. + * A state change *to* INCLUDE {} always means a final leave. 
+ * + * If delay is non-zero, and the state change is an initial multicast + * join, the state change report will be delayed by 'delay' ticks + * in units of PR_FASTHZ if MLDv1 is active on the link; otherwise + * the initial MLDv2 state change report will be delayed by whichever + * is sooner, a pending state-change timer or delay itself. + * + * VIMAGE: curvnet should have been set by caller, as this routine + * is called from the socket option handlers. + */ +int +mld_change_state(struct in6_multi *inm, const int delay) +{ + struct mld_ifinfo *mli; + struct ifnet *ifp; + int error; + + IN6_MULTI_LOCK_ASSERT(); + + error = 0; /* - * Request loopback of the report if we are acting as a multicast - * router, so that the process-level routing daemon can hear it. + * Try to detect if the upper layer just asked us to change state + * for an interface which has now gone away. */ - im6o.im6o_multicast_loop = (ip6_mrouter != NULL); + KASSERT(inm->in6m_ifma != NULL, ("%s: no ifma", __func__)); + ifp = inm->in6m_ifma->ifma_ifp; + if (ifp != NULL) { + /* + * Sanity check that netinet6's notion of ifp is the + * same as net's. + */ + KASSERT(inm->in6m_ifp == ifp, ("%s: bad ifp", __func__)); + } - /* increment output statictics */ - ICMP6STAT_INC(icp6s_outhist[type]); + MLD_LOCK(); - ip6_output(mh, &V_ip6_opts, NULL, 0, &im6o, &outif, NULL); - if (outif) { - icmp6_ifstat_inc(outif, ifs6_out_msg); - switch (type) { - case MLD_LISTENER_QUERY: - icmp6_ifstat_inc(outif, ifs6_out_mldquery); + mli = MLD_IFINFO(ifp); + KASSERT(mli != NULL, ("%s: no mld_ifinfo for ifp %p", __func__, ifp)); + + /* + * If we detect a state transition to or from MCAST_UNDEFINED + * for this group, then we are starting or finishing an MLD + * life cycle for this group. 
+ */ + if (inm->in6m_st[1].iss_fmode != inm->in6m_st[0].iss_fmode) { + CTR3(KTR_MLD, "%s: inm transition %d -> %d", __func__, + inm->in6m_st[0].iss_fmode, inm->in6m_st[1].iss_fmode); + if (inm->in6m_st[0].iss_fmode == MCAST_UNDEFINED) { + CTR1(KTR_MLD, "%s: initial join", __func__); + error = mld_initial_join(inm, mli, delay); + goto out_locked; + } else if (inm->in6m_st[1].iss_fmode == MCAST_UNDEFINED) { + CTR1(KTR_MLD, "%s: final leave", __func__); + mld_final_leave(inm, mli); + goto out_locked; + } + } else { + CTR1(KTR_MLD, "%s: filter set change", __func__); + } + + error = mld_handle_state_change(inm, mli); + +out_locked: + MLD_UNLOCK(); + return (error); +} + +/* + * Perform the initial join for an MLD group. + * + * When joining a group: + * If the group should have its MLD traffic suppressed, do nothing. + * MLDv1 starts sending MLDv1 host membership reports. + * MLDv2 will schedule an MLDv2 state-change report containing the + * initial state of the membership. + * + * If the delay argument is non-zero, then we must delay sending the + * initial state change for delay ticks (in units of PR_FASTHZ). + */ +static int +mld_initial_join(struct in6_multi *inm, struct mld_ifinfo *mli, + const int delay) +{ + INIT_VNET_INET6(curvnet); + struct ifnet *ifp; + struct ifqueue *ifq; + int error, retval, syncstates; + int odelay; +#ifdef KTR + char ip6tbuf[INET6_ADDRSTRLEN]; +#endif + + CTR4(KTR_MLD, "%s: initial join %s on ifp %p(%s)", + __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), + inm->in6m_ifp, inm->in6m_ifp->if_xname); + + error = 0; + syncstates = 1; + + ifp = inm->in6m_ifp; + + IN6_MULTI_LOCK_ASSERT(); + MLD_LOCK_ASSERT(); + + KASSERT(mli && mli->mli_ifp == ifp, ("%s: inconsistent ifp", __func__)); + + /* + * Groups joined on loopback or marked as 'not reported', + * enter the MLD_SILENT_MEMBER state and + * are never reported in any protocol exchanges. + * All other groups enter the appropriate state machine + * for the version in use on this link. 
+ * A link marked as MLIF_SILENT causes MLD to be completely + * disabled for the link. + */ + if ((ifp->if_flags & IFF_LOOPBACK) || + (mli->mli_flags & MLIF_SILENT) || + !mld_is_addr_reported(&inm->in6m_addr)) { + CTR1(KTR_MLD, +"%s: not kicking state machine for silent group", __func__); + inm->in6m_state = MLD_SILENT_MEMBER; + inm->in6m_timer = 0; + } else { + /* + * Deal with overlapping in_multi lifecycle. + * If this group was LEAVING, then make sure + * we drop the reference we picked up to keep the + * group around for the final INCLUDE {} enqueue. + */ + if (mli->mli_version == MLD_VERSION_2 && + inm->in6m_state == MLD_LEAVING_MEMBER) + in6m_release_locked(inm); + + inm->in6m_state = MLD_REPORTING_MEMBER; + + switch (mli->mli_version) { + case MLD_VERSION_1: + /* + * If a delay was provided, only use it if + * it is greater than the delay normally + * used for an MLDv1 state change report, + * and delay sending the initial MLDv1 report + * by not transitioning to the IDLE state. + */ + odelay = MLD_RANDOM_DELAY(MLD_V1_MAX_RI * PR_FASTHZ); + if (delay) { + inm->in6m_timer = max(delay, odelay); + V_current_state_timers_running6 = 1; + } else { + inm->in6m_state = MLD_IDLE_MEMBER; + error = mld_v1_transmit_report(inm, + MLD_LISTENER_REPORT); + if (error == 0) { + inm->in6m_timer = odelay; + V_current_state_timers_running6 = 1; + } + } break; - case MLD_LISTENER_REPORT: - icmp6_ifstat_inc(outif, ifs6_out_mldreport); + + case MLD_VERSION_2: + /* + * Defer update of T0 to T1, until the first copy + * of the state change has been transmitted. + */ + syncstates = 0; + + /* + * Immediately enqueue a State-Change Report for + * this interface, freeing any previous reports. + * Don't kick the timers if there is nothing to do, + * or if an error occurred. 
+ */ + ifq = &inm->in6m_scq; + _IF_DRAIN(ifq); + retval = mld_v2_enqueue_group_record(ifq, inm, 1, + 0, 0); + CTR2(KTR_MLD, "%s: enqueue record = %d", + __func__, retval); + if (retval <= 0) { + error = retval * -1; + break; + } + + /* + * Schedule transmission of pending state-change + * report up to RV times for this link. The timer + * will fire at the next mld_fasttimo (~200ms), + * giving us an opportunity to merge the reports. + * + * If a delay was provided to this function, only + * use this delay if sooner than the existing one. + */ + KASSERT(mli->mli_rv > 1, + ("%s: invalid robustness %d", __func__, + mli->mli_rv)); + inm->in6m_scrv = mli->mli_rv; + if (delay) { + if (inm->in6m_sctimer > 1) { + inm->in6m_sctimer = + min(inm->in6m_sctimer, delay); + } else + inm->in6m_sctimer = delay; + } else + inm->in6m_sctimer = 1; + V_state_change_timers_running6 = 1; + + error = 0; break; - case MLD_LISTENER_DONE: - icmp6_ifstat_inc(outif, ifs6_out_mlddone); + } + } + + /* + * Only update the T0 state if state change is atomic, + * i.e. we don't need to wait for a timer to fire before we + * can consider the state change to have been communicated. + */ + if (syncstates) { + in6m_commit(inm); + CTR3(KTR_MLD, "%s: T1 -> T0 for %s/%s", __func__, + ip6_sprintf(ip6tbuf, &inm->in6m_addr), + inm->in6m_ifp->if_xname); + } + + return (error); +} + +/* + * Issue an intermediate state change during the life-cycle. 
+ */ +static int +mld_handle_state_change(struct in6_multi *inm, struct mld_ifinfo *mli) +{ + INIT_VNET_INET6(curvnet); + struct ifnet *ifp; + int retval; +#ifdef KTR + char ip6tbuf[INET6_ADDRSTRLEN]; +#endif + + CTR4(KTR_MLD, "%s: state change for %s on ifp %p(%s)", + __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), + inm->in6m_ifp, inm->in6m_ifp->if_xname); + + ifp = inm->in6m_ifp; + + IN6_MULTI_LOCK_ASSERT(); + MLD_LOCK_ASSERT(); + + KASSERT(mli && mli->mli_ifp == ifp, + ("%s: inconsistent ifp", __func__)); + + if ((ifp->if_flags & IFF_LOOPBACK) || + (mli->mli_flags & MLIF_SILENT) || + !mld_is_addr_reported(&inm->in6m_addr) || + (mli->mli_version != MLD_VERSION_2)) { + if (!mld_is_addr_reported(&inm->in6m_addr)) { + CTR1(KTR_MLD, +"%s: not kicking state machine for silent group", __func__); + } + CTR1(KTR_MLD, "%s: nothing to do", __func__); + in6m_commit(inm); + CTR3(KTR_MLD, "%s: T1 -> T0 for %s/%s", __func__, + ip6_sprintf(ip6tbuf, &inm->in6m_addr), + inm->in6m_ifp->if_xname); + return (0); + } + + _IF_DRAIN(&inm->in6m_scq); + + retval = mld_v2_enqueue_group_record(&inm->in6m_scq, inm, 1, 0, 0); + CTR2(KTR_MLD, "%s: enqueue record = %d", __func__, retval); + if (retval <= 0) + return (-retval); + + /* + * If record(s) were enqueued, start the state-change + * report timer for this group. + */ + inm->in6m_scrv = mli->mli_rv; + inm->in6m_sctimer = 1; + V_state_change_timers_running6 = 1; + + return (0); +} + +/* + * Perform the final leave for a multicast address. + * + * When leaving a group: + * MLDv1 sends a DONE message, if and only if we are the reporter. + * MLDv2 enqueues a state-change report containing a transition + * to INCLUDE {} for immediate transmission. 
+ */ +static void +mld_final_leave(struct in6_multi *inm, struct mld_ifinfo *mli) +{ + INIT_VNET_INET6(curvnet); + int syncstates; +#ifdef KTR + char ip6tbuf[INET6_ADDRSTRLEN]; +#endif + + syncstates = 1; + + CTR4(KTR_MLD, "%s: final leave %s on ifp %p(%s)", + __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr), + inm->in6m_ifp, inm->in6m_ifp->if_xname); + + IN6_MULTI_LOCK_ASSERT(); + MLD_LOCK_ASSERT(); + + switch (inm->in6m_state) { + case MLD_NOT_MEMBER: + case MLD_SILENT_MEMBER: + case MLD_LEAVING_MEMBER: + /* Already leaving or left; do nothing. */ + CTR1(KTR_MLD, +"%s: not kicking state machine for silent group", __func__); + break; + case MLD_REPORTING_MEMBER: + case MLD_IDLE_MEMBER: + case MLD_G_QUERY_PENDING_MEMBER: + case MLD_SG_QUERY_PENDING_MEMBER: + if (mli->mli_version == MLD_VERSION_1) { +#ifdef INVARIANTS + if (inm->in6m_state == MLD_G_QUERY_PENDING_MEMBER || + inm->in6m_state == MLD_SG_QUERY_PENDING_MEMBER) + panic("%s: MLDv2 state reached, not MLDv2 mode", + __func__); +#endif + mld_v1_transmit_report(inm, MLD_LISTENER_DONE); + inm->in6m_state = MLD_NOT_MEMBER; + } else if (mli->mli_version == MLD_VERSION_2) { + /* + * Stop group timer and all pending reports. + * Immediately enqueue a state-change report + * TO_IN {} to be sent on the next fast timeout, + * giving us an opportunity to merge reports. 
+ */ + _IF_DRAIN(&inm->in6m_scq); + inm->in6m_timer = 0; + inm->in6m_scrv = mli->mli_rv; + CTR4(KTR_MLD, "%s: Leaving %s/%s with %d " + "pending retransmissions.", __func__, + ip6_sprintf(ip6tbuf, &inm->in6m_addr), + inm->in6m_ifp->if_xname, inm->in6m_scrv); + if (inm->in6m_scrv == 0) { + inm->in6m_state = MLD_NOT_MEMBER; + inm->in6m_sctimer = 0; + } else { + int retval; + + in6m_acquire_locked(inm); + + retval = mld_v2_enqueue_group_record( + &inm->in6m_scq, inm, 1, 0, 0); + KASSERT(retval != 0, + ("%s: enqueue record = %d", __func__, + retval)); + + inm->in6m_state = MLD_LEAVING_MEMBER; + inm->in6m_sctimer = 1; + V_state_change_timers_running6 = 1; + syncstates = 0; + } break; } + break; + case MLD_LAZY_MEMBER: + case MLD_SLEEPING_MEMBER: + case MLD_AWAKENING_MEMBER: + /* Our reports are suppressed; do nothing. */ + break; + } + + if (syncstates) { + in6m_commit(inm); + CTR3(KTR_MLD, "%s: T1 -> T0 for %s/%s", __func__, + ip6_sprintf(ip6tbuf, &inm->in6m_addr), + inm->in6m_ifp->if_xname); + inm->in6m_st[1].iss_fmode = MCAST_UNDEFINED; + CTR3(KTR_MLD, "%s: T1 now MCAST_UNDEFINED for %p/%s", + __func__, &inm->in6m_addr, inm->in6m_ifp->if_xname); } } /* - * Add an address to the list of IP6 multicast addresses for a given interface. - * Add source addresses to the list also, if upstream router is MLDv2 capable - * and the number of source is not 0. + * Enqueue an MLDv2 group record to the given output queue. + * + * If is_state_change is zero, a current-state record is appended. + * If is_state_change is non-zero, a state-change report is appended. + * + * If is_group_query is non-zero, an mbuf packet chain is allocated. + * If is_group_query is zero, and if there is a packet with free space + * at the tail of the queue, it will be appended to providing there + * is enough free space. + * Otherwise a new mbuf packet chain is allocated. 
+ * + * If is_source_query is non-zero, each source is checked to see if + * it was recorded for a Group-Source query, and will be omitted if + * it is not both in-mode and recorded. + * + * The function will attempt to allocate leading space in the packet + * for the IPv6+ICMP headers to be prepended without fragmenting the chain. + * + * If successful the size of all data appended to the queue is returned, + * otherwise an error code less than zero is returned, or zero if + * no record(s) were appended. */ -struct in6_multi * -in6_addmulti(struct in6_addr *maddr6, struct ifnet *ifp, - int *errorp, int delay) +static int +mld_v2_enqueue_group_record(struct ifqueue *ifq, struct in6_multi *inm, + const int is_state_change, const int is_group_query, + const int is_source_query) { - struct in6_multi *in6m; + struct mldv2_record mr; + struct mldv2_record *pmr; + struct ifnet *ifp; + struct ip6_msource *ims, *nims; + struct mbuf *m0, *m, *md; + int error, is_filter_list_change; + int minrec0len, m0srcs, msrcs, nbytes, off; + int record_has_sources; + int now; + int type; + uint8_t mode; +#ifdef KTR + char ip6tbuf[INET6_ADDRSTRLEN]; +#endif - *errorp = 0; - in6m = NULL; + IN6_MULTI_LOCK_ASSERT(); + + error = 0; + ifp = inm->in6m_ifp; + is_filter_list_change = 0; + m = NULL; + m0 = NULL; + m0srcs = 0; + msrcs = 0; + nbytes = 0; + nims = NULL; + record_has_sources = 1; + pmr = NULL; + type = MLD_DO_NOTHING; + mode = inm->in6m_st[1].iss_fmode; - /*IN6_MULTI_LOCK();*/ + /* + * If we did not transition out of ASM mode during t0->t1, + * and there are no source nodes to process, we can skip + * the generation of source records. + */ + if (inm->in6m_st[0].iss_asm > 0 && inm->in6m_st[1].iss_asm > 0 && + inm->in6m_nsrc == 0) + record_has_sources = 0; - IN6_LOOKUP_MULTI(*maddr6, ifp, in6m); - if (in6m != NULL) { + if (is_state_change) { /* - * If we already joined this group, just bump the - * refcount and return it. + * Queue a state change record. 
+		 * If the mode did not change, and there are non-ASM
+		 * listeners or source filters present,
+		 * we potentially need to issue two records for the group.
+		 * If we are transitioning to MCAST_UNDEFINED, we need
+		 * not send any sources.
+		 * If there are ASM listeners, and there was no filter
+		 * mode transition of any kind, do nothing.
 		 */
-		KASSERT(in6m->in6m_refcount >= 1,
-		    ("%s: bad refcount %d", __func__, in6m->in6m_refcount));
-		++in6m->in6m_refcount;
-	} else do {
-		struct in6_multi *nin6m;
-		struct ifmultiaddr *ifma;
-		struct sockaddr_in6 sa6;
-
-		bzero(&sa6, sizeof(sa6));
-		sa6.sin6_family = AF_INET6;
-		sa6.sin6_len = sizeof(struct sockaddr_in6);
-		sa6.sin6_addr = *maddr6;
-
-		*errorp = if_addmulti(ifp, (struct sockaddr *)&sa6, &ifma);
-		if (*errorp)
-			break;
+		if (mode != inm->in6m_st[0].iss_fmode) {
+			if (mode == MCAST_EXCLUDE) {
+				CTR1(KTR_MLD, "%s: change to EXCLUDE",
+				    __func__);
+				type = MLD_CHANGE_TO_EXCLUDE_MODE;
+			} else {
+				CTR1(KTR_MLD, "%s: change to INCLUDE",
+				    __func__);
+				type = MLD_CHANGE_TO_INCLUDE_MODE;
+				if (mode == MCAST_UNDEFINED)
+					record_has_sources = 0;
+			}
+		} else {
+			if (record_has_sources) {
+				is_filter_list_change = 1;
+			} else {
+				type = MLD_DO_NOTHING;
+			}
+		}
+	} else {
+		/*
+		 * Queue a current state record.
+		 */
+		if (mode == MCAST_EXCLUDE) {
+			type = MLD_MODE_IS_EXCLUDE;
+		} else if (mode == MCAST_INCLUDE) {
+			type = MLD_MODE_IS_INCLUDE;
+			KASSERT(inm->in6m_st[1].iss_asm == 0,
+			    ("%s: inm %p is INCLUDE but ASM count is %d",
+			     __func__, inm, inm->in6m_st[1].iss_asm));
+		}
+	}
+
+	/*
+	 * Generate the filter list changes using a separate function.
+	 */
+	if (is_filter_list_change)
+		return (mld_v2_enqueue_filter_change(ifq, inm));
+
+	if (type == MLD_DO_NOTHING) {
+		CTR3(KTR_MLD, "%s: nothing to do for %s/%s",
+		    __func__, ip6_sprintf(ip6tbuf, &inm->in6m_addr),
+		    inm->in6m_ifp->if_xname);
+		return (0);
+	}
+
+	/*
+	 * If any sources are present, we must be able to fit at least
+	 * one in the trailing space of the tail packet's mbuf,
+	 * ideally more.
+	 */
+	minrec0len = sizeof(struct mldv2_record);
+	if (record_has_sources)
+		minrec0len += sizeof(struct in6_addr);
+
+	CTR4(KTR_MLD, "%s: queueing %s for %s/%s", __func__,
+	    mld_rec_type_to_str(type),
+	    ip6_sprintf(ip6tbuf, &inm->in6m_addr),
+	    inm->in6m_ifp->if_xname);
+
+	/*
+	 * Check if we have a packet in the tail of the queue for this
+	 * group into which the first group record for this group will fit.
+	 * Otherwise allocate a new packet.
+	 * Always allocate leading space for IP6+RA+ICMPV6+REPORT.
+	 * Note: Group records for G/GSR query responses MUST be sent
+	 * in their own packet.
+	 */
+	m0 = ifq->ifq_tail;
+	if (!is_group_query &&
+	    m0 != NULL &&
+	    (m0->m_pkthdr.PH_vt.vt_nrecs + 1 <= MLD_V2_REPORT_MAXRECS) &&
+	    (m0->m_pkthdr.len + minrec0len) <
+	     (ifp->if_mtu - MLD_MTUSPACE)) {
+		m0srcs = (ifp->if_mtu - m0->m_pkthdr.len -
+		    sizeof(struct mldv2_record)) /
+		    sizeof(struct in6_addr);
+		m = m0;
+		CTR1(KTR_MLD, "%s: use existing packet", __func__);
+	} else {
+		if (_IF_QFULL(ifq)) {
+			CTR1(KTR_MLD, "%s: outbound queue full", __func__);
+			return (-ENOMEM);
+		}
+		m = NULL;
+		m0srcs = (ifp->if_mtu - MLD_MTUSPACE -
+		    sizeof(struct mldv2_record)) / sizeof(struct in6_addr);
+		if (!is_state_change && !is_group_query)
+			m = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);
+		if (m == NULL)
+			m = m_gethdr(M_DONTWAIT, MT_DATA);
+		if (m == NULL)
+			return (-ENOMEM);
+
+		mld_save_context(m, ifp);
+
+		CTR1(KTR_MLD, "%s: allocated first packet", __func__);
+	}
+
+	/*
+	 * Append group record.
+	 * If we have sources, we don't know how many yet.
+	 */
+	mr.mr_type = type;
+	mr.mr_datalen = 0;
+	mr.mr_numsrc = 0;
+	mr.mr_addr = inm->in6m_addr;
+	in6_clearscope(&mr.mr_addr);
+	if (!m_append(m, sizeof(struct mldv2_record), (void *)&mr)) {
+		if (m != m0)
+			m_freem(m);
+		CTR1(KTR_MLD, "%s: m_append() failed.", __func__);
+		return (-ENOMEM);
+	}
+	nbytes += sizeof(struct mldv2_record);
+
+	/*
+	 * Append as many sources as will fit in the first packet.
+	 * If we are appending to a new packet, the chain allocation
+	 * may potentially use clusters; use m_getptr() in this case.
+	 * If we are appending to an existing packet, we need to obtain
+	 * a pointer to the group record after m_append(), in case a new
+	 * mbuf was allocated.
+	 * Only append sources which are in-mode at t1. If we are
+	 * transitioning to MCAST_UNDEFINED state on the group, do not
+	 * include source entries.
+	 * Only report recorded sources in our filter set when responding
+	 * to a group-source query.
+	 */
+	if (record_has_sources) {
+		if (m == m0) {
+			md = m_last(m);
+			pmr = (struct mldv2_record *)(mtod(md, uint8_t *) +
+			    md->m_len - nbytes);
+		} else {
+			md = m_getptr(m, 0, &off);
+			pmr = (struct mldv2_record *)(mtod(md, uint8_t *) +
+			    off);
+		}
+		msrcs = 0;
+		RB_FOREACH_SAFE(ims, ip6_msource_tree, &inm->in6m_srcs,
+		    nims) {
+			CTR2(KTR_MLD, "%s: visit node %s", __func__,
+			    ip6_sprintf(ip6tbuf, &ims->im6s_addr));
+			now = im6s_get_mode(inm, ims, 1);
+			CTR2(KTR_MLD, "%s: node is %d", __func__, now);
+			if ((now != mode) ||
+			    (now == mode && mode == MCAST_UNDEFINED)) {
+				CTR1(KTR_MLD, "%s: skip node", __func__);
+				continue;
+			}
+			if (is_source_query && ims->im6s_stp == 0) {
+				CTR1(KTR_MLD, "%s: skip unrecorded node",
+				    __func__);
+				continue;
+			}
+			CTR1(KTR_MLD, "%s: append node", __func__);
+			if (!m_append(m, sizeof(struct in6_addr),
+			    (void *)&ims->im6s_addr)) {
+				if (m != m0)
+					m_freem(m);
+				CTR1(KTR_MLD, "%s: m_append() failed.",
+				    __func__);
+				return (-ENOMEM);
+			}
+			nbytes += sizeof(struct in6_addr);
+			++msrcs;
+			if (msrcs == m0srcs)
+				break;
+		}
+		CTR2(KTR_MLD, "%s: msrcs is %d this packet", __func__,
+		    msrcs);
+		pmr->mr_numsrc = htons(msrcs);
+		nbytes += (msrcs * sizeof(struct in6_addr));
+	}
+
+	if (is_source_query && msrcs == 0) {
+		CTR1(KTR_MLD, "%s: no recorded sources to report", __func__);
+		if (m != m0)
+			m_freem(m);
+		return (0);
+	}
+
+	/*
+	 * We are good to go with first packet.
+	 */
+	if (m != m0) {
+		CTR1(KTR_MLD, "%s: enqueueing first packet", __func__);
+		m->m_pkthdr.PH_vt.vt_nrecs = 1;
+		_IF_ENQUEUE(ifq, m);
+	} else
+		m->m_pkthdr.PH_vt.vt_nrecs++;
+
+	/*
+	 * No further work needed if no source list in packet(s).
+	 */
+	if (!record_has_sources)
+		return (nbytes);
+
+	/*
+	 * Whilst sources remain to be announced, we need to allocate
+	 * a new packet and fill out as many sources as will fit.
+	 * Always try for a cluster first.
+	 */
+	while (nims != NULL) {
+		if (_IF_QFULL(ifq)) {
+			CTR1(KTR_MLD, "%s: outbound queue full", __func__);
+			return (-ENOMEM);
+		}
+		m = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);
+		if (m == NULL)
+			m = m_gethdr(M_DONTWAIT, MT_DATA);
+		if (m == NULL)
+			return (-ENOMEM);
+		mld_save_context(m, ifp);
+		md = m_getptr(m, 0, &off);
+		pmr = (struct mldv2_record *)(mtod(md, uint8_t *) + off);
+		CTR1(KTR_MLD, "%s: allocated next packet", __func__);
+
+		if (!m_append(m, sizeof(struct mldv2_record), (void *)&mr)) {
+			if (m != m0)
+				m_freem(m);
+			CTR1(KTR_MLD, "%s: m_append() failed.", __func__);
+			return (-ENOMEM);
+		}
+		m->m_pkthdr.PH_vt.vt_nrecs = 1;
+		nbytes += sizeof(struct mldv2_record);
+
+		m0srcs = (ifp->if_mtu - MLD_MTUSPACE -
+		    sizeof(struct mldv2_record)) / sizeof(struct in6_addr);
+
+		msrcs = 0;
+		RB_FOREACH_FROM(ims, ip6_msource_tree, nims) {
+			CTR2(KTR_MLD, "%s: visit node %s",
+			    __func__, ip6_sprintf(ip6tbuf, &ims->im6s_addr));
+			now = im6s_get_mode(inm, ims, 1);
+			if ((now != mode) ||
+			    (now == mode && mode == MCAST_UNDEFINED)) {
+				CTR1(KTR_MLD, "%s: skip node", __func__);
+				continue;
+			}
+			if (is_source_query && ims->im6s_stp == 0) {
+				CTR1(KTR_MLD, "%s: skip unrecorded node",
+				    __func__);
+				continue;
+			}
+			CTR1(KTR_MLD, "%s: append node", __func__);
+			if (!m_append(m, sizeof(struct in6_addr),
+			    (void *)&ims->im6s_addr)) {
+				if (m != m0)
+					m_freem(m);
+				CTR1(KTR_MLD, "%s: m_append() failed.",
+				    __func__);
+				return (-ENOMEM);
+			}
+			++msrcs;
+			if (msrcs == m0srcs)
+				break;
+		}
+		pmr->mr_numsrc = htons(msrcs);
+		nbytes += (msrcs * sizeof(struct in6_addr));
+
+		CTR1(KTR_MLD, "%s: enqueueing next packet", __func__);
+		_IF_ENQUEUE(ifq, m);
+	}
+
+	return (nbytes);
+}
+
+/*
+ * Type used to mark record pass completion.
+ * We exploit the fact we can cast to this easily from the
+ * current filter modes on each ip_msource node.
+ */
+typedef enum {
+	REC_NONE = 0x00, /* MCAST_UNDEFINED */
+	REC_ALLOW = 0x01, /* MCAST_INCLUDE */
+	REC_BLOCK = 0x02, /* MCAST_EXCLUDE */
+	REC_FULL = REC_ALLOW | REC_BLOCK
+} rectype_t;
+
+/*
+ * Enqueue an MLDv2 filter list change to the given output queue.
+ *
+ * Source list filter state is held in an RB-tree. When the filter list
+ * for a group is changed without changing its mode, we need to compute
+ * the deltas between T0 and T1 for each source in the filter set,
+ * and enqueue the appropriate ALLOW_NEW/BLOCK_OLD records.
+ *
+ * As we may potentially queue two record types, and the entire R-B tree
+ * needs to be walked at once, we break this out into its own function
+ * so we can generate a tightly packed queue of packets.
+ *
+ * XXX This could be written to only use one tree walk, although that makes
+ * serializing into the mbuf chains a bit harder. For now we do two walks
+ * which makes things easier on us, and it may or may not be harder on
+ * the L2 cache.
+ *
+ * If successful the size of all data appended to the queue is returned,
+ * otherwise an error code less than zero is returned, or zero if
+ * no record(s) were appended.
+ */
+static int
+mld_v2_enqueue_filter_change(struct ifqueue *ifq, struct in6_multi *inm)
+{
+	static const int MINRECLEN =
+	    sizeof(struct mldv2_record) + sizeof(struct in6_addr);
+	struct ifnet *ifp;
+	struct mldv2_record mr;
+	struct mldv2_record *pmr;
+	struct ip6_msource *ims, *nims;
+	struct mbuf *m, *m0, *md;
+	int m0srcs, nbytes, npbytes, off, rsrcs, schanged;
+	int nallow, nblock;
+	uint8_t mode, now, then;
+	rectype_t crt, drt, nrt;
+#ifdef KTR
+	char ip6tbuf[INET6_ADDRSTRLEN];
+#endif
+	IN6_MULTI_LOCK_ASSERT();
+
+	if (inm->in6m_nsrc == 0 ||
+	    (inm->in6m_st[0].iss_asm > 0 && inm->in6m_st[1].iss_asm > 0))
+		return (0);
+
+	ifp = inm->in6m_ifp; /* interface */
+	mode = inm->in6m_st[1].iss_fmode; /* filter mode at t1 */
+	crt = REC_NONE; /* current group record type */
+	drt = REC_NONE; /* mask of completed group record types */
+	nrt = REC_NONE; /* record type for current node */
+	m0srcs = 0; /* # source which will fit in current mbuf chain */
+	npbytes = 0; /* # of bytes appended this packet */
+	nbytes = 0; /* # of bytes appended to group's state-change queue */
+	rsrcs = 0; /* # sources encoded in current record */
+	schanged = 0; /* # nodes encoded in overall filter change */
+	nallow = 0; /* # of source entries in ALLOW_NEW */
+	nblock = 0; /* # of source entries in BLOCK_OLD */
+	nims = NULL; /* next tree node pointer */
+
+	/*
+	 * For each possible filter record mode.
+	 * The first kind of source we encounter tells us which
+	 * is the first kind of record we start appending.
+	 * If a node transitioned to UNDEFINED at t1, its mode is treated
+	 * as the inverse of the group's filter mode.
+	 */
+	while (drt != REC_FULL) {
+		do {
+			m0 = ifq->ifq_tail;
+			if (m0 != NULL &&
+			    (m0->m_pkthdr.PH_vt.vt_nrecs + 1 <=
+			     MLD_V2_REPORT_MAXRECS) &&
+			    (m0->m_pkthdr.len + MINRECLEN) <
+			     (ifp->if_mtu - MLD_MTUSPACE)) {
+				m = m0;
+				m0srcs = (ifp->if_mtu - m0->m_pkthdr.len -
+				    sizeof(struct mldv2_record)) /
+				    sizeof(struct in6_addr);
+				CTR1(KTR_MLD,
+				    "%s: use previous packet", __func__);
+			} else {
+				m = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);
+				if (m == NULL)
+					m = m_gethdr(M_DONTWAIT, MT_DATA);
+				if (m == NULL) {
+					CTR1(KTR_MLD,
+					    "%s: m_get*() failed", __func__);
+					return (-ENOMEM);
+				}
+				m->m_pkthdr.PH_vt.vt_nrecs = 0;
+				mld_save_context(m, ifp);
+				m0srcs = (ifp->if_mtu - MLD_MTUSPACE -
+				    sizeof(struct mldv2_record)) /
+				    sizeof(struct in6_addr);
+				npbytes = 0;
+				CTR1(KTR_MLD,
+				    "%s: allocated new packet", __func__);
+			}
+			/*
+			 * Append the MLD group record header to the
+			 * current packet's data area.
+			 * Recalculate pointer to free space for next
+			 * group record, in case m_append() allocated
+			 * a new mbuf or cluster.
+			 */
+			memset(&mr, 0, sizeof(mr));
+			mr.mr_addr = inm->in6m_addr;
+			in6_clearscope(&mr.mr_addr);
+			if (!m_append(m, sizeof(mr), (void *)&mr)) {
+				if (m != m0)
+					m_freem(m);
+				CTR1(KTR_MLD,
+				    "%s: m_append() failed", __func__);
+				return (-ENOMEM);
+			}
+			npbytes += sizeof(struct mldv2_record);
+			if (m != m0) {
+				/* new packet; offset in chain */
+				md = m_getptr(m, npbytes -
+				    sizeof(struct mldv2_record), &off);
+				pmr = (struct mldv2_record *)(mtod(md,
+				    uint8_t *) + off);
+			} else {
+				/* current packet; offset from last append */
+				md = m_last(m);
+				pmr = (struct mldv2_record *)(mtod(md,
+				    uint8_t *) + md->m_len -
+				    sizeof(struct mldv2_record));
+			}
+			/*
+			 * Begin walking the tree for this record type
+			 * pass, or continue from where we left off
+			 * previously if we had to allocate a new packet.
+			 * Only report deltas in-mode at t1.
+			 * We need not report included sources as allowed
+			 * if we are in inclusive mode on the group,
+			 * however the converse is not true.
+			 */
+			rsrcs = 0;
+			if (nims == NULL) {
+				nims = RB_MIN(ip6_msource_tree,
+				    &inm->in6m_srcs);
+			}
+			RB_FOREACH_FROM(ims, ip6_msource_tree, nims) {
+				CTR2(KTR_MLD, "%s: visit node %s", __func__,
+				    ip6_sprintf(ip6tbuf, &ims->im6s_addr));
+				now = im6s_get_mode(inm, ims, 1);
+				then = im6s_get_mode(inm, ims, 0);
+				CTR3(KTR_MLD, "%s: mode: t0 %d, t1 %d",
+				    __func__, then, now);
+				if (now == then) {
+					CTR1(KTR_MLD,
+					    "%s: skip unchanged", __func__);
+					continue;
+				}
+				if (mode == MCAST_EXCLUDE &&
+				    now == MCAST_INCLUDE) {
+					CTR1(KTR_MLD,
+					    "%s: skip IN src on EX group",
+					    __func__);
+					continue;
+				}
+				nrt = (rectype_t)now;
+				if (nrt == REC_NONE)
+					nrt = (rectype_t)(~mode & REC_FULL);
+				if (schanged++ == 0) {
+					crt = nrt;
+				} else if (crt != nrt)
+					continue;
+				if (!m_append(m, sizeof(struct in6_addr),
+				    (void *)&ims->im6s_addr)) {
+					if (m != m0)
+						m_freem(m);
+					CTR1(KTR_MLD,
+					    "%s: m_append() failed", __func__);
+					return (-ENOMEM);
+				}
+				nallow += !!(crt == REC_ALLOW);
+				nblock += !!(crt == REC_BLOCK);
+				if (++rsrcs == m0srcs)
+					break;
+			}
+			/*
+			 * If we did not append any tree nodes on this
+			 * pass, back out of allocations.
+			 */
+			if (rsrcs == 0) {
+				npbytes -= sizeof(struct mldv2_record);
+				if (m != m0) {
+					CTR1(KTR_MLD,
+					    "%s: m_free(m)", __func__);
+					m_freem(m);
+				} else {
+					CTR1(KTR_MLD,
+					    "%s: m_adj(m, -mr)", __func__);
+					m_adj(m, -((int)sizeof(
+					    struct mldv2_record)));
+				}
+				continue;
+			}
+			npbytes += (rsrcs * sizeof(struct in6_addr));
+			if (crt == REC_ALLOW)
+				pmr->mr_type = MLD_ALLOW_NEW_SOURCES;
+			else if (crt == REC_BLOCK)
+				pmr->mr_type = MLD_BLOCK_OLD_SOURCES;
+			pmr->mr_numsrc = htons(rsrcs);
+			/*
+			 * Count the new group record, and enqueue this
+			 * packet if it wasn't already queued.
+			 */
+			m->m_pkthdr.PH_vt.vt_nrecs++;
+			if (m != m0)
+				_IF_ENQUEUE(ifq, m);
+			nbytes += npbytes;
+		} while (nims != NULL);
+		drt |= crt;
+		crt = (~crt & REC_FULL);
+	}
+
+	CTR3(KTR_MLD, "%s: queued %d ALLOW_NEW, %d BLOCK_OLD", __func__,
+	    nallow, nblock);
+
+	return (nbytes);
+}
+
+static int
+mld_v2_merge_state_changes(struct in6_multi *inm, struct ifqueue *ifscq)
+{
+	struct ifqueue *gq;
+	struct mbuf *m; /* pending state-change */
+	struct mbuf *m0; /* copy of pending state-change */
+	struct mbuf *mt; /* last state-change in packet */
+	int docopy, domerge;
+	u_int recslen;
+
+	docopy = 0;
+	domerge = 0;
+	recslen = 0;
+
+	IN6_MULTI_LOCK_ASSERT();
+	MLD_LOCK_ASSERT();
+
+	/*
+	 * If there are further pending retransmissions, make a writable
+	 * copy of each queued state-change message before merging.
+	 */
+	if (inm->in6m_scrv > 0)
+		docopy = 1;
+
+	gq = &inm->in6m_scq;
+#ifdef KTR
+	if (gq->ifq_head == NULL) {
+		CTR2(KTR_MLD, "%s: WARNING: queue for inm %p is empty",
+		    __func__, inm);
+	}
+#endif
+
+	m = gq->ifq_head;
+	while (m != NULL) {
 		/*
-		 * If ifma->ifma_protospec is null, then if_addmulti() created
-		 * a new record. Otherwise, bump refcount, and we are done.
+		 * Only merge the report into the current packet if
+		 * there is sufficient space to do so; an MLDv2 report
+		 * packet may only contain 65,535 group records.
+		 * Always use a simple mbuf chain concatentation to do this,
+		 * as large state changes for single groups may have
+		 * allocated clusters.
 		 */
-		if (ifma->ifma_protospec != NULL) {
-			in6m = ifma->ifma_protospec;
-			++in6m->in6m_refcount;
-			break;
+		domerge = 0;
+		mt = ifscq->ifq_tail;
+		if (mt != NULL) {
+			recslen = m_length(m, NULL);
+
+			if ((mt->m_pkthdr.PH_vt.vt_nrecs +
+			    m->m_pkthdr.PH_vt.vt_nrecs <=
+			    MLD_V2_REPORT_MAXRECS) &&
+			    (mt->m_pkthdr.len + recslen <=
+			    (inm->in6m_ifp->if_mtu - MLD_MTUSPACE)))
+				domerge = 1;
 		}
-		nin6m = malloc(sizeof(*nin6m), M_IP6MADDR, M_NOWAIT | M_ZERO);
-		if (nin6m == NULL) {
-			if_delmulti_ifma(ifma);
-			break;
+		if (!domerge && _IF_QFULL(gq)) {
+			CTR2(KTR_MLD,
+			    "%s: outbound queue full, skipping whole packet %p",
+			    __func__, m);
+			mt = m->m_nextpkt;
+			if (!docopy)
+				m_freem(m);
+			m = mt;
+			continue;
 		}
-		nin6m->in6m_addr = *maddr6;
-		nin6m->in6m_ifp = ifp;
-		nin6m->in6m_refcount = 1;
-		nin6m->in6m_ifma = ifma;
-		ifma->ifma_protospec = nin6m;
+		if (!docopy) {
+			CTR2(KTR_MLD, "%s: dequeueing %p", __func__, m);
+			_IF_DEQUEUE(gq, m0);
+			m = m0->m_nextpkt;
+		} else {
+			CTR2(KTR_MLD, "%s: copying %p", __func__, m);
+			m0 = m_dup(m, M_NOWAIT);
+			if (m0 == NULL)
+				return (ENOMEM);
+			m0->m_nextpkt = NULL;
+			m = m->m_nextpkt;
+		}
-		nin6m->in6m_timer_ch = malloc(sizeof(*nin6m->in6m_timer_ch),
-		    M_IP6MADDR, M_NOWAIT);
-		if (nin6m->in6m_timer_ch == NULL) {
-			free(nin6m, M_IP6MADDR);
-			if_delmulti_ifma(ifma);
-			break;
+		if (!domerge) {
+			CTR3(KTR_MLD, "%s: queueing %p to ifscq %p)",
+			    __func__, m0, ifscq);
+			_IF_ENQUEUE(ifscq, m0);
+		} else {
+			struct mbuf *mtl; /* last mbuf of packet mt */
+
+			CTR3(KTR_MLD, "%s: merging %p with ifscq tail %p)",
+			    __func__, m0, mt);
+
+			mtl = m_last(mt);
+			m0->m_flags &= ~M_PKTHDR;
+			mt->m_pkthdr.len += recslen;
+			mt->m_pkthdr.PH_vt.vt_nrecs +=
+			    m0->m_pkthdr.PH_vt.vt_nrecs;
+
+			mtl->m_next = m0;
 		}
+	}
+
+	return (0);
+}
+
+/*
+ * Respond to a pending MLDv2 General Query.
+ */
+static void
+mld_v2_dispatch_general_query(struct mld_ifinfo *mli)
+{
+	INIT_VNET_INET6(curvnet);
+	struct ifmultiaddr *ifma, *tifma;
+	struct ifnet *ifp;
+	struct in6_multi *inm;
+	int retval;
+
+	IN6_MULTI_LOCK_ASSERT();
+	MLD_LOCK_ASSERT();
+
+	KASSERT(mli->mli_version == MLD_VERSION_2,
+	    ("%s: called when version %d", __func__, mli->mli_version));
+
+	ifp = mli->mli_ifp;
+
+	IF_ADDR_LOCK(ifp);
+	TAILQ_FOREACH_SAFE(ifma, &ifp->if_multiaddrs, ifma_link, tifma) {
+		if (ifma->ifma_addr->sa_family != AF_INET6 ||
+		    ifma->ifma_protospec == NULL)
+			continue;
-		LIST_INSERT_HEAD(&in6_multihead, nin6m, in6m_entry);
+		inm = (struct in6_multi *)ifma->ifma_protospec;
+		KASSERT(ifp == inm->in6m_ifp,
+		    ("%s: inconsistent ifp", __func__));
-		callout_init(nin6m->in6m_timer_ch, 0);
-		nin6m->in6m_timer = delay;
-		if (nin6m->in6m_timer > 0) {
-			nin6m->in6m_state = MLD_REPORTPENDING;
-			mld_starttimer(nin6m);
+		switch (inm->in6m_state) {
+		case MLD_NOT_MEMBER:
+		case MLD_SILENT_MEMBER:
+			break;
+		case MLD_REPORTING_MEMBER:
+		case MLD_IDLE_MEMBER:
+		case MLD_LAZY_MEMBER:
+		case MLD_SLEEPING_MEMBER:
+		case MLD_AWAKENING_MEMBER:
+			inm->in6m_state = MLD_REPORTING_MEMBER;
+			retval = mld_v2_enqueue_group_record(&mli->mli_gq,
+			    inm, 0, 0, 0);
+			CTR2(KTR_MLD, "%s: enqueue record = %d",
+			    __func__, retval);
+			break;
+		case MLD_G_QUERY_PENDING_MEMBER:
+		case MLD_SG_QUERY_PENDING_MEMBER:
+		case MLD_LEAVING_MEMBER:
+			break;
 		}
+	}
+	IF_ADDR_UNLOCK(ifp);
+
+	mld_dispatch_queue(&mli->mli_gq, MLD_MAX_RESPONSE_BURST);
+
+	/*
+	 * Slew transmission of bursts over 500ms intervals.
+	 */
+	if (mli->mli_gq.ifq_head != NULL) {
+		mli->mli_v2_timer = 1 + MLD_RANDOM_DELAY(
+		    MLD_RESPONSE_BURST_INTERVAL);
+		V_interface_timers_running6 = 1;
+	}
+}
+
+/*
+ * Transmit the next pending message in the output queue.
+ *
+ * VIMAGE: Needs to store/restore vnet pointer on a per-mbuf-chain basis.
+ * MRT: Nothing needs to be done, as MLD traffic is always local to
+ * a link and uses a link-scope multicast address.
+ */
+static void
+mld_dispatch_packet(struct mbuf *m)
+{
+	struct ip6_moptions im6o;
+	struct ifnet *ifp;
+	struct ifnet *oifp;
+	struct mbuf *m0;
+	struct mbuf *md;
+	struct ip6_hdr *ip6;
+	struct mld_hdr *mld;
+	int error;
+	int off;
+	int type;
+	uint32_t ifindex;
+
+	CTR2(KTR_MLD, "%s: transmit %p", __func__, m);
+
+	/*
+	 * Set VNET image pointer from enqueued mbuf chain
+	 * before doing anything else. Whilst we use interface
+	 * indexes to guard against interface detach, they are
+	 * unique to each VIMAGE and must be retrieved.
+	 */
+	CURVNET_SET(m->m_pkthdr.header);
+	INIT_VNET_NET(curvnet);
+	INIT_VNET_INET6(curvnet);
+	ifindex = mld_restore_context(m);
+
+	/*
+	 * Check if the ifnet still exists. This limits the scope of
+	 * any race in the absence of a global ifp lock for low cost
+	 * (an array lookup).
+	 */
+	ifp = ifnet_byindex(ifindex);
+	if (ifp == NULL) {
+		CTR3(KTR_MLD, "%s: dropped %p as ifindex %u went away.",
+		    __func__, m, ifindex);
+		m_freem(m);
+		IP6STAT_INC(ip6s_noroute);
+		goto out;
+	}
-		mld6_start_listening(nin6m);
+	im6o.im6o_multicast_hlim = 1;
+	im6o.im6o_multicast_loop = (V_ip6_mrouter != NULL);
+	im6o.im6o_multicast_ifp = ifp;
-		in6m = nin6m;
+	if (m->m_flags & M_MLDV1) {
+		m0 = m;
+	} else {
+		m0 = mld_v2_encap_report(ifp, m);
+		if (m0 == NULL) {
+			CTR2(KTR_MLD, "%s: dropped %p", __func__, m);
+			m_freem(m);
+			IP6STAT_INC(ip6s_odropped);
+			goto out;
+		}
+	}
-	} while (0);
+	mld_scrub_context(m0);
+	m->m_flags &= ~(M_PROTOFLAGS);
+	m0->m_pkthdr.rcvif = V_loif;
-	/*IN6_MULTI_UNLOCK();*/
+	ip6 = mtod(m0, struct ip6_hdr *);
+#if 0
+	(void)in6_setscope(&ip6->ip6_dst, ifp, NULL); /* XXX LOR */
+#else
+	/*
+	 * XXX XXX Break some KPI rules to prevent an LOR which would
+	 * occur if we called in6_setscope() at transmission.
+	 * See comments at top of file.
+	 */
+	MLD_EMBEDSCOPE(&ip6->ip6_dst, ifp->if_index);
+#endif
-	return (in6m);
+	/*
+	 * Retrieve the ICMPv6 type before handoff to ip6_output(),
+	 * so we can bump the stats.
+	 */
+	md = m_getptr(m0, sizeof(struct ip6_hdr), &off);
+	mld = (struct mld_hdr *)(mtod(md, uint8_t *) + off);
+	type = mld->mld_type;
+
+	error = ip6_output(m0, &mld_po, NULL, IPV6_UNSPECSRC, &im6o,
+	    &oifp, NULL);
+	if (error) {
+		CTR3(KTR_MLD, "%s: ip6_output(%p) = %d", __func__, m0, error);
+		goto out;
+	}
+	ICMP6STAT_INC(icp6s_outhist[type]);
+	if (oifp != NULL) {
+		icmp6_ifstat_inc(oifp, ifs6_out_msg);
+		switch (type) {
+		case MLD_LISTENER_REPORT:
+		case MLDV2_LISTENER_REPORT:
+			icmp6_ifstat_inc(oifp, ifs6_out_mldreport);
+			break;
+		case MLD_LISTENER_DONE:
+			icmp6_ifstat_inc(oifp, ifs6_out_mlddone);
+			break;
+		}
+	}
+out:
+	/*
+	 * We must restore the existing vnet pointer before continuing.
+	 */
+	CURVNET_RESTORE();
 }
 /*
- * Delete a multicast address record.
+ * Encapsulate an MLDv2 report.
+ *
+ * KAME IPv6 requires that hop-by-hop options be passed separately,
+ * and that the IPv6 header be prepended in a separate mbuf.
  *
- * TODO: Locking, as per netinet.
+ * Returns a pointer to the new mbuf chain head, or NULL if the
+ * allocation failed.
  */
-void
-in6_delmulti(struct in6_multi *in6m)
+static struct mbuf *
+mld_v2_encap_report(struct ifnet *ifp, struct mbuf *m)
 {
-	struct ifmultiaddr *ifma;
+	INIT_VNET_INET6(curvnet);
+	struct mbuf *mh;
+	struct mldv2_report *mld;
+	struct ip6_hdr *ip6;
+	struct in6_ifaddr *ia;
+	int mldreclen;
+
+	KASSERT(ifp != NULL, ("%s: null ifp", __func__));
+	KASSERT((m->m_flags & M_PKTHDR),
+	    ("%s: mbuf chain %p is !M_PKTHDR", __func__, m));
+
+	/*
+	 * RFC3590: OK to send as :: or tentative during DAD.
+	 */
+	ia = in6ifa_ifpforlinklocal(ifp, IN6_IFF_NOTREADY|IN6_IFF_ANYCAST);
+	if (ia == NULL)
+		CTR1(KTR_MLD, "%s: warning: ia is NULL", __func__);
-	KASSERT(in6m->in6m_refcount >= 1, ("%s: freeing freed in6m", __func__));
+	MGETHDR(mh, M_DONTWAIT, MT_HEADER);
+	if (mh == NULL) {
+		m_freem(m);
+		return (NULL);
+	}
+	MH_ALIGN(mh, sizeof(struct ip6_hdr) + sizeof(struct mldv2_report));
-	if (--in6m->in6m_refcount == 0) {
-		mld_stoptimer(in6m);
-		mld6_stop_listening(in6m);
+	mldreclen = m_length(m, NULL);
+	CTR2(KTR_MLD, "%s: mldreclen is %d", __func__, mldreclen);
-		ifma = in6m->in6m_ifma;
-		KASSERT(ifma->ifma_protospec == in6m,
-		    ("%s: ifma_protospec != in6m", __func__));
-		ifma->ifma_protospec = NULL;
+	mh->m_len = sizeof(struct ip6_hdr) + sizeof(struct mldv2_report);
+	mh->m_pkthdr.len = sizeof(struct ip6_hdr) +
+	    sizeof(struct mldv2_report) + mldreclen;
-		LIST_REMOVE(in6m, in6m_entry);
-		free(in6m->in6m_timer_ch, M_IP6MADDR);
-		free(in6m, M_IP6MADDR);
+	ip6 = mtod(mh, struct ip6_hdr *);
+	ip6->ip6_flow = 0;
+	ip6->ip6_vfc &= ~IPV6_VERSION_MASK;
+	ip6->ip6_vfc |= IPV6_VERSION;
+	ip6->ip6_nxt = IPPROTO_ICMPV6;
+	ip6->ip6_src = ia ? ia->ia_addr.sin6_addr : in6addr_any;
+	ip6->ip6_dst = in6addr_linklocal_allv2routers;
+	/* scope ID will be set in netisr */
+
+	mld = (struct mldv2_report *)(ip6 + 1);
+	mld->mld_type = MLDV2_LISTENER_REPORT;
+	mld->mld_code = 0;
+	mld->mld_cksum = 0;
+	mld->mld_v2_reserved = 0;
+	mld->mld_v2_numrecs = htons(m->m_pkthdr.PH_vt.vt_nrecs);
+	m->m_pkthdr.PH_vt.vt_nrecs = 0;
+
+	mh->m_next = m;
+	mld->mld_cksum = in6_cksum(mh, IPPROTO_ICMPV6,
+	    sizeof(struct ip6_hdr), sizeof(struct mldv2_report) + mldreclen);
+
+	return (mh);
+}
+
+#ifdef KTR
+static char *
+mld_rec_type_to_str(const int type)
+{
-		if_delmulti_ifma(ifma);
+	switch (type) {
+		case MLD_CHANGE_TO_EXCLUDE_MODE:
+			return "TO_EX";
+			break;
+		case MLD_CHANGE_TO_INCLUDE_MODE:
+			return "TO_IN";
+			break;
+		case MLD_MODE_IS_EXCLUDE:
+			return "MODE_EX";
+			break;
+		case MLD_MODE_IS_INCLUDE:
+			return "MODE_IN";
+			break;
+		case MLD_ALLOW_NEW_SOURCES:
+			return "ALLOW_NEW";
+			break;
+		case MLD_BLOCK_OLD_SOURCES:
+			return "BLOCK_OLD";
+			break;
+		default:
+			break;
 	}
+	return "unknown";
+}
+#endif
+
+static void
+mld_sysinit(void)
+{
+
+	CTR1(KTR_MLD, "%s: initializing", __func__);
+	MLD_LOCK_INIT();
+
+	ip6_initpktopts(&mld_po);
+	mld_po.ip6po_hlim = 1;
+	mld_po.ip6po_hbh = &mld_ra.hbh;
+	mld_po.ip6po_prefer_tempaddr = IP6PO_TEMPADDR_NOTPREFER;
+	mld_po.ip6po_flags = IP6PO_DONTFRAG;
 }
+
+static void
+mld_sysuninit(void)
+{
+
+	CTR1(KTR_MLD, "%s: tearing down", __func__);
+	MLD_LOCK_DESTROY();
+}
+
+/*
+ * Initialize an MLDv2 instance.
+ * VIMAGE: Assumes curvnet set by caller and called per vimage.
+ */
+static int
+vnet_mld_iattach(const void *unused __unused)
+{
+	INIT_VNET_INET6(curvnet);
+
+	CTR1(KTR_MLD, "%s: initializing", __func__);
+
+	LIST_INIT(&V_mli_head);
+
+	V_current_state_timers_running6 = 0;
+	V_interface_timers_running6 = 0;
+	V_state_change_timers_running6 = 0;
+
+	/*
+	 * Initialize sysctls to default values.
+	 */
+	V_mld_gsrdelay.tv_sec = 10;
+	V_mld_gsrdelay.tv_usec = 0;
+
+	return (0);
+}
+
+static int
+vnet_mld_idetach(const void *unused __unused)
+{
+	INIT_VNET_INET6(curvnet);
+
+	CTR1(KTR_MLD, "%s: tearing down", __func__);
+
+	KASSERT(LIST_EMPTY(&V_mli_head),
+	    ("%s: mli list not empty; ifnets not detached?", __func__));
+
+	return (0);
+}
+
+#ifndef VIMAGE_GLOBALS
+static vnet_modinfo_t vnet_mld_modinfo = {
+	.vmi_id = VNET_MOD_MLD,
+	.vmi_name = "mld",
+	.vmi_dependson = VNET_MOD_INET6,
+	.vmi_iattach = vnet_mld_iattach,
+	.vmi_idetach = vnet_mld_idetach
+};
+#endif
+
+static int
+mld_modevent(module_t mod, int type, void *unused __unused)
+{
+
+	switch (type) {
+	case MOD_LOAD:
+		mld_sysinit();
+#ifndef VIMAGE_GLOBALS
+		vnet_mod_register(&vnet_mld_modinfo);
+#else
+		vnet_mld_iattach(NULL);
+#endif
+		break;
+	case MOD_UNLOAD:
+#ifndef VIMAGE_GLOBALS
+#ifdef NOTYET
+		vnet_mod_deregister(&vnet_mld_modinfo);
+#endif
+#else
+		vnet_mld_idetach(NULL);
+#endif
+		mld_sysuninit();
+		break;
+	default:
+		return (EOPNOTSUPP);
+	}
+	return (0);
+}
+
+static moduledata_t mld_mod = {
+	"mld",
+	mld_modevent,
+	0
+};
+DECLARE_MODULE(mld, mld_mod, SI_SUB_PSEUDO, SI_ORDER_ANY);
diff --git a/sys/netinet6/mld6_var.h b/sys/netinet6/mld6_var.h
index 4f51b99..efd01ab 100644
--- a/sys/netinet6/mld6_var.h
+++ b/sys/netinet6/mld6_var.h
@@ -1,6 +1,5 @@
 /*-
- * Copyright (C) 1998 WIDE Project.
- * All rights reserved.
+ * Copyright (c) 2009 Bruce Simpson.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -10,14 +9,14 @@
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
- * 3. Neither the name of the project nor the names of its contributors
- *    may be used to endorse or promote products derived from this software
- *    without specific prior written permission.
+ * 3. The name of the author may not be used to endorse or promote
+ *    products derived from this software without specific prior written
+ *    permission.
  *
- * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
@@ -26,29 +25,139 @@
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
- * $KAME: mld6_var.h,v 1.4 2000/03/25 07:23:54 sumikawa Exp $
  * $FreeBSD$
  */
-
 #ifndef _NETINET6_MLD6_VAR_H_
 #define _NETINET6_MLD6_VAR_H_
+/*
+ * Multicast Listener Discovery (MLD)
+ * implementation-specific definitions.
+ */
+
 #ifdef _KERNEL
-#define MLD_RANDOM_DELAY(X) (arc4random() % (X) + 1)
+/*
+ * Per-link MLD state.
+ */
+struct mld_ifinfo {
+	LIST_ENTRY(mld_ifinfo) mli_link;
+	struct ifnet *mli_ifp; /* interface this instance belongs to */
+	uint32_t mli_version; /* MLDv1 Host Compatibility Mode */
+	uint32_t mli_v1_timer; /* MLDv1 Querier Present timer (s) */
+	uint32_t mli_v2_timer; /* MLDv2 General Query (interface) timer (s)*/
+	uint32_t mli_flags; /* MLD per-interface flags */
+	uint32_t mli_rv; /* MLDv2 Robustness Variable */
+	uint32_t mli_qi; /* MLDv2 Query Interval (s) */
+	uint32_t mli_qri; /* MLDv2 Query Response Interval (s) */
+	uint32_t mli_uri; /* MLDv2 Unsolicited Report Interval (s) */
+	SLIST_HEAD(,in6_multi) mli_relinmhead; /* released groups */
+	struct ifqueue mli_gq; /* queue of general query responses */
+};
+#define MLIF_SILENT 0x00000001 /* Do not use MLD on this ifp */
+
+#define MLD_RANDOM_DELAY(X) (arc4random() % (X) + 1)
+#define MLD_MAX_STATE_CHANGES 24 /* Max pending changes per group */
 /*
- * States for MLD stop-listening processing
+ * MLD per-group states.
  */
-#define MLD_OTHERLISTENER 0
-#define MLD_IREPORTEDLAST 1
-#define MLD_REPORTPENDING 2 /* implementation specific */
+#define MLD_NOT_MEMBER 0 /* Can garbage collect group */
+#define MLD_SILENT_MEMBER 1 /* Do not perform MLD for group */
+#define MLD_REPORTING_MEMBER 2 /* MLDv1 we are reporter */
+#define MLD_IDLE_MEMBER 3 /* MLDv1 we reported last */
+#define MLD_LAZY_MEMBER 4 /* MLDv1 other member reporting */
+#define MLD_SLEEPING_MEMBER 5 /* MLDv1 start query response */
+#define MLD_AWAKENING_MEMBER 6 /* MLDv1 group timer will start */
+#define MLD_G_QUERY_PENDING_MEMBER 7 /* MLDv2 group query pending */
+#define MLD_SG_QUERY_PENDING_MEMBER 8 /* MLDv2 source query pending */
+#define MLD_LEAVING_MEMBER 9 /* MLDv2 dying gasp (pending last */
+					/* retransmission of INCLUDE {}) */
+
+/*
+ * MLD version tag.
+ */
+#define MLD_VERSION_NONE 0 /* Invalid */
+#define MLD_VERSION_1 1
+#define MLD_VERSION_2 2 /* Default */
+
+/*
+ * MLDv2 protocol control variables.
+ */
+#define MLD_RV_INIT 2 /* Robustness Variable */
+#define MLD_RV_MIN 1
+#define MLD_RV_MAX 7
+
+#define MLD_QI_INIT 125 /* Query Interval (s) */
+#define MLD_QI_MIN 1
+#define MLD_QI_MAX 255
+
+#define MLD_QRI_INIT 10 /* Query Response Interval (s) */
+#define MLD_QRI_MIN 1
+#define MLD_QRI_MAX 255
+
+#define MLD_URI_INIT 3 /* Unsolicited Report Interval (s) */
+#define MLD_URI_MIN 0
+#define MLD_URI_MAX 10
+
+#define MLD_MAX_GS_SOURCES 256 /* # of sources in rx GS query */
+#define MLD_MAX_G_GS_PACKETS 8 /* # of packets to answer G/GS */
+#define MLD_MAX_STATE_CHANGE_PACKETS 8 /* # of packets per state change */
+#define MLD_MAX_RESPONSE_PACKETS 16 /* # of packets for general query */
+#define MLD_MAX_RESPONSE_BURST 4 /* # of responses to send at once */
+#define MLD_RESPONSE_BURST_INTERVAL (PR_FASTHZ / 2) /* 500ms */
+
+/*
+ * MLD-specific mbuf flags.
+ */
+#define M_MLDV1 M_PROTO1 /* Packet is MLDv1 */
+#define M_GROUPREC M_PROTO3 /* mbuf chain is a group record */
+
+/*
+ * Leading space for MLDv2 reports inside MTU.
+ *
+ * NOTE: This differs from IGMPv3 significantly. KAME IPv6 requires
+ * that a fully formed mbuf chain *without* the Router Alert option
+ * is passed to ip6_output(), however we must account for it in the
+ * MTU if we need to split an MLDv2 report into several packets.
+ *
+ * We now put the MLDv2 report header in the initial mbuf containing
+ * the IPv6 header.
+ */
+#define MLD_MTUSPACE (sizeof(struct ip6_hdr) + sizeof(struct mld_raopt) + \
+		    sizeof(struct icmp6_hdr))
+
+/*
+ * Subsystem lock macros.
+ * The MLD lock is only taken with MLD. Currently it is system-wide.
+ * VIMAGE: The lock could be pushed to per-VIMAGE granularity in future.
+ */
+#define MLD_LOCK_INIT() mtx_init(&mld_mtx, "mld_mtx", NULL, MTX_DEF)
+#define MLD_LOCK_DESTROY() mtx_destroy(&mld_mtx)
+#define MLD_LOCK() mtx_lock(&mld_mtx)
+#define MLD_LOCK_ASSERT() mtx_assert(&mld_mtx, MA_OWNED)
+#define MLD_UNLOCK() mtx_unlock(&mld_mtx)
+#define MLD_UNLOCK_ASSERT() mtx_assert(&mld_mtx, MA_NOTOWNED)
+
+/*
+ * Per-link MLD context.
+ */
+#define MLD_IFINFO(ifp) \
+	(((struct in6_ifextra *)(ifp)->if_afdata[AF_INET6])->mld_ifinfo)
+
+int mld_change_state(struct in6_multi *, const int);
+struct mld_ifinfo *
+	mld_domifattach(struct ifnet *);
+void mld_domifdetach(struct ifnet *);
+void mld_fasttimo(void);
+void mld_ifdetach(struct ifnet *);
+int mld_input(struct mbuf *, int, int);
+void mld_slowtimo(void);
+
+#ifdef SYSCTL_DECL
+SYSCTL_DECL(_net_inet6_mld);
+#endif
-void mld6_init(void);
-void mld6_input(struct mbuf *, int);
-void mld6_start_listening(struct in6_multi *);
-void mld6_stop_listening(struct in6_multi *);
-void mld6_fasttimeo(void);
 #endif /* _KERNEL */
 #endif /* _NETINET6_MLD6_VAR_H_ */
diff --git a/sys/netinet6/raw_ip6.c b/sys/netinet6/raw_ip6.c
index 2ac95e5..c340ffd 100644
--- a/sys/netinet6/raw_ip6.c
+++ b/sys/netinet6/raw_ip6.c
@@ -128,9 +128,20 @@ extern u_long rip_sendspace;
 extern u_long rip_recvspace;
 /*
- * Hooks for multicast forwarding.
+ * Hooks for multicast routing. They all default to NULL, so leave them not
+ * initialized and rely on BSS being set to 0.
+ */
+
+/*
+ * The socket used to communicate with the multicast routing daemon.
+ */
+#ifdef VIMAGE_GLOBALS
+struct socket *ip6_mrouter;
+#endif
+
+/*
+ * The various mrouter functions.
  */
-struct socket *ip6_mrouter = NULL;
 int (*ip6_mrouter_set)(struct socket *, struct sockopt *);
 int (*ip6_mrouter_get)(struct socket *, struct sockopt *);
 int (*ip6_mrouter_done)(void);
@@ -149,6 +160,7 @@ rip6_input(struct mbuf **mp, int *offp, int proto)
 #ifdef IPSEC
 	INIT_VNET_IPSEC(curvnet);
 #endif
+	struct ifnet *ifp;
 	struct mbuf *m = *mp;
 	register struct ip6_hdr *ip6 = mtod(m, struct ip6_hdr *);
 	register struct inpcb *in6p;
@@ -166,6 +178,8 @@ rip6_input(struct mbuf **mp, int *offp, int proto)
 	init_sin6(&fromsa, m); /* general init */
+	ifp = m->m_pkthdr.rcvif;
+
 	INP_INFO_RLOCK(&V_ripcbinfo);
 	LIST_FOREACH(in6p, &V_ripcb, inp_list) {
 		/* XXX inp locking */
@@ -180,9 +194,17 @@ rip6_input(struct mbuf **mp, int *offp, int proto)
 		if (!IN6_IS_ADDR_UNSPECIFIED(&in6p->in6p_faddr) &&
 		    !IN6_ARE_ADDR_EQUAL(&in6p->in6p_faddr, &ip6->ip6_src))
 			continue;
-		if (prison_check_ip6(in6p->inp_cred, &ip6->ip6_dst) != 0)
-			continue;
-		INP_RLOCK(in6p);
+		if (jailed(in6p->inp_cred)) {
+			/*
+			 * Allow raw socket in jail to receive multicast;
+			 * assume process had PRIV_NETINET_RAW at attach,
+			 * and fall through into normal filter path if so.
+			 */
+			if (!IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst) &&
+			    prison_check_ip6(in6p->inp_cred,
+			    &ip6->ip6_dst) != 0)
+				continue;
+		}
 		if (in6p->in6p_cksum != -1) {
 			V_rip6stat.rip6s_isum++;
 			if (in6_cksum(m, proto, *offp,
@@ -192,6 +214,31 @@ rip6_input(struct mbuf **mp, int *offp, int proto)
 				continue;
 			}
 		}
+		INP_RLOCK(in6p);
+		/*
+		 * If this raw socket has multicast state, and we
+		 * have received a multicast, check if this socket
+		 * should receive it, as multicast filtering is now
+		 * the responsibility of the transport layer.
+		 */
+		if (in6p->in6p_moptions &&
+		    IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) {
+			struct sockaddr_in6 mcaddr;
+			int blocked;
+
+			bzero(&mcaddr, sizeof(struct sockaddr_in6));
+			mcaddr.sin6_len = sizeof(struct sockaddr_in6);
+			mcaddr.sin6_family = AF_INET6;
+			mcaddr.sin6_addr = ip6->ip6_dst;
+
+			blocked = im6o_mc_filter(in6p->in6p_moptions, ifp,
+			    (struct sockaddr *)&mcaddr,
+			    (struct sockaddr *)&fromsa);
+			if (blocked != MCAST_PASS) {
+				IP6STAT_INC(ip6s_notmember);
+				continue;
+			}
+		}
 		if (last != NULL) {
 			struct mbuf *n = m_copy(m, 0, (int)M_COPYALL);
@@ -604,13 +651,13 @@ rip6_attach(struct socket *so, int proto, struct thread *td)
 static void
 rip6_detach(struct socket *so)
 {
-	INIT_VNET_INET(so->so_vnet);
+	INIT_VNET_INET6(so->so_vnet);
 	struct inpcb *inp;
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("rip6_detach: inp == NULL"));
-	if (so == ip6_mrouter && ip6_mrouter_done)
+	if (so == V_ip6_mrouter && ip6_mrouter_done)
 		ip6_mrouter_done();
 	/* xxx: RSVP */
 	INP_INFO_WLOCK(&V_ripcbinfo);
diff --git a/sys/netinet6/udp6_usrreq.c b/sys/netinet6/udp6_usrreq.c
index 566cf92..5393740 100644
--- a/sys/netinet6/udp6_usrreq.c
+++ b/sys/netinet6/udp6_usrreq.c
@@ -177,6 +177,7 @@ udp6_input(struct mbuf **mp, int *offp, int proto)
 	INIT_VNET_INET(curvnet);
 	INIT_VNET_INET6(curvnet);
 	struct mbuf *m = *mp;
+	struct ifnet *ifp;
 	struct ip6_hdr *ip6;
 	struct udphdr *uh;
 	struct inpcb *inp;
@@ -184,6 +185,7 @@ udp6_input(struct mbuf **mp, int *offp, int proto)
 	int plen, ulen;
 	struct sockaddr_in6 fromsa;
+	ifp = m->m_pkthdr.rcvif;
 	ip6 = mtod(m, struct ip6_hdr *);
 	if (faithprefix_p != NULL && (*faithprefix_p)(&ip6->ip6_dst)) {
@@ -239,6 +241,7 @@ udp6_input(struct mbuf **mp, int *offp, int proto)
 	INP_INFO_RLOCK(&V_udbinfo);
 	if (IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) {
 		struct inpcb *last;
+		struct ip6_moptions *imo;
 		/*
 		 * In the event that laddr should be set to the link-local
@@ -261,12 +264,6 @@ udp6_input(struct mbuf **mp, int *offp, int proto)
 				continue;
 			if (inp->inp_lport != uh->uh_dport)
 				continue;
-			/*
-			 * XXX: Do not check source port of incoming datagram
-			 * unless inp_connect() has been called to bind the
-			 * fport part of the 4-tuple; the source could be
-			 * trying to talk to us with an ephemeral port.
-			 */
 			if (inp->inp_fport != 0 &&
 			    inp->inp_fport != uh->uh_sport)
 				continue;
@@ -282,6 +279,35 @@ udp6_input(struct mbuf **mp, int *offp, int proto)
 					continue;
 			}
+			INP_RLOCK(inp);
+
+			/*
+			 * Handle socket delivery policy for any-source
+			 * and source-specific multicast. [RFC3678]
+			 */
+			imo = inp->in6p_moptions;
+			if (imo && IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) {
+				struct sockaddr_in6 mcaddr;
+				int blocked;
+
+				bzero(&mcaddr, sizeof(struct sockaddr_in6));
+				mcaddr.sin6_len = sizeof(struct sockaddr_in6);
+				mcaddr.sin6_family = AF_INET6;
+				mcaddr.sin6_addr = ip6->ip6_dst;
+
+				blocked = im6o_mc_filter(imo, ifp,
+				    (struct sockaddr *)&mcaddr,
+				    (struct sockaddr *)&fromsa);
+				if (blocked != MCAST_PASS) {
+					if (blocked == MCAST_NOTGMEMBER)
+						IP6STAT_INC(ip6s_notmember);
+					if (blocked == MCAST_NOTSMEMBER ||
+					    blocked == MCAST_MUTED)
+						UDPSTAT_INC(udps_filtermcast);
+					INP_RUNLOCK(inp);
+					continue;
+				}
+			}
 			if (last != NULL) {
 				struct mbuf *n;
@@ -397,6 +423,8 @@ udp6_input(struct mbuf **mp, int *offp, int proto)
 	return (IPPROTO_DONE);
 badheadlocked:
+	if (inp)
+		INP_RUNLOCK(inp);
 	INP_INFO_RUNLOCK(&V_udbinfo);
 badunlocked:
 	if (m)
diff --git a/sys/netinet6/vinet6.h b/sys/netinet6/vinet6.h
index 0920200..76bbec9 100644
--- a/sys/netinet6/vinet6.h
+++ b/sys/netinet6/vinet6.h
@@ -145,6 +145,7 @@ struct vnet_inet6 {
 	u_int32_t _ip6_temp_preferred_lifetime;
 	u_int32_t _ip6_temp_valid_lifetime;
+	struct socket * _ip6_mrouter;
 	int _ip6_mrouter_ver;
 	int _pim6;
 	u_int _mrt6debug;
@@ -153,6 +154,12 @@ struct vnet_inet6 {
 	int _ip6_use_defzone;
 	struct ip6_pktopts _ip6_opts;
+
+	struct timeval _mld_gsrdelay;
+	LIST_HEAD(, mld_ifinfo) _mli_head;
+	int _interface_timers_running6;
+	int _state_change_timers_running6;
+	int _current_state_timers_running6;
 };
 /* Size guard. See sys/vimage.h. */
@@ -173,6 +180,8 @@ extern struct vnet_inet6 vnet_inet6_0;
  * Symbol translation macros
  */
 #define V_addrsel_policytab VNET_INET6(addrsel_policytab)
+#define V_current_state_timers_running6 \
+	VNET_INET6(current_state_timers_running6)
 #define V_dad_ignore_ns VNET_INET6(dad_ignore_ns)
 #define V_dad_init VNET_INET6(dad_init)
 #define V_dad_maxtry VNET_INET6(dad_maxtry)
@@ -190,6 +199,8 @@ extern struct vnet_inet6 vnet_inet6_0;
 #define V_in6_ifaddr VNET_INET6(in6_ifaddr)
 #define V_in6_maxmtu VNET_INET6(in6_maxmtu)
 #define V_in6_tmpaddrtimer_ch VNET_INET6(in6_tmpaddrtimer_ch)
+#define V_interface_timers_running6 \
+	VNET_INET6(interface_timers_running6)
 #define V_ip6_accept_rtadv VNET_INET6(ip6_accept_rtadv)
 #define V_ip6_auto_flowlabel VNET_INET6(ip6_auto_flowlabel)
 #define V_ip6_auto_linklocal VNET_INET6(ip6_auto_linklocal)
@@ -205,6 +216,7 @@ extern struct vnet_inet6 vnet_inet6_0;
 #define V_ip6_maxfragpackets VNET_INET6(ip6_maxfragpackets)
 #define V_ip6_maxfrags VNET_INET6(ip6_maxfrags)
 #define V_ip6_mcast_pmtu VNET_INET6(ip6_mcast_pmtu)
+#define V_ip6_mrouter VNET_INET6(ip6_mrouter)
 #define V_ip6_mrouter_ver VNET_INET6(ip6_mrouter_ver)
 #define V_ip6_opts VNET_INET6(ip6_opts)
 #define V_ip6_prefer_tempaddr VNET_INET6(ip6_prefer_tempaddr)
@@ -223,6 +235,8 @@ extern struct vnet_inet6 vnet_inet6_0;
 #define V_ip6stealth VNET_INET6(ip6stealth)
 #define V_llinfo_nd6 VNET_INET6(llinfo_nd6)
 #define V_mrt6debug VNET_INET6(mrt6debug)
+#define V_mld_gsrdelay VNET_INET6(mld_gsrdelay)
+#define V_mli_head VNET_INET6(mli_head)
 #define V_nd6_allocated VNET_INET6(nd6_allocated)
 #define V_nd6_debug VNET_INET6(nd6_debug)
 #define V_nd6_defifindex VNET_INET6(nd6_defifindex)
@@ -256,6 +270,8 @@ extern struct vnet_inet6 vnet_inet6_0;
 #define V_rtq_timer6 VNET_INET6(rtq_timer6)
 #define V_rtq_toomany6 VNET_INET6(rtq_toomany6)
 #define V_sid_default VNET_INET6(sid_default)
+#define V_state_change_timers_running6 \
+	VNET_INET6(state_change_timers_running6)
 #define V_udp6_recvspace VNET_INET6(udp6_recvspace)
 #define V_udp6_sendspace VNET_INET6(udp6_sendspace)
diff --git a/sys/sys/param.h b/sys/sys/param.h
index 649db22..91f153b 100644
--- a/sys/sys/param.h
+++ b/sys/sys/param.h
@@ -57,7 +57,7 @@
  * is created, otherwise 1.
  */
 #undef __FreeBSD_version
-#define __FreeBSD_version 800083 /* Master, propagated to newvers */
+#define __FreeBSD_version 800084 /* Master, propagated to newvers */
 #ifndef LOCORE
 #include <sys/types.h>
diff --git a/usr.sbin/ifmcstat/ifmcstat.c b/usr.sbin/ifmcstat/ifmcstat.c
index 34068ea..0aea6fd 100644
--- a/usr.sbin/ifmcstat/ifmcstat.c
+++ b/usr.sbin/ifmcstat/ifmcstat.c
@@ -167,6 +167,7 @@ static void in_ifinfo(struct igmp_ifinfo *);
 static const char * inm_mode(u_int mode);
 #endif
 #ifdef INET6
+static void in6_ifinfo(struct mld_ifinfo *);
 static const char * inet6_n2a(struct in6_addr *);
 #endif
 int main(int, char **);
@@ -441,8 +442,35 @@ ll_addrlist(struct ifaddr *ifap)
 #ifdef INET6
 static void
+in6_ifinfo(struct mld_ifinfo *mli)
+{
+
+	printf("\t");
+	switch (mli->mli_version) {
+	case MLD_VERSION_1:
+	case MLD_VERSION_2:
+		printf("mldv%d", mli->mli_version);
+		break;
+	default:
+		printf("mldv?(%d)", mli->mli_version);
+		break;
+	}
+	printb(" flags", mli->mli_flags, "\020\1SILENT");
+	if (mli->mli_version == MLD_VERSION_2) {
+		printf(" rv %u qi %u qri %u uri %u",
+		    mli->mli_rv, mli->mli_qi, mli->mli_qri, mli->mli_uri);
+	}
+	if (vflag >= 2) {
+		printf(" v1timer %u v2timer %u", mli->mli_v1_timer,
+		    mli->mli_v2_timer);
+	}
+	printf("\n");
+}
+
+static void
 if6_addrlist(struct ifaddr *ifap)
 {
+	struct ifnet ifnet;
 	struct ifaddr ifa;
 	struct sockaddr sa;
 	struct in6_ifaddr if6a;
@@ -460,6 +488,21 @@ if6_addrlist(struct ifaddr *ifap)
 			goto nextifap;
 		KREAD(ifap, &if6a, struct in6_ifaddr);
 		printf("\tinet6 %s\n", inet6_n2a(&if6a.ia_addr.sin6_addr));
+		/*
+		 * Print per-link MLD information, if available.
+		 */
+		if (ifa.ifa_ifp != NULL) {
+			struct in6_ifextra ie;
+			struct mld_ifinfo mli;
+
+			KREAD(ifa.ifa_ifp, &ifnet, struct ifnet);
+			KREAD(ifnet.if_afdata[AF_INET6], &ie,
+			    struct in6_ifextra);
+			if (ie.mld_ifinfo != NULL) {
+				KREAD(ie.mld_ifinfo, &mli, struct mld_ifinfo);
+				in6_ifinfo(&mli);
+			}
+		}
 	nextifap:
 		ifap = ifa.ifa_link.tqe_next;
 	}
@@ -842,6 +885,110 @@ out_free:
 #endif /* INET */
+#ifdef INET6
+/*
+ * Retrieve MLD per-group source filter mode and lists via sysctl.
+ *
+ * Note: The 128-bit IPv6 group addres needs to be segmented into
+ * 32-bit pieces for marshaling to sysctl. So the MIB name ends
+ * up looking like this:
+ * a.b.c.d.e.ifindex.g[0].g[1].g[2].g[3]
+ * Assumes that pgroup originated from the kernel, so its components
+ * are already in network-byte order.
+ */
+static void
+in6m_print_sources_sysctl(uint32_t ifindex, struct in6_addr *pgroup)
+{
+#define MAX_SYSCTL_TRY 5
+	char addrbuf[INET6_ADDRSTRLEN];
+	int mib[10];
+	int ntry = 0;
+	int *pi;
+	size_t mibsize;
+	size_t len;
+	size_t needed;
+	size_t cnt;
+	int i;
+	char *buf;
+	struct in6_addr *pina;
+	uint32_t *p;
+	uint32_t fmode;
+	const char *modestr;
+
+	mibsize = sizeof(mib) / sizeof(mib[0]);
+	if (sysctlnametomib("net.inet6.ip6.mcast.filters", mib,
+	    &mibsize) == -1) {
+		perror("sysctlnametomib");
+		return;
+	}
+
+	needed = 0;
+	mib[5] = ifindex;
+	pi = (int *)pgroup;
+	for (i = 0; i < 4; i++)
+		mib[6 + i] = *pi++;
+
+	mibsize = sizeof(mib) / sizeof(mib[0]);
+	do {
+		if (sysctl(mib, mibsize, NULL, &needed, NULL, 0) == -1) {
+			perror("sysctl net.inet6.ip6.mcast.filters");
+			return;
+		}
+		if ((buf = malloc(needed)) == NULL) {
+			perror("malloc");
+			return;
+		}
+		if (sysctl(mib, mibsize, buf, &needed, NULL, 0) == -1) {
+			if (errno != ENOMEM || ++ntry >= MAX_SYSCTL_TRY) {
+				perror("sysctl");
+				goto out_free;
+			}
+			free(buf);
+			buf = NULL;
+		}
+	} while (buf == NULL);
+
+	len = needed;
+	if (len < sizeof(uint32_t)) {
+		perror("sysctl");
+		goto out_free;
+	}
+
+	p = (uint32_t *)buf;
+	fmode = *p++;
+	len -= sizeof(uint32_t);
+
+	modestr = inm_mode(fmode);
+	if (modestr)
+		printf(" mode %s", modestr);
+	else
+		printf(" mode (%u)", fmode);
+
+	if (vflag == 0)
+		goto out_free;
+
+	cnt = len / sizeof(struct in6_addr);
+	pina = (struct in6_addr *)p;
+
+	for (i = 0; i < cnt; i++) {
+		if (i == 0)
+			printf(" srcs ");
+		inet_ntop(AF_INET6, (const char *)pina++, addrbuf,
+		    INET6_ADDRSTRLEN);
+		fprintf(stdout, "%s%s", (i == 0 ? "" : ","), addrbuf);
+		len -= sizeof(struct in6_addr);
+	}
+	if (len > 0) {
+		fprintf(stderr, "warning: %u trailing bytes from %s\n",
+		    (unsigned int)len, "net.inet6.ip6.mcast.filters");
+	}
+
+out_free:
+	free(buf);
+#undef MAX_SYSCTL_TRY
+}
+#endif /* INET6 */
+
 static int
 ifmcstat_getifmaddrs(void)
 {
@@ -1015,6 +1162,33 @@ ifmcstat_getifmaddrs(void)
 				}
 				in_ifinfo(&igi);
 			}
+#endif /* INET */
+#ifdef INET6
+			/*
+			 * Print per-link MLD information, if available.
+			 */
+			if (pifasa->sa.sa_family == AF_INET6) {
+				struct mld_ifinfo mli;
+				size_t mibsize, len;
+				int mib[5];
+
+				mibsize = sizeof(mib) / sizeof(mib[0]);
+				if (sysctlnametomib("net.inet6.mld.ifinfo",
+				    mib, &mibsize) == -1) {
+					perror("sysctlnametomib");
+					goto next_ifnet;
+				}
+				mib[mibsize] = thisifindex;
+				len = sizeof(struct mld_ifinfo);
+				if (sysctl(mib, mibsize + 1, &mli, &len, NULL,
+				    0) == -1) {
+					perror("sysctl net.inet6.mld.ifinfo");
+					goto next_ifnet;
+				}
+				in6_ifinfo(&mli);
+			}
+#endif /* INET6 */
+#if defined(INET) || defined(INET6)
 next_ifnet:
 #endif
 			lastifasa = *pifasa;
@@ -1041,6 +1215,12 @@ next_ifnet:
 			    pgsa->sin.sin_addr);
 		}
 #endif
+#ifdef INET6
+		if (pgsa->sa.sa_family == AF_INET6) {
+			in6m_print_sources_sysctl(thisifindex,
+			    &pgsa->sin6.sin6_addr);
+		}
+#endif
 		fprintf(stdout, "\n");
 		/* Link-layer mapping, if present. */ |
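
The comment on in6m_print_sources_sysctl() describes the MIB name as a.b.c.d.e.ifindex.g[0].g[1].g[2].g[3], with the 128-bit group address split into four 32-bit pieces that stay in network byte order. The sketch below illustrates that layout in portable userland C. It is not code from this commit: pack_group_mib(), unpack_group_mib() and GROUP_MIB_LEN are hypothetical names, memcpy() stands in for the `(int *)pgroup` cast used in the patch so that alignment and aliasing are not a concern, and the sketch assumes a 32-bit `int`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical constant: 5 name components + ifindex + 4 address words. */
#define GROUP_MIB_LEN	10

/*
 * Hypothetical helper: store the interface index in mib[5] and the four
 * 32-bit words of the group address in mib[6..9]. The bytes are copied
 * as-is, so each word stays in network byte order.
 */
static void
pack_group_mib(int mib[GROUP_MIB_LEN], uint32_t ifindex,
    const struct in6_addr *group)
{
	int i;

	/* Assumption of this sketch: one MIB component holds 32 bits. */
	assert(sizeof(int) == sizeof(uint32_t));

	mib[5] = (int)ifindex;
	for (i = 0; i < 4; i++)
		memcpy(&mib[6 + i], &group->s6_addr[i * 4], sizeof(uint32_t));
}

/*
 * Hypothetical inverse: rebuild the group address from mib[6..9].
 */
static void
unpack_group_mib(const int mib[GROUP_MIB_LEN], struct in6_addr *group)
{
	int i;

	for (i = 0; i < 4; i++)
		memcpy(&group->s6_addr[i * 4], &mib[6 + i], sizeof(uint32_t));
}
```

In the patch, the first five components are resolved with sysctlnametomib("net.inet6.ip6.mcast.filters", ...) before the index and address words are filled in, and the completed 10-element array is then passed to sysctl().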