author | julian <julian@FreeBSD.org> | 2008-05-09 23:03:00 +0000 |
---|---|---|
committer | julian <julian@FreeBSD.org> | 2008-05-09 23:03:00 +0000 |
commit | 1dfc5c98a4f7c32163dfdc61e390ccf805385108 (patch) | |
tree | 1bc85679564ad62b5790f35580ebdcc21ca90f8b /sys/netinet | |
parent | 4c2d9b2a516115af830b3ee9304b28c73e0bb9d7 (diff) | |
Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x).
Currently the only protocol that can make use of the multiple tables is IPv4.
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on, is "policy based routing", which allows
different packet streams to be routed by more than just the
destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x), but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled during the lifetime of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
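The row/column layout described above can be sketched in user-space C.
This is only an illustration under stated assumptions: every name with a
sketch_ prefix, the stand-in struct, and the array bounds are hypothetical
and are not taken from the committed kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXFIBS  16  /* hypothetical bound; the first commit is limited to 16 tables */
#define SKETCH_AF_MAX   38  /* hypothetical stand-in for the kernel's AF_MAX */
#define SKETCH_AF_INET   2  /* hypothetical stand-in for AF_INET */
#define SKETCH_AF_INET6 28  /* hypothetical stand-in for AF_INET6 */

/* Stand-in for the per-family table head structure. */
struct sketch_rnh {
	int id;
};

/* Row = FIB number (Y), column = protocol family (X). */
static struct sketch_rnh *sketch_rt_tables[SKETCH_MAXFIBS][SKETCH_AF_MAX + 1];

/*
 * Return the table head for (fib, family).
 * Every family except IPv4 is pinned to row 0, so code that is unaware
 * of the extra rows keeps seeing what looks like the old 1-D array.
 */
static struct sketch_rnh *
sketch_table_for(unsigned fib, int family)
{
	if (family < 0 || family > SKETCH_AF_MAX)
		return NULL;
	if (family != SKETCH_AF_INET)
		fib = 0;
	if (fib >= SKETCH_MAXFIBS)
		return NULL;
	return sketch_rt_tables[fib][family];
}
```

A caller asking for an IPv6 table with FIB 3 gets the row 0 entry, while
an IPv4 caller with FIB 3 gets the row 3 entry.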
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the address family being
looked up. If the protocol is not IPv4 they call rtalloc() (and
friends), forcing the action to row 0; if it IS IPv4 (and that info
is available) they go to the appropriate row. These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
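The three layers of entry points described above can be sketched as
follows. This is a simplified model of the description in this message,
not the committed code (in the diff the in_*() wrappers currently call
the _fib variants). All sketch_ names and constants are hypothetical, and
the "lookup" merely reports which row would be searched.

```c
#include <assert.h>

#define SKETCH_AF_INET   2  /* hypothetical stand-in for AF_INET */
#define SKETCH_AF_INET6 28  /* hypothetical stand-in for AF_INET6 */

/* Stand-in for a real lookup: just report the row that would be used. */
static unsigned
sketch_lookup_row(int family, unsigned row)
{
	(void)family;
	return row;
}

/* Legacy entry point: signature unchanged, always refers to row 0. */
static unsigned
sketch_rtalloc1(int family)
{
	return sketch_lookup_row(family, 0);
}

/* IPv4-only entry point: the extra argument selects the row. */
static unsigned
sketch_in_rtalloc1(unsigned fibnum)
{
	return sketch_lookup_row(SKETCH_AF_INET, fibnum);
}

/*
 * Protocol-independent entry point: checks the address family and
 * forces everything except IPv4 to row 0.
 */
static unsigned
sketch_rtalloc1_fib(int family, unsigned fibnum)
{
	if (family != SKETCH_AF_INET)
		return sketch_rtalloc1(family);
	return sketch_in_rtalloc1(fibnum);
}
```

The point of the layering is that existing callers of the legacy
functions need no changes, while protocol-neutral code can pass a FIB
number without knowing whether the family honours it.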
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice.
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0
(or possibly a number settable in a sysctl, not yet implemented),
but prior to routing the firewall can inspect them (see below).
(Possibly in the future you may be able to associate a FIB
with packets received on an interface via an ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
Thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
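The precedence among the first three classes above (default of 0, then a
socket/PCB value, then a classifier override) can be sketched as a small
function. This is an illustrative model only; the struct and function
names are hypothetical and do not appear in the committed code.

```c
#include <assert.h>

/* Hypothetical summary of what is known about one outgoing packet. */
struct sketch_pkt_ctx {
	int      has_socket;      /* locally generated, came from a socket/PCB */
	unsigned socket_fib;      /* FIB inherited from the process or set on the socket */
	int      classified;      /* a packet classifier (such as ipfw) tagged it */
	unsigned classifier_fib;  /* FIB chosen by the classifier */
};

/*
 * Pick the FIB for a packet: start at FIB 0, let the socket's FIB
 * replace that, and let a classifier's choice override either one.
 */
static unsigned
sketch_select_fib(const struct sketch_pkt_ctx *ctx)
{
	unsigned fib = 0;

	if (ctx->has_socket)
		fib = ctx->socket_fib;
	if (ctx->classified)
		fib = ctx->classifier_fib;
	return fib;
}
```

A forwarded packet with no classifier rule stays on FIB 0, a packet from
a socket set to FIB 3 uses 3, and a classifier rule selecting FIB 5 wins
over both.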
Routing messages would be associated with their
process, and thus select one FIB or another.
Messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance already provides this functionality
using ipfw fwd, but that method has some drawbacks.
For example, it can't fully simulate a routing table because it can't
influence the socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
Interestingly enough, SCTP has built in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods thus reached would be called. The methods would have
an argument that gives the FIB number, but the protocol would be free
to ignore it.
When the ABI can be changed it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
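The idea of keying a cached route on (destination, fib number) can be
sketched as below. This is a guess at the shape of the proposal, not a
description of any committed structure: the struct layouts, member names,
the sample entries, and the use of a plain integer in place of a sockaddr
are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a routing (fib) entry. */
struct sketch_rtentry {
	uint32_t    dst;     /* IPv4 destination, host order, for brevity */
	unsigned    fibnum;  /* table the entry lives in */
	const char *ifname;  /* outgoing interface name */
};

/* Hypothetical "struct route" with the proposed third member. */
struct sketch_route {
	uint32_t                     ro_dst;     /* destination address */
	unsigned                     ro_fibnum;  /* proposed addition: FIB number */
	const struct sketch_rtentry *ro_rt;      /* resulting fib entry */
};

/* Made-up sample data: the same destination routed differently per FIB. */
static const struct sketch_rtentry sketch_entries[] = {
	{ 0x0a000001, 0, "em0" },
	{ 0x0a000001, 1, "em1" },
};

/* Given an address and a fib, find the third element, the fib entry. */
static int
sketch_route_resolve(struct sketch_route *ro)
{
	size_t i;

	ro->ro_rt = NULL;
	for (i = 0; i < sizeof(sketch_entries) / sizeof(sketch_entries[0]); i++) {
		if (sketch_entries[i].dst == ro->ro_dst &&
		    sketch_entries[i].fibnum == ro->ro_fibnum) {
			ro->ro_rt = &sketch_entries[i];
			return 0;
		}
	}
	return -1;
}
```

With the FIB number stored alongside the destination, a cached route is
self-describing and callers no longer have to pass the FIB separately.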
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
Diffstat (limited to 'sys/netinet')
-rw-r--r-- | sys/netinet/if_atm.c | 2 | ||||
-rw-r--r-- | sys/netinet/if_ether.c | 297 | ||||
-rw-r--r-- | sys/netinet/in_gif.c | 8 | ||||
-rw-r--r-- | sys/netinet/in_mcast.c | 3 | ||||
-rw-r--r-- | sys/netinet/in_pcb.c | 3 | ||||
-rw-r--r-- | sys/netinet/in_pcb.h | 2 | ||||
-rw-r--r-- | sys/netinet/in_rmx.c | 154 | ||||
-rw-r--r-- | sys/netinet/in_var.h | 16 | ||||
-rw-r--r-- | sys/netinet/ip_fastfwd.c | 2 | ||||
-rw-r--r-- | sys/netinet/ip_fw.h | 4 | ||||
-rw-r--r-- | sys/netinet/ip_fw2.c | 54 | ||||
-rw-r--r-- | sys/netinet/ip_icmp.c | 17 | ||||
-rw-r--r-- | sys/netinet/ip_input.c | 10 | ||||
-rw-r--r-- | sys/netinet/ip_mroute.c | 4 | ||||
-rw-r--r-- | sys/netinet/ip_mroute.h | 2 | ||||
-rw-r--r-- | sys/netinet/ip_options.c | 5 | ||||
-rw-r--r-- | sys/netinet/ip_output.c | 8 | ||||
-rw-r--r-- | sys/netinet/ip_var.h | 2 | ||||
-rw-r--r-- | sys/netinet/raw_ip.c | 2 | ||||
-rw-r--r-- | sys/netinet/sctp_os_bsd.h | 2 | ||||
-rw-r--r-- | sys/netinet/tcp_input.c | 1 | ||||
-rw-r--r-- | sys/netinet/tcp_subr.c | 8 | ||||
-rw-r--r-- | sys/netinet/tcp_syncache.c | 4 |
23 files changed, 430 insertions, 180 deletions
diff --git a/sys/netinet/if_atm.c b/sys/netinet/if_atm.c index d19dea8..065f0c4 100644 --- a/sys/netinet/if_atm.c +++ b/sys/netinet/if_atm.c @@ -327,7 +327,7 @@ atmresolve(struct rtentry *rt, struct mbuf *m, struct sockaddr *dst, } if (rt == NULL) { - rt = RTALLOC1(dst, 0); + rt = RTALLOC1(dst, 0); /* link level on table 0 XXX MRT */ if (rt == NULL) goto bad; /* failed */ RT_REMREF(rt); /* don't keep LL references */ diff --git a/sys/netinet/if_ether.c b/sys/netinet/if_ether.c index b1133c9..6939dbb 100644 --- a/sys/netinet/if_ether.c +++ b/sys/netinet/if_ether.c @@ -116,7 +116,7 @@ static void arprequest(struct ifnet *, static void arpintr(struct mbuf *); static void arptimer(void *); static struct rtentry - *arplookup(u_long, int, int); + *arplookup(u_long, int, int, int); #ifdef INET static void in_arpinput(struct mbuf *); #endif @@ -138,7 +138,8 @@ arptimer(void *arg) */ RT_UNLOCK(rt); - rtrequest(RTM_DELETE, rt_key(rt), NULL, rt_mask(rt), 0, NULL); + in_rtrequest(RTM_DELETE, rt_key(rt), NULL, rt_mask(rt), 0, NULL, + rt->rt_fibnum); } /* @@ -362,6 +363,7 @@ arpresolve(struct ifnet *ifp, struct rtentry *rt0, struct mbuf *m, struct rtentry *rt = NULL; struct sockaddr_dl *sdl; int error; + int fibnum = 0; if (m) { if (m->m_flags & M_BCAST) { @@ -375,10 +377,14 @@ arpresolve(struct ifnet *ifp, struct rtentry *rt0, struct mbuf *m, ETHER_MAP_IP_MULTICAST(&SIN(dst)->sin_addr, desten); return (0); } + fibnum = M_GETFIB(m); } if (rt0 != NULL) { - error = rt_check(&rt, &rt0, dst); + /* Look for a cached arp (ll) entry. */ + if (m == NULL) + fibnum = rt0->rt_fibnum; + error = in_rt_check(&rt, &rt0, dst, fibnum); if (error) { m_freem(m); return error; @@ -389,10 +395,14 @@ arpresolve(struct ifnet *ifp, struct rtentry *rt0, struct mbuf *m, } if (la == NULL) { /* - * We enter this block in case if rt0 was NULL, - * or if rt found by rt_check() didn't have llinfo. + * We enter this block if rt0 was NULL, + * or if rt found by in_rt_check() didn't have llinfo. 
+ * we should get a cloned route, which since it should + * come from the local interface should have a ll entry. + * if may be incoplete but that's ok. + * XXXMRT if we haven't found a fibnum is that OK? */ - rt = arplookup(SIN(dst)->sin_addr.s_addr, 1, 0); + rt = arplookup(SIN(dst)->sin_addr.s_addr, 1, 0, fibnum); if (rt == NULL) { log(LOG_DEBUG, "arpresolve: can't allocate route for %s\n", @@ -582,6 +592,9 @@ in_arpinput(struct mbuf *m) int op, rif_len; int req_len; int bridged = 0; + u_int fibnum; + u_int goodfib = 0; + int firstpass = 1; #ifdef DEV_CARP int carp_match = 0; #endif @@ -674,133 +687,181 @@ match: } if (ifp->if_flags & IFF_STATICARP) goto reply; - rt = arplookup(isaddr.s_addr, itaddr.s_addr == myaddr.s_addr, 0); - if (rt != NULL) { - sin.sin_addr.s_addr = isaddr.s_addr; - EVENTHANDLER_INVOKE(route_arp_update_event, rt, - ar_sha(ah), (struct sockaddr *)&sin); + /* + * We look for any FIBs that has this address to find + * the interface etc. + * For sanity checks that are FIB independent we abort the loop. + */ + for (fibnum = 0; fibnum < rt_numfibs; fibnum++) { + rt = arplookup(isaddr.s_addr, + itaddr.s_addr == myaddr.s_addr, 0, fibnum); + if (rt == NULL) + continue; + + sdl = SDL(rt->rt_gateway); + /* Only call this once */ + if (firstpass) { + sin.sin_addr.s_addr = isaddr.s_addr; + EVENTHANDLER_INVOKE(route_arp_update_event, rt, + ar_sha(ah), (struct sockaddr *)&sin); + } la = (struct llinfo_arp *)rt->rt_llinfo; if (la == NULL) { RT_UNLOCK(rt); - goto reply; + continue; } - } else - goto reply; - /* The following is not an error when doing bridging. */ - if (!bridged && rt->rt_ifp != ifp + if (firstpass) { + /* The following is not an error when doing bridging. 
*/ + if (!bridged && rt->rt_ifp != ifp #ifdef DEV_CARP - && (ifp->if_type != IFT_CARP || !carp_match) + && (ifp->if_type != IFT_CARP || !carp_match) #endif - ) { - if (log_arp_wrong_iface) - log(LOG_ERR, "arp: %s is on %s but got reply from %*D on %s\n", - inet_ntoa(isaddr), - rt->rt_ifp->if_xname, - ifp->if_addrlen, (u_char *)ar_sha(ah), ":", - ifp->if_xname); - RT_UNLOCK(rt); - goto reply; - } - sdl = SDL(rt->rt_gateway); - if (sdl->sdl_alen && - bcmp(ar_sha(ah), LLADDR(sdl), sdl->sdl_alen)) { - if (rt->rt_expire) { - if (log_arp_movements) - log(LOG_INFO, "arp: %s moved from %*D to %*D on %s\n", - inet_ntoa(isaddr), - ifp->if_addrlen, (u_char *)LLADDR(sdl), ":", - ifp->if_addrlen, (u_char *)ar_sha(ah), ":", - ifp->if_xname); - } else { - RT_UNLOCK(rt); - if (log_arp_permanent_modify) - log(LOG_ERR, "arp: %*D attempts to modify " - "permanent entry for %s on %s\n", - ifp->if_addrlen, (u_char *)ar_sha(ah), ":", - inet_ntoa(isaddr), ifp->if_xname); - goto reply; - } - } - /* - * sanity check for the address length. - * XXX this does not work for protocols with variable address - * length. -is - */ - if (sdl->sdl_alen && - sdl->sdl_alen != ah->ar_hln) { - log(LOG_WARNING, - "arp from %*D: new addr len %d, was %d", - ifp->if_addrlen, (u_char *) ar_sha(ah), ":", - ah->ar_hln, sdl->sdl_alen); - } - if (ifp->if_addrlen != ah->ar_hln) { - log(LOG_WARNING, - "arp from %*D: addr len: new %d, i/f %d (ignored)", - ifp->if_addrlen, (u_char *) ar_sha(ah), ":", - ah->ar_hln, ifp->if_addrlen); - RT_UNLOCK(rt); - goto reply; - } - (void)memcpy(LLADDR(sdl), ar_sha(ah), - sdl->sdl_alen = ah->ar_hln); - /* - * If we receive an arp from a token-ring station over - * a token-ring nic then try to save the source - * routing info. 
- */ - if (ifp->if_type == IFT_ISO88025) { - struct iso88025_header *th = NULL; - struct iso88025_sockaddr_dl_data *trld; - - th = (struct iso88025_header *)m->m_pkthdr.header; - trld = SDL_ISO88025(sdl); - rif_len = TR_RCF_RIFLEN(th->rcf); - if ((th->iso88025_shost[0] & TR_RII) && - (rif_len > 2)) { - trld->trld_rcf = th->rcf; - trld->trld_rcf ^= htons(TR_RCF_DIR); - memcpy(trld->trld_route, th->rd, rif_len - 2); - trld->trld_rcf &= ~htons(TR_RCF_BCST_MASK); + ) { + if (log_arp_wrong_iface) + log(LOG_ERR, "arp: %s is on %s " + "but got reply from %*D " + "on %s\n", + inet_ntoa(isaddr), + rt->rt_ifp->if_xname, + ifp->if_addrlen, + (u_char *)ar_sha(ah), ":", + ifp->if_xname); + RT_UNLOCK(rt); + break; + } + if (sdl->sdl_alen && + bcmp(ar_sha(ah), LLADDR(sdl), sdl->sdl_alen)) { + if (rt->rt_expire) { + if (log_arp_movements) + log(LOG_INFO, + "arp: %s moved from %*D to %*D " + "on %s\n", + inet_ntoa(isaddr), + ifp->if_addrlen, + (u_char *)LLADDR(sdl), ":", + ifp->if_addrlen, + (u_char *)ar_sha(ah), ":", + ifp->if_xname); + } else { + RT_UNLOCK(rt); + if (log_arp_permanent_modify) + log(LOG_ERR, + "arp: %*D attempts to " + "modify permanent entry " + "for %s on %s\n", + ifp->if_addrlen, + (u_char *)ar_sha(ah), ":", + inet_ntoa(isaddr), + ifp->if_xname); + break; + } + } /* - * Set up source routing information for - * reply packet (XXX) + * sanity check for the address length. + * XXX this does not work for protocols + * with variable address length. 
-is */ - m->m_data -= rif_len; - m->m_len += rif_len; - m->m_pkthdr.len += rif_len; - } else { - th->iso88025_shost[0] &= ~TR_RII; - trld->trld_rcf = 0; + if (sdl->sdl_alen && + sdl->sdl_alen != ah->ar_hln) { + log(LOG_WARNING, + "arp from %*D: new addr len %d, was %d", + ifp->if_addrlen, (u_char *) ar_sha(ah), + ":", ah->ar_hln, sdl->sdl_alen); + } + if (ifp->if_addrlen != ah->ar_hln) { + log(LOG_WARNING, + "arp from %*D: addr len: " + "new %d, i/f %d (ignored)", + ifp->if_addrlen, (u_char *) ar_sha(ah), + ":", ah->ar_hln, ifp->if_addrlen); + RT_UNLOCK(rt); + break; + } + firstpass = 0; + goodfib = fibnum; } - m->m_data -= 8; - m->m_len += 8; - m->m_pkthdr.len += 8; - th->rcf = trld->trld_rcf; - } - if (rt->rt_expire) { - rt->rt_expire = time_uptime + arpt_keep; - callout_reset(&la->la_timer, hz * arpt_keep, arptimer, rt); - } - la->la_asked = 0; - la->la_preempt = arp_maxtries; - hold = la->la_hold; - la->la_hold = NULL; - RT_UNLOCK(rt); - if (hold != NULL) - (*ifp->if_output)(ifp, hold, rt_key(rt), rt); + /* Copy in the information received. */ + (void)memcpy(LLADDR(sdl), ar_sha(ah), + sdl->sdl_alen = ah->ar_hln); + /* + * If we receive an arp from a token-ring station over + * a token-ring nic then try to save the source routing info. + * XXXMRT Only minimal Token Ring support for MRT. + * Only do this on the first pass as if modifies the mbuf. 
+ */ + if (ifp->if_type == IFT_ISO88025) { + struct iso88025_header *th = NULL; + struct iso88025_sockaddr_dl_data *trld; + + /* force the fib loop to end after this pass */ + fibnum = rt_numfibs - 1; + + th = (struct iso88025_header *)m->m_pkthdr.header; + trld = SDL_ISO88025(sdl); + rif_len = TR_RCF_RIFLEN(th->rcf); + if ((th->iso88025_shost[0] & TR_RII) && + (rif_len > 2)) { + trld->trld_rcf = th->rcf; + trld->trld_rcf ^= htons(TR_RCF_DIR); + memcpy(trld->trld_route, th->rd, rif_len - 2); + trld->trld_rcf &= ~htons(TR_RCF_BCST_MASK); + /* + * Set up source routing information for + * reply packet (XXX) + */ + m->m_data -= rif_len; + m->m_len += rif_len; + m->m_pkthdr.len += rif_len; + } else { + th->iso88025_shost[0] &= ~TR_RII; + trld->trld_rcf = 0; + } + m->m_data -= 8; + m->m_len += 8; + m->m_pkthdr.len += 8; + th->rcf = trld->trld_rcf; + } + + if (rt->rt_expire) { + rt->rt_expire = time_uptime + arpt_keep; + callout_reset(&la->la_timer, hz * arpt_keep, + arptimer, rt); + } + la->la_asked = 0; + la->la_preempt = arp_maxtries; + hold = la->la_hold; + la->la_hold = NULL; + RT_UNLOCK(rt); + if (hold != NULL) + (*ifp->if_output)(ifp, hold, rt_key(rt), rt); + } /* end of FIB loop */ reply: + + /* + * Decide if we have to respond to something. + */ if (op != ARPOP_REQUEST) goto drop; if (itaddr.s_addr == myaddr.s_addr) { - /* I am the target */ + /* Shortcut.. the receiving interface is the target. */ (void)memcpy(ar_tha(ah), ar_sha(ah), ah->ar_hln); (void)memcpy(ar_sha(ah), enaddr, ah->ar_hln); } else { - rt = arplookup(itaddr.s_addr, 0, SIN_PROXY); + /* It's not asking for our address. But it still may + * be something we should answer. + * + * XXX MRT + * We assume that link level info is independent of + * the table used and so we use whichever we can and don't + * have a better option. + */ + /* Have we been asked to proxy for the target. 
*/ + rt = arplookup(itaddr.s_addr, 0, SIN_PROXY, goodfib); if (rt == NULL) { + /* Nope, only intersted now if proxying everything. */ struct sockaddr_in sin; if (!arp_proxyall) @@ -811,7 +872,8 @@ reply: sin.sin_len = sizeof sin; sin.sin_addr = itaddr; - rt = rtalloc1((struct sockaddr *)&sin, 0, 0UL); + /* XXX MRT use table 0 for arp reply */ + rt = in_rtalloc1((struct sockaddr *)&sin, 0, 0UL, 0); if (!rt) goto drop; /* @@ -835,7 +897,8 @@ reply: */ sin.sin_addr = isaddr; - rt = rtalloc1((struct sockaddr *)&sin, 0, 0UL); + /* XXX MRT use table 0 for arp checks */ + rt = in_rtalloc1((struct sockaddr *)&sin, 0, 0UL, 0); if (!rt) goto drop; if (rt->rt_ifp != ifp) { @@ -905,7 +968,7 @@ drop: * Lookup or enter a new address in arptab. */ static struct rtentry * -arplookup(u_long addr, int create, int proxy) +arplookup(u_long addr, int create, int proxy, int fibnum) { struct rtentry *rt; struct sockaddr_inarp sin; @@ -917,7 +980,7 @@ arplookup(u_long addr, int create, int proxy) sin.sin_addr.s_addr = addr; if (proxy) sin.sin_other = SIN_PROXY; - rt = rtalloc1((struct sockaddr *)&sin, create, 0UL); + rt = in_rtalloc1((struct sockaddr *)&sin, create, 0UL, fibnum); if (rt == 0) return (0); diff --git a/sys/netinet/in_gif.c b/sys/netinet/in_gif.c index 69a34f8..55b4ec7 100644 --- a/sys/netinet/in_gif.c +++ b/sys/netinet/in_gif.c @@ -191,6 +191,8 @@ in_gif_output(struct ifnet *ifp, int family, struct mbuf *m) } bcopy(&iphdr, mtod(m, struct ip *), sizeof(struct ip)); + M_SETFIB(m, sc->gif_fibnum); + if (dst->sin_family != sin_dst->sin_family || dst->sin_addr.s_addr != sin_dst->sin_addr.s_addr) { /* cache route doesn't match */ @@ -208,7 +210,7 @@ in_gif_output(struct ifnet *ifp, int family, struct mbuf *m) } if (sc->gif_ro.ro_rt == NULL) { - rtalloc_ign(&sc->gif_ro, 0); + in_rtalloc_ign(&sc->gif_ro, 0, sc->gif_fibnum); if (sc->gif_ro.ro_rt == NULL) { m_freem(m); return ENETUNREACH; @@ -368,7 +370,9 @@ gif_validate4(const struct ip *ip, struct gif_softc *sc, struct ifnet *ifp) 
sin.sin_family = AF_INET; sin.sin_len = sizeof(struct sockaddr_in); sin.sin_addr = ip->ip_src; - rt = rtalloc1((struct sockaddr *)&sin, 0, 0UL); + /* XXX MRT check for the interface we would use on output */ + rt = in_rtalloc1((struct sockaddr *)&sin, 0, + 0UL, sc->gif_fibnum); if (!rt || rt->rt_ifp != ifp) { #if 0 log(LOG_WARNING, "%s: packet from 0x%x dropped " diff --git a/sys/netinet/in_mcast.c b/sys/netinet/in_mcast.c index be2208a..9f37f33 100644 --- a/sys/netinet/in_mcast.c +++ b/sys/netinet/in_mcast.c @@ -1025,7 +1025,8 @@ inp_join_group(struct inpcb *inp, struct sockopt *sopt) ro.ro_rt = NULL; *(struct sockaddr_in *)&ro.ro_dst = gsa->sin; - rtalloc_ign(&ro, RTF_CLONING); + in_rtalloc_ign(&ro, RTF_CLONING, + inp->inp_inc.inc_fibnum); if (ro.ro_rt != NULL) { ifp = ro.ro_rt->rt_ifp; KASSERT(ifp != NULL, ("%s: null ifp", diff --git a/sys/netinet/in_pcb.c b/sys/netinet/in_pcb.c index 9b0b6a5..a9702c5 100644 --- a/sys/netinet/in_pcb.c +++ b/sys/netinet/in_pcb.c @@ -186,6 +186,7 @@ in_pcballoc(struct socket *so, struct inpcbinfo *pcbinfo) bzero(inp, inp_zero_size); inp->inp_pcbinfo = pcbinfo; inp->inp_socket = so; + inp->inp_inc.inc_fibnum = so->so_fibnum; #ifdef MAC error = mac_inpcb_init(inp, M_NOWAIT); if (error != 0) @@ -605,7 +606,7 @@ in_pcbconnect_setup(struct inpcb *inp, struct sockaddr *nam, * Find out route to destination */ if ((inp->inp_socket->so_options & SO_DONTROUTE) == 0) - ia = ip_rtaddr(faddr); + ia = ip_rtaddr(faddr, inp->inp_inc.inc_fibnum); /* * If we found a route, use the address corresponding to * the outgoing interface. 
diff --git a/sys/netinet/in_pcb.h b/sys/netinet/in_pcb.h index afb4dd2..6e5c92e 100644 --- a/sys/netinet/in_pcb.h +++ b/sys/netinet/in_pcb.h @@ -101,7 +101,7 @@ struct in_endpoints { struct in_conninfo { u_int8_t inc_flags; u_int8_t inc_len; - u_int16_t inc_pad; /* XXX alignment for in_endpoints */ + u_int16_t inc_fibnum; /* XXX was pad, 16 bits is plenty */ /* protocol dependent part */ struct in_endpoints inc_ie; }; diff --git a/sys/netinet/in_rmx.c b/sys/netinet/in_rmx.c index 8a5f978..aabf57e 100644 --- a/sys/netinet/in_rmx.c +++ b/sys/netinet/in_rmx.c @@ -110,7 +110,8 @@ in_addroute(void *v_arg, void *n_arg, struct radix_node_head *head, * Find out if it is because of an * ARP entry and delete it if so. */ - rt2 = rtalloc1((struct sockaddr *)sin, 0, RTF_CLONING); + rt2 = in_rtalloc1((struct sockaddr *)sin, 0, + RTF_CLONING, rt->rt_fibnum); if (rt2) { if (rt2->rt_flags & RTF_LLINFO && rt2->rt_flags & RTF_HOST && @@ -225,10 +226,10 @@ in_rtqkill(struct radix_node *rn, void *rock) if (rt->rt_refcnt > 0) panic("rtqkill route really not free"); - err = rtrequest(RTM_DELETE, + err = in_rtrequest(RTM_DELETE, (struct sockaddr *)rt_key(rt), rt->rt_gateway, rt_mask(rt), - rt->rt_flags, 0); + rt->rt_flags, 0, rt->rt_fibnum); if (err) { log(LOG_WARNING, "in_rtqkill: error %d\n", err); } else { @@ -253,12 +254,31 @@ in_rtqkill(struct radix_node *rn, void *rock) static int rtq_timeout = RTQ_TIMEOUT; static struct callout rtq_timer; +static void in_rtqtimo_one(void *rock); + static void in_rtqtimo(void *rock) { + int fibnum; + void *newrock; + struct timeval atv; + + KASSERT((rock == (void *)rt_tables[0][AF_INET]), + ("in_rtqtimo: unexpected arg")); + for (fibnum = 0; fibnum < rt_numfibs; fibnum++) { + if ((newrock = rt_tables[fibnum][AF_INET]) != NULL) + in_rtqtimo_one(newrock); + } + atv.tv_usec = 0; + atv.tv_sec = rtq_timeout; + callout_reset(&rtq_timer, tvtohz(&atv), in_rtqtimo, rock); +} + +static void +in_rtqtimo_one(void *rock) +{ struct radix_node_head *rnh = rock; 
struct rtqk_arg arg; - struct timeval atv; static time_t last_adjusted_timeout = 0; arg.found = arg.killed = 0; @@ -297,27 +317,29 @@ in_rtqtimo(void *rock) RADIX_NODE_HEAD_UNLOCK(rnh); } - atv.tv_usec = 0; - atv.tv_sec = arg.nextstop - time_uptime; - callout_reset(&rtq_timer, tvtohz(&atv), in_rtqtimo, rock); } void in_rtqdrain(void) { - struct radix_node_head *rnh = rt_tables[AF_INET]; + struct radix_node_head *rnh; struct rtqk_arg arg; + int fibnum; - arg.found = arg.killed = 0; - arg.rnh = rnh; - arg.nextstop = 0; - arg.draining = 1; - arg.updating = 0; - RADIX_NODE_HEAD_LOCK(rnh); - rnh->rnh_walktree(rnh, in_rtqkill, &arg); - RADIX_NODE_HEAD_UNLOCK(rnh); + for ( fibnum = 0; fibnum < rt_numfibs; fibnum++) { + rnh = rt_tables[fibnum][AF_INET]; + arg.found = arg.killed = 0; + arg.rnh = rnh; + arg.nextstop = 0; + arg.draining = 1; + arg.updating = 0; + RADIX_NODE_HEAD_LOCK(rnh); + rnh->rnh_walktree(rnh, in_rtqkill, &arg); + RADIX_NODE_HEAD_UNLOCK(rnh); + } } +static int _in_rt_was_here; /* * Initialize our routing tree. */ @@ -326,18 +348,29 @@ in_inithead(void **head, int off) { struct radix_node_head *rnh; - if (!rn_inithead(head, off)) + /* XXX MRT + * This can be called from vfs_export.c too in which case 'off' + * will be 0. We know the correct value so just use that and + * return directly if it was 0. + * This is a hack that replaces an even worse hack on a bad hack + * on a bad design. After RELENG_7 this should be fixed but that + * will change the ABI, so for now do it this way. + */ + if (!rn_inithead(head, 32)) return 0; - if (head != (void **)&rt_tables[AF_INET]) /* BOGUS! 
*/ - return 1; /* only do this for the real routing table */ + if (off == 0) /* XXX MRT see above */ + return 1; /* only do the rest for a real routing table */ rnh = *head; rnh->rnh_addaddr = in_addroute; rnh->rnh_matchaddr = in_matroute; rnh->rnh_close = in_clsroute; - callout_init(&rtq_timer, CALLOUT_MPSAFE); - in_rtqtimo(rnh); /* kick off timeout first time */ + if (_in_rt_was_here == 0 ) { + callout_init(&rtq_timer, CALLOUT_MPSAFE); + in_rtqtimo(rnh); /* kick off timeout first time */ + _in_rt_was_here = 1; + } return 1; } @@ -384,16 +417,81 @@ in_ifadown(struct ifaddr *ifa, int delete) { struct in_ifadown_arg arg; struct radix_node_head *rnh; + int fibnum; if (ifa->ifa_addr->sa_family != AF_INET) return 1; - rnh = rt_tables[AF_INET]; - arg.ifa = ifa; - arg.del = delete; - RADIX_NODE_HEAD_LOCK(rnh); - rnh->rnh_walktree(rnh, in_ifadownkill, &arg); - RADIX_NODE_HEAD_UNLOCK(rnh); - ifa->ifa_flags &= ~IFA_ROUTE; /* XXXlocking? */ + for ( fibnum = 0; fibnum < rt_numfibs; fibnum++) { + rnh = rt_tables[fibnum][AF_INET]; + arg.ifa = ifa; + arg.del = delete; + RADIX_NODE_HEAD_LOCK(rnh); + rnh->rnh_walktree(rnh, in_ifadownkill, &arg); + RADIX_NODE_HEAD_UNLOCK(rnh); + ifa->ifa_flags &= ~IFA_ROUTE; /* XXXlocking? */ + } return 0; } + +/* + * inet versions of rt functions. These have fib extensions and + * for now will just reference the _fib variants. 
+ * eventually this order will be reversed,
+ */
+void
+in_rtalloc_ign(struct route *ro, u_long ignflags, u_int fibnum)
+{
+	rtalloc_ign_fib(ro, ignflags, fibnum);
+}
+
+int
+in_rtrequest( int req,
+	struct sockaddr *dst,
+	struct sockaddr *gateway,
+	struct sockaddr *netmask,
+	int flags,
+	struct rtentry **ret_nrt,
+	u_int fibnum)
+{
+	return (rtrequest_fib(req, dst, gateway, netmask,
+	    flags, ret_nrt, fibnum));
+}
+
+struct rtentry *
+in_rtalloc1(struct sockaddr *dst, int report, u_long ignflags, u_int fibnum)
+{
+	return (rtalloc1_fib(dst, report, ignflags, fibnum));
+}
+
+int
+in_rt_check(struct rtentry **lrt, struct rtentry **lrt0,
+	struct sockaddr *dst, u_int fibnum)
+{
+	return (rt_check_fib(lrt, lrt0, dst, fibnum));
+}
+
+void
+in_rtredirect(struct sockaddr *dst,
+	struct sockaddr *gateway,
+	struct sockaddr *netmask,
+	int flags,
+	struct sockaddr *src,
+	u_int fibnum)
+{
+	rtredirect_fib(dst, gateway, netmask, flags, src, fibnum);
+}
+
+void
+in_rtalloc(struct route *ro, u_int fibnum)
+{
+	rtalloc_ign_fib(ro, 0UL, fibnum);
+}
+
+#if 0
+int in_rt_getifa(struct rt_addrinfo *, u_int fibnum);
+int in_rtioctl(u_long, caddr_t, u_int);
+int in_rtrequest1(int, struct rt_addrinfo *, struct rtentry **, u_int);
+#endif
+
+
diff --git a/sys/netinet/in_var.h b/sys/netinet/in_var.h
index 47a160a..d7f1e28 100644
--- a/sys/netinet/in_var.h
+++ b/sys/netinet/in_var.h
@@ -287,6 +287,7 @@ do { \
 	IN_NEXT_MULTI((step), (inm)); \
 } while(0)
 
+struct rtentry;
 struct route;
 struct ip_moptions;
 
@@ -305,6 +306,21 @@ int in_ifadown(struct ifaddr *ifa, int);
 void in_ifscrub(struct ifnet *, struct in_ifaddr *);
 struct mbuf *ip_fastforward(struct mbuf *);
 
+/* XXX */
+void in_rtalloc_ign(struct route *ro, u_long ignflags, u_int fibnum);
+void in_rtalloc(struct route *ro, u_int fibnum);
+struct rtentry *in_rtalloc1(struct sockaddr *, int, u_long, u_int);
+void in_rtredirect(struct sockaddr *, struct sockaddr *,
+	    struct sockaddr *, int, struct sockaddr *, u_int);
+int in_rtrequest(int, struct sockaddr *,
+	    struct sockaddr *, struct sockaddr *, int, struct rtentry **, u_int);
+int in_rt_check(struct rtentry **, struct rtentry **, struct sockaddr *, u_int);
+
+#if 0
+int in_rt_getifa(struct rt_addrinfo *, u_int fibnum);
+int in_rtioctl(u_long, caddr_t, u_int);
+int in_rtrequest1(int, struct rt_addrinfo *, struct rtentry **, u_int);
+#endif
 #endif /* _KERNEL */
 
 /* INET6 stuff */
diff --git a/sys/netinet/ip_fastfwd.c b/sys/netinet/ip_fastfwd.c
index 97b823f..bb8c74a 100644
--- a/sys/netinet/ip_fastfwd.c
+++ b/sys/netinet/ip_fastfwd.c
@@ -123,7 +123,7 @@ ip_findroute(struct route *ro, struct in_addr dest, struct mbuf *m)
 	dst->sin_family = AF_INET;
 	dst->sin_len = sizeof(*dst);
 	dst->sin_addr.s_addr = dest.s_addr;
-	rtalloc_ign(ro, RTF_CLONING);
+	in_rtalloc_ign(ro, RTF_CLONING, M_GETFIB(m));
 
 	/*
 	 * Route there and interface still up?
diff --git a/sys/netinet/ip_fw.h b/sys/netinet/ip_fw.h
index b41c037..5dcdbb3 100644
--- a/sys/netinet/ip_fw.h
+++ b/sys/netinet/ip_fw.h
@@ -161,6 +161,9 @@ enum ipfw_opcodes { /* arguments (4 byte each) */
 	O_TAG, /* arg1=tag number */
 	O_TAGGED, /* arg1=tag number */
 
+	O_SETFIB, /* arg1=FIB number */
+	O_FIB, /* arg1=FIB desired fib number */
+
 	O_LAST_OPCODE /* not an opcode! */
 };
 
@@ -465,6 +468,7 @@ struct ipfw_flow_id {
 	u_int32_t src_ip;
 	u_int16_t dst_port;
 	u_int16_t src_port;
+	u_int8_t fib;
 	u_int8_t proto;
 	u_int8_t flags; /* protocol-specific flags */
 	uint8_t addr_type; /* 4 = ipv4, 6 = ipv6, 1=ether ? */
diff --git a/sys/netinet/ip_fw2.c b/sys/netinet/ip_fw2.c
index 39baa71..2346df6 100644
--- a/sys/netinet/ip_fw2.c
+++ b/sys/netinet/ip_fw2.c
@@ -492,7 +492,7 @@ iface_match(struct ifnet *ifp, ipfw_insn_if *cmd)
  * multicast, or broadcast.
  */
 static int
-verify_path(struct in_addr src, struct ifnet *ifp)
+verify_path(struct in_addr src, struct ifnet *ifp, u_int fib)
 {
 	struct route ro;
 	struct sockaddr_in *dst;
@@ -503,7 +503,7 @@ verify_path(struct in_addr src, struct ifnet *ifp)
 	dst->sin_family = AF_INET;
 	dst->sin_len = sizeof(*dst);
 	dst->sin_addr = src;
-	rtalloc_ign(&ro, RTF_CLONING);
+	in_rtalloc_ign(&ro, RTF_CLONING, fib);
 
 	if (ro.ro_rt == NULL)
 		return 0;
@@ -593,6 +593,7 @@ verify_path6(struct in6_addr *src, struct ifnet *ifp)
 	dst->sin6_family = AF_INET6;
 	dst->sin6_len = sizeof(*dst);
 	dst->sin6_addr = *src;
+	/* XXX MRT 0 for ipv6 at this time */
 	rtalloc_ign((struct route *)&ro, RTF_CLONING);
 
 	if (ro.ro_rt == NULL)
@@ -828,6 +829,10 @@ ipfw_log(struct ip_fw *f, u_int hlen, struct ip_fw_args *args,
 			snprintf(SNPARGS(action2, 0), "Tee %d",
 				cmd->arg1);
 			break;
+		case O_SETFIB:
+			snprintf(SNPARGS(action2, 0), "SetFib %d",
+				cmd->arg1);
+			break;
 		case O_SKIPTO:
 			snprintf(SNPARGS(action2, 0), "SkipTo %d",
 				cmd->arg1);
@@ -1500,6 +1505,7 @@ install_state(struct ip_fw *rule, ipfw_insn_limit *cmd,
 		id.dst_ip = id.src_ip = id.dst_port = id.src_port = 0;
 		id.proto = args->f_id.proto;
 		id.addr_type = args->f_id.addr_type;
+		id.fib = M_GETFIB(args->m);
 
 		if (IS_IP6_FLOW_ID (&(args->f_id))) {
 			if (limit_mask & DYN_SRC_ADDR)
@@ -1601,6 +1607,7 @@ send_pkt(struct mbuf *replyto, struct ipfw_flow_id *id, u_int32_t seq,
 		return (NULL);
 	m->m_pkthdr.rcvif = (struct ifnet *)0;
 
+	M_SETFIB(m, id->fib);
 #ifdef MAC
 	if (replyto != NULL)
 		mac_netinet_firewall_reply(replyto, m);
@@ -2200,6 +2207,7 @@ ipfw_chk(struct ip_fw_args *args)
 		return (IP_FW_PASS); /* accept */
 
 	pktlen = m->m_pkthdr.len;
+	args->f_id.fib = M_GETFIB(m); /* note mbuf not altered) */
 	proto = args->f_id.proto = 0; /* mark f_id invalid */
 		/* XXX 0 is a valid proto: IP/IPv6 Hop-by-Hop Option */
 
@@ -2911,7 +2919,8 @@ check_body:
 					    verify_path6(&(args->f_id.src_ip6),
 					        m->m_pkthdr.rcvif) :
 #endif
-					    verify_path(src_ip, m->m_pkthdr.rcvif)));
+					    verify_path(src_ip, m->m_pkthdr.rcvif,
+					        args->f_id.fib)));
 				break;
 
 			case O_VERSRCREACH:
@@ -2922,7 +2931,7 @@ check_body:
 					    verify_path6(&(args->f_id.src_ip6),
 					        NULL) :
 #endif
-					    verify_path(src_ip, NULL)));
+					    verify_path(src_ip, NULL, args->f_id.fib)));
 				break;
 
 			case O_ANTISPOOF:
@@ -2941,7 +2950,8 @@ check_body:
 					        m->m_pkthdr.rcvif) :
 #endif
 					    verify_path(src_ip,
-					        m->m_pkthdr.rcvif);
+					        m->m_pkthdr.rcvif,
+					        args->f_id.fib);
 				else
 					match = 1;
 				break;
@@ -3043,6 +3053,11 @@ check_body:
 				break;
 			}
 
+			case O_FIB: /* try match the specified fib */
+				if (args->f_id.fib == cmd->arg1)
+					match = 1;
+				break;
+
 			case O_TAGGED: {
 				uint32_t tag = (cmd->arg1 == IP_FW_TABLEARG) ?
 				    tablearg : cmd->arg1;
@@ -3203,7 +3218,6 @@ check_body:
 				    IP_FW_DIVERT : IP_FW_TEE;
 				goto done;
 			}
-
 			case O_COUNT:
 			case O_SKIPTO:
 				f->pcnt++; /* update stats */
@@ -3283,6 +3297,14 @@ check_body:
 				    IP_FW_NETGRAPH : IP_FW_NGTEE;
 				goto done;
 
+			case O_SETFIB:
+				f->pcnt++; /* update stats */
+				f->bcnt += pktlen;
+				f->timestamp = time_uptime;
+				M_SETFIB(m, cmd->arg1);
+				args->f_id.fib = cmd->arg1;
+				goto next_rule;
+
 			case O_NAT: {
 				struct cfg_nat *t;
 				int nat_id;
@@ -3793,6 +3815,26 @@ check_ipfw_struct(struct ip_fw *rule, int size)
 				goto bad_size;
 			break;
 
+		case O_FIB:
+			if (cmdlen != F_INSN_SIZE(ipfw_insn))
+				goto bad_size;
+			if (cmd->arg1 >= rt_numfibs) {
+				printf("ipfw: invalid fib number %d\n",
+					cmd->arg1);
+				return EINVAL;
+			}
+			break;
+
+		case O_SETFIB:
+			if (cmdlen != F_INSN_SIZE(ipfw_insn))
+				goto bad_size;
+			if (cmd->arg1 >= rt_numfibs) {
+				printf("ipfw: invalid fib number %d\n",
+					cmd->arg1);
+				return EINVAL;
+			}
+			goto check_action;
+
 		case O_UID:
 		case O_GID:
 		case O_JAIL:
diff --git a/sys/netinet/ip_icmp.c b/sys/netinet/ip_icmp.c
index 4f664bf..bed9536 100644
--- a/sys/netinet/ip_icmp.c
+++ b/sys/netinet/ip_icmp.c
@@ -227,6 +227,10 @@ stdreply: icmpelen = max(8, min(icmp_quotelen, oip->ip_len - oiphlen));
 	m_align(m, ICMP_MINLEN + icmplen);
 	m->m_len = ICMP_MINLEN + icmplen;
 
+	/* XXX MRT make the outgoing packet use the same FIB
+	 * that was associated with the incoming packet
+	 */
+	M_SETFIB(m, M_GETFIB(n));
 	icp = mtod(m, struct icmp *);
 	icmpstat.icps_outhist[type]++;
 	icp->icmp_type = type;
@@ -295,6 +299,7 @@ icmp_input(struct mbuf *m, int off)
 	int icmplen = ip->ip_len;
 	int i, code;
 	void (*ctlfunc)(int, struct sockaddr *, void *);
+	int fibnum;
 
 	/*
 	 * Locate icmp structure in mbuf, and check
@@ -576,10 +581,12 @@ reflect:
 		}
 #endif
 		icmpsrc.sin_addr = icp->icmp_ip.ip_dst;
-		rtredirect((struct sockaddr *)&icmpsrc,
-		  (struct sockaddr *)&icmpdst,
-		  (struct sockaddr *)0, RTF_GATEWAY | RTF_HOST,
-		  (struct sockaddr *)&icmpgw);
+		for ( fibnum = 0; fibnum < rt_numfibs; fibnum++) {
+			in_rtredirect((struct sockaddr *)&icmpsrc,
+			  (struct sockaddr *)&icmpdst,
+			  (struct sockaddr *)0, RTF_GATEWAY | RTF_HOST,
+			  (struct sockaddr *)&icmpgw, fibnum);
+		}
 		pfctlinput(PRC_REDIRECT_HOST, (struct sockaddr *)&icmpsrc);
 #ifdef IPSEC
 		key_sa_routechange((struct sockaddr *)&icmpsrc);
@@ -693,7 +700,7 @@ icmp_reflect(struct mbuf *m)
 	 * When we don't have a route back to the packet source, stop here
 	 * and drop the packet.
 	 */
-	ia = ip_rtaddr(ip->ip_dst);
+	ia = ip_rtaddr(ip->ip_dst, M_GETFIB(m));
 	if (ia == NULL) {
 		m_freem(m);
 		icmpstat.icps_noroute++;
diff --git a/sys/netinet/ip_input.c b/sys/netinet/ip_input.c
index 1eb9e4a..93ba871 100644
--- a/sys/netinet/ip_input.c
+++ b/sys/netinet/ip_input.c
@@ -1198,7 +1198,7 @@ ipproto_unregister(u_char ipproto)
  * return internet address info of interface to be used to get there.
  */
 struct in_ifaddr *
-ip_rtaddr(struct in_addr dst)
+ip_rtaddr(struct in_addr dst, u_int fibnum)
 {
 	struct route sro;
 	struct sockaddr_in *sin;
@@ -1209,7 +1209,7 @@ ip_rtaddr(struct in_addr dst)
 	sin->sin_family = AF_INET;
 	sin->sin_len = sizeof(*sin);
 	sin->sin_addr = dst;
-	rtalloc_ign(&sro, RTF_CLONING);
+	in_rtalloc_ign(&sro, RTF_CLONING, fibnum);
 
 	if (sro.ro_rt == NULL)
 		return (NULL);
@@ -1269,7 +1269,7 @@ ip_forward(struct mbuf *m, int srcrt)
 	}
 #endif
 
-	ia = ip_rtaddr(ip->ip_dst);
+	ia = ip_rtaddr(ip->ip_dst, M_GETFIB(m));
 	if (!srcrt && ia == NULL) {
 		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, 0);
 		return;
@@ -1334,7 +1334,7 @@ ip_forward(struct mbuf *m, int srcrt)
 		sin->sin_family = AF_INET;
 		sin->sin_len = sizeof(*sin);
 		sin->sin_addr = ip->ip_dst;
-		rtalloc_ign(&ro, RTF_CLONING);
+		in_rtalloc_ign(&ro, RTF_CLONING, M_GETFIB(m));
 
 		rt = ro.ro_rt;
 
@@ -1363,7 +1363,7 @@ ip_forward(struct mbuf *m, int srcrt)
 	 * the ICMP_UNREACH_NEEDFRAG "Next-Hop MTU" field described in RFC1191.
 	 */
 	bzero(&ro, sizeof(ro));
-	rtalloc_ign(&ro, RTF_CLONING);
+	rtalloc_ign_fib(&ro, RTF_CLONING, M_GETFIB(m));
 
 	error = ip_output(m, NULL, &ro, IP_FORWARDING, NULL, NULL);
 
diff --git a/sys/netinet/ip_mroute.c b/sys/netinet/ip_mroute.c
index 6e0e124..d60e8bd 100644
--- a/sys/netinet/ip_mroute.c
+++ b/sys/netinet/ip_mroute.c
@@ -303,7 +303,7 @@ static int X_ip_mrouter_done(void);
 static int X_ip_mrouter_get(struct socket *so, struct sockopt *m);
 static int X_ip_mrouter_set(struct socket *so, struct sockopt *m);
 static int X_legal_vif_num(int vif);
-static int X_mrt_ioctl(int cmd, caddr_t data);
+static int X_mrt_ioctl(int cmd, caddr_t data, int fibnum);
 
 static int get_sg_cnt(struct sioc_sg_req *);
 static int get_vif_cnt(struct sioc_vif_req *);
@@ -552,7 +552,7 @@ X_ip_mrouter_get(struct socket *so, struct sockopt *sopt)
  * Handle ioctl commands to obtain information from the cache
  */
 static int
-X_mrt_ioctl(int cmd, caddr_t data)
+X_mrt_ioctl(int cmd, caddr_t data, int fibnum)
 {
 	int error = 0;
 
diff --git a/sys/netinet/ip_mroute.h b/sys/netinet/ip_mroute.h
index c756d84..4043e44 100644
--- a/sys/netinet/ip_mroute.h
+++ b/sys/netinet/ip_mroute.h
@@ -359,7 +359,7 @@ struct sockopt;
 extern int (*ip_mrouter_set)(struct socket *, struct sockopt *);
 extern int (*ip_mrouter_get)(struct socket *, struct sockopt *);
 extern int (*ip_mrouter_done)(void);
-extern int (*mrt_ioctl)(int, caddr_t);
+extern int (*mrt_ioctl)(int, caddr_t, int);
 
 #endif /* _KERNEL */
 
diff --git a/sys/netinet/ip_options.c b/sys/netinet/ip_options.c
index 72b6edd..0019f7a 100644
--- a/sys/netinet/ip_options.c
+++ b/sys/netinet/ip_options.c
@@ -233,7 +233,8 @@ dropit:
 				if ((ia = (INA)ifa_ifwithdstaddr((SA)&ipaddr)) == NULL)
 					ia = (INA)ifa_ifwithnet((SA)&ipaddr);
 			} else
-				ia = ip_rtaddr(ipaddr.sin_addr);
+/* XXX MRT 0 for routing */
+				ia = ip_rtaddr(ipaddr.sin_addr, M_GETFIB(m));
 			if (ia == NULL) {
 				type = ICMP_UNREACH;
 				code = ICMP_UNREACH_SRCFAIL;
@@ -276,7 +277,7 @@ dropit:
 			 * same).
 			 */
 			if ((ia = (INA)ifa_ifwithaddr((SA)&ipaddr)) == NULL &&
-			    (ia = ip_rtaddr(ipaddr.sin_addr)) == NULL) {
+			    (ia = ip_rtaddr(ipaddr.sin_addr, M_GETFIB(m))) == NULL) {
 				type = ICMP_UNREACH;
 				code = ICMP_UNREACH_HOST;
 				goto bad;
diff --git a/sys/netinet/ip_output.c b/sys/netinet/ip_output.c
index 37995ef..231510a 100644
--- a/sys/netinet/ip_output.c
+++ b/sys/netinet/ip_output.c
@@ -230,10 +230,12 @@ again:
 	 */
 	if (ro->ro_rt == NULL)
 #ifdef RADIX_MPATH
-		rtalloc_mpath(ro,
-		    ntohl(ip->ip_src.s_addr ^ ip->ip_dst.s_addr));
+		rtalloc_mpath_fib(ro,
+		    ntohl(ip->ip_src.s_addr ^ ip->ip_dst.s_addr),
+		    inp ? inp->inp_inc.inc_fibnum : M_GETFIB(m));
 #else
-		rtalloc_ign(ro, 0);
+		in_rtalloc_ign(ro, 0,
+		    inp ? inp->inp_inc.inc_fibnum : M_GETFIB(m));
 #endif
 	if (ro->ro_rt == NULL) {
 		ipstat.ips_noroute++;
diff --git a/sys/netinet/ip_var.h b/sys/netinet/ip_var.h
index eef4e1f..436a4a0 100644
--- a/sys/netinet/ip_var.h
+++ b/sys/netinet/ip_var.h
@@ -209,7 +209,7 @@ int ipproto_unregister(u_char);
 struct mbuf *
 	ip_reass(struct mbuf *);
 struct in_ifaddr *
-	ip_rtaddr(struct in_addr);
+	ip_rtaddr(struct in_addr, u_int fibnum);
 void ip_savecontrol(struct inpcb *, struct mbuf **, struct ip *,
 	    struct mbuf *);
 void ip_slowtimo(void);
diff --git a/sys/netinet/raw_ip.c b/sys/netinet/raw_ip.c
index 23ab1fe..2e9366f 100644
--- a/sys/netinet/raw_ip.c
+++ b/sys/netinet/raw_ip.c
@@ -95,7 +95,7 @@ int (*ip_mrouter_get)(struct socket *, struct sockopt *);
 int (*ip_mrouter_done)(void);
 int (*ip_mforward)(struct ip *, struct ifnet *, struct mbuf *,
 		   struct ip_moptions *);
-int (*mrt_ioctl)(int, caddr_t);
+int (*mrt_ioctl)(int, caddr_t, int);
 int (*legal_vif_num)(int);
 u_long (*ip_mcast_src)(int);
 
diff --git a/sys/netinet/sctp_os_bsd.h b/sys/netinet/sctp_os_bsd.h
index b165943..01c0fcb 100644
--- a/sys/netinet/sctp_os_bsd.h
+++ b/sys/netinet/sctp_os_bsd.h
@@ -399,7 +399,7 @@ typedef struct callout sctp_os_timer_t;
 typedef struct route sctp_route_t;
 typedef struct rtentry sctp_rtentry_t;
 
-#define SCTP_RTALLOC(ro, vrf_id) rtalloc_ign((struct route *)ro, 0UL)
+#define SCTP_RTALLOC(ro, vrf_id) in_rtalloc_ign((struct route *)ro, 0UL, vrf_id)
 
 /* Future zero copy wakeup/send function */
 #define SCTP_ZERO_COPY_EVENT(inp, so)
diff --git a/sys/netinet/tcp_input.c b/sys/netinet/tcp_input.c
index a344ae5..47763c1 100644
--- a/sys/netinet/tcp_input.c
+++ b/sys/netinet/tcp_input.c
@@ -453,6 +453,7 @@ findpcb:
 	/*
 	 * If the INPCB does not exist then all data in the incoming
 	 * segment is discarded and an appropriate RST is sent back.
+	 * XXX MRT Send RST using which routing table?
 	 */
 	if (inp == NULL) {
 		/*
diff --git a/sys/netinet/tcp_subr.c b/sys/netinet/tcp_subr.c
index aaac6d6..36422197 100644
--- a/sys/netinet/tcp_subr.c
+++ b/sys/netinet/tcp_subr.c
@@ -471,6 +471,10 @@ tcp_respond(struct tcpcb *tp, void *ipgen, struct tcphdr *th, struct mbuf *m,
 		bcopy((caddr_t)th, (caddr_t)nth, sizeof(struct tcphdr));
 		flags = TH_ACK;
 	} else {
+		/*
+		 * reuse the mbuf.
+		 * XXX MRT We inherrit the FIB, which is lucky.
+		 */
 		m_freem(m->m_next);
 		m->m_next = NULL;
 		m->m_data = (caddr_t)ipgen;
@@ -1199,6 +1203,8 @@ tcp_ctlinput(int cmd, struct sockaddr *sa, void *vip)
 					bzero(&inc, sizeof(inc));
 					inc.inc_flags = 0; /* IPv4 */
 					inc.inc_faddr = faddr;
+					inc.inc_fibnum =
+					    inp->inp_inc.inc_fibnum;
 
 					mtu = ntohs(icp->icmp_nextmtu);
 					/*
@@ -1595,7 +1601,7 @@ tcp_maxmtu(struct in_conninfo *inc, int *flags)
 		dst->sin_family = AF_INET;
 		dst->sin_len = sizeof(*dst);
 		dst->sin_addr = inc->inc_faddr;
-		rtalloc_ign(&sro, RTF_CLONING);
+		in_rtalloc_ign(&sro, RTF_CLONING, inc->inc_fibnum);
 	}
 	if (sro.ro_rt != NULL) {
 		ifp = sro.ro_rt->rt_ifp;
diff --git a/sys/netinet/tcp_syncache.c b/sys/netinet/tcp_syncache.c
index d5694f3..e19f095 100644
--- a/sys/netinet/tcp_syncache.c
+++ b/sys/netinet/tcp_syncache.c
@@ -671,6 +671,8 @@ syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m)
 #endif
 
 	inp = sotoinpcb(so);
+	inp->inp_inc.inc_fibnum = sc->sc_inc.inc_fibnum;
+	so->so_fibnum = sc->sc_inc.inc_fibnum;
 	INP_WLOCK(inp);
 
 	/* Insert new socket into PCB hash list. */
@@ -941,6 +943,7 @@ syncache_expand(struct in_conninfo *inc, struct tcpopt *to, struct tcphdr *th,
 	else
 		tcpstat.tcps_sc_completed++;
 
+/* how do we find the inp for the new socket? */
 	if (sc != &scs)
 		syncache_free(sc);
 	return (1);
@@ -1127,6 +1130,7 @@ _syncache_add(struct in_conninfo *inc, struct tcpopt *to, struct tcphdr *th,
 	sc->sc_label = maclabel;
 #endif
 	sc->sc_ipopts = ipopts;
+	sc->sc_inc.inc_fibnum = inp->inp_inc.inc_fibnum;
 	bcopy(inc, &sc->sc_inc, sizeof(struct in_conninfo));
 #ifdef INET6
 	if (!inc->inc_isipv6) |
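The FIB plumbing in the ip_output.c and ip_fw2.c hunks above may be easier to follow in isolation. What follows is only a rough userland sketch, not kernel code: every name starting with sketch_ is hypothetical, a plain struct field stands in for the M_GETFIB()/M_SETFIB() mbuf macros and for inp->inp_inc.inc_fibnum, and the range check that the patch does once in check_ipfw_struct() (when the rule is loaded) is folded into the setfib helper for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf; fib models M_GETFIB(m). */
struct sketch_pkt {
	unsigned fib;
};

/* Hypothetical stand-in for an inpcb; fibnum models inp_inc.inc_fibnum. */
struct sketch_sock {
	unsigned fibnum;
};

/*
 * Mirrors the choice made in the ip_output() hunk: a locally
 * originated packet (inp != NULL) uses the socket's FIB, anything
 * else (e.g. forwarded traffic) uses the FIB tagged on the packet.
 */
unsigned
sketch_output_fib(const struct sketch_sock *inp, const struct sketch_pkt *m)
{
	return (inp != NULL ? inp->fibnum : m->fib);
}

/* Mirrors the "cmd->arg1 >= rt_numfibs" rejection in check_ipfw_struct(). */
bool
sketch_fib_valid(unsigned fib, unsigned numfibs)
{
	assert(numfibs > 0);	/* there is always at least FIB 0 */
	return (fib < numfibs);
}

/*
 * Models the O_SETFIB action: retag the packet so later lookups use
 * the new FIB. Returns false and leaves the packet untouched if the
 * FIB number is out of range.
 */
bool
sketch_setfib(struct sketch_pkt *m, unsigned fib, unsigned numfibs)
{
	if (!sketch_fib_valid(fib, numfibs))
		return (false);
	m->fib = fib;
	return (true);
}

/* Models the O_FIB match: true when the packet carries the given FIB. */
bool
sketch_fib_match(const struct sketch_pkt *m, unsigned fib)
{
	return (m->fib == fib);
}
```

With a packet initially on FIB 0, sketch_setfib(&m, 2, numfibs) followed by sketch_output_fib(NULL, &m) selects FIB 2, which is the effect an ipfw setfib rule is meant to have on forwarded traffic; passing a socket instead makes the socket's FIB win, as in the ip_output() change.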