path: root/include/net/route.h
Commit message [Author, Age, Files, Lines]
* ipv4: avoid a test in ip_rt_put()  [Eric Dumazet, 2012-11-03, 1 file, -3/+6]
    We can save a test in ip_rt_put(), considering dst_release() accepts a NULL parameter, and dst is first element in rtable.

    Add a BUILD_BUG_ON() to catch any change that could break this assertion.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Cong Wang <amwang@redhat.com>
    Acked-by: Cong Wang <amwang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
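The reasoning in this commit message can be sketched in userspace C. Everything below is a hypothetical stand-in (the `_sketch` types and functions only mirror the kernel names mentioned above), and a C11 `static_assert` plays the role that `BUILD_BUG_ON()` plays in the message. It is an illustration of the idea, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct dst_entry_sketch {
    int refcnt;
};

struct rtable_sketch {
    struct dst_entry_sketch dst;   /* must stay the first member */
    unsigned int rt_gateway;
};

/* Like dst_release() as described in the message: NULL is accepted. */
static void dst_release_sketch(struct dst_entry_sketch *dst)
{
    if (dst)
        dst->refcnt--;
}

/*
 * No NULL test is needed here. Because dst sits at offset 0, converting
 * a NULL rtable pointer yields a NULL dst pointer, which the callee
 * already tolerates.
 */
static void ip_rt_put_sketch(struct rtable_sketch *rt)
{
    /* Compile-time guard: fails the build if dst is ever moved. */
    static_assert(offsetof(struct rtable_sketch, dst) == 0,
                  "dst must be the first member of rtable_sketch");
    dst_release_sketch((struct dst_entry_sketch *)rt);
}
```

The compile-time check is what makes dropping the runtime test safe: if someone reorders the struct, the build breaks instead of the NULL case silently misbehaving.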
* ipv4: introduce rt_uses_gateway  [Julian Anastasov, 2012-10-08, 1 file, -1/+2]
    Add a new flag to remember when a route is via a gateway. We will use it to allow rt_gateway to contain the address of a directly connected host for the cases when DST_NOCACHE is used or when the NH exception caches a per-destination route without the DST_NOCACHE flag, i.e. when routes are not used for other destinations.

    This way we force neighbour resolution to work with the routed destination while still being able to use a different address in the packet, a feature needed for IPVS-DR, where the original packet for the virtual IP is routed via the route to the real IP.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4/route: arg delay is useless in rt_cache_flush()  [Nicolas Dichtel, 2012-09-18, 1 file, -1/+1]
    Since route cache deletion (89aef8921bfbac22f), delay is no longer used. Remove it.

    Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Properly purge netdev references on uncached routes.  [David S. Miller, 2012-07-31, 1 file, -0/+3]
    When a device is unregistered, we have to purge all of the references to it that may exist in the entire system.

    If a route is uncached, we currently have no way of accomplishing this.

    So create a global list that is scanned when a network device goes down. This mirrors the logic in net/core/dst.c's dst_ifdown().

    Signed-off-by: David S. Miller <davem@davemloft.net>
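A minimal sketch of the "global list scanned when a device goes down" idea, in userspace C. All names are hypothetical. Two details are assumptions not stated in the message: stale references are re-pointed at a loopback placeholder, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel definitions. */
struct net_device_sketch {
    int ifindex;
};

struct rtable_sketch {
    struct net_device_sketch *dev;
    struct rtable_sketch *next_uncached;   /* link in the global list */
};

/* Global list holding every uncached route (locking omitted). */
static struct rtable_sketch *uncached_list_sketch;

/* Assumed replacement target for purged device references. */
static struct net_device_sketch loopback_dev_sketch = { .ifindex = 1 };

static void rt_add_uncached_sketch(struct rtable_sketch *rt)
{
    assert(rt != NULL);
    rt->next_uncached = uncached_list_sketch;
    uncached_list_sketch = rt;
}

/* Scan the list and drop every reference to dev; returns how many were purged. */
static int rt_flush_dev_sketch(const struct net_device_sketch *dev)
{
    int purged = 0;

    for (struct rtable_sketch *rt = uncached_list_sketch; rt; rt = rt->next_uncached) {
        if (rt->dev == dev) {
            rt->dev = &loopback_dev_sketch;
            purged++;
        }
    }
    return purged;
}
```

Without such a list, an uncached route is reachable only through whoever holds it, so there would be nothing to walk when the device disappears.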
* ipv4: Fix input route performance regression.  [David S. Miller, 2012-07-26, 1 file, -2/+17]
    With the routing cache removal we lost the "noref" code paths on input, and this can kill some routing workloads.

    Reinstate the noref path when we hit a cached route in the FIB nexthops.

    With help from Eric Dumazet.

    Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Change rt->rt_iif encoding.  [David S. Miller, 2012-07-23, 1 file, -1/+5]
    On input packet processing, rt->rt_iif will be zero if we should use skb->dev->ifindex.

    Since we access rt->rt_iif consistently via inet_iif(), that is the only spot whose interpretation has to be adjusted.

    Signed-off-by: David S. Miller <davem@davemloft.net>
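The new encoding can be illustrated with a small userspace sketch. The types are hypothetical and flattened for brevity (here `rt_iif` lives directly in the packet stand-in, whereas the message describes it as a field of the route attached to the skb); only the zero-means-fallback rule is taken from the text.

```c
#include <assert.h>

/* Hypothetical, flattened stand-ins for the kernel objects. */
struct net_device_sketch {
    int ifindex;
};

struct sk_buff_sketch {
    const struct net_device_sketch *dev;   /* receiving device */
    int rt_iif;                            /* 0 means "use dev->ifindex" */
};

/* Single accessor, so the encoding is interpreted in one place only. */
static int inet_iif_sketch(const struct sk_buff_sketch *skb)
{
    assert(skb != 0 && skb->dev != 0);
    if (skb->rt_iif)
        return skb->rt_iif;
    return skb->dev->ifindex;
}
```

Routing every read through one helper is what lets the encoding change without touching the callers.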
* ipv4: Kill rt->fi  [David S. Miller, 2012-07-20, 1 file, -1/+0]
    It's not really needed.

    We only grabbed a reference to the fib_info for the sake of fib_info local metrics.

    However, fib_info objects are freed using RCU, as are therefore their private metrics (if any).

    We would have triggered a route cache flush if we eliminated a reference to a fib_info object in the routing tables.

    Therefore, any existing cached routes will first check and see that they have been invalidated before an errant reference to these metric values would occur.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Turn rt->rt_route_iif into rt->rt_is_input.  [David S. Miller, 2012-07-20, 1 file, -3/+3]
    That is this value's only use, as a boolean to indicate whether a route is an input route or not.

    So implement it that way, using a u16 gap present in the struct already.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Kill rt->rt_oif  [David S. Miller, 2012-07-20, 1 file, -1/+0]
    Never actually used.

    It was being set on output routes to the original OIF specified in the flow key used for the lookup.

    Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of the flowi4_oif and flowi4_iif values, thanks to feedback from Julian Anastasov.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Adjust semantics of rt->rt_gateway.  [David S. Miller, 2012-07-20, 1 file, -0/+7]
    In order to allow prefixed routes, we have to adjust how rt_gateway is set and interpreted.

    The new interpretation is:

    1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr
    2) rt_gateway != 0, destination requires a nexthop gateway

    Abstract the fetching of the proper nexthop value using a new inline helper, rt_nexthop(), as suggested by Joe Perches.

    Signed-off-by: David S. Miller <davem@davemloft.net>
    Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
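The two cases listed in this message map onto a very small helper. The sketch below is a userspace approximation with hypothetical names and plain `uint32_t` addresses. It follows the stated semantics but is not copied from the header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in carrying only the field discussed above. */
struct rtable_sketch {
    uint32_t rt_gateway;   /* 0: destination is on-link */
};

/*
 * Case 1: rt_gateway == 0, the next hop is the packet's own destination.
 * Case 2: rt_gateway != 0, the next hop is the gateway.
 */
static uint32_t rt_nexthop_sketch(const struct rtable_sketch *rt, uint32_t daddr)
{
    assert(rt != 0);
    if (rt->rt_gateway)
        return rt->rt_gateway;
    return daddr;
}
```

Callers that previously read `rt_gateway` directly would go through the helper, so the "zero means on-link" rule lives in one place.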
* ipv4: Remove 'rt_dst' from 'struct rtable'  [David S. Miller, 2012-07-20, 1 file, -1/+0]
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Remove 'rt_mark' from 'struct rtable'  [David Miller, 2012-07-20, 1 file, -1/+0]
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Kill 'rt_src' from 'struct rtable'  [David Miller, 2012-07-20, 1 file, -1/+0]
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Remove rt_key_{src,dst,tos} from struct rtable.  [David Miller, 2012-07-20, 1 file, -5/+0]
    They are always used in contexts where they can be reconstituted, or where the finally resolved rt->rt_{src,dst} is semantically equivalent.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Kill ip_route_input_noref().  [David Miller, 2012-07-20, 1 file, -14/+2]
    The "noref" argument to ip_route_input_common() is now always ignored because we do not cache routes, and in that case we must always grab a reference to the resulting 'dst'.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Delete routing cache.  [David S. Miller, 2012-07-20, 1 file, -1/+0]
    The ipv4 routing cache is non-deterministic, performance wise, and is subject to reasonably easy to launch denial of service attacks.

    The routing cache works great for well behaved traffic, and the world was a much friendlier place when the tradeoffs that led to the routing cache's design were considered.

    What it boils down to is that the performance of the routing cache is a product of the traffic patterns seen by a system rather than being a product of the contents of the routing tables. The former of which is controllable by external entities.

    Even for "well behaved" legitimate traffic, high volume sites can see hit rates in the routing cache of only ~10%.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Kill ip_rt_redirect().  [David S. Miller, 2012-07-11, 1 file, -1/+0]
    No longer needed, as the protocol handlers now all properly propagate the redirect back into the routing code.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Add ipv4_redirect() and ipv4_sk_redirect() helper functions.  [David S. Miller, 2012-07-11, 1 file, -0/+3]
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Rearrange arguments to ip_rt_redirect()  [David S. Miller, 2012-07-11, 1 file, -2/+1]
    Pass in the SKB rather than just the IP addresses, so that policy and other aspects can reside in ip_rt_redirect() rather than icmp_redirect().

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Remove inetpeer from routes.  [David S. Miller, 2012-07-10, 1 file, -57/+0]
    No longer used.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Maintain redirect and PMTU info in struct rtable again.  [David S. Miller, 2012-07-10, 1 file, -1/+1]
    Maintaining this in the inetpeer entries was not the right way to do this at all.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Kill FLOWI_FLAG_PRECOW_METRICS.  [David S. Miller, 2012-07-10, 1 file, -2/+0]
    No longer needed. TCP writes metrics, but now in its own special cache that does not dirty the route metrics. Therefore there is no longer any reason to pre-cow metrics in this way.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Kill rt->rt_spec_dst, no longer used.  [David S. Miller, 2012-06-28, 1 file, -1/+0]
    Signed-off-by: David S. Miller <davem@davemloft.net>
* Revert "ipv4: tcp: dont cache unconfirmed intput dst"  [David S. Miller, 2012-06-27, 1 file, -4/+4]
    This reverts commit c074da2810c118b3812f32d6754bd9ead2f169e7.

    This change has several unwanted side effects:

    1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll thus never create a real cached route.
    2) All TCP traffic will use DST_NOCACHE and never use the routing cache at all.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: tcp: dont cache unconfirmed intput dst  [Eric Dumazet, 2012-06-27, 1 file, -4/+4]
    DDOS synflood attacks badly hit the IP route cache.

    On typical machines, this cache is allowed to hold up to 8 million dst entries, 256 bytes each, for a total of 2GB of memory.

    rt_garbage_collect() triggers and tries to clean things up.

    Eventually the route cache is disabled, but the machine is under fire and might OOM and crash.

    This patch exploits the new TCP early demux to set a nocache boolean in case the incoming TCP frame is for a not yet ESTABLISHED or TIMEWAIT socket.

    This 'nocache' boolean is then used in case the dst entry is not found in the route cache, to create an unhashed dst entry (DST_NOCACHE).

    The SYN-cookie-ACK sent uses a similar mechanism (ipv4: tcp: dont cache output dst for syncookies), so after this patch, a machine is able to absorb a DDOS synflood attack without polluting its IP route cache.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Handle PMTU in all ICMP error handlers.  [David S. Miller, 2012-06-14, 1 file, -1/+4]
    With ip_rt_frag_needed() removed, we have to explicitly update PMTU information in every ICMP error handler.

    Create two helper functions to facilitate this.

    1) ipv4_sk_update_pmtu()

       This updates the PMTU when we have a socket context to work with.

    2) ipv4_update_pmtu()

       Raw version, used when no socket context is available. For this interface, we essentially just pass in explicit arguments for the flow identity information we would have extracted from the socket.

    And you'll notice that ipv4_sk_update_pmtu() is simply implemented in terms of ipv4_update_pmtu().

    Note that __ip_route_output_key() is used, rather than something like ip_route_output_flow() or ip_route_output_key(). This is because we absolutely do not want to end up with a route that does IPSEC encapsulation and the like. Instead, we only want the route that would get us to the node described by the outermost IP header.

    Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  [David S. Miller, 2012-06-12, 1 file, -1/+1]
    Conflicts:
        MAINTAINERS
        drivers/net/wireless/iwlwifi/pcie/trans.c

    The iwlwifi conflict was resolved by keeping the code added in 'net' that turns off the buggy chip feature.

    The MAINTAINERS conflict was merely overlapping changes, one change updated all the wireless web site URLs and the other changed some GIT trees to be Johannes's instead of John's.

    Signed-off-by: David S. Miller <davem@davemloft.net>
|\
| * net: Reorder initialization in ip_route_output to fix gcc warning  [Roland Dreier, 2012-06-11, 1 file, -1/+1]
    If I build with W=1, for every file that includes <net/route.h>, I get the warning

        include/net/route.h: In function 'ip_route_output':
        include/net/route.h:135:3: warning: initialized field overwritten [-Woverride-init]
        include/net/route.h:135:3: warning: (near initialization for 'fl4') [-Woverride-init]

    (This is with "gcc (Debian 4.6.3-1) 4.6.3")

    A fix seems pretty trivial: move the initialization of .flowi4_tos earlier. As far as I can tell, this has no effect on code generation.

    Signed-off-by: Roland Dreier <roland@purestorage.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
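The kind of reordering this message describes can be sketched as follows. The struct layout and all names are hypothetical. The sketch also assumes, without confirmation from the message, that the warning was provoked by returning to a nested sub-object after other members had been initialized. It shows only the "after" ordering, with the nested initializers kept together.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a shared "common" part nested in the IPv4 flow key. */
struct flow_common_sketch {
    int oif;
    uint8_t tos;
};

struct flow4_sketch {
    struct flow_common_sketch common;
    uint32_t daddr;
    uint32_t saddr;
};

static struct flow4_sketch make_flow_sketch(int oif, uint8_t tos,
                                            uint32_t daddr, uint32_t saddr)
{
    /* Both members of .common are set before moving on to daddr/saddr. */
    struct flow4_sketch fl4 = {
        .common.oif = oif,
        .common.tos = tos,
        .daddr = daddr,
        .saddr = saddr,
    };

    assert(fl4.common.tos == tos);   /* ordering does not change the values */
    return fl4;
}
```

As the message notes for the real change, only the order of the initializers differs; the resulting object is the same.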
* | inet: Fix BUG triggered by __rt{,6}_get_peer().  [David S. Miller, 2012-06-11, 1 file, -1/+1]
    If no peer actually gets attached (either because create is zero or the peer allocation fails) we'll trigger a BUG because we unconditionally do an rt{,6}_peer_ptr() afterwards.

    Fix this by guarding it with the proper check.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Kill ip_rt_frag_needed().  [David S. Miller, 2012-06-11, 1 file, -2/+0]
    There is zero point to this function.

    Its only real substance is to perform an extremely outdated BSD4.2 ICMP check, which we can safely remove. If you really have a MTU limited link being routed by a BSD4.2 derived system, here's a nickel, go buy yourself a real router.

    The other actions of ip_rt_frag_needed(), checking and conditionally updating the peer, are done by the per-protocol handlers of the ICMP event.

    TCP, UDP, et al. have a handler which will receive this event and transmit it back into the associated route via dst_ops->update_pmtu().

    This simplification is important, because it eliminates the one place where we do not have a proper route context in which to make an inetpeer lookup.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Hide route peer accesses behind helpers.  [David S. Miller, 2012-06-11, 1 file, -4/+38]
    We encode the pointer(s) into an unsigned long with one state bit.

    The state bit is used so we can store the inetpeer tree root to use when resolving the peer later.

    Later the peer roots will be per-FIB table, and this change works to facilitate that.

    Signed-off-by: David S. Miller <davem@davemloft.net>
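Packing a pointer and one state bit into a single integer can be sketched like this. The names are hypothetical, `uintptr_t` stands in for the `unsigned long` mentioned in the message, and the choice that a set bit means "tree root" (rather than "peer") is an assumption made for illustration. The trick relies on both structures being at least 2-byte aligned, so the lowest pointer bit is always free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two kinds of object being pointed at. */
struct peer_sketch {
    int id;
};

struct peer_base_sketch {
    int table;
};

/* Assumed meaning of the state bit: set = value holds a tree root. */
#define PEER_BASE_BIT_SKETCH ((uintptr_t)1)

static uintptr_t encode_peer_sketch(struct peer_sketch *p)
{
    return (uintptr_t)p;
}

static uintptr_t encode_base_sketch(struct peer_base_sketch *b)
{
    return (uintptr_t)b | PEER_BASE_BIT_SKETCH;
}

static bool holds_peer_sketch(uintptr_t v)
{
    return !(v & PEER_BASE_BIT_SKETCH);
}

static struct peer_sketch *decode_peer_sketch(uintptr_t v)
{
    assert(holds_peer_sketch(v));
    return (struct peer_sketch *)v;
}

static struct peer_base_sketch *decode_base_sketch(uintptr_t v)
{
    assert(!holds_peer_sketch(v));
    return (struct peer_base_sketch *)(v & ~PEER_BASE_BIT_SKETCH);
}
```

Hiding the encoding behind accessors like these is the point of the commit title: callers never touch the raw integer, so its meaning can change later.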
* | inet: Create and use rt{,6}_get_peer_create().  [David S. Miller, 2012-06-08, 1 file, -2/+12]
    There's a lot of places that open-code rt{,6}_get_peer() only because they want to set 'create' to one. So add an rt{,6}_get_peer_create() for their sake.

    There were also a few spots open-coding plain rt{,6}_get_peer() and those are transformed here as well.

    Signed-off-by: David S. Miller <davem@davemloft.net>
|/
* net: cleanup unsigned to unsigned int  [Eric Dumazet, 2012-04-15, 1 file, -3/+3]
    Use of "unsigned int" is preferred to bare "unsigned" in net tree.

    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: reset flowi parameters on route connect  [Julian Anastasov, 2012-02-04, 1 file, -0/+4]
    Eric Dumazet found that commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output route lookups.) that comes in 3.0 added a regression. The problem appears to be that the resulting flowi4_oif is used incorrectly as an input parameter to some routing lookups. The result is that when connecting to a local port without a listener, if the IP address that is used is not on a loopback interface, we incorrectly assign RTN_UNICAST to the output route because no route is matched by oif=lo. The RST packet can not be sent immediately by tcp_v4_send_reset because it expects RTN_LOCAL.

    So, change ip_route_connect and ip_route_newports to update the flowi4 fields that are input parameters because we do not want unnecessary binding to oif.

    To make it clear which input parameters can be modified during lookup and to show which fields of flowi4 are reused, add a new function to update the flowi4 structure: flowi4_update_output.

    Thanks to Yurij M. Plotnikov for providing a bug report including a program to reproduce the problem.

    Thanks to Eric Dumazet for tracking the problem down to tcp_v4_send_reset and providing initial fix.

    Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* route: struct rtable can be const in rt_is_input_route and rt_is_output_route  [Steffen Klassert, 2011-11-26, 1 file, -2/+2]
    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Pass explicit destination address to rt_bind_peer().  [David S. Miller, 2011-05-18, 1 file, -2/+2]
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Pass explicit destination address to rt_get_peer().  [David S. Miller, 2011-05-18, 1 file, -1/+1]
    This will next trickle down to rt_bind_peer().

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Remove route key identity dependencies in ip_rt_get_source().  [David S. Miller, 2011-05-13, 1 file, -1/+1]
    Pass in the sk_buff so that we can fetch the necessary keys from the packet header when working with input routes.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Kill rt->rt_{src, dst} usage in IP GRE tunnels.  [David S. Miller, 2011-05-04, 1 file, -10/+9]
    First, make callers pass on-stack flowi4 to ip_route_output_gre() so they can get at the fully resolved flow key.

    Next, use that in ipgre_tunnel_xmit() to avoid the need to use rt->rt_{dst,src}.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Make caller provide on-stack flow key to ip_route_output_ports().  [David S. Miller, 2011-05-03, 1 file, -6/+5]
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Renamt struct rtable's rt_tos to rt_key_tos.  [David S. Miller, 2011-05-03, 1 file, -1/+1]
    To more accurately reflect that it is purely a routing cache lookup key and is used in no other context.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Remove now superfluous code in ip_route_connect().  [David S. Miller, 2011-04-28, 1 file, -2/+0]
    Now that output route lookups update the flow with source address et al. selections, the fl4->{saddr,daddr} assignments here are no longer necessary.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Use caller's on-stack flowi as-is in output route lookups.  [David S. Miller, 2011-04-28, 1 file, -1/+1]
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Kill RTO_CONN.  [David S. Miller, 2011-04-27, 1 file, -4/+0]
    It's not used by anything in the kernel, and defined in net/route.h so never exported to userspace.

    Therefore we can safely remove it.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Sanitize and simplify ip_route_{connect,newports}()  [David S. Miller, 2011-04-27, 1 file, -32/+56]
    These functions are used together as a unit for route resolution during connect(). They address the chicken-and-egg problem that exists when ports need to be allocated during connect() processing, yet such port allocations require addressing information from the routing code.

    It's currently more heavy handed than it needs to be, and in particular we allocate and initialize a flow object twice.

    Let the callers provide the on-stack flow object. That way we only need to initialize it once in the ip_route_connect() call.

    Later, if ip_route_newports() needs to do anything, it re-uses that flow object as-is except for the ports which it updates before the route re-lookup.

    Also, describe why this set of facilities are needed and how it works in a big comment.

    Signed-off-by: David S. Miller <davem@davemloft.net>
    Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
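The caller-owned flow object described here can be modelled in a few lines of userspace C. This is only a sketch of the calling pattern: every name is hypothetical, the "lookup" is a counter plus a fixed source address, and nothing here reflects the real function signatures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key owned by the caller (e.g. on its stack). */
struct flow4_sketch {
    uint32_t daddr, saddr;
    uint16_t sport, dport;
};

static int lookups_sketch;   /* counts simulated route lookups */

/* Stand-in for a route lookup: picks a source address if none is set. */
static void route_lookup_sketch(struct flow4_sketch *fl4)
{
    assert(fl4 != 0);
    if (!fl4->saddr)
        fl4->saddr = 0x0a000001u;
    lookups_sketch++;
}

/* First step: the flow object is initialized exactly once, here. */
static void route_connect_sketch(struct flow4_sketch *fl4, uint32_t daddr,
                                 uint16_t sport, uint16_t dport)
{
    *fl4 = (struct flow4_sketch){ .daddr = daddr, .sport = sport, .dport = dport };
    route_lookup_sketch(fl4);
}

/* Second step: reuse the same object, touching only the ports. */
static void route_newports_sketch(struct flow4_sketch *fl4,
                                  uint16_t sport, uint16_t dport)
{
    if (fl4->sport == sport && fl4->dport == dport)
        return;               /* nothing changed, no re-lookup needed */
    fl4->sport = sport;
    fl4->dport = dport;
    route_lookup_sketch(fl4);
}
```

The addresses chosen during the first lookup survive in the caller's object, which is what lets port allocation happen between the two steps without rebuilding the key.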
* net: Remove __KERNEL__ cpp checks from include/net  [David S. Miller, 2011-04-24, 1 file, -4/+0]
    These header files are never installed to user consumption, so any __KERNEL__ cpp checks are superfluous.

    Projects should also not copy these files into their userland utility sources and try to use them there. If they insist on doing so, the onus is on them to sanitize the headers as needed.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: constify ip headers and in6_addr  [Eric Dumazet, 2011-04-22, 1 file, -1/+2]
    Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious.

    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6  [David S. Miller, 2011-04-07, 1 file, -2/+3]
    Conflicts:
        drivers/net/benet/be_main.c
|\
| * ipv4: Fix "Set rt->rt_iif more sanely on output routes."  [OGAWA Hirofumi, 2011-04-07, 1 file, -2/+3]
    Commit 1018b5c01636c7c6bda31a719bda34fc631db29a ("Set rt->rt_iif more sanely on output routes.") breaks rt_is_{output,input}_route.

    This became the cause to return "IP_PKTINFO's ->ipi_ifindex == 0".

    To fix it, this does:

    1) Add "int rt_route_iif;" to struct rtable
    2) For input routes, always set rt_route_iif to same value as rt_iif
    3) For output routes, always set rt_route_iif to zero. Set rt_iif as it is done currently.
    4) Change rt_is_{output,input}_route() to test rt_route_iif

    Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
    Signed-off-by: David S. Miller <davem@davemloft.net>
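The four steps listed in this message can be sketched in userspace C as follows. The struct and helper names are hypothetical (suffixed `_sketch`), and the setters are invented purely to show steps 2 and 3. Only the field names and the test in step 4 come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Step 1: a second field, rt_route_iif, sits next to rt_iif. */
struct rtable_sketch {
    int rt_iif;         /* may be non-zero on output routes too */
    int rt_route_iif;   /* non-zero only on input routes */
};

/* Step 2: input routes carry the same value in both fields. */
static void rt_set_input_sketch(struct rtable_sketch *rt, int iif)
{
    assert(iif != 0);
    rt->rt_iif = iif;
    rt->rt_route_iif = iif;
}

/* Step 3: output routes keep rt_iif but zero rt_route_iif. */
static void rt_set_output_sketch(struct rtable_sketch *rt, int iif)
{
    rt->rt_iif = iif;
    rt->rt_route_iif = 0;
}

/* Step 4: the predicates test rt_route_iif, not rt_iif. */
static bool rt_is_input_route_sketch(const struct rtable_sketch *rt)
{
    return rt->rt_route_iif != 0;
}

static bool rt_is_output_route_sketch(const struct rtable_sketch *rt)
{
    return rt->rt_route_iif == 0;
}
```

Separating the two fields lets `rt_iif` hold a useful interface index on output routes without confusing the input/output test.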
* | ipv4: Use flowi4_init_output() in net/route.h  [David S. Miller, 2011-03-31, 1 file, -36/+24]
    Signed-off-by: David S. Miller <davem@davemloft.net>
|/