op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	ipv4/route: arg delay is useless in rt_cache_flush()	Nicolas Dichtel	2012-09-18	1	-1/+1
	Since route cache deletion (89aef8921bfbac22f), delay is no longer used. Remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Properly purge netdev references on uncached routes.	David S. Miller	2012-07-31	1	-0/+3
	When a device is unregistered, we have to purge all of the references to it that may exist in the entire system. If a route is uncached, we currently have no way of accomplishing this. So create a global list that is scanned when a network device goes down. This mirrors the logic in net/core/dst.c's dst_ifdown(). Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Fix input route performance regression.	David S. Miller	2012-07-26	1	-2/+17
	With the routing cache removal we lost the "noref" code paths on input, and this can kill some routing workloads. Reinstate the noref path when we hit a cached route in the FIB nexthops. With help from Eric Dumazet. Reported-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Change rt->rt_iif encoding.	David S. Miller	2012-07-23	1	-1/+5
	On input packet processing, rt->rt_iif will be zero if we should use skb->dev->ifindex. Since we access rt->rt_iif consistently via inet_iif(), that is the only spot whose interpretation has to be adjusted. Signed-off-by: David S. Miller <davem@davemloft.net>
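The encoding described in this commit message can be sketched in user-space C. This is a simplified illustration rather than the kernel's code: `demo_dev`, `demo_rtable`, `demo_skb` and `demo_inet_iif()` are hypothetical stand-ins for the kernel structures and for inet_iif().

```c
/* Hypothetical, simplified stand-ins for the kernel structures. */
struct demo_dev    { int ifindex; };
struct demo_rtable { int rt_iif; };
struct demo_skb {
	const struct demo_dev    *dev;	/* device the packet arrived on */
	const struct demo_rtable *rt;	/* route attached to the packet */
};

/* rt_iif == 0 is read as "use the ifindex of the receiving device". */
static int demo_inet_iif(const struct demo_skb *skb)
{
	int iif = skb->rt->rt_iif;

	if (iif)
		return iif;
	return skb->dev->ifindex;
}
```

Because every reader goes through one accessor, only that accessor needs to know about the zero encoding.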
*	ipv4: Kill rt->fi	David S. Miller	2012-07-20	1	-1/+0
	It's not really needed. We only grabbed a reference to the fib_info for the sake of fib_info local metrics. However, fib_info objects are freed using RCU, as are therefore their private metrics (if any). We would have triggered a route cache flush if we eliminated a reference to a fib_info object in the routing tables. Therefore, any existing cached routes will first check and see that they have been invalidated before an errant reference to these metric values would occur. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Turn rt->rt_route_iif into rt->rt_is_input.	David S. Miller	2012-07-20	1	-3/+3
	That is this value's only use, as a boolean to indicate whether a route is an input route or not. So implement it that way, using a u16 gap present in the struct already. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Kill rt->rt_oif	David S. Miller	2012-07-20	1	-1/+0
	Never actually used. It was being set on output routes to the original OIF specified in the flow key used for the lookup. Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of the flowi4_oif and flowi4_iif values, thanks to feedback from Julian Anastasov. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Adjust semantics of rt->rt_gateway.	David S. Miller	2012-07-20	1	-0/+7
	In order to allow prefixed routes, we have to adjust how rt_gateway is set and interpreted. The new interpretation is: 1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr 2) rt_gateway != 0, destination requires a nexthop gateway Abstract the fetching of the proper nexthop value using a new inline helper, rt_nexthop(), as suggested by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
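The two cases listed in this commit message can be sketched as a small helper. This is a user-space approximation under stated assumptions: `gw_rtable` is a hypothetical one-field stand-in for struct rtable, `uint32_t` stands in for the kernel's `__be32`, and `demo_rt_nexthop()` only mirrors the behaviour the message attributes to rt_nexthop().

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rtable; only the field used here. */
struct gw_rtable {
	uint32_t rt_gateway;	/* 0 means the destination is on-link */
};

/*
 * Pick the next hop for a packet addressed to daddr:
 *   rt_gateway != 0 -> send via the gateway
 *   rt_gateway == 0 -> destination is on-link, next hop is daddr itself
 */
static inline uint32_t demo_rt_nexthop(const struct gw_rtable *rt,
				       uint32_t daddr)
{
	if (rt->rt_gateway)
		return rt->rt_gateway;
	return daddr;
}
```

Callers pass the destination address from the IP header and never inspect rt_gateway directly, so the interpretation lives in one place.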
*	ipv4: Remove 'rt_dst' from 'struct rtable'	David S. Miller	2012-07-20	1	-1/+0
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Remove 'rt_mark' from 'struct rtable'	David Miller	2012-07-20	1	-1/+0
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Kill 'rt_src' from 'struct rtable'	David Miller	2012-07-20	1	-1/+0
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Remove rt_key_{src,dst,tos} from struct rtable.	David Miller	2012-07-20	1	-5/+0
	They are always used in contexts where they can be reconstituted, or where the finally resolved rt->rt_{src,dst} is semantically equivalent. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Kill ip_route_input_noref().	David Miller	2012-07-20	1	-14/+2
	The "noref" argument to ip_route_input_common() is now always ignored because we do not cache routes, and in that case we must always grab a reference to the resulting 'dst'. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Delete routing cache.	David S. Miller	2012-07-20	1	-1/+0
	The ipv4 routing cache is non-deterministic, performance wise, and is subject to reasonably easy to launch denial of service attacks. The routing cache works great for well behaved traffic, and the world was a much friendlier place when the tradeoffs that led to the routing cache's design were considered. What it boils down to is that the performance of the routing cache is a product of the traffic patterns seen by a system rather than being a product of the contents of the routing tables. The former of which is controllable by external entities. Even for "well behaved" legitimate traffic, high volume sites can see hit rates in the routing cache of only ~10%. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Kill ip_rt_redirect().	David S. Miller	2012-07-11	1	-1/+0
	No longer needed, as the protocol handlers now all properly propagate the redirect back into the routing code. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Add ipv4_redirect() and ipv4_sk_redirect() helper functions.	David S. Miller	2012-07-11	1	-0/+3
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Rearrange arguments to ip_rt_redirect()	David S. Miller	2012-07-11	1	-2/+1
	Pass in the SKB rather than just the IP addresses, so that policy and other aspects can reside in ip_rt_redirect() rather than icmp_redirect(). Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Remove inetpeer from routes.	David S. Miller	2012-07-10	1	-57/+0
	No longer used. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Maintain redirect and PMTU info in struct rtable again.	David S. Miller	2012-07-10	1	-1/+1
	Maintaining this in the inetpeer entries was not the right way to do this at all. Signed-off-by: David S. Miller <davem@davemloft.net>
*	inet: Kill FLOWI_FLAG_PRECOW_METRICS.	David S. Miller	2012-07-10	1	-2/+0
	No longer needed. TCP writes metrics, but now in its own special cache that does not dirty the route metrics. Therefore there is no longer any reason to pre-cow metrics in this way. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Kill rt->rt_spec_dst, no longer used.	David S. Miller	2012-06-28	1	-1/+0
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	Revert "ipv4: tcp: dont cache unconfirmed intput dst"	David S. Miller	2012-06-27	1	-4/+4
	This reverts commit c074da2810c118b3812f32d6754bd9ead2f169e7. This change has several unwanted side effects: 1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll thus never create a real cached route. 2) All TCP traffic will use DST_NOCACHE and never use the routing cache at all. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: tcp: dont cache unconfirmed intput dst	Eric Dumazet	2012-06-27	1	-4/+4
	DDOS synflood attacks hit badly IP route cache. On typical machines, this cache is allowed to hold up to 8 million dst entries, 256 bytes for each, for a total of 2GB of memory. rt_garbage_collect() triggers and tries to cleanup things. Eventually route cache is disabled but machine is under fire and might OOM and crash. This patch exploits the new TCP early demux, to set a nocache boolean in case incoming TCP frame is for a not yet ESTABLISHED or TIMEWAIT socket. This 'nocache' boolean is then used in case dst entry is not found in route cache, to create an unhashed dst entry (DST_NOCACHE). SYN-cookie-ACK sent use a similar mechanism (ipv4: tcp: dont cache output dst for syncookies), so after this patch, a machine is able to absorb a DDOS synflood attack without polluting its IP route cache. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Handle PMTU in all ICMP error handlers.	David S. Miller	2012-06-14	1	-1/+4
	With ip_rt_frag_needed() removed, we have to explicitly update PMTU information in every ICMP error handler. Create two helper functions to facilitate this. 1) ipv4_sk_update_pmtu() This updates the PMTU when we have a socket context to work with. 2) ipv4_update_pmtu() Raw version, used when no socket context is available. For this interface, we essentially just pass in explicit arguments for the flow identity information we would have extracted from the socket. And you'll notice that ipv4_sk_update_pmtu() is simply implemented in terms of ipv4_update_pmtu(). Note that __ip_route_output_key() is used, rather than something like ip_route_output_flow() or ip_route_output_key(). This is because we absolutely do not want to end up with a route that does IPSEC encapsulation and the like. Instead, we only want the route that would get us to the node described by the outermost IP header. Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2012-06-12	1	-1/+1
\|\	Conflicts: MAINTAINERS, drivers/net/wireless/iwlwifi/pcie/trans.c. The iwlwifi conflict was resolved by keeping the code added in 'net' that turns off the buggy chip feature. The MAINTAINERS conflict was merely overlapping changes, one change updated all the wireless web site URLs and the other changed some GIT trees to be Johannes's instead of John's. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: Reorder initialization in ip_route_output to fix gcc warning	Roland Dreier	2012-06-11	1	-1/+1
\| \|	If I build with W=1, for every file that includes <net/route.h>, I get the warning include/net/route.h: In function 'ip_route_output': include/net/route.h:135:3: warning: initialized field overwritten [-Woverride-init] include/net/route.h:135:3: warning: (near initialization for 'fl4') [-Woverride-init] (This is with "gcc (Debian 4.6.3-1) 4.6.3") A fix seems pretty trivial: move the initialization of .flowi4_tos earlier. As far as I can tell, this has no effect on code generation. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	inet: Fix BUG triggered by __rt{,6}_get_peer().	David S. Miller	2012-06-11	1	-1/+1
\| \|	If no peer actually gets attached (either because create is zero or the peer allocation fails) we'll trigger a BUG because we unconditionally do an rt{,6}_peer_ptr() afterwards. Fix this by guarding it with the proper check. Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv4: Kill ip_rt_frag_needed().	David S. Miller	2012-06-11	1	-2/+0
\| \|	There is zero point to this function. Its only real substance is to perform an extremely outdated BSD4.2 ICMP check, which we can safely remove. If you really have a MTU limited link being routed by a BSD4.2 derived system, here's a nickel go buy yourself a real router. The other actions of ip_rt_frag_needed(), checking and conditionally updating the peer, are done by the per-protocol handlers of the ICMP event. TCP, UDP, et al. have a handler which will receive this event and transmit it back into the associated route via dst_ops->update_pmtu(). This simplification is important, because it eliminates the one place where we do not have a proper route context in which to make an inetpeer lookup. Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	inet: Hide route peer accesses behind helpers.	David S. Miller	2012-06-11	1	-4/+38
\| \|	We encode the pointer(s) into an unsigned long with one state bit. The state bit is used so we can store the inetpeer tree root to use when resolving the peer later. Later the peer roots will be per-FIB table, and this change works to facilitate that. Signed-off-by: David S. Miller <davem@davemloft.net>
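The pointer-plus-state-bit encoding mentioned in this commit message is a common pointer-tagging technique, sketched below in user-space C. Everything here is an assumption for illustration: `demo_peer`, `demo_base`, `DEMO_BASE_BIT` and the helper names are hypothetical, `uintptr_t` stands in for the kernel's `unsigned long`, and the choice of which bit value means "tree root" is not taken from the source.

```c
#include <stdint.h>

/* Hypothetical stand-ins: a resolved peer and the tree root it comes from. */
struct demo_peer { int id; };
struct demo_base { int id; };

/* Low bit set: the word holds a tree-root pointer, not a resolved peer. */
#define DEMO_BASE_BIT ((uintptr_t)1)

/* Both structs are at least int-aligned, so the low pointer bit is free. */
static inline uintptr_t demo_encode_peer(struct demo_peer *p)
{
	return (uintptr_t)p;
}

static inline uintptr_t demo_encode_base(struct demo_base *b)
{
	return (uintptr_t)b | DEMO_BASE_BIT;
}

static inline int demo_is_peer(uintptr_t v)
{
	return !(v & DEMO_BASE_BIT);
}

static inline struct demo_peer *demo_to_peer(uintptr_t v)
{
	return (struct demo_peer *)v;
}

static inline struct demo_base *demo_to_base(uintptr_t v)
{
	return (struct demo_base *)(v & ~DEMO_BASE_BIT);
}
```

Hiding the tag test and the masking behind helpers means callers never touch the raw word, which is the point of routing all accesses through accessor functions.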
* \|	inet: Create and use rt{,6}_get_peer_create().	David S. Miller	2012-06-08	1	-2/+12
\|/	There's a lot of places that open-code rt{,6}_get_peer() only because they want to set 'create' to one. So add an rt{,6}_get_peer_create() for their sake. There were also a few spots open-coding plain rt{,6}_get_peer() and those are transformed here as well. Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: cleanup unsigned to unsigned int	Eric Dumazet	2012-04-15	1	-3/+3
	Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: reset flowi parameters on route connect	Julian Anastasov	2012-02-04	1	-0/+4
	Eric Dumazet found that commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output route lookups.) that comes in 3.0 added a regression. The problem appears to be that resulting flowi4_oif is used incorrectly as input parameter to some routing lookups. The result is that when connecting to local port without listener if the IP address that is used is not on a loopback interface we incorrectly assign RTN_UNICAST to the output route because no route is matched by oif=lo. The RST packet can not be sent immediately by tcp_v4_send_reset because it expects RTN_LOCAL. So, change ip_route_connect and ip_route_newports to update the flowi4 fields that are input parameters because we do not want unnecessary binding to oif. To make it clear what are the input parameters that can be modified during lookup and to show which fields of flowi4 are reused add a new function to update the flowi4 structure: flowi4_update_output. Thanks to Yurij M. Plotnikov for providing a bug report including a program to reproduce the problem. Thanks to Eric Dumazet for tracking the problem down to tcp_v4_send_reset and providing initial fix. Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	route: struct rtable can be const in rt_is_input_route and rt_is_output_route	Steffen Klassert	2011-11-26	1	-2/+2
	Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Pass explicit destination address to rt_bind_peer().	David S. Miller	2011-05-18	1	-2/+2
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Pass explicit destination address to rt_get_peer().	David S. Miller	2011-05-18	1	-1/+1
	This will next trickle down to rt_bind_peer(). Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Remove route key identity dependencies in ip_rt_get_source().	David S. Miller	2011-05-13	1	-1/+1
	Pass in the sk_buff so that we can fetch the necessary keys from the packet header when working with input routes. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Kill rt->rt_{src, dst} usage in IP GRE tunnels.	David S. Miller	2011-05-04	1	-10/+9
	First, make callers pass on-stack flowi4 to ip_route_output_gre() so they can get at the fully resolved flow key. Next, use that in ipgre_tunnel_xmit() to avoid the need to use rt->rt_{dst,src}. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Make caller provide on-stack flow key to ip_route_output_ports().	David S. Miller	2011-05-03	1	-6/+5
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Renamt struct rtable's rt_tos to rt_key_tos.	David S. Miller	2011-05-03	1	-1/+1
	To more accurately reflect that it is purely a routing cache lookup key and is used in no other context. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Remove now superfluous code in ip_route_connect().	David S. Miller	2011-04-28	1	-2/+0
	Now that output route lookups update the flow with source address et al. selections, the fl4->{saddr,daddr} assignments here are no longer necessary. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Use caller's on-stack flowi as-is in output route lookups.	David S. Miller	2011-04-28	1	-1/+1
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Kill RTO_CONN.	David S. Miller	2011-04-27	1	-4/+0
	It's not used by anything in the kernel, and defined in net/route.h so never exported to userspace. Therefore we can safely remove it. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Sanitize and simplify ip_route_{connect,newports}()	David S. Miller	2011-04-27	1	-32/+56
	These functions are used together as a unit for route resolution during connect(). They address the chicken-and-egg problem that exists when ports need to be allocated during connect() processing, yet such port allocations require addressing information from the routing code. It's currently more heavy handed than it needs to be, and in particular we allocate and initialize a flow object twice. Let the callers provide the on-stack flow object. That way we only need to initialize it once in the ip_route_connect() call. Later, if ip_route_newports() needs to do anything, it re-uses that flow object as-is except for the ports which it updates before the route re-lookup. Also, describe why this set of facilities are needed and how it works in a big comment. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
*	net: Remove __KERNEL__ cpp checks from include/net	David S. Miller	2011-04-24	1	-4/+0
	These header files are never installed to user consumption, so any __KERNEL__ cpp checks are superfluous. Projects should also not copy these files into their userland utility sources and try to use them there. If they insist on doing so, the onus is on them to sanitize the headers as needed. Signed-off-by: David S. Miller <davem@davemloft.net>
*	inet: constify ip headers and in6_addr	Eric Dumazet	2011-04-22	1	-1/+2
	Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	David S. Miller	2011-04-07	1	-2/+3
\|\	Conflicts: drivers/net/benet/be_main.c
\| *	ipv4: Fix "Set rt->rt_iif more sanely on output routes."	OGAWA Hirofumi	2011-04-07	1	-2/+3
\| \|	Commit 1018b5c01636c7c6bda31a719bda34fc631db29a ("Set rt->rt_iif more sanely on output routes.") breaks rt_is_{output,input}_route. This became the cause to return "IP_PKTINFO's ->ipi_ifindex == 0". To fix it, this does: 1) Add "int rt_route_iif;" to struct rtable 2) For input routes, always set rt_route_iif to same value as rt_iif 3) For output routes, always set rt_route_iif to zero. Set rt_iif as it is done currently. 4) Change rt_is_{output,input}_route() to test rt_route_iif Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv4: Use flowi4_init_output() in net/route.h	David S. Miller	2011-03-31	1	-36/+24
\|/	Signed-off-by: David S. Miller <davem@davemloft.net>
*	route: Take the right src and dst addresses in ip_route_newports	Steffen Klassert	2011-03-25	1	-2/+2
	When we set up the flow information in ip_route_newports(), we take the address information from the rt_key_src and rt_key_dst fields of the rtable. They appear to be empty. So take the address information from rt_src and rt_dst instead. This issue was introduced by commit 5e2b61f78411be25f0b84f97d5b5d312f184dfd1 ("ipv4: Remove flowi from struct rtable.") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: fix route deletion for IPs on many subnets	Julian Anastasov	2011-03-22	1	-0/+1
	Alex Sidorenko reported problems with local routes left after IP addresses are deleted. It happens when same IPs are used in more than one subnet for the device. Fix fib_del_ifaddr to restrict the checks for duplicate local and broadcast addresses only to the IFAs that use our primary IFA or another primary IFA with same address. And we expect the prefsrc to be matched when the routes are deleted because it is possible for them to differ only by prefsrc. This patch prevents local and broadcast routes to be leaked until their primary IP is deleted finally from the box. As the secondary address promotion needs to delete the routes for all secondaries that used the old primary IFA, add option to ignore these secondaries from the checks and to assume they are already deleted, so that we can safely delete the route while these IFAs are still on the device list. Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>