op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	xfrm: Fix null pointer dereference when decoding sessions	Steffen Klassert	2013-11-01	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On some codepaths the skb does not have a dst entry when xfrm_decode_session() is called. So check for a valid skb_dst() before dereferencing the device interface index. We use 0 as the device index if there is no valid skb_dst(), or at reverse decoding we use skb_iif as device interface index. Bug was introduced with git commit bafd4bd4dc ("xfrm: Decode sessions with output interface."). Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
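The check described in this message can be sketched in userspace C as follows. The three structs are hypothetical stand-ins for the kernel's `net_device`, `dst_entry` and `sk_buff`, and `decode_oif()` is an illustrative name, not a kernel function.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct net_device { int ifindex; };
struct dst_entry  { struct net_device *dev; };
struct sk_buff    { struct dst_entry *dst; int skb_iif; };

/*
 * Choose the device index to place in the flow key.
 * Forward decoding: the dst's device index, or 0 when the skb has no dst.
 * Reverse decoding: the index of the interface the packet arrived on.
 */
static int decode_oif(const struct sk_buff *skb, int reverse)
{
    int oif = 0;

    /* Only dereference the dst entry when one is attached. */
    if (skb->dst != NULL)
        oif = skb->dst->dev->ifindex;

    return reverse ? skb->skb_iif : oif;
}
```

An skb without a dst therefore yields index 0 on forward decoding instead of a NULL dereference.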
*	xfrm: Increase the garbage collector threshold	Steffen Klassert	2013-10-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the removal of the routing cache, we lost the option to tweak the garbage collector threshold along with the maximum routing cache size. So git commit 703fb94ec ("xfrm: Fix the gc threshold value for ipv4") moved back to a static threshold. It turned out that the current threshold before we start garbage collecting is much too small for some workloads, so increase it from 1024 to 32768. This means that we start the garbage collector if we have more than 32768 dst entries in the system and refuse new allocations if we are above 65536. Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
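The two limits in this message can be illustrated with a small sketch. The enum and `dst_alloc_check()` are hypothetical names; writing the refusal limit as twice the threshold is an assumption that matches the two figures quoted above, and the real kernel path runs the collector before deciding whether to refuse.

```c
/* Threshold value taken from the commit message. */
#define XFRM_GC_THRESH 32768L

/* Hypothetical outcome of an allocation attempt. */
enum dst_alloc_action { DST_ALLOC_OK, DST_ALLOC_RUN_GC, DST_ALLOC_REFUSE };

/*
 * Classify an allocation by the current number of dst entries:
 * above the threshold the collector runs, above twice the
 * threshold (65536) new allocations are refused.
 */
static enum dst_alloc_action dst_alloc_check(long entries)
{
    if (entries > 2 * XFRM_GC_THRESH)
        return DST_ALLOC_REFUSE;
    if (entries > XFRM_GC_THRESH)
        return DST_ALLOC_RUN_GC;
    return DST_ALLOC_OK;
}
```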
*	xfrm: Decode sessions with output interface.	Steffen Klassert	2013-09-16	1	-0/+1
\| \| \| \| \| \| \| \| \|	The output interface matching does not work on forward policy lookups, the output interface of the flowi is always 0. Fix this by setting the output interface when we decode the session. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
*	xfrm: make gc_thresh configurable in all namespaces	Michal Kubecek	2013-02-06	1	-3/+46
\| \| \| \| \| \| \| \| \| \|	The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh sysctl but currently only in init_net, other namespaces always use the default value. This can substantially limit the number of IPsec tunnels that can be effectively used. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
*	xfrm: remove unused xfrm4_policy_fini()	Michal Kubecek	2013-02-06	1	-9/+0
\| \| \| \| \| \| \| \|	Function xfrm4_policy_fini() is unused since xfrm4_fini() was removed in 2.6.11. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
*	xfrm: Fix the gc threshold value for ipv4	Steffen Klassert	2012-11-13	1	-12/+1
\| \| \| \| \| \| \| \| \| \|	The xfrm gc threshold value depends on ip_rt_max_size. This value was set to INT_MAX with the routing cache removal patch, so we start doing garbage collecting when we have INT_MAX/2 IPsec routes cached. Fix this by going back to the static threshold of 1024 routes. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
*	ipv4: introduce rt_uses_gateway	Julian Anastasov	2012-10-08	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new flag to remember when a route is via a gateway. We will use it to allow rt_gateway to contain the address of a directly connected host for the cases when DST_NOCACHE is used or when the NH exception caches a per-destination route without the DST_NOCACHE flag, i.e. when routes are not used for other destinations. This way we force neighbour resolution to work with the routed destination while still allowing a different address in the packet, a feature needed for IPVS-DR, where the original packet for the virtual IP is routed via the route to the real IP. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Properly purge netdev references on uncached routes.	David S. Miller	2012-07-31	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	When a device is unregistered, we have to purge all of the references to it that may exist in the entire system. If a route is uncached, we currently have no way of accomplishing this. So create a global list that is scanned when a network device goes down. This mirrors the logic in net/core/dst.c's dst_ifdown(). Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Turn rt->rt_route_iif into rt->rt_is_input.	David S. Miller	2012-07-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	That is this value's only use, as a boolean to indicate whether a route is an input route or not. So implement it that way, using a u16 gap present in the struct already. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Kill rt->rt_oif	David S. Miller	2012-07-20	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	Never actually used. It was being set on output routes to the original OIF specified in the flow key used for the lookup. Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of the flowi4_oif and flowi4_iif values, thanks to feedback from Julian Anastasov. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Remove 'rt_dst' from 'struct rtable'	David S. Miller	2012-07-20	1	-1/+0
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Remove 'rt_mark' from 'struct rtable'	David Miller	2012-07-20	1	-1/+0
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Kill 'rt_src' from 'struct rtable'	David Miller	2012-07-20	1	-1/+0
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Remove rt_key_{src,dst,tos} from struct rtable.	David Miller	2012-07-20	1	-3/+0
\| \| \| \| \| \| \| \|	They are always used in contexts where they can be reconstituted, or where the finally resolved rt->rt_{src,dst} is semantically equivalent. Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()	David S. Miller	2012-07-17	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Remove checks for dst_ops->redirect being NULL.	David S. Miller	2012-07-12	1	-2/+1
\| \| \| \| \| \|	No longer necessary. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Add redirect support to all protocol icmp error handlers.	David S. Miller	2012-07-11	1	-0/+10
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Remove inetpeer from routes.	David S. Miller	2012-07-10	1	-7/+0
\| \| \| \| \| \|	No longer used. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Maintain redirect and PMTU info in struct rtable again.	David S. Miller	2012-07-10	1	-0/+1
\| \| \| \| \| \| \|	Maintaining this in the inetpeer entries was not the right way to do this at all. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Kill rt->rt_spec_dst, no longer used.	David S. Miller	2012-06-28	1	-1/+0
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	inet: Hide route peer accesses behind helpers.	David S. Miller	2012-06-11	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	We encode the pointer(s) into an unsigned long with one state bit. The state bit is used so we can store the inetpeer tree root to use when resolving the peer later. Later the peer roots will be per-FIB table, and this change works to facilitate that. Signed-off-by: David S. Miller <davem@davemloft.net>
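The pointer encoding described here can be sketched as a tagged pointer. All names below are hypothetical; the sketch uses `uintptr_t` where the message says unsigned long, and it assumes both structures are at least 2-byte aligned so that the lowest bit of their addresses is always zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a resolved peer and a peer tree root. */
struct peer      { int id; };
struct peer_base { int table; };

/* State bit: set when the word holds a tree root instead of a peer. */
#define PEER_WORD_IS_BASE ((uintptr_t)1)

static uintptr_t encode_peer(struct peer *p)
{
    return (uintptr_t)p;
}

static uintptr_t encode_base(struct peer_base *b)
{
    return (uintptr_t)b | PEER_WORD_IS_BASE;
}

static bool word_is_base(uintptr_t w)
{
    return (w & PEER_WORD_IS_BASE) != 0;
}

static struct peer *decode_peer(uintptr_t w)
{
    return (struct peer *)w;
}

static struct peer_base *decode_base(uintptr_t w)
{
    /* Clear the state bit before using the word as an address. */
    return (struct peer_base *)(w & ~PEER_WORD_IS_BASE);
}
```

Hiding these operations behind helpers means callers never touch the raw word, which is what lets the representation change later.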
*	net: Convert all sysctl registrations to register_net_sysctl	Eric W. Biederman	2012-04-20	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	This results in code with less boilerplate that is a bit easier to read. It additionally stops us from using compatibility code in the sysctl core, hastening the day when the compatibility code can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: fix checkpatch errors	Daniel Baluta	2012-04-15	1	-1/+1
\| \| \| \| \| \| \| \| \|	Fix checkpatch errors of the following types: ERROR: "foo * bar" should be "foo *bar"; ERROR: "(foo*)" should be "(foo *)". Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: fix ipsec forward performance regression	Yan, Zheng	2011-10-24	1	-7/+7
\| \| \| \| \| \| \| \| \| \|	There is a bug in commit 5e2b61f ("ipv4: Remove flowi from struct rtable"). It makes xfrm4_fill_dst() modify the wrong data structure. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Reported-by: Kim Phillips <kim.phillips@freescale.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ip: introduce ip_is_fragment helper inline function	Paul Gortmaker	2011-06-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	There are enough instances of this: iph->frag_off & htons(IP_MF \| IP_OFFSET) that a helper function is probably warranted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
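A userspace sketch of such a helper is shown below. The struct is a hypothetical reduction of the IPv4 header, and the `SKETCH_` constants are local names carrying the usual values of the "more fragments" flag and the fragment offset mask.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the usual IPv4 fragment field values. */
#define SKETCH_IP_MF     0x2000  /* "more fragments" flag */
#define SKETCH_IP_OFFSET 0x1FFF  /* fragment offset mask  */

/* Hypothetical reduced header: only the field the test needs. */
struct iphdr_sketch {
    uint16_t frag_off;  /* network byte order, as on the wire */
};

/*
 * A packet is a fragment if more fragments follow or if it
 * carries a non-zero fragment offset.
 */
static bool ip_is_fragment_sketch(const struct iphdr_sketch *iph)
{
    return (iph->frag_off & htons(SKETCH_IP_MF | SKETCH_IP_OFFSET)) != 0;
}
```

A header with only the "don't fragment" bit (0x4000) set is not reported as a fragment, because that bit lies outside the mask.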
*	ipv4: xfrm: Eliminate ->rt_src reference in policy code.	David S. Miller	2011-05-10	1	-13/+21
\| \| \| \| \| \| \| \| \| \| \|	Rearrange xfrm4_dst_lookup() so that it works by calling a helper function __xfrm_dst_lookup() that takes an explicit flow key storage area as an argument. Use this new helper in xfrm4_get_saddr() so we can fetch the selected source address from the flow instead of from rt->rt_src. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Rename struct rtable's rt_tos to rt_key_tos.	David S. Miller	2011-05-03	1	-1/+1
\| \| \| \| \| \| \|	To more accurately reflect that it is purely a routing cache lookup key and is used in no other context. Signed-off-by: David S. Miller <davem@davemloft.net>
*	inet: constify ip headers and in6_addr	Eric Dumazet	2011-04-22	1	-1/+1
\| \| \| \| \| \| \| \|	Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Fix "Set rt->rt_iif more sanely on output routes."	OGAWA Hirofumi	2011-04-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 1018b5c01636c7c6bda31a719bda34fc631db29a ("Set rt->rt_iif more sanely on output routes.") breaks rt_is_{output,input}_route. This caused IP_PKTINFO to return "->ipi_ifindex == 0". To fix it, this does: 1) Add "int rt_route_iif;" to struct rtable. 2) For input routes, always set rt_route_iif to the same value as rt_iif. 3) For output routes, always set rt_route_iif to zero. Set rt_iif as it is done currently. 4) Change rt_is_{output,input}_route() to test rt_route_iif. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
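The four steps listed in this message can be sketched as below. `struct rtable_sketch` and the two `init_*` helpers are hypothetical; only the field names and the two predicates come from the message.

```c
#include <stdbool.h>

/* Hypothetical reduction of struct rtable to the two fields involved. */
struct rtable_sketch {
    int rt_iif;        /* interface index reported to users      */
    int rt_route_iif;  /* step 1: non-zero only for input routes */
};

/* Step 2: input routes carry the same value in both fields. */
static void init_input_route(struct rtable_sketch *rt, int iif)
{
    rt->rt_iif = iif;
    rt->rt_route_iif = iif;
}

/* Step 3: output routes keep rt_iif but zero rt_route_iif. */
static void init_output_route(struct rtable_sketch *rt, int ifindex)
{
    rt->rt_iif = ifindex;
    rt->rt_route_iif = 0;
}

/* Step 4: the predicates test rt_route_iif, not rt_iif. */
static bool rt_is_input_route(const struct rtable_sketch *rt)
{
    return rt->rt_route_iif != 0;
}

static bool rt_is_output_route(const struct rtable_sketch *rt)
{
    return rt->rt_route_iif == 0;
}
```

With this split, an output route can carry a non-zero `rt_iif` and still be classified as an output route.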
*	net: Put fl4_* macros to struct flowi4 and use them again.	David S. Miller	2011-03-12	1	-9/+9
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Use flowi4 and flowi6 in xfrm layer.	David S. Miller	2011-03-12	1	-22/+24
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Use flowi4 in public route lookup interfaces.	David S. Miller	2011-03-12	1	-5/+5
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Make flowi ports AF dependent.	David S. Miller	2011-03-12	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \|	Create two sets of port member accessors, one set prefixed by fl4_* and the other prefixed by fl6_*. This will let us create AF optimal flow instances. It will work because in every context in which we access the ports, we have to be fully aware of which AF the flowi is anyway. Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Put flowi_* prefix on AF independent members of struct flowi	David S. Miller	2011-03-12	1	-5/+5
\| \| \| \| \| \| \| \| \| \|	I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common. This is the first step to move in that direction. Signed-off-by: David S. Miller <davem@davemloft.net>
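The layout this message aims for can be sketched as a union whose variants all begin with the same common structure. The member lists below are illustrative assumptions, not the kernel's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* AF-independent members, carrying the flowi_ prefix. */
struct flowi_common_sketch {
    int      flowi_oif;
    int      flowi_iif;
    uint32_t flowi_mark;
    uint8_t  flowi_tos;
    uint8_t  flowi_proto;
};

/* Each AF-specific variant includes the common part first. */
struct flowi4_sketch {
    struct flowi_common_sketch common;
    uint32_t daddr;
    uint32_t saddr;
};

struct flowi6_sketch {
    struct flowi_common_sketch common;
    uint8_t daddr[16];
    uint8_t saddr[16];
};

/* The generic flow key becomes a union of the variants. */
struct flowi_sketch {
    union {
        struct flowi_common_sketch common;
        struct flowi4_sketch       ip4;
        struct flowi6_sketch       ip6;
    } u;
};
```

Because the common structure sits at offset zero in every variant, AF-independent code can read fields such as `flowi_mark` through `u.common` regardless of which variant was filled in, while an IPv4-only caller needs only the smaller `flowi4_sketch`.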
*	ipv4: Remove flowi from struct rtable.	David S. Miller	2011-03-04	1	-1/+6
\| \| \| \| \| \| \| \| \| \|	The only necessary parts are the src/dst addresses, the interface indexes, the TOS, and the mark. The rest is unnecessary bloat, which amounts to nearly 50 bytes on 64-bit. Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: Make output route lookup return rtable directly.	David S. Miller	2011-03-02	1	-7/+5
\| \| \| \| \| \|	Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
*	xfrm: Handle blackhole route creation via afinfo.	David S. Miller	2011-03-01	1	-0/+1
\| \| \| \| \| \| \|	That way we don't have to potentially do this in every xfrm_lookup() caller. Signed-off-by: David S. Miller <davem@davemloft.net>
*	xfrm: Const'ify address arguments to ->dst_lookup()	David S. Miller	2011-02-23	1	-2/+2
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	xfrm: Mark flowi arg to ->fill_dst() const.	David S. Miller	2011-02-22	1	-1/+1
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	xfrm: Mark flowi arg to ->get_tos() const.	David S. Miller	2011-02-22	1	-1/+1
\| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Implement read-only protection and COW'ing of metrics.	David S. Miller	2011-01-26	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Routing metrics are now copy-on-write. Initially a route entry points its metrics at a read-only location. If a routing table entry exists, it will point there. Else it will point at the all zero metric place-holder called 'dst_default_metrics'. The writeability state of the metrics is stored in the low bits of the metrics pointer, we have two bits left to spare if we want to store more states. For the initial implementation, COW is implemented simply via kmalloc. However future enhancements will change this to place the writable metrics somewhere else, in order to increase sharing. Very likely this "somewhere else" will be the inetpeer cache. Note also that this means that metrics updates may transiently fail if we cannot COW the metrics successfully. But even by itself, this patch should decrease memory usage and increase cache locality especially for routing workloads. In those cases the read-only metric copies stay in place and never get written to. TCP workloads where metrics get updated, and those rare cases where PMTU triggers occur, will take a very slight performance hit. But that hit will be alleviated when the long-term writable metrics move to a more sharable location. Since the metrics storage went from a u32 array of RTAX_MAX entries to what is essentially a pointer, some retooling of the dst_entry layout was necessary. Most importantly, we need to preserve the alignment of the reference count so that it doesn't share cache lines with the read-mostly state, as per Eric Dumazet's alignment assertion checks. The only non-trivial bit here is the move of the 'flags' member into the writeable cacheline. This is OK since we are always accessing the flags around the same moment when we made a modification to the reference count. Signed-off-by: David S. Miller <davem@davemloft.net>
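The copy-on-write scheme can be sketched in userspace with `malloc` standing in for `kmalloc`. Every name here is hypothetical, `METRICS_MAX` is a small stand-in for RTAX_MAX, and the sketch assumes metrics arrays are at least 2-byte aligned so the low pointer bit is free for the read-only flag.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define METRICS_MAX       4              /* stand-in for RTAX_MAX */
#define METRICS_READ_ONLY ((uintptr_t)1) /* low bit: shared copy  */

/* All-zero placeholder used when no routing table entry exists. */
static const uint32_t default_metrics_sketch[METRICS_MAX] = { 0 };

struct dst_sketch {
    uintptr_t metrics;  /* pointer plus read-only flag in bit 0 */
};

/* Point the entry at shared, read-only metrics. */
static void dst_init_metrics(struct dst_sketch *d, const uint32_t *shared)
{
    if (shared == NULL)
        shared = default_metrics_sketch;
    d->metrics = (uintptr_t)shared | METRICS_READ_ONLY;
}

static int dst_metrics_read_only(const struct dst_sketch *d)
{
    return (d->metrics & METRICS_READ_ONLY) != 0;
}

static const uint32_t *dst_metrics_ptr(const struct dst_sketch *d)
{
    return (const uint32_t *)(d->metrics & ~METRICS_READ_ONLY);
}

/*
 * Return a writable metrics array, copying the shared one on first
 * write. Returns NULL if the copy cannot be allocated, which is the
 * transient failure the message mentions.
 */
static uint32_t *dst_metrics_write_ptr(struct dst_sketch *d)
{
    if (dst_metrics_read_only(d)) {
        uint32_t *copy = malloc(sizeof(default_metrics_sketch));

        if (copy == NULL)
            return NULL;
        memcpy(copy, dst_metrics_ptr(d), sizeof(default_metrics_sketch));
        d->metrics = (uintptr_t)copy;  /* flag bit now clear */
    }
    return (uint32_t *)d->metrics;
}
```

Entries that are only ever read keep pointing at the shared array and never allocate; the caller owns any private copy and frees it when the entry is destroyed.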
*	net: use the macros defined for the members of flowi	Changli Gao	2010-11-17	1	-6/+2
\| \| \| \| \| \| \|	Use the macros defined for the members of flowi to clean the code up. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	xfrm: use gre key as flow upper protocol info	Timo Teräs	2010-11-15	1	-0/+15
\| \| \| \| \| \| \| \| \| \|	The GRE Key field is intended to be used for identifying an individual traffic flow within a tunnel. It is useful to be able to have XFRM policy selector matches to have different policies for different GRE tunnels. Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: get rid of rtable->idev	Eric Dumazet	2010-11-11	1	-24/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It seems the idev field in struct rtable has no special purpose other than adding extra atomic ops. We hold refcounts on the device itself (using percpu data, so pretty cheap in the current kernel). The infiniband case is solved by using dst.dev instead of idev->dev. Removal of this field means routing without the route cache now uses shared data and percpu data, and the only potential contention is a pair of atomic ops on struct neighbour per forwarded packet. About 5% speedup on a routing test. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net dst: use a percpu_counter to track entries	Eric Dumazet	2010-10-11	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	struct dst_ops tracks the number of allocated dst entries in an atomic_t field, subject to high cache line contention under stress workloads. Switch to a percpu_counter to reduce the number of times we need to dirty a central location. Place it on a separate cache line to avoid dirtying read-only fields. Stress test (sending 160.000.000 UDP frames, IP route cache disabled, dual E5540 @2.53GHz, 32bit kernel, FIB_TRIE, SLUB/NUMA): Before: real 0m51.179s, user 0m15.329s, sys 10m15.942s. After: real 0m45.570s, user 0m15.525s, sys 9m56.669s. With a small reordering of struct neighbour fields, the subject of a following patch (to separate refcnt from other read-mostly fields): real 0m41.841s, user 0m15.261s, sys 8m45.949s. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	xfrm4: strip ECN bits from tos field	Ulrich Weber	2010-09-22	1	-1/+1
\| \| \| \| \| \| \| \|	otherwise ECT(1) bit will get interpreted as RTO_ONLINK and routing will fail with XfrmOutBundleGenError. Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
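The collision this message describes can be shown with a few constants. The values below are local copies assumed to match the kernel headers (RTO_ONLINK as 0x01, IPTOS_RT_MASK as 0x1C) and the ECT(1) codepoint from RFC 3168; `route_tos()` is a hypothetical helper name.

```c
#include <stdint.h>

/* Local copies of the values involved (assumed, see lead-in). */
#define RTO_ONLINK    0x01  /* routing flag carried in the tos field   */
#define IPTOS_RT_MASK 0x1C  /* tos bits that matter for route lookup   */
#define ECN_ECT_1     0x01  /* ECT(1) codepoint in the low two bits    */

/*
 * Keep only the routing-relevant TOS bits. This drops both ECN
 * bits, so ECT(1) can no longer be mistaken for RTO_ONLINK.
 */
static uint8_t route_tos(uint8_t tos)
{
    return tos & IPTOS_RT_MASK;
}
```

Without the mask, a packet marked ECT(1) carries a TOS byte whose lowest bit equals `RTO_ONLINK`, and the route lookup treats the destination as on-link.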
*	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	David S. Miller	2010-07-07	1	-0/+2
\|\
\| *	xfrm: fix xfrm by MARK logic	Peter Kosyh	2010-07-04	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While using xfrm by MARK feature in 2.6.34 - 2.6.35 kernels, the mark is always cleared in flowi structure via memset in _decode_session4 (net/ipv4/xfrm4_policy.c), so the policy lookup fails. IPv6 code is affected by this bug too. Signed-off-by: Peter Kosyh <p.kosyh@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net-next: remove useless union keyword	Changli Gao	2010-06-10	1	-1/+1
\|/ \| \| \| \| \| \| \| \| \|	remove useless union keyword in rtable, rt6_info and dn_route. Since there is only one member in a union, the union keyword isn't useful. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	xfrm: cache bundles instead of policies for outgoing flows	Timo Teräs	2010-04-07	1	-22/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	__xfrm_lookup() is called for each packet transmitted out of the system. xfrm_find_bundle() does a linear search, which can kill system performance depending on how many bundles are required per policy. This modifies __xfrm_lookup() to store bundles directly in the flow cache. If we do not get a hit, we just create a new bundle instead of doing a slow search. This means that we can now get multiple xfrm_dst's for the same flow (on a per-CPU basis). Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>