op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	sunrpc: Factor out rpc_xprt freeing	Pavel Emelyanov	2010-10-01	3	-13/+13
\| \| \| \| \|	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: Factor out rpc_xprt allocation	Pavel Emelyanov	2010-10-01	3	-23/+27
\| \| \| \| \|	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: fix up rpcauth_remove_module section mismatch	Stephen Rothwell	2010-09-29	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Wed, 29 Sep 2010 14:02:38 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote: > > After merging the final tree, today's linux-next build (powerpc > ppc44x_defconfig) produced tis warning: > > WARNING: net/sunrpc/sunrpc.o(.init.text+0x110): Section mismatch in reference from the function init_sunrpc() to the function .exit.text:rpcauth_remove_module() > The function __init init_sunrpc() references > a function __exit rpcauth_remove_module(). > This is often seen when error handling in the init function > uses functionality in the exit path. > The fix is often to remove the __exit annotation of > rpcauth_remove_module() so it may be used outside an exit section. > > Probably caused by commit 2f72c9b73730c335381b13e2bd221abe1acea394 > ("sunrpc: The per-net skeleton"). This actually causes a build failure on a sparc32 defconfig build: `rpcauth_remove_module' referenced in section `.init.text' of net/built-in.o: defined in discarded section `.exit.text' of net/built-in.o I applied the following patch for today: Fixes: `rpcauth_remove_module' referenced in section `.init.text' of net/built-in.o: defined in discarded section `.exit.text' of net/built-in.o Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: Make the ip_map_cache be per-net	Pavel Emelyanov	2010-09-27	3	-31/+108
\| \| \| \| \| \| \| \| \|	Everything that is required for that already exists: * the per-net cache registration with respective proc entries * the context (struct net) is available in all the users Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: Make the /proc/net/rpc appear in net namespaces	Pavel Emelyanov	2010-09-27	4	-27/+44
\| \| \| \| \|	Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: The per-net skeleton	Pavel Emelyanov	2010-09-27	2	-1/+42
\| \| \| \| \| \| \|	Register empty per-net operations for the sunrpc layer. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: Tag svc_xprt with net	Pavel Emelyanov	2010-09-27	1	-0/+2
\| \| \| \| \| \| \|	The transport representation should be per-net of course. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: Add routines that allow registering per-net caches	Pavel Emelyanov	2010-09-27	1	-8/+19
\| \| \| \| \| \| \|	Existing calls do the same, but for the init_net. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: Add net to pure API calls	Pavel Emelyanov	2010-09-27	1	-8/+10
\| \| \| \| \| \| \| \| \|	There are two calls that operate on ip_map_cache and are directly called from the nfsd code. Other places will be handled in a different way. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: Pass xprt to cached get/put routines	Pavel Emelyanov	2010-09-27	1	-7/+5
\| \| \| \| \| \| \| \|	They do not require the rqst actually and having the xprt simplifies further patching. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: Make xprt auth cache release work with the xprt	Pavel Emelyanov	2010-09-27	2	-6/+8
\| \| \| \| \| \| \| \|	This is done in order to facilitate getting the ip_map_cache from which to put the ip_map. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: Pass the ip_map_parse's cd to lower calls	Pavel Emelyanov	2010-09-27	1	-10/+21
\| \| \| \| \| \| \| \|	The target is to have many ip_map_cache-s in the system. This particular patch handles its usage by the ip_map_parse callback. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc/cache: fix recent breakage of cache_clean_deferred	NeilBrown	2010-09-22	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	commit 6610f720e9e8103c22d1f1ccf8fbb695550a571f broke cache_clean_deferred as entries are no longer added to the pending list for subsequent revisiting. So put those requests back on the pending list. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc/cache: don't use custom hex_to_bin() converter	Andy Shevchenko	2010-09-21	1	-7/+13
\| \| \| \| \| \| \|	Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc/cache: change deferred-request hash table to use hlist.	NeilBrown	2010-09-21	1	-18/+10
\| \| \| \| \| \| \| \| \| \| \|	Being a hash table, hlist is the best option. There is currently some ugliness were we treat "->next == NULL" as a special case to avoid having to initialise the whole array. This change nicely gets rid of that case. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	svcauth_gss: replace a trivial 'switch' with an 'if'	NeilBrown	2010-09-21	1	-22/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Code like: switch(xxx) { case -error1: case -error2: .. return; case 0: stuff; } can more naturally be written: if (xxx < 0) return; stuff; Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	sunrpc: close connection when a request is irretrievably lost.	NeilBrown	2010-09-21	3	-9/+17
\| \| \| \| \| \| \| \| \| \| \| \| \|	If we drop a request in the sunrpc layer, either due kmalloc failure, or due to a cache miss when we could not queue the request for later replay, then close the connection to encourage the client to retry sooner. Note that if the drop happens in the NFS layer, NFSERR_JUKEBOX (aka NFS4ERR_DELAY) is returned to guide the client concerning replay. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: fix hang on fast-booting nfs servers	J. Bruce Fields	2010-09-19	1	-4/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The last_close field of a cache_detail is initialized to zero, so the condition detail->last_close < seconds_since_boot() - 30 may be false even for a cache that was never opened. However, we want to immediately fail upcalls to caches that were never opened: in the case of the auth_unix_gid cache, especially, which may never be opened by mountd (if the --manage-gids option is not set), we want to fail the upcall immediately. Otherwise client requests will be dropped unnecessarily on reboot. Also document these conditions. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	Merge remote branch 'trond/bugfixes' into for-2.6.37	J. Bruce Fields	2010-09-19	35	-287/+352
\|\ \| \| \| \| \| \|	Without some client-side fixes, server testing is currently difficult.
\| *	sunrpc: increase MAX_HASHTABLE_BITS to 14	Miquel van Smoorenburg	2010-09-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The maximum size of the authcache is now set to 1024 (10 bits), but on our server we need at least 4096 (12 bits). Increase MAX_HASHTABLE_BITS to 14. This is a maximum of 16384 entries, each containing a pointer (8 bytes on x86_64). This is exactly the limit of kmalloc() (128K). Signed-off-by: Miquel van Smoorenburg <mikevs@xs4all.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	gss:spkm3 miss returning error to caller when import security context	Bian Naimeng	2010-09-12	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	spkm3 miss returning error to up layer when import security context, it may be return ok though it has failed to import security context. Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	gss:krb5 miss returning error to caller when import security context	Bian Naimeng	2010-09-12	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	krb5 miss returning error to up layer when import security context, it may be return ok though it has failed to import security context. Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	SUNRPC: cleanup state-machine ordering	J. Bruce Fields	2010-09-12	1	-42/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is just a minor cleanup: net/sunrpc/clnt.c clarifies the rpc client state machine by commenting each state and by laying out the functions implementing each state in the order that each state is normally executed (in the absence of errors). The previous patch "Fix null dereference in call_allocate" changed the order of the states. Move the functions and update the comments to reflect the change. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	SUNRPC: Fix a race in rpc_info_open	Trond Myklebust	2010-09-12	2	-20/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a race between rpc_info_open and rpc_release_client() in that nothing stops a process from opening the file after the clnt->cl_kref goes to zero. Fix this by using atomic_inc_unless_zero()... Reported-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
\| *	SUNRPC: Fix race corrupting rpc upcall	Trond Myklebust	2010-09-12	2	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just after rpc_pipe_release calls rpc_purge_list(), but before it calls gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter will free a message without deleting it from the rpci->pipe list. We will be left with a freed object on the rpc->pipe list. Most frequent symptoms are kernel crashes in rpc.gssd system calls on the pipe in question. Reported-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
\| *	Fix null dereference in call_allocate	J. Bruce Fields	2010-09-12	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In call_allocate we need to reach the auth in order to factor au_cslack into the allocation. As of a17c2153d2e271b0cbacae9bed83b0eaa41db7e1 "SUNRPC: Move the bound cred to struct rpc_rqst", call_allocate attempts to do this by dereferencing tk_client->cl_auth, however this is not guaranteed to be defined--cl_auth can be zero in the case of gss context destruction (see rpc_free_auth). Reorder the client state machine to bind credentials before allocating, so that we can instead reach the auth through the cred. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
\| *	sctp: fix test for end of loop	Joe Perches	2010-09-09	1	-23/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a list_has_sctp_addr function to simplify loop Based on a patches by Dan Carpenter and David Miller Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge branch 'master' of ↵	David S. Miller	2010-09-08	23	-274/+919
\| \|\ \| \| \| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
\| \| *	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2010-09-07	14	-37/+91
\| \| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits) pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimator ipvs: avoid oops for passive FTP Revert "sky2: don't do GRO on second port" gro: fix different skb headrooms bridge: Clear INET control block of SKBs passed into ip_fragment(). 3c59x: Remove incorrect locking; correct documented lock hierarchy sky2: don't do GRO on second port ipv4: minor fix about RPF in help of Kconfig xfrm_user: avoid a warning with some compiler net/sched/sch_hfsc.c: initialize parent's cl_cfmin properly in init_vf() pxa168_eth: fix a mdiobus leak net sched: fix kernel leak in act_police vhost: stop worker only if created MAINTAINERS: Add ehea driver as Supported ath9k_hw: fix parsing of HT40 5 GHz CTLs ath9k_hw: Fix EEPROM uncompress block reading on AR9003 wireless: register wiphy rfkill w/o holding cfg80211_mutex netlink: Make NETLINK_USERSOCK work again. irda: Correctly clean up self->ias_obj on irda_bind() failure. wireless extensions: fix kernel heap content leak ...
\| * \| \|	udp: add rehash on connect()	Eric Dumazet	2010-09-08	4	-2/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation) added a secondary hash on UDP, hashed on (local addr, local port). Problem is that following sequence : fd = socket(...) connect(fd, &remote, ...) not only selects remote end point (address and port), but also sets local address, while UDP stack stored in secondary hash table the socket while its local address was INADDR_ANY (or ipv6 equivalent) Sequence is : - autobind() : choose a random local port, insert socket in hash tables [while local address is INADDR_ANY] - connect() : set remote address and port, change local address to IP given by a route lookup. When an incoming UDP frame comes, if more than 10 sockets are found in primary hash table, we switch to secondary table, and fail to find socket because its local address changed. One solution to this problem is to rehash datagram socket if needed. We add a new rehash(struct socket *) method in "struct proto", and implement this method for UDP v4 & v6, using a common helper. This rehashing only takes care of secondary hash table, since primary hash (based on local port only) is not changed. Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	net: blackhole route should always be recalculated	Jianzhao Wang	2010-09-08	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error triggered by IKE for example), hence this kind of route is always temporary and so we should check if a better route exists for next packets. Bug has been introduced by commit d11a4dc18bf41719c9f0d7ed494d295dd2973b92. Signed-off-by: Jianzhao Wang <jianzhao.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	ipv4: Suppress lockdep-RCU false positive in FIB trie (3)	Jarek Poplawski	2010-09-08	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hi, Here is one more of these warnings and a patch below: Sep 5 23:52:33 del kernel: [46044.244833] =================================================== Sep 5 23:52:33 del kernel: [46044.269681] [ INFO: suspicious rcu_dereference_check() usage. ] Sep 5 23:52:33 del kernel: [46044.277000] --------------------------------------------------- Sep 5 23:52:33 del kernel: [46044.285185] net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection! Sep 5 23:52:33 del kernel: [46044.293627] Sep 5 23:52:33 del kernel: [46044.293632] other info that might help us debug this: Sep 5 23:52:33 del kernel: [46044.293634] Sep 5 23:52:33 del kernel: [46044.325333] Sep 5 23:52:33 del kernel: [46044.325335] rcu_scheduler_active = 1, debug_locks = 0 Sep 5 23:52:33 del kernel: [46044.348013] 1 lock held by pppd/1717: Sep 5 23:52:33 del kernel: [46044.357548] #0: (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20 Sep 5 23:52:33 del kernel: [46044.367647] Sep 5 23:52:33 del kernel: [46044.367652] stack backtrace: Sep 5 23:52:33 del kernel: [46044.387429] Pid: 1717, comm: pppd Not tainted 2.6.35.4.4a #3 Sep 5 23:52:33 del kernel: [46044.398764] Call Trace: Sep 5 23:52:33 del kernel: [46044.409596] [<c12f9aba>] ? printk+0x18/0x1e Sep 5 23:52:33 del kernel: [46044.420761] [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0 Sep 5 23:52:33 del kernel: [46044.432229] [<c12b7235>] trie_firstleaf+0x65/0x70 Sep 5 23:52:33 del kernel: [46044.443941] [<c12b74d4>] fib_table_flush+0x14/0x170 Sep 5 23:52:33 del kernel: [46044.455823] [<c1033e92>] ? local_bh_enable_ip+0x62/0xd0 Sep 5 23:52:33 del kernel: [46044.467995] [<c12fc39f>] ? _raw_spin_unlock_bh+0x2f/0x40 Sep 5 23:52:33 del kernel: [46044.480404] [<c12b24d0>] ? fib_sync_down_dev+0x120/0x180 Sep 5 23:52:33 del kernel: [46044.493025] [<c12b069d>] fib_flush+0x2d/0x60 Sep 5 23:52:33 del kernel: [46044.505796] [<c12b06f5>] fib_disable_ip+0x25/0x50 Sep 5 23:52:33 del kernel: [46044.518772] [<c12b10d3>] fib_netdev_event+0x73/0xd0 Sep 5 23:52:33 del kernel: [46044.531918] [<c1048dfd>] notifier_call_chain+0x2d/0x70 Sep 5 23:52:33 del kernel: [46044.545358] [<c1048f0a>] raw_notifier_call_chain+0x1a/0x20 Sep 5 23:52:33 del kernel: [46044.559092] [<c124f687>] call_netdevice_notifiers+0x27/0x60 Sep 5 23:52:33 del kernel: [46044.573037] [<c124faec>] __dev_notify_flags+0x5c/0x80 Sep 5 23:52:33 del kernel: [46044.586489] [<c124fb47>] dev_change_flags+0x37/0x60 Sep 5 23:52:33 del kernel: [46044.599394] [<c12a8a8d>] devinet_ioctl+0x54d/0x630 Sep 5 23:52:33 del kernel: [46044.612277] [<c12aabb7>] inet_ioctl+0x97/0xc0 Sep 5 23:52:34 del kernel: [46044.625208] [<c123f6af>] sock_ioctl+0x6f/0x270 Sep 5 23:52:34 del kernel: [46044.638046] [<c109d2b0>] ? handle_mm_fault+0x420/0x6c0 Sep 5 23:52:34 del kernel: [46044.650968] [<c123f640>] ? sock_ioctl+0x0/0x270 Sep 5 23:52:34 del kernel: [46044.663865] [<c10c3188>] vfs_ioctl+0x28/0xa0 Sep 5 23:52:34 del kernel: [46044.676556] [<c10c38fa>] do_vfs_ioctl+0x6a/0x5c0 Sep 5 23:52:34 del kernel: [46044.688989] [<c1048676>] ? up_read+0x16/0x30 Sep 5 23:52:34 del kernel: [46044.701411] [<c1021376>] ? do_page_fault+0x1d6/0x3a0 Sep 5 23:52:34 del kernel: [46044.714223] [<c10b6588>] ? fget_light+0xf8/0x2f0 Sep 5 23:52:34 del kernel: [46044.726601] [<c1241f98>] ? sys_socketcall+0x208/0x2c0 Sep 5 23:52:34 del kernel: [46044.739140] [<c10c3eb3>] sys_ioctl+0x63/0x70 Sep 5 23:52:34 del kernel: [46044.751967] [<c12fca3d>] syscall_call+0x7/0xb Sep 5 23:52:34 del kernel: [46044.764734] [<c12f0000>] ? cookie_v6_check+0x3d0/0x630 --------------> This patch fixes the warning: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by pppd/1717: #0: (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20 stack backtrace: Pid: 1717, comm: pppd Not tainted 2.6.35.4a #3 Call Trace: [<c12f9aba>] ? printk+0x18/0x1e [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0 [<c12b7235>] trie_firstleaf+0x65/0x70 [<c12b74d4>] fib_table_flush+0x14/0x170 ... Allow trie_firstleaf() to be called either under rcu_read_lock() protection or with RTNL held. The same annotation is added to node_parent_rcu() to prevent a similar warning a bit later. Followup of commits 634a4b20 and 4eaa0e3c. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	ipvs: fix active FTP	Julian Anastasov	2010-09-08	3	-12/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Do not create expectation when forwarding the PORT command to avoid blocking the connection. The problem is that nf_conntrack_ftp.c:help() tries to create the same expectation later in POST_ROUTING and drops the packet with "dropping packet" message after failure in nf_ct_expect_related. - Change ip_vs_update_conntrack to alter the conntrack for related connections from real server. If we do not alter the reply in this direction the next packet from client sent to vport 20 comes as NEW connection. We alter it but may be some collision happens for both conntracks and the second conntrack gets destroyed immediately. The connection stucks too. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	gro: Re-fix different skb headrooms	Jarek Poplawski	2010-09-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch: "gro: fix different skb headrooms" in its part: "2) allocate a minimal skb for head of frag_list" is buggy. The copied skb has p->data set at the ip header at the moment, and skb_gro_offset is the length of ip + tcp headers. So, after the change the length of mac header is skipped. Later skb_set_mac_header() sets it into the NET_SKB_PAD area (if it's long enough) and ip header is misaligned at NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the original skb was wrongly allocated, so let's copy it as it was. bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626 fixes commit: 3d3be4333fdf6faa080947b331a6a19bce1a4f57 Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	ipv4: Fix reverse path filtering with multipath routing.	David S. Miller	2010-09-07	1	-2/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Actually iterate over the next-hops to make sure we have a device match. Otherwise RP filtering is always elided when the route matched has multiple next-hops. Reported-by: Igor M Podlesny <for.poige@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	UNIX: Do not loop forever at unix_autobind().	Tetsuo Handa	2010-09-07	1	-3/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We assumed that unix_autobind() never fails if kzalloc() succeeded. But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can consume all names using fork()/socket()/bind(). If all names are in use, those who call bind() with addr_len == sizeof(short) or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue while (1) yield(); loop at unix_autobind() till a name becomes available. This patch adds a loop counter in order to give up after 1048576 attempts. Calling yield() for once per 256 attempts may not be sufficient when many names are already in use, for __unix_find_socket_byname() can take long time under such circumstance. Therefore, this patch also adds cond_resched() call. Note that currently a local user can consume 2GB of kernel memory if the user is allowed to create and autobind 1048576 UNIX domain sockets. We should consider adding some restriction for autobind operation. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	irda: off by one	Dan Carpenter	2010-09-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is an off by one. We would go past the end when we NUL terminate the "value" string at end of the function. The "value" buffer is allocated in irlan_client_parse_response() or irlan_provider_parse_command(). CC: stable@kernel.org Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	netfilter: discard overlapping IPv6 fragment	Nicolas Dichtel	2010-09-07	1	-65/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RFC5722 prohibits reassembling IPv6 fragments when some data overlaps. Bug spotted by Zhang Zuotao <zuotao.zhang@6wind.com>. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	ipv6: discard overlapping fragment	Nicolas Dichtel	2010-09-07	1	-56/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RFC5722 prohibits reassembling fragments when some data overlaps. Bug spotted by Zhang Zuotao <zuotao.zhang@6wind.com>. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	net: fix tx queue selection for bridged devices implementing select_queue	Helmut Schaa	2010-09-07	1	-8/+8
\| \| \|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a net device is implementing the select_queue callback and is part of a bridge, frames coming from the bridge already have a tx queue associated to the socket (introduced in commit a4ee3ce3293dc931fab19beb472a8bde1295aebe, "net: Use sk_tx_queue_mapping for connected sockets"). The call to sk_tx_queue_get will then return the tx queue used by the bridge instead of calling the select_queue callback. In case of mac80211 this broke QoS which is implemented by using the select_queue callback. Furthermore it introduced problems with rt2x00 because frames with the same TID and RA sometimes appeared on different tx queues which the hw cannot handle correctly. Fix this by always calling select_queue first if it is available and only afterwards use the socket tx queue mapping. Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimator	Jarek Poplawski	2010-09-02	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a lockdep warning: [ 516.287584] ========================================================= [ 516.288386] [ INFO: possible irq lock inversion dependency detected ] [ 516.288386] 2.6.35b #7 [ 516.288386] --------------------------------------------------------- [ 516.288386] swapper/0 just changed the state of lock: [ 516.288386] (&qdisc_tx_lock){+.-...}, at: [<c12eacda>] est_timer+0x62/0x1b4 [ 516.288386] but this lock took another, SOFTIRQ-unsafe lock in the past: [ 516.288386] (est_tree_lock){+.+...} [ 516.288386] [ 516.288386] and interrupts could create inverse lock ordering between them. ... So, est_tree_lock needs BH protection because it's taken by qdisc_tx_lock, which is used both in BH and process contexts. (Full warning with this patch at netdev, 02 Sep 2010.) Fixes commit: ae638c47dc040b8def16d05dc6acdd527628f231 ("pkt_sched: gen_estimator: add a new lock") Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ipvs: avoid oops for passive FTP	Julian Anastasov	2010-09-02	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix Passive FTP problem in ip_vs_ftp: - Do not oops in nf_nat_set_seq_adjust (adjust_tcp_sequence) when iptable_nat module is not loaded Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	gro: fix different skb headrooms	Eric Dumazet	2010-09-01	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Packets entering GRO might have different headrooms, even for a given flow (because of implementation details in drivers, like copybreak). We cant force drivers to deliver packets with a fixed headroom. 1) fix skb_segment() skb_segment() makes the false assumption headrooms of fragments are same than the head. When CHECKSUM_PARTIAL is used, this can give csum_start errors, and crash later in skb_copy_and_csum_dev() 2) allocate a minimal skb for head of frag_list skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to allocate a fresh skb. This adds NET_SKB_PAD to a padding already provided by netdevice, depending on various things, like copybreak. Use alloc_skb() to allocate an exact padding, to reduce cache line needs: NET_SKB_PAD + NET_IP_ALIGN bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626 Many thanks to Plamen Petrov, testing many debugging patches ! With help of Jarek Poplawski. Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	bridge: Clear INET control block of SKBs passed into ip_fragment().	David S. Miller	2010-09-01	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a similar vain to commit 17762060c25590bfddd68cc1131f28ec720f405f ("bridge: Clear IPCB before possible entry into IP stack") Any time we call into the IP stack we have to make sure the state there is as expected by the ipv4 code. With help from Eric Dumazet and Herbert Xu. Reported-by: Bandan Das <bandan.das@stratus.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ipv4: minor fix about RPF in help of Kconfig	Nicolas Dichtel	2010-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	xfrm_user: avoid a warning with some compiler	Nicolas Dichtel	2010-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Attached is a small patch to remove a warning ("warning: ISO C90 forbids mixed declarations and code" with gcc 4.3.2). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/sched/sch_hfsc.c: initialize parent's cl_cfmin properly in init_vf()	Michal Soltys	2010-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes init_vf() function, so on each new backlog period parent's cl_cfmin is properly updated (including further propgation towards the root), even if the activated leaf has no upperlimit curve defined. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net sched: fix kernel leak in act_police	Jeff Mahoney	2010-09-01	1	-12/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While reviewing commit 1c40be12f7d8ca1d387510d39787b12e512a7ce8, I audited other users of tc_action_ops->dump for information leaks. That commit covered almost all of them but act_police still had a leak. opt.limit and opt.capab aren't zeroed out before the structure is passed out. This patch uses the C99 initializers to zero everything unused out. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	Merge branch 'master' of ↵	David S. Miller	2010-09-01	4	-9/+37
\| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
\| \| * \|	wireless: register wiphy rfkill w/o holding cfg80211_mutex	John W. Linville	2010-08-31	1	-9/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise lockdep complains... https://bugzilla.kernel.org/show_bug.cgi?id=17311 [ INFO: possible circular locking dependency detected ] 2.6.36-rc2-git4 #12 ------------------------------------------------------- kworker/0:3/3630 is trying to acquire lock: (rtnl_mutex){+.+.+.}, at: [<ffffffff813396c7>] rtnl_lock+0x12/0x14 but task is already holding lock: (rfkill_global_mutex){+.+.+.}, at: [<ffffffffa014b129>] rfkill_switch_all+0x24/0x49 [rfkill] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (rfkill_global_mutex){+.+.+.}: [<ffffffff81079ad7>] lock_acquire+0x120/0x15b [<ffffffff813ae869>] __mutex_lock_common+0x54/0x52e [<ffffffff813aede9>] mutex_lock_nested+0x34/0x39 [<ffffffffa014b4ab>] rfkill_register+0x2b/0x29c [rfkill] [<ffffffffa0185ba0>] wiphy_register+0x1ae/0x270 [cfg80211] [<ffffffffa0206f01>] ieee80211_register_hw+0x1b4/0x3cf [mac80211] [<ffffffffa0292e98>] iwl_ucode_callback+0x9e9/0xae3 [iwlagn] [<ffffffff812d3e9d>] request_firmware_work_func+0x54/0x6f [<ffffffff81065d15>] kthread+0x8c/0x94 [<ffffffff8100ac24>] kernel_thread_helper+0x4/0x10 -> #1 (cfg80211_mutex){+.+.+.}: [<ffffffff81079ad7>] lock_acquire+0x120/0x15b [<ffffffff813ae869>] __mutex_lock_common+0x54/0x52e [<ffffffff813aede9>] mutex_lock_nested+0x34/0x39 [<ffffffffa018605e>] cfg80211_get_dev_from_ifindex+0x1b/0x7c [cfg80211] [<ffffffffa0189f36>] cfg80211_wext_giwscan+0x58/0x990 [cfg80211] [<ffffffff8139a3ce>] ioctl_standard_iw_point+0x1a8/0x272 [<ffffffff8139a529>] ioctl_standard_call+0x91/0xa7 [<ffffffff8139a687>] T.723+0xbd/0x12c [<ffffffff8139a727>] wext_handle_ioctl+0x31/0x6d [<ffffffff8133014e>] dev_ioctl+0x63d/0x67a [<ffffffff8131afd9>] sock_ioctl+0x48/0x21d [<ffffffff81102abd>] do_vfs_ioctl+0x4ba/0x509 [<ffffffff81102b5d>] sys_ioctl+0x51/0x74 [<ffffffff81009e02>] system_call_fastpath+0x16/0x1b -> #0 (rtnl_mutex){+.+.+.}: [<ffffffff810796b0>] __lock_acquire+0xa93/0xd9a [<ffffffff81079ad7>] lock_acquire+0x120/0x15b [<ffffffff813ae869>] __mutex_lock_common+0x54/0x52e [<ffffffff813aede9>] mutex_lock_nested+0x34/0x39 [<ffffffff813396c7>] rtnl_lock+0x12/0x14 [<ffffffffa0185cb5>] cfg80211_rfkill_set_block+0x1a/0x7b [cfg80211] [<ffffffffa014aed0>] rfkill_set_block+0x80/0xd5 [rfkill] [<ffffffffa014b07e>] __rfkill_switch_all+0x3f/0x6f [rfkill] [<ffffffffa014b13d>] rfkill_switch_all+0x38/0x49 [rfkill] [<ffffffffa014b821>] rfkill_op_handler+0x105/0x136 [rfkill] [<ffffffff81060708>] process_one_work+0x248/0x403 [<ffffffff81062620>] worker_thread+0x139/0x214 [<ffffffff81065d15>] kthread+0x8c/0x94 [<ffffffff8100ac24>] kernel_thread_helper+0x4/0x10 Signed-off-by: John W. Linville <linville@tuxdriver.com> Acked-by: Johannes Berg <johannes@sipsolutions.net>