op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	f2fs: fix incorrect iputs during the dentry recovery	Jaegeuk Kim	2013-05-28	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- iget/iput flow in the dentry recovery process 1. dir = f2fs_iget 2. set FI_DELAY_IPUT to dir 3. add dir to the dirty_dir_list - __f2fs_add_link - recover_dentry) 4. iput dir by remove_dirty_dir_inode - sync_dirty_dir_inodes - write_chekcpoint If dir's i_count is not 1 (i.e., root dir), remove_dirty_dir_inode is called later and then iput is triggered again due to the FI_DELAY_IPUT flag. So, let's unset the flag properly once iput is triggered. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: fix dentry recovery routine	Jaegeuk Kim	2013-05-28	1	-5/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The error scenario is: 1. create /a (1.a link /a /b) 2. sync 3. unlinke /a 4. create /a 5. fsync /a 6. Sudden power-off When the f2fs recovers the fsynced dentry, /a, we discover an exsiting dentry at f2fs_find_entry() in recover_dentry(). In such the case, we should unlink the existing dentry and its inode and then recover newly created dentry. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: iput only if whole data blocks are flushed	Jaegeuk Kim	2013-05-28	1	-3/+4
\| \| \| \| \| \| \| \|	If there remains some unwritten blocks from the recovery, we should not call iput on that directory inode. Otherwise, we can loose some dentry blocks after the recovery. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: return proper error from start_gc_thread	Namjae Jeon	2013-05-28	1	-5/+10
\| \| \| \| \| \| \| \| \|	when there is an error from kthread_run, then return proper error rather than returning -ENOMEM. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: optimize several routines in node.h	Namjae Jeon	2013-05-28	1	-48/+19
\| \| \| \| \| \| \| \| \| \| \|	There are various functions with common code which could be separated out to make common routines. So, made new routines and in order to retain the same call path and no major changes, written some macros to access those routines. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: remove unneeded initializations in f2fs_parent_dir	Namjae Jeon	2013-05-28	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	There is no need to initialize few pointers in f2fs_parent_dir as the values are not checked and instead directly initialized values are used. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: push some variables to debug part	Namjae Jeon	2013-05-28	5	-2/+21
\| \| \| \| \| \| \| \| \| \| \|	Some, counters are needed only for the statistical information while debugging. So, those can be controlled using CONFIG_F2FS_STAT_FS, pushing the usage for few variables under this flag. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: align data types between on-disk and in-memory block addresses	Jaegeuk Kim	2013-05-28	2	-3/+6
\| \| \| \| \| \| \| \| \| \|	The on-disk block address is defined as __le32, but in-memory block address, block_t, does as u64. Let's synchronize them to 32 bits. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: dereferencing an ERR_PTR	Dan Carpenter	2013-05-28	1	-1/+2
\| \| \| \| \| \| \|	There is an error path where "dir" is an ERR_PTR. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: use ihold	Jaegeuk Kim	2013-05-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the following helper function committed by Al. commit 7de9c6ee3ecffd99e1628e81a5ea5468f7581a1f Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sat Oct 23 11:11:40 2010 -0400 new helper: ihold() ... Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: should not make_bad_inode on f2fs_link failure	Jaegeuk Kim	2013-05-28	1	-1/+0
\| \| \| \| \| \| \|	If -ENOSPC is met during f2fs_link, we should not make the inode as bad. The inode is still alive. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: fix to handle do_recover_data errors	Jaegeuk Kim	2013-05-28	1	-9/+15
\| \| \| \| \| \| \|	This patch adds error handling codes of check_index_in_prev_nodes and its caller, do_recover_data. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: reuse the locked dnode page and its inode	Jaegeuk Kim	2013-05-28	3	-6/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the following deadlock bug during the recovery. INFO: task mount:1322 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount D ffffffff81125870 0 1322 1266 0x00000000 ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046 ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8 ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520 Call Trace: [<ffffffff81125870>] ? __lock_page+0x70/0x70 [<ffffffff816a92d9>] schedule+0x29/0x70 [<ffffffff816a93af>] io_schedule+0x8f/0xd0 [<ffffffff8112587e>] sleep_on_page+0xe/0x20 [<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0 [<ffffffff81125867>] __lock_page+0x67/0x70 [<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40 [<ffffffff81126857>] find_lock_page+0x67/0x80 [<ffffffff8112698f>] find_or_create_page+0x3f/0xb0 [<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs] [<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs] [<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs] [<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40 [<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs] [<ffffffff81184cf9>] mount_bdev+0x1c9/0x210 [<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs] [<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffff81185a13>] mount_fs+0x43/0x1b0 [<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20 [<ffffffff811a0796>] vfs_kern_mount+0x76/0x120 [<ffffffff811a2cb7>] do_mount+0x237/0xa10 [<ffffffff81140b9b>] ? strndup_user+0x5b/0x80 [<ffffffff811a3520>] SyS_mount+0x90/0xe0 [<ffffffff816b3502>] system_call_fastpath+0x16/0x1b The bug is triggered when check_index_in_prev_nodes tries to get the direct node page by calling get_node_page. At this point, if the direct node page is already locked by get_dnode_of_data, its caller, we got a deadlock condition. This patch adds additional condition check for the reuse of locked direct node pages prior to the get_node_page call. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: fix wrong condition check	Jaegeuk Kim	2013-05-28	1	-6/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While an orphan inode has zero link_count, f2fs_gc is able to select the inode for foreground gc. - f2fs_gc - do_garbage_collect - gc_data_segment : f2fs_iget is failed : get_valid_blocks() != 0, so that retry --> here we got the infinite loop. This patch resolved this issue. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: add f2fs_readonly()	Jaegeuk Kim	2013-05-28	3	-2/+7
\| \| \| \| \| \|	Introduce a simple macro function for readability. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: avoid RECLAIM_FS-ON-W: deadlock	Jaegeuk Kim	2013-05-28	2	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch tries to avoid the following deadlock condition of which the reclaim path can trigger f2fs_balance_fs again. ================================= [ INFO: inconsistent lock state ] --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/41 [HC0[0]:SC0[0]:HE1:SE1] takes: (&sbi->gc_mutex){+.+.?.}, at: f2fs_balance_fs+0xe6/0x100 [f2fs] {RECLAIM_FS-ON-W} state was registered at: [<ffffffff810aa5a9>] mark_held_locks+0xb9/0x140 [<ffffffff810aae85>] lockdep_trace_alloc+0x85/0xf0 [<ffffffff8113ab2c>] __alloc_pages_nodemask+0x7c/0x9b0 [<ffffffff81175aa8>] alloc_pages_current+0xb8/0x180 [<ffffffff811319cf>] __page_cache_alloc+0xaf/0xd0 [<ffffffff8113225c>] find_or_create_page+0x4c/0xb0 [<ffffffffa021359e>] find_data_page+0x14e/0x210 [f2fs] [<ffffffffa021161b>] f2fs_gc+0x9eb/0xd90 [f2fs] [<ffffffffa0218fae>] f2fs_balance_fs+0xee/0x100 [f2fs] [<ffffffffa020848c>] f2fs_setattr+0x6c/0x200 [f2fs] [<ffffffff811ae51b>] notify_change+0x1db/0x3a0 [<ffffffff8118fbd0>] do_truncate+0x60/0xa0 [<ffffffff8118fd95>] vfs_truncate+0x185/0x1b0 [<ffffffff8118fe1c>] do_sys_truncate+0x5c/0xa0 [<ffffffff8118ffee>] SyS_truncate+0xe/0x10 [<ffffffff816e2b42>] system_call_fastpath+0x16/0x1b Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: don't do checkpoint if error is occurred	Jaegeuk Kim	2013-05-28	1	-1/+2
\| \| \| \| \| \| \| \|	If we met an error during the dentry recovery, we should not conduct checkpoint. Otherwise, some errorneous dentry blocks overwrites the existing blocks that contain the remaining recovery information. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: fix to unlock page before exit	Jaegeuk Kim	2013-05-28	1	-3/+2
\| \| \| \| \| \|	If we got an error after lock_page, we should unlock it before exit. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: remove unnecessary kmap/kunmap operations	Jaegeuk Kim	2013-05-28	1	-5/+3
\| \| \| \| \| \| \|	The allocated page used by the recovery is not on HIGHMEM, so that we don't need to use kmap/kunmap. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: reorganize f2fs_vm_page_mkwrite	Namjae Jeon	2013-05-28	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Few things can be changed in the default mkwrite function 1) Make file_update_time at the start before acquiring any lock 2) the condition page_offset(page) >= i_size_read(inode) should be changed to page_offset(page) > i_size_read 3) Move wait_on_page_writeback. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: use list_for_each_entry rather than list_for_each_entry_safe	majianpeng	2013-05-28	1	-2/+2
\| \| \| \| \| \| \| \| \|	We can do this, since now we use a global mutex, f2fs_stat_mutex to protect its list operations. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> [Jaegeuk Kim: add description] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: remove unecessary variable and code	Haicheng Li	2013-05-28	1	-10/+8
\| \| \| \| \| \| \|	Code cleanup without behavior changed. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs, lockdep: annotate mutex_lock_all()	Peter Zijlstra	2013-05-28	1	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	Majianpeng reported a lockdep splat for f2fs. It turns out mutex_lock_all() acquires an array of locks (in global/local lock style). Any such operation is always serialized using cp_mutex, therefore there is no fs_lock[] lock-order issue; tell lockdep about this using the mutex_lock_nest_lock() primitive. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: add debug msgs in the recovery routine	Jaegeuk Kim	2013-05-28	2	-20/+25
\| \| \| \| \| \|	This patch adds some trivial debugging messages in the recovery process. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: update inode page after creation	Jaegeuk Kim	2013-05-28	4	-50/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I found a bug when testing power-off-recovery as follows. [Bug Scenario] 1. create a file 2. fsync the file 3. reboot w/o any sync 4. try to recover the file - found its fsync mark - found its dentry mark : try to recover its dentry - get its file name - get its parent inode number : here we got zero value The reason why we get the wrong parent inode number is that we didn't synchronize the inode page with its newly created inode information perfectly. Especially, previous f2fs stores fi->i_pino and writes it to the cached node page in a wrong order, which incurs the zero-valued i_pino during the recovery. So, this patch modifies the creation flow to fix the synchronization order of inode page with its inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: change get_new_data_page to pass a locked node page	Jaegeuk Kim	2013-05-28	4	-9/+11
\| \| \| \| \| \|	This patch is for passing a locked node page to get_dnode_of_data. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: skip get_node_page if locked node page is passed	Jaegeuk Kim	2013-05-28	1	-3/+6
\| \| \| \| \| \| \| \|	If get_dnode_of_data gets a locked node page, let's skip redundant get_node_page calls. This is for the futher enhancement. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: remove unnecessary por_doing check	Jaegeuk Kim	2013-05-28	1	-2/+1
\| \| \| \| \| \|	This por_doing check is totally not related to the recovery process. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: fix BUG_ON during f2fs_evict_inode(dir)	Jaegeuk Kim	2013-05-28	3	-5/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During the dentry recovery routine, recover_inode() triggers __f2fs_add_link with its directory inode. In the following scenario, a bug is captured. 1. dir = f2fs_iget(pino) 2. __f2fs_add_link(dir, name) 3. iput(dir) -> f2fs_evict_inode() faces with BUG_ON(atomic_read(fi->dirty_dents)) Kernel BUG at ffffffffa01c0676 [verbose debug info unavailable] [<ffffffffa01c0676>] f2fs_evict_inode+0x276/0x300 [f2fs] Call Trace: [<ffffffff8118ea00>] evict+0xb0/0x1b0 [<ffffffff8118f1c5>] iput+0x105/0x190 [<ffffffffa01d2dac>] recover_fsync_data+0x3bc/0x1070 [f2fs] [<ffffffff81692e8a>] ? io_schedule+0xaa/0xd0 [<ffffffff81690acb>] ? __wait_on_bit_lock+0x7b/0xc0 [<ffffffff8111a0e7>] ? __lock_page+0x67/0x70 [<ffffffff81165e21>] ? kmem_cache_alloc+0x31/0x140 [<ffffffff8118a502>] ? __d_instantiate+0x92/0xf0 [<ffffffff812a949b>] ? security_d_instantiate+0x1b/0x30 [<ffffffff8118a5b4>] ? d_instantiate+0x54/0x70 This means that we should flush all the dentry pages between iget and iput(). But, during the recovery routine, it is unallowed due to consistency, so we have to wait the whole recovery process. And then, write_checkpoint flushes all the dirty dentry blocks, and nicely we can put the stale dir inodes from the dirty_dir_inode_list. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: fix por_doing variable coverage	Jaegeuk Kim	2013-05-28	1	-2/+2
\| \| \| \| \| \| \| \| \|	The reason of using sbi->por_doing is to alleviate data writes during the recovery. The find_fsync_dnodes() produces some dirty dentry pages, so we should cover it too with sbi->por_doing. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: remove redundant assignment	Jaegeuk Kim	2013-05-28	1	-3/+2
\| \| \| \| \| \|	We don't need to assign a value redundantly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: fix the inconsistent state of data pages	Jaegeuk Kim	2013-05-28	1	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In get_lock_data_page, if there is a data race between get_dnode_of_data for node and grab_cache_page for data, f2fs is able to face with the following BUG_ON(dn.data_blkaddr == NEW_ADDR). kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/data.c:251! [<ffffffffa044966c>] get_lock_data_page+0x1ec/0x210 [f2fs] Call Trace: [<ffffffffa043b089>] f2fs_readdir+0x89/0x210 [f2fs] [<ffffffff811a0920>] ? fillonedir+0x100/0x100 [<ffffffff811a0920>] ? fillonedir+0x100/0x100 [<ffffffff811a07f8>] vfs_readdir+0xb8/0xe0 [<ffffffff811a0b4f>] sys_getdents+0x8f/0x110 [<ffffffff816d7999>] system_call_fastpath+0x16/0x1b This bug is able to be occurred when the block address of the data block is changed after f2fs_put_dnode(). In order to avoid that, this patch fixes the lock order of node and data blocks in which the node block lock is covered by the data block lock. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	f2fs: fix inconsistency of block count during recovery	Jaegeuk Kim	2013-05-28	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently f2fs recovers the dentry of fsynced files. When power-off-recovery is conducted, this newly recovered inode should increase node block count as well as inode block count. This patch resolves this inconsistency that results in: 1. create a file 2. write data 3. fsync 4. reboot without sync 5. mount and recover the file 6. node block count is 1 and inode block count is 2 : fall into the inconsistent state 7. unlink the file : trigger the following BUG_ON ------------[ cut here ]------------ kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/f2fs.h:716! Call Trace: [<ffffffffa0344100>] ? get_node_page+0x50/0x1a0 [f2fs] [<ffffffffa0344bfc>] remove_inode_page+0x8c/0x100 [f2fs] [<ffffffffa03380f0>] ? f2fs_evict_inode+0x180/0x2d0 [f2fs] [<ffffffffa033812e>] f2fs_evict_inode+0x1be/0x2d0 [f2fs] [<ffffffff811c7a67>] evict+0xa7/0x1a0 [<ffffffff811c82b5>] iput+0x105/0x190 [<ffffffff811c2b30>] d_kill+0xe0/0x120 [<ffffffff811c2c57>] dput+0xe7/0x1e0 [<ffffffff811acc3d>] __fput+0x19d/0x2d0 [<ffffffff811acd7e>] ____fput+0xe/0x10 [<ffffffff81070645>] task_work_run+0xb5/0xe0 [<ffffffff81002941>] do_notify_resume+0x71/0xb0 [<ffffffff8175f14a>] int_signal+0x12/0x17 Reported-and-Tested-by: Chris Fries <C.Fries@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
*	Linux 3.10-rc3v3.10-rc3	Linus Torvalds	2013-05-26	1	-1/+1
\|
*	ipc/sem.c: Fix missing wakeups in do_smart_update_queue()	Manfred Spraul	2013-05-26	1	-5/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	do_smart_update_queue() is called when an operation (semop, semctl(SETVAL), semctl(SETALL), ...) modified the array. It must check which of the sleeping tasks can proceed. do_smart_update_queue() missed a few wakeups: - if a sleeping complex op was completed, then all per-semaphore queues must be scanned - not only those that were modified by *sops - if a sleeping simple op proceeded, then the global queue must be scanned again And: - the test for "\|sops == NULL) before scanning the global queue is not required: If the global queue is empty, then it doesn't need to be scanned - regardless of the reason for calling do_smart_update_queue() The patch is not optimized, i.e. even completing a wait-for-zero operation causes a rescan. This is done to keep the patch as simple as possible. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs	Linus Torvalds	2013-05-26	11	-41/+78
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull NFS client bugfixes from Trond Myklebust: - Stable fix to prevent an rpc_task wakeup race - Fix a NFSv4.1 session drain deadlock - Fix a NFSv4/v4.1 mount regression when not running rpc.gssd - Ensure auth_gss pipe detection works in namespaces - Fix SETCLIENTID fallback if rpcsec_gss is not available * tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Fix SETCLIENTID fallback if GSS is not available SUNRPC: Prevent an rpc_task wakeup race NFSv4.1 Fix a pNFS session draining deadlock SUNRPC: Convert auth_gss pipe detection to work in namespaces SUNRPC: Faster detection if gssd is actually running SUNRPC: Fix a bug in gss_create_upcall
\| *	NFS: Fix SETCLIENTID fallback if GSS is not available	Chuck Lever	2013-05-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 79d852bf "NFS: Retry SETCLIENTID with AUTH_SYS instead of AUTH_NONE" did not take into account commit 23631227 "NFSv4: Fix the fallback to AUTH_NULL if krb5i is not available". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	SUNRPC: Prevent an rpc_task wakeup race	Trond Myklebust	2013-05-22	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The lockless RPC_IS_QUEUED() test in __rpc_execute means that we need to be careful about ordering the calls to rpc_test_and_set_running(task) and rpc_clear_queued(task). If we get the order wrong, then we may end up testing the RPC_TASK_RUNNING flag after __rpc_execute() has looped and changed the state of the rpc_task. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
\| *	NFSv4.1 Fix a pNFS session draining deadlock	Andy Adamson	2013-05-20	6	-18/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On a CB_RECALL the callback service thread flushes the inode using filemap_flush prior to scheduling the state manager thread to return the delegation. When pNFS is used and I/O has not yet gone to the data server servicing the inode, a LAYOUTGET can preceed the I/O. Unlike the async filemap_flush call, the LAYOUTGET must proceed to completion. If the state manager starts to recover data while the inode flush is sending the LAYOUTGET, a deadlock occurs as the callback service thread holds the single callback session slot until the flushing is done which blocks the state manager thread, and the state manager thread has set the session draining bit which puts the inode flush LAYOUTGET RPC to sleep on the forechannel slot table waitq. Separate the draining of the back channel from the draining of the fore channel by moving the NFS4_SESSION_DRAINING bit from session scope into the fore and back slot tables. Drain the back channel first allowing the LAYOUTGET call to proceed (and fail) so the callback service thread frees the callback slot. Then proceed with draining the forechannel. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	SUNRPC: Convert auth_gss pipe detection to work in namespaces	Trond Myklebust	2013-05-16	3	-19/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This seems to have been overlooked when we did the namespace conversion. If a container is running a legacy version of rpc.gssd then it will be disrupted if the global 'pipe_version' is set by a container running the new version of rpc.gssd. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	SUNRPC: Faster detection if gssd is actually running	Trond Myklebust	2013-05-16	3	-1/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recent changes to the NFS security flavour negotiation mean that we have a stronger dependency on rpc.gssd. If the latter is not running, because the user failed to start it, then we time out and mark the container as not having an instance. We then use that information to time out faster the next time. If, on the other hand, the rpc.gssd successfully binds to an rpc_pipe, then we mark the container as having an rpc.gssd instance. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	SUNRPC: Fix a bug in gss_create_upcall	Trond Myklebust	2013-05-15	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If wait_event_interruptible_timeout() is successful, it returns the number of seconds remaining until the timeout. In that case, we should be retrying the upcall. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* \|	Merge tag 'edac_fixes_for_3.10' of ↵	Linus Torvalds	2013-05-26	1	-2/+2
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull amd64 edac fix from Borislav Petkov: "A sysfs file permissions correction" * tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: amd64_edac: Fix bogus sysfs file permissions
\| * \|	amd64_edac: Fix bogus sysfs file permissions	Borislav Petkov	2013-05-21	1	-2/+2
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \|	Fix yet another issue caught by 8f46baaa7ec6c ("base: core: WARN() about bogus permissions on device attributes"). Signed-off-by: Borislav Petkov <bp@suse.de>
* \|	Merge branch 'parisc-for-3.10' of ↵	Linus Torvalds	2013-05-26	12	-71/+62
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "This time we made the kernel- and interruption stack allocation reentrant which fixed some strange kernel crashes (specifically protection ID traps). Furthemore this patchset fixes the interrupt stack in UP and SMP configurations by using native locking instructions. And finally usage of floating point calculations on parisc were disabled in the MPILIB." * 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: fix irq stack on UP and SMP parisc/superio: Use module_pci_driver to register driver parisc: make interrupt and interruption stack allocation reentrant parisc: show number of FPE and unaligned access handler calls in /proc/interrupts parisc: add additional parisc git tree to MAINTAINERS file parisc: use PAGE_SHIFT instead of hardcoded value 12 in pacache.S parisc: add rp5470 entry to machine database MPILIB: disable usage of floating point registers on parisc
\| * \|	parisc: fix irq stack on UP and SMP	Helge Deller	2013-05-24	3	-41/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The logic to detect if the irq stack was already in use with raw_spin_trylock() is wrong, because it will generate a "trylock failure on UP" error message with CONFIG_SMP=n and CONFIG_DEBUG_SPINLOCK=y. arch_spin_trylock() can't be used either since in the CONFIG_SMP=n case no atomic protection is given and we are reentrant here. A mutex didn't worked either and brings more overhead by turning off interrupts. So, let's use the fastest path for parisc which is the ldcw instruction. Counting how often the irq stack was used is pretty useless, so just drop this piece of code. Signed-off-by: Helge Deller <deller@gmx.de>
\| * \|	parisc/superio: Use module_pci_driver to register driver	Peter Huewe	2013-05-24	1	-12/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removing some boilerplate by using module_pci_driver instead of calling register and unregister in the otherwise empty init/exit functions. Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Helge Deller <deller@gmx.de>
\| * \|	parisc: make interrupt and interruption stack allocation reentrant	John David Anglin	2013-05-24	2	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The get_stack_use_cr30 and get_stack_use_r30 macros allocate a stack frame for external interrupts and interruptions requiring a stack frame. They are currently not reentrant in that they save register context before the stack is set or adjusted. I have observed a number of system crashes where there was clear evidence of stack corruption during interrupt processing, and as a result register corruption. Some interruptions can still occur during interruption processing, however external interrupts are disabled and data TLB misses don't occur for absolute accesses. So, it's not entirely clear what triggers this issue. Also, if an interruption occurs when Q=0, it is generally not possible to recover as the shadowed registers are not copied. The attached patch reworks the get_stack_use_cr30 and get_stack_use_r30 macros to allocate stack before doing register saves. The new code is a couple of instructions shorter than the old implementation. Thus, it's an improvement even if it doesn't fully resolve the stack corruption issue. Based on limited testing, it improves SMP system stability. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
\| * \|	parisc: show number of FPE and unaligned access handler calls in ↵	Helge Deller	2013-05-24	4	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	/proc/interrupts Show number of floating point assistant and unaligned access fixup handler in /proc/interrupts file. Signed-off-by: Helge Deller <deller@gmx.de>
\| * \|	parisc: add additional parisc git tree to MAINTAINERS file	Helge Deller	2013-05-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Helge Deller <deller@gmx.de>