op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd	Linus Torvalds	2010-01-06	2	-9/+18
\|\ \| \| \| \| \| \| \| \| \| \|	* 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: simple_write_end does not mark_inode_dirty exofs: fix pnfs_osd re-definitions in pre-pnfs trees
\| *	exofs: simple_write_end does not mark_inode_dirty	Boaz Harrosh	2010-01-05	1	-1/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	exofs uses simple_write_end() for it's .write_end handler. But it is not enough because simple_write_end() does not call mark_inode_dirty() when it extends i_size. So even if we do call mark_inode_dirty at beginning of write out, with a very long IO and a saturated system we might get the .write_inode() called while still extend-writing to file and miss out on the last i_size updates. So override .write_end, call simple_write_end(), and afterwords if i_size was changed call mark_inode_dirty(). It stands to logic that since simple_write_end() was the one extending i_size it should also call mark_inode_dirty(). But it looks like all users of simple_write_end() are memory-bound pseudo filesystems, who could careless about mark_inode_dirty(). I might submit a warning-comment patch to simple_write_end() in future. CC: Stable <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
\| *	exofs: fix pnfs_osd re-definitions in pre-pnfs trees	Boaz Harrosh	2010-01-05	1	-8/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some on disk exofs constants and types are defined in the pnfs_osd_xdr.h file. Since we needed these types before the pnfs-objects code was accepted to mainline we duplicated the minimal needed definitions into an exofs local header. The definitions where conditionally included depending on !CONFIG_PNFS defined. So if PNFS was present in the tree definitions are taken from there and if not they are defined locally. That was all good but, the CONFIG_PNFS is planed to be included upstream before the pnfs-objects is also included. (The first pnfs batch might be pnfs-files only) So condition exofs local definitions on the absence of pnfs_osd_xdr.h inclusion (__PNFS_OSD_XDR_H__ not defined). User code must make sure that in future pnfs_osd_xdr.h will be included before fs/exofs/pnfs.h, which happens to be so in current code. Once pnfs-objects hits mainline, exofs's local header will be removed. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
* \|	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2010-01-05	1	-6/+15
\|\ \ \| \|/ \|/\| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Handle O_DIRECT when writing to a refcounted cluster.
\| *	ocfs2: Handle O_DIRECT when writing to a refcounted cluster.	Tao Ma	2009-12-30	1	-6/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case of writing to a refcounted cluster with O_DIRECT, we need to fall back to buffer write. And when it is finished, we need to flush the page and the journal as we did for other O_DIRECT writes. This patch fix oss bug 1191. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1191 Signed-off-by: Tao Ma <tao.ma@oracle.com> Tested-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* \|	sysfs: Add lockdep annotations for the sysfs active reference	Eric W. Biederman	2010-01-04	2	-2/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Holding locks over device_del -> kobject_del -> sysfs_deactivate can cause deadlocks if those same locks are grabbed in sysfs show or store methods. The I model s_active count + completion as a sleeping read/write lock. I describe to lockdep sysfs_get_active as a read_trylock, sysfs_put_active as a read_unlock, and sysfs_deactivate as a write_lock and write_unlock pair. This seems to capture the essence for purposes of finding deadlocks, and in my testing gives finds real issues and ignores non-issues. This brings us back to holding locks over kobject_del is a problem that ideally we should find a way of addressing, but at least lockdep can tell us about the problems instead of requiring developers to debug rare strange system deadlocks, that happen when sysfs files are removed while being written to. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'sh/for-2.6.33' of ↵	Linus Torvalds	2010-01-04	1	-2/+2
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: binfmt_elf_fdpic: Fix build breakage introduced by coredump changes. sh: update defconfigs. sh: Don't default enable PMB support. sh: Disable PMB for SH4AL-DSP CPUs. sh: Only provide a PCLK definition for legacy CPG CPUs.
\| * \|	binfmt_elf_fdpic: Fix build breakage introduced by coredump changes.	Daisuke HATAYAMA	2010-01-04	1	-2/+2
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit f6151dfea21496d43dbaba32cfcd9c9f404769bc introduces build breakage, so this patch fixes it together with some printk formatting cleanup. Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* \|	Merge branch 'for_linus' of ↵	Linus Torvalds	2010-01-04	5	-48/+77
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Calculate metadata requirements more accurately ext4: Fix accounting of reserved metadata blocks
\| * \|	ext4: Calculate metadata requirements more accurately	Theodore Ts'o	2010-01-01	5	-44/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the past, ext4_calc_metadata_amount(), and its sub-functions ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount() badly over-estimated the number of metadata blocks that might be required for delayed allocation blocks. This didn't matter as much when functions which managed the reserved metadata blocks were more aggressive about dropping reserved metadata blocks as delayed allocation blocks were written, but unfortunately they were too aggressive. This was fixed in commit 0637c6f, but as a result the over-estimation by ext4_calc_metadata_amount() would lead to reserving 2-3 times the number of pending delayed allocation blocks as potentially required metadata blocks. So if there are 1 megabytes of blocks which have been not yet been allocation, up to 3 megabytes of space would get reserved out of the user's quota and from the file system free space pool until all of the inode's data blocks have been allocated. This commit addresses this problem by much more accurately estimating the number of metadata blocks that will be required. It will still somewhat over-estimate the number of blocks needed, since it must make a worst case estimate not knowing which physical blocks will be needed, but it is much more accurate than before. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \|	ext4: Fix accounting of reserved metadata blocks	Theodore Ts'o	2010-01-01	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 0637c6f had a typo which caused the reserved metadata blocks to not be released correctly. Fix this. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* \| \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-01-04	4	-24/+30
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: update mailing list address nilfs2: Storage class should be before const qualifier nilfs2: trivial coding style fix
\| * \| \|	nilfs2: Storage class should be before const qualifier	Tobias Klauser	2009-12-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The C99 specification states in section 6.11.5: The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| * \| \|	nilfs2: trivial coding style fix	Jiro SEKIBA	2009-12-25	3	-23/+29
\| \| \|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a trivial style fix patch to mend errors/warnings reported by "checkpatch.pl --file". Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* \| \|	Merge branch 'reiserfs/kill-bkl' of ↵	Linus Torvalds	2010-01-02	6	-15/+53
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: reiserfs: Safely acquire i_mutex from xattr_rmdir reiserfs: Safely acquire i_mutex from reiserfs_for_each_xattr reiserfs: Fix journal mutex <-> inode mutex lock inversion reiserfs: Fix unwanted recursive reiserfs lock in reiserfs_unlink() reiserfs: Relax lock before open xattr dir in reiserfs_xattr_set_handle() reiserfs: Relax reiserfs lock while freeing the journal reiserfs: Fix reiserfs lock <-> i_mutex dependency inversion on xattr reiserfs: Warn on lock relax if taken recursively reiserfs: Fix reiserfs lock <-> i_xattr_sem dependency inversion reiserfs: Fix remaining in-reclaim-fs <-> reclaim-fs-on locking inversion reiserfs: Fix reiserfs lock <-> inode mutex dependency inversion reiserfs: Fix reiserfs lock and journal lock inversion dependency reiserfs: Fix possible recursive lock
\| * \| \|	reiserfs: Safely acquire i_mutex from xattr_rmdir	Frederic Weisbecker	2010-01-02	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Relax the reiserfs lock before taking the inode mutex from xattr_rmdir() to avoid the usual reiserfs lock <-> inode mutex bad dependency. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
\| * \| \|	reiserfs: Safely acquire i_mutex from reiserfs_for_each_xattr	Frederic Weisbecker	2010-01-02	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Relax the reiserfs lock before taking the inode mutex from reiserfs_for_each_xattr() to avoid the usual bad dependencies: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-atom #179 ------------------------------------------------------- rm/3242 is trying to acquire lock: (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c11428ef>] reiserfs_for_each_xattr+0x23f/0x290 but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1143389>] reiserfs_write_lock+0x29/0x40 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c105ea7f>] __lock_acquire+0x11ff/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c1401aab>] mutex_lock_nested+0x5b/0x340 [<c1143339>] reiserfs_write_lock_once+0x29/0x50 [<c1117022>] reiserfs_lookup+0x62/0x140 [<c10bd85f>] __lookup_hash+0xef/0x110 [<c10bf21d>] lookup_one_len+0x8d/0xc0 [<c1141e3a>] open_xa_dir+0xea/0x1b0 [<c1142720>] reiserfs_for_each_xattr+0x70/0x290 [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60 [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150 [<c10c9c32>] generic_delete_inode+0xa2/0x170 [<c10c9d4f>] generic_drop_inode+0x4f/0x70 [<c10c8b07>] iput+0x47/0x50 [<c10c0965>] do_unlinkat+0xd5/0x160 [<c10c0b13>] sys_unlinkat+0x23/0x40 [<c1002ec4>] sysenter_do_call+0x12/0x32 -> #0 (&sb->s_type->i_mutex_key#4/3){+.+.+.}: [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c1401aab>] mutex_lock_nested+0x5b/0x340 [<c11428ef>] reiserfs_for_each_xattr+0x23f/0x290 [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60 [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150 [<c10c9c32>] generic_delete_inode+0xa2/0x170 [<c10c9d4f>] generic_drop_inode+0x4f/0x70 [<c10c8b07>] iput+0x47/0x50 [<c10c0965>] do_unlinkat+0xd5/0x160 [<c10c0b13>] sys_unlinkat+0x23/0x40 [<c1002ec4>] sysenter_do_call+0x12/0x32 other info that might help us debug this: 1 lock held by rm/3242: #0: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1143389>] reiserfs_write_lock+0x29/0x40 stack backtrace: Pid: 3242, comm: rm Not tainted 2.6.32-atom #179 Call Trace: [<c13ffa13>] ? printk+0x18/0x1a [<c105d33a>] print_circular_bug+0xca/0xd0 [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c105c932>] ? mark_held_locks+0x62/0x80 [<c105cc3b>] ? trace_hardirqs_on+0xb/0x10 [<c1401098>] ? mutex_unlock+0x8/0x10 [<c105f2c8>] lock_acquire+0x68/0x90 [<c11428ef>] ? reiserfs_for_each_xattr+0x23f/0x290 [<c11428ef>] ? reiserfs_for_each_xattr+0x23f/0x290 [<c1401aab>] mutex_lock_nested+0x5b/0x340 [<c11428ef>] ? reiserfs_for_each_xattr+0x23f/0x290 [<c11428ef>] reiserfs_for_each_xattr+0x23f/0x290 [<c1143180>] ? delete_one_xattr+0x0/0x100 [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60 [<c1143339>] ? reiserfs_write_lock_once+0x29/0x50 [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150 [<c11b0d4f>] ? _atomic_dec_and_lock+0x4f/0x70 [<c111e990>] ? reiserfs_delete_inode+0x0/0x150 [<c10c9c32>] generic_delete_inode+0xa2/0x170 [<c10c9d4f>] generic_drop_inode+0x4f/0x70 [<c10c8b07>] iput+0x47/0x50 [<c10c0965>] do_unlinkat+0xd5/0x160 [<c1401098>] ? mutex_unlock+0x8/0x10 [<c10c3e0d>] ? vfs_readdir+0x7d/0xb0 [<c10c3af0>] ? filldir64+0x0/0xf0 [<c1002ef3>] ? sysenter_exit+0xf/0x16 [<c105cbe4>] ? trace_hardirqs_on_caller+0x124/0x170 [<c10c0b13>] sys_unlinkat+0x23/0x40 [<c1002ec4>] sysenter_do_call+0x12/0x32 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
\| * \| \|	reiserfs: Fix journal mutex <-> inode mutex lock inversion	Frederic Weisbecker	2010-01-02	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to relax the reiserfs lock before locking the inode mutex from xattr_unlink(), otherwise we'll face the usual bad dependencies: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-atom #178 ------------------------------------------------------- rm/3202 is trying to acquire lock: (&journal->j_mutex){+.+...}, at: [<c113c234>] do_journal_begin_r+0x94/0x360 but task is already holding lock: (&sb->s_type->i_mutex_key#4/2){+.+...}, at: [<c1142a67>] xattr_unlink+0x57/0xb0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sb->s_type->i_mutex_key#4/2){+.+...}: [<c105ea7f>] __lock_acquire+0x11ff/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c1401a7b>] mutex_lock_nested+0x5b/0x340 [<c1142a67>] xattr_unlink+0x57/0xb0 [<c1143179>] delete_one_xattr+0x29/0x100 [<c11427bb>] reiserfs_for_each_xattr+0x10b/0x290 [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60 [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150 [<c10c9c32>] generic_delete_inode+0xa2/0x170 [<c10c9d4f>] generic_drop_inode+0x4f/0x70 [<c10c8b07>] iput+0x47/0x50 [<c10c0965>] do_unlinkat+0xd5/0x160 [<c10c0b13>] sys_unlinkat+0x23/0x40 [<c1002ec4>] sysenter_do_call+0x12/0x32 -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c105ea7f>] __lock_acquire+0x11ff/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c1401a7b>] mutex_lock_nested+0x5b/0x340 [<c1143359>] reiserfs_write_lock+0x29/0x40 [<c113c23c>] do_journal_begin_r+0x9c/0x360 [<c113c680>] journal_begin+0x80/0x130 [<c1127363>] reiserfs_remount+0x223/0x4e0 [<c10b6dd6>] do_remount_sb+0xa6/0x140 [<c10ce6a0>] do_mount+0x560/0x750 [<c10ce914>] sys_mount+0x84/0xb0 [<c1002ec4>] sysenter_do_call+0x12/0x32 -> #0 (&journal->j_mutex){+.+...}: [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c1401a7b>] mutex_lock_nested+0x5b/0x340 [<c113c234>] do_journal_begin_r+0x94/0x360 [<c113c680>] journal_begin+0x80/0x130 [<c1116d63>] reiserfs_unlink+0x83/0x2e0 [<c1142a74>] xattr_unlink+0x64/0xb0 [<c1143179>] delete_one_xattr+0x29/0x100 [<c11427bb>] reiserfs_for_each_xattr+0x10b/0x290 [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60 [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150 [<c10c9c32>] generic_delete_inode+0xa2/0x170 [<c10c9d4f>] generic_drop_inode+0x4f/0x70 [<c10c8b07>] iput+0x47/0x50 [<c10c0965>] do_unlinkat+0xd5/0x160 [<c10c0b13>] sys_unlinkat+0x23/0x40 [<c1002ec4>] sysenter_do_call+0x12/0x32 other info that might help us debug this: 2 locks held by rm/3202: #0: (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c114274b>] reiserfs_for_each_xattr+0x9b/0x290 #1: (&sb->s_type->i_mutex_key#4/2){+.+...}, at: [<c1142a67>] xattr_unlink+0x57/0xb0 stack backtrace: Pid: 3202, comm: rm Not tainted 2.6.32-atom #178 Call Trace: [<c13ff9e3>] ? printk+0x18/0x1a [<c105d33a>] print_circular_bug+0xca/0xd0 [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c1142a67>] ? xattr_unlink+0x57/0xb0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c113c234>] ? do_journal_begin_r+0x94/0x360 [<c113c234>] ? do_journal_begin_r+0x94/0x360 [<c1401a7b>] mutex_lock_nested+0x5b/0x340 [<c113c234>] ? do_journal_begin_r+0x94/0x360 [<c113c234>] do_journal_begin_r+0x94/0x360 [<c10411b6>] ? run_timer_softirq+0x1a6/0x220 [<c103cb00>] ? __do_softirq+0x50/0x140 [<c113c680>] journal_begin+0x80/0x130 [<c103cba2>] ? __do_softirq+0xf2/0x140 [<c104f72f>] ? hrtimer_interrupt+0xdf/0x220 [<c1116d63>] reiserfs_unlink+0x83/0x2e0 [<c105c932>] ? mark_held_locks+0x62/0x80 [<c11b8d08>] ? trace_hardirqs_on_thunk+0xc/0x10 [<c1002fd8>] ? restore_all_notrace+0x0/0x18 [<c1142a67>] ? xattr_unlink+0x57/0xb0 [<c1142a74>] xattr_unlink+0x64/0xb0 [<c1143179>] delete_one_xattr+0x29/0x100 [<c11427bb>] reiserfs_for_each_xattr+0x10b/0x290 [<c1143150>] ? delete_one_xattr+0x0/0x100 [<c1401cb9>] ? mutex_lock_nested+0x299/0x340 [<c11429ba>] reiserfs_delete_xattrs+0x1a/0x60 [<c1143309>] ? reiserfs_write_lock_once+0x29/0x50 [<c111ea2f>] reiserfs_delete_inode+0x9f/0x150 [<c11b0d1f>] ? _atomic_dec_and_lock+0x4f/0x70 [<c111e990>] ? reiserfs_delete_inode+0x0/0x150 [<c10c9c32>] generic_delete_inode+0xa2/0x170 [<c10c9d4f>] generic_drop_inode+0x4f/0x70 [<c10c8b07>] iput+0x47/0x50 [<c10c0965>] do_unlinkat+0xd5/0x160 [<c1401068>] ? mutex_unlock+0x8/0x10 [<c10c3e0d>] ? vfs_readdir+0x7d/0xb0 [<c10c3af0>] ? filldir64+0x0/0xf0 [<c1002ef3>] ? sysenter_exit+0xf/0x16 [<c105cbe4>] ? trace_hardirqs_on_caller+0x124/0x170 [<c10c0b13>] sys_unlinkat+0x23/0x40 [<c1002ec4>] sysenter_do_call+0x12/0x32 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
\| * \| \|	reiserfs: Fix unwanted recursive reiserfs lock in reiserfs_unlink()	Frederic Weisbecker	2010-01-02	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	reiserfs_unlink() may or may not be called under the reiserfs lock. But it also takes the reiserfs lock and can then acquire it recursively which leads to do_journal_begin_r() that fails to relax the reiserfs lock before grabbing the journal mutex, creating an unexpected lock inversion. We need to ensure reiserfs_unlink() won't get the reiserfs lock recursively using reiserfs_write_lock_once(). This fixes the following warning that precedes a lock inversion report (reiserfs lock <-> journal mutex). ------------[ cut here ]------------ WARNING: at fs/reiserfs/lock.c:95 reiserfs_lock_check_recursive+0x3a/0x50() Hardware name: MS-7418 Unwanted recursive reiserfs lock! Pid: 3208, comm: dbench Not tainted 2.6.32-atom #177 Call Trace: [<c114327a>] ? reiserfs_lock_check_recursive+0x3a/0x50 [<c114327a>] ? reiserfs_lock_check_recursive+0x3a/0x50 [<c10373a7>] warn_slowpath_common+0x67/0xc0 [<c114327a>] ? reiserfs_lock_check_recursive+0x3a/0x50 [<c1037446>] warn_slowpath_fmt+0x26/0x30 [<c114327a>] reiserfs_lock_check_recursive+0x3a/0x50 [<c113c213>] do_journal_begin_r+0x83/0x360 [<c105eb16>] ? __lock_acquire+0x1296/0x19e0 [<c1142a57>] ? xattr_unlink+0x57/0xb0 [<c113c670>] journal_begin+0x80/0x130 [<c1116d5d>] reiserfs_unlink+0x7d/0x2d0 [<c1142a57>] ? xattr_unlink+0x57/0xb0 [<c1142a57>] ? xattr_unlink+0x57/0xb0 [<c1142a57>] ? xattr_unlink+0x57/0xb0 [<c1142a64>] xattr_unlink+0x64/0xb0 [<c1143169>] delete_one_xattr+0x29/0x100 [<c11427ab>] reiserfs_for_each_xattr+0x10b/0x290 [<c1143140>] ? delete_one_xattr+0x0/0x100 [<c1401ca9>] ? mutex_lock_nested+0x299/0x340 [<c11429aa>] reiserfs_delete_xattrs+0x1a/0x60 [<c11432f9>] ? reiserfs_write_lock_once+0x29/0x50 [<c111ea1f>] reiserfs_delete_inode+0x9f/0x150 [<c11b0d0f>] ? _atomic_dec_and_lock+0x4f/0x70 [<c111e980>] ? reiserfs_delete_inode+0x0/0x150 [<c10c9c32>] generic_delete_inode+0xa2/0x170 [<c10c9d4f>] generic_drop_inode+0x4f/0x70 [<c10c8b07>] iput+0x47/0x50 [<c10c0965>] do_unlinkat+0xd5/0x160 [<c10505c6>] ? up_read+0x16/0x30 [<c1022ab7>] ? do_page_fault+0x187/0x330 [<c1002fd8>] ? restore_all_notrace+0x0/0x18 [<c1022930>] ? do_page_fault+0x0/0x330 [<c105cbe4>] ? trace_hardirqs_on_caller+0x124/0x170 [<c10c0a00>] sys_unlink+0x10/0x20 [<c1002ec4>] sysenter_do_call+0x12/0x32 ---[ end trace 2e35d71a6cc69d0c ]--- Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
\| * \| \|	reiserfs: Relax lock before open xattr dir in reiserfs_xattr_set_handle()	Frederic Weisbecker	2010-01-02	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We call xattr_lookup() from reiserfs_xattr_get(). We then hold the reiserfs lock when we grab the i_mutex. But later, we may relax the reiserfs lock, creating dependency inversion between both locks. The lookups and creation jobs ar already protected by the inode mutex, so we can safely relax the reiserfs lock, dropping the unwanted reiserfs lock -> i_mutex dependency, as shown in the following lockdep report: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-atom #173 ------------------------------------------------------- cp/3204 is trying to acquire lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11432b9>] reiserfs_write_lock_once+0x29/0x50 but task is already holding lock: (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c1141e18>] open_xa_dir+0xd8/0x1b0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sb->s_type->i_mutex_key#4/3){+.+.+.}: [<c105ea7f>] __lock_acquire+0x11ff/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c1401a2b>] mutex_lock_nested+0x5b/0x340 [<c1141d83>] open_xa_dir+0x43/0x1b0 [<c1142722>] reiserfs_for_each_xattr+0x62/0x260 [<c114299a>] reiserfs_delete_xattrs+0x1a/0x60 [<c111ea1f>] reiserfs_delete_inode+0x9f/0x150 [<c10c9c32>] generic_delete_inode+0xa2/0x170 [<c10c9d4f>] generic_drop_inode+0x4f/0x70 [<c10c8b07>] iput+0x47/0x50 [<c10c0965>] do_unlinkat+0xd5/0x160 [<c10c0a00>] sys_unlink+0x10/0x20 [<c1002ec4>] sysenter_do_call+0x12/0x32 -> #0 (&REISERFS_SB(s)->lock){+.+.+.}: [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c1401a2b>] mutex_lock_nested+0x5b/0x340 [<c11432b9>] reiserfs_write_lock_once+0x29/0x50 [<c1117012>] reiserfs_lookup+0x62/0x140 [<c10bd85f>] __lookup_hash+0xef/0x110 [<c10bf21d>] lookup_one_len+0x8d/0xc0 [<c1141e2a>] open_xa_dir+0xea/0x1b0 [<c1141fe5>] xattr_lookup+0x15/0x160 [<c1142476>] reiserfs_xattr_get+0x56/0x2a0 [<c1144042>] reiserfs_get_acl+0xa2/0x360 [<c114461a>] reiserfs_cache_default_acl+0x3a/0x160 [<c111789c>] reiserfs_mkdir+0x6c/0x2c0 [<c10bea96>] vfs_mkdir+0xd6/0x180 [<c10c0c10>] sys_mkdirat+0xc0/0xd0 [<c10c0c40>] sys_mkdir+0x20/0x30 [<c1002ec4>] sysenter_do_call+0x12/0x32 other info that might help us debug this: 2 locks held by cp/3204: #0: (&sb->s_type->i_mutex_key#4/1){+.+.+.}, at: [<c10bd8d6>] lookup_create+0x26/0xa0 #1: (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c1141e18>] open_xa_dir+0xd8/0x1b0 stack backtrace: Pid: 3204, comm: cp Not tainted 2.6.32-atom #173 Call Trace: [<c13ff993>] ? printk+0x18/0x1a [<c105d33a>] print_circular_bug+0xca/0xd0 [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c105d3aa>] ? check_usage+0x6a/0x460 [<c105f2c8>] lock_acquire+0x68/0x90 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50 [<c1401a2b>] mutex_lock_nested+0x5b/0x340 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50 [<c11432b9>] reiserfs_write_lock_once+0x29/0x50 [<c1117012>] reiserfs_lookup+0x62/0x140 [<c105ccca>] ? debug_check_no_locks_freed+0x8a/0x140 [<c105cbe4>] ? trace_hardirqs_on_caller+0x124/0x170 [<c10bd85f>] __lookup_hash+0xef/0x110 [<c10bf21d>] lookup_one_len+0x8d/0xc0 [<c1141e2a>] open_xa_dir+0xea/0x1b0 [<c1141fe5>] xattr_lookup+0x15/0x160 [<c1142476>] reiserfs_xattr_get+0x56/0x2a0 [<c1144042>] reiserfs_get_acl+0xa2/0x360 [<c10ca2e7>] ? new_inode+0x27/0xa0 [<c114461a>] reiserfs_cache_default_acl+0x3a/0x160 [<c1402eb7>] ? _spin_unlock+0x27/0x40 [<c111789c>] reiserfs_mkdir+0x6c/0x2c0 [<c10c7cb8>] ? __d_lookup+0x108/0x190 [<c105c932>] ? mark_held_locks+0x62/0x80 [<c1401c8d>] ? mutex_lock_nested+0x2bd/0x340 [<c10bd17a>] ? generic_permission+0x1a/0xa0 [<c11788fe>] ? security_inode_permission+0x1e/0x20 [<c10bea96>] vfs_mkdir+0xd6/0x180 [<c10c0c10>] sys_mkdirat+0xc0/0xd0 [<c10505c6>] ? up_read+0x16/0x30 [<c1002fd8>] ? restore_all_notrace+0x0/0x18 [<c10c0c40>] sys_mkdir+0x20/0x30 [<c1002ec4>] sysenter_do_call+0x12/0x32 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
\| * \| \|	reiserfs: Relax reiserfs lock while freeing the journal	Frederic Weisbecker	2010-01-02	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Keeping the reiserfs lock while freeing the journal on umount path triggers a lock inversion between bdev->bd_mutex and the reiserfs lock. We don't need the reiserfs lock at this stage. The filesystem is not usable anymore, and there are no more pending commits, everything got flushed (even this operation was done in parallel and didn't required the reiserfs lock from the current process). This fixes the following lockdep report: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-atom #172 ------------------------------------------------------- umount/3904 is trying to acquire lock: (&bdev->bd_mutex){+.+.+.}, at: [<c10de2c2>] __blkdev_put+0x22/0x160 but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1143279>] reiserfs_write_lock+0x29/0x40 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&REISERFS_SB(s)->lock){+.+.+.}: [<c105ea7f>] __lock_acquire+0x11ff/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c140199b>] mutex_lock_nested+0x5b/0x340 [<c1143229>] reiserfs_write_lock_once+0x29/0x50 [<c111c485>] reiserfs_get_block+0x85/0x1620 [<c10e1040>] do_mpage_readpage+0x1f0/0x6d0 [<c10e1640>] mpage_readpages+0xc0/0x100 [<c1119b89>] reiserfs_readpages+0x19/0x20 [<c108f1ec>] __do_page_cache_readahead+0x1bc/0x260 [<c108f2b8>] ra_submit+0x28/0x40 [<c1087e3e>] filemap_fault+0x40e/0x420 [<c109b5fd>] __do_fault+0x3d/0x430 [<c109d47e>] handle_mm_fault+0x12e/0x790 [<c1022a65>] do_page_fault+0x135/0x330 [<c1403663>] error_code+0x6b/0x70 [<c10ef9ca>] load_elf_binary+0x82a/0x1a10 [<c10ba130>] search_binary_handler+0x90/0x1d0 [<c10bb70f>] do_execve+0x1df/0x250 [<c1001746>] sys_execve+0x46/0x70 [<c1002fa5>] syscall_call+0x7/0xb -> #2 (&mm->mmap_sem){++++++}: [<c105ea7f>] __lock_acquire+0x11ff/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c109b1ab>] might_fault+0x8b/0xb0 [<c11b8f52>] copy_to_user+0x32/0x70 [<c10c3b94>] filldir64+0xa4/0xf0 [<c1109116>] sysfs_readdir+0x116/0x210 [<c10c3e1d>] vfs_readdir+0x8d/0xb0 [<c10c3ea9>] sys_getdents64+0x69/0xb0 [<c1002ec4>] sysenter_do_call+0x12/0x32 -> #1 (sysfs_mutex){+.+.+.}: [<c105ea7f>] __lock_acquire+0x11ff/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c140199b>] mutex_lock_nested+0x5b/0x340 [<c110951c>] sysfs_addrm_start+0x2c/0xb0 [<c1109aa0>] create_dir+0x40/0x90 [<c1109b1b>] sysfs_create_dir+0x2b/0x50 [<c11b2352>] kobject_add_internal+0xc2/0x1b0 [<c11b2531>] kobject_add_varg+0x31/0x50 [<c11b25ac>] kobject_add+0x2c/0x60 [<c1258294>] device_add+0x94/0x560 [<c11036ea>] add_partition+0x18a/0x2a0 [<c110418a>] rescan_partitions+0x33a/0x450 [<c10de5bf>] __blkdev_get+0x12f/0x2d0 [<c10de76a>] blkdev_get+0xa/0x10 [<c11034b8>] register_disk+0x108/0x130 [<c11a87a9>] add_disk+0xd9/0x130 [<c12998e5>] sd_probe_async+0x105/0x1d0 [<c10528af>] async_thread+0xcf/0x230 [<c104bfd4>] kthread+0x74/0x80 [<c1003aab>] kernel_thread_helper+0x7/0x3c -> #0 (&bdev->bd_mutex){+.+.+.}: [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c140199b>] mutex_lock_nested+0x5b/0x340 [<c10de2c2>] __blkdev_put+0x22/0x160 [<c10de40a>] blkdev_put+0xa/0x10 [<c113ce22>] free_journal_ram+0xd2/0x130 [<c113ea18>] do_journal_release+0x98/0x190 [<c113eb2a>] journal_release+0xa/0x10 [<c1128eb6>] reiserfs_put_super+0x36/0x130 [<c10b776f>] generic_shutdown_super+0x4f/0xe0 [<c10b7825>] kill_block_super+0x25/0x40 [<c11255df>] reiserfs_kill_sb+0x7f/0x90 [<c10b7f4a>] deactivate_super+0x7a/0x90 [<c10cccd8>] mntput_no_expire+0x98/0xd0 [<c10ccfcc>] sys_umount+0x4c/0x310 [<c10cd2a9>] sys_oldumount+0x19/0x20 [<c1002ec4>] sysenter_do_call+0x12/0x32 other info that might help us debug this: 2 locks held by umount/3904: #0: (&type->s_umount_key#30){+++++.}, at: [<c10b7f45>] deactivate_super+0x75/0x90 #1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1143279>] reiserfs_write_lock+0x29/0x40 stack backtrace: Pid: 3904, comm: umount Not tainted 2.6.32-atom #172 Call Trace: [<c13ff903>] ? printk+0x18/0x1a [<c105d33a>] print_circular_bug+0xca/0xd0 [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c108b66f>] ? free_pcppages_bulk+0x1f/0x250 [<c105f2c8>] lock_acquire+0x68/0x90 [<c10de2c2>] ? __blkdev_put+0x22/0x160 [<c10de2c2>] ? __blkdev_put+0x22/0x160 [<c140199b>] mutex_lock_nested+0x5b/0x340 [<c10de2c2>] ? __blkdev_put+0x22/0x160 [<c105c932>] ? mark_held_locks+0x62/0x80 [<c10afe12>] ? kfree+0x92/0xd0 [<c10de2c2>] __blkdev_put+0x22/0x160 [<c105cc3b>] ? trace_hardirqs_on+0xb/0x10 [<c10de40a>] blkdev_put+0xa/0x10 [<c113ce22>] free_journal_ram+0xd2/0x130 [<c113ea18>] do_journal_release+0x98/0x190 [<c113eb2a>] journal_release+0xa/0x10 [<c1128eb6>] reiserfs_put_super+0x36/0x130 [<c1050596>] ? up_write+0x16/0x30 [<c10b776f>] generic_shutdown_super+0x4f/0xe0 [<c10b7825>] kill_block_super+0x25/0x40 [<c10f41e0>] ? vfs_quota_off+0x0/0x20 [<c11255df>] reiserfs_kill_sb+0x7f/0x90 [<c10b7f4a>] deactivate_super+0x7a/0x90 [<c10cccd8>] mntput_no_expire+0x98/0xd0 [<c10ccfcc>] sys_umount+0x4c/0x310 [<c10cd2a9>] sys_oldumount+0x19/0x20 [<c1002ec4>] sysenter_do_call+0x12/0x32 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
\| * \| \|	reiserfs: Fix reiserfs lock <-> i_mutex dependency inversion on xattr	Frederic Weisbecker	2010-01-02	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While deleting the xattrs of an inode, we hold the reiserfs lock and grab the inode->i_mutex of the targeted inode and the root private xattr directory. Later on, we may relax the reiserfs lock for various reasons, this creates inverted dependencies. We can remove the reiserfs lock -> i_mutex dependency by relaxing the former before calling open_xa_dir(). This is fine because the lookup and creation of xattr private directories done in open_xa_dir() are covered by the targeted inode mutexes. And deeper operations in the tree are still done under the write lock. This fixes the following lockdep report: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-atom #173 ------------------------------------------------------- cp/3204 is trying to acquire lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11432b9>] reiserfs_write_lock_once+0x29/0x50 but task is already holding lock: (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c1141e18>] open_xa_dir+0xd8/0x1b0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sb->s_type->i_mutex_key#4/3){+.+.+.}: [<c105ea7f>] __lock_acquire+0x11ff/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c1401a2b>] mutex_lock_nested+0x5b/0x340 [<c1141d83>] open_xa_dir+0x43/0x1b0 [<c1142722>] reiserfs_for_each_xattr+0x62/0x260 [<c114299a>] reiserfs_delete_xattrs+0x1a/0x60 [<c111ea1f>] reiserfs_delete_inode+0x9f/0x150 [<c10c9c32>] generic_delete_inode+0xa2/0x170 [<c10c9d4f>] generic_drop_inode+0x4f/0x70 [<c10c8b07>] iput+0x47/0x50 [<c10c0965>] do_unlinkat+0xd5/0x160 [<c10c0a00>] sys_unlink+0x10/0x20 [<c1002ec4>] sysenter_do_call+0x12/0x32 -> #0 (&REISERFS_SB(s)->lock){+.+.+.}: [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c1401a2b>] mutex_lock_nested+0x5b/0x340 [<c11432b9>] reiserfs_write_lock_once+0x29/0x50 [<c1117012>] reiserfs_lookup+0x62/0x140 [<c10bd85f>] __lookup_hash+0xef/0x110 [<c10bf21d>] lookup_one_len+0x8d/0xc0 [<c1141e2a>] open_xa_dir+0xea/0x1b0 [<c1141fe5>] xattr_lookup+0x15/0x160 [<c1142476>] reiserfs_xattr_get+0x56/0x2a0 [<c1144042>] reiserfs_get_acl+0xa2/0x360 [<c114461a>] reiserfs_cache_default_acl+0x3a/0x160 [<c111789c>] reiserfs_mkdir+0x6c/0x2c0 [<c10bea96>] vfs_mkdir+0xd6/0x180 [<c10c0c10>] sys_mkdirat+0xc0/0xd0 [<c10c0c40>] sys_mkdir+0x20/0x30 [<c1002ec4>] sysenter_do_call+0x12/0x32 other info that might help us debug this: 2 locks held by cp/3204: #0: (&sb->s_type->i_mutex_key#4/1){+.+.+.}, at: [<c10bd8d6>] lookup_create+0x26/0xa0 #1: (&sb->s_type->i_mutex_key#4/3){+.+.+.}, at: [<c1141e18>] open_xa_dir+0xd8/0x1b0 stack backtrace: Pid: 3204, comm: cp Not tainted 2.6.32-atom #173 Call Trace: [<c13ff993>] ? printk+0x18/0x1a [<c105d33a>] print_circular_bug+0xca/0xd0 [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c105d3aa>] ? check_usage+0x6a/0x460 [<c105f2c8>] lock_acquire+0x68/0x90 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50 [<c1401a2b>] mutex_lock_nested+0x5b/0x340 [<c11432b9>] ? reiserfs_write_lock_once+0x29/0x50 [<c11432b9>] reiserfs_write_lock_once+0x29/0x50 [<c1117012>] reiserfs_lookup+0x62/0x140 [<c105ccca>] ? debug_check_no_locks_freed+0x8a/0x140 [<c105cbe4>] ? trace_hardirqs_on_caller+0x124/0x170 [<c10bd85f>] __lookup_hash+0xef/0x110 [<c10bf21d>] lookup_one_len+0x8d/0xc0 [<c1141e2a>] open_xa_dir+0xea/0x1b0 [<c1141fe5>] xattr_lookup+0x15/0x160 [<c1142476>] reiserfs_xattr_get+0x56/0x2a0 [<c1144042>] reiserfs_get_acl+0xa2/0x360 [<c10ca2e7>] ? new_inode+0x27/0xa0 [<c114461a>] reiserfs_cache_default_acl+0x3a/0x160 [<c1402eb7>] ? _spin_unlock+0x27/0x40 [<c111789c>] reiserfs_mkdir+0x6c/0x2c0 [<c10c7cb8>] ? __d_lookup+0x108/0x190 [<c105c932>] ? mark_held_locks+0x62/0x80 [<c1401c8d>] ? mutex_lock_nested+0x2bd/0x340 [<c10bd17a>] ? generic_permission+0x1a/0xa0 [<c11788fe>] ? security_inode_permission+0x1e/0x20 [<c10bea96>] vfs_mkdir+0xd6/0x180 [<c10c0c10>] sys_mkdirat+0xc0/0xd0 [<c10505c6>] ? up_read+0x16/0x30 [<c1002fd8>] ? restore_all_notrace+0x0/0x18 [<c10c0c40>] sys_mkdir+0x20/0x30 [<c1002ec4>] sysenter_do_call+0x12/0x32 v2: Don't drop reiserfs_mutex_lock_nested_safe() as we'll still need it later Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
\| * \| \|	reiserfs: Warn on lock relax if taken recursively	Frederic Weisbecker	2010-01-02	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we relax the reiserfs lock to avoid creating unwanted dependencies against others locks while grabbing these, we want to ensure it has not been taken recursively, otherwise the lock won't be really relaxed. Only its depth will be decreased. The unwanted dependency would then actually happen. To prevent from that, add a reiserfs_lock_check_recursive() call in the places that need it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
\| * \| \|	reiserfs: Fix reiserfs lock <-> i_xattr_sem dependency inversion	Frederic Weisbecker	2010-01-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	i_xattr_sem depends on the reiserfs lock. But after we grab i_xattr_sem, we may relax/relock the reiserfs lock while waiting on a freezed filesystem, creating a dependency inversion between the two locks. In order to avoid the i_xattr_sem -> reiserfs lock dependency, let's create a reiserfs_down_read_safe() that acts like reiserfs_mutex_lock_safe(): relax the reiserfs lock while grabbing another lock to avoid undesired dependencies induced by the heivyweight reiserfs lock. This fixes the following warning: [ 990.005931] ======================================================= [ 990.012373] [ INFO: possible circular locking dependency detected ] [ 990.013233] 2.6.33-rc1 #1 [ 990.013233] ------------------------------------------------------- [ 990.013233] dbench/1891 is trying to acquire lock: [ 990.013233] (&REISERFS_SB(s)->lock){+.+.+.}, at: [<ffffffff81159505>] reiserfs_write_lock+0x35/0x50 [ 990.013233] [ 990.013233] but task is already holding lock: [ 990.013233] (&REISERFS_I(inode)->i_xattr_sem){+.+.+.}, at: [<ffffffff8115899a>] reiserfs_xattr_set_handle+0x8a/0x470 [ 990.013233] [ 990.013233] which lock already depends on the new lock. [ 990.013233] [ 990.013233] [ 990.013233] the existing dependency chain (in reverse order) is: [ 990.013233] [ 990.013233] -> #1 (&REISERFS_I(inode)->i_xattr_sem){+.+.+.}: [ 990.013233] [<ffffffff81063afc>] __lock_acquire+0xf9c/0x1560 [ 990.013233] [<ffffffff8106414f>] lock_acquire+0x8f/0xb0 [ 990.013233] [<ffffffff814ac194>] down_write+0x44/0x80 [ 990.013233] [<ffffffff8115899a>] reiserfs_xattr_set_handle+0x8a/0x470 [ 990.013233] [<ffffffff81158e30>] reiserfs_xattr_set+0xb0/0x150 [ 990.013233] [<ffffffff8115a6aa>] user_set+0x8a/0x90 [ 990.013233] [<ffffffff8115901a>] reiserfs_setxattr+0xaa/0xb0 [ 990.013233] [<ffffffff810e2596>] __vfs_setxattr_noperm+0x36/0xa0 [ 990.013233] [<ffffffff810e26bc>] vfs_setxattr+0xbc/0xc0 [ 990.013233] [<ffffffff810e2780>] setxattr+0xc0/0x150 [ 990.013233] [<ffffffff810e289d>] sys_fsetxattr+0x8d/0xa0 [ 990.013233] [<ffffffff81002dab>] system_call_fastpath+0x16/0x1b [ 990.013233] [ 990.013233] -> #0 (&REISERFS_SB(s)->lock){+.+.+.}: [ 990.013233] [<ffffffff81063e30>] __lock_acquire+0x12d0/0x1560 [ 990.013233] [<ffffffff8106414f>] lock_acquire+0x8f/0xb0 [ 990.013233] [<ffffffff814aba77>] __mutex_lock_common+0x47/0x3b0 [ 990.013233] [<ffffffff814abebe>] mutex_lock_nested+0x3e/0x50 [ 990.013233] [<ffffffff81159505>] reiserfs_write_lock+0x35/0x50 [ 990.013233] [<ffffffff811340e5>] reiserfs_prepare_write+0x45/0x180 [ 990.013233] [<ffffffff81158bb6>] reiserfs_xattr_set_handle+0x2a6/0x470 [ 990.013233] [<ffffffff81158e30>] reiserfs_xattr_set+0xb0/0x150 [ 990.013233] [<ffffffff8115a6aa>] user_set+0x8a/0x90 [ 990.013233] [<ffffffff8115901a>] reiserfs_setxattr+0xaa/0xb0 [ 990.013233] [<ffffffff810e2596>] __vfs_setxattr_noperm+0x36/0xa0 [ 990.013233] [<ffffffff810e26bc>] vfs_setxattr+0xbc/0xc0 [ 990.013233] [<ffffffff810e2780>] setxattr+0xc0/0x150 [ 990.013233] [<ffffffff810e289d>] sys_fsetxattr+0x8d/0xa0 [ 990.013233] [<ffffffff81002dab>] system_call_fastpath+0x16/0x1b [ 990.013233] [ 990.013233] other info that might help us debug this: [ 990.013233] [ 990.013233] 2 locks held by dbench/1891: [ 990.013233] #0: (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff810e2678>] vfs_setxattr+0x78/0xc0 [ 990.013233] #1: (&REISERFS_I(inode)->i_xattr_sem){+.+.+.}, at: [<ffffffff8115899a>] reiserfs_xattr_set_handle+0x8a/0x470 [ 990.013233] [ 990.013233] stack backtrace: [ 990.013233] Pid: 1891, comm: dbench Not tainted 2.6.33-rc1 #1 [ 990.013233] Call Trace: [ 990.013233] [<ffffffff81061639>] print_circular_bug+0xe9/0xf0 [ 990.013233] [<ffffffff81063e30>] __lock_acquire+0x12d0/0x1560 [ 990.013233] [<ffffffff8115899a>] ? reiserfs_xattr_set_handle+0x8a/0x470 [ 990.013233] [<ffffffff8106414f>] lock_acquire+0x8f/0xb0 [ 990.013233] [<ffffffff81159505>] ? reiserfs_write_lock+0x35/0x50 [ 990.013233] [<ffffffff8115899a>] ? reiserfs_xattr_set_handle+0x8a/0x470 [ 990.013233] [<ffffffff814aba77>] __mutex_lock_common+0x47/0x3b0 [ 990.013233] [<ffffffff81159505>] ? reiserfs_write_lock+0x35/0x50 [ 990.013233] [<ffffffff81159505>] ? reiserfs_write_lock+0x35/0x50 [ 990.013233] [<ffffffff81062592>] ? mark_held_locks+0x72/0xa0 [ 990.013233] [<ffffffff814ab81d>] ? __mutex_unlock_slowpath+0xbd/0x140 [ 990.013233] [<ffffffff810628ad>] ? trace_hardirqs_on_caller+0x14d/0x1a0 [ 990.013233] [<ffffffff814abebe>] mutex_lock_nested+0x3e/0x50 [ 990.013233] [<ffffffff81159505>] reiserfs_write_lock+0x35/0x50 [ 990.013233] [<ffffffff811340e5>] reiserfs_prepare_write+0x45/0x180 [ 990.013233] [<ffffffff81158bb6>] reiserfs_xattr_set_handle+0x2a6/0x470 [ 990.013233] [<ffffffff81158e30>] reiserfs_xattr_set+0xb0/0x150 [ 990.013233] [<ffffffff814abcb4>] ? __mutex_lock_common+0x284/0x3b0 [ 990.013233] [<ffffffff8115a6aa>] user_set+0x8a/0x90 [ 990.013233] [<ffffffff8115901a>] reiserfs_setxattr+0xaa/0xb0 [ 990.013233] [<ffffffff810e2596>] __vfs_setxattr_noperm+0x36/0xa0 [ 990.013233] [<ffffffff810e26bc>] vfs_setxattr+0xbc/0xc0 [ 990.013233] [<ffffffff810e2780>] setxattr+0xc0/0x150 [ 990.013233] [<ffffffff81056018>] ? sched_clock_cpu+0xb8/0x100 [ 990.013233] [<ffffffff8105eded>] ? trace_hardirqs_off+0xd/0x10 [ 990.013233] [<ffffffff810560a3>] ? cpu_clock+0x43/0x50 [ 990.013233] [<ffffffff810c6820>] ? fget+0xb0/0x110 [ 990.013233] [<ffffffff810c6770>] ? fget+0x0/0x110 [ 990.013233] [<ffffffff81002ddc>] ? sysret_check+0x27/0x62 [ 990.013233] [<ffffffff810e289d>] sys_fsetxattr+0x8d/0xa0 [ 990.013233] [<ffffffff81002dab>] system_call_fastpath+0x16/0x1b Reported-and-tested-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
\| * \| \|	reiserfs: Fix remaining in-reclaim-fs <-> reclaim-fs-on locking inversion	Frederic Weisbecker	2009-12-29	1	-3/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 500f5a0bf5f0624dae34307010e240ec090e4cde (reiserfs: Fix possible recursive lock) fixed a vmalloc under reiserfs lock that triggered a lockdep warning because of a IN-FS-RECLAIM <-> RECLAIM-FS-ON locking dependency inversion. But this patch has ommitted another vmalloc call in the same path that allocates the journal. Relax the lock for this one too. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>
\| * \| \|	reiserfs: Fix reiserfs lock <-> inode mutex dependency inversion	Frederic Weisbecker	2009-12-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The reiserfs lock -> inode mutex dependency gets inverted when we relax the lock while walking to the tree. To fix this, use a specialized version of reiserfs_mutex_lock_safe that takes care of mutex subclasses. Then we can grab the inode mutex with I_MUTEX_XATTR subclass without any reiserfs lock dependency. This fixes the following report: [ INFO: possible circular locking dependency detected ] 2.6.32-06793-gf405425-dirty #2 ------------------------------------------------------- mv/18566 is trying to acquire lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1110708>] reiserfs_write_lock+0x28= /0x40 but task is already holding lock: (&sb->s_type->i_mutex_key#5/3){+.+.+.}, at: [<c111033c>] reiserfs_for_each_xattr+0x10c/0x380 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sb->s_type->i_mutex_key#5/3){+.+.+.}: [<c104f723>] validate_chain+0xa23/0xf70 [<c1050155>] __lock_acquire+0x4e5/0xa70 [<c105075a>] lock_acquire+0x7a/0xa0 [<c134c76f>] mutex_lock_nested+0x5f/0x2b0 [<c11102b4>] reiserfs_for_each_xattr+0x84/0x380 [<c1110615>] reiserfs_delete_xattrs+0x15/0x50 [<c10ef57f>] reiserfs_delete_inode+0x8f/0x140 [<c10a565c>] generic_delete_inode+0x9c/0x150 [<c10a574d>] generic_drop_inode+0x3d/0x60 [<c10a4667>] iput+0x47/0x50 [<c109cc0b>] do_unlinkat+0xdb/0x160 [<c109cca0>] sys_unlink+0x10/0x20 [<c1002c50>] sysenter_do_call+0x12/0x36 -> #0 (&REISERFS_SB(s)->lock){+.+.+.}: [<c104fc68>] validate_chain+0xf68/0xf70 [<c1050155>] __lock_acquire+0x4e5/0xa70 [<c105075a>] lock_acquire+0x7a/0xa0 [<c134c76f>] mutex_lock_nested+0x5f/0x2b0 [<c1110708>] reiserfs_write_lock+0x28/0x40 [<c1103d6b>] search_by_key+0x1f7b/0x21b0 [<c10e73ef>] search_by_entry_key+0x1f/0x3b0 [<c10e77f7>] reiserfs_find_entry+0x77/0x400 [<c10e81e5>] reiserfs_lookup+0x85/0x130 [<c109a144>] __lookup_hash+0xb4/0x110 [<c109b763>] lookup_one_len+0xb3/0x100 [<c1110350>] reiserfs_for_each_xattr+0x120/0x380 [<c1110615>] reiserfs_delete_xattrs+0x15/0x50 [<c10ef57f>] reiserfs_delete_inode+0x8f/0x140 [<c10a565c>] generic_delete_inode+0x9c/0x150 [<c10a574d>] generic_drop_inode+0x3d/0x60 [<c10a4667>] iput+0x47/0x50 [<c10a1c4f>] dentry_iput+0x6f/0xf0 [<c10a1d74>] d_kill+0x24/0x50 [<c10a396b>] dput+0x5b/0x120 [<c109ca89>] sys_renameat+0x1b9/0x230 [<c109cb28>] sys_rename+0x28/0x30 [<c1002c50>] sysenter_do_call+0x12/0x36 other info that might help us debug this: 2 locks held by mv/18566: #0: (&sb->s_type->i_mutex_key#5/1){+.+.+.}, at: [<c109b6ac>] lock_rename+0xcc/0xd0 #1: (&sb->s_type->i_mutex_key#5/3){+.+.+.}, at: [<c111033c>] reiserfs_for_each_xattr+0x10c/0x380 stack backtrace: Pid: 18566, comm: mv Tainted: G C 2.6.32-06793-gf405425-dirty #2 Call Trace: [<c134b252>] ? printk+0x18/0x1e [<c104e790>] print_circular_bug+0xc0/0xd0 [<c104fc68>] validate_chain+0xf68/0xf70 [<c104c8cb>] ? trace_hardirqs_off+0xb/0x10 [<c1050155>] __lock_acquire+0x4e5/0xa70 [<c105075a>] lock_acquire+0x7a/0xa0 [<c1110708>] ? reiserfs_write_lock+0x28/0x40 [<c134c76f>] mutex_lock_nested+0x5f/0x2b0 [<c1110708>] ? reiserfs_write_lock+0x28/0x40 [<c1110708>] ? reiserfs_write_lock+0x28/0x40 [<c134b60a>] ? schedule+0x27a/0x440 [<c1110708>] reiserfs_write_lock+0x28/0x40 [<c1103d6b>] search_by_key+0x1f7b/0x21b0 [<c1050176>] ? __lock_acquire+0x506/0xa70 [<c1051267>] ? lock_release_non_nested+0x1e7/0x340 [<c1110708>] ? reiserfs_write_lock+0x28/0x40 [<c104e354>] ? trace_hardirqs_on_caller+0x124/0x170 [<c104e3ab>] ? trace_hardirqs_on+0xb/0x10 [<c1042a55>] ? T.316+0x15/0x1a0 [<c1042d2d>] ? sched_clock_cpu+0x9d/0x100 [<c10e73ef>] search_by_entry_key+0x1f/0x3b0 [<c134bf2a>] ? __mutex_unlock_slowpath+0x9a/0x120 [<c104e354>] ? trace_hardirqs_on_caller+0x124/0x170 [<c10e77f7>] reiserfs_find_entry+0x77/0x400 [<c10e81e5>] reiserfs_lookup+0x85/0x130 [<c1042d2d>] ? sched_clock_cpu+0x9d/0x100 [<c109a144>] __lookup_hash+0xb4/0x110 [<c109b763>] lookup_one_len+0xb3/0x100 [<c1110350>] reiserfs_for_each_xattr+0x120/0x380 [<c110ffe0>] ? delete_one_xattr+0x0/0x1c0 [<c1003342>] ? math_error+0x22/0x150 [<c1110708>] ? reiserfs_write_lock+0x28/0x40 [<c1110615>] reiserfs_delete_xattrs+0x15/0x50 [<c1110708>] ? reiserfs_write_lock+0x28/0x40 [<c10ef57f>] reiserfs_delete_inode+0x8f/0x140 [<c10a561f>] ? generic_delete_inode+0x5f/0x150 [<c10ef4f0>] ? reiserfs_delete_inode+0x0/0x140 [<c10a565c>] generic_delete_inode+0x9c/0x150 [<c10a574d>] generic_drop_inode+0x3d/0x60 [<c10a4667>] iput+0x47/0x50 [<c10a1c4f>] dentry_iput+0x6f/0xf0 [<c10a1d74>] d_kill+0x24/0x50 [<c10a396b>] dput+0x5b/0x120 [<c109ca89>] sys_renameat+0x1b9/0x230 [<c1042d2d>] ? sched_clock_cpu+0x9d/0x100 [<c104c8cb>] ? trace_hardirqs_off+0xb/0x10 [<c1042dde>] ? cpu_clock+0x4e/0x60 [<c1350825>] ? do_page_fault+0x155/0x370 [<c1041816>] ? up_read+0x16/0x30 [<c1350825>] ? do_page_fault+0x155/0x370 [<c109cb28>] sys_rename+0x28/0x30 [<c1002c50>] sysenter_do_call+0x12/0x36 Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de>
\| * \| \|	reiserfs: Fix reiserfs lock and journal lock inversion dependency	Frederic Weisbecker	2009-12-14	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we were using the bkl, we didn't care about dependencies against other locks, but the mutex conversion created new ones, which is why we have reiserfs_mutex_lock_safe(), which unlocks the reiserfs lock before acquiring another mutex. But this trick actually fails if we have acquired the reiserfs lock recursively, as we try to unlock it to acquire the new mutex without inverted dependency, but we eventually only decrease its depth. This happens in the case of a nested inode creation/deletion. Say we have no space left on the device, we create an inode and tak the lock but fail to create its entry, then we release the inode using iput(), which calls reiserfs_delete_inode() that takes the reiserfs lock recursively. The path eventually ends up in journal_begin() where we try to take the journal safely but we fail because of the reiserfs lock recursion: [ INFO: possible circular locking dependency detected ] 2.6.32-06486-g053fe57 #2 ------------------------------------------------------- vi/23454 is trying to acquire lock: (&journal->j_mutex){+.+...}, at: [<c110dac4>] do_journal_begin_r+0x64/0x2f0 but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>] reiserfs_write_lock+0x28/0x40 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c104f8f3>] validate_chain+0xa23/0xf70 [<c1050325>] __lock_acquire+0x4e5/0xa70 [<c105092a>] lock_acquire+0x7a/0xa0 [<c134c78f>] mutex_lock_nested+0x5f/0x2b0 [<c11106a8>] reiserfs_write_lock+0x28/0x40 [<c110dacb>] do_journal_begin_r+0x6b/0x2f0 [<c110ddcf>] journal_begin+0x7f/0x120 [<c10f76c2>] reiserfs_remount+0x212/0x4d0 [<c1093997>] do_remount_sb+0x67/0x140 [<c10a9ca6>] do_mount+0x436/0x6b0 [<c10a9f86>] sys_mount+0x66/0xa0 [<c1002c50>] sysenter_do_call+0x12/0x36 -> #0 (&journal->j_mutex){+.+...}: [<c104fe38>] validate_chain+0xf68/0xf70 [<c1050325>] __lock_acquire+0x4e5/0xa70 [<c105092a>] lock_acquire+0x7a/0xa0 [<c134c78f>] mutex_lock_nested+0x5f/0x2b0 [<c110dac4>] do_journal_begin_r+0x64/0x2f0 [<c110ddcf>] journal_begin+0x7f/0x120 [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140 [<c10a55fc>] generic_delete_inode+0x9c/0x150 [<c10a56ed>] generic_drop_inode+0x3d/0x60 [<c10a4607>] iput+0x47/0x50 [<c10e915c>] reiserfs_create+0x16c/0x1c0 [<c109a9c1>] vfs_create+0xc1/0x130 [<c109dbec>] do_filp_open+0x81c/0x920 [<c109004f>] do_sys_open+0x4f/0x110 [<c1090179>] sys_open+0x29/0x40 [<c1002c50>] sysenter_do_call+0x12/0x36 other info that might help us debug this: 2 locks held by vi/23454: #0: (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [<c109d64e>] do_filp_open+0x27e/0x920 #1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>] reiserfs_write_lock+0x28/0x40 stack backtrace: Pid: 23454, comm: vi Not tainted 2.6.32-06486-g053fe57 #2 Call Trace: [<c134b202>] ? printk+0x18/0x1e [<c104e960>] print_circular_bug+0xc0/0xd0 [<c104fe38>] validate_chain+0xf68/0xf70 [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10 [<c1050325>] __lock_acquire+0x4e5/0xa70 [<c105092a>] lock_acquire+0x7a/0xa0 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0 [<c134c78f>] mutex_lock_nested+0x5f/0x2b0 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0 [<c110ff80>] ? delete_one_xattr+0x0/0x1c0 [<c110dac4>] do_journal_begin_r+0x64/0x2f0 [<c110ddcf>] journal_begin+0x7f/0x120 [<c11105b5>] ? reiserfs_delete_xattrs+0x15/0x50 [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140 [<c10a55bf>] ? generic_delete_inode+0x5f/0x150 [<c10ef490>] ? reiserfs_delete_inode+0x0/0x140 [<c10a55fc>] generic_delete_inode+0x9c/0x150 [<c10a56ed>] generic_drop_inode+0x3d/0x60 [<c10a4607>] iput+0x47/0x50 [<c10e915c>] reiserfs_create+0x16c/0x1c0 [<c1099a5d>] ? inode_permission+0x7d/0xa0 [<c109a9c1>] vfs_create+0xc1/0x130 [<c10e8ff0>] ? reiserfs_create+0x0/0x1c0 [<c109dbec>] do_filp_open+0x81c/0x920 [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10 [<c134dc0d>] ? _spin_unlock+0x1d/0x20 [<c10a6eea>] ? alloc_fd+0xba/0xf0 [<c109004f>] do_sys_open+0x4f/0x110 [<c1090179>] sys_open+0x29/0x40 [<c1002c50>] sysenter_do_call+0x12/0x36 To fix this, use reiserfs_lock_once() from reiserfs_delete_inode() which prevents from adding reiserfs lock recursion. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de>
\| * \| \|	reiserfs: Fix possible recursive lock	Frederic Weisbecker	2009-12-14	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While allocating the bitmap using vmalloc, we hold the reiserfs lock, which makes lockdep later reporting a possible deadlock as we may swap out pages to allocate memory and then take the reiserfs lock recursively: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/312 [HC0[0]:SC0[0]:HE1:SE1] takes: (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c11108a8>] reiserfs_write_lock+0x28/0x40 {RECLAIM_FS-ON-W} state was registered at: [<c104e1c2>] mark_held_locks+0x62/0x90 [<c104e28a>] lockdep_trace_alloc+0x9a/0xc0 [<c108e396>] kmem_cache_alloc+0x26/0xf0 [<c10850ec>] __get_vm_area_node+0x6c/0xf0 [<c10857de>] __vmalloc_node+0x7e/0xa0 [<c108597b>] vmalloc+0x2b/0x30 [<c10e00b9>] reiserfs_init_bitmap_cache+0x39/0x70 [<c10f8178>] reiserfs_fill_super+0x2e8/0xb90 [<c1094345>] get_sb_bdev+0x145/0x180 [<c10f5a11>] get_super_block+0x21/0x30 [<c10931f0>] vfs_kern_mount+0x40/0xd0 [<c10932d9>] do_kern_mount+0x39/0xd0 [<c10a9857>] do_mount+0x2c7/0x6b0 [<c10a9ca6>] sys_mount+0x66/0xa0 [<c161589b>] mount_block_root+0xc4/0x245 [<c1615a75>] mount_root+0x59/0x5f [<c1615b8c>] prepare_namespace+0x111/0x14b [<c1615269>] kernel_init+0xcf/0xdb [<c10031fb>] kernel_thread_helper+0x7/0x1c This is actually fine for two reasons: we call vmalloc at mount time then it's not in the swapping out path. Also the reiserfs lock can be acquired recursively, but since its implementation depends on a mutex, it's hard and not necessary worth it to teach that to lockdep. The lock is useless at mount time anyway, at least until we replay the journal. But let's remove it from this path later as this needs more thinking and is a sensible change. For now we can just relax the lock around vmalloc, Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de>
\| * \| \|	Merge commit 'v2.6.32' into reiserfs/kill-bkl	Frederic Weisbecker	2009-12-07	498	-12718/+26225
\| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge-reason: The tree was based 2.6.31. It's better to be up to date with 2.6.32. Although no conflicting changes were made in between, it gives benchmarking results closer to the lastest kernel behaviour.
* \| \| \| \|	writeback: add missing kernel-doc notation	Jaswinder Singh Rajput	2010-01-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the following htmldocs warning: Warning(fs/fs-writeback.c:255): No description found for parameter 'sb' Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6	Linus Torvalds	2009-12-31	3	-6/+14
\|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Enable mmap on forcedirectio mounts cifs: NULL out tcon, pSesInfo, and srvTcp pointers when chasing DFS referrals
\| * \| \| \| \|	[CIFS] Enable mmap on forcedirectio mounts	Steve French	2009-12-07	2	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	openoffice and gedit failed with 'direct' options Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| * \| \| \| \|	cifs: NULL out tcon, pSesInfo, and srvTcp pointers when chasing DFS referrals	Jeff Layton	2009-12-03	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The scenario is this: The kernel gets EREMOTE and starts chasing a DFS referral at mount time. The tcon reference is put, which puts the session reference too, but neither pointer is zeroed out. The mount gets retried (goto try_mount_again) with new mount info. Session setup fails fails and rc ends up being non-zero. The code then falls through to the end and tries to put the previously freed tcon pointer again. Oops at: cifs_put_smb_ses+0x14/0xd0 Fix this by moving the initialization of the rc variable and the tcon, pSesInfo and srvTcp pointers below the try_mount_again label. Also, add a FreeXid() before the goto to prevent xid "leaks". Signed-off-by: Jeff Layton <jlayton@redhat.com> Reported-by: Gustavo Carvalho Homem <gustavo@angulosolido.pt> CC: stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
* \| \| \| \| \|	Merge branch 'for_linus' of ↵	Linus Torvalds	2009-12-30	12	-93/+186
\|\ \ \ \ \ \ \| \| \|_\|_\|_\|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Patch up how we claim metadata blocks for quota purposes ext4: Ensure zeroout blocks have no dirty metadata ext4: return correct wbc.nr_to_write in ext4_da_writepages ext4: Update documentation to correct the inode_readahead_blks option name jbd2: don't use __GFP_NOFAIL in journal_init_common() ext4: flush delalloc blocks when space is low fs-writeback: Add helper function to start writeback if idle ext4: Eliminate potential double free on error path ext4: fix unsigned long long printk warning in super.c ext4, jbd2: Add barriers for file systems with exernal journals ext4: replace BUG() with return -EIO in ext4_ext_get_blocks ext4: add module aliases for ext2 and ext3 ext4: Don't ask about supporting ext2/3 in ext4 if ext4 is not configured ext4: remove unused #include <linux/version.h>
\| * \| \| \| \|	ext4: Patch up how we claim metadata blocks for quota purposes	Theodore Ts'o	2009-12-30	1	-73/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As reported in Kernel Bugzilla #14936, commit d21cd8f triggered a BUG in the function ext4_da_update_reserve_space() found in fs/ext4/inode.c. The root cause of this BUG() was caused by the fact that ext4_calc_metadata_amount() can severely over-estimate how many metadata blocks will be needed, especially when using direct block-mapped files. In addition, it can also badly under estimate how much space is needed, since ext4_calc_metadata_amount() assumes that the blocks are contiguous, and this is not always true. If the application is writing blocks to a sparse file, the number of metadata blocks necessary can be severly underestimated by the functions ext4_da_reserve_space(), ext4_da_update_reserve_space() and ext4_da_release_space(). This was the cause of the dq_claim_space reports found on kerneloops.org. Unfortunately, doing this right means that we need to massively over-estimate the amount of free space needed. So in some cases we may need to force the inode to be written to disk asynchronously in to avoid spurious quota failures. http://bugzilla.kernel.org/show_bug.cgi?id=14936 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \| \| \| \|	ext4: Ensure zeroout blocks have no dirty metadata	Aneesh Kumar K.V	2009-12-29	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a bug (found by Curt Wohlgemuth) in which new blocks returned from an extent created with ext4_ext_zeroout() can have dirty metadata still associated with them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \| \| \| \|	ext4: return correct wbc.nr_to_write in ext4_da_writepages	Richard Kennedy	2009-12-25	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When ext4_da_writepages increases the nr_to_write in writeback_control then it must always re-base the return value. Originally there was a (misguided) attempt prevent wbc.nr_to_write from going negative. In fact, it's necessary to allow nr_to_write to be negative so that wb_writeback() can correctly calculate how many pages were actually written. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \| \| \| \|	jbd2: don't use __GFP_NOFAIL in journal_init_common()	Andrew Morton	2009-12-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It triggers the warning in get_page_from_freelist(), and it isn't appropriate to use __GFP_NOFAIL here anyway. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14843 Reported-by: Christian Casteyde <casteyde.christian@free.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \| \| \| \|	ext4: flush delalloc blocks when space is low	Eric Sandeen	2009-12-23	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Creating many small files in rapid succession on a small filesystem can lead to spurious ENOSPC; on a 104MB filesystem: for i in `seq 1 22500`; do echo -n > $SCRATCH_MNT/$i echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i done leads to ENOSPC even though after a sync, 40% of the fs is free again. This is because we reserve worst-case metadata for delalloc writes, and when data is allocated that worst-case reservation is not usually needed. When freespace is low, kicking off an async writeback will start converting that worst-case space usage into something more realistic, almost always freeing up space to continue. This resolves the testcase for me, and survives all 4 generic ENOSPC tests in xfstests. We'll still need a hard synchronous sync to squeeze out the last bit, but this fixes things up to a large degree. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \| \| \| \|	fs-writeback: Add helper function to start writeback if idle	Eric Sandeen	2009-12-23	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ext4, at least, would like to start pushing on writeback if it starts to get close to ENOSPC when reserving worst-case blocks for delalloc writes. Writing out delalloc data will convert those worst-case predictions into usually smaller actual usage, freeing up space before we hit ENOSPC based on this speculation. Thanks to Jens for the suggestion for the helper function, & the naming help. I've made the helper return status on whether writeback was started even though I don't plan to use it in the ext4 patch; it seems like it would be potentially useful to test this in some cases. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Jan Kara <jack@suse.cz>
\| * \| \| \| \|	ext4: Eliminate potential double free on error path	Julia Lawall	2009-12-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	b_entry_name and buffer are initially NULL, are initialized within a loop to the result of calling kmalloc, and are freed at the bottom of this loop. The loop contains gotos to cleanup, which also frees b_entry_name and buffer. Some of these gotos are before the reinitializations of b_entry_name and buffer. To maintain the invariant that b_entry_name and buffer are NULL at the top of the loop, and thus acceptable arguments to kfree, these variables are now set to NULL after the kfrees. This seems to be the simplest solution. A more complicated solution would be to introduce more labels in the error handling code at the end of the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ identifier E; expression E1; iterator I; statement S; @@ kfree(E); ... when != E = E1 when != I(E,...) S when != &E kfree(E); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \| \| \| \|	ext4: fix unsigned long long printk warning in super.c	Andrew Morton	2009-12-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sparc64 allmodconfig: fs/ext4/super.c: In function `lifetime_write_kbytes_show': fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4) fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4) Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \| \| \| \|	ext4, jbd2: Add barriers for file systems with exernal journals	Theodore Ts'o	2009-12-23	3	-10/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a bit complicated because we are trying to optimize when we send barriers to the fs data disk. We could just throw in an extra barrier to the data disk whenever we send a barrier to the journal disk, but that's not always strictly necessary. We only need to send a barrier during a commit when there are data blocks which are must be written out due to an inode written in ordered mode, or if fsync() depends on the commit to force data blocks to disk. Finally, before we drop transactions from the beginning of the journal during a checkpoint operation, we need to guarantee that any blocks that were flushed out to the data disk are firmly on the rust platter before we drop the transaction from the journal. Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \| \| \| \|	ext4: replace BUG() with return -EIO in ext4_ext_get_blocks	Surbhi Palande	2009-12-14	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the Kernel BZ #14286. When the address of an extent corresponding to a valid block is corrupted, a -EIO should be reported instead of a BUG(). This situation should not normally not occur except in the case of a corrupted filesystem. If however it does, then the system should not panic directly but depending on the mount time options appropriate action should be taken. If the mount options so permit, the I/O should be gracefully aborted by returning a -EIO. http://bugzilla.kernel.org/show_bug.cgi?id=14286 Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \| \| \| \|	ext4: add module aliases for ext2 and ext3	Theodore Ts'o	2009-12-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add module aliases for ext2 and ext3 when CONFIG_EXT4_USE_FOR_EXT23 is set. This makes the existing user-space stuff like mkinitrd working as is. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \| \| \| \|	ext4: Don't ask about supporting ext2/3 in ext4 if ext4 is not configured	David Howells	2009-12-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't offer to build ext2/3 support into ext4 if ext4 itself is not configured on. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
\| * \| \| \| \|	ext4: remove unused #include <linux/version.h>	Huang Weiyi	2009-12-14	2	-2/+0
\| \| \|_\|_\|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove unused #include <linux/version.h>('s) in fs/ext4/block_validity.c fs/ext4/mballoc.h Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* \| \| \| \|	generic_permission: MAY_OPEN is not write access	Serge E. Hallyn	2009-12-30	1	-0/+1
\|/ / / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	generic_permission was refusing CAP_DAC_READ_SEARCH-enabled processes from opening DAC-protected files read-only, because do_filp_open adds MAY_OPEN to the open mask. Ignore MAY_OPEN. After this patch, CAP_DAC_READ_SEARCH is again sufficient to open(fname, O_RDONLY) on a file to which DAC otherwise refuses us read permission. Reported-by: Mike Kazantsev <mk.fraggod@gmail.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Mike Kazantsev <mk.fraggod@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \|	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2009-12-24	18	-116/+168
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c ocfs2/trivial: Use proper mask for 2 places in hearbeat.c Ocfs2: Let ocfs2 support fiemap for symlink and fast symlink. Ocfs2: Should ocfs2 support fiemap for S_IFDIR inode? ocfs2: Use FIEMAP_EXTENT_SHARED fiemap: Add new extent flag FIEMAP_EXTENT_SHARED ocfs2: replace u8 by __u8 in ocfs2_fs.h ocfs2: explicit declare uninitialized var in user_cluster_connect() ocfs2-devel: remove redundant OCFS2_MOUNT_POSIX_ACL check in ocfs2_get_acl_nolock() ocfs2: return -EAGAIN instead of EAGAIN in dlm ocfs2/cluster: Make fence method configurable - v2 ocfs2: Set MS_POSIXACL on remount ocfs2: Make acl use the default ocfs2: Always include ACL support
\| * \| \| \|	ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c	Tao Ma	2009-12-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In ocfs2_value_metas_in_xattr_header, we should Use le16_to_cpu for ocfs2_extent_list.l_next_free_rec. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>