op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	BKL: remove BKL from freevxfs	Arnd Bergmann	2010-10-21	2	-22/+2
\| \| \| \| \| \| \| \| \| \|	All uses of the BKL in freevxfs were the result of a pushdown into code that doesn't really need it. As Christoph points out, this is a read-only file system, which eliminates most of the races in readdir/lookup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@infradead.org>
*	BKL: remove BKL from qnx4	Arnd Bergmann	2010-10-21	3	-21/+1
\| \| \| \| \| \| \| \| \| \| \|	All uses of the BKL in qnx4 were the result of a pushdown into code that doesn't really need it. As Christoph points out, this is a read-only file system, which eliminates most of the races in readdir/lookup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Anders Larsen <al@alarsen.net> Cc: Christoph Hellwig <hch@infradead.org>
*	autofs4: Only declare function when CONFIG_COMPAT is defined	Felipe Contreras	2010-10-05	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch solves the following warnings message when CONFIG_COMPAT is not defined: fs/autofs4/root.c:31: warning: ‘autofs4_root_compat_ioctl’ declared ‘static’ but never defined Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Cc: Ian Kent <raven@themaw.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	autofs: Only declare function when CONFIG_COMPAT is defined	Márton Németh	2010-10-05	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	The patch solves the following warnings message when CONFIG_COMPAT is not defined: fs/autofs/root.c:30: warning: ‘autofs_root_compat_ioctl’ declared ‘static’ but never defined Signed-off-by: Márton Németh <nm127@freemail.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	ncpfs: Lock socket in ncpfs while setting its callbacks	Petr Vandrovec	2010-10-05	1	-5/+9
\| \| \| \| \| \| \| \|	Otherwise partially updated pointers could be seen if pointer update is not atomic. Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	fs/locks.c: prepare for BKL removal	Arnd Bergmann	2010-10-05	7	-62/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This prepares the removal of the big kernel lock from the file locking code. We still use the BKL as long as fs/lockd uses it and ceph might sleep, but we can flip the definition to a private spinlock as soon as that's done. All users outside of fs/lockd get converted to use lock_flocks() instead of lock_kernel() where appropriate. Based on an earlier patch to use a spinlock from Matthew Wilcox, who has attempted this a few times before, the earliest patch from over 10 years ago turned it into a semaphore, which ended up being slower than the BKL and was subsequently reverted. Someone should do some serious performance testing when this becomes a spinlock, since this has caused problems before. Using a spinlock should be at least as good as the BKL in theory, but who knows... Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Matthew Wilcox <willy@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: linux-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org
*	BKL: Remove BKL from ncpfs	Petr Vandrovec	2010-10-04	8	-400/+484
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dozen of changes in ncpfs to provide some locking other than BKL. In readdir cache unlock and mark complete first page as last operation, so it can be used for synchronization, as code intended. When updating dentry name on case insensitive filesystems do at least some basic locking... Hold i_mutex when updating inode fields. Push some ncp_conn_is_valid down to ncp_request. Connection can become invalid at any moment, and fewer error code paths to test the better. Use i_size_{read,write} to modify file size. Set inode's backing_dev_info as ncpfs has its own special bdi. In ioctl unbreak ioctls invoked on filesystem mounted 'ro' - tests are for inode writeable or owner match, but were turned to filesystem writeable and inode writeable or owner match. Also collect all permission checks in single place. Add some locking, and remove comments saying that it would be cool to add some locks to the code. Constify some pointers. Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from OCFS2	Arnd Bergmann	2010-10-04	3	-23/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The BKL in ocfs2/dlmfs is used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. The use in ocfs2_control_open is evidently unrelated and the function is protected by ocfs2_control_lock. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com>
*	BKL: Remove BKL from squashfs	Arnd Bergmann	2010-10-04	1	-11/+0
\| \| \| \| \| \| \| \| \|	The BKL is only used in put_super and fill_super, which are both protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
*	BKL: Remove BKL from jffs2	Arnd Bergmann	2010-10-04	2	-15/+1
\| \| \| \| \| \| \| \| \|	The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw2@infradead.org>
*	BKL: Remove BKL from ecryptfs	Arnd Bergmann	2010-10-04	2	-7/+0
\| \| \| \| \| \| \| \| \| \| \|	The BKL is only used in fill_super, which is protected by the superblocks s_umount rw_semaphorei, and in fasync, which does not do anything that could require the BKL. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: ecryptfs-devel@lists.launchpad.net
*	BKL: Remove BKL from afs	Arnd Bergmann	2010-10-04	1	-10/+0
\| \| \| \| \| \| \| \| \| \|	The BKL is only used in put_super and fill_super, which are both protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: linux-afs@lists.infradead.org Cc: David Howells <dhowells@redhat.com>
*	BKL: Remove BKL from autofs4	Arnd Bergmann	2010-10-04	1	-5/+7
\| \| \| \| \| \| \| \| \| \|	autofs4 uses the BKL only to guard its ioctl operations. This can be trivially converted to use a mutex, as we have done with most device drivers before. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ian Kent <raven@themaw.net>
*	BKL: Remove BKL from isofs	Arnd Bergmann	2010-10-04	5	-27/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As in other file systems, we can replace the big kernel lock with a private mutex in isofs. This means we can now access multiple file systems concurrently, but it also means that we serialize readdir and lookup across sleeping operations which previously released the big kernel lock. This should not matter though, as these operations are in practice serialized through the hardware access. The isofs_get_blocks functions now does not take any lock any more, it used to recursively get the BKL. After looking at the code for hours, I convinced myself that it was never needed here anyway, because it only reads constant fields of the inode and writes to a buffer head array that is at this time only visible to the caller. The get_sb and fill_super operations do not need the locking at all because they operate on a file system that is either about to be created or to be destroyed but in either case is not visible to other threads. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from fat	Arnd Bergmann	2010-10-04	3	-13/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The lock_kernel in fat_put_super is not needed because it only protects the super block itself and we know that no other thread can reach it because we are about to kfree the object. In the two fill_super functions, this converts the locking to use lock_super like elsewhere in the fat code. This is probably not needed either, but is consistent and puts us on the safe side. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Jan Blunck <jblunck@infradead.org>
*	BKL: Remove BKL from ext2 filesystem	Jan Blunck	2010-10-04	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The BKL is still used in ext2_put_super(), ext2_fill_super(), ext2_sync_fs() ext2_remount() and ext2_write_inode(). From these calls ext2_put_super(), ext2_fill_super() and ext2_remount() are protected against each other by the struct super_block s_umount rw semaphore. The call in ext2_write_inode() could only protect the modification of the ext2_sb_info through ext2_update_dynamic_rev() against concurrent ext2_sync_fs() or ext2_remount(). ext2_fill_super() and ext2_put_super() can be left out because you need a valid filesystem reference in all three cases, which you do not have when you are one of these functions. If the BKL is only protecting the modification of the ext2_sb_info it can safely be removed since this is protected by the struct ext2_sb_info s_lock. Signed-off-by: Jan Blunck <jblunck@infradead.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from do_new_mount()	Jan Blunck	2010-10-04	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	After pushing down the BKL to the get_sb/fill_super operations of the filesystems that still make usage of the BKL it is safe to remove it from do_new_mount(). I've read through all the code formerly covered by the BKL inside do_kern_mount() and have satisfied myself that it doesn't need the BKL any more. Signed-off-by: Jan Blunck <jblunck@infradead.org> Cc: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from NTFS	Jan Blunck	2010-10-04	1	-27/+2
\| \| \| \| \| \| \| \| \|	The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from NILFS2	Jan Blunck	2010-10-04	2	-18/+1
\| \| \| \| \| \| \| \| \|	The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from JFS	Jan Blunck	2010-10-04	1	-28/+7
\| \| \| \| \| \| \| \| \|	The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from HFS	Jan Blunck	2010-10-04	1	-12/+2
\| \| \| \| \| \| \| \| \|	The BKL is only used in put_super and fill_super that are both protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from ext4 filesystem	Jan Blunck	2010-10-04	1	-15/+0
\| \| \| \| \| \| \| \| \| \| \| \|	The BKL is still used in ext4_put_super(), ext4_fill_super() and ext4_remount(). All three calles are protected against concurrent calls by the s_umount rw semaphore of struct super_block. Therefore the BKL is protecting nothing in this case. Signed-off-by: Jan Blunck <jblunck@infradead.org> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from ext3_put_super() and ext3_remount()	Jan Blunck	2010-10-04	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \|	The BKL lock is protecting the remounting against a potential call to ext3_put_super(). This could not happen, since this is protected by the s_umount rw semaphore of struct super_block. Therefore I think the BKL is protecting nothing here. Signed-off-by: Jan Blunck <jblunck@infradead.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from ext3 fill_super()	Jan Blunck	2010-10-04	1	-12/+1
\| \| \| \| \| \| \| \|	The BKL is protecting nothing than two memory allocations here. Signed-off-by: Jan Blunck <jblunck@infradead.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from CifsFS	Jan Blunck	2010-10-04	1	-12/+1
\| \| \| \| \| \| \| \| \| \|	The BKL is only used in put_super and fill_super that are both protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Cc: Steve French <smfrench@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from BFS	Jan Blunck	2010-10-04	1	-12/+1
\| \| \| \| \| \| \| \| \|	The BKL is only used in put_super and fill_super that are both protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Remove BKL from Amiga FFS	Jan Blunck	2010-10-04	1	-18/+5
\| \| \| \| \| \| \| \| \|	The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
*	BKL: Explicitly add BKL around get_sb/fill_super	Jan Blunck	2010-10-04	27	-25/+183
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is a preparation necessary to remove the BKL from do_new_mount(). It explicitly adds calls to lock_kernel()/unlock_kernel() around get_sb/fill_super operations for filesystems that still uses the BKL. I've read through all the code formerly covered by the BKL inside do_kern_mount() and have satisfied myself that it doesn't need the BKL any more. do_kern_mount() is already called without the BKL when mounting the rootfs and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called from various places without BKL: simple_pin_fs(), nfs_do_clone_mount() through nfs_follow_mountpoint(), afs_mntpt_do_automount() through afs_mntpt_follow_link(). Both later functions are actually the filesystems follow_link inode operation. vfs_kern_mount() is calling the specified get_sb function and lets the filesystem do its job by calling the given fill_super function. Therefore I think it is safe to push down the BKL from the VFS to the low-level filesystems get_sb/fill_super operation. [arnd: do not add the BKL to those file systems that already don't use it elsewhere] Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Christoph Hellwig <hch@infradead.org>
*	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2010-09-24	14	-44/+117
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: o2dlm: force free mles during dlm exit ocfs2: Sync inode flags with ext2. ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits. ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent. ocfs2: update ctime when changing the file's permission by setfacl ocfs2/net: fix uninitialized ret in o2net_send_message_vec() Ocfs2: Handle empty list in lockres_seq_start() for dlmdebug.c Ocfs2: Re-access the journal after ocfs2_insert_extent() in dxdir codes. ocfs2: Fix lockdep warning in reflink. ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock.
\| *	o2dlm: force free mles during dlm exit	Srinivas Eeda	2010-09-23	3	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While umounting, a block mle doesn't get freed if dlm is shutdown after master request is received but before assert master. This results in unclean shutdown of dlm domain. This patch frees all mles that lie around after other nodes were notified about exiting the dlm and marking dlm state as leaving. Only block mles are expected to be around, so we log ERROR for other mles but still free them. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
\| *	ocfs2: Sync inode flags with ext2.	Tao Ma	2010-09-23	2	-16/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We sync our inode flags with ext2 and define them by hex values. But actually in commit 3669567(4 years ago), all these values are moved to include/linux/fs.h. So we'd better also use them as what ext2 did. So sync our inode flags with ext2 by using FS_*. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
\| *	ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits.	Tao Ma	2010-09-23	1	-12/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The first time I read the function ocfs2_resmap_resv_bits, I consider about what 'wanted' will be used and consider about the comments. Then I find it is only used if the reservation is empty. ;) So we'd better move it to the parens so that it make the code more readable, what's more, ocfs2_resmap_resv_bits is used so frequently and we should save some cpus. Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
\| *	ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent.	Tao Ma	2010-09-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	e_leaf_clusters is a le16, so use cpu_to_le16 instead of cpu_to_le32. What's more, we change 'clusters' to unsigned int to signify that the size of 'clusters' isn't important here. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
\| *	ocfs2: update ctime when changing the file's permission by setfacl	Tao Ma	2010-09-23	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 30e2bab, ext3 fixed it. So change it accordingly in ocfs2. Steps to reproduce: # touch aaa # stat -c %Z aaa 1283760364 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1283760364 Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
\| *	ocfs2/net: fix uninitialized ret in o2net_send_message_vec()	Wu Fengguang	2010-09-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mmotm/fs/ocfs2/cluster/tcp.c: In function ‘o2net_send_message_vec’: mmotm/fs/ocfs2/cluster/tcp.c:980:6: warning: ‘ret’ may be used uninitialized in this function It seems a real bug introduced by commit 9af0b38ff3 (ocfs2/net: Use wait_event() in o2net_send_message_vec()). cc: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
\| *	Ocfs2: Handle empty list in lockres_seq_start() for dlmdebug.c	Tristan Ye	2010-09-10	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch tries to handle the case in which list 'dlm->tracking_list' is empty, to avoid accessing an invalid pointer. It fixes the following oops: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1287 Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
\| *	Ocfs2: Re-access the journal after ocfs2_insert_extent() in dxdir codes.	Tristan Ye	2010-09-10	1	-8/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In ocfs2_dx_dir_rebalance(), we need to rejournal_acess the blocks after calling ocfs2_insert_extent() since growing an extent tree may trigger ocfs2_extend_trans(), which makes previous journal_access meaningless. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
\| *	ocfs2: Fix lockdep warning in reflink.	Tao Ma	2010-09-10	2	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch change mutex_lock to a new subclass and add a new inode lock subclass for the target inode which caused this lockdep warning. ============================================= [ INFO: possible recursive locking detected ] 2.6.35+ #5 --------------------------------------------- reflink/11086 is trying to acquire lock: (Meta){+++++.}, at: [<ffffffffa06f9d65>] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] but task is already holding lock: (Meta){+++++.}, at: [<ffffffffa06f9aa0>] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2] other info that might help us debug this: 6 locks held by reflink/11086: #0: (&sb->s_type->i_mutex_key#15/1){+.+.+.}, at: [<ffffffff820e09ec>] lookup_create+0x26/0x97 #1: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa06f99a0>] ocfs2_reflink_ioctl+0x4d3/0x1229 [ocfs2] #2: (Meta){+++++.}, at: [<ffffffffa06f9aa0>] ocfs2_reflink_ioctl+0x5d3/0x1229 [ocfs2] #3: (&oi->ip_xattr_sem){+.+.+.}, at: [<ffffffffa06f9b58>] ocfs2_reflink_ioctl+0x68b/0x1229 [ocfs2] #4: (&oi->ip_alloc_sem){+.+.+.}, at: [<ffffffffa06f9b67>] ocfs2_reflink_ioctl+0x69a/0x1229 [ocfs2] #5: (&sb->s_type->i_mutex_key#15/2){+.+...}, at: [<ffffffffa06f9d4f>] ocfs2_reflink_ioctl+0x882/0x1229 [ocfs2] stack backtrace: Pid: 11086, comm: reflink Not tainted 2.6.35+ #5 Call Trace: [<ffffffff82063dd9>] validate_chain+0x56e/0xd68 [<ffffffff82062275>] ? mark_held_locks+0x49/0x69 [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06c9ade>] __ocfs2_cluster_lock+0x975/0xa0d [ocfs2] [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06e107b>] ? ocfs2_wait_for_recovery+0x15/0x8a [ocfs2] [<ffffffffa06cb6ea>] ocfs2_inode_lock_full_nested+0x1ac/0xdc5 [ocfs2] [<ffffffffa06f9d65>] ? ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffff820623a0>] ? trace_hardirqs_on_caller+0x10b/0x12f [<ffffffff82060193>] ? debug_mutex_free_waiter+0x4f/0x53 [<ffffffffa06f9d65>] ocfs2_reflink_ioctl+0x898/0x1229 [ocfs2] [<ffffffffa06ce24a>] ? ocfs2_file_lock_res_init+0x66/0x78 [ocfs2] [<ffffffff820bb2d2>] ? might_fault+0x40/0x8d [<ffffffffa06df9f6>] ocfs2_ioctl+0x61a/0x656 [ocfs2] [<ffffffff820ee5d3>] ? mntput_no_expire+0x1d/0xb0 [<ffffffff820e07b3>] ? path_put+0x2c/0x31 [<ffffffff820e53ac>] vfs_ioctl+0x2a/0x9d [<ffffffff820e5903>] do_vfs_ioctl+0x45d/0x4ae [<ffffffff8233a7f6>] ? _raw_spin_unlock+0x26/0x2a [<ffffffff8200299c>] ? sysret_check+0x27/0x62 [<ffffffff820e59ab>] sys_ioctl+0x57/0x7a [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
\| *	ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock.	Tao Ma	2010-09-10	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As the name shows, we shouldn't have any lock in ocfs2_xattr_get_nolock. so lift ip_xattr_sem to the caller. This should be safe for us since the only 2 callers are: 1. ocfs2_xattr_get which will lock the resources. 2. ocfs2_mknod which don't need this locking. And this also resolves the following lockdep warning. ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35+ #5 ------------------------------------------------------- reflink/30027 is trying to acquire lock: (&oi->ip_alloc_sem){+.+.+.}, at: [<ffffffffa0673b67>] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2] but task is already holding lock: (&oi->ip_xattr_sem){++++..}, at: [<ffffffffa0673b58>] ocfs2_reflink_ioctl+0x68b/0x1226 [ocfs2] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&oi->ip_xattr_sem){++++..}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffff82339650>] down_read+0x34/0x47 [<ffffffffa0691cb8>] ocfs2_xattr_get_nolock+0xa0/0x4e6 [ocfs2] [<ffffffffa069d64f>] ocfs2_get_acl_nolock+0x5c/0x132 [ocfs2] [<ffffffffa069d9c7>] ocfs2_init_acl+0x60/0x243 [ocfs2] [<ffffffffa066499d>] ocfs2_mknod+0xae8/0xfea [ocfs2] [<ffffffffa0665041>] ocfs2_create+0x9d/0x105 [ocfs2] [<ffffffff820e1c83>] vfs_create+0x9b/0xf4 [<ffffffff820e20bb>] do_last+0x2fd/0x5be [<ffffffff820e31c0>] do_filp_open+0x1fb/0x572 [<ffffffff820d6cf6>] do_sys_open+0x5a/0xe7 [<ffffffff820d6dac>] sys_open+0x1b/0x1d [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #2 (jbd2_handle){+.+...}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffffa0604ff8>] start_this_handle+0x4a3/0x4bc [jbd2] [<ffffffffa06051d6>] jbd2__journal_start+0xba/0xee [jbd2] [<ffffffffa0605218>] jbd2_journal_start+0xe/0x10 [jbd2] [<ffffffffa065ca34>] ocfs2_start_trans+0xb7/0x19b [ocfs2] [<ffffffffa06645f3>] ocfs2_mknod+0x73e/0xfea [ocfs2] [<ffffffffa0665041>] ocfs2_create+0x9d/0x105 [ocfs2] [<ffffffff820e1c83>] vfs_create+0x9b/0xf4 [<ffffffff820e20bb>] do_last+0x2fd/0x5be [<ffffffff820e31c0>] do_filp_open+0x1fb/0x572 [<ffffffff820d6cf6>] do_sys_open+0x5a/0xe7 [<ffffffff820d6dac>] sys_open+0x1b/0x1d [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #1 (&journal->j_trans_barrier){.+.+..}: [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82064fa9>] lock_release_non_nested+0x1e5/0x24b [<ffffffff82065999>] lock_release+0x158/0x17a [<ffffffff823389f6>] __mutex_unlock_slowpath+0xbf/0x11b [<ffffffff82338a5b>] mutex_unlock+0x9/0xb [<ffffffffa0679673>] ocfs2_free_ac_resource+0x31/0x67 [ocfs2] [<ffffffffa067c6bc>] ocfs2_free_alloc_context+0x11/0x1d [ocfs2] [<ffffffffa0633de0>] ocfs2_write_begin_nolock+0x141e/0x159b [ocfs2] [<ffffffffa0635523>] ocfs2_write_begin+0x11e/0x1e7 [ocfs2] [<ffffffff820a1297>] generic_file_buffered_write+0x10c/0x210 [<ffffffffa0653624>] ocfs2_file_aio_write+0x4cc/0x6d3 [ocfs2] [<ffffffff820d822d>] do_sync_write+0xc2/0x106 [<ffffffff820d897b>] vfs_write+0xae/0x131 [<ffffffff820d8e55>] sys_write+0x47/0x6f [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b -> #0 (&oi->ip_alloc_sem){+.+.+.}: [<ffffffff82063f92>] validate_chain+0x727/0xd68 [<ffffffff82064d6d>] __lock_acquire+0x79a/0x7f1 [<ffffffff82065a81>] lock_acquire+0xc6/0xed [<ffffffff82339694>] down_write+0x31/0x52 [<ffffffffa0673b67>] ocfs2_reflink_ioctl+0x69a/0x1226 [ocfs2] [<ffffffffa06599f6>] ocfs2_ioctl+0x61a/0x656 [ocfs2] [<ffffffff820e53ac>] vfs_ioctl+0x2a/0x9d [<ffffffff820e5903>] do_vfs_ioctl+0x45d/0x4ae [<ffffffff820e59ab>] sys_ioctl+0x57/0x7a [<ffffffff8200296b>] system_call_fastpath+0x16/0x1b Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* \|	/proc/pid/smaps: fix dirty pages accounting	KOSAKI Motohiro	2010-09-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, /proc/<pid>/smaps has wrong dirty pages accounting. Shared_Dirty and Private_Dirty output only pte dirty pages and ignore PG_dirty page flag. It is difference against documentation, but also inconsistent against Referenced field. (Referenced checks both pte and page flags) This patch fixes it. Test program: large-array.c --------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> char array[110241024*1024L]; int main(void) { memset(array, 1, sizeof(array)); pause(); return 0; } --------------------------------------------------- Test case: 1. run ./large-array 2. cat /proc/`pidof large-array`/smaps 3. swapoff -a 4. cat /proc/`pidof large-array`/smaps again Test result: <before patch> 00601000-40601000 rw-p 00000000 00:00 0 Size: 1048576 kB Rss: 1048576 kB Pss: 1048576 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 218992 kB <-- showed pages as clean incorrectly Private_Dirty: 829584 kB Referenced: 388364 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB <after patch> 00601000-40601000 rw-p 00000000 00:00 0 Size: 1048576 kB Rss: 1048576 kB Pss: 1048576 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 1048576 kB <-- fixed Referenced: 388480 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	aio: do not return ERESTARTSYS as a result of AIO	Jan Kara	2010-09-22	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	OCFS2 can return ERESTARTSYS from its write function when the process is signalled while waiting for a cluster lock (and the filesystem is mounted with intr mount option). Generally, it seems reasonable to allow filesystems to return this error code from its IO functions. As we must not leak ERESTARTSYS (and similar error codes) to userspace as a result of an AIO operation, we have to properly convert it to EINTR inside AIO code (restarting the syscall isn't really an option because other AIO could have been already submitted by the same io_submit syscall). Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	/proc/vmcore: fix seeking	Arnd Bergmann	2010-09-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 73296bc611 ("procfs: Use generic_file_llseek in /proc/vmcore") broke seeking on /proc/vmcore. This changes it back to use default_llseek in order to restore the original behaviour. The problem with generic_file_llseek is that it only allows seeks up to inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual file systems. We should merge generic_file_llseek and default_llseek some day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore is the safer solution. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: CAI Qian <caiqian@redhat.com> Tested-by: CAI Qian <caiqian@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Prevent freeing uninitialized pointer in compat_do_readv_writev	Dan Rosenberg	2010-09-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In 32-bit compatibility mode, the error handling for compat_do_readv_writev() may free an uninitialized pointer, potentially leading to all sorts of ugly memory corruption. This is reliably triggerable by unprivileged users by invoking the readv()/writev() syscalls with an invalid iovec pointer. The below patch fixes this to emulate the non-compat version. Introduced by commit b83733639a49 ("compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writev") Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Cc: stable@kernel.org (2.6.35) Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block	Linus Torvalds	2010-09-22	2	-3/+24
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-linus' of git://git.kernel.dk/linux-2.6-block: bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends char: Mark /dev/zero and /dev/kmem as not capable of writeback bdi: Initialize noop_backing_dev_info properly cfq-iosched: fix a kernel OOPs when usb key is inserted block: fix blk_rq_map_kern bio direction flag cciss: freeing uninitialized data on error path
\| * \|	bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends	Jan Kara	2010-09-22	1	-2/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Inodes of devices such as /dev/zero can get dirty for example via utime(2) syscall or due to atime update. Backing device of such inodes (zero_bdi, etc.) is however unable to handle dirty inodes and thus __mark_inode_dirty complains. In fact, inode should be rather dirtied against backing device of the filesystem holding it. This is generally a good rule except for filesystems such as 'bdev' or 'mtd_inodefs'. Inodes in these pseudofilesystems are referenced from ordinary filesystem inodes and carry mapping with real data of the device. Thus for these inodes we have to use inode->i_mapping->backing_dev_info as we did so far. We distinguish these filesystems by checking whether sb->s_bdi points to a non-trivial backing device or not. Example: Assume we have an ext3 filesystem on /dev/sda1 mounted on /. There's a device inode A described by a path "/dev/sdb" on this filesystem. This inode will be dirtied against backing device "8:0" after this patch. bdev filesystem contains block device inode B coupled with our inode A. When someone modifies a page of /dev/sdb, it's B that gets dirtied and the dirtying happens against the backing device "8:16". Thus both inodes get filed to a correct bdi list. Cc: stable@kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
\| * \|	char: Mark /dev/zero and /dev/kmem as not capable of writeback	Jan Kara	2010-09-22	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These devices don't do any writeback but their device inodes still can get dirty so mark bdi appropriately so that bdi code does the right thing and files inodes to lists of bdi carrying the device inodes. Cc: stable@kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* \| \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-09-21	9	-83/+84
\|\ \ \ \| \|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: select CRYPTO ceph: check mapping to determine if FILE_CACHE cap is used ceph: only send one flushsnap per cap_snap per mds session ceph: fix cap_snap and realm split ceph: stop sending FLUSHSNAPs when we hit a dirty capsnap ceph: correctly set 'follows' in flushsnap messages ceph: fix dn offset during readdir_prepopulate ceph: fix file offset wrapping at 4GB on 32-bit archs ceph: fix reconnect encoding for old servers ceph: fix pagelist kunmap tail ceph: fix null pointer deref on anon root dentry release
\| * \|	ceph: select CRYPTO	Sage Weil	2010-09-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We select CRYPTO_AES, but not CRYPTO. Signed-off-by: Sage Weil <sage@newdream.net>
\| * \|	ceph: check mapping to determine if FILE_CACHE cap is used	Sage Weil	2010-09-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	See if the i_data mapping has any pages to determine if the FILE_CACHE capability is currently in use, instead of assuming it is any time the rdcache_gen value is set (i.e., issued -> used). This allows the MDS RECALL_STATE process work for inodes that have cached pages. Signed-off-by: Sage Weil <sage@newdream.net>
\| * \|	ceph: only send one flushsnap per cap_snap per mds session	Sage Weil	2010-09-17	3	-6/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sending multiple flushsnap messages is problematic because we ignore the response if the tid doesn't match, and the server may only respond to each one once. It's also a waste. So, skip cap_snaps that are already on the flushing list, unless the caller tells us to resend (because we are reconnecting). Signed-off-by: Sage Weil <sage@newdream.net>