op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'for_linus' of ↵	Linus Torvalds	2011-01-11	16	-142/+545
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: ext2: Resolve 'dereferencing pointer to incomplete type' when enabling EXT2_XATTR_DEBUG ext3: Remove redundant unlikely() ext2: Remove redundant unlikely() ext3: speed up file creates by optimizing rec_len functions ext2: speed up file creates by optimizing rec_len functions ext3: Add more journal error check ext3: Add journal error check in resize.c quota: Use %pV and __attribute__((format (printf in __quota_error and fix fallout ext3: Add FITRIM handling ext3: Add batched discard support for ext3 ext3: Add journal error check into ext3_rename() ext3: Use search_dirblock() in ext3_dx_find_entry() ext3: Avoid uninitialized memory references with a corrupted htree directory ext3: Return error code from generic_check_addressable ext3: Add journal error check into ext3_delete_entry() ext3: Add error check in ext3_mkdir() fs/ext3/super.c: Use printf extension %pV fs/ext2/super.c: Use printf extension %pV ext3: don't update sb journal_devnum when RO dev
\| *	ext2: Resolve 'dereferencing pointer to incomplete type' when enabling ↵	Josh Hunt	2011-01-10	2	-12/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	EXT2_XATTR_DEBUG When I enable EXT2_XATTR_DEBUG in fs/ext2/xattr.c I get a build error stating the following: CC fs/ext2/xattr.o fs/ext2/xattr.c: In function 'ext2_xattr_cache_insert': fs/ext2/xattr.c:841: error: dereferencing pointer to incomplete type fs/ext2/xattr.c:846: error: dereferencing pointer to incomplete type make[2]: * [fs/ext2/xattr.o] Error 1 make[1]: * [fs/ext2] Error 2 make: *** [fs] Error 2 These lines reference ext2_xattr_cache->c_entry_count which is defined in struct mb_cache. struct mb_cache is currently only defined in fs/mbcache.c. Moving struct mb_cache definition to include/linux/mbcache.h to resolve the issue. Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: Remove redundant unlikely()	Tobias Klauser	2011-01-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	IS_ERR() already implies unlikely(), so it can be omitted here. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext2: Remove redundant unlikely()	Tobias Klauser	2011-01-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	IS_ERR() already implies unlikely(), so it can be omitted here. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: speed up file creates by optimizing rec_len functions	Eric Sandeen	2011-01-10	2	-7/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The addition of 64k block capability in the rec_len_from_disk and rec_len_to_disk functions added a bit of math overhead which slows down file create workloads needlessly when the architecture cannot even support 64k blocks, thanks to page size limits. Similar changes already exist in the ext4 codebase. The directory entry checking can also be optimized a bit by sprinkling in some unlikely() conditions to move the error handling out of line. bonnie++ sequential file creates on a 512MB ramdisk speeds up from about 77,000/s to about 82,000/s, about a 6% improvement. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext2: speed up file creates by optimizing rec_len functions	Eric Sandeen	2011-01-10	1	-5/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The addition of 64k block capability in the rec_len_from_disk and rec_len_to_disk functions added a bit of math overhead which slows down file create workloads needlessly when the architecture cannot even support 64k blocks, thanks to page size limits. The directory entry checking can also be optimized a bit by sprinkling in some unlikely() conditions to move the error handling out of line. bonnie++ sequential file creates on a 512MB ramdisk speeds up from about 2200/s to about 2500/s, about a 14% improvement. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: Add more journal error check	Namhyung Kim	2011-01-10	2	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check return value of ext3_journal_get_write_acccess() and ext3_journal_dirty_metadata(). Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: Add journal error check in resize.c	Namhyung Kim	2011-01-10	1	-14/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check return value of ext3_journal_get_write_access() and ext3_journal_dirty_metadata(). Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	quota: Use %pV and __attribute__((format (printf in __quota_error and fix ↵	Joe Perches	2011-01-10	3	-13/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fallout Use %pV in __quota_error so a single printk can not be interleaved with other logging messages. Add __attribute__((format (printf, 3, 4))) so format and arguments can be verified by compiler. Make sure printk formats and arguments match. Block # needed a pointer dereference. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: Add FITRIM handling	Lukas Czerner	2011-01-10	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ioctl takes fstrim_range structure (defined in include/linux/fs.h) as an argument specifying a range of filesystem to trim and the minimum size of an continguous extent to trim. After the FITRIM is done, the number of bytes passed from the filesystem down the block stack to the device for potential discard is stored in fstrim_range.len. This number is a maximum discard amount from the storage device's perspective, because FITRIM called repeatedly will keep sending the same sectors for discard. fstrim_range.len will report the same potential discard bytes each time, but only sectors which had been written to between the discards would actually be discarded by the storage device. Further, the kernel block layer reserves the right to adjust the discard ranges to fit raid stripe geometry, non-trim capable devices in a LVM setup, etc. These reductions would not be reflected in fstrim_range.len. Thus fstrim_range.len can give the user better insight on how much storage space has potentially been released for wear-leveling, but it needs to be one of only one criteria the userspace tools take into account when trying to optimize calls to FITRIM. Thanks to Greg Freemyer <greg.freemyer@gmail.com> for better commit message. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: Add batched discard support for ext3	Lukas Czerner	2011-01-10	2	-0/+267
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Walk through allocation groups and trim all free extents. It can be invoked through FITRIM ioctl on the file system. The main idea is to provide a way to trim the whole file system if needed, since some SSD's may suffer from performance loss after the whole device was filled (it does not mean that fs is full!). It search for free extents in allocation groups specified by Byte range start -> start+len. When the free extent is within this range, blocks are marked as used and then trimmed. Afterwards these blocks are marked as free in per-group bitmap. [JK: Fixed up error handling and trimming of a single group] Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: Add journal error check into ext3_rename()	Namhyung Kim	2011-01-06	1	-4/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check return value of ext3_journal_get_write_access() and ext3_journal_dirty_metadata(). Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: Use search_dirblock() in ext3_dx_find_entry()	Theodore Ts'o	2011-01-06	1	-21/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the search_dirblock() in ext3_dx_find_entry(). It makes the code easier to read, and it takes advantage of common code. It also saves 100 bytes or so of text space. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: Avoid uninitialized memory references with a corrupted htree directory	Theodore Ts'o	2011-01-06	1	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the first htree directory is missing '.' or '..' but is otherwise a valid directory, and we do a lookup for '.' or '..', it's possible to dereference an uninitialized memory pointer in ext3_htree_next_block(). Avoid this. We avoid this by moving the special case from ext3_dx_find_entry() to ext3_find_entry(); this also means we can optimize ext3_find_entry() slightly when NFS looks up "..". Thanks to Brad Spengler for pointing a Clang warning that led me to look more closely at this code. The warning was harmless, but it was useful in pointing out code that was too ugly to live. This warning was also reported by Roman Borisov. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: Return error code from generic_check_addressable	Darrick J. Wong	2011-01-06	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ext3_fill_super should return the error code that generic_check_accessible returns when an error condition occurs. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: Add journal error check into ext3_delete_entry()	Namhyung Kim	2011-01-06	1	-2/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check return value of ext3_journal_get_write_access() and ext3_journal_dirty_metadata(). Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: Add error check in ext3_mkdir()	Namhyung Kim	2011-01-06	1	-14/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check return value of ext3_journal_get_write_access, ext3_journal_dirty_metadata and ext3_mark_inode_dirty. Consolidate error path under new label 'out_clear_inode' and adjust bh releasing appropriately. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	fs/ext3/super.c: Use printf extension %pV	Joe Perches	2011-01-06	1	-19/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	fs/ext2/super.c: Use printf extension %pV	Joe Perches	2011-01-06	1	-8/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| *	ext3: don't update sb journal_devnum when RO dev	Maciej Żenczykowski	2011-01-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An ext3 filesystem on a read-only device, with an external journal which is at a different device number then recorded in the superblock will fail to honor the read-only setting of the device and trigger a superblock update (write). For example: - ext3 on a software raid which is in read-only mode - external journal on a read-write device which has changed device num - attempt to mount with -o journal_dev=<new_number> - hits BUG_ON(mddev->ro = 1) in md.c Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Maciej Żenczykowski <zenczykowski@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2011-01-11	8	-884/+887
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: Don't set dentry->d_op in create routines fs/9p: fix spelling typo fs/9p: TREADLINK bugfix net/9p: Use proper data types fs/9p: Simplify the .L create operation fs/9p: Move dotl inode operations into a seperate file fs/9p: fix menu presentation fs/9p: Fix the return error on default acl removal fs/9p: Remove unnecessary semicolons
\| * \|	fs/9p: Don't set dentry->d_op in create routines	Aneesh Kumar K.V	2011-01-11	2	-19/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We do set dentry->d_op in lookup even in case of EOENT entries. That implies we should have dentry->d_op already set when create/mkdir/mknod/link/symlink routines are called Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
\| * \|	fs/9p: fix spelling typo	Eric Van Hensbergen	2011-01-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	introduced a typo somehow during a hand merge Reported by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
\| * \|	fs/9p: TREADLINK bugfix	M. Mohan Kumar	2011-01-11	1	-36/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove v9fs_vfs_readlink_dotl function and use generic_readlink. Update v9fs_vfs_follow_link_dotl function to accommodate this change Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
\| * \|	net/9p: Use proper data types	M. Mohan Kumar	2011-01-11	1	-12/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use proper data types for storing the count of the binary blob and length of a string. Without this patch length calculation of string will always result in -1 because of comparision between signed and unsigned integer. Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
\| * \|	fs/9p: Simplify the .L create operation	Aneesh Kumar K.V	2011-01-11	1	-47/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
\| * \|	fs/9p: Move dotl inode operations into a seperate file	Aneesh Kumar K.V	2011-01-11	4	-862/+916
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Source Code Reorganization Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
\| * \|	fs/9p: fix menu presentation	Randy Dunlap	2011-01-11	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the 9P_FS kconfig options subordinate to the 9P_FS kconfig symbol in the menu presentation instead of them all being at the same level. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
\| * \|	fs/9p: Fix the return error on default acl removal	Aneesh Kumar K.V	2011-01-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we don't have default ACL, then trying to remove default acl on a file should return 0. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
\| * \|	fs/9p: Remove unnecessary semicolons	Joe Perches	2011-01-11	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* \| \|	Merge branch 'for-linus-merged' of git://oss.sgi.com/xfs/xfs	Linus Torvalds	2011-01-11	38	-1920/+1997
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-linus-merged' of git://oss.sgi.com/xfs/xfs: (47 commits) xfs: convert grant head manipulations to lockless algorithm xfs: introduce new locks for the log grant ticket wait queues xfs: convert log grant heads to atomic variables xfs: convert l_tail_lsn to an atomic variable. xfs: convert l_last_sync_lsn to an atomic variable xfs: make AIL tail pushing independent of the grant lock xfs: use wait queues directly for the log wait queues xfs: combine grant heads into a single 64 bit integer xfs: rework log grant space calculations xfs: fact out common grant head/log tail verification code xfs: convert log grant ticket queues to list heads xfs: use AIL bulk delete function to implement single delete xfs: use AIL bulk update function to implement single updates xfs: remove all the inodes on a buffer from the AIL in bulk xfs: consume iodone callback items on buffers as they are processed xfs: reduce the number of AIL push wakeups xfs: bulk AIL insertion during transaction commit xfs: clean up xfs_ail_delete() xfs: Pull EFI/EFD handling out from under the AIL lock xfs: fix EFI transaction cancellation. ...
\| * \ \	Merge branch 'master' into for-linus-merged	Alex Elder	2011-01-10	38	-1920/+1997
\| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This merge pulls the XFS master branch into the latest Linus master. This results in a merge conflict whose best fix is not obvious. I manually fixed the conflict, in "fs/xfs/xfs_iget.c". Dave Chinner had done work that resulted in RCU freeing of inodes separate from what Nick Piggin had done, and their results differed slightly in xfs_inode_free(). The fix updates Nick's call_rcu() with the use of VFS_I(), while incorporating needed updates to some XFS inode fields implemented in Dave's series. Dave's RCU callback function has also been removed. Signed-off-by: Alex Elder <aelder@sgi.com>
\| \| * \| \|	xfs: convert grant head manipulations to lockless algorithm	Dave Chinner	2010-12-21	2	-77/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The only thing that the grant lock remains to protect is the grant head manipulations when adding or removing space from the log. These calculations are already based on atomic variables, so we can already update them safely without locks. However, the grant head manpulations require atomic multi-step calculations to be executed, which the algorithms currently don't allow. To make these multi-step calculations atomic, convert the algorithms to compare-and-exchange loops on the atomic variables. That is, we sample the old value, perform the calculation and use atomic64_cmpxchg() to attempt to update the head with the new value. If the head has not changed since we sampled it, it will succeed and we are done. Otherwise, we rerun the calculation again from a new sample of the head. This allows us to remove the grant lock from around all the grant head space manipulations, and that effectively removes the grant lock from the log completely. Hence we can remove the grant lock completely from the log at this point. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: introduce new locks for the log grant ticket wait queues	Dave Chinner	2010-12-21	3	-60/+97
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The log grant ticket wait queues are currently protected by the log grant lock. However, the queues are functionally independent from each other, and operations on them only require serialisation against other queue operations now that all of the other log variables they use are atomic values. Hence, we can make them independent of the grant lock by introducing new locks just to protect the lists operations. because the lists are independent, we can use a lock per list and ensure that reserve and write head queuing do not contend. To ensure forced shutdowns work correctly in conjunction with the new fast paths, ensure that we check whether the log has been shut down in the grant functions once we hold the relevant spin locks but before we go to sleep. This is needed to co-ordinate correctly with the wakeups that are issued on the ticket queues so we don't leave any processes sleeping on the queues during a shutdown. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: convert log grant heads to atomic variables	Dave Chinner	2010-12-04	2	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert the log grant heads to atomic64_t types in preparation for converting the accounting algorithms to atomic operations. his patch just converts the variables; the algorithmic changes are in a separate patch for clarity. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: convert l_tail_lsn to an atomic variable.	Dave Chinner	2010-12-21	4	-46/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	log->l_tail_lsn is currently protected by the log grant lock. The lock is only needed for serialising readers against writers, so we don't really need the lock if we make the l_tail_lsn variable an atomic. Converting the l_tail_lsn variable to an atomic64_t means we can start to peel back the grant lock from various operations. Also, provide functions to safely crack an atomic LSN variable into it's component pieces and to recombined the components into an atomic variable. Use them where appropriate. This also removes the need for explicitly holding a spinlock to read the l_tail_lsn on 32 bit platforms. Signed-off-by: Dave Chinner <dchinner@redhat.com>
\| \| * \| \|	xfs: convert l_last_sync_lsn to an atomic variable	Dave Chinner	2010-12-03	3	-34/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	log->l_last_sync_lsn is updated in only one critical spot - log buffer Io completion - and is protected by the grant lock here. This requires the grant lock to be taken for every log buffer IO completion. Converting the l_last_sync_lsn variable to an atomic64_t means that we do not need to take the grant lock in log buffer IO completion to update it. This also removes the need for explicitly holding a spinlock to read the l_last_sync_lsn on 32 bit platforms. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: make AIL tail pushing independent of the grant lock	Dave Chinner	2010-12-21	1	-57/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The xlog_grant_push_ail() currently takes the grant lock internally to sample the tail lsn, last sync lsn and the reserve grant head. Most of the callers already hold the grant lock but have to drop it before calling xlog_grant_push_ail(). This is a left over from when the AIL tail pushing was done in line and hence xlog_grant_push_ail had to drop the grant lock. AIL push is now done in another thread and hence we can safely hold the grant lock over the entire xlog_grant_push_ail call. Push the grant lock outside of xlog_grant_push_ail() to simplify the locking and synchronisation needed for tail pushing. This will reduce traffic on the grant lock by itself, but this is only one step in preparing for the complete removal of the grant lock. While there, clean up the formatting of xlog_grant_push_ail() to match the rest of the XFS code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: use wait queues directly for the log wait queues	Dave Chinner	2010-12-21	6	-106/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The log grant queues are one of the few places left using sv_t constructs for waiting. Given we are touching this code, we should convert them to plain wait queues. While there, convert all the other sv_t users in the log code as well. Seeing as this removes the last users of the sv_t type, remove the header file defining the wrapper and the fragments that still reference it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: combine grant heads into a single 64 bit integer	Dave Chinner	2010-12-21	4	-91/+119
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prepare for switching the grant heads to atomic variables by combining the two 32 bit values that make up the grant head into a single 64 bit variable. Provide wrapper functions to combine and split the grant heads appropriately for calculations and use them as necessary. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: rework log grant space calculations	Dave Chinner	2010-12-21	1	-47/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The log grant space calculations are repeated for both write and reserve grant heads. To make it simpler to convert the calculations toa different algorithm, factor them so both the gratn heads use the same calculation functions. Once this is done we can drop the wrappers that are used in only a couple of place to update both grant heads at once as they don't provide any particular value. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: fact out common grant head/log tail verification code	Dave Chinner	2010-12-21	1	-29/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Factor repeated debug code out of grant head manipulation functions into a separate function. This removes ifdef DEBUG spagetti from the code and makes the code easier to follow. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: convert log grant ticket queues to list heads	Dave Chinner	2010-12-21	3	-97/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The grant write and reserve queues use a roll-your-own double linked list, so convert it to a standard list_head structure and convert all the list traversals to use list_for_each_entry(). We can also get rid of the XLOG_TIC_IN_Q flag as we can use the list_empty() check to tell if the ticket is in a list or not. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: use AIL bulk delete function to implement single delete	Dave Chinner	2010-12-20	2	-72/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We now have two copies of AIL delete operations that are mostly duplicate functionality. The single log item deletes can be implemented via the bulk updates by turning xfs_trans_ail_delete() into a simple wrapper. This removes all the duplicate delete functionality and associated helpers. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: use AIL bulk update function to implement single updates	Dave Chinner	2010-12-20	3	-96/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We now have two copies of AIL insert operations that are mostly duplicate functionality. The single log item updates can be implemented via the bulk updates by turning xfs_trans_ail_update() into a simple wrapper. This removes all the duplicate insert functionality and associated helpers. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: remove all the inodes on a buffer from the AIL in bulk	Dave Chinner	2010-12-20	3	-16/+151
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When inode buffer IO completes, usually all of the inodes are removed from the AIL. This involves processing them one at a time and taking the AIL lock once for every inode. When all CPUs are processing inode IO completions, this causes excessive amount sof contention on the AIL lock. Instead, change the way we process inode IO completion in the buffer IO done callback. Allow the inode IO done callback to walk the list of IO done callbacks and pull all the inodes off the buffer in one go and then process them as a batch. Once all the inodes for removal are collected, take the AIL lock once and do a bulk removal operation to minimise traffic on the AIL lock. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: consume iodone callback items on buffers as they are processed	Dave Chinner	2010-12-03	1	-11/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To allow buffer iodone callbacks to consume multiple items off the callback list, first we need to convert the xfs_buf_do_callbacks() to consume items and always pull the next item from the head of the list. The means the item list walk is never dependent on knowing the next item on the list and hence allows callbacks to remove items from the list as well. This allows callbacks to do bulk operations by scanning the list for identical callbacks, consuming them all and then processing them in bulk, negating the need for multiple callbacks of that type. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: reduce the number of AIL push wakeups	Dave Chinner	2010-12-17	1	-4/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The xfaild often tries to rest to wait for congestion to pass of for IO to complete, but is regularly woken in tail-pushing situations. In severe cases, the xfsaild is getting woken tens of thousands of times a second. Reduce the number needless wakeups by only waking the xfsaild if the new target is larger than the old one. Further make short sleeps uninterruptible as they occur when the xfsaild has decided it needs to back off to allow some IO to complete and being woken early is counter-productive. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: bulk AIL insertion during transaction commit	Dave Chinner	2010-12-20	4	-12/+195
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When inserting items into the AIL from the transaction committed callbacks, we take the AIL lock for every single item that is to be inserted. For a CIL checkpoint commit, this can be tens of thousands of individual inserts, yet almost all of the items will be inserted at the same point in the AIL because they have the same index. To reduce the overhead and contention on the AIL lock for such operations, introduce a "bulk insert" operation which allows a list of log items with the same LSN to be inserted in a single operation via a list splice. To do this, we need to pre-sort the log items being committed into a temporary list for insertion. The complexity is that not every log item will end up with the same LSN, and not every item is actually inserted into the AIL. Items that don't match the commit LSN will be inserted and unpinned as per the current one-at-a-time method (relatively rare), while items that are not to be inserted will be unpinned and freed immediately. Items that are to be inserted at the given commit lsn are placed in a temporary array and inserted into the AIL in bulk each time the array fills up. As a result of this, we trade off AIL hold time for a significant reduction in traffic. lock_stat output shows that the worst case hold time is unchanged, but contention from AIL inserts drops by an order of magnitude and the number of lock traversal decreases significantly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| \| * \| \|	xfs: clean up xfs_ail_delete()	Dave Chinner	2010-12-03	1	-20/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_ail_delete() has a needlessly complex interface. It returns the log item that was passed in for deletion (which the callers then assert is identical to the one passed in), and callers of xfs_ail_delete() still need to invalidate current traversal cursors. Make xfs_ail_delete() return void, move the cursor invalidation inside it, and clean up the callers just to use the log item pointer they passed in. While cleaning up, remove the messy and unnecessary "/* ARGUSED */" comments around all these functions. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>