op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	ext4: add missing ext4_journal_stop()	Akinobu Mita	2008-02-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Add missing ext4_journal_stop() in error handling. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Stephen Tweedie <sct@redhat.com> Cc: adilger@clusterfs.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mingming Cao <cmm@us.ibm.com>
*	ext4: ext4_find_next_zero_bit needs an aligned address on some arch	Aneesh Kumar K.V	2008-02-23	1	-22/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ext4_find_next_zero_bit and ext4_find_next_bit needs a long aligned address on x8_64. Add mb_find_next_zero_bit and mb_find_next_bit and use them in the mballoc. Fix: https://bugzilla.redhat.com/show_bug.cgi?id=433286 Eric Sandeen debugged the problem and suggested the fix. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: set EXT4_EXTENTS_FL only for directory and regular files	Aneesh Kumar K.V	2008-02-25	2	-8/+15
\| \| \| \| \| \| \| \| \| \| \| \| \|	In addition, don't inherit EXT4_EXTENTS_FL from parent directory. If we have a directory with extent flag set and later mount the file system with -o noextents, the files created in that directory will also have extent flag set but we would not have called ext4_ext_tree_init for them. This will cause error later when we are verifying the extent header Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Don't mark filesystem error if fallocate fails	Aneesh Kumar K.V	2008-02-25	1	-5/+6
\| \| \| \| \| \| \| \|	If we fail to allocate blocks don't call ext4_error. Also don't hide errors from ext4_get_blocks_wrap Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Fix BUG when writing to an unitialized extent	Mingming Cao	2008-02-25	2	-3/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a bug when writing to preallocated but uninitialized blocks, which resulted in a BUG in fs/buffer.c saying that the buffer is not mapped. When writing to a file, ext4_get_block_wrap() is called with create=1 in order to request that blocks be allocated if necessary. It currently calls ext4_get_blocks() with create=0 in order to do a lookup first. If the inode contains an unitialized data block, the buffer head is left unampped, which ext4_get_blocks_wrap() returns, causing the BUG. We fix this by checking to see if the buffer head is unmapped, and if so, we make sure the the buffer head is mapped by calling ext4_ext_get_blocks with create=1. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Don't use ext4_dec_count() if not needed	Theodore Ts'o	2008-02-15	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	The ext4_dec_count() function is only needed when dropping the i_nlink count on inodes which are (or which could be) directories. If we know that the inode in question can't possibly be a directory, use drop_nlink or clear_nlink() if we know i_nlink is 1. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: modify block allocation algorithm for the last group	Valerie Clement	2008-02-15	2	-2/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a directory inode is allocated in the last group and the last group contains less than s_blocks_per_group blocks, the initial block allocated for the directory is not always allocated in the same group as the directory inode, but in one of the first groups of the filesystem (group 1 for example). Depending on the current process's pid, ext4_find_near() and ext4_ext_find_goal() can return a block number greater than the maximum blocks count in the filesystem and in that case the block will be not allocated in the same group as the inode. The following patch fixes the problem. Should the modification also be done in ext2/3 code? Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Don't claim block from group which has corrupt bitmap	Aneesh Kumar K.V	2008-02-15	1	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In ext4_mb_complex_scan_group, if the extent length of the newly found extentet is greater than than the total free blocks counted in group info, break without claiming the block. Document different ext4_error usage, explaining the state with which we continue if we mount with errors=continue Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Get journal write access before modifying the extent tree	Aneesh Kumar K.V	2008-02-22	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \|	When the user was writing into an unitialized extent, ext4_ext_convert_to_initialize() was not requesting journal write access before it started to modify the extent tree. Fix this oversight. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Fix memory and buffer head leak in callers to ext4_ext_find_extent()	Aneesh Kumar K.V	2008-02-25	2	-3/+8
\| \| \| \| \| \| \| \| \| \| \|	The path variable returned via ext4_ext_find_extent is a kmalloc variable and needs to be freeded. It also contains a reference to buffer_head which needs to be dropped. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Don't leave behind a half-created inode if ext4_mkdir() fails	Aneesh Kumar K.V	2008-02-22	1	-7/+4
\| \| \| \| \| \| \| \| \| \|	If ext4_mkdir() fails to allocate the initial block for the directory, don't leave behind a half-created directory inode with the link count left at one. This was caused by an inappropriate call to ext4_dec_count(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Fix kernel BUG at fs/ext4/mballoc.c:910!	Valerie Clement	2008-02-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the flex_bg feature enabled, a large file creation oopses the kernel. The BUG_ON is: BUG_ON(len >= EXT4_BLOCKS_PER_GROUP(sb)); As the allocation of the bitmaps and the inode table can be done outside the block group with flex_bg, this allows to allocate up to EXT4_BLOCKS_PER_GROUP blocks in a group. This patch fixes the oops. Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Fix locking hierarchy violation in ext4_fallocate()	Aneesh Kumar K.V	2008-02-15	1	-7/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	ext4_fallocate() was trying to acquire i_data_sem outside of jbd2_start_transaction/jbd2_journal_stop, which violates ext4's locking hierarchy. So we take i_mutex to prevent writes and truncates during the complete fallocate operation, and use ext4_get_block_wrap() which acquires and releases i_data_sem for each block allocation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	Remove incorrect BKL comments in ext4	Andi Kleen	2008-02-25	2	-2/+1
\| \| \| \| \| \|	Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	Introduce path_put()	Jan Blunck	2008-02-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add path_put() functions for releasing a reference to the dentry and vfsmount of a struct path in the right order * Switch from path_release(nd) to path_put(&nd->path) * Rename dput_path() to path_put_conditional() [akpm@linux-foundation.org: fix cifs] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: <linux-fsdevel@vger.kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Embed a struct path into struct nameidata instead of nd->{dentry,mnt}	Jan Blunck	2008-02-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the central patch of a cleanup series. In most cases there is no good reason why someone would want to use a dentry for itself. This series reflects that fact and embeds a struct path into nameidata. Together with the other patches of this series - it enforced the correct order of getting/releasing the reference count on <dentry,vfsmount> pairs - it prepares the VFS for stacking support since it is essential to have a struct path in every place where the stack can be traversed - it reduces the overall code size: without patch series: text data bss dec hex filename 5321639 858418 715768 6895825 6938d1 vmlinux with patch series: text data bss dec hex filename 5320026 858418 715768 6894212 693284 vmlinux This patch: Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix cifs] [akpm@linux-foundation.org: fix smack] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext4: Add new "development flag" to the ext4 filesystem	Theodore Tso	2008-02-10	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This flag is simply a generic "this is a crash/burn test filesystem" marker. If it is set, then filesystem code which is "in development" will be allowed to mount the filesystem. Filesystem code which is not considered ready for prime-time will check for this flag, and if it is not set, it will refuse to touch the filesystem. As we start rolling ext4 out to distro's like Fedora, et. al, this makes it less likely that a user might accidentally start using ext4 on a production filesystem; a bad thing, since that will essentially make it be unfsckable until e2fsprogs catches up. Signed-off-by: Theodore Tso <tytso@MIT.EDU> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	ext4: Don't panic in case of corrupt bitmap	Aneesh Kumar K.V	2008-02-10	1	-14/+21
\| \| \| \| \| \| \| \| \| \| \| \|	Multiblock allocator calls BUG_ON in many case if the free and used blocks count obtained looking at the bitmap is different from what the allocator internally accounted for. Use ext4_error in such case and don't panic the system. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: allocate struct ext4_allocation_context from a kmem cache	Eric Sandeen	2008-02-10	1	-45/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	struct ext4_allocation_context is rather large, and this bloats the stack of many functions which use it. Allocating it from a named slab cache will alleviate this. For example, with this change (on top of the noinline patch sent earlier): -ext4_mb_new_blocks 200 +ext4_mb_new_blocks 40 -ext4_mb_free_blocks 344 +ext4_mb_free_blocks 168 -ext4_mb_release_inode_pa 216 +ext4_mb_release_inode_pa 40 -ext4_mb_release_group_pa 192 +ext4_mb_release_group_pa 24 Most of these stack-allocated structs are actually used only for mballoc history; and in those cases often a smaller struct would do. So changing that may be another way around it, at least for those functions, if preferred. For now, in those cases where the ac is only for history, an allocation failure simply skips the history recording, and does not cause any other failures. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Fix Direct I/O locking	Jan Kara	2008-02-10	1	-54/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We cannot start transaction in ext4_direct_IO() and just let it last during the whole write because dio_get_page() acquires mmap_sem which ranks above transaction start (e.g. because we have dependency chain mmap_sem->PageLock->journal_start, or because we update atime while holding mmap_sem) and thus deadlocks could happen. We solve the problem by starting a transaction separately for each ext4_get_block() call. We could have a problem that we allocate a block and before its data are written out the machine crashes and thus we expose stale data. But that does not happen because for hole-filling generic code falls back to buffered writes and for file extension, we add inode to orphan list and thus in case of crash, journal replay will truncate inode back to the original size. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Fix circular locking dependency with migrate and rm.	Aneesh Kumar K.V	2008-02-10	1	-43/+74
\| \| \| \| \| \| \| \| \| \| \| \| \|	In order to prevent a circular locking dependency when an unlink operation is racing with an ext4 migration, we delay taking i_data_sem until just before switch the inode format, and use i_mutex to prevent writes and truncates during the first part of the migration operation. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	allow in-inode EAs on ext4 root inode	Eric Sandeen	2008-02-05	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ext3 root inode was treated specially with respect to in-inode extended attributes, for reasons detailed in the removed comment below. The first mkfs-created inodes would not get extra_i_size or the EXT3_STATE_XATTR flag set in ext3_read_inode, which disallowed reading or setting in-inode EAs on the root. However, in ext4, ext4_mark_inode_dirty calls ext4_expand_extra_isize for all inodes; once this is done EAs may be placed in the root ext4 inode body. But for reasons above, it won't be found after a reboot. testcase: setfattr -n user.name -v value mntpt/ setfattr -n user.name2 -v value2 mntpt/ umount mntpt/; remount mntpt/ getfattr -d mntpt/ name2/value2 has gone missing; debugfs shows it in the inode body, but it is not found there by getattr. The following fixes it up; newer mkfs appears to properly zero the inodes, so this workaround isn't needed for ext4. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
*	ext4: Fix null bh pointer dereference in mballoc	Aneesh Kumar K.V	2008-02-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Repoted by Adrian Bunk <bunk@kernel.org>: The Coverity checker spotted the following NULL dereference: static int ext4_mb_mark_diskspace_used { ... if (!bitmap_bh) goto out_err; ... out_err: sb->s_dirt = 1; put_bh(bitmap_bh); ... Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	ext4: Don't set EXTENTS_FL flag for fast symlinks	Valerie Clement	2008-02-05	2	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	For fast symbolic links, the file content is stored in the i_block[] array, which is not compatible with the new file extents format. e2fsck reports error on such files because EXTENTS_FL is set. Don't set the EXTENTS_FL flag when creating fast symlinks. In the case of file migration, skip fast symbolic links. Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	iget: stop EXT4 from using iget() and read_inode()	David Howells	2008-02-07	5	-69/+86
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Stop the EXT4 filesystem from using iget() and read_inode(). Replace ext4_read_inode() with ext4_iget(), and call that instead of iget(). ext4_iget() then uses iget_locked() directly and returns a proper error code instead of an inode in the event of an error. ext4_fill_super() returns any error incurred when getting the root inode instead of EINVAL. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext[234]: cleanup ext[234]_bg_num_gdb()	Akinobu Mita	2008-02-06	1	-5/+1
\| \| \| \| \| \| \| \| \|	Use ext[234]_bg_has_super() to remove duplicate code. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext[234]: remove unused argument for ext[234]_find_goal()	Akinobu Mita	2008-02-06	1	-6/+3
\| \| \| \| \| \| \| \| \| \|	The argument chain for ext[234]_find_goal() is not used. This patch removes it and fixes comment as well. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext[234]: use ext[234]_get_group_desc()	Akinobu Mita	2008-02-06	1	-10/+4
\| \| \| \| \| \| \| \| \|	Use ext[234]_get_group_desc() to get group descriptor from group number. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext[234]: fix comment for nonexistent variable	Akinobu Mita	2008-02-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	The comment in ext[234]_new_blocks() describes about "i". But there is no local variable called "i" in that scope. I guess it has been renamed to group_no. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user	Christoph Lameter	2008-02-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. Its a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 makes a lot of code not having to be dealing with the special casing for HIGHMEM anymore. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 is depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext4: Use the ext4_ext_actual_len() helper function	Aneesh Kumar K.V	2008-01-28	1	-11/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ext4 uses the high bit of the extent length to encode whether the extent is intialized or not. The helper function ext4_ext_get_actual_len should be used to get the actual length of the extent. This addresses the kernel bug documented here: http://bugzilla.kernel.org/show_bug.cgi?id=9732 kernel BUG at fs/ext4/extents.c:1056! .... Call Trace: [<ffffffff88366073>] :ext4dev:ext4_ext_get_blocks+0x5ba/0x8c1 [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff812748f6>] _spin_unlock+0x17/0x20 [<ffffffff883400a6>] :jbd2:start_this_handle+0x4e0/0x4fe [<ffffffff88366564>] :ext4dev:ext4_fallocate+0x175/0x39a [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff81056480>] __lock_acquire+0x4e7/0xc4d [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff810a8de7>] sys_fallocate+0xe4/0x10d [<ffffffff8100c043>] tracesys+0xd5/0xda Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: fix uniniatilized extent splitting error	Dmitry Monakhov	2008-01-28	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix bug reported by Dmitry Monakhov caused by lost error code Testcase: blksize = 0x1000; fd = open(argv[1], O_RDWR\|O_CREAT, 0700); unsigned long long sz = 0x10000000UL; /* allocating big blocks chunk / syscall(__NR_fallocate, fd, 0, 0UL, sz) / grab all other available filesystem space / tfd = open("tmp", O_RDWR\|O_CREAT\|O_DIRECT, 0700); while( write(tfd, buf, 4096) > 0); / loop untill ENOSPC / fsync(fd); / just in case / while (pos < sz) { / each seek+ write operation result in splits uninitialized extent in three extents. Splitting may result in new extent allocation which probably will fail because of ENOSPC/ lseek(fd, blksize2 -1, SEEK_CUR); if ((ret = write(fd, 'a', 1)) != 1) exit(1); pos += blksize * 2; } Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Check for return value from sb_set_blocksize	Aneesh Kumar K.V	2008-01-28	1	-10/+5
\| \| \| \| \| \| \| \| \|	sb_set_blocksize validates whether the specfied block size can be used by the file system. Make sure we fail mounting the file system if the blocksize specfied cannot be used. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	ext4: Add stripe= option to /proc/mounts	Miklos Szeredi	2008-01-28	1	-0/+2
\| \| \| \| \| \| \|	Add stripe= option to /proc/mounts for ext4 filesystems. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	ext4: Enable the multiblock allocator by default	Aneesh Kumar K.V	2008-01-28	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	Enable the multiblock allocator by default. Fix ext4_show_options() so if it is not enabled, the nomballoc option included in /proc/mounts. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Add multi block allocator for ext4	Alex Tomas	2008-01-29	8	-36/+4721
\| \| \| \| \| \| \| \|	Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Add new functions for searching extent tree	Alex Tomas	2008-01-28	1	-0/+142
\| \| \| \| \| \| \| \| \| \| \| \|	Add the functions ext4_ext_search_left() and ext4_ext_search_right(), which are used by mballoc during ext4_ext_get_blocks to decided whether to merge extent information. Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Johann Lombardi <johann@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: fix up EXT4FS_DEBUG builds	Eric Sandeen	2008-01-28	3	-12/+12
\| \| \| \| \| \| \| \|	Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail without these changes. Clean up some format warnings too. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	ext4: Fix ext4_show_options to show the correct mount options.	Aneesh Kumar K.V	2008-01-28	1	-11/+15
\| \| \| \| \| \| \| \|	We need to look at the default value and make sure the mount options are not set via default value before showing them via ext4_show_options Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: Add EXT4_IOC_MIGRATE ioctl	Aneesh Kumar K.V	2008-01-28	4	-3/+566
\| \| \| \| \| \| \|	The below patch add ioctl for migrating ext3 indirect block mapped inode to ext4 extent mapped inode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: Add inode version support in ext4	Jean Noel Cordenner	2008-01-28	2	-3/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds 64-bit inode version support to ext4. The lower 32 bits are stored in the osd1.linux1.l_i_version field while the high 32 bits are stored in the i_version_hi field newly created in the ext4_inode. This field is incremented in case the ext4_inode is large enough. A i_version mount option has been added to enable the feature. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>
*	ext4: Add the journal checksum feature	Girish Shilamkar	2008-01-28	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The journal checksum feature adds two new flags i.e JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the checksum for the blocks described by the descriptor blocks. Due to checksums, writing of the commit record no longer needs to be synchronous. Now commit record can be sent to disk without waiting for descriptor blocks to be written to disk. This behavior is controlled using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be able to recover the journal with _ASYNC_COMMIT hence it is made incompat. The commit header has been extended to hold the checksum along with the type of the checksum. For recovery in pass scan checksums are verified to ensure the sanity and completeness(in case of _ASYNC_COMMIT) of every transaction. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Girish Shilamkar <girish@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	ext4: Take read lock during overwrite case.	Aneesh Kumar K.V	2008-01-28	1	-8/+24
\| \| \| \| \| \| \|	When we are overwriting a file and not actually allocating new file system blocks we need to take only the read lock on i_data_sem. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: Convert truncate_mutex to read write semaphore.	Aneesh Kumar K.V	2008-01-28	6	-19/+46
\| \| \| \| \| \| \| \|	We are currently taking the truncate_mutex for every read. This would have performance impact on large CPU configuration. Convert the lock to read write semaphore and take read lock when we are trying to read the file. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: Make ext4_get_blocks_wrap take the truncate_mutex early.	Aneesh Kumar K.V	2008-01-28	2	-64/+14
\| \| \| \| \| \| \| \| \| \| \|	When doing a migrate from ext3 to ext4 inode we need to make sure the test for inode type and walking inode data happens inside lock. To make this happen move truncate_mutex early before checking the i_flags. This actually should enable us to remove the verify_chain(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: remove unused code from ext4_find_entry()	Mariusz Kozlowski	2008-01-28	1	-4/+0
\| \| \| \| \| \| \| \|	The unused code found in ext3_find_entry() is also present (and still unused) in the ext4_find_entry() code. This patch removes it. Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Check for the correct error return from	Aneesh Kumar K.V	2008-01-28	1	-5/+5
\| \| \| \| \| \| \|	ext4_ext_get_blocks returns negative values on error. We should check for <= 0 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: add block bitmap validation	Aneesh Kumar K.V	2008-01-28	1	-18/+81
\| \| \| \| \| \| \| \| \|	When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given corresponding to block bitmap, inode bitmap and inode tables. Validate the block bitmap against these blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: Change the default behaviour on error	Aneesh Kumar K.V	2008-01-28	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \|	ext4 file system was by default ignoring errors and continuing. This is not a good default as continuing on error could lead to file system corruption. Change the default to mark the file system readonly. Debian and ubuntu already does this as the default in their fstab. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	ext4: fix oops on corrupted ext4 mount	Eric Sandeen	2008-01-28	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When mounting an ext4 filesystem with corrupted s_first_data_block, things can go very wrong and oops. Because blocks_count in ext4_fill_super is a u64, and we must use do_div, the calculation of db_count is done differently than on ext4. If first_data_block is corrupted such that it is larger than ext4_blocks_count, for example, then the intermediate blocks_count value may go negative, but sign-extend to a very large value: blocks_count = (ext4_blocks_count(es) - le32_to_cpu(es->s_first_data_block) + EXT4_BLOCKS_PER_GROUP(sb) - 1); This is then assigned to s_groups_count which is an unsigned long: sbi->s_groups_count = blocks_count; This may result in a value of 0xFFFFFFFF which is then used to compute db_count: db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) / EXT4_DESC_PER_BLOCK(sb); and in this case db_count will wind up as 0 because the addition overflows 32 bits. This in turn causes the kmalloc for group_desc to be of 0 size: sbi->s_group_desc = kmalloc(db_count * sizeof (struct buffer_head *), GFP_KERNEL); and eventually in ext4_check_descriptors, dereferencing sbi->s_group_desc[desc_block] will result in a NULL pointer dereference. The simplest test seems to be to sanity check s_first_data_block, EXT4_BLOCKS_PER_GROUP, and ext4_blocks_count values to be sure their combination won't result in a bad intermediate value for blocks_count. We could just check for db_count == 0, but catching it at the root cause seems like it provides more info. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mingming Cao <cmm@us.ibm.com>