op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	CIFS: ignore mode change if it's just for clearing setuid/setgid bits	Jeff Layton	2007-10-18	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \|	If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing the setuid/setgid bits. For CIFS, skip the mode change and let the server handle it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	NFS: if ATTR_KILL_S*ID bits are set, then skip mode change	Jeff Layton	2007-10-18	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing the setuid/setgid bits. For NFS, skip the mode change and let the server handle it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations	Jeff Layton	2007-10-18	1	-10/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an unprivileged process attempts to modify a file that has the setuid or setgid bits set, the VFS will attempt to clear these bits. The VFS will set the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid mask, and then call notify_change to clear these bits and set the mode accordingly. With a networked filesystem (NFS and CIFS in particular but likely others), the client machine or process may not have credentials that allow for setting the mode. In some situations, this can lead to file corruption, an operation failing outright because the setattr fails, or to races that lead to a mode change being reverted. In this situation, we'd like to just leave the handling of this to the server and ignore these bits. The problem is that by the time the setattr op is called, the VFS has already reinterpreted the ATTR_KILL_* bits into a mode change. The setattr operation has no way to know its intent. The following patch fixes this by making notify_change no longer clear the ATTR_KILL_SUID and ATTR_KILL_SGID bits in the ia_valid before handing it off to the setattr inode op. setattr can then check for the presence of these bits, and if they're set it can assume that the mode change was only for the purposes of clearing these bits. This means that we now have an implicit assumption that notify_change is never called with ATTR_MODE and either ATTR_KILL_S*ID bit set. Nothing currently enforces that, so this patch also adds a BUG() if that occurs. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	reiserfs: turn of ATTR_KILL_S*ID at beginning of reiserfs_setattr	Jeff Layton	2007-10-18	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	reiserfs_setattr can call notify_change recursively using the same iattr struct. This could cause it to trip the BUG() in notify_change. Fix reiserfs to clear those bits near the beginning of the function. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	knfsd: only set ATTR_KILL_S*ID if ATTR_MODE isn't being explicitly set	Jeff Layton	2007-10-18	1	-6/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's theoretically possible for a single SETATTR call to come in that sets the mode and the uid/gid. In that case, don't set the ATTR_KILL_S*ID bits since that would trip the BUG() in notify_change. Just fix up the mode to have the same effect. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ecryptfs: allow lower fs to interpret ATTR_KILL_S*ID	Jeff Layton	2007-10-18	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make sure ecryptfs doesn't trip the BUG() in notify_change. This also allows the lower filesystem to interpret ATTR_KILL_S*ID in its own way. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Remove struct task_struct::io_wait	Alexey Dobriyan	2007-10-18	1	-14/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Hell knows what happened in commit 63b05203af57e7de4f3bb63b8b81d43bc196d32b during 2.6.9 development. Commit introduced io_wait field which remained write-only than and still remains write-only. Also garbage collect macros which "use" io_wait. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext4: lighten up resize transaction requirements	Eric Sandeen	2007-10-17	1	-3/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When resizing online, setup_new_group_blocks attempts to reserve a potentially very large transaction, depending on the current filesystem geometry. For some journal sizes, there may not be enough room for this transaction, and the online resize will fail. The patch below resizes & restarts the transaction as necessary while setting up the new group, and should work with even the smallest journal. Tested with something like: [root@newbox ~]# dd if=/dev/zero of=fsfile bs=1024 count=32768 [root@newbox ~]# mkfs.ext3 -b 1024 fsfile 16384 [root@newbox ~]# mount -o loop fsfile mnt/ [root@newbox ~]# resize2fs /dev/loop0 resize2fs 1.40.2 (12-Jul-2007) Filesystem at /dev/loop0 is mounted on /root/mnt; on-line resizing required old desc_blocks = 1, new_desc_blocks = 1 Performing an on-line resize of /dev/loop0 to 32768 (1k) blocks. resize2fs: No space left on device While trying to add group #2 [root@newbox ~]# dmesg \| tail -n 1 JBD: resize2fs wants too many credits (258 > 256) [root@newbox ~]# With the below change, it works. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Acked-by: Andreas Dilger <adilger@clusterfs.com>
*	ext4: fix setup_new_group_blocks locking	Eric Sandeen	2007-10-17	1	-3/+3
\| \| \| \| \| \| \| \| \|	setup_new_group_blocks() manipulates the group descriptor block bh under the block_bitmap bh's lock. It shouldn't matter since nobody but resize should be touching these blocks, but it's worth fixing up. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	ext4: sparse fixes	Aneesh Kumar K.V	2007-10-17	3	-4/+4
\| \| \| \|	Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: Convert ext4_extent_idx.ei_leaf to ext4_extent_idx.ei_leaf_lo	Aneesh Kumar K.V	2007-10-17	1	-2/+2
\| \| \| \| \| \| \| \|	Convert ext4_extent_idx.ei_leaf ext4_extent_idx.ei_leaf_lo This helps in finding BUGs due to direct partial access of these split 48 bit values. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: Convert ext4_extent.ee_start to ext4_extent.ee_start_lo	Aneesh Kumar K.V	2007-10-17	1	-5/+3
\| \| \| \| \| \| \| \| \| \|	Convert ext4_extent.ee_start to ext4_extent.ee_start_lo This helps in finding BUGs due to direct partial access of these split 48 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: Convert s_r_blocks_count and s_free_blocks_count	Aneesh Kumar K.V	2007-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Convert s_r_blocks_count and s_free_blocks_count to s_r_blocks_count_lo and s_free_blocks_count_lo This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: Convert s_blocks_count to s_blocks_count_lo	Aneesh Kumar K.V	2007-10-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Convert s_blocks_count to s_blocks_count_lo This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: Convert bg_inode_bitmap and bg_inode_table	Aneesh Kumar K.V	2007-10-17	2	-7/+7
\| \| \| \| \| \| \| \| \| \|	Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo and bg_inode_table_lo. This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix one direct partial access Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: Convert bg_block_bitmap to bg_block_bitmap_lo	Aneesh Kumar K.V	2007-10-17	1	-3/+3
\| \| \| \| \| \| \| \|	Convert bg_block_bitmap to bg_block_bitmap_lo This helps in catching some BUGS due to direct partial access of these split fields. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: FLEX_BG Kernel support v2.	Jose R. Santos	2007-10-17	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	This feature relaxes check restrictions on where each block groups meta data is located within the storage media. This allows for the allocation of bitmaps or inode tables outside the block group boundaries in cases where bad blocks forces us to look for new blocks which the owning block group can not satisfy. This will also allow for new meta-data allocation schemes to improve performance and scalability. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
*	ext4: Fix sparse warnings	Aneesh Kumar K.V	2007-10-17	1	-2/+4
\| \| \| \| \|	Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
*	Ext4: Uninitialized Block Groups	Andreas Dilger	2007-10-17	6	-31/+323
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In pass1 of e2fsck, every inode table in the fileystem is scanned and checked, regardless of whether it is in use. This is this the most time consuming part of the filesystem check. The unintialized block group feature can greatly reduce e2fsck time by eliminating checking of uninitialized inodes. With this feature, there is a a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation. The feature is enabled through a mkfs option mke2fs /dev/ -O uninit_groups A patch adding support for uninitialized block groups to e2fsprogs tools has been posted to the linux-ext4 mailing list. The patches have been stress tested with fsstress and fsx. In performance tests testing e2fsck time, we have seen that e2fsck time on ext3 grows linearly with the total number of inodes in the filesytem. In ext4 with the uninitialized block groups feature, the e2fsck time is constant, based solely on the number of used inodes rather than the total inode count. Since typical ext4 filesystems only use 1-10% of their inodes, this feature can greatly reduce e2fsck time for users. With performance improvement of 2-20 times, depending on how full the filesystem is. The attached graph shows the major improvements in e2fsck times in filesystems with a large total inode count, but few inodes in use. In each group descriptor if we have EXT4_BG_INODE_UNINIT set in bg_flags: Inode table is not initialized/used in this group. So we can skip the consistency check during fsck. EXT4_BG_BLOCK_UNINIT set in bg_flags: No block in the group is used. So we can skip the block bitmap verification for this group. We also add two new fields to group descriptor as a part of uninitialized group patch. __le16 bg_itable_unused; /* Unused inodes count / __le16 bg_checksum; / crc16(sb_uuid+group+desc) */ bg_itable_unused: If we have EXT4_BG_INODE_UNINIT not set in bg_flags then bg_itable_unused will give the offset within the inode table till the inodes are used. This can be used by fsck to skip list of inodes that are marked unused. bg_checksum: Now that we depend on bg_flags and bg_itable_unused to determine the block and inode usage, we need to make sure group descriptor is not corrupt. We add checksum to group descriptor to detect corruption. If the descriptor is found to be corrupt, we mark all the blocks and inodes in the group used. Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	ext4: remove #ifdef CONFIG_EXT4_INDEX	Eric Sandeen	2007-10-17	2	-27/+0
\| \| \| \| \| \| \| \| \| \| \| \|	CONFIG_EXT4_INDEX is not an exposed config option in the kernel, and it is unconditionally defined in ext4_fs.h. tune2fs is already able to turn off dir indexing, so at this point it's just cluttering up the code. Remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
*	ext4: Remove (partial, never completed) fragment support	Coly Li	2007-10-17	3	-30/+0
\| \| \| \| \| \| \| \| \| \|	Fragment support in ext2/3/4 was never implemented, and it probably will never be implemented. So remove it from ext4. Signed-off-by: Coly Li <coyli@suse.de> Acked-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	JBD2: debug code cleanup.	Jose R. Santos	2007-10-17	1	-14/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mostly stolen from akpm's JBD cleanup patch. - use `#ifdef foo' instead of `#if defined(foo)' - Make journal_enable_debug __read_mostly just for the heck of it - Make jbd_debugfs_dir and jbd_debug static - debugfs_remove(NULL) is legal: remove unneeded tests - remove unnecessary empty loops Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
*	jbd2: fix commit code to properly abort journal	Jan Kara	2007-10-17	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \|	We should really call journal_abort() and not __journal_abort_hard() in case of errors. The latter call does not record the error in the journal superblock and thus filesystem won't be marked as with errors later (and user could happily mount it without any warning). Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
*	jbd2: JBD_XXX to JBD2_XXX naming cleanup	Mingming Cao	2007-10-17	6	-10/+10
\| \| \| \| \| \| \|	change JBD_XXX macros to JBD2_XXX in JBD2/Ext4 Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	JBD2/Ext4: Convert kmalloc to kzalloc in jbd2/ext4	Mingming Cao	2007-10-17	3	-6/+3
\| \| \| \| \| \|	Convert kmalloc to kzalloc() and get rid of the memset(). Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	JBD2: replace jbd_kmalloc with kmalloc directly.	Mingming Cao	2007-10-17	2	-12/+3
\| \| \| \| \| \|	This patch cleans up jbd_kmalloc and replace it with kmalloc directly Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	JBD: replace jbd_kmalloc with kmalloc directly	Mingming Cao	2007-10-17	2	-12/+3
\| \| \| \| \| \|	This patch cleans up jbd_kmalloc and replace it with kmalloc directly Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	JBD2: jbd2 slab allocation cleanups	Mingming Cao	2007-10-17	3	-94/+14
\| \| \| \| \| \| \| \| \| \| \|	JBD2: Replace slab allocations with page allocations JBD2 allocate memory for committed_data and frozen_data from slab. However JBD2 should not pass slab pages down to the block layer. Use page allocator pages instead. This will also prepare JBD for the large blocksize patchset. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	JBD: JBD slab allocation cleanups	Mingming Cao	2007-10-17	3	-91/+11
\| \| \| \| \| \| \| \| \| \|	JBD: Replace slab allocations with page allocations JBD allocate memory for committed_data and frozen_data from slab. However JBD should not pass slab pages down to the block layer. Use page allocator pages instead. This will also prepare JBD for the large blocksize patchset. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
*	9p: fix bad kconfig cross-dependency	Eric Van Hensbergen	2007-10-17	1	-46/+1
\| \| \| \| \| \| \| \|	This patch moves transport dynamic registration and matching to the net module to prevent a bad Kconfig dependency between the net and fs 9p modules. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
*	9p: soften invalidation in loose_mode	Eric Van Hensbergen	2007-10-17	1	-2/+4
\| \| \| \| \| \| \| \| \| \|	Loose mode in 9p utilizes the page cache without respecting coherency with the server. Any writes previously invaldiated the entire mapping for a file. This patch softens the behavior to only invalidate the region of the actual write. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
*	9p: attach-per-user	Latchesar Ionkov	2007-10-17	4	-60/+195
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 9P2000 protocol requires the authentication and permission checks to be done in the file server. For that reason every user that accesses the file server tree has to authenticate and attach to the server separately. Multiple users can share the same connection to the server. Currently v9fs does a single attach and executes all I/O operations as a single user. This makes using v9fs in multiuser environment unsafe as it depends on the client doing the permission checking. This patch improves the 9P2000 support by allowing every user to attach separately. The patch defines three modes of access (new mount option 'access'): - attach-per-user (access=user) (default mode for 9P2000.u) If a user tries to access a file served by v9fs for the first time, v9fs sends an attach command to the server (Tattach) specifying the user. If the attach succeeds, the user can access the v9fs tree. As there is no uname->uid (string->integer) mapping yet, this mode works only with the 9P2000.u dialect. - allow only one user to access the tree (access=<uid>) Only the user with uid can access the v9fs tree. Other users that attempt to access it will get EPERM error. - do all operations as a single user (access=any) (default for 9P2000) V9fs does a single attach and all operations are done as a single user. If this mode is selected, the v9fs behavior is identical with the current one. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
*	9p: rename uid and gid parameters	Latchesar Ionkov	2007-10-17	3	-12/+16
\| \| \| \| \| \| \| \| \| \|	Change the names of 'uid' and 'gid' parameters to the more appropriate 'dfltuid' and 'dfltgid'. This also sets the default uid/gid to -2 (aka nfsnobody) Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
*	9p: define session flags	Latchesar Ionkov	2007-10-17	3	-17/+27
\| \| \| \| \| \| \| \| \|	Create more general flags field in the v9fs_session_info struct and move the 'extended' flag as a bit in the flags. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
*	9p: Make transports dynamic	Eric Van Hensbergen	2007-10-17	3	-108/+75
\| \| \| \| \| \| \| \| \|	This patch abstracts out the interfaces to underlying transports so that new transports can be added as modules. This should also allow kernel configuration of transports without ifdef-hell. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
*	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6	Linus Torvalds	2007-10-17	90	-4739/+2762
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (59 commits) [XFS] eagerly remove vmap mappings to avoid upsetting Xen [XFS] simplify validata_fields [XFS] no longer using io_vnode, as was remaining from 23 cherrypick [XFS] Remove STATIC which was missing from prior manual merge [XFS] Put back the QUEUE_ORDERED_NONE test in the barrier check. [XFS] Turn off XBF_ASYNC flag before re-reading superblock. [XFS] avoid race in sync_inodes() that can fail to write out all dirty data [XFS] This fix prevents bulkstat from spinning in an infinite loop. [XFS] simplify xfs_create/mknod/symlink prototype [XFS] avoid xfs_getattr in XFS_IOC_FSGETXATTR ioctl [XFS] get_bulkall() could return incorrect inode state [XFS] Kill unused IOMAP_EOF flag [XFS] fix when DMAPI mount option processing happens [XFS] ensure file size is logged on synchronous writes [XFS] growlock should be a mutex [XFS] replace some large xfs_log_priv.h macros by proper functions [XFS] kill struct bhv_vfs [XFS] move syncing related members from struct bhv_vfs to struct xfs_mount [XFS] kill the vfs_flags member in struct bhv_vfs [XFS] kill the vfs_fsid and vfs_altfsid members in struct bhv_vfs ...
\| *	[XFS] eagerly remove vmap mappings to avoid upsetting Xen	Jeremy Fitzhardinge	2007-10-17	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	XFS leaves stray mappings around when it vmaps memory to make it virtually contigious. This upsets Xen if one of those pages is being recycled into a pagetable, since it finds an extra writable mapping of the page. This patch solves the problem in a brute force way, by making XFS always eagerly unmap its mappings. SGI-PV: 971902 SGI-Modid: xfs-linux-melb:xfs-kern:29886a Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] simplify validata_fields	Christoph Hellwig	2007-10-17	1	-27/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Stop using xfs_getattr and a onstack bhv_vattr_t just to get three fields from the underlying inode and opencode copying from the inode fields instead. SGI-PV: 970662 SGI-Modid: xfs-linux-melb:xfs-kern:29711a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] no longer using io_vnode, as was remaining from 23 cherrypick	Tim Shimmin	2007-10-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because we cherrypicked SGI-Modid xfs-linux-melb:xfs-kern:29675a and it depended on the sgi mod which removed io_vnode (which was not cherrypicked in 23) it was hand modified. This fixes things back up (to the originial mod) now we have moved on again. Reviewed-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] Remove STATIC which was missing from prior manual merge	Tim Shimmin	2007-10-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removes STATIC on xfs_freeze function which was not manually applied for SGI-Modid: xfs-linux-melb:xfs-kern:29504a. Reviewed-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] Put back the QUEUE_ORDERED_NONE test in the barrier check.	Tim Shimmin	2007-10-16	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Put back the QUEUE_ORDERED_NONE test which caused us grief in sles when it was taken out as, IIRC, it allowed md/lvm to be thought of as supporting barriers when they weren't in some configurations. This patch will be reverting what went in as part of a change for the SGI-pv 964544 (SGI-Modid: xfs-linux-melb:xfs-kern:28568a). SGI-PV: 971783 SGI-Modid: xfs-linux-melb:xfs-kern:29882a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>
\| *	[XFS] Turn off XBF_ASYNC flag before re-reading superblock.	Lachlan McIlroy	2007-10-16	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SGI-PV: 971603 SGI-Modid: xfs-linux-melb:xfs-kern:29871a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] avoid race in sync_inodes() that can fail to write out all dirty data	Lachlan McIlroy	2007-10-16	1	-9/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In xfs_fs_sync_super() treat a sync the same as a filesystem freeze. This is needed to force the log to disk for inodes which are not marked dirty in the Linux inode (the inodes are marked dirty on completion of the log I/O) and so sync_inodes() will not flush them. In xfs_fs_write_inode() a synchronous flush will not get an EAGAIN from xfs_inode_flush() and if an asynchronous flush returns EAGAIN we should pass it on to the caller. If we get an error while flushing the inode then re-dirty it so we can try again later. SGI-PV: 971670 SGI-Modid: xfs-linux-melb:xfs-kern:29860a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] This fix prevents bulkstat from spinning in an infinite loop.	Lachlan McIlroy	2007-10-16	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Here 'agino' increments through the inodes in an allocation group. At the end of the innermost 'for' loop it will hold the value of the next inode to look at (ie the first inode in the next cluster/chunk). Assigning 'lastino' to 'agino' resets it to the last inode in the last inode cluster we just looked at. This causes us to look up the very same cluster and examine all the inodes all over again, and again, and again... We also want to set 'lastino' for the cases when we're not interested in the inode so that the next call to bulkstat won't re-examine the same uninteresting inodes. SGI-PV: 971064 SGI-Modid: xfs-linux-melb:xfs-kern:29840a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] simplify xfs_create/mknod/symlink prototype	Christoph Hellwig	2007-10-16	5	-47/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simplify the prototype for xfs_create/xfs_mkdir/xfs_symlink by not passing down a bhv_vattr_t that just hogs stack space. Instead pass down the mode in a mode_t and in case of xfs_create the rdev as a scalar type as well. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29794a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] avoid xfs_getattr in XFS_IOC_FSGETXATTR ioctl	Christoph Hellwig	2007-10-16	1	-44/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	No need to call into xfs_getattr and put a big bhv_vattr_t on the stack just to get a little information from the XFS inode. Add a helper called xfs_ioc_fsgetxattr instead that deals with retrieving the information in a clean way. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29780a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] get_bulkall() could return incorrect inode state	Vlad Apostolov	2007-10-16	2	-7/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the following scenario xfs_bulkstat() returns incorrect stale inode state: 1. File_A is created and its inode synced to disk. 2. File_A is unlinked and doesn't exist anymore. 3. Filesystem sync is invoked. 4. File_B is created. File_B happens to reclaim File_A's inode. 5. xfs_bulkstat() is called and detects File_B but reports the incorrect File_A inode state. Explanation for the incorrect inode state is that inodes are not immediately synced on file create for performance reasons. This leaves the on-disk inode buffer uninitialized (or with old state from a previous generation inode) and this is what xfs_bulkstat() would report. The patch marks the on-disk inode buffer "dirty" on unlink. When the inode is reclaimed (by a new file create), xfs_bulkstat() would filter this inode by the "dirty" mark. Once the inode is flushed to disk, the on-disk buffer "dirty" mark is automatically removed and a following xfs_bulkstat() would return the correct inode state. Marking the on-disk inode buffer "dirty" on unlink is achieved by setting the on-disk di_nlink field to 0. Note that the in-core di_nlink has already been set to 0 and a corresponding transaction logged by xfs_droplink(). This is an exception from the rule that any on-disk inode buffer changes has to be followed by a disk write (inode flush). Synchronizing the in-core to on-disk di_nlink values in advance (before the actual inode flush to disk) should be fine in this case because the inode is already unlinked and it would never change its di_nlink again for this inode generation. SGI-PV: 970842 SGI-Modid: xfs-linux-melb:xfs-kern:29757a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Mark Goodwin <markgw@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] Kill unused IOMAP_EOF flag	Christoph Hellwig	2007-10-16	2	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29705a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] fix when DMAPI mount option processing happens	Vlad Apostolov	2007-10-16	1	-12/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix for a regression caused by a recent patch that moved the DMAPI mount option processing inside xfs_parseargs(). The DMAPI mount option used to be processed in the DMAPI module loaded before xfs_parseargs() was invoked. SGI-PV: 970451 SGI-Modid: xfs-linux-melb:xfs-kern:29683a Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
\| *	[XFS] ensure file size is logged on synchronous writes	Lachlan McIlroy	2007-10-16	1	-7/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Synchronous writes currently log inode changes before syncing pages to disk. Since the file size is updated on I/O completion we wont be writing out the updated file size and if we crash the file will have the wrong size. This change moves the logging after the syncing of the pages to ensure we log the correct file size. SGI-PV: 970334 SGI-Modid: xfs-linux-melb:xfs-kern:29649a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>