op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	xfs: don't warn on EAGAIN in inode reclaim	Dave Chinner	2010-04-16	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Any inode reclaim flush that returns EAGAIN will result in the inode reclaim being attempted again later. There is no need to issue a warning into the logs about this situation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
*	xfs: ensure that sync updates the log tail correctly	Dave Chinner	2010-04-16	1	-12/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Updates to the VFS layer removed an extra ->sync_fs call into the filesystem during the sync process (from the quota code). Unfortunately the sync code was unknowingly relying on this call to make sure metadata buffers were flushed via a xfs_buftarg_flush() call to move the tail of the log forward in memory before the final transactions of the sync process were issued. As a result, the old code would write a very recent log tail value to the log by the end of the sync process, and so a subsequent crash would leave nothing for log recovery to do. Hence in qa test 182, log recovery only replayed a small handle for inode fsync transactions in this case. However, with the removal of the extra ->sync_fs call, the log tail was now not moved forward with the inode fsync transactions near the end of the sync procese the first (and only) buftarg flush occurred after these transactions went to disk. The result is that log recovery now sees a large number of transactions for metadata that is already on disk. This usually isn't a problem, but when the transactions include inode chunk allocation, the inode create transactions and all subsequent changes are replayed as we cannt rely on what is on disk is valid. As a result, if the inode was written and contains unlogged changes, the unlogged changes are lost, thereby violating sync semantics. The fix is to always issue a transaction after the buftarg flush occurs is the log iѕ not idle or covered. This results in a dummy transaction being written that contains the up-to-date log tail value, which will be very recent. Indeed, it will be at least as recent as the old code would have left on disk, so log recovery will behave exactly as it used to in this situation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-04-14	10	-134/+212
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: use separate class for ceph sockets' sk_lock ceph: reserve one more caps space when doing readdir ceph: queue_cap_snap should always queue dirty context ceph: fix dentry reference leak in dcache readdir ceph: decode v5 of osdmap (pool names) [protocol change] ceph: fix ack counter reset on connection reset ceph: fix leaked inode ref due to snap metadata writeback race ceph: fix snap context reference leaks ceph: allow writeback of snapped pages older than 'oldest' snapc ceph: fix dentry rehashing on virtual .snap dir
\| *	ceph: use separate class for ceph sockets' sk_lock	Sage Weil	2010-04-13	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use a separate class for ceph sockets to prevent lockdep confusion. Because ceph sockets only get passed kernel pointers, there is no dependency from sk_lock -> mmap_sem. If we share the same class as other sockets, lockdep detects a circular dependency from mmap_sem (page fault) -> fs mutex -> sk_lock -> mmap_sem because dependencies are noted from both ceph and user contexts. Using a separate class prevents the sk_lock(ceph) -> mmap_sem dependency and makes lockdep happy. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: reserve one more caps space when doing readdir	Yehuda Sadeh	2010-04-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were missing space for the directory cap. The result was a BUG at fs/ceph/caps.c:2178. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: queue_cap_snap should always queue dirty context	Sage Weil	2010-04-13	2	-11/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This simplifies the calling convention, and fixes a bug where we queue a capsnap with a context other than i_head_snapc (the one that matches the dirty pages). The result was a BUG at fs/ceph/caps.c:2178 on writeback completion when a capsnap matching the writeback snapc could not be found. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: fix dentry reference leak in dcache readdir	Sage Weil	2010-04-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When filldir returned an error (e.g. buffer full for a large directory), we would leak a dentry reference, causing an oops on umount. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: decode v5 of osdmap (pool names) [protocol change]	Sage Weil	2010-04-09	3	-73/+114
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Teach the client to decode an updated format for the osdmap. The new format includes pool names, which will be useful shortly. Get this change in earlier rather than later. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: fix ack counter reset on connection reset	Sage Weil	2010-04-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If in_seq_acked isn't reset along with in_seq, we don't ack received messages until we reach the old count, consuming gobs memory on the other end of the connection and introducing a large delay when those messages are eventually deleted. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: fix leaked inode ref due to snap metadata writeback race	Sage Weil	2010-04-01	2	-14/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We create a ceph_cap_snap if there is dirty cap metadata (for writeback to mds) OR dirty pages (for writeback to osd). It is thus possible that the metadata has been written back to the MDS but the OSD data has not when the cap_snap is created. This results in a cap_snap with dirty(caps) == 0. The problem is that cap writeback to the MDS isn't necessary, and a FLUSHSNAP cap op gets no ack from the MDS. This leaves the cap_snap attached to the inode along with its inode reference. Fix the problem by dropping the cap_snap if it becomes 'complete' (all pages written out) and dirty(caps) == 0 in ceph_put_wrbuffer_cap_refs(). Also, BUG() in __ceph_flush_snaps() if we encounter a cap_snap with dirty(caps) == 0. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: fix snap context reference leaks	Sage Weil	2010-04-01	1	-20/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The get_oldest_context() helper takes a reference to the returned snap context, but most callers weren't dropping that reference. Fix them. Also drop the unused locked __get_oldest_context() variant. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: allow writeback of snapped pages older than 'oldest' snapc	Sage Weil	2010-04-01	1	-13/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On snap deletion, we don't regenerate ceph_cap_snaps for inodes with dirty pages because deletion does not affect metadata writeback. However, we did run into problems when we went to write back the pages because the 'oldest' snapc is determined by the oldest cap_snap, and that may be the newer snapc that reflects the deletion. This caused confusion and an infinite loop in ceph_update_writeable_page(). Change the snapc checks to allow writeback of any snapc that is equal to OR older than the 'oldest' snapc. When there are no cap_snaps, we were also using the realm's latest snapc for writeback, which complicates ceph_put_wrbufffer_cap_refs(). Instead, use i_head_snapc, the most snapc used for the most recent ('head') data. This makes the writeback snapc (ceph_osd_request.r_snapc) _always_ match a capsnap or i_head_snapc. Also, in writepags_finish(), drop the snapc referenced by the _page_ and do not assume it matches the request snapc (it may not anymore). Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: fix dentry rehashing on virtual .snap dir	Sage Weil	2010-03-30	2	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a lookup fails on the magic .snap directory, we bind it to a magic snap directory inode in ceph_lookup_finish(). That code assumes the dentry is unhashed, but a recent server-side change started returning NULL leases on lookup failure, causing the .snap dentry to be hashed and NULL by ceph_fill_trace(). This causes dentry hash chain corruption, or a dies when d_rehash() includes BUG_ON(!d_unhashed(entry)); So, avoid processing the NULL dentry lease if it the dentry matches the snapdir name in ceph_fill_trace(). That allows the lookup completion to properly bind it to the snapdir inode. BUG there if dentry is hashed to be sure. Signed-off-by: Sage Weil <sage@newdream.net>
* \|	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6	Linus Torvalds	2010-04-13	5	-23/+38
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFSv4: fix delegated locking NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible NFS: Fix a race with the new commit code NFS: Ensure that writeback_single_inode() calls write_inode() when syncing NFS: Fix the mode calculation in nfs_find_open_context NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR
\| * \|	NFSv4: fix delegated locking	Trond Myklebust	2010-04-12	2	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Arnaud Giersch reports that NFSv4 locking is broken when we hold a delegation since commit 8e469ebd6dc32cbaf620e134d79f740bf0ebab79 (NFSv4: Don't allow posix locking against servers that don't support it). According to Arnaud, the lock succeeds the first time he opens the file (since we cannot do a delegated open) but then fails after we start using delegated opens. The following patch fixes it by ensuring that locking behaviour is governed by a per-filesystem capability flag that is initially set, but gets cleared if the server ever returns an OPEN without the NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set. Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
\| * \|	NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible	Trond Myklebust	2010-04-09	1	-7/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We always want to ensure that WRITE and COMMIT completes, whether or not the user presses ^C. Do this by making the call asynchronous, and allowing the user to do an interruptible wait for rpc_task completion. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \|	NFS: Fix a race with the new commit code	Trond Myklebust	2010-04-09	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a race which occurs due to the fact that we release the PG_writeback flag while still holding the nfs_page locked. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \|	NFS: Ensure that writeback_single_inode() calls write_inode() when syncing	Trond Myklebust	2010-04-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since writeback_single_inode() checks the inode->i_state flags _before_ it flushes out the data, we need to ensure that the I_DIRTY_DATASYNC flag is already set. Otherwise we risk not seeing a call to write_inode(), which again means that we break fsync() et al... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \|	NFS: Fix the mode calculation in nfs_find_open_context	Trond Myklebust	2010-04-09	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \|	NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR	Trond Myklebust	2010-04-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
* \| \|	Merge branch 'master' of ↵	Linus Torvalds	2010-04-12	2	-5/+21
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: make sure the chunk allocator doesn't create zero length chunks Btrfs: fix data enospc check overflow
\| * \| \|	Btrfs: make sure the chunk allocator doesn't create zero length chunks	Chris Mason	2010-04-06	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A recent commit allowed for smaller chunks to be created, but didn't make sure they were always bigger than a stripe. After some divides, this led to zero length stripes. Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: fix data enospc check overflow	Josef Bacik	2010-04-05	1	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because we account for reserved space we get from the allocator before we actually account for allocating delalloc space, we can have a small window where the amount of "used" space in a space_info is more than the total amount of space in the space_info. This will cause a overflow in our check, so it will seem like we have _tons_ of free space, and we'll allow reservations to occur that will end up larger than the amount of space we have. I've seen users report ENOSPC panic's in cow_file_range a few times recently, so I tried to reproduce this problem and found I could reproduce it if I ran one of my tests in a loop for like 20 minutes. With this patch my test ran all night without issues. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* \| \| \|	Merge branch 'for_linus' of ↵	Linus Torvalds	2010-04-12	3	-6/+16
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: quota: Fix possible dq_flags corruption quota: Hide warnings about writes to the filesystem before quota was turned on ext3: symlink must be handled via filesystem specific operation ext2: symlink must be handled via filesystem specific operation
\| * \| \| \|	quota: Fix possible dq_flags corruption	Andrew Perepechko	2010-04-12	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dq_flags are modified non-atomically in do_set_dqblk via __set_bit calls and atomically for example in mark_dquot_dirty or clear_dquot_dirty. Hence a change done by an atomic operation can be overwritten by a change done by a non-atomic one. Fix the problem by using atomic bitops even in do_set_dqblk. Signed-off-by: Andrew Perepechko <andrew.perepechko@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| * \| \| \|	quota: Hide warnings about writes to the filesystem before quota was turned on	Jan Kara	2010-04-12	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For a root filesystem write to the filesystem before quota is turned on happens regularly and there's no way around it because of writes to syslog, /etc/mtab, and similar. So the warning is rather pointless for ordinary users. It's still useful during development so we just hide the warning behind __DQUOT_PARANOIA config option. Signed-off-by: Jan Kara <jack@suse.cz>
\| * \| \| \|	ext3: symlink must be handled via filesystem specific operation	Dmitry Monakhov	2010-04-12	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	generic setattr implementation is no longer responsible for quota transfer so synlinks must be handled via ext3_setattr. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
\| * \| \| \|	ext2: symlink must be handled via filesystem specific operation	Dmitry Monakhov	2010-04-12	1	-0/+2
\| \| \|/ / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	generic setattr implementation is no longer responsible for quota transfer so synlinks must be handled via ext2_setattr. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
* \| \| \|	Merge branch 'for_linus' of ↵	Linus Torvalds	2010-04-12	5	-10/+16
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: add speciffic ->setattr callback udf: potential integer overflow
\| * \| \| \|	udf: add speciffic ->setattr callback	Dmitry Monakhov	2010-04-08	4	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	generic setattr not longer responsible for quota transfer. use udf_setattr for all udf's inodes. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
\| * \| \| \|	udf: potential integer overflow	Dan Carpenter	2010-04-08	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bloc->logicalBlockNum is unsigned so it's never less than zero. When I saw that, it made me worry that "bloc->logicalBlockNum + count" could overflow. That's why I changed the check for less than zero to an overflow check. (The test works because "count" is also unsigned.) Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* \| \| \| \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-04-12	3	-3/+3
\|\ \ \ \ \ \| \|_\|/ / / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix typo "numer" -> "number" in alloc.c nilfs2: Remove an uninitialization warning in nilfs_btree_propagate_v() nilfs2: fix a wrong type conversion in nilfs_ioctl()
\| * \| \| \|	nilfs2: fix typo "numer" -> "number" in alloc.c	Ryusuke Konishi	2010-04-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes the typo found in a warning message of a persistent object allocator function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| * \| \| \|	nilfs2: Remove an uninitialization warning in nilfs_btree_propagate_v()	Li Hong	2010-04-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`make CONFIG_NILFS2_FS=m M=fs/nilfs2/` will give the following warnings: fs/nilfs2/btree.c: In function 'nilfs_btree_propagate': fs/nilfs2/btree.c:1882: warning: 'maxlevel' may be used uninitialized in this function fs/nilfs2/btree.c:1882: note: 'maxlevel' was declared here Set maxlevel = 0 to fix it. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| * \| \| \|	nilfs2: fix a wrong type conversion in nilfs_ioctl()	Li Hong	2010-03-31	1	-1/+1
\| \| \|_\|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(void * __user ) should be (void __user ) Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* \| \| \|	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block	Linus Torvalds	2010-04-09	2	-62/+75
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits) cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch loop: Update mtime when writing using aops block: expose the statistics in blkio.time and blkio.sectors for the root cgroup backing-dev: Handle class_create() failure Block: Fix block/elevator.c elevator_get() off-by-one error drbd: lc_element_by_index() never returns NULL cciss: unlock on error path cfq-iosched: Do not merge queues of BE and IDLE classes cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging i2o: Remove the dangerous kobj_to_i2o_device macro block: remove 16 bytes of padding from struct request on 64bits cfq-iosched: fix a kbuild regression block: make CONFIG_BLK_CGROUP visible Remove GENHD_FL_DRIVERFS block: Export max number of segments and max segment size in sysfs block: Finalize conversion of block limits functions block: Fix overrun in lcm() and move it to lib vfs: improve writeback_inodes_wb() paride: fix off-by-one test drbd: fix al-to-on-disk-bitmap for 4k logical_block_size ...
\| * \ \ \	Merge branch 'master' into for-linus	Jens Axboe	2010-03-19	72	-1991/+364
\| \|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: block/Kconfig Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| * \| \| \| \|	vfs: improve writeback_inodes_wb()	Edward Shishkin	2010-03-12	1	-60/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Do not pin/unpin superblock for every inode in writeback_inodes_wb(), pin it for the whole group of inodes which belong to the same superblock and call writeback_sb_inodes() handler for them. Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| * \| \| \| \|	blkdev: fix merge_bvec_fn return value checks v2	Dmitry Monakhov	2010-03-08	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	merge_bvec_fn() returns bvec->bv_len on success. So we have to check against this value. But in case of fs_optimization merge we compare with wrong value. This patch must be included in b428cd6da7e6559aca69aa2e3a526037d3f20403 But accidentally i've forgot to add this in the initial patch. To make things straight let's replace all such checks. In fact this makes code easy to understand. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* \| \| \| \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6	Linus Torvalds	2010-04-08	3	-4/+59
\|\ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: not overwriting file_lock structure after GET_LK cifs: Fix a kernel BUG with remote OS/2 server (try #3) [CIFS] initialize nbytes at the beginning of CIFSSMBWrite() [CIFS] Add mmap for direct, nobrl cifs mount types
\| * \| \| \| \| \|	not overwriting file_lock structure after GET_LK	Pavel Shilovsky	2010-04-06	2	-3/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have preventing lock, cifs should overwrite file_lock structure with info about preventing lock. If we haven't preventing lock, cifs should leave it unchanged except for the lock type (change it to F_UNLCK). Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| * \| \| \| \| \|	cifs: Fix a kernel BUG with remote OS/2 server (try #3)	Suresh Jayaraman	2010-04-03	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While chasing a bug report involving a OS/2 server, I noticed the server sets pSMBr->CountHigh to a incorrect value even in case of normal writes. This results in 'nbytes' being computed wrongly and triggers a kernel BUG at mm/filemap.c. void iov_iter_advance(struct iov_iter *i, size_t bytes) { BUG_ON(i->count < bytes); <--- BUG here Why the server is setting 'CountHigh' is not clear but only does so after writing 64k bytes. Though this looks like the server bug, the client side crash may not be acceptable. The workaround is to mask off high 16 bits if the number of bytes written as returned by the server is greater than the bytes requested by the client as suggested by Jeff Layton. CC: Stable <stable@kernel.org> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| * \| \| \| \| \|	[CIFS] initialize nbytes at the beginning of CIFSSMBWrite()	Steve French	2010-04-03	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By doing this we always overwrite nbytes value that is being passed on to CIFSSMBWrite() and need not rely on the callers to initialize. CIFSSMBWrite2 is doing this already. CC: Stable <stable@kernel.org> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| * \| \| \| \| \|	[CIFS] Add mmap for direct, nobrl cifs mount types	Pavel Shilovsky	2010-03-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	without mmap functions in file_ops OpenOffice can't save changes in existing document. The same situation you can see with gedit. Also, a.out format of files can't be executed without mmap. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* \| \| \| \| \| \|	Have nfs ->d_revalidate() report errors properly	Al Viro	2010-04-07	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If nfs atomic open implementation ends up doing open request from ->d_revalidate() codepath and gets an error from server, return that error to caller explicitly and don't bother with lookup_instantiate_filp() at all. ->d_revalidate() can return an error itself just fine... See http://bugzilla.kernel.org/show_bug.cgi?id=15674 http://marc.info/?l=linux-kernel&m=126988782722711&w=2 for original report. Reported-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \| \| \|	fs-cache: order the debugfs stats correctly	David Howells	2010-04-07	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Order the debugfs statistics correctly. The values displayed through a seq_printf() statement should be in the same order as the names in the format string. In the 'Lookups' line, objects created ('crt=') and lookups timed out ('tmo=') have their values transposed. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \| \| \|	pagemap: fix pfn calculation for hugepage	Naoya Horiguchi	2010-04-07	1	-20/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we look into pagemap using page-types with option -p, the value of pfn for hugepages looks wrong (see below.) This is because pte was evaluated only once for one vma although it should be updated for each hugepage. This patch fixes it. $ page-types -p 3277 -Nl -b huge voffset offset len flags 7f21e8a00 11e400 1 ___U___________H_G________________ 7f21e8a01 11e401 1ff ________________TG________________ ^^^ 7f21e8c00 11e400 1 ___U___________H_G________________ 7f21e8c01 11e401 1ff ________________TG________________ ^^^ One hugepage contains 1 head page and 511 tail pages in x86_64 and each two lines represent each hugepage. Voffset and offset mean virtual address and physical address in the page unit, respectively. The different hugepages should not have the same offset value. With this patch applied: $ page-types -p 3386 -Nl -b huge voffset offset len flags 7fec7a600 112c00 1 ___UD__________H_G________________ 7fec7a601 112c01 1ff ________________TG________________ ^^^ 7fec7a800 113200 1 ___UD__________H_G________________ 7fec7a801 113201 1ff ________________TG________________ ^^^ OK More info: - This patch modifies walk_page_range()'s hugepage walker. But the change only affects pagemap_read(), which is the only caller of hugepage callback. - Without this patch, hugetlb_entry() callback is called per vma, that doesn't match the natural expectation from its name. - With this patch, hugetlb_entry() is called per hugepte entry and the callback can become much simpler. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \| \| \|	vfs: rename block_fsync() to blkdev_fsync()	Andrew Morton	2010-04-07	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Requested by hch, for consistency now it is exported. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Anton Blanchard <anton@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \| \| \|	raw: fsync method is now required	Anton Blanchard	2010-04-07	1	-1/+2
\| \|_\|_\|_\|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke the raw driver. We now call through generic_file_aio_write -> generic_write_sync -> vfs_fsync_range. vfs_fsync_range has: if (!fop \|\| !fop->fsync) { ret = -EINVAL; goto out; } But drivers/char/raw.c doesn't set an fsync method. We have two options: fix it or remove the raw driver completely. I'm happy to do either, the fact this has been broken for so long suggests it is rarely used. The patch below adds an fsync method to the raw driver. My knowledge of the block layer is pretty sketchy so this could do with a once over. If we instead decide to remove the raw driver, this patch might still be useful as a backport to 2.6.33 and 2.6.32. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \| \|	proc: copy_to_user() returns unsigned	Dan Carpenter	2010-04-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	copy_to_user() returns the number of bytes left to be copied. This was a typo from: d82ef020cf31 "proc: pagemap: Hold mmap_sem during page walk". Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>