op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2007-09-11	5	-37/+41
\|\	git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
\| \|	* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
\| \|	  ocfs2: Fix calculation of i_blocks during truncate
\| \|	  [PATCH] ocfs2: Fix a wrong cluster calculation.
\| \|	  [PATCH] ocfs2: fix mount option parsing
\| \|	  ocfs2: update docs for new features
\| *	ocfs2: Fix calculation of i_blocks during truncate	Mark Fasheh	2007-09-11	2	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were setting i_blocks too early - before truncating any allocation. Correct things to set i_blocks after the allocation change. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	[PATCH] ocfs2: Fix a wrong cluster calculation.	tao.ma@oracle.com	2007-09-11	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In ocfs2_alloc_write_write_ctxt, the written clusters length is calculated by the byte length only. This may cause some problems if we start to write at some position in the end of one cluster and last to a second cluster while the "len" is smaller than a cluster size. In that case, we have to write 2 clusters actually. So we have to take the start position into consideration also. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
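The cluster-count problem described in the message above can be illustrated in user space. This is a minimal sketch, not the ocfs2 code: the helper names `clusters_by_len` and `clusters_spanned` are hypothetical, and the 4096-byte cluster size used in the usage note is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: cluster count derived from the byte length only.
 * This mirrors the flawed approach the commit message describes. */
static inline uint32_t clusters_by_len(uint64_t len, uint32_t cluster_size)
{
	assert(cluster_size > 0);
	return (uint32_t)((len + cluster_size - 1) / cluster_size);
}

/* Hypothetical helper: cluster count that also takes the start position
 * into account, so a short write that straddles a cluster boundary is
 * counted as touching both clusters. */
static inline uint32_t clusters_spanned(uint64_t pos, uint64_t len,
					uint32_t cluster_size)
{
	uint64_t first, last;

	assert(cluster_size > 0);
	if (len == 0)
		return 0;
	first = pos / cluster_size;             /* cluster holding the first byte */
	last = (pos + len - 1) / cluster_size;  /* cluster holding the last byte */
	return (uint32_t)(last - first + 1);
}
```

With an assumed 4096-byte cluster, a 200-byte write starting at offset 4000 gives 1 from `clusters_by_len(200, 4096)` but 2 from `clusters_spanned(4000, 200, 4096)`, which is the case the message calls out.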
\| *	[PATCH] ocfs2: fix mount option parsing	Tiger Yang	2007-09-11	1	-32/+37
\| \|	For some mount option types, ocfs2_parse_options() will try to access sb->s_fs_info to get at the ocfs2 private superblock. Unfortunately, that hasn't been allocated yet and will cause a kernel crash. Fix this by storing options in a struct which can then get pushed into the ocfs2_super once it's been allocated later. If we need more options which store to the ocfs2_super in the future, we can just add fields to this struct. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: update docs for new features	Mark Fasheh	2007-09-11	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update documentation listing ocfs2 features to reflect the current state of the file system. Add missing descriptions for some mount options which ocfs2 supports. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* \|	Leases can be hidden by flocks	Pavel Emelyanov	2007-09-11	1	-1/+1
\| \|	The inode->i_flock list contains the leases, flocks and posix locks in the specified order. However, the flocks are added in the head of this list thus hiding the leases from F_GETLEASE command, from time_out_leases() and other code that expects the leases to come first.
\| \|	
\| \|	The following example will demonstrate this:
\| \|	
\| \|		#define _GNU_SOURCE
\| \|		#include <unistd.h>
\| \|		#include <fcntl.h>
\| \|		#include <stdio.h>
\| \|		#include <sys/file.h>
\| \|	
\| \|		static void show_lease(int fd)
\| \|		{
\| \|			int res;
\| \|	
\| \|			res = fcntl(fd, F_GETLEASE);
\| \|			switch (res) {
\| \|			case F_RDLCK:
\| \|				printf("Read lease\n");
\| \|				break;
\| \|			case F_WRLCK:
\| \|				printf("Write lease\n");
\| \|				break;
\| \|			case F_UNLCK:
\| \|				printf("No leases\n");
\| \|				break;
\| \|			default:
\| \|				printf("Some shit\n");
\| \|				break;
\| \|			}
\| \|		}
\| \|	
\| \|		int main(int argc, char **argv)
\| \|		{
\| \|			int fd, res;
\| \|	
\| \|			fd = open(argv[1], O_RDONLY);
\| \|			if (fd == -1) {
\| \|				perror("Can't open file");
\| \|				return 1;
\| \|			}
\| \|	
\| \|			res = fcntl(fd, F_SETLEASE, F_WRLCK);
\| \|			if (res == -1) {
\| \|				perror("Can't set lease");
\| \|				return 1;
\| \|			}
\| \|	
\| \|			show_lease(fd);
\| \|	
\| \|			if (flock(fd, LOCK_SH) == -1) {
\| \|				perror("Can't flock shared");
\| \|				return 1;
\| \|			}
\| \|	
\| \|			show_lease(fd);
\| \|	
\| \|			return 0;
\| \|		}
\| \|	
\| \|	The first call to show_lease() will show the write lease set, but the second will show no leases.
\| \|	
\| \|	Fix the flock adding so that the leases always stay in the head of this list.
\| \|	
\| \|	Found during making the flocks pid-namespaces aware.
\| \|	
\| \|	Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
\| \|	Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
\| \|	Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
\| \|	Cc: Andrew Morton <akpm@linux-foundation.org>
\| \|	Cc: <stable@kernel.org>
\| \|	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| \|	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Fix select on /proc files without ->poll	Alexey Dobriyan	2007-09-11	2	-3/+2
\| \|	Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba aka "Fix rmmod/read/write races in /proc entries" broke SBCL + SLIME combo.
\| \|	
\| \|	The old code in do_select() used DEFAULT_POLLMASK if it couldn't find a ->poll handler. The new code makes ->poll always there and returns 0 by default, which is not correct. Return DEFAULT_POLLMASK instead.
\| \|	
\| \|	Steps to reproduce:
\| \|	
\| \|		install emacs, SBCL, SLIME
\| \|		emacs
\| \|		M-x slime in inferior-lisp buffer
\| \|		[watch it doing "Connecting to Swank on port X.."]
\| \|	
\| \|	Please, apply before 2.6.23.
\| \|	
\| \|	P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.
\| \|	
\| \|	Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
\| \|	Cc: T Taneli Vahakangas <vahakang@cs.helsinki.fi>
\| \|	Cc: Oleg Nesterov <oleg@tv-sign.ru>
\| \|	Cc: "Eric W. Biederman" <ebiederm@xmission.com>
\| \|	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| \|	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	afs: mntput called before dput	Andreas Gruenbacher	2007-09-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dput must be called before mntput here. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-By: David Howells <dhowells@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	quota: fix infinite loop	Jan Kara	2007-09-11	3	-4/+31
\|/ \| \| \| \| \| \| \| \| \| \|	If we fail to start a transaction when releasing dquot, we have to call dquot_release() anyway to mark dquot structure as inactive. Otherwise we end in an infinite loop inside dqput(). Signed-off-by: Jan Kara <jack@suse.cz> Cc: xb <xavier.bru@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	knfsd: Validate filehandle type in fsid_source	Neil Brown	2007-09-10	1	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fsid_source decided where to get the 'fsid' number to return for a GETATTR based on the type of filehandle. It can be from the device, from the fsid, or from the UUID. It is possible for the filehandle to be inconsistent with the export information, so make sure the export information actually has the info implied by the value returned by fsid_source. Signed-off-by: Neil Brown <neilb@suse.de> Cc: "Luiz Fernando N. Capitulino" <lcapitulino@gmail.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	knfsd: Fixed problem with NFS exporting directories which are mounted on.	Neil Brown	2007-09-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Recent changes in NFSd cause a directory which is mounted-on to not appear properly when the filesystem containing it is exported. exp_get now returns -ENOENT rather than NULL and when commit 5d3dbbeaf56d0365ac6b5c0a0da0bd31cc4781e1 removed the NULL checks, it didn't add a check for -ENOENT. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[XFS] fix nasty quota hashtable allocation bug	Eric Sandeen	2007-09-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This git mod: 77e4635ae191774526ed695482a151ac986f3806 converted to a "greedy" allocation interface, but for the quota hashtables it switched from allocating XFS_QM_HASHSIZE (nr of elements) xfs_dqhash_t's to allocating only XFS_QM_HASHSIZE bytes - quite a lot smaller! Then when we converted hsize "back" to nr of elements (the division line) hsize went to 0. This was leading to oopses when running any quota tests on the Fedora 8 test kernel, but the problem has been there for almost a year. SGI-PV: 968837 SGI-Modid: xfs-linux-melb:xfs-kern:29354a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
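The bytes-versus-elements mix-up described above can be shown with a small stand-alone sketch. It is not the XFS code: `struct bucket`, its 32-byte size, and both helper names are assumptions made for illustration, and the log does not give the real sizes involved.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hash bucket; the fixed 32-byte size is an
 * assumption for illustration and is not taken from XFS. */
struct bucket {
	unsigned char pad[32];
};

/* Bytes that must be requested to hold nr_buckets elements. */
static inline size_t bytes_for_buckets(size_t nr_buckets)
{
	return nr_buckets * sizeof(struct bucket);
}

/* Number of whole buckets that fit in an allocation of `bytes` bytes.
 * This corresponds to the "division line" that converts a size in bytes
 * back to a number of elements. */
static inline size_t buckets_from_bytes(size_t bytes)
{
	return bytes / sizeof(struct bucket);
}
```

If a caller wants 16 buckets but passes 16 as a byte count, `buckets_from_bytes(16)` yields 0 in this sketch, whereas `buckets_from_bytes(bytes_for_buckets(16))` yields 16. An element count of 0 is the failure mode the message reports.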
*	[XFS] fix sparse shadowed variable warnings	Christoph Hellwig	2007-09-05	2	-5/+4
\|	- in xfs_probe_cluster rename the inner len to pg_len. There's no harm here because the outer len isn't used after the inner len comes into existence but it keeps the code clean.
\|	- in xfs_da_do_buf remove the inner i because they don't overlap and they are both the same type.
\|	
\|	SGI-PV: 968555
\|	SGI-Modid: xfs-linux-melb:xfs-kern:29311a
\|	Signed-off-by: Christoph Hellwig <hch@infradead.org>
\|	Signed-off-by: David Chinner <dgc@sgi.com>
\|	Signed-off-by: Tim Shimmin <tes@sgi.com>
*	[XFS] fix ASSERT and ASSERT_ALWAYS	Christoph Hellwig	2007-09-05	1	-4/+6
\|	- remove the != 0 inside the unlikely in ASSERT_ALWAYS because sparse now complains about comparisons between pointers and 0
\|	- add a standalone ASSERT implementation because defining it to ASSERT_ALWAYS means the string is expanded before the token passing stringification. This way we get the actual content of the assertion in the assfail message and don't overflow sparse's stringification buffer leading to sparse error messages.
\|	
\|	SGI-PV: 968555
\|	SGI-Modid: xfs-linux-melb:xfs-kern:29310a
\|	Signed-off-by: Christoph Hellwig <hch@infradead.org>
\|	Signed-off-by: David Chinner <dgc@sgi.com>
\|	Signed-off-by: Tim Shimmin <tes@sgi.com>
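The stringification effect mentioned in the second point can be reproduced with the C preprocessor alone. This is a minimal sketch under stated assumptions: the macro names `ASSERT_TEXT_DIRECT`, `ASSERT_TEXT_FORWARDED` and `MAX_ITEMS` are hypothetical and do not appear in XFS. They only show that forwarding an argument through a second macro expands it before `#` is applied.

```c
/* Hypothetical macros for illustration; not the XFS definitions. */
#define MAX_ITEMS 42

/* Stringifies its argument exactly as written at the call site. */
#define ASSERT_TEXT_DIRECT(expr) #expr

/* Forwards to the macro above. The argument is macro-expanded before it
 * is passed on, so the resulting string holds the expanded text. */
#define ASSERT_TEXT_FORWARDED(expr) ASSERT_TEXT_DIRECT(expr)

/* `count` is never evaluated here; it only appears inside the strings. */
static inline const char *text_direct(void)
{
	return ASSERT_TEXT_DIRECT(count < MAX_ITEMS);
}

static inline const char *text_forwarded(void)
{
	return ASSERT_TEXT_FORWARDED(count < MAX_ITEMS);
}
```

Here `text_direct()` returns the assertion as the programmer wrote it, while `text_forwarded()` returns the text with `MAX_ITEMS` already replaced. With large macros in the expression, the expanded form can become very long, which is consistent with the buffer overflow the message attributes to sparse.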
*	[XFS] Fix sparse warning in kmem_shake_allow	Christoph Hellwig	2007-09-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	We can't return a masked result of a __bitwise type. Compare it to 0 first to keep the behaviour without the warning. SGI-PV: 968555 SGI-Modid: xfs-linux-melb:xfs-kern:29309a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
*	[XFS] Fix sparse NULL vs 0 warnings	Christoph Hellwig	2007-09-05	2	-12/+12
\|	Sparse now warns about comparing pointers to 0, so change all instances where that happens to NULL instead. SGI-PV: 968555 SGI-Modid: xfs-linux-melb:xfs-kern:29308a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
*	[XFS] Set filestreams object timeout to something sane.	David Chinner	2007-09-05	1	-1/+1
\| \| \| \| \| \| \| \| \|	SGI-PV: 968554 SGI-Modid: xfs-linux-melb:xfs-kern:29303a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>
*	Merge branch 'for_linus' of git://git.linux-nfs.org/pub/linux/nfs-2.6	Linus Torvalds	2007-09-04	5	-14/+58
\|\
\| *	NFS: Fix a write request leak in nfs_invalidate_page()	Trond Myklebust	2007-09-01	2	-1/+45
\| \|	Ryusuke Konishi says:
\| \|	
\| \|	The recent truncate_complete_page() clears the dirty flag from a page
\| \|	before calling a_ops->invalidatepage(),
\| \|	^^^^^^
\| \|		static void
\| \|		truncate_complete_page(struct address_space *mapping, struct page *page)
\| \|		{
\| \|			...
\| \|			cancel_dirty_page(page, PAGE_CACHE_SIZE);  <--- Inserted here at kernel 2.6.20
\| \|	
\| \|			if (PagePrivate(page))
\| \|				do_invalidatepage(page, 0);  ---> will call a_ops->invalidatepage()
\| \|			...
\| \|		}
\| \|	
\| \|	and this is disturbing nfs_wb_page_priority() from calling nfs_writepage_locked() that is expected to handle the pending request (=nfs_page) associated with the page.
\| \|	
\| \|		int nfs_wb_page_priority(struct inode *inode, struct page *page, int how)
\| \|		{
\| \|			...
\| \|			if (clear_page_dirty_for_io(page)) {
\| \|				ret = nfs_writepage_locked(page, &wbc);
\| \|				if (ret < 0)
\| \|					goto out;
\| \|			}
\| \|			...
\| \|		}
\| \|	
\| \|	Since truncate_complete_page() will get rid of the page after a_ops->invalidatepage() returns, the request (=nfs_page) associated with the page becomes a garbage in nfs_inode->nfs_page_tree.
\| \|	
\| \|	------------------------
\| \|	
\| \|	Fix this by ensuring that nfs_wb_page_priority() recognises that it may also need to clear out non-dirty pages that have an nfs_page associated with them.
\| \|	
\| \|	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFS: change NFS mount error return when hostname/pathname too long	Chuck Lever	2007-09-01	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to the mount(2) man page, the proper error return code for the mount(2) system call when the special device name or the mounted-on directory name is too long is ENAMETOOLONG. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFS: Off-by-one length error in string handling	Chuck Lever	2007-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The hostname was getting truncated in the new text-based NFS mount API. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFS: Return a real error code from mount(2)	Chuck Lever	2007-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't filter the return code from the in-kernel rpcbind or NFS mount clients. Return the real error code so that callers of the new NFS text-based mount API can apply a useful retry strategy. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFS: mount option parser chokes on proto=	Chuck Lever	2007-09-01	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new text-based NFS mount option parsing logic doesn't recognize any valid transport protocols due to a silly mistake in the protocol token matching logic. This prevents basic mount requests such as: mount.nfs server:/export /mnt -o proto=tcp from working with the new text-based NFS mount API. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4: Ensure that we pass the correct dentry to nfs4_intent_set_file	Trond Myklebust	2007-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes an Oops that was reported by Gabriel Barazer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4: Fix a typo in _nfs4_do_open_reclaim	Trond Myklebust	2007-09-01	1	-1/+1
\| \|	This should fix the following Oops reported by Jeff Garzik:
\| \|	
\| \|	kernel BUG at fs/nfs/nfs4xdr.c:1040!
\| \|	invalid opcode: 0000 [1] SMP
\| \|	CPU 0
\| \|	Modules linked in: nfs lockd sunrpc af_packet ipv6 cpufreq_ondemand acpi_cpufreq battery floppy nvram sg snd_hda_intel ata_generic snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_page_alloc e1000 firewire_ohci ata_piix i2c_core sr_mod cdrom sata_sil ahci libata sd_mod scsi_mod ext3 jbd ehci_hcd uhci_hcd
\| \|	Pid: 16353, comm: 10.10.10.1-recl Not tainted 2.6.23-rc3 #1
\| \|	RIP: 0010:[<ffffffff88240980>]  [<ffffffff88240980>] :nfs:encode_open+0x1c0/0x330
\| \|	RSP: 0018:ffff8100467c5c60  EFLAGS: 00010202
\| \|	RAX: ffff81000f89b8b8 RBX: 00000000697a6f6d RCX: ffff81000f89b8b8
\| \|	RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff8100467c5c80
\| \|	RBP: ffff8100467c5c80 R08: ffff81000f89bc30 R09: ffff81000f89b83f
\| \|	R10: 0000000000000001 R11: ffffffff881e79e0 R12: ffff81003cbd1808
\| \|	R13: ffff81000f89b860 R14: ffff81005fc984e0 R15: ffffffff88240af0
\| \|	FS:  0000000000000000(0000) GS:ffffffff8052a000(0000) knlGS:0000000000000000
\| \|	CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
\| \|	CR2: 00002adb9e51a030 CR3: 000000007ea7e000 CR4: 00000000000006e0
\| \|	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
\| \|	DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
\| \|	Process 10.10.10.1-recl (pid: 16353, threadinfo ffff8100467c4000, task ffff8100038ce780)
\| \|	Stack:  ffff81004aeb6a40 ffff81003cbd1808 ffff81003cbd1808 ffffffff88240b5d
\| \|	 ffff81000f89b8bc ffff81005fc984e8 ffff81000f89bc30 ffff81005fc984e8
\| \|	 0000000300000000 0000000000000000 0000000000000000 ffff81003cbd1800
\| \|	Call Trace:
\| \|	 [<ffffffff88240b5d>] :nfs:nfs4_xdr_enc_open_noattr+0x6d/0x90
\| \|	 [<ffffffff881e74b7>] :sunrpc:rpcauth_wrap_req+0x97/0xf0
\| \|	 [<ffffffff88240af0>] :nfs:nfs4_xdr_enc_open_noattr+0x0/0x90
\| \|	 [<ffffffff881df57a>] :sunrpc:call_transmit+0x18a/0x290
\| \|	 [<ffffffff881e5e7b>] :sunrpc:__rpc_execute+0x6b/0x290
\| \|	 [<ffffffff881dff76>] :sunrpc:rpc_do_run_task+0x76/0xd0
\| \|	 [<ffffffff882373f6>] :nfs:_nfs4_proc_open+0x76/0x230
\| \|	 [<ffffffff88237a2e>] :nfs:nfs4_open_recover_helper+0x5e/0xc0
\| \|	 [<ffffffff88237b74>] :nfs:nfs4_open_recover+0xe4/0x120
\| \|	 [<ffffffff88238e14>] :nfs:nfs4_open_reclaim+0xa4/0xf0
\| \|	 [<ffffffff882413c5>] :nfs:nfs4_reclaim_open_state+0x55/0x1b0
\| \|	 [<ffffffff882417ea>] :nfs:reclaimer+0x2ca/0x390
\| \|	 [<ffffffff88241520>] :nfs:reclaimer+0x0/0x390
\| \|	 [<ffffffff8024e59b>] kthread+0x4b/0x80
\| \|	 [<ffffffff8020cad8>] child_rip+0xa/0x12
\| \|	 [<ffffffff8024e550>] kthread+0x0/0x80
\| \|	 [<ffffffff8020cace>] child_rip+0x0/0x12
\| \|	
\| \|	Code: 0f 0b eb fe 48 89 ef c7 00 00 00 00 02 be 08 00 00 00 e8 79
\| \|	RIP  [<ffffffff88240980>] :nfs:encode_open+0x1c0/0x330
\| \|	 RSP <ffff8100467c5c60>
\| \|	
\| \|	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFS: Fix use of cancel_delayed_work_sync in nfs_release_automount_timer	Trond Myklebust	2007-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Doh! We can't use cancel_delayed_work_sync because we may have been called from an unmount that was being performed by nfs_automount_task. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* \|	[JFFS2] fix write deadlock regression	Jason Lunz	2007-09-02	1	-1/+1
\|/	I've bisected the deadlock when many small appends are done on jffs2 down to this commit:
\|	
\|		commit 6fe6900e1e5b6fa9e5c59aa5061f244fe3f467e2
\|		Author: Nick Piggin <npiggin@suse.de>
\|		Date:   Sun May 6 14:49:04 2007 -0700
\|	
\|		    mm: make read_cache_page synchronous
\|	
\|		    Ensure pages are uptodate after returning from read_cache_page, which allows us to cut out most of the filesystem-internal PageUptodate calls.
\|	
\|		    I didn't have a great look down the call chains, but this appears to fixes 7 possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in block2mtd. All depending on whether the filler is async and/or can return with a !uptodate page.
\|	
\|	It introduced a wait to read_cache_page, as well as a read_cache_page_async function equivalent to the old read_cache_page without any callers.
\|	
\|	Switching jffs2_gc_fetch_page to read_cache_page_async for the old behavior makes the deadlocks go away, but maybe reintroduces the use-before-uptodate problem? I don't understand the mm/fs interaction well enough to say.
\|	
\|	[It's fine. dwmw2.]
\|	
\|	Signed-off-by: Jason Lunz <lunz@falooley.org>
\|	Signed-off-by: David Woodhouse <dwmw2@infradead.org>
*	NFS: Fix the mount regression	Trond Myklebust	2007-08-31	1	-46/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This avoids the recent NFS mount regression (returning EBUSY when mounting the same filesystem twice with different parameters). The best I can do given the constraints appears to be to have the kernel first look for a superblock that matches both the fsid and the user-specified mount options, and then spawn off a new superblock if that search fails. Note that this is not the same as specifying nosharecache everywhere since nosharecache will never attempt to match an existing superblock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Hua Zhong <hzhong@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	hugepage: fix broken check for offset alignment in hugepage mappings	David Gibson	2007-08-31	1	-5/+10
\|	For hugepage mappings, the file offset, like the address and size, needs to be aligned to the size of a hugepage. In commit 68589bc353037f233fe510ad9ff432338c95db66, the check for this was moved into prepare_hugepage_range() along with the address and size checks. But since BenH's rework of the get_unmapped_area() paths leading up to commit 4b1d89290b62bb2db476c94c82cf7442aab440c8, prepare_hugepage_range() is only called for MAP_FIXED mappings, not for other mappings. This means we're no longer ever checking for an aligned offset - I've confirmed that mmap() will (apparently) succeed with a misaligned offset on both powerpc and i386 at least. This patch restores the check, removing it from prepare_hugepage_range() and putting it back into hugetlbfs_file_mmap(). I'm putting it there, rather than in the get_unmapped_area() path, so it only needs to go in one place, rather than separately in the half-dozen or so arch-specific implementations of hugetlb_get_unmapped_area(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Cc: Adam Litke <agl@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
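The alignment requirement stated at the start of the message above can be sketched as a stand-alone check. This is only an illustration: the function name `offset_is_hugepage_aligned` is hypothetical, the 2 MiB page size in the usage note is an assumed value, and the kernel's actual check in hugetlbfs_file_mmap() is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when `offset` is a multiple of
 * `huge_page_size`, 0 otherwise. The page size must be a nonzero
 * power of two so that the mask test below is valid. */
static inline int offset_is_hugepage_aligned(uint64_t offset,
					     uint64_t huge_page_size)
{
	assert(huge_page_size != 0 &&
	       (huge_page_size & (huge_page_size - 1)) == 0);
	return (offset & (huge_page_size - 1)) == 0;
}
```

With an assumed 2 MiB hugepage (2097152 bytes), offsets 0 and 4194304 pass, while an offset of 4096 is rejected even though it is aligned to an ordinary 4 KiB page.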
*	eCryptfs: fix possible fault in ecryptfs_sync_page	Ryusuke Konishi	2007-08-31	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will avoid a possible fault in ecryptfs_sync_page(). In the function, eCryptfs calls sync_page() method of a lower filesystem without checking its existence. However, there are many filesystems that don't have this method including network filesystems such as NFS, AFS, and so forth. They may fail when an eCryptfs page is waiting for lock. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Fix possible NULL pointer dereference in udf_table_free_blocks()	Jan Kara	2007-08-31	1	-6/+4
\|	Fix possible NULL pointer dereference when freeing blocks in case table of free space is used. Also fix handling of the case when we need to move extent from one block to another one to make space for indirect extent. BTW: Nobody seems to have ever used this code. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	UDF: handle wrong superblock better	Jan Kara	2007-08-31	1	-4/+22
\| \| \| \| \| \| \| \| \| \|	If UDF superblock is incorrect, we can fail to find a table of free / allocated space and consequently Oops. Handle this situation more gracefully by ignoring the broken UDF partition. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	revert "eCryptfs: fix lookup error for special files"	Andrew Morton	2007-08-31	1	-4/+0
\|	This patch got applied twice. Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched	Linus Torvalds	2007-08-23	1	-15/+29
\|\	* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
\| \|	  sched: tweak the sched_runtime_limit tunable
\| \|	  sched: skip updating rq's next_balance under null SD
\| \|	  sched: fix broken SMT/MC optimizations
\| \|	  sched: accounting regression since rc1
\| \|	  sched: fix sysctl directory permissions
\| \|	  sched: sched_clock_idle_[sleep\|wakeup]_event()
\| *	sched: accounting regression since rc1	Christian Borntraeger	2007-08-23	1	-15/+29
\| \|	Fix the accounting regression for CONFIG_VIRT_CPU_ACCOUNTING. It reverts parts of commit b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa by converting fs/proc/array.c back to cputime_t. The new functions task_utime and task_stime now return cputime_t instead of clock_t. If CONFIG_VIRT_CPU_ACCOUNTING is set, task->utime and task->stime are returned directly instead of using sum_exec_runtime. Patch is tested on s390x with and without VIRT_CPU_ACCOUNTING as well as on i386. [ mingo@elte.hu: cleanups, comments. ] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2007-08-23	2	-18/+0
\|\ \	git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
\| \| \|	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
\| \| \|	  9p: fix bad error path in conversion routines
\| \| \|	  9p: remove deprecated v9fs_fid_lookup_remove()
\| \| \|	  9p: update maintainers and documentation
\| \| \|	  9p: fix use after free
\| * \|	9p: remove deprecated v9fs_fid_lookup_remove()	Eric Van Hensbergen	2007-08-23	2	-18/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes the v9fs_fid_lookup_remove which is no longer used. Based on original patch from Adrian Bunk <bunk@stusta.de> which used #if 0 to isolate the code. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* \| \|	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6	Linus Torvalds	2007-08-23	2	-15/+13
\|\ \	* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6:
\| \|_\|/	  sysfs: don't warn on removal of a nonexistent binary file
\|/\| \|	  HOWTO: latest lxr url address changed
\| \|	  HOWTO: korean translation of Documentation/HOWTO
\| \|	  Fix Off-by-one in /sys/module/*/refcnt
\| \|	  sysfs: fix locking in sysfs_lookup() and sysfs_rename_dir()
\| * \|	sysfs: don't warn on removal of a nonexistent binary file	Alan Stern	2007-08-22	1	-6/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch (as960) removes the error message and stack dump logged by sysfs_remove_bin_file() when someone tries to remove a nonexistent file. The warning doesn't seem to be needed, since none of the other file-, symlink-, or directory-removal routines in sysfs complain in a comparable way. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Tejun Heo <htejun@gmail.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| * \|	sysfs: fix locking in sysfs_lookup() and sysfs_rename_dir()	Tejun Heo	2007-08-22	1	-9/+12
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sd children list walking in sysfs_lookup() and sd renaming in sysfs_rename_dir() were left out during i_mutex -> sysfs_mutex conversion. Fix them. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* \|	exec: kill unsafe BUG_ON(sig->count) checks	Oleg Nesterov	2007-08-22	1	-3/+0
\| \|	de_thread:
\| \|	
\| \|		if (atomic_read(&oldsighand->count) <= 1)
\| \|			BUG_ON(atomic_read(&sig->count) != 1);
\| \|	
\| \|	This is not safe without the rmb() in between. The results of two correctly ordered __exit_signal()->atomic_dec_and_test()'s could be seen out of order on our CPU.
\| \|	
\| \|	The same is true for the "thread_group_empty()" case, __unhash_process()'s changes could be seen before atomic_dec_and_test(&sig->count).
\| \|	
\| \|	On some platforms (including i386) atomic_read() doesn't provide even the compiler barrier, in that case these checks are simply racy.
\| \|	
\| \|	Remove these BUG_ON()'s. Alternatively, we can do something like
\| \|	
\| \|		BUG_ON( ({ smp_rmb(); atomic_read(&sig->count) != 1; }) );
\| \|	
\| \|	Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
\| \|	Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
\| \|	Cc: Roland McGrath <roland@redhat.com>
\| \|	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
\| \|	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	autofs4: deadlock during create	Ian Kent	2007-08-22	1	-14/+17
\| \|	Due to inconsistent locking in the VFS between calls to lookup and revalidate, deadlock can occur in the automounter. The inconsistency is that the directory inode mutex is held for both lookup and revalidate calls when called via lookup_hash whereas it is held only for lookup during a path walk. Consequently, if the mutex is held during a call to revalidate autofs4 can't release the mutex to callback the daemon as it can't know whether it owns the mutex. This situation happens when a process tries to create a directory within an automount and a second process also tries to create the same directory between the lookup and the mkdir. Since the first process has dropped the mutex for the daemon callback, the second process takes it during revalidate leading to deadlock between the autofs daemon and the second process when the daemon tries to create the mount point directory. After spending quite a bit of time trying to resolve this on more than one occasion, using rather complex and ugly approaches, it turns out that just delaying the hashing of the dentry until the create operation works fine. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	signalfd: make it group-wide, fix posix-timers scheduling	Oleg Nesterov	2007-08-22	2	-11/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With this patch any thread can dequeue its own private signals via signalfd, even if it was created by another sub-thread. To do so, we pass "current" to dequeue_signal() if the caller is from the same thread group. This also fixes the scheduling of posix timers broken by the previous patch. If the caller doesn't belong to this thread group, we can't handle __SI_TIMER case properly anyway. Perhaps we should forbid the cross-process signalfd usage and convert ctx->tsk to ctx->sighand. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Roland McGrath <roland@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	eCryptfs: fix lookup error for special files	Ryusuke Konishi	2007-08-22	1	-0/+4
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When ecryptfs_lookup() is called against special files, eCryptfs generates the following errors because it tries to treat them like regular eCryptfs files. Error opening lower file for lower_dentry [0xffff810233a6f150], lower_mnt [0xffff810235bb4c80], and flags [0x8000] Error opening lower_file to read header region Error attempting to read the [user.ecryptfs] xattr from the lower file; return value = [-95] Valid metadata not found in header region or xattr region; treating file as unencrypted For instance, the problem can be reproduced by the steps below. # mkdir /root/crypt /mnt/crypt # mount -t ecryptfs /root/crypt /mnt/crypt # mknod /mnt/crypt/c0 c 0 0 # umount /mnt/crypt # mount -t ecryptfs /root/crypt /mnt/crypt # ls -l /mnt/crypt This patch fixes it by adding a check similar to directories and symlinks. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	dio: zero struct dio with kzalloc instead of manually	Zach Brown	2007-08-20	1	-17/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch uses kzalloc to zero all of struct dio rather than manually trying to track which fields we rely on being zero. It passed aio+dio stress testing and some bug regression testing on ext3. This patch was introduced by Linus in the conversation that led up to Badari's minimal fix to manually zero .map_bh.b_state in commit: 6a648fa72161d1f6468dabd96c5d3c0db04f598a It makes the code a bit smaller. Maybe a couple fewer cachelines to load, if we're lucky: text data bss dec hex filename 3285925 568506 1304616 5159047 4eb887 vmlinux 3285797 568506 1304616 5158919 4eb807 vmlinux.patched I was unable to measure a stable difference in the number of cpu cycles spent in blockdev_direct_IO() when pushing aio+dio 256K reads at ~340MB/s. So the resulting intent of the patch isn't a performance gain but to avoid exposing ourselves to the risk of finding another field like .map_bh.b_state where we rely on zeroing but don't enforce it in the code. Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
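The trade-off described above can be illustrated with a userspace analogue, using `calloc()` in place of kzalloc(). This is a sketch under assumptions: `struct request_state` and both allocator names are hypothetical stand-ins, not the real struct dio or the code in fs/direct-io.c.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a request descriptor such as struct dio. */
struct request_state {
    int flags;
    unsigned long block_state;  /* easy to forget when zeroing by hand */
    void *bounce_page;
    size_t bytes_done;
};

/* Error-prone style: every field relied on as zero must be listed,
 * and a newly added field is silently left uninitialized. */
struct request_state *request_alloc_manual(void)
{
    struct request_state *rs = malloc(sizeof(*rs));

    if (!rs)
        return NULL;
    rs->flags = 0;
    rs->block_state = 0;
    rs->bounce_page = NULL;
    rs->bytes_done = 0;
    return rs;
}

/* Zeroing allocator: the userspace analogue of kzalloc(). */
struct request_state *request_alloc_zeroed(void)
{
    return calloc(1, sizeof(struct request_state));
}
```

The second form is shorter and keeps working when fields are added, which is the maintenance argument the commit message makes.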
*	JFFS2 locking regression fix.	David Woodhouse	2007-08-20	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit a491486a2087ac3dfc00efb4f838c8d684afaf54 introduced a locking problem in JFFS2 -- we up() the alloc_sem when we weren't previously holding it. This leads to all kinds of fun behaviour later. There was a _reason_ for the if (1 /* alternative path needs testing */ \|\| which the above-mentioned commit removed :) Discovered and debugged by Giulio Fedel <giulio.fedel@andorsystems.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6	Linus Torvalds	2007-08-18	5	-13/+45
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Check return code on failed alloc [CIFS] Update CIFS project web site [CIFS] Fix hang in find_writable_file
\| *	[CIFS] Check return code on failed alloc	Cyrill Gorcunov	2007-08-18	2	-0/+10
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
\| *	[CIFS] Fix hang in find_writable_file	Steve French	2007-07-26	4	-13/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Caused by unneeded reopen during reconnect while spinlock held. Fixes kernel bugzilla bug #7903. Thanks to Lin Feng Shen for testing this, and Amit Arora for some nice problem determination to narrow this down. Acked-by: Dave Kleikamp <shaggy@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* \|	Reset current->pdeath_signal on SUID binary execution	Marcel Holtmann	2007-08-18	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a vulnerability in the "parent process death signal" implementation discovered by Wojciech Purczynski of COSEINC PTE Ltd. and iSEC Security Research. http://marc.info/?l=bugtraq&m=118711306802632&w=2 Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
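For context, the "parent process death signal" is the value a process requests with `prctl(PR_SET_PDEATHSIG, ...)`. The sketch below shows only how userspace sets and reads that value on Linux; the helper name `set_and_get_pdeathsig` is hypothetical, and the sketch does not demonstrate the vulnerability or the reset on SUID execution.

```c
#include <signal.h>
#include <sys/prctl.h>

/* Hypothetical helper: set the parent-death signal for the calling
 * process and read it back. Returns the value read, or -1 on error.
 * Passing 0 clears the setting. */
int set_and_get_pdeathsig(int signo)
{
    int current = -1;

    if (prctl(PR_SET_PDEATHSIG, (unsigned long)signo) == -1)
        return -1;
    if (prctl(PR_GET_PDEATHSIG, &current) == -1)
        return -1;
    return current;
}
```

A process that calls `set_and_get_pdeathsig(SIGTERM)` asks the kernel to send it SIGTERM when its parent exits; the commit above is about not carrying that request across the execution of a SUID binary.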