op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	fs: Protect write paths by sb_start_write - sb_end_write	Jan Kara	2012-07-31	5	-23/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are several entry points which dirty pages in a filesystem. mmap (handled by block_page_mkwrite()), buffered write (handled by __generic_file_aio_write()), splice write (generic_file_splice_write), truncate, and fallocate (these can dirty last partial page - handled inside each filesystem separately). Protect these places with sb_start_write() and sb_end_write(). ->page_mkwrite() calls are particularly complex since they are called with mmap_sem held and thus we cannot use standard sb_start_write() due to lock ordering constraints. We solve the problem by using a special freeze protection sb_start_pagefault() which ranks below mmap_sem. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs: Skip atime update on frozen filesystem	Jan Kara	2012-07-31	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is unexpected to block reading of frozen filesystem because of atime update. Also handling blocking on frozen filesystem because of atime update would make locking more complex than it already is. So just skip atime update when filesystem is frozen like we skip it when filesystem is remounted read-only. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs: Add freezing handling to mnt_want_write() / mnt_drop_write()	Jan Kara	2012-07-31	5	-24/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most of places where we want freeze protection coincides with the places where we also have remount-ro protection. So make mnt_want_write() and mnt_drop_write() (and their _file alternative) prevent freezing as well. For the few cases that are really interested only in remount-ro protection provide new function variants. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs: Improve filesystem freezing handling	Jan Kara	2012-07-31	2	-28/+373
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vfs_check_frozen() tests are racy since the filesystem can be frozen just after the test is performed. Thus in write paths we can end up marking some pages or inodes dirty even though the file system is already frozen. This creates problems with flusher thread hanging on frozen filesystem. Another problem is that exclusion between ->page_mkwrite() and filesystem freezing has been handled by setting page dirty and then verifying s_frozen. This guaranteed that either the freezing code sees the faulted page, writes it, and writeprotects it again or we see s_frozen set and bail out of page fault. This works to protect from page being marked writeable while filesystem freezing is running but has an unpleasant artefact of leaving dirty (although unmodified and writeprotected) pages on frozen filesystem resulting in similar problems with flusher thread as the first problem. This patch aims at providing exclusion between write paths and filesystem freezing. We implement a writer-freeze read-write semaphore in the superblock. Actually, there are three such semaphores because of lock ranking reasons - one for page fault handlers (->page_mkwrite), one for all other writers, and one of internal filesystem purposes (used e.g. to track running transactions). Write paths which should block freezing (e.g. directory operations, ->aio_write(), ->page_mkwrite) hold reader side of the semaphore. Code freezing the filesystem takes the writer side. Only that we don't really want to bounce cachelines of the semaphores between CPUs for each write happening. So we implement the reader side of the semaphore as a per-cpu counter and the writer side is implemented using s_writers.frozen superblock field. [AV: microoptimize sb_start_write(); we want it fast in normal case] BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	switch the protection of percpu_counter list to spinlock	Al Viro	2012-07-31	1	-7/+7
\| \| \| \| \| \|	... making percpu_counter_destroy() non-blocking Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	nfsd: Push mnt_want_write() outside of i_mutex	Jan Kara	2012-07-31	6	-46/+64
\| \| \| \| \| \| \| \| \| \| \|	When mnt_want_write() starts to handle freezing it will get a full lock semantics requiring proper lock ordering. So push mnt_want_write() call consistently outside of i_mutex. CC: linux-nfs@vger.kernel.org CC: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	btrfs: Push mnt_want_write() outside of i_mutex	Jan Kara	2012-07-31	1	-12/+11
\| \| \| \| \| \| \| \| \| \| \|	When mnt_want_write() starts to handle freezing it will get a full lock semantics requiring proper lock ordering. So push mnt_want_write() call consistently outside of i_mutex. CC: Chris Mason <chris.mason@oracle.com> CC: linux-btrfs@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fat: Push mnt_want_write() outside of i_mutex	Jan Kara	2012-07-31	1	-8/+7
\| \| \| \| \| \| \| \| \| \|	When mnt_want_write() starts to handle freezing it will get a full lock semantics requiring proper lock ordering. So push mnt_want_write() call outside of i_mutex as in other places. CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs: Push mnt_want_write() outside of i_mutex	Jan Kara	2012-07-31	1	-21/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes without it. This isn't really a problem because mnt_want_write() is a non-blocking operation (essentially has a trylock semantics) but when the function starts to handle also frozen filesystems, it will get a full lock semantics and thus proper lock ordering has to be established. So move all mnt_want_write() calls outside of i_mutex. One non-trivial case needing conversion is kern_path_create() / user_path_create() which didn't include mnt_want_write() but now needs to because it acquires i_mutex. Because there are virtual file systems which don't bother with freeze / remount-ro protection we actually provide both versions of the function - one which calls mnt_want_write() and one which does not. [AV: scratch the previous, mnt_want_write() has been moved to kern_path_create() by now] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	mm: Make default vm_ops provide ->page_mkwrite handler	Jan Kara	2012-07-31	3	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make default vm_ops provide ->page_mkwrite handler. Currently it only updates file's modification times and gets locked page but later it will also handle filesystem freezing. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	mm: Update file times from fault path only if .page_mkwrite is not set	Jan Kara	2012-07-31	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Filesystems wanting to properly support freezing need to have control when file_update_time() is called. After pushing file_update_time() to all relevant .page_mkwrite implementations we can just stop calling file_update_time() when filesystem implements .page_mkwrite. Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sysfs: Push file_update_time() into bin_page_mkwrite()	Jan Kara	2012-07-31	1	-0/+2
\| \| \| \| \| \| \|	CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	gfs2: Push file_update_time() into gfs2_page_mkwrite()	Jan Kara	2012-07-31	1	-0/+3
\| \| \| \| \| \| \| \|	CC: Steven Whitehouse <swhiteho@redhat.com> CC: cluster-devel@redhat.com Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	9p: Push file_update_time() into v9fs_vm_page_mkwrite()	Jan Kara	2012-07-31	1	-0/+3
\| \| \| \| \| \| \| \| \|	CC: Eric Van Hensbergen <ericvh@gmail.com> CC: Ron Minnich <rminnich@sandia.gov> CC: Latchesar Ionkov <lucho@ionkov.net> CC: v9fs-developer@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	ceph: Push file_update_time() into ceph_page_mkwrite()	Jan Kara	2012-07-31	1	-0/+3
\| \| \| \| \| \| \| \|	CC: Sage Weil <sage@newdream.net> CC: ceph-devel@vger.kernel.org Acked-by: Sage Weil <sage@newdream.net> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs: Push file_update_time() into __block_page_mkwrite()	Jan Kara	2012-07-31	1	-0/+6
\| \| \| \| \| \| \| \| \|	Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fb_defio: Push file_update_time() into fb_deferred_io_mkwrite()	Jan Kara	2012-07-31	1	-0/+2
\| \| \| \| \| \|	CC: Jaya Kumar <jayalk@intworks.biz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	simplify lookup_open()/atomic_open() - do the temporary mnt_want_write() early	Al Viro	2012-07-31	1	-22/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The write ref to vfsmount taken in lookup_open()/atomic_open() is going to be dropped; we take the one to stay in dentry_open(). Just grab the temporary in caller if it looks like we are going to need it (create/truncate/writable open) and pass (by value) "has it succeeded" flag. Instead of doing mnt_want_write() inside, check that flag and treat "false" as "mnt_want_write() has just failed". mnt_want_write() is cheap and the things get considerably simpler and more robust that way - we get it and drop it in the same function, to start with, rather than passing a "has something in the guts of really scary functions taken it" back to caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fix O_EXCL handling for devices	Al Viro	2012-07-30	1	-2/+2
\| \| \| \| \| \| \|	O_EXCL without O_CREAT has different semantics; it's "fail if already opened", not "fail if already exists". commit 71574865 broke that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	lockd: handle lockowner allocation failure in nlmclnt_proc()	Al Viro	2012-07-29	1	-0/+5
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	lockd: shift grabbing a reference to nlm_host into nlm_alloc_call()	Al Viro	2012-07-29	4	-8/+4
\| \| \| \| \| \| \| \|	It's used both for client and server hosts; we can't do nlmclnt_release_host() on failure exits, since the host might need nlmsvc_release_host(), with BUG_ON() for calling the wrong one. Makes life simpler for callers, actually... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs: add link restriction audit reporting	Kees Cook	2012-07-29	3	-0/+27
\| \| \| \| \| \| \| \| \|	Adds audit messages for unexpected link restriction violations so that system owners will have some sort of potentially actionable information about misbehaving processes. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fs: add link restrictions	Kees Cook	2012-07-29	4	-0/+184
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds symlink and hardlink restrictions to the Linux VFS. Symlinks: A long-standing class of security issues is the symlink-based time-of-check-time-of-use race, most commonly seen in world-writable directories like /tmp. The common method of exploitation of this flaw is to cross privilege boundaries when following a given symlink (i.e. a root process follows a symlink belonging to another user). For a likely incomplete list of hundreds of examples across the years, please see: http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp The solution is to permit symlinks to only be followed when outside a sticky world-writable directory, or when the uid of the symlink and follower match, or when the directory owner matches the symlink's owner. Some pointers to the history of earlier discussion that I could find: 1996 Aug, Zygo Blaxell http://marc.info/?l=bugtraq&m=87602167419830&w=2 1996 Oct, Andrew Tridgell http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html 1997 Dec, Albert D Cahalan http://lkml.org/lkml/1997/12/16/4 2005 Feb, Lorenzo Hernández García-Hierro http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html 2010 May, Kees Cook https://lkml.org/lkml/2010/5/30/144 Past objections and rebuttals could be summarized as: - Violates POSIX. - POSIX didn't consider this situation and it's not useful to follow a broken specification at the cost of security. - Might break unknown applications that use this feature. - Applications that break because of the change are easy to spot and fix. Applications that are vulnerable to symlink ToCToU by not having the change aren't. Additionally, no applications have yet been found that rely on this behavior. - Applications should just use mkstemp() or O_CREATE\|O_EXCL. - True, but applications are not perfect, and new software is written all the time that makes these mistakes; blocking this flaw at the kernel is a single solution to the entire class of vulnerability. - This should live in the core VFS. - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135) - This should live in an LSM. - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188) Hardlinks: On systems that have user-writable directories on the same partition as system files, a long-standing class of security issues is the hardlink-based time-of-check-time-of-use race, most commonly seen in world-writable directories like /tmp. The common method of exploitation of this flaw is to cross privilege boundaries when following a given hardlink (i.e. a root process follows a hardlink created by another user). Additionally, an issue exists where users can "pin" a potentially vulnerable setuid/setgid file so that an administrator will not actually upgrade a system fully. The solution is to permit hardlinks to only be created when the user is already the existing file's owner, or if they already have read/write access to the existing file. Many Linux users are surprised when they learn they can link to files they have no access to, so this change appears to follow the doctrine of "least surprise". Additionally, this change does not violate POSIX, which states "the implementation may require that the calling process has permission to access the existing file"[1]. This change is known to break some implementations of the "at" daemon, though the version used by Fedora and Ubuntu has been fixed[2] for a while. Otherwise, the change has been undisruptive while in use in Ubuntu for the last 1.5 years. [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html [2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279 This patch is based on the patches in Openwall and grsecurity, along with suggestions from Al Viro. I have added a sysctl to enable the protected behavior, and documentation. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	vfs: don't let do_last pass negative dentry to audit_inode	Jeff Layton	2012-07-29	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I can reliably reproduce the following panic by simply setting an audit rule on a recent 3.5.0+ kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 IP: [<ffffffff810d1250>] audit_copy_inode+0x10/0x90 PGD 7acd9067 PUD 7b8fb067 PMD 0 Oops: 0000 [#86] SMP Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc tpm_bios btrfs zlib_deflate libcrc32c kvm_amd kvm joydev virtio_net pcspkr i2c_piix4 floppy virtio_balloon microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan] CPU 0 Pid: 1286, comm: abrt-dump-oops Tainted: G D 3.5.0+ #1 Bochs Bochs RIP: 0010:[<ffffffff810d1250>] [<ffffffff810d1250>] audit_copy_inode+0x10/0x90 RSP: 0018:ffff88007aebfc38 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88003692d860 RCX: 00000000000038c4 RDX: 0000000000000000 RSI: ffff88006baf5d80 RDI: ffff88003692d860 RBP: ffff88007aebfc68 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffff880036d30f00 R14: ffff88006baf5d80 R15: ffff88003692d800 FS: 00007f7562634740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 000000003643d000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process abrt-dump-oops (pid: 1286, threadinfo ffff88007aebe000, task ffff880079614530) Stack: ffff88007aebfdf8 ffff88007aebff28 ffff88007aebfc98 ffffffff81211358 ffff88003692d860 0000000000000000 ffff88007aebfcc8 ffffffff810d4968 ffff88007aebfcc8 ffff8800000038c4 0000000000000000 0000000000000000 Call Trace: [<ffffffff81211358>] ? ext4_lookup+0xe8/0x160 [<ffffffff810d4968>] __audit_inode+0x118/0x2d0 [<ffffffff811955a9>] do_last+0x999/0xe80 [<ffffffff81191fe8>] ? inode_permission+0x18/0x50 [<ffffffff81171efa>] ? kmem_cache_alloc_trace+0x11a/0x130 [<ffffffff81195b4a>] path_openat+0xba/0x420 [<ffffffff81196111>] do_filp_open+0x41/0xa0 [<ffffffff811a24bd>] ? alloc_fd+0x4d/0x120 [<ffffffff811855cd>] do_sys_open+0xed/0x1c0 [<ffffffff810d40cc>] ? __audit_syscall_entry+0xcc/0x300 [<ffffffff811856c1>] sys_open+0x21/0x30 [<ffffffff81611ca9>] system_call_fastpath+0x16/0x1b RSP <ffff88007aebfc38> CR2: 0000000000000040 The problem is that do_last is passing a negative dentry to audit_inode. The comments on lookup_open note that it can pass back a negative dentry if O_CREAT is not set. This patch fixes the oops, but I'm not clear on whether there's a better approach. Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	brcm80211: pointless current->files passed to filp_close()	Al Viro	2012-07-29	1	-1/+1
\| \| \| \| \| \|	... only needed if it's been in descriptor table Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sound_firmware: don't pass crap to filp_close()	Al Viro	2012-07-29	1	-4/+4
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	gadgetfs: clean up	Al Viro	2012-07-29	2	-10/+8
\| \| \| \| \| \| \| \| \|	sigh... * opened files have non-NULL dentries and non-NULL inodes * close_filp() needs current->files only if the file had been in descriptor table. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	slightly reduce lossage in gdm72xx	Al Viro	2012-07-29	2	-17/+12
\| \| \| \| \| \| \| \| \| \|	* filp_close() needs non-NULL second argument only if it'd been in descriptor table * opened files have non-NULL dentries, TYVM * ... and those dentries are positive - it's kinda hard to open a file that doesn't exist. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	slightly reduce idiocy in drivers/staging/bcm/Misc.c	Al Viro	2012-07-29	1	-26/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	a) vfs_llseek() does not access userland pointers of any kind b) neither does filp_close(), for that matter c) ... nor filp_open() d) vfs_read() does, but we do have a wrapper for that (kernel_read()), so there's no need to reinvent it. e) passing current->files to filp_close() on something that never had been in descriptor table is pointless. ISAGN: voodoo dolls to be used on voodoo programmers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	consolidate pipe file creation	Al Viro	2012-07-29	4	-65/+34
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	take grabbing f->f_path to do_dentry_open()	Al Viro	2012-07-29	1	-4/+2
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	uninline file_free_rcu()	Al Viro	2012-07-29	1	-1/+1
\| \| \| \| \| \|	What inline? Its only use is passing its address to call_rcu(), for fuck sake! Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	ecryptfs_lookup_interpose(): allocate dentry_info first	Al Viro	2012-07-29	1	-7/+6
\| \| \| \| \| \|	less work on failure that way Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sanitize ecryptfs_lookup()	Al Viro	2012-07-29	1	-13/+4
\| \| \| \| \| \| \| \| \|	* ->lookup() never gets hit with . or .. * dentry it gets is unhashed, so unless we had gone and hashed it ourselves, there's no need to d_drop() the sucker. * wrong name printed in one of the printks (NULL, in fact) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	clean unix_bind() up a bit	Al Viro	2012-07-29	1	-45/+43
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	pull mnt_want_write()/mnt_drop_write() into ↵	Al Viro	2012-07-29	3	-50/+18
\| \| \| \| \| \| \| \| \|	kern_path_create()/done_path_create() resp. One side effect - attempt to create a cross-device link on a read-only fs fails with EROFS instead of EXDEV now. Makes more sense, POSIX allows, etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	mknod: take sanity checks on mode into the very beginning	Al Viro	2012-07-29	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Note that applying umask can't affect their results. While that affects errno in cases like mknod("/no_such_directory/a", 030000) yielding -EINVAL (due to impossible mode_t) instead of -ENOENT (due to inexistent directory), IMO that makes a lot more sense, POSIX allows to return either and any software that relies on getting -ENOENT instead of -EINVAL in that case deserves everything it gets. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	new helper: done_path_create()	Al Viro	2012-07-29	6	-30/+21
\| \| \| \| \| \|	releases what needs to be released after {kern,user}_path_create() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	pull unlock+dput() out into do_spu_create()	Al Viro	2012-07-29	2	-16/+11
\| \| \| \| \| \|	... and cleaning spufs_create() a bit, while we are at it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	spufs: pull unlock-and-dput() up into spufs_create()	Al Viro	2012-07-29	1	-23/+10
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	spufs_create_context(): simplify failure exits	Al Viro	2012-07-29	1	-7/+1
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	move spu_forget() into spufs_rmdir()	Al Viro	2012-07-29	1	-6/+5
\| \| \| \| \| \| \|	now that __fput() is not done in any callchain containing mmput(), we can do that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file()	Al Viro	2012-07-23	1	-2/+2
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file()	Al Viro	2012-07-23	1	-2/+2
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	switch dentry_open() to struct path, make it grab references itself	Al Viro	2012-07-23	15	-151/+106
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	spufs: shift dget/mntget towards dentry_open()	Al Viro	2012-07-23	1	-28/+18
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	zoran: don't bother with struct file * in zoran_map	Al Viro	2012-07-23	2	-3/+5
\| \| \| \| \| \| \| \|	all we need it for is file->private_data, which is assign-once, already assigned by that point and, incidentally, its value is already in use by zoran ->mmap() anyway. So just store that pointer instead... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	ecryptfs: don't reinvent the wheels, please - use struct completion	Al Viro	2012-07-23	3	-65/+26
\| \| \| \| \| \|	... and keep the sodding requests on stack - they are small enough. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	don't expose I_NEW inodes via dentry->d_inode	Al Viro	2012-07-23	7	-19/+19
\| \| \| \| \| \| \| \| \|	d_instantiate(dentry, inode); unlock_new_inode(inode); is a bad idea; do it the other way round... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	tidy up namei.c a bit	Al Viro	2012-07-23	1	-18/+21
\| \| \| \| \| \|	locking/unlocking for rcu walk taken to a couple of inline helpers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>