op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2009-01-05	1	-19/+13
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: inotify: fix type errors in interfaces fix breakage in reiserfs_new_inode() fix the treatment of jfs special inodes vfs: remove duplicate code in get_fs_type() add a vfs_fsync helper sys_execve and sys_uselib do not call into fsnotify zero i_uid/i_gid on inode allocation inode->i_op is never NULL ntfs: don't NULL i_op isofs check for NULL ->i_op in root directory is dead code affs: do not zero ->i_op kill suid bit only for regular files vfs: lseek(fd, 0, SEEK_CUR) race condition
\| *	inode->i_op is never NULL	Al Viro	2009-01-05	1	-19/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We used to have rather schizophrenic set of checks for NULL ->i_op even though it had been eliminated years ago. You'd need to go out of your way to set it to NULL explicitly _and_ a bunch of code would die on such inodes anyway. After killing two remaining places that still did that bogosity, all that crap can go away. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	fs: symlink write_begin allocation context fix	Nick Piggin	2009-01-04	1	-4/+9
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the write_begin/write_end aops, page_symlink was broken because it could no longer pass a GFP_NOFS type mask into the point where the allocations happened. They are done in write_begin, which would always assume that the filesystem can be entered from reclaim. This bug could cause filesystem deadlocks. The funny thing with having a gfp_t mask there is that it doesn't really allow the caller to arbitrarily tinker with the context in which it can be called. It couldn't ever be GFP_ATOMIC, for example, because it needs to take the page lock. The only thing any callers care about is __GFP_FS anyway, so turn that into a single flag. Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on this flag in their write_begin function. Change __grab_cache_page to accept a nofs argument as well, to honour that flag (while we're there, change the name to grab_cache_page_write_begin which is more instructive and does away with random leading underscores). This is really a more flexible way to go in the end anyway -- if a filesystem happens to want any extra allocations aside from the pagecache ones in ints write_begin function, it may now use GFP_KERNEL (rather than GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a random example). [kosaki.motohiro@jp.fujitsu.com: fix ubifs] [kosaki.motohiro@jp.fujitsu.com: fix fuse] Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> [2.6.28.x] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Cleaned up the calling convention: just pass in the AOP flags untouched to the grab_cache_page_write_begin() function. That just simplifies everybody, and may even allow future expansion of the logic. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	make INIT_FS use the __RW_LOCK_UNLOCKED initialization	Steven Rostedt	2008-12-31	1	-1/+1
\| \| \| \| \| \| \| \| \|	[AV: rediffed on top of unification of init_fs] Initialization of init_fs still uses the deprecated RW_LOCK_UNLOCKED macro. This patch updates it to use the __RW_LOCK_UNLOCKED(lock) macro. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	take init_fs to saner place	Al Viro	2008-12-31	1	-0/+7
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	kill vfs_permission	Christoph Hellwig	2008-12-31	1	-18/+13
\| \| \| \| \| \| \| \| \| \| \| \|	With all the nameidata removal there's no point anymore for this helper. Of the three callers left two will go away with the next lookup series anyway. Also add proper kerneldoc to inode_permission as this is the main permission check routine now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	pass a struct path * to may_open	Christoph Hellwig	2008-12-31	1	-7/+7
\| \| \| \| \| \| \|	No need for the nameidata in may_open - a struct path is enough. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	kill walk_init_root	Christoph Hellwig	2008-12-31	1	-13/+8
\| \| \| \| \| \| \| \|	walk_init_root is a tiny helper that is marked __always_inline, has just one caller and an unused argument. Just merge it into the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	remove incorrect comment in inode_permission	Christoph Hellwig	2008-12-31	1	-1/+0
\| \| \| \| \| \| \| \|	We now pass on all MAY_ flags to the filesystems permission routines, so remove the comment stating the contrary. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	vfs: ensure page symlinks are NUL-terminated	Duane Griffin	2008-12-31	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	On-disk data corruption could cause a page link to have its i_size set to PAGE_SIZE (or a multiple thereof) and its contents all non-NUL. NUL-terminate the link name to ensure this doesn't cause further problems for the kernel. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	introduce new LSM hooks where vfsmount is available.	Kentaro Takeda	2008-12-31	1	-0/+36
\| \| \| \| \| \| \| \| \| \|	Add new LSM hooks for path-based checks. Call them on directory-modifying operations at the points where we still know the vfsmount involved. Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Merge branch 'master' into next	James Morris	2008-12-04	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: fs/nfsd/nfs4recover.c Manually fixed above to use new creds API functions, e.g. nfs4_save_creds(). Signed-off-by: James Morris <jmorris@namei.org>
\| *	don't unlink an active swapfile	Hugh Dickins	2008-11-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Peter Cordes is sorry that he rm'ed his swapfiles while they were in use, he then had no pathname to swapoff. It's a curious little oversight, but not one worth a lot of hackery. Kudos to Willy Tarreau for turning this around from a discussion of synthetic pathnames to how to prevent unlink. Mimic immutable: prohibit unlinking an active swapfile in may_delete() (and don't worry my little head over the tiny race window). Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Willy Tarreau <w@1wt.eu> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Peter Cordes <peter@cordes.ca> Cc: Bodo Eggert <7eggert@gmx.de> Cc: David Newall <davidn@davidnewall.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	CRED: Wrap task credential accesses in the filesystem subsystem	David Howells	2008-11-14	1	-4/+6
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>
*	[PATCH] move executable checking into ->permission()	Miklos Szeredi	2008-10-23	1	-17/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For execute permission on a regular files we need to check if file has any execute bits at all, regardless of capabilites. This check is normally performed by generic_permission() but was also added to the case when the filesystem defines its own ->permission() method. In the latter case the filesystem should be responsible for performing this check. Move the check from inode_permission() inside filesystems which are not calling generic_permission(). Create a helper function execute_ok() that returns true if the inode is a directory or if any execute bits are present in i_mode. Also fix up the following code: - coda control file is never executable - sysctl files are never executable - hfs_permission seems broken on MAY_EXEC, remove - hfsplus_permission is eqivalent to generic_permission(), remove Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
*	[PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent	OGAWA Hirofumi	2008-10-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	This adds LOOKUP_RENAME_TARGET intent for lookup of rename destination. LOOKUP_RENAME_TARGET is going to be used like LOOKUP_CREATE. But since the destination of rename() can be existing directory entry, so it has a difference. Although that difference doesn't matter in my usage, this tells it to user of this intent. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
*	[PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup	OGAWA Hirofumi	2008-10-23	1	-9/+18
\| \| \| \| \| \| \| \| \| \|	lookup_hash() with LOOKUP_PARENT is bogus. And this prepares to add new intent on those path. The user of LOOKUP_PARENT intent is nfs only, and it checks whether nd->flags has LOOKUP_CREATE or LOOKUP_OPEN, so the result is same. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
*	[PATCH vfs-2.6 2/6] vfs: add d_ancestor()	OGAWA Hirofumi	2008-10-23	1	-12/+10
\| \| \| \| \| \| \| \| \| \|	This adds d_ancestor() instead of d_isparent(), then use it. If new_dentry == old_dentry, is_subdir() returns 1, looks strange. "new_dentry == old_dentry" is not subdir obviously. But I'm not checking callers for now, so this keeps current behavior. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
*	[PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()	OGAWA Hirofumi	2008-10-23	1	-2/+2
\| \| \| \|	Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
*	[PATCH] make O_EXCL in nd->intent.flags visible in nd->flags	Al Viro	2008-10-23	1	-1/+3
\| \| \| \| \| \| \|	New flag: LOOKUP_EXCL. Set before doing the final step of pathname resolution on the paths that have LOOKUP_CREATE and O_EXCL. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] get rid of path_lookup_create()	Al Viro	2008-10-23	1	-39/+22
\| \| \| \| \| \| \| \|	... and don't pass bogus flags when we are just looking for parent. Fold __path_lookup_intent_open() into path_lookup_open() while we are at it; that's the only remaining caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] new helper - kern_path()	Al Viro	2008-10-23	1	-0/+10
\| \| \| \| \| \|	Analog of lookup_path(), takes struct path *. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[patch 3/4] vfs: remove unused nameidata argument of may_create()	Miklos Szeredi	2008-08-01	1	-8/+7
\| \| \| \| \|	Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Re: BUG at security/selinux/avc.c:883 (was: Re: linux-next: Tree	Stephen Smalley	2008-08-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	for July 17: early crash on x86-64) SELinux needs MAY_APPEND to be passed down to the security hook. Otherwise, we get permission denials when only append permission is granted by policy even if the opening process specified O_APPEND. Shows up as a regression in the ltp selinux testsuite, fixed by this patch. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] get rid of __user_path_lookup_open	Al Viro	2008-07-26	1	-13/+0
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] new (local) helper: user_path_parent()	Al Viro	2008-07-26	1	-80/+57
\| \| \| \| \| \| \|	Preparation to untangling intents mess: reduce the number of do_path_lookup() callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] sanitize __user_walk_fd() et.al.	Al Viro	2008-07-26	1	-18/+18
\| \| \| \| \| \| \| \| \| \| \| \| \|	* do not pass nameidata; struct path is all the callers want. * switch to new helpers: user_path_at(dfd, pathname, flags, &path) user_path(pathname, &path) user_lpath(pathname, &path) user_path_dir(pathname, &path) (fail if not a directory) The last 3 are trivial macro wrappers for the first one. * remove nameidata in callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] kill nameidata passing to permission(), rename to inode_permission()	Al Viro	2008-07-26	1	-13/+9
\| \| \| \| \| \| \|	Incidentally, the name that gives hundreds of false positives on grep is not a good idea... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] take noexec checks to very few callers that care	Al Viro	2008-07-26	1	-9/+0
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] more nameidata removal: exec_permission_lite() doesn't need it	Al Viro	2008-07-26	1	-3/+2
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] pass MAY_OPEN to vfs_permission() explicitly	Al Viro	2008-07-26	1	-9/+4
\| \| \| \| \| \| \|	... and get rid of the last "let's deduce mask from nameidata->flags" bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] fix MAY_CHDIR/MAY_ACCESS/LOOKUP_ACCESS mess	Al Viro	2008-07-26	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \|	* MAY_CHDIR is redundant - it's an equivalent of MAY_ACCESS * MAY_ACCESS on fuse should affect only the last step of pathname resolution * fchdir() and chroot() should pass MAY_ACCESS, for the same reason why chdir() needs that. * now that we pass MAY_ACCESS explicitly in all cases, LOOKUP_ACCESS can be removed; it has no business being in nameidata. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] kill altroot	Al Viro	2008-07-26	1	-87/+2
\| \| \| \| \| \|	long overdue... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] permission checks for chdir need special treatment only on the last step	Al Viro	2008-07-26	1	-2/+0
\| \| \| \| \| \| \| \|	... so we ought to pass MAY_CHDIR to vfs_permission() instead of having it triggered on every step of preceding pathname resolution. LOOKUP_CHDIR is killed by that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[patch 5/5] vfs: remove mode parameter from vfs_symlink()	Miklos Szeredi	2008-07-26	1	-2/+2
\| \| \| \| \| \| \| \| \|	Remove the unused mode parameter from vfs_symlink and callers. Thanks to Tetsuo Handa for noticing. CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
*	[patch 4/5] vfs: reuse local variable in vfs_link()	Tetsuo Handa	2008-07-26	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \|	Why not reuse "inode" which is assigned as struct inode *inode = old_dentry->d_inode; in the beginning of vfs_link() ? Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
*	[PATCH] sanitize ->permission() prototype	Al Viro	2008-07-26	1	-6/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* kill nameidata * argument; map the 3 bits in ->flags anybody cares about to new MAY_... ones and pass with the mask. * kill redundant gfs2_iop_permission() * sanitize ecryptfs_permission() * fix remaining places where ->permission() instances might barf on new MAY_... found in mask. The obvious next target in that direction is permission(9) folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[patch] vfs: fix lookup on deleted directory	Miklos Szeredi	2008-07-26	1	-2/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Lookup can install a child dentry for a deleted directory. This keeps the directory dentry alive, and the inode pinned in the cache and on disk, even after all external references have gone away. This isn't a big problem normally, since memory pressure or umount will clear out the directory dentry and its children, releasing the inode. But for UBIFS this causes problems because its orphan area can overflow. Fix this by returning ENOENT for all lookups on a S_DEAD directory before creating a child dentry. Thanks to Zoltan Sogor for noticing this while testing UBIFS, and Artem for the excellent analysis of the problem and testing. Reported-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[patch 3/4] vfs: fix ERR_PTR abuse in generic_readlink	Marcin Slusarz	2008-06-23	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	generic_readlink calls ERR_PTR for negative and positive values (vfs_readlink returns length of "link"), but it should not (not an errno) and does not need to. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[patch 1/4] vfs: path_{get,put}() cleanups	Jan Blunck	2008-06-23	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \|	Here are some more places where path_{get,put}() can be used instead of dput()/mntput() pair. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] return to old errno choice in mkdir() et.al.	Al Viro	2008-05-16	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	In case when both EEXIST and EROFS would apply we used to return the former in mkdir(2) and friends. Lest anyone suspects us of being consistent, in the same situation knfsd gave clients nfs_erofs... ro-bind series had switched the syscall side of things to returning -EROFS and immediately broke an application - namely, mkdir -p. Patch restores the original behaviour... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	cgroups: implement device whitelist	Serge E. Hallyn	2008-04-29	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement a cgroup to track and enforce open and mknod restrictions on device files. A device cgroup associates a device access whitelist with each cgroup. A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block). 'all' means it applies to all types and all major and minor numbers. Major and minor are either an integer or * for all. Access is a composition of r (read), w (write), and m (mknod). The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of the parent. Admins can then remove devices from the whitelist or add new entries. A child cgroup can never receive a device access which is denied its parent. However when a device access is removed from a parent it will not also be removed from the child(ren). An entry is added using devices.allow, and removed using devices.deny. For instance echo 'c 1:3 mr' > /cgroups/1/devices.allow allows cgroup 1 to read and mknod the device usually known as /dev/null. Doing echo a > /cgroups/1/devices.deny will remove the default 'a : mrw' entry. CAP_SYS_ADMIN is needed to change permissions or move another task to a new cgroup. A cgroup may not be granted more permissions than the cgroup's parent has. Any task can move itself between cgroups. This won't be sufficient, but we can decide the best way to adequately restrict movement later. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix may-be-used-uninitialized warning] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: James Morris <jmorris@namei.org> Looks-good-to: Pavel Emelyanov <xemul@openvz.org> Cc: Daniel Hokka Zakrisson <daniel@hozac.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] r/o bind mounts: elevate write count for open()s	Dave Hansen	2008-04-19	1	-10/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the first really tricky patch in the series. It elevates the writer count on a mount each time a non-special file is opened for write. We used to do this in may_open(), but Miklos pointed out that __dentry_open() is used as well to create filps. This will cover even those cases, while a call in may_open() would not have. There is also an elevated count around the vfs_create() call in open_namei(). See the comments for more details, but we need this to fix a 'create, remount, fail r/w open()' race. Some filesystems forego the use of normal vfs calls to create struct files. Make sure that these users elevate the mnt writer count because they will get __fput(), and we need to make sure they're balanced. Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] r/o bind mounts: get write access for vfs_rename() callers	Dave Hansen	2008-04-19	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	This also uses the little helper in the NFS code to make an if() a little bit less ugly. We introduced the helper at the beginning of the series. Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] r/o bind mounts: write counts for link/symlink	Dave Hansen	2008-04-19	1	-0/+10
\| \| \| \| \| \| \| \| \|	[AV: add missing nfsd pieces] Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] r/o bind mounts: get callers of vfs_mknod/create/mkdir()	Dave Hansen	2008-04-19	1	-11/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This takes care of all of the direct callers of vfs_mknod(). Since a few of these cases also handle normal file creation as well, this also covers some calls to vfs_create(). So that we don't have to make three mnt_want/drop_write() calls inside of the switch statement, we move some of its logic outside of the switch and into a helper function suggested by Christoph. This also encapsulates a fix for mknod(S_IFREG) that Miklos found. [AV: merged mkdir handling, added missing nfsd pieces] Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] r/o bind mounts: elevate write count for rmdir and unlink.	Dave Hansen	2008-04-19	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	Elevate the write count during the vfs_rmdir() and vfs_unlink(). [AV: merged rmdir and unlink parts, added missing pieces in nfsd] Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] merge open_namei() and do_filp_open()	Christoph Hellwig	2008-04-19	1	-43/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	open_namei() will, in the future, need to take mount write counts over its creation and truncation (via may_open()) operations. It needs to keep these write counts until any potential filp that is created gets __fput()'d. This gets complicated in the error handling and becomes very murky as to how far open_namei() actually got, and whether or not that mount write count was taken. That makes it a bad interface. All that the current do_filp_open() really does is allocate the nameidata on the stack, then call open_namei(). So, this merges those two functions and moves filp_open() over to namei.c so it can be close to its buddy: do_filp_open(). It also gets a kerneldoc comment in the process. Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] do namei_flags calculation inside open_namei()	Dave Hansen	2008-04-19	1	-9/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	My end goal here is to make sure all users of may_open() return filps. This will ensure that we properly release mount write counts which were taken for the filp in may_open(). This patch moves the sys_open flags to namei flags calculation into fs/namei.c. We'll shortly be moving the nameidata_to_filp() calls into namei.c, and this gets the sys_open flags to a place where we can get at them when we need them. Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2008-03-25	1	-31/+32
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] get stack footprint of pathname resolution back to relative sanity [PATCH] double iput() on failure exit in hugetlb [PATCH] double dput() on failure exit in tiny-shmem [PATCH] fix up new filp allocators [PATCH] check for null vfsmount in dentry_open() [PATCH] reiserfs: eliminate private use of struct file in xattr [PATCH] sanitize hppfs hppfs pass vfsmount to dentry_open() [PATCH] restore export of do_kern_mount()