summaryrefslogtreecommitdiffstats
path: root/fs/autofs4
Commit message (Collapse)AuthorAgeFilesLines
* autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode.NeilBrown2014-10-141-6/+20
| | | | | | | | | | | | | | | | | | | | | | If rcu-walk mode we don't *have* to return -EISDIR for non-mount-traps as we will simply drop into REF-walk and handling DCACHE_NEED_AUTOMOUNT dentrys the slow way. But it is better if we do when possible. In 'oz_mode', use the same condition as ref-walk: if not a mountpoint, then it must be -EISDIR. In regular mode there are most tests needed. Most of them can be performed without taking any spinlocks. If we find a directory that isn't obviously empty, and isn't mounted on, we need to call 'simple_empty()' which does take a spinlock. If this turned out to hurt performance, some other approach could be found to signal when a directory is known to be empty. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: avoid taking fs_lock during rcu-walkNeilBrown2014-10-142-8/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | ->fs_lock protects AUTOFS_INF_EXPIRING. We need to be sure that once the flag is set, no new references beneath the dentry are taken. So rcu-walk currently needs to take fs_lock before checking the flag. This hurts performance. Change the expiry to a two-stage process. First set AUTOFS_INF_NO_RCU which forces any path walk into ref-walk mode, then drop the lock and call synchronize_rcu(). Once that returns we can be sure no rcu-walk is active beneath the dentry and we can check reference counts again. Now during an RCU-walk we can test AUTOFS_INF_EXPIRING without taking the lock as along as we test AUTOFS_INF_NO_RCU too. If either are set, we must abort the RCU-walk If neither are set, we know that refcounts will be tested again after we finish the RCU-walk so we are safe to continue. ->fs_lock is still taken in d_manage() to check for a non-trap directory. That will be resolved in the next patch. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: make "autofs4_can_expire" idempotent.NeilBrown2014-10-141-6/+4
| | | | | | | | | | | | | | Have a "test" function change the value it is testing can be confusing, particularly as a future patch will be calling this function twice. So move the update for 'last_used' to avoid repeat expiry to the place where the final determination on what to expire is known. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: factor should_expire() out of autofs4_expire_indirect.NeilBrown2014-10-141-74/+88
| | | | | | | | | | Future patch will potentially call this twice, so make it separate. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: allow RCU-walk to walk through autofs4NeilBrown2014-10-144-18/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This series teaches autofs about RCU-walk so that we don't drop straight into REF-walk when we hit an autofs directory, and so that we avoid spinlocks as much as possible when performing an RCU-walk. This is needed so that the benefits of the recent NFS support for RCU-walk are fully available when NFS filesystems are automounted. Patches have been carefully reviewed and tested both with test suites and in production - thanks a lot to Ian Kent for his support there. This patch (of 6): Any attempt to look up a pathname that passes though an autofs4 mount is currently forced out of RCU-walk into REF-walk. This can significantly hurt performance of many-thread work loads on many-core systems, especially if the automounted filesystem supports RCU-walk but doesn't get to benefit from it. So if autofs4_d_manage is called with rcu_walk set, only fail with -ECHILD if it is necessary to wait longer than a spinlock. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs - remove obsolete d_invalidate() from expireIan Kent2014-10-091-6/+0
| | | | | | | | | | | | | | | | | | | | | | | Biederman's umount-on-rmdir series changes d_invalidate() to sumarily remove mounts under the passed in dentry regardless of whether they are busy or not. So calling this in fs/autofs4/expire.c:autofs4_tree_busy() is definitely the wrong thing to do becuase it will silently umount entries instead of just cleaning stale dentrys. But this call shouldn't be needed and testing shows that automounting continues to function without it. As Al Viro correctly surmises the original intent of the call was to perform what shrink_dcache_parent() does. If at some time in the future I see stale dentries accumulating following failed mounts I'll revisit the issue and possibly add a shrink_dcache_parent() call if needed. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* autofs4: comment typo: remove a a doubled wordNeilBrown2014-08-081-1/+1
| | | | | | | Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: remove some unused inline functionsNeilBrown2014-08-081-49/+0
| | | | | | | | | | | {__,}manage_dentry_{set,clear}_{automount,transit} are 4 unused inline functions. Discard them. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: don't take spinlock when not needed in autofs4_lookup_expiringNeilBrown2014-08-081-2/+6
| | | | | | | | | | | If the expiring_list is empty, we can avoid a costly spinlock in the rcu-walk path through autofs4_d_manage (once the rest of the path becomes rcu-walk friendly). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: remove a redundant assignmentNeilBrown2014-08-081-1/+0
| | | | | | | | | | | | | | The variable 'ino' already exists and already has the correct value. The d_fsdata of a dentry is never changed after the d_fsdata is instantiated, so this new assignment cannot be necessary. It was introduced in commit b5b801779d59 ("autofs4: Add d_manage() dentry operation"). Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: remove unused autofs4_ispending()NeilBrown2014-08-081-14/+0
| | | | | | | Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: fix false positive compile errorIan Kent2014-07-031-1/+1
| | | | | | | | | | | | | | | | | | On strict build environments we can see: fs/autofs4/inode.c: In function 'autofs4_fill_super': fs/autofs4/inode.c:312: error: 'pgrp' may be used uninitialized in this function make[2]: *** [fs/autofs4/inode.o] Error 1 make[1]: *** [fs/autofs4] Error 2 make: *** [fs] Error 2 make: *** Waiting for unfinished jobs.... This is due to the use of pgrp_set being used to indicate pgrp has has been set rather than initializing pgrp itself. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/autofs4/dev-ioctl.c: add __init to autofs_dev_ioctl_initFabian Frederick2014-06-041-1/+1
| | | | | | | | | autofs_dev_ioctl_init is only called by __init init_autofs4_fs Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs: fix lockref lookupIan Kent2014-05-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | autofs needs to be able to see private data dentry flags for its dentrys that are being created but not yet hashed and for its dentrys that have been rmdir()ed but not yet freed. It needs to do this so it can block processes in these states until a status has been returned to indicate the given operation is complete. It does this by keeping two lists, active and expring, of dentrys in this state and uses ->d_release() to keep them stable while it checks the reference count to determine if they should be used. But with the recent lockref changes dentrys being freed sometimes don't transition to a reference count of 0 before being freed so autofs can occassionally use a dentry that is invalid which can lead to a panic. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: check dev ioctl size before allocatingSasha Levin2014-04-081-0/+3
| | | | | | | | | | | | | There wasn't any check of the size passed from userspace before trying to allocate the memory required. This meant that userspace might request more space than allowed, triggering an OOM. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs: fix symlinks aren't checked for expiryIan Kent2014-01-232-0/+18
| | | | | | | | | | | | | | | | | | | | | | The autofs4 module doesn't consider symlinks for expire as it did in the older autofs v3 module (so it's actually a long standing regression). The user space daemon has focused on the use of bind mounts instead of symlinks for a long time now and that's why this has not been noticed. But with the future addition of amd map parsing to automount(8), not to mention amd itself (of am-utils), symlink expiry will be needed. The direct and offset mount types can't be symlinks and the tree mounts of version 4 were always real mounts so only indirect mounts need expire symlinks. Since the current users of the autofs4 module haven't reported this as a problem to date this patch probably isn't a candidate for backport to stable. Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs: use IS_ROOT to replace root dentry checksRui Xiang2014-01-231-3/+3
| | | | | | | | | | Use the helper macro !IS_ROOT to replace parent != dentry->d_parent. Just clean up. Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs: fix the return value of autofs4_fill_superRui Xiang2014-01-231-5/+8
| | | | | | | | | | | While kzallocing sbi/ino fails, it should return -ENOMEM. And it should return the err value from autofs_prepare_pipe. Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: translate pids to the right namespace for the daemonMiklos Szeredi2014-01-231-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | The PID and the TGID of the process triggering the mount are sent to the daemon. Currently the global pid values are sent (ones valid in the initial pid namespace) but this is wrong if the autofs daemon itself is not running in the initial pid namespace. So send the pid values that are valid in the namespace of the autofs daemon. The namespace to use is taken from the oz_pgrp pid pointer, which was set at mount time to the mounting process' pid namespace. If the pid translation fails (the triggering process is in an unrelated pid namespace) then the automount fails with ENOENT. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Eric Biederman <ebiederm@xmission.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: allow autofs to work outside the initial PID namespaceSukadev Bhattiprolu2014-01-233-13/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable autofs4 to work in a "container". oz_pgrp is converted from pid_t to struct pid and this is stored at mount time based on the "pgrp=" option or if the option is missing then the current pgrp. The "pgrp=" option is interpreted in the PID namespace of the current process. This option is flawed in that it doesn't carry the namespace information, so it should be deprecated. AFAICS the autofs daemon always sends the current pgrp, which is the default anyway. The oz_pgrp is also set from the AUTOFS_DEV_IOCTL_SETPIPEFD_CMD ioctl. This ioctl sets oz_pgrp to the current pgrp. It is not allowed to change the pid namespace. oz_pgrp is used mainly to determine whether the process traversing the autofs mount tree is the autofs daemon itself or not. This function now compares the pid pointers instead of the pid_t values. One other use of oz_pgrp is in autofs4_show_options. There is shows the virtual pid number (i.e. the one that is valid inside the PID namespace of the calling process) For debugging printk convert oz_pgrp to the value in the initial pid namespace. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Eric Biederman <ebiederm@xmission.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: make freeing sbi rcu-delayedAl Viro2013-10-242-9/+5
| | | | | | makes ->d_managed() safety in RCU mode independent from vfsmount_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* file->f_op is never NULL...Al Viro2013-10-242-7/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* autofs4: close the races around autofs4_notify_daemon()Al Viro2013-09-161-10/+3
| | | | | | | | | | Don't drop ->wq_mutex before calling autofs4_notify_daemon() only to regain it there. Besides being pointless, that opens a race window where autofs4_wait_release() could've come and freed wq->name.name. And do the debugging printk in the "reused an existing wq" case before dropping ->wq_mutex - the same reason... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Ian Kent <raven@themaw.net>
* autofs4 - fix device ioctl mount lookupIan Kent2013-09-081-11/+12
| | | | | | | | | | | | | | | | | When reconnecting to automounts at startup an autofs ioctl is used to find the device and inode of existing mounts so they can be used to open a file descriptor of possibly covered mounts. At this time the the caller might not yet "own" the mount so it can trigger calling ->d_automount(). This causes automount to hang when trying to reconnect to direct or offset mount types. Consequently kern_path() can't be used but kern_path_mountpoint() can be. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Jeff Layton <jlayton@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* helper for reading ->d_countAl Viro2013-07-052-5/+5
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [readdir] switch dcache_readdir() users to ->iterate()Al Viro2013-06-291-2/+2
| | | | | | | new helpers - dir_emit_dot(file, ctx, dentry), dir_emit_dotdot(file, ctx), dir_emit_dots(file, ctx). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* autofs - remove autofs dentry mount checkDavid Jeffery2013-05-061-9/+0
| | | | | | | | | | | | | | | | | | | When checking if an autofs mount point is busy it isn't sufficient to only check if it's a mount point. For example, if the mount of an offset mountpoint in a tree is denied for this host by its export and the dentry becomes a process working directory the check incorrectly returns the mount as not in use at expire. This can happen since the default when mounting within a tree is nostrict, which means ingnore mount fails on mounts within the tree and continue. The nostrict option is meant to allow mounting in this case. Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs - fix sparse warning for autofs4_d_manage()Claudiu Ghioc2013-05-061-1/+1
| | | | | | | | | | | | | Fixed the sparse warning: fs/autofs4/root.c:411:5: warning: symbol 'autofs4_d_manage' was not declared. Should it be static?" [ Clearly it should be static as the function is declared static at the top of root.c. - imk ] Signed-off-by: Claudiu Ghioc <claudiu.ghioc@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: Limit sys_mount to only request filesystem modules.Eric W. Biederman2013-03-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* autofs4 - autofs4_catatonic_mode(): remove redundant null check on kfree()Tim Gardner2013-03-011-4/+2
| | | | | | | | | | | smatch analysis: fs/autofs4/waitq.c:46 autofs4_catatonic_mode() info: redundant null check on wq->name.name calling kfree() Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Ian Kent <raven@themaw.net> Cc: autofs@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs - Fix sparse warning: context imbalance in autofs4_d_automount() ↵Peter Huewe2013-03-011-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | different lock contexts for basic block Sparse complains: fs/autofs4/root.c:409:9: sparse: context imbalance in 'autofs4_d_automount' - different lock contexts for basic block This was introduced by commit f55fb0c24386 ("autofs4 - dont clear DCACHE_NEED_AUTOMOUNT on rootless mount") The function autofs4_d_automount can be left with the (&sbi->fs_lock) held if sbi->version <= 4 and simple_empty(dentry) == false so the warning seems valid. --> Add an spin_unlock in this case before we jump to done Unfortunately compile tested only. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: change return values from -EACCES to -EPERMZhao Hongjiang2013-02-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | According to SUSv3: [EACCES] Permission denied. An attempt was made to access a file in a way forbidden by its file access permissions. [EPERM] Operation not permitted. An attempt was made to perform an operation limited to processes with appropriate privileges or to the owner of a file or other resource. So -EPERM should be returned if capability checks fails. Strictly speaking this is an API change since the error code user sees is altered. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new helper: file_inode(file)Al Viro2013-02-223-4/+4
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2012-12-174-17/+24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "While small this set of changes is very significant with respect to containers in general and user namespaces in particular. The user space interface is now complete. This set of changes adds support for unprivileged users to create user namespaces and as a user namespace root to create other namespaces. The tyranny of supporting suid root preventing unprivileged users from using cool new kernel features is broken. This set of changes completes the work on setns, adding support for the pid, user, mount namespaces. This set of changes includes a bunch of basic pid namespace cleanups/simplifications. Of particular significance is the rework of the pid namespace cleanup so it no longer requires sending out tendrils into all kinds of unexpected cleanup paths for operation. At least one case of broken error handling is fixed by this cleanup. The files under /proc/<pid>/ns/ have been converted from regular files to magic symlinks which prevents incorrect caching by the VFS, ensuring the files always refer to the namespace the process is currently using and ensuring that the ptrace_mayaccess permission checks are always applied. The files under /proc/<pid>/ns/ have been given stable inode numbers so it is now possible to see if different processes share the same namespaces. Through the David Miller's net tree are changes to relax many of the permission checks in the networking stack to allowing the user namespace root to usefully use the networking stack. Similar changes for the mount namespace and the pid namespace are coming through my tree. Two small changes to add user namespace support were commited here adn in David Miller's -net tree so that I could complete the work on the /proc/<pid>/ns/ files in this tree. Work remains to make it safe to build user namespaces and 9p, afs, ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the Kconfig guard remains in place preventing that user namespaces from being built when any of those filesystems are enabled. Future design work remains to allow root users outside of the initial user namespace to mount more than just /proc and /sys." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits) proc: Usable inode numbers for the namespace file descriptors. proc: Fix the namespace inode permission checks. proc: Generalize proc inode allocation userns: Allow unprivilged mounts of proc and sysfs userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file procfs: Print task uids and gids in the userns that opened the proc file userns: Implement unshare of the user namespace userns: Implent proc namespace operations userns: Kill task_user_ns userns: Make create_new_namespaces take a user_ns parameter userns: Allow unprivileged use of setns. userns: Allow unprivileged users to create new namespaces userns: Allow setting a userns mapping to your current uid. userns: Allow chown and setgid preservation userns: Allow unprivileged users to create user namespaces. userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped userns: fix return value on mntns_install() failure vfs: Allow unprivileged manipulation of the mount namespace. vfs: Only support slave subtrees across different user namespaces vfs: Add a user namespace reference from struct mnt_namespace ...
| * userns: Support autofs4 interacing with multiple user namespacesEric W. Biederman2012-11-144-17/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use kuid_t and kgid_t in struct autofs_info and struct autofs_wait_queue. When creating directories and symlinks default the uid and gid of the mount requester to the global root uid and gid. autofs4_wait will update these fields when a mount is requested. When generating autofsv5 packets report the uid and gid of the mount requestor in user namespace of the process that opened the pipe, reporting unmapped uids and gids as overflowuid and overflowgid. In autofs_dev_ioctl_requester return the uid and gid of the last mount requester converted into the calling processes user namespace. When the uid or gid don't map return overflowuid and overflowgid as appropriate, allowing failure to find a mount requester to be distinguished from failure to map a mount requester. The uid and gid mount options specifying the user and group of the root autofs inode are converted into kuid and kgid as they are parsed defaulting to the current uid and current gid of the process that mounts autofs. Mounting of autofs for the present remains confined to processes in the initial user namespace. Cc: Ian Kent <raven@themaw.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* | autofs4 - use simple_empty() for empty directory checkIan Kent2012-12-131-17/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For direct (and offset) mounts, if an automounted mount is manually umounted the trigger mount dentry can appear non-empty causing it to not trigger mounts. This can also happen if there is a file handle leak in a user space automounting application. This happens because, when a ioctl control file handle is opened on the mount, a cursor dentry is created which causes list_empty() to see the dentry as non-empty. Since there is a case where listing the directory of these dentrys is needed, the use of dcache_dir_*() functions for .open() and .release() is needed. Consequently simple_empty() must be used instead of list_empty() when checking for an empty directory. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | autofs4 - dont clear DCACHE_NEED_AUTOMOUNT on rootless mountIan Kent2012-12-132-34/+36
|/ | | | | | | | | | | | | | | The DCACHE_NEED_AUTOMOUNT flag is cleared on mount and set on expire for autofs rootless multi-mount dentrys to prevent unnecessary calls to ->d_automount(). Since DCACHE_MANAGE_TRANSIT is always set on autofs dentrys ->d_managed() is always called so the check can be done in ->d_manage() without the need to change the flag. This still avoids unnecessary calls to ->d_automount(), adds negligible overhead and eliminates a seriously ugly check in the expire code. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4 - fix reset pending flag on mount failIan Kent2012-10-111-2/+4
| | | | | | | | | | | | | In autofs4_d_automount(), if a mount fail occurs the AUTOFS_INF_PENDING mount pending flag is not cleared. One effect of this is when using the "browse" option, directory entry attributes show up with all "?"s due to the incorrect callback and subsequent failure return (when in fact no callback should be made). Signed-off-by: Ian Kent <ikent@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* make get_file() return its argumentAl Viro2012-09-261-2/+1
| | | | | | simplifies a bunch of callers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* autofs4: don't open-code fd_install()Al Viro2012-09-261-16/+2
| | | | | | | | | The only difference between autofs_dev_ioctl_fd_install() and fd_install() is __set_close_on_exec() done by the latter. Just use get_unused_fd_flags(O_CLOEXEC) to allocate the descriptor and be done with that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* autofs4 - fix expire checkIan Kent2012-08-171-5/+0
| | | | | | | | | | | | | | | | | | | | | | In some cases when an autofs indirect mount is contained in a file system that is marked as shared (such as when systemd does the equivalent of "mount --make-rshared /" early in the boot), mounts stop expiring. When this happens the first expiry check on a mountpoint dentry in autofs_expire_indirect() sees a mountpoint dentry with a higher than minimal reference count. Consequently the dentry is condidered busy and the actual expiry check is never done. This particular check was originally meant as an optimisation to detect a path walk in progress but with the addition of rcu-walk it can be ineffective anyway. Removing the test allows automounts to expire again since the actual expire check doesn't rely on the dentry reference count. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4 - fix get_next_positive_subdir()Ian Kent2012-08-161-18/+13
| | | | | | | | | | | | | | | | | | Following a report of a crash during an automount expire I found that the locking in fs/autofs4/expire.c:get_next_positive_subdir() was wrong. Not only is the locking wrong but the function is more complex than it needs to be. The function is meant to calculate (and dget) the next entry in the list of directories contained in the root of an autofs mount point (an autofs indirect mount to be precise). The main problem was that the d_lock of the owner of the list was not being taken when walking the list, which lead to list corruption under load. The only other lock that needs to be taken is against the next dentry candidate so it can be checked for usability. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* switch dentry_open() to struct path, make it grab references itselfAl Viro2012-07-231-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* stop passing nameidata to ->lookup()Al Viro2012-07-141-2/+2
| | | | | | | | | Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linuxLinus Torvalds2012-05-281-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull writeback tree from Wu Fengguang: "Mainly from Jan Kara to avoid iput() in the flusher threads." * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Avoid iput() from flusher thread vfs: Rename end_writeback() to clear_inode() vfs: Move waiting for inode writeback from end_writeback() to evict_inode() writeback: Refactor writeback_single_inode() writeback: Remove wb->list_lock from writeback_single_inode() writeback: Separate inode requeueing after writeback writeback: Move I_DIRTY_PAGES handling writeback: Move requeueing when I_SYNC set to writeback_sb_inodes() writeback: Move clearing of I_SYNC into inode_sync_complete() writeback: initialize global_dirty_limit fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds mm: page-writeback.c: local functions should not be exposed globally
| * vfs: Rename end_writeback() to clear_inode()Jan Kara2012-05-061-1/+1
| | | | | | | | | | | | | | | | | | After we moved inode_sync_wait() from end_writeback() it doesn't make sense to call the function end_writeback() anymore. Rename it to clear_inode() which well says what the function really does - set I_CLEAR flag. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
* | autofs: make the autofsv5 packet file descriptor use a packetized pipeLinus Torvalds2012-04-293-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The autofs packet size has had a very unfortunate size problem on x86: because the alignment of 'u64' differs in 32-bit and 64-bit modes, and because the packet data was not 8-byte aligned, the size of the autofsv5 packet structure differed between 32-bit and 64-bit modes despite looking otherwise identical (300 vs 304 bytes respectively). We first fixed that up by making the 64-bit compat mode know about this problem in commit a32744d4abae ("autofs: work around unhappy compat problem on x86-64"), and that made a 32-bit 'systemd' work happily on a 64-bit kernel because everything then worked the same way as on a 32-bit kernel. But it turned out that 'automount' had actually known and worked around this problem in user space, so fixing the kernel to do the proper 32-bit compatibility handling actually *broke* 32-bit automount on a 64-bit kernel, because it knew that the packet sizes were wrong and expected those incorrect sizes. As a result, we ended up reverting that compatibility mode fix, and thus breaking systemd again, in commit fcbf94b9dedd. With both automount and systemd doing a single read() system call, and verifying that they get *exactly* the size they expect but using different sizes, it seemed that fixing one of them inevitably seemed to break the other. At one point, a patch I seriously considered applying from Michael Tokarev did a "strcmp()" to see if it was automount that was doing the operation. Ugly, ugly. However, a prettier solution exists now thanks to the packetized pipe mode. By marking the communication pipe as being packetized (by simply setting the O_DIRECT flag), we can always just write the bigger packet size, and if user-space does a smaller read, it will just get that partial end result and the extra alignment padding will simply be thrown away. This makes both automount and systemd happy, since they now get the size they asked for, and the kernel side of autofs simply no longer needs to care - it could pad out the packet arbitrarily. Of course, if there is some *other* user of autofs (please, please, please tell me it ain't so - and we haven't heard of any) that tries to read the packets with multiple writes, that other user will now be broken - the whole point of the packetized mode is that one system call gets exactly one packet, and you cannot read a packet in pieces. Tested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Ian Kent <raven@themaw.net> Cc: Thomas Meyer <thomas@m3y3r.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Revert "autofs: work around unhappy compat problem on x86-64"Linus Torvalds2012-04-284-23/+3
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit a32744d4abae24572eff7269bc17895c41bd0085. While that commit was technically the right thing to do, and made the x86-64 compat mode work identically to native 32-bit mode (and thus fixing the problem with a 32-bit systemd install on a 64-bit kernel), it turns out that the automount binaries had workarounds for this compat problem. Now, the workarounds are disgusting: doing an "uname()" to find out the architecture of the kernel, and then comparing it for the 64-bit cases and fixing up the size of the read() in automount for those. And they were confused: it's not actually a generic 64-bit issue at all, it's very much tied to just x86-64, which has different alignment for an 'u64' in 64-bit mode than in 32-bit mode. But the end result is that fixing the compat layer actually breaks the case of a 32-bit automount on a x86-64 kernel. There are various approaches to fix this (including just doing a "strcmp()" on current->comm and comparing it to "automount"), but I think that I will do the one that teaches pipes about a special "packet mode", which will allow user space to not have to care too deeply about the padding at the end of the autofs packet. That change will make the compat workaround unnecessary, so let's revert it first, and get automount working again in compat mode. The packetized pipes will then fix autofs for systemd. Reported-and-requested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Ian Kent <raven@themaw.net> Cc: stable@kernel.org # for 3.3 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'x86-x32-for-linus' of ↵Linus Torvalds2012-03-291-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x32 support for x86-64 from Ingo Molnar: "This tree introduces the X32 binary format and execution mode for x86: 32-bit data space binaries using 64-bit instructions and 64-bit kernel syscalls. This allows applications whose working set fits into a 32 bits address space to make use of 64-bit instructions while using a 32-bit address space with shorter pointers, more compressed data structures, etc." Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c} * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) x32: Fix alignment fail in struct compat_siginfo x32: Fix stupid ia32/x32 inversion in the siginfo format x32: Add ptrace for x32 x32: Switch to a 64-bit clock_t x32: Provide separate is_ia32_task() and is_x32_task() predicates x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls x86/x32: Fix the binutils auto-detect x32: Warn and disable rather than error if binutils too old x32: Only clear TIF_X32 flag once x32: Make sure TS_COMPAT is cleared for x32 tasks fs: Remove missed ->fds_bits from cessation use of fd_set structs internally fs: Fix close_on_exec pointer in alloc_fdtable x32: Drop non-__vdso weak symbols from the x32 VDSO x32: Fix coding style violations in the x32 VDSO code x32: Add x32 VDSO support x32: Allow x32 to be configured x32: If configured, add x32 system calls to system call tables x32: Handle process creation x32: Signal-related system calls x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h> ...
| * Wrap accesses to the fd_sets in struct fdtableDavid Howells2012-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wrap accesses to the fd_sets in struct fdtable (for recording open files and close-on-exec flags) so that we can move away from using fd_sets since we abuse the fd_set structs by not allocating the full-sized structure under normal circumstances and by non-core code looking at the internals of the fd_sets. The first abuse means that use of FD_ZERO() on these fd_sets is not permitted, since that cannot be told about their abnormal lengths. This introduces six wrapper functions for setting, clearing and testing close-on-exec flags and fd-is-open flags: void __set_close_on_exec(int fd, struct fdtable *fdt); void __clear_close_on_exec(int fd, struct fdtable *fdt); bool close_on_exec(int fd, const struct fdtable *fdt); void __set_open_fd(int fd, struct fdtable *fdt); void __clear_open_fd(int fd, struct fdtable *fdt); bool fd_is_open(int fd, const struct fdtable *fdt); Note that I've prepended '__' to the names of the set/clear functions because they require the caller to hold a lock to use them. Note also that I haven't added wrappers for looking behind the scenes at the the array. Possibly that should exist too. Signed-off-by: David Howells <dhowells@redhat.com> Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk>
OpenPOWER on IntegriCloud