summaryrefslogtreecommitdiffstats
path: root/fs/mount.h
Commit message (Collapse)AuthorAgeFilesLines
* mnt: Add a per mount namespace limit on the number of mountsEric W. Biederman2016-09-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CAI Qian <caiqian@redhat.com> pointed out that the semantics of shared subtrees make it possible to create an exponentially increasing number of mounts in a mount namespace. mkdir /tmp/1 /tmp/2 mount --make-rshared / for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done Will create create 2^20 or 1048576 mounts, which is a practical problem as some people have managed to hit this by accident. As such CVE-2016-6213 was assigned. Ian Kent <raven@themaw.net> described the situation for autofs users as follows: > The number of mounts for direct mount maps is usually not very large because of > the way they are implemented, large direct mount maps can have performance > problems. There can be anywhere from a few (likely case a few hundred) to less > than 10000, plus mounts that have been triggered and not yet expired. > > Indirect mounts have one autofs mount at the root plus the number of mounts that > have been triggered and not yet expired. > > The number of autofs indirect map entries can range from a few to the common > case of several thousand and in rare cases up to between 30000 and 50000. I've > not heard of people with maps larger than 50000 entries. > > The larger the number of map entries the greater the possibility for a large > number of active mounts so it's not hard to expect cases of a 1000 or somewhat > more active mounts. So I am setting the default number of mounts allowed per mount namespace at 100,000. This is more than enough for any use case I know of, but small enough to quickly stop an exponential increase in mounts. Which should be perfect to catch misconfigurations and malfunctioning programs. For anyone who needs a higher limit this can be changed by writing to the new /proc/sys/fs/mount-max sysctl. Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* mntns: Add a limit on the number of mount namespaces.Eric W. Biederman2016-08-311-0/+1
| | | | | | | | | | v2: Fixed the very obvious lack of setting ucounts on struct mnt_ns reported by Andrei Vagin, and the kbuild test report. Reported-by: Andrei Vagin <avagin@openvz.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* fs: use seq_open_private() for proc_mountsYann Droneaud2015-06-301-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A patchset to remove support for passing pre-allocated struct seq_file to seq_open(). Such feature is undocumented and prone to error. In particular, if seq_release() is used in release handler, it will kfree() a pointer which was not allocated by seq_open(). So this patchset drops support for pre-allocated struct seq_file: it's only of use in proc_namespace.c and can be easily replaced by using seq_open_private()/seq_release_private(). Additionally, it documents the use of file->private_data to hold pointer to struct seq_file by seq_open(). This patch (of 3): Since patch described below, from v2.6.15-rc1, seq_open() could use a struct seq_file already allocated by the caller if the pointer to the structure is stored in file->private_data before calling the function. Commit 1abe77b0fc4b485927f1f798ae81a752677e1d05 Author: Al Viro <viro@zeniv.linux.org.uk> Date: Mon Nov 7 17:15:34 2005 -0500 [PATCH] allow callers of seq_open do allocation themselves Allow caller of seq_open() to kmalloc() seq_file + whatever else they want and set ->private_data to it. seq_open() will then abstain from doing allocation itself. Such behavior is only used by mounts_open_common(). In order to drop support for such uncommon feature, proc_mounts is converted to use seq_open_private(), which take care of allocating the proc_mounts structure, making it available through ->private in struct seq_file. Conversely, proc_mounts is converted to use seq_release_private(), in order to release the private structure allocated by seq_open_private(). Then, ->private is used directly instead of proc_mounts() macro to access to the proc_mounts structure. Link: http://lkml.kernel.org/r/cover.1433193673.git.ydroneaud@opteya.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* new helper: __legitimize_mnt()Al Viro2015-05-111-0/+1
| | | | | | | | | | same as legitimize_mnt(), except that it does *not* drop and regain rcu_read_lock; return values are 0 => grabbed a reference, we are fine 1 => failed, just go away -1 => failed, go away and mntput(bastard) when outside of rcu_read_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch the IO-triggering parts of umount to fs_pinAl Viro2015-01-251-1/+3
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* common object embedded into various struct ....nsAl Viro2014-12-041-1/+2
| | | | | | | for now - just move corresponding ->proc_inum instances over there Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Add a function to lazily unmount all mounts from any dentry.Eric W. Biederman2014-10-091-0/+9
| | | | | | | | | | | | | | | | | | | | | | | The new function detach_mounts comes in two pieces. The first piece is a static inline test of d_mounpoint that returns immediately without taking any locks if d_mounpoint is not set. In the common case when mountpoints are absent this allows the vfs to continue running with it's same cacheline foot print. The second piece of detach_mounts __detach_mounts actually does the work and it assumes that a mountpoint is present so it is slow and takes namespace_sem for write, and then locks the mount hash (aka mount_lock) after a struct mountpoint has been found. With those two locks held each entry on the list of mounts on a mountpoint is selected and lazily unmounted until all of the mount have been lazily unmounted. v7: Wrote a proper change description and removed the changelog documenting deleted wrong turns. Signed-off-by: Eric W. Biederman <ebiederman@twitter.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Keep a list of mounts on a mount pointEric W. Biederman2014-10-091-0/+2
| | | | | | | | | | | To spot any possible problems call BUG if a mountpoint is put when it's list of mounts is not empty. AV: use hlist instead of list_head Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Eric W. Biederman <ebiederman@twitter.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Don't allow overwriting mounts in the current mount namespaceEric W. Biederman2014-10-091-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for allowing mountpoints to be renamed and unlinked in remote filesystems and in other mount namespaces test if on a dentry there is a mount in the local mount namespace before allowing it to be renamed or unlinked. The primary motivation here are old versions of fusermount unmount which is not safe if the a path can be renamed or unlinked while it is verifying the mount is safe to unmount. More recent versions are simpler and safer by simply using UMOUNT_NOFOLLOW when unmounting a mount in a directory owned by an arbitrary user. Miklos Szeredi <miklos@szeredi.hu> reports this is approach is good enough to remove concerns about new kernels mixed with old versions of fusermount. A secondary motivation for restrictions here is that it removing empty directories that have non-empty mount points on them appears to violate the rule that rmdir can not remove empty directories. As Linus Torvalds pointed out this is useful for programs (like git) that test if a directory is empty with rmdir. Therefore this patch arranges to enforce the existing mount point semantics for local mount namespace. v2: Rewrote the test to be a drop in replacement for d_mountpoint v3: Use bool instead of int as the return type of is_local_mountpoint Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* delayed mntputAl Viro2014-10-091-1/+4
| | | | | | | | | | | | | | | | On final mntput() we want fs shutdown to happen before return to userland; however, the only case where we want it happen right there (i.e. where task_work_add won't do) is MNT_INTERNAL victim. Those have to be fully synchronous - failure halfway through module init might count on having vfsmount killed right there. Fortunately, final mntput on MNT_INTERNAL vfsmounts happens on shallow stack. So we handle those synchronously and do an analog of delayed fput logics for everything else. As the result, we are guaranteed that fs shutdown will always happen on shallow stack. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* death to mnt_pinnedAl Viro2014-08-071-1/+0
| | | | | | | | | | | | | | | | | | | | | Rather than playing silly buggers with vfsmount refcounts, just have acct_on() ask fs/namespace.c for internal clone of file->f_path.mnt and replace it with said clone. Then attach the pin to original vfsmount. Voila - the clone will be alive until the file gets closed, making sure that underlying superblock remains active, etc., and we can drop the original vfsmount, so that it's not kept busy. If the file lives until the final mntput of the original vfsmount, we'll notice that there's an fs_pin (one in bsd_acct_struct that holds that file) and mnt_pin_kill() will take it out. Since ->kill() is synchronous, we won't proceed past that point until these files are closed (and private clones of our vfsmount are gone), so we get the same ordering warranties we used to get. mnt_pin()/mnt_unpin()/->mnt_pinned is gone now, and good riddance - it never became usable outside of kernel/acct.c (and racy wrt umount even there). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* acct: get rid of acct_listAl Viro2014-08-071-0/+1
| | | | | | | | Put these suckers on per-vfsmount and per-superblock lists instead. Note: right now it's still acct_lock for everything, but that's going to change. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* reduce m_start() cost...Al Viro2014-04-011-1/+4
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch mnt_hash to hlistAl Viro2014-03-301-1/+1
| | | | | | | | | | | | | fixes RCU bug - walking through hlist is safe in face of element moves, since it's self-terminating. Cyclic lists are not - if we end up jumping to another hash chain, we'll loop infinitely without ever hitting the original list head. [fix for dumb braino folded] Spotted by: Max Kellermann <mk@cm4all.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* resizable namespace.c hashesAl Viro2014-03-301-1/+1
| | | | | | | | | * switch allocation to alloc_large_system_hash() * make sizes overridable by boot parameters (mhash_entries=, mphash_entries=) * switch mountpoint_hashtable from list_head to hlist_head Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Is mounted should be testing mnt_ns for NULL or error.Eric W. Biederman2014-01-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | A bug was introduced with the is_mounted helper function in commit f7a99c5b7c8bd3d3f533c8b38274e33f3da9096e Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sat Jun 9 00:59:08 2012 -0400 get rid of ->mnt_longterm it's enough to set ->mnt_ns of internal vfsmounts to something distinct from all struct mnt_namespace out there; then we can just use the check for ->mnt_ns != NULL in the fast path of mntput_no_expire() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> The intent was to test if the real_mount(vfsmount)->mnt_ns was NULL_OR_ERR but the code is actually testing real_mount(vfsmount) and always returning true. The result is d_absolute_path returning paths it should be hiding. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* RCU'd vfsmountsAl Viro2013-11-091-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * RCU-delayed freeing of vfsmounts * vfsmount_lock replaced with a seqlock (mount_lock) * sequence number from mount_lock is stored in nameidata->m_seq and used when we exit RCU mode * new vfsmount flag - MNT_SYNC_UMOUNT. Set by umount_tree() when its caller knows that vfsmount will have no surviving references. * synchronize_rcu() done between unlocking namespace_sem in namespace_unlock() and doing pending mntput(). * new helper: legitimize_mnt(mnt, seq). Checks the mount_lock sequence number against seq, then grabs reference to mnt. Then it rechecks mount_lock again to close the race and either returns success or drops the reference it has acquired. The subtle point is that in case of MNT_SYNC_UMOUNT we can simply decrement the refcount and sod off - aforementioned synchronize_rcu() makes sure that final mntput() won't come until we leave RCU mode. We need that, since we don't want to end up with some lazy pathwalk racing with umount() and stealing the final mntput() from it - caller of umount() may expect it to return only once the fs is shut down and we don't want to break that. In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do full-blown mntput() in case of mount_lock sequence number mismatch happening just as we'd grabbed the reference, but in those cases we won't be stealing the final mntput() from anything that would care. * mntput_no_expire() doesn't lock anything on the fast path now. Incidentally, SMP and UP cases are handled the same way - no ifdefs there. * normal pathname resolution does *not* do any writes to mount_lock. It does, of course, bump the refcounts of vfsmount and dentry in the very end, but that's it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* split __lookup_mnt() in two functionsAl Viro2013-10-241-1/+2
| | | | | | | | Instead of passing the direction as argument (and checking it on every step through the hash chain), just have separate __lookup_mnt() and __lookup_mnt_last(). And use the standard iterators... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new helpers: lock_mount_hash/unlock_mount_hashAl Viro2013-10-241-0/+13
| | | | | | | aka br_write_{lock,unlock} of vfsmount_lock. Inlines in fs/mount.h, vfsmount_lock extern moved over there as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* namespace.c: get rid of mnt_ghostsAl Viro2013-10-241-1/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of full-hash scan on detaching vfsmountsAl Viro2013-04-091-0/+7
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* proc: Usable inode numbers for the namespace file descriptors.Eric W. Biederman2012-11-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Assign a unique proc inode to each namespace, and use that inode number to ensure we only allocate at most one proc inode for every namespace in proc. A single proc inode per namespace allows userspace to test to see if two processes are in the same namespace. This has been a long requested feature and only blocked because a naive implementation would put the id in a global space and would ultimately require having a namespace for the names of namespaces, making migration and certain virtualization tricks impossible. We still don't have per superblock inode numbers for proc, which appears necessary for application unaware checkpoint/restart and migrations (if the application is using namespace file descriptors) but that is now allowd by the design if it becomes important. I have preallocated the ipc and uts initial proc inode numbers so their structures can be statically initialized. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* vfs: Add a user namespace reference from struct mnt_namespaceEric W. Biederman2012-11-191-0/+1
| | | | | | | This will allow for support for unprivileged mounts in a new user namespace. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* vfs: Add setns support for the mount namespaceEric W. Biederman2012-11-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | setns support for the mount namespace is a little tricky as an arbitrary decision must be made about what to set fs->root and fs->pwd to, as there is no expectation of a relationship between the two mount namespaces. Therefore I arbitrarily find the root mount point, and follow every mount on top of it to find the top of the mount stack. Then I set fs->root and fs->pwd to that location. The topmost root of the mount stack seems like a reasonable place to be. Bind mount support for the mount namespace inodes has the possibility of creating circular dependencies between mount namespaces. Circular dependencies can result in loops that prevent mount namespaces from every being freed. I avoid creating those circular dependencies by adding a sequence number to the mount namespace and require all bind mounts be of a younger mount namespace into an older mount namespace. Add a helper function proc_ns_inode so it is possible to detect when we are attempting to bind mound a namespace inode. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* get rid of magic in proc_namespace.cAl Viro2012-07-141-1/+3
| | | | | | | | don't rely on proc_mounts->m being the first field; container_of() is there for purpose. No need to bother with ->private, while we are at it - the same container_of will do nicely. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of ->mnt_longtermAl Viro2012-07-141-1/+8
| | | | | | | | | it's enough to set ->mnt_ns of internal vfsmounts to something distinct from all struct mnt_namespace out there; then we can just use the check for ->mnt_ns != NULL in the fast path of mntput_no_expire() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: keep list of mounts for each superblockMiklos Szeredi2012-01-061-0/+1
| | | | | | | | | Keep track of vfsmounts belonging to a superblock. List is protected by vfsmount_lock. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch mnt_namespace ->root to struct mountAl Viro2012-01-031-1/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: take /proc/*/mounts and friends to fs/proc_namespace.cAl Viro2012-01-031-0/+24
| | | | | | | rationale: that stuff is far tighter bound to fs/namespace.c than to the guts of procfs proper. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: move fsnotify junk to struct mountAl Viro2012-01-031-1/+4
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: move mnt_devnameAl Viro2012-01-031-1/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: move mnt_list to struct mountAl Viro2012-01-031-1/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: move the rest of int fields to struct mountAl Viro2012-01-031-0/+3
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: mnt_id/mnt_group_id movedAl Viro2012-01-031-0/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: mnt_ns moved to struct mountAl Viro2012-01-031-0/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: take mnt_share/mnt_slave/mnt_slave_list and mnt_expire to struct mountAl Viro2012-01-031-1/+5
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: and now we can make ->mnt_master point to struct mountAl Viro2012-01-031-1/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: take mnt_master to struct mountAl Viro2012-01-031-0/+2
| | | | | | make IS_MNT_SLAVE take struct mount * at the same time Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: take mnt_child/mnt_mounts to struct mountAl Viro2012-01-031-0/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: all counters taken to struct mountAl Viro2012-01-031-0/+12
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: move mnt_mountpoint to struct mountAl Viro2012-01-031-0/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: now it can be done - make mnt_parent point to struct mountAl Viro2012-01-031-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: mnt_parent moved to struct mountAl Viro2012-01-031-1/+2
| | | | | | the second victim... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: spread struct mount - mnt_has_parentAl Viro2012-01-031-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: the first spoils - mnt_hash movedAl Viro2012-01-031-0/+1
| | | | | | taken out of struct vfsmount into struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: spread struct mount - __lookup_mnt() resultAl Viro2012-01-031-0/+2
| | | | | | switch __lookup_mnt() to returning struct mount *; callers adjusted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: start hiding vfsmount guts seriesAl Viro2012-01-031-0/+9
| | | | | | | | | | | | | | | | | | Almost all fields of struct vfsmount are used only by core VFS (and a fairly small part of it, at that). The plan: embed struct vfsmount into struct mount, making the latter visible only to core parts of VFS. Then move fields from vfsmount to mount, eventually leaving only mnt_root/mnt_sb/mnt_flags in struct vfsmount. Filesystem code still gets pointers to struct vfsmount and remains unchanged; all such pointers go to struct vfsmount embedded into the instances of struct mount allocated by fs/namespace.c. When fs/namespace.c et.al. get a pointer to vfsmount, they turn it into pointer to mount (using container_of) and work with that. This is the first part of series; struct mount is introduced, allocation switched to using it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: new internal helper: mnt_has_parent(mnt)Al Viro2012-01-031-0/+6
vfsmounts have ->mnt_parent pointing either to a different vfsmount or to itself; it's never NULL and termination condition in loops traversing the tree towards root is mnt == mnt->mnt_parent. At least one place (see the next patch) is confused about what's going on; let's add an explicit helper checking it right way and use it in all places where we need it. Not that there had been too many, but... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
OpenPOWER on IntegriCloud