summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* d_prune_alias(): just lock the parent and call __dentry_kill()Al Viro2014-10-091-14/+7
| | | | | | | | | | | | | | | | | The only reason for games with ->d_prune() was __d_drop(), which was needed only to force dput() into killing the sucker off. Note that lock_parent() can be called under ->i_lock and won't drop it, so dentry is safe from somebody managing to kill it under us - it won't happen while we are holding ->i_lock. __dentry_kill() is called only with ->d_lockref.count being 0 (here and when picked from shrink list) or 1 (dput() and dropping the ancestors in shrink_dentry_list()), so it will never be called twice - the first thing it's doing is making ->d_lockref.count negative and once that happens, nothing will increment it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* proc: Update proc_flush_task_mnt to use d_invalidateEric W. Biederman2014-10-091-4/+2
| | | | | | | | | | | Now that d_invalidate always succeeds and flushes mount points use it in stead of a combination of shrink_dcache_parent and d_drop in proc_flush_task_mnt. This removes the danger of a mount point under /proc/<pid>/... becoming unreachable after the d_drop. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Remove d_drop calls from d_revalidate implementationsEric W. Biederman2014-10-093-7/+0
| | | | | | | | | | | | | | Now that d_invalidate always succeeds it is not longer necessary or desirable to hard code d_drop calls into filesystem specific d_revalidate implementations. Remove the unnecessary d_drop calls and rely on d_invalidate to drop the dentries. Using d_invalidate ensures that paths to mount points will not be dropped. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Make d_invalidate return voidEric W. Biederman2014-10-097-32/+13
| | | | | | | | | | Now that d_invalidate can no longer fail, stop returning a useless return code. For the few callers that checked the return code update remove the handling of d_invalidate failure. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Merge check_submounts_and_drop and d_invalidateEric W. Biederman2014-10-092-34/+22
| | | | | | | | | Now that d_invalidate is the only caller of check_submounts_and_drop, expand check_submounts_and_drop inline in d_invalidate. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Remove unnecessary calls of check_submounts_and_dropEric W. Biederman2014-10-095-26/+0
| | | | | | | | | | Now that check_submounts_and_drop can not fail and is called from d_invalidate there is no longer a need to call check_submounts_and_drom from filesystem d_revalidate methods so remove it. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Lazily remove mounts on unlinked files and directories.Eric W. Biederman2014-10-092-33/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the introduction of mount namespaces and bind mounts it became possible to access files and directories that on some paths are mount points but are not mount points on other paths. It is very confusing when rm -rf somedir returns -EBUSY simply because somedir is mounted somewhere else. With the addition of user namespaces allowing unprivileged mounts this condition has gone from annoying to allowing a DOS attack on other users in the system. The possibility for mischief is removed by updating the vfs to support rename, unlink and rmdir on a dentry that is a mountpoint and by lazily unmounting mountpoints on deleted dentries. In particular this change allows rename, unlink and rmdir system calls on a dentry without a mountpoint in the current mount namespace to succeed, and it allows rename, unlink, and rmdir performed on a distributed filesystem to update the vfs cache even if when there is a mount in some namespace on the original dentry. There are two common patterns of maintaining mounts: Mounts on trusted paths with the parent directory of the mount point and all ancestory directories up to / owned by root and modifiable only by root (i.e. /media/xxx, /dev, /dev/pts, /proc, /sys, /sys/fs/cgroup/{cpu, cpuacct, ...}, /usr, /usr/local). Mounts on unprivileged directories maintained by fusermount. In the case of mounts in trusted directories owned by root and modifiable only by root the current parent directory permissions are sufficient to ensure a mount point on a trusted path is not removed or renamed by anyone other than root, even if there is a context where the there are no mount points to prevent this. In the case of mounts in directories owned by less privileged users races with users modifying the path of a mount point are already a danger. fusermount already uses a combination of chdir, /proc/<pid>/fd/NNN, and UMOUNT_NOFOLLOW to prevent these races. The removable of global rename, unlink, and rmdir protection really adds nothing new to consider only a widening of the attack window, and fusermount is already safe against unprivileged users modifying the directory simultaneously. In principle for perfect userspace programs returning -EBUSY for unlink, rmdir, and rename of dentires that have mounts in the local namespace is actually unnecessary. Unfortunately not all userspace programs are perfect so retaining -EBUSY for unlink, rmdir and rename of dentries that have mounts in the current mount namespace plays an important role of maintaining consistency with historical behavior and making imperfect userspace applications hard to exploit. v2: Remove spurious old_dentry. v3: Optimized shrink_submounts_and_drop Removed unsued afs label v4: Simplified the changes to check_submounts_and_drop Do not rename check_submounts_and_drop shrink_submounts_and_drop Document what why we need atomicity in check_submounts_and_drop Rely on the parent inode mutex to make d_revalidate and d_invalidate an atomic unit. v5: Refcount the mountpoint to detach in case of simultaneous renames. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Add a function to lazily unmount all mounts from any dentry.Eric W. Biederman2014-10-092-0/+40
| | | | | | | | | | | | | | | | | | | | | | | The new function detach_mounts comes in two pieces. The first piece is a static inline test of d_mounpoint that returns immediately without taking any locks if d_mounpoint is not set. In the common case when mountpoints are absent this allows the vfs to continue running with it's same cacheline foot print. The second piece of detach_mounts __detach_mounts actually does the work and it assumes that a mountpoint is present so it is slow and takes namespace_sem for write, and then locks the mount hash (aka mount_lock) after a struct mountpoint has been found. With those two locks held each entry on the list of mounts on a mountpoint is selected and lazily unmounted until all of the mount have been lazily unmounted. v7: Wrote a proper change description and removed the changelog documenting deleted wrong turns. Signed-off-by: Eric W. Biederman <ebiederman@twitter.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: factor out lookup_mountpoint from new_mountpointEric W. Biederman2014-10-091-3/+12
| | | | | | | | | | | I am shortly going to add a new user of struct mountpoint that needs to look up existing entries but does not want to create a struct mountpoint if one does not exist. Therefore to keep the code simple and easy to read split out lookup_mountpoint from new_mountpoint. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Keep a list of mounts on a mount pointEric W. Biederman2014-10-092-0/+8
| | | | | | | | | | | To spot any possible problems call BUG if a mountpoint is put when it's list of mounts is not empty. AV: use hlist instead of list_head Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Eric W. Biederman <ebiederman@twitter.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Don't allow overwriting mounts in the current mount namespaceEric W. Biederman2014-10-093-1/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for allowing mountpoints to be renamed and unlinked in remote filesystems and in other mount namespaces test if on a dentry there is a mount in the local mount namespace before allowing it to be renamed or unlinked. The primary motivation here are old versions of fusermount unmount which is not safe if the a path can be renamed or unlinked while it is verifying the mount is safe to unmount. More recent versions are simpler and safer by simply using UMOUNT_NOFOLLOW when unmounting a mount in a directory owned by an arbitrary user. Miklos Szeredi <miklos@szeredi.hu> reports this is approach is good enough to remove concerns about new kernels mixed with old versions of fusermount. A secondary motivation for restrictions here is that it removing empty directories that have non-empty mount points on them appears to violate the rule that rmdir can not remove empty directories. As Linus Torvalds pointed out this is useful for programs (like git) that test if a directory is empty with rmdir. Therefore this patch arranges to enforce the existing mount point semantics for local mount namespace. v2: Rewrote the test to be a drop in replacement for d_mountpoint v3: Use bool instead of int as the return type of is_local_mountpoint Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: More precise tests in d_invalidateEric W. Biederman2014-10-091-34/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current comments in d_invalidate about what and why it is doing what it is doing are wildly off-base. Which is not surprising as the comments date back to last minute bug fix of the 2.2 kernel. The big fat lie of a comment said: If it's a directory, we can't drop it for fear of somebody re-populating it with children (even though dropping it would make it unreachable from that root, we still might repopulate it if it was a working directory or similar). [AV] What we really need to avoid is multiple dentry aliases of the same directory inode; on all filesystems that have ->d_revalidate() we either declare all positive dentries always valid (and thus never fed to d_invalidate()) or use d_materialise_unique() and/or d_splice_alias(), which take care of alias prevention. The current rules are: - To prevent mount point leaks dentries that are mount points or that have childrent that are mount points may not be be unhashed. - All dentries may be unhashed. - Directories may be rehashed with d_materialise_unique check_submounts_and_drop implements this already for well maintained remote filesystems so implement the current rules in d_invalidate by just calling check_submounts_and_drop. The one difference between d_invalidate and check_submounts_and_drop is that d_invalidate must respect it when a d_revalidate method has earlier called d_drop so preserve the d_unhashed check in d_invalidate. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Document the effect of d_revalidate on d_find_aliasEric W. Biederman2014-10-091-1/+2
| | | | | | | | | | | d_drop or check_submounts_and_drop called from d_revalidate can result in renamed directories with child dentries being unhashed. These renamed and drop directory dentries can be rehashed after d_materialise_unique uses d_find_alias to find them. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* delayed mntputAl Viro2014-10-092-19/+57
| | | | | | | | | | | | | | | | On final mntput() we want fs shutdown to happen before return to userland; however, the only case where we want it happen right there (i.e. where task_work_add won't do) is MNT_INTERNAL victim. Those have to be fully synchronous - failure halfway through module init might count on having vfsmount killed right there. Fortunately, final mntput on MNT_INTERNAL vfsmounts happens on shallow stack. So we handle those synchronously and do an analog of delayed fput logics for everything else. As the result, we are guaranteed that fs shutdown will always happen on shallow stack. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* autofs - remove obsolete d_invalidate() from expireIan Kent2014-10-091-6/+0
| | | | | | | | | | | | | | | | | | | | | | | Biederman's umount-on-rmdir series changes d_invalidate() to sumarily remove mounts under the passed in dentry regardless of whether they are busy or not. So calling this in fs/autofs4/expire.c:autofs4_tree_busy() is definitely the wrong thing to do becuase it will silently umount entries instead of just cleaning stale dentrys. But this call shouldn't be needed and testing shows that automounting continues to function without it. As Al Viro correctly surmises the original intent of the call was to perform what shrink_dcache_parent() does. If at some time in the future I see stale dentries accumulating following failed mounts I'll revisit the issue and possibly add a shrink_dcache_parent() call if needed. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Allow sharing external names after __d_move()Al Viro2014-10-091-16/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | * external dentry names get a small structure prepended to them (struct external_name). * it contains an atomic refcount, matching the number of struct dentry instances that have ->d_name.name pointing to that external name. The first thing free_dentry() does is decrementing refcount of external name, so the instances that are between the call of free_dentry() and RCU-delayed actual freeing do not contribute. * __d_move(x, y, false) makes the name of x equal to the name of y, external or not. If y has an external name, extra reference is grabbed and put into x->d_name.name. If x used to have an external name, the reference to the old name is dropped and, should it reach zero, freeing is scheduled via kfree_rcu(). * free_dentry() in dentry with external name decrements the refcount of that name and, should it reach zero, does RCU-delayed call that will free both the dentry and external name. Otherwise it does what it used to do, except that __d_free() doesn't even look at ->d_name.name; it simply frees the dentry. All non-RCU accesses to dentry external name are safe wrt freeing since they all should happen before free_dentry() is called. RCU accesses might run into a dentry seen by free_dentry() or into an old name that got already dropped by __d_move(); however, in both cases dentry must have been alive and refer to that name at some point after we'd done rcu_read_lock(), which means that any freeing must be still pending. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* missing data dependency barrier in prepend_name()Al Viro2014-09-291-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | AFAICS, prepend_name() is broken on SMP alpha. Disclaimer: I don't have SMP alpha boxen to reproduce it on. However, it really looks like the race is real. CPU1: d_path() on /mnt/ramfs/<255-character>/foo CPU2: mv /mnt/ramfs/<255-character> /mnt/ramfs/<63-character> CPU2 does d_alloc(), which allocates an external name, stores the name there including terminating NUL, does smp_wmb() and stores its address in dentry->d_name.name. It proceeds to d_add(dentry, NULL) and d_move() old dentry over to that. ->d_name.name value ends up in that dentry. In the meanwhile, CPU1 gets to prepend_name() for that dentry. It fetches ->d_name.name and ->d_name.len; the former ends up pointing to new name (64-byte kmalloc'ed array), the latter - 255 (length of the old name). Nothing to force the ordering there, and normally that would be OK, since we'd run into the terminating NUL and stop. Except that it's alpha, and we'd need a data dependency barrier to guarantee that we see that store of NUL __d_alloc() has done. In a similar situation dentry_cmp() would survive; it does explicit smp_read_barrier_depends() after fetching ->d_name.name. prepend_name() doesn't and it risks walking past the end of kmalloc'ed object and possibly oops due to taking a page fault in kernel mode. Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2014-09-278-85/+60
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Assorted fixes + unifying __d_move() and __d_materialise_dentry() + minimal regression fix for d_path() of victims of overwriting rename() ported on top of that" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Don't exchange "short" filenames unconditionally. fold swapping ->d_name.hash into switch_names() fold unlocking the children into dentry_unlock_parents_for_move() kill __d_materialise_dentry() __d_materialise_dentry(): flip the order of arguments __d_move(): fold manipulations with ->d_child/->d_subdirs don't open-code d_rehash() in d_materialise_unique() pull rehashing and unlocking the target dentry into __d_materialise_dentry() ufs: deal with nfsd/iget races fuse: honour max_read and max_write in direct_io mode shmem: fix nlink for rename overwrite directory
| * vfs: Don't exchange "short" filenames unconditionally.Mikhail Efremov2014-09-271-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only exchange source and destination filenames if flags contain RENAME_EXCHANGE. In case if executable file was running and replaced by other file /proc/PID/exe should still show correct file name, not the old name of the file by which it was replaced. The scenario when this bug manifests itself was like this: * ALT Linux uses rpm and start-stop-daemon; * during a package upgrade rpm creates a temporary file for an executable to rename it upon successful unpacking; * start-stop-daemon is run subsequently and it obtains the (nonexistant) temporary filename via /proc/PID/exe thus failing to identify the running process. Note that "long" filenames (> DNAiME_INLINE_LEN) are still exchanged without RENAME_EXCHANGE and this behaviour exists long enough (should be fixed too apparently). So this patch is just an interim workaround that restores behavior for "short" names as it was before changes introduced by commit da1ce0670c14 ("vfs: add cross-rename"). See https://lkml.org/lkml/2014/9/7/6 for details. AV: the comments about being more careful with ->d_name.hash than with ->d_name.name are from back in 2.3.40s; they became obsolete by 2.3.60s, when we started to unhash the target instead of swapping hash chain positions followed by d_delete() as we used to do when dcache was first introduced. Acked-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: da1ce0670c14 "vfs: add cross-rename" Signed-off-by: Mikhail Efremov <sem@altlinux.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fold swapping ->d_name.hash into switch_names()Linus Torvalds2014-09-271-2/+1
| | | | | | | | | | | | | | and do it along with ->d_name.len there Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fold unlocking the children into dentry_unlock_parents_for_move()Al Viro2014-09-261-5/+4
| | | | | | | | | | | | | | ... renaming it into dentry_unlock_for_move() and making it more symmetric with dentry_lock_for_move(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * kill __d_materialise_dentry()Al Viro2014-09-261-44/+10
| | | | | | | | | | | | it folds into __d_move() now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * __d_materialise_dentry(): flip the order of argumentsAl Viro2014-09-261-24/+20
| | | | | | | | | | | | | | ... thus making it much closer to (now unreachable, BTW) IS_ROOT(dentry) case in __d_move(). A bit more and it'll fold in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * __d_move(): fold manipulations with ->d_child/->d_subdirsAl Viro2014-09-261-5/+3
| | | | | | | | | | | | | | | | | | list_del() + list_add() is a slightly pessimised list_move() list_del() + INIT_LIST_HEAD() is a slightly pessimised list_del_init() Interleaving those makes the resulting code even worse. And harder to follow... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * don't open-code d_rehash() in d_materialise_unique()Al Viro2014-09-261-5/+1
| | | | | | | | | | | | ... and get rid of duplicate BUG_ON() there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * pull rehashing and unlocking the target dentry into __d_materialise_dentry()Al Viro2014-09-261-7/+4
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * ufs: deal with nfsd/iget racesAl Viro2014-09-262-1/+9
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fuse: honour max_read and max_write in direct_io modeMiklos Szeredi2014-09-264-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The third argument of fuse_get_user_pages() "nbytesp" refers to the number of bytes a caller asked to pack into fuse request. This value may be lesser than capacity of fuse request or iov_iter. So fuse_get_user_pages() must ensure that *nbytesp won't grow. Now, when helper iov_iter_get_pages() performs all hard work of extracting pages from iov_iter, it can be done by passing properly calculated "maxsize" to the helper. The other caller of iov_iter_get_pages() (dio_refill_pages()) doesn't need this capability, so pass LONG_MAX as the maxsize argument here. Fixes: c9c37e2e6378 ("fuse: switch to iov_iter_get_pages()") Reported-by: Werner Baumann <werner.baumann@onlinehome.de> Tested-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * shmem: fix nlink for rename overwrite directoryMiklos Szeredi2014-09-261-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If overwriting an empty directory with rename, then need to drop the extra nlink. Test prog: #include <stdio.h> #include <fcntl.h> #include <err.h> #include <sys/stat.h> int main(void) { const char *test_dir1 = "test-dir1"; const char *test_dir2 = "test-dir2"; int res; int fd; struct stat statbuf; res = mkdir(test_dir1, 0777); if (res == -1) err(1, "mkdir(\"%s\")", test_dir1); res = mkdir(test_dir2, 0777); if (res == -1) err(1, "mkdir(\"%s\")", test_dir2); fd = open(test_dir2, O_RDONLY); if (fd == -1) err(1, "open(\"%s\")", test_dir2); res = rename(test_dir1, test_dir2); if (res == -1) err(1, "rename(\"%s\", \"%s\")", test_dir1, test_dir2); res = fstat(fd, &statbuf); if (res == -1) err(1, "fstat(%i)", fd); if (statbuf.st_nlink != 0) { fprintf(stderr, "nlink is %lu, should be 0\n", statbuf.st_nlink); return 1; } return 0; } Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for-3.17-fixes' of ↵Linus Torvalds2014-09-276-24/+43
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "This is quite late but these need to be backported anyway. This is the fix for a long-standing cpuset bug which existed from 2009. cpuset makes use of PF_SPREAD_{PAGE|SLAB} flags to modify the task's memory allocation behavior according to the settings of the cpuset it belongs to; unfortunately, when those flags have to be changed, cpuset did so directly even whlie the target task is running, which is obviously racy as task->flags may be modified by the task itself at any time. This obscure bug manifested as corrupt PF_USED_MATH flag leading to a weird crash. The bug is fixed by moving the flag to task->atomic_flags. The first two are prepatory ones to help defining atomic_flags accessors and the third one is the actual fix" * 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags sched: add macros to define bitops for task atomic flags sched: fix confusing PFA_NO_NEW_PRIVS constant
| * | cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flagsZefan Li2014-09-245-13/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we change cpuset.memory_spread_{page,slab}, cpuset will flip PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset. This should be done using atomic bitops, but currently we don't, which is broken. Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened when one thread tried to clear PF_USED_MATH while at the same time another thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on the same task. Here's the full report: https://lkml.org/lkml/2014/9/19/230 To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags. v4: - updated mm/slab.c. (Fengguang Wu) - updated Documentation. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Fixes: 950592f7b991 ("cpusets: update tasks' page/slab spread flags in time") Cc: <stable@vger.kernel.org> # 2.6.31+ Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | sched: add macros to define bitops for task atomic flagsZefan Li2014-09-242-9/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will simplify code when we add new flags. v3: - Kees pointed out that no_new_privs should never be cleared, so we shouldn't define task_clear_no_new_privs(). we define 3 macros instead of a single one. v2: - updated scripts/tags.sh, suggested by Peter Cc: Ingo Molnar <mingo@kernel.org> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | sched: fix confusing PFA_NO_NEW_PRIVS constantZefan Li2014-09-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1d4457f99928 ("sched: move no_new_privs into new atomic flags") defined PFA_NO_NEW_PRIVS as hexadecimal value, but it is confusing because it is used as bit number. Redefine it as decimal bit number. Note this changes the bit position of PFA_NOW_NEW_PRIVS from 1 to 0. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Kees Cook <keescook@chromium.org> [ lizf: slightly modified subject and changelog ] Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | Merge tag 'fixes-for-linus' of ↵Linus Torvalds2014-09-2713-77/+139
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "Here's our last set of fixes for 3.17. Most of these are for TI platforms, fixing some noisy Kconfig issues, runtime clock and power issues on several platforms and NAND timings on DRA7. There are also a couple of bug fixes for i.MX, one for QCOM and a small fix to avoid section mismatch noise on PXA. Diffstat looks large, partially due to some tables being updated and thus touching many lines. The qcom gsbi change also restructures clock management a bit and thus touches a bunch of lines. All in all, a bit more changes than we'd like at this point, but nothing stands out as risky either so it seems like the right thing to send it up now instead of holding it to the merge window" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: drivers/soc: qcom: do not disable the iface clock in probe ARM: imx: fix .is_enabled() of shared gate clock ARM: OMAP3: Fix I/O chain clock line assertion timed out error ARM: keystone: dts: fix bindings for pcie and usb clock nodes bus: omap_l3_noc: Fix connID for OMAP4 ARM: DT: imx53: fix lvds channel 1 port ARM: dts: cm-t54: fix serial console power supply. ARM: dts: dra7-evm: Fix NAND GPMC timings ARM: pxa: fix section mismatch warning for pxa_timer_nodt_init ARM: OMAP: Fix Kconfig warning for omap1
| * | | drivers/soc: qcom: do not disable the iface clock in probeSrinivas Kandagatla2014-09-231-13/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | since commit 31964ffebbb9 ("tty: serial: msm: Remove direct access to GSBI")' serial hangs if earlyprintk are enabled. This hang is noticed only when the GSBI driver is probed and all the earlyprintks before gsbi probe are seen on the console. The reason why it hangs is because GSBI driver disables hclk in its probe function without realizing that the serial IP might be in use by a bootconsole. As gsbi driver disables the clock in probe the bootconsole locks up. Turning off hclk's could be dangerous if there are system components like earlyprintk using the hclk. This patch fixes the issue by delegating the clock management to probe and remove functions in gsbi rather than disabling the clock in probe. More detailed problem description can be found here: http://www.spinics.net/lists/linux-arm-msm/msg10589.html Tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Olof Johansson <olof@lixom.net>
| * | | Merge tag 'fix-v3.17-io-chain-v3' of ↵Olof Johansson2014-09-222-5/+36
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Regression fix for early omap3 revisions for wake-up events that too some time to narrow down. Although a bit intrusive, this would be good to get into the -rc cycle as there are quite a few boards out there with omap3 es2.1 and es3.0, and we have those in at least three boot test systems too that show errors without this patch. * tag 'fix-v3.17-io-chain-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP3: Fix I/O chain clock line assertion timed out error Signed-off-by: Olof Johansson <olof@lixom.net>
| | * | | ARM: OMAP3: Fix I/O chain clock line assertion timed out errorTony Lindgren2014-09-162-5/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are getting "PRM: I/O chain clock line assertion timed out" errors on early omaps for device tree based booting. This is because we are unconditionally calling reconfigure_io_chain while legacy booting has omap3_has_io_chain_ctrl() checks in place in omap_hwmod.c. For device tree based booting, we are calling reconfigure_io_chain unconditionally from pinctrl framework. So we need to add a check for omap3_has_io_chain_ctrl() to avoid the errors for trying to access a register that does not exist. For es3.0, the documentation in "4.11.2 Device Off-Mode Configuration" just mentions PM_WKEN_WKUP[8] bit. For es3.1, there's a new chapter in documentation for "4.11.2.2 I/O Wake-Up Mechanism" that describes the PM_WKEN_WKUP[16] ST_IO_CHAIN bit. So PM_WKEN_WKUP[16] bit did not get added until in es3.1 probaly to fix issues with flakey wake-up events. We are doing proper checks for ST_IO_CHAIN already in id.c and with omap3_has_io_chain_ctrl(). For more information, see also commit b02b917211d5 ("ARM: OMAP3: PM: fix I/O wakeup and I/O chain clock control detection"). Let's fix the issue by selecting the right function during init for reconfigure_io_chain depending on the omap revision. For es3.0 and earlier we need to just toggle EN_IO. By doing this, we can move the check for omap3_has_io_chain_ctrl() from omap_hwmod.c to the init code in prm_3xxx.c. And then we can unconditionally call reconfigure_io_chain. Thanks to Paul Walmsley and Nishanth Menon for help with debugging the issue. Fixes: 30a69ef785e8 ("ARM: OMAP: Move DT wake-up event handling over to use pinctrl-single-omap") Cc: Kevin Hilman <khilman@kernel.org> Cc: Paul Walmsley <paul@pwsan.com> Cc: Tero Kristo <t-kristo@ti.com> Reviewed-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * | | | Merge tag 'fixes-v3.17-rc4' of ↵Olof Johansson2014-09-223-43/+39
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Few regression fixes for omaps for the -rc cycle: - Fix for omap_l3_noc bus code - Serial console fix for cm-t53 - NAND timings fix for dra7-evm * tag 'fixes-v3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: bus: omap_l3_noc: Fix connID for OMAP4 ARM: dts: cm-t54: fix serial console power supply. ARM: dts: dra7-evm: Fix NAND GPMC timings Signed-off-by: Olof Johansson <olof@lixom.net>
| | * | | | bus: omap_l3_noc: Fix connID for OMAP4Nishanth Menon2014-09-111-25/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d4d8819e205854c ("bus: omap_l3_noc: fix masterid detection") did the right thing in dropping the LSB 2 bits which is not part of the ConnID for NTTP master address. However, as part of that change, we should also have ensured that existing list of OMAP4 connID codes are also shifted by 2 bits to ensure that connIDs map to "Table 13-18. ConnID Values" as provided in Technical Reference Manuals for OMAP4430(Rev AP, April 2014, SWPU220AP) and OMAP4460(Rev AB, April 2014, SWPU234AB) Fixes: d4d8819e205854c ("bus: omap_l3_noc: fix masterid detection") Reported-by: Kristian Otnes <kotnes@cisco.com> Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | * | | | ARM: dts: cm-t54: fix serial console power supply.Dmitry Lifshitz2014-09-101-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LDO8 regulator is used for act led and serial cosole power supply. Its DT status is declared as "disabled", however the serial console was functional until Commit 318dbb02b ("regulator: palmas: Fix SMPS enable/disable/is_enabled") wich properly turns off LDO8 on boot. Fix serial cosole power supply (and act led) on boot by turning LDO8 on. Fixes: 318dbb02b ("regulator: palmas: Fix SMPS enable/disable/is_enabled") Signed-off-by: Dmitry Lifshitz <lifshitz@compulab.co.il> Tested-by: Paul Walmsley <paul@pwsan.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | * | | | ARM: dts: dra7-evm: Fix NAND GPMC timingsRoger Quadros2014-09-101-15/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nand timings were scaled down by 2 to account for the 2x rate returned by clk_get_rate(gpmc_fclk). As the clock data got fixed by [1], revert back to actual timings (i.e. scale them up by 2). Without this NAND doesn't work on dra7-evm. [1] - commit dd94324b983afe114ba9e7ee3649313b451f63ce ARM: dts: dra7xx-clocks: Fix the l3 and l4 clock rates Fixes: ff66a3c86e00 ("ARM: dts: dra7: add support for parallel NAND flash") Cc: <stable@vger.kernel.org> [3.16] Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * | | | | Merge tag 'fixes-v3.17-rc4' of ↵Olof Johansson2014-09-221-3/+3
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone into fixes Keystone Edision dts fix for -rc cycle. Fix the PCIE and USB nodes. * tag 'fixes-v3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone: ARM: keystone: dts: fix bindings for pcie and usb clock nodes Signed-off-by: Olof Johansson <olof@lixom.net>
| | * | | | | ARM: keystone: dts: fix bindings for pcie and usb clock nodesMurali Karicheri2014-09-111-3/+3
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix incorrect clock names for usb1, pcie1 and domain register offset for pcie1 clock nodes on K2E EVM Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
| * | | | | ARM: imx: fix .is_enabled() of shared gate clockShawn Guo2014-09-221-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 63288b721a80 ("ARM: imx: fix shared gate clock") attempted to fix an issue with particular enable/disable sequence from two shared gate clocks. But unfortunately, while it partially fixed the issue, it also did something wrong in .is_enabled() function hook. In case of shared gate, the function shouldn't really query the hardware state via share_count, because the function is trying to query the enabling state of the clock in question, not the hardware state which is shared by multiple clocks. Fix the issue by returning the enable_count of the clock itself which is maintained by clock core, in case it's a clock sharing hardware gate with others. As the result, the initialization of share_count per hardware state is not needed now. So remove it. Reported-by: Fabio Estevam <fabio.estevam@freescale.com> Fixes: 63288b721a80 ("ARM: imx: fix shared gate clock") Cc: <stable@vger.kernel.org> Signed-off-by: Shawn Guo <shawn.guo@freescale.com> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Olof Johansson <olof@lixom.net>
| * | | | | ARM: DT: imx53: fix lvds channel 1 portMarkus Niebel2014-09-112-4/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | using LVDS channel 1 on an i.MX53 leads to following error: imx-ldb 53fa8008.ldb: unable to set di0 parent clock to ldb_di1 This comes from imx_ldb_set_clock with mux = 0. Mux parameter must be "1" for reparenting di1 clock to ldb_di1. The value of the mux param comes from device tree port settings. On i.MX5, the internal two-input-multiplexer is used. Due to hardware limitations, only one port (port@[0,1]) can be used for each channel (lvds-channel@[0,1], respectively) Documentation update suggested by Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Markus Niebel <Markus.Niebel@tq-group.com> Fixes: e05c8c9a790a ("ARM: dts: imx53: Add IPU DI ports and endpoints, move imx-drm node to dtsi") Cc: <stable@vger.kernel.org> Acked-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Shawn Guo <shawn.guo@freescale.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
| * | | | | ARM: pxa: fix section mismatch warning for pxa_timer_nodt_initArnd Bergmann2014-09-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a38b1f60b5245a3 ("ARM: pxa: Add non device-tree timer link to clocksource") introduced a harmless section mismatch warning for all pxa platforms, by introducing a new pxa_timer_init() function that is not marked __init but that calls pxa_timer_nodt_init(), which is. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
| * | | | | ARM: OMAP: Fix Kconfig warning for omap1Tony Lindgren2014-09-092-3/+3
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 21278aeafbfa ("ARM: use menuconfig for sub-arch menus") improved the sub-arch menus, but accidentally caused new warnings for omap1. This was because the commit added a menu entry around config ARCH_OMAP bool entry where the menu had depends on ARCH_MULTI_V6 || ARCH_MULTI_V7. As ARCH_OMAP is shared between omap1 and omap2plus, let's fix the issue by defining ARCH_OMAP in the shared plat-omap/Kconfig. Fixes: 21278aeafbfa ("ARM: use menuconfig for sub-arch menus") Reported-by: Andreas Ruprecht <rupran@einserver.de> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* | | | | Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds2014-09-272-3/+15
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull MIPS fixes from Ralf Baechle: "The final round of fixes. One corner case in the math emulator and another one in the mcount function for ftrace" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: mcount: Adjust stack pointer for static trace in MIPS32 MIPS: Fix MFC1 & MFHC1 emulation for 64-bit MIPS systems
| * | | | | MIPS: mcount: Adjust stack pointer for static trace in MIPS32Markos Chandras2014-09-261-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Every mcount() call in the MIPS 32-bit kernel is done as follows: [...] move at, ra jal _mcount addiu sp, sp, -8 [...] but upon returning from the mcount() function, the stack pointer is not adjusted properly. This is explained in details in 58b69401c797 (MIPS: Function tracer: Fix broken function tracing). Commit ad8c396936e3 ("MIPS: Unbreak function tracer for 64-bit kernel.) fixed the stack manipulation for 64-bit but it didn't fix it completely for MIPS32. Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Cc: <stable@vger.kernel.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7792/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | | | MIPS: Fix MFC1 & MFHC1 emulation for 64-bit MIPS systemsPaul Burton2014-09-261-3/+3
| | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit bbd426f542cb "MIPS: Simplify FP context access" modified the SIFROMREG & SIFROMHREG macros such that they return unsigned rather than signed 32b integers. I had believed that to be fine, but inadvertently missed the MFC1 & MFHC1 cases which write to a struct pt_regs regs element. On MIPS32 this is fine, but on 64 bit those saved regs' fields are 64 bit wide. Using unsigned values caused the 32 bit value from the FP register to be zero rather than sign extended as the architecture specifies, causing incorrect emulation of the MFC1 & MFHc1 instructions. Fix by reintroducing the casts to signed integers, and therefore the sign extension. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: stable@vger.kernel.org # v3.15+ Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7848/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
OpenPOWER on IntegriCloud