summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| | * | | | | | | | autofs: use path_is_mountpoint() to fix unreliable d_mountpoint() checksIan Kent2016-12-031-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an automount mount is clone(2)ed into a file system that is propagation private, when it later expires in the originating namespace, subsequent calls to autofs ->d_automount() for that dentry in the original namespace will return ELOOP until the mount is umounted in the cloned namespace. Now that a struct path is available where needed use path_is_mountpoint() instead of d_mountpoint() so we don't get false positives when checking if a dentry is a mount point in the current namespace. Link: http://lkml.kernel.org/r/20161011053418.27645.15241.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | | | | | autofs: change autofs4_wait() to take struct pathIan Kent2016-12-034-12/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to use the functions path_is_mountpoint() and path_has_submounts() autofs needs to pass a struct path in several places. Now change autofs4_wait() to take a struct path instead of a struct dentry. Link: http://lkml.kernel.org/r/20161011053413.27645.84666.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | | | | | autofs: change autofs4_expire_wait()/do_expire_wait() to take struct pathIan Kent2016-12-034-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to use the functions path_is_mountpoint() and path_has_submounts() autofs needs to pass a struct path in several places. Start by changing autofs4_expire_wait() and do_expire_wait() to take a struct path instead of a struct dentry. Link: http://lkml.kernel.org/r/20161011053408.27645.40091.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | | | | | vfs: add path_has_submounts()Ian Kent2016-12-031-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | d_mountpoint() can only be used reliably to establish if a dentry is not mounted in any namespace. It isn't aware of the possibility there may be multiple mounts using the given dentry, possibly in a different namespace. Add function, path_has_submounts(), that checks is a struct path contains mounts (or is a mountpoint itself) to handle this case. Link: http://lkml.kernel.org/r/20161011053403.27645.55242.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | | | | | vfs: add path_is_mountpoint() helperIan Kent2016-12-032-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | d_mountpoint() can only be used reliably to establish if a dentry is not mounted in any namespace. It isn't aware of the possibility there may be multiple mounts using a given dentry that may be in a different namespace. Add helper functions, path_is_mountpoint(), that checks if a struct path is a mountpoint for this case. Link: http://lkml.kernel.org/r/20161011053358.27645.9729.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | | | | | vfs: change d_manage() to take a struct pathIan Kent2016-12-022-9/+9
| | | |_|_|_|/ / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the autofs module to be able to reliably check if a dentry is a mountpoint in a multiple namespace environment the ->d_manage() dentry operation will need to take a path argument instead of a dentry. Link: http://lkml.kernel.org/r/20161011053352.27645.83962.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | Merge remote-tracking branch 'djwong/ocfs2-vfs-reflink-6' into for-linusAl Viro2016-12-1620-306/+797
| |\ \ \ \ \ \ \ \
| | * | | | | | | | ocfs2: implement the VFS clone_range, copy_range, and dedupe_range featuresDarrick J. Wong2016-12-104-3/+474
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Connect the new VFS clone_range, copy_range, and dedupe_range features to the existing reflink capability of ocfs2. Compared to the existing ocfs2 reflink ioctl We have to do things a little differently to support the VFS semantics (we can clone subranges of a file but we don't clone xattrs), but the VFS ioctls are more broadly supported. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- v2: Convert inline data files to extents files before reflinking, and fix i_blocks so that stat(2) output is correct. v3: Make zero-length dedupe consistent with btrfs behavior. v4: Use VFS double-inode lock routines and remove MAX_DEDUPE_LEN.
| | * | | | | | | | ocfs2: charge quota for reflinked blocksDarrick J. Wong2016-12-101-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ocfs2 shares blocks from one file to another, it's necessary to charge that many blocks to the quota because ocfs2 tallies block charges according to the number of blocks mapped, not the number of physical blocks used. Without this patch, reflinking X blocks and then CoWing all of them causes quota usage to *decrease* by X as seen in generic/305. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | * | | | | | | | ocfs2: fix bad pointer castDarrick J. Wong2016-12-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | generic/188 triggered a dmesg stack trace because the dio completion was casting a buffer head to an on-disk inode, which is whacky. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | * | | | | | | | ocfs2: always unlock when completing dio writesDarrick J. Wong2016-12-101-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Always unlock the inode when completing dio writes, even if an error has occurrred. The caller already checks the inode and unlocks it if needed, so we might as well reduce contention. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | * | | | | | | | ocfs2: don't eat io errors during _dio_end_io_writeDarrick J. Wong2016-12-101-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ocfs2_dio_end_io_write eats whatever errors may happen, which means that write errors do not propagate to userspace. Fix that. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | * | | | | | | | ocfs2: budget for extent tree splits when adding refcount flagDarrick J. Wong2016-12-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we're adding the refcount flag to an extent, we have to budget enough space to handle a full extent btree split in addition to whatever modifications have to be made to the refcount btree. We don't currently do this, with the result that generic/186 crashes when we need an extent split but not a refcount split because meta_ac never gets allocated. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | * | | | | | | | ocfs2: prohibit refcounted swapfilesDarrick J. Wong2016-12-101-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The swapfile mechanism calls bmap once to find all the swap file mappings, which means that we cannot properly support CoW remapping. Therefore, error out if the swap code tries to call bmap on a refcounted file. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | * | | | | | | | ocfs2: add newlines to some error messagesDarrick J. Wong2016-12-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These two error messages are missing the trailing newline. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | * | | | | | | | ocfs2: convert inode refcount test to a helperDarrick J. Wong2016-12-106-29/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the open-coded inode refcount flag test with a helper function to reduce the potential for bugs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | * | | | | | | | vfs: refactor clone/dedupe_file_range common functionsDarrick J. Wong2016-12-092-204/+213
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hoist both the XFS reflink inode state and preparation code and the XFS file blocks compare functions into the VFS so that ocfs2 can take advantage of it for reflink and dedupe. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | * | | | | | | | fs: try to clone files first in vfs_copy_file_rangeChristoph Hellwig2016-12-095-40/+22
| | | |/ / / / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A clone is a perfectly fine implementation of a file copy, so most file systems just implement the copy that way. Instead of duplicating this logic move it to the VFS. Currently btrfs and XFS implement copies the same way as clones and there is no behavior change for them, cifs only implements clones and grow support for copy_file_range with this patch. NFS implements both, so this will allow copy_file_range to work on servers that only implement CLONE and be lot more efficient on servers that implements CLONE and COPY. Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | | | | Merge branch 'work.write_end' into for-linusAl Viro2016-12-166-65/+52
| |\ \ \ \ \ \ \ \
| | * | | | | | | | simple_write_end(): don't zero in short copy into uptodateAl Viro2016-12-101-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | | | | | exofs: don't mess with simple_write_{begin,end}Al Viro2016-12-101-38/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... and don't zero anything on short copy; just unlock and return 0 if that has happened on non-uptodate page. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | | | | | 9p: saner ->write_end() on failing copy into non-uptodate pageAl Viro2016-12-101-11/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we had a short copy into an uptodate page, there's no reason whatsoever to zero anything; OTOH, if that page had _not_ been uptodate, we must have been trying to overwrite it completely and got a short copy. In that case, overwriting the end with zeroes, marking uptodate and sending to server is just plain wrong. Just unlock, keep it non-uptodate and return 0. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | | | | | fix gfs2_stuffed_write_end() on short copiesAl Viro2016-12-101-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a) the page is uptodate - ->write_begin() would either fail (in which case we don't reach ->write_end()), or unstuff the inode, or find the page already uptodate, or do a successful call of stuffed_readpage(), which would've made it uptodate b) zeroing the tail in pagecache is wrong. kill -9 at the right time while writing unmodified file contents to the same file should _not_ leave us in a situation when read() from the file will be reporting it full of zeroes. Especially since that effect will be transient - at some later point the page will be evicted and then we'll be back to the real file contents. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | | | | | fix ceph_write_end()Al Viro2016-12-101-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | don't zero on short copies; if the page was uptodate it's just plain wrong, and if it wasn't we'll be better off just returning 0 and buggering off. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | | | | | | | nfs_write_end(): fix handling of short copiesAl Viro2016-12-091-1/+1
| | | |/ / / / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | What matters when deciding if we should make a page uptodate is not how much we _wanted_ to copy, but how much we actually have copied. As it is, on architectures that do not zero tail on short copy we can leave uninitialized data in page marked uptodate. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | vfs: misc struct path constificationAl Viro2016-12-053-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | namespace.c: constify struct path passed to a bunch of primitivesAl Viro2016-12-052-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | quota: constify struct path in quota_onAl Viro2016-12-054-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | constify alloc_file()Al Viro2016-12-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | constify btrfs_mksubvol()Al Viro2016-12-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | autofs: constify find_autofs_mount() callbackAl Viro2016-12-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | constify get_dcookie() and friendsAl Viro2016-12-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | constify fsnotify_parent()Al Viro2016-12-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | fsnotify(): constify 'data'Al Viro2016-12-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | fsnotify: constify 'data' passed to ->handle_event()Al Viro2016-12-055-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | fs: Constify path_is_under()'s argumentsMickaël Salaün2016-12-051-1/+1
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function path_is_under() doesn't modify the paths pointed by its arguments but only browse them. Constifying this pointers make a cleaner interface to be used by (future) code which may only have access to const struct path pointers (e.g. LSM hooks). Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | | Merge tag 'ceph-for-4.10-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2016-12-168-211/+540
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "A varied set of changes: - a large rework of cephx auth code to cope with CONFIG_VMAP_STACK (myself). Also fixed a deadlock caused by a bogus allocation on the writeback path and authorize reply verification. - a fix for long stalls during fsync (Jeff Layton). The client now has a way to force the MDS log flush, leading to ~100x speedups in some synthetic tests. - a new [no]require_active_mds mount option (Zheng Yan). On mount, we will now check whether any of the MDSes are available and bail rather than block if none are. This check can be avoided by specifying the "no" option. - a couple of MDS cap handling fixes and a few assorted patches throughout" * tag 'ceph-for-4.10-rc1' of git://github.com/ceph/ceph-client: (32 commits) libceph: remove now unused finish_request() wrapper libceph: always signal completion when done ceph: avoid creating orphan object when checking pool permission ceph: properly set issue_seq for cap release ceph: add flags parameter to send_cap_msg ceph: update cap message struct version to 10 ceph: define new argument structure for send_cap_msg ceph: move xattr initialzation before the encoding past the ceph_mds_caps ceph: fix minor typo in unsafe_request_wait ceph: record truncate size/seq for snap data writeback ceph: check availability of mds cluster on mount ceph: fix splice read for no Fc capability case ceph: try getting buffer capability for readahead/fadvise ceph: fix scheduler warning due to nested blocking ceph: fix printing wrong return variable in ceph_direct_read_write() crush: include mapper.h in mapper.c rbd: silence bogus -Wmaybe-uninitialized warning libceph: no need to drop con->mutex for ->get_authorizer() libceph: drop len argument of *verify_authorizer_reply() libceph: verify authorize reply on connect ...
| * | | | | | | | libceph: always signal completion when doneIlya Dryomov2016-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | r_safe_completion is currently, and has always been, signaled only if on-disk ack was requested. It's there for fsync and syncfs, which wait for in-flight writes to flush - all data write requests set ONDISK. However, the pool perm check code introduced in 4.2 sends a write request with only ACK set. An unfortunately timed syncfs can then hang forever: r_safe_completion won't be signaled because only an unsafe reply was requested. We could patch ceph_osdc_sync() to skip !ONDISK write requests, but that is somewhat incomplete and yet another special case. Instead, rename this completion to r_done_completion and always signal it when the OSD client is done with the request, whether unsafe, safe, or error. This is a bit cleaner and helps with the cancellation code. Reported-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | ceph: avoid creating orphan object when checking pool permissionYan, Zheng2016-12-141-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pool permission check needs to write to the first object. But for snapshot, head of the first object may have already been deleted. Skip the check for snapshot inode to avoid creating orphan object. Link: http://tracker.ceph.com/issues/18211 Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | | | | | | ceph: properly set issue_seq for cap releaseYan, Zheng2016-12-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | | | | | | ceph: add flags parameter to send_cap_msgJeff Layton2016-12-121-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a flags parameter to send_cap_msg, so we can request expedited service from the MDS when we know we'll be waiting on the result. Set that flag in the case of try_flush_caps. The callers of that function generally wait synchronously on the result, so it's beneficial to ask the server to expedite it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * | | | | | | | ceph: update cap message struct version to 10Jeff Layton2016-12-121-6/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The userland ceph has MClientCaps at struct version 10. This brings the kernel up the same version. For now, all of the the new stuff is set to default values including the flags field, which will be conditionally set in a later patch. Note that we don't need to set the change_attr and btime to anything since we aren't currently setting the feature flag. The MDS should ignore those values. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * | | | | | | | ceph: define new argument structure for send_cap_msgJeff Layton2016-12-121-99/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we get to this many arguments, it's hard to work with positional parameters. send_cap_msg is already at 25 arguments, with more needed. Define a new args structure and pass a pointer to it to send_cap_msg. Eventually it might make sense to embed one of these inside ceph_cap_snap instead of tracking individual fields. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * | | | | | | | ceph: move xattr initialzation before the encoding past the ceph_mds_capsJeff Layton2016-12-121-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just for clarity. This part is inside the header, so it makes sense to group it with the rest of the stuff in the header. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * | | | | | | | ceph: fix minor typo in unsafe_request_waitJeff Layton2016-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * | | | | | | | ceph: record truncate size/seq for snap data writebackYan, Zheng2016-12-123-13/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dirty snapshot data needs to be flushed unconditionally. If they were created before truncation, writeback should use old truncate size/seq. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | | | | | | ceph: check availability of mds cluster on mountYan, Zheng2016-12-124-11/+182
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | | | | | | ceph: fix splice read for no Fc capability caseYan, Zheng2016-12-121-54/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When iov_iter type is ITER_PIPE, copy_page_to_iter() increases the page's reference and add the page to a pipe_buffer. It also set the pipe_buffer's ops to page_cache_pipe_buf_ops. The comfirm callback in page_cache_pipe_buf_ops expects the page is from page cache and uptodate, otherwise it return error. For ceph_sync_read() case, pages are not from page cache. So we can't call copy_page_to_iter() when iov_iter type is ITER_PIPE. The fix is using iov_iter_get_pages_alloc() to allocate pages for the pipe. (the code is similar to default_file_splice_read) Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | | | | | | ceph: try getting buffer capability for readahead/fadviseYan, Zheng2016-12-124-11/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For readahead/fadvise cases, caller of ceph_readpages does not hold buffer capability. Pages can be added to page cache while there is no buffer capability. This can cause data integrity issue. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * | | | | | | | ceph: fix scheduler warning due to nested blockingNikolay Borisov2016-12-121-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | try_get_cap_refs can be used as a condition in a wait_event* calls. This is all fine until it has to call __ceph_do_pending_vmtruncate, which in turn acquires the i_truncate_mutex. This leads to a situation in which a task's state is !TASK_RUNNING and at the same time it's trying to acquire a sleeping primitive. In essence a nested sleeping primitives are being used. This causes the following warning: WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0() do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109447d>] prepare_to_wait_event+0x5d/0x110 ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6 CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G O 4.4.20-clouder2 #6 Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015 0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0 ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c 0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0 Call Trace: [<ffffffff812f4409>] dump_stack+0x67/0x9e [<ffffffff81052b46>] warn_slowpath_common+0x86/0xc0 [<ffffffff81052bcc>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110 [<ffffffff8107767f>] __might_sleep+0x9f/0xb0 [<ffffffff81612d30>] mutex_lock+0x20/0x40 [<ffffffffa04eea14>] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph] [<ffffffffa04fa692>] try_get_cap_refs+0xa2/0x320 [ceph] [<ffffffffa04fd6f5>] ceph_get_caps+0x255/0x2b0 [ceph] [<ffffffff81094370>] ? wait_woken+0xb0/0xb0 [<ffffffffa04f2c11>] ceph_write_iter+0x2b1/0xde0 [ceph] [<ffffffff81613f22>] ? schedule_timeout+0x202/0x260 [<ffffffff8117f01a>] ? kmem_cache_free+0x1ea/0x200 [<ffffffff811b46ce>] ? iput+0x9e/0x230 [<ffffffff81077632>] ? __might_sleep+0x52/0xb0 [<ffffffff81156147>] ? __might_fault+0x37/0x40 [<ffffffff8119e123>] ? cp_new_stat+0x153/0x170 [<ffffffff81198cfa>] __vfs_write+0xaa/0xe0 [<ffffffff81199369>] vfs_write+0xa9/0x190 [<ffffffff811b6d01>] ? set_close_on_exec+0x31/0x70 [<ffffffff8119a056>] SyS_write+0x46/0xa0 This happens since wait_event_interruptible can interfere with the mutex locking code, since they both fiddle with the task state. Fix the issue by using the newly-added nested blocking infrastructure in 61ada528dea0 ("sched/wait: Provide infrastructure to deal with nested blocking") Link: https://lwn.net/Articles/628628/ Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
OpenPOWER on IntegriCloud