op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2015-07-04	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...
\| *	fs: Rename file_remove_suid() to file_remove_privs()	Jan Kara	2015-06-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	file_remove_suid() is a misnomer since it removes also file capabilities stored in xattrs and sets S_NOSEC flag. Also should_remove_suid() tells something else than whether file_remove_suid() call is necessary which leads to bugs. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2015-07-03	1	-6/+3
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace updates from Eric Biederman: "Long ago and far away when user namespaces where young it was realized that allowing fresh mounts of proc and sysfs with only user namespace permissions could violate the basic rule that only root gets to decide if proc or sysfs should be mounted at all. Some hacks were put in place to reduce the worst of the damage could be done, and the common sense rule was adopted that fresh mounts of proc and sysfs should allow no more than bind mounts of proc and sysfs. Unfortunately that rule has not been fully enforced. There are two kinds of gaps in that enforcement. Only filesystems mounted on empty directories of proc and sysfs should be ignored but the test for empty directories was insufficient. So in my tree directories on proc, sysctl and sysfs that will always be empty are created specially. Every other technique is imperfect as an ordinary directory can have entries added even after a readdir returns and shows that the directory is empty. Special creation of directories for mount points makes the code in the kernel a smidge clearer about it's purpose. I asked container developers from the various container projects to help test this and no holes were found in the set of mount points on proc and sysfs that are created specially. This set of changes also starts enforcing the mount flags of fresh mounts of proc and sysfs are consistent with the existing mount of proc and sysfs. I expected this to be the boring part of the work but unfortunately unprivileged userspace winds up mounting fresh copies of proc and sysfs with noexec and nosuid clear when root set those flags on the previous mount of proc and sysfs. So for now only the atime, read-only and nodev attributes which userspace happens to keep consistent are enforced. Dealing with the noexec and nosuid attributes remains for another time. This set of changes also addresses an issue with how open file descriptors from /proc/<pid>/ns/* are displayed. Recently readlink of /proc/<pid>/fd has been triggering a WARN_ON that has not been meaningful since it was added (as all of the code in the kernel was converted) and is not now actively wrong. There is also a short list of issues that have not been fixed yet that I will mention briefly. It is possible to rename a directory from below to above a bind mount. At which point any directory pointers below the renamed directory can be walked up to the root directory of the filesystem. With user namespaces enabled a bind mount of the bind mount can be created allowing the user to pick a directory whose children they can rename to outside of the bind mount. This is challenging to fix and doubly so because all obvious solutions must touch code that is in the performance part of pathname resolution. As mentioned above there is also a question of how to ensure that developers by accident or with purpose do not introduce exectuable files on sysfs and proc and in doing so introduce security regressions in the current userspace that will not be immediately obvious and as such are likely to require breaking userspace in painful ways once they are recognized" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: vfs: Remove incorrect debugging WARN in prepend_path mnt: Update fs_fully_visible to test for permanently empty directories sysfs: Create mountpoints with sysfs_create_mount_point sysfs: Add support for permanently empty directories to serve as mount points. kernfs: Add support for always empty directories. proc: Allow creating permanently empty directories that serve as mount points sysctl: Allow creating permanently empty directories that serve as mountpoints. fs: Add helper functions for permanently empty directories. vfs: Ignore unlocked mounts in fs_fully_visible mnt: Modify fs_fully_visible to deal with locked ro nodev and atime mnt: Refactor the logic for mounting sysfs and proc in a user namespace
\| * \|	sysfs: Create mountpoints with sysfs_create_mount_point	Eric W. Biederman	2015-07-01	1	-6/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows for better documentation in the code and it allows for a simpler and fully correct version of fs_fully_visible to be written. The mount points converted and their filesystems are: /sys/hypervisor/s390/ s390_hypfs /sys/kernel/config/ configfs /sys/kernel/debug/ debugfs /sys/firmware/efi/efivars/ efivarfs /sys/fs/fuse/connections/ fusectl /sys/fs/pstore/ pstore /sys/kernel/tracing/ tracefs /sys/fs/cgroup/ cgroup /sys/kernel/security/ securityfs /sys/fs/selinux/ selinuxfs /sys/fs/smackfs/ smackfs Cc: stable@vger.kernel.org Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* \| \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2015-07-02	5	-489/+624
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "This is the start of improving fuse scalability. An input queue and a processing queue is split out from the monolithic fuse connection, each of those having their own spinlock. The end of the patchset adds the ability to clone a fuse connection. This means, that instead of having to read/write requests/answers on a single fuse device fd, the fuse daemon can have multiple distinct file descriptors open. Each of those can be used to receive requests and send answers, currently the only constraint is that a request must be answered on the same fd as it was read from. This can be extended further to allow binding a device clone to a specific CPU or NUMA node. Based on a patchset by Srinivas Eeda and Ashish Samant. Thanks to Ashish for the review of this series" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (40 commits) fuse: update MAINTAINERS entry fuse: separate pqueue for clones fuse: introduce per-instance fuse_dev structure fuse: device fd clone fuse: abort: no fc->lock needed for request ending fuse: no fc->lock for pqueue parts fuse: no fc->lock in request_end() fuse: cleanup request_end() fuse: request_end(): do once fuse: add req flag for private list fuse: pqueue locking fuse: abort: group pqueue accesses fuse: cleanup fuse_dev_do_read() fuse: move list_del_init() from request_end() into callers fuse: duplicate ->connected in pqueue fuse: separate out processing queue fuse: simplify request_wait() fuse: no fc->lock for iqueue parts fuse: allow interrupt queuing without fc->lock fuse: iqueue locking ...
\| * \| \|	fuse: separate pqueue for clones	Miklos Szeredi	2015-07-01	3	-30/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make each fuse device clone refer to a separate processing queue. The only constraint on userspace code is that the request answer must be written to the same device clone as it was read off. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
\| * \| \|	fuse: introduce per-instance fuse_dev structure	Miklos Szeredi	2015-07-01	4	-34/+114
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow fuse device clones to refer to be distinguished. This patch just adds the infrastructure by associating a separate "struct fuse_dev" with each clone. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: device fd clone	Miklos Szeredi	2015-07-01	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow an open fuse device to be "cloned". Userspace can create a clone by: newfd = open("/dev/fuse", O_RDWR) ioctl(newfd, FUSE_DEV_IOC_CLONE, &oldfd); At this point newfd will refer to the same fuse connection as oldfd. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: abort: no fc->lock needed for request ending	Miklos Szeredi	2015-07-01	1	-9/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In fuse_abort_conn() when all requests are on private lists we no longer need fc->lock protection. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: no fc->lock for pqueue parts	Miklos Szeredi	2015-07-01	1	-14/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove fc->lock protection from processing queue members, now protected by fpq->lock. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: no fc->lock in request_end()	Miklos Szeredi	2015-07-01	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	No longer need to call request_end() with the connection lock held. We still protect the background counters and queue with fc->lock, so acquire it if necessary. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: cleanup request_end()	Miklos Szeredi	2015-07-01	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we atomically test having already done everything we no longer need other protection. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: request_end(): do once	Miklos Szeredi	2015-07-01	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the connection is aborted it is possible that request_end() will be called twice. Use atomic test and set to do the actual ending only once. test_and_set_bit() also provides the necessary barrier semantics so no explicit smp_wmb() is necessary. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: add req flag for private list	Miklos Szeredi	2015-07-01	2	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an unlocked request is aborted, it is moved from fpq->io to a private list. Then, after unlocking fpq->lock, the private list is processed and the requests are finished off. To protect the private list, we need to mark the request with a flag, so if in the meantime the request is unlocked the list is not corrupted. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: pqueue locking	Miklos Szeredi	2015-07-01	3	-2/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a fpq->lock for protecting members of struct fuse_pqueue and FR_LOCKED request flag. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: abort: group pqueue accesses	Miklos Szeredi	2015-07-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rearrange fuse_abort_conn() so that processing queue accesses are grouped together. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: cleanup fuse_dev_do_read()	Miklos Szeredi	2015-07-01	1	-20/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- locked list_add() + list_del_init() cancel out - common handling of case when request is ended here in the read phase Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: move list_del_init() from request_end() into callers	Miklos Szeredi	2015-07-01	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
\| * \| \|	fuse: duplicate ->connected in pqueue	Miklos Szeredi	2015-07-01	3	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will allow checking ->connected just with the processing queue lock. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: separate out processing queue	Miklos Szeredi	2015-07-01	3	-16/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is just two fields: fc->io and fc->processing. This patch just rearranges the fields, no functional change. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: simplify request_wait()	Miklos Szeredi	2015-07-01	1	-25/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	wait_event_interruptible_exclusive_locked() will do everything request_wait() does, so replace it. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: no fc->lock for iqueue parts	Miklos Szeredi	2015-07-01	1	-51/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove fc->lock protection from input queue members, now protected by fiq->waitq.lock. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: allow interrupt queuing without fc->lock	Miklos Szeredi	2015-07-01	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Interrupt is only queued after the request has been sent to userspace. This is either done in request_wait_answer() or fuse_dev_do_read() depending on which state the request is in at the time of the interrupt. If it's not yet sent, then queuing the interrupt is postponed until the request is read. Otherwise (the request has already been read and is waiting for an answer) the interrupt is queued immedidately. We want to call queue_interrupt() without fc->lock protection, in which case there can be a race between the two functions: - neither of them queue the interrupt (thinking the other one has already done it). - both of them queue the interrupt The first one is prevented by adding memory barriers, the second is prevented by checking (under fiq->waitq.lock) if the interrupt has already been queued. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
\| * \| \|	fuse: iqueue locking	Miklos Szeredi	2015-07-01	1	-6/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use fiq->waitq.lock for protecting members of struct fuse_iqueue and FR_PENDING request flag, previously protected by fc->lock. Following patches will remove fc->lock protection from these members. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: dev read: split list_move	Miklos Szeredi	2015-07-01	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Different lists will need different locks. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: abort: group iqueue accesses	Miklos Szeredi	2015-07-01	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rearrange fuse_abort_conn() so that input queue accesses are grouped together. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: duplicate ->connected in iqueue	Miklos Szeredi	2015-07-01	4	-11/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will allow checking ->connected just with the input queue lock. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: separate out input queue	Miklos Szeredi	2015-07-01	3	-84/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The input queue contains normal requests (fc->pending), forgets (fc->forget_*) and interrupts (fc->interrupts). There's also fc->waitq and fc->fasync for waking up the readers of the fuse device when a request is available. The fc->reqctr is also moved to the input queue (assigned to the request when the request is added to the input queue. This patch just rearranges the fields, no functional change. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: req state use flags	Miklos Szeredi	2015-07-01	3	-21/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use flags for representing the state in fuse_req. This is needed since req->list will be protected by different locks in different states, hence we'll want the state itself to be split into distinct bits, each protected with the relevant lock in that state. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
\| * \| \|	fuse: simplify req states	Miklos Szeredi	2015-07-01	3	-9/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	FUSE_REQ_INIT is actually the same state as FUSE_REQ_PENDING and FUSE_REQ_READING and FUSE_REQ_WRITING can be merged into a common FUSE_REQ_IO state. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: don't hold lock over request_wait_answer()	Miklos Szeredi	2015-07-01	1	-25/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Only hold fc->lock over sections of request_wait_answer() that actually need it. If wait_event_interruptible() returns zero, it means that the request finished. Need to add memory barriers, though, to make sure that all relevant data in the request is synchronized. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
\| * \| \|	fuse: simplify unique ctr	Miklos Szeredi	2015-07-01	2	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since it's a 64bit counter, it's never gonna wrap around. Remove code dealing with that possibility. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: rework abort	Miklos Szeredi	2015-07-01	1	-11/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Splice fc->pending and fc->processing lists into a common kill list while holding fc->lock. By the time we release fc->lock, pending and processing lists are empty and the io list contains only locked requests. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: fold helpers into abort	Miklos Szeredi	2015-07-01	1	-55/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fold end_io_requests() and end_queued_requests() into fuse_abort_conn(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: use per req lock for lock/unlock_request()	Miklos Szeredi	2015-07-01	2	-22/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reuse req->waitq.lock for protecting FR_ABORTED and FR_LOCKED flags. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: req use bitops	Miklos Szeredi	2015-07-01	4	-72/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Finer grained locking will mean there's no single lock to protect modification of bitfileds in fuse_req. So move to using bitops. Can use the non-atomic variants for those which happen while the request definitely has only one reference. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: simplify request abort	Miklos Szeredi	2015-07-01	1	-73/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- don't end the request while req->locked is true - make unlock_request() return an error if the connection was aborted Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: call fuse_abort_conn() in dev release	Miklos Szeredi	2015-07-01	1	-8/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fuse_abort_conn() does all the work done by fuse_dev_release() and more. "More" consists of: end_io_requests(fc); wake_up_all(&fc->waitq); kill_fasync(&fc->fasync, SIGIO, POLL_IN); All of which should be no-op (WARN_ON's added). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: fold fuse_request_send_nowait() into single caller	Miklos Szeredi	2015-07-01	1	-22/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	And the same with fuse_request_send_nowait_locked(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: check conn_error earlier	Miklos Szeredi	2015-07-01	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fc->conn_error is set once in FUSE_INIT reply and never cleared. Check it in request allocation, there's no sense in doing all the preparation if sending will surely fail. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: account as waiting before queuing for background	Miklos Szeredi	2015-07-01	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move accounting of fc->num_waiting to the point where the request actually starts waiting. This is earlier than the current queue_request() for background requests, since they might be waiting on the fc->bg_queue before being queued on fc->pending. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: reset waiting	Miklos Szeredi	2015-07-01	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reset req->waiting in fuse_put_request(). This is needed for correct accounting in fc->num_waiting for reserved requests. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
\| * \| \|	fuse: fix background request if not connected	Miklos Szeredi	2015-07-01	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	request_end() expects fc->num_background and fc->active_background to have been incremented, which is not the case in fuse_request_send_nowait() failure path. So instead just call the ->end() callback (which is actually set by all callers). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
\| * \| \|	fuse: initialize fc->release before calling it	Miklos Szeredi	2015-07-01	1	-1/+1
\| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fc->release is called from fuse_conn_put() which was used in the error cleanup before fc->release was initialized. [Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling fuse_conn_init(fc) instead of before.] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Fixes: a325f9b92273 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()") Cc: <stable@vger.kernel.org> #v2.6.31+
* \| \|	Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block	Linus Torvalds	2015-06-25	1	-6/+6
\|\ \ \ \| \|_\|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull cgroup writeback support from Jens Axboe: "This is the big pull request for adding cgroup writeback support. This code has been in development for a long time, and it has been simmering in for-next for a good chunk of this cycle too. This is one of those problems that has been talked about for at least half a decade, finally there's a solution and code to go with it. Also see last weeks writeup on LWN: http://lwn.net/Articles/648292/" * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits) writeback, blkio: add documentation for cgroup writeback support vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB writeback: do foreign inode detection iff cgroup writeback is enabled v9fs: fix error handling in v9fs_session_init() bdi: fix wrong error return value in cgwb_create() buffer: remove unusued 'ret' variable writeback: disassociate inodes from dying bdi_writebacks writeback: implement foreign cgroup inode bdi_writeback switching writeback: add lockdep annotation to inode_to_wb() writeback: use unlocked_inode_to_wb transaction in inode_congested() writeback: implement unlocked_inode_to_wb transaction and use it for stat updates writeback: implement [locked_]inode_to_wb_and_lock_list() writeback: implement foreign cgroup inode detection writeback: make writeback_control track the inode being written back writeback: relocate wb[_try]_get(), wb_put(), inode_{attach\|detach}_wb() mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use writeback: implement memcg writeback domain based throttling writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes writeback: implement memcg wb_domain writeback: update wb_over_bg_thresh() to use wb_domain aware operations ...
\| * \|	writeback: move backing_dev_info->bdi_stat[] into bdi_writeback	Tejun Heo	2015-06-02	1	-6/+6
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bdi->bdi_stat[] into wb. * enum bdi_stat_item is renamed to wb_stat_item and the prefix of all enums is changed from BDI_ to WB_. * BDI_STAT_BATCH() -> WB_STAT_BATCH() * [__]{add\|inc\|dec\|sum}_wb_stat(bdi, ...) -> [__]{add\|inc}_wb_stat(wb, ...) * bdi_stat[_error]() -> wb_stat[_error]() * bdi_writeout_inc() -> wb_writeout_inc() * stat init is moved to bdi_wb_init() and bdi_wb_exit() is added and frees stat. * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[] introducing no behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* \|	new helper: free_page_put_link()	Al Viro	2015-05-11	1	-6/+1
\| \| \| \| \| \| \| \| \| \| \| \|	similar to kfree_put_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	switch ->put_link() from dentry to inode	Al Viro	2015-05-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	only one instance looks at that argument at all; that sole exception wants inode rather than dentry. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	don't pass nameidata to ->follow_link()	Al Viro	2015-05-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	its only use is getting passed to nd_jump_link(), which can obtain it from current->nameidata Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	new ->follow_link() and ->put_link() calling conventions	Al Viro	2015-05-10	1	-15/+4
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	a) instead of storing the symlink body (via nd_set_link()) and returning an opaque pointer later passed to ->put_link(), ->follow_link() _stores_ that opaque pointer (into void * passed by address by caller) and returns the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic symlinks) and pointer to symlink body for normal symlinks. Stored pointer is ignored in all cases except the last one. Storing NULL for opaque pointer (or not storing it at all) means no call of ->put_link(). b) the body used to be passed to ->put_link() implicitly (via nameidata). Now only the opaque pointer is. In the cases when we used the symlink body to free stuff, ->follow_link() now should store it as opaque pointer in addition to returning it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>