Commit log (subject; author, date; files changed, lines -/+)
* cgroup: cgroup namespace setns support (Aditya Kali, 2016-02-16; 1 file, -3/+16)
    setns on a cgroup namespace is allowed only if the task has CAP_SYS_ADMIN in its
    current user namespace and over the user namespace associated with the target
    cgroupns. No implicit cgroup changes happen when attaching to another cgroupns;
    it is expected that someone moves the attaching process under the target
    cgroupns-root.
    Signed-off-by: Aditya Kali <adityakali@google.com>
    Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
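    As an illustration of the behavior described above, here is a minimal userspace
    sketch (not from the patch) that attaches to another task's cgroup namespace via
    setns(2); the PID 1234, the lack of error recovery, and the fallback
    CLONE_NEWCGROUP define are illustrative assumptions.

        /* Hypothetical example: join the cgroup namespace of PID 1234.
         * Requires CAP_SYS_ADMIN as described in the commit message. */
        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <sched.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        #ifndef CLONE_NEWCGROUP
        #define CLONE_NEWCGROUP 0x02000000  /* value from <linux/sched.h> */
        #endif

        int main(void)
        {
            int fd = open("/proc/1234/ns/cgroup", O_RDONLY);  /* placeholder PID */

            if (fd < 0) {
                perror("open");
                return EXIT_FAILURE;
            }
            /* Passing CLONE_NEWCGROUP makes setns() fail unless fd really is
             * a cgroup namespace; 0 would accept any namespace type. */
            if (setns(fd, CLONE_NEWCGROUP) < 0) {
                perror("setns");
                return EXIT_FAILURE;
            }
            close(fd);
            /* No implicit cgroup migration happened; only the view of
             * /proc/self/cgroup is now relative to the target cgroupns-root. */
            execlp("cat", "cat", "/proc/self/cgroup", (char *)NULL);
            perror("execlp");
            return EXIT_FAILURE;
        }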
* cgroup: introduce cgroup namespaces (Aditya Kali, 2016-02-16; 8 files, -10/+250)
    Introduce the ability to create a new cgroup namespace. The newly created cgroup
    namespace remembers the cgroup of the process at the point of creation of the
    cgroup namespace (referred to as the cgroupns-root). The main purpose of a cgroup
    namespace is to virtualize the contents of the /proc/self/cgroup file. Processes
    inside a cgroup namespace are only able to see paths relative to their namespace
    root (unless they are moved outside of their cgroupns-root, at which point they
    will see a relative path from their cgroupns-root). For a correctly set up
    container this enables container tools (like libcontainer, lxc, lmctfy, etc.) to
    create completely virtualized containers without leaking the system-level cgroup
    hierarchy to the task. This patch only implements the 'unshare' part of the
    cgroupns.
    Signed-off-by: Aditya Kali <adityakali@google.com>
    Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
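    A minimal sketch of the 'unshare' part implemented here, assuming root (or
    CAP_SYS_ADMIN) and a libc new enough to expose unshare(2); the fallback define
    is an assumption for older headers.

        #define _GNU_SOURCE
        #include <sched.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        #ifndef CLONE_NEWCGROUP
        #define CLONE_NEWCGROUP 0x02000000
        #endif

        int main(void)
        {
            /* Create a new cgroup namespace; the current cgroup becomes the
             * cgroupns-root of the calling process. */
            if (unshare(CLONE_NEWCGROUP) < 0) {
                perror("unshare(CLONE_NEWCGROUP)");
                return EXIT_FAILURE;
            }
            /* Each line of /proc/self/cgroup should now show a path relative
             * to the cgroupns-root, typically just "/". */
            execlp("cat", "cat", "/proc/self/cgroup", (char *)NULL);
            perror("execlp");
            return EXIT_FAILURE;
        }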
* sched: new clone flag CLONE_NEWCGROUP for cgroup namespace (Aditya Kali, 2016-02-16; 1 file, -2/+1)
    CLONE_NEWCGROUP will be used to create a new cgroup namespace.
    Signed-off-by: Aditya Kali <adityakali@google.com>
    Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
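    The same flag can also be passed to clone(2) to start a child directly in a new
    cgroup namespace. A hedged sketch using the glibc clone(3) wrapper; the stack
    size and the fallback define are illustrative assumptions.

        #define _GNU_SOURCE
        #include <sched.h>
        #include <signal.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/wait.h>
        #include <unistd.h>

        #ifndef CLONE_NEWCGROUP
        #define CLONE_NEWCGROUP 0x02000000
        #endif

        static int child(void *arg)
        {
            /* Runs inside the freshly created cgroup namespace. */
            execlp("cat", "cat", "/proc/self/cgroup", (char *)NULL);
            return 1;
        }

        int main(void)
        {
            static char stack[64 * 1024];   /* child stack; clone() wants its top */
            pid_t pid = clone(child, stack + sizeof(stack),
                              CLONE_NEWCGROUP | SIGCHLD, NULL);

            if (pid < 0) {
                perror("clone");
                return EXIT_FAILURE;
            }
            waitpid(pid, NULL, 0);
            return 0;
        }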
* kernfs: Add API to generate relative kernfs path (Aditya Kali, 2016-02-16; 2 files, -35/+165)
    The new function kernfs_path_from_node() generates and returns the kernfs path of
    a given kernfs_node relative to a given parent kernfs_node.
    Signed-off-by: Aditya Kali <adityakali@google.com>
    Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
    Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: provide cgroup_no_v1= to disable controllers in v1 mounts (Johannes Weiner, 2016-02-12; 1 file, -1/+42)
    Testing cgroup2 can be painful with system software automatically mounting and
    populating all cgroup controllers in v1 mode. Sometimes they can be unmounted
    from rc.local; sometimes even that is too late. Provide a command-line option to
    disable certain controllers in v1 mounts, so that they remain available for
    cgroup2 mounts.
    Example use:
        cgroup_no_v1=memory,cpu
        cgroup_no_v1=all
    Disabling will be confirmed at boot time as such:
        [ 0.013770] Disabling cpu control group subsystem in v1 mounts
        [ 0.016004] Disabling memory control group subsystem in v1 mounts
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Tejun Heo <tj@kernel.org>
* kernel/Makefile: remove the useless CFLAGS_REMOVE_cgroup-debug.o (Li Bin, 2016-01-31; 1 file, -2/+1)
    The file cgroup-debug.c was removed by commit fe6934354f8e ("cgroups: move the
    cgroup debug subsys into cgroup.c to access internal state"), which left the
    CFLAGS_REMOVE_cgroup-debug.o = $(CC_FLAGS_FTRACE) line in kernel/Makefile useless.
    Signed-off-by: Li Bin <huawei.libin@huawei.com>
    Acked-by: Zefan Li <lizefan@huawei.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
* Documentation: cgroup: Fix 'cgroup-legacy' -> 'cgroup-v1' (W. Trevor King, 2016-01-29; 1 file, -1/+1)
    This should have happened in 6255c46f (cgroup: rename cgroup documentations,
    2016-01-11).
    Signed-off-by: W. Trevor King <wking@tremily.us>
    Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: make sure a parent css isn't freed before its children (Tejun Heo, 2016-01-22; 1 file, -3/+4)
    There are three subsystem callbacks in the css shutdown path: css_offline(),
    css_released() and css_free(). Except for css_released(), cgroup core didn't
    guarantee the order of invocation. css_offline() or css_free() could be called
    on a parent css before its children. This behavior is unexpected and led to bugs
    in the cpu and memory controllers. The previous patch updated ordering for
    css_offline(), which fixes the cpu controller issue. While there currently isn't
    a known bug caused by misordering of css_free() invocations, let's fix it too for
    consistency. css_free() ordering can be trivially fixed by moving the put of the
    parent css below the css_free() invocation.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
* cgroup: make sure a parent css isn't offlined before its children (Tejun Heo, 2016-01-22; 2 files, -5/+23)
    There are three subsystem callbacks in the css shutdown path: css_offline(),
    css_released() and css_free(). Except for css_released(), cgroup core didn't
    guarantee the order of invocation. css_offline() or css_free() could be called
    on a parent css before its children. This behavior is unexpected and led to bugs
    in the cpu and memory controllers. This patch updates the offline path so that a
    parent css is never offlined before its children. Each css keeps an online_cnt
    which reaches zero iff itself and all its children are offline, and offline_css()
    is invoked only after online_cnt reaches zero. This fixes the memory controller
    bug and allows the fix for the cpu controller.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Reported-by: Brian Christiansen <brian.o.christiansen@gmail.com>
    Link: http://lkml.kernel.org/g/5698A023.9070703@de.ibm.com
    Link: http://lkml.kernel.org/g/CAKB58ikDkzc8REt31WBkD99+hxNzjK4+FBmhkgS+NVrC9vjMSg@mail.gmail.com
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: stable@vger.kernel.org
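    A simplified userspace model of that counting rule (illustrative only, not the
    kernel code): online_cnt covers the css itself plus each online child, a child's
    offline drops one reference on its parent, and offlining only runs once the count
    reaches zero, so a parent can never be offlined before its children.

        #include <stdio.h>

        struct css {
            const char *name;
            struct css *parent;
            int online_cnt;     /* 1 for self + 1 per online child */
        };

        static void css_online(struct css *c)
        {
            c->online_cnt = 1;
            if (c->parent)
                c->parent->online_cnt++;
        }

        static void css_put_online(struct css *c)
        {
            if (--c->online_cnt)
                return;                         /* children still online: defer */
            printf("offline %s\n", c->name);    /* offline_css() would run here */
            if (c->parent)
                css_put_online(c->parent);      /* propagate to the parent */
        }

        int main(void)
        {
            struct css root  = { "root",  NULL,  0 };
            struct css child = { "child", &root, 0 };

            css_online(&root);
            css_online(&child);

            /* Even if the offline request hits the parent first, the counter
             * guarantees the child is offlined before the parent. */
            css_put_online(&root);   /* root: 2 -> 1, stays online */
            css_put_online(&child);  /* prints "offline child", then "offline root" */
            return 0;
        }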
* cpuset: make mm migration asynchronous (Tejun Heo, 2016-01-22; 3 files, -22/+57)
    If "cpuset.memory_migrate" is set, when a process is moved from one cpuset to
    another with a different memory node mask, pages used by the process are migrated
    to the new set of nodes. This was performed synchronously in the ->attach()
    callback, which is synchronized against process management. Recently, the
    synchronization was changed from a per-process rwsem to a global percpu rwsem for
    simplicity and optimization. Combined with the synchronous mm migration, this led
    to deadlocks because mm migration could schedule a work item which may in turn
    try to create a new worker blocking on the process management lock held from the
    cgroup process migration path. Such a heavy operation shouldn't be performed
    synchronously from that deep inside cgroup migration in the first place. This
    patch punts the actual migration to an ordered workqueue and updates the cgroup
    process migration and cpuset config update paths to flush the workqueue after all
    locks are released. This way, the operations still seem synchronous to userland
    without entangling mm migration with process management synchronization. CPU
    hotplug can also invoke mm migration, but there's no reason for it to wait for mm
    migrations, so it doesn't synchronize against their completions.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: stable@vger.kernel.org # v4.4+
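    A hedged kernel-module-style sketch of the punt-and-flush pattern described
    above; the workqueue and function names are illustrative, not the actual cpuset
    code.

        #include <linux/errno.h>
        #include <linux/init.h>
        #include <linux/module.h>
        #include <linux/workqueue.h>

        static struct workqueue_struct *migrate_wq;   /* illustrative name */
        static struct work_struct migrate_work;

        static void migrate_fn(struct work_struct *work)
        {
            /* The heavy mm migration would run here, outside the process
             * management locks held by the cgroup migration path. */
        }

        static int __init example_init(void)
        {
            /* Ordered workqueue: one item at a time, in queueing order. */
            migrate_wq = alloc_ordered_workqueue("example_migrate", 0);
            if (!migrate_wq)
                return -ENOMEM;
            INIT_WORK(&migrate_work, migrate_fn);

            queue_work(migrate_wq, &migrate_work);  /* punted from ->attach() */
            /* ...the caller drops its locks here... */
            flush_workqueue(migrate_wq);            /* still looks synchronous */
            return 0;
        }

        static void __exit example_exit(void)
        {
            destroy_workqueue(migrate_wq);
        }

        module_init(example_init);
        module_exit(example_exit);
        MODULE_LICENSE("GPL");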
* Merge branch 'for-4.5/nvme' of git://git.kernel.dk/linux-block (Linus Torvalds, 2016-01-21; 19 files, -2197/+2596)
    Pull NVMe updates from Jens Axboe:
    "Last branch for this series is the nvme changes. It's in a separate branch to
    avoid splitting too much between core and NVMe changes, since NVMe is still
    helping drive some blk-mq changes. That said, not a huge amount of core changes
    in here. The grunt of the work is the continued split of the code"
    * 'for-4.5/nvme' of git://git.kernel.dk/linux-block: (67 commits)
        uapi: update install list after nvme.h rename
        NVMe: Export NVMe attributes to sysfs group
        NVMe: Shutdown controller only for power-off
        NVMe: IO queue deletion re-write
        NVMe: Remove queue freezing on resets
        NVMe: Use a retryable error code on reset
        NVMe: Fix admin queue ring wrap
        nvme: make SG_IO support optional
        nvme: fixes for NVME_IOCTL_IO_CMD on the char device
        nvme: synchronize access to ctrl->namespaces
        nvme: Move nvme_freeze/unfreeze_queues to nvme core
        PCI/AER: include header file
        NVMe: Export namespace attributes to sysfs
        NVMe: Add pci error handlers
        block: remove REQ_NO_TIMEOUT flag
        nvme: merge iod and cmd_info
        nvme: meta_sg doesn't have to be an array
        nvme: properly free resources for cancelled command
        nvme: simplify completion handling
        nvme: special case AEN requests
        ...
| * uapi: update install list after nvme.h rename (Mike Frysinger, 2016-01-13; 1 file, -1/+1)
    Commit 9d99a8dda154 ("nvme: move hardware structures out of the uapi version of
    nvme.h") renamed nvme.h to nvme_ioctl.h, but the uapi list still refers to
    nvme.h. People trying to install the headers hit a failure as the header no
    longer exists.
    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Frysinger <vapier@gentoo.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: Export NVMe attributes to sysfs group (Keith Busch, 2016-01-12; 1 file, -11/+33)
    Adds all controller information to the attribute list exposed to sysfs, and
    appends the reset_controller attribute to it. The nvme device is created with
    this attribute list, so the driver no longer manages its attributes.
    Reported-by: Sujith Pandel <sujithpshankar@gmail.com>
    Cc: Sujith Pandel <sujithpshankar@gmail.com>
    Cc: David Milburn <dmilburn@redhat.com>
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: Shutdown controller only for power-off (Keith Busch, 2016-01-12; 1 file, -21/+19)
    We don't need to shut down a controller for a reset. A controller in a shutdown
    state may take longer to become ready than one that was simply disabled. This
    patch has the driver shut down a controller only if the device is about to be
    powered off or is being removed. When taking the controller down for a reset, the
    controller will be disabled instead. Function names have been updated in this
    patch to reflect their changed semantics.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: IO queue deletion re-write (Keith Busch, 2016-01-12; 1 file, -170/+81)
    The nvme driver deletes IO queues asynchronously since this operation may
    potentially take an undesirable amount of time with a large number of queues if
    done serially. The driver used to manage coordinating asynchronous deletions.
    This patch simplifies that by leveraging the block layer rather than using
    kthread workers and chaining more complicated callbacks.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: Remove queue freezing on resets (Keith Busch, 2016-01-12; 3 files, -11/+8)
    NVMe submits all commands through the block layer now. This means we can let
    requests queue at the blk-mq hardware context, since there is no path that
    bypasses this anymore, so we don't need to freeze the queues anymore. The driver
    can simply stop the h/w queues from running during a reset instead. This also
    fixes a WARN in percpu_ref_reinit when the queue was unfrozen with requeued
    requests.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: Use a retryable error code on reset (Keith Busch, 2016-01-12; 1 file, -1/+1)
    A negative status has the "do not retry" bit set, which makes it not retryable.
    Use a fake status that can potentially be retried on reset. An aborted command's
    status is overridden by the timeout handler so that it won't be retried, which is
    necessary to keep initialization from getting into a reset loop.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: Fix admin queue ring wrap (Keith Busch, 2016-01-12; 1 file, -1/+6)
    The tag set queue depth needs to be one less than the h/w queue depth so we don't
    wrap the circular buffer. This conforms to the specification-defined "Full Queue"
    condition.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: make SG_IO support optional (Christoph Hellwig, 2016-01-12; 3 files, -1/+15)
    Translating SCSI commands to NVMe commands is rather pointless in general, as
    applications must not expect to be able to use SCSI commands on a generic block
    device. Make the huge translation layer optional and hope no one will ever enable
    it in the future.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: fixes for NVME_IOCTL_IO_CMD on the char device (Christoph Hellwig, 2016-01-12; 1 file, -5/+34)
    Make sure we synchronize access to the namespaces list and grab a reference to
    the namespace before doing I/O. Make sure to reject the ioctl if multiple
    namespaces are present as it's entirely unsafe, and warn when using it even with
    a single namespace.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: synchronize access to ctrl->namespaces (Christoph Hellwig, 2016-01-12; 2 files, -0/+18)
    Currently traversal and modification of ctrl->namespaces happens completely
    unsynchronized, which can be fixed by the addition of a simple mutex.
    Note: nvme_dev_ioctl will be handled in the next patch.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
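    A hedged sketch of the pattern this describes, with stand-in types rather than
    the real nvme structures: every traversal or modification of the namespaces list
    goes through one mutex so it cannot race with additions and removals.

        #include <linux/list.h>
        #include <linux/mutex.h>

        struct example_ctrl {                   /* stand-in for struct nvme_ctrl */
            struct list_head namespaces;        /* init with INIT_LIST_HEAD() */
            struct mutex namespaces_mutex;      /* init with mutex_init(); guards the list */
        };

        struct example_ns {                     /* stand-in for struct nvme_ns */
            struct list_head list;
            unsigned int nsid;
        };

        static void example_add_ns(struct example_ctrl *ctrl, struct example_ns *ns)
        {
            mutex_lock(&ctrl->namespaces_mutex);
            list_add_tail(&ns->list, &ctrl->namespaces);
            mutex_unlock(&ctrl->namespaces_mutex);
        }

        static struct example_ns *example_find_ns(struct example_ctrl *ctrl,
                                                  unsigned int nsid)
        {
            struct example_ns *ns, *found = NULL;

            mutex_lock(&ctrl->namespaces_mutex);
            list_for_each_entry(ns, &ctrl->namespaces, list) {
                if (ns->nsid == nsid) {
                    found = ns;
                    break;
                }
            }
            mutex_unlock(&ctrl->namespaces_mutex);
            return found;
        }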
| * nvme: Move nvme_freeze/unfreeze_queues to nvme core (Sagi Grimberg, 2016-01-12; 3 files, -30/+33)
    Nothing PCI specific about them, and we'll need them exported in other transports
    too.
    Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * PCI/AER: include header file (Sudip Mukherjee, 2015-12-23; 1 file, -0/+1)
    We are getting a build failure with sparc allmodconfig with the error:
        drivers/nvme/host/pci.c:15:0:
        include/linux/aer.h: In function 'pci_enable_pcie_error_reporting':
        include/linux/aer.h:49:10: error: 'EINVAL' undeclared (first use in this function)
    The file aer.h is using the error values, but they are defined in errno.h.
    Include errno.h so that we have the definitions of the error codes.
    Fixes: a0a3408ee614 ("NVMe: Add pci error handlers")
    Cc: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: Export namespace attributes to sysfs (Keith Busch, 2015-12-22; 2 files, -2/+70)
    Exposes the NGUID, EUI-64, and NSID to sysfs entries under the disk's kobject.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: Add pci error handlers (Keith Busch, 2015-12-22; 1 file, -10/+44)
    Requests enabling of PCIe AER support. Shuts down the controller when an error is
    detected in the io-frozen state prior to requesting a slot reset; resumes the
    controller after the reset completes.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * block: remove REQ_NO_TIMEOUT flag (Christoph Hellwig, 2015-12-22; 3 files, -7/+0)
    This was added for the 'magic' AEN requests in the NVMe driver that never return.
    We now handle them purely inside the driver and don't need this core hack any
    more.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: merge iod and cmd_info (Christoph Hellwig, 2015-12-22; 1 file, -111/+73)
    Merge the two per-request structures in the nvme driver into a single one.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: meta_sg doesn't have to be an array (Christoph Hellwig, 2015-12-22; 1 file, -6/+6)
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: properly free resources for cancelled command (Christoph Hellwig, 2015-12-22; 1 file, -39/+40)
    We need to move freeing of resources to the ->complete handler to ensure they are
    also freed when we cancel the command.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: simplify completion handling (Christoph Hellwig, 2015-12-22; 1 file, -115/+26)
    Now that all commands are executed as block layer requests we can remove the
    internal completion in the NVMe driver. Note that we can simply call
    blk_mq_complete_request to abort commands, as the block layer will protect
    against double completions internally.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: special case AEN requests (Christoph Hellwig, 2015-12-22; 1 file, -35/+40)
    AEN requests are different from other requests in that they don't time out and
    can't easily be cancelled. Because of that we should not use the blk-mq
    infrastructure but just special case them in the completion path.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: switch abort to blk_execute_rq_nowait (Christoph Hellwig, 2015-12-22; 1 file, -35/+26)
    And remove the now unused nvme_submit_cmd helper.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: switch delete SQ/CQ to blk_execute_rq_nowait (Christoph Hellwig, 2015-12-22; 1 file, -34/+15)
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: factor out a few helpers from req_completion (Christoph Hellwig, 2015-12-22; 3 files, -10/+20)
    We'll need them in other places later.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: fix admin queue depth (Christoph Hellwig, 2015-12-22; 1 file, -1/+1)
    The number in tag_set->queue_depth includes the reserved tags.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: Simplify metadata setup (Keith Busch, 2015-12-22; 1 file, -25/+3)
    We no longer require the two-pass setup for block integrity.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: Remove device management handles on remove (Keith Busch, 2015-12-22; 3 files, -4/+11)
    We don't want to allow new references to open on a device that is removed. This
    ties the lifetime of these handles to the physical device's presence rather than
    to the open reference count.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: Use unbounded work queue for all work (Keith Busch, 2015-12-22; 1 file, -6/+6)
    Removes all usage of the global work queue so work can't be scheduled on two
    different work queues, and removes nvme's work queue single-threadedness so
    controllers can be driven in parallel.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    [hch: keep the dead controller removal on the system workqueue to avoid deadlocks]
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * NVMe: Implement namespace list scanning (Keith Busch, 2015-12-22; 3 files, -9/+79)
    The NVMe 1.1 specification provides an identify mode to return a list of active
    namespaces. This is more efficient for discovering which namespace identifiers
    are active on a controller, providing a potentially significant improvement in
    scan time for controllers with sparsely populated namespaces.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    [hch: add quirk for the broken Qemu Identify implementation. To be relaxed later]
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: switch abort_limit to an atomic_t (Christoph Hellwig, 2015-12-22; 3 files, -6/+7)
    There is no lock to synchronize access to the abort_limit field of struct
    nvme_ctrl, so switch it to an atomic_t.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
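    A hedged sketch of how an atomic_t covers this kind of unlocked counter; the
    names and the initial value are illustrative, not the actual driver code.

        #include <linux/atomic.h>
        #include <linux/types.h>

        static atomic_t abort_limit = ATOMIC_INIT(8);   /* illustrative initial value */

        /* Returns true if the caller may issue another abort command. */
        static bool example_may_abort(void)
        {
            if (atomic_dec_return(&abort_limit) < 0) {
                atomic_inc(&abort_limit);       /* undo: the limit is exhausted */
                return false;
            }
            return true;
        }

        /* Called when an abort completes and its slot can be reused. */
        static void example_abort_done(void)
        {
            atomic_inc(&abort_limit);
        }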
| * nvme: remove dead controllers from a work item (Christoph Hellwig, 2015-12-22; 1 file, -13/+9)
    Compared to the kthread this gives us multiple call prevention for free.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: merge probe_work and reset_work (Christoph Hellwig, 2015-12-22; 1 file, -35/+39)
    If we're using two work queues we're always going to run into races where one
    item is tearing down what the other one is initializing. So instead merge the two
    work queues, and let the old probe_work also tear the controller down first if it
    was alive. Together with the better detection of the probe path using a flag,
    this gives us a properly serialized reset/probe path that also doesn't
    accidentally trigger when two commands time out and the second one tries to reset
    the controller while the first reset is still in progress.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: do not restart the request timeout if we're resetting the controller (Keith Busch, 2015-12-22; 1 file, -9/+16)
    Otherwise we're never going to complete a command when it is restarted just after
    we completed all other outstanding commands in nvme_clear_queue. The controller
    must be disabled prior to completing a presumed lost command; do this by directly
    shutting down the controller before queueing the reset work, and return
    EH_HANDLED from the timeout handler after we shut the controller down.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    [hch: split and rebase]
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: simplify resets (Christoph Hellwig, 2015-12-22; 1 file, -26/+13)
    Don't delete the controller from dev_list before queuing a reset; instead just
    check for it being reset in the polling kthread. This allows us to remove the
    dev_list_lock in various places, and in addition we can simply rely on checking
    the queue_work return value to see if we could reset a controller.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: add NVME_SC_CANCELLED (Christoph Hellwig, 2015-12-22; 2 files, -1/+11)
    To properly document how we are using a negative Linux error value to communicate
    request cancellations inside the driver.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: merge nvme_abort_req and nvme_timeout (Christoph Hellwig, 2015-12-22; 1 file, -29/+18)
    We want to be able to return better error values from nvme_timeout, which is
    significantly easier if the two functions are merged. Also clean up and reduce
    the printk spew so that we only get one message per abort.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: don't take the I/O queue q_lock in nvme_timeout (Christoph Hellwig, 2015-12-22; 1 file, -4/+2)
    There is nothing it protects, but it makes lockdep unhappy in many different
    ways.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: protect against simultaneous shutdown invocations (Keith Busch, 2015-12-22; 1 file, -0/+5)
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    [hch: split from a larger patch]
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: only add a controller to dev_list after it's been fully initialized (Christoph Hellwig, 2015-12-22; 1 file, -21/+30)
    Without this we can easily get bad dereferences on nvmeq->q_db when the nvme
    kthread tries to poll the CQs for controllers that are in a half-initialized
    state.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: only ignore hardware errors in nvme_create_io_queues (Christoph Hellwig, 2015-12-22; 1 file, -15/+20)
    Half-initialized queues due to kernel error returns or timeout are still a good
    reason to give up on initializing a controller.
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>