op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge git://git.infradead.org/users/willy/linux-nvme	Linus Torvalds	2014-06-15	1	-71/+132
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull NVMe update from Matthew Wilcox: "Mostly bugfixes again for the NVMe driver. I'd like to call out the exported tracepoint in the block layer; I believe Keith has cleared this with Jens. We've had a few reports from people who're really pounding on NVMe devices at scale, hence the timeout changes (and new module parameters), hotplug cpu deadlock, tracepoints, and minor performance tweaks" [ Jens hadn't seen that tracepoint thing, but is ok with it - it will end up going away when mq conversion happens ] * git://git.infradead.org/users/willy/linux-nvme: (22 commits) NVMe: Fix START_STOP_UNIT Scsi->NVMe translation. NVMe: Use Log Page constants in SCSI emulation NVMe: Define Log Page constants NVMe: Fix hot cpu notification dead lock NVMe: Rename io_timeout to nvme_io_timeout NVMe: Use last bytes of f/w rev SCSI Inquiry NVMe: Adhere to request queue block accounting enable/disable NVMe: Fix nvme get/put queue semantics NVMe: Delete NVME_GET_FEAT_TEMP_THRESH NVMe: Make admin timeout a module parameter NVMe: Make iod bio timeout a parameter NVMe: Prevent possible NULL pointer dereference NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD) NVMe: Update data structures for NVMe 1.2 NVMe: Enable BUILD_BUG_ON checks NVMe: Update namespace and controller identify structures to the 1.1a spec NVMe: Flush with data support NVMe: Configure support for block flush NVMe: Add tracepoints NVMe: Protect against badly formatted CQEs ...
\| *	NVMe: Fix hot cpu notification dead lock	Keith Busch	2014-06-13	1	-10/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a potential dead lock if a cpu event occurs during nvme probe since it registered with hot cpu notification. This fixes the race by having the module register with notification outside of probe rather than have each device register. The actual work is done in a scheduled work queue instead of in the notifier since assigning IO queues has the potential to block if the driver creates additional queues. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Rename io_timeout to nvme_io_timeout	Matthew Wilcox	2014-06-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's positively immoral to have a global variable called 'io_timeout'. Keep the module parameter called io_timeout, though. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Adhere to request queue block accounting enable/disable	Sam Bradshaw	2014-06-03	1	-14/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recently, a new sysfs control "iostats" was added to selectively enable or disable io statistics collection for request queues. This patch hooks that control. IO statistics collection is rather expensive on large, multi-node machines with drives pushing millions of iops. Having the ability to disable collection if not needed can improve throughput significantly. As a data point, on a quad E5-4640, I see more than 50% throughput improvement when io statistics accounting is disabled during heavily multi-threaded small block random read benchmarks where device performance is in the million iops+ range. Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Fix nvme get/put queue semantics	Keith Busch	2014-06-03	1	-8/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The routines to get and lock nvme queues required the caller to "put" or "unlock" them even if getting one returned NULL. This patch fixes that. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Make admin timeout a module parameter	Keith Busch	2014-06-03	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Keith Busch <keith.busch@intel.com> [made admin_timeout static] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Make iod bio timeout a parameter	Keith Busch	2014-06-03	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was originally set to 4 times the IO timeout, but that was when the IO timeout was 5 seconds instead of 30. 20 seconds for total time to failure seemed more reasonable than 2 minutes for most, but other users have requested to make this a module parameter instead. Signed-off-by: Keith Busch <keith.busch@intel.com> [renamed the module parameter to retry_time] [made retry_time static] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Prevent possible NULL pointer dereference	Santosh Y	2014-06-03	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kmalloc() used by the nvme_alloc_iod() to allocate memory for 'iod' can fail. So check the return value. Signed-off-by: Santosh Y <santosh.sy@samsung.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Enable BUILD_BUG_ON checks	Matthew Wilcox	2014-05-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since _nvme_check_size() wasn't being called from anywhere, the compiler was optimising it away ... along with all the link-time build failures that would result if any of the structures were the wrong size. Call it from nvme_exit() for no particular reason. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Flush with data support	Keith Busch	2014-05-05	1	-20/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is possible a filesystem may send a flush flagged bio with write data. There is no such composite NVMe command, so the driver sends flush and write separately. The device is allowed to execute these commands in any order, so it was possible the driver ends the bio after the write completes, but while the flush is still active. We don't want to let a filesystem believe flush succeeded before it really has; this could cause data corruption on a power loss between these events. To fix, this patch splits the flush and write into chained bios. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Configure support for block flush	Keith Busch	2014-05-05	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This configures an nvme request_queue as flush capable if the device has a volatile write cache present. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Add tracepoints	Keith Busch	2014-05-05	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adding tracepoints for bio_complete and block_split into nvme to help with gathering IO info using blktrace and blkparse. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Protect against badly formatted CQEs	Keith Busch	2014-05-05	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a misbehaving device posts a CQE with a command id < depth but for one that was never allocated, the command info will have a callback function set to NULL and we don't want to try invoking that. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Improve error messages	Matthew Wilcox	2014-05-05	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Help people diagnose what is going wrong at initialisation time by printing out which command has gone wrong and what the device returned. Also fix the error message printed while waiting for reset. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com>
\| *	NVMe: Update copyright headers	Matthew Wilcox	2014-05-05	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the copyright dates accurate and remove the final paragraph that includes the address of the FSF. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* \|	NVMe: Implement PCIe reset notification callback	Keith Busch	2014-05-27	1	-0/+11
\|/ \| \| \| \| \| \|	Quiesce and shutdown the device prior to reset, then restart the device and resume IO after. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
*	Merge git://git.infradead.org/users/willy/linux-nvme	Linus Torvalds	2014-04-11	1	-204/+480
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull NVMe driver updates from Matthew Wilcox: "Various updates to the NVMe driver. The most user-visible change is that drive hotplugging now works and CPU hotplug while an NVMe drive is installed should also work better" * git://git.infradead.org/users/willy/linux-nvme: NVMe: Retry failed commands with non-fatal errors NVMe: Add getgeo to block ops NVMe: Start-stop nvme_thread during device add-remove. NVMe: Make I/O timeout a module parameter NVMe: CPU hot plug notification NVMe: per-cpu io queues NVMe: Replace DEFINE_PCI_DEVICE_TABLE NVMe: Fix divide-by-zero in nvme_trans_io_get_num_cmds NVMe: IOCTL path RCU protect queue access NVMe: RCU protected access to io queues NVMe: Initialize device reference count earlier NVMe: Add CONFIG_PM_SLEEP to suspend/resume functions
\| *	NVMe: Retry failed commands with non-fatal errors	Keith Busch	2014-04-10	1	-90/+145
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For commands returned with failed status, queue these for resubmission and continue retrying them until success or for a limited amount of time. The final timeout was arbitrarily chosen so requests can't be retried indefinitely. Since these are requeued on the nvmeq that submitted the command, the callbacks have to take an nvmeq instead of an nvme_dev as a parameter so that we can use the locked queue to append the iod to retry later. The nvme_iod conviently can be used to track how long we've been trying to successfully complete an iod request. The nvme_iod also provides the nvme prp dma mappings, so I had to move a few things around so we can keep those mappings. Signed-off-by: Keith Busch <keith.busch@intel.com> [fixed checkpatch issue with long line] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Add getgeo to block ops	Keith Busch	2014-04-10	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some programs require HDIO_GETGEO work, which requires we implement getgeo. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Start-stop nvme_thread during device add-remove.	Dan McLeran	2014-04-10	1	-14/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Done to ensure nvme_thread is not running when there are no devices to poll. Signed-off-by: Dan McLeran <daniel.mcleran@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Make I/O timeout a module parameter	Keith Busch	2014-04-10	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Increase the default timeout to 30 seconds to match SCSI. Signed-off-by: Keith Busch <keith.busch@intel.com> [use byte instead of ushort] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: CPU hot plug notification	Keith Busch	2014-04-10	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Registers with hot cpu notification to rebalance, and potentially allocate additional, io queues. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: per-cpu io queues	Keith Busch	2014-04-10	1	-37/+167
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The device's IO queues are associated with CPUs, so we can use a per-cpu variable to map the a qid to a cpu. This provides a convienient way to optimally assign queues to multiple cpus when the device supports fewer queues than the host has cpus. The previous implementation may have assigned these poorly in these situations. This patch addresses this by sharing queues among cpus that are "close" together and should have a lower lock contention penalty. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Replace DEFINE_PCI_DEVICE_TABLE	Matthew Wilcox	2014-03-24	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Checkpatch has started warning against using DEFINE_PCI_DEVICE_TABLE, so replace it. Also update the copyright date and bump the module version number to 0.9. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: IOCTL path RCU protect queue access	Keith Busch	2014-03-24	1	-27/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds rcu protected access to a queue in the nvme IOCTL path to fix potential races between a surprise removal and queue usage in nvme_submit_sync_cmd. The fix holds the rcu_read_lock() here to prevent the nvme_queue from freeing while this path is executing so it can't sleep, and so this path will no longer wait for a available command id should they all be in use at the time a passthrough IOCTL request is received. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: RCU protected access to io queues	Keith Busch	2014-03-24	1	-46/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds rcu protected access to nvme_queue to fix a race between a surprise removal freeing the queue and a thread with open reference on a NVMe block device using that queue. The queues do not need to be rcu protected during the initialization or shutdown parts, so I've added a helper function for raw deferencing to get around the sparse errors. There is still a hole in the IOCTL path for the same problem, which is fixed in a subsequent patch. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Initialize device reference count earlier	Keith Busch	2014-03-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If an NVMe device becomes ready but fails to create IO queues, the driver creates a character device handle so the device can be managed. The device reference count needs to be initialized before creating the character device. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Add CONFIG_PM_SLEEP to suspend/resume functions	Jingoo Han	2014-03-24	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add CONFIG_PM_SLEEP to suspend/resume functions to fix the following build warning when CONFIG_PM_SLEEP is not selected. This is because sleep PM callbacks defined by SIMPLE_DEV_PM_OPS are only used when the CONFIG_PM_SLEEP is enabled. drivers/block/nvme-core.c:2541:12: warning: 'nvme_suspend' defined but not used [-Wunused-function] drivers/block/nvme-core.c:2550:12: warning: 'nvme_resume' defined but not used [-Wunused-function] Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* \|	Merge branch 'for-3.15/drivers' of git://git.kernel.dk/linux-block	Linus Torvalds	2014-04-01	1	-24/+9
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull block driver update from Jens Axboe: "On top of the core pull request, here's the pull request for the driver related changes for 3.15. It contains: - Improvements for msi-x registration for block drivers (mtip32xx, skd, cciss, nvme) from Alexander Gordeev. - A round of cleanups and improvements for drbd from Andreas Gruenbacher and Rashika Kheria. - A round of clanups and improvements for bcache from Kent. - Removal of sleep_on() and friends in DAC960, ataflop, swim3 from Arnd Bergmann. - Bug fix for a bug in the mtip32xx async completion code from Sam Bradshaw. - Bug fix for accidentally bouncing IO on 32-bit platforms with mtip32xx from Felipe Franciosi" * 'for-3.15/drivers' of git://git.kernel.dk/linux-block: (103 commits) bcache: remove nested function usage bcache: Kill bucket->gc_gen bcache: Kill unused freelist bcache: Rework btree cache reserve handling bcache: Kill btree_io_wq bcache: btree locking rework bcache: Fix a race when freeing btree nodes bcache: Add a real GC_MARK_RECLAIMABLE bcache: Add bch_keylist_init_single() bcache: Improve priority_stats bcache: Better alloc tracepoints bcache: Kill dead cgroup code bcache: stop moving_gc marking buckets that can't be moved. bcache: Fix moving_pred() bcache: Fix moving_gc deadlocking with a foreground write bcache: Fix discard granularity bcache: Fix another bug recovering from unclean shutdown bcache: Fix a bug recovering from unclean shutdown bcache: Fix a journalling reclaim after recovery bug bcache: Fix a null ptr deref in journal replay ...
\| * \|	nvme: Use pci_enable_msi_range() and pci_enable_msix_range()	Alexander Gordeev	2014-03-13	1	-24/+9
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As result of deprecation of MSI-X/MSI enablement functions pci_enable_msix() and pci_enable_msi_block() all drivers using these two interfaces need to be updated to use the new pci_enable_msi_range() or pci_enable_msi_exact() and pci_enable_msix_range() or pci_enable_msix_exact() interfaces. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* \|	nvme: don't use PREPARE_WORK	Tejun Heo	2014-03-07	1	-6/+12
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PREPARE_[DELAYED_]WORK() are being phased out. They have few users and a nasty surprise in terms of reentrancy guarantee as workqueue considers work items to be different if they don't have the same work function. nvme_dev->reset_work is multiplexed with multiple work functions. Introduce nvme_reset_workfn() which invokes nvme_dev->reset_workfn and always use it as the work function and update the users to set the ->reset_workfn field instead of overriding the work function using PREPARE_WORK(). It would probably be best to route this with other related updates through the workqueue tree. Compile tested. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: linux-nvme@lists.infradead.org
*	Merge git://git.infradead.org/users/willy/linux-nvme	Linus Torvalds	2014-02-05	1	-105/+505
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull NVMe driver update from Matthew Wilcox: "Looks like I missed the merge window ... but these are almost all bugfixes anyway (the ones that aren't have been baking for months)" * git://git.infradead.org/users/willy/linux-nvme: NVMe: Namespace use after free on surprise removal NVMe: Correct uses of INIT_WORK NVMe: Include device and queue numbers in interrupt name NVMe: Add a pci_driver shutdown method NVMe: Disable admin queue on init failure NVMe: Dynamically allocate partition numbers NVMe: Async IO queue deletion NVMe: Surprise removal handling NVMe: Abort timed out commands NVMe: Schedule reset for failed controllers NVMe: Device resume error handling NVMe: Cache dev->pci_dev in a local pointer NVMe: Fix lockdep warnings NVMe: compat SG_IO ioctl NVMe: remove deprecated IRQF_DISABLED NVMe: Avoid shift operation when writing cq head doorbell
\| *	NVMe: Namespace use after free on surprise removal	Keith Busch	2014-02-02	1	-13/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An nvme block device may have open references when the device is removed. New commands may still be sent on the removed device, so we need to ref count the opens, return errors for new commands, and not free the namespace and nvme_dev until all references are closed. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Correct uses of INIT_WORK	Matthew Wilcox	2014-01-29	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to initialise the work_struct when we initialise the rest of the struct nvme_dev, otherwise we'll hit a lockdep warning when we remove the device. Use PREPARE_WORK to change the function pointer instead of INIT_WORK. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Include device and queue numbers in interrupt name	Matthew Wilcox	2014-01-27	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	On larger systems with many drives, it may help debugging to know which queue is tied to which interrupt, just by looking at /proc/interrupts. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Add a pci_driver shutdown method	Keith Busch	2014-01-27	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to shut down the device cleanly when the system is being shut down. This was in an earlier patch but was inadvertently lost during a rewrite. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Disable admin queue on init failure	Keith Busch	2014-01-27	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Disable the admin queue if device fails during initialization so the queue's irq is freed. Signed-off-by: Keith Busch <keith.busch@intel.com> [rewritten to use nvme_free_queues] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Dynamically allocate partition numbers	Matthew Wilcox	2014-01-27	1	-33/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some users need more than 64 partitions per device. Rather than simply increasing the number of partitions, switch to the dynamic partition allocation scheme. This means that minor numbers are not stable across boots, but since major numbers aren't either, I cannot see this being a significant problem. Tested-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Async IO queue deletion	Keith Busch	2014-01-27	1	-12/+217
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This attempts to delete all IO queues at the same time asynchronously on shutdown. This is necessary for a present device that is not responding; a shutdown operation previously would take 2 minutes per queue-pair to timeout before moving on to the next queue, making a device removal appear to take a very long time or "hung" as reported by users. In the previous worst case, a removal may be stuck forever until a kill signal is given if there are more than 32 queue pairs since it would run out of admin command IDs after over an hour of timed out sync commands (admin queue depth is 64). This patch will wait for the admin command timeout for all commands to complete, so the worst case now for an unresponsive controller is 60 seconds, though that still seems like a long time. Since this adds another way to take queues offline, some duplicate code resulted so I moved these into more convienient functions. Signed-off-by: Keith Busch <keith.busch@intel.com> [make functions static, correct line length and whitespace issues] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Surprise removal handling	Keith Busch	2014-01-27	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds checks to see if the nvme pci device was removed. The check reads the status register for the value of -1, which it should never be unless the device is no longer present. If a user performs a surprise removal on an nvme device, the driver will be notified either by the pci driver remove callback if the platform's slot is capable of this event, or via reading the device BAR status register, which will indicate controller failure and trigger a reset. Either way, the device is not present so all outstanding commands would timeout. This will not send queue deletion commands to a drive that isn't present and fail after ioremap, significantly speeding up surprise removal; previously this took over 2 minutes per IO queue pair created, but this will complete removing the device within a few seconds. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Abort timed out commands	Keith Busch	2014-01-27	1	-1/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Send nvme abort command to io requests that have timed out on an initialized device. If the command is not returned after another timeout, schedule the controller for reset. Signed-off-by: Keith Busch <keith.busch@intel.com> [fix endianness issues] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Schedule reset for failed controllers	Keith Busch	2014-01-27	1	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Schedules a controller reset when it indicates it has a failed status. If the device does not become ready after a reset, the pci device will be scheduled for removal. Signed-off-by: Keith Busch <keith.busch@intel.com> [fixed checkpatch issue] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Device resume error handling	Keith Busch	2013-12-16	1	-15/+93
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adds controller error handling on resume power management. If the device fails to initialize, the device is queued for a reset. If the reset fails, a thread is spawned to remove the pci device. If the device resumes as "busy", the device is responding to admin commands but will not create IO queues. In this case, we need to remove the gendisks and free the IO queues since they can't be used and may be holding bios in their lists. From testing, the dma pools require a pci device so this had to change the pci driver 'remove' to release the dma resources in line with that call instead of after all references to the device are released. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Cache dev->pci_dev in a local pointer	Matthew Wilcox	2013-12-16	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Helps with line-length issues Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Fix lockdep warnings	Matthew Wilcox	2013-12-16	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During the initialisation path, the queue lock is taken without interrupt protection. It's perfectly safe to do so, because the interrupt handler can't run at this point, but it confuses lockdep. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: compat SG_IO ioctl	Keith Busch	2013-12-16	1	-1/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For 32-bit versions of sg3-utils running on a 64-bit system. This is mostly a copy from the relevent portions of fs/compat_ioctl.c, with slight modifications for going through block_device_operations. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com> [fixed up CONFIG_COMPAT=n build problems] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: remove deprecated IRQF_DISABLED	Michael Opdenacker	2013-11-18	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch proposes to remove the use of the IRQF_DISABLED flag It's a NOOP since 2.6.35 and it will be removed one day. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
\| *	NVMe: Avoid shift operation when writing cq head doorbell	Haiyan Hu	2013-11-18	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Changes the type of dev->db_stride to unsigned and changes the value stored there to be 1 << the current value. Then there is less calculation to be done at completion time. Signed-off-by: Haiyan Hu <huhaiyan@huawei.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* \|	block: Introduce new bio_split()	Kent Overstreet	2013-11-23	1	-97/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new bio_split() can split arbitrary bios - it's not restricted to single page bios, like the old bio_split() (previously renamed to bio_pair_split()). It also has different semantics - it doesn't allocate a struct bio_pair, leaving it up to the caller to handle completions. Then convert the existing bio_pair_split() users to the new bio_split() - and also nvme, which was open coding bio splitting. (We have to take that BUG_ON() out of bio_integrity_trim() because this bio_split() needs to use it, and there's no reason it has to be used on bios marked as cloned; BIO_CLONED doesn't seem to have clearly documented semantics anyways.) Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Neil Brown <neilb@suse.de>
* \|	block: Convert bio_for_each_segment() to bvec_iter	Kent Overstreet	2013-11-23	1	-14/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	More prep work for immutable biovecs - with immutable bvecs drivers won't be able to use the biovec directly, they'll need to use helpers that take into account bio->bi_iter.bi_bvec_done. This updates callers for the new usage without changing the implementation yet. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Paul Clements <Paul.Clements@steeleye.com> Cc: Jim Paris <jim@jtan.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com> Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com> Cc: support@lsi.com Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Cc: linux-m68k@lists.linux-m68k.org Cc: linuxppc-dev@lists.ozlabs.org Cc: drbd-user@lists.linbit.com Cc: nbd-general@lists.sourceforge.net Cc: cbe-oss-dev@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Cc: virtualization@lists.linux-foundation.org Cc: linux-raid@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: DL-MPTFusionLinux@lsi.com Cc: linux-scsi@vger.kernel.org Cc: devel@driverdev.osuosl.org Cc: linux-fsdevel@vger.kernel.org Cc: cluster-devel@redhat.com Cc: linux-mm@kvack.org Acked-by: Geoff Levand <geoff@infradead.org>