summaryrefslogtreecommitdiffstats
path: root/drivers/nvme/host/rdma.c
Commit message (Collapse)AuthorAgeFilesLines
* blk-mq: update ->init_request and ->exit_request prototypesChristoph Hellwig2017-05-021-14/+14
| | | | | | | | | | | | Remove the request_idx parameter, which can't be used safely now that we support I/O schedulers with blk-mq. Except for a superflous check in mtip32xx it was unused anyway. Also pass the tag_set instead of just the driver data - this allows drivers to avoid some code duplication in a follow on cleanup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-blockLinus Torvalds2017-05-011-87/+67
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer updates from Jens Axboe: - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ was initially a fork of CFQ, but subsequently changed to implement fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant to be used on desktop type single drives, providing good fairness. From Paolo. - Add Kyber IO scheduler. This is a full multiqueue aware scheduler, using a scalable token based algorithm that throttles IO based on live completion IO stats, similary to blk-wbt. From Omar. - A series from Jan, moving users to separately allocated backing devices. This continues the work of separating backing device life times, solving various problems with hot removal. - A series of updates for lightnvm, mostly from Javier. Includes a 'pblk' target that exposes an open channel SSD as a physical block device. - A series of fixes and improvements for nbd from Josef. - A series from Omar, removing queue sharing between devices on mostly legacy drivers. This helps us clean up other bits, if we know that a queue only has a single device backing. This has been overdue for more than a decade. - Fixes for the blk-stats, and improvements to unify the stats and user windows. This both improves blk-wbt, and enables other users to register a need to receive IO stats for a device. From Omar. - blk-throttle improvements from Shaohua. This provides a scalable framework for implementing scalable priotization - particularly for blk-mq, but applicable to any type of block device. The interface is marked experimental for now. - Bucketized IO stats for IO polling from Stephen Bates. This improves efficiency of polled workloads in the presence of mixed block size IO. - A few fixes for opal, from Scott. - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics. From a variety of folks, mostly Sagi and James Smart. - A series from Bart, improving our exposed info and capabilities from the blk-mq debugfs support. - A series from Christoph, cleaning up how handle WRITE_ZEROES. - A series from Christoph, cleaning up the block layer handling of how we track errors in a request. On top of being a nice cleanup, it also shrinks the size of struct request a bit. - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was never used by platforms, and the latter has outlived it's usefulness. - Various little bug fixes and cleanups from a wide variety of folks. * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits) block: hide badblocks attribute by default blk-mq: unify hctx delay_work and run_work block: add kblock_mod_delayed_work_on() blk-mq: unify hctx delayed_run_work and run_work nbd: fix use after free on module unload MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler blk-mq-sched: alloate reserved tags out of normal pool mtip32xx: use runtime tag to initialize command header scsi: Implement blk_mq_ops.show_rq() blk-mq: Add blk_mq_ops.show_rq() blk-mq: Show operation, cmd_flags and rq_flags names blk-mq: Make blk_flags_show() callers append a newline character blk-mq: Move the "state" debugfs attribute one level down blk-mq: Unregister debugfs attributes earlier blk-mq: Only unregister hctxs for which registration succeeded blk-mq-debugfs: Rename functions for registering and unregistering the mq directory blk-mq: Let blk_mq_debugfs_register() look up the queue name blk-mq: Register <dev>/queue/mq after having registered <dev>/queue ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset ide-pm: always pass 0 error to __blk_end_request_all ..
| * nvme: split nvme status from block req->errorsChristoph Hellwig2017-04-201-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want our own clearly defined error field for NVMe passthrough commands, and the request errors field is going away in its current form. Just store the status and result field in the nvme_request field from hardirq completion context (using a new helper) and then generate a Linux errno for the block layer only when we actually need it. Because we can't overload the status value with a negative error code for cancelled command we now have a flags filed in struct nvme_request that contains a bit for this condition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: factor request completion code into a common helperChristoph Hellwig2017-04-041-18/+2
| | | | | | | | | | | | | | | | | | | | | | | | This avoids duplicating the logic four times, and it also allows to keep some helpers static in core.c or just opencode them. Note that this loses printing the aborted status on completions in the PCI driver as that uses a data structure not available any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme-rdma: increment request retries counter before requeuingSagi Grimberg2017-04-041-0/+1
| | | | | | | | | | | | | | | | | | This way our max retry limit holds as well. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme-rdma: Support ctrl_loss_tmoSagi Grimberg2017-04-041-13/+28
| | | | | | | | | | | | | | | | | | | | | | | | Before scheduling a reconnect attempt, check nr_reconnects against max_reconnects, if not exhausted (or max_reconnects is not -1), schedule a reconnect attempts, otherwise schedule ctrl removal. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme-rdma: get rid of local reconnect_delaySagi Grimberg2017-04-041-5/+3
| | | | | | | | | | | | | | | | we already have it in opts. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme-rdma: fix module_init (theoretical) error pathSagi Grimberg2017-04-041-5/+13
| | | | | | | | | | | | | | | | If nvmf_register_transport happened to fail (it can't, but theoretically) we leak memory. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme-rdma: use inet_pton_with_scope helperSagi Grimberg2017-04-041-44/+19
| | | | | | | | | | | | | | | | | | | | Both the destination and the host addresses are now parsed using inet_pton_with_scope helper. We also get ipv6 (with address scopes support). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme-rdma: Give some more grace for rdma connection establishmentSagi Grimberg2017-04-041-1/+1
| | | | | | | | | | | | | | | | | | The target might be occupied with multiple hosts so lets give it some more grace before failing the connection establishment. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme-rdma: handle cpu unplug when re-establishing the controllerSagi Grimberg2017-04-041-14/+14
| | | | | | | | | | | | | | | | | | | | | | If a cpu unplug event has occured, we need to take the minimum of the provided nr_io_queues and the number of online cpus, otherwise we won't be able to connect them as blk-mq mapping won't dispatch to those queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * blk-mq: constify struct blk_mq_opsEric Biggers2017-03-311-2/+2
| | | | | | | | | | | | | | Constify all instances of blk_mq_ops, as they are never modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nvme-rdma: Fix sqsize wrong assignment based on ctrl MQES capabilitySagi Grimberg2017-04-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | both our sqsize and the controller MQES cap are a 0 based value, so making it 1 based is wrong. Reported-by: Trapp, Darren <Darren.Trapp@cavium.com> Reported-by: Daniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nvme-rdma: handle cpu unplug when re-establishing the controllerSagi Grimberg2017-03-211-14/+14
|/ | | | | | | | | | If a cpu unplug event has occured, we need to take the minimum of the provided nr_io_queues and the number of online cpus, otherwise we won't be able to connect them as blk-mq mapping won't dispatch to those queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* scripts/spelling.txt: add "embeded" pattern and fix typo instancesMasahiro Yamada2017-02-271-1/+1
| | | | | | | | | | | Fix typos and add the following to the scripts/spelling.txt: embeded||embedded Link: http://lkml.kernel.org/r/1481573103-11329-12-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'for-next-dma_ops' of ↵Linus Torvalds2017-02-251-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma DMA mapping updates from Doug Ledford: "Drop IB DMA mapping code and use core DMA code instead. Bart Van Assche noted that the ib DMA mapping code was significantly similar enough to the core DMA mapping code that with a few changes it was possible to remove the IB DMA mapping code entirely and switch the RDMA stack to use the core DMA mapping code. This resulted in a nice set of cleanups, but touched the entire tree and has been kept separate for that reason." * tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits) IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it IB/core: Remove ib_device.dma_device nvme-rdma: Switch from dma_device to dev.parent RDS: net: Switch from dma_device to dev.parent IB/srpt: Modify a debug statement IB/srp: Switch from dma_device to dev.parent IB/iser: Switch from dma_device to dev.parent IB/IPoIB: Switch from dma_device to dev.parent IB/rxe: Switch from dma_device to dev.parent IB/vmw_pvrdma: Switch from dma_device to dev.parent IB/usnic: Switch from dma_device to dev.parent IB/qib: Switch from dma_device to dev.parent IB/qedr: Switch from dma_device to dev.parent IB/ocrdma: Switch from dma_device to dev.parent IB/nes: Remove a superfluous assignment statement IB/mthca: Switch from dma_device to dev.parent IB/mlx5: Switch from dma_device to dev.parent IB/mlx4: Switch from dma_device to dev.parent IB/i40iw: Remove a superfluous assignment statement IB/hns: Switch from dma_device to dev.parent ...
| * nvme-rdma: Switch from dma_device to dev.parentBart Van Assche2017-01-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Prepare for removal of ib_device.dma_device. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Acked-by: Sagi Grimberg <sagi@grimbeg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | nvme-rdma: add support for host_traddrMax Gurtovoy2017-02-221-2/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will enable the user to control the specific interface for connection establishment in case the host has more than 1 interface under the same subnet. E.g: Host interfaces configured as: - ib0 1.1.1.1/16 - ib1 1.1.1.2/16 Target interfaces configured as: - ib0 1.1.1.3/16 (listener interface) - ib1 1.1.1.4/16 the following connect command will go through host iface ib0 (default): nvme connect -t rdma -n testsubsystem -a 1.1.1.3 -s 1023 but the following command will go through host iface ib1: nvme connect -t rdma -n testsubsystem -a 1.1.1.3 -s 1023 -w 1.1.1.2 Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nvme-rdma: move nvme cm status helper to .h fileMax Gurtovoy2017-02-221-22/+0
| | | | | | | | | | | | | | | | | | | | | | This will enable the usage for nvme rdma target. Also move from a lookup array to a switch statement. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nvme: make nvmf_register_transport require a create_ctrl callbackJohannes Thumshirn2017-02-221-2/+1
| | | | | | | | | | | | | | | | | | | | | | nvmf_create_ctrl() relys on the presence of a create_crtl callback in the registered nvmf_transport_ops, so make nvmf_register_transport require one. Update the available call-sites as well to reflect these changes. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
* | block: fold cmd_type into the REQ_OP_ spaceChristoph Hellwig2017-01-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of keeping two levels of indirection for requests types, fold it all into the operations. The little caveat here is that previously cmd_type only applied to struct request, while the request and bio op fields were set to plain REQ_OP_READ/WRITE even for passthrough operations. Instead this patch adds new REQ_OP_* for SCSI passthrough and driver private requests, althought it has to add two for each so that we can communicate the data in/out nature of the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* | block: introduce blk_rq_is_passthroughChristoph Hellwig2017-01-311-2/+2
| | | | | | | | | | | | | | | | | | This can be used to check for fs vs non-fs requests and basically removes all knowledge of BLOCK_PC specific from the block layer, as well as preparing for removing the cmd_type field in struct request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nvme: use blk_rq_payload_bytesChristoph Hellwig2017-01-131-8/+5
| | | | | | | | | | | | | | | | | | | | The new blk_rq_payload_bytes generalizes the payload length hacks that nvme_map_len did before. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nvme-rdma: fix nvme_rdma_queue_is_readyChristoph Hellwig2017-01-111-1/+1
|/ | | | | | | | | | Now that we don't abuse the cmd field in struct request for nvme command passthrough this function needs to be converted to the proper accessor as well. Fixes: d49187e97e ("nvme: introduce struct nvme_request") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
* Merge tag 'for-linus' of ↵Linus Torvalds2016-12-151-6/+36
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is the complete update for the rdma stack for this release cycle. Most of it is typical driver and core updates, but there is the entirely new VMWare pvrdma driver. You may have noticed that there were changes in DaveM's pull request to the bnxt Ethernet driver to support a RoCE RDMA driver. The bnxt_re driver was tentatively set to be pulled in this release cycle, but it simply wasn't ready in time and was dropped (a few review comments still to address, and some multi-arch build issues like prefetch() not working across all arches). Summary: - shared mlx5 updates with net stack (will drop out on merge if Dave's tree has already been merged) - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe - debug cleanups - new connection rejection helpers - SRP updates - various misc fixes - new paravirt driver from vmware" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits) IB: Add vmw_pvrdma driver IB/mlx4: fix improper return value IB/ocrdma: fix bad initialization infiniband: nes: return value of skb_linearize should be handled MAINTAINERS: Update Intel RDMA RNIC driver maintainers MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers IB/core: fix unmap_sg argument qede: fix general protection fault may occur on probe IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc mlx5, calc_sq_size(): Make a debug message more informative mlx5: Remove a set-but-not-used variable mlx5: Use { } instead of { 0 } to init struct IB/srp: Make writing the add_target sysfs attr interruptible IB/srp: Make mapping failures easier to debug IB/srp: Make login failures easier to debug IB/srp: Introduce a local variable in srp_add_one() IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build IB/multicast: Check ib_find_pkey() return value IPoIB: Avoid reading an uninitialized member variable IB/mad: Fix an array index check ...
| * nvme-rdma: use rdma connection reject helper functionsSteve Wise2016-12-141-6/+36
| | | | | | | | | | | | | | | | Also add nvme cm status strings and use them. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-blockLinus Torvalds2016-12-131-21/+22
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer updates from Jens Axboe: "This is the main block pull request this series. Contrary to previous release, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one. The major parts of this pull request is: - Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph. - Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code. - Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me. - Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me. - Support from SMR drives in the block layer and for SD. From Hannes and Shaun. - Multi-connection support for nbd. From Josef. - Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph. - A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq. - Support for WRITE_ZEROES from Chaitanya. - Lightnvm updates from Javier/Matias. - Supoort for FC for the nvme-over-fabrics code. From James Smart. - A bunch of fixes from a whole slew of people, too many to name here" * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits) blk-stat: fix a few cases of missing batch flushing blk-flush: run the queue when inserting blk-mq flush elevator: make the rqhash helpers exported blk-mq: abstract out blk_mq_dispatch_rq_list() helper blk-mq: add blk_mq_start_stopped_hw_queue() block: improve handling of the magic discard payload blk-wbt: don't throttle discard or write zeroes nbd: use dev_err_ratelimited in io path nbd: reset the setup task for NBD_CLEAR_SOCK nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME nvme-fabrics: Add target support for FC transport nvme-fabrics: Add host support for FC transport nvme-fabrics: Add FC transport LLDD api definitions nvme-fabrics: Add FC transport FC-NVME definitions nvme-fabrics: Add FC transport error codes to nvme.h Add type 0x28 NVME type code to scsi fc headers nvme-fabrics: patch target code in prep for FC transport support nvme-fabrics: set sqe.command_id in core not transports parser: add u64 number parser nvme-rdma: align to generic ib_event logging helper ...
| * | block: improve handling of the magic discard payloadChristoph Hellwig2016-12-091-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of allocating a single unused biovec for discard requests, send them down without any payload. Instead we allow the driver to add a "special" payload using a biovec embedded into struct request (unioned over other fields never used while in the driver), and overloading the number of segments for this case. This has a couple of advantages: - we don't have to allocate the bio_vec - the amount of special casing for discard requests in the block layer is significantly reduced - using this same scheme for other request types is trivial, which will be important for implementing the new WRITE_ZEROES op on devices where it actually requires a payload (e.g. SCSI) - we can get rid of playing games with the request length, as we'll never touch it and completions will work just fine - it will allow us to support ranged discard operations in the future by merging non-contiguous discard bios into a single request - last but not least it removes a lot of code This patch is the common base for my WIP series for ranges discards and to remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES, so it would be good to get it in quickly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme-fabrics: set sqe.command_id in core not transportsJames Smart2016-12-061-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, core.c sets command_id only on rd/wr commands, leaving it to the transport to set it again to ensure the request had a command id. Move location of set in core so applies to all commands. Remove transport sets. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| * | nvme-rdma: align to generic ib_event logging helperMax Gurtovoy2016-12-061-1/+3
| | | | | | | | | | | | | | | | | | Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-rdma: remove redundant defineSagi Grimberg2016-12-061-1/+0
| | | | | | | | | | | | | | | Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| * | nvme-rdma: force queue size to respect controller capabilitySamuel Jones2016-12-061-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Queue size needs to respect the Maximum Queue Entries Supported advertised by the controller in its Capability register. Signed-off-by: Samuel Jones <sjones@kalray.eu> Reviewed-by: Christoph Hellwig <hch@lst.de> [sagig: fixed queue_size adjustment according to Daniel Verkamp <daniel.verkamp@intel.com> comment] Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| * | nvme: untangle 0 and BLK_MQ_RQ_QUEUE_OKOmar Sandoval2016-11-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Let's not depend on any of the BLK_MQ_RQ_QUEUE_* constants having specific values. No functional change. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme: don't pass the full CQE to nvme_complete_async_eventChristoph Hellwig2016-11-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only need the status and result fields, and passing them explicitly makes life a lot easier for the Fibre Channel transport which doesn't have a full CQE for the fast path case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme: introduce struct nvme_requestChristoph Hellwig2016-11-101-8/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a shared per-request structure for all NVMe I/O. This structure is embedded as the first member in all NVMe transport drivers request private data and allows to implement common functionality between the drivers. The first use is to replace the current abuse of the SCSI command passthrough fields in struct request for the NVMe command passthrough, but it will grow a field more fields to allow implementing things like common abort handlers in the future. The passthrough commands are handled by having a pointer to the SQE (struct nvme_command) in struct nvme_request, and the union of the possible result fields, which had to be turned from an anonymous into a named union for that purpose. This avoids having to pass a reference to a full CQE around and thus makes checking the result a lot more lightweight. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nvme-rdma: stop and free io queues on connect failureSteve Wise2016-11-141-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While testing nvme-rdma with the spdk nvmf target over iw_cxgb4, I configured the target (mistakenly) to generate an error creating the NVMF IO queues. This resulted a "Invalid SQE Parameter" error sent back to the host on the first IO queue connect: [ 9610.928182] nvme nvme1: queue_size 128 > ctrl maxcmd 120, clamping down [ 9610.938745] nvme nvme1: creating 32 I/O queues. So nvmf_connect_io_queue() returns an error to nvmf_connect_io_queue() / nvmf_connect_io_queues(), and that is returned to nvme_rdma_create_io_queues(). In the error path, nvmf_rdma_create_io_queues() frees the queue tagset memory _before_ stopping and freeing the IB queues, which causes yet another touch-after-free crash due to SQ CQEs being flushed after the ib_cqe structs pointed-to by the flushed WRs have been freed (since they are part of the nvme_rdma_request struct). The fix is to stop and free the queues in nvmf_connect_io_queues() if there is an error connecting any of the queues. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* | nvme-rdma: reject non-connect commands before the queue is liveChristoph Hellwig2016-11-141-1/+30
|/ | | | | | | | | | | | | | If we reconncect we might have command queue up that get resent as soon as the queue is restarted. But until the connect command succeeded we can't send other command. Add a new flag that marks a queue as live when connect finishes, and delay any non-connect command until the queue is live based on it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> [sagig: fixes admin queue LIVE setting] Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-blockLinus Torvalds2016-10-091-2/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull blk-mq irq/cpu mapping updates from Jens Axboe: "This is the block-irq topic branch for 4.9-rc. It's mostly from Christoph, and it allows drivers to specify their own mappings, and more importantly, to share the blk-mq mappings with the IRQ affinity mappings. It's a good step towards making this work better out of the box" * 'for-4.9/block-irq' of git://git.kernel.dk/linux-block: blk_mq: linux/blk-mq.h does not include all the headers it depends on blk-mq: kill unused blk_mq_create_mq_map() blk-mq: get rid of the cpumask in struct blk_mq_tags nvme: remove the post_scan callout nvme: switch to use pci_alloc_irq_vectors blk-mq: provide a default queue mapping for PCI device blk-mq: allow the driver to pass in a queue mapping blk-mq: remove ->map_queue blk-mq: only allocate a single mq_map per tag_set blk-mq: don't redistribute hardware queues on a CPU hotplug event
| * blk-mq: remove ->map_queueChristoph Hellwig2016-09-151-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | All drivers use the default, so provide an inline version of it. If we ever need other queue mapping we can add an optional method back, although supporting will also require major changes to the queue setup code. This provides better code generation, and better debugability as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | Merge tag 'for-linus' of ↵Linus Torvalds2016-10-091-20/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull main rdma updates from Doug Ledford: "This is the main pull request for the rdma stack this release. The code has been through 0day and I had it tagged for linux-next testing for a couple days. Summary: - updates to mlx5 - updates to mlx4 (two conflicts, both minor and easily resolved) - updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper resolution is to keep the code in cxgb4_main.c as it is in Linus' tree as attach_uld was refactored and moved into cxgb4_uld.c) - improvements to uAPI (moved vendor specific API elements to uAPI area) - add hns-roce driver and hns and hns-roce ACPI reset support - conversion of all rdma code away from deprecated create_singlethread_workqueue - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in staging)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits) staging/lustre: Disable InfiniBand support iw_cxgb4: add fast-path for small REG_MR operations cxgb4: advertise support for FR_NSMR_TPTE_WR IB/core: correctly handle rdma_rw_init_mrs() failure IB/srp: Fix infinite loop when FMR sg[0].offset != 0 IB/srp: Remove an unused argument IB/core: Improve ib_map_mr_sg() documentation IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets IB/mthca: Move user vendor structures IB/nes: Move user vendor structures IB/ocrdma: Move user vendor structures IB/mlx4: Move user vendor structures IB/cxgb4: Move user vendor structures IB/cxgb3: Move user vendor structures IB/mlx5: Move and decouple user vendor structures IB/{core,hw}: Add constant for node_desc ipoib: Make ipoib_warn ratelimited IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue IB/ipoib: Remove deprecated create_singlethread_workqueue ...
| * | nvme-rdma: use IB_PD_UNSAFE_GLOBAL_RKEYChristoph Hellwig2016-09-231-20/+5
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | IB/core: add support to create a unsafe global rkey to ib_create_pdChristoph Hellwig2016-09-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | | nvme-rdma: only clear queue flags after successful connectSagi Grimberg2016-09-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise, nvme_rdma_stop_and_clear_queue() will incorrectly try to stop/free rdma qps/cm_ids that are already freed. Fixes: e89ca58f9c90 ("nvme-rdma: add DELETING queue flag") Reported-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | nvme-rdma: fix null pointer dereference on req->mrColin Ian King2016-09-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there is an error on req->mr, req->mr is set to null, however the following statement sets req->mr->need_inval causing a null pointer dereference. Fix this by bailing out to label 'out' to immediately return and hence skip over the offending null pointer dereference. Fixes: f5b7b559e1488 ("nvme-rdma: Get rid of duplicate variable") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* | | nvme-rdma: use ib_client API to detect device removalSteve Wise2016-09-121-68/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change nvme-rdma to use the IB Client API to detect device removal. This has the wonderful benefit of being able to blow away all the ib/rdma_cm resources for the device being removed. No craziness about not destroying the cm_id handling the event. No deadlocks due to broken iw_cm/rdma_cm/iwarp dependencies. And no need to have a bound cm_id around during controller recovery/reconnect to catch device removal events. We don't use the device_add aspect of the ib_client service since we only want to create resources for an IB device if we have a target utilizing that device. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* | | nvme-rdma: add DELETING queue flagSagi Grimberg2016-09-121-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we get a surprise disconnect from the target we queue a periodic reconnect (which is the sane thing to do...). We only move the queues out of CONNECTED when we retry to reconnect (after 10 seconds in the default case) but we stop the blk queues immediately so we are not bothered with traffic from now on. If delete() is kicking off in this period the queues are still in CONNECTED state. Part of the delete sequence is trying to issue ctrl shutdown if the admin queue is CONNECTED (which it is!). This request is issued but stuck in blk-mq waiting for the queues to start again. This might be the one preventing us from forward progress... The patch separates the queue flags to CONNECTED and DELETING. Now we will move out of CONNECTED as soon as error recovery kicks in (before stopping the queues) and DELETING is on when we start the queue deletion. Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* | | nvme-rdma: destroy nvme queue rdma resources on connect failureSteve Wise2016-09-041-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After address resolution, the nvme_rdma_queue rdma resources are allocated. If rdma route resolution or the connect fails, or the controller reconnect times out and gives up, then the rdma resources need to be freed. Otherwise, rdma resources are leaked. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimbrg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* | | nvme_rdma: keep a ref on the ctrl during delete/flushSteve Wise2016-09-041-8/+18
| |/ |/| | | | | | | | | | | Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimbrg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* | nvme-rdma: Get rid of redundant definesSagi Grimberg2016-08-281-4/+0
| | | | | | | | | | Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
* | nvme-rdma: Get rid of duplicate variableSagi Grimberg2016-08-281-8/+7
| | | | | | | | | | | | | | | | We already have need_inval in ib_mr, lets use that instead. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
OpenPOWER on IntegriCloud