path: root/sys/dev/nvme/nvme_qpair.c
Commit message | Author | Age | Files | Lines
* nvme: do not pre-allocate MSI-X IRQ resources | jimharris | 2016-01-07 | 1 | -1/+2
| The issue referenced here was resolved by other changes in recent commits, so this code is no longer needed.
| MFC after: 3 days
| Sponsored by: Intel
* nvme: use BUS_SPACE_MAXSIZE for bus_dma_tag_create maxsize parameter | jimharris | 2015-04-09 | 1 | -1/+1
| This fixes i386 PAE build fallout from r281281.
| Reported by: bz
| MFC after: 1 week
* nvme: remove CHATHAM related code | jimharris | 2015-04-08 | 1 | -9/+0
| Chatham was an internal NVMe prototype board used for early driver development.
| MFC after: 1 week
| Sponsored by: Intel
* nvme: create separate DMA tag for non-payload DMA buffers | jimharris | 2015-04-08 | 1 | -9/+38
| Submission and completion queue memory need to use a separate DMA tag for mappings than payload buffers, to ensure mappings remain contiguous even with DMAR enabled.
| Submitted by: kib
| MFC after: 1 week
| Sponsored by: Intel
* nvme: Allocate all MSI resources up front so that we can fall back to | jimharris | 2014-03-18 | 1 | -4/+2
| INTx if necessary.
| Sponsored by: Intel
| MFC after: 3 days
* nvme: NVMe specification dictates 4-byte alignment for PRPs (not 8). | jimharris | 2014-03-17 | 1 | -1/+2
| Sponsored by: Intel
| MFC after: 3 days
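As a rough illustration of the 4-byte PRP alignment rule named in the commit above, the check can be sketched as a mask test. The macro and helper names here are hypothetical and are not taken from the driver, which expresses the alignment through its DMA tag rather than through a function like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: PRP addresses need 4-byte alignment. */
#define PRP_ALIGNMENT 4u

/* Hypothetical helper: true if addr satisfies the 4-byte PRP alignment. */
static bool
prp_addr_is_aligned(uint64_t addr)
{
	/* The mask trick below only works for power-of-two alignments. */
	assert((PRP_ALIGNMENT & (PRP_ALIGNMENT - 1)) == 0);
	return ((addr & (PRP_ALIGNMENT - 1)) == 0);
}
```

An address such as 0x1004 passes this test even though it would fail an 8-byte alignment check, which is the difference the commit subject points at.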
* Update copyright dates. | jimharris | 2013-07-09 | 1 | -1/+1
| MFC after: 3 days
* Remove remaining uio-related code. | jimharris | 2013-06-26 | 1 | -16/+0
| The nvme_physio() function was removed quite a while ago, which was the only user of this uio-related code.
| Sponsored by: Intel
| MFC after: 3 days
* Fail any passthrough command whose transfer size exceeds the controller's | jimharris | 2013-06-26 | 1 | -0/+7
| max transfer size. This guards against rogue commands coming in from userspace.
| Also add KASSERTS for the virtual address and unmapped bio cases, if the transfer size exceeds the controller's max transfer size.
| Sponsored by: Intel
| MFC after: 3 days
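The guard described in the commit above can be sketched as a simple length check made before a user-supplied command is submitted. The struct, the field name, and the choice of EIO as the error code are assumptions for illustration only; the commit message does not say which error is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for controller state; not the driver's struct. */
struct xfer_ctrlr_sketch {
	uint32_t max_xfer_size;	/* bytes; assumed to be set at attach time */
};

/*
 * Returns 0 if a user-supplied transfer length is acceptable, or EIO
 * (an assumed error code) if it exceeds the controller's limit and the
 * command should be failed without being submitted.
 */
static int
passthrough_check_len(const struct xfer_ctrlr_sketch *ctrlr, uint32_t len)
{
	assert(ctrlr != NULL);
	if (len > ctrlr->max_xfer_size)
		return (EIO);
	return (0);
}
```

A transfer exactly equal to the limit is accepted in this sketch; only lengths strictly greater than the limit are refused.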
* Use MAXPHYS to specify the maximum I/O size for nvme(4). | jimharris | 2013-06-26 | 1 | -4/+3
| Also allow admin commands to transfer up to this maximum I/O size, rather than the artificial limit previously imposed. The larger I/O size is very beneficial for upcoming firmware download support. This has the added benefit of simplifying the code since both admin and I/O commands now use the same maximum I/O size.
| Sponsored by: Intel
| MFC after: 3 days
* Move the busdma mapping functions to nvme_qpair.c. | jimharris | 2013-04-12 | 1 | -0/+45
| This removes nvme_uio.c completely.
| Sponsored by: Intel
* Do not panic when a busdma mapping operation fails. | jimharris | 2013-04-12 | 1 | -4/+21
| Instead, print an error message and fail the associated command with DATA_TRANSFER_ERROR NVMe completion status.
| Sponsored by: Intel
* Add unmapped bio support to nvme(4) and nvd(4). | jimharris | 2013-04-01 | 1 | -0/+8
| Sponsored by: Intel
* Add "type" to nvme_request, signifying if its payload is a VADDR, UIO, or | jimharris | 2013-03-29 | 1 | -15/+19
| NULL. This simplifies decisions around if/how requests are routed through busdma. It also paves the way for supporting unmapped bios.
| Sponsored by: Intel
* Fix printf format issue on i386. | jimharris | 2013-03-27 | 1 | -2/+3
| Reported by: bz
* Clean up debug prints. | jimharris | 2013-03-26 | 1 | -16/+223
| 1) Consistently use device_printf.
| 2) Make dump_completion and dump_command into something more human-readable.
| Sponsored by: Intel
| Reviewed by: carl
* Change a number of malloc(9) calls to use M_WAITOK instead of | jimharris | 2013-03-26 | 1 | -11/+4
| M_NOWAIT.
| Sponsored by: Intel
| Suggested by: carl
| Reviewed by: carl
* Abort and do not retry any outstanding admin commands left over after | jimharris | 2013-03-26 | 1 | -0/+15
| a controller reset.
| Sponsored by: Intel
| Reviewed by: carl
* Add the ability to internally mark a controller as failed, if it is unable to | jimharris | 2013-03-26 | 1 | -11/+90
| start or reset. Also add a notifier for NVMe consumers for controller fail conditions and plumb this notifier for nvd(4) to destroy the associated GEOM disks when a failure occurs.
| This requires a bit of work to cover the races when a consumer is sending I/O requests to a controller that is transitioning to the failed state. To help cover this condition, add a task to defer completion of I/Os submitted to a failed controller, so that the consumer will still always receive its completions in a different context than the submission.
| Sponsored by: Intel
| Reviewed by: carl
* Just disable the controller instead of deleting IO queues during detach. | jimharris | 2013-03-26 | 1 | -59/+16
| This is just as effective, and removes the need for a bunch of admin commands to a controller that's going to be disabled shortly anyways.
| Sponsored by: Intel
| Reviewed by: carl
* Cap the number of retry attempts to a configurable number. This ensures | jimharris | 2013-03-26 | 1 | -10/+24
| that if a specific I/O repeatedly times out, we don't retry it indefinitely.
| The default number of retries will be 4, but is adjusted using hw.nvme.retry_count.
| Sponsored by: Intel
| Reviewed by: carl
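The retry cap described above amounts to a per-request counter compared against a configurable limit. A minimal sketch follows, assuming the limit lives in a plain variable; in the driver it is set through hw.nvme.retry_count, and the struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the tunable; 4 is the default named in the commit. */
static uint32_t retry_count_limit = 4;

/* Hypothetical request; only the retry counter matters here. */
struct retry_req_sketch {
	uint32_t retries;	/* times this request has been resubmitted */
};

/*
 * Returns true and bumps the counter if the request may be retried,
 * or false once the configured limit has been reached.
 */
static bool
request_should_retry(struct retry_req_sketch *req)
{
	assert(req != NULL);
	if (req->retries >= retry_count_limit)
		return (false);
	req->retries++;
	return (true);
}
```

With the default limit, a request that keeps timing out is resubmitted four times and then failed back to its consumer instead of looping forever.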
* Create struct nvme_status. | jimharris | 2013-03-26 | 1 | -13/+6
| NVMe error log entries include status, so breaking this out into its own data structure allows it to be included in both the nvme_completion data structure as well as error log entry data structures.
| While here, expose nvme_completion_is_error(), and change all of the places that were explicitly looking at sc/sct bits to use this macro instead.
| Sponsored by: Intel
| Reviewed by: carl
* Make nvme_ctrlr_reset a nop if a reset is already in progress. | jimharris | 2013-03-26 | 1 | -4/+15
| This protects against cases where a controller crashes with multiple I/O outstanding, each timing out and requesting controller resets simultaneously.
| While here, remove a debugging printf from a previous commit, and add more logging around I/O that need to be resubmitted after a controller reset.
| Sponsored by: Intel
| Reviewed by: carl
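One common way to make a reset request a no-op while another is in flight is an atomic compare-and-swap on a flag. The sketch below uses C11 atomics to show that idea; it is an assumption about the general technique, not a copy of the driver's code, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical controller state for this sketch. */
struct reset_ctrlr_sketch {
	atomic_int is_resetting;	/* 0 = idle, 1 = reset in progress */
	int resets_started;		/* counts resets that actually ran */
};

/*
 * Try to begin a reset. Only the caller that flips the flag from 0 to 1
 * proceeds; concurrent callers see the flag already set and return false.
 */
static bool
ctrlr_reset_begin(struct reset_ctrlr_sketch *c)
{
	int expected = 0;

	assert(c != NULL);
	if (!atomic_compare_exchange_strong(&c->is_resetting, &expected, 1))
		return (false);
	c->resets_started++;
	return (true);
}

/* Clear the flag once the reset has finished. */
static void
ctrlr_reset_done(struct reset_ctrlr_sketch *c)
{
	atomic_store(&c->is_resetting, 0);
}
```

If several timed-out I/Os each call ctrlr_reset_begin(), exactly one of them starts the reset and the rest return immediately.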
* By default, always escalate to controller reset when an I/O times out. | jimharris | 2013-03-26 | 1 | -11/+10
| While aborts are typically cleaner than a full controller reset, many times an I/O timeout indicates other controller-level issues where aborts may not work. NVMe drivers for other operating systems are also defaulting to controller reset rather than aborts for timed out I/O.
| Sponsored by: Intel
| Reviewed by: carl
* Add a tunable for the I/O timeout interval. Default is still 30 seconds, | jimharris | 2013-03-26 | 1 | -4/+7
| but can be adjusted between a min/max of 5 and 120 seconds.
| Sponsored by: Intel
| Reviewed by: carl
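The bounds in the commit above (default 30 seconds, minimum 5, maximum 120) can be sketched as a small validating setter. Whether out-of-range values are rejected or clamped is not stated in the commit message; this sketch assumes they are rejected with EINVAL, and the names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Limits taken from the commit message, in seconds. */
#define TIMEOUT_PERIOD_MIN	5u
#define TIMEOUT_PERIOD_MAX	120u
#define TIMEOUT_PERIOD_DEFAULT	30u

/*
 * Store a new timeout if it lies in [MIN, MAX]. Out-of-range values leave
 * the current setting untouched and return EINVAL (assumed behavior).
 */
static int
set_timeout_period(uint32_t *period, uint32_t requested)
{
	assert(period != NULL);
	if (requested < TIMEOUT_PERIOD_MIN || requested > TIMEOUT_PERIOD_MAX)
		return (EINVAL);
	*period = requested;
	return (0);
}
```

Rejecting rather than clamping keeps a mistyped value from silently changing the timeout to something the administrator did not ask for.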
* Add handling for controller fatal status (csts.cfs). | jimharris | 2013-03-26 | 1 | -2/+18
| On any I/O timeout, check for csts.cfs==1. If set, the controller is reporting fatal status and we reset the controller immediately, rather than trying to abort the timed out command.
| This changeset also includes deferring the controller start portion of the reset to a separate task. This ensures we are always performing a controller start operation from a consistent context.
| Sponsored by: Intel
| Reviewed by: carl
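The timeout decision in this commit can be sketched as a test of the CFS bit, which is bit 1 of the CSTS register in the NVMe specification. The enum and function names are hypothetical, and the sketch covers only the decision, not the register read or the reset itself.

```c
#include <assert.h>
#include <stdint.h>

/* CSTS.CFS (Controller Fatal Status) is bit 1 of the CSTS register. */
#define CSTS_CFS_MASK	(1u << 1)

/* Hypothetical result type for this sketch. */
enum timeout_action {
	TIMEOUT_ACTION_ABORT,	/* try to abort the timed-out command */
	TIMEOUT_ACTION_RESET	/* reset the whole controller */
};

/* Pick the recovery action for an I/O timeout from a CSTS snapshot. */
static enum timeout_action
timeout_action_for(uint32_t csts)
{
	if ((csts & CSTS_CFS_MASK) != 0)
		return (TIMEOUT_ACTION_RESET);
	return (TIMEOUT_ACTION_ABORT);
}
```

The commit directly above this one in the log (the newer "By default, always escalate to controller reset" entry) changes the default so that a reset is chosen even when CFS is clear.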
* Add controller reset capability to nvme(4) and ability to explicitly | jimharris | 2013-03-26 | 1 | -68/+139
| invoke it from nvmecontrol(8).
| Controller reset will be performed in cases where I/O are repeatedly timing out, the controller reports an unrecoverable condition, or when explicitly requested via IOCTL or an nvme consumer. Since the controller may be in such a state where it cannot even process queue deletion requests, we will perform a controller reset without trying to clean up anything on the controller first.
| Sponsored by: Intel
| Reviewed by: carl
* Keep a doubly-linked list of outstanding trackers. | jimharris | 2013-03-26 | 1 | -8/+11
| This enables in-order re-submission of I/O after a controller reset.
| Sponsored by: Intel
* Enable asynchronous event requests on non-Chatham devices. | jimharris | 2013-03-26 | 1 | -8/+54
| Also add logic to clean up all outstanding asynchronous event requests when resetting or shutting down the controller, since these requests will not be explicitly completed by the controller itself.
| Sponsored by: Intel
* Specify command timeout interval on a per-command type basis. | jimharris | 2013-03-26 | 1 | -3/+4
| This is primarily driven by the need to disable timeouts for asynchronous event requests, which by nature should not be timed out.
| Sponsored by: Intel
* Explicitly abort a timed out command, if the ABORT command sent to the | jimharris | 2013-03-26 | 1 | -1/+29
| controller indicates the command was not found.
| Sponsored by: Intel
* Break out the code for completing an nvme_tracker object into a separate | jimharris | 2013-03-26 | 1 | -43/+59
| function.
| This allows for completions outside the normal completion path, for example when an ABORT command fails due to the controller reporting the targeted command does not exist. This is mainly for protection against a faulty controller, but we need to clean up our internal request nonetheless.
| Sponsored by: Intel
* Add support for ABORT commands, including issuing these commands when | jimharris | 2013-03-26 | 1 | -5/+6
| an I/O times out.
| Also ensure that we retry commands that are aborted due to a timeout.
| Sponsored by: Intel
* Add an internal _nvme_qpair_submit_request function, which performs | jimharris | 2013-03-26 | 1 | -6/+15
| the submit action assuming the qpair lock has already been acquired.
| Also change nvme_qpair_submit_request to just lock/unlock the mutex around a call to this new function.
| This fixes a recursive mutex acquisition in the retry path.
| Sponsored by: Intel
* Use callout_reset_curcpu to allow the callout to be handled by the | jimharris | 2012-10-31 | 1 | -0/+5
| current CPU and not always CPU 0.
| This has the added benefit of reducing a huge amount of spinlock contention on the callout_cpu spinlock for CPU 0.
| Sponsored by: Intel
* Fix build after r241659. | glebius | 2012-10-18 | 1 | -1/+1
* Add ability to queue nvme_request objects if no nvme_trackers are available. | jimharris | 2012-10-18 | 1 | -9/+29
| This eliminates the need to manage queue depth at the nvd(4) level for Chatham prototype board workarounds, and also adds the ability to accept a number of requests on a single qpair that is much larger than the number of trackers allocated.
| Sponsored by: Intel
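The queueing behavior described above can be sketched with a free-tracker count and a FIFO of waiting requests: a request is submitted if a tracker is free, otherwise it is parked and submitted when a completion frees a tracker. All types and names below are hypothetical simplifications; the sketch does not model locking or the real tracker objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request with an intrusive FIFO link. */
struct q_req_sketch {
	struct q_req_sketch *next;
};

/* Hypothetical queue pair: a tracker budget plus a FIFO of waiters. */
struct q_qpair_sketch {
	int free_trackers;	/* trackers not bound to a request */
	int submitted;		/* requests handed to "hardware" so far */
	struct q_req_sketch *q_head, *q_tail;
};

/* Bind a tracker to the request; stands in for the real submission. */
static void
qpair_hw_submit(struct q_qpair_sketch *qp, struct q_req_sketch *req)
{
	(void)req;
	assert(qp->free_trackers > 0);
	qp->free_trackers--;
	qp->submitted++;
}

/* Submit now if a tracker is free, otherwise append to the FIFO. */
static void
qpair_submit(struct q_qpair_sketch *qp, struct q_req_sketch *req)
{
	if (qp->free_trackers > 0) {
		qpair_hw_submit(qp, req);
		return;
	}
	req->next = NULL;
	if (qp->q_tail != NULL)
		qp->q_tail->next = req;
	else
		qp->q_head = req;
	qp->q_tail = req;
}

/* A completion frees one tracker and submits the oldest waiter, if any. */
static void
qpair_complete_one(struct q_qpair_sketch *qp)
{
	struct q_req_sketch *req = qp->q_head;

	qp->free_trackers++;
	if (req == NULL)
		return;
	qp->q_head = req->next;
	if (qp->q_head == NULL)
		qp->q_tail = NULL;
	qpair_hw_submit(qp, req);
}
```

Because waiters are kept in arrival order, a qpair can accept far more requests than it has trackers without reordering them.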
* Preallocate a limited number of nvme_tracker objects per qpair, rather | jimharris | 2012-10-18 | 1 | -49/+40
| than dynamically creating them at runtime.
| Sponsored by: Intel
* Create nvme_qpair_submit_request() which eliminates all of the code | jimharris | 2012-10-18 | 1 | -0/+32
| duplication between the admin and io controller-level submit functions.
| Sponsored by: Intel
* Simplify how the qpair lock is acquired and released. | jimharris | 2012-10-18 | 1 | -9/+2
| Sponsored by: Intel
* Cleanup uio-related code to use struct nvme_request and | jimharris | 2012-10-18 | 1 | -1/+1
| nvme_ctrlr_submit_io_request().
| While here, also fix case where a uio may have more than 1 iovec. NVMe's definition of SGEs (called PRPs) only allows for the first SGE to start on a non-page boundary. The simplest way to handle this is to construct a temporary uio for each iovec, and submit an NVMe request for each.
| Sponsored by: Intel
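The PRP constraint mentioned above (only the first entry may start off a page boundary) can be illustrated with two small helpers: one that counts how many pages a buffer touches, and one that tests whether a segment could legally appear after the first PRP. The 4 KiB page size and all names are assumptions for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory page size for this sketch. */
#define PRP_PAGE_SIZE	4096u

/*
 * Number of PRP entries needed for a contiguous buffer: one per page
 * touched. An unaligned start can cost one extra entry.
 */
static uint32_t
prp_entries_needed(uint64_t addr, uint32_t len)
{
	uint64_t first, last;

	assert(len > 0);
	first = addr / PRP_PAGE_SIZE;
	last = (addr + len - 1) / PRP_PAGE_SIZE;
	return ((uint32_t)(last - first + 1));
}

/*
 * A segment can follow the first PRP in the same command only if it
 * starts on a page boundary; otherwise it needs its own request.
 */
static bool
segment_can_follow(uint64_t addr)
{
	return ((addr % PRP_PAGE_SIZE) == 0);
}
```

A second iovec that starts mid-page fails segment_can_follow(), which is why the commit submits one request per iovec instead of merging them.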
* Add nvme_ctrlr_submit_[admin|io]_request functions which consolidates | jimharris | 2012-10-18 | 1 | -0/+1
| code for allocating nvme_tracker objects and making calls into bus_dmamap_load for commands which have payloads.
| Sponsored by: Intel
* Add struct nvme_request object which contains all of the parameters passed | jimharris | 2012-10-18 | 1 | -6/+12
| from an NVMe consumer.
| This allows us to mostly build NVMe command buffers without holding the qpair lock, and also allows for future queueing of nvme_request objects in cases where the submission queue is full and no nvme_tracker objects are available.
| Sponsored by: Intel
* Merge struct nvme_prp_list into struct nvme_tracker. | jimharris | 2012-10-18 | 1 | -42/+11
| This simplifies the driver significantly where it is constructing commands to be submitted to hardware. By reducing the number of PRPs (NVMe parlance for SGE) from 128 to 32, it ensures we do not allocate too much memory for more common smaller I/O sizes, while still supporting up to 128KB I/O sizes.
| This also paves the way for pre-allocation of nvme_tracker objects for each queue which will simplify the I/O path even further.
| Sponsored by: Intel
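The 128KB figure in the commit above follows from simple arithmetic if each PRP entry maps one 4 KiB page: 32 entries cover 32 x 4096 = 131072 bytes. The sketch below works that out, along with the memory a PRP list occupies at 8 bytes per entry. The page size and the page-aligned buffer are assumptions, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size; each PRP entry maps at most one such page. */
#define PRP_PAGE_BYTES	4096u

/* Largest page-aligned transfer that nprps PRP entries can describe. */
static uint32_t
prp_max_xfer_bytes(uint32_t nprps)
{
	return (nprps * PRP_PAGE_BYTES);
}

/* Memory consumed by a PRP list; each entry is a 64-bit address. */
static uint32_t
prp_list_bytes(uint32_t nprps)
{
	return (nprps * (uint32_t)sizeof(uint64_t));
}
```

Under these assumptions, dropping from 128 to 32 entries shrinks each list from 1024 to 256 bytes while still covering a 128 KiB transfer.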
* Add return codes to all functions used for submitting commands to I/O | jimharris | 2012-10-18 | 1 | -1/+15
| queues.
| Sponsored by: Intel
* Count number of times each queue pair's interrupt handler is invoked. | jimharris | 2012-10-10 | 1 | -0/+3
| Also add sysctls to query and reset each queue pair's stats, including the new count added here.
| Sponsored by: Intel
* This is the first of several commits which will add NVM Express (NVMe) | jimharris | 2012-09-17 | 1 | -0/+422
| support to FreeBSD. A full description of the overall functionality being added is below. nvmexpress.org defines NVM Express as "an optimized register interface, command set and feature set for PCI Express (PCIe)-based Solid-State Drives (SSDs)."
| This commit adds nvme(4) and nvd(4) driver source code and Makefiles to the tree.
| Full NVMe functionality description:
| Add nvme(4) and nvd(4) drivers and nvmecontrol(8) for NVM Express (NVMe) device support.
| There will continue to be ongoing work on NVM Express support, but there is more than enough to allow for evaluation of pre-production NVM Express devices as well as soliciting feedback. Questions and feedback are welcome.
| nvme(4) implements NVMe hardware abstraction and is a provider of NVMe namespaces. The closest equivalent of an NVMe namespace is a SCSI LUN. nvd(4) is an NVMe consumer, surfacing NVMe namespaces as GEOM disks. nvmecontrol(8) is used for NVMe configuration and management.
| The following are currently supported:
| nvme(4)
|  - full mandatory NVM command set support
|  - per-CPU IO queues (enabled by default but configurable)
|  - per-queue sysctls for statistics and full command/completion queue dumps for debugging
|  - registration API for NVMe namespace consumers
|  - I/O error handling (except for timeouts; see below)
|  - compilation switches for support back to stable-7
| nvd(4)
|  - BIO_DELETE and BIO_FLUSH (if supported by controller)
|  - proper BIO_ORDERED handling
| nvmecontrol(8)
|  - devlist: list NVMe controllers and their namespaces
|  - identify: display controller or namespace identify data in human-readable or hex format
|  - perftest: quick and dirty performance test to measure raw performance of NVMe device without userspace/physio/GEOM overhead
| The following are still work in progress and will be completed over the next 3-6 months in rough priority order:
|  - complete man pages
|  - firmware download and activation
|  - asynchronous error requests
|  - command timeout error handling
|  - controller resets
|  - nvmecontrol(8) log page retrieval
| This has been primarily tested on amd64, with light testing on i386. I would be happy to provide assistance to anyone interested in porting this to other architectures, but am not currently planning to do this work myself. Big-endian and dmamap sync for command/completion queues are the main areas that would need to be addressed.
| The nvme(4) driver currently has references to Chatham, which is an Intel-developed prototype board which is not fully spec compliant. These references will all be removed over time.
| Sponsored by: Intel
| Contributions from: Joe Golio/EMC <joseph dot golio at emc dot com>