| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
The issue referenced here was resolved by other changes
in recent commits, so this code is no longer needed.
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
|
|
| |
This fixes i386 PAE build fallout from r281281.
Reported by: bz
MFC after: 1 week
|
|
|
|
|
|
|
|
| |
Chatham was an internal NVMe prototype board used for
early driver development.
MFC after: 1 week
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
| |
Submission and completion queue memory needs to use a
different DMA tag for its mappings than payload buffers do,
to ensure the queue mappings remain contiguous even with DMAR
enabled.
Submitted by: kib
MFC after: 1 week
Sponsored by: Intel
|
|
|
|
|
|
|
| |
INTx if necessary.
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
| |
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
| |
MFC after: 3 days
|
|
|
|
|
|
|
|
| |
The nvme_physio() function, which was the only user of this
uio-related code, was removed quite a while ago.
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
| |
max transfer size. This guards against rogue commands coming in from
userspace.
Also add KASSERTs for the virtual address and unmapped bio cases that fire
if the transfer size exceeds the controller's max transfer size.
Sponsored by: Intel
MFC after: 3 days
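
The size guard described in this message can be illustrated with a minimal sketch. The helper name `check_transfer_len`, its parameters, and the `EINVAL` return value are assumptions made for illustration; they are not taken from the driver.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the driver): reject a user-supplied
 * transfer length that exceeds the controller's maximum transfer size.
 * The EINVAL return value is an assumption made for this sketch.
 */
static int
check_transfer_len(uint32_t len, uint32_t max_xfer_size)
{
	if (len > max_xfer_size)
		return (EINVAL);
	return (0);
}
```

A caller would run this check before building the command, so an oversized request from userspace never reaches the hardware.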
|
|
|
|
|
|
|
|
|
|
|
| |
Also allow admin commands to transfer up to this maximum I/O size, rather
than the artificial limit previously imposed. The larger I/O size is very
beneficial for upcoming firmware download support. This has the added
benefit of simplifying the code since both admin and I/O commands now use
the same maximum I/O size.
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
|
| |
This removes nvme_uio.c completely.
Sponsored by: Intel
|
|
|
|
|
|
|
| |
Instead, print an error message and fail the associated command with
DATA_TRANSFER_ERROR NVMe completion status.
Sponsored by: Intel
|
|
|
|
| |
Sponsored by: Intel
|
|
|
|
|
|
|
| |
NULL. This simplifies decisions around if/how requests are routed through
busdma. It also paves the way for supporting unmapped bios.
Sponsored by: Intel
|
|
|
|
| |
Reported by: bz
|
|
|
|
|
|
|
|
|
| |
1) Consistently use device_printf.
2) Make dump_completion and dump_command into something more
human-readable.
Sponsored by: Intel
Reviewed by: carl
|
|
|
|
|
|
|
|
| |
M_NOWAIT.
Sponsored by: Intel
Suggested by: carl
Reviewed by: carl
|
|
|
|
|
|
|
| |
a controller reset.
Sponsored by: Intel
Reviewed by: carl
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
start or reset. Also add a notifier for NVMe consumers for controller fail
conditions and plumb this notifier for nvd(4) to destroy the associated
GEOM disks when a failure occurs.
This requires a bit of work to cover the races when a consumer is sending
I/O requests to a controller that is transitioning to the failed state. To
help cover this condition, add a task to defer completion of I/Os submitted
to a failed controller, so that the consumer will still always receive its
completions in a different context than the submission.
Sponsored by: Intel
Reviewed by: carl
|
|
|
|
|
|
|
|
| |
This is just as effective, and removes the need for a bunch of admin commands
to a controller that's going to be disabled shortly anyway.
Sponsored by: Intel
Reviewed by: carl
|
|
|
|
|
|
|
|
|
| |
that if a specific I/O repeatedly times out, we don't retry it indefinitely.
The default number of retries is 4, but it can be adjusted using hw.nvme.retry_count.
Sponsored by: Intel
Reviewed by: carl
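
A bounded retry policy of this kind might look like the sketch below. The struct and function names are hypothetical; only the default of 4 comes from the message above, and how the driver actually tracks retries per request is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_RETRY_COUNT 4	/* default stated in the commit message */

/* Hypothetical request record; only the retry counter matters here. */
struct sketch_request {
	uint32_t retries;
};

/*
 * Return true and count the attempt if the request may be retried,
 * false once it has used up its retry budget.
 */
static bool
should_retry(struct sketch_request *req, uint32_t retry_count)
{
	if (req->retries < retry_count) {
		req->retries++;
		return (true);
	}
	return (false);
}
```

With the default budget, a request that keeps timing out is resubmitted four times and then failed back to the consumer instead of looping forever.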
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NVMe error log entries include status, so breaking this out into
its own data structure allows it to be included in both the
nvme_completion data structure as well as error log entry data
structures.
While here, expose nvme_completion_is_error(), and change all of
the places that were explicitly looking at sc/sct bits to use this
macro instead.
Sponsored by: Intel
Reviewed by: carl
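
The idea of a shared status structure can be sketched as follows. All names carry a `sketch_` prefix to mark them as illustrative, and the `unsigned int` bitfields show the field widths of the NVMe status word without claiming to match the driver's in-memory layout.

```c
#include <stdbool.h>

/* Status fields, broken out so several structures can embed them. */
struct sketch_status {
	unsigned int p    : 1;	/* phase tag */
	unsigned int sc   : 8;	/* status code */
	unsigned int sct  : 3;	/* status code type */
	unsigned int rsvd : 2;	/* reserved */
	unsigned int m    : 1;	/* more */
	unsigned int dnr  : 1;	/* do not retry */
};

/* Both a completion and an error log entry can embed the same status. */
struct sketch_completion {
	unsigned int		cdw0;
	struct sketch_status	status;
};

struct sketch_error_log_entry {
	struct sketch_status	status;
};

/* A completion is an error if either sc or sct is nonzero. */
#define sketch_completion_is_error(cpl) \
	((cpl)->status.sc != 0 || (cpl)->status.sct != 0)
```

Callers then test one macro instead of open-coding comparisons against the sc/sct bits at every call site.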
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This protects against cases where a controller crashes with multiple
I/O outstanding, each timing out and requesting controller resets
simultaneously.
While here, remove a debugging printf from a previous commit, and add
more logging around I/O that need to be resubmitted after a controller
reset.
Sponsored by: Intel
Reviewed by: carl
|
|
|
|
|
|
|
|
|
|
| |
While aborts are typically cleaner than a full controller reset, many times
an I/O timeout indicates other controller-level issues where aborts may not
work. NVMe drivers for other operating systems are also defaulting to
controller reset rather than aborts for timed out I/O.
Sponsored by: Intel
Reviewed by: carl
|
|
|
|
|
|
|
| |
but can be adjusted between a min/max of 5 and 120 seconds.
Sponsored by: Intel
Reviewed by: carl
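
A range check for the 5 to 120 second window could be sketched as below. The names are hypothetical, and whether the driver rejects or clamps out-of-range values is not stated in the message, so this sketch only reports validity.

```c
#include <stdbool.h>

#define SKETCH_MIN_TIMEOUT_PERIOD	5	/* seconds, from the message */
#define SKETCH_MAX_TIMEOUT_PERIOD	120	/* seconds, from the message */

/* Return true if the requested timeout lies in the allowed window. */
static bool
timeout_period_is_valid(int seconds)
{
	return (seconds >= SKETCH_MIN_TIMEOUT_PERIOD &&
	    seconds <= SKETCH_MAX_TIMEOUT_PERIOD);
}
```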
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On any I/O timeout, check for csts.cfs==1. If set, the controller
is reporting fatal status and we reset the controller immediately,
rather than trying to abort the timed out command.
This changeset also includes deferring the controller start portion
of the reset to a separate task. This ensures we are always performing
a controller start operation from a consistent context.
Sponsored by: Intel
Reviewed by: carl
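
The timeout decision described here can be sketched as a small pure function. The enum and function names are hypothetical; the only hardware fact used is that CFS (Controller Fatal Status) is bit 1 of the CSTS register.

```c
#include <stdint.h>

#define SKETCH_CSTS_CFS	(1u << 1)	/* Controller Fatal Status bit */

enum sketch_timeout_action {
	SKETCH_ABORT_COMMAND,		/* try to abort the timed out command */
	SKETCH_RESET_CONTROLLER		/* controller reports fatal status */
};

/* Decide how to handle an I/O timeout given the current CSTS value. */
static enum sketch_timeout_action
on_io_timeout(uint32_t csts)
{
	if ((csts & SKETCH_CSTS_CFS) != 0)
		return (SKETCH_RESET_CONTROLLER);
	return (SKETCH_ABORT_COMMAND);
}
```

In a real driver the `csts` value would come from a register read at timeout time; here it is passed in so the decision can be exercised in isolation.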
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
invoke it from nvmecontrol(8).
Controller reset will be performed in cases where I/O are repeatedly
timing out, the controller reports an unrecoverable condition, or
when explicitly requested via IOCTL or an nvme consumer. Since the
controller may be in such a state where it cannot even process queue
deletion requests, we will perform a controller reset without trying
to clean up anything on the controller first.
Sponsored by: Intel
Reviewed by: carl
|
|
|
|
|
|
| |
This enables in-order re-submission of I/O after a controller reset.
Sponsored by: Intel
|
|
|
|
|
|
|
|
| |
Also add logic to clean up all outstanding asynchronous event requests
when resetting or shutting down the controller, since these requests
will not be explicitly completed by the controller itself.
Sponsored by: Intel
|
|
|
|
|
|
|
| |
This is primarily driven by the need to disable timeouts for asynchronous
event requests, which by nature should not be timed out.
Sponsored by: Intel
|
|
|
|
|
|
| |
controller indicates the command was not found.
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
| |
function.
This allows for completions outside the normal completion path, for example
when an ABORT command fails due to the controller reporting the targeted
command does not exist. This is mainly for protection against a faulty
controller, but we need to clean up our internal request nonetheless.
Sponsored by: Intel
|
|
|
|
|
|
|
|
| |
an I/O times out.
Also ensure that we retry commands that are aborted due to a timeout.
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
| |
the submit action assuming the qpair lock has already been acquired.
Also change nvme_qpair_submit_request to just lock/unlock the mutex
around a call to this new function.
This fixes a recursive mutex acquisition in the retry path.
Sponsored by: Intel
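
The locked/unlocked split described above is a common pattern, sketched here with a toy non-recursive lock that asserts on double acquisition. Every name in this block is hypothetical, and the toy lock merely stands in for the qpair mutex.

```c
#include <assert.h>

/* Toy queue pair: a non-recursive lock flag plus a submission counter. */
struct sketch_qpair {
	int lock_held;
	int submitted;
};

static void
sketch_lock(struct sketch_qpair *q)
{
	assert(!q->lock_held);	/* recursive acquisition would trip here */
	q->lock_held = 1;
}

static void
sketch_unlock(struct sketch_qpair *q)
{
	assert(q->lock_held);
	q->lock_held = 0;
}

/* Does the real work; the caller must already hold the lock. */
static void
submit_request_locked(struct sketch_qpair *q)
{
	assert(q->lock_held);
	q->submitted++;
}

/* Public entry point: just lock, submit, unlock. */
static void
submit_request(struct sketch_qpair *q)
{
	sketch_lock(q);
	submit_request_locked(q);
	sketch_unlock(q);
}

/* Retry path: runs with the lock held, so it uses the locked variant. */
static void
complete_and_retry(struct sketch_qpair *q)
{
	sketch_lock(q);
	submit_request_locked(q);	/* submit_request() would recurse */
	sketch_unlock(q);
}
```

Calling `submit_request()` from `complete_and_retry()` would hit the assertion in `sketch_lock()`, which is the recursive acquisition this change avoids.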
|
|
|
|
|
|
|
|
|
| |
current CPU and not always CPU 0.
This has the added benefit of reducing a huge amount of spinlock
contention on the callout_cpu spinlock for CPU 0.
Sponsored by: Intel
|
| |
|
|
|
|
|
|
|
|
|
| |
This eliminates the need to manage queue depth at the nvd(4) level for
Chatham prototype board workarounds, and also adds the ability to
accept a number of requests on a single qpair that is much larger
than the number of trackers allocated.
Sponsored by: Intel
|
|
|
|
|
|
| |
than dynamically creating them at runtime.
Sponsored by: Intel
|
|
|
|
|
|
|
| |
duplication between the admin and io controller-level submit
functions.
Sponsored by: Intel
|
|
|
|
| |
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
|
| |
nvme_ctrlr_submit_io_request().
While here, also fix case where a uio may have more than 1 iovec.
NVMe's definition of SGEs (called PRPs) only allows for the first SGE to
start on a non-page boundary. The simplest way to handle this is to
construct a temporary uio for each iovec, and submit an NVMe request
for each.
Sponsored by: Intel
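
The alignment rule stated above (only the first SGE may start off a page boundary) can be checked with a short sketch. The function name and the idea of passing segment start addresses as an array are assumptions for illustration; the sketch covers only the start-alignment rule from the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if a list of segments could share one request under the
 * rule quoted above: every segment after the first must start on a
 * page boundary.
 */
static bool
segments_fit_one_request(const uintptr_t *starts, size_t nsegs,
    size_t page_size)
{
	for (size_t i = 1; i < nsegs; i++) {
		if (starts[i] % page_size != 0)
			return (false);
	}
	return (true);
}
```

When the check fails for a multi-iovec uio, submitting one request per iovec, as the message describes, sidesteps the restriction because each iovec then becomes the first segment of its own request.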
|
|
|
|
|
|
|
| |
code for allocating nvme_tracker objects and making calls into
bus_dmamap_load for commands which have payloads.
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
| |
from an NVMe consumer.
This allows us to mostly build NVMe command buffers without holding the
qpair lock, and also allows for future queueing of nvme_request objects
in cases where the submission queue is full and no nvme_tracker objects
are available.
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This simplifies the driver significantly where it is constructing
commands to be submitted to hardware. By reducing the number of
PRPs (NVMe parlance for SGE) from 128 to 32, it ensures we do not
allocate too much memory for more common smaller I/O sizes, while
still supporting up to 128KB I/O sizes.
This also paves the way for pre-allocation of nvme_tracker objects
for each queue which will simplify the I/O path even further.
Sponsored by: Intel
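
The relationship between the PRP count and the maximum I/O size can be worked through in a small sketch. It assumes a 4 KiB page size, which is what 128KB divided by 32 PRPs implies; the function name is hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumed: 128KB / 32 PRPs */
#define SKETCH_MAX_PRPS		32u	/* from the commit message */

/*
 * Number of page-sized PRP entries needed to describe a buffer that
 * starts at addr and is len bytes long. A buffer that starts partway
 * into a page needs one extra entry to cover the tail.
 */
static uint64_t
prp_entries_needed(uint64_t addr, uint64_t len)
{
	uint64_t offset = addr % SKETCH_PAGE_SIZE;

	return ((offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}
```

A page-aligned 128KB buffer needs exactly 32 entries, matching the limit described above, while a single 4 KiB transfer needs only one.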
|
|
|
|
|
|
| |
queues.
Sponsored by: Intel
|
|
|
|
|
|
|
| |
Also add sysctls to query and reset each queue pair's stats, including
the new count added here.
Sponsored by: Intel
|
|
support to FreeBSD. A full description of the overall functionality
being added is below. nvmexpress.org defines NVM Express as "an optimized
register interface, command set and feature set for PCI Express (PCIe)-based
Solid-State Drives (SSDs)."
This commit adds nvme(4) and nvd(4) driver source code and Makefiles
to the tree.
Full NVMe functionality description:
Add nvme(4) and nvd(4) drivers and nvmecontrol(8) for NVM Express (NVMe)
device support.
There will continue to be ongoing work on NVM Express support, but there
is more than enough to allow for evaluation of pre-production NVM Express
devices as well as soliciting feedback. Questions and feedback are welcome.
nvme(4) implements NVMe hardware abstraction and is a provider of NVMe
namespaces. The closest equivalent of an NVMe namespace is a SCSI LUN.
nvd(4) is an NVMe consumer, surfacing NVMe namespaces as GEOM disks.
nvmecontrol(8) is used for NVMe configuration and management.
The following are currently supported:
nvme(4)
- full mandatory NVM command set support
- per-CPU IO queues (enabled by default but configurable)
- per-queue sysctls for statistics and full command/completion queue
dumps for debugging
- registration API for NVMe namespace consumers
- I/O error handling (except for timeouts, see below)
- compilation switches for support back to stable-7
nvd(4)
- BIO_DELETE and BIO_FLUSH (if supported by controller)
- proper BIO_ORDERED handling
nvmecontrol(8)
- devlist: list NVMe controllers and their namespaces
- identify: display controller or namespace identify data in
human-readable or hex format
- perftest: quick and dirty performance test to measure raw
performance of NVMe device without userspace/physio/GEOM
overhead
The following are still work in progress and will be completed over the
next 3-6 months in rough priority order:
- complete man pages
- firmware download and activation
- asynchronous error requests
- command timeout error handling
- controller resets
- nvmecontrol(8) log page retrieval
This has been primarily tested on amd64, with light testing on i386. I
would be happy to provide assistance to anyone interested in porting
this to other architectures, but am not currently planning to do this
work myself. Big-endian and dmamap sync for command/completion queues
are the main areas that would need to be addressed.
The nvme(4) driver currently has references to Chatham, an
Intel-developed prototype board that is not fully spec compliant.
These references will all be removed over time.
Sponsored by: Intel
Contributions from: Joe Golio/EMC <joseph dot golio at emc dot com>
|