| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
I believe that this patch handled the problem from the wrong side.
Instead of making ZFS properly handle large stripe sizes, it made
unrelated driver to lie in reported parameters to workaround that.
Alternative solution for this problem from ZFS side was committed at
r296615.
Discussed with: smh
|
|
|
|
|
|
|
|
|
|
|
| |
This was a regression from r293328, which deferred allocation
of the controller's ioq array until after interrupts are enabled
during boot.
PR: 207432
Reported and tested by: Andy Carrel <wac@google.com>
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
| |
bus_alloc_resource_any()
Since these calls only use default arguments, bus_alloc_resource_any() is the
right call.
Differential Revision: https://reviews.freebsd.org/D5306
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nvme(4) issues a SET_NUM_QUEUES command during device
initialization to ensure enough I/O queues exists for each
of the MSI-X vectors we have allocated. The SET_NUM_QUEUES
command is then issued again during nvme_ctrlr_start(), to
ensure that is properly set after any controller reset.
At least one NVMe drive exists which fails this second
SET_NUM_QUEUES command during device initialization. So
change nvme_ctrlr_start() to only issue its SET_NUM_QUEUES
command when it is coming out of a reset - avoiding the
duplicate SET_NUM_QUEUES during device initialization.
Reported by: gallatin
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
| |
mode we're in and to set the power mode.
|
|
|
|
|
| |
Suggested by: rpokala
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.
This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue. This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues. Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.
While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.
Reviewed by: gallatin
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
| |
Previously nvme(4) would revert to a signle I/O queue if it could not
allocate enought interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core. This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
| |
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
|
|
|
| |
The issue referenced here was resolved by other changes
in recent commits, so this code is no longer needed.
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
| |
Instead just use num_io_queues to make this determination.
This prepares for some future changes enabling use of multiple
queues when we do not have enough queues or MSI-X vectors
for one queue per CPU.
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
|
|
|
| |
This prepares for some follow-up commits which do more work in
this area.
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
|
| |
Intel NVMe controllers have a slow path for I/Os that span a 128KB stripe boundary but ZFS limits ashift, which is derived from d_stripesize, to 13 (8KB) so we limit the stripesize reported to geom(8) to 4KB.
This may result in a small number of additional I/Os to require splitting in nvme(4), however the NVMe I/O path is very efficient so these additional I/Os will cause very minimal (if any) difference in performance or CPU utilisation.
This can be controller by the new sysctl kern.nvme.max_optimal_sectorsize.
MFC after: 1 week
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4446
|
|
|
|
|
| |
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes race condition observed under following circumstances:
1) I/O split on 128KB boundary with Intel NVMe controller.
Current Intel controllers produce better latency when
I/Os do not span a 128KB boundary - even if the I/O size
itself is less than 128KB.
2) Per-CPU I/O queues are enabled.
3) Child I/Os are submitted on different submission queues.
4) Interrupts for child I/O completions occur almost
simultaneously.
5) ithread for child I/O A increments bio_inbed, then
immediately is preempted (rendezvous IPI, higher priority
interrupt).
6) ithread for child I/O B increments bio_inbed, then completes
parent bio since all children are now completed.
7) parent bio is freed, and immediately reallocated for a VFS
or gpart bio (including setting bio_children to 1 and
clearing bio_driver1).
8) ithread for child I/O A resumes processing. bio_children
for what it thinks is the parent bio is set to 1, so it
thinks it needs to complete the parent bio.
Result is either calling a NULL callback function, or double freeing
the bio to its uma zone.
PR: 203746
Reported by: Drew Gallatin <gallatin@netflix.com>,
Marc Goroff <mgoroff@quorum.net>
Tested by: Drew Gallatin <gallatin@netflix.com>
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
| |
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Use pointer assignment rather than a combination of pointers and
flags to switch buffers between unmapped and mapped. This eliminates
multiple flags and generally simplifies the logic.
- Eliminate b_saveaddr since it is only used with pager bufs which have
their b_data re-initialized on each allocation.
- Gather up some convenience routines in the buffer cache for
manipulating buf space and buf malloc space.
- Add an inline, buf_mapped(), to standardize checks around unmapped
buffers.
In collaboration with: mlaier
Reviewed by: kib
Tested by: pho (many small revisions ago)
Sponsored by: EMC / Isilon Storage Division
|
|
|
|
|
|
| |
PR: 200458
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
|
| |
Reported by: Sean Kelly <smkelly@smkelly.org>
MFC after: 3 days
Sponsored by: Intel
|
|
|
|
|
|
|
| |
This fixes i386 PAE build fallout from r281281.
Reported by: bz
MFC after: 1 week
|
|
|
|
|
|
|
|
| |
Chatham was an internal NVMe prototype board used for
early driver development.
MFC after: 1 week
Sponsored by: Intel
|
|
|
|
|
| |
MFC after: 1 week
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
|
| |
Submission and completion queue memory need to use a
separate DMA tag for mappings than payload buffers,
to ensure mappings remain contiguous even with DMAR
enabled.
Submitted by: kib
MFC after: 1 week
Sponsored by: Intel
|
|
|
|
|
|
|
|
|
|
| |
Previously, if per-CPU MSI-X vectors could not be allocated,
nvme(4) would fall back to INTx with a single I/O queue pair.
This change will still fall back to a single I/O queue pair, but
allocate MSI-X vectors instead of reverting to INTx.
MFC after: 1 week
Sponsored by: Intel
|
|
|
|
|
|
|
| |
SET_FEATURES/NUMBER_OF_QUEUES command.
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
|
|
| |
INTx if necessary.
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
|
|
| |
instances if modules loaded during boot.
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
| |
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
| |
controller initialization.
The spec says OS drivers should send this command after controller
initialization completes successfully, but other NVMe OS drivers are
not sending this command. This change will therefore reduce differences
between the FreeBSD and other OS drivers.
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
| |
the spoofed identify data into the user buffer rather than issuing the
command to the controller, since Chatham IDENTIFY data is always spoofed.
While here, fix a bug in the spoofed data for Chatham submission and
completion queue entry sizes.
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
| |
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
| |
Approved by: re (implicit)
MFC after: 1 week
|
|
|
|
|
|
|
|
|
| |
controller resets fail.
Sponsored by: Intel
Reviewed by: carl
Approved by: re (hrs)
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
they occur.
This prevents repeated notifications of the same event.
Status of these events may be viewed at any time by viewing the
SMART/Health Info Page using nvmecontrol, whether or not asynchronous
events notifications for those events are enabled. This log page can
be viewed using:
nvmecontrol logpage -p 2 <ctrlr id>
Future enhancements may re-enable these notifications on a periodic basis
so that if the notified condition persists, it will continue to be logged.
Sponsored by: Intel
Reviewed by: carl
Approved by: re (hrs)
MFC after: 1 week
|
|
|
|
|
|
|
|
|
| |
on NVMe controllers that do not support it.
Sponsored by: Intel
Reviewed by: carl
Approved by: re (hrs)
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
| |
when calculating stats in nvmecontrol perftest.
Sponsored by: Intel
Reported by: Joe Golio <joseph.golio@emc.com>
Reviewed by: carl
Approved by: re (hrs)
MFC after: 1 week
|
|
|
|
|
|
|
|
|
| |
benefit from it.
Sponsored by: Intel
Reviewed by: kib (earlier version), carl
Approved by: re (hrs)
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous method was to set the D_UNMAPPED_IO flag in the cdevsw
for the driver. The problem with this is that in many cases (e.g.
sa(4)) there may be some instances of the driver that can handle
unmapped I/O and some that can't. The isp(4) driver can handle
unmapped I/O, but the esp(4) driver currently cannot. The cdevsw
is shared among all driver instances.
So instead of setting a flag on the cdevsw, set a flag on the cdev.
This allows drivers to indicate support for unmapped I/O on a
per-instance basis.
sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it
with an SI_UNMAPPED cdev flag.
kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine
whether or not a particular driver can handle
unmapped I/O.
geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs.
Since GEOM will create a temporary mapping when
needed, setting SI_UNMAPPED unconditionally will
work.
Remove the D_UNMAPPED_IO flag.
nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here
if NVME_UNMAPPED_BIO_SUPPORT is enabled.
vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a
cdev instead of the D_UNMAPPED_IO flag on the cdevsw.
sys/param.h: Bump __FreeBSD_version to 1000045 for the switch from
setting the D_UNMAPPED_IO flag in the cdevsw to setting
SI_UNMAPPED in the cdev.
Reviewed by: kib, jimharris
MFC after: 1 week
Sponsored by: Spectra Logic
|
|
|
|
|
|
|
|
| |
namespaces.
Sponsoredy by: Intel
Reviewed by: carl
MFC after: 3 days
|
|
|
|
|
|
|
|
|
| |
notification gets sent in cases where system shuts down with driver
unloaded.
Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
| |
As part of this commit, add an nvme_strvis() function which borrows
heavily from cam_strvis(). This will allow stripping of
leading/trailing whitespace and also handle unprintable characters
in model/serial numbers. This function goes into a new nvme_util.c
file which is used by both the driver and nvmecontrol.
Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
| |
Recent testing with QEMU that has variable sector size support for
NVMe uncovered some of these issues. Chatham prototype boards supported
only 512 byte sectors.
Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commands during controller initialization.
DELAY() does not work here during config_intrhook context - we need to
explicitly relinquish the CPU for the admin command completion to
get processed.
Sponsored by: Intel
Reported by: Adam Brooks <adam.j.brooks@intel.com>
Reviewed by: carl
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
| |
and firmware revision in the controller's identify structure.
Also modify consumers of these fields to ensure they only use the
specified number of bytes for their respective fields.
Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days
|
|
|
|
| |
MFC after: 3 days
|
|
|
|
|
|
|
| |
if not already defined elsewhere.
Requested by: attilio
MFC after: 3 days
|
|
|
|
| |
MFC after: 3 days
|
|
|
|
|
| |
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
|
|
| |
nvme_attach() and nvme_detach() respectively.
Sponsored by: Intel
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
a new firmware command.
NVMe controllers may support up to 7 firmware slots for storing of
different firmware revisions. This new firmware command supports
firmware replacement (i.e. firmware download) with or without immediate
activation, or activation of a previously stored firmware image. It
also supports selection of the firmware slot during replacement
operations, using IDENTIFY information from the controller to
check that the specified slot is valid.
Newly activated firmware does not take effect until the new controller
reset, either via a reboot or separate 'nvmecontrol reset' command to the
same controller.
Submitted by: Joe Golio <joseph.golio@emc.com>
Obtained from: EMC / Isilon Storage Division
MFC after: 3 days
|