path: root/sys/dev/nvme
Commit message [Author, Age, Files, Lines]
* Revert r292074 (by smh): Limit stripesize reported from nvd(4) to 4K [mav, 2016-03-10, 3 files, -35/+0]
  I believe that this patch handled the problem from the wrong side. Instead of making ZFS properly handle large stripe sizes, it made an unrelated driver lie in its reported parameters to work around that. An alternative solution for this problem from the ZFS side was committed at r296615.
  Discussed with: smh
* nvme: fix intx handler to not dereference ioq during initialization [jimharris, 2016-02-24, 1 file, -1/+1]
  This was a regression from r293328, which deferred allocation of the controller's ioq array until after interrupts are enabled during boot.
  PR: 207432
  Reported and tested by: Andy Carrel <wac@google.com>
  MFC after: 3 days
  Sponsored by: Intel
* Replace several bus_alloc_resource() calls using default arguments with bus_alloc_resource_any() [jhibbits, 2016-02-19, 1 file, -4/+4]
  Since these calls only use default arguments, bus_alloc_resource_any() is the right call.
  Differential Revision: https://reviews.freebsd.org/D5306
* nvme: avoid duplicate SET_NUM_QUEUES commands [jimharris, 2016-02-11, 1 file, -8/+10]
  nvme(4) issues a SET_NUM_QUEUES command during device initialization to ensure enough I/O queues exist for each of the MSI-X vectors we have allocated. The SET_NUM_QUEUES command is then issued again during nvme_ctrlr_start(), to ensure that it is properly set after any controller reset.
  At least one NVMe drive exists which fails this second SET_NUM_QUEUES command during device initialization. So change nvme_ctrlr_start() to only issue its SET_NUM_QUEUES command when it is coming out of a reset, avoiding the duplicate SET_NUM_QUEUES during device initialization.
  Reported by: gallatin
  MFC after: 3 days
  Sponsored by: Intel
* Implement power command to list all power modes, find out the power mode we're in and to set the power mode. [imp, 2016-01-30, 1 file, -1/+29]
* nvme: replace NVME_CEILING macro with howmany() [jimharris, 2016-01-07, 1 file, -9/+3]
  Suggested by: rpokala
  MFC after: 3 days
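  Note: howmany() is the ceiling-division macro from FreeBSD's <sys/param.h>. The standalone sketch below repeats that definition so it builds outside the kernel; pages_needed() is a hypothetical helper used only for illustration and is not a function from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same definition as FreeBSD's <sys/param.h>; guarded so the snippet
 * also builds on systems where the macro is already provided.
 */
#ifndef howmany
#define howmany(x, y) (((x) + ((y) - 1)) / (y))
#endif

/*
 * Hypothetical helper: number of page_size-sized chunks needed to
 * cover `bytes` bytes, rounding up.
 */
static uint32_t
pages_needed(uint32_t bytes, uint32_t page_size)
{
	assert(page_size != 0);
	return (howmany(bytes, page_size));
}
```

  For example, pages_needed(4097, 4096) rounds up to 2, while pages_needed(4096, 4096) is exactly 1.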
* nvme: add hw.nvme.min_cpus_per_ioq tunable [jimharris, 2016-01-07, 2 files, -6/+26]
  Due to FreeBSD system-wide limits on number of MSI-X vectors (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321), it may be desirable to allocate fewer than the maximum number of vectors for an NVMe device, in order to save vectors for other devices (usually Ethernet) that can take better advantage of them and may be probed after NVMe.
  This tunable is expressed in terms of minimum number of CPUs per I/O queue instead of max number of queues per controller, to allow for a more even distribution of CPUs per queue. This avoids cases where some number of CPUs have a dedicated queue, but other CPUs need to share queues. Ideally the PR referenced above will eventually be fixed and the mechanism implemented here becomes obsolete anyways.
  While here, fix a bug in the CPUs per I/O queue calculation to properly account for the admin queue's MSI-X vector.
  Reviewed by: gallatin
  MFC after: 3 days
  Sponsored by: Intel
* nvme: do not revert to single I/O queue when per-CPU queues not possible [jimharris, 2016-01-07, 3 files, -64/+106]
  Previously nvme(4) would revert to a single I/O queue if it could not allocate enough interrupt vectors or NVMe submission/completion queues to have one I/O queue per core. This patch determines how to utilize a smaller number of available interrupt vectors, and assigns (as closely as possible) an equal number of cores to each associated I/O queue.
  MFC after: 3 days
  Sponsored by: Intel
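  Note: the queue-sizing idea in this commit and in the hw.nvme.min_cpus_per_ioq commit can be sketched as below. This is an illustration under stated assumptions, not the driver's code: plan_io_queues(), struct ioq_plan and CEIL_DIV are hypothetical names, and the only facts taken from the commit messages are that one MSI-X vector belongs to the admin queue, that a minimum CPUs-per-queue value may be imposed, and that CPUs should be spread as evenly as possible.

```c
#include <assert.h>

/* Ceiling division, equivalent to FreeBSD's howmany(). */
#define CEIL_DIV(x, y) (((x) + ((y) - 1)) / (y))

/* Hypothetical result type for this sketch. */
struct ioq_plan {
	int cpus_per_ioq;	/* CPUs sharing each I/O queue */
	int num_io_queues;	/* I/O queues actually used */
};

/*
 * Hypothetical planner: given the CPU count, the number of MSI-X
 * vectors obtained, and a minimum CPUs-per-queue setting, choose an
 * even CPU-to-queue distribution.
 */
static struct ioq_plan
plan_io_queues(int ncpus, int nvectors, int min_cpus_per_ioq)
{
	struct ioq_plan p;
	int io_vectors;

	assert(ncpus > 0 && nvectors >= 2 && min_cpus_per_ioq >= 1);

	/* One vector is reserved for the admin queue. */
	io_vectors = nvectors - 1;
	if (io_vectors > ncpus)
		io_vectors = ncpus;

	/* Fewest CPUs per queue that the available vectors allow. */
	p.cpus_per_ioq = CEIL_DIV(ncpus, io_vectors);
	if (p.cpus_per_ioq < min_cpus_per_ioq)
		p.cpus_per_ioq = min_cpus_per_ioq;

	/* Use only as many queues as that ratio requires. */
	p.num_io_queues = CEIL_DIV(ncpus, p.cpus_per_ioq);
	return (p);
}
```

  With 12 CPUs and 6 vectors (5 usable for I/O), this sketch picks 3 CPUs per queue and 4 queues, rather than 5 queues with an uneven split.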
* nvme: break out interrupt setup code into a separate function [jimharris, 2016-01-07, 1 file, -66/+63]
  MFC after: 3 days
  Sponsored by: Intel
* nvme: do not pre-allocate MSI-X IRQ resources [jimharris, 2016-01-07, 3 files, -39/+3]
  The issue referenced here was resolved by other changes in recent commits, so this code is no longer needed.
  MFC after: 3 days
  Sponsored by: Intel
* nvme: remove per_cpu_io_queues from struct nvme_controller [jimharris, 2016-01-07, 2 files, -9/+3]
  Instead just use num_io_queues to make this determination. This prepares for some future changes enabling use of multiple queues when we do not have enough queues or MSI-X vectors for one queue per CPU.
  MFC after: 3 days
  Sponsored by: Intel
* nvme: simplify some of the nested ifs in interrupt setup code [jimharris, 2016-01-07, 1 file, -16/+20]
  This prepares for some follow-up commits which do more work in this area.
  MFC after: 3 days
  Sponsored by: Intel
* Limit stripesize reported from nvd(4) to 4K [smh, 2015-12-11, 3 files, -0/+35]
  Intel NVMe controllers have a slow path for I/Os that span a 128KB stripe boundary, but ZFS limits ashift, which is derived from d_stripesize, to 13 (8KB), so we limit the stripesize reported to geom(8) to 4KB.
  This may cause a small number of additional I/Os to require splitting in nvme(4); however, the NVMe I/O path is very efficient, so these additional I/Os will cause very minimal (if any) difference in performance or CPU utilisation.
  This can be controlled by the new sysctl kern.nvme.max_optimal_sectorsize.
  MFC after: 1 week
  Sponsored by: Multiplay
  Differential Revision: https://reviews.freebsd.org/D4446
* nvd, nvme: report stripesize through GEOM disk layer [jimharris, 2015-10-30, 2 files, -0/+8]
  MFC after: 3 days
  Sponsored by: Intel
* nvme: fix race condition in split bio completion path [jimharris, 2015-10-30, 1 file, -3/+6]
  Fixes race condition observed under following circumstances:
  1) I/O split on 128KB boundary with Intel NVMe controller. Current Intel controllers produce better latency when I/Os do not span a 128KB boundary, even if the I/O size itself is less than 128KB.
  2) Per-CPU I/O queues are enabled.
  3) Child I/Os are submitted on different submission queues.
  4) Interrupts for child I/O completions occur almost simultaneously.
  5) ithread for child I/O A increments bio_inbed, then immediately is preempted (rendezvous IPI, higher priority interrupt).
  6) ithread for child I/O B increments bio_inbed, then completes parent bio since all children are now completed.
  7) parent bio is freed, and immediately reallocated for a VFS or gpart bio (including setting bio_children to 1 and clearing bio_driver1).
  8) ithread for child I/O A resumes processing. bio_children for what it thinks is the parent bio is set to 1, so it thinks it needs to complete the parent bio.
  Result is either calling a NULL callback function, or double freeing the bio to its uma zone.
  PR: 203746
  Reported by: Drew Gallatin <gallatin@netflix.com>, Marc Goroff <mgoroff@quorum.net>
  Tested by: Drew Gallatin <gallatin@netflix.com>
  MFC after: 3 days
  Sponsored by: Intel
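  Note: one common way to close a race of this shape is to read the child count before the increment and to let a single atomic fetch-and-add decide which thread completes the parent, so no thread touches the parent after another thread may have freed it. The sketch below uses C11 atomics in userland; struct parent_io, parent_init() and child_done() are hypothetical names, and it is not a copy of the committed fix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parent bio's bookkeeping fields. */
struct parent_io {
	unsigned children;	/* child count, fixed before submission */
	atomic_uint inbed;	/* children completed so far */
};

static void
parent_init(struct parent_io *p, unsigned children)
{
	p->children = children;
	atomic_init(&p->inbed, 0);
}

/*
 * Called once per completed child. Returns true for exactly one
 * caller: the one whose increment accounts for the last child.
 * `children` is read before the increment because the parent must
 * not be dereferenced afterwards unless this call returns true.
 */
static bool
child_done(struct parent_io *p)
{
	unsigned children = p->children;
	unsigned prev = atomic_fetch_add(&p->inbed, 1);

	return (prev + 1 == children);
}
```

  Only the caller that receives true goes on to complete and free the parent; every other caller returns without touching it again.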
* nvme: do not notify a consumer about failures that occur during initialization [jimharris, 2015-07-29, 1 file, -0/+9]
  MFC after: 3 days
  Sponsored by: Intel
* Refactor unmapped buffer address handling. [jeff, 2015-07-23, 1 file, -1/+0]
  - Use pointer assignment rather than a combination of pointers and flags to switch buffers between unmapped and mapped. This eliminates multiple flags and generally simplifies the logic.
  - Eliminate b_saveaddr since it is only used with pager bufs which have their b_data re-initialized on each allocation.
  - Gather up some convenience routines in the buffer cache for manipulating buf space and buf malloc space.
  - Add an inline, buf_mapped(), to standardize checks around unmapped buffers.
  In collaboration with: mlaier
  Reviewed by: kib
  Tested by: pho (many small revisions ago)
  Sponsored by: EMC / Isilon Storage Division
* nvme: ensure csts.rdy bit is cleared before returning from nvme_ctrlr_disable [jimharris, 2015-07-23, 1 file, -10/+12]
  PR: 200458
  MFC after: 3 days
  Sponsored by: Intel
* nvme: properly handle case where pci_alloc_msix does not alloc all vectors [jimharris, 2015-07-23, 1 file, -6/+28]
  Reported by: Sean Kelly <smkelly@smkelly.org>
  MFC after: 3 days
  Sponsored by: Intel
* nvme: use BUS_SPACE_MAXSIZE for bus_dma_tag_create maxsize parameter [jimharris, 2015-04-09, 1 file, -1/+1]
  This fixes i386 PAE build fallout from r281281.
  Reported by: bz
  MFC after: 1 week
* nvme: remove CHATHAM related code [jimharris, 2015-04-08, 5 files, -252/+10]
  Chatham was an internal NVMe prototype board used for early driver development.
  MFC after: 1 week
  Sponsored by: Intel
* nvme: add device strings for Intel DC series NVMe SSDs [jimharris, 2015-04-08, 1 file, -10/+38]
  MFC after: 1 week
  Sponsored by: Intel
* nvme: create separate DMA tag for non-payload DMA buffers [jimharris, 2015-04-08, 2 files, -9/+41]
  Submission and completion queue memory need to use a separate DMA tag for mappings than payload buffers, to ensure mappings remain contiguous even with DMAR enabled.
  Submitted by: kib
  MFC after: 1 week
  Sponsored by: Intel
* nvme: fall back to a smaller MSI-X vector allocation if necessary [jimharris, 2015-04-08, 1 file, -1/+9]
  Previously, if per-CPU MSI-X vectors could not be allocated, nvme(4) would fall back to INTx with a single I/O queue pair. This change will still fall back to a single I/O queue pair, but allocate MSI-X vectors instead of reverting to INTx.
  MFC after: 1 week
  Sponsored by: Intel
* Use bitwise OR instead of logical OR when constructing value for SET_FEATURES/NUMBER_OF_QUEUES command. [jimharris, 2014-06-10, 1 file, -1/+1]
  Sponsored by: Intel
  MFC after: 3 days
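  Note: the class of bug fixed here is easy to reproduce in isolation. In the NVMe Number of Queues feature, command dword 11 carries the requested submission queue count in bits 15:0 and the completion queue count in bits 31:16, both zero-based. The sketch below contrasts the two operators; the function names are hypothetical and the code is not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build cdw11 for Set Features / Number of Queues: the same
 * zero-based count goes in the low and high 16-bit halves.
 */
static uint32_t
num_queues_cdw11(uint32_t num_queues)
{
	assert(num_queues >= 1 && num_queues <= 65535);
	return (((num_queues - 1) << 16) | (num_queues - 1));
}

/*
 * The buggy form, kept only for comparison: logical OR collapses
 * the whole expression to 0 or 1.
 */
static uint32_t
num_queues_cdw11_buggy(uint32_t num_queues)
{
	return (((num_queues - 1) << 16) || (num_queues - 1));
}
```

  Requesting 8 queues yields 0x00070007 with the bitwise form, but just 1 with the logical form, which asks the controller for two submission queues and a single completion queue.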
* nvme: Allocate all MSI resources up front so that we can fall back to INTx if necessary. [jimharris, 2014-03-18, 3 files, -7/+44]
  Sponsored by: Intel
  MFC after: 3 days
* nvme: Close hole where nvd(4) would not be notified of all nvme(4) instances if modules loaded during boot. [jimharris, 2014-03-18, 3 files, -29/+73]
  Sponsored by: Intel
  MFC after: 3 days
* nvme: NVMe specification dictates 4-byte alignment for PRPs (not 8). [jimharris, 2014-03-17, 1 file, -1/+2]
  Sponsored by: Intel
  MFC after: 3 days
* nvme: Remove the software progress marker SET_FEATURE command during controller initialization. [jimharris, 2014-03-17, 1 file, -10/+0]
  The spec says OS drivers should send this command after controller initialization completes successfully, but other NVMe OS drivers are not sending this command. This change will therefore reduce differences between the FreeBSD and other OS drivers.
  Sponsored by: Intel
  MFC after: 3 days
* For IDENTIFY passthrough commands to Chatham prototype controllers, copy the spoofed identify data into the user buffer rather than issuing the command to the controller, since Chatham IDENTIFY data is always spoofed. [jimharris, 2014-01-06, 1 file, -2/+23]
  While here, fix a bug in the spoofed data for Chatham submission and completion queue entry sizes.
  Sponsored by: Intel
  MFC after: 3 days
* Create a unique unit number for each controller and namespace cdev. [jimharris, 2013-11-01, 2 files, -4/+11]
  Sponsored by: Intel
  MFC after: 3 days
* Fix the LINT build. [jimharris, 2013-10-08, 1 file, -0/+1]
  Approved by: re (implicit)
  MFC after: 1 week
* Do not leak resources during attach if nvme_ctrlr_construct() or the initial controller resets fail. [jimharris, 2013-10-08, 1 file, -3/+9]
  Sponsored by: Intel
  Reviewed by: carl
  Approved by: re (hrs)
  MFC after: 1 week
* Log and then disable asynchronous notification of persistent events after they occur. [jimharris, 2013-10-08, 2 files, -7/+56]
  This prevents repeated notifications of the same event.
  Status of these events may be viewed at any time by viewing the SMART/Health Info Page using nvmecontrol, whether or not asynchronous events notifications for those events are enabled. This log page can be viewed using:
    nvmecontrol logpage -p 2 <ctrlr id>
  Future enhancements may re-enable these notifications on a periodic basis so that if the notified condition persists, it will continue to be logged.
  Sponsored by: Intel
  Reviewed by: carl
  Approved by: re (hrs)
  MFC after: 1 week
* Do not enable temperature threshold as an asynchronous event notification on NVMe controllers that do not support it. [jimharris, 2013-10-08, 1 file, -0/+14]
  Sponsored by: Intel
  Reviewed by: carl
  Approved by: re (hrs)
  MFC after: 1 week
* Extend some 32-bit fields and variables to 64-bit to prevent overflow when calculating stats in nvmecontrol perftest. [jimharris, 2013-10-08, 2 files, -5/+5]
  Sponsored by: Intel
  Reported by: Joe Golio <joseph.golio@emc.com>
  Reviewed by: carl
  Approved by: re (hrs)
  MFC after: 1 week
* Add driver-assisted striping for upcoming Intel NVMe controllers that can benefit from it. [jimharris, 2013-10-08, 3 files, -1/+225]
  Sponsored by: Intel
  Reviewed by: kib (earlier version), carl
  Approved by: re (hrs)
  MFC after: 1 week
* Change the way that unmapped I/O capability is advertised. [ken, 2013-08-15, 1 file, -4/+3]
  The previous method was to set the D_UNMAPPED_IO flag in the cdevsw for the driver. The problem with this is that in many cases (e.g. sa(4)) there may be some instances of the driver that can handle unmapped I/O and some that can't. The isp(4) driver can handle unmapped I/O, but the esp(4) driver currently cannot. The cdevsw is shared among all driver instances.
  So instead of setting a flag on the cdevsw, set a flag on the cdev. This allows drivers to indicate support for unmapped I/O on a per-instance basis.
  sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it with an SI_UNMAPPED cdev flag.
  kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine whether or not a particular driver can handle unmapped I/O.
  geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs. Since GEOM will create a temporary mapping when needed, setting SI_UNMAPPED unconditionally will work. Remove the D_UNMAPPED_IO flag.
  nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here if NVME_UNMAPPED_BIO_SUPPORT is enabled.
  vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a cdev instead of the D_UNMAPPED_IO flag on the cdevsw.
  sys/param.h: Bump __FreeBSD_version to 1000045 for the switch from setting the D_UNMAPPED_IO flag in the cdevsw to setting SI_UNMAPPED in the cdev.
  Reviewed by: kib, jimharris
  MFC after: 1 week
  Sponsored by: Spectra Logic
* If a controller fails to initialize, do not notify consumers (nvd) of its namespaces. [jimharris, 2013-08-13, 1 file, -0/+9]
  Sponsored by: Intel
  Reviewed by: carl
  MFC after: 3 days
* Send a shutdown notification in the driver unload path, to ensure notification gets sent in cases where system shuts down with driver unloaded. [jimharris, 2013-08-13, 4 files, -28/+51]
  Sponsored by: Intel
  Reviewed by: carl
  MFC after: 3 days
* Add message when nvd disks are attached and detached. [jimharris, 2013-07-19, 3 files, -3/+65]
  As part of this commit, add an nvme_strvis() function which borrows heavily from cam_strvis(). This will allow stripping of leading/trailing whitespace and also handle unprintable characters in model/serial numbers. This function goes into a new nvme_util.c file which is used by both the driver and nvmecontrol.
  Sponsored by: Intel
  Reviewed by: carl
  MFC after: 3 days
* Fix nvme(4) and nvd(4) to support non 512-byte sector sizes. [jimharris, 2013-07-19, 2 files, -4/+15]
  Recent testing with QEMU that has variable sector size support for NVMe uncovered some of these issues. Chatham prototype boards supported only 512 byte sectors.
  Sponsored by: Intel
  Reviewed by: carl
  MFC after: 3 days
* Use pause() instead of DELAY() when polling for completion of admin commands during controller initialization. [jimharris, 2013-07-17, 1 file, -4/+4]
  DELAY() does not work here during config_intrhook context - we need to explicitly relinquish the CPU for the admin command completion to get processed.
  Sponsored by: Intel
  Reported by: Adam Brooks <adam.j.brooks@intel.com>
  Reviewed by: carl
  MFC after: 3 days
* Define constants for the lengths of the serial number, model number and firmware revision in the controller's identify structure. [jimharris, 2013-07-17, 1 file, -3/+7]
  Also modify consumers of these fields to ensure they only use the specified number of bytes for their respective fields.
  Sponsored by: Intel
  Reviewed by: carl
  MFC after: 3 days
* Fix a poorly worded comment in nvme(4). [jimharris, 2013-07-11, 1 file, -3/+3]
  MFC after: 3 days
* Add comment explaining why CACHE_LINE_SIZE is defined in nvme_private.h if not already defined elsewhere. [jimharris, 2013-07-09, 1 file, -0/+4]
  Requested by: attilio
  MFC after: 3 days
* Update copyright dates. [jimharris, 2013-07-09, 9 files, -9/+9]
  MFC after: 3 days
* Do not retry failed async event requests. [jimharris, 2013-07-09, 1 file, -5/+5]
  Sponsored by: Intel
  MFC after: 3 days
* Add pci_enable_busmaster() and pci_disable_busmaster() calls in nvme_attach() and nvme_detach() respectively. [jimharris, 2013-07-09, 1 file, -0/+3]
  Sponsored by: Intel
  MFC after: 3 days
* Add firmware replacement and activation support to nvmecontrol(8) through a new firmware command. [jimharris, 2013-06-27, 1 file, -0/+6]
  NVMe controllers may support up to 7 firmware slots for storing of different firmware revisions. This new firmware command supports firmware replacement (i.e. firmware download) with or without immediate activation, or activation of a previously stored firmware image. It also supports selection of the firmware slot during replacement operations, using IDENTIFY information from the controller to check that the specified slot is valid.
  Newly activated firmware does not take effect until the next controller reset, either via a reboot or a separate 'nvmecontrol reset' command to the same controller.
  Submitted by: Joe Golio <joseph.golio@emc.com>
  Obtained from: EMC / Isilon Storage Division
  MFC after: 3 days