summaryrefslogtreecommitdiffstats
path: root/drivers/scsi/scsi_lib.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2018-06-101-0/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc, xfcp, hisi_sas, cxlflash, qla2xxx. In the absence of Nic, we're also taking target updates which are mostly minor except for the tcmu refactor. The only real core change to worry about is the removal of high page bouncing (in sas, storvsc and iscsi). This has been well tested and no problems have shown up so far" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (268 commits) scsi: lpfc: update driver version to 12.0.0.4 scsi: lpfc: Fix port initialization failure. scsi: lpfc: Fix 16gb hbas failing cq create. scsi: lpfc: Fix crash in blk_mq layer when executing modprobe -r lpfc scsi: lpfc: correct oversubscription of nvme io requests for an adapter scsi: lpfc: Fix MDS diagnostics failure (Rx < Tx) scsi: hisi_sas: Mark PHY as in reset for nexus reset scsi: hisi_sas: Fix return value when get_free_slot() failed scsi: hisi_sas: Terminate STP reject quickly for v2 hw scsi: hisi_sas: Add v2 hw force PHY function for internal ATA command scsi: hisi_sas: Include TMF elements in struct hisi_sas_slot scsi: hisi_sas: Try wait commands before before controller reset scsi: hisi_sas: Init disks after controller reset scsi: hisi_sas: Create a scsi_host_template per HW module scsi: hisi_sas: Reset disks when discovered scsi: hisi_sas: Add LED feature for v3 hw scsi: hisi_sas: Change common allocation mode of device id scsi: hisi_sas: change slot index allocation mode scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate() scsi: hisi_sas: fix a typo in hisi_sas_task_prep() ...
| * scsi: core: sanitize++ in progressDouglas Gilbert2018-05-281-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 505aa4b6a883 ("scsi: sd: Defer spinning up drive while SANITIZE is in progress") may not be sufficient, especially if the SCSI SANITIZE command is sent via the bsg or sg pass-throughs, since they don't use the sd driver. Add "Sanitize in progress" plus some other recent "... in progress" additional sense codes into the scsi mid-level so they are treated in a similar fashion to "Format in progress". [mkp: checkpatch] Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | Merge tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds2018-06-041-22/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull dma-mapping updates from Christoph Hellwig: - replace the force_dma flag with a dma_configure bus method. (Nipun Gupta, although one patch is іncorrectly attributed to me due to a git rebase bug) - use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai) - remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the right thing for bounce buffering. - move dma-debug initialization to common code, and apply a few cleanups to the dma-debug code. - cleanup the Kconfig mess around swiotlb selection - swiotlb comment fixup (Yisheng Xie) - a trivial swiotlb fix. (Dan Carpenter) - support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt) - add a new generic dma-noncoherent dma_map_ops implementation and use it for arc, c6x and nds32. - improve scatterlist validity checking in dma-debug. (Robin Murphy) - add a struct device quirk to limit the dma-mask to 32-bit due to bridge/system issues, and switch x86 to use it instead of a local hack for VIA bridges. - handle devices without a dma_mask more gracefully in the dma-direct code. * tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits) dma-direct: don't crash on device without dma_mask nds32: use generic dma_noncoherent_ops nds32: implement the unmap_sg DMA operation nds32: consolidate DMA cache maintainance routines x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag x86/pci-dma: remove the explicit nodac and allowdac option x86/pci-dma: remove the experimental forcesac boot option Documentation/x86: remove a stray reference to pci-nommu.c core, dma-direct: add a flag 32-bit dma limits dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs dma-debug: check scatterlist segments c6x: use generic dma_noncoherent_ops arc: use generic dma_noncoherent_ops arc: fix arc_dma_{map,unmap}_page arc: fix arc_dma_sync_sg_for_{cpu,device} arc: simplify arc_dma_sync_single_for_{cpu,device} dma-mapping: provide a generic dma-noncoherent implementation dma-mapping: simplify Kconfig dependencies riscv: add swiotlb support riscv: only enable ZONE_DMA32 for 64-bit ...
| * | scsi: reduce use of block bounce buffersChristoph Hellwig2018-05-071-22/+2
| |/ | | | | | | | | | | | | | | | | | | We can rely on the dma-mapping code to handle any DMA limits that is bigger than the ISA DMA mask for us (either using an iommu or swiotlb), so remove setting the block layer bounce limit for anything but the unchecked_isa_dma case, or the bouncing for highmem pages. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk>
* | block: consistently use GFP_NOIO instead of __GFP_NORECLAIMChristoph Hellwig2018-05-141-1/+1
| | | | | | | | | | | | | | | | | | Same numerical value (for now at least), but a much better documentation of intent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: sanitize blk_get_request calling conventionsChristoph Hellwig2018-05-141-1/+1
|/ | | | | | | | | Switch everyone to blk_get_request_flags, and then rename blk_get_request_flags to blk_get_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* scsi: core: Make scsi_result_to_blk_status() recognize CONDITION METBart Van Assche2018-04-091-0/+9
| | | | | | | | | | | | | | | Ensure that CONDITION MET and other non-zero status values that indicate success are translated into BLK_STS_OK. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Lee Duncan <lduncan@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: core: Rename __scsi_error_from_host_byte() into ↵Bart Van Assche2018-04-091-9/+9
| | | | | | | | | | | | | | | | | | | scsi_result_to_blk_status() Since the next patch will modify this function such that it checks more than just the host byte of the SCSI result, rename __scsi_error_from_host_byte() into scsi_result_to_blk_status(). This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Lee Duncan <lduncan@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* Revert "scsi: core: return BLK_STS_OK for DID_OK in ↵Bart Van Assche2018-04-091-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | __scsi_error_from_host_byte()" The description of commit e39a97353e53 is wrong: it mentions that commit 2a842acab109 introduced a bug in __scsi_error_from_host_byte() although that commit did not change the behavior of that function. Additionally, commit e39a97353e53 introduced a bug: it causes commands that fail with hostbyte=DID_OK and driverbyte=DRIVER_SENSE to be completed with BLK_STS_OK. Hence revert that commit. Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()") Reported-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Lee Duncan <lduncan@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* Merge tag 'scsi-for-linus' of ↵Linus Torvalds2018-04-051-2/+25
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is mostly updates of the usual drivers: arcmsr, qla2xx, lpfc, ufs, mpt3sas, hisi_sas. In addition we have removed several really old drivers: sym53c416, NCR53c406a, fdomain, fdomain_cs and removed the old scsi_module.c initialization from all remaining drivers. Plus an assortment of bug fixes, initialization errors and other minor fixes" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (168 commits) scsi: ufs: Add support for Auto-Hibernate Idle Timer scsi: ufs: sysfs: reworking of the rpm_lvl and spm_lvl entries scsi: qla2xxx: fx00 copypaste typo scsi: qla2xxx: fix error message on <qla2400 scsi: smartpqi: update driver version scsi: smartpqi: workaround fw bug for oq deletion scsi: arcmsr: Change driver version to v1.40.00.05-20180309 scsi: arcmsr: Sleep to avoid CPU stuck too long for waiting adapter ready scsi: arcmsr: Handle adapter removed due to thunderbolt cable disconnection. scsi: arcmsr: Rename ACB_F_BUS_HANG_ON to ACB_F_ADAPTER_REMOVED for adapter hot-plug scsi: qla2xxx: Update driver version to 10.00.00.06-k scsi: qla2xxx: Fix Async GPN_FT for FCP and FC-NVMe scan scsi: qla2xxx: Cleanup code to improve FC-NVMe error handling scsi: qla2xxx: Fix FC-NVMe IO abort during driver reset scsi: qla2xxx: Fix retry for PRLI RJT with reason of BUSY scsi: qla2xxx: Remove nvme_done_list scsi: qla2xxx: Return busy if rport going away scsi: qla2xxx: Fix n2n_ae flag to prevent dev_loss on PDB change scsi: qla2xxx: Add FC-NVMe abort processing scsi: qla2xxx: Add changes for devloss timeout in driver ...
| * Merge branch 'fixes' into miscJames Bottomley2018-04-031-0/+4
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | Somewhat nasty merge due to conflicts between "33b28357dd00 scsi: qla2xxx: Fix Async GPN_FT for FCP and FC-NVMe scan" and "2b5b96473efc scsi: qla2xxx: Fix FC-NVMe LUN discovery" Merge is non-trivial and has been verified by Qlogic (Cavium) Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
| * | scsi: core: Make SCSI Status CONDITION MET equivalent to GOODDouglas Gilbert2018-03-121-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SCSI PRE-FETCH (10 or 16) command is present both on hard disks and some SSDs. It is useful when the address of the next block(s) to be read is known but it is not following the LBA of the current READ (so read-ahead won't help). It returns two "good" SCSI Status values. If the requested blocks have fitted (or will most likely fit (when the IMMED bit is set)) into the disk's cache, it returns CONDITION MET. If it didn't (or will not) fit then it returns GOOD status. The goal of this patch is to stop the SCSI subsystem treating the CONDITION MET SCSI status as an error. The current state makes the PRE-FETCH command effectively unusable via pass-throughs. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: core: use blk_mq_requeue_request in __scsi_queue_insertJianchao Wang2018-03-061-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In scsi core, __scsi_queue_insert should just put request back on the queue and retry using the same command as before. However, for blk-mq, scsi_mq_requeue_cmd is employed here which will unprepare the request. To align with the semantics of __scsi_queue_insert, use blk_mq_requeue_request with kick_requeue_list == true and put the reference of scsi_device. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: core: Reduce number of scsi_test_unit_ready() retriesBart Van Assche2018-02-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Make scsi_test_unit_ready() send at most as many TURs as specified in the 'retries' argument instead of retries * (retries + 1) / 2. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | | Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-blockLinus Torvalds2018-04-051-3/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer updates from Jens Axboe: "It's a pretty quiet round this time, which is nice. This contains: - series from Bart, cleaning up the way we set/test/clear atomic queue flags. - series from Bart, fixing races between gendisk and queue registration and removal. - set of bcache fixes and improvements from various folks, by way of Michael Lyle. - set of lightnvm updates from Matias, most of it being the 1.2 to 2.0 transition. - removal of unused DIO flags from Nikolay. - blk-mq/sbitmap memory ordering fixes from Omar. - divide-by-zero fix for BFQ from Paolo. - minor documentation patches from Randy. - timeout fix from Tejun. - Alpha "can't write a char atomically" fix from Mikulas. - set of NVMe fixes by way of Keith. - bsg and bsg-lib improvements from Christoph. - a few sed-opal fixes from Jonas. - cdrom check-disk-change deadlock fix from Maurizio. - various little fixes, comment fixes, etc from various folks" * tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits) blk-mq: Directly schedule q->timeout_work when aborting a request blktrace: fix comment in blktrace_api.h lightnvm: remove function name in strings lightnvm: pblk: remove some unnecessary NULL checks lightnvm: pblk: don't recover unwritten lines lightnvm: pblk: implement 2.0 support lightnvm: pblk: implement get log report chunk lightnvm: pblk: rename ppaf* to addrf* lightnvm: pblk: check for supported version lightnvm: implement get log report chunk helpers lightnvm: make address conversions depend on generic device lightnvm: add support for 2.0 address format lightnvm: normalize geometry nomenclature lightnvm: complete geo structure with maxoc* lightnvm: add shorten OCSSD version in geo lightnvm: add minor version to generic geometry lightnvm: simplify geometry structure lightnvm: pblk: refactor init/exit sequences lightnvm: Avoid validation of default op value lightnvm: centralize permission check for lightnvm ioctl ...
| * | | bsg: split handling of SCSI CDBs vs transport requeuesChristoph Hellwig2018-03-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current BSG design tries to shoe-horn the transport-specific passthrough commands into the overall framework for SCSI passthrough requests. This has a couple problems: - each passthrough queue has to set the QUEUE_FLAG_SCSI_PASSTHROUGH flag despite not dealing with SCSI commands at all. Because of that these queues could also incorrectly accept SCSI commands from in-kernel users or through the legacy SCSI_IOCTL_SEND_COMMAND ioctl. - the real SCSI bsg queues also incorrectly accept bsg requests of the BSG_SUB_PROTOCOL_SCSI_TRANSPORT type - the bsg transport code is almost unredable because it tries to reuse different SCSI concepts for its own purpose. This patch instead adds a new bsg_ops structure to handle the two cases differently, and thus solves all of the above problems. Another side effect is that the bsg-lib queues also don't need to embedd a struct scsi_request anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block: Use blk_queue_flag_*() in drivers instead of queue_flag_*()Bart Van Assche2018-03-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch has been generated as follows: for verb in set_unlocked clear_unlocked set clear; do replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \ $(git grep -lw queue_flag_${verb} drivers block/bsg*) done Except for protecting all queue flag changes with the queue lock this patch does not change any functionality. Cc: Mike Snitzer <snitzer@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block: Add 'lock' as third argument to blk_alloc_queue_node()Bart Van Assche2018-02-281-1/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | Merge tag 'scsi-fixes' of ↵Linus Torvalds2018-03-071-0/+4
|\ \ \ | |/ / |/| / | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is mostly fixes for driver specific issues (nine of them) and the storvsc performance improvement with interrupt handling which was dropped from the previous fixes pull request. We also have two regressions: one is a double call_rcu() in ATA error handling and the other is a missed conversion to BLK_STS_OK in __scsi_error_from_host_byte()" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: qedi: Fix kernel crash during port toggle scsi: qla2xxx: Fix FC-NVMe LUN discovery scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte() scsi: core: Avoid that ATA error handling can trigger a kernel hang or oops scsi: qla2xxx: ensure async flags are reset correctly scsi: qla2xxx: do not check login_state if no loop id is assigned scsi: qla2xxx: Fixup locking for session deletion scsi: qla2xxx: Fix NULL pointer crash due to active timer for ABTS scsi: mpt3sas: wait for and flush running commands on shutdown/unload scsi: mpt3sas: fix oops in error handlers after shutdown/unload scsi: storvsc: Spread interrupts when picking a channel for I/O requests scsi: megaraid_sas: Do not use 32-bit atomic request descriptor for Ventura controllers
| * scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()Hannes Reinecke2018-03-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | When converting __scsi_error_from_host_byte() to BLK_STS error codes the case DID_OK was forgotten, resulting in it always returning an error. Fixes: 2a842acab109 ("block: introduce new block status code type") Cc: Doug Gilbert <dgilbert@interlog.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: core: Avoid that ATA error handling can trigger a kernel hang or oopsBart Van Assche2018-03-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid that the recently introduced call_rcu() call in the SCSI core triggers a double call_rcu() call. Reported-by: Natanael Copa <ncopa@alpinelinux.org> Reported-by: Damien Le Moal <damien.lemoal@wdc.com> References: https://bugzilla.kernel.org/show_bug.cgi?id=198861 Fixes: 3bd6f43f5cb3 ("scsi: core: Ensure that the SCSI error handler gets woken up") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: Natanael Copa <ncopa@alpinelinux.org> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Alexandre Oliva <oliva@gnu.org> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | Merge tag 'for-linus-20180204' of git://git.kernel.dk/linux-blockLinus Torvalds2018-02-041-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more block updates from Jens Axboe: "Most of this is fixes and not new code/features: - skd fix from Arnd, fixing a build error dependent on sla allocator type. - blk-mq scheduler discard merging fixes, one from me and one from Keith. This fixes a segment miscalculation for blk-mq-sched, where we mistakenly think two segments are physically contigious even though the request isn't carrying real data. Also fixes a bio-to-rq merge case. - Don't re-set a bit on the buffer_head flags, if it's already set. This can cause scalability concerns on bigger machines and workloads. From Kemi Wang. - Add BLK_STS_DEV_RESOURCE return value to blk-mq, allowing us to distuingish between a local (device related) resource starvation and a global one. The latter might happen without IO being in flight, so it has to be handled a bit differently. From Ming" * tag 'for-linus-20180204' of git://git.kernel.dk/linux-block: block: skd: fix incorrect linux/slab_def.h inclusion buffer: Avoid setting buffer bits that are already set blk-mq-sched: Enable merging discard bio into request blk-mq: fix discard merge with scheduler attached blk-mq: introduce BLK_STS_DEV_RESOURCE
| * | blk-mq: introduce BLK_STS_DEV_RESOURCEMing Lei2018-01-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This status is returned from driver to block layer if device related resource is unavailable, but driver can guarantee that IO dispatch will be triggered in future when the resource is available. Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is 3 ms because both scsi-mq and nvmefc are using that magic value. If a driver can make sure there is in-flight IO, it is safe to return BLK_STS_DEV_RESOURCE because: 1) If all in-flight IOs complete before examining SCHED_RESTART in blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue is run immediately in this case by blk_mq_dispatch_rq_list(); 2) if there is any in-flight IO after/when examining SCHED_RESTART in blk_mq_dispatch_rq_list(): - if SCHED_RESTART isn't set, queue is run immediately as handled in 1) - otherwise, this request will be dispatched after any in-flight IO is completed via blk_mq_sched_restart() 3) if SCHED_RESTART is set concurently in context because of BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two cases and make sure IO hang can be avoided. One invariant is that queue will be rerun if SCHED_RESTART is set. Suggested-by: Jens Axboe <axboe@kernel.dk> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | Merge tag 'usercopy-v4.16-rc1' of ↵Linus Torvalds2018-02-031-4/+5
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardened usercopy whitelisting from Kees Cook: "Currently, hardened usercopy performs dynamic bounds checking on slab cache objects. This is good, but still leaves a lot of kernel memory available to be copied to/from userspace in the face of bugs. To further restrict what memory is available for copying, this creates a way to whitelist specific areas of a given slab cache object for copying to/from userspace, allowing much finer granularity of access control. Slab caches that are never exposed to userspace can declare no whitelist for their objects, thereby keeping them unavailable to userspace via dynamic copy operations. (Note, an implicit form of whitelisting is the use of constant sizes in usercopy operations and get_user()/put_user(); these bypass all hardened usercopy checks since these sizes cannot change at runtime.) This new check is WARN-by-default, so any mistakes can be found over the next several releases without breaking anyone's system. The series has roughly the following sections: - remove %p and improve reporting with offset - prepare infrastructure and whitelist kmalloc - update VFS subsystem with whitelists - update SCSI subsystem with whitelists - update network subsystem with whitelists - update process memory with whitelists - update per-architecture thread_struct with whitelists - update KVM with whitelists and fix ioctl bug - mark all other allocations as not whitelisted - update lkdtm for more sensible test overage" * tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits) lkdtm: Update usercopy tests for whitelisting usercopy: Restrict non-usercopy caches to size 0 kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl kvm: whitelist struct kvm_vcpu_arch arm: Implement thread_struct whitelist for hardened usercopy arm64: Implement thread_struct whitelist for hardened usercopy x86: Implement thread_struct whitelist for hardened usercopy fork: Provide usercopy whitelisting for task_struct fork: Define usercopy region in thread_stack slab caches fork: Define usercopy region in mm_struct slab caches net: Restrict unwhitelisted proto caches to size 0 sctp: Copy struct sctp_sock.autoclose to userspace using put_user() sctp: Define usercopy region in SCTP proto slab cache caif: Define usercopy region in caif proto slab cache ip: Define usercopy region in IP proto slab cache net: Define usercopy region in struct proto slab cache scsi: Define usercopy region in scsi_sense_cache slab cache cifs: Define usercopy region in cifs_request slab cache vxfs: Define usercopy region in vxfs_inode slab cache ufs: Define usercopy region in ufs_inode_cache slab cache ...
| * | | scsi: Define usercopy region in scsi_sense_cache slab cacheDavid Windsor2018-01-151-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore contained in the scsi_sense_cache slab cache, need to be copied to/from userspace. cache object allocation: drivers/scsi/scsi_lib.c: scsi_select_sense_cache(...): return ... ? scsi_sense_isadma_cache : scsi_sense_cache scsi_alloc_sense_buffer(...): return kmem_cache_alloc_node(scsi_select_sense_cache(), ...); scsi_init_request(...): ... cmd->sense_buffer = scsi_alloc_sense_buffer(...); ... cmd->req.sense = cmd->sense_buffer example usage trace: block/scsi_ioctl.c: (inline from sg_io) blk_complete_sghdr_rq(...): struct scsi_request *req = scsi_req(rq); ... copy_to_user(..., req->sense, len) scsi_cmd_ioctl(...): sg_io(...); In support of usercopy hardening, this patch defines a region in the scsi_sense_cache slab cache in which userspace copy operations are allowed. This region is known as the slab cache's usercopy region. Slab caches can now check that each dynamically sized copy operation involving cache-managed memory falls entirely within the slab's usercopy region. Signed-off-by: David Windsor <dave@nullcore.net> [kees: adjust commit log, provide usage trace] Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
* | | | Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2018-01-311-17/+33
|\ \ \ \ | |_|/ / |/| | / | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This is mostly updates of the usual driver suspects: arcmsr, scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas, hisi_sas. We also have a rework of the libsas hotplug handling to make it more robust, a slew of 32 bit time conversions and fixes, and a host of the usual minor updates and style changes. The biggest potential for regressions is the libsas hotplug changes, but so far they seem stable under testing" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (313 commits) scsi: qla2xxx: Fix logo flag for qlt_free_session_done() scsi: arcmsr: avoid do_gettimeofday scsi: core: Add VENDOR_SPECIFIC sense code definitions scsi: qedi: Drop cqe response during connection recovery scsi: fas216: fix sense buffer initialization scsi: ibmvfc: Remove unneeded semicolons scsi: hisi_sas: fix a bug in hisi_sas_dev_gone() scsi: hisi_sas: directly attached disk LED feature for v2 hw scsi: hisi_sas: devicetree: bindings: add LED feature for v2 hw scsi: megaraid_sas: NVMe passthrough command support scsi: megaraid: use ktime_get_real for firmware time scsi: fnic: use 64-bit timestamps scsi: qedf: Fix error return code in __qedf_probe() scsi: devinfo: fix format of the device list scsi: qla2xxx: Update driver version to 10.00.00.05-k scsi: qla2xxx: Add XCB counters to debugfs scsi: qla2xxx: Fix queue ID for async abort with Multiqueue scsi: qla2xxx: Fix warning for code intentation in __qla24xx_handle_gpdb_event() scsi: qla2xxx: Fix warning during port_name debug print scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout() ...
| * | scsi: core: Change third __scsi_queue_insert() argument from int to boolBart Van Assche2018-01-101-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch does not change any functionality but makes the SCSI core source code slightly easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: core: Unexport scsi_initialize_rq()Bart Van Assche2017-12-071-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 651a01364994 ("scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough") removed the only call to scsi_initialize_rq() from outside the SCSI core. Hence unexport scsi_initialize_rq(). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: core: Ensure that the SCSI error handler gets woken upBart Van Assche2017-12-071-11/+28
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If scsi_eh_scmd_add() is called concurrently with scsi_host_queue_ready() while shost->host_blocked > 0 then it can happen that neither function wakes up the SCSI error handler. Fix this by making every function that decreases the host_busy counter wake up the error handler if necessary and by protecting the host_failed checks with the SCSI host lock. Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> References: https://marc.info/?l=linux-kernel&m=150461610630736 Fixes: commit 746650160866 ("scsi: convert host_busy to atomic_t") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: Stuart Hayes <stuart.w.hayes@gmail.com> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | scsi: core: run queue if SCSI device queue isn't ready and queue is idleMing Lei2017-12-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq"), we run queue after 3ms if queue is idle and SCSI device queue isn't ready, which is done in handling BLK_STS_RESOURCE. After commit 0df21c86bdbf is introduced, queue won't be run any more under this situation. IO hang is observed when timeout happened, and this patch fixes the IO hang issue by running queue after delay in scsi_dev_queue_ready, just like non-mq. This issue can be triggered by the following script[1]. There is another issue which can be covered by running idle queue: when .get_budget() is called on request coming from hctx->dispatch_list, if one request just completes during .get_budget(), we can't depend on SCSI's restart to make progress any more. This patch fixes the race too. With this patch, we basically recover to previous behaviour (before commit 0df21c86bdbf) of handling idle queue when running out of resource. [1] script for test/verify SCSI timeout rmmod scsi_debug modprobe scsi_debug max_queue=1 DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename` DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*` echo "using scsi device $DEVICE" echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth echo "temporary write through" >$DISK_DIR/cache_type echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts echo none > /sys/block/$DEVICE/queue/scheduler dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 & sleep 5 echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts wait echo "SUCCESS" Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq") Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | scsi: use dma_get_cache_alignment() as minimum DMA alignmentHuacai Chen2017-11-211-4/+6
|/ | | | | | | | | | | | | | | In non-coherent DMA mode, kernel uses cache flushing operations to maintain I/O coherency, so scsi's block queue should be aligned to the value returned by dma_get_cache_alignment(). Otherwise, If a DMA buffer and a kernel structure share a same cache line, and if the kernel structure has dirty data, cache_invalidate (no writeback) will cause data corruption. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhc@lemote.com> [hch: rebased and updated the comment and changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2017-11-141-1/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas, megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor updates. There's no major behaviour change or additions to the core in all of this, so the potential for regressions should be small (biggest potential being in the scsi error handler changes)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits) scsi: lpfc: Fix hard lock up NMI in els timeout handling. scsi: mpt3sas: remove a stray KERN_INFO scsi: mpt3sas: cleanup _scsih_pcie_enumeration_event() scsi: aacraid: use timespec64 instead of timeval scsi: scsi_transport_fc: add 64GBIT and 128GBIT port speed definitions scsi: qla2xxx: Suppress a kernel complaint in qla_init_base_qpair() scsi: mpt3sas: fix dma_addr_t casts scsi: be2iscsi: Use kasprintf scsi: storvsc: Avoid excessive host scan on controller change scsi: lpfc: fix kzalloc-simple.cocci warnings scsi: mpt3sas: Update mpt3sas driver version. scsi: mpt3sas: Fix sparse warnings scsi: mpt3sas: Fix nvme drives checking for tlr. scsi: mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log info scsi: mpt3sas: Add-Task-management-debug-info-for-NVMe-drives. scsi: mpt3sas: scan and add nvme device after controller reset scsi: mpt3sas: Set NVMe device queue depth as 128 scsi: mpt3sas: Handle NVMe PCIe device related events generated from firmware. scsi: mpt3sas: API's to remove nvme drive from sml scsi: mpt3sas: API 's to support NVMe drive addition to SML ...
| * scsi: scsi_error: Handle power-on reset unit attentionHannes Reinecke2017-10-181-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | As per SAM there is a status precedence, with any sense code 29/XX taking second place just after an ACA ACTIVE status. Additionally, each target might prefer to not queue any unit attention conditions, but just report one. Due to the above, this will be that one with the highest precedence. This results in the sense code 29/XX effectively overwriting any other unit attention. Hence we should report the power-on reset to userland so that it can take appropriate action. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: sd_zbc: Fix comments and indentationDamien Le Moal2017-10-161-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | Fix comments style (use kernel-doc style) and content to clarify some functions. Also fix some functions signature indentation and remove a useless blank line in sd_zbc_read_zones(). No functional change is introduced by this patch. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-blockLinus Torvalds2017-11-141-30/+69
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block layer updates from Jens Axboe: "This is the main pull request for block storage for 4.15-rc1. Nothing out of the ordinary in here, and no API changes or anything like that. Just various new features for drivers, core changes, etc. In particular, this pull request contains: - A patch series from Bart, closing the whole on blk/scsi-mq queue quescing. - A series from Christoph, building towards hidden gendisks (for multipath) and ability to move bio chains around. - NVMe - Support for native multipath for NVMe (Christoph). - Userspace notifications for AENs (Keith). - Command side-effects support (Keith). - SGL support (Chaitanya Kulkarni) - FC fixes and improvements (James Smart) - Lots of fixes and tweaks (Various) - bcache - New maintainer (Michael Lyle) - Writeback control improvements (Michael) - Various fixes (Coly, Elena, Eric, Liang, et al) - lightnvm updates, mostly centered around the pblk interface (Javier, Hans, and Rakesh). - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph) - Writeback series that fix the much discussed hundreds of millions of sync-all units. This goes all the way, as discussed previously (me). - Fix for missing wakeup on writeback timer adjustments (Yafang Shao). - Fix laptop mode on blk-mq (me). - {mq,name} tupple lookup for IO schedulers, allowing us to have alias names. This means you can use 'deadline' on both !mq and on mq (where it's called mq-deadline). (me). - blktrace race fix, oopsing on sg load (me). - blk-mq optimizations (me). - Obscure waitqueue race fix for kyber (Omar). - NBD fixes (Josef). - Disable writeback throttling by default on bfq, like we do on cfq (Luca Miccio). - Series from Ming that enable us to treat flush requests on blk-mq like any other request. This is a really nice cleanup. - Series from Ming that improves merging on blk-mq with schedulers, getting us closer to flipping the switch on scsi-mq again. - BFQ updates (Paolo). - blk-mq atomic flags memory ordering fixes (Peter Z). - Loop cgroup support (Shaohua). - Lots of minor fixes from lots of different folks, both for core and driver code" * 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits) nvme: fix visibility of "uuid" ns attribute blk-mq: fixup some comment typos and lengths ide: ide-atapi: fix compile error with defining macro DEBUG blk-mq: improve tag waiting setup for non-shared tags brd: remove unused brd_mutex blk-mq: only run the hardware queue if IO is pending block: avoid null pointer dereference on null disk fs: guard_bio_eod() needs to consider partitions xtensa/simdisk: fix compile error nvme: expose subsys attribute to sysfs nvme: create 'slaves' and 'holders' entries for hidden controllers block: create 'slaves' and 'holders' entries for hidden gendisks nvme: also expose the namespace identification sysfs files for mpath nodes nvme: implement multipath access to nvme subsystems nvme: track shared namespaces nvme: introduce a nvme_ns_ids structure nvme: track subsystems block, nvme: Introduce blk_mq_req_flags_t block, scsi: Make SCSI quiesce and resume work reliably block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag ...
| * | block, scsi: Make SCSI quiesce and resume work reliablyBart Van Assche2017-11-101-12/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The contexts from which a SCSI device can be quiesced or resumed are: * Writing into /sys/class/scsi_device/*/device/state. * SCSI parallel (SPI) domain validation. * The SCSI device power management methods. See also scsi_bus_pm_ops. It is essential during suspend and resume that neither the filesystem state nor the filesystem metadata in RAM changes. This is why while the hibernation image is being written or restored that SCSI devices are quiesced. The SCSI core quiesces devices through scsi_device_quiesce() and scsi_device_resume(). In the SDEV_QUIESCE state execution of non-preempt requests is deferred. This is realized by returning BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI devices. Avoid that a full queue prevents power management requests to be submitted by deferring allocation of non-preempt requests for devices in the quiesced state. This patch has been tested by running the following commands and by verifying that after each resume the fio job was still running: for ((i=0; i<10; i++)); do ( cd /sys/block/md0/md && while true; do [ "$(<sync_action)" = "idle" ] && echo check > sync_action sleep 1 done ) & pids=($!) for d in /sys/class/block/sd*[a-z]; do bdev=${d#/sys/class/block/} hcil=$(readlink "$d/device") hcil=${hcil#../../../} echo 4 > "$d/queue/nr_requests" echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth" fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \ --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16 \ --iodepth_batch=1 --thread --loops=$((2**31)) & pids+=($!) done sleep 1 echo "$(date) Hibernating ..." >>hibernate-test-log.txt systemctl hibernate sleep 10 kill "${pids[@]}" echo idle > /sys/block/md0/md/sync_action wait echo "$(date) Done." >>hibernate-test-log.txt done Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Martin Steigerwald <martin@lichtvoll.de> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ide, scsi: Tell the block layer at request allocation time about preempt ↵Bart Van Assche2017-11-101-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | requests Convert blk_get_request(q, op, __GFP_RECLAIM) into blk_get_request_flags(q, op, BLK_MQ_PREEMPT). This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Martin Steigerwald <martin@lichtvoll.de> Acked-by: David S. Miller <davem@davemloft.net> [ for IDE ] Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | blk-mq: don't handle failure in .get_budgetMing Lei2017-11-041-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is enough to just check if we can get the budget via .get_budget(). And we don't need to deal with device state change in .get_budget(). For SCSI, one issue to be fixed is that we have to call scsi_mq_uninit_cmd() to free allocated ressources if SCSI device fails to handle the request. And it isn't enough to simply call blk_mq_end_request() to do that if this request is marked as RQF_DONTPREP. Fixes: 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq) Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | SCSI: don't get target/host busy_count in scsi_mq_get_budget()Ming Lei2017-11-041-16/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is very expensive to atomic_inc/atomic_dec the host wide counter of host->busy_count, and it should have been avoided via blk-mq's mechanism of getting driver tag, which uses the more efficient way of sbitmap queue. Also we don't check atomic_read(&sdev->device_busy) in scsi_mq_get_budget() and don't run queue if the counter becomes zero, so IO hang may be caused if all requests are completed just before the current SCSI device is added to shost->starved_list. Fixes: 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq) Reported-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | scsi: implement .get_budget and .put_budget for blk-mqMing Lei2017-11-011-23/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | We need to tell blk-mq to reserve resources before queuing one request, so implement these two callbacks. Then blk-mq can avoid to dequeue request too early, and IO merging can be improved a lot. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | scsi: allow passing in null rq to scsi_prep_state_check()Ming Lei2017-11-011-2/+2
| |/ | | | | | | | | | | | | | | | | In the following patch, we will implement scsi_get_budget() which need to call scsi_prep_state_check() when rq isn't dequeued yet. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Revert "scsi: make 'state' device attribute pollable"Linus Torvalds2017-11-071-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 8a97712e5314aefe16b3ffb4583a34deaa49de04. This commit added a call to sysfs_notify() from within scsi_device_set_state(), which in turn turns out to make libata very unhappy, because ata_eh_detach_dev() does spin_lock_irqsave(ap->lock, flags); .. if (ata_scsi_offline_dev(dev)) { dev->flags |= ATA_DFLAG_DETACHED; ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG; } and ata_scsi_offline_dev() then does that scsi_device_set_state() to set it offline. So now we called sysfs_notify() from within a spinlocked region, which really doesn't work. The 0day robot reported this as: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 because sysfs_notify() ends up calling kernfs_find_and_get_ns() which then does mutex_lock(&kernfs_mutex).. The pollability of the device state isn't critical, so revert this all for now, and maybe we'll do it differently in the future. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | scsi: Suppress a kernel warning in case the prep function returns BLKPREP_DEFERBart Van Assche2017-10-231-7/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The legacy block layer handles requests as follows: - If the prep function returns BLKPREP_OK, let blk_peek_request() return the pointer to that request. - If the prep function returns BLKPREP_DEFER, keep the RQF_STARTED flag and retry calling the prep function later. - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, end the request. In none of these cases it is correct to clear the SCMD_INITIALIZED flag from inside scsi_prep_fn(). Since scsi_prep_fn() already guarantees that scsi_init_command() will be called once even if scsi_prep_fn() is called multiple times, remove the code that clears SCMD_INITIALIZED from scsi_prep_fn(). The scsi-mq code handles requests as follows: - If scsi_mq_prep_fn() returns BLKPREP_OK, set the RQF_DONTPREP flag and submit the request to the SCSI LLD. - If scsi_mq_prep_fn() returns BLKPREP_DEFER, call blk_mq_delay_run_hw_queue() and return BLK_STS_RESOURCE. - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, call scsi_mq_uninit_cmd() and let the blk-mq core end the request. In none of these cases scsi_mq_prep_fn() should clear the SCMD_INITIALIZED flag. Hence remove the code from scsi_mq_prep_fn() function that clears that flag. This patch avoids that the following warning is triggered when using the legacy block layer: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 4198 at drivers/scsi/scsi_lib.c:654 scsi_end_request+0x1de/0x220 CPU: 1 PID: 4198 Comm: mkfs.f2fs Not tainted 4.14.0-rc5+ #1 task: ffff91c147a4b800 task.stack: ffffb282c37b8000 RIP: 0010:scsi_end_request+0x1de/0x220 Call Trace: <IRQ> scsi_io_completion+0x204/0x5e0 scsi_finish_command+0xce/0xe0 scsi_softirq_done+0x126/0x130 blk_done_softirq+0x6e/0x80 __do_softirq+0xcf/0x2a8 irq_exit+0xab/0xb0 do_IRQ+0x7b/0xc0 common_interrupt+0x90/0x90 </IRQ> RIP: 0010:_raw_spin_unlock_irqrestore+0x9/0x10 __test_set_page_writeback+0xc7/0x2c0 __block_write_full_page+0x158/0x3b0 block_write_full_page+0xc4/0xd0 blkdev_writepage+0x13/0x20 __writepage+0x12/0x40 write_cache_pages+0x204/0x500 generic_writepages+0x48/0x70 blkdev_writepages+0x9/0x10 do_writepages+0x34/0xc0 __filemap_fdatawrite_range+0x6c/0x90 file_write_and_wait_range+0x31/0x90 blkdev_fsync+0x16/0x40 vfs_fsync_range+0x44/0xa0 do_fsync+0x38/0x60 SyS_fsync+0xb/0x10 entry_SYSCALL_64_fastpath+0x13/0x94 ---[ end trace 86e8ef85a4a6c1d1 ]--- Fixes: commit 64104f703212 ("scsi: Call scsi_initialize_rq() for filesystem requests") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: scsi-mq: Always unprepare before requeuing a requestBart Van Assche2017-08-311-2/+8
| | | | | | | | | | | | | | | | | One of the two scsi-mq functions that requeue a request unprepares a request before requeueing (scsi_io_completion()) but the other function not (__scsi_queue_insert()). Make sure that a request is unprepared before requeuing it. Fixes: commit d285203cf647 ("scsi: add support for a blk-mq based I/O path.") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: Improve requeuing behaviorBart Van Assche2017-08-311-2/+13
| | | | | | | | | | | | | | | | | | | Requests are unprepared and reprepared when being requeued. Avoid that requeuing resets .jiffies_at_alloc and .retries by initializing these two member variables from inside scsi_initialize_rq() and by preserving both member variables when preparing a request. This patch affects the requeuing behavior of both the legacy scsi and the scsi-mq code paths. Reported-by: Brian King <brking@linux.vnet.ibm.com> References: https://lkml.org/lkml/2017/8/18/923 ("Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: Call scsi_initialize_rq() for filesystem requestsBart Van Assche2017-08-311-4/+22
| | | | | | | | | | | | | | | | If a pass-through request is submitted then blk_get_request() initializes that request by calling scsi_initialize_rq(). Also call this function for filesystem requests. Introduce CMD_INITIALIZED to keep track of whether or not a request has already been initialized. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: Rework handling of scsi_device.vpd_pg8[03]Bart Van Assche2017-08-291-8/+8
| | | | | | | | | | | | | | | | | | | | | | | Introduce struct scsi_vpd for the VPD page length, data and the RCU head that will be used to free the VPD data. Use kfree_rcu() instead of kfree() to free VPD data. Move the VPD buffer pointer check inside the RCU read lock in the sysfs code. Only annotate pointers that are shared across threads with __rcu. Use rcu_dereference() when dereferencing an RCU pointer. This patch suppresses about twenty sparse complaints about the vpd_pg8[03] pointers. This patch also fixes a race condition, namely that updating of the VPD pointers and length variables in struct scsi_device was not atomic with reference to the code reading these variables. See also "Does the update code tolerate concurrent accesses?" in Documentation/RCU/checklist.txt. Fixes: commit 09e2b0b14690 ("scsi: rescan VPD attributes") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Shane Seymour <shane.seymour@hpe.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Shane Seymour <shane.seymour@hpe.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: Fix the kerneldoc for scsi_initialize_rq()Jonathan Corbet2017-08-251-0/+1
| | | | | | | | | | | | | | The kerneldoc comment for scsi_initialize_rq() neglected to document the "rq" parameter, leading to this docs build warning: ./drivers/scsi/scsi_lib.c:1116: warning: No description found for parameter 'rq' Document the parameter and make the build slightly quieter. [mkp: used wording suggested by Bart] Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: fix comment in scsi_device_set_state()Hannes Reinecke2017-08-251-1/+1
| | | | | | | | | | | The function returns '0' if successful; with the original comment the function doesn't have a way to indicate success ... Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart van Assche <bvanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: Use blk_mq_rq_to_pdu() to convert a request to a SCSI command pointerBart Van Assche2017-08-251-9/+9
| | | | | | | | | | | | | | | | | Since commit e9c787e65c0c ("scsi: allocate scsi_cmnd structures as part of struct request") struct request and struct scsi_cmnd are adjacent. This means that there is now an alternative to reading req->special to convert a pointer to a prepared request into a SCSI command pointer, namely by using blk_mq_rq_to_pdu(). Make this change where appropriate. Although this patch does not change any functionality, it slightly improves performance and slightly improves readability. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
OpenPOWER on IntegriCloud