summaryrefslogtreecommitdiffstats
path: root/drivers/s390/scsi
Commit message (Collapse)AuthorAgeFilesLines
* scsi: zfcp: fix erp_action use-before-initialize in REC action traceSteffen Maier2017-10-163-7/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v4.10 commit 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") extended accessing parent pointer fields of struct zfcp_erp_action for tracing. If an erp_action has never been enqueued before, these parent pointer fields are uninitialized and NULL. Examples are zfcp objects freshly added to the parent object's children list, before enqueueing their first recovery subsequently. In zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action fields can cause a NULL pointer dereference. Since the kernel can read from lowcore on s390, it does not immediately cause a kernel page fault. Instead it can cause hangs on trying to acquire the wrong erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl() ^bogus^ while holding already other locks with IRQs disabled. Real life example from attaching lots of LUNs in parallel on many CPUs: crash> bt 17723 PID: 17723 TASK: ... CPU: 25 COMMAND: "zfcperp0.0.1800" LOWCORE INFO: -psw : 0x0404300180000000 0x000000000038e424 -function : _raw_spin_lock_wait_flags at 38e424 ... #0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp] #1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp] #2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp] #3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp] #4 [fdde8fe60] kthread at 173550 #5 [fdde8feb8] kernel_thread_starter at 10add2 zfcp_adapter zfcp_port zfcp_unit <address>, 0x404040d600000000 scsi_device NULL, returning early! zfcp_scsi_dev.status = 0x40000000 0x40000000 ZFCP_STATUS_COMMON_RUNNING crash> zfcp_unit <address> struct zfcp_unit { erp_action = { adapter = 0x0, port = 0x0, unit = 0x0, }, } zfcp_erp_action is always fully embedded into its container object. Such container object is never moved in its object tree (only add or delete). Hence, erp_action parent pointers can never change. To fix the issue, initialize the erp_action parent pointers before adding the erp_action container to any list and thus before it becomes accessible from outside of its initializing function. In order to also close the time window between zfcp_erp_setup_act() memsetting the entire erp_action to zero and setting the parent pointers again, drop the memset and instead explicitly initialize individually all erp_action fields except for parent pointers. To be extra careful not to introduce any other unintended side effect, even keep zeroing the erp_action fields for list and timer. Also double-check with WARN_ON_ONCE that erp_action parent pointers never change, so we get to know when we would deviate from previous behavior. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") Cc: <stable@vger.kernel.org> #2.6.32+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: early returns for traces disabled via levelMartin Peschke2017-08-101-8/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds early checks to avoid burning CPU cycles on the assembly of trace entries which would be skipped anyway. Introduce a static const variable to keep the trace level to check with debug_level_enabled() in sync with the actual trace emit with debug_event(). In order not to refactor the SAN tracing too much, simply use a define instead. This change is only for the non / semi hot paths, while the actual (I/O) hot path was already improved earlier: zfcp_dbf_scsi() is already guarded by its only caller _zfcp_dbf_scsi() since commit dcd20e2316cd ("[SCSI] zfcp: Only collect SCSI debug data for matching trace levels"). zfcp_dbf_hba_fsf_res() is already guarded by its only caller zfcp_dbf_hba_fsf_response() since commit 2e261af84cdb ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels"). Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> [maier@linux.vnet.ibm.com: rebase, reword, default level 3 branch prediction] Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: clean up unnecessary module_param_named() with no_auto_port_rescanMartin Peschke2017-08-101-1/+1
| | | | | | | | | | | | Improves commit 43f60cbd56c4 ("[SCSI] zfcp: No automatic port_rescan on events") Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> [maier@linux.vnet.ibm.com: reword, underscore in description to match sysfs] Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: clean up a member of struct zfcp_qdio that was assigned but ↵Martin Peschke2017-08-102-3/+0
| | | | | | | | | | | | | | | | | | never used v2.6.38 commit a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") dropped trace information previously introduced with v2.6.27 commit c3baa9a26c5a ("[SCSI] zfcp: Add information about interrupt to trace.") but kept and needlessly assigned a now no longer used struct field. Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> [maier@linux.vnet.ibm.com: reword, added git history] Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: clean up no longer existent prototype from zfcp API headerSteffen Maier2017-08-101-1/+0
| | | | | | | | | | | | Commit a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") refactored zfcp_dbf_hba_berr into zfcp_dbf_hba_bit_err but added the prototype for the latter without removing it for the former. Suggested-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: clean up redundant code with fall through in link down SRB ↵Martin Peschke2017-08-101-2/+0
| | | | | | | | | | | switch case Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> [maier@linux.vnet.ibm.com: re-worded short description for more details] Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: fix kernel doc comment typos for struct zfcp_dbf_scsiSteffen Maier2017-08-101-3/+3
| | | | | | | | | | Improves commit 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.") Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: use endianness conversions with common FC(P) struct fieldsSteffen Maier2017-08-104-35/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just to silence sparse. Since zfcp only exists for s390 and s390 is big endian, this has been working correctly without conversions and all the new conversions are NOPs so no performance impact. Nonetheless, use the conversion on the constant expression where possible. NB: N_Port-IDs have always been handled with hton24 or ntoh24 conversions because they also convert to / from character array. Affected common code structs and .fields are: HOT I/O PATH: fcp_cmnd .fc_dl FCP command: regular SCSI I/O, including DIX case SEMI-HOT I/O PATH: fcp_cmnd .fc_dl recovery FCP command: task management function (LUN / target reset) fcp_resp_ext FCP response having FCP_SNS_LEN_VAL with .fr_rsp_len .fr_sns_len FCP response having FCP_RESID_UNDER with .fr_resid RECOVERY / DISCOVERY PATHS: fc_ct_hdr .ct_cmd .ct_mr_size zfcp auto port scan [GPN_FT] with fc_gpn_ft_resp.fp_wwpn, recovery for returned port [GID_PN] with fc_ns_gid_pn.fn_wwpn, get symbolic port name [GSPN], register symbolic port name [RSPN] (NPIV only). fc_els_rscn .rscn_plen incoming ELS (RSCN). fc_els_flogi .fl_wwpn .fl_wwnn incoming ELS (PLOGI), port open response with .fl_csp.sp_bb_data .fl_cssp[0..3].cp_class, FCP channel physical port, point-to-point peer (P2P only). fc_els_logo .fl_n_port_wwn incoming ELS (LOGO). fc_els_adisc .adisc_wwnn .adisc_wwpn path test after RSCN for gone target port. Since v4.10 commit 05de97003c77 ("linux/types.h: enable endian checks for all sparse builds"), below sparse endianness reports appear by default. Previously, one needed to pass argument CF="-D__CHECK_ENDIAN__" to make as in: $ make C=1 CF="-D__CHECK_ENDIAN__" M=drivers/s390/scsi. Silenced sparse warnings and one error: $ make C=1 M=drivers/s390/scsi ... CHECK drivers/s390/scsi/zfcp_dbf.c drivers/s390/scsi/zfcp_dbf.c:463:22: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_dbf.c:476:28: warning: restricted __be16 degrades to integer CC drivers/s390/scsi/zfcp_dbf.o ... CHECK drivers/s390/scsi/zfcp_fc.c drivers/s390/scsi/zfcp_fc.c:263:26: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:299:41: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:299:41: expected unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:299:41: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fc.c:309:40: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:309:40: expected unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:309:40: got restricted __be64 [usertype] fl_n_port_wwn drivers/s390/scsi/zfcp_fc.c:338:31: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:355:24: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:355:24: expected restricted __be16 [usertype] ct_cmd drivers/s390/scsi/zfcp_fc.c:355:24: got unsigned short [unsigned] [usertype] cmd drivers/s390/scsi/zfcp_fc.c:356:28: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:356:28: expected restricted __be16 [usertype] ct_mr_size drivers/s390/scsi/zfcp_fc.c:356:28: got int drivers/s390/scsi/zfcp_fc.c:379:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:379:36: expected restricted __be64 [usertype] fn_wwpn drivers/s390/scsi/zfcp_fc.c:379:36: got unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:463:18: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:465:17: warning: cast from restricted __be64 drivers/s390/scsi/zfcp_fc.c:473:20: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:473:20: expected unsigned long long [unsigned] [usertype] wwnn drivers/s390/scsi/zfcp_fc.c:473:20: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fc.c:474:29: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:474:29: expected unsigned int [unsigned] [usertype] maxframe_size drivers/s390/scsi/zfcp_fc.c:474:29: got restricted __be16 [usertype] sp_bb_data drivers/s390/scsi/zfcp_fc.c:476:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:478:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:480:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:482:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:500:28: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:500:28: expected unsigned long long [unsigned] [usertype] wwnn drivers/s390/scsi/zfcp_fc.c:500:28: got restricted __be64 [usertype] adisc_wwnn drivers/s390/scsi/zfcp_fc.c:502:38: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:541:40: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:541:40: expected restricted __be64 [usertype] adisc_wwpn drivers/s390/scsi/zfcp_fc.c:541:40: got unsigned long long [unsigned] [usertype] port_name drivers/s390/scsi/zfcp_fc.c:542:40: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:542:40: expected restricted __be64 [usertype] adisc_wwnn drivers/s390/scsi/zfcp_fc.c:542:40: got unsigned long long [unsigned] [usertype] node_name drivers/s390/scsi/zfcp_fc.c:669:16: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:696:24: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:699:54: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:699:54: expected unsigned long long [unsigned] [usertype] <noident> drivers/s390/scsi/zfcp_fc.c:699:54: got restricted __be64 [usertype] fp_wwpn CC drivers/s390/scsi/zfcp_fc.o CHECK drivers/s390/scsi/zfcp_fsf.c drivers/s390/scsi/zfcp_fsf.c:479:34: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:479:34: expected unsigned long long [unsigned] [usertype] port_name drivers/s390/scsi/zfcp_fsf.c:479:34: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fsf.c:480:34: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:480:34: expected unsigned long long [unsigned] [usertype] node_name drivers/s390/scsi/zfcp_fsf.c:480:34: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fsf.c:506:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:506:36: expected unsigned long long [unsigned] [usertype] peer_wwpn drivers/s390/scsi/zfcp_fsf.c:506:36: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fsf.c:507:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:507:36: expected unsigned long long [unsigned] [usertype] peer_wwnn drivers/s390/scsi/zfcp_fsf.c:507:36: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fc.h:269:46: warning: restricted __be32 degrades to integer drivers/s390/scsi/zfcp_fc.h:270:29: error: incompatible types in comparison expression (different base types) Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: use common code fcp_cmnd and fcp_resp with union in ↵Steffen Maier2017-08-104-10/+17
| | | | | | | | | | | | | | | | | | fsf_qtcb_bottom_io This eases crash dump analysis by automatically dissecting these protocol headers at least somewhat rather than getting a string interpretation of large unstructured character array buffer fields. Also, we can get rid of some unnecessary and error-prone type casts. This change is possible since v2.6.33 commit 4318e08c84e4 ("[SCSI] zfcp: Update FCP protocol related code"). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: clarify that we don't need "link" test on failed open portSteffen Maier2017-08-101-1/+3
| | | | | | | Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: more fitting constant for fc_ct_hdr.ct_reason on port scan responseSteffen Maier2017-08-101-2/+2
| | | | | | | | | | | | | | v2.6.33 commit dbf5dfe9dbce ("[SCSI] zfcp: Use common code definitions for FC CT structs") replaced own definitions with common code definitions. While FC_BA_RJT_UNABLE happens to be defined with the same value 9 as FC_FS_RJT_UNABL and thus also works, here we should use the latter from fc_gs.h. See also its use in libfc's fc_disc_gpn_ft_resp(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: trace high part of "new" 64 bit SCSI LUNSteffen Maier2017-08-102-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Complements debugging aspects of the otherwise functionally complete v3.17 commit 9cb78c16f5da ("scsi: use 64-bit LUNs"). While I don't have access to a target exporting 3 or 4 level LUNs, I did test it by explicitly attaching a non-existent fake 4 level LUN by means of zfcp sysfs attribute "unit_add". In order to see corresponding trace records of otherwise successful events, we had to increase the trace level of area SCSI and HBA to 6. $ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_scsi/level $ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_hba/level $ echo 0x4011402240334044 > \ /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/unit_add Example output formatted by an updated zfcpdbf from the s390-tools package interspersed with kernel messages at scsi_logging_level=4605: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : scsla_1 LUN : 0x4011402240334044 WWPN : 0x50050763031bd327 D_ID : 0x00...... Adapter status : 0x5400050b Port status : 0x54000001 LUN status : 0x41000000 Ready count : 0x00000001 Running count : 0x00000000 ERP want : 0x01 ERP need : 0x01 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 1 length 36 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0 Timestamp : ... Area : HBA Subarea : 00 Level : 6 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : fs_norm Request ID : 0x<inquiry2-req-id> Request status : 0x00000010 FSF cmnd : 0x00000001 FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000000 FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000001 Prot stat qual : ........ ........ 00000000 00000000 Port handle : 0x... LUN handle : 0x... | Timestamp : ... Area : SCSI Subarea : 00 Level : 6 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : rsl_nor Request ID : 0x<inquiry2-req-id> SCSI ID : 0x00000000 SCSI LUN : 0x40224011 SCSI LUN high : 0x40444033 <======================= SCSI result : 0x00000000 SCSI retries : 0x00 SCSI allowed : 0x03 SCSI scribble : 0x<inquiry2-req-id> SCSI opcode : 12000000 a4000000 00000000 00000000 FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000000 00000000 00000000 00000000 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 2 length 164 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0 scsi 2:0:0:4630896905707208721: scsi scan: peripheral device type of 31, \ no device added Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 9cb78c16f5da ("scsi: use 64-bit LUNs") Cc: <stable@vger.kernel.org> #3.17+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late ↵Steffen Maier2017-08-101-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | response At the default trace level, we only trace unsuccessful events including FSF responses. zfcp_dbf_hba_fsf_response() only used protocol status and FSF status to decide on an unsuccessful response. However, this is only one of multiple possible sources determining a failed struct zfcp_fsf_req. An FSF request can also "fail" if its response runs into an ERP timeout or if it gets dismissed because a higher level recovery was triggered [trace tags "erscf_1" or "erscf_2" in zfcp_erp_strategy_check_fsfreq()]. FSF requests with ERP timeout are: FSF_QTCB_EXCHANGE_CONFIG_DATA, FSF_QTCB_EXCHANGE_PORT_DATA, FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT or FSF_QTCB_CLOSE_PHYSICAL_PORT for target ports, FSF_QTCB_OPEN_LUN, FSF_QTCB_CLOSE_LUN. One example is slow queue processing which can cause follow-on errors, e.g. FSF_PORT_ALREADY_OPEN after FSF_QTCB_OPEN_PORT_WITH_DID timed out. In order to see the root cause, we need to see late responses even if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD. Example trace records formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : ... Record ID : 1 Tag : fcegpf1 LUN : 0xffffffffffffffff WWPN : 0x<WWPN> D_ID : 0x00<D_ID> Adapter status : 0x5400050b Port status : 0x41200000 LUN status : 0x00000000 Ready count : 0x00000001 Running count : 0x... ERP want : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT ERP need : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT | Timestamp : ... 30 seconds later Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : ... Record ID : 2 Tag : erscf_2 LUN : 0xffffffffffffffff WWPN : 0x<WWPN> D_ID : 0x00<D_ID> Adapter status : 0x5400050b Port status : 0x41200000 LUN status : 0x00000000 Request ID : 0x<request_ID> ERP status : 0x10000000 ZFCP_STATUS_ERP_TIMEDOUT ERP step : 0x0800 ZFCP_ERP_STEP_PORT_OPENING ERP action : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT ERP count : 0x00 | Timestamp : ... later than previous record Area : HBA Subarea : 00 Level : 5 > default level => 3 <= default level Exception : - CPU ID : 00 Caller : ... Record ID : 1 Tag : fs_qtcb => fs_rerr Request ID : 0x<request_ID> Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED | ZFCP_STATUS_FSFREQ_CLEANUP FSF cmnd : 0x00000005 FSF sequence no: 0x... FSF issued : ... > 30 seconds ago FSF stat : 0x00000000 FSF_GOOD FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000001 FSF_PROT_GOOD Prot stat qual : 00000000 00000000 00000000 00000000 Port handle : 0x... LUN handle : 0x00000000 QTCB log length: ... QTCB log info : ... In case of problems detecting that new responses are waiting on the input queue, we sooner or later trigger adapter recovery due to an FSF request timeout (trace tag "fsrth_1"). FSF requests with FSF request timeout are: typically FSF_QTCB_ABORT_FCP_CMND; but theoretically also FSF_QTCB_EXCHANGE_CONFIG_DATA or FSF_QTCB_EXCHANGE_PORT_DATA via sysfs, FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT for WKA ports, FSF_QTCB_FCP_CMND for task management function (LUN / target reset). One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD because the channel filled in the response via DMA into the request's QTCB. In a theroretical case, inject code can create an erroneous FSF request on purpose. If data router is enabled, it uses deferred error reporting. A READ SCSI command can succeed with FSF_PROT_GOOD, FSF_GOOD, and SAM_STAT_GOOD. But on writing the read data to host memory via DMA, it can still fail, e.g. if an intentionally wrong scatter list does not provide enough space. Rather than getting an unsuccessful response, we get a QDIO activate check which in turn triggers adapter recovery. One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD because the channel filled in the response via DMA into the request's QTCB. Example trace records formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : HBA Subarea : 00 Level : 6 > default level => 3 <= default level Exception : - CPU ID : .. Caller : ... Record ID : 1 Tag : fs_norm => fs_rerr Request ID : 0x<request_ID2> Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED | ZFCP_STATUS_FSFREQ_CLEANUP FSF cmnd : 0x00000001 FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000000 FSF_GOOD FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000001 FSF_PROT_GOOD Prot stat qual : ........ ........ 00000000 00000000 Port handle : 0x... LUN handle : 0x... | Timestamp : ... Area : SCSI Subarea : 00 Level : 3 Exception : - CPU ID : .. Caller : ... Record ID : 1 Tag : rsl_err Request ID : 0x<request_ID2> SCSI ID : 0x... SCSI LUN : 0x... SCSI result : 0x000e0000 DID_TRANSPORT_DISRUPTED SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x<request_ID2> SCSI opcode : 28... Read(10) FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000000 00000000 ^^ SAM_STAT_GOOD 00000000 00000000 Only with luck in both above cases, we could see a follow-on trace record of an unsuccesful event following a successful but late FSF response with FSF_PROT_GOOD and FSF_GOOD. Typically this was the case for I/O requests resulting in a SCSI trace record "rsl_err" with DID_TRANSPORT_DISRUPTED [On ZFCP_STATUS_FSFREQ_DISMISSED, zfcp_fsf_protstatus_eval() sets ZFCP_STATUS_FSFREQ_ERROR seen by the request handler functions as failure]. However, the reason for this follow-on trace was invisible because the corresponding HBA trace record was missing at the default trace level (by default hidden records with tags "fs_norm", "fs_qtcb", or "fs_open"). On adapter recovery, after we had shut down the QDIO queues, we perform unsuccessful pseudo completions with flag ZFCP_STATUS_FSFREQ_DISMISSED for each pending FSF request in zfcp_fsf_req_dismiss_all(). In order to find the root cause, we need to see all pseudo responses even if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD. Therefore, check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED or ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr". It does not matter that there are numerous places which set ZFCP_STATUS_FSFREQ_ERROR after the location where we trace an FSF response early. These cases are based on protocol status != FSF_PROT_GOOD or == FSF_PROT_FSF_STATUS_PRESENTED and are thus already traced by default as trace tag "fs_perr" or "fs_ferr" respectively. NB: The trace record with tag "fssrh_1" for status read buffers on dismiss all remains. zfcp_fsf_req_complete() handles this and returns early. All other FSF request types are handled separately and as described above. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") Fixes: 2e261af84cdb ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace recordsSteffen Maier2017-08-101-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the FCP_RSP UI has optional parts (FCP_SNS_INFO or FCP_RSP_INFO) and thus does not fit into the fsp_rsp field built into a SCSI trace record, trace the full FCP_RSP UI with all optional parts as payload record instead of just FCP_SNS_INFO as payload and a 1 byte RSP_INFO_CODE part of FCP_RSP_INFO built into the SCSI record. That way we would also get the full FCP_SNS_INFO in case a target would ever send more than min(SCSI_SENSE_BUFFERSIZE==96, ZFCP_DBF_PAY_MAX_REC==256)==96. The mandatory part of FCP_RSP IU is only 24 bytes. PAYload costs at least one full PAY record of 256 bytes anyway. We cap to the hardware response size which is only FSF_FCP_RSP_SIZE==128. So we can just put the whole FCP_RSP IU with any optional parts into PAYload similarly as we do for SAN PAY since v4.9 commit aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)"). This does not cause any additional trace records wasting memory. Decoded trace records were confusing because they showed a hard-coded sense data length of 96 even if the FCP_RSP_IU field FCP_SNS_LEN showed actually less. Since the same commit, we set pl_len for SAN traces to the full length of a request/response even if we cap the corresponding trace. In contrast, here for SCSI traces we set pl_len to the pre-computed length of FCP_RSP IU considering SNS_LEN or RSP_LEN if valid. Nonetheless we trace a hardcoded payload of length FSF_FCP_RSP_SIZE==128 if there were optional parts. This makes it easier for the zfcpdbf tool to format only the relevant part of the long FCP_RSP UI buffer. And any trailing information is still available in the payload trace record just in case. Rename the payload record tag from "fcp_sns" to "fcp_riu" to make the new content explicit to zfcpdbf which can then pick a suitable field name such as "FCP rsp IU all:" instead of "Sense info :" Also, the same zfcpdbf can still be backwards compatible with "fcp_sns". Old example trace record before this fix, formatted with the tool zfcpdbf from s390-tools: Timestamp : ... Area : SCSI Subarea : 00 Level : 3 Exception : - CPU id : .. Caller : 0x... Record id : 1 Tag : rsl_err Request id : 0x<request_id> SCSI ID : 0x... SCSI LUN : 0x... SCSI result : 0x00000002 SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x<request_id> SCSI opcode : 00000000 00000000 00000000 00000000 FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000202 00000000 ^^==FCP_SNS_LEN_VALID 00000020 00000000 ^^^^^^^^==FCP_SNS_LEN==32 Sense len : 96 <==min(SCSI_SENSE_BUFFERSIZE,ZFCP_DBF_PAY_MAX_REC) Sense info : 70000600 00000018 00000000 29000000 00000400 00000000 00000000 00000000 00000000 00000000 00000000 00000000<==superfluous 00000000 00000000 00000000 00000000<==superfluous 00000000 00000000 00000000 00000000<==superfluous 00000000 00000000 00000000 00000000<==superfluous New example trace records with this fix: Timestamp : ... Area : SCSI Subarea : 00 Level : 3 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : rsl_err Request ID : 0x<request_id> SCSI ID : 0x... SCSI LUN : 0x... SCSI result : 0x00000002 SCSI retries : 0x00 SCSI allowed : 0x03 SCSI scribble : 0x<request_id> SCSI opcode : a30c0112 00000000 02000000 00000000 FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000a02 00000200 00000020 00000000 FCP rsp IU len : 56 FCP rsp IU all : 00000000 00000000 00000a02 00000200 ^^=FCP_RESID_UNDER|FCP_SNS_LEN_VALID 00000020 00000000 70000500 00000018 ^^^^^^^^==FCP_SNS_LEN ^^^^^^^^^^^^^^^^^ 00000000 240000cb 00011100 00000000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 00000000 00000000 ^^^^^^^^^^^^^^^^^==FCP_SNS_INFO Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : lr_okay Request ID : 0x<request_id> SCSI ID : 0x... SCSI LUN : 0x... SCSI result : 0x00000000 SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x<request_id> SCSI opcode : <CDB of unrelated SCSI command passed to eh handler> FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000100 00000000 00000000 00000008 FCP rsp IU len : 32 FCP rsp IU all : 00000000 00000000 00000100 00000000 ^^==FCP_RSP_LEN_VALID 00000000 00000008 00000000 00000000 ^^^^^^^^==FCP_RSP_LEN ^^^^^^^^^^^^^^^^^==FCP_RSP_INFO Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: fix missing trace records for early returns in TMF eh handlersSteffen Maier2017-08-101-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | For problem determination we need to see that we were in scsi_eh as well as whether and why we were successful or not. The following commits introduced new early returns without adding a trace record: v2.6.35 commit a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh") on fc_block_scsi_eh() returning != 0 which is FAST_IO_FAIL, v2.6.30 commit 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp") on not having gotten an FSF request after the maximum number of retry attempts and thus could not issue a TMF and has to return FAILED. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh") Fixes: 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBASteffen Maier2017-08-102-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this fix we get SCSI trace records on task management functions which cannot be correlated to HBA trace records because all fields related to the FSF request are empty (zero). Also, the FCP_RSP_IU is missing as well as any sense data if available. This was caused by v2.6.14 commit 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") introducing trace records for TMFs but hard coding NULL for a possibly existing TMF FSF request. The scsi_cmnd scribble is also zero or unrelated for the TMF request so it also could not lookup a suitable FSF request from there. A broken example trace record formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : lr_fail Request ID : 0x0000000000000000 ^^^^^^^^^^^^^^^^ no correlation to HBA record SCSI ID : 0x<scsitarget> SCSI LUN : 0x<scsilun> SCSI result : 0x000e0000 SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x0000000000000000 SCSI opcode : 2a000017 3bb80000 08000000 00000000 FCP rsp inf cod: 0x00 ^^ no TMF response FCP rsp IU : 00000000 00000000 00000000 00000000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 00000000 00000000 ^^^^^^^^^^^^^^^^^ no interesting FCP_RSP_IU Sense len : ... ^^^^^^^^^^^^^^^^^^^^ no sense data length Sense info : ... ^^^^^^^^^^^^^^^^^^^^ no sense data content, even if present There are some true cases where we really do not have an FSF request: "rsl_fai" from zfcp_dbf_scsi_fail_send() called for early returns / completions in zfcp_scsi_queuecommand(), "abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from zfcp_scsi_eh_abort_handler() where we did not get as far, "lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're successful and do not need to do anything because adapter stopped. For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace recordsSteffen Maier2017-08-102-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v4.9 commit aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") fixed trace data loss of 2.6.38 commit 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") necessary for problem determination, e.g. to see the currently active zone set during automatic port scan. While it already saves space by not dumping any empty residual entries of the large successful GPN_FT response (4 pages), there are seldom cases where the GPN_FT response is unsuccessful and likely does not have FC_NS_FID_LAST set in fp_flags so we did not cap the trace record. We typically see such case for an initiator WWPN, which is not in any zone. Cap unsuccessful responses to at least the actual basic CT_IU response plus whatever fits the SAN trace record built-in "payload" buffer just in case there's trailing information of which we would at least see the existence and its beginning. In order not to erroneously cap successful responses, we need to swap calling the trace function and setting the CT / ELS status to success (0). Example trace record pair formatted with zfcpdbf: Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : fssct_1 Request ID : 0x<request_id> Destination ID : 0x00fffffc SAN req short : 01000000 fc020000 01720ffc 00000000 00000008 SAN req length : 20 | Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 2 Tag : fsscth2 Request ID : 0x<request_id> Destination ID : 0x00fffffc SAN resp short : 01000000 fc020000 80010000 00090700 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] SAN resp length: 16384 San resp info : 01000000 fc020000 80010000 00090700 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] The fix saves all but one of the previously associated 64 PAYload trace record chunks of size 256 bytes each. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress pathBenjamin Block2017-08-101-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up until now zfcp would just ignore the FCP_RESID_OVER flag in the FCP response IU. When this flag is set, it is possible, in regards to the FCP standard, that the storage-server processes the command normally, up to the point where data is missing and simply ignores those. In this case no CHECK CONDITION would be set, and because we ignored the FCP_RESID_OVER flag we resulted in at least a data loss or even -corruption as a follow-up error, depending on how the applications/layers on top behave. To prevent this, we now set the host-byte of the corresponding scsi_cmnd to DID_ERROR. Other storage-behaviors, where the same condition results in a CHECK CONDITION set in the answer, don't need to be changed as they are handled in the mid-layer already. Following is an example trace record decoded with zfcpdbf from the s390-tools package. We forcefully injected a fc_dl which is one byte too small: Timestamp : ... Area : SCSI Subarea : 00 Level : 3 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : rsl_err Request ID : 0x... SCSI ID : 0x... SCSI LUN : 0x... SCSI result : 0x00070000 ^^DID_ERROR SCSI retries : 0x.. SCSI allowed : 0x.. SCSI scribble : 0x... SCSI opcode : 2a000000 00000000 08000000 00000000 FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000400 00000001 ^^fr_flags==FCP_RESID_OVER ^^fr_status==SAM_STAT_GOOD ^^^^^^^^fr_resid 00000000 00000000 As of now, we don't actively handle to possibility that a response IU has both flags - FCP_RESID_OVER and FCP_RESID_UNDER - set at once. Reported-by: Luke M. Hopkins <lmhopkin@us.ibm.com> Reviewed-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 553448f6c483 ("[SCSI] zfcp: Message cleanup") Fixes: ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git) Cc: <stable@vger.kernel.org> #2.6.33+ Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabledSteffen Maier2017-08-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit db007fc5e20c ("[SCSI] Command protection operation"), scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it to SCSI_PROT_NORMAL. Other FCP LLDDs such as qla2xxx and lpfc shield their queuecommand() to only access any of scsi_prot_sg...() if (scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL). Do the same thing for zfcp, which introduced DIX support with commit ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX"). Otherwise, TUR SCSI commands as part of scsi_eh likely fail in zfcp, because the regular SCSI command with DIX protection data, that scsi_eh re-uses in scsi_send_eh_cmnd(), of course still has (scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests to the FCP channel hardware. This causes scsi_eh_test_devices() to have (finish_cmds == 0) [not SCSI device is online or not scsi_eh_tur() failed] so regular SCSI commands, that caused / were affected by scsi_eh, are moved to work_q and scsi_eh_test_devices() itself returns false. In turn, it unnecessarily escalates in our case in scsi_eh_ready_devs() beyond host reset to finally scsi_eh_offline_sdevs() which sets affected SCSI devices offline with the following kernel message: "kernel: sd H:0:T:L: Device offlined - not ready after error recovery" Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX") Cc: <stable@vger.kernel.org> #2.6.36+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: convert bool-definitions to use 'true' instead of '1'Benjamin Block2017-08-102-2/+2
| | | | | | | | | Better form and cleans remaining warnings. Found with scripts/coccinelle/misc/boolinit.cocci. Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: Remove unneeded linux/miscdevice.h includeCorentin Labbe2017-08-101-1/+0
| | | | | | | | | | | | | | | | | | | drivers/s390/scsi/zfcp_aux.c does not contain any miscdevice so the inclusion of linux/miscdevice.h is unnecessary. [maier@linux.vnet.ibm.com: just for the records, this is in fact a minor missing code cleanup of the following older "feature" which also dropped the only former use of a misc device in zfcp: commit 663e0890e31c ("[SCSI] zfcp: remove access control tables interface") commit b5dc3c4800cc ("[SCSI] zfcp: remove access control tables interface (keep sysfs files)") commit 1b33ef23946a ("zfcp: remove access control tables interface (port leftovers)")] Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: use setup_timer instead of init_timerLukáš Korenčik2017-08-101-3/+2
| | | | | | | | | | Use initialization with setup_timer function instead of using init_timer function and data fields. It improves readability. Signed-off-by: Lukáš Korenčik <xkorenc1@fi.muni.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: replace zfcp_qdio_sbale_count by sg_nentsLABBE Corentin2017-08-102-17/+1
| | | | | | | | | | The zfcp_qdio_sbale_count function do the same work than sg_nents(). So replace it by sg_nents() for removing duplicate code. Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2017-02-211-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This update includes the usual round of major driver updates (ncr5380, ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid, megaraid_sas, ...). There's also an assortment of minor fixes and the major update of switching a bunch of drivers to pci_alloc_irq_vectors from Christoph" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (188 commits) scsi: megaraid_sas: handle dma_addr_t right on 32-bit scsi: megaraid_sas: array overflow in megasas_dump_frame() scsi: snic: switch to pci_irq_alloc_vectors scsi: megaraid_sas: driver version upgrade scsi: megaraid_sas: Change RAID_1_10_RMW_CMDS to RAID_1_PEER_CMDS and set value to 2 scsi: megaraid_sas: Indentation and smatch warning fixes scsi: megaraid_sas: Cleanup VD_EXT_DEBUG and SPAN_DEBUG related debug prints scsi: megaraid_sas: Increase internal command pool scsi: megaraid_sas: Use synchronize_irq to wait for IRQs to complete scsi: megaraid_sas: Bail out the driver load if ld_list_query fails scsi: megaraid_sas: Change build_mpt_mfi_pass_thru to return void scsi: megaraid_sas: During OCR, if get_ctrl_info fails do not continue with OCR scsi: megaraid_sas: Do not set fp_possible if TM capable for non-RW syspdIO, change fp_possible to bool scsi: megaraid_sas: Remove unused pd_index from megasas_build_ld_nonrw_fusion scsi: megaraid_sas: megasas_return_cmd does not memset IO frame to zero scsi: megaraid_sas: max_fw_cmds are decremented twice, remove duplicate scsi: megaraid_sas: update can_queue only if the new value is less scsi: megaraid_sas: Change max_cmd from u32 to u16 in all functions scsi: megaraid_sas: set pd_after_lb from MR_BuildRaidContext and initialize pDevHandle to MR_DEVHANDLE_INVALID scsi: megaraid_sas: latest controller OCR capability from FW before sending shutdown DCMD ...
| * scsi: remove eh_timed_out methods in the transport templateChristoph Hellwig2017-02-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Instead define the timeout behavior purely based on the host_template eh_timed_out method and wire up the existing transport implementations in the host templates. This also clears up the confusion that the transport template method overrides the host template one, so some drivers have to re-override the transport template one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed sendSteffen Maier2017-02-091-4/+4
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dan Carpenter kindly reported: <quote> The patch d27a7cb91960: "zfcp: trace on request for open and close of WKA port" from Aug 10, 2016, leads to the following static checker warning: drivers/s390/scsi/zfcp_fsf.c:1615 zfcp_fsf_open_wka_port() warn: 'req' was already freed. drivers/s390/scsi/zfcp_fsf.c 1609 zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT); 1610 retval = zfcp_fsf_req_send(req); 1611 if (retval) 1612 zfcp_fsf_req_free(req); ^^^ Freed. 1613 out: 1614 spin_unlock_irq(&qdio->req_q_lock); 1615 if (req && !IS_ERR(req)) 1616 zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id); ^^^^^^^^^^^ Use after free. 1617 return retval; 1618 } Same thing for zfcp_fsf_close_wka_port() as well. </quote> Rather than relying on req being NULL (or ERR_PTR) for all cases where we don't want to trace or should not trace, simply check retval which is unconditionally initialized with -EIO != 0 and it can only become 0 on successful retval = zfcp_fsf_req_send(req). With that we can also remove the then again unnecessary unconditional initialization of req which was introduced with that earlier commit. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: fix rport unblock race with LUN recoverySteffen Maier2016-12-144-9/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is unavoidable that zfcp_scsi_queuecommand() has to finish requests with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time window when zfcp detected an unavailable rport but fc_remote_port_delete(), which is asynchronous via zfcp_scsi_schedule_rport_block(), has not yet blocked the rport. However, for the case when the rport becomes available again, we should prevent unblocking the rport too early. In contrast to other FCP LLDDs, zfcp has to open each LUN with the FCP channel hardware before it can send I/O to a LUN. So if a port already has LUNs attached and we unblock the rport just after port recovery, recoveries of LUNs behind this port can still be pending which in turn force zfcp_scsi_queuecommand() to unnecessarily finish requests with DID_IMM_RETRY. This also opens a time window with unblocked rport (until the followup LUN reopen recovery has finished). If a scsi_cmnd timeout occurs during this time window fc_timed_out() cannot work as desired and such command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on FC transport layer such as path issues involving RSCNs. Fix this by only calling zfcp_scsi_schedule_rport_register(), to asynchronously trigger fc_remote_port_add(), after all LUN recoveries as children of the rport have finished and no new recoveries of equal or higher order were triggered meanwhile. Finished intentionally includes any recovery result no matter if successful or failed (still unblock rport so other successful LUNs work). For simplicity, we check after each finished LUN recovery if there is another LUN recovery pending on the same port and then do nothing. We handle the special case of a successful recovery of a port without LUN children the same way without changing this case's semantics. For debugging we introduce 2 new trace records written if the rport unblock attempt was aborted due to still unfinished or freshly triggered recovery. The records are only written above the default trace level. Benjamin noticed the important special case of new recovery that can be triggered between having given up the erp_lock and before calling zfcp_erp_action_cleanup() within zfcp_erp_strategy(). We must avoid the following sequence: ERP thread rport_work other context ------------------------- -------------- -------------------------------- port is unblocked, rport still blocked, due to pending/running ERP action, so ((port->status & ...UNBLOCK) != 0) and (port->rport == NULL) unlock ERP zfcp_erp_action_cleanup() case ZFCP_ERP_ACTION_REOPEN_LUN: zfcp_erp_try_rport_unblock() ((status & ...UNBLOCK) != 0) [OLD!] zfcp_erp_port_reopen() lock ERP zfcp_erp_port_block() port->status clear ...UNBLOCK unlock ERP zfcp_scsi_schedule_rport_block() port->rport_task = RPORT_DEL queue_work(rport_work) zfcp_scsi_rport_work() (port->rport_task != RPORT_ADD) port->rport_task = RPORT_NONE zfcp_scsi_rport_block() if (!port->rport) return zfcp_scsi_schedule_rport_register() port->rport_task = RPORT_ADD queue_work(rport_work) zfcp_scsi_rport_work() (port->rport_task == RPORT_ADD) port->rport_task = RPORT_NONE zfcp_scsi_rport_register() (port->rport == NULL) rport = fc_remote_port_add() port->rport = rport; Now the rport was erroneously unblocked while the zfcp_port is blocked. This is another situation we want to avoid due to scsi_eh potential. This state would at least remain until the new recovery from the other context finished successfully, or potentially forever if it failed. In order to close this race, we take the erp_lock inside zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or LUN. With that, the possible corresponding rport state sequences would be: (unblock[ERP thread],block[other context]) if the ERP thread gets erp_lock first and still sees ((port->status & ...UNBLOCK) != 0), (block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock after the other context has already cleard ...UNBLOCK from port->status. Since checking fields of struct erp_action is unsafe because they could have been overwritten (re-used for new recovery) meanwhile, we only check status of zfcp_port and LUN since these are only changed under erp_lock elsewhere. Regarding the check of the proper status flags (port or port_forced are similar to the shown adapter recovery): [zfcp_erp_adapter_shutdown()] zfcp_erp_adapter_reopen() zfcp_erp_adapter_block() * clear UNBLOCK ---------------------------------------+ zfcp_scsi_schedule_rports_block() | write_lock_irqsave(&adapter->erp_lock, flags);-------+ | zfcp_erp_action_enqueue() | | zfcp_erp_setup_act() | | * set ERP_INUSE -----------------------------------|--|--+ write_unlock_irqrestore(&adapter->erp_lock, flags);--+ | | .context-switch. | | zfcp_erp_thread() | | zfcp_erp_strategy() | | write_lock_irqsave(&adapter->erp_lock, flags);------+ | | ... | | | zfcp_erp_strategy_check_target() | | | zfcp_erp_strategy_check_adapter() | | | zfcp_erp_adapter_unblock() | | | * set UNBLOCK -----------------------------------|--+ | zfcp_erp_action_dequeue() | | * clear ERP_INUSE ---------------------------------|-----+ ... | write_unlock_irqrestore(&adapter->erp_lock, flags);-+ Hence, we should check for both UNBLOCK and ERP_INUSE because they are interleaved. Also we need to explicitly check ERP_FAILED for the link down case which currently does not clear the UNBLOCK flag in zfcp_fsf_link_down_info_eval(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8830271c4819 ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport") Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Cc: <stable@vger.kernel.org> #2.6.32+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: do not trace pure benign residual HBA responses at default levelSteffen Maier2016-12-142-3/+30
| | | | | | | | | | | | | | | | | | | | | Since quite a while, Linux issues enough SCSI commands per scsi_device which successfully return with FCP_RESID_UNDER, FSF_FCP_RSP_AVAILABLE, and SAM_STAT_GOOD. This floods the HBA trace area and we cannot see other and important HBA trace records long enough. Therefore, do not trace HBA response errors for pure benign residual under counts at the default trace level. This excludes benign residual under count combined with other validity bits set in FCP_RSP_IU, such as FCP_SNS_LEN_VAL. For all those other cases, we still do want to see both the HBA record and the corresponding SCSI record by default. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Cc: <stable@vger.kernel.org> #2.6.37+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: fix use-after-"free" in FC ingress path after TMFBenjamin Block2016-12-143-3/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and eh_target_reset_handler(), it expects us to relent the ownership over the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN or target - when returning with SUCCESS from the callback ('release' them). SCSI EH can then reuse those commands. We did not follow this rule to release commands upon SUCCESS; and if later a reply arrived for one of those supposed to be released commands, we would still make use of the scsi_cmnd in our ingress tasklet. This will at least result in undefined behavior or a kernel panic because of a wrong kernel pointer dereference. To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req *)->data in the matching scope if a TMF was successful. This is done under the locks (struct zfcp_adapter *)->abort_lock and (struct zfcp_reqlist *)->lock to prevent the requests from being removed from the request-hashtable, and the ingress tasklet from making use of the scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler(). For cases where a reply arrives during SCSI EH, but before we get a chance to NULLify the pointer - but before we return from the callback -, we assume that the code is protected from races via the CAS operation in blk_complete_request() that is called in scsi_done(). The following stacktrace shows an example for a crash resulting from the previous behavior: Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000 Oops: 0038 [#1] SMP CPU: 2 PID: 0 Comm: swapper/2 Not tainted task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000 Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918 Krnl Code: 00000000001156a2: a7190000 lghi %r1,0 00000000001156a6: a7380015 lhi %r3,21 #00000000001156aa: e32050000008 ag %r2,0(%r5) >00000000001156b0: 482022b0 lh %r2,688(%r2) 00000000001156b4: ae123000 sigp %r1,%r2,0(%r3) 00000000001156b8: b2220020 ipm %r2 00000000001156bc: 8820001c srl %r2,28 00000000001156c0: c02700000001 xilf %r2,1 Call Trace: ([<0000000000000000>] 0x0) [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp] [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp] [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp] [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp] [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio] [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio] [<0000000000141fd4>] tasklet_action+0x9c/0x170 [<0000000000141550>] __do_softirq+0xe8/0x258 [<000000000010ce0a>] do_softirq+0xba/0xc0 [<000000000014187c>] irq_exit+0xc4/0xe8 [<000000000046b526>] do_IRQ+0x146/0x1d8 [<00000000005d6a3c>] io_return+0x0/0x8 [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0 ([<0000000000000000>] 0x0) [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0 [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8 [<0000000000114782>] smp_start_secondary+0xda/0xe8 [<00000000005d6efe>] restart_int_handler+0x56/0x6c [<0000000000000000>] 0x0 Last Breaking-Event-Address: [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0 Suggested-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Fixes: ea127f9754 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git) Cc: <stable@vger.kernel.org> #2.6.32+ Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: fc: use bsg_job_doneJohannes Thumshirn2016-11-171-1/+1
| | | | | | | | | | fc_bsg_jobdone() and bsg_job_done() are 1:1 copies now so use the bsg-lib one instead of the FC private implementation. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: change FC drivers to use 'struct bsg_job'Johannes Thumshirn2016-11-172-9/+10
| | | | | | | | | | | Change FC drivers to use 'struct bsg_job' from bsg-lib.h instead of 'struct fc_bsg_job' from scsi_transport_fc.h and remove 'struct fc_bsg_job'. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: fc: provide fc_bsg_to_rport() helperJohannes Thumshirn2016-11-171-3/+5
| | | | | | | | | | | Provide fc_bsg_to_rport() helper that will become handy when we're moving from struct fc_bsg_job to a plain struct bsg_job. Also move all LLDDs to use the new helper. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: fc: provide fc_bsg_to_shost() helperJohannes Thumshirn2016-11-171-2/+4
| | | | | | | | | | | Provide fc_bsg_to_shost() helper that will become handy when we're moving from struct fc_bsg_job to a plain struct bsg_job. Also use this little helper in the LLDDs. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: fc: Export fc_bsg_jobdone and use it in FC driversJohannes Thumshirn2016-11-171-1/+1
| | | | | | | | | | | | | | | | Export fc_bsg_jobdone so drivers can use it directly instead of doing the round-trip via struct fc_bsg_job::job_done() and use it in the LLDDs. That way we can also unify the interfaces of fc_bsg_jobdone and bsg_job_done. As we've converted all LLDDs over to use fc_bsg_jobdone() directly, we can remove the function pointer from struct fc_bsg_job as well. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directlyJohannes Thumshirn2016-11-171-3/+6
| | | | | | | | | | Don't use fc_bsg_job::request and fc_bsg_job::reply directly, but use helper variables bsg_request and bsg_reply. This will be helpful when transitioning to bsg-lib. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: zfcp: spin_lock_irqsave() is not nestableDan Carpenter2016-10-141-1/+1
| | | | | | | | | | | | | We accidentally overwrite the original saved value of "flags" so that we can't re-enable IRQs at the end of the function. Presumably this function is mostly called with IRQs disabled or it would be obvious in testing. Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") Cc: <stable@vger.kernel.org> #2.6.38+ Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2016-10-077-34/+196
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This update includes the usual round of major driver updates (hpsa, be2iscsi, hisi_sas, zfcp, cxlflash). There's a new incarnation of hpsa called smartpqi for which a driver is added, there's some cleanup work of the ibm vscsi target and updates to libfc, plus a whole host of minor fixes and updates and finally the removal of several ISA drivers which seem not to have been used for years" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (173 commits) scsi: mvsas: Mark symbols static where possible scsi: pm8001: Mark symbols static where possible scsi: arcmsr: Simplify user_len checking scsi: fcoe: fix off by one in eth2fc_speed() scsi: dtc: remove from tree scsi: t128: remove from tree scsi: pas16: remove from tree scsi: u14-34f: remove from tree scsi: ultrastor: remove from tree scsi: in2000: remove from tree scsi: wd7000: remove from tree scsi: scsi_dh_alua: Fix memory leak in alua_rtpg() scsi: lpfc: Mark symbols static where possible scsi: hpsa: correct call to hpsa_do_reset scsi: ufs: Get a TM service response from the correct offset scsi: ibmvfc: Fix I/O hang when port is not mapped scsi: megaraid_sas: clean function declarations in megaraid_sas_base.c up scsi: ipr: Remove redundant messages at adapter init time scsi: ipr: Don't log unnecessary 9084 error details scsi: smartpqi: raid bypass lba calculation fix ...
| * zfcp: trace full payload of all SAN records (req,resp,iels)Steffen Maier2016-08-122-13/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was lost with commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") but is necessary for problem determination, e.g. to see the currently active zone set during automatic port scan. For the large GPN_FT response (4 pages), save space by not dumping any empty residual entries. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * zfcp: fix payload trace length for SAN request&responseSteffen Maier2016-08-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") started to add FC_CT_HDR_LEN which made zfcp dump random data out of bounds for RSPN GS responses because u.rspn.rsp is the largest and last field in the union of struct zfcp_fc_req. Other request/response types only happened to stay within bounds due to the padding of the union or due to the trace capping of u.gspn.rsp to ZFCP_DBF_SAN_MAX_PAYLOAD. Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU id : .. Caller : ... Record id : 2 Tag : fsscth2 Request id : 0x... Destination ID : 0x00fffffc Payload short : 01000000 fc020000 80020000 00000000 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx <=== 00000000 00000000 00000000 00000000 Payload length : 32 <=== struct zfcp_fc_req { [0] struct zfcp_fsf_ct_els ct_els; [56] struct scatterlist sg_req; [96] struct scatterlist sg_rsp; union { struct {req; rsp;} adisc; SIZE: 28+28= 56 struct {req; rsp;} gid_pn; SIZE: 24+20= 44 struct {rspsg; req;} gpn_ft; SIZE: 40*4+20=180 struct {req; rsp;} gspn; SIZE: 20+273= 293 struct {req; rsp;} rspn; SIZE: 277+16= 293 [136] } u; } SIZE: 432 Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * zfcp: fix D_ID field with actual value on tracing SAN responsesSteffen Maier2016-08-123-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") we lost the N_Port-ID where an ELS response comes from. With commit 7c7dc196814b9e1d5cc254dc579a5fa78ae524f7 ("[SCSI] zfcp: Simplify handling of ct and els requests") we lost the N_Port-ID where a CT response comes from. It's especially useful if the request SAN trace record with D_ID was already lost due to trace buffer wrap. GS uses an open WKA port handle and ELS just a D_ID, and only for ELS we could get D_ID from QTCB bottom via zfcp_fsf_req. To cover both cases, add a new field to zfcp_fsf_ct_els and fill it in on request to use in SAN response trace. Strictly speaking the D_ID on SAN response is the FC frame's S_ID. We don't need a field for the other end which is always us. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Fixes: 7c7dc196814b ("[SCSI] zfcp: Simplify handling of ct and els requests") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * zfcp: restore tracing of handle for port and LUN with HBA recordsSteffen Maier2016-08-122-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This information was lost with commit a54ca0f62f953898b05549391ac2a8a4dad6482b ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") but is required to debug e.g. invalid handle situations. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * zfcp: trace on request for open and close of WKA portSteffen Maier2016-08-123-2/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit a54ca0f62f953898b05549391ac2a8a4dad6482b ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") HBA records no longer contain WWPN, D_ID, or LUN to reduce duplicate information which is already in REC records. In contrast to "regular" target ports, we don't use recovery to open WKA ports such as directory/nameserver, so we don't get REC records. Therefore, introduce pseudo REC running records without any actual recovery action but including D_ID of WKA port on open/close. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * zfcp: restore: Dont use 0 to indicate invalid LUN in rec traceSteffen Maier2016-08-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bring back commit d21e9daa63e009ce5b87bbcaa6d11ce48e07bbbe ("[SCSI] zfcp: Dont use 0 to indicate invalid LUN in rec trace") which was lost with commit ae0904f60fab7cb20c48d32eefdd735e478b91fb ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.") Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * zfcp: retain trace level for SCSI and HBA FSF response recordsSteffen Maier2016-08-123-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While retaining the actual filtering according to trace level, the following commits started to write such filtered records with a hardcoded record level of 1 instead of the actual record level: commit 250a1352b95e1db3216e5c5d4f4365bea5122f4a ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.") commit a54ca0f62f953898b05549391ac2a8a4dad6482b ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Now we can distinguish written records again for offline level filtering. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.") Fixes: a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * zfcp: close window with unblocked rport during rport goneSteffen Maier2016-08-123-5/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a successful end of reopen port forced, zfcp_erp_strategy_followup_success() re-uses the port erp_action and the subsequent zfcp_erp_action_cleanup() now sees ZFCP_ERP_SUCCEEDED with erp_action->action==ZFCP_ERP_ACTION_REOPEN_PORT instead of ZFCP_ERP_ACTION_REOPEN_PORT_FORCED but must not perform zfcp_scsi_schedule_rport_register(). We can detect this because the fresh port reopen erp_action is in its very first step ZFCP_ERP_STEP_UNINITIALIZED. Otherwise this opens a time window with unblocked rport (until the followup port reopen recovery would block it again). If a scsi_cmnd timeout occurs during this time window fc_timed_out() cannot work as desired and such command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on FC transport layer such as path issues involving RSCNs. Also, unnecessary and repeated DID_IMM_RETRY for pending and undesired new requests occur because internally zfcp still has its zfcp_port blocked. As follow-on errors with scsi_eh, it can cause, in the worst case, permanently lost paths due to one of: sd <scsidev>: [<scsidisk>] Medium access timeout failure. Offlining disk! sd <scsidev>: Device offlined - not ready after error recovery For fix validation and to aid future debugging with other recoveries we now also trace (un)blocking of rports. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 5767620c383a ("[SCSI] zfcp: Do not unblock rport from REOPEN_PORT_FORCED") Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Cc: <stable@vger.kernel.org> #2.6.32+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * zfcp: fix ELS/GS request&response length for hardware data routerSteffen Maier2016-08-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the hardware data router case, introduced with kernel 3.2 commit 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router") the ELS/GS request&response length needs to be initialized as in the chained SBAL case. Otherwise, the FCP channel rejects ELS requests with FSF_REQUEST_SIZE_TOO_LARGE. Such ELS requests can be issued by user space through BSG / HBA API, or zfcp itself uses ADISC ELS for remote port link test on RSCN. The latter can cause a short path outage due to unnecessary remote target port recovery because the always failing ADISC cannot detect extremely short path interruptions beyond the local FCP channel. Below example is decoded with zfcpdbf from s390-tools: Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU id : .. Caller : zfcp_dbf_san_req+0408 Record id : 1 Tag : fssels1 Request id : 0x<reqid> Destination ID : 0x00<target d_id> Payload info : 52000000 00000000 <our wwpn > [ADISC] <our wwnn > 00<s_id> 00000000 00000000 00000000 00000000 00000000 Timestamp : ... Area : HBA Subarea : 00 Level : 1 Exception : - CPU id : .. Caller : zfcp_dbf_hba_fsf_res+0740 Record id : 1 Tag : fs_ferr Request id : 0x<reqid> Request status : 0x00000010 FSF cmnd : 0x0000000b [FSF_QTCB_SEND_ELS] FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000061 [FSF_REQUEST_SIZE_TOO_LARGE] FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000100 Prot stat qual : 00000000 00000000 00000000 00000000 Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router") Cc: <stable@vger.kernel.org> # 3.2+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * zfcp: fix fc_host port_type with NPIVSteffen Maier2016-08-121-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For an NPIV-enabled FCP device, zfcp can erroneously show "NPort (fabric via point-to-point)" instead of "NPIV VPORT" for the port_type sysfs attribute of the corresponding fc_host. s390-tools that can be affected are dbginfo.sh and ziomon. zfcp_fsf_exchange_config_evaluate() ignores fsf_qtcb_bottom_config.connection_features indicating NPIV and only sets fc_host_port_type to FC_PORTTYPE_NPORT if fsf_qtcb_bottom_config.fc_topology is FSF_TOPO_FABRIC. Only the independent zfcp_fsf_exchange_port_evaluate() evaluates connection_features to overwrite fc_host_port_type to FC_PORTTYPE_NPIV in case of NPIV. Code was introduced with upstream kernel 2.6.30 commit 0282985da5923fa6365adcc1a1586ae0c13c1617 ("[SCSI] zfcp: Report fc_host_port_type as NPIV"). This works during FCP device recovery (such as set online) because it performs FSF_QTCB_EXCHANGE_CONFIG_DATA followed by FSF_QTCB_EXCHANGE_PORT_DATA in sequence. However, the zfcp-specific scsi host sysfs attributes "requests", "megabytes", or "seconds_active" trigger only zfcp_fsf_exchange_config_evaluate() resetting fc_host port_type to FC_PORTTYPE_NPORT despite NPIV. The zfcp-specific scsi host sysfs attribute "utilization" triggers only zfcp_fsf_exchange_port_evaluate() correcting the fc_host port_type again in case of NPIV. Evaluate fsf_qtcb_bottom_config.connection_features in zfcp_fsf_exchange_config_evaluate() where it belongs to. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 0282985da592 ("[SCSI] zfcp: Report fc_host_port_type as NPIV") Cc: <stable@vger.kernel.org> #2.6.30+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | s390: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar2016-09-061-1/+1
|/ | | | | | | | | | | | | | | | | | | | | The workqueue "appldata_wq" has been replaced with an ordered dedicated workqueue. WQ_MEM_RECLAIM has not been set since the workqueue is not being used on a memory reclaim path. The adapter->work_queue queues multiple work items viz &adapter->scan_work, &port->rport_work, &adapter->ns_up_work, &adapter->stat_work, adapter->work_queue, &adapter->events.work, &port->gid_pn_work, &port->test_link_work. Hence, an ordered dedicated workqueue has been used. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* zfcp: Revert to original scanning behaviourHannes Reinecke2016-04-151-1/+1
| | | | | | | | | | | zfcp has its own mechanism for selective scanning, so revert to the original scanning behaviour to not confuse users. Fixes: 4e91e876e9b8b6eb4255aa0d690778a89d3f1d28 Suggested-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: disable automatic target scanHannes Reinecke2016-04-111-1/+2
| | | | | | | | | | | | | | | | | On larger installations it is useful to disable automatic LUN scanning, and only add the required LUNs via udev rules. This can speed up bootup dramatically. This patch introduces a new scan module parameter value 'manual', which works like 'none', but can be overridden by setting the 'rescan' value from scsi_scan_target to 'SCSI_SCAN_MANUAL'. And it updates all relevant callers to set the 'rescan' value to 'SCSI_SCAN_MANUAL' if invoked via the 'scan' option in sysfs. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
OpenPOWER on IntegriCloud