linux/drivers/s390/scsi
Steffen Maier 60a161b7e5 scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
Suppose adapter (open) recovery is between opened QDIO queues and before
(the end of) initial posting of status read buffers (SRBs). This time
window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING
causing by design looping with exponential increase sleeps in the function
performing exchange config data during recovery
[zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up.

Suppose an event occurs for which the FCP channel would send an unsolicited
notification to zfcp by means of a previously posted SRB.  We saw it with
local cable pull (link down) in multi-initiator zoning with multiple
NPIV-enabled subchannels of the same shared FCP channel.

As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial
status read buffers from within the adapter's ERP thread, the channel does
send an unsolicited notification.

Since v2.6.27 commit d26ab06ede ("[SCSI] zfcp: receiving an unsolicted
status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules
adapter->stat_work to re-fill the just consumed SRB from a work item.

Now the ERP thread and the work item post SRBs in parallel.  Both contexts
call the helper function zfcp_status_read_refill().  The tracking of
missing (to be posted / re-filled) SRBs is not thread-safe due to separate
atomic_read() and atomic_dec(), in order to depend on posting
success. Hence, both contexts can see
atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts
one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue
(trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in
zfcp_qdio_handler_error().

An obvious and seemingly clean fix would be to schedule stat_work from the
ERP thread and wait for it to finish. This would serialize all SRB
re-fills. However, we already have another work item wait on the ERP
thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls
zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the
open port recoveries during zfcp auto port scan, but in fact it waits for
any pending recovery including an adapter recovery. This approach leads to
a deadlock.  [see also v3.19 commit 18f87a67e6 ("zfcp: auto port scan
resiliency"); v2.6.37 commit d3e1088d68
("[SCSI] zfcp: No ERP escalation on gpn_ft eval");
v2.6.28 commit fca55b6fb5
("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP")
fixing v2.6.27 commit c57a39a45a
("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port");
v2.6.27 commit cc8c282963
("[SCSI] zfcp: Automatically attach remote ports")]

Instead make the accounting of missing SRBs atomic for parallel execution
in both the ERP thread and adapter->stat_work.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: d26ab06ede ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall")
Cc: <stable@vger.kernel.org> #2.6.27+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-07 22:03:24 -05:00
..
Makefile s390: add a few more SPDX identifiers 2017-12-05 07:51:09 +01:00
zfcp_aux.c scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown 2018-12-07 22:03:24 -05:00
zfcp_ccw.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
zfcp_dbf.c scsi: zfcp: silence all W=1 build warnings for existing kdoc 2018-11-15 15:01:18 -05:00
zfcp_dbf.h scsi: zfcp: silence remaining kdoc warnings in header files 2018-11-15 15:01:18 -05:00
zfcp_def.h scsi: zfcp: use enum zfcp_erp_steps for struct zfcp_erp_action.step 2018-11-15 15:01:18 -05:00
zfcp_erp.c scsi: zfcp: drop old default switch case which might paper over missing case 2018-11-15 15:01:18 -05:00
zfcp_ext.h scsi: zfcp: make DIX experimental, disabled, and independent of DIF 2018-12-07 21:36:40 -05:00
zfcp_fc.c scsi: zfcp: silence all W=1 build warnings for existing kdoc 2018-11-15 15:01:18 -05:00
zfcp_fc.h scsi: zfcp: silence remaining kdoc warnings in header files 2018-11-15 15:01:18 -05:00
zfcp_fsf.c scsi: zfcp: silence all W=1 build warnings for existing kdoc 2018-11-15 15:01:18 -05:00
zfcp_fsf.h scsi: zfcp: silence remaining kdoc warnings in header files 2018-11-15 15:01:18 -05:00
zfcp_qdio.c scsi: zfcp: silence all W=1 build warnings for existing kdoc 2018-11-15 15:01:18 -05:00
zfcp_qdio.h scsi: zfcp: silence remaining kdoc warnings in header files 2018-11-15 15:01:18 -05:00
zfcp_reqlist.h scsi: zfcp: silence remaining kdoc warnings in header files 2018-11-15 15:01:18 -05:00
zfcp_scsi.c scsi: zfcp: make DIX experimental, disabled, and independent of DIF 2018-12-07 21:36:40 -05:00
zfcp_sysfs.c scsi: zfcp: support SCSI_ADAPTER_RESET via scsi_host sysfs attribute host_reset 2018-05-18 11:28:15 -04:00
zfcp_unit.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00