4eeaa4f3f1
On a successful end of a forced port reopen, zfcp_erp_strategy_followup_success() re-uses the port erp_action, so the subsequent zfcp_erp_action_cleanup() now sees ZFCP_ERP_SUCCEEDED with erp_action->action==ZFCP_ERP_ACTION_REOPEN_PORT instead of ZFCP_ERP_ACTION_REOPEN_PORT_FORCED. In that case it must not perform zfcp_scsi_schedule_rport_register(). We can detect this because the fresh port reopen erp_action is still in its very first step, ZFCP_ERP_STEP_UNINITIALIZED. Registering the rport at this point would open a time window with an unblocked rport (until the follow-up port reopen recovery blocks it again). If a scsi_cmnd timeout occurs during this window, fc_timed_out() cannot work as desired, and the command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover, which should not happen if the path issue can be recovered on the FC transport layer, such as path issues involving RSCNs. Also, unnecessary and repeated DID_IMM_RETRY results occur for pending and undesired new requests, because internally zfcp still has its zfcp_port blocked. Through follow-on errors in scsi_eh, this can, in the worst case, cause permanently lost paths due to one of: "sd <scsidev>: [<scsidisk>] Medium access timeout failure. Offlining disk!" or "sd <scsidev>: Device offlined - not ready after error recovery". For fix validation and to aid future debugging with other recoveries, we now also trace (un)blocking of rports. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: |
||
---|---|---|
Makefile | ||
zfcp_aux.c | ||
zfcp_ccw.c | ||
zfcp_dbf.c | ||
zfcp_dbf.h | ||
zfcp_def.h | ||
zfcp_erp.c | ||
zfcp_ext.h | ||
zfcp_fc.c | ||
zfcp_fc.h | ||
zfcp_fsf.c | ||
zfcp_fsf.h | ||
zfcp_qdio.c | ||
zfcp_qdio.h | ||
zfcp_reqlist.h | ||
zfcp_scsi.c | ||
zfcp_sysfs.c | ||
zfcp_unit.c |