linux/drivers/scsi/libfc
Abhijeet Joglekar 5543c72e2b [SCSI] libfc: remote port gets stuck in restart state without really restarting
We ran into a scenario where a remote port goes into RESTART state, but
never gets added to scsi transport. The running vmcore showed the following:
a) Port was in RESTART state
b) rdata->event was STOP
c) no work was scheduled to fc_rport_work for the remote port

After this point, shut/no-shut of the remote port did not cause the port
to get re-discovered. The port would move between DELETE and RESTART states,
but the event would always be STOP, no work would get scheduled to
fc_rport_work and the port would not get added to scsi_transport.

The problem is that rdata->event is not set to NONE after a port is
restarted. After this point, no more work gets scheduled for the remote port
since new work is scheduled only if rdata->event is NONE. So, the event
and state keep changing, but fc_rport_work does not get scheduled to actually
handle the event.
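
The scheduling rule described above can be sketched in a few lines of
user-space C. This is only an illustrative model, not the libfc code: the
struct, the enum values and rport_post_event() are hypothetical names, and
a counter stands in for queueing work to fc_rport_work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the names only echo libfc, this is not kernel code. */
enum rport_event { EV_NONE, EV_READY, EV_STOP };

struct rport {
    enum rport_event event;  /* models rdata->event */
    int work_queued;         /* counts how often work was queued */
};

/*
 * Record a new event. Work is queued only when no event was pending,
 * so an event that is never reset to EV_NONE blocks all later work.
 * Returns true if work was queued.
 */
static bool rport_post_event(struct rport *r, enum rport_event ev)
{
    bool queued = false;

    assert(r);
    if (r->event == EV_NONE) {
        r->work_queued++;
        queued = true;
    }
    r->event = ev;
    return queued;
}
```

In this model, posting STOP on an idle port queues work, but posting READY
afterwards does not unless something has reset the event to EV_NONE in
between.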

Here's a transition of states that explains the above observation:

1) Port is first in READY state, event is NONE

2) RSCN on shut, port goes to DELETE, event is STOP

3) Before fc_rport_work runs, RSCN on no-shut, port goes to RESTART, event is
still STOP

4) fc_rport_work gets scheduled, removes the port from transport, sees state
as RESTART, begins the PLOGI state machine, event remains as STOP (event NOT
changed to NONE, this is the bug)

5) PLOGI state machine completes, port state goes to READY, event goes to
READY, but no work is scheduled since event was STOP (non-NONE) before.
fc_rport_work is not scheduled, port remains in READY state, but is not added
to transport.

Things are broken at this point. The libfc rport is ready, but no transport
rport is created.

6) Now a shut causes port state to change to DELETE, event to change to STOP,
no work gets scheduled

7) A no-shut causes port state to change to RESTART, event remains at STOP,
no work gets scheduled

(6) and (7) now get repeated every time we do shut/no-shut. There is no way
to get out of this state. Fcc reset does not help either.

The only way to get out is to unload and reload the module.

The fix is to set rdata->event to NONE while processing the STOP/LOGO/FAILED
events, inside the discovery and rport locks.
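
The transition sequence and the effect of the fix can be replayed in a small
user-space model. This is a sketch under simplifying assumptions, not the
fc_rport.c implementation: all names are hypothetical, the PLOGI exchange is
modelled as completing immediately, and locking is omitted (in the kernel the
reset would happen with the discovery and rport locks held).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the libfc data structures. */
enum state { ST_READY, ST_DELETE, ST_RESTART };
enum event { EV_NONE, EV_READY, EV_STOP };

struct rport {
    enum state state;
    enum event event;
    bool work_pending;  /* work item queued but not yet run */
    bool in_transport;  /* registered with the SCSI transport */
};

/* Post an event; work is queued only if no event was already pending. */
static void post_event(struct rport *r, enum event ev)
{
    if (r->event == EV_NONE)
        r->work_pending = true;
    r->event = ev;
}

static void shut(struct rport *r)        /* RSCN: port went away */
{
    r->state = ST_DELETE;
    post_event(r, EV_STOP);
}

static void no_shut(struct rport *r)     /* RSCN: port came back */
{
    if (r->state == ST_DELETE)
        r->state = ST_RESTART;           /* event is left untouched */
}

static void login_done(struct rport *r)  /* PLOGI state machine completed */
{
    r->state = ST_READY;
    post_event(r, EV_READY);
}

/* Worker; does nothing unless work was queued. */
static void run_work(struct rport *r, bool fixed)
{
    if (!r->work_pending)
        return;
    r->work_pending = false;

    if (r->event == EV_READY) {
        r->event = EV_NONE;
        r->in_transport = true;
    } else if (r->event == EV_STOP) {
        r->in_transport = false;
        if (fixed)
            r->event = EV_NONE;          /* the fix: clear the handled event */
        if (r->state == ST_RESTART)
            login_done(r);               /* PLOGI modelled as immediate */
    }
}

/* Run shut/no-shut cycles from READY/NONE; report transport registration. */
static bool registered_after(int cycles, bool fixed)
{
    struct rport r = { ST_READY, EV_NONE, false, true };

    assert(cycles >= 0);
    for (int i = 0; i < cycles; i++) {
        shut(&r);
        no_shut(&r);
        run_work(&r, fixed);  /* handles STOP, if work was queued */
        run_work(&r, fixed);  /* handles READY, if work was queued */
    }
    return r.in_transport;
}
```

With fixed set to false, the model leaves the port unregistered after the
first cycle and every later one; with fixed set to true, each cycle ends with
the port registered again.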

Signed-off-by: Abhijeet Joglekar <abjoglek@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-12-12 16:29:47 -06:00
fc_disc.c [SCSI] libfc fcoe: increase ELS and CT timeouts 2009-12-04 12:01:27 -06:00
fc_elsct.c [SCSI] libfc: fix fc_els_resp_type to correct display of CT responses 2009-12-04 12:01:17 -06:00
fc_exch.c [SCSI] libfc: fix an issue of pending exch/es after i/f destroyed or rmmod fcoe 2009-12-04 12:01:26 -06:00
fc_fcp.c [SCSI] libfc, fcoe: fixes for highmem skb linearize panics 2009-12-04 12:01:25 -06:00
fc_frame.c [SCSI] libfc, fcoe: fixes for highmem skb linearize panics 2009-12-04 12:01:25 -06:00
fc_libfc.c [SCSI] libfc: Formatting cleanups across libfc 2009-12-04 12:01:07 -06:00
fc_libfc.h [SCSI] libfc: Formatting cleanups across libfc 2009-12-04 12:01:07 -06:00
fc_lport.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-12-09 19:43:33 -08:00
fc_npiv.c [SCSI] libfc: vport link handling and fc_vport state managment 2009-12-04 12:00:57 -06:00
fc_rport.c [SCSI] libfc: remote port gets stuck in restart state without really restarting 2009-12-12 16:29:47 -06:00
Makefile [SCSI] libfc: add some generic NPIV support routines to libfc 2009-12-04 12:00:56 -06:00