linux/drivers
Steffen Maier 60a161b7e5 scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
Suppose adapter (open) recovery is between opened QDIO queues and before
(the end of) initial posting of status read buffers (SRBs). This time
window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING
causing by design looping with exponential increase sleeps in the function
performing exchange config data during recovery
[zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up.

Suppose an event occurs for which the FCP channel would send an unsolicited
notification to zfcp by means of a previously posted SRB.  We saw it with
local cable pull (link down) in multi-initiator zoning with multiple
NPIV-enabled subchannels of the same shared FCP channel.

As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial
status read buffers from within the adapter's ERP thread, the channel does
send an unsolicited notification.

Since v2.6.27 commit d26ab06ede ("[SCSI] zfcp: receiving an unsolicted
status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules
adapter->stat_work to re-fill the just consumed SRB from a work item.

Now the ERP thread and the work item post SRBs in parallel.  Both contexts
call the helper function zfcp_status_read_refill().  The tracking of
missing (to be posted / re-filled) SRBs is not thread-safe due to separate
atomic_read() and atomic_dec(), in order to depend on posting
success. Hence, both contexts can see
atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts
one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue
(trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in
zfcp_qdio_handler_error().

An obvious and seemingly clean fix would be to schedule stat_work from the
ERP thread and wait for it to finish. This would serialize all SRB
re-fills. However, we already have another work item wait on the ERP
thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls
zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the
open port recoveries during zfcp auto port scan, but in fact it waits for
any pending recovery including an adapter recovery. This approach leads to
a deadlock.  [see also v3.19 commit 18f87a67e6 ("zfcp: auto port scan
resiliency"); v2.6.37 commit d3e1088d68
("[SCSI] zfcp: No ERP escalation on gpn_ft eval");
v2.6.28 commit fca55b6fb5
("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP")
fixing v2.6.27 commit c57a39a45a
("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port");
v2.6.27 commit cc8c282963
("[SCSI] zfcp: Automatically attach remote ports")]

Instead make the accounting of missing SRBs atomic for parallel execution
in both the ERP thread and adapter->stat_work.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: d26ab06ede ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall")
Cc: <stable@vger.kernel.org> #2.6.27+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-07 22:03:24 -05:00
..
accessibility
acpi pwm: Changes for v4.20-rc1 2018-11-02 11:22:45 -07:00
amba
android
ata libata: Apply NOLPM quirk for SAMSUNG MZ7TD256HAFV-000L9 2018-10-26 08:21:04 -06:00
atm atm: zatm: Fix empty body Clang warnings 2018-10-18 15:39:10 -07:00
auxdisplay The Compiler Attributes series 2018-11-01 18:34:46 -07:00
base mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock 2018-10-31 08:54:17 -07:00
bcma
block for-linus-20181102 2018-11-02 11:25:48 -07:00
bluetooth Merge branch 'work.tty-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-10-24 14:43:41 +01:00
bus ARM: SoC driver updates for 4.17 2018-10-29 15:16:01 -07:00
cdrom gdrom: fix mistake in assignment of error 2018-10-25 11:17:40 -06:00
char RTC for 4.20 2018-10-27 09:24:24 -07:00
clk This time it looks like a quieter release cycle in the clk tree. I guess that's 2018-10-31 11:08:30 -07:00
clocksource Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-11-04 08:15:15 -08:00
connector
cpufreq cpufreq: remove unused arm_big_little_dt driver 2018-10-25 18:39:02 +02:00
cpuidle More power management updates for 4.20-rc1 2018-10-30 09:08:07 -07:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-10-25 16:43:35 -07:00
dax
dca
devfreq
dio
dma pci-v4.20-changes 2018-10-25 06:50:48 -07:00
dma-buf
edac * skx_edac: Address translation for NVDIMMs (Tony Luck and Qiuxu Zhuo) 2018-11-02 11:17:22 -07:00
eisa
extcon
firewire
firmware Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-11-03 18:25:17 -07:00
fmc
fpga fpga: add devm_fpga_region_create 2018-10-16 11:13:50 +02:00
fsi iov_iter: Separate type from direction and use accessor functions 2018-10-24 00:41:07 +01:00
gnss
gpio pci-v4.20-changes 2018-10-25 06:50:48 -07:00
gpu drm, i915, amdgpu, bridge + core quirk 2018-11-02 10:58:20 -07:00
hid platform-drivers-x86 for v4.20-1 2018-11-01 08:42:21 -07:00
hsi
hv hv_balloon: Replace spin_is_locked() with lockdep 2018-10-15 20:54:17 +02:00
hwmon Lots of small changes to the IPMI driver. Most of the changes 2018-10-23 09:42:05 +01:00
hwspinlock
hwtracing
i2c i2c: Clear client->irq in i2c_device_remove 2018-10-31 23:33:34 +00:00
ide
idle Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
iio Staging/IIO patches for 4.20-rc1 2018-10-29 10:38:10 -07:00
infiniband scsi: target: replace fabric_ops.name with fabric_alias 2018-11-28 18:50:59 -05:00
input Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax 2018-10-28 11:35:40 -07:00
iommu mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
ipack
irqchip irqchip/irq-mvebu-sei: Fix a NULL vs IS_ERR() bug in probe function 2018-11-01 12:38:48 +01:00
isdn Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
leds leds: gpio: set led_dat->gpiod pointer for OF defined GPIO leds 2018-10-26 20:51:36 +02:00
lightnvm
macintosh memblock: stop using implicit alignment to SMP_CACHE_BYTES 2018-10-31 08:54:16 -07:00
mailbox - Convert print users to use the %pOFn format specifier 2018-10-29 10:30:44 -07:00
mcb
md for-linus-20181102 2018-11-02 11:25:48 -07:00
media media updates for v4.20-rc1 2018-10-31 10:53:29 -07:00
memory
memstick
message
mfd chrome-platform for v4.20 2018-10-31 16:47:55 -07:00
misc Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
mmc Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
mtd This pull request contains updates for UBIFS: 2018-11-04 14:46:04 -08:00
mux This is the bulk of GPIO changes for the v4.20 series: 2018-10-23 08:45:05 +01:00
net NTB IDT thermal changes and hook into hwmon, ntb_netdev clean-up of 2018-11-04 08:12:44 -08:00
nfc NFC: nfcmrvl_uart: fix OF child-node lookup 2018-10-23 13:28:53 -05:00
ntb ntb: idt: Alter the driver info comments 2018-11-01 10:33:12 -04:00
nubus
nvdimm libnvdimm for 4.20 2018-10-25 06:31:56 -07:00
nvme for-linus-20181102 2018-11-02 11:25:48 -07:00
nvmem nvmem: hide unused nvmem_find_cell_by_index function 2018-10-15 15:56:15 +02:00
of Devicetree fixes for v4.20-rc1: 2018-11-01 14:45:38 -07:00
opp
oprofile
parisc parisc: Add alternative coding infrastructure 2018-10-17 17:22:26 +02:00
parport
pci Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax 2018-10-28 11:35:40 -07:00
pcmcia powerpc updates for 4.20 2018-10-26 14:36:21 -07:00
perf arm64 updates for 4.20: 2018-10-22 17:30:06 +01:00
phy USB/PHY patches for 4.20-rc1 2018-10-26 08:14:13 -07:00
pinctrl This is the bulk of GPIO changes for the v4.20 series: 2018-10-23 08:45:05 +01:00
platform platform-drivers-x86 for v4.20-1 2018-11-01 08:42:21 -07:00
pnp
power Devicetree updates for 4.20: 2018-10-26 12:09:58 -07:00
powercap Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-10-23 13:32:18 +01:00
pps
ps3
ptp ptp: drop redundant kasprintf() to create worker name 2018-10-28 19:20:06 -07:00
pwm pwm: lpss: Only set update bit if we are actually changing the settings 2018-10-16 13:16:15 +02:00
rapidio
ras
regulator regulator: Regulator updates for next release 2018-10-23 01:54:44 +01:00
remoteproc remoteproc: qcom: q6v5-mss: Register segments/dumpfn for coredump 2018-10-19 12:54:03 -07:00
reset ARM: SoC driver updates for 4.17 2018-10-29 15:16:01 -07:00
rpmsg
rtc rtc: sc27xx: Always read normal alarm when registering RTC device 2018-10-25 02:35:42 +02:00
s390 scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown 2018-12-07 22:03:24 -05:00
sbus
scsi scsi: sun_esp: Use of_node_name_eq for node name comparisons 2018-12-07 21:56:06 -05:00
sfi mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
sh
siox
slimbus
sn
soc soc: ti: QMSS: Fix usage of irq_set_affinity_hint 2018-11-02 11:22:09 -07:00
soundwire
spi - New Drivers 2018-10-25 06:19:15 -07:00
spmi
ssb
staging media updates for v4.20-rc1 2018-10-31 10:53:29 -07:00
target scsi: target: perform t10_wwn ID initialisation in target_alloc_device() 2018-12-07 21:54:33 -05:00
tc
tee
thermal Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2018-10-31 11:28:12 -07:00
thunderbolt
tty mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
uio
usb scsi: target: replace fabric_ops.name with fabric_alias 2018-11-28 18:50:59 -05:00
uwb
vfio VFIO updates for v4.20 2018-10-31 11:01:38 -07:00
vhost scsi: target: replace fabric_ops.name with fabric_alias 2018-11-28 18:50:59 -05:00
video fbdev changes for v4.20: 2018-10-31 11:41:37 -07:00
virt
virtio virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON 2018-10-24 20:57:55 -04:00
visorbus
vlynq
vme
w1 w1: IAD Register is yet readable trough iad sys file. Fix snprintf (%u for unsigned, count for max size). 2018-10-15 20:50:32 +02:00
watchdog watchdog: ts4800: release syscon device node in ts4800_wdt_probe() 2018-10-22 10:16:28 +02:00
xen scsi: target: replace fabric_ops.name with fabric_alias 2018-11-28 18:50:59 -05:00
zorro
Kconfig
Makefile