linux/drivers
Gabriel Krisman Bertazi 7e7cd796f2 scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim
iSCSI suffers from a deadlock in case a management command submitted via
the netlink socket sleeps on an allocation while holding the rx_queue_mutex
if that allocation causes a memory reclaim that writebacks to a failed
iSCSI device.  The recovery procedure can never make progress to recover
the failed disk or abort outstanding IO operations to complete the reclaim
(since rx_queue_mutex is locked), thus locking the system.

Nevertheless, just marking all allocations under rx_queue_mutex as GFP_NOIO
(or locking the userspace process with something like PF_MEMALLOC_NOIO) is
not enough, since the iSCSI command code relies on other subsystems that
try to grab locked mutexes, whose threads are GFP_IO, leading to the same
deadlock. One instance where this situation can be observed is in the
backtraces below, stitched from multiple bugs reports, involving the kobj
uevent sent when a session is created.

The root of the problem is not the fact that iSCSI does GFP_IO allocations,
that is acceptable. The actual problem is that rx_queue_mutex has a very
large granularity, covering every unrelated netlink command execution at
the same time as the error recovery path.

The proposed fix leverages the recently added mechanism to stop failed
connections from the kernel, by enabling it to execute even though a
management command from the netlink socket is being run (rx_queue_mutex is
held), provided that the command is known to be safe.  It splits the
rx_queue_mutex in two mutexes, one protecting from concurrent command
execution from the netlink socket, and one protecting stop_conn from racing
with other connection management operations that might conflict with it.

It is not very pretty, but it is the simplest way to resolve the deadlock.
I considered making it a lock per connection, but some external mutex would
still be needed to deal with iscsi_if_destroy_conn.

The patch was tested by forcing a memory shrinker (unrelated, but used
bufio/dm-verity) to reclaim iSCSI pages every time
ISCSI_UEVENT_CREATE_SESSION happens, which is reasonable to simulate
reclaims that might happen with GFP_KERNEL on that path.  Then, a faulty
hung target causes a connection to fail during intensive IO, at the same
time a new session is added by iscsid.

The following stacktraces are stiches from several bug reports, showing a
case where the deadlock can happen.

 iSCSI-write
         holding: rx_queue_mutex
         waiting: uevent_sock_mutex

         kobject_uevent_env+0x1bd/0x419
         kobject_uevent+0xb/0xd
         device_add+0x48a/0x678
         scsi_add_host_with_dma+0xc5/0x22d
         iscsi_host_add+0x53/0x55
         iscsi_sw_tcp_session_create+0xa6/0x129
         iscsi_if_rx+0x100/0x1247
         netlink_unicast+0x213/0x4f0
         netlink_sendmsg+0x230/0x3c0

 iscsi_fail iscsi_conn_failure
         waiting: rx_queue_mutex

         schedule_preempt_disabled+0x325/0x734
         __mutex_lock_slowpath+0x18b/0x230
         mutex_lock+0x22/0x40
         iscsi_conn_failure+0x42/0x149
         worker_thread+0x24a/0xbc0

 EventManager_
         holding: uevent_sock_mutex
         waiting: dm_bufio_client->lock

         dm_bufio_lock+0xe/0x10
         shrink+0x34/0xf7
         shrink_slab+0x177/0x5d0
         do_try_to_free_pages+0x129/0x470
         try_to_free_mem_cgroup_pages+0x14f/0x210
         memcg_kmem_newpage_charge+0xa6d/0x13b0
         __alloc_pages_nodemask+0x4a3/0x1a70
         fallback_alloc+0x1b2/0x36c
         __kmalloc_node_track_caller+0xb9/0x10d0
         __alloc_skb+0x83/0x2f0
         kobject_uevent_env+0x26b/0x419
         dm_kobject_uevent+0x70/0x79
         dev_suspend+0x1a9/0x1e7
         ctl_ioctl+0x3e9/0x411
         dm_ctl_ioctl+0x13/0x17
         do_vfs_ioctl+0xb3/0x460
         SyS_ioctl+0x5e/0x90

 MemcgReclaimerD"
         holding: dm_bufio_client->lock
         waiting: stuck io to finish (needs iscsi_fail thread to progress)

         schedule at ffffffffbd603618
         io_schedule at ffffffffbd603ba4
         do_io_schedule at ffffffffbdaf0d94
         __wait_on_bit at ffffffffbd6008a6
         out_of_line_wait_on_bit at ffffffffbd600960
         wait_on_bit.constprop.10 at ffffffffbdaf0f17
         __make_buffer_clean at ffffffffbdaf18ba
         __cleanup_old_buffer at ffffffffbdaf192f
         shrink at ffffffffbdaf19fd
         do_shrink_slab at ffffffffbd6ec000
         shrink_slab at ffffffffbd6ec24a
         do_try_to_free_pages at ffffffffbd6eda09
         try_to_free_mem_cgroup_pages at ffffffffbd6ede7e
         mem_cgroup_resize_limit at ffffffffbd7024c0
         mem_cgroup_write at ffffffffbd703149
         cgroup_file_write at ffffffffbd6d9c6e
         sys_write at ffffffffbd6662ea
         system_call_fastpath at ffffffffbdbc34a2

Link: https://lore.kernel.org/r/20200520022959.1912856-1-krisman@collabora.com
Reported-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-26 21:20:24 -04:00
..
accessibility
acpi More ACPI updates for 5.7-rc1 2020-04-10 09:52:15 -07:00
amba Revert "amba: Initialize dma_parms for amba devices" 2020-04-01 08:03:28 +02:00
android
ata ahci: Add Intel Comet Lake PCH RAID PCI ID 2020-04-09 09:31:38 -06:00
atm .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
auxdisplay
base mm/memory_hotplug: allow to specify a default online_type 2020-04-07 10:43:41 -07:00
bcma
block xen: branch for v5.7-rc1b 2020-04-10 17:20:06 -07:00
bluetooth
bus ARM: driver updates 2020-04-03 15:05:35 -07:00
cdrom
char Merge branch 'akpm' (patches from Andrew) 2020-04-10 17:57:48 -07:00
clk There's not much to see in the core framework this time around. Instead the 2020-04-05 10:43:32 -07:00
clocksource clocksource/drivers/timer-vf-pit: Add missing parenthesis 2020-04-05 09:24:58 +02:00
connector
counter
cpufreq Additional power management updates for 5.7-rc1 2020-04-06 10:14:39 -07:00
cpuidle Merge branch 'pm-cpuidle' 2020-04-10 11:32:22 +02:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-04-08 21:35:29 -07:00
dax dax: Move mandatory ->zero_page_range() check in alloc_dax() 2020-04-02 19:15:03 -07:00
dca
devfreq PM / devfreq: Fix handling dev_pm_qos_remove_request result 2020-03-25 08:35:03 +09:00
dio
dma drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings 2020-04-10 15:36:22 -07:00
dma-buf A bunch of fixes to avoid null pointer dereference in fbcon, fix a return 2020-04-08 09:14:34 +10:00
edac Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-30 16:40:08 -07:00
eisa .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
extcon Char/Misc driver patches for 5.7-rc1 2020-04-03 13:22:40 -07:00
firewire
firmware sound fixes for 5.7-rc1 2020-04-10 12:27:06 -07:00
fpga
fsi
gnss
gpio This is the bulk of GPIO development for the v5.7 kernel cycle. 2020-04-04 10:27:00 -07:00
gpu Kbuild updates for v5.7 (2nd) 2020-04-11 09:46:12 -07:00
greybus
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2020-04-01 15:18:42 -07:00
hsi
hv hv_balloon: don't check for memhp_auto_online manually 2020-04-07 10:43:40 -07:00
hwmon change email address for Pali Rohár 2020-04-10 15:36:22 -07:00
hwspinlock hwspinlock: hwspinlock_internal.h: Replace zero-length array with flexible-array member 2020-03-25 22:30:46 -07:00
hwtracing
i2c Merge branch 'i2c/for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2020-04-02 15:54:13 -07:00
i3c i3c: convert to use i2c_new_client_device() 2020-03-29 10:35:50 +02:00
ide drivers/ide: Fix build regression. 2020-04-04 18:07:59 -07:00
idle Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-30 16:40:08 -07:00
iio chrome platform changes for 5.7 2020-04-08 21:25:49 -07:00
infiniband RDMA 5.7 pull request 2020-04-01 18:18:18 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2020-04-07 20:20:12 -07:00
interconnect
iommu Merge branches 'iommu/fixes', 'arm/qcom', 'arm/omap', 'arm/smmu', 'x86/amd', 'x86/vt-d', 'virtio' and 'core' into next 2020-03-27 11:33:27 +01:00
ipack
irqchip Two reverts addressing regressions of the Xilinx interrupt controller 2020-04-05 11:57:12 -07:00
isdn
leds leds: core: Fix warning message when init_data 2020-04-06 23:12:08 +02:00
lightnvm for-5.7/drivers-2020-03-29 2020-03-30 11:43:51 -07:00
macintosh Char/Misc driver patches for 5.7-rc1 2020-04-03 13:22:40 -07:00
mailbox
mcb
md libnvdimm for 5.7 2020-04-08 21:03:40 -07:00
media Power management updates for 5.7-rc1 2020-03-30 15:05:01 -07:00
memory ARM: driver updates 2020-04-03 15:05:35 -07:00
memstick
message scsi: docs: fusion: get rid of a doc build warning 2020-04-13 14:20:00 -04:00
mfd mfd: intel-lpss: Fix Intel Elkhart Lake LPSS I2C input clock 2020-03-30 07:35:28 +01:00
misc virtio: fixes, vdpa 2020-04-08 10:51:53 -07:00
mmc MMC core: 2020-03-31 16:13:09 -07:00
most
mtd This pull request contains fixes for UBI and UBIFS: 2020-04-07 12:40:56 -07:00
mux
net scsi: qed: Send BW update notifications to the protocol drivers 2020-04-17 17:55:26 -04:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-25 18:58:11 -07:00
ntb pci-v5.7-changes 2020-04-03 14:25:02 -07:00
nubus
nvdimm libnvdimm for 5.7 2020-04-08 21:03:40 -07:00
nvme block-5.7-2020-04-10 2020-04-10 10:06:54 -07:00
nvmem nvmem: core: remove nvmem_sysfs_get_groups() 2020-03-25 19:23:49 +01:00
of Devicetree updates for v5.7: 2020-04-02 17:32:52 -07:00
opp
oprofile
parisc parisc: Replace setup_irq() by request_irq() 2020-04-05 22:05:23 +02:00
parport
pci IOMMU Updates for Linux v5.7 2020-04-08 11:00:00 -07:00
pcmcia pcmcia: remove some unused space characters 2020-03-31 18:48:22 +02:00
perf arm64 updates for 5.7: 2020-03-31 10:05:01 -07:00
phy pci-v5.7-changes 2020-04-03 14:25:02 -07:00
pinctrl This is the bulk of GPIO development for the v5.7 kernel cycle. 2020-04-04 10:27:00 -07:00
platform change email address for Pali Rohár 2020-04-10 15:36:22 -07:00
pnp
power change email address for Pali Rohár 2020-04-10 15:36:22 -07:00
powercap Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-30 16:40:08 -07:00
pps
ps3 powerpc/ps3: Remove an unneeded NULL check 2020-04-03 00:09:59 +11:00
ptp ptp: Avoid deadlocks in the programmable pin code. 2020-03-30 11:16:38 -07:00
pwm pwm: pca9685: Fix PWM/GPIO inter-operation 2020-04-03 21:41:42 +02:00
rapidio
ras
regulator spi/regulator: Updates for v5.7 2020-03-30 14:58:26 -07:00
remoteproc remoteproc/omap: Fix set_load call in omap_rproc_request_timer 2020-04-03 10:47:21 -07:00
reset
rpmsg
rtc - New Drivers 2020-04-07 19:48:52 -07:00
s390 scsi: zfcp: Move allocation of the shost object to after xconf- and xport-data 2020-05-11 23:19:52 -04:00
sbus
scsi scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim 2020-05-26 21:20:24 -04:00
sfi
sh
siox
slimbus
soc RISC-V Patches for the 5.7 Merge Window, Part 1 2020-04-09 10:51:30 -07:00
soundwire Char/Misc driver patches for 5.7-rc1 2020-04-03 13:22:40 -07:00
spi sound updates for 5.7-rc1 2020-04-02 15:50:04 -07:00
spmi
ssb
staging mm/vma: introduce VM_ACCESS_FLAGS 2020-04-10 15:36:21 -07:00
target scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd() 2020-05-26 15:54:39 -04:00
tc
tee ARM: driver updates 2020-04-03 15:05:35 -07:00
thermal - Convert tsens configuration DT binding to yaml (Rajeshwari) 2020-04-07 20:00:16 -07:00
thunderbolt
tty powerpc updates for 5.7 2020-04-05 11:12:59 -07:00
uio
usb SCSI misc on 20200402 2020-04-02 17:03:53 -07:00
vdpa vdpa: move to drivers/vdpa 2020-04-02 10:41:40 -04:00
vfio vfio: Ignore -ENODEV when getting MSI cookie 2020-04-01 13:51:51 -06:00
vhost scsi: vhost: Notify TCM about the maximum sg entries supported per command 2020-05-26 15:52:08 -04:00
video drm fixes for 5.7-rc1 2020-04-07 20:24:34 -07:00
virt
virtio virtio: fixes, vdpa 2020-04-08 10:51:53 -07:00
visorbus
vlynq
vme
w1
watchdog watchdog: Add K3 RTI watchdog support 2020-04-01 11:35:23 +02:00
xen xen: branch for v5.7-rc1b 2020-04-10 17:20:06 -07:00
zorro SPDX patches for 5.7-rc1. 2020-04-03 13:12:26 -07:00
Kconfig virtio: fixes, vdpa 2020-04-08 10:51:53 -07:00
Makefile virtio: fixes, vdpa 2020-04-08 10:51:53 -07:00