linux/drivers/nvme/host
Dongli Zhang 9210c075ce nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll()
There may be a race between nvme_reap_pending_cqes() and nvme_poll(), e.g.,
when doing live reset while polling the nvme device.

      CPU X                        CPU Y
                               nvme_poll()
nvme_dev_disable()
-> nvme_stop_queues()
-> nvme_suspend_io_queues()
-> nvme_suspend_queue()
                               -> spin_lock(&nvmeq->cq_poll_lock);
-> nvme_reap_pending_cqes()
   -> nvme_process_cq()        -> nvme_process_cq()

In the above scenario, the nvme_process_cq() for the same queue may be
running on both CPU X and CPU Y concurrently.

The issue is much easier to reproduce when CONFIG_PREEMPT is enabled in
the kernel. When CONFIG_PREEMPT is disabled,
nvme_stop_queues()-->blk_mq_quiesce_queue() takes longer to wait for the
grace period.

This patch protects nvme_process_cq() with nvmeq->cq_poll_lock in
nvme_reap_pending_cqes().
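
The locking pattern described above can be sketched in userspace as follows.
This is only an illustration, not the kernel patch itself: struct demo_queue
and the demo_*() functions are hypothetical stand-ins for struct nvme_queue,
nvme_process_cq(), nvme_poll() and nvme_reap_pending_cqes(), and a pthread
mutex stands in for the kernel spinlock nvmeq->cq_poll_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct nvme_queue. */
struct demo_queue {
	pthread_mutex_t cq_poll_lock;	/* stands in for nvmeq->cq_poll_lock */
	int pending;			/* completions not yet consumed */
	int handled;			/* completions consumed so far */
	bool busy;			/* true while demo_process_cq() runs */
};

/*
 * Stand-in for nvme_process_cq(). It must never run concurrently for the
 * same queue; the assert trips if a caller forgot to take cq_poll_lock
 * and another caller is already inside.
 */
static int demo_process_cq(struct demo_queue *q)
{
	int found = 0;

	assert(!q->busy);
	q->busy = true;
	while (q->pending > 0) {
		q->pending--;
		q->handled++;
		found++;
	}
	q->busy = false;
	return found;
}

/* Stand-in for nvme_poll(): processes the queue under cq_poll_lock. */
static int demo_poll(struct demo_queue *q)
{
	int found;

	pthread_mutex_lock(&q->cq_poll_lock);
	found = demo_process_cq(q);
	pthread_mutex_unlock(&q->cq_poll_lock);
	return found;
}

/*
 * Stand-in for nvme_reap_pending_cqes() with the fix applied: each queue
 * is processed under the same lock the poll path takes, so the two paths
 * are serialized per queue.
 */
static int demo_reap_pending_cqes(struct demo_queue *queues, int nr)
{
	int total = 0;

	for (int i = 0; i < nr; i++) {
		pthread_mutex_lock(&queues[i].cq_poll_lock);
		total += demo_process_cq(&queues[i]);
		pthread_mutex_unlock(&queues[i].cq_poll_lock);
	}
	return total;
}
```

Taking the lock per queue inside the loop, rather than once around the whole
loop, keeps each critical section short, so a poller on another queue is not
held up while the reaper works through the rest.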

Fixes: fa46c6fb5d ("nvme/pci: move cqe check after device shutdown")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-05-27 20:32:56 +02:00
core.c nvme: fix possible hang when ns scanning fails during error recovery 2020-05-09 16:07:58 -06:00
fabrics.c nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow 2020-03-26 04:51:55 +09:00
fabrics.h nvme: Remove ADMIN_ONLY state 2019-10-14 23:21:44 +09:00
fault_inject.c nvme: enable to inject errors into admin commands 2019-06-21 11:15:50 +02:00
fc.c nvme-fc: Revert "add module to ops template to allow module references" 2020-04-04 09:09:39 +02:00
hwmon.c nvme: hwmon: switch to use <linux/units.h> helpers 2020-01-31 10:30:40 -08:00
Kconfig nvme: Don't deter users from enabling hwmon support 2020-03-26 04:45:25 +09:00
lightnvm.c lightnvm: move metadata mapping to lower level driver 2019-08-06 08:20:10 -06:00
Makefile nvme: Add hardware monitoring support 2019-11-12 01:57:35 +09:00
multipath.c nvme: fix deadlock caused by ANA update wrong locking 2020-04-04 09:07:03 +02:00
nvme.h nvme: Fix controller creation races with teardown flow 2020-03-26 04:51:56 +09:00
pci.c nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll() 2020-05-27 20:32:56 +02:00
rdma.c block-5.7-2020-04-10 2020-04-10 10:06:54 -07:00
tcp.c nvme-tcp: fix possible crash in recv error flow 2020-04-01 11:07:13 +02:00
trace.c nvme: trace: parse Get LBA Status command in detail 2019-08-29 12:55:01 -07:00
trace.h nvme-trace: print result and status in hex format 2019-06-21 11:12:37 +02:00