linux/block
Muchun Song 96a9fe64bf block: fix ordering between checking BLK_MQ_S_STOPPED request adding
Supposing first scenario with a virtio_blk driver.

CPU0                        CPU1

blk_mq_try_issue_directly()
  __blk_mq_issue_directly()
    q->mq_ops->queue_rq()
      virtio_queue_rq()
        blk_mq_stop_hw_queue()
                            virtblk_done()
  blk_mq_request_bypass_insert()  1) store
                              blk_mq_start_stopped_hw_queue()
                                clear_bit(BLK_MQ_S_STOPPED)       3) store
                                blk_mq_run_hw_queue()
                                  if (!blk_mq_hctx_has_pending()) 4) load
                                    return
                                  blk_mq_sched_dispatch_requests()
  blk_mq_run_hw_queue()
    if (!blk_mq_hctx_has_pending())
      return
    blk_mq_sched_dispatch_requests()
      if (blk_mq_hctx_stopped())  2) load
        return
      __blk_mq_sched_dispatch_requests()

Supposing another scenario.

CPU0                        CPU1

blk_mq_requeue_work()
  blk_mq_insert_request() 1) store
                            virtblk_done()
                              blk_mq_start_stopped_hw_queue()
  blk_mq_run_hw_queues()        clear_bit(BLK_MQ_S_STOPPED)       3) store
                                blk_mq_run_hw_queue()
                                  if (!blk_mq_hctx_has_pending()) 4) load
                                    return
                                  blk_mq_sched_dispatch_requests()
    if (blk_mq_hctx_stopped())  2) load
      continue
    blk_mq_run_hw_queue()

Both scenarios are similar, the full memory barrier should be inserted
between 1) and 2), as well as between 3) and 4) to make sure that either
CPU0 sees BLK_MQ_S_STOPPED is cleared or CPU1 sees dispatch list.
Otherwise, either CPU will not rerun the hardware queue causing
starvation of the request.

The easy way to fix it is to add the essential full memory barrier into
helper of blk_mq_hctx_stopped(). In order to not affect the fast path
(hardware queue is not stopped most of the time), we only insert the
barrier into the slow path. Actually, only slow path needs to care about
missing of dispatching the request to the low-level device driver.

Fixes: 320ae51fee ("blk-mq: new multi-queue block IO queueing mechanism")
Cc: stable@vger.kernel.org
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241014092934.53630-4-songmuchun@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22 08:16:40 -06:00
..
partitions block: add partition uuid into uevent as "PARTUUID" 2024-10-22 08:15:17 -06:00
badblocks.c badblocks: avoid checking invalid range in badblocks_check() 2023-12-23 18:38:08 -07:00
bdev.c for-6.12/block-20240925 2024-09-25 14:56:40 -07:00
bfq-cgroup.c block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator() 2024-09-10 16:32:09 -06:00
bfq-iosched.c block, bfq: factor out a helper to split bfqq in bfq_init_rq() 2024-09-10 16:32:09 -06:00
bfq-iosched.h block, bfq: remove bfq_log_bfqg() 2024-09-10 16:32:09 -06:00
bfq-wf2q.c block, bfq: inject I/O to underutilized actuators 2023-01-29 15:18:33 -07:00
bio-integrity.c Linux 6.11 2024-09-17 08:32:53 -06:00
bio.c block: unpin user pages belonging to a folio at once 2024-09-11 07:24:01 -06:00
blk-cgroup-fc-appid.c block: Replace all non-returning strlcpy with strscpy 2023-06-01 09:13:31 -06:00
blk-cgroup-rwstat.c blk-cgroup: use group allocation/free of per-cpu counters API 2024-04-03 09:10:17 -06:00
blk-cgroup-rwstat.h block: Use the new blk_opf_t type 2022-07-14 12:14:30 -06:00
blk-cgroup.c blk-ioprio: remove per-disk structure 2024-07-28 16:47:51 -06:00
blk-cgroup.h blk-cgroup: Remove unused declaration blkg_path() 2024-08-16 15:07:27 -06:00
blk-core.c scsi: block: Don't check REQ_ATOMIC for reads 2024-08-12 18:03:38 -04:00
blk-crypto-fallback.c block, fs: Restore the per-bio/request data lifetime fields 2024-02-06 14:31:05 +01:00
blk-crypto-internal.h blk-crypto: remove blk_crypto_insert_cloned_request() 2023-03-16 09:35:09 -06:00
blk-crypto-profile.c blk-crypto: use dynamic lock class for blk_crypto_profile::lock 2023-07-05 16:36:12 -06:00
blk-crypto-sysfs.c block: make kobj_type structures constant 2023-02-09 09:38:16 -07:00
blk-crypto.c blk-crypto: make blk_crypto_evict_key() more robust 2023-03-16 09:35:09 -06:00
blk-flush.c for-6.11/block-20240710 2024-07-15 14:20:22 -07:00
blk-ia-ranges.c block: make kobj_type structures constant 2023-02-09 09:38:16 -07:00
blk-integrity.c block: fix blk_rq_map_integrity_sg kernel-doc 2024-10-02 07:15:33 -06:00
blk-ioc.c block: replace call_rcu by kfree_rcu for simple kmem_cache_free callback 2024-10-22 08:16:40 -06:00
blk-iocost.c blk_iocost: remove some duplicate irq disable/enables 2024-10-02 07:15:43 -06:00
blk-iolatency.c block: add blk_time_get_ns() and blk_time_get() helpers 2024-02-05 10:07:22 -07:00
blk-ioprio.c blk-ioprio: remove per-disk structure 2024-07-28 16:47:51 -06:00
blk-ioprio.h blk-ioprio: remove per-disk structure 2024-07-28 16:47:51 -06:00
blk-lib.c block: fix detection of unsupported WRITE SAME in blkdev_issue_write_zeroes 2024-08-28 08:49:25 -06:00
blk-map.c block: don't free the integrity payload in bio_integrity_unmap_free_user 2024-07-03 10:21:16 -06:00
blk-merge.c block: kill blk_do_io_stat() helper 2024-10-22 08:14:56 -06:00
blk-mq-cpumap.c blk-mq: include <linux/blk-mq.h> in block/blk-mq.h 2023-04-13 06:52:29 -06:00
blk-mq-debugfs.c block: Catch possible entries missing from rqf_name[] 2024-07-19 09:32:49 -06:00
blk-mq-debugfs.h block: Replace zone_wlock debugfs entry with zone_wplugs entry 2024-04-17 08:44:03 -06:00
blk-mq-pci.c blk-mq: include <linux/blk-mq.h> in block/blk-mq.h 2023-04-13 06:52:29 -06:00
blk-mq-sched.c blk-mq: Remove the hctx 'run' debugfs attribute 2024-01-17 14:16:34 -07:00
blk-mq-sched.h blk-mq: make sure elevator callbacks aren't called for passthrough request 2023-05-18 19:42:54 -06:00
blk-mq-sysfs.c blk-mq: include <linux/blk-mq.h> in block/blk-mq.h 2023-04-13 06:52:29 -06:00
blk-mq-tag.c block: Fix lockdep warning in blk_mq_mark_tag_wait 2024-08-15 19:25:03 -06:00
blk-mq-virtio.c blk-mq: include <linux/blk-mq.h> in block/blk-mq.h 2023-04-13 06:52:29 -06:00
blk-mq.c block: fix ordering between checking BLK_MQ_S_STOPPED request adding 2024-10-22 08:16:40 -06:00
blk-mq.h block: fix ordering between checking BLK_MQ_S_STOPPED request adding 2024-10-22 08:16:40 -06:00
blk-pm.c block: Remove blk_set_runtime_active() 2023-11-20 10:22:40 -07:00
blk-pm.h
blk-rq-qos.c blk-rq-qos: fix crash on rq_qos_wait vs. rq_qos_wake_function race 2024-10-16 07:20:14 -06:00
blk-rq-qos.h block: skip QUEUE_FLAG_STATS and rq-qos for passthrough io 2023-12-01 18:29:18 -07:00
blk-settings.c block: Remove unused blk_limits_io_{min,opt} 2024-09-20 00:19:48 -06:00
blk-stat.c blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW 2024-05-09 09:44:55 -06:00
blk-stat.h block: delete redundant function declaration 2024-05-27 13:58:06 -06:00
blk-sysfs.c block: enable passthrough command statistics 2024-10-22 08:16:32 -06:00
blk-throttle.c Revert "blk-throttle: Fix IO hang for a corner case" 2024-10-22 08:16:40 -06:00
blk-throttle.h blk-throttle: remove last_low_overflow_time 2024-09-10 16:31:41 -06:00
blk-timeout.c
blk-wbt.c blk-wbt: don't throttle swap writes in direct reclaim 2024-07-01 06:51:53 -06:00
blk-wbt.h blk-wbt: remove the separate write cache tracking 2023-12-26 09:28:10 -07:00
blk-zoned.c for-6.11/block-20240710 2024-07-15 14:20:22 -07:00
blk.h block: add support for defining read-only partitions 2024-10-22 08:14:56 -06:00
bounce.c block: split integrity support out of bio.h 2024-07-03 10:21:15 -06:00
bsg-lib.c scsi: bsg: Pass dev to blk_mq_alloc_queue() 2024-05-30 20:22:15 -04:00
bsg.c SCSI misc on 20230629 2023-06-30 11:57:07 -07:00
disk-events.c block: move bdev_mark_dead out of disk_check_media_change 2023-10-28 13:29:23 +02:00
early-lookup.c wrapper for access to ->bd_partno 2024-05-02 17:48:09 -04:00
elevator.c block: return void from the queue_sysfs_entry load_module method 2024-10-22 08:16:22 -06:00
elevator.h block: return void from the queue_sysfs_entry load_module method 2024-10-22 08:16:22 -06:00
fops.c vfs-6.12.blocksize 2024-09-20 17:53:17 -07:00
genhd.c block: introduce add_disk_fwnode() 2024-10-22 08:14:56 -06:00
holder.c block: fix deadlock between bd_link_disk_holder and partition scan 2024-02-23 07:44:19 -07:00
ioctl.c block: implement async io_uring discard cmd 2024-09-11 10:45:28 -06:00
ioprio.c block: move __get_task_ioprio() into header file 2024-01-08 12:27:39 -07:00
Kconfig block: remove the blk_integrity_profile structure 2024-06-14 10:20:06 -06:00
Kconfig.iosched block: Default to use cgroup support for BFQ 2023-01-30 09:42:42 -07:00
kyber-iosched.c blk-mq: pass a flags argument to elevator_type->insert_requests 2023-04-13 06:52:30 -06:00
Makefile block: remove the blk_integrity_profile structure 2024-06-14 10:20:06 -06:00
mq-deadline.c block/mq-deadline: Fix the tag reservation code 2024-07-02 08:47:45 -06:00
opal_proto.h block: sed-opal: handle empty atoms when parsing response 2024-02-16 15:52:45 -07:00
sed-opal.c block: sed-opal: add ioctl IOC_OPAL_SET_SID_PW 2024-10-22 08:16:40 -06:00
t10-pi.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00