linux/drivers/ufs/core
Ziqi Chen 77691af484 scsi: ufs: core: Quiesce request queues before checking pending cmds
In ufshcd_clock_scaling_prepare(), after the SCSI layer is blocked,
ufshcd_pending_cmds() is called to check whether there are any pending
transactions. Only if there are none can we proceed to kickstart the clock
scaling sequence.

ufshcd_pending_cmds() traverses all SCSI devices and calls
sbitmap_weight() on their budget_map. sbitmap_weight() can be broken down
into three steps:

 1. Count the number of bits set in the 'word' bitmap.

 2. Count the number of bits set in the 'cleared' bitmap.

 3. Subtract the result of step 2 from the result of step 1.
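The three steps can be modelled in a few lines of C. This is a minimal sketch, assuming a single-word bitmap: struct model_sbitmap, model_popcount() and model_weight() are hypothetical names, not the kernel's struct sbitmap or lib/sbitmap.c. A bit in 'cleared' marks a token that was freed but whose bit in 'word' has not been cleared yet (deferred clearing), which is why the weight is a difference of two counts.

```c
#include <assert.h>

/*
 * Hypothetical one-word model of an sbitmap with deferred clearing.
 * The field names follow the commit message, not the kernel's layout.
 */
struct model_sbitmap {
	unsigned long word;	/* bit N set: token N was allocated */
	unsigned long cleared;	/* bit N set: token N was freed, clear deferred */
};

/* Count the bits set in x. */
static unsigned int model_popcount(unsigned long x)
{
	unsigned int n = 0;

	for (; x; x &= x - 1)	/* each iteration drops the lowest set bit */
		n++;
	return n;
}

/* The three steps, performed as separate reads of the bitmap. */
static unsigned int model_weight(const struct model_sbitmap *sb)
{
	unsigned int in_word, in_cleared;

	/* In this model a freed token is always still set in 'word'. */
	assert((sb->cleared & ~sb->word) == 0);

	in_word = model_popcount(sb->word);		/* step 1 */
	in_cleared = model_popcount(sb->cleared);	/* step 2 */
	return in_word - in_cleared;			/* step 3 */
}
```

For example, both { .word = 0x1, .cleared = 0x0 } and { .word = 0x3, .cleared = 0x2 } have a weight of 1: one token is still in use. The result is only correct if nothing changes the bitmap between step 1 and step 2.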

This can lead to a race condition as outlined below:

Assume there is one pending transaction in the request queue of one SCSI
device, say sda, and that its budget token is 0, so 'word' is 0x1 and
'cleared' is 0x0.

 1. When step 1 executes, it yields 1.

 2. Before step 2 executes, the block layer tries to dispatch a new request
    to sda. Since the SCSI layer is blocked, the request cannot pass through
    SCSI, but the block layer still calls budget_get() and budget_put() on
    sda's budget map, so 'word' becomes 0x3 and 'cleared' becomes 0x2
    (assuming the new request got budget token 1).

 3. When step 2 executes, it yields 1.

 4. When step 3 executes, it yields 0, meaning there are no pending
    transactions, which is wrong.

    Thread A                        Thread B
    ufshcd_pending_cmds()           __blk_mq_sched_dispatch_requests()
    |                               |
    sbitmap_weight(word)            |
    |                               scsi_mq_get_budget()
    |                               |
    |                               scsi_mq_put_budget()
    |                               |
    sbitmap_weight(cleared)
    ...
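The interleaving above can be replayed deterministically in user space. This is a minimal sketch under stated assumptions: the names are hypothetical, the bitmap is a single word, model_get_budget()/model_put_budget() only stand in for the real budget callbacks, and the 'between' hook plays the role of Thread B being scheduled between step 1 and step 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-word model of a budget sbitmap with deferred clearing. */
struct model_sbitmap {
	unsigned long word;	/* bit N set: token N allocated */
	unsigned long cleared;	/* bit N set: token N freed, clear deferred */
};

static unsigned int model_popcount(unsigned long x)
{
	unsigned int n = 0;

	for (; x; x &= x - 1)
		n++;
	return n;
}

/* Stand-in for getting a budget token: set its bit in 'word'. */
static void model_get_budget(struct model_sbitmap *sb, unsigned int nr)
{
	assert(nr < 8 * sizeof(sb->word));
	sb->word |= 1UL << nr;
}

/* Stand-in for putting a budget token: deferred free updates 'cleared' only. */
static void model_put_budget(struct model_sbitmap *sb, unsigned int nr)
{
	assert(nr < 8 * sizeof(sb->word));
	sb->cleared |= 1UL << nr;
}

/*
 * Weight computed in two separate reads. 'between' (may be NULL) runs
 * after step 1 and before step 2, like Thread B in the diagram.
 */
static unsigned int model_weight_racy(struct model_sbitmap *sb,
				      void (*between)(struct model_sbitmap *))
{
	unsigned int set = model_popcount(sb->word);	/* step 1 */

	if (between)
		between(sb);
	return set - model_popcount(sb->cleared);	/* steps 2 and 3 */
}

/* Thread B: a request that cannot pass the blocked SCSI layer gets and
 * then puts budget token 1. */
static void thread_b_dispatch(struct model_sbitmap *sb)
{
	model_get_budget(sb, 1);	/* word: 0x1 -> 0x3 */
	model_put_budget(sb, 1);	/* cleared: 0x0 -> 0x2 */
}
```

Starting from { .word = 0x1, .cleared = 0x0 }, model_weight_racy(&sda, thread_b_dispatch) returns 0 even though one request is still pending; re-reading the now stable bitmap with model_weight_racy(&sda, NULL) returns the correct value of 1.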

When this race condition happens, the clock scaling sequence is started
with transactions still in flight, leading to subsequent hibernate enter
failures, a broken link, task aborts and back-to-back error recovery.

Fix this race condition by quiescing the request queues before calling
ufshcd_pending_cmds() so that the block layer won't touch the budget map
while ufshcd_pending_cmds() is working on it. In addition, remove the SCSI
layer blocking/unblocking to reduce redundancy and latency.
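The effect of the fix can be sketched with the same kind of model. This is a single-threaded illustration, assuming a hypothetical 'quiesced' flag on a one-word budget map: a dispatch attempt on a quiesced queue returns before touching the budget, so the two reads in the pending check see the same bitmap. The real blk-mq quiesce also waits for dispatches already in progress to finish, which this model does not capture; all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue model: one-word budget map plus a quiesced flag. */
struct model_queue {
	unsigned long word;	/* bit N set: budget token N allocated */
	unsigned long cleared;	/* bit N set: token N freed, clear deferred */
	bool quiesced;
};

static unsigned int model_popcount(unsigned long x)
{
	unsigned int n = 0;

	for (; x; x &= x - 1)
		n++;
	return n;
}

/*
 * Dispatch attempt for a request that cannot get past the SCSI layer:
 * the token is taken and immediately returned. A quiesced queue bails
 * out before the budget map is touched. Returns true if the map changed.
 */
static bool model_dispatch(struct model_queue *q, unsigned int token)
{
	assert(token < 8 * sizeof(q->word));
	if (q->quiesced)
		return false;
	q->word |= 1UL << token;	/* get budget */
	q->cleared |= 1UL << token;	/* put budget (deferred clear) */
	return true;
}

/* Pending check with a window between the two reads; 'racer' may be NULL. */
static unsigned int model_pending(struct model_queue *q,
				  void (*racer)(struct model_queue *))
{
	unsigned int set = model_popcount(q->word);

	if (racer)
		racer(q);
	return set - model_popcount(q->cleared);
}

/* Fixed ordering: quiesce first, then count. */
static unsigned int model_quiesce_and_check(struct model_queue *q,
					    void (*racer)(struct model_queue *))
{
	unsigned int pending;

	q->quiesced = true;	/* no new budget get/put from here on */
	pending = model_pending(q, racer);
	q->quiesced = false;	/* unquiesced right away, for simplicity */
	return pending;
}

static void racer_token1(struct model_queue *q)
{
	model_dispatch(q, 1);
}
```

With the same starting state (token 0 held), model_pending(&q, racer_token1) on an unquiesced queue reports 0, while model_quiesce_and_check(&q, racer_token1) reports 1 and leaves the budget map untouched.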

Fixes: 8d077ede48 ("scsi: ufs: Optimize the command queueing code")
Co-developed-by: Can Guo <quic_cang@quicinc.com>
Signed-off-by: Can Guo <quic_cang@quicinc.com>
Signed-off-by: Ziqi Chen <quic_ziqichen@quicinc.com>
Link: https://lore.kernel.org/r/1717754818-39863-1-git-send-email-quic_ziqichen@quicinc.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-11 21:22:33 -04:00
Kconfig scsi: ufs: core: Remove HPB support 2023-07-23 16:40:39 -04:00
Makefile scsi: ufs: core: Remove HPB support 2023-07-23 16:40:39 -04:00
ufs_bsg.c scsi: bsg: Pass queue_limits to bsg_setup_queue() 2024-04-11 21:37:48 -04:00
ufs_bsg.h
ufs-debugfs.c
ufs-debugfs.h
ufs-fault-injection.c scsi: ufs: core: Make fault injection dynamically configurable per HBA 2023-11-24 19:23:35 -05:00
ufs-fault-injection.h scsi: ufs: core: Make fault injection dynamically configurable per HBA 2023-11-24 19:23:35 -05:00
ufs-hwmon.c scsi: ufs: Rename a function argument 2023-07-31 15:17:50 -04:00
ufs-mcq.c scsi: ufs: mcq: Fix error output and clean up ufshcd_mcq_abort() 2024-05-30 20:40:48 -04:00
ufs-sysfs.c scsi: ufs: core: Add CPU latency QoS support for UFS driver 2024-01-23 21:00:02 -05:00
ufs-sysfs.h
ufshcd-crypto.c scsi: ufs: Ungate the clock synchronously 2023-05-31 11:44:01 -04:00
ufshcd-crypto.h scsi: ufs: Simplify transfer request header initialization 2023-07-31 15:17:51 -04:00
ufshcd-priv.h scsi: ufs: Improve type safety 2023-07-31 15:17:50 -04:00
ufshcd.c scsi: ufs: core: Quiesce request queues before checking pending cmds 2024-06-11 21:22:33 -04:00