Commit Graph

1233514 Commits

Author SHA1 Message Date
Coly Li
146e843f6b badblocks: avoid checking invalid range in badblocks_check()
If prev_badblocks() returns '-1', it means no valid badblocks record
before the checking range. It doesn't make sense to check whether
the input checking range is overlapped with the non-existed invalid
front range.

This patch checkes whether 'prev >= 0' is true before calling
overlap_front(), to void such invalid operations.

Fixes: 3ea3354cb9 ("badblocks: improve badblocks_check() for multiple ranges handling")
Reported-and-tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/nvdimm/3035e75a-9be0-4bc3-8d4a-6e52c207f277@leemhuis.info/
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Geliang Tang <geliang.tang@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: NeilBrown <neilb@suse.de>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: Xiao Ni <xni@redhat.com>
Link: https://lore.kernel.org/r/20231224002820.20234-1-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-23 18:38:08 -07:00
Jens Axboe
13d822bf1c nvme fixes for Linux 6.7
- Revert a commit with improper sleep context (Keith)
  - Fix async event handling sleep context (Maurizio)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE3Fbyvv+648XNRdHTPe3zGtjzRgkFAmWErcoACgkQPe3zGtjz
 RglIchAAy/XX+4IdJkA6jhr8r9gNjVMQRkAeoTwIPFqV81ixELqjgc1DfKO58YrM
 2U2Rt4jfrLSzJSQF0DKYSSDa16ztlWd6i1y9Zyl9sxv0UvUm+zv4VzNzNkFMHqEl
 s5IiyPeBQUVqHvWtkNF9p+I24J3yzYtQ/5cWqBFRgCWVX73AwqgnP6Qmsa5tDue+
 eG+0PTZg2uyq64rZtX6m3UVsERat96iU3x4sXLCNkWyKlbMjZGNCumKZCbBkty4N
 coFF6PNv5Sw9M7ywYa5Bjc20qP/9FqMRtkaT6DPMi3nl0QiIRs7QLMdEukpho2Q/
 pNpgs0KsG4TxXIx0QDXzirXRDE0+ShmHd6JYDqOcA9omGtsy+h2QY4zhILXHd7Sc
 0tchIY4XVxzULObDnJU8FQYt/QXH9VWN97RWCqayv/Iynjv7PNStGZOkInpQeA6K
 Gm25q9LDCWA0pfb0hQcUiT/CMAgRwg6ze9m23FBAUvsf2gIogIlw6RdpEUCPZ51t
 bVUHvCv8p36RSw/NLE0vW4GWafMlKJgabZFjH2RNu6jya2Rz0/eW/rnoqV5vQkR0
 4ZFVdDbD6ATXcDu+Ig02RVFjCifnYNJXtDH3En2eLRo6/nGXAEz4zva4rSNAFGLN
 GP1qu5kEEUST6zMz4yWQxbDrZ3q2wFM448ZI6XLx+UN6L3SQF2c=
 =Q6dO
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.7-2023-12-21' of git://git.infradead.org/nvme into block-6.7

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.7

 - Revert a commit with improper sleep context (Keith)
 - Fix async event handling sleep context (Maurizio)"

* tag 'nvme-6.7-2023-12-21' of git://git.infradead.org/nvme:
  nvme-pci: fix sleeping function called from interrupt context
  Revert "nvme-fc: fix race between error recovery and creating association"
2023-12-21 14:32:35 -07:00
Maurizio Lombardi
f6fe0b2d35 nvme-pci: fix sleeping function called from interrupt context
the nvme_handle_cqe() interrupt handler calls nvme_complete_async_event()
but the latter may call nvme_auth_stop() which is a blocking function.
Sleeping functions can't be called in interrupt context

 BUG: sleeping function called from invalid context
 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/15
  Call Trace:
     <IRQ>
      __cancel_work_timer+0x31e/0x460
      ? nvme_change_ctrl_state+0xcf/0x3c0 [nvme_core]
      ? nvme_change_ctrl_state+0xcf/0x3c0 [nvme_core]
      nvme_complete_async_event+0x365/0x480 [nvme_core]
      nvme_poll_cq+0x262/0xe50 [nvme]

Fix the bug by moving nvme_auth_stop() to fw_act_work
(executed by the nvme_wq workqueue)

Fixes: f50fff73d6 ("nvme: implement In-Band authentication")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-19 12:41:05 -08:00
Keith Busch
d3e8b18587 Revert "nvme-fc: fix race between error recovery and creating association"
The commit was identified to might sleep in invalid context and is
blocking regression testing.

This reverts commit ee6fdc5055.

Link: https://lore.kernel.org/linux-nvme/hkhl56n665uvc6t5d6h3wtx7utkcorw4xlwi7d2t2bnonavhe6@xaan6pu43ap6/
Link: https://lists.infradead.org/pipermail/linux-nvme/2023-December/043756.html
Reported-by: Daniel Wagner <dwagner@suse.de>
Reported-by: Maurizio Lombardi <mlombard@redhat.com>
Cc: Michael Liang <mliang@purestorage.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-19 08:58:46 -08:00
Jens Axboe
c6d3ab9e76 Merge tag 'md-fixes-20231207-1' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7
Pull MD fix from Song:

"This change from Yu Kuai fixes a bug reported in
 https://bugzilla.kernel.org/show_bug.cgi?id=218200"

* tag 'md-fixes-20231207-1' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: split MD_RECOVERY_NEEDED out of mddev_resume
2023-12-07 12:15:18 -07:00
Yu Kuai
b39113349d md: split MD_RECOVERY_NEEDED out of mddev_resume
New mddev_resume() calls are added to synchronize IO with array
reconfiguration, however, this introduces a performance regression while
adding it in md_start_sync():

1) someone sets MD_RECOVERY_NEEDED first;
2) daemon thread grabs reconfig_mutex, then clears MD_RECOVERY_NEEDED and
   queues a new sync work;
3) daemon thread releases reconfig_mutex;
4) in md_start_sync
   a) check that there are spares that can be added/removed, then suspend
      the array;
   b) remove_and_add_spares may not be called, or called without really
      add/remove spares;
   c) resume the array, then set MD_RECOVERY_NEEDED again!

Loop between 2 - 4, then mddev_suspend() will be called quite often, for
consequence, normal IO will be quite slow.

Fix this problem by don't set MD_RECOVERY_NEEDED again in md_start_sync(),
hence the loop will be broken.

Fixes: bc08041b32 ("md: suspend array in md_start_sync() if array need reconfiguration")
Suggested-by: Song Liu <song@kernel.org>
Reported-by: Janpieter Sollie <janpieter.sollie@edpnet.be>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218200
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231207020724.2797445-1-yukuai1@huaweicloud.com
2023-12-07 10:19:47 -08:00
Jens Axboe
22b9a8964e nvme fixes for Linux 6.7
- Proper nvme ctrl state setting (Keith)
  - Passthrough command optimization (Keith)
  - Spectre fix (Nitesh)
  - Kconfig clarifications (Shin'ichiro)
  - Frozen state deadlock fix (Bitao)
  - Power setting quirk (Georg)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE3Fbyvv+648XNRdHTPe3zGtjzRgkFAmVx+w4ACgkQPe3zGtjz
 RgnVFBAA1tQ7l2MCUB4j70q4+pCoWVVP+ZX2iiy9g55flK7SQlDs0FkniusTYidC
 dHQt28b4S7nPo27N1F857Q92/Fn2OQ7prPCViiXVWJejylAcZK5gaIXOCCchCPyn
 3ktfROPkSfdyo3gTCieNdA7rdh/SJMsam+Qy8Fqz5Z/aqlvmWVSDSaify/xlbOd0
 mZcqvFZyU2w0ILV8vhAldFlT2rHpNezvttqfHoJCS7g7Bcsz0VOGbjzrwEfRq0Sp
 MRodnoaGPcgl0TDeiYoZjGjR4pa4CD18I5L6jL5g0WEsyGoCMd2XuKXu5aZMtayZ
 PzHXRTMLR52EXqXIW/XIbbH3eTe0kmw75reWoIRGi/4uC5H7Ge8owXStSsfnTMA5
 l83yaw0I3DvHYvvtXEzZ1lkgBUSwO37RAbc7XOmh1vT5dkx44N1T8wZZUp67Atg8
 vL63fTXpVUHwo8C2bfOWGME3gEz7eeWHlIH9RKiApMKKufkxe/twEA2Sf/QI4Ljv
 SHsoqVLhcS1LtnDeezyOzhzV6xf6rZ0GNw/948+7sEsLrW/qE2ilwt0s9BrijNBe
 pdt2iaWGWAYZGlw7x0MqNrlQ3ZEpqWvUbs+Jrv7dHnNPgTxOlIgFGR6wCOdTDlCG
 gNwpbsQlhBBRe059cXntrS315VKUMtoj5Tl7o25nA+QVehARd8w=
 =XzMU
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.7-2023-12-7' of git://git.infradead.org/nvme into block-6.7

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.7

 - Proper nvme ctrl state setting (Keith)
 - Passthrough command optimization (Keith)
 - Spectre fix (Nitesh)
 - Kconfig clarifications (Shin'ichiro)
 - Frozen state deadlock fix (Bitao)
 - Power setting quirk (Georg)"

* tag 'nvme-6.7-2023-12-7' of git://git.infradead.org/nvme:
  nvme-pci: Add sleep quirk for Kingston drives
  nvme: fix deadlock between reset and scan
  nvme: prevent potential spectre v1 gadget
  nvme: improve NVME_HOST_AUTH and NVME_TARGET_AUTH config descriptions
  nvme-ioctl: move capable() admin check to the end
  nvme: ensure reset state check ordering
  nvme: introduce helper function to get ctrl state
2023-12-07 10:30:54 -07:00
Georg Gottleuber
107b4e063d nvme-pci: Add sleep quirk for Kingston drives
Some Kingston NV1 and A2000 are wasting a lot of power on specific TUXEDO
platforms in s2idle sleep if 'Simple Suspend' is used.

This patch applies a new quirk 'Force No Simple Suspend' to achieve a
low power sleep without 'Simple Suspend'.

Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-07 08:53:20 -08:00
Jens Axboe
7d2affce33 Merge tag 'md-fixes-20231206' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7
Pull MD fixes from Song:

"This set from Yu Kuai fixes issues around sync_work, which was introduced
 in 6.7 kernels."

* tag 'md-fixes-20231206' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: fix stopping sync thread
  md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()
  md: fix missing flush of sync_work
2023-12-06 15:31:58 -07:00
Yu Kuai
f52f5c71f3 md: fix stopping sync thread
Currently sync thread is stopped from multiple contex:
 - idle_sync_thread
 - frozen_sync_thread
 - __md_stop_writes
 - md_set_readonly
 - do_md_stop

And there are some problems:
1) sync_work is flushed while reconfig_mutex is grabbed, this can
   deadlock because the work function will grab reconfig_mutex as well.
2) md_reap_sync_thread() can't be called directly while md_do_sync() is
   not finished yet, for example, commit 130443d60b ("md: refactor
   idle/frozen_sync_thread() to fix deadlock").
3) If MD_RECOVERY_RUNNING is not set, there is no need to stop
   sync_thread at all because sync_thread must not be registered.

Factor out a helper stop_sync_thread(), so that above contex will behave
the same. Fix 1) by flushing sync_work after reconfig_mutex is released,
before waiting for sync_thread to be done; Fix 2) bt letting daemon thread
to unregister sync_thread; Fix 3) by always checking MD_RECOVERY_RUNNING
first.

Fixes: db5e653d7c ("md: delay choosing sync action to md_start_sync()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231205094215.1824240-4-yukuai1@huaweicloud.com
2023-12-06 12:44:00 -08:00
Yu Kuai
c9f7cb5b2b md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()
If md_set_readonly() failed, the array could still be read-write, however
'MD_RECOVERY_FROZEN' could still be set, which leave the array in an
abnormal state that sync or recovery can't continue anymore.
Hence make sure the flag is cleared after md_set_readonly() returns.

Fixes: 88724bfa68 ("md: wait for pending superblock updates before switching to read-only")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231205094215.1824240-3-yukuai1@huaweicloud.com
2023-12-06 12:44:00 -08:00
Yu Kuai
f2d87a759f md: fix missing flush of sync_work
Commit ac61978196 ("md: use separate work_struct for md_start_sync()")
use a new sync_work to replace del_work, however, stop_sync_thread() and
__md_stop_writes() was trying to wait for sync_thread to be done, hence
they should switch to use sync_work as well.

Noted that md_start_sync() from sync_work will grab 'reconfig_mutex',
hence other contex can't held the same lock to flush work, and this will
be fixed in later patches.

Fixes: ac61978196 ("md: use separate work_struct for md_start_sync()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231205094215.1824240-2-yukuai1@huaweicloud.com
2023-12-06 12:44:00 -08:00
Bitao Hu
839a40d1e7 nvme: fix deadlock between reset and scan
If controller reset occurs when allocating namespace, both
nvme_reset_work and nvme_scan_work will hang, as shown below.

Test Scripts:

    for ((t=1;t<=128;t++))
    do
    nsid=`nvme create-ns /dev/nvme1 -s 14537724 -c 14537724 -f 0 -m 0 \
    -d 0 | awk -F: '{print($NF);}'`
    nvme attach-ns /dev/nvme1 -n $nsid -c 0
    done
    nvme reset /dev/nvme1

We will find that both nvme_reset_work and nvme_scan_work hung:

    INFO: task kworker/u249:4:17848 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
    message.
    task:kworker/u249:4  state:D stack:    0 pid:17848 ppid:     2
    flags:0x00000028
    Workqueue: nvme-reset-wq nvme_reset_work [nvme]
    Call trace:
    __switch_to+0xb4/0xfc
    __schedule+0x22c/0x670
    schedule+0x4c/0xd0
    blk_mq_freeze_queue_wait+0x84/0xc0
    nvme_wait_freeze+0x40/0x64 [nvme_core]
    nvme_reset_work+0x1c0/0x5cc [nvme]
    process_one_work+0x1d8/0x4b0
    worker_thread+0x230/0x440
    kthread+0x114/0x120
    INFO: task kworker/u249:3:22404 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
    message.
    task:kworker/u249:3  state:D stack:    0 pid:22404 ppid:     2
    flags:0x00000028
    Workqueue: nvme-wq nvme_scan_work [nvme_core]
    Call trace:
    __switch_to+0xb4/0xfc
    __schedule+0x22c/0x670
    schedule+0x4c/0xd0
    rwsem_down_write_slowpath+0x32c/0x98c
    down_write+0x70/0x80
    nvme_alloc_ns+0x1ac/0x38c [nvme_core]
    nvme_validate_or_alloc_ns+0xbc/0x150 [nvme_core]
    nvme_scan_ns_list+0xe8/0x2e4 [nvme_core]
    nvme_scan_work+0x60/0x500 [nvme_core]
    process_one_work+0x1d8/0x4b0
    worker_thread+0x260/0x440
    kthread+0x114/0x120
    INFO: task nvme:28428 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
    message.
    task:nvme            state:D stack:    0 pid:28428 ppid: 27119
    flags:0x00000000
    Call trace:
    __switch_to+0xb4/0xfc
    __schedule+0x22c/0x670
    schedule+0x4c/0xd0
    schedule_timeout+0x160/0x194
    do_wait_for_common+0xac/0x1d0
    __wait_for_common+0x78/0x100
    wait_for_completion+0x24/0x30
    __flush_work.isra.0+0x74/0x90
    flush_work+0x14/0x20
    nvme_reset_ctrl_sync+0x50/0x74 [nvme_core]
    nvme_dev_ioctl+0x1b0/0x250 [nvme_core]
    __arm64_sys_ioctl+0xa8/0xf0
    el0_svc_common+0x88/0x234
    do_el0_svc+0x7c/0x90
    el0_svc+0x1c/0x30
    el0_sync_handler+0xa8/0xb0
    el0_sync+0x148/0x180

The reason for the hang is that nvme_reset_work occurs while nvme_scan_work
is still running. nvme_scan_work may add new ns into ctrl->namespaces
list after nvme_reset_work frozen all ns->q in ctrl->namespaces list.
The newly added ns is not frozen, so nvme_wait_freeze will wait forever.
Unfortunately, ctrl->namespaces_rwsem is held by nvme_reset_work, so
nvme_scan_work will also wait forever. Now we are deadlocked!

PROCESS1                         PROCESS2
==============                   ==============
nvme_scan_work
  ...                            nvme_reset_work
  nvme_validate_or_alloc_ns        nvme_dev_disable
    nvme_alloc_ns                    nvme_start_freeze
     down_write                      ...
     nvme_ns_add_to_ctrl_list        ...
     up_write                      nvme_wait_freeze
    ...                              down_read
    nvme_alloc_ns                    blk_mq_freeze_queue_wait
     down_write

Fix by marking the ctrl with say NVME_CTRL_FROZEN flag set in
nvme_start_freeze and cleared in nvme_unfreeze. Then the scan can check
it before adding the new namespace (under the namespaces_rwsem).

Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Reviewed-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-04 08:39:04 -08:00
Nitesh Shetty
20dc66f2d7 nvme: prevent potential spectre v1 gadget
This patch fixes the smatch warning, "nvmet_ns_ana_grpid_store() warn:
potential spectre issue 'nvmet_ana_group_enabled' [w] (local cap)"
Prevent the contents of kernel memory from being leaked to  user space
via speculative execution by using array_index_nospec.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-04 08:39:04 -08:00
Shin'ichiro Kawasaki
29ac4b2f92 nvme: improve NVME_HOST_AUTH and NVME_TARGET_AUTH config descriptions
Currently two similar config options NVME_HOST_AUTH and NVME_TARGET_AUTH
have almost same descriptions. It is confusing to choose them in
menuconfig. Improve the descriptions to distinguish them.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-04 08:39:03 -08:00
Keith Busch
7be866b1cf nvme-ioctl: move capable() admin check to the end
This can be an expensive call on some kernel configs. Move it to the end
after checking the cheaper ways to determine if the command is allowed.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-04 08:39:03 -08:00
Keith Busch
e6e7f7ac03 nvme: ensure reset state check ordering
A different CPU may be setting the ctrl->state value, so ensure proper
barriers to prevent optimizing to a stale state. Normally it isn't a
problem to observe the wrong state as it is merely advisory to take a
quicker path during initialization and error recovery, but seeing an old
state can report unexpected ENETRESET errors when a reset request was in
fact successful.

Reported-by: Minh Hoang <mh2022@meta.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
2023-12-04 08:39:03 -08:00
Keith Busch
5c687c287c nvme: introduce helper function to get ctrl state
The controller state is typically written by another CPU, so reading it
should ensure no optimizations are taken. This is a repeated pattern in
the driver, so start with adding a convenience function that returns the
controller state with READ_ONCE().

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-04 08:39:03 -08:00
Jens Axboe
a134cd8dfb Merge tag 'md-fixes-20231201-1' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7
Pull MD fix from Song:

"This change fixes issue with raid456 reshape."

* tag 'md-fixes-20231201-1' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md/raid6: use valid sector values to determine if an I/O should wait on the reshape
2023-12-01 18:37:24 -07:00
David Jeffery
c467e97f07 md/raid6: use valid sector values to determine if an I/O should wait on the reshape
During a reshape or a RAID6 array such as expanding by adding an additional
disk, I/Os to the region of the array which have not yet been reshaped can
stall indefinitely. This is from errors in the stripe_ahead_of_reshape
function causing md to think the I/O is to a region in the actively
undergoing the reshape.

stripe_ahead_of_reshape fails to account for the q disk having a sector
value of 0. By not excluding the q disk from the for loop, raid6 will always
generate a min_sector value of 0, causing a return value which stalls.

The function's max_sector calculation also uses min() when it should use
max(), causing the max_sector value to always be 0. During a backwards
rebuild this can cause the opposite problem where it allows I/O to advance
when it should wait.

Fixing these errors will allow safe I/O to advance in a timely manner and
delay only I/O which is unsafe due to stripes in the middle of undergoing
the reshape.

Fixes: 486f605586 ("md/raid5: Check all disks in a stripe_head for reshape progress")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231128181233.6187-1-djeffery@redhat.com
2023-12-01 14:43:29 -08:00
Jens Axboe
8ad3ac92f0 nvme fixes for Linux 6.7
- Invalid namespace identification error handling (Marizio Ewan, Keith)
  - Fabrics keep-alive tuning (Mark)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE3Fbyvv+648XNRdHTPe3zGtjzRgkFAmVqAtwACgkQPe3zGtjz
 RglRnA/+M7S1MTBBeW9XfjJWZQcQQdcspXskc+ylFfKMYkcxOHPNZVYnA2tvcDCP
 g9Vg8NtHab8b4i4Q/axBPnYSWpYimvj6mse3G9DutA7Ag/G7vA+hNoh6YZ0PE7xO
 SNASA34lBN5dpfM1IZ8JyOn22XH+R6HMHLRpw7vBkdBNCQr9/m2OUs2ckQ7rVARj
 +FCibJBog8WZVCULvDW2nt9cOyHM5jc6Ti3r0+s60u0YKNpoJ4Am6JSHdeBeDj2t
 vh1Mto3C0z9E/nn7VNZCZNohQrsP/YnmC7henS1xnALlNTO8lVJK9w0ZBgoiNqhA
 2fIfl4k1txjqrhIshtsptP1uAh8j90evZ7QbQ3B1dPAuzHWVETxhZ2Bzn/gJrId4
 rB/wtrQimezvFqpw83s8yQPMmvO3jhW8DYsEvctHtzn3bbIRiMsagfe5P0TgQsCb
 dqXgVZzWTqyPaa1/M8X0VaWmh4XPKqqZULCuwKifRVnn0GIPavhMmtfqxmRvyayg
 WP9bhe2Sgr+FrItdRjDYMrmQiE4onzhSpMRWPWaKxsfYS9sf2DgUcEMKb1FrlHaQ
 ApTXrP9tM48K4aqybjRJfcQkuDesGgnZu2q5EsmU6ZHhD86kIthX/lJkdd2EmY8J
 iykLCeTMRo6nsut2Lr2yTt7H6Y4gUhsUrNLCO50Go7jMyxISkqA=
 =tdDy
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.7-2023-12-01' of git://git.infradead.org/nvme into block-6.7

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.7

 - Invalid namespace identification error handling (Marizio Ewan, Keith)
 - Fabrics keep-alive tuning (Mark)"

* tag 'nvme-6.7-2023-12-01' of git://git.infradead.org/nvme:
  nvme-core: check for too small lba shift
  nvme: check for valid nvme_identify_ns() before using it
  nvme-core: fix a memory leak in nvme_ns_info_from_identify()
  nvme: fine-tune sending of first keep-alive
2023-12-01 09:09:16 -07:00
Keith Busch
74fbc88e16 nvme-core: check for too small lba shift
The block layer doesn't support logical block sizes smaller than 512
bytes. The nvme spec doesn't support that small either, but the driver
isn't checking to make sure the device responded with usable data.
Failing to catch this will result in a kernel bug, either from a
division by zero when stacking, or a zero length bio.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-01 07:49:50 -08:00
Ming Lei
0e4237ae8d blk-mq: don't count completed flush data request as inflight in case of quiesce
Request queue quiesce may interrupt flush sequence, and the original request
may have been marked as COMPLETE, but can't get finished because of
queue quiesce.

This way is fine from driver viewpoint, because flush sequence is block
layer concept, and it isn't related with driver.

However, driver(such as dm-rq) can call blk_mq_queue_inflight() to count &
drain inflight requests, then the wait & drain never gets done because
the completed & not-finished flush request is counted as inflight.

Fix this issue by not counting completed flush data request as inflight in
case of quiesce.

Cc: Mike Snitzer <snitzer@kernel.org>
Cc: David Jeffery <djeffery@redhat.com>
Cc: John Pittman <jpittman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20231201085605.577730-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-01 07:34:47 -07:00
Bart Van Assche
3649ff0a0b block: Document the role of the two attribute groups
It is nontrivial to derive the role of the two attribute groups in source
file block/blk-sysfs.c. Hence add a comment that explains their roles. See
also commit 6d85ebf95c ("blk-sysfs: add a new attr_group for blk_mq").

Cc: Christoph Hellwig <hch@lst.de>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20231128194019.72762-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-29 10:18:38 -07:00
Yu Kuai
67d995e069 block: warn once for each partition in bio_check_ro()
Commit 1b0a151c10 ("blk-core: use pr_warn_ratelimited() in
bio_check_ro()") fix message storm by limit the rate, however, there
will still be lots of message in the long term. Fix it better by warn
once for each partition.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231128123027.971610-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-28 12:11:08 -07:00
Ming Lei
fad907cffd block: move .bd_inode into 1st cacheline of block_device
The .bd_inode field of block_device is used in IO fast path of
blkdev_write_iter() and blkdev_llseek(), so it is more efficient to keep
it into the 1st cacheline.

.bd_openers is only touched in open()/close(), and .bd_size_lock is only
for updating bdev capacity, which is in slow path too.

So swap .bd_inode layout with .bd_openers & .bd_size_lock to move
.bd_inode into the 1st cache line.

Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231128123027.971610-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-28 12:11:08 -07:00
Ewan D. Milne
d8b90d600a nvme: check for valid nvme_identify_ns() before using it
When scanning namespaces, it is possible to get valid data from the first
call to nvme_identify_ns() in nvme_alloc_ns(), but not from the second
call in nvme_update_ns_info_block().  In particular, if the NSID becomes
inactive between the two commands, a storage device may return a buffer
filled with zero as per 4.1.5.1.  In this case, we can get a kernel crash
due to a divide-by-zero in blk_stack_limits() because ns->lba_shift will
be set to zero.

PID: 326      TASK: ffff95fec3cd8000  CPU: 29   COMMAND: "kworker/u98:10"
 #0 [ffffad8f8702f9e0] machine_kexec at ffffffff91c76ec7
 #1 [ffffad8f8702fa38] __crash_kexec at ffffffff91dea4fa
 #2 [ffffad8f8702faf8] crash_kexec at ffffffff91deb788
 #3 [ffffad8f8702fb00] oops_end at ffffffff91c2e4bb
 #4 [ffffad8f8702fb20] do_trap at ffffffff91c2a4ce
 #5 [ffffad8f8702fb70] do_error_trap at ffffffff91c2a595
 #6 [ffffad8f8702fbb0] exc_divide_error at ffffffff928506e6
 #7 [ffffad8f8702fbd0] asm_exc_divide_error at ffffffff92a00926
    [exception RIP: blk_stack_limits+434]
    RIP: ffffffff92191872  RSP: ffffad8f8702fc80  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff95efa0c91800  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: 0000000000000001  RDI: 0000000000000001
    RBP: 00000000ffffffff   R8: ffff95fec7df35a8   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: ffff95fed33c09a8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffad8f8702fce0] nvme_update_ns_info_block at ffffffffc06d3533 [nvme_core]
 #9 [ffffad8f8702fd18] nvme_scan_ns at ffffffffc06d6fa7 [nvme_core]

This happened when the check for valid data was moved out of nvme_identify_ns()
into one of the callers.  Fix this by checking in both callers.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218186
Fixes: 0dd6fff2aa ("nvme: bring back auto-removal of deleted namespaces during sequential scan")
Cc: stable@vger.kernel.org
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-27 14:00:55 -08:00
Maurizio Lombardi
e3139cef82 nvme-core: fix a memory leak in nvme_ns_info_from_identify()
In case of error, free the nvme_id_ns structure that was allocated
by nvme_identify_ns().

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-27 10:26:43 -08:00
Mark O'Donovan
136cfcb8dc nvme: fine-tune sending of first keep-alive
Keep-alive commands are sent half-way through the kato period.
This normally works well but fails when the keep-alive system is
started when we are more than half way through the kato.
This can happen on larger setups or due to host delays.
With this change we now time the initial keep-alive command from
the controller initialisation time, rather than the keep-alive
mechanism activation time.

Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-27 10:23:13 -08:00
Markus Weippert
bb6cc25386 bcache: revert replacing IS_ERR_OR_NULL with IS_ERR
Commit 028ddcac47 ("bcache: Remove unnecessary NULL point check in
node allocations") replaced IS_ERR_OR_NULL by IS_ERR. This leads to a
NULL pointer dereference.

BUG: kernel NULL pointer dereference, address: 0000000000000080
Call Trace:
 ? __die_body.cold+0x1a/0x1f
 ? page_fault_oops+0xd2/0x2b0
 ? exc_page_fault+0x70/0x170
 ? asm_exc_page_fault+0x22/0x30
 ? btree_node_free+0xf/0x160 [bcache]
 ? up_write+0x32/0x60
 btree_gc_coalesce+0x2aa/0x890 [bcache]
 ? bch_extent_bad+0x70/0x170 [bcache]
 btree_gc_recurse+0x130/0x390 [bcache]
 ? btree_gc_mark_node+0x72/0x230 [bcache]
 bch_btree_gc+0x5da/0x600 [bcache]
 ? cpuusage_read+0x10/0x10
 ? bch_btree_gc+0x600/0x600 [bcache]
 bch_gc_thread+0x135/0x180 [bcache]

The relevant code starts with:

    new_nodes[0] = NULL;

    for (i = 0; i < nodes; i++) {
        if (__bch_keylist_realloc(&keylist, bkey_u64s(&r[i].b->key)))
            goto out_nocoalesce;
    // ...
out_nocoalesce:
    // ...
    for (i = 0; i < nodes; i++)
        if (!IS_ERR(new_nodes[i])) {  // IS_ERR_OR_NULL before
028ddcac47
            btree_node_free(new_nodes[i]);  // new_nodes[0] is NULL
            rw_unlock(true, new_nodes[i]);
        }

This patch replaces IS_ERR() by IS_ERR_OR_NULL() to fix this.

Fixes: 028ddcac47 ("bcache: Remove unnecessary NULL point check in node allocations")
Link: https://lore.kernel.org/all/3DF4A87A-2AC1-4893-AE5F-E921478419A9@suse.de/
Cc: stable@vger.kernel.org
Cc: Zheng Wang <zyytlz.wz@163.com>
Cc: Coly Li <colyli@suse.de>
Signed-off-by: Markus Weippert <markus@gekmihesg.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-24 09:39:03 -07:00
Arnd Bergmann
0e6c4fe782 nvme: tcp: fix compile-time checks for TLS mode
When CONFIG_NVME_KEYRING is enabled as a loadable module, but the TCP
host code is built-in, it fails to link:

arm-linux-gnueabi-ld: drivers/nvme/host/tcp.o: in function `nvme_tcp_setup_ctrl':
tcp.c:(.text+0x1940): undefined reference to `nvme_tls_psk_default'

The problem is that the compile-time conditionals are inconsistent here,
using a mix of #ifdef CONFIG_NVME_TCP_TLS, IS_ENABLED(CONFIG_NVME_TCP_TLS)
and IS_ENABLED(CONFIG_NVME_KEYRING) checks, with CONFIG_NVME_KEYRING
controlling whether the implementation is actually built.

Change it to use IS_ENABLED(CONFIG_NVME_KEYRING) checks consistently,
which should help readability and make it less error-prone. Combining
it with the check for the ctrl->opts->tls flag lets the compiler drop
all the TLS code in configurations without this feature, which also
helps runtime behavior in addition to avoiding the link failure.

To make it possible for the compiler to build the dead code, both
the tls_handshake_timeout variable and the TLS specific members
of nvme_tcp_queue need to be moved out of the #ifdef block as well,
but at least the former of these gets optimized out again.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20231122224719.4042108-4-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-22 18:41:14 -07:00
Arnd Bergmann
65e2a74c44 nvme: target: fix Kconfig select statements
When the NVME target code is built-in but its TCP frontend is a loadable
module, enabling keyring support causes a link failure:

x86_64-linux-ld: vmlinux.o: in function `nvmet_ports_make':
configfs.c:(.text+0x100a211): undefined reference to `nvme_keyring_id'

The problem is that CONFIG_NVME_TARGET_TCP_TLS is a 'bool' symbol that
depends on the tristate CONFIG_NVME_TARGET_TCP, so any 'select' from
it inherits the state of the tristate symbol rather than the intended
CONFIG_NVME_TARGET one that contains the actual call.

The same thing is true for CONFIG_KEYS, which itself is required for
NVME_KEYRING.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20231122224719.4042108-3-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-22 18:40:14 -07:00
Arnd Bergmann
d78abcbabe nvme: target: fix nvme_keyring_id() references
In configurations without CONFIG_NVME_TARGET_TCP_TLS, the keyring
code might not be available, or using it will result in a runtime
failure:

x86_64-linux-ld: vmlinux.o: in function `nvmet_ports_make':
configfs.c:(.text+0x100a211): undefined reference to `nvme_keyring_id'

Add a check to ensure we only check the keyring if there is a chance
of it being used, which avoids both the runtime and link-time
problems.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20231122224719.4042108-2-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-22 18:40:14 -07:00
Jens Axboe
55072cd7ce nvme fixes for Linux 6.7
- TCP TLS fixes (Hannes)
  - Authentifaction fixes (Mark, Hannes)
  - Properly terminate target names (Christoph)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE3Fbyvv+648XNRdHTPe3zGtjzRgkFAmVeK3oACgkQPe3zGtjz
 Rgl7yxAA3xzBP4DCzpEGJPNMeSxYsk1WsauXow1izjbxz6GhJj9ZRtEFHZlFYKmt
 1Uo7xIZr13rtUvGDTkEJczeuXsGIllPtuA4SEIKMn/7Ftk9Pda11A3NXijp5ZtF1
 F794WNoOujrzZrBPUYAAoKMJ9jELt3ZHJD4WkU7cYAbQpPjZrBIdYAefJo3L4ie+
 bmpXyU4ikieYaQeNEFap/A4IOtcpMUmrO8DBH98MYg51zfEkr9xdm0ynqTKoePBi
 yV1vVnzJyYUxJC6fwVa5b3kcoPrkeBCAIzlxqWxukodL8egmYbkOGskTf39/9hZc
 ls7myjqu23emV/Fj4DTDJenHoB71Ckn0Y/6mWqsu2WqH43exR9gLutUPwrs52BqD
 RTNPcTdhk6Kn5bQjkySJR9QBA3hWGFCJwOiQf6XcCh4ae77LwDbhr3ti5dTCh54M
 pY1VeEMnahtMNsF0hU+k0qOh9zsyxPwdZGMpFPJ3br9jVvhaAOhPIzaRoJx9tPBO
 bF31YJXzFVOLttzfyrJM1BgDXhorN/Nt+HKHSUt2oFZrqT9wRlAs9veyxJOW5e90
 tH5Bp3Gff5tkeWVBT36Xveu8jsd4c/zTu+5VAangSuc6mGeRseJP4G4rOh5VUNAD
 FUxAsIc23CqrENKt0tywvFcWDn+yNdnh6SxGQ09vxVoJvcGRA4Q=
 =2adT
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.7-2023-11-22' of git://git.infradead.org/nvme into block-6.7

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.7

 - TCP TLS fixes (Hannes)
 - Authentifaction fixes (Mark, Hannes)
 - Properly terminate target names (Christoph)"

* tag 'nvme-6.7-2023-11-22' of git://git.infradead.org/nvme:
  nvme: move nvme_stop_keep_alive() back to original position
  nvmet-tcp: always initialize tls_handshake_tmo_work
  nvmet: nul-terminate the NQNs passed in the connect command
  nvme: blank out authentication fabrics options if not configured
  nvme: catch errors from nvme_configure_metadata()
  nvme-tcp: only evaluate 'tls' option if TLS is selected
  nvme-auth: set explanation code for failure2 msgs
  nvme-auth: unlock mutex in one place only
2023-11-22 10:19:27 -07:00
Hannes Reinecke
3af755a468 nvme: move nvme_stop_keep_alive() back to original position
Stopping keep-alive not only stops the keep-alive workqueue,
but also needs to be synchronized with I/O termination as we
must not send a keep-alive command after all I/O had been
terminated.
So to avoid any regressions move the call to stop_keep_alive()
back to its original position and ensure that keep-alive is
correctly stopped failing to setup the admin queue.

Fixes: 4733b65d82 ("nvme: start keep-alive after admin queue setup")
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-22 08:07:02 -08:00
Li Nan
98c598afc2 nbd: pass nbd_sock to nbd_read_reply() instead of index
If a socket is processing ioctl 'NBD_SET_SOCK', config->socks might be
krealloc in nbd_add_socket(), and a garbage request is received now, a UAF
may occurs.

  T1
  nbd_ioctl
   __nbd_ioctl
    nbd_add_socket
     blk_mq_freeze_queue
				T2
  				recv_work
  				 nbd_read_reply
  				  sock_xmit
     krealloc config->socks
				   def config->socks

Pass nbd_sock to nbd_read_reply(). And introduce a new function
sock_xmit_recv(), which differs from sock_xmit only in the way it get
socket.

==================================================================
BUG: KASAN: use-after-free in sock_xmit+0x525/0x550
Read of size 8 at addr ffff8880188ec428 by task kworker/u12:1/18779

Workqueue: knbd4-recv recv_work
Call Trace:
 __dump_stack
 dump_stack+0xbe/0xfd
 print_address_description.constprop.0+0x19/0x170
 __kasan_report.cold+0x6c/0x84
 kasan_report+0x3a/0x50
 sock_xmit+0x525/0x550
 nbd_read_reply+0xfe/0x2c0
 recv_work+0x1c2/0x750
 process_one_work+0x6b6/0xf10
 worker_thread+0xdd/0xd80
 kthread+0x30a/0x410
 ret_from_fork+0x22/0x30

Allocated by task 18784:
 kasan_save_stack+0x1b/0x40
 kasan_set_track
 set_alloc_info
 __kasan_kmalloc
 __kasan_kmalloc.constprop.0+0xf0/0x130
 slab_post_alloc_hook
 slab_alloc_node
 slab_alloc
 __kmalloc_track_caller+0x157/0x550
 __do_krealloc
 krealloc+0x37/0xb0
 nbd_add_socket
 +0x2d3/0x880
 __nbd_ioctl
 nbd_ioctl+0x584/0x8e0
 __blkdev_driver_ioctl
 blkdev_ioctl+0x2a0/0x6e0
 block_ioctl+0xee/0x130
 vfs_ioctl
 __do_sys_ioctl
 __se_sys_ioctl+0x138/0x190
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

Freed by task 18784:
 kasan_save_stack+0x1b/0x40
 kasan_set_track+0x1c/0x30
 kasan_set_free_info+0x20/0x40
 __kasan_slab_free.part.0+0x13f/0x1b0
 slab_free_hook
 slab_free_freelist_hook
 slab_free
 kfree+0xcb/0x6c0
 krealloc+0x56/0xb0
 nbd_add_socket+0x2d3/0x880
 __nbd_ioctl
 nbd_ioctl+0x584/0x8e0
 __blkdev_driver_ioctl
 blkdev_ioctl+0x2a0/0x6e0
 block_ioctl+0xee/0x130
 vfs_ioctl
 __do_sys_ioctl
 __se_sys_ioctl+0x138/0x190
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230911023308.3467802-1-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-21 07:42:04 -07:00
Jan Höppner
db46cd1e04 s390/dasd: protect device queue against concurrent access
In dasd_profile_start() the amount of requests on the device queue are
counted. The access to the device queue is unprotected against
concurrent access. With a lot of parallel I/O, especially with alias
devices enabled, the device queue can change while dasd_profile_start()
is accessing the queue. In the worst case this leads to a kernel panic
due to incorrect pointer accesses.

Fix this by taking the device lock before accessing the queue and
counting the requests. Additionally the check for a valid profile data
pointer can be done earlier to avoid unnecessary locking in a hot path.

Cc:  <stable@vger.kernel.org>
Fixes: 4fa52aa7a8 ("[S390] dasd: add enhanced DASD statistics interface")
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20231025132437.1223363-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-20 11:50:54 -07:00
Muhammad Muzammil
5029c5e4f2 s390/dasd: resolve spelling mistake
resolve typing mistake from pimary to primary

Signed-off-by: Muhammad Muzammil <m.muzzammilashraf@gmail.com>
Link: https://lore.kernel.org/r/20231010043140.28416-1-m.muzzammilashraf@gmail.com
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20231025132437.1223363-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-20 11:50:54 -07:00
Chengming Zhou
53f2bca260 block/null_blk: Fix double blk_mq_start_request() warning
When CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION is enabled, null_queue_rq()
would return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE for the request,
which has been marked as MQ_RQ_IN_FLIGHT by blk_mq_start_request().

Then null_queue_rqs() put these requests in the rqlist, return back to
the block layer core, which would try to queue them individually again,
so the warning in blk_mq_start_request() triggered.

Fix it by splitting the null_queue_rq() into two parts: the first is the
preparation of request, the second is the handling of request. We put
the blk_mq_start_request() after the preparation part, which may fail
and return back to the block layer core.

The throttling also belongs to the preparation part, so move it before
blk_mq_start_request(). And change the return type of null_handle_cmd()
to void, since it always return BLK_STS_OK now.

Reported-by:  <syzbot+fcc47ba2476570cbbeb0@syzkaller.appspotmail.com>
Closes: https://lore.kernel.org/all/0000000000000e6aac06098aee0c@google.com/
Fixes: d78bfa1346 ("block/null_blk: add queue_rqs() support")
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/r/20231120032521.1012037-1-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-20 10:26:26 -07:00
Hannes Reinecke
11b9d0b499 nvmet-tcp: always initialize tls_handshake_tmo_work
The TLS handshake timeout work item should always be
initialized to avoid a crash when cancelling the workqueue.

Fixes: 675b453e02 ("nvmet-tcp: enable TLS handshake upcall")
Suggested-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-20 09:25:33 -08:00
Christoph Hellwig
1c22e0295a nvmet: nul-terminate the NQNs passed in the connect command
The host and subsystem NQNs are passed in the connect command payload and
interpreted as nul-terminated strings.  Ensure they actually are
nul-terminated before using them.

Fixes: a07b4970f4 "nvmet: add a generic NVMe target")
Reported-by: Alon Zahavi <zahavi.alon@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-20 09:25:33 -08:00
Hannes Reinecke
c7ca9757bd nvme: blank out authentication fabrics options if not configured
If the config option NVME_HOST_AUTH is not selected we should not
accept the corresponding fabrics options. This allows userspace
to detect if NVMe authentication has been enabled for the kernel.

Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: f50fff73d6 ("nvme: implement In-Band authentication")
Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-20 09:25:32 -08:00
Hannes Reinecke
cd9aed6060 nvme: catch errors from nvme_configure_metadata()
nvme_configure_metadata() is issuing I/O, so we might incur an I/O
error which will cause the connection to be reset.
But in that case any further probing will race with reset and
cause UAF errors.
So return a status from nvme_configure_metadata() and abort
probing if there was an I/O error.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-20 09:25:32 -08:00
Hannes Reinecke
23441536b6 nvme-tcp: only evaluate 'tls' option if TLS is selected
We only need to evaluate the 'tls' connect option if TLS is
enabled; otherwise we might be getting a link error.

Fixes: 706add1367 ("nvme: keyring: fix conditional compilation")
Reported-by: kernel test robot <yujie.liu@intel.com>
Closes: https://lore.kernel.org/r/202311140426.0eHrTXBr-lkp@intel.com/
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-20 09:25:32 -08:00
Mark O'Donovan
38ce1570e2 nvme-auth: set explanation code for failure2 msgs
Some error cases were not setting an auth-failure-reason-code-explanation.
This means an AUTH_Failure2 message will be sent with an explanation value
of 0 which is a reserved value.

Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-20 09:25:23 -08:00
Mark O'Donovan
616add70bf nvme-auth: unlock mutex in one place only
Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-20 09:25:13 -08:00
Damien Le Moal
c96b817552 block: Remove blk_set_runtime_active()
The function blk_set_runtime_active() is called only from
blk_post_runtime_resume(), so there is no need for that function to be
exported. Open-code this function directly in blk_post_runtime_resume()
and remove it.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20231120070611.33951-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-20 10:22:40 -07:00
Li Nan
c2da049f41 nbd: fix null-ptr-dereference while accessing 'nbd->config'
Memory reordering may occur in nbd_genl_connect(), causing config_refs
to be set to 1 while nbd->config is still empty. Opening nbd at this
time will cause null-ptr-dereference.

   T1                      T2
   nbd_open
    nbd_get_config_unlocked
                 	   nbd_genl_connect
                 	    nbd_alloc_and_init_config
                 	     //memory reordered
                  	     refcount_set(&nbd->config_refs, 1)  // 2
     nbd->config
      ->null point
			     nbd->config = config  // 1

Fix it by adding smp barrier to guarantee the execution sequence.

Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20231116162316.1740402-4-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-20 10:16:44 -07:00
Li Nan
3123ac7792 nbd: factor out a helper to get nbd_config without holding 'config_lock'
There are no functional changes, just to make code cleaner and prepare
to fix null-ptr-dereference while accessing 'nbd->config'.

Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20231116162316.1740402-3-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-20 10:16:44 -07:00
Li Nan
1b59860540 nbd: fold nbd config initialization into nbd_alloc_config()
There are no functional changes, make the code cleaner and prepare to
fix null-ptr-dereference while accessing 'nbd->config'.

Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20231116162316.1740402-2-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-20 10:16:44 -07:00