Commit Graph

996917 Commits

Author SHA1 Message Date
Christoph Hellwig
a5d737f100 nvme: factor out a nvme_ns_ioctl helper
Factor out a helper for the namespace based ioctls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2021-04-15 08:12:54 +02:00
Christoph Hellwig
d7790d3739 nvme: pass a user pointer to nvme_nvm_ioctl
Pass the proper user pointer instead of the not all that useful integer
representation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
2021-04-15 08:12:54 +02:00
Christoph Hellwig
9953ab0c5a nvme: cleanup setting the disk name
Return false from nvme_set_disk_name and let the caller set the
non-multipath name instead of duplicating the naming information in two
places.  Also remove the pointless local variables for the disk name
and flags and the not needed ctrl argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
2021-04-15 08:12:54 +02:00
Minwoo Im
3089738868 nvme: add a nvme_ns_head_multipath helper
Move the multipath gendisk out of #ifdef CONFIG_NVME_MULTIPATH and add
a new nvme_ns_head_multipath that uses it to check if a ns_head has
a multipath device associated with it.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[hch: added the IS_ENABLED, converted a few existing users]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2021-04-15 08:12:54 +02:00
Niklas Cassel
95d54bd1a4 nvme: remove single trailing whitespace
There is a single trailing whitespace in core.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:54 +02:00
Niklas Cassel
e234f1f8bb nvme-multipath: remove single trailing whitespace
There is a single trailing whitespace in multipath.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:54 +02:00
Niklas Cassel
53dc180e7c nvme-pci: remove single trailing whitespace
There is a single trailing whitespace in pci.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:54 +02:00
Niklas Cassel
e51183be1f nvme-pci: don't simple map sgl when sgls are disabled
According to the module parameter description for sgl_threshold,
a value of 0 means that SGLs are disabled.

If SGLs are disabled, we should respect that, even for the case
where the request is made up of a single physical segment.
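
The rule described above can be sketched as a small predicate. This is an illustrative userspace sketch, not the driver's code: `toy_use_simple_sgl` and its parameters are hypothetical names, and the only behaviour taken from the commit message is that a threshold of 0 disables SGLs even for a single-segment request.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: may a single-segment request be mapped with
 * the "simple" SGL path?  sgl_threshold == 0 means SGLs are disabled.
 */
bool toy_use_simple_sgl(unsigned int sgl_threshold, bool ctrl_supports_sgl,
			unsigned int nr_segments)
{
	if (nr_segments != 1)
		return false;	/* the simple path is for one segment only */
	if (!sgl_threshold)
		return false;	/* respect the "SGLs disabled" setting */
	return ctrl_supports_sgl;
}
```

With this sketch, a call with a threshold of 0 returns false regardless of the other arguments.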

Fixes: 297910571f ("nvme-pci: optimize mapping single segment requests using SGLs")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:53 +02:00
Colin Ian King
ccc1003b5b nvmet: fix a spelling mistake "nubmer" -> "number"
There is a spelling mistake in a pr_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:53 +02:00
Amit Engel
0d8ddeea11 nvmet-fc: simplify nvmet_fc_alloc_hostport
Once a host is already created, avoid allocating additional hostports
that will be thrown away. Add a helper function to handle the host search.

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Amit Engel <amit.engel@dell.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:53 +02:00
Elad Grupi
bdaf132791 nvmet-tcp: fix a segmentation fault during io parsing error
If an I/O that contains inline data goes to the parsing error flow, the
command response will free the command and iov before the data on the
socket buffer has been cleared.
This patch delays the command response until the receive flow is completed.

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Elad Grupi <elad.grupi@dell.com>
Signed-off-by: Hou Pu <houpu.main@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:50 +02:00
Christoph Hellwig
f8ee34a929 lightnvm: deprecated OCSSD support and schedule it for removal in Linux 5.15
Lightnvm was an innovative idea to expose more low-level control over SSDs.
But it failed to get properly standardized and remains a non-standardized
extension to NVMe that requires vendor specific quirks for a few now mostly
obsolete SSD devices.  The standardized ZNS command set for NVMe has taken
over a lot of the approaches and allows for fully standardized operation.

Remove the Linux code to support open channel SSDs as the few production
deployments of the above mentioned SSDs are using userspace driver stacks
instead of the fairly limited Linux support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Link: https://lore.kernel.org/r/20210413105257.159260-5-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-13 09:16:12 -06:00
Zhang Yunkai
655cdafdec lightnvm: remove duplicate include in lightnvm.h
The includes of 'linux/blkdev.h' and 'uapi/linux/lightnvm.h' in
'lightnvm.h' are duplicated. They are also included in the 5th and 7th
lines.

Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-4-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-13 09:16:12 -06:00
Tian Tao
1c6b0bc73f lightnvm: return the correct return value
memdup_user can fail with two different error values, so use PTR_ERR on
its return value to get the correct return value.
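
A minimal userspace sketch of the error-pointer idiom follows. All `toy_*` names are hypothetical stand-ins rather than kernel APIs, and the two failure codes (-EFAULT and -ENOMEM) are an assumption about what "two different return values" refers to.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value, and decode it again. */
static void *toy_err_ptr(intptr_t err) { return (void *)err; }
static intptr_t toy_ptr_err(const void *p) { return (intptr_t)p; }
static int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Stand-in for memdup_user: may fail with -EFAULT or with -ENOMEM. */
static void *toy_memdup(const void *src, size_t len)
{
	void *copy;

	if (!src)
		return toy_err_ptr(-EFAULT);
	copy = malloc(len);
	if (!copy)
		return toy_err_ptr(-ENOMEM);
	memcpy(copy, src, len);
	return copy;
}

/* The caller forwards whichever error occurred instead of guessing one. */
int toy_ioctl(const void *arg, size_t len)
{
	void *buf = toy_memdup(arg, len);

	if (toy_is_err(buf))
		return (int)toy_ptr_err(buf);
	free(buf);
	return 0;
}
```

Passing a NULL source to `toy_ioctl` yields -EFAULT here, while a valid buffer yields 0.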

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-3-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-13 09:16:12 -06:00
Chaitanya Kulkarni
327e1d2957 lightnvm: use kobj_to_dev()
This fixes the following coccicheck warning:

drivers/nvme//host/lightnvm.c:1243:60-61: WARNING opportunity for
kobj_to_dev()

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-2-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-13 09:16:12 -06:00
Christoph Hellwig
a8ed1a0607 block: remove the -ERESTARTSYS handling in blkdev_get_by_dev
Now that md has been cleaned up we can get rid of this hack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-12 06:55:31 -06:00
Max Gurtovoy
cee1b21523 null_blk: add option for managing virtual boundary
This will enable changing the virtual boundary of null blk devices. Until
now, null blk devices did not have any restriction on the scatter/gather
elements received from the block layer. Add a module parameter and a
configfs option that control the virtual boundary. This enables testing
the efficiency of the block layer bounce buffer when a suitable
application sends discontiguous IO to the given device.

Initial testing with patched FIO showed the following results (64 jobs,
128 iodepth, 1 nullb device):
IO size      READ (virt=false)   READ (virt=true)   Write (virt=false)  Write (virt=true)
----------  ------------------- -----------------  ------------------- -------------------
 1k            10.7M                8482k               10.8M              8471k
 2k            10.4M                8266k               10.4M              8271k
 4k            10.4M                8274k               10.3M              8226k
 8k            10.2M                8131k               9800k              7933k
 16k           9567k                7764k               8081k              6828k
 32k           8865k                7309k               5570k              5153k
 64k           7695k                6586k               2682k              2617k
 128k          5346k                5489k               1320k              1296k

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Link: https://lore.kernel.org/r/20210412095523.278632-1-mgurtovoy@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-12 06:47:25 -06:00
Chaitanya Kulkarni
eb87e4e90b gdrom: fix compilation error
Use the right name for the struct request variable, which fixes the
following compilation error:

make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=sh
CROSS_COMPILE=sh4-linux-gnu- 'CC=sccache sh4-linux-gnu-gcc'
'HOSTCC=sccache gcc'

In file included from /builds/linux/include/linux/scatterlist.h:9,
                 from /builds/linux/include/linux/dma-mapping.h:10,
                 from /builds/linux/drivers/cdrom/gdrom.c:16:
/builds/linux/drivers/cdrom/gdrom.c: In function 'gdrom_readdisk_dma':
/builds/linux/drivers/cdrom/gdrom.c:586:61: error: 'rq' undeclared
(first use in this function)
  586 |  __raw_writel(page_to_phys(bio_page(req->bio)) + bio_offset(rq->bio),
      |                                                             ^~

Fixes: 1d2c82001a ("gdrom: support highmem")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 19:32:06 -06:00
Coly Li
33ec5dfe8f bcache: fix a regression of code compiling failure in debug.c
The patch "bcache: remove PTR_CACHE" introduced a compile failure in
debug.c with the following error message:
  In file included from drivers/md/bcache/bcache.h:182:0,
                   from drivers/md/bcache/debug.c:9:
  drivers/md/bcache/debug.c: In function 'bch_btree_verify':
  drivers/md/bcache/debug.c:53:19: error: 'c' undeclared (first use in
  this function)
    bio_set_dev(bio, c->cache->bdev);
                     ^
This patch fixes the regression by replacing c->cache->bdev with
b->c->cache->bdev.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210411134316.80274-8-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 08:37:56 -06:00
Gustavo A. R. Silva
62594f189e bcache: Use 64-bit arithmetic instead of 32-bit
Cast multiple variables to (int64_t) in order to give the compiler
complete information about the proper arithmetic to use. Notice that
these variables are being used in contexts that expect expressions of
type int64_t (64-bit, signed), while such expressions are currently
being evaluated using 32-bit arithmetic.
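
The difference can be sketched in plain C. The function names below are hypothetical and not taken from bcache. The first variant uses unsigned 32-bit arithmetic so that the wraparound is well defined for the demonstration.

```c
#include <stdint.h>

/* The product is formed in 32 bits and only widened afterwards. */
int64_t toy_scaled_32(int32_t rate, int32_t factor)
{
	uint32_t wrapped = (uint32_t)rate * (uint32_t)factor;

	return wrapped;
}

/* Casting one operand first makes the whole multiplication 64-bit. */
int64_t toy_scaled_64(int32_t rate, int32_t factor)
{
	return (int64_t)rate * factor;
}
```

For 100000 * 100000 the 64-bit variant returns 10000000000, while the 32-bit variant returns that value reduced modulo 2^32.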

Fixes: d0cf9503e9 ("octeontx2-pf: ethtool fec mode support")
Addresses-Coverity-ID: 1501724 ("Unintentional integer overflow")
Addresses-Coverity-ID: 1501725 ("Unintentional integer overflow")
Addresses-Coverity-ID: 1501726 ("Unintentional integer overflow")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-7-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 08:37:56 -06:00
Bhaskar Chowdhury
9c9b81c456 md: bcache: Trivial typo fixes in the file journal.c
s/condidate/candidate/
s/folowing/following/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-6-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 08:37:56 -06:00
Arnd Bergmann
be3bacecec md: bcache: avoid -Wempty-body warnings
Building with 'make W=1' shows a harmless warning for each user of the
EBUG_ON() macro:

drivers/md/bcache/bset.c: In function 'bch_btree_sort_partial':
drivers/md/bcache/util.h:30:55: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
   30 | #define EBUG_ON(cond)                   do { if (cond); } while (0)
      |                                                       ^
drivers/md/bcache/bset.c:1312:9: note: in expansion of macro 'EBUG_ON'
 1312 |         EBUG_ON(oldsize >= 0 && bch_count_data(b) != oldsize);
      |         ^~~~~~~

Reword the macro slightly to avoid the warning.
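
The message does not show the new definition. One possible rewording, given here as an assumption and under hypothetical macro names, replaces the empty statement with an empty do-while body, so the condition is still evaluated but the 'if' no longer has an empty body:

```c
/* Variant flagged by -Wempty-body: the 'if' body is an empty statement. */
#define TOY_EBUG_ON_OLD(cond)	do { if (cond); } while (0)

/* Reworded variant: the 'if' body is an (empty) do-while statement. */
#define TOY_EBUG_ON_NEW(cond)	do { if (cond) do {} while (0); } while (0)

/* Hypothetical check that the condition is evaluated exactly once. */
int toy_count_evaluations(void)
{
	int n = 0;

	TOY_EBUG_ON_NEW(n++ >= 0);
	return n;
}
```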

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 08:37:56 -06:00
Yang Li
f9a018e8a6 bcache: use NULL instead of using plain integer as pointer
This fixes the following sparse warnings:
drivers/md/bcache/features.c:22:16: warning: Using plain integer as NULL
pointer

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 08:37:56 -06:00
Christoph Hellwig
11e9560e6c bcache: remove PTR_CACHE
Remove the PTR_CACHE inline and replace it with a direct dereference
of c->cache.

(Coly Li: fix the typo from PTR_BUCKET to PTR_CACHE in commit log)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 08:37:55 -06:00
Zhiqiang Liu
13e1db65d2 bcache: reduce redundant code in bch_cached_dev_run()
In bch_cached_dev_run(), free(env[1])|free(env[2])|free(buf)
shows up three times. This patch introduces an out label under
which free(env[1])|free(env[2])|free(buf) is called only
once. If free() needs to be called when an error occurs,
we can set the error code in ret and then goto out directly.
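
The pattern can be sketched in userspace C as follows. `toy_run` and its `fail_at` parameter are hypothetical and only illustrate the single cleanup label. They do not reproduce bch_cached_dev_run().

```c
#include <errno.h>
#include <stdlib.h>

/* fail_at selects which later step fails (0 means no failure). */
int toy_run(int fail_at)
{
	char *env1 = malloc(16);
	char *env2 = malloc(16);
	char *buf = malloc(16);
	int ret = 0;

	if (!env1 || !env2 || !buf) {
		ret = -ENOMEM;
		goto out;
	}
	if (fail_at == 1) {
		ret = -EBUSY;
		goto out;
	}
	if (fail_at == 2) {
		ret = -EIO;
		goto out;
	}
out:
	/* free(NULL) is a no-op, so one cleanup path serves every exit. */
	free(buf);
	free(env2);
	free(env1);
	return ret;
}
```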

Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-11 08:37:55 -06:00
Jens Axboe
ff91763835 Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.13/drivers
Pull MD updates from Song:

"These patches fix a race condition with md_release() and md_open()."

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: split mddev_find
  md: factor out a mddev_find_locked helper from mddev_find
  md: md_open returns -EBUSY when entering racing area
2021-04-08 09:55:14 -06:00
Christoph Hellwig
65aa97c4d2 md: split mddev_find
Split mddev_find into a simple mddev_find that just finds an existing
mddev by the unit number, and a more complicated mddev_find_or_alloc
that deals with finding or allocating a mddev.
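
The split can be illustrated with a small userspace sketch. The `toy_*` names and the list layout are hypothetical, and locking is left out for brevity.

```c
#include <stdlib.h>

struct toy_dev {
	int unit;
	struct toy_dev *next;
};

static struct toy_dev *toy_all_devs;

/* Lookup only: never creates an entry. */
struct toy_dev *toy_find(int unit)
{
	struct toy_dev *d;

	for (d = toy_all_devs; d; d = d->next)
		if (d->unit == unit)
			return d;
	return NULL;
}

/* Lookup that falls back to allocation when the unit is unknown. */
struct toy_dev *toy_find_or_alloc(int unit)
{
	struct toy_dev *d = toy_find(unit);

	if (d)
		return d;
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->unit = unit;
	d->next = toy_all_devs;
	toy_all_devs = d;
	return d;
}
```

An open path that calls only `toy_find` can no longer re-create an entry that a concurrent release has just removed, which is the property the split is after.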

This turns out to fix this bug reported by Zhao Heming.

----------------------------- snip ------------------------------
commit d3374825ce ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creation and removal.
md_open shouldn't create an mddev when the all_mddevs list doesn't
contain it. With the current code logic, it is very easy to trigger a
soft lockup in a non-preempt environment.

*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt

*** script ***

triggers about 1 time in 10 tests

```
1  node1="15sp3-mdcluster1"
2  node2="15sp3-mdcluster2"
3
4  mdadm -Ss
5  ssh ${node2} "mdadm -Ss"
6  wipefs -a /dev/sda /dev/sdb
7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
   /dev/sdb --assume-clean
8
9  for i in {1..100}; do
10    echo ==== $i ====;
11
12    echo "test  ...."
13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14    sleep 1
15
16    echo "clean  ....."
17    ssh ${node2} "mdadm -Ss"
18 done
```

I use an mdcluster environment to trigger the soft lockup, but it isn't
an mdcluster-specific bug. Stopping an md array in an mdcluster
environment does more work than stopping a non-cluster array, which
leaves a large enough time gap for the kernel to run md_open.

*** stack ***

```
ID: 2831   TASK: ffff8dd7223b5040  CPU: 0   COMMAND: "mdadm"
 #0 [ffffa15d00a13b90] __schedule at ffffffffb8f1935f
 #1 [ffffa15d00a13ba8] exact_lock at ffffffffb8a4a66d
 #2 [ffffa15d00a13bb0] kobj_lookup at ffffffffb8c62fe3
 #3 [ffffa15d00a13c28] __blkdev_get at ffffffffb89273b9
 #4 [ffffa15d00a13c98] blkdev_get at ffffffffb8927964
 #5 [ffffa15d00a13cb0] do_dentry_open at ffffffffb88dc4b4
 #6 [ffffa15d00a13ce0] path_openat at ffffffffb88f0ccc
 #7 [ffffa15d00a13db8] do_filp_open at ffffffffb88f32bb
 #8 [ffffa15d00a13ee0] do_sys_open at ffffffffb88ddc7d
 #9 [ffffa15d00a13f38] do_syscall_64 at ffffffffb86053cb ffffffffb900008c

or:
[  884.226509]  mddev_put+0x1c/0xe0 [md_mod]
[  884.226515]  md_open+0x3c/0xe0 [md_mod]
[  884.226518]  __blkdev_get+0x30d/0x710
[  884.226520]  ? bd_acquire+0xd0/0xd0
[  884.226522]  blkdev_get+0x14/0x30
[  884.226524]  do_dentry_open+0x204/0x3a0
[  884.226531]  path_openat+0x2fc/0x1520
[  884.226534]  ? seq_printf+0x4e/0x70
[  884.226536]  do_filp_open+0x9b/0x110
[  884.226542]  ? md_release+0x20/0x20 [md_mod]
[  884.226543]  ? seq_read+0x1d8/0x3e0
[  884.226545]  ? kmem_cache_alloc+0x18a/0x270
[  884.226547]  ? do_sys_open+0x1bd/0x260
[  884.226548]  do_sys_open+0x1bd/0x260
[  884.226551]  do_syscall_64+0x5b/0x1e0
[  884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
```

*** rootcause ***

"mdadm -A" (or other array assemble commands) starts a daemon, "mdadm
--monitor", by default. When "mdadm -Ss" is running, the stop action
wakes up "mdadm --monitor". The "--monitor" daemon immediately reads
/proc/mdstat. At this time the mddev still exists in the kernel, so
/proc/mdstat still shows the md device, which makes "mdadm --monitor"
open /dev/md0.

The earlier "mdadm -Ss" is a removing action, while the open by "mdadm
--monitor" triggers md_open, which is a creating action. The two
actions race.

```
<thread 1>: "mdadm -Ss"
md_release
  mddev_put deletes mddev from all_mddevs
  queue_work for mddev_delayed_delete
  at this time, "/dev/md0" is still available for opening

<thread 2>: "mdadm --monitor ..."
md_open
 + mddev_find can't find mddev of /dev/md0, and create a new mddev and
 |    return.
 + trigger "if (mddev->gendisk != bdev->bd_disk)" and return
      -ERESTARTSYS.
```

In a non-preempt kernel, <thread 2> keeps occupying the current CPU, so
mddev_delayed_delete, which was queued by <thread 1>, can't be
scheduled.

In a preempt kernel the same race can be triggered, but the kernel
doesn't allow one thread to run on a CPU all the time. After <thread 2>
has run for some time, the later "mdadm -A" (see line 13 of the script
above) calls md_alloc to allocate a new gendisk for the mddev. The
md_open check "if (mddev->gendisk != bdev->bd_disk)" then no longer
holds, md_open returns 0 to the caller, and the soft lockup is broken.
------------------------------ snip ------------------------------

Cc: stable@vger.kernel.org
Fixes: d3374825ce ("md: make devices disappear when they are no longer needed.")
Reported-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2021-04-07 22:41:26 -07:00
Christoph Hellwig
8b57251f9a md: factor out a mddev_find_locked helper from mddev_find
Factor out a self-contained helper to just lookup a mddev by the dev_t
"unit".

Cc: stable@vger.kernel.org
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2021-04-07 22:41:26 -07:00
Zhao Heming
6a4db2a603 md: md_open returns -EBUSY when entering racing area
commit d3374825ce ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creation and removal.
md_open shouldn't create an mddev when the all_mddevs list doesn't
contain it. With the current code logic, it is very easy to trigger a
soft lockup in a non-preempt environment.

This patch changes the md_open return value from -ERESTARTSYS to -EBUSY,
which breaks the infinite retry when md_open enters the racing area.
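
Why the return code matters can be sketched as follows, assuming the caller of md_open restarts the open whenever it sees -ERESTARTSYS. Everything here is a hypothetical userspace model: `TOY_ERESTARTSYS` stands in for the kernel-internal code, and `max_tries` exists only so that the demonstration terminates.

```c
#include <errno.h>

#define TOY_ERESTARTSYS 512	/* stand-in for the kernel-internal code */

/* Hypothetical driver open that keeps hitting the racing area. */
static int toy_open_racing(int return_busy)
{
	return return_busy ? -EBUSY : -TOY_ERESTARTSYS;
}

/* Hypothetical caller: restart on -TOY_ERESTARTSYS, else return. */
int toy_blkdev_get(int return_busy, int max_tries, int *tries)
{
	int ret;

	*tries = 0;
	do {
		ret = toy_open_racing(return_busy);
		(*tries)++;
	} while (ret == -TOY_ERESTARTSYS && *tries < max_tries);
	return ret;
}
```

In this model the -EBUSY variant returns after a single attempt, while the restart variant spins until the artificial cap is reached.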

This patch only partly fixes the soft lockup issue. The full fix needs
mddev_find to be split into two functions, mddev_find and
mddev_find_or_alloc, with md_open calling the new mddev_find (which
only does the lookup).

For more detail, please refer to Christoph's "split mddev_find" patch
in a later commit.

*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt

*** script ***

triggers nearly every time with the script below

```
1  node1="mdcluster1"
2  node2="mdcluster2"
3
4  mdadm -Ss
5  ssh ${node2} "mdadm -Ss"
6  wipefs -a /dev/sda /dev/sdb
7  mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
   /dev/sdb --assume-clean
8
9  for i in {1..10}; do
10    echo ==== $i ====;
11
12    echo "test  ...."
13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14    sleep 1
15
16    echo "clean  ....."
17    ssh ${node2} "mdadm -Ss"
18 done
```

I use an mdcluster environment to trigger the soft lockup, but it isn't
an mdcluster-specific bug. Stopping an md array in an mdcluster
environment does more work than stopping a non-cluster array, which
leaves a large enough time gap for the kernel to run md_open.

*** stack ***

```
[  884.226509]  mddev_put+0x1c/0xe0 [md_mod]
[  884.226515]  md_open+0x3c/0xe0 [md_mod]
[  884.226518]  __blkdev_get+0x30d/0x710
[  884.226520]  ? bd_acquire+0xd0/0xd0
[  884.226522]  blkdev_get+0x14/0x30
[  884.226524]  do_dentry_open+0x204/0x3a0
[  884.226531]  path_openat+0x2fc/0x1520
[  884.226534]  ? seq_printf+0x4e/0x70
[  884.226536]  do_filp_open+0x9b/0x110
[  884.226542]  ? md_release+0x20/0x20 [md_mod]
[  884.226543]  ? seq_read+0x1d8/0x3e0
[  884.226545]  ? kmem_cache_alloc+0x18a/0x270
[  884.226547]  ? do_sys_open+0x1bd/0x260
[  884.226548]  do_sys_open+0x1bd/0x260
[  884.226551]  do_syscall_64+0x5b/0x1e0
[  884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
```

*** rootcause ***

"mdadm -A" (or other array assemble commands) starts a daemon, "mdadm
--monitor", by default. When "mdadm -Ss" is running, the stop action
wakes up "mdadm --monitor". The "--monitor" daemon immediately reads
/proc/mdstat. At this time the mddev still exists in the kernel, so
/proc/mdstat still shows the md device, which makes "mdadm --monitor"
open /dev/md0.

The earlier "mdadm -Ss" is a removing action, while the open by "mdadm
--monitor" triggers md_open, which is a creating action. The two
actions race.

```
<thread 1>: "mdadm -Ss"
md_release
  mddev_put deletes mddev from all_mddevs
  queue_work for mddev_delayed_delete
  at this time, "/dev/md0" is still available for opening

<thread 2>: "mdadm --monitor ..."
md_open
 + mddev_find can't find mddev of /dev/md0, and create a new mddev and
 |    return.
 + trigger "if (mddev->gendisk != bdev->bd_disk)" and return
      -ERESTARTSYS.
```

In a non-preempt kernel, <thread 2> keeps occupying the current CPU, so
mddev_delayed_delete, which was queued by <thread 1>, can't be
scheduled.

In a preempt kernel the same race can be triggered, but the kernel
doesn't allow one thread to run on a CPU all the time. After <thread 2>
has run for some time, the later "mdadm -A" (see line 13 of the script
above) calls md_alloc to allocate a new gendisk for the mddev. The
md_open check "if (mddev->gendisk != bdev->bd_disk)" then no longer
holds, md_open returns 0 to the caller, and the soft lockup is broken.

Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
2021-04-07 22:41:26 -07:00
Guobin Huang
9c282c29a3 drbd: use DEFINE_SPINLOCK() for spinlock
A spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than by explicitly calling spin_lock_init().
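
As a userspace analogy only, the idea of defining and initializing a lock in one compile-time step can be sketched with C11 atomics. `DEFINE_TOY_LOCK` and the `toy_*` functions are hypothetical and are not the kernel's spinlock implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_lock {
	atomic_flag held;
};

/* Define and statically initialize the lock in a single step. */
#define DEFINE_TOY_LOCK(name) struct toy_lock name = { ATOMIC_FLAG_INIT }

/* No separate runtime init call is needed before first use. */
static DEFINE_TOY_LOCK(toy_resource_lock);

/* Returns true when the lock was free and is now held by the caller. */
bool toy_trylock(void)
{
	return !atomic_flag_test_and_set(&toy_resource_lock.held);
}

void toy_unlock(void)
{
	atomic_flag_clear(&toy_resource_lock.held);
}
```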

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guobin Huang <huangguobin4@huawei.com>
Link: https://lore.kernel.org/r/1617710988-49205-1-git-send-email-huangguobin4@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:31:42 -06:00
Christoph Hellwig
b60b270b3d swim3: support highmem
swim3 only uses the virtual address of a bio to stash it into the data
transfer using virt_to_bus.  But the ppc32 virt_to_bus just uses the
physical address with an offset.  Replace virt_to_bus with a local hack
that performs the equivalent transformation and stop asking for block
layer bounce buffering.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061839.811588-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:30:09 -06:00
Christoph Hellwig
3d86739c63 floppy: always use the track buffer
Always use the track buffer that is already used for addresses outside
the 16MB address capability of the floppy controller.  This allows
removing a lot of code that relies on kernel virtual addresses.  With
this gone there is just a single place left that looks at the bio,
which can be converted to memcpy_{from,to}_page, thus removing the need
for the extra block-layer bounce buffering for highmem pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061755.811522-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:29:57 -06:00
Christoph Hellwig
4c6e5bc8c0 swim: don't call blk_queue_bounce_limit
m68k doesn't support highmem, so don't bother enabling the block layer
bounce buffer code.  Just for safety throw in a depend on !HIGHMEM.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061725.811389-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:29:47 -06:00
Christoph Hellwig
1d2c82001a gdrom: support highmem
The gdrom driver only has a single reference to the virtual address of
the bio data, and uses that only to get the physical address.  Switch
to deriving the physical address from the page directly and thus avoid
bounce buffering highmem data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210406061648.811275-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:29:36 -06:00
Lee Jones
a425711c6c block: drbd: drbd_nl: Demote half-complete kernel-doc headers
Fixes the following W=1 kernel build warning(s):

 from drivers/block/drbd/drbd_nl.c:24:
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_attach’:
 drivers/block/drbd/drbd_nl.c:1968:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c:930: warning: Function parameter or member 'flags' not described in 'drbd_determine_dev_size'
 drivers/block/drbd/drbd_nl.c:930: warning: Function parameter or member 'rs' not described in 'drbd_determine_dev_size'
 drivers/block/drbd/drbd_nl.c:1148: warning: Function parameter or member 'dc' not described in 'drbd_check_al_size'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-12-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:21:53 -06:00
Lee Jones
5fdbd5bc49 block: xen-blkfront: Demote kernel-doc abuses
Fixes the following W=1 kernel build warning(s):

 drivers/block/xen-blkfront.c:1960: warning: Function parameter or member 'dev' not described in 'blkfront_probe'
 drivers/block/xen-blkfront.c:1960: warning: Function parameter or member 'id' not described in 'blkfront_probe'
 drivers/block/xen-blkfront.c:1960: warning: expecting prototype for Allocate the basic(). Prototype was for blkfront_probe() instead
 drivers/block/xen-blkfront.c:2085: warning: Function parameter or member 'dev' not described in 'blkfront_resume'
 drivers/block/xen-blkfront.c:2085: warning: expecting prototype for or a backend(). Prototype was for blkfront_resume() instead
 drivers/block/xen-blkfront.c:2444: warning: wrong kernel-doc identifier on line:

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: xen-devel@lists.xenproject.org
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20210312105530.2219008-11-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:21:53 -06:00
Lee Jones
6ec2a0f2bc block: drbd: drbd_receiver: Demote less than half complete kernel-doc header
Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'op' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'op_flags' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'fault_type' not described in 'drbd_submit_peer_request'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-10-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:21:53 -06:00
Lee Jones
584164c805 block: drbd: drbd_main: Fix a bunch of function documentation discrepancies
Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_main.c:278: warning: Function parameter or member 'connection' not described in 'tl_clear'
 drivers/block/drbd/drbd_main.c:278: warning: Excess function parameter 'device' description in 'tl_clear'
 drivers/block/drbd/drbd_main.c:489: warning: Function parameter or member 'cpu_mask' not described in 'drbd_calc_cpu_mask'
 drivers/block/drbd/drbd_main.c:528: warning: Excess function parameter 'device' description in 'drbd_thread_current_set_cpu'
 drivers/block/drbd/drbd_main.c:549: warning: Function parameter or member 'connection' not described in 'drbd_header_size'
 drivers/block/drbd/drbd_main.c:1204: warning: Function parameter or member 'device' not described in 'send_bitmap_rle_or_plain'
 drivers/block/drbd/drbd_main.c:1204: warning: Function parameter or member 'c' not described in 'send_bitmap_rle_or_plain'
 drivers/block/drbd/drbd_main.c:1335: warning: Function parameter or member 'peer_device' not described in '_drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1335: warning: Excess function parameter 'device' description in '_drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1379: warning: Function parameter or member 'peer_device' not described in 'drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1379: warning: Excess function parameter 'device' description in 'drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'connection' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'sock' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'buffer' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'size' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'msg_flags' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:3525: warning: Function parameter or member 'flags' not described in 'drbd_queue_bitmap_io'
 drivers/block/drbd/drbd_main.c:3563: warning: Function parameter or member 'flags' not described in 'drbd_bitmap_io'
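
For illustration only: warnings of this kind go away once every "@name:" line in a kernel-doc comment matches a parameter in the function signature. The sketch below uses a hypothetical demo_connection type and demo_tl_clear() function, not the real DRBD code.

```c
/* Hypothetical stand-in type; not the real DRBD structure. */
struct demo_connection {
	int pending;
};

/**
 * demo_tl_clear() - Reset the pending counter of a connection
 * @connection: connection whose counter is reset (same name as the parameter)
 *
 * A stale "@device:" line here would be reported as an excess parameter
 * description, and a missing "@connection:" line as "not described".
 *
 * Return: the number of entries that were pending before the reset.
 */
int demo_tl_clear(struct demo_connection *connection)
{
	int old = connection->pending;

	connection->pending = 0;
	return old;
}
```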

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-9-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:21:53 -06:00
Lee Jones
1f1e87b4dc block: drbd: drbd_nl: Make conversion to 'enum drbd_ret_code' explicit
Fixes the following W=1 kernel build warning(s):

 from drivers/block/drbd/drbd_nl.c:24:
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_set_role’:
 drivers/block/drbd/drbd_nl.c:793:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c:795:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_attach’:
 drivers/block/drbd/drbd_nl.c:1965:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_connect’:
 drivers/block/drbd/drbd_nl.c:2690:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_disconnect’:
 drivers/block/drbd/drbd_nl.c:2803:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
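
A minimal sketch of the pattern, assuming two unrelated enums that happen to share a value space. The demo_* names are hypothetical stand-ins for 'enum drbd_state_rv' and 'enum drbd_ret_code'; the actual patch may differ in detail.

```c
/* Hypothetical enums; values chosen only for the example. */
enum demo_state_rv { DEMO_SS_SUCCESS = 1, DEMO_SS_NOTHING_TO_DO = 2 };
enum demo_ret_code { DEMO_NO_ERROR = 101, DEMO_ERR_GENERIC = 102 };

static enum demo_state_rv demo_change_state(int ok)
{
	return ok ? DEMO_SS_SUCCESS : DEMO_SS_NOTHING_TO_DO;
}

enum demo_ret_code demo_set_role(int ok)
{
	enum demo_state_rv rv = demo_change_state(ok);
	enum demo_ret_code retcode;

	/*
	 * "retcode = rv;" would trigger -Wenum-conversion. The cast keeps
	 * the numeric value and documents that the conversion is intended.
	 */
	retcode = (enum demo_ret_code)rv;
	return retcode;
}
```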

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-8-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:21:53 -06:00
Lee Jones
f58a0d184e block: drbd: drbd_main: Remove duplicate field initialisation
[P_RETRY_WRITE] is initialised more than once.

Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_main.c: In function ‘cmdname’:
 drivers/block/drbd/drbd_main.c:3660:22: warning: initialized field overwritten [-Woverride-init]
 drivers/block/drbd/drbd_main.c:3660:22: note: (near initialization for ‘cmdnames[44]’)
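
As a rough illustration of what -Woverride-init guards against, here is a small designated-initializer table in which each index appears exactly once. The demo_* names are hypothetical and the table is much shorter than the real cmdnames[] array.

```c
#include <stddef.h>

enum demo_cmd { DEMO_P_DATA, DEMO_P_RETRY_WRITE, DEMO_P_MAX };

/*
 * Each index is initialised exactly once. Listing [DEMO_P_RETRY_WRITE]
 * a second time would still compile, but the later entry would silently
 * replace the earlier one, which is what -Woverride-init reports.
 */
static const char *const demo_cmdnames[] = {
	[DEMO_P_DATA]        = "Data",
	[DEMO_P_RETRY_WRITE] = "RetryWrite",
};

const char *demo_cmdname(enum demo_cmd cmd)
{
	/* Out-of-range or unnamed commands fall back to a fixed string. */
	if ((size_t)cmd >= sizeof(demo_cmdnames) / sizeof(demo_cmdnames[0]))
		return "Unknown";
	return demo_cmdnames[cmd] ? demo_cmdnames[cmd] : "Unknown";
}
```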

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-7-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:21:53 -06:00
Lee Jones
9b48ff0787 block: drbd: drbd_receiver: Demote non-conformant kernel-doc headers
Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_receiver.c:265: warning: Function parameter or member 'peer_device' not described in 'drbd_alloc_pages'
 drivers/block/drbd/drbd_receiver.c:265: warning: Excess function parameter 'device' description in 'drbd_alloc_pages'
 drivers/block/drbd/drbd_receiver.c:1362: warning: Function parameter or member 'connection' not described in 'drbd_may_finish_epoch'
 drivers/block/drbd/drbd_receiver.c:1362: warning: Excess function parameter 'device' description in 'drbd_may_finish_epoch'
 drivers/block/drbd/drbd_receiver.c:1451: warning: Function parameter or member 'resource' not described in 'drbd_bump_write_ordering'
 drivers/block/drbd/drbd_receiver.c:1451: warning: Function parameter or member 'bdev' not described in 'drbd_bump_write_ordering'
 drivers/block/drbd/drbd_receiver.c:1451: warning: Excess function parameter 'connection' description in 'drbd_bump_write_ordering'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'op' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'op_flags' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'fault_type' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Excess function parameter 'rw' description in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:3055: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_0p'
 drivers/block/drbd/drbd_receiver.c:3138: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_1p'
 drivers/block/drbd/drbd_receiver.c:3195: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_2p'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'peer_device' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'size' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'p' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'c' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'peer_device' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'p' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'c' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'len' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'peer_device' not described in 'decode_bitmap_c'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'p' not described in 'decode_bitmap_c'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'c' not described in 'decode_bitmap_c'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'len' not described in 'decode_bitmap_c'
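
"Demoting" a header means turning the kernel-doc opener into a plain comment opener, so the kernel-doc tool no longer parses the block and no longer checks it against the signature. A sketch with a hypothetical helper, not taken from drbd_receiver.c:

```c
/*
 * demo_count_bits - count the set bits in a word
 *
 * This comment opens with a single asterisk after the slash, so it is an
 * ordinary comment. With a double asterisk it would be treated as
 * kernel-doc and would need an "@word:" line to avoid a W=1 warning.
 */
unsigned int demo_count_bits(unsigned long word)
{
	unsigned int n = 0;

	while (word) {
		word &= word - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```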

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-6-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:21:53 -06:00
Lee Jones
49ece311fd block: drbd: drbd_state: Fix some function documentation issues
Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_state.c:913: warning: Function parameter or member 'connection' not described in 'is_valid_soft_transition'
 drivers/block/drbd/drbd_state.c:913: warning: Excess function parameter 'device' description in 'is_valid_soft_transition'
 drivers/block/drbd/drbd_state.c:1054: warning: Function parameter or member 'warn' not described in 'sanitize_state'
 drivers/block/drbd/drbd_state.c:1054: warning: Excess function parameter 'warn_sync_abort' description in 'sanitize_state'
 drivers/block/drbd/drbd_state.c:1703: warning: Function parameter or member 'state_change' not described in 'after_state_ch'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-5-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:21:53 -06:00
Lee Jones
d0e0cb970e block: mtip32xx: mtip32xx: Mark debugging variable 'start' as __maybe_unused
Fixes the following W=1 kernel build warning(s):

 drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_standby_immediate’:
 drivers/block/mtip32xx/mtip32xx.c:1216:16: warning: variable ‘start’ set but not used [-Wunused-but-set-variable]
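
A user-space sketch of the idea, assuming a debug macro that compiles away in normal builds. The demo_dbg() macro and demo_standby() function are hypothetical; __maybe_unused is defined locally only so the example builds outside the kernel.

```c
/* Local definition so the sketch builds in user space. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

/* Hypothetical debug macro: prints only when DEMO_DEBUG is defined. */
#ifdef DEMO_DEBUG
#include <stdio.h>
#define demo_dbg(...) printf(__VA_ARGS__)
#else
#define demo_dbg(...) do { } while (0)
#endif

int demo_standby(unsigned long now)
{
	/*
	 * Without DEMO_DEBUG the only reader of 'start' disappears, so the
	 * compiler would warn that it is set but not used. The attribute
	 * marks that as expected.
	 */
	unsigned long __maybe_unused start = now;

	demo_dbg("standby started at %lu\n", start);
	return 0;
}
```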

Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-4-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:21:53 -06:00
Lee Jones
b8b8710354 block: drbd: drbd_interval: Demote some kernel-doc abuses and fix another header
Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_interval.c:11: warning: Function parameter or member 'node' not described in 'interval_end'
 drivers/block/drbd/drbd_interval.c:26: warning: Function parameter or member 'root' not described in 'drbd_insert_interval'
 drivers/block/drbd/drbd_interval.c:26: warning: Function parameter or member 'this' not described in 'drbd_insert_interval'
 drivers/block/drbd/drbd_interval.c:70: warning: Function parameter or member 'root' not described in 'drbd_contains_interval'
 drivers/block/drbd/drbd_interval.c:96: warning: Function parameter or member 'root' not described in 'drbd_remove_interval'
 drivers/block/drbd/drbd_interval.c:96: warning: Function parameter or member 'this' not described in 'drbd_remove_interval'
 drivers/block/drbd/drbd_interval.c:113: warning: Function parameter or member 'root' not described in 'drbd_find_overlap'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-3-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06 09:21:53 -06:00
Jens Axboe
762d6bd27d nvme updates for Linux 5.13

Merge tag 'nvme-5.13-2021-04-06' of git://git.infradead.org/nvme into for-5.13/drivers

Pull NVMe updates from Christoph:

"nvme updates for Linux 5.13

 - fix handling of very large MDTS values (Bart Van Assche)
 - retrigger ANA log update if group descriptor isn't found
   (Hannes Reinecke)
 - fix locking contexts in nvme-tcp and nvmet-tcp (Sagi Grimberg)
 - return proper error code from discovery ctrl (Hou Pu)
 - verify the SGLS field in nvmet-tcp and nvmet-fc (Max Gurtovoy)
 - disallow passthru cmd from targeting a nsid != nsid of the block dev
   (Niklas Cassel)
 - do not allow model_number to exceed 40 bytes in nvmet (Noam Gottlieb)
 - enable optional queue idle period tracking in nvmet-tcp
   (Mark Wunderlich)
 - various cleanups and optimizations (Chaitanya Kulkarni, Kanchan Joshi)
 - expose fast_io_fail_tmo in sysfs (Daniel Wagner)
 - implement non-MDTS command limits (Keith Busch)
 - reduce warnings for unhandled command effects (Keith Busch)
 - allocate storage for the SQE as part of the nvme_request (Keith Busch)"

* tag 'nvme-5.13-2021-04-06' of git://git.infradead.org/nvme: (33 commits)
  nvme: fix handling of large MDTS values
  nvme: implement non-mdts command limits
  nvme: disallow passthru cmd from targeting a nsid != nsid of the block dev
  nvme: retrigger ANA log update if group descriptor isn't found
  nvme: export fast_io_fail_tmo to sysfs
  nvme: remove superfluous else in nvme_ctrl_loss_tmo_store
  nvme: use sysfs_emit instead of sprintf
  nvme-fc: check sgl supported by target
  nvme-tcp: check sgl supported by target
  nvmet-tcp: enable optional queue idle period tracking
  nvmet-tcp: fix incorrect locking in state_change sk callback
  nvme-tcp: block BH in sk state_change sk callback
  nvmet: return proper error code from discovery ctrl
  nvme: warn of unhandled effects only once
  nvme: use driver pdu command for passthrough
  nvme-pci: allocate nvme_command within driver pdu
  nvmet: do not allow model_number exceed 40 bytes
  nvmet: remove unnecessary ctrl parameter
  nvmet-fc: update function documentation
  nvme-fc: fix the function documentation comment
  ...
2021-04-06 09:17:22 -06:00
Bart Van Assche
8609c63fce nvme: fix handling of large MDTS values
Instead of triggering an integer overflow and undefined behavior if MDTS is
large, set max_hw_sectors to UINT_MAX.
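
A minimal user-space sketch of the clamping idea, not the driver's actual helper. It assumes MDTS is a power-of-two exponent in units of the minimum page size and that page_shift (log2 of that page size) is at least 12; demo_mps_to_sectors() is a hypothetical name.

```c
#include <limits.h>

/*
 * Convert an MDTS-style exponent into a count of 512-byte sectors.
 * 2^units pages of 2^page_shift bytes is 2^(units + page_shift - 9)
 * sectors. If that shift does not fit in an unsigned int, return
 * UINT_MAX instead of shifting out of range.
 */
unsigned int demo_mps_to_sectors(unsigned int page_shift, unsigned char units)
{
	unsigned int shift = units + page_shift - 9;

	if (shift >= sizeof(unsigned int) * CHAR_BIT)
		return UINT_MAX;
	return 1U << shift;
}
```

With 4 KiB pages (page_shift of 12), an exponent of 5 gives 256 sectors (128 KiB), while an exponent of 255 is clamped to UINT_MAX.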

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[hch: rebased to account for the new nvme_mps_to_sectors helper]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-06 08:34:39 +02:00
Keith Busch
5befc7c26e nvme: implement non-mdts command limits
Commands that access LBA contents without a data transfer between the
host and the controller historically have not had a spec-defined upper
limit. The driver set the queue constraints for such commands to the max
data transfer size just to be safe, but this artificial constraint
frequently limits devices below their capabilities.

TP4040, ratified by the NVMe Workgroup, defines how a controller may
advertise its non-MDTS limits. Use these if provided and default to
the current constraints if not. The Dataset Management command limits
are defined in logical blocks, but there is no namespace to tell us
the logical block size, so the code defaults to the safe 512b size.
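
The "use the advertised limit, else fall back" rule can be sketched as below. This is an illustration only: the function name is hypothetical, and treating zero as "no limit advertised" is an assumption of the sketch.

```c
#include <stdint.h>

/* Assumed block size when no namespace is available: 512 bytes (2^9). */
#define DEMO_DEFAULT_LBA_SHIFT 9

/*
 * Return a Dataset Management size limit in 512-byte sectors.
 * advertised_blocks is the controller-reported limit in logical blocks;
 * fallback_sectors is the limit the driver would have used otherwise.
 */
uint64_t demo_dsm_limit_sectors(uint32_t advertised_blocks,
				uint64_t fallback_sectors)
{
	if (!advertised_blocks)
		return fallback_sectors;

	/* blocks -> bytes -> sectors; identity for 512-byte blocks */
	return ((uint64_t)advertised_blocks << DEMO_DEFAULT_LBA_SHIFT) >> 9;
}
```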

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-06 08:34:39 +02:00
Niklas Cassel
c881a23fb6 nvme: disallow passthru cmd from targeting a nsid != nsid of the block dev
When a passthru command targets a specific namespace, the ns parameter to
nvme_user_cmd()/nvme_user_cmd64() is set. However, there is currently no
validation that the nsid specified in the passthru command targets the
namespace/nsid represented by the block device that the ioctl was
performed on.

Add a check that validates that the nsid in the passthru command matches
that of the supplied namespace.
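
The check described above can be sketched as follows. The struct layouts and the demo_check_nsid() name are hypothetical, and returning -EINVAL on a mismatch is an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the driver's structures. */
struct demo_ns {
	uint32_t ns_id;		/* nsid the block device represents */
};

struct demo_passthru_cmd {
	uint8_t  opcode;
	uint32_t nsid;		/* nsid requested by user space */
};

/*
 * Reject a passthru command whose nsid differs from the namespace the
 * ioctl was issued on. A NULL ns models a controller-level ioctl, where
 * there is no block device namespace to compare against.
 */
int demo_check_nsid(const struct demo_ns *ns,
		    const struct demo_passthru_cmd *cmd)
{
	if (ns && cmd->nsid != ns->ns_id)
		return -EINVAL;
	return 0;
}
```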

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-06 08:34:38 +02:00
Hannes Reinecke
dd8f7fa908 nvme: retrigger ANA log update if group descriptor isn't found
If ANA is enabled but no ANA group descriptor is found when creating
a new namespace, the ANA log is most likely out of date, so trigger
a re-read. The namespace will be tagged with the NS_ANA_PENDING flag
to exclude it from path selection until the ANA log has been re-read.
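
A simplified model of that flow, with the ANA log reduced to an array of group IDs. All names are hypothetical; the boolean field stands in for the NS_ANA_PENDING flag, and the real driver does this work against the actual log page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_path {
	uint32_t ana_grpid;
	bool ana_pending;	/* stands in for NS_ANA_PENDING */
};

/*
 * Look up the path's ANA group in the cached log. Returns true when the
 * group is missing, meaning the caller should schedule a re-read of the
 * log; the path is marked pending until then.
 */
bool demo_attach_ana(struct demo_path *p, const uint32_t *log_grpids,
		     size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (log_grpids[i] == p->ana_grpid) {
			p->ana_pending = false;
			return false;
		}
	}
	p->ana_pending = true;
	return true;
}

/* Pending paths are skipped by path selection. */
bool demo_path_selectable(const struct demo_path *p)
{
	return !p->ana_pending;
}
```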

Fixes: 32acab3181 ("nvme: implement multipath access to nvme subsystems")
Reported-by: Martin George <marting@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-06 08:34:38 +02:00
Daniel Wagner
09fbed6363 nvme: export fast_io_fail_tmo to sysfs
Commit 8c4dfea97f ("nvme-fabrics: reject I/O to offline device")
introduced fast_io_fail_tmo but didn't export the value to sysfs. The
value can be set at 'nvme connect' time. Export the timeout value
to user space via sysfs to allow runtime configuration.
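
From user space, the attribute's text would need to be parsed after reading it. The sketch below assumes the file contains either a decimal number of seconds or the word "off" for a disabled timeout, and that negative input means disabled; both are assumptions here, and the function name is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the text of a fast_io_fail_tmo attribute.
 * On success stores the timeout in *tmo (-1 meaning disabled) and
 * returns 0; returns -EINVAL for anything unparseable.
 */
int demo_parse_fast_io_fail_tmo(const char *text, int *tmo)
{
	char *end;
	long val;

	if (strncmp(text, "off", 3) == 0 &&
	    (text[3] == '\n' || text[3] == '\0')) {
		*tmo = -1;
		return 0;
	}

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno || end == text || (*end != '\n' && *end != '\0') ||
	    val < INT_MIN || val > INT_MAX)
		return -EINVAL;

	*tmo = val < 0 ? -1 : (int)val;
	return 0;
}
```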

Cc: Victor Gladkov <Victor.Gladkov@kioxia.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhaani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:29 +02:00