Commit Graph

7181 Commits

Author SHA1 Message Date
Linus Torvalds
9a1d6c9e3f Merge tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "Sitting on top of the core block changes, here are the driver changes
  for the 5.15 merge window:

   - NVMe updates via Christoph:
       - suspend improvements for devices with an HMB (Keith Busch)
       - handle double completions more gacefull (Sagi Grimberg)
       - cleanup the selects for the nvme core code a bit (Sagi Grimberg)
       - don't update queue count when failing to set io queues (Ruozhu Li)
       - various nvmet connect fixes (Amit Engel)
       - cleanup lightnvm leftovers (Keith Busch, me)
       - small cleanups (Colin Ian King, Hou Pu)
       - add tracing for the Set Features command (Hou Pu)
       - CMB sysfs cleanups (Keith Busch)
       - add a mutex_destroy call (Keith Busch)

   - remove lightnvm subsystem. It's served its purpose and ultimately
     led to zoned nvme support, we no longer need it (Christoph)

   - revert floppy O_NDELAY fix (Denis)

   - nbd fixes (Hou, Pavel, Baokun)

   - nbd locking fixes (Tetsuo)

   - nbd device removal fixes (Christoph)

   - raid10 rcu warning fix (Xiao)

   - raid1 write behind fix (Guoqing)

   - rnbd fixes (Gioh, Md Haris)

   - misc fixes (Colin)"

* tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-block: (42 commits)
  Revert "floppy: reintroduce O_NDELAY fix"
  raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
  md/raid10: Remove unnecessary rcu_dereference in raid10_handle_discard
  nbd: remove nbd->destroy_complete
  nbd: only return usable devices from nbd_find_unused
  nbd: set nbd->index before releasing nbd_index_mutex
  nbd: prevent IDR lookups from finding partially initialized devices
  nbd: reset NBD to NULL when restarting in nbd_genl_connect
  nbd: add missing locking to the nbd_dev_add error path
  nvme: remove the unused NVME_NS_* enum
  nvme: remove nvm_ndev from ns
  nvme: Have NVME_FABRICS select NVME_CORE instead of transport drivers
  block: nbd: add sanity check for first_minor
  nvmet: check that host sqsize does not exceed ctrl MQES
  nvmet: avoid duplicate qid in connect cmd
  nvmet: pass back cntlid on successful completion
  nvme-rdma: don't update queue count when failing to set io queues
  nvme-tcp: don't update queue count when failing to set io queues
  nvme-tcp: pair send_mutex init with destroy
  nvme: allow user toggling hmb usage
  ...
2021-08-30 19:01:46 -07:00
Linus Torvalds
679369114e Merge tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
 "Nothing major in here - lots of good cleanups and tech debt handling,
  which is also evident in the diffstats. In particular:

   - Add disk sequence numbers (Matteo)

   - Discard merge fix (Ming)

   - Relax disk zoned reporting restrictions (Niklas)

   - Bio error handling zoned leak fix (Pavel)

   - Start of proper add_disk() error handling (Luis, Christoph)

   - blk crypto fix (Eric)

   - Non-standard GPT location support (Dmitry)

   - IO priority improvements and cleanups (Damien)o

   - blk-throtl improvements (Chunguang)

   - diskstats_show() stack reduction (Abd-Alrhman)

   - Loop scheduler selection (Bart)

   - Switch block layer to use kmap_local_page() (Christoph)

   - Remove obsolete disk_name helper (Christoph)

   - block_device refcounting improvements (Christoph)

   - Ensure gendisk always has a request queue reference (Christoph)

   - Misc fixes/cleanups (Shaokun, Oliver, Guoqing)"

* tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block: (129 commits)
  sg: pass the device name to blk_trace_setup
  block, bfq: cleanup the repeated declaration
  blk-crypto: fix check for too-large dun_bytes
  blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN
  blk-zoned: allow zone management send operations without CAP_SYS_ADMIN
  block: mark blkdev_fsync static
  block: refine the disk_live check in del_gendisk
  mmc: sdhci-tegra: Enable MMC_CAP2_ALT_GPT_TEGRA
  mmc: block: Support alternative_gpt_sector() operation
  partitions/efi: Support non-standard GPT location
  block: Add alternative_gpt_sector() operation
  bio: fix page leak bio_add_hw_page failure
  block: remove CONFIG_DEBUG_BLOCK_EXT_DEVT
  block: remove a pointless call to MINOR() in device_add_disk
  null_blk: add error handling support for add_disk()
  virtio_blk: add error handling support for add_disk()
  block: add error handling for device_add_disk / add_disk
  block: return errors from disk_alloc_events
  block: return errors from blk_integrity_add
  block: call blk_register_queue earlier in device_add_disk
  ...
2021-08-30 18:52:11 -07:00
Denis Efremov
c7e9d00203 Revert "floppy: reintroduce O_NDELAY fix"
The patch breaks userspace implementations (e.g. fdutils) and introduces
regressions in behaviour. Previously, it was possible to O_NDELAY open a
floppy device with no media inserted or with write protected media without
an error. Some userspace tools use this particular behavior for probing.

It's not the first time when we revert this patch. Previous revert is in
commit f2791e7ead (Revert "floppy: refactor open() flags handling").

This reverts commit 8a0c014cd2.

Link: https://lore.kernel.org/linux-block/de10cb47-34d1-5a88-7751-225ca380f735@compro.net/
Reported-by: Mark Hounschell <markh@compro.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Wim Osterholt <wim@djo.tudelft.nl>
Cc: Kurt Garloff <kurt@garloff.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Denis Efremov <efremov@linux.com>
2021-08-28 11:16:47 +03:00
Linus Torvalds
64b4fc45be Merge tag 'block-5.14-2021-08-27' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - Revert the mq-deadline priority handling, it's causing serious
   performance regressions. While experimental patches exists to fix
   this up, it's too late to do so now. Revert it and re-do it properly
   for 5.15 instead.

 - Fix a NULL vs IS_ERR() regression in this release (Dan)

 - Fix a mq-deadline accounting regression in this release (Bart)

 - Mark cryptoloop as deprecated. It's broken and dm-crypt fully
   supports it, and it's actively intefering with loop. Plan on removal
   for 5.16 (Christoph)

* tag 'block-5.14-2021-08-27' of git://git.kernel.dk/linux-block:
  cryptoloop: add a deprecation warning
  pd: fix a NULL vs IS_ERR() check
  Revert "block/mq-deadline: Prioritize high-priority requests"
  mq-deadline: Fix request accounting
2021-08-27 16:08:29 -07:00
Christoph Hellwig
222013f9ac cryptoloop: add a deprecation warning
Support for cryptoloop has been officially marked broken and deprecated
in favor of dm-crypt (which supports the same broken algorithms if
needed) in Linux 2.6.4 (released in March 2004), and support for it has
been entirely removed from losetup in util-linux 2.23 (released in April
2013).  Add a warning and a deprecation schedule.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210827163250.255325-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-27 10:44:54 -06:00
Dan Carpenter
3375dca0b5 pd: fix a NULL vs IS_ERR() check
blk_mq_alloc_disk() returns error pointers, it doesn't return NULL
so correct the check.

Fixes: 262d431f90 ("pd: use blk_mq_alloc_disk and blk_cleanup_disk")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20210827100023.GB9449@kili
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-27 07:45:48 -06:00
Christoph Hellwig
7ee656c3ac nbd: remove nbd->destroy_complete
The nbd->destroy_complete pointer is not really needed.  For creating
a device without a specific index we now simplify skip devices marked
NBD_DESTROY_ON_DISCONNECT as there is not much point to reuse them.
For device creation with a specific index there is no real need to
treat the case of a requested but not finished disconnect different
than any other device that is being shutdown, i.e. we can just return
an error, as a slightly different race window would anyway.

Fixes: 6e4df4c648 ("nbd: reduce the nbd_index_mutex scope")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot+2c98885bcd769f56b6d6@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210825163108.50713-7-hch@lst.de
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25 14:20:32 -06:00
Christoph Hellwig
438cd318c8 nbd: only return usable devices from nbd_find_unused
Device marked as NBD_DESTROY_ON_DISCONNECT can and should be skipped
given that they won't survive the disconnect.  So skip them and try
to grab a reference directly and just continue if the the devices
is being torn down or created and thus has a zero refcount.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210825163108.50713-6-hch@lst.de
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25 14:20:29 -06:00
Tetsuo Handa
b190300dec nbd: set nbd->index before releasing nbd_index_mutex
Set nbd->index before releasing nbd_index_mutex, as populate_nbd_status()
might access nbd->index as soon as nbd_index_mutex is released.

Fixes: 6e4df4c648 ("nbd: reduce the nbd_index_mutex scope")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210825163108.50713-5-hch@lst.de
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25 14:20:25 -06:00
Tetsuo Handa
75b7f62aa6 nbd: prevent IDR lookups from finding partially initialized devices
Previously nbd_index_mutex was held during whole add/remove/lookup
operations in order to guarantee that partially initialized devices are
not reachable via idr_find() or idr_for_each(). But now that partially
initialized devices become reachable as soon as idr_alloc() succeeds,
we need to skip partially initialized devices. Since it seems that
all functions use refcount_inc_not_zero(&nbd->refs) in order to skip
destroying devices, update nbd->refs from zero to non-zero as the last
step of device initialization in order to also skip partially initialized
devices.

Fixes: 6e4df4c648 ("nbd: reduce the nbd_index_mutex scope")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[hch: split from a larger patch, added comments]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210825163108.50713-4-hch@lst.de
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25 14:20:22 -06:00
Christoph Hellwig
409e0ff10e nbd: reset NBD to NULL when restarting in nbd_genl_connect
When nbd_genl_connect restarts to wait for a disconnecting device, nbd
needs to be reset to NULL.  Do that by facoring out a helper to find
an unused device.

Fixes: 6177b56c96 ("nbd: refactor device search and allocation in nbd_genl_connect")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210825163108.50713-3-hch@lst.de
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25 14:20:20 -06:00
Tetsuo Handa
93f63bc41f nbd: add missing locking to the nbd_dev_add error path
idr_remove needs external synchronization.

Fixes: 6e4df4c648 ("nbd: reduce the nbd_index_mutex scope")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210825163108.50713-2-hch@lst.de
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25 14:20:08 -06:00
Luis Chamberlain
10e7123d55 null_blk: add error handling support for add_disk()
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling. The actual cleanup in case of error is
already handled by the caller of null_gendisk_register().

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210818144542.19305-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23 12:55:46 -06:00
Luis Chamberlain
dbb301f91f virtio_blk: add error handling support for add_disk()
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210818144542.19305-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23 12:55:45 -06:00
Pavel Skripkin
b1a811633f block: nbd: add sanity check for first_minor
Syzbot hit WARNING in internal_create_group(). The problem was in
too big disk->first_minor.

disk->first_minor is initialized by value, which comes from userspace
and there wasn't any sanity checks about value correctness. It can cause
duplicate creation of sysfs files/links, because disk->first_minor will
be passed to MKDEV() which causes truncation to byte. Since maximum
minor value is 0xff, let's check if first_minor is correct minor number.

NOTE: the root case of the reported warning was in wrong error handling
in register_disk(), but we can avoid passing knowingly wrong values to
sysfs API, because sysfs error messages can confuse users. For example:
user passed 1048576 as index, but sysfs complains about duplicate
creation of /dev/block/43:0. It's not obvious how 1048576 becomes 0.
Log and reproducer for above example can be found on syzkaller bug
report page.

Link: https://syzkaller.appspot.com/bug?id=03c2ae9146416edf811958d5fd7acfab75b143d1
Fixes: b0d9111a2d ("nbd: use an idr to keep track of nbd devices")
Reported-by: syzbot+9937dc42271cd87d4b98@syzkaller.appspotmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16 10:56:33 -06:00
Christoph Hellwig
6da525b3ec ps3vram: use bvec_virt
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210804095634.460779-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16 10:50:33 -06:00
Christoph Hellwig
358b348b91 virtio_blk: use bvec_virt
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20210804095634.460779-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16 10:50:33 -06:00
Christoph Hellwig
cf58b53778 rbd: use bvec_virt
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20210804095634.460779-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16 10:50:33 -06:00
Linus Torvalds
94e95d5899 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
 "Fixes in virtio, vhost, and vdpa drivers"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa/mlx5: Fix queue type selection logic
  vdpa/mlx5: Avoid destroying MR on empty iotlb
  tools/virtio: fix build
  virtio_ring: pull in spinlock header
  vringh: pull in spinlock header
  virtio-blk: Add validation for block size in config space
  vringh: Use wiov->used to check for read/write desc order
  virtio_vdpa: reject invalid vq indices
  vdpa: Add documentation for vdpa_alloc_device() macro
  vDPA/ifcvf: Fix return value check for vdpa_alloc_device()
  vp_vdpa: Fix return value check for vdpa_alloc_device()
  vdpa_sim: Fix return value check for vdpa_alloc_device()
  vhost: Fix the calculation in vhost_overflow()
  vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update()
  virtio_pci: Support surprise removal of virtio pci device
  virtio: Protect vqs list access
  virtio: Keep vring_del_virtqueue() mirror of VQ create
  virtio: Improve vq->broken access to avoid any compiler optimization
2021-08-16 06:16:25 -10:00
Linus Torvalds
020efdadd8 Merge tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A few fixes for block that should go into 5.14:

   - Revert the mq-deadline cgroup addition. More work is needed on this
     front, let's revert it for now and get it right before having it in
     a released kernel (Tejun)

   - blk-iocost lockdep fix (Ming)

   - nbd double completion fix (Xie)

   - Fix for non-idling when clearing the shared tag flag (Yu)"

* tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
  nbd: Aovid double completion of a request
  blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED
  Revert "block/mq-deadline: Add cgroup support"
  blk-iocost: fix lockdep warning on blkcg->lock
2021-08-13 13:36:42 -10:00
Christoph Hellwig
6e4df4c648 nbd: reduce the nbd_index_mutex scope
nbd_index_mutex is currently held over add_disk and inside ->open, which
leads to lock order reversals.  Refactor the device creation code path
so that nbd_dev_add is called without nbd_index_mutex lock held and
only takes it for the IDR insertation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210811124428.2368491-7-hch@lst.de
[axboe: fix whitespace]
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13 14:17:41 -06:00
Christoph Hellwig
6177b56c96 nbd: refactor device search and allocation in nbd_genl_connect
Use idr_for_each_entry instead of the awkward callback to find an
existing device for the index == -1 case, and de-duplicate the device
allocation if no existing device was found.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210811124428.2368491-6-hch@lst.de
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13 14:17:39 -06:00
Christoph Hellwig
7bdc00cf7e nbd: return the allocated nbd_device from nbd_dev_add
Return the device we just allocated instead of doing an extra search for
it in the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210811124428.2368491-5-hch@lst.de
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13 14:17:37 -06:00
Christoph Hellwig
327b501b1d nbd: remove nbd_del_disk
Fold nbd_del_disk and remove the pointless NULL check on ->disk given
that it is always set for a successfully allocated nbd_device structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210811124428.2368491-4-hch@lst.de
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13 14:17:36 -06:00
Christoph Hellwig
3f74e0645c nbd: refactor device removal
Share common code for the synchronous and workqueue based device removal,
and remove the pointless use of refcount_dec_and_mutex_lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210811124428.2368491-3-hch@lst.de
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13 14:17:32 -06:00
Hou Tao
68c9417b19 nbd: do del_gendisk() asynchronously for NBD_DESTROY_ON_DISCONNECT
Now open_mutex is used to synchronize partition operations (e.g,
blk_drop_partitions() and blkdev_reread_part()), however it makes
nbd driver broken, because nbd may call del_gendisk() in nbd_release()
or nbd_genl_disconnect() if NBD_CFLAG_DESTROY_ON_DISCONNECT is enabled,
and deadlock occurs, as shown below:

// AB-BA dead-lock
nbd_genl_disconnect            blkdev_open
  nbd_disconnect_and_put
                                 lock bd_mutex
  // last ref
  nbd_put
    lock nbd_index_mutex
      del_gendisk
                                   nbd_open
                                     try lock nbd_index_mutex
        try lock bd_mutex

 or

// AA dead-lock
nbd_release
  lock bd_mutex
    nbd_put
      try lock bd_mutex

Instead of fixing block layer (e.g, introduce another lock), fixing
the nbd driver to call del_gendisk() in a kworker when
NBD_DESTROY_ON_DISCONNECT is enabled. When NBD_DESTROY_ON_DISCONNECT
is disabled, nbd device will always be destroy through module removal,
and there is no risky of deadlock.

To ensure the reuse of nbd index succeeds, moving the calling of
idr_remove() after del_gendisk(), so if the reused index is not found
in nbd_index_idr, the old disk must have been deleted. And reusing
the existing destroy_complete mechanism to ensure nbd_genl_connect()
will wait for the completion of del_gendisk().

Also adding a new workqueue for nbd removal, so nbd_cleanup()
can ensure all removals complete before exits.

Reported-by: syzbot+0fe7752e52337864d29b@syzkaller.appspotmail.com
Fixes: c76f48eb5c ("block: take bd_mutex around delete_partitions in del_gendisk")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210811124428.2368491-2-hch@lst.de
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13 14:17:16 -06:00
Baokun Li
fad7cd3310 nbd: add the check to prevent overflow in __nbd_ioctl()
If user specify a large enough value of NBD blocks option, it may trigger
signed integer overflow which may lead to nbd->config->bytesize becomes a
large or small value, zero in particular.

UBSAN: Undefined behaviour in drivers/block/nbd.c:325:31
signed integer overflow:
1024 * 4611686155866341414 cannot be represented in type 'long long int'
[...]
Call trace:
[...]
 handle_overflow+0x188/0x1dc lib/ubsan.c:192
 __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
 nbd_size_set drivers/block/nbd.c:325 [inline]
 __nbd_ioctl drivers/block/nbd.c:1342 [inline]
 nbd_ioctl+0x998/0xa10 drivers/block/nbd.c:1395
 __blkdev_driver_ioctl block/ioctl.c:311 [inline]
[...]

Although it is not a big deal, still silence the UBSAN by limit
the input value.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20210804021212.990223-1-libaokun1@huawei.com
[axboe: dropped unlikely()]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13 10:09:26 -06:00
Xie Yongji
cddce01160 nbd: Aovid double completion of a request
There is a race between iterating over requests in
nbd_clear_que() and completing requests in recv_work(),
which can lead to double completion of a request.

To fix it, flush the recv worker before iterating over
the requests and don't abort the completed request
while iterating.

Fixes: 96d97e1782 ("nbd: clear_sock on netlink disconnect")
Reported-by: Jiang Yadong <jiangyadong@bytedance.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13 09:46:48 -06:00
Christoph Hellwig
4f9e14aecf sx8: use the internal state machine to check if del_gendisk needs to be called
Remove usage of the block layer internal GENHD_FL_UP by just looking
at the host state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210809064028.1198327-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-12 10:29:36 -06:00
Guoqing Jiang
018eca456c block: move some macros to blkdev.h
Move them (PAGE_SECTORS_SHIFT, PAGE_SECTORS and SECTOR_MASK) to the
generic header file to remove redundancy.

Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Link: https://lore.kernel.org/r/20210721025315.1729118-1-guoqing.jiang@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-11 19:40:28 -06:00
Xie Yongji
82e89ea077 virtio-blk: Add validation for block size in config space
An untrusted device might presents an invalid block size
in configuration space. This tries to add validation for it
in the validate callback and clear the VIRTIO_BLK_F_BLK_SIZE
feature bit if the value is out of the supported range.

And we also double check the value in virtblk_probe() in
case that it's changed after the validation.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210809101609.148-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-08-11 06:44:24 -04:00
Colin Ian King
da20b58d5b xen-blkfront: Remove redundant assignment to variable err
The variable err is being assigned a value that is never read, the
assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210806110601.11386-1-colin.king@canonical.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09 20:04:46 -06:00
Christoph Hellwig
edb0872f44 block: move the bdi from the request_queue to the gendisk
The backing device information only makes sense for file system I/O,
and thus belongs into the gendisk and not the lower level request_queue
structure.  Move it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09 11:53:23 -06:00
Christoph Hellwig
471aa704db block: pass a gendisk to blk_queue_update_readahead
.. and rename the function to disk_update_readahead.  This is in
preparation for moving the BDI from the request_queue to the gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09 11:52:28 -06:00
Linus Torvalds
6bbf59145c Merge tag 'block-5.14-2021-08-07' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A few minor fixes:

   - Fix ldm kernel-doc warning (Bart)

   - Fix adding offset twice for DMA address in n64cart (Christoph)

   - Fix use-after-free in dasd path handling (Stefan)

   - Order kyber insert trace correctly (Vincent)

   - raid1 errored write handling fix (Wei)

   - Fix blk-iolatency queue get failure handling (Yu)"

* tag 'block-5.14-2021-08-07' of git://git.kernel.dk/linux-block:
  kyber: make trace_block_rq call consistent with documentation
  block/partitions/ldm.c: Fix a kernel-doc warning
  blk-iolatency: error out if blk_get_queue() failed in iolatency_set_limit()
  n64cart: fix the dma address in n64cart_do_bvec
  s390/dasd: fix use after free in dasd path handling
  md/raid10: properly indicate failure when ending a failed write request
2021-08-07 10:26:21 -07:00
Bart Van Assche
2112f5c133 loop: Select I/O scheduler 'none' from inside add_disk()
We noticed that the user interface of Android devices becomes very slow
under memory pressure. This is because Android uses the zram driver on top
of the loop driver for swapping, because under memory pressure the swap
code alternates reads and writes quickly, because mq-deadline is the
default scheduler for loop devices and because mq-deadline delays writes by
five seconds for such a workload with default settings. Fix this by making
the kernel select I/O scheduler 'none' from inside add_disk() for loop
devices. This default can be overridden at any time from user space,
e.g. via a udev rule. This approach has an advantage compared to changing
the I/O scheduler from userspace from 'mq-deadline' into 'none', namely
that synchronize_rcu() does not get called.

This patch changes the default I/O scheduler for loop devices from
'mq-deadline' into 'none'.

Additionally, this patch reduces the Android boot time on my test setup
with 0.5 seconds compared to configuring the loop I/O scheduler from user
space.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Martijn Coenen <maco@android.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210805174200.3250718-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-05 11:49:21 -06:00
Christoph Hellwig
402e0b8cd0 n64cart: fix the dma address in n64cart_do_bvec
dma_map_bvec already takes bv_offset into account.

Fixes: 9b2a2bbbb4d0 ("block: Add n64 cart driver")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-04 11:15:21 -06:00
Md Haris Iqbal
3087b335b5 block/rnbd: Use sysfs_emit instead of s*printf function for sysfs show
sysfs_emit function was added to be aware of the PAGE_SIZE maximum of
the temporary buffer used for outputting sysfs content, so there is no
possible overruns. So replace the uses of any s*printf functions for
the sysfs show functions with sysfs_emit.

Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20210726115950.470543-3-jinpu.wang@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02 13:37:40 -06:00
Gioh Kim
94dace8c85 block/rnbd-clt: Use put_cpu_ptr after get_cpu_ptr
This patch replaces put_cpu_var with put_cpu_ptr because
get_cpu_ptr should be paired with put_cpu_ptr.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20210726115950.470543-2-jinpu.wang@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02 13:37:40 -06:00
Matteo Croce
9f65c489b6 loop: raise media_change event
Make the loop device raise a DISK_MEDIA_CHANGE event on attach or detach.

	# udevadm monitor -up |grep -e DISK_MEDIA_CHANGE -e DEVNAME &

	# losetup -f zero
	[    7.454235] loop0: detected capacity change from 0 to 16384
	DISK_MEDIA_CHANGE=1
	DEVNAME=/dev/loop0
	DEVNAME=/dev/loop0
	DEVNAME=/dev/loop0

	# losetup -f zero
	[   10.205245] loop1: detected capacity change from 0 to 16384
	DISK_MEDIA_CHANGE=1
	DEVNAME=/dev/loop1
	DEVNAME=/dev/loop1
	DEVNAME=/dev/loop1

	# losetup -f zero2
	[   13.532368] loop2: detected capacity change from 0 to 40960
	DISK_MEDIA_CHANGE=1
	DEVNAME=/dev/loop2
	DEVNAME=/dev/loop2

	# losetup -D
	DEVNAME=/dev/loop1
	DISK_MEDIA_CHANGE=1
	DEVNAME=/dev/loop1
	DEVNAME=/dev/loop2
	DISK_MEDIA_CHANGE=1
	DEVNAME=/dev/loop2
	DEVNAME=/dev/loop0
	DISK_MEDIA_CHANGE=1
	DEVNAME=/dev/loop0

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Luca Boccassi <bluca@debian.org>
Link: https://lore.kernel.org/r/20210712230530.29323-7-mcroce@linux.microsoft.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02 13:37:29 -06:00
Christoph Hellwig
4b2731226d loop: don't grab a reference to the block device
The whole device block device won't be removed while the disk is still
alive, so don't bother to grab a reference to it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Ming Lei <ming.lei@rehat.com>
Reviewed-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Link: https://lore.kernel.org/r/20210722075402.983367-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02 13:37:28 -06:00
Christoph Hellwig
6e0a48552b ps3disk: use memcpy_{from,to}_bvec
Use the bvec helpers instead of open coding the copy.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Tested-by: Geoff Levand <geoff@infradead.org>
Link: https://lore.kernel.org/r/20210727055646.118787-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02 13:37:27 -06:00
Christoph Hellwig
732022b86a rbd: use memzero_bvec
Use memzero_bvec instead of reimplementing it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20210727055646.118787-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02 13:37:27 -06:00
Linus Torvalds
4669e13cd6 Merge tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - gendisk freeing fix (Christoph)

 - blk-iocost wake ordering fix (Tejun)

 - tag allocation error handling fix (John)

 - loop locking fix. While this isn't the prettiest fix in the world,
   nobody has any good alternatives for 5.14. Something to likely
   revisit for 5.15. (Tetsuo)

* tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
  block: delay freeing the gendisk
  blk-iocost: fix operation ordering in iocg_wake_fn()
  blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling
  loop: reintroduce global lock for safe loop_validate_file() traversal
2021-07-30 11:08:12 -07:00
Tetsuo Handa
3ce6e1f662 loop: reintroduce global lock for safe loop_validate_file() traversal
Commit 6cc8e74308 ("loop: scale loop device by introducing per
device lock") re-opened a race window for NULL pointer dereference at
loop_validate_file() where commit 310ca162d7 ("block/loop: Use
global lock for ioctl() operation.") has closed.

Although we need to guarantee that other loop devices will not change
during traversal, we can't take remote "struct loop_device"->lo_mutex
inside loop_validate_file() in order to avoid AB-BA deadlock. Therefore,
introduce a global lock dedicated for loop_validate_file() which is
conditionally taken before local "struct loop_device"->lo_mutex is taken.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 6cc8e74308 ("loop: scale loop device by introducing per device lock")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-23 10:18:25 -06:00
Ilya Dryomov
0077a50082 rbd: resurrect setting of disk->private_data in rbd_init_disk()
rbd_open() and rbd_release() expect that disk->private_data is set to
rbd_dev.  Otherwise we hit a NULL pointer dereference when mapping the
image.

URL: https://tracker.ceph.com/issues/51759
Fixes: 195b1956b8 ("rbd: use blk_mq_alloc_disk and blk_cleanup_disk")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-07-21 17:20:43 +02:00
Ilya Dryomov
ed9eb71085 rbd: don't hold lock_rwsem while running_list is being drained
Currently rbd_quiesce_lock() holds lock_rwsem for read while blocking
on releasing_wait completion.  On the I/O completion side, each image
request also needs to take lock_rwsem for read.  Because rw_semaphore
implementation doesn't allow new readers after a writer has indicated
interest in the lock, this can result in a deadlock if something that
needs to take lock_rwsem for write gets involved.  For example:

1. watch error occurs
2. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and
   releases lock_rwsem
3. after reestablishing the watch, rbd_reregister_watch() takes
   lock_rwsem for write and calls rbd_reacquire_lock()
4. rbd_quiesce_lock() downgrades lock_rwsem to for read and blocks on
   releasing_wait until running_list becomes empty
5. another watch error occurs
6. rbd_watch_errcb() blocks trying to take lock_rwsem for write
7. no in-flight image request can complete and delete itself from
   running_list because lock_rwsem won't be granted anymore

A similar scenario can occur with "lock has been acquired" and "lock
has been released" notification handers which also take lock_rwsem for
write to update owner_cid.

We don't actually get anything useful from sitting on lock_rwsem in
rbd_quiesce_lock() -- owner_cid updates certainly don't need to be
synchronized with.  In fact the whole owner_cid tracking logic could
probably be removed from the kernel client because we don't support
proxied maintenance operations.

Cc: stable@vger.kernel.org # 5.3+
URL: https://tracker.ceph.com/issues/42757
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Robin Geuze <robin.geuze@nl.team.blue>
2021-07-20 17:57:33 +02:00
Ilya Dryomov
8798d070d4 rbd: always kick acquire on "acquired" and "released" notifications
Skipping the "lock has been released" notification if the lock owner
is not what we expect based on owner_cid can lead to I/O hangs.
One example is our own notifications: because owner_cid is cleared
in rbd_unlock(), when we get our own notification it is processed as
unexpected/duplicate and maybe_kick_acquire() isn't called.  If a peer
that requested the lock then doesn't go through with acquiring it,
I/O requests that came in while the lock was being quiesced would
be stalled until another I/O request is submitted and kicks acquire
from rbd_img_exclusive_lock().

This makes the comment in rbd_release_lock() actually true: prior to
this change the canceled work was being requeued in response to the
"lock has been acquired" notification from rbd_handle_acquired_lock().

Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Robin Geuze <robin.geuze@nl.team.blue>
2021-07-20 17:57:33 +02:00
Linus Torvalds
0d18c12b28 Merge tag 'block-5.14-2021-07-16' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - NVMe fixes via Christoph:
      - fix various races in nvme-pci when shutting down just after
        probing (Casey Chen)
      - fix a net_device leak in nvme-tcp (Prabhakar Kushwaha)

 - Fix regression in xen-blkfront by cleaning up the removal state
   machine (Christoph)

 - Fix tag_set and queue cleanup ordering regression in nbd (Wang)

 - Fix tag_set and queue cleanup ordering regression in pd (Guoqing)

* tag 'block-5.14-2021-07-16' of git://git.kernel.dk/linux-block:
  xen-blkfront: sanitize the removal state machine
  nbd: fix order of cleaning up the queue and freeing the tagset
  pd: fix order of cleaning up the queue and freeing the tagset
  nvme-pci: do not call nvme_dev_remove_admin from nvme_remove
  nvme-pci: fix multiple races in nvme_setup_io_queues
  nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACE
2021-07-16 12:31:44 -07:00
Christoph Hellwig
05d69d950d xen-blkfront: sanitize the removal state machine
xen-blkfront has a weird protocol where close message from the remote
side can be delayed, and where hot removals are treated somewhat
differently from regular removals, all leading to potential NULL
pointer removals, and a del_gendisk from the block device release
method, which will deadlock. Fix this by just performing normal hot
removals even when the device is opened like all other Linux block
drivers.

Fixes: c76f48eb5c ("block: take bd_mutex around delete_partitions in del_gendisk")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20210715141711.1257293-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-15 09:32:34 -06:00