linux

Author	SHA1	Message	Date
Linus Torvalds	78e709522d	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: - vduse driver ("vDPA Device in Userspace") supporting emulated virtio block devices - virtio-vsock support for end of record with SEQPACKET - vdpa: mac and mq support for ifcvf and mlx5 - vdpa: management netlink for ifcvf - virtio-i2c, gpio dt bindings - misc fixes and cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits) Documentation: Add documentation for VDUSE vduse: Introduce VDUSE - vDPA Device in Userspace vduse: Implement an MMU-based software IOTLB vdpa: Support transferring virtual addressing during DMA mapping vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() vhost-iotlb: Add an opaque pointer for vhost IOTLB vhost-vdpa: Handle the failure of vdpa_reset() vdpa: Add reset callback in vdpa_config_ops vdpa: Fix some coding style issues file: Export receive_fd() to modules eventfd: Export eventfd_wake_count to modules iova: Export alloc_iova_fast() and free_iova_fast() virtio-blk: remove unneeded "likely" statements virtio-balloon: Use virtio_find_vqs() helper vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro vsock_test: update message bounds test for MSG_EOR af_vsock: rename variables in receive loop virtio/vsock: support MSG_EOR bit processing vhost/vsock: support MSG_EOR bit processing ...	2021-09-11 14:48:42 -07:00
Linus Torvalds	c0f7e49fc4	Merge tag 'block-5.15-2021-09-11' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - NVMe pull request from Christoph: - fix nvmet command set reporting for passthrough controllers (Adam Manzanares) - update a MAINTAINERS email address (Chaitanya Kulkarni) - set QUEUE_FLAG_NOWAIT for nvme-multipath (me) - handle errors from add_disk() (Luis Chamberlain) - update the keep alive interval when kato is modified (Tatsuya Sasaki) - fix a buffer overrun in nvmet_subsys_attr_serial (Hannes Reinecke) - do not reset transport on data digest errors in nvme-tcp (Daniel Wagner) - only call synchronize_srcu when clearing current path (Daniel Wagner) - revalidate paths during rescan (Hannes Reinecke) - Split out the fs/block_dev into block/fops.c and block/bdev.c, which has been long overdue. Do this now before -rc1, to avoid annoying conflicts due to this (Christoph) - blk-throtl use-after-free fix (Li) - Improve plug depth for multi-device plugs, greatly increasing md resync performance (Song) - blkdev_show() locking fix (Tetsuo) - n64cart error check fix (Yang) * tag 'block-5.15-2021-09-11' of git://git.kernel.dk/linux-block: n64cart: fix return value check in n64cart_probe() blk-mq: allow 4x BLK_MAX_REQUEST_COUNT at blk_plug for multiple_queues block: move fs/block_dev.c to block/bdev.c block: split out operations on block special files blk-throttle: fix UAF by deleteing timer in blk_throtl_exit() block: genhd: don't call blkdev_show() with major_names_lock held nvme: update MAINTAINERS email address nvme: add error handling support for add_disk() nvme: only call synchronize_srcu when clearing current path nvme: update keep alive interval when kato is modified nvme-tcp: Do not reset transport on data digest errors nvmet: fixup buffer overrun in nvmet_subsys_attr_serial() nvmet: return bool from nvmet_passthru_ctrl and nvmet_is_passthru_req nvmet: looks at the passthrough controller when initializing CAP nvme: move nvme_multi_css into nvme.h nvme-multipath: revalidate paths during rescan nvme-multipath: set QUEUE_FLAG_NOWAIT	2021-09-11 10:19:51 -07:00
Yang Yingliang	221e836083	n64cart: fix return value check in n64cart_probe() In case of error, the function devm_platform_ioremap_resource() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: `d9b2a2bbbb` ("block: Add n64 cart driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20210909090608.2989716-1-yangyingliang@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-09-09 14:24:02 -06:00
Linus Torvalds	1dbe7e386f	Merge tag 'block-5.15-2021-09-05' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Was going to send this one in later this week, but given that -Werror is now enabled (or at least available), the mq-deadline fix really should go in for the folks hitting that. - Ensure dd_queued() is only there if needed (Geert) - Fix a kerneldoc warning for bio_alloc_kiocb() - BFQ fix for queue merging - loop locking fix (Tetsuo)" * tag 'block-5.15-2021-09-05' of git://git.kernel.dk/linux-block: loop: reduce the loop_ctl_mutex scope bio: fix kerneldoc documentation for bio_alloc_kiocb() block, bfq: honor already-setup queue merges block/mq-deadline: Move dd_queued() to fix defined but not used warning	2021-09-06 10:06:26 -07:00
Max Gurtovoy	6105d1fe6f	virtio-blk: remove unneeded "likely" statements Usually we use "likely/unlikely" to optimize the fast path. Remove redundant "likely/unlikely" statements in the control path to simplify the code and make it easier to read. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20210905085717.7427-1-mgurtovoy@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-09-06 07:20:56 -04:00
Tetsuo Handa	1c500ad706	loop: reduce the loop_ctl_mutex scope syzbot is reporting circular locking problem at __loop_clr_fd() [1], for commit `a160c6159d` ("block: add an optional probe callback to major_names") is calling the module's probe function with major_names_lock held. Fortunately, since commit `990e78116d` ("block: loop: fix deadlock between open and remove") stopped holding loop_ctl_mutex in lo_open(), current role of loop_ctl_mutex is to serialize access to loop_index_idr and loop_add()/loop_remove(); in other words, management of id for IDR. To avoid holding loop_ctl_mutex during whole add/remove operation, use a bool flag to indicate whether the loop device is ready for use. loop_unregister_transfer() which is called from cleanup_cryptoloop() currently has possibility of use-after-free problem due to lack of serialization between kfree() from loop_remove() from loop_control_remove() and mutex_lock() from unregister_transfer_cb(). But since lo->lo_encryption should be already NULL when this function is called due to module unload, and commit `222013f9ac` ("cryptoloop: add a deprecation warning") indicates that we will remove this function shortly, this patch updates this function to emit warning instead of checking lo->lo_encryption. Holding loop_ctl_mutex in loop_exit() is pointless, for all users must close /dev/loop-control and /dev/loop$num (in order to drop module's refcount to 0) before loop_exit() starts, and nobody can open /dev/loop-control or /dev/loop$num afterwards. Link: https://syzkaller.appspot.com/bug?id=7bb10e8b62f83e4d445cdf4c13d69e407e629558 [1] Reported-by: syzbot <syzbot+f61766d5763f9e7a118f@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/adb1e792-fc0e-ee81-7ea0-0906fc36419d@i-love.sakura.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-09-03 22:14:40 -06:00
Linus Torvalds	b250e6d141	Merge tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add -s option (strict mode) to merge_config.sh to make it fail when any symbol is redefined. - Show a warning if a different compiler is used for building external modules. - Infer --target from ARCH for CC=clang to let you cross-compile the kernel without CROSS_COMPILE. - Make the integrated assembler default (LLVM_IAS=1) for CC=clang. - Add <linux/stdarg.h> to the kernel source instead of borrowing <stdarg.h> from the compiler. - Add Nick Desaulniers as a Kbuild reviewer. - Drop stale cc-option tests. - Fix the combination of CONFIG_TRIM_UNUSED_KSYMS and CONFIG_LTO_CLANG to handle symbols in inline assembly. - Show a warning if 'FORCE' is missing for if_changed rules. - Various cleanups * tag 'kbuild-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits) kbuild: redo fake deps at include/ksym/.h kbuild: clean up objtool_args slightly modpost: get the .mod file path more simply checkkconfigsymbols.py: Fix the '--ignore' option kbuild: merge vmlinux_link() between ARCH=um and other architectures kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh kbuild: merge vmlinux_link() between the ordinary link and Clang LTO kbuild: remove stale *.symversions kbuild: remove unused quiet_cmd_update_lto_symversions gen_compile_commands: extract compiler command from a series of commands x86: remove cc-option-yn test for -mtune= arc: replace cc-option-yn uses with cc-option s390: replace cc-option-yn uses with cc-option ia64: move core-y in arch/ia64/Makefile to arch/ia64/Kbuild sparc: move the install rule to arch/sparc/Makefile security: remove unneeded subdir-$(CONFIG_...) kbuild: sh: remove unused install script kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y kbuild: Switch to 'f' variants of integrated assembler flag kbuild: Shuffle blank line to improve comment meaning ...	2021-09-03 15:33:47 -07:00
Linus Torvalds	a9c9a6f741	Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, qla2xxx, target, smartpqi, lpfc, mpt3sas). The core change causing the most churn was replacing the command request field request with a macro, allowing us to offset map to it and remove the redundant field; the same was also done for the tag field. The most impactful change is the final removal of scsi_ioctl, which has been deprecated for over a decade" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (293 commits) scsi: ufs: Fix ufshcd_request_sense_async() for Samsung KLUFG8RHDA-B2D1 scsi: ufs: ufs-exynos: Fix static checker warning scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI scsi: lpfc: Use the proper SCSI midlayer interfaces for PI scsi: lpfc: Copyright updates for 14.0.0.1 patches scsi: lpfc: Update lpfc version to 14.0.0.1 scsi: lpfc: Add bsg support for retrieving adapter cmf data scsi: lpfc: Add cmf_info sysfs entry scsi: lpfc: Add debugfs support for cm framework buffers scsi: lpfc: Add support for maintaining the cm statistics buffer scsi: lpfc: Add rx monitoring statistics scsi: lpfc: Add support for the CM framework scsi: lpfc: Add cmfsync WQE support scsi: lpfc: Add support for cm enablement buffer scsi: lpfc: Add cm statistics buffer support scsi: lpfc: Add EDC ELS support scsi: lpfc: Expand FPIN and RDF receive logging scsi: lpfc: Add MIB feature enablement support scsi: lpfc: Add SET_HOST_DATA mbox cmd to pass date/time info to firmware scsi: fc: Add EDC ELS definition ...	2021-09-02 15:09:46 -07:00
Linus Torvalds	9ae5fceb9a	Merge tag 'for-linus-5.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - some small cleanups - a fix for a bug when running as Xen PV guest which could result in not all memory being transferred in case of a migration of the guest - a small series for getting rid of code for supporting very old Xen hypervisor versions nobody should be using since many years now - a series for hardening the Xen block frontend driver - a fix for Xen PV boot code issuing warning messages due to a stray preempt_disable() on the non-boot processors * tag 'for-linus-5.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: remove stray preempt_disable() from PV AP startup code xen/pcifront: Removed unnecessary __ref annotation x86: xen: platform-pci-unplug: use pr_err() and pr_warn() instead of raw printk() drivers/xen/xenbus/xenbus_client.c: fix bugon.cocci warnings xen/blkfront: don't trust the backend response data blindly xen/blkfront: don't take local copy of a request from the ring page xen/blkfront: read response from backend only once xen: assume XENFEAT_gnttab_map_avail_bits being set for pv guests xen: assume XENFEAT_mmu_pt_update_preserve_ad being set for pv guests xen: check required Xen features xen: fix setting of max_pfn in shared_info	2021-09-02 13:20:11 -07:00
Linus Torvalds	9a1d6c9e3f	Merge tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Sitting on top of the core block changes, here are the driver changes for the 5.15 merge window: - NVMe updates via Christoph: - suspend improvements for devices with an HMB (Keith Busch) - handle double completions more gracefully (Sagi Grimberg) - cleanup the selects for the nvme core code a bit (Sagi Grimberg) - don't update queue count when failing to set io queues (Ruozhu Li) - various nvmet connect fixes (Amit Engel) - cleanup lightnvm leftovers (Keith Busch, me) - small cleanups (Colin Ian King, Hou Pu) - add tracing for the Set Features command (Hou Pu) - CMB sysfs cleanups (Keith Busch) - add a mutex_destroy call (Keith Busch) - remove lightnvm subsystem. It's served its purpose and ultimately led to zoned nvme support, we no longer need it (Christoph) - revert floppy O_NDELAY fix (Denis) - nbd fixes (Hou, Pavel, Baokun) - nbd locking fixes (Tetsuo) - nbd device removal fixes (Christoph) - raid10 rcu warning fix (Xiao) - raid1 write behind fix (Guoqing) - rnbd fixes (Gioh, Md Haris) - misc fixes (Colin)" * tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-block: (42 commits) Revert "floppy: reintroduce O_NDELAY fix" raid1: ensure write behind bio has less than BIO_MAX_VECS sectors md/raid10: Remove unnecessary rcu_dereference in raid10_handle_discard nbd: remove nbd->destroy_complete nbd: only return usable devices from nbd_find_unused nbd: set nbd->index before releasing nbd_index_mutex nbd: prevent IDR lookups from finding partially initialized devices nbd: reset NBD to NULL when restarting in nbd_genl_connect nbd: add missing locking to the nbd_dev_add error path nvme: remove the unused NVME_NS_* enum nvme: remove nvm_ndev from ns nvme: Have NVME_FABRICS select NVME_CORE instead of transport drivers block: nbd: add sanity check for first_minor nvmet: check that host sqsize does not exceed ctrl MQES nvmet: avoid duplicate qid in connect cmd nvmet: pass back cntlid on successful completion nvme-rdma: don't update queue count when failing to set io queues nvme-tcp: don't update queue count when failing to set io queues nvme-tcp: pair send_mutex init with destroy nvme: allow user toggling hmb usage ...	2021-08-30 19:01:46 -07:00
Linus Torvalds	679369114e	Merge tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "Nothing major in here - lots of good cleanups and tech debt handling, which is also evident in the diffstats. In particular: - Add disk sequence numbers (Matteo) - Discard merge fix (Ming) - Relax disk zoned reporting restrictions (Niklas) - Bio error handling zoned leak fix (Pavel) - Start of proper add_disk() error handling (Luis, Christoph) - blk crypto fix (Eric) - Non-standard GPT location support (Dmitry) - IO priority improvements and cleanups (Damien) - blk-throtl improvements (Chunguang) - diskstats_show() stack reduction (Abd-Alrhman) - Loop scheduler selection (Bart) - Switch block layer to use kmap_local_page() (Christoph) - Remove obsolete disk_name helper (Christoph) - block_device refcounting improvements (Christoph) - Ensure gendisk always has a request queue reference (Christoph) - Misc fixes/cleanups (Shaokun, Oliver, Guoqing)" * tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block: (129 commits) sg: pass the device name to blk_trace_setup block, bfq: cleanup the repeated declaration blk-crypto: fix check for too-large dun_bytes blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN blk-zoned: allow zone management send operations without CAP_SYS_ADMIN block: mark blkdev_fsync static block: refine the disk_live check in del_gendisk mmc: sdhci-tegra: Enable MMC_CAP2_ALT_GPT_TEGRA mmc: block: Support alternative_gpt_sector() operation partitions/efi: Support non-standard GPT location block: Add alternative_gpt_sector() operation bio: fix page leak bio_add_hw_page failure block: remove CONFIG_DEBUG_BLOCK_EXT_DEVT block: remove a pointless call to MINOR() in device_add_disk null_blk: add error handling support for add_disk() virtio_blk: add error handling support for add_disk() block: add error handling for device_add_disk / add_disk block: return errors from disk_alloc_events block: return errors from blk_integrity_add block: call blk_register_queue earlier in device_add_disk ...	2021-08-30 18:52:11 -07:00
Juergen Gross	b94e4b147f	xen/blkfront: don't trust the backend response data blindly Today blkfront will trust the backend to send only sane response data. In order to avoid privilege escalations or crashes in case of malicious backends verify the data to be within expected limits. Especially make sure that the response always references an outstanding request. Introduce a new state of the ring BLKIF_STATE_ERROR which will be switched to in case an inconsistency is being detected. Recovering from this state is possible only via removing and adding the virtual device again (e.g. via a suspend/resume cycle). Make all warning messages issued due to valid error responses rate limited in order to avoid message floods being triggered by a malicious backend. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-4-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>	2021-08-30 12:01:59 +02:00
Juergen Gross	8f5a695d99	xen/blkfront: don't take local copy of a request from the ring page In order to avoid a malicious backend being able to influence the local copy of a request build the request locally first and then copy it to the ring page instead of doing it the other way round as today. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-3-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>	2021-08-30 12:01:55 +02:00
Juergen Gross	71b66243f9	xen/blkfront: read response from backend only once In order to avoid problems in case the backend is modifying a response on the ring page while the frontend has already seen it, just read the response into a local buffer in one go and then operate on that buffer only. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-2-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>	2021-08-30 12:01:53 +02:00
Denis Efremov	c7e9d00203	Revert "floppy: reintroduce O_NDELAY fix" The patch breaks userspace implementations (e.g. fdutils) and introduces regressions in behaviour. Previously, it was possible to O_NDELAY open a floppy device with no media inserted or with write protected media without an error. Some userspace tools use this particular behavior for probing. It's not the first time when we revert this patch. Previous revert is in commit `f2791e7ead` (Revert "floppy: refactor open() flags handling"). This reverts commit `8a0c014cd2`. Link: https://lore.kernel.org/linux-block/de10cb47-34d1-5a88-7751-225ca380f735@compro.net/ Reported-by: Mark Hounschell <markh@compro.net> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Wim Osterholt <wim@djo.tudelft.nl> Cc: Kurt Garloff <kurt@garloff.de> Cc: <stable@vger.kernel.org> Signed-off-by: Denis Efremov <efremov@linux.com>	2021-08-28 11:16:47 +03:00
Linus Torvalds	64b4fc45be	Merge tag 'block-5.14-2021-08-27' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - Revert the mq-deadline priority handling, it's causing serious performance regressions. While experimental patches exist to fix this up, it's too late to do so now. Revert it and re-do it properly for 5.15 instead. - Fix a NULL vs IS_ERR() regression in this release (Dan) - Fix a mq-deadline accounting regression in this release (Bart) - Mark cryptoloop as deprecated. It's broken and dm-crypt fully supports it, and it's actively interfering with loop. Plan on removal for 5.16 (Christoph) * tag 'block-5.14-2021-08-27' of git://git.kernel.dk/linux-block: cryptoloop: add a deprecation warning pd: fix a NULL vs IS_ERR() check Revert "block/mq-deadline: Prioritize high-priority requests" mq-deadline: Fix request accounting	2021-08-27 16:08:29 -07:00
Christoph Hellwig	222013f9ac	cryptoloop: add a deprecation warning Support for cryptoloop has been officially marked broken and deprecated in favor of dm-crypt (which supports the same broken algorithms if needed) in Linux 2.6.4 (released in March 2004), and support for it has been entirely removed from losetup in util-linux 2.23 (released in April 2013). Add a warning and a deprecation schedule. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210827163250.255325-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-27 10:44:54 -06:00
Dan Carpenter	3375dca0b5	pd: fix a NULL vs IS_ERR() check blk_mq_alloc_disk() returns error pointers, it doesn't return NULL so correct the check. Fixes: `262d431f90` ("pd: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20210827100023.GB9449@kili Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-27 07:45:48 -06:00
Christoph Hellwig	7ee656c3ac	nbd: remove nbd->destroy_complete The nbd->destroy_complete pointer is not really needed. For creating a device without a specific index we now simply skip devices marked NBD_DESTROY_ON_DISCONNECT as there is not much point to reuse them. For device creation with a specific index there is no real need to treat the case of a requested but not finished disconnect different than any other device that is being shutdown, i.e. we can just return an error, as a slightly different race window would anyway. Fixes: `6e4df4c648` ("nbd: reduce the nbd_index_mutex scope") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot+2c98885bcd769f56b6d6@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-7-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-25 14:20:32 -06:00
Christoph Hellwig	438cd318c8	nbd: only return usable devices from nbd_find_unused Devices marked as NBD_DESTROY_ON_DISCONNECT can and should be skipped given that they won't survive the disconnect. So skip them and try to grab a reference directly and just continue if the device is being torn down or created and thus has a zero refcount. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-6-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-25 14:20:29 -06:00
Tetsuo Handa	b190300dec	nbd: set nbd->index before releasing nbd_index_mutex Set nbd->index before releasing nbd_index_mutex, as populate_nbd_status() might access nbd->index as soon as nbd_index_mutex is released. Fixes: `6e4df4c648` ("nbd: reduce the nbd_index_mutex scope") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-5-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-25 14:20:25 -06:00
Tetsuo Handa	75b7f62aa6	nbd: prevent IDR lookups from finding partially initialized devices Previously nbd_index_mutex was held during whole add/remove/lookup operations in order to guarantee that partially initialized devices are not reachable via idr_find() or idr_for_each(). But now that partially initialized devices become reachable as soon as idr_alloc() succeeds, we need to skip partially initialized devices. Since it seems that all functions use refcount_inc_not_zero(&nbd->refs) in order to skip destroying devices, update nbd->refs from zero to non-zero as the last step of device initialization in order to also skip partially initialized devices. Fixes: `6e4df4c648` ("nbd: reduce the nbd_index_mutex scope") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [hch: split from a larger patch, added comments] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-4-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-25 14:20:22 -06:00
Christoph Hellwig	409e0ff10e	nbd: reset NBD to NULL when restarting in nbd_genl_connect When nbd_genl_connect restarts to wait for a disconnecting device, nbd needs to be reset to NULL. Do that by factoring out a helper to find an unused device. Fixes: `6177b56c96` ("nbd: refactor device search and allocation in nbd_genl_connect") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Reported-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-3-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-25 14:20:20 -06:00
Tetsuo Handa	93f63bc41f	nbd: add missing locking to the nbd_dev_add error path idr_remove needs external synchronization. Fixes: `6e4df4c648` ("nbd: reduce the nbd_index_mutex scope") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210825163108.50713-2-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-25 14:20:08 -06:00
Luis Chamberlain	10e7123d55	null_blk: add error handling support for add_disk() We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. The actual cleanup in case of error is already handled by the caller of null_gendisk_register(). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210818144542.19305-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-23 12:55:46 -06:00
Luis Chamberlain	dbb301f91f	virtio_blk: add error handling support for add_disk() We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210818144542.19305-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-23 12:55:45 -06:00
Alexey Dobriyan	39f75da7bc	isystem: trim/fixup stdarg.h and other headers Delete/fixup few includes in anticipation of global -isystem compile option removal. Note: crypto/aegis128-neon-inner.c keeps <stddef.h> due to redefinition of uintptr_t error (one definition comes from <stddef.h>, another from <linux/types.h>). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2021-08-19 09:02:55 +09:00
Pavel Skripkin	b1a811633f	block: nbd: add sanity check for first_minor Syzbot hit WARNING in internal_create_group(). The problem was in too big disk->first_minor. disk->first_minor is initialized by value, which comes from userspace and there weren't any sanity checks about value correctness. It can cause duplicate creation of sysfs files/links, because disk->first_minor will be passed to MKDEV() which causes truncation to byte. Since maximum minor value is 0xff, let's check if first_minor is correct minor number. NOTE: the root cause of the reported warning was in wrong error handling in register_disk(), but we can avoid passing knowingly wrong values to sysfs API, because sysfs error messages can confuse users. For example: user passed 1048576 as index, but sysfs complains about duplicate creation of /dev/block/43:0. It's not obvious how 1048576 becomes 0. Log and reproducer for above example can be found on syzkaller bug report page. Link: https://syzkaller.appspot.com/bug?id=03c2ae9146416edf811958d5fd7acfab75b143d1 Fixes: `b0d9111a2d` ("nbd: use an idr to keep track of nbd devices") Reported-by: syzbot+9937dc42271cd87d4b98@syzkaller.appspotmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-16 10:56:33 -06:00
Christoph Hellwig	6da525b3ec	ps3vram: use bvec_virt Use bvec_virt instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210804095634.460779-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-16 10:50:33 -06:00
Christoph Hellwig	358b348b91	virtio_blk: use bvec_virt Use bvec_virt instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20210804095634.460779-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-16 10:50:33 -06:00
Christoph Hellwig	cf58b53778	rbd: use bvec_virt Use bvec_virt instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20210804095634.460779-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-16 10:50:33 -06:00
Linus Torvalds	94e95d5899	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes from Michael Tsirkin: "Fixes in virtio, vhost, and vdpa drivers" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/mlx5: Fix queue type selection logic vdpa/mlx5: Avoid destroying MR on empty iotlb tools/virtio: fix build virtio_ring: pull in spinlock header vringh: pull in spinlock header virtio-blk: Add validation for block size in config space vringh: Use wiov->used to check for read/write desc order virtio_vdpa: reject invalid vq indices vdpa: Add documentation for vdpa_alloc_device() macro vDPA/ifcvf: Fix return value check for vdpa_alloc_device() vp_vdpa: Fix return value check for vdpa_alloc_device() vdpa_sim: Fix return value check for vdpa_alloc_device() vhost: Fix the calculation in vhost_overflow() vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update() virtio_pci: Support surprise removal of virtio pci device virtio: Protect vqs list access virtio: Keep vring_del_virtqueue() mirror of VQ create virtio: Improve vq->broken access to avoid any compiler optimization	2021-08-16 06:16:25 -10:00
Linus Torvalds	020efdadd8	Merge tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A few fixes for block that should go into 5.14: - Revert the mq-deadline cgroup addition. More work is needed on this front, let's revert it for now and get it right before having it in a released kernel (Tejun) - blk-iocost lockdep fix (Ming) - nbd double completion fix (Xie) - Fix for non-idling when clearing the shared tag flag (Yu)" * tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block: nbd: Aovid double completion of a request blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED Revert "block/mq-deadline: Add cgroup support" blk-iocost: fix lockdep warning on blkcg->lock	2021-08-13 13:36:42 -10:00
Christoph Hellwig	6e4df4c648	nbd: reduce the nbd_index_mutex scope nbd_index_mutex is currently held over add_disk and inside ->open, which leads to lock order reversals. Refactor the device creation code path so that nbd_dev_add is called without nbd_index_mutex lock held and only takes it for the IDR insertation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-7-hch@lst.de [axboe: fix whitespace] Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 14:17:41 -06:00
Christoph Hellwig	6177b56c96	nbd: refactor device search and allocation in nbd_genl_connect Use idr_for_each_entry instead of the awkward callback to find an existing device for the index == -1 case, and de-duplicate the device allocation if no existing device was found. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-6-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 14:17:39 -06:00
Christoph Hellwig	7bdc00cf7e	nbd: return the allocated nbd_device from nbd_dev_add Return the device we just allocated instead of doing an extra search for it in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-5-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 14:17:37 -06:00
Christoph Hellwig	327b501b1d	nbd: remove nbd_del_disk Fold nbd_del_disk and remove the pointless NULL check on ->disk given that it is always set for a successfully allocated nbd_device structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-4-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 14:17:36 -06:00
Christoph Hellwig	3f74e0645c	nbd: refactor device removal Share common code for the synchronous and workqueue based device removal, and remove the pointless use of refcount_dec_and_mutex_lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-3-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 14:17:32 -06:00
Hou Tao	68c9417b19	nbd: do del_gendisk() asynchronously for NBD_DESTROY_ON_DISCONNECT Now open_mutex is used to synchronize partition operations (e.g, blk_drop_partitions() and blkdev_reread_part()), however it makes nbd driver broken, because nbd may call del_gendisk() in nbd_release() or nbd_genl_disconnect() if NBD_CFLAG_DESTROY_ON_DISCONNECT is enabled, and deadlock occurs, as shown below: // AB-BA dead-lock nbd_genl_disconnect blkdev_open nbd_disconnect_and_put lock bd_mutex // last ref nbd_put lock nbd_index_mutex del_gendisk nbd_open try lock nbd_index_mutex try lock bd_mutex or // AA dead-lock nbd_release lock bd_mutex nbd_put try lock bd_mutex Instead of fixing block layer (e.g, introduce another lock), fixing the nbd driver to call del_gendisk() in a kworker when NBD_DESTROY_ON_DISCONNECT is enabled. When NBD_DESTROY_ON_DISCONNECT is disabled, nbd device will always be destroy through module removal, and there is no risky of deadlock. To ensure the reuse of nbd index succeeds, moving the calling of idr_remove() after del_gendisk(), so if the reused index is not found in nbd_index_idr, the old disk must have been deleted. And reusing the existing destroy_complete mechanism to ensure nbd_genl_connect() will wait for the completion of del_gendisk(). Also adding a new workqueue for nbd removal, so nbd_cleanup() can ensure all removals complete before exits. Reported-by: syzbot+0fe7752e52337864d29b@syzkaller.appspotmail.com Fixes: `c76f48eb5c` ("block: take bd_mutex around delete_partitions in del_gendisk") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-2-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 14:17:16 -06:00
Baokun Li	fad7cd3310	nbd: add the check to prevent overflow in __nbd_ioctl() If user specify a large enough value of NBD blocks option, it may trigger signed integer overflow which may lead to nbd->config->bytesize becomes a large or small value, zero in particular. UBSAN: Undefined behaviour in drivers/block/nbd.c:325:31 signed integer overflow: 1024 * 4611686155866341414 cannot be represented in type 'long long int' [...] Call trace: [...] handle_overflow+0x188/0x1dc lib/ubsan.c:192 __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213 nbd_size_set drivers/block/nbd.c:325 [inline] __nbd_ioctl drivers/block/nbd.c:1342 [inline] nbd_ioctl+0x998/0xa10 drivers/block/nbd.c:1395 __blkdev_driver_ioctl block/ioctl.c:311 [inline] [...] Although it is not a big deal, still silence the UBSAN by limit the input value. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210804021212.990223-1-libaokun1@huawei.com [axboe: dropped unlikely()] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 10:09:26 -06:00
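Note on fad7cd3310: the overflow described above comes from multiplying a user-supplied block count by the block size in signed 64-bit arithmetic. The following is a minimal userspace sketch of rejecting such input before the multiplication result is used. It is not the kernel patch itself: the helper name `size_from_blocks` is hypothetical, and it assumes a GCC- or Clang-compatible compiler that provides `__builtin_mul_overflow`.

```c
#include <errno.h>

/*
 * Hypothetical helper (not taken from drivers/block/nbd.c).
 * Computes blocks * blksize as a signed 64-bit value.
 * Returns 0 and stores the product in *bytesize on success,
 * or -EINVAL if the product cannot be represented in long long.
 */
static int size_from_blocks(long long blocks, long long blksize,
                            long long *bytesize)
{
    long long result;

    /* The builtin returns true when the multiplication overflows. */
    if (__builtin_mul_overflow(blocks, blksize, &result))
        return -EINVAL;

    *bytesize = result;
    return 0;
}
```

With the values from the UBSAN report (1024 * 4611686155866341414), this helper returns -EINVAL and leaves the output untouched instead of producing a wrapped size.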
Xie Yongji	cddce01160	nbd: Aovid double completion of a request There is a race between iterating over requests in nbd_clear_que() and completing requests in recv_work(), which can lead to double completion of a request. To fix it, flush the recv worker before iterating over the requests and don't abort the completed request while iterating. Fixes: `96d97e1782` ("nbd: clear_sock on netlink disconnect") Reported-by: Jiang Yadong <jiangyadong@bytedance.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-13 09:46:48 -06:00
Christoph Hellwig	4f9e14aecf	sx8: use the internal state machine to check if del_gendisk needs to be called Remove usage of the block layer internal GENHD_FL_UP by just looking at the host state. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210809064028.1198327-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-12 10:29:36 -06:00
Guoqing Jiang	018eca456c	block: move some macros to blkdev.h Move them (PAGE_SECTORS_SHIFT, PAGE_SECTORS and SECTOR_MASK) to the generic header file to remove redundancy. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Link: https://lore.kernel.org/r/20210721025315.1729118-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-11 19:40:28 -06:00
Xie Yongji	82e89ea077	virtio-blk: Add validation for block size in config space An untrusted device might presents an invalid block size in configuration space. This tries to add validation for it in the validate callback and clear the VIRTIO_BLK_F_BLK_SIZE feature bit if the value is out of the supported range. And we also double check the value in virtblk_probe() in case that it's changed after the validation. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210809101609.148-1-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2021-08-11 06:44:24 -04:00
Colin Ian King	da20b58d5b	xen-blkfront: Remove redundant assignment to variable err The variable err is being assigned a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210806110601.11386-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-09 20:04:46 -06:00
Christoph Hellwig	edb0872f44	block: move the bdi from the request_queue to the gendisk The backing device information only makes sense for file system I/O, and thus belongs into the gendisk and not the lower level request_queue structure. Move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-09 11:53:23 -06:00
Christoph Hellwig	471aa704db	block: pass a gendisk to blk_queue_update_readahead .. and rename the function to disk_update_readahead. This is in preparation for moving the BDI from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-09 11:52:28 -06:00
Linus Torvalds	6bbf59145c	Merge tag 'block-5.14-2021-08-07' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A few minor fixes: - Fix ldm kernel-doc warning (Bart) - Fix adding offset twice for DMA address in n64cart (Christoph) - Fix use-after-free in dasd path handling (Stefan) - Order kyber insert trace correctly (Vincent) - raid1 errored write handling fix (Wei) - Fix blk-iolatency queue get failure handling (Yu)" * tag 'block-5.14-2021-08-07' of git://git.kernel.dk/linux-block: kyber: make trace_block_rq call consistent with documentation block/partitions/ldm.c: Fix a kernel-doc warning blk-iolatency: error out if blk_get_queue() failed in iolatency_set_limit() n64cart: fix the dma address in n64cart_do_bvec s390/dasd: fix use after free in dasd path handling md/raid10: properly indicate failure when ending a failed write request	2021-08-07 10:26:21 -07:00
Bart Van Assche	2112f5c133	loop: Select I/O scheduler 'none' from inside add_disk() We noticed that the user interface of Android devices becomes very slow under memory pressure. This is because Android uses the zram driver on top of the loop driver for swapping, because under memory pressure the swap code alternates reads and writes quickly, because mq-deadline is the default scheduler for loop devices and because mq-deadline delays writes by five seconds for such a workload with default settings. Fix this by making the kernel select I/O scheduler 'none' from inside add_disk() for loop devices. This default can be overridden at any time from user space, e.g. via a udev rule. This approach has an advantage compared to changing the I/O scheduler from userspace from 'mq-deadline' into 'none', namely that synchronize_rcu() does not get called. This patch changes the default I/O scheduler for loop devices from 'mq-deadline' into 'none'. Additionally, this patch reduces the Android boot time on my test setup with 0.5 seconds compared to configuring the loop I/O scheduler from user space. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Martijn Coenen <maco@android.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210805174200.3250718-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-05 11:49:21 -06:00
Christoph Hellwig	402e0b8cd0	n64cart: fix the dma address in n64cart_do_bvec dma_map_bvec already takes bv_offset into account. Fixes: 9b2a2bbbb4d0 ("block: Add n64 cart driver") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-04 11:15:21 -06:00

7196 Commits