linux

Author	SHA1	Message	Date
Christoph Hellwig	a7f7b7116c	nvme-rdma: split nvme_rdma_alloc_tagset Split nvme_rdma_alloc_tagset into one helper for the admin tag_set and one for the I/O tag set. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:48 -06:00
Christoph Hellwig	2455a4b778	nvme-pci: split nvme_dev_add Split nvme_dev_add into a helper to actually allocate the tag set, and one that just updates the number of queues. Add a local variable for the tag_set to clean up the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:48 -06:00
Christoph Hellwig	f91b727ccf	nvme-pci: split nvme_alloc_admin_tags Split nvme_alloc_admin_tags into a helper to actually allocate the tag set, and one that just restarts the admin queue. Add a local variable for the tag_set to clean up the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:48 -06:00
Christoph Hellwig	8614144002	nvme-pci: print the command name of aborted commands To allow for slightly better debugging, print the command name when aborting a command. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:48 -06:00
Liu Song	33b6debd61	nvme-pci: remove useless assignment in nvme_pci_setup_prps If prp_list is NULL, nvme_unmap_sg will be performed, and the assignment to first_dma is meaningless, so remove it. Signed-off-by: Liu Song <liusong@linux.alibaba.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:48 -06:00
Dan Carpenter	80e2768496	nvme-auth: uninitialized variable in nvme_auth_transform_key() A couple of the early error gotos call kfree_sensitive(transformed_key); before "transformed_key" has been initialized. Fixes: `db1312dd95` ("nvmet: implement basic In-Band Authentication") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:48 -06:00
Dan Carpenter	4daf7fa07e	nvme-auth: fix off by one checks The > ARRAY_SIZE() checks need to be >= ARRAY_SIZE() to prevent reading one element beyond the end of the arrays. Fixes: `db1312dd95` ("nvmet: implement basic In-Band Authentication") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:48 -06:00
Nick Bowler	a25d426158	nvme: define compat_ioctl again to unbreak 32-bit userspace. Commit `89b3d6e605` ("nvme: simplify the compat ioctl handling") removed the initialization of compat_ioctl from the nvme block_device_operations structures. Presumably the expectation was that 32-bit ioctls would be directed through the regular handler but this is not the case: failing to assign .compat_ioctl actually means that the compat case is disabled entirely, and any attempt to submit nvme ioctls from 32-bit userspace fails outright with -ENOTTY. For example: % smartctl -x /dev/nvme0n1 [...] Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Inappropriate ioctl for device The blkdev_compat_ptr_ioctl helper can be used to direct compat calls through the main ioctl handler and makes things work again. Fixes: `89b3d6e605` ("nvme: simplify the compat ioctl handling") Signed-off-by: Nick Bowler <nbowler@draconx.ca> Reviewed-by: Guixin Liu <kanie@linux.alibaba.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:48 -06:00
Christoph Hellwig	eb7e2d9258	nvme: don't always build constants.o The entire content of constants.c is guarded by an ifdef, so switch to just building the file conditionally instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:48 -06:00
Bean Huo	679c54f2de	nvme: use command_id instead of req->tag in trace_nvme_complete_rq() Use command_id instead of req->tag in trace_nvme_complete_rq(): since commit `e7006de6c2` ("nvme: code command_id with a genctr for use authentication after release"), cmd->common.command_id is set to ((genctl & 0xf) << 12 \| req->tag) rather than req->tag, so the cid reported by trace_nvme_complete_rq no longer matches the one reported by trace_nvme_setup_cmd. Fixes: `e7006de6c2` ("nvme: code command_id with a genctr for use authentication after release") Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:46 -06:00
Mikulas Patocka	d17f744e88	md-raid10: fix KASAN warning There's a KASAN warning in raid10_remove_disk when running the lvm test lvconvert-raid-reshape.sh. We fix this warning by verifying that the value "number" is valid. BUG: KASAN: slab-out-of-bounds in raid10_remove_disk+0x61/0x2a0 [raid10] Read of size 8 at addr ffff889108f3d300 by task mdX_raid10/124682 CPU: 3 PID: 124682 Comm: mdX_raid10 Not tainted 5.19.0-rc6 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x45/0x57a ? __lock_text_start+0x18/0x18 ? raid10_remove_disk+0x61/0x2a0 [raid10] kasan_report+0xa8/0xe0 ? raid10_remove_disk+0x61/0x2a0 [raid10] raid10_remove_disk+0x61/0x2a0 [raid10] Buffer I/O error on dev dm-76, logical block 15344, async page read ? __mutex_unlock_slowpath.constprop.0+0x1e0/0x1e0 remove_and_add_spares+0x367/0x8a0 [md_mod] ? super_written+0x1c0/0x1c0 [md_mod] ? mutex_trylock+0xac/0x120 ? _raw_spin_lock+0x72/0xc0 ? _raw_spin_lock_bh+0xc0/0xc0 md_check_recovery+0x848/0x960 [md_mod] raid10d+0xcf/0x3360 [raid10] ? sched_clock_cpu+0x185/0x1a0 ? rb_erase+0x4d4/0x620 ? var_wake_function+0xe0/0xe0 ? psi_group_change+0x411/0x500 ? preempt_count_sub+0xf/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? raid10_sync_request+0x36c0/0x36c0 [raid10] ? preempt_count_sub+0xf/0xc0 ? _raw_spin_unlock_irqrestore+0x19/0x40 ? del_timer_sync+0xa9/0x100 ? try_to_del_timer_sync+0xc0/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? _raw_spin_unlock_irq+0x11/0x24 ? __list_del_entry_valid+0x68/0xa0 ? finish_wait+0xa3/0x100 md_thread+0x161/0x260 [md_mod] ? unregister_md_personality+0xa0/0xa0 [md_mod] ? _raw_spin_lock_irqsave+0x78/0xc0 ? prepare_to_wait_event+0x2c0/0x2c0 ? unregister_md_personality+0xa0/0xa0 [md_mod] kthread+0x148/0x180 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 124495: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x80/0xa0 setup_conf+0x140/0x5c0 [raid10] raid10_run+0x4cd/0x740 [raid10] md_run+0x6f9/0x1300 [md_mod] raid_ctr+0x2531/0x4ac0 [dm_raid] dm_table_add_target+0x2b0/0x620 [dm_mod] table_load+0x1c8/0x400 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0x9e/0xc0 kvfree_call_rcu+0x84/0x480 timerfd_release+0x82/0x140 L __fput+0xfa/0x400 task_work_run+0x80/0xc0 exit_to_user_mode_prepare+0x155/0x160 syscall_exit_to_user_mode+0x12/0x40 do_syscall_64+0x42/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Second to last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0x9e/0xc0 kvfree_call_rcu+0x84/0x480 timerfd_release+0x82/0x140 __fput+0xfa/0x400 task_work_run+0x80/0xc0 exit_to_user_mode_prepare+0x155/0x160 syscall_exit_to_user_mode+0x12/0x40 do_syscall_64+0x42/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The buggy address belongs to the object at ffff889108f3d200 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 0 bytes to the right of 256-byte region [ffff889108f3d200, ffff889108f3d300) The buggy address belongs to the physical page: page:000000007ef2a34c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1108f3c head:000000007ef2a34c order:2 compound_mapcount:0 compound_pincount:0 flags: 0x4000000000010200(slab\|head\|zone=2) raw: 4000000000010200 0000000000000000 dead000000000001 ffff889100042b40 raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff889108f3d200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff889108f3d280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff889108f3d300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff889108f3d380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff889108f3d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:46 -06:00
Mikulas Patocka	e151db8ecf	md-raid: destroy the bitmap after destroying the thread When we ran the lvm test "shell/integrity-blocksize-3.sh" on a kernel with kasan, we got failure in write_page. The reason for the failure is that md_bitmap_destroy is called before destroying the thread and the thread may be waiting in the function write_page for the bio to complete. When the thread finishes waiting, it executes "if (test_bit(BITMAP_WRITE_ERROR, &bitmap->flags))", which triggers the kasan warning. Note that the commit `48df498daf` that caused this bug claims that it is needed for md-cluster, you should check md-cluster and possibly find another bugfix for it. BUG: KASAN: use-after-free in write_page+0x18d/0x680 [md_mod] Read of size 8 at addr ffff889162030c78 by task mdX_raid1/5539 CPU: 10 PID: 5539 Comm: mdX_raid1 Not tainted 5.19.0-rc2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x45/0x57a ? __lock_text_start+0x18/0x18 ? write_page+0x18d/0x680 [md_mod] kasan_report+0xa8/0xe0 ? write_page+0x18d/0x680 [md_mod] kasan_check_range+0x13f/0x180 write_page+0x18d/0x680 [md_mod] ? super_sync+0x4d5/0x560 [dm_raid] ? md_bitmap_file_kick+0xa0/0xa0 [md_mod] ? rs_set_dev_and_array_sectors+0x2e0/0x2e0 [dm_raid] ? mutex_trylock+0x120/0x120 ? preempt_count_add+0x6b/0xc0 ? preempt_count_sub+0xf/0xc0 md_update_sb+0x707/0xe40 [md_mod] md_reap_sync_thread+0x1b2/0x4a0 [md_mod] md_check_recovery+0x533/0x960 [md_mod] raid1d+0xc8/0x2a20 [raid1] ? var_wake_function+0xe0/0xe0 ? psi_group_change+0x411/0x500 ? preempt_count_sub+0xf/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? raid1_end_read_request+0x2a0/0x2a0 [raid1] ? preempt_count_sub+0xf/0xc0 ? _raw_spin_unlock_irqrestore+0x19/0x40 ? del_timer_sync+0xa9/0x100 ? try_to_del_timer_sync+0xc0/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? __list_del_entry_valid+0x68/0xa0 ? finish_wait+0xa3/0x100 md_thread+0x161/0x260 [md_mod] ? unregister_md_personality+0xa0/0xa0 [md_mod] ? _raw_spin_lock_irqsave+0x78/0xc0 ? prepare_to_wait_event+0x2c0/0x2c0 ? unregister_md_personality+0xa0/0xa0 [md_mod] kthread+0x148/0x180 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 5522: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x80/0xa0 md_bitmap_create+0xa8/0xe80 [md_mod] md_run+0x777/0x1300 [md_mod] raid_ctr+0x249c/0x4a30 [dm_raid] dm_table_add_target+0x2b0/0x620 [dm_mod] table_load+0x1c8/0x400 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 5680: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x40 kasan_set_free_info+0x20/0x40 __kasan_slab_free+0xf7/0x140 kfree+0x80/0x240 md_bitmap_free+0x1c3/0x280 [md_mod] __md_stop+0x21/0x120 [md_mod] md_stop+0x9/0x40 [md_mod] raid_dtr+0x1b/0x40 [dm_raid] dm_table_destroy+0x98/0x1e0 [dm_mod] __dm_destroy+0x199/0x360 [dm_mod] dev_remove+0x10c/0x160 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: `48df498daf` ("md: move bitmap_destroy to the beginning of __md_stop") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:46 -06:00
Christoph Hellwig	34cb92c0a5	md: return the allocated devices from md_alloc Two callers of md_alloc want to use the newly allocated devices, so return it instead of letting them find it cumbersomely after the allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-and-tested-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:46 -06:00
Christoph Hellwig	a110876828	md: open code md_probe in autorun_devices autorun_devices should not be limited to the controls for the legacy probe on open, so just call md_alloc directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-and-tested-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:46 -06:00
Yang Li	c0250d16b2	md: remove unneeded semicolon Eliminate the following coccicheck warning: ./drivers/md/md.c:8208:2-3: Unneeded semicolon Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:46 -06:00
Christoph Hellwig	d13bc4d84a	remove the sx8 block driver This driver is for fairly obscure hardware, and has only seen random drive-by changes after the maintainer stopped working on it in 2005 (about a year and a half after it was introduced). It has some "interesting" block layer interactions, so let's just drop it unless anyone complains. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220721064102.1715460-1-hch@lst.de [axboe: fix date typo, it was in 2005, not 2015] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:46 -06:00
Stephen Rothwell	2198c51a08	md: fix build failure for !MODULE After merging the block tree, today's linux-next build (x86_64 allmodconfig) failed like this: drivers/md/md.c:717:22: error: 'mddev_find' defined but not used [-Werror=unused-function] 717 \| static struct mddev *mddev_find(dev_t unit) \| ^~~~~~~~~~ cc1: all warnings being treated as errors Caused by commit 4500d5c17910 ("md: simplify md_open") Make mddev_find() available only for non-modular builds. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220721131132.070be166@canb.auug.org.au Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:46 -06:00
Jackie Liu	a20d636bee	raid5: fix duplicate checks for rdev->saved_raid_disk 'first' will always be greater than or equal to 0, it is unnecessary to repeat the 0 check, clean it up. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:46 -06:00
Christoph Hellwig	5b26804bb0	md: simplify md_open Now that devices are on the all_mddevs list until the gendisk is freed, there can't be any duplicates. Remove the global list lookup and just grab a reference. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:46 -06:00
Christoph Hellwig	12a6caf273	md: only delete entries from all_mddevs when the disk is freed This ensures device names don't get prematurely reused. Instead add a deleted flag to skip already deleted devices in mddev_get and other places that only want to see live mddevs. Reported-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:44 -06:00
Christoph Hellwig	16648bac86	md: stop using for_each_mddev in md_exit Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a reference when we drop the lock and delete the now unused for_each_mddev macro. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:44 -06:00
Christoph Hellwig	f265143422	md: stop using for_each_mddev in md_notify_reboot Just do a simple list_for_each_entry_safe on all_mddevs, and only grab a reference when we drop the lock. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:44 -06:00
Christoph Hellwig	b0e706a1ba	md: stop using for_each_mddev in md_do_sync Just do a plain list_for_each that only grabs a mddev reference in the case where the thread sleeps and restarts the list iteration. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:43 -06:00
Christoph Hellwig	2652a1bd2e	md: factor out the rdev overlaps check from rdev_size_store This splits the code into nicely readable chunks and also avoids the refcount inc/dec manipulations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:43 -06:00
Christoph Hellwig	33b614e334	md: rename md_free to md_kobj_release The md_free name is rather misleading, so pick a better one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:43 -06:00
Christoph Hellwig	e8c59ac419	md: implement ->free_disk Ensure that all private data is only freed once all accesses are done. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:43 -06:00
Christoph Hellwig	c57094a6e1	md: fix error handling in md_alloc Error handling in md_alloc is a mess. Untangle it to just free the mddev directly before add_disk is called and thus the gendisk is globally visible. After that clear the hold flag and let the mddev_put take care of cleaning up the mddev through the usual mechanisms. Fixes: `5e55e2f5fc` ("[PATCH] md: convert compile time warnings into runtime warnings") Fixes: `9be68dd7ac` ("md: add error handling support for add_disk()") Fixes: `7ad1069166` ("md: properly unwind when failing to add the kobject in md_alloc") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:43 -06:00
Christoph Hellwig	ca39f75024	md: fix mddev->kobj lifetime Once a kobject is initialized, the containing object should not be directly freed. So delay initialization until it is added. Also remove the kobject_del call as the last put will remove the kobject as well. The explicitly delete isn't needed here, and dropping it will simplify further fixes. With this md_free now does not need to check that ->gendisk is non-NULL as it is always set by the time that kobject_init is called on mddev->kobj. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:43 -06:00
Logan Gunthorpe	ee1aa06ba3	md/raid5: Convert prepare_to_wait() to wait_woken() api raid5_get_active_stripe() can sleep in various situations and it is called by make_stripe_request() while inside the prepare_to_wait()/finish_wait() section. Nested waits like this are not supported. This was noticed while making other changes that add different sleeps to raid5_get_active_stripe() that caused a WARNING with CONFIG_DEBUG_ATOMIC_SLEEP. No ill effects have been noticed with the code as is, but theoretically a nested wait here could cause a deadlock, so it should be fixed. To fix this, convert the prepare_to_wait() call to use wait_woken(), which supports nested sleeps. Link: https://lwn.net/Articles/628628/ Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:43 -06:00
Logan Gunthorpe	b9f91d80de	md/raid5: Fix sectors_to_do bitmap overflow in raid5_make_request() For unaligned IO that have nearly maximum sectors, the number of stripes will end up being one greater than the size of the bitmap. When this happens, the last stripe in the IO will not be processed as it should be, resulting in data corruption. However, this is not normally seen when the backing block devices have 4K physical block sizes since the block layer will split the request before that happens. To fix this increase the bitmap size by one bit and ensure the full number of stripes are checked when calling find_first_bit(). Reported-by: David Sloan <David.Sloan@eideticom.com> Fixes: `7e55c60acf` ("md/raid5: Pivot raid5_make_request()") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:41 -06:00
Coly Li	640c46a21f	bcache: remove EXPERIMENTAL for Kconfig option 'Asynchronous device registration' The "Asynchronous device registration (EXPERIMENTAL)" Kconfig option has existed for 2+ years. It is used when registration takes too much time for a massive amount of cached data, to avoid a udev task timeout during boot. Many users and products have enabled this Kconfig option for quite a long time (e.g. SUSE Linux); it works as expected and no issues have been reported. It is time to remove the "EXPERIMENTAL" tag from this Kconfig item. Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20220719042724.8498-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:41 -06:00
Yu Kuai	bc9da6dd06	nbd: add missing definition of pr_fmt Commit `1243172d58` ("nbd: use pr_err to output error message") tries to define pr_fmt and use short pr_err() to output error messages; however, the definition is missing. This patch also removes the existing "nbd:" inside pr_err(). Fixes: `1243172d58` ("nbd: use pr_err to output error message") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220723082427.3890655-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:41 -06:00
Dan Carpenter	ee452a8d98	null_blk: fix ida error handling in null_add_dev() There needs to be some error checking if ida_simple_get() fails. Also call ida_free() if there are errors later. Fixes: `94bc02e30f` ("nullb: use ida to manage index") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YtEhXsr6vJeoiYhd@kili Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:41 -06:00
Joel Granados	c13cf14f44	nvme-multipath: refactor nvme_mpath_add_disk Pass anagrpid as second argument. This is prep patch that allows reusing this function for supporting unknown command sets. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:41 -06:00
Guixin Liu	0f89f0ece5	nvme-apple: use nvme core helper to cancel requests in tagset Use nvme core helper nvme_cancel_tagset and nvme_cancel_admin_tagset instead of same logic code. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:41 -06:00
Guixin Liu	1fcfca7812	nvme-pci: use nvme core helper to cancel requests in tagset Use nvme core helper nvme_cancel_tagset and nvme_cancel_admin_tagset instead of same logic code. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Ruozhu Li <liruozhu@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:41 -06:00
Caleb Sander	53ee9e2937	nvme-tcp: use in-capsule data for I/O connect Currently, command data is only sent in-capsule for admin or I/O commands on queues that indicate support for it. Send fabrics command data in-capsule for I/O queues as well to avoid needing a separate H2CData PDU for the connect command. This is an optimization. Without this change, we send the connect command capsule and data in separate PDUs (CapsuleCmd and H2CData), and must wait for the controller to respond with an R2T PDU before sending the H2CData. With the change, we send a single CapsuleCmd PDU that includes the data. This reduces the number of bytes (and likely packets) sent across the network, and simplifies the send state machine handling in the driver. Signed-off-by: Caleb Sander <csander@purestorage.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:41 -06:00
Israel Rukshin	0525af711b	nvme-rdma: remove timeout for getting RDMA-CM established event In case many controllers start error recovery at the same time (i.e., when port is down and up), they may never succeed to reconnect again. This is because the target can't handle all the connect requests at three seconds (the arbitrary value set today). Even if some of the connections are established, when a single queue fails to connect, all the controller's queues are destroyed as well. So, on the following reconnection attempts the number of connect requests may remain the same. To fix this, remove the timeout and wait for RDMA-CM event to abort/complete the connect request. RDMA-CM sends unreachable event when a timeout of ~90 seconds is expired. This approach is used at other RDMA-CM users like SRP and iSER at blocking mode. The commit also renames NVME_RDMA_CONNECT_TIMEOUT_MS to NVME_RDMA_CM_TIMEOUT_MS. Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:41 -06:00
Vincent Fu	7012eef520	null_blk: add configfs variables for 2 options Allow setting via configfs these two options: no_sched shared_tag_bitmap Previously these could only be activated as module parameters. Still missing are: shared_tags timeout requeue init_hctx Signed-off-by: Vincent Fu <vincent.fu@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220708174943.87787-3-vincent.fu@samsung.com [axboe: fold in nullb == NULL fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:15:02 -06:00
Vincent Fu	058efe000b	null_blk: add module parameters for 4 options Add as module parameters these options: memory_backed discard mbps cache_size Previously these could only be set via configfs. Still missing is bad_blocks. The kernel test robot found a documentation formatting issue in v1 of this patch. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vincent Fu <vincent.fu@samsung.com> Link: https://lore.kernel.org/r/20220708174943.87787-2-vincent.fu@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:14:50 -06:00
Md Haris Iqbal	ce11bdf946	block/rnbd-srv: Replace sess_dev_list with index_idr The structure rnbd_srv_session maintains a list and an xarray of rnbd_srv_dev. There is no need to keep both as one of them can serve the purpose. Since one of the places where the lookup of rnbd_srv_dev using rnbd_srv_session is IO path, an xarray would serve us better than a list traversal. Hence remove sess_dev_list from rnbd_srv_session, and replace its uses from xarray. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20220707143122.460362-3-haris.iqbal@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:14:50 -06:00
Md Haris Iqbal	4bc14f3101	block/rnbd-srv: Set keep_id to true after mutex_trylock After setting keep_id if the mutex trylock fails, the keep_id stays set for the rest of the sess_dev lifetime. Therefore, set keep_id to true after mutex_trylock succeeds, so that a failure of trylock doesn't touch keep_id. Fixes: `b168e1d85c` ("block/rnbd-srv: Prevent a deadlock generated by accessing sysfs in parallel") Cc: gi-oh.kim@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20220707143122.460362-2-haris.iqbal@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:14:50 -06:00
Hannes Reinecke	1a70200f40	nvmet-auth: expire authentication sessions Each authentication step is required to be completed within the KATO interval (or two minutes if not set). So add a workqueue function to reset the transaction ID and the expected next protocol step; this will automatically fail the next authentication command referring to the terminated authentication. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:14:50 -06:00
Hannes Reinecke	7a277c37d3	nvmet-auth: Diffie-Hellman key exchange support Implement Diffie-Hellman key exchange using FFDHE groups for NVMe In-Band Authentication. This patch adds a new host configfs attribute 'dhchap_dhgroup' to select the FFDHE group to use. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:14:50 -06:00
Hannes Reinecke	db1312dd95	nvmet: implement basic In-Band Authentication Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006. This patch adds three additional configfs entries 'dhchap_key', 'dhchap_ctrl_key', and 'dhchap_hash' to the 'host' configfs directory. The 'dhchap_key' and 'dhchap_ctrl_key' entries need to be in the ASCII format as specified in NVMe Base Specification v2.0 section 8.13.5.8 'Secret representation'. 'dhchap_hash' defaults to 'hmac(sha256)', and can be written to to switch to a different HMAC algorithm. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:14:49 -06:00
Hannes Reinecke	6490c9ed06	nvmet: parse fabrics commands on io queues Some fabrics commands can be sent via io queues, so add a new function nvmet_parse_fabrics_io_cmd() and rename the existing nvmet_parse_fabrics_cmd() to nvmet_parse_fabrics_admin_cmd(). Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:14:49 -06:00
Hannes Reinecke	b61775d185	nvme-auth: Diffie-Hellman key exchange support Implement Diffie-Hellman key exchange using FFDHE groups for NVMe In-Band Authentication. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:14:49 -06:00
Hannes Reinecke	f50fff73d6	nvme: implement In-Band authentication Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006. This patch adds two new fabric options 'dhchap_secret' to specify the pre-shared key (in ASCII representation according to NVMe 2.0 section 8.13.5.8 'Secret representation') and 'dhchap_ctrl_secret' to specify the pre-shared controller key for bi-directional authentication of both the host and the controller. Re-authentication can be triggered by writing the PSK into the new controller sysfs attribute 'dhchap_secret' or 'dhchap_ctrl_secret'. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> [axboe: fold in clang build fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:14:49 -06:00
Hannes Reinecke	3bf2fde6fc	nvme-fabrics: decode 'authentication required' connect error The 'connect' command might fail with NVME_SC_AUTH_REQUIRED, so we should be decoding this error, too. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:14:47 -06:00
Hannes Reinecke	88b140fec0	nvme: add definitions for NVMe In-Band authentication Add new definitions for NVMe In-band authentication as defined in the NVMe Base Specification v2.0. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-02 17:14:47 -06:00


1109643 Commits