linux

Mariusz Tkaczyk 57668f0a4c raid5: introduce MD_BROKEN

Raid456 module had allowed to achieve failed state. It was fixed by
`fb73b357fb` ("raid5: block failing device if raid will be failed").
This fix introduces a bug, now if raid5 fails during IO, it may result
with a hung task without completion. Faulty flag on the device is
necessary to process all requests and is checked many times, mainly in
analyze_stripe().
Allow to set faulty on drive again and set MD_BROKEN if raid is failed.

As a result, this level is allowed to achieve failed state again, but
communication with userspace (via -EBUSY status) will be preserved.

This restores possibility to fail array via #mdadm --set-faulty command
and will be fixed by additional verification on mdadm side.

Reproduction steps:
 mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
 mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
 mkfs.xfs /dev/md126 -f
 mount /dev/md126 /mnt/root/

 fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw
--bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4
--time_based --group_reporting --name=throughput-test-job
--eta-newline=1 &

 echo 1 > /sys/block/nvme2n1/device/device/remove
 echo 1 > /sys/block/nvme1n1/device/device/remove

 [ 1475.787779] Call Trace:
 [ 1475.793111] __schedule+0x2a6/0x700
 [ 1475.799460] schedule+0x38/0xa0
 [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
 [ 1475.813856] ? finish_wait+0x80/0x80
 [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
 [ 1475.828281] ? finish_wait+0x80/0x80
 [ 1475.834727] ? finish_wait+0x80/0x80
 [ 1475.841127] ? finish_wait+0x80/0x80
 [ 1475.847480] md_handle_request+0x119/0x190
 [ 1475.854390] md_make_request+0x8a/0x190
 [ 1475.861041] generic_make_request+0xcf/0x310
 [ 1475.868145] submit_bio+0x3c/0x160
 [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
 [ 1475.882070] iomap_dio_bio_actor+0x175/0x390
 [ 1475.889149] iomap_apply+0xff/0x310
 [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.909974] iomap_dio_rw+0x2f2/0x490
 [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.923680] ? atime_needs_update+0x77/0xe0
 [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
 [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
 [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
 [ 1475.953403] aio_read+0xd5/0x180
 [ 1475.959395] ? _cond_resched+0x15/0x30
 [ 1475.965907] io_submit_one+0x20b/0x3c0
 [ 1475.972398] __x64_sys_io_submit+0xa2/0x180
 [ 1475.979335] ? do_io_getevents+0x7c/0xc0
 [ 1475.986009] do_syscall_64+0x5b/0x1a0
 [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
 [ 1476.000255] RIP: 0033:0x7f11fc27978d
 [ 1476.006631] Code: Bad RIP value.
 [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.

Cc: stable@vger.kernel.org
Fixes: `fb73b357fb` ("raid5: block failing device if raid will be failed")
Reviewd-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>

2022-04-25 14:00:35 -07:00
bcache	block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD	2022-04-17 19:49:59 -06:00
persistent-data	dm space map common: add bounds check to sm_ll_lookup_bitmap()	2022-01-04 13:58:19 -05:00
dm-audit.c	dm: introduce audit event module for device mapper	2021-10-27 16:53:47 -04:00
dm-audit.h	dm: introduce audit event module for device mapper	2021-10-27 16:53:47 -04:00
dm-bio-prison-v1.c
dm-bio-prison-v1.h
dm-bio-prison-v2.c	dm bio prison v2: use true/false for bool variable	2020-01-07 12:07:08 -05:00
dm-bio-prison-v2.h
dm-bio-record.h	block: move integrity handling out of <linux/blkdev.h>	2021-10-18 06:17:02 -06:00
dm-bufio.c	block: turn bio_kmalloc into a simple kmalloc wrapper	2022-04-17 19:30:41 -06:00
dm-builtin.c
dm-cache-background-tracker.c
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c	dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them	2021-10-18 14:43:22 -06:00
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c	dm cache policy smq: make static read-only array table const	2022-02-22 10:35:53 -05:00
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c	block: remove QUEUE_FLAG_DISCARD	2022-04-17 19:49:59 -06:00
dm-clone-metadata.c	dm clone metadata: remove unused function	2021-04-19 13:20:31 -04:00
dm-clone-metadata.h	dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions()	2020-03-27 14:42:51 -04:00
dm-clone-target.c	block: remove QUEUE_FLAG_DISCARD	2022-04-17 19:49:59 -06:00
dm-core.h	dm: fix dm_io and dm_target_io flags race condition on Alpha	2022-04-01 13:19:27 -04:00
dm-crypt.c	SCSI misc on 20220324	2022-03-24 19:37:53 -07:00
dm-delay.c	dm: simplify dm_sumbit_bio_remap interface	2022-03-10 13:44:56 -05:00
dm-dust.c	dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them	2021-10-18 14:43:22 -06:00
dm-ebs-target.c	scsi: dm: Remove WRITE_SAME support	2022-02-22 21:11:08 -05:00
dm-era-target.c	dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them	2021-10-18 14:43:22 -06:00
dm-exception-store.c
dm-exception-store.h	dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them	2021-10-18 14:43:22 -06:00
dm-flakey.c	dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them	2021-10-18 14:43:22 -06:00
dm-ima.c	dm ima: fix wrong length calculation for no_data string	2022-02-22 10:42:41 -05:00
dm-ima.h	dm ima: add version info to dm related events in ima log	2021-08-20 15:59:47 -04:00
dm-init.c	dm init: Set file local variable static	2020-08-04 15:51:28 -04:00
dm-integrity.c	dm integrity: fix memory corruption when tag_size is less than digest size	2022-04-13 12:38:49 -04:00
dm-io-tracker.h	dm writecache: make writeback pause configurable	2021-06-28 16:30:13 -04:00
dm-io.c	block: add a bdev_max_discard_sectors helper	2022-04-17 19:49:59 -06:00
dm-ioctl.c	dm ioctl: log an error if the ioctl structure is corrupted	2022-04-01 10:29:43 -04:00
dm-kcopyd.c	dm writecache: have ssd writeback wait if the kcopyd workqueue is busy	2021-06-15 15:42:03 -04:00
dm-linear.c	scsi: dm: Remove WRITE_SAME support	2022-02-22 21:11:08 -05:00
dm-log-userspace-base.c	dm: update target status functions to support IMA measurement	2021-08-10 13:34:23 -04:00
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c	block: remove QUEUE_FLAG_DISCARD	2022-04-17 19:49:59 -06:00
dm-log.c	dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them	2021-10-18 14:43:22 -06:00
dm-mpath.c	SCSI misc on 20220324	2022-03-24 19:37:53 -07:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h	dm mpath: pass IO start time to path selector	2020-05-15 10:29:36 -04:00
dm-ps-historical-service-time.c	dm mpath: only use ktime_get_ns() in historical selector	2022-04-13 13:22:16 -04:00
dm-ps-io-affinity.c	dm: update target status functions to support IMA measurement	2021-08-10 13:34:23 -04:00
dm-ps-queue-length.c	dm: update target status functions to support IMA measurement	2021-08-10 13:34:23 -04:00
dm-ps-round-robin.c	dm: update target status functions to support IMA measurement	2021-08-10 13:34:23 -04:00
dm-ps-service-time.c	dm: update target status functions to support IMA measurement	2021-08-10 13:34:23 -04:00
dm-raid1.c	dm: update target status functions to support IMA measurement	2021-08-10 13:34:23 -04:00
dm-raid.c	block: remove QUEUE_FLAG_DISCARD	2022-04-17 19:49:59 -06:00
dm-region-hash.c
dm-rq.c	SCSI misc on 20220324	2022-03-24 19:37:53 -07:00
dm-rq.h
dm-snap-persistent.c	dm: update target status functions to support IMA measurement	2021-08-10 13:34:23 -04:00
dm-snap-transient.c	dm: update target status functions to support IMA measurement	2021-08-10 13:34:23 -04:00
dm-snap.c	dm-snap: use blkdev_issue_flush instead of open coding it	2022-02-02 07:49:59 -07:00
dm-stats.c	dm stats: fix too short end duration_ns when using precise_timestamps	2022-02-21 15:35:39 -05:00
dm-stats.h	dm stats: fix too short end duration_ns when using precise_timestamps	2022-02-21 15:35:39 -05:00
dm-stripe.c	scsi: dm: Remove WRITE_SAME support	2022-02-22 21:11:08 -05:00
dm-switch.c	dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them	2021-10-18 14:43:22 -06:00
dm-sysfs.c	dm sysfs: use default_groups in kobj_type	2022-01-06 09:48:55 -05:00
dm-table.c	block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD	2022-04-17 19:49:59 -06:00
dm-target.c
dm-thin-metadata.c	dm thin metadata: remove unused dm_thin_remove_block and __remove	2022-02-22 13:55:50 -05:00
dm-thin-metadata.h	dm thin metadata: remove unused dm_thin_remove_block and __remove	2022-02-22 13:55:50 -05:00
dm-thin.c	block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD	2022-04-17 19:49:59 -06:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c	dm: update target status functions to support IMA measurement	2021-08-10 13:34:23 -04:00
dm-verity-fec.c	dm verity fec: fix misaligned RS roots IO	2021-04-14 14:28:29 -04:00
dm-verity-fec.h	dm verity fec: fix misaligned RS roots IO	2021-04-14 14:28:29 -04:00
dm-verity-target.c	- Add DM core support for emitting audit events through the audit	2021-11-09 11:02:04 -08:00
dm-verity-verify-sig.c	dm verity: fix require_signatures module_param permissions	2021-05-25 16:14:05 -04:00
dm-verity-verify-sig.h	dm verity: Fix compilation warning	2020-08-04 15:48:13 -04:00
dm-verity.h	dm verity: add "panic_on_corruption" error handling mode	2020-07-13 11:47:33 -04:00
dm-writecache.c	block: pass a block_device and opf to bio_alloc_bioset	2022-02-02 07:49:59 -07:00
dm-zero.c	dm: add support for REQ_NOWAIT to various targets	2020-12-04 18:04:35 -05:00
dm-zone.c	dm zone: fix NULL pointer dereference in dm_zone_map_bio	2022-04-13 13:22:17 -04:00
dm-zoned-metadata.c	dm-zoned: remove the ->name field in struct dmz_dev	2022-03-02 12:15:35 -05:00
dm-zoned-reclaim.c	dm kcopyd: avoid useless atomic operations	2021-06-04 12:07:24 -04:00
dm-zoned-target.c	dm-zoned: remove the ->name field in struct dmz_dev	2022-03-02 12:15:35 -05:00
dm-zoned.h	dm-zoned: remove the ->name field in struct dmz_dev	2022-03-02 12:15:35 -05:00
dm.c	block: remove QUEUE_FLAG_DISCARD	2022-04-17 19:49:59 -06:00
dm.h	dax: remove dax_capable	2021-12-04 08:58:51 -08:00
Kconfig	blk-mq: make the blk-mq stacking code optional	2022-02-16 19:39:09 -07:00
Makefile	dm: introduce audit event module for device mapper	2021-10-27 16:53:47 -04:00
md-autodetect.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
md-bitmap.c	md/bitmap: don't set max_write_behind if there is no write mostly device	2021-11-02 11:41:44 -07:00
md-bitmap.h
md-cluster.c	md: fix spelling of "its"	2022-01-06 08:37:03 -08:00
md-cluster.h
md-faulty.c	block: pass a block_device to bio_clone_fast	2022-02-04 07:43:18 -07:00
md-linear.c	block: remove QUEUE_FLAG_DISCARD	2022-04-17 19:49:59 -06:00
md-linear.h	md/raid1: Replace zero-length array with flexible-array	2020-05-13 12:02:23 -07:00
md-multipath.c	SCSI misc on 20220324	2022-03-24 19:37:53 -07:00
md-multipath.h
md.c	md: Set MD_BROKEN for RAID1 and RAID10	2022-04-25 14:00:34 -07:00
md.h	md: Set MD_BROKEN for RAID1 and RAID10	2022-04-25 14:00:34 -07:00
raid0.c	block: remove QUEUE_FLAG_DISCARD	2022-04-17 19:49:59 -06:00
raid0.h
raid1-10.c	md: raid1/raid10: drop pending_cnt	2022-03-08 15:16:54 -08:00
raid1.c	md: Set MD_BROKEN for RAID1 and RAID10	2022-04-25 14:00:34 -07:00
raid1.h	md: raid1/raid10: drop pending_cnt	2022-03-08 15:16:54 -08:00
raid5-cache.c	block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD	2022-04-17 19:49:59 -06:00
raid5-log.h
raid5-ppl.c	for-5.18/write-streams-2022-03-18	2022-03-26 11:51:46 -07:00
raid5.c	raid5: introduce MD_BROKEN	2022-04-25 14:00:35 -07:00
raid5.h	md/raid5: play nice with PREEMPT_RT	2022-01-06 08:37:02 -08:00
raid10.c	md: Set MD_BROKEN for RAID1 and RAID10	2022-04-25 14:00:34 -07:00
raid10.h	md: raid1/raid10: drop pending_cnt	2022-03-08 15:16:54 -08:00