linux/drivers/md
Jonathan Brassow a4dc163a55 DM RAID: Fix raid_resume not reviving failed devices in all cases
DM RAID:  Fix raid_resume not reviving failed devices in all cases

When a device fails in a RAID array, it is marked as Faulty.  Later,
md_check_recovery is called which (through the call chain) calls
'hot_remove_disk' in order to have the personalities remove the device
from use in the array.

Sometimes, it is possible for the array to be suspended before the
personalities get their chance to perform 'hot_remove_disk'.  This is
normally not an issue.  If the array is deactivated, then the failed
device will be noticed when the array is reinstantiated.  If the
array is resumed and the disk is still missing, md_check_recovery will
be called upon resume and 'hot_remove_disk' will be called at that
time.  However, (for dm-raid) if the device has been restored,
a resume on the array would cause it to attempt to revive the device
by calling 'hot_add_disk'.  If 'hot_remove_disk' had not been called,
a situation is then created where the device is thought to concurrently
be the replacement and the device to be replaced.  Thus, the device
is first sync'ed with the rest of the array (because it is the replacement
device) and then marked Faulty and removed from the array (because
it is also the device being replaced).

The solution is to check and see if the device had properly been removed
before the array was suspended.  This is done by seeing whether the
device's 'raid_disk' field is -1 - a condition that implies that
'md_check_recovery -> remove_and_add_spares (where raid_disk is set to -1)
-> hot_remove_disk' has been called.  If 'raid_disk' is not -1, then
'hot_remove_disk' must be called to complete the removal of the previously
faulty device before it can be revived via 'hot_add_disk'.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-14 08:10:25 +10:00
..
bcache bcache: Fix error handling in init code 2013-05-15 00:48:14 -07:00
persistent-data dm persistent metadata: add space map threshold callback 2013-05-10 14:37:20 +01:00
bitmap.c md: use set_bit_le and clear_bit_le 2013-04-24 11:42:41 +10:00
bitmap.h md/bitmap: record the space available for the bitmap in the superblock. 2012-05-22 13:55:34 +10:00
dm-bio-prison.c dm: add cache target 2013-03-01 22:45:51 +00:00
dm-bio-prison.h dm: add cache target 2013-03-01 22:45:51 +00:00
dm-bio-record.h
dm-bufio.c dm bufio: avoid a possible __vmalloc deadlock 2013-05-10 14:37:15 +01:00
dm-bufio.h dm bufio: prefetch 2012-03-28 18:41:29 +01:00
dm-cache-block-types.h dm: add cache target 2013-03-01 22:45:51 +00:00
dm-cache-metadata.c dm cache: replace memcpy with struct assignment 2013-05-10 14:37:18 +01:00
dm-cache-metadata.h dm cache: policy ignore hints if generated by different version 2013-03-20 17:21:28 +00:00
dm-cache-policy-cleaner.c dm cache: policy change version from string to integer set 2013-03-20 17:21:27 +00:00
dm-cache-policy-internal.h dm cache: policy change version from string to integer set 2013-03-20 17:21:27 +00:00
dm-cache-policy-mq.c dm cache: policy change version from string to integer set 2013-03-20 17:21:27 +00:00
dm-cache-policy.c dm cache: policy change version from string to integer set 2013-03-20 17:21:27 +00:00
dm-cache-policy.h dm cache policy: fix description of lookup fn 2013-05-10 14:37:17 +01:00
dm-cache-target.c dm cache: set config value 2013-05-10 14:37:21 +01:00
dm-crypt.c block: Convert some code to bio_for_each_segment_all() 2013-03-23 14:26:30 -07:00
dm-delay.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm-exception-store.c dm: replace simple_strtoul 2012-07-27 15:07:59 +01:00
dm-exception-store.h dm snapshot: test chunk size against both origin and snapshot 2010-08-12 04:13:51 +01:00
dm-flakey.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm-io.c dm kcopyd: add WRITE SAME support to dm_kcopyd_zero 2012-12-21 20:23:37 +00:00
dm-ioctl.c dm ioctl: allow message to return data 2013-03-01 22:45:49 +00:00
dm-kcopyd.c dm kcopyd: introduce configurable throttling 2013-03-01 22:45:49 +00:00
dm-linear.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm-log-userspace-base.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
dm-log-userspace-transfer.c connector/userns: replace netlink uses of cap_raised() with capable() 2012-05-10 23:21:39 -04:00
dm-log-userspace-transfer.h
dm-log.c dm: use memweight() 2012-07-30 17:25:16 -07:00
dm-mpath.c dm mpath: enable WRITE SAME support 2013-05-10 14:37:16 +01:00
dm-mpath.h
dm-path-selector.c md: Add module.h to all files using it implicitly 2011-10-31 19:31:18 -04:00
dm-path-selector.h
dm-queue-length.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-raid1.c block: Use bio_sectors() more consistently 2013-03-23 14:15:30 -07:00
dm-raid.c DM RAID: Fix raid_resume not reviving failed devices in all cases 2013-06-14 08:10:25 +10:00
dm-region-hash.c dm raid1: fix crash with mirror recovery and discard 2012-07-20 14:25:03 +01:00
dm-round-robin.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-service-time.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-snap-persistent.c md: Add in export.h for files using EXPORT_SYMBOL 2011-10-31 19:31:19 -04:00
dm-snap-transient.c md: Add in export.h for files using EXPORT_SYMBOL 2011-10-31 19:31:19 -04:00
dm-snap.c dm snapshot: fix error return code in snapshot_ctr 2013-05-10 14:37:15 +01:00
dm-stripe.c dm stripe: fix regression in stripe_width calculation 2013-05-10 14:37:14 +01:00
dm-sysfs.c Driver core: Constify struct sysfs_ops in struct kobj_type 2010-03-07 17:04:49 -08:00
dm-table.c dm table: fix write same support 2013-05-10 14:37:16 +01:00
dm-target.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm-thin-metadata.c dm thin: generate event when metadata threshold passed 2013-05-10 14:37:21 +01:00
dm-thin-metadata.h dm thin: generate event when metadata threshold passed 2013-05-10 14:37:21 +01:00
dm-thin.c dm thin: fix metadata dev resize detection 2013-05-19 18:57:50 +01:00
dm-uevent.c md: Add in export.h for files using EXPORT_SYMBOL 2011-10-31 19:31:19 -04:00
dm-uevent.h
dm-verity.c Merge branch 'writeback-workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq into for-3.10/core 2013-04-02 10:04:39 +02:00
dm-zero.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm.c block_device_operations->release() should return void 2013-05-07 02:16:21 -04:00
dm.h dm: introduce per_bio_data 2012-12-21 20:23:38 +00:00
faulty.c block: Add bio_end_sector() 2013-03-23 14:15:29 -07:00
Kconfig bcache: A block layer cache 2013-03-23 16:11:31 -07:00
linear.c block: Add bio_end_sector() 2013-03-23 14:15:29 -07:00
linear.h md/linear: typedef removal: linear_conf_t -> struct linear_conf 2011-10-11 16:48:54 +11:00
Makefile bcache: A block layer cache 2013-03-23 16:11:31 -07:00
md.c A few bugfixes for md 2013-06-13 10:13:29 -07:00
md.h MD: Export 'md_reap_sync_thread' function 2013-04-24 11:42:43 +10:00
multipath.c MD: change the parameter of md thread 2012-10-11 13:34:00 +11:00
multipath.h md/multipath: typedef removal: multipath_conf_t -> struct mpconf 2011-10-11 16:48:57 +11:00
raid0.c block: Change bio_split() to respect the current value of bi_idx 2013-03-23 14:15:30 -07:00
raid0.h md: add proper merge_bvec handling to RAID0 and Linear. 2012-03-19 12:46:39 +11:00
raid1.c DM RAID: Add ability to restore transiently failed devices on resume 2013-06-14 08:10:24 +10:00
raid1.h md/raid1: prevent merging too large request 2012-07-31 10:03:53 +10:00
raid5.c A few bugfixes for md 2013-06-13 10:13:29 -07:00
raid5.h md: remove CONFIG_MULTICORE_RAID456 entirely 2013-03-20 13:21:14 +11:00
raid10.c DM RAID: Add ability to restore transiently failed devices on resume 2013-06-14 08:10:24 +10:00
raid10.h MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1) 2013-02-26 11:55:30 +11:00