linux/drivers/md
Eric Mei 9ffc8f7cb9 md/raid5: don't do chunk aligned read on degraded array.
When array is degraded, read data landed on failed drives will result in
reading rest of data in a stripe. So a single sequential read would
result in same data being read twice.

This patch is to avoid chunk aligned read for degraded array. The
downside is to involve stripe cache which means associated CPU overhead
and extra memory copy.

Test Results:
Following test are done on a enterprise storage node with Seagate 6T SAS
drives and Xeon E5-2648L CPU (10 cores, 1.9Ghz), 10 disks MD RAID6 8+2,
chunk size 128 KiB.

I use FIO, using direct-io with various bs size, enough queue depth,
tested sequential and 100% random read against 3 array config:
 1) optimal, as baseline;
 2) degraded;
 3) degraded with this patch.
Kernel version is 4.0-rc3.

Each individual test I only did once so there might be some variations,
but we just focus on big trend.

Sequential Read:
  bs=(KiB)  optimal(MiB/s)  degraded(MiB/s)  degraded-with-patch (MiB/s)
   1024       1608            656              995
    512       1624            710              956
    256       1635            728              980
    128       1636            771              983
     64       1612           1119             1000
     32       1580           1420             1004
     16       1368            688              986
      8        768            647              953
      4        411            413              850

Random Read:
  bs=(KiB)  optimal(IOPS)  degraded(IOPS)  degraded-with-patch (IOPS)
   1024        163            160              156
    512        274            273              272
    256        426            428              424
    128        576            592              591
     64        726            724              726
     32        849            848              837
     16        900            970              971
      8        927            940              929
      4        948            940              955

Some notes:
  * In sequential + optimal, as bs size getting smaller, the FIO thread
become CPU bound.
  * In sequential + degraded, there's big increase when bs is 64K and
32K, I don't have explanation.
  * In sequential + degraded-with-patch, the MD thread mostly become CPU
bound.

If you want to we can discuss specific data point in those data. But in
general it seems with this patch, we have more predictable and in most
cases significant better sequential read performance when array is
degraded, and almost no noticeable impact on random read.

Performance is a complicated thing, the patch works well for this
particular configuration, but may not be universal. For example I
imagine testing on all SSD array may have very different result. But I
personally think in most cases IO bandwidth is more scarce resource than
CPU.


Signed-off-by: Eric Mei <eric.mei@seagate.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22 08:00:43 +10:00
..
bcache md/bcache: use generic io stats accounting functions to simplify io stat accounting 2014-11-24 08:05:12 -07:00
persistent-data - Significant dm-crypt CPU scalability performance improvements thanks 2015-02-21 13:28:45 -08:00
bitmap.c md-cluster: re-add capabilities 2015-04-22 07:59:39 +10:00
bitmap.h md-cluster: re-add capabilities 2015-04-22 07:59:39 +10:00
dm-bio-prison.c dm bio prison: introduce support for locking ranges of blocks 2014-11-10 15:25:30 -05:00
dm-bio-prison.h dm bio prison: introduce support for locking ranges of blocks 2014-11-10 15:25:30 -05:00
dm-bio-record.h dm: Refactor for new bio cloning/splitting 2013-11-23 22:33:55 -08:00
dm-bufio.c dm bufio: fix time comparison to use time_after_eq() 2015-02-09 13:06:48 -05:00
dm-bufio.h dm snapshot: use dm-bufio prefetch 2014-01-14 23:23:03 -05:00
dm-builtin.c dm sysfs: fix a module unload race 2014-01-14 23:23:04 -05:00
dm-cache-block-types.h dm cache: revert "remove remainder of distinct discard block size" 2014-11-10 15:25:30 -05:00
dm-cache-metadata.c dm cache: fix missing ERR_PTR returns and handling 2015-01-28 09:59:20 -05:00
dm-cache-metadata.h dm cache: revert "remove remainder of distinct discard block size" 2014-11-10 15:25:30 -05:00
dm-cache-policy-cleaner.c dm cache: policy change version from string to integer set 2013-03-20 17:21:27 +00:00
dm-cache-policy-internal.h dm cache: add remove_cblock method to policy interface 2013-11-11 11:37:50 -05:00
dm-cache-policy-mq.c dm cache policy mq: simplify ability to promote sequential IO to the cache 2014-11-10 15:25:30 -05:00
dm-cache-policy.c dm cache: add policy name to status output 2014-01-16 13:44:11 -05:00
dm-cache-policy.h dm cache: add policy name to status output 2014-01-16 13:44:11 -05:00
dm-cache-target.c - Most significant change this cycle is request-based DM now supports 2015-02-12 16:36:31 -08:00
dm-crypt.c dm crypt: sort writes 2015-02-16 11:11:15 -05:00
dm-delay.c Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
dm-era-target.c dm era: check for a non-NULL metadata object before closing it 2014-06-03 13:44:08 -04:00
dm-exception-store.c
dm-exception-store.h
dm-flakey.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-io.c dm io: deal with wandering queue limits when handling REQ_DISCARD and REQ_WRITE_SAME 2015-02-27 14:53:32 -05:00
dm-ioctl.c dm ioctl: fix stale comment above dm_get_inactive_table() 2015-02-09 13:06:48 -05:00
dm-kcopyd.c dm: stop using WQ_NON_REENTRANT 2013-08-23 09:02:13 -04:00
dm-linear.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-log-userspace-base.c dm: use time_in_range() and time_after() 2015-02-09 13:06:48 -05:00
dm-log-userspace-transfer.c dm log userspace: fix memory leak in dm_ulog_tfr_init failure path 2014-10-05 20:03:38 -04:00
dm-log-userspace-transfer.h
dm-log.c dm: use memweight() 2012-07-30 17:25:16 -07:00
dm-mpath.c dm mpath: simplify failure path of dm_multipath_init() 2015-02-09 13:06:49 -05:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid1.c dm mirror: do not degrade the mirror on discard error 2015-02-13 19:50:46 -05:00
dm-raid.c - Most significant change this cycle is request-based DM now supports 2015-02-12 16:36:31 -08:00
dm-region-hash.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-round-robin.c
dm-service-time.c
dm-snap-persistent.c dm snapshot: remove unnecessary NULL checks before vfree() calls 2015-02-09 13:06:49 -05:00
dm-snap-transient.c
dm-snap.c dm snapshot: suspend merging snapshot when doing exception handover 2015-02-27 14:53:16 -05:00
dm-stats.c - Significant DM thin-provisioning performance improvements to meet 2014-12-08 21:10:03 -08:00
dm-stats.h dm: add statistics support 2013-09-05 20:46:06 -04:00
dm-stripe.c dm stripe: fix potential for leak in stripe_ctr error path 2014-10-10 22:05:18 -04:00
dm-switch.c dm switch: efficiently support repetitive patterns 2014-08-01 12:30:37 -04:00
dm-sysfs.c dm sysfs: fix a module unload race 2014-01-14 23:23:04 -05:00
dm-table.c dm: inherit QUEUE_FLAG_SG_GAPS flags from underlying queues 2015-02-11 10:25:46 -05:00
dm-target.c dm: allocate requests in target when stacking on blk-mq devices 2015-02-09 13:06:47 -05:00
dm-thin-metadata.c dm thin metadata: remove unused dm_pool_get_data_block_size() 2015-02-09 13:06:49 -05:00
dm-thin-metadata.h dm thin metadata: remove unused dm_pool_get_data_block_size() 2015-02-09 13:06:49 -05:00
dm-thin.c dm thin: fix to consistently zero-fill reads to unprovisioned blocks 2015-02-27 09:59:12 -05:00
dm-uevent.c
dm-uevent.h
dm-verity.c dm verity: fix biovecs hash calculation regression 2014-04-15 12:19:24 -04:00
dm-zero.c dm crypt, dm zero: update author name following legal name change 2014-07-10 16:44:14 -04:00
dm.c dm: fix add_disk() NULL pointer due to race with free_dev() 2015-03-23 18:14:00 -04:00
dm.h dm table: train hybrid target type detection to select blk-mq if appropriate 2015-02-09 13:06:47 -05:00
faulty.c md: rename ->stop to ->free 2015-02-04 08:35:52 +11:00
Kconfig Create a separate module for clustering support 2015-02-23 07:28:42 -06:00
linear.c md: rename ->stop to ->free 2015-02-04 08:35:52 +11:00
linear.h
Makefile Create a separate module for clustering support 2015-02-23 07:28:42 -06:00
md-cluster.c md-cluster: re-add capabilities 2015-04-22 07:59:39 +10:00
md-cluster.h md-cluster: re-add capabilities 2015-04-22 07:59:39 +10:00
md.c md: allow resync to go faster when there is competing IO. 2015-04-22 08:00:40 +10:00
md.h md: remove 'go_faster' option from ->sync_request() 2015-04-22 08:00:40 +10:00
multipath.c md: rename ->stop to ->free 2015-02-04 08:35:52 +11:00
multipath.h
raid0.c md raid0: access mddev->queue (request queue member) conditionally because it is not set when accessed from dm-raid 2015-04-22 08:00:41 +10:00
raid0.h
raid1.c md: remove 'go_faster' option from ->sync_request() 2015-04-22 08:00:40 +10:00
raid1.h md: make ->congested robust against personality changes. 2015-02-04 08:35:52 +11:00
raid5.c md/raid5: don't do chunk aligned read on degraded array. 2015-04-22 08:00:43 +10:00
raid5.h md/raid5: allow the stripe_cache to grow and shrink. 2015-04-22 08:00:43 +10:00
raid10.c md: remove 'go_faster' option from ->sync_request() 2015-04-22 08:00:40 +10:00
raid10.h md: make ->congested robust against personality changes. 2015-02-04 08:35:52 +11:00