linux

Yuanhan Liu e9e4c377e2 md/raid5: per hash value and exclusive wait_for_stripe

I noticed heavy spin lock contention at get_active_stripe() with fsmark multi-threaded write workloads.

Here is where this contention comes from. We have a limited number of stripes, and it's a multi-threaded write workload. Hence, those stripes are taken quickly, which puts later processes to sleep waiting for free stripes. When enough stripes (>= 1/4 of the total stripes) are released, all processes are woken and try to get the lock. But only one process can take each hash lock, which leaves the other processes spinning to acquire it.

Thus, it's ineffective to wake up all processes and let them battle for a lock that permits only one of them access at a time. Instead, we could make it an exclusive wake up: wake up one process only. That avoids the heavy spin lock contention naturally.

To do the exclusive wake up, we have to split wait_for_stripe into multiple wait queues, to make it per hash value, just like the hash lock.

Here are some test results I got with this patch applied (all tests run 3 times):

`fsmark.files_per_sec'
=====================

      next-20150317              this patch
-------------------------  -------------------------
metric_value     ±stddev   metric_value     ±stddev    change   testbox/benchmark/testcase-params
-------------------------  -------------------------  --------  ------------------------------
      25.600     ±0.0            92.700     ±2.5        262.1%  ivb44/fsmark/1x-64t-4BRD_12G-RAID5-btrfs-4M-30G-fsyncBeforeClose
      25.600     ±0.0            77.800     ±0.6        203.9%  ivb44/fsmark/1x-64t-9BRD_6G-RAID5-btrfs-4M-30G-fsyncBeforeClose
      32.000     ±0.0            93.800     ±1.7        193.1%  ivb44/fsmark/1x-64t-4BRD_12G-RAID5-ext4-4M-30G-fsyncBeforeClose
      32.000     ±0.0            81.233     ±1.7        153.9%  ivb44/fsmark/1x-64t-9BRD_6G-RAID5-ext4-4M-30G-fsyncBeforeClose
      48.800     ±14.5           99.667     ±2.0        104.2%  ivb44/fsmark/1x-64t-4BRD_12G-RAID5-xfs-4M-30G-fsyncBeforeClose
       6.400     ±0.0            12.800     ±0.0        100.0%  ivb44/fsmark/1x-64t-3HDD-RAID5-btrfs-4M-40G-fsyncBeforeClose
      63.133     ±8.2            82.800     ±0.7         31.2%  ivb44/fsmark/1x-64t-9BRD_6G-RAID5-xfs-4M-30G-fsyncBeforeClose
     245.067     ±0.7           306.567     ±7.9         25.1%  ivb44/fsmark/1x-64t-4BRD_12G-RAID5-f2fs-4M-30G-fsyncBeforeClose
      17.533     ±0.3            21.000     ±0.8         19.8%  ivb44/fsmark/1x-1t-3HDD-RAID5-xfs-4M-40G-fsyncBeforeClose
     188.167     ±1.9           215.033     ±3.1         14.3%  ivb44/fsmark/1x-1t-4BRD_12G-RAID5-btrfs-4M-30G-NoSync
     254.500     ±1.8           290.733     ±2.4         14.2%  ivb44/fsmark/1x-1t-9BRD_6G-RAID5-btrfs-4M-30G-NoSync

`time.system_time'
=====================

      next-20150317              this patch
-------------------------  -------------------------
metric_value     ±stddev   metric_value     ±stddev    change   testbox/benchmark/testcase-params
-------------------------  -------------------------  --------  ------------------------------
    7235.603     ±1.2           185.163     ±1.9        -97.4%  ivb44/fsmark/1x-64t-4BRD_12G-RAID5-btrfs-4M-30G-fsyncBeforeClose
    7666.883     ±2.9           202.750     ±1.0        -97.4%  ivb44/fsmark/1x-64t-9BRD_6G-RAID5-btrfs-4M-30G-fsyncBeforeClose
   14567.893     ±0.7           421.230     ±0.4        -97.1%  ivb44/fsmark/1x-64t-3HDD-RAID5-btrfs-4M-40G-fsyncBeforeClose
    3697.667     ±14.0          148.190     ±1.7        -96.0%  ivb44/fsmark/1x-64t-4BRD_12G-RAID5-xfs-4M-30G-fsyncBeforeClose
    5572.867     ±3.8           310.717     ±1.4        -94.4%  ivb44/fsmark/1x-64t-9BRD_6G-RAID5-ext4-4M-30G-fsyncBeforeClose
    5565.050     ±0.5           313.277     ±1.5        -94.4%  ivb44/fsmark/1x-64t-4BRD_12G-RAID5-ext4-4M-30G-fsyncBeforeClose
    2420.707     ±17.1          171.043     ±2.7        -92.9%  ivb44/fsmark/1x-64t-9BRD_6G-RAID5-xfs-4M-30G-fsyncBeforeClose
    3743.300     ±4.6           379.827     ±3.5        -89.9%  ivb44/fsmark/1x-64t-3HDD-RAID5-ext4-4M-40G-fsyncBeforeClose
    3308.687     ±6.3           363.050     ±2.0        -89.0%  ivb44/fsmark/1x-64t-3HDD-RAID5-xfs-4M-40G-fsyncBeforeClose

Where,

  1x: 'x' means iterations or loops, corresponding to the 'L' option of fsmark
  1t, 64t: 't' means threads
  4M: the size of a single file, corresponding to the '-s' option of fsmark
  40G, 30G, 120G: the total test size
  4BRD_12G: BRD is the ramdisk, '4' means 4 ramdisks, and '12G' is the size of one ramdisk. So it is 48G in total, and we built a RAID on those ramdisks.

As you can see, though there is not much performance gain for the hard disk workload, the system time drops heavily, by up to 97%. And as expected, the performance increased a lot, by up to 260%, for fast devices (ram disk).

v2: use bits instead of an array to note which wait queues need to be woken up.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

2015-06-17 10:00:27 +10:00
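The difference between waking every sleeper and waking one sleeper per hash value can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the kernel code: `NR_HASH`, `struct stripe_pool`, `wake_all()` and `wake_exclusive()` are hypothetical names, the waiter counts stand in for real wait queues, and the bitmask mirrors the "use bits" note in v2 by recording which queues should be woken.

```c
#include <stdint.h>

#define NR_HASH 8 /* hypothetical number of hash values (one lock and one queue each) */

/* Model only: counts the processes sleeping on each per-hash wait queue. */
struct stripe_pool {
    int waiters[NR_HASH];
};

/*
 * Old behaviour in this model: every sleeper is woken at once.
 * Returns how many processes then compete for the hash locks.
 */
int wake_all(struct stripe_pool *p)
{
    int woken = 0;

    for (int h = 0; h < NR_HASH; h++) {
        woken += p->waiters[h];
        p->waiters[h] = 0;
    }
    return woken;
}

/*
 * Exclusive wake up in this model: for each bit set in mask, wake at most
 * one process sleeping on that hash value's queue. The rest keep sleeping,
 * so at most one newly woken process contends for each hash lock.
 */
int wake_exclusive(struct stripe_pool *p, uint32_t mask)
{
    int woken = 0;

    for (int h = 0; h < NR_HASH; h++) {
        if ((mask & (UINT32_C(1) << h)) && p->waiters[h] > 0) {
            p->waiters[h]--;
            woken++;
        }
    }
    return woken;
}
```

With waiter counts of 3, 0 and 2 on hash values 0, 1 and 2, calling `wake_exclusive()` with bits 0 to 2 set wakes two processes (one each from queues 0 and 2), whereas `wake_all()` would wake all five and leave them to fight over the locks.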
..
bcache	md/bcache: use generic io stats accounting functions to simplify io stat accounting	2014-11-24 08:05:12 -07:00
persistent-data	- Significant dm-crypt CPU scalability performance improvements thanks	2015-02-21 13:28:45 -08:00
bitmap.c	md/bitmap: remove rcu annotation from pointer arithmetic.	2015-05-21 09:14:41 +10:00
bitmap.h	md-cluster: re-add capabilities	2015-04-22 07:59:39 +10:00
dm-bio-prison.c	dm bio prison: introduce support for locking ranges of blocks	2014-11-10 15:25:30 -05:00
dm-bio-prison.h	dm bio prison: introduce support for locking ranges of blocks	2014-11-10 15:25:30 -05:00
dm-bio-record.h	dm: Refactor for new bio cloning/splitting	2013-11-23 22:33:55 -08:00
dm-bufio.c	dm bufio: fix time comparison to use time_after_eq()	2015-02-09 13:06:48 -05:00
dm-bufio.h	dm snapshot: use dm-bufio prefetch	2014-01-14 23:23:03 -05:00
dm-builtin.c	dm sysfs: fix a module unload race	2014-01-14 23:23:04 -05:00
dm-cache-block-types.h	dm cache: revert "remove remainder of distinct discard block size"	2014-11-10 15:25:30 -05:00
dm-cache-metadata.c	dm cache: fix missing ERR_PTR returns and handling	2015-01-28 09:59:20 -05:00
dm-cache-metadata.h	dm cache: revert "remove remainder of distinct discard block size"	2014-11-10 15:25:30 -05:00
dm-cache-policy-cleaner.c	dm cache: policy change version from string to integer set	2013-03-20 17:21:27 +00:00
dm-cache-policy-internal.h	dm cache: add remove_cblock method to policy interface	2013-11-11 11:37:50 -05:00
dm-cache-policy-mq.c	dm cache policy mq: try not to writeback data that changed in the last second	2015-03-31 12:03:48 -04:00
dm-cache-policy.c	dm cache: add policy name to status output	2014-01-16 13:44:11 -05:00
dm-cache-policy.h	dm cache: add policy name to status output	2014-01-16 13:44:11 -05:00
dm-cache-target.c	- Most significant change this cycle is request-based DM now supports	2015-02-12 16:36:31 -08:00
dm-crypt.c	Revert "dm crypt: fix deadlock when async crypto algorithm returns -EBUSY"	2015-05-05 12:16:43 -04:00
dm-delay.c	dm delay: use msecs_to_jiffies for time conversion	2015-04-15 12:10:21 -04:00
dm-era-target.c	dm era: check for a non-NULL metadata object before closing it	2014-06-03 13:44:08 -04:00
dm-exception-store.c	dm: replace simple_strtoul	2012-07-27 15:07:59 +01:00
dm-exception-store.h
dm-flakey.c	block: Abstract out bvec iterator	2013-11-23 22:33:47 -08:00
dm-io.c	dm io: deal with wandering queue limits when handling REQ_DISCARD and REQ_WRITE_SAME	2015-02-27 14:53:32 -05:00
dm-ioctl.c	dm: only initialize the request_queue once	2015-04-30 10:25:21 -04:00
dm-kcopyd.c	dm: stop using WQ_NON_REENTRANT	2013-08-23 09:02:13 -04:00
dm-linear.c	block: Abstract out bvec iterator	2013-11-23 22:33:47 -08:00
dm-log-userspace-base.c	dm log userspace base: fix compile warning	2015-04-15 12:10:20 -04:00
dm-log-userspace-transfer.c	dm log userspace transfer: match wait_for_completion_timeout return type	2015-04-15 12:10:20 -04:00
dm-log-userspace-transfer.h
dm-log-writes.c	dm: add log writes target	2015-04-15 12:10:24 -04:00
dm-log.c	dm: use memweight()	2012-07-30 17:25:16 -07:00
dm-mpath.c	dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path	2015-05-27 17:37:22 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c	dm: reject trailing characters in sccanf input	2012-03-28 18:41:26 +01:00
dm-raid1.c	dm mirror: do not degrade the mirror on discard error	2015-02-13 19:50:46 -05:00
dm-raid.c	- Most significant change this cycle is request-based DM now supports	2015-02-12 16:36:31 -08:00
dm-region-hash.c	block: Abstract out bvec iterator	2013-11-23 22:33:47 -08:00
dm-round-robin.c	dm: reject trailing characters in sccanf input	2012-03-28 18:41:26 +01:00
dm-service-time.c	dm: reject trailing characters in sccanf input	2012-03-28 18:41:26 +01:00
dm-snap-persistent.c	dm snapshot: remove unnecessary NULL checks before vfree() calls	2015-02-09 13:06:49 -05:00
dm-snap-transient.c
dm-snap.c	dm snapshot: suspend merging snapshot when doing exception handover	2015-02-27 14:53:16 -05:00
dm-stats.c	- Significant DM thin-provisioning performance improvements to meet	2014-12-08 21:10:03 -08:00
dm-stats.h	dm: add statistics support	2013-09-05 20:46:06 -04:00
dm-stripe.c	dm stripe: fix potential for leak in stripe_ctr error path	2014-10-10 22:05:18 -04:00
dm-switch.c	dm switch: efficiently support repetitive patterns	2014-08-01 12:30:37 -04:00
dm-sysfs.c	dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr	2015-04-15 12:10:17 -04:00
dm-table.c	dm: fix reload failure of 0 path multipath mapping on blk-mq devices	2015-05-29 13:41:16 -04:00
dm-target.c	dm: allocate requests in target when stacking on blk-mq devices	2015-02-09 13:06:47 -05:00
dm-thin-metadata.c	dm thin metadata: remove unused dm_pool_get_data_block_size()	2015-02-09 13:06:49 -05:00
dm-thin-metadata.h	dm thin metadata: remove unused dm_pool_get_data_block_size()	2015-02-09 13:06:49 -05:00
dm-thin.c	dm thin: fix to consistently zero-fill reads to unprovisioned blocks	2015-02-27 09:59:12 -05:00
dm-uevent.c
dm-uevent.h
dm-verity.c	dm verity: add error handling modes for corrupted blocks	2015-04-15 12:10:22 -04:00
dm-zero.c	dm crypt, dm zero: update author name following legal name change	2014-07-10 16:44:14 -04:00
dm.c	dm: fix casting bug in dm_merge_bvec()	2015-05-29 13:41:16 -04:00
dm.h	dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr	2015-04-15 12:10:17 -04:00
faulty.c	md: rename ->stop to ->free	2015-02-04 08:35:52 +11:00
Kconfig	md updates for 4.1	2015-04-24 09:28:01 -07:00
linear.c	md: rename ->stop to ->free	2015-02-04 08:35:52 +11:00
linear.h
Makefile	md updates for 4.1	2015-04-24 09:28:01 -07:00
md-cluster.c	md-cluster: re-add capabilities	2015-04-22 07:59:39 +10:00
md-cluster.h	md-cluster: re-add capabilities	2015-04-22 07:59:39 +10:00
md.c	md: convert to kstrto*()	2015-06-17 10:00:06 +10:00
md.h	md: remove 'go_faster' option from ->sync_request()	2015-04-22 08:00:40 +10:00
multipath.c	md: rename ->stop to ->free	2015-02-04 08:35:52 +11:00
multipath.h
raid0.c	md/raid0: fix restore to sector variable in raid0_make_request	2015-05-21 09:14:25 +10:00
raid0.h
raid1.c	md: remove 'go_faster' option from ->sync_request()	2015-04-22 08:00:40 +10:00
raid1.h	md: make ->congested robust against personality changes.	2015-02-04 08:35:52 +11:00
raid5.c	md/raid5: per hash value and exclusive wait_for_stripe	2015-06-17 10:00:27 +10:00
raid5.h	md/raid5: per hash value and exclusive wait_for_stripe	2015-06-17 10:00:27 +10:00
raid10.c	md/raid10: make sync_request_write() call bio_copy_data()	2015-06-17 09:59:57 +10:00
raid10.h	md: make ->congested robust against personality changes.	2015-02-04 08:35:52 +11:00