linux/drivers/md
Gabriel Krisman Bertazi c06dfd124d dm mpath: provide high-resolution timer to HST for bio-based
Reading the IO start_time with jiffies_to_nsecs() instead of a
high-resolution timer loses precision, which degrades HST path
prediction for bio-based mpath under high-load workloads.

Below, I show the utilization percentage of a 10-disk multipath with
asymmetrical disk access cost, while being exercised by a randwrite FIO
benchmark with a high submission queue depth (depth=64).  The HST path
selection degrades heavily at high IOPS in bio-based mpath,
underutilizing the slower paths far more than expected.  This seems to
be caused by the start_time truncation, which makes some IO seem much
slower than it actually is.  In this scenario ST outperforms HST for
bio-mpath, but not for mq-mpath, which already uses ktime_get_ns().
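
To make the truncation effect concrete, here is a minimal userspace
sketch.  It is not the kernel code: the tick rate (HZ=250) and all
helper names are assumptions chosen for illustration.  It models a start
timestamp rounded down to jiffy granularity and compares the resulting
service time with the one computed from a nanosecond-precision start.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick rate for this illustration; real kernels vary. */
#define SKETCH_HZ             250ULL
#define SKETCH_NSEC_PER_SEC   1000000000ULL
#define SKETCH_NSEC_PER_JIFFY (SKETCH_NSEC_PER_SEC / SKETCH_HZ)

/* Round a nanosecond timestamp down to the start of its jiffy. */
static uint64_t truncate_to_jiffy_ns(uint64_t t_ns)
{
	return (t_ns / SKETCH_NSEC_PER_JIFFY) * SKETCH_NSEC_PER_JIFFY;
}

/* Service time when only a jiffy-granular start time is available. */
static uint64_t service_time_jiffy_ns(uint64_t start_ns, uint64_t end_ns)
{
	assert(end_ns >= start_ns);
	return end_ns - truncate_to_jiffy_ns(start_ns);
}

/* Service time when the start time has nanosecond precision. */
static uint64_t service_time_precise_ns(uint64_t start_ns, uint64_t end_ns)
{
	assert(end_ns >= start_ns);
	return end_ns - start_ns;
}
```

With these assumptions one jiffy is 4 ms.  An IO that starts at 3.9 ms
and completes at 4.1 ms really took 0.2 ms, but
service_time_jiffy_ns(3900000, 4100000) reports 4.1 ms because the start
is rounded down to 0.  An error of that size would make a fast path look
many times slower than it is.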

The HST(ktime) column shows utilization with this patch applied.  HST
prediction is now much closer to the ideal distribution (the Best
column, calculated considering the real cost of each path).

|     |   ST | HST (orig) | HST(ktime) | Best |
| sdd | 0.17 |       0.20 |       0.17 | 0.18 |
| sde | 0.17 |       0.20 |       0.17 | 0.18 |
| sdf | 0.17 |       0.20 |       0.17 | 0.18 |
| sdg | 0.06 |       0.00 |       0.06 | 0.04 |
| sdh | 0.03 |       0.00 |       0.03 | 0.02 |
| sdi | 0.03 |       0.00 |       0.03 | 0.02 |
| sdj | 0.02 |       0.00 |       0.01 | 0.01 |
| sdk | 0.02 |       0.00 |       0.01 | 0.01 |
| sdl | 0.17 |       0.20 |       0.17 | 0.18 |
| sdm | 0.17 |       0.20 |       0.17 | 0.18 |

This issue was originally discussed [1] when HST was first merged, and
this patch was left as low-hanging fruit to be solved later.

Regarding the implementation, as suggested by Mike in that mail thread,
this patch adds a flag that lets the selector code request the
high-resolution timer, which avoids the overhead of ktime_get_ns() for
the other selectors.
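
The opt-in idea can be sketched in userspace as follows.  This is only
an illustration under stated assumptions: the flag name
SKETCH_PS_USE_HR_TIMER, the structs and the fake clock are hypothetical
and do not reproduce the identifiers used by the patch.  The point is
that the precise clock is read only when the selector type asks for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit: selector wants a high-resolution start time. */
#define SKETCH_PS_USE_HR_TIMER 0x1u

/* Hypothetical stand-in for a path selector type. */
struct sketch_selector_type {
	const char *name;
	unsigned int features;
};

/* Fake clock so the sketch is deterministic and counts precise reads. */
struct sketch_clock {
	uint64_t coarse_ns;         /* cheap, jiffy-granular timestamp */
	uint64_t precise_ns;        /* more costly, nanosecond timestamp */
	unsigned int precise_reads; /* times the precise clock was read */
};

static bool sketch_wants_hr_timer(const struct sketch_selector_type *t)
{
	return (t->features & SKETCH_PS_USE_HR_TIMER) != 0;
}

/* Pick the start timestamp; only flagged selectors pay for precision. */
static uint64_t sketch_io_start_ns(const struct sketch_selector_type *t,
				   struct sketch_clock *clk)
{
	assert(t && clk);
	if (sketch_wants_hr_timer(t)) {
		clk->precise_reads++;
		return clk->precise_ns;
	}
	return clk->coarse_ns;
}
```

In this model a selector type declared with
.features = SKETCH_PS_USE_HR_TIMER gets the precise timestamp, while a
type with .features = 0 keeps the coarse one and never touches the
precise clock.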

I tested this using the same benchmark used in the original HST submission.

Full test and benchmark scripts are available here:

  https://people.collabora.com/~krisman/HST-BIO-MPATH/

[1] https://lore.kernel.org/lkml/85tv0am9de.fsf@collabora.com/T/

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
[snitzer: cleaned up various implementation details]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-05-09 15:39:23 -04:00
bcache block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD 2022-04-17 19:49:59 -06:00
persistent-data dm space map common: add bounds check to sm_ll_lookup_bitmap() 2022-01-04 13:58:19 -05:00
dm-audit.c dm: introduce audit event module for device mapper 2021-10-27 16:53:47 -04:00
dm-audit.h dm: introduce audit event module for device mapper 2021-10-27 16:53:47 -04:00
dm-bio-prison-v1.c
dm-bio-prison-v1.h
dm-bio-prison-v2.c
dm-bio-prison-v2.h
dm-bio-record.h block: move integrity handling out of <linux/blkdev.h> 2021-10-18 06:17:02 -06:00
dm-bufio.c block: turn bio_kmalloc into a simple kmalloc wrapper 2022-04-17 19:30:41 -06:00
dm-builtin.c
dm-cache-background-tracker.c
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them 2021-10-18 14:43:22 -06:00
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c dm cache policy smq: make static read-only array table const 2022-02-22 10:35:53 -05:00
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c block: remove QUEUE_FLAG_DISCARD 2022-04-17 19:49:59 -06:00
dm-clone-metadata.c dm clone metadata: remove unused function 2021-04-19 13:20:31 -04:00
dm-clone-metadata.h dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions() 2020-03-27 14:42:51 -04:00
dm-clone-target.c block: remove QUEUE_FLAG_DISCARD 2022-04-17 19:49:59 -06:00
dm-core.h dm: simplify bio-based IO accounting further 2022-05-05 17:31:36 -04:00
dm-crypt.c dm crypt: make printing of the key constant-time 2022-05-09 12:34:03 -04:00
dm-delay.c dm: simplify basic targets 2022-05-05 17:31:35 -04:00
dm-dust.c dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them 2021-10-18 14:43:22 -06:00
dm-ebs-target.c scsi: dm: Remove WRITE_SAME support 2022-02-22 21:11:08 -05:00
dm-era-target.c dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them 2021-10-18 14:43:22 -06:00
dm-exception-store.c
dm-exception-store.h dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them 2021-10-18 14:43:22 -06:00
dm-flakey.c dm: simplify basic targets 2022-05-05 17:31:35 -04:00
dm-ima.c dm ima: fix wrong length calculation for no_data string 2022-02-22 10:42:41 -05:00
dm-ima.h dm ima: add version info to dm related events in ima log 2021-08-20 15:59:47 -04:00
dm-init.c dm init: Set file local variable static 2020-08-04 15:51:28 -04:00
dm-integrity.c dm integrity: fix error code in dm_integrity_ctr() 2022-05-09 12:14:00 -04:00
dm-io-tracker.h dm writecache: make writeback pause configurable 2021-06-28 16:30:13 -04:00
dm-io.c block: add a bdev_max_discard_sectors helper 2022-04-17 19:49:59 -06:00
dm-ioctl.c dm ioctl: log an error if the ioctl structure is corrupted 2022-04-01 10:29:43 -04:00
dm-kcopyd.c dm writecache: have ssd writeback wait if the kcopyd workqueue is busy 2021-06-15 15:42:03 -04:00
dm-linear.c dm: simplify basic targets 2022-05-05 17:31:35 -04:00
dm-log-userspace-base.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c block: remove QUEUE_FLAG_DISCARD 2022-04-17 19:49:59 -06:00
dm-log.c dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them 2021-10-18 14:43:22 -06:00
dm-mpath.c dm mpath: provide high-resolution timer to HST for bio-based 2022-05-09 15:39:23 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h dm mpath: provide high-resolution timer to HST for bio-based 2022-05-09 15:39:23 -04:00
dm-ps-historical-service-time.c dm mpath: provide high-resolution timer to HST for bio-based 2022-05-09 15:39:23 -04:00
dm-ps-io-affinity.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-ps-queue-length.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-ps-round-robin.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-ps-service-time.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-raid1.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-raid.c block: remove QUEUE_FLAG_DISCARD 2022-04-17 19:49:59 -06:00
dm-region-hash.c
dm-rq.c SCSI misc on 20220324 2022-03-24 19:37:53 -07:00
dm-rq.h
dm-snap-persistent.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-snap-transient.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-snap.c dm-snap: use blkdev_issue_flush instead of open coding it 2022-02-02 07:49:59 -07:00
dm-stats.c dm stats: add cond_resched when looping over entries 2022-05-09 12:11:07 -04:00
dm-stats.h dm stats: fix too short end duration_ns when using precise_timestamps 2022-02-21 15:35:39 -05:00
dm-stripe.c scsi: dm: Remove WRITE_SAME support 2022-02-22 21:11:08 -05:00
dm-switch.c dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them 2021-10-18 14:43:22 -06:00
dm-sysfs.c dm sysfs: use default_groups in kobj_type 2022-01-06 09:48:55 -05:00
dm-table.c dm: conditionally enable branching for less used features 2022-05-05 17:31:34 -04:00
dm-target.c
dm-thin-metadata.c dm thin metadata: remove unused dm_thin_remove_block and __remove 2022-02-22 13:55:50 -05:00
dm-thin-metadata.h dm thin metadata: remove unused dm_thin_remove_block and __remove 2022-02-22 13:55:50 -05:00
dm-thin.c block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD 2022-04-17 19:49:59 -06:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-verity-fec.c dm verity fec: fix misaligned RS roots IO 2021-04-14 14:28:29 -04:00
dm-verity-fec.h dm verity fec: fix misaligned RS roots IO 2021-04-14 14:28:29 -04:00
dm-verity-target.c - Add DM core support for emitting audit events through the audit 2021-11-09 11:02:04 -08:00
dm-verity-verify-sig.c dm verity: fix require_signatures module_param permissions 2021-05-25 16:14:05 -04:00
dm-verity-verify-sig.h dm verity: Fix compilation warning 2020-08-04 15:48:13 -04:00
dm-verity.h dm verity: add "panic_on_corruption" error handling mode 2020-07-13 11:47:33 -04:00
dm-writecache.c block: pass a block_device and opf to bio_alloc_bioset 2022-02-02 07:49:59 -07:00
dm-zero.c dm: add support for REQ_NOWAIT to various targets 2020-12-04 18:04:35 -05:00
dm-zone.c dm: don't grab target io reference in dm_zone_map_bio 2022-05-05 17:31:36 -04:00
dm-zoned-metadata.c dm-zoned: remove the ->name field in struct dmz_dev 2022-03-02 12:15:35 -05:00
dm-zoned-reclaim.c dm kcopyd: avoid useless atomic operations 2021-06-04 12:07:24 -04:00
dm-zoned-target.c dm-zoned: remove the ->name field in struct dmz_dev 2022-03-02 12:15:35 -05:00
dm-zoned.h dm-zoned: remove the ->name field in struct dmz_dev 2022-03-02 12:15:35 -05:00
dm.c dm: improve abnormal bio processing 2022-05-05 17:31:36 -04:00
dm.h dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset 2022-05-05 17:31:33 -04:00
Kconfig blk-mq: make the blk-mq stacking code optional 2022-02-16 19:39:09 -07:00
Makefile dm: introduce audit event module for device mapper 2021-10-27 16:53:47 -04:00
md-autodetect.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
md-bitmap.c md/bitmap: don't set max_write_behind if there is no write mostly device 2021-11-02 11:41:44 -07:00
md-bitmap.h
md-cluster.c md: fix spelling of "its" 2022-01-06 08:37:03 -08:00
md-cluster.h
md-faulty.c block: pass a block_device to bio_clone_fast 2022-02-04 07:43:18 -07:00
md-linear.c block: remove QUEUE_FLAG_DISCARD 2022-04-17 19:49:59 -06:00
md-linear.h md/raid1: Replace zero-length array with flexible-array 2020-05-13 12:02:23 -07:00
md-multipath.c SCSI misc on 20220324 2022-03-24 19:37:53 -07:00
md-multipath.h
md.c block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD 2022-04-17 19:49:59 -06:00
md.h scsi: md: Remove WRITE_SAME support 2022-02-22 21:11:08 -05:00
raid0.c block: remove QUEUE_FLAG_DISCARD 2022-04-17 19:49:59 -06:00
raid0.h
raid1-10.c md: raid1/raid10: drop pending_cnt 2022-03-08 15:16:54 -08:00
raid1.c block: remove QUEUE_FLAG_DISCARD 2022-04-17 19:49:59 -06:00
raid1.h md: raid1/raid10: drop pending_cnt 2022-03-08 15:16:54 -08:00
raid5-cache.c block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD 2022-04-17 19:49:59 -06:00
raid5-log.h
raid5-ppl.c for-5.18/write-streams-2022-03-18 2022-03-26 11:51:46 -07:00
raid5.c block: remove QUEUE_FLAG_DISCARD 2022-04-17 19:49:59 -06:00
raid5.h md/raid5: play nice with PREEMPT_RT 2022-01-06 08:37:02 -08:00
raid10.c block: remove QUEUE_FLAG_DISCARD 2022-04-17 19:49:59 -06:00
raid10.h md: raid1/raid10: drop pending_cnt 2022-03-08 15:16:54 -08:00