linux/drivers
Gabriel Krisman Bertazi c06dfd124d dm mpath: provide high-resolution timer to HST for bio-based
The precision loss of reading IO start_time with jiffies_to_nsecs
instead of using a high resolution timer degrades HST path prediction
for BIO-based mpath on high load workloads.

Below, I show the utilization percentage of a 10 disk multipath with
asymmetrical disk access cost, while being exercised by a randwrite FIO
benchmark with high submission queue depth (depth=64).  It is possible
to see that the HST path selection degrades heavily for high-iops in
BIO-mpath, underutilizing the slower paths way beyond expected.  This
seems to be caused by the start_time truncation, which makes some IO to
seem much slower than it actually is.  In this scenario ST outperforms
HST for bio-mpath, but not for mq-mpath, which already uses ktime_get_ns().

The third column shows utilization with this patch applied.  It is easy
to see that now HST prediction is much closer to the ideal distribution
(calculated considering the real cost of each path).

|     |   ST | HST (orig) | HST(ktime) | Best |
| sdd | 0.17 |       0.20 |       0.17 | 0.18 |
| sde | 0.17 |       0.20 |       0.17 | 0.18 |
| sdf | 0.17 |       0.20 |       0.17 | 0.18 |
| sdg | 0.06 |       0.00 |       0.06 | 0.04 |
| sdh | 0.03 |       0.00 |       0.03 | 0.02 |
| sdi | 0.03 |       0.00 |       0.03 | 0.02 |
| sdj | 0.02 |       0.00 |       0.01 | 0.01 |
| sdk | 0.02 |       0.00 |       0.01 | 0.01 |
| sdl | 0.17 |       0.20 |       0.17 | 0.18 |
| sdm | 0.17 |       0.20 |       0.17 | 0.18 |

This issue was originally discussed [1] when we first merged HST, and
this patch was left as a low hanging fruit to be solved later.

Regarding the implementation, as suggested by Mike in that mail thread,
in order to avoid the overhead of ktime_get_ns for other selectors, this
patch adds a flag for the selector code to request the high-resolution
timer.

I tested this using the same benchmark used in the original HST submission.

Full test and benchmark scripts are available here:

  https://people.collabora.com/~krisman/HST-BIO-MPATH/

[1] https://lore.kernel.org/lkml/85tv0am9de.fsf@collabora.com/T/

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
[snitzer: cleaned up various implementation details]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-05-09 15:39:23 -04:00
..
accessibility
acpi Merge branch 'acpi-bus' 2022-04-08 19:50:44 +02:00
amba
android
ata ata: ahci: Rename CONFIG_SATA_LPM_POLICY configuration item back 2022-04-06 11:08:04 +09:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-03-17 13:56:58 -07:00
auxdisplay auxdisplay: lcd2s: Use array size explicitly in lcd2s_gotoxy() 2022-03-18 20:31:14 +01:00
base net: mdio: don't defer probe forever if PHY IRQ provider is missing 2022-04-08 14:17:55 -07:00
bcma Core MTD changes: 2022-03-25 13:35:34 -07:00
block blk-cgroup: replace bio_blkcg with bio_blkcg_css 2022-05-02 14:06:20 -06:00
bluetooth Bluetooth: ath3k: remove superfluous header files 2022-03-18 17:12:09 +01:00
bus Char/Misc and other driver updates for 5.18-rc1 2022-03-28 12:27:35 -07:00
cdrom cdrom: remove unused variable 2022-04-06 08:47:52 -06:00
char random: use memmove instead of memcpy for remaining 32 bytes 2022-04-16 12:53:31 +02:00
clk A single revert to fix a boot regression seen when clk_put() started 2022-04-03 12:21:14 -07:00
clocksource asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
comedi
connector
counter Char/Misc and other driver updates for 5.18-rc1 2022-03-28 12:27:35 -07:00
cpufreq Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm 2022-03-22 12:15:47 +01:00
cpuidle RISC-V CPU Idle Support 2022-03-30 16:17:54 -07:00
crypto virtio: features, fixes 2022-03-31 13:57:15 -07:00
cxl cxl/pci: Drop shadowed variable 2022-04-08 12:59:43 -07:00
dax dax for 5.18 2022-03-24 18:12:09 -07:00
dca
devfreq
dio
dma dmaengine updates for v5.18-rc1 2022-03-30 10:54:49 -07:00
dma-buf dma-buf: handle empty dma_fence_arrays gracefully 2022-03-29 09:14:30 +02:00
edac Merge branch 'edac-amd64' into edac-updates-for-v5.18 2022-03-21 10:34:57 +01:00
eisa
extcon
firewire
firmware firmware: arm_scmi: Fix sparse warnings in OPTEE transport driver 2022-04-04 23:06:37 +01:00
fpga
fsi
gnss
gpio intel-gpio for v5.18-2 2022-04-16 21:57:00 +02:00
gpu Merge tag 'amd-drm-fixes-5.18-2022-04-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes 2022-04-15 07:14:20 +10:00
greybus
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2022-04-01 10:14:32 -07:00
hsi
hv hyperv-fixes for 5.18-rc2 2022-04-07 06:35:34 -10:00
hwmon Char/Misc and other driver updates for 5.18-rc1 2022-03-28 12:27:35 -07:00
hwspinlock hwspinlock: sprd: Use struct_size() helper in devm_kzalloc() 2022-03-11 14:56:57 -06:00
hwtracing Char/Misc and other driver updates for 5.18-rc1 2022-03-28 12:27:35 -07:00
i2c i2c: ismt: Fix undefined behavior due to shift overflowing the constant 2022-04-15 23:49:02 +02:00
i3c
idle cpuidle: intel_idle: Drop redundant backslash at line end 2022-03-17 14:32:59 +01:00
iio Char/Misc and other driver updates for 5.18-rc1 2022-03-28 12:27:35 -07:00
infiniband RDMA/hfi1: Fix use-after-free bug for mm struct 2022-04-08 15:40:06 -03:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2022-04-01 10:14:32 -07:00
interconnect
iommu iommu/omap: Fix regression in probe for NULL pointer dereference 2022-04-08 11:16:29 +02:00
ipack
irqchip irqchip/gic, gic-v3: Prevent GSI to SGI translations 2022-04-05 16:33:47 +01:00
isdn mISDN: fix typo "frame to short" -> "frame too short" 2022-03-21 13:26:38 +00:00
leds LED updates for 5.18-rc1. Nothing major here, there are two drivers 2022-03-27 14:09:48 -07:00
macintosh
mailbox mailbox: ti-msgmgr: Operate mailbox in polled mode during system suspend 2022-03-12 19:33:30 -06:00
mcb
md dm mpath: provide high-resolution timer to HST for bio-based 2022-05-09 15:39:23 -04:00
media media: si2157: unknown chip version Si2147-A30 ROM 0x50 2022-04-09 17:45:49 +02:00
memory memory: fsl_ifc: populate child nodes of buses and mfd devices 2022-04-06 09:39:16 +02:00
memstick
message scsi: message: fusion: Remove redundant variable dmp 2022-04-06 22:28:07 -04:00
mfd - New Drivers 2022-03-25 13:56:18 -07:00
misc habanalabs: Fix test build failures 2022-04-04 17:03:04 +02:00
mmc block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD 2022-04-17 19:49:59 -06:00
most
mtd block: remove QUEUE_FLAG_DISCARD 2022-04-17 19:49:59 -06:00
mux
net Networking fixes for 5.18-rc3, including fixes from wireless and 2022-04-14 11:58:19 -07:00
nfc spi: Updates for v5.18 2022-03-21 18:33:57 -07:00
ntb
nubus
nvdimm libnvdimm for 5.18 2022-03-30 10:04:11 -07:00
nvme nvme-fc: fold t fc_update_appid into fc_appid_store 2022-05-02 14:06:20 -06:00
nvmem nvmem: brcm_nvram: parse NVRAM content into NVMEM cells 2022-03-18 14:08:36 +01:00
of Char/Misc and other driver updates for 5.18-rc1 2022-03-28 12:27:35 -07:00
opp
parisc parisc: Fix CPU affinity for Lasi, WAX and Dino chips 2022-03-29 21:37:12 +02:00
parport parport_pc: Also enable driver for PCI systems 2022-03-18 14:01:41 +01:00
pci hyperv-fixes for 5.18-rc2 2022-04-07 06:35:34 -10:00
pcmcia
peci
perf perf/imx_ddr: Fix undefined behavior due to shift overflowing the constant 2022-04-08 14:17:57 +01:00
phy phy: PHY_FSL_LYNX_28G should depend on ARCH_LAYERSCAPE 2022-03-29 08:45:16 -07:00
pinctrl Pin control bulk changes for the v5.18 kernel cycle 2022-03-28 11:52:53 -07:00
platform platform/x86: amd-pmc: Fix compilation without CONFIG_SUSPEND 2022-04-04 16:26:09 +02:00
pnp PNP update for 5.18-rc1 2022-03-21 14:46:01 -07:00
power power: supply: Reset err after not finding static battery 2022-04-13 12:05:22 +02:00
powercap
pps pps: generators: pps_gen_parport: Switch to use module_parport_driver() 2022-03-18 14:01:19 +01:00
ps3
ptp ptp: ocp: handle error from nvmem_device_find 2022-03-30 12:08:11 -07:00
pwm
rapidio
ras
regulator regulator: atc260x: Fix missing active_discharge_on setting 2022-04-04 08:59:43 +01:00
remoteproc remoteproc updates for v5.18 2022-03-30 10:50:48 -07:00
reset reset: tegra-bpmp: Restore Handle errors in BPMP response 2022-04-04 11:14:13 +02:00
rpmsg rpmsg: ctrl: Introduce new RPMSG_CREATE/RELEASE_DEV_IOCTL controls 2022-03-13 11:49:53 -05:00
rtc RTC for 5.18 2022-04-01 09:37:18 -07:00
s390 block: remove QUEUE_FLAG_DISCARD 2022-04-17 19:49:59 -06:00
sbus
scsi blk-cgroup: move blkcg_{get,set}_fc_appid out of line 2022-05-02 14:06:20 -06:00
sh
siox
slimbus
soc Networking changes for 5.18. 2022-03-24 13:13:26 -07:00
soundwire Char/Misc and other driver updates for 5.18-rc1 2022-03-28 12:27:35 -07:00
spi ACPI updates for 5.18-rc2 2022-04-08 18:23:02 -10:00
spmi
ssb
staging staging: r8188eu: Fix PPPoE tag insertion on little endian systems 2022-04-04 16:35:20 +02:00
target block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD 2022-04-17 19:49:59 -06:00
tc
tee ARM driver updates for 5.18 2022-03-23 18:23:13 -07:00
thermal Merge branch 'thermal-hfi' 2022-03-18 19:00:26 +01:00
thunderbolt Char/Misc and other driver updates for 5.18-rc1 2022-03-28 12:27:35 -07:00
tty tty: serial: mpc52xx_uart: make rx/tx hooks return unsigned, part II. 2022-04-04 10:33:02 +02:00
uio
usb xen: branch for v5.18-rc1 2022-03-28 14:32:39 -07:00
vdpa virtio: fixes, cleanups 2022-04-05 10:40:52 -07:00
vfio vfio/pci: Fix vf_token mechanism when device-specific VF drivers are used 2022-04-13 11:37:44 -06:00
vhost virtio: features, fixes 2022-03-31 13:57:15 -07:00
video fbdev: Fix unregistering of framebuffers without device 2022-04-06 21:12:28 +02:00
virt Random number generator fixes for Linux 5.18-rc1. 2022-03-31 14:51:34 -07:00
virtio virtio: fixes, cleanups 2022-04-05 10:40:52 -07:00
visorbus
vlynq
vme
w1 w1: w1_therm: Add support for Maxim MAX31850 thermoelement IF. 2022-03-18 14:07:09 +01:00
watchdog linux-watchdog 5.18-rc1 tag 2022-03-31 14:14:03 -07:00
xen xen/balloon: don't use PV mode extra memory for zone device allocations 2022-04-07 15:08:37 -05:00
zorro
Kconfig
Makefile