linux/drivers
Chris Wilson 688e6c7258 drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.

Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.

To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:58:43 +01:00
..
accessibility
acpi Merge branches 'acpica-fixes', 'acpi-video' and 'acpi-processor' 2016-06-03 22:35:05 +02:00
amba ARM: 8566/1: drivers: amba: properly handle devices with power domains 2016-05-05 19:00:40 +01:00
android
ata remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
atm atm: iphase: off by one in rx_pkt() 2016-05-31 11:52:59 -07:00
auxdisplay
base More power management updates for v4.7-rc1 2016-05-25 15:29:21 -07:00
bcma MTD updates for v4.7: 2016-05-24 11:00:20 -07:00
block DAX error handling for 4.7 2016-05-26 19:34:26 -07:00
bluetooth Bluetooth: Add USB ID 13D3:3487 to ath3k 2016-05-13 16:54:59 +02:00
bus Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2016-05-19 10:02:26 -07:00
cdrom
char drm/i915: Add support for mapping an object page by page 2016-06-13 10:03:54 +01:00
clk remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
clocksource Small release overall. 2016-05-19 11:27:09 -07:00
connector
cpufreq Merge branch 'pm-cpufreq-fixes' 2016-06-03 22:34:18 +02:00
cpuidle cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter() 2016-05-18 02:48:37 +02:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2016-05-30 15:20:18 -07:00
dax /dev/dax, core: file operations and dax-mmap 2016-05-20 22:02:55 -07:00
dca
devfreq PM / devfreq: style/typo fixes 2016-05-03 11:22:10 +09:00
dio
dma remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
dma-buf dma-buf: remove dma_buf_debugfs_create_file() 2016-06-20 22:26:37 +05:30
edac Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2016-05-17 17:05:30 -07:00
eisa
extcon
firewire treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
firmware driver core update for 4.7-rc1 2016-05-20 21:26:15 -07:00
fmc
fpga
gpio gpio: drop lock before reading GPIO direction 2016-05-30 17:11:59 +02:00
gpu drm/i915: Slaughter the thundering i915_wait_request herd 2016-07-01 20:58:43 +01:00
hid Merge branch 'for-4.7/wacom' into for-linus 2016-05-17 12:42:27 +02:00
hsi HSI: omap-ssi: move omap_ssi_port_update_fclk 2016-05-09 22:45:18 +02:00
hv
hwmon Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging 2016-05-26 09:48:23 -07:00
hwspinlock drivers/hwspinlock: use correct radix tree API 2016-05-20 17:58:30 -07:00
hwtracing Char / Misc driver update for 4.7-rc1 2016-05-20 21:20:31 -07:00
i2c i2c: dev: use after free in detach 2016-05-28 17:37:42 +02:00
ide
idle
iio Staging and IIO driver update for 4.7-rc1 2016-05-20 22:20:48 -07:00
infiniband Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2016-05-28 12:04:17 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2016-05-27 19:14:35 -07:00
iommu remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
ipack
irqchip irqchip/irq-pic32-evic: Fix bug with external interrupts. 2016-06-02 18:03:50 +01:00
isdn TTY and Serial driver update for 4.7-rc1 2016-05-20 20:57:27 -07:00
leds pwm: Changes for v4.7-rc1 2016-05-25 10:40:15 -07:00
lguest
lightnvm lightnvm: reserved space calculation incorrect 2016-05-06 12:51:10 -06:00
macintosh
mailbox mailbox: Fix devm_ioremap_resource error detection code 2016-05-08 22:44:46 +05:30
mcb mcb: Delete num_cells variable which is not required 2016-05-03 15:52:28 -07:00
md Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2016-05-27 14:28:09 -07:00
media [media] omap_vout: Switch to use the video/omapfb_dss.h header file 2016-06-03 16:06:39 +03:00
memory MTD updates for v4.7: 2016-05-24 11:00:20 -07:00
memstick drivers/memstick/core/mspro_block: use kmemdup 2016-05-23 17:04:14 -07:00
message SCSI misc on 20160517 2016-05-18 16:38:59 -07:00
mfd remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
misc Char / Misc driver update for 4.7-rc1 2016-05-20 21:20:31 -07:00
mmc mmc: sunxi: Re-enable eMMC HS-DDR modes on Allwinner A80 2016-06-02 10:40:20 +02:00
mtd This pull request contains mostly cleanups and minor 2016-05-27 18:49:29 -07:00
net Linux 4.7-rc2 2016-06-09 11:01:49 +10:00
nfc NFC: pn533: handle interrupted commands in pn533_recv_frame 2016-05-10 00:01:47 +02:00
ntb
nubus
nvdimm DAX error handling for 4.7 2016-05-26 19:34:26 -07:00
nvme nvme/host: Add missing blk_integrity tag_size + flags assignments 2016-05-17 17:14:21 -06:00
nvmem remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
of MTD updates for v4.7: 2016-05-24 11:00:20 -07:00
oprofile
parisc
parport
pci Char / Misc driver update for 4.7-rc1 2016-05-20 21:20:31 -07:00
pcmcia
perf drivers/perf: arm_pmu: Avoid leaking pmu->irq_affinity on error 2016-06-03 10:16:21 +01:00
phy USB patches for 4.7-rc1 2016-05-20 21:12:25 -07:00
pinctrl pinctrl: mediatek: fix dual-edge code defect 2016-05-31 10:13:45 +02:00
platform platform/chrome: Driver and binding changes for 4.7 2016-05-28 12:32:01 -07:00
pnp driver core update for 4.7-rc1 2016-05-20 21:26:15 -07:00
power power supply and reset changes for the v4.7 series 2016-05-20 14:06:21 -07:00
powercap Power management material for v4.7-rc1 2016-05-16 19:17:22 -07:00
pps
ps3
ptp ptp: oops in ptp_ioctl() 2016-05-29 22:32:27 -07:00
pwm pwm: Changes for v4.7-rc1 2016-05-25 10:40:15 -07:00
rapidio rapidio/mport_cdev: fix uapi type definitions 2016-05-05 17:38:53 -07:00
ras
regulator regulator: pwm: Use pwm_get_args() where appropriate 2016-05-17 14:45:02 +02:00
remoteproc remoteproc: Add additional crash reasons 2016-05-12 15:50:19 -07:00
reset
rpmsg rpmsg: add THIS_MODULE to rpmsg_driver in rpmsg core 2016-05-06 11:08:58 -07:00
rtc rtc: tps6586x: rename so module can be autoloaded 2016-05-21 17:07:17 +02:00
s390 DAX error handling for 4.7 2016-05-26 19:34:26 -07:00
sbus openprom: fix warning 2016-05-20 18:33:37 -07:00
scsi SCSI fixes on 20160529 2016-05-29 13:28:39 -07:00
sfi
sh
sn
soc soc: mtk-pmic-wrap: avoid integer overflow warning 2016-05-19 15:20:24 +02:00
spi sound updates #2 for 4.7-rc1 2016-05-28 12:23:12 -07:00
spmi
ssb
staging dma-buf/fence: make fence context 64 bit v2 2016-06-02 08:27:41 +02:00
target Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2016-05-28 12:04:17 -07:00
tc
thermal Merge branches 'acpica-fixes', 'acpi-video' and 'acpi-processor' 2016-06-03 22:35:05 +02:00
thunderbolt thunderbolt: Fix double free of drom buffer 2016-05-02 12:09:22 -05:00
tty devpts: Make each mount of devpts an independent filesystem. 2016-06-05 10:36:01 -07:00
uio
usb Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2016-05-28 12:04:17 -07:00
uwb
vfio vfio/pci: Allow VPD short read 2016-05-31 21:25:52 -06:00
vhost target: make close_session optional 2016-05-10 01:19:26 -07:00
video Merge omapdss header refactoring 2016-06-07 12:42:58 +03:00
virt
virtio virtio_balloon: fix PFN format for virtio-1 2016-05-22 19:44:13 +03:00
vlynq
vme
w1
watchdog Merge git://www.linux-watchdog.org/linux-watchdog 2016-05-25 10:19:17 -07:00
xen Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2016-05-28 12:04:17 -07:00
zorro
Kconfig libnvdimm for 4.7 2016-05-23 11:18:01 -07:00
Makefile /dev/dax, pmem: direct access to persistent memory 2016-05-20 22:02:53 -07:00