linux/drivers
Daniel Vetter 9db529aac9 drm/i915: More surgically unbreak the modeset vs reset deadlock
There's no reason to entirely wedge the gpu, for the minimal deadlock
bugfix we only need to unbreak/decouple the atomic commit from the gpu
reset. The simplest way to fix that is by replacing the
unconditional fence wait a the top of commit_tail by a wait which
completes either when the fences are done (normal case, or when a
reset doesn't need to touch the display state). Or when the gpu reset
needs to force-unblock all pending modeset states.

The lesser source of deadlocks is when we try to pin a new framebuffer
and run into a stall. There's a bunch of places this can happen, like
eviction, changing the caching mode, acquiring a fence on older
platforms. And we can't just break the depency loop and keep going,
the only way would be to break out and restart. But the problem with
that approach is that we must stall for the reset to complete before
we grab any locks, and with the atomic infrastructure that's a bit
tricky. The only place is the ioctl code, and we don't want to insert
code into e.g. the BUSY ioctl. Hence for that problem just create a
critical section, and if any code is in there, wedge the GPU. For the
steady-state this should never be a problem.

Note that in both cases TDR itself keeps working, so from a userspace
pov this trickery isn't observable. Users themselvs might spot a short
glitch while the rendering is catching up again, but that's still
better than pre-TDR where we've thrown away all the rendering,
including innocent batches. Also, this fixes the regression TDR
introduced of making gpu resets deadlock-prone when we do need to
touch the display.

One thing I noticed is that gpu_error.flags seems to use both our own
wait-queue in gpu_error.wait_queue, and the generic wait_on_bit
facilities. Not entirely sure why this inconsistency exists, I just
picked one style.

A possible future avenue could be to insert the gpu reset in-between
ongoing modeset changes, which would avoid the momentary glitch. But
that's a lot more work to implement in the atomic commit machinery,
and given that we only need this for pre-g4x hw, of questionable
utility just for the sake of polishing gpu reset even more on those
old boxes. It might be useful for other features though.

v2: Rebase onto 4.13 with a s/wait_queue_t/struct wait_queue_entry/.

v3: Really emabarrassing fixup, I checked the wrong bit and broke the
unbreak/wakeup logic.

v4: Also handle deadlocks in pin_to_display.

v5: Review from Michel:
- Fixup the BUILD_BUG_ON
- Don't forget about the overlay

Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170808080828.23650-3-daniel.vetter@ffwll.ch
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
2017-08-14 17:03:36 +02:00
..
accessibility
acpi acpi/nfit: Fix memory corruption/Unregister mce decoder on failure 2017-07-17 11:43:58 -07:00
amba
android binder: Use wake up hint for synchronous transactions. 2017-07-17 14:44:19 +02:00
ata Merge branch 'for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2017-07-06 09:41:58 -07:00
atm atm: zatm: Fix an error handling path in 'zatm_init_one()' 2017-07-18 11:37:46 -07:00
auxdisplay
base Power management fixes for v4.13-rc2 2017-07-20 14:56:46 -07:00
bcma
block nbd: kill unused ret in recv_work 2017-07-13 08:03:30 -06:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-07-05 12:31:59 -07:00
bus main drm pull for v4.13 2017-07-09 18:48:37 -07:00
cdrom block: don't set bounce limit in blk_init_queue 2017-06-27 12:13:45 -06:00
char agp: nvidia: constify pci_device_id. 2017-08-04 16:59:50 +10:00
clk Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-07-15 10:59:54 -07:00
clocksource clocksource/drivers/timer-of: Handle of_irq_get_byname() result correctly 2017-07-17 22:43:00 +02:00
connector
cpufreq Merge branches 'intel_pstate' and 'pm-domains' 2017-07-20 18:57:15 +02:00
cpuidle powerpc updates for 4.13 2017-07-07 13:55:45 -07:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2017-07-14 22:49:50 -07:00
dax device-dax: fix sysfs duplicate warnings 2017-07-18 17:49:14 -07:00
dca
devfreq PM / devfreq: constify attribute_group structures. 2017-07-06 10:17:24 +09:00
dio
dma dmaengine updates for 4.13-rc1 2017-07-08 12:36:50 -07:00
dma-buf dma-buf/sw_sync: clean up list before signaling the fence 2017-07-31 14:11:15 -03:00
edac EDAC, pnd2: Fix Apollo Lake DIMM detection 2017-06-29 10:37:50 +02:00
eisa
extcon
firewire networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
firmware efi: avoid fortify checks in EFI stub 2017-07-12 16:26:02 -07:00
fmc
fpga
fsi drivers/fsi: fix fsi_slave_mode prototype 2017-07-17 16:13:54 +02:00
gpio Merge (most of) tag 'mfd-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd 2017-07-07 13:30:05 -07:00
gpu drm/i915: More surgically unbreak the modeset vs reset deadlock 2017-08-14 17:03:36 +02:00
hid HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value 2017-07-20 15:45:39 +02:00
hsi HSI changes for the v4.13 series 2017-07-04 14:28:22 -07:00
hv vmbus: re-enable channel tasklet 2017-07-17 15:00:47 +02:00
hwmon hwmon: (applesmc) Avoid buffer overruns 2017-07-15 16:38:56 -07:00
hwspinlock
hwtracing Char/Misc patches for 4.13-rc1 2017-07-03 20:55:59 -07:00
i2c Merge branch 'i2c/for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2017-07-12 10:04:56 -07:00
ide ide: avoid warning for timings calculation 2017-07-21 04:37:22 +01:00
idle intel_idle: Use more common logging style 2017-06-29 22:58:35 +02:00
iio hwmon updates for v4.13: 2017-07-04 11:48:27 -07:00
infiniband RDMA/core: Initialize port_num in qp_attr 2017-07-20 11:24:13 -04:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2017-07-14 22:53:37 -07:00
iommu IOMMU Updates for Linux v4.13 2017-07-12 10:00:04 -07:00
ipack
irqchip irqchip/digicolor: Drop unnecessary static 2017-07-18 21:59:23 +02:00
isdn isdn: avm: c4: constify pci_device_id. 2017-07-15 21:25:56 -07:00
leds LED updates for 4.13 2017-07-06 11:32:40 -07:00
lguest
lightnvm lightnvm: pblk: remove unnecessary checks 2017-07-07 13:17:36 -06:00
macintosh Merge branch 'work.misc-set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-05 13:13:32 -07:00
mailbox Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration 2017-07-07 10:24:07 -07:00
mcb
md Merge tag 'md/4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-07-18 11:51:08 -07:00
media drm: rcar-du: Repair vblank for DRM page flips using the VSP 2017-08-03 16:17:30 +03:00
memory ARM: SoC driver updates 2017-07-04 14:47:47 -07:00
memstick
message
mfd chrome-platform-for-linus-4.13 2017-07-11 09:55:47 -07:00
misc powerpc updates for 4.13 2017-07-07 13:55:45 -07:00
mmc MMC core: 2017-07-14 13:10:06 -07:00
mtd MTD updates for v4.13-rc1: 2017-07-13 12:07:44 -07:00
mux mux: mux-core: unregister mux_class in mux_exit() 2017-07-17 16:38:35 +02:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-07-20 16:33:39 -07:00
nfc NFC 4.13 pull request 2017-07-01 14:30:39 -07:00
ntb ntb: Add error path/handling to Debug FS entry creation 2017-07-06 11:30:08 -04:00
nubus
nvdimm libnvdimm: fix badblock range handling of ARS range 2017-07-17 11:43:58 -07:00
nvme nvmet: don't report 0-bytes in serial number 2017-07-20 08:41:56 -06:00
nvmem nvmem: rockchip-efuse: amend compatible rk322x-efuse to rk3228-efuse 2017-07-17 16:15:57 +02:00
of Device properties framework updates for v4.13-rc1 2017-07-10 15:23:45 -07:00
oprofile
parisc parisc: ->mapping_error 2017-07-05 21:46:42 +02:00
parport
pci Power management fixes for v4.13-rc1 2017-07-14 22:24:25 -07:00
pcmcia
perf Merge branch 'aarch64/for-next/ras-apei' into aarch64/for-next/core 2017-06-26 10:54:27 +01:00
phy
pinctrl This is the big bulk of pin control changes for the v4.13 series: 2017-07-06 11:38:59 -07:00
platform Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-07-15 10:59:54 -07:00
pnp This is the bulk of GPIO changes for the v4.13 series: 2017-07-07 12:40:27 -07:00
power power supply and reset changes for the v4.13 series (part 2) 2017-07-13 11:47:59 -07:00
powercap powercap/RAPL: prevent overridding bits outside of the mask 2017-06-28 00:38:34 +02:00
pps
ps3
ptp ptp: dte: Use LL suffix for 64-bit constants 2017-07-06 11:40:58 +01:00
pwm pwm: Changes for v4.13-rc1 2017-07-13 11:49:52 -07:00
rapidio
ras arm64 updates for 4.13: 2017-07-05 17:09:27 -07:00
regulator Merge remote-tracking branches 'regulator/topic/settle', 'regulator/topic/tps65910' and 'regulator/topic/tps65917' into regulator-next 2017-07-03 16:52:21 +01:00
remoteproc remoteproc/keystone: Fix circular dependencies for ARM configs 2017-06-27 16:21:34 -07:00
reset ARM: SoC driver updates 2017-07-04 14:47:47 -07:00
rpmsg rpmsg updates for v4.13 2017-07-06 15:38:31 -07:00
rtc RTC for 4.13 2017-07-13 12:15:06 -07:00
s390 drivers: s390: move static and inline before return type 2017-07-12 16:26:04 -07:00
sbus block: don't set bounce limit in blk_init_queue 2017-06-27 12:13:45 -06:00
scsi SCSI fixes on 20170715 2017-07-17 12:26:12 -07:00
sfi
sh drivers/sh/intc/virq.c: delete an error message for a failed memory allocation in add_virq_to_pirq() 2017-07-06 16:24:30 -07:00
sn
soc ARM: SoC driver updates 2017-07-04 14:47:47 -07:00
spi Merge branch 'for-spi' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-08 10:41:53 -07:00
spmi spmi: pmic-arb: Always allocate ppid_to_apid table 2017-07-17 15:00:47 +02:00
ssb
staging staging: vboxvideo: remove dead gamma lut code 2017-08-07 11:20:36 +02:00
target Add wait_for_random_bytes() and get_random_*_wait() functions so that 2017-07-15 12:44:02 -07:00
tc
tee
thermal Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2017-07-14 13:12:32 -07:00
thunderbolt thunderbolt: Correct access permissions for active NVM contents 2017-07-17 15:55:08 +02:00
tty tty: hide unused pty_get_peer function 2017-07-17 17:04:41 +02:00
uio
usb xhci: fix memleak in xhci_run() 2017-07-20 14:40:36 +02:00
uwb driver core patches for 4.13-rc1 2017-07-03 20:27:48 -07:00
vfio VFIO updates for v4.13-rc1 2017-07-13 12:23:54 -07:00
vhost Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2017-07-13 14:27:32 -07:00
video Merge branch 'akpm' (patches from Andrew) 2017-07-13 12:38:49 -07:00
virt
virtio virtio_balloon: disable VIOMMU support 2017-06-18 23:13:35 +03:00
vlynq
vme
w1 w1: omap-hdq: fix error return code in omap_hdq_probe() 2017-07-17 16:48:15 +02:00
watchdog Merge git://www.linux-watchdog.org/linux-watchdog 2017-07-11 09:59:37 -07:00
xen xen/balloon: don't online new memory initially 2017-07-23 08:13:18 +02:00
zorro
Kconfig
Makefile