linux/drivers
Daniel Vetter 17e1df07df drm/i915: fix wait_for_pending_flips vs gpu hang deadlock
My g33 here seems to be shockingly good at hitting them all. This time
around kms_flip/flip-vs-panning-vs-hang blows up:

intel_crtc_wait_for_pending_flips correctly checks for gpu hangs and
if a gpu hang is pending aborts the wait for outstanding flips so that
the setcrtc call will succeed and release the crtc mutex. And the gpu
hang handler needs that lock in intel_display_handle_reset to be able
to complete outstanding flips.

The problem is that we can race in two ways:
- Waiters on the dev_priv->pending_flip_queue aren't woken up after
  we've marked the reset as pending, but before we actually start the
  reset work. This means that the waiter doesn't notice the pending
  reset and hence will keep on hogging the locks.

  Like with dev->struct_mutex and the ring->irq_queue wait queues we
  therefore need to wake up everyone that potentially holds a lock
  which the reset handler needs.

- intel_display_handle_reset was called _after_ we've already
  signalled the completion of the reset work. Which means a waiter
  could sneak in, grab the lock and never release it (since the
  pageflips won't ever get released).

  Similar to resetting the gem state all the reset work must complete
  before we update the reset counter. Contrary to the gem reset we
  don't need to have a second explicit wake up call since that will
  have happened already when completing the pageflips. We also don't
  have any issues that the completion happens while the reset state is
  still pending - wait_for_pending_flips is only there to ensure we
  display the right frame. After a gpu hang & reset event such
  guarantees are out the window anyway. This is in contrast to the gem
  code where too-early wake-up would result in unnecessary restarting
  of ioctls.
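
The two ordering rules above can be sketched as a small single-threaded
model. This is an illustration only, not the i915 code: struct gpu_model
and every function name below are hypothetical, and the real driver uses
wait queues, mutexes and an atomic reset counter instead of plain fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real i915 structs. */
struct gpu_model {
    bool reset_pending;       /* a hang was detected, reset not finished */
    unsigned int reset_count; /* bumped only once all reset work is done */
    int pending_flips;        /* flips that cannot complete while hung */
    int flip_queue_wakeups;   /* wake-ups delivered to flip waiters */
};

/* Condition a flip waiter re-checks every time it is woken. */
static bool flip_wait_may_stop(const struct gpu_model *g)
{
    return g->reset_pending || g->pending_flips == 0;
}

/* Wake everyone who may hold a lock that the reset handler needs. */
static void error_wake_up(struct gpu_model *g)
{
    g->flip_queue_wakeups++;
}

/* Rule 1: flag the reset, then wake waiters so they notice the flag. */
static void mark_reset_pending(struct gpu_model *g)
{
    g->reset_pending = true;
    error_wake_up(g);
}

/* Rule 2: finish all reset work before signalling completion. */
static void finish_reset(struct gpu_model *g)
{
    g->pending_flips = 0;    /* display reset completes the flips ... */
    g->flip_queue_wakeups++; /* ... which already wakes the waiters */
    g->reset_count++;        /* only now publish that the reset is done */
    g->reset_pending = false;
}
```

Calling mark_reset_pending() and then finish_reset() on a model with
outstanding flips leaves no point at which a waiter is both asleep and
unable to see either the pending reset or the completed flips.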

Also, since we've gotten these various deadlocks and ordering
constraints wrong so often, throw copious amounts of comments at the
code.

This deadlock regression has been introduced in the commit which added
the pageflip reset logic to the gpu hang work:

commit 96a02917a0
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Mon Feb 18 19:08:49 2013 +0200

    drm/i915: Finish page flips and update primary planes after a GPU reset

v2:
- Add comments to explain how the wake_up serves as memory barriers
  for the atomic_t reset counter.
- Improve the comments a bit as suggested by Chris Wilson.
- Extract the wake_up calls before/after the reset into a little
  i915_error_wake_up and unconditionally wake up the
  pending_flip_queue waiters, again as suggested by Chris Wilson.

v3: Throw copious amounts of comments at i915_error_wake_up as
suggested by Chris Wilson.

Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-09 11:26:03 +02:00
..
accessibility printk: move braille console support into separate braille.[ch] files 2013-07-31 14:41:03 -07:00
acpi Revert "ACPI / video: Always call acpi_video_init_brightness() on init" 2013-08-22 23:39:02 +02:00
amba
ata sata_fsl: save irqs while coalescing 2013-08-20 08:38:23 -04:00
atm
auxdisplay
base regmap: cache: Make sure to sync the last register in a block 2013-08-05 15:51:09 +01:00
bcma bcma: add support for BCM43142 2013-06-27 13:42:16 -04:00
block aoe: adjust ref of head for compound page tails 2013-08-13 17:57:48 -07:00
bluetooth Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2013-07-31 15:11:50 -04:00
bus ARM SoC device tree changes 2013-07-02 14:23:01 -07:00
cdrom drivers/cdrom/cdrom.c: use kzalloc() for failing hardware 2013-07-03 16:07:25 -07:00
char More virtio console fixes than I'm happy with, but all real issues, 2013-08-08 09:32:20 -07:00
clk clk: exynos4: Add CLK_GET_RATE_NOCACHE flag for the Exynos4x12 ISP clocks 2013-08-13 10:01:56 -07:00
clocksource clocksource+irqchip: delete __cpuinit usage from all related files 2013-07-14 19:36:57 -04:00
connector
cpufreq cpufreq: rename ignore_nice as ignore_nice_load 2013-08-07 22:25:06 +02:00
cpuidle Revert "cpuidle: Quickly notice prediction failure for repeat mode" 2013-07-29 13:32:29 +02:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2013-07-24 11:05:18 -07:00
dca
devfreq Merge branch 'akpm' (updates from Andrew Morton) 2013-07-03 17:12:13 -07:00
dio
dma ARM: SoC fixes for v3.11-rc 2013-08-08 09:28:08 -07:00
edac EDAC: Fix lockdep splat 2013-07-23 16:01:28 -07:00
eisa
extcon drivers: avoid format string in dev_set_name 2013-07-03 16:07:41 -07:00
firewire firewire: fix libdc1394/FlyCap2 iso event regression 2013-07-27 20:24:36 +02:00
firmware dmi_scan: add comments on dmi_present() and the loop in dmi_scan_machine() 2013-07-31 14:41:02 -07:00
fmc FMC: fix error handling in probe() function 2013-06-24 16:23:25 -07:00
gpio gpio_msm: Fix build error due to missing err.h 2013-07-31 00:34:31 +02:00
gpu drm/i915: fix wait_for_pending_flips vs gpu hang deadlock 2013-09-09 11:26:03 +02:00
hid Revert "HID: hid-logitech-dj: querying_devices was never set" 2013-08-09 11:34:19 +02:00
hsi drivers: avoid format string in dev_set_name 2013-07-03 16:07:41 -07:00
hv Drivers: hv: balloon: Do not post pressure status if interrupted 2013-07-16 23:19:19 -07:00
hwmon hwmon: (adt7470) Fix incorrect return code check 2013-08-08 12:43:07 -07:00
hwspinlock
i2c i2c: Fix Kontron PLD prescaler calculation 2013-08-05 10:31:18 +02:00
ide Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide 2013-07-10 18:15:41 -07:00
idle
iio iio: adjd_s311: Fix non-scan mode data read 2013-08-19 19:30:21 +01:00
infiniband Merge branches 'cma', 'cxgb3', 'cxgb4', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma' and 'qib' into for-next 2013-07-31 14:24:06 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2013-07-13 18:05:13 -07:00
iommu IOMMU Updates for Linux 3.11 2013-07-10 14:46:40 -07:00
ipack
irqchip clocksource+irqchip: delete __cpuinit usage from all related files 2013-07-14 19:36:57 -04:00
isdn Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-07-09 18:24:39 -07:00
leds leds: mc13783: Fix "uninitialized variable" warning 2013-07-02 08:44:02 -07:00
lguest Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-07-04 11:40:58 -07:00
macintosh powerpc/windfarm: Fix noisy slots-fan on Xserve (rm31) 2013-08-01 13:11:47 +10:00
mailbox
md dm cache: avoid conflicting remove_mapping() in mq policy 2013-08-16 15:56:51 -04:00
media Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2013-08-09 15:04:09 -07:00
memory
memstick drivers/memstick/host/r592.c: convert to module_pci_driver 2013-07-03 16:08:06 -07:00
message drivers: avoid format strings in names passed to alloc_workqueue() 2013-07-03 16:07:41 -07:00
mfd For the 3.11 merge we only have one new MFD driver for the Kontron PLD. 2013-07-10 11:10:27 -07:00
misc Char/Misc patches for 3.11-rc3 2013-07-26 11:36:12 -07:00
mmc ARM: pxa: propagate errors from regulator_enable() to pxamci 2013-07-23 12:15:15 -07:00
mtd A couple of fixes and clean-ups, allow for assigning user-defined 2013-07-05 12:09:48 -07:00
net be2net: fix disabling TX in be_close() 2013-08-22 19:58:23 -07:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-07-09 18:24:39 -07:00
ntb
nubus
of of: fdt: fix memory initialization for expanded DT 2013-08-21 20:05:49 -05:00
oprofile drivers: delete __cpuinit usage from all remaining drivers files 2013-07-14 19:36:59 -04:00
parisc parisc: Fix interrupt routing for C8000 serial ports 2013-07-31 23:42:32 +02:00
parport Merge branch 'akpm' (updates from Andrew Morton) 2013-07-03 17:12:13 -07:00
pci ACPI: Try harder to resolve _ADR collisions for bridges 2013-08-07 22:55:00 +02:00
pcmcia Driver core patches for 3.11-rc1 2013-07-02 11:44:19 -07:00
pinctrl pinctrl: sunxi: Add spinlocks 2013-08-07 21:57:17 +02:00
platform Merge branch 'akpm' (patches from Andrew Morton) 2013-08-23 09:52:32 -07:00
pnp PNP / ACPI: avoid garbage in resource name 2013-07-18 01:38:59 +02:00
power Nothing exciting this time, just assorted fixes and cleanups. 2013-07-10 11:13:00 -07:00
pps pps-gpio: add device-tree binding and support 2013-07-03 16:08:06 -07:00
ps3
ptp build some drivers only when compile-testing 2013-06-24 16:41:32 -07:00
pwm pwm: pwm-tiehrpwm: Use clk_enable/disable instead clk_prepare/unprepare. 2013-06-26 23:23:54 +02:00
rapidio rapidio: fix use after free in rio_unregister_scan() 2013-07-31 14:41:02 -07:00
regulator For the 3.11 merge we only have one new MFD driver for the Kontron PLD. 2013-07-10 11:10:27 -07:00
remoteproc Trivial remoteproc fixes by Suman Anna, Wei Yongjun and Thomas Meyer. 2013-07-11 12:35:09 -07:00
reset
rpmsg
rtc drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit 2013-08-13 17:57:48 -07:00
s390 [SCSI] zfcp: remove access control tables interface (keep sysfs files) 2013-08-22 09:26:51 -07:00
sbus
scsi [SCSI] lpfc: Don't force CONFIG_GENERIC_CSUM on 2013-08-21 10:54:20 -07:00
sfi
sh Merge branch 'pm-assorted' 2013-06-28 13:01:40 +02:00
sn
spi spi: spi-davinci: Fix direction in dma_map_single() 2013-07-29 20:27:54 +01:00
ssb Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2013-07-13 14:52:21 -07:00
staging Merge branch 'drm-next-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-next 2013-09-02 09:31:40 +10:00
target Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2013-07-11 12:57:19 -07:00
tc
thermal Thermal: Fix lockup of cpu_down() 2013-07-22 09:34:46 +08:00
tty parisc: Fix interrupt routing for C8000 serial ports 2013-07-31 23:42:32 +02:00
uio uio: use vma_pages() to replace (vm_end - vm_start) >> PAGE_SHIFT 2013-07-03 16:07:26 -07:00
usb usb: phy: fix build breakage 2013-08-23 10:41:46 -07:00
uwb drivers: avoid format string in dev_set_name 2013-07-03 16:07:41 -07:00
vfio vfio-pci: Avoid deadlock on remove 2013-07-24 16:36:41 -06:00
vhost vhost: more fixes for 3.11 2013-07-23 14:38:20 -07:00
video Merge branch 'drm-next-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-next 2013-09-02 09:31:40 +10:00
virt
virtio No real surprises. 2013-07-10 14:50:58 -07:00
vlynq
vme vme: vme_tsi148.c: fix error return code in tsi148_probe() 2013-06-24 16:23:25 -07:00
w1 drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode 2013-07-03 16:08:06 -07:00
watchdog Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2013-07-13 14:52:21 -07:00
xen Bug-fixes: 2013-08-21 16:38:33 -07:00
zorro zorro: switch to fixed_size_llseek() 2013-06-29 12:57:28 +04:00
Kconfig For the 3.11 merge we only have one new MFD driver for the Kontron PLD. 2013-07-10 11:10:27 -07:00
Makefile For the 3.11 merge we only have one new MFD driver for the Kontron PLD. 2013-07-10 11:10:27 -07:00