mainlining shenanigans
Go to file
Lyude Paul 9a64c65083 drm/i915: Add short HPD IRQ storm detection for non-MST systems
Unfortunately, it seems that the HPD IRQ storm problem from the early
days of Intel GPUs was never entirely solved, only mostly. Within the
last couple of days, I got a bug report from one of our customers who
had been having issues with their machine suddenly booting up very
slowly after having updated. The amount of time it took to boot went
from around 30 seconds, to over 6 minutes consistently.

After some investigation, I discovered that i915 was reporting massive
amounts of short HPD IRQ spam on this system from the DisplayPort port,
despite there not being anything actually connected. The symptoms would
start with one "long" HPD IRQ being detected at boot:

[    1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00440000, dig 0x00440000, pins 0x000000a0
[    1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long
[    1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 5 - cnt: 0
[    1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long
[    1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 7 - cnt: 0
[    1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long
[    1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long
…

followed by constant short IRQs afterwards:

[    1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status updated from unknown to disconnected
[    1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) received hotplug event.
[    1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3]
[    1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x71450085
[    1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
[    1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
[    1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080

The customer's system in question has a GM45 GPU, which is apparently
well known for hotplugging storms.

So, workaround this impressively broken hardware by changing the default
HPD storm threshold from 5 to 50. Then, make long IRQs count for 10, and
short IRQs count for 1. This makes it so that 5 long IRQs will trigger
an HPD storm, and on systems with short HPD storm detection 50 short
IRQs will trigger an HPD storm. 50 short IRQs amounts to 100ms of
constant pulsing, which seems like a good middleground between being too
sensitive and not being sensitive enough (which would cause visible
stutters in userspace every time a storm occurs).

And just to be extra safe: we don't enable this by default on systems
with MST support. There's too high of a chance of MST support triggering
storm detection, and systems that are new enough to support MST are a
lot less likely to have issues with IRQ storms anyway.

As a note: this patch was tested using a ThinkPad T450s and a Chamelium
to simulate the short IRQ storms.

Changes since v1:
- Don't use two separate thresholds, just make long IRQs count for 10
  each and short IRQs count for 1. This simplifies the code a bit
  - Ville Syrjälä
Changes since v2:
- Document @long_hpd in intel_hpd_irq_storm_detect, no functional
  changes
Changes since v4:
- Remove !! in long_hpd assignment - Ville Syrjälä
- queue_hp = true - Ville Syrjälä

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181106213017.14563-6-lyude@redhat.com
2018-11-07 15:12:30 -05:00
arch Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-09-29 14:34:06 -07:00
block blk-mq: I/O and timer unplugs are inverted in blktrace 2018-09-27 13:12:44 -06:00
certs export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR() 2018-08-22 23:21:44 +09:00
crypto DMAengine updates for v4.19-rc1 2018-08-18 15:55:59 -07:00
Documentation mm, drm/i915: mark pinned shmemfs pages as unevictable 2018-11-07 15:28:32 +00:00
drivers drm/i915: Add short HPD IRQ storm detection for non-MST systems 2018-11-07 15:12:30 -05:00
firmware
fs filesystem-dax for 4.19-rc6 2018-09-30 06:19:38 -07:00
include mm, drm/i915: mark pinned shmemfs pages as unevictable 2018-11-07 15:28:32 +00:00
init Kbuild updates for v4.19 (2nd) 2018-08-25 13:40:38 -07:00
ipc ipc/shm: properly return EIDRM in shm_lock() 2018-09-04 16:45:02 -07:00
kernel Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-09-29 11:32:03 -07:00
lib lib/Kconfig.debug: fix three typos in help text 2018-09-04 16:45:02 -07:00
LICENSES LICENSES: Add Linux-OpenIB license text 2018-04-27 16:41:53 -06:00
mm mm, drm/i915: mark pinned shmemfs pages as unevictable 2018-11-07 15:28:32 +00:00
net ip_tunnel: be careful when accessing the inner header 2018-09-24 12:27:04 -07:00
samples samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM 2018-08-16 21:55:32 +02:00
scripts linux-kselftest-4.19-rc5 2018-09-17 07:24:28 +02:00
security Revert "uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name" 2018-09-25 13:28:58 +02:00
sound ALSA: x86: Rip out the lpe audio runtime suspend/resume hooks 2018-11-02 18:25:33 +02:00
tools This is the 4.19-rc6 release 2018-10-04 11:03:34 +10:00
usr initramfs: move gen_initramfs_list.sh from scripts/ to usr/ 2018-08-22 23:21:44 +09:00
virt KVM: Remove obsolete kvm_unmap_hva notifier backend 2018-09-07 15:06:02 +02:00
.clang-format clang-format: Set IndentWrappedFunctionNames false 2018-08-01 18:38:51 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore
.mailmap libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
COPYING
CREDITS 9p: remove Ron Minnich from MAINTAINERS 2018-08-17 16:20:26 -07:00
Kbuild
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS - Fix build failure without CONFIG_DRM_FBDEV_EMULATION (Arnd) 2018-10-11 14:51:57 +10:00
Makefile Linux 4.19-rc6 2018-09-30 07:15:35 -07:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.