of_fdt_unflatten_tree() already sets the flag on this node to
OF_DETACHED, because of_fdt_unflatten_tree() calls
__unflatten_device_tree() with the detached bool set to true.
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Signed-off-by: Jyri Sarha <jsarha@ti.com>
We need the total frame refresh time to check if we are too close to
vertical sync when updating the two framebuffer DMA registers and risk
a collision. This new method is more accurate that the previous that
based on mode's vrefresh value, which itself is inaccurate or may not
even be initialized.
Reported-by: Kevin Hao <kexin.hao@windriver.com>
Fixes: 11abbc9f39 ("drm/tilcdc: Set framebuffer DMA address to HW only if CRTC is enabled")
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Jyri Sarha <jsarha@ti.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
omapdrm changes for 4.15
* OMAP4 HDMI CEC support
* tag 'omapdrm-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
omapdrm: omapdss_hdmi_ops: add lost_hotplug op
omapdrm: hdmi4: hook up the HDMI CEC support
omapdrm: hdmi4_cec: add OMAP4 HDMI CEC support
omapdrm: hdmi4: refcount hdmi_power_on/off_core
omapdrm: hdmi4: move hdmi4_core_powerdown_disable to hdmi_power_on_core()
omapdrm: hdmi4: prepare irq handling for HDMI CEC support
omapdrm: hdmi4: make low-level functions available
omapdrm: hdmi.h: extend hdmi_core_data with CEC fields
omapdrm: encoder-tpd12s015: keep ls_oe_gpio high
drm/imx: i.MX5 regression fix and i.MX6QP PRE/PRG stability fixes
- Disable channel burst locking on IPUv3EX (i.MX51) and IPUv3M (i.MX53).
This fixes a regression introduced by commit 790cb4c7c9 ("drm/imx: lock
scanout transfers for consecutive bursts").
- Give PRG a head start. Waiting for both double buffers to fill up before
enabling the IPU improves startup reliability.
- Avoid PRE control register updates during unsafe window, workaround for
ERR009624.
* tag 'imx-drm-fixes-2017-10-12' of git://git.pengutronix.de/git/pza/linux:
gpu: ipu-v3: pre: implement workaround for ERR009624
gpu: ipu-v3: prg: wait for double buffers to be filled on channel startup
gpu: ipu-v3: Allow channel burst locking on i.MX6 only
More 4.15 drm-misc stuff:
Cross-subsystem Changes:
- bridge cleanup refactor (Benjamin Gaignard)
Core Changes:
- less surprising atomic iterators (Maarten), fixes an oops introduced
in drm-next
- better gem/fb helper docs (Noralf)
- fix dma-buf rcu races (Christian König)
Driver Changes:
- adv7511: CEC support (Hans Verkuil)
- sun4i update from Chen-Yu to improve hdmi and A31 support
- sii8620: add remote control support (Maceiej Purski)
New drivers:
- SiI9234 bridge driver (Maciej Purski)
- 7" rpi touch panel (Eric Anholt)
Note that this contains a topic pull from regmap, needed by the sun4i
changes. Mark Brown sent that out for pulling into drm-misc.
* tag 'drm-misc-next-2017-10-12' of git://anongit.freedesktop.org/drm/drm-misc: (29 commits)
drm/dp: WARN about invalid/unknown link rates and bw codes
drm/msm/mdp5: remove less than 0 comparison for unsigned value
drm/bridge/sii8620: add remote control support
drm/sun4i: hdmi: Add support for A31's HDMI controller
drm/sun4i: hdmi: Add A31 specific DDC register definitions
drm/sun4i: hdmi: Add support for controller hardware variants
dt-bindings: display: sun4i: Add binding for A31 HDMI controller
drm/sun4i: hdmi: Allow using second PLL as TMDS clk parent
drm/sun4i: hdmi: create a regmap for later use
drm/sun4i: hdmi: Disable clks in bind function error path and unbind function
drm/sun4i: tcon: Add support for demuxing TCON output on A31
drm/sun4i: tcon: Add variant callback for TCON output muxing
drm/bridge/synopsys: dsi :remove is_panel_bridge
drm/vc4: remove bridge from driver internal structure
drm/stm: ltdc: remove bridge from driver internal structure
drm/drm_of: add drm_of_panel_bridge_remove function
drm/bridge: make drm_panel_bridge_remove more robust
dma-fence: fix dma_fence_get_rcu_safe v2
dma-buf: make reservation_object_copy_fences rcu save
drm/atomic: Unref duplicated drm_atomic_state in drm_atomic_helper_resume()
...
The new driver fails to build when CONFIG_PINCTRL is disabled:
drivers/gpu/drm/rockchip/rockchip_lvds.c: In function 'rockchip_lvds_grf_config':
drivers/gpu/drm/rockchip/rockchip_lvds.c:229:39: error: dereferencing pointer to incomplete type 'struct dev_pin_info'
if (lvds->pins && !IS_ERR(lvds->pins->default_state))
This adds the respective Kconfig dependency.
Fixes: 34cc0aa254 ("drm/rockchip: Add support for Rockchip Soc LVDS")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mark Yao <mark.yao@rock-chips.com>
Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171005120957.485433-1-arnd@arndb.de
There is a risk of overflowing vblank timestamps in 2038 or 2106 if
someone sets the drm_timestamp_monotonic module parameter to zero.
I found no indication of anyone ever setting the parameter, or
complaining about the default being wrong, after it was introduced
as a way to handle backwards-compatibility with linux prior to
c61eef726a ("drm: add support for monotonic vblank timestamps"),
so it's probably safer to just remove the parameter completely
and only allowing the default behavior.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The drm vblank handling uses 'timeval' to store timestamps in either
monotonic or wall-clock time base. In either case, it reads the current
time as a ktime_t in get_drm_timestamp() and converts it from there.
This is a bit suspicious, as users of 'timeval' often suffer from
the time_t overflow in y2038. I have gone through this code and
found that it is unlikely to cause problems here:
- The user space ABI does not use time_t or timeval, but uses
'u32' and 'long' as the types. This means at least that rebuilding
user programs against a new libc with 64-bit time_t does not
change the ABI.
- As of commit c61eef726a ("drm: add support for monotonic vblank
timestamps") in linux-3.8, the monotonic timestamp is the default
and can only get reverted to wall-clock through a module-parameter.
- With the default monotonic timestamps, there is no problem at all.
- The drm_wait_vblank_ioctl() interface is alway safe on 64-bit
architectures, on 32-bit it might overflow the 'long' timestamps
in 2038 with wall-clock timestamps.
- The event handling uses 'u32' seconds, which overflow in 2106
on both 32-bit and 64-bit machines, when wall-clock timestamps
are used.
- The effect of overflowing either of the two is only temporary
(during the overflow, and is likely to keep working again
afterwards. It is likely the same problem as observing a
'settimeofday()' call, which was the reason for moving to the
monotonic timestamps in the first place.
Overall, this seems good enough, so my patch removes the use of
'timeval' from the vblank handling altogether and uses ktime_t
consistently, except for the part where we copy the data to user
space structures in the existing format.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
A bug recently encountered involved the issue where are we were
submitting requests to different ppGTT, each would pin a segment of the
GGTT for its logical context and ring. However, this is invisible to
eviction as we do not tie the context/ring VMA to a request and so do
not automatically wait upon it them (instead they are marked as pinned,
preventing eviction entirely). Instead the eviction code must flush those
contexts by switching to the kernel context. This selftest tries to
fill the GGTT with contexts to exercise a path where the
switch-to-kernel-context failed to make forward progress and we fail
with ENOSPC.
v2: Make the hole in the filled GGTT explicit.
v3: Swap out the arbitrary timeout for a private notification from
i915_gem_evict_something()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171012125726.14736-3-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
For some selftests, we want to issue requests but delay them going to
hardware. Furthermore, we don't want those requests to block
indefinitely (or else we may hang the driver and block testing) so we
want to employ a timeout. So naturally we want a fence that is
automatically signaled by a timer.
v2: Add kselftests.
v3: Limit the API available to selftests; there isn't an overwhelming
reason to export it universally.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171012125726.14736-2-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
In the full-ppgtt world, we can fill the GGTT full of context objects.
These context objects are currently implicitly tracked by the requests
that pin them i.e. they are only unpinned when the request is completed
and retired, but we do not have the link from the vma to the request
(anymore). In order to unpin those contexts, we have to issue another
request and wait upon the switch to the kernel context.
The bug during eviction was that we assumed that a full GGTT meant we
would have requests on the GGTT timeline, and so we missed situations
where those requests where merely in flight (and when even they have not
yet been submitted to hw yet). The fix employed here is to change the
already-is-idle test to no look at the execution timeline, but count the
outstanding requests and then check that we have switched to the kernel
context. Erring on the side of overkill here just means that we stall a
little longer than may be strictly required, but we only expect to hit
this path in extreme corner cases where returning an erroneous error is
worse than the delay.
v2: Logical inversion when swapping over branches.
Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171012125726.14736-1-chris@chris-wilson.co.uk
We need to call reservation_object_reserve_shared() in both cases, but
this wasn't happening in the _NO_IMPLICIT submit case.
Fixes: f0a42bb ("drm/msm: submit support for in-fences")
Reported-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
While converting mdp5_enable/disable() calls to pm_runtime_get/put() API,
an extra call to pm_runtime_put_autosuspend() crept in
mdp5_crtc_cursor_set(). This results in calling the suspend handler
twice, and therefore clk_disables twice, which isn't a nice thing to do.
Fixes: d68fe15b18 (drm/msm/mdp5: Use runtime PM get/put API instead ...)
Reported-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The DSI runtime PM suspend/resume callbacks check whether
msm_host->cfg_hnd is non-NULL before trying to enable the bus clocks.
This is done to accommodate early calls to these functions that may
happen before the bus clocks are even initialized.
Calling pm_runtime_put_autosuspend() in dsi_host_init() can result in
racy behaviour since msm_host->cfg_hnd is set very soon after. If the
suspend callback happens too late, we end up trying to disable clocks
that were never enabled, resulting in a bunch of WARN_ON splats.
Use pm_runtime_put_sync() so that the suspend callback is called
immediately.
Reported-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
In case of error, the function msm_gem_get_vaddr() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().
Fixes: 8223286d62 ("drm/msm: Add a helper function for in-kernel
buffer allocations")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
On machines where the vblank interrupt fires some time after the start
of vblank (or we just manage to race with the vblank interrupt handler)
we will currently stuff a stale vblank counter value into the flip event,
and thus we'll prematurely complete the flip.
Switch over to drm_crtc_accurate_vblank_count() to make sure we have an
up to date counter value, crucially also remember to add the +1 so that
the delayed vblank interrupt won't complete the flip prematurely.
Cc: stable@vger.kernel.org
Cc: Daniel Vetter <daniel@ffwll.ch>
Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171010133322.24029-1-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel@ffwll.ch> #irc
The CEC framework needs to know when the hotplug detect signal
disappears, since that means the CEC physical address has to be
invalidated (i.e. set to f.f.f.f).
Add a lost_hotplug op that is called when the HPD signal goes away.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Hook up the HDMI CEC support in the hdmi4 driver.
It add the CEC irq handler, the CEC (un)init calls and tells the CEC
implementation when the physical address changes.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Add the source and header for the OMAP4 HDMI CEC support.
This code is not yet hooked up, that will happen in the next patch.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
The hdmi_power_on/off_core functions can be called multiple times:
when the HPD changes and when the HDMI CEC support needs to power
the HDMI core.
So use a counter to know when to really power on or off the HDMI core.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Call hdmi4_core_powerdown_disable() in hdmi_power_on_core() to
power up the HDMI core (needed for CEC). The same call can now be dropped
in hdmi4_configure().
Signed-off-by: Hans Verkuil <hansverk@cisco.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Pass struct omap_hdmi to the irq handler since it will need access
to hdmi.core.
Do not clear the IRQ_HDMI_CORE bit: that will be controlled by the
HDMI CEC code.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Three low-level functions in hdmi4.c and hdmi4_core.c are
made available for use by the OMAP4 CEC support.
Renamed the prefix to hdmi4 since these are OMAP4 specific.
These function deal with the HDMI core and are needed to
power it up for use with CEC, even when the HPD is low.
Background: even if the HPD is low it should still be possible
to use CEC. Some displays will set the HPD low when they go into standby or
when they switch to another input, but CEC is still available and able
to wake up/change input for such a display.
This is explicitly allowed by the CEC standard.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Extend the hdmi_core_data struct with the additional fields needed
for CEC.
Also fix a simple typo in a comment.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
For OMAP4 CEC support the CEC pin should always be on. So keep
ls_oe_gpio high all the time in order to support CEC.
Background: even if the HPD is low it should still be possible
to use CEC. Some displays will set the HPD low when they go into standby or
when they switch to another input, but CEC is still available and able
to wake up/change input for such a display.
This is explicitly allowed by the CEC standard.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2nd batch of v4.15 features:
- lib/scatterlist updates, use for userptr allocations (Tvrtko)
- Fixed point wrapper cleanup (Mahesh)
- Gen9+ transition watermarks, watermark optimization and fixes (Mahesh)
- Display IPC (Isochronous Priority Control) support (Mahesh)
- GEM workaround fixes (Oscar)
- GVT: PCI config sanitize series (Changbin)
- GVT: Workload submission error handling series (Fred)
- PSR fixes and refactoring (Rodrigo)
- HWSP based optimizations (Chris)
- Private PAT management (Zhi)
- IRQ handling fixes and refactoring (Ville)
- Module parameter refactoring and variable name clash fix (Michal)
- Execlist refactoring, incomplete request unwinding on reset (Chris)
- GuC scheduling improvements (Michal)
- OA updates (Lionel)
- Coffeelake out of alpha support (Rodrigo)
- seqno fixes (Chris)
- Execlist refactoring (Mika)
- DP and DP MST cleanups (Dhinakaran)
- Cannonlake slice/sublice config (Ben)
- Numerous fixes all around (Everyone)
* tag 'drm-intel-next-2017-09-29' of git://anongit.freedesktop.org/drm/drm-intel: (168 commits)
drm/i915: Update DRIVER_DATE to 20170929
drm/i915: Use memset64() to prefill the GTT page
drm/i915: Also discard second CRC on gen8+ platforms.
drm/i915/psr: Set frames before SU entry for psr2
drm/dp: Add defines for latency in sink
drm/i915: Allow optimized platform checks
drm/i915: Avoid using dev_priv->info.gen directly.
i915: Use %pS printk format for direct addresses
drm/i915/execlists: Notify context-out for lost requests
drm/i915/cnl: Add support slice/subslice/eu configs
drm/i915: Compact device info access by a small re-ordering
drm/i915: Add IS_PLATFORM macro
drm/i915/selftests: Try to recover from a wedged GPU during reset tests
drm/i915/huc: Reorganize HuC authentication
drm/i915: Fix default values of some modparams
drm/i915: Extend I915_PARAMS_FOR_EACH with default member value
drm/i915: Make I915_PARAMS_FOR_EACH macro more flexible
drm/i915: Enable scanline read based on frame timestamps
drm/i915/execlists: Microoptimise execlists_cancel_port_request()
drm/i915: Don't rmw PIPESTAT enable bits
...
This will allow __drm_mode_object_file to be extended to perform
access control checks based on the file in use.
v2: Also fix up vboxvideo driver in staging
[airlied: merging early as this is an API change]
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
stop_machine is not really a locking primitive we should use, except
when the hw folks tell us the hw is broken and that's the only way to
work around it.
This patch tries to address the locking abuse of stop_machine() from
commit 20e4933c47
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Nov 22 14:41:21 2016 +0000
drm/i915: Stop the machine as we install the wedged submit_request handler
Chris said parts of the reasons for going with stop_machine() was that
it's no overhead for the fast-path. But these callbacks use irqsave
spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
To stay as close as possible to the stop_machine semantics we first
update all the submit function pointers to the nop handler, then call
synchronize_rcu() to make sure no new requests can be submitted. This
should give us exactly the huge barrier we want.
I pondered whether we should annotate engine->submit_request as __rcu
and use rcu_assign_pointer and rcu_dereference on it. But the reason
behind those is to make sure the compiler/cpu barriers are there for
when you have an actual data structure you point at, to make sure all
the writes are seen correctly on the read side. But we just have a
function pointer, and .text isn't changed, so no need for these
barriers and hence no need for annotations.
Unfortunately there's a complication with the call to
intel_engine_init_global_seqno:
- Without stop_machine we must hold the corresponding spinlock.
- Without stop_machine we must ensure that all requests are marked as
having failed with dma_fence_set_error() before we call it. That
means we need to split the nop request submission into two phases,
both synchronized with rcu:
1. Only stop submitting the requests to hw and mark them as failed.
2. After all pending requests in the scheduler/ring are suitably
marked up as failed and we can force complete them all, also force
complete by calling intel_engine_init_global_seqno().
This should fix the followwing lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G U
------------------------------------------------------
kworker/3:4/562 is trying to acquire lock:
(cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
but task is already holding lock:
(&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #6 (&dev->struct_mutex){+.+.}:
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
__mutex_lock+0x86/0x9b0
mutex_lock_interruptible_nested+0x1b/0x20
i915_mutex_lock_interruptible+0x51/0x130 [i915]
i915_gem_fault+0x209/0x650 [i915]
__do_fault+0x1e/0x80
__handle_mm_fault+0xa08/0xed0
handle_mm_fault+0x156/0x300
__do_page_fault+0x2c5/0x570
do_page_fault+0x28/0x250
page_fault+0x22/0x30
-> #5 (&mm->mmap_sem){++++}:
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
__might_fault+0x68/0x90
_copy_to_user+0x23/0x70
filldir+0xa5/0x120
dcache_readdir+0xf9/0x170
iterate_dir+0x69/0x1a0
SyS_getdents+0xa5/0x140
entry_SYSCALL_64_fastpath+0x1c/0xb1
-> #4 (&sb->s_type->i_mutex_key#5){++++}:
down_write+0x3b/0x70
handle_create+0xcb/0x1e0
devtmpfsd+0x139/0x180
kthread+0x152/0x190
ret_from_fork+0x27/0x40
-> #3 ((complete)&req.done){+.+.}:
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
wait_for_common+0x58/0x210
wait_for_completion+0x1d/0x20
devtmpfs_create_node+0x13d/0x160
device_add+0x5eb/0x620
device_create_groups_vargs+0xe0/0xf0
device_create+0x3a/0x40
msr_device_create+0x2b/0x40
cpuhp_invoke_callback+0xc9/0xbf0
cpuhp_thread_fun+0x17b/0x240
smpboot_thread_fn+0x18a/0x280
kthread+0x152/0x190
ret_from_fork+0x27/0x40
-> #2 (cpuhp_state-up){+.+.}:
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
cpuhp_issue_call+0x133/0x1c0
__cpuhp_setup_state_cpuslocked+0x139/0x2a0
__cpuhp_setup_state+0x46/0x60
page_writeback_init+0x43/0x67
pagecache_init+0x3d/0x42
start_kernel+0x3a8/0x3fc
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x6d/0x70
verify_cpu+0x0/0xfb
-> #1 (cpuhp_state_mutex){+.+.}:
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
__mutex_lock+0x86/0x9b0
mutex_lock_nested+0x1b/0x20
__cpuhp_setup_state_cpuslocked+0x53/0x2a0
__cpuhp_setup_state+0x46/0x60
page_alloc_init+0x28/0x30
start_kernel+0x145/0x3fc
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x6d/0x70
verify_cpu+0x0/0xfb
-> #0 (cpu_hotplug_lock.rw_sem){++++}:
check_prev_add+0x430/0x840
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
cpus_read_lock+0x3d/0xb0
stop_machine+0x1c/0x40
i915_gem_set_wedged+0x1a/0x20 [i915]
i915_reset+0xb9/0x230 [i915]
i915_reset_device+0x1f6/0x260 [i915]
i915_handle_error+0x2d8/0x430 [i915]
hangcheck_declare_hang+0xd3/0xf0 [i915]
i915_hangcheck_elapsed+0x262/0x2d0 [i915]
process_one_work+0x233/0x660
worker_thread+0x4e/0x3b0
kthread+0x152/0x190
ret_from_fork+0x27/0x40
other info that might help us debug this:
Chain exists of:
cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&dev->struct_mutex);
lock(&mm->mmap_sem);
lock(&dev->struct_mutex);
lock(cpu_hotplug_lock.rw_sem);
*** DEADLOCK ***
3 locks held by kworker/3:4/562:
#0: ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
#1: ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
#2: (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
stack backtrace:
CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G U 4.14.0-rc3-CI-CI_DRM_3179+ #1
Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
Workqueue: events_long i915_hangcheck_elapsed [i915]
Call Trace:
dump_stack+0x68/0x9f
print_circular_bug+0x235/0x3c0
? lockdep_init_map_crosslock+0x20/0x20
check_prev_add+0x430/0x840
? irq_work_queue+0x86/0xe0
? wake_up_klogd+0x53/0x70
__lock_acquire+0x1420/0x15e0
? __lock_acquire+0x1420/0x15e0
? lockdep_init_map_crosslock+0x20/0x20
lock_acquire+0xb0/0x200
? stop_machine+0x1c/0x40
? i915_gem_object_truncate+0x50/0x50 [i915]
cpus_read_lock+0x3d/0xb0
? stop_machine+0x1c/0x40
stop_machine+0x1c/0x40
i915_gem_set_wedged+0x1a/0x20 [i915]
i915_reset+0xb9/0x230 [i915]
i915_reset_device+0x1f6/0x260 [i915]
? gen8_gt_irq_ack+0x170/0x170 [i915]
? work_on_cpu_safe+0x60/0x60
i915_handle_error+0x2d8/0x430 [i915]
? vsnprintf+0xd1/0x4b0
? scnprintf+0x3a/0x70
hangcheck_declare_hang+0xd3/0xf0 [i915]
? intel_runtime_pm_put+0x56/0xa0 [i915]
i915_hangcheck_elapsed+0x262/0x2d0 [i915]
process_one_work+0x233/0x660
worker_thread+0x4e/0x3b0
kthread+0x152/0x190
? process_one_work+0x660/0x660
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x27/0x40
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang
v2: Have 1 global synchronize_rcu() barrier across all engines, and
improve commit message.
v3: We need to protect the seqno update with the timeline spinlock (in
set_wedged) to avoid racing with other updates of the seqno, like we
already do in nop_submit_request (Chris).
v4: Use two-phase sequence to plug the race Chris spotted where we can
complete requests before they're marked up with -EIO.
v5: Review from Chris:
- simplify nop_submit_request.
- Add comment to rcu_read_lock section.
- Align comments with the new style.
v6: Remove unused variable to appease CI.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171011091019.1425-1-daniel.vetter@ffwll.ch