In selftests/live_hangcheck, we have a lot of tests for resetting simple
spinners, but nothing quite prepared us for how the GPU reacted to
triggering a reset outside of the safe spinner. These two subtests fill
the ring with plain old empty, non-spinning requests, and then triggers
a reset. Without a user-payload to blame, these requests will exercise
the 'non-started' paths and mostly be replayed verbatim.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-4-chris@chris-wilson.co.uk
To determine whether an engine has 'stuck', we simply check whether or
not is still on the same seqno for several seconds. To keep this simple
mechanism intact over the loss of a global seqno, we can simply add a
new global heartbeat seqno instead. As we cannot know the sequence in
which requests will then be completed, we use a primitive random number
generator instead (with a cycle long enough to not matter over an
interval of a few thousand requests between hangcheck samples).
The alternative to using a dedicated seqno on every request is to issue
a heartbeat request and query its progress through the system. Sadly
this requires us to reduce struct_mutex so that we can issue requests
without requiring that bkl.
v2: And without the extra CS_STALL for the hangcheck seqno -- we don't
need strict serialisation with what comes later, we just need to be sure
we don't write the hangcheck seqno before our batch is flushed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-1-chris@chris-wilson.co.uk
When DPM for the specific clock is disabled, driver should still print out
current clock info for rocm-smi support on vega20
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Eric Huang <JinhuiEric.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently there is a small race window where we could manage to arm the
vblank event from atomic flush, but programming the hardware was too close
to the frame end, so the hardware will only apply the current state on the
next vblank. In this case we will send out the commit done event too early
causing userspace to reuse framebuffes that are still in use.
Instead of using the event arming mechnism, just remember the pending event
and send it from the vblank IRQ handler, once we are sure that all state
has been applied successfully.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
[p.zabel@pengutronix.de: inverted logic: done -> pending, added back
spinlock in atomic_flush, commit message typo fix]
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Since the TVE provides a clock to the DI, the driver can only be
compiled if the common clock framework is enabled. With the COMMON_CLK
dependency in place, it will be possible to allow building the other
parts of imx-drm under COMPILE_TEST on architectures that do not select
the common clock framework.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Add a zpos property to planes. Call drm_atomic_helper_check() instead of
calling drm_atomic_helper_check_modeset() and drm_atomic_check_planes()
manually. This effectively adds a call to drm_atomic_normalize_zpos()
before checking planes. Reorder atomic update to allow changing plane
zpos without modeset.
Note that the initial zpos is set in ipu_plane_state_reset(). The
initial value set in ipu_plane_init() is just for show. The zpos
parameter of drm_plane_create_zpos_property() is ignored because
the newly created plane do not have state yet.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Marius Vlad <marius.vlad@collabora.com>
This function allows upper layer to check if a requested atomic update
to the plane has been applied or is still pending.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
[p.zabel@pengutronix.de: inverted logic: done -> pending]
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
This allows channels using the PRG to check if a requested configuration
update has been applied or is still pending.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
[p.zabel@pengutronix.de: inverted logic: done -> pending]
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
This allows the upper layers to check if a double buffer update has
been applied by the PRE or is still pending.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
[p.zabel@pengutronix.de: inverted logic: done -> pending]
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Depends on GEN family and I915_PARAM_HAS_CONTEXT_ISOLATION, Mesa driver
will decide whether constant buffer 0 address is relative or absolute,
and load GPU initial state by lri to context mmio INSTPM (GEN8)
or 0x20D8 (>=GEN9).
Mesa Commit fa8a764b62
("i965: Use absolute addressing for constant buffer 0 on Kernel 4.16+.")
INSTPM is already added to gen8_engine_mmio_list, but 0x20D8 is missed
in gen9_engine_mmio_list. From GVT point of view, different guest could
have different context so should switch those mmio accordingly.
v2: Update fixes commit ID.
Fixes: 1786571393 ("drm/i915/gvt: vGPU context switch")
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Fixes for 5.1:
amdgpu:
- Fix missing fw declaration after dropping old CI DPM code
- Fix debugfs access to registers beyond the MMIO bar size
- Fix context priority handling
- Add missing license on some new files
- Various cleanups and bug fixes
radeon:
- Fix missing break in CS parser for evergreen
- Various cleanups and bug fixes
sched:
- Fix entities with 0 run queues
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190221214134.3308-1-alexander.deucher@amd.com
A bit bigger than normal for this week due to fixes for some long
standing display issues that are bound for stable. These changes would
be going to stable anyway, so I figured it was better via 5.0 than 5.1.
- Several display fixes
- Fix PX systems due to core changes in runtime pm
- Disable bulk moves. They are fixed in 5.1, but fix is too invasive for 5.0
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190220225715.3240-1-alexander.deucher@amd.com
Annoyingly, struct_mutex was not entirely eliminated from the reset
pathway; for reasons of its own, intel_display_resume() requires
struct_mutex to prepare the planes it already captured. To avoid the
immediate problem of a deadlock between the struct_mutex and the reset
srcu, we have to acquire the reset_lock before struct_mutex in
i915_gem_fault(). Now any wait underneath struct_mutex will result us in
having to forcibly reset all inflight rendering, less than ideal, but
better than a deadlock (and will do for the short term).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190221102924.13442-1-chris@chris-wilson.co.uk
If userspace has open fd(s) when drm_dev_unplug() is run, it will result
in drm_dev_unregister() being called twice. First in drm_dev_unplug() and
then later in drm_release() through the call to drm_put_dev().
Since userspace already holds a ref on drm_device through the drm_minor,
it's not necessary to add extra ref counting based on no open file
handles. Instead just drm_dev_put() unconditionally in drm_dev_unplug().
We now have this:
- Userpace holds a ref on drm_device as long as there's open fd(s)
- The driver holds a ref on drm_device as long as it's bound to the
struct device
When both sides are done with drm_device, it is released.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208140103.28919-2-noralf@tronnes.org
When MI_FLUSH_DW post write hw status page in index mode, the index
value is in dword step and turned into address offset in cmd dword1.
As status page size is 4K, so can't exceed that.
This fixed upper bound check in cmd parser code which incorrectly
stopped VM for reason of invalid MI_FLUSH_DW write index.
v2:
- Fix upper bound as 4K page size because index value is address offset.
Fixes: be1da7070a ("drm/i915/gvt: vGPU command scanner")
Cc: stable@vger.kernel.org # v4.10+
Cc: "Zhao, Yan Y" <yan.y.zhao@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Introduce a new ABI method for detecting a wedged driver by reporting
-EIO from DRM_IOCTL_I915_GEM_CONTEXT_CREATE.
This came up in considering how to handle context recovery from
userspace. There we wish to create a new context after the original is
banned (the clients opts into the no recovery after reset strategy) in
order to rebuild the mesa context from scratch. In doing so, if the
device was wedged and not the context banned, we would fall into a loop
of permanently trying to recreate the context and never making forward
progress. This patch would inform the client that we are no longer able
to create a context, and the client would have no choice but to abort
(or at least inform its callers about the lost device for anv).
References: https://lists.freedesktop.org/archives/mesa-dev/2019-February/215469.html
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190220225556.28715-1-chris@chris-wilson.co.uk