With some VRR panels, user can turn VRR ON/OFF on the fly from the panel settings.
When VRR is turned OFF ,sends a long HPD to the driver clearing the Ignore MSA bit
in the DPCD. Currently the driver parses that onevery HPD but fails to reset
the corresponding VRR Capable Connector property.
Hence the userspace still sees this as VRR Capable panel which is incorrect.
Fix this by explicitly resetting the connector property.
v2: Reset vrr capable if status == connector_disconnected
v3: Use i915 and use bool vrr_capable (Jani Nikula)
v4: Move vrr_capable to after update modes call (Jani N)
Remove the redundant comment (Jan N)
v5: Fixes the regression on older platforms by resetting the VRR
only if HAS_VRR
v6: Remove the checks from driver, add in drm core before
setting VRR prop (Ville)
v7: Move VRR set/reset to set/unset_edid (Ville)
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: 9bc34b4d0f ("drm/i915/display/vrr: Reset VRR capable property on a long hpd")
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220303233222.4698-1-manasi.d.navare@intel.com
(cherry picked from commit d999ad1079)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The intent of the version check in the mmap ioctl was to maintain
support for existing platforms (i.e., ADL/RPL and earlier), but drop
support on all future igpu platforms. As we've seen on the dgpu side,
the hardware teams are using a more fine-grained numbering system for IP
version numbers these days, so it's possible the version number
associated with our next igpu could be some form of "12.xx" rather than
13 or higher. Comparing against the full ver.release number will ensure
the intent of the check is maintained no matter what numbering the
hardware teams settle on.
Fixes: d3f3baa356 ("drm/i915: Reinstate the mmap ioctl for some platforms")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220407161839.1073443-1-matthew.d.roper@intel.com
(cherry picked from commit 8e7e5c077c)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Pull drm fixes from Dave Airlie:
"Some fixes were queued up in and in light of the fbdev regressions,
I've pulled those in as well.
core:
- Make audio and color plane support checking only happen when a CEA
extension block is found.
- Small selftest fix.
fbdev:
- two regressions fixes from speedup patches.
ttm:
- Fix a small regression from ttm_resource_fini()
i915:
- Reject unsupported TMDS rates on ICL+
- Treat SAGV block time 0 as SAGV disabled
- Fix PSF GV point mask when SAGV is not possible
- Fix renamed INTEL_INFO->media.arch/ver field"
* tag 'drm-next-2022-03-25' of git://anongit.freedesktop.org/drm/drm:
fbdev: Fix cfb_imageblit() for arbitrary image widths
fbdev: Fix sys_imageblit() for arbitrary image widths
drm/edid: fix CEA extension byte #3 parsing
drm/edid: check basic audio support on CEA extension block
drm/i915: Fix renamed struct field
drm/i915: Fix PSF GV point mask when SAGV is not possible
drm/i915: Treat SAGV block time 0 as SAGV disabled
drm/i915: Reject unsupported TMDS rates on ICL+
drm/selftest: plane_helper: Put test structures in static storage
drm/ttm: Fix a kernel oops due to an invalid read
Pull Kbuild update for C11 language base from Masahiro Yamada:
"Kbuild -std=gnu11 updates for v5.18
Linus pointed out the benefits of C99 some years ago, especially
variable declarations in loops [1]. At that time, we were not ready
for the migration due to old compilers.
Recently, Jakob Koschel reported a bug in list_for_each_entry(), which
leaks the invalid pointer out of the loop [2]. In the discussion, we
agreed that the time had come. Now that GCC 5.1 is the minimum
compiler version, there is nothing to prevent us from going to
-std=gnu99, or even straight to -std=gnu11.
Discussions for a better list iterator implementation are ongoing, but
this patch set must land first"
[1] https://lore.kernel.org/all/CAHk-=wgr12JkKmRd21qh-se-_Gs69kbPgR9x4C+Es-yJV2GLkA@mail.gmail.com/
[2] https://lore.kernel.org/lkml/86C4CE7D-6D93-456B-AA82-F8ADEACA40B7@gmail.com/
* tag 'kbuild-gnu11-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
Kbuild: use -std=gnu11 for KBUILD_USERCFLAGS
Kbuild: move to -std=gnu11
Kbuild: use -Wdeclaration-after-statement
Kbuild: add -Wno-shift-negative-value where -Wextra is used
Pull drm updates from Dave Airlie:
"Lots of work all over, Intel improving DG2 support, amdkfd CRIU
support, msm new hw support, and faster fbdev support.
dma-buf:
- rename dma-buf-map to iosys-map
core:
- move buddy allocator to core
- add pci/platform init macros
- improve EDID parser deep color handling
- EDID timing type 7 support
- add GPD Win Max quirk
- add yes/no helpers to string_helpers
- flatten syncobj chains
- add nomodeset support to lots of drivers
- improve fb-helper clipping support
- add default property value interface
fbdev:
- improve fbdev ops speed
ttm:
- add a backpointer from ttm bo->ttm resource
dp:
- move displayport headers
- add a dp helper module
bridge:
- anx7625 atomic support, HDCP support
panel:
- split out panel-lvds and lvds bindings
- find panels in OF subnodes
privacy:
- add chromeos privacy screen support
fb:
- hot unplug fw fb on forced removal
simpledrm:
- request region instead of marking ioresource busy
- add panel oreintation property
udmabuf:
- fix oops with 0 pages
amdgpu:
- power management code cleanup
- Enable freesync video mode by default
- RAS code cleanup
- Improve VRAM access for debug using SDMA
- SR-IOV rework special register access and fixes
- profiling power state request ioctl
- expose IP discovery via sysfs
- Cyan skillfish updates
- GC 10.3.7, SDMA 5.2.7, DCN 3.1.6 updates
- expose benchmark tests via debugfs
- add module param to disable XGMI for testing
- GPU reset debugfs register dumping support
amdkfd:
- CRIU support
- SDMA queue fixes
radeon:
- UVD suspend fix
- iMac backlight fix
i915:
- minimal parallel submission for execlists
- DG2-G12 subplatform added
- DG2 programming workarounds
- DG2 accelerated migration support
- flat CCS and CCS engine support for XeHP
- initial small BAR support
- drop fake LMEM support
- ADL-N PCH support
- bigjoiner updates
- introduce VMA resources and async unbinding
- register definitions cleanups
- multi-FBC refactoring
- DG1 OPROM over SPI support
- ADL-N platform enabling
- opregion mailbox #5 support
- DP MST ESI improvements
- drm device based logging
- async flip optimisation for DG2
- CPU arch abstraction fixes
- improve GuC ADS init to work on aarch64
- tweak TTM LRU priority hint
- GuC 69.0.3 support
- remove short term execbuf pins
nouveau:
- higher DP/eDP bitrates
- backlight fixes
msm:
- dpu + dp support for sc8180x
- dp support for sm8350
- dpu + dsi support for qcm2290
- 10nm dsi phy tuning support
- bridge support for dp encoder
- gpu support for additional 7c3 SKUs
ingenic:
- HDMI support for JZ4780
- aux channel EDID support
ast:
- AST2600 support
- add wide screen support
- create DP/DVI connectors
omapdrm:
- fix implicit dma_buf fencing
vc4:
- add CSC + full range support
- better display firmware handoff
panfrost:
- add initial dual-core GPU support
stm:
- new revision support
- fb handover support
mediatek:
- transfer display binding document to yaml format.
- add mt8195 display device binding.
- allow commands to be sent during video mode.
- add wait_for_event for crtc disable by cmdq.
tegra:
- YUV format support
rcar-du:
- LVDS support for M3-W+ (R8A77961)
exynos:
- BGR pixel format for FIMD device"
* tag 'drm-next-2022-03-24' of git://anongit.freedesktop.org/drm/drm: (1529 commits)
drm/i915/display: Do not re-enable PSR after it was marked as not reliable
drm/i915/display: Fix HPD short pulse handling for eDP
drm/amdgpu: Use drm_mode_copy()
drm/radeon: Use drm_mode_copy()
drm/amdgpu: Use ternary operator in `vcn_v1_0_start()`
drm/amdgpu: Remove pointless on stack mode copies
drm/amd/pm: fix indenting in __smu_cmn_reg_print_error()
drm/amdgpu/dc: fix typos in comments
drm/amdgpu: fix typos in comments
drm/amd/pm: fix typos in comments
drm/amdgpu: Add stolen reserved memory for MI25 SRIOV.
drm/amdgpu: Merge get_reserved_allocation to get_vbios_allocations.
drm/amdkfd: evict svm bo worker handle error
drm/amdgpu/vcn: fix vcn ring test failure in igt reload test
drm/amdgpu: only allow secure submission on rings which support that
drm/amdgpu: fixed the warnings reported by kernel test robot
drm/amd/display: 3.2.177
drm/amd/display: [FW Promotion] Release 0.0.108.0
drm/amd/display: Add save/restore PANEL_PWRSEQ_REF_DIV2
drm/amd/display: Wait for hubp read line for Pollock
...
Pull flexible-array transformations from Gustavo Silva:
"Treewide patch that replaces zero-length arrays with flexible-array
members.
This has been baking in linux-next for a whole development cycle"
* tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
treewide: Replace zero-length arrays with flexible-array members
For modern platforms the spec explicitly states that a
SAGV block time of zero means that SAGV is not supported.
Let's extend that to all platforms. Supposedly there should
be no systems where this isn't true, and it'll allow us to:
- use the same code regardless of older vs. newer platform
- wm latencies already treat 0 as disabled, so this fits well
with other related code
- make it a bit more clear when SAGV is used vs. not
- avoid overflows from adding U32_MAX with a u16 wm0 latency value
which could cause us to miscalculate the SAGV watermarks on tgl+
Cc: stable@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220309164948.10671-2-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
(cherry picked from commit d8f5855b31)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Commit 13ea6db2cf ("drm/i915/edp: Ignore short pulse when panel
powered off") completely broke short pulse handling for eDP as it is
usually generated by sink when it is displaying image and there is
some error or status that source needs to handle.
When power panel is enabled, this state is enough to power aux
transactions and VDD override is disabled, so intel_pps_have_power()
is always returning false causing short pulses to be ignored.
So here better naming this function that intends to check if aux
lines are powered to avoid the endless cycle mentioned in the commit
being fixed and fixing the check for what it is intended.
v2:
- renamed to intel_pps_have_panel_power_or_vdd()
- fixed indentation
Fixes: 13ea6db2cf ("drm/i915/edp: Ignore short pulse when panel powered off")
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220311185149.110527-1-jose.souza@intel.com
(cherry picked from commit 8f0c1c0949)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
As a preparation for moving to -std=gnu11, turn off the
-Wshift-negative-value option. This warning is enabled by gcc when
building with -Wextra for c99 or higher, but not for c89. Since
the kernel already relies on well-defined overflow behavior,
the warning is not helpful and can simply be disabled in
all locations that use -Wextra.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 (x86-64)
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The current implementation of the async flip wm0/ddb optimization
does not work at all. The biggest problem is that we skip the
whole intel_pipe_update_{start,end}() dance and thus never actually
complete the commit that is trying to do the wm/ddb change.
To fix this we need to move the do_async_flip flag to the crtc
state since we handle commits per-pipe, not per-plane.
Also since all planes can now be included in the first/last
"async flip" (which gets converted to a sync flip to do the
wm/ddb mangling) we need to be more careful when checking if
the plane state is async flip comptatible. Only planes doing
the async flip should be checked and other planes are perfectly
fine not adhereing to any async flip related limitations.
However for subsequent commits which are actually going do the
async flip in hardware we want to make sure no other planes
are in the state. That should never happen assuming we did our
job correctly, so we'll toss in a WARN to make sure we catch
any bugs here.
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Fixes: c3639f3be4 ("drm/i915: Use wm0 only during async flips for DG2")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220214105532.13049-4-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
(cherry picked from commit 2e08437160)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Since the async flip state check is done very late and
thus it can see potentially all the planes in the state
(due to the wm/ddb optimization) we need to move the
"can the requested plane do async flips at all?" check
much earlier. For this purpose we introduce
intel_async_flip_check_uapi() that gets called early during
the atomic check.
And for good measure we'll throw in a couple of basic checks:
- is the crtc active?
- was a modeset flagged?
- is+was the plane enabled?
Though atm all of those should be guaranteed by the fact
that the async flip can only be requested through the legacy
page flip ioctl.
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Fixes: c3639f3be4 ("drm/i915: Use wm0 only during async flips for DG2")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220214105532.13049-3-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
(cherry picked from commit b0b2bed2a1)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Currently we are observing occasional screen flickering when
PSR2 selective fetch is enabled. More specifically glitch seems
to happen on full frame update when cursor moves to coords
x = -1 or y = -1.
According to Bspec SF Single full frame should not be set if
SF Partial Frame Enable is not set. This happened to be true for
ADLP as PSR2_MAN_TRK_CTL_ENABLE is always set and for ADL_P it's
actually "SF Partial Frame Enable" (Bit 31).
Setting "SF Partial Frame Enable" bit also on full update seems to
fix screen flickering.
Also make code more clear by setting PSR2_MAN_TRK_CTL_ENABLE
only if not on ADL_P. Bit 31 has different meaning in ADL_P.
Bspec: 49274
v2: Fix Mihai Harpau email address
v3: Modify commit message and remove unnecessary comment
Tested-by: Lyude Paul <lyude@redhat.com>
Fixes: 7f6002e580 ("drm/i915/display: Enable PSR2 selective fetch by default")
Reported-by: Lyude Paul <lyude@redhat.com>
Cc: Mihai Harpau <mharpau@gmail.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Bugzilla: https://gitlab.freedesktop.org/drm/intel/-/issues/5077
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225070228.855138-1-jouni.hogander@intel.com
(cherry picked from commit 8d5516d18b)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cross-subsystem Changes:
- drm-next backmerge for buddy allocator changes
Driver Changes:
- Skip i915_perf init for DG2 as it is not yet enabled (Ram)
- Add missing workarounds for DG2 (Clint)
- Add 64K page/align support for platforms like DG2 that require it (Matt A, Ram, Bob)
- Add accelerated migration support for DG2 (Matt A)
- Add flat CCS support for XeHP SDV (Abdiel, Ram)
- Add Compute Command Streamer (CCS) engine support for XeHP SDV (Michel,
Daniele, Aravind, Matt R)
- Don't support parallel submission on compute / render (Matt B, Matt R)
- Disable i915 build on PREEMPT_RT until RT behaviour fixed (Sebastian)
- Remove RPS interrupt support for TGL+ (Jose)
- Fix S/R with PM_EARLY for non-GTT mappable objects on DG2 (Matt, Lucas)
- Skip stolen memory init if it is fully reserved (Jose)
- Use iosys_map for GuC data structures that may be in LMEM BAR or SMEM (Lucas)
- Do not complain about stale GuC reset notifications for banned contexts (John)
- Move context descriptor fields to intel_lrc.h
- Start adding support for small BAR (Matt A)
- Clarify vma lifetime (Thomas)
- Simplify subplatform detection on TGL (Jose)
- Correct the param count for unset GuC SLPC param (Vinay, Umesh)
- Read RP_STATE_CAP correctly on Gen12 with GuC SLPC (Vinay)
- Initialize GuC submission locks and queues early (Daniele)
- Fix GuC flag query helper function to not modify state (John)
- Drop fake lmem support now we have real hardware available (Lucas)
- Move misplaced W/A to their correct locations (Srinivasan)
- Use get_reset_domain() helper (Tejas)
- Move context descriptor fields to intel_lrc.h (Matt R)
- Selftest improvements (Matt A)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YiBzY1dM7bKwMQ3H@jlahtine-mobl.ger.corp.intel.com
Additional workarounds are required once we start exposing CCS engines.
Note that we have a number of workarounds that update registers in the
shared render/compute reset domain. Historically we've just added such
registers to the RCS engine's workaround list. But going forward we
should be more careful to place such workarounds on a wa_list for an
engine that definitely exists and is not fused off (e.g., a platform
with no RCS would never apply the RCS wa_list). We'll keep
rcs_engine_wa_init() focused on RCS-specific workarounds that only need
to be applied if the RCS engine is present. A separate
general_render_compute_wa_init() function will be used to define
workarounds that touch registers in the shared render/compute reset
domain and that we need to apply regardless of what render and/or
compute engines actually exist. Any workarounds defined in this new
function will internally be added to the first present RCS or CCS
engine's workaround list to ensure they get applied (and only get
applied once rather than being needlessly re-applied several times).
Co-author: Srinivasan Shanmugam
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-13-matthew.d.roper@intel.com
HW resources are divided across the active CCS engines at the compute
slice level, with each CCS having priority on one of the cslices.
If a compute slice has no enabled DSS, its paired compute engine is not
usable in full parallel execution because the other ones already fully
saturate the HW, so consider it fused off.
v2 (José):
- moved it to its own function
- fixed definition of ccs_mask
v3 (Matt):
- Replace fls() condition with a simple IP version test
v4 (Matt):
- Don't try to calculate a ccs_mask using
intel_slicemask_from_dssmask() until we've determined that we're
running on an Xe_HP platform where the logic makes sense (and won't
overflow).
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302052008.1884985-1-matthew.d.roper@intel.com
There are a few sections in the driver which are not compatible with
PREEMPT_RT. They trigger warnings and can lead to deadlocks at runtime.
Disable the i915 driver on a PREEMPT_RT enabled kernel. This way
PREEMPT_RT itself can be enabled without needing to address the i915
issues first. The RT related patches are still in RT queue and will be
handled later.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YgqmfKhwU5spS069@linutronix.de
It is possible for reset notifications to arrive for a context that is
in the process of being banned. So don't flag these as an error, just
report it as informational (because it is still useful to know that
resets are happening even if they are being ignored).
v2: Better wording for the message (review feedback from Tvrtko).
v3: Fix rebase issue (review feedback from Daniele).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225015232.1939497-1-John.C.Harrison@Intel.com
A flag query helper was actually writing to the flags word rather than
just reading. Fix that. Also update the function's comment as it was
out of date.
NB: No need for a 'Fixes' tag. The test was only ever used inside a
BUG_ON during context registration. Rather than asserting that the
condition was true, it was making the condition true. So, in theory,
there was no consequence because we should never have hit a BUG_ON
anyway. Which means the write should always have been a no-op.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220217212942.629922-1-John.C.Harrison@Intel.com
It's unclear what reference the initial vma kref reference refers to.
A vma can have multiple weak references, the object vma list,
the vm's bound list and the GT's closed_list, and the initial vma
reference can be put from lookups of all these lists.
With the current implementation this means
that any holder of yet another vma refcount (currently only
i915_gem_object_unbind()) needs to be holding two of either
*) An object refcount,
*) A vm open count
*) A vma open count
in order for us to not risk leaking a reference by having the
initial vma reference being put twice.
Address this by re-introducing i915_vma_destroy() which removes all
weak references of the vma and *then* puts the initial vma refcount.
This makes a strong vma reference hold on to the vma unconditionally.
Perhaps a better name would be i915_vma_revoke() or i915_vma_zombify(),
since other callers may still hold a refcount, but with the prospect of
being able to replace the vma refcount with the object lock in the near
future, let's stick with i915_vma_destroy().
Finally this commit fixes a race in that previously i915_vma_release() and
now i915_vma_destroy() could destroy a vma without taking the vm->mutex
after an advisory check that the vma mm_node was not allocated.
This would race with the ungrab_vma() function creating a trace similar
to the below one. This was fixed in one of the __i915_vma_put() callsites
in
commit bc1922e5d3 ("drm/i915: Fix a race between vma / object destruction and unbinding")
but although not seemingly triggered by CI, that
is not sufficient. This patch is needed to fix that properly.
[823.012188] Console: switching to colour dummy device 80x25
[823.012422] [IGT] gem_ppgtt: executing
[823.016667] [IGT] gem_ppgtt: starting subtest blt-vs-render-ctx0
[852.436465] stack segment: 0000 [#1] PREEMPT SMP NOPTI
[852.436480] CPU: 0 PID: 3200 Comm: gem_ppgtt Not tainted 5.16.0-CI-CI_DRM_11115+ #1
[852.436489] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR5 RVP, BIOS ADLPFWI1.R00.2422.A00.2110131104 10/13/2021
[852.436499] RIP: 0010:ungrab_vma+0x9/0x80 [i915]
[852.436711] Code: ef e8 4b 85 cf e0 e8 36 a3 d6 e0 8b 83 f8 9c 00 00 85 c0 75 e1 5b 5d 41 5c 41 5d c3 e9 d6 fd 14 00 55 53 48 8b af c0 00 00 00 <8b> 45 00 85 c0 75 03 5b 5d c3 48 8b 85 a0 02 00 00 48 89 fb 48 8b
[852.436727] RSP: 0018:ffffc90006db7880 EFLAGS: 00010246
[852.436734] RAX: 0000000000000000 RBX: ffffc90006db7598 RCX: 0000000000000000
[852.436742] RDX: ffff88815349e898 RSI: ffff88815349e858 RDI: ffff88810a284140
[852.436748] RBP: 6b6b6b6b6b6b6b6b R08: ffff88815349e898 R09: ffff88815349e8e8
[852.436754] R10: 0000000000000001 R11: 0000000051ef1141 R12: ffff88810a284140
[852.436762] R13: 0000000000000000 R14: ffff88815349e868 R15: ffff88810a284458
[852.436770] FS: 00007f5c04b04e40(0000) GS:ffff88849f000000(0000) knlGS:0000000000000000
[852.436781] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[852.436788] CR2: 00007f5c04b38fe0 CR3: 000000010a6e8001 CR4: 0000000000770ef0
[852.436797] PKRU: 55555554
[852.436801] Call Trace:
[852.436806] <TASK>
[852.436811] i915_gem_evict_for_node+0x33c/0x3c0 [i915]
[852.437014] i915_gem_gtt_reserve+0x106/0x130 [i915]
[852.437211] i915_vma_pin_ww+0x8f4/0xb60 [i915]
[852.437412] eb_validate_vmas+0x688/0x860 [i915]
[852.437596] i915_gem_do_execbuffer+0xc0e/0x25b0 [i915]
[852.437770] ? deactivate_slab+0x5f2/0x7d0
[852.437778] ? _raw_spin_unlock_irqrestore+0x50/0x60
[852.437789] ? i915_gem_execbuffer2_ioctl+0xc6/0x2c0 [i915]
[852.437944] ? init_object+0x49/0x80
[852.437950] ? __lock_acquire+0x5e6/0x2580
[852.437963] i915_gem_execbuffer2_ioctl+0x116/0x2c0 [i915]
[852.438129] ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915]
[852.438300] drm_ioctl_kernel+0xac/0x140
[852.438310] drm_ioctl+0x201/0x3d0
[852.438316] ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915]
[852.438490] __x64_sys_ioctl+0x6a/0xa0
[852.438498] do_syscall_64+0x37/0xb0
[852.438507] entry_SYSCALL_64_after_hwframe+0x44/0xae
[852.438515] RIP: 0033:0x7f5c0415b317
[852.438523] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[852.438542] RSP: 002b:00007ffd765039a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[852.438553] RAX: ffffffffffffffda RBX: 000055e4d7829dd0 RCX: 00007f5c0415b317
[852.438562] RDX: 00007ffd76503a00 RSI: 00000000c0406469 RDI: 0000000000000017
[852.438571] RBP: 00007ffd76503a00 R08: 0000000000000000 R09: 0000000000000081
[852.438579] R10: 00000000ffffff7f R11: 0000000000000246 R12: 00000000c0406469
[852.438587] R13: 0000000000000017 R14: 00007ffd76503a00 R15: 0000000000000000
[852.438598] </TASK>
[852.438602] Modules linked in: snd_hda_codec_hdmi i915 mei_hdcp x86_pkg_temp_thermal snd_hda_intel snd_intel_dspcfg drm_buddy coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec ttm ghash_clmulni_intel snd_hwdep snd_hda_core e1000e drm_dp_helper ptp snd_pcm mei_me drm_kms_helper pps_core mei syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_lpss_pci smsc75xx usbnet mii
[852.440310] ---[ end trace e52cdd2fe4fd911c ]---
v2: Fix typos in the commit message.
Fixes: 7e00897be8 ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.")
Fixes: bc1922e5d3 ("drm/i915: Fix a race between vma / object destruction and unbinding")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220222133209.587978-1-thomas.hellstrom@linux.intel.com