When trying to bring IS_ACTIVE to linux/kconfig.h I thought it wouldn't
provide much value just encapsulating it in a boolean context. So I also
added the support for handling undefined macros as the IS_ENABLED()
counterpart. However the feedback received from Masahiro Yamada was that
it is too ugly, not providing much value. And just wrapping in a boolean
context is too dumb - we could simply open code it.
As detailed in commit babaab2f47 ("drm/i915: Encapsulate kconfig
constant values inside boolean predicates"), the IS_ACTIVE macro was
added to workaround a compilation warning. However after checking again
our current uses of IS_ACTIVE it turned out there is only
1 case in which it triggers a warning in clang (due
-Wconstant-logical-operand) and 2 in smatch. All the others
can simply use the shorter version, without wrapping it in any macro.
So here I'm dialing all the way back to simply removing the macro. That
single case hit by clang can be changed to make the constant come first,
so it doesn't think it's mask:
- if (context && CONFIG_DRM_I915_FENCE_TIMEOUT)
+ if (CONFIG_DRM_I915_FENCE_TIMEOUT && context)
As talked with Dan Carpenter, that logic will be added in smatch as
well, so it will also stop warning about it.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211005171728.3147094-1-lucas.demarchi@intel.com
Add support to enable/disable PLANE_SURF Decryption Request bit.
It requires only to enable plane decryption support when following
condition met.
1. PXP session is enabled.
2. Buffer object is protected.
v2:
- Used gen fb obj user_flags instead gem_object_metadata. [Krishna]
v3:
- intel_pxp_gem_object_status() API changes.
v4: use intel_pxp_is_active (Daniele)
v5: rebase and use the new protected object status checker (Daniele)
v6: used plane state for plane_decryption to handle async flip
as suggested by Ville.
v7: check pxp session while plane decrypt state computation. [Ville]
removed pointless code. [Ville]
v8 (Daniele): update PXP check
v9: move decrypt check after icl_check_nv12_planes() when overlays
have fb set (Juston)
v10 (Daniele): update PXP check again to match rework in earlier
patches and don't consider protection valid if the object has not
been used in an execbuf beforehand.
Cc: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Cc: Huang Sean Z <sean.z.huang@intel.com>
Cc: Gaurav Kumar <kumar.gaurav@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Juston Li <juston.li@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com> #v9
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210924191452.1539378-14-alan.previn.teres.alexis@intel.com
This api allow user mode to create protected buffers and to mark
contexts as making use of such objects. Only when using contexts
marked in such a way is the execution guaranteed to work as expected.
Contexts can only be marked as using protected content at creation time
(i.e. the parameter is immutable) and they must be both bannable and not
recoverable. Given that the protected session gets invalidated on
suspend, contexts created this way hold a runtime pm wakeref until
they're either destroyed or invalidated.
All protected objects and contexts will be considered invalid when the
PXP session is destroyed and all new submissions using them will be
rejected. All intel contexts within the invalidated gem contexts will be
marked banned. Userspace can detect that an invalidation has occurred via
the RESET_STATS ioctl, where we report it the same way as a ban due to a
hang.
v5: squash patches, rebase on proto_ctx, update kerneldoc
v6: rebase on obj create_ext changes
v7: Use session counter to check if an object it valid, hold wakeref in
context, don't add a new flag to RESET_STATS (Daniel)
v8: don't increase guilty count for contexts banned during pxp
invalidation (Rodrigo)
v9: better comments, avoid wakeref put race between pxp_inval and
context_close, add usage examples (Rodrigo)
v10: modify internal set/get-protected-context functions to not
return -ENODEV when setting PXP param to false or getting param
when running on pxp-unsupported hw or getting param when i915
was built with CONFIG_PXP off
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210924191452.1539378-11-alan.previn.teres.alexis@intel.com
We may end up in i915_ttm_bo_destroy() in an error path before the
object is fully initialized. In that case it's not correct to call
__i915_gem_free_object(), because that function
a) Assumes the gem object refcount is 0, which it isn't.
b) frees the placements which are owned by the caller until the
init_object() region ops returns successfully. Fix this by providing
a lightweight cleanup function __i915_gem_object_fini() which is also
called by __i915_gem_free_object().
While doing this, also make sure we call dma_resv_fini() as part of
ordinary object destruction and not from the RCU callback that frees
the object. This will help track down bugs where the object is incorrectly
locked from an RCU lookup.
Finally, make sure the object isn't put on the region list until it's
either locked or fully initialized in order to block list processing of
partially initialized objects.
v2:
- The TTM object backend memory was freed before the gem pages were
put. Separate this functionality into __i915_gem_object_pages_fini()
and call it from the TTM delete_mem_notify() callback.
v3:
- Include i915_gem_object_free_mmaps() in __i915_gem_object_pages_fini()
to make sure we don't inadvertedly introduce a race.
Fixes: 48b0961269 ("drm/i915: Move __i915_gem_free_object to ttm_bo_destroy")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20210930113236.583531-1-thomas.hellstrom@linux.intel.com
We really only need memcpy restore for objects that affect the
operability of the migrate context. That is, primarily the page-table
objects of the migrate VM.
Add an object flag, I915_BO_ALLOC_PM_EARLY for objects that need early
restores using memcpy and a way to assign LMEM page-table object flags
to be used by the vms.
Restore objects without this flag with the gpu blitter and only objects
carrying the flag using TTM memcpy.
Initially mark the migrate, gt, gtt and vgpu vms to use this flag, and
defer for a later audit which vms actually need it. Most importantly, user-
allocated vms with pinned page-table objects can be restored using the
blitter.
Performance-wise memcpy restore is probably as fast as gpu restore if not
faster, but using gpu restore will help tackling future restrictions in
mappable LMEM size.
v4:
- Don't mark the aliasing ppgtt page table flags for early resume, but
rather the ggtt page table flags as intended. (Matthew Auld)
- The check for user buffer objects during early resume is pointless, since
they are never marked I915_BO_ALLOC_PM_EARLY. (Matthew Auld)
v5:
- Mark GuC LMEM objects with I915_BO_ALLOC_PM_EARLY to have them restored
before we fire up the migrate context.
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-8-thomas.hellstrom@linux.intel.com
Just evict unpinned objects to system. For pinned LMEM objects,
make a backup system object and blit the contents to that.
Backup is performed in three steps,
1: Opportunistically evict evictable objects using the gpu blitter.
2: After gt idle, evict evictable objects using the gpu blitter. This will
be modified in an upcoming patch to backup pinned objects that are not used
by the blitter itself.
3: Backup remaining pinned objects using memcpy.
Also move uC suspend to after 2) to make sure we have a functional GuC
during 2) if using GuC submission.
v2:
- Major refactor to make sure gem_exec_suspend@hang-SX subtests work, and
suspend / resume works with a slightly modified GuC submission enabling
patch series.
v3:
- Fix a potential use-after-free (Matthew Auld)
- Use i915_gem_object_create_shmem() instead of
i915_gem_object_create_region (Matthew Auld)
- Minor simplifications (Matthew Auld)
- Fix up kerneldoc for i195_ttm_restore_region().
- Final lmem_suspend() call moved to i915_gem_backup_suspend from
i915_gem_suspend_late, since the latter gets called at driver unload
and we don't unnecessarily want to run it at that time.
v4:
- Interface change of ttm- & lmem suspend / resume functions to use
flags rather than bools. (Matthew Auld)
- Completely drop the i915_gem_backup_suspend change (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-5-thomas.hellstrom@linux.intel.com
An upcoming common pattern is to traverse the region object list and
perform certain actions on all objects in a region. It's a little tricky
to get the list locking right, in particular since a gem object may
change region unless it's pinned or the object lock is held.
Define a function that does this for us and that takes an argument that
defines the action to be performed on each object.
v3:
- Improve structure documentation a bit (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-3-thomas.hellstrom@linux.intel.com
Close the divergence which has caused patches not to apply and
have a solid baseline for the PXP patches that Rodrigo will send
a topic branch PR for.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Clang warns:
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:127:13: warning:
variable 'err' is used uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
} else if (PTR_ERR(import) != -EOPNOTSUPP) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:138:9: note:
uninitialized use occurs here
return err;
^~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:127:9: note: remove
the 'if' if its condition is always true
} else if (PTR_ERR(import) != -EOPNOTSUPP) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:95:9: note:
initialize the variable 'err' to silence this warning
int err;
^
= 0
The test is expected to pass if i915_gem_prime_import() returns
-EOPNOTSUPP so initialize err to zero in this case.
Fixes: cdb35d1ed6 ("drm/i915/gem: Migrate to system at dma-buf attach time (v7)")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210824225427.2065517-3-nathan@kernel.org
Clang warns a couple of times:
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:63:6: warning:
variable 'import_obj' is used uninitialized whenever 'if' condition is
true [-Wsometimes-uninitialized]
if (import != &obj->base) {
^~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:80:22: note:
uninitialized use occurs here
i915_gem_object_put(import_obj);
^~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:63:2: note: remove
the 'if' if its condition is always false
if (import != &obj->base) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c:38:46: note:
initialize the variable 'import_obj' to silence this warning
struct drm_i915_gem_object *obj, *import_obj;
^
= NULL
Shuffle the import_obj initialization above these if statements so that
it is not used uninitialized.
Fixes: d7b2cb380b ("drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210824225427.2065517-2-nathan@kernel.org
Usage of Transparent Hugepages was disabled in 9987da4b5d
("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it
appears majority of performance regressions reported with an enabled IOMMU
can be almost eliminated by turning them on, lets just do that.
To err on the side of safety we keep the current default in cases where
IOMMU is not active, and only when it is default to the "huge=within_size"
mode. Although there probably would be wins to enable them throughout,
more extensive testing across benchmarks and platforms would need to be
done.
With the patch and IOMMU enabled my local testing on a small Skylake part
shows OglVSTangent regression being reduced from ~14% (IOMMU on versus
IOMMU off) to ~2% (same comparison but with THP on).
More detailed testing done in the below referenced Gitlab issue by Eero:
Skylake GT4e:
Performance drops from enabling IOMMU:
30-35% SynMark CSDof
20-25% Unigine Heaven, MemBW GPU write, SynMark VSTangent
~20% GLB Egypt (1/2 screen window)
10-15% GLB T-Rex (1/2 screen window)
8-10% GfxBench T-Rex, MemBW GPU blit
7-8% SynMark DeferredAA + TerrainFly* + ZBuffer
6-7% GfxBench Manhattan 3.0 + 3.1, SynMark TexMem128 & CSCloth
5-6% GfxBench CarChase, Unigine Valley
3-5% GfxBench Vulkan & GL AztecRuins + ALU2, MemBW GPU texture,
SynMark Fill*, Deferred, TerrainPan*
1-2% Most of the other tests
With the patch drops become:
20-25% SynMark TexMem*
15-20% GLB Egypt (1/2 screen window)
10-15% GLB T-Rex (1/2 screen window)
4-7% GfxBench T-Rex, GpuTest Triangle
1-8% GfxBench ALU2 (offscreen 1%, onscreen 8%)
3% GfxBench Manhattan 3.0, SynMark CSDof
2-3% Unigine Heaven + Valley, MemBW GPU texture
1-3 GfxBench Manhattan 3.1 + CarChase + Vulkan & GL AztecRuins
Broxton:
Performance drops from IOMMU, without patch:
30% MemBW GPU write
25% SynMark ZBuffer + Fill*
20% MemBW GPU blit
15% MemBW GPU blend, GpuTest Triangle
10-15% MemBW GPU texture
10% GLB Egypt, Unigine Heaven (had hangs), SynMark TerrainFly*
7-9% GLB T-Rex, GfxBench Manhattan 3.0 + T-Rex,
SynMark Deferred* + TexMem*
6-8% GfxBench CarChase, Unigine Valley,
SynMark CSCloth + ShMapVsm + TerrainPan*
5-6% GfxBench Manhattan 3.1 + GL AztecRuins,
SynMark CSDof + TexFilterTri
2-4% GfxBench ALU2, SynMark DrvRes + GSCloth + ShMapPcf + Batch[0-5] +
TexFilterAniso, GpuTest GiMark + 32-bit Julia
And with patch:
15-20% MemBW GPU texture
10% SynMark TexMem*
8-9% GLB Egypt (1/2 screen window)
4-5% GLB T-Rex (1/2 screen window)
3-6% GfxBench Manhattan 3.0, GpuTest FurMark,
SynMark Deferred + TexFilterTri
3-4% GfxBench Manhattan 3.1 + T-Rex, SynMark VSInstancing
2-4% GpuTest Triangle, SynMark DeferredAA
2-3% Unigine Heaven + Valley
1-3% SynMark Terrain*
1-2% GfxBench CarChase, SynMark TexFilterAniso + ZBuffer
Tigerlake-H:
20-25% MemBW GPU texture
15-20% GpuTest Triangle
13-15% SynMark TerrainFly* + DeferredAA + HdrBloom
8-10% GfxBench Manhattan 3.1, SynMark TerrainPan* + DrvRes
6-7% GfxBench Manhattan 3.0, SynMark TexMem*
4-8% GLB onscreen Fill + T-Rex + Egypt (more in onscreen than
offscreen versions of T-Rex/Egypt)
4-6% GfxBench CarChase + GLES AztecRuins + ALU2, GpuTest 32-bit Julia,
SynMark CSDof + DrvState
3-5% GfxBench T-Rex + Egypt, Unigine Heaven + Valley, GpuTest Plot3D
1-7% Media tests
2-3% MemBW GPU blit
1-3% Most of the rest of 3D tests
With the patch:
6-8% MemBW GPU blend => the only regression in these tests (compared
to IOMMU without THP)
4-6% SynMark DrvState (not impacted) + HdrBloom (improved)
3-4% GLB T-Rex
~3% GLB Egypt, SynMark DrvRes
1-3% GfxBench T-Rex + Egypt, SynMark TexFilterTri
1-2% GfxBench CarChase + GLES AztecRuins, Unigine Valley,
GpuTest Triangle
~1% GfxBench Manhattan 3.0/3.1, Unigine Heaven
Perf of several tests actually improved with IOMMU + THP, compared to no
IOMMU / no THP:
10-15% SynMark Batch[0-3]
5-10% MemBW GPU texture, SynMark ShMapVsm
3-4% SynMark Fill* + Geom*
2-3% SynMark TexMem512 + CSCloth
1-2% SynMark TexMem128 + DeferredAA
As a summary across all platforms, these are the benchmarks where enabling
THP on top of IOMMU enabled brings regressions:
* Skylake GT4e:
20-25% SynMark TexMem*
(whereas all MemBW GPU tests either improve or are not affected)
* Broxton J4205:
7% MemBW GPU texture
2-3% SynMark TexMem*
* Tigerlake-H:
7% MemBW GPU blend
Other benchmarks show either lowering of regressions or improvements.
v2:
* Add Kconfig dependency to transparent hugepages and some help text.
* Move to helper for easier handling of kernel build options.
v3:
* Drop Kconfig. (Daniel)
v4:
* Add some benchmark results to commit message.
v5:
* Add explicit regression summary to commit message. (Eero)
References: b901bb8932 ("drm/i915/gemfs: enable THP")
References: 9987da4b5d ("drm/i915: Disable THP until we have a GPU read BW W/A")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/430
Co-developed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210909114448.508493-1-tvrtko.ursulin@linux.intel.com
It's been invariant since
commit ccbc1b9794
Author: Jason Ekstrand <jason@jlekstrand.net>
Date: Thu Jul 8 10:48:30 2021 -0500
drm/i915/gem: Don't allow changing the VM on running contexts (v4)
this just completes the deed. I've tried to split out prep work for
more careful review as much as possible, this is what's left:
- get_ppgtt gets simplified since we don't need to grab a temporary
reference - we can rely on the temporary reference for the gem_ctx
while we inspect the vm. The new vm_id still needs a full
i915_vm_open ofc. This also removes the final caller of context_get_vm_rcu
- A pile of selftests can now just look at ctx->vm instead of
rcu_dereference_protected( , true) or similar things.
- All callers of i915_gem_context_vm also disappear.
- I've changed the hugepage selftest to set scrub_64K without any
locking, because when we inspect that setting we're also not taking
any locks either. It works because it's a selftests that's careful
(single threaded gives you nice ordering) and not a live driver
where races can happen from anywhere.
These can only be split up further if we have some intermediate state
with a bunch more rcu_dereference_protected(ctx->vm, true), just to
shut up lockdep and sparse.
The conversion to __rcu happened in
commit a4e7ccdac3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Oct 4 14:40:09 2019 +0100
drm/i915: Move context management under GEM
Note that we're not breaking the actual bugfix in there: The real
bugfix is pushing the i915_vm_relase onto a separate worker, to avoid
locking inversion issues. The rcu conversion was just thrown in for
entertainment value on top (no vm lookup isn't even close to anything
that's a hotpath where removing the single spinlock can be measured).
v2: Rebase over the change to move the i915_vm_put() into
i915_gem_context_release().
v3: Trivial conflict against repainted shed.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-9-daniel.vetter@ffwll.ch
The comment added in
commit b81dde7194
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue May 21 22:11:29 2019 +0100
drm/i915: Allow userspace to clone contexts on creation
and moved in
commit 27dbae8f36
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Nov 6 09:13:12 2019 +0000
drm/i915/gem: Safely acquire the ctx->vm when copying
suggested that i915_address_space were at least intended to be managed
through SLAB_TYPESAFE_BY_RCU:
* This ppgtt may have be reallocated between
* the read and the kref, and reassigned to a third
* context. In order to avoid inadvertent sharing
* of this ppgtt with that third context (and not
* src), we have to confirm that we have the same
* ppgtt after passing through the strong memory
* barrier implied by a successful
* kref_get_unless_zero().
But extensive git history search has not brough any such reuse to
light.
What has come to light though is that ever since
commit 2850748ef8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Oct 4 14:39:58 2019 +0100
drm/i915: Pull i915_vma_pin under the vm->mutex
(yes this commit is earlier) the final i915_vma_put call has been
moved from i915_gem_context_free (now called _release) to
context_close, which means it's not actually safe anymore to access
the ctx->vm pointer without lock helds, because it might disappear at
any moment. Note that superficially things all still work, because the
i915_address_space is RCU protected since
commit b32fa81115
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 20 19:37:05 2019 +0100
drm/i915/gtt: Defer address space cleanup to an RCU worker
except the very clever macro above (which is designed to protected
against object reuse due to SLAB_TYPESAFE_BY_RCU or similar tricks)
results in an endless loop if the refcount of the ctx->vm ever
permanently drops to 0. Which it totally now can.
Fix that by moving the final i915_vm_put to where it should be.
Note that i915_gem_context is rcu protected, but _only_ the final
kfree. This means anyone who chases a pointer to a gem ctx solely
under the protection can pretty only call kref_get_unless_zero(). This
seems to be pretty much the case, aside from a bunch of cases that
consult the scheduling information without any further protection.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Fixes: 2850748ef8 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-3-daniel.vetter@ffwll.ch
Pull swiotlb updates from Konrad Rzeszutek Wilk:
"A new feature called restricted DMA pools. It allows SWIOTLB to
utilize per-device (or per-platform) allocated memory pools instead of
using the global one.
The first big user of this is ARM Confidential Computing where the
memory for DMA operations can be set per platform"
* 'stable/for-linus-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: (23 commits)
swiotlb: use depends on for DMA_RESTRICTED_POOL
of: restricted dma: Don't fail device probe on rmem init failure
of: Move of_dma_set_restricted_buffer() into device.c
powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()
s390/pv: fix the forcing of the swiotlb
swiotlb: Free tbl memory in swiotlb_exit()
swiotlb: Emit diagnostic in swiotlb_exit()
swiotlb: Convert io_default_tlb_mem to static allocation
of: Return success from of_dma_set_restricted_buffer() when !OF_ADDRESS
swiotlb: add overflow checks to swiotlb_bounce
swiotlb: fix implicit debugfs declarations
of: Add plumbing for restricted DMA pool
dt-bindings: of: Add restricted DMA pool
swiotlb: Add restricted DMA pool initialization
swiotlb: Add restricted DMA alloc/free support
swiotlb: Refactor swiotlb_tbl_unmap_single
swiotlb: Move alloc_size to swiotlb_find_slots
swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
swiotlb: Update is_swiotlb_active to add a struct device argument
swiotlb: Update is_swiotlb_buffer to add a struct device argument
...
Pull rdma updates from Jason Gunthorpe:
"This is quite a small cycle, no major series stands out. The HNS and
rxe drivers saw the most activity this cycle, with rxe being broken
for a good chunk of time. The significant deleted line count is due to
a SPDX cleanup series.
Summary:
- Various cleanup and small features for rtrs
- kmap_local_page() conversions
- Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns
- Cache the IB subnet prefix
- Rework how CRC is calcuated in rxe
- Clean reference counting in iwpm's netlink
- Pull object allocation and lifecycle for user QPs to the uverbs
core code
- Several small hns features and continued general code cleanups
- Fix the scatterlist confusion of orig_nents/nents introduced in an
earlier patch creating the append operation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (90 commits)
RDMA/mlx5: Relax DCS QP creation checks
RDMA/hns: Delete unnecessary blank lines.
RDMA/hns: Encapsulate the qp db as a function
RDMA/hns: Adjust the order in which irq are requested and enabled
RDMA/hns: Remove RST2RST error prints for hw v1
RDMA/hns: Remove dqpn filling when modify qp from Init to Init
RDMA/hns: Fix QP's resp incomplete assignment
RDMA/hns: Fix query destination qpn
RDMA/hfi1: Convert to SPDX identifier
IB/rdmavt: Convert to SPDX identifier
RDMA/hns: Bugfix for incorrect association between dip_idx and dgid
RDMA/hns: Bugfix for the missing assignment for dip_idx
RDMA/hns: Bugfix for data type of dip_idx
RDMA/hns: Fix incorrect lsn field
RDMA/irdma: Remove the repeated declaration
RDMA/core/sa_query: Retry SA queries
RDMA: Use the sg_table directly and remove the opencoded version from umem
lib/scatterlist: Fix wrong update of orig_nents
lib/scatterlist: Provide a dedicated function to support table append
RDMA/hns: Delete unused hns bitmap interface
...
Using the I915_MMAP_TYPE_FIXED mmap type requires the TTM backend, so
for that mmap type, use __i915_gem_object_create_user() instead of
i915_gem_object_create_internal(), as we really want to tests objects
mmap-able by user-space.
This also means that the out-of-space error happens at object creation
and returns -ENXIO rather than -ENOSPC, so fix the code up to expect
that on out-of-offset-space errors.
Finally only use I915_MMAP_TYPE_FIXED for LMEM and SMEM for now if
testing on LMEM-capable devices. For stolen LMEM, we still take the
same path as for integrated, as that haven't been moved over to TTM yet,
and user-space should not be able to create out of stolen LMEM anyway.
v2:
- Check the presence of the obj->ops->mmap_offset callback rather than
hardcoding the supported mmap regions in can_mmap() (Maarten Lankhorst)
Fixes: 7961c5b60f ("drm/i915: Add TTM offset argument to mmap.")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210831122931.157536-1-thomas.hellstrom@linux.intel.com
RDMA is the only in-kernel user that uses __sg_alloc_table_from_pages to
append pages dynamically. In the next patch. That mode will be extended
and that function will get more parameters. So separate it into a unique
function to make such change more clear.
Link: https://lore.kernel.org/r/20210824142531.3877007-2-maorg@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
UAPI Changes:
- Add I915_MMAP_OFFSET_FIXED
On devices with local memory `I915_MMAP_OFFSET_FIXED` is the only valid
type. On devices without local memory, this caching mode is invalid.
As caching mode when specifying `I915_MMAP_OFFSET_FIXED`, WC or WB will
be used, depending on the object placement on creation. WB will be used
when the object can only exist in system memory, WC otherwise.
Userspace: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11888
- Reinstate the mmap ioctl for (already released) integrated Gen12 platforms
Rationale: Otherwise media driver breaks eg. for ADL-P. Long term goal is
still to sunset the IOCTL even for integrated and require using mmap_offset.
- Reject caching/set_domain IOCTLs on discrete
Expected to become immutable property of the BO
- Disallow changing context parameters after first use on Gen12 and earlier
- Require setting context parameters at creation on platforms after Gen12
Rationale (for both): Allow less dynamic changes to the context to simplify
the implementation and avoid user shooting theirselves in the foot.
- Drop I915_CONTEXT_PARAM_RINGSIZE
Userspace PR for compute-driver has not been merged
- Drop I915_CONTEXT_PARAM_NO_ZEROMAP
Userspace PR for libdrm / Beignet was never landed
- Drop CONTEXT_CLONE API
Userspace PR for Mesa was never landed
- Drop getparam support for I915_CONTEXT_PARAM_ENGINES
Only existed for symmetry wrt. setparam, never used.
- Disallow bonding of virtual engines
Drop the prep work, no hardware has been released needing it.
- (Implicit) Disable gpu relocations
Media userspace was the last userspace to still use them. They
have converted so performance can be regained with an update.
Core Changes:
- Merge topic branch 'topic/i915-ttm-2021-06-11' (from Maarten)
- Merge topic branch 'topic/revid_steppings' (from Matt R)
- Merge topic branch 'topic/xehp-dg2-definitions-2021-07-21' (from Matt R)
- Backmerges drm-next (Rodrigo)
Driver Changes:
- Initial workarounds for ADL-P (Clint)
- Preliminary code for XeHP/DG2 (Stuart, Umesh, Matt R, Prathap, Ram,
Venkata, Akeem, Tvrtko, John, Lucas)
- Fix ADL-S DMA mask size to 39 bits (Tejas)
- Remove code for CNL (Lucas)
- Add ADL-P GuC/HuC firmwares (John)
- Update HuC to 7.9.3 for TGL/ADL-S/RKL (John)
- Fix -EDEADLK handling regression (Ville)
- Implement Wa_1508744258 for DG1 and Gen12 iGFX (Jose)
- Extend Wa_1406941453 to ADL-S (Jose)
- Drop unnecessary workarounds per stepping for SKL/BXT/ICL (Matt R)
- Use fuse info to enable SFC on Gen12 (Venkata)
- Unconditionally flush the pages on acquire on EHL/JSL (Matt A)
- Probe existence of backing struct pages upon userptr creation (Chris, Matt A)
- Add an intermediate GEM proto-context to delay real context creation (Jason)
- Implement SINGLE_TIMELINE with a syncobj (Jason)
- Set the watchdog timeout directly in intel_context_set_gem (Jason)
- Disallow userspace from creating contexts with too many engines (Jason)
- Revert "drm/i915/gem: Asynchronous cmdparser" (Jason)
- Revert "drm/i915: Propagate errors on awaiting already signaled fences" (Jason)
- Revert "drm/i915: Skip over MI_NOOP when parsing" (Jason)
- Revert "drm/i915: Shrink the GEM kmem_caches upon idling" (Daniel)
- Always let TTM handle object migration (Jason)
- Correct the locking and pin pattern for dma-buf (Thomas H, Michael R, Jason)
- Migrate to system at dma-buf attach time (Thomas, Michael R)
- MAJOR refactoring of the GuC backend code to allow for enabling on Gen11+
(Matt B, John, Michal Wa., Fernando, Daniele, Vinay)
- Update GuC firmware interface to v62.0.0 (John, Michal Wa., Matt B)
- Add GuCRC feature to hand over the control of HW RC6 to the GuC on
Gen12+ when GuC submission is enabled (Vinay, Sujaritha, Daniele,
John, Tvrtko)
- Use the correct IRQ during resume and eliminate DRM IRQ midlayer (Thomas Z)
- Add pipelined page migration and clearing (Chris, Thomas H)
- Use TTM for system memory on discrete (Thomas H)
- Implement object migration for display vs. dma-buf (Thomas H)
- Perform execbuffer object locking as a separate step (Thomas H)
- Add support for explicit L3BANK steering (Matt, Daniele)
- Remove duplicated call to ops->pread (Daniel)
- Fix pagefault disabling in the first execbuf slowpath (Daniel)
- Simplify userptr locking (Thomas H)
- Improvements to the GuC CTB code (Matt B, John)
- Make GT workaround upper bounds exclusive (Matt R)
- Check for nomodeset in i915_init() first (Daniel)
- Delete now unused gpu reloc code (Daniel)
- Document RFC plans for GuC submission, DRM scheduler and new parallel
submit uAPI (Matt B)
- Reintroduce buddy allocator this time with TTM (Matt A)
- Support forcing page size with LMEM (Matt A)
- Add i915_sched_engine to abstract a submission queue between backends (Matt B)
- Use accelerated move in TTM (Ram)
- Fix memory leaks from TTM backend (Thomas H)
- Introduce WW transaction helper (Thomas H)
- Improve debug Kconfig texts a bit (Daniel)
- Unify user object creation code (Jason)
- Use a table for i915_init/exit (Jason)
- Move slabs to module init/exit (Daniel)
- Remove now unused i915_globals (Daniel)
- Extract i915_module.c (Daniel)
- Consistently use adl-p/adl-s in WA comments (Jose)
- Finish INTEL_GEN and friends conversion (Lucas)
- Correct variable/function namings (Lucas)
- Code checker fixes (Wan, Matt A)
- Tracepoint improvements (Matt B)
- Kerneldoc improvements (Tvrtko, Jason, Matt A, Maarten)
- Selftest improvements (Chris, Matt A, Tejas, Thomas H, John, Matt B,
Rahul, Vinay)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YQ0JmYiXhGskNcrI@jlahtine-mobl.ger.corp.intel.com
Jason Ekstrand requested a more efficient method than userptr+set-domain
to determine if the userptr object was backed by a complete set of pages
upon creation. To be more efficient than simply populating the userptr
using get_user_pages() (as done by the call to set-domain or execbuf),
we can walk the tree of vm_area_struct and check for gaps or vma not
backed by struct page (VM_PFNMAP). The question is how to handle
VM_MIXEDMAP which may be either struct page or pfn backed...
With discrete we are going to drop support for set_domain(), so offering
a way to probe the pages, without having to resort to dummy batches has
been requested.
v2:
- add new query param for the PROBE flag, so userspace can easily
check if the kernel supports it(Jason).
- use mmap_read_{lock, unlock}.
- add some kernel-doc.
v3:
- In the docs also mention that PROBE doesn't guarantee that the pages
will remain valid by the time they are actually used(Tvrtko).
- Add a small comment for the hole finding logic(Jason).
- Move the param next to all the other params which just return true.
Testcase: igt/gem_userptr_blits/probe
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210723113405.427004-1-matthew.auld@intel.com