With the global kmem_cache shrink infrastructure gone, there's nothing
special left and we can convert them over.
I'm splitting this up into one patch per kmem_cache because there's
quite a bit of noise in renaming the static global.slab_vmas to just
slab_vmas.
We have to keep the i915_drv.h include in i915_globals, otherwise
nothing pulls in GEM_BUG_ON anymore.
v2: Make slab static (Jason, 0day)
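For reference, the shape of the conversion (a sketch assuming the usual
<linux/slab.h> helpers; details are illustrative, not the exact patch):

static struct kmem_cache *slab_vmas;

struct i915_vma *i915_vma_alloc(void)
{
        return kmem_cache_zalloc(slab_vmas, GFP_KERNEL);
}

void i915_vma_free(struct i915_vma *vma)
{
        kmem_cache_free(slab_vmas, vma);
}

void i915_vma_module_exit(void)
{
        kmem_cache_destroy(slab_vmas);
}

int __init i915_vma_module_init(void)
{
        slab_vmas = KMEM_CACHE(i915_vma, 0);
        if (!slab_vmas)
                return -ENOMEM;

        return 0;
}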
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727121037.2041102-9-daniel.vetter@ffwll.ch
UAPI Changes:
- Disable mmap ioctl for gen12+ (excl. TGL-LP)
- Start enabling HuC loading by default for upcoming Gen12+
platforms (excludes TGL and RKL)
Core Changes:
- Backmerge of drm-next
Driver Changes:
- Revert "i915: use io_mapping_map_user" (Eero, Matt A)
- Initialize the TTM device and memory managers (Thomas)
- Major rework to the GuC submission backend to prepare
for enabling on new platforms (Michal Wa., Daniele,
Matt B, Rodrigo)
- Fix i915_sg_page_sizes to record dma segments rather
than physical pages (Thomas)
- Locking rework to prep for TTM conversion (Thomas)
- Replace IS_GEN and friends with GRAPHICS_VER (Lucas)
- Use DEVICE_ATTR_RO macro (Yue)
- Static code checker fixes (Zhihao)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YMHeDxg9VLiFtyn3@jlahtine-mobl.ger.corp.intel.com
In the shrinker, we protect framebuffers from light reclaim as we
typically expect framebuffers to be reused in the near future (and with
low latency requirements). We can apply the same logic to GGTT
eviction and defer framebuffers to the second pass, which is only used
if the caller is desperate enough to wait for space to become available.
In most cases, the caller will use a smaller partial vma instead of
trying to force the object into the GGTT if doing so will cause other
users to be evicted.
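A sketch of the filter implementing this policy (i915_vma_is_active()
and i915_vma_is_scanout() are existing vma predicates; the second-pass
wiring in the eviction loop is elided):

static bool defer_evict(struct i915_vma *vma)
{
        /* Anything still active needs a wait; try everything else first */
        if (i915_vma_is_active(vma))
                return true;

        /* Framebuffers are expected to be reused, with low latency */
        if (i915_vma_is_scanout(vma))
                return true;

        return false;
}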
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210119214336.1463-5-chris@chris-wilson.co.uk
On Braswell and Broxton (also known as Cherryview and Apollolake), we
need to serialise updates of the GGTT using the big stop_machine()
hammer. This has the side effect of appearing to lockdep as a possible
reclaim (since it uses the cpuhp mutex and that is tainted by per-cpu
allocations). However, we want to use vm->mutex (including ggtt->mutex)
from within the shrinker and so must avoid such possible taints. For this
purpose, we introduced the asynchronous vma binding, and we can apply it
to PIN_GLOBAL so long as we take care to add the necessary waits for
the worker afterwards.
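For callers that means adding an explicit wait after pinning, roughly
(a sketch; i915_vma_wait_for_bind() stands in for whatever wait the
bind worker exposes):

err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL);
if (err)
        return err;

/* The PTE writes may have been queued on a worker; wait for them */
err = i915_vma_wait_for_bind(vma);
if (err) {
        i915_vma_unpin(vma);
        return err;
}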
Closes: https://gitlab.freedesktop.org/drm/intel/issues/211
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-3-chris@chris-wilson.co.uk
This is really just an alias of mmap_gtt. The 'mmap offset' nomenclature
comes from the value returned by this ioctl, which is the offset into the
device fd that userspace uses with mmap(2).
mmap_gtt was our initial mmap_offset implementation; this extends
our CPU mmap support to allow additional fault handlers that depend on
the object's backing pages.
Note that we multiplex mmap_gtt and mmap_offset through the same ioctl,
and use the zero-extending behaviour of drm to differentiate between
them when we inspect the flags.
To support multiple mmap types on an object we need multiple
mmap_offsets per object (each offset in the global device address
space corresponding to a unique instance of the object for a file + mmap
type). As we drop the simplified drm core idea of a single mmap_offset,
we need to provide replacement hooks for the dumb mmap interface as
well.
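From userspace the flow is roughly (a sketch using libdrm's drmIoctl()
and the i915 uapi struct; error handling trimmed, WC chosen just as an
example):

struct drm_i915_gem_mmap_offset arg = {
        .handle = handle,             /* GEM handle, e.g. from GEM_CREATE */
        .flags = I915_MMAP_OFFSET_WC, /* or _WB, _UC, _GTT */
};
void *ptr;

/* Ask for the fake offset for this handle + mmap type... */
if (drmIoctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &arg))
        return NULL;

/* ...then hand it to plain mmap(2) on the device fd */
ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
           arg.offset);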
Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1675
Testcase: igt/gem_mmap_offset
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191204120032.3682839-1-chris@chris-wilson.co.uk
Replace the struct_mutex requirement for pinning the i915_vma with the
local vm->mutex instead. Note that the vm->mutex is tainted by the
shrinker (we require unbinding from inside fs-reclaim) and so we cannot
allocate while holding that mutex. Instead we have to preallocate
workers to allocate and apply the PTE updates after we have
reserved their slot in the drm_mm (using fences to order the PTE writes
with the GPU work and with later unbind).
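The resulting pin flow, as an outline (the helper names follow the
series, but treat this as a sketch rather than the exact code):

struct i915_vma_work *work;
int err;

/* Preallocate outside vm->mutex: that mutex is tainted by reclaim */
work = i915_vma_work();
if (!work)
        return -ENOMEM;

err = mutex_lock_interruptible(&vma->vm->mutex);
if (err == 0) {
        /* Reserve the drm_mm slot, then queue the PTE writes */
        err = i915_vma_insert(vma, size, alignment, flags);
        if (err == 0)
                err = i915_vma_bind(vma, cache_level, flags, work);
        mutex_unlock(&vma->vm->mutex);
}

/* Commit (or, on error, drop) the preallocated bind worker */
dma_fence_work_commit(&work->base);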
In adding the asynchronous vma binding, one subtle requirement is to
avoid coupling the binding fence into the backing object->resv. That is,
the asynchronous binding only applies to the vma timeline itself and not
to the pages, as that is a more global timeline (the binding of one vma
does not need to be ordered with another vma, nor does the implicit GEM
fencing depend on a vma, only on writes to the backing store). Keeping
the vma binding distinct from the backing store timelines is verified by
a number of async gem_exec_fence and gem_exec_schedule tests. The way we
do this is quite simple: we keep the fence for the vma binding separate
and only wait on it as required, and never add it to the obj->resv
itself.
Another consequence of reducing the locking around the vma is that the
destruction of the vma is no longer globally serialised by struct_mutex.
A natural solution would be to add a kref to i915_vma, but that requires
decoupling the reference cycles, possibly by introducing a new
i915_mm_pages object that is owned by both obj->mm and vma->pages.
However, we have not taken that route due to the overshadowing lmem/ttm
discussions, and instead play a series of complicated games with
trylocks to (hopefully) ensure that only one destruction path is called!
v2: Add some commentary, and some helpers to reduce patch churn.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
We need the rename of reservation_object to dma_resv.
The resolution for this merge came from linux-next:
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 14 Aug 2019 12:48:39 +1000
Subject: [PATCH] drm: fix up fallout from "dma-buf: rename reservation_object to dma_resv"
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
drivers/gpu/drm/i915/gt/intel_engine_pool.c | 8 ++++----
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pool.c b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
index 03d90b49584a..4cd54c569911 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pool.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
@@ -43,12 +43,12 @@ static int pool_active(struct i915_active *ref)
 {
         struct intel_engine_pool_node *node =
                 container_of(ref, typeof(*node), active);
-        struct reservation_object *resv = node->obj->base.resv;
+        struct dma_resv *resv = node->obj->base.resv;
         int err;
 
-        if (reservation_object_trylock(resv)) {
-                reservation_object_add_excl_fence(resv, NULL);
-                reservation_object_unlock(resv);
+        if (dma_resv_trylock(resv)) {
+                dma_resv_add_excl_fence(resv, NULL);
+                dma_resv_unlock(resv);
         }
 
         err = i915_gem_object_pin_pages(node->obj);
which is a simplified version of a previous one, which had:
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We were using the last_fence to track the last request that used this
vma that might be interpreted by a fence register and forced ourselves
to wait for this request before modifying any fence register that
overlapped our vma. Due to the requirement that we track any XY_BLT
command, linear or tiled, this in effect meant that we had to track the
vma for its active lifespan anyway, so we can forgo the explicit
last_fence tracking and just use the whole vma->active.
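In code, the fence register update then waits on the vma's whole
activity tracker; a sketch (fence_prepare() is a hypothetical stand-in
for the update path):

static int fence_prepare(struct i915_vma *vma)
{
        if (!vma)
                return 0;

        /*
         * Wait for everything still using the vma, not just a cached
         * last_fence request, before rewriting the overlapping fence
         * register.
         */
        return i915_active_wait(&vma->active);
}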
Another solution would be to pipeline the register updates, which would
help resolve some long-running stalls on gen3 (but only gen2 and gen3!)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190812174804.26180-1-chris@chris-wilson.co.uk
The shrinker cannot touch objects used by the contexts (logical state
and ring). Currently we mark those as "pin_global" to let the shrinker
skip over them. However, if we remove them from the shrinker lists
entirely, we don't even have to include them in our shrink accounting.
By keeping the unshrinkable objects in our shrinker tracking, we report
a large number of objects available to be shrunk, and leave the shrinker
deeply unsatisfied when we fail to reclaim those. The shrinker will
persist in trying to reclaim the unavailable objects, forcing the system
into a livelock (not even hitting the dreaded oomkiller).
v2: Extend unshrinkable protection for perma-pinned scratch and guc
allocations (Tvrtko)
v3: Notice that we should be pinned when marking unshrinkable and so the
link cannot be empty; merge duplicate paths.
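The v3 assertion shows up in the marking helper; a sketch of it (field
names follow the shrinker rework, details illustrative):

void i915_gem_object_make_unshrinkable(struct drm_i915_gem_object *obj)
{
        struct drm_i915_private *i915 = to_i915(obj->base.dev);
        unsigned long flags;

        if (!i915_gem_object_is_shrinkable(obj))
                return;

        spin_lock_irqsave(&i915->mm.obj_lock, flags);

        /* v3: we must be pinned to get here, so the link cannot be empty */
        GEM_BUG_ON(list_empty(&obj->mm.link));

        /* Off the shrinker lists: reclaim never even sees this object */
        list_del_init(&obj->mm.link);
        i915->mm.shrink_count--;
        i915->mm.shrink_memory -= obj->base.size;

        spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
}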
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190802212137.22207-1-chris@chris-wilson.co.uk
Looking forward, we need to break the struct_mutex dependency on
i915_gem_active. In the meantime, external use of i915_gem_active is
quite beguiling: little do new users suspect that it implies a barrier,
as each request it tracks must be ordered wrt the previous one. As one
of many, it can be used to track activity across multiple timelines, a
shared fence, which fits our unordered request submission much better. We
need to steer external users away from the singular, exclusive fence
imposed by i915_gem_active to i915_active instead. As part of that
process, we move i915_gem_active out of i915_request.c into
i915_active.c to start separating the two concepts, and rename it to
i915_active_request (both to tie it to the concept of tracking just one
request, and to give it a longer, less appealing name).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-5-chris@chris-wilson.co.uk
We currently track GPU memory usage inside VMA, such that we never
release memory used by the GPU until after it has finished accessing it.
However, we may want to track other resources aside from VMA, or we may
want to split a VMA into multiple independent regions and track each
separately. For this purpose, generalise our request tracking (akin to
struct reservation_object) so that we can embed it into other objects.
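The idea is that any object can embed the tracker and supply a retire
callback; a sketch (the init/ref signatures have varied across kernel
versions, so read these as illustrative):

struct foo {
        struct i915_active active; /* every request still using foo */
};

/* Called once all tracked requests have been retired */
static void foo_retire(struct i915_active *ref)
{
        struct foo *foo = container_of(ref, typeof(*foo), active);

        kfree(foo); /* the GPU has finished with foo's resources */
}

/*
 * At creation:  i915_active_init(i915, &foo->active, foo_retire);
 * For each use: i915_active_ref(&foo->active, timeline, rq);
 */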
v2: Tweak error handling during selftest setup.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-2-chris@chris-wilson.co.uk
Previously we only accommodated having a vma pinned by a small number of
users, with the maximum being pinned for use by the display engine. As
such, we used a small bitfield only large enough to allow the vma to
be pinned twice (for back/front buffers) in each scanout plane. Keeping
the maximum permissible pin_count small allows us to quickly catch a
potential leak. However, as we want to split a 4096B page into 64
different cachelines and pin each cacheline for use by a different
timeline, we will exceed the current maximum permissible vma->pin_count
and so the time has come to enlarge it.
Whilst we are here, try to pull together the similar bits:
Address/layout specification:
- bias, mappable, zone_4g: address limit specifiers
- fixed: address override, limits still apply though
- high: not strictly an address limit, but an address direction to search
Search controls:
- nonblock, nonfault, noevict
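Laid out as flag definitions, the grouping looks something like this
(bit positions here are placeholders, not the final values):

/* Address/layout specification */
#define PIN_MAPPABLE     BIT_ULL(0) /* aperture-limited */
#define PIN_ZONE_4G      BIT_ULL(1) /* below the 4G boundary */
#define PIN_HIGH         BIT_ULL(2) /* search top-down */
#define PIN_OFFSET_BIAS  BIT_ULL(3) /* soft minimum address */
#define PIN_OFFSET_FIXED BIT_ULL(4) /* exact address; limits still apply */

/* Search controls */
#define PIN_NONBLOCK     BIT_ULL(5) /* fail rather than wait for space */
#define PIN_NONFAULT     BIT_ULL(6)
#define PIN_NOEVICT      BIT_ULL(7)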
v2: Rewrite the guideline comment on bit consumption.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <john.C.Harrison@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-2-chris@chris-wilson.co.uk
Removing the pin bias from GuC allows us to not check for GuC every time
we pin a context, which fixes the assertion error on the unresolved GuC
platform default in the mock contexts selftest.
It also seems that we were using uninitialized WOPCM variables when
setting the GuC pin bias. The pin bias has to be set after WOPCM init,
but before the call to i915_gem_contexts_init, where the first contexts
are pinned.
v2:
This also makes it so that there's no need to set GuC variables from
within the WOPCM init function or to move the WOPCM init, while keeping
the correct initialization order. Also for mock tests the pin bias is
left at 0 and we make sure that the pin bias with GuC will not be
smaller than without GuC.
v3:
Avoid unused i915 in intel_guc_ggtt_offset if debug is disabled.
v4:
Squash with WOPCM init reordering.
Moved the i915_ggtt_pin_bias helper to this patch, and made some
functions use it instead of directly dereferencing i915->ggtt.
v5:
Since we no longer use wopcm.guc.base for the pin bias, there's no need
to validate it; it has already been verified in WOPCM init.
v6:
Deleted the now-unnecessary includes introduced in previous versions.
Dropped naming changes from dev_priv to i915 for better patch readability.
v7:
Changed some comments to make more sense in the context they're in.
v8:
Moved and renamed the function that returns wopcm.guc.size to
intel_guc.c:intel_guc_reserved_gtt_size, to avoid any possible confusion
with the pin_bias in ggtt, which should be used for pinning.
Fixed the patch not applying on the most recent upstream.
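For illustration, the helper described in v8 could be as small as this
(a sketch; guc_to_i915() and the wopcm fields as in the driver of that
era, with the return type an assumption):

u32 intel_guc_reserved_gtt_size(struct intel_guc *guc)
{
        /*
         * The bottom of the GGTT aliases the GuC's WOPCM mapping and
         * cannot be used by the GuC, so reserve wopcm.guc.size for it.
         */
        return guc_to_i915(guc)->wopcm.guc.size;
}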
Fixes: f7dc0157e4 ("drm/i915/uc: Fetch GuC/HuC firmwares from guc/huc specific init")
Testcase: igt/drv_selftest/mock_contexts #GuC
Signed-off-by: Jakub Bartmiński <jakub.bartminski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180727141148.30874-3-jakub.bartminski@intel.com
Using a VMA on more than one timeline concurrently is the exception
rather than the rule (i.e. using it concurrently on multiple engines). As we
expect to only use one active tracker, store the most recently used
tracker inside the i915_vma itself and only fall back to the rbtree if
we need two or more concurrent active trackers.
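The lookup then becomes a cache check with an rbtree fallback; a sketch
(the field names and the rbtree helper here are illustrative, not the
exact layout):

static struct i915_gem_active *active_instance(struct i915_vma *vma, u64 idx)
{
        struct i915_vma_active *active;

        /* Fast path: most vmas only ever see a single timeline */
        active = vma->last_active;
        if (active && active->timeline == idx)
                return &active->base;

        /* Slow path: find or insert a tracker in the per-vma rbtree */
        return active_rbtree_instance(vma, idx); /* hypothetical helper */
}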
v2: Comments on how we overwrite any existing last_active cache.
v3: __list_del_entry() before list_replace_init() is confusing and, much
more important, entirely redundant.
v4: Note that both last_active and the rbtree may be simultaneously
tracking this timeline, albeit with different requests, and so the vma
may be retired twice for the same timeline.
v5: No, that list_del is required!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706123157.9645-1-chris@chris-wilson.co.uk
In the next patch, we will want to be able to use more flexible request
timelines that can hop between engines. From the vma pov, we can then
no longer rely on the binding of this request to an engine, and so cannot
ensure that different requests are ordered through a per-engine
timeline; we must instead track the activity of all timelines. (We track
activity on the vma itself to prevent unbinding from HW before the HW
has finished accessing it.)
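Concretely, the per-timeline trackers can live in an rbtree keyed on
the u64 timeline id; a sketch of the node and lookup (illustrative):

struct i915_vma_active {
        struct i915_gem_active base; /* one request on one timeline */
        struct i915_vma *vma;
        struct rb_node node;
        u64 timeline;
};

static struct i915_vma_active *active_lookup(struct i915_vma *vma, u64 idx)
{
        struct rb_node *rb = vma->active.rb_node;

        while (rb) {
                struct i915_vma_active *active =
                        rb_entry(rb, struct i915_vma_active, node);

                if (active->timeline == idx)
                        return active;

                if (active->timeline < idx)
                        rb = rb->rb_right;
                else
                        rb = rb->rb_left;
        }

        return NULL;
}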
v2: Switch to an rbtree for 32b safety (since using u64 as a radixtree
index is fraught with aliasing of unsigned longs).
v3: s/lookup_active/active_instance/ because we can never agree on names
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-5-chris@chris-wilson.co.uk