Since commit 6ca9a2beb5 ("drm/i915: Unwind i915_gem_init() failure")
we believed that we correctly handle all errors encountered during
GuC initialization, including special one that indicates request to
run driver with disabled GPU submission (-EIO).
Unfortunately since commit 121981fafe ("drm/i915/guc: Combine
enable_guc_loading|submission modparams") we stopped using that
error code to avoid unwanted fallback to execlist submission mode.
In result any GuC initialization failure was treated as non-recoverable
error leading to driver load abort, so we could not even read related
GuC error log to investigate cause of the problem.
For now always return -EIO on any uC hardware related failure.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190811195132.9660-5-michal.wajdeczko@intel.com
We keep a global seed for the legacy BSD round-robin selector, but in
our testing of multiple simultaneous client workloads, a random seed
spreads the load more evenly. (As even as an initial round-robin selector
can be!) Removing the global is one less variable we have to find a home
for!
We can simulate multi-client (both same and mixed workloads) using
igt/gem_wsim to work out optimal strategies and then compare our
simulation with the actual transcoder on multi-engine machines. This
fixed round-robin turns out to be one of the worst methods.
No user is advised to use this method; the current suggestion is to use
a virtual engine for agnostic batches, randomised submission or using
the busyness tracking to select the most idle engine at the time of
dispatch. At the present time, intel-media is explicit, but libva still
seems to use it, with the exception of batches that must execute on vcs0.
Oh well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809091010.23281-2-chris@chris-wilson.co.uk
As we increase the number of RCU objects, it becomes easier for us to
have several hundred thousand objects in the deferred RCU free queues.
An example is gem_ctx_create/files which continually creates active
contexts, which are not immediately freed upon close as they are kept
alive by outstanding requests. This lack of backpressure allows the
context objects to persist until they overwhelm and starve the system.
We can increase our backpressure by flushing the freed object queue upon
closing the device fd which should then not impact other clients.
Testcase: igt/gem_ctx_create/*files
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190802212137.22207-2-chris@chris-wilson.co.uk
Until Icelake, each engine had its own set of 64 MOCS registers. In
order to simplify, Tigerlake moves to only 64 Global MOCS registers,
which are no longer part of the engine context. Since these registers
are now global, they also only need to be initialized once.
>From Gen12 onwards, MOCS must specify the target cache (3:2) and LRU
management (5:4) fields and cannot be programmed to 'use the value from
Private PAT', because these fields are no longer part of the PPAT. Also
cacheability control (1:0) field has changed, 00 no longer means 'use
controls from page table', but uncacheable (UC).
v2 (Lucas):
- Move the changes to the fault registers to a separate commit - the
old ones overlap with the range used by the new global MOCS
(requested by Daniele)
v3 (Lucas):
- Clarify comment about setting the unused entries to the same value
of index 0, that is the invalid entry (requested by Daniele)
- Move changes to DONE_REG and ERROR_GEN6 to a separate commit
(requested by Daniele)
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730180407.5993-5-lucas.demarchi@intel.com
The "misc" terminology doesn't clearly explain what we intend to cover
in this phase. The only thing we used ot do in there apart from FW fetch
was initializing the log workqueue, with the latter being required only in
the very rare case where we enable the log relay. As we no longer create
our own workqueue, piggybacking on the system_highpri_wq instead, we can
rename the function to clarify that they only fetch/release the blobs.
v2: only create log wq when needed (Michal), reword commit msg
accordingly
v3: after rebase the wq is gone, reword commit msg accordingly
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190713100016.8026-2-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Replace mixed "_fini"/"_cleanup"/"_cleanup_hw" suffixes found in names
of functions called from i915_driver_release() with "_release" suffix
consistently. This provides better code readability, especially
helpful when trying to work out which phase the code is in.
Functions names starting with "i915_driver_", i.e., those defined in
drivers/gpu/dri/i915/i915_drv.c, just have their "cleanup" or "fini"
parts of their names replaced with the "_release" suffix, while names
of functions coming from other source files have been suffixed with
"_driver_release" to avoid ambiguity with other possible .release entry
points.
v2: early_probe pairs better with late_release (Chris)
v3: fix typo in commit message (Joonas)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712112429.740-5-janusz.krzysztofik@linux.intel.com
As we have dropped the final reference to the object, we do not need to
wait until after the rcu grace period to drop its pages. We still require
struct_mutex to completely unbind the object to release the pages, so we
still need a free-worker to manage that from process context. By
scheduling the release of pages before waiting for the rcu should mean
that we are not trapping those pages from beyond the reach of the
shrinker.
v2: Pass along the request to skip if the vma is busy to the underlying
unbind routine, to avoid checking the reservation underneath the
i915->mm.obj_lock which may be used from inside irq context.
v3: Flip the bit for unbinding while active, for later convenience.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111035
Fixes: a93615f900 ("drm/i915: Throw away the active object retirement complexity")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-6-chris@chris-wilson.co.uk
Having introduced struct intel_gt (named the anonymous structure in i915)
we can start using it to compartmentalize our code better. It makes more
sense logically to have the code internally like this and it will also
help with future split between gt and display in i915.
v2:
* Keep ggtt flush before fb obj flush. (Chris)
v3:
* Fix refactoring fail.
* Always flush ggtt writes. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-23-tvrtko.ursulin@linux.intel.com
We need to keep the context image pinned in memory until after the GPU
has finished writing into it. Since it continues to write as we signal
the final breadcrumb, we need to keep it pinned until the request after
it is complete. Currently we know the order in which requests execute on
each engine, and so to remove that presumption we need to identify a
request/context-switch we know must occur after our completion. Any
request queued after the signal must imply a context switch, for
simplicity we use a fresh request from the kernel context.
The sequence of operations for keeping the context pinned until saved is:
- On context activation, we preallocate a node for each physical engine
the context may operate on. This is to avoid allocations during
unpinning, which may be from inside FS_RECLAIM context (aka the
shrinker)
- On context deactivation on retirement of the last active request (which
is before we know the context has been saved), we add the
preallocated node onto a barrier list on each engine
- On engine idling, we emit a switch to kernel context. When this
switch completes, we know that all previous contexts must have been
saved, and so on retiring this request we can finally unpin all the
contexts that were marked as deactivated prior to the switch.
We can enhance this in future by flushing all the idle contexts on a
regular heartbeat pulse of a switch to kernel context, which will also
be used to check for hung engines.
v2: intel_context_active_acquire/_release
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk