Everything about the file is about display, and mostly about types
related to display. Move under display/ as intel_display_types.h to
reflect the facts.
There's still plenty to clean up, but start off with moving the file
where it logically belongs and naming according to contents.
v2: fix the include guard name in the renamed file
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190806113933.11799-1-jani.nikula@intel.com
To maintain a fast lookup from a GT centric irq handler, we want the
engine lookup tables on the intel_gt. To avoid having multiple copies of
the same multi-dimension lookup table, move the generic user engine
lookup into an rbtree (for fast and flexible indexing).
v2: Split uabi_instance cf uabi_class
v3: Set uabi_class/uabi_instance after collating all engines to provide a
stable uabi across parallel unordered construction.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-2-chris@chris-wilson.co.uk
We do not notify userspace when the scheduler capabilities are changed
(due to wedging the driver) and as such userspace will expect the caps
to be static and unchanging. Make it so, and so we only need to compute
our caps once during driver registration.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-1-chris@chris-wilson.co.uk
Switch to tracking activity via i915_active on individual nodes, only
keeping a list of retired objects in the cache, and reaping the cache
when the engine itself idles.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190804124826.30272-2-chris@chris-wilson.co.uk
In the next patch, we would like to couple into the engine wakeref to
free the batch pool on idling. The caveat here is that we therefore want
to track the engine wakeref more precisely and to hold it instead of the
broader GT wakeref as we process the ioctl.
v2: Avoid introducing odd semantics for a shortlived timeline->mutex
acquisition interface.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190804124826.30272-1-chris@chris-wilson.co.uk
As we increase the number of RCU objects, it becomes easier for us to
have several hundred thousand objects in the deferred RCU free queues.
An example is gem_ctx_create/files which continually creates active
contexts, which are not immediately freed upon close as they are kept
alive by outstanding requests. This lack of backpressure allows the
context objects to persist until they overwhelm and starve the system.
We can increase our backpressure by flushing the freed object queue upon
closing the device fd which should then not impact other clients.
Testcase: igt/gem_ctx_create/*files
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190802212137.22207-2-chris@chris-wilson.co.uk
The shrinker cannot touch objects used by the contexts (logical state
and ring). Currently we mark those as "pin_global" to let the shrinker
skip over them, however, if we remove them from the shrinker lists
entirely, we don't event have to include them in our shrink accounting.
By keeping the unshrinkable objects in our shrinker tracking, we report
a large number of objects available to be shrunk, and leave the shrinker
deeply unsatisfied when we fail to reclaim those. The shrinker will
persist in trying to reclaim the unavailable objects, forcing the system
into a livelock (not even hitting the dread oomkiller).
v2: Extend unshrinkable protection for perma-pinned scratch and guc
allocations (Tvrtko)
v3: Notice that we should be pinned when marking unshrinkable and so the
link cannot be empty; merge duplicate paths.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190802212137.22207-1-chris@chris-wilson.co.uk
Recently discovered in commit bdae33b8b8 ("drm/i915: Use maximum write
flush for pwrite_gtt") was that we needed to our full write barrier
before changing the GGTT PTE to ensure that our indirect writes through
the GTT landed before the PTE changed (and the writes end up in a
different page). That also applies to our GGTT relocation path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730112151.5633-4-chris@chris-wilson.co.uk
We only compute the lrc_descriptor() on pinning the context, i.e.
infrequently, so we do not benefit from storing the template as the
addressing mode is also fixed for the lifetime of the intel_context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730133035.1977-9-chris@chris-wilson.co.uk
Teach igt_spinner to only use our internal structs, decoupling the
interface from the GEM contexts. This makes it easier to avoid
requiring ce->gem_context back references for kernel_context that may
have them in future.
v2: Lift engine lock to verify_wa() caller.
v3: Less than v2, but more so
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190731081126.9139-1-chris@chris-wilson.co.uk
My plan for the future is to have kernel contexts not to have a GEM
context backpointer (as they will not belong to any GEM context). In a
few places, we use ce->gem_context to simply obtain the i915 backpointer,
for which we can use ce->engine->i915 instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730163441.16477-1-chris@chris-wilson.co.uk
Track the currently bound address space used by the HW context. Minor
conversions to use the local intel_context.vm are made, leaving behind
some more surgery required to make intel_context the primary through the
selftests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730143209.4549-2-chris@chris-wilson.co.uk
Throw out our local hacks of header tests now that the more generic
kbuild versions are upstream.
At least for now, continue to keep the header tests behind
CONFIG_DRM_I915_WERROR=y knob.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190729140847.18557-1-jani.nikula@intel.com
The microcontrollers are part of GT so it makes logical sense to have
them sanitized at the same time. This also fixed an issue with our
status tracking where the FW load status is not reset around
hibernation.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190723091404.6449-2-chris@chris-wilson.co.uk
Avoid a global idle barrier by reconfiguring each context by rewriting
them with MI_STORE_DWORD from the kernel context.
v2: We only need to determine the desired register values once, they are
the same for all contexts.
v3: Don't remove the kernel context from the list of known GEM contexts;
the world is not ready for that yet.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190716213443.9874-1-chris@chris-wilson.co.uk
All the intel_uc_* can now be moved to work on the intel_uc structure
for better encapsulation of uc-related actions.
Note: I've introduced uc_to_gt instead of uc_to_i915 because the aim is
to move everything to be gt-focused in the medium term, so we would've
had to replace it soon anyway.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Acked-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190713100016.8026-8-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Having taken the first step in encapsulating the functionality by moving
the related files under gt/, the next step is to start encapsulating by
passing around the relevant structs rather than the global
drm_i915_private. In this step, we pass intel_gt to intel_reset.c
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712192953.9187-1-chris@chris-wilson.co.uk
Follow dim checkpatch recommendation so it doesn't complain on that now
and again on header file modifications.
v2: drop testing leftover (Chris)
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712112429.740-2-janusz.krzysztofik@linux.intel.com
Right idea, wrong lock. We already drop struct_mutex before we free the
mmap_offset when freeing the object, so we need to take the vma manager
lock when manipulating the mmap_offset address space for our selftests.
Fixes: 8221d21b06 ("drm/i915/selftests: Lock the drm_mm while modifying")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190711065215.4004-2-chris@chris-wilson.co.uk
Preemption via GuC submission is not being supported with its current
legacy incarnation. The current FW does support a similar pre-emption
flow via H2G, but it is class-based instead of being instance-based,
which doesn't fit well with the i915 tracking. To fix this, the
firmware is being updated to better support our needs with a new flow,
so we can safely remove the old code.
v2 (Daniele): resurrect & rebase, reword commit message, remove
preempt_context as well
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190710005437.3496-2-daniele.ceraolospurio@intel.com
Specify that we do want a 64b value for sizeof(u32) as we want to
compute the mask of the upper 62bits.
v2: Use round_down() for automatic type promotion
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190710161413.7115-1-chris@chris-wilson.co.uk
If we map an object as readonly into the GTT, we know that the GPU
cannot have written to it and so the object is not dirty and we don't
need to flush the writes back to the system.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190709081718.27843-1-chris@chris-wilson.co.uk
set_page_dirty says:
For pages with a mapping this should be done under the page lock
for the benefit of asynchronous memory errors who prefer a
consistent dirty state. This rule can be broken in some special
cases, but should be better not to.
Under those rules, it is only safe for us to use the plain set_page_dirty
calls for shmemfs/anonymous memory. Userptr may be used with real
mappings and so needs to use the locked version (set_page_dirty_lock).
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203317
Fixes: 5cc9ed4b9a ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl")
References: 6dcc693bc5 ("ext4: warn when page is dirtied without buffers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190708140327.26825-1-chris@chris-wilson.co.uk
When using MI operations, we do not care which engine we use, so use
them all where possible, and where inconvenient double check we have the
engine we selected at random.
v2: Drop the local copy of engine->sseu to avoid an unchecked deref
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190704212343.6820-1-chris@chris-wilson.co.uk
During the context execution tests, we issue a lot of work and discard a
lot of objects without releasing the lock and allowing the background
reaper to free those objects. Insert a small break between each pass to
flush the worker.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190704165317.21060-1-chris@chris-wilson.co.uk
We need to setup the workarounds on all engines, with the knowledge
about which platforms each workaround applies to kept together in the
workaround list. As such, we can pull the w/a initialisation into the
common setup and try to avoid duplicating knowledge about when to setup
the workarounds.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703135805.7310-2-chris@chris-wilson.co.uk
Since reservation_object_fini() does an immediate free, rather than
kfree_rcu as normal, we have to delay the release until after the RCU
grace period has elapsed (i.e. from the rcu cleanup callback) so that we
can rely on the RCU protected access to the fences while the object is a
zombie.
i915_gem_busy_ioctl relies on having an RCU barrier to protect the
reservation in order to avoid having to take a reference and strong
memory barriers.
v2: Order is important; only release after putting the pages!
Fixes: c03467ba40 ("drm/i915/gem: Free pages before rcu-freeing the object")
Testcase: igt/gem_busy/close-race
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703180601.10950-1-chris@chris-wilson.co.uk
As we have dropped the final reference to the object, we do not need to
wait until after the rcu grace period to drop its pages. We still require
struct_mutex to completely unbind the object to release the pages, so we
still need a free-worker to manage that from process context. By
scheduling the release of pages before waiting for the rcu should mean
that we are not trapping those pages from beyond the reach of the
shrinker.
v2: Pass along the request to skip if the vma is busy to the underlying
unbind routine, to avoid checking the reservation underneath the
i915->mm.obj_lock which may be used from inside irq context.
v3: Flip the bit for unbinding while active, for later convenience.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111035
Fixes: a93615f900 ("drm/i915: Throw away the active object retirement complexity")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-6-chris@chris-wilson.co.uk
We frequently, but not frequently enough!, remember to flush residual
operations and objects at the end of a live subtest. The purpose is to
cleanup after every subtest, leaving a clean slate for the next subtest,
and perform early detection of leaky state. As this should ideally be
common for all live subtests, pull the task into a common teardown
routine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-1-chris@chris-wilson.co.uk
With the subdirectories we lost the ability to build individual files on
the command line, for example:
$ make drivers/gpu/drm/i915/display/intel_display.o
This was due to the top level directory missing from header search
path. Add the header search paths to subdir Makefiles.
Note that none of the other options in the top level i915 Makefile are
taken into account when building individual files. Usually this is not a
concern.
Reported-by: Imre Deak <imre.deak@intel.com>
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626143618.21800-2-jani.nikula@intel.com
Since the reset path wants to recover the engines itself, it only wants
to reinitialise the hardware using i915_gem_init_hw(). Pull the call to
intel_engines_resume() to the module init/resume path so we can avoid it
during reset.
Fixes: 79ffac8599 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626154549.10066-3-chris@chris-wilson.co.uk
As we have already plugged the w->dma into the reservation_object, and
have set ourselves up to automatically signal the request and w->dma on
completion, we do not need to export the rq->fence directly and just use
the w->dma fence.
This avoids having to take the reservation_lock inside the worker which
cross-release lockdep would complain about. :)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621215733.12070-1-chris@chris-wilson.co.uk
If we introduce a callback for i915_active that is only called the first
time we use the i915_active and is symmetrically paired with the
i915_active.retire callback, we can replace the open-coded and
non-atomic implementations -- which will be very fragile (i.e. broken)
upon removing the struct_mutex serialisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-4-chris@chris-wilson.co.uk
Remove the accumulated optimisations that we have for i915_vma_retire
and reduce it to the bare essential of tracking the active object
reference. This allows us to only use atomic operations, and so will be
able to avoid the struct_mutex requirement.
The principal loss here is the shrinker MRU bumping, so now if we have
to shrink, we will do so in much more random order and more likely to
try and shrink recently used objects. That is a nuisance, but shrinking
active objects is a second step we try to avoid and will always be a
system-wide performance issue.
The other loss is here is in the automatic pruning of the
reservation_object when idling. This is not as large an issue as upon
reservation_object introduction as now adding new fences into the object
replaces already signaled fences, keeping the array compact. But we do
lose the auto-expiration of stale fences and unused arrays. That may be
a noticeable problem for which we need to re-implement autopruning.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-3-chris@chris-wilson.co.uk
i915_gem_wait_for_idle() and i915_retire_requests() introduce a
dependency on the timeline->mutex. This is problematic as we want to
later perform allocations underneath i915_active.mutex, forming a link
between the shrinker, the timeline and active mutexes. Nip this cycle in
the bud by removing the acquisition of the timeline mutex (i.e.
retiring) from inside the shrinker.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-1-chris@chris-wilson.co.uk
Our timelines are stored inside intel_gt so we can convert the interface
to take exactly that and not i915.
At the same time re-order the params to our more typical layout and
replace the backpointer to the new containing structure.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-31-tvrtko.ursulin@linux.intel.com
For gt related operations it makes more logical sense to stay in the realm
of gt instead of dereferencing via driver i915.
This patch handles a few of the easy ones with work requiring more
refactoring still outstanding.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-30-tvrtko.ursulin@linux.intel.com
Having introduced struct intel_gt (named the anonymous structure in i915)
we can start using it to compartmentalize our code better. It makes more
sense logically to have the code internally like this and it will also
help with future split between gt and display in i915.
v2:
* Keep ggtt flush before fb obj flush. (Chris)
v3:
* Fix refactoring fail.
* Always flush ggtt writes. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-23-tvrtko.ursulin@linux.intel.com
Since commit 79ffac8599 ("drm/i915: Invert the GEM wakeref
hierarchy"), the request creation itself took responsibility for
managing the engine/GT wakerefs and so we can remove the redundant grabs
in our selftests.
References: 79ffac8599 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190620102432.31580-1-chris@chris-wilson.co.uk
When using a global seqno, we required a precise stop-the-workd event to
handle preemption and unwind the global seqno counter. To accomplish
this, we would preempt to a special out-of-band context and wait for the
machine to report that it was idle. Given an idle machine, we could very
precisely see which requests had completed and which we needed to feed
back into the run queue.
However, now that we have scrapped the global seqno, we no longer need
to precisely unwind the global counter and only track requests by their
per-context seqno. This allows us to loosely unwind inflight requests
while scheduling a preemption, with the enormous caveat that the
requests we put back on the run queue are still _inflight_ (until the
preemption request is complete). This makes request tracking much more
messy, as at any point then we can see a completed request that we
believe is not currently scheduled for execution. We also have to be
careful not to rewind RING_TAIL past RING_HEAD on preempting to the
running context, and for this we use a semaphore to prevent completion
of the request before continuing.
To accomplish this feat, we change how we track requests scheduled to
the HW. Instead of appending our requests onto a single list as we
submit, we track each submission to ELSP as its own block. Then upon
receiving the CS preemption event, we promote the pending block to the
inflight block (discarding what was previously being tracked). As normal
CS completion events arrive, we then remove stale entries from the
inflight tracker.
v2: Be a tinge paranoid and ensure we flush the write into the HWS page
for the GPU semaphore to pick in a timely fashion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk