Silence smatch by demonstrating that guc->stage_desc_pool is allocated
following a successful guc_stage_desc_pool_create(),
drivers/gpu/drm/i915/i915_guc_submission.c:1293 i915_guc_submission_init() error: we previously assumed 'guc->stage_desc_pool' could be null (see line 1261)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171106114833.31199-1-chris@chris-wilson.co.uk
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
If GuC firmware performs an engine reset while that engine had a
preemption pending, it will set the terminated attribute bit on our
preemption stage descriptor. GuC firmware retains all pending work
items for a high-priority GuC client, unlike the normal-priority GuC
client where work items are dropped. It wants to make sure the preempt-
to-idle work doesn't run when scheduling resumes, and uses this bit to
inform its scheduler and presumably us as well. Our job is to clear it
for the next preemption after reset, otherwise that and future
preemptions will never complete. We'll just clear it every time.
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171101221630.25086-1-jeff.mcgee@intel.com
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Pretty similar to what we have on execlists.
We're reusing most of the GEM code, however, due to GuC quirks we need a
couple of extra bits.
Preemption is implemented as GuC action, and actions can be pretty slow.
Because of that, we're using a mutex to serialize them. Since we're
requesting preemption from the tasklet, the task of creating a workitem
and wrapping it in GuC action is delegated to a worker.
To distinguish that preemption has finished, we're using additional
piece of HWSP, and since we're not getting context switch interrupts,
we're also adding a user interrupt.
The fact that our special preempt context has completed unfortunately
doesn't mean that we're ready to submit new work. We also need to wait
for GuC to finish its own processing.
v2: Don't compile out the wait for GuC, handle workqueue flush on reset,
no need for ordered workqueue, put on a reviewer hat when looking at my own
patches (Chris)
Move struct work around in intel_guc, move user interruput outside of
conditional (Michał)
Keep ring around rather than chase though intel_context
v3: Extract WA for flushing ggtt writes to a helper (Chris)
Keep work_struct in intel_guc rather than engine (Michał)
Use ordered workqueue for inject_preempt worker to avoid GuC quirks.
v4: Drop now unused INTEL_GUC_PREEMPT_OPTION_IMMEDIATE (Daniele)
Drop stray newlines, use container_of for intel_guc in worker,
check for presence of workqueue when flushing it, rather than
enable_guc_submission modparam, reorder preempt postprocessing (Chris)
v5: Make wq NULL after destroying it
v6: Swap struct guc_preempt_work members (Michał)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171026133558.19580-1-michal.winiarski@intel.com
We also want to support preemption with GuC submission backend.
In order to do that, we need to remember the priority, like we do on
execlists path.
v2: Remove completed prio == INT_MAX optimization
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-10-michal.winiarski@intel.com
We're using a special preempt context for HW to preempt into. We don't
want to emit any requests there, but we still need to wrap this context
into a valid GuC work item.
Let's cleanup the functions operating on GuC work items.
We can extract guc_request_add - responsible for adding GuC work item and
ringing the doorbell, and guc_wq_item_append - used by the function
above, not tied to the concept of gem request.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-7-michal.winiarski@intel.com
This second client is created with priority KMD_HIGH, and marked
as preemptive. This will allow us to request preemption using GuC actions.
v2: Extract clients creation into a helper, debugfs fixups. (Michał)
Recreate doorbell on init. (Daniele)
Move clients into an array.
v3: And move clients back from an array, to get rid of the enum (Michał)
v4: Use is_high_priority, move DRM_ERROR into __create_doorbell, move
GEM_BUG_ON inside guc_clients_create (Michał)
v5: Split the BUG_ON (Michał)
v6: Cleanup after error during doorbell reinit (Michał)
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171026141737.31656-1-michal.winiarski@intel.com
We were using first page of kernel context render state for sharing data
with GuC. While it's justified by the fact that those pages are not used
(note, GuC still enforces this layout and refuses to work if we remove
the extra page in front), it's also confusing (why are we using this
particular page?). Let's allocate a separate object instead.
v2: Drop kernel_context from GuC suspend/resume action handlers (Michel)
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-4-michal.winiarski@intel.com
Since it's a two-step process, we can have a cleaner error handling in
the caller if we do the allocations in a helper.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-3-michal.winiarski@intel.com
Apparently, this value is reserved and may be interpreted as changing
doorbell ownership. Even though we're not observing any side effects
now, let's skip over it to be consistent with the spec.
v2: Apply checkpatch (Sagar)
Suggested-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025200020.16636-2-michal.winiarski@intel.com
Now that we're handling request resubmission the same way as regular
submission (from the tasklet), we can move GuC initialization earlier,
before restarting the engines. This way, we're no longer being in the
state of flux during engine restart - we're already in user requested
submission mode.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025172519.10670-5-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The execlists emulation on top of the GuC (used for scheduling and
preemption) depends on the MI_USER_INTERRUPT for its notifications and
tasklet action. As we always employ the irq, there is no advantage in
ever disabling it while we are using the GuC, so allow us to arm the
breadcrumb irq when enabling GuC submission and disarm upon disabling.
The impact should be lessened by the delayed irq disabling we do (we
only disable after receiving an interrupt for which no one was wanting),
but allowing guc to explicitly manage the irq in relation to itself is
simpler and prevents an issue with losing an interrupt for preemption
as it is not coupled to an active request.
Internally, we add a reference counter (breadcrumbs.irq_enabled) as a
simple mechanism to allow GuC to keep the breadcrumb irq enabled. To
improve upon always enabling the irq while guc is selected, we need
to hook into the parking facility of intel_engines so that we only enable
the breadcrumbs while the GT is active (one step better would be to
individually park/unpark each engine).
In effect, this means that we keep the breadcrumb irq always enabled for
the entire duration the guc is busy, whereas before we would try to
switch it off whenever we idled for more than interrupt with no
associated waiters. The difference *should* be negligible in practice!
v2: Stop abusing fence signaling (and its auxiliary data structures) to
enable the breadcrumbs irqs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>,
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>,
Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-3-chris@chris-wilson.co.uk
Back in commit a4b2b01523 ("drm/i915: Don't mark an execlists
context-switch when idle") we noticed the presence of late
context-switch interrupts. We were able to filter those out by looking
at whether the ELSP remained active, but in commit beecec9017
("drm/i915/execlists: Preemption!") that became problematic as we now
anticipate receiving a context-switch event for preemption while ELSP
may be empty. To restore the spurious interrupt suppression, add a
counter for the expected number of pending context-switches and skip if
we do not need to handle this interrupt to make forward progress.
v2: Don't forget to switch on for preempt.
v3: Reduce the counter to a on/off boolean tracker. Declare the HW as
active when we first submit, and idle after the final completion event
(with which we confirm the HW says it is idle), and track each source
of activity separately. With a finite number of sources, it should aide
us in debugging which gets stuck.
Fixes: beecec9017 ("drm/i915/execlists: Preemption!")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171023213237.26536-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Prepared substructure rps for RPS related state. autoenable_work is
used for RC6 too hence it is defined outside rps structure. As we do
this lot many functions are refactored to use intel_rps *rps to access
rps related members. Hence renamed intel_rps_client pointer variables
to rps_client in various functions.
v2: Rebase.
v3: s/pm/gt_pm (Chris)
Refactored access to rps structure by declaring struct intel_rps * in
many functions.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com> #1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1507360055-19948-9-git-send-email-sagar.a.kamble@intel.com
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171010213010.7415-8-chris@chris-wilson.co.uk
There is function to tell how many ports we have, so use it.
We still have direct relationship with array size and port count,
so no harm was done.
Fixes: 76e70087d3 ("drm/i915: Make execlist port count variable")
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171010114857.13108-1-mika.kuoppala@intel.com
Commit faf654864b ("drm/i915: Unify uC variable types to avoid
flooding checkpatch.pl") breaks 32-bit kernel builds. Lets use
cast helper to make compiler happy.
v2: introduce ptr_to_u64 (Chris)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006130844.49012-1-michal.wajdeczko@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
With the code motion mostly done, convert all the uC code away
from uint??_t at once (only a couple dozen variables), so that
reading the checkpatch.pl output should actually pinpoint if
a new uint??_t was accidentally introduced.
v2: - Include intel_uc_fw.h too (Sagar)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006084940.15910-1-joonas.lahtinen@linux.intel.com
Move GuC submission declarations into dedicated header as we want to
keep uC specific code in separate files.
v2: fix include (Chris)
update commit message (Joonas)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: MichaĹ Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171004181343.66348-3-michal.wajdeczko@intel.com
As we emulate execlists on top of the GuC workqueue, it is not
restricted to just 2 ports and we can increase that number arbitrarily
to trade-off queue depth (i.e. scheduling latency) against pipeline
bubbles.
v2: rebase. better commit msg (Chris)
v3: rebase
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-5-mika.kuoppala@intel.com
When first execlist entry is processed, we move the port (contents).
Introduce function for this as execlist and guc use this common
operation.
v2: rebase. s/GEM_DEBUG_BUG/GEM_BUG (Chris)
v3: rebase
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-4-mika.kuoppala@intel.com
Engine's execlist related items have been increasing to
a point where a separate struct is warranted. Carve execlist
specific items to a dedicated struct to add clarity.
v2: add kerneldoc and fix whitespace (Joonas, Chris)
v3: csb_mmio changes, rebase
v4: s/\b(el|execlist)\b/execlists/ (Joonas)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> (v3)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-1-mika.kuoppala@intel.com
Our global struct with params is named exactly the same way
as new preferred name for the drm_i915_private function parameter.
To avoid such name reuse lets use different name for the global.
v5: pure rename
v6: fix
Credits-to: Coccinelle
@@
identifier n;
@@
(
- i915.n
+ i915_modparams.n
)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com
We can just operate on the wq_tail directly (in the process descriptor).
This allows us to remove the duplicated tail from the client. While I'm
here let's also remove the constants kept in the client and document our
locking requirements. This causes a small change in one of GuC debugfs
files. We're no longer reporting constant values (which I don't think
is a problem), but we're also no longer reporting the tail (does anyone
care?).
v2: Update tail after wqi contents. (Chris)
v3: Really update tail after wqi contents.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170918092536.12287-1-michal.winiarski@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
All we're really doing is incrementing a simple counter in a
doorbell_info struct. We can do without extra variables and a separate
counter kept in guc_client. Since it's gone, we're also removing its
debugfs.
The only functional change here, is that we're no longer treating 0 as a
special value. GuC doesn't seem to care, why should we?
v2: Restore desc->tail update.
v3: Drop the retry loop, assert that doorbell cookie doesn't change
behind our back.
v4: WARN rather than BUG, use xchg. (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170914105125.3031-1-michal.winiarski@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
To create an upper bound on number of GuC workitems, we need to change
the way that requests are being submitted. Rather than submitting each
request as an individual workitem, we can do coalescing in a similar way
we're handlig execlist submission ports. We also need to stop pretending
that we're doing "lite-restore" in GuC submission (we would create a
workitem each time we hit this condition). This allows us to completely
remove the reservation, replacing it with a compile time check.
v2: Also coalesce when replaying on reset (Daniele)
v3: Consistent wq_resv - per-request (Daniele)
v4: Squash removing wq_resv
v5: Reflect i915_guc_submit argument changes in doc
v6: Rebase on top of execlists reset/restart fix (Chris,Michał)
References: https://bugs.freedesktop.org/show_bug.cgi?id=101873
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170914083216.10192-2-michal.winiarski@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Originally removed in:
c1adab9703 ("drm/i915/guc: Remove failed doorbell stat from debugfs")
f1448a62a1 ("drm/i915/guc: Remove last submission result from debugfs")
Were accidentally restored in:
925344ccc9 ("BackMerge tag 'v4.12-rc5' into drm-next")
We can also remove unused variable and replace it with a WARN.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170914083216.10192-1-michal.winiarski@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Using the HWSP ggtt_offset to get the lrca offset is only correct if the
HWSP happens to be before it (when we reuse the PPHWSP of the kernel
context as the engine HWSP). Instead of making this assumption, get the
lrca offset from the kernel_context engine state.
And while looking at this part of the GuC interaction, it was also
noticed that the firmware expects the size of only the engine context
(context minus the execlist part, i.e. don't include the first 80
dwords), so pass the right size.
v2: Use the new macros to prevent abusive overuse of the old ones (Chris).
Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-2-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-2-chris@chris-wilson.co.uk
Not only the context image consist of two parts (the PPHWSP, and the
logical context state), but we also allocate a header at the start of
for sharing data with GuC. Thus every lrc looks like this:
| [guc] | [hwsp] [logical state] |
|<- our header ->|<- context image ->|
So far, we have oversimplified whenever we use each of these parts of the
context, just because the GuC header happens to be in page 0, and the
(PP)HWSP is in page 1. But this had led to using the same define for more
than one meaning (as a page index in the lrc and as 1 page).
This patch adds defines for the GuC shared page, the PPHWSP page and the
start of the logical state. It also updated the places where the old
define was being used. Since we are not changing the size (or format) of
the context, there are no functional changes.
v2: Use PPHWSP index for hws again.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-1-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-1-chris@chris-wilson.co.uk
The default values for the default scheduling policy come from the
GuC firmware itself. Transform the magic numbers into defines.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1505252197-27696-1-git-send-email-oscar.mateo@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We are passing parameters in the wrong order to find next zero bit, and
when it doesn't find anything it returns size (offset in the code), which
is always zero.
For reference the function is defined as:
find_next_bit( *addr, size, offset )
The incorrect parameter order was added by commit abddffdf36
("drm/i915/guc: Sanitize GuC client initialization"). Luckily, currently
we only use a single guc client and a single doorbell, which happens to be
zero; therefore it isn't necessary to backport this fix (which would be for
v4.12).
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170531000546.30762-1-michel.thierry@intel.com
If port[0] is occupied and we're trying to dequeue request from
different context, we will inevitably hit BUG_ON in port_assign.
Let's skip it - similar to what we're doing in execlists counterpart.
Fixes: 77f0d0e925 ("drm/i915/execlists: Pack the count into the low bits of the port.request")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Wajdeczko <michal.wajdeczko@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170523102400.9614-2-michal.winiarski@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Usefulness of these stats was over-advertised.
v2: remove duplicated engine stats (Chris)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170515170610.35528-1-michal.wajdeczko@intel.com
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Debugfs does not seems to be a right place to display transient data.
If we want to capture errors, we should log them.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170518113104.54400-3-michal.wajdeczko@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This stat is almost always zero unless fatal error occurs,
which should be reported by other means anyway.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170518113104.54400-2-michal.wajdeczko@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The i915_priolist are allocated within an atomic context on a path where
we wish to minimise latency. If we use a dedicated kmem_cache, we have
the advantage of a local freelist from which to service new requests
that should keep the latency impact of an allocation small. Though
currently we expect the majority of requests to be at default priority
(and so hit the preallocate priolist), once userspace starts using
priorities they are likely to use many fine grained policies improving
the utilisation of a private slab.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk
All the requests at the same priority are executed in FIFO order. They
do not need to be stored in the rbtree themselves, as they are a simple
list within a level. If we move the requests at one priority into a list,
we can then reduce the rbtree to the set of priorities. This should keep
the height of the rbtree small, as the number of active priorities can not
exceed the number of active requests and should be typically only a few.
Currently, we have ~2k possible different priority levels, that may
increase to allow even more fine grained selection. Allocating those in
advance seems a waste (and may be impossible), so we opt for allocating
upon first use, and freeing after its requests are depleted. To avoid
the possibility of an allocation failure causing us to lose a request,
we preallocate the default priority (0) and bump any request to that
priority if we fail to allocate it the appropriate plist. Having a
request (that is ready to run, so not leading to corruption) execute
out-of-order is better than leaking the request (and its dependency
tree) entirely.
There should be a benefit to reducing execlists_dequeue() to principally
using a simple list (and reducing the frequency of both rbtree iteration
and balancing on erase) but for typical workloads, request coalescing
should be small enough that we don't notice any change. The main gain is
from improving PI calls to schedule, and the explicit list within a
level should make request unwinding simpler (we just need to insert at
the head of the list rather than the tail and not have to make the
rbtree search more complicated).
v2: Avoid use-after-free when deleting a depleted priolist
v3: Michał found the solution to handling the allocation failure
gracefully. If we disable all priority scheduling following the
allocation failure, those requests will be executed in fifo and we will
ensure that this request and its dependencies are in strict fifo (even
when it doesn't realise it is only a single list). Normal scheduling is
restored once we know the device is idle, until the next failure!
Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
function old new delta
execlists_submit_ports 262 471 +209
port_assign.isra - 136 +136
capture 6344 6359 +15
reset_common_ring 438 452 +14
execlists_submit_request 228 238 +10
gen8_init_common_ring 334 341 +7
intel_engine_is_idle 106 105 -1
i915_engine_info 2314 2290 -24
__i915_gem_set_wedged_BKL 485 411 -74
intel_lrc_irq_handler 1789 1604 -185
execlists_update_context 294 - -294
The most important change there is the improve to the
intel_lrc_irq_handler and excclist_submit_ports (net improvement since
execlists_update_context is now inlined).
v2: Use the port_api() for guc as well (even though currently we do not
pack any counters in there, yet) and hide all port->request_count inside
the helpers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
Pre-calculate engine context size based on engine class and device
generation and store it in the engine instance.
v2:
- Squash and get rid of hw_context_size (Chris)
v3:
- Move after MMIO init for probing on Gen7 and 8 (Chris)
- Retained rounding (Tvrtko)
v4:
- Rebase for deferred legacy context allocation
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
If we are enabling the breadcrumbs signaling prior to submitting the
request, we know that we cannot have missed the interrupt and can
therefore skip immediately waking the signaler to check.
This reduces a significant chunk of the __i915_gem_request_submit()
overhead for inter-engine synchronisation, for example in gem_exec_whisper.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170426080659.28771-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We need to keep track of the last location we ask the hw to read up to
(RING_TAIL) separately from our last write location into the ring, so
that in the event of a GPU reset we do not tell the HW to proceed into
a partially written request (which can happen if that request is waiting
for an external signal before being executed).
v2: Refactor intel_ring_reset() (Mika)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144
Testcase: igt/gem_exec_fence/await-hang
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
This reverts commit 6c943de668 ("drm/i915: Skip execlists_dequeue()
early if the list is empty").
The validity of using READ_ONCE there depends upon having a mb to
coordinate the assignment of engine->execlist_first inside
submit_request() and checking prior to taking the spinlock in
execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a
memory barrier making this cross-cpu checking safe", but failed to
notice that this mb was *conditional* on the execlists being ready, i.e.
there wasn't the required mb when it was most necessary!
We could install an unconditional memory barrier to fixup the
READ_ONCE():
diff --git a/drivers/gpu/drm/i915/intel_lrc.c
b/drivers/gpu/drm/i915/intel_lrc.c
index 7dd732cb9f57..1ed164b16d44 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -616,6 +616,7 @@ static void execlists_submit_request(struct
drm_i915_gem_request *request)
if (insert_request(&request->priotree, &engine->execlist_queue))
{
engine->execlist_first = &request->priotree.node;
+ smp_wmb();
if (execlists_elsp_ready(engine))
But we have opted to remove the race as it should be rarely effective,
and saves us having to explain the necessary memory barriers which we
quite clearly failed at.
Reported-and-tested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Fixes: 6c943de668 ("drm/i915: Skip execlists_dequeue() early if the list is empty")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170329100052.29505-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Whilst I like having the assertions clearly visible in the code, they
are quite repetitious! As we find new limits we want to incorporate into
the set of assertions, it make sense to refactor them to a common
routine.
v2: Add a guc holdout.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327131412.20293-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
We are going to need it for future platforms.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
A GuC context and a HW context are in no way related, so the name "GuC context descriptor"
is very unfortunate, because a new reader of the code gets overwhelmed very quickly with
a lot of things called "context" that refer to different things. We can improve legibility
a lot by simply renaming a few objects in the GuC code.
v2:
- Rebased
- s/ctx_desc_pool/stage_desc_pool
- Move some explanations to the definition of the guc_stage_desc struct (Chris)
v3:
- Calculate gemsize with less intermediate steps (Joonas)
- Use BIT() macro (Joonas)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Some recent refactoring patches have left the doorbell creation outside
the GuC client allocation, which does not make a lot of sense (a client
without a doorbell is something useless). Move it back there, and
refactor the init_doorbell_hw consequently.
Thanks to this, we can do some other improvements, like hoisting the
check for GuC submission enabled out of the enable function.
v2: Rebased.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Doorbell release flow requires that we wait for GEN8_DRB_VALID bit to go
to zero after updating db_status before we call the GuC to release the
doorbell.
Kudos to Daniele for finding this out.
v2: WARN instead of DRM_ERROR (Joonas)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
While at it, fix a typo (s/ring_lcra/ring_lrca) and improve the naming of one
firware interface field (s/ring_tail/submit_element_info, since it can contain
more than just the ring tail).
No change in functionality.
v2:
- Remove reference to "unique user" of the GuC (Daniele)
- Keep mention to renaming from "GuC context" to "client" (Daniele)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>