In execlist mode, the ringbuf is a function of the ring and context whereas in
legacy mode, it is derived from the ring alone. Thus the calculation required to
determine the ringbuf pointer from the ring (and context) also needs to test
execlist mode or not. This is messy.
Further, the request structure holds a pointer to both the ring and the context
for which it was created. Thus, given a request, it is possible to derive the
ringbuf in either legacy or execlist mode. Hence it is necessary to pass just
the request in to all the low level functions rather than some combination of
request, ring, context and ringbuf. However, rather than recalculating it each
time, it is much simpler to just cache the ringbuf pointer in the request
structure itself.
Caching the pointer means the calculation is done once at request creation time
and all further code and simply read it directly from the request structure.
OTC-Jira: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Drop contentless comment in lrc alloc request entirely. And
spelling fix in the commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is a flags word that is passed through the execbuffer code path all the
way from initial decoding of the user parameters down to the very final dispatch
buffer call. It is simply called 'flags'. Unfortuantely, there are many other
flags words floating around in the same blocks of code. Even more once the GPU
scheduler arrives.
This patch makes it more obvious exactly which flags word is which by renaming
'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
already have an 'I915_DISPATCH_' prefix on them and so are not quite so
ambiguous.
OTC-Jira: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Resolve conflict with Chris' rework of the bb parsing.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When one EU is disabled in a particular subslice, we can tune how the
work is spread between subslices to improve EU utilization.
v2: - Use a bitfield to record which subslice(s) has(have) 7 EUs. That
will also make the machinery work if several sublices have 7 EUs.
(Jeff Mcgee)
- Only apply the different hashing algorithm if the slice is
effectively unbalanced by checking there's a single subslice with
7 EUs. (Jeff Mcgee)
v3: Fix typo in comment (Jeff Mcgee)
Issue: VIZ-3845
Cc: Jeff Mcgee <jeff.mcgee@intel.com>
Reviewed-by: Jeff Mcgee <jeff.mcgee@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I have no idea how that crept in, but we need to do the write from the
ring and this is a masked register. Two fixes in 1!
Cc: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's always a good idea to keep static analysis happy (also because it
prompts doing the check like I proposed :), this time smatch complains:
drivers/gpu/drm/i915/intel_ringbuffer.c:891 gen9_init_workarounds() warn:
always true condition '((->dev->pdev->revision) >= (0)) => (0-255 >= 0)'
That's because revision is a u8. Tweak a bit the condition then.
Cc: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This function will host SKL-only W/As.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This function is only used in intel_ringbuffer.c, so restrict it to that
file. The function was moved around to avoid a forward declaration and
group it with its user.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Squash in fixup from Wu Fengguang.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixed the stepping check on WaDisableDgMirrorFixInHalfSliceChicken5
to be for the correct SOC (Skylake)
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move Wa4x4STCOptimizationDisable to gen9_init_workarounds
v2: rebase
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This one doesn't have one of these nice cryptic names unfortunately.
v2: Added missing register bitmap
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Dont add WaDisableThreadStallDopClockGating as not SKL WA. (Found
by Damien Lespiau)
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Bikeshed commit message a bit as per Damien's suggestions.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add framework for gen 9 HW WAs
v1: Changed SOC specific WA function to gen 9 common function (Req: Damien Lespiau)
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We already track this in the intel_info struct.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[danvet: Make the commit message a bit less terse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I ran a few tests with xonotic and synmark2 trying out the
different WIZ hashing modes on CHV. The results seem to match the
results I got with IVB/HSW when I did the similar tests on them
in the past. That is 16x4 is generally the fastest mode, 8x8 comes
next and finally 8x4. On CHV the difference between the modes is
at most ~1% in most tests. IIRC on IVB/HSW the difference was a little
bigger, but as there doesn't seem to be any real downside to 16x4
let's use it by default.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Wa4x4STCOptimizationDisable got only implemented for BDW, but according
to the w/a database CHV needs it too, so add it.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have multiple forcewake domains now on recent gens. Change the
function naming to reflect this.
v2: More verbose names (Chris)
v3: Rebase
v4: Rebase
v5: Add documentation for forcewake_get/put
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Where there were duplicate variables for the tail, context and ring (engine)
in the gem request and the execlist queue item, use the one from the request
and remove the duplicate from the execlist queue item.
Issue: VIZ-4274
v1: Rebase
v2: Fixed build issues. Keep separate postfix & tail pointers as these are
used in different ways. Reinserted missing full tail pointer update.
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer. Certain workloads would run very slowly with
HiZ enabled, but run much faster with the "hiz=false" driconf option.
With this patch, they run at full speed even with HiZ.
Increases performance in OglVSInstancing by about 2.7x on Braswell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer. Certain workloads would run very slowly with
HiZ enabled, but run much faster with the "hiz=false" driconf option.
With this patch, they run at full speed even with HiZ.
Improves performance in OglVSInstancing by 3.2x on Broadwell GT3e
(Iris Pro 6200).
Thanks to Jesse Barnes and Ben Widawsky for their help in tracking this
down. Thanks to Chris Wilson for showing me the new workarounds system.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Found by reading the HIZ_CHICKEN documentation.
Improves performance in a HiZ microbenchmark by around 50%.
Improves performance in OglZBuffer by around 18%.
Thanks to Chris Wilson for helping me figure out where to put this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Conflicts:
drivers/gpu/drm/i915/intel_runtime_pm.c
Separate branch so that Takashi can also pull just this refactoring
into sound-next.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
In order to act as a full command barrier by itself, we need to tell the
pipecontrol to actually stall the command streamer while the flush runs.
We require the full command barrier before operations like
MI_SET_CONTEXT, which currently rely on a prior invalidate flush.
References: https://bugs.freedesktop.org/show_bug.cgi?id=83677
Cc: Simon Farnsworth <simon@farnz.org.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
In the gen7 pipe control there is an extra bit to flush the media
caches, so let's set it during cache invalidation flushes.
v2: Rename to MEDIA_STATE_CLEAR to be more inline with spec.
Cc: Simon Farnsworth <simon@farnz.org.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Otherwise, new platforms without workarounds will hit this warning for
every new context created.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We already implement this workaround, but it was missing its name.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We may be hidding bugs by doing that, so let remove it and have the
actual mask value shine through, for better or worse.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
While trying to unify the order of those arguments throughout the
driver, Daniel noticed what we were inverting them in this part of the
code.
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
I was playing with clang and oh surprise! a warning trigerred by
-Wshift-overflow (gcc doesn't have this one):
WA_SET_BIT_MASKED(GEN7_GT_MODE,
GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
drivers/gpu/drm/i915/intel_ringbuffer.c:786:2: warning: signed shift result
(0x28002000000) requires 43 bits to represent, but 'int' only has 32 bits
[-Wshift-overflow]
WA_SET_BIT_MASKED(GEN7_GT_MODE,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/intel_ringbuffer.c:737:15: note: expanded from macro
'WA_SET_BIT_MASKED'
WA_REG(addr, _MASKED_BIT_ENABLE(mask), (mask) & 0xffff)
Turned out GEN6_WIZ_HASHING_MASK was already shifted by 16, and we were
trying to shift it a bit more.
The other thing is that it's not the usual case of setting WA bits here, we
need to have separate mask and value.
To fix this, I've introduced a new _MASKED_FIELD() macro that takes both the
(unshifted) mask and the desired value and the rest of the patch ripples
through from it.
This bug was introduced when reworking the WA emission in:
Commit 7225342ab5
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Oct 7 17:21:26 2014 +0300
drm/i915: Build workaround list in ring initialization
v2: Invert the order of the mask and value arguments (Daniel Vetter)
Rewrite _MASKED_BIT_ENABLE() and _MASKED_BIT_DISABLE() with
_MASKED_FIELD() (Jani Nikula)
Make sure we only evaluate 'a' once in _MASKED_BIT_ENABLE() (Dave Gordon)
Add check to ensure the value is within the mask boundaries (Chris Wilson)
v3: Ensure the the value and mask are 16 bits (Dave Gordon)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Similar to a patch from Thomas Daniel for lrc contexts. This keeps
both sides somewhat in sync and should make Dave Gordon happy.
Note that both the wa and the golden context init code suffer a bit
from an inssuficient split into driver load and hw init code. Which
means we have a bunch of tests all over the place to check whether the
one-time initialization has been done already or not.
All that one-tim code should be moved into the one-time ring setup
code, but that's work for later.
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For debugging purposes, it is useful to be able to uniquely identify a given
request structure as it works its way through the system. This becomes
especially tricky once the seqno value is lazily allocated as then the request
has nothing but its pointer to identify it for much of its life.
Change-Id: Ie76b2268b940467f4cdf5a4ba6f5a54cbb96445d
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is a general theory that kzmalloc is better/safer than kmalloc, especially
for interesting data structures. This change updates the request structure
allocation to be zero filled.
This also fixes crashes in the reset code. Quoting Mika's patch:
"Clean the request structure on alloc. Otherwise we might end up
referencing uninitialized fields. This is apparent when we try to
cleanup the preallocated request on ring reset, before any request has
been submitted to the ring. The request->ctx is foobar and we end up
freeing the foobarness."
Note that this fixes a regression introduced in
commit 9eba5d4a1d
Author: John Harrison <John.C.Harrison@Intel.com>
Date: Mon Nov 24 18:49:23 2014 +0000
drm/i915: Ensure OLS & PLR are always in sync
References: https://bugs.freedesktop.org/show_bug.cgi?id=86959
References: https://bugs.freedesktop.org/show_bug.cgi?id=86962
References: https://bugs.freedesktop.org/show_bug.cgi?id=86992
Change-Id: I68715ef758025fab8db763941ef63bf60d7031e2
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We already have it for chv, but was missing for bdw.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that sanity prevails and we have the clean split between software
init and starting the engines we can drop all the "have we allocate
this struct already?" nonsense.
Execlist code could benefit quite a bit more still, but that's for
another patch.
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
We can do this.
And now there's finally the clean split between software setup and
hardware setup I kinda wanted since multi-ring support was merged
aeons ago. It only took almost 5 years.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With this all the ->init_hw hooks really only set up hw state needed
to start the ring, all the software state setup and memory/buffer
allocations happen beforehand.
v2: We need to call intel_init_pipe_control after the ring init since
otherwise engine->dev is NULL and it falls over. Currently that's
now after the hw ring is enabled but a) we'll be fine as long as no
one submits a batch b) this will change soon.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is (mostly, some exceptions that need fixing) the hw setup
function which starts the ring. And not the function which allocates
all the resources.
Make this clear by giving it a better name.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are numerous places in the code where the driver's idea of
how much space is left in a ring is updated using the driver's
latest notions of the positions of 'head' and 'tail' for the ring.
Among them are some that update one or both of these values before
(re)doing the calculation. In particular, there are four different
places in the code where 'last_retired_head' is copied to 'head'
and then set to -1; and two of these do not have a guard to check
that it has actually been updated since last time it was consumed,
leaving the possibility that the dummy -1 can be transferred from
'last_retired_head' to 'head', causing the space calculation to
produce 'impossible' results (previously seen on Android/VLV).
This code therefore consolidates all the calculation and updating of
these values, such that there is only one place where the ring space
is updated, and it ALWAYS uses (and consumes) 'last_retired_head' if
(and ONLY if) it has been updated since the last call.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The used space in a ring is given by the cyclic distance from the
consumer (HEAD) to the producer (TAIL), i.e. ((tail-head) MOD size);
conversely, the available space in a ring is the cyclic distance
from the producer to the consumer, MINUS the amount reserved for a
"gap" that is supposed to guarantee that the producer never catches
up with or overruns the consumer. Note that some GEN h/w requires
that TAIL never approach to within one cacheline of HEAD, so the gap
is usually set to twice the cacheline size to ensure this.
While the existing code gives the correct answer for correct inputs,
if the producer HAS overrun into the reserved space, the result can
be a value larger than the maximum valid value (size-reserved). We
can improve this by reorganising the calculation, so that in the
event of overrun the result will be negative rather than over-large.
This means that the commonly-used test (available >= required)
will then reject further writes into the ring after an overrun,
giving some chance that we can recover from or at least diagnose
the original problem; whereas allowing more writes would likely both
confuse the h/w and destroy the evidence of what went wrong.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It makes a lot more sense (and makes future seqno -> request conversion patches
simpler) to fill in the 'ring' field of the request structure at the point of
creation rather than submission. Given that the request structure is assigned by
ring specific code and thus is locked to a ring from the start, there really is
no reason to defer this assignment.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is no longer any need to retrieve a seqno value from an i915_add_request()
call. The calling code already knows which request structure is being processed
(it can only be ring->OLR). And as the request itself is now used in preference
to the basic seqno value, the latter is now redundant in this situation.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Updated i915_wait_seqno() to take a request structure instead of a seqno value
and renamed it accordingly. Internally, it just pulls the seqno out of the
request and calls on to __wait_seqno() as before. However, all the code further
up the stack is now simplified as it can just pass the request object straight
through without having to peek inside.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Squash in hunk from an earlier patch which was rebased
wrongly.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>