as these have been fixed in production hw and hurt performance
if applied.
v2: adjust requested ring space (Ville)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83482
Tested-by: zhoujian <jianx.zhou@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
gen6 and earlier conflate address space selection (ppgtt vs ggtt) with
the security bit (i.e. only privileged batches were allowed to run from
ggtt). From Haswell only, you are able to select the security bit
separate from the address space - and we always requested to use ppgtt.
This breaks the golden render state batch execution with full-ppgtt as
that is only present in the global GTT and more generally any secure
batch that is not colocated in the ppgtt and ggtt. So we need to
disable the use of the ppgtt selector bit for secure batches, or else we
hang immediately upon boot and thence after every GPU reset...
v2: Only HSW differentiates between secure dispatch and ggtt, so simply
ignore the differentiation and always use secure==ggtt.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Rectify commit message as noted by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- final bits (again) for the rotation support (Sonika Jindal)
- support bl_power in the intel backlight (Jani)
- vdd handling improvements from Ville
- i830M fixes from Ville
- piles of prep work all over to make skl enabling just plug in (Damien, Sonika)
- rename DP training defines to reflect latest edp standards, this touches all
drm drivers supporting DP (Sonika Jindal)
- cache edids during single detect cycle to avoid re-reading it for e.g. audio,
from Chris
- move w/a for registers which are stored in the hw context to the context init
code (Arun&Damien)
- edp panel power sequencer fixes, helps chv a lot (Ville)
- piles of other chv fixes all over
- much more paranoid pageflip handling with stall detection and better recovery
from Chris
- small things all over, as usual
* tag 'drm-intel-next-2014-09-05' of git://anongit.freedesktop.org/drm-intel: (114 commits)
drm/i915: Update DRIVER_DATE to 20140905
drm/i915: Decouple the stuck pageflip on modeset
drm/i915: Check for a stalled page flip after each vblank
drm/i915: Introduce a for_each_plane() macro
drm/i915: Rewrite ABS_DIFF() in a safer manner
drm/i915: Add comments explaining the vdd on/off functions
drm/i915: Move DP port disable to post_disable for pch platforms
drm/i915: Enable DP port earlier
drm/i915: Turn on panel power before doing aux transfers
drm/i915: Be more careful when picking the initial power sequencer pipe
drm/i915: Reset power sequencer pipe tracking when disp2d is off
drm/i915: Track which port is using which pipe's power sequencer
drm/i915: Fix edp vdd locking
drm/i915: Reset the HEAD pointer for the ring after writing START
drm/i915: Fix unsafe vma iteration in i915_drop_caches
drm/i915: init sprites with univeral plane init function
drm/i915: Check of !HAS_PCH_SPLIT() in PCH transcoder funcs
drm/i915: Use HAS_GMCH_DISPLAY un underrun reporting code
drm/i915: Use IS_BROADWELL() instead of IS_GEN8() in forcewake code
drm/i915: Don't call gen8_fbc_sw_flush() on chv
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUFjfVAAoJEHm+PkMAQRiGANkIAIU3PNrAz9dIItq8a/rEAhnx
l2shHoOyEmyNR2apholM3BPUNX50cbsc/HGdi7lZKLkA/ifAj6B9nFD2NzVsIChD
1QWVcvdkKlVuxXCDd26qbijlfmbTOAWrLw9ntvM+J6ZtECM6zCAZF4MAV/FwogPq
ETGKD76AxJtVIhBMS99troAiC1YxmQ7DKgEr8CraTOR1qwXEonnPCmN/IZA6x2/G
EXiihOuQB5me1X7k4PI0V8CDscQOn+3B2CQHIrjRB+KiTF+iKIuI8n6ORC6bpFh+
U8UZP9wLlIG1BrUHG83pIndglIHotqPcjmtfl1WGrRr2hn7abzVSfV+g5Syo3Vg=
=Ep+s
-----END PGP SIGNATURE-----
drm: backmerge tag 'v3.17-rc5' into drm-next
This is requested to get the fixes for intel and radeon into the
same tree for future development work.
i915_display.c: fix missing dev_priv conflict.
Running igt, I was encountering the invalid TLB bug on my 845g, despite
that it was using the CS workaround. Examining the w/a buffer in the
error state, showed that the copy from the user batch into the
workaround itself was suffering from the invalid TLB bug (the first
cacheline was broken with the first two words reversed). Time to try a
fresh approach. This extends the workaround to write into each page of
our scratch buffer in order to overflow the TLB and evict the invalid
entries. This could be refined to only do so after we update the GTT,
but for simplicity, we do it before each batch.
I suspect this supersedes our current workaround, but for safety keep
doing both.
v2: The magic number shall be 2.
This doesn't conclusively prove that it is the mythical TLB bug we've
been trying to workaround for so long, that it requires touching a number
of pages to prevent the corruption indicates to me that it is TLB
related, but the corruption (the reversed cacheline) is more subtle than
a TLB bug, where we would expect it to read the wrong page entirely.
Oh well, it prevents a reliable hang for me and so probably for others
as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Ville found an old w/a documented for g4x that suggested that we need to
reset the HEAD after writing START. This is a useful fixup for some of
the g4x ring initialisation woes, but as usual, not all.
v2: Do the rewrite unconditionally anyway
References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we happen to emit more than I915_MAX_WA_REGS workarounds, we will
currently discard them, not even emit the LRI. Not really what we want,
so warn loudly.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When entering intel_ring_emit_wa() with num_wa_regs equal to
I915_MAX_WA_REGS, we end up indexing the intel_wa_regs array beyond its
allocation.
Fix the check then.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Follow the BDW example and apply the workarounds touching registers
which are saved in the context image through LRIs in the new
ring->init_context() hook.
This makes Mesa much happier and eg. glxgears doesn't hang after
the first frame.
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Add missing wa table initialization to avoid a functional
conflict with Arun's wa table debugfs support.]
Reviewed-by: "Barbalho, Rafael" <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The workarounds that are applied are exported to a debugfs file;
this is used to verify their state after the test case (reset or
suspend/resume etc). This patch is only required to support i-g-t.
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For BDW workarounds are currently initialized in init_clock_gating() but
they are lost during reset, suspend/resume etc; this patch moves the WAs
that are part of register state context to render ring init fn otherwise
default context ends up with incorrect values as they don't get initialized
until init_clock_gating fn.
v2: Add workarounds to golden render state
This method has its own issues, first of all this is different for
each gen and it is generated using a tool so adding new workaround
and mainitaining them across gens is not a straightforward process.
v3: Use LRIs to emit these workarounds (Ville)
Instead of modifying the golden render state the same LRIs are
emitted from within the driver.
v4: Use abstract name when exporting gen specific routines (Chris)
For: VIZ-4092
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
According to spec FBC on BDW and HSW are identical without any gaps.
So let's copy the nuke and let FBC really start compressing stuff.
Without this patch we can verify with false color that nothing is being
compressed. With the nuke in place and false color it is possible
to see false color debugs.
Unfortunatelly on some rings like BCS on BDW we have to avoid Bits 22:18 on
LRIs due to a high risk of hung. So, when using Blt ring for frontbuffer rend
cache would never been cleaned and FBC would stop compressing buffer.
One alternative is to cache clean on software frontbuffer tracking.
v2: Fix rebase conflict.
v3: Do not clean cache on BCS ring. Instead use sw frontbuffer tracking.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we reset a ring after a hang, we have to make sure that we clear
out all queued Execlists requests.
v2: The ring is, at this point, already being correctly re-programmed
for Execlists, and the hangcheck counters cleared.
v3: Daniel suggests to drop the "if (execlists)" because the Execlists
queue should be empty in legacy mode (which is true, if we do the
INIT_LIST_HEAD).
v4: Do the pending intel_runtime_pm_put
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A subsequent patch will no longer initialize the aliasing ppgtt if we
have full ppgtt enabled, since we simply don't need that any more.
Unfortunately a few places check for the aliasing ppgtt instead of
checking for ppgtt in general. Fix them up.
One special case are the gtt offset and size macros, which have some
code to remap the aliasing ppgtt to the global gtt. The aliasing ppgtt
is _not_ a logical address space, so passing that in as the vm is
plain and simple a bug. So just WARN about it and carry on - we have a
gracefully fall-through anyway if we can't find the vma.
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Same as the legacy-style ring->flush.
v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS
rings (but still consolidate the blt and bsd ring flushes into one).
This was noticed by Brad Volkin.
v3: The command for BSD and for other rings is slightly different:
get it exactly the same as in gen6_ring_flush + gen6_bsd_ring_flush
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Well, new-ish: if all this code looks familiar, that's because it's
a clone of the existing submission mechanism (with some modifications
here and there to adapt it to LRCs and Execlists).
And why did we do this instead of reusing code, one might wonder?
Well, there are some fears that the differences are big enough that
they will end up breaking all platforms.
Also, Execlists offer several advantages, like control over when the
GPU is done with a given workload, that can help simplify the
submission mechanism, no doubt. I am interested in getting Execlists
to work first and foremost, but in the future this parallel submission
mechanism will help us to fine tune the mechanism without affecting
old gens.
v2: Pass the ringbuffer only (whenever possible).
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Appease checkpatch. Again. And drop the legacy sarea gunk
that somehow crept in.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Logical rings do not need most of the initialization their
legacy ringbuffer counterparts do: we just need the pipe
control object for the render ring, enable Execlists on the
hardware and a few workarounds.
v2: Squash with: "drm/i915: Extract pipe control fini & make
init outside accesible".
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Make checkpatch happy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Allocate and populate the default LRC for every ring, call
gen-specific init/cleanup, init/fini the command parser and
set the status page (now inside the LRC object). These are
things all engines/rings have in common.
Stopping the ring before cleanup and initializing the seqnos
is left as a TODO task (we need more infrastructure in place
before we can achieve this).
v2: Check the ringbuffer backing obj for ring_is_initialized,
instead of the context backing obj (similar, but not exactly
the same).
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Any given ringbuffer is unequivocally tied to one context and one engine.
By setting the appropriate pointers to them, the ringbuffer struct holds
all the infromation you might need to submit a workload for processing,
Execlists style.
v2: Drop ring->ctx since that looks terribly ill-defined for legacy
ringbuffer submission.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v1)
Acked-by: Damien Lespiau <damien.lespiau@intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we have said a couple of times by now, logical ring contexts have
their own ringbuffers: not only the backing pages, but the whole
management struct.
In a previous version of the series, this was achieved with two separate
patches:
drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
drm/i915/bdw: Allocate ringbuffer for user-created LRCs
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
During ring initialisation, sometimes we observe, though not in
production hardware, that the idle flag is not set even though the ring
is empty. Double check before giving up.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Doing a 1s wait (tops) with the cpu is a bit excessive. Tune it down
like everything else in that code.
v2: Also insert the missing space Chris spotted.
Cc: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Withtout this, ring initialization fails reliabily during resume with
[drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head ffffff8804 tail 00000000 start 000e4000
This is not a complete fix, but it is verified to make the ring
initialization failures during resume much less likely.
We were not able to root-cause this bug (likely HW-specific to Gen4 chips)
yet. This is therefore used as a ducttape before problem is fully
understood and proper fix created, so that people don't suffer from
completely unusable systems in the meantime.
The discussion and debugging is happening at
https://bugs.freedesktop.org/show_bug.cgi?id=76554
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On Broadwell, any PIPE_CONTROL with the "State Cache Invalidate" bit set
must be preceded by a PIPE_CONTROL with the "CS Stall" bit set.
Documented on the BSpec 3D workarounds page.
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
[vsyrjala: add chv w/a note too]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We'll want to reuse this for a workaround.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: Rmove now unused int.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Traditionally we use genX_ for GT/render stuff and the codenames for
display stuff. But the gt and pm interrupt handling functions on
gen5/6+ stuck out as exceptions, so convert them.
Looking at the diff this nicely realigns our ducks since almost all
the callers are already platform-specific functions following the
genX_ pattern.
Spotted while reviewing some internal rps patches.
No function change in this patch.
Acked-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On g33, the documentation states
"HWS_PGA:
Format = Bits 28:12 of graphics memory address (bits 31:29 MBZ)."
which translates to that the address of the HWS must be below 256MiB,
which is conveniently the mappable aperture.
This also appears to be true (but not documented as so) for gen4 and
gen5. To generalise we force it into the low mappable region for all
non-LLC platforms. If we locate the HWS at the top of the GTT the
machine will hard hang during boot (fails on pnv, gm45, ilk and byt,
but works on snb, ivb, hsw).
v2: Add comments to explain why use PIN_MAPPABLE even though we have
no intention of mapping the object. (Ville)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's simple enough that it doesn't need to know anything about the
engine.
Trivial change.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
More prep work: with Execlists, we are going to start creating a lot
of extra ringbuffers soon, so these functions are handy.
No functional changes.
v2: rename allocate/destroy_ring_buffer to alloc/destroy_ringbuffer_obj
because the name is more meaningful and to mirror a similar function in
the context world: i915_gem_alloc_context_obj(). Change suggested by Brad
Volkin.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As Ville points out, it's possible/probable we don't actually need this.
Potentially, this validates the letter of the spec, and not the spirit.
Ville:
> I discussed this on irc w/ Ben, and I was suggesting we don't need to
> poll. Polling apparently can be used as a workaround for certain
> hardware issues, but it looks like those issues shouldn't affect us,
> for the momemnt at least. So my suggestion was to try w/o polling
> first (since there could be some power cost to polling) and add the
> poll bit if problems arise.
Rodrigo: Spec suggests this as an W/A for GT3. However semaphores didn't
worked in my BDW GT2 on Signal Mode. So pool mode is definitely needed.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Semaphore waits use a new instruction, MI_SEMAPHORE_WAIT. The seqno to
wait on is all well defined by the table in the previous patch. There is
nothing else different from previous GEN's semaphore synchronization
code.
v2: Update macros to not require the other ring's ring->id (Chris)
v3: Add missing VCS2 gen8_ring_wait init besides
s/ring_buffer/engine_cs (Rodrigo)
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Semaphore signalling works similarly to previous GENs with the exception
that the per ring mailboxes no longer exist. Instead you must define
your own space, somewhere in the GTT.
The comments in the code define the layout I've opted for, which should
be fairly future proof. Ie. I tried to define offsets in abstract terms
(NUM_RINGS, seqno size, etc).
NOTE: If one wanted to move this to the HWSP they could. I've decided
one 4k object would be easier to deal with, and provide potential wins
with cache locality, but that's all speculative.
v2: Update the macro to not need the other ring's ring->id (Chris)
Update the comment to use the correct formula (Chris)
v3: Move the macros the ringbuffer.h to prevent churn in next patch
(Ville)
v4: Fixed compilation rebase conflict
commit 1ec9e26dda
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Fri Feb 14 14:01:11 2014 +0100
drm/i915: Consolidate binding parameters into flags
v5: VCS2 rebase
Replace hweight_long with hweight32
v6 (Rodrigo): * Add missed VC2 gen8 ring signal init
* fixing conflicst on rebase
* minor fixes on address table
* remove WARN_ON
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[danvet: s/BUG_ON/WARN_ON/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the ring mask we now have an easy way to know the number of rings
in the system, and therefore can accurately predict the number of dwords
to emit for semaphore signalling. This was not possible (easily)
previously.
There should be no functional impact, simply fewer instructions emitted.
While we're here, simply do the round up to 2 instead of the fancier
rounding we did before, which rounding up per mbox, ie 4. This also
allows us to drop the unnecessary MI_NOOP, so not really 4, 3.
v2: Use 3 dwords instead of 4 (Ville)
Do the proper calculation to get the number of dwords to emit (Ville)
Conditionally set .sync_to when semaphores are enabled (Ville)
v3: Rebased on VCS2
Replace hweight_long with hweight32 (Ville)
v4: Pull out the accidentally squashed hunk from the next patch after
rebase (Daniel).
v5: Fix conflict after rebase (Rodrigo)
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Gen8 has already had some differentiation with how it handles rings.
Semaphores bring yet more differences, and now is as good a time as any
to do the split.
Also, since gen8 doesn't actually use semaphores up until this point,
put the proper "NULL" values in for the mbox info.
v2: v1 had a stale commit message
v3: Move everything in the is_semaphore_enabled() check
v4: VCS2 rebase
Remove double assignment of signal in render ring (Ville)
v5: Adding missed VCS2 signal init on gen8+ (Rodrigo)
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It just fix a typo.
v2: removing underscore to let this like all other ring names (Oscar)
Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by (v1): Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This commit add check for return value of init_ring_common() in the
init_render_ring(). Now, when failure is detected the error code is
propagated to the caller instead of being ignored.
Signed-off-by: Konrad Zapalowicz <bergo.torino@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This adds support for a write-enable bit in the entry of GTT.
This is handled via a read-only flag in the GEM buffer object which
is then used to see how to set the bit when writing the GTT entries.
Currently by default the Batch buffer & Ring buffers are marked as read only.
v2: Moved the pte override code for read-only bit to 'byt_pte_encode'. (Chris)
Fixed the issue of leaving 'gt_old_ro' as unused. (Chris)
v3: Removed the 'gt_old_ro' field, now setting RO bit only for Ring Buffers(Daniel).
v4: Added a new 'flags' parameter to all the pte(gen6) encode & insert_entries functions,
in lieu of overloading the cache_level enum (Daniel).
v5: Removed the superfluous VLV check & changed the definition location of PTE_READ_ONLY flag (Imre)
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
These do not exist anymore.
Spotted while reading through intel_ringbuffer.c
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Gen2 doesn't have the ring idle/stop bits in the SCPD/MI_MODE register,
so don't go spewing warnings about the state of those bits.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Manual cleanup after the previous Coccinelle script.
Yes, I could write another Coccinelle script to do this but I
don't want labor-replacing robots making an honest programmer's
work obsolete (also, I'm lazy).
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As advanced by the previous patch, the ringbuffers and the engine
command streamers belong in different structs. This is so because,
while they used to be tightly coupled together, the new Logical
Ring Contexts (LRC for short) have a ringbuffer each.
In legacy code, we will use the buffer* pointer inside each ring
to get to the pertaining ringbuffer (the actual switch will be
done in the next patch). In the new Execlists code, this pointer
will be NULL and we will use instead the one inside the context
instead.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In the upcoming patches we plan to break the correlation between
engine command streamers (a.k.a. rings) and ringbuffers, so it
makes sense to refactor the code and make the change obvious.
No functional changes.
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We implement the following workarounds:
* WaDisableAsyncFlipPerfMode:chv
* WaProgramMiArbOnOffAroundMiSetContext:chv
v2: Drop WaDisableSemaphoreAndSyncFlipWait note
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we dont have semaphores enabled, we allocate 4
dwords for signalling. But end up emitting more regardless.
Fix this by bailing out early if semaphores are not enabled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78274
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78283
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is missing in:
commit 78325f2d27
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Tue Apr 29 14:52:29 2014 -0700
drm/i915: Virtualize the ringbuffer signal func
Looks to me like a rebase side-effect...
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For clients that submit large batch buffers the command parser has
a substantial impact on performance. On my HSW ULT system performance
drops as much as ~20% on some tests. Most of the time is spent in the
command lookup code. Converting that from the current naive search to
a hash table lookup reduces the performance drop to ~10%.
The choice of value for I915_CMD_HASH_ORDER allows all commands
currently used in the parser tables to hash to their own bucket (except
for one collision on the render ring). The tradeoff is that it wastes
memory. Because the opcodes for the commands in the tables are not
particularly well distributed, reducing the order still leaves many
buckets empty. The increased collisions don't seem to have a huge
impact on the performance gain, but for now anyhow, the parser trades
memory for performance.
NB: Ville noticed that the error paths through the ring init code
will leak memory. I've not addressed that here. We can do a follow
up pass to handle all of the leaks.
v2: improved comment describing selection of hash key mask (Damien)
replace a BUG_ON() with an error return (Tvrtko, Ville)
commit message improvements
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
During the review of
commit 1f70999f90
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jan 27 22:43:07 2014 +0000
drm/i915: Prevent recursion by retiring requests when the ring is full
Ville raised the point that our interaction with request->tail was
likely to foul up other uses elsewhere (such as hang check comparing
ACTHD against requests).
However, we also need to restore the implicit retire requests that certain
test cases depend upon (e.g. igt/gem_exec_lut_handle), this raises the
spectre that the ppgtt will randomly call i915_gpu_idle() and recurse
back into intel_ring_begin().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Remove now unused 'tail' variable as spotted by Brad.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A few improvements to the fallback method for waiting upon ring space:
1. Fix the start/end wait tracepoints to always be paired.
2. Increase responsiveness of checking
3. Mark the process as waiting upon io
4. Check for signal interruptions
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Drop the s/msleep/io_schedule_timeout/ change again since the
latter isn't exported.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>