Commit Graph

282 Commits

John Harrison
5fb9de1a2e drm/i915: Update intel_ring_begin() to take a request structure
Now that everything above has been converted to use requests, intel_ring_begin()
can be updated to take a request instead of a ring. This also means that it no
longer needs to lazily allocate a request if no-one happens to have done it
earlier.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:29 +02:00
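
A minimal standalone C sketch of the conversion pattern described in the entry above (stub types and invented names, not the driver's real definitions): a helper that used to take a ring now takes the request and derives the ring and seqno from it, so it no longer needs to lazily allocate anything.

    #include <stdio.h>

    /* Hypothetical stand-ins for the driver structures. */
    struct intel_engine { const char *name; };
    struct drm_i915_gem_request {
        struct intel_engine *ring;   /* engine the request executes on */
        unsigned int seqno;          /* already allocated with the request */
    };

    /* Old style: took the engine and might lazily create a request.       */
    /* New style: takes the request; engine and seqno come along with it.  */
    static int ring_begin(struct drm_i915_gem_request *req, int num_dwords)
    {
        printf("reserving %d dwords on %s for request %u\n",
               num_dwords, req->ring->name, req->seqno);
        return 0;
    }

    int main(void)
    {
        struct intel_engine rcs = { "render ring" };
        struct drm_i915_gem_request req = { &rcs, 42 };
        return ring_begin(&req, 4);
    }
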
John Harrison
bba09b12b4 drm/i915: Update cacheline_align() to take a request structure
Updated intel_ring_cacheline_align() to take a request instead of a ring.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:28 +02:00
John Harrison
f71696876a drm/i915: Update ring->signal() to take a request structure
Updated the various ring->signal() implementations to take a request instead of
a ring. This removes their reliance on the OLR to obtain the seqno value that
should be used for the signal.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:27 +02:00
John Harrison
599d924c6b drm/i915: Update ring->sync_to() to take a request structure
Updated the ring->sync_to() implementations to take a request instead of a ring.
Also updated the tracer to include the request id.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
[danvet: Rebase since I didn't merge the patch which added ->uniq.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:27 +02:00
John Harrison
be795fc17b drm/i915: Update ring->emit_bb_start() to take a request structure
Updated the ring->emit_bb_start() implementation to take a request instead of a
ringbuf/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:26 +02:00
John Harrison
53fddaf70d drm/i915: Update ring->dispatch_execbuffer() to take a request structure
Updated the various ring->dispatch_execbuffer() implementations to take a
request instead of a ring.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:25 +02:00
John Harrison
c4e766389e drm/i915: Update ring->emit_request() to take a request structure
Updated the ring->emit_request() implementation to take a request instead of a
ringbuf/request pair. Also removed its use of the OLR for obtaining the
request's seqno.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:24 +02:00
John Harrison
ee044a8863 drm/i915: Update ring->add_request() to take a request structure
Updated the various ring->add_request() implementations to take a request
instead of a ring. This removes their reliance on the OLR to obtain the seqno
value that the request should be tagged with.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:24 +02:00
John Harrison
7deb4d3980 drm/i915: Update ring->emit_flush() to take a request structure
Updated the various ring->emit_flush() implementations to take a request instead
of a ringbuf/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:23 +02:00
John Harrison
a84c3ae168 drm/i915: Update ring->flush() to take a request structure
Updated the various ring->flush() functions to take a request instead of a ring.
Also updated the tracer to include the request id.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
[danvet: Rebase since I didn't merge the addition of req->uniq.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:21 +02:00
John Harrison
4866d729ab drm/i915: Update flush_all_caches() to take request structures
Updated the *_ring_flush_all_caches() functions to take requests instead of
rings or ringbuf/context pairs.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:20 +02:00
John Harrison
2f20055d36 drm/i915: Update a bunch of execbuffer helpers to take request structures
Updated *_ring_invalidate_all_caches(), i915_reset_gen7_sol_offsets() and
i915_emit_box() to take request structures instead of ring or ringbuf/context
pairs.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:18 +02:00
John Harrison
6258fbe23f drm/i915: Update queue_flip() to take a request structure
Updated the display page flip code to do explicit request creation and
submission rather than relying on the OLR and just hoping that the request
actually gets submitted at some random point.

The sequence is now to create a request, queue the work to the ring, assign the
known request to the flip queue work item then actually submit the work and post
the request.

Note that every single flip function used to finish with
'__intel_ring_advance(ring);'. However, immediately after they return there is
now an add request call which will do the advance anyway. Thus the many
duplicate advance calls have been removed.

v2: Updated commit message with comment about advance removal.

v3: The request can now be allocated by the _sync() code earlier on. Thus the
page flip path does not necessarily need to allocate a new request, it may be
able to re-use one.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:15 +02:00
John Harrison
8753181e10 drm/i915: Update init_context() to take a request structure
Now that everything above has been converted to use requests, it is possible to
update init_context() to take a request pointer instead of a ring/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:12 +02:00
John Harrison
29b1b415fc drm/i915: Reserve ring buffer space for i915_add_request() commands
It is a bad idea for i915_add_request() to fail. The work will already have been
sent to the ring and will be processed, but there will not be any tracking or
management of that work.

The only way the add request call can fail is if it can't write its epilogue
commands to the ring (cache flushing, seqno updates, interrupt signalling). The
reasons for that are mostly down to running out of ring buffer space and the
problems associated with trying to get some more. This patch prevents that
situation from happening in the first place.

When a request is created, it marks sufficient space as reserved for the
epilogue commands, thus guaranteeing that by the time the epilogue is written
there will be plenty of space for it. Note that a ring_begin() call is required
to actually reserve the space (and do any potential waiting). However, that is
not currently done at request creation time. This is because the ring_begin()
code can allocate a request. Hence calling begin() from the request allocation
code would lead to infinite recursion! Later patches in this series remove the
need for begin() to do the allocate. At that point, it becomes safe for the
allocate to call begin() and really reserve the space.

Until then, there is a potential for insufficient space to be available at the
point of calling i915_add_request(). However, that would only be in the case
where the request was created and immediately submitted without ever calling
ring_begin() and adding any work to that request, which should never happen.
And even if it does, and that request happens to fall into the tiny window of
opportunity for failing due to being out of ring space, does it really matter
when the request wasn't doing anything in the first place?

v2: Updated the 'reserved space too small' warning to include the offending
sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
re-initialisation of tracking state after a buffer wrap to keep the sanity
checks accurate.

v3: Incremented the reserved size to accommodate Ironlake (after finally
managing to run on an ILK system). Also fixed missing wrap code in LRC mode.

v4: Added extra comment and removed duplicate WARN (feedback from Tomas).

For: VIZ-5115
CC: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:56 +02:00
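
A standalone sketch of the reserve/cancel bookkeeping described in the entry above, with made-up sizes and field names: space for the add-request epilogue is set aside when the request is created, ordinary emission is not allowed to eat into it, and the reservation is released if the request is abandoned.

    #include <stdio.h>

    #define RING_SIZE      4096
    #define RESERVED_SPACE  160   /* illustrative epilogue allowance */

    struct ringbuf {
        int space;          /* bytes currently free             */
        int reserved_size;  /* bytes held back for the epilogue */
    };

    /* Called at request creation: set aside room for the epilogue commands. */
    static void reserve_space(struct ringbuf *rb)
    {
        rb->reserved_size = RESERVED_SPACE;
    }

    /* Called if the request is abandoned: give the reservation back. */
    static void cancel_reservation(struct ringbuf *rb)
    {
        rb->reserved_size = 0;
    }

    /* Ordinary command emission must not eat into the reserved bytes. */
    static int emit(struct ringbuf *rb, int bytes)
    {
        if (bytes > rb->space - rb->reserved_size)
            return -1;              /* real code would wait / wrap here */
        rb->space -= bytes;
        return 0;
    }

    /* The add-request epilogue may use the reservation it made earlier. */
    static int emit_epilogue(struct ringbuf *rb, int bytes)
    {
        if (bytes > rb->reserved_size)
            fprintf(stderr, "reserved space too small (%d > %d)\n",
                    bytes, rb->reserved_size);
        rb->reserved_size = 0;
        rb->space -= bytes;
        return 0;
    }

    int main(void)
    {
        struct ringbuf rb = { RING_SIZE, 0 };

        reserve_space(&rb);
        if (emit(&rb, 512) == 0 && emit_epilogue(&rb, 96) == 0)
            printf("space left: %d\n", rb.space);

        reserve_space(&rb);
        cancel_reservation(&rb);    /* e.g. request creation failed later on */
        return 0;
    }
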
Arun Siluvery
17ee950df3 drm/i915/gen8: Add infrastructure to initialize WA batch buffers
Some of the WA are to be applied during context save but before restore, and
some at the end of context save/restore but before executing the instructions
in the ring. These WA cannot be applied using normal means, so WA batch buffers
are created for this purpose. Each context has two registers to load the
offsets of these batch buffers; if they are non-zero, the HW understands that
it needs to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested including an additional page in the context and using it to
load these WA instead of creating separate objects. This simplifies a lot of
things as we need not explicitly pin/unpin them. Thomas Daniel further pointed
out that the GuC is planning to use a similar setup to share data between the
GuC and the driver, and WA batch buffers could probably share that page.
However, after discussions with Dave, who is implementing the GuC changes, he
suggested using an independent page for these reasons: the GuC area might grow,
and these WA are initialized only once and not changed afterwards, so we can
share them across all contexts.

The page is updated with the WA during render ring init. This has the advantage
of not adding more special cases to default_context.

We don't know upfront the number of WA we will be applying using these batch
buffers. For this reason the size was fixed earlier, but that is not a good
idea. To fix this, the functions that load instructions are modified to report
the number of commands inserted, and the size is now calculated after the batch
is updated. A macro is introduced to add commands to these batch buffers; it
also checks for overflow and returns an error.
We have a full page dedicated to these WA, so that should be sufficient for a
good number of WA; anything more means we have major issues.
The list for Gen8 is small, and the same is true for Gen9; maybe a few more get
added going forward, but nowhere close to filling the entire page. Chris
suggested a two-pass approach, but we agreed to go with the single-page setup
as it is a one-off routine and simpler code wins.

One additional option is an offset field, which is helpful if we would like to
have multiple batches at different offsets within the page and select them
based on some criteria. This is not a requirement at this point but could
help in future (Dave).

Chris provided some helpful macros and suggestions which further simplified
the code; they will also help in reducing code duplication when WA for
other Gens are added. Add detailed comments explaining restrictions.
Use do {} while(0) for wa_ctx_emit() macro.

(Many thanks to Chris, Dave and Thomas for their reviews and inputs)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:39 +02:00
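
A compilable sketch of the overflow-checked emit macro idea mentioned in the entry above (names, page handling and error handling are illustrative): commands go into a fixed page and the macro bails out rather than running off the end.

    #include <stdint.h>
    #include <stdio.h>

    #define WA_PAGE_SIZE 4096
    #define MI_NOOP      0x0

    struct wa_ctx {
        uint32_t batch[WA_PAGE_SIZE / sizeof(uint32_t)];
        uint32_t index;   /* number of dwords written so far */
    };

    /* Emit one dword, failing instead of overflowing the dedicated page. */
    #define wa_ctx_emit(ctx, cmd) do {                                  \
            if ((ctx)->index >= WA_PAGE_SIZE / sizeof(uint32_t)) {      \
                fprintf(stderr, "WA batch buffer overflow\n");          \
                return -1;                                              \
            }                                                           \
            (ctx)->batch[(ctx)->index++] = (cmd);                       \
    } while (0)

    static int init_wa_batch(struct wa_ctx *ctx)
    {
        wa_ctx_emit(ctx, MI_NOOP);     /* real code would emit LRIs etc. */
        wa_ctx_emit(ctx, MI_NOOP);
        return 0;                      /* size = ctx->index, known afterwards */
    }

    int main(void)
    {
        struct wa_ctx ctx = { .index = 0 };
        if (init_wa_batch(&ctx) == 0)
            printf("WA batch holds %u dwords\n", ctx.index);
        return 0;
    }
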
Francisco Jerez
c1091b25f5 drm/i915: Extend the parser to check register writes against a mask/value pair.
In some cases it might be unnecessary or dangerous to give userspace
the right to write arbitrary values to some register, even though it
might be desirable to give it control of some of its bits.  This patch
extends the register whitelist entries to contain a mask/value pair in
addition to the register offset.  For registers with non-zero mask,
any LRM writes and LRI writes where the bits of the immediate given by
the mask don't match the specified value will be rejected.

This will be used in my next patch to grant userspace partial write
access to some sensitive registers.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-06-15 16:00:43 +03:00
Francisco Jerez
4e86f725ce drm/i915: Extend the parser to check register writes against a mask/value pair.
In some cases it might be unnecessary or dangerous to give userspace
the right to write arbitrary values to some register, even though it
might be desirable to give it control of some of its bits.  This patch
extends the register whitelist entries to contain a mask/value pair in
addition to the register offset.  For registers with non-zero mask,
any LRM writes and LRI writes where the bits of the immediate given by
the mask don't match the specified value will be rejected.

This will be used in my next patch to grant userspace partial write
access to some sensitive registers.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-15 12:34:50 +02:00
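
A small self-contained sketch of the mask/value whitelist check the two entries above describe (the struct layout, register offsets and masks are invented for the example): an LRI immediate is accepted only if its bits under the mask match the required value.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct reg_whitelist_entry {
        uint32_t offset;  /* register offset userspace may write */
        uint32_t mask;    /* 0 means any value is allowed        */
        uint32_t value;   /* required value of the masked bits   */
    };

    static const struct reg_whitelist_entry whitelist[] = {
        { 0x2248, 0x0,        0x0 },   /* illustrative: unrestricted    */
        { 0x7300, 0xffff0000, 0x0 },   /* illustrative: high bits fixed */
    };

    static bool lri_write_ok(uint32_t offset, uint32_t imm)
    {
        for (size_t i = 0; i < sizeof(whitelist) / sizeof(whitelist[0]); i++) {
            const struct reg_whitelist_entry *e = &whitelist[i];
            if (e->offset != offset)
                continue;
            /* Non-zero mask: the masked bits must match the given value. */
            return e->mask == 0 || (imm & e->mask) == e->value;
        }
        return false;  /* not whitelisted at all: reject */
    }

    int main(void)
    {
        printf("%d %d\n", lri_write_ok(0x7300, 0x1234),
                          lri_write_ok(0x7300, 0x10000000));
        return 0;
    }
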
Chris Wilson
06fbca713e drm/i915: Split the batch pool by engine
I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.

One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.

v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by:  Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:04 +02:00
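
A tiny illustrative sketch (invented types and sizes) of the split described in the entry above: one pool per engine instead of one shared pool, so a search only walks objects submitted to that engine.

    #include <stdio.h>

    #define NUM_ENGINES 5   /* illustrative: RCS, BCS, VCS, VCS2, VECS */

    struct batch_obj { struct batch_obj *next; int active; };

    /* One pool per engine rather than one shared pool for the whole GPU. */
    struct batch_pool { struct batch_obj *cache; } pools[NUM_ENGINES];

    static struct batch_obj *pool_get(int engine)
    {
        struct batch_obj *obj;

        /* Only this engine's list is searched for a reusable object. */
        for (obj = pools[engine].cache; obj; obj = obj->next)
            if (!obj->active)
                return obj;
        return NULL;    /* caller would allocate a fresh object here */
    }

    int main(void)
    {
        struct batch_obj idle = { NULL, 0 };
        pools[2].cache = &idle;
        printf("found: %s\n", pool_get(2) ? "yes" : "no");
        return 0;
    }
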
John Harrison
6689cb2b62 drm/i915: Move common request allocation code into a common function
The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.

This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-01 07:54:30 +02:00
Paulo Zanoni
dbef0f15b5 drm/i915: add frontbuffer tracking to FBC
Kill the blt/render tracking we currently have and use the frontbuffer
tracking infrastructure.

Don't enable things by default yet.

v2: (Rodrigo) Fix small conflict on rebase and typo at subject.
v3: (Paulo) Rebase on RENDER_CS change.
v4: (Paulo) Rebase.
v5: (Paulo) Simplify: flushes don't have origin (Daniel).
            Also rebase due to patch order changes.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-17 22:29:56 +01:00
John Harrison
8e004efc16 drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading
There is a flags word that is passed through the execbuffer code path all the
way from initial decoding of the user parameters down to the very final dispatch
buffer call. It is simply called 'flags'. Unfortunately, there are many other
flags words floating around in the same blocks of code. Even more once the GPU
scheduler arrives.

This patch makes it more obvious exactly which flags word is which by renaming
'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
already have an 'I915_DISPATCH_' prefix on them and so are not quite so
ambiguous.

OTC-Jira: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Resolve conflict with Chris' rework of the bb parsing.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 22:43:29 +01:00
Thomas Daniel
b07da53c79 drm/i915: Shift driver's HWSP usage out of reserved range
As of Gen6, the general purpose area of the hardware status page has shrunk and
now begins at dword 0x30.  The i915 driver uses dword 0x20 to store the seqno,
which is now reserved.  So shift our HWSP dwords up into the general purpose range
before this bites us.

Note that all available documentation just says this is reserved
without going into details about what it's used for.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
[danvet: Add clarification from Thomas that unfortunately Bspec is
silent on what "reserverd" precisely means.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-24 11:50:32 +01:00
Damien Lespiau
af75f26918 drm/i915: Make intel_ring_setup_status_page() static
This function is only used in intel_ringbuffer.c, so restrict it to that
file. The function was moved around to avoid a forward declaration and
group it with its user.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Squash in fixup from Wu Fengguang.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-13 23:28:27 +01:00
Nick Hoath
21076372af drm/i915: Remove FIXME_lrc_ctx backpointer
The first pass implementation of execlists required a backpointer to the context to be held
in the intel_ringbuffer. However the context pointer is available higher in the call stack.
Remove the backpointer from the ring buffer structure and instead pass it down through the
call stack.

v2: Integrate this changeset with the removal of duplicate request/execlist queue item members.
v3: Rebase
v4: Rebase. Remove passing of context when the request is passed.

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:50:53 +01:00
Nick Hoath
72f95afa5f drm/i915: Removed duplicate members from submit_request
Where there were duplicate variables for the tail, context and ring (engine)
in the gem request and the execlist queue item, use the one from the request
and remove the duplicate from the execlist queue item.

Issue: VIZ-4274

v1: Rebase
v2: Fixed build issues. Keep separate postfix & tail pointers as these are
used in different ways. Reinserted missing full tail pointer update.

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:50:52 +01:00
Daniel Vetter
ecfe00d802 drm/i915: s/init()/init_hw()/ in intel_engine_cs
This is (mostly, some exceptions that need fixing) the hw setup
function which starts the ring. And not the function which allocates
all the resources.

Make this clear by giving it a better name.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:27 +01:00
Dave Gordon
ebd0fd4bef drm/i915: Consolidate ring freespace calculations
There are numerous places in the code where the driver's idea of
how much space is left in a ring is updated using the driver's
latest notions of the positions of 'head' and 'tail' for the ring.
Among them are some that update one or both of these values before
(re)doing the calculation. In particular, there are four different
places in the code where 'last_retired_head' is copied to 'head'
and then set to -1; and two of these do not have a guard to check
that it has actually been updated since last time it was consumed,
leaving the possibility that the dummy -1 can be transferred from
'last_retired_head' to 'head', causing the space calculation to
produce 'impossible' results (previously seen on Android/VLV).

This code therefore consolidates all the calculation and updating of
these values, such that there is only one place where the ring space
is updated, and it ALWAYS uses (and consumes) 'last_retired_head' if
(and ONLY if) it has been updated since the last call.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:26 +01:00
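
A standalone sketch of the consolidation described in the entry above (field names follow the commit text, the arithmetic and padding value are illustrative): a single helper recomputes the free space and consumes 'last_retired_head' only if it has been updated since the last call.

    #include <stdio.h>

    struct ringbuf {
        int size, head, tail, space;
        int last_retired_head;   /* -1 means "not updated since last use" */
    };

    /* The single place where ring space is recalculated. */
    static void update_space(struct ringbuf *rb)
    {
        /* Consume last_retired_head only if it has been refreshed. */
        if (rb->last_retired_head != -1) {
            rb->head = rb->last_retired_head;
            rb->last_retired_head = -1;
        }
        rb->space = rb->head - (rb->tail + 8);   /* 8: illustrative padding */
        if (rb->space < 0)
            rb->space += rb->size;
    }

    int main(void)
    {
        struct ringbuf rb = { 4096, 0, 1024, 0, -1 };
        update_space(&rb);               /* no retirement yet: head unchanged */
        rb.last_retired_head = 512;
        update_space(&rb);               /* head advances to 512 */
        printf("head=%d space=%d\n", rb.head, rb.space);
        return 0;
    }
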
John Harrison
581c26e8a2 drm/i915: Convert 'trace_irq' to use requests rather than seqnos
Updated the trace_irq code to use requests instead of seqnos. This includes
reference counting the request object to ensure it sticks around when required.
Note that getting access to the reference counting functions means moving the
inline i915_trace_irq_get() function from intel_ringbuffer.h to i915_drv.h.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Resolve conflict due to shuffled merge order.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:24 +01:00
John Harrison
6259cead57 drm/i915: Remove 'outstanding_lazy_seqno'
The OLS value is now obsolete. Exactly the same value is guaranteed to always be
available as PLR->seqno. Thus it is safe to remove the OLS completely, and also
to rename the PLR to OLR to keep the 'outstanding lazy ...' naming convention
valid.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:16 +01:00
John Harrison
97b2a6a10a drm/i915: Replace last_[rwf]_seqno with last_[rwf]_req
The object structure contains the last read, write and fenced seqno values for
use in synchronisation operations. These have now been replaced with their
request structure counterparts.

Note that to ensure that objects do not end up with dangling pointers, the
assignments of last_*_req include reference count updates. Thus a request cannot
be freed if an object is still hanging on to it for any reason.

v2: Corrected 'last_rendering_' to 'last_read_' in a number of comments that did
not get updated when 'last_rendering_seqno' became 'last_read|write_seqno'
several millennia ago.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:14 +01:00
John Harrison
b793a00a57 drm/i915: Add helper functions to aid seqno -> request transition
Added helper functions for retrieving the ring and seqno entries from a request
structure. This allows the internal workings of the request structure to be
hidden from code that is using these. It also allows for useful
workarounds/debug code to be added as or when necessary.

Note that it is intended that the majority (if not all) uses of the seqno
accessor will disappear eventually as code is updated to use the request
structure itself rather than working with seqno values.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:14 +01:00
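
A minimal sketch, with hypothetical helper names, of the accessor idea described in the entry above: code asks the request for its ring or seqno through small inline helpers instead of touching the struct fields directly, so the internals can change or gain debug hooks later.

    #include <stdio.h>

    struct intel_engine { const char *name; };

    struct drm_i915_gem_request {
        struct intel_engine *ring;
        unsigned int seqno;
    };

    /* Accessors hide the struct layout from the rest of the driver. */
    static inline struct intel_engine *
    request_get_ring(struct drm_i915_gem_request *req)
    {
        return req ? req->ring : NULL;
    }

    static inline unsigned int
    request_get_seqno(struct drm_i915_gem_request *req)
    {
        return req ? req->seqno : 0;   /* 0 = "no request outstanding" */
    }

    int main(void)
    {
        struct intel_engine rcs = { "render ring" };
        struct drm_i915_gem_request req = { &rcs, 7 };
        printf("%s seqno %u\n", request_get_ring(&req)->name,
               request_get_seqno(&req));
        return 0;
    }
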
Chris Wilson
5c6c600354 drm/i915: Remove DRI1 ring accessors and API
With the deprecation of UMS, and by association DRI1, we have a tough
choice when updating the ring access routines. We either rewrite the
DRI1 routines blindly without testing (so likely to be broken) or take
the liberty of declaring them no longer supported and remove them
entirely. This takes the latter approach.

v2: Also remove the DRI1 sarea updates

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Fix rebase conflicts.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-19 21:17:11 +01:00
Thomas Daniel
7ba717cf36 drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
Same as with the context, pinning to GGTT regardless is harmful (it
badly fragments the GGTT and can even exhaust it).

Unfortunately, this case is also more complex than the previous one
because we need to map and access the ringbuffer in several places
along the execbuffer path (and we cannot make do by leaving the
default ringbuffer pinned, as before). Also, the context object
itself contains a pointer to the ringbuffer address that we have to
keep updated if we are going to allow the ringbuffer to move around.

v2: Same as with the context pinning, we cannot really do it during
an interrupt. Also, pin the default ringbuffer objects regardless
(makes error capture a lot easier).

v3: Rebased. Take a pin reference of the ringbuffer for each item
in the execlist request queue because the hardware may still be using
the ringbuffer after the MI_USER_INTERRUPT to notify the seqno update
is executed.  The ringbuffer must remain pinned until the context save
is complete.  No longer pin and unpin ringbuffer in
populate_lr_context() - this transient address is meaningless and the
pinning can cause a sleep while atomic.

v4: Moved ringbuffer pin and unpin into the lr_context_pin functions.
Downgraded pinning check BUG_ONs to WARN_ONs.

v5: Reinstated WARN_ONs for unexpected execlist states.  Removed unused
variable.

Issue: VIZ-4277
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-19 19:56:44 +01:00
Thomas Daniel
c86ee3a9f8 drm/i915/bdw: Clean up execlist queue items in retire_work
No longer create a work item to clean each execlist queue item.
Instead, move retired execlist requests to a queue and clean up the
items during retire_requests.

v2: Fix legacy ring path broken during overzealous cleanup

v3: Update idle detection to take execlists queue into account

v4: Grab execlist lock when checking queue state

v5: Fix leaking requests by freeing in execlists_retire_requests.

Issue: VIZ-4274
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-19 19:16:45 +01:00
Michel Thierry
771b9a5324 drm/i915: Initialize workarounds in logical ring mode too
Following the legacy ring submission example, update the
ring->init_context() hook to support the execlist submission mode.

v2: update to use the new workaround macros and cleanup unused code.
This takes care of both bdw and chv workarounds.

v2.1: Add missing call to init_context() during deferred context creation.

v3: Split init_context (emit) in legacy/lrc modes. For lrc, get the ringbuf
from the context (Mika/Daniel).

v4: Merge init_context interfaces back, the legacy mode only needs the ring,
but the lrc mode needs the ring and context (Mika).

Issue: VIZ-4092
Issue: GMIN-3475
Change-Id: Ie3d093b2542ab0e2a44b90460533e2f979788d6c
Cc: Deepak S <deepak.s@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Align function paramater lists properly.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-14 10:29:25 +01:00
Arun Siluvery
86d7f23842 drm/i915/bdw: Apply workarounds in render ring init function
For BDW, workarounds are currently initialized in init_clock_gating() but
they are lost during reset, suspend/resume etc.; this patch moves the WAs
that are part of the register state context to the render ring init function,
otherwise the default context ends up with incorrect values as they don't get
initialized until the init_clock_gating function.

v2: Add workarounds to golden render state
This method has its own issues: first of all, it is different for
each gen and it is generated using a tool, so adding new workarounds
and maintaining them across gens is not a straightforward process.

v3: Use LRIs to emit these workarounds (Ville)
Instead of modifying the golden render state the same LRIs are
emitted from within the driver.

v4: Use abstract name when exporting gen specific routines (Chris)

For: VIZ-4092
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-09-03 11:04:42 +02:00
Thomas Daniel
e981e7b17f drm/i915/bdw: Handle context switch events
Handle all context status events in the context status buffer on every
context switch interrupt. We only remove work from the execlist queue
after a context status buffer reports that it has completed and we only
attempt to schedule new contexts on interrupt when a previously submitted
context completes (unless no contexts are queued, which means the GPU is
free).

We cannot call intel_runtime_pm_get() in an interrupt (or with a spinlock
grabbed, FWIW), because it might sleep, which is not a nice thing to do.
Instead, do the runtime_pm get/put together with the create/destroy request,
and handle the forcewake get/put directly.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v2: Unreferencing the context when we are freeing the request might free
the backing bo, which requires the struct_mutex to be grabbed, so defer
unreferencing and freeing to a bottom half.

v3:
- Ack the interrupt immediately, before trying to handle it (fix for
missing interrupts by Bob Beckett <robert.beckett@intel.com>).
- Update the Context Status Buffer Read Pointer, just in case (spotted
by Damien Lespiau).

v4: New namespace and multiple rebase changes.

v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
interrupt", as suggested by Daniel.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch ...]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:43:47 +02:00
Michel Thierry
acdd884a2e drm/i915/bdw: Two-stage execlist submit process
Context switch (and execlist submission) should happen only when
other contexts are not active, otherwise pre-emption occurs.

To assure this, we place context switch requests in a queue and those
requests are later consumed when the right context switch interrupt is
received (still TODO).

v2: Use a spinlock, do not remove the requests on unqueue (wait for
context switch completion).

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v3: Several rebases and code changes. Use unique ID.

v4:
- Move the queue/lock init to the late ring initialization.
- Damien's kmalloc review comments: check return, use sizeof(*req),
do not cast.

v5:
- Do not reuse drm_i915_gem_request. Instead, create our own.
- New namespace.

Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch + wash-up s/BUG_ON/WARN_ON/.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:10:59 +02:00
Oscar Mateo
582d67f0b1 drm/i915: Add temporary ring->ctx backpointer
The execlist patches have a bit of a convoluted and long history, and due
to that they still have the actual submission misplaced, deeply buried in
the low-level ringbuffer handling code. This design goes back to the
legacy ringbuffer code with its tricky lazy request and simple work
submission using ring tail writes. For that reason they need a
ring->ctx backpointer.

The goal is to unbury that code and move it up into a level where the
full execlist context is available so that we can ditch this
backpointer. Until that's done make it really obvious that there's
work still to be done.

Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Acked-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 18:42:59 +02:00
Oscar Mateo
1564858526 drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
Dispatch_execbuffer's evil twin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Ditch the check for aliasing ppgtt. It'll break soon and
execlists requires full ppgtt anyway.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:12:34 +02:00
Oscar Mateo
73d477f6bb drm/i915/bdw: Interrupts with logical rings
We need to attend to context switch interrupts from all rings. Also, fixed the
writing of IMR/IER and added HWSTAM at ring init time.

Notice that, if added to irq_enable_mask, the context switch interrupts would
be incorrectly masked out when the user interrupts are disabled because no
users are waiting on a sequence number. Therefore, this commit adds a bitmask
of interrupts to be kept unmasked at all times.

v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
anyway).

v3: Add new get/put_irq functions.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop the GEN8_ prefix from the context switch interrupt
define and move it to its brethren.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:06:58 +02:00
Oscar Mateo
4712274c36 drm/i915/bdw: GEN-specific logical ring emit flush
Same as the legacy-style ring->flush.

v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS
rings (but still consolidate the blt and bsd ring flushes into one).
This was noticed by Brad Volkin.

v3: The command for BSD and for other rings is slightly different:
get it exactly the same as in gen6_ring_flush + gen6_bsd_ring_flush

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:44:37 +02:00
Oscar Mateo
4da46e1e5b drm/i915/bdw: GEN-specific logical ring emit request
Very similar to the legacy add_request, only modified to account for
the logical ringbuffer.

v2: Use MI_GLOBAL_GTT, as suggested by Brad Volkin.

v3: Unify render and non-render in the same function, as noticed by
Brad Volkin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:42:49 +02:00
Oscar Mateo
82e104cc26 drm/i915/bdw: New logical ring submission mechanism
Well, new-ish: if all this code looks familiar, that's because it's
a clone of the existing submission mechanism (with some modifications
here and there to adapt it to LRCs and Execlists).

And why did we do this instead of reusing code, one might wonder?
Well, there are some fears that the differences are big enough that
they will end up breaking all platforms.

Also, Execlists offer several advantages, like control over when the
GPU is done with a given workload, that can help simplify the
submission mechanism, no doubt. I am interested in getting Execlists
to work first and foremost, but in the future this parallel submission
mechanism will help us to fine tune the mechanism without affecting
old gens.

v2: Pass the ringbuffer only (whenever possible).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Appease checkpatch. Again. And drop the legacy sarea gunk
that somehow crept in.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:42:36 +02:00
Oscar Mateo
9b1136d505 drm/i915/bdw: GEN-specific logical ring init
Logical rings do not need most of the initialization their
legacy ringbuffer counterparts do: we just need the pipe
control object for the render ring, enable Execlists on the
hardware and a few workarounds.

v2: Squash with: "drm/i915: Extract pipe control fini & make
init outside accesible".

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Make checkpatch happy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 17:03:28 +02:00
Oscar Mateo
48d823878d drm/i915/bdw: Generic logical ring init and cleanup
Allocate and populate the default LRC for every ring, call
gen-specific init/cleanup, init/fini the command parser and
set the status page (now inside the LRC object). These are
things all engines/rings have in common.

Stopping the ring before cleanup and initializing the seqnos
is left as a TODO task (we need more infrastructure in place
before we can achieve this).

v2: Check the ringbuffer backing obj for ring_is_initialized,
instead of the context backing obj (similar, but not exactly
the same).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:55:17 +02:00
Daniel Vetter
0c7dd53b84 drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
Any given ringbuffer is unequivocally tied to one context and one engine.
By setting the appropriate pointers to them, the ringbuffer struct holds
all the information you might need to submit a workload for processing,
Execlists style.

v2: Drop ring->ctx since that looks terribly ill-defined for legacy
ringbuffer submission.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v1)
Acked-by: Damien Lespiau <damien.lespiau@intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:18:17 +02:00
Oscar Mateo
84c2377fce drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
As we have said a couple of times by now, logical ring contexts have
their own ringbuffers: not only the backing pages, but the whole
management struct.

In a previous version of the series, this was achieved with two separate
patches:
drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
drm/i915/bdw: Allocate ringbuffer for user-created LRCs

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:10:58 +02:00
Mika Kuoppala
f260fe7b2f drm/i915: Don't accumulate hangcheck score on forward progress
If the actual head has progressed forward inside a batch (request),
don't accumulate hangcheck score.

As the hangcheck score is increased only by acthd jumping backwards,
the result is that we only declare an active batch as stuck if it is
trapped inside a loop. Or that the looping will dominate the batch
progression so that it overcomes the bonus that forward progress gives.

v2: Improved commit message (Chris Wilson)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: s/active_loop/active (loop)/ as requested by Chris.]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-07 14:04:07 +02:00
Oscar Mateo
1b5d063faf drm/i915: Generalize intel_ring_get_tail to take a ringbuf
Again, it's low-level enough to simply take a ringbuf and nothing
else.

Trivial change.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-08 12:31:02 +02:00
Ben Widawsky
3e78998a58 drm/i915/bdw: implement semaphore signal
Semaphore signalling works similarly to previous GENs with the exception
that the per ring mailboxes no longer exist. Instead you must define
your own space, somewhere in the GTT.

The comments in the code define the layout I've opted for, which should
be fairly future proof. Ie. I tried to define offsets in abstract terms
(NUM_RINGS, seqno size, etc).

NOTE: If one wanted to move this to the HWSP they could. I've decided
one 4k object would be easier to deal with, and provide potential wins
with cache locality, but that's all speculative.

v2: Update the macro to not need the other ring's ring->id (Chris)
Update the comment to use the correct formula (Chris)

v3: Move the macros the ringbuffer.h to prevent churn in next patch
(Ville)

v4: Fixed compilation rebase conflict
commit 1ec9e26dda
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Feb 14 14:01:11 2014 +0100

    drm/i915: Consolidate binding parameters into flags

v5: VCS2 rebase
Replace hweight_long with hweight32

v6 (Rodrigo): * Add missed VC2 gen8 ring signal init
              * fixing conflicts on rebase
    	      * minor fixes on address table
	      * remove WARN_ON

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[danvet: s/BUG_ON/WARN_ON/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-07 22:16:23 +02:00
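
An illustrative sketch of the kind of GTT-based mailbox layout the entry above describes; the grid formula, NUM_RINGS value and slot size are assumptions for the example, not the layout documented in the commit's comments.

    #include <stdio.h>

    #define NUM_RINGS   5
    #define SEQNO_SIZE  8   /* illustrative: one aligned qword per mailbox */

    /* Offset of the mailbox that ring 'from' writes and ring 'to' reads.  */
    /* Purely an example layout: a NUM_RINGS x NUM_RINGS grid of slots.    */
    static unsigned int gtt_offset(unsigned int base,
                                   unsigned int from, unsigned int to)
    {
        return base + (from * NUM_RINGS + to) * SEQNO_SIZE;
    }

    int main(void)
    {
        unsigned int base = 0;   /* GTT offset of the semaphore page */
        for (unsigned int from = 0; from < NUM_RINGS; from++)
            for (unsigned int to = 0; to < NUM_RINGS; to++)
                if (from != to)
                    printf("ring %u -> ring %u: 0x%03x\n",
                           from, to, gtt_offset(base, from, to));
        return 0;
    }
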
Rodrigo Vivi
ddd4dbc6c1 drm/i915: Updating comments.
The ring index calculation table was out of date after other rings were added,
although the formula is flexible and scales when adding new rings.

So this patch just updates the comments and adds a brief explanation of
why to use sync_seqno[ring index].

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-07 22:02:49 +02:00
Chris Wilson
4be173813e drm/i915: Reorder semaphore deadlock check
If a semaphore is waiting on another ring, which in turn happens to be
waiting on the first ring, but that second semaphore has been signalled,
we will be able to kick the second ring and so can treat the first ring
as a valid WAIT and not as HUNG.

v2: Be paranoid and cap the potential recursion depth whilst visiting
the semaphore signallers. (Mika)

References: https://bugs.freedesktop.org/show_bug.cgi?id=54226
References: https://bugs.freedesktop.org/show_bug.cgi?id=75502
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-06-11 11:06:43 +03:00
Oscar Mateo
273497e5cd drm/i915: s/i915_hw_context/intel_context
Up until now, contexts had one (and only one) backing object that was
used by the hardware to save/restore render ring contexts (via the
MI_SET_CONTEXT command). Other rings did not have or need this, so
our i915_hw_context struct had a 1:1 relationship with a real HW
context.

With Logical Ring Contexts and Execlists, this is not possible anymore:
all rings need a backing object, and it cannot be reused. To prepare
for that, rename our contexts to the more generic term intel_context.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:41:17 +02:00
Oscar Mateo
93b0a4e0b2 drm/i915: Split the ringbuffers from the rings (3/3)
Manual cleanup after the previous Coccinelle script.

Yes, I could write another Coccinelle script to do this but I
don't want labor-replacing robots making an honest programmer's
work obsolete (also, I'm lazy).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:30:18 +02:00
Oscar Mateo
ee1b1e5ef3 drm/i915: Split the ringbuffers from the rings (2/3)
This refactoring has been performed using the following Coccinelle
semantic script:

    @@
    struct intel_engine_cs r;
    @@
    (
    - (r).obj
    + r.buffer->obj
    |
    - (r).virtual_start
    + r.buffer->virtual_start
    |
    - (r).head
    + r.buffer->head
    |
    - (r).tail
    + r.buffer->tail
    |
    - (r).space
    + r.buffer->space
    |
    - (r).size
    + r.buffer->size
    |
    - (r).effective_size
    + r.buffer->effective_size
    |
    - (r).last_retired_head
    + r.buffer->last_retired_head
    )

    @@
    struct intel_engine_cs *r;
    @@
    (
    - (r)->obj
    + r->buffer->obj
    |
    - (r)->virtual_start
    + r->buffer->virtual_start
    |
    - (r)->head
    + r->buffer->head
    |
    - (r)->tail
    + r->buffer->tail
    |
    - (r)->space
    + r->buffer->space
    |
    - (r)->size
    + r->buffer->size
    |
    - (r)->effective_size
    + r->buffer->effective_size
    |
    - (r)->last_retired_head
    + r->buffer->last_retired_head
    )

    @@
    expression E;
    @@
    (
    - LP_RING(E)->obj
    + LP_RING(E)->buffer->obj
    |
    - LP_RING(E)->virtual_start
    + LP_RING(E)->buffer->virtual_start
    |
    - LP_RING(E)->head
    + LP_RING(E)->buffer->head
    |
    - LP_RING(E)->tail
    + LP_RING(E)->buffer->tail
    |
    - LP_RING(E)->space
    + LP_RING(E)->buffer->space
    |
    - LP_RING(E)->size
    + LP_RING(E)->buffer->size
    |
    - LP_RING(E)->effective_size
    + LP_RING(E)->buffer->effective_size
    |
    - LP_RING(E)->last_retired_head
    + LP_RING(E)->buffer->last_retired_head
    )

Note: On top of this, this patch also removes the now unused ringbuffer
fields in intel_engine_cs.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
[danvet: Add note about fixup patch included here.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:27:25 +02:00
Oscar Mateo
8ee149756e drm/i915: Split the ringbuffers from the rings (1/3)
As advanced by the previous patch, the ringbuffers and the engine
command streamers belong in different structs. This is so because,
while they used to be tightly coupled together, the new Logical
Ring Contexts (LRC for short) have a ringbuffer each.

In legacy code, we will use the buffer* pointer inside each ring
to get to the pertaining ringbuffer (the actual switch will be
done in the next patch). In the new Execlists code, this pointer
will be NULL and we will use instead the one inside the context
instead.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:02:16 +02:00
Oscar Mateo
a4872ba6d0 drm/i915: s/intel_ring_buffer/intel_engine_cs
In the upcoming patches we plan to break the correlation between
engine command streamers (a.k.a. rings) and ringbuffers, so it
makes sense to refactor the code and make the change obvious.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-22 23:01:05 +02:00
Brad Volkin
44e895a8a2 drm/i915: Use hash tables for the command parser
For clients that submit large batch buffers the command parser has
a substantial impact on performance. On my HSW ULT system performance
drops as much as ~20% on some tests. Most of the time is spent in the
command lookup code. Converting that from the current naive search to
a hash table lookup reduces the performance drop to ~10%.

The choice of value for I915_CMD_HASH_ORDER allows all commands
currently used in the parser tables to hash to their own bucket (except
for one collision on the render ring). The tradeoff is that it wastes
memory. Because the opcodes for the commands in the tables are not
particularly well distributed, reducing the order still leaves many
buckets empty. The increased collisions don't seem to have a huge
impact on the performance gain, but for now anyhow, the parser trades
memory for performance.

NB: Ville noticed that the error paths through the ring init code
will leak memory. I've not addressed that here. We can do a follow
up pass to handle all of the leaks.

v2: improved comment describing selection of hash key mask (Damien)
replace a BUG_ON() with an error return (Tvrtko, Ville)
commit message improvements

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-12 19:15:51 +02:00
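
A self-contained sketch of the lookup strategy described in the entry above, with an invented hash function, hash order and opcode mask: command descriptors are bucketed by opcode so a lookup walks a short chain instead of scanning whole tables.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CMD_HASH_ORDER   6                    /* illustrative */
    #define CMD_HASH_BUCKETS (1u << CMD_HASH_ORDER)

    struct cmd_descriptor { uint32_t opcode; int flags; };
    struct cmd_node {
        const struct cmd_descriptor *desc;
        struct cmd_node *next;
    };

    static struct cmd_node *buckets[CMD_HASH_BUCKETS];

    static unsigned int cmd_hash(uint32_t opcode)
    {
        return (opcode >> 23) & (CMD_HASH_BUCKETS - 1);   /* toy hash key */
    }

    static void register_cmd(const struct cmd_descriptor *desc)
    {
        struct cmd_node *node = malloc(sizeof(*node));
        unsigned int h = cmd_hash(desc->opcode);

        node->desc = desc;
        node->next = buckets[h];
        buckets[h] = node;
    }

    static const struct cmd_descriptor *find_cmd(uint32_t header)
    {
        for (struct cmd_node *n = buckets[cmd_hash(header)]; n; n = n->next)
            if (n->desc->opcode == (header & 0xff800000))  /* toy opcode mask */
                return n->desc;
        return NULL;    /* unknown command: caller falls back or rejects */
    }

    int main(void)
    {
        static const struct cmd_descriptor mi_noop = { 0x00000000, 0 };
        register_cmd(&mi_noop);
        printf("found: %s\n", find_cmd(0x00000000) ? "yes" : "no");
        return 0;
    }
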
Ben Widawsky
9bcb144c83 drm/i915: Support 64b execbuf
Previously, our code only had a 32b offset value for where the
batchbuffer starts. With full PPGTT, and 64b canonical GPU address
space, that is an insufficient value. The code to expand is pretty
straight forward, and only one platform needs to do anything with the
extra bits.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 16:01:58 +02:00
Ben Widawsky
024a43e12c drm/i915: Move ring_begin to signal()
Add_request has always contained both the semaphore mailbox updates as
well as the breadcrumb writes. Since the semaphore signal is the one
which actually knows about the number of dwords it needs to emit to the
ring, we move the ring_begin to that function. This allows us to remove
the hideously shared #define

On a related note, gen8 will use a different number of dwords for
semaphores, but not for add request.

v2: Make number of dwords an explicit part of signalling (via function
argument). (Chris)

v3: very slight comment change

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 10:56:53 +02:00
Ben Widawsky
78325f2d27 drm/i915: Virtualize the ringbuffer signal func
This abstraction again is in preparation for gen8. Gen8 will bring new
semantics for doing this operation.

While here, make the writes of MI_NOOPs explicit for non-existent rings.
This should have been implicit before.

NOTE: This is going to be removed in a few patches.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 10:56:53 +02:00
Ben Widawsky
ebc348b2ad drm/i915: Move semaphore specific ring members to struct
This will be helpful in abstracting some of the code in preparation for
gen8 semaphores.

v2: Move mbox stuff to a separate struct

v3: Rebased over VCS2 work

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 10:56:52 +02:00
Zhao Yakui
845f74a701 drm/i915:Initialize the second BSD ring on BDW GT3 machine
Based on the hardware spec, the BDW GT3 machine has two independent
BSD rings that can be used to dispatch the video commands.
So just initialize the second one.

V3->V4: Follow Imre's comment to do some minor updates. For example:
more comments are added to describe the semaphore between rings.

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
[danvet: Fix up checkpatch error.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:46 +02:00
Zhao Yakui
b1a93306ed drm/i915: Update the restrict check to filter out wrong Ring ID passed by user-space
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:44 +02:00
Chris Wilson
e3efda49e7 drm/i915: Preserve ring buffers objects across resume
Tearing down the ring buffers across resume is overkill, risks
unnecessary failure and increases fragmentation.

After failure, since the device is still active we may end up trying to
write into the dangling iomapping and trigger an oops.

v2: stop_ringbuffers() was meant to call stop(ring) not
cleanup(ring) during resume!

Reported-by: Jae-hyeon Park <jhyeon@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=72351
References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
[danvet: s/ring->obj == NULL/!intel_ring_initialized(ring)/ as
suggested by Oscar.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:37 +02:00
Chris Wilson
9991ae787a drm/i915: Move all ring resets before setting the HWS page
In commit a51435a313
Author: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
Date:   Wed Mar 12 16:39:40 2014 +0530

    drm/i915: disable rings before HW status page setup

we reordered stopping the rings to do so before we set the HWS register.
However, there is an extra workaround for g45 to reset the rings twice,
and for consistency we should apply that workaround before setting the
HWS to be sure that the rings are truly stopped.

Cc: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-03 17:16:45 +02:00
Ben Widawsky
057f6a8ad7 drm/i915: Invariably invalidate before ctx switch
We have been setting the bit which was originally BIOS dependent since:
commit f05bb0c7b6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jan 20 16:33:32 2013 +0000

    drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits

Therefore, we do not need to try to figure it out dynamically and we can
just always invalidate the TLBs.

It's a partial revert of:
commit 12b0286f49
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Mon Jun 4 14:42:50 2012 -0700

    drm/i915: possibly invalidate TLB before context switch

The original commit attempted to only invalidate when necessary
(very much a relic from the old days). Now, we can just always invalidate.

I guess the old TODO still exists. Since we seem to have abandoned ILK
contexts however, there isn't much point in even remembering.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-03 11:41:39 +02:00
Chris Wilson
508774452d drm/i915: Broadwell expands ACTHD to 64bit
As Broadwell has an increased virtual address size, it requires more
than 32 bits to store offsets into its address space. This includes the
debug registers to track the current HEAD of the individual rings, which
may be anywhere within the per-process address spaces. In order to find
the full location, we need to read the high bits from a second register.
We then also need to expand our storage to keep track of the larger
address.

v2: Carefully read the two registers to catch wraparound between
    the reads.
v3: Use a WARN_ON rather than loop indefinitely on an unstable
    register read.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Timo Aaltonen <tjaalton@ubuntu.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Drop spurious hunk which conflicted.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-28 18:33:14 +01:00
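
A compilable sketch of the careful two-register read described in the entry above; the MMIO accessors are stubbed out and the retry limit is an arbitrary choice for the example.

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-ins for MMIO reads of the low/high halves of a 64-bit register. */
    static uint32_t fake_regs[2] = { 0xfffffff0u, 0x00000001u };
    static uint32_t read_lo(void) { return fake_regs[0]; }
    static uint32_t read_hi(void) { return fake_regs[1]; }

    static uint64_t read_acthd64(void)
    {
        uint32_t lo, hi, tmp;
        int tries = 3;                 /* bounded retries, then warn */

        /* Re-read the upper half to catch a wraparound between the reads. */
        do {
            hi = read_hi();
            lo = read_lo();
            tmp = read_hi();
        } while (hi != tmp && --tries);

        if (hi != tmp)
            fprintf(stderr, "unstable 64-bit register read\n");

        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        printf("ACTHD = 0x%016llx\n",
               (unsigned long long)read_acthd64());
        return 0;
    }
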
Naresh Kumar Kachhi
e9fea5747d drm/i915: wait for rings to become idle once disabled
Make sure we wait for rings to become idle once they are
disabled. In case of timeout, print an error message.

Signed-off-by: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
[danvet: Frob patch as suggested by Chris.]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-12 16:04:19 +01:00
Daniel Vetter
e8e6e6012d Linux 3.14-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTHSaRAAoJEHm+PkMAQRiG7G8IAJHElwFDNSQE7Y9MmbicrAMG
 kfjhBtBpTaVrJKQXegCNUwDaLLyC4oLIxDheW84oPXbrEGDLqPtBov/hrcFkHVr4
 lh/ZYk02nYtcfpN0JnL/Yj2oKHVmBWs0vFlM7StSFsJCj10DoCVQQdmAJ8XODTPo
 CXMapk+UikTX1TlIO8+B5toyl3R1OqPmW211UV1vQVLKy66hu+MKVN/V+/EyopL0
 1jO81EDpaRaeIJh1/okcyUoIq9pqLkAWNpeQ7uyXZ+Sfivt9RXwLYKmAB3lP20Hc
 ZMIIoHSCyYRFjxLlQvt02bA9nY4wTY7YN5kZ2kk65y7TFfhcGsCw1Sc69iyCoKs=
 =CJcA
 -----END PGP SIGNATURE-----

Merge tag 'v3.14-rc6' into drm-intel-next-queued

Linux 3.14-rc6

I need the hdmi/dvi-dual link fixes in 3.14 to avoid ugly conflicts
when merging Ville's new hdmi cloning support into my -next tree

Conflicts:
	drivers/gpu/drm/i915/Makefile
	drivers/gpu/drm/i915/intel_dp.c

Makefile cleanup conflicts with an acpi build fix, intel_dp.c is
trivial.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-10 21:43:46 +01:00
Brad Volkin
351e3db2b3 drm/i915: Implement command buffer parsing logic
The command parser scans batch buffers submitted via execbuffer ioctls before
the driver submits them to hardware. At a high level, it looks for several
things:

1) Commands which are explicitly defined as privileged or which should only be
   used by the kernel driver. The parser generally rejects such commands, with
   the provision that it may allow some from the drm master process.
2) Commands which access registers. To support correct/enhanced userspace
   functionality, particularly certain OpenGL extensions, the parser provides a
   whitelist of registers which userspace may safely access (for both normal and
   drm master processes).
3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
   parser always rejects such commands.

See the overview comment in the source for more details.

This patch only implements the logic. Subsequent patches will build the tables
that drive the parser.

v2: Don't set the secure bit if the parser succeeds
Fail harder during init
Makefile cleanup
Kerneldoc cleanup
Clarify module param description
Convert ints to bools in a few places
Move client/subclient defs to i915_reg.h
Remove the bits_count field

OTC-Tracker: AXIA-4631
Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Appease checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-07 22:37:00 +01:00
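
To illustrate the checks described above, here is a minimal stand-alone C
sketch of a per-command parser loop; the descriptor table, field names and
opcodes are made up for illustration and are not the driver's real tables:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical command descriptor -- not the driver's real tables. */
    struct cmd_desc {
        uint32_t opcode;           /* value of the top byte of the header */
        size_t   len;              /* fixed length in dwords              */
        bool     privileged;       /* only the drm master may use it      */
        bool     touches_priv_mem; /* e.g. GGTT or HWS page: reject       */
    };

    static const struct cmd_desc table[] = {
        { 0x01, 1, false, false },
        { 0x02, 2, true,  false },
        { 0x03, 3, false, true  },
    };

    static const struct cmd_desc *lookup(uint32_t header)
    {
        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
            if (table[i].opcode == (header >> 24))
                return &table[i];
        return NULL;
    }

    /* Walk the batch and reject anything not explicitly allowed. */
    static int parse_batch(const uint32_t *batch, size_t ndwords, bool is_master)
    {
        size_t i = 0;

        while (i < ndwords) {
            const struct cmd_desc *desc = lookup(batch[i]);

            if (!desc)
                return -1;                    /* unknown command    */
            if (desc->privileged && !is_master)
                return -1;                    /* privileged command */
            if (desc->touches_priv_mem)
                return -1;                    /* privileged memory  */
            i += desc->len;
        }
        return 0;
    }

    int main(void)
    {
        const uint32_t batch[] = { 0x01u << 24, 0x02u << 24, 0 };

        /* Rejected: 0x02 is privileged and we are not the drm master. */
        printf("parse: %d\n", parse_batch(batch, 3, false));
        return 0;
    }
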
Ville Syrjälä
753b1ad4a2 drm/i915: Add intel_ring_cacheline_align()
intel_ring_cacheline_align() emits MI_NOOPs until the ring tail is
aligned to a cacheline boundary.

Cc: Bjoern C <lkml@call-home.ch>
Cc: Alexandru DAMIAN <alexandru.damian@intel.com>
Cc: Enrico Tagliavini <enrico.tagliavini@gmail.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org (prereq for the next patch)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-11 23:00:19 +01:00
Mika Kuoppala
b6b0fac04d drm/i915: Use hangcheck score to find guilty context
With full ppgtt, using acthd is not enough to find the guilty
batch buffer. We get multiple false positives because acthd is
per vm.

Instead of scanning which vm was running on a ring to find the
corresponding context, use a different, simpler strategy for
finding the batch that caused the gpu hang:

If hangcheck has declared a ring hung, find the first incomplete
request on that ring and claim it was the guilty one.

v2: Rebase

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73652
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 11:57:24 +01:00
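
A stand-alone C sketch of the guilty-batch search described above (the first
incomplete request on a hung ring gets the blame); the request and context
types are simplified stand-ins, not the driver's structures:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct request {
        uint32_t seqno;
        int ctx_id;               /* context that submitted the batch */
        struct request *next;     /* requests in submission order     */
    };

    /* Seqno comparison that survives 32-bit wraparound. */
    static bool seqno_passed(uint32_t completed, uint32_t seqno)
    {
        return (int32_t)(completed - seqno) >= 0;
    }

    /*
     * Hangcheck has declared the ring hung: blame the context behind
     * the first request the GPU has not yet completed.
     */
    static int find_guilty_ctx(const struct request *list, uint32_t completed)
    {
        for (const struct request *rq = list; rq; rq = rq->next)
            if (!seqno_passed(completed, rq->seqno))
                return rq->ctx_id;
        return -1;                /* everything completed, nobody to blame */
    }

    int main(void)
    {
        struct request r2 = { 12, 7, NULL };
        struct request r1 = { 11, 3, &r2 };

        /* GPU completed up to seqno 11, so the request from ctx 7 is guilty. */
        printf("guilty ctx: %d\n", find_guilty_ctx(&r1, 11));
        return 0;
    }
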
Chris Wilson
092467327c drm/i915: Write RING_TAIL once per-request
Ignoring the legacy DRI1 code, and a couple of special cases (to be
discussed later), all access to the ring is mediated through requests.
The first write to a ring will grab a seqno and mark the ring as having
an outstanding_lazy_request. Either through explicitly adding a request
after an execbuffer or through an implicit wait (either by the CPU or by
a semaphore), that sequence of writes will be terminated with a request.
So we can elide all the intervening writes to the tail register and
send the entire command stream to the GPU at once. This will reduce the
number of *serialising* writes to the tail register by a factor of 3-5
(depending upon architecture and the number of workarounds, context
switches, etc. involved). This becomes even more noticeable when the
register write is overloaded with a number of debugging tools. The
astute reader will wonder if it is then possible to overflow the ring
with a single command. It is not. When we start a command sequence on
the ring, we check for available space and issue a wait if there is not
enough. The ring wait will in this case be forced to flush the
outstanding register write and then poll the ACTHD for sufficient space
to continue.

The exception to the rule where everything is inside a request are a few
initialisation cases where we may want to write GPU commands via the CS
before userspace wakes up and page flips.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-10 15:35:58 +02:00
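
A toy stand-alone C model of the idea above: ring emission only advances a
software tail, and the single MMIO tail write happens when the request is
added. The names and numbers are illustrative only, not driver code:

    #include <stdint.h>
    #include <stdio.h>

    /* The "register write" is just a counter here, so we can see how
     * many serialising MMIO writes would have happened. */
    static unsigned mmio_tail_writes;
    static uint32_t hw_tail;

    struct ring {
        uint32_t tail;            /* software copy, advanced on every emit */
        uint32_t size;
    };

    static void write_tail_mmio(uint32_t tail)
    {
        hw_tail = tail;
        mmio_tail_writes++;
    }

    /* Emit some dwords: only the software tail moves. */
    static void ring_emit(struct ring *ring, uint32_t ndwords)
    {
        ring->tail = (ring->tail + 4 * ndwords) % ring->size;
    }

    /* Closing a request flushes the whole accumulated stream at once. */
    static void ring_add_request(struct ring *ring)
    {
        write_tail_mmio(ring->tail);
    }

    int main(void)
    {
        struct ring ring = { 0, 4096 };

        ring_emit(&ring, 4);      /* invalidate                     */
        ring_emit(&ring, 8);      /* batchbuffer start, workarounds */
        ring_emit(&ring, 4);      /* flush + seqno write            */
        ring_add_request(&ring);

        printf("tail writes: %u (hw tail 0x%x)\n",
               mmio_tail_writes, (unsigned)hw_tail);
        return 0;
    }
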
Mika Kuoppala
da66146425 drm/i915: include hangcheck action and score in error_state
Score and action reveal what all the rings were doing
and why a hang was declared. Add an idle state so that
we can distinguish between a waiting and an idle ring.

v2: - add idle as a hangcheck action
    - condensed hangcheck status to a single line (Chris)
    - mark active explicitly when we are making progress (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-06 17:56:17 +02:00
Chris Wilson
3c0e234c84 drm/i915: Preallocate the lazy request
It is possible for us to be forced to perform an allocation for the lazy
request whilst running the shrinker. This allocation may fail, leaving
us unable to reclaim any memory leading to premature OOM. A neat
solution to the problem is to preallocate the request at the same time
as acquiring the seqno for the ring transaction. This means that we can
report ENOMEM prior to touching the rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 12:03:53 +02:00
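
A minimal stand-alone sketch of the preallocation pattern described above,
so that ENOMEM is reported before any ring writes; the types and names are
invented for illustration:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-in request struct; the real one carries seqno, ring, etc. */
    struct request { unsigned seqno; };

    static struct request *outstanding_lazy_request;

    /*
     * Allocate the lazy request up front, before any ring writes, so an
     * allocation failure is reported as -ENOMEM while we can still back out.
     */
    static int ring_begin(unsigned ndwords)
    {
        if (!outstanding_lazy_request) {
            outstanding_lazy_request = malloc(sizeof(*outstanding_lazy_request));
            if (!outstanding_lazy_request)
                return -ENOMEM;
        }
        /* ... then wait for ring space and allow ndwords of emission ... */
        (void)ndwords;
        return 0;
    }

    int main(void)
    {
        int ret = ring_begin(4);

        printf("ring_begin: %d, request %p\n",
               ret, (void *)outstanding_lazy_request);
        free(outstanding_lazy_request);
        return 0;
    }
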
Chris Wilson
1823521d2b drm/i915: Rename ring->outstanding_lazy_request
Prior to preallocating a request for lazy emission, rename the existing
field to make way (and differentiate the seqno from the request struct).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 12:03:12 +02:00
Chris Wilson
0d1aacac36 drm/i915: Embed the ring->private within the struct intel_ring_buffer
We now have more devices using ring->private than not, and they all want
the same structure. Worse, I would like to use a scratch page from
outside of intel_ringbuffer.c and so for convenience would like to reuse
ring->private. Embed the object into the struct intel_ringbuffer so that
we can keep the code clean.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-03 19:17:55 +02:00
Damien Lespiau
e3ce7633ba drm/i915: Remove I915_READ_{NOPID, SYNC_0, SYNC_1}()
The code directly uses the registers and ring->mmio_base.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-23 14:52:24 +02:00
Jani Nikula
f2f4d82faf drm/i915: give more distinctive names to ring hangcheck action enums
The short lowercase names are bound to collide. The default warnings
don't even warn about shadowing.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:37 +02:00
Daniel Vetter
c7113cc35f drm/i915: unify ring irq refcounts (again)
With the simplified locking there's no reason any more to keep the
refcounts separate.

v2: Readd the lost comment that ring->irq_refcount is protected by
dev_priv->irq_lock.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-11 14:36:49 +02:00
Daniel Vetter
59cdb63d52 drm/i915: kill dev_priv->rps.lock
Now that the rps interrupt locking isn't clearly separated (at least
conceptually) from all the other interrupt locking, having a different
lock stopped making sense: it protects much more than just the rps
workqueue it started out with. But with the addition of VECS the
separation started to blur and resulted in some more complex locking
for the ring interrupt refcount.

With this we can (again) unify the ringbuffer irq refcounts without
causing a massive confusion, but that's for the next patch.

v2: Explain better why the rps.lock once made sense and why no longer,
requested by Ben.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-11 14:36:43 +02:00
Mika Kuoppala
ad8beaeada drm/i915: store ring hangcheck action
For guilty batchbuffer analysis later on when the rings are reset,
store what state the ring was in when the hang was declared.
This helps to weed out the waiting rings from the active ones.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-13 17:42:17 +02:00
Chris Wilson
6274f2126a drm/i915: Don't count semaphore waits towards a stuck ring
If we detect a ring is in a valid wait for another, just let it be.
Eventually it will either begin to progress again, or the entire system
will come grinding to a halt and then hangcheck will fire as soon as the
deadlock is detected.

This error was foretold by Ben in
commit 05407ff889
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Thu May 30 09:04:29 2013 +0300

    drm/i915: detect hang using per ring hangcheck_score

"If ring B is waiting on ring A via semaphore, and ring A is making
progress, albeit slowly - the hangcheck will fire. The check will
determine that A is moving, however ring B will appear hung because
the ACTHD doesn't move. I honestly can't say if that's actually a
realistic problem to hit it probably implies the timeout value is too
low."

v2: Make sure we don't even incur the KICK cost whilst waiting.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65394
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-11 11:49:28 +02:00
Chris Wilson
c65355bbef drm/i915: Track when we dirty the scanout with render commands
This is required for tracking render damage for use with FBC and will be
used in subsequent patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-07 17:56:45 +02:00
Mika Kuoppala
05407ff889 drm/i915: detect hang using per ring hangcheck_score
Keep track of ring seqno progress and, if no progress is
detected, declare a hang. Use the actual head (acthd)
to distinguish between a stuck ring and a looping batchbuffer.
A stuck ring will be kicked to trigger progress.

This commit adds a hard limit for batchbuffer completion time.
If batchbuffer completion time is more than 4.5 seconds,
the gpu will be declared hung.

Review comment from Ben which nicely clarifies the semantic change:

"Maybe I'm just stating the functional changes of the patch, but in case
they were unintended here is what I see as potential issues:

1. "If ring B is waiting on ring A via semaphore, and ring A is making
   progress, albeit slowly - the hangcheck will fire. The check will
   determine that A is moving, however ring B will appear hung because
   the ACTHD doesn't move. I honestly can't say if that's actually a
   realistic problem to hit it probably implies the timeout value is too
   low.

2. "There's also another corner case on the kick. If the seqno = 2
   (though not stuck), and on the 3rd hangcheck, the ring is stuck, and
   we try to kick it... we don't actually try to find out if the kick
   helped"

v2: use acthd to detect stuck ring from loop (Ben Widawsky)

v3: Use acthd to check when ring needs kicking.
Declare hang on third time in order to give time for
kick_ring to take effect.

v4: Update commit msg

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Paste in Ben's review comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-03 10:58:21 +02:00
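
A stand-alone C sketch of a per-ring hangcheck score in the spirit of the
commit above; the exact scoring weights and threshold here are assumptions
for illustration, not the driver's values:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Toy per-ring hangcheck state: no seqno progress for several
     * consecutive periods => declare a hang. */
    struct hangcheck {
        uint32_t last_seqno;
        uint64_t last_acthd;
        int score;
    };

    #define HANGCHECK_HUNG_SCORE 3   /* "declare hang on third time" */

    static bool hangcheck_tick(struct hangcheck *hc,
                               uint32_t seqno, uint64_t acthd)
    {
        if (seqno != hc->last_seqno) {
            hc->score = 0;           /* progress: all is well           */
        } else if (acthd != hc->last_acthd) {
            hc->score++;             /* same seqno but head moving:
                                        long-running batch, be patient  */
        } else {
            hc->score += 2;          /* truly stuck: escalate faster    */
        }

        hc->last_seqno = seqno;
        hc->last_acthd = acthd;
        return hc->score >= HANGCHECK_HUNG_SCORE;
    }

    int main(void)
    {
        struct hangcheck hc = { 0, 0, 0 };

        /* Same seqno and same acthd on every tick: hung after a few periods. */
        for (int tick = 1; tick <= 3; tick++)
            printf("tick %d hung=%d\n", tick, hangcheck_tick(&hc, 100, 0x40));
        return 0;
    }
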
Ben Widawsky
a19d2933cb drm/i915: vebox interrupt get/put
v2: Use the correct lock to protect PM interrupt regs, this was
accidentally lost from earlier (Haihao)
Fix return types (Ben)

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:19 +02:00
Ben Widawsky
aeb0659338 drm/i915: Convert irq_refcount to struct
It's overkill on older gens, but it's useful for newer gens.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:18 +02:00
Ben Widawsky
9a8a2213a7 drm/i915: Vebox ringbuffer init
v2: Add set_seqno which didn't exist before rebase (Haihao)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:12 +02:00
Ben Widawsky
4a3dd19d94 drm/i915: Introduce VECS: the 4th ring
The video enhancement command streamer is a new ring on HSW which does
what it sounds like it does. This patch provides the most minimal
inception of the ring.

In order to support a new ring, we need to bump the number. The patch
may look trivial to the untrained eye, but bumping the number of rings
is a bit scary. As such the patch is not terribly useful by itself, but
a pretty nice place to find issues during a bisection.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:09 +02:00
Ben Widawsky
ad776f8b09 drm/i915: Semaphore MBOX update generalization
This replaces the existing MBOX update code with a more generalized
calculation for emitting mbox updates. We also create a sentinel for
doing the updates so we can more abstractly deal with the rings.

When doing MBOX updates the code must be aware of the /other/ rings.
Until now the platforms which supported semaphores had a fixed number of
rings and so it made sense for the code to be very specialized
(hardcoded).

The patch does contain a functional change, but should have no
behavioral changes.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:08 +02:00
Ben Widawsky
5586181fce drm/i915: Comments for semaphore clarification
Semaphores are tied very closely to the rings in the GPU. Trivial patch
adds comments to the existing code so that when we add new rings we can
include comments there as well. It also helps distinguish the ring to
semaphore mailbox interactions by using the ringname in the semaphore
data structures.

This patch should have no functional impact.

v2: The English parts (as opposed to register names) of the comments
were reversed. (Damien)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:08 +02:00
Mika Kuoppala
92cab73451 drm/i915: track ring progression using seqnos
Instead of relying on acthd, track ring seqno progression
to detect if the ring has hung.

v2: put hangcheck stuff inside struct (Chris Wilson)

v3: initialize hangcheck.seqno (Ben Widawsky)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:53:54 +02:00
Chris Wilson
112522f678 drm/i915: put context upon switching
In order to be notified when the context and all of its associated
objects are idle (for when the context maps to a ppgtt) we need a callback
from the retire handler. We can arrange this by using the kref_get/put
of the context for request tracking and by inserting a request to
demarcate the switch away from the old context.

[Ben: fixed minor error to patch compile, AND s/last_context/from/]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-06 11:20:48 +02:00
Dave Airlie
b5cc6c0387 Merge tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
- seqno wrap fixes and debug infrastructure from Mika Kuoppala and Chris
  Wilson
- some leftover kill-agp on gen6+ patches from Ben
- hotplug improvements from Damien
- clear fb when allocated from stolen, avoids dirt on the fbcon (Chris)
- Stolen mem support from Chris Wilson, one of the many steps to get to
  real fastboot support.
- Some DDI code cleanups from Paulo.
- Some refactorings around lvds and dp code.
- some random little bits&pieces

* tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel: (93 commits)
  drm/i915: Return the real error code from intel_set_mode()
  drm/i915: Make GSM void
  drm/i915: Move GSM mapping into dev_priv
  drm/i915: Move even more gtt code to i915_gem_gtt
  drm/i915: Make next_seqno debugs entry to use i915_gem_set_seqno
  drm/i915: Introduce i915_gem_set_seqno()
  drm/i915: Always clear semaphore mboxes on seqno wrap
  drm/i915: Initialize hardware semaphore state on ring init
  drm/i915: Introduce ring set_seqno
  drm/i915: Missed conversion to gtt_pte_t
  drm/i915: Bug on unsupported swizzled platforms
  drm/i915: BUG() if fences are used on unsupported platform
  drm/i915: fixup overlay stolen memory leak
  drm/i915: clean up PIPECONF bpc #defines
  drm/i915: add intel_dp_set_signal_levels
  drm/i915: remove leftover display.update_wm assignment
  drm/i915: check for the PCH when setting pch_transcoder
  drm/i915: Clear the stolen fb before enabling
  drm/i915: Access to snooped system memory through the GTT is incoherent
  drm/i915: Remove stale comment about intel_dp_detect()
  ...

Conflicts:
	drivers/gpu/drm/i915/intel_display.c
2013-01-17 20:34:08 +10:00
Mika Kuoppala
f7e98ad4d4 drm/i915: Initialize hardware semaphore state on ring init
The hardware status page needs to have a proper seqno set,
as our initial seqno can be arbitrary. If the initial seqno is close
to the wrap boundary on init and i915_seqno_passed() (31-bit space)
refers to a hw status page which contains zero, an erroneous result
will be returned.

v2: clear mboxes and set hws page directly instead of going
through rings. Suggested by Chris Wilson.

v3: hws needs to be updated for all gens. Noticed by Chris
Wilson.

References: https://bugs.freedesktop.org/show_bug.cgi?id=58230
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-19 11:17:01 +01:00
Mika Kuoppala
b70ec5bf43 drm/i915: Introduce ring set_seqno
In preparation for setting per ring initial seqno values
add ring::set_seqno().

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-19 11:16:18 +01:00
Daniel Vetter
b45305fce5 drm/i915: Implement workaround for broken CS tlb on i830/845
Now that Chris Wilson demonstrated that the key for stability on early
gen 2 is to simply _never_ exchange the physical backing storage of
batch buffers, I've tried a stab at a kernel solution. Doesn't look too
nefarious imho, now that I don't try to be too clever for my own good
any more.

v2: After discussing the various techniques, we've decided to always blit
batches on the suspect devices, but allow userspace to opt out of the
kernel workaround and assume full responsibility for providing coherent
batches. The principal reason is that avoiding the blit does improve
performance in a few key microbenchmarks and also in cairo-trace
replays.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet:
- Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
  wrap w/a. Suggested by Chris Wilson.
- Also add the ACTHD check from Chris Wilson for the error state
  dumping, so that we still catch batches when userspace opts out of
  the w/a.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-17 17:27:02 +01:00
Mika Kuoppala
498d2ac15c drm/i915: Add intel_ring_handle_seqno_wrap
If there are pre-wrap values in semaphore-mbox registers after wrap,
syncing against some after-wrap request will complete immediately.
Fix this by emitting ring commands to set mbox registers to zero
when the wrap happens.

v2: Use __intel_ring_begin to emit ring commands, from
Chris Wilson.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a small comment to handle_seqno_wrap.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-06 13:14:34 +01:00
Ville Syrjälä
633cf8f505 drm/i915: Don't allow ring tail to reach the same cacheline as head
From BSpec:
"If the Ring Buffer Head Pointer and the Tail Pointer are on the same
cacheline, the Head Pointer must not be greater than the Tail
Pointer."

The easiest way to enforce this is to reduce the reported ring space.

References:
Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use"
Gen4+ BSpec "vol1c Memory Interface and Command Stream" / 5.3.4.5 "Ring Buffer Use"

v2: Include the exact BSpec references in the description

v3: s/64/I915_RING_FREE_SPACE, and add the BSpec information to the code

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-03 18:31:20 +01:00
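
A stand-alone sketch of the space calculation with the reserved gap
mentioned above (the constant plays the role of I915_RING_FREE_SPACE; the
exact value and the structure layout here are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    /* Reserve a little space so the tail can never advance into the
     * same cacheline as the head. */
    #define RING_FREE_SPACE 64

    struct ring {
        uint32_t head;
        uint32_t tail;
        uint32_t size;
    };

    /* Usable space between tail and head, minus the reserved gap. */
    static int ring_space(const struct ring *ring)
    {
        int space = ring->head - (ring->tail + RING_FREE_SPACE);

        if (space < 0)
            space += ring->size;
        return space;
    }

    int main(void)
    {
        struct ring ring = { 0x100, 0x80, 4096 };

        printf("space: %d bytes\n", ring_space(&ring));
        return 0;
    }
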
Chris Wilson
3e9605018a drm/i915: Rearrange code to only have a single method for waiting upon the ring
Replace the wait for the ring to be clear with the more common wait for
the ring to be idle. The principle advantage is one less exported
intel_ring_wait function, and the removal of a hardcoded value.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:53 +01:00
Chris Wilson
9d7730914f drm/i915: Preallocate next seqno before touching the ring
Based on the work by Mika Kuoppala, we realised that we need to handle
seqno wraparound prior to committing our changes to the ring. The most
obvious point then is to grab the seqno inside intel_ring_begin(), and
then to reuse that seqno for all ring operations until the next request.
As intel_ring_begin() can fail, the callers must already be prepared to
handle such failure and so we can safely add further checks.

This patch looks like it should be split up into the interface
changes and the tweaks to move seqno wrapping from the execbuffer into
the core seqno increment. However, I found no easy way to break it into
incremental steps without introducing further broken behaviour.

v2: Mika found a silly mistake and a subtle error in the existing code;
inside i915_gem_retire_requests() we were resetting the sync_seqno of
the target ring based on the seqno from this ring - which are only
related by the order of their allocation, not retirement. Hence we were
applying the optimisation that the rings were synchronised too early,
fortunately the only real casualty there is the handling of seqno
wrapping.

v3: Do not forget to reset the sync_seqno upon module reinitialisation,
ala resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:52 +01:00
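
A minimal stand-alone model of reserving the seqno before touching the
ring, with wrap handling that clears stale per-ring sync state; the
structures and names are simplified stand-ins, not the i915 code:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define NUM_RINGS 3

    struct gpu {
        uint32_t next_seqno;
        uint32_t sync_seqno[NUM_RINGS][NUM_RINGS - 1]; /* last seen per pair */
    };

    /* Called when the 32-bit seqno space wraps: stale semaphore/sync
     * state from before the wrap must not be trusted afterwards. */
    static void handle_seqno_wrap(struct gpu *gpu)
    {
        memset(gpu->sync_seqno, 0, sizeof(gpu->sync_seqno));
    }

    /*
     * Reserve the seqno before any ring writes, so wraparound is dealt
     * with while failure is still an option.
     */
    static int get_seqno(struct gpu *gpu, uint32_t *seqno)
    {
        if (gpu->next_seqno == 0) {     /* about to wrap (or first use) */
            handle_seqno_wrap(gpu);
            gpu->next_seqno = 1;
        }
        *seqno = gpu->next_seqno++;
        return 0;
    }

    int main(void)
    {
        struct gpu gpu = { .next_seqno = 0xffffffffu };
        uint32_t seqno;

        get_seqno(&gpu, &seqno);
        printf("seqno %u\n", (unsigned)seqno);  /* last pre-wrap value */
        get_seqno(&gpu, &seqno);
        printf("seqno %u\n", (unsigned)seqno);  /* wrapped: sync state reset */
        return 0;
    }
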
Jesse Barnes
9a28977181 drm/i915: TLB invalidation with MI_FLUSH_DW requires a post-sync op v3
So store into the scratch space of the HWS to make sure the invalidate
occurs.

v2: use GTT address space for store, clean up #defines (Chris)
v3: use correct #define in blt ring flush (Chris)

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Antti Koskipää <antti.koskipaa@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1063252
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-11 23:51:36 +01:00
Chris Wilson
d7d4eeddb8 drm/i915: Allow DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers
With the introduction of per-process GTT space, the hardware designers
thought it wise to also limit the ability to write to MMIO space to only
a "secure" batch buffer. The ability to rewrite registers is the only
way to program the hardware to perform certain operations like scanline
waits (required for tear-free windowed updates). So we either have a
choice of adding an interface to perform those synchronized updates
inside the kernel, or we permit certain processes the ability to write
to the "safe" registers from within its command stream. This patch
exposes the ability to submit a SECURE batch buffer to
DRM_ROOT_ONLY|DRM_MASTER processes.

v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security
bit (bit 13, accidentally not set). Also add a comment explaining why
secure batches need a global gtt binding.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
[danvet: added hsw fixup.]
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-17 21:06:59 +02:00
Chris Wilson
b2eadbc85b drm/i915: Lazily apply the SNB+ seqno w/a
Avoid the forcewake overhead when simply retiring requests, as often the
last seen seqno is good enough to satisfy the retirement process and will
be promptly re-run in any case. Only ensure that we force the coherent
seqno read when we are explicitly waiting upon a completion event to be
sure that none go missing, and also for when we are reporting seqno
values in case of error or debugging.

This greatly reduces the load for userspace using the busy-ioctl to
track active buffers, for instance halving the CPU used by X in pushing
the pixels from a software render (flash). The effect will be even more
magnified with userptr and so providing a zero-copy upload path in that
instance, or in similar instances where X is simply compositing DRI
buffers.

v2: Reverse the polarity of the tachyon stream. Daniel suggested that
'force' was too generic for the parameter name and that 'lazy_coherency'
better encapsulated the semantics of it being an optimization and its
purpose. Also notice that gen6_get_seqno() is only used by gen6/7
chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-10 11:11:32 +02:00
Chris Wilson
a7b9761d0a drm/i915: Split i915_gem_flush_ring() into separate invalidate/flush funcs
By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:55 +02:00
Chris Wilson
69c2fc8913 drm/i915: Remove the per-ring write list
This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:53 +02:00
Daniel Vetter
cc889e0f6c drm/i915: disable flushing_list/gpu_write_list
This is just the minimal patch to disable all this code so that we can
do decent amounts of QA before we rip it all out.

The complicating thing is that we need to flush the gpu caches after
the batchbuffer is emitted, which is past the point of no return where
execbuffer can't fail any more (otherwise we risk submitting the same
batch multiple times).

Hence we need to add a flag to track whether any caches associated
with that ring are dirty. And emit the flush in add_request if that's
the case.

Note that this has quite a few behaviour changes:
- Caches get flushed/invalidated unconditionally.
- Invalidation now happens after potential inter-ring sync.

I've bantered around a bit with Chris on irc whether this fixes
anything, and it might or might not. The only thing clear is that with
these changes it's much easier to reason about correctness.

Also rip out a lone get_next_request_seqno in the execbuffer
retire_commands function. I've dug around and I couldn't figure out
why that is still there, with the outstanding lazy request stuff it
shouldn't be necessary.

v2: Chris Wilson complained that I also invalidate the read caches
when flushing after a batchbuffer. Now optimized.

v3: Added some comments to explain the new flushing behaviour.

Cc: Eric Anholt <eric@anholt.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-20 13:54:28 +02:00
Ben Widawsky
12b0286f49 drm/i915: possibly invalidate TLB before context switch
From http://intellinuxgraphics.org/documentation/SNB/IHD_OS_Vol1_Part3.pdf

[DevSNB] If Flush TLB invalidation Mode is enabled it's the driver's
responsibility to invalidate the TLBs at least once after the previous
context switch after any GTT mappings changed (including new GTT
entries).  This can be done by a pipelined PIPE_CONTROL with TLB inv bit
set immediately before MI_SET_CONTEXT.

On GEN7 the invalidation mode is explicitly set, but this appears to be
lacking for GEN6. Since I don't know the history on this, I've decided
to dynamically read the value at ring init time, and use that value
throughout.

v2: better comment (daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-14 17:36:19 +02:00
Ben Widawsky
e055684168 drm/i915: context switch implementation
Implement the context switch code as well as the interfaces to do the
context switch. This patch also doesn't match 1:1 with the RFC patches.
The main difference is that from Daniel's responses the last context
object is now stored instead of the last context. This allows us
to free the context data structure and the context object independently.

There is room for optimization: this code will pin the context object
until the next context is active. The optimal way to do it is to
actually pin the object, move it to the active list, do the context
switch, and then unpin it. This allows the eviction code to actually
evict the context object if needed.

The context switch code is missing workarounds, they will be implemented
in future patches.

v2: actually do obj->dirty=1 in switch (daniel)
Modified comment around above
Remove flags to context switch (daniel)
Move mi_set_context code to i915_gem_context.c (daniel)
Remove seqno , use lazy request instead (daniel)

v3: use i915_gem_request_next_seqno instead of
      outstanding_lazy_request (Daniel)
remove id's from trace events (Daniel)
Put the context BO in the instruction domain (Daniel)
Don't unref the BO is context switch fails (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-14 17:36:17 +02:00
Ben Widawsky
40521054fd drm/i915: context basic create & destroy
Invent an abstraction for a hw context which is passed around through
the core functions. The main bit a hw context holds is the buffer object
which backs the context. The rest of the members are just helper
functions. Specifically the ring member, which could likely go away if
we decide to never implement whatever other hw context support exists.

Of note here is the introduction of the 64k alignment constraint for the
BO. If contexts become heavily used, we should consider tweaking this
down to 4k. Until the contexts are merged and tested a bit though, I
think 64k is a nice start (based on docs).

Since we don't yet switch contexts, there is really not much complexity
here. Creation/destruction works pretty much as one would expect. An idr
is used to generate the context id numbers which are unique per file
descriptor.

v2: add DRM_DEBUG_DRIVERS to distinguish ENOMEM failures (ben)
convert a BUG_ON to WARN_ON, default destruction is still fatal (ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-14 17:36:16 +02:00
Chris Wilson
b4519513e8 drm/i915: Introduce for_each_ring() macro
In many places we wish to iterate over the rings associated with the
GPU, so refactor them to use a common macro.

Along the way, there are a few code removals that should be side-effect
free and some rearrangement which should only have a cosmetic impact,
such as error-state.

Note that this slightly changes the semantics in the hangcheck code:
We now always cycle through all enabled rings instead of
short-circuiting the logic.

v2: Pull in a couple of suggestions from Ben and Daniel for
intel_ring_initialized() and not removing the warning (just moving them
to a new home, closer to the error).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added note to commit message about the small behaviour
change, suggested by Ben Widawsky.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-19 22:39:53 +02:00
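
A stand-alone sketch of such an iteration macro; it loosely mirrors the
shape of for_each_ring() but uses simplified stand-in types:

    #include <stdio.h>

    #define NUM_RINGS 3

    struct ring {
        const char *name;
        int initialized;
    };

    struct drm_device {
        struct ring rings[NUM_RINGS];
    };

    /* Iterate over every initialised ring. */
    #define for_each_ring(ring_, dev_, i_)                      \
        for ((i_) = 0; (i_) < NUM_RINGS; (i_)++)                \
            if (((ring_) = &(dev_)->rings[(i_)])->initialized)

    int main(void)
    {
        struct drm_device dev = {
            .rings = { { "render", 1 }, { "bsd", 0 }, { "blt", 1 } },
        };
        struct ring *ring;
        int i;

        for_each_ring(ring, &dev, i)
            printf("ring %d: %s\n", i, ring->name);
        return 0;
    }
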
Daniel Vetter
4225d0f219 drm/i915: fixup __iomem mixups in ringbuffer.c
Two things:
- ring->virtual_start is an __iomem pointer, treat it accordingly.
- dev_priv->status_page.page_addr is now always a cpu addr, no pointer
  casting needed for that.

Take the opportunity to remove the unnecessary drm indirection when
setting up the ringbuffer iomapping.

v2: Add a compiler barrier before reading the hw status page.

Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:31 +02:00
Daniel Vetter
09422b2e72 drm/i915: move LP_RING&friends to i915_dma.c
Wohoo!

Now we only need to move all the gem/kms stuff that accidentally
landed in i915_dma.c out of it, and this will be our legacy dri1
grave-yard.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:28 +02:00
Chris Wilson
6d171cb4c2 drm/i915: Remove unused ring->irq_seqno
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:23 +02:00
Ben Widawsky
9574b3fe29 drm/i915: kill waiting_seqno
The waiting_seqno is not terribly useful, and as such we can remove it
so that we'll be able to extract lockless code.

v2: Keep the information for error_state (Chris)
Check if ring is initialized in hangcheck (Chris)
Capture the waiting ring (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: add some bikeshed to clarify a comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:21 +02:00
Chris Wilson
7338aefa5c drm/i915: Use a global lock for modifying global irq flags
We were attempting to use a per-ring spinlock whilst modifying global
IRQ flags. A recipe for rare missed interrupts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:09 +02:00
Daniel Vetter
6a848ccb80 drm/i915: rip out ring->irq_mask
We only ever enable/disable one interrupt (namely user_interrupts and
pipe_notify), so we don't need to track the interrupt masking state.

Also rename irq_enable to irq_enable_mask, now that it won't collide -
beforehand both an irq_mask and an irq_enable_mask would have looked a bit
strange.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-13 12:40:57 +02:00
Ben Widawsky
25c063004a drm/i915: open code gen6+ ring irqs
We can now open-code the get/put irq functions as they were just
abstracting single register definitions.

It would be nice to merge this in with the IRQ handling code... but that
is too much work for me at present. In addition I could probably
collapse this in to a lot of the Ironlake stuff, but I don't think it's
worth the potential regressions.

This patch itself should not affect functionality.

CC: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-09 18:04:06 +02:00
Chris Wilson
a71d8d9452 drm/i915: Record the tail at each request and use it to estimate the head
By recording the location of every request in the ringbuffer, we know
that in order to retire the request the GPU must have finished reading
it and so the GPU head is now beyond the tail of the request. We can
therefore provide a conservative estimate of where the GPU is reading
from in order to avoid having to read back the ring buffer registers
when polling for space upon starting a new write into the ringbuffer.

A secondary effect is that this allows us to convert
intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
upon the single function to handle the complicated task of waiting upon
the GPU. A necessary precaution is that we need to make that wait
uninterruptible to match the existing conditions as all the callers of
intel_ring_begin() have not been audited to handle ERESTARTSYS
correctly.

By using a conservative estimate for the head, and always processing all
outstanding requests first, we prevent a race condition between using
the estimate and direct reads of I915_RING_HEAD which could result in
the value of the head going backwards, and the tail overflowing once
again. We are also careful to mark any request that we skip over in
order to free space in ring as consumed which provides a
self-consistency check.

Given sufficient abuse, such as a set of unthrottled GPU bound
cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
Sandy Bridge (i5-2520m):
  firefox-paintball  18927ms -> 15646ms: 1.21x speedup
  firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
which is a mild consolation for the performance those traces achieved from
exploiting the buggy autoreported head.

v2: Add a few more comments and make request->tail a conservative
estimate as suggested by Daniel Vetter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: resolve conflicts with retirement defering and the lack of
the autoreport head removal (that will go in through -fixes).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-02-15 14:26:03 +01:00
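
A stand-alone sketch of the conservative head estimate described above:
retiring a request promotes the tail recorded at emission time to the new
head estimate, so no register read is needed. Types and numbers are
illustrative, not driver code:

    #include <stdint.h>
    #include <stdio.h>

    struct ring {
        uint32_t head;     /* conservative estimate, not a register read */
        uint32_t tail;
        uint32_t size;
        int space;
    };

    static void update_space(struct ring *ring)
    {
        int space = ring->head - (ring->tail + 8);

        if (space < 0)
            space += ring->size;
        ring->space = space;
    }

    /*
     * Retiring a request means the GPU has read past the tail recorded
     * at request emission, so that tail is a safe lower bound for the
     * head.
     */
    static void retire_request(struct ring *ring, uint32_t request_tail)
    {
        ring->head = request_tail;
        update_space(ring);
    }

    int main(void)
    {
        struct ring ring = { .head = 0, .tail = 1024, .size = 4096 };

        update_space(&ring);
        printf("space before retire: %d\n", ring.space);
        retire_request(&ring, 512);   /* request emitted at tail 512 retired */
        printf("space after retire:  %d\n", ring.space);
        return 0;
    }
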
Daniel Vetter
96154f2fab drm/i915: switch ring->id to be a real id
... and add a helper function for the places where we want a flag.

This way we can use ring->id to index into arrays.

v2: Resurrect the missing beautification-space Chris Wilson noted.
I'm moving this space around because I'll reuse ring_str in the next
patch.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-01-29 17:32:58 +01:00
Ben Widawsky
c8c99b0f0d drm/i915: Dumb down the semaphore logic
While I think the previous code is correct, it was hard to follow and
hard to debug. Since we already have a ring abstraction, might as well
use it to handle the semaphore updates and compares.

I don't expect this code to make semaphores better or worse, but you
never know...

v2:
Remove magic per Keith's suggestions.
Ran Daniel's gem_ring_sync_loop test on this.

v3:
Ignored one of Keith's suggestions.

v4:
Removed some bloat per Daniel's recommendation.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-09-21 14:52:41 -07:00
Akshay Joshi
0206e353a0 Drivers: i915: Fix all space related issues.
Various issues involving the space character were generating
warnings from checkpatch.pl. This patch removes most of those
warnings.

Signed-off-by: Akshay Joshi <me@akshayjoshi.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-09-19 18:01:47 -07:00
Chris Wilson
a94919eadd drm/i915/ringbuffer: Idling requires waiting for the ring to be empty
...which is measured by the size and not the amount of space remaining.

Waiting upon size-8 did one of two things. In the common case with more
than 8 bytes available to write into the ring, it would return
immediately. Otherwise, it would time out given the impossible condition
of waiting for more space than is available in the ring, leading to
warnings such as:

[drm:intel_cleanup_ring_buffer] *ERROR* failed to quiesce render ring
whilst cleaning up: -16

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-07-12 10:35:45 -07:00
Ben Widawsky
b7287d8054 drm/i915: proper use of forcewake
Moved the macros around to properly do reads and writes for the given
GPU. This is to address special requirements for gen6 (SNB) reads and
writes.

Registers in the range 0-0x40000 on gen6 platforms require special
handling. Instead of relying on the callers to pick the registers
correctly, move the logic into the read and write functions.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-05-10 13:56:45 -07:00
Ben Widawsky
96f298aa9c drm/i915: ringbuffer wait for idle function
Added a new function which waits for the ringbuffer space to be equal to
(total - 8). This is the empty condition of the ringbuffer, and
equivalent to head==tail.

Also modified two users of this functionality elsewhere in the code.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-05-10 13:56:40 -07:00
Chris Wilson
47ae63e0c2 Merge branch 'drm-intel-fixes' into drm-intel-next
Apply the trivial conflicting regression fixes, but keep GPU semaphores
enabled.

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem_execbuffer.c
2011-03-07 12:35:15 +00:00
Chris Wilson
9135583464 drm/i915: Do not overflow the MMADDR write FIFO
Whilst the GT is powered down (rc6), writes to MMADDR are placed in a
FIFO by the System Agent. This is a limited resource, only 64 entries, of
which 20 are reserved for Display and PCH writes, and so we must take
care not to queue up too many writes. To avoid this, there is a counter
which we can poll to ensure there are sufficient free entries in the
fifo.

"Issuing a write to a full FIFO is not supported; at worst it could
result in corruption or a system hang."

Reported-and-Tested-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34056
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-03-06 09:07:46 +00:00
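
A toy stand-alone model of the FIFO guard described above: poll a
free-entry count and only post the write once there is room beyond the
reserved entries. The counter, constants and register numbers here are
stand-ins, not the real hardware interface:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* 20 entries reserved for Display and PCH, per the message above. */
    #define FIFO_RESERVED_ENTRIES 20

    /* In the driver this is a register read; here it is just a variable. */
    static unsigned fifo_free_entries = 64;

    static unsigned read_fifo_free_count(void)
    {
        return fifo_free_entries;
    }

    /*
     * Before posting a write while the GT may be powered down, make sure
     * the System Agent FIFO has room beyond the reserved entries.
     */
    static bool wait_for_fifo_space(void)
    {
        int loops = 500;

        while (read_fifo_free_count() <= FIFO_RESERVED_ENTRIES && loops--)
            ;   /* in the driver: a short delay between polls */
        return read_fifo_free_count() > FIFO_RESERVED_ENTRIES;
    }

    static void mmio_write(uint32_t reg, uint32_t val)
    {
        if (!wait_for_fifo_space())
            fprintf(stderr, "GT FIFO still full, write may be lost\n");
        fifo_free_entries--;      /* the posted write consumes an entry */
        printf("wrote 0x%08x to reg 0x%x\n", (unsigned)val, (unsigned)reg);
    }

    int main(void)
    {
        mmio_write(0x2030, 0xdeadbeef);
        return 0;
    }
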
Chris Wilson
db53a30261 drm/i915: Refine tracepoints
A lot of minor tweaks to fix the tracepoints, improve the outputting for
ftrace, and to generally make the tracepoints useful again. It is a start
and enough to begin identifying performance issues and gaps in our
coverage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-02-07 14:59:18 +00:00
Chris Wilson
bdd92c9ad2 Merge branch 'drm-intel-fixes' into drm-intel-next
Merge important suspend and resume regression fixes and resolve the
small conflict.

Conflicts:
	drivers/gpu/drm/i915/i915_dma.c
2011-01-24 23:45:32 +00:00
Chris Wilson
c7dca47bd6 drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space
During suspend, Linus found that his machine would hang for 3 seconds,
and identified that intel_ring_buffer_wait() was the culprit:

"Because from looking at the code, I get the notion that
"intel_read_status_page()" may not be exact. But what happens if that
inexact value matches our cached ring->actual_head, so we never even
try to read the exact case? Does it _stay_ inexact for arbitrarily
long times? If so, we might wait for the ring to empty forever (well,
until the timeout - the behavior I see), even though the ring really
_is_ empty."

As the reported HEAD position is only updated every time it crosses a
64k boundary, whilst draining the ring it is indeed likely to remain at one
value. If that value matches the last known HEAD position, we never read
the true value from the register and so trigger a timeout.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-20 17:26:57 +00:00
Chris Wilson
e8616b6ced drm/i915: Initialise ring vfuncs for old DRI paths
We weren't setting up the vfunc table when initialising the old DRI
ringbuffer, leading to such OOPSes as:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<(null)>] (null)
PGD 10c441067 PUD 1185e5067 PMD 0
Oops: 0010 [#1] PREEMPT SMP
last sysfs file: /sys/class/dmi/id/chassis_asset_tag
CPU 3
Modules linked in: i915 drm_kms_helper drm fb fbdev i2c_algo_bit
cfbcopyarea video backlight output cfbimgblt cfbfillrect autofs4 ipv6
nfs lockd fscache nfs_acl auth_rpcgss sunrpc coretemp hwmon_vid mousedev
usbhid hid option usb_wwan snd_hda_codec_via asus_atk0110 atl1e
usbserial snd_hda_intel snd_hda_codec firmware_class snd_hwdep snd_pcm
snd_seq snd_timer snd_seq_device processor parport_pc thermal snd
thermal_sys parport 8250_pnp button rng_core rtc_cmos shpchp hwmon
rtc_core ehci_hcd pci_hotplug uhci_hcd soundcore tpm_tis i2c_i801
rtc_lib tpm serio_raw snd_page_alloc tpm_bios i2c_core usbcore psmouse
intel_agp sg pcspkr sr_mod evdev cdrom ext3 jbd mbcache dm_mod sd_mod
ata_piix libata scsi_mod unix
Jan 18 15:49:29 lithui kernel:
Pid: 3605, comm: Xorg Not tainted 2.6.36.2 #5 P5KPL-CM/System Product
Name
RIP: 0010:[<0000000000000000>]  [<(null)>] (null)
RSP: 0018:ffff8801150d1d40  EFLAGS: 00010202
RAX: 000000000001ffff RBX: ffff88011a011b00 RCX: 000000000001a704
RDX: ffff880118566028 RSI: ffff880118566028 RDI: ffff880117876800
RBP: ffff8801150d1d48 R08: ffff8801195fe300 R09: 00000000c0086444
R10: 0000000000000001 R11: 0000000000003206 R12: ffff880117876800
R13: ffff880118566000 R14: ffff880117876820 R15: ffff8801150d1df8
FS:  00007f1038d456e0(0000) GS:ffff880001780000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001187e7000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process Xorg (pid: 3605, threadinfo ffff8801150d0000, task
ffff88011b016e40)
Stack:
ffffffffa043b8e6 ffff8801150d1d98 ffffffffa041768b dead000000000000
<0> 0000000000000048 00007f1023f2a000 0000000000000044 0000000000000008
<0> ffff88010d26bd80 ffff880117876800 ffff8801150d1df8 ffff8801150d1ea8
Call Trace:
[<ffffffffa043b8e6>] ? intel_ring_advance+0x16/0x20 [i915]
[<ffffffffa041768b>] i915_irq_emit+0x15b/0x240 [i915]
[<ffffffffa03ea7b1>] drm_ioctl+0x1f1/0x460 [drm]
[<ffffffffa0417530>] ? i915_irq_emit+0x0/0x240 [i915]
[<ffffffff810dd8f1>] ? do_sync_read+0xd1/0x120
[<ffffffff81025b1f>] ? do_page_fault+0x1df/0x3d0
[<ffffffff810ed5c7>] do_vfs_ioctl+0x97/0x550
[<ffffffff8115c2ea>] ? security_file_permission+0x7a/0x90
[<ffffffff810edb19>] sys_ioctl+0x99/0xa0
[<ffffffff810024ab>] system_call_fastpath+0x16/0x1b
Code:  Bad RIP value.
RIP  [<(null)>] (null)
RSP <ffff8801150d1d40>
CR2: 0000000000000000

Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Herbert Xu <herbert@gondor.apana.org.au>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29153
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23172
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
2011-01-20 11:20:53 +00:00
Chris Wilson
311bd68e02 drm/i915: Trivial sparse fixes
Move code around and invoke iomem annotation in a few more places in
order to silence sparse. Still a few more iomem annotations to go...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-19 12:39:38 +00:00
Chris Wilson
0dc79fb2a3 drm/i915: Make the ring IMR handling private
As the IMR for the USER interrupts are not modified elsewhere, we can
separate the spinlock used for these from that of hpd and pipestats.
Those two IMR are manipulated under an IRQ and so need heavier locking.

Reported-and-tested-by: Alexey Fisher <bug-track@fisher-privat.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:58 +00:00
Chris Wilson
01a03331e5 drm/i915/ringbuffer: Simplify the ring irq refcounting
... and move it under the spinlock to gain the appropriate memory
barriers.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32752
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:57 +00:00
Chris Wilson
9862e600ce drm/i915/debugfs: Show the per-ring IMR
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:56 +00:00
Chris Wilson
0f46832fab drm/i915: Mask USER interrupts on gen6 (until required)
Otherwise we may consume 20% of the CPU just handling IRQs whilst
rendering. Ouch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:56 +00:00
Chris Wilson
b72f3acb71 drm/i915: Handle ringbuffer stalls when flushing
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:43:55 +00:00
Chris Wilson
55249baaa5 drm/i915: Workaround erratum on i830 for TAIL pointer within last 2 cachelines
On i830 if the tail pointer is set to within 2 cachelines of the end of
the buffer, the chip may hang. So instead, if the tail were to land in
that location, we pad the end of the buffer with NOPs and start again
at the beginning.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-11 20:35:41 +00:00
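
A stand-alone sketch of the workaround described above: if an emit would
leave the tail within the last two cachelines, pad to the end with MI_NOOP
and wrap. Sizes and types are illustrative, not the driver code:

    #include <stdint.h>
    #include <stdio.h>

    #define CACHELINE 64
    #define MI_NOOP   0u

    struct ring {
        uint32_t buf[1024];       /* 4096-byte ring */
        uint32_t size;            /* in bytes */
        uint32_t tail;            /* in bytes */
    };

    /*
     * If an emit of 'bytes' would leave the tail within the last two
     * cachelines of the buffer, pad the remainder with MI_NOOP and wrap
     * to the start instead.
     */
    static void ring_prepare(struct ring *ring, uint32_t bytes)
    {
        uint32_t danger = ring->size - 2 * CACHELINE;

        if (ring->tail + bytes > danger) {
            while (ring->tail < ring->size) {
                ring->buf[ring->tail / 4] = MI_NOOP;
                ring->tail += 4;
            }
            ring->tail = 0;
        }
    }

    int main(void)
    {
        struct ring ring = { .size = 4096, .tail = 4000 };

        ring_prepare(&ring, 64);  /* would land in the last 2 cachelines */
        printf("tail after prepare: %u\n", (unsigned)ring.tail);  /* wrapped */
        return 0;
    }
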
Chris Wilson
b13c2b96bf drm/i915/ringbuffer: Make IRQ refcnting atomic
In order to enforce the correct memory barriers for irq get/put, we need
to perform the actual counting using atomic operations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-14 11:34:46 +00:00
Chris Wilson
8d5203ca62 Merge branch 'drm-intel-fixes' into drm-intel-next 2010-12-09 20:22:04 +00:00
Chris Wilson
8c0a6bfef1 drm/i915/ringbuffer: Handle wrapping of the autoreported HEAD
If the tail advances beyond the autoreport HEAD value, then we need to
fall back to an uncached read of the HEAD register in order to ascertain
the correct amount of remaining space in the ringbuffer.

Reported-by: Fang, Xun <xunx.fang@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32259
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-09 12:53:19 +00:00
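
A stand-alone sketch of the fallback described above: use the cached
autoreported head first, and only re-read the head when that reports
insufficient space (a plain variable stands in for the uncached register
read; all names and numbers are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    /* The "status page" holds the autoreported head, updated only
     * occasionally; reg_head is what an uncached read would return. */
    static uint32_t status_page_head = 0;
    static uint32_t reg_head = 2048;

    struct ring {
        uint32_t head;
        uint32_t tail;
        uint32_t size;
    };

    static int space(const struct ring *ring)
    {
        int s = ring->head - (ring->tail + 8);

        if (s < 0)
            s += ring->size;
        return s;
    }

    static int wait_for_space(struct ring *ring, int bytes)
    {
        ring->head = status_page_head;        /* cheap, possibly stale   */
        if (space(ring) >= bytes)
            return 0;

        ring->head = reg_head;                /* uncached MMIO fallback  */
        return space(ring) >= bytes ? 0 : -1; /* real code would poll    */
    }

    int main(void)
    {
        struct ring ring = { .head = 0, .tail = 3072, .size = 4096 };

        printf("wait_for_space: %d (head now %u)\n",
               wait_for_space(&ring, 2000), (unsigned)ring.head);
        return 0;
    }
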
Chris Wilson
1ec14ad313 drm/i915: Implement GPU semaphores for inter-ring synchronisation on SNB
The bulk of the change is to convert the growing list of rings into an
array so that the relationship between the rings and the semaphore sync
registers can be easily computed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-12-05 00:37:38 +00:00
Chris Wilson
c4e7a41467 drm/i915/ringbuffer: Handle cliprects in the caller
This makes the various rings more consistent by removing the anomalous
handling of the rendering ring execbuffer dispatch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-30 14:17:51 +00:00
Chris Wilson
05394f3975 drm/i915: Use drm_i915_gem_object as the preferred type
A glorified s/obj_priv/obj/ with a net reduction of over 100 lines and
many characters!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-23 20:19:10 +00:00
Zou Nan hai
cae5852dca drm/i915/ringbuffer: set FORCE_WAKE bit before reading ring register
Before reading ring register, set FORCE_WAKE bit to prevent GT core
power down to low power state, otherwise we may read stale values.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
[ickle: added a udelay which seemed to do the trick on my SNB]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-11 17:45:54 +00:00
Chris Wilson
5d97eb69bd drm/i915: Only add the lazy request if we end up waiting for it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-10 20:41:16 +00:00
Chris Wilson
5588978882 drm/i915: SNB BLT workaround
On some steppings of the SNB cpu, the first command to be parsed in the
BLT command streamer should be MI_BATCHBUFFER_START, otherwise the GPU
may hang.

(cherry picked from commit 8d19215be8)

Conflicts:

	drivers/gpu/drm/i915/intel_ringbuffer.c
	drivers/gpu/drm/i915/intel_ringbuffer.h

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-11-02 10:48:48 +00:00