Commit Graph

279 Commits

Author SHA1 Message Date
Chris Wilson
688e6c7258 drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.

Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.

To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:58:43 +01:00
Tvrtko Ursulin
1b9e665064 drm/i915: Compact Gen8 semaphore initialization
Replace the macro initializer with a programatic loop which
results in smaller code and hopefully just as clear.

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-30 17:20:45 +01:00
Chris Wilson
e2efd13007 drm/i915: Rename struct intel_context
Our goal is to rename the anonymous per-engine struct beneath the
current intel_context. However, after a lively debate resolving around
the confusion between intel_context_engine and intel_engine_context, the
realisation is that the two structs target different users. The outer
struct is API / user facing, and so carries the higher level GEM
information. The inner struct is hw facing. Thus we want to name the
inner struct intel_context and the outer one i915_gem_context. As the
first step, we need to rename the current struct:

	s/struct intel_context/struct i915_gem_context/

which fits much better with its constructors already conveying the
i915_gem_context prefix!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-1-git-send-email-chris@chris-wilson.co.uk
2016-05-24 15:27:14 +01:00
Chris Wilson
c033666a94 drm/i915: Store a i915 backpointer from engine, and use it
text	   data	    bss	    dec	    hex	filename
6309351	3578714	 696320	10584385	 a18141	vmlinux
6308391	3578714	 696320	10583425	 a17d81	vmlinux

Almost 1KiB of code reduction.

v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions

   text	   data	    bss	    dec	    hex	filename
6304579	3578778	 696320	10579677	 a16edd	vmlinux
6303427	3578778	 696320	10578525	 a16a5d	vmlinux

Now over 1KiB!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
2016-05-09 13:41:24 +01:00
Chris Wilson
215a7e3210 drm/i915: Fix gen8 semaphores id for legacy mode
With the introduction of a distinct engine->id vs the hardware id, we need
to fix up the value we use for selecting the target engine when signaling
a semaphore. Note that these values can be merged with engine->guc_id.

Fixes: de1add3605
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461932305-14637-3-git-send-email-chris@chris-wilson.co.uk
2016-04-29 13:59:26 +01:00
Chris Wilson
a58c01aa6e drm/i915: Apply strongly ordered RCS breadcrumb to gen8/legacy
For legacy ringbuffer mode, we need the new ordered breadcrumb emission
tried and tested on execlists in order to avoid the dreaded "missed
interrupt" syndrome. A secondary advantage of the execlists method is
that it writes to an arbitrary address, useful if one wants to write a
breadcrumb elsewhere.

This fix is taken from commit 7c17d37737 (drm/i915: Use ordered seqno
write interrupt generation on gen8+ execlists).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461932305-14637-1-git-send-email-chris@chris-wilson.co.uk
2016-04-29 13:58:26 +01:00
Chris Wilson
596e5efc69 drm/i915: Bump reserved size for legacy gen8 semaphore emission
With 5 rings and a flush, we need 192 bytes of space to emit the
breadcrumb and semaphores. However, we need some spare room the size of
the single largest packet (36 dwords, 144 bytes) to accommodate
wraparound giving a grand total of 336 bytes

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461917226-9132-1-git-send-email-chris@chris-wilson.co.uk
2016-04-29 10:20:51 +01:00
Tvrtko Ursulin
e39d42fa7e drm/i915: Stop tracking execlists retired requests
With the previous patch having extended the pinned lifetime of
contexts by referencing the previous context from the current
request until the latter is retired (completed by the GPU),
we can now remove usage of execlist retired queue entirely.

This is because the above now guarantees that all execlist
object access requirements are satisfied by this new tracking,
and we can stop taking additional references and stop keeping
request on the execlists retired queue.

The latter was a source of significant scalability issues in
the driver causing performance hits on some tests. Most
dramatical of which was igt/gem_close_race which had run time
in tens of minutes which is now reduced to tens of seconds.

Signed-off-by: Tvrtko Ursulin <tvrtko@ursulin.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-24-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson
0251a96322 drm/i915: Remove the identical implementations of request space reservation
Now that we share intel_ring_begin(), reserving space for the tail of
the request is identical between legacy/execlists and so the tautology
can be removed. In the process, we move the reserved space tracking
from the ringbuffer on to the request. This is to enable us to reorder
the reserved space allocation in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-13-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Chris Wilson
987046ad65 drm/i915: Unify intel_ring_begin()
Combine the near identical implementations of intel_logical_ring_begin()
and intel_ring_begin() - the only difference is that the logical wait
has to check for a matching ring (which is assumed by legacy).

In the process some debug messages are culled as there were following a
WARN if we hit an actual error.

v2: Updated commentary

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-12-git-send-email-chris@chris-wilson.co.uk
2016-04-28 12:17:32 +01:00
Tvrtko Ursulin
3756685a18 drm/i915: Only grab correct forcewake for the engine with execlists
Rather than blindly waking up all forcewake domains on command
submission, we can teach each engine what is (or are) the correct
one to take.

On platforms with multiple forcewake domains like VLV, CHV, SKL
and BXT, this has the potential of lowering the GPU and CPU
power use and submission latency.

To implement it we add a function named
intel_uncore_forcewake_for_reg whose purpose is to query which
forcewake domains need to be taken to read or write a specific
register with raw mmio accessors.

These enables the execlists engine setup  to query which
forcewake domains are relevant per engine on the currently
running platform.

v2:
  * Kerneldoc.
  * Split from intel_uncore.c macro extraction, WARN_ON,
    no warns on old platforms. (Chris Wilson)

v3:
  * Single domain per engine, mention all registers,
    bi-directional function and a new name, fix handling
    of gen6 and gen7 writes. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460468251-14069-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-04-12 15:35:22 +01:00
Chris Wilson
5dd8e50c27 drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor
When reading from the HWS page, we use barrier() to prevent the compiler
optimising away the read from the volatile (may be updated by the GPU)
memory address. This is more suited to READ_ONCE(); make it so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-5-git-send-email-chris@chris-wilson.co.uk
2016-04-09 12:09:59 +01:00
Chris Wilson
0d317ce99e drm/i915: Use simplest form for flushing the single cacheline in the HWS
Rather than call a function to compute the matching cachelines and
clflush them, just call the clflush *instruction* directly. We also know
that we can use the unpatched plain clflush rather than the clflushopt
alternative.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-4-git-send-email-chris@chris-wilson.co.uk
2016-04-09 12:09:45 +01:00
Chris Wilson
12471ba87a drm/i915: Harden detection of missed interrupts
Only declare a missed interrupt if we find that the GPU is idle with
waiters and a hangcheck interval has passed in which no new user
interrupts have been raised.

v2: Clear the stuck interrupt marker between successful batches

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-3-git-send-email-chris@chris-wilson.co.uk
2016-04-09 12:09:29 +01:00
Chris Wilson
c04e0f3b4e drm/i915: Separate out the seqno-barrier from engine->get_seqno
In order to simplify future patches, extract the
lazy_coherency optimisation our of the engine->get_seqno() vfunc into
its own callback.

v2: Rename the barrier to engine->irq_seqno_barrier to try and better
reflect that the barrier is only required after the user interrupt before
reading the seqno (to ensure that the seqno update lands in time as we
do not have strict seqno-irq ordering on all platforms).

Reviewed-by: Dave Gordon <david.s.gordon@intel.com> [#v2]

v3: Comments for hangcheck paranoia. Mika wanted to keep the extra
barrier inside the hangcheck, just in case. I can argue that it doesn't
provide a barrier against anything, but the side-effects of applying the
barrier may prevent a false declaration of a hung GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-2-git-send-email-chris@chris-wilson.co.uk
2016-04-09 12:09:05 +01:00
Chris Wilson
8c12672ee2 drm/i915: Refactor gen8 semaphore offset calculation
We reuse the same calculation into two macros, and I want to add a third
user. Time to refactor.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-5-git-send-email-chris@chris-wilson.co.uk
2016-04-08 11:43:48 +01:00
Tvrtko Ursulin
27af5eea54 drm/i915: Move execlists irq handler to a bottom half
Doing a lot of work in the interrupt handler introduces huge
latencies to the system as a whole.

Most dramatic effect can be seen by running an all engine
stress test like igt/gem_exec_nop/all where, when the kernel
config is lean enough, the whole system can be brought into
multi-second periods of complete non-interactivty. That can
look for example like this:

 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
 Modules linked in: [redacted for brevity]
 CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
 Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
 Workqueue: i915 gen6_pm_rps_work [i915]
 task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
 RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
 RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
 RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
 RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
 RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
 R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
 R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
 FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Stack:
  042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
  0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
  0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
 Call Trace:
  <IRQ>
  [<ffffffff8104a716>] irq_exit+0x86/0x90
  [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
  [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
  <EOI>
  [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
  [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
  [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
  [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
  [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
  [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
  [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
  [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
  [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
  [<ffffffff8105ab29>] process_one_work+0x139/0x350
  [<ffffffff8105b186>] worker_thread+0x126/0x490
  [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
  [<ffffffff8105fa64>] kthread+0xc4/0xe0
  [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
  [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
  [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170

I could not explain, or find a code path, which would explain
a +20 second lockup, but from some instrumentation it was
apparent the interrupts off proportion of time was between
10-25% under heavy load which is quite bad.

When a interrupt "cliff" is reached, which was >~320k irq/s on
my machine, the whole system goes into a terrible state of the
above described multi-second lockups.

By moving the GT interrupt handling to a tasklet in a most
simple way, the problem above disappears completely.

Testing the effect on sytem-wide latencies using
igt/gem_syslatency shows the following before this patch:

gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us

This shows that the unrelated process is experiencing huge
delays in its wake-up latency. After the patch the results
look like this:

gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
gem_syslatency: cycles=841683, latency mean=56.914us max=1667us

Showing a huge improvement in the unrelated process wake-up
latency. It also shows an approximate halving in the number
of total empty batches submitted during the test. This may
not be worrying since the test puts the driver under
a very unrealistic load with ncpu threads doing empty batch
submission to all GPU engines each.

Another benefit compared to the hard-irq handling is that now
work on all engines can be dispatched in parallel since we can
have up to number of CPUs active tasklets. (While previously
a single hard-irq would serially dispatch on one engine after
another.)

More interesting scenario with regards to throughput is
"gem_latency -n 100" which  shows 25% better throughput and
CPU usage, and 14% better dispatch latencies.

I did not find any gains or regressions with Synmark2 or
GLbench under light testing. More benchmarking is certainly
required.

v2:
   * execlists_lock should be taken as spin_lock_bh when
     queuing work from userspace now. (Chris Wilson)
   * uncore.lock must be taken with spin_lock_irq when
     submitting requests since that now runs from either
     softirq or process context.

v3:
   * Expanded commit message with more testing data;
   * converted missed locking sites to _bh;
   * added execlist_lock comment. (Chris Wilson)

v4:
   * Mention dispatch parallelism in commit. (Chris Wilson)
   * Do not hold uncore.lock over MMIO reads since the block
     is already serialised per-engine via the tasklet itself.
     (Chris Wilson)
   * intel_lrc_irq_handler should be static. (Chris Wilson)
   * Cancel/sync the tasklet on GPU reset. (Chris Wilson)
   * Document and WARN that tasklet cannot be active/pending
     on engine cleanup. (Chris Wilson/Imre Deak)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Testcase: igt/gem_exec_nop/all
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-04-04 14:08:52 +01:00
Jordan Justen
361b027bc6 drm/i915: Use an array of register tables in command parser
For Haswell, we will want another table of registers while retaining
the large common table of whitelisted registers shared by all gen7
devices.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
[danvet: Pipe patch through sed -e 's/\<ring\>/engine/g' to make it
apply.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-03-21 10:02:01 +01:00
Tvrtko Ursulin
117897f42c drm/i915: More renaming of rings to engines
This time using only sed and a few by hand.

v2: Rename also intel_ring_id and intel_ring_initialized.
v3: Fixed typo in intel_ring_initialized.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458126040-33105-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-03-16 15:33:30 +00:00
Tvrtko Ursulin
666796da7a drm/i915: More intel_engine_cs renaming
Some trivial ones, first pass done with Coccinelle:

@@
@@
(
- I915_NUM_RINGS
+ I915_NUM_ENGINES
|
- intel_ring_flag
+ intel_engine_flag
|
- for_each_ring
+ for_each_engine
|
- i915_gem_request_get_ring
+ i915_gem_request_get_engine
|
- intel_ring_idle
+ intel_engine_idle
|
- i915_gem_reset_ring_status
+ i915_gem_reset_engine_status
|
- i915_gem_reset_ring_cleanup
+ i915_gem_reset_engine_cleanup
|
- init_ring_lists
+ init_engine_lists
)

But that didn't fully work so I cleaned it up with:

for f in *.[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done
for f in *.[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:24 +00:00
Tvrtko Ursulin
4a570db57c drm/i915: Rename intel_engine_cs struct members
below and a couple manual fixups.

@@
identifier I, J;
@@
struct I {
...
- struct intel_engine_cs *J;
+ struct intel_engine_cs *engine;
...
}
@@
identifier I, J;
@@
struct I {
...
- struct intel_engine_cs J;
+ struct intel_engine_cs engine;
...
}
@@
struct drm_i915_private *d;
@@
(
- d->ring
+ d->engine
)
@@
struct i915_execbuffer_params *p;
@@
(
- p->ring
+ p->engine
)
@@
struct intel_ringbuffer *r;
@@
(
- r->ring
+ r->engine
)
@@
struct drm_i915_gem_request *req;
@@
(
- req->ring
+ req->engine
)

v2: Script missed the tracepoint code - fixed up by hand.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:17 +00:00
Tvrtko Ursulin
0bc40be85f drm/i915: Rename intel_engine_cs function parameters
@@
identifier func;
@@
func(..., struct intel_engine_cs *
- ring
+ engine
, ...)
{
<...
- ring
+ engine
...>
}
@@
identifier func;
type T;
@@
T func(..., struct intel_engine_cs *
- ring
+ engine
, ...);

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:10 +00:00
Tvrtko Ursulin
e2f8039147 drm/i915: Rename local struct intel_engine_cs variables
Done by the Coccinelle script below plus a manual
intervention to GEN8_RING_SEMAPHORE_INIT.

@@
expression E;
@@
- struct intel_engine_cs *ring = E;
+ struct intel_engine_cs *engine = E;
<+...
- ring
+ engine
...+>
@@
@@
- struct intel_engine_cs *ring;
+ struct intel_engine_cs *engine;
<+...
- ring
+ engine
...+>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-03-16 15:33:00 +00:00
Mika Kuoppala
24a65e624b drm/i915/hangcheck: Prevent long walks across full-ppgtt
With full-ppgtt, it takes the GPU an eon to traverse the entire 256PiB
address space, causing a loop to be detected. Under the current scheme,
if ACTHD walks off the end of a batch buffer and into an empty
address space, we "never" detect the hang. If we always increment the
score as the ACTHD is progressing then we will eventually timeout (after
~46.5s (31 * 1.5s) without advancing onto a new batch). To counter act
this, increase the amount we reduce the score for good batches, so that
only a series of almost-bad batches trigger a full reset. DoS detection
suffers slightly but series of long running shader tests will benefit.

Based on a patch from Chris Wilson.

Testcase: igt/drv_hangman/hangcheck-unterminated
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456930109-21532-1-git-send-email-mika.kuoppala@intel.com
2016-03-04 15:17:14 +02:00
Tvrtko Ursulin
c6a2ac712d drm/i915: Execlists small cleanups and micro-optimisations
Assorted changes in the areas of code cleanup, reduction of
invariant conditional in the interrupt handler and lock
contention and MMIO access optimisation.

 * Remove needless initialization.
 * Improve cache locality by reorganizing code and/or using
   branch hints to keep unexpected or error conditions out
   of line.
 * Favor busy submit path vs. empty queue.
 * Less branching in hot-paths.

v2:

 * Avoid mmio reads when possible. (Chris Wilson)
 * Use natural integer size for csb indices.
 * Remove useless return value from execlists_update_context.
 * Extract 32-bit ppgtt PDPs update so it is out of line and
   shared with two callers.
 * Grab forcewake across all mmio operations to ease the
   load on uncore lock and use chepear mmio ops.

v3:

 * Removed some more pointless u8 data types.
 * Removed unused return from execlists_context_queue.
 * Commit message updates.

v4:
 * Unclumsify the unqueue if statement. (Chris Wilson)
 * Hide forcewake from the queuing function. (Chris Wilson)

Version 3 now makes the irq handling code path ~20% smaller on
48-bit PPGTT hardware, and a little bit less elsewhere. Hot
paths are mostly in-line now and hammering on the uncore
spinlock is greatly reduced together with mmio traffic to an
extent.

Benchmarking with "gem_latency -n 100" (keep submitting
batches with 100 nop instruction) shows approximately 4% higher
throughput, 2% less CPU time and 22% smaller latencies. This was
on a big-core while small-cores could benefit even more.

Most likely reason for the improvements are the MMIO
optimization and uncore lock traffic reduction.

One odd result is with "gem_latency -n 0" (dispatching empty
batches) which shows 5% more throughput, 8% less CPU time,
25% better producer and consumer latencies, but 15% higher
dispatch latency which is yet unexplained.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456505912-22286-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-03-01 10:36:02 +00:00
Alex Dai
397097b026 drm/i915/guc: Decouple GuC engine id from ring id
Previously GuC uses ring id as engine id because of same definition.
But this is not true since this commit:

commit de1add3605
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Fri Jan 15 15:12:50 2016 +0000

    drm/i915: Decouple execbuf uAPI from internal implementation

Added GuC engine id into GuC interface to decouple it from ring id used
by driver.

v2: Keep ring name print out in debugfs; using for_each_ring() where
    possible to keep driver consistent. (Chris W.)

Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453579094-29860-1-git-send-email-yu.dai@intel.com
2016-01-25 10:56:30 +00:00
Chris Wilson
426960bed3 drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids
Tvrtko was looking through the execbuffer-ioctl and noticed that the
uABI was tightly coupled to our internal engine identifiers. Close
inspection also revealed that we leak those internal engine identifiers
through the busy-ioctl, and those internal identifiers already do not
match the user identifiers. Fortuitiously, there is only one user of the
set of busy rings from the busy-ioctl, and they only wish to choose
between the RENDER and the BLT engines.

Let's fix the userspace ABI while we still can.

v2: Update the uAPI documentation to explain the identifiers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_busy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452876706-21620-1-git-send-email-chris@chris-wilson.co.uk
2016-01-21 11:00:35 +00:00
Tvrtko Ursulin
de1add3605 drm/i915: Decouple execbuf uAPI from internal implementation
At the moment execbuf ring selection is fully coupled to
internal ring ids which is not a good thing on its own.

This dependency is also spread between two source files and
not spelled out at either side which makes it hidden and
fragile.

This patch decouples this dependency by introducing an explicit
translation table of execbuf uAPI to ring id close to the only
call site (i915_gem_do_execbuffer).

This way we are free to change driver internal implementation
details without breaking userspace. All state relating to the
uAPI is now contained in, or next to, i915_gem_do_execbuffer.

As a side benefit, this patch decreases the compiled size
of i915_gem_do_execbuffer.

v2: Extract ring selection into eb_select_ring. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870770-13981-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-01-21 10:55:44 +00:00
Chris Wilson
7c17d37737 drm/i915: Use ordered seqno write interrupt generation on gen8+ execlists
Broadwell and later currently use the same unordered command sequence to
update the seqno in the HWS status page and then assert the user
interrupt. We should apply the w/a from legacy (where we do an mmio
read to delay the seqno read after the interrupt), but this is not
enough to enforce coherent seqno visibilty on Skylake. Rather than
search for the proper post-interrupt seqno barrier, use a strongly
ordered command sequence to write the seqno, then assert the user
interrupt from the ring.

v2: Move around the wa tail dwords to avoid adding duplicate code.

v3: Add references, comments on workarounds and bit5 check.

References: https://bugs.freedesktop.org/show_bug.cgi?id=93693
Testcase: igt/gem_ring_sync_loop #skl
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453297415-17793-1-git-send-email-mika.kuoppala@intel.com
2016-01-21 11:53:09 +02:00
Dave Gordon
ed54c1a1d1 drm/i915: abolish separate per-ring default_context pointers
Now that we've eliminated a lot of uses of ring->default_context,
we can eliminate the pointer itself.

All the engines share the same default intel_context, so we can just
keep a single reference to it in the dev_priv structure rather than one
in each of the engine[] elements. This make refcounting more sensible
too, as we now have a refcount of one for the one pointer, rather than
a refcount of one but multiple pointers.

From an idea by Chris Wilson.

v2:	transform an extra instance of ring->default_context introduced by
    42f1cae8c drm/i915: Restore inhibiting the load of the default context
    That patch's commentary includes:
	v2: Mark the global default context as uninitialized on GPU reset so
	    that the context-local workarounds are reloaded upon re-enabling
    The code implementing that now also benefits from the replacement of
    the multiple (per-ring) pointers to the default context with a single
    pointer to the unique kernel context.

v4:	Rebased, remove underused local (Nick Hoath)

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-3-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-01-21 09:21:29 +01:00
Jani Nikula
e282891472 drm/i915: turn some bogus kernel-doc comments to normal comments
Apparently accidental or misplaced /** kernel-doc comments, confusing
the tool. Turn them to normal comments.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453101588-18008-2-git-send-email-jani.nikula@intel.com
2016-01-20 10:22:08 +02:00
Tvrtko Ursulin
0eb973d31d drm/i915: Cache ringbuffer GTT VMA
Purpose is to avoid calling i915_gem_obj_ggtt_offset from the
interrupt context without the big lock held.

v2: Renamed gtt_start to gtt_offset. (Daniel Vetter)
v3: Cache the VMA instead of address. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-2-git-send-email-tvrtko.ursulin@linux.intel.com
2016-01-18 09:58:40 +00:00
Tvrtko Ursulin
ca82580c9c drm/i915: Do not call API requiring struct_mutex where it is not available
LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
places where the struct_mutex cannot be grabbed (irq handlers).

To avoid that this patch caches some interesting bits and values
in the engine and context structures.

Some usages are also removed where they are not needed like a
few asserts which are either impossible or have been checked
already during engine initialization.

Side benefit is also that interrupt handlers and command
submission stop evaluating invariant conditionals, like what
Gen we are running on, on every interrupt and every command
submitted.

This patch deals with logical ring context id and descriptors
while subsequent patches will deal with the remaining issues.

v2:
 * Cache the VMA instead of the address. (Chris Wilson)
 * Incorporate Dave Gordon's good comments and function name.

v3:
 * Extract ctx descriptor template to a function and group
   functions dealing with ctx descriptor & co together near
   top of the file. (Dave Gordon)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-01-18 09:58:36 +00:00
Mika Kuoppala
61642ff035 drm/i915: Inspect subunit states on hangcheck
If head seems stuck and engine in question is rcs,
inspect subunit state transitions from undone to done,
before deciding that this really is a hang instead of limited
progress. Only account the transitions of subunits from
undone to done once, to prevent unstable subunit states
to keep us falsely active.

As this adds one extra steps to hangcheck heuristics,
before hang is declared, it adds 1500ms to to detect hang
for render ring to a total of 7500ms. We could sample
the subunit states on first head stuck condition but
decide not to do so only in order to mimic old behaviour. This
way the check order of promotion from seqno > atchd > instdone
is consistently done.

v2: Deal with unstable done states (Arun)
    Clear instdone progress on head and seqno movement (Chris)
    Report raw and accumulated instdone's in in debugfs (Chris)
    Return HANGCHECK_ACTIVE on undone->done

References: https://bugs.freedesktop.org/show_bug.cgi?id=93029
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1448985372-19535-1-git-send-email-mika.kuoppala@intel.com
2016-01-08 13:06:04 +02:00
Dave Gordon
b0366a54b4 drm/i915: intel_ring_initialized() must be simple and inline
Based on Chris Wilson's patch from 6 months ago, rebased and adapted.

The current implementation of intel_ring_initialized() is too heavyweight;
it's a non-inlined function that chases several levels of pointers. This
wouldn't matter too much if it were rarely called, but it's used inside
the iterator test of for_each_ring() and is therefore called quite
frequently. So let's make it simple and inline ...

The idea here is to use ring->dev as an indicator showing which engines
have been initialised and are therefore to be included in iterations that
use for_each_ring(). This allows us to avoid multiple memory references
and a (non-inlined) function call on each iteration of each such loop.

	Fixes regression from
	commit 48d823878d
	Author: Oscar Mateo <oscar.mateo@intel.com>
	Date:   Thu Jul 24 17:04:23 2014 +0100

	    drm/i915/bdw: Generic logical ring init and cleanup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449586956-32360-2-git-send-email-david.s.gordon@intel.com
2015-12-10 14:14:36 +01:00
Ville Syrjälä
f0f59a00a1 drm/i915: Type safe register read/write
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.

This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.

The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.

As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
  lea    0x70024(%rdx,%rax,1),%r9d
  mov    $0x1,%edx
- movslq %r9d,%r9
- mov    %r9,%rsi
- mov    %r9,-0x58(%rbp)
- callq  *0xd8(%rbx)
+ mov    %r9d,%esi
+ mov    %r9d,-0x48(%rbp)
 callq  *0xd8(%rbx)

So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.

v2: i915_mmio_reg_{offset,equal,valid}() helpers added
    s/_REG/_MMIO/ in the register defines
    mo more switch statements left to worry about
    ring_emit stuff got sorted in a prep patch
    cmd parser, lrc context and w/a batch buildup also in prep patch
    vgpu stuff cleaned up and moved to a prep patch
    all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
2015-11-18 15:39:11 +02:00
Ville Syrjälä
f92a916220 drm/i915: Add functions to emit register offsets to the ring
When register type safety happens, we can't just try to emit the
register itself to the ring. Instead we'll need to extract the
offset from it first. Add some convenience functions that will do
that.

v2: Convert MOCS setup too

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446672017-24497-20-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-11-18 14:35:24 +02:00
Chris Wilson
608c1a526c drm/i915: Recover all available ringbuffer space following reset
Having flushed all requests from all queues, we know that all
ringbuffers must now be empty. However, since we do not reclaim
all space when retiring the request (to prevent HEADs colliding
with rapid ringbuffer wraparound) the amount of available space
on each ringbuffer upon reset is less than when we start. Do one
more pass over all the ringbuffers to reset the available space

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
2015-10-28 17:10:31 +00:00
Chris Wilson
01101fa7cc drm/i915: Refactor common ringbuffer allocation code
A small, very small, step to sharing the duplicate code between
execlists and legacy submission engines, starting with the ringbuffer
allocation code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-09-04 10:17:00 +02:00
Imre Deak
319404df2f drm/i915/bxt: work around HW coherency issue when accessing GPU seqno
By running igt/store_dword_loop_render on BXT we can hit a coherency
problem where the seqno written at GPU command completion time is not
seen by the CPU. This results in __i915_wait_request seeing the stale
seqno and not completing the request (not considering the lost
interrupt/GPU reset mechanism). I also verified that this isn't a case
of a lost interrupt, or that the command didn't complete somehow: when
the coherency issue occured I read the seqno via an uncached GTT mapping
too. While the cached version of the seqno still showed the stale value
the one read via the uncached mapping was the correct one.

Work around this issue by clflushing the corresponding CPU cacheline
following any store of the seqno and preceding any reading of it. When
reading it do this only when the caller expects a coherent view.

v2:
- fix using the proper logical && instead of a bitwise & (Jani, Mika)
- limit the workaround to A stepping, on later steppings this HW issue
  is fixed
v3:
- use a separate get_seqno/set_seqno vfunc (Chris)

Testcase: igt/store_dword_loop_render
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-08-26 09:39:13 +02:00
Daniel Vetter
ca6e440577 Merge tag 'drm-intel-fixes-2015-07-15' into drm-intel-next-queued
Backmerge fixes since it's getting out of hand again with the massive
split due to atomic between -next and 4.2-rc. All the bugfixes in
4.2-rc are addressed already (by converting more towards atomic
instead of minimal duct-tape) so just always pick the version in next
for the conflicts in modeset code.

All the other conflicts are just adjacent lines changed.

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem_gtt.c
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/i915/intel_drv.h
	drivers/gpu/drm/i915/intel_ringbuffer.h

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-07-15 16:36:50 +02:00
Tomas Elf
94f7bbe150 drm/i915: Snapshot seqno of most recently submitted request.
The hang checker needs to inspect whether or not the ring request list is empty
as well as if the given engine has reached or passed the most recently
submitted request. The problem with this is that the hang checker cannot grab
the struct_mutex, which is required in order to safely inspect requests since
requests might be deallocated during inspection. In the past we've had kernel
panics due to this very unsynchronized access in the hang checker.

One solution to this problem is to not inspect the requests directly since
we're only interested in the seqno of the most recently submitted request - not
the request itself. Instead the seqno of the most recently submitted request is
stored separately, which the hang checker then inspects, circumventing the
issue of synchronization from the hang checker entirely.

This fixes a regression introduced in

commit 44cdd6d219
Author: John Harrison <John.C.Harrison@Intel.com>
Date:   Mon Nov 24 18:49:40 2014 +0000

    drm/i915: Convert 'ring_idle()' to use requests not seqnos

v2 (Chris Wilson):
- Pass current engine seqno to ring_idle() from i915_hangcheck_elapsed() rather
than compute it over again.
- Remove extra whitespace.

Issue: VIZ-5998
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add regressing commit citation provided by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-13 22:42:39 +02:00
Abdiel Janulgue
919032ec7c drm/i915: Enable resource streamer bits on MI_BATCH_BUFFER_START
Adds support for enabling the resource streamer on the legacy
ringbuffer for HSW and GEN8.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-06 10:25:57 +02:00
John Harrison
79bbcc299f drm/i915: Reserve space improvements
An earlier patch was added to reserve space in the ring buffer for the
commands issued during 'add_request()'. The initial version was
pessimistic in the way it handled buffer wrapping and would cause
premature wraps and thus waste ring space.

This patch updates the code to better handle the wrap case. It no
longer enforces that the space being asked for and the reserved space
are a single contiguous block. Instead, it allows the reserve to be on
the far end of a wrap operation. It still guarantees that the space is
available so when the wrap occurs, no wait will happen. Thus the wrap
cannot fail which is the whole point of the exercise.

Also fixed a merge failure with some comments from the original patch.

v2: Incorporated suggestion by David Gordon to move the wrap code
inside the prepare function and thus allow a single combined
wait_for_space() call rather than doing one before the wrap and
another after. This also makes the prepare code much simpler and
easier to follow.

v3: Fix for 'effective_size' vs 'size' during ring buffer remainder
calculations (spotted by Tomas Elf).

For: VIZ-5115
CC: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-03 07:38:59 +02:00
John Harrison
bccca494f7 drm/i915: Remove the now obsolete 'outstanding_lazy_request'
The outstanding_lazy_request is no longer used anywhere in the driver.
Everything that was looking at it now has a request explicitly passed in from on
high. Everything that was relying upon it behind the scenes is now explicitly
creating/passing/submitting its own private request. Thus the OLR can be
removed.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:32 +02:00
John Harrison
59c35a4d12 drm/i915: Remove the now obsolete intel_ring_get_request()
Much of the driver has now been converted to passing requests around instead of
rings/ringbufs/contexts. Thus the function for retreiving the request from a
ring (i.e. the OLR) is no longer used and can be removed.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:31 +02:00
John Harrison
ccd98fe499 drm/i915: Add *_ring_begin() to request allocation
Now that the *_ring_begin() functions no longer call the request allocation
code, it is finally safe for the request allocation code to call *_ring_begin().
This is important to guarantee that the space reserved for the subsequent
i915_add_request() call does actually get reserved.

v2: Renamed functions according to review feedback (Tomas Elf).

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:30 +02:00
John Harrison
5fb9de1a2e drm/i915: Update intel_ring_begin() to take a request structure
Now that everything above has been converted to use requests, intel_ring_begin()
can be updated to take a request instead of a ring. This also means that it no
longer needs to lazily allocate a request if no-one happens to have done it
earlier.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:29 +02:00
John Harrison
bba09b12b4 drm/i915: Update cacheline_align() to take a request structure
Updated intel_ring_cacheline_align() to take a request instead of a ring.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:28 +02:00
John Harrison
f71696876a drm/i915: Update ring->signal() to take a request structure
Updated the various ring->signal() implementations to take a request instead of
a ring. This removes their reliance on the OLR to obtain the seqno value that
should be used for the signal.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:27 +02:00
John Harrison
599d924c6b drm/i915: Update ring->sync_to() to take a request structure
Updated the ring->sync_to() implementations to take a request instead of a ring.
Also updated the tracer to include the request id.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
[danvet: Rebase since I didn't merge the patch which added ->uniq.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:27 +02:00
John Harrison
be795fc17b drm/i915: Update ring->emit_bb_start() to take a request structure
Updated the ring->emit_bb_start() implementation to take a request instead of a
ringbuf/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:26 +02:00
John Harrison
53fddaf70d drm/i915: Update ring->dispatch_execbuffer() to take a request structure
Updated the various ring->dispatch_execbuffer() implementations to take a
request instead of a ring.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:25 +02:00
John Harrison
c4e766389e drm/i915: Update ring->emit_request() to take a request structure
Updated the ring->emit_request() implementation to take a request instead of a
ringbuf/request pair. Also removed its use of the OLR for obtaining the
request's seqno.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:24 +02:00
John Harrison
ee044a8863 drm/i915: Update ring->add_request() to take a request structure
Updated the various ring->add_request() implementations to take a request
instead of a ring. This removes their reliance on the OLR to obtain the seqno
value that the request should be tagged with.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:24 +02:00
John Harrison
7deb4d3980 drm/i915: Update ring->emit_flush() to take a request structure
Updated the various ring->emit_flush() implementations to take a request instead
of a ringbuf/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:23 +02:00
John Harrison
a84c3ae168 drm/i915: Update ring->flush() to take a requests structure
Updated the various ring->flush() functions to take a request instead of a ring.
Also updated the tracer to include the request id.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
[danvet: Rebase since I didn't merge the addition of req->uniq.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:21 +02:00
John Harrison
4866d729ab drm/i915: Update flush_all_caches() to take request structures
Updated the *_ring_flush_all_caches() functions to take requests instead of
rings or ringbuf/context pairs.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:20 +02:00
John Harrison
2f20055d36 drm/i915: Update a bunch of execbuffer helpers to take request structures
Updated *_ring_invalidate_all_caches(), i915_reset_gen7_sol_offsets() and
i915_emit_box() to take request structures instead of ring or ringbuf/context
pairs.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:18 +02:00
John Harrison
6258fbe23f drm/i915: Update queue_flip() to take a request structure
Updated the display page flip code to do explicit request creation and
submission rather than relying on the OLR and just hoping that the request
actually gets submitted at some random point.

The sequence is now to create a request, queue the work to the ring, assign the
known request to the flip queue work item then actually submit the work and post
the request.

Note that every single flip function used to finish with
'__intel_ring_advance(ring);'. However, immediately after they return there is
now an add request call which will do the advance anyway. Thus the many
duplicate advance calls have been removed.

v2: Updated commit message with comment about advance removal.

v3: The request can now be allocated by the _sync() code earlier on. Thus the
page flip path does not necessarily need to allocate a new request, it may be
able to re-use one.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:15 +02:00
John Harrison
8753181e10 drm/i915: Update init_context() to take a request structure
Now that everything above has been converted to use requests, it is possible to
update init_context() to take a request pointer instead of a ring/context pair.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:02:12 +02:00
John Harrison
29b1b415fc drm/i915: Reserve ring buffer space for i915_add_request() commands
It is a bad idea for i915_add_request() to fail. The work will already have been
send to the ring and will be processed, but there will not be any tracking or
management of that work.

The only way the add request call can fail is if it can't write its epilogue
commands to the ring (cache flushing, seqno updates, interrupt signalling). The
reasons for that are mostly down to running out of ring buffer space and the
problems associated with trying to get some more. This patch prevents that
situation from happening in the first place.

When a request is created, it marks sufficient space as reserved for the
epilogue commands. Thus guaranteeing that by the time the epilogue is written,
there will be plenty of space for it. Note that a ring_begin() call is required
to actually reserve the space (and do any potential waiting). However, that is
not currently done at request creation time. This is because the ring_begin()
code can allocate a request. Hence calling begin() from the request allocation
code would lead to infinite recursion! Later patches in this series remove the
need for begin() to do the allocate. At that point, it becomes safe for the
allocate to call begin() and really reserve the space.

Until then, there is a potential for insufficient space to be available at the
point of calling i915_add_request(). However, that would only be in the case
where the request was created and immediately submitted without ever calling
ring_begin() and adding any work to that request. Which should never happen. And
even if it does, and if that request happens to fall down the tiny window of
opportunity for failing due to being out of ring space then does it really
matter because the request wasn't doing anything in the first place?

v2: Updated the 'reserved space too small' warning to include the offending
sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
re-initialisation of tracking state after a buffer wrap to keep the sanity
checks accurate.

v3: Incremented the reserved size to accommodate Ironlake (after finally
managing to run on an ILK system). Also fixed missing wrap code in LRC mode.

v4: Added extra comment and removed duplicate WARN (feedback from Tomas).

For: VIZ-5115
CC: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:56 +02:00
Arun Siluvery
17ee950df3 drm/i915/gen8: Add infrastructure to initialize WA batch buffers
Some of the WA are to be applied during context save but before restore and
some at the end of context save/restore but before executing the instructions
in the ring, WA batch buffers are created for this purpose and these WA cannot
be applied using normal means. Each context has two registers to load the
offsets of these batch buffers. If they are non-zero, HW understands that it
need to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested to include additional page in context and use it to load
these WA instead of creating separate objects. This will simplify lot of things
as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
is planning to use a similar setup to share data between GuC and driver and
WA batch buffers can probably share that page. However after discussions with
Dave who is implementing GuC changes, he suggested to use an independent page
for the reasons - GuC area might grow and these WA are initialized only once and
are not changed afterwards so we can share them share across all contexts.

The page is updated with WA during render ring init. This has an advantage of
not adding more special cases to default_context.

We don't know upfront the number of WA we will applying using these batch buffers.
For this reason the size was fixed earlier but it is not a good idea. To fix this,
the functions that load instructions are modified to report the no of commands
inserted and the size is now calculated after the batch is updated. A macro is
introduced to add commands to these batch buffers which also checks for overflow
and returns error.
We have a full page dedicated for these WA so that should be sufficient for
good number of WA, anything more means we have major issues.
The list for Gen8 is small, same for Gen9 also, maybe few more gets added
going forward but not close to filling entire page. Chris suggested a two-pass
approach but we agreed to go with single page setup as it is a one-off routine
and simpler code wins.

One additional option is offset field which is helpful if we would like to
have multiple batches at different offsets within the page and select them
based on some criteria. This is not a requirement at this point but could
help in future (Dave).

Chris provided some helpful macros and suggestions which further simplified
the code, they will also help in reducing code duplication when WA for
other Gen are added. Add detailed comments explaining restrictions.
Use do {} while(0) for wa_ctx_emit() macro.

(Many thanks to Chris, Dave and Thomas for their reviews and inputs)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-23 14:01:39 +02:00
Francisco Jerez
c1091b25f5 drm/i915: Extend the parser to check register writes against a mask/value pair.
In some cases it might be unnecessary or dangerous to give userspace
the right to write arbitrary values to some register, even though it
might be desirable to give it control of some of its bits.  This patch
extends the register whitelist entries to contain a mask/value pair in
addition to the register offset.  For registers with non-zero mask,
any LRM writes and LRI writes where the bits of the immediate given by
the mask don't match the specified value will be rejected.

This will be used in my next patch to grant userspace partial write
access to some sensitive registers.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-06-15 16:00:43 +03:00
Francisco Jerez
4e86f725ce drm/i915: Extend the parser to check register writes against a mask/value pair.
In some cases it might be unnecessary or dangerous to give userspace
the right to write arbitrary values to some register, even though it
might be desirable to give it control of some of its bits.  This patch
extends the register whitelist entries to contain a mask/value pair in
addition to the register offset.  For registers with non-zero mask,
any LRM writes and LRI writes where the bits of the immediate given by
the mask don't match the specified value will be rejected.

This will be used in my next patch to grant userspace partial write
access to some sensitive registers.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-06-15 12:34:50 +02:00
Chris Wilson
06fbca713e drm/i915: Split the batch pool by engine
I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.

One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.

v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by:  Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-10 08:56:04 +02:00
John Harrison
6689cb2b62 drm/i915: Move common request allocation code into a common function
The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.

This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-04-01 07:54:30 +02:00
Paulo Zanoni
dbef0f15b5 drm/i915: add frontbuffer tracking to FBC
Kill the blt/render tracking we currently have and use the frontbuffer
tracking infrastructure.

Don't enable things by default yet.

v2: (Rodrigo) Fix small conflict on rebase and typo at subject.
v3: (Paulo) Rebase on RENDER_CS change.
v4: (Paulo) Rebase.
v5: (Paulo) Simplify: flushes don't have origin (Daniel).
            Also rebase due to patch order changes.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-17 22:29:56 +01:00
John Harrison
8e004efc16 drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading
There is a flags word that is passed through the execbuffer code path all the
way from initial decoding of the user parameters down to the very final dispatch
buffer call. It is simply called 'flags'. Unfortuantely, there are many other
flags words floating around in the same blocks of code. Even more once the GPU
scheduler arrives.

This patch makes it more obvious exactly which flags word is which by renaming
'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
already have an 'I915_DISPATCH_' prefix on them and so are not quite so
ambiguous.

OTC-Jira: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Resolve conflict with Chris' rework of the bb parsing.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-25 22:43:29 +01:00
Thomas Daniel
b07da53c79 drm/i915: Shift driver's HWSP usage out of reserved range
As of Gen6, the general purpose area of the hardware status page has shrunk and
now begins at dword 0x30.  i915 driver uses dword 0x20 to store the seqno which
is now reserved.  So shift our HWSP dwords up into the general purpose range
before this bites us.

Note that all available documentation just says this is reserved
without going into details about what it's used for.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
[danvet: Add clarification from Thomas that unfortunately Bspec is
silent on what "reserverd" precisely means.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-24 11:50:32 +01:00
Damien Lespiau
af75f26918 drm/i915: Make intel_ring_setup_status_page() static
This function is only used in intel_ringbuffer.c, so restrict it to that
file. The function was moved around to avoid a forward declaration and
group it with its user.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Squash in fixup from Wu Fengguang.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-13 23:28:27 +01:00
Nick Hoath
21076372af drm/i915: Remove FIXME_lrc_ctx backpointer
The first pass implementation of execlists required a backpointer to the context to be held
in the intel_ringbuffer. However the context pointer is available higher in the call stack.
Remove the backpointer from the ring buffer structure and instead pass it down through the
call stack.

v2: Integrate this changeset with the removal of duplicate request/execlist queue item members.
v3: Rebase
v4: Rebase. Remove passing of context when the request is passed.

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:50:53 +01:00
Nick Hoath
72f95afa5f drm/i915: Removed duplicate members from submit_request
Where there were duplicate variables for the tail, context and ring (engine)
in the gem request and the execlist queue item, use the one from the request
and remove the duplicate from the execlist queue item.

Issue: VIZ-4274

v1: Rebase
v2: Fixed build issues. Keep separate postfix & tail pointers as these are
used in different ways. Reinserted missing full tail pointer update.

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-27 09:50:52 +01:00
Daniel Vetter
ecfe00d802 drm/i915: s/init()/init_hw()/ in intel_engine_cs
This is (mostly, some exceptions that need fixing) the hw setup
function which starts the ring. And not the function which allocates
all the resources.

Make this clear by giving it a better name.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:27 +01:00
Dave Gordon
ebd0fd4bef drm/i915: Consolidate ring freespace calculations
There are numerous places in the code where the driver's idea of
how much space is left in a ring is updated using the driver's
latest notions of the positions of 'head' and 'tail' for the ring.
Among them are some that update one or both of these values before
(re)doing the calculation. In particular, there are four different
places in the code where 'last_retired_head' is copied to 'head'
and then set to -1; and two of these do not have a guard to check
that it has actually been updated since last time it was consumed,
leaving the possibility that the dummy -1 can be transferred from
'last_retired_head' to 'head', causing the space calculation to
produce 'impossible' results (previously seen on Android/VLV).

This code therefore consolidates all the calculation and updating of
these values, such that there is only one place where the ring space
is updated, and it ALWAYS uses (and consumes) 'last_retired_head' if
(and ONLY if) it has been updated since the last call.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:26 +01:00
John Harrison
581c26e8a2 drm/i915: Convert 'trace_irq' to use requests rather than seqnos
Updated the trace_irq code to use requests instead of seqnos. This includes
reference counting the request object to ensure it sticks around when required.
Note that getting access to the reference counting functions means moving the
inline i915_trace_irq_get() function from intel_ringbuffer.h to i915_drv.h.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Resolve conflict due to shuffled merge order.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:24 +01:00
John Harrison
6259cead57 drm/i915: Remove 'outstanding_lazy_seqno'
The OLS value is now obsolete. Exactly the same value is guarateed to be always
available as PLR->seqno. Thus it is safe to remove the OLS completely. And also
to rename the PLR to OLR to keep the 'outstanding lazy ...' naming convention
valid.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:16 +01:00
John Harrison
97b2a6a10a drm/i915: Replace last_[rwf]_seqno with last_[rwf]_req
The object structure contains the last read, write and fenced seqno values for
use in syncrhonisation operations. These have now been replaced with their
request structure counterparts.

Note that to ensure that objects do not end up with dangling pointers, the
assignments of last_*_req include reference count updates. Thus a request cannot
be freed if an object is still hanging on to it for any reason.

v2: Corrected 'last_rendering_' to 'last_read_' in a number of comments that did
not get updated when 'last_rendering_seqno' became 'last_read|write_seqno'
several millenia ago.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:14 +01:00
John Harrison
b793a00a57 drm/i915: Add helper functions to aid seqno -> request transition
Added helper functions for retrieving the ring and seqno entries from a request
structure. This allows the internal workings of the request structure to be
hidden from code that is using these. It also allows for useful
workarounds/debug code to be added as or when necessary.

Note that it is intended that the majority (if not all) uses of the seqno
accessor will disappear eventually as code is updated to use the request
structure itself rather than working with seqno values.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:35:14 +01:00
Chris Wilson
5c6c600354 drm/i915: Remove DRI1 ring accessors and API
With the deprecation of UMS, and by association DRI1, we have a tough
choice when updating the ring access routines. We either rewrite the
DRI1 routines blindly without testing (so likely to be broken) or take
the liberty of declaring them no longer supported and remove them
entirely. This takes the latter approach.

v2: Also remove the DRI1 sarea updates

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Fix rebase conflicts.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-19 21:17:11 +01:00
Thomas Daniel
7ba717cf36 drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
Same as with the context, pinning to GGTT regardless is harmful (it
badly fragments the GGTT and can even exhaust it).

Unfortunately, this case is also more complex than the previous one
because we need to map and access the ringbuffer in several places
along the execbuffer path (and we cannot make do by leaving the
default ringbuffer pinned, as before). Also, the context object
itself contains a pointer to the ringbuffer address that we have to
keep updated if we are going to allow the ringbuffer to move around.

v2: Same as with the context pinning, we cannot really do it during
an interrupt. Also, pin the default ringbuffers objects regardless
(makes error capture a lot easier).

v3: Rebased. Take a pin reference of the ringbuffer for each item
in the execlist request queue because the hardware may still be using
the ringbuffer after the MI_USER_INTERRUPT to notify the seqno update
is executed.  The ringbuffer must remain pinned until the context save
is complete.  No longer pin and unpin ringbuffer in
populate_lr_context() - this transient address is meaningless and the
pinning can cause a sleep while atomic.

v4: Moved ringbuffer pin and unpin into the lr_context_pin functions.
Downgraded pinning check BUG_ONs to WARN_ONs.

v5: Reinstated WARN_ONs for unexpected execlist states.  Removed unused
variable.

Issue: VIZ-4277
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-19 19:56:44 +01:00
Thomas Daniel
c86ee3a9f8 drm/i915/bdw: Clean up execlist queue items in retire_work
No longer create a work item to clean each execlist queue item.
Instead, move retired execlist requests to a queue and clean up the
items during retire_requests.

v2: Fix legacy ring path broken during overzealous cleanup

v3: Update idle detection to take execlists queue into account

v4: Grab execlist lock when checking queue state

v5: Fix leaking requests by freeing in execlists_retire_requests.

Issue: VIZ-4274
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-19 19:16:45 +01:00
Michel Thierry
771b9a5324 drm/i915: Initialize workarounds in logical ring mode too
Following the legacy ring submission example, update the
ring->init_context() hook to support the execlist submission mode.

v2: update to use the new workaround macros and cleanup unused code.
This takes care of both bdw and chv workarounds.

v2.1: Add missing call to init_context() during deferred context creation.

v3: Split init_context (emit) in legacy/lrc modes. For lrc, get the ringbuf
from the context (Mika/Daniel).

v4: Merge init_context interfaces back, the legacy mode only needs the ring,
but the lrc mode needs the ring and context (Mika).

Issue: VIZ-4092
Issue: GMIN-3475
Change-Id: Ie3d093b2542ab0e2a44b90460533e2f979788d6c
Cc: Deepak S <deepak.s@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Align function paramater lists properly.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-14 10:29:25 +01:00
Arun Siluvery
86d7f23842 drm/i915/bdw: Apply workarounds in render ring init function
For BDW workarounds are currently initialized in init_clock_gating() but
they are lost during reset, suspend/resume etc; this patch moves the WAs
that are part of register state context to render ring init fn otherwise
default context ends up with incorrect values as they don't get initialized
until init_clock_gating fn.

v2: Add workarounds to golden render state
This method has its own issues, first of all this is different for
each gen and it is generated using a tool so adding new workaround
and mainitaining them across gens is not a straightforward process.

v3: Use LRIs to emit these workarounds (Ville)
Instead of modifying the golden render state the same LRIs are
emitted from within the driver.

v4: Use abstract name when exporting gen specific routines (Chris)

For: VIZ-4092
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-09-03 11:04:42 +02:00
Thomas Daniel
e981e7b17f drm/i915/bdw: Handle context switch events
Handle all context status events in the context status buffer on every
context switch interrupt. We only remove work from the execlist queue
after a context status buffer reports that it has completed and we only
attempt to schedule new contexts on interrupt when a previously submitted
context completes (unless no contexts are queued, which means the GPU is
free).

We canot call intel_runtime_pm_get() in an interrupt (or with a spinlock
grabbed, FWIW), because it might sleep, which is not a nice thing to do.
Instead, do the runtime_pm get/put together with the create/destroy request,
and handle the forcewake get/put directly.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v2: Unreferencing the context when we are freeing the request might free
the backing bo, which requires the struct_mutex to be grabbed, so defer
unreferencing and freeing to a bottom half.

v3:
- Ack the interrupt inmediately, before trying to handle it (fix for
missing interrupts by Bob Beckett <robert.beckett@intel.com>).
- Update the Context Status Buffer Read Pointer, just in case (spotted
by Damien Lespiau).

v4: New namespace and multiple rebase changes.

v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
interrupt", as suggested by Daniel.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch ...]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:43:47 +02:00
Michel Thierry
acdd884a2e drm/i915/bdw: Two-stage execlist submit process
Context switch (and execlist submission) should happen only when
other contexts are not active, otherwise pre-emption occurs.

To assure this, we place context switch requests in a queue and those
request are later consumed when the right context switch interrupt is
received (still TODO).

v2: Use a spinlock, do not remove the requests on unqueue (wait for
context switch completion).

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>

v3: Several rebases and code changes. Use unique ID.

v4:
- Move the queue/lock init to the late ring initialization.
- Damien's kmalloc review comments: check return, use sizeof(*req),
do not cast.

v5:
- Do not reuse drm_i915_gem_request. Instead, create our own.
- New namespace.

Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[davnet: Checkpatch + wash-up s/BUG_ON/WARN_ON/.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 22:10:59 +02:00
Oscar Mateo
582d67f0b1 drm/i915: Add temporary ring->ctx backpointer
The execlist patches have a bit a convoluted and long history and due
to that have the actual submission still misplaced deeply burried in
the low-level ringbuffer handling code. This design goes back to the
legacy ringbuffer code with its tricky lazy request and simple work
submissiion using ring tail writes. For that reason they need a
ring->ctx backpointer.

The goal is to unburry that code and move it up into a level where the
full execlist context is available so that we can ditch this
backpointer. Until that's done make it really obvious that there's
work still to be done.

Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Acked-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-14 18:42:59 +02:00
Oscar Mateo
1564858526 drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
Dispatch_execbuffer's evil twin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Ditch the check for aliasing ppgtt. It'll break soon and
execlists requires full ppgtt anyway.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:12:34 +02:00
Oscar Mateo
73d477f6bb drm/i915/bdw: Interrupts with logical rings
We need to attend context switch interrupts from all rings. Also, fixed writing
IMR/IER and added HWSTAM at ring init time.

Notice that, if added to irq_enable_mask, the context switch interrupts would
be incorrectly masked out when the user interrupts are due to no users waiting
on a sequence number. Therefore, this commit adds a bitmask of interrupts to
be kept unmasked at all times.

v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
anyway).

v3: Add new get/put_irq functions.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Drop the GEN8_ prefix from the context switch interrupt
define and move it to its brethren.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 23:06:58 +02:00
Oscar Mateo
4712274c36 drm/i915/bdw: GEN-specific logical ring emit flush
Same as the legacy-style ring->flush.

v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS
rings (but still consolidate the blt and bsd ring flushes into one).
This was noticed by Brad Volkin.

v3: The command for BSD and for other rings is slightly different:
get it exactly the same as in gen6_ring_flush + gen6_bsd_ring_flush

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:44:37 +02:00
Oscar Mateo
4da46e1e5b drm/i915/bdw: GEN-specific logical ring emit request
Very similar to the legacy add_request, only modified to account for
logical ringbuffer.

v2: Use MI_GLOBAL_GTT, as suggested by Brad Volkin.

v3: Unify render and non-render in the same function, as noticed by
Brad Volkin.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:42:49 +02:00
Oscar Mateo
82e104cc26 drm/i915/bdw: New logical ring submission mechanism
Well, new-ish: if all this code looks familiar, that's because it's
a clone of the existing submission mechanism (with some modifications
here and there to adapt it to LRCs and Execlists).

And why did we do this instead of reusing code, one might wonder?
Well, there are some fears that the differences are big enough that
they will end up breaking all platforms.

Also, Execlists offer several advantages, like control over when the
GPU is done with a given workload, that can help simplify the
submission mechanism, no doubt. I am interested in getting Execlists
to work first and foremost, but in the future this parallel submission
mechanism will help us to fine tune the mechanism without affecting
old gens.

v2: Pass the ringbuffer only (whenever possible).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Appease checkpatch. Again. And drop the legacy sarea gunk
that somehow crept in.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 22:42:36 +02:00
Oscar Mateo
9b1136d505 drm/i915/bdw: GEN-specific logical ring init
Logical rings do not need most of the initialization their
legacy ringbuffer counterparts do: we just need the pipe
control object for the render ring, enable Execlists on the
hardware and a few workarounds.

v2: Squash with: "drm/i915: Extract pipe control fini & make
init outside accesible".

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Make checkpatch happy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 17:03:28 +02:00
Oscar Mateo
48d823878d drm/i915/bdw: Generic logical ring init and cleanup
Allocate and populate the default LRC for every ring, call
gen-specific init/cleanup, init/fini the command parser and
set the status page (now inside the LRC object). These are
things all engines/rings have in common.

Stopping the ring before cleanup and initializing the seqnos
is left as a TODO task (we need more infrastructure in place
before we can achieve this).

v2: Check the ringbuffer backing obj for ring_is_initialized,
instead of the context backing obj (similar, but not exactly
the same).

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:55:17 +02:00
Daniel Vetter
0c7dd53b84 drm/i915/bdw: Add a context and an engine pointers to the ringbuffer
Any given ringbuffer is unequivocally tied to one context and one engine.
By setting the appropriate pointers to them, the ringbuffer struct holds
all the infromation you might need to submit a workload for processing,
Execlists style.

v2: Drop ring->ctx since that looks terribly ill-defined for legacy
ringbuffer submission.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v1)
Acked-by: Damien Lespiau <damien.lespiau@intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:18:17 +02:00
Oscar Mateo
84c2377fce drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
As we have said a couple of times by now, logical ring contexts have
their own ringbuffers: not only the backing pages, but the whole
management struct.

In a previous version of the series, this was achieved with two separate
patches:
drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC
drm/i915/bdw: Allocate ringbuffer for user-created LRCs

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-11 16:10:58 +02:00
Mika Kuoppala
f260fe7b2f drm/i915: Don't accumulate hangcheck score on forward progress
If the actual head has progressed forward inside a batch (request),
don't accumulate hangcheck score.

As the hangcheck score in increased only by acthd jumping backwards,
the result is that we only declare an active batch as stuck if it is
trapped inside a loop. Or that the looping will dominate the batch
progression so that it overcomes the bonus that forward progress gives.

v2: Improved commit message (Chris Wilson)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: s/active_loop/active (loop)/ as requested by Chris.]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-08-07 14:04:07 +02:00
Oscar Mateo
1b5d063faf drm/i915: Generalize intel_ring_get_tail to take a ringbuf
Again, it's low-level enough to simply take a ringbuf and nothing
else.

Trivial change.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-08 12:31:02 +02:00
Ben Widawsky
3e78998a58 drm/i915/bdw: implement semaphore signal
Semaphore signalling works similarly to previous GENs with the exception
that the per ring mailboxes no longer exist. Instead you must define
your own space, somewhere in the GTT.

The comments in the code define the layout I've opted for, which should
be fairly future proof. Ie. I tried to define offsets in abstract terms
(NUM_RINGS, seqno size, etc).

NOTE: If one wanted to move this to the HWSP they could. I've decided
one 4k object would be easier to deal with, and provide potential wins
with cache locality, but that's all speculative.

v2: Update the macro to not need the other ring's ring->id (Chris)
Update the comment to use the correct formula (Chris)

v3: Move the macros the ringbuffer.h to prevent churn in next patch
(Ville)

v4: Fixed compilation rebase conflict
commit 1ec9e26dda
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Feb 14 14:01:11 2014 +0100

    drm/i915: Consolidate binding parameters into flags

v5: VCS2 rebase
Replace hweight_long with hweight32

v6 (Rodrigo): * Add missed VC2 gen8 ring signal init
   	      * fixing conflicst on rebase
    	      * minor fixes on address table
	      * remove WARN_ON

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[danvet: s/BUG_ON/WARN_ON/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-07 22:16:23 +02:00
Rodrigo Vivi
ddd4dbc6c1 drm/i915: Updating comments.
ring index calculation table was out of date after other rings were added,
although the formula is flexible and scale when adding new rings.

So this patch just update the comments and add a brief explanation
why to use sync_seqno[ring index].

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-07 22:02:49 +02:00