Commit Graph

108 Commits

Author SHA1 Message Date
Chris Wilson
2e46a2a0b0 drm/i915: Use explicit flag to mark unreachable intel_context
I need to keep the GEM context around a bit longer so adding an explicit
flag for syncing execbuf with closed/abandonded contexts.

v2:
 * Use already available context flags. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200319170707.8262-1-chris@chris-wilson.co.uk
(cherry picked from commit 207e4a71fb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-03-26 10:21:04 -07:00
Chris Wilson
af7a272ef6 drm/i915/gt: Only delay the context barrier pm
It is strictly sufficient to only delay the intel_engine_pm_put from the
context barrier (and not from the context exit) in order to prevent the
gem_exec_nop contention. Adding the delay to the context exit incurs
noticably extra penalty for soft-rc6.

Fixes: edee52c927 ("drm/i915/gt: Delay release of engine-pm after last retirement")
Testcase: igt/i915_pm_rc6_residency/rc6-idle
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200323192029.20723-1-chris@chris-wilson.co.uk
2020-03-23 20:38:56 +00:00
Chris Wilson
edee52c927 drm/i915/gt: Delay release of engine-pm after last retirement
Keep the engine-pm awake until the next jiffie, to avoid immediate
ping-pong under moderate load. (Forcing the idle barrier excerbates the
moderate load, dramatically increasing the driver overhead.) On the
other hand, delaying the idle-barrier slightly incurs longer rc6-off
and so more power consumption.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/848
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-4-chris@chris-wilson.co.uk
2020-03-23 12:51:19 +00:00
Chris Wilson
bb6892b7ce drm/i915/gt: Use the correct err_unlock unwind path for a closed context
A silly cut'n'paste copied the unlocked error path and used it inside
the pin_mutex lock, we need to drop that lock before returning.

Fixes: b412c63f1c ("drm/i915/gt: Report context-is-closed prior to pinning")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200322123241.17694-1-chris@chris-wilson.co.uk
2020-03-22 13:44:24 +00:00
Chris Wilson
b412c63f1c drm/i915/gt: Report context-is-closed prior to pinning
Our assertion caught that we do try to pin a closed context if userspace
is viciously racing context-closure with execbuf, so make it fail
gracefully.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1492
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200320130159.3922-1-chris@chris-wilson.co.uk
2020-03-20 13:27:28 +00:00
Chris Wilson
207e4a71fb drm/i915: Use explicit flag to mark unreachable intel_context
I need to keep the GEM context around a bit longer so adding an explicit
flag for syncing execbuf with closed/abandonded contexts.

v2:
 * Use already available context flags. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200319170707.8262-1-chris@chris-wilson.co.uk
2020-03-19 21:28:20 +00:00
Chris Wilson
373f27f24c drm/i915/gt: Prevent allocation on a banned context
If a context is banned even before we submit our first request to it,
report the failure before we attempt to allocate any resources for the
context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200303080546.1140508-2-chris@chris-wilson.co.uk
2020-03-03 17:30:20 +00:00
Tvrtko Ursulin
1883a0a465 drm/i915: Track hw reported context runtime
GPU saves accumulated context runtime (in CS timestamp units) in PPHWSP
which will be useful for us in cases when we are not able to track context
busyness ourselves (like with GuC). Keep a copy of this in struct
intel_context from where it can be easily read even if the context is not
pinned.

v2:
 (Chris)
 * Do not store pphwsp address in intel_context.
 * Log CS wrap-around.
 * Simplify calculation by relying on integer wraparound.
v3:
 * Include total/avg in traces and error state for debugging

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200216133620.394962-1-chris@chris-wilson.co.uk
2020-02-16 15:16:22 +00:00
Chris Wilson
c616d2387a drm/i915/gt: Expand bad CS completion event debug
Show the ring/request/context state if we see what we believe is an
early CS completion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200211230944.1203098-1-chris@chris-wilson.co.uk
2020-02-12 20:34:36 +00:00
Chris Wilson
e5429340bf drm/i915/gt: Acquire ce->active before ce->pin_count/ce->pin_mutex
Similar to commit ac0e331a62 ("drm/i915: Tighten atomicity of
i915_active_acquire vs i915_active_release") we have the same race of
trying to pin the context underneath a mutex while allowing the
decrement to be atomic outside of that mutex. This leads to the problem
where two threads may simultaneously try to pin the context and the
second not notice that they needed to repin the context.

<2> [198.669621] kernel BUG at drivers/gpu/drm/i915/gt/intel_timeline.c:387!
<4> [198.669703] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [198.669712] CPU: 0 PID: 1246 Comm: gem_exec_create Tainted: G     U  W         5.5.0-rc6-CI-CI_DRM_7755+ #1
<4> [198.669723] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [198.669776] RIP: 0010:timeline_advance+0x7b/0xe0 [i915]
<4> [198.669785] Code: 00 48 c7 c2 10 f1 46 a0 48 c7 c7 70 1b 32 a0 e8 bb dd e7 e0 bf 01 00 00 00 e8 d1 af e7 e0 31 f6 bf 09 00 00 00 e8 35 ef d8 e0 <0f> 0b 48 c7 c1 48 fa 49 a0 ba 84 01 00 00 48 c7 c6 10 f1 46 a0 48
<4> [198.669803] RSP: 0018:ffffc900004c3a38 EFLAGS: 00010296
<4> [198.669810] RAX: ffff888270b35140 RBX: ffff88826f32ee00 RCX: 0000000000000006
<4> [198.669818] RDX: 00000000000017c5 RSI: 0000000000000000 RDI: 0000000000000009
<4> [198.669826] RBP: ffffc900004c3a64 R08: 0000000000000000 R09: 0000000000000000
<4> [198.669834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88826f9b5980
<4> [198.669841] R13: 0000000000000cc0 R14: ffffc900004c3dc0 R15: ffff888253610068
<4> [198.669849] FS:  00007f63e663fe40(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000
<4> [198.669857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [198.669864] CR2: 00007f171f8e39a8 CR3: 000000026b1f6005 CR4: 00000000003606f0
<4> [198.669872] Call Trace:
<4> [198.669924]  intel_timeline_get_seqno+0x12/0x40 [i915]
<4> [198.669977]  __i915_request_create+0x76/0x5a0 [i915]
<4> [198.670024]  i915_request_create+0x86/0x1c0 [i915]
<4> [198.670068]  i915_gem_do_execbuffer+0xbf2/0x2500 [i915]
<4> [198.670082]  ? __lock_acquire+0x460/0x15d0
<4> [198.670128]  i915_gem_execbuffer2_ioctl+0x11f/0x470 [i915]
<4> [198.670171]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
<4> [198.670181]  drm_ioctl_kernel+0xa7/0xf0
<4> [198.670188]  drm_ioctl+0x2e1/0x390
<4> [198.670233]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]

Fixes: 8413502238 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
References: ac0e331a62 ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200127152829.2842149-1-chris@chris-wilson.co.uk
2020-01-27 21:11:59 +00:00
Chris Wilson
8ccfc20a7d drm/i915/gt: Mark ring->vma as active while pinned
As we use the active state to keep the vma alive while we are reading
its contents during GPU error capture, we need to mark the
ring->vma as active during execution if we want to include the rinbuffer
in the error state.

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b1e3177bd1 ("drm/i915: Coordinate i915_active with its own mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200110110402.1231745-3-chris@chris-wilson.co.uk
2020-01-10 14:48:35 +00:00
Chris Wilson
1b8bfc5726 drm/i915/gt: Mark context->state vma as active while pinned
As we use the active state to keep the vma alive while we are reading
its contents during GPU error capture, we need to mark the
context->state vma as active during execution if we want to include it
in the error state.

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b1e3177bd1 ("drm/i915: Coordinate i915_active with its own mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200110110402.1231745-2-chris@chris-wilson.co.uk
2020-01-10 14:48:34 +00:00
Chris Wilson
d4fb2d7609 drm/i915/gt: runtime-pm is no longer required for ce->ops->pin()
Now that we have moved the runtime-pm management out of
intel_context_acctive_acquire, and that itself out of ce->ops->pin(), no
explicit runtime pm wakeref is required in intel_context_pin().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200109085717.873326-3-chris@chris-wilson.co.uk
2020-01-09 13:48:13 +00:00
Chris Wilson
b11b28ea0d drm/i915/gt: Pull context activation into central intel_context_pin()
While this is encroaching on midlayer territory, having already made the
state allocation a previous step in pinning, we can now pull the common
intel_context_active_acquire() into intel_context_pin() itself. This is
a prelude to make the activation a separate step inside pinning, outside
of the ce->pin_mutex

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200109085717.873326-2-chris@chris-wilson.co.uk
2020-01-09 13:48:00 +00:00
Chris Wilson
89f98d634f drm/i915/gt: Push context state allocation earlier
Allow for knowledgeable users to preallocate the context state, and to
separate the allocation step from the pinning step during
intel_context_pin()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200109085717.873326-1-chris@chris-wilson.co.uk
2020-01-09 13:48:00 +00:00
Chris Wilson
8413502238 drm/i915/gt: Drop mutex serialisation between context pin/unpin
The last remaining reason for serialising the pin/unpin of the
intel_context is to ensure that our preallocated wakerefs are not
consumed too early (i.e. the unpin of the previous phase does not emit
the idle barriers for this phase before we even submit). All of the
other operations within the context pin/unpin are supposed to be
atomic...  Therefore, we can reduce the serialisation to being just on
the i915_active.preallocated_barriers itself and drop the nested
pin_mutex from intel_context_unpin().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-5-chris@chris-wilson.co.uk
2020-01-06 14:38:57 +00:00
Chris Wilson
3fbbbef4f5 drm/i915/gt: Convert the final GEM_TRACE to GT_TRACE and co
Convert the few remaining GEM_TRACE() used for debugging over to the
appropriate GT_TRACE or RQ_TRACE.

References: 639f2f2489 ("drm/i915: Introduce new macros for tracing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-4-chris@chris-wilson.co.uk
2020-01-06 14:38:56 +00:00
Chris Wilson
e6ba764802 drm/i915: Remove i915->kernel_context
Allocate only an internal intel_context for the kernel_context, forgoing
a global GEM context for internal use as we only require a separate
address space (for our own protection).

Now having weaned GT from requiring ce->gem_context, we can stop
referencing it entirely. This also means we no longer have to create random
and unnecessary GEM contexts for internal use.

GEM contexts are now entirely for tracking GEM clients, and intel_context
the execution environment on the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk
2019-12-21 16:37:10 +00:00
Chris Wilson
0f100b7048 drm/i915: Push the use-semaphore marker onto the intel_context
Instead of rummaging through the intel_context to peek at the GEM
context in the middle of request submission to decide whether to use
semaphores, store that information on the intel_context itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-2-chris@chris-wilson.co.uk
2019-12-20 10:57:10 +00:00
Chris Wilson
9f3ccd40ac drm/i915: Drop GEM context as a direct link from i915_request
Keep the intel_context as being the primary state for i915_request, with
the GEM context a backpointer from the low level state for the rarer
cases we need client information. Our goal is to remove such references
to clients from the backend, and leave the HW submission agnostic to
client interfaces and self-contained.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk
2019-12-20 10:52:21 +00:00
Venkata Sandeep Dhanalakota
639f2f2489 drm/i915: Introduce new macros for tracing
New macros ENGINE_TRACE(), CE_TRACE(), RQ_TRACE() and
GT_TRACE() are introduce to tag device name and engine
name with contexts and requests tracing in i915.

Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-2-venkata.s.dhanalakota@intel.com
2019-12-13 20:16:23 +00:00
Chris Wilson
ccd2094559 drm/i915: Try hard to bind the context
It is not acceptable for context pinning to fail with -ENOSPC as we
should always be able to make space in the GGTT. The only reason we may
fail is that other "temporary" context pins are reserving their space
and we need to wait for an available slot.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/676
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191205113726.413351-2-chris@chris-wilson.co.uk
2019-12-05 13:50:54 +00:00
Chris Wilson
f70de8d2ca drm/i915/gt: Track the context validity explicitly
Rather than assume if and only if the engine->default_state is not set
that the context is invalid, instead track when we know the context has
valid state -- either because we have copied the default_state or we
have completed a context switch to save the HW state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203124155.3019926-1-chris@chris-wilson.co.uk
2019-12-03 16:49:31 +00:00
Chris Wilson
df9f85d858 drm/i915: Serialise i915_active_fence_set() with itself
The expected downside to commit 58b4c1a07a ("drm/i915: Reduce nested
prepare_remote_context() to a trylock") was that it would need to return
-EAGAIN to userspace in order to resolve potential mutex inversion. Such
an unsightly round trip is unnecessary if we could atomically insert a
barrier into the i915_active_fence, so make it happen.

Currently, we use the timeline->mutex (or some other named outer lock)
to order insertion into the i915_active_fence (and so individual nodes
of i915_active). Inside __i915_active_fence_set, we only need then
serialise with the interrupt handler in order to claim the timeline for
ourselves.

However, if we remove the outer lock, we need to ensure the order is
intact between not only multiple threads trying to insert themselves
into the timeline, but also with the interrupt handler completing the
previous occupant. We use xchg() on insert so that we have an ordered
sequence of insertions (and each caller knows the previous fence on
which to wait, preserving the chain of all fences in the timeline), but
we then have to cmpxchg() in the interrupt handler to avoid overwriting
the new occupant. The only nasty side-effect is having to temporarily
strip off the RCU-annotations to apply the atomic operations, otherwise
the rules are much more conventional!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112402
Fixes: 58b4c1a07a ("drm/i915: Reduce nested prepare_remote_context() to a trylock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127134527.3438410-1-chris@chris-wilson.co.uk
2019-11-27 17:02:14 +00:00
Chris Wilson
58b4c1a07a drm/i915: Reduce nested prepare_remote_context() to a trylock
On context retiring, we may invoke the kernel_context to unpin this
context. Elsewhere, we may use the kernel_context to modify this
context. This currently leads to an AB-BA lock inversion, so we need to
back-off from the contended lock, and repeat.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111732
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: a9877da2d6 ("drm/i915/oa: Reconfigure contexts on the fly")
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191126065521.2331017-1-chris@chris-wilson.co.uk
2019-11-26 12:45:45 +00:00
Chris Wilson
2871ea85c1 drm/i915/gt: Split intel_ring_submission
Split the legacy submission backend from the common CS ring buffer
handling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk
2019-10-24 12:14:21 +01:00
Chris Wilson
cd6a851385 drm/i915/gt: Prefer local path to runtime powermanagement
Avoid going to the base i915 device when we already have a path from gt
to the runtime powermanagement interface. The benefit is that it looks a
bit more self-consistent to always be acquiring the gt->uncore->rpm for
use with the gt->uncore.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191007154531.1750-1-chris@chris-wilson.co.uk
2019-10-07 21:44:02 +01:00
Chris Wilson
a4e7ccdac3 drm/i915: Move context management under GEM
Keep track of the GEM contexts underneath i915->gem.contexts and assign
them their own lock for the purposes of list management.

v2: Focus on lock tracking; ctx->vm is protected by ctx->mutex
v3: Correct split with removal of logical HW ID

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-15-chris@chris-wilson.co.uk
2019-10-04 15:39:34 +01:00
Chris Wilson
b1e3177bd1 drm/i915: Coordinate i915_active with its own mutex
Forgo the struct_mutex serialisation for i915_active, and interpose its
own mutex handling for active/retire.

This is a multi-layered sleight-of-hand. First, we had to ensure that no
active/retire callbacks accidentally inverted the mutex ordering rules,
nor assumed that they were themselves serialised by struct_mutex. More
challenging though, is the rule over updating elements of the active
rbtree. Instead of the whole i915_active now being serialised by
struct_mutex, allocations/rotations of the tree are serialised by the
i915_active.mutex and individual nodes are serialised by the caller
using the i915_timeline.mutex (we need to use nested spinlocks to
interact with the dma_fence callback lists).

The pain point here is that instead of a single mutex around execbuf, we
now have to take a mutex for active tracker (one for each vma, context,
etc) and a couple of spinlocks for each fence update. The improvement in
fine grained locking allowing for multiple concurrent clients
(eventually!) should be worth it in typical loads.

v2: Add some comments that barely elucidate anything :(

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk
2019-10-04 15:39:12 +01:00
Chris Wilson
274cbf20fd drm/i915: Push the i915_active.retire into a worker
As we need to use a mutex to serialise i915_active activation
(because we want to allow the callback to sleep), we need to push the
i915_active.retire into a worker callback in case we get need to retire
from an atomic context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-5-chris@chris-wilson.co.uk
2019-10-04 15:39:10 +01:00
Chris Wilson
d19d71fc2b drm/i915: Mark i915_request.timeline as a volatile, rcu pointer
The request->timeline is only valid until the request is retired (i.e.
before it is completed). Upon retiring the request, the context may be
unpinned and freed, and along with it the timeline may be freed. We
therefore need to be very careful when chasing rq->timeline that the
pointer does not disappear beneath us. The vast majority of users are in
a protected context, either during request construction or retirement,
where the timeline->mutex is held and the timeline cannot disappear. It
is those few off the beaten path (where we access a second timeline) that
need extra scrutiny -- to be added in the next patch after first adding
the warnings about dangerous access.

One complication, where we cannot use the timeline->mutex itself, is
during request submission onto hardware (under spinlocks). Here, we want
to check on the timeline to finalize the breadcrumb, and so we need to
impose a second rule to ensure that the request->timeline is indeed
valid. As we are submitting the request, it's context and timeline must
be pinned, as it will be used by the hardware. Since it is pinned, we
know the request->timeline must still be valid, and we cannot submit the
idle barrier until after we release the engine->active.lock, ergo while
submitting and holding that spinlock, a second thread cannot release the
timeline.

v2: Don't be lazy inside selftests; hold the timeline->mutex for as long
as we need it, and tidy up acquiring the timeline with a bit of
refactoring (i915_active_add_request)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
2019-09-20 10:24:09 +01:00
Chris Wilson
99013b1010 drm/i915: Make shrink/unshrink be atomic
Add an atomic counter and always take the spinlock around the pin/unpin
events, so that we can perform the list manipulation concurrently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190910212204.17190-1-chris@chris-wilson.co.uk
2019-09-11 08:14:23 +01:00
Chris Wilson
25ffd4b11d drm/i915: Markup expected timeline locks for i915_active
As every i915_active_request should be serialised by a dedicated lock,
i915_active consists of a tree of locks; one for each node. Markup up
the i915_active_request with what lock is supposed to be guarding it so
that we can verify that the serialised updated are indeed serialised.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816121000.8507-2-chris@chris-wilson.co.uk
2019-08-16 18:02:07 +01:00
Chris Wilson
531958f6f3 drm/i915/gt: Track timeline activeness in enter/exit
Lift moving the timeline to/from the active_list on enter/exit in order
to shorten the active tracking span in comparison to the existing
pin/unpin.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-1-chris@chris-wilson.co.uk
2019-08-15 23:16:05 +01:00
Chris Wilson
75d0a7f31e drm/i915: Lift timeline into intel_context
Move the timeline from being inside the intel_ring to intel_context
itself. This saves much pointer dancing and makes the relations of the
context to its timeline much clearer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-4-chris@chris-wilson.co.uk
2019-08-09 20:18:30 +01:00
Chris Wilson
48ae397b6b drm/i915: Push the ring creation flags to the backend
Push the ring creation flags from the outer GEM context to the inner
intel_context to avoid an unsightly back-reference from inside the
backend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-3-chris@chris-wilson.co.uk
2019-08-09 20:18:30 +01:00
Chris Wilson
4c60b1aaa2 drm/i915/gt: Make deferred context allocation explicit
Refactor the backends to handle the deferred context allocation in a
consistent manner, and allow calling it as an explicit first step in
pinning a context for the first time. This should make it easier for
backends to keep track of partially constructed contexts from
initialisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-2-chris@chris-wilson.co.uk
2019-08-09 20:18:30 +01:00
Chris Wilson
1aff1903d0 drm/i915: Hide unshrinkable context objects from the shrinker
The shrinker cannot touch objects used by the contexts (logical state
and ring). Currently we mark those as "pin_global" to let the shrinker
skip over them, however, if we remove them from the shrinker lists
entirely, we don't event have to include them in our shrink accounting.

By keeping the unshrinkable objects in our shrinker tracking, we report
a large number of objects available to be shrunk, and leave the shrinker
deeply unsatisfied when we fail to reclaim those. The shrinker will
persist in trying to reclaim the unavailable objects, forcing the system
into a livelock (not even hitting the dread oomkiller).

v2: Extend unshrinkable protection for perma-pinned scratch and guc
allocations (Tvrtko)
v3: Notice that we should be pinned when marking unshrinkable and so the
link cannot be empty; merge duplicate paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190802212137.22207-1-chris@chris-wilson.co.uk
2019-08-02 23:39:46 +01:00
Chris Wilson
d8af05ff38 drm/i915: Allow sharing the idle-barrier from other kernel requests
By placing our idle-barriers in the i915_active fence tree, we expose
those for reuse by other components that are issuing requests along the
kernel_context. Reusing the proto-barrier active_node is perfectly fine
as the new request implies a context-switch, and so an opportune point
to run the idle-barrier. However, the proto-barrier is not equivalent
to a normal active_node and care must be taken to avoid dereferencing the
ERR_PTR used as its request marker.

v2: Comment the more egregious cheek
v3: A glossary!

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ce476c80b8 ("drm/i915: Keep contexts pinned until after the next kernel context switch")
Fixes: a9877da2d6 ("drm/i915/oa: Reconfigure contexts on the fly")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190802100015.1281-1-chris@chris-wilson.co.uk
2019-08-02 11:53:04 +01:00
Chris Wilson
f5d974f9d2 drm/i915/gt: Provide a local intel_context.vm
Track the currently bound address space used by the HW context. Minor
conversions to use the local intel_context.vm are made, leaving behind
some more surgery required to make intel_context the primary through the
selftests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730143209.4549-2-chris@chris-wilson.co.uk
2019-07-30 16:09:35 +01:00
Chris Wilson
340c4c8daa drm/i915/gt: Add to timeline requires the timeline mutex
Modifying a remote context requires careful serialisation with requests
on that context, and that serialisation requires us to take their
timeline->mutex. Make it so.

Note that while struct_mutex rules, we can't create more than one
request in parallel, but that age is soon coming to an end.

v2: Though it doesn't affect the current users, contexts may share
timelines so check if we already hold the right mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190725131447.27515-1-chris@chris-wilson.co.uk
2019-07-26 19:25:34 +01:00
Chris Wilson
df8cf31e74 drm/i915/gt: Hook up intel_context_fini()
Prior to freeing the struct, call the fini function to cleanup the
common members. Currently this only calls the debug functions to mark
the structs as destroyed, but may be extended to real work in future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190718070024.21781-2-chris@chris-wilson.co.uk
2019-07-22 23:20:07 +01:00
Chris Wilson
a9877da2d6 drm/i915/oa: Reconfigure contexts on the fly
Avoid a global idle barrier by reconfiguring each context by rewriting
them with MI_STORE_DWORD from the kernel context.

v2: We only need to determine the desired register values once, they are
the same for all contexts.
v3: Don't remove the kernel context from the list of known GEM contexts;
the world is not ready for that yet.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190716213443.9874-1-chris@chris-wilson.co.uk
2019-07-17 07:58:27 +01:00
Chris Wilson
cba17e5d64 drm/i915/gt: Add some debug tracing for context pinning
Add the context pin/unpin events to the trace for post-mortem debugging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190625194859.28005-1-chris@chris-wilson.co.uk
2019-06-26 07:44:43 +01:00
Chris Wilson
12c255b5da drm/i915: Provide an i915_active.acquire callback
If we introduce a callback for i915_active that is only called the first
time we use the i915_active and is symmetrically paired with the
i915_active.retire callback, we can replace the open-coded and
non-atomic implementations -- which will be very fragile (i.e. broken)
upon removing the struct_mutex serialisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-4-chris@chris-wilson.co.uk
2019-06-21 19:47:55 +01:00
Chris Wilson
09c5ab384f drm/i915: Keep rings pinned while the context is active
Remember to keep the rings pinned as well as the context image until the
GPU is no longer active.

v2: Introduce a ring->pin_count primarily to hide the
mock_ring that doesn't fit into the normal GGTT vma picture.

v3: Order is important in teardown, ringbuffer submission needs to drop
the pin count on the engine->kernel_context before it can gleefully free
its ring.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110946
Fixes: ce476c80b8 ("drm/i915: Keep contexts pinned until after the next kernel context switch")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190619170135.15281-1-chris@chris-wilson.co.uk
2019-06-19 19:49:14 +01:00
Chris Wilson
44d89409a1 drm/i915: Make the semaphore saturation mask global
The idea behind keeping the saturation mask local to a context backfired
spectacularly. The premise with the local mask was that we would be more
proactive in attempting to use semaphores after each time the context
idled, and that all new contexts would attempt to use semaphores
ignoring the current state of the system. This turns out to be horribly
optimistic. If the system state is still oversaturated and the existing
workloads have all stopped using semaphores, the new workloads would
attempt to use semaphores and be deprioritised behind real work. The
new contexts would not switch off using semaphores until their initial
batch of low priority work had completed. Given sufficient backload load
of equal user priority, this would completely starve the new work of any
GPU time.

To compensate, remove the local tracking in favour of keeping it as
global state on the engine -- once the system is saturated and
semaphores are disabled, everyone stops attempting to use semaphores
until the system is idle again. One of the reason for preferring local
context tracking was that it worked with virtual engines, so for
switching to global state we could either do a complete check of all the
virtual siblings or simply disable semaphores for those requests. This
takes the simpler approach of disabling semaphores on virtual engines.

The downside is that the decision that the engine is saturated is a
local measure -- we are only checking whether or not this context was
scheduled in a timely fashion, it may be legitimately delayed due to user
priorities. We still have the same dilemma though, that we do not want
to employ the semaphore poll unless it will be used.

v2: Explain why we need to assume the worst wrt virtual engines.

Fixes: ca6e56f654 ("drm/i915: Disable semaphore busywaits on saturated systems")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-8-chris@chris-wilson.co.uk
2019-06-19 12:10:45 +01:00
Chris Wilson
ce476c80b8 drm/i915: Keep contexts pinned until after the next kernel context switch
We need to keep the context image pinned in memory until after the GPU
has finished writing into it. Since it continues to write as we signal
the final breadcrumb, we need to keep it pinned until the request after
it is complete. Currently we know the order in which requests execute on
each engine, and so to remove that presumption we need to identify a
request/context-switch we know must occur after our completion. Any
request queued after the signal must imply a context switch, for
simplicity we use a fresh request from the kernel context.

The sequence of operations for keeping the context pinned until saved is:

 - On context activation, we preallocate a node for each physical engine
   the context may operate on. This is to avoid allocations during
   unpinning, which may be from inside FS_RECLAIM context (aka the
   shrinker)

 - On context deactivation on retirement of the last active request (which
   is before we know the context has been saved), we add the
   preallocated node onto a barrier list on each engine

 - On engine idling, we emit a switch to kernel context. When this
   switch completes, we know that all previous contexts must have been
   saved, and so on retiring this request we can finally unpin all the
   contexts that were marked as deactivated prior to the switch.

We can enhance this in future by flushing all the idle contexts on a
regular heartbeat pulse of a switch to kernel context, which will also
be used to check for hung engines.

v2: intel_context_active_acquire/_release

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk
2019-06-14 19:03:32 +01:00
Daniele Ceraolo Spurio
c447ff7db3 drm/i915: update with_intel_runtime_pm to use the rpm structure
Matching the underlying get/put functions.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190613232156.34940-8-daniele.ceraolospurio@intel.com
2019-06-14 15:58:33 +01:00
Chris Wilson
10be98a77c drm/i915: Move more GEM objects under gem/
Continuing the theme of separating out the GEM clutter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk
2019-05-28 12:45:29 +01:00
Chris Wilson
ca6e56f654 drm/i915: Disable semaphore busywaits on saturated systems
Asking the GPU to busywait on a memory address, perhaps not unexpectedly
in hindsight for a shared system, leads to bus contention that affects
CPU programs trying to concurrently access memory. This can manifest as
a drop in transcode throughput on highly over-saturated workloads.

The only clue offered by perf, is that the bus-cycles (perf stat -e
bus-cycles) jumped by 50% when enabling semaphores. This corresponds
with extra CPU active cycles being attributed to intel_idle's mwait.

This patch introduces a heuristic to try and detect when more than one
client is submitting to the GPU pushing it into an oversaturated state.
As we already keep track of when the semaphores are signaled, we can
inspect their state on submitting the busywait batch and if we planned
to use a semaphore but were too late, conclude that the GPU is
overloaded and not try to use semaphores in future requests. In
practice, this means we optimistically try to use semaphores for the
first frame of a transcode job split over multiple engines, and fail if
there are multiple clients active and continue not to use semaphores for
the subsequent frames in the sequence. Periodically, we try to
optimistically switch semaphores back on whenever the client waits to
catch up with the transcode results.

With 1 client, on Broxton J3455, with the relative fps normalized by %cpu:

x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
|                                                    *                   |
|                                                    *+                  |
|                                                    **+                 |
|                                                    **+  x              |
|                                x               *  +**+  x              |
|                                x  x       *    *  +***x xx             |
|                                x  x       *    * *+***x *x             |
|                                x  x*   +  *    * *****x *x x           |
|                         +    x xx+x*   + ***   * ********* x   *       |
|                         +    x xx+x*   * *** +** ********* xx  *       |
|    *   +         ++++*  +    x*x****+*+* ***+*************+x*  *       |
|*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++    *|
|                                   |__________A_____M_____|             |
|                           |_______________A____M_________|             |
|                                 |____________A___M________|            |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 120       2.60475       3.50941       3.31123     3.2143953    0.21117399
+ 120        2.3826       3.57077       3.25101     3.1414161    0.28146407
Difference at 95.0% confidence
	-0.0729792 +/- 0.0629585
	-2.27039% +/- 1.95864%
	(Student's t, pooled s = 0.248814)
* 120       2.35536       3.66713        3.2849     3.2059917    0.24618565
No difference proven at 95.0% confidence

With 10 clients over-saturating the pipeline:

x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
|                     ++                                        **       |
|                     ++                                        **       |
|                     ++                                        **       |
|                     ++                                        **       |
|                     ++                                    xx ***       |
|                     ++                                    xx ***       |
|                     ++                                    xxx***       |
|                     ++                                    xxx***       |
|                    +++                                    xxx***       |
|                    +++                                    xx****       |
|                    +++                                    xx****       |
|                    +++                                    xx****       |
|                    +++                                    xx****       |
|                    ++++                                   xx****       |
|                   +++++                                   xx****       |
|                   +++++                                 x x******      |
|                  ++++++                                 xxx*******     |
|                  ++++++                                 xxx*******     |
|                  ++++++                                 xxx*******     |
|                  ++++++                                 xx********     |
|                  ++++++                               xxxx********     |
|                  ++++++                               xxxx********     |
|                ++++++++                             xxxxx*********     |
|+ +  +        + ++++++++                           xxx*xx**********x*  *|
|                                                         |__A__|        |
|                 |__AM__|                                               |
|                                                            |__A_|      |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 120       2.47855        2.8972       2.72376     2.7193402   0.074604933
+ 120       1.17367       1.77459       1.71977     1.6966782   0.085850697
Difference at 95.0% confidence
	-1.02266 +/- 0.0203502
	-37.607% +/- 0.748352%
	(Student's t, pooled s = 0.0804246)
* 120       2.57868       3.00821       2.80142     2.7923878   0.058646477
Difference at 95.0% confidence
	0.0730476 +/- 0.0169791
	2.68622% +/- 0.624383%
	(Student's t, pooled s = 0.0671018)

Indicating that we've recovered the regression from enabling semaphores
on this saturated setup, with a hint towards an overall improvement.

Very similar, but of smaller magnitude, results are observed on both
Skylake(gt2) and Kabylake(gt4). This may be due to the reduced impact of
bus-cycles, where we see a 50% hit on Broxton, it is only 10% on the big
core, in this particular test.

One observation to make here is that for a greedy client trying to
maximise its own throughput, using semaphores is the right choice. It is
only the holistic system-wide view that semaphores of one client
impacts another and reduces the overall throughput where we would choose
to disable semaphores.

The most noticeable negactive impact this has is on the no-op
microbenchmarks, which are also very notable for having no cpu bus load.
In particular, this increases the runtime and energy consumption of
gem_exec_whisper.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
2019-05-04 09:18:02 +01:00
Chris Wilson
0268444607 drm/i915: Remove intel_context.active_link
We no longer need to track the active intel_contexts within each engine,
allowing us to drop a tricky mutex_lock from inside unpin (which may
occur inside fs_reclaim).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-8-chris@chris-wilson.co.uk
2019-04-26 18:32:17 +01:00
Chris Wilson
5e2a0419ef drm/i915: Switch back to an array of logical per-engine HW contexts
We switched to a tree of per-engine HW context to accommodate the
introduction of virtual engines. However, we plan to also support
multiple instances of the same engine within the GEM context, defeating
our use of the engine as a key to looking up the HW context. Just
allocate a logical per-engine instance and always use an index into the
ctx->engines[]. Later on, this ctx->engines[] may be replaced by a user
specified map.

v2: Add for_each_gem_engine() helper to iterator within the engines lock
v3: intel_context_create_request() helper
v4: s/unsigned long/unsigned int/ 4 billion engines is quite enough.
v5: Push iterator locking to caller

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-7-chris@chris-wilson.co.uk
2019-04-26 18:32:11 +01:00
Chris Wilson
6b736de574 drm/i915: Pass intel_context to intel_context_pin_lock()
Move the intel_context_instance() to the caller so that we can decouple
ourselves from one context instance per engine.

v2: Rename pin_lock() to lock_pinned(), hopefully that is clearer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-5-chris@chris-wilson.co.uk
2019-04-26 18:32:05 +01:00
Chris Wilson
fa9f668141 drm/i915: Export intel_context_instance()
We want to pass in a intel_context into intel_context_pin() and that
requires us to first be able to lookup the intel_context!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-2-chris@chris-wilson.co.uk
2019-04-26 18:32:00 +01:00
Chris Wilson
79ffac8599 drm/i915: Invert the GEM wakeref hierarchy
In the current scheme, on submitting a request we take a single global
GEM wakeref, which trickles down to wake up all GT power domains. This
is undesirable as we would like to be able to localise our power
management to the available power domains and to remove the global GEM
operations from the heart of the driver. (The intent there is to push
global GEM decisions to the boundary as used by the GEM user interface.)

Now during request construction, each request is responsible via its
logical context to acquire a wakeref on each power domain it intends to
utilize. Currently, each request takes a wakeref on the engine(s) and
the engines themselves take a chipset wakeref. This gives us a
transition on each engine which we can extend if we want to insert more
powermangement control (such as soft rc6). The global GEM operations
that currently require a struct_mutex are reduced to listening to pm
events from the chipset GT wakeref. As we reduce the struct_mutex
requirement, these listeners should evaporate.

Perhaps the biggest immediate change is that this removes the
struct_mutex requirement around GT power management, allowing us greater
flexibility in request construction. Another important knock-on effect,
is that by tracking engine usage, we can insert a switch back to the
kernel context on that engine immediately, avoiding any extra delay or
inserting global synchronisation barriers. This makes tracking when an
engine and its associated contexts are idle much easier -- important for
when we forgo our assumed execution ordering and need idle barriers to
unpin used contexts. In the process, it means we remove a large chunk of
code whose only purpose was to switch back to the kernel context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
2019-04-24 22:26:49 +01:00
Chris Wilson
6eee33e87f drm/i915: Introduce context->enter() and context->exit()
We wish to start segregating the power management into different control
domains, both with respect to the hardware and the user interface. The
first step is that at the lowest level flow of requests, we want to
process a context event (and not a global GEM operation). In this patch,
we introduce the context callbacks that in future patches will be
redirected to per-engine interfaces leading to global operations as
required.

The intent is that this will be guarded by the timeline->mutex, except
that retiring has not quite finished transitioning over from being
guarded by struct_mutex. So at the moment it is protected by
struct_mutex with a reminded to switch.

v2: Rename default handlers to intel_context_enter_engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-3-chris@chris-wilson.co.uk
2019-04-24 22:25:32 +01:00
Chris Wilson
112ed2d31a drm/i915: Move GraphicsTechnology files under gt/
Start partitioning off the code that talks to the hardware (GT) from the
uapi layers and move the device facing code under gt/

One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to
subdivide that header and body further (and split out the submission
code from the ringbuffer and logical context handling). This patch aims
to be simple motion so git can fixup inflight patches with little mess.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk
2019-04-24 21:01:46 +01:00