linux

Author	SHA1	Message	Date
Chris Wilson	675d9ad71b	drm/i915: Track requests inside each intel_ring By tracking each request occupying space inside an individual intel_ring, we can greatly simplify the logic of tracking available space and not worry about other timelines. (Each ring is an ordered timeline of committed requests.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-17-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:26 +01:00
Chris Wilson	fa545cbf97	drm/i915: Refactor activity tracking for requests With the introduction of requests, we amplified the number of atomic refcounted objects we use and update every execbuffer; from none to several references, and a set of references that need to be changed. We also introduced interesting side-effects in the order of retiring requests and objects. Instead of independently tracking the last request for an object, track the active objects for each request. The object will reside in the buffer list of its most recent active request and so we reduce the kref interchange to a list_move. Now retirements are entirely driven by the request, dramatically simplifying activity tracking on the object themselves, and removing the ambiguity between retiring objects and retiring requests. Furthermore with the consolidation of managing the activity tracking centrally, we can look forward to using RCU to enable lockless lookup of the current active requests for an object. In the future, we will be able to query the status or wait upon rendering to an object without even touching the struct_mutex BKL. All told, less code, simpler and faster, and more extensible. v2: Add a typedef for the function pointer for convenience later. v3: Make the noop retirement callback explicit. Allow passing NULL to the init_request_active() which is expanded to a common noop function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk	2016-08-04 08:09:25 +01:00
Chris Wilson	96a945aa42	drm/i915: Move the common engine cleanup to intel_engine_cs.c Now that we initialize the state to both legacy and execlists inside intel_engine_cs, we should also clean up that state from the common functions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470226756-24401-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2016-08-03 14:01:12 +01:00
Chris Wilson	ad7bdb2b99	drm/i915: Rename engine->semaphore.sync_to, engine->sempahore.signal locals In order to be more consistent with the rest of the request construction and ring emission, use the common names for the ring and request. Rather than using signaler_req, waiter_req, and intel_ring *wait, we use plain req and ring. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-32-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-23-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:33 +01:00
Chris Wilson	ddf07be7a2	drm/i915: Simplify calling engine->sync_to Since requests can no longer be generated as a side-effect of intel_ring_begin(), we know that the seqno will be unchanged during ring-emission. This predicatablity then means we do not have to check for the seqno wrapping around whilst emitting the semaphore for engine->sync_to(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-31-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-22-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:32 +01:00
Chris Wilson	9242f974dc	drm/i915: Stop passing caller's num_dwords to engine->semaphore.signal() Rather than pass in the num_dwords that the caller wishes to use after the signal command packet, split the breadcrumb emission into two phases and have both the signal and breadcrumb individiually acquire space on the ring. This makes the interface simpler for the reader, and will simplify for patches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-25-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-16-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:28 +01:00
Chris Wilson	ddd66c5154	drm/i915: Unify request submission Move request submission from emit_request into its own common vfunc from i915_add_request(). v2: Convert I915_DISPATCH_flags to BIT(x) whilst passing v3: Rename a few functions to match. v4: Reenable execlists submission after disabling guc. v5: Be aware that everyone calls i915_guc_submission_disable()! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-23-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-14-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:26 +01:00
Chris Wilson	8f9420184a	drm/i915: Move the modulus for ring emission to the register write Space reservation is already safe with respect to the ring->size modulus, but hardware only expects to see values in the range 0...ring->size-1 (inclusive) and so requires the modulus to prevent us writing the value ring->size instead of 0. As this is only required for the register itself, we can defer the modulus to the register update and not perform it after every command packet. We keep the intel_ring_advance() around in the code to provide demarcation for the end-of-packet (which then can be compared against intel_ring_begin() as the number of dwords emitted must match the reserved space). v2: Assert that the ring size is a power-of-two to match assumptions in the code. Simplify the comment before writing the tail value to explain why the modulus is necessary. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-13-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:25 +01:00
Chris Wilson	c5efa1ad09	drm/i915: Convert engine->write_tail to operate on a request If we rewrite the I915_WRITE_TAIL specialisation for the legacy ringbuffer as submitting the request onto the ringbuffer, we can unify the vfunc with both execlists and GuC in the next patch. v2: Drop the modulus from the I915_WRITE_TAIL as it is currently being applied in intel_ring_advance() after every command packet, and add a comment explaining why we need the modulus at all. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-22-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-12-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:24 +01:00
Chris Wilson	ba76d91bc0	drm/i915: Remove intel_ring_get_tail() Joonas doesn't like the tiny function, especially if I go around making it more complicated and using it elsewhere. To remove that temptation, remove the function! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-21-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-11-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:23 +01:00
Chris Wilson	803688babd	drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START Both the ->dispatch_execbuffer and ->emit_bb_start callbacks do exactly the same thing, add MI_BATCHBUFFER_START to the request's ringbuffer - we need only one vfunc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-20-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-10-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:21 +01:00
Chris Wilson	7c9cf4e33a	drm/i915: Reduce engine->emit_flush() to a single mode parameter Rather than passing a complete set of GPU cache domains for either invalidation or for flushing, or even both, just pass a single parameter to the engine->emit_flush to determine the required operations. engine->emit_flush(GPU, 0) -> engine->emit_flush(EMIT_INVALIDATE) engine->emit_flush(0, GPU) -> engine->emit_flush(EMIT_FLUSH) engine->emit_flush(GPU, GPU) -> engine->emit_flush(EMIT_FLUSH \| EMIT_INVALIDATE) This allows us to extend the behaviour easily in future, for example if we want just a command barrier without the overhead of flushing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-8-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:19 +01:00
Chris Wilson	c7fe7d25ed	drm/i915: Remove obsolete engine->gpu_caches_dirty Space for flushing the GPU cache prior to completing the request is preallocated and so cannot fail - the GPU caches will always be flushed along with the completed request. This means we no longer have to track whether the GPU cache is dirty between batches like we had to with the outstanding_lazy_seqno. With the removal of the duplication in the per-backend entry points for emitting the obsolete lazy flush, we can then further unify the engine->emit_flush. v2: Expand a bit on the legacy of gpu_caches_dirty Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-18-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-7-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:19 +01:00
Chris Wilson	aad29fbbb8	drm/i915: Rename intel_pin_and_map_ring() For more consistent oop-naming, we would use intel_ring_verb, so pick intel_ring_pin() and intel_ring_unpin(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-17-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-6-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:18 +01:00
Chris Wilson	32c04f16f0	drm/i915: Rename residual ringbuf parameters Now that we have a clear ring/engine split and a struct intel_ring, we no longer need the stopgap ringbuf names. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-16-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-5-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:17 +01:00
Chris Wilson	7e37f889b5	drm/i915: Rename struct intel_ringbuffer to struct intel_ring The state stored in this struct is not only the information about the buffer object, but the ring used to communicate with the hardware. Using buffer here is overly specific and, for me at least, conflates with the notion of buffer objects themselves. s/struct intel_ringbuffer/struct intel_ring/ s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/ s/describe_ctx_ringbuf()/describe_ctx_ring()/ s/intel_ring_get_active_head()/intel_engine_get_active_head()/ s/intel_ring_sync_index()/intel_engine_sync_index()/ s/intel_ring_init_seqno()/intel_engine_init_seqno()/ s/ring_stuck()/engine_stuck()/ s/intel_cleanup_engine()/intel_engine_cleanup()/ s/intel_stop_engine()/intel_engine_stop()/ s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/ s/intel_unpin_ringbuffer()/intel_unpin_ring()/ s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/ s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/ s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/ s/intel_ringbuffer_free()/intel_ring_free()/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:16 +01:00
Chris Wilson	b5321f309b	drm/i915: Unify intel_logical_ring_emit and intel_ring_emit Both perform the same actions with more or less indirection, so just unify the code. v2: Add back a few intel_engine_cs locals Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-11-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-1-git-send-email-chris@chris-wilson.co.uk	2016-08-02 22:58:13 +01:00
Chris Wilson	33a051a5fc	drm/i915/cmdparser: Remove stray intel_engine_cs *ring When we refer to intel_engine_cs, we want to use engine so as not to confuse ourselves about ringbuffers. v2: Rename all the functions as well, as well as a few more stray comments. v3: Split the really long error message strings Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-6-git-send-email-chris@chris-wilson.co.uk Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-1-git-send-email-chris@chris-wilson.co.uk	2016-07-27 16:23:05 +01:00
Dave Gordon	38a0f2db5b	drm/i915: rename 'ring' where it refers to an engine or engine_id 'ring' is an old deprecated term for a GPU engine. Chris Wilson wants to use the name for what is currently known as an intel_ringbuffer, but it will be dreadfully confusing if some rings are ringbuffers but other rings are still engines. So this patch changes the names of a bunch of parameters called 'ring' to either 'engine' or 'engine_id' according to what they actually are. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469034967-15840-3-git-send-email-david.s.gordon@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-07-21 09:59:41 +01:00
Dave Gordon	bbdc070a79	drm/i915: rename macro parameter(ring) to (engine) 'ring' is an old deprecated term for a GPU engine. Here we make the terminology more consistent by renaming the 'ring' parameter of lots of macros that calculate addresses within the MMIO space of an engine. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469034967-15840-2-git-send-email-david.s.gordon@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-07-21 09:59:25 +01:00
Chris Wilson	f2f0ed718b	drm/i915: Rename ring->virtual_start as ring->vaddr Just a different colour to better match virtual addresses elsewhere. s/ring->virtual_start/ring->vaddr/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-9-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-8-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:39 +01:00
Chris Wilson	406ea8d22f	drm/i915: Treat ringbuffer writes as write to normal memory Ringbuffers are now being written to either through LLC or WC paths, so treating them as simply iomem is no longer adequate. However, for the older !llc hardware, the hardware is documentated as treating the TAIL register update as serialising, so we can relax the barriers when filling the rings (but even if it were not, it is still an uncached register write and so serialising anyway.). For simplicity, let's ignore the iomem annotation. v2: Remove iomem from ringbuffer->virtual_address v3: And for good measure add iomem elsewhere to keep sparse happy Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2 Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-8-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-7-git-send-email-chris@chris-wilson.co.uk	2016-07-20 13:40:14 +01:00
Chris Wilson	04769652c8	drm/i915: Derive GEM requests from dma-fence dma-buf provides a generic fence class for interoperation between drivers. Internally we use the request structure as a fence, and so with only a little bit of interfacing we can rebase those requests on top of dma-buf fences. This will allow us, in the future, to pass those fences back to userspace or between drivers. v2: The fence_context needs to be globally unique, not just unique to this device. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-4-git-send-email-chris@chris-wilson.co.uk	2016-07-20 09:29:53 +01:00
Tvrtko Ursulin	019bf27763	drm/i915: Pull out some more common engine init code Created two common helpers for engine setup and engine init phases respectively to help with code sharing. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468422221-12132-1-git-send-email-tvrtko.ursulin@linux.intel.com Reviewed-by: Chris Wilson <chris-wilson.co.uk>	2016-07-14 11:17:24 +01:00
Tvrtko Ursulin	88d2ba2e95	drm/i915: Move common engine setup into intel_engine_cs.c Common code deserves to be put in a separate file from legacy and execlists implementation for clarity and ease of maintenance. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris-wilson.co.uk>	2016-07-14 11:17:20 +01:00
Tvrtko Ursulin	8b3e2d3639	drm/i915: Unify engine init loop With the unified common engine setup done, and the execlist engine initialization loop clearly split into two phases, we can eliminate the separate legacy engine initialization code. v2: Fix cleanup path for legacy. v3: Rename constructors. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris-wilson.co.uk>	2016-07-14 11:17:06 +01:00
Dave Gordon	c2c7f24008	drm/i915: unify first-stage engine struct setup intel_lrc.c has a table of "logical rings" (meaning engines), while intel_ringbuffer.c has separately open-coded initialisation for each engine. We can deduplicate this somewhat by using the same first-stage engine-setup function for both modes. So here we expose the function that transfers information from the static table of (all) known engines to the dev_priv->engine array of engines available on this device (adjusting the names along the way) and then embed calls to it in both the LRC and the legacy-mode setup. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Chris Wilson <chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-07-14 11:16:18 +01:00
Chris Wilson	aca34b6e1c	drm/i915: Group the irq breadcrumb variables into the same cacheline As we inspect both the tasklet (to check for an active bottom-half) and set the irq-posted flag at the same time (both in the interrupt handler and then in the bottom-halt), group those two together into the same cacheline. (Not having total control over placement of the struct means we can't guarantee the cacheline boundary, we need to align the kmalloc and then each struct, but the grouping should help.) v2: Try a couple of different names for the state touched by the user interrupt handler. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467805142-22219-3-git-send-email-chris@chris-wilson.co.uk	2016-07-06 12:47:39 +01:00
Chris Wilson	7b4d3a16dd	drm/i915: Remove stop-rings debugfs interface Now that we have (near) universal GPU recovery code, we can inject a real hang from userspace and not need any fakery. Not only does this mean that the testing is far more realistic, but we can simplify the kernel in the process. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-7-git-send-email-chris@chris-wilson.co.uk	2016-07-04 08:18:23 +01:00
Chris Wilson	67d97da349	drm/i915: Only start retire worker when idle The retire worker is a low frequency task that makes sure we retire outstanding requests if userspace is being lax. We only need to start it once as it remains active until the GPU is idle, so do a cheap test before the more expensive queue_work(). A consequence of this is that we need correct locking in the worker to make the hot path of request submission cheap. To keep the symmetry and keep hangcheck strictly bound by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle worker before dropping the wakelock. v2: Guard against RCU fouling the breadcrumbs bottom-half whilst we kick the waiter. v3: Remove the wakeref assertion squelching (now we hold a wakeref for the hangcheck, any rpm error there is genuine). v4: To prevent excess work when retiring requests, we split the busy flag into two, a boolean to denote whether we hold the wakeref and a bitmask of active engines. v5: Reorder cancelling hangcheck upon idling to avoid a race where we might cancel a hangcheck after being preempted by a new task Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> References: https://bugs.freedesktop.org/show_bug.cgi?id=88437 Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-1-git-send-email-chris@chris-wilson.co.uk	2016-07-04 08:18:19 +01:00
Chris Wilson	61ff75ac20	drm/i915: Simplify enabling user-interrupts with L3-remapping Borrow the idea from intel_lrc.c to precompute the mask of interrupts we wish to always enable to avoid having lots of conditionals inside the interrupt enabling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-19-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:04:17 +01:00
Chris Wilson	31bb59cc01	drm/i915: Move the get/put irq locking into the caller With only a single callsite for intel_engine_cs->irq_get and ->irq_put, we can reduce the code size by moving the common preamble into the caller, and we can also eliminate the reference counting. For completeness, as we are no longer doing reference counting on irq, rename the get/put vfunctions to enable/disable respectively and are able to review the use of posting reads. We only require the serialisation with hardware when enabling the interrupt (i.e. so we cannot miss an interrupt by going to sleep before the hardware truly enables it). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-18-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:04:16 +01:00
Chris Wilson	b3850855f4	drm/i915: Embed signaling node into the GEM request Under the assumption that enabling signaling will be a frequent operation, lets preallocate our attachments for signaling inside the (rather large) request struct (and so benefiting from the slab cache). v2: Convert from void * to more meaningful names and types. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-17-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:04:14 +01:00
Chris Wilson	c81d46138d	drm/i915: Convert trace-irq to the breadcrumb waiter If we convert the tracing over from direct use of ring->irq_get() and over to the breadcrumb infrastructure, we only have a single user of the ring->irq_get and so we will be able to simplify the driver routines (eliminating the redundant validation and irq refcounting). Process context is preferred over softirq (or even hardirq) for a couple of reasons: - we already utilize process context to have fast wakeup of a single client (i.e. the client waiting for the GPU inspects the seqno for itself following an interrupt to avoid the overhead of a context switch before it returns to userspace) - engine->irq_seqno() is not suitable for use from an softirq/hardirq context as we may require long waits (100-250us) to ensure the seqno write is posted before we read it from the CPU A signaling framework is a requirement for enabling dma-fences. v2: Move to a signaling framework based upon the waiter. v3: Track the first-signal to avoid having to walk the rbtree everytime. v4: Mark the signaler thread as RT priority to reduce latency in the indirect wakeups. v5: Make failure to allocate the thread fatal. v6: Rename kthreads to i915/signal:%u Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-16-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:02:34 +01:00
Chris Wilson	3d5564e910	drm/i915: Only apply one barrier after a breadcrumb interrupt is posted If we flag the seqno as potentially stale upon receiving an interrupt, we can use that information to reduce the frequency that we apply the heavyweight coherent seqno read (i.e. if we wake up a chain of waiters). v2: Use cmpxchg to replace READ_ONCE/WRITE_ONCE for more explicit control of the ordering wrt to interrupt generation and interrupt checking in the bottom-half. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-14-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:00:54 +01:00
Chris Wilson	7d5ea80720	drm/i915: Refactor scratch object allocation for gen2 w/a buffer The gen2 w/a buffer is stuffed into the same slot as the gen5+ scratch buffer. If we pass in the size we want to allocate for the scratch buffer, both callers can use the same routine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-11-git-send-email-chris@chris-wilson.co.uk	2016-07-01 21:00:50 +01:00
Chris Wilson	f8291952bd	drm/i915: Stop mapping the scratch page into CPU space After the elimination of using the scratch page for Ironlake's breadcrumb, we no longer need to kmap the object. We therefore can move it into the high unmappable space and do not need to force the object to be coherent (i.e. snooped on !llc platforms). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-9-git-send-email-chris@chris-wilson.co.uk	2016-07-01 20:58:48 +01:00
Chris Wilson	1b7744e7ba	drm/i915: Use HWS for seqno tracking everywhere By using the same address for storing the HWS on every platform, we can remove the platform specific vfuncs and reduce the get-seqno routine to a single read of a cached memory location. v2: Fix semaphore_passed() to look at the signaling engine (not the waiter's) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-8-git-send-email-chris@chris-wilson.co.uk	2016-07-01 20:58:48 +01:00
Chris Wilson	688e6c7258	drm/i915: Slaughter the thundering i915_wait_request herd One particularly stressful scenario consists of many independent tasks all competing for GPU time and waiting upon the results (e.g. realtime transcoding of many, many streams). One bottleneck in particular is that each client waits on its own results, but every client is woken up after every batchbuffer - hence the thunder of hooves as then every client must do its heavyweight dance to read a coherent seqno to see if it is the lucky one. Ideally, we only want one client to wake up after the interrupt and check its request for completion. Since the requests must retire in order, we can select the first client on the oldest request to be woken. Once that client has completed his wait, we can then wake up the next client and so on. However, all clients then incur latency as every process in the chain may be delayed for scheduling - this may also then cause some priority inversion. To reduce the latency, when a client is added or removed from the list, we scan the tree for completed seqno and wake up all the completed waiters in parallel. Using igt/benchmarks/gem_latency, we can demonstrate this effect. The benchmark measures the number of GPU cycles between completion of a batch and the client waking up from a call to wait-ioctl. With many concurrent waiters, with each on a different request, we observe that the wakeup latency before the patch scales nearly linearly with the number of waiters (before external factors kick in making the scaling much worse). After applying the patch, we can see that only the single waiter for the request is being woken up, providing a constant wakeup latency for every operation. However, the situation is not quite as rosy for many waiters on the same request, though to the best of my knowledge this is much less likely in practice. Here, we can observe that the concurrent waiters incur extra latency from being woken up by the solitary bottom-half, rather than directly by the interrupt. This appears to be scheduler induced (having discounted adverse effects from having a rbtree walk/erase in the wakeup path), each additional wake_up_process() costs approximately 1us on big core. Another effect of performing the secondary wakeups from the first bottom-half is the incurred delay this imposes on high priority threads - rather than immediately returning to userspace and leaving the interrupt handler to wake the others. To offset the delay incurred with additional waiters on a request, we could use a hybrid scheme that did a quick read in the interrupt handler and dequeued all the completed waiters (incurring the overhead in the interrupt handler, not the best plan either as we then incur GPU submission latency) but we would still have to wake up the bottom-half every time to do the heavyweight slow read. Or we could only kick the waiters on the seqno with the same priority as the current task (i.e. in the realtime waiter scenario, only it is woken up immediately by the interrupt and simply queues the next waiter before returning to userspace, minimising its delay at the expense of the chain, and also reducing contention on its scheduler runqueue). This is effective at avoid long pauses in the interrupt handler and at avoiding the extra latency in realtime/high-priority waiters. v2: Convert from a kworker per engine into a dedicated kthread for the bottom-half. v3: Rename request members and tweak comments. v4: Use a per-engine spinlock in the breadcrumbs bottom-half. v5: Fix race in locklessly checking waiter status and kicking the task on adding a new waiter. v6: Fix deciding when to force the timer to hide missing interrupts. v7: Move the bottom-half from the kthread to the first client process. v8: Reword a few comments v9: Break the busy loop when the interrupt is unmasked or has fired. v10: Comments, unnecessary churn, better debugging from Tvrtko v11: Wake all completed waiters on removing the current bottom-half to reduce the latency of waking up a herd of clients all waiting on the same request. v12: Rearrange missed-interrupt fault injection so that it works with igt/drv_missed_irq_hang v13: Rename intel_breadcrumb and friends to intel_wait in preparation for signal handling. v14: RCU commentary, assert_spin_locked v15: Hide BUG_ON behind the compiler; report on gem_latency findings. v16: Sort seqno-groups by priority so that first-waiter has the highest task priority (and so avoid priority inversion). v17: Add waiters to post-mortem GPU hang state. v18: Return early for a completed wait after acquiring the spinlock. Avoids adding ourselves to the tree if the is already complete, and skips the awkward question of why we don't do completion wakeups for waits earlier than or equal to ourselves. v19: Prepare for init_breadcrumbs to fail. Later patches may want to allocate during init, so be prepared to propagate back the error code. Testcase: igt/gem_concurrent_blit Testcase: igt/benchmarks/gem_latency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: "Goel, Akash" <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18 Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk	2016-07-01 20:58:43 +01:00
Tvrtko Ursulin	1b9e665064	drm/i915: Compact Gen8 semaphore initialization Replace the macro initializer with a programatic loop which results in smaller code and hopefully just as clear. v2: Rebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2016-06-30 17:20:45 +01:00
Chris Wilson	e2efd13007	drm/i915: Rename struct intel_context Our goal is to rename the anonymous per-engine struct beneath the current intel_context. However, after a lively debate resolving around the confusion between intel_context_engine and intel_engine_context, the realisation is that the two structs target different users. The outer struct is API / user facing, and so carries the higher level GEM information. The inner struct is hw facing. Thus we want to name the inner struct intel_context and the outer one i915_gem_context. As the first step, we need to rename the current struct: s/struct intel_context/struct i915_gem_context/ which fits much better with its constructors already conveying the i915_gem_context prefix! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-1-git-send-email-chris@chris-wilson.co.uk	2016-05-24 15:27:14 +01:00
Chris Wilson	c033666a94	drm/i915: Store a i915 backpointer from engine, and use it text data bss dec hex filename 6309351 3578714 696320 10584385 a18141 vmlinux 6308391 3578714 696320 10583425 a17d81 vmlinux Almost 1KiB of code reduction. v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions text data bss dec hex filename 6304579 3578778 696320 10579677 a16edd vmlinux 6303427 3578778 696320 10578525 a16a5d vmlinux Now over 1KiB! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk	2016-05-09 13:41:24 +01:00
Chris Wilson	215a7e3210	drm/i915: Fix gen8 semaphores id for legacy mode With the introduction of a distinct engine->id vs the hardware id, we need to fix up the value we use for selecting the target engine when signaling a semaphore. Note that these values can be merged with engine->guc_id. Fixes: `de1add3605` Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461932305-14637-3-git-send-email-chris@chris-wilson.co.uk	2016-04-29 13:59:26 +01:00
Chris Wilson	a58c01aa6e	drm/i915: Apply strongly ordered RCS breadcrumb to gen8/legacy For legacy ringbuffer mode, we need the new ordered breadcrumb emission tried and tested on execlists in order to avoid the dreaded "missed interrupt" syndrome. A secondary advantage of the execlists method is that it writes to an arbitrary address, useful if one wants to write a breadcrumb elsewhere. This fix is taken from commit `7c17d37737` (drm/i915: Use ordered seqno write interrupt generation on gen8+ execlists). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461932305-14637-1-git-send-email-chris@chris-wilson.co.uk	2016-04-29 13:58:26 +01:00
Chris Wilson	596e5efc69	drm/i915: Bump reserved size for legacy gen8 semaphore emission With 5 rings and a flush, we need 192 bytes of space to emit the breadcrumb and semaphores. However, we need some spare room the size of the single largest packet (36 dwords, 144 bytes) to accommodate wraparound giving a grand total of 336 bytes Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461917226-9132-1-git-send-email-chris@chris-wilson.co.uk	2016-04-29 10:20:51 +01:00
Tvrtko Ursulin	e39d42fa7e	drm/i915: Stop tracking execlists retired requests With the previous patch having extended the pinned lifetime of contexts by referencing the previous context from the current request until the latter is retired (completed by the GPU), we can now remove usage of execlist retired queue entirely. This is because the above now guarantees that all execlist object access requirements are satisfied by this new tracking, and we can stop taking additional references and stop keeping request on the execlists retired queue. The latter was a source of significant scalability issues in the driver causing performance hits on some tests. Most dramatical of which was igt/gem_close_race which had run time in tens of minutes which is now reduced to tens of seconds. Signed-off-by: Tvrtko Ursulin <tvrtko@ursulin.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-24-git-send-email-chris@chris-wilson.co.uk	2016-04-28 12:17:32 +01:00
Chris Wilson	0251a96322	drm/i915: Remove the identical implementations of request space reservation Now that we share intel_ring_begin(), reserving space for the tail of the request is identical between legacy/execlists and so the tautology can be removed. In the process, we move the reserved space tracking from the ringbuffer on to the request. This is to enable us to reorder the reserved space allocation in the next patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-13-git-send-email-chris@chris-wilson.co.uk	2016-04-28 12:17:32 +01:00
Chris Wilson	987046ad65	drm/i915: Unify intel_ring_begin() Combine the near identical implementations of intel_logical_ring_begin() and intel_ring_begin() - the only difference is that the logical wait has to check for a matching ring (which is assumed by legacy). In the process some debug messages are culled as there were following a WARN if we hit an actual error. v2: Updated commentary Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-12-git-send-email-chris@chris-wilson.co.uk	2016-04-28 12:17:32 +01:00
Tvrtko Ursulin	3756685a18	drm/i915: Only grab correct forcewake for the engine with execlists Rather than blindly waking up all forcewake domains on command submission, we can teach each engine what is (or are) the correct one to take. On platforms with multiple forcewake domains like VLV, CHV, SKL and BXT, this has the potential of lowering the GPU and CPU power use and submission latency. To implement it we add a function named intel_uncore_forcewake_for_reg whose purpose is to query which forcewake domains need to be taken to read or write a specific register with raw mmio accessors. These enables the execlists engine setup to query which forcewake domains are relevant per engine on the currently running platform. v2: * Kerneldoc. * Split from intel_uncore.c macro extraction, WARN_ON, no warns on old platforms. (Chris Wilson) v3: * Single domain per engine, mention all registers, bi-directional function and a new name, fix handling of gen6 and gen7 writes. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1460468251-14069-1-git-send-email-tvrtko.ursulin@linux.intel.com	2016-04-12 15:35:22 +01:00
Chris Wilson	5dd8e50c27	drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor When reading from the HWS page, we use barrier() to prevent the compiler optimising away the read from the volatile (may be updated by the GPU) memory address. This is more suited to READ_ONCE(); make it so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-5-git-send-email-chris@chris-wilson.co.uk	2016-04-09 12:09:59 +01:00

1 2 3 4 5 ...

267 Commits