linux

Author	SHA1	Message	Date
Chris Wilson	b47161858b	drm/i915: Implement inter-engine read-read optimisations Currently, we only track the last request globally across all engines. This prevents us from issuing concurrent read requests on e.g. the RCS and BCS engines (or more likely the render and media engines). Without semaphores, we incur costly stalls as we synchronise between rings - greatly impacting the current performance of Broadwell versus Haswell in certain workloads (like video decode). With the introduction of reference counted requests, it is much easier to track the last request per ring, as well as the last global write request so that we can optimise inter-engine read read requests (as well as better optimise certain CPU waits). v2: Fix inverted readonly condition for nonblocking waits. v3: Handle non-continguous engine array after waits v4: Rebase, tidy, rewrite ring list debugging v5: Use obj->active as a bitfield, it looks cool v6: Micro-optimise, mostly involving moving code around v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf) v8: Rebase v9: Refactor i915_gem_object_sync() to allow the compiler to better optimise it. Benchmark: igt/gem_read_read_speed hsw:gt3e (with semaphores): Before: Time to read-read 1024k: 275.794µs After: Time to read-read 1024k: 123.260µs hsw:gt3e (w/o semaphores): Before: Time to read-read 1024k: 230.433µs After: Time to read-read 1024k: 124.593µs bdw-u (w/o semaphores): Before After Time to read-read 1x1: 26.274µs 10.350µs Time to read-read 128x128: 40.097µs 21.366µs Time to read-read 256x256: 77.087µs 42.608µs Time to read-read 512x512: 281.999µs 181.155µs Time to read-read 1024x1024: 1196.141µs 1118.223µs Time to read-read 2048x2048: 5639.072µs 5225.837µs Time to read-read 4096x4096: 22401.662µs 21137.067µs Time to read-read 8192x8192: 89617.735µs 85637.681µs Testcase: igt/gem_concurrent_blit (read-read and friends) Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8] [danvet: s/\<rq\>/req/g] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-21 15:11:42 +02:00
Mika Kuoppala	b3da4a627e	drm/i915: Free wa_batchbuffer when freeing error state wa_batchbuffer is part of some error states. Make sure it is freed. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-05-08 13:03:37 +02:00
Chris Wilson	94f8cf109e	drm/i915: Record ring->start address in error state This is mostly useful for execlists where the rings switch between contexts (and so checking that the ring's start register matches the context is important). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-10 08:56:07 +02:00
Mika Kuoppala	6c826f3495	drm/i915: Add fault address to error state for gen8 and gen9 The faulting virtual address is >32bits and has been moved to different registers. Add to error state and output upper register first, in the same line for easy reconstruction of the fault address. v2: correct gen masking (Michel) v3: s/TBL/TLB (Ville) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-03-25 18:23:44 +01:00
Michel Thierry	0b37a9a9eb	drm/i915: Do not leak objects after capturing error state While running kmemleak chasing a different memleak, I saw that the capture_error_state function was leaking some objects, for example: unreferenced object 0xffff8800a9b72148 (size 8192): comm "kworker/u16:0", pid 1499, jiffies 4295201243 (age 990.096s) hex dump (first 32 bytes): 00 00 04 00 00 00 00 00 5d f4 ff ff 00 00 00 00 ........]....... 00 30 b0 01 00 00 00 00 37 00 00 00 00 00 00 00 .0......7....... backtrace: [<ffffffff811e5ae4>] create_object+0x104/0x2c0 [<ffffffff8178f50a>] kmemleak_alloc+0x7a/0xc0 [<ffffffff811cde4b>] __kmalloc+0xeb/0x220 [<ffffffffa038f1d9>] kcalloc.constprop.12+0x2d/0x2f [i915] [<ffffffffa0316064>] i915_capture_error_state+0x3f4/0x1660 [i915] [<ffffffffa03207df>] i915_handle_error+0x7f/0x660 [i915] [<ffffffffa03210f7>] i915_hangcheck_elapsed+0x2e7/0x470 [i915] [<ffffffff8108d574>] process_one_work+0x144/0x490 [<ffffffff8108dfbd>] worker_thread+0x11d/0x530 [<ffffffff81094079>] kthread+0xc9/0xe0 [<ffffffff817a2398>] ret_from_fork+0x58/0x90 [<ffffffffffffffff>] 0xffffffffffffffff The following objects are allocated in i915_gem_capture_buffers, but not released in i915_error_state_free: - error->active_bo_count - error->pinned_bo - error->pinned_bo_count - error->active_bo[vm_count] (allocated in i915_gem_capture_vm). The leaks were introduced by commit `95f5301dd8` Author: Ben Widawsky <ben@bwidawsk.net> Date: Wed Jul 31 17:00:15 2013 -0700 drm/i915: Update error capture for VMs v2: Reuse iterator and add culprit commit details (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-03-20 15:46:46 +01:00
Mika Kuoppala	071c92de1d	drm/i915: Add process identifier to requests We use the pid of the process which opened our device when we track which was the culprit of the gpu hang. But as that file descriptor might get inherited, we might blame the wrong process when we record the error state. Track process identifiers in requests to always find the correct offender. v2: Track only user processes (Chris) Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> [danvet: drop NULL check before put_pid as suggested by Chris.] Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-02-13 23:28:37 +01:00
Nick Hoath	72f95afa5f	drm/i915: Removed duplicate members from submit_request Where there were duplicate variables for the tail, context and ring (engine) in the gem request and the execlist queue item, use the one from the request and remove the duplicate from the execlist queue item. Issue: VIZ-4274 v1: Rebase v2: Fixed build issues. Keep separate postfix & tail pointers as these are used in different ways. Reinserted missing full tail pointer update. Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-01-27 09:50:52 +01:00
Rodrigo Vivi	563f94f6fa	drm/i915: Organize INSTDONE report for future. Let's be optimistic that for future platforms this will remain the same and reorg a bit. This reorg in if blocks instead of switch make life easier for future platform support addition. Cc: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-17 18:18:43 +01:00
Rodrigo Vivi	74745b0938	drm/i915: Organize PDP regs report for future. Let's be optimistic that for future platforms this will remain the same and reorg a bit. This reorg in if blocks instead of switch make life easier for future platform support addition. Cc: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-17 18:18:34 +01:00
Rodrigo Vivi	ce38ab0593	drm/i915: Organize Fence registers for future enablement. Let's be optimistic that for future platforms this will remain the same and reorg a bit. This reorg in if blocks instead of switch make life easier for future platform support addition. v2: Jani pointed out I was missing reg_830 for some gen3 platforms. So let's make this platforms subcases of Gen checks. Cc: Jani Nikula <jani.nikula@intel.com> Cc: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-17 18:17:54 +01:00
Tvrtko Ursulin	fe14d5f4e5	drm/i915: Infrastructure for supporting different GGTT views per object Things like reliable GGTT mappings and mirrored 2d-on-3d display will need to map objects into the same address space multiple times. Added a GGTT view concept and linked it with the VMA to distinguish between multiple instances per address space. New objects and GEM functions which do not take this new view as a parameter assume the default of zero (I915_GGTT_VIEW_NORMAL) which preserves the previous behaviour. This now means that objects can have multiple VMA entries so the code which assumed there will only be one also had to be modified. Alternative GGTT views are supposed to borrow DMA addresses from obj->pages which is DMA mapped on first VMA instantiation and unmapped on the last one going away. v2: * Removed per view special casing in i915_gem_ggtt_prepare / finish_object in favour of creating and destroying DMA mappings on first VMA instantiation and last VMA destruction. (Daniel Vetter) * Simplified i915_vma_unbind which does not need to count the GGTT views. (Daniel Vetter) * Also moved obj->map_and_fenceable reset under the same check. * Checkpatch cleanups. v3: * Only retire objects once the last VMA is unbound. v4: * Keep scatter-gather table for alternative views persistent for the lifetime of the VMA. * Propagate binding errors to callers and handle appropriately. v5: * Explicitly look for normal GGTT view in i915_gem_obj_bound to align usage in i915_gem_object_ggtt_unpin. (Michel Thierry) * Change to single if statement in i915_gem_obj_to_ggtt. (Michel Thierry) * Removed stray semi-colon in i915_gem_object_set_cache_level. For: VIZ-4544 Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Michel Thierry <michel.thierry@intel.com> [danvet: Drop hunk from i915_gem_shrink since it's just prettification but upsets a __must_check warning.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-15 11:25:04 +01:00
John Harrison	41c5241555	drm/i915: Remove the now redundant 'obj->ring' The ring member of the object structure was always updated with the last_read_seqno member. Thus with the conversion to last_read_req, obj->ring is now a direct copy of obj->last_read_req->ring. This makes it somewhat redundant and potentially misleading (especially as there was no comment to explain its purpose). This checkin removes the redundant field. Many uses were simply testing for non-null to see if the object is active on the GPU. Some of these have been converted to check 'obj->active' instead. Others (where the last_read_req is about to be used anyway) have been changed to check obj->last_read_req. The rest simply pull the ring out from the request structure and proceed as before. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:23 +01:00
John Harrison	97b2a6a10a	drm/i915: Replace last_[rwf]_seqno with last_[rwf]_req The object structure contains the last read, write and fenced seqno values for use in synchronisation operations. These have now been replaced with their request structure counterparts. Note that to ensure that objects do not end up with dangling pointers, the assignments of last_*_req include reference count updates. Thus a request cannot be freed if an object is still hanging on to it for any reason. v2: Corrected 'last_rendering_' to 'last_read_' in a number of comments that did not get updated when 'last_rendering_seqno' became 'last_read\|write_seqno' several millennia ago. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-12-03 09:35:14 +01:00
Daniel Vetter	4feb765943	drm/i915: Remove user pinning code Now unused. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-12-03 09:35:11 +01:00
Daniel Vetter	cc1df8a3fe	drm/i915: Use ggtt error obj capture helper for gen8 semaphores Spotted while reading and trying to understand how our error capture code deals with full ppgtt. Cc: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-11-20 16:59:49 +01:00
Daniel Vetter	77c1aa84de	drm/i915: Don't print header in error state for non-existing CS This goes back to commit `362b8af7ad` Author: Ben Widawsky <benjamin.widawsky@intel.com> Date: Thu Jan 30 00:19:38 2014 -0800 drm/i915: Move per ring error state to ring_error Spotted while reading error states. Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-11-18 16:23:13 +01:00
Mika Kuoppala	0b5492d6b5	drm/i915: Add gen to the gpu hang ecode for the Brothers in Triage Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-07 18:42:22 +01:00
Tvrtko Ursulin	aff437667b	drm/i915: Move flags describing VMA mappings into the VMA If these flags are on the object level it will be more difficult to allow for multiple VMAs per object. v2: Simplification and cleanup after code review comments (Chris Wilson). Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-11-04 14:04:51 +01:00
Daniel Vetter	955e36d0b4	Merge branch 'topic/skl-stage1' into drm-intel-next-queued SKL stage 1 patches still need polish so will likely miss the 3.18 merge window. We've decided to postpone to 3.19 so let's pull this in to make patch merging and conflict handling easier. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>	2014-09-30 22:36:57 +02:00
Damien Lespiau	2a9b753966	drm/i915/skl: Report the PDP regs as in gen8 Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-24 14:52:00 +02:00
Damien Lespiau	2fcdcd8a2e	drm/i915/skl: report the same INSTDONE registers as gen8 Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-24 14:52:00 +02:00
Damien Lespiau	01209dd56e	drm/i915/skl: Fence registers on SKL are the same as SNB v2: Rebased on top of the i915_gpu_error.c extraction. Reviewed-by: Thomas Wood <thomas.wood@intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-24 14:33:13 +02:00
Daniel Vetter	5b254c5978	drm/i915: Clarify gpu_error.lock locking i915_capture_error_state can be called from all kinds of contexts, so needs the full irqsave dance. But the other two places to grab and release the error state are only called from process context. So simplify them to the plain _irq spinlock versions to clarify the locking semantics. Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-19 14:43:18 +02:00
Chris Wilson	0a4cd7c8c8	drm/i915: Differentiate between LLC or snooped for the user Rather than describing an object as either "snooped or LLC", we can do better as we should know what machine we are running on! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-03 11:04:25 +02:00
Oscar Mateo	9075e52fac	drm/i915/bdw: Make sure error capture keeps working with Execlists Since the ringbuffer does not belong per engine anymore, we have to make sure that we are always recording the correct ringbuffer. TODO: This is only a small fix to keep basic error capture working, but we need to add more information for it to be useful (e.g. dump the context being executed). v2: Reorder how the ringbuffer is chosen to clarify the change and rename the variable, both changes suggested by Chris Wilson. Also, add the TODO comment to the code, as suggested by Daniel. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-03 10:54:28 +02:00
Chris Wilson	87a01e822d	drm/i915: Suppress a WARN on reading an object back for a GPU hang Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-03 10:54:26 +02:00
Chris Wilson	8ae62dc62b	drm/i915: Remove num_pages parameter to i915_error_object_create() For cleanliness, i915_error_object_create() was written to handle the NULL pointer in a central location. The macro that wrapped it and passed it a num_pages to use, was not safe. As we now never limit the num_pages to use (we did so at one point to only capture the first page of the context), we can remove the redundant macro and be NULL safe again. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-03 10:54:24 +02:00
Chris Wilson	b3c3f5e69e	drm/i915: Do not access stolen memory directly by the CPU, even for error capture For stolen pages, since it is verboten to access them directly on many architectures, we have to read them through the GTT aperture. If they are not accessible through the aperture, then we have to abort. This was complicated by commit `8b6124a633` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jan 30 14:38:16 2014 +0000 drm/i915: Don't access snooped pages through the GTT (even for error capture) and the desire to use stolen memory for ringbuffers, contexts and batches in the future. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-09-03 10:54:23 +02:00
Chris Wilson	3a44873490	drm/i915: Print captured bo for all VM in error state The current error state harks back to the era of just a single VM. For full-ppgtt, we capture every bo on every VM. It behoves us to then print every bo for every VM, which we currently fail to do and so miss vital information in the error state. v2: Use the vma address rather than -1! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-14 16:52:41 +02:00
Daniel Vetter	ae6c480692	drm/i915: Only track real ppgtt for a context There's a bit a confusion since we track the global gtt, the aliasing and real ppgtt in the ctx->vm pointer. And not all callers really bother to check for the different cases and just presume that it points to a real ppgtt. Now looking closely we don't actually need ->vm to always point at an address space - the only place that cares actually has fixup code already to decide whether to look at the per-proces or the global address space. So switch to just tracking the ppgtt directly and ditch all the extraneous code. v2: Fixup the ppgtt debugfs file to not oops on a NULL ctx->ppgtt. Also drop the early exit - without aliasing ppgtt we want to dump all the ppgtts of the contexts if we have full ppgtt. v3: Actually git add the compile fix. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Cc: "Thierry, Michel" <michel.thierry@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> OTC-Jira: VIZ-3724 [danvet: Resolve conflicts with execlist patches while applying.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-13 14:23:33 +02:00
Rodrigo Vivi	885ea5a813	drm/i915: Fix DEIER and GTIER collecting for BDW. BDW has many other Display Engine interrupts and GT interrupts registers. Collecting it properly on gpu_error_state. On debugfs all was properly listed already but besides we were also listing old DEIER and GTIER that doesn't exist on BDW anymore. This was causing unclaimed register messages v2: Fix small issues of first version and don't read DEIER regs when pipe's power well is disabled v3: bikeshed accepted: use enum pipe pipe instead of int i for pipe interection v4: Ben notice previous version was checking for display_power_enabled without using propper locks. Using _unlocked version isn't reliable and we cannot get this registers when power well is off. So let's avoid getting all DE_IER per pipe for now. If someone think this is an useful information it can be added later. v5: Ben: put back debugfs stuff that might be coverred by pm_get and use gen >= 8 trying to predict future. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81701 Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: (v3) Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-07 14:04:08 +02:00
Mika Kuoppala	f260fe7b2f	drm/i915: Don't accumulate hangcheck score on forward progress If the actual head has progressed forward inside a batch (request), don't accumulate hangcheck score. As the hangcheck score in increased only by acthd jumping backwards, the result is that we only declare an active batch as stuck if it is trapped inside a loop. Or that the looping will dominate the batch progression so that it overcomes the bonus that forward progress gives. v2: Improved commit message (Chris Wilson) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> [danvet: s/active_loop/active (loop)/ as requested by Chris.] Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-07 14:04:07 +02:00
Rodrigo Vivi	843db716a9	drm/i915: Collect gtier properly on HSW. GTIER and DEIER doesn't have same interface on HSW so this "or" operation makes the information provided useless. v2: since we have gtier variable already let's split for everybody and avoid the strange \| op. Also avoid overriding the value that was set for vlv. In this case I believe that we should reorganize the whole function, but I'll respect the comment that ask to not touch the order and let this organization work to be done later. v3: moving VLV check to the right place. Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-07 11:07:20 +02:00
Rodrigo Vivi	864c61811c	drm/i915: Fix error state collecting Fix signal_offset when recording semaphore state on BDW. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-07 11:07:18 +02:00
Rodrigo Vivi	b4558b46d5	drm/i915: Fix possible overflow when recording semaphore states. semaphore _sync_seqno, _seqno and _mbox are smaller than number of rings. This optimization is to remove the ring itself from the list and the logic to do that is at intel_ring_sync_index as below: /* * rcs -> 0 = vcs, 1 = bcs, 2 = vecs, 3 = vcs2; * vcs -> 0 = bcs, 1 = vecs, 2 = vcs2, 3 = rcs; * bcs -> 0 = vecs, 1 = vcs2. 2 = rcs, 3 = vcs; * vecs -> 0 = vcs2, 1 = rcs, 2 = vcs, 3 = bcs; * vcs2 -> 0 = rcs, 1 = vcs, 2 = bcs, 3 = vecs; */ v2: Skip when from == to (Damien). v3: avoid computing idx when from == to (Damien). use ring == to instead of ring->id == to->id (Damien). use continue instead of return (Rodrigo). v4: avoid all unecessary computation (Damien). reduce idx to loop scope (Damien). Cc: Damien Lespiau <damien.lespiau@intel.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 07:05:41 +02:00
Ben Widawsky	36362ad3b6	drm/i915/error: Check the potential ctx obj's vm The bound list is global (all objects which back the VMAs are stored here). Recently the BUG() in the offset lookup was demoted to a WARN, but the fault actually lies in the caller, here. This bug has existed since the initial introduction of PPGTT (however, it was fixed in unmerged patches to fix up the error state). Note: The reason for the BUG_ON to WARN_ON demotion was _not_ to duct-tape over this bug here but another but triggerable without ppgtt. See the commit for details: commit `f25748ea73` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Jun 17 22:34:38 2014 +0200 drm/i915: Don't BUG_ON in i915_gem_obj_offset A WARN_ON is perfectly fine. The BUG in here seems to be the cause behind hard-hangs when I cat the i915_gem_pageflip debugfs file (which calls this from an irq spinlock). But only while running a full igt run after a while. I still need to root cause the underlying issue. I'll also start reject patches which add new BUG_ON but don't come with a really good justification for it. The general rule really should be to just WARN and hope the driver survives for long enough. v2: Make the WARN a bit more useful per Chris' suggestion. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Clarfy that the WARN_ON (former BUG_ON) in ggtt_offset caught more than just this bug fixed in this patch here.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 07:05:38 +02:00
Ben Widawsky	0ca36d7839	drm/i915/bdw: collect semaphore error state Since the semaphore information is in an object, just dump it, and let the user parse it later. NOTE: The page being used for the semaphores are incoherent with the CPU. No matter what I do, I cannot figure out a way to read anything but 0s. Note that the semaphore waits are indeed working. v2: Don't print signal, and wait (they should be the same). Instead, print sync_seqno (Chris) v3: Free the semaphore error object (Chris) v4: Fix semaphore offset calculation during error state collection (Ville) v5: VCS2 rebase Make semaphore object error capture coding style consistent (Ville) Do the proper math for the signal offset (Ville) v6: Fix small conflicts on rebase and s/ring_buffer/engine_cs (Rodrigo) Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 23:16:54 +02:00
Ben Widawsky	87f85ebc8d	drm/i915: Extract semaphore error collection v2: s/ring_buffer/engine_cs (Rodrigo) Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 23:16:53 +02:00
Chris Wilson	eee73b4626	drm/i95: Initialize active ring->pid to -1 Otherwise we print out spurious processes on unused rings in the error state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2014-06-11 11:06:43 +03:00
Oscar Mateo	ee1b1e5ef3	drm/i915: Split the ringbuffers from the rings (2/3) This refactoring has been performed using the following Coccinelle semantic script: @@ struct intel_engine_cs r; @@ ( - (r).obj + r.buffer->obj \| - (r).virtual_start + r.buffer->virtual_start \| - (r).head + r.buffer->head \| - (r).tail + r.buffer->tail \| - (r).space + r.buffer->space \| - (r).size + r.buffer->size \| - (r).effective_size + r.buffer->effective_size \| - (r).last_retired_head + r.buffer->last_retired_head ) @@ struct intel_engine_cs *r; @@ ( - (r)->obj + r->buffer->obj \| - (r)->virtual_start + r->buffer->virtual_start \| - (r)->head + r->buffer->head \| - (r)->tail + r->buffer->tail \| - (r)->space + r->buffer->space \| - (r)->size + r->buffer->size \| - (r)->effective_size + r->buffer->effective_size \| - (r)->last_retired_head + r->buffer->last_retired_head ) @@ expression E; @@ ( - LP_RING(E)->obj + LP_RING(E)->buffer->obj \| - LP_RING(E)->virtual_start + LP_RING(E)->buffer->virtual_start \| - LP_RING(E)->head + LP_RING(E)->buffer->head \| - LP_RING(E)->tail + LP_RING(E)->buffer->tail \| - LP_RING(E)->space + LP_RING(E)->buffer->space \| - LP_RING(E)->size + LP_RING(E)->buffer->size \| - LP_RING(E)->effective_size + LP_RING(E)->buffer->effective_size \| - LP_RING(E)->last_retired_head + LP_RING(E)->buffer->last_retired_head ) Note: On top of this this patch also removes the now unused ringbuffer fields in intel_engine_cs. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> [danvet: Add note about fixup patch included here.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:27:25 +02:00
Oscar Mateo	a4872ba6d0	drm/i915: s/intel_ring_buffer/intel_engine_cs In the upcoming patches we plan to break the correlation between engine command streamers (a.k.a. rings) and ringbuffers, so it makes sense to refactor the code and make the change obvious. No functional changes. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:01:05 +02:00
Chris Wilson	5cc9ed4b9a	drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl By exporting the ability to map user address and inserting PTEs representing their backing pages into the GTT, we can exploit UMA in order to utilize normal application data as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making the intermixed streaming of CPU and GPU operations fairly efficient. This ability has many widespread implications from faster rendering of client-side software rasterisers (chromium), mitigation of stalls due to read back (firefox) and to faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL). v2: Compile with CONFIG_MMU_NOTIFIER v3: We can sleep while performing invalidate-range, which we can utilise to drop our page references prior to the kernel manipulating the vma (for either discard or cloning) and so protect normal users. v4: Only run the invalidate notifier if the range intercepts the bo. v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers v6: Recheck after reacquire mutex for lost mmu. v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary. v8: Fix rebasing error after forwarding porting the back port. v9: Limit the userptr to page aligned entries. We now expect userspace to handle all the offset-in-page adjustments itself. v10: Prevent vma from being copied across fork to avoid issues with cow. v11: Drop vma behaviour changes -- locking is nigh on impossible. Use a worker to load user pages to avoid lock inversions. v12: Use get_task_mm()/mmput() for correct refcounting of mm. v13: Use a worker to release the mmu_notifier to avoid lock inversion v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer with its own locking and tree of objects for each mm/mmu_notifier. v15: Prevent overlapping userptr objects, and invalidate all objects within the mmu_notifier range v16: Fix a typo for iterating over multiple objects in the range and rearrange error path to destroy the mmu_notifier locklessly. Also close a race between invalidate_range and the get_pages_worker. v17: Close a race between get_pages_worker/invalidate_range and fresh allocations of the same userptr range - and notice that struct_mutex was presumed to be held when during creation it wasn't. v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory for the struct sg_table and to clear it before reporting an error. v19: Always error out on read-only userptr requests as we don't have the hardware infrastructure to support them at the moment. v20: Refuse to implement read-only support until we have the required infrastructure - but reserve the bit in flags for future use. v21: use_mm() is not required for get_user_pages(). It is only meant to be used to fix up the kernel thread's current->mm for use with copy_user(). v22: Use sg_alloc_table_from_pages for that chunky feeling v23: Export a function for sanity checking dma-buf rather than encode userptr details elsewhere, and clean up comments based on suggestions by Bradley. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Akash Goel <akash.goel@intel.com> Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Frob ioctl allocation to pick the next one - will cause a bit of fuss with create2 apparently, but such are the rules.] [danvet2: oops, forgot to git add after manual patch application] [danvet3: Appease sparse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-16 19:31:29 +02:00
Ben Widawsky	ebc348b2ad	drm/i915: Move semaphore specific ring members to struct This will be helpful in abstracting some of the code in preparation for gen8 semaphores. v2: Move mbox stuff to a separate struct v3: Rebased over VCS2 work Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 10:56:52 +02:00
Imre Deak	f301b1e116	drm/i915: add missing error capturing of the PIPESTAT reg While checking the error capture path I noticed that we lacked the power domain-on check for PIPESTAT so fix this by moving that to where the rest of pipe registers are captured. The move also revealed that we actually don't include this register in the error report, so fix that too. v2: - patch introduced in v2 of the patchset v3: - add back !HAS_PCH_SPLIT check (Ville) [ Ignore my previous comment about the gen<=5 \|\| vlv check, I realized that it's the same as !HAS_PCH_SPLIT. ] Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:09:02 +02:00
Imre Deak	51660e0eb6	drm/i915: gen2: move error capture of IER to its correct place While checking the error capture path I noticed that this register is read twice for GEN2, so fix this and also move the read where it's done for other platforms. Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:09:01 +02:00
Zhao Yakui	845f74a701	drm/i915:Initialize the second BSD ring on BDW GT3 machine Based on the hardware spec, the BDW GT3 machine has two independent BSD rings that can be used to dispatch the video commands. So just initialize the second one. V3->V4: Follow Imre's comment to do some minor updates. For example: more comments are added to describe the semaphore between rings. Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> [danvet: Fix up checkpatch error.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:46 +02:00
Ben Widawsky	17d36749a5	drm/i915: Dump the whole context object. As we've learned over time, the HW context is just a series of GPU commands that we're able to decode without any changes in intel_error_decode. Since many bugs recently have been implicated in the HW context state, it makes sense to dump the whole context object in a form which can be parsed. Sample: render ring --- HW Context = 0x042db000 ringbuffer (render ring) at 0x0160c000; HEAD points to: 0x0160c000 0x0160c000: 0x00000000: MI_NOOP 0x0160c004: 0x00000000: MI_NOOP 0x0160c008: 0x00000000: MI_NOOP 0x0160c00c: 0x00000000: MI_NOOP 0x0160c010: 0x00000000: MI_NOOP 0x0160c014: 0x00000000: MI_NOOP 0x0160c018: 0x00000000: MI_NOOP 0x0160c01c: 0x00000000: MI_NOOP Unfortunately, our decoder isn't quite smart enough to deal with the variable length LRIs - but that is a tools problem. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Clarify commit message a bit, seems to have lost a few crucial words.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-09 14:37:13 +02:00
Ben Widawsky	13ffadd1f9	drm/i915/bdw: Expand FADD to 64bit For error state, like the recent modification to ACTHD, FADD also gets an upper dword. This is useful for debug to make sure the fetch address and head are similar. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-02 09:21:44 +02:00
Jani Nikula	50227e1cae	drm/i915: prefer struct drm_i915_private to drm_i915_private_t Remove the rest of the references to drm_i915_private_t. No functional changes. Signed-off-by: Jani Nikula <jani.nikula@intel.com> [danvet: Drop hunk in i915_cmd_parser.c] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-31 15:34:21 +02:00
Chris Wilson	e3243d1672	drm/i915: Split 64bit hexadecimal addresses to make them easier to read Broadwell introduces large address spaces, greater than 32bits in width. These require that we then store and print 64bit values. If we were to zero pad them out to 16 hexadecimal places, we would have to carefully count the leading zeroes, where it is easy to make a mistake. Conversely, if we do not zero pad out to 16, but keep padding to 8 hexadecimal places, it is very easy to miss an address that is actually larger than 4GiB. A suggested compromise is to insert a space between the upper and lower dwords of the address so that we can continue with our accustomed 32bit parser. (Alternatively, we could do the equivalent in our userspace decoder.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-28 18:33:15 +01:00


98 Commits