RANGE makes it longer, but clearer. We are also going to add a macro to
check an individual gen, so add the _RANGE suffix here.
Diff generated with:
sed 's/IS_GEN(/IS_GEN_RANGE(/g' drivers/gpu/drm/i915/{*/,}*.{c,h} -i
v2: use IS_GEN rather than GT_GEN
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-1-lucas.demarchi@intel.com
For a lot of use cases we need 64bit sequence numbers. Currently drivers
overload the dma_fence structure to store the additional bits.
Stop doing that and make the sequence number in the dma_fence always
64bit.
For compatibility with hardware which can only do 32bit sequences, the
comparison in __dma_fence_is_later() takes only the lower 32 bits as
significant when the upper 32 bits are all zero.
v2: change the logic in __dma_fence_is_later
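As a minimal sketch of that logic (using the upper_32_bits()/
lower_32_bits() helpers; close to, but reconstructed from memory of,
the final helper):

  static inline bool __dma_fence_is_later(u64 f1, u64 f2)
  {
          /* Drivers limited to 32bit sequence numbers leave the upper
           * bits zero; compare with 32bit wraparound in that case.
           */
          if (upper_32_bits(f1) || upper_32_bits(f2))
                  return f1 > f2;

          return (int)(lower_32_bits(f1) - lower_32_bits(f2)) > 0;
  }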
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Link: https://patchwork.freedesktop.org/patch/266927/
Currently we allocate a scratch page for each engine, but since we only
ever write into it for post-sync operations, it is not exposed to
userspace nor do we care for coherency. As we then do not care about its
contents, we can use one page for all, reducing our allocations and
avoiding complications by not assuming per-engine isolation.
For later use, it simplifies engine initialisation (by removing the
allocation that required struct_mutex!) and means that we can always rely
on there being a scratch page.
v2: Check that we allocated a large enough scratch for I830 w/a
Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5") # v4.18.20
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108850
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181204141522.13640-1-chris@chris-wilson.co.uk
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.18.20+
Convert the per context workaround handling code to run against the newly
introduced common workaround framework and fuse the two to use the
existing, smarter list-add helper: the one which does a sorted insert
and merges registers where possible.
This completes migration of all four classes of workarounds onto the
common framework.
Existing macros are kept untouched for smaller code churn.
v2:
* Rename the list to ctx_wa_list and move it from dev_priv to engine.
v3:
* API rename and parameters tweaking. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181203133357.10341-1-tvrtko.ursulin@linux.intel.com
Instead of having a separate list of white-listed registers we can
trivially move this to the common workarounds framework.
This brings us one step closer to the goal of driving all workaround
classes using the same code.
v2:
* Use GEM_DEBUG_WARN_ON for the sanity check. (Chris Wilson)
v3:
* API rename. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181203125014.3219-6-tvrtko.ursulin@linux.intel.com
We stopped re-applying the GT workarounds after engine reset since commit
59b449d5c8 ("drm/i915: Split out functions for different kinds of
workarounds").
The issue with this is that some of the GT workarounds live in MMIO
space which gets lost during engine resets. So far the registers in the
0x2xxx and 0xbxxx address ranges have been identified as affected.
This loss of applied workarounds has obvious negative effects and can
even lead to hard system hangs (see the linked Bugzilla).
Rather than simply restoring this re-application, we introduce a new
class of per-engine workarounds and move only the affected GT
workarounds over, because we have also observed that it is not safe to
just re-write all GT workarounds after engine resets (the GPU might be
live and weird hardware states can happen).
Using the framework introduced in the previous patch, we therefore,
after engine reset, re-apply only the workarounds living in the
affected MMIO address ranges.
v2:
* Move Wa_1406609255:icl to engine workarounds as well.
* Rename API. (Chris Wilson)
* Drop redundant IS_KABYLAKE. (Chris Wilson)
* Re-order engine w/a init so the latest platforms are first. (Rodrigo Vivi)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=107945
Fixes: 59b449d5c8 ("drm/i915: Split out functions for different kinds of workarounds")
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181203133341.10258-1-tvrtko.ursulin@linux.intel.com
When showing the list of waiters, include the task's status so that we
can tell if they have been woken up and are waiting for the CPU, or if
they are still waiting to be woken.
v2: task_state_to_char()
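For illustration, the waiter line can then fold the state in via
task_state_to_char(); the exact format string here is an assumption:

  drm_printf(m, "\t%s [%d:%c] waiting for %x\n",
             w->tsk->comm, w->tsk->pid,
             task_state_to_char(w->tsk), w->seqno);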
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181121151653.24595-1-chris@chris-wilson.co.uk
GEM_WARN_ON currently has dangerous semantics where it is completely
compiled out on !GEM_DEBUG builds. This can leave users who expect it to
be more like a WARN_ON, just without a warning in non-debug builds, in
complete ignorance.
Another gotcha with it is that it cannot be used as a statement, which
is again different from a standard kernel WARN_ON.
This patch fixes both problems by making it behave as one would expect.
It can now be used both as an expression and as a statement, and also the
condition evaluates properly in all builds - code under the conditional
will therefore not unexpectedly disappear.
To satisfy call sites which really want the code under the conditional to
completely disappear, we add GEM_DEBUG_WARN_ON and convert some of the
callers to it. This one can also be used as both expression and statement.
From the above it follows that GEM_DEBUG_WARN_ON should be used in
situations where we are certain the condition will be hit during
development, but at a place in the code where the error can be handled
to the benefit of not crashing the machine.
GEM_WARN_ON, on the other hand, should be used where the condition may
happen in production and we just want to distinguish the level of
debugging output emitted between the production and debug builds.
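A sketch of the resulting semantics (the macro bodies below are a
best-effort reconstruction, not a verbatim copy of the patch):

  #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
  #define GEM_WARN_ON(expr) WARN_ON(expr)
  #define GEM_DEBUG_WARN_ON(expr) GEM_WARN_ON(expr)
  #else
  /* condition still evaluated, merely without the warning */
  #define GEM_WARN_ON(expr) ({ unlikely(!!(expr)); })
  /* code under the conditional compiles away entirely */
  #define GEM_DEBUG_WARN_ON(expr) ({ BUILD_BUG_ON_INVALID(expr); 0; })
  #endif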
v2:
* Dropped BUG_ON hunk.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181012063142.16080-1-tvrtko.ursulin@linux.intel.com
A Clang build with UBSAN enabled leads to the following build error:
drivers/gpu/drm/i915/intel_engine_cs.o: In function `intel_engine_init_execlist':
drivers/gpu/drm/i915/intel_engine_cs.c:411: undefined reference to `__compiletime_assert_411'
Again, for this to work the code would first need to be inlined and then
constant folded, which doesn't work for Clang because semantic analysis
happens before optimization/inlining.
Use GEM_BUG_ON() instead of BUILD_BUG_ON().
v2: Use is_power_of_2() from log2.h (Chris)
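The check thus becomes a runtime assert in debug builds rather than
relying on the compiler to constant-fold; as a sketch (assuming the
execlists_num_ports() helper):

  /* was a BUILD_BUG_ON() that Clang could not constant-fold */
  GEM_BUG_ON(!is_power_of_2(execlists_num_ports(execlists)));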
References: http://mid.mail-archive.com/20181015203410.155997-1-swboyd@chromium.org
Reported-by: Stephen Boyd <swboyd@chromium.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181016122938.18757-2-jani.nikula@intel.com
We need an extra load-failure point to better test the error path in
i915_driver_init_mmio.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181011130008.24640-2-michal.wajdeczko@intel.com
As we are about to allow ourselves to slightly bump the user priority
into a few different sublevels, pack those internal priority lists
into the same i915_priolist to keep the rbtree compact and avoid having
to allocate the default user priority even after the internal bumping.
The downside to having a requests[] rather than a node per active list
is that we then have to walk over the empty higher priority lists. To
compensate, we track the active buckets and use a small bitmap to skip
over any inactive ones.
v2: Use MASK of internal levels to simplify our usage.
v3: Prevent overflow when SHIFT is zero.
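A sketch of the packed bucket; the member names follow the upstream
i915_priolist as best remembered, and process_bucket() is a stand-in
for the real consumer:

  struct i915_priolist {
          struct list_head requests[I915_PRIORITY_COUNT];
          struct rb_node node;
          unsigned long used; /* bitmap of non-empty requests[] */
          int priority;
  };

  /* walk only the populated buckets, skipping the inactive ones */
  unsigned long idx;

  for_each_set_bit(idx, &p->used, ARRAY_SIZE(p->requests))
          process_bucket(&p->requests[idx]);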
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001123204.23982-4-chris@chris-wilson.co.uk
In commit 9144d75e22 ("include/linux/bitops.h: introduce BITS_PER_TYPE"),
we made BITS_PER_TYPE available to all and now we can use the macro to
replace some open-coded computation of sizeof(T) * BITS_PER_BYTE.
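For reference, the helper is simply:

  #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)

so that, for example, sizeof(u64) * BITS_PER_BYTE becomes
BITS_PER_TYPE(u64).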
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180926104707.17410-1-chris@chris-wilson.co.uk
In order to reduce latency when checking for idle we kick the tasklet
directly. Sometimes this is not enough as it may be queued on another
cpu, so to improve the accuracy of this idle-check (and thereby reduce
latency overall by avoiding another pass, or worse, declaring a
timeout!) we wait for the tasklet to complete.
References: https://bugs.freedesktop.org/show_bug.cgi?id=107916
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-2-chris@chris-wilson.co.uk
Older gen use a physical address for the hardware status page, for which
we use cache-coherent writes. As the writes are into the cpu cache, we use
a normal WB mapped page to read the HWS, used for our seqno tracking.
Anecdotally, I observed lost breadcrumb writes into the HWS on i965gm,
which so far have not reoccurred with this patch. How reliable that
evidence is remains to be seen.
v2: Explicitly pass the expected physical address to the hw
v3: Also remember the wild writes we once had for HWS above 4G.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20180903152304.31589-2-chris@chris-wilson.co.uk
Since we no longer maintain our read position in the CSB pointers
register, it always returns 0 and not where we last read up to. As a
result the CSB probing in the state dumper starts from 0, either missing
entries or showing stale ones.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180821101138.15822-1-chris@chris-wilson.co.uk
If we pardon a per-engine reset, we may leave the STOP_RING bit asserted
in RING_MI_MODE resulting in the engine hanging. Unconditionally clear
it on the per-engine exit path as we know that either we skipped the
reset and so need the cancellation, or the reset was successful and the
cancellation is a no-op, or there was an error and we will follow up
with a full-reset or wedging (both of which will stop the engines again
as required).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107188
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106560
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180814171857.24673-1-chris@chris-wilson.co.uk
We have a few instances of checking seqno-1 to see if the HW has started
the request. Pull those together under a helper.
v2: Pull the !seqno assertion higher, as given seqno==1 we may indeed
check to see if we have started using seqno==0.
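A sketch of such a helper (close to the upstream i915_request_started(),
reconstructed from memory):

  static inline bool i915_request_started(const struct i915_request *rq)
  {
          u32 seqno;

          seqno = i915_request_global_seqno(rq);
          if (!seqno) /* not yet submitted to HW */
                  return false;

          return i915_seqno_passed(intel_engine_get_seqno(rq->engine),
                                   seqno - 1);
  }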
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180806112605.20725-1-chris@chris-wilson.co.uk
Using PAGE_SIZE for virtual offset alignment is superfluous as it is
equal to the minimum gtt alignment and so equivalent to 0. It is also
the wrong value to use as we stopped using physical page constructs for
the virtual GTT, i.e. it would be preferable to use I915_GTT_PAGE_SIZE
and in these cases merely imply I915_GTT_MIN_ALIGNMENT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180727091855.1879-1-chris@chris-wilson.co.uk
A reasonably common operation is to pin the map of the vma alongside the
vma itself for the lifetime of the vma, and so release both pins at the
same time as destroying the vma. It is common enough to pull into the
release function, making that central function more attractive to a
couple of other callsites.
The continual ulterior motive is to sweep over errors on module load
aborting...
Testcase: igt/drv_module_reload/basic-reload-inject
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180721125037.20127-1-chris@chris-wilson.co.uk
Inside intel_engine_is_idle(), we flush the tasklet to ensure that it is
being run in a timely fashion (ksoftirqd has taught us to expect the
worst). However, if we are in the middle of reset, the HW may not yet be
ready to execute the submission tasklet and so we must respect the
disable flag.
Fixes: dd0cf235d8 ("drm/i915: Speed up idle detection by kicking the tasklets")
Testcase: igt/drv_selftest/live_hangcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180713203529.1973-2-chris@chris-wilson.co.uk
The kernel recently gained an augmented rbtree with the purpose of
caching the leftmost element of the rbtree, a frequent optimisation to
avoid calls to rb_first() which is also employed by the
execlists->queue. Switch from our open-coded cache to the library.
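The conversion is mechanical; as a sketch, with the usual rbtree
insertion bookkeeping elided:

  struct rb_root_cached queue = RB_ROOT_CACHED;

  /* insert, telling the library whether this is the new leftmost */
  rb_link_node(&node, parent, p);
  rb_insert_color_cached(&node, &queue, first);

  /* O(1) lookup of the leftmost (highest priority) element */
  struct rb_node *rb = rb_first_cached(&queue);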
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180629075348.27358-9-chris@chris-wilson.co.uk
In the next patch, we will want a third distinct class of timeline that
may overlap with the current pair of client and engine timeline classes.
Rather than use the ad hoc markup of SINGLE_DEPTH_NESTING, initialise
the different timeline classes with an explicit subclass.
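A sketch of the idea (the subclass names here are hypothetical):

  enum {
          TIMELINE_CLIENT = 0, /* ordering within a client context */
          TIMELINE_ENGINE,     /* ordering within an engine */
  };

  static void timeline_lock_init(spinlock_t *lock, unsigned int subclass)
  {
          spin_lock_init(lock);
          lockdep_set_subclass(lock, subclass);
  }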
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706210710.16251-1-chris@chris-wilson.co.uk
Avoid looking at the magical engines[RCS] to decide if the HW and driver
supports logical contexts, and instead record that knowledge during
initialisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706101442.21279-1-chris@chris-wilson.co.uk
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 141432
Addresses-Coverity-ID: 141433
Addresses-Coverity-ID: 141434
Addresses-Coverity-ID: 141435
Addresses-Coverity-ID: 141436
Addresses-Coverity-ID: 1357360
Addresses-Coverity-ID: 1357403
Addresses-Coverity-ID: 1357433
Addresses-Coverity-ID: 1392622
Addresses-Coverity-ID: 1415273
Addresses-Coverity-ID: 1435752
Addresses-Coverity-ID: 1441500
Addresses-Coverity-ID: 1454596
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628223541.GA17665@embeddedor.com
Back in commit 27af5eea54 ("drm/i915: Move execlists irq handler to a
bottom half"), we came to the conclusion that running our CSB processing
and ELSP submission from inside the irq handler was a bad idea. A really
bad idea as we could impose nearly 1s latency on other users of the
system, on average! Deferring our work to a tasklet allowed us to do the
processing with irqs enabled, reducing the impact to an average of about
50us.
We have since eradicated the use of forcewaked mmio from inside the CSB
processing and ELSP submission, bringing the impact down to around 5us
(on Kabylake); an order of magnitude better than our measurements 2
years ago on Broadwell and only about 2x worse on average than the
gem_syslatency on an unladen system.
In this iteration of the tasklet-vs-direct submission debate, we seek a
compromise whereby we submit new requests immediately to the HW but
defer processing the CS interrupt onto a tasklet. We gain the advantage
of low-latency and ksoftirqd avoidance when waking up the HW, while
avoiding the system-wide starvation of our CS irq-storms.
Comparing the impact on the maximum latency observed (that is the time
stolen from an RT process) over a 120s interval, repeated several times
(using gem_syslatency, similar to RT's cyclictest) while the system is
fully laden with i915 nops, we see that direct submission can actually
improve the worst case.
Maximum latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 2)
x Always using tasklets (a couple of >1000us outliers removed)
+ Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
| + |
| + |
| + |
| + + |
| + + + |
| + + + + x x x |
| +++ + + + x x x x x x |
| +++ + ++ + + *x x x x x x |
| +++ + ++ + * *x x * x x x |
| + +++ + ++ * * +*xxx * x x xx |
| * +++ + ++++* *x+**xx+ * x x xxxx x |
| **x++++*++**+*x*x****x+ * +x xx xxxx x x |
|x* ******+***************++*+***xxxxxx* xx*x xxx + x+|
| |__________MA___________| |
| |______M__A________| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 118 91 186 124 125.28814 16.279137
+ 120 92 187 109 112.00833 13.458617
Difference at 95.0% confidence
-13.2798 +/- 3.79219
-10.5994% +/- 3.02677%
(Student's t, pooled s = 14.9237)
However the mean latency is adversely affected:
Mean latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 1)
x Always using tasklets
+ Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
| xxxxxx + ++ |
| xxxxxx + ++ |
| xxxxxx + +++ ++ |
| xxxxxxx +++++ ++ |
| xxxxxxx +++++ ++ |
| xxxxxxx +++++ +++ |
| xxxxxxx + ++++++++++ |
| xxxxxxxx ++ ++++++++++ |
| xxxxxxxx ++ ++++++++++ |
| xxxxxxxxxx +++++++++++++++ |
| xxxxxxxxxxx x +++++++++++++++ |
|x xxxxxxxxxxxxx x + + ++++++++++++++++++ +|
| |__A__| |
| |____A___| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 3.506 3.727 3.631 3.6321417 0.02773109
+ 120 3.834 4.149 4.039 4.0375167 0.041221676
Difference at 95.0% confidence
0.405375 +/- 0.00888913
11.1608% +/- 0.244735%
(Student's t, pooled s = 0.03513)
However, since the mean latency corresponds to the amount of irqsoff
processing we have to do for a CS interrupt, we only need to speed that
up to benefit not just system latency but our own throughput.
v2: Remember to defer submissions when under reset.
v4: Only use direct submission for new requests
v5: Be aware that with mixing direct tasklet evaluation and deferred
tasklets, we may end up idling before running the deferred tasklet.
v6: Remove the redundant likely() from tasklet_is_enabled(), restrict the
annotation to reset_in_progress().
v7: Take the full timeline.lock when enabling perf_pmu stats as the
tasklet is no longer a valid guard. A consequence is that the stats are
now only valid for engines also using the timeline.lock to process
state.
Testcase: igt/gem_exec_latency/*rthog*
References: 27af5eea54 ("drm/i915: Move execlists irq handler to a bottom half")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-9-chris@chris-wilson.co.uk
Now that we use the CSB stored in the CPU friendly HWSP, we do not need
to track interrupts for when the mmio CSB registers are valid and can
just check where we read up to last from the cached HWSP. This means we
can forgo the atomic bit tracking from interrupt, and in the next patch
it means we can check the CSB at any time.
v2: Change the splitting inside reset_prepare; we only want to lose
testing the interrupt in this patch, as the next patch requires the
change in locking.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-8-chris@chris-wilson.co.uk
Following the removal of the last workarounds, the only CSB mmio access
is for the old vGPU interface. The mmio registers presented by vGPU do
not require forcewake and can be treated as ordinary volatile memory,
i.e. they behave just like the HWSP access just at a different location.
We can reduce the CSB access to a set of read/write/buffer pointers and
treat the various paths identically and not worry about forcewake.
(Forcewake is a nightmare for worst-case latency, and we want to process
this all with irqsoff -- no latency allowed!)
v2: Comments, comments, comments. Well, 2 bonus comments.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-5-chris@chris-wilson.co.uk
commit b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on
reset/sanitization") and commit 1288786b18 ("drm/i915: Move GEM sanitize
from resume_early to resume") show the conflicting requirements on the
code. We must reset the GPU before trashing live state on a fast resume
(hibernation debug, or error paths), but we must only reset our state
tracking iff the GPU is reset (or power cycled). This is tricky if we
are disabling GPU reset to simulate broken hardware; we reset our state
tracking but the GPU is left intact and recovers from its stale state.
v2: Again without the assertion for forcewake, no longer required since
commit b3ee09a4de ("drm/i915/ringbuffer: Fix context restore upon reset")
as the contexts are reset from the CS ensuring everything is powered up.
Fixes: b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization")
Fixes: 1288786b18 ("drm/i915: Move GEM sanitize from resume_early to resume")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180616202534.18767-1-chris@chris-wilson.co.uk
Currently we use %08x for the row offset, and %08x for the binary
contents of the buffer. This makes it very easy to confuse the two, so
switch to using [%04x] for the start-of-row offset.
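i.e. a row of the dump now reads along the lines of (values
illustrative)

  [0040] 00000001 00000000 00000000 00000000

rather than

  00000040 00000001 00000000 00000000 00000000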
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180614094103.18025-3-chris@chris-wilson.co.uk
Sometimes we need to see what instructions we emitted for a request to
try and gather a glimmer of insight into what the GPU is doing when it
stops responding.
v2: Move ring dumping into its own routine
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180614122150.17552-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
After triggering the mm switch with a load of PD_DIR, which may be
deferred unto the MI_SET_CONTEXT on rcs, serialise the next commands
with that load by posting a read of PD_DIR (or else those subsequent
commands may access the stale page tables).
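A sketch of the serialising read emitted into the ring, reconstructed
from memory of the upstream flush_pd_dir(); the scratch destination is
illustrative:

  static int flush_pd_dir(struct i915_request *rq)
  {
          struct intel_engine_cs *engine = rq->engine;
          u32 *cs;

          cs = intel_ring_begin(rq, 4);
          if (IS_ERR(cs))
                  return PTR_ERR(cs);

          /* Stall until the page table load is complete */
          *cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
          *cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine));
          *cs++ = i915_ggtt_offset(engine->scratch);
          *cs++ = MI_NOOP;

          intel_ring_advance(rq, cs);
          return 0;
  }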
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611171825.13678-2-chris@chris-wilson.co.uk
The discovery with trying to enable full-ppgtt was that we were
completely failing to load both the mm and context following the
reset. Although we were performing mmio to set the PP_DIR (per-process
GTT) and CCID (context), these were having no effect (the assumption was
that this would trigger reload of the context and restore the page
tables). It was not until we performed the LRI + MI_SET_CONTEXT in a
following context switch that anything would occur.
Since we are then required to reset the context image and PP_DIR using
CS commands, we place those commands into every batch. The hardware
should recognise the no-ops and eliminate the expensive context loads,
but we still have to pay the cost of using cross-powerwell register
writes. In practice, this has no effect on actual context switch times,
and only adds a few hundred nanoseconds to no-op switches. We can improve
the latter by eliminating the w/a around known no-op switches, but there
is an ulterior motive to keeping them.
Always emitting the context switch at the beginning of the request (and
relying on HW to skip unneeded switches) does have one key advantage.
Should we implement request reordering on Haswell, we will not know in
advance what the previous executing context was on the GPU and so we
would not be able to elide the MI_SET_CONTEXT commands ourselves and
always have to emit them. Having our hand forced now actually prepares
us for later.
Now since that context and mm follow the request, we no longer (and not
for a long time since requests took over!) require a trace point to tell
when we write the switch into the ring, since we now always do. (This is
even more important when you remember that simply writing into the ring
bears no relation to the current mm.)
v2: Sandybridge has to agree to use LRI as well.
Testcase: igt/drv_selftests/live_hangcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk
In the near future, I want to subclass gen6_hw_ppgtt as it contains a
few specialised members and I wish to add more. To avoid the ugliness of
using ppgtt->base.base, rename the i915_hw_ppgtt base member
(i915_address_space) as vm, which is our common shorthand for an
i915_address_space local.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605153758.18422-1-chris@chris-wilson.co.uk
WaProgramMgsrForCorrectSliceSpecificMmioReads applies for Icelake as
well.
References: HSD#1405586840, BSID#0575
v2:
- GEN11 mask is different from its predecessors. (Oscar)
- Better separate GEN10 and GEN11. (Oscar)
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Yunwei Zhang <yunwei.zhang@intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1526683232-24753-1-git-send-email-yunwei.zhang@intel.com
WaProgramMgsrForCorrectSliceSpecificMmioReads dictates that before any
MMIO read into Slice/Subslice specific registers, the MCR packet control
register (0xFDC) needs to be programmed to point to any enabled
slice/subslice pair. Otherwise, an incorrect value will be returned.
However, that means each subsequent MMIO read will be forwarded to a
specific slice/subslice combination as the read is unicast. This is OK
since slice/subslice specific register values are consistent in almost
all cases across slice/subslice. There are rare occasions, such as
INSTDONE, where the value will depend on the slice/subslice combo; in
such cases, we need to program 0xFDC and restore it afterwards. This is
already covered by read_subslice_reg.
Also, 0xFDC will lose its information after TDR/engine reset/power state
change.
References: HSD#1405586840, BSID#0575
v2:
- use fls() instead of find_last_bit() (Chris)
- added INTEL_SSEU to extract sseu from device info. (Chris)
v3:
- rebase on latest tip
v5:
- Added references (Mika)
- Change the order of the arguments passed, etc. (Ursulin)
v7:
- Moved WA explanation Comments(Oscar)
- Rebased.
v8:
- Renamed sanitize_mcr to calculate_s_ss_select. (Oscar)
- calculate s/ss selector instead of whole mcr. (Oscar)
v9:
- Updated function name (Oscar)
- Remove redundant variables (Oscar)
v10:
- Separate pre-GEN10 and GEN11 mask. (Oscar)
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Yunwei Zhang <yunwei.zhang@intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1526683197-24656-1-git-send-email-yunwei.zhang@intel.com
We want to be able to reset the GPU from inside a timer callback
(hardirq context). One step requires us to copy the default context
state over to the guilty context, which means we need to plan in advance
to have that object accessible from within an atomic context. The atomic
context prevents us from pinning the object or from peeking into the
shmemfs backing store (all may sleep), so we choose to pin the
default_state into memory when the engine becomes active. This
compromise allows us to swap out the default state when idle, when
required.
References: 5692251c25 ("drm/i915/lrc: Scrub the GPU state of the guilty hanging request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180518090212.5349-2-chris@chris-wilson.co.uk
To be useful later, enable intel_engine_dump() to be called from irq
context (i.e. saving and restoring the irq state rather than assuming
we enter with irqs enabled).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180518090212.5349-1-chris@chris-wilson.co.uk
We rely on ksoftirqd to run in a timely fashion in order to drain the
execlists queue. Quite frequently, it does not. In some cases we may see
latencies of over 200ms triggering our idle timeouts and forcing us to
declare the driver wedged!
Thus we can speed up idle detection by bypassing ksoftirqd in these
cases and flush our tasklet to confirm if we are indeed still waiting
for the ELSP to drain.
v2: Put the execlists.first check back; it is required for handling
reset!
References: https://bugs.freedesktop.org/show_bug.cgi?id=106373
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180506171328.30034-1-chris@chris-wilson.co.uk
To ease the frequent and ugly pointer dance of
&request->gem_context->engine[request->engine->id] during request
submission, store that pointer as request->hw_context. One major
advantage that we will exploit later is that this decouples the logical
context state from the engine itself.
v2: Set mock_context->ops so we don't crash and burn in selftests.
Cleanups from Tvrtko.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk
In the next patch, we want to store the intel_context pointer inside
i915_request, as it is frequently accessed via a convoluted dance when
submitting the request to hw. Having two context pointers inside
i915_request leads to confusion so first rename the existing
i915_gem_context pointer to i915_request.gem_context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-1-chris@chris-wilson.co.uk
Make sure that when we don't have any scheduler attributes for the
request, the string is terminated.
Fixes: 247870ac8e ("drm/i915: Build request info on stack before printk")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517152824.11619-1-chris@chris-wilson.co.uk
We cannot call kthread_park() from softirq context, so let's avoid it
entirely during the reset. We wanted to suspend the signaler so that it
would not mark a request as complete at the same time as we marked it as
being in error. Instead of parking the signaler, stop the engine from
advancing so that the GPU doesn't emit the breadcrumb for our chosen
"guilty" request.
v2: Refactor setting STOP_RING so that we don't have the same code thrice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-8-chris@chris-wilson.co.uk
The original switch to use CSB from the HWSP was plagued by the effect
of read ordering on VT-d; we would read the WRITE pointer from the HWSP
before it had completed writing the CSB contents. The mystery comes down
to the lack of rmb() for correct ordering with respect to the writes
from HW, and with that resolved we can remove the VT-d special casing.
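The fix reduces to ordering the two reads; a sketch, where csb_write
and process_csb_entry() are illustrative names:

  u32 write = READ_ONCE(*csb_write); /* WRITE pointer in the HWSP */

  /* Order the CSB entry reads after the pointer read; paired,
   * we hope, with a wmb() in the hardware.
   */
  rmb();

  while (head != write) {
          head = (head + 1) % GEN8_CSB_ENTRIES;
          process_csb_entry(READ_ONCE(csb[head * 2]),      /* status */
                            READ_ONCE(csb[head * 2 + 1])); /* ctx id */
  }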
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180511121147.31915-3-chris@chris-wilson.co.uk
Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
In the previous patch (to include a rmb() after reading the CSB WRITE
pointer from the HWSP) we believe we have fixed the underlying bug, and
so can re-enable using the HWSP on Cannonlake.
This reverts commit 61bf9719fa ("drm/i915/cnl: Use mmio access to
context status buffer").
References: https://bugs.freedesktop.org/show_bug.cgi?id=105888
References: https://bugs.freedesktop.org/show_bug.cgi?id=106185
References: 61bf9719fa ("drm/i915/cnl: Use mmio access to context status buffer")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Timo Aaltonen <tjaalton@ubuntu.com>
Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180511121147.31915-2-chris@chris-wilson.co.uk
As we unpark the engines and are about to begin a new cycle of activity,
mark the current status of the hangcheck as idle so that we avoid
carrying over a stale timestamp/action into the next cycle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502220313.6459-2-chris@chris-wilson.co.uk
We need to move to a more flexible timeline that doesn't assume one
fence context per engine, and so allow for a single timeline to be used
across a combination of engines. This means that preallocating a fence
context per engine is now a hindrance, and so we want to introduce the
singular timeline. From the code perspective, this has the notable
advantage of clearing up a lot of murky semantics and some clumsy
pointer chasing.
By splitting the timeline up into a single entity rather than an array
of per-engine timelines, we can realise the goal of the previous patch
of tracking the timeline alongside the ring.
v2: Tweak wait_for_idle to stop the compiler thinking that ret may be
uninitialised.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
In the future, we want to move a request between engines. To achieve
this, we first realise that we have two timelines in effect here. The
first runs through the GTT and is required for ordering vma access,
which is tracked currently by engine. The second is implied by
sequential execution of commands inside the ringbuffer. This timeline
is one that maps to userspace's expectations when submitting requests
(i.e. given the same context, batch A is executed before batch B). As
the ring's timelines map to userspace and the GTT timeline is an
implementation detail, move the timeline from the GTT into the ring
itself (per-context in logical-ring-contexts/execlists, or a global
per-engine timeline for the shared ringbuffers in legacy submission).
The two timelines are still assumed to be equivalent at the moment (no
migrating requests between engines yet) and so we can simply move from
one to the other without adding extra ordering.
v2: Reinforce that one isn't allowed to mix the engine execution
timeline with the client timeline from userspace (on the ring).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk
Since the advent of execlists, the HW no longer executes from a single
statically assigned ring, but instead switches to a different ring for
each context (logical ringbuffer contexts as it is called). So a good way
to tally the executing context against what we have queued is by
comparing the RING_START register against our requests. Make it so.
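A sketch of the comparison (close to the upstream match_ring(),
reconstructed from memory):

  static bool match_ring(struct i915_request *rq)
  {
          struct drm_i915_private *dev_priv = rq->i915;
          u32 ring = I915_READ(RING_START(rq->engine->mmio_base));

          return ring == i915_ggtt_offset(rq->ring->vma);
  }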
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502104150.29874-1-chris@chris-wilson.co.uk
Make life easier in upcoming patches by moving the context_pin and
context_unpin vfuncs into inline helpers.
v2: Fixup mock_engine to mark the context as pinned on use.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-2-chris@chris-wilson.co.uk
In commit 9b6586ae9f ("drm/i915: Keep a global seqno per-engine"), we
moved from a global inflight counter to per-engine counters in the
hope that it would be easy to run them concurrently in future. However,
with the
advent of the desire to move requests between engines, we do need a
global counter to preserve the semantics that no engine wraps in the
middle of a submit. (Although this semantic is now only required for gen7
semaphore support, which only supports greater-than comparisons!)
v2: Keep a global counter of all requests ever submitted and force the
reset when it wraps.
References: 9b6586ae9f ("drm/i915: Keep a global seqno per-engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-1-chris@chris-wilson.co.uk
We can convert engine stats from a spinlock to a seqlock to ensure
interrupt processing is never even a tiny bit delayed by parallel
readers.
There is a smidgen more cost on the write lock side, and an extremely
unlikely chance that readers will have to retry a few times in the face
of heavy interrupt load. But that should be extremely unlikely given how
lightweight the read-side section is compared to the interrupt
processing side, and also compared to the rest of the code paths which
can lead into it. Furthermore, the writer is the one doing the real,
latency-sensitive work, while the readers are only informative.
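A sketch of the two sides after the conversion (member names assumed):

  unsigned int seq;
  ktime_t total;

  /* writer: interrupt processing; cheap, never blocked by readers */
  write_seqlock(&engine->stats.lock);
  engine->stats.total = ktime_add(engine->stats.total, dt);
  write_sequnlock(&engine->stats.lock);

  /* reader: retries only if it raced with a writer */
  do {
          seq = read_seqbegin(&engine->stats.lock);
          total = engine->stats.total;
  } while (read_seqretry(&engine->stats.lock, seq));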
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180426074716.7352-1-tvrtko.ursulin@linux.intel.com
If we have more than a few, possibly several thousand, requests in the
queue, don't show the central portion, just the first few and the last
being executed and/or queued. The first few should be enough to help
identify a problem in execution, and most often comparing the first/last
in the queue is enough to identify problems in the scheduling.
We may need some fine tuning to set MAX_REQUESTS_TO_SHOW for common
debug scenarios, but for the moment if we can avoid spending more
than a few seconds dumping the GPU state that will avoid a nasty
livelock (where hangcheck spends so long dumping the state, it fires
again and starts to dump the state again in parallel, ad infinitum).
v2: Remember to print 'last', not the stale rq iterator, after the loop.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180424081600.27544-1-chris@chris-wilson.co.uk
printk unhelpfully inserts a '\n' between consecutive calls, and since
our drm_printf wrapper may be emitting into a seq_file instead,
KERN_CONT is not an option. To work with any drm_printf destination, we
need to build up the output into a temporary buf on the stack and then
feed the complete line in a single call to printk.
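A sketch of the approach (buffer size and fields illustrative):

  char buf[80];
  unsigned int len = 0;

  /* accumulate the whole line on the stack... */
  len += scnprintf(buf + len, sizeof(buf) - len, "%x", rq->fence.seqno);
  len += scnprintf(buf + len, sizeof(buf) - len, " prio=%d",
                   rq->sched.attr.priority);

  /* ...then emit it with a single call */
  drm_printf(m, "%s\n", buf);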
Fixes: b7268c5eed ("drm/i915: Pack params to engine->schedule() into a struct")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180424010839.22860-1-chris@chris-wilson.co.uk
Today we only want to pass along the priority to engine->schedule(), but
in the future we want to have much more control over the various aspects
of the GPU during a context's execution, for example controlling the
frequency allowed. As we need an ever growing number of parameters for
scheduling, move those into a struct for convenience.
v2: Move the anonymous struct into its own function for legibility and
ye olde gcc.
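The struct starts out carrying just the priority; a sketch of the shape:

  struct i915_sched_attr {
          /* just the priority for now; frequency hints etc. can follow */
          int priority;
  };

  void (*schedule)(struct i915_request *rq,
                   const struct i915_sched_attr *attr);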
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-3-chris@chris-wilson.co.uk
Having moved the priotree struct into i915_scheduler.h, identify it as
the scheduling element and rebrand into i915_sched. This becomes more
useful as we start attaching more information we require to propagate
through the scheduler.
v2: Use i915_sched_node for future distinctiveness
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-2-chris@chris-wilson.co.uk
This has grown to be a sizable amount of code, so move it to
its own file before we try to refactor anything. For the moment,
we are leaving behind the WA BB code and the WAs that get applied
(incorrectly) in init_clock_gating, but we will deal with it later.
v2: Use intel_ prefix for code that deals with the hardware (Chris)
v3: Rebased
v4:
- Rebased
- New license header
v5:
- Rebased
- Added some organisational notes to the file (Chris)
v6: Include DOC section in the documentation build (Jani)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: appease checkpatch, mostly]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1523376767-18480-1-git-send-email-oscar.mateo@intel.com
We can refine our current execlists->queue_priority if we inspect
ELSP[1] rather than the head of the unsubmitted queue. Currently, we use
the unsubmitted queue and say that if a subsequent request is more
important than the current queue, we will rerun the submission tasklet
to evaluate the need for preemption. However, we only want to preempt if
we need to jump ahead of a currently executing request in ELSP. The
second reason for running the submission tasklet is to amalgamate requests
into the active context on ELSP[0] to avoid a stall when ELSP[0] drains.
(Though repeatedly amalgamating requests into the active context and
triggering many lite-restores is of questionable gain, the goal really
is to put a context into ELSP[1] to cover the interrupt.) So if instead of
looking at the head of the queue, we look at the context in ELSP[1] we
can answer both of the questions more accurately -- we don't need to
rerun the submission tasklet unless our new request is important enough
to feed into, at least, ELSP[1].
v2: Add some comments from the discussion with Tvrtko.
v3: More commentary to cross-reference queue_request()
References: f6322eddaf ("drm/i915/preemption: Allow preemption between submission ports")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180411103929.27374-1-chris@chris-wilson.co.uk
ICL (Gen11) has a greater maximum number of subslices. This patch
reflects this.
v2: GEN11 updates to MCR_SELECTOR (Oscar)
v3: Copypasta error in the new defines (Lionel)
Bspec: 21139
BSpec: 21108
Signed-off-by: Kelvin Gardiner <kelvin.gardiner@intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> (v1)
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180316121456.11577-3-mika.kuoppala@linux.intel.com
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
The only usage outside the intel_lrc.c file is in the ringbuffer
init, but the irq mask calculated there is then overwritten for
all engines that have a non-zero shift, so we can drop it.
This change is not aimed at code saving but at removing from
intel_engines information that does not apply to all gens that have
the engine. When checking without the temporary WARN_ON, code size
is basically unchanged.
v2: make the irq_shifts array static const
v3: rebase, move irq_shifts array to logical_ring_default_irqs
v4: move array inside the if and use u8 for it (Chris)
Suggested-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180314182653.26981-4-daniele.ceraolospurio@intel.com
Check that the entries are in reverse gen order and that all entries
with gen > 0 have an mmio base set.
v2: loop forward, simplify logic, use i915_subtests (Chris)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180314182653.26981-2-daniele.ceraolospurio@intel.com
The mmio bases we're currently storing in the intel_engines array are
only valid for a subset of gens, so we need to ignore them and use
different values in some cases. Instead of doing that, we can have a
table of [starting gen, mmio base] pairs for each engine in
intel_engines and select the correct one based on the gen we're running
on in a consistent way.
v2: document that the list goes in reverse order, update starting gen
for render (Chris)
v3: starting gen for render back to 1 to make our life easier with
selftests (Chris)
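A sketch of the table's shape, reconstructed from memory (the VCS2
entry is an example):

  /* the mmio_bases table must be sorted in reverse gen order */
  struct engine_mmio_base {
          u32 gen : 8;   /* first gen this base applies to */
          u32 base : 24; /* mmio base for that gen onwards */
  } mmio_bases[MAX_MMIO_BASES];

  [VCS2] = {
          .mmio_bases = {
                  { .gen = 11, .base = GEN11_BSD2_RING_BASE },
                  { .gen = 8, .base = GEN8_BSD2_RING_BASE },
          },
  },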
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v2
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180314182653.26981-1-daniele.ceraolospurio@intel.com
Function i915_gem_batch_pool_init() failed to follow the obj-verb
naming schema. Fix that by swapping the function parameters.
While here, change license text to SPDX format.
v2: use intel_engine_init_batch_pool (Chris) as proxy (Michal)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180308095037.18264-3-michal.wajdeczko@intel.com
Starting from Gen11 the context descriptor format has been updated in
the HW. The hw_id field has been considerably reduced in size and engine
class and instance fields have been added.
There is a slight name clashing issue because the field that we call
hw_id is actually called SW Context ID in the specs for Gen11+.
With the current size of the hw_id field we can have a maximum of 2k
contexts at any time, but we could use the sw_counter field (which is sw
defined) to increase that, because the HW requirement is that
engine_id + sw_id + sw_counter is a unique number.
GuC uses a similar method to support more contexts but does its tracking
at lrc level. To avoid doing an implementation that will need to be
reworked once GuC support lands, defer it for now and mark it as TODO.
v2: rebased, add documentation, fix GEN11_ENGINE_INSTANCE_SHIFT
v3: rebased, bring back lost code from i915_gem_context.c
v4: make TODO comment more generic
v5: be consistent with bit ordering, add extra checks (Chris)
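For orientation, the Gen11+ descriptor layout being described is roughly
(bit positions quoted from memory; double-check against the bspec):

  /*
   * bits 37-47: SW context ID (the 11 bits behind the 2k limit above)
   * bits 48-53: engine instance
   * bit 54:     mbz, reserved for hw
   * bits 55-60: SW counter
   * bits 61-63: engine class
   */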
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-3-mika.kuoppala@linux.intel.com
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Gen11 has up to 4 VCS and up to 2 VECS engines, this patch adds mmio
base definitions for all of them.
Bspec: 20944
Bspec: 7021
v2: Set the correct mmio_base in intel_engines_init_mmio; updating the
base mmio values any later would cause incorrect reads in
i915_gem_sanitize (Michel).
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-2-mika.kuoppala@linux.intel.com
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Although we protect the request itself, we don't lock inside
intel_engine_dump() and so the request may be retired as we peek into it.
One consequence is that the request->ctx may be freed before we
dereference it, leading to a use-after-free. Replace the hw_id we are
peeking from inside request->ctx with the request->fence.context, with
which we can still track from which context the request originated
(although to tie to HW reports requires a little more legwork, but is
good enough to follow the GEM traces).
[52640.729670] general protection fault: 0000 [#2] SMP
[52640.729694] Dumping ftrace buffer:
[52640.729701] (ftrace buffer empty)
[52640.729705] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core snd_pcm mei_me mei i915 r8169 mii prime_numbers i2c_hid
[52640.729748] CPU: 2 PID: 4335 Comm: gem_exec_schedu Tainted: G UD W 4.16.0-rc3+ #7
[52640.729759] Hardware name: Acer Aspire E5-575G/Ironman_SK , BIOS V1.12 08/02/2016
[52640.729803] RIP: 0010:print_request+0x2b/0xb0 [i915]
[52640.729811] RSP: 0018:ffffc90001453c18 EFLAGS: 00010206
[52640.729820] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8801e0292d40 RCX: 0000000000000006
[52640.729829] RDX: ffffc90001453c60 RSI: ffff8801e0292d40 RDI: 0000000000000003
[52640.729838] RBP: ffffc90001453d80 R08: 0000000000000000 R09: 0000000000000001
[52640.729847] R10: ffffc90001453bd0 R11: ffffc90001453c73 R12: ffffc90001453c60
[52640.729856] R13: ffffc90001453d80 R14: ffff8801d5a683c8 R15: ffff8801e0292d40
[52640.729866] FS: 00007f1ee50548c0(0000) GS:ffff8801e8200000(0000) knlGS:0000000000000000
[52640.729876] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[52640.729884] CR2: 00007f1ee5077000 CR3: 00000001d9411004 CR4: 00000000003606e0
[52640.729893] Call Trace:
[52640.729922] intel_engine_print_registers+0x623/0x890 [i915]
[52640.729948] intel_engine_dump+0x4a3/0x590 [i915]
[52640.729957] ? seq_printf+0x3a/0x50
[52640.729977] i915_engine_info+0xb8/0xe0 [i915]
[52640.729984] ? drm_mode_gamma_get_ioctl+0xf0/0xf0
[52640.729990] seq_read+0xd5/0x410
[52640.729997] full_proxy_read+0x4b/0x70
[52640.730004] __vfs_read+0x1e/0x120
[52640.730009] ? do_sys_open+0x134/0x220
[52640.730015] ? kmem_cache_free+0x174/0x2b0
[52640.730021] vfs_read+0xa1/0x150
[52640.730026] SyS_read+0x40/0xa0
[52640.730032] do_syscall_64+0x65/0x1a0
[52640.730038] entry_SYSCALL_64_after_hwframe+0x42/0xb7
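The essence of the fix in the request pretty-printer is a swap along
these lines (format strings illustrative):

/* before: dereferencing rq->ctx can explode once the request retires */
drm_printf(m, "ctx=%d ...", rq->ctx->hw_id);
/* after: fence.context is embedded in the request and stays valid */
drm_printf(m, "ctx=%llu ...", rq->fence.context);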
Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180228094732.28462-1-chris@chris-wilson.co.uk
Sometimes we need to boost the priority of an in-flight request, which
may lead to the situation where the second submission port contains a
higher priority context than the first, requiring us to inject a
preemption event. To do so we must always check inside
execlists_dequeue() whether there is a priority inversion between the
ports themselves as well as against the head of the priority-sorted
queue, and we cannot just skip dequeuing if the queue is empty.
As Michał noted, this doesn't simply extend to handling more than 2-port
submission, as we may need to reorder within the array of executing
requests which themselves are lower priority than the first. A task for
later!
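An illustrative sketch of the invariant execlists_dequeue() must
restore (port_request() and rq_prio() are the file-local helpers of
that era; treat this as a sketch, not the exact patch):

static bool need_preempt(const struct intel_engine_execlists *el,
                         int queue_priority)
{
        const struct i915_request *rq0 = port_request(&el->port[0]);
        const struct i915_request *rq1 = port_request(&el->port[1]);

        if (!rq0)
                return false;

        /* inversion between the ports themselves ... */
        if (rq1 && rq_prio(rq1) > rq_prio(rq0))
                return true;

        /* ... or between ELSP[0] and the head of the sorted queue */
        return queue_priority > rq_prio(rq0);
}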
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180222142229.14517-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
We want to de-emphasize the link between the request (dependency,
execution and fence tracking) from GEM and so rename the struct from
drm_i915_gem_request to i915_request. That is we may implement the GEM
user interface on top of requests, but they are an abstraction for
tracking execution rather than an implementation detail of GEM. (Since
they are not tied to HW, we keep the i915 prefix as opposed to intel.)
In short, the spatch:
@@
@@
- struct drm_i915_gem_request
+ struct i915_request
As a corollary to contracting the type name, we also harmonise on using
'rq' shorthand for local variables where space is of the essence and
repetition makes 'request' unwieldy. For globals and struct members,
'request' is still much preferred for its clarity.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
When dumping the engine, we print out the current register values. This
requires the rpm wakeref. If the device is asleep, we can assume the
engine is asleep too (and the register state is uninteresting), so skip
the register dump and only acquire the rpm wakeref if the device is
already awake.
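Which boils down to the conditional-acquire pattern (a sketch using
the driver's intel_runtime_pm_get_if_in_use(); the message string is
illustrative):

if (intel_runtime_pm_get_if_in_use(engine->i915)) {
        /* device already awake: safe to poke at the registers */
        intel_engine_print_registers(engine, m);
        intel_runtime_pm_put(engine->i915);
} else {
        drm_printf(m, "\tDevice is asleep; skipping register dump\n");
}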
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180212102415.24246-1-chris@chris-wilson.co.uk
If the entire device is powered off, we can safely assume that the
engine is also asleep (and idle).
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: a091d4ee93 ("drm/i915: Hold a wakeref for probing the ring registers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180212093928.6005-1-chris@chris-wilson.co.uk
Since commit 4a118ecbe9 ("drm/i915: Filter out spurious execlists
context-switch interrupts") we probe execlists->active, and no longer
have to peek at the execlist interrupt to determine if the tasklet still
needs to be run to drain the ELSP.
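In intel_engine_is_idle() terms the check now reduces to something
like this (a sketch; field names per the execlists rework):

/* ELSP is drained once no context is tracked as active */
if (READ_ONCE(engine->execlists.active))
        return false;   /* the tasklet still has work to do */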
References: 4a118ecbe9 ("drm/i915: Filter out spurious execlists context-switch interrupts")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180208151224.16285-1-chris@chris-wilson.co.uk
If we remove some hardcoded assumptions about the preempt context having
a fixed id, reserved from use by normal user contexts, we need only
allocate the i915_gem_context when required. Then the subsequent
decisions on using preemption reduce to having the preempt context
available.
v2: Include an assert that we don't allocate the preempt context twice.
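A sketch of the shape this takes (helper names are illustrative, not
the exact patch):

/* allocate lazily, only when the HW/driver combination can preempt */
if (HAS_LOGICAL_RING_PREEMPTION(i915)) {
        GEM_BUG_ON(i915->preempt_context); /* v2: never allocate twice */
        i915->preempt_context = create_kernel_context(i915, INT_MAX);
}

/* subsequent decisions collapse to a single availability check */
static bool can_preempt(const struct intel_engine_cs *engine)
{
        return engine->i915->preempt_context != NULL;
}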
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207210544.26351-3-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Render engine constructor helpers must only be called from the render
engine constructors, but there is no need to burden the production
binaries with warnings which can only be triggered during development.
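The idiom in question is GEM_BUG_ON(), which compiles away unless
CONFIG_DRM_I915_DEBUG_GEM is enabled; roughly:

/* development-only sanity check; free in production builds */
GEM_BUG_ON(engine->class != RENDER_CLASS);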
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180119100005.9072-1-tvrtko.ursulin@linux.intel.com
Gen11 removes the Resource Streamer, which frees up a big chunk of
the context image. BSpec indicates 12544 DWORDs (13 pages), plus
one page for PPHWSP.
Note that, when looking at the BSpec context image table, the right
filter has to be applied, as some rows are excluded for specific GENs.
Also, some rows apply per subslice (for the calculation above, we have
assumed I915_MAX_SUBSLICES = 8).
v2: Rebase.
v3: Use the right size as per the BSpec.
v4:
- Rebased on top of the default context size (Rodrigo)
- Clarify in the commit message where the subslice calculation
comes from.
v5: s/12538/12544/ (Daniele)
BSpec: 18907
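Which amounts to a define along these lines (13 pages of state plus
the PPHWSP page):

/* 12544 DWORDs rounds up to 13 pages; add one page for the PPHWSP */
#define GEN11_LR_CONTEXT_RENDER_SIZE    (14 * PAGE_SIZE)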
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com> (older version)
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1515711307-28979-2-git-send-email-oscar.mateo@intel.com
Instead of returning whatever size the latest GEN used, fall back to
the largest known size: context sizes for new GENs can go up or down,
so for missing cases the only safe thing to do is to use the largest
known one, whatever that is.
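Sketch of the resulting switch (sizes and helpers as in the driver;
older gens elided for brevity):

switch (INTEL_GEN(dev_priv)) {
default:
        MISSING_CASE(INTEL_GEN(dev_priv));
        /* fall through: the largest known size is the only safe guess */
case 10:
        return GEN10_LR_CONTEXT_RENDER_SIZE;
case 9:
        return GEN9_LR_CONTEXT_RENDER_SIZE;
/* older gens elided */
}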
Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1515711307-28979-1-git-send-email-oscar.mateo@intel.com
In order to prevent a race condition where we may end up overaccounting
the active state and leaving the busy-stats believing the GPU is 100%
busy, lock out the tasklet while we reconstruct the busy state. There is
no direct spinlock guard for the execlists->port[], so we need to
utilise tasklet_disable() as a synchronous barrier to prevent it, the
only writer to execlists->port[], from running at the same time as the
enable.
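The guard is the classic tasklet_disable()/tasklet_enable() bracket
(the tasklet field name varies across kernel versions; illustrative):

tasklet_disable(&execlists->tasklet);   /* synchronous: waits if running */
/* reconstruct engine->stats.active from execlists->port[] here */
tasklet_enable(&execlists->tasklet);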
Fixes: 4900727d35 ("drm/i915/pmu: Reconstruct active state on starting busy-stats")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180115092041.13509-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We have a hole in our busy-stat accounting: if the pmu is enabled during
a long-running batch, the pmu will not start accumulating busy-time
until the next context switch. This then fails tests that are only
sampling a single batch.
v2: Count each active port just once (context in/out events are only on
the first and last assignment to a port).
v3: Avoid hardcoding knowledge of 2 submission ports
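The v3 shape counts each populated port once without hardcoding their
number; roughly (a sketch using the execlists port helpers):

const struct execlist_port *port = execlists->port;
unsigned int num_ports = execlists_num_ports(execlists);

while (num_ports-- && port_isset(port)) {
        engine->stats.active++; /* one context-in event per busy port */
        port++;
}
if (engine->stats.active)
        engine->stats.start = ktime_get();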
Fixes: 30e17b7847 ("drm/i915: Engine busy time tracking")
Testcase: igt/perf_pmu/busy-start
Testcase: igt/perf_pmu/busy-double-start
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180111073031.14614-1-chris@chris-wilson.co.uk
Geminilake requires the 3D driver to select whether barriers are
intended for compute shaders, or tessellation control shaders, by
whacking a "Barrier Mode" bit in SLICE_COMMON_ECO_CHICKEN1 when
switching pipelines. Failure to do this properly can result in GPU
hangs.
Unfortunately, this means it needs to switch mid-batch, so only
userspace can properly set it. To facilitate this, the kernel needs
to whitelist the register.
The workarounds page currently tags this as applying to Broxton only,
but that doesn't make sense. The documentation for the register it
references says the bit userspace is supposed to toggle only exists on
Geminilake. Empirically, the Mesa patch to toggle this bit appears to
fix intermittent GPU hangs in tessellation control shader barrier tests
on Geminilake; we haven't seen those hangs on Broxton.
v2: Mention WA #0862 in the comment (it doesn't have a name).
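In the whitelisting code this reduces to a single entry (helper name
per the era's intel_engine_cs.c; treat as a sketch):

/* WA #0862: let userspace toggle Barrier Mode itself */
ret = wa_ring_whitelist_reg(engine, GEN9_SLICE_COMMON_ECO_CHICKEN1);
if (ret)
        return ret;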
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180105085905.9298-1-kenneth@whitecape.org
Looking at a CI failure with an ominous line of
[ 362.550715] hangcheck current seqno ffffff6b, last ffffff8c, hangcheck ffffff6b [6016 ms], inflight 118
with no apparent cause for the seqno to be negative, left me wondering
if someone had scribbled over the HWSP. So include the HWSP in the
engine dump to see if there are more signs of random scribbling.
v2: Fix the row pointer: i is now incremented by 8 so it doesn't need
scaling by 8; and we don't need to keep volatile here, as the
status_page isn't marked up as volatile itself.
v3: Use hexdump, with suppression of identical lines. (Tvrtko)
Which results in
HWSP:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
*
00000040 00000001 00000000 00000018 00000002 00000001 00000000 00000018 00000000
00000060 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000003
00000080 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
*
000000c0 00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000000e0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
*
instead of 128 lines of mostly 0s.
v4: Tidy up the locals
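The helper is modelled on hexdump -C style output with '*' suppression
of identical rows; roughly (a sketch built on the kernel's
hex_dump_to_buffer()):

static void hexdump(struct drm_printer *m, const void *buf, size_t len)
{
        const size_t rowsize = 8 * sizeof(u32);
        const void *prev = NULL;
        bool skip = false;
        size_t pos;

        for (pos = 0; pos < len; pos += rowsize) {
                char line[128];

                /* collapse runs of identical rows into a single '*' */
                if (prev && !memcmp(prev, buf + pos, rowsize)) {
                        if (!skip) {
                                drm_printf(m, "*\n");
                                skip = true;
                        }
                        continue;
                }

                hex_dump_to_buffer(buf + pos, len - pos, rowsize,
                                   sizeof(u32), line, sizeof(line),
                                   false);
                drm_printf(m, "%08zx %s\n", pos, line);
                skip = false;
                prev = buf + pos;
        }
}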
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171222182521.18106-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
There seems to be another clock gating issue, for which the workaround
is described as:
"WA: Set 0xE4F0[1] = 1 to disable Early EOT of thread."
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171216001117.14232-2-rafael.antognolli@intel.com
Useful bits of information for inspecting GPU stalls from
intel_engine_dump() are the error registers, IPEIR and IPEHR.
v2: Fixup gen changes in register offsets (Tvrtko)
v3: Old FADDR location as well
v4: Use I915_READ64_2x32
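The dump boils down to a few extra reads; a sketch (per-gen register
offset selection elided, macro names per i915_reg.h):

drm_printf(m, "\tIPEIR: 0x%08x\n",
           I915_READ(RING_IPEIR(engine->mmio_base)));
drm_printf(m, "\tIPEHR: 0x%08x\n",
           I915_READ(RING_IPEHR(engine->mmio_base)));
/* FADDR is 64bit on recent gens, hence I915_READ64_2x32 (v4) */
drm_printf(m, "\tFADDR: 0x%llx\n",
           I915_READ64_2x32(RING_DMA_FADD(engine->mmio_base),
                            RING_DMA_FADD_UDW(engine->mmio_base)));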
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171218123914.19027-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Comparing the state tested by intel_engine_is_idle() and printed by
intel_engine_dump(), the only bit not shown is whether or not the device
is wedged. Add that little bit of information to the pretty printer so
that if the engine fails to idle we can see why.
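A one-line addition suffices; roughly (a sketch, message illustrative):

if (i915_terminally_wedged(&engine->i915->gpu_error))
        drm_printf(m, "*** WEDGED ***\n");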
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171208012303.25504-5-chris@chris-wilson.co.uk
Since a global reset affects the engine, include it alongside the
per-engine reset counter when pretty printing the engine state in
intel_engine_dump().
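E.g. (a sketch using the existing reset bookkeeping):

struct i915_gpu_error *error = &engine->i915->gpu_error;

drm_printf(m, "\tReset count: %d (global %d)\n",
           i915_reset_engine_count(error, engine),
           i915_reset_count(error));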
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171208012303.25504-4-chris@chris-wilson.co.uk
Now that we have a common engine state pretty printer, we can use that
instead of the ad hoc information printed when we miss a breadcrumb.
v2: Rearrange intel_engine_disarm_breadcrumbs() to avoid calling
intel_engine_dump() under the rb spinlock (Mika) and to pretty-print the
error state early so that we include the full list of waiters.
v3: Pass missed breadcrumb msg to pretty-printer as the header
v4: Preserve DRM_DEBUG_DRIVER filtering.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171208012303.25504-3-chris@chris-wilson.co.uk
Pass in a format string (and args) to specify the header to be emitted
along with the engine state when pretty-printing. This allows the header
to be emitted inside the drm_printer stream, so sharing the same prefix
and output characteristics (e.g. debug level and filtering).
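The signature becomes printf-like, with the header emitted through the
same drm_printer; roughly:

void intel_engine_dump(struct intel_engine_cs *engine,
                       struct drm_printer *m,
                       const char *header, ...)
{
        va_list ap;

        va_start(ap, header);
        drm_vprintf(m, header, &ap);
        va_end(ap);
        /* rest of the engine state dump follows */
}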
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171208012303.25504-2-chris@chris-wilson.co.uk
When printing the execlist ports, we first print the ELSP header then
follow it with the pretty-printed request. Since switching to
drm_printer and showing the output via printk, a newline is
automatically appended to each call (unlike the old seq_printf output).
To avoid the unwanted line break, construct the ELSP request header in
a temporary buffer.
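That is, build the header first and hand it to the request
pretty-printer in one go (a sketch; print_request is the file-local
helper):

char hdr[80];

snprintf(hdr, sizeof(hdr),
         "\t\tELSP[%d] count=%d, rq: ", idx, count);
print_request(m, rq, hdr);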
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171208012303.25504-1-chris@chris-wilson.co.uk
Testing texture read performance shows that the same tuning of the SQ
credits is needed on GLK as on BXT/APL. This has also been confirmed
by Altug from the HW team.
v4: Rebase + fix
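The change essentially widens the BXT-only condition to cover all gen9
LP parts; a hedged sketch, with the credit values as used for BXT:

/* apply the BXT SQ credit tuning to GLK as well */
if (IS_GEN9_LP(dev_priv))
        I915_WRITE(GEN8_L3SQCREG1, L3_GENERAL_PRIO_CREDITS(62) |
                                   L3_HIGH_PRIO_CREDITS(2));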
Signed-off-by: Valtteri Rantala <valtteri.rantala@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> (v1)
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1511880305-12166-1-git-send-email-valtteri.rantala@intel.com
Sagar noticed the check can be consolidated between the engine stats
implementation and the PMU.
My first choice was a static inline helper, but that got into include
ordering mess quickly, so I went with a macro instead. At some point we
should perhaps look into taking the non-ringbuffer bits out of
intel_ringbuffer.h into a new intel_engine.h or something.
v2: Use engine->flags. (Chris Wilson)
v3: Rebase and mark GuC as not yet supported. (Chris Wilson)
v4: Move flag setting to intel_engines_reset_default_submission.
(Chris Wilson)
v5: Move flag setting to logical_ring_setup.
v6: intel_engines_reset_default_submission is the wrong place to set the
flag - it needs to be in execlists_set_default_submission. (Sagar)
v7: Flag setting in logical_ring_setup is not required. (Chris)
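The end result is roughly (a sketch; the flag bit value is
illustrative):

#define I915_ENGINE_SUPPORTS_STATS BIT(1)

#define intel_engine_supports_stats(engine) \
        ((engine)->flags & I915_ENGINE_SUPPORTS_STATS)

with the flag set in execlists_set_default_submission() (v6), leaving
GuC submission unflagged for now.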
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> (v6)
Link: https://patchwork.freedesktop.org/patch/msgid/20171129102805.22690-1-tvrtko.ursulin@linux.intel.com