Commit Graph

109 Commits

Author SHA1 Message Date
Chris Wilson
c7c6e46f91 drm/i915: Convert execbuf to use struct-of-array packing for critical fields
When userspace is doing most of the work, avoiding relocs (using
NO_RELOC) and opting out of implicit synchronisation (using ASYNC), we
still spend a lot of time processing the arrays in execbuf, even though
we now should have nothing to do most of the time. One issue that
becomes readily apparent in profiling anv is that iterating over the
large execobj[] is unfriendly to the loop prefetchers of the CPU and it
much prefers iterating over a pair of arrays rather than one big array.

v2: Clear vma[] on construction to handle errors during vma lookup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-3-chris@chris-wilson.co.uk
2017-08-18 11:57:36 +01:00
Chris Wilson
2889caa923 drm/i915: Eliminate lots of iterations over the execobjects array
The major scaling bottleneck in execbuffer is the processing of the
execobjects. Creating an auxiliary list is inefficient when compared to
using the execobject array we already have allocated.

Reservation is then split into phases. As we lookup up the VMA, we
try and bind it back into active location. Only if that fails, do we add
it to the unbound list for phase 2. In phase 2, we try and add all those
objects that could not fit into their previous location, with fallback
to retrying all objects and evicting the VM in case of severe
fragmentation. (This is the same as before, except that phase 1 is now
done inline with looking up the VMA to avoid an iteration over the
execobject array. In the ideal case, we eliminate the separate reservation
phase). During the reservation phase, we only evict from the VM between
passes (rather than currently as we try to fit every new VMA). In
testing with Unreal Engine's Atlantis demo which stresses the eviction
logic on gen7 class hardware, this speed up the framerate by a factor of
2.

The second loop amalgamation is between move_to_gpu and move_to_active.
As we always submit the request, even if incomplete, we can use the
current request to track active VMA as we perform the flushes and
synchronisation required.

The next big advancement is to avoid copying back to the user any
execobjects and relocations that are not changed.

v2: Add a Theory of Operation spiel.
v3: Fall back to slow relocations in preparation for flushing userptrs.
v4: Document struct members, factor out eb_validate_vma(), add a few
more comments to explain some magic and hide other magic behind macros.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16 16:54:05 +01:00
Chris Wilson
8c45cec48e drm/i915: Split vma exec_link/evict_link
Currently the vma has one link member that is used for both holding its
place in the execbuf reservation list, and in any eviction list. This
dual property is quite tricky and error prone.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-3-chris@chris-wilson.co.uk
2017-06-15 10:53:26 +01:00
Chris Wilson
d55495b4dc drm/i915: Use vma->exec_entry as our double-entry placeholder
This has the benefit of not requiring us to manipulate the
vma->exec_link list when tearing down the execbuffer, and is a
marginally cheaper test to detect the user error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-2-chris@chris-wilson.co.uk
2017-06-15 10:52:58 +01:00
Chris Wilson
72022a705e drm/i915: Move retire-requests into i915_gem_wait_for_idle()
As we now distinguish everywhere that can call
i915_gem_retire_requests() following a successful wait_for_idle, we can
remove the duplication by moving that call into i915_gem_wait_for_idle()
itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-31 12:03:46 +01:00
Matthew Auld
fe65cbdbc9 drm/i915: use correct node for handling cache domain eviction
It looks like we were incorrectly comparing vma->node against itself
instead of the target node, when evicting for a node on systems where we
need guard pages between regions with different cache domains. As a
consequence we can end up trying to needlessly evict neighbouring nodes,
even if they have the same cache domain, and if they were pinned we
would fail the eviction.

Fixes: 625d988acc ("drm/i915: Extract reserving space in the GTT to a helper")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170306235414.23407-3-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-03-09 08:42:35 +00:00
Chris Wilson
381b943b07 drm/i915: Remove i915_address_space.start
Once upon a time, back in the UMS days, we supported userspace
initialising the GTT and sharing portions of the GTT with other users.
Now, we own the GTT (both global and per-process) and the tables always
start at 0 - so we can remove i915_address_space.start and forget about
this old complication.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-20-chris@chris-wilson.co.uk
2017-02-15 10:07:32 +00:00
Chris Wilson
f40a7b7558 drm/i915: Initial selftests for exercising eviction
Very simple tests to just ask eviction to find some free space in a full
GTT and one with some available space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-41-chris@chris-wilson.co.uk
2017-02-13 20:46:47 +00:00
Daniel Vetter
51a831a772 Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Chris Wilson needs the new drm_driver->release callback to make sure
the shiny new dma-buf testcases don't oops the driver on unload.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-02-10 16:27:24 +01:00
Chris Wilson
a6508ded2a drm/i915: Use page coloring to provide the guard page at the end of the GTT
As we now mark the reserved hole (drm_mm.head_node) with the special
UNEVICTABLE color, we can use the page coloring to avoid prefetching of
the CS beyond the end of the GTT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170206084547.27921-3-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-02-06 13:46:40 +00:00
Chris Wilson
4e64e5539d drm: Improve drm_mm search (and fix topdown allocation) with rbtrees
The drm_mm range manager claimed to support top-down insertion, but it
was neither searching for the top-most hole that could fit the
allocation request nor fitting the request to the hole correctly.

In order to search the range efficiently, we create a secondary index
for the holes using either their size or their address. This index
allows us to find the smallest hole or the hole at the bottom or top of
the range efficiently, whilst keeping the hole stack to rapidly service
evictions.

v2: Search for holes both high and low. Rename flags to mode.
v3: Discover rb_entry_safe() and use it!
v4: Kerneldoc for enum drm_mm_insert_mode.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com> # vmwgfx
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> #etnaviv
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170202210438.28702-1-chris@chris-wilson.co.uk
2017-02-03 11:10:32 +01:00
Chris Wilson
16ee20619f drm/i915: Detect vma reserved for execbuf in evict-for-node
The vma->exec_list is still the only means we have for both reserving an
object in execbuf, and for constructing the eviction list. So during the
construction of the eviction list, we must treat anything already on the
exec_list as being pinned.

Yes, this sharing of two semantically different lists will be fixed! But
in the meantime, we have the issue that this is tripping up CI since we
started using i915_gem_gtt_reserve_node() + i915_gem_evict_for_node()
from the regular execbuf reservation path in commit 606fec956c
("drm/i915: Prefer random replacement before eviction search"):

[  108.424063] kernel BUG at drivers/gpu/drm/i915/i915_vma.h:254!
[  108.424072] invalid opcode: 0000 [#1] PREEMPT SMP
[  108.424079] Modules linked in: snd_hda_intel i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core mei_me snd_pcm lpc_ich mei sdhci_pci sdhci mmc_core e1000e ptp pps_core [last unloaded: i915]
[  108.424132] CPU: 1 PID: 6865 Comm: gem_cs_tlb Tainted: G     U          4.10.0-rc3-CI-CI_DRM_2049+ #1
[  108.424143] Hardware name: Hewlett-Packard HP EliteBook 8440p/172A, BIOS 68CCU Ver. F.24 09/13/2013
[  108.424154] task: ffff88012ae22600 task.stack: ffffc90000a14000
[  108.424220] RIP: 0010:i915_gem_evict_for_node+0x237/0x410 [i915]
[  108.424229] RSP: 0018:ffffc90000a17a58 EFLAGS: 00010202
[  108.424237] RAX: 0000000000005871 RBX: ffff88012d1ad778 RCX: 0000000000000000
[  108.424246] RDX: 000000007ffff000 RSI: ffffc90000a17a68 RDI: ffff880127e694d8
[  108.424255] RBP: ffffc90000a17aa0 R08: ffffc90000a17a68 R09: 0000000000000000
[  108.424264] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000080000000
[  108.424273] R13: ffffc90000a17a68 R14: ffff880127e694d8 R15: ffffffffa0387330
[  108.424283] FS:  00007f8236e3d8c0(0000) GS:ffff880137c40000(0000) knlGS:0000000000000000
[  108.424293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  108.424305] CR2: 00007f82347a2000 CR3: 000000012c866000 CR4: 00000000000006e0
[  108.424317] Call Trace:
[  108.424368]  i915_gem_gtt_reserve+0x67/0x80 [i915]
[  108.424424]  __i915_vma_do_pin+0x248/0x620 [i915]
[  108.424487]  ? __i915_vma_do_pin+0x162/0x620 [i915]
[  108.424540]  i915_gem_execbuffer_reserve_vma.isra.8+0x153/0x1f0 [i915]
[  108.424591]  i915_gem_execbuffer_reserve.isra.9+0x40e/0x440 [i915]
[  108.424643]  i915_gem_do_execbuffer.isra.15+0x6d9/0x1b20 [i915]
[  108.424696]  i915_gem_execbuffer2+0xc0/0x250 [i915]
[  108.424712]  drm_ioctl+0x200/0x450
[  108.424760]  ? i915_gem_execbuffer+0x330/0x330 [i915]
[  108.424776]  do_vfs_ioctl+0x90/0x6e0
[  108.424789]  ? up_read+0x1a/0x40
[  108.424800]  ? trace_hardirqs_on_caller+0x122/0x1b0
[  108.424813]  SyS_ioctl+0x3c/0x70
[  108.424828]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  108.424839] RIP: 0033:0x7f8235867357
[  108.424848] RSP: 002b:00007ffdc14504c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  108.424866] RAX: ffffffffffffffda RBX: 00007ffdc1450600 RCX: 00007f8235867357
[  108.424878] RDX: 00007ffdc14505a0 RSI: 0000000040406469 RDI: 0000000000000003
[  108.424890] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000022
[  108.424903] R10: 0000000000000007 R11: 0000000000000246 R12: 0000000000000002
[  108.424915] R13: 0000000000419101 R14: 00007ffdc1450600 R15: 00007ffdc14505f0
[  108.424928] Code: 45 b8 8b 4d c0 4c 89 f2 48 89 de ff d0 49 8b 07 4c 8b 45 b8 48 85 c0 75 dd 65 ff 0d d4 a1 c8 5f 0f 84 47 01 00 00 e9 0d fe ff ff <0f> 0b 45 31 f6 4c 8b 65 c8 49 8b 04 24 4d 39 ec 49 8d 9c 24 28
[  108.425055] RIP: i915_gem_evict_for_node+0x237/0x410 [i915] RSP: ffffc90000a17a58

Fixes: 172ae5b4c8 ("drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)")
Fixes: 606fec956c ("drm/i915: Prefer random replacement before eviction search")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170111182132.19174-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-12 07:51:49 +00:00
Chris Wilson
625d988acc drm/i915: Extract reserving space in the GTT to a helper
Extract drm_mm_reserve_node + calling i915_gem_evict_for_node into its
own routine so that it can be shared rather than duplicated.

v2: Kerneldoc

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: igvt-g-dev@lists.01.org
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170111112312.31493-2-chris@chris-wilson.co.uk
2017-01-11 12:28:13 +00:00
Chris Wilson
f51455d442 drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZE
Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.

v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-10 20:54:32 +00:00
Chris Wilson
121dfbb2a2 drm/i915: Clear ret before unbinding in i915_gem_evict_something()
Missed when rebasing patches, I failed to set ret to zero before
starting the unbind loop (which depends upon ret being zero).

Reported-by: Matthew Auld <matthew.william.auld@gmail.com>
Fixes: 9332f3b1b9 ("drm/i915: Combine loops within i915_gem_evict_something")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105155940.10033-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Cc: <stable@vger.kernel.org> # v4.9+
2017-01-05 16:44:51 +00:00
Daniel Vetter
ef426c1038 Merge tag 'drm-misc-next-2016-12-30' of git://anongit.freedesktop.org/git/drm-misc into drm-intel-next-queued
Directly merge drm-misc into drm-intel since Dave is on vacation and
we need the various drm-misc patches (fb format rework, drm mm fixes,
selftest framework and others). Also pulled back -rc2 in first to
resync with drm-intel-fixes and make sure I can reuse the exact rerere
solutions from drm-tip for safety, and because I'm lazy.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-01-04 11:41:10 +01:00
Chris Wilson
3fa489dabe drm: Apply tight eviction scanning to color_adjust
Using mm->color_adjust makes the eviction scanner much tricker since we
don't know the actual neighbours of the target hole until after it is
created (after scanning is complete). To work out whether we need to
evict the neighbours because they impact upon the hole, we have to then
check the hole afterwards - requiring an extra step in the user of the
eviction scanner when they apply color_adjust.

v2: Massage kerneldoc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-34-chris@chris-wilson.co.uk
2016-12-28 13:23:06 +01:00
Chris Wilson
0b04d474a6 drm: Compute tight evictions for drm_mm_scan
Compute the minimal required hole during scan and only evict those nodes
that overlap. This enables us to reduce the number of nodes we need to
evict to the bare minimum.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-31-chris@chris-wilson.co.uk
2016-12-28 11:50:55 +01:00
Chris Wilson
2c4b389518 drm: Unconditionally do the range check in drm_mm_scan_add_block()
Doing the check is trivial (low cost in comparison to overall eviction)
and helps simplify the code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-29-chris@chris-wilson.co.uk
2016-12-28 11:50:28 +01:00
Chris Wilson
9a71e27788 drm: Extract struct drm_mm_scan from struct drm_mm
The scan state occupies a large proportion of the struct drm_mm and is
rarely used and only contains temporary state. That makes it suitable to
moving to its struct and onto the stack of the callers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: Fix up etnaviv to compile, was missing a BUG_ON.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-12-27 16:44:13 +01:00
Chris Wilson
7155b057c6 drm/i915: Retire before attempting to evict from the active lists
Some object retain an extra pin whilst they are active (e.g. contexts).
This excludes them from being considered for eviction unless we idle the
GPU. If before we look at the active list, we retire beforehand we can
hopefully remove a few excess pins and reduce the amount of searching
required.

v2: Similar principle applies to evict_for_vma

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161209150555.602-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-12-12 12:25:32 +00:00
Chris Wilson
172ae5b4c8 drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)
Soft-pinning depends upon being able to check for availabilty of an
interval and evict overlapping object from a drm_mm range manager very
quickly. Currently it uses a linear list, and so performance is dire and
not suitable as a general replacement. Worse, the current code will oops
if it tries to evict an active buffer.

It also helps if the routine reports the correct error codes as expected
by its callers and emits a tracepoint upon use.

For posterity since the wrong patch was pushed (i.e. that missed these
key points and had known bugs), this is the changelog that should have
been on commit 506a8e87d8 ("drm/i915: Add soft-pinning API for
execbuffer"):

Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.

This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following:
* if the user supplies a virtual address via the execobject->offset
  *and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then
  that object is placed at that offset in the address space selected
  by the context specifier in execbuffer.
* the location must be aligned to the GTT page size, 4096 bytes
* as the object is placed exactly as specified, it may be used by this
  execbuffer call without relocations pointing to it

It may fail to do so if:
* EINVAL is returned if the object does not have a 4096 byte aligned
  address
* the object conflicts with another pinned object (either pinned by
  hardware in that address space, e.g. scanouts in the aliasing ppgtt)
  or within the same batch.
  EBUSY is returned if the location is pinned by hardware
  EINVAL is returned if the location is already in use by the batch
* EINVAL is returned if the object conflicts with its own alignment (as meets
  the hardware requirements) or if the placement of the object does not fit
  within the address space

All other execbuffer errors apply.

Presence of this execbuf extension may be queried by passing
I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for
a reported value of 1 (or greater).

v2: Combine the hole/adjusted-hole ENOSPC checks
v3: More color, more splitting, more blurb.

Fixes: 506a8e87d8 ("drm/i915: Add soft-pinning API for execbuffer")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-2-chris@chris-wilson.co.uk
2016-12-05 20:49:17 +00:00
Chris Wilson
49d73912cb drm/i915: Convert vm->dev backpointer to vm->i915
99% of the time we access i915_address_space->dev we want the i915
device and not the drm device, so let's store the drm_i915_private
backpointer instead. The only real complication here are the inlines
in i915_vma.h where drm_i915_private is not yet defined and so we have
to choose an alternate path for our asserts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129095008.32622-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-11-29 11:38:00 +00:00
Chris Wilson
80b204bce8 drm/i915: Enable multiple timelines
With the infrastructure converted over to tracking multiple timelines in
the GEM API whilst preserving the efficiency of using a single execution
timeline internally, we can now assign a separate timeline to every
context with full-ppgtt.

v2: Add a comment to indicate the xfer between timelines upon submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk
2016-10-28 20:53:57 +01:00
Chris Wilson
4c7d62c6b8 drm/i915: Markup GEM API with lockdep asserts
Add lockdep_assert_held(struct_mutex) to the API preamble of the
internal GEM interfaces.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-9-chris@chris-wilson.co.uk
2016-10-28 20:53:45 +01:00
Chris Wilson
275f039db5 drm/i915: Move user fault tracking to a separate list
We want to decouple RPM and struct_mutex, but currently RPM has to walk
the list of bound objects and remove userspace mmapping before we
suspend (otherwise userspace may continue to access the GTT whilst it is
powered down). This currently requires the struct_mutex to walk the
bound_list, but if we move that to a separate list and lock we can take
the first step towards removing the struct_mutex.

v2: Split runtime suspend unmapping vs regular unmapping, to make the
locking (and barriers) clearer. Add the object to the userfault_list
prior to inserting the first PTE, the race between add/revoke depends
upon struct_mutex for regular unmappings and rpm for runtime-suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v1
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-1-chris@chris-wilson.co.uk
2016-10-24 13:45:35 +01:00
Akash Goel
3b3f1650b1 drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
	struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
	struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.

There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).

v2:
- Remove the engine iterator field added in drm_i915_private structure,
  instead pass a local iterator variable to the for_each_engine**
  macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
  NULL pointer check on engine pointer. (Chris)

v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
  can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
  engine specific init is done later in Driver load sequence.

v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().

v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
  allocation of intel_engine_cs structure. (Chris)

v6:
- Rebase.

v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.

v8: Rebase.

v9: Rebase.

v10:
- For index calculation use engine ID instead of pointer based arithmetic in
  intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
  check for NULL engine pointer in cleanup() routines. (Joonas)

v11: Rebase.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 09:58:43 +01:00
Chris Wilson
22dd3bb919 drm/i915: Mark up all locked waiters
In the next patch we want to handle reset directly by a locked waiter in
order to avoid issues with returning before the reset is handled. To
handle the reset, we must first know whether we hold the struct_mutex.
If we do not hold the struct_mtuex we can not perform the reset, but we do
not block the reset worker either (and so we can just continue to wait for
request completion) - otherwise we must relinquish the mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-10-chris@chris-wilson.co.uk
2016-09-09 14:23:03 +01:00
Chris Wilson
ea746f3659 drm/i915: Expand bool interruptible to pass flags to i915_wait_request()
We need finer control over wakeup behaviour during i915_wait_request(),
so expand the current bool interruptible to a bitmask.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-9-chris@chris-wilson.co.uk
2016-09-09 14:23:03 +01:00
Chris Wilson
821188778b drm/i915: Choose not to evict faultable objects from the GGTT
Often times we do not want to evict mapped objects from the GGTT as
these are quite expensive to teardown and frequently reused (causing an
equally, if not more so, expensive setup). In particular, when faulting
in a new object we want to avoid evicting an active object, or else we
may trigger a page-fault-of-doom as we ping-pong between evicting two
objects.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-26-chris@chris-wilson.co.uk
2016-08-18 22:36:55 +01:00
Chris Wilson
dcff85c844 drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex
The principal motivation for this was to try and eliminate the
struct_mutex from i915_gem_suspend - but we still need to hold the mutex
current for the i915_gem_context_lost(). (The issue there is that there
may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
struct_mutex via the stop_machine().) For the moment, enabling last
request tracking for the engine, allows us to do busyness checking and
waiting without requiring the struct_mutex - which is useful in its own
right.

As a side-effect of having a robust means for tracking engine busyness,
we can replace our other busyness heuristic, that of comparing against
the last submitted seqno. For paranoid reasons, we have a semi-ordered
check of that seqno inside the hangchecker, which we can now improve to
an ordered check of the engine's busyness (removing a locked xchg in the
process).

v2: Pass along "bool interruptible" as being unlocked we cannot rely on
i915->mm.interruptible being stable or even under our control.
v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk
2016-08-05 10:54:37 +01:00
Chris Wilson
20dfbde463 drm/i915: Wrap vma->pin_count accessors with small inline helpers
In the next few patches, the VMA pinning API is overhauled and to reduce
the churn we pull out the update to the accessors into a prep patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-14-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:58 +01:00
Chris Wilson
2ffffd0f85 drm/i915: Fix up vma alignment to be u64
This is not the full fix, as we are required to percolate the u64 nature
down through the drm_mm stack, but this is required now to prevent
explosions due to mismatch between execbuf (eb_vma_misplaced) and vma
binding (i915_vma_misplaced) - and reduces the risk of spurious changes
as we adjust the vma interface in the next patches.

v2: long long casts not required for u64 printk (%llx)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-6-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:52 +01:00
Chris Wilson
e522ac2324 drm/i915: Remove surplus drm_device parameter to i915_gem_evict_something()
Eviction is VM local, so we can ignore the significance of the
drm_device in the caller, and leave it to i915_gem_evict_something() to
manage itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-2-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:50 +01:00
Chris Wilson
9332f3b1b9 drm/i915: Combine loops within i915_gem_evict_something
Slight micro-optimise to produce combine loops so that gcc is able to
optimise the inner-loops concisely. Since we are reviewing the loops, we
can update the comments to describe the current state of affairs, in
particular the distinction between evicting from the global GTT (which
may contain untracked items and transient global pins) and the
per-process GTT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-1-git-send-email-chris@chris-wilson.co.uk
2016-08-04 20:19:49 +01:00
Chris Wilson
b1f788c6ac drm/i915: Release vma when the handle is closed
In order to prevent a leak of the vma on shared objects, we need to
hook into the object_close callback to destroy the vma on the object for
this file. However, if we destroyed that vma immediately we may cause
unexpected application stalls as we try to unbind a busy vma - hence we
defer the unbind to when we retire the vma.

v2: Keep vma allocated until closed. This is useful for a later
optimisation, but it is required now in order to handle potential
recursion of i915_vma_unbind() by retiring itself.
v3: Comments are important.

Testcase: igt/gem_ppggtt/flink-and-close-vma-leak
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-26-git-send-email-chris@chris-wilson.co.uk
2016-08-04 08:09:33 +01:00
Chris Wilson
f8c417cdb1 drm/i915: Rename drm_gem_object_unreference in preparation for lockless free
Ultimately wraps kref_put(), so adopt its nomenclature for consistency
with other subsystems.

s/drm_gem_object_unreference/i915_gem_object_put/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-6-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-5-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:12 +01:00
Chris Wilson
25dc556a2a drm/i915: Wrap drm_gem_object_reference in i915_gem_object_get
Ultimately wraps kref_get(), so adopt its nomenclature for consistency
with other subsystems.

s/drm_gem_object_reference/i915_gem_object_get/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-5-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-4-git-send-email-chris@chris-wilson.co.uk
2016-07-20 13:40:11 +01:00
Chris Wilson
945657b461 drm/i915/evict: Always switch away from the current context
Currently execlists is exempt from emitting a request to switch each
ring away from the current context over to the dev_priv->kernel_context
(for whatever reason, just under execlists the GGTT is unlikely to be as
fragmented, however the switch may help in some extreme cases). Extract
the switcher and enable it for execlsts as well, as we need to do so in
a later patch to force the context switch before suspend. (And since for
that switch we explicitly require the disposable kernel context, rename
the extracted function.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468590980-6186-1-git-send-email-chris@chris-wilson.co.uk
2016-07-15 20:40:53 +01:00
Chris Wilson
883445d43e drm/i915: Only switch to default context when evicting from GGTT
The contexts only pin space within the global GTT. Therefore forcing the
switch to the perma-pinned kernel context only has an effect when trying
to evict from and find room within the global GTT. We can then restrict
the switch to only when operating on the default context. This is mostly
a no-op as full-ppgtt only exists with execlists at present which skips
the context switch anyway.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-7-git-send-email-chris@chris-wilson.co.uk
2016-06-24 15:03:32 +01:00
Chris Wilson
6e5a5beb8e drm/i915: Split idling from forcing context switch
We only need to force a switch to the kernel context placeholder during
eviction. All other uses of i915_gpu_idle() just want to wait until
existing work on the GPU is idle. Rename i915_gpu_idle() to
i915_gem_wait_for_idle() to avoid any implications about "parking" the
context first.

v2: Tweak an error message if the wait fails for the ilk vtd w/a

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-6-git-send-email-chris@chris-wilson.co.uk
2016-06-24 15:03:14 +01:00
Chris Wilson
c033666a94 drm/i915: Store a i915 backpointer from engine, and use it
text	   data	    bss	    dec	    hex	filename
6309351	3578714	 696320	10584385	 a18141	vmlinux
6308391	3578714	 696320	10583425	 a17d81	vmlinux

Almost 1KiB of code reduction.

v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions

   text	   data	    bss	    dec	    hex	filename
6304579	3578778	 696320	10579677	 a16edd	vmlinux
6303427	3578778	 696320	10578525	 a16a5d	vmlinux

Now over 1KiB!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
2016-05-09 13:41:24 +01:00
Chris Wilson
1c7f4bca5a drm/i915: Rename vma->*_list to *_link for consistency
Elsewhere we have adopted the convention of using '_link' to denote
elements in the list (and '_list' for the actual list_head itself), and
that the name should indicate which list the link belongs to (and
preferrably not just where the link is being stored).

s/vma_link/obj_link/ (we iterate over obj->vma_list)
s/mm_list/vm_link/ (we iterate over vm->[in]active_list)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-02-26 13:15:39 +00:00
Chris Wilson
506a8e87d8 drm/i915: Add soft-pinning API for execbuffer
Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

v2: Fixed incorrect eviction found by Michal Winiarski - fix suggested by Chris
Wilson.  Fixed incorrect error paths causing crash found by Michal Winiarski.
(Not published externally)

v3: Rebased because of trivial conflict in object_bind_to_vm.  Fixed eviction
to allow eviction of soft-pinned objects when another soft-pinned object used
by a subsequent execbuffer overlaps reported by Michal Winiarski.
(Not published externally)

v4: Moved soft-pinned objects to the front of ordered_vmas so that they are
pinned first after an address conflict happens to avoid repeated conflicts in
rare cases (Suggested by Chris Wilson).  Expanded comment on
drm_i915_gem_exec_object2.offset to cover this new API.

v5: Added I915_PARAM_HAS_EXEC_SOFTPIN parameter for detecting this capability
(Kristian). Added check for multiple pinnings on eviction (Akash). Made sure
buffers are not considered misplaced without the user specifying
EXEC_OBJECT_SUPPORTS_48B_ADDRESS.  User must assume responsibility for any
addressing workarounds.  Updated object2.offset field comment again to clarify
NO_RELOC case (Chris).  checkpatch cleanup.

v6: Trivial rebase on latest drm-intel-nightly

v7: Catch attempts to pin above the max virtual address size and return
EINVAL (Tvrtko). Decouple EXEC_OBJECT_SUPPORTS_48B_ADDRESS and
EXEC_OBJECT_PINNED flags, user must pass both flags in any attempt to pin
something at an offset above 4GB (Chris, Daniel Vetter).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Zou Nanhai <nanhai.zou@intel.com>
Cc: Kristian Høgsberg <hoegsberg@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Acked-by: PDT
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1449575707-20933-1-git-send-email-thomas.daniel@intel.com
2015-12-09 10:20:17 +00:00
Chris Wilson
ce8daef358 drm/i915: Remove dead i915_gem_evict_everything()
With UMS gone, we no longer use it during suspend. And with the last
user removed from the shrinker, we can remove the dead code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-10-07 16:05:40 +02:00
Daniel Vetter
eb0b44adc0 drm/i915: kerneldoc for i915_gem_shrinker.c
And remove one bogus * from i915_gem_gtt.c since that's not a
kerneldoc there.

v2: Review from Chris:
- Clarify memory space to better distinguish from address space.
- Add note that shrink doesn't guarantee the freed memory and that
  users must fall back to shrink_all.
- Explain how pinning ties in with eviction/shrinker.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-03-20 11:48:16 +01:00
Ben Widawsky
b9b5dce5e7 drm/i915: Add some extra guards in evict_vm
v2: Use WARN_ONs (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-06 09:07:59 +01:00
Daniel Vetter
7838a63a53 drm/i915: Include i915_gem_evict.c kerneldoc into the drm docbook
I've written these long before we've had a reasonable docbook
structure, and naturally they've gone stale. Fix this up asap.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-01-06 09:07:59 +01:00
Michel Thierry
cf30362674 drm/i915: fix another use-after-free in i915_gem_evict_everything
Also here, i915_gem_evict_vm causes an unbind, which can end up dropping
the last ref to the ppgtt.

Triggered by igt gem_evict_everything test.

Testcase: igt/gem_evict_everything
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@cris-wilsonc.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-09-19 14:41:16 +02:00
Chris Wilson
d23db88c3a drm/i915: Prevent negative relocation deltas from wrapping
This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocations deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.

So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.

This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.

v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.

v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
  were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
  observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.

v5: Add static to eb_get_batch, spotted by 0-day tester.

Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-27 11:18:40 +03:00