With Intel GVT-g, the fence registers are partitioned by multiple
vGPU instances in different VMs. Routine i915_gem_load() is modified
to reset the num_fence_regs, when the driver detects it's running in
a VM. Accesses to the fence registers from vGPU will be trapped and
remapped by the host side. And the allocated fence number is provided
in PV INFO page structure. By now, the value of fence number is fixed,
but in the future we can relax this limitation, to allocate the fence
registers dynamically from host side.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Jike Song <jike.song@intel.com>
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The core fix was applied in
commit a63b03e2d2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Jan 6 10:29:35 2015 +0000
mutex: Always clear owner field upon mutex_unlock()
(note the absence of stable@ tag)
so we can now revert our band-aid commit 226e5ae9e5 for -next.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When run as a timer, i915_hangcheck_elapsed() must adhere to all the
rules of running in a softirq context. This is advantageous to us as we
want to minimise the risk that a driver bug will prevent us from
detecting a hung GPU. However, that is irrelevant if the driver bug
prevents us from resetting and recovering. Still it is prudent not to
rely on mutexes inside the checker, but given the coarseness of
dev->struct_mutex doing so is extremely hard.
Give in and run from a work queue, i.e. outside of softirq.
v2: Use own workqueue to avoid deadlocks (Daniel)
Cleanup commit msg and add comment to i915_queue_hangcheck() (Chris)
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Daniel Vetter <dnaiel.vetter@ffwll.chm>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Remove accidental kerneldoc comment starter, to appease the 0
day builder.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We can push down the decision whether to force flushing into the
implementation since in all places that matter obj->pin_display is
accurate already. The only place where the optimization really matters
is the sw_finish_ioctl, and that already checks for obj->pin_display
on its own.
I suspect that this was simply an artifact of how
commit 2c22569bba
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Aug 9 12:26:45 2013 +0100
drm/i915: Update rules for writing through the LLC with the cpu
evolved - only v2 added the pin_display tracking.
Note that we still retain the gist of this logic from the above commit
with the explicit force argument for the low-level clflush function.
Ville noted in his review that there's a slight behavioural change in
the set_to_gtt_domain function, which now also will flush display
plane data. This opens-open the potential for userspace to start doing
buggy things by omitting the sw_finish_ioctl, which is why I've
rejected a functional equivalent patch from Ville a while ago:
http://lists.freedesktop.org/archives/intel-gfx/2013-November/036421.html
But on second consideration it's not that evil, and in any case the
justification here is more clarity, not allowing crazy userspace.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently we are hitting the WARN inside
i915_gem_object_set_cache_level() as we can now have an unbound object
in the GTT write domain (due to 43566dedde "drm/i915: Broaden
application of set-domain(GTT)"). To avoid the warning, we need to track
when we elided the clflush on a cacheable object and then evict the
cache for the object when we move the object out of a cacheable domain.
Reported-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Jani Nikula <jani.nikula@intel.com>
Testcase: igt/gem_mmap_wc/set-cache-level
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88607
Tested-by: huax.lu@intel.com
[danvet: Split if into nested if as discussion on the m-l.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We pin when we submit to execlist queue. Balance
the pinning when the submitted queue is cleaned on reset.
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move all remaining elements that were unique to execlists queue items
in to the associated request.
Issue: VIZ-4274
v2: Rebase. Fixed issue of overzealous freeing of request.
v3: Removed re-addition of cleanup work queue (found by Daniel Vetter)
v4: Rebase.
v5: Actual removal of intel_ctx_submit_request. Update both tail and postfix
pointer in __i915_add_request (found by Thomas Daniel)
v6: Removed unrelated changes
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
[danvet: Reformat comment with strange linebreaks.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The first pass implementation of execlists required a backpointer to the context to be held
in the intel_ringbuffer. However the context pointer is available higher in the call stack.
Remove the backpointer from the ring buffer structure and instead pass it down through the
call stack.
v2: Integrate this changeset with the removal of duplicate request/execlist queue item members.
v3: Rebase
v4: Rebase. Remove passing of context when the request is passed.
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Where there were duplicate variables for the tail, context and ring (engine)
in the gem request and the execlist queue item, use the one from the request
and remove the duplicate from the execlist queue item.
Issue: VIZ-4274
v1: Rebase
v2: Fixed build issues. Keep separate postfix & tail pointers as these are
used in different ways. Reinserted missing full tail pointer update.
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- refactor i915/snd-hda interaction to use the component framework (Imre)
- psr cleanups and small fixes (Rodrigo)
- a few perf w/a from Ken Graunke
- switch to atomic plane helpers (Matt Roper)
- wc mmap support (Chris Wilson & Akash Goel)
- smaller things all over
* tag 'drm-intel-next-2015-01-17' of git://anongit.freedesktop.org/drm-intel: (40 commits)
drm/i915: Update DRIVER_DATE to 20150117
i915: reuse %ph to dump small buffers
drm/i915: Ensure the HiZ RAW Stall Optimization is on for Cherryview.
drm/i915: Enable the HiZ RAW Stall Optimization on Broadwell.
drm/i915: PSR link standby at debugfs
drm/i915: group link_standby setup and let this info visible everywhere.
drm/i915: Add missing vbt check.
drm/i915: PSR HSW/BDW: Fix inverted logic at sink main_link_active bit.
drm/i915: PSR VLV/CHV: Remove condition checks that only applies to Haswell.
drm/i915: VLV/CHV PSR needs to exit PSR on every flush.
drm/i915: Fix kerneldoc for i915 atomic plane code
drm/i915: Don't pretend SDVO hotplug works on 915
drm/i915: Don't register HDMI connectors for eDP ports on VLV/CHV
drm/i915: Remove I915_HAS_HOTPLUG() check from i915_hpd_irq_setup()
drm/i915: Make hpd arrays big enough to avoid out of bounds access
Revert "drm/i915/chv: Use timeout mode for RC6 on chv"
drm/i915: Improve HiZ throughput on Cherryview.
drm/i915: Reset CSB read pointer in ring init
drm/i915: Drop unused position fields (v2)
drm/i915: Move to atomic plane helpers (v9)
...
Backmerge Linus tree after rc5 + drm-fixes went in.
There were a few amdkfd conflicts I wanted to avoid,
and Ben requested this for nouveau also.
Conflicts:
drivers/gpu/drm/amd/amdkfd/Makefile
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
drivers/gpu/drm/amd/amdkfd/kfd_priv.h
drivers/gpu/drm/amd/include/kgd_kfd_interface.h
drivers/gpu/drm/i915/intel_runtime_pm.c
drivers/gpu/drm/radeon/radeon_kfd.c
Conflicts:
drivers/gpu/drm/i915/intel_runtime_pm.c
Separate branch so that Takashi can also pull just this refactoring
into sound-next.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
If CONFIG_DEBUG_MUTEXES is set, the mutex->owner field is only cleared
if the mutex debugging is enabled which introduces a race in our
mutex_is_locked_by() - i.e. we may inspect the old owner value before it
is acquired by the new task.
This is the root cause of this error:
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 5cf6731..3ef3736 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -80,13 +80,13 @@ void debug_mutex_unlock(struct mutex *lock)
DEBUG_LOCKS_WARN_ON(lock->owner != current);
DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
- mutex_clear_owner(lock);
}
/*
* __mutex_slowpath_needs_to_unlock() is explicitly 0 for debug
* mutexes so that we can do it here after we've verified state.
*/
+ mutex_clear_owner(lock);
atomic_set(&lock->count, 1);
}
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87955
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
- plane handling refactoring from Matt Roper and Gustavo Padovan in prep for
atomic updates
- fixes and more patches for the seqno to request transformation from John
- docbook for fbc from Rodrigo
- prep work for dual-link dsi from Gaurav Signh
- crc fixes from Ville
- special ggtt views infrastructure from Tvrtko Ursulin
- shadow patch copying for the cmd parser from Brad Volkin
- execlist and full ppgtt by default on gen8, for testing for now
* tag 'drm-intel-next-2014-12-19' of git://anongit.freedesktop.org/drm-intel: (131 commits)
drm/i915: Update DRIVER_DATE to 20141219
drm/i915: Hold runtime PM during plane commit
drm/i915: Organize bind_vma funcs
drm/i915: Organize INSTDONE report for future.
drm/i915: Organize PDP regs report for future.
drm/i915: Organize PPGTT init
drm/i915: Organize Fence registers for future enablement.
drm/i915: tame the chattermouth (v2)
drm/i915: Warn about missing context state workarounds only once
drm/i915: Use true PPGTT in Gen8+ when execlists are enabled
drm/i915: Skip gunit save/restore for cherryview
drm/i915/chv: Use timeout mode for RC6 on chv
drm/i915: Add GPGPU_THREADS_DISPATCHED to the register whitelist
drm/i915: Tidy up execbuffer command parsing code
drm/i915: Mark shadow batch buffers as purgeable
drm/i915: Use batch length instead of object size in command parser
drm/i915: Use batch pools with the command parser
drm/i915: Implement a framework for batch buffer pools
drm/i915: fix use after free during eDP encoder destroying
drm/i915/skl: Skylake also supports DP MST
...
This will allow us to set per-file, or even per-context, periods in the
future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Previously, this was restricted to only operate on bound objects - to
make pointer access through the GTT to the object coherent with writes
to and from the GPU. A second usecase is drm_intel_bo_wait_rendering()
which at present does not function unless the object also happens to
be bound into the GGTT (on current systems that is becoming increasingly
rare, especially for the typical requests from mesa). A third usecase is
a future patch wishing to extend the coverage of the GTT domain to
include objects not bound into the GGTT but still in its coherent cache
domain. For the latter pair of requests, we need to operate on the
object regardless of its bind state.
v2: After discussion with Akash, we came to the conclusion that the
get-pages was required in order for accurate domain tracking in the
corner cases (like the shrinker) and also useful for ensuring memory
coherency with earlier cached CPU mmaps in case userspace uses exotic
cache bypass (non-temporal) instructions.
v3: Fix the inactive object check.
v4: Rebase to latest drm-intel-nightly codebase
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I've had these since before -rc1, but they missed my last pull
request. Real bug fixes and mostly cc: stable material.
* tag 'drm-intel-next-fixes-2014-12-30' of git://anongit.freedesktop.org/drm-intel:
drm/i915: add missing rpm ref to i915_gem_pwrite_ioctl
Revert "drm/i915: Preserve VGACNTR bits from the BIOS"
drm/i915: Don't call intel_prepare_page_flip() multiple times on gen2-4
drm/i915: Kill check_power_well() calls
This reverts commit 355a701838.
This had some bad side effects under normal operation, and should
have been dropped earlier.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Without this RPM ref we can hit the device suspended WARN via:
i915_gem_object_pin()->ggtt_bind_vma->gen6_ggtt_insert_entries(). I
noticed this on my BYT while keeping the i915 device in runtime
suspended state for a while. I chose this place to take the ref to
avoid the possible deadlock via the mutex_lock taken both later in this
function and in the runtime suspend handler. This can happen if an RPM
suspend event is queued and need to be flushed before taking the RPM
ref.
Testcase: igt/pm_rpm/gem-evict-pwrite
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87363
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Let's be optimistic that for future platforms this will remain the same
and reorg a bit.
This reorg in if blocks instead of switch make life easier for future
platform support addition.
v2: Jani pointed out I was missing reg_830 for some gen3 platforms. So let's make
this platforms subcases of Gen checks.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch sets up all of the tracking and copying necessary to
use batch pools with the command parser and dispatches the copied
(shadow) batch to the hardware.
After this patch, the parser is in 'enabling' mode.
Note that performance takes a hit from the copy in some cases
and will likely need some work. At a rough pass, the memcpy
appears to be the bottleneck. Without having done a deeper
analysis, two ideas that come to mind are:
1) Copy sections of the batch at a time, as they are reached
by parsing. Might improve cache locality.
2) Copy only up to the userspace-supplied batch length and
memset the rest of the buffer. Reduces the number of reads.
v2:
- Remove setting the capacity of the pool
- One global pool instead of per-ring pools
- Replace batch_obj with shadow_batch_obj and hook into eb->vmas
- Memset any space in the shadow batch beyond what gets copied
- Rebased on execlist prep refactoring
v3:
- Rebase on chained batch handling
- Squash in setting the secure dispatch flag
- Add a note about the interaction w/secure dispatch pinning
- Check for request->batch_obj == NULL in i915_gem_free_request
v4:
- Fix read domains for shadow_batch_obj
- Remove the set_to_gtt_domain call from i915_parse_cmds
- ggtt_pin/unpin in the parser block to simplify error handling
- Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
- Remove i915_gem_batch_pool_put calls
v5:
- Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after
the parser (danvet, from v4 0/7 feedback)
Issue: VIZ-4719
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This adds a small module for managing a pool of batch buffers.
The only current use case is for the command parser, as described
in the kerneldoc in the patch. The code is simple, but separating
it out makes it easier to change the underlying algorithms and to
extend to future use cases should they arise.
The interface is simple: init to create an empty pool, fini to
clean it up, get to obtain a new buffer. Note that all buffers are
expected to be inactive before cleaning up the pool.
Locking is currently based on the caller holding the struct_mutex.
We already do that in the places where we will use the batch pool
for the command parser.
v2:
- s/BUG_ON/WARN_ON/ for locking assertions
- Remove the cap on pool size
- Switch from alloc/free to init/fini
v3:
- Idiomatic looping structure in _fini
- Correct handling of purged objects
- Don't return a buffer that's too much larger than needed
v4:
- Rebased to latest -nightly
v5:
- Remove _put() function and clean up comments to match
v6:
- Move purged check inside the loop (danvet, from v4 1/7 feedback)
v7:
- Use single list instead of two. (Chris W)
- s/active_list/cache_list
- Squashed in debug patches (Chris W)
drm/i915: Add a batch pool debugfs file
It provides some useful information about the buffers in
the global command parser batch pool.
v2: rebase on global pool instead of per-ring pools
v3: rebase
drm/i915: Add batch pool details to i915_gem_objects debugfs
To better account for the potentially large memory consumption
of the batch pool.
v8:
- Keep cache in LRU order (danvet, from v6 1/5 feedback)
Issue: VIZ-4719
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Things like reliable GGTT mappings and mirrored 2d-on-3d display will need
to map objects into the same address space multiple times.
Added a GGTT view concept and linked it with the VMA to distinguish between
multiple instances per address space.
New objects and GEM functions which do not take this new view as a parameter
assume the default of zero (I915_GGTT_VIEW_NORMAL) which preserves the
previous behaviour.
This now means that objects can have multiple VMA entries so the code which
assumed there will only be one also had to be modified.
Alternative GGTT views are supposed to borrow DMA addresses from obj->pages
which is DMA mapped on first VMA instantiation and unmapped on the last one
going away.
v2:
* Removed per view special casing in i915_gem_ggtt_prepare /
finish_object in favour of creating and destroying DMA mappings
on first VMA instantiation and last VMA destruction. (Daniel Vetter)
* Simplified i915_vma_unbind which does not need to count the GGTT views.
(Daniel Vetter)
* Also moved obj->map_and_fenceable reset under the same check.
* Checkpatch cleanups.
v3:
* Only retire objects once the last VMA is unbound.
v4:
* Keep scatter-gather table for alternative views persistent for the
lifetime of the VMA.
* Propagate binding errors to callers and handle appropriately.
v5:
* Explicitly look for normal GGTT view in i915_gem_obj_bound to align
usage in i915_gem_object_ggtt_unpin. (Michel Thierry)
* Change to single if statement in i915_gem_obj_to_ggtt. (Michel Thierry)
* Removed stray semi-colon in i915_gem_object_set_cache_level.
For: VIZ-4544
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
[danvet: Drop hunk from i915_gem_shrink since it's just prettification
but upsets a __must_check warning.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Release struct_mutex if init_rings() fails.
This is a regression introduced in
commit 35a57ffbb1
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Nov 20 00:33:07 2014 +0100
drm/i915: Only init engines once
Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
So apparently jiffies<->nsec<->ktime isn't accurate or something. At
elast if we timeout there's occasionally still a few hundred us left
(in a 2 second timeout).
Stuff I've tried and thrown out again:
- Sampling the before timestamp before jiffies. Doesn't improve test
path rate at all.
- Using jiffies. Way to inaccurate, which means way too much drift
with signals plus automatic ioctl restarting in userspace. In
hindsight we should have used an absolute timeout, but hey we need
something for v3 of the i915 gem wait interfaces ;-)
- Trying to figure out where accuracy gets lost. gl testcase really
don't care all that much about this (as long as isn't not massively
off), it's just that the testcase gets a bit upset if it receives an
EITME with timeout > 0.
So as long as we're in the ballbark it's good enough. So patch
everything up if we're at most one jiffies off. I get's me a solid
test again.
This regression is probably introduced in
commit 5ed0bdf21a
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Jul 16 21:05:06 2014 +0000
drm: i915: Use nsec based interfaces
Use ktime_get_raw_ns() and get rid of the back and forth timespec
conversions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Probably because I'm too lazy to confirm myself and still waiting for
QA ;-)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82749
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We've lost the +1 required for correct timeouts in
commit 5ed0bdf21a
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Jul 16 21:05:06 2014 +0000
drm: i915: Use nsec based interfaces
Use ktime_get_raw_ns() and get rid of the back and forth timespec
conversions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: John Stultz <john.stultz@linaro.org>
So fix this up by reinstating our handrolled _timeout function. While
at it bother with handling MAX_JIFFIES.
v2: Convert to usecs (we don't care about the accuracy anyway) first
to avoid overflow issues Dave Gordon spotted.
v3: Drop the explicit MAX_JIFFY_OFFSET check, usecs_to_jiffies should
take care of that already. It might be a bit too enthusiastic about it
though.
v4: Chris has a much nicer color, so use his implementation.
This requires to export nsec_to_jiffies from time.c.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82749
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Multiple GGTT VMAs per object will be introduced in the near future which will
make it impossible to guarantee normal GGTT view is at the head of the list.
Purpose of this patch is to break this assumption straight away so any
potential hidden assumptions in the code base can be bisected to this
simple patch.
For: VIZ-4544
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need to do that every time we resume the rings, not just at load.
I've overlooked this in my untangling of the ring init code.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We can do this.
And now there's finally the clean split between software setup and
hardware setup I kinda wanted since multi-ring support was merged
aeons ago. It only took almost 5 years.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Updated the trace_irq code to use requests instead of seqnos. This includes
reference counting the request object to ensure it sticks around when required.
Note that getting access to the reference counting functions means moving the
inline i915_trace_irq_get() function from intel_ringbuffer.h to i915_drv.h.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Resolve conflict due to shuffled merge order.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The ring member of the object structure was always updated with the
last_read_seqno member. Thus with the conversion to last_read_req, obj->ring is
now a direct copy of obj->last_read_req->ring. This makes it somewhat redundant
and potentially misleading (especially as there was no comment to explain its
purpose).
This checkin removes the redundant field. Many uses were simply testing for
non-null to see if the object is active on the GPU. Some of these have been
converted to check 'obj->active' instead. Others (where the last_read_req is
about to be used anyway) have been changed to check obj->last_read_req. The rest
simply pull the ring out from the request structure and proceed as before.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Almost everywhere that caled i915_seqno_passed() was really asking 'has the
given seqno popped out of the hardware yet?'. Thus it had to query the current
hardware seqno and then do a signed delta comparison (which copes with wrapping
around zero but not with seqno values more than 2GB apart, although the latter
is unlikely!).
Now that the majority of seqno instances have been replaced with request
structures, it is possible to convert this test to be request based as well.
There is now a 'i915_gem_request_completed()' function which takes a request and
returns true or false as appropriate. Note that this currently just wraps up the
original _passed() test but a later patch in the series will reduce this to
simply returning a cached internal value, i.e.:
_completed(req) { return req->completed; }'
This checkin converts almost all _seqno_passed() calls. The only one left is in
the semaphore code which still requires seqnos not request structures.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Drop hunk touching the trace_irq code since I've dropped the
patch which converts that, and resolve resulting conflict.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It makes a lot more sense (and makes future seqno -> request conversion patches
simpler) to fill in the 'ring' field of the request structure at the point of
creation rather than submission. Given that the request structure is assigned by
ring specific code and thus is locked to a ring from the start, there really is
no reason to defer this assignment.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
All the code above is now using requests not seqnos so it is possible to convert
the trace functions across. Note that rather than get into problematic reference
counting issues, the trace code only saves the seqno and ring values from the
request structure not the structure pointer itself.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is no longer any need to retrieve a seqno value from an i915_add_request()
call. The calling code already knows which request structure is being processed
(it can only be ring->OLR). And as the request itself is now used in preference
to the basic seqno value, the latter is now redundant in this situation.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that all code above is using request structures instead of seqno values, it
is possible to convert __wait_seqno() itself. Internally, it is still calling
i915_seqno_passed(), this will be updated later in the series. This step is just
changing the parameter list and function name.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Updated i915_wait_seqno() to take a request structure instead of a seqno value
and renamed it accordingly. Internally, it just pulls the seqno out of the
request and calls on to __wait_seqno() as before. However, all the code further
up the stack is now simplified as it can just pass the request object straight
through without having to peek inside.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Squash in hunk from an earlier patch which was rebased
wrongly.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Updated the _check_olr() function to actually take a request object and compare
it to the OLR rather than extracting seqnos and comparing those.
Note that there is one use case where the request object being processed is no
longer available at that point in the call stack. Hence a temporary copy of the
original function is still present (but called _check_ols() instead). This will
be removed in a subsequent patch.
Also, downgraded a BUG_ON to a WARN_ON as apparently the former is frowned upon
for shipping code.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The OLS value is now obsolete. Exactly the same value is guarateed to be always
available as PLR->seqno. Thus it is safe to remove the OLS completely. And also
to rename the PLR to OLR to keep the 'outstanding lazy ...' naming convention
valid.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Added reference counting of the request structure around __wait_seqno() calls.
This is a precursor to updating the wait code itself to take the request rather
than a seqno. At that point, it would be a Bad Idea for a request object to be
retired and freed while the wait code is still using it.
v3:
Note that even though the mutex lock is held during a call to i915_wait_seqno(),
it is still necessary to explicitly bump the reference count. It appears that
the shrinker can asynchronously retire items even though the mutex is locked.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Remove wrongly squashed hunk which breaks the build.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Convert the throttle code to use the request structure rather than extracting a
ring/seqno pair from it and using those. This is in preparation for
__wait_seqno() becoming __wait_request().
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The object structure contains the last read, write and fenced seqno values for
use in syncrhonisation operations. These have now been replaced with their
request structure counterparts.
Note that to ensure that objects do not end up with dangling pointers, the
assignments of last_*_req include reference count updates. Thus a request cannot
be freed if an object is still hanging on to it for any reason.
v2: Corrected 'last_rendering_' to 'last_read_' in a number of comments that did
not get updated when 'last_rendering_seqno' became 'last_read|write_seqno'
several millenia ago.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The plan is to use request structures everywhere that seqno values were
previously used. This means saving pointers to structures in places that used to
be simple integers. In turn, that means that the target structure now needs much
more stringent lifetime tracking. That is, it must not be freed while some other
random object still holds a pointer to it.
To achieve this tracking, a reference count needs to be added. Whenever a
pointer to the structure is saved away, the count must be incremented and the
free must only occur when all references have been released.
For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Before suspending, we wait upon the outstanding GPU requests and flush
our pending idle handlers. This should downclock the GPU to its lowest
power state. Add a WARN to check that the delayed tasks were run and did
their job properly.
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Dynamic context pinning for LRCs introduced a leak in legacy mode.
Reinstate context unreference in i915_gem_free_request for legacy contexts.
Leak reported by i-g-t/drv_module_reload fixed by this patch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86507
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: John Harrison<John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The problem here is that SNA pins batchbuffers to etch out a bit more
performance. Iirc it started out as a w/a for i830M (which we've
implemented in the kernel since a long time already). The problem is
that the pin ioctl wasn't added in
commit d23db88c3a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri May 23 08:48:08 2014 +0200
drm/i915: Prevent negative relocation deltas from wrapping
Fix this by simply disallowing pinning from userspace so that the
kernel is in full control of batch placement again. Especially since
distros are moving towards running X as non-root, so most users won't
even be able to see any benefits.
UMS support is dead now, but we need this minimal patch for
backporting. Follow-up patch will remove the pin ioctl code
completely.
Note to backporters: You must have both
commit b45305fce5
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Dec 17 16:21:27 2012 +0100
drm/i915: Implement workaround for broken CS tlb on i830/845
which laned in 3.8 and
commit c4d69da167
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Sep 8 14:25:41 2014 +0100
drm/i915: Evict CS TLBs between batches
which is also marked cc: stable. Otherwise this could introduce a
regression by disabling the userspace w/a without the kernel w/a being
fully functional on i830/45.
References: https://bugs.freedesktop.org/show_bug.cgi?id=76554#c116
Cc: stable@vger.kernel.org # requires c4d69da167 and v3.8
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
drm-intel-next-2014-11-21:
- infoframe tracking (for fastboot) from Jesse
- start of the dri1/ums support removal
- vlv forcewake timeout fixes (Imre)
- bunch of patches to polish the rps code (Imre) and improve it on bdw (Tom
O'Rourke)
- on-demand pinning for execlist contexts
- vlv/chv backlight improvements (Ville)
- gen8+ render ctx w/a work from various people
- skl edp programming (Satheeshakrishna et al.)
- psr docbook (Rodrigo)
- piles of little fixes and improvements all over, as usual
* tag 'drm-intel-next-2014-11-21-fixed' of git://anongit.freedesktop.org/drm-intel: (117 commits)
drm/i915: Don't pin LRC in GGTT when dumping in debugfs
drm/i915: Update DRIVER_DATE to 20141121
drm/i915/g4x: fix g4x infoframe readout
drm/i915: Only call mod_timer() if not already pending
drm/i915: Don't rely upon encoder->type for infoframe hw state readout
drm/i915: remove the IRQs enabled WARN from intel_disable_gt_powersave
drm/i915: Use ggtt error obj capture helper for gen8 semaphores
drm/i915: vlv: increase timeout when setting idle GPU freq
drm/i915: vlv: fix cdclk setting during modeset while suspended
drm/i915: Dump hdmi pipe_config state
drm/i915: Gen9 shadowed registers
drm/i915/skl: Gen9 multi-engine forcewake
drm/i915: Read power well status before other registers for drpc info
drm/i915: Pin tiled objects for L-shaped configs
drm/i915: Update ring freq for full gpu freq range
drm/i915: change initial rps frequency for gen8
drm/i915: Keep min freq above floor on HSW/BDW
drm/i915: Use efficient frequency for HSW/BDW
drm/i915: Can i915_gem_init_ioctl
drm/i915: Sanitize ->lastclose
...
It happens on occasion that developers of generic user-space applications
abuse the dumb buffer API to get hold of drm buffers that they can both
mmap() and use for GPU acceleration, using the assumptions that dumb buffers
and buffers available for GPU are
a) The same type and can be aribtrarily type-casted.
b) fully coherent.
This patch makes the most widely used drivers warn nicely when that happens,
the next step will be to fail.
v2: Move drmP.h changes to drm_gem.h. Fix Radeon dumb mmap breakage.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>