When checking for a dependency fence for belonging to the same entity
compare it with scheduled as well finished fence. Earlier we were only
comparing it with the scheduled fence.
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is required so we use the correct minimum clocks for Vega. Without
this pplib will never be able to enter the lowest clock states.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Casting a pointer to a 64-bit type causes a warning on 32-bit targets:
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:473:24: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
lower_32_bits((uint64_t)wptr));
^
drivers/gpu/drm/amd/amdgpu/amdgpu.h:1701:53: note: in definition of macro 'WREG32'
#define WREG32(reg, v) amdgpu_mm_wreg(adev, (reg), (v), 0)
^
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:473:10: note: in expansion of macro 'lower_32_bits'
lower_32_bits((uint64_t)wptr));
^~~~~~~~~~~~~
The correct method is to cast to 'uintptr_t'.
Fixes: d5a114a6c5 ("drm/amdgpu: Add GFXv9 kfd2kgd interface functions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
VBT seems to have some bits to tell us whether the internal LVDS port
has something hooked up. In theory one might expect the VBT to not have
a child device for the LVDS port if there's no panel hooked up, but
in practice many VBTs still add the child device. The "LVDS config" bits
seem more reliable though, so let's check those.
So far we've used the "LVDS config" bits to check for eDP support on
ILK+, and disable the internal LVDS when the value is 3. That value
is actually documented as "Both internal LVDS and SDVO LVDS", but in
practice it looks to mean "eDP" on all the ilk+ VBTs I've seen. So let's
keep that interpretation, but for pre-ILK we will consider the value
3 to also indicate the presence of the internal LVDS.
Currently we have 25 DMI matches for the "no internal LVDS" quirk. In an
effort to reduce that let's toss in a WARN when the DMI match and VBT
both tell us that the internal LVDS is not present. The hope is that
people will report a bug, and then we can just nuke the corresponding
entry from the DMI quirk list. Credits to Jani for this idea.
v2: Split the basic int_lvds_support thing to a separate patch (Jani)
v3: Rebase
v4: Limit this to VBT version >= 134
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180518150138.18361-1-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
In order to prepare the GPU for sleeping, we may want to submit commands
to it. This is a complicated process that may even require some swapping
in from shmemfs, if the GPU was in the wrong state. As such, we need to
do this preparation step synchronously before the rest of the system has
started to turn off (e.g. swapin fails if scsi is suspended).
Fortunately, we are provided with a such a hook, pm_ops.prepare().
v2: Compile cleanup
v3: Fewer asserts, fewer problems?
v4: Ville pointed out that in some circumstances (such as switching off
the overlay) the display code may issue a GPU request. This is
unexpected, and will result in us going to sleep with us believing the
GPU is still awake (though all user work has been saved). Add a comment
to remind our future selves of what trouble brews.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106640
Testcase: igt/drv_suspend after igt/gem_tiled_swapping
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180525092629.1456-1-chris@chris-wilson.co.uk
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
"id" needs to be signed for the error handling to work.
Fixes: 7a2d5c77c5 ("drm/exynos: fimc: Convert driver to IPP v2 core API")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Three fixes for vmwgfx. Two are cc'd stable and fix host logging and its
error paths on 32-bit VMs. One is a fix for a hibernate flaw
introduced with the 4.17 merge window.
* 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Schedule an fb dirty update after resume
drm/vmwgfx: Fix host logging / guestinfo reading error paths
drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
This is an important message, so it should be visible to users without
having to enable extra debugging.
Signed-off-by: Tom Stellard <tstellar@redhat.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
In fixed31_32.h, in dc_fixpt_shl,'/' was used for division of one long
long int by another long long int. As there is no inbuilt long long
int division function in c, gcc inserted its own. However, gcc does not
link the library that contains this function. To avoid this, use
bitwise operators instead of /
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
is_dpm_running callback was assigned to the same value
twice. Drop the duplicate.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To free the fence from the amdgpu_fence_slab, need twice call_rcu, to avoid
the amdgpu_fence_slab_fini call kmem_cache_destroy(amdgpu_fence_slab) before
kmem_cache_free(amdgpu_fence_slab, fence), add rcu_barrier after drm_sched_entity_fini.
The kmem_cache_free(amdgpu_fence_slab, fence)'s call trace as below:
1.drm_sched_entity_fini ->
drm_sched_entity_cleanup ->
dma_fence_put(entity->last_scheduled) ->
drm_sched_fence_release_finished ->
drm_sched_fence_release_scheduled ->
call_rcu(&fence->finished.rcu, drm_sched_fence_free)
2.drm_sched_fence_free ->
dma_fence_put(fence->parent) ->
amdgpu_fence_release ->
call_rcu(&f->rcu, amdgpu_fence_free) ->
kmem_cache_free(amdgpu_fence_slab, fence);
v2:put the barrier before the kmem_cache_destroy
v3:put the dma_fence_put(fence->parent) before call_rcu in
drm_sched_fence_release_scheduled
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only the moved state needs a separate spin lock protection. All other
states are protected by reserving the VM anyway.
v2: fix some more incorrect cases
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We were not very carefully checking to see if an older request on the
engine was an earlier switch-to-kernel-context before deciding to emit a
new switch. The end result would be that we could get into a permanent
loop of trying to emit a new request to perform the switch simply to
flush the existing switch.
What we need is a means of tracking the completion of each timeline
versus the kernel context, that is to detect if a more recent request
has been submitted that would result in a switch away from the kernel
context. To realise this, we need only to look in our syncmap on the
kernel context and check that we have synchronized against all active
rings.
v2: Since all ringbuffer clients currently share the same timeline, we do
have to use the gem_context to distinguish clients.
As a bonus, include all the tracing used to debug the death inside
suspend.
v3: Test, test, test. Construct a selftest to exercise and assert the
expected behaviour that multiple switch-to-contexts do not emit
redundant requests.
Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Fixes: a89d1f921c ("drm/i915: Split i915_gem_timeline into individual timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180524081135.15278-1-chris@chris-wilson.co.uk
To no surprise (since we've flip-flopped over the use of PIN_HIGH a few
times), doing a search by address over a pathologically fragmented
address space is exceeding slow. To protect ourselves from nearly
unbounded latency (think searching a million holes while under
struct_mutex), limit the search for the highest available hole and
fallback to best-fit if it fails.
In the pathologically fragmented case, such as igt/gem_ctx_thrash, the
effect is dramatic, bringing the runtime down from hours to seconds
(depending on how many other slow searches you hit, e.g. alloc_iova()
and alloc_vmap_area() both degrade to a slow rbtree walk after their
small cache is exhausted). For the real world, the number of search
steps is unlikely to be significant as we should only need to search
once per new context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180521082131.13744-3-chris@chris-wilson.co.uk
Searching for an available hole by address is slow, as there no
guarantee that a hole will be available and so we must walk over all
nodes in the rbtree before we determine the search was futile. In many
cases, the caller doesn't strictly care for the highest available hole
and was just opportunistically laying out the address space in a
preferred order. In such cases, the caller can accept any address and
would rather do so then do a slow walk.
To be able to mix search strategies, the caller wants to tell the drm_mm
how long to spend on the search. Without a good guide for what should be
the best split, start with a request to try once at most. That is return
the top-most (or lowest) hole if it fulfils the alignment and size
requirements.
v2: Documentation, by why of example (selftests) and kerneldoc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180521082131.13744-2-chris@chris-wilson.co.uk
As we keep an rbtree of available holes sorted by their size, we can
very easily determine if there is any hole large enough that might
satisfy the allocation request. This helps when dealing with a highly
fragmented address space and a request for a search by address.
To cache the largest size, we convert into the cached rbtree variant
which tracks the leftmost node for us. However, currently we sorted into
ascending size order so the leftmost node is the smallest, and so to
make it the largest hole we need to invert our sorting.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180521082131.13744-1-chris@chris-wilson.co.uk