I'm not increasing the DRM version because GDS isn't totally without bugs yet.
v2: update emit_ib_size
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
New chunk for dependency on start of job's execution instead on
the end. This is used for GPU deadlock prevention when
userspace uses mid-IB fences to wait for mid-IB work on other rings.
v2: Fix typo in AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES
v3: Bump KMS version
v4: put old fence AFTER acquiring the scheduled fence.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian Koenig <Christian.Koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is a spelling mistake in a dev_err message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- move all adjustments into one place
- specify GDS/GWS/OA alignment in basic units of the heaps
- it looks like GDS alignment was 1 instead of 4
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kptr is not used any more.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reduce the repeated node and hive information during XGMI initialization
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sriov need to restrict max_pfn below AMDGPU_GMC_HOLE.
access the hole results in a range fault interrupt IIRC.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After removing unnecessary VM size calculations,
vm_manager.max_pfn would reach 0x10,0000,0000
max_pfn << AMDGPU_GPU_PAGE_SHIFT exceeding AMDGPU_GMC_HOLE_START
would cause GPU reset.
Signed-off-by: wentalou <Wentao.Lou@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Decauple sched threads stop and start and ring mirror
list handling from the policy of what to do about the
guilty jobs.
When stoppping the sched thread and detaching sched fences
from non signaled HW fenes wait for all signaled HW fences
to complete before rerunning the jobs.
v2: Fix resubmission of guilty job into HW after refactoring.
v4:
Full restart for all the jobs, not only from guilty ring.
Extract karma increase into standalone function.
v5:
Rework waiting for signaled jobs without relying on the job
struct itself as those might already be freed for non 'guilty'
job's schedulers.
Expose karma increase to drivers.
v6:
Use list_for_each_entry_safe_continue and drm_sched_process_job
in case fence already signaled.
Call drm_sched_increase_karma only once for amdgpu and add documentation.
v7:
Wait only for the latest job's fence.
Suggested-by: Christian Koenig <Christian.Koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The power smu7 powerplay code is much more robust and has
been the default for a while now. Remove the old code.
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add missing power_average to visible check for power
attributes for APUs. Was missed before.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Replace the last bool type parameter with a general flags parameter,
to make the last parameter be able to contain more information.
v2: drop setting need_ctx_switch = false
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sriov would meet guest driver load failure,
if calling amdgpu_asic_reset in amdgpu_device_init.
sriov should skip asic_reset in device_init.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the APU judgement to make it really work as expected.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
So that we do not need to check this in every internal function.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This finally enables processing of ring 1 & 2.
v2: fix copy&paste error
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Previously we only added the ring buffer memory, now add the handling as
well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The entries are ignored for now, but it at least stops crashing the
hardware when somebody tries to push something to the other IH rings.
v2: limit ring size, add TODO comment
v3: only program rings if they are actually allocated
v4: limit the ring init to Vega10
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we run into a non-retry fault on access.
It seems to be a hardware bug that the executable bit has
higher priority than the valid bit.
v2: handle clears as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
User can use "pp_dpm_dcefclk" to retrieve and adjust dcefclock power
levels.
V2: expose this interface for Vega10 and later ASICs only
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
User can use "pp_dpm_fclk" to retrieve and adjust fclock power
levels.
V2: expose this interface for Vega20 and later ASICs only
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
User can use "pp_dpm_socclk" to retrieve and adjust SOC clock power
levels.
V2: expose this interface for Vega10 and later ASICs only
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
User can use "ppfeatures" sysfs interface to retrieve and set enabled
powerplay features.
V2: expose this feature for Vega10 and later dGPUs
V3: squash in removal of unused variable (Alex)
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
if lru is changed, we cannot do bulk moving.
v2:
root bo isn't in bulk moving, skip its change.
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In some cases, psp response status is not 0 even there is no
problem while the command is submitted. Some version of PSP FW
doesn't write 0 to that field.
So here we would like to only print a warning instead of an error
during psp initialization to avoid breaking hw_init and it doesn't
return -EINVAL.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Xiangliang Yu<Xiangliang.Yu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Paul Menzel <pmenzel+amd-gfx@molgen.mpg.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
HW doorbell writing routing policy: writing to doorbell
not in SDMA/IH/MM/ACV doorbell range will be routed to CP.
So CP doorbell routing depends on doorbell range setting
of above blocks. Setting doorbell range of above blocks
earlier (soc15_common_hw_init) to make sure CP doorbell
writing be routed to CP block.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Different ASIC has different SDMA queue number so
different SDMA doorbell range. Introduce an extra
parameter to sdma_doorbell_range function and set
sdma doorbell range correctly.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Different ASIC has different sdma doorbell range. Add
a per device sdma_doorbell_range field and initialize
it.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It will fall back to use mode1 reset if platform does not support BACO
feature.
v2: squash in warning fix (Alex)
Signed-off-by: Jim Qu <Jim.Qu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To deal with situations like kexec or GPU VM passthrough
where the device may have been used previously without a
proper GPU reset between.
v2: rebase
bug: https://bugs.freedesktop.org/show_bug.cgi?id=108585
bug: https://bugs.freedesktop.org/show_bug.cgi?id=108754
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SOC15 chips require a reset if the driver was previously loaded
because the PSP can only be loaded once between each reset.
v2: rebase, handle multiple asic funcs
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
VI chips require a reset if the driver was previously loaded
because the SMU can only be loaded once between each reset.
v2: rebase
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CIK chips require a reset if the driver was previously loaded
because the SMU can only be loaded once between each reset.
v2: rebase
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SI chips don't require a reset on reload due to the nature of
the SMU on them.
v2: rebase
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Used to determine if we need to reset the asic on init due
to the driver having been previously loaded or not shutdown
cleanly. E.g., kexec or VM passthrough.
v2: rebase
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Expose sclk (gfx clock) and mclk (memory clock) via
hwmon compatible interface. hwmon does not actually
formally specify a frequency type attribute, but these
are compatible with the format of the other attributes
exposed via hwmon. Units are hertz.
freq1_input - GPU gfx/compute clock in hertz
freq2_input - GPU memory clock in hertz (dGPU only)
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a sysfs file that reports the number of bytes transmitted and
received in the last second. This can be used to approximate the PCIe
bandwidth usage over the last second.
v2: Clarify use of mps as estimation of bandwidth
v3: Don't make the file on APUs
v4: Early exit for APUs in the read function, change output to
display "packets-received packets-sent mps"
v5: fix missing header for si (Alex)
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need these offsets for PCIE perf counters, so include them as well as
the the previously-used defines from the nbio_*.c files
v2: Return NBIF definitions back to previous files
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: Move locks around in other functions so that this
function can stand on its own. Also only hold the hive
specific lock for add/remove device instead of the driver
global lock so you can't add/remove devices in parallel from
one hive.
v3: add reset_lock
Acked-by: Shaoyun.liu < Shaoyun.liu@amd.com>
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add message print out and return -EINVAL when driver can not get valid hive
from hive arrary on xgmi configuration
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
get_fw_type and prep_cmd_buf should be common interface
instead of IP specific ones
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GFX_FW_TYPE_RLC_RESTORE_LIST_CNTL was renamed to GFX_FW_TYPE_RLC_RESTORE_LIST_SRM_CNTL
in latest psp_gfx_if drop
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix boolean expressions by using logical AND operator '&&'
instead of bitwise operator '&'.
This issue was detected with the help of Coccinelle.
Fixes: 9a4b7d4c76 ("drm/amdgpu: Add vm context module param")
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When init fail, send rel init, req_fini and rel_fini to host for the
finishing routine.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Note if this is a retry fault or not and cleanup the message a bit.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
distinguish ip_reinit_early_sriov and ip_reinit_late_sriov
by different log RE-INIT-early and RE-INIT-late
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To distinct on which IH ring an IV was found.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>