The 2nd parameter should be the SMU message type rather than the ASIC message index.
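The distinction can be sketched as follows. This is a simplified model, not the real SMU code: the enum values, the mapping table and send_msg_with_param() are hypothetical. Callers pass a generic message type, and only the send helper translates it into the ASIC-specific index. Passing an already-translated ASIC index makes the helper look up the wrong entry or reject the value.

```c
#include <errno.h>

/* Hypothetical generic message types (the "SMU message type"). */
enum smu_msg_type {
	SMU_MSG_TestMessage,
	SMU_MSG_GfxDeviceDriverReset,
	SMU_MSG_MAX_COUNT,
};

/* Hypothetical per-ASIC table: generic type -> firmware message index. */
static const int asic_msg_map[SMU_MSG_MAX_COUNT] = {
	[SMU_MSG_TestMessage] = 0x01,
	[SMU_MSG_GfxDeviceDriverReset] = 0x21,
};

/* Hypothetical send helper: translates the generic type itself. */
static int send_msg_with_param(enum smu_msg_type msg, unsigned int param,
			       int *sent_index)
{
	(void)param; /* the real path would also write a parameter register */

	/* Reject anything that is not a valid generic message type. */
	if ((int)msg < 0 || msg >= SMU_MSG_MAX_COUNT)
		return -EINVAL;

	*sent_index = asic_msg_map[msg];
	return 0;
}
```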
Fixes: 7d38d9dc4e ("drm/amdgpu: add mode2 reset support for yellow carp")
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. Merge amdgpu_ras_late_init into
amdgpu_ras_block_late_init.
2. Remove amdgpu_ras_late_init since no RAS block
calls amdgpu_ras_late_init.
3. Merge amdgpu_ras_late_fini into
amdgpu_ras_block_late_fini.
4. Remove amdgpu_ras_late_fini since no RAS block
calls amdgpu_ras_late_fini.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To reduce redundant struct conversions, modify the parameters
of the sysfs and interrupt handling function interfaces.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize amdgpu_xgmi_ras_late_init/amdgpu_xgmi_ras_fini function code.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize amdgpu_umc_ras_late_init/amdgpu_umc_ras_fini function code.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize amdgpu_sdma_ras_late_init/amdgpu_sdma_ras_fini function code.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize amdgpu_nbio_ras_late_init/amdgpu_nbio_ras_fini function code.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize amdgpu_mmhub_ras_late_init/amdgpu_mmhub_ras_fini function code.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize amdgpu_mca_ras_late_init/amdgpu_mca_ras_fini function code.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize amdgpu_hdp_ras_late_init/amdgpu_hdp_ras_fini function code.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize amdgpu_gfx_ras_late_init/amdgpu_gfx_ras_fini function code.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. Define amdgpu_ras_block_late_init to create sysfs nodes
and interrupt handlers.
2. Define amdgpu_ras_block_late_fini to remove sysfs nodes
and interrupt handlers.
3. Replace the RAS block variable members in struct
amdgpu_ras_block_object with struct ras_common_if, which
makes it easy to associate each RAS block instance
with its RAS block functional interface.
4. Add .ras_cb to struct amdgpu_ras_block_object.
5. Adapt each RAS block to the changes in struct
amdgpu_ras_block_object.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
RLCG legacy read is not available in the SR-IOV configuration.
Attempting it makes gmc_v9_0_flush_gpu_tlb always report a
timeout, which eventually breaks driver load.
v2: bypass read in amdgpu_virt_get_rlcg_reg_access_flag (from Victor)
Fixes: 97d1a3b967 ("drm/amdgpu: switch to get_rlcg_reg_access_flag for gfx9")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Victor Skvortsov <Victor.Skvortsov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add missing parameters to fix a kerneldoc warning
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A number of errors and warnings have accumulated in KFD over the years;
attempt to fix the errors and most of the warnings reported by the
checkpatch tool. A few warnings remain that may be false positives, so
ignore them for now.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the SPDX License header for all the KFD files.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add IP discovery data in sysfs. The format is:
/sys/class/drm/cardX/device/ip_discovery/die/D/B/I/<attrs>
where,
X is the card ID, an integer,
D is the die ID, an integer,
B is the IP HW ID, an integer, aka block type,
I is the IP HW ID instance, an integer.
<attrs> are the attributes of the block instance. At the moment these
include HW ID, instance number, major, minor, revision, number of base
addresses, and the base addresses themselves.
A symbolic link named after the HW ID acronym is also created under D/,
for those who prefer to browse by a human-readable name.
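The path layout described above can be illustrated with a small helper that builds the path to one attribute. This is only a sketch of the format as stated; the helper name is hypothetical, and the attribute file name "major" used in the usage check is an assumption, since the exact attribute file names are not listed here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build
 *   /sys/class/drm/cardX/device/ip_discovery/die/D/B/I/<attr>
 * Returns 0 on success, or -1 on bad input or truncation.
 */
static int ip_discovery_attr_path(char *buf, size_t size, int card, int die,
				  int hw_id, int instance, const char *attr)
{
	int n;

	if (!buf || size == 0 || !attr || strlen(attr) == 0)
		return -1;

	n = snprintf(buf, size,
		     "/sys/class/drm/card%d/device/ip_discovery/die/%d/%d/%d/%s",
		     card, die, hw_id, instance, attr);
	if (n < 0 || (size_t)n >= size) {
		buf[0] = '\0'; /* do not leave a truncated path behind */
		return -1;
	}
	return 0;
}
```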
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Tom StDenis <tom.stdenis@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix a few kerneldoc warnings and one typo.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amd-drm-next-5.18-2022-02-11-1:
amdgpu:
- Clean up of power management code
- Enable freesync video mode by default
- Clean up of RAS code
- Improve VRAM access for debug using SDMA
- Coding style cleanups
- SR-IOV fixes
- More display FP reorg
- TLB flush fixes for Arcturus, Vega20
- Misc display fixes
- Rework special register access methods for SR-IOV
- DP2 fixes
- DP tunneling fixes
- DSC fixes
- More IP discovery cleanups
- Misc RAS fixes
- Enable both SMU i2c buses where applicable
- s2idle improvements
- DPCS header cleanup
- Add new CAP firmware support for SR-IOV
amdkfd:
- Misc cleanups
- SVM fixes
- CRIU support
- Clean up MQD manager
UAPI:
- Add interface to amdgpu CTX ioctl to request a stable power state for profiling
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/207
- Add amdkfd support for CRIU
https://github.com/checkpoint-restore/criu/pull/1709
- Remove old unused amdkfd debugger interface
Was only implemented for Kaveri and was only ever used by an old HSA tool that was never open sourced
radeon:
- Fix error handling in radeon_driver_open_kms
- UVD suspend fix
- Misc fixes
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220211220706.5803-1-alexander.deucher@amd.com
Avoid spamming the kernel log on application memory allocation failures.
The __func__ argument was also removed from the dev_fmt macro due to
parameter conflicts with dynamic_dev_dbg.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the warning: no previous prototype for 'get_process_num_bos'
[-Wmissing-prototypes]
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix checkpatch-reported warnings for a quoted line and for block
comments.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The UMC_STATUS register addresses are not linear; adjust the offset
calculation formula to get the correct address.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The SDMA ring buffer test fails if suspend is aborted during
S0i3 resume.
[How]
If suspend is aborted for some reason during the S0i3 resume
cycle, the SDMA ring test fails and amdgpu resume reports
errors. For RN/CZN/Picasso, the SMU saves and restores the SDMA
registers during the S0ix cycle, so skipping SDMA suspend and
resume in the driver solves the issue. With this change, the
system resumes gracefully even if the suspend is aborted.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rajib Mahapatra <rajib.mahapatra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The KMD reports a warning about a lock being held in drm_syncobj_find_fence
when running the amdgpu_test case “syncobj timeline test”.
ctx->lock was designed to prevent concurrent "amdgpu_ctx_wait_prev_fence"
calls and to avoid a reservation lock deadlock during GPU reset. Since the
current GPU reset path no longer holds any reservation lock, ctx->lock can
simply be removed; concurrent "amdgpu_ctx_wait_prev_fence" calls are still
prevented by the PD root BO reservation lock.
call stacks:
=================
//hold lock
amdgpu_cs_ioctl->amdgpu_cs_parser_init->mutex_lock(&parser->ctx->lock);
…
//report warning
amdgpu_cs_dependencies->amdgpu_cs_process_syncobj_timeline_in_dep \
->amdgpu_syncobj_lookup_and_add_to_sync -> drm_syncobj_find_fence \
-> lockdep_assert_none_held_once
…
amdgpu_cs_ioctl->amdgpu_cs_parser_fini->mutex_unlock(&parser->ctx->lock);
Signed-off-by: Ken Xue <Ken.Xue@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The OOB table error count info should be reset after the EEPROM
table is reset.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enable the support for DriverSmuConfig table on Navi1x and
Sienna_Cichlid.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Correct the UMD pstate profiling clocks for Dimgrey Cavefish and Beige
Goby.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(v4)
Modifications to satisfy checkpatch --strict
(v3)
Rewrote patchset to order patches as (API, hw impl, usecase)
- modified amdgpu_get_pp_od_clk_voltage to try
amdgpu_dpm_emit_clock_levels and fallback to
amdgpu_dpm_print_clock_levels if emit is not implemented.
- modified amdgpu_get_pp_dpm_clock to try
amdgpu_dpm_emit_clock_levels and fallback to
amdgpu_dpm_print_clock_levels if emit is not implemented.
- Newline is printed to buf if no output produced
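The emit-then-fallback behavior described in the v3 notes can be sketched in plain C. The struct, callback signatures and demo backends below are hypothetical simplifications, not the real amdgpu_dpm API; the sketch assumes that an unimplemented emit path reports -EOPNOTSUPP, which is what triggers the fallback to the legacy print path.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback table for the sketch. */
struct pp_clk_funcs {
	/* Newer style: appends at *offset; 0, -EOPNOTSUPP, or -errno. */
	int (*emit_clock_levels)(char *buf, size_t size, int *offset);
	/* Legacy style: returns the number of bytes written, or -errno. */
	int (*print_clock_levels)(char *buf, size_t size);
};

static int get_clock_levels(const struct pp_clk_funcs *f, char *buf,
			    size_t size)
{
	int offset = 0;
	int ret = -EOPNOTSUPP;

	/* Try the emit interface first. */
	if (f->emit_clock_levels)
		ret = f->emit_clock_levels(buf, size, &offset);

	if (ret == -EOPNOTSUPP) {
		/* Fall back to print when emit is not implemented. */
		if (!f->print_clock_levels)
			return -EOPNOTSUPP;
		offset = f->print_clock_levels(buf, size);
		if (offset < 0)
			return offset;
	} else if (ret < 0) {
		return ret;
	}

	/* Print a newline if no output was produced. */
	if (offset == 0 && size >= 2) {
		buf[0] = '\n';
		buf[1] = '\0';
		offset = 1;
	}
	return offset;
}

/* Demo backends used only to exercise the fallback logic. */
static int demo_emit(char *buf, size_t size, int *offset)
{
	*offset += snprintf(buf + *offset, size - *offset, "0: 500Mhz *\n");
	return 0;
}

static int demo_emit_empty(char *buf, size_t size, int *offset)
{
	(void)buf;
	(void)size;
	(void)offset;
	return 0;
}

static int demo_print(char *buf, size_t size)
{
	return snprintf(buf, size, "0: 300Mhz *\n");
}
```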
== Test ==
LOGFILE=pp_clk.test.log
AMDGPU_PCI_ADDR=`lspci -nn | grep "VGA\|Display" | cut -d " " -f 1`
AMDGPU_HWMON=`ls -la /sys/class/hwmon | grep $AMDGPU_PCI_ADDR | awk '{print $9}'`
HWMON_DIR=/sys/class/hwmon/${AMDGPU_HWMON}
lspci -nn | grep "VGA\|Display" > $LOGFILE
FILES="pp_od_clk_voltage
pp_dpm_sclk
pp_dpm_mclk
pp_dpm_pcie
pp_dpm_socclk
pp_dpm_fclk
pp_dpm_dcefclk
pp_dpm_vclk
pp_dpm_dclk "
for f in $FILES
do
echo === $f === >> $LOGFILE
cat $HWMON_DIR/device/$f >> $LOGFILE
done
cat $LOGFILE
Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(v4)
Modifications to satisfy checkpatch --strict
(v3)
Rewrote patchset to order patches as (API, hw impl, usecase)
- implement emit_clk_levels for navi10, based on print_clk_levels,
but using sysfs_emit without the smu_cmn_get_sysfs() workaround
Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is no need to check the poison setting for each channel;
checking UMC0 channel 0 is enough.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add basic support for GC 10.1.4,
which uses the same IP blocks as GC 10.1.3.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clang static analysis reports this problem:
kfd_chardev.c:2594:16: warning: The expression is an uninitialized value.
The computed value will also be garbage
while (ret && i--) {
^~~
i is a loop variable, and this block unwinds the work done in the loop
after an error. When the error happens before the loop, this value is
garbage. Move the initialization of i to its declaration.
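The pattern can be shown with a self-contained sketch; the export/unexport helpers and the failure hooks are hypothetical, not the kfd_chardev.c code. Because i starts at 0, the `while (ret && i--)` unwind is a no-op when the error happens before the loop, and it undoes exactly the items [0, i) when the error happens inside the loop.

```c
#include <errno.h>
#include <stdbool.h>

#define NUM_ITEMS 4

/* Hypothetical per-item state for the sketch. */
static bool exported[NUM_ITEMS];

/* Fails with -EIO when idx == fail_at; otherwise marks the item exported. */
static int export_item(int idx, int fail_at)
{
	if (idx == fail_at)
		return -EIO;
	exported[idx] = true;
	return 0;
}

static void unexport_item(int idx)
{
	exported[idx] = false;
}

/* fail_at < 0 means no item fails; fail_before_loop simulates an early error. */
static int export_all(int fail_at, bool fail_before_loop)
{
	int ret = 0;
	int i = 0; /* initialized at declaration so the unwind is always safe */

	if (fail_before_loop) {
		ret = -ENOMEM;
		goto unwind;
	}

	for (; i < NUM_ITEMS; i++) {
		ret = export_item(i, fail_at);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* Undo items [0, i); with i == 0 the loop body never runs. */
	while (ret && i--)
		unexport_item(i);
	return ret;
}
```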
Fixes: be072b06c7 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clang static analysis reports this problem:
kfd_chardev.c:2092:2: warning: 1st function call argument
is an uninitialized value
kvfree(bo_privs);
^~~~~~~~~~~~~~~~
When the bo_buckets allocation fails, the code jumps to an error handler
that frees bo_privs, which has not been allocated yet. Because the
bo_buckets allocation is the first possible failure, return directly.
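A minimal sketch of the corrected error path, using calloc()/free() in place of the kernel allocators; the function name and the fail_step hook are hypothetical. The first allocation returns directly because nothing needs cleaning up yet, and later failures jump only to labels that free what was actually allocated.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocation wrapper: fails when step == fail_step. */
static void *test_alloc(size_t n, int step, int fail_step)
{
	return step == fail_step ? NULL : calloc(1, n);
}

/* fail_step: 1 fails the first allocation, 2 the second, 0 never fails. */
static int checkpoint_sketch(size_t num_bos, int fail_step)
{
	void *bo_buckets;
	void *bo_privs;
	int ret = 0;

	bo_buckets = test_alloc(num_bos * 16, 1, fail_step);
	if (!bo_buckets)
		return -ENOMEM; /* first allocation: nothing to clean up yet */

	bo_privs = test_alloc(num_bos * 16, 2, fail_step);
	if (!bo_privs) {
		ret = -ENOMEM;
		goto free_buckets; /* only bo_buckets exists at this point */
	}

	/* ... fill the buffers and copy them out ... */

	free(bo_privs);
free_buckets:
	free(bo_buckets);
	return ret;
}
```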
Fixes: 5ccbb057c0 ("drm/amdkfd: CRIU Implement KFD checkpoint ioctl")
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If copy_to_user() fails, it returns the number of bytes remaining to
be copied but we want to return a negative error code (-EFAULT) to the
user.
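The fix follows a common kernel idiom: treat any nonzero return from copy_to_user() as a fault and return -EFAULT. Because copy_to_user() only exists in the kernel, the sketch below uses a hypothetical userspace stand-in with the same "bytes not copied" return convention.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for copy_to_user(): returns bytes NOT copied. */
static unsigned long fake_copy_to_user(void *to, const void *from,
				       unsigned long n,
				       unsigned long fault_after)
{
	unsigned long ok = n < fault_after ? n : fault_after;

	memcpy(to, from, ok);
	return n - ok;
}

static int save_ranges(void *user_buf, const void *data, unsigned long n,
		       unsigned long fault_after)
{
	/*
	 * Returning the helper's result directly would hand a positive
	 * byte count to userspace; convert any shortfall to -EFAULT.
	 */
	if (fake_copy_to_user(user_buf, data, n, fault_after))
		return -EFAULT;
	return 0;
}
```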
Fixes: 9d5dabfeff ("drm/amdkfd: CRIU Save Shared Virtual Memory ranges")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Yat Sin <david.yatsin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kfd_process_device_data_by_id() does not return error pointers;
it returns NULL.
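The consequence for callers is that an IS_ERR() check never fires, so the check must test for NULL. The sketch below models that contract with a hypothetical lookup table; the structure, the helper names and the choice of -EINVAL as the caller's error code are assumptions for illustration only.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device data for the sketch. */
struct fake_pdd {
	unsigned int gpu_id;
};

static struct fake_pdd devices[] = { { 0x1001 }, { 0x2002 } };

/* Hypothetical lookup: returns NULL when not found, never an error pointer. */
static struct fake_pdd *fake_device_data_by_id(unsigned int gpu_id)
{
	for (size_t i = 0; i < sizeof(devices) / sizeof(devices[0]); i++)
		if (devices[i].gpu_id == gpu_id)
			return &devices[i];
	return NULL;
}

static int remap_gpu_id(unsigned int gpu_id, unsigned int *out)
{
	struct fake_pdd *pdd = fake_device_data_by_id(gpu_id);

	/* Check for NULL; an IS_ERR()-style check would miss the failure. */
	if (!pdd)
		return -EINVAL;

	*out = pdd->gpu_id;
	return 0;
}
```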
Fixes: bef153b70c ("drm/amdkfd: CRIU implement gpu_id remapping")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Yat Sin <david.yatsin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For Cyan Skillfish and Renoir, there is no interface provided by PMFW
to retrieve the enabled features. So, we assume all features are enabled.
Fixes: 7ade3ca9cd ("drm/amd/pm: correct the usage for 'supported' member of smu_feature structure")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
"adev->gfx.rlc.rlcg_reg_access_supported = true;"
The above variable was set too late during driver initialization,
which causes the driver to fail to read/write registers during GMC hw
init in SR-IOV mode.
Move the gfx_xxx_init_rlcg_reg_access_ctrl() function to the gfx early
init stage to avoid this issue.
Fixes: 5d447e2967 ("drm/amdgpu: add helper for rlcg indirect reg access")
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only vangogh has two types of hwmon power nodes, "fastPPT" and "slowPPT";
the other ASICs have only one type of hwmon power node, "PPT".
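The labeling rule can be sketched as a small lookup. The helper is hypothetical, and the ordering of the two vangogh nodes (slowPPT first, fastPPT second) is an assumption made for illustration rather than a statement of the driver's actual node numbering.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the hwmon power label for power node
 * `index` (0-based), or NULL if the ASIC does not expose that node.
 */
static const char *ppt_label(bool is_vangogh, int index)
{
	if (is_vangogh) {
		/* Assumed ordering: first node slowPPT, second fastPPT. */
		if (index == 0)
			return "slowPPT";
		if (index == 1)
			return "fastPPT";
		return NULL;
	}

	/* All other ASICs expose a single "PPT" node. */
	return index == 0 ? "PPT" : NULL;
}
```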
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After the buffer object is successfully mapped,
call amdgpu_bo_kunmap before the function returns.
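The rule is that every successful map is paired with exactly one unmap on every return path, while a failed map needs no unmap. The sketch below models this with hypothetical fake_kmap()/fake_kunmap() helpers and a live-mapping counter; it does not use the real amdgpu_bo_kmap()/amdgpu_bo_kunmap() signatures.

```c
#include <errno.h>
#include <stdbool.h>

static int map_count; /* hypothetical counter of live mappings */

static int fake_kmap(void **ptr, bool fail)
{
	if (fail)
		return -ENOMEM;
	map_count++;
	*ptr = &map_count; /* any valid address works for the sketch */
	return 0;
}

static void fake_kunmap(void)
{
	map_count--;
}

static int read_first_word(bool fail_map, bool fail_read, int *out)
{
	void *cpu_addr;
	int r = fake_kmap(&cpu_addr, fail_map);

	if (r)
		return r; /* nothing was mapped, so nothing to unmap */

	if (fail_read) {
		r = -EIO;
		goto unmap;
	}
	*out = *(int *)cpu_addr;

unmap:
	fake_kunmap(); /* the successful map is undone on every path */
	return r;
}
```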
Signed-off-by: zhanglianjie <zhanglianjie@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A few MQD manager functions are duplicated for all versions of
MQD manager. Remove this duplication by moving the common
functions into kfd_mqd_manager.c file.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clean up the KFD code by removing the unused old debugger
implementation.
The address watch was only ever implemented in the upstream
driver for GFXv7 (Kaveri). The user mode tools runtime using
this API was never open-sourced. Work on the old debugger
prototype that used this API was discontinued years ago.
Only a small piece of code for resetting wavefronts is kept,
and it is moved to kfd_device_queue_manager.c.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Without HWS, TLB flushing does not work in the SVM code.
Fix this by calling kfd_flush_tlb(), which works in both
the HWS and no-HWS cases.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This causes the hwmon node power1_label not to be created.
v2:
The hwmon node "power1_label" is always needed for all ASICs,
so this patch removes the ASIC type check for "power1_label".
Fixes: ae07970a06 ("drm/amd/pm: add support for hwmon control of slow and fast PPT limit on vangogh")
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The PPT limit cannot be queried from a VF.
Fixes: f3527a6483 ("drm/amd/pm: Enable sysfs required by rocm-smi tool for One VF mode")
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Replace resetting the queues of a specific PASID with unmapping all
queues, since resetting queues could break the CP scheduler.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As the function is now used in more cases, give it a more general
name.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>