[why]
1. Current code hard codes link to PHY mapping in dc link level per asic
per revision.
This is not scalable. In long term the mapping will be obatined from
DMUB and store in dc resource.
2. Depending on DCN revision and endpoint type, the definition of
dio_output_idx dio_output_type and phy_idx are not consistent. We need
to unify the meaning of these hardware indices across different system
configuration.
[how]
1. Temporarly move the hardcoded mapping to dc_resource level, which
should have full awareness of asic specific configuration and add a TODO
comment to move the mapping to DMUB.
2. populate dio_output_idx/phy_idx for all configuration, define
usb4_enabled bit instead of dio_output_type as an external enum.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Depend on res_pool->res_cap->num_timing_generator to query timing
gernerator information, it would case underflow at the fused display
pipes case.
Due to the res_pool->res_cap->num_timing_generator records default
timing generator resource built in driver, not the current chip.
[How]
Some ASICs would be fused display pipes less than the default setting.
In dcnxx_resource_construct function, driver would obatin real timing
generator count and store it into res_pool->timing_generator_count.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Yi-Ling Chen <Yi-Ling.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This makes it clearer which codepaths are in use specifically in
one state or the other.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This codepath should be running in both s0ix and s3, but only does
currently because s3 and s0ix are both set in the s0ix case.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To pair with the workaround which always reset the ASIC in suspend.
Otherwise, the reset which relies on BACO will fail.
Fixes: daf8de0874 ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Malicious mailbox event1 fails driver loading on vega10.
A dummy event6 prevent driver from taking response from malicious event1 as its own.
[how]
On vega10, send a mailbox event6 before sending event1.
Signed-off-by: James Yao <yiqing.yao@amd.com>
Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chips with no display hardware should return false for
DC support.
v2: drop Arcturus and Aldebaran
Fixes: f7f12b2582 ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reported-by: Tareque Md.Hanif <tarequemd.hanif@yahoo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The "plane_state" pointer was access before checking if it was NULL.
Avoid a possible NULL pointer dereference by accessing the plane
address after the check.
Addresses-Coverity-ID: 1493892 ("Dereference before null check")
Fixes: 3f68c01be9 ("drm/amd/display: add cyan_skillfish display support")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If the platform suspend happens to fail and the power rail
is not turned off, the GPU will be in an unknown state on
resume, so reset the asic so that it will be in a known
good state on resume even if the platform suspend failed.
v2: handle s0ix
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some proper cleanups and
put itself into a state ready for PNP. That can workaround some random resuming
failure observed on BOCO capable platforms.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In function enable_stream_features(), the variable "old_downspread.raw"
could be uninitialized if core_link_read_dpcd() fails, however, it is
used in the later if statement, and further, core_link_write_dpcd()
may write random value, which is potentially unsafe.
Fixes: 6016cd9dba ("drm/amd/display: add helper for enabling mst stream features")
Cc: stable@vger.kernel.org
Signed-off-by: Yizhuo Zhai <yzhai003@ucr.edu>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's not only supported by HG/PX laptops. It's supported
by all dGPUs which supports BOCO/BACO functionality (runtime
D3).
BOCO - Bus Off, Chip Off. The entire chip is powered off.
This is controlled by ACPI.
BACO - Bus Active, Chip Off. The chip still shows up
on the PCI bus, but the device itself is powered
down.
v2: fix missed HG/PX reference
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
== Description ==
Setting values of pm attributes through sysfs
should not be allowed in SRIOV mode.
These calls will not be processed by FW anyway,
but error handling on sysfs level should be improved.
== Changes ==
This patch prohibits performing of all set commands
in SRIOV mode on sysfs level.
It offers better error handling as calls that are
not allowed will not be propagated further.
== Test ==
Writing to any sysfs file in passthrough mode will succeed.
Writing to any sysfs file in ONEVF mode will yield error:
"calling process does not have sufficient permission to execute a command".
Signed-off-by: Marina Nikolic <Marina.Nikolic@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise the RAS error count couldn't be queried from sysfs.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A minor typo.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
When reboot the link res map should be persisted. So during boot up,
driver will look at the map to determine which link should take priority
to use certain link res. This is to ensure that link res remains
unshuffled after a reboot.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
When there are more DP2.0 RXs connected than the number HPO DP link
encoders we have, we need to dynamically allocate HPO DP link encoder to
the port that needs it.
[how]
Only allocate HPO DP link encoder when it is needed.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Update all accesses to use hpo dp link encoder through link resource
only.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
This commit is to populate link res in preparation of the next commit.
The next commit will replace all existing code to use link res instead
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
There will be a series of re-arch changes in Link Resource Management.
They are more and more muxable link resource objects and the resource is
insufficient for a one to one allocation to all links created.
Therefore a link resource sharing logic is required to determine which
link should use certain link resource.
This commit is the first one in this series that starts by defining a
link resource struct, this struct will be available to all interfaces
that need to perform link programming sequence.
In later commits, we will granduately decouple link resource objects out
of dc link. So instead of access a link resource from dc link. Current
link's resource can be accessible through pipe_ctx->link_res during
commit, or by calling dc_link_get_cur_link_res function with current
link passed in after commit.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This version brings along the following:
- Fixes and improvements in the LTTPR code
- Improve z-state
- Fix null pointer check
- Improve communication with s0i2
- Update multiple-display split policy
- Add missing registers
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
These registers are currently missing from the DCN303 header files
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: George Shen <George.Shen@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
We'll exit optimized power state to do link detection but we won't enter
back into the optimized power state.
This could potentially block s2idle entry depending on the sequencing,
but it also means we're losing some power during the transition period.
[How]
Hook up the handler like DCN21. It was also missed like the
exit_optimized_pwr_state callback.
Fixes: 64b1d0e8d5 ("drm/amd/display: Add DCN3.1 HWSEQ")
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
DP spec specifies that DPRX shall use the read interval in the
TRAINING_AUX_RD_INTERVAL_PHY_REPEATER LTTPR DPCD register. This
register's bit definition is the same as the AUX read interval register
for DPRX.
[How}
Remove logic which forces AUX read interval to 100us for repeaters when
in LTTPR non-transparent mode.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Wesley Chalmers <wesley.chalmers@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: George Shen <George.Shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Otherwise SMU won't mark Display as idle when trying to perform s2idle.
[How]
Mark the bit in the dcn31 codepath, doesn't apply to older ASIC.
It needed to be split from phy refclk off to prevent entering s2idle
when PSR was engaged but driver was not ready.
Fixes: 118a331516 ("drm/amd/display: Add DCN3.1 clock manager support")
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Bug fix for null function ptr (should check for NULL instead of not
NULL)
[How]
Fix if condition
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Samson Tam <samson.tam@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The change of setting a timer callback on boot for 10 seconds is still
working, just lacked power down for DCN10.
[How]
Added power down for DCN10.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Derek Lai <Derek.Lai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Stutter period won't be less than 5000.0, but if PSR is enabled then we
can potentially enter Z9 when MPO is enabled.
SMU will try to enter Z9 too early in these cases (before PSR is
enabled) and we'll see underflow.
[How]
Block z-states (z9, z10) until we can add a new interface to SMU to
signal when we can support z10 but not z9.
We can revert this once the interface change is in.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Current implementation is not scalable and retrofits the existing
standard link training code for purposes outside of its original design.
[How]
Refactor vendor specific link training sequence into its own separate
function to be called instead of the standard link training function.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: George Shen <George.Shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Max link rate should be limited to the maximum link rate support by any
LTTPR that are connected, including when operating in transparent mode.
[How]
Include transparent mode when factoring in LTTPR max supported link
rate.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Wesley Chalmers <wesley.chalmers@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: George Shen <George.Shen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
B0 PHY C map to F, D map to G driver use logic instance, dmub does the
remap. Driver still need use the right PHY instance to access right HW.
[how]
use phyical instance when program PHY register.
[note]
could move resync_control programming to dmub next.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If the event guard is enabled and VF doesn't receive an ack from PF for full access,
the guest driver load crashes.
This is caused due to the call to ttm_device_clear_dma_mappings with non-initialized
mman during driver tear down.
This patch adds the necessary condition to check if the mman initialization passed or not
and takes the path based on the condition output.
Signed-off-by: Surbhi Kakarya <Surbhi.Kakarya@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch keeps the setting of sdma queue number to the same
after recent KFD code refactor. Additionally, improve code to
use switch case to list IP version to complete kfd device_info
structure filling for IH version assignment. This makes consistency
with the IP parse code in amdgpu_discovery.c.
v2: use dev_warn for the default switch case;
set default sdma queue per engine(8) and IH handler to v9. (Jonathan)
v3: Fix missed IP version check of Raven.
Fixes: f0dc99a6f7 ("drm/amdkfd: add kfd_device_info_init function")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Graham Sider <Graham.Sider@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is supported, although the offset is different from VG20, so fix
that with a variable and enable getting the product name and serial
number from the FRU. Do this for all SKUs since all SKUs have the FRU
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On Aldebaran, the serial may be obtained from the FRU. Only overwrite
the serial with the unique_id if the serial is empty. This will support
printing serial numbers for mGPU devices where there are 2 unique_ids
for the 2 GPUs, but only one serial number for the board
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's supported, so support the unique_id sysfs file
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Having seen at least 1 42-character product_name, bump the number up to
64, and put that definition into amdgpu.h to make future adjustments
simpler.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The psp bootloader functions code of psp_v13_0.c had been
optimized before. According the code style of psp_v13_0.c
to remove the redundant code of psp_v11_0.c.
v2: squash in drop unused variable (Alex)
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns error during driver modprobe, it
will start the error handle path immediately and call into amdgpu_device_unmap_mmio as well
to release mapped VRAM. However, in the following release callback, driver stills visits the
unmapped memory like vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs.
[How]
call amdgpu_device_unmap_mmio() if device is unplugged to prevent invalid memory address in
vcn_v3_0_sw_fini() when GPU initialization failure.
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some boards(like RX550) seem to have garbage in the upper
16 bits of the vram size register. Check for
this and clamp the size properly. Fixes
boards reporting bogus amounts of vram.
after add this patch,the maximum GPU VRAM size is 64GB,
otherwise only 64GB vram size will be used.
Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
A porting error on a previous patch left the block of code that
causes the crash from a NULL pointer dereference.
More specifically, we try to access link_enc before it's assigned in
the USB4 case in the following assignment:
config.dio_output_idx = link_enc->transmitter - TRANSMITTER_UNIPHY_A;
[How]
That assignment occurs later depending on the ASIC version. It's only
needed on DCN31 and only after link_enc is already assigned.
Fixes: 986430446c ("drm/amd/display: fix a crash on USB4 over C20 PHY")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For Aldebaran chip passthrough case we need to intimate SMU
about special handling for SBR.On older chips we send
LightSBR to SMU, enabling the same for Aldebaran. Slight
difference, compared to previous chips, is on Aldebaran, SMU
would do a heavy reset on SBR. Hence, the word Heavy
instead of Light SBR is used for SMU to differentiate.
Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: sashank saye <sashank.saye@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When an application having open file access to a node forks, its shared
mappings also get reflected in the address space of child process even
though it cannot access them with the object permissions applied. With the
existing permission checks on the gem objects, it might be reasonable to
also create the VMAs with VM_DONTCOPY flag so a user space application
doesn't need to explicitly call the madvise(addr, len, MADV_DONTFORK)
system call to prevent the pages in the mapped range to appear in the
address space of the child process. It also prevents the memory leaks
due to additional reference counts on the mapped BOs in the child
process that prevented freeing the memory in the parent for which we had
worked around earlier in the user space inside the thunk library.
Additionally, we faced this issue when using CRIU to checkpoint restore
an application that had such inherited mappings in the child which
confuse CRIU when it mmaps on restore. Having this flag set for the
render node VMAs helps. VMAs mapped via KFD already take care of this so
this is needed only for the render nodes.
To limit the impact of the change to user space consumers such as OpenGL
etc, limit it to KFD BOs only.
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CP supports unmap queue with reset mode which only destroys specific queue without affecting others.
Replacing whole gpu reset with reset queue mode for RAS poison consumption
saves much time, and we can also fallback to gpu reset solution if reset
queue fails.
v2: Return directly if process is NULL;
Reset queue solution is not applicable to SDMA, fallback to legacy
way;
Call kfd_unref_process after lookup process.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The new interface unmaps queues with reset mode for the process consumes
RAS poison, it's only for compute queue.
v2: rename the function to reset_queues.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
So we can set reset mode for unmap operation, no functional change.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a reset parameter for umc page retirement, let user decide whether
call gpu reset in umc page retirement.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Expand RLCG interface for new GC read & write commands.
New interface will only be used if the PF enables the flag in pf2vf msg.
v2: Added a description for the scratch registers
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Driver needs to call get_xgmi_info() before ip_init
to determine whether it needs to handle a pending hive reset.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Reviewed by: shaoyun.liu <Shaoyun.lui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Modify GC register access from MMIO to RLCG if the indirect
flag is set
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Modify GC register access from MMIO to RLCG if the
indirect flag is set
v2: Replaced ternary operator with if-else for better
readability
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add helper macros to change register access
from direct to indirect.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: David Nieto <david.nieto@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Recently, there is security policy update under SRIOV.
We need to filter the registers that hit the violation
and move the code to the host driver side so that
the guest driver can execute correctly.
Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Used on gfx9 based systems. Fixes incorrect CU counts reported
in the kernel log.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1833
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some old registers leftover from pre-silicon. No longer
relevant on real hardware. Remove.
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We want to be able to call virt data exchange conditionally
after gmc sw init to reserve bad pages as early as possible.
Since this is a conditional call, we will need
to call it again unconditionally later in the init sequence.
Refactor the data exchange function so it can be
called multiple times without re-initializing the work item.
v2: Cleaned up the code. Kept the original call to init_exchange_data()
inside early init to initialize the work item, afterwards call
exchange_data() when needed.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed By: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use max() and min() in order to make code cleaner.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Play a video on the raven (or PCO, raven2) platform, and then do the S3
test. When resume, the following error will be reported:
amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
vcn_dec test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
<vcn_v1_0> failed -110
amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
[why]
When playing the video: The power state flag of the vcn block is set to
POWER_STATE_ON.
When doing suspend: There is no change to the power state flag of the
vcn block, it is still POWER_STATE_ON.
When doing resume: Need to open the power gate of the vcn block and set
the power state flag of the VCN block to POWER_STATE_ON.
But at this time, the power state flag of the vcn block is already
POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
avoid duplicate powergate/ungate setting" patch will return the
amdgpu_dpm_set_powergating_by_smu function directly.
As a result, the gate of the power was not opened, causing the
subsequent ring test to fail.
[how]
In the suspend function of the vcn block, explicitly change the power
state flag of the vcn block to POWER_STATE_OFF.
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1828
Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the message argument.
0: Allow power down
1: Disallow power down
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is still needed for thoes in case the firmware fails to load
then the message is the only way to tell what firmware was on them
Suggested-by: Lijo Lazar <lijo.Lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add svm_range_bo_unref_async to schedule work to wait for svm_bo
eviction work done and then free svm_bo. __do_munmap put_page
is atomic context, call svm_range_bo_unref_async to avoid warning
invalid wait context. Other non atomic context call svm_range_bo_unref.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the s0ix entry need retain gfx in the gfxoff state,so here need't
set gfx cgpg in the S0ix suspend-resume process. Moreover move the S0ix
check into SMU12 can simplify the code condition check.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since this variable was made available by the previous commit, use
it to make function access cleaner.
Suggested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Backmerging for v5.16-rc5. Resolves a conflict between drm-misc-next
and drm-misc-fixes in the vc4 driver.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Those are not today pulled by the sphinx doc, but better be ready.
Signed-off-by: Yann Dirson <ydirson@free.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Memory is allocated for gpu_metrics_table in renoir_init_smc_tables(),
but not freed in int smu_v12_0_fini_smc_tables(). Free it!
Fixes: 95868b8576 ("drm/amd/powerplay: add Renoir support for gpu metrics export")
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
gmc bo will be pinned during loading amdgpu and reset in SRIOV while
only unpinned in unload amdgpu
[How]
add amdgpu_in_reset and sriov judgement to skip pin bo
v2: fix wrong judgement
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
psp tmr bo will be pinned during loading amdgpu and reset in SRIOV while
only unpinned in unload amdgpu
[How]
add amdgpu_in_reset and sriov judgement to skip pin bo
v2: fix wrong judgement
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Power states are not valid for arcturus and aldebaran, no need to
allocate memory.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pair the operations did in GMC ->hw_init and ->hw_fini. That
can help to maintain correct cached state for GMC and avoid
unintention gate operation dropping due to wrong cached state.
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1828
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Updated for consistency when accessing drm_device from amdgpu driver.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As the smu_context will be invisible from outside(of power). Also,
the smu_debug_mask can be shared around all power code instead of
some specific framework(swSMU) only.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Host initiated VF FLR may fail if someone else is
already holding a read_lock. Change from down_write_trylock
to down_write to guarantee the reset goes through.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
why:
Need interface to lower clocks when in dc (power save)
mode. Must be able to work with p_state unsupported cases
Can cause flicker when OS notifies us of dc state change
how:
added dal3 interface for KMD
added pathway to query smu for this softmax
added blank before clock change to override underflow
added logic to change clk based on pstatesupport and softmax
added logic in prepare/optimize_bw to conform while changing
clocks
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Martin Leung <Martin.Leung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
For ODM + MPO window on one half of ODM, only 3 pipes should
be allocated and scaling parameters adjusted to handle this case
[How]
Fix pipe allocation when MPO viewport is only on one side of ODM
split, and modify scaling paramters.
Added diags test cases for ODM + windows MPO, where MPO window is
on right half, left half, and both halves or ODM.
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Eric Bernstein <eric.bernstein@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
If the firmware wasn't reset by PSP or HW and is currently running
then the firmware will hang or perform underfined behavior when we
modify its firmware state underneath it.
[How]
Reset DMCUB before setting up cache windows and performing HW init.
Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
This workaround resolves underflow caused by incorrect DST_Y_PREFETCH.
Overriding to 192KB DET buf size until the DST_Y_PREFETCH calc is fixed.
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Adding a function to read PSR capabilities
and ALPM capabilities.
Also adding a helper function to validate if
the sink and the driver support PSR SU.
[how]
- isolated all PSR and ALPM reading calls to a separate funciton
- set all required PSR caps
- added a helper function to check if PSR SU is supported by sink
and the driver
Reviewed-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Current error log of dummy irq service doesn't have
src/ext ID info in the log.
[How]
Add src/ext ID in ack/set of dummy irq service.
Reviewed-by: Wayne Lin <Wayne.Lin@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why & How]
In order to know the intermediate link rates supported by the eDP
panel and test to select the optimized link rate to save power,
create a new debugfs entry "ilr_setting" for
setting ILR.
Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
SMU now respects the PHY refclk disable request from driver.
This causes a hang during hotplug when PHY refclk was disabled
because it's not being re-enabled and the transmitter control
starts on dc_link_detect.
[How]
We normally would re-enable the clk with exit_optimized_pwr_state
but this is only set on DCN21 and DCN301. Set it for dcn31 as well.
This fixes DMCUB timeouts in the PHY.
Fixes: 64b1d0e8d5 ("drm/amd/display: Add DCN3.1 HWSEQ")
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmG2fU0eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC7EH/3R7Rt+OD8Wn8Ss3
w8V+dBxVwa2u2oMTyUHPxaeOXZ7bi38XlUdLFPOK/76bGwO0a5TmYZqsWdRbGyT0
HfcYjHsQ0lbJXk/nh2oM47oJxJXVpThIHXJEk0FZ0Y5t+DYjIYlNHzqZymUyhLem
St74zgWcyT+MXuqY34vB827FJDUnOxhhhi85tObeunaSPAomy9aiYidSC1ARREnz
iz2VUntP/QnRnKVvL2nUZNzcz1xL5vfCRSKsRGRSv3qW1Y/1M71ylt6JVmSftWq+
VmMdFxFhdrb1OK/1ct/930Un/UP2NG9EJsWxote2XYlnVSZHzDqH7lUhbqgdCcLz
1m2tVNY=
=7wRd
-----END PGP SIGNATURE-----
Merge v5.16-rc5 into drm-next
Thomas Zimmermann requested a fixes backmerge, specifically also for
96c5f82ef0 ("drm/vc4: fix error code in vc4_create_object()")
Just a bunch of adjacent changes conflicts, even the big pile of them
in vc4.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
return value form directly instead of
taking this in another redundant variable.
Reported-by: Zeal Robot <zealci@zte.com.cm>
Signed-off-by: chiminghao <chi.minghao@zte.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update smu_v13 to match smu_v12 and smu_v11 behavior where this is
fetched from debugfs rather than in kernel logs on every boot.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This value does not get cached into adev->pm.fw_version during
startup for smu13 like it does for other SMU like smu12.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This turns previously global functions into static, thus removing
compile-time warnings such as:
warning: no previous prototype for 'get_highest_allowed_voltage_level'
[-Wmissing-prototypes]
742 | unsigned int get_highest_allowed_voltage_level(uint32_t chip_family, uint32_t hw_internal_rev, uint32_t pci_revision_id)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warning: no previous prototype for 'rv1_vbios_smu_send_msg_with_param'
[-Wmissing-prototypes]
102 | int rv1_vbios_smu_send_msg_with_param(struct clk_mgr_internal *clk_mgr, unsigned int msg_id, unsigned int param)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Changes since v1:
- As suggested by Rodrigo Siqueira:
1. Rewrite function signatures to make them more readable.
2. Get rid of unused functions in order to remove 'defined but not
used' warnings.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the struct display_mode_lib pointer instead of passing lots of large
arrays as parameters by value.
Addresses this warning (resulting in failure to build a RHEL debug kernel
with Werror enabled):
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c: In function ‘UseMinimumDCFCLK’:
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:7478:1: warning: the frame size of 2128 bytes is larger than 2048 bytes [-Wframe-larger-than=]
NOTE: AFAICT this function previously had no observable effect, since it
only modified parameters passed by value and doesn't return anything.
Now it may modify some values in struct display_mode_lib passed in by
reference.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move code using the Pipe struct to a new helper function.
Works around[0] this warning (resulting in failure to build a RHEL debug
kernel with Werror enabled):
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c: In function ‘dml31_ModeSupportAndSystemConfigurationFull’:
../drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:5740:1: warning: the frame size of 2144 bytes is larger than 2048 bytes [-Wframe-larger-than=]
The culprit seems to be the Pipe struct, so pull the relevant block out
into its own sub-function. (This is porting
commit a62427ef9b ("drm/amd/display: Reduce stack size for dml21_ModeSupportAndSystemConfigurationFull")
from dml31 to dml21)
[0] AFAICT this doesn't actually reduce the total amount of stack which
can be used, just moves some of it from
dml31_ModeSupportAndSystemConfigurationFull to the new helper function,
so the former happens to no longer exceed the limit for a single
function.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>