Let format prefixes take care of printing the module name
through pr_fmt and dev_fmt definitions.
Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Vram lost counter is wrongly increased by two during baco reset.
V2: assumed vram lost for mode1 reset on all ASICs
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add indirect access support to registers outside of
mmio bar.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
all the register access through kiq is redirected
to amdgpu_kiq_rreg/amdgpu_kiq_wreg
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
those are not needed anymore
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
the workaround is not needed for soc15 ASICs except
for vega10. it is even not needed with latest vega10
vbios.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The system will be hang up during S3 suspend because of SMU is pending
for GC not respose the register CP_HQD_ACTIVE access request.This issue
root cause of accessing the GC register under enter GFX CGGPG and can
be fixed by disable GFX CGPG before perform suspend.
v2: Use disable the GFX CGPG instead of RLC safe mode guard.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[PATCH 2/2]
kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate
Without this change, sriov tdr code path will never free those
allocated memories and get memory leak.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 5161bba4311f in order to split it into two
different patches, and this will make it easier to understand.
[PATCH 1/2]
porting to gfx10 from
commit 1b0bfcff46 ("drm/amdgpu: Avoid destroy hqd when GPU is on reset")
Originally, MEC is touched
without GPU initialized first.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device which makes amdgpu_ctx_init_entity()
much more leaner.
v2:
fix a coding style issue
do not use drm hw_ip const to populate amdgpu_ring_type enum
v3:
remove ctx reference and move sched array and num_sched to a struct
use num_scheds to detect uninitialized scheduler list
v4:
use array_index_nospec for user space controlled variables
fix possible checkpatch.pl warnings
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate
Without this change, sriov tdr code path will never free those allocated
memories and get memory leak.
v2:add a bugfix for kiq ring test fail
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
extend compute lockup timeout to 60000 for SR-IOV.
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Jiawei <Jiawei.Gu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
if host support new handshake we only need to enter
fullaccess_mode in ip_init() part, otherwise we need
to do it before reading vbios (becuase host prepares vbios
for VF only after received REQ_GPU_INIT event under
legacy handshake)
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
what:
1)move timtout setting before ip_early_init to reduce exclusive mode
cost for SRIOV
2)move ip_discovery_init() to inside of amdgpu_discovery_reg_base_init()
it is a prepare for the later upcoming patches.
why:
in later upcoming patches we would use a new mailbox event --
"req_gpu_init_data", which is a callback hooked in adev->virt.ops and
this callback send a new event "REQ_GPU_INIT_DAT" to host to notify
host to do some preparation like "IP discovery/vbios on the VF FB"
and this callback must be:
A) invoked after set_ip_block() because virt.ops is configured during
set_ip_block()
B) invoked before ip_discovery_init() becausen ip_discovery_init()
need host side prepares everything in VF FB first.
current place of ip_discovery_init() is before we can invoke callback
of adev->virt.ops, thus we must move ip_discovery_init() to a place
after the adev->virt.ops all settle done, and the perfect place is in
amdgpu_discovery_reg_base_init()
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
by this new handshake host side can prepare vbios/ip-discovery
and pf&vf exchange data upon recieving this request without
stopping world switch.
this way the world switch is less impacted by VF's exclusive mode
request
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
added flag to ras context to indicate if ras query functionality is ready
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
we need to move virt detection much earlier because:
1) HW team confirms us that RCC_IOV_FUNC_IDENTIFIER will always
be at DE5 (dw) mmio offset from vega10, this way there is no
need to implement detect_hw_virt() routine in each nbio/chip file.
for VI SRIOV chip (tonga & fiji), the BIF_IOV_FUNC_IDENTIFIER is at
0x1503
2) we need to acknowledged we are SRIOV VF before we do IP discovery because
the IP discovery content will be updated by host everytime after it recieved
a new coming "REQ_GPU_INIT_DATA" request from guest (there will be patches
for this new handshake soon).
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allow for reading of information like manufacturer, product number
and serial number from the FRU chip. Report the serial number as
the new sysfs file serial_number. Note that this only works on
server cards, as consumer cards do not feature the FRU chip, which
contains this information.
v2: Add documentation to amdgpu.rst, add helper functions,
rename functions for consistency, fix bad starting offset
v3: Remove testing definitions
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
MMHub EDC becomes dirty after BACO reset
EDC registers should be cleared early on in reset phase
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
what changed:
1)provide new implementation interface for the rlcg access path
2)put SQ_CMD/SQ_IND_INDEX to GFX9 RLCG path to let debugfs's reg_op
function can access reg that need RLCG path help
now even debugfs's reg_op can used to dump wave.
tested-by: Monk Liu <monk.liu@amd.com>
tested-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This can fix the baco reset failure seen on Navi10.
And this should be a low risk fix as the same sequence
is already used for system suspend/resume.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1)for gfx IB test we shouldn't insert DE meta data
2)we should make sure IB test finished before we
send event 3 to hypervisor otherwise the IDLE from
event 3 will preempt IB test, which is not designed
as a compatible structure for MCBP
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We've moved the debugfs handling into a centralized place
so we can remove the legacy load an unload callbacks.
Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for firmware.
Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for register access files.
Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for gem.
Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
to amdgpu_debugfs_fini. It will be used for other things in
the future.
Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since emulators are slower, sometime some operations like flushing tlb
through FM need more than twice the regular timout of 100ms, so increase
the timeout to 1s on emulators.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
UAPI Changes:
- lima: Add support for heap buffers
Cross-subsystem Changes:
Core Changes:
- Implement mode_config mode_valid for memory constrained drivers
- Bus format negociation between bridges
- Consolidate fake vblank events for drivers without vblank interrupts
- drm/bufs: dma_alloc related cleanups
- drm/dp_mst: Various fixes
- drm/print: New drm_device based print helpers
- Thomas is a drm-misc maintainer now!
Driver Changes:
- DPMS cleanups for atomic drivers
- Removal of owner field in SPI tinydrm drivers
- Removal of explicit dependency on DT for tinydrm drivers
- Conversion to YAML schemas for DT bindings
- tidss: New driver
- virtio: various reworks and fixes
- Our usual dozen or so new panels or bridges
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXkEjOgAKCRDj7w1vZxhR
xeaDAQD+1MludG4RmfQhATe4jTsPC1r2x63OF2CA0ChMGHXJyQEA8qqQ+8y1Cd/u
PZ3PpcTl4qYYHgzJ6FwW7kDPTvlaZQE=
=IJAt
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2020-02-10' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 5.7:
UAPI Changes:
- lima: Add support for heap buffers
Cross-subsystem Changes:
Core Changes:
- Implement mode_config mode_valid for memory constrained drivers
- Bus format negociation between bridges
- Consolidate fake vblank events for drivers without vblank interrupts
- drm/bufs: dma_alloc related cleanups
- drm/dp_mst: Various fixes
- drm/print: New drm_device based print helpers
- Thomas is a drm-misc maintainer now!
Driver Changes:
- DPMS cleanups for atomic drivers
- Removal of owner field in SPI tinydrm drivers
- Removal of explicit dependency on DT for tinydrm drivers
- Conversion to YAML schemas for DT bindings
- tidss: New driver
- virtio: various reworks and fixes
- Our usual dozen or so new panels or bridges
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200210093421.xu4sofldm6wm6xq6@gilmour.lan
So far the kfd driver implemented same routines for runtime and system
wide suspend and resume (s2idle or mem). During system wide suspend the
kfd aquires an atomic lock that prevents any more user processes to
create queues and interact with kfd driver and amd gpu. This mechanism
created problem when amdgpu device is runtime suspended with BACO
enabled. Any application that relies on kfd driver fails to load because
the driver reports a locked kfd device since gpu is runtime suspended.
However, in an ideal case, when gpu is runtime suspended the kfd driver
should be able to:
- auto resume amdgpu driver whenever a client requests compute service
- prevent runtime suspend for amdgpu while kfd is in use
This change refactors the amdgpu and amdkfd drivers to support BACO and
runtime power management.
Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This should speed up debugging VRAM access a lot.
v2: add HDP flush/invalidate
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only write the _HI register when necessary.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For sriov and pp_onevf_mode, do not send message to set smu
status, because smu doesn't support these messages under VF.
Besides, it should skip smu_suspend when pp_onevf_mode is disabled.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since drm_global_mutex is a true global mutex across devices, we don't
want to acquire it unless absolutely necessary. For maintaining the
device local open_count, we can use atomic operations on the counter
itself, except when making the transition to/from 0. Here, we tackle the
easy portion of delaying acquiring the drm_global_mutex for the final
release by using atomic_dec_and_mutex_lock(), leaving the global
serialisation across the device opens.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Thomas Hellström (VMware) <thomas_os@shipmail.org>
Reviewed-by: Thomas Hellström <thellstrom@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124130107.125404-1-chris@chris-wilson.co.uk
Better clean that up before some automation starts to complain about it
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reading some registers by mmio will result in hang when GPU is in
"gfxoff" state.This problem can be solved by GPU in "ring command
packages" way.
Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move amdgpu_virt_kiq_rreg/amdgpu_virt_kiq_wreg function to amdgpu_gfx.c,
and rename them to amdgpu_kiq_rreg/amdgpu_kiq_wreg.Make it generic and
flexible.
Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initialize notifier_lock.
Bug: https://gitlab.freedesktop.org/drm/amd/issues/1016
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
support check if dirver should try gpu recovery for
arcturus
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This can save users much troubles. As they do not
actually need to care whether swSMU or traditional
powerplay routine should be used.
V2: apply the fixes to vi.c and cik.c also
V3: squash in oops fix
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Under sriov and pp_onevf mode,
1.take resume instead of hw_init for smc recover to avoid
potential memory leak.
2.add return condition inside smc resume function for
sriov_pp_onevf_mode and pm_enabled param.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes coccicheck warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3961:1-19: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3981:1-19: WARNING: Assignment of 0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes coccicheck warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1036:5-8: Unneeded variable: "ret". Return "0" on line 1079
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Previously there is no VCN1 type ID in psp gfx interface. Also add VCN ip
block type unless the reinit after FLR for sriov would fail.
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is to avoid queueing jobs to same CPU during XGMI hive reset
because there is a strict timeline for when the reset commands
must reach all the GPUs in the hive.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use task barrier in XGMI hive to synchronize ASIC resets
across devices in XGMI hive.
v2: Return right away with a warning if no xgmi hive, update doc.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In preparation for doing XGMI reset synchronization using task barrier.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
issues:
MEC is ruined by the amdkfd_pre_reset after VF FLR done
fix:
amdkfd_pre_reset() would ruin MEC after hypervisor finished the VF FLR,
the correct sequence is do amdkfd_pre_reset before VF FLR but there is
a limitation to block this sequence:
if we do pre_reset() before VF FLR, it would go KIQ way to do register
access and stuck there, because KIQ probably won't work by that time
(e.g. you already made GFX hang)
so the best way right now is to simply remove it.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This sched array can be passed on to entity creation routine
instead of manually creating such sched array on every context creation.
v2: squash in missing break fix
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm_sched_entity_init() takes drm gpu scheduler list instead of
drm_sched_rq list. This makes conversion of drm_sched_rq list
to drm gpu scheduler list unnecessary
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The log will be useful for easily getting the CU info on various
emulation models or ASICs.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This makes it easier to figure out whether the kernel parameter has been
taken into account.
Signed-off-by: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Originally, due to the restriction from PSP and SMU, VF has
to send message to hypervisor driver to handle powerplay
change which is complicated and redundant. Currently, SMU
and PSP can support VF to directly handle powerplay
change by itself. Therefore, the old code about the handshake
between VF and PF to handle powerplay will be removed and VF
will use new the registers below to handshake with SMU.
mmMP1_SMN_C2PMSG_101: register to handle SMU message
mmMP1_SMN_C2PMSG_102: register to handle SMU parameter
mmMP1_SMN_C2PMSG_103: register to handle SMU response
v2: remove module parameter pp_one_vf
v3: fix the parens
v4: forbid vf to change smu feature
v5: use hwmon_attributes_visible to skip sepicified hwmon atrribute
v6: change skip condition at vega10_copy_table_to_smc
Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise next err_event_athub error cannot call gpu reset. And following
resume sequence will not be affected by this flag.
v2: create function to clear amdgpu_ras_in_intr for modularity of ras driver
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This athub fatal error can be recovered by baco without system-level reboot,
so add a mode to use baco for the recovery. Not affect the default psp reset
situations for now.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently each XGMI node reset wq does not run in parrallel if bound to same
cpu. Make change to bound the xgmi_reset_work item to different cpus.
XGMI requires all nodes enter into baco within very close proximity before
any node exit baco. So schedule the xgmi_reset_work wq twice for enter/exit
baco respectively.
To use baco for XGMI, PMFW supported for baco on XGMI needs to be involved.
The case that PSP reset and baco reset coexist within an XGMI hive never exist
and is not in the consideration.
v2: define use_baco flag to simplify the code for xgmi baco sequence
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This operation is needed when baco entry/exit for ras recovery
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As will call unload kms when initialize fail, and the unload kms will
send event 3 and 4, so don't need event 3 and 4 in device init.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The documentation says the that PCI core handles this
for you unless you choose to implement it. Just rely
on the PCI core to handle the pci specific bits.
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Originally we only supported runtime pm on PX/HG laptops
so vga_switcheroo and runtime pm are sort of entangled.
Attempt to logically separate them.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
BACO - Bus Active, Chip Off
Will be used for runtime pm. Entry will enter the BACO
state (chip off). Exit will exit the BACO state (chip on).
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
BACO - Bus Active, Chip Off
BOCO - Bus Off, Chip Off
To better match what we are checking for and to align with
amdgpu_device_supports_baco.
BOCO is used on PowerXpress/Hybrid Graphics systems and BACO
is used on desktop dGPU boards.
v2: fix typo in documentation
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
BACO - Bus Active, Chip Off
To check if a device supports BACO or not. This will be
used in determining when to enable runtime pm.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is one regression from 042f3d7b745cd76aa
To put flush_delayed_work after adev->shutdown = true
which will make amdgpu_ih_process not response the irq
At last, all ib ring tests will be failed just like below
[drm] amdgpu: finishing device.
[drm] Fence fallback timer expired on ring gfx
[drm] Fence fallback timer expired on ring comp_1.0.0
[drm] Fence fallback timer expired on ring comp_1.1.0
[drm] Fence fallback timer expired on ring comp_1.2.0
[drm] Fence fallback timer expired on ring comp_1.3.0
[drm] Fence fallback timer expired on ring comp_1.0.1
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.1.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.2.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.3.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma0 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on uvd_enc_0.0 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vce0 (-110).
[drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110).
v2: replace cancel_delayed_work_sync() with flush_delayed_work()
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
By using JPEG IP block type
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes gcc warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1954: warning: Function
parameter or member 'state' not described in 'amdgpu_device_set_cg_state'
Fixes: e3ecdffac9 ("drm/amdgpu: add documentation for amdgpu_device.c")
Signed-off-by: yu kuai <yukuai3@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since dcn20 and dcn21 are under dcn1 it doesnt make sense to
have it named dcn1.
Change it to "dcn" to make it generic
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
DCN21 is stable enough to be build by default. So drop the flags.
[How]
Remove them using the unifdef tool. The following commands were executed
in sequence:
$ find -name '*.c' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DCN2_1 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_1 '{}' ';'
$ find -name '*.h' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DCN2_1 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_1 '{}' ';'
In addition:
* Remove from kconfig, and replace any dependencies with DCN1_0.
* Remove from any makefiles.
* Fix and cleanup Renoir definitions in dal_asic_id.h
* Expand DCN1 ifdef to include DCN21 code in the following files:
* clk_mgr/clk_mgr.c: dc_clk_mgr_create()
* core/dc_resources.c: dc_create_resource_pool()
* gpio/hw_factory.c: dal_hw_factory_init()
* gpio/hw_translate.c: dal_hw_translate_init()
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
DCN2 and DSC are stable enough to be build by default. So drop the flags.
[How]
Remove them using the unifdef tool. The following commands were executed
in sequence:
$ find -name '*.c' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DSC_SUPPORT -DCONFIG_DRM_AMD_DC_DCN2_0 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_0 '{}' ';'
$ find -name '*.h' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DSC_SUPPORT -DCONFIG_DRM_AMD_DC_DCN2_0 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_0 '{}' ';'
In addition:
* Remove from kconfig, and replace any dependencies with DCN1_0.
* Remove from any makefiles.
* Fix and cleanup NV defninitions in dal_asic_id.h
* Expand DCN1 ifdef to include DCN2 code in the following files:
* clk_mgr/clk_mgr.c: dc_clk_mgr_create()
* core/dc_resources.c: dc_create_resource_pool()
* dce/dce_dmcu.c: dcn20_*lock_phy()
* dce/dce_dmcu.c: dcn20_funcs
* dce/dce_dmcu.c: dcn20_dmcu_create()
* gpio/hw_factory.c: dal_hw_factory_init()
* gpio/hw_translate.c: dal_hw_translate_init()
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
flush/cancel delayed works before doing finalization
to avoid concurrently requests.
Signed-off-by: Jesse Zhang <zhexi.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
P-state switch should be performed after all devices from the hive
get initialized.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise, the feature enablement will be skipped due to wrong count.
Fixes: beff74bc6e ("drm/amdgpu: fix a race in GPU reset with IB test (v2)")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pstate settings should be performed after all device of the
XGMI setup get initialized.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
PSP lost connection when err_event_athub occurs. These cleanup work can be
skipped in BACO reset.
v2: squash in missing include (Alex)
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Correct the "_LENTH" mispelling in the AMDGPU_MAX_TIMEOUT_PARAM_LENGTH
constant.
Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For Arcturus the I2C traffic is done through SMU tables and so
we must postpone RAS recovery init to after they are ready
which is in amdgpu_device_ip_hw_init_phase2.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add a generic helper function for accessing framebuffer via MMIO
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move it into the caller. There are cases were we don't
want it. We need it for hibernation, but we don't need
it for runtime pm, so drop it for runtime pm.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
for amdgpu_device_suspend. This follows the logic
in the resume path.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When index is 1, need to set compute ring timeout for sriov and passthrough.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's only used in amdgpu_device.c and the naming also
reflects that. Move it there.
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently, every single piece of code in amdgpu that loops through
connectors does it incorrectly and doesn't use the proper list iteration
helpers, drm_connector_list_iter_begin() and
drm_connector_list_iter_end(). Yeesh.
So, do that.
Cc: Juston Li <juston.li@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When compute fence did not signal, compute ring cannot detect hardware hang
because its timeout value is set to be infinite by default.
In SR-IOV and passthrough mode, if user does not declare custome timeout
value for compute ring, then use gfx ring timeout value as default. So
that when there is a ture hardware hang, compute ring can detect it.
Signed-off-by: Jesse Zhang <zhexi.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
except soc_bounding_box which is not integrated in discovery table yet
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Set mp1 state properly for SW SMU suspend/reset routine.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
define sched_policy in case CONFIG_HSA_AMD is not
enabled, with this there is no need to check for CONFIG_HSA_AMD
else where in driver code.
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If CONFIG_HSA_AMD is not set, build fails:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.o: In function `amdgpu_device_ip_early_init':
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1626: undefined reference to `sched_policy'
Use CONFIG_HSA_AMD to guard this.
Fixes: 1abb680ad371 ("drm/amdgpu: disable gfxoff while use no H/W scheduling policy")
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
While gfxoff is enabled, the mmVM_XXX registers will be 0xfffffff while the GFX
is in "off" state. KFD queue creattion doesn't use ring based method, so it will
trigger a VM fault.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As more and more new asics start to reuse the old device IDs before
launch, there is a need to quickly override the existing asic type
corresponding to the reused device ID through a kernel parameter. With
this, engineers no longer need to rely on local hack patches,
facilitating cooperation across teams.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ras recovery_init should be called after ttm init,
bad page reserve should be put in front of gpu reset since i2c
may be unstable during gpu reset.
add cleanup for recovery_init and recovery_fini
v2: add more comment and print.
remove cancel_work_sync in recovery_init.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This optimizes out the pci device id usage in KFD and makes the code
more maintainable.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In case of RAS error allow user configure auto system
reboot through ras_ctrl.
This is also part of the temproray work around for the RAS
hang problem.
v4: Use latest kernel API for disk sync.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Problem:
Under certain conditions, when some IP bocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in SMU/PSP.
Temporary fix until proper solution in PSP/SMU is ready:
When uncorrectable error happens the DF will unconditionally
broadcast error event packets to all its clients/slave upon
receiving fatal error event and freeze all its outbound queues,
err_event_athub interrupt will be triggered.
In such case and we use this interrupt
to issue GPU reset. THe GPU reset code is modified for such case to avoid HW
reset, only stops schedulers, deatches all in progress and not yet scheduled
job's fences, set error code on them and signals.
Also reject any new incoming job submissions from user space.
All this is done to notify the applications of the problem.
v2:
Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
Remove print param from amdgpu_ras_query_error_count
v3:
Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset
for other XGMI hive memebers.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Issue 1:
In XGMI case amdgpu_device_lock_adev for other devices in hive
was called to late, after access to their repsective schedulers.
So relocate the lock to the begining of accessing the other devs.
Issue 2:
Using amdgpu_device_ip_need_full_reset to switch the device list from
all devices in hive to the single 'master' device who owns this reset
call is wrong because when stopping schedulers we iterate all the devices
in hive but when restarting we will only reactivate the 'master' device.
Also, in case amdgpu_device_pre_asic_reset conlcudes that full reset IS
needed we then have to stop schedulers for all devices in hive and not
only the 'master' but with amdgpu_device_ip_need_full_reset we
already missed the opprotunity do to so. So just remove this logic and
always stop and start all schedulers for all devices in hive.
Also minor cleanup and print fix.
v4: Minor coding style fix.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This should be checked at all places job is accessed.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enable DC support for renoir.
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
for SOC15/vega10 the BACO reset & mode1 would introduce vram lost
in high end address range, current kmd's vram lost checking cannot
catch it since it only check very ahead visible frame buffer
v2:
cover NV as well
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Integrate the mode2 reset into rest sequence.
v2:
Check ppfuncs pointer for NULL
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds renoir support for gpu_info firmware and ip block setting.
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
atomic 64 bits REG operations are useless currently
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
what we really want is a read or write that is guaranteed to be 64 bits
at a time, atomic64 operations are supported on all architectures
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_ip_block.status.hw for GMC wasn't set to
false on suspend during GPU reset and so on resume gmc_v9_0_resume
wasn't called.
Caused by 'drm/amdgpu: fix double ucode load by PSP(v3)'
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clear the flag after hw suspend, otherwise it skips the corresponding hw
resume.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Load DC and amdgpu display manager
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
same with navi10
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
previously the ucode loading of PSP was repreated, one executed in
phase_1 init/re-init/resume and the other in fw_loading routine
Avoid this double loading by clearing ip_blocks.status.hw in suspend or reset
prior to the FW loading and any block's hw_init/resume
v2:
still do the smu fw loading since it is needed by bare-metal
v3:
drop the change in reinit_early_sriov, just clear all block's status.hw
in the head place and set the status.hw after hw_init done is enough
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
we can simplify all those unnecessary function under
SRIOV for vega10 since:
1) PSP L1 policy is by force enabled in SRIOV
2) original logic always set all flags which make itself
a dummy step
besides,
1) the ih_doorbell_range set should also be skipped
for VEGA10 SRIOV.
2) the gfx_common registers should also be skipped
for VEGA10 SRIOV.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add 64 bits register access functions
v2: implement 64 bit functions in low level
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Dennis Li <dennis.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When doing a GPU reset or unloading the driver, we need to
put the SMU into the apprpriate state for the re-init after
the reset or unload to reliably work.
I don't think this is necessary for BACO because the SMU actually
controls the BACO state to it needs to be active.
For suspend (S3), the asic is put into D3 so the SMU would be
powered down so I don't think we need to put the SMU into
any special state.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add GPU info firmware for Arcturus.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add IP blocks for Arcturus.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add asic type for Arcturus.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
enable DC for navi14.
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
same with navi10
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add navi14 to case statement to load the GPU info firmware.
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add CHIP_NAVI14 to the list of asic types.
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to serialize access to the psp ring if there are multiple
callers at runtime.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When CONFIG_PERF_EVENTS is disabled, we cannot compile the pmu
portion of the amdgpu driver:
drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c:48:38: error: no member named 'hw' in 'struct perf_event'
struct hw_perf_event *hwc = &event->hw;
~~~~~ ^
drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c:51:13: error: no member named 'attr' in 'struct perf_event'
if (event->attr.type != event->pmu->type)
~~~~~ ^
...
Use conditional compilation for this file.
Fixes: 9c7c85f7ea ("drm/amdgpu: add pmu counters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Perform a ras_suspend to disable ras on all IPs to workaround
some ROCm stability issue.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GPU atomics operation depends on PCIE atomics support.
Always enable PCIE atomics ops support in case that
it hasn't been enabled.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
MGPU fan boost feature should not be enabled until all the
devices from the same hive are all back from reset.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
We don't want to expose sensitive ASIC information before ASIC release.
[HOW]
Encode the soc_bounding_box in the gpu_info FW (for Linux) and read it
at driver load.
v2: fix warning when CONFIG_DRM_AMD_DC_DCN2_0 is not set (Alex)
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Set the IPs for navi10 in early_init like other asics.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since from soc15, make sure only AndMasked bit get changed
when applied or_mask
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When amdgpu_mes is enabled and asic family is navi10 and
later asic, enable mes per device.
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
to control enablement.
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The MCBP unit test is used to test the functionality of MCBP.
It emualtes to send preemption request and resubmit the unfinished
jobs.
v2: squash in fixes (Alex)
v3: squash in memory leak fix (Jack)
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CSA is the Context Save Area for preemption.
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add mcbp driver parameter, so that mcbp feature can be
enabled/disabled by driver parameter.
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Parse the new parameters for gfx10.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
gpu info firmware stores configuration data for various
IP blocks.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
adding perf event counters
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Gj1MeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGctkH/0At3+SQPY2JJSy8
i6+TDeytFx9OggeGLPHChRfehkAlvMb/kd34QHnuEvDqUuCAMU6HZQJFKoK9mvFI
sDJVayPGDSqpm+iv8qLpMBPShiCXYVnGZeVfOdv36jUswL0k6wHV1pz4avFkDeZa
1F4pmI6O2XRkNTYQawbUaFkAngWUCBG9ECLnHJnuIY6ohShBvjI4+E2JUaht+8gO
M2h2b9ieddWmjxV3LTKgsK1v+347RljxdZTWnJ62SCDSEVZvsgSA9W2wnebVhBkJ
drSmrFLxNiM+W45mkbUFmQixRSmjv++oRR096fxAnodBxMw0TDxE1RiMQWE6rVvG
N6MC6xA=
=+B0P
-----END PGP SIGNATURE-----
Merge v5.2-rc5 into drm-next
Maarten needs -rc4 backmerged so he can pull in the fbcon notifier
removal topic branch into drm-misc-next.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need the asic_funcs set for the get rom callbacks in some
cases.
Tested-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
UAPI Changes:
Cross-subsystem Changes:
- Add code to signal all dma-fences when freed with pending signals.
- Annotate reservation object access in CONFIG_DEBUG_MUTEXES
Core Changes:
- Assorted documentation fixes.
- Use irqsave/restore spinlock to add crc entry.
- Move code around to drm_client, for internal modeset clients.
- Make drm_crtc.h and drm_debugfs.h self-contained.
- Remove drm_fb_helper_connector.
- Add bootsplash to todo.
- Fix lock ordering in pan_display_legacy.
- Support pinning buffers to current location in gem-vram.
- Remove the now unused locking functions from gem-vram.
- Remove the now unused kmap-object argument from vram helpers.
- Stop checking return value of debugfs_create.
- Add atomic encoder enable/disable helpers.
- pass drm_atomic_state to atomic connector check.
- Add atomic support for bridge enable/disable.
- Add self refresh helpers to core.
Driver Changes:
- Add extra delay to make MTP SDM845 work.
- Small fixes to virtio, vkms, sii902x, sii9234, ast, mcde, analogix, rockchip.
- Add zpos and ?BGR8888 support to meson.
- More removals of drm_os_linux and drmP headers for amd, radeon, sti, r128, r128, savage, sis.
- Allow synopsis to unwedge the i2c hdmi bus.
- Add orientation quirks for GPD panels.
- Edid cleanups and fixing handling for edid < 1.2.
- Add runtime pm to stm.
- Handle s/r in dw-hdmi.
- Add hooks for power on/off to dsi for stm.
- Remove virtio dirty tracking code, done in drm core.
- Rework BO handling in ast and mgag200.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAl0DYU8ACgkQ/lWMcqZw
E8NNWw/+MhcRakQmrNDMRIj4DvukzPW2efXbhRFuvthUvVN7rOHMzQZBc3le+gUb
2GGoEeUYG7XoA/Nj3ZQMUoalrjODywtLClBClC4Blped0mZ4JPiI7bTsrNILn1N1
hZ0+DbffMCAKqKN8TftK/TrFF9IEM8JSftqD/1RdkiXVcMH3NKuLABHZxzPxH2BH
XuSqIL5lDyAtanixB53aDf2gw9iipUphYoFlKhdx9dr5Ql96RhiOcDgFhXnFiQu4
O9z3W6tRI2VPoCzsnhNy3Eah7rBDnZwvyfGa9YU/Q+VeHegb601p8OmNNwpshWE1
ohixBbADj0dr+K3T/lVW30kovg34i4L5K3O7MR0HxWYSA7+v3AHyG7/GWLxbBNQn
AFHTRbBph8aP/Dn24ucbKaB7wHi31j7b0Hxj+oJR8RoGhuOYcMOuZrCHqpAxStto
riSVDCRcq/KcPiuqZZ1UnzFWlQMhNFUwumloPiXFkJ4mcSdK9IbdKBd2eqbRdaU1
eTOA4istVgNgaNbgLvVB2ltjqXrsdio7/jh6RhobFPqHISiL7iMZg3C/KRBXrkUB
lYMeGkiE3Wp77zdycdofuEbMfAYUwLts8EYjVsM6xo0BKlBYhpeVuBOYeQEkU7PV
PpGYqQVeZUoD1OyGlMWIYoyb5Ya7OLUDpooOJdFqoPzUfDki31E=
=4uQX
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2019-06-14' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.3:
UAPI Changes:
Cross-subsystem Changes:
- Add code to signal all dma-fences when freed with pending signals.
- Annotate reservation object access in CONFIG_DEBUG_MUTEXES
Core Changes:
- Assorted documentation fixes.
- Use irqsave/restore spinlock to add crc entry.
- Move code around to drm_client, for internal modeset clients.
- Make drm_crtc.h and drm_debugfs.h self-contained.
- Remove drm_fb_helper_connector.
- Add bootsplash to todo.
- Fix lock ordering in pan_display_legacy.
- Support pinning buffers to current location in gem-vram.
- Remove the now unused locking functions from gem-vram.
- Remove the now unused kmap-object argument from vram helpers.
- Stop checking return value of debugfs_create.
- Add atomic encoder enable/disable helpers.
- pass drm_atomic_state to atomic connector check.
- Add atomic support for bridge enable/disable.
- Add self refresh helpers to core.
Driver Changes:
- Add extra delay to make MTP SDM845 work.
- Small fixes to virtio, vkms, sii902x, sii9234, ast, mcde, analogix, rockchip.
- Add zpos and ?BGR8888 support to meson.
- More removals of drm_os_linux and drmP headers for amd, radeon, sti, r128, r128, savage, sis.
- Allow synopsis to unwedge the i2c hdmi bus.
- Add orientation quirks for GPD panels.
- Edid cleanups and fixing handling for edid < 1.2.
- Add runtime pm to stm.
- Handle s/r in dw-hdmi.
- Add hooks for power on/off to dsi for stm.
- Remove virtio dirty tracking code, done in drm core.
- Rework BO handling in ast and mgag200.
Tiny conflict in drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c,
needed #include <linux/slab.h> to make it compile.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0e01de30-9797-853c-732f-4a5bd6e61445@linux.intel.com
we need register pm sysfs for virt in order
to support dpm level modification because
smu ip block will not be added under SRIOV
v2: whitespace fixes (Alex)
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This option is no longer needed. The default code paths
are now the only option.
v2: Add HPAGE support and a default for non contiguous maps
v3: Misread 512 pages as MiB ...
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use SMU firmware version to indentify the raven1 refresh device and
then load homologous RLC FW.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Suggested-by: Huang Rui<Ray.Huang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Drop use of drmP.h in all files named amdgpu*
in drm/amd/amdgpu/
Fix fallout.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190609220757.10862-10-sam@ravnborg.org
Split late_init into two functions, one (do_late_init) which
just does the hw init, and late_init which calls do_late_init
and schedules the IB test work. Call do_late_init in
the GPU reset code to run the init code, but not schedule
the IB test code. The IB test code is called directly
in the gpu reset code so no need to run the IB tests
in a separate work thread. If we do, we end up racing.
v2: Rework late_init. Pull out the mgpu fan boost and xgmi
pstate code into late_init so they get called in all cases.
rename the late_init worker thread to delayed work since it's
just the IB tests now which can happen later. Schedule the
work at init and resume time. It's not needed at reset time
because the IB tests are called directly.
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Xinhui Pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
gpu reset will run late_init and schedule the late_init_work. if we
keep triggering gpu reset in a short time, there are potenial races.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use SMU firmware version to indentify the raven1 refresh device and
then load homologous RLC FW.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Suggested-by: Huang Rui<Ray.Huang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
New stuff for 5.3:
- Add new thermal sensors for vega asics
- Various RAS fixes
- Add sysfs interface for memory interface utilization
- Use HMM rather than mmu notifier for user pages
- Expose xgmi topology via kfd
- SR-IOV fixes
- Fixes for manual driver reload
- Add unique identifier for vega asics
- Clean up user fence handling with UVD/VCE/VCN blocks
- Convert DC to use core bpc attribute rather than a custom one
- Add GWS support for KFD
- Vega powerplay improvements
- Add CRC support for DCE 12
- SR-IOV support for new security policy
- Various cleanups
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190529220944.14464-1-alexander.deucher@amd.com
For passthrough, after rebooted the VM, driver will do
a baco reset before doing other driver initialization during loading
driver. For doing the baco reset, it will first
check the baco reset capability. So first need to set the
cap from the vbios information or baco reset won't be
enabled.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It does the same thing we were doing already. I though it needed
work for gen3/4 speeds, but that seems to be covered already.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Firmware versions can be found as separate sysfs files at:
/sys/class/drm/cardX/device/fw_version (where X is the card number)
The firmware versions are displayed in hexadecimal.
v2: Moved sysfs files to subfolder
Signed-off-by: Ori Messinger <ori.messinger@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
suspend/resume will change ras state behind us. Let driver get notified.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add ras suspend function. rename ras_post_init to amdgpu_ras_resume.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In order to support new PSP feature that PSP may provide interface
to program IH CNTL register, initialize PSP before IH under Vega10
SR-IOV VF
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Set different register access mode according to the features
provided by firmware
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ras need initialize proper state after late init
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ras need late init to initialize proper state.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a sysfs file for reporting the number of PCIe replays (NAKs). This
returns the sum of NAKs received and NAKs generated
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Every ring type can have its own timeout setting.
- V2: update lockup_timeout parameter format and cosmetic fixes
- V3: invalidate 0 and negative values
- V4: update lockup_timeout parameter format
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJc04M6AAoJEAx081l5xIa+SJgP/0uIgIOM53vPpydgmr+2IEHF
jbDqrd+mipgNriRVHjDsWdUHCUNtyhB7YEBCMrj3mY0rRFI7FlQQf4lOwYGoHiKP
4JZg4kwC37997lFXl1uabGj3DmJLtxKL2/D15zCH/uLe+2EDzWznP6NVdFT3WK0P
YKZQCWT19PWSsLoBRPutWxkmop4AYvkqE0a6vXUlJlFYZK3Bbytx6/179uWKfiX5
ZkKEEtx1XiDAvcp5gBb6PISurycrBY0e/bkPBnK3ES5vawMbTU5IrmWOrQ4D8yOd
z9qOVZawZ6+b2XBDgBWjQ9bM7I5R7Il1q/LglYEaFI9+wHUnlUdDSm6ft5/5BiCZ
fqgkh5Bj2iEsajbSsacoljMOpxpYPqj63mqc+7fAGXF34V+B+9U1bpt8kCbMKowf
7Abb7IuiCR6vLDapjP6VqTMvdQ4O466OEAN83ULGFTdmMqYYH4AxaIwc+xcAk/aP
RNq7/RHhh4FRynRAj9fCkGlF3ArnM88gLINwWuEQq4SClWGcvdw7eaHpwWo77c4g
iccCnTLqSIg5pDVu07AQzzBlW6KulWxh5o72x+Xx+EXWdYUDHQ1SlNs11bSNUBV1
5MkrzY2GuD+NFEjsXJEDIPOr40mQOyJCXnxq8nXPsz/hD9kHeJPvWn3J3eVKyb5B
Z6/knNqM0BDn3SaYR/rD
=YFiQ
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"This has two exciting community drivers for ARM Mali accelerators.
Since ARM has never been open source friendly on the GPU side of the
house, the community has had to create open source drivers for the
Mali GPUs. Lima covers the older t4xx and panfrost the newer 6xx/7xx
series. Well done to all involved and hopefully this will help ARM
head in the right direction.
There is also now the ability if you don't have any of the legacy
drivers enabled (pre-KMS) to remove all the pre-KMS support code from
the core drm, this saves 10% or so in codesize on my machine.
i915 also enable Icelake/Elkhart Lake Gen11 GPUs by default, vboxvideo
moves out of staging.
There are also some rcar-du patches which crossover with media tree
but all should be acked by Mauro.
Summary:
uapi changes:
- Colorspace connector property
- fourcc - new YUV formts
- timeline sync objects initially merged
- expose FB_DAMAGE_CLIPS to atomic userspace
new drivers:
- vboxvideo: moved out of staging
- aspeed: ASPEED SoC BMC chip display support
- lima: ARM Mali4xx GPU acceleration driver support
- panfrost: ARM Mali6xx/7xx Midgard/Bitfrost acceleration driver support
core:
- component helper docs
- unplugging fixes
- devm device init
- MIPI/DSI rate control
- shmem backed gem objects
- connector, display_info, edid_quirks cleanups
- dma_buf fence chain support
- 64-bit dma-fence seqno comparison fixes
- move initial fb config code to core
- gem fence array helpers for Lima
- ability to remove legacy support code if no drivers requires it (removes 10% of drm.ko size)
- lease fixes
ttm:
- unified DRM_FILE_PAGE_OFFSET handling
- Account for kernel allocations in kernel zone only
panel:
- OSD070T1718-19TS panel support
- panel-tpo-td028ttec1 backlight support
- Ronbo RB070D30 MIPI/DSI
- Feiyang FY07024DI26A30-D MIPI-DSI panel
- Rocktech jh057n00900 MIPI-DSI panel
i915:
- Comet Lake (Gen9) PCI IDs
- Updated Icelake PCI IDs
- Elkhartlake (Gen11) support
- DP MST property addtions
- plane and watermark fixes
- Icelake port sync and VEBOX disable fixes
- struct_mutex usage reduction
- Icelake gamma fix
- GuC reset fixes
- make mmap more asynchronous
- sound display power well race fixes
- DDI/MIPI-DSI clocks for Icelake
- Icelake RPS frequency changing support
- Icelake workarounds
amdgpu:
- Use HMM for userptr
- vega20 experimental smu11 support
- RAS support for vega20
- BACO support for vega12 + fixes for vega20
- reworked IH interrupt handling
- amdkfd RAS support
- Freesync improvements
- initial timeline sync object support
- DC Z ordering fixes
- NV12 planes support
- colorspace properties for planes=
- eDP opts if eDP already initialized
nouveau:
- misc fixes
etnaviv:
- misc fixes
msm:
- GPU zap shader support expansion
- robustness ABI addition
exynos:
- Logging cleanups
tegra:
- Shared reset fix
- CPU cache maintenance fix
cirrus:
- driver rewritten using simple helpers
meson:
- G12A support
vmwgfx:
- Resource dirtying management improvements
- Userspace logging improvements
virtio:
- PRIME fixes
rockchip:
- rk3066 hdmi support
sun4i:
- DSI burst mode support
vc4:
- load tracker to detect underflow
v3d:
- v3d v4.2 support
malidp:
- initial Mali D71 support in komeda driver
tfp410:
- omap related improvement
omapdrm:
- drm bridge/panel support
- drop some omap specific panels
rcar-du:
- Display writeback support"
* tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm: (1507 commits)
drm/msm/a6xx: No zap shader is not an error
drm/cma-helper: Fix drm_gem_cma_free_object()
drm: Fix timestamp docs for variable refresh properties.
drm/komeda: Mark the local functions as static
drm/komeda: Fixed warning: Function parameter or member not described
drm/komeda: Expose bus_width to Komeda-CORE
drm/komeda: Add sysfs attribute: core_id and config_id
drm: add non-desktop quirk for Valve HMDs
drm/panfrost: Show stored feature registers
drm/panfrost: Don't scream about deferred probe
drm/panfrost: Disable PM on probe failure
drm/panfrost: Set DMA masks earlier
drm/panfrost: Add sanity checks to submit IOCTL
drm/etnaviv: initialize idle mask before querying the HW db
drm: introduce a capability flag for syncobj timeline support
drm: report consistent errors when checking syncobj capibility
drm/nouveau/nouveau: forward error generated while resuming objects tree
drm/nouveau/fb/ramgk104: fix spelling mistake "sucessfully" -> "successfully"
drm/nouveau/i2c: Disable i2c bus access after ->fini()
drm/nouveau: Remove duplicate ACPI_VIDEO_NOTIFY_PROBE definition
...
Also reject TDRs if another one already running.
v2:
Stop all schedulers across device and entire XGMI hive before
force signaling HW fences.
Avoid passing job_signaled to helper fnctions to keep all the decision
making about skipping HW reset in one place.
v3:
Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced
against it's decrement in drm_sched_stop in non HW reset case.
v4: rebase
v5: Revert v3 as we do it now in sceduler code.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-6-git-send-email-andrey.grodzovsky@amd.com
We now destroy finished jobs from the worker thread to make sure that
we never destroy a job currently in timeout processing.
By this we avoid holding lock around ring mirror list in drm_sched_stop
which should solve a deadlock reported by a user.
v2: Remove unused variable.
v4: Move guilty job free into sched code.
v5:
Move sched->hw_rq_count to drm_sched_start to account for counter
decrement in drm_sched_stop even when we don't call resubmit jobs
if guily job did signal.
v6: remove unused variable
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-3-git-send-email-andrey.grodzovsky@amd.com
It's normal for VRAM to lost during GPU reset and so change
the log level to INFO to avoid confusing users.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
shadow was added into shadow_list by amdgpu_bo_create_shadow.
meanwhile, shadow->tbo.mem was not fully configured.
tbo.mem would be fully configured by amdgpu_vm_sdma_map_table until calling amdgpu_vm_clear_bo.
If sriov TDR occurred between amdgpu_bo_create_shadow and amdgpu_vm_sdma_map_table,
amdgpu_device_recover_vram would deal with shadow without tbo.mem.start.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Under vega10 virtualuzation, smu ip block will not be added.
Therefore, we need add pp clk query and force dpm level function
at amdgpu_virt_ops to support the feature.
v2: add get_pp_clk existence check and use kzalloc to allocate buf
v3: return -ENOMEM for allocation failure and correct the coding style
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_bo_restore_shadow would assign zero to r if succeeded.
r would remain zero if there is only one node in shadow_list.
current code would always return failure when r <= 0.
restart the timeout for each wait was a rather problematic bug as well.
The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Driver vote low to high pstate switch whenever there is an outstanding
XGMI mapping request. Driver vote high to low pstate when all the
outstanding XGMI mapping is terminated.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
"error" was not correctly spelled.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
use pcie_bandwidth_available to get real link state
to update pcie table.
v2: fix incorrect initialized return value
v3: expand the fetching method about the link width to all asics.
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 9b638f9751.
Adding this to the mapping is complete nonsense and the whole
implementation looks racy. This patch wasn't thoughtfully reviewed
and should be reverted for now.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Liu, Shaoyun <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch enables gfxoff and stutter mode again, since we take more testing on
raven series. For raven2 and picasso, we can enable it directly. And for raven,
we need check the RLC/SMC ucode version cannot be less than #531/0x1e45.
v2: add smc version checking for raven.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Tested-by: Likun Gao <Likun.Gao@amd.com> (v2)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add ras post init function.
Do some initialization after all IP have finished their late init.
Add new member flags which will control the ras work flow.
For now, vbios enable ras for us on boot. That might change in the
future.
So there should be a flag from vbios to tell us if ras is enabled or not
on boot. Looks like there is no such info now.
Other bits of the flags are reserved to control other parts of ras.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mark vram pages with errors as bad and prevent the driver
from using them.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add obj management.
add feature control.
add debugfs infrastructure.
add sysfs infrastructure.
add IH infrastructure.
add recovery infrastructure.
It is a framework. Other IPs need call amdgpu_ras_xxx function instead of
psp_ras_xxx functions.
v2: squash in warning fixes
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Problem:
Using SDMA for TLB invalidation in certain ASICs exposed a problem
of IB pool not being ready while SDMA already up on Init and already
shutt down while SDMA still running on Fini. This caused
IB allocation failure. Temproary fix was commited into a
bringup branch but this is the generic fix.
Fix:
Init IB pool rigth after GMC is ready but before SDMA is ready.
Do th opposite for Fini.
v2: Remove restriction on SDMA early init and move amdgpu_ib_pool_fini
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Driver vote low to high pstate switch whenever there is an outstanding
XGMI mapping request. Driver vote high to low pstate when all the
outstanding XGMI mapping is terminated.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move pp_feature from the struct of amd_powerplay to amdgpu_device.
Add pp_feature limit for overdrive interface.
v2: put pp_feature into struct amdgpu_pm.
v3: merge feature_mask with pp_feature.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The new Vega series GPU cards have in-built bridges. To get the pcie
speed and width supported by the platform walk the hierarchy and get the
slowest link.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- Split out some part of drm_crtc_helper.h into drm_probe_helper.h
- DRIVER_* flags improvements
- New tasks on the TODO-list
- Improvements to the documentation
Driver Changes:
- Continual of drmP.h removal in multiple drivers
- Removal of FBINFO_(FLAG_)DEFAULT in multiple drivers
- sun4i: Addition of the A23 support, multiple fixes for the tiled
formats
- atmel-hlcdc: Fix of clipping and rotation properties
- qxl: various BO-related improvements, prime and generic fbdev emulation
support
- dw-hdmi: Support for HDMI2.0 2160p modes and YUV420 output
- New Sitronix ST7701 panel driver
- New Kingdisplay KD097D04 panel driver
- New LeMaker BL035-RGB-002 panel driver
- New PDA 91-00156-A0 panel driver
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXFRb/wAKCRDj7w1vZxhR
xXj2AQDFAkH0/tksJ+T6yrHZXQE7q/6fAWTlrDXLo7Nb1hhfswEA1AyaHmZJA0Ar
u7S+4Gfzs/gisw9Jr4bqQaerEpYJxQA=
=qumD
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2019-02-01' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 5.1:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- Split out some part of drm_crtc_helper.h into drm_probe_helper.h
- DRIVER_* flags improvements
- New tasks on the TODO-list
- Improvements to the documentation
Driver Changes:
- Continual of drmP.h removal in multiple drivers
- Removal of FBINFO_(FLAG_)DEFAULT in multiple drivers
- sun4i: Addition of the A23 support, multiple fixes for the tiled
formats
- atmel-hlcdc: Fix of clipping and rotation properties
- qxl: various BO-related improvements, prime and generic fbdev emulation
support
- dw-hdmi: Support for HDMI2.0 2160p modes and YUV420 output
- New Sitronix ST7701 panel driver
- New Kingdisplay KD097D04 panel driver
- New LeMaker BL035-RGB-002 panel driver
- New PDA 91-00156-A0 panel driver
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190201144749.t3abxvguhstu6bcl@flea
New stuff for 5.1.
amdgpu:
- DC bandwidth formula updates
- Support for DCC on scanout surfaces
- Support for multiple IH rings on soc15 asics
- Fix xgmi locking
- Add sysfs interface to get pcie usage stats
- Simplify DC i2c/aux code
- Initial support for BACO on vega10/20
- New runtime SMU feature debug interface
- Expand existing sysfs power interfaces to new clock domains
- Handle kexec properly
- Simplify IH programming
- Rework doorbell handling across asics
- Drop old CI DPM implementation
- DC page flipping fixes
- Misc SR-IOV fixes
amdkfd:
- Simplify the interfaces between amdkfd and amdgpu
ttm:
- Add a callback to notify the driver when the lru changes
sched:
- Refactor mirror list handling
- Rework hw fence processing
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125231517.26268-1-alexander.deucher@amd.com
Decauple sched threads stop and start and ring mirror
list handling from the policy of what to do about the
guilty jobs.
When stoppping the sched thread and detaching sched fences
from non signaled HW fenes wait for all signaled HW fences
to complete before rerunning the jobs.
v2: Fix resubmission of guilty job into HW after refactoring.
v4:
Full restart for all the jobs, not only from guilty ring.
Extract karma increase into standalone function.
v5:
Rework waiting for signaled jobs without relying on the job
struct itself as those might already be freed for non 'guilty'
job's schedulers.
Expose karma increase to drivers.
v6:
Use list_for_each_entry_safe_continue and drm_sched_process_job
in case fence already signaled.
Call drm_sched_increase_karma only once for amdgpu and add documentation.
v7:
Wait only for the latest job's fence.
Suggested-by: Christian Koenig <Christian.Koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sriov would meet guest driver load failure,
if calling amdgpu_asic_reset in amdgpu_device_init.
sriov should skip asic_reset in device_init.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Having the probe helper stuff (which pretty much everyone needs) in
the drm_crtc_helper.h file (which atomic drivers should never need) is
confusing. Split them out.
To make sure I actually achieved the goal here I went through all
drivers. And indeed, all atomic drivers are now free of
drm_crtc_helper.h includes.
v2: Make it compile. There was so much compile fail on arm drivers
that I figured I'll better not include any of the acks on v1.
v3: Massive rebase because i915 has lost a lot of drmP.h includes, but
not all: Through drm_crtc_helper.h > drm_modeset_helper.h -> drmP.h
there was still one, which this patch largely removes. Which means
rolling out lots more includes all over.
This will also conflict with ongoing drmP.h cleanup by others I
expect.
v3: Rebase on top of atomic bochs.
v4: Review from Laurent for bridge/rcar/omap/shmob/core bits:
- (re)move some of the added includes, use the better include files in
other places (all suggested from Laurent adopted unchanged).
- sort alphabetically
v5: Actually try to sort them, and while at it, sort all the ones I
touch.
v6: Rebase onto i915 changes.
v7: Rebase once more.
Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: CK Hu <ck.hu@mediatek.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: virtualization@lists.linux-foundation.org
Cc: etnaviv@lists.freedesktop.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-mediatek@lists.infradead.org
Cc: linux-amlogic@lists.infradead.org
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: spice-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-renesas-soc@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-tegra@vger.kernel.org
Cc: xen-devel@lists.xen.org
Link: https://patchwork.freedesktop.org/patch/msgid/20190117210334.13234-1-daniel.vetter@ffwll.ch
To deal with situations like kexec or GPU VM passthrough
where the device may have been used previously without a
proper GPU reset between.
v2: rebase
bug: https://bugs.freedesktop.org/show_bug.cgi?id=108585
bug: https://bugs.freedesktop.org/show_bug.cgi?id=108754
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: Move locks around in other functions so that this
function can stand on its own. Also only hold the hive
specific lock for add/remove device instead of the driver
global lock so you can't add/remove devices in parallel from
one hive.
v3: add reset_lock
Acked-by: Shaoyun.liu < Shaoyun.liu@amd.com>
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When init fail, send rel init, req_fini and rel_fini to host for the
finishing routine.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
distinguish ip_reinit_early_sriov and ip_reinit_late_sriov
by different log RE-INIT-early and RE-INIT-late
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The pfvf exchange need be in exclusive mode. And add pfvf exchange in gpu
reset.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-By: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For virtual display feature, no need to pin cursor bo.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's not a core function, and the matching atomic functions are also
not in the core. Plus the suspend/resume helper is also already there.
Needs a tiny bit of open-coding, but less midlayer beats that I think.
v2: Rebase onto ast (which gained a new user).
Cc: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sean Paul <sean@poorly.run>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Cc: Sean Paul <sean@poorly.run>
Cc: David Airlie <airlied@linux.ie>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Rex Zhu <Rex.Zhu@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Shaoyun Liu <Shaoyun.Liu@amd.com>
Cc: Monk Liu <Monk.Liu@amd.com>
Cc: nouveau@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20181217194303.14397-4-daniel.vetter@ffwll.ch
XGMI hive put kfd_pre_reset into amdgpu_device_lock_adev,
but outside req_full_gpu of sriov.
It would make sriov hang during reset.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
I retested Bonaire (gfx7 dGPU) and it works fine.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SI does not use doorbells, move asic doorbell init later
asic check.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108920
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use per hive wq to concurrently send reset commands to all nodes
in the hive.
v2:
Switch to system_highpri_wq after dropping dedicated queue.
Fix non XGMI code path KASAN error.
Stop the hive reset for each node loop if there
is a reset failure on any of the nodes.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
XGMI hive has some resources allocted on device init which
needs to be deallocated when the device is unregistered.
v2: Remove creation of dedicated wq for XGMI hive reset.
v3: Use the gmc.xgmi.supported flag
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When paging queue is enabled, it use the second page of doorbell.
The AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
kernel doorbells are in the first page. So with paging queue enabled,
the total kernel doorbell range should be original num_doorbell plus
one page (0x400 in dword), not *2.
Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For XGMI hive case do reset in steps where each step iterates over
all devs in hive. This especially important for asic reset
since all PSP FW in hive must come up within a limited time
(around 1 sec) to properply negotiate the link.
Do this by refactoring amdgpu_device_gpu_recover and amdgpu_device_reset
into pre_asic_reset, asic_reset and post_asic_reset functions where is part
is exectued for all the GPUs in the hive before going to the next step.
v2: Update names for amdgpu_device_lock/unlock functions.
v3: Introduce per hive locking to avoid multiple resets for GPUs
in same hive.
v4:
Remove delayed_workqueue()/ttm_bo_unlock_delayed_workqueue() - they
are copy & pasted over from radeon and on amdgpu there isn't
any reason for that any more.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is prep work for updating each PSP FW in hive after
GPU reset.
Split into build topology SW state and update each PSP FW in the hive.
Save topology and count of XGMI devices for reuse.
v2: Create seperate header for XGMI.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ASIC specific doorbell layout is used instead of enum definition
Signed-off-by: Oak Zeng <ozeng@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Also call functioin amdgpu_device_doorbell_init after
amdgpu_device_ip_early_init because the former depends
on the later to set up asic-specific init_doorbell_index
function
Signed-off-by: Oak Zeng <ozeng@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Because increase SDMA_DOORBELL_RANGE to add new SDMA doorbell for paging queue will
break SRIOV, instead we can reserve and map two doorbell pages for amdgpu, paging
queues doorbell index use same index as SDMA gfx queues index but on second page.
For Vega20, after we change doorbell layout to increase SDMA doorbell for 8 SDMA RLC
queues later, we could use new doorbell index for paging queue.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Setup and tear down xgmi as part of psp.
v2:
- make psp_xgmi_terminate static
- squash in:
drm/amdgpu: only issue xgmi cmd when it is enabled
drm/amdgpu/psp: terminate xgmi ta in suspend and hw_fini phase
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is no functional changes,
Use function arguments for SRIOV special variables which
is hardcode in those functions.
so we can share those functions in baremetal.
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After testing looks like these subset of ASICs has GPU reset
working for the most part. Enable reset due to job timeout.
v2: Switch from GFX version to ASIC type.
v3: Fix identation
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is still completely breaking my Raven system.
This reverts commit cdf2f910fa969adca1b0e3ad2b487821233dc038.
Revert until we sort out the sbios and firmware combinations that work
correctly.
bug: https://bugs.freedesktop.org/show_bug.cgi?id=108606
Cc: stable@vger.kernel.org # v4.19
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Problem:
During GPU recover DAL would hang in
amdgpu_pm_compute_clocks->amdgpu_fence_wait_empty
Fix:
Turns out there was a typo introduced by
3320b8d drm/amdgpu: remove job->ring which caused skipping
amdgpu_fence_driver_force_completion and so the hangged job
was never force signaled and this would cause the hang later in DAL.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Need to check adev->powerplay.pp_funcs.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Extract the function of fw loading out of powerplay.
Do fw loading between hw_init/resuem_phase1 and phase2
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to do some IPs earlier to deal with ordering issues
similar to how resume is split into two phases.
Will do fw loading via smu/psp between the two phases.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. one is for create/free bo when init/fini
2. one is for fill the bo before fw loading
the ucode bo only need to be created when load driver
and free when driver unload.
when resume/reset, driver only need to re-fill the bo
if the bo is allocated in vram.
Suggested by Christian.
v2: Return error when bo create failed.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix cg/pg unexpected set in hw init failed case.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. only call late_init when hw_init successful,
so check status.hw instand of status.valid in late_init.
2. set status.late_initialized true if late_init was not implemented.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move in_suspend flag to adev from gfx, so
can be used in other ip blocks, also keep
consistent with gpu_in_reset flag.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
MGPU fan boost feature is enabled only when two or more dGPUs
in the system.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Don't grab the reservation lock any more and simplify the handling quite
a bit.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It shouldn't add much overhead and we should make sure that critical
VRAM content is always restored.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Treat them all as Raven rather than adding a new picasso
asic type. This simplifies a lot of code and also handles the
case of rv2 chips with the 0x15d8 pci id. It also fixes dmcu
fw handling for picasso.
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
enable gfxoff in non-sriov and stutter mode by default
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add support for picasso to the display manager.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add the IP blocks, clock and powergating flags, and common clockgating support.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Driver will save an array of XGMI hive info, each hive will have a list of devices
that have the same hive ID.
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
since we use PSP to program IH regs now
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move that into amdgpu_gmc.c since we are really deadling with GMC
address space here.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It will be more safe to make full-acess include both phase1 and phase2.
Then accessing special registeris wherever at phase1 or phase2 will not
block any shutdown and suspend process under virtualization.
Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Check if we should call the function instead of providing the forced
flag.
v2: rebase on KFD changes (Alex)
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is no need for gpu full access for suspend phase1
because under virtualization there is no hw register access
for dce block.
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After set power ungate state, set clock ungate state
before when suspend or fini.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Unify to set power ungate state at the begin of suspend/fini.
Remove the workaround code for gfx off feature in
amdgpu_device.c.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There are no any logical changes here.
1. change function names:
amdgpu_device_ip_late_set_pg/cg_state to
amdgpu_device_set_pg/cg_state.
2. add a function argument cg/pg_state, so
we can enable/disable cg/pg through those functions
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of the fixed round robin use let the scheduler balance the load
of page table updates.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cancel the delay work to avoid the corner case that
ib test was not running when suspend
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
there may be gfx off delay work pending when suspend/driver
unload, need to cancel them first.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
use amdgpu_gfx_off_ctrl function so driver can arbitrate
whether the gfx ip can be power off or power on.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
delay to enable gfx off feature to avoid gfx on/off frequently
suggested by Alex and Evan.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2:
1. drop the special handling for the hw IP
suggested by hawking and Christian.
2. refine the variable name suggested by Flora.
This funciton as the entry of gfx off feature.
we arbitrat gfx off feature enable/disable in this
function.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 8624c3c4dbfe24fc6740687236a2e196f5f4bfb0.
We need CONFIG_DRM_AMD_DC_DCN1_0 to guard code that is using fp math.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Leo (Sunpeng) Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Extend the timeout for recovering vram bos from shadows on sr-iov
to cover the worst case scenario for timeslices and VFs
Under runtime, the wait fence time could be quite long when
other VFs are in exclusive mode. For example, for 4 VF, every
VF's exclusive timeout time is set to 3s, then the worst case is
9s. If the VF number is more than 4,then the worst case time will
be longer.
The 8s is the test data, with setting to 8s, it will pass the TDR
test for 1000 times.
SWDEV-161490
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch moves amdgpu_fbdev_set_suspend() to the beginning
of suspend sequence.
This is to ensure fbcon does not to write to the VRAM
after GPU is powerd down.
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the newly split ip suspend functions to do suspend displays
first (to deal with atomic so that FBs can be unpinned before
attempting to evict vram), then evict vram, then suspend the
other IPs. Also move the non-DC pinning code to only be
called in the non-DC cases since atomic should take care of
DC.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107065
Fixes: e00fb85 drm: Stop updating plane->crtc/fb/old_fb on atomic drivers
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to do some IPs earlier to deal with ordering issues
similar to how resume is split into two phases. Do DCE first
to deal with atomic, then do the rest.
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch removes the usage of console_(un)lock
by replacing drm_fb_helper_set_suspend() to
drm_fb_helper_set_suspend_unlocked() which locks and
unlocks the console instead of locking ourselves.
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
While the console_lock is held, console output will be buffered, till
its unlocked it wont be emitted, hence its ideal to unlock sooner to enable
debugging/detecting/fixing of any issue in the remaining sequence of events
in resume path.
The concern here is about consoles other than fbcon on the device,
e.g. a serial console
[How]
This patch restructures the console_lock, console_unlock around
amdgpu_fbdev_set_suspend() and moves this new block appropriately.
V2: Kept amdgpu_fbdev_set_suspend after pci_set_power_state
V3: Updated the commit message to clarify the real concern that this patch
addresses.
V4: code clean-up.
V5: fixed return value
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Trivial fix to spelling mistake in dev_err error message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allowing CONFIG_DRM_AMD_DC_DCN1_0 to be disabled on X86 was an
opportunity for display with Raven Ridge accidentally not working.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 2c773de2 (drm/amdgpu: defer test IBs on the rings at boot (V3))
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We can easily get that from the scheduler.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
job could be NULL when amdgpu_device_gpu_recover is called
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
It could be got by amdgpu_bo_gpu_offset() if need
Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Eliminating the warnings produced by sphinx when processing the sphinx comments in
amdgpu_device.c & amdgpu_mn.c
Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the newly exported pci functions to get the link width
and speed rather than using the drm duplicated versions.
Also query the GPU link caps directly rather than hardcoding
them.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Partially revert commit 2dc80b0065
("drm/amdgpu: optimize amdgpu driver load & resume time")'
1. CG/PG enablement are part of gpu hw ip initialize, we should
wait for them complete. otherwise, there are some potential conflicts,
for example, Suspend and CG enablement concurrently.
2. better run ib test after hw initialize completely. That is to say,
ib test should be after CG/PG enablement. otherwise, the test will
not cover the cg/pg/poweroff enable case.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. add amdgpu_device_ip_late_set_pg_state function for
set pg state.
2. delete duplicate pg state setting on gfx_v8_0's late_init.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
we can take gfx off feature as gfx power gate. gfx off feature is also
controled by smu. so add gfx_off support in pp_set_powergating_by_smu.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Two requests have come in for a backmerge,
and I've got some pull reqs on rc2, so this
just makes sense.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use enum amd_powergating_state instead of enum amd_clockgating_state.
The underlying value stays the same, so there is no functional change
in practise. This fixes a warning seen with clang:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1930:14: warning: implicit
conversion from enumeration type 'enum amd_clockgating_state' to
different enumeration type 'enum amd_powergating_state'
[-Wenum-conversion]
AMD_CG_STATE_UNGATE);
^~~~~~~~~~~~~~~~~~~
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Avoid confusing the GART with the GTT domain.
Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We've had a number of users report failures to detect and light up
display with DC with LVDS and VGA. These connector types are not
currently supported with DC. I'd like to add support but unfortunately
don't have a system with LVDS or VGA available.
In order not to cause regressions we should probably fallback to the
non-DC driver for ASICs that support VGA and LVDS.
These ASICs are:
* Bonaire
* Kabini
* Kaveri
* Mullins
ASIC support can always be force enabled with amdgpu.dc=1
v2: Keep Hawaii on DC
v3: Added Mullins to the list
Cc: stable@vger.kernel.org
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
driver need to know the real power source to do some power
related configuration when initialize.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After defer the execution of clockgating enabling, at that time, gfx already
enter into "off" state. Howerver, clockgating enabling will use MMIO to access
the gfx registers, then get the gfx hung.
So here we should move the gfx powergating and gfxoff enabling behavior at the
end of initialization behind clockgating.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
VCN clockgating is handled manually like VCE and UVD.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
First of all it's already being called from the display code from amd_ip_funcs.suspend/resume hooks.
Second of all, the place in amdgpu_device_gpu_recover it's being called is wrong for GPU stalls since
it is called BEFORE we cancel and force completion of all in flight jobs which were not yet processed.
So, as Bas pointed in the ticket we will try to wait for fence in amdgpu_pm_compute_clocks but the pipe
is hanged so we end up in deadlock.
v2: remove unused variable
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106500
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
vega20_gpu_info firmware stores gpu configuration for vega20.
v2: drop gpu info firmware for vega20
Squash of:
drm/amdgpu: Add gpu_info firmware for vega20.
drm/amdgpu: drop gpu_info firmware for vega20
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add vega20 to amd_asic_type enum and amdgpu_asic_name[].
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When put the IB test out of exclusive mode, and do sriov reset,
the IB test will randomly fail. As out of exclusive mode it uses
kiq to do read and write registers, but as it has world switch,
the kiq read and write time will be random, sometimes it will
beyond the MAX_KIQ_REG_WAIT and then the read or write register
will fail, which will result the IB test fail.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: check reserved vram size before allocate.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_ib_ring_tests() runs test IB's on rings at boot
contributes to ~500 ms of amdgpu driver's boot time.
This patch defers it and ensures that its executed
in amdgpu_info_ioctl() if it wasn't scheduled.
V2: Use queue_delayed_work() & flush_delayed_work().
V3: removed usage of separate wq, ensure ib tests is
run before enabling clockgating.
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We enabled this upstream by default now and no longer need the flag.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the new callback to determine whether to use full
asic reset or per IP soft reset. Enables reset to
actually proceed on asics which don't support soft
reset yet.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since drm_framebuffer can now store GEM objects directly, place them
there rather than in our own subclass. As this makes the framebuffer
create_handle and destroy functions the same as the GEM framebuffer
helper, we can reuse those.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Daniel Stone <daniels@collabora.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: David (ChunMing) Zhou <David1.Zhou@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König<christian.koenig@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Problem:
When unloading due to failure amdgpu_device_fini was called twice
which was leading to NULL ptr in amdgpu_irq_disable_all.
Fix:
Call amdgpu_device_fini only once from amdgpu_driver_unload_kms.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
DC is used for modesetting on vega12.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
soc15 just like vega10 and raven.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Add vega12 to amd_asic_type enum and amdgpu_asic_name[].
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
By moving amdgpu_irq_disable_all earlier in the sequence
fixes an issue with disabling pflip interrupts:
*ERROR* dal_irq_service_dummy_ack: called for non-implemented irq source
Earlier patch fixed a memory corruption and revealed irq
warnings.This way it seems to be there no obvious issues
with unloading the module.
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Disable irq on devices before destroying them. That prevents
use-after-free memory access when unloading the driver.
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This change fixes the deadlock when unloading the driver with displays
connected.
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
make it symmetric with amdgpu_ucode_init_bo in amd_powerplay.c
refine the "commit b22558bb4ff8fc9fe925222f90297d7a03a5fb20"
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to replicate it in several places.
Reviewed-by: Rex Zhu <rezhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to replicate it in several places.
Reviewed-by: Rex Zhu <rezhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
mailbox registers can be accessed with a byte boundry according
to BIF team, so this patch prepares register byte access
and will be used by following patches.
Actually, for mailbox registers once the byte field is touched even not changed,
the mailbox behaves, so we need the byte width accessing to those sort of regs.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Pixel Ding <Pixel.Ding@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The amdgpu_ucode_fini_bo should be called after gfx_v8_0_hw_fini,
or it will have KCQ disable failed issue.
For Tonga, as it firstly finishes SMC block, and the SMC hw fini
will call amdgpu_ucode_fini, which will lead the amdgpu_ucode_fini_bo
called before gfx_v8_0_hw_fini, this is incorrect.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The amdgpu_pm_sysfs_fini should call before amdgpu_device_ip_fini,
or the adev->pm.dpm_enabled would be set to 0, then the device files
related to pp won't be removed by amdgpu_pm_sysfs_fini when unload
driver.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1)create a routine "handle_vram_lost" to do the vram
recovery, and put it into amdgpu_device_reset/reset_sriov,
this way no need of the extra paramter to hold the
VRAM LOST information and the related macros can be removed.
3)show vram_recover failure if time out, and set TMO equal to
lockup_timeout if vram_recover is under SRIOV runtime mode.
4)report error if any ip reset failed for SR-IOV
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
found recover_vram_from_shadow sometimes get executed
in paralle with SDMA scheduler, should stop all
schedulers before doing gpu reset/recover
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
WB_FREE should be put after all engines's hw_fini
done, otherwise the invalid wptr/rptr_addr would still
be used by engines which trigger abnormal bugs.
This fixes couple DMAR reading error in host side for SRIOV
after guest kmd is unloaded.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
fix:
should do right shift on wb before clearing
cleanups:
1,should memset all wb buffer
2,set max wb number to 128 (total 4KB) is big enough
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_dm_display_resume is now called from dm_resume to
unify DAL resume call into a single function call
There is no more need to separately call 2 resume functions
for DM.
Initially they were separated to resume display state after
cursor is pinned. But because there is no longer any corruption
with the cursor - the calls can be merged into one function hook.
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add common smu_soc_asic_init function to emulate the sillicon post sequence
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On emulation mode , driver will be loaded with powerplay disabled
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add amdgpu_emu_mode module parameter to control the emulation mode
Avoid vbios operation on emulation since there is no vbios post duirng emulation,
use the common hw_init to simulate the post
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Acked-By: Alex Deucher <alexander.deucher@amd.com>
Acked-By: Christian Konig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It seems to be working now.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=102372
Reviewed-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The preinstall callback didn't do anything because not all
of the IPs were initialized when it was called.
Move the postinstall setup into sequence in the driver.
The uninstall callback disabled all interrupt source, but
it got called too late in the driver sequence and caused problems
with IPs who already freed the relevant data structures. Move
the call into the right place in the driver sequence.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Tested-By: Mikita Lipski <mikita.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
And rename it to struct gmc_funcs.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Samuel Li <Samuel.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
And rename it to amdgpu_gmc as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Samuel Li <Samuel.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We only support vga_switcheroo and runtime pm on PX/HG systems
so forcing runpm to 1 doesn't do anything useful anyway.
Only call vga_switcheroo_init_domain_pm_ops() for PX/HG so
that the cleanup path is correct as well. This mirrors what
radeon does as well.
v2: rework the patch originally sent by Lukas (Alex)
Acked-by: Lukas Wunner <lukas@wunner.de>
Reported-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de> (v1)
Cc: stable@vger.kernel.org
Otherwise it keeps rejecting the reset.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Windows added by the BIOS are not marked as 64bit because they are
usually not changeable anyway.
This fixes large BAR support on my new Ryzen build system.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add device for consistency with other functions in this file.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add device for consistency with other functions in this file.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_device.c was getting pretty cluttered.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
for consistency with the other functions in that file.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prefix the functions with device or device_ip for functions which
deal with ip blocks for consistency.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
and move them to amdgpu_atombios.c for consistency.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With introduction of amdgpu_gpu_recovery we don't need any more
to rely on amdgpu_lockup_timeout == 0 for disabling GPU reset.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add new parameter to control GPU recovery procedure.
v2:
Add auto logic where reset is disabled for bare metal and enabled
for SR-IOV.
Allow forced reset from debugfs.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The expectation is that the base driver doesn't mess with these.
Some components interact with these directly so let the components
handle these directly.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The expectation is that the base driver doesn't mess with these.
Some components interact with these directly so let the components
handle these directly.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
this VRAM evict is not needed and also cost 2seconds
to finish because the IRQ is software side disabled
before it.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Torture test for MM and VM support, can be used to evict all VRAM while
the system is under load.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove the superflous .debugfs_init callback and register all files in
amdgpu_device.c in just one function.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This moves and renames the AMDGPU scheduler to a common location in DRM
in order to facilitate re-use by other drivers. This is mostly a straight
forward rename with no code changes.
One notable exception is the function to_drm_sched_fence(), which is no
longer a inline header function to avoid the need to export the
drm_sched_fence_ops_scheduled and drm_sched_fence_ops_finished structures.
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch allows specifying the vm_block_size even when multi level
page directories are active.
v2: fix signed/unsigned compare warning
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This moves validation of the VM size parameter into amdgpu_vm_adjust_size().
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The VM size actually doesn't need to be a power of two.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of specifying interruptible and no_wait_gpu manually.
v2: rebase
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For VCE to work properly the start of the GTT space must be aligned to a
4GB boundary.
v2: add comment why we do this
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove some outdated comments and all code which tries to reduce the VRAM size
mapped into the MC.
This is superfluous and misleading since we never actually program the size.
v2: handle gmc_v6_0.c as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Don't even try to resize the BAR when there is no window above 4GB.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
instead of doing it in each GFX ip's sw_fini
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
NULL pointer is because original logic will step into
set_pde_pte() even after the gart.ptr is freed due to
there are twice gart_unbind() on all gart area.
also, there are other minor fixes:
1,since gart_init only create dummy page, the corresponding
gart_fini shouldn't do more like unbinding all GART, this is
unnecessary because in driver fini stage all GART unbinding
had already been done during each IP's SW_FINI (GMC's
SW_FINI is the last one called), so remove the step
for the GART unbinding in gart_fini().
2,gart_fini() is already invoked during each GMC IP's gart_fini
routine,e.g. gmc_vx_0_gart_fini(), so no need to manually
call it during ttm_fini().
3,amdgpu_gem_force_release() should be put ahead of
amdgpu_vm_manager_fini()
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Retry at drm_dev_register instead of amdgpu_device_init.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It introduces 900ms latency in exclusive mode which causes failure
of driver loading. Host can resize the BAR before guest staring,
so the resizing is not necessary here.
Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Try to resize BAR0 to let CPU access all of VRAM.
v2: rebased, style cleanups, disable mem decode before resize,
handle gmc_v9 as well, round size up to power of two.
v3: handle gmc_v6 as well, release and reassign all BARs in the driver.
v4: rename new function to amdgpu_device_resize_fb_bar,
reenable mem decoding only if all resources are assigned.
v5: reorder resource release, return -ENODEV instead of BUG_ON().
v6: squash in rebase fix
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The previous solution will create a zero buffer on the system
domain and then move the zeroes to the VRAM. This will break the
original data on the VRAM.
Refine the code to create bo on VRAM domain directly and then remove
and re-create mem node to the exact position before bo_pin. This can
avoid breaking the data and will not cause eviction.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: monk liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is caused of that hypervisor fails to handle request, one known
issue is MMIO unblocking timeout. In theory we can retry init here.
Signed-off-by: pding <Pixel.Ding@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The GTT manager handles the GART address space anyway, so it is
completely pointless to keep the same information around twice.
v2: rebased
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
since now gpu reset is unified with gpu_recover
for both bare-metal and SR-IOV:
1)rename in_sriov_reset to in_gpu_reset
2)move lock_reset from adev->virt to adev
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1,new imple names amdgpu_gpu_recover which gives more hint
on what it does compared with gpu_reset
2,gpu_recover unify bare-metal and SR-IOV, only the asic reset
part is implemented differently
3,gpu_recover will increase hang job karma and mark its entity/context
as guilty if exceeds limit
V2:
4,in scheduler main routine the job from guilty context will be immedialy
fake signaled after it poped from queue and its fence be set with
"-ECANCELED" error
5,in scheduler recovery routine all jobs from the guilty entity would be
dropped
6,in run_job() routine the real IB submission would be skipped if @skip parameter
equales true or there was VRAM lost occured.
V3:
7,replace deprecated gpu reset, use new gpu recover
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The exclusive mode has real-time limitation in reality, such like being
done in 300ms. It's easy observed if running many VF/VMs in single host
with heavy CPU workload.
If we find the init fails due to exclusive mode timeout, try it again.
v2:
- rewrite the condition for readable value.
v3:
- fix typo, add comments for sleep
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When this VF stays in exclusive mode for long, other VFs will be
impacted.
The redundant messages causes exclusive mode timeout when they're
redirected. That is a normal use case for cloud service to redirect
guest log to virtual serial port.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
merge the setting guilty on context into this function
to avoid implement extra routine.
v2:
go through entity list and compare the fence_ctx
before operate on the entity, otherwise the entity
may be just a wild pointer
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Chunming Zhou <David1.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
cleanups, now only operate on the given ring
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 446947b44f.
this patch is incorrrect, amdgpu_ucode_bo_fini always
called after gfx_hw_fini.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaFkpiAAoJEAx081l5xIa+LOcQAJqXyh7vx++oPe5kJFC2rCoX
MqX1aJ4nH8y04QJqLmKx1SC6eyYsTM92rcg3RfHOThktzonD5l2wSO9TvCkmLtr9
2n9P/aYMcbPTZntrbJc4mQyzd82U0D4h40i5Cmhr9n4gcLPOsOpau/7eclyuEUds
PHZSTCRq0Ygk1K5VWQPyKsY1k1TqFes2YE46FJzkD8SQwKDfbWxVZG0BPnvqb5Om
PMVobnEukruzpsSqnetaEYsW89e0TJ2TW9MSCfVohzWvyCVGzmwSzqaooqOkgFe2
5ZrzA4aW6qRez4nXN2Zw+p9qhS4DZ8MVEJO8qczrR6BGx5yRlHriGhs+5FQskGBT
Idqj6YZX3x/qab/AXQy0fzn2lrZdwxTolG6BgnNOwdGhyFEfz7P7p9kcv4QLbyn5
8MynMUcLmOkpouHD0mpIwn5kS7EU4hbEPGOeBwxy54FbiLFWb81FjlGts2N+/ckI
69UlmyyFZrpxvTmL9vRzvGCeO0zdfvKtBa1GoYWbzNTs8r50F2EtdJkS64SYOVOf
o4ApcG5bznx42NfBwa3TBc+NETTYJPS0blFImPVu1qvdQn5AciX137vYbqzwuqac
2gM2m6Rdfpncw/3VRIePwXYwpNS/3fsa3V6UgzTFlDhrQCtP2XxKPhfru7pFN+te
Vav1I46Q8pa7ko8dS3A3
=P4O6
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux
Pull more drm updates from Dave Airlie:
"Fixes/cleanups for rc1, non-desktop flags for VR
- remove the MSM dt-bindings file Rob managed to push in the previous
pull.
- add a property/edid quirk to denote HMD devices, I had these
hanging around for a few weeks and Keith had done some work on
them, they are fairly self contained and small, and only affect
people using HTC Vive VR headsets so far.
- amdgpu, tegra, tilcdc, fsl fixes
- some imx-drm cleanups I missed, these seemed pretty small, and no
reason to hold off.
I have one TTM regression fix (fixes bochs-vga in qemu) sitting
locally awaiting review I'll probably send that in a separate pull
request tomorrow"
* tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux: (33 commits)
dt-bindings: remove file that was added accidentally
drm/edid: quirk HTC vive headset as non-desktop. [v2]
drm/fb: add support for not enabling fbcon on non-desktop displays [v2]
drm: add connector info/property for non-desktop displays [v2]
drm/amdgpu: fix rmmod KCQ disable failed error
drm/amdgpu: fix kernel hang when starting VNC server
drm/amdgpu: don't skip attributes when powerplay is enabled
drm/amd/pp: fix typecast error in powerplay.
drm/tilcdc: Remove obsolete "ti,tilcdc,slave" dts binding support
drm/tegra: sor: Reimplement pad clock
Revert "drm/radeon: dont switch vt on suspend"
drm/amd/amdgpu: fix over-bound accessing in amdgpu_cs_wait_any_fence
drm/amd/powerplay: fix unfreeze level smc message for smu7
drm/amdgpu:fix memleak
drm/amdgpu:fix memleak in takedown
drm/amd/pp: fix dpm randomly failed on Vega10
drm/amdgpu: set f_mapping on exported DMA-bufs
drm/amdgpu: Properly allocate VM invalidate eng v2
drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume()
drm/fsl-dcu: avoid disabling pixel clock twice on suspend
...
If gfx_v8_0_hw_fini is called after amdgpu_ucode_fini_bo, we will
hit KCQ disabled failed. Let amdgpu_ucode_fini_bo run after
gfx_v8_0_hw_fini.
BUG: SWDEV-135547
Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Wang Hongcheng <Annie.Wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaDlaqAAoJEAx081l5xIa+VB8P/3tl1kg6gONXBHA89t4aoyaM
uKyLy2D8//9RCPupnI2nOablbcdXzmZYE5gsLGHcN5G/cf9qHksslqo6P/8cjfIC
lOz+2AxzFGTP9s6M0jyE7l4Dlk53Chd+7yOTJfm322BUuAZW7nSjWGglkO6rW6RR
JRyNwIoRLX62nAkD769R9QTh8sh2P7pWvXKUSRtMQVWRRI0fICvUFuqyBbEFjJZN
4GGkqM5bA6GU+z1W91iqkXoPWz34Zejch7cLBM5pXiZsgXOuzl4V/RwxdKZlWVrf
9oA9357yKvvvb1bkNRgjNqLLHdOxQUomv1k2RxCbvX2xUecOCTKXKb4/X+AurZEI
ENfSejTbzj+mP18CI1IsvsQolkighP1xxqjH3zmSu+bS0ivWBywbpDUVN969qKrV
9kHigMwxxX5YCWGoLswhZ+6OsPm5R2FRKg10QVQAlARjye4Q7ssP+l+KRRP8rvkc
D4rZiLBMuIDersRhW3ylEym8gXqSO2BoBJZS3+ECSzweIhvwziNgY0q6lpFxfzJa
fzjW/mfK/uucEshoZrxJVRAEiWwtULvi1KVnTpQ/lm254maj4mOy6atqs7rmdAKK
Jetfg+Z0Fb+805fHeS2dk/E855qwmTCsBf+TA4hGrxoW3EHB3yNLH1j4MSUxK8es
6SpuEv7hzeyCiK0QJcSH
=0JS4
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.15-amd-dc' of git://people.freedesktop.org/~airlied/linux
Pull amdgpu DC display code for Vega from Dave Airlie:
"This is the pull request for the AMD DC (display code) layer which is
a requirement to program the display engines on the new Vega and Raven
based GPUs. It also contains support for all amdgpu supported GPUs
(CIK, VI, Polaris), which has to be enabled. It is also a kms atomic
modesetting compatible driver (unlike the current in-tree display
code).
I've kept it separate from drm-next because it may have some things
that cause you to reject it.
Background story:
AMD have an internal team creating a shared OS codebase for display at
hw bring up time using information from their hardware teams. This
process doesn't lead to the most Linux friendly/looking code but we
have worked together on cleaning a lot of it up and dealing with
sparse/smatch/checkpatch, and having their team internally adhere to
Linux coding standards.
This tree is a complete history rebased since they started opening it,
we decided not to squash it down as the history may have some value.
Some of the commits therefore might not reach kernel standards, and we
are steadily training people in AMD to better write commit msgs.
There is a major bunch of generated bandwidth calculation and
verification code that comes from their hardware team. On Vega and
before this is float calculations, on Raven (DCN10) this is double
based. They do the required things to do FP in the kernel, and I could
understand this might raise some issues. Rewriting the bandwidth would
be a major undertaken in reverification, it's non-trivial to work out
if a display can handle the complete set of mode information thrown at
it.
Future story:
There is a TODO list with this, and it address most of the remaining
things that would be nice to refine/remove. The DCN10 code is still
under development internally and they push out a lot of patches quite
regularly and are supporting this code base with their display team. I
think we've reached the point where keeping it out of tree is going to
motivate distributions to start carrying the code, so I'd prefer we
get it in tree. I think this code is slightly better than STAGING
quality but not massively so, I'd really like to see that float/double
magic gone and fixed point used, but AMD don't seem to think the
accuracy and revalidation of the code is worth the effort"
* tag 'drm-for-v4.15-amd-dc' of git://people.freedesktop.org/~airlied/linux: (1110 commits)
drm/amd/display: fix MST link training fail division by 0
drm/amd/display: Fix formatting for null pointer dereference fix
drm/amd/display: Remove dangling planes on dc commit state
drm/amd/display: add flip_immediate to commit update for stream
drm/amd/display: Miss register MST encoder cbs
drm/amd/display: Fix warnings on S3 resume
drm/amd/display: use num_timing_generator instead of pipe_count
drm/amd/display: use configurable FBC option in dm
drm/amd/display: fix AZ clock not enabled before program AZ endpoint
amdgpu/dm: Don't use DRM_ERROR in amdgpu_dm_atomic_check
amd/display: Fix potential null dereference in dce_calcs.c
amdgpu/dm: Remove unused forward declaration
drm/amdgpu: Remove unused dc_stream from amdgpu_crtc
amdgpu/dc: Fix double unlock in amdgpu_dm_commit_planes
amdgpu/dc: Fix missing null checks in amdgpu_dm.c
amdgpu/dc: Fix potential null dereferences in amdgpu_dm.c
amdgpu/dc: fix more indentation warnings
amdgpu/dc: handle allocation failures in dc_commit_planes_to_stream.
amdgpu/dc: fix indentation warning from smatch.
amdgpu/dc: fix non-ansi function decls.
...
The bottom two bits of the simd value were being put into
the upper bits of the wave value which was likely working due
to the bits being ignored (or aliased).
Eitherway, now we mask it correctly.
(v2) Touch up using GENMASK_ULL to a couple of other functions too
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Properly shift the index when clearing so we clear
the right bit
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1,it should not work on non-SR-IOV case
2,the NO_VBIOS error is incorrect, should
handle it under detect_sriov_bios.
3,wrap the whole detect_sriov_bios with sriov check
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
otherwise after VF FLR the KIQ cannot work
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Merge the post checking functions to avoid confusion and take
virtualization into account in all cases.
Signed-off-by: pding <Pixel.Ding@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Register accessing is performed when IRQ is disabled. Never sleep in
this function.
Known issue: dead sleep in many use cases of index/data registers.
v2:
- wrap polling fence functions.
- don't trigger IRQ for polling in case of wrongly fence signal.
v3:
- handle wrap round gracefully.
- add comments for polling function
v4:
- don't return negative timeout confused with error code
Signed-off-by: pding <Pixel.Ding@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
fw ucode is corrupted after vf flr by PSP so ucode_init() is
a must in psp_hw_init othewise KIQ/KCQ enabling will fail
Revert "drm/amdgpu: refine code delete duplicated error handling"
This reverts commit e57b87ff828f95efe992468e6d18c2c059b27aa9.
Revert "drm/amdgpu: move amdgpu_ucode_init_bo to amdgpu_device.c"
This reverts commit 815b8f8595148d06a64d2ce4282e8e80dfcb02f1.
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SR-IOV need to exchange some data between PF&VF through shared VRAM
PF will copy some necessary firmware and information to the shared
VRAM. It also requires some information from VF. PF will send a
key through mailbox2 to help guest calculate checksum so that it can
verify whether the data is correct.
So check the data on the specified offset of the shared VRAM, if the
checksum is right, read values from it and write some VF information
next to the data from PF.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
in function amdgpu_ucode_init_bo, when failed, it will
set load_type to AMDGPU_FW_LOAD_DIRECT.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Programming CP_HQD_QUEUE_PRIORITY enables a queue to take priority over
other queues on the same pipe. Multiple queues on a pipe are timesliced
so this gives us full precedence over other queues.
Programming CP_HQD_PIPE_PRIORITY changes the SPI_ARB_PRIORITY of the
wave as follows:
0x2: CS_H
0x1: CS_M
0x0: CS_L
The SPI block will then dispatch work according to the policy set by
SPI_ARB_PRIORITY. In the current policy CS_H is higher priority than
gfx.
In order to prevent getting stuck in loops of resources bouncing between
GFX and high priority compute and introducing further latency, we
statically reserve a portion of the pipe.
v2: fix srbm_select to ring->queue and use ring->funcs->type
v3: use AMD_SCHED_PRIORITY_* instead of AMDGPU_CTX_PRIORITY_*
v4: switch int to enum amd_sched_priority
v5: corresponding changes for srbm_lock
v6: change CU reservation to PIPE_PERCENT allocation
v7: use kiq instead of MMIO
v8: back to MMIO, and make the implementation sleep safe.
v9: corresponding changes for splitting HIGH into _HW/_SW
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SR-IOV need to reserve a piece of shared VRAM at the exact place
to exchange data betweem PF and VF. The start address and size of
the shared mem are passed to guest through VBIOS structure
VRAM_UsageByFirmware.
VRAM_UsageByFirmware is a general feature in VBIOS, it indicates
that VBIOS need to reserve a piece of memory on the VRAM.
Because the mem address is specified. Reserve it early in
amdgpu_ttm_init to make sure that it can monoplize the space.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initial pull request for DC support. We've completed a substantial amount of
the cleanup and restructuring in our TODO. There are a few additional
cleanups that we are continuing to work on, but I don't think there are any
showstoppers remaining. We've tried to maintain most of the history for bisect
purposes. Harry made sure all the commits build. We've enabled DC for vega10
and Raven. Pre-vega10 parts can be enabled via module parameter (amdgpu.dc=1),
but are not enabled by default at this point until we get further testing
upstream.
This code provides atomic modesetting support for DCE8 (CIK), DCE10 (Tonga,
Fiji), DCE11 (CZ, ST, Polaris), DCE12 (vega10), and DCN1 (RV) including
HDMI and DP audio, DP MST, and many other advanced display features.
+
Latest cleanups for DC from you and Harry. Note that there is some
flickering on some older asics with this branch due to a regression in powerplay
that has already been fixed and will be included in my next non-DC pull request
next week.
* 'drm-next-4.15-dc' of git://people.freedesktop.org/~agd5f/linux: (897 commits)
amdgpu/dc: use kref for dc_state.
amdgpu/dc: convert dc_sink to kref.
amdgpu/dc: convert dc_stream_state to kref.
amdgpu/dc: use kref for dc_plane_state.
amdgpu/dc: convert dc_gamma to kref reference counting.
amdgpu/dc: convert dc_transfer to use a kref.
amdgpu/dc: kill a bunch of dead code.
amdgpu/dc: set a bunch of functions to static.
amdgpu/dc: kill some deadcode in dc core.
amdgpu/dc: fix indentation on a couple of returns.
amdgpu/dm: don't use after free.
amdgpu/dc: kfree already checks for NULL.
amdgpu/dc: fix a bunch of misc whitespace.
amdgpu/dc: drop hw_sequencer_types.h
amdgpu/dc: drop dce110_types.h
amdgpu/dc: use kernel ilog2 for log_2.
amdgpu/dc: don't memset after kzalloc.
amdgpu/dc: inline dal grph object id functions.
amdgpu/dc: inline dml_round_to_multiple
amdgpu/dc: rename bios get_image symbol to something more searchable.
...