AI affected:
CP/HW team requires KMD insert FRAME_CONTROL(end) after
the last IB and before the fence of this DMAframe.
this is to make sure the cache are flushed, and it's a must
change no matter MCBP/SR-IOV or bare-metal case because new
CP hw won't do the cache flush for each IB anymore, it just
leaves it to KMD now.
with this patch, certain MCBP hang issue when rendering
vulkan/chained-ib are resolved.
v2: drop gfx8 changes. gfx8 is not affected (Alex)
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
According to CP/hw team requirment, to support PAL/CHAINED-IB
MCBP, kernel driver must guarantee DE_META must be inserted
right prior to the work_load DE IB (with PREEMPT flag), there
cannot be any non-work_load DE IB between-in DE_META and
work_load DE IB.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The usage of kiq should not depend on the virtualization.
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by:Andres Rodriquez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Support for MCBP/Virtualization in combination with chained IBs is
formal released on firmware feature version #46. So enable it
according to firmware feature version, otherwise, world switch will
hang.
Signed-off-by: Trigger Huang <trigger.huang@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Bug: SWDEV-117987: Always on CU mask broken for gfx7+
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
I'm not sure if the order matters, but it seems like it makes
more sense to set this after the range is programmed.
v2: rebase (Alex)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add set_doorbell functions for mec and cpg.
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This could be used in Andres' priority scheduling patch
as well.
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Even if we disable clockgating, we still need to make sure the
cp/rlc interrupts are enabled for powergating which might still
be enabled.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's global, not queue specific, so move it out of the
kiq register init function.
Tested-and-Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to reset the wptr and clear the rings. The UNMAP_QUEUES
packet writes the current MQD state back the MQD on suspend,
so there is no need to reset it as well.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the UNMAP_QUEUES packet to have the KIQ properly
disable them.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As the KCQ setup. This way we only have to wait once for the
entire MEC.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
One for KIQ and one for the KCQ. This simplifies the logic and
allows for future optimizations.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to make sure the various init sequences submitted
to KIQ complete before testing the rings.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
KIQ is the Kernel Interface Queue for managing the MEC. Rather than setting
up rings via direct MMIO of ring registers, the rings are configured via
special packets sent to the KIQ. The allows the MEC to better manage shared
resources and certain power events.
v2: squash in s3/s4 fix from Rex
v3: further fixes from Rex
Signed-off-by: David Panariti <David.Panariti@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Tom St Denis <tom.stdenis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add missing chips to the doorbell range setup. These
were missed in the KIQ code. Fixes power and performance
regressions with KIQ. Spotted by Rex.
Tested-and-Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Need to properly set the MTYPE and ROQ space setting.
This should fix performance regressions with KIQ enabled.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add new RIDs.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Xie <AlexBin.Xie@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some of these paths probably cannot be interrupted by a signal anyway.
Those that can would fail to clean up things if they actually got
interrupted.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This way GFX and MM won't fight for VMIDs any more.
Initially disabled since we need to stop flushing all HUBS
at the same time as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1,KIQ won't touch VRAM so no need to involv HDP flush/invalidate at all.
2,According to CP hw designer KIQ better not use any PM4 package lead to wait behave.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clean up a toggle with ?:.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Swap read/write pattern for WREG32_FIELD()
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Swap read/write pattern for WREG32_FIELD()
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use new WREG32_FIELD_OFFSET() to clean up code.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1) Adapt to vulkan:
Now use double SWITCH BUFFER to replace the 128 nops w/a,
because when vulkan introduced, umd can insert 7 ~ 16 IBs
per submit which makes 256 DW size cannot hold the whole
DMAframe (if we still insert those 128 nops), CP team suggests
use double SWITCH_BUFFERs, instead of tricky 128 NOPs w/a.
2) To fix the CE VM fault issue when MCBP introduced:
Need one more COND_EXEC wrapping IB part (original one us
for VM switch part).
this change can fix vm fault issue caused by below scenario
without this change:
>CE passed original COND_EXEC (no MCBP issued this moment),
proceed as normal.
>DE catch up to this COND_EXEC, but this time MCBP issued,
thus DE treats all following packages as NOP. The following
VM switch packages now looks just as NOP to DE, so DE
dosen't do VM flush at all.
>Now CE proceeds to the first IBc, and triggers VM fault,
because DE didn't do VM flush for this DMAframe.
3) change estimated alloc size for gfx9.
with new DMAframe scheme, we need modify emit_frame_size
for gfx9
4) No need to insert 128 nops after gfx8 vm flush anymore
because there was double SWITCH_BUFFER append to vm flush,
and for gfx7 we already use double SWITCH_BUFFER following
after vm_flush so no change needed for it.
5) Change emit_frame_size for gfx8
v2: squash in BUG removal from Monk
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Apply the new IB during IB emit for SRIOV with MCBP
v2: agd: use define instead of magic number
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
when MCBP enabled for gfx8, the cond_exec must also
be implemented, otherwise there will be odds to meet
cross engine (ce and me) deadlock when world switch
happens.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch introduces a new flag named "amdgpu_firmware_load_type" to
handle different firmware loading method. Since Vega10, there are
three ways to load firmware. It would be better to use a flag and a
fw_load_type kernel parameter to configure it.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The ring structure already has what we need.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Avoids passing around additional parameters during setup.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Everything we need is in the ring structure. No need to
pass all the bits explicitly.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to loop through the compute queues twice.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If KIQ isn't working, the compute rings won't work either.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To better match where they are used. Called from sw_init
and sw_fini.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Found with scripts/coccinelle/misc/boolconv.cocci.
Signed-off-by: Andrew F. Davis <afd@ti.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>