SR-IOV need to exchange some data between PF&VF through shared VRAM
PF will copy some necessary firmware and information to the shared
VRAM. It also requires some information from VF. PF will send a
key through mailbox2 to help guest calculate checksum so that it can
verify whether the data is correct.
So check the data on the specified offset of the shared VRAM, if the
checksum is right, read values from it and write some VF information
next to the data from PF.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This feature works for SRIOV enviroment. For non-SRIOV enviroment, the
trans_error function does nothing.
The error information includes error_code (16bit), error_flags(16bit)
and error_data(64bit). Since there are not many errors, we keep the
errors in an array and transfer all errors to Host before amdgpu
initialization function (amdgpu_device_init) exit.
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
that way we can know which job cause hang and
can do per sched reset/recovery instead of all
sched.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Change message to debug level as VI does.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
if sriov gpu reset is invoked by job timeout, it is run
in a global work-queue which is very slow and better not call
msleep ortherwise it takes long time to get back CPU.
so make below changes:
1: Change msleep 1 to mdelay 5
2: Ignore the ack fail from pf after time out,
because VF FLR will clear ack, sometime VF FLR is done
prior to the beginning of poll_ack so we can ignore this ack
TODO:
Put job_timedout (and the following gpu reset) in a driver thread,
instead of the global work_struct.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
they are lack in the bringup stage, we need them for GPU reset
feature.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Implement mailbox protocol for AI so that guest vf can communicate
with GPU hypervisor.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>