amdgpu_nbio_ras_late_init is used to init nbio specfic
ras debugfs/sysfs node and nbio specific interrupt handler.
It can be shared among nbio generations
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
to prevent access to dangling pointers
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Problem:
Under certain conditions, when some IP bocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in SMU/PSP.
Temporary fix until proper solution in PSP/SMU is ready:
When uncorrectable error happens the DF will unconditionally
broadcast error event packets to all its clients/slave upon
receiving fatal error event and freeze all its outbound queues,
err_event_athub interrupt will be triggered.
In such case and we use this interrupt
to issue GPU reset. THe GPU reset code is modified for such case to avoid HW
reset, only stops schedulers, deatches all in progress and not yet scheduled
job's fences, set error code on them and signals.
Also reject any new incoming job submissions from user space.
All this is done to notify the applications of the problem.
v2:
Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
Remove print param from amdgpu_ras_query_error_count
v3:
Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset
for other XGMI hive memebers.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ras_late_init callback function will be used to do common ras
init in late init phase.
v2: call ras_late_fini to do cleanup when fails to enable interrupt
v3: rename sysfs/debugfs node name to pcie_bif_xxx
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ras controller interrupt and Ras err event athub interrupt are two dedicated
interrupts for RAS support.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ras_controller_interrupt and err_event_interrupt are ras specific interrupts.
add functions to check their status and ack them if they are generated. both
funcitons should only be invoked in ISR when BIF ring is disabled or even not
initialized.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
no functional change, just switch to new structures
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add vcn nbio doorbell range setting for 2nd vcn instance
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To setup the aperture for VCN2.5
v2: setup vcn doorbells in vcn2.5 hw_init (Alex)
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the doorbell range registers to support additional
SDMA rings.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The bit RSVD_ENG0 to RSVD_ENG5 in GPU_HDP_FLUSH_REQ/GPU_HDP_FLUSH_DONE
can be leveraged for sdma instance 2~7 to poll register/memory.
Signed-off-by: Le Ma <le.ma@amd.com>
Acked-by: Snow Zhang < Snow.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remap HDP_MEM_COHERENCY_FLUSH_CNTL and HDP_REG_COHERENCY_FLUSH_CNTL
to an empty page in mmio space. We will later map this page to process
space so application can flush hdp. This can't be done properly at
those registers' original location because it will expose more than
desired registers to process space.
v2: Use explicit register hole location
v3: Moved remapped hdp registers into adev struct
v4: Use more generic name for remapped page
Expose register offset in kfd_ioctl.h
v5: Move hdp register remap function to nbio ip function
v6: Fixed operator precedence issue and other bugs
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes for 5.1:
amdgpu:
- Fix missing fw declaration after dropping old CI DPM code
- Fix debugfs access to registers beyond the MMIO bar size
- Fix context priority handling
- Add missing license on some new files
- Various cleanups and bug fixes
radeon:
- Fix missing break in CS parser for evergreen
- Various cleanups and bug fixes
sched:
- Fix entities with 0 run queues
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190221214134.3308-1-alexander.deucher@amd.com
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlxqHJYeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGWl8H/jPI4EipzD2GbnjZ
GaFpMBBjcXBaVmoA+Y69so+7BHx1Ql+5GQtqbK0RHJRb9qEPLw3FBhHNjM/N8Sgf
nSrK+GnBZp9s+k/NR/Yf2RacUR3jhz+Q9JEoQd3u9bFUeQyvE8Rf3vgtoBBwFOfz
+t7N1memYVF3asLGWB4e4sP1YVMGfseTQpSPojvM30YWM86Bv+QtSx1AGgHczQIM
kMKealR8ZPelN6JAXgLhQ5opDojBrE4YKB98pwsMDI6abz0Tz2JLFEUTTxsv5XNN
o/Iz+XDoylskEyxN2unNWfHx7Swkvoklog8J/hDg5XlTvipL/WkT66PHBgcGMNvj
BW9GgU8=
=ZizU
-----END PGP SIGNATURE-----
Merge v5.0-rc7 into drm-next
Backmerging for nouveau and imx that needed some fixes for next pulls.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There are several statements that are incorrectly indented. Fix these.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes doorbell reflection on Vega20.
Change-Id: I0495139d160a9032dff5977289b1eec11c16f781
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Different ASIC has different SDMA queue number so
different SDMA doorbell range. Introduce an extra
parameter to sdma_doorbell_range function and set
sdma doorbell range correctly.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need these offsets for PCIE perf counters, so include them as well as
the the previously-used defines from the nbio_*.c files
v2: Return NBIF definitions back to previous files
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Configure PCIE_CI_CNTL to work around a hw bug that affects
some multi-GPU compute workloads.
Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add sriov capability detection for vega20, then can check if device is
virtual device.
Signed-off-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the old doorbell range setting until the driver is
able to support more sdma queues.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some register offset in nbio v7.4 are different with v7.0.
We need a seperate nbio_v7_4.c for vega20.
v2: fix doorbell range for sdma (Alex)
v3: squash in static fix (kbuild test robot)
Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>