linux

Author	SHA1	Message	Date
Hawking Zhang	a30f128602	drm/amdgpu: provide socket/die id info in RAS message Add socket/die information in RAS messages for platforms that support query those information Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:35:50 -04:00
Hawking Zhang	dfdd4b8a95	drm/amdgpu: implement smuio callback to query socket id get_socket_id is used to query socket id Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:35:49 -04:00
Jinzhou Su	ec0f72cb95	drm/amdgpu: Enable SDMA LS for Vangogh Add flags AMD_CG_SUPPORT_SDMA_LS for Vangogh. Start to open sdma ls from firmware version 70. Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:35:49 -04:00
Souptick Joarder	5f5cb2afd6	drm/amdgpu: Added missing prototype Kernel test robot throws below warning -> >> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:125:5: warning: >> no previous prototype for 'kgd_arcturus_hqd_sdma_load' >> [-Wmissing-prototypes] 125 \| int kgd_arcturus_hqd_sdma_load(struct kgd_dev kgd, void mqd, >> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:195:5: warning: >> no previous prototype for 'kgd_arcturus_hqd_sdma_dump' >> [-Wmissing-prototypes] 195 \| int kgd_arcturus_hqd_sdma_dump(struct kgd_dev kgd, >> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:227:6: warning: >> no previous prototype for 'kgd_arcturus_hqd_sdma_is_occupied' >> [-Wmissing-prototypes] 227 \| bool kgd_arcturus_hqd_sdma_is_occupied(struct kgd_dev kgd, void mqd) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:246:5: warning: >> no previous prototype for 'kgd_arcturus_hqd_sdma_destroy' >> [-Wmissing-prototypes] 246 \| int kgd_arcturus_hqd_sdma_destroy(struct kgd_dev kgd, void *mqd, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Added prototype for these functions. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-28 23:35:49 -04:00
Christian König	3dc7216c1d	drm/amdgpu: fix concurrent VM flushes on Vega/Navi v2 Starting with Vega the hardware supports concurrent flushes of VMID which can be used to implement per process VMID allocation. But concurrent flushes are mutual exclusive with back to back VMID allocations, fix this to avoid a VMID used in two ways at the same time. v2: don't set ring to NULL Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:19:05 -04:00
Nirmoy Das	42daecfc20	drm/amdgpu: remove AMDGPU_GEM_CREATE_SHADOW flag Remove unused AMDGPU_GEM_CREATE_SHADOW flag. Userspace is never allowed to use this. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:18:56 -04:00
Nirmoy Das	cd2454d6cd	drm/amdgpu: cleanup amdgpu_bo_create() Remove shadow bo related code as vm code is creating shadow bo using proper API. Without shadow bo code, amdgpu_bo_create() is basically a wrapper around amdgpu_bo_do_create(). So rename amdgpu_bo_do_create() to amdgpu_bo_create(). Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:18:47 -04:00
Nirmoy Das	adf6f5c51e	drm/amdgpu: create shadow bo using amdgpu_bo_create_shadow() Shadow BOs are only needed for vm code so call amdgpu_bo_create_shadow() directly instead of depending on amdgpu_bo_create(). Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:18:32 -04:00
Nirmoy Das	77df5c131d	drm/amdgpu: remove unused vm context flags Remove unused AMDGPU_VM_CONTEXT_GFX and AMDGPU_VM_CONTEXT_COMPUTE flags. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:18:26 -04:00
Nirmoy Das	a35455d065	drm/amdgpu: cleanup amdgpu_vm_init() Currently only way to create compute vm is through amdgpu_vm_make_compute(). So vm_context isn't required anymore for amdgpu_vm_init(). Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:18:19 -04:00
Nirmoy Das	25e9146ae6	drm/amdgpu: expose amdgpu_bo_create_shadow() Exposed amdgpu_bo_create_shadow() will be needed for amdgpu_vm handling. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:18:10 -04:00
Christian König	589939d401	drm/amdgpu: fix coding style and documentation in amdgpu_vram_mgr.c No functional changes, just cleaning up some leftovers and improve documentation. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nirmoy Das <nirmoy.aiemd@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:18:02 -04:00
Christian König	a614b336f1	drm/amdgpu: fix coding style and documentation in amdgpu_gtt_mgr.c Avoid the forward define, fix coding style, add documentation. No functional change. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nirmoy Das <nirmoy.aiemd@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:17:56 -04:00
Alex Sierra	126bbd4ab5	drm/amdgpu: extend xnack limit page fault timeout Extending this timeout will prevent IH from storm interrupts coming from SDMA while a page fault is active. Currently, on Aldebaran, handling that many interrupts can take a lot of CPU time (up to 4 seconds). This eventually causes timeouts in other tasks. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:16:20 -04:00
Hawking Zhang	502f0e2804	drm/amdgpu: disable gfx ras by default in aldebaran aldebaran gfx ras is still under development Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:16:14 -04:00
Kenneth Feng	1d712be90a	drm/amd/amdgpu: add cgls enable cgls to improve the runtime power efficiency. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:15:59 -04:00
Stanley.Yang	19d0dfda4c	drm/amdgpu: optimize gfx ras features flag clean Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:15:52 -04:00
Jinzhou Su	ef9bcfde9e	drm/amdgpu: Enable SDMA MGCG for Vangogh Add flags AMD_CG_SUPPORT_SDMA_MGCG for Vangogh. Start to open sdma mgcg from firmware version 70. Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:15:39 -04:00
John Clements	7e882aee84	drm/amdgpu: add support for ras init flags conditionally configure ras for dgpu mode or poison propogation mode Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:15:33 -04:00
Dennis Li	6effe77972	drm/amdgpu: refine gprs init shaders to check coverage Add codes to check whether all SIMDs are covered, make sure that all GPRs are initialized. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-23 17:15:21 -04:00
Lee Jones	3bffd71deb	drm/amd/amdgpu/amdgpu_cs: Repair some function naming disparity Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:685: warning: expecting prototype for cs_parser_fini(). Prototype was for amdgpu_cs_parser_fini() instead drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1502: warning: expecting prototype for amdgpu_cs_wait_all_fence(). Prototype was for amdgpu_cs_wait_all_fences() instead drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1656: warning: expecting prototype for amdgpu_cs_find_bo_va(). Prototype was for amdgpu_cs_find_mapping() instead Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Jerome Glisse <glisse@freedesktop.org> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:51:13 -04:00
Lee Jones	03691f5502	drm/amd/amdgpu/amdgpu_ring: Provide description for 'sched_score' Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:169: warning: Function parameter or member 'sched_score' not described in 'amdgpu_ring_init' Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:51:09 -04:00
Lee Jones	27aa4a69b4	drm/amd/amdgpu/amdgpu_ttm: Fix incorrectly documented function 'amdgpu_ttm_copy_mem_to_mem()' Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:311: warning: expecting prototype for amdgpu_copy_ttm_mem_to_mem(). Prototype was for amdgpu_ttm_copy_mem_to_mem() instead Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Jerome Glisse <glisse@freedesktop.org> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:51:06 -04:00
Lee Jones	777d9000d9	drm/amd/amdgpu/amdgpu_gart: Correct a couple of function names in the docs Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:73: warning: expecting prototype for amdgpu_dummy_page_init(). Prototype was for amdgpu_gart_dummy_page_init() instead drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:96: warning: expecting prototype for amdgpu_dummy_page_fini(). Prototype was for amdgpu_gart_dummy_page_fini() instead Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Reviewed-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:51:02 -04:00
Lee Jones	b16cc4bb1a	drm/amd/amdgpu/amdgpu_fence: Provide description for 'sched_score' Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:444: warning: Function parameter or member 'sched_score' not described in 'amdgpu_fence_driver_init_ring' Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Jerome Glisse <glisse@freedesktop.org> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:50:59 -04:00
Lee Jones	2196927bcb	drm/amd/amdgpu/amdgpu_device: Remove unused variable 'r' Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_suspend’: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3733:6: warning: variable ‘r’ set but not used [-Wunused-but-set-variable] Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:50:48 -04:00
Alex Sierra	485bea1f90	drm/amdgpu: add svm_bo eviction to enable_signal cb Add to amdgpu_amdkfd_fence.enable_signal callback, support for svm_bo fence eviction. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:49:56 -04:00
Alex Sierra	5f319c5c21	drm/amdgpu: svm bo enable_signal call condition [why] To support svm bo eviction mechanism. [how] If the BO crated has AMDGPU_AMDKFD_CREATE_SVM_BO flag set, enable_signal callback will be called inside amdgpu_evict_flags. This also causes gutting of the BO by removing all placements, so that TTM won't actually do an eviction. Instead it will discard the memory held by the BO. This is needed for HMM migration to user mode system memory pages. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:49:49 -04:00
Alex Sierra	f04c79cfba	drm/amdgpu: add param bit flag to create SVM BOs Add CREATE_SVM_BO define bit for SVM BOs. Another define flag was moved to concentrate these KFD type flags in one include file. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:49:31 -04:00
Alex Sierra	eb2cec5537	drm/amdkfd: add svm_bo reference for eviction fence [why] As part of the SVM functionality, the eviction mechanism used for SVM_BOs is different. This mechanism uses one eviction fence per prange, instead of one fence per kfd_process. [how] A svm_bo reference to amdgpu_amdkfd_fence to allow differentiate between SVM_BO or regular BO evictions. This also include modifications to set the reference at the fence creation call. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:49:22 -04:00
Alex Sierra	ea53af8a59	drm/amdkfd: SVM API call to restore page tables Use SVM API to restore page tables when retry fault and compute context are enabled. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:49:15 -04:00
Alex Sierra	9dd9cc2f74	drm/amdgpu: enable 48-bit IH timestamp counter By default this timestamp is 32 bit counter. It gets overflowed in around 10 minutes. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:48:58 -04:00
Philip Yang	c46ebb6a6d	drm/amdkfd: set memory limit to avoid OOM with HMM enabled HMM migration alloc sizeof(struct page) on system memory for each VRAM page, it is 1GB system memory reserved for 64GB VRAM. To avoid application OOM, increase system memory used size based on VRAM size of all GPUs, then application alloc memory will fail if system memory usage reach the limit. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:48:00 -04:00
Felix Kuehling	9705c85ff2	drm/amdgpu: Enable retry faults unconditionally on Aldebaran This is needed to allow per-process XNACK mode selection in the SQ when booting with XNACK off by default. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:47:34 -04:00
Felix Kuehling	f80fe9d3c1	drm/amdkfd: map svm range to GPUs Use amdgpu_vm_bo_update_mapping to update GPU page table to map or unmap svm range system memory pages address to GPUs. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:47:21 -04:00
Philip Yang	d27afacfea	drm/amdgpu: export vm update mapping interface It will be used by kfd to map svm range to GPU, because svm range does not have amdgpu_bo and bo_va, cannot use amdgpu_bo_update interface, use amdgpu vm update interface directly. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:47:13 -04:00
Philip Yang	d8a3c1c80c	drm/amdkfd: support larger svm range allocation For larger range allocation, if hmm_range_fault return -EBUSY, set retry timeout based on 1 second for every 512MB, this is safe timeout value. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:46:50 -04:00
Philip Yang	04d8d73dbc	drm/amdgpu: add common HMM get pages function Move the HMM get pages function from amdgpu_ttm and to amdgpu_mn. This common function will be used by new svm APIs. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:46:43 -04:00
Felix Kuehling	cccbeb6209	drm/amdgpu: Remove verify_access shortcut for KFD BOs This shortcut is no longer needed with access managed properly by KFD. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:46:01 -04:00
Felix Kuehling	d4ec4bdc0b	drm/amdkfd: Allow access for mmapping KFD BOs DRM render node file handles are used for CPU mapping of BOs using mmap by the Thunk. It uses the DRM render node of the GPU where the BO was allocated. DRM allows mmap access automatically when it creates a GEM handle for a BO. KFD BOs don't have GEM handles, so KFD needs to manage access manually. Use drm_vma_node_allow to allow user mode to mmap BOs allocated with kfd_ioctl_alloc_memory_of_gpu through the DRM render node that was used in the kfd_ioctl_acquire_vm call for the same GPU. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:45:54 -04:00
Felix Kuehling	b40a6ab2cf	drm/amdkfd: Use drm_priv to pass VM from KFD to amdgpu amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap to access the BO through the corresponding file descriptor. The VM can also be extracted from drm_priv, so drm_priv can replace the vm parameter in the kfd2kgd interface. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:45:45 -04:00
Alex Deucher	7845d80dda	drm/amdgpu/gmc9: remove dummy read workaround for newer chips Aldebaran has a hw fix so no longer requires the workaround. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:45:36 -04:00
Jinzhou Su	5c88e3b86a	drm/amdgpu: Add mem sync flag for IB allocated by SA The buffer of SA bo will be used by many cases. So it's better to invalidate the cache of indirect buffer allocated by SA before commit the IB. Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:45:24 -04:00
Mukul Joshi	ceb47e0d84	drm/amdgpu: Fix SDMA RAS error reporting on Aldebaran Fix the following issues with SDMA RAS error reporting: 1. Read the EDC_COUNTER2 register also to fetch error counts for all sub-blocks in SDMA. 2. SDMA RAS on Aldebaran suports single-bit uncorrectable errors only. So, report error count in UE count instead of CE count. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-By: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:45:17 -04:00
Mukul Joshi	1f0d8e3781	drm/amdgpu: Reset RAS error count and status regs Reset the RAS error count and error status registers after reading to prevent over reporting error counts on Aldebaran. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-By: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:45:10 -04:00
Oak Zeng	5f41741a6d	Revert "drm/amdgpu: workaround the TMR MC address issue (v2)" This reverts commit `2f055097da`. `2f055097da` was a driver workaround when PSP firmware was not ready. Now the PSP fw is ready so we revert this driver workaround. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:43:49 -04:00
Jiansong Chen	7c49ee9ec5	drm/amdgpu: fix GCR_GENERAL_CNTL offset for dimgrey_cavefish dimgrey_cavefish has similar gc_10_3 ip with sienna_cichlid, so follow its registers offset setting. Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:36:28 -04:00
John Clements	f9727922fc	drm/amdgpu: resolve erroneous gfx_v9_4_2 prints resolve bug on aldebaran where gfx error counts will print on driver load when there are no errors present Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:36:10 -04:00
Dennis Li	6df23f4c5c	drm/amdgpu: fix a error injection failed issue because "sscanf(str, "retire_page")" always return 0, if application use the raw data for error injection, it always wrongly falls into "op == 3". Change to use strstr instead. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:36:01 -04:00
Hawking Zhang	1f8d3ad2a0	drm/amdgpu: only harvest gcea/mmea error status in aldebaran In aldebaran, driver only needs to harvest SDP RdRspStatus, WrRspStatus and first parity error on RdRsp data. Check error type before harvest error information. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-20 21:35:55 -04:00

1 2 3 4 5 ...

9069 Commits