linux

Author	SHA1	Message	Date
Jack Xiao	928fe236c0	drm/amdgpu: add mes_kiq module parameter v2 mes_kiq parameter is used to enable mes kiq pipe. This module parameter is unneccessary or enabled by default in final version. v2: reword commit message. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-05-04 10:43:49 -04:00
Jack Xiao	de33a32968	drm/amdgpu: use the whole doorbell space for mes Use the whole doorbell space for mes. Each queue in one process occupies one doorbell slot to ring the queue submitting. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-05-04 10:04:01 -04:00
Hawking Zhang	85d1bcc6e0	drm/amdgpu: switch to atomfirmware_asic_init Some initial settings now are not available from the atom data table. The assumption that !ps[0] \|\| !ps[1] in amdgpu_atom_asic_init is not valid. In addition, driver needs to strictly follow atomfirmware structure (asic_init_parameters) to initialize parameters used to execute asic_init function, otherwise, the execution of asic_init would fail. This shall be applicable to all soc15 adapters,but let make the transition on soc21 first. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-04-28 17:48:00 -04:00
Alex Deucher	e24d0e91b3	drm/amdgpu/discovery: move all table parsing into amdgpu_discovery.c This data has no dependencies, so encapsulate it all within amdgpu_discovery.c. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-04-28 17:47:43 -04:00
Bokun Zhang	e15c9d06e9	drm/amd/amdgpu: Update PF2VF header - In the latest version of the header, there is a variable name change. This should not cause any backward compatibility since the variable is at the same offset in the struct. Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-04-21 16:00:14 -04:00
Evan Quan	25faeddcf3	drm/amdgpu: expand cg_flags from u32 to u64 With this, we can support more CG flags. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-04-08 17:24:24 -04:00
Alex Deucher	b818a5d374	drm/amdgpu/gmc: use PCI BARs for APUs in passthrough If the GPU is passed through to a guest VM, use the PCI BAR for CPU FB access rather than the physical address of carve out. The physical address is not valid in a guest. v2: Fix HDP handing as suggested by Michel Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-03-25 12:40:24 -04:00
Philip Yang	436afdfa35	drm/amdgpu: Move reset domain init before calling RREG32 amdgpu_detect_virtualization reads register, amdgpu_device_rreg access adev->reset_domain->sem if kernel defined CONFIG_LOCKDEP, below is the random boot hang backtrace on Vega10. It may get random NULL pointer access backtrace if amdgpu_sriov_runtime is true too. Move amdgpu_reset_create_reset_domain before calling to RREG32. BUG: kernel NULL pointer dereference, address: #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI Workqueue: events work_for_cpu_fn RIP: 0010:down_read_trylock+0x13/0xf0 Call Trace: <TASK> amdgpu_device_skip_hw_access+0x38/0x80 [amdgpu] amdgpu_device_rreg+0x1b/0x170 [amdgpu] amdgpu_detect_virtualization+0x73/0x100 [amdgpu] amdgpu_device_init.cold.60+0xbe/0x16b1 [amdgpu] ? pci_bus_read_config_word+0x43/0x70 amdgpu_driver_load_kms+0x15/0x120 [amdgpu] amdgpu_pci_probe+0x1a1/0x3a0 [amdgpu] Fixes: `d0fb18b535` ("drm/amdgpu: Move reset sem into reset_domain") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-03-15 14:40:49 -04:00
Alex Deucher	85ac2021fe	drm/amdgpu: only check for _PR3 on dGPUs We don't support runtime pm on APUs. They support more dynamic power savings using clock and powergating. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-03-15 14:34:41 -04:00
Alex Deucher	1b537e6410	drm/amdgpu: remove unused gpu_info firmwares These were leftover from bring up and are no longer necessary. The information is available via the IP discovery table. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-03-02 18:40:07 -05:00
Yifan Zhang	957b0787ee	drm/amdgpu: move amdgpu_gmc_noretry_set after ip_versions populated otherwise adev->ip_versions is still empty when amdgpu_gmc_noretry_set is called. Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-03-02 18:40:06 -05:00
Dave Airlie	38a15ad948	Merge tag 'amd-drm-next-5.18-2022-02-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.18-2022-02-25: amdgpu: - Raven2 suspend/resume fix - SDMA 5.2.6 updates - VCN 3.1.2 updates - SMU 13.0.5 updates - DCN 3.1.5 updates - Virtual display fixes - SMU code cleanup - Harvest fixes - Expose benchmark tests via debugfs - Drop no longer relevant gart aperture tests - More RAS restructuring - W=1 fixes - PSR rework - DP/VGA adapter fixes - DP MST fixes - GPUVM eviction fix - GPU reset debugfs register dumping support - Misc display fixes - SR-IOV fix - Aldebaran mGPU fix - Add module parameter to disable XGMI for testing amdkfd: - IH ring overflow logging fixes - CRIU fixes - Misc fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220225183535.5907-1-alexander.deucher@amd.com	2022-03-01 16:19:02 +10:00
Andrey Grodzovsky	2656fd230d	drm/amdgpu: Exclude PCI reset method for now. According to my investigation of the state of PCI reset recently it's not working. The reason is due to the fact the kernel PCI code rejects SBR when there are more then one PF under same bridge which we always have (at least AUDIO PF but usually more) and that because SBR will reset all the PFS and devices under the same bridge as you and you cannot assume they support SBR. Once we anble FLR support we can reenable this option as FLR is doable on single PF and doens't have this restriction. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-24 17:25:20 -05:00
Dave Airlie	54f43c17d6	drm-misc-next for v5.18: UAPI Changes: Cross-subsystem Changes: - Split out panel-lvds and lvds dt bindings . - Put yes/no on/off disabled/enabled strings in linux/string_helpers.h and use it in drivers and tomoyo. - Clarify dma_fence_chain and dma_fence_array should never include eachother. - Flatten chains in syncobj's. - Don't double add in fbdev/defio when page is already enlisted. - Don't sort deferred-I/O pages by default in fbdev. Core Changes: - Fix missing pm_runtime_put_sync in bridge. - Set modifier support to only linear fb modifier if drivers don't advertise support. - As a result, we remove allow_fb_modifiers. - Add missing clear for EDID Deep Color Modes in drm_reset_display_info. - Assorted documentation updates. - Warn once in drm_clflush if there is no arch support. - Add missing select for dp helper in drm_panel_edp. - Assorted small fixes. - Improve fb-helper's clipping handling. - Don't dump shmem mmaps in a core dump. - Add accounting to ttm resource manager, and use it in amdgpu. - Allow querying the detected eDP panel through debugfs. - Add helpers for xrgb8888 to 8 and 1 bits gray. - Improve drm's buddy allocator. - Add selftests for the buddy allocator. Driver Changes: - Add support for nomodeset to a lot of drm drivers. - Use drm_module__driver in a lot of drm drivers. - Assorted small fixes to bridge/lt9611, v3d, vc4, vmwgfx, mxsfb, nouveau, bridge/dw-hdmi, panfrost, lima, ingenic, sprd, bridge/anx7625, ti-sn65dsi86. - Add bridge/it6505. - Create DP and DVI-I connectors in ast. - Assorted nouveau backlight fixes. - Rework amdgpu reset handling. - Add dt bindings for ingenic,jz4780-dw-hdmi. - Support reading edid through aux channel in ingenic. - Add a drm driver for Solomon SSD130x OLED displays. - Add simple support for sharp LQ140M1JW46. - Add more panels to nt35560. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmIWLSEACgkQ/lWMcqZw E8OP7hAAjix94EX5fhFa7OAdqUbFtsiKhK/4zNtV9FWpFiEsDBz+dlbfDQWIx5an FIiiiQtSfWjpDv6pcMhoNf80w+dDbc/Cuauz6nNGO7Pkaerh2D/EPG74FD7f7nE3 EIScVs1heYtzM9usKrFKupNYgIdgZxmeovClWuE0OTjLOes2PGvvxXK6iJqNqJMX VlDO5SR7GRqsDUDV6vmwl63uKL77xJXAahAXIx+BQ/1xrtEhlu6NwsgHIsmPmMSN YluX34zc1xD/6/uUqvEdp7u46/5/He1c5Q/ia1WV3wRxsO/eMZ+axXqCZP3XGZdt rMdGNtj1MWKkudYiowStWkCVSG/0fXJCFIAhvRmeZy+YqPdVlqZ2W7g4H1l9iJoo UVfT9cHrKoxHsukvIEckC5Ov9v1yr39Bd4wUuqaUTUSxY8VID5vjY63TsXl9Zke1 SluTFe9qybbnRNz/hYRvwIS1eT8HvUauAfAhypGTLI5DYHTD7PawcfMJkNzCtJm4 Ta4SC3rTpkpN+7oc8SoNgqRHQ8U9KL5oksP0wVa8vwHsMptSd3X4pUljc6TcfjLv GEo41D5AuJz3HRVcn9yqPbLoPE2FFB7bfwIMH77yNnoos4Izy/LGhKpN0YdImmI5 W5XVFB0jltGSIhkzLe1mFpLrdJwdUTSUVeCK4H5PhZZQEHLkVtg= =HuwD -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2022-02-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.18: UAPI Changes: Cross-subsystem Changes: - Split out panel-lvds and lvds dt bindings . - Put yes/no on/off disabled/enabled strings in linux/string_helpers.h and use it in drivers and tomoyo. - Clarify dma_fence_chain and dma_fence_array should never include eachother. - Flatten chains in syncobj's. - Don't double add in fbdev/defio when page is already enlisted. - Don't sort deferred-I/O pages by default in fbdev. Core Changes: - Fix missing pm_runtime_put_sync in bridge. - Set modifier support to only linear fb modifier if drivers don't advertise support. - As a result, we remove allow_fb_modifiers. - Add missing clear for EDID Deep Color Modes in drm_reset_display_info. - Assorted documentation updates. - Warn once in drm_clflush if there is no arch support. - Add missing select for dp helper in drm_panel_edp. - Assorted small fixes. - Improve fb-helper's clipping handling. - Don't dump shmem mmaps in a core dump. - Add accounting to ttm resource manager, and use it in amdgpu. - Allow querying the detected eDP panel through debugfs. - Add helpers for xrgb8888 to 8 and 1 bits gray. - Improve drm's buddy allocator. - Add selftests for the buddy allocator. Driver Changes: - Add support for nomodeset to a lot of drm drivers. - Use drm_module__driver in a lot of drm drivers. - Assorted small fixes to bridge/lt9611, v3d, vc4, vmwgfx, mxsfb, nouveau, bridge/dw-hdmi, panfrost, lima, ingenic, sprd, bridge/anx7625, ti-sn65dsi86. - Add bridge/it6505. - Create DP and DVI-I connectors in ast. - Assorted nouveau backlight fixes. - Rework amdgpu reset handling. - Add dt bindings for ingenic,jz4780-dw-hdmi. - Support reading edid through aux channel in ingenic. - Add a drm driver for Solomon SSD130x OLED displays. - Add simple support for sharp LQ140M1JW46. - Add more panels to nt35560. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/686ec871-e77f-c230-22e5-9e3bb80f064a@linux.intel.com	2022-02-25 05:50:18 +10:00
Somalapuram Amaranath	15fd09a05a	drm/amdgpu: add reset register dump trace on GPU Dump the list of register values to trace event on GPU reset. Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-23 14:26:36 -05:00
Alex Deucher	b784f42cf7	drm/amdgpu: drop testing module parameter This test is not particularly useful now that GTT and GART are decoupled in the driver. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-23 14:02:51 -05:00
Alex Deucher	0b1a63487b	drm/amdgpu: drop benchmark module parameter Now that we expose the benchmarks via debugfs, there is no longer a need for the module parameter. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-23 14:02:51 -05:00
Alex Deucher	f113cc32e3	drm/amdgpu: add a benchmark mutex To avoid multiple runs in parallel to avoid mixing results. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-23 14:02:50 -05:00
Jiawei Gu	8ab62eda17	drm/sched: Add device pointer to drm_gpu_scheduler Add device pointer so scheduler's printing can use DRM_DEV_ERROR() instead, which makes life easier under multiple GPU scenario. v2: amend all calls of drm_sched_init() v3: fill dev pointer for all drm_sched_init() calls Signed-off-by: Jiawei Gu <Jiawei.Gu@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220221095705.5290-1-Jiawei.Gu@amd.com	2022-02-23 10:04:14 +01:00
Mario Limonciello	0ab5d711ec	drm/amd: Refactor `amdgpu_aspm` to be evaluated per device Evaluating `pcie_aspm_enabled` as part of driver probe has the implication that if one PCIe bridge with an AMD GPU connected doesn't support ASPM then none of them do. This is an invalid assumption as the PCIe core will configure ASPM for individual PCIe bridges. Create a new helper function that can be called by individual dGPUs to react to the `amdgpu_aspm` module parameter without having negative results for other dGPUs on the PCIe bus. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-17 15:59:05 -05:00
yipechai	867e24ca49	drm/amdgpu: define amdgpu_ras_late_init to call all ras blocks' .ras_late_init Define amdgpu_ras_late_init to call all ras blocks' .ras_late_init. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-17 15:59:05 -05:00
Alex Deucher	dfcc3e8c24	drm/amdgpu: make cyan skillfish support code more consistent Since this is an existing asic, adjust the code to follow the same logic as previously so the driver state is consistent. No functional change intended. Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-16 17:30:02 -05:00
Surbhi Kakarya	7258fa31ea	drm/amdgpu: Handle the GPU recovery failure in SRIOV environment. This patch handles the GPU recovery failure in sriov environment by retrying the reset if the first reset fails. To determine the condition of retry, a new macro AMDGPU_RETRY_SRIOV_RESET is added which returns true if failure is due to ETIMEDOUT, EINVAL or EBUSY, otherwise return false.A new macro AMDGPU_MAX_RETRY_LIMIT is used to limit the retry to 2. It also handles the return status in Post Asic Reset by updating the return code with asic_reset_res and eventually return the return code in amdgpu_job_timedout(). Signed-off-by: Surbhi Kakarya <surbhi.kakarya@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-14 15:08:41 -05:00
Rajneesh Bhardwaj	7157934699	drm/amdgpu: Fix a kerneldoc warning Add missing parameters to fix a kerneldoc warning Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-14 15:08:40 -05:00
Andrey Grodzovsky	c7703ce38c	drm/amdgpu: Fix htmldoc warning Update function name. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220211205500.601391-1-andrey.grodzovsky@amd.com	2022-02-11 16:05:08 -05:00
Andrey Grodzovsky	3675c2f26f	drm/amdgpu: Revert 'drm/amdgpu: annotate a false positive recursive locking' Since we have a single instance of reset semaphore which we lock only once even for XGMI hive we don't need the nested locking hint anymore. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74120.html	2022-02-09 12:19:14 -05:00
Andrey Grodzovsky	e923be9934	drm/amdgpu: Rework amdgpu_device_lock_adev This functions needs to be split into 2 parts where one is called only once for locking single instance of reset_domain's sem and reset flag and the other part which handles MP1 states should still be called for each device in XGMI hive. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74118.html	2022-02-09 12:18:39 -05:00
Andrey Grodzovsky	89a7a87093	drm/amdgpu: Move in_gpu_reset into reset_domain We should have a single instance per entrire reset domain. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74116.html	2022-02-09 12:17:57 -05:00
Andrey Grodzovsky	d0fb18b535	drm/amdgpu: Move reset sem into reset_domain We want single instance of reset sem across all reset clients because in case of XGMI we should stop access cross device MMIO because any of them could be in a reset in the moment. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74117.html	2022-02-09 12:17:32 -05:00
Andrey Grodzovsky	cfbb6b0047	drm/amdgpu: Rework reset domain to be refcounted. The reset domain contains register access semaphor now and so needs to be present as long as each device in a hive needs it and so it cannot be binded to XGMI hive life cycle. Adress this by making reset domain refcounted and pointed by each member of the hive and the hive itself. v4: Fix crash on boot witrh XGMI hive by adding type to reset_domain. XGMI will only create a new reset_domain if prevoius was of single device type meaning it's first boot. Otherwsie it will take a refocunt to exsiting reset_domain from the amdgou device. Add a wrapper around reset_domain->refcount get/put and a wrapper around send to reset wq (Lijo) Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74121.html	2022-02-09 12:17:09 -05:00
Andrey Grodzovsky	f287a3c5b0	drm/amdgpu: Drop concurrent GPU reset protection for device Since now all GPU resets are serialzied there is no need for this. This patch also reverts 'drm/amdgpu: race issue when jobs on 2 ring timeout' Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74119.html	2022-02-09 12:16:53 -05:00
Andrey Grodzovsky	681260df4d	drm/amdgpu: Drop hive->in_reset Since we serialize all resets no need to protect from concurrent resets. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74115.html	2022-02-09 12:16:29 -05:00
Andrey Grodzovsky	54f329cc7a	drm/amdgpu: Serialize non TDR gpu recovery with TDRs Use reset domain wq also for non TDR gpu recovery trigers such as sysfs and RAS. We must serialize all possible GPU recoveries to gurantee no concurrency there. For TDR call the original recovery function directly since it's already executed from within the wq. For others just use a wrapper to qeueue work and wait on it to finish. v2: Rename to amdgpu_recover_work_struct Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74113.html	2022-02-09 12:15:23 -05:00
Andrey Grodzovsky	5fd8518d18	drm/amdgpu: Move scheduler init to after XGMI is ready Before we initialize schedulers we must know which reset domain are we in - for single device there iis a single domain per device and so single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74112.html	2022-02-09 12:15:04 -05:00
Andrey Grodzovsky	a4c63cafa5	drm/amdgpu: Introduce reset domain Defined a reset_domain struct such that all the entities that go through reset together will be serialized one against another. Do it for both single device and XGMI hive cases. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Suggested-by: Christian König <ckoenig.leichtzumerken@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74111.html	2022-02-09 12:14:32 -05:00
Changcheng Deng	6a77bce58c	drm/amdgpu: remove duplicate include in 'amdgpu_device.c' 'linux/pci.h' included in 'amdgpu_device.c' is duplicated. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-02-02 18:26:31 -05:00
Alex Deucher	8cda7a4f96	drm/amdgpu/UAPI: add new CTX OP to get/set stable pstates Add a new CTX ioctl operation to set stable pstates for profiling. When creating traces for tools like RGP or using SPM or doing performance profiling, it's required to enable a special stable profiling power state on the GPU. These profiling states set fixed clocks and disable certain other power features like powergating which may impact the results. Historically, these profiling pstates were enabled via sysfs, but this adds an interface to enable it via the CTX ioctl from the application. Since the power state is global only one application can set it at a time, so if multiple applications try and use it only the first will get it, the ioctl will return -EBUSY for others. The sysfs interface will override whatever has been set by this interface. Mesa MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/207 v2: don't default r = 0; v3: rebase on Evan's PM cleanup Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-27 15:50:08 -05:00
Tianci.Yin	7270e8957e	drm/amdgpu: Fix an error message in rmmod [why] In rmmod procedure, kfd sends cp a dequeue request, but the request does not get response, then an error message "cp queue pipe 4 queue 0 preemption failed" printed. [how] Performing kfd suspending after disabling gfxoff can fix it. Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Tianci.Yin <tianci.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-27 15:48:30 -05:00
Alex Deucher	901e2be20d	drm/amdgpu: move PX checking into amdgpu_device_ip_early_init We need to set the APU flag from IP discovery before we evaluate this code. Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-25 18:00:36 -05:00
Hawking Zhang	1b2dc99e2d	drm/amdgpu: switch to amdgpu_sriov_rreg/wreg Instead of ip specific implementation for rlcg indirect register access Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Zhou, Peng Ju <PengJu.Zhou@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-25 18:00:33 -05:00
Xiaojian Du	86700a4026	drm/amdgpu: modify a pair of functions for the pcie port wreg/rreg This patch will modify a pair of functions for pcie port wreg/rreg. AMD GPU have had an independent NBIO block from SOC15 arch. If the dirver wants to read/write the address space of the pcie devices, it has to go through the NBIO block. This patch will move the pcie port wreg/rreg functions to "amdgpu_device.c", so that to reuse the functions on the future GPU ASICs. Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-19 22:32:11 -05:00
Jingwen Chen	9a458402fb	drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV [Why] This fixes `892deb4826` ("drm/amdgpu: Separate vf2pf work item init from virt data exchange"). we should read pf2vf data based at mman.fw_vram_usage_va after gmc sw_init. commit `892deb4826` breaks this logic. [How] calling amdgpu_virt_exchange_data in amdgpu_virt_init_data_exchange to set the right base in the right sequence. v2: call amdgpu_virt_init_data_exchange after gmc sw_init to make data exchange workqueue run v3: clean up the code logic v4: add some comment and make the code more readable Fixes: `892deb4826` ("drm/amdgpu: Separate vf2pf work item init from virt data exchange") Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Reviewed-by: Horace Chen <horace.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-18 18:00:58 -05:00
Jingwen Chen	22c16d251a	drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV [Why] This fixes `892deb4826` ("drm/amdgpu: Separate vf2pf work item init from virt data exchange"). we should read pf2vf data based at mman.fw_vram_usage_va after gmc sw_init. commit `892deb4826` breaks this logic. [How] calling amdgpu_virt_exchange_data in amdgpu_virt_init_data_exchange to set the right base in the right sequence. v2: call amdgpu_virt_init_data_exchange after gmc sw_init to make data exchange workqueue run v3: clean up the code logic v4: add some comment and make the code more readable Fixes: `892deb4826` ("drm/amdgpu: Separate vf2pf work item init from virt data exchange") Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Reviewed-by: Horace Chen <horace.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-18 17:41:03 -05:00
Alex Deucher	0ffb1fd158	drm/amdgpu: invert the logic in amdgpu_device_should_recover_gpu() Rather than opting into GPU recovery support, default to on, and opt out if it's not working on a particular GPU. This avoids the need to add new asics to this list since this is a core feature. Reviewed-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-14 18:06:44 -05:00
CHANDAN VURDIGERE NATARAJ	4175c32be5	drm/amdgpu: Enable recovery on yellow carp Add yellow carp to devices which support recovery Signed-off-by: CHANDAN VURDIGERE NATARAJ <chandan.vurdigerenataraj@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-14 18:06:44 -05:00
Alex Deucher	b3523c4573	drm/amdgpu: invert the logic in amdgpu_device_should_recover_gpu() Rather than opting into GPU recovery support, default to on, and opt out if it's not working on a particular GPU. This avoids the need to add new asics to this list since this is a core feature. Reviewed-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-14 17:52:50 -05:00
CHANDAN VURDIGERE NATARAJ	31425abeda	drm/amdgpu: Enable recovery on yellow carp Add yellow carp to devices which support recovery Signed-off-by: CHANDAN VURDIGERE NATARAJ <chandan.vurdigerenataraj@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-14 17:52:50 -05:00
Yang Li	ab3b9de65b	drm/amdgpu: clean up some inconsistent indenting Eliminate the follow smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3504 amdgpu_device_init() warn: inconsistent indenting drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1716 amdgpu_ras_error_status_query() warn: if statement not indented drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1058 amdgpu_ras_error_inject() warn: inconsistent indenting Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-14 17:52:10 -05:00
yipechai	5e67bba301	drm/amdgpu: Modify mmhub block to fit for the unified ras block data and ops 1.Modify mmhub block to fit for the unified ras block data and ops. 2.Change amdgpu_mmhub_ras_funcs to amdgpu_mmhub_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of mmhub ras variable so that mmhub ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register mmhub ras block into amdgpu device ras block link list. 5.Remove the redundant code about mmhub in amdgpu_ras.c after using the unified ras block. 5.Remove the redundant code about mmhub in amdgpu_ras.c after using the unified ras block. 6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of mmhub versions. If .ras_late_init and .ras_fini had been defined by the selected mmhub version, the defined functions will take effect; if not defined, default fill them with amdgpu_mmhub_ras_late_init and amdgpu_mmhub_ras_fini. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-14 17:51:59 -05:00
yipechai	6492e1b07c	drm/amdgpu: Unify ras block interface for each ras block 1. Define unified ops interface for each block. 2. Add ras_block_match function pointer in ops interface, each ras block can customize specail match function to identify itself. 3. Add amdgpu_ras_block_match_default new function. If a ras block doesn't define .ras_block_match, default execute amdgpu_ras_block_match_default to identify this ras block. 4. Define unified basic ras block data for each ras block. 5. Create dedicated amdgpu device ras block link list to manage all of the ras blocks. 6. Add amdgpu_ras_register_ras_block new function interface for each ras block to register itself to ras controlling block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-14 17:51:59 -05:00
Evan Quan	bc143d8b83	drm/amd/pm: do not expose implementation details to other blocks out of power Those implementation details(whether swsmu supported, some ppt_funcs supported, accessing internal statistics ...)should be kept internally. It's not a good practice and even error prone to expose implementation details. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-14 17:51:14 -05:00
Prike Liang	4eaf21b752	drm/amdgpu: not return error on the init_apu_flags In some APU project we needn't always assign flags to identify each other, so we may not need return an error. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-11 15:44:27 -05:00
Tom St Denis	4cc9f86f85	drm/amd/amdgpu: Add pcie indirect support to amdgpu_mm_wreg_mmio_rlc() The function amdgpu_mm_wreg_mmio_rlc() is used by debugfs to write to MMIO registers. It didn't support registers beyond the BAR mapped MMIO space. This adds pcie indirect write support. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-11 15:44:26 -05:00
Nirmoy Das	575e55ee4f	drm/amdgpu: recover gart table at resume Get rid off pin/unpin of gart BO at resume/suspend and instead pin only once and try to recover gart content at resume time. This is much more stable in case there is OOM situation at 2nd call to amdgpu_device_evict_resources() while evicting GART table. v3: remove gart recovery from other places v2: pin gart at amdgpu_gart_table_vram_alloc() Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-11 15:44:26 -05:00
Nirmoy Das	1dd8b1b987	drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr Do not allow exported amdgpu_gtt_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call gtt_mgr functions. v4: remove unused adev. v3: upcast mgr from ttm resopurce manager instead of getting it from adev. v2: pass adev's gtt_mgr instead of adev. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-11 15:44:26 -05:00
Leslie Shi	62d5f9f711	drm/amdgpu: Unmap MMIO mappings when device is not unplugged Patch: 3efb17ae7e92 ("drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure") makes call to amdgpu_device_unmap_mmio() conditioned on device unplugged. This patch unmaps MMIO mappings even when device is not unplugged. v2: Add condition of drm_dev_enter() to deleted unmaps in patch "drm/amdgpu: Unmap all MMIO mappings" Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-11 15:44:26 -05:00
Mario Limonciello	e53d9665ab	drm/amdgpu: explicitly check for s0ix when evicting resources This codepath should be running in both s0ix and s3, but only does currently because s3 and s0ix are both set in the s0ix case. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-01-07 17:20:38 -05:00
Dave Airlie	cb6846fbb8	Merge tag 'amd-drm-next-5.17-2021-12-30' of ssh://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.17-2021-12-30: amdgpu: - Suspend/resume fixes - Fence fix - Misc code cleanups - IP discovery fixes - SRIOV fixes - RAS fixes - GMC 8 VRAM detection fix - FRU fixes for Aldebaran - Display fixes amdkfd: - SVM fixes - IP discovery fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211230141032.613596-1-alexander.deucher@amd.com	2021-12-31 10:59:17 +10:00
Alex Deucher	0637d41786	drm/amdgpu: no DC support for headless chips Chips with no display hardware should return false for DC support. v2: drop Arcturus and Aldebaran Fixes: `f7f12b2582` ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support") Reviewed-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reported-by: Tareque Md.Hanif <tarequemd.hanif@yahoo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-30 08:54:45 -05:00
Surbhi Kakarya	b6fd6e0f5e	drm/amdgpu: Check the memory can be accesssed by ttm_device_clear_dma_mappings. If the event guard is enabled and VF doesn't receive an ack from PF for full access, the guest driver load crashes. This is caused due to the call to ttm_device_clear_dma_mappings with non-initialized mman during driver tear down. This patch adds the necessary condition to check if the mman initialization passed or not and takes the path based on the condition output. Signed-off-by: Surbhi Kakarya <Surbhi.Kakarya@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-30 08:54:43 -05:00
Leslie Shi	87172e89dc	drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure [Why] In amdgpu_driver_load_kms, when amdgpu_device_init returns error during driver modprobe, it will start the error handle path immediately and call into amdgpu_device_unmap_mmio as well to release mapped VRAM. However, in the following release callback, driver stills visits the unmapped memory like vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs. [How] call amdgpu_device_unmap_mmio() if device is unplugged to prevent invalid memory address in vcn_v3_0_sw_fini() when GPU initialization failure. Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-30 08:54:24 -05:00
sashank saye	4da8b63944	drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling For Aldebaran chip passthrough case we need to intimate SMU about special handling for SBR.On older chips we send LightSBR to SMU, enabling the same for Aldebaran. Slight difference, compared to previous chips, is on Aldebaran, SMU would do a heavy reset on SBR. Hence, the word Heavy instead of Light SBR is used for SMU to differentiate. Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: sashank saye <sashank.saye@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-28 16:03:19 -05:00
Victor Skvortsov	4a0165f060	drm/amdgpu: get xgmi info before ip_init Driver needs to call get_xgmi_info() before ip_init to determine whether it needs to handle a pending hive reset. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: David Nieto <david.nieto@amd.com> Reviewed by: shaoyun.liu <Shaoyun.lui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-28 16:02:17 -05:00
Dave Airlie	b06103b532	Merge tag 'amd-drm-next-5.17-2021-12-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amdgpu: - Add some display debugfs entries - RAS fixes - SR-IOV fixes - W=1 fixes - Documentation fixes - IH timestamp fix - Misc power fixes - IP discovery fixes - Large driver documentation updates - Multi-GPU memory use reductions - Misc display fixes and cleanups - Add new SMU debug option amdkfd: - SVM fixes radeon: - Fix typo in comment From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211216202731.5900-1-alexander.deucher@amd.com	2021-12-23 11:55:28 +10:00
Victor Skvortsov	892deb4826	drm/amdgpu: Separate vf2pf work item init from virt data exchange We want to be able to call virt data exchange conditionally after gmc sw init to reserve bad pages as early as possible. Since this is a conditional call, we will need to call it again unconditionally later in the init sequence. Refactor the data exchange function so it can be called multiple times without re-initializing the work item. v2: Cleaned up the code. Kept the original call to init_exchange_data() inside early init to initialize the work item, afterwards call exchange_data() when needed. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed By: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-16 14:08:20 -05:00
Huang Rui	5c1e6fa49e	drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence The job embedded fence donesn't initialize the flags at dma_fence_init(). Then we will go a wrong way in amdgpu_fence_get_timeline_name callback and trigger a null pointer panic once we enabled the trace event here. So introduce new amdgpu_fence object to indicate the job embedded fence. [ 156.131790] BUG: kernel NULL pointer dereference, address: 00000000000002a0 [ 156.131804] #PF: supervisor read access in kernel mode [ 156.131811] #PF: error_code(0x0000) - not-present page [ 156.131817] PGD 0 P4D 0 [ 156.131824] Oops: 0000 [#1] PREEMPT SMP PTI [ 156.131832] CPU: 6 PID: 1404 Comm: sdma0 Tainted: G OE 5.16.0-rc1-custom #1 [ 156.131842] Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016 [ 156.131848] RIP: 0010:strlen+0x0/0x20 [ 156.131859] Code: 89 c0 c3 0f 1f 80 00 00 00 00 48 01 fe eb 0f 0f b6 07 38 d0 74 10 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 c3 48 89 f8 c3 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31 [ 156.131872] RSP: 0018:ffff9bd0018dbcf8 EFLAGS: 00010206 [ 156.131880] RAX: 00000000000002a0 RBX: ffff8d0305ef01b0 RCX: 000000000000000b [ 156.131888] RDX: ffff8d03772ab924 RSI: ffff8d0305ef01b0 RDI: 00000000000002a0 [ 156.131895] RBP: ffff9bd0018dbd60 R08: ffff8d03002094d0 R09: 0000000000000000 [ 156.131901] R10: 000000000000005e R11: 0000000000000065 R12: ffff8d03002094d0 [ 156.131907] R13: 000000000000001f R14: 0000000000070018 R15: 0000000000000007 [ 156.131914] FS: 0000000000000000(0000) GS:ffff8d062ed80000(0000) knlGS:0000000000000000 [ 156.131923] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 156.131929] CR2: 00000000000002a0 CR3: 000000001120a005 CR4: 00000000003706e0 [ 156.131937] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 156.131942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 156.131949] Call Trace: [ 156.131953] <TASK> [ 156.131957] ? trace_event_raw_event_dma_fence+0xcc/0x200 [ 156.131973] ? ring_buffer_unlock_commit+0x23/0x130 [ 156.131982] dma_fence_init+0x92/0xb0 [ 156.131993] amdgpu_fence_emit+0x10d/0x2b0 [amdgpu] [ 156.132302] amdgpu_ib_schedule+0x2f9/0x580 [amdgpu] [ 156.132586] amdgpu_job_run+0xed/0x220 [amdgpu] v2: fix mismatch warning between the prototype and function name (Ray, kernel test robot) Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-16 13:42:36 -05:00
Yann Dirson	03f2abb07e	amdgpu: fix some kernel-doc markup Those are not today pulled by the sphinx doc, but better be ready. Signed-off-by: Yann Dirson <ydirson@free.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-14 16:10:53 -05:00
Guchun Chen	e0f943b4f9	drm/amdgpu: use adev_to_drm to get drm_device pointer Updated for consistency when accessing drm_device from amdgpu driver. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-14 16:09:24 -05:00
Daniel Vetter	99b03ca651	Linux 5.16-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmG2fU0eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC7EH/3R7Rt+OD8Wn8Ss3 w8V+dBxVwa2u2oMTyUHPxaeOXZ7bi38XlUdLFPOK/76bGwO0a5TmYZqsWdRbGyT0 HfcYjHsQ0lbJXk/nh2oM47oJxJXVpThIHXJEk0FZ0Y5t+DYjIYlNHzqZymUyhLem St74zgWcyT+MXuqY34vB827FJDUnOxhhhi85tObeunaSPAomy9aiYidSC1ARREnz iz2VUntP/QnRnKVvL2nUZNzcz1xL5vfCRSKsRGRSv3qW1Y/1M71ylt6JVmSftWq+ VmMdFxFhdrb1OK/1ct/930Un/UP2NG9EJsWxote2XYlnVSZHzDqH7lUhbqgdCcLz 1m2tVNY= =7wRd -----END PGP SIGNATURE----- Merge v5.16-rc5 into drm-next Thomas Zimmermann requested a fixes backmerge, specifically also for `96c5f82ef0` ("drm/vc4: fix error code in vc4_create_object()") Just a bunch of adjacent changes conflicts, even the big pile of them in vc4. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2021-12-14 10:24:28 +01:00
Philip Yang	4a74c38cd6	drm/amdgpu: Detect if amdgpu in IOMMU direct map mode If host and amdgpu IOMMU is not enabled or IOMMU is pass through mode, set adev->ram_is_direct_mapped flag which will be used to optimize memory usage for multi GPU mappings. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-13 16:33:16 -05:00
Lang Yu	34f3a4a98b	drm/amdgpu: introduce a kind of halt state for amdgpu device It is useful to maintain error context when debugging SW/FW issues. Introduce amdgpu_device_halt() for this purpose. It will bring hardware to a kind of halt state, so that no one can touch it any more. Compare to a simple hang, the system will keep stable at least for SSH access. Then it should be trivial to inspect the hardware state and see what's going on. v2: - Set adev->no_hw_access earlier to avoid potential crashes.(Christian) Suggested-by: Christian Koenig <christian.koenig@amd.com> Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Christian Koenig <christian.koenig@amd.co> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-13 16:33:16 -05:00
Lang Yu	613aa3ea74	drm/amdgpu: only hw fini SMU fisrt for ASICs need that We found some headaches on ASICs don't need that, so remove that for them. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Kevin Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-13 16:32:35 -05:00
Isabella Basso	bbe04dec5c	drm/amd: fix improper docstring syntax This fixes various warnings relating to erroneous docstring syntax, of which some are listed below: warning: Function parameter or member 'adev' not described in 'amdgpu_atomfirmware_ras_rom_addr' ... warning: expecting prototype for amdgpu_atpx_validate_functions(). Prototype was for amdgpu_atpx_validate() instead ... warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new' ... warning: Cannot understand * @kfd_get_cu_occupancy - Collect number of waves in-flight on this device ... warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-13 16:32:34 -05:00
Zhigang Luo	a5f67c939e	drm/amdgpu: recover XGMI topology for SRIOV VF after reset For SRIOV VF, the XGMI topology was not recovered after reset. This change added code to SRIOV VF reset function to update XGMI topology for SRIOV VF after reset. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-13 16:32:34 -05:00
Zhigang Luo	175ac6ec6b	drm/amdgpu: skip reset other device in the same hive if it's SRIOV VF On SRIOV, host driver can support FLR(function level reset) on individual VF within the hive which might bring the individual device back to normal without the necessary to execute the hive reset. If the FLR failed , host driver will trigger the hive reset, each guest VF will get reset notification before the real hive reset been executed. The VF device can handle the reset request individually in it's reset work handler. This change updated gpu recover sequence to skip reset other device in the same hive for SRIOV VF. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-13 16:32:34 -05:00
shaoyunl	428890a3fe	drm/amdgpu: adjust the kfd reset sequence in reset sriov function This change revert previous commits: `9f4f2c1a35` ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") `271fd38ce5` ("drm/amdgpu: move kfd post_reset out of reset_sriov function") This change moves the amdgpu_amdkfd_pre_reset to an earlier place in amdgpu_device_reset_sriov, presumably to address the sequence issue that the first patch was originally meant to fix. Some register access(GRBM_GFX_CNTL) only be allowed on full access mode. Move kfd_pre_reset and kfd_post_reset back inside reset_sriov function. Fixes: `9f4f2c1a35` ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") Fixes: `271fd38ce5` ("drm/amdgpu: move kfd post_reset out of reset_sriov function") Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-01 17:09:30 -05:00
Flora Cui	1053b9c948	drm/amdgpu: check atomic flag to differeniate with legacy path since vkms support atomic KMS interface Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Alex Deucher <aleander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-01 16:59:38 -05:00
shaoyunl	992110d747	drm/amdgpu: adjust the kfd reset sequence in reset sriov function This change revert previous commits: `9f4f2c1a35` ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") `271fd38ce5` ("drm/amdgpu: move kfd post_reset out of reset_sriov function") This change moves the amdgpu_amdkfd_pre_reset to an earlier place in amdgpu_device_reset_sriov, presumably to address the sequence issue that the first patch was originally meant to fix. Some register access(GRBM_GFX_CNTL) only be allowed on full access mode. Move kfd_pre_reset and kfd_post_reset back inside reset_sriov function. Fixes: `9f4f2c1a35` ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") Fixes: `271fd38ce5` ("drm/amdgpu: move kfd post_reset out of reset_sriov function") Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-01 16:07:33 -05:00
Stanley.Yang	232d1d43b5	drm/amdgpu: fix disable ras feature failed when unload drvier v2 v2: still need call ras_disable_all_featrures to handle ras initilization failure case. Function amdgpu_device_fini_hw is called before amdgpu_device_fini_sw, so ras ta will unload before send ras disable command, ras dsiable operation must before hw fini. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-01 16:03:22 -05:00
Flora Cui	700de2c8aa	drm/amdgpu: check atomic flag to differeniate with legacy path since vkms support atomic KMS interface Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Alex Deucher <aleander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-12-01 16:03:03 -05:00
shaoyunl	271fd38ce5	drm/amdgpu: move kfd post_reset out of reset_sriov function Fixes: `9f4f2c1a35` ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") For sriov XGMI configuration, the host driver will handle the hive reset, so in guest side, the reset_sriov only be called once on one device. This will make kfd post_reset unblanced with kfd pre_reset since kfd pre_reset already been moved out of reset_sriov function. Move kfd post_reset out of reset_sriov function to make them balance . Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-11-24 15:08:58 -05:00
shaoyunl	4f30d920d1	drm/amdgpu: move kfd post_reset out of reset_sriov function Fixes: `9f4f2c1a35` ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") For sriov XGMI configuration, the host driver will handle the hive reset, so in guest side, the reset_sriov only be called once on one device. This will make kfd post_reset unblanced with kfd pre_reset since kfd pre_reset already been moved out of reset_sriov function. Move kfd post_reset out of reset_sriov function to make them balance . Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-11-24 14:06:52 -05:00
Evan Quan	6c08e0ef87	drm/amd/pm: avoid duplicate powergate/ungate setting Just bail out if the target IP block is already in the desired powergate/ungate state. This can avoid some duplicate settings which sometimes may cause unexpected issues. Link: https://lore.kernel.org/all/YV81vidWQLWvATMM@zn.tnic/ Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214921 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215025 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1789 Signed-off-by: Evan Quan <evan.quan@amd.com> Tested-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-11-22 14:45:02 -05:00
Evan Quan	6ee27ee27b	drm/amd/pm: avoid duplicate powergate/ungate setting Just bail out if the target IP block is already in the desired powergate/ungate state. This can avoid some duplicate settings which sometimes may cause unexpected issues. Link: https://lore.kernel.org/all/YV81vidWQLWvATMM@zn.tnic/ Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214921 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215025 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1789 Fixes: `bf756fb833` ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend") Signed-off-by: Evan Quan <evan.quan@amd.com> Tested-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2021-11-17 17:41:20 -05:00
Evan Quan	087451f372	drm/amdgpu: use generic fb helpers instead of setting up AMD own's. With the shadow buffer support from generic framebuffer emulation, it's possible now to have runpm kicked when no update for console. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-11-17 16:58:03 -05:00
shaoyunl	9f4f2c1a35	drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov The KFD pre_reset should be called before reset been executed, it will hold the lock to prevent other rocm process to sent the packlage to hiq during host execute the real reset on the HW Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-11-09 17:08:00 -05:00
Alex Deucher	2d32ffd6e9	drm/amdgpu: fix SI handling in amdgpu_device_asic_has_dc_support() Properly handle SI DC support when CONFIG_DRM_AMD_DC_SI is not set. Fixes: `f7f12b2582` ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support") Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-11-05 14:12:53 -04:00
James Zhu	93cec18478	drm/amdgpu: remove duplicated kfd_resume_iommu Remove duplicated kfd_resume_iommu which already runs in mdgpu_amdkfd_device_init. Tested-By: Ken Moffat <zarniwhoop@ntlworld.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-11-03 12:22:08 -04:00
Jingwen Chen	38d4e4638e	drm/amd/amdgpu: fix bad job hw_fence use after free in advance tdr [Why] In advance tdr mode, the real bad job will be resubmitted twice, while in drm_sched_resubmit_jobs_ext, there's a dma_fence_put, so the bad job is put one more time than other jobs. [How] Adding dma_fence_get before resbumit job in amdgpu_device_recheck_guilty_jobs and put the fence for normal jobs Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-11-03 12:22:07 -04:00
Lang Yu	a5c5d8d50e	drm/amdgpu: fix a potential memory leak in amdgpu_device_fini_sw() amdgpu_fence_driver_sw_fini() should be executed before amdgpu_device_ip_fini(), otherwise fence driver resource won't be properly freed as adev->rings have been tore down. Fixes: `72c8c97b15` ("drm/amdgpu: Split amdgpu_device_fini into early and late") Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-10-28 14:26:12 -04:00
Dave Airlie	970eae1560	Linux 5.15-rc7 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmF298ceHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGIJYH/1rsEFQQ6caeQdy1 z9eFIe48DNM4l7bFk+qEj2UAbzPdahVJ299Mg5fW0n2CDemOc9/n0b9TxQ37YObi mOzu0xwJVupIxkyFMPQSSc2q8aLm67NSpJy08DsmaNses5hSvu8x15RPHLQTybjt SwtKns+jpCq79P1GWbrB5e5UkLb0VNoxNp4L1U4pMrYGcEkJUXbaxNY2V/JcXdM7 Vtn+qN0T/J6V6QVftv0t8Ecj3bjEnmL3kZHaTaNg3dGeKRpCGyHc5lcBQ0cNFG6t vjZ9VbuhBzGI3TN2tHH5hpA1UXo7HPBBCwQqxF1jeGLGHULikYwZ3TAPWqL3QZqC 9cxr9SY= =p75d -----END PGP SIGNATURE----- BackMerge tag 'v5.15-rc7' into drm-next The msm next tree is based on rc3, so let's just backmerge rc7 before pulling it in. Signed-off-by: Dave Airlie <airlied@redhat.com>	2021-10-28 14:59:38 +10:00
YuBiao Wang	a3848df60b	drm/amd/amdgpu: Do irq_fini_hw after ip_fini_early [Why] drm_irq_uninstall is called in irq_fini_hw so that irq is disabled in sw stage. SMU (and maybe other IP blocks) fini_hw will call irq_put for cleanup and the whole cleanup process will be skipped because of drm->irq_enable = false. [How] Move ip_fini_early before irq_fini_hw. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-10-19 17:16:50 -04:00
Yifan Zhang	afd18180c0	drm/amdkfd: fix boot failure when iommu is disabled in Picasso. When IOMMU disabled in sbios and kfd in iommuv2 path, iommuv2 init will fail. But this failure should not block amdgpu driver init. Reported-by: youling <youling257@gmail.com> Tested-by: youling <youling257@gmail.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-10-13 14:15:52 -04:00
Guchun Chen	c58a863b1c	drm/amdgpu: use adev_to_drm for consistency when accessing drm_device adev_to_drm is used everywhere, so improve recent changes when accessing drm_device pointer from amdgpu_device. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-10-08 13:22:13 -04:00
Nirmoy Das	58144d2837	drm/amdgpu: unify BO evicting method in amdgpu_ttm Unify BO evicting functionality for possible memory types in amdgpu_ttm.c. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-10-07 11:55:46 -04:00
Guchun Chen	248b061689	drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume In current code, when a PCI error state pci_channel_io_normal is detectd, it will report PCI_ERS_RESULT_CAN_RECOVER status to PCI driver, and PCI driver will continue the execution of PCI resume callback report_resume by pci_walk_bridge, and the callback will go into amdgpu_pci_resume finally, where write lock is releasd unconditionally without acquiring such lock first. In this case, a deadlock will happen when other threads start to acquire the read lock. To fix this, add a member in amdgpu_device strucutre to cache pci_channel_state, and only continue the execution in amdgpu_pci_resume when it's pci_channel_io_frozen. Fixes: `c9a6b82f45` ("drm/amdgpu: Implement DPC recovery") Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-10-05 13:02:31 -04:00
Yifan Zhang	714d9e4574	drm/amdgpu: init iommu after amdkfd device init This patch is to fix clinfo failure in Raven/Picasso: Number of platforms: 1 Platform Profile: FULL_PROFILE Platform Version: OpenCL 2.2 AMD-APP (3364.0) Platform Name: AMD Accelerated Parallel Processing Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callback Platform Name: AMD Accelerated Parallel Processing Number of devices: 0 Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-10-05 13:02:20 -04:00
Guchun Chen	e17e27f9bd	drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume In current code, when a PCI error state pci_channel_io_normal is detectd, it will report PCI_ERS_RESULT_CAN_RECOVER status to PCI driver, and PCI driver will continue the execution of PCI resume callback report_resume by pci_walk_bridge, and the callback will go into amdgpu_pci_resume finally, where write lock is releasd unconditionally without acquiring such lock first. In this case, a deadlock will happen when other threads start to acquire the read lock. To fix this, add a member in amdgpu_device strucutre to cache pci_channel_state, and only continue the execution in amdgpu_pci_resume when it's pci_channel_io_frozen. Fixes: `c9a6b82f45` ("drm/amdgpu: Implement DPC recovery") Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-10-05 12:23:56 -04:00
Christian König	127aedf979	drm/amdgpu: print warning and taint kernel if lockup timeout is disabled Make sure that we notice this in error reports. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-10-05 12:23:42 -04:00
Christian König	c8365dbda0	drm/amdgpu: revert "Add autodump debugfs node for gpu reset v8" This reverts commit `728e7e0cd6`. Further discussion reveals that this feature is severely broken and needs to be reverted ASAP. GPU reset can never be delayed by userspace even for debugging or otherwise we can run into in kernel deadlocks. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-10-05 12:22:36 -04:00

1 2 3 4 5 ...

1106 Commits