When init failed in early init stage, amdgpu_object has
not been initialized, so hasn't the ttm delayed queue functions.
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop
scheduler in s3 test, otherwise, fence related failure will arrive
after resume. To fix this and for a better clean up, move drm_sched_fini
from fence_hw_fini to fence_sw_fini, as it's part of driver shutdown, and
should never be called in hw_fini.
v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init,
to keep sw_init and sw_fini paired.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1668
Fixes: 8d35a25961 ("drm/amdgpu: adjust fence driver enable sequence")
Suggested-by: Christian König <christian.koenig@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fence driver was enabled per ring when sw init on per IP block before.
Change to enable all the fence driver at the same time after
amdgpu_device_ip_init finished.
Rename some function related to fence to make it reasonable for read.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On Sienna Cichlid, in pass-through mode, if we unload the driver in BACO
mode(RTPM), then the kernel would receive thousands of interrupts.
That's because there is doorbell monitor interrupt on BIF, so KVM keeps
injecting interrupts to the guest VM. So we should clear the doorbell
interrupt status after BACO exit.
v2: Modify coding style and commit message
Signed-off-by: Chengzhe Liu <ChengZhe.Liu@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use FAMILY_NV for cyan_skillfish.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Currently all timedout job will be considered to be guilty. In SRIOV
multi-vf use case, the vf flr happens first and then job time out is
found. There can be several jobs timeout during a very small time slice.
And if the innocent sdma job time out is found before the real bad
job, then the innocent sdma job will be set to guilty. This will lead
to a page fault after resubmitting job.
[How]
If the job is a kernel job, we will always consider it not guilty
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The callback functions are used for SRIOV read/write instead
of just for rlcg read/write
Signed-off-by: Roy Sun <Roy.Sun@amd.com>
Reviewed-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
UAPI Changes:
- Remove sysfs stats for dma-buf attachments, as it causes a performance regression.
Previous merge is not in a rc kernel yet, so no userspace regression possible.
Cross-subsystem Changes:
- Sanitize user input in kyro's viewport ioctl.
- Use refcount_t in fb_info->count
- Assorted fixes to dma-buf.
- Extend x86 efifb handling to all archs.
- Fix neofb divide by 0.
- Document corpro,gm7123 bridge dt bindings.
Core Changes:
- Slightly rework drm master handling.
- Cleanup vgaarb handling.
- Assorted fixes.
Driver Changes:
- Add support for ws2401 panel.
- Assorted fixes to stm, ast, bochs.
- Demidlayer ingenic irq.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmD5TGAACgkQ/lWMcqZw
E8PNgxAApjTYQSfjIBbOZnNraxW6w7/bPea35E9A47EdBQsNGnYftNsFjbrn/mCJ
D+0eRLjCMlg4FF1SHdh9cPJ35py+ygbDeupogboLITfU99eGBth3fM2Xdg9LPcBh
dbni/JLG9R7gIvSlqdJuweN21trfVrV/9FQEilG5DvQcl27Wx5g8VMRZke1EqGKX
7Id09Uq50ky18vhDjQRCveYhRqJAxV+XozBatzHyxpDVzjLQvRhlAAYdvrSMHZ5R
jreGzOfR8awc6Om+w7wx3Jn1oEGmXVZB/VqxEqGtMOr3lpARPucxrqfHsqpam3rv
yIoEKPrkG+k6fsU7Tbg59jNqe/PbCUW3AlpyuBxf55EbnVGgjLDbq4sRRMkehPfA
fhC31ujOXQQnAgaxyeQAaAJFKNFJzA8Cq5ZPfG+zztzuomHCiUVQBRowP65hJMzR
+ZlEDnhUD3STLz39zuO1reZR1ZoPIvKbsokHAA+ZrIwUd6U3D3ia8V51pq+lL5aS
TGDkyMN9jyZ+SO8Z7+2FnJAv9FAOPU/WCLU/fWW46jAvuezwMIwVcjfSqDU2XbZD
e7KgHpHhx3BGxI8TThHKlY7mf6IL2Bm7X1Cv1pdZs/eEn3Udh2ax942uTQZu/YOO
0AT1XchpvYCBNRw05bVI3OlJ+w3I8uV+h+11jHOKeY6cbwdHeKE=
=BUya
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2021-07-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.15-rc1:
UAPI Changes:
- Remove sysfs stats for dma-buf attachments, as it causes a performance regression.
Previous merge is not in a rc kernel yet, so no userspace regression possible.
Cross-subsystem Changes:
- Sanitize user input in kyro's viewport ioctl.
- Use refcount_t in fb_info->count
- Assorted fixes to dma-buf.
- Extend x86 efifb handling to all archs.
- Fix neofb divide by 0.
- Document corpro,gm7123 bridge dt bindings.
Core Changes:
- Slightly rework drm master handling.
- Cleanup vgaarb handling.
- Assorted fixes.
Driver Changes:
- Add support for ws2401 panel.
- Assorted fixes to stm, ast, bochs.
- Demidlayer ingenic irq.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2d0d2fe8-01fc-e216-c3fd-38db9e69944e@linux.intel.com
The VGA arbitration is entirely based on pci_dev structures, so just pass
that back to the set_vga_decode callback.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210716061634.2446357-8-hch@lst.de
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
All callers pass NULL as the irq_set_state argument, so remove it and
the ->irq_set_state member in struct vga_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210716061634.2446357-7-hch@lst.de
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
split amdgpu_device_access_vram()
1. amdgpu_device_mm_access(): using MM_INDEX/MM_DATA to access vram
2. amdgpu_device_aper_access(): using vram aperature to access vram (option)
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 4192f7b576.
It is not true (as stated in the reverted commit changelog) that we never
unmap the BAR on failure; it actually does happen properly on
amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() ->
amdgpu_device_fini() error path.
What's worse, this commit actually completely breaks resource freeing on
probe failure (like e.g. failure to load microcode), as
amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too
early, leaving all the resources that'd normally be freed in
amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading
to all sorts of oopses when someone tries to, for example, access the
sysfs and procfs resources which are still around while the driver is
gone.
Fixes: 4192f7b576 ("drm/amdgpu: unmap register bar on device init failure")
Reported-by: Vojtech Pavlik <vojtech@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In some asics, we need to adjust the behavior according to the apu flags
at very early stage.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move shadow_list to struct amdgpu_bo_vm as shadow BOs
are part of PT/PD BOs.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To enable output on real display instead of virtual.
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds yellow carp support for gpu_info firmware and ip
block setting.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds yellow carp to amd_asic_type enum and amdgpu_asic_name[].
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Integrate two generic functions to determine if HDP
flush is needed for all Asics.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amd-drm-next-5.14-2021-06-02:
amdgpu:
- GC/MM register access macro clean up for SR-IOV
- Beige Goby updates
- W=1 Fixes
- Aldebaran fixes
- Misc display fixes
- ACPI ATCS/ATIF handling rework
- SR-IOV fixes
- RAS fixes
- 16bpc fixed point format support
- Initial smartshift support
- RV/PCO power tuning fixes for suspend/resume
- More buffer object subclassing work
- Add new INFO query for additional vbios information
- Add new placement for preemptable SG buffers
amdkfd:
- Misc fixes
radeon:
- W=1 Fixes
- Misc cleanups
UAPI:
- Add new INFO query for additional vbios information
Useful for debugging vbios related issues. Proposed umr patch:
https://patchwork.freedesktop.org/patch/433297/
- 16bpc fixed point format support
IGT test:
https://lists.freedesktop.org/archives/igt-dev/2021-May/031507.html
Proposed Vulkan patch:
a25d480207
- Add a new GEM flag which is only used internally in the kernel driver. Userspace
is not allowed to set it.
drm:
- 16bpc fixed point format fourcc
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210602214009.4553-1-alexander.deucher@amd.com
When we want to decouble resource management from buffer management we need to
be able to handle resources separately.
Add a resource pointer and rename bo->mem so that all code needs to
change to access the pointer instead.
No functional change.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210430092508.60710-4-christian.koenig@amd.com
enable smart shift on dGPU if it is part of HG system and
the platform supports ATCS method to handle power shift.
V2: avoid psc updates in baco enter and exit (Lijo)
fix alignment (Shashank)
V3: rebased on unified ATCS handling. (Alex)
V4: check for return value and warn on failed update (Shashank)
return 0 if device does not support smart shift. (Lizo)
V5: rebased on ATPX/ATCS structures global (Alex)
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Drop DC initialization when DCN is harvested in VBIOS. The way
doesn't affect virtual display ip initialization.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Asher Song <Asher.Song@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch series are used for GC/MMHUB(part)/IH_RB_CNTL
indirect access in the SRIOV environment.
There are 4 bits, controlled by host, to control
if GC/MMHUB(part)/IH_RB_CNTL indirect access enabled.
(one bit is master bit controls other 3 bits)
For GC registers, changing all the register access from MMIO to
RLC and use RLC as the default access method in the full access time.
For partial MMHUB registers, changing their access from MMIO to
RLC in the full access time, the remaining registers
keep the original access method.
For IH_RB_CNTL register, changing it's access from MMIO to PSP.
Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Access to those must be prevented post pci_remove
v6: Drop BOs list, unampping VRAM BAR is enough.
v8:
Add condition of xgmi.connected_to_cpu to MTTR
handling and remove MTTR handling from the old place.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210517193105.491461-1-andrey.grodzovsky@amd.com
This should prevent writing to memory or IO ranges possibly
already allocated for other uses after our device is removed.
v5:
Protect more places wher memcopy_to/form_io takes place
Protect IB submissions
v6: Switch to !drm_dev_enter instead of scoping entire code
with brackets.
v7:
Drop guard of HW ring commands emission protection since they
are in GART and not in MMIO.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-10-andrey.grodzovsky@amd.com
Problem:
Handle all DMA IOMMU group related dependencies before the
group is removed. Those manifest themself in that when IOMMU
enabled DMA map/unmap is dependent on the presence of IOMMU
group the device belongs to but, this group is released once
the device is removed from PCI topology.
Fix:
Expedite all such unmap operations to pci remove driver callback.
v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate
v6: Drop the BO unamp list
v7:
Drop amdgpu_gart_fini
In amdgpu_ih_ring_fini do uncinditional check (!ih->ring)
to avoid freeing uniniitalized rings.
Call amdgpu_ih_ring_fini unconditionally.
v8: Add deatiled explanation
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210517143851.475058-1-andrey.grodzovsky@amd.com
Use it to call disply code dependent on device->drv_data
before it's set to NULL on device unplug
v5:
Move HW finilization into this callback to prevent MMIO accesses
post cpi remove.
v7:
Split kfd suspend from device exit to expdite HW related
stuff to amdgpu_pci_remove
v8:
Squash previous KFD commit into this commit to avoid compile break.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210520032057.497334-1-andrey.grodzovsky@amd.com
Some of the stuff in amdgpu_device_fini such as HW interrupts
disable and pending fences finilization must be done right away on
pci_remove while most of the stuff which relates to finilizing and
releasing driver data structures can be kept until
drm_driver.release hook is called, i.e. when the last device
reference is dropped.
v4: Change functions prefix early->hw and late->sw
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-3-andrey.grodzovsky@amd.com
Rather than gpu info firmware.
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Same as navi series
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Check cached firmware_flags to determine if gpu
virtualization is supported in vbios
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Judgement whether to add an sw ip according to the harvest info.
v2: fix indentation (Alex)
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rename,
ras_hw_supported --> ras_hw_enabled, and
ras_features --> ras_enabled,
to show that ras_enabled is a subset of
ras_hw_enabled, which itself is a subset
of the ASIC capability.
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove redundant ras->supported, as this value
is also stored in adev->ras_features.
Use adev->ras_features, as that supercedes "ras",
since the latter is its member.
The dependency goes like this:
ras <== adev->ras_features <== hw_supported,
and is read as "ras depends on ras_features, which
depends on hw_supported." The arrows show the flow
of information, i.e. the dependency update.
"hw_supported" should also live in "adev".
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Change to 60s. This matches what we already do in virtualization.
Infinite timeout can lead to deadlocks in the kernel.
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Disable all ip's hw status to false before any hw_init.
Only set it to true until its hw_init is executed.
The old 5.9 branch has this change but somehow the 5.11 kernrel does
not have this fix.
Without this change, sriov tdr have gfx IB test fail.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Review-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_suspend’:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3733:6: warning: variable ‘r’ set but not used [-Wunused-but-set-variable]
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
mmhub ras is only avaiable in cerntain mmhub ip
generation.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
get pf2vf msg info at it's earliest time so that
guest driver can use these info to decide whether
register indirect access enabled.
Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If reset handler is not implemented, reset error before proceeding.
Fixes issue with the following trace -
[ 106.508592] amdgpu 0000:b1:00.0: amdgpu: ASIC reset failed with error, -38 for drm dev, 0000:b1:00.0
[ 106.508972] amdgpu 0000:b1:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 106.509116] [drm] PCIE GART of 512M enabled.
[ 106.509120] [drm] PTB located at 0x0000008000000000
[ 106.509136] [drm] VRAM is lost due to GPU reset!
[ 106.509332] [drm] PSP is resuming...
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add aldebaran to devices which support recovery
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Expose PG/CG set states functions for other clients
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This prefers reset control based handling if it's implemented
for a particular ASIC. If not, it takes the legacy path. It uses
the legacy method of preparing environment (job, scheduler tasks)
and restoring environment.
v2: remove unused variable (Alex)
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Previous tdr design treats the first job in job_timeout as the bad job.
But sometimes a later bad compute job can block a good gfx job and
cause an unexpected gfx job timeout because gfx and compute ring share
internal GC HW mutually.
[How]
This patch implements an advanced tdr mode.It involves an additinal
synchronous pre-resubmit step(Step0 Resubmit) before normal resubmit
step in order to find the real bad job.
1. At Step0 Resubmit stage, it synchronously submits and pends for the
first job being signaled. If it gets timeout, we identify it as guilty
and do hw reset. After that, we would do the normal resubmit step to
resubmit left jobs.
2. For whole gpu reset(vram lost), do resubmit as the old way.
v2: squash in build fix (Alex)
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the following coccicheck warning:
drivers/gpu//drm/amd/amdgpu/amdgpu_ras.c:434:9-17: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:220:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:249:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/df_v3_6.c:208:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_psp.c:2973:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:75:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:112:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:58:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:93:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:125:9-17: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:52:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:71:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:140:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:164:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:186:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:208:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_atombios.c:1916:8-16: WARNING:
use scnprintf or sprintf
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[what]
currently driver recover vram after full access, which may hit
a corner case that meanwhile another whole gpu reset may be
triggered by another VF, which will cause vram recover fail
then fail the whole device reset.
[how]
move the recover vram into full access. So another bad VF will
not disturb the recover sequence for this vf.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed by: Monk.Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GFX is in gfxoff mode during s0ix so we shouldn't need to
actually tear anything down and restore it.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We handle it properly within the CG/PG functions directly
now.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Not needed as the device is in gfxoff state so the CG/PG state
is handled just like it would be for gfxoff during runtime gfxoff.
This should also prevent delays on resume.
Reworked from Pratik's original patch (Alex)
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Provide and explanation as to why we skip GFX and PSP for
S0ix. GFX goes into gfxoff, same as runtime, so no need
to tear down and re-init. PSP is part of the always on
state, so no need to touch it.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The SMU expects CGPG to be enabled when entering S0ix.
with this we can re-enable SMU suspend.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This really needs to be done to properly tear down
the device. SMC, PSP, and GFX are still problematic,
need to dig deeper into what aspect of them that is
problematic.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No functional change.
v2: use correct dev
v3: rework
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move the non-DC specific code into the DCE IP blocks similar
to how we handle DC. This cleans up the common suspend
and resume pathes.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Set flags at the top level pmops callbacks to track
state. This cleans up the current set of flags and
properly handles S4 on S0ix capable systems.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
During system hibernation suspend still need un-gate gfx CG/PG firstly to handle HW
status check before HW resource destory.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There's no need to keep vgaswitcheroo around for HG
systems. They don't use muxes and their power control
is handled via ACPI.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 33cf440d59.
And it will enable mode-2 gpu reset for vangogh,
it asks PSP firmware version is 00.1A.00.0F or newer.
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When recovery thread has begun GPU reset, there should be not other
threads to access hardware, otherwise system randomly hang.
v2 (chk): rewritten from scratch, use trylock and lockdep instead of
hand wiring the logic.
v3: add in_irq check
v4: change to check in_task
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We set the same variable a few lines above. Drop the duplicate
setting.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is to fix the case where it only enable the light SMU
on normal device init. This feature actually need to be enabled after ASIC
been reset as well.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It was leftover from radeon where it was required for some
specific old hardware. It hasn't been required for ages
and the driver already falls back to MMIO when legacy IO
is not available. Legacy IO also seems to be problematic on
on some thunderbolt devices. Drop it.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
Interrupts on are non-reentrant on linux. This is just an ancient
leftover from radeon where irq processing was kicked of from different
places.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
cause:
It is necessary to send ras disable command to ras-ta during gfx
block ras later init, because the ras capability is disable read
from vbios for vega20 gaming, but the ras context is released
during ras init process, this will cause send ras disable command
to ras-to failed.
how:
Delay releasing ras context, the ras context
will be released after gfx block later init done.
Changed from V1:
move release_ras_context into ras_resume
Changed from V2:
check BIT(UMC) is more reasonable before access eeprom table
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is a spelling mistake in a drm debug message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SMU introduce the new interface to enable light Secondary Bus Reset mode, driver
enable it on passthrough + XGMI configuration
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In passthrough configuration, hypervisior will trigger the SBR(Secondary bus reset) to the devices
without sync to each other. This could cause device hang since for XGMI configuration, all the devices
within the hive need to be reset at a limit time slot. This serial of patches try to solve this issue
by co-operate with new SMU which will only do minimum house keeping to response the SBR request but don't
do the real reset job and leave it to driver. Driver need to do the whole sw init and minimum HW init
to bring up the SMU and trigger the reset(possibly BACO) on all the ASICs at the same time
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The gmc.xgmi.head list originally is designed for device list in the XGMI hive. Mix use it
for reset purpose will prevent the reset function to adjust XGMI device list which is required
in next change
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu driver may be in reset state during init which will not initialize the kfd,
driver need to initialize the KFD after reset by check the flag
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
driver should use the gfx_info atomfirmware interface
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use MSG_GfxDriverReset for mode reset and retire MSG_Mode1Reset.
Centralize soc15_asic_mode1_reset() and nv_asic_mode1_reset()functions.
Add mode2_reset_is_support() for smu->ppt_funcs.
Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Set ip blocks and asic family id
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Parses asic configurations stored in gpu_info firmware and make them available
for driver to use.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add aldebaran in amdgpu_asic_name array and amdgpu_asic_type enum
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When unloading driver after killing some applications, it will hit sdma
flush tlb job timeout which is called by ttm_bo_delay_delete. So
to avoid the job submit after fence driver fini, call ttm_bo_lock_delayed_workqueue
before fence driver fini. And also put drm_sched_fini before waiting fence.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
when try to shutdown guest vm in sriov mode, virt data
exchange is not fini. After vram lost, trying to write
vram could hang cpu.
[How]
add fini virt data exchange in ip_suspend
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Jack Zhang <Jack.Zhang1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sienna cichlid needs one vf mode which allows vf to set and get
clock status from guest vm. So now expose the required interface
and allow some smu request on VF mode. Also since this asic blocked
direct MMIO access, use KIQ to send SMU request under sriov vf.
OD use same command as getting pp table which is not allowed for
sienna cichlid, so remove OD feature under sriov vf.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Monk Liu<monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If the number of badpage records exceed the threshold, driver has
updated both epprom header and control->tbl_hdr.header before gpu reset,
therefore GPU recovery thread no need to read epprom header directly.
v2: merge amdgpu_ras_check_err_threshold into amdgpu_ras_eeprom_check_err_threshold
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move gpu_reset_counter after drm_sched_stop to avoid race
condition caused by job submitted between reset_count +1 and
drm_sched_stop.
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the shutdown and poweroff opt on the s0i3 system we still need
un-gate the gfx clock gating and power gating before destory amdgpu device.
Fixes: 628c36d7b2 ("drm/amdgpu: update amdgpu device suspend/resume sequence for s0i3 support")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1499
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
As dimgrey_cavefish driver is stable enough, set gpu recovery as default
in HW hang for dimgrey_cavefish.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
the flag used by kfd is not actually related to fbcon, it just happens
to align. Use the runpm flag instead so that we can decouple it from
the fbcon flag.
v2: fix resume as well
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
These are already called in amdgpu_device_suspend/resume which
are already called in the same functions.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows us to use generic PCI reset mechanisms (FLR, SBR) as
a reset mechanism to verify that the generic PCI reset mechanisms
are working properly.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Until the issues in the SMU firmware are fixed.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Fix a racing issue when jobs on 2 rings timeout simultaneously.
If 2 rings timed out at the same time, the
amdgpu_device_gpu_recover will be reentered. Then the
adev->gmc.xgmi.head will be grabbed by 2 local linked list,
which may cause wild pointer issue in iterating.
lock the device earily to prevent the node be added to 2
different lists.
also increase karma for the skipped job since the job is also
timed out and should be guilty.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The ip discovery is supported on green sardine, it doesn't need gpu info
firmware anymore.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add PCIE_SPEED_32_0GT and PCIE GEN5 support for amdgpu.
Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amd-drm-next-5.12-2021-01-20:
amdgpu:
- Fix non-x86 build
- W=1 fixes from Lee Jones
- Enable GPU reset on Navy Flounder
- Kernel doc fixes
- SMU workload profile fixes for APUs
- Display updates
- SR-IOV fixes
- Vangogh SMU feature enablment and bug fixes
- GPU reset support for Vangogh
- Misc cleanups
Conflicts:
drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c
Resolve the conflict by picking the initialization value from amd from
f03e80d2e8 ("drm/amd/display: Initialize stack variable") over the
one Linus picked in 61d791365b ("drm/amd/display: avoid
uninitialized variable warning"). It shouldn't matter.
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210120060951.22600-1-alexander.deucher@amd.com
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
UAPI Changes:
- Fix fourcc macro for amlogic video fbc.
Cross-subsystem Changes:
- Export pci_rebar_bytes_to_size.
- Add a PCI quirk to increase bar0 for RX 5600 XT Pulse to max possible size.
- Convert devicetree bindings to use the OF graph schema.
- Update s6e63m0 bindings.
- Make omapfb2 DSI_CM incompatible with drm/omap2 DSI-CM because of
module conflicts.
- Add Zack Rusin as vmwgfx maintainer.
- Add CONFIG_DMABUF_DEBUG for validating dma-buf users don't loo kat struct page when importing or detaching.
Core Changes:
- Remove references to drm_device.pdev
- Fix regression in ttm_bo_move_to_lru_tail().
- Assorted docbook updates.
- Do not send dp-mst hotplug events on error when probing.
- Move some agp macros to agpsupport.c, so it's not always compiled.
- Move drm_need_swiotlb.h to drm_cache.c
- Only build drm_memory.o for legacy drivers, and move CONFIG_DRM_VM to legacy.
- Nuke drm_device.hose
- Warn when the ttm resource manager is non-empty when disabling.
- Assorted small fixes.
Driver Changes:
- Small assorted fixes in radeon, v3d, hisilicon, mipi-dbi, panfrost, hibmc, vc4, amdgpu, vkms, vmwgfx.
- Move hisilicon to use simple encode.
- Add writeback connector to vkms.
- Add support for BT2020 to DE3.
- Use gem prime mmap helpers in vc4, and move the mmap function upwards.
- Use managed drm device, and cleanup error paths and display registers in vmwgfx.
- Use correct bus_format and connector_type for innolux_n116bge.
- Fix a lot of warnings with W=1 (Lee Jones)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAmAGxDYACgkQ/lWMcqZw
E8NLbw//XnkthxvMTK1y/eoBqJ0yFrO8YIcECZMW3Gxte7nPeXILoMfWTcDJIXh7
jBYi0+Ko12bhnvSka1L1U2VJ3rnu8JGt2K+kWK8Jz4hZxd9wmaARL20Mhli2o+81
cboIQsEo9a6uiQs1BfLSFIki9MYSK69o0b0K3nw1X+XMhLVWOE7zgy0pVxsCKpBG
8mnwMfX7G7hrwivrGAObKTiVWigteZgUlxACnSqP63zXg1OQtMMzk/PiCqZfH38T
8ODiH/Y2Aq+6TUInag2EAsWIMcIkHadyEv7C8GAeGfNBpTg6oknPpFyPV9RSG+aP
Ofk18S6VAZO1uV/ynKwOcQqdU4J72k6nnkDM2mjfmc1sblauVYXEvm73RdBd/oGC
bPDX2roSxqsS+pUvI5Tj8CqahAPy925hOxTcK7/yky/sXfdNtC8ZJ5cmawGocQhZ
+8eDRgTsf7P7aYb7hdUGd5c2JkvMOt87REZeccRk1Ovrh/Z++D0936g/FAUpuDmL
pUppBEyptT/13fPsCv7Xe3z5NbopMmDR0gmeZ6/0FzV8S5Pjkb5nYaryfJU4oJqD
lE5AGazySuqOMb4wUqTURRuZsoRLswLaB+LS6zS55BEAYXHRt6OQvQXYrggRczi+
ie//r4L+qWaNkGO9ZyAQ4dyE46Lfp5KmJvQgqTC1r1ledh0xZt0=
=JIou
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2021-01-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.12:
UAPI Changes:
- Fix fourcc macro for amlogic video fbc.
Cross-subsystem Changes:
- Export pci_rebar_bytes_to_size.
- Add a PCI quirk to increase bar0 for RX 5600 XT Pulse to max possible size.
- Convert devicetree bindings to use the OF graph schema.
- Update s6e63m0 bindings.
- Make omapfb2 DSI_CM incompatible with drm/omap2 DSI-CM because of
module conflicts.
- Add Zack Rusin as vmwgfx maintainer.
- Add CONFIG_DMABUF_DEBUG for validating dma-buf users don't loo kat struct page when importing or detaching.
Core Changes:
- Remove references to drm_device.pdev
- Fix regression in ttm_bo_move_to_lru_tail().
- Assorted docbook updates.
- Do not send dp-mst hotplug events on error when probing.
- Move some agp macros to agpsupport.c, so it's not always compiled.
- Move drm_need_swiotlb.h to drm_cache.c
- Only build drm_memory.o for legacy drivers, and move CONFIG_DRM_VM to legacy.
- Nuke drm_device.hose
- Warn when the ttm resource manager is non-empty when disabling.
- Assorted small fixes.
Driver Changes:
- Small assorted fixes in radeon, v3d, hisilicon, mipi-dbi, panfrost, hibmc, vc4, amdgpu, vkms, vmwgfx.
- Move hisilicon to use simple encode.
- Add writeback connector to vkms.
- Add support for BT2020 to DE3.
- Use gem prime mmap helpers in vc4, and move the mmap function upwards.
- Use managed drm device, and cleanup error paths and display registers in vmwgfx.
- Use correct bus_format and connector_type for innolux_n116bge.
- Fix a lot of warnings with W=1 (Lee Jones)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/5c3ad775-48ce-33ee-e4c6-a5e1e540f845@linux.intel.com
Remove unused space_needed variable.
Fixes: 453f617a30 ("drm/amdgpu: Resize BAR0 to the maximum available size, even if it doesn't cover VRAM")
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/413807/
amd-drm-next-5.12-2021-01-08:
amdgpu:
- Rework IH ring handling on vega and navi
- Rework HDP handling for vega and navi
- swSMU documenation updates
- Overdrive support for Sienna Cichlid and newer asics
- swSMU updates for vangogh
- swSMU updates for renoir
- Enable FP16 on DCE8-11
- Misc code cleanups and bug fixes
radeon:
- Fixes for platforms that can't access PCI resources correctly
- Misc code cleanups
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210108221811.3868-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
Enable GPU reset when we encounter a hang.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
For multiple VF, after engine hang,as host driver will first
encounter FLR, so has no meanning to set compute to 60s.
v2:
Refine the patch and comment
Signed-off-by: Emily.Deng <Emily.Deng@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Using struct drm_device.pdev is deprecated. Convert amdgpu to struct
drm_device.dev. No functional changes.
v3:
* rebased
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210107080748.4768-3-tzimmermann@suse.de
Adhere to kernel coding style.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210107080748.4768-2-tzimmermann@suse.de
This allows BAR0 resizing to be done for cards which don't advertise
support for a size large enough to cover the VRAM but which do
advertise at least one size larger than the default. For example,
my RX 5600 XT, which advertises 256MB, 512MB and 1GB.
Signed-off-by: Darren Salt <devspam@moreofthesa.me.uk>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20210107175017.15893-4-nirmoy.das@amd.com
Enable gpu recovery for navy_flounder by default to trigger
reset once needed.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fix bug 210921 where DRM_INFO floods log when hitting an unsupported ASIC in
amdgpu_device_asic_has_dc_support(). This info should be only called once.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=210921
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
UAPI Changes:
- Not necessarily one, but we document that userspace needs to force probe connectors.
Cross-subsystem Changes:
- Require FB_ATY_CT for aty on sparc64.
- video: Fix documentation, and a few compiler warnings.
- Add devicetree bindings for DP connectors.
- dma-buf: Update kernel-doc, and add might_lock for resv objects in begin/end_cpu_access.
Core Changes:
- ttm: Warn when releasing a pinned bo.
- ttm: Cleanup bo size handling.
- cma-helper: Remove prime infix, and implement mmap as GEM CMA functions.
- Split drm_prime_sg_to_page_addr_arrays into 2 functions.
- Add a new api to install irq using devm.
- Update panel kerneldoc to inline style.
- Add DP support to drm/bridge.
- Assorted small fixes to ttm, fb-helper, scheduler.
- Add atomic_commit_setup function callback.
- Automatically use the atomic gamma_set, instead of forcing drivers to declare the default atomic version.
- Allow using degamma for legacy gamma if gamma is not available.
- Clarify that primary/cursor planes are not tied to 1 crtc (depending on possible_crtcs).
- ttm: Cleanup the lru handler.
Driver Changes:
- Add pm support to ingenic.
- Assorted small fixes in radeon, via, rockchip, omap2fb, kmb, gma500, nouveau, virtio, hisilicon, ingenic, s6e63m0 panel, ast, udlfb.
- Add BOE NV110WTM-N61, ys57pss36bh5gq, Khadas TS050 panels.
- Stop using pages with drm_prime_sg_to_page_addr_arrays, and switch all callers to use ttm_sg_tt_init.
- Cleanup compiler and docbook warnings in a lot of fbdev devices.
- Use the drmm_vram_helper in hisilicon.
- Add support for BCM2711 DSI1 in vc4.
- Add support for 8-bit delta RGB panels to ingenic.
- Add documentation on how to test vkms.
- Convert vc4 to atomic helpers.
- Use degamma instead of gamma table in omap, to add support for CTM and color encoding/range properties.
- Rework omap DSI code, and merge all omapdrm modules now that the last omap panel is now a drm panel.
- More refactoring of omap dsi code.
- Enable 10/12 bpc outputs in vc4.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAl/bLtAACgkQ/lWMcqZw
E8M5mBAAs6iA3giF+LrzQzszZqOFmtExwjHsvJXDtiINrVubdRov2XbbaOZ+8Eax
Cnnc5QzokJV8v1IReImz+cP7i4uXyWmwQsthY2WvHYLXZmZfZUV+rK4cCtJIKXlR
9EvkEFtdPZ6OpyJ8p1G0r/UNqXxqOl4pEhp6NZdSj1mscz2yux/BuqX+1tX1j5+G
SIfwcKIY+nIwnVpKwhJplJNfFwthRzENxq2KMt+PGLcpOWLHEgwqzKoR+ddTcKf2
Nk1w2OERbnBVRt8NzCIeyczlSp4QuB+NpE5mFEBedmLatGYqwAIwPMt22AVRP9rc
SdXqjVxg3UJ5UNMYXQwjrnJq1nIm1ViYt7za/e2aMC8g9oI7tNQ5uI7XViO6+tk5
3vQV40pHVfMLGzkzqGhv9qHYAkkc+gPMAXwObuzmYppVVZn8dnrkAco7Ib8WPrRc
g8JD0bpIsOrgqCTj2W54fC/ZP7hQZpffOrxWY7alhlriEl4E7To6OvTwEavIzxCm
ajkO129oItOcATDTGiUV3JRWCTdCj4cNz7cQi4pC2QgaUIy8twjm57V8nnFDUCNg
ifi1dIY24DOrHxdmZ8sAWZRoPzcl5a84c8EqEkGZieEsdr3Za4XMZD8tqI2uSJrU
x+OfqgfvqNZenkuc+SUDZJQ855iqdkzJcqZnkAn61VpPh7XCtxk=
=orEC
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2020-12-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.12:
UAPI Changes:
- Not necessarily one, but we document that userspace needs to force probe connectors.
Cross-subsystem Changes:
- Require FB_ATY_CT for aty on sparc64.
- video: Fix documentation, and a few compiler warnings.
- Add devicetree bindings for DP connectors.
- dma-buf: Update kernel-doc, and add might_lock for resv objects in begin/end_cpu_access.
Core Changes:
- ttm: Warn when releasing a pinned bo.
- ttm: Cleanup bo size handling.
- cma-helper: Remove prime infix, and implement mmap as GEM CMA functions.
- Split drm_prime_sg_to_page_addr_arrays into 2 functions.
- Add a new api to install irq using devm.
- Update panel kerneldoc to inline style.
- Add DP support to drm/bridge.
- Assorted small fixes to ttm, fb-helper, scheduler.
- Add atomic_commit_setup function callback.
- Automatically use the atomic gamma_set, instead of forcing drivers to declare the default atomic version.
- Allow using degamma for legacy gamma if gamma is not available.
- Clarify that primary/cursor planes are not tied to 1 crtc (depending on possible_crtcs).
- ttm: Cleanup the lru handler.
Driver Changes:
- Add pm support to ingenic.
- Assorted small fixes in radeon, via, rockchip, omap2fb, kmb, gma500, nouveau, virtio, hisilicon, ingenic, s6e63m0 panel, ast, udlfb.
- Add BOE NV110WTM-N61, ys57pss36bh5gq, Khadas TS050 panels.
- Stop using pages with drm_prime_sg_to_page_addr_arrays, and switch all callers to use ttm_sg_tt_init.
- Cleanup compiler and docbook warnings in a lot of fbdev devices.
- Use the drmm_vram_helper in hisilicon.
- Add support for BCM2711 DSI1 in vc4.
- Add support for 8-bit delta RGB panels to ingenic.
- Add documentation on how to test vkms.
- Convert vc4 to atomic helpers.
- Use degamma instead of gamma table in omap, to add support for CTM and color encoding/range properties.
- Rework omap DSI code, and merge all omapdrm modules now that the last omap panel is now a drm panel.
- More refactoring of omap dsi code.
- Enable 10/12 bpc outputs in vc4.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/78381a4f-45fd-aed4-174a-94ba051edd37@linux.intel.com
When GFXOFF is enabled and GPU is idle, driver will fail to access some
registers. Therefore change to disable power gating before all access
registers with MMIO.
Dmesg log is as following:
amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
amdgpu: cp queue pipe 4 queue 0 preemption failed
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Change it to check if the device has ACPI power resources.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In preparation for systems that support d3cold on dGPUs
independent of PX/HG. No functional change intended.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
S0ix only makes sense on APUs since they are part of the platform, so
only when the ASIC is APU should set amdgpu_acpi_is_s0ix_supported flag
to deal with the related situation.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Required backmerge since we will be based on top of v5.11, and there
has been a request to backmerge already to upstream some features.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Rename "ring_mirror_list" to "pending_list",
to describe what something is, not what it does,
how it's used, or how the hardware implements it.
This also abstracts the actual hardware
implementation, i.e. how the low-level driver
communicates with the device it drives, ring, CAM,
etc., shouldn't be exposed to DRM.
The pending_list keeps jobs submitted, which are
out of our control. Usually this means they are
pending execution status in hardware, but the
latter definition is a more general (inclusive)
definition.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/405573/
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>
Rename "node" to "list" in struct drm_sched_job,
in order to make it consistent with what we see
being used throughout gpu_scheduler.h, for
instance in struct drm_sched_entity, as well as
the rest of DRM and the kernel.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/403515/
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>
We only need to arbitrate VGA access on VGA compatible devices.
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
fix the null pointer issue when runtime pm is triggered.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- Need skip the RLC/CP/GFX disable for let GFXOFF enter during suspend period.
- For s0i3 suspend only need suspend DCE and each IP interrupt.
- Before VBIOS POSTed check and atom HW INT need set the GPU power status change
to D0 in the resume period, otherwise the HW will be mess up and see the SDMA hang.
- Need handle the GPU reset path during amdgpu device suspend.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:594: warning: Function parameter or member 'reg_addr' not described in 'amdgpu_device_indirect_rreg'
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:624: warning: Function parameter or member 'reg_addr' not described in 'amdgpu_device_indirect_rreg64'
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Avoids confusion in configurations.
v2: fix build when CONFIG_DRM_AMD_DC_DCN is disabled
v3: rebase on latest code
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A kernel-doc markup can't be mixed with a random comment,
as it causes parsing problems.
While here, change an invalid kernel-doc markup into
a common comment.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Running "make htmldocs: produce lots of warnings on those files:
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'p_size' description in 'amdgpu_gtt_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:134: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'p_size' description in 'amdgpu_gtt_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:134: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
They're related to the repacement of some parameters by adev,
and due to a few renamed parameters.
While here, uniform the name of the parameter for it to be
the same on all functions using a pointer to struct amdgpu_device.
Update the kernel-doc documentation accordingly.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a helper so we can set per asic default values. Also,
the module parameter is currently clamped to 8, but clamp it
per asic just in case some asics have different limits in the
future. Enable the option on gfx6,7 as well for consistency.
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to enable display IP block for vangogh platforms.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Current code wrongly treat all cases as job == NULL.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-and-tested-by: Jane Jian <Jane.Jian@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add DM block support for dimgrey_cavefish.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove gpu_info fw support for dimgrey_cavefish, gpu info can be got
from ip discovery.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Same as navi series.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds green_sardine support for gpu_info firmware and ip block setting.
v2: use apu flag
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The default auto setting for kcq should not generate
a warning.
Fixes: a300de40f6 ("drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5)")
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds van gogh support for gpu_info firmware and ip block setting.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds van gogh to amd_asic_type enum and amdgpu_asic_name[].
v2: add missing comma
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
support both direct and indirect accessor in unified
helper functions.
v2: Retire indirect mmio access via mm_index/data
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add helper function in order to remove RREG32/WREG32
in current pcie_rreg/wreg function for soc15 and
onwards adapters.
PCIE_INDEX/DATA pairs are used to access regsiters
outside of mmio bar in the helper functions.
The new helper functions help remove the recursion
of amdgpu_mm_rreg/wreg from pcie_rreg/wreg and
provide the oppotunity to centralize direct and
indirect access in a single function.
v2: Fixed typo and refine the comments
v3: Remove unnecessary volatile local variable
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In FLR routine, init_data_exchange is called at reset_sriov
while fini_data_exchange is not. This will duplicating work
thread.
So call fini_data_exchange before reset for SRIOV
Signed-off-by: Tiecheng Zhou <Tiecheng.Zhou@amd.com>
Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- Refactor the driver code to use amdgpu_virt_read_pf2vf_data
and amdgpu_virt_write_vf2pf_data instead of writing all code in
one function (which is the old amdgpu_virt_init_data_exchange)
- Adding a new transaction method for VF2PF message between host
and guest driver. Guest side will periodically update VF2PF
message in the framebuffer.
In the new header, we include guest ucode information, guest
framebuffer usage, and engine usage
- Clean up the old macros since they will cause compile error if
the new transaction method is used
v2: squash in build fix
Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This will allow us to have different defaults per asic
in a future patch.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove gpu_info fw support for sienna_cichlid etc., since the
information can be retrieved from discovery binary.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We never unmapped the regiser BAR on failure.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Not being able to create amdgpu sysfs attributes
is not a fatal error warranting not to continue
to try to bring up the display. Thus, if we get
an error trying to create amdgpu sysfs attrs,
report it and continue on to try to bring up
a display.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Slava Abramov <slava.abramov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
XGMI support is more complicated than single device support as
questions of synchronization between the device recovering from
PCI error and other members of the hive are required.
Leaving this for next round.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cache the PCI state on boot and before each case where we might
loose it.
v2: Add pci_restore_state while caching the PCI state to avoid
breaking PCI core logic for stuff like suspend/resume.
v3: Extract pci_restore_state from amdgpu_device_cache_pci_state
to avoid superflous restores during GPU resets and suspend/resumes.
v4: Style fixes.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wait for HW/PSP initiated ASIC reset to complete before
starting the recovery operations.
v2: Remove typo
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
DPC recovery involves ASIC reset just as normal GPU recovery so block
SW GPU schedulers and wait on all concurrent GPU resets.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
At this point the ASIC is already post reset by the HW/PSP
so the HW not in proper state to be configured for suspension,
some blocks might be even gated and so best is to avoid touching it.
v2: Rename in_dpc to more meaningful name
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The DRM device is a static member of
the amdgpu device structure and as such
always exists, so long as the PCI and
thus the amdgpu device exist.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When GPU is in reset, its status isn't stable and ring buffer also need
be reset when resuming. Therefore driver should protect GPU recovery
thread from ring buffer accessed by other threads. Otherwise GPU will
randomly hang during recovery.
v2: correct indent
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes below compiler warnings:
CC [M] drivers/gpu/drm/amd/amdgpu/amdgpu_device.o
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:381:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
381 | void static inline amdgpu_mm_wreg_mmio(struct amdgpu_device *adev, uint32_t reg, uint32_t v, uint32_t acc_flags)
| ^~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:381:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_fini’:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3381:6: warning: variable ‘r’ set but not used [-Wunused-but-set-variable]
3381 | int r;
| ^
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize code to iterate less loops in
amdgpu_device_ip_reinit_early_sriov()
Signed-off-by: Jiawei <Jiawei.Gu@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Virtual display is non-atomic so report false to avoid checking
atomic state and other atomic things at runtime.
v2: squash into the sr-iov check
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows us to add asic specific workarounds for atom
asic init while keeping the adev specifics out of the
atombios parser code.
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
a) Embed struct drm_device into struct amdgpu_device.
b) Modify the inline-f drm_to_adev() accordingly.
c) Modify the inline-f adev_to_drm() accordingly.
d) Eliminate the use of drm_device.dev_private,
in amdgpu.
e) Switch from using drm_dev_alloc() to
drm_dev_init().
f) Add a DRM driver release function, which frees
the container amdgpu_device after all krefs on
the contained drm_device have been released.
v2: Split out adding adev_to_drm() into its own
patch (previous commit), making this patch
more succinct and clear. More detailed commit
description.
v3: squash in fix to call drmm_add_final_kfree()
to avoid a warning.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a static inline adev_to_drm() to obtain
the DRM device pointer from an amdgpu_device pointer.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Get the amdgpu_device from the DRM device by use
of an inline function, drm_to_adev(). The inline
function resolves a pointer to struct drm_device
to a pointer to struct amdgpu_device.
v2: Use a typed visible static inline function
instead of an invisible macro.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Change to dynamically create and release hive info object,
which help driver support more hives in the future.
v2:
Change to save hive object pointer in adev, to avoid locking
xgmi_mutex every time when calling amdgpu_get_xgmi_hive.
v3:
1. Change type of hive object pointer in adev from void* to
amdgpu_hive_info*.
2. remove unnecessary variable initialization.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Using dev_xxx instead of DRM_xxx/pr_xxx to indicate which device
of a hive is the message for.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
in single gpu system, if driver reenter gpu recovery,
amdgpu_device_lock_adev will return false, but hive is
nullptr now.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
clients don't need reset-lock for synchronization when no
GPU recovery.
v2:
change to return the return value of down_read_killable.
v3:
if GPU recovery begin, VF ignore FLR notification.
Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
if other threads have holden the reset lock, recovery will
fail to try_lock. Therefore we introduce atomic hive->in_reset
and adev->in_gpu_reset, to avoid reentering GPU recovery.
v2:
drop "? true : false" in the definition of amdgpu_in_reset
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only for no job running test case need to do recover in
flr notification.
For having job in mirror list, then let guest driver to
hit job timeout, and then do recover.
Signed-off-by: jqdeng <Emily.Deng@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The whole approach wasn't thought through till the end.
We already had a reset lock like this in the past and it caused the same problems like this one.
Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.
This reverts commit df9c8d1aa2.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cover the implementation details from outside(of power). Also preparing
for expanding this to swSMU.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Backmerging drm-next into drm-misc-next for nouveau and panel updates.
Resolves a conflict between ttm and nouveau, where struct ttm_mem_res got
renamed to struct ttm_resource.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Assigning false to block->status.hw overwrites PSP's previous
hardware status, which causes the PSP to Resume operation after
hardware init.
Remove this assignment and let the PSP execute Resume operation
when it is told to.
v2: Remove the braces.
v3: Modify the description.
Signed-off-by: Liu ChengZhe <ChengZhe.Liu@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is a spelling mistake in a dev_warn message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's related to the memory manager so move it there.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
what:
the MQD's save and restore of KCQ (kernel compute queue)
cost lots of clocks during world switch which impacts a lot
to multi-VF performance
how:
introduce a paramter to control the number of KCQ to avoid
performance drop if there is no kernel compute queue needed
notes:
this paramter only affects gfx 8/9/10
v2:
refine namings
v3:
choose queues for each ring to that try best to cross pipes evenly.
v4:
fix indentation
some cleanupsin the gfx_compute_queue_acquire()
v5:
further fix on indentations
more cleanupsin gfx_compute_queue_acquire()
TODO:
in the future we will let hypervisor driver to set this paramter
automatically thus no need for user to configure it through
modprobe in virtual machine
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When GPU executes recovery and retriving bad GPU tag
from external eerpom device, the recovery will be broken
and error message is printed as well for user's awareness.
v2: Refine warning message in threshold reaching case, and
fix spelling typo.
v3: Fix explicit calling of bad gpu.
v4: Rename function names.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When retrieving bad gpu tag from eeprom, GPU init should
fail as the GPU needs to be retired for further check.
v2: Fix spelling typo, correct the condition to detect
bad gpu tag and refine error message.
v3: Refine function argument name.
v4: Fix missing check of returning value of i2c
initialization error case.
v5: Use dev_err to print PCI information in dmesg instead
of DRM_ERROR.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Assigning false to block->status.hw overwrites PSP's previous
hardware status, which causes the PSP to Resume operation after
hardware init.
Remove this assignment and let the PSP execute Resume operation
when it is told to.
v2: Remove the braces.
v3: Modify the description.
Signed-off-by: Liu ChengZhe <ChengZhe.Liu@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
amdgpu_device.c requires changes for SI chipsets support
si.c require changes for Display Manager IP block enabling
[How]
amdgpu_device.c: add SI families in amdgpu_device_asic_has_dc_support()
si.c: changes in si_set_ip_blocks() for Display Manager IP blocks enablement
(v1) NOTE: As per Kaveri and older amdgpu.dc=1 kernel cmdline is required
(v2) fix for bc011f9350 ("drm/amdgpu: Change SI/CI gfx/sdma/smu init sequence")
remove CHIP_HAINAN support since it does not have physical DCE6 module
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover,
the atomic adev->in_gpu_reset and hive->in_reset are used to avoid
re-entering GPU recovery.
During GPU reset and resume, it is unsafe that other threads access GPU,
which maybe cause GPU reset failed. Therefore the new rw_semaphore
adev->reset_sem is introduced, which protect GPU from being accessed by
external threads during recovery.
v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in kfd
driver.
3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid
re-enter GPU recovery for the same GPU hang.
v3:
1. change back to use adev->reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all codes, for example:
free_mqd must be called outside of dqm_lock;
[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249] dump_stack+0x98/0xd5
[ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833] free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831] ksys_ioctl+0x98/0xb0
[ 1230.204004] __x64_sys_ioctl+0x1a/0x20
[ 1230.205174] do_syscall_64+0x5f/0x250
[ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe
2. remove try_lock and introduce atomic hive->in_reset, to avoid
re-enter GPU recovery.
v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove comment codes in amdgpu_device.c
3. add more detailed comment in commit message
4. define a wrap function amdgpu_in_reset
v5:
1. Fix some style issues.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tukov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Plumb DC support for navy flounder through.
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add the asic family and IP blocks for navy flounder.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If we are in RAS triggered situation and
BACO isn't support, emergency restart is needed,
and this code is only needed for some specific
cases(vega20 with given smu fw version).
After we add smu mode1 reset for sienna cichlid, we
need to share AMD_RESET_METHOD_MODE1 with psp mode1 reset,
so in amdgpu_device_gpu_recover, we need differentiate
which mode1 reset we are using, then decide if it's
a full reset and then decide if emergency restart is needed,
the logic will become much more complex.
After discussion with Hawking, move emergency restart logic
to an independent function.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cleanup of phase1 suspend code to reduce unnecessary indentation.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 2eee0229f6.
Fallback to a stable base until we have a correct new one
Signed-off-by:Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
issue:
originally we kickoff IB test asynchronously with driver's
init, thus
the IB test may still running when the driver loading
done (modprobe amdgpu done).
if we shutdown VM immediately after amdgpu driver
loaded then GPU may
hang because the IB test is still running
fix:
flush the delayed_init routine at the bottom of device_init
to avoid driver loading done prior to the IB test completes
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After we move request full access before set
ip blocks, we can merge atombios init block
of SRIOV and baremetal together.
Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>