Commit Graph

821 Commits

Author SHA1 Message Date
Yong Zhao
8bdab6bb1c drm/amdgpu: Increase timout on emulator to tenfold instead of twice
Since emulators are slower, sometime some operations like flushing tlb
through FM need more than twice the regular timout of 100ms, so increase
the timeout to 1s on emulators.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26 14:21:03 -05:00
Dave Airlie
1b245ec5b6 drm-misc-next for 5.7:
UAPI Changes:
   - lima: Add support for heap buffers
 
 Cross-subsystem Changes:
 
 Core Changes:
   - Implement mode_config mode_valid for memory constrained drivers
   - Bus format negociation between bridges
   - Consolidate fake vblank events for drivers without vblank interrupts
   - drm/bufs: dma_alloc related cleanups
   - drm/dp_mst: Various fixes
   - drm/print: New drm_device based print helpers
   - Thomas is a drm-misc maintainer now!
 
 Driver Changes:
   - DPMS cleanups for atomic drivers
   - Removal of owner field in SPI tinydrm drivers
   - Removal of explicit dependency on DT for tinydrm drivers
   - Conversion to YAML schemas for DT bindings
   - tidss: New driver
   - virtio: various reworks and fixes
   - Our usual dozen or so new panels or bridges
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXkEjOgAKCRDj7w1vZxhR
 xeaDAQD+1MludG4RmfQhATe4jTsPC1r2x63OF2CA0ChMGHXJyQEA8qqQ+8y1Cd/u
 PZ3PpcTl4qYYHgzJ6FwW7kDPTvlaZQE=
 =IJAt
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2020-02-10' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 5.7:

UAPI Changes:
  - lima: Add support for heap buffers

Cross-subsystem Changes:

Core Changes:
  - Implement mode_config mode_valid for memory constrained drivers
  - Bus format negociation between bridges
  - Consolidate fake vblank events for drivers without vblank interrupts
  - drm/bufs: dma_alloc related cleanups
  - drm/dp_mst: Various fixes
  - drm/print: New drm_device based print helpers
  - Thomas is a drm-misc maintainer now!

Driver Changes:
  - DPMS cleanups for atomic drivers
  - Removal of owner field in SPI tinydrm drivers
  - Removal of explicit dependency on DT for tinydrm drivers
  - Conversion to YAML schemas for DT bindings
  - tidss: New driver
  - virtio: various reworks and fixes
  - Our usual dozen or so new panels or bridges

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200210093421.xu4sofldm6wm6xq6@gilmour.lan
2020-02-21 05:44:40 +10:00
Rajneesh Bhardwaj
9593f4d6a6 drm/amdkfd: refactor runtime pm for baco
So far the kfd driver implemented same routines for runtime and system
wide suspend and resume (s2idle or mem). During system wide suspend the
kfd aquires an atomic lock that prevents any more user processes to
create queues and interact with kfd driver and amd gpu. This mechanism
created problem when amdgpu device is runtime suspended with BACO
enabled. Any application that relies on kfd driver fails to load because
the driver reports a locked kfd device since gpu is runtime suspended.

However, in an ideal case, when gpu is runtime  suspended the kfd driver
should be able to:

 - auto resume amdgpu driver whenever a client requests compute service
 - prevent runtime suspend for amdgpu  while kfd is in use

This change refactors the amdgpu and amdkfd drivers to support BACO and
runtime power management.

Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:00:54 -05:00
Christian König
c12b84d6e0 drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2
This should speed up debugging VRAM access a lot.

v2: add HDP flush/invalidate

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-07 11:45:25 -05:00
Christian König
ce05ac56e6 drm/amdgpu: optimize amdgpu_device_vram_access a bit.
Only write the _HI register when necessary.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-07 11:45:17 -05:00
Jack Zhang
86b93fd62d drm/amdgpu/sriov Don't send msg when smu suspend
For sriov and pp_onevf_mode, do not send message to set smu
status, because smu doesn't support these messages under VF.

Besides, it should skip smu_suspend when pp_onevf_mode is disabled.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-07 11:44:24 -05:00
Alex Deucher
2cb44fb093 drm/amdgpu: enable GPU reset by default on renoir
Everything is in place.

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-27 16:46:45 -05:00
Alex Deucher
658c663947 drm/amdgpu: enable GPU reset by default on Navi
Has been working fine for a while.

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-27 16:46:45 -05:00
Chris Wilson
7e13ad8964 drm: Avoid drm_global_mutex for simple inc/dec of dev->open_count
Since drm_global_mutex is a true global mutex across devices, we don't
want to acquire it unless absolutely necessary. For maintaining the
device local open_count, we can use atomic operations on the counter
itself, except when making the transition to/from 0. Here, we tackle the
easy portion of delaying acquiring the drm_global_mutex for the final
release by using atomic_dec_and_mutex_lock(), leaving the global
serialisation across the device opens.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Thomas Hellström (VMware) <thomas_os@shipmail.org>
Reviewed-by: Thomas Hellström <thellstrom@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124130107.125404-1-chris@chris-wilson.co.uk
2020-01-24 17:41:34 +00:00
Nirmoy Das
a9d4fe2fd6 drm/amdgpu: remove unnecessary conversion to bool
Better clean that up before some automation starts to complain about it

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:55:27 -05:00
chen gong
c68dbcd8f9 drm/amdgpu: add kiq version interface for RREG32/WREG32
Reading some registers by mmio will result in hang when GPU is in
"gfxoff" state.This problem can be solved by GPU in "ring command
packages" way.

Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:34:22 -05:00
chen gong
d33a99c4b6 drm/amdgpu: provide a generic function interface for reading/writing register by KIQ
Move amdgpu_virt_kiq_rreg/amdgpu_virt_kiq_wreg function to amdgpu_gfx.c,
and rename them to amdgpu_kiq_rreg/amdgpu_kiq_wreg.Make it generic and
flexible.

Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22 16:34:14 -05:00
Pan, Xinhui
bd05221123 drm/amdgpu: add the lost mutex_init back
Initialize notifier_lock.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/1016
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-17 16:03:09 -05:00
Hawking Zhang
e9d4cf918f drm/amdgpu: add arcturus to gpu recovery check code path
support check if dirver should try gpu recovery for
arcturus

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16 13:40:54 -05:00
Evan Quan
9530273ec9 drm/amd/powerplay: cover the powerplay implementation details V3
This can save users much troubles. As they do not
actually need to care whether swSMU or traditional
powerplay routine should be used.

V2: apply the fixes to vi.c and cik.c also
V3: squash in oops fix

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-14 10:18:08 -05:00
Jack Zhang
895bd048fb amd/amdgpu/sriov tdr enablement with pp_onevf_mode
Under sriov and pp_onevf mode,
1.take resume instead of hw_init for smc recover to avoid
potential memory leak.

2.add return condition inside smc resume function for
sriov_pp_onevf_mode and pm_enabled param.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-07 11:58:19 -05:00
zhengbin
2a9b90ae47 drm/amdgpu: use true, false for bool variable in amdgpu_device.c
Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3961:1-19: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3981:1-19: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-23 15:00:01 -05:00
Ma Feng
e3c00faa7a drm/amdgpu: Remove unneeded variable 'ret' in amdgpu_device.c
Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1036:5-8: Unneeded variable: "ret". Return "0" on line 1079

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-23 14:58:56 -05:00
Jane Jian
d83c7a07a7 drm/amdgpu: update VCN1(dual instances) fw types ID and VCN ip block type
Previously there is no VCN1 type ID in psp gfx interface. Also add VCN ip
block type unless the reinit after FLR for sriov would fail.

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:33:26 -05:00
Andrey Grodzovsky
c96cf2823d drm/amdgpu: Switch from system_highpri_wq to system_unbound_wq
This is to avoid queueing jobs to same CPU during XGMI hive reset
because there is a strict timeline for when the reset commands
must reach all the GPUs in the hive.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:13 -05:00
Andrey Grodzovsky
c6a6e2db99 drm/amdgpu: Redo XGMI reset synchronization.
Use task barrier in XGMI hive to synchronize ASIC resets
across devices in XGMI hive.

v2: Return right away with a warning if no xgmi hive, update doc.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:13 -05:00
Andrey Grodzovsky
041a62bc06 drm/amdgpu: reverts commit ce316fa55e.
In preparation for doing XGMI reset synchronization using task barrier.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Monk Liu
5a7489a7e1 drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV
issues:
MEC is ruined by the amdkfd_pre_reset after VF FLR done

fix:
amdkfd_pre_reset() would ruin MEC after hypervisor finished the VF FLR,
the correct sequence is do amdkfd_pre_reset before VF FLR but there is
a limitation to block this sequence:
if we do pre_reset() before VF FLR, it would go KIQ way to do register
access and stuck there, because KIQ probably won't work by that time
(e.g. you already made GFX hang)

so the best way right now is to simply remove it.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Nirmoy Das
f880799d7f amd/amdgpu: add sched array to IPs with multiple run-queues
This sched array can be passed on to entity creation routine
instead of manually creating such sched array on every context creation.

v2: squash in missing break fix

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Nirmoy Das
0c88b43032 drm/amdgpu: replace vm_pte's run-queue list with drm gpu scheds list
drm_sched_entity_init() takes drm gpu scheduler list instead of
drm_sched_rq list. This makes conversion of drm_sched_rq list
to drm gpu scheduler list unnecessary

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Emily Deng
8973d9ec8f drm/amdgpu/sriov: Tonga sriov also need load firmware with smu
Fix Tonga sriov load driver fail issue.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewd-by Yintian Tao <Yintian.tao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:06 -05:00
Yong Zhao
d7f72fe482 drm/amdgpu: Add CU info print log
The log will be useful for easily getting the CU info on various
emulation models or ASICs.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:06 -05:00
Simon Ser
93b09a9a89 drm/amdgpu: log when amdgpu.dc=1 but ASIC is unsupported
This makes it easier to figure out whether the kernel parameter has been
taken into account.

Signed-off-by: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-11 15:22:08 -05:00
Yintian Tao
c9ffa427db drm/amd/powerplay: enable pp one vf mode for vega10
Originally, due to the restriction from PSP and SMU, VF has
to send message to hypervisor driver to handle powerplay
change which is complicated and redundant. Currently, SMU
and PSP can support VF to directly handle powerplay
change by itself. Therefore, the old code about the handshake
between VF and PF to handle powerplay will be removed and VF
will use new the registers below to handshake with SMU.
mmMP1_SMN_C2PMSG_101: register to handle SMU message
mmMP1_SMN_C2PMSG_102: register to handle SMU parameter
mmMP1_SMN_C2PMSG_103: register to handle SMU response

v2: remove module parameter pp_one_vf
v3: fix the parens
v4: forbid vf to change smu feature
v5: use hwmon_attributes_visible to skip sepicified hwmon atrribute
v6: change skip condition at vega10_copy_table_to_smc

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-11 15:22:07 -05:00
Le Ma
00eaa57172 drm/amdgpu: clear err_event_athub flag after reset exit
Otherwise next err_event_athub error cannot call gpu reset. And following
resume sequence will not be affected by this flag.

v2: create function to clear amdgpu_ras_in_intr for modularity of ras driver

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-05 16:26:11 -05:00
Le Ma
b823821f22 drm/amdgpu: support full gpu reset workflow when ras err_event_athub occurs
This athub fatal error can be recovered by baco without system-level reboot,
so add a mode to use baco for the recovery. Not affect the default psp reset
situations for now.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-05 16:26:03 -05:00
Le Ma
ce316fa55e drm/amdgpu: add concurrent baco reset support for XGMI
Currently each XGMI node reset wq does not run in parrallel if bound to same
cpu. Make change to bound the xgmi_reset_work item to different cpus.

XGMI requires all nodes enter into baco within very close proximity before
any node exit baco. So schedule the xgmi_reset_work wq twice for enter/exit
baco respectively.

To use baco for XGMI, PMFW supported for baco on XGMI needs to be involved.

The case that PSP reset and baco reset coexist within an XGMI hive never exist
and is not in the consideration.

v2: define use_baco flag to simplify the code for xgmi baco sequence

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-05 16:25:53 -05:00
Le Ma
7a22677b95 drm/amdgpu: enable/disable doorbell interrupt in baco entry/exit helper
This operation is needed when baco entry/exit for ras recovery

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-05 16:25:45 -05:00
Emily Deng
0ea203a912 drm/amdgpu/sriov: No need the event 3 and 4 now
As will call unload kms when initialize fail, and the unload kms will
send event 3 and 4, so don't need event 3 and 4 in device init.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-02 17:39:28 -05:00
Yintian Tao
7c868b592d drm/amdgpu: not remove sysfs if not create sysfs
When load amdgpu failed before create pm_sysfs and ucode_sysfs,
the pm_sysfs and ucode_sysfs should not be removed.
Otherwise, there will be warning call trace just like below.
[   24.836386] [drm] VCE initialized successfully.
[   24.841352] amdgpu 0000:00:07.0: amdgpu_device_ip_init failed
[   25.370383] amdgpu 0000:00:07.0: Fatal error during GPU init
[   25.889575] [drm] amdgpu: finishing device.
[   26.069128] amdgpu 0000:00:07.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[   26.070110] [drm:gfx_v9_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[   26.200309] [TTM] Finalizing pool allocator
[   26.200314] [TTM] Finalizing DMA pool allocator
[   26.200349] [TTM] Zone  kernel: Used memory at exit: 0 KiB
[   26.200351] [TTM] Zone   dma32: Used memory at exit: 0 KiB
[   26.200353] [drm] amdgpu: ttm finalized
[   26.205329] ------------[ cut here ]------------
[   26.205330] sysfs group 'fw_version' not found for kobject '0000:00:07.0'
[   26.205347] WARNING: CPU: 0 PID: 1228 at fs/sysfs/group.c:256 sysfs_remove_group+0x80/0x90
[   26.205348] Modules linked in: amdgpu(OE+) gpu_sched(OE) ttm(OE) drm_kms_helper(OE) drm(OE) i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache binfmt_misc snd_hda_codec_generic ledtrig_audio crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hda_core snd_hwdep snd_pcm snd_timer input_leds snd joydev soundcore serio_raw pcspkr evbug aesni_intel aes_x86_64 crypto_simd cryptd mac_hid glue_helper sunrpc ip_tables x_tables autofs4 8139too psmouse 8139cp mii i2c_piix4 pata_acpi floppy
[   26.205369] CPU: 0 PID: 1228 Comm: modprobe Tainted: G           OE     5.2.0-rc1 #1
[   26.205370] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   26.205372] RIP: 0010:sysfs_remove_group+0x80/0x90
[   26.205374] Code: e8 35 b9 ff ff 5b 41 5c 41 5d 5d c3 48 89 df e8 f6 b5 ff ff eb c6 49 8b 55 00 49 8b 34 24 48 c7 c7 48 7a 70 98 e8 60 63 d3 ff <0f> 0b eb d7 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[   26.205375] RSP: 0018:ffffbee242b0b908 EFLAGS: 00010282
[   26.205376] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[   26.205377] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff97ad6f817380
[   26.205377] RBP: ffffbee242b0b920 R08: ffffffff98f520c4 R09: 00000000000002b3
[   26.205378] R10: ffffbee242b0b8f8 R11: 00000000000002b3 R12: ffffffffc0e58240
[   26.205379] R13: ffff97ad6d1fe0b0 R14: ffff97ad4db954c8 R15: ffff97ad4db7fff0
[   26.205380] FS:  00007ff3d8a1c4c0(0000) GS:ffff97ad6f800000(0000) knlGS:0000000000000000
[   26.205381] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.205381] CR2: 00007f9b2ef1df04 CR3: 000000042aab8001 CR4: 00000000003606f0
[   26.205384] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   26.205385] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   26.205385] Call Trace:
[   26.205461]  amdgpu_ucode_sysfs_fini+0x18/0x20 [amdgpu]
[   26.205518]  amdgpu_device_fini+0x3b4/0x560 [amdgpu]
[   26.205573]  amdgpu_driver_unload_kms+0x4f/0xa0 [amdgpu]
[   26.205623]  amdgpu_driver_load_kms+0xcd/0x250 [amdgpu]
[   26.205637]  drm_dev_register+0x12b/0x1c0 [drm]
[   26.205695]  amdgpu_pci_probe+0x12a/0x1e0 [amdgpu]
[   26.205699]  local_pci_probe+0x47/0xa0
[   26.205701]  pci_device_probe+0x106/0x1b0
[   26.205704]  really_probe+0x21a/0x3f0
[   26.205706]  driver_probe_device+0x11c/0x140
[   26.205707]  device_driver_attach+0x58/0x60
[   26.205709]  __driver_attach+0xc3/0x140

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-02 17:39:05 -05:00
Alex Deucher
de185019bc drm/amdgpu: move pci handling out of pm ops
The documentation says the that PCI core handles this
for you unless you choose to implement it.  Just rely
on the PCI core to handle the pci specific bits.

Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-26 14:51:03 -05:00
Alex Deucher
3840c5bcc2 drm/amdgpu: disentangle runtime pm and vga_switcheroo
Originally we only supported runtime pm on PX/HG laptops
so vga_switcheroo and runtime pm are sort of entangled.

Attempt to logically separate them.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 16:42:53 -05:00
Alex Deucher
361dbd01a1 drm/amdgpu: add helpers for baco entry and exit
BACO - Bus Active, Chip Off

Will be used for runtime pm.  Entry will enter the BACO
state (chip off).  Exit will exit the BACO state (chip on).

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 16:42:52 -05:00
Alex Deucher
31af062acf drm/amdgpu: rename amdgpu_device_is_px to amdgpu_device_supports_boco (v2)
BACO - Bus Active, Chip Off
BOCO - Bus Off, Chip Off

To better match what we are checking for and to align with
amdgpu_device_supports_baco.

BOCO is used on PowerXpress/Hybrid Graphics systems and BACO
is used on desktop dGPU boards.

v2: fix typo in documentation

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 16:42:50 -05:00
Alex Deucher
a69cba42b1 drm/amdgpu: add a amdgpu_device_supports_baco helper
BACO - Bus Active, Chip Off

To check if a device supports BACO or not.  This will be
used in determining when to enable runtime pm.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 16:42:49 -05:00
Yintian Tao
d0d13fe874 drm/amdgpu: put flush_delayed_work at first
There is one regression from 042f3d7b745cd76aa
To put flush_delayed_work after adev->shutdown = true
which will make amdgpu_ih_process not response the irq
At last, all ib ring tests will be failed just like below

[drm] amdgpu: finishing device.
[drm] Fence fallback timer expired on ring gfx
[drm] Fence fallback timer expired on ring comp_1.0.0
[drm] Fence fallback timer expired on ring comp_1.1.0
[drm] Fence fallback timer expired on ring comp_1.2.0
[drm] Fence fallback timer expired on ring comp_1.3.0
[drm] Fence fallback timer expired on ring comp_1.0.1
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.1.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.2.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.3.1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma0 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma1 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on uvd_enc_0.0 (-110).
amdgpu 0000:00:07.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vce0 (-110).
[drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110).

v2: replace cancel_delayed_work_sync() with flush_delayed_work()

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 10:12:51 -05:00
Leo Liu
52f2e779ad drm/amdgpu: add driver support for JPEG2.0 and above
By using JPEG IP block type

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19 10:12:50 -05:00
yu kuai
b8b7213057 drm/amdgpu: add function parameter description in 'amdgpu_device_set_cg_state'
Fixes gcc warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1954: warning: Function
parameter or member 'state' not described in 'amdgpu_device_set_cg_state'

Fixes: e3ecdffac9 ("drm/amdgpu: add documentation for amdgpu_device.c")
Signed-off-by: yu kuai <yukuai3@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-13 15:29:44 -05:00
Bhawanpreet Lakha
b86a1aa36a drm/amd/display: rename DCN1_0 kconfig to DCN
Since dcn20 and dcn21 are under dcn1 it doesnt make sense to
have it named dcn1.

Change it to "dcn" to make it generic

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-13 15:29:44 -05:00
Bhawanpreet Lakha
aca935c7cc drm/amd/display: Drop CONFIG_DRM_AMD_DC_DCN2_1 flag
[Why]

DCN21 is stable enough to be build by default. So drop the flags.

[How]

Remove them using the unifdef tool. The following commands were executed
in sequence:

$ find -name '*.c' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DCN2_1 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_1 '{}' ';'
$ find -name '*.h' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DCN2_1 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_1 '{}' ';'

In addition:

* Remove from kconfig, and replace any dependencies with DCN1_0.
* Remove from any makefiles.
* Fix and cleanup Renoir definitions in dal_asic_id.h
* Expand DCN1 ifdef to include DCN21 code in the following files:
    * clk_mgr/clk_mgr.c: dc_clk_mgr_create()
    * core/dc_resources.c: dc_create_resource_pool()
    * gpio/hw_factory.c: dal_hw_factory_init()
    * gpio/hw_translate.c: dal_hw_translate_init()

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-13 15:29:44 -05:00
Bhawanpreet Lakha
1da37801a8 drm/amd/display: Drop CONFIG_DRM_AMD_DC_DCN2_0 and DSC_SUPPORTED
[Why]

DCN2 and DSC are stable enough to be build by default. So drop the flags.

[How]

Remove them using the unifdef tool. The following commands were executed
in sequence:

$ find -name '*.c' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DSC_SUPPORT -DCONFIG_DRM_AMD_DC_DCN2_0 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_0 '{}' ';'
$ find -name '*.h' -exec unifdef -m -DCONFIG_DRM_AMD_DC_DSC_SUPPORT -DCONFIG_DRM_AMD_DC_DCN2_0 -UCONFIG_TRIM_DRM_AMD_DC_DCN2_0 '{}' ';'

In addition:

* Remove from kconfig, and replace any dependencies with DCN1_0.
* Remove from any makefiles.
* Fix and cleanup NV defninitions in dal_asic_id.h
* Expand DCN1 ifdef to include DCN2 code in the following files:
    * clk_mgr/clk_mgr.c: dc_clk_mgr_create()
    * core/dc_resources.c: dc_create_resource_pool()
    * dce/dce_dmcu.c: dcn20_*lock_phy()
    * dce/dce_dmcu.c: dcn20_funcs
    * dce/dce_dmcu.c: dcn20_dmcu_create()
    * gpio/hw_factory.c: dal_hw_factory_init()
    * gpio/hw_translate.c: dal_hw_translate_init()

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-13 15:29:44 -05:00
Jesse Zhang
9f87516764 drm/amd/amdgpu: finish delay works before release resources
flush/cancel delayed works before doing finalization
to avoid concurrently requests.

Signed-off-by: Jesse Zhang <zhexi.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-11 17:38:13 -05:00
Evan Quan
60599a0363 drm/amdgpu: perform p-state switch after the whole hive initialized
P-state switch should be performed after all devices from the hive
get initialized.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by:  Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-06 16:27:48 -05:00
Evan Quan
b0adca4d50 drm/amdgpu: register gpu instance before fan boost feature enablment
Otherwise, the feature enablement will be skipped due to wrong count.

Fixes: beff74bc6e ("drm/amdgpu: fix a race in GPU reset with IB test (v2)")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-06 16:27:47 -05:00
Evan Quan
39ea6e5f9e drm/amdgpu: change pstate only after all XGMI device initialized
Pstate settings should be performed after all device of the
XGMI setup get initialized.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-06 16:27:46 -05:00
Le Ma
bff77e86a3 drm/amdgpu: bypass some cleanup work after err_event_athub (v2)
PSP lost connection when err_event_athub occurs. These cleanup work can be
skipped in BACO reset.

v2: squash in missing include (Alex)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-30 11:06:52 -04:00
Wambui Karuga
f440ff44b1 drm/amd: correct "_LENTH" mispelling in constant
Correct the "_LENTH" mispelling in the AMDGPU_MAX_TIMEOUT_PARAM_LENGTH
constant.

Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-28 11:19:17 -04:00
Andrey Grodzovsky
121a2bc6ae drm/amdgpu: Move amdgpu_ras_recovery_init to after SMU ready.
For Arcturus the I2C traffic is done through SMU tables and so
we must postpone RAS recovery init to after they are ready
which is in amdgpu_device_ip_hw_init_phase2.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-25 16:50:10 -04:00
Tianci.Yin
e35e2b117f drm/amdgpu: add a generic fb accessing helper function(v3)
add a generic helper function for accessing framebuffer via MMIO

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-17 16:31:13 -04:00
Alex Deucher
897483d8a0 drm/amdgpu: move gpu reset out of amdgpu_device_suspend
Move it into the caller.  There are cases were we don't
want it.  We need it for hibernation, but we don't need
it for runtime pm, so drop it for runtime pm.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-15 15:51:39 -04:00
Alex Deucher
803cc26d5c drm/amdgpu: move pci_save_state into suspend path
for amdgpu_device_suspend.  This follows the logic
in the resume path.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-15 15:51:39 -04:00
Emily Deng
bcccee89f4 drm/amdgpu: Fix tdr3 could hang with slow compute issue
When index is 1, need to set compute ring timeout for sriov and passthrough.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-15 15:50:23 -04:00
Alex Deucher
71f98027f2 drm/amdgpu: move amdgpu_device_get_job_timeout_settings
It's only used in amdgpu_device.c and the naming also
reflects that.  Move it there.

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-07 15:10:20 -05:00
Lyude Paul
f8d2d39eb4 drm/amdgpu: Iterate through DRM connectors correctly
Currently, every single piece of code in amdgpu that loops through
connectors does it incorrectly and doesn't use the proper list iteration
helpers, drm_connector_list_iter_begin() and
drm_connector_list_iter_end(). Yeesh.

So, do that.

Cc: Juston Li <juston.li@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-03 09:11:05 -05:00
Jesse Zhang
df99ac0fcc drm/amd/amdgpu:Fix compute ring unable to detect hang.
When compute fence did not signal, compute ring cannot detect hardware hang
because its timeout value is set to be infinite by default.

In SR-IOV and passthrough mode, if user does not declare custome timeout
value for compute ring, then use gfx ring timeout value as default. So
that when there is a ture hardware hang, compute ring can detect it.

Signed-off-by: Jesse Zhang <zhexi.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-03 09:11:01 -05:00
Xiaojie Yuan
ec51d3facd drm/amdgpu/discovery: get gpu info from ip discovery table
except soc_bounding_box which is not integrated in discovery table yet

Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-03 09:10:59 -05:00
Evan Quan
0e0b89c0d7 drm/amd/powerplay: properly set mp1 state for SW SMU suspend/reset routine
Set mp1 state properly for SW SMU suspend/reset routine.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-16 10:10:58 -05:00
Shirish S
a35ad98bf9 drm/amdgpu: remove needless usage of #ifdef
define sched_policy in case CONFIG_HSA_AMD is not
enabled, with this there is no need to check for CONFIG_HSA_AMD
else where in driver code.

Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-16 10:05:20 -05:00
Shirish S
8c9f69bc5c drm/amdgpu: fix build error without CONFIG_HSA_AMD
If CONFIG_HSA_AMD is not set, build fails:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.o: In function `amdgpu_device_ip_early_init':
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1626: undefined reference to `sched_policy'

Use CONFIG_HSA_AMD to guard this.

Fixes: 1abb680ad371 ("drm/amdgpu: disable gfxoff while use no H/W scheduling policy")

Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-16 10:05:07 -05:00
Huang Rui
aa978594cf drm/amdgpu: disable gfxoff while use no H/W scheduling policy
While gfxoff is enabled, the mmVM_XXX registers will be 0xfffffff while the GFX
is in "off" state. KFD queue creattion doesn't use ring based method, so it will
trigger a VM fault.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-16 09:55:41 -05:00
Yong Zhao
4e66d7d215 drm/amdgpu: Add a kernel parameter for specifying the asic type
As more and more new asics start to reuse the old device IDs before
launch, there is a need to quickly override the existing asic type
corresponding to the reused device ID through a kernel parameter. With
this, engineers no longer need to rely on local hack patches,
facilitating cooperation across teams.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-16 09:54:25 -05:00
Tao Zhou
1a6fc071e1 drm/amdgpu: move the call of ras recovery_init and bad page reserve to proper place
ras recovery_init should be called after ttm init,
bad page reserve should be put in front of gpu reset since i2c
may be unstable during gpu reset.
add cleanup for recovery_init and recovery_fini

v2: add more comment and print.
    remove cancel_work_sync in recovery_init.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13 17:50:47 -05:00
Yong Zhao
050091ab6e drm/amdkfd: Query kfd device info by CHIP id instead of pci device id
This optimizes out the pci device id usage in KFD and makes the code
more maintainable.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13 17:49:45 -05:00
Andrey Grodzovsky
d5ea093eeb dmr/amdgpu: Add system auto reboot to RAS.
In case of RAS error allow user configure auto system
reboot through ras_ctrl.
This is also part of the temproray work around for the RAS
hang problem.

v4: Use latest kernel API for disk sync.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13 17:41:17 -05:00
Andrey Grodzovsky
7c6e68c777 drm/amdgpu: Avoid HW GPU reset for RAS.
Problem:
Under certain conditions, when some IP bocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in SMU/PSP.

Temporary fix until proper solution in PSP/SMU is ready:
When uncorrectable error happens the DF will unconditionally
broadcast error event packets to all its clients/slave upon
receiving fatal error event and freeze all its outbound queues,
err_event_athub interrupt  will be triggered.
In such case and we use this interrupt
to issue GPU reset. THe GPU reset code is modified for such case to avoid HW
reset, only stops schedulers, deatches all in progress and not yet scheduled
job's fences, set error code on them and signals.
Also reject any new incoming job submissions from user space.
All this is done to notify the applications of the problem.

v2:
Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
Remove print param from amdgpu_ras_query_error_count

v3:
Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset
for other XGMI hive memebers.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13 17:41:05 -05:00
Andrey Grodzovsky
12ffa55da6 drm/amdgpu: Fix bugs in amdgpu_device_gpu_recover in XGMI case.
Issue 1:
In  XGMI case amdgpu_device_lock_adev for other devices in hive
was called to late, after access to their repsective schedulers.
So relocate the lock to the begining of accessing the other devs.

Issue 2:
Using amdgpu_device_ip_need_full_reset to switch the device list from
all devices in hive to the single 'master' device who owns this reset
call is wrong because when stopping schedulers we iterate all the devices
in hive but when restarting we will only reactivate the 'master' device.
Also, in case amdgpu_device_pre_asic_reset conlcudes that full reset IS
needed we then have to stop schedulers for all devices in hive and not
only the 'master' but with amdgpu_device_ip_need_full_reset  we
already missed the opprotunity do to so. So just remove this logic and
always stop and start all schedulers for all devices in hive.

Also minor cleanup and print fix.

v4: Minor coding style fix.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13 17:39:05 -05:00
Andrey Grodzovsky
0b2d2c2eec drm/amdgpu: Handle job is NULL use case in amdgpu_device_gpu_recover
This should be checked at all places job is accessed.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-30 15:02:39 -05:00
Roman Li
e1c14c4339 drm/amdgpu: Enable DC on Renoir
Enable DC support for renoir.

Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-29 15:52:34 -05:00
Monk Liu
e352625796 drm/amdgpu: introduce vram lost for reset (v2)
for SOC15/vega10 the BACO reset & mode1 would introduce vram lost
in high end address range, current kmd's vram lost checking cannot
catch it since it only check very ahead visible frame buffer

v2:
cover NV as well

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-29 15:52:32 -05:00
Andrey Grodzovsky
c43b849f89 drm/amdgpu: Use new mode2 reset interface for RV.
Integrate the mode2 reset into rest sequence.

v2:
Check ppfuncs pointer for NULL

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-15 11:00:44 -05:00
Huang Rui
b51a26a02a drm/amdgpu: add renoir support for gpu_info and ip block setting
This patch adds renoir support for gpu_info firmware and ip block setting.

Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-12 12:47:49 -05:00
Huang Rui
1eee4228a5 drm/amdgpu: add renoir asic_type enum
This patch adds renoir to amd_asic_type enum and amdgpu_asic_name[].

Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-12 12:47:49 -05:00
Tao Zhou
6ca523d7eb drm/amdgpu: remove RREG64/WREG64
atomic 64 bits REG operations are useless currently

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-09 11:17:30 -05:00
Tao Zhou
c6dddf4540 drm/amdgpu: replace readq/writeq with atomic64 operations
what we really want is a read or write that is guaranteed to be 64 bits
at a time, atomic64 operations are supported on all architectures

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-09 11:14:11 -05:00
Andrey Grodzovsky
b5507c7e06 drm/amdgpu: Fix GPU reset crash regression.
amdgpu_ip_block.status.hw for GMC wasn't set to
false on suspend during GPU reset and so on resume gmc_v9_0_resume
wasn't called.
Caused by 'drm/amdgpu: fix double ucode load by PSP(v3)'

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-06 13:53:19 -05:00
xinhui pan
876923fb92 drm/amdgpu: Fix panic during gpu reset
Clear the flag after hw suspend, otherwise it skips the corresponding hw
resume.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-06 13:52:37 -05:00
Leo Li
078655d982 drm/amdgpu: Add nv12 DC ip block
Load DC and amdgpu display manager

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02 10:30:41 -05:00
Xiaojie Yuan
4808cf9c2a drm/amdgpu: set asic family and ip blocks for navi12
same with navi10

Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02 10:30:39 -05:00
Xiaojie Yuan
42b325e5ec drm/amdgpu: add gpu_info firmware for navi12
gpu_info firmare store asic configuration details.

Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02 10:30:39 -05:00
Xiaojie Yuan
9802f5d78b drm/amdgpu: add navi12 asic type
Add asic type.

Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02 10:30:39 -05:00
Monk Liu
482f0e5385 drm/amdgpu: fix double ucode load by PSP(v3)
previously the ucode loading of PSP was repreated, one executed in
phase_1 init/re-init/resume and the other in fw_loading routine

Avoid this double loading by clearing ip_blocks.status.hw in suspend or reset
prior to the FW loading and any block's hw_init/resume

v2:
still do the smu fw loading since it is needed by bare-metal

v3:
drop the change in reinit_early_sriov, just clear all block's status.hw
in the head place and set the status.hw after hw_init done is enough

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02 10:17:34 -05:00
Monk Liu
4cd4c5c064 drm/amdgpu: cleanup vega10 SRIOV code path
we can simplify all those unnecessary function under
SRIOV for vega10 since:
1) PSP L1 policy is by force enabled in SRIOV
2) original logic always set all flags which make itself
   a dummy step

besides,
1) the ih_doorbell_range set should also be skipped
for VEGA10 SRIOV.
2) the gfx_common registers should also be skipped
for VEGA10 SRIOV.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02 10:17:21 -05:00
Tao Zhou
4fa1c6a679 drm/amdgpu: add RREG64/WREG64(_PCIE) operations
add 64 bits register access functions

v2: implement 64 bit functions in low level

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Dennis Li <dennis.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31 14:49:40 -05:00
Alex Deucher
a3a09142f4 drm/amdgpu: put the SMC into the proper state on reset/unload
When doing a GPU reset or unloading the driver, we need to
put the SMU into the apprpriate state for the re-init after
the reset or unload to reliably work.

I don't think this is necessary for BACO because the SMU actually
controls the BACO state to it needs to be active.

For suspend (S3), the asic is put into D3 so the SMU would be
powered down so I don't think we need to put the SMU into
any special state.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-30 23:24:44 -05:00
Le Ma
65e60f6e06 drm/amdgpu: add Arcturus gpu info firmware
Add GPU info firmware for Arcturus.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:18:03 -05:00
Le Ma
61cf44c1db drm/amdgpu: add to set Arcturus ip blocks
Add IP blocks for Arcturus.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:18:02 -05:00
Le Ma
d6c3b24ea2 drm/amdgpu: add Arcturus asic type
Add asic type for Arcturus.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:18:01 -05:00
Bhawanpreet Lakha
8fceceb69e drm/amd/display: add dm block
enable DC for navi14.

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:18:00 -05:00
Xiaojie Yuan
7ecb5cd451 drm/amdgpu: set asic family and ip blocks for navi14
same with navi10

Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:17:58 -05:00
Xiaojie Yuan
ed42cfe1ac drm/amdgpu: add gpu_info firmware for navi14
Add navi14 to case statement to load the GPU info firmware.

Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:17:58 -05:00
Xiaojie Yuan
87dbad02d2 drm/amdgpu: add navi14 asic type
Add CHIP_NAVI14 to the list of asic types.

Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18 14:17:58 -05:00
Alex Deucher
32eaeae0ef drm/amdgpu/psp: add a mutex to protect access to the psp ring
We need to serialize access to the psp ring if there are multiple
callers at runtime.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-09 17:43:39 -05:00
Alex Deucher
f54eeab4e7 drm/amdgpu: properly guard the generic discovery code
It's only available on navi and newer.

Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-09 17:43:31 -05:00
Arnd Bergmann
d155bef063 amdgpu: make pmu support optional
When CONFIG_PERF_EVENTS is disabled, we cannot compile the pmu
portion of the amdgpu driver:

drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c:48:38: error: no member named 'hw' in 'struct perf_event'
        struct hw_perf_event *hwc = &event->hw;
                                     ~~~~~  ^
drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c:51:13: error: no member named 'attr' in 'struct perf_event'
        if (event->attr.type != event->pmu->type)
            ~~~~~  ^
...

Use conditional compilation for this file.

Fixes: 9c7c85f7ea ("drm/amdgpu: add pmu counters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-08 13:56:22 -05:00
xinhui pan
f1c1314be4 drm/amdgpu: Disable ras features on all IPs before gpu reset
Perform a ras_suspend to disable ras on all IPs to workaround
some ROCm stability issue.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-05 15:59:20 -05:00
Jack Xiao
b2109d8ed6 drm/amdgpu: enable PCIE atomics ops support
GPU atomics operation depends on PCIE atomics support.
Always enable PCIE atomics ops support in case that
it hasn't been enabled.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-01 14:54:40 -05:00
Evan Quan
fdafb3597a drm/amdgpu: fix MGPU fan boost enablement for XGMI reset
MGPU fan boost feature should not be enabled until all the
devices from the same hive are all back from reset.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-01 14:54:12 -05:00
Alex Deucher
d7929c1e13 Merge branch 'drm-next' into drm-next-5.3
Backmerge drm-next and fix up conflicts due to drmP.h removal.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-25 08:42:25 -05:00
Harry Wentland
b4f199c7b0 drm/amdgpu: Enable DC support for Navi10
Enable the IP for navi10.

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-22 09:34:07 -05:00
Harry Wentland
48321c3dde drm/amd/display: Read soc_bounding_box from gpu_info (v2)
[WHY]
We don't want to expose sensitive ASIC information before ASIC release.

[HOW]
Encode the soc_bounding_box in the gpu_info FW (for Linux) and read it
at driver load.

v2: fix warning when CONFIG_DRM_AMD_DC_DCN2_0 is not set (Alex)

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21 18:59:34 -05:00
Huang Rui
0a5b8c7b94 drm/amdgpu: add to set navi ip blocks
Set the IPs for navi10 in early_init like other asics.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21 18:59:24 -05:00
Hawking Zhang
e0d076574e drm/amdgpu: update golden setting programming logic
Since from soc15, make sure only AndMasked bit get changed
when applied or_mask

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21 18:59:24 -05:00
Jack Xiao
5f84cc635b drm/amdgpu/mes: enable mes on navi10 and later asic
When amdgpu_mes is enabled and asic family is navi10 and
later asic, enable mes per device.

Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21 18:58:22 -05:00
Xiaojie Yuan
a190d1c75c drm/amdgpu/discovery: add module param for ip discovery enablement
to control enablement.

Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21 18:58:21 -05:00
Jack Xiao
6698a3d05f drm/amdgpu: add mcbp unit test in debugfs (v3)
The MCBP unit test is used to test the functionality of MCBP.
It emualtes to send preemption request and resubmit the unfinished
jobs.

v2: squash in fixes (Alex)
v3: squash in memory leak fix (Jack)

Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21 18:58:21 -05:00
Jack Xiao
f92d5c6123 drm/amdgpu: enable the static csa when mcbp enabled
CSA is the Context Save Area for preemption.

Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21 09:36:54 -05:00
Jack Xiao
b239c01727 drm/amdgpu: add mcbp driver parameter
Add mcbp driver parameter, so that mcbp feature can be
enabled/disabled by driver parameter.

Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21 09:30:52 -05:00
Hawking Zhang
35c2e91059 drm/amdgpu: parse the new members added by gpu_info ucode v1_1
Parse the new parameters for gfx10.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-20 21:16:16 -05:00
Huang Rui
23c6268eb1 drm/amdgpu: add navi10 gpu info firmware
gpu info firmware stores configuration data for various
IP blocks.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-20 21:15:29 -05:00
Huang Rui
852a6626d5 drm/amdgpu: add navi10 asic type
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-20 15:54:56 -05:00
Jonathan Kim
9c7c85f7ea drm/amdgpu: add pmu counters
adding perf event counters

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-20 11:36:22 -05:00
Daniel Vetter
52d2d44eee Linux 5.2-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Gj1MeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGctkH/0At3+SQPY2JJSy8
 i6+TDeytFx9OggeGLPHChRfehkAlvMb/kd34QHnuEvDqUuCAMU6HZQJFKoK9mvFI
 sDJVayPGDSqpm+iv8qLpMBPShiCXYVnGZeVfOdv36jUswL0k6wHV1pz4avFkDeZa
 1F4pmI6O2XRkNTYQawbUaFkAngWUCBG9ECLnHJnuIY6ohShBvjI4+E2JUaht+8gO
 M2h2b9ieddWmjxV3LTKgsK1v+347RljxdZTWnJ62SCDSEVZvsgSA9W2wnebVhBkJ
 drSmrFLxNiM+W45mkbUFmQixRSmjv++oRR096fxAnodBxMw0TDxE1RiMQWE6rVvG
 N6MC6xA=
 =+B0P
 -----END PGP SIGNATURE-----

Merge v5.2-rc5 into drm-next

Maarten needs -rc4 backmerged so he can pull in the fbcon notifier
removal topic branch into drm-misc-next.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2019-06-19 12:07:29 +02:00
Alex Deucher
21a249ca02 drm/amdgpu: wait to fetch the vbios until after common init
We need the asic_funcs set for the get rom callbacks in some
cases.

Tested-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-17 11:02:03 -05:00
Daniel Vetter
2454fcea33 drm-misc-next for v5.3:
UAPI Changes:
 
 Cross-subsystem Changes:
 - Add code to signal all dma-fences when freed with pending signals.
 - Annotate reservation object access in CONFIG_DEBUG_MUTEXES
 
 Core Changes:
 - Assorted documentation fixes.
 - Use irqsave/restore spinlock to add crc entry.
 - Move code around to drm_client, for internal modeset clients.
 - Make drm_crtc.h and drm_debugfs.h self-contained.
 - Remove drm_fb_helper_connector.
 - Add bootsplash to todo.
 - Fix lock ordering in pan_display_legacy.
 - Support pinning buffers to current location in gem-vram.
 - Remove the now unused locking functions from gem-vram.
 - Remove the now unused kmap-object argument from vram helpers.
 - Stop checking return value of debugfs_create.
 - Add atomic encoder enable/disable helpers.
 - pass drm_atomic_state to atomic connector check.
 - Add atomic support for bridge enable/disable.
 - Add self refresh helpers to core.
 
 Driver Changes:
 - Add extra delay to make MTP SDM845 work.
 - Small fixes to virtio, vkms, sii902x, sii9234, ast, mcde, analogix, rockchip.
 - Add zpos and ?BGR8888 support to meson.
 - More removals of drm_os_linux and drmP headers for amd, radeon, sti, r128, r128, savage, sis.
 - Allow synopsis to unwedge the i2c hdmi bus.
 - Add orientation quirks for GPD panels.
 - Edid cleanups and fixing handling for edid < 1.2.
 - Add runtime pm to stm.
 - Handle s/r in dw-hdmi.
 - Add hooks for power on/off to dsi for stm.
 - Remove virtio dirty tracking code, done in drm core.
 - Rework BO handling in ast and mgag200.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAl0DYU8ACgkQ/lWMcqZw
 E8NNWw/+MhcRakQmrNDMRIj4DvukzPW2efXbhRFuvthUvVN7rOHMzQZBc3le+gUb
 2GGoEeUYG7XoA/Nj3ZQMUoalrjODywtLClBClC4Blped0mZ4JPiI7bTsrNILn1N1
 hZ0+DbffMCAKqKN8TftK/TrFF9IEM8JSftqD/1RdkiXVcMH3NKuLABHZxzPxH2BH
 XuSqIL5lDyAtanixB53aDf2gw9iipUphYoFlKhdx9dr5Ql96RhiOcDgFhXnFiQu4
 O9z3W6tRI2VPoCzsnhNy3Eah7rBDnZwvyfGa9YU/Q+VeHegb601p8OmNNwpshWE1
 ohixBbADj0dr+K3T/lVW30kovg34i4L5K3O7MR0HxWYSA7+v3AHyG7/GWLxbBNQn
 AFHTRbBph8aP/Dn24ucbKaB7wHi31j7b0Hxj+oJR8RoGhuOYcMOuZrCHqpAxStto
 riSVDCRcq/KcPiuqZZ1UnzFWlQMhNFUwumloPiXFkJ4mcSdK9IbdKBd2eqbRdaU1
 eTOA4istVgNgaNbgLvVB2ltjqXrsdio7/jh6RhobFPqHISiL7iMZg3C/KRBXrkUB
 lYMeGkiE3Wp77zdycdofuEbMfAYUwLts8EYjVsM6xo0BKlBYhpeVuBOYeQEkU7PV
 PpGYqQVeZUoD1OyGlMWIYoyb5Ya7OLUDpooOJdFqoPzUfDki31E=
 =4uQX
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2019-06-14' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for v5.3:

UAPI Changes:

Cross-subsystem Changes:
- Add code to signal all dma-fences when freed with pending signals.
- Annotate reservation object access in CONFIG_DEBUG_MUTEXES

Core Changes:
- Assorted documentation fixes.
- Use irqsave/restore spinlock to add crc entry.
- Move code around to drm_client, for internal modeset clients.
- Make drm_crtc.h and drm_debugfs.h self-contained.
- Remove drm_fb_helper_connector.
- Add bootsplash to todo.
- Fix lock ordering in pan_display_legacy.
- Support pinning buffers to current location in gem-vram.
- Remove the now unused locking functions from gem-vram.
- Remove the now unused kmap-object argument from vram helpers.
- Stop checking return value of debugfs_create.
- Add atomic encoder enable/disable helpers.
- pass drm_atomic_state to atomic connector check.
- Add atomic support for bridge enable/disable.
- Add self refresh helpers to core.

Driver Changes:
- Add extra delay to make MTP SDM845 work.
- Small fixes to virtio, vkms, sii902x, sii9234, ast, mcde, analogix, rockchip.
- Add zpos and ?BGR8888 support to meson.
- More removals of drm_os_linux and drmP headers for amd, radeon, sti, r128, r128, savage, sis.
- Allow synopsis to unwedge the i2c hdmi bus.
- Add orientation quirks for GPD panels.
- Edid cleanups and fixing handling for edid < 1.2.
- Add runtime pm to stm.
- Handle s/r in dw-hdmi.
- Add hooks for power on/off to dsi for stm.
- Remove virtio dirty tracking code, done in drm core.
- Rework BO handling in ast and mgag200.

Tiny conflict in drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c,
needed #include <linux/slab.h> to make it compile.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0e01de30-9797-853c-732f-4a5bd6e61445@linux.intel.com
2019-06-14 11:44:24 +02:00
Yintian Tao
e9bc1bf791 drm/amdgpu: register pm sysfs for sriov (v2)
we need register pm sysfs for virt in order
to support dpm level modification because
smu ip block will not be added under SRIOV

v2: whitespace fixes (Alex)

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-13 13:59:48 -05:00
Tom St Denis
b4559a1646 drm/amd/amdgpu: remove vram_page_split kernel option (v3)
This option is no longer needed.  The default code paths
are now the only option.

v2: Add HPAGE support and a default for non contiguous maps
v3: Misread 512 pages as MiB ...

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-11 12:44:23 -05:00
Prike Liang
80f41f84ae drm/amd/amdgpu: add RLC firmware to support raven1 refresh
Use SMU firmware version to indentify the raven1 refresh device and
then load homologous RLC FW.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Suggested-by: Huang Rui<Ray.Huang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-11 12:40:06 -05:00
Sam Ravnborg
fdf2f6c56e drm/amd: drop use of drmP.h in amdgpu/amdgpu*
Drop use of drmP.h in all files named amdgpu*
in drm/amd/amdgpu/

Fix fallout.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190609220757.10862-10-sam@ravnborg.org
2019-06-10 23:02:48 +02:00
Alex Deucher
beff74bc6e drm/amdgpu: fix a race in GPU reset with IB test (v2)
Split late_init into two functions, one (do_late_init) which
just does the hw init, and late_init which calls do_late_init
and schedules the IB test work.  Call do_late_init in
the GPU reset code to run the init code, but not schedule
the IB test code.  The IB test code is called directly
in the gpu reset code so no need to run the IB tests
in a separate work thread.  If we do, we end up racing.

v2: Rework late_init.  Pull out the mgpu fan boost and xgmi
pstate code into late_init so they get called in all cases.
rename the late_init worker thread to delayed work since it's
just the IB tests now which can happen later.  Schedule the
work at init and resume time.  It's not needed at reset time
because the IB tests are called directly.

Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Xinhui Pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-05 22:18:09 -05:00
xinhui pan
c53e4db712 drm/amdgpu: cancel late_init_work before gpu reset
gpu reset will run late_init and schedule the late_init_work.  if we
keep triggering gpu reset in a short time, there are potenial races.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-05 22:18:09 -05:00
Prike Liang
1929059893 drm/amd/amdgpu: add RLC firmware to support raven1 refresh
Use SMU firmware version to indentify the raven1 refresh device and
then load homologous RLC FW.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Suggested-by: Huang Rui<Ray.Huang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-05 11:14:15 -05:00
Dave Airlie
91c1ead6ae Merge branch 'drm-next-5.3' of git://people.freedesktop.org/~agd5f/linux into drm-next
New stuff for 5.3:
- Add new thermal sensors for vega asics
- Various RAS fixes
- Add sysfs interface for memory interface utilization
- Use HMM rather than mmu notifier for user pages
- Expose xgmi topology via kfd
- SR-IOV fixes
- Fixes for manual driver reload
- Add unique identifier for vega asics
- Clean up user fence handling with UVD/VCE/VCN blocks
- Convert DC to use core bpc attribute rather than a custom one
- Add GWS support for KFD
- Vega powerplay improvements
- Add CRC support for DCE 12
- SR-IOV support for new security policy
- Various cleanups

From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190529220944.14464-1-alexander.deucher@amd.com
2019-05-31 10:04:39 +10:00
Emily Deng
394e9a14c6 drm/amdgpu: Need to set the baco cap before baco reset
For passthrough, after rebooted the VM, driver will do
a baco reset before doing other driver initialization during loading
 driver. For doing the baco reset, it will first
check the baco reset capability. So first need to set the
cap from the vbios information or baco reset won't be
enabled.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-28 14:47:42 -05:00
Alex Deucher
dbaa922b57 drm/amdgpu: use pcie_bandwidth_available rather than open coding it
It does the same thing we were doing already.  I though it needed
work for gen3/4 speeds, but that seems to be covered already.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:21:01 -05:00
Ori Messinger
5bb2353273 drm/amdgpu: Report firmware versions with sysfs v2
Firmware versions can be found as separate sysfs files at:
/sys/class/drm/cardX/device/fw_version (where X is the card number)
The firmware versions are displayed in hexadecimal.
v2: Moved sysfs files to subfolder

Signed-off-by: Ori Messinger <ori.messinger@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:52 -05:00
xinhui pan
5e6932fe31 drm/amdgpu: enable ras suspend/resume
suspend/resume will change ras state behind us. Let driver get notified.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:51 -05:00
xinhui pan
511fdbc33a drm/amdgpu: ras support suspend/resume
add ras suspend function. rename ras_post_init to amdgpu_ras_resume.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:51 -05:00
Trigger Huang
2d11fd3f54 drm/amdgpu: initialize PSP before IH under SR-IOV
In order to support new PSP feature that PSP may provide interface
to program IH CNTL register, initialize PSP before IH under Vega10
SR-IOV VF

Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:50 -05:00
Trigger Huang
78d4811267 drm/amdgpu: init vega10 SR-IOV reg access mode
Set different register access mode according to the features
provided by firmware

Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:50 -05:00
xinhui pan
e79a04d531 drm/amdgpu: gpu reset will run ras post init
ras need initialize proper state after late init

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:50 -05:00
xinhui pan
7c04ca50b0 drm/amdgpu: gpu reset will run late_init
ras need late init to initialize proper state.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:50 -05:00
Kent Russell
dcea6e65d4 drm/amdgpu: Add PCIe replay count sysfs file
Add a sysfs file for reporting the number of PCIe replays (NAKs). This
returns the sum of NAKs received and NAKs generated

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:48 -05:00
Evan Quan
912dfc846a drm/amdgpu: enable separate timeout setting for every ring type V4
Every ring type can have its own timeout setting.

 - V2: update lockup_timeout parameter format and cosmetic fixes
 - V3: invalidate 0 and negative values
 - V4: update lockup_timeout parameter format

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:20:48 -05:00
Sean Paul
374ed54293 Merge drm/drm-next into drm-misc-next
Backmerging 5.2-rc1 to -misc-next for robher

Signed-off-by: Sean Paul <seanpaul@chromium.org>
2019-05-22 16:08:21 -04:00
Maarten Lankhorst
752c4f3c1d Merge remote-tracking branch 'drm/drm-next' into drm-misc-next
Requested for backmerging airlied's drm-legacy cleanup.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2019-05-09 10:19:03 +02:00
Linus Torvalds
a2d635decb drm pull request for 5.2
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJc04M6AAoJEAx081l5xIa+SJgP/0uIgIOM53vPpydgmr+2IEHF
 jbDqrd+mipgNriRVHjDsWdUHCUNtyhB7YEBCMrj3mY0rRFI7FlQQf4lOwYGoHiKP
 4JZg4kwC37997lFXl1uabGj3DmJLtxKL2/D15zCH/uLe+2EDzWznP6NVdFT3WK0P
 YKZQCWT19PWSsLoBRPutWxkmop4AYvkqE0a6vXUlJlFYZK3Bbytx6/179uWKfiX5
 ZkKEEtx1XiDAvcp5gBb6PISurycrBY0e/bkPBnK3ES5vawMbTU5IrmWOrQ4D8yOd
 z9qOVZawZ6+b2XBDgBWjQ9bM7I5R7Il1q/LglYEaFI9+wHUnlUdDSm6ft5/5BiCZ
 fqgkh5Bj2iEsajbSsacoljMOpxpYPqj63mqc+7fAGXF34V+B+9U1bpt8kCbMKowf
 7Abb7IuiCR6vLDapjP6VqTMvdQ4O466OEAN83ULGFTdmMqYYH4AxaIwc+xcAk/aP
 RNq7/RHhh4FRynRAj9fCkGlF3ArnM88gLINwWuEQq4SClWGcvdw7eaHpwWo77c4g
 iccCnTLqSIg5pDVu07AQzzBlW6KulWxh5o72x+Xx+EXWdYUDHQ1SlNs11bSNUBV1
 5MkrzY2GuD+NFEjsXJEDIPOr40mQOyJCXnxq8nXPsz/hD9kHeJPvWn3J3eVKyb5B
 Z6/knNqM0BDn3SaYR/rD
 =YFiQ
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "This has two exciting community drivers for ARM Mali accelerators.
  Since ARM has never been open source friendly on the GPU side of the
  house, the community has had to create open source drivers for the
  Mali GPUs. Lima covers the older t4xx and panfrost the newer 6xx/7xx
  series. Well done to all involved and hopefully this will help ARM
  head in the right direction.

  There is also now the ability if you don't have any of the legacy
  drivers enabled (pre-KMS) to remove all the pre-KMS support code from
  the core drm, this saves 10% or so in codesize on my machine.

  i915 also enable Icelake/Elkhart Lake Gen11 GPUs by default, vboxvideo
  moves out of staging.

  There are also some rcar-du patches which crossover with media tree
  but all should be acked by Mauro.

  Summary:

  uapi changes:
   - Colorspace connector property
   - fourcc - new YUV formts
   - timeline sync objects initially merged
   - expose FB_DAMAGE_CLIPS to atomic userspace

  new drivers:
   - vboxvideo: moved out of staging
   - aspeed: ASPEED SoC BMC chip display support
   - lima: ARM Mali4xx GPU acceleration driver support
   - panfrost: ARM Mali6xx/7xx Midgard/Bitfrost acceleration driver support

  core:
   - component helper docs
   - unplugging fixes
   - devm device init
   - MIPI/DSI rate control
   - shmem backed gem objects
   - connector, display_info, edid_quirks cleanups
   - dma_buf fence chain support
   - 64-bit dma-fence seqno comparison fixes
   - move initial fb config code to core
   - gem fence array helpers for Lima
   - ability to remove legacy support code if no drivers requires it (removes 10% of drm.ko size)
   - lease fixes

  ttm:
   - unified DRM_FILE_PAGE_OFFSET handling
   - Account for kernel allocations in kernel zone only

  panel:
   - OSD070T1718-19TS panel support
   - panel-tpo-td028ttec1 backlight support
   - Ronbo RB070D30 MIPI/DSI
   - Feiyang FY07024DI26A30-D MIPI-DSI panel
   - Rocktech jh057n00900 MIPI-DSI panel

  i915:
   - Comet Lake (Gen9) PCI IDs
   - Updated Icelake PCI IDs
   - Elkhartlake (Gen11) support
   - DP MST property addtions
   - plane and watermark fixes
   - Icelake port sync and VEBOX disable fixes
   - struct_mutex usage reduction
   - Icelake gamma fix
   - GuC reset fixes
   - make mmap more asynchronous
   - sound display power well race fixes
   - DDI/MIPI-DSI clocks for Icelake
   - Icelake RPS frequency changing support
   - Icelake workarounds

  amdgpu:
   - Use HMM for userptr
   - vega20 experimental smu11 support
   - RAS support for vega20
   - BACO support for vega12 + fixes for vega20
   - reworked IH interrupt handling
   - amdkfd RAS support
   - Freesync improvements
   - initial timeline sync object support
   - DC Z ordering fixes
   - NV12 planes support
   - colorspace properties for planes=
   - eDP opts if eDP already initialized

  nouveau:
   - misc fixes

  etnaviv:
   - misc fixes

  msm:
   - GPU zap shader support expansion
   - robustness ABI addition

  exynos:
   - Logging cleanups

  tegra:
   - Shared reset fix
   - CPU cache maintenance fix

  cirrus:
   - driver rewritten using simple helpers

  meson:
   - G12A support

  vmwgfx:
   - Resource dirtying management improvements
   - Userspace logging improvements

  virtio:
   - PRIME fixes

  rockchip:
   - rk3066 hdmi support

  sun4i:
   - DSI burst mode support

  vc4:
   - load tracker to detect underflow

  v3d:
   - v3d v4.2 support

  malidp:
   - initial Mali D71 support in komeda driver

  tfp410:
   - omap related improvement

  omapdrm:
   - drm bridge/panel support
   - drop some omap specific panels

  rcar-du:
   - Display writeback support"

* tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm: (1507 commits)
  drm/msm/a6xx: No zap shader is not an error
  drm/cma-helper: Fix drm_gem_cma_free_object()
  drm: Fix timestamp docs for variable refresh properties.
  drm/komeda: Mark the local functions as static
  drm/komeda: Fixed warning: Function parameter or member not described
  drm/komeda: Expose bus_width to Komeda-CORE
  drm/komeda: Add sysfs attribute: core_id and config_id
  drm: add non-desktop quirk for Valve HMDs
  drm/panfrost: Show stored feature registers
  drm/panfrost: Don't scream about deferred probe
  drm/panfrost: Disable PM on probe failure
  drm/panfrost: Set DMA masks earlier
  drm/panfrost: Add sanity checks to submit IOCTL
  drm/etnaviv: initialize idle mask before querying the HW db
  drm: introduce a capability flag for syncobj timeline support
  drm: report consistent errors when checking syncobj capibility
  drm/nouveau/nouveau: forward error generated while resuming objects tree
  drm/nouveau/fb/ramgk104: fix spelling mistake "sucessfully" -> "successfully"
  drm/nouveau/i2c: Disable i2c bus access after ->fini()
  drm/nouveau: Remove duplicate ACPI_VIDEO_NOTIFY_PROBE definition
  ...
2019-05-08 21:35:19 -07:00
Dave Airlie
422449238e Merge branch 'drm-next-5.2' of git://people.freedesktop.org/~agd5f/linux into drm-next
- SR-IOV fixes
- Raven flickering fix
- Misc spelling fixes
- Vega20 power fixes
- Freesync improvements
- DC fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190502193020.3562-1-alexander.deucher@amd.com
2019-05-03 10:31:07 +10:00
Andrey Grodzovsky
1d721ed679 drm/amdgpu: Avoid HW reset if guilty job already signaled.
Also reject TDRs if another one already running.

v2:
Stop all schedulers across device and entire XGMI hive before
force signaling HW fences.
Avoid passing job_signaled to helper fnctions to keep all the decision
making about skipping HW reset in one place.

v3:
Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced
against it's decrement in drm_sched_stop in non HW reset case.
v4: rebase
v5: Revert v3 as we do it now in sceduler code.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-6-git-send-email-andrey.grodzovsky@amd.com
2019-05-02 15:54:32 -05:00
Christian König
5918045c4e drm/scheduler: rework job destruction
We now destroy finished jobs from the worker thread to make sure that
we never destroy a job currently in timeout processing.
By this we avoid holding lock around ring mirror list in drm_sched_stop
which should solve a deadlock reported by a user.

v2: Remove unused variable.
v4: Move guilty job free into sched code.
v5:
Move sched->hw_rq_count to drm_sched_start to account for counter
decrement in drm_sched_stop even when we don't call resubmit jobs
if guily job did signal.
v6: remove unused variable

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692

Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-3-git-send-email-andrey.grodzovsky@amd.com
2019-05-02 15:45:48 -05:00
Andrey Grodzovsky
77e7f82985 drm/amdgpu: Change VRAM lost print from ERR to INF
It's normal for VRAM to lost during GPU reset and so change
the log level to INFO to avoid confusing users.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-04-23 12:09:26 -05:00
Dave Airlie
f06ddb5309 Linux 5.1-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlyzsYgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGMw0H/ir42KJiABBKSETD
 0d38qXVclAI/123zl8EkSfDrBKOsuIpXUDxzKeoDMhMkiurMpK6bbEOTPJAQMZJe
 nEYpq/bZQi+vO8Q/pMMpaC3ExlIRosd0JAR7TyDUh5ZAeeMuDNzmvMk/DPxXPbNt
 0P1FWePDa7908ajCOW1T8ZrB9Ak8boo7TKkF3LBb00ks1mEkyp/l74MKOHdu+HYn
 XIwncX/Jotl4BrKdNC2f/NXYLYk6MrJDGug8TxuHgIqiMWhhrcSqbxU1ri7iqFXB
 cBYdFo6ZJ8CWHux8/5LY5CMjSqEtzKha2Ohuhy3MMu1RsICyFLQtHnxHJ1ytLSBt
 DOPcDQ0=
 =CEUD
 -----END PGP SIGNATURE-----

BackMerge v5.1-rc5 into drm-next

Need rc5 for udl fix to add udl cleanups on top.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-04-15 15:51:49 +10:00
wentalou
b575f10dbd drm/amdgpu: shadow in shadow_list without tbo.mem.start cause page fault in sriov TDR
shadow was added into shadow_list by amdgpu_bo_create_shadow.
meanwhile, shadow->tbo.mem was not fully configured.
tbo.mem would be fully configured by amdgpu_vm_sdma_map_table until calling amdgpu_vm_clear_bo.
If sriov TDR occurred between amdgpu_bo_create_shadow and amdgpu_vm_sdma_map_table,
amdgpu_device_recover_vram would deal with shadow without tbo.mem.start.

Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-04-12 11:23:49 -05:00
Yintian Tao
bb5a2bdf36 drm/amdgpu: support dpm level modification under virtualization v3
Under vega10 virtualuzation, smu ip block will not be added.
Therefore, we need add pp clk query and force dpm level function
at amdgpu_virt_ops to support the feature.

v2: add get_pp_clk existence check and use kzalloc to allocate buf

v3: return -ENOMEM for allocation failure and correct the coding style

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-04-10 13:53:27 -05:00
wentalou
1712fb1a2f drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list
amdgpu_bo_restore_shadow would assign zero to r if succeeded.
r would remain zero if there is only one node in shadow_list.
current code would always return failure when r <= 0.
restart the timeout for each wait was a rather problematic bug as well.
The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop.

Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-04-04 10:22:06 -05:00
shaoyunl
df399b0641 drm/amdgpu: XGMI pstate switch initial support
Driver vote low to high pstate switch whenever there is an outstanding
XGMI mapping request. Driver vote high to low pstate when all the
outstanding XGMI mapping is terminated.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-27 22:40:43 -05:00
Evan Quan
fed184e905 drm/amdgpu: trivial typo fix
"error" was not correctly spelled.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-27 22:40:05 -05:00
Chengming Gui
ad51c46eec drm/amd/amdgpu: fix PCIe dpm feature issue (v3)
use pcie_bandwidth_available to get real link state
to update pcie table.

v2: fix incorrect initialized return value
v3: expand the fetching method about the link width to all asics.

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-27 22:31:56 -05:00
Christian König
86f7bae5cf drm/amdgpu: revert "XGMI pstate switch initial support"
This reverts commit 9b638f9751.

Adding this to the mapping is complete nonsense and the whole
implementation looks racy. This patch wasn't thoughtfully reviewed
and should be reverted for now.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Liu, Shaoyun <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-21 14:05:01 -05:00
Huang Rui
005440066f drm/amdgpu: enable gfxoff again on raven series (v2)
This patch enables gfxoff and stutter mode again, since we take more testing on
raven series. For raven2 and picasso, we can enable it directly. And for raven,
we need check the RLC/SMC ucode version cannot be less than #531/0x1e45.

v2: add smc version checking for raven.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Tested-by: Likun Gao <Likun.Gao@amd.com> (v2)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-20 23:39:49 -05:00
Wentao Lou
f81e8d532a drm/amdkfd/sriov:Put the pre and post reset in exclusive mode v2
add amdgpu_amdkfd_pre_reset and amdgpu_amdkfd_post_reset inside amdgpu_device_reset_sriov.

Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-20 23:39:47 -05:00
xinhui pan
108c6a6309 drm/amdgpu: add new ras workflow control flags
add ras post init function.
Do some initialization after all IP have finished their late init.

Add new member flags which will control the ras work flow.
For now, vbios enable ras for us on boot. That might change in the
future.
So there should be a flag from vbios to tell us if ras is enabled or not
on boot. Looks like there is no such info now.

Other bits of the flags are reserved to control other parts of ras.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19 15:36:52 -05:00
xinhui pan
2be4c4a9d4 drm/amdgpu: reserve bad pages during recovery
Mark vram pages with errors as bad and prevent the driver
from using them.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19 15:36:50 -05:00
xinhui pan
c030f2e416 drm/amdgpu: add amdgpu_ras.c to support ras (v2)
add obj management.
add feature control.
add debugfs infrastructure.
add sysfs infrastructure.
add IH infrastructure.
add recovery infrastructure.

It is a framework. Other IPs need call amdgpu_ras_xxx function instead of
psp_ras_xxx functions.

v2: squash in warning fixes

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19 15:36:50 -05:00
Andrey Grodzovsky
533aed278a drm/amdgpu: Move IB pool init and fini v2
Problem:
Using SDMA for TLB invalidation in certain ASICs exposed a problem
of IB pool not being ready while SDMA already up on Init and already
shutt down while SDMA still running on Fini. This caused
IB allocation failure. Temproary fix was commited into a
bringup branch but this is the generic fix.

Fix:
Init IB pool rigth after GMC is ready but before SDMA is ready.
Do th opposite for Fini.

v2: Remove restriction on SDMA early init and move amdgpu_ib_pool_fini

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19 15:36:49 -05:00
shaoyunl
9b638f9751 drm/amdgpu: XGMI pstate switch initial support
Driver vote low to high pstate switch whenever there is an outstanding
XGMI mapping request. Driver vote high to low pstate when all the
outstanding XGMI mapping is terminated.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19 15:36:48 -05:00
Likun Gao
3b94fb101f drm/amd/powerplay: add limit of pp_feature for smu (v3)
Move pp_feature from the struct of amd_powerplay to amdgpu_device.
Add pp_feature limit for overdrive interface.

v2: put pp_feature into struct amdgpu_pm.
v3: merge feature_mask with pp_feature.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19 15:04:02 -05:00
Dave Airlie
f4bc54b532 Merge branch 'drm-next-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-next
Updates for 5.1:
- GDS fixes
- Add AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES interface
- GPUVM fixes
- PCIE DPM switching fixes for vega20
- Vega10 uclk DPM regression fix
- DC Freesync fixes
- DC ABM fixes
- Various DC cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208210214.27666-1-alexander.deucher@amd.com
2019-02-11 14:04:20 +10:00
Harish Kasiviswanathan
c53134577c drm/amdgpu: Fix pci platform speed and width
The new Vega series GPU cards have in-built bridges. To get the pcie
speed and width supported by the platform walk the hierarchy and get the
slowest link.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-02-07 14:03:18 -05:00
Dave Airlie
37fdaa3390 drm-misc-next for 5.1:
UAPI Changes:
 
 Cross-subsystem Changes:
 
 Core Changes:
   - Split out some part of drm_crtc_helper.h into drm_probe_helper.h
   - DRIVER_* flags improvements
   - New tasks on the TODO-list
   - Improvements to the documentation
 
 Driver Changes:
   - Continual of drmP.h removal in multiple drivers
   - Removal of FBINFO_(FLAG_)DEFAULT in multiple drivers
   - sun4i: Addition of the A23 support, multiple fixes for the tiled
     formats
   - atmel-hlcdc: Fix of clipping and rotation properties
   - qxl: various BO-related improvements, prime and generic fbdev emulation
     support
   - dw-hdmi: Support for HDMI2.0 2160p modes and YUV420 output
   - New Sitronix ST7701 panel driver
   - New Kingdisplay KD097D04 panel driver
   - New LeMaker BL035-RGB-002 panel driver
   - New PDA 91-00156-A0 panel driver
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXFRb/wAKCRDj7w1vZxhR
 xXj2AQDFAkH0/tksJ+T6yrHZXQE7q/6fAWTlrDXLo7Nb1hhfswEA1AyaHmZJA0Ar
 u7S+4Gfzs/gisw9Jr4bqQaerEpYJxQA=
 =qumD
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2019-02-01' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 5.1:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:
  - Split out some part of drm_crtc_helper.h into drm_probe_helper.h
  - DRIVER_* flags improvements
  - New tasks on the TODO-list
  - Improvements to the documentation

Driver Changes:
  - Continual of drmP.h removal in multiple drivers
  - Removal of FBINFO_(FLAG_)DEFAULT in multiple drivers
  - sun4i: Addition of the A23 support, multiple fixes for the tiled
    formats
  - atmel-hlcdc: Fix of clipping and rotation properties
  - qxl: various BO-related improvements, prime and generic fbdev emulation
    support
  - dw-hdmi: Support for HDMI2.0 2160p modes and YUV420 output
  - New Sitronix ST7701 panel driver
  - New Kingdisplay KD097D04 panel driver
  - New LeMaker BL035-RGB-002 panel driver
  - New PDA 91-00156-A0 panel driver

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190201144749.t3abxvguhstu6bcl@flea
2019-02-04 14:42:34 +10:00
Dave Airlie
e09191d360 Merge branch 'drm-next-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-next
New stuff for 5.1.
amdgpu:
- DC bandwidth formula updates
- Support for DCC on scanout surfaces
- Support for multiple IH rings on soc15 asics
- Fix xgmi locking
- Add sysfs interface to get pcie usage stats
- Simplify DC i2c/aux code
- Initial support for BACO on vega10/20
- New runtime SMU feature debug interface
- Expand existing sysfs power interfaces to new clock domains
- Handle kexec properly
- Simplify IH programming
- Rework doorbell handling across asics
- Drop old CI DPM implementation
- DC page flipping fixes
- Misc SR-IOV fixes

amdkfd:
- Simplify the interfaces between amdkfd and amdgpu

ttm:
- Add a callback to notify the driver when the lru changes

sched:
- Refactor mirror list handling
- Rework hw fence processing

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125231517.26268-1-alexander.deucher@amd.com
2019-02-01 09:34:20 +10:00
Andrey Grodzovsky
222b5f0441 drm/sched: Refactor ring mirror list handling.
Decauple sched threads stop and start and ring mirror
list handling from the policy of what to do about the
guilty jobs.
When stoppping the sched thread and detaching sched fences
from non signaled HW fenes wait for all signaled HW fences
to complete before rerunning the jobs.

v2: Fix resubmission of guilty job into HW after refactoring.

v4:
Full restart for all the jobs, not only from guilty ring.
Extract karma increase into standalone function.

v5:
Rework waiting for signaled jobs without relying on the job
struct itself as those might already be freed for non 'guilty'
job's schedulers.
Expose karma increase to drivers.

v6:
Use list_for_each_entry_safe_continue and drm_sched_process_job
in case fence already signaled.
Call drm_sched_increase_karma only once for amdgpu and add documentation.

v7:
Wait only for the latest job's fence.

Suggested-by: Christian Koenig <Christian.Koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-25 16:15:36 -05:00
wentalou
f14899fd2a drm/amdgpu: sriov should skip asic_reset in device_init
sriov would meet guest driver load failure,
if calling amdgpu_asic_reset in amdgpu_device_init.
sriov should skip asic_reset in device_init.

Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-25 16:15:35 -05:00
Daniel Vetter
fcd70cd36b drm: Split out drm_probe_helper.h
Having the probe helper stuff (which pretty much everyone needs) in
the drm_crtc_helper.h file (which atomic drivers should never need) is
confusing. Split them out.

To make sure I actually achieved the goal here I went through all
drivers. And indeed, all atomic drivers are now free of
drm_crtc_helper.h includes.

v2: Make it compile. There was so much compile fail on arm drivers
that I figured I'll better not include any of the acks on v1.

v3: Massive rebase because i915 has lost a lot of drmP.h includes, but
not all: Through drm_crtc_helper.h > drm_modeset_helper.h -> drmP.h
there was still one, which this patch largely removes. Which means
rolling out lots more includes all over.

This will also conflict with ongoing drmP.h cleanup by others I
expect.

v3: Rebase on top of atomic bochs.

v4: Review from Laurent for bridge/rcar/omap/shmob/core bits:
- (re)move some of the added includes, use the better include files in
  other places (all suggested from Laurent adopted unchanged).
- sort alphabetically

v5: Actually try to sort them, and while at it, sort all the ones I
touch.

v6: Rebase onto i915 changes.

v7: Rebase once more.

Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: CK Hu <ck.hu@mediatek.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: virtualization@lists.linux-foundation.org
Cc: etnaviv@lists.freedesktop.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-mediatek@lists.infradead.org
Cc: linux-amlogic@lists.infradead.org
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: spice-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-renesas-soc@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-tegra@vger.kernel.org
Cc: xen-devel@lists.xen.org
Link: https://patchwork.freedesktop.org/patch/msgid/20190117210334.13234-1-daniel.vetter@ffwll.ch
2019-01-24 13:20:42 +01:00
Alex Deucher
95e8e59ec4 drm/amdgpu: check if we need to reset at init time (v2)
To deal with situations like kexec or GPU VM passthrough
where the device may have been used previously without a
proper GPU reset between.

v2: rebase

bug: https://bugs.freedesktop.org/show_bug.cgi?id=108585
bug: https://bugs.freedesktop.org/show_bug.cgi?id=108754
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-14 15:04:57 -05:00
Tom St Denis
22d6575b8d drm/amd/amdgpu: add missing mutex lock to amdgpu_get_xgmi_hive() (v3)
v2: Move locks around in other functions so that this
function can stand on its own.  Also only hold the hive
specific lock for add/remove device instead of the driver
global lock so you can't add/remove devices in parallel from
one hive.

v3: add reset_lock

Acked-by:  Shaoyun.liu < Shaoyun.liu@amd.com>
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-14 15:04:53 -05:00
Emily Deng
72d3f59205 drm/amdgpu/sriov: For finishing routine send rel event after init failed
When init fail, send rel init, req_fini and rel_fini to host for the
finishing routine.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-14 15:04:50 -05:00
wentalou
0aaeefccb4 drm/amdgpu: distinguish early and late re-init log in sriov
distinguish ip_reinit_early_sriov and ip_reinit_late_sriov
by different log RE-INIT-early and RE-INIT-late

Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-14 15:04:49 -05:00
Emily Deng
d3c117e564 drm/amdgpu/sriov:Correct pfvf exchange logic
The pfvf exchange need be in exclusive mode. And add pfvf exchange in gpu
reset.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-By: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-14 15:04:24 -05:00
Emily Deng
91334223b2 drm/amdgpu/virtual_dce: No need to pin the cursor bo
For virtual display feature, no need to pin cursor bo.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-14 15:04:23 -05:00
Daniel Vetter
c2d88e06bc drm: Move the legacy kms disable_all helper to crtc helpers
It's not a core function, and the matching atomic functions are also
not in the core. Plus the suspend/resume helper is also already there.

Needs a tiny bit of open-coding, but less midlayer beats that I think.

v2: Rebase onto ast (which gained a new user).

Cc: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sean Paul <sean@poorly.run>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Cc: Sean Paul <sean@poorly.run>
Cc: David Airlie <airlied@linux.ie>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Rex Zhu <Rex.Zhu@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Shaoyun Liu <Shaoyun.Liu@amd.com>
Cc: Monk Liu <Monk.Liu@amd.com>
Cc: nouveau@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20181217194303.14397-4-daniel.vetter@ffwll.ch
2019-01-11 22:54:29 +01:00
wentalou
7b184b0061 drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang
XGMI hive put kfd_pre_reset into amdgpu_device_lock_adev,
but outside req_full_gpu of sriov.
It would make sriov hang during reset.

Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-14 12:04:38 -05:00
Andrey Grodzovsky
fc42d47ce0 drm/amdgpu: Enable GPU recovery by default for CI
I retested Bonaire (gfx7 dGPU) and it works fine.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-12 14:26:40 -05:00
Alex Deucher
223577753b drm/amdgpu/si: fix SI after doorbell rework
SI does not use doorbells, move asic doorbell init later
asic check.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108920
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-05 17:49:50 -05:00
Andrey Grodzovsky
d4535e2c01 drm/amdgpu: Implement concurrent asic reset for XGMI.
Use per hive wq to concurrently send reset commands to all nodes
in the hive.

v2:
Switch to system_highpri_wq after dropping dedicated queue.
Fix non XGMI code path KASAN error.
Stop  the hive reset for each node loop if there
is a reset failure on any of the nodes.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-03 11:15:14 -05:00
Andrey Grodzovsky
a82400b57a drm/amdgpu: Handle xgmi device removal.
XGMI hive has some resources allocted on device init which
needs to be deallocated when the device is unregistered.

v2: Remove creation of dedicated wq for XGMI hive reset.
v3: Use the gmc.xgmi.supported flag

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-03 11:15:08 -05:00
Oak Zeng
88dc26e46b drm/amdgpu: Fix num_doorbell calculation issue
When paging queue is enabled, it use the second page of doorbell.
The AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
kernel doorbells are in the first page. So with paging queue enabled,
the total kernel doorbell range should be original num_doorbell plus
one page (0x400 in dword), not *2.

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-30 12:01:04 -05:00
Andrey Grodzovsky
26bc534094 drm/amdgpu: Refactor GPU reset for XGMI hive case
For XGMI hive case do reset in steps where each step iterates over
all devs in hive. This especially important for asic reset
since all PSP FW in hive must come up within a limited time
(around 1 sec) to properply negotiate the link.
Do this by  refactoring  amdgpu_device_gpu_recover and amdgpu_device_reset
into pre_asic_reset, asic_reset and post_asic_reset functions where is part
is exectued for all the GPUs in the hive before going to the next step.

v2: Update names for amdgpu_device_lock/unlock functions.

v3: Introduce per hive locking to avoid multiple resets for GPUs
    in same hive.
v4:
Remove delayed_workqueue()/ttm_bo_unlock_delayed_workqueue() - they
are copy & pasted over from radeon and on amdgpu there isn't
any reason for that any more.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-28 15:55:36 -05:00
Andrey Grodzovsky
5183411b56 drm/amdgpu: Refactor amdgpu_xgmi_add_device
This is prep work for updating each PSP FW in hive after
GPU reset.
Split into build topology SW state and update each PSP FW in the hive.
Save topology and count of XGMI devices for reuse.

v2: Create seperate header for XGMI.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-28 15:55:35 -05:00
Oak Zeng
9564f1928e drm/amdgpu: Use asic specific doorbell index instead of macro definition
ASIC specific doorbell layout is used instead of enum definition

Signed-off-by: Oak Zeng <ozeng@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-28 15:55:33 -05:00
Oak Zeng
6585661ddd drm/amdgpu: Call doorbell index init on device initialization
Also call functioin amdgpu_device_doorbell_init after
amdgpu_device_ip_early_init because the former depends
on the later to set up asic-specific init_doorbell_index
function

Signed-off-by: Oak Zeng <ozeng@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-28 15:55:32 -05:00
Philip Yang
ec3db8a63d drm/amdgpu: enable paging queue doorbell support v4
Because increase SDMA_DOORBELL_RANGE to add new SDMA doorbell for paging queue will
break SRIOV, instead we can reserve and map two doorbell pages for amdgpu, paging
queues doorbell index use same index as SDMA gfx queues index but on second page.

For Vega20, after we change doorbell layout to increase SDMA doorbell for 8 SDMA RLC
queues later, we could use new doorbell index for paging queue.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-20 14:04:58 -05:00
Hawking Zhang
3e2e2ab554 drm/amdgpu/psp: initialize xgmi session (v2)
Setup and tear down xgmi as part of psp.

v2:
- make psp_xgmi_terminate static
- squash in:
drm/amdgpu: only issue xgmi cmd when it is enabled
drm/amdgpu/psp: terminate xgmi ta in suspend and hw_fini phase

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-06 14:02:43 -05:00
Rex Zhu
1e256e2762 drm/amdgpu: Refine CSA related functions
There is no functional changes,
Use function arguments for SRIOV special variables which
is hardcode in those functions.

so we can share those functions in baremetal.

Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05 14:21:48 -05:00
Andrey Grodzovsky
3ba7b418f1 drm/amdgpu: Enable default GPU reset for dGPU on gfx8/9 v3
After testing looks like these subset of ASICs has GPU reset
working for the most part. Enable reset due to job timeout.

v2: Switch from GFX version to ASIC type.
v3: Fix identation

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05 14:21:23 -05:00
Christian König
9d064be1e6 drm/amdgpu: revert "enable gfxoff in non-sriov and stutter mode by default"
This is still completely breaking my Raven system.

This reverts commit cdf2f910fa969adca1b0e3ad2b487821233dc038.

Revert until we sort out the sbios and firmware combinations that work
correctly.

bug: https://bugs.freedesktop.org/show_bug.cgi?id=108606
Cc: stable@vger.kernel.org # v4.19

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-01 09:56:56 -05:00
Andrey Grodzovsky
734afd4b21 drm/amdgpu: Fix skipping hangged job reset during gpu recover.
Problem:
During GPU recover DAL would hang in
amdgpu_pm_compute_clocks->amdgpu_fence_wait_empty

Fix:
Turns out there was a typo introduced by
3320b8d drm/amdgpu: remove job->ring which caused skipping
amdgpu_fence_driver_force_completion and so the hangged job
was never force signaled and this would cause the hang later in DAL.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-01 09:51:33 -05:00
Emily Deng
91eec27ebb drm/amdgpu: Fix null pointer amdgpu_device_fw_loading
Need to check adev->powerplay.pp_funcs.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-22 14:40:54 -05:00
Rex Zhu
7a3e0bb2a5 drm/amdgpu: Load fw between hw_init/resume_phase1 and phase2
Extract the function of fw loading out of powerplay.
Do fw loading between hw_init/resuem_phase1 and phase2

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-10 14:49:21 -05:00
Rex Zhu
0a4f25205e drm/amdgpu: split ip hw_init into 2 phases
We need to do some IPs earlier to deal with ordering issues
similar to how resume is split into two phases.

Will do fw loading via smu/psp between the two phases.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-10 14:49:09 -05:00
Rex Zhu
c8963ea4ce drm/amdgpu: Split amdgpu_ucode_init/fini_bo into two functions
1. one is for create/free bo when init/fini
2. one is for fill the bo before fw loading

the ucode bo only need to be created when load driver
and free when driver unload.

when resume/reset, driver only need to re-fill the bo
if the bo is allocated in vram.

Suggested by Christian.

v2: Return error when bo create failed.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-10 14:48:52 -05:00
Rex Zhu
a2d31dc3cf drm/amdgpu: Check late_init status before set cg/pg state
Fix cg/pg unexpected set in hw init failed case.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-10 14:48:44 -05:00
Rex Zhu
73f847dbab drm/amdgpu: Refine function amdgpu_device_ip_late_init
1. only call late_init when hw_init successful,
   so check status.hw instand of status.valid in late_init.
2. set status.late_initialized true if late_init was not implemented.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-10 14:48:28 -05:00
Rex Zhu
44779b43f1 drm/amdgpu: Move gfx flag in_suspend to adev
Move in_suspend flag to adev from gfx, so
can be used in other ip blocks, also keep
consistent with gpu_in_reset flag.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-09 17:05:33 -05:00
Evan Quan
b55c9e7a11 drm/amd/powerplay: helper interfaces for MGPU fan boost feature
MGPU fan boost feature is enabled only when two or more dGPUs
in the system.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-09 16:59:56 -05:00
Christian König
403009bfba drm/amdgpu: fix shadow BO restoring
Don't grab the reservation lock any more and simplify the handling quite
a bit.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-19 12:38:41 -05:00