Commit Graph

377 Commits

Author SHA1 Message Date
Dave Airlie
c76f0b2cc2 Merge tag 'drm-amdkfd-next-2018-05-14' of git://people.freedesktop.org/~gabbayo/linux into drm-next
This is amdkfd pull for 4.18. The major new features are:

- Add support for GFXv9 dGPUs (VEGA)
- Add support for userptr memory mapping

In addition, there are a couple of small fixes and improvements, such as:
- Fix lock handling
- Fix rollback packet in kernel kfd_queue
- Optimize kfd signal handling
- Fix CP hang in APU

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180514070126.GA1827@odedg-x270
2018-05-15 16:06:08 +10:00
Randy Dunlap
7bbc0b950f drm/amdkfd: fix build, select MMU_NOTIFIER
When CONFIG_MMU_NOTIFIER is not enabled, struct mmu_notifier has an
incomplete type definition, which causes build errors.

../drivers/gpu/drm/amd/amdkfd/kfd_priv.h:607:22: error: field 'mmu_notifier' has incomplete type
../include/linux/kernel.h:979:32: error: dereferencing pointer to incomplete type
../include/linux/kernel.h:980:18: error: dereferencing pointer to incomplete type
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:434:2: error: implicit declaration of function 'mmu_notifier_unregister_no_release' [-Werror=implicit-function-declaration]
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:435:2: error: implicit declaration of function 'mmu_notifier_call_srcu' [-Werror=implicit-function-declaration]
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:438:21: error: variable 'kfd_process_mmu_notifier_ops' has initializer but incomplete type
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:439:2: error: unknown field 'release' specified in initializer
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:439:2: warning: excess elements in struct initializer [enabled by default]
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:439:2: warning: (near initialization for 'kfd_process_mmu_notifier_ops') [enabled by default]
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:534:2: error: implicit declaration of function 'mmu_notifier_register' [-Werror=implicit-function-declaration]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-24 12:50:04 +03:00
Andres Rodriguez
1cf6cc74bb drm/amdkfd: fix clock counter retrieval for node without GPU
Currently if a user requests clock counters for a node without a GPU
resource we will always return EINVAL.

Instead if no GPU resource is attached, fill the gpu_clock_counter
argument with zeroes so that we may proceed and return valid CPU
counters.

Signed-off-by: Andres Rodriguez <andres.rodriguez@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-24 12:34:44 +03:00
Wei Yongjun
ded5e5622c drm/amdkfd: Fix the error return code in kfd_ioctl_unmap_memory_from_gpu()
Passing NULL pointer to PTR_ERR will result in return value of 0
indicating success which is clearly not what it is intended here.
This patch returns -EINVAL instead.

v2: change ret code to -ENODEV

Fixes: 5ec7e02854 ("drm/amdkfd: Add ioctls for GPUVM memory management")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-24 12:14:55 +03:00
kbuild test robot
a4efd3a4e6 drm/amdkfd: kfd_dev_is_large_bar() can be static
Fixes: 5ec7e02854 ("drm/amdkfd: Add ioctls for GPUVM memory management")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-24 12:05:27 +03:00
Laura Abbott
af47b39027 drm/amdkfd: Remove vla
There's an ongoing effort to remove VLAs[1] from the kernel to eventually
turn on -Wvla. Switch to a constant value that covers all hardware.

[1] https://lkml.org/lkml/2018/3/7/621

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-13 14:24:12 -07:00
Felix Kuehling
c129db1206 drm/amdkfd: Add sanity checks in IRQ handlers
Only accept interrupts from KFD VMIDs. Just checking for a PASID may
not be enough because amdgpu started using PASIDs to map VM faults
to processes.

Warn if an IRQ doesn't have a valid PASID (indicating a firmware bug).

Suggested-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Suggested-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:12 -04:00
Shaoyun Liu
2533f0741e drm/amdkfd: Remove queue node when destroy queue failed
HWS may hang in the middle of destroy queue, remove the queue from the
process queue list so it won't be freed again in the future

Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:11 -04:00
Ben Goz
bfdcbfd255 drm/amdkfd: Locking PM mutex while allocating IB buffer
Signed-off-by: Ben Goz <ben.goz@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:10 -04:00
Felix Kuehling
ccb76b149e drm/amdkfd: Remove initialization of cp_hqd_ib_control on CIK
The initialization is not necessary. amd-kfd-staging and ROCm
releases have worked without it for two years.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:09 -04:00
Felix Kuehling
eeb27b7eb3 drm/amdkfd: Fix signal handling performance again
It turns out that idr_for_each_entry is really slow compared to just
iterating over the slots. Based on measurements the difference is
estimated to be about a factor 64. That means using idr_for_each_entry
is only worth it with very few allocated events.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:08 -04:00
Yong Zhao
f8ea72d097 drm/amdkfd: Fix CP soft hang on APUs
The problem happens on Raven and Carrizo. The context save handler
should not clear the high bits of PC_HI before extracting the bits
of IB_STS.

The bug is not relevant to VEGA10 until we enable demand paging.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:07 -04:00
Yong Zhao
0db54b24ad drm/amdkfd: Separate trap handler assembly code and its hex values
Since the assembly code is inside "#if 0", it is ineffective. Despite that,
during debugging, we need to change the assembly code, extract it into
a separate file and compile the new file into hex values using sp3.
That process also requires us to remove "#if 0" and modify lines starting
with "#", so that sp3 can successfully compile the new file.

With this change, all the above chore is no longer needed, and
cwsr_trap_handler_gfx*.asm can be directly used by sp3 to generate its
hex values.

Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:06 -04:00
Felix Kuehling
a2e94158b8 drm/amdkfd: Remove redundant include of amd-iommu.h
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:05 -04:00
Philip Yang
fa7e65147e drm/amdkfd: use %px to print user space address instead of %p
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:04 -04:00
Jay Cornwall
2774c63ef3 drm/amdkfd: Use volatile MTYPE in default/alternate apertures
MTYPE_NC_NV (0) marks scalar/vector L1 cache lines as non-volatile.
Cache lines loaded through these apertures are intended to be
invalidated before (and sometimes during) a dispatch. The non-volatile
qualifier prevents these cache lines from being distinguished from
those loaded through the private aperture.

Use MTYPE_NC (1) instead on both Gfx7 and Gfx8. This allows the
compiler to use the BUFFER_WBINVL1_VOL instruction and is a precursor
to automatic per-dispatch scalar/vector L1 volatile invalidation.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:03 -04:00
Jay Cornwall
87e6d4e077 drm/amdkfd: Reduce priority of context-saving waves before spin-wait
Synchronization between context-saving wavefronts is achieved by
sending a SAVEWAVE message to the SPI and then spin-waiting for a
response. These spin-waiting wavefronts may inhibit the progress
of other wavefronts in the context save handler, leading to the
synchronization condition never being achieved.

Before spin-waiting reduce the priority of each wavefront to
guarantee foward progress in the others.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:02 -04:00
Oak Zeng
24f48a4203 drm/amdkfd: Dump HQD of HIQ
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:01 -04:00
Dan Carpenter
8feaccf71d drm/amdkfd: Integer overflows in ioctl
args->n_devices is a u32 that comes from the user.  The multiplication
could overflow on 32 bit systems possibly leading to privilege
escalation.

Fixes: 5ec7e02854 ("drm/amdkfd: Add ioctls for GPUVM memory management")
Signed-off-by: Dan Carpenter dan.carpenter@oracle.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-24 16:35:49 +03:00
Felix Kuehling
389056e5fe drm/amdkfd: Add Vega10 topology and device info
* Report 64-bit doorbells as HSA_CAP_DOORBELL_TYPE_2_0 in topology
* Report cache information in topology (duplicates GFXv8 info for now)
* Add device info for Vega10 support in KFD

Raven is not enabled at this time as it needs additional changes in
DQM to work with a single SDMA engine.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:18 -04:00
welu
6106dce955 drm/amdkfd: Try to enable atomics for all GPUs
Report failure to enable atomics only on GPUs that require them.
This allows GPUs that don't require atomics to function, but can
benefit if they are available. This is the case for Vega10, which
doesn't use atomics for basic functioning of the MEC, AQL and HWS
microcode. So it can work without atomics. But shader programs can
still use atomic instructions on systems that support PCIe atomics.

Signed-off-by: welu <Wei.Lu2@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:17 -04:00
Felix Kuehling
3e76c2399b drm/amdkfd: Add GFXv9 CWSR trap handler
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:16 -04:00
Felix Kuehling
70a31d16cc drm/amdkfd: Support flat memory apertures for GFXv9
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:15 -04:00
Felix Kuehling
6aac0a48b0 drm/amdkfd: Remove limit on number of GPUs (follow-up)
This condition was missed in a previous commit with the same title.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:14 -04:00
Felix Kuehling
9d7d024816 drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
v2: Removed redundant 0x before %p.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-08 22:03:51 -04:00
Felix Kuehling
bebfd2f412 drm/amdkfd: Fix kernel queue rollback_packet
kq->queue->properties.write_ptr is a GPU address which can'd be
derefenced in the kernel. Use kq->wptr_kernel instead, which is the
kernel CPU address of the same buffer.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:12 -04:00
Felix Kuehling
2a26fbfe80 drm/amdkfd: Fix goto usage
Missed a spot in previous cleanup commit:
Remove gotos that do not feature any common cleanup, and use gotos
instead of repeating cleanup commands.

According to kernel.org: "The goto statement comes in handy when a
function exits from multiple locations and some common work such as
cleanup has to be done. If there is no cleanup needed then just return
directly."

Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:11 -04:00
Felix Kuehling
ca750681bc drm/amdkfd: Add SOC15 interrupt processing support
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:10 -04:00
Felix Kuehling
bed4f11025 drm/amdkfd: Add GFXv9 device queue manager
Signed-off-by: John Bridgman <john.bridgman@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:09 -04:00
Felix Kuehling
b91d43dd01 drm/amdkfd: Add GFXv9 MQD manager
Signed-off-by: John Bridgman <john.bridgman@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:08 -04:00
Felix Kuehling
454150b1f9 drm/amdkfd: Add GFXv9 PM4 packet writer functions
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:07 -04:00
Felix Kuehling
f6e27ff19d drm/amdkfd: Move packet writer functions into ASIC-specific file
This is in preparation for GFXv9 (Vega10) which uses incompatible PM4
packet formats from previous ASIC generations.

Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:06 -04:00
Felix Kuehling
ef568db792 drm/amdkfd: Implement doorbell allocation for SOC15
Allocate doorbells according to the doorbell routing information on
SOC15 ASICs (Vega10 and later). On older ASICs we continue to use the
queue_id as the doorbell ID to maintain compatibility with the Thunk.

Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:05 -04:00
Harish Kasiviswanathan
df03ef9342 drm/amdkfd: Clean up KFD_MMAP_ offset handling
Use bit-rotate for better clarity and remove _MASK from the #defines as
these represent mmap types.

Centralize all the parsing of the mmap offset in kfd_mmap and add device
parameter to doorbell and reserved_mem map functions.

Encode gpu_id into upper bits of vm_pgoff. This frees up the lower bits
for encoding the the doorbell ID on Vega10.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:04 -04:00
Felix Kuehling
ada2b29c4a drm/amdkfd: Make doorbell size ASIC-dependent
This prepares for GFXv9 (Vega10), which has 64-bit doorbells.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-04-10 17:33:03 -04:00
Linus Torvalds
320b164abb main drm pull request for v4.17
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJavDxYAAoJEAx081l5xIa+pCoP/iwjuxkSTdJpZUx5g0daGkCK
 O18moGqGPChb7qJovfHqCKZ1f9PGulQt7SxwFzzJXNbv0PbfMA/Og0EhMLBImb+Q
 VfYgq2vJLpmkikgcI5fBrzs9DRMQKKobGIzw24VS7IkPYA7d8KgAyywBwG0+LUFR
 G3sobClgapsfaUcleb3ZOeDwymGkGCuuYRpYE4giHtuMDIxCWLePKJKOaOIq8o6P
 A1557EvSbKuLQGI9X50jzJOoBE3TKRQYkzuM1GthdOF8RHaMNcFy44lDNO030HwZ
 hzwAIg5Izhu16PqZGyEdIQ6SJTv3isRJWEciPnOsijvjl1li3ehMdQfhGISa/jZO
 ivEGd32kaactiT0jJ5OyexergEViCPVKCIORksSIk46L84luDva9L22A3yu0mf3F
 ixB63bAiLH7Py77kH3DmeJdqhMxlVZXCbdBVFDvzZvY4O3Mx0Dv9mmN/nw1FVCFH
 scSYnXea9/o4IY5yGASU6FAUJEEGu20HAN12oHJw7/taqV/gbbEos3F7AGmjJE0f
 qe6Rt/8fwi7Lhm2va6EoOo6yltH/gL4/AgnsN76VzppNGbaIv7W8Qa4Y/ES1lAE1
 SATAEUJfU8kiLrVOolIElPbgfdJwv8TzoxiKB5wK/eoH20wf4BTmOuBMviaL2qXK
 Sz6wihq+IlMXW7Y7pIl/
 =DrA+
 -----END PGP SIGNATURE-----

Merge tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linux

Pull drm updates from Dave Airlie:
 "Cannonlake and Vega12 support are probably the two major things. This
  pull lacks nouveau, Ben had some unforseen leave and a few other
  blockers so we'll see how things look or maybe leave it for this merge
  window.

  core:
   - Device links to handle sound/gpu pm dependency
   - Color encoding/range properties
   - Plane clipping into plane check helper
   - Backlight helpers
   - DP TP4 + HBR3 helper support

  amdgpu:
   - Vega12 support
   - Enable DC by default on all supported GPUs
   - Powerplay restructuring and cleanup
   - DC bandwidth calc updates
   - DC backlight on pre-DCE11
   - TTM backing store dropping support
   - SR-IOV fixes
   - Adding "wattman" like functionality
   - DC crc support
   - Improved DC dual-link handling

  amdkfd:
   - GPUVM support for dGPU
   - KFD events for dGPU
   - Enable PCIe atomics for dGPUs
   - HSA process eviction support
   - Live-lock fixes for process eviction
   - VM page table allocation fix for large-bar systems

  panel:
   - Raydium RM68200
   - AUO G104SN02 V2
   - KEO TX31D200VM0BAA
   - ARM Versatile panels

  i915:
   - Cannonlake support enabled
   - AUX-F port support added
   - Icelake base enabling until internal milestone of forcewake support
   - Query uAPI interface (used for GPU topology information currently)
   - Compressed framebuffer support for sprites
   - kmem cache shrinking when GPU is idle
   - Avoid boosting GPU when waited item is being processed already
   - Avoid retraining LSPCON link unnecessarily
   - Decrease request signaling latency
   - Deprecation of I915_SET_COLORKEY_NONE
   - Kerneldoc and compiler warning cleanup for upcoming CI enforcements
   - Full range ycbcr toggling
   - HDCP support

  i915/gvt:
   - Big refactor for shadow ppgtt
   - KBL context save/restore via LRI cmd (Weinan)
   - Properly unmap dma for guest page (Changbin)

  vmwgfx:
   - Lots of various improvements

  etnaviv:
   - Use the drm gpu scheduler
   - prep work for GC7000L support

  vc4:
   - fix alpha blending
   - Expose perf counters to userspace

  pl111:
   - Bandwidth checking/limiting
   - Versatile panel support

  sun4i:
   - A83T HDMI support
   - A80 support
   - YUV plane support
   - H3/H5 HDMI support

  omapdrm:
   - HPD support for DVI connector
   - remove lots of static variables

  msm:
   - DSI updates from 10nm / SDM845
   - fix for race condition with a3xx/a4xx fence completion irq
   - some refactoring/prep work for eventual a6xx support (ie. when we
     have a userspace)
   - a5xx debugfs enhancements
   - some mdp5 fixes/cleanups to prepare for eventually merging
     writeback
   - support (ie. when we have a userspace)

  tegra:
   - mmap() fixes for fbdev devices
   - Overlay plane for hw cursor fix
   - dma-buf cache maintenance support

  mali-dp:
   - YUV->RGB conversion support

  rockchip:
   - rk3399/chromebook fixes and improvements

  rcar-du:
   - LVDS support move to drm bridge
   - DT bindings for R8A77995
   - Driver/DT support for R8A77970

  tilcdc:
   - DRM panel support"

* tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linux: (1646 commits)
  drm/i915: Fix hibernation with ACPI S0 target state
  drm/i915/execlists: Use a locked clear_bit() for synchronisation with interrupt
  drm/i915: Specify which engines to reset following semaphore/event lockups
  drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.
  drm/amdkfd: Use ordered workqueue to restore processes
  drm/amdgpu: Fix acquiring VM on large-BAR systems
  drm/amd/pp: clean header file hwmgr.h
  drm/amd/pp: use mlck_table.count for array loop index limit
  drm: Fix uabi regression by allowing garbage mode->type from userspace
  drm/amdgpu: Add an ATPX quirk for hybrid laptop
  drm/amdgpu: fix spelling mistake: "asssert" -> "assert"
  drm/amd/pp: Add new asic support in pp_psm.c
  drm/amd/pp: Clean up powerplay code on Vega12
  drm/amd/pp: Add smu irq handlers for legacy asics
  drm/amd/pp: Fix set wrong temperature range on smu7
  drm/amdgpu: Don't change preferred domian when fallback GTT v5
  drm/vmwgfx: Bump version patchlevel and date
  drm/vmwgfx: use monotonic event timestamps
  drm/vmwgfx: Unpin the screen object backup buffer when not used
  drm/vmwgfx: Stricter count of legacy surface device resources
  ...
2018-04-02 07:59:23 -07:00
Felix Kuehling
6b95e7973a drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_calls
These interfaces allow KGD to stop and resume all GPU user mode queue
access to a process address space. This is needed for handling MMU
notifiers of userptrs mapped for GPU access in KFD VMs.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-23 15:32:32 -04:00
Felix Kuehling
d1853f42b6 drm/amdkfd: GFP_NOIO while holding locks taken in MMU notifier
When an MMU notifier runs in memory reclaim context, it can deadlock
trying to take locks that are already held in the thread causing the
memory reclaim. The solution is to avoid memory reclaim while holding
locks that are taken in MMU notifiers by using GFP_NOIO.

This commit fixes memory allocations done while holding the dqm->lock
which is needed in the MMU notifier (dqm->ops.evict_process_queues).

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-23 15:32:31 -04:00
Felix Kuehling
1679ae8f8f drm/amdkfd: Use ordered workqueue to restore processes
Restoring multiple processes concurrently can lead to live-locks
where each process prevents the other from validating all its BOs.

v2: fix duplicate check of same variable

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-23 15:30:36 -04:00
Felix Kuehling
72a01d231d drm/amdkfd: Deallocate SDMA queues correctly
Deallocate SDMA queues during abnormal process termination and when
queue creation fails after the SDMA allocation.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-23 15:30:34 -04:00
Felix Kuehling
c70a362687 drm/amdkfd: Fix scratch memory with HWS enabled
Program sh_hidden_private_base_vmid correctly in the map-process
PM4 packet.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-23 15:30:33 -04:00
Felix Kuehling
374200b154 drm/amdkfd: Add module option for testing large-BAR functionality
Simulate large-BAR system by exporting only visible memory. This
limits the amount of available VRAM to the size of the BAR, but
enables CPU access to VRAM.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-15 17:27:53 -04:00
Felix Kuehling
0fc8011f89 drm/amdkfd: Kmap event page for dGPUs
The events page must be accessible in user mode by the GPU and CPU
as well as in kernel mode by the CPU. On dGPUs user mode virtual
addresses are managed by the Thunk's GPU memory allocation code.
Therefore we can't allocate the memory in kernel mode like we do
on APUs. But KFD still needs to map the memory for kernel access.
To facilitate this, the Thunk provides the buffer handle of the
events page to KFD when creating the first event.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-15 17:27:52 -04:00
Felix Kuehling
5ec7e02854 drm/amdkfd: Add ioctls for GPUVM memory management
v2:
* Fix error handling after kfd_bind_process_to_device in
  kfd_ioctl_map_memory_to_gpu
v3:
* Add ioctl to acquire VM from a DRM FD
v4:
* Return number of successful map/unmap operations in failure cases
* Facilitate partial retry after failed map/unmap
* Added comments with parameter descriptions to new APIs
* Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-15 17:27:51 -04:00
Felix Kuehling
552764b680 drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
On GFX7 the CP does not perform a TC flush when queues are unmapped.
To avoid TC eviction from accessing an invalid VMID, flush it
explicitly before releasing a VMID.

v2: Fix unnecessary list_for_each_entry_safe
v3: Moved allocation to kfd_process_device_init_vm

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-15 17:27:50 -04:00
Felix Kuehling
f35751b870 drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
Add helpers for allocating GPUVM memory in kernel mode and use them
to allocate memory for the CWSR trap handler.

v2: Use dev instead of pdd->dev in kfd_process_free_gpuvm
v3:
* Cleaned up and simplified kfd_process_alloc_gpuvm
* Moved allocation for dGPU to kfd_process_device_init_vm

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-15 17:27:49 -04:00
Felix Kuehling
52b29d7334 drm/amdkfd: Add per-process IDR for buffer handles
Also used for cleaning up on process termination.

v2: Refactored cleanup on process termination

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-15 17:27:48 -04:00
Felix Kuehling
d01994c24c drm/amdkfd: Aperture setup for dGPUs
Set up the GPUVM aperture for SVM (shared virtual memory) that allows
sharing a part of virtual address space between GPUs and CPUs.

Report the size of the GPUVM aperture that is supported by KGD accurately.

The low part of the GPUVM aperture is reserved for kernel use. This is
for kernel-allocated buffers that are only accessed on the GPU:
- CWSR trap handler
- IB for submitting commands in user-mode context from kernel mode

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-15 17:27:47 -04:00
Felix Kuehling
c7bcbfa4f8 drm/amdkfd: Remove limit on number of GPUs
Currently the number of GPUs is limited by aperture placement options
available on GFX7 and GFX8 hardware. This limitation is not necessary.
Scratch and LDS represent per-work-item and per-work-group storage
respectively. Different work-items and work-groups use the same virtual
address to access their own data. Work running on different GPUs is by
definition in different work-groups (different dispatches, in fact).
That means the same virtual addresses can be used for these apertures
on different GPUs.

Add a new AMDKFD_IOC_GET_PROCESS_APERTURES_NEW ioctl that removes the
artificial limitation on the number of GPUs that can be supported. The
new ioctl allows user mode to query the number of GPUs to allocate
enough memory for all GPUs to be reported.

This deprecates AMDKFD_IOC_GET_PROCESS_APERTURES.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-15 17:27:46 -04:00
Oak Zeng
7c9b717196 drm/amdkfd: Populate DRM render device minor
Populate DRM render device minor in kfd topology

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-15 17:27:45 -04:00