Use a reference counter instead of a lock to prevent process
destruction while functions running out of process context are using
the kfd_process structure. In many cases these functions don't need
the structure to be locked. In the few cases that really do need the
process lock, take it explicitly.
This helps simplify lock dependencies between the process lock and
other locks, particularly amdgpu and mm_struct locks. This will be
important when amdgpu calls back to amdkfd for memory evictions.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This will be used to elliminate the use of the process lock for
preventing concurrent process destruction. This will simplify lock
dependencies between KFD and KGD.
This also simplifies the process destruction in a few ways:
* Don't allocate work struct dynamically
* Remove unnecessary hack that increments mm reference counter
* Remove unnecessary process locking during destruction
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This commit adds several debugfs entries for kfd:
kfd/hqds: dumps all HQDs on all GPUs for KFD-controlled compute and
SDMA RLC queues
kfd/mqds: dumps all MQDs of all KFD processes on all GPUs
kfd/rls: dumps HWS runlists on all GPUs
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Allow HWS to to execute multiple processes on the hardware
concurrently. The number of concurrent processes is limited by
the number of VMIDs allocated to the HWS.
A module parameter can be used for limiting this further or turn
it off altogether (mainly for debugging purposes).
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This hardware feature allows the GPU to preempt shader execution in
the middle of a compute wave, save the state and restore it later
to resume execution.
Memory for saving the state is allocated per queue in user mode and
the address and size passed to the create_queue ioctl. The size
depends on the number of waves that can be in flight simultaneously
on a given ASIC.
Signed-off-by: Shaoyun.liu <shaoyun.liu@amd.com>
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
A list of per-process queues is maintained in the
kfd_process_queue_manager, so the queues array in kfd_process is
redundant and in fact unused.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
In systems under heavy load the IH work may experience significant
scheduling delays.
Under load + system workqueue:
Max Latency: 7.023695 ms
Avg Latency: 0.263994 ms
Under load + high priority workqueue:
Max Latency: 1.162568 ms
Avg Latency: 0.163213 ms
Further work is required to measure the impact of per-cpu settings on IH
performance.
Signed-off-by: Andres Rodriguez <andres.rodriguez@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Replace our implementation of a lockless ring buffer with the standard
linux kernel kfifo.
We shouldn't maintain our own version of a standard data structure.
Signed-off-by: Andres Rodriguez <andres.rodriguez@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This allows increasing the KFD_SIGNAL_EVENT_LIMIT in kfd_ioctl.h
without breaking processes built with older kfd_ioctl.h versions.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signal slots are identical to event IDs.
Replace the used_slot_bitmap and events hash table with an IDR to
allocate and lookup event IDs and signal slots more efficiently.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The first event page is always big enough to handle all events.
Handling of multiple events pages is not supported by user mode, and
not necessary.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cleaned up the code while resolving some potential bugs and
inconsistencies in the process.
Clean-ups:
* Remove enum kfd_event_wait_result, which duplicates
KFD_IOC_EVENT_RESULT definitions
* alloc_event_waiters can be called without holding p->event_mutex
* Return an error code from copy_signaled_event_data instead of bool
* Clean up error handling code paths to minimize duplication in
kfd_wait_on_events
Fixes:
* Consistently return an error code from kfd_wait_on_events and set
wait_result to KFD_IOC_WAIT_RESULT_FAIL in all failure cases.
* Always call free_waiters while holding p->event_mutex
* copy_signaled_event_data might sleep. Don't call it while the task state
is TASK_INTERRUPTIBLE.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The kfd_process doesn't own a reference to the mm_struct, so it can
disappear without warning even while the kfd_process still exists.
Therefore, avoid dereferencing the kfd_process.mm pointer and make
it opaque. Use get_task_mm to get a temporary reference to the mm
when it's needed.
v2: removed unnecessary WARN_ON
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Removed unused num_concurrent_processes.
Implemented counting of queues in QPD. This makes counting the queue
list repeatedly in several places unnecessary.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Separate device queue termination from process queue manager
termination. Unmap all queues at once instead of one at a time.
Unmap device queues before the PASID is unbound, in the
kfd_process_iommu_unbind_callback.
When resetting wavefronts in non-HWS mode, do it before the VMID is
released.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: shaoyun liu <shaoyun.liu@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
When unmapping the queues from HW scheduler, there are two actions:
reset and preempt. So naming the variables with only preempt is
inapproriate.
For functions such as destroy_queues_cpsch, what they do actually is to
unmap the queues on HW scheduler rather than to destroy them. Change the
name to reflect that fact. On the other hand, there is already a function
called destroy_queue_cpsch() which exactly destroys a queue, and the name
is very close to destroy_queues_cpsch(), resulting in confusion.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
There are already CHIP_* definitions under amd_shared.h file on amdgpu
side, so KFD should reuse them rather than defining new ones.
Using enum for asic type requires default cases on switch statements
to prevent compiler warnings. WARN on unsupported ASICs. It should never
get there because KFD should not be initialized on unsupported devices.
v2: Replace BUG() with WARN and error return
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The hard-coded values related to VMID were removed in KFD, as those
values can be calculated in the KFD initialization function.
v2: remove unnecessary local variable
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Adjust latencies and timeouts for dequeueing with HWS and consolidate
them in one place. Make them longer to allow long running waves to
complete without causing a timeout. The timeout is twice as long as the
latency plus some buffer to make sure we don't detect a timeout
prematurely.
Change timeouts for dequeueing HQDs without HWS. KFD_UNMAP_LATENCY is
more consistent with what the HWS does for user queues.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The timeout in milliseconds should not be regarded as jiffies. This
commit fixed that.
v2:
- use msecs_to_jiffies
- change timeout_ms parameter to unsigned int to match msecs_to_jiffies
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
When we do suspend/resume through "sudo pm-suspend" while there is
HSA activity running, upon resume we will encounter HWS hanging, which
is caused by memory read/write failures. The root cause is that when
suspend, we neglected to unbind pasid from kfd device.
Another major change is that the bind/unbinding is changed to be
performed on a per process basis, instead of whether there are queues
in dqm.
v2:
- free IOMMU device if kfd_bind_processes_to_device fails in kfd_resume
- add comments to kfd_bind/unbind_processes_to/from_device
- minor cleanups
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZ0WQ6AAoJEHm+PkMAQRiGuloH/3sF4qfBhPuJo8OTf0uCtQ18
4Ux9zZbm81df/Jjz0exAp1Jqk+TvdIS3OXPWcKilvbUBP16hQcsxFTnI/5QF+YcN
87aNr+OCMJzOBK4suN1yhzO46NYHeIizdB0PTZVL1Zsto69Tt31D8VJmgH6oBxAw
Isb/nAkOr31dZ9PI5UEExTIanUt6EywVb0UswA+2rNl3h1UkeasQCpMpK2n6HBhU
kVD7sxEd/CN0MmfhB0HrySSam/BeSpOtzoU9bemOwrU2uu9+5+2rqMe7Gsdj4nX6
3Kk+7FQNktlrhxCZIFN/+CdusOUuDd8r/75d7DnsRK5YvSb0sZzJkfD3Nba68Ms=
=7J2+
-----END PGP SIGNATURE-----
BackMerge tag 'v4.14-rc3' into drm-next
Linux 4.14-rc3
Requested by Daniel for the tracing build fix in fixes.
PASID management is moving into KGD. Limiting the PASID range to the
number of doorbell pages is no longer practical.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2:
* Renamed ALLOC_MEMORY_OF_SCRATCH to SET_SCRATCH_BACKING_VA
* Removed size parameter from the ioctl, it was unused
* Removed hole in ioctl number space
* No more call to write_config_static_mem
* Return correct error code from ioctl
Signed-off-by: Moses Reuben <moses.reuben@amd.com>
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Various bug fixes and improvements that accumulated over the last two
years.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Using checkpatch.pl -f <file> showed a number of style issues. This
patch addresses as many of them as possible. Some long lines have been
left for readability, but attempts to minimize them have been made.
v2: Broke long lines in gfx_v7 get_fw_version
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Allow init_queue() to take 'struct queue_properties' by reference.
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
- inline functions need to be static inline, otherwise gcc can opt to
not inline and the linker gets unhappy.
- no forward decls for inline functions, just include the right headers.
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Ben Goz <ben.goz@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466500235-21282-2-git-send-email-daniel.vetter@ffwll.ch
This commit moves the reset wavefront flag to per process per device
data structure, so we can support multiple devices.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This commit makes sure that on process termination, after
we're destroying all the active queues, we're killing all the
existing wave front of the current process.
By doing this we're making sure that if any of the CUs were blocked
by infinite loop we're enforcing it to end the shader explicitly.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The wave control operation supports several command types executed upon
existing wave fronts that belong to the currently debugged process.
The available commands are:
HALT - Freeze wave front(s) execution
RESUME - Resume freezed wave front(s) execution
KILL - Kill existing wave front(s)
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds the skeleton H/W debugger module support. This code
enables registration and unregistration of a single HSA process at a
time.
The module saves the process's pasid and use it to verify that only the
registered process is allowed to execute debugger operations through the
kernel driver.
v2: rename get_dbgmgr_mutex to kfd_get_dbgmgr_mutex to namespace it
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds support for static user-mode queues in QCM.
Queues which are designated as static can NOT be preempted by
the CP microcode when it is executing its scheduling algorithm.
This is needed for supporting the debugger feature, because we
can't allow the CP to preempt queues which are currently being debugged.
The number of queues that can be designated as static is limited by the
number of HQDs (Hardware Queue Descriptors).
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Use the generic mechanism to declare a bitmap instead of unsigned long.
It seems that "struct kfd_process.allocated_queue_bitmap" is unused.
Maybe it could be deleted instead.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds a new kernel module parameter to amdkfd,
called send_sigterm.
This parameter specifies whether amdkfd should send the
SIGTERM signal to an HSA process, when the following conditions
occur:
1. The GPU triggers an exception regarding a kernel that was
issued by this process.
2. The HSA process isn't waiting on an event that handles
this exception.
The default behavior is not to send a SIGTERM and suffice
with a dmesg error print.
Reviewed-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds Peripheral Page Request (PPR) failure processing
and reporting.
Bad address or pointer to a system memory block with inappropriate
read/write permission cause such PPR failure during a user queue
processing. PPR request handling is done by IOMMU driver notifying
AMDKFD module on PPR failure.
The process triggering a PPR failure will be notified by
appropriate event or SIGTERM signal will be sent to it.
v3:
- Change all bool fields in struct kfd_memory_exception_failure to
uint32_t
Signed-off-by: Alexey Skidanov <alexey.skidanov@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds the events module (kfd_events.c) and the interrupt
handle module for Kaveri (cik_event_interrupt.c).
The patch updates the interrupt_is_wanted(), so that it now calls the
interrupt isr function specific for the device that received the
interrupt. That function(implemented in cik_event_interrupt.c)
returns whether this interrupt is of interest to us or not.
The patch also updates the interrupt_wq(), so that it now calls the
device's specific wq function, which checks the interrupt source
and tries to signal relevant events.
v2:
Increase limit of signal events to 4096 per process
Remove bitfields from struct cik_ih_ring_entry
Rename radeon_kfd_event_mmap to kfd_event_mmap
Add debug prints to allocate_free_slot and allocate_signal_page
Make allocate_event_notification_slot return a correct value
Add warning prints to create_signal_event
Remove error print from IOCTL path
Reformatted debug prints in kfd_event_mmap
Map correct size (as received from mmap) in kfd_event_mmap
v3:
Reduce limit of signal events back to 256 per process
Fix allocation of kernel memory for signal events
Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds the interrupt handling module, kfd_interrupt.c, and its
related members in different data structures to the amdkfd driver.
The amdkfd interrupt module maintains an internal interrupt ring
per amdkfd device. The internal interrupt ring contains interrupts
that needs further handling. The extra handling is deferred to
a later time through a workqueue.
There's no acknowledgment for the interrupts we use. The hardware
simply queues a new interrupt each time without waiting.
The fixed-size internal queue means that it's possible for us to lose
interrupts because we have no back-pressure to the hardware.
However, only interrupts that are "wanted" by amdkfd, are copied into
the amdkfd s/w interrupt ring, in order to minimize the chances
for overflow of the ring.
Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The current code can only support one kgd instance. We have to
support multiple kgd instances in one system. i.e two amdgpu or two
radeon or one amdgpu + one radeon or more than two kgd instances.
Signed-off-by: Xihan Zhang <xihan.zhang@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
This backmerges drm-fixes into drm-next mainly for the amdkfd
stuff, I'm not 100% confident, but it builds and the amdkfd
folks can fix anything up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Conflicts:
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
Backmerge Linus tree after rc5 + drm-fixes went in.
There were a few amdkfd conflicts I wanted to avoid,
and Ben requested this for nouveau also.
Conflicts:
drivers/gpu/drm/amd/amdkfd/Makefile
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
drivers/gpu/drm/amd/amdkfd/kfd_priv.h
drivers/gpu/drm/amd/include/kgd_kfd_interface.h
drivers/gpu/drm/i915/intel_runtime_pm.c
drivers/gpu/drm/radeon/radeon_kfd.c
This patch replaces the two current amdkfd module parameters with a new one.
The current parameters that are being replaced are:
- Maximum number of HSA processes
- Maximum number of queues per process
The new parameter that replaces them is called "Maximum queues per device"
This replacement achieves two goals:
- Allows the user to have as many HSA processes as it wants (until
a maximum of 512 HSA processes in Kaveri).
- Removes the limitation the user had on maximum number of queues per HSA
process. E.g. the user can now have processes which only have one queue and
other processes which have hundreds of queues, while before the user
couldn't have more than 128 queues per process (as default).
The default value of the new parameter is 4096 (32 * 128, which were the
defaults of the old parameters). There is almost no additional GART memory
required for the default case. As a reminder, this amount of queues requires a
little bit below 4MB of GART memory.
v2:
In addition, This patch defines a new counter for queues accounting in the DQM
structure. This is done because the current counter only counts active queues
which allows the user to create more queues than the
max_num_of_queues_per_device module parameter allows.
However, we need the current counter for the runlist packet build process, so
the solution is to have a dedicated counter for this accounting.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Ben Goz <ben.goz@amd.com>
The work queue couldn't reliably prevent the SW ring buffer from
overflowing, so dmesg was spammed by
kfd kfd: Interrupt ring overflow, dropping interrupt.
messages when running e.g. the Atlantis Substance demo from
https://wiki.unrealengine.com/Linux_Demos on Kaveri.
Since the SW ring buffer doesn't actually do anything at this point, just
remove it for now. When actual interrupt processing code is added to
amdkfd, it should try to do things immediately and only defer to work
queues when necessary.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>