Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Slava Grigorev <slava.grigorev@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Backmerge Linus tree after rc5 + drm-fixes went in.
There were a few amdkfd conflicts I wanted to avoid,
and Ben requested this for nouveau also.
Conflicts:
drivers/gpu/drm/amd/amdkfd/Makefile
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
drivers/gpu/drm/amd/amdkfd/kfd_priv.h
drivers/gpu/drm/amd/include/kgd_kfd_interface.h
drivers/gpu/drm/i915/intel_runtime_pm.c
drivers/gpu/drm/radeon/radeon_kfd.c
We need to wait for the GPUVM flush to complete. There
was some confusion as to how this mechanism was supposed
to work. The operation is not atomic. For GPU initiated
invalidations you need to read back a VM register to
introduce enough latency for the update to complete.
v2: drop gart changes
v3: just read back rather than polling
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Use multiple VMIDs for each VM, one for each ring. That allows
us to execute flushes separately on each ring. This is still not
ideal, because in a lot of cases rings could share IDs.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Previously we just allocated space for four hardware semaphores
in each software semaphore object. Make software semaphore objects
represent only one hardware semaphore address again by splitting
the sync code into its own object.
v2: fix typo in comment
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use ring structure instead of index and provide vm_id and pd_addr separately.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Always need to set bit 0 of RLC_CGTT_MGCG_OVERRIDE
to avoid unreliable doorbell updates in some cases.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The power management code calls into the display code for
certain things. If certain power management sysfs attributes
are called before the driver has finished initializing all of
the hardware we can run into problems with uninitialized
modesetting state. Add a check to the bandwidth update callbacks
to make sure modesetting init has completed. This can be
triggered by the tlp and laptop startup scripts, depending on
the timing.
bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=83611
https://bugs.freedesktop.org/show_bug.cgi?id=85771
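A minimal userspace sketch of the guard described above, assuming the
callback can simply return early while modesetting init is incomplete.
The struct `fake_device` and the function `bandwidth_update` are
hypothetical stand-ins for illustration, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not the real radeon structs. */
struct fake_device {
	bool mode_config_initialized;	/* set once modesetting init has finished */
	int bandwidth_updates;		/* how often display state was touched */
};

/* Returns true if the update ran, false if it was skipped. */
static bool bandwidth_update(struct fake_device *dev)
{
	assert(dev);

	/* Called too early (e.g. from sysfs): display state is not set up yet. */
	if (!dev->mode_config_initialized)
		return false;

	dev->bandwidth_updates++;
	return true;
}
```

A caller arriving before init completes gets a harmless no-op instead of
touching uninitialized modesetting state.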
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
CE ram size is 32k/0k/0k for GFX/CS0/CS1 with CIK
Ported from amdgpu driver.
Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Pull drm updates from Dave Airlie:
"This is the main git pull for the drm,
I pretty much froze major pulls at -rc5/6 time, and haven't had much
fallout, so will probably continue doing that.
Lots of changes all over, big internal header cleanup to make it clear
which drm features are legacy things and which are things that modern
KMS drivers should be using. Also big move to use the new generic
fences in all the TTM drivers.
core:
atomic prep work,
vblank rework changes, allows immediate vblank disables
major header reworking and cleanups to better delineate legacy
interfaces from what KMS drivers should be using.
cursor planes locking fixes
ttm:
move to generic fences (affects all TTM drivers)
ppc64 caching fixes
radeon:
userptr support,
uvd for old asics,
reset rework for fence changes
better buffer placement changes,
dpm feature enablement
hdmi audio support fixes
intel:
Cherryview work,
180 degree rotation,
skylake prep work,
execlist command submission
full ppgtt prep work
cursor improvements
edid caching,
vdd handling improvements
nouveau:
fence reworking
kepler memory clock work
gt21x clock work
fan control improvements
hdmi infoframe fixes
DP audio
ast:
ppc64 fixes
caching fix
rcar:
rcar-du DT support
ipuv3:
prep work for capture support
msm:
LVDS support for mdp4, new panel, gpu refactoring
exynos:
exynos3250 SoC support, drop bad mmap interface,
mipi dsi changes, and component match support"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (640 commits)
drm/mst: rework payload table allocation to conform better.
drm/ast: Fix HW cursor image
drm/radeon/kv: add uvd/vce info to dpm debugfs output
drm/radeon/ci: add uvd/vce info to dpm debugfs output
drm/radeon: export reservation_object from dmabuf to ttm
drm/radeon: cope with foreign fences inside the reservation object
drm/radeon: cope with foreign fences inside display
drm/core: use helper to check driver features
drm/radeon/cik: write gfx ucode version to ucode addr reg
drm/radeon/si: print full CS when we hit a packet 0
drm/radeon: remove unecessary includes
drm/radeon/combios: declare legacy_connector_convert as static
drm/radeon/atombios: declare connector convert tables as static
drm/radeon: drop btc_get_max_clock_from_voltage_dependency_table
drm/radeon/dpm: drop clk/voltage dependency filters for BTC
drm/radeon/dpm: drop clk/voltage dependency filters for CI
drm/radeon/dpm: drop clk/voltage dependency filters for SI
drm/radeon/dpm: drop clk/voltage dependency filters for NI
drm/radeon: disable audio when we disable hdmi (v2)
drm/radeon: split audio enable between eg and r600 (v2)
...
Adds an extra argument to radeon_bo_create, which is only used in radeon_prime.c.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Not the whole world is a radeon! :-)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Helpful for debugging as the version shows up in a
register dump.
Cc: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we may fail to init the second compute ring.
Noticed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This might decrease the chance of IH ring buffer overflows.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the same format for all ring indices, and fix the calculation of the
post-overflow RPTR.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise the bit remains set in rdev->ih.rptr, so the wptr can never
match that and we still have an infinite loop.
This fix allows me to successfully recover from an IH ring buffer
overflow.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm: backmerge tag 'v3.17-rc5' into drm-next
This is requested to get the fixes for intel and radeon into the
same tree for future development work.
i915_display.c: fix missing dev_priv conflict.
This allows us to specify if we want to sync to
the shared fences of a reservation object or not.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If we assign a Radeon device to a virtual machine, we can no longer
assume a fixed hardware topology, like the GPU having a parent device.
This patch simply adds a few pci_is_root_bus() tests to avoid passing
a NULL pointer to PCI access functions, allowing the radeon driver to
work in a QEMU 440FX machine with an assigned HD8570 on the emulated
PCI root bus.
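The kind of test described above can be modelled in userspace as
follows. This is only a sketch: `fake_bus`, `fake_dev`,
`fake_is_root_bus` and `upstream_link_speed` are hypothetical names, and
the model assumes that a root bus is one without a parent bus and
without an upstream bridge device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_dev;

/* Hypothetical, simplified model of a PCI bus. */
struct fake_bus {
	struct fake_bus *parent;	/* NULL on a root bus */
	struct fake_dev *self;		/* bridge leading to this bus, NULL on a root bus */
};

struct fake_dev {
	struct fake_bus *bus;		/* bus the device sits on */
	int link_speed;			/* arbitrary illustrative value */
};

static bool fake_is_root_bus(const struct fake_bus *bus)
{
	return bus->parent == NULL;
}

/* Returns the upstream bridge's link speed, or -1 if there is no bridge. */
static int upstream_link_speed(const struct fake_dev *gpu)
{
	assert(gpu && gpu->bus);

	/* On a root bus there is no parent device to dereference. */
	if (fake_is_root_bus(gpu->bus))
		return -1;

	return gpu->bus->self->link_speed;
}
```

With the check in place, a GPU placed directly on the root bus takes the
early-return path instead of following a NULL bridge pointer.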
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Blocking completely innocent processes with a GPU reset is
a pretty bad idea. Just set needs_reset and let the next
command submission or fence wait do the job.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fixes a problem with GPU resets and TLB flushes on SI/CIK.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Need to initialize the mask to 0 on init, otherwise it
keeps increasing.
bug:
https://bugzilla.kernel.org/show_bug.cgi?id=82581
v2: also fix cu count
v3: split count fix into separate patch
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: stable@vger.kernel.org
Fixes lockups due to CP read GPUVM faults when running piglit on Cape
Verde.
v2 (chk): apply the fix to R600+ as well, on CIK only the GFX CP has
a PFP, add more comments to R600 code, enable flushing again
v3: (agd5f): only apply to 7xx+. r6xx does not have the packet.
v4: (agd5f): split flush change into a separate patch, fix formatting
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
It isn't necessary for command streams generated by the kernel (at least
not while we aren't storing ring or indirect buffers in VRAM).
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch moves the initialization of the compute vmid to radeon.
That initialization was done in the kfd-->kgd interface, but doing it in
radeon as part of radeon's H/W initialization routines is more appropriate.
In addition, this simplifies the kfd-->kgd interface.
The patch removes the function from the interface file and from the interface
declaration file.
The function initializes memory apertures to fixed base/limit addresses and
non-cached memory types.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Skip the "manual" pageflip completion checks via polling and
guessing in the vblank handler radeon_crtc_handle_vblank() on
asics which are known to reliably support hw pageflip completion
irqs. Those pflip irqs are a more reliable and race-free method
of handling pageflip completion detection, whereas the "classic"
polling method has some small races in combination with dpm on,
and with the reworked pageflip implementation since Linux 3.16.
On old asics without pflip irqs, the classic method is used.
On asics with known good pflip irqs, only pflip irqs are used
by default, but a new module parameter "use_pflipirqs" allows
overriding this in case we encounter asics in the wild with
unreliable or faulty pflip irqs. A module parameter of 0 selects
the classic method only in such a case. A parameter of 1
enables both the classic method and pflip irqs, as an additional
band-aid to avoid some small races which could happen with the
classic method alone. The setting 1 gives Linux 3.16 behaviour.
Hw pflip irqs are available since R600.
Tested on DCE-4, AMD Cedar - FirePro 2270.
v2: agd5f: only enable pflip interrupts on DCE4+ as they are not
reliable on older asics.
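The selection logic described above can be sketched as two predicates.
This is an illustration only: the enum, the function names and the
numeric value 2 for the default "pflip irqs only" setting are
assumptions, since the text above only spells out the meaning of 0 and 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; only the meaning of 0 and 1 comes from the text. */
enum pflip_setting {
	PFLIP_CLASSIC_ONLY = 0,	/* classic vblank polling only */
	PFLIP_BOTH = 1,		/* classic method plus pflip irqs */
	PFLIP_IRQ_ONLY = 2,	/* assumed value for the default behaviour */
};

/* Use hw pflip irqs only on asics where they are known to be reliable. */
static bool use_pflip_irqs(int setting, bool reliable_irqs)
{
	assert(setting >= PFLIP_CLASSIC_ONLY && setting <= PFLIP_IRQ_ONLY);
	return reliable_irqs && setting != PFLIP_CLASSIC_ONLY;
}

/* Old asics always poll; newer ones poll unless irqs are used exclusively. */
static bool use_classic_poll(int setting, bool reliable_irqs)
{
	assert(setting >= PFLIP_CLASSIC_ONLY && setting <= PFLIP_IRQ_ONLY);
	return !reliable_irqs || setting != PFLIP_IRQ_ONLY;
}
```

On an asic without reliable pflip irqs both predicates ignore the
setting and fall back to the classic method.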
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Older firmware didn't support the new nop packet.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Older firmware didn't support the new nop packet.
v2 (Andreas Boll):
- Drop usage of packet3 for new firmware
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Cc: stable@vger.kernel.org
This ensures the GPU sees all previous CPU writes to VRAM, which makes it
safe:
* For userspace to stream data from CPU to GPU via VRAM instead of GTT
* For IBs to be stored in VRAM instead of GTT
* For ring buffers to be stored in VRAM instead of GTT, if the HPD flush
is performed via MMIO
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Seems to make VM flushes more stable on SI and CIK.
v2: only use the PFP on the GFX ring on CIK
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: fix rebase onto drm-fixes
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Doesn't seem necessary, the GART table memory should be persistent.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This adds CIK support for the new ucode format.
v2: add size validation, integrate debug info
v3: add support for MEC2 on KV
v4: fix typos
v5: update to latest format
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is a halfway fix for hawaii acceleration. More fixes to come
but hopefully isolated to userspace.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
We must mask out the overflow bit as well, otherwise
the wptr will never match the rptr again and the interrupt
handler will loop forever.
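A userspace sketch of the masking described above. The bit position of
the overflow flag, the 16-byte entry size and the names `IH_RB_OVERFLOW`,
`IH_ENTRY_SIZE` and `ih_get_wptr` are assumptions made for illustration;
the point is only that the flag must be cleared before the write pointer
is compared against the read pointer.

```c
#include <assert.h>
#include <stdint.h>

#define IH_RB_OVERFLOW	(1u << 0)	/* assumed position of the overflow flag */
#define IH_ENTRY_SIZE	16u		/* assumed size of one ring entry in bytes */

/*
 * Turn the raw write pointer register value into a ring offset.
 * On overflow, also move *rptr to the oldest entry that was not
 * overwritten, one entry past the write pointer.
 */
static uint32_t ih_get_wptr(uint32_t raw_wptr, uint32_t ptr_mask, uint32_t *rptr)
{
	uint32_t wptr = raw_wptr;

	assert(rptr);

	if (wptr & IH_RB_OVERFLOW) {
		/* Clear the flag first, or wptr can never equal rptr again. */
		wptr &= ~IH_RB_OVERFLOW;
		*rptr = (wptr + IH_ENTRY_SIZE) & ptr_mask;
	}

	return wptr & ptr_mask;
}
```

Because the read pointer only ever advances in whole entries, a write
pointer that still carries the flag bit would never compare equal, which
is the endless loop the commit message describes.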
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This patch adds the interface between the radeon driver and the amdkfd driver.
The interface implementation is contained in radeon_kfd.c and radeon_kfd.h.
The interface itself is represented by a pointer to struct
kfd_dev. The pointer is located inside radeon_device structure.
All the register accesses that amdkfd needs are done using this interface. This
allows us to avoid direct register accesses in amdkfd proper, while also
avoiding locking between amdkfd and radeon.
The single exception is the doorbells that are used in both of the drivers.
However, because they are located in separate pci bar pages, the danger of
sharing registers between the drivers is minimal.
Having said that, we are planning to move the doorbells as well to radeon.
v3:
Add interface for sa manager init and fini. The init function will allocate a
buffer on system memory and pin it to the GART address space via the radeon sa
manager.
All mappings of buffers to GART address space are done via the radeon sa
manager. The interface of allocate memory will use the radeon sa manager to sub
allocate from the single buffer that was allocated during the init function.
Change lower_32/upper_32 calls to use linux macros
Add documentation for the interface
v4:
Change ptr field type in kgd_mem from uint32_t* to void* to match the type that
is returned by radeon_sa_bo_cpu_addr
v5:
Change format of mqd structure to work with latest KV firmware
Add support for AQL queues creation to enable working with open-source HSA
runtime.
Move generic kfd-->kgd interface and other generic kgd definitions to a generic
header file that will be used by AMD's radeon and amdgpu drivers
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Implementing a lock for selecting and accessing shader engines and arrays.
This lock will make sure that radeon and amdkfd are not colliding when
accessing shader engines and arrays with GRBM_GFX_INDEX register.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Replace occurrences of "v & 0xffffffff" with lower_32_bits(v)
when it's next to an upper_32_bits(v). Also remove unnecessary
"upper_32_bits(v) & 0xffffffff" code snippets.
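For illustration, here is a userspace sketch of the pattern. The two
helpers are local re-implementations with the same names as the kernel
macros, written as inline functions so the example compiles on its own;
`emit_addr` is a hypothetical caller.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel helpers of the same name. */
static inline uint32_t lower_32_bits(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffu);
}

static inline uint32_t upper_32_bits(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Hypothetical caller: store a 64-bit address as two 32-bit words. */
static void emit_addr(uint32_t out[2], uint64_t addr)
{
	assert(out);
	out[0] = lower_32_bits(addr);	/* instead of addr & 0xffffffff */
	out[1] = upper_32_bits(addr);	/* no extra "& 0xffffffff" needed */
}
```

The result of upper_32_bits() already fits in 32 bits, which is why the
additional mask is redundant.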
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Merge drm-fixes into drm-next.
Both i915 and radeon need this done for later patches.
Conflicts:
drivers/gpu/drm/drm_crtc_helper.c
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_gem.c
drivers/gpu/drm/i915/i915_gem_execbuffer.c
drivers/gpu/drm/i915/i915_gem_gtt.c
This patch makes it possible to decide how many address
bits are spent on the page directory vs the page tables.
v2: remove unintended change
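A small sketch of what such a split means for address translation,
assuming 4 KiB pages and a two-level layout. The names `vm_indices`,
`vm_split` and `PAGE_SHIFT_ASSUMED` are hypothetical; `block_bits` stands
for the number of address bits handled by one page table.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED	12u	/* assumption: 4 KiB pages */

/* Hypothetical result type: index into the directory and into the table. */
struct vm_indices {
	uint64_t pde;	/* page directory entry index */
	uint64_t pte;	/* page table entry index within that table */
};

/* Split a virtual address given the number of bits spent on page tables. */
static struct vm_indices vm_split(uint64_t addr, unsigned int block_bits)
{
	struct vm_indices idx;
	uint64_t pfn = addr >> PAGE_SHIFT_ASSUMED;

	assert(block_bits > 0 && block_bits < 32);

	idx.pde = pfn >> block_bits;			/* remaining high bits */
	idx.pte = pfn & ((1ull << block_bits) - 1);	/* low block_bits bits */
	return idx;
}
```

A larger `block_bits` gives fewer, larger page tables and a smaller page
directory; a smaller value does the opposite.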
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>