This is required to properly calculate the tiling parameters
in userspace.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A single doorbell page is plenty for cik kms compute.
Use a single page and manage doorbell allocation by
individual doorbells rather than pages. Identify
doorbells by their index rather than byte offset.
Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To workaround bugs and/or certain limits it's sometimes
useful to fall back to waiting on fences.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Bit a bit -fixes pull request in the merge window than usual dua to two
feauture-y things:
- Display CRCs are now enabled on all platforms, including the odd DP case
on gm45/vlv. Since this is a testing-only feature it should ever hurt,
but I figured it'll help with regression-testing -fixes. So I left it
in and didn't postpone it to 3.14.
- Display power well refactoring from Imre. Would have caused major pain
conflict with the bdw stage 1 patches if I'd postpone this to -next.
It's only an relatively small interface rework, so shouldn't cause pain.
It's also been in my tree since almost 3 weeks already.
That accounts for about two thirds of the pull, otherwise just bugfixes:
- vlv backlight fix from Jesse/Jani
- vlv vblank timestamp fix from Jesse
- improved edp detection through vbt from Ville (fixes a vlv issue)
- eDP vdd fix from Paulo
- fixes for dvo lvds on i830M
- a few smaller things all over
Note: This contains a backmerge of v3.12. Since the -internal branch
always applied on top of -nightly I need that unified base to merge bdw
patches. So you'll get a conflict with radeon connector props when pulling
this (and nouveau/master will also conflict a bit when Ben doesn't
rebase). The backmerge itself only had conflicts in drm/i915.
There's also a tiny conflict between Jani's backlight fix and your sysfs
lifetime fix in drm-next.
* tag 'drm-intel-fixes-2013-11-07' of git://people.freedesktop.org/~danvet/drm-intel: (940 commits)
drm/i915/vlv: use per-pipe backlight controls v2
drm/i915: make backlight functions take a connector
drm/i915: move opregion asle request handling to a work queue
drm/i915/vlv: use PIPE_START_VBLANK interrupts on VLV
drm/i915: Make intel_dp_is_edp() less specific
drm/i915: Give names to the VBT child device type bits
drm/i915/vlv: enable HDA display audio for Valleyview2
drm/i915/dvo: call ->mode_set callback only when the port is running
drm/i915: avoid unclaimed registers when capturing the error state
drm/i915: Enable DP port CRC for the "auto" source on g4x/vlv
drm/i915: scramble reset support for DP port CRC on vlv
drm/i915: scramble reset support for DP port CRC on g4x
drm/i916: add "auto" pipe CRC source
...
Conflicts:
MAINTAINERS
drivers/gpu/drm/i915/intel_panel.c
drivers/gpu/drm/nouveau/core/subdev/mc/base.c
drivers/gpu/drm/radeon/atombios_encoders.c
drivers/gpu/drm/radeon/radeon_connectors.c
op 08-10-13 18:58, Thomas Hellstrom schreef:
> On 10/08/2013 06:47 PM, Jerome Glisse wrote:
>> On Tue, Oct 08, 2013 at 06:29:35PM +0200, Thomas Hellstrom wrote:
>>> On 10/08/2013 04:55 PM, Jerome Glisse wrote:
>>>> On Tue, Oct 08, 2013 at 04:45:18PM +0200, Christian König wrote:
>>>>> Am 08.10.2013 16:33, schrieb Jerome Glisse:
>>>>>> On Tue, Oct 08, 2013 at 04:14:40PM +0200, Maarten Lankhorst wrote:
>>>>>>> Allocate and copy all kernel memory before doing reservations. This prevents a locking
>>>>>>> inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
>>>>>>> in ttm_bo_vm_fault without upsetting lockdep.
>>>>>>>
>>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>>>> I would say NAK. Current code only allocate temporary page in AGP case.
>>>>>> So AGP case is userspace -> temp page -> cs checker -> radeon ib.
>>>>>>
>>>>>> Non AGP is directly memcpy to radeon IB.
>>>>>>
>>>>>> Your patch allocate memory memcpy userspace to it and it will then be
>>>>>> memcpy to IB. Which means you introduce an extra memcpy in the process
>>>>>> not something we want.
>>>>> Totally agree. Additional to that there is no good reason to provide
>>>>> anything else than anonymous system memory to the CS ioctl, so the
>>>>> dependency between the mmap_sem and reservations are not really
>>>>> clear to me.
>>>>>
>>>>> Christian.
>>>> I think is that in other code path you take mmap_sem first then reserve
>>>> bo. But here we reserve bo and then we take mmap_sem because of copy
>>> >from user.
>>>> Cheers,
>>>> Jerome
>>>>
>>> Actually the log message is a little confusing. I think the mmap_sem
>>> locking inversion problem is orthogonal to what's being fixed here.
>>>
>>> This patch fixes the possible recursive bo::reserve caused by
>>> malicious user-space handing a pointer to ttm memory so that the ttm
>>> fault handler is called when bos are already reserved. That may
>>> cause a (possibly interruptible) livelock.
>>>
>>> Once that is fixed, we are free to choose the mmap_sem ->
>>> bo::reserve locking order. Currently it's bo::reserve->mmap_sem(),
>>> but the hack required in the ttm fault handler is admittedly a bit
>>> ugly. The plan is to change the locking order to
>>> mmap_sem->bo::reserve
>>>
>>> I'm not sure if it applies to this particular case, but it should be
>>> possible to make sure that copy_from_user_inatomic() will always
>>> succeed, by making sure the pages are present using
>>> get_user_pages(), and release the pages after
>>> copy_from_user_inatomic() is done. That way there's no need for a
>>> double memcpy slowpath, but if the copied data is very fragmented I
>>> guess the resulting code may look ugly. The get_user_pages()
>>> function will return an error if it hits TTM pages.
>>>
>>> /Thomas
>> get_user_pages + copy_from_user_inatomic is overkill. We should just
>> do get_user_pages which fails with ttm memory and then use copy_highpage
>> helper.
>>
>> Cheers,
>> Jerome
> Yeah, it may well be that that's the preferred solution.
>
> /Thomas
>
I still disagree, and shuffled radeon_ib_get around to be called sooner.
How does the patch below look?
8<-------
Allocate and copy all kernel memory before doing reservations. This prevents a locking
inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
in ttm_bo_vm_fault without upsetting lockdep.
Changes since v1:
- Kill extra memcpy for !AGP case.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The DMA ring seems to be stable now.
v2: remove pt_ring_index as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stop fiddling with jiffies, always wait for RADEON_FENCE_JIFFIES_TIMEOUT.
Consolidate the two wait sequence implementations into just one function.
Activate all waiters and remember if the reset was already done instead of
trying to reset from only one thread.
v2: clear reset flag earlier to avoid timeout in IB test
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This hooks radeon up to the runtime PM system to enable
dynamic power management for secondary GPUs in switchable
and powerxpress laptops.
v2: agd5f: clean up, add module parameter
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We use u16 for voltage values throughout the driver so switch
the table values to a u16 as well. Fixes an incompatible
cast error in ci_patch_clock_voltage_limits_with_vddc_leakage()
picked up by coverity.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
bapm is a pm feature for sharing the power budget between
the GPU and the CPU on APUs. It needs to be enabled or
disabled in certain circumstances. For now, disable it
when on battery and enable it when on AC power.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This adds spinlocks to protect access to other
indirect register apertures. These indirect spaces are
used pretty infrequently and we haven't had an reported
problems, but better safe than sorry.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
smc registers are access indirectly via the main mmio aperture, so
there may be problems with concurrent access. This adds a spinlock
to protect access to this register space.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex writes:
This is the radeon drm-next request. Big changes include:
- support for dpm on CIK parts
- support for ASPM on CIK parts
- support for berlin GPUs
- major ring handling cleanup
- remove the old 3D blit code for bo moves in favor of CP DMA or sDMA
- lots of bug fixes
[airlied: fix up a bunch of conflicts from drm_order removal]
* 'drm-next-3.12' of git://people.freedesktop.org/~agd5f/linux: (898 commits)
drm/radeon/dpm: make sure dc performance level limits are valid (CI)
drm/radeon/dpm: make sure dc performance level limits are valid (BTC-SI) (v2)
drm/radeon: gcc fixes for extended dpm tables
drm/radeon: gcc fixes for kb/kv dpm
drm/radeon: gcc fixes for ci dpm
drm/radeon: gcc fixes for si dpm
drm/radeon: gcc fixes for ni dpm
drm/radeon: gcc fixes for trinity dpm
drm/radeon: gcc fixes for sumo dpm
drm/radeonn: gcc fixes for rv7xx/eg/btc dpm
drm/radeon: gcc fixes for rv6xx dpm
drm/radeon: gcc fixes for radeon_atombios.c
drm/radeon: enable UVD interrupts on CIK
drm/radeon: fix init ordering for r600+
drm/radeon/dpm: only need to reprogram uvd if uvd pg is enabled
drm/radeon: check the return value of uvd_v1_0_start in uvd_v1_0_init
drm/radeon: split out radeon_uvd_resume from uvd_v4_2_resume
radeon kms: fix uninitialised hotplug work usage in r100_irq_process()
drm/radeon/audio: set up the sads on DCE3.2 asics
drm/radeon: fix handling of variable sized arrays for router objects
...
Conflicts:
drivers/gpu/drm/i915/i915_dma.c
drivers/gpu/drm/i915/i915_gem_dmabuf.c
drivers/gpu/drm/i915/intel_pm.c
drivers/gpu/drm/radeon/cik.c
drivers/gpu/drm/radeon/ni.c
drivers/gpu/drm/radeon/r600.c
Resturcture clockgating code so that it can be
enabled/disabled from other components such as
dpm.
v2: make function static
v3: add fine grained cg controls
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commits adds flags for supported clockgating and
powergating features. This allows us to more easily
track which features are supported on a particular
asic and to enable/disable features for debugging.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Similar to DCE4/5, but supports multiple audio pins
which can be assigned per afmt block.
v2: rework the driver to handle more than one audio
pin.
v3: try different dto reg
v4: properly program dto
v5 (ck): change dto programming order
v6: program speaker allocation block
v7: rebase
v8: rebase on Rafał's changes
v9: integrated Rafał's comments, update to latest
drm_edid_to_speaker_allocation API
v10: add missing line break in error message
v11: add back audio enabled messages
v12: fix copy paste typo in r600_audio_enable
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Now that we have callbacks for [rw]ptr handling we can
remove the special handling for the DMA rings and use
the callbacks instead.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The hardware just doesn't support this correctly.
Disable it before we accidentally write anywhere we shouldn't.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Give the ring functions a separate structure and let the asic
structure point to the ring specific functions. This simplifies
the code and allows us to make changes at only one point.
No change in functionality.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Starting on CIK, multi-media blocks like UVD no longer
have special power state. Rather they have their own
DPM implementation which adjusts their clocks dynamically
when active. When they are not active, the blocks are
powergated to save power.
v2: add missing pm locks
v3: rebase on uvd state selection rework
v4: fix inverted logic typo noticed by Christian
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This adds dpm support for btc asics. This includes:
- dynamic engine clock scaling
- dynamic memory clock scaling
- dynamic voltage scaling
- dynamic pcie gen switching
Set radeon.dpm=1 to enable.
v2: remove unused radeon_atombios.c changes,
make missing smc ucode non-fatal
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. Handle the the thermal state directly in the work handler.
Remove the state selection function since nothing else uses it now.
2. On some asics there is no thermal state, so we just use a regular
state and force the low performance state.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the UVD handle information to determine which
which power states to select when using UVD. For
example, decoding a single SD stream requires much
lower clocks than multiple HD streams.
v2: switch to a cleaner dpm/uvd interface
v3: change the uvd power state while streams
are active if need be
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a helper function for counting the number of open stream handles.
v2: fix copy-pasta in comments and whitespace error
v3: make function static since it's only used in radeon_uvd.c
at the moment
v4: make non-static again for future changes
v5: make static again for new rework of dpm uvd changes
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise just reinitialize from scratch on resume,
and so make it more likely to succeed.
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
All the gem based kms drivers really want the same function to
destroy a dumb framebuffer backing storage object.
So give it to them and roll it out in all drivers.
This still leaves the option open for kms drivers which don't use GEM
for backing storage, but it does decently simplify matters for gem
drivers.
Acked-by: Inki Dae <inki.dae@samsung.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Cc: Ben Skeggs <skeggsb@gmail.com>
Reviwed-by: Rob Clark <robdclark@gmail.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Covers requirements of all current asics.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
There are cases where we need more than 4k alignment. No
functional change with this commit.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Changing the UVD BOs offset on suspend/resume doesn't work because the VCPU
internally keeps pointers to it. Just keep it always pinned and save the
content manually.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=66425
v2: fix compiler warning
v3: fix CIK support
Note: a version of this patch needs to go to stable.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If the vblank time is too short to adjust mclk,
assume multiple displays (no mclk adjustments).
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows you to force specific power levels within a power
state. Due to hardware restrictions between generations, the
interface is limited to the following 3 selections:
auto: all levels enabled
low: forced to the lowest power level
high: forced to the highest power level
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Certain older rv770 asics have both a performance and
a 3D performance state rather than just multiple performance
levels in the state power state. The current code would
select the performance state rather than the 3D performance
state when the "performance" profile was selected. This change
switches to the "balanced" profile by default which ends up being
the internal performance profile. When the user selects the
"performance" profile, it selects the internal 3D performance
state so the user can select the higher performance modes.
For most asics this changes nothing. For certain rv770 asics
with static performance and 3D performance states, this allows
you to select between then using by selecting the "balanced"
and "performance" dpm profiles.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit converts the source of the val_seq counter to
the ww_mutex api. The reservation objects are converted later,
because there is still a lockdep splat in nouveau that has to
resolved first.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds dpm support for SI asics. This includes:
- dynamic engine clock scaling
- dynamic memory clock scaling
- dynamic voltage scaling
- dynamic pcie gen1/gen2/gen3 switching
- power containment
- shader power scaling
Set radeon.dpm=1 to enable.
v2: enable hainan support, rebase
v3: guard acpi stuff
v4: fix 64 bit math
v5: fix 64 bit div harder
v6: fix thermal interrupt check noticed by Jerome
v7: attempt fix state enable
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Validate the voltages against the voltage requirements of the
dispclk. We currently don't adjust the disp clock so it never
changes, but we need to filter out voltage levels that are too
low none the less.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There's a new table for calculating the memory pll
parameters on SI. Required for SI DPM support.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Now that the proper fix has been implemented I can
remove the last remnants of the initial implementation.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use a dedicated copy of the current power state since
we may have to adjust it on the fly.
v2: fix up redundant state sets
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When using UVD, the driver must switch to a special UVD power
state. In the CS ioctl, switch to the power state and schedule
work to change the power state back, when the work comes up,
check if uvd is still busy and if not, switch back to the user
state, otherwise, reschedule the work.
Note: We really need some better way to decide when to
switch out of the uvd power state. Switching power states
while playback is active make uvd angry.
V2: fix locking.
V3: switch from timer to delayed work
V4: check fence driver for UVD jobs, reduce timeout to
1 second and rearm timeout on activity
v5: rebase on new dpm tree
v6: rebase on interim uvd on demand changes
v7: fix UVD when DPM is disabled
v8: unify non-DPM and DPM UVD handling
v9: remove leftover idle work struct
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
This adds dpm support for rv7xx asics. This includes:
- clockgating
- dynamic engine clock scaling
- dynamic memory clock scaling
- dynamic voltage scaling
- dynamic pcie gen1/gen2 switching
Set radeon.dpm=1 to enable.
v2: reduce stack usage
v3: fix 64 bit div
v4: fix state enable
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This adds dpm support for rv6xx asics. This includes:
- clockgating
- dynamic engine clock scaling
- dynamic memory clock scaling
- dynamic voltage scaling
- dynamic pcie gen1/gen2 switching
Set radeon.dpm=1 to enable.
v2: remove duplicate line
v3: fix thermal interrupt check noticed by Jerome
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This adds the common dpm (dynamic power management)
infrastructure:
- dpm callbacks
- dpm init/fini/suspend/resume
- dpm power state selection
No device specific code is enabled yet.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
dpm needs access to atombios data and command tables
for setup and calculation of a number of parameters.
v2: endian fix
v3: fix mc reg table bug
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is required for certain advanced functionality.
v2: save/restore list takes dword offsets
v3: rebase on gpu reset changes
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On CIK, the compute rings work slightly differently than
on previous asics, however the basic concepts are the same.
The main differences:
- New MEC engines for compute queues
- Multiple queues per MEC:
- CI/KB: 1 MEC, 4 pipes per MEC, 8 queues per pipe = 32 queues
- KV: 2 MEC, 4 pipes per MEC, 8 queues per pipe = 64 queues
- Queues can be allocated and scheduled by another queue
- New doorbell aperture allows you to assign space in the aperture
for the wptr which allows for userspace access to queues
v2: add wptr shadow, fix eop setup
v3: fix comment
v4: switch to new callback method
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
The doorbell aperture is a PCI BAR whose pages can be
mapped to compute resources for things like wptrs
for userspace queues.
This patch maps the BAR and sets up a simple allocator
to allocate pages from the BAR.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add callbacks to the radeon_asic struct to handle
rptr/wptr fetchs and wptr updates.
We currently use one version for all rings, but this
allows us to override with a ring specific versions.
Needed for compute rings on CIK.
v2: udpate as per Christian's comments
v3: fix some rebase cruft
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CIK (DCE8) hw cursors are programmed the same as evergreen
(DCE4) with the following caveats:
- cursors are now 128x128 pixels
- new alpha blend enable bit
v2: rebase
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CIK has new asynchronous DMA engines called sDMA
(system DMA). Each engine supports 1 ring buffer
for kernel and gfx and 2 userspace queues for compute.
TODO: fill in the compute setup.
v2: update to the latest reset code
v3: remove ib_parse
v4: fix copy_dma()
v5: drop WIP compute sDMA queues
v6: rebase
v7: endian fixes for IB
v8: cleanup for release
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently the driver required 6 sets of ucode:
1. pfp - pre-fetch parser, part of the GFX CP
2. me - micro engine, part of the GFX CP
3. ce - constant engine, part of the GFX CP
4. rlc - interrupt, etc. controller
5. mc - memory controller (discrete cards only)
6. mec - compute engines, part of Compute CP
V2: add documentation
V3: update MC ucode
V4: rebase
V5: update mc ucode
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2: tiling fixes
v3: more tiling fixes
v4: more tiling fixes
v5: additional register init
v6: rebase
v7: fix gb_addr_config for KV/KB
v8: drop wip KV bits for now, add missing config reg
v9: fix cu count on Bonaire
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
New asics support non-privileged IBs. This allows us
to skip IB checking in the driver since the hardware
will check the command buffers for us. When using
non-privileged IBs, if the CP encounters an illegal
register in the command stream, it will halt and generate
an interrupt. The CP needs to be reset to continue. For now
just do a full GPU reset when this happens.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of duplicating the code over and over again, just use a single
function to handle the clock calculations.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is to allow debugging of userspace program not freeing buffer
after, which is basicly a memory leak. This print the list of all
gem object along with their size and placement (VRAM,GTT,CPU) and
with the pid of the task that created them.
agd5f: add warning fix
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Golden registers are arrays of register settings from the
hw team that need to be initialized at asic startup.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Register audio callbacks for asic where we support
audio. Cleans up the code and makes it easier to
add support for newer asics.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
That not only saves some power, but also solves problems with
older chips where an idle UVD block on higher clocks can
cause problems.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allow userspace to query for the tile mode array so userspace can properly
compute surface pitch and alignment requirement depending on tiling.
v2: Make strict aliasing safer by casting to char when copying
v3: merge fix from Christian
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Just everything needed to decode videos using UVD.
v6: just all the bugfixes and support for R7xx-SI merged in one patch
v7: UVD_CGC_GATE is a write only register, lockup detection fix
v8: split out VRAM fallback changes, remove support for RV770,
add support for HEMLOCK, add buffer sizes checks
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Let the CS module decide if we can fall back to VRAM or not.
v2: remove unintended change
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch allows the CPU to map the stolen vram segment
directly rather than going through the PCI BAR. This
significantly improves performance for certain workloads with
a properly patched ddx.
Use radeon.fastfb=1 to enable it (disabled by default).
Currently only supported on RS690, but support for RS780/880
and newer APUs may be added eventually.
Signed-off-by: Samuel Li <samuel.li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a per-asic MC (memory controller) mask which holds the
mak address mask the asic is capable of. Use this when
calculating the vram and gtt locations rather using asic
specific functions or limiting everything to 32 bits.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
More drm-next bits for radeon. Just bug fixes.
* 'drm-next-3.9' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: properly validate the atpx interface
drm/radeon: switch get_gpu_clock() to a callback (v2)
drm/radeon: add a asic callback to get the xclk
drm/radeon: Avoid NULL pointer dereference from atom_index_iio() allocation failure
drm/radeon: remove overzealous warning in hdmi handling
drm/radeon: fix multi-head power profile stability on BTC+ asics
This is required to get the reference clock used
by the gfx engine for things like timestamps. Fixes
support for GL extensions the use timestamps on
certain boards.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Simplify the Radeon prime implementation by using the default behavior provided
by drm_gem_prime_import and drm_gem_prime_export.
v2:
- Rename functions to radeon_gem_prime_get_sg_table and
radeon_gem_prime_import_sg_table.
- Delete the now-unused vmapping_count variable.
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For very large page table updates, we can exceed the
size of the ring. To avoid this, use an IB to perform
the page table update.
v2(ck): cleanup the IB infrastructure and the use it instead
of filling the struct ourself.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Used by all asic families from r600+.
Flag for the vbios and later instances of the driver
that the GPU is hung.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
next_reloc function does the same thing in all ASICs with
the exception of R600 which has a special case in legacy mode.
Pull out the common function in preparation for refactoring.
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This function is not limited to r100, but it can dump a
(raw) packet for any ASIC. Rename it accordingly and move
its declaration to radeon.h
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
vline packet parsing function for R600 and Evergreen+ are
the same, except that they use different registers. Factor
out the algorithm into a common function that uses register
table passed from ASIC-specific caller.
This reduces ASIC-specific function to (trivial) setup
of register table and call into the common function.
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Once we factored out radeon_cs_packet_parse function,
evergreen_cs_next_is_pkt3_nop and r600_cs_next_is_pkt3_nop
functions became identical, so they can be factored out
into a common function.
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We now have a common radeon_cs_packet_parse function
that is good for all ASICs. Hook it up and eliminate
ASIC-specific versions.
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The idea here is to move to a finer grained reset.
In some cases we may not need reset every block, and
in other cases we may not need to re-init the entire
asic.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
radeon_fence_wait_empty_locked should not trigger GPU reset as no
place where it's call from would benefit from such thing and it
actually lead to a kernel deadlock in case the reset is triggered
from pm codepath. Instead force ring completion in place where it
makes sense or return early in others.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Force all fence to signal if GPU reset failed so no process get stuck
on waiting fence.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Allows us to use the DMA ring from userspace.
DMA doesn't have a good NOP packet in which to embed the
reloc idx, so userspace has to add a reloc for each
buffer used and order them to match the command stream.
v2: fix address bounds checking, reloc indexing
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With the new per-crtc locking mutliple set-cursor calls could happen
in parallel. Out of sheer paranoia I've opted for an irqsave spinlock.
But if there's indeed an access from interrupt contexts to these regs
it's already broken with the old code, so this can likely just be
reduced to a normal spinlock. Otoh the pageflip completion happens
from the vblank irq handler ...
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Just refactoring to make the next patche simpler. Now all indirect register
access in the new modesetting driver should go through the r100_mm_(w|r)reg
fucntions.
RADEON_READ_MM from the old driver seems to be totally unused, so just kill
it.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The bo creation placement is where the bo will be. Instead of trying
to move bo at each command stream let this work to another worker
thread that will use more advance heuristic.
agd5f: remove leftover unused variable
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
There are 2 async DMA engines on cayman, one at 0xd000 and
one at 0xd800. The programming interface is the same as
evergreen however there are some changes to the commands
for using vmids.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make it possible to allocate a persistent page table.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We want to use VMs without the IB pool in the future.
v2: also remove it from radeon_vm_finish.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Based on Dmitries work, but splitting the code into page
directory and page table handling makes it far more
readable and (hopefully) more reliable.
Allocations of page tables are made from the SA on demand,
that should still work fine since all page tables are of
the same size.
Also using the fact that allocations from the SA are mostly
continuously (except for end of buffer wraps and under very
high memory pressure) to group updates send to the chipset
specific code into larger chunks.
v3: mostly a rewrite of Dmitries previous patch.
v4: fix some typos and coding style
Signed-off-by: Dmitry Cherkasov <Dmitrii.Cherkasov@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pass the vm and ring index rather than an IB. This allows
us to use the vm_flush interface for non-IB cases in the
future.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
PDE/PTE update code uses CP ring for memory writes.
All page table entries are preallocated for now in alloc_pt().
It is made as whole because it's hard to divide it to several patches
that compile and doesn't break anything being applied separately.
Tested on cayman card.
v2: rebased on top of "refactor set_page chipset interface v3",
code cleanups
v3: switched offsets calc macros to inline funcs where possible,
remove pd_addr from radeon_vm, switched RADEON_BLOCK_SIZE define,
to 9 (and PTE_COUNT to 1 << BLOCK_SIZE)
v4 (ck): move "incr" documentation to previous patch, cleanup and
document RADEON_VM_* constants, change commit message to
our usual format, simplify patch allot by removing
everything current not necessary, disable SI workaround.
v5: (agd5f): Fix typo in tables_size calculation in
radeon_vm_alloc_pt(). Second line should have been
'+=' rather than '='.
v6: fix npdes calculation. In scenario when pfns to be mapped overlap
two PDE spans:
+-----------+-------------+
| PDE span | PDE span |
+-----------+----+--------+
| |
+---------+
| pfns |
+---------+
the following npdes calculation gives incorrect result:
npdes = (nptes >> RADEON_VM_BLOCK_SIZE) + 1;
For the case above picture it should give npdes = 2, but gives one.
This patch corrects it by rounding last pfn up to 512 border,
first - down to 512 border and then subtracting and dividing by 512.
v7: Make npde calculation clearer, fix ndw calculation.
v8: (agd5f): reserve enough for 2 full VM PTs, add some
additional comments.
v9: fix typo in npde calculation
Signed-off-by: Dmitry Cherkasov <Dmitrii.Cherkasov@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cleanup the interface in preparation for hierarchical page tables.
v2: add incr parameter to set_page for simple scattered PTs uptates
added PDE-specific flags to r600_flags and radeon_drm.h
removed superfluous value masking with 0xffffffff
v3: removed superfluous bo_va->valid checking
changed R600_PTE_VALID to R600_ENTRY_VALID to handle PDE too
v4 (ck): fix indention style, rework and fix typos in commit message,
add documentation for incr parameter, also use incr
parameter for system pages
v5 (agd5f): use upper_32_bits() and minor white space fixes
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dmitry Cherkassov <Dmitrii.Cherkasov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roughly based on how nouveau is handling it. Instead of
adding the bo_va when the address is set add the bo_va
when the handle is opened, but set the address to zero
until userspace tells us where to place it.
This fixes another bunch of problems with glamor.
v2: agd5f: fix build after dropping patch 7/8.
Signed-off-by: Christian König <deathsimple@vodafone.de>
It doesn't really belong into the object functions,
also rename it to avoid collisions with struct radeon_bo_va.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Even GPUs can have a null pointer dereference, so move
the IB pool to another offset to catch those.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Currently doing the update with the CP.
v2: Rebased on Jeromes bugfix. Make validity comparison
more human readable.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Removing the need to wait for anything.
Still not ideal, since we need to free pt on va remove.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Move binding onto the ring, simplifying handling a bit.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Move flushing the VMs as function into the rings.
First step to make VM operations async.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>