Somewhat specializaed sub-allocator designed to perform sub-allocation
for command buffer not only for current cs ioctl but for future command
submission ioctl as well. Patch also convert current ib pool to use
the sub allocator. Idea is that ib poll buffer can be share with other
command buffer submission not having 64K granularity.
v2 Harmonize pool handling and add suspend/resume callback to pin/unpin
sa bo (tested on rv280, rv370, r420, rv515, rv610, rv710, redwood, cayman,
rs480, rs690, rs880)
v3 Simplify allocator
v4 Fix radeon_ib_get error path to properly free fence
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add a start fence driver helper function which will be call
once for each ring and will compute cpu/gpu addr for fence
depending on wether to use wb buffer or scratch reg.
This patch replace initialize fence driver separately which
was broken in regard of GPU lockup. The fence list for created,
emited, signaled must be initialize once and only from the
asic init callback not from the startup call back which is
call from the gpu reset.
v2: With this in place we no longer need to know the number of
rings in fence_driver_init, also writing to the scratch reg
before knowing its offset is a bad idea.
v3: rebase on top of change to previous patch in the serie
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
That naming seems to make more sense, since we not
only want to run PM4 rings with it.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Replace cp, cp1 and cp2 members with just an array
of radeon_cp structs.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Give all asic and radeon_ring_* functions a
radeon_cp parameter, so they know the ring to work with.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For supporting multiple CP ring buffers, async DMA
engines and UVD. We still need a way to synchronize
between engines.
v2 initialize unused fence driver ring to avoid issue in
suspend/unload
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a new optional chunk to the CS ioctl that specifies optional flags
to the CS parser. Why this is useful is explained below. Note that some regs
no longer need the NOP relocation packet if this feature is enabled.
Tested on r300g and r600g with this flag disabled and enabled.
Assume there are two contexts sharing the same mipmapped tiled texture.
One context wants to render into the first mipmap and the other one
wants to render into the last mipmap. As you probably know, the hardware
has a MACRO_SWITCH feature, which turns off macro tiling for small mipmaps,
but that only applies to samplers.
(at least on r300-r500, though later hardware likely behaves the same)
So we want to just re-set the tiling flags before rendering (writing
packets), right? ... No. The contexts run in parallel, so they may
set the tiling flags simultaneously and then fire their command streams
also simultaneously. The last one setting the flags wins, the other one
loses.
Another problem is when one context wants to render into the first and
the last mipmap in one CS. Impossible. It must flush before changing
tiling flags and do the rendering into the smaller mipmaps in another CS.
Yet another problem is that writing copy_blit in userspace would be a mess
involving re-setting tiling flags to please the kernel, and causing races
with other contexts at the same time.
The only way out of this is to send tiling flags with each CS, ideally
with each relocation. But we already do that through the registers.
So let's just use what we have in the registers.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
After GPU lockup VRAM gart table is unpinned and thus its pointer
becomes unvalid. This patch move the unpin code to a common helper
function and set pointer to NULL so that page update code can check
if it should update GPU page table or not. That way bo still bound
to GART can be unbound (pci_unmap_page for all there page) properly
while there is no need to update the GPU page table.
V2 move the test for null gart out of the loop, small optimization
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was previously done for r300 only. Use %016llX instead of %08X for
printing the table address.
Also fix typos in gart warning messages.
Signed-off-by: Tormod Volden <debian.tormod@gmail.com>
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
Update cpuset info & webiste for cgroups
dcdbas: force SMI to happen when expected
arch/arm/Kconfig: remove one to many l's in the word.
asm-generic/user.h: Fix spelling in comment
drm: fix printk typo 'sracth'
Remove one to many n's in a word
Documentation/filesystems/romfs.txt: fixing link to genromfs
drivers:scsi Change printk typo initate -> initiate
serial, pch uart: Remove duplicate inclusion of linux/pci.h header
fs/eventpoll.c: fix spelling
mm: Fix out-of-date comments which refers non-existent functions
drm: Fix printk typo 'failled'
coh901318.c: Change initate to initiate.
mbox-db5500.c Change initate to initiate.
edac: correct i82975x error-info reported
edac: correct i82975x mci initialisation
edac: correct commented info
fs: update comments to point correct document
target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
...
Trivial conflict in fs/eventpoll.c (spelling vs addition)
This is an important security fix because we allowed arbitrary values
to be passed to AARESOLVE_OFFSET. This also puts the right buffer address
in the register.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The colorbuffer, zbuffer, and texture states are checked only once when
they get changed. This improves performance in the apps which emit
lots of draw packets and few state changes.
This drops performance in glxgears by a 1% or so, but glxgears is not
a benchmark we care about.
The time spent in the kernel when running Torcs dropped from 33% to 23%
and the frame rate is higher, which is a good thing.
r600 might need something like this as well.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
First, we were calling mc_stop() at the top of the function
which turns off all MC (memory controller) clients,
then checking if the GPU is idle. If it was idle we
returned without re-enabling the MC clients which would
lead to a blank screen, etc. This patch checks if the
GPU is idle before calling mc_stop().
Second, if the reset failed, we were returning without
re-enabling the MC clients. This patch re-enables
the MC clients before returning regardless of whether
the reset was successful or not.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The CMASK RAM is for colorbuffer compression (used in conjunction
with MSAA). Only one user (filp) can access it.
The CMASK RAM access is managed in the same way as Hyper-Z, but there is
a separate ioctl, because an app that uses MSAA does not necessarily
have to use zbuffering.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When writeback is enabled, the GPU shadows writes to certain
registers into a buffer in memory. The driver can then read
the values from the shadow rather than reading back from the
register across the bus. Writeback can be disabled by setting
the no_wb module param to 1.
On r6xx/r7xx/evergreen, the following registers are shadowed:
- CP scratch registers
- CP read pointer
- IH write pointer
On r1xx-rr5xx, the following registers are shadowed:
- CP scratch registers
- CP read pointer
v2:
- Combine wb patches for r6xx-evergreen and r1xx-r5xx
- Writeback is disabled on AGP boards since it tends to be
unreliable on AGP using the gart.
- Check radeon_wb_init return values properly.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This interface allows userspace to request hyperz support, it probably
needs more locking, and really reporting that you can have hyperz is racy
since someone else might get it before you do.
v2: modify so we pass 0 valued packets to let DDX/r300c keep working.
also fixed incorrect 0x4f1c reference.
v3: fixup zb_bw_cntl so older drivers keep working
v4: add locking, fixup SC_HYPERZ_EN - patch stream to disable hiz
Signed-off-by: Dave Airlie <airlied@redhat.com>
On systems using kexec, the new kernel is booted straight from the old kernel, without any warning to the graphics driver. So the GPU is basically left as-is in a running state, however the CPU side is completly reset.
Without stating the saneness of anyone using kexec on live systems, we should at least try not to crash the GPU. This patch resets 3 registers to 0 that could cause bad things to happen to the running system.
This allows kexec to work on a Power6/RN50 system.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The asics in question have the following requirements with regard to
their gart setups:
1. The GART aperture size has to be in the form of 2^X bytes, where X is from 25 to 31
2. The GART aperture MC base has to be aligned to a boundary equal to the size of the
aperture.
3. The GART page table has to be aligned to the boundary equal to the size of the table.
4. The GART page table size is: table_entry_size * (aperture_size / page_size)
5. The GART page table has to be allocated in non-paged, non-cached, contiguous system
memory.
This patch takes care 2. The rest should already be handled properly.
This fixes a regression noticed by: Torsten Kaiser <just.for.lkml@googlemail.com>
Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=28459
agd5f: apply to r1xx/r2xx as well.
Signed-off-by: Roland Scheidegger <sroland@vmware.com>
Cc: stable@kernel.org
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* anholt/drm-intel-next: (515 commits)
drm/i915: Fix out of tree builds
drm/i915: move fence lru to struct drm_i915_fence_reg
drm/i915: don't allow tiling changes on pinned buffers v2
drm/i915: Be extra careful about A/D matching for multifunction SDVO
drm/i915: Fix DDC bus selection for multifunction SDVO
drm/i915: cleanup mode setting before unmapping registers
drm/i915: Make fbc control wrapper functions
drm/i915: Wait for the GPU whilst shrinking, if truly desperate.
drm/i915: Use spatio-temporal dithering on PCH
[MTD] Remove zero-length files mtdbdi.c and internal.ho
pata_pcmcia / ide-cs: Fix bad hashes for Transcend and kingston IDs
libata: Fix several inaccuracies in developer's guide
slub: Fix bad boundary check in init_kmem_cache_nodes()
raid6: fix recovery performance regression
KEYS: call_sbin_request_key() must write lock keyrings before modifying them
KEYS: Use RCU dereference wrappers in keyring key type code
KEYS: find_keyring_by_name() can gain access to a freed keyring
ALSA: hda: Fix 0 dB for Packard Bell models using Conexant CX20549 (Venice)
ALSA: hda - Add quirk for Dell Inspiron 19T using a Conexant CX20582
ALSA: take tu->qlock with irqs disabled
...
- Separate dynpm and profile based power management methods. You can select the pm method
by echoing the selected method ("dynpm" or "profile") to power_method in sysfs.
- Expose basic 4 profile in profile method
"default" - default clocks
"auto" - select between low and high based on ac/dc state
"low" - DC, low power mode
"high" - AC, performance mode
The current base profile is "default", but it should switched to "auto" once we've tested
on more systems. Switching the state is a matter of echoing the requested profile to
power_profile in sysfs. The lowest power states are selected automatically when dpms turns
the monitors off in all states but default.
- Remove dynamic fence-based reclocking for the moment. We can revisit this later once we
have basic pm in.
- Move pm init/fini to modesetting path. pm is tightly coupled with display state. Make sure
display side is initialized before pm.
- Add pm suspend/resume functions to make sure pm state is properly reinitialized on resume.
- Remove dynpm module option. It's now selectable via sysfs.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Conflicts:
drivers/gpu/drm/i915/i915_dma.c
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/radeon/r300.c
The BSD ringbuffer support that is landing in this branch
significantly conflicts with the Ironlake PIPE_CONTROL fix on master,
and requires it to be tested successfully anyway.
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms/legacy: only enable load detection property on DVI-I
drm/radeon/kms: fix panel scaling adjusted mode setup
drivers/gpu/drm/drm_sysfs.c: sysfs files error handling
drivers/gpu/drm/radeon/radeon_atombios.c: range check issues
gpu: vga_switcheroo, fix lock imbalance
drivers/gpu/drm/drm_memory.c: fix check for end of loop
drivers/gpu/drm/via/via_video.c: fix off by one issue
drm/radeon/kms/agp The wrong AGP chipset can cause a NULL pointer dereference
drm/radeon/kms: r300 fix CS checker to allow zbuffer-only fastfill
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon: Fix sparc regression in r300_scratch()
drm: make sure vblank interrupts are disabled at DPMS time
drm/radeon/kms/evergreen: No EnableYUV table
drm/radeon: 9800 SE has only one quadpipe
drm/radeon/kms: don't print error for legal crtcs.
drm/radeon/kms/evergreen: fix LUT setup
Previous reset code leaded to computer hard lockup (need to unplug
the power too reboot the computer) on various configuration. This
patch change the reset code to avoid hard lockup. The GPU reset
is failing most of the time but at least user can log in remotely
or properly shutdown the computer.
Two issues were leading to hard lockup :
- Writting to the scratch register lead to hard lockup most likely
because the write back mecanism is in fuzy state after GPU lockup.
- Resetting the GPU memory controller and not reinitializing it
after leaded to hard lockup. We did only reinitialize in case of
successfull reset thus unsuccessfull reset quickly leaded to hard
lockup.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: add FireMV 2400 PCI ID.
drm/radeon/kms: allow R500 regs VAP_ALT_NUM_VERTICES and VAP_INDEX_OFFSET
drivers/gpu/radeon: Add MSPOS regs to safe list.
drm/radeon/kms: disable the tv encoder when tv/cv is not in use
drm/radeon/kms: adjust pll settings for tv
drm/radeon/kms: fix tv dac conflict resolver
drm/radeon/kms/evergreen: don't enable hdmi audio stuff
drm/radeon/kms/atom: fix dual-link DVI on DCE3.2/4.0
drm/radeon/kms: fix rs600 tlb flush
drm/radeon/kms: print GPU family and device id when loading
drm/radeon/kms: fix calculation of mipmapped 3D texture sizes
drm/radeon/kms: only change mode when coherent value changes.
drm/radeon/kms: more atom parser fixes (v2)
[airlied: fix V_A_N_V to not be safe and fix check to make sure only r500
- bump userspace version]
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (29 commits)
drm/nouveau: bail out of auxch transaction if we repeatedly recieve defers
drm/nv50: implement gpio set/get routines
drm/nv50: parse/use some more de-magiced parts of gpio table entries
drm/nouveau: store raw gpio table entry in bios gpio structs
drm/nv40: Init some tiling-related PGRAPH state.
drm/nv50: Add NVA3 support in ctxprog/ctxvals generator.
drm/nv50: another dodgy DP hack
drm/nv50: punt hotplug irq handling out to workqueue
drm/nv50: preserve an unknown SOR_MODECTRL value for DP encoders
drm/nv50: Allow using the NVA3 new compute class.
drm/nv50: cleanup properly if PDISPLAY init fails
drm/nouveau: fixup the init failure paths some more
drm/nv50: fix instmem init on IGPs if stolen mem crosses 4GiB mark
drm/nv40: add LVDS table quirk for Dell Latitude D620
drm/nv40: rework lvds table parsing
drm/nouveau: detect vram amount once, and save the value
drm/nouveau: remove some unused members from drm_nouveau_private
drm/nouveau: Make use of TTM busy_placements.
drm/nv50: add more 0x100c80 flushy magic
drm/nv50: fix fbcon when framebuffer above 4GiB mark
...
This simplify and improve GPU reset for R1XX-R6XX hw, it's
not 100% reliable here are result:
- R1XX/R2XX works bunch of time in a row, sometimes it
seems it can work indifinitly
- R3XX/R3XX the most unreliable one, sometimes you will be
able to reset few times, sometimes not even once
- R5XX more reliable than previous hw, seems to work most
of the times but once in a while it fails for no obvious
reasons (same status than previous reset just no same
happy ending)
- R6XX/R7XX are lot more reliable with this patch, still
it seems that it can fail after a bunch (reset every
2sec for 3hour bring down the GPU & computer)
This have been tested on various hw, for some odd reasons
i wasn't able to lockup RS480/RS690 (while they use to
love locking up).
Note that on R1XX-R5XX the cursor will disapear after
lockup haven't checked why, switch to console and back
to X will restore cursor.
Next step is to record the bogus command that leaded to
the lockup.
V2 Fix r6xx resume path to avoid reinitializing blit
module, use the gpu_lockup boolean to avoid entering
inifinite waiting loop on fence while reiniting the GPU
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Patch rename gpu_reset to asic_reset in prevision of having
gpu_reset doing more stuff than just basic asic reset.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch cleanup the fence code, it drops the timeout field of
fence as the time to complete each IB is unpredictable and shouldn't
be bound.
The fence cleanup lead to GPU lockup detection improvement, this
patch introduce a callback, allowing to do asic specific test for
lockup detection. In this patch the CP is use as a first indicator
of GPU lockup. If CP doesn't make progress during 1second we assume
we are facing a GPU lockup.
To avoid overhead of testing GPU lockup frequently due to fence
taking time to be signaled we query the lockup callback every
500msec. There is plenty code comment explaining the design & choise
inside the code.
This have been tested mostly on R3XX/R5XX hw, in normal running
destkop (compiz firefox, quake3 running) the lockup callback wasn't
call once (1 hour session). Also tested with forcing GPU lockup and
lockup was reported after the 1s CP activity timeout.
V2 switch to 500ms timeout so GPU lockup get call at least 2 times
in less than 2sec.
V3 store last jiffies in fence struct so on ERESTART, EBUSY we keep
track of how long we already wait for a given fence
V4 make sure we got up to date cp read pointer so we don't have
false positive
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
radeon_gart_fini might call GART unbind callback function which
might try to access GART table but if gart_disable is call first
the GART table will be unmapped so any access to it will oops.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
- rs780/880 were using the wrong bandwidth functions
- convert r1xx-r4xx to use the same pm sclk/mclk structs as
r5xx+
- move bandwidth setup to a common function
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Look up i2c bus in the power table and expose it.
You'll need to load a hwmon driver for any chips
on the bus, this patch just exposes the bus.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>