This debugging trace was useful for finding the fbcon regression on
i965, and it may prove useful again in future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Incorporates a similar patch by Daniel Vetter, the alteration being to
report the current busy state after retiring.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
This avoids the excess flush and requests on idle rings (and spamming
the debug log ;-)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
This is required should we ever attempt to use an io-mapping where
KM_USER0 is verboten, such as inside an IRQ context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When creating an object, we create the handle by which it is known to
the process and which own the reference to the object. That reference to
the new handle is what we want to transfer to the process, not the lost
reference to the object; so free the local object reference *not* the
process's handle reference.
This brings i915_gem_object_create_ioctl() into line with
drm_gem_open_ioctl()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
If we fail to flush outstanding GPU writes but return the memory to the
system, we risk corrupting memory should the GPU recovery and complete
those writes. On the other hand, if we bail early and free the object
then we have a definite use-after-free and real memory corruption.
Choose the lesser of two evils, since in order to recover from the hung
GPU we need to completely reset it, those pending writes should
never happen.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
If during the freeing of an object the unbind is interrupted by a system
call, which is quite possible if we have outstanding GPU writes that
must be flushed, the unbind is silently aborted. This still leaves the
AGP region and backing pages allocated, and perhaps more importantly,
the object remains upon the various lists exposing us to memory
corruption.
I think this is the cause behind the use-after-free, such as
Bug 15664 - Graphics hang and kernel backtrace when starting Azureus
with Compiz enabled
https://bugzilla.kernel.org/show_bug.cgi?id=15664
v2: Daniel Vetter reminded me that kernel space programming is never easy.
We cannot simply spin to clear the pending signal and so must deferred
the freeing of the object until later.
v3: Run from the top level retire requests.
v4: Tested with P(return -ERESTARTSYS)=.5 from i915_gem_do_wait_request()
v5: Rebase against Eric's for-linus tree.
v6: Refactor, split and add a comment about avoiding unbounded recursion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
Combine the iteration over active render rings into a common function.
This is in preparation for reusing the idle function to also retire
deferred free requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Simple fix for error propagation along the old UMS path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/r600: fix possible NULL pointer derefernce
drm/radeon/kms: add quirk for ASUS HD 3600 board
include/linux/vgaarb.h: add missing part of include guard
drm/nouveau: Fix crashes during fbcon init on single head cards.
drm/nouveau: fix pcirom vbios shadow breakage from acpi rom patch
drm/radeon/kms: fix shared ddc harder
drm/i915: enable low power render writes on GEN3 hardware.
drm/i915: Define MI_ARB_STATE bits
vmwgfx: return -EFAULT if copy_to_user fails
fb: handle allocation failure in alloc_apertures()
drm: radeon: check kzalloc() result
drm/ttm: Fix build on architectures without AGP
drm/radeon/kms: fix gtt MC base alignment on rs4xx/rs690/rs740 asics
drm/radeon/kms: fix possible mis-detection of sideport on rs690/rs740
drm/radeon/kms: fix legacy tv-out pal mode
A lot of 945GMs have had stability issues for a long time, this manifested as X hangs, blitter engine hangs, and lots of crashes.
one such report is at:
https://bugs.freedesktop.org/show_bug.cgi?id=20560
along with numerous distro bugzillas.
This only took a week of digging and hair ripping to figure out.
Tracked down and tested on a 945GM Lenovo T60,
previously running
x11perf -copypixwin500
or
x11perf -copywinpix500
repeatedly would cause the GPU to wedge within 4 or 5 tries, with random busy bits set.
After this patch no hangs were observed.
cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The hibernate issues that got fixed in commit 985b823b91 ("drm/i915:
fix hibernation since i915 self-reclaim fixes") turn out to have been
incomplete. Vefa Bicakci tested lots of hibernate cycles, and without
the __GFP_RECLAIMABLE flag the system eventually fails to resume.
With the flag added, Vefa can apparently hibernate forever (or until he
gets bored running his automated scripts, whichever comes first).
The reclaimable flag was there originally, and was one of the flags that
were dropped (unintentionally) by commit 4bdadb9785 ("drm/i915:
Selectively enable self-reclaim") that introduced all these problems,
but I didn't want to just blindly add back all the flags in commit
985b823b91, and it looked like __GFP_RECLAIM wasn't necessary. It
clearly was.
I still suspect that there is some subtle reason we're missing that
causes the problems, but __GFP_RECLAIMABLE is certainly not wrong to use
in this context, and is what the code historically used. And we have no
idea what the causes the corruption without it.
Reported-and-tested-by: M. Vefa Bicakci <bicave@superonline.com>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Only ever assigned, never used.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[glisse: I will re-add if needed for range-restricted allocations]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since commit 4bdadb9785 ("drm/i915:
Selectively enable self-reclaim"), we've been passing GFP_MOVABLE to the
i915 page allocator where we weren't before due to some over-eager
removal of the page mapping gfp_flags games the code used to play.
This caused hibernate on Intel hardware to result in a lot of memory
corruptions on resume. See for example
http://bugzilla.kernel.org/show_bug.cgi?id=13811
Reported-by: Evengi Golov (in bugzilla)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tested-by: M. Vefa Bicakci <bicave@superonline.com>
Cc: stable@kernel.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since we now get_user_pages() outside of the mutex prior to performing
the copy, we kmap() the page inside the copy routine and so need to
perform an ordinary memcpy() and not copy_from_user().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
As we do not have a requirement to be atomic and avoid sleeping whilst
performing the slow copy for shmem based pread and pwrite, we can use
kmap instead, thus simplifying the code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
We can avoid an early clflush when pwriting if we use the current CPU
write domain rather than moving the object to the GTT domain for the
purposes of the pwrite. This has the advantage of not flushing the
presumably hot data that we want to upload into the bo, and of ascribing
the clflush to the execution when profiling.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
The callers expect us to cleanup any partially initialised structures
before reporting the error.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
If the object is bigger than the entire aperture, reject it early
before evicting everything in a vain attempt to find space.
v2: Use E2BIG as suggested by Owain G. Ainsworth.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Eric Anholt <eric@anholt.net>
This particular warning is harmless as we emit during the normal
pinning process where the batch buffer requires more fences than is
available without eviction. Only if we fail to evict enough fences does
this become a problem, so include the requested number of fences in the
ultimate *error* message.
v2: Remember to compile test even trial patches to remove warnings.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Nesting domain changes will cause confusion when trying to interpret the
tracepoints describing the sequence of changes for the object, as well
as obscuring the order of operations for the reader of the code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
This saves a whooping 7 dwords. Zero functional changes. Because
some of the refcounts are rather tightly calculated, I've put
BUG_ONs in the code to check for overflows.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
The BSD (bit stream decoder) ring is used for accessing the BSD engine
which decodes video bitstream for H.264 and VC1 on G45+. It is
asynchronous with the render ring and has access to separate parts of
the GPU from it, though the render cache is coherent between the two.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Xiang Hai hao <haihao.xiang@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
The active list and request list move into the ringbuffer structure,
so each can track its active objects in the order they are in that
ring. The flushing list does not, as it doesn't matter which ring
caused data to end up in the render cache. Objects gain a pointer to
the ring they are active on (if any).
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Xiang Hai hao <haihao.xiang@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Introduces a more complete intel_ring_buffer structure with callbacks
for setup and management of a particular ringbuffer, and converts the
render ring buffer consumers to use it.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Xiang Hai hao <haihao.xiang@intel.com>
[anholt: Fixed up whitespace fail and rebased against prep patches]
Signed-off-by: Eric Anholt <eric@anholt.net>
This is preparation for supporting multiple ringbuffers on Ironlake.
The non-copy-and-paste changes are:
- de-staticing functions
- I915_GEM_GPU_DOMAINS moving to i915_drv.h to be used by both files.
- i915_gem_add_request had only half its implementation
copy-and-pasted out of the middle of it.
This lru tracks fences, not objects, so move it to where it belongs.
As a side effect, this nicely shrinks drm_i915_gem_object by two
pointers.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
Conflicts:
drivers/gpu/drm/i915/i915_dma.c
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/radeon/r300.c
The BSD ringbuffer support that is landing in this branch
significantly conflicts with the Ironlake PIPE_CONTROL fix on master,
and requires it to be tested successfully anyway.
By idling the GPU and discarding everything we can when under extreme
memory pressure, the number of OOM-killer events is dramatically
reduced. For instance, this makes it possible to run
firefox-planet-gnome.trace again on my swapless 512MiB i915.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
My PIPE_CONTROL fix (just sent via Eric's tree) was buggy; I was
testing a whole set of patches together and missed a conversion to the
new HAS_PIPE_CONTROL macro, which will cause breakage on non-Ironlake
965 class chips. Fortunately, the fix is trivial and has been tested.
Be sure to use the HAS_PIPE_CONTROL macro in i915_get_gem_seqno, or
we'll end up reading the wrong graphics memory, likely causing hangs,
crashes, or worse.
Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since 965, the hardware has supported the PIPE_CONTROL command, which
provides fine grained GPU cache flushing control. On recent chipsets,
this instruction is required for reliable interrupt and sequence number
reporting in the driver.
So add support for this instruction, including workarounds, on Ironlake
and Sandy Bridge hardware.
https://bugs.freedesktop.org/show_bug.cgi?id=27108
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Luckily the change is quite a little bit less invasive than I've
feared.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Thanks to the to_intel_bo helper, this change is rather trivial.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Just embed it and adjust the pointers, No other changes (that's
for later patches).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Just preparation, no functional change.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When drivers embed the core gem object into their own structures,
they'll have to do this. Temporarily this results in an ugly
kfree(gem_obj);
in every gem driver.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Current code is definitely crap: Largest pitch allowed spills into
the TILING_Y bit of the fence registers ... :(
I've rewritten the limits check under the assumption that 3rd gen hw
has a 3d pitch limit of 8kb (like 2nd gen). This is supported by an
otherwise totally misleading XXX comment.
This bug mostly resulted in tiling-corrupted pixmaps because the kernel
allowed too wide buffers to be tiled. Bug brought to the light by the
xf86-video-intel 2.11 release because that unconditionally enabled
tiling for pixmaps, relying on the kernel to check things. Tiling for
the framebuffer was not affected because the ddx does some additional
checks there ensure the buffer is within hw-limits.
v2: Instead of computing the value that would be written into the
hw fence registers and then checking the limits simply check whether
the stride is above the 8kb limit. To better document the hw, add
some WARN_ONs in i915_write_fence_reg like I've done for the i830
case (using the right limits).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27449
Tested-by: Alexander Lam <lambchop468@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Eric Anholt <eric@anholt.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
drm/i915: Ignore LVDS EDID when it is unavailabe or invalid
drm/i915: Add no_lvds entry for the Clientron U800
drm/i915: Rename many remaining uses of "output" to encoder or connector.
drm/i915: Rename intel_output to intel_encoder.
agp/intel: intel_845_driver is an agp driver!
drm/i915: introduce to_intel_bo helper
drm/i915: Disable FBC on 915GM and 945GM.
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
This is a purely cosmetic change to make changes in this area easier.
And hey, it's not only clearer and typechecked, but actually shorter,
too!
[anholt: To clarify, this is a change to let us later make
drm_i915_gem_object subclass drm_gem_object, instead of having
drm_gem_object have a pointer to i915's private data]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
This could resolve HW deadlocks where a unit downstream of the VS is
waiting for more input, the VS has one vertex queued up but not
dispatched because it hopes to get one more vertex for 2x4 dispatch,
and software isn't handing more vertices down because it's waiting for
rendering to complete. The B-Spec says you should always have this
bit set.
Signed-off-by: Eric Anholt <eric@anholt.net>
The continue just after this call with loop around and wait for the
request just added just fine. This leads to slightly more compact code.
Signed-Off-by: Owain G. Ainsworth <oga@openbsd.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
The assumption that an object has only ever one write domain is deeply
threaded into gem (it's even encoded the the singular of the variable
name). Don't let userspace screw us over.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
Now that we have an exact gpu write domain tracking, we don't need
to move objects to the active list ourself. i915_add_request will
take care of that under all circumstances.
Idea stolen from a patch by Chris Wilson <chris@chris-wilson.co.uk>.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
We have it, so use it. This required moving the function to avoid
a forward declaration.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
The fence_list should be lru ordered for otherwise we might try
to steal a fence reg from an active object even though there are
fences from inactive objects available. lru ordering was obeyed
for gpu access everywhere save when moving dirty objects from
flushing_list to active_list.
Fixing this cause the code to indent way to much, so I've extracted
the flushing_list processing logic into its on function.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
The spaghetti logic in there tripped up my brain's code parser for a
few secs. Prevent this from happening again by extracting the fence
stealing code into a seperate functions. IMHO this slightly clears up
the code flow.
v2: Beautified according to ickle's comments.
v3: ickle forgot to flush his comment queue ... Now there's also a
we-are-paranoid BUG_ON in there.
v4: I've forgotten to switch on my brain when doing v3. Now the BUG_ON
actually checks something useful.
v5: Clean up a stale comment as noted by Eric Anholt.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
All other accesses take this spinlock, so do this here, too.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
This has a few functional changes against the old code:
* a few more unnecessary loads and stores to the drm_i915_fence_reg
objects. Also an unnecessary store to the hw fence register.
* zaps any userspace mappings before doing other flushes. Only changes
anything when userspace does racy stuff against itself.
* also flush GTT domain. This is a noop, but still try to keep the
bookkeeping correct.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
* anholt/drm-intel-next:
drm/i915: Record batch buffer following GPU error
drm/i915: give up on 8xx lid status
drm/i915: reduce some of the duplication of tiling checking
drm/i915: blow away userspace mappings before fence change
drm/i915: move a gtt flush to the correct place
agp/intel: official names for Pineview and Ironlake
drm/i915: overlay: drop superflous gpu flushes
drm/i915: overlay: nuke readback to flush wc caches
drm/i915: provide self-refresh status in debugfs
drm/i915: provide FBC status in debugfs
drm/i915: fix drps disable so unload & re-load works
drm/i915: Fix OGLC performance regression on 945
drm/i915: Deobfuscate the render p-state obfuscation
drm/i915: add dynamic performance control support for Ironlake
drm/i915: enable memory self refresh on 9xx
drm/i915: Don't reserve compatibility fence regs in KMS mode.
drm/i915: Keep MCHBAR always enabled
drm/i915: Replace open-coded eviction in i915_gem_idle()
i915_gem_object_fenceable was mostly just a repeat of the
i915_gem_object_fence_offset_ok, but also checking the size (which was
checkecd when we allowed that BO to be tiled in the first place). So
instead, export the latter function and use it in place.
Signed-Off-By: Owain G. Ainsworth <oga@openbsd.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
This aligns it with the other user of i915_gem_clear_fence_reg,
which blows away the mapping before changing the fence reg.
Only affects userspace if it races against itself when changing
tiling parameters, i.e. behaviour is undefined, anyway.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
No functional change, because gtt flushing is a no-op. Still, try
to keep the bookkeeping accurate. The if is still slightly wrong
for with execbuf2 even i915-class hw doesn't always need a fence
reg for gpu access. But that's for somewhen lateron.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
The fence start is for compatibility with UMS X Servers before fence
management. KMS X Servers only started doing tiling after fence
management appeared.
Signed-off-by: Eric Anholt <eric@anholt.net>
With the introduction of the hang-check, we can safely expect that
i915_wait_request() will always return even when the GPU hangs, and so
do not need to open code the wait in order to manually check for the
hang. Also we do not need to always evict all buffers, so only flush
the GPU (and wait for it to idle) for KMS, but continue to evict for UMS.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Mostly obvious simplifications.
The i915 pread/pwrite ioctls, intel_overlay_put_image and
nouveau_gem_new were incorrectly using the locked versions
without locking: this is also fixed in this patch.
Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Before changing the status of a buffer with a pending write we will await
upon a new flush for that buffer. So we can take advantage of any flushes
posted whilst the buffer is active and pending processing by the GPU, by
clearing its write_domain and updating its last_rendering_seqno -- thus
saving a potential flush in deep queues and improves flushing behaviour
upon eviction for both GTT space and fences.
In order to reduce the time spent searching the active list for matching
write_domains, we move those to a separate list whose elements are
the buffers belong to the active/flushing list with pending writes.
Orignal patch by Chris Wilson <chris@chris-wilson.co.uk>, forward-ported
by me.
In addition to better performance, this also fixes a real bug. Before
this changes, i915_gem_evict_everything didn't work as advertised. When
the gpu was actually busy and processing request, the flush and subsequent
wait would not move active and dirty buffers to the inactive list, but
just to the flushing list. Which triggered the BUG_ON at the end of this
function. With the more tight dirty buffer tracking, all currently busy and
dirty buffers get moved to the inactive list by one i915_gem_flush operation.
I've left the BUG_ON I've used to prove this in there.
References:
Bug 25911 - 2.10.0 causes kernel oops and system hangs
http://bugs.freedesktop.org/show_bug.cgi?id=25911
Bug 26101 - [i915] xf86-video-intel 2.10.0 (and git) triggers kernel oops
within seconds after login
http://bugs.freedesktop.org/show_bug.cgi?id=26101
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Adam Lantos <hege@playma.org>
Cc: stable@kernel.org
Signed-off-by: Eric Anholt <eric@anholt.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
drm/i915: Fix leak of relocs along do_execbuffer error path
drm/i915: slow acpi_lid_open() causes flickering - V2
drm/i915: Disable SR when more than one pipe is enabled
drm/i915: page flip support for Ironlake
drm/i915: Fix the incorrect DMI string for Samsung SX20S laptop
drm/i915: Add support for SDVO composite TV
drm/i915: don't trigger ironlake vblank interrupt at irq install
drm/i915: handle non-flip pending case when unpinning the scanout buffer
drm/i915: Fix the device info of Pineview
drm/i915: enable vblank interrupt on ironlake
drm/i915: Prevent use of uninitialized pointers along error path.
drm/i915: disable hotplug detect before Ironlake CRT detect
Following a gpu hang, we would leak the relocation buffer. So simply
earrange the error path to always free the relocation buffer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Having missed the ENOMEM return via i915_gem_fault(), there are probably
other paths that I also missed. By not enabling NORETRY by default these
paths can run the shrinker and take memory from the system (but not from
our own inactive lists because our shrinker can not run whilst we hold
the struct mutex) and this may allow the system to survive a little longer
whilst our drivers consume all available memory.
References:
OOM killer unexpectedly called with kernel 2.6.32
http://bugzilla.kernel.org/show_bug.cgi?id=14933
v2: Pass gfp into page mapping.
v3: Use new read_cache_page_gfp() instead of open-coding.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> (in principle)
Signed-off-by: Eric Anholt <eric@anholt.net>
When we setup buffer for display plane, we'll check any pending
required GPU flush and possible make interruptible wait for flush
complete. But that wait would be most possibly to fail in case of
signals received for X process, which will then fail modeset process
and put display engine in unconsistent state. The result could be
blank screen or CPU hang, and DDX driver would always turn on outputs
DPMS after whatever modeset fails or not.
So this one creates new helper for setup display plane buffer, and
when needing flush using uninterruptible wait for that.
This one should fix bug like https://bugs.freedesktop.org/show_bug.cgi?id=24009.
Also fixing mode switch stress test on Ironlake.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: rs600: use correct mask for SW interrupt
gpu/drm/radeon/radeon_irq.c: move a dereference below a NULL test
drm/radeon/radeon_device.c: move a dereference below a NULL test
drm/radeon/radeon_fence.c: move a dereference below the NULL test
drm/radeon/radeon_connectors.c: add a NULL test before dereference
drm/radeon/kms: fix memory leak
drm/kms: Fix &&/|| confusion in drm_fb_helper_connector_parse_command_line()
drm/edid: Fix CVT width/height decode
drm/edid: Skip empty CVT codepoints
drm: remove address mask param for drm_pci_alloc()
drm/radeon/kms: add missing breaks in i2c and ss lookups
drm/radeon/kms: add primary dac adj values table
drm/radeon/kms: fallback to default connector table
drm_pci_alloc() has input of address mask for setting pci dma
mask on the device, which should be properly setup by drm driver.
And leave it as a param for drm_pci_alloc() would cause confusion
or mistake would corrupt the correct dma mask setting, as seen on
intel hw which set wrong dma mask for hw status page. So remove
it from drm_pci_alloc() function.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As pinning (allocating and binding GTT memory) does not actually invoke
GPU commands, it is safe, and indeed is attempted, during resumption
from suspension:
[drm:intel_init_clock_gating] *ERROR* failed to pin power context: -16
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: stable@kernel.org
Signed-off-by: Eric Anholt <eric@anholt.net>
This patch adds a new execbuf ioctl, execbuf2, for use by clients that
want to control fence register allocation more finely. The buffer
passed in to the new ioctl includes a new relocation type to indicate
whether a given object needs a fence register assigned for the command
buffer in question.
Compatibility with the existing execbuf ioctl is implemented in terms
of the new code, preserving the assumption that fence registers are
required for pre-965 rendering commands.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[ickle: Remove pre-emptive clear_fence_reg()]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
[anholt: Removed dmesg spam]
Signed-off-by: Eric Anholt <eric@anholt.net>
i915_gem_object_unbind had the ordering wrong. The other user,
i915_gem_object_put_fence_reg already has the correct ordering.
Results was usually corrupted pixmaps, especially garbled font glyphs
after a suspend/resume (because this evicts everything).
I'm still waiting for the feedback from the bug-reporters, but
because this obviously fixes a bug (at least for me) I'm already
submitting it.
Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=25406
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
CC: stable@kernel.org
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (189 commits)
drm/radeon/kms: fix warning about cur_placement being uninitialised.
drm/ttm: Print debug information on memory manager when eviction fails
drm: Add memory manager debug function
drm/radeon/kms: restore surface registers on resume.
drm/radeon/kms/r600/r700: fallback gracefully on ucode failure
drm/ttm: Initialize eviction placement in case the driver callback doesn't
drm/radeon/kms: cleanup structure and module if initialization fails
drm/radeon/kms: actualy set the eviction placements we choose
drm/radeon/kms: Fix NULL ptr dereference
drm/radeon/kms/avivo: add support for new pll selection algo
drm/radeon/kms/avivo: fix some bugs in the display bandwidth setup
drm/radeon/kms: fix return value from fence function.
drm/radeon: Remove tests for -ERESTART from the TTM code.
drm/ttm: Have the TTM code return -ERESTARTSYS instead of -ERESTART.
drm/radeon/kms: Convert radeon to new TTM validation API (V2)
drm/ttm: Rework validation & memory space allocation (V3)
drm: Add search/get functions to get a block in a specific range
drm/radeon/kms: fix avivo tiling regression since radeon object rework
drm/i915: Remove a debugging printk from hangcheck
drm/radeon/kms: make sure i2c id matches
...
IGD* isn't a useful name. Replace with the codenames, as sourced from
pci.ids.
Signed-off-by: Adam Jackson <ajax@redhat.com>
[anholt: Fixed up for merge with pineview/ironlake changes]
Signed-off-by: Eric Anholt <eric@anholt.net>
These are handled by the error return being propagated to user-space and
do not any add any information to the original error, so are useless.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
Acked-by: Thomas Hellström <thomas@shipmail.org>
Review-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse "Orange Smoothie" Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Execbufs involve quite a bit of payload, to the extent that cache misses
show up in the profiles here, and a suspicion that some of those cachelines
may get evicted and then reloaded in the subsequent copy.
This is still abstracted like drm_calloc_large since we want to check for
size overflow, and because we want to choose between kmalloc and vmalloc
on the fly. cairo's interface for malloc-with-calloc's-args was used as
the model.
Signed-off-by: Eric Anholt <eric@anholt.net>
Replace the DRM_DEBUG with DRM_DEBUG_DRIVER in generic i915 driver.
Then the debug info can be obtained by adding the boot option of
"drm.debug=0x02".
At the same time the debug info in increase/decrease clock is also
printed by using DRM_DEBUG_DRIVER instead of DRM_DEBUG_KMS.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
As long as the gpu can keep up, neither the cpu (waiting for gpu)
nore the gpu (waiting for vblank to do an overlay flip) stalls.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
This just waits until the hw passed the current ring position with
cmd execution. This slightly changes the existing i915_wait_request
function to make uninterruptible waiting possible - no point in
returning to userspace while mucking around with the overlay, that
piece of hw is just too fragile.
Also replace a magic 0 with the symbolic constant (and kill the then
superflous comment) while I was looking at the code.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
If we trigger a tracepoint for batch buffer submission, it is a reasonable
assumption that we wish to also trace the batch buffer completion. So in
order to capture the completion events, we need to enable irqs... However,
we cannot rely on the completion event to disable the irq later, so we
defer the irq disable to the retire request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We increment the seqno number between submitting the batch buffer and
the flush/interrupt that demarcates its end, so the tracepoint needs to
reference the incremented value to match the completion event.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
* 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (57 commits)
drm/i915: Handle ERESTARTSYS during page fault
drm/i915: Warn before mmaping a purgeable buffer.
drm/i915: Track purged state.
drm/i915: Remove eviction debug spam
drm/i915: Immediately discard any backing storage for uneeded objects
drm/i915: Do not mis-classify clean objects as purgeable
drm/i915: Whitespace correction for madv
drm/i915: BUG_ON page refleak during unbind
drm/i915: Search harder for a reusable object
drm/i915: Clean up evict from list.
drm/i915: Add tracepoints
drm/i915: framebuffer compression for GM45+
drm/i915: split display functions by chip type
drm/i915: Skip the sanity checks if the current relocation is valid
drm/i915: Check that the relocation points to within the target
drm/i915: correct FBC update when pipe base update occurs
drm/i915: blacklist Acer AspireOne lid status
ACPI: make ACPI button funcs no-ops if not built in
drm/i915: prevent FIFO calculation overflows on 32 bits with high dotclocks
drm/i915: intel_display.c handle latency variable efficiently
...
Fix up trivial conflicts in drivers/gpu/drm/i915/{i915_dma.c|i915_drv.h}
During a page fault and rebinding the buffer there exists a window for a
signal to arrive during the i915_wait_request() and trigger a
ERESTARTSYS. This used to be handled by returning SIGBUS and thereby
killing the application. Try 'cairo-perf-trace & cairo-test-suite' and
watch X go boom!
The solution as suggested by H. Peter Anvin is to simply return NOPAGE and
leave the higher layers to spot we did not fill the page and resubmit
the page fault.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
[anholt: Mostly squash it with another commit]
In order to correctly prevent the invalid reuse of a purged buffer, we
need to track such events and warn the user before something bad
happens.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Whilst cleaning up the patches for submission, I mis-classified non-dirty
objects as purgeable. This was causing the backing pages for those
objects to be evicted under memory-pressure, discarding valid and
unreplaceable texture data.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As evict_something() is called by routines that do not repeatedly search
again, try harder in the initial search to find an object that matches
the request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
First the routine attempted to unlock a mutex it did not own along the
error path.
Secondly the routine should never be called on any list but the inactive
one, since we attempt to unbind those objects, so fix the calling semantics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
By adding tracepoint equivalents for WATCH_BUF/EXEC we are able to monitor
the lifetimes of objects, requests and significant events. These events can
then be probed using the tracing frameworks, such as systemtap and, in
particular, perf.
For example to record the stack trace for every GPU stall during a run, use
$ perf record -e i915:i915_gem_request_wait_begin -c 1 -g
And
$ perf report
to view the results.
[Updated to fix compilation issues caused.]
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ben Gamari <bgamari@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (133 commits)
drm/vgaarb: add VGA arbitration support to the drm and kms.
drm/radeon: some r420s have a CP race with the DMA engine.
drm/radeon/r600/kms: rv670 is not DCE3
drm/radeon/kms: r420 idle after programming GA_ENHANCE
drm/radeon/kms: more fixes to rv770 suspend/resume path.
drm/radeon/kms: more alignment for rv770.c with r600.c
drm/radeon/kms: rv770 blit init called too late.
drm/radeon/kms: move around new init path code to avoid posting at init
drm/radeon/r600: fix some issues with suspend/resume.
drm/radeon/kms: disable VGA rendering engine before taking over VRAM
drm/radeon/kms: Move radeon_get_clock_info() call out of radeon_clocks_init().
drm/radeon/kms: add initial connector properties
drm/radeon/kms: Use surfaces for scanout / cursor byte swapping on big endian.
drm/radeon/kms: don't fail if we fail to init GPU acceleration
drm/r600/kms: fixup number of loops per blit calculation.
drm/radeon/kms: reprogram format in set base.
drm/radeon: avivo chips have no separate int bit for display
drm/radeon/r600: don't do interrupts
drm: fix _DRM_GEM addmap error message
drm: update crtc x/y when only fb changes
...
Fixed up trivial conflicts in firmware/Makefile due to network driver
(cxgb3) and drm (mga/r128/radeon) firmware being listed next to each
other.
If the presumed_offset as feed to userspace and returned to the kernel
from a previous execbuffer is still valid, then we do not need to rewrite
the relocation entry and may skip the offset sanity checks.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Eric noted a potential concern with the low bits not being strictly used
as part of the absolute offset (instead part of the command stream to the
GPU), but in practice that should not be an issue.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Andy Whitcroft <apw@canonical.com>
Cc: Eric Anholt <eric@anholt.net>
CC: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Due to the necessity of having to take the struct_mutex, the i915
shrinker can not free the inactive lists if we fail to allocate memory
whilst processing a batch buffer, triggering an OOM and an ENOMEM that
is reported back to userspace. In order to fare better under such
circumstances we need to manually retry a failed allocation after
evicting inactive buffers.
To do so involves 3 steps:
1. Marking the backing shm pages as NORETRY.
2. Updating the get_pages() callers to evict something on failure and then
retry.
3. Revamping the evict something logic to be smarter about the required
buffer size and prefer to use volatile or clean inactive pages.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Similar to the madvise() concept, the application may wish to mark some
data as volatile. That is in the event of memory pressure the kernel is
free to discard such buffers safe in the knowledge that the application
can recreate them on demand, and is simply using these as a cache.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This should help GEM handle memory pressure sitatuions more gracefully.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There is no need to store the gtt_alignment as it is either explicitly
set according to the hardware requirements (e.g. scanout) or the
minimum alignment is computed on demand.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If we failed to set the domain, the buffer was no longer being tracked
on any list.
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There is a very real possibility that multiple CPUs will notice that the
GPU is wedged. This introduces all sorts of potential race conditions.
Make the wedged flag atomic to mitigate this risk.
Signed-off-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We set a periodic timer to check on the GPU, resetting it every time a
batch is completed. If the timer elapses, we check acthd. If acthd
hasn't changed in two timer periods, we assume the chip is wedged.
This is implemented in such a way that it leaves the option open to
employ adaptive timer intervals in the future. One could wait until
several timer periods have elapsed before declaring the chip dead. If
the chip comes back after several periods but before the "dead"
threshold, the timer interval or dead threshold could be raised.
It is important to note that while checking for active requests, we need
to account for the fact that requests are removed from the list (i.e.
retired) in a deferred work queue handler. This means that merely
checking for an empty request_list is insufficient; the list could be
non-empty yet the GPU still idle, causing the hangcheck timer to
incorrectly mark the GPU as wedged (it took me a while to figure that
out---sigh...)
Signed-off-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We'll need it in i915_irq.c for checking whether there are outstanding
requests. Also, the function really ought to return a bool, not an int.
Signed-off-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
i915_wait_request() only checks mm.wedged after it interacts with the
hardware, generally causing the driver to lock up waiting for a wedged
chip. Make sure we check mm.wedged as the first thing we do.
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
drm_ht_remove_item() does not handle removing an absent item and the hlist
in particular is incorrectly initialised. The easy remedy is simply skip
calling i915_gem_free_mmap_offset() unless we have actually created the
offset and associated ht entry.
This also fixes the mishandling of a partially constructed offset which
leaves pointers initialized after freeing them along the
i915_gem_create_mmap_offset() error paths.
In particular this should fix the oops found here:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/415357/comments/8
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: stable@kernel.org
Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had
serious stability issues. Back in May a wbinvd was added to the DRM to
work around much of the problem. Some failure remained -- easily visible
by dragging a window around on an X -retro desktop, or by looking at bugzilla.
The chipset flush was on the right track -- hitting the right amount of
memory, and it appears to be the only way to flush on these chipsets, but the
flush page was mapped uncached. As a result, the writes trying to clear the
writeback cache ended up bypassing the cache, and not flushing anything! The
wbinvd would flush out other writeback data and often cause the data we wanted
to get flushed, but not always. By removing the setting of the page to UC
and instead just clflushing the data we write to try to flush it, we get the
desired behavior with no wbinvd.
This exports clflush_cache_range(), which was laying around and happened to
basically match the code I was otherwise going to copy from the DRM.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Cc: stable@kernel.org
Otherwise, some other userland writing into its buffer may race to land
writes either after the CPU thinks it's got a coherent view, or after its
GTT entries have been redirected to point at the scratch page. Either
result is unpleasant.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reinette Chatre reports a frozen system (with blinking keyboard LEDs)
when switching from graphics mode to the text console, or when
suspending (which does the same thing). With netconsole, the oops
turned out to be
BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
and it's due to the i915_gem.c code doing drm_irq_uninstall() after
having done i915_gem_idle(). And the i915_gem_idle() path will do
i915_gem_idle() ->
i915_gem_cleanup_ringbuffer() ->
i915_gem_cleanup_hws() ->
dev_priv->hw_status_page = NULL;
but if an i915 interrupt comes in after this stage, it may want to
access that hw_status_page, and gets the above NULL pointer dereference.
And since the NULL pointer dereference happens from within an interrupt,
and with the screen still in graphics mode, the common end result is
simply a silently hung machine.
Fix it by simply uninstalling the irq handler before idling rather than
after. Fixes
http://bugzilla.kernel.org/show_bug.cgi?id=13819
Reported-and-tested-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to the docs, the ringbuffer is not allowed to wrap in the middle
of an instruction.
G45 PRM, Vol 1b, p101:
While the “free space” wrap may allow commands to be wrapped around the
end of the Ring Buffer, the wrap should only occur between commands.
Padding (with NOP) may be required to follow this restriction.
Do as commanded.
[Having seen bug reports where there is evidence of split commands, but
apparently the GPU has continued on merrily before a bizarre and untimely
death, this may or may not fix a few random hangs.]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
CC: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
There are several sources of unnecessary power consumption on Intel
graphics systems. The first is the LVDS clock. TFTs don't suffer from
persistence issues like CRTs, and so we can reduce the LVDS refresh rate
when the screen is idle. It will be automatically upclocked when
userspace triggers graphical activity. Beyond that, we can enable memory
self refresh. This allows the memory to go into a lower power state when
the graphics are idle. Finally, we can drop some clocks on the gpu
itself. All of these things can be reenabled between frames when GPU
activity is triggered, and so there should be no user visible graphical
changes.
Signed-off-by: Jesse Barnes <jesse.barnes@intel.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Remember to release the local reference if we fail to wait on
the rendering.
(Also whilst in the vicinity add some whitespace so that the phasing of
the operations is clearer.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Several functions in the GEM kernel API used int as handle type, but
user API has it __u32 which is also the intended type.
Replace int with u32.
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As of 52dc7d32b8, we could leave an old
linear GTT mapping in place, so that apps trying to GTT-mapped write in
tiled data wouldn't get the fence added, and garbage would get displayed.
Signed-off-by: Eric Anholt <eric@anholt.net>
As we call unmap_mapping_range() twice in identical fashion, refactor
and attempt to explain why we need to call unmap_mapping_range().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Unitialized fence register could leads to corrupted display. Problem
encountered on MacBooks (revision 1 and 2), directly booting from EFI
or through BIOS emulation.
(bug #21710 at freedestop.org)
Signed-off-by: Grégoire Henry <henry@pps.jussieu.fr>
Signed-off-by: Eric Anholt <eric@anholt.net>
It hasn't been used in ages, and having the user tell your how much
memory is being freed at free time is a recipe for disaster even if it
was ever used.
Signed-off-by: Eric Anholt <eric@anholt.net>
The fence register value also depends upon the stride of the object, so we
need to clear the fence if that is changed as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[anholt: Added 8xx and 965 paths, and renamed the confusing
i915_gem_object_tiling_ok function to i915_gem_object_fence_offset_ok]
Signed-off-by: Eric Anholt <eric@anholt.net>
With the work by Jesse Barnes to eliminate allocation of fences during
execbuffer, it becomes possible to write to the scan-out buffer with it
never acquiring a fence (simply by only ever writing to the object using
tiled GPU commands and never writing to it via the GTT). So for pre-i965
chipsets which require fenced access for tiled scan-out buffers, we need
to obtain a fence register.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
After performing an operation over the page list for a buffer retrieved by
i915_gem_object_get_pages() the pages need to be returned with
i915_gem_object_put_pages(). This was not being observed for the phys
objects which were thus leaking references to their backing pages.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
CC: Dave Airlie <airlied@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
To differentiate between encountering an out-of-memory error with running
out of space in the aperture, use ENOSPC for the later.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
The batch buffer may be shared with another read buffer, so we should not
ignore any previously set domains, but just or in the command domain (and
check that the buffer is not writable).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
By sending a broken execbuffer (its length was not suitably aligned) I
triggered an operation upon a freed object. The invalid alignment was
discovered after updating the write_domain on the object but before the
object was placed on the active queue. So during the unwind process
following the error, the now freed object attempts to flush its
non-existent, but outstanding, GPU writes causing this use-after-free.
[drm:i915_dispatch_gem_execbuffer] *ERROR* alignment
[drm:i915_gem_execbuffer] *ERROR* dispatch failed -22
WARNING: at lib/kref.c:43 warn_slowpath_null+0x10/0x15()
Modules linked in:
Pid: 4552, comm: lt-csi-drm Not tainted 2.6.30-rc6 #423
Call Trace:
[<c0119ef3>] warn_slowpath_fmt+0x57/0x6d
[<c014de24>] ? get_pageblock_migratetype+0x18/0x1e
[<c014e8fd>] ? free_hot_page+0xa/0xc
[<c014e915>] ? __free_pages+0x16/0x1f
[<c0153ebf>] ? shmem_truncate_range+0x63e/0x656
[<c015fb2f>] ? slob_page_alloc+0x146/0x1c8
[<c0119f19>] warn_slowpath_null+0x10/0x15
[<c01f55f2>] kref_get+0x1b/0x21
[<c02605db>] i915_gem_object_move_to_active+0x1f/0x56
[<c0261302>] i915_add_request+0x156/0x19a
[<c026136e>] i915_gem_object_flush_gpu_write_domain+0x28/0x3f
[<c0261eca>] i915_gem_object_unbind+0x4a/0x124
[<c0261fd7>] i915_gem_free_object+0x33/0x9b
[<c0250d6b>] drm_gem_object_free+0x28/0x4a
[<c0250d43>] ? drm_gem_object_free+0x0/0x4a
[<c01f55ce>] kref_put+0x38/0x41
[<c0250cbf>] drm_gem_object_unreference+0x11/0x13
[<c0250d06>] drm_gem_object_handle_unreference+0x1e/0x21
[<c0250d13>] drm_gem_object_release_handle+0xa/0xe
[<c01f3e6b>] idr_for_each+0x5f/0x98
[<c0250d09>] ? drm_gem_object_release_handle+0x0/0xe
[<c0250daf>] drm_gem_release+0x22/0x34
[<c025046f>] drm_release+0x1e8/0x3c4
[<c0162d25>] __fput+0xaf/0x146
[<c0162dce>] fput+0x12/0x14
[<c01605ef>] filp_close+0x48/0x52
[<c011b182>] put_files_struct+0x57/0x9b
[<c011b1e4>] exit_files+0x1e/0x20
[<c011c6b6>] do_exit+0x16d/0x511
[<c03704ab>] ? __schedule+0x3d4/0x3e5
[<c0103f0d>] ? handle_irq+0xd/0x69
[<c011caa7>] do_group_exit+0x4d/0x73
[<c011cae0>] sys_exit_group+0x13/0x17
[<c010268c>] sysenter_do_call+0x12/0x2b
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Update interrupt handling methods for IGDNG with new registers
for display and graphics interrupt functions. As we won't use
irq-based vblank sync in dri2, so display interrupt on new chip
will be used for hotplug only in future.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
keithp didn't like the original 20ms plan because a cooperative client could
be starved by an uncooperative client. There may even have been problems
with cooperative clients versus cooperative clients. So keithp changed
throttle to just wait for the second to last seqno emitted by that client.
It worked well, until we started getting more round-trips to the server
due to DRI2 -- the server throttles in BlockHandler, and so if you did more
than one round trip after finishing your frame, you'd end up unintentionally
syncing to the swap.
Fix this by keeping track of the client's requests, so the client can wait
when it has an outstanding request over 20ms old. This should have
non-starving behavior, good behavior in the presence of restarts, and less
waiting. Improves high-settings openarena performance on my GM45 by 50%.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This could be triggered by a gtt mapping fault on 965 that decides to
remove the fence from another object that happens to be active currently.
Since the other object doesn't rely on the fence reg for its execution, we
don't wait for it to finish. We'll soon be not waiting on 915 most of the
time as well, so just drop the BUG_ON.
Signed-off-by: Eric Anholt <eric@anholt.net>
When a GEM object is evicted from the GTT we set it to the CPU domain,
as it might get swapped in and out or ever mmapped regularly. If the
object is mmapped through the GTT it can still get evicted in this way
by other objects requiring GTT space. When the GTT mapping is touched
again we fault it back into the GTT, but fail to set it back to the
GTT domain. This means we fail to flush any cached CPU writes to the
pages backing the object which will then happen "eventually", typically
after we write to the page through the uncached GTT mapping.
[anholt: Note that userland does do a set_domain(GTT, GTT) when starting
to access the GTT mapping. That covers getting the existing mapping of the
object synchronized if it's bound to the GTT. But set_domain(GTT, GTT)
doesn't do anything if the object is currently unbound. This fix covers the
transition to being bound for GTT mapping.]
Fixes glyph and other pixmap corruption during swapping. fd.o bug #21790
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
On the 865, but not the 855, the clflush we do appears to not actually make
it out to the hardware all the time. An easy way to safely reproduce was
X -retro, which would show that some of the blits involved in drawing the
lovely root weave didn't make it out to the hardware. Those blits are 32
bytes each, and 1-2 would be missing at various points around the screen.
Other experimentation (doing more clflush, doing more AGP chipset flush,
poking at some more device registers to maybe trigger more flushing) didn't
help. krh came up with the wbinvd as a way to successfully get all those
blits to appear.
Signed-off-by: Eric Anholt <eric@anholt.net>
The pitch field is an exponent on pre-965, so we were rejecting buffers
on 8xx that we shouldn't have. 915 got lucky in that the largest legal
value happened to match (8KB / 512 = 0x10), but 8xx has a smaller tile width.
Additionally, we programmed that bad value into the register on 8xx, so the
only pitch that would work correctly was 4096 (512-1023 pixels), while others
would probably give bad rendering or hangs.
Signed-off-by: Eric Anholt <eric@anholt.net>
fd.o bug #20473.
For some reason we never added 8xx desktop cursor support to the
kernel. This patch fixes that.
[krh: Also set the size on pre-i915 hw.]
Tested-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
For awhile now, many of the GEM code paths have allocated page or
object arrays with the slab allocator. This is nice and fast, but
won't work well if memory is fragmented, since the slab allocator works
with physically contiguous memory (i.e. order > 2 allocations are
likely to fail fairly early after booting and doing some work).
This patch works around the issue by falling back to vmalloc for
>PAGE_SIZE allocations. This is ugly, but much less work than chaining
a bunch of pages together by hand (suprisingly there's not a bunch of
generic kernel helpers for this yet afaik). vmalloc space is somewhat
precious on 32 bit kernels, but our allocations shouldn't be big enough
to cause problems, though they're routinely more than a page.
Note that this patch doesn't address the unchecked
alloc-based-on-ioctl-args in GEM; that needs to be fixed in a separate
patch.
Also, I've deliberately ignored the DRM's "area" junk. I don't think
anyone actually uses it anymore and I'm hoping it gets ripped out soon.
[Updated: removed size arg to new free function. We could unify the
free functions as well once the DRM mem tracking is ripped out.]
fd.o bug #20152 (part 1/3)
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
We might sleep here anyway so I hope an extra uncached read is ok to
add.
In #20896 we found that vbetool clobbers the IER. In KMS mode this is
particularly bad since we don't set the interrupt regs late (in
EnterVT), so we'd fail to get *any* interrupts at all after X started
(since some distros have scripts that call vbetool at X startup
apparently).
So this patch checks IER at wait_request time, and re-enables
interrupts if it's been clobbered. In a proper config this check
should never be triggered.
This is really a distro issue, but having a sanity check is nice, as
long as it doesn't have a real performance hit.
Tested-by: Mateusz Kaduk <mateusz.kaduk@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[anholt: Moved the check inside of the sleeping case to avoid perf cost]
Signed-off-by: Eric Anholt <eric@anholt.net>
regression caused by commit 5e118f4139:
i915_gem_object_move_to_inactive() should be called in task context,
as it calls fput();
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
[anholt: Add more detail to the comment about the lock break that's added]
Signed-off-by: Eric Anholt <eric@anholt.net>
Save the bit 17 state of the pages when freeing the page list, and
reswizzle them if necessary when rebinding the pages (in case they were
swapped out). Since we have userland with expectations that the swizzle
enums let it pread and pwrite contents accurately, we can't expose a new
swizzle enum for bit 17 (which it would have to GTT map to handle), so we
handle it down in pread and pwrite by swizzling the copy when bit 17 of the
page address is set.
Signed-off-by: Eric Anholt <eric@anholt.net>
Otherwise, the results of our read didn't show up when we were faulting in
the page being read into (as happened with a testcase reading into a big
stack area). Likely accounts for some conformance test failures.
Signed-off-by: Eric Anholt <eric@anholt.net>
i915_gem_put_relocs_to_user returned an uninitialized value which
got returned to userspace. This caused libdrm in my setup to never
get out of a do{}while() loop retrying i915_gem_execbuffer.
result was hanging X, overheating of cpu and 2-3gb of logfile-spam.
This patch adresses the issue by
1. initializing vars in this file where necessary
2. correcting wrongly interpreted return values of copy_[from/to]_user
Signed-off-by: Florian Mickler <florian@mickler.org>
[anholt: cleanups of unnecessary changes, consistency in APIs]
Signed-off-by: Eric Anholt <eric@anholt.net>
We create a debugfs node (i915_ringbuffer_data) to expose a hex dump
of the ring buffer itself. We also expose another debugfs node
(i915_ringbuffer_info) with information on the state (i.e. head, tail
addresses) of the ringbuffer.
For batchbuffer dumping, we look at the device's active_list, dumping
each object which has I915_GEM_DOMAIN_COMMAND in its read
domains. This is all exposed through the dri/i915_batchbuffers debugfs
file with a header for each object (giving the objects gtt_offset so
that it can be matched against the offset given in the
BATCH_BUFFER_START command.
Signed-off-by: Ben Gamari <bgamari@gmail.com>
Signed-off-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
This is a baby-step in the direction of having finer-grained
locking than the struct_mutex. Specifically, this will enable
new debugging code to read the active list for printing out
GPU state when the GPU is wedged, (while the struct_mutex is
held, of course).
Signed-off-by: Carl Worth <cworth@cworth.org>
[anholt: indentation fix]
Signed-off-by: Eric Anholt <eric@anholt.net>
Indicates something is wrong with the mapping; and apparently triggers
in current kernels.
Signed-off-by: Jesse Barnes <jbarnes@virtuosugeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
This fixes all the tiling problems with the 2d ddx. glxgears still doesn't work.
Changes:
- fix a copy&paste error in i8xx fence reg setup. It resulted in an at most a
512KB offset of the fence reg window, so was only visible sometimes.
- add tests for stride and object size constrains (also for i915 and 1965 class
hw). Userspace seems to have an of-by-one bug there, which changes the fence
size by at most 512KB due to an overflow.
- because i8xx hw is quite old (and therefore not as well-tested) I left 2 debug
WARN_ONs in the i8xx fence reg setup code to hopefully catch any further
overflows in the bit-fields. Lastly there's one small change to make the
alignment checks more consistent.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=20289
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
This produced a warning on my build, not sure why super-warning-man didn't
notice this one, its much worse than the %z one.
Signed-off-by: Dave Airlie <airlied@redhat.com>
agp_chipset_flush() is for flushing the intel GMCH write cache via the
IFP, these two uses are for when we're getting the object into the cpu
READ domain, and thus should not be needed. This confused me when I was
getting my head around the code.
With thanks to airlied for helping me check my mental picture of how the
flushes and clflushes are supposed to be used.
Signed-off-by: Owain G. Ainsworth <oga@openbsd.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
This introduces allocation in the batch submission path that wasn't there
previously, but these are compatibility paths so we care about simplicity
more than performance.
kernel.org bug #12419.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Like the GTT pwrite path fix, this uses an optimistic path and a
fallback to get_user_pages. Note that this means we have to stop using
vfs_write and roll it ourselves.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We've wanted this for a few consumers that touch the pages directly (such as
the following commit), which have been doing the refcounting outside of
get/put pages.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since the pagefault path determines that the lock order we use has to be
mmap_sem -> struct_mutex, we can't allow page faults to occur while the
struct_mutex is held. To fix this in pwrite, we first try optimistically to
see if we can copy from user without faulting. If it fails, fall back to
using get_user_pages to pin the user's memory, and map those pages
atomically when copying it to the GPU.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
agp_chipset_flush() is for flushing the intel GMCH write cache via the
IFP, these two uses are for when we're getting the object into the cpu
READ domain, and thus should not be needed. This confused me when I was
getting my head around the code.
With thanks to airlied for helping me check my mental picture of how the
flushes and clflushes are supposed to be used.
Signed-off-by: Owain G. Ainsworth <oga@openbsd.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Once upon a time, the DRM made the distinction between the drm_map
data structure exchanged with user space and the drm_local_map used
in the kernel.
For some reasons, while the BSD port still has that "feature", the
linux part abused drm_map for kernel internal usage as the local
map only existed as a typedef of the struct drm_map.
This patch fixes it by declaring struct drm_local_map separately
(though its content is currently identical to the userspace variant),
and changing the kernel code to only use that, except when it's a
user<->kernel interface (ie. ioctl).
This allows subsequent changes to the in-kernel format
I've also replaced the use of drm_local_map_t with struct drm_local_map
in a couple of places. Mostly by accident but they are the same (the
former is a typedef of the later) and I have some remote plans and
half finished patch to completely kill the drm_local_map_t typedef
so I left those bits in.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
The last 8 fence registers sit at a different offset, so when we went to set
fence number 8 in the lower offset, we instead set PGETBL_CTL, and the GPU
got all sorts of angry at us.
fd.o bug #20567. Easily reproducible by running glxgears and killing it about
6 times.
Signed-off-by: Eric Anholt <eric@anholt.net>
The i915 also uses the fence registers for GPU access to tiled buffers so
we cannot reallocate one whilst it is on the active list. By performing a
LRU scan of the fenced buffers we also avoid waiting the possibility of
waiting on a pinned, or otherwise unusable, buffer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
We need to check and report if there are no available fences - or else we
spin endlessly waiting for a buffer to magically unpin itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
As we may steal the fence register of an unpinned buffer for another,
every time we repin the buffer we need to recheck whether it needs to be
allocated a fence.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
If we wait upon a request and successfully unbind a buffer occupying a
fence register, then that slot will be freed and cause a NULL derefrence
upon rescanning.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
If userspace passes an object list with the same object appearing more
than once, we end up hitting the BUG_ON() in
i915_gem_object_set_to_gpu_domain() as it gets called a second time
for the same object.
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
dev_priv->hw_status_page can be NULL, if i915_gem_retire_requests()
is called from i915_gem_busy_ioctl().
Signed-off-by Karsten Wiese <fzu@wemgehoertderstaat.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The object is dereferenced before the NULL check. Oops.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=20235
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ensures that the user gets the latest information from the hardware
on whether the buffer is busy, potentially reducing the working set of objects
that the user chooses.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the KMS case, we need to suspend/resume GEM as well. So on suspend, make
sure we idle GEM and stop any new rendering from coming in, and on resume,
re-init the framebuffer and clear the suspended flag.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The problem was that object_set_to_gpu_domain would set the new write_domains
that are getting set by this batchbuffer, then the accumulated flushes required
for all the objects in preparation for this batchbuffer were posted, and the
brand new write domain would get cleared by the flush being posted. Instead,
hang on to the new (or old if we're not changing it) value and set it after
the flush is queued.
Results from this noticably included conformance test failures from reads
shortly after writes (where the new write domain had been lost and thus not
flushed and waited on), but is a suspected cause of hangs in some apps when
a write domain is lost on a buffer that gets reused for instruction or
commmand state.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
While not strictly required, it helped while thinking about the following
change. This change should be invariant.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes potential fault at fault time if the object was unreferenced
while the mapping still existed. Now, while the mmap_offset only lives
for the lifetime of the object, the object also stays alive while a vma
exists that needs it.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we fail to create the ringbuffer, then we need to cleanup the allocated
hws.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
A missing unpin on the error path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
A missing unpin on the error path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
A missing unreference and unpin after rejecting the relocation for an
invalid memory domain.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
We failed to unlock the mutex after failing to create the mmap offset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Lockdep warns that i915_gem_execbuffer() can trigger a page fault (which
takes mmap_sem) while holding dev->struct_mutex, while drm_vm_open()
(which is called with mmap_sem already held) takes dev->struct_mutex.
So this is a potential AB-BA deadlock.
The way that i915_gem_execbuffer() triggers a page fault is by doing
copy_to_user() when returning new buffer offsets back to userspace;
however there is no reason to hold the struct_mutex when doing this
copy, since what is being copied is the contents of an array private to
i915_gem_execbuffer() anyway. So we can fix the potential deadlock (and
get rid of the lockdep warning) by simply moving the copy_to_user()
outside of where struct_mutex is held.
This fixes <http://bugzilla.kernel.org/show_bug.cgi?id=12491>.
Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
A missing unreference if the user calls pin() a second time on a pinned
buffer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Also spotted by Owain Ainsworth.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
The DRI people seem to have a hard time getting these right (see also
commit aeb565dfc3).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we failed to allocate a new fence register we would return
VM_FAULT_SIGBUS without relinquishing the lock.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Adds code to set up fence registers at execbuf time on pre-965 chips as
necessary. Also fixes up a few bugs in the pre-965 tile register support
(get_order != ffs). The number of fences available to the kernel defaults
to the hw limit minus 3 (for legacy X front/back/depth), but a new parameter
allows userspace to override that as needed.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
Previously, the caller would continue along without knowing that the
function failed, resulting in potential mis-rendering. Right now vm_fault
just returns SIGBUS in that case, and we may need to disable signal handling
to avoid that happening.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
We'd love to just be using PAT, but even on chips with PAT it gets disabled
sometimes due to an errata. It would probably be better to have pat_enabled
exported and only bother with this when !pat_enabled.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/i915: Fix cursor physical address choice to match the 2D driver.
drm: stash AGP include under the do-we-have-AGP ifdef
drm: don't whine about not reading EDID data
drm/i915: hook up LVDS DPMS property
drm/i915: remove unnecessary debug output in KMS init
i915: fix freeing path for gem phys objects.
drm: create mode_config idr lock
drm: fix leak of device mappings since multi-master changes.
Use '%zu' to print out a size_t variable, not '%d'. Another case of the
"let's keep at least Linus' defconfig compile warningless" rule.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is an initial patch to do support for objects which needs physical
contiguous main ram, cursors and overlay registers on older chipsets.
These objects are bound on cursor bin, like pinning, and we copy
the data to/from the backing store object into the real one on attach/detach.
notes:
possible over the top in attach/detach operations.
no overlay support yet.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This avoids a BUG_ON in the enter_vt path due to objects being in the GTT
when we shouldn't have ever let them be (as we're not supposed to touch the
device during that time).
This was triggered by a change in the 2D driver to use the GTT mapping of
objects after pinning them to improve software fallback performance.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
The error path for object list being null is in the second goto target.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
This showed up in logs where people had a hung chip, so pinning was blocked
on the chip unpinning other buffers, and the X Server took its scheduler
signal during that time.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
We haven't seen this in practice, but it was visible when looking at a bug
report from when i915_gem_evict_everything() was broken and would always
return error.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported. HDMI, DisplayPort and additional SDVO output support will
follow.
Support for the mode setting code is controlled by the new 'modeset'
module option. A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.
Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing. So use caution when
enabling this code; be sure your user level code supports the new
interfaces.
A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console; which is useful for testing.
Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use the new core GEM object mapping code to allow GTT mapping of GEM
objects on i915. The fault handler will make sure a fence register is
allocated too, if the object in question is tiled.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These buffers don't have active rendering still occurring to them, they just
need either a flush to be emitted or a retire_requests to occur so that we
notice they're done. Return unbusy so that one of the two occurs. The two
expected consumers of this interface (OpenGL and libdrm_intel BO cache) both
want this behavior.
Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It's only for flushing caches appropriately for GTT access, not for actually
getting it there. Prevents potential smashing of cpu read/write domains on
unbound objects.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we fail to pin all of the buffers in an execbuffer request, go through
and clear the GTT and try again to see if its just a matter of fragmentation
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This eliminates the dev_set_domain function and just in-lines it
where its used, with the goal of moving the manipulation and use of
invalidate_domains and flush_domains closer together. This also
avoids calling add_request unless some domain has been flushed.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now that the CPU and GTT domain operations are isolated to their own
functions, the previously general-purpose set_domain function is now used
only to set GPU domains. It also has no failure cases, which is important as
this eliminates any possible interruption of the computation of new object
domains and subsequent emmission of the flushing instructions into the ring.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes several domain management bugs, including potential lack of cache
invalidation for pread, potential failure to wait for set_domain(CPU, 0),
and more, along with producing more intelligible code.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes failure to flush caches in the relocation update path, and
failure to wait in the set_domain ioctl, each of which could lead to incorrect
rendering.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Otherwise, we would leave the objects in an inconsistent state, such as
write_domain == 0 but on the flushing list.
Signed-off-by: Dave Airlie <airlied@redhat.com>
obj_priv->write_domain is "write domain if the GPU went idle now", not
"write domain at this moment." By postponing the clear, we confused the
concept, required more storage, and potentially emitted more flushes than
are required.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Before we had the notion of pinning objects, we had a kludge around to make
sure all of the objects were still resident in the GTT before we committed
to executing a batch buffer. We don't need this any longer, and it sticks an
error return in the middle of object domain computations that must be
associated with a subsequent flush/invalidate emmission into the ring.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The old code was wandering through the active list looking for pinned
buffers; there may be other pinned buffers around. Fortunately, we keep a
count of the total amount of pinned memory and can use that instead.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead, just warn that bad things are happening and do our best to clean up
the mess without the GPU's help.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This could return early when reading after writing a buffer, if somebody
had already put it on the flushing list (write domains are 0, but still
active), leading to glReadPixels failure.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
* 'io-mappings-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
io mapping: clean up #ifdefs
io mapping: improve documentation
i915: use io-mapping interfaces instead of a variety of mapping kludges
resources: add io-mapping functions to dynamically map large device apertures
x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
This will let userland know when to submit its batchbuffers, before they get
too big to fit in the aperture.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Impact: optimize/clean-up the IO mapping implementation of the i915 DRM driver
Switch the i915 device aperture mapping to the io-mapping interface, taking
advantage of the cleaner API to extend it across all of the mapping uses,
including both pwrite and relocation updates.
This dramatically improves performance on 64-bit kernels which were using
the same slow path as 32-bit non-HIGHMEM kernels prior to this patch.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: Avoid oops in DRM_IOCTL_RM_DRAW if a bad handle is supplied.
drm: Add 32-bit compatibility for DRM_IOCTL_UPDATE_DRAW.
drm/i915: use pipes, not planes to label vblank data
drm/i915: hold dev->struct_mutex and DRM lock during vblank ring operations
i915: Fix format string warnings on x86-64.
i915: Don't dereference HWS in /proc debug files when it isn't initialized.
i915: Enable IMR passthrough of vblank events before enabling it in pipestat.
drm: Remove two leaks of vblank reference count in error paths.
drm: fix leak of cliprects in drm_rmdraw()
i915: Disable MSI on GM965 (errata says it doesn't work)
drm: Set cliprects to NULL when changing drawable to having 0 cliprects.
i915: Protect vblank IRQ reg access with spinlock
To synchronize clip lists with the X server, the DRM lock must be held while
looking at drawable clip lists. To synchronize with other ring access, the
ring mutex must be held while inserting commands into the ring. Failure to
do the first resulted in easy visual corruption when moving windows, and the
second could have corrupted the ring with DRI2.
Grabbing the DRM lock involves using the DRM tasklet mechanism, grabbing the
ring mutex means potentially sleeping. Deal with both of these by always
running the tasklet from a work handler.
Also, protect from clip list changes since the vblank request was queued by
making sure the window has at least one rectangle while looking inside,
preventing oopses .
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
commit 9b7530cc32 ("i915: cleanup coding
horrors in i915_gem_gtt_pwrite()")
broke the i386 build for CONFIG_HIGHMEM=y.
Caught by automatic testing http://www.tglx.de/autoqa-logs/000137-0006-0001.log
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ My bad. It's the same patch I sent out earlier, nobody noticed then either.. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yes, this will probably be switched over to a cleaner model anyway, but
in the meantime I don't want to see the 'unused variable' warnings that
come from the disgusting #ifdef code. Make the special case be a nice
inlien function of its own, clean up the code, and make the warning go
away.
I wish people didn't write code that gets (valid) warnings from the
compiler, but I'll limit my fixes to code that I actually care about (in
this case just because I see the warning and it annoys me).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At leavevt and lastclose time, cancel any pending retire work handler
invocation, and keep the retire work handler from requeuing itself if it is
currently running.
This patch restructures i915_gem_idle to perform all of these tasks instead
of having both leavevt and lastclose call a sequence of functions.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This should improve performance by avoiding uncached reads by the CPU (the
point of having a status page), and may improve stability. This patch only
affects G33, GM45 and G45 chips as those are the only ones using GTT-based
HWS mappings.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
G45 appears quite sensitive to ring initialization register writes,
sometimes leaving the HEAD register with the START register contents. Check
to make sure HEAD is reset correctly when START is written, and fix it up,
screaming loudly.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes failure to map the ringbuffer when PAT tells us we don't get to do
uncached on something that's already mapped WC, or something along those lines.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the conversion for GEM, we had stopped using the hardware lock to protect
ring usage, since it was all internal to the DRM now. However, some paths
weren't converted to using struct_mutex to prevent multiple threads from
concurrently working on the ring, in particular between the vblank swap handler
and ioctls.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GEM allows the creation of persistent buffer objects accessible by the
graphics device through new ioctls for managing execution of commands on the
device. The userland API is almost entirely driver-specific to ensure that
any driver building on this model can easily map the interface to individual
driver requirements.
GEM is used by the 2d driver for managing its internal state allocations and
will be used for pixmap storage to reduce memory consumption and enable
zero-copy GLX_EXT_texture_from_pixmap, and in the 3d driver is used to enable
GL_EXT_framebuffer_object and GL_ARB_pixel_buffer_object.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>