Manaul revert of 0cdab21f9a, just to
remove the call to disable the clock gatings and powerctx before
suspend.
Peter Clifton bisected a suspend failure on his gme45 and found this to
be the culprit. As this was intended to be a fix for a similar suspend
failure for Ironlake (it didn't work), undoing this patch should have no
other side-effects.
Reported-and-tested-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The relative-to-general state default is useless as it means having to
rewrite the streaming kernels for each batch. Relative-to-surface is
more useful, as that stream usually needs to be rewritten for each
batch. And absolute addressing mode, vital if you start streaming
state, is also only available by adjusting the register...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Add an interrupt handler for switching graphics frequencies and handling
PM interrupts. This should allow for increased performance when busy
and lower power consumption when idle.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This patch changes the strategy for pageflip completion
timestamping. It detects if the pageflip completion
routine gets executed before or after drm_handle_vblank,
and thereby decides if the returned vblank count and
timestamp must be incremented by 1 frame(duration) or
not. It compares the current system time at invocation
against the current vblank timestamp. If the difference
is more than 0.9 video refresh interval durations then
it assumes the vblank timestamp and count are outdated
and need to be incremented and does so. Otherwise it
assumes a delayed pageflip irq and doesn't correct
the timestamp and count.
Advantage of this patch: Pageflip timestamping becomes
more robust against implementation errors and is
maintenance free for future GPU's.
Disadvantage: A few dozen (hundred?) nsecs extra
time spent in pageflip irq handler for each flip,
compared to hard-coded per-gpu settings?
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
v2: Change IS_IRONLAKE to IS_GEN5 to adapt to 2.6.37
This patch adds new functions for use by the drm core:
.get_vblank_timestamp() provides a precise timestamp
for the end of the most recent (or current) vblank
interval of a given crtc, as needed for the DRI2
implementation of the OML_sync_control extension.
It is a thin wrapper around the drm function
drm_calc_vbltimestamp_from_scanoutpos() which does
almost all the work.
.get_scanout_position() provides the current horizontal
and vertical video scanout position and "in vblank"
status of a given crtc, as needed by the drm for use by
drm_calc_vbltimestamp_from_scanoutpos().
The patch modifies the pageflip completion routine
to use these precise vblank timestamps as the timestamps
for pageflip completion events.
This code has been only tested on a HP-Mini Netbook with
Atom processor and Intel 945GME gpu. The codepath for
(IS_G4X(dev) || IS_GEN5(dev) || IS_GEN6(dev)) gpu's
has not been tested so far due to lack of hardware.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
* 'nouveau/drm-nouveau-next' of ../drm-nouveau-next: (93 commits)
drm/nv50: fix a couple of vm init issues
drm/nv04-nv40: Fix up PCI(E) GART DMA object bus address calculation.
drm/nouveau: kick vram functions out into an "engine"
drm/nouveau: allow gpuobj vinst to be a virtual address when necessary
drm/nv50: tidy up PCIEGART implementation
drm/nv50: enable non-contig vram allocations where requested
drm/nv50: enable 4KiB pages for small vram allocations
drm/nv50: implement global channel address space on new VM code
drm/nv50: implement BAR1/BAR3 management on top of new VM code
drm/nv50: import new vm code
drm/nv50: implement custom vram mm
drm/nouveau: Avoid potential race between nouveau_fence_update() and context takedown.
drm/nouveau: fix use of drm_mm_node in semaphore object
drm/nouveau: wrap calls to ttm_bo_validate()
drm/nouveau: no need to zero dma objects, we fill them completely anyway
drm/nouveau: introduce a util function to wait on reg != val
drm/nouveau: implicitly insert non-DMA objects into RAMHT
drm/nouveau: make fifo.create_context() responsible for mapping control regs
drm/nouveau: Spin for a bit in nouveau_fence_wait() before yielding the CPU.
drm/nouveau: Use WC memory on the AGP GART.
...
Drivers using their own implementation of io_mem_reserve/io_mem_free are
likely to store the tracking information for the map in mem.mm_node, so
it can't be freed while still mapped.
Signed-off-by: Ben Skeggs<bskeggs@redhat.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
this adds a bo create, and fence seq tracking tracepoints.
This is just an initial set to play around with, we should investigate
what others we need would be useful.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes overwriting the first page table entry when testing that the PRAMIN
BAR can be correctly read/written, and adds an additional bar flush after
poking the BAR3 control regs.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Add frame buffer compression on Sandybridge. The method is similar to
Ironlake, except that two new registers of type GTTMMADR must be written
with the right fence info.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Add the support of memory self-refresh on Sandybridge, which is now
support 3 levels of watermarks and the source of the latency values
for watermarks has changed.
On Sandybridge, the LP0 WM value is not hardcoded any more. All the
latency value is now should be extracted from MCHBAR SSKPD register.
And the MCHBAR base address is changed, too.
For the WM values, if any calculated watermark values is larger than
the maximum value that can be programmed into the associated watermark
register, that watermark must be disabled.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
[ickle: remove duplicate compute routines and fixup for checkpatch]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Be paranoid and ensure that the vblank has passed and the scanout has
switched to the new fb, before unpinning the old one and possibly
tearing down its PTEs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Restore PIPE_CONTROL once again just for Ironlake, as it appears that
MI_USER_INTERRUPT does not have the same coherency guarantees, that is
on Ironlake the interrupt following a GPU write is not guaranteed to
arrive after the write is coherent from the CPU, as it does on the
other generations.
Reported-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reported-by: Shuang He <shuang.he@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32402
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we already know the limits for the hardware clock, pass it down
rather than recomputing them for each match.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to enforce the correct memory barriers for irq get/put, we need
to perform the actual counting using atomic operations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
BIOSes. Can't live without them (apparently), definitely can't live with
them.
Reported-by: Ben Gamari <bgamari@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=24312
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once we have read the value out of the GT power well, we need to remove
the FORCE WAKE bit to allow the system to auto-power down.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As suggested by Daniel Vetter, this is a safeguard should any of the
registers cause reference to PTE entries.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we provide a list of all objects that will be accessed from the
batchbuffer, we can build a lut of the handles associated with those
objects for this invocation and use that to avoid the overhead of
looking up those objects again for every relocation.
The cost of building and searching a small hash table is much less than
that of acquiring a spinlock, searching a radix tree and manipulating an
atomic refcnt per relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Don't post a downclocking task if the device is still active when the
idle timer fires. A pathological process could queue up several seconds
worth of processing and then go to sleep, during which time the idle
timer would kick in and downclock the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the tail advances beyond the autoreport HEAD value, then we need to
fallback to an uncached read of the HEAD register in order to ascertain
the correct amount of remaining space in the ringbuffer.
Reported-by: Fang, Xun <xunx.fang@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32259
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The DisplayPort standard (1.1a) states that:
The I2C-over-AUX Reply field is valid only when Native AUX CH Reply
field is AUX_ACK (00). When Native AUX CH Reply field is not 00, then,
I2C-over-AUX Reply field must be 00 and be ignored.
This fixes broken EDID reading when using an active DisplayPort to
duallink DVI converter. If the AUX CH replier chooses to defer the
transaction, a short read occurs and erroneous data is returned as
the i2c reply due to a lack of length checking and failure to check
for AUX ACK.
As a result, broken EDIDs can look like:
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
00: bc bc bc ff bc bc bc ff bc bc bc ac bc bc bc 45 ???.???.???????E
10: bc bc bc 10 bc bc bc 34 bc bc bc ee bc bc bc 4c ???????4???????L
20: bc bc bc 50 bc bc bc 00 bc bc bc 40 bc bc bc 00 ???P???.???@???.
30: bc bc bc 01 bc bc bc 01 bc bc bc a0 bc bc bc 40 ???????????????@
40: bc bc bc 00 bc bc bc 00 bc bc bc 00 bc bc bc 55 ???.???.???.???U
50: bc bc bc 35 bc bc bc 31 bc bc bc 20 bc bc bc fc ???5???1??? ????
60: bc bc bc 4c bc bc bc 34 bc bc bc 46 bc bc bc 00 ???L???4???F???.
70: bc bc bc 38 bc bc bc 11 bc bc bc 20 bc bc bc 20 ???8??????? ???
80: bc bc bc ff bc bc bc ff bc bc bc ff bc bc bc ff ???.???.???.???.
...
which can lead to:
[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder
[drm:drm_edid_block_valid] *ERROR* Raw EDID:
<3>30 30 30 30 30 30 30 32 38 32 30 32 63 63 31 61 000000028202cc1a
<3>28 00 02 8c 00 00 00 00 18 00 00 00 00 00 00 00 (...............
<3>20 4c 61 73 74 20 62 65 61 63 6f 6e 3a 20 33 32 Last beacon: 32
<3>32 30 6d 73 20 61 67 6f 46 00 05 8c 00 00 00 00 20ms agoF.......
<3>36 00 00 00 00 00 00 00 00 0c 57 69 2d 46 69 20 6.........Wi-Fi
<3>52 6f 75 74 65 72 01 08 82 84 8b 96 24 30 48 6c Router......$0Hl
<3>03 01 01 06 02 00 00 2a 01 00 2f 01 00 32 04 0c .......*../..2..
<3>12 18 60 dd 09 00 10 18 02 00 00 01 00 00 18 00 ..`.............
Signed-off-by: David Flynn <davidf@rd.bbc.co.uk>
[ickle: fix up some surrounding checkpatch warnings]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
NVC0 will be able to share some of nv50's paths this way. This also makes
it the card-specific vram code responsible for deciding if a given set
of tile_flags is valid, rather than duplicating the allowed types in
nv50_vram.c and nouveau_gem.c
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
As of this commit, it's guaranteed that if an object is in VRAM that its
GPU virtual address will be constant.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is required on nv50 as we need to be able to have more precise control
over physical VRAM allocations to avoid buffer corruption when using
buffers of mixed memory types.
This removes some nasty overallocation/alignment that we were previously
using to "control" this problem.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
At some point in the future, this bo won't necessarily be backed by
a drm_mm_node, so use the start/size fields of the ttm_mem_reg instead.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>