Add the interfaces to allow user space to create and destroy contexts.
Contexts are destroyed automatically if the file descriptor for the dri
device is closed.
Following convention as usual here causes checkpatch warnings.
v2: with is_initialized, no longer need to init at create
drop the context switch on create (daniel)
v3: Use interruptible lock (Chris)
return -ENODEV in !GEM case (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Implement the context switch code as well as the interfaces to do the
context switch. This patch also doesn't match 1:1 with the RFC patches.
The main difference is that from Daniel's responses the last context
object is now stored instead of the last context. This aids in allows us
to free the context data structure, and context object independently.
There is room for optimization: this code will pin the context object
until the next context is active. The optimal way to do it is to
actually pin the object, move it to the active list, do the context
switch, and then unpin it. This allows the eviction code to actually
evict the context object if needed.
The context switch code is missing workarounds, they will be implemented
in future patches.
v2: actually do obj->dirty=1 in switch (daniel)
Modified comment around above
Remove flags to context switch (daniel)
Move mi_set_context code to i915_gem_context.c (daniel)
Remove seqno , use lazy request instead (daniel)
v3: use i915_gem_request_next_seqno instead of
outstanding_lazy_request (Daniel)
remove id's from trace events (Daniel)
Put the context BO in the instruction domain (Daniel)
Don't unref the BO is context switch fails (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Invent an abstraction for a hw context which is passed around through
the core functions. The main bit a hw context holds is the buffer object
which backs the context. The rest of the members are just helper
functions. Specifically the ring member, which could likely go away if
we decide to never implement whatever other hw context support exists.
Of note here is the introduction of the 64k alignment constraint for the
BO. If contexts become heavily used, we should consider tweaking this
down to 4k. Until the contexts are merged and tested a bit though, I
think 64k is a nice start (based on docs).
Since we don't yet switch contexts, there is really not much complexity
here. Creation/destruction works pretty much as one would expect. An idr
is used to generate the context id numbers which are unique per file
descriptor.
v2: add DRM_DEBUG_DRIVERS to distinguish ENOMEM failures (ben)
convert a BUG_ON to WARN_ON, default destruction is still fatal (ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Very basic code for context setup/destruction in the driver.
Adds the file i915_gem_context.c This file implements HW context
support. On gen5+ a HW context consists of an opaque GPU object which is
referenced at times of context saves and restores. With RC6 enabled,
the context is also referenced as the GPU enters and exists from RC6
(GPU has it's own internal power context, except on gen5). Though
something like a context does exist for the media ring, the code only
supports contexts for the render ring.
In software, there is a distinction between contexts created by the
user, and the default HW context. The default HW context is used by GPU
clients that do not request setup of their own hardware context. The
default context's state is never restored to help prevent programming
errors. This would happen if a client ran and piggy-backed off another
clients GPU state. The default context only exists to give the GPU some
offset to load as the current to invoke a save of the context we
actually care about. In fact, the code could likely be constructed,
albeit in a more complicated fashion, to never use the default context,
though that limits the driver's ability to swap out, and/or destroy
other contexts.
All other contexts are created as a request by the GPU client. These
contexts store GPU state, and thus allow GPU clients to not re-emit
state (and potentially query certain state) at any time. The kernel
driver makes certain that the appropriate commands are inserted.
There are 4 entry points into the contexts, init, fini, open, close.
The names are self-explanatory except that init can be called during
reset, and also during pm thaw/resume. As we expect our context to be
preserved across these events, we do not reinitialize in this case.
As Adam Jackson pointed out, The cutoff of 1MB where a HW context is
considered too big is arbitrary. The reason for this is even though
context sizes are increasing with every generation, they have yet to
eclipse even 32k. If we somehow read back way more than that, it
probably means BIOS has done something strange, or we're running on a
platform that wasn't designed for this.
v2: rename load/unload to init/fini (daniel)
remove ILK support for get_size() (indirectly daniel)
add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
added comments (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
For that to work we need to export the base address of the gtt
mmio window from intel-gtt. Also replace all other uses of
dev->agp by values we already have at hand.
Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need the latest dma-buf code from Dave Airlie so that we can pimp
the backing storage handling code in drm/i915 with Chris Wilson's
unbound tracking and stolen mem backed gem object code.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If any l3 rows have been previously remapped, we must remap them after
GPU reset/resume too.
v2: Just return (no warn) on remapping init if not IVB (Jesse)
Move the check of schizo userspace to i915_gem_l3_remap (Jesse)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On IVB hardware we are given an interrupt whenever a L3 parity error
occurs in the L3 cache. The L3 cache is used by internal GPU clients
only. This is a very rare occurrence (in fact to test this I need to
use specially instrumented silicon).
When a row in the L3 cache detects a parity error the HW generates an
interrupt. The interrupt is masked in GTIMR until we get a chance to
read some registers and alert userspace via a uevent. With this
information userspace can use a sysfs interface (follow-up patch) to
remap those rows.
Way above my level of understanding, but if a given row fails, it is
statistically more likely to fail again than a row which has not failed.
Therefore it is desirable for an operating system to maintain a lifelong
list of failing rows and always remap any bad rows on driver load.
Hardware limits the number of rows that are remappable per bank/subbank,
and should more than that many rows detect parity errors, software
should maintain a list of the most frequent errors, and remap those
rows.
V2: Drop WARN_ON(IS_GEN6) (Jesse)
DRM_DEBUG row/bank/subbank on errror (Jesse)
Comment updates (Jesse)
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Wait request is poorly named IMO. After working with these functions for
some time, I feel it's much clearer to name the functions more
appropriately.
Of course we must update the callers to use the new name as well.
This leaves room within our namespace for a *real* wait request function
at some point.
Note to maintainer: this patch is optional.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This helps implement GL_ARB_sync but stops short of allowing full blown
sync objects. Finally we can use the new timed seqno waiting function
to allow userspace to wait on a buffer object with a timeout. This
implements that interface.
The IOCTL will take as input a buffer object handle, and a timeout in
nanoseconds (flags is currently optional but will likely be used for
permutations of flush operations). Users may specify 0 nanoseconds to
instantly check.
The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any
non-zero timeout parameter the wait ioctl will wait for the given number
of nanoseconds on an object becoming unbusy. Since the wait itself does
so holding struct_mutex the object may become re-busied before this
completes. A similar but shorter race condition exists in the busy
ioctl.
v2: ETIME/ERESTARTSYS instead of changing to EBUSY, and EGAIN (Chris)
Flush the object from the gpu write domain (Chris + Daniel)
Fix leaked refcount in good case (Chris)
Naturally align ioctl struct (Chris)
v3: Drop lock after getting seqno to avoid ugly dance (Chris)
v4: check for 0 timeout after olr check to allow polling (Chris)
v5: Updated the comment. (Chris)
v6: Return -ETIME instead of -EBUSY when timeout_ns is 0 (Daniel)
Fix the commit message comment to be less ugly (Ben)
Add a warning to check the return timespec (Ben)
v7: Use DRM_AUTH for the ioctl. (Eugeni)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This adds handle->fd and fd->handle support to i915, this is to allow
for offloading of rendering in one direction and outputs in the other.
v2 from Daniel Vetter:
- fixup conflicts with the prepare/finish gtt prep work.
- implement ppgtt binding support.
Note that we have squat i-g-t testcoverage for any of the lifetime and
access rules dma_buf/prime support brings along. And there are quite a
few intricate situations here.
Also note that the integration with the existing code is a bit
hackish, especially around get_gtt_pages and put_gtt_pages. It imo
would be easier with the prep code from Chris Wilson's unbound series,
but that is for 3.6.
Also note that I didn't bother to put the new prepare/finish gtt hooks
to good use by moving the dma_buf_map/unmap_attachment calls in there
(like we've originally planned for).
Last but not least this patch is only compile-tested, but I've changed
very little compared to Dave Airlie's version. So there's a decent
chance v2 on drm-next works as well as v1 on 3.4-rc.
v3: Right when I've hit sent I've noticed that I've screwed up one
obj->sg_list (for dmar support) and obj->sg_table (for prime support)
disdinction. We should be able to merge these 2 paths, but that's
material for another patch.
v4: fix the error reporting bugs pointed out by ickle.
v5: fix another error, and stop non-gtt mmaps on shared objects
stop pread/pwrite on imported objects, add fake kmap
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In many places we wish to iterate over the rings associated with the
GPU, so refactor them to use a common macro.
Along the way, there are a few code removals that should be side-effect
free and some rearrangement which should only have a cosmetic impact,
such as error-state.
Note that this slightly changes the semantics in the hangcheck code:
We now always cycle through all enabled rings instead of
short-circuiting the logic.
v2: Pull in a couple of suggestions from Ben and Daniel for
intel_ring_initialized() and not removing the warning (just moving them
to a new home, closer to the error).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added note to commit message about the small behaviour
change, suggested by Ben Widawsky.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The line time can be programmed according to the number of horizontal
pixels vs effective pixel rate ratio.
v2: improve comment as per Chris Wilson suggestion
v3: incorporate latest changes in specs.
v4: move into wm update routine, also mention that the same routine can
program IPS watermarks. We do not have their enablement code yet, nor
handle the required clock settings at the moment, so this patch won't
program those values for now.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Only half of them even cared, and it's always the same one.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- reduce the irq disabled section, even for a debugfs file this was
way too long.
- always disable irqs when taking the lock.
v2: Thou shalt not mistake locking for reference counting, so:
- reference count the error_state to protect from concurent freeeing.
This will be only really used in the next patch.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
gpu reset is a very important piece of our infrastructure.
Unfortunately we only really it test by actually hanging the gpu,
which often has bad side-effects for the entire system. And the gpu
hang handling code is one of the rather complicated pieces of code we
have, consisting of
- hang detection
- error capture
- actual gpu reset
- reset of all the gem bookkeeping
- reinitialition of the entire gpu
This patch adds a debugfs to selectively stopping rings by ceasing to
update the hw tail pointer, which will result in the gpu no longer
updating it's head pointer and eventually to the hangcheck firing.
This way we can exercise the gpu hang code under controlled conditions
without a dying gpu taking down the entire systems.
Patch motivated by me forgetting to properly reinitialize ppgtt after
a gpu reset.
Usage:
echo $((1 << $ringnum)) > i915_ring_stop # stops one ring
echo 0xffffffff > i915_ring_stop # stops all, future-proof version
then run whatever testload is desired. i915_ring_stop automatically
resets after a gpu hang is detected to avoid hanging the gpu to fast
and declaring it wedged.
v2: Incorporate feedback from Chris Wilson.
v3: Add the missing cleanup.
v4: Fix up inconsistent size of ring_stop_read vs _write, noticed by
Eugeni Dodonov.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Every time we use the device after a period of idleness, check that the
power management setup is still sane. This is to workaround a bug
whereby it seems that we begin suppressing power management interrupts,
preventing SandyBridge+ from going into turbo mode.
This patch does have a side-effect. It removes the mark-busy for just
moving the cursor - we don't want to increase the render clock just for
the sprite, though we may want to bump the display frequency. I'd argue
that we do not, and certainly don't want to take the struct_mutex here
due to the large latencies that introduces.
References: https://bugs.freedesktop.org/show_bug.cgi?id=44006
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To get the fun stuff out of the way, the legacy hws is allocated by
userspace when the gpu needs a gfx hws. And there's no reference-counting
going on, so userspace can simply screw everyone over.
At least it's not as horrible as i810, where the ringbuffer is allocated
by userspace ...
We can't fix this disaster, but we can at least tidy up the code a
bit to make things clearer:
- Drop the drm ioremap indirection.
- Add a new new read_legacy_status_page to paper over the differences
between the legacy gfx hws and the physical hws shared with the
new ringbuffer code.
- Add a pointer in dev_priv->dri1 for the cpu addresses - that one is
an iomem remapping as opposed to all other hw status pages. This is
just prep work to make sparse happy.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Wohoo!
Now we only need to move all the gem/kms stuff that accidentally
landed in i915_dma.c out of it, and this will be our legacy dri1
grave-yard.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and hide it in i915_dma.c.
This way all the legacy stuff dealing with READ_BREADCRUMB and
LP_RING and friends is in i915_dma.c.
v2: Rebase on top of Chris Wilson's rework irq handling code.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Let's just get this out of the way.
v2: Rebase against ENODEV changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Assigned in setparam, used never.
I didn't bother to dig through the archives to figure out what
this was supposed to do.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... and shove allow_batchbuffer in there. More dragons will
follow suit.
There's the curious case that we allow this for KMS ...
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_dma.c contains most of the old dri1 horror-show, so move
the remaining bits there, too. The code has been removed and
the only thing left are some stubs to ensure that userspace
doesn't try to use this stuff. vblank_pipe_set only returns 0
without any side-effects, so we can even stub it out with
the canonical drm_noop.
v2: Rebase against ENODEV changes.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
vblank_pipe was intended to be used for tracking DRI1 state. However,
the vblank_pipe reported to DRI1 is fixed to umask both pipes, and the
dev_priv->vblank_pipe unused and superfluous.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The waiting_seqno is not terribly useful, and as such we can remove it
so that we'll be able to extract lockless code.
v2: Keep the information for error_state (Chris)
Check if ring is initialized in hangcheck (Chris)
Capture the waiting ring (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: add some bikeshed to clarify a comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This extra bit of interrupt enabling code doesn't belong in the wait
seqno function. If anything we should pull it out to a helper so the
throttle code can also use it. The history is a bit vague, but I am
going to attempt to just dump it, unless someone can argue otherwise.
Removing this allows for a shared lock free wait seqno function. To keep
tabs on this issue though, the IER value is stored on error capture
(recommended by Chris Wilson)
v2: fixed typo EIR->IER (Ben)
Fix some white space (Ben)
Move IER capture to globally instead of per ring (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: ier is a 16 bit reg on gen2!]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This originates from a hack by me to quickly fix a bug in an earlier
patch where we needed control over whether or not waiting on a seqno
actually did any retire list processing. Since the two operations aren't
clearly related, we should pull the parameter out of the wait function,
and make the caller responsible for retiring if the action is desired.
The only function call site which did not get an explicit retire_request call
(on purpose) is i915_gem_inactive_shrink(). That code was already calling
retire_request a second time.
v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On later gen3, you are able to select the meaning of the FlipPending
status bit in IIR and change it to FlipDone. This was sometimes done by
the BIOS leading to confusion on just how pageflipping worked on gen3.
Simplify the implementation by using the legacy meaning for all gen3
machines.
Note: this makes all gen3 machines equally broken...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And remove the cargo-culted copy from the valleyview irq handler.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Always true these days. It has been added originally to work
around some issues with the agp layer in 2.6.29:
commit ac5c4e7618
Author: Dave Airlie <airlied@redhat.com>
Date: Fri Dec 19 15:38:34 2008 +1000
drm/i915: GEM on PAE has problems - disable it for now.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We slightly modify the initialisation sequence to move the
initialisation of the memory managers earlier and in particular before
probing outputs and detecting any existing output configuration. This is
essential if we wish to track preallocated objects and preserve them
whilst initialising GEM.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The use of the mm_list by deferred-free breaks the following patches to
extend the range of objects tracked. We can simplify things if we just
make the unbind during free uninterrutible.
Note that unbinding should never fail, because we hold an additional
reference on every active object. Only the ilk vt-d workaround breaks
this, but already takes care of not failing by waiting for the gpu to
quiescent non-interruptible. But the existence of the deferred free
list casted some doubts on this theory, hence WARN if the unbind fails
and only then retry non-interruptible.
We can kill this additional code after a release in case the theory is
indeed right and no one has hit that WARN.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Simplify object tracking by removing the inactive but pinned list. The
only place where this was used is for counting the available memory,
which is just as easy performed by checking all objects on the rare
occasions it is required (application startup). For ease of debugging,
we keep the reporting of pinned objects through the error-state and
debugfs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This was only used by one external caller who would just be as happy
with evict-everything, so perform the replacement and make the function
private.
In the process we note that unbinding the inactive list should not fail,
and make it a warning instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
PCH PLLs aren't required for outputs on the CPU, so we shouldn't just
treat them as part of the pipe.
So split the code out and manage PCH PLLs separately, allocating them
when needed or trying to re-use existing PCH PLL setups when the timings
match.
v2: add num_pch_pll field to dev_priv (Daniel)
don't NULL the pch_pll pointer in disable or DPMS will fail (Jesse)
put register offsets in pll struct (Chris)
v3: Decouple enable/disable of PLLs from get/put.
v4: Track temporary PLL disabling during modeset
v5: Tidy PLL initialisation by only checking for num_pch_pll == 0 (Eugeni)
v6: Avoid mishandling allocation failure by embedding the small array of
PLLs into the device struct
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44309
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> (up to v2)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3+)
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Tested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rename obj->tiling_changed to obj->fence_dirty so that it is clear that
it flags when the parameters for an active fence (including the
no-fence) register are changed.
Also, do not set this flag when the object does not have a fence
register allocated currently and the gpu does not depend upon the
unfence. This case works exactly like when a tiled object lost its
fence and hence does not need additional handling for the tiling
change in the code.
v2: Use fence_dirty to better express what the flag tracks and add a few
more details to the comments to serve as a reminder of how the GPU also
uses the unfenced register slot.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add some bikeshed to the commit message about the stricter
use of fence_dirty.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Almost all of the errors related __iomem problems.
Most of the changes here are trivial, however there is plenty of chance
for yank/paste errors.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we now never pipeline a fence update, obj->last_fenced_ring is always
the same as the obj->ring whenever obj->last_fenced_seqno is active, so
remove it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we now no longer track a pipelined fence change, we never use
ring->setup_seqno and can kill it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We never succeeded in getting pipelined fencing to work (unresolved
spurious GPU hangs), so begin the process of dismantling and removal
the broken code.
Step 1 is the removal of the pipeline parameter to get_fence().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This should contain all the changes which require no thought to make
sparse happy.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After a gpu reset we need to re-init some of the hw state we only
initialize when modeset is enabled, like rc6, hw contexts or render/GT
core clock gating and workaround register settings.
Note that this patch has a small change in the resume code:
- rc6 on gen6+ is only restored for the modeset case (for more
consistency with other callsites). This is no problem because recent
kernels refuse to load drm/i915 without kms on gen6+
- rc6/emon on ilk is only restored for the modeset case. This is no
problem because rc6 is disabled by default on ilk, and ums on ilk
has never really been a supported option outside of horrible rhel
backports.
v2: Chris Wilson noticed that we not only fail to restore the clock
gating settings after gpu reset.
v3: Move the call to modeset_init_hw in _reset out of the
struct_mutext protected area - other callers don't hold it, too.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Merge rc6 information into the power group for our device. Until now the
i915 driver has not had any sysfs entries (aside from the connector
stuff enabled by drm core). Since it seems like we're likely to have
more in the future I created a new file for sysfs stubs, as well as the
rc6 sysfs functions which don't really belong elsewhere (perhaps
i915_suspend, but most of the stuff is in intel_display,c).
displays rc6 modes enabled (as a hex mask):
cat /sys/class/drm/card0/power/rc6_enable
displays #ms GPU has been in rc6 since boot:
cat /sys/class/drm/card0/power/rc6_residency_ms
displays #ms GPU has been in deep rc6 since boot:
cat /sys/class/drm/card0/power/rc6p_residency_ms
displays #ms GPU has been in deepest rc6 since boot:
cat /sys/class/drm/card0/power/rc6pp_residency_ms
Important note: I've seen on SNB that even when RC6 is *not* enabled the
rc6 register seems to have a random value in it. I can only guess at the
reason reason for this. Those writing tools that utilize this value need
to be careful and probably want to scrutinize the value very carefully.
v2: use common rc6 residency units to milliseconds for the other RC6 types
v3: don't create sysfs files for GEN <= 5
add a rc6_enable to show a mask of enabled rc6 types
use unmerge instead of remove for sysfs group
squash intel_enable_rc6() extraction into this patch
v4: rename sysfs files (Chris)
CC: Chris Wilson <chris@chris-wilson.co.uk>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>f
CC: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: squash in the 64bit division fix by Chris Wilson.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In theory this will have performance and power improvements. Performance
because we don't need to stall when the scanout BO is busy, and power
because we don't have to stall when the BO is busy (and the ring can
even go to sleep if the HW supports it).
v2:
squash 2 patches into 1 (me)
un-inline the enable_semaphores function (Daniel)
remove comment about SNB hangs from i915_gem_object_sync (Chris)
rename intel_enable_semaphores to i915_semaphore_is_enabled (me)
removed page flip comment; "no why" (Chris)
To address other comments from Daniel (irc):
update the comment to say 'vt-d is crap, don't enable semaphores'
- I think you misinterpreted Chris' comment, it already exists.
checking out whether we can pageflip on the render ring on ivb (didn't
work on early silicon)
- We don't want to enable workarounds for early silicon unless we have
to.
- I can't find any references in the docs about this.
optionally use it if the fb is already busy on the render ring
- This should be how the code already worked, unless I am
misunderstanding your meaning.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By simplifying the rules to calling get_fence when writing to the
through the GTT in a tiled manner, and calling put_fence before writing
to the object through the GTT in a linear manner, the code becomes
clearer and there is less chance of making a mistake.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: fixed up conflict with ppgtt code and spelling in a new
comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel Vetter wrote
First pull request for 3.5-next, slightly large than usual because new
things kept coming in since the last pull for 3.4.
Highlights:
- first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci
ids are not yet added, and there's still quite a few patches to merge
(mostly modesetting). To make QA easier I've decided to merge this stuff
in pieces.
- loads of cleanups and prep patches spurred by the above. Especially vlv
is a real frankenstein chip, but also hsw is stretching our driver's
code design. Expect more to come in this area for 3.5.
- more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again,
there are more patches needed (and some already queued up), but I wanted
to split this a bit for better testing.
- pwrite/pread rework and retuning. This series has been in the works for
a few months already and a lot of i-g-t tests have been created for it.
Now it's finally ready to be merged. Note that one patch in this series
touches include/pagemap.h, that patch is acked-by akpm.
- reduce mappable pressure and relocation throughput improvements from
Chris.
- mmap offset exhaustion mitigation by Chris Wilson.
- a start at figuring out which codepaths in our messy dri1/ums+gem/kms
driver we actually need to support by bailing out of unsupported case.
The driver now refuses to load without kms on gen6+ and disallows a few
ioctls that userspace never used in certain cases. More of this will
definitely come.
- More decoupling of global gtt and ppgtt.
- Improved dual-link lvds detection by Takashi Iwai.
- Shut up the compiler + plus fix the fallout (Ben)
- Inverted panel brightness handling (mostly Acer manages to break things
in this way).
- Small fixlets and adjustements and some minor things to help debugging.
Regression-wise QA reported quite a few issues on ivb, but all of them
turned out to be hw stability issues which are already fixed in
drm-intel-fixes (QA runs the nightly regression tests on -next alone,
without -fixes automatically merged in). There's still one issue open on
snb, it looks like occlusion query writes are not quite as cache coherent
as we've expected. With some of the pwrite adjustements we can now
reliably hit this. Kernel workaround for it is in the works."
* 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits)
drm/i915: VCS is not the last ring
drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2
drm/i915: make quirks more verbose
drm/i915: dump the DMA fetch addr register on pre-gen6
drm/i915/sdvo: Include YRPB as an additional TV output type
drm/i915: disallow gem init ioctl on ilk
drm/i915: refuse to load on gen6+ without kms
drm/i915: extract gt interrupt handler
drm/i915: use render gen to switch ring irq functions
drm/i915: rip out old HWSTAM missed irq WA for vlv
drm/i915: open code gen6+ ring irqs
drm/i915: ring irq cleanups
drm/i915: add SFUSE_STRAP registers for digital port detection
drm/i915: add WM_LINETIME registers
drm/i915: add WRPLL clocks
drm/i915: add LCPLL control registers
drm/i915: add SSC offsets for SBI access
drm/i915: add port clock selection support for HSW
drm/i915: add S PLL control
drm/i915: add PIXCLK_GATE register
...
Conflicts:
drivers/char/agp/intel-agp.h
drivers/char/agp/intel-gtt.c
drivers/gpu/drm/i915/i915_debugfs.c