Next part of cleaning up the watermark code for skl. This is easy, since
it seems that we never actually needed to keep track of the linetime in
the skl_wm_values struct anyway.
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
First part of cleaning up all of the skl watermark code. This moves the
structures for storing the ddb allocations of each pipe into
intel_crtc_state, along with moving the structures for storing the
current ddb allocations active on hardware into intel_crtc.
Changes since v1:
- Don't replace alloc->start = alloc->end = 0;
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Saves 944 bytes of .rodata strings and 128 bytes of .text.
v2: Add parantheses around dev_priv. (Ville Syrjala)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Saves 864 bytes of .rodata strings and ~100 of .text.
v2: Add parantheses around dev_priv. (Ville Syrjala)
v3: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Saves 1392 bytes of .rodata strings.
Also change a few function/macro prototypes in i915_gem_gtt.c
from dev to dev_priv where it made more sense to do so.
v2: Add parantheses around dev_priv. (Ville Syrjala)
v3: Mention function prototype changes. (David Weinehall)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Saves 1016 bytes of .rodata strings and couple hundred of .text.
v2: Add parantheses around dev_priv. (Ville Syrjala)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
This saves 3248 bytes of .rodata strings.
v2: Add parantheses around dev_priv. (Ville Syrjala)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
Mika wanted to know what requests were pending at the time of a hang as
we now track which requests we have submitted to the hardware.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161013101815.26978-1-chris@chris-wilson.co.uk
Our error states are quickly growing, pinning kernel memory with them.
The majority of the space is taken up by the error objects. These
compress well using zlib and without decode are mostly meaningless, so
encoding them does not hinder quickly parsing the error state for
familiarity.
v2: Make the zlib dependency optional
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-6-chris@chris-wilson.co.uk
The error state is purposefully racy as we expect it to be called at any
time and so have avoided any locking whilst capturing the crash dump.
However, with multi-engine GPUs and multiple CPUs, those races can
manifest into OOPSes as we attempt to chase dangling pointers freed on
other CPUs. Under discussion are lots of ways to slow down normal
operation in order to protect the post-mortem error capture, but what it
we take the opposite approach and freeze the machine whilst the error
capture runs (note the GPU may still running, but as long as we don't
process any of the results the driver's bookkeeping will be static).
Note that by of itself, this is not a complete fix. It also depends on
the compiler barriers in list_add/list_del to prevent traversing the
lists into the void. We also depend that we only require state from
carefully controlled sources - i.e. all the state we require for
post-mortem debugging should be reachable from the request itself so
that we only have to worry about retrieving the request carefully. Once
we have the request, we know that all pointers from it are intact.
v2: Avoid drm_clflush_pages() inside stop_machine() as it may use
stop_machine() itself for its wbinvd fallback.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-3-chris@chris-wilson.co.uk
We currently capture the GPU state after we detect a hang. This is vital
for us to both triage and debug hangs in the wild (post-mortem
debugging). However, it comes at the cost of running some potentially
dangerous code (since it has to make very few assumption about the state
of the driver) that is quite resource intensive.
This patch introduces both a method to disable error capture at runtime
(for users who hit bugs at runtime and need a workaround) and to disable
error capture at compiletime (for realtime users who want to minimise
any possible latency, and never require error capture, saving ~30k of
code). The cost is that we now have to be wary of (and test!) a kconfig
flag and a module parameter. The effect of the module parameter is easy
to verify through code inspection and runtime testing, but a kconfig flag
needs regular compile checking.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch
Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-2-chris@chris-wilson.co.uk
In the next patch, I want to conditionally compile i915_gpu_error.c and
that requires moving the functions used by debug out of
i915_gpu_error.c!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-1-chris@chris-wilson.co.uk
Get rid of SEP_SEMICOLON and SEP_BLANK in DEV_INFO_FOR_EACH_FLAG.
Consolidate the debug output so that instead of one huge line with
"cap1,cap2,capN" each capability is split to own line and displayed
as "capN: [yes|no]" to make the dumps more historically informative.
v2:
- Do not break auto-indent by keeping semicolon after macro (Jani)
- Consolidate and use yesno() in all locations (Chris)
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Include the position of the active request in the ring, and display that
alongside the current RING registers (on a GPU hang).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-6-chris@chris-wilson.co.uk
If we store this in the uncore structure we are on a good way to
show more commonality between the per-platform implementations.
v2: Constify table pointer and correct coding style. (Chris Wilson)
v3: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
There are current places in the code, and there will be more in the
future, which iterate the forcewake domains to find out which ones
are currently active.
To save them from doing this iteration, we can cheaply keep a mask
of active domains in dev_priv->uncore.fw_domains_active.
This has no cost in terms of object size, even manages to shrink it
overall by 368 bytes on my config.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Paneri, Praveen" <praveen.paneri@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The plan is to introduce intel_has_sagv() and then use it to discover
which platforms actually support it.
I thought about keeping the functions with their current skl names,
but found two problems: (i) skl_has_sagv() would become a very
confusing name, and (ii) intel_atomic_commit_tail() doesn't seem to be
calling any functions whose name start with a platform name, so the
"intel_" naming scheme seems make more sense than the "firstplatorm_"
naming scheme here.
Cc: stable@vger.kernel.org
Reviewed-by: Lyude <cpaul@redhat.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1474578035-424-2-git-send-email-paulo.r.zanoni@intel.com
Ever since I started working on FBC I was already aware that FBC can
really amplify the FIFO underrun symptoms. On systems where FIFO
underruns were harmless error messages, enabling FBC would cause the
underruns to give black screens.
We recently tried to enable FBC on Haswell and got reports of a system
that would hang after some hours of uptime, and the first bad commit
was the one that enabled FBC. We also observed that this system had
FIFO underrun error messages on its dmesg. Although we don't have any
evidence that fixing the underruns would solve the bug and make FBC
work properly on this machine, IMHO it's better if we minimize the
amount of possible problems by just giving up FBC whenever we detect
an underrun.
v2: New version, different implementation and commit message.
v3: Clarify the fact that we run from an IRQ handler (Chris).
v4: Also add the underrun_detected check at can_choose() to avoid
misleading dmesg messages (DK).
v5: Fix Engrish, use READ_ONCE on the unlocked read (Chris).
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Lyude <cpaul@redhat.com>
Cc: stevenhoneyman@gmail.com <stevenhoneyman@gmail.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1473773937-19758-1-git-send-email-paulo.r.zanoni@intel.com
DP MST provides the capability to send multiple video and audio streams
through a single port. This requires the API's between i915 and audio
drivers to distinguish between multiple audio capable displays that can be
connected to a port. Currently only the port identity is shared in the
APIs. This patch adds support for MST with an additional parameter
'int pipe'. The existing parameter 'port' does not change it's meaning.
pipe =
MST : display pipe that the stream originates from
Non-MST : -1
Affected APIs:
struct i915_audio_component_ops
- int (*sync_audio_rate)(struct device *, int port, int rate);
+ int (*sync_audio_rate)(struct device *, int port, int pipe,
+ int rate);
- int (*get_eld)(struct device *, int port, bool *enabled,
- unsigned char *buf, int max_bytes);
+ int (*get_eld)(struct device *, int port, int pipe,
+ bool *enabled, unsigned char *buf, int max_bytes);
struct i915_audio_component_audio_ops
- void (*pin_eld_notify)(void *audio_ptr, int port);
+ void (*pin_eld_notify)(void *audio_ptr, int port, int pipe);
This patch makes dummy changes in the audio drivers (thanks Libin) for
build to succeed. The audio side drivers will send the right 'pipe' values
for MST in patches that will follow.
v2:
Renamed the new API parameter from 'dev_id' to 'pipe'. (Jim, Ville)
Included Asoc driver API compatibility changes from Jeeja.
Added WARN_ON() for invalid pipe in get_saved_encoder(). (Takashi)
Added comment for av_enc_map[] definition. (Takashi)
v3:
Fixed logic error introduced while renaming 'dev_id' as 'pipe' (Ville)
Renamed get_saved_encoder() to get_saved_enc() to reduce line length
v4:
Rebased.
Parameter check for pipe < -1 values in get_saved_enc() (Ville)
Switched to for_each_pipe() in get_saved_enc() (Ville)
Renamed 'pipe' to 'dev_id' in audio side code (Takashi)
v5:
Included a comment for the dev_id arg. (Libin)
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1474488168-2343-1-git-send-email-dhinakaran.pandiyan@intel.com
Fix sparse warnings:
drivers/gpu/drm/i915/i915_drv.c:1179:5: warning: symbol
'i915_driver_load' was not declared. Should it be static?
drivers/gpu/drm/i915/i915_drv.c:1267:6: warning: symbol
'i915_driver_unload' was not declared. Should it be static?
drivers/gpu/drm/i915/i915_drv.c:2444:25: warning: symbol 'i915_pm_ops'
was not declared. Should it be static?
Fixes: 42f5551d27 ("drm/i915: Split out the PCI driver interface to i915_pci.c")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1473946137-1931-3-git-send-email-jani.nikula@intel.com
Storing the port enum in intel_encoder makes it convenient to know the
port attached to an encoder. Moving the port information up from
intel_digital_port to intel_encoder avoids unecessary intel_digital_port
access and handles MST encoders cleanly without requiring conditional
checks for them (thanks danvet).
v2:
Renamed the port enum member from 'attached_port' to 'port' (danvet)
Fixed missing initialization of port in intel_sdvo.c (danvet)
v3:
Fixed missing initialization of port in intel_crt.c (Ville)
v4:
Storing port for DVO encoders too.
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Lyude <cpaul@redhat.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1474334681-22690-3-git-send-email-dhinakaran.pandiyan@intel.com
Consolidate the instdone logic so we can get a bit fancier. This patch also
removes the duplicated print of INSTDONE[0].
v2: (Imre)
- Rebased on top of hangcheck INSTDONE changes.
- Move all INSTDONE registers into a single struct, store it within the
engine error struct during error capturing.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1474379673-28326-1-git-send-email-imre.deak@intel.com
Adding the ddb size into the devide info will avoid
platform checks while computing wm.
v2: Added comment and WARN_ON if ddb size is zero.(Jani)
v3: Added WARN_ON at the right place.(Jani)
Suggested-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Deepak M <m.deepak@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1473931870-7724-1-git-send-email-m.deepak@intel.com
No functional changes; just renaming a bit, tweaking a datatype,
prettifying layout, and adding comments, in particular in the
GuC setup code that touches this data.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1473711577-11454-2-git-send-email-david.s.gordon@intel.com
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We are about to specialize object synchronisation to enable nonblocking
execbuf submission. First we make a copy of the current object
synchronisation for execbuffer. The general i915_gem_object_sync() will
be removed following the removal of CS flips in the near future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk
Update reset path in preparation for engine reset which requires
identification of incomplete requests and associated context and fixing
their state so that engine can resume correctly after reset.
The request that caused the hang will be skipped and head is reset to the
start of breadcrumb. This allows us to resume from where we left-off.
Since this request didn't complete normally we also need to cleanup elsp
queue manually. This is vital if we employ nonblocking request
submission where we may have a web of dependencies upon the hung request
and so advancing the seqno manually is no longer trivial.
ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS
We change the way we count pending batches. Only the active context
involved in the reset is marked as either innocent or guilty, and not
mark the entire world as pending. By inspection this only affects
igt/gem_reset_stats (which assumes implementation details) and not
piglit.
ARB_robustness gives this guide on how we expect the user of this
interface to behave:
* Provide a mechanism for an OpenGL application to learn about
graphics resets that affect the context. When a graphics reset
occurs, the OpenGL context becomes unusable and the application
must create a new context to continue operation. Detecting a
graphics reset happens through an inexpensive query.
And with regards to the actual meaning of the reset values:
Certain events can result in a reset of the GL context. Such a reset
causes all context state to be lost. Recovery from such events
requires recreation of all objects in the affected context. The
current status of the graphics reset state is returned by
enum GetGraphicsResetStatusARB();
The symbolic constant returned indicates if the GL context has been
in a reset state at any point since the last call to
GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context
has not been in a reset state since the last call.
GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected
that is attributable to the current GL context.
INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that
is not attributable to the current GL context.
UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose
cause is unknown.
The language here is explicit in that we must mark up the guilty batch,
but is loose enough for us to relax the innocent (i.e. pending)
accounting as only the active batches are involved with the reset.
In the future, we are looking towards single engine resetting (with
minimal locking), where it seems inappropriate to mark the entire world
as innocent since the reset occurred on a different engine. Reducing the
information available means we only have to encounter the pain once, and
also reduces the information leaking from one context to another.
v2: Legacy ringbuffer submission required a reset following hibernation,
or else we restore stale values to the RING_HEAD and walked over
stolen garbage.
v3: GuC requires replaying the requests after a reset.
v4: Restore engine IRQ after reset (so waiters will be woken!)
Rearm hangcheck if resetting with a waiter.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk
Since we have a cooperative mode now with a direct reset, we can avoid
the contention on struct_mutex and instead try then sleep on the
I915_RESET_IN_PROGRESS bit. If the mutex is held and that bit is
cleared, all is fine. Otherwise, we sleep for a bit and try again. In
the worst case we sleep for an extra second waiting for the mutex to be
released (no one touching the GPU is allowed the struct_mutex whilst the
I915_RESET_IN_PROGRESS bit is set). But when we have a direct reset,
this allows us to clean up the reset worker faster.
v2: Remember to call wake_up_bit() after changing (for the faster wakeup
as promised)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-12-chris@chris-wilson.co.uk
If a waiter is holding the struct_mutex, then the reset worker cannot
reset the GPU until the waiter returns. We do not want to return -EAGAIN
form i915_wait_request as that breaks delicate operations like
i915_vma_unbind() which often cannot be restarted easily, and returning
-EIO is just as useless (and has in the past proven dangerous). The
remaining WARN_ON(i915_wait_request) serve as a valuable reminder that
handling errors from an indefinite wait are tricky.
We can keep the current semantic that knowing after a reset is complete,
so is the request, by performing the reset ourselves if we hold the
mutex.
uevent emission is still handled by the reset worker, so it may appear
slightly out of order with respect to the actual reset (and concurrent
use of the device).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-11-chris@chris-wilson.co.uk
In preparation for introducing a per-engine reset, we can first separate
the mixing of the reset state from the global reset counter.
The loss of atomicity in updating the reset state poses a small problem
for handling the waiters. For requests, this is solved by advancing the
seqno so that a waiter waking up after the reset knows the request is
complete. For pending flips, we still rely on the increment of the
global reset epoch (as well as the reset-in-progress flag) to signify
when the hardware was reset.
The advantage, now that we do not inspect the reset state during reset
itself i.e. we no longer emit requests during reset, is that we can use
the atomic updates of the state flags to ensure that only one reset
worker is active.
v2: Mika spotted that I transformed the i915_gem_wait_for_error() wakeup
into a waiter wakeup.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470414607-32453-6-git-send-email-arun.siluvery@linux.intel.com
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-7-chris@chris-wilson.co.uk
Moving all GPU features to the platform definition allows for
- standard place when adding new features from new platform
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Make the .hws_needs_physical the exception by switching the flag
on earlier platforms since they are fewer to support. Remove the flag on
later GPUs hardware since they all use GTT hws by default.
Switch the logic as well in the driver to reflect this change
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
No need for HAS_CORE_RING_FREQ as that flag is actually the same as
.has_llc. Feedback from V. Syrjala.
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[patch series] Moving all GPU features to the platform struct definition
allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct definition
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We don't have safe 64-bit mmio writes as they are really split into
2x32-bit writes. This tearing is dangerous as the hardware *will*
operate on the intermediate value, requiring great care when assigning.
(See, for example, i965_write_fence_reg.) As such we don't currently use
them and strongly advise not to us them. Go one step further and remove
the 64-bit write vfuncs.
v2: Add some more details to the comment about why WRITE64 is absent,
and why you need to think twice before using READ64.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160906144538.4204-1-chris@chris-wilson.co.uk
In an upcoming patch we'll need the actual mask of subslices in addition
to their count, so convert the subslice_per_slice field to a mask.
Also we can easily calculate subslice_total from the other fields, so
instead of storing a cached version of this, add a helper to calculate
it.
v2:
- Use hweight8() on u8 typed vars instead of hweight32(). (Ben)
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
In an upcoming patch we'll need the actual mask of slices in addition to
their count, so replace the count field with a mask.
v2:
- Use hweight8() on u8 typed vars instead of hweight32(). (Ben)
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1472659987-10417-5-git-send-email-imre.deak@intel.com
Move all slice/subslice/eu related properties to the sseu_dev_info
struct.
No functional change.
v2:
- s/info/sseu/ based on the new struct name. (Ben)
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
The data in this struct is provided both by getting the
slice/subslice/eu features available on a given platform and the actual
runtime state of these same features which depends on the HW's current
power saving state.
Atm members of this struct are duplicated in sseu_dev_status and
intel_device_info. For clarity and code reuse we can share one struct
for both of the above purposes. This patch only moves the struct to the
header file, the next patch will convert users of intel_device_info to
use this struct too.
Instead of unsigned int u8 is used now, which is big enough and is used
anyway in intel_device_info.
No functional change.
v2:
- s/stat/sseu/ based on the new struct name (Ben)
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1472659987-10417-2-git-send-email-imre.deak@intel.com
Use atomic type and operands for dev_priv->mm.bsd_engine_dispatch_index
to avoid one struct_mutex locking scenario.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1472731101-21982-1-git-send-email-joonas.lahtinen@linux.intel.com
The mentioned commit changes intel_display_crc_init to take a dev_priv,
but forgets to change the stub.
Cc: David Weinehall <david.weinehall@linux.intel.com>
Fixes: 36cdd0138b ("drm/i915: debugfs spring cleaning")
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reported-and-by: Kim Lidström <kim@dxtr.im>
Link: http://patchwork.freedesktop.org/patch/msgid/1472116022-17598-1-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Rather than walk the full array of engines checking whether each is in
the mask in turn, we can use the mask to jump to the right engines. This
should quicker for a sparse array of engines or mask, whilst generating
smaller code:
text data bss dec hex filename
1251010 4579 800 1256389 132bc5 drivers/gpu/drm/i915/i915.ko
1250530 4579 800 1255909 1329e5 drivers/gpu/drm/i915/i915.ko
The downside is that we have to pass in a temporary, alas no C99
iterators yet.
[P.S. Joonas doesn't like having to pass extra temporaries into the
macro, and even less that I called them tmp. As yet, we haven't found a
macro that avoids passing in a temporary that is smaller. We probably
will get C99 iterators first!]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160827075401.16470-2-chris@chris-wilson.co.uk
Now that we have working partial VMA and faulting support for all
objects, including fence support, advertise to userspace that it can
take advantage of unlimited GGTT mmaps.
v2: Make room in the kerneldoc for a more detailed explanation of the
limitations of the GTT mmap interface.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20160825180519.11341-1-chris@chris-wilson.co.uk
Since we have to write ddb allocations at the same time as we do other
plane updates, we're going to need to be able to control the order in
which we execute modesets on each pipe. The easiest way to do this is to
just factor this section of intel_atomic_commit_tail()
(intel_atomic_commit() for stable branches) into it's own function, and
add an appropriate display function hook for it.
Based off of Matt Rope's suggestions
Changes since v1:
- Drop pipe_config->base.active check in intel_update_crtcs() since we
check that before calling the function
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
[omitting CC for stable, since this patch will need to be changed for
such backports first]
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471961565-28540-1-git-send-email-cpaul@redhat.com
This is required for supporting nonblocking modesets. Iterating over
the connector list will no longer be allowed when we don't hold
connection_mutex, so we have to use the atomic state.
Fix disable_noatomic by populating the minimal state required to
disable a connector.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470755054-32699-3-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Just like with sysfs, we do some major overhaul.
Pass dev_priv instead of dev to all feature macros (IS_, HAS_,
INTEL_, etc.). This has the side effect that a bunch of functions
now get dev_priv passed instead of dev.
All calls to INTEL_INFO()->gen have been replaced with
INTEL_GEN().
We want access to to_i915(node->minor->dev) in a lot of places,
so add the node_to_i915() helper to accommodate for this.
Finally, we have quite a few cases where we get a void * pointer,
and need to cast it to drm_device *, only to run to_i915() on it.
Add cast_to_i915() to do this.
v2: Don't introduce extra dev (Chris)
v3: Make pipe_crc_info have a pointer to drm_i915_private instead of
drm_device. This saves a bit of space, since we never use
drm_device anywhere in these functions.
Also some minor fixup that I missed in the previous version.
v4: Changed the code a bit so that dev_priv is passed directly
to various functions, thus removing the need for the
cast_to_i915() helper. Also did some additional cleanup.
v5: Additional cleanup of newly introduced changes.
v6: Rebase again because of conflict.
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822105931.pcbe2lpsgzckzboa@boom
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Various cleanup for i915_sysfs.c; we now use dev_priv whenever
possible. The kdev_to_drm_minor() helper function has been
replaced by one that converts from struct device *
to struct drm_i915_private *.
We already have a seemingly identical helper (kdev_to_i915())
in i915_drv.h. But that one cannot be used here.
Unlike the version in i915_drv.h, this helper
reaches i915 through drm_minor.
v2: Rename kdev_to_i915_dm() to kdev_minor_to_i915() (Chris)
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-4-david.weinehall@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We currently have a mix of struct device *device, struct device *kdev,
and struct device *dev (the latter forcing us to refer to
struct drm_device as something else than the normal dev).
To simplify things, always use kdev when referring to struct device.
v2: Replace the dev_to_drm_minor() macro with the inline function
kdev_to_drm_minor().
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-3-david.weinehall@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since the watermark calculations for Skylake are still broken, we're apt
to hitting underruns very easily under multi-monitor configurations.
While it would be lovely if this was fixed, it's not. Another problem
that's been coming from this however, is the mysterious issue of
underruns causing full system hangs. An easy way to reproduce this with
a skylake system:
- Get a laptop with a skylake GPU, and hook up two external monitors to
it
- Move the cursor from the built-in LCD to one of the external displays
as quickly as you can
- You'll get a few pipe underruns, and eventually the entire system will
just freeze.
After doing a lot of investigation and reading through the bspec, I
found the existence of the SAGV, which is responsible for adjusting the
system agent voltage and clock frequencies depending on how much power
we need. According to the bspec:
"The display engine access to system memory is blocked during the
adjustment time. SAGV defaults to enabled. Software must use the
GT-driver pcode mailbox to disable SAGV when the display engine is not
able to tolerate the blocking time."
The rest of the bspec goes on to explain that software can simply leave
the SAGV enabled, and disable it when we use interlaced pipes/have more
then one pipe active.
Sure enough, with this patchset the system hangs resulting from pipe
underruns on Skylake have completely vanished on my T460s. Additionally,
the bspec mentions turning off the SAGV with more then one pipe enabled
as a workaround for display underruns. While this patch doesn't entirely
fix that, it looks like it does improve the situation a little bit so
it's likely this is going to be required to make watermarks on Skylake
fully functional.
This will still need additional work in the future: we shouldn't be
enabling the SAGV if any of the currently enabled planes can't enable WM
levels that introduce latencies >= 30 µs.
Changes since v11:
- Add skl_can_enable_sagv()
- Make sure we don't enable SAGV when not all planes can enable
watermarks >= the SAGV engine block time. I was originally going to
save this for later, but I recently managed to run into a machine
that was having problems with a single pipe configuration + SAGV.
- Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit
- Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE
- Move printks outside of mutexes
- Don't print error messages twice
Changes since v10:
- Apparently sandybridge_pcode_read actually writes values and reads
them back, despite it's misleading function name. This means we've
been doing this mostly wrong and have been writing garbage to the
SAGV control. Because of this, we no longer attempt to read the SAGV
status during initialization (since there are no helpers for this).
- mlankhorst noticed that this patch was breaking on some very early
pre-release Skylake machines, which apparently don't allow you to
disable the SAGV. To prevent machines from failing tests due to SAGV
errors, if the first time we try to control the SAGV results in the
mailbox indicating an invalid command, we just disable future attempts
to control the SAGV state by setting dev_priv->skl_sagv_status to
I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg.
- Move mutex_unlock() a little higher in skl_enable_sagv(). This
doesn't actually fix anything, but lets us release the lock a little
sooner since we're finished with it.
Changes since v9:
- Only enable/disable sagv on Skylake
Changes since v8:
- Add intel_state->modeset guard to the conditional for
skl_enable_sagv()
Changes since v7:
- Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's
all we use it for anyway)
- Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification
- Fix a styling error that snuck past me
Changes since v6:
- Protect skl_enable_sagv() with intel_state->modeset conditional in
intel_atomic_commit_tail()
Changes since v5:
- Don't use is_power_of_2. Makes things confusing
- Don't use the old state to figure out whether or not to
enable/disable the sagv, use the new one
- Split the loop in skl_disable_sagv into it's own function
- Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail()
Changes since v4:
- Use is_power_of_2 against active_crtcs to check whether we have > 1
pipe enabled
- Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0
enabled
- Call skl_sagv_enable/disable() from pre/post-plane updates
Changes since v3:
- Use time_before() to compare timeout to jiffies
Changes since v2:
- Really apply minor style nitpicks to patch this time
Changes since v1:
- Added comments about this probably being one of the requirements to
fixing Skylake's watermark issues
- Minor style nitpicks from Matt Roper
- Disable these functions on Broxton, since it doesn't have an SAGV
Signed-off-by: Lyude <cpaul@redhat.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-3-git-send-email-cpaul@redhat.com
[mlankhorst: ENOSYS -> ENXIO, whitespace fixes]
Very old numbers indicate this is a 66% improvement when remapping the
entire object for fence contention - due to the elimination of
track_pfn_insert and its strcmp.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_fence_upload/performance
Testcase: igt/gem_mmap_gtt
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-6-chris@chris-wilson.co.uk
In the recent patch
bc3d674 drm/i915: Allow userspace to request no-error-capture upon ...
the final version moved the flags and the associated #defines around
so they were adjacent; unfortunately, they ended up between a comment
and the thing (hw_id) to which the comment applies :(
So this patch reshuffles the comment and subject back together.
Also, as we're touching 'hw_id', let's change it from just 'unsigned'
to a fully-specified 'unsigned int', because some code checking tools
(including checkpatch) object to plain 'unsigned'.
Fixes: bc3d674462 ("drm/i915: Allow userspace to request no-error-capture...")
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471616622-6919-1-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the developer adds a register in the wrong order, we BUG during boot.
That makes development and testing very difficult. Let's be a bit more
friendly and disable the command parser with a big warning if the tables
are invalid.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-31-chris@chris-wilson.co.uk
If FBC is set on a framebuffer that is unmapped, all GTT faults will be
from a partial mapping. Writes by the user through the partial VMA are
then untracked by the FBC and so we must use the ORIGIN_CPU when flushing
the I915_GEM_DOMAIN_GTT.
v2: Keep ORIGIN_CPU for set-to-domain(.write=CPU)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-25-chris@chris-wilson.co.uk
In order to handle tiled partial GTT mmappings, we need to associate the
fence with an individual vma.
v2: A couple of silly drops replaced spotted by Joonas
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk
Our current practice is to only name the actual list (here
dev_priv->fence_list) using "list", and elements upon that list are
referred to as "link". Further, the lru nature is of the list and not of
the node and including in the name does not disambiguate the link from
anything else.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-20-chris@chris-wilson.co.uk
By moving map-and-fenceable tracking from the object to the VMA, we gain
fine-grained tracking and the ability to track individual fences on the VMA
(subsequent patch).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-16-chris@chris-wilson.co.uk
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clfushes are required in
order for the writes to be coherent.
Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.
v2: Add i915_gem_obj_finish_shmem_access() for symmetry
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-8-chris@chris-wilson.co.uk
Since vfree() now likes to WARN when passed a non-page-aligned pointer,
we need to discard the low bits to comply with it.
Fixes: d31d7cb146 ("drm/i915: Support for creating write combined type vmaps")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-3-chris@chris-wilson.co.uk
If userspace is asynchronously streaming into the batch or other
execobjects, we may not flush those writes along with a change in cache
domain (as there is no change). Therefore those writes may end up in
internal chipset buffers and not visible to the GPU upon execution. We
must issue a flush command or otherwise we encounter incoherency in the
batchbuffers and the GPU executing invalid commands (i.e. hanging) quite
regularly.
v2: Throw a paranoid wmb() into the general flush so that we remain
consistent with before.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90841
Fixes: 1816f92363 ("drm/i915: Support creation of unbound wc user...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Tested-by: Matti Hämäläinen <ccr@tnsp.org>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-1-chris@chris-wilson.co.uk
It is useful when looking at captured error states to check the recorded
BBADDR register (the address of the last batchbuffer instruction loaded)
against the expected offset of the batch buffer, and so do a quick check
that (a) the capture is true or (b) HEAD hasn't wandered off into the
badlands.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-30-git-send-email-chris@chris-wilson.co.uk
Since contexts are not currently shared between userspace processes, we
have an exact correspondence between context creator and guilty batch
submitter. Therefore we can save some per-batch work by inspecting the
context->pid upon error instead. Note that we take the context's
creator's pid rather than the file's pid in order to better track fd
passed over sockets.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-29-git-send-email-chris@chris-wilson.co.uk
This little helper only exists to safely discard the upper unused 32bits
of the general 64-bit VMA address - as we know that all Global GTT
currently are less than 4GiB in size and so that the upper bits must be
zero. In many places, we use a u32 for the global GTT offset and we want
to document where we are discarding the full VMA offset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-28-git-send-email-chris@chris-wilson.co.uk
Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searching for the relevant pin later.
v2: Joonas' stylistic nitpicks, a fun rebase.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk
When working with contexts, we most frequently want the GGTT VMA for the
context state, first and foremost. Since the object is available via the
VMA, we need only then store the VMA.
v2: Formatting tweaks to debugfs output, restored some comments removed
in the next patch
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-15-git-send-email-chris@chris-wilson.co.uk
The VMA are unreferenced, they belong to the object and live until they
are closed. However, if we want to use the VMA as a cookie and use it to
keep the object alive, we want to hold onto a reference to the object
for the lifetime of the VMA cookie. To facilitate this, add a couple of
simple wrappers for managing the reference count on the object owning the
VMA.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-11-git-send-email-chris@chris-wilson.co.uk
A simple little macro to clear a pointer and return the old value. This
is useful for writing
value = *ptr;
if (!value)
return;
*ptr = 0;
...
free(value);
in a slightly more concise form:
value = fetch_and_zero(ptr);
if (!value)
return;
...
free(value);
with the idea that this establishes a pattern that may be extended for
atomic use (using xchg or cmpxchg) i.e. atomic_fetch_and_zero() and
similar to llist.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-10-git-send-email-chris@chris-wilson.co.uk
No longer is knowing how much of the GTT (both mappable aperture and
beyond) relevant, and the output clutters the real information - that is
how many objects are allocated and bound (and by who) so that we can
quickly grasp if there is a leak.
v2: Relent, and rename pinned to indicate display only. Since the
display objects are semi-static and are of variable size, they are the
interesting objects to watch over time for aperture leaking. The other
pins are either static (such as the scratch page) or very short lived
(such as execbuf) and not part of the precious GGTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-6-git-send-email-chris@chris-wilson.co.uk
When capturing the error state, we do not need to know about every
address space - just those that are related to the error. We know which
context is active at the time, therefore we know which VM are implicated
in the error. We can then restrict the VM which we report to the
relevant subset.
v2: s/i/count_active/ (and similar)
Rewrite label generation for "Buffers"
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-2-git-send-email-chris@chris-wilson.co.uk
This patch provides the infrastructure for performing a 16-byte aligned
read from WC memory using non-temporal instructions introduced with sse4.1.
Using movntdqa we can bypass the CPU caches and read directly from memory
and ignoring the page attributes set on the CPU PTE i.e. negating the
impact of an otherwise UC access. Copying using movntdqa from WC is almost
as fast as reading from WB memory, modulo the possibility of both hitting
the CPU cache or leaving the data in the CPU cache for the next consumer.
(The CPU cache itself my be flushed for the region of the movntdqa and on
later access the movntdqa reads from a separate internal buffer for the
cacheline.) The write back to the memory is however cached.
This will be used in later patches to accelerate accessing WC memory.
v2: Report whether the accelerated copy is successful/possible.
v3: Function alignment override was only necessary when using the
function target("sse4.1") - which is not necessary for emitting movntdqa
from __asm__.
v4: Improve notes on CPU cache behaviour vs non-temporal stores.
v5: Fix byte offsets for unrolled moves.
v6: Find all remaining typos of "movntqda", use kernel_fpu_begin.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-2-git-send-email-chris@chris-wilson.co.uk
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
Until now code was calling hweight32 to figure out the
number from device_info->ring_mask at runtime. Instead
we can cache it at engine init time and use directly.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1470842530-35854-1-git-send-email-tvrtko.ursulin@linux.intel.com
In the preceding patches we made sure that:
- the LVDS encoder takes care of reiniting both the LVDS register
and its PPS
- the eDP encoder takes care of reiniting its PPS
- the PPS register unlocking workaround is applied explicitly whenever
the PPS context is lost
Based on the above we can safely remove the opaque LVDS and PPS save /
restore from generic code.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470827254-21954-6-git-send-email-imre.deak@intel.com
The PPS registers are pretty much the same everywhere, the differences
being:
- Register fields appearing, disappearing from one platform to the
next: panel-reset-on-powerdown, backlight-on, panel-port,
register-unlock
- Different register base addresses
- Different number of PPS instances: 2 on VLV/CHV/BXT, 1 everywhere
else.
We can merge the separate set of PPS definitions by extending the PPS
instance argument to all platforms and using instance 0 on platforms
with a single instance. This means we'll need to calculate the register
addresses dynamically based on the given platform and PPS instance.
v2:
- Simplify if ladder in intel_pps_get_registers(). (Ville)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470827254-21954-1-git-send-email-imre.deak@intel.com
The bottom-half we use for processing the breadcrumb interrupt is a
task, which is an RCU protected struct. When accessing this struct, we
need to be holding the RCU read lock to prevent it disappearing beneath
us. We can use the RCU annotation to mark our irq_seqno_bh pointer as
being under RCU guard and then use the RCU accessors to both provide
correct ordering of access through the pointer.
Most notably, this fixes the access from hard irq context to use the RCU
read lock, which both Daniel and Tvrtko complained about.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-3-git-send-email-chris@chris-wilson.co.uk
This function would call drm_modeset_lock_all, while the suspend/resume
functions already have their own locking. Fix this by factoring out
__intel_display_resume, and calling the atomic helpers for duplicating
atomic state and disabling all crtc's during suspend.
Changes since v1:
- Deal with -EDEADLK right after lock_all and clean up calls
to hw readout.
- Always take all modeset locks so updates during gpu reset are blocked.
Changes since v2:
- Fix deadlock in intel_update_primary_planes.
- Move WARN_ON(EDEADLK) to __intel_display_resume.
- pctx -> ctx
- only call __intel_display_resume on success in intel_display_resume.
Changes since v3:
- Rebase on top of dev_priv -> dev change.
- Use drm_modeset_lock_all_ctx instead of drm_modeset_lock_all.
Changes since v4 [by vsyrjala]:
- Deal with skip_intermediate_wm
- Update comment w.r.t. mode_config.mutex vs. ->detect()
- Rebase due to INTEL_GEN() etc.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: e2c8b8701e ("drm/i915: Use atomic helpers for suspend, v2.")
Cc: stable@vger.kernel.org
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470428910-12125-2-git-send-email-ville.syrjala@linux.intel.com
In the previous commit, we moved the obj->tiling_mode out of a bitfield
and into its own integer so that we could safely use READ_ONCE(). Let us
now repair some of that damage by sharing the tiling_mode with its
companion, the fence stride.
v2: New magic
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk
Since we are not concerned with userspace racing itself with set-tiling
(the order is indeterminant even if we take a lock), then we can safely
read back the single obj->tiling_mode and do the static lookup of
swizzle mode without having to take a lock.
get-tiling is reasonably frequent due to the back-channel passing around
of tiling parameters in DRI2/DRI3.
v2: Make tiling_mode a full unsigned int so that we can trivially use it
with READ_ONCE(). Separating it out into manual control over the flags
field was too noisy for a simple patch. Note that we could use the lower
bits of obj->stride for the tiling mode.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-16-git-send-email-chris@chris-wilson.co.uk
The principal motivation for this was to try and eliminate the
struct_mutex from i915_gem_suspend - but we still need to hold the mutex
current for the i915_gem_context_lost(). (The issue there is that there
may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
struct_mutex via the stop_machine().) For the moment, enabling last
request tracking for the engine, allows us to do busyness checking and
waiting without requiring the struct_mutex - which is useful in its own
right.
As a side-effect of having a robust means for tracking engine busyness,
we can replace our other busyness heuristic, that of comparing against
the last submitted seqno. For paranoid reasons, we have a semi-ordered
check of that seqno inside the hangchecker, which we can now improve to
an ordered check of the engine's busyness (removing a locked xchg in the
process).
v2: Pass along "bool interruptible" as being unlocked we cannot rely on
i915->mm.interruptible being stable or even under our control.
v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk
Before suspending (or unloading), we would first wait upon all rendering
to be completed and then disable the rings. This later step is a remanent
from DRI1 days when we did not use request tracking for all operations
upon the ring. Now that we are sure we are waiting upon the very last
operation by the engine, we can forgo clobbering the ring registers,
though we do keep the assert that the engine is indeed idle before
sleeping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-5-git-send-email-chris@chris-wilson.co.uk
We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented with a bitfield and that shows up high on the profiles of
request tracking (mainly due to excess memory traffic as it converts
the bitfield to a register and back and generates frequent AGI in the
process).
v2: BIT, break up a long line in compute the other engines, new paint
for i915_gem_object_is_active (now i915_gem_object_get_active).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk
The individual bits inside obj->frontbuffer_bits are protected by each
plane->mutex, but the whole bitfield may be accessed by multiple KMS
operations simultaneously and so the RMW need to be under atomics.
However, for updating the single field we do not need to mandate that it
be under the struct_mutex, one more step towards its removal as the de
facto BKL.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-21-git-send-email-chris@chris-wilson.co.uk
We only need a very lightweight mechanism here as the locking is only
used for co-ordinating a bitfield.
v2: Move the cheap unlikely tests into the caller
v3: Move the kerneldoc into the header (now separated out into
intel_fronbuffer.h for better kerneldoc and readability)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtien <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-20-git-send-email-chris@chris-wilson.co.uk
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for
i915_gem_object_ggtt_pin(), spare us the confusion and remove it.
Removing it now simplifies later patches to change the i915_vma_pin()
(and friends) interface.
v2: Add a redundant GEM_BUG_ON(!view) to
i915_gem_obj_lookup_or_create_ggtt_vma()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk
During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a double lookup of the vma in order to then
pin them, all because we lack the necessary interfaces to operate on
i915_vma - so introduce i915_vma_pin()!
v2: Tidy parameter lists to remove one level of redirection in the hot
path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-15-git-send-email-chris@chris-wilson.co.uk
For consistency, internal functions should take drm_i915_private rather
than drm_device. Now that we are subclassing drm_device, there are no
more size wins, but being consistent is its own blessing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-12-git-send-email-chris@chris-wilson.co.uk
In order to be consistent with other address space functions, we want to
pass around 64-bit sizes, even though all known global GTT are limited
to 4GiB. Similarly, we are trying to be consistent in using the _ggtt_
nomenclature when referring to the special global GTT.
v2: Update docs to consistently state "global GTT".
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-11-git-send-email-chris@chris-wilson.co.uk
Our GPUs impose certain requirements upon buffers that depend upon how
exactly they are used. Typically this is expressed as that they require
a larger surface than would be naively computed by pitch * height.
Normally such requirements are hidden away in the userspace driver, but
when we accept pointers from strangers and later impose extra conditions
on them, the original client allocator has no idea about the
monstrosities in the GPU and we require the userspace driver to inform
the kernel how many padding pages are required beyond the client
allocation.
v2: Long time, no see
v3: Try an anonymous union for uapi struct compatibility
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-7-git-send-email-chris@chris-wilson.co.uk
This is not the full fix, as we are required to percolate the u64 nature
down through the drm_mm stack, but this is required now to prevent
explosions due to mismatch between execbuf (eb_vma_misplaced) and vma
binding (i915_vma_misplaced) - and reduces the risk of spurious changes
as we adjust the vma interface in the next patches.
v2: long long casts not required for u64 printk (%llx)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-6-git-send-email-chris@chris-wilson.co.uk
This reimplements the denial-of-service protection against igt from
commit 227f782e46 ("drm/i915: Retire requests before creating a new
one") and transfers the stall from before each batch into get_pages().
The issue is that the stall is increasing latency between batches which
is detrimental in some cases (especially coupled with execlists) to
keeping the GPU well fed. Also we have made the observation that retiring
requests can of itself free objects (and requests) and therefore makes
a good first step when shrinking.
v2: Recycle objects prior to i915_gem_object_get_pages()
v3: Remove the reference to the ring from i915_gem_requests_ring() as it
operates on an intel_engine_cs.
v4: Since commit 9b5f4e5ed6 ("drm/i915: Retire oldest completed request
before allocating next") we no longer need the safeguard to retire
requests before get_pages(). We no longer see the huge latencies when
hitting the shrinker between allocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-4-git-send-email-chris@chris-wilson.co.uk
This reverts commit e9f24d5fb7.
The patch was only a stop-gap measure that fixed half the problem - the
leak of the fbcon when restarting X. A complete solution required
releasing the VMA when the object itself was closed rather than rely on
file/process exit. The previous patches add the VMA tracking necessary
to do close them along with the object, context or file, and so the time
has come to remove the partial fix.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-28-git-send-email-chris@chris-wilson.co.uk
When the user closes the context mark it and the dependent address space
as closed. As we use an asynchronous destruct method, this has two
purposes. First it allows us to flag the closed context and detect
internal errors if we to create any new objects for it (as it is removed
from the user's namespace, these should be internal bugs only). And
secondly, it allows us to immediately reap stale vma.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-27-git-send-email-chris@chris-wilson.co.uk
In order to prevent a leak of the vma on shared objects, we need to
hook into the object_close callback to destroy the vma on the object for
this file. However, if we destroyed that vma immediately we may cause
unexpected application stalls as we try to unbind a busy vma - hence we
defer the unbind to when we retire the vma.
v2: Keep vma allocated until closed. This is useful for a later
optimisation, but it is required now in order to handle potential
recursion of i915_vma_unbind() by retiring itself.
v3: Comments are important.
Testcase: igt/gem_ppggtt/flink-and-close-vma-leak
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-26-git-send-email-chris@chris-wilson.co.uk
This patch is broken out of the next just to remove the code motion from
that patch and make it more readable. What we do here is move the
i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three
stages (read, write, fenced) together so that future modifications to
active handling are all located in the same spot. The importance of this
is so that we can more simply control the order in which the requests
are place in the retirement list (i.e. control the order at which we
retire and so control the lifetimes to avoid having to hold onto
references).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-24-git-send-email-chris@chris-wilson.co.uk
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.
Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the object
themselves, and removing the ambiguity between retiring objects and
retiring requests.
Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.
All told, less code, simpler and faster, and more extensible.
v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
In the next patch, request tracking is made more generic and for that we
need a new expanded struct and to separate out the logic changes from
the mechanical churn, we split out the structure renaming into this
patch.
v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem_active.
v4: Now with i915_gem_active_set() for attaching to the active request.
v5: Use i915_gem_active_set() from inside the retirement handlers
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk
When we call i915_vma_unbind(), we will wait upon outstanding rendering.
This will also trigger a retirement phase, which may update the object
lists. If, we extend request tracking to the VMA itself (rather than
keep it at the encompassing object), then there is a potential that the
obj->vma_list be modified for other elements upon i915_vma_unbind(). As
a result, if we walk over the object list and call i915_vma_unbind(), we
need to be prepared for that list to change.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-8-git-send-email-chris@chris-wilson.co.uk
Since we may have VMA allocated for an object, but we interrupted their
binding, there is a disparity between have elements on the obj->vma_list
and being bound. i915_gem_obj_bound_any() does this check, but this is
not rigorously observed - add an explicit count to make it easier.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-7-git-send-email-chris@chris-wilson.co.uk
For the global GTT (and aliasing GTT), the address space is owned by the
device (it is a global resource) and so the per-file owner field is
NULL. For per-process GTT (where we create an address space per
context), each is owned by the opening file. We can use this ownership
information to both distinguish GGTT and ppGTT address spaces, as well
as occasionally inspect the owner.
v2: Whitespace, tells us who owns i915_address_space
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-6-git-send-email-chris@chris-wilson.co.uk
The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.
s/struct intel_ringbuffer/struct intel_ring/
s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/
s/describe_ctx_ringbuf()/describe_ctx_ring()/
s/intel_ring_get_active_head()/intel_engine_get_active_head()/
s/intel_ring_sync_index()/intel_engine_sync_index()/
s/intel_ring_init_seqno()/intel_engine_init_seqno()/
s/ring_stuck()/engine_stuck()/
s/intel_cleanup_engine()/intel_engine_cleanup()/
s/intel_stop_engine()/intel_engine_stop()/
s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/
s/intel_unpin_ringbuffer()/intel_unpin_ring()/
s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/
s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/
s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/
s/intel_ringbuffer_free()/intel_ring_free()/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk
Now that PCU communication is reasonably fast, we do not need to defer
RC6 initialisation to a workqueue.
References: https://bugs.freedesktop.org/show_bug.cgi?id=97017
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Add this workaround to prevent hang when in place compression
is used.
References: HSD#2135774
Cc: stable@vger.kernel.org
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Since commit a6f766f397 ("drm/i915: Limit ring synchronisation (sw
sempahores) RPS boosts") and commit bcafc4e38b ("drm/i915: Limit mmio
flip RPS boosts") we have limited the waitboosting for semaphores and
flips. Ideally we do not want to boost in either of these instances as no
userspace consumer is waiting upon the results (though a userspace producer
may be stalled trying to submit an execbuf - but in this case the
producer is being throttled due to the engine being saturated with
work). With the introduction of NO_WAITBOOST in the previous patch, we
can finally disable these needless boosts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-6-git-send-email-chris@chris-wilson.co.uk
Migrate the request operations out of the main body of i915_gem.c and
into their own C file for easier expansion.
v2: Move __i915_add_request() across as well
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-1-git-send-email-chris@chris-wilson.co.uk
Before suspend, and especially before building the hibernation image, we
need to context image to be coherent in memory. To do this we require
that we perform a context switch to a disposable context (i.e. the
dev_priv->kernel_context) - when that switch is complete, all other
context images will be complete. This leaves the kernel_context image as
incomplete, but fortunately that is disposable and we can do a quick
fixup of the logical state after resuming.
v2: Share the nearly identical code to switch to the kernel context with
eviction.
v3: Explain why we need the switch and reset.
Testcase: igt/gem_exec_suspend # bsw
References: https://bugs.freedesktop.org/show_bug.cgi?id=96526
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468590980-6186-2-git-send-email-chris@chris-wilson.co.uk
Currently execlists is exempt from emitting a request to switch each
ring away from the current context over to the dev_priv->kernel_context
(for whatever reason, just under execlists the GGTT is unlikely to be as
fragmented, however the switch may help in some extreme cases). Extract
the switcher and enable it for execlsts as well, as we need to do so in
a later patch to force the context switch before suspend. (And since for
that switch we explicitly require the disposable kernel context, rename
the extracted function.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468590980-6186-1-git-send-email-chris@chris-wilson.co.uk
Unfortunately, there's two situations where we lose hpd right now:
- Runtime suspend
- When we've shut off all of the power wells on Valleyview/Cherryview
While it would be nice if this didn't cause issues, this has the
ability to get us in some awkward states where a user won't be able to
get their display to turn on. For instance; if we boot a Valleyview
system without any monitors connected, it won't need any of it's power
wells and thus shut them off. Since this causes us to lose HPD, this
means that unless the user knows how to ssh into their machine and do a
manual reprobe for monitors, none of the monitors they connect after
booting will actually work.
Eventually we should come up with a better fix then having to enable
polling for this, since this makes rpm a lot less useful, but for now
the infrastructure in i915 just isn't there yet to get hpd in these
situations.
Changes since v1:
- Add comment explaining the addition of the if
(!mode_config->poll_running) in intel_hpd_init()
- Remove unneeded if (!dev->mode_config.poll_enabled) in
i915_hpd_poll_init_work()
- Call to drm_helper_hpd_irq_event() after we disable polling
- Add cancel_work_sync() call to intel_hpd_cancel_work()
Changes since v2:
- Apparently dev->mode_config.poll_running doesn't actually reflect
whether or not a poll is currently in progress, and is actually used
for dynamic module paramter enabling/disabling. So now we instead
keep track of our own poll_running variable in dev_priv->hotplug
- Clean i915_hpd_poll_init_work() a little bit
Changes since v3:
- Remove the now-redundant connector loop in intel_hpd_init(), just
rely on intel_hpd_poll_enable() for setting connector->polled
correctly on each connector
- Get rid of poll_running
- Don't assign enabled in i915_hpd_poll_init_work before we actually
lock dev->mode_config.mutex
- Wrap enabled assignment in i915_hpd_poll_init_work() in READ_ONCE()
for doc purposes
- Do the same for dev_priv->hotplug.poll_enabled with WRITE_ONCE in
intel_hpd_poll_enable()
- Add some comments about racing not mattering in intel_hpd_poll_enable
Changes since v4:
- Rename intel_hpd_poll_enable() to intel_hpd_poll_init()
- Drop the bool argument from intel_hpd_poll_init()
- Remove redundant calls to intel_hpd_poll_init()
- Rename poll_enable_work to poll_init_work
- Add some kerneldoc for intel_hpd_poll_init()
- Cross-reference intel_hpd_poll_init() in intel_hpd_init()
- Just copy the loop from intel_hpd_init() in intel_hpd_poll_init()
Changes since v5:
- Minor kerneldoc nitpicks
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
One of the things preventing us from using polling is the fact that
calling valleyview_crt_detect_hotplug() when there's a VGA cable
connected results in sending another hotplug. With polling enabled when
HPD is disabled, this results in a scenario like this:
- We enable power wells and reset the ADPA
- output_poll_exec does force probe on VGA, triggering a hpd
- HPD handler waits for poll to unlock dev->mode_config.mutex
- output_poll_exec shuts off the ADPA, unlocks dev->mode_config.mutex
- HPD handler runs, resets ADPA and brings us back to the start
This results in an endless irq storm getting sent from the ADPA
whenever a VGA connector gets detected in the middle of polling.
Somewhat based off of the "drm/i915: Disable CRT HPD around force
trigger" patch Ville Syrjälä sent a while back
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some hardware requires a valid render context before it can initiate
rc6 power gating of the GPU; the default state of the GPU is not
sufficient and may lead to undefined behaviour. The first execution of
any batch will load the "golden render state", at which point it is safe
to enable rc6. As we do not forcibly load the kernel context at resume,
we have to hook into the batch submission to be sure that the render
state is setup before enabling rc6.
However, since we don't enable powersaving until that first batch, we
queued a delayed task in order to guarantee that the batch is indeed
submitted.
v2: Rearrange intel_disable_gt_powersave() to match.
v3: Apply user specified cur_freq (or idle_freq if not set).
v4: Give in, and supply a delayed work to autoenable rc6
v5: Mika suggested a couple of better names for delayed_resume_work
v6: Rebalance rpm_put around the autoenable task
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
To allow the user finer control over waitboosting, allow them to set the
frequency we request for the boost. This also them allows to effectively
disable the boosting by setting the boost request to a low frequency.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-5-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
With the unified common engine setup done, and the execlist engine
initialization loop clearly split into two phases, we can eliminate
the separate legacy engine initialization code.
v2: Fix cleanup path for legacy.
v3: Rename constructors. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris-wilson.co.uk>
Make sure we keep kbuilder happy in all of its random configs by
providing argument names for compile-time stubs.
In file included from drivers/gpu/drm/i915/intel_dp_mst.c:27:0:
drivers/gpu/drm/i915/i915_drv.h: In function 'i915_debugfs_register':
>> drivers/gpu/drm/i915/i915_drv.h:3612:48: error: parameter name omitted
static inline int i915_debugfs_register(struct drm_i915_private *) {return 0;}
^~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/i915_drv.h: In function 'i915_debugfs_unregister':
drivers/gpu/drm/i915/i915_drv.h:3613:51: error: parameter name omitted
static inline void i915_debugfs_unregister(struct drm_i915_private *) {}
Reported-by: 0day
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1468324529-20461-1-git-send-email-chris@chris-wilson.co.uk
One of the numerous VT-d workarounds we require is that the display
hardware reads past the end of the buffer triggering VT-d faults. This
is acknowledged in the code as being safe "since we fill the unused
portions of the GGTT with the scratch page". Alas, that is no longer
always true and so we trigger DMAR read faults.
Skylake also requires another workaround to avoid mixing VT-d and
unpopulated PTE, and so there we also need to ensure we fill unused
entries with the scratch page.
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96584
Fixes: f7770bfd9f ("drm/i915: Skip clearing the GGTT on full-ppgtt systems")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773634-8106-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Some Kabylake SKUs are going to use Kabypoint PCH.
It is mainly for Halo and DT ones.
>From our specs it doesn't seem that KBP brings
any change on the display south engine. So let's consider
this as a continuation of SunrisePoint, i.e., SPT+.
Since it is easy to get confused by a letter change:
KBL = Kabylake - CPU/GPU codename.
KBP = Kabypoint - PCH codename.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96826
Link: http://patchwork.freedesktop.org/patch/msgid/1467418032-15167-1-git-send-email-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
As we inspect both the tasklet (to check for an active bottom-half) and
set the irq-posted flag at the same time (both in the interrupt handler
and then in the bottom-halt), group those two together into the same
cacheline. (Not having total control over placement of the struct means
we can't guarantee the cacheline boundary, we need to align the kmalloc
and then each struct, but the grouping should help.)
v2: Try a couple of different names for the state touched by the user
interrupt handler.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467805142-22219-3-git-send-email-chris@chris-wilson.co.uk
Following on from the scenario Tvrtko envisioned to explain a hard-to-hit
race with multiple first waiters, we could also then race in the
__i915_request_irq_complete() and the bottom-half may miss the vital
irq-seqno barrier and so go to sleep not noticing their seqno is
complete.
v2: unlock, not double lock the rcu_read_lock.
Fixes: 3d5564e910 ("drm/i915: Only apply one barrier after...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467805142-22219-2-git-send-email-chris@chris-wilson.co.uk
Since drm_i915_private is now a subclass of drm_device we do not need to
chase the drm_i915_private->dev backpointer and can instead simply
access drm_i915_private->drm directly.
text data bss dec hex filename
1068757 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko
1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko
Created by the coccinelle script:
@@
struct drm_i915_private *d;
identifier i;
@@
(
- d->dev->i
+ d->drm.i
|
- d->dev
+ &d->drm
)
and for good measure the dev_priv->dev backpointer was removed entirely.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk
Some IS_ and HAS_ macros can return any non-zero value for true.
One potential problem with that is that someone could assign
them to integers and be surprised with the result. Therefore it
is probably safer to do the conversion to 0/1 in the macros
themselves.
Luckily this does not seem to have an effect on code size.
Only one call site was getting bit by this and a patch for
that has been sent as "drm/i915/guc: Protect against HAS_GUC_*
returning true values other than one".
v2: Added some extra braces as suggested by checkpatch.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467643823-9798-1-git-send-email-tvrtko.ursulin@linux.intel.com
igt likes to inject GPU hangs into its command streams. However, as we
expect these hangs, we don't actually want them recorded in the dmesg
output or stored in the i915_error_state (usually). To accommodate this
allow userspace to set a flag on the context that any hang emanating
from that context will not be recorded. We still do the error capture
(otherwise how do we find the guilty context and know its intent?) as
part of the reason for random GPU hang injection is to exercise the race
conditions between the error capture and normal execution.
v2: Split out the request->ringbuf error capture changes.
v3: Move the flag defines next to the intel_context->flags definition
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-9-git-send-email-chris@chris-wilson.co.uk
Now that we have (near) universal GPU recovery code, we can inject a
real hang from userspace and not need any fakery. Not only does this
mean that the testing is far more realistic, but we can simplify the
kernel in the process.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-7-git-send-email-chris@chris-wilson.co.uk
The retire worker is a low frequency task that makes sure we retire
outstanding requests if userspace is being lax. We only need to start it
once as it remains active until the GPU is idle, so do a cheap test
before the more expensive queue_work(). A consequence of this is that we
need correct locking in the worker to make the hot path of request
submission cheap. To keep the symmetry and keep hangcheck strictly bound
by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle
worker before dropping the wakelock.
v2: Guard against RCU fouling the breadcrumbs bottom-half whilst we kick
the waiter.
v3: Remove the wakeref assertion squelching (now we hold a wakeref for
the hangcheck, any rpm error there is genuine).
v4: To prevent excess work when retiring requests, we split the busy
flag into two, a boolean to denote whether we hold the wakeref and a
bitmask of active engines.
v5: Reorder cancelling hangcheck upon idling to avoid a race where we
might cancel a hangcheck after being preempted by a new task
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.freedesktop.org/show_bug.cgi?id=88437
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-1-git-send-email-chris@chris-wilson.co.uk
Under the assumption that enabling signaling will be a frequent
operation, lets preallocate our attachments for signaling inside the
(rather large) request struct (and so benefiting from the slab cache).
v2: Convert from void * to more meaningful names and types.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-17-git-send-email-chris@chris-wilson.co.uk
If we convert the tracing over from direct use of ring->irq_get() and
over to the breadcrumb infrastructure, we only have a single user of the
ring->irq_get and so we will be able to simplify the driver routines
(eliminating the redundant validation and irq refcounting).
Process context is preferred over softirq (or even hardirq) for a couple
of reasons:
- we already utilize process context to have fast wakeup of a single
client (i.e. the client waiting for the GPU inspects the seqno for
itself following an interrupt to avoid the overhead of a context
switch before it returns to userspace)
- engine->irq_seqno() is not suitable for use from an softirq/hardirq
context as we may require long waits (100-250us) to ensure the seqno
write is posted before we read it from the CPU
A signaling framework is a requirement for enabling dma-fences.
v2: Move to a signaling framework based upon the waiter.
v3: Track the first-signal to avoid having to walk the rbtree everytime.
v4: Mark the signaler thread as RT priority to reduce latency in the
indirect wakeups.
v5: Make failure to allocate the thread fatal.
v6: Rename kthreads to i915/signal:%u
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-16-git-send-email-chris@chris-wilson.co.uk
If we flag the seqno as potentially stale upon receiving an interrupt,
we can use that information to reduce the frequency that we apply the
heavyweight coherent seqno read (i.e. if we wake up a chain of waiters).
v2: Use cmpxchg to replace READ_ONCE/WRITE_ONCE for more explicit
control of the ordering wrt to interrupt generation and interrupt
checking in the bottom-half.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-14-git-send-email-chris@chris-wilson.co.uk
If we have multiple waiters, we may find that many complete on the same
wake up. If we first inspect the seqno from the CPU cache, we may reduce
the number of heavyweight coherent seqno reads we require.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-13-git-send-email-chris@chris-wilson.co.uk
By using the same address for storing the HWS on every platform, we can
remove the platform specific vfuncs and reduce the get-seqno routine to
a single read of a cached memory location.
v2: Fix semaphore_passed() to look at the signaling engine (not the
waiter's)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-8-git-send-email-chris@chris-wilson.co.uk
When waiting for an interrupt (waiting for the engine to complete some
work), we know we are the only waiter to be woken on this engine. We also
know when the GPU has nearly completed our request (or at least started
processing it), so after being woken and we detect that the GPU is
active and working on our request, allow us the bottom-half (the first
waiter who wakes up to handle checking the seqno after the interrupt) to
spin for a very short while to reduce client latencies.
The impact is minimal, there was an improvement to the realtime-vs-many
clients case, but exporting the function proves useful later. However,
it is tempting to adjust irq_seqno_barrier to include the spin. The
problem is first ensuring that the "start-of-request" seqno is coherent
as we use that as our basis for judging when it is ok to spin. If we
could, spinning there could dramatically shorten some sleeps, and allow
us to make the barriers more conservative to handle missed seqno writes
on more platforms (all gen7+ are known to have the occasional issue, at
least).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-7-git-send-email-chris@chris-wilson.co.uk
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.
Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.
Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.
To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.
v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.
Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
Currently __i915_wait_request uses a per-engine wait_queue_t for the dual
purpose of waking after the GPU advances or for waking after an error.
In the future, we may add even more wake sources and require greater
separation, but for now we can conceptually simplify wakeups by separating
the two sources. In particular, this allows us to use different wait-queues
(e.g. one on the engine advancement, a global one for errors and one on
each requests) without any hassle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-5-git-send-email-chris@chris-wilson.co.uk
The queue only ever contains at most one item and has no special flags.
It is just a very simple wrapper around the system-wq - a complication
with no benefits.
v2: Use the system_long_wq as we may wish to capture the error state
after detecting the hang - which may take a bit of time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-3-git-send-email-chris@chris-wilson.co.uk
Ville Syrjälä reported that in the majority of wait_for(I915_READ()) he
inspect, most completed within the first couple of reads and that the
delay between those wait_for() reads was the ratelimiting step for many
code paths. For example, __gen6_update_ring_freq() was blamed for
slowing down boot by many milliseconds, but under Ville's scrutiny the
issue was just excessive delay waiting for sandybridge_pcode_write().
We can eliminate the wait by initially using a busyspin upon the register
read and only fallback to the sleeping loop in cases where the hardware
is indeed too slow. A threshold of 2 microseconds is used as the initial
ballpark.
To avoid excessive code bloating from converting every wait_for() into a
hybrid busy/sleep loop, we extend wait_for_register_fw() and export it
for use by other callers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-1-git-send-email-chris@chris-wilson.co.uk
Since commit 2ed53a94d8 ("drm/i915: On GPU reset, set the HWS
breadcrumb to the last seqno") once a hang is completed, the seqno is
advanced past all current requests. With this we know that if we wake up
from waiting for a request, if a hang has occurred and reset completed,
our request will be considered complete (i.e.
i915_gem_request_completed() returns true). Therefore we only need to
worry about the situation where a hang has occurred, but not yet reset,
where we may need to release our struct_mutex. Since we don't need to
detect the completed reset using the global gpu_error->reset_counter
anymore, we do not need to track the reset_counter epoch inside the
request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467211874-11552-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Fix build errors when ACPI is not enabled by adding function stubs:
../drivers/gpu/drm/i915/i915_drv.c: In function 'i915_drm_suspend':
../drivers/gpu/drm/i915/i915_drv.c:635:2: error: implicit declaration of function 'intel_opregion_unregister' [-Werror=implicit-function-declaration]
intel_opregion_unregister(dev_priv);
../drivers/gpu/drm/i915/i915_drv.c: In function 'i915_drm_resume':
../drivers/gpu/drm/i915/i915_drv.c:798:2: error: implicit declaration of function 'intel_opregion_register' [-Werror=implicit-function-declaration]
intel_opregion_register(dev_priv);
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Fixes: 03d92e4779 ("drm/i915/opregion: Rename init/fini functions to register/unregister")
Cc: drm-intel-fixes@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[Jani: dropped the stale init/fini declarations]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467028399-9965-1-git-send-email-jani.nikula@intel.com
We only need to force a switch to the kernel context placeholder during
eviction. All other uses of i915_gpu_idle() just want to wait until
existing work on the GPU is idle. Rename i915_gpu_idle() to
i915_gem_wait_for_idle() to avoid any implications about "parking" the
context first.
v2: Tweak an error message if the wait fails for the ilk vtd w/a
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-6-git-send-email-chris@chris-wilson.co.uk
This is so that we have symmetry with intel_lrc.c and avoid a source of
if (i915.enable_execlists) layering violation within i915_gem_context.c -
that is we move the specific handling of the dev_priv->kernel_context
for legacy submission into the legacy submission code.
This depends upon the init/fini ordering between contexts and engines
already defined by intel_lrc.c, and also exporting the context alignment
required for pinning the legacy context.
v2: Separate out pin/unpin context funcs for greater symmetry with
intel_lrc. One more step towards unifying behaviour between the two
classes of engines and towards fixing another bug in i915_switch_context
vs requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-2-git-send-email-chris@chris-wilson.co.uk
i915_dma.c used to contain the DRI1/UMS horror show, but now all that
remains are the out-of-place driver level interfaces (such as
allocating, initialising and registering the driver). These should be in
i915_drv.c alongside similar routines for suspend/resume.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773227-7994-10-git-send-email-chris@chris-wilson.co.uk
drm_connector_register_all() is now automatically called by
drm_dev_register(), and so we no longer have to do so ourselves (via
intel_modeset_register() after calling drm_dev_register()). Similarly
for unregistering.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773227-7994-8-git-send-email-chris@chris-wilson.co.uk
Take control over allocating, loading and registering the driver from the
DRM midlayer by performing it manually from i915_pci_probe. This allows
us to carefully control the order of when we setup the hardware vs when
it becomes visible to third parties (including userspace). The current
ordering makes the driver visible to userspace first (in order to
coordinate with removed DRI1 userspace), but that ordering incurs risk.
The risk increases as we strive for more asynchronous loading.
One side effect of controlling the allocation is that we can allocate
both the drm_device + drm_i915_private in one block, the next step
towards subclassing.
Unload is still left as before, a mix of midlayer and driver.
v2: After drm_dev_init(), we should call drm_dev_unref() so that we call
drm_dev_release() and free everything from drm_dev_init().
v3: Fixup missed error code for failing to allocate dev_priv
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773227-7994-6-git-send-email-chris@chris-wilson.co.uk
Currently debugfs files are created before the driver is even loads.
This gives the opportunity for userspace to open that interface and poke
around before the backing data structures are initialised - with the
possibility of oopsing or worse.
Move the creation of the debugfs files to our registration phase, where
we announce our presence to the world when we are ready, i.e the
sequence changes from
drm_dev_register()
-> drm_minor_register()
-> drm_debugfs_init()
-> i915_debugfs_init()
-> i915_driver_load()
to
drm_dev_register()
-> drm_minor_register()
-> drm_debugfs_init()
-> i915_driver_load()
-> i915_debugfs_register()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773227-7994-5-git-send-email-chris@chris-wilson.co.uk
Currently the backlight is being registered in the load phase (before
the display and its objects are registered). Move the backlight
registration into the analogous phase by performing it from the
connector registration, just after its creation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773227-7994-3-git-send-email-chris@chris-wilson.co.uk
Effectively removes one layer of indirection between the mask of
possible engines and the engine constructors. Instead of spelling
out in code the mapping of HAS_<engine> to constructors, makes
more use of the recently added data driven approach by putting
engine constructor vfuncs into the table as well.
Effect is fewer lines of source and smaller binary.
At the same time simplify the error handling since engine
destructors can run on unitialized engines anyway.
Similar approach could be done for legacy submission is wanted.
v2: Removed ugly BUILD_BUG_ONs in favour of newly introduced
ENGINE_MASK and HAS_ENGINE macros.
Also removed the forward declarations by shuffling functions
around.
v3: Warn when logical_rings table does not contain enough data
and disable the engines which could not be initialized.
(Chris Wilson)
v4: Chris Wilson suggested a nicer engine init loop.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1466689961-23232-1-git-send-email-tvrtko.ursulin@linux.intel.com
Backmerge drm-next for the reworked device register/unregistering.
Chris Wilson needs that to be able to land his i915 load/unload
demidlayering.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
- Infrastructure for GVT-g (paravirtualized gpu on gen8+), from Zhi Wang
- another attemp at nonblocking atomic plane updates
- bugfixes and refactoring for GuC doorbell code (Dave Gordon)
- GuC command submission enabled by default, if fw available (Dave Gordon)
- more bxt w/a (Arun Siluvery)
- bxt phy improvements (Imre Deak)
- prep work for stolen objects support (Ankitprasa Sharma & Chris Wilson)
- skl/bkl w/a update from Mika Kuoppala
- bunch of small improvements and fixes all over, as usual
* tag 'drm-intel-next-2016-06-20' of git://anongit.freedesktop.org/drm-intel: (81 commits)
drm/i915: Update DRIVER_DATE to 20160620
drm/i915: Introduce GVT context creation API
drm/i915: Support LRC context single submission
drm/i915: Introduce execlist context status change notification
drm/i915: Make addressing mode bits in context descriptor configurable
drm/i915: Make ring buffer size of a LRC context configurable
drm/i915: gvt: Introduce the basic architecture of GVT-g
drm/i915: Fold vGPU active check into inner functions
drm/i915: Use offsetof() to calculate the offset of members in PVINFO page
drm/i915: Factor out i915_pvinfo.h
drm/i915: Serialise presentation with imported dmabufs
drm/i915: Use atomic commits for legacy page_flips
drm/i915: Move fb_bits updating later in atomic_commit
drm/i915: nonblocking commit
Reapply "drm/i915: Pass atomic states to fbc update, functions."
drm/i915: Roll out the helper nonblock tracking
drm/i915: Signal drm events for atomic
drm/i915/ilk: Don't disable SSC source if it's in use
drm/i915/guc: (re)initialise doorbell h/w when enabling GuC submission
drm/i915/guc: replace assign_doorbell() with select_doorbell_register()
...
Also extract drm_auth.h for nicer grouping.
v2: Nuke the other comments since they don't really explain a lot, and
within the drm core we generally only document functions exported to
drivers: The main audience for these docs are driver writers.
v3: Limit the exposure of drm_master internals by only including
drm_auth.h where it is neede (Chris).
v4: Spelling polish (Emil).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
No need for local struct drm_device * since dev_priv is the
correct thing to pass in to NEEDS_WaRsDisableCoarsePowerGating
anyway. Changed the macro definition for the latter to reflect
that as well.
v2: Alignment bikeshed.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466518034-24838-1-git-send-email-tvrtko.ursulin@linux.intel.com
... instead of the previous ORIGIN_GTT. This should actually
invalidate FBC once something is written on the frontbuffer using WC
mmaps. The problem with ORIGIN_GTT is that the automatic hardware
tracking is not able to detect the WC writes as it can detect the GTT
writes.
This should help fix the SKL bug where nothing happens when you type
your username/password on lightdm.
This patch was originally pasted on an email by Chris and converted to
an actual git patch by Paulo.
v2 (from Paulo):
- Make it a full variable instead of a bit-field (Daniel)
- Use WRITE_ONCE (Chris)
v3 (from Paulo):
- Remove huge comment since now we have WRITE_ONCE (Chris)
- Remove uneeded new line (Chris)
- Add Chris' Signed-off-by, authorized via IRC
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466185599-26401-1-git-send-email-paulo.r.zanoni@intel.com
Currently to see if an object is backed by struct pages (as opposed to
being a simple pointer to stolen memory, for example) we do a manual
check on the obj->ops->flags. This is quite shouty and before adding
more checks in future, we should make it a bit calmer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466431552-17860-1-git-send-email-chris@chris-wilson.co.uk
We now have a connector->func that serves the same purpose as our own
intel_connector->unregister vfunc allowing us to unwrap ourselves and
use drm_connector_register() (and friends) as the central function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466160034-12173-2-git-send-email-chris@chris-wilson.co.uk
GVT workload scheduler needs special host LRC contexts, the so called
"shadow LRC context" to submit guest workload to host i915. During the
guest workload submission, workload scheduler fills the shadow LRC
context with the content of guest LRC context: engine context is copied
without changes, ring context is mostly owned by host i915.
v8:
- Remove the graph temporarily. (Chris)
- Use interruptible mutex_lock. (Chris)
- Rename the function name of creating a GVT context. (Chris)
- Add the missing declaration in i915_drv.h (Chris)
v7:
- Move chart to a better place. (Joonas)
v6:
- Make GVT code as dead code when !CONFIG_DRM_I915_GVT. (Chris)
v5:
- Only compile this feature when CONFIG_DRM_I915_GVT is enabled. (Tvrtko)
- Rebase the code into new repo.
- Add a comment about the ring buffer size. (Joonas)
v2:
Mostly based on Daniel's idea. Call the refactored core logic of GEM
context creation service and LRC context creation service to create the GVT
context.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-10-git-send-email-zhi.a.wang@intel.com
This patch introduces the support of LRC context single submission.
As GVT context may come from different guests, which require different
configuration of render registers. It can't be combined into a dual ELSP
submission combo.
Only GVT-g will create this kinds of GEM context currently.
v8:
- Rename the data member in struct i915_gem_context. (Chris)
v7:
- Fix typos in commit message. (Joonas)
v6:
- Make GVT code as dead code when !CONFIG_DRM_I915_GVT. (Chris)
v5:
- Only compile this feature when CONFIG_DRM_I915_GVT=y. (Tvrtko)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-9-git-send-email-zhi.a.wang@intel.com
This patch introduces an approach to track the execlist context status
change.
GVT-g uses GVT context as the "shadow context". The content inside GVT
context will be copied back to guest after the context is idle. And GVT-g
has to know the status of the execlist context.
This function is configurable when creating a new GEM context. Currently,
Only GVT-g will create the "status-change-notification" enabled GEM
context.
v10:
- Fix the identation. (Joonas)
v8:
- Remove the boolean flag in struct i915_gem_context. (Joonas)
v7:
- Remove per-engine ctx status notifiers. Use one status notifier for all
engines. (Joonas)
- Add prefix "INTEL_" for related definitions. (Joonas)
- Refine the comments in execlists_context_status_change(). (Joonas)
v6:
- When !CONFIG_DRM_I915_GVT, make GVT code as dead code then compiler
could automatically eliminate them for us. (Chris)
- Always initialize the notifier header, so it could be switched on/off
at runtime. (Chris)
v5:
- Only compile this feature when CONFIG_DRM_I915_GVT is enabled.(Tvrtko)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v8)
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-8-git-send-email-zhi.a.wang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Currently the addressing mode bit in context descriptor is statically
generated from the configuration of system-wide PPGTT usage model.
GVT-g will load the PPGTT shadow page table by itself and probably one
guest is using a different addressing mode with i915 host. The addressing
mode bits of a LRC context should be configurable under this case.
v10:
- Fix the identation. (Joonas)
v9:
- Rename the data member in struct i915_gem_context. (Chris)
v8:
- Rename the data member in struct i915_gem_context. (Chris)
v7:
- Move context addressing mode bit into i915_reg.h. (Joonas/Chris)
- Add prefix "INTEL_" for related definitions. (Joonas)
v6:
- Directly save the addressing mode bits inside i915_gem_context. (Chris)
- Move the LRC context addressing mode bits into intel_lrc.h. (Chris)
v5:
- Change USES_FULL_48BIT(dev) to USES_FULL_48BIT(dev_priv) (Tvrtko)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v9)
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-7-git-send-email-zhi.a.wang@intel.com
This patch introduces an option for configuring the ring buffer size
of a LRC context after the context creation.
v9:
- Fix an identation issue. (Chris)
v8:
- Rename the data member in i915_gem_context. (Chris)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-6-git-send-email-zhi.a.wang@intel.com
This patch introduces the very basic framework of GVT-g device model,
includes basic prototypes, definitions, initialization.
v12:
- Call intel_gvt_init() in driver early initialization stage. (Chris)
v8:
- Remove the GVT idr and mutex in intel_gvt_host. (Joonas)
v7:
- Refine the URL link in Kconfig. (Joonas)
- Refine the introduction of GVT-g host support in Kconfig. (Joonas)
- Remove the macro GVT_ALIGN(), use round_down() instead. (Joonas)
- Make "struct intel_gvt" a data member in struct drm_i915_private.(Joonas)
- Remove {alloc, free}_gvt_device()
- Rename intel_gvt_{create, destroy}_gvt_device()
- Expost intel_gvt_init_host()
- Remove the dummy "struct intel_gvt" declaration in intel_gvt.h (Joonas)
v6:
- Refine introduction in Kconfig. (Chris)
- The exposed API functions will take struct intel_gvt * instead of
void *. (Chris/Tvrtko)
- Remove most memebers of strct intel_gvt_device_info. Will add them
in the device model patches.(Chris)
- Remove gvt_info() and gvt_err() in debug.h. (Chris)
- Move GVT kernel parameter into i915_params. (Chris)
- Remove include/drm/i915_gvt.h, as GVT-g will be built within i915.
- Remove the redundant struct i915_gvt *, as the functions in i915
will directly take struct intel_gvt *.
- Add more comments for reviewer.
v5:
Take Tvrtko's comments:
- Fix the misspelled words in Kconfig
- Let functions take drm_i915_private * instead of struct drm_device *
- Remove redundant prints/local varible initialization
v3:
Take Joonas' comments:
- Change file name i915_gvt.* to intel_gvt.*
- Move GVT kernel parameter into intel_gvt.c
- Remove redundant debug macros
- Change error handling style
- Add introductions for some stub functions
- Introduce drm/i915_gvt.h.
Take Kevin's comments:
- Move GVT-g host/guest check into intel_vgt_balloon in i915_gem_gtt.c
v2:
- Introduce i915_gvt.c.
It's necessary to introduce the stubs between i915 driver and GVT-g host,
as GVT-g components is configurable in kernel config. When disabled, the
stubs here do nothing.
Take Joonas' comments:
- Replace boolean return value with int.
- Replace customized info/warn/debug macros with DRM macros.
- Document all non-static functions like i915.
- Remove empty and unused functions.
- Replace magic number with marcos.
- Set GVT-g in kernel config to "n" by default.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-5-git-send-email-zhi.a.wang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This mode allows to assign EUs to pools which can process work collectively.
The command to enable this mode should be issued as part of context initialization.
The pooled mode is global, once enabled it has to stay the same across all
contexts until HW reset hence this is sent in auxiliary golden context batch.
Thanks to Mika for the preliminary review and comments.
v2: explain why this is enabled in golden context, use feature flag while
enabling the support (Chris)
v3: Include only kernel support as userspace support is not available yet.
User space clients need to know when the pooled EU feature is present
and enabled on the hardware so that they can adapt work submissions.
Create a new device info flag for this purpose.
Set has_pooled_eu to true in the Broxton static device info - Broxton
supports the feature in hardware and the driver will enable it by
default.
We need to add getparam ioctls to enable userspace to query availability of
this feature and to retrieve min. no of eus in a pool but we will expose
them once userspace support is available. Opensource users for this feature
are mesa, libva and beignet.
Beignet team is currently working on adding userspace support.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Winiarski, Michal <michal.winiarski@intel.com>
Cc: Zou, Nanhai <nanhai.zou@intel.com>
Cc: Yang, Rong R <rong.r.yang@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Armin Reese <armin.c.reese@intel.com>
Cc: Tim Gore <tim.gore@intel.com>
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
This utility function is a companion to i915_gem_object_get_page() that
uses the same cached iterator for the scatterlist to perform fast
sequential lookup of the dma address associated with any page within the
object.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Apparently some CHV boards failed to hook up the port presence straps
for HDMI ports as well (earlier we assumed this problem only affected
eDP ports). So let's check the VBT in addition to the strap, and if
either one claims that the port is present go ahead and register the
relevant connector.
While at it, change port D to register DP before HDMI as we do for ports
B and C since
commit 457c52d87e ("drm/i915: Only ignore eDP ports that are connected")
Also print a debug message when we register a HDMI connector to aid
in diagnosing missing/incorrect ports. We already had such a print for
DP/eDP.
v2: Improve the comment in the code a bit, note the port D change in
the commit message
Cc: Radoslav Duda <radosd@radosd.com>
Tested-by: Radoslav Duda <radosd@radosd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96321
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464945463-14364-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Git got absolutely destroyed with all our cherry-picking from
drm-intel-next-queued to various branches. It ended up inserting
intel_crtc_page_flip 2x even in intel_display.c.
Backmerge to get back to sanity.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
drm-intel-next-2016-05-22:
- cmd-parser support for direct reg->reg loads (Ken Graunke)
- better handle DP++ smart dongles (Ville)
- bxt guc fw loading support (Nick Hoathe)
- remove a bunch of struct typedefs from dpll code (Ander)
- tons of small work all over to avoid casting between drm_device and the i915
dev struct (Tvrtko&Chris)
- untangle request retiring from other operations, also fixes reset stat corner
cases (Chris)
- skl atomic watermark support from Matt Roper, yay!
- various wm handling bugfixes from Ville
- big pile of cdclck rework for bxt/skl (Ville)
- CABC (Content Adaptive Brigthness Control) for dsi panels (Jani&Deepak M)
- nonblocking atomic commits for plane-only updates (Maarten Lankhorst)
- bunch of PSR fixes&improvements
- untangle our map/pin/sg_iter code a bit (Dave Gordon)
drm-intel-next-2016-05-08:
- refactor stolen quirks to share code between early quirks and i915 (Joonas)
- refactor gem BO/vma funcstion (Tvrtko&Dave)
- backlight over DPCD support (Yetunde Abedisi)
- more dsi panel sequence support (Jani)
- lots of refactoring around handling iomaps, vma, ring access and related
topics culmulating in removing the duplicated request tracking in the execlist
code (Chris & Tvrtko) includes a small patch for core iomapping code
- hw state readout for bxt dsi (Ramalingam C)
- cdclk cleanups (Ville)
- dedupe chv pll code a bit (Ander)
- enable semaphores on gen8+ for legacy submission, to be able to have a direct
comparison against execlist on the same platform (Chris) Not meant to be used
for anything else but performance tuning
- lvds border bit hw state checker fix (Jani)
- rpm vs. shrinker/oom-notifier fixes (Praveen Paneri)
- l3 tuning (Imre)
- revert mst dp audio, it's totally non-functional and crash-y (Lyude)
- first official dmc for kbl (Rodrigo)
- and tons of small things all over as usual
* 'drm-intel-next' of git://anongit.freedesktop.org/drm-intel: (194 commits)
drm/i915: Revert async unpin and nonblocking atomic commit
drm/i915: Update DRIVER_DATE to 20160522
drm/i915: Inline sg_next() for the optimised SGL iterator
drm/i915: Introduce & use new lightweight SGL iterators
drm/i915: optimise i915_gem_object_map() for small objects
drm/i915: refactor i915_gem_object_pin_map()
drm/i915/psr: Implement PSR2 w/a for gen9
drm/i915/psr: Use ->get_aux_send_ctl functions
drm/i915/psr: Order DP aux transactions correctly
drm/i915/psr: Make idle_frames sensible again
drm/i915/psr: Try to program link training times correctly
drm/i915/userptr: Convert to drm_i915_private
drm/i915: Allow nonblocking update of pageflips.
drm/i915: Check for unpin correctness.
Reapply "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
drm/i915: Make unpin async.
drm/i915: Prepare connectors for nonblocking checks.
drm/i915: Pass atomic states to fbc update functions.
drm/i915: Remove reset_counter from intel_crtc.
drm/i915: Remove queue_flip pointer.
...
This reverts the following patches:
d55dbd06bb drm/i915: Allow nonblocking update of pageflips.
15c86bdb76 drm/i915: Check for unpin correctness.
95c2ccdc82 Reapply "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
a6747b7304 drm/i915: Make unpin async.
03f476e1fc drm/i915: Prepare connectors for nonblocking checks.
2099deffef drm/i915: Pass atomic states to fbc update functions.
ee7171af72 drm/i915: Remove reset_counter from intel_crtc.
2ee004f7c5 drm/i915: Remove queue_flip pointer.
b8d2afae55 drm/i915: Remove use_mmio_flip kernel parameter.
8dd634d922 drm/i915: Remove cs based page flip support.
143f73b3bf drm/i915: Rework intel_crtc_page_flip to be almost atomic, v3.
84fc494b64 drm/i915: Add the exclusive fence to plane_state.
6885843ae1 drm/i915: Convert flip_work to a list.
aa420ddd8e drm/i915: Allow mmio updates on all platforms, v2.
afee4d8707 Revert "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
"drm/i915: Allow nonblocking update of pageflips" should have been
split up, misses a proper commit message and seems to cause issues in
the legacy page_flip path as demonstrated by kms_flip.
"drm/i915: Make unpin async" doesn't handle the unthrottled cursor
updates correctly, leading to an apparent pin count leak. This is
caught by the WARN_ON in i915_gem_object_do_pin which screams if we
have more than DRM_I915_GEM_OBJECT_MAX_PIN_COUNT pins.
Unfortuantely we can't just revert these two because this patch series
came with a built-in bisect breakage in the form of temporarily
removing the unthrottled cursor update hack for legacy cursor ioctl.
Therefore there's no other option than to revert the entire pile :(
There's one tiny conflict in intel_drv.h due to other patches, nothing
serious.
Normally I'd wait a bit longer with doing a maintainer revert, but
since the minimal set of patches we need to revert (due to the bisect
breakage) is so big, time is running out fast. And very soon
(especially after a few attempts at fixing issues) it'll be really
hard to revert things cleanly.
Lessons learned:
- Not a good idea to rush the review (done by someone fairly new to
the area) and not make sure domain experts had a chance to read it.
- Patches should be properly split up. I only looked at the two
patches that should be reverted in detail, but both look like the
mix up different things in one patch.
- Patches really should have proper commit messages. Especially when
doing more than one thing, and especially when touching critical and
tricky core code.
- Building a patch series and r-b stamping it when it has a built-in
bisect breakage is not a good idea.
- I also think we need to stop building up technical debt by
postponing atomic igt testcases even longer. I think it's clear that
there's enough corner cases in this beast that we really need to
have the testcases _before_ the next step lands.
(cherry picked from commit 5a21b6650a
from drm-intel-next-queeud)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
On Loading, GuC sets PM interrupts routing (bit 31) and clears ARAT
expired interrupt (bit 9). Host turbo also updates this register
in RPS flows. This patch ensures bit 31 and bit 9 setup by GuC persists.
ARAT timer interrupt is needed in GuC for various features. It also
facilitates halting GuC and hence achieving RC6. PM interrupt routing
will not impact RPS interrupt reception by host as GuC will redirect
them.
This patch fixes igt test pm_rc6_residency that was failing with guc
load/submission enabled. Tested with SKL GuC v6.1 and BXT GuC v5.1 and v8.7.
v2: i915_irq/i915_pm decoupling from intel_guc. (ChrisW)
v3: restructuring the mask update and rebase w.r.t Ville's patch. (ChrisW)
v4: Updating the pm_intr_keep during direct_interrupts_to_guc. (Sagar)
Cc: Chris Harris <chris.harris@intel.com>
Cc: Zhe Wang <zhe1.wang@intel.com>
Cc: Deepak S <deepak.s@intel.com>
Cc: Satyanantha, Rama Gopal M <rama.gopal.m.satyanantha@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Testcase: igt/pm_rc6_residency
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Tested-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464683307-19475-1-git-send-email-sagar.a.kamble@intel.com
I see the main drm pull got merged, here's the first batch of fixes for
4.7 already. Fixes all around, a large portion cc: stable stuff.
[airlied: the DP++ stuff is a regression fix].
* tag 'drm-intel-next-fixes-2016-05-25' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Stop automatically retiring requests after a GPU hang
drm/i915: Unify intel_ring_begin()
drm/i915: Ignore stale wm register values on resume on ilk-bdw (v2)
drm/i915/psr: Try to program link training times correctly
drm/i915/bxt: Adjusting the error in horizontal timings retrieval
drm/i915: Don't leave old junk in ilk active watermarks on readout
drm/i915: s/DPPL/DPLL/ for SKL DPLLs
drm/i915: Fix gen8 semaphores id for legacy mode
drm/i915: Set crtc_state->lane_count for HDMI
drm/i915/BXT: Retrieving the horizontal timing for DSI
drm/i915: Protect gen7 irq_seqno_barrier with uncore lock
drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms
drm/i915: Determine DP++ type 1 DVI adaptor presence based on VBT
drm/i915: Enable/disable TMDS output buffers in DP++ adaptor as needed
drm/i915: Respect DP++ adaptor TMDS clock limit
drm: Add helper for DP++ adaptors
This reverts the following patches:
d55dbd06bb drm/i915: Allow nonblocking update of pageflips.
15c86bdb76 drm/i915: Check for unpin correctness.
95c2ccdc82 Reapply "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
a6747b7304 drm/i915: Make unpin async.
03f476e1fc drm/i915: Prepare connectors for nonblocking checks.
2099deffef drm/i915: Pass atomic states to fbc update functions.
ee7171af72 drm/i915: Remove reset_counter from intel_crtc.
2ee004f7c5 drm/i915: Remove queue_flip pointer.
b8d2afae55 drm/i915: Remove use_mmio_flip kernel parameter.
8dd634d922 drm/i915: Remove cs based page flip support.
143f73b3bf drm/i915: Rework intel_crtc_page_flip to be almost atomic, v3.
84fc494b64 drm/i915: Add the exclusive fence to plane_state.
6885843ae1 drm/i915: Convert flip_work to a list.
aa420ddd8e drm/i915: Allow mmio updates on all platforms, v2.
afee4d8707 Revert "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
"drm/i915: Allow nonblocking update of pageflips" should have been
split up, misses a proper commit message and seems to cause issues in
the legacy page_flip path as demonstrated by kms_flip.
"drm/i915: Make unpin async" doesn't handle the unthrottled cursor
updates correctly, leading to an apparent pin count leak. This is
caught by the WARN_ON in i915_gem_object_do_pin which screams if we
have more than DRM_I915_GEM_OBJECT_MAX_PIN_COUNT pins.
Unfortuantely we can't just revert these two because this patch series
came with a built-in bisect breakage in the form of temporarily
removing the unthrottled cursor update hack for legacy cursor ioctl.
Therefore there's no other option than to revert the entire pile :(
There's one tiny conflict in intel_drv.h due to other patches, nothing
serious.
Normally I'd wait a bit longer with doing a maintainer revert, but
since the minimal set of patches we need to revert (due to the bisect
breakage) is so big, time is running out fast. And very soon
(especially after a few attempts at fixing issues) it'll be really
hard to revert things cleanly.
Lessons learned:
- Not a good idea to rush the review (done by someone fairly new to
the area) and not make sure domain experts had a chance to read it.
- Patches should be properly split up. I only looked at the two
patches that should be reverted in detail, but both look like the
mix up different things in one patch.
- Patches really should have proper commit messages. Especially when
doing more than one thing, and especially when touching critical and
tricky core code.
- Building a patch series and r-b stamping it when it has a built-in
bisect breakage is not a good idea.
- I also think we need to stop building up technical debt by
postponing atomic igt testcases even longer. I think it's clear that
there's enough corner cases in this beast that we really need to
have the testcases _before_ the next step lands.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
struct intel_context contains two substructs, one for the legacy RCS and
one for every execlists engine. Since legacy RCS is a subset of the
execlists engine support, just combine the two substructs.
v2: Only pin the default context for legacy mode (the object only exists
for legacy, but adding i915.enable_execlists provides symmetry with the
cleanup functions).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-8-git-send-email-chris@chris-wilson.co.uk
Just move the kernel_context member of drm_i915_private next to the
engines it is associated with.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-7-git-send-email-chris@chris-wilson.co.uk
We want to give a name to the currently anonymous per-engine struct
inside the context, so that we can assign it to a local variable and
save clumsy typing. The name we have chosen is intel_context as it
reflects the HW facing portion of the context state (the logical context
state, the registers, the ringbuffer etc).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-4-git-send-email-chris@chris-wilson.co.uk
i915_gem_context_get() is a very simple wrapper around idr_find(), so
simple that it would be smaller to do the lookup inline. Also we use the
verb 'lookup' to return a pointer from a handle, freeing 'get' to imply
obtaining a reference to the context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-3-git-send-email-chris@chris-wilson.co.uk
Our goal is to rename the anonymous per-engine struct beneath the
current intel_context. However, after a lively debate resolving around
the confusion between intel_context_engine and intel_engine_context, the
realisation is that the two structs target different users. The outer
struct is API / user facing, and so carries the higher level GEM
information. The inner struct is hw facing. Thus we want to name the
inner struct intel_context and the outer one i915_gem_context. As the
first step, we need to rename the current struct:
s/struct intel_context/struct i915_gem_context/
which fits much better with its constructors already conveying the
i915_gem_context prefix!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-1-git-send-email-chris@chris-wilson.co.uk
Pull drm updates from Dave Airlie:
"Here's the main drm pull request for 4.7, it's been a busy one, and
I've been a bit more distracted in real life this merge window. Lots
more ARM drivers, not sure if it'll ever end. I think I've at least
one more coming the next merge window.
But changes are all over the place, support for AMD Polaris GPUs is in
here, some missing GM108 support for nouveau (found in some Lenovos),
a bunch of MST and skylake fixes.
I've also noticed a few fixes from Arnd in my inbox, that I'll try and
get in asap, but I didn't think they should hold this up.
New drivers:
- Hisilicon kirin display driver
- Mediatek MT8173 display driver
- ARC PGU - bitstreamer on Synopsys ARC SDP boards
- Allwinner A13 initial RGB output driver
- Analogix driver for DisplayPort IP found in exynos and rockchip
DRM Core:
- UAPI headers fixes and C++ safety
- DRM connector reference counting
- DisplayID mode parsing for Dell 5K monitors
- Removal of struct_mutex from drivers
- Connector registration cleanups
- MST robustness fixes
- MAINTAINERS updates
- Lockless GEM object freeing
- Generic fbdev deferred IO support
panel:
- Support for a bunch of new panels
i915:
- VBT refactoring
- PLL computation cleanups
- DSI support for BXT
- Color manager support
- More atomic patches
- GEM improvements
- GuC fw loading fixes
- DP detection fixes
- SKL GPU hang fixes
- Lots of BXT fixes
radeon/amdgpu:
- Initial Polaris support
- GPUVM/Scheduler/Clock/Power improvements
- ASYNC pageflip support
- New mesa feature support
nouveau:
- GM108 support
- Power sensor support improvements
- GR init + ucode fixes.
- Use GPU provided topology information
vmwgfx:
- Add host messaging support
gma500:
- Some cleanups and fixes
atmel:
- Bridge support
- Async atomic commit support
fsl-dcu:
- Timing controller for LCD support
- Pixel clock polarity support
rcar-du:
- Misc fixes
exynos:
- Pipeline clock support
- Exynoss4533 SoC support
- HW trigger mode support
- export HDMI_PHY clock
- DECON5433 fixes
- Use generic prime functions
- use DMA mapping APIs
rockchip:
- Lots of little fixes
vc4:
- Render node support
- Gamma ramp support
- DPI output support
msm:
- Mostly cleanups and fixes
- Conversion to generic struct fence
etnaviv:
- Fix for prime buffer handling
- Allow hangcheck to be coalesced with other wakeups
tegra:
- Gamme table size fix"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1050 commits)
drm/edid: add displayid detailed 1 timings to the modelist. (v1.1)
drm/edid: move displayid validation to it's own function.
drm/displayid: Iterate over all DisplayID blocks
drm/edid: move displayid tiled block parsing into separate function.
drm: Nuke ->vblank_disable_allowed
drm/vmwgfx: Report vmwgfx version to vmware.log
drm/vmwgfx: Add VMWare host messaging capability
drm/vmwgfx: Kill some lockdep warnings
drm/nouveau/gr/gf100-: fix race condition in fecs/gpccs ucode
drm/nouveau/core: recognise GM108 chipsets
drm/nouveau/gr/gm107-: fix touching non-existent ppcs in attrib cb setup
drm/nouveau/gr/gk104-: share implementation of ppc exception init
drm/nouveau/gr/gk104-: move rop_active_fbps init to nonctx
drm/nouveau/bios/pll: check BIT table version before trying to parse it
drm/nouveau/bios/pll: prevent oops when limits table can't be parsed
drm/nouveau/volt/gk104: round up in gk104_volt_set
drm/nouveau/fb/gm200: setup mmu debug buffer registers at init()
drm/nouveau/fb/gk20a,gm20b: setup mmu debug buffer registers at init()
drm/nouveau/fb/gf100-: allocate mmu debug buffers
drm/nouveau/fb: allow chipset-specific actions for oneinit()
...
We'll want to store the cdclk PLL (whatever PLL that is in reality) vco
frequency somewhere on other platforms too, so let's rename the
skl_vco_freq to cdclk_pll.vco, and let's store it in kHz instead of MHz
to match most of the other clocks.
v2: Drop the spurious > vs != change (Imre)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463172100-24715-14-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Now that skl_vco_freq tracks the actual DPLL0 vco frequency, we'll need
something that keeps track of which vco frequency we want to use in case
the current vco is 0. This would be important across supend/resume since
we'll disable DPLL0 around those parts.
We'll also update our idea of max cdclk/dotclock when the preferred
vco changes. That could happen if out initial guess was wrong, and
later eDP would force us to change it. One issue here could be that
changing the max dotclock could cause our mode list to change during
next time the displays get probed. But I don't see a good way to avoid
that, except perhaps by allowing either vco frequency to be used as
needed. But the docs suggest that such usage wasn't really inteded.
Also need to make sure we don't update our max_cdclk value before we
have a preferred vco value, which means moving that to happen after
the cdclk sanitation.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463172100-24715-9-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
WARNING: Using ChromeOS with an eDP panel and a 4K@60 DP monitor connected
to DDI1 the system will hard hang during a cold boot. Occurs when DDI1
is enabled when the cdclk is less then required. DP connected to DDI2
and HPD on either port works correctly.
Set cdclk based on the max required pixel clock based on VCO
selected. Track boot vco instead of boot cdclk.
The vco is now tracked at the atomic level and all CRTCs updated if
the required vco is changed. Not tested with eDP v1.4 panels that
require 8640 vco due to availability.
V1: initial version
V2: add vco tracking in intel_dp_compute_config(), rename
skl_boot_cdclk.
V3: rebase, V2 feedback not possible as encoders are not aware of
atomic.
V4: track target vco is atomic state. modeset all CRTCs if vco changes
V5: rename atomic variable, cleaner if/else logic, use existing vco if
encoder does not return a new vco value. check_patch.pl cleanup
V6: simplify logic in intel_modeset_checks.
V7: reorder an IF for readability and whitespace fix.
V8: use dev_cdclk for tracking new cdclk during atomic
V9: correctly handle vco 8640 when crtcs==0
V10: Clean up if else in crtcs==0
V11: Rebase for new intel_dpll_mgr.c
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com>
[vsyrjala: rebased due to churn]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463172100-24715-3-git-send-email-ville.syrjala@linux.intel.com
Current intel_opregion_init is called during the driver registration
phase and intel_opregion_fini from the unregistration phase. Rename the
functions so that this is clear from their names. The phases tell us
what we expect the existing hw state to be, e.g. whether interrupts are
still enabled etc.
It should be noted that the opregion init/fini routines are asymmetric
and this is carried across into their new names. Indeed, their new names
make it even clearer that perhaps all is not well in the opregion
suspend/resume sequence (as well in the module unload).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464012490-30961-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
Prefer passing struct drm_i915_private to internal interfaces as this
saves us having to dance between drm_device and our native struct. The
savings hare are small (only 70 bytes of unrequired dancing), but
progressive!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464012490-30961-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
For now, anything with a GuC requires uCode loading, and then supports
command submission once loaded. But these are logically distinct from
simply "having a GuC", so we need a separate macro for the latter. Then,
various tests should use this new macro rather than HAS_GUC_UCODE() or
testing enable_guc_submission.
v4:
Added a couple more uses of the new macro.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
DP dual mode type 1 DVI adaptors aren't required to implement any
registers, so it's a bit hard to detect them. The best way would
be to check the state of the CONFIG1 pin, but we have no way to
do that. So as a last resort, check the VBT to see if the HDMI
port is in fact a dual mode capable DP port.
v2: Deal with VBT code reorganization
Deal with DRM_DP_DUAL_MODE_UNKNOWN
Reduce DEVICE_TYPE_DP_DUAL_MODE_BITS a bit
Accept both DP and HDMI dvo_port in VBT as my BSW
at least declare its DP port as HDMI :(
v3: Ignore DEVICE_TYPE_NOT_HDMI_OUTPUT (Shashank)
Cc: stable@vger.kernel.org
Cc: Tore Anderson <tore@fud.no>
Reported-by: Tore Anderson <tore@fud.no>
Fixes: 7a0baa6234 ("Revert "drm/i915: Disable 12bpc hdmi for now"")
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462362322-31278-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
(cherry picked from commit d61992565b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Avoiding the out-of-line call to sg_next() reduces the kernel execution
overhead by 10% in some workloads (for example the Unreal Engine 4 demo
Atlantis on 2GiB GTTs) which are dominated by the cost of inserting PTEs
due to texture thrashing. We can demonstrate this in a microbenchmark
that forces us to rebind the object on every execbuf, where we can
measure a 25% improvement, in the time required to execute an execbuf
requiring a texture to be rebound, for inlining the sg_next() for large
texture sizes.
Benchmark: igt/benchmarks/gem_exec_fault
Benchmark: igt/benchmarks/gem_exec_trace/Atlantis
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-5-git-send-email-chris@chris-wilson.co.uk
The existing for_each_sg_page() iterator is somewhat heavyweight, and is
limiting i915 driver performance in a few benchmarks. So here we
introduce somewhat lighter weight iterators, primarily for use with GEM
objects or other case where we need only deal with whole aligned pages.
Unlike the old iterator, the new iterators use an internal state
structure which is not intended to be accessed by the caller; instead
each takes as a parameter an output variable which is set before each
iteration. This makes them particularly simple to use :)
One of the new iterators provides the caller with the DMA address of
each page in turn; the other provides the 'struct page' pointer required
by many memory management operations.
Various uses of for_each_sg_page() are then converted to the new macros.
v2: Force inlining of the sg_iter constructor and make the union
anonymous.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-4-git-send-email-chris@chris-wilson.co.uk
userptr directly only uses drm_device in a single interface where it
meant to use drm_i915_private (everywhere else we have to derive it from
the drm_i915_gem_object and so require going from drm_device).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463671036-3235-1-git-send-email-chris@chris-wilson.co.uk
When creating the hibernation image, the CPU will read the pages of all
objects and thus conflict with our domain tracking. We need to update
our domain tracking to accurately reflect the state on restoration.
v2: Perform the domain tracking inside freeze, before the image is
written, rather than upon restoration.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: David Weinehall <david.weinehall@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463207195-22076-2-git-send-email-chris@chris-wilson.co.uk