Now that we've moved the Gen9 GuC blobs to version 32 we have CTB
support on all gens, so no need to restrict the usage to Gen11+.
Note that MMIO communication is still required for CTB initialization.
v2: fix commit message nits (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190606224225.14287-1-daniele.ceraolospurio@intel.com
Use i915_gem_object_lock() to guard the LUT and active reference to
allow us to break free of struct_mutex for handling GEM_CLOSE.
Testcase: igt/gem_close_race
Testcase: igt/gem_exec_parallel
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190606112320.9704-1-chris@chris-wilson.co.uk
This allows us to avoid iterating the child devices in some cases.
Also replace the presence bit with child device being non-NULL, and set
the child device pointer last to allow us to take advantage of it in
follow-up work.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ceccb75d637af3134d0328d67cbd6623932f94db.1559308269.git.jani.nikula@intel.com
In this patch, a vfunc read_luts() is introduced to create a hw lut
i.e. lut having values read from gamma/degamma registers which will
later be used to compare with sw lut to validate gamma/degamma lut values.
v3: -Rebase
v4: -Renamed intel_get_color_config to intel_color_get_config [Jani]
-Wrapped get_color_config() [Jani]
v5: -Renamed intel_color_get_config() to intel_color_read_luts()
-Renamed get_color_config to read_luts
v6: -Renamed intel_color_read_luts() back to intel_color_get_config()
[Jani and Ville]
Signed-off-by: Swati Sharma <swati2.sharma@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1559123462-7343-2-git-send-email-swati2.sharma@intel.com
Move over structures, enums and macros from intel_display.h and
i915_drv.h to have all the display PM defines in the same header.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190531222409.9177-3-daniele.ceraolospurio@intel.com
Keep all the device-level PM management in intel_runtime_pm.h/c and move
all the display specific bits into their own file. Also add the new
header to Makefile.header-test.
Apart from the giant code move, the only difference is with the
intel_runtime_<get/put>_raw() functions, which are now exposed in the
header. The _put() version is also not conditionally compiled anymore
since it is ok to always pass the wakeref taken from the _get() to
__intel_runtime_pm_put (it is -1 if tracking is disabled).
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190531222409.9177-2-daniele.ceraolospurio@intel.com
Currently, we try to report to the shrinker the precise number of
objects (pages) that are available to be reaped at this moment. This
requires searching all objects with allocated pages to see if they
fulfill the search criteria, and this count is performed quite
frequently. (The shrinker tries to free ~128 pages on each invocation,
before which we count all the objects; counting takes longer than
unbinding the objects!) If we take the pragmatic view that with
sufficient desire, all objects are eventually reapable (they become
inactive, or no longer used as framebuffer etc), we can simply return
the count of pinned pages maintained during get_pages/put_pages rather
than walk the lists every time.
The downside is that we may (slightly) over-report the number of
objects/pages we could shrink and so penalize ourselves by shrinking
more than required. This is mitigated by keeping the order in which we
shrink objects such that we avoid penalizing active and frequently used
objects, and if memory is so tight that we need to free them we would
need to anyway.
v2: Only expose shrinkable objects to the shrinker; a small reduction in
not considering stolen and foreign objects.
v3: Restore the tracking from a "backup" copy from before the gem/ split
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190530203500.26272-2-chris@chris-wilson.co.uk
Currently the purgeable objects, I915_MADV_DONTNEED, are mixed in the
normal bound/unbound lists. Every shrinker pass starts with an attempt
to purge from this set of unneeded objects, which entails us doing a
walk over both lists looking for any candidates. If there are none, and
since we are shrinking we can reasonably assume that the lists are
full!, this becomes a very slow futile walk.
If we separate out the purgeable objects into own list, this search then
becomes its own phase that is preferentially handled during shrinking.
Instead the cost becomes that we then need to filter the purgeable list
if we want to distinguish between bound and unbound objects.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190530203500.26272-1-chris@chris-wilson.co.uk
The i915.alpha_support module parameter has caused some confusion along
the way. Add new i915.force_probe parameter to specify PCI IDs of
devices to probe, when the devices are recognized but not automatically
probed by the driver. The name is intended to reflect what the parameter
effectively does, avoiding any overloaded semantics of "alpha" and
"support".
The parameter supports "" to disable, "<pci-id>,[<pci-id>,...]" to
enable force probe for one or more devices, and "*" to enable force
probe for all known devices.
Also add new CONFIG_DRM_I915_FORCE_PROBE config option to replace the
DRM_I915_ALPHA_SUPPORT option. This defaults to "*" if
DRM_I915_ALPHA_SUPPORT=y.
Instead of replacing i915.alpha_support immediately, let the two coexist
for a while, with a deprecation message, for a transition period.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190506134801.28751-1-jani.nikula@intel.com
In order to support driver hot unbind, some cleanup operations, now
performed on PCI driver remove, must be called later, after all device
file descriptors are closed.
Split out those operations from the tail of pci_driver.remove()
callback and put them into drm_driver.release() which is called as soon
as all references to the driver are put. As a result, those cleanups
will be now run on last drm_dev_put(), either still called from
pci_driver.remove() if all device file descriptors are already closed,
or on last drm_release() file operation.
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190530133105.30467-1-janusz.krzysztofik@linux.intel.com
Out scatterlist utility routines can be pulled out of i915_gem.h for a
bit more decluttering.
v2: Push I915_GTT_PAGE_SIZE out of i915_scatterlist itself and into the
caller.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-9-chris@chris-wilson.co.uk
Currently the code for manipulating the pages on an object is still
residing in i915_gem.c, move it to i915_gem_object.c
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-3-chris@chris-wilson.co.uk
For convenience in avoiding inline spaghetti, keep the type definition
as a separate header.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-1-chris@chris-wilson.co.uk
Do not allow runtime pm autosuspend to remove userspace GGTT mmaps too
quickly. For example, igt sets the autosuspend delay to 0, and so we
immediately attempt to perform runtime suspend upon releasing the
wakeref. Unfortunately, that involves tearing down GGTT mmaps as they
require an active device.
Override the autosuspend for GGTT mmaps, by keeping the wakeref around
for 250ms after populating the PTE for a fresh mmap.
v2: Prefer refcount_t for its under/overflow error detection
v3: Flush the user runtime autosuspend prior to system system.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190527115114.13448-1-chris@chris-wilson.co.uk
ICL has so many planes that it can easily exceed the maximum
effective memory bandwidth of the system. We must therefore check
that we don't exceed that limit.
The algorithm is very magic number heavy and lacks sufficient
explanation for now. We also have no sane way to query the
memory clock and timings, so we must rely on a combination of
raw readout from the memory controller and hardcoded assumptions.
The memory controller values obviously change as the system
jumps between the different SAGV points, so we try to stabilize
it first by disabling SAGV for the duration of the readout.
The utilized bandwidth is tracked via a device wide atomic
private object. That is actually not robust because we can't
afford to enforce strict global ordering between the pipes.
Thus I think I'll need to change this to simply chop up the
available bandwidth between all the active pipes. Each pipe
can then do whatever it wants as long as it doesn't exceed
its budget. That scheme will also require that we assume that
any number of planes could be active at any time.
TODO: make it robust and deal with all the open questions
v2: Sleep longer after disabling SAGV
v3: Poll for the dclk to get raised (seen it take 250ms!)
If the system has 2133MT/s memory then we pointlessly
wait one full second :(
v4: Use the new pcode interface to get the qgv points rather
that using hardcoded numbers
v5: Move the pcode stuff into intel_bw.c (Matt)
s/intel_sagv_info/intel_qgv_info/
Do the NV12/P010 as per spec for now (Matt)
s/IS_ICELAKE/IS_GEN11/
v6: Ignore bandwidth limits if the pcode query fails
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190524153614.32410-1-ville.syrjala@linux.intel.com
To overcome display engine stride limits we'll want to remap the
pages in the GTT. To that end we need a new gtt_view type which
is just like the "rotated" type except not rotated.
v2: Use intel_remapped_plane_info base type
s/unused/unused_mbz/ (Chris)
Separate BUILD_BUG_ON()s (Chris)
Use I915_GTT_PAGE_SIZE (Chris)
v3: Use i915_gem_object_get_dma_address() (Chris)
Trim the sg (Tvrtko)
v4: Actually trim this time. Limit the max length
to one row of pages to keep things simple
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-2-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
By disabling a power domain asynchronously we can restrict holding a
reference on that power domain to the actual code sequence that
requires the power to be on for the HW access it's doing, by also
avoiding unneeded on-off-on togglings of the power domain (since the
disabling happens with a delay).
One benefit is potential power saving due to the following two reasons:
1. The fact that we will now be holding the reference only for the
necessary duration by the end of the patchset. While simply not
delaying the disabling has the same benefit, it has the problem that
frequent on-off-on power switching has its own power cost (see the 2.
point below) and the debug trace for power well on/off events will
cause a lot of dmesg spam (see details about this further below).
2. Avoiding the power cost of freuqent on-off-on power switching. This
requires us to find the optimal disabling delay based on the measured
power cost of on->off and off->on switching of each power well vs.
the power of keeping the given power well on.
In this patchset I'm not providing this optimal delay for two
reasons:
a) I don't have the means yet to perform the measurement (with high
enough signal-to-noise ratio, or with the help of an energy
counter that takes switching into account). I'm currently looking
for a way to measure this.
b) Before reducing the disabling delay we need an alternative way for
debug tracing powerwell on/off events. Simply avoiding/throttling
the debug messages is not a solution, see further below.
Note that even in the case where we can't measure any considerable
power cost of frequent on-off switching of powerwells, it still would
make sense to do the disabling asynchronously (with 0 delay) to avoid
blocking on the disabling. On VLV I measured this disabling time
overhead to be 1ms on average with a worst case of 4ms.
In the case of the AUX power domains on ICL we would also need to keep
the sequence where we hold the power reference short, the way it would
be by the end of this patchset where we hold it only for the actual AUX
transfer. Anything else would make the locking we need for ICL TypeC
ports (whenever we hold a reference on any AUX power domain) rather
problematic, adding for instance unnecessary lockdep dependencies to
the required TypeC port lock.
I chose the disabling delay to be 100msec for now to avoid the unneeded
toggling (and so not to introduce dmesg spamming) in the DP MST sideband
signaling code. We could optimize this delay later, once we have the
means to measure the switching power cost (see above).
Note that simply removing/throttling the debug tracing for power well
on/off events is not a solution. We need to know the exact spots of
these events and cannot rely only on incorrect register accesses caught
(due to not holding a wakeref at the time of access). Incorrect
powerwell enabling/disabling could lead to other problems, for instance
we need to keep certain powerwells enabled for the duration of modesets
and AUX transfers.
v2:
- Clarify the commit log parts about power cost measurement and the
problem of simply removing/throttling debug tracing. (Chris)
- Optimize out local wakeref vars at intel_runtime_pm_put_raw() and
intel_display_power_put_async() call sites if
CONFIG_DRM_I915_DEBUG_RUNTIME_PM=n. (Chris)
- Rebased on v2 of the wakeref w/o power-on guarantee patch.
- Add missing docbook headers.
v3:
- Checkpatch spelling/missing-empty-line fix.
v4:
- Fix unintended local wakeref var optimization when using
call-arguments with side-effects, by using inline funcs instead of
macros. In this patch in particular this will fix the
intel_display_power_grab_async_put_ref()->intel_runtime_pm_put_raw()
call).
No size change in practice (would be the same disregarding the
corresponding change in intel_display_power_grab_async_put_ref()):
$ size i915-macro.ko
text data bss dec hex filename
2455190 105890 10272 2571352 273c58 i915-macro.ko
$ size i915-inline.ko
text data bss dec hex filename
2455195 105890 10272 2571357 273c5d i915-inline.ko
Kudos to Stan for reporting the raw-wakeref WARNs this issue caused. His
config has CONFIG_DRM_I915_DEBUG_RUNTIME_PM=n, which I didn't retest
after v1, and we are also not testing this config in CI.
Now tested both with CONFIG_DRM_I915_DEBUG_RUNTIME_PM=y/n on ICL,
connecting both Chamelium and regular DP, HDMI sinks.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190513192533.12586-1-imre.deak@intel.com
If the HW fails to ack a change in forcewake status, the machine is as
good as dead -- it may recover, but in reality it missed the mmio
updates and is now in a very inconsistent state. If it happens, we can't
trust the CI results (or at least the fails may be genuine but due to
the HW being dead and not the actual test!) so reboot the machine (CI
checks for a kernel taint in between each test and reboots if the
machine is tainted).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190508115245.27790-1-chris@chris-wilson.co.uk
For us KBP is 100% identical to SPT. Kill the redundant enum
value. Also bspec doesn't talk about KBP either, so this might
avoid some confusion when cross checking the code against the
spec.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190506152627.20283-1-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
The original intent for the delay before running the idle_work was to
provide a hysteresis to avoid ping-ponging the device runtime-pm. Since
then we have also pulled in some memory management and general device
management for parking. But with the inversion of the wakeref handling,
GEM is no longer responsible for the wakeref and by the time we call the
idle_work, the device is asleep. It seems appropriate now to drop the
delay and just run the worker immediately to flush the cached GEM state
before sleeping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507121108.18377-2-chris@chris-wilson.co.uk
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
While at it, rename intel_i2c.c to intel_gmbus.c and the functions to
intel_gmbus_*.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/5834b8fbbfd4ac2e3d0159e69c87f6926066f537.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2843b028d65e118dc40316aa84bf620a93f6c67b.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9bc1317a67df0b9d019eca5b36f474b76a1cad26.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9101a58b9f10bcf11332175e17b6e6e45f4ebd17.1556809195.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/876a1671a84c6839bcafdf276cf9c4e1da6c631c.1556809195.git.jani.nikula@intel.com
The workqueue code complains viciously if we try to queue more work onto
the queue while attampting to drain it. As we asynchronously free
objects and defer their enqueuing with RCU, it is quite tricky to
quiesce the system before attempting to drain the workqueue. Yet drain
we must to ensure that the worker is idle before unloading the module.
Give the freed object drain 3 whole passes with multiple rcu_barrier()
to give the defer freeing of several levels each protected by RCU and
needing a grace period before its parent can be freed, ultimately
resulting in a GEM object being freed after another RCU period.
A consequence is that it will make module unload even slower.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110550
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501135753.8711-1-chris@chris-wilson.co.uk
Make the engine responsible for cleaning itself up!
This removes the i915->gt.cleanup vfunc that has been annoying the
casual reader and myself for the last several years, and helps keep a
future patch to add more cleanup tidy.
v2: Assert that engine->destroy is set after the backend starts
allocating its own state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501103204.18632-1-chris@chris-wilson.co.uk
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6aea17072684dec0b04b6831c0c0e5a134edf87e.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87904259868782c1ad664d852b27a50c1597cfaa.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/76d2719b462004ec6f6f5c302ee5d3876357c599.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2e4fb1e67ed38870df3040bb0a1b1a58fd90cc86.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the header remains self-contained, and do so with minimal further
includes, using forward declarations as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/cf9b17d56489e15d82356575037432ad04712475.1556540890.git.jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
v2: fix sparse warnings on undeclared global functions
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190429125011.10876-1-jani.nikula@intel.com
It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.
No functional changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/64e46278dc8dccc9c548ef453cb2ceece5367bb2.1556540890.git.jani.nikula@intel.com
We now have two locks for sideband access. The general one covering
sideband access across all generation, sb_lock, and a specific one
covering sideband access via the punit on vlv/chv. After lifting the
sb_lock around the punit into the callers, the pcu_lock is now redudant
and can be separated from its other use to regulate RPS (essentially
giving RPS a lock all of its own).
v2: Extract a couple of minor bug fixes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426081725.31217-4-chris@chris-wilson.co.uk
As we now employ a very heavy pm_qos around the punit access, we want to
minimise the number of synchronous requests by performing one for the
whole punit sequence rather than around individual accesses. The
sideband lock is used for this, so push the pm_qos into the sideband
lock acquisition and release, moving it from the lowlevel punit rw
routine to the callers. In the first step, we move the punit magic into
the common sideband lock so that we can acquire a bunch of ports
simultaneously, and if need be extend the workaround protection later.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426081725.31217-2-chris@chris-wilson.co.uk
While we talk to the punit over its sideband, we need to prevent the cpu
from sleeping in order to prevent a potential machine hang.
Note that by itself, it appears that pm_qos_update_request (via
intel_idle) doesn't provide a sufficient barrier to ensure that all core
are indeed awake (out of Cstate) and that the package is awake. To do so,
we need to supplement the pm_qos with a manual ping on_each_cpu.
v2: Restrict the heavy-weight wakeup to just the ISOF_PORT_PUNIT, there
is insufficient evidence to implicate a wider problem atm. Similarly,
restrict the w/a to Valleyview, as Cherryview doesn't have an angry cadre
of users.
The working theory, courtesy of Ville and Hans, is the issue lies within
the power delivery and so is likely to be unit and board specific and
occurs when both the unit/fw require extra power at the same time as the
cpu package is changing its own power state.
References: https://bugzilla.kernel.org/show_bug.cgi?id=109051
References: https://bugs.freedesktop.org/show_bug.cgi?id=102657
References: https://bugzilla.kernel.org/show_bug.cgi?id=195255
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426081725.31217-1-chris@chris-wilson.co.uk
In the current scheme, on submitting a request we take a single global
GEM wakeref, which trickles down to wake up all GT power domains. This
is undesirable as we would like to be able to localise our power
management to the available power domains and to remove the global GEM
operations from the heart of the driver. (The intent there is to push
global GEM decisions to the boundary as used by the GEM user interface.)
Now during request construction, each request is responsible via its
logical context to acquire a wakeref on each power domain it intends to
utilize. Currently, each request takes a wakeref on the engine(s) and
the engines themselves take a chipset wakeref. This gives us a
transition on each engine which we can extend if we want to insert more
powermangement control (such as soft rc6). The global GEM operations
that currently require a struct_mutex are reduced to listening to pm
events from the chipset GT wakeref. As we reduce the struct_mutex
requirement, these listeners should evaporate.
Perhaps the biggest immediate change is that this removes the
struct_mutex requirement around GT power management, allowing us greater
flexibility in request construction. Another important knock-on effect,
is that by tracking engine usage, we can insert a switch back to the
kernel context on that engine immediately, avoiding any extra delay or
inserting global synchronisation barriers. This makes tracking when an
engine and its associated contexts are idle much easier -- important for
when we forgo our assumed execution ordering and need idle barriers to
unpin used contexts. In the process, it means we remove a large chunk of
code whose only purpose was to switch back to the kernel context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
For controlling runtime pm of the GT and engines, we would like to have
a callback to do extra work the first time we wake up and the last time
we drop the wakeref. This first/last access needs serialisation and so
we encompass a mutex with the regular intel_wakeref_t tracker.
v2: Drop the _once naming and report the errors.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc; Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-1-chris@chris-wilson.co.uk
Start partitioning off the code that talks to the hardware (GT) from the
uapi layers and move the device facing code under gt/
One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to
subdivide that header and body further (and split out the submission
code from the ringbuffer and logical context handling). This patch aims
to be simple motion so git can fixup inflight patches with little mess.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk
As we push for better compartmentalisation, it is more convenient to
copy the default sseu configuration from the engine into the derived
logical context, than it is to dig it out from i915->runtime_info.
v2: Use intel_sseu_from_device_info() to describe the converter
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424095134.30249-1-chris@chris-wilson.co.uk
When we are called to relieve mempressue via the shrinker, the only way
we can make progress is either by discarding unwanted pages (those
objects that userspace has marked MADV_DONTNEED) or by reclaiming the
dirty objects via swap. As we know that is the only way to make further
progress, we can initiate the writeback as we invalidate the objects.
This means the objects we put onto the inactive anon lru list are
already marked for reclaim+writeback and so will trigger a wait upon the
writeback inside direct reclaim, greatly improving the success rate of
direct reclaim on i915 objects.
The corollary is that we may start a slow swap on opportunistic
mempressure from the likes of the compaction + migration kthreads. This
is limited by those threads only being allowed to shrink idle pages, but
also that if we reactivate the page before it is swapped out by gpu
activity, we only page the cost of repinning the page. The cost is most
felt when an object is reused after mempressure, which hopefully
excludes the latency sensitive tasks (as we are just extending the
impact of swap thrashing to them).
Apparently this is not the first time we've had this idea. Back in
commit 5537252b6b ("drm/i915: Invalidate our pages under memory
pressure") we wanted to start writeback but settled on invalidate after
Hugh Dickins warned us about a possibility of a deadlock within shmemfs
if we started writeback from shrink_slab. Looking at the callchain,
using writeback from i915_gem_shrink should be equivalent to the pageout
also employed by shrink_slab, i.e. it should not be any riskier afaict.
v2: Leave mmapings intact. At this point, the only mmapings of our
objects will be via CPU mmaps on the shmemfs filp, which are
out-of-scope for our LRU tracking. Instead leave those pages to the
inactive anon LRU page list for aging and pageout as normal.
v3: Be selective on which paths trigger writeback, in particular
excluding paths shrinking just to reclaim vm space (e.g. mmap, vmap
reapers) and avoid starting writeback on the entire process space from
within the pm freezer.
References: https://bugs.freedesktop.org/show_bug.cgi?id=108686
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Hocko <mhocko@suse.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190420115539.29081-1-chris@chris-wilson.co.uk
For consistency (and elegance!), add intel_device_info.has_rps.
The immediate boon is that RPS support is now emitted along the other
capabilities in the debug log and after errors.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419134836.5626-1-chris@chris-wilson.co.uk
On resume, we know that the only pinned contexts in danger of seeing
corruption are the kernel context, and so we do not need to walk the
list of all GEM contexts as we tracked them on each engine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190410190120.830-1-chris@chris-wilson.co.uk
As soon as a device is considered unplugged, not only prevent pending
users from accessing the device structures but also cancel all their
pending requests so all consumed resources can be cleaned up as soon
as possible.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190406104034.31380-1-chris@chris-wilson.co.uk
If we have only a single active pipe and the cdclk change only requires
the cd2x divider to be updated bxt+ can do the update with forcing a full
modeset on the pipe. Try to hook that up.
v2:
- Wait for vblank after an optimized CDCLK change.
- Avoid optimization if the pipe needs a modeset (or was disabled).
- Split CDCLK change to a pre/post plane update step.
v3:
- Use correct version of CDCLK state as old state. (Ville)
- Remove unused intel_cdclk_can_skip_modeset()
v4:
- For consistency call intel_set_cdclk_post_plane_update() only during
modesets (and not fastsets).
v5:
- Remove the logic to update the CD2X divider on-the-fly on ICL, since
only a divider of 1 is supported there. Clint also noticed that the
pipe select bits in CDCLK_CTL are oddly defined on ICL, it's not clear
yet whether that's only an error in the specification.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Abhay Kumar <abhay.kumar@intel.com>
Tested-by: Abhay Kumar <abhay.kumar@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190327101321.3095-1-imre.deak@intel.com
CDCLK has to be at least twice the BLCK regardless of audio. Audio
driver has to probe using this hook and increase the clock even in
absence of any display.
v2: Use atomic refcount for get_power, put_power so that we can
call each once(Abhay).
v3: Reset power well 2 to avoid any transaction on iDisp link
during cdclk change(Abhay).
v4: Remove Power well 2 reset workaround(Ville).
v5: Remove unwanted Power well 2 register defined in v4(Abhay).
v6:
- Use a dedicated flag instead of state->modeset for min CDCLK changes
- Make get/put audio power domain symmetric
- Rebased on top of intel_wakeref tracking changes.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Abhay Kumar <abhay.kumar@intel.com>
Tested-by: Abhay Kumar <abhay.kumar@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190320135439.12201-1-imre.deak@intel.com
We want to use intel_engine_mask_t inside i915_request.h, which means
extracting it from the general header file mess and placing it inside a
types.h. A knock on effect is that the compiler wants to warn about
type-contraction of ALL_ENGINES into intel_engine_maskt_t, so prepare
for the worst.
v2: Use intel_engine_mask_t consistently
v3: Move I915_NUM_ENGINES to its natural home at the end of the enum
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401162641.10963-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Concept of a sub-platform already exist in our code (like ULX and ULT
platform variants and similar),implemented via the macros which check a
list of device ids to determine a match.
With this patch we consolidate device ids checking into a single function
called during early driver load.
A few low bits in the platform mask are reserved for sub-platform
identification and defined as a per-platform namespace.
At the same time it future proofs the platform_mask handling by preparing
the code for easy extending, and tidies the very verbose WARN strings
generated when IS_PLATFORM macros are embedded into a WARN type
statements.
v2: Fixed IS_SUBPLATFORM. Updated commit msg.
v3: Chris was right, there is an ordering problem.
v4:
* Catch-up with new sub-platforms.
* Rebase for RUNTIME_INFO.
* Drop subplatform mask union tricks and convert platform_mask to an
array for extensibility.
v5:
* Fix subplatform check.
* Protect against forgetting to expand subplatform bits.
* Remove platform enum tallying.
* Add subplatform to error state. (Chris)
* Drop macros and just use static inlines.
* Remove redundant IRONLAKE_M. (Ville)
v6:
* Split out Ironlake change.
* Optimize subplatform check.
* Use __always_inline. (Lucas)
* Add platform_mask comment. (Paulo)
* Pass stored runtime info in error capture. (Chris)
v7:
* Rebased for new AML ULX device id.
* Bump platform mask array size for EHL.
* Stop mentioning device ids in intel_device_subplatform_init by using
the trick of splitting macros i915_pciids.h. (Jani)
* AML seems to be either a subplatform of KBL or CFL so express it like
that.
v8:
* Use one device id table per subplatform. (Jani)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Jose Souza <jose.souza@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190327142328.31780-1-tvrtko.ursulin@linux.intel.com
IS_IRONLAKE_M can use the already defined intel_device_info.is_mobile for
this platform, so remove the instance of Ironlake's mobile device id from
the header file and replace it with an IS_MOBILE check.
v2:
* Improved commit text. (Chris)
v3:
* Rebased for EHL.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190326074057.27833-3-tvrtko.ursulin@linux.intel.com
This allows the IS_PINEVIEW_<G|M> macros to be removed and avoid
duplication of device ids already defined in i915_pciids.h.
!IS_MOBILE check can be used in place of existing IS_PINEVIEW_G call
sites.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190326074057.27833-2-tvrtko.ursulin@linux.intel.com
edram is not part of uncore and there is no requirement for the
detection to be done before we initialize the uncore functions. The
first check on HAS_EDRAM is in the ggtt_init path, so move it to
i915_driver_init_hw, where other dram-related detection happens.
While at it, save the size in MB instead of the capabilities because the
size is the only thing we look at outside of the init function.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190328174533.31532-1-daniele.ceraolospurio@intel.com
The current intel_color_check() is a mess, and worse yet it is
in fact incorrect for several platforms. The hardware has
evolved quite a bit over the years, so let's just go for a clean
split between the platforms by turning this into a vfunc.
The actual work to split it up will follow.
v2: Assign the vfuncs in the order they appear in the
struct (Matt)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190327155045.28446-3-ville.syrjala@linux.intel.com
Tvrtko spotted that I left off the trailing ';'. It went unnoticed by CI
because despite adding the macro, we didn't add a user, so include one as
well (a simple debug print).
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 97ee6e9255 ("drm/i915: stop storing the media fuse")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190326180007.11722-1-chris@chris-wilson.co.uk
The full read/write ops can now work on the intel_uncore struct.
Introduce intel_uncore_read/write functions working on intel_uncore
and switch the I915_READ/WRITE macro to internally call those.
v2: no change
v3: add intel_uncore_read/write functions (Chris), update commit msg
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-6-daniele.ceraolospurio@intel.com
They now work on uncore, so use raw_uncore_ prefix. Also move them to
uncore.h
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-2-daniele.ceraolospurio@intel.com
We're already updating the engine_mask to reflect what's in the HW, so
we can just get the info from there. A couple of macros have been added
to facilitate this.
v2: Appease checkpatch
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322002431.9585-1-daniele.ceraolospurio@intel.com
Iterate over child devices instead of ports in parse_ddi_ports() to
initialize ddi_port_info. We'll eventually need to decide some stuff
based on the child device order, which may be different from the port
order.
As a bonus, this allows better abstractions for e.g. dvo port mapping.
There's a subtle change in the DDC pin and AUX channel sanitization as
we change the order. Otherwise, this should not change behaviour.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322121008.4456-1-jani.nikula@intel.com
The AGPBUSY thing doesn't work on i945gm anymore. This means
the gmch is incapable of waking the CPU from C3 when an interrupt
is generated. The interrupts just get postponed indefinitely until
something wakes up the CPU. This is rather annoying for vblank
interrupts as we are unable to maintain a steady framerate
unless the machine is sufficiently loaded to stay out of C3.
To combat this let's use pm_qos to prevent C3 whenever vblank
interrupts are enabled. To maintain reasonable amount of powersaving
we will attempt to limit this to C3 only while leaving C1 and C2
enabled.
v2: Use READ_ONCE() (Chris)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30364
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322180804.3300-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
If I'm reading the spec right AML 0x87CA is a Y SKU, so it
should be marked as ULX in our old style terminology.
Cc: stable@vger.kernel.org
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: c0c46ca461 ("drm/i915/aml: Add new Amber Lake PCI ID")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322204944.23613-1-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Add ElkhartLake as a unique platform as there are some differences
between it and Icelake.
Signed-off-by: Bob Paauwe <bob.j.paauwe@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322175847.25707-2-rodrigo.vivi@intel.com
In preparation to making the ppGTT binding for a context explicit (to
facilitate reusing the same ppGTT between different contexts), allow the
user to create and destroy named ppGTT.
v2: Replace global barrier for swapping over the ppgtt and tlbs with a
local context barrier (Tvrtko)
v3: serialise with struct_mutex; it's lazy but required dammit
v4: Rewrite igt_ctx_shared_exec to be more different (aimed to be more
similarly, turned out different!)
v5: Fix up test unwind for aliasing-ppgtt (snb)
v6: Tighten language for uapi struct drm_i915_gem_vm_control.
v7: Patch the context image for runtime ppgtt switching!
Testcase: igt/gem_vm_create
Testcase: igt/gem_ctx_param/vm
Testcase: igt/gem_ctx_clone/vm
Testcase: igt/gem_ctx_shared
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-2-chris@chris-wilson.co.uk
When we return pages to the system, we ensure that they are marked as
being in the CPU domain since any external access is uncontrolled and we
must assume the worst. This means that we need to always flush the pages
on acquisition if we need to use them on the GPU, and from the beginning
have used set-domain. Set-domain is overkill for the purpose as it is a
general synchronisation barrier, but our intent is to only flush the
pages being swapped in. If we move that flush into the pages acquisition
phase, we know then that when we have obj->mm.pages, they are coherent
with the GPU and need only maintain that status without resorting to
heavy handed use of set-domain.
The principle knock-on effect for userspace is through mmap-gtt
pagefaulting. Our uAPI has always implied that the GTT mmap was async
(especially as when any pagefault occurs is unpredicatable to userspace)
and so userspace had to apply explicit domain control itself
(set-domain). However, swapping is transparent to the kernel, and so on
first fault we need to acquire the pages and make them coherent for
access through the GTT. Our use of set-domain here leaks into the uABI
that the first pagefault was synchronous. This is unintentional and
baring a few igt should be unoticed, nevertheless we bump the uABI
version for mmap-gtt to reflect the change in behaviour.
Another implication of the change is that gem_create() is presumed to
create an object that is coherent with the CPU and is in the CPU write
domain, so a set-domain(CPU) following a gem_create() would be a minor
operation that merely checked whether we could allocate all pages for
the object. On applying this change, a set-domain(CPU) causes a clflush
as we acquire the pages. This will have a small impact on mesa as we move
the clflush here on !llc from execbuf time to create, but that should
have minimal performance impact as the same clflush exists but is now
done early and because of the clflush issue, userspace recycles bo and
so should resist allocating fresh objects.
Internally, the presumption that objects are created in the CPU
write-domain and remain so through writes to obj->mm.mapping is more
prevalent than I expected; but easy enough to catch and apply a manual
flush.
For the future, we should push the page flush from the central
set_pages() into the callers so that we can more finely control when it
is applied, but for now doing it one location is easier to validate, at
the cost of sometimes flushing when there is no need.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk
Define a mutex for the exclusive use of interacting with the per-file
context-idr, that was previously guarded by struct_mutex. This allows us
to reduce the coverage of struct_mutex, with a view to removing the last
bits coordinating GEM context later. (In the short term, we avoid taking
struct_mutex while using the extended constructor functions, preventing
some nasty recursion.)
v2: s/context_lock/context_idr_lock/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321140711.11190-2-chris@chris-wilson.co.uk
This allows us to ditch i915 in some more places.
v2: use local var in check_vgpu (Paulo)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190319183543.13679-9-daniele.ceraolospurio@intel.com
This will allow futher simplifications in the uncore handling.
v2: move register access setup under uncore (Chris)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190319183543.13679-8-daniele.ceraolospurio@intel.com
Get/put functions used outside of uncore.c are updated in the next
patch for a nicer split.
v2: use dev_priv where we still have it (Paulo)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190319183543.13679-3-daniele.ceraolospurio@intel.com
With the introduction of the separate addressable bits into the device
info, we can remove the conflation of the ppgtt size from the ppgtt
type.
Based on a patch by Bob Paauwe.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190314223839.28258-3-chris@chris-wilson.co.uk
As the maximum addressable bits is determined by platform, record that
information in our static chipset tables. This has the advantage of
being clearly recorded in our capability dumps for dmesg, debugfs and
error states.
Based on a patch by Bob Paauwe.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190314223839.28258-2-chris@chris-wilson.co.uk
A new field with the training pattern(TP) wakeup time for PSR2 was
added to VBT, so lets use it when available otherwise it will
fallback to PSR1 wakeup time.
v2: replacing enum to numerical usec time (Jani)
BSpec: 20131
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190312195743.8829-1-jose.souza@intel.com
In order to make it easier to bring up new platforms
without having to take care about all corner cases
that was previously taken care for previous platforms
we already use comparative INTEL_GEN statements.
Let's start doing the same with PCH.
The only caveats are:
- less-than comparisons need to be avoided or done with
attention and check > PCH_NONE as well.
- It is not necessarily a chronological order, but a matter
of south display compatibility/inheritance.
v2: Rebased on top of Jani's clean-up which removed the
need for less-than comparison
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308214300.25057-3-rodrigo.vivi@intel.com
So we can later use PCH >= comparisons. The ultimate goal
is to make it easier for us to introduce a new platform
with south display engine on PCH just by reusing the previous
one.
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308214300.25057-2-rodrigo.vivi@intel.com
Currently we assume that we know the order in which requests run and so
can determine if we need to reissue a switch-to-kernel-context prior to
idling. That assumption does not hold for the future, so instead of
tracking which barriers have been used, simply determine if we have ever
switched away from the kernel context by using the engine and before
idling ensure that all engines that have been used since the last idle
are synchronously switched back to the kernel context for safety (and
else of shrinking memory while idle).
v2: Use intel_engine_mask_t and ALL_ENGINES
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-3-chris@chris-wilson.co.uk
When the system idles, we switch to the kernel context as a defensive
measure (no users are harmed if the kernel context is lost). Currently,
we issue a switch to kernel context and then come back later to see if
the kernel context is still current and the system is idle. However,
if we are no longer privy to the runqueue ordering, then we have to
relax our assumptions about the logical state of the GPU and the only
way to ensure that the kernel context is currently loaded is by issuing
a request to run after all others, and wait for it to complete all while
preventing anyone else from issuing their own requests.
v2: Pull wedging into switch_to_kernel_context_sync() but only after
waiting (though only for the same short delay) for the active context to
finish.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-1-chris@chris-wilson.co.uk
We'll need to know the memory type in the system for some
bandwidth limitations and whatnot. Let's read that out on
gen9+.
v2: Rebase
v3: Fix the copy paste fail in the BXT bit definitions (Jani)
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190306203551.24592-13-ville.syrjala@linux.intel.com
Pass the dimm struct to skl_is_16gb_dimm() rather than passing each
value separately. And let's replace the hardcoded set of values with
some simple arithmetic.
Also fix the byte vs. bit inconsistency in the debug message,
and polish the wording otherwise as well.
v2: Deobfuscate the math (Chris)
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190306203551.24592-4-ville.syrjala@linux.intel.com
Instead of passing the gem_context and engine to find the instance of
the intel_context to use, pass around the intel_context instead. This is
useful for the next few patches, where the intel_context is no longer a
direct lookup.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190306084704.15755-1-chris@chris-wilson.co.uk
To find the active request, we need only search along the individual
engine for the right request. This does not require touching any global
GEM state, so move it into the engine compartment.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-3-chris@chris-wilson.co.uk
In the next patch, we are introducing a broad virtual engine to encompass
multiple physical engines, losing the 1:1 nature of BIT(engine->id). To
reflect the broader set of engines implied by the virtual instance, lets
store the full bitmask.
v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/)
v3: Tvrtko voted for moah churn so teach everyone to not mention ring
and use $class$instance throughout.
v4: Comment upon the disparity in bspec for using VCS1,VCS2 in gen8 and
VCS[0-4] in later gen. We opt to keep the code consistent and use
0-index naming throughout.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-1-chris@chris-wilson.co.uk
Define a HAS_TRANSCODER_EDP() macro that checks if we have defined an
offset for this transcoder. This allows platforms to be defined without
eDP transcoder.
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190222230254.20351-2-lucas.demarchi@intel.com
We currently use a worker queued from an rcu callback to determine when
a how grace period has elapsed while we remained idle. We use this idle
delay to infer that we will be idle for a while and this is a suitable
point at which we can trim our global memory caches.
Since we wrote that, this mechanism now exists as rcu_work, and having
converted the idle shrinkers over to using that, we can remove our own
variant.
v2: Say goodbye to gt.epoch as well.
v3: Remove the misplaced and redundant comment before parking globals
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-3-chris@chris-wilson.co.uk
As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potential has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.
The benefit for us is much reduced pointer dancing along the frequent
allocation paths.
v2: Defer shrinking until after a global grace period for futureproofing
multiple consumers of the slab caches, similar to the current strategy
for avoiding shrinking too early.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
Other than LPT, no other PCH needed to differentiate between
LP and HP. So let's remove this before we spread this mistake
to future platforms.
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190221211716.9433-1-rodrigo.vivi@intel.com
On skl the crc registers were extended to provide plane crcs
for up to 7 planes. Add the new crc sources.
The current code uses the ivb+ register definitions for skl+
which does happen to work as the plane1, plane2, and dmux/pf
bits happen the match what ivb+ had. So no bug in the current
code.
v2: Drop the unused set_wa parameter (DK)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190214192219.3858-4-ville.syrjala@linux.intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Defining the mei-i915 interface functions and initialization of
the interface.
v2:
Adjust to the new interface changes. [Tomas]
Added further debug logs for the failures at MEI i/f.
port in hdcp_port data is equipped to handle -ve values.
v3:
mei comp is matched for global i915 comp master. [Daniel]
In hdcp_shim hdcp_protocol() is replaced with const variable. [Daniel]
mei wrappers are adjusted as per the i/f change [Daniel]
v4:
port initialization is done only at hdcp2_init only [Danvet]
v5:
I915 registers a subcomponent to be matched with mei_hdcp [Daniel]
v6:
HDCP_disable for all connectors incase of comp_unbind.
Tear down HDCP comp interface at i915_unload [Daniel]
v7:
Component init and fini are moved out of connector ops [Daniel]
hdcp_disable is not called from unbind. [Daniel]
v8:
subcomponent name is dropped as it is already merged.
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> [v11]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1550338640-17470-5-git-send-email-ramalingam.c@intel.com
At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously (where once before they were both
serialised by the struct_mutex), we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset
(caused by either us timing out in our reset handler,
i915_wedge_on_timeout or with malice aforethought in intel_reset_prepare
for a stuck modeset). If we suspect this is the case, that is we see a
wedged driver *and* reset in progress, then wait until the reset is
resolved before reporting upon the wedged status.
v2: might_sleep() (Mika)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190220145637.23503-1-chris@chris-wilson.co.uk
As time goes by, usage of generic ioctls such as drm_syncobj and
sync_file are on the increase bypassing i915-specific ioctls like
GEM_WAIT. Currently, we only apply waitboosting to our driver ioctls as
we track the file/client and account the waitboosting to them. However,
since commit 7b92c1bd05 ("drm/i915: Avoid keeping waitboost active for
signaling threads"), we no longer have been applying the client
ratelimiting on waitboosts and so that information has only been used
for debug tracking.
Push the application of waitboosting down to the common
i915_request_wait, and apply it to all foreign fence waits as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190213092504.25709-1-chris@chris-wilson.co.uk
Previously, we were able to rely on the recursive properties of
struct_mutex to allow us to serialise revoking mmaps and reacquiring the
FENCE registers with them being clobbered over a global device reset.
I then proceeded to throw out the baby with the bath water in order to
pursue a struct_mutex-less reset.
Perusing LWN for alternative strategies, the dilemma on how to serialise
access to a global resource on one side was answered by
https://lwn.net/Articles/202847/ -- Sleepable RCU:
1 int readside(void) {
2 int idx;
3 rcu_read_lock();
4 if (nomoresrcu) {
5 rcu_read_unlock();
6 return -EINVAL;
7 }
8 idx = srcu_read_lock(&ss);
9 rcu_read_unlock();
10 /* SRCU read-side critical section. */
11 srcu_read_unlock(&ss, idx);
12 return 0;
13 }
14
15 void cleanup(void)
16 {
17 nomoresrcu = 1;
18 synchronize_rcu();
19 synchronize_srcu(&ss);
20 cleanup_srcu_struct(&ss);
21 }
No more worrying about stop_machine, just an uber-complex mutex,
optimised for reads, with the overhead pushed to the rare reset path.
However, we do run the risk of a deadlock as we allocate underneath the
SRCU read lock, and the allocation may require a GPU reset, causing a
dependency cycle via the in-flight requests. We resolve that by declaring
the driver wedged and cancelling all in-flight rendering.
v2: Use expedited rcu barriers to match our earlier timing
characteristics.
v3: Try to annotate locking contexts for sparse
v4: Reduce selftest lock duration to avoid a reset deadlock with fences
v5: s/srcu/reset_backoff_srcu/
v6: Remove more stale comments
Testcase: igt/gem_mmap_gtt/hang
Fixes: eb8d0f5af4 ("drm/i915: Remove GPU reset dependence on struct_mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208153708.20023-2-chris@chris-wilson.co.uk
Changing the i915_edp_psr_debug was enabling, disabling or switching
PSR version by directly calling intel_psr_disable_locked() and
intel_psr_enable_locked(), what is not the default PSR path that will
be executed by real users.
So lets force a fastset in the PSR CRTC to trigger a pipe update and
stress the default code path.
Recently a bug was found when switching from PSR2 to PSR1 while
enable_psr kernel parameter was set to the default parameter, this
changes fix it and also fixes the bug linked bellow were DRRS was
left enabled together with PSR when enabling PSR from debugfs.
v2: Handling missing case: disabled to PSR1
v3: Not duplicating the whole atomic state(Maarten)
v4: Adding back the missing call to intel_psr_irq_control(Dhinakaran)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108341
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190206211845.5322-1-jose.souza@intel.com
Split the color management hooks along the single vs. double
buffered registers line. Of the currently programmed registers
GAMMA_MODE and the ilk+ pipe CSC are double buffered, the
LUTS and CHV CGM block are single buffered.
The double buffered register will be programmed during the
normal pipe update with evasion, and also during pipe enable
so that the settings will already be correct when the pipe
starts up before the planes are enabled.
The single buffered registers are currently programmed before
the vblank evade. Which is totally wrong, but we'll correct
that later.
v2: Add some docs to explain the two vfuncs (Matt,Uma)
Rebase
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205160848.24662-6-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Pass the crtc state etc. as const to the color management commit
functions. And while at it polish some of the local variables.
v2: Rebase
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205160848.24662-4-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
First of all GMCH can be considered a feature by itself
since it is a chip present in some platforms that connects
the IA processor to memory and other components in PC.
Also with the introduction of display block at device info,
we got a redundant definition:
.display.has_gmch_display = 1,
So, let's clean up things a bit and use the standardized
way of has_feature on displays side.
No functional change and no manual interaction to generate
this patch.
It is only:
sed -si -e 's/has_gmch_display/has_gmch/g' \
-e 's/HAS_GMCH_DISPLAY/HAS_GMCH/g' drivers/gpu/drm/i915/*{c,h}
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190204222538.15842-1-rodrigo.vivi@intel.com
We want to expose the ability to reconfigure the slices, subslice and
eu per context and per engine. To facilitate that, store the current
configuration on the context for each engine, which is initially set
to the device default upon creation.
v2: record sseu configuration per context & engine (Chris)
v3: introduce the i915_gem_context_sseu to store powergating
programming, sseu_dev_info has grown quite a bit (Lionel)
v4: rename i915_gem_sseu into intel_sseu (Chris)
use to_intel_context() (Chris)
v5: More to_intel_context() (Tvrtko)
Switch intel_sseu from union to struct (Tvrtko)
Move context default sseu in existing loop (Chris)
v6: s/intel_sseu_from_device_sseu/intel_device_default_sseu/ (Tvrtko)
Tvrtko Ursulin:
v7:
* Pass intel_sseu by pointer instead of value to make_rpcs.
* Rebase for make_rpcs changes.
v8:
* Rebase for RPCS edit on pin.
v9:
* Rebase for context image setup changes.
v10:
* Rename dev_priv to i915. (Chris Wilson)
v11:
* Rebase.
v12:
* Rebase for IS_GEN changes.
v13:
* Rebase for RUNTIME_INFO.
v14:
* Rebase for intel_context_init.
v15:
* Rebase for drm-tip changes.
v16:
* Moved struct intel_sseu definition to i915_gem_context.h.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-1-tvrtko.ursulin@linux.intel.com
On icl+ bspec tells us to calculate a separate minimum ddb
allocation from the blocks watermark. Both have to be checked
against the actual ddb allocation, but since we do things the
other way around we'll just calculat the minimum acceptable
ddb allocation by taking the maximum of the two values.
We'll also replace the memcmp() with a full trawl over the
the watermarks so that it'll ignore the min_ddb_alloc
because we can't directly read that out from the hw. I suppose
we could reconstruct it from the other values, but I was
too lazy to do that now.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181221171436.8218-6-ville.syrjala@linux.intel.com
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Now that we pin timelines around use, we have a clearly defined lifetime
and convenient points at which we can track only the active timelines.
This allows us to reduce the list iteration to only consider those
active timelines and not all.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-6-chris@chris-wilson.co.uk
If we restrict ourselves to only using a cacheline for each timeline's
HWSP (we could go smaller, but want to avoid needless polluting
cachelines on different engines between different contexts), then we can
suballocate a single 4k page into 64 different timeline HWSP. By
treating each fresh allocation as a slab of 64 entries, we can keep it
around for the next 64 allocation attempts until we need to refresh the
slab cache.
John Harrison noted the issue of fragmentation leading to the same worst
case performance of one page per timeline as before, which can be
mitigated by adopting a freelist.
v2: Keep all partially allocated HWSP on a freelist
This is still without migration, so it is possible for the system to end
up with each timeline in its own page, but we ensure that no new
allocation would needless allocate a fresh page!
v3: Throw a selftest at the allocator to try and catch invalid cacheline
reuse.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-4-chris@chris-wilson.co.uk
Currently, the list of timelines is serialised by the struct_mutex, but
to alleviate difficulties with using that mutex in future, move the
list management under its own dedicated mutex.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-5-chris@chris-wilson.co.uk
Now that the submission backends are controlled via their own spinlocks,
with a wave of a magic wand we can lift the struct_mutex requirement
around GPU reset. That is we allow the submission frontend (userspace)
to keep on submitting while we process the GPU reset as we can suspend
the backend independently.
The major change is around the backoff/handoff strategy for performing
the reset. With no mutex deadlock, we no longer have to coordinate with
any waiter, and just perform the reset immediately.
Testcase: igt/gem_mmap_gtt/hang # regresses
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125132230.22221-3-chris@chris-wilson.co.uk
Mixed C99 and kernel types use is getting ugly. Prefer kernel types.
sed -i 's/\buint\(8\|16\|32\|64\)_t\b/u\1/g'
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190118120125.15484-7-jani.nikula@intel.com
Since commit 93065ac753 ("mm, oom: distinguish blockable mode for mmu
notifiers") we have been able to report failure from
mmu_invalidate_range_start which allows us to use a trylock on the
struct_mutex to avoid potential recursion and report -EBUSY instead.
Furthermore, this allows us to pull the work into the main callback and
avoid the sleight-of-hand in using a workqueue to avoid lockdep.
However, not all paths to mmu_invalidate_range_start are prepared to
handle failure, so instead of reporting the recursion, deal with it by
propagating the failure upwards, who can decide themselves to handle it
or report it.
v2: Mark up the recursive lock behaviour and comment on the various weak
points.
v3: Follow commit 3824e41975 ("drm/i915: Use mutex_lock_killable() from
inside the shrinker") and also use mutex_lock_killable().
v3.1: No leak on EINTR.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108375
References: 93065ac753 ("mm, oom: distinguish blockable mode for mmu notifiers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190115124442.3500-1-chris@chris-wilson.co.uk
As the GT_IRQ power domain implies a wakeref, we can use it inplace of
our existing redundant rpm grab.
v2: Drop papering over forgetting to take the runtime wakeref in
selftests
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-20-chris@chris-wilson.co.uk
On module load and unload, we grab the POWER_DOMAIN_INIT powerwells and
transfer them to the runtime-pm code. We can use our wakeref tracking to
verify that the wakeref is indeed passed from init to enable, and
disable to fini; and across suspend.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-17-chris@chris-wilson.co.uk
The majority of runtime-pm operations are bounded and scoped within a
function; these are easy to verify that the wakeref are handled
correctly. We can employ the compiler to help us, and reduce the number
of wakerefs tracked when debugging, by passing around cookies provided
by the various rpm_get functions to their rpm_put counterpart. This
makes the pairing explicit, and given the required wakeref cookie the
compiler can verify that we pass an initialised value to the rpm_put
(quite handy for double checking error paths).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-16-chris@chris-wilson.co.uk
Record the wakeref used for keeping the device awake as the GPU is
executing requests and be sure to cancel the tracking upon parking.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-3-chris@chris-wilson.co.uk
The majority of runtime-pm operations are bounded and scoped within a
function; these are easy to verify that the wakeref are handled
correctly. We can employ the compiler to help us, and reduce the number
of wakerefs tracked when debugging, by passing around cookies provided
by the various rpm_get functions to their rpm_put counterpart. This
makes the pairing explicit, and given the required wakeref cookie the
compiler can verify that we pass an initialised value to the rpm_put
(quite handy for double checking error paths).
For regular builds, the compiler should be able to eliminate the unused
local variables and the program growth should be minimal. Fwiw, it came
out as a net improvement as gcc was able to refactor rpm_get and
rpm_get_if_in_use together,
v2: Just s/rpm_put/rpm_put_unchecked/ everywhere, leaving the manual
mark up for smaller more targeted patches.
v3: Mention the cookie in Returns
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-2-chris@chris-wilson.co.uk
Everytime we take a wakeref, record the stack trace of where it was
taken; clearing the set if we ever drop back to no owners. For debugging
a rpm leak, we can look at all the current wakerefs and check if they
have a matching rpm_put.
v2: Use skip=0 for unwinding the stack as it appears our noinline
function doesn't appear on the stack (nor does save_stack_trace itself!)
v3: Allow rpm->debug_count to disappear between inspections and so
avoid calling krealloc(0) as that may return a ZERO_PTR not NULL! (Mika)
v4: Show who last acquire/released the runtime pm
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-1-chris@chris-wilson.co.uk