With MST there won't be a crtc assigned to the main link encoder, so
trying to dig up the pipe_config from there is a recipe for an oops.
Instead store the parameters (lane_count and link_rate) in the encoder,
and use those values during link training etc. Since those parameters
are now assigned only when the link is actually enabled,
.compute_config() won't clobber them as it did before.
Hardware state readout is still bonkers though as we don't transfer the
link parameters from pipe_config intel_dp. We should do that during
encoder sanitation. But since we don't even do a proper job of reading
out the main link encoder state for MST there's littel point in
worrying about this now.
Fixes a regression with MST caused by:
commit 90a6b7b052
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Mon Jul 6 16:39:15 2015 +0300
drm/i915: Move intel_dp->lane_count into pipe_config
v2: Different apporoach that should keep intel_dp_check_mst_status()
somewhat less oopsy
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Due to a coherency issue on BXT A steppings we can't guarantee a
coherent view of cached (CPU snooped) GPU mappings, so fail such
requests. User space is supposed to fall back to uncached mappings in
this case.
v2:
- limit the WA to A steppings, on later stepping this HW issue is fixed
v3:
- return error instead of trying to work around the issue in kernel,
since that could confuse user space (Chris)
Testcast: igt/gem_store_dword_batches_loop/cached-mapping
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By running igt/store_dword_loop_render on BXT we can hit a coherency
problem where the seqno written at GPU command completion time is not
seen by the CPU. This results in __i915_wait_request seeing the stale
seqno and not completing the request (not considering the lost
interrupt/GPU reset mechanism). I also verified that this isn't a case
of a lost interrupt, or that the command didn't complete somehow: when
the coherency issue occured I read the seqno via an uncached GTT mapping
too. While the cached version of the seqno still showed the stale value
the one read via the uncached mapping was the correct one.
Work around this issue by clflushing the corresponding CPU cacheline
following any store of the seqno and preceding any reading of it. When
reading it do this only when the caller expects a coherent view.
v2:
- fix using the proper logical && instead of a bitwise & (Jani, Mika)
- limit the workaround to A stepping, on later steppings this HW issue
is fixed
v3:
- use a separate get_seqno/set_seqno vfunc (Chris)
Testcase: igt/store_dword_loop_render
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We also need to call the frontbuffer flip to trigger proper
invalidations when disabling planes. Otherwise we will miss
screen updates when disabling sprites or cursor.
On core platforms where HW tracking also works, this issue
is totally masked because HW tracking triggers PSR exit
however on VLV/CHV that has only SW tracking we miss screen
updates when disabling planes.
It was caught with kms_psr_sink_crc sprite_plane_onoff
and cursor_plane_onoff subtests running on VLV/CHV.
This is probably a regression since I can also get this
with the manual test case, but with so many changes on atomic
modeset I couldn't track exactly when this was introduced.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
MI_STORE_REGISTER_MEM, MI_LOAD_REGISTER_MEM instructions are not really
variable length instructions unlike MI_LOAD_REGISTER_IMM where it expects
(reg, addr) pairs so use fixed length for these instructions.
v2: rebase
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Appease checkpatch as Mika spotted in i915_reg.h - it seems
terminally unhappy about i915_cmd_parser.c so that would be a separate
patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In commit
d1675198e: drm/i915: Integrate GuC-based command submission
the drm.tmpl include lines reference the intel_guc_submission.c but the
patch adds the file i915_guc_submission.c. drm.tmpl fails to build with:
docproc: .//drivers/gpu/drm/i915/intel_guc_submission.c:
No such file or directory
Change the file reference to the actual file.
Signed-off-by: Graham Whaley <graham.whaley@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There's so much scaler debugging messages that it makes other debugging
hard. Remove them.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This provides a means of reading status and counts relating
to GuC actions and submissions.
v2:
Remove surplus blank line in output [Chris Wilson]
v5:
Added GuC per-engine submission & seqno statistics
v6:
Add per-ring statistics to client, refactor client-dumper.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Alex Dai <yu.dai@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GuC-based submission is mostly the same as execlist mode, up to
intel_logical_ring_advance_and_submit(), where the context being
dispatched would be added to the execlist queue; at this point
we submit the context to the GuC backend instead.
There are, however, a few other changes also required, notably:
1. Contexts must be pinned at GGTT addresses accessible by the GuC
i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.
2. The GuC's TLB must be invalidated after a context is pinned at
a new GGTT address.
3. GuC firmware uses the one page before Ring Context as shared data.
Therefore, whenever driver wants to get base address of LRC, we
will offset one page for it. LRC_PPHWSP_PN is defined as the page
number of LRCA.
4. In the work queue used to pass requests to the GuC, the GuC
firmware requires the ring-tail-offset to be represented as an
11-bit value, expressed in QWords. Therefore, the ringbuffer
size must be reduced to the representable range (4 pages).
v2:
Defer adding #defines until needed [Chris Wilson]
Rationalise type declarations [Chris Wilson]
v4:
Squashed kerneldoc patch into here [Daniel Vetter]
v5:
Update request->tail in code common to both GuC and execlist modes.
Add a private version of lr_context_update(), as sharing the
execlist version leads to race conditions when the CPU and
the GuC both update TAIL in the context image.
Conversion of error-captured HWS page to string must account
for offset from start of object to actual HWS (LRC_PPHWSP_PN).
Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Turn on interrupt steering to route necessary interrupts to GuC.
v6:
Rebased
Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A GuC client has its own doorbell and workqueue. It maintains the
doorbell cache line, process description object and work queue item.
A default guc_client is created for the i915 driver to use for
normal-priority in-order submission.
Note that the created client is not yet ready for use; doorbell
allocation will fail as we haven't yet linked the GuC's context
descriptor to the default contexts for each ring (see later patch).
v2:
Defer adding structure members until needed [Chris Wilson]
Rationalise type declarations [Chris Wilson]
v5:
Add GuC per-engine submission & seqno statistics.
Move wq locking to encompass both get_space() and add_item().
Take forcewake lock in host2guc_action() [Tom O'Rourke]
v6:
Fix GuC doorbell cacheline selection code (the
cacheline-within-page calculation was wrong).
Rename GuC priorities to make them closer to the names used in
the GuC firmware source, matching what the autogenerated
versions will (probably) be.
Add per-ring statistics to client.
Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Allocate a GEM object to hold GuC log data. A debugfs interface
(i915_guc_log_dump) is provided to print out the log content.
v2:
Add struct members at point of use [Chris Wilson]
v6:
Rebased
Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This adds the first of the data structures used to communicate with the
GuC (the pool of guc_context structures).
We create a GuC-specific wrapper round the GEM object allocator as all
GEM objects shared with the GuC must be pinned into GGTT space at an
address that is NOT in the range [0..WOPCM_TOP), as that range of GGTT
addresses is not accessible to the GuC (from the GuC's point of view,
it's permanently reserved for other objects such as the BootROM & SRAM).
Later, we will need to allocate additional GuC-sharable objects for the
submission client(s) and the GuC's debug log.
v2:
Remove redundant initialisation [Chris Wilson]
Defer adding struct members until needed [Chris Wilson]
Local functions should pass dev_priv rather than dev [Chris Wilson]
v5:
Invalidate GuC TLB after allocating and pinning a new object
v6:
Rebased
Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GuC submission is basically execlist submission, but with the GuC
handling the actual writes to the ELSP and the resulting context
switch interrupts. So to describe a context for submission via
the GuC, we need one of the same functions used in execlist mode.
This commit exposes one such function, changing its name to better
describe what it does (it's related to logical ring contexts rather
than to execlists per se).
v2:
Replaces previous "drm/i915: Move execlists defines from .c to .h"
v3:
Incorporates a change to one of the functions exposed here that was
previously part of an internal patch, but which was omitted from
the version recently committed to drm-intel-nightly:
7a01a0a drm/i915/lrc: Update PDPx registers with lri commands
So we reinstate this change here.
v4:
Drop v3 change, update function parameters due to collision with
8ee3615 drm/i915: Convert execlists_ctx_descriptor() for requests
v5:
Don't expose execlists_update_context() after all. The current
version is no longer compatible with GuC submission; trying to
share the execlist version of this function results in both GuC
and CPU updating TAIL in the context image, with bad results when
they get out of step. The GuC submission path now has its own
private version that just updates the ringbuffer start address,
and not TAIL or PDPx.
v6:
Rebased
Issue: VIZ-4884
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The new node provides access to the status of the GuC-specific loader;
also the scratch registers used for communication between the i915
driver and the GuC firmware.
v2:
Changes to output formats per Chris Wilson's suggestions
v6:
Rebased
Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This fetches the required firmware image from the filesystem,
then loads it into the GuC's memory via a dedicated DMA engine.
This patch is derived from GuC loading work originally done by
Vinit Azad and Ben Widawsky.
v2:
Various improvements per review comments by Chris Wilson
v3:
Removed 'wait' parameter to intel_guc_ucode_load() as firmware
prefetch is no longer supported in the common firmware loader,
per Daniel Vetter's request.
Firmware checker callback fn now returns errno rather than bool.
v4:
Squash uC-independent code into GuC-specifc loader [Daniel Vetter]
Don't keep the driver working (by falling back to execlist mode)
if GuC firmware loading fails [Daniel Vetter]
v5:
Clarify WOPCM-related #defines [Tom O'Rourke]
Delete obsolete code no longer required with current h/w & f/w
[Tom O'Rourke]
Move the call to intel_guc_ucode_init() later, so that it can
allocate GEM objects, and have it fetch the firmware; then
intel_guc_ucode_load() doesn't need to fetch it later.
[Daniel Vetter].
v6:
Update comment describing intel_guc_ucode_load() [Tom O'Rourke]
Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We only need the link_bw/rate_select parameters when starting link
training, and they should be computed based on the currently active
config, so throw them out from intel_dp and just compute on demand.
Toss in an extra debug print to see rate_select in addition to link_bw,
as the latter may be 0 for eDP 1.4.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
intel_dp->link_bw is going away, so consul the port_clock instead when
choosing between TP1 and TP3.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently we clobber intel_dp->lane_count in compute config, which means
after a rejected modeset we may no longer be able to retrain the current
link. Move lane_count into pipe_config to avoid that.
v2: Add missing ':' to the pipe config debug dump
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Use a separate variable for the TRANS_DP_CTL value instead of reusing
'tmp' that otherwise contains the DP port register value.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
All the *_ddi_pll_select() functions get passed the port_clock and pipe
config as parameters. We only need to pass the pipe config, and the
functions can dig up the port_clock themselves.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Use port_clock instead of link_bw when picking the PLL parameters for
DP. link_bw may be zero with an eDP 1.4 sink that supports
DP_LINK_RATE_SET so we shouldn't use it for anything other than feed it
to the sink appropriately.
v2: Fix typo in commit message (Sivakumar)
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently we treat intel_{dp,hdmi}->color_range as partly user
controller value (via the property) but we also change it during
.compute_config() when using the "Automatic" mode. That is a bit
confusing, so let's just change things so that we store the user
property values in intel_dp, and only change what's stored in
pipe_config during .compute_config().
There should be no functional change.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we queue the command or operation to change the scanout address, we
mark the flip as in progress. We can use this flag to prevent us from
checking for a stalled flip prior to its existence!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently we don't clflush on pin_to_display if the bo is already
UC/WT and is not in the CPU write domain. This causes problems with
pwrite since pwrite doesn't change the write domain, and it avoids
clflushing on UC/WT buffers on LLC platforms unless the buffer is
currently being scanned out.
Fix the problem by marking the cache dirty and adjusting
i915_gem_object_set_cache_level() to clflush when the cache is dirty
even if the cache_level doesn't change.
My last attempt [1] at fixing this via write domain frobbing was shot
down, but now with the cache_dirty flag we can do things in a nicer way.
[1] http://lists.freedesktop.org/archives/intel-gfx/2014-November/055390.html
v2: Drop the I915_CACHE_NONE/WT checks from pwrite
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86422
Testcase: igt/kms_pwrite_crc
Testcase: igt/gem_pwrite_snooped
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
WA for BXT A0/A1, where DDIB's HPD pin is swapped to DDIA, so enabling
DDIA HPD pin in place of DDIB.
v2: For DP, irq_port is used to determine the encoder instead of
hpd_pin and removing the edp HPD logic because port A HPD is not
present(Imre)
v3: Rebased on top of Imre's patchset for enabling HPD on PORT A.
Added hpd_pin swapping for intel_dp_init_connector, setting encoder
for PORT_A as per the WA in irq_port (Imre)
v4: Dont enable interrupt for edp, also reframe the description (Siva)
v5: Don’t check for PORT_A in intel_ddi_init to update dig_port,
instead avoid setting hpd_pin itself (Imre)
Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And fix 0-DAY kernel test infrastructure warning.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the offset length being taken care of in ("drm/i915/gtt: Allow >=
4GB offsets in X86_32"), the code should be finally safe in 32-bit
kernels.
This reverts commit 501fd70fca
Author: Michel Thierry <michel.thierry@intel.com>
Date: Fri May 29 14:15:05 2015 +0100
drm/i915: limit PPGTT size to 2GB in 32-bit platforms
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Similar to commit c44ef60e43 ("drm/i915/gtt:
Allow >= 4GB sizes for vm"), i915_gem_obj_offset and i915_gem_obj_ggtt_offset
return an unsigned long, which in only 4-bytes long in 32-bit kernels.
Change return type (and other related offset variables) to u64.
Since Global GTT is always limited to 4GB, this change would not be required
in i915_gem_obj_ggtt_offset, but this is done for consistency.
v2: Remove unnecessary offset variable in do_pin, as we already have
vma->node.start (Chris).
Update GGTT offset too (Tvrtko).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By Vesa DP 1.2 spec TEST_CRC_COUNT is a "4 bit wrap counter which
increments each time the TEST_CRC_x_x are updated."
However if we are trying to verify the screen hasn't changed we get
same (count, crc) pair twice. Without this patch we would return
-ETIMEOUT in this case.
So, if in 6 vblanks the pair (count, crc) hasn't changed we
return it anyway instead of returning error and let test case decide
if it was right or not.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By Vesa DP 1.2 Spec TEST_CRC_COUNT should be
"reset to 0 when TEST_SINK bit 0 = 0."
However for some strange reason when PSR is enabled in
certain platforms this is not true. At least not immediatelly.
So we face cases like this:
first get_sink_crc operation:
count: 0, crc: 000000000000
count: 1, crc: c101c101c101
returned expected crc: c101c101c101
secont get_sink_crc operation:
count: 1, crc: c101c101c101
count: 0, crc: 000000000000
count: 1, crc: 0000c1010000
should return expected crc: 0000c1010000
But also the reset to 0 should be faster resulting into:
get_sink_crc operation:
count: 1, crc: c101c101c101
count: 1, crc: 0000c1010000
should return expected crc: 0000c1010000
So in order to know that the second one is valid one
we need to compare the pair (count, crc) with latest (count, crc).
If the pair changed you have your valid CRC.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
By Vesa DP spec, test counter at DP_TEST_SINK_MISC just reset to 0
when unsetting DP_TEST_SINK_START, so let's force this stop here.
But let's minimize the aux transactions and just do it when we know
it hasn't been properly stoped.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GTT was only 32b and its max value is 4GB. In order to allow objects
bigger than 4GB in 48b PPGTT, i915_gem_userptr_ioctl we could check
against max 48b range (1ULL << 48).
But since the check no longer applies, just kill the limit.
v2: Use the default ctx to infer the ppgtt max size (Akash).
v3: Just kill the limit, it was only there for early detection of an
error when used for execbuffer (Chris).
Cc: Akash Goel <akash.goel@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Otherwise it can overflow in 48-bit mode, and cause an incorrect
exec_start.
Before commit 5f19e2bffa ("drm/i915: Merged
the many do_execbuf() parameters into a structure"), it was already an u64.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In a 48b world, users can try to allocate buffers bigger than 4GB; in
these cases it is important that size is a 64b variable.
v2: Drop the warning about bind with size 0, it shouldn't happen anyway.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Clean up patch after rebases.
v3: gen8_dump_ppgtt for 32b and 48b PPGTT.
v4: Use used_pml4es/pdpes (Akash).
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rely on used_px bits instead of null checking (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: For semaphore errors, object is mapped to GGTT and offset will not
be > 4GB, print only lower 32-bits (Akash)
v3: Print gtt_offset in groups of 32-bit (Chris)
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Similar to PDs, while setting up a page directory pointer, make all entries
of the pdp point to the scratch pd before mapping (and make all its entries
point to the scratch page); this is to be safe in case of out of bound
access or proactive prefetch.
Also add a scratch pdp, which the PML4 entries point to.
v2: Handle scratch_pdp allocation failure correctly, and keep
initialize_px functions together (Akash)
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series. Rely on
the added macros to initialize the pdps.
v4: Rebase after final merged version of Mika's ppgtt/scratch patches
(and removed commit message part related to v3).
v5: Update commit message to also mention PML4 table initialization and
the new scratch pdp (Akash).
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
it will write to.
Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
This patch was inspired by Ben's "Depend exclusively on map and
unmap_vma".
v2: Rebase after s/page_tables/page_table/.
v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
clamp_pdp in gen8_ppgtt_insert_entries (Akash).
v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
maintain symmetry with gen8_ppgtt_insert_entries (Akash).
v5: Do not mix pages and bytes in insert_entries (Akash).
v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
once.
v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
Use gen8_px_index functions, and remove unnecessary number of pages
parameter in insert_pte_entries.
v8: Change gen8_ppgtt_clear_pte_range to stop at PDP boundary, instead of
adding and extra clamp function; remove unnecessary pdp_start/pdp_len
variables (Akash).
v9: pages->orig_nents instead of sg_nents(pages->sgl) to get the
length (Akash).
v10: Remove pdp warning check ingen8_ppgtt_insert_pte_entries until this
commit (Akash).
Reviewed-by: Akash Goel <akash.goel@intel.com> (v9)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As a step towards implementing 4 levels, while not discarding the
existing pte insert functions, we need to pass the sg_iter through.
The current function understands to the page directory granularity.
An object's pages may span the page directory, and so using the iter
directly as we write the PTEs allows the iterator to stay coherent
through a VMA insert operation spanning multiple page table levels.
v2: Rebase after s/page_tables/page_table/.
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series;
updated commit message (s/map/insert).
v4: Rebase.
Reviewed-by: Akash Goel <akash.goel@intel.com> (v3)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
the base address to PML4, while the other PDP registers are ignored.
In LRC, the addressing mode must be specified in every context
descriptor, and the base address to PML4 is stored in the reg state.
v2: PML4 update in legacy context switch is left for historic reasons,
the preferred mode of operation is with lrc context based submission.
v3: s/gen8_map_page_directory/gen8_setup_page_directory and
s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
Also, clflush will be needed for bxt. (Akash)
v4: Squashed lrc-specific code and use a macro to set PML4 register.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
PDP update in bb_start is only for legacy 32b mode.
v6: Rebase after final merged version of Mika's ppgtt/scratch
patches.
v7: There is no need to update the pml4 register value in
execlists_update_context. (Akash)
v8: Move pd and pdp setup functions to a previous patch, they do not
belong here. (Akash)
v9: Check USES_FULL_48BIT_PPGTT instead of GEN8_CTX_ADDRESSING_MODE in
gen8_emit_bb_start to check if emit pdps is needed. (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
PML4 has no special attributes, and there will always be a PML4.
So simply initialize it at creation, and destroy it at the end.
The code for 4lvl is able to call into the existing 3lvl page table code
to handle all of the lower levels.
v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy. And define ret only in one place.
Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.
v3: Use i915_dma_unmap_single instead of pci API. Fix a
couple of incorrect checks when unmapping pdp and pd pages (Akash).
v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp param list.
v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.
v6: Simplify alloc_vma_range_4lvl and gen8_ppgtt_init_common error
paths. (Akash)
v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.
v8: Change location of pml4_init/fini. It will make next patches
cleaner.
v9: Rebase after Mika's ppgtt cleanup / scratch merge patch series, while
trying to reuse as much as possible for pdp alloc. pml4_init/fini
replaced by setup/cleanup_px macros.
v10: Rebase after Mika's merged ppgtt cleanup patch series.
v11: Rebase after final merged version of Mika's ppgtt/scratch
patches.
v12: Fix pdpe start value in trace (Akash)
v13: Define all 4lvl functions in this patch directly, instead of
previous patches, add i915_page_directory_pointer_entry_alloc here,
use test_bit to detect when pdp is already allocated (Akash).
v14: Move pdp allocation into a new gen8_ppgtt_alloc_page_dirpointers
funtion, as we do for pds and pts; move pd and pdp setup functions to
this patch (Akash).
v15: Added kfree(pdp) from previous patch to this (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Introduces the Page Map Level 4 (PML4), ie. the new top level structure
of the page tables.
To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
32/64-bit safe (Chris).
v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
v4: kfree the pdp until the 4lvl alloc/free patch (Akash).
v5: Postpone 48-bit code in sanitize_enable_ppgtt (Akash).
v6: Keep _insert_pte_entries changes outside this patch (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The dynamic page allocation patch series added it for GEN6, this patch
adds them for GEN8.
v2: Consolidate pagetable/page_directory events
v3: Multiple rebases.
v4: Rebase after s/page_tables/page_table/.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rebase after gen8_map_pagetable_range removal.
v7: Use generic page name (px) in DECLARE_EVENT_CLASS (Akash)
v8: Defer define of i915_page_directory_pointer_entry_alloc (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The insert_entries function was the function used to write PTEs. For the
PPGTT it was "hardcoded" to only understand two level page tables, which
was the case for GEN7. We can reuse this for 4 level page tables, and
remove the concept of insert_entries, which was never viable past 2
level page tables anyway, but it requires a bit of rework to make the
function a bit more generic.
v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v3: Rebase after final merged version of Mika's ppgtt/scratch patches.
v4: Check and warn for NULL value of pdp pointer (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.
In preparation for 4 level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
being pointed to by a pml4e. The temporal pdp local variable will be
removed once the rest of the 4-level code lands.
Also, start passing the vm pointer to the alloc functions, instead of
ppgtt.
v2: Updated after dynamic page allocation changes.
v3: Rebase after s/page_tables/page_table/.
v4: Rebase after changes in "Dynamic page table allocations" patch.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rebase after final merged version of Mika's ppgtt/scratch patches.
v7: Keep pagetable map in-line (and avoid unnecessary for_each_pde
loops), remove redundant ppgtt pointer in _alloc_pagetabs (Akash)
v8: Fix text indentation in _alloc_pagetabs/page_directories (Chris)
v9: Defer gen8_alloc_va_range_4lvl definition until 4lvl is implemented,
clean-up gen8_ppgtt_cleanup [pun intended] (Akash).
v10: Clean-up commit message (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: "Akash Goel" <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This transitional patch doesn't do much for the existing code. However,
it should make upcoming patches to use the full 48b address space a bit
easier.
32-bit ppgtt uses just 4 PDPs, while 48-bit ppgtt will have up-to 512;
this patch prepares the existing functions to query the right number of pdps
at run-time. This also means that used_pdpes should also be allocated during
ppgtt_init, as the bitmap size will depend on the ppgtt address range
selected.
v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).
v3: To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v4: Rebase after s/page_tables/page_table/, added extra information
about 4-level page table formats and use IS_ENABLED macro.
v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.
v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and
follow
his nomenclature in pdp functions (there is no alloc_pdp yet).
v7: Rebase after merged version of Mika's ppgtt cleanup patch series.
v8: Rebase after final merged version of Mika's ppgtt/scratch patches.
v9: Introduce PML4 (and 48-bit checks) until next patch (Akash).
v10: Also use test_bit to detect when pd/pt are already allocated (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Akash Goel <akash.goel@intel.com>
[danvet: Amend commit message as suggested by Michel.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
gen8_clamp_pd clamps to the next page directory boundary, but the macro
gen8_for_each_pde already has a check to stop at the page directory
boundary.
Furthermore, i915_pte_count also restricts to the next page table
boundary.
v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: "Akash Goel" <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
An OEM may request increased I_boost beyond the recommended values
by specifying an I_boost value to be applied to all swing entries for
a port. These override values are specified in VBT.
v2: rebase and remove unused iboost_bit variable
Issue: VIZ-5676
Signed-off-by: Antti Koskipaa <antti.koskipaa@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>