Commit d112a8163f ("gma500/cdv: Add eDP support") replaced the
code inside this if-conditional with gma_backlight_set(), which
becomes a nop stub if CONFIG_BACKLIGHT_CLASS_DEVICE is disabled.
So, there is no need to guard the caller with config_enabled().
Note:
This is one of the remaining TODOs for deprecating the config_enabled() macro.
Refer to commit 97f2645f35 ("tree-wide: replace config_enabled()
with IS_ENABLED()").
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1471970574-23906-1-git-send-email-yamada.masahiro@socionext.com
I figured I might as well go ocd and make them booleans and rename the
locked version too.
v2: Review from Noralf.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Noralf Trønnes <noralf@tronnes.org>
Fixes: cfe63423d9 ("drm/fb-helper: Add drm_fb_helper_set_suspend_unlocked()")
Acked-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160823152727.31788-1-daniel.vetter@ffwll.ch
If the ring isn't ready, let's print out the ring name to help debugging.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Inside an if block guarded by (running == 0), running cannot be non-zero,
so the inner check is dead code.
v2: agd: remove unused variable
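A minimal illustration of the dead check (variable and register names are
hypothetical, not the actual amdgpu code):

    int running = RREG32(SOME_STATUS_REG);

    if (running == 0) {
            /* ... start the block ... */
            if (running)            /* always false inside this branch */
                    return 0;       /* dead code, safe to remove */
    }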
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Inside an if block guarded by (running == 0), running cannot be non-zero,
so the inner check is dead code.
v2: agd: remove unused variable
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A check like
    if (a->b == NULL || a == NULL)
dereferences a before testing it, so it leads to a NULL pointer dereference
when a == NULL. Test a first:
    if (a == NULL || a->b == NULL)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A check like
    if (a->b == NULL || a == NULL)
dereferences a before testing it, so it leads to a NULL pointer dereference
when a == NULL. Test a first:
    if (a == NULL || a->b == NULL)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Inside an if block guarded by (running == 0), running cannot be non-zero,
so the inner check is dead code.
v2: agd: remove unused variable
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Inside an if block guarded by (running == 0), running cannot be non-zero,
so the inner check is dead code.
v2: agd: remove unused variable
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixed indentation for readability.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This driver is the only user of of_drm_find_panel() which prints an
error before doing probe deferral, yielding messages like this on boot,
before eventually succeeding:
[ 2.234271] [drm:rockchip_dp_probe] *ERROR* failed to find panel
...
[ 4.797539] [drm:rockchip_dp_probe] *ERROR* failed to find panel
...
Let's just drop the message.
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
vblank should be enabled regardless of whether an event
is expected back. This is especially important for a cursor
plane.
Reviewed-by: Yakir Yang <ykk@rock-chips.com>
Tested-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Remove the delayed worker, opting instead for the non-delayed
variety. Also introduce a lock to ensure we don't have races
between the worker and psr_state. Finally, cancel and wait for
the worker to finish when disabling the bridge.
Reviewed-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
A few things that need tidying up, no functional changes.
Reviewed-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
The handling of psr state is racy; shore that up with
a per-psr driver lock.
Reviewed-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
The delayed worker isn't needed and is racy. Remove it and do
the state change inline.
Reviewed-by: Yakir Yang <ykk@rock-chips.com>
Tested-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
This patch converts the psr_list_mutex to a spinlock and locks
all access to psr_list to avoid races (however unlikely they
were).
Reviewed-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Always enable the PSR function for the Rockchip analogix_dp driver. If the
panel doesn't support PSR, the analogix_dp core will ignore this setting.
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
PSR is short for Panel Self Refresh: the panel device can refresh itself
from the hardware framebuffer inside the panel, which saves a significant
amount of power.
This patch exports two symbols for platform drivers to implement the PSR
function on the hardware side:
- analogix_dp_active_psr()
- analogix_dp_inactive_psr()
Reviewed-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
The PSR driver exports five symbols for device-specific drivers, and
they are safe to call in interrupt context:
- rockchip_drm_psr_register()
- rockchip_drm_psr_unregister()
- rockchip_drm_psr_enable()
- rockchip_drm_psr_disable()
- rockchip_drm_psr_flush()
Encoder drivers should call the register/unregister interfaces to hook
themselves into the common PSR driver, and must implement the 'psr_set'
callback, which is used to set the PSR state on the hardware side.
The CRTC driver calls the enable/disable interfaces when vblank is
enabled/disabled; the common PSR driver then invokes the callback
registered by the encoder to set the PSR state.
The fb driver calls the flush interface from the 'fb->dirty' callback;
this helper forces all PSR-enabled encoders to exit PSR for 3 seconds.
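A hedged sketch of how an encoder might hook in (the callback shape and
the register/unregister signatures are assumptions, not quotes from the
header):

    static void rockchip_dp_psr_set(struct drm_encoder *encoder, bool enabled)
    {
            /* program the eDP controller into or out of PSR here */
    }

    /* at bind time */
    ret = rockchip_drm_psr_register(&dp->encoder, rockchip_dp_psr_set);
    if (ret)
            return ret;

    /* at unbind time */
    rockchip_drm_psr_unregister(&dp->encoder);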
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
[seanpaul removed leftover psr_enabled/psr_work kruft from drm_vop.c]
Signed-off-by: Sean Paul <seanpaul@chromium.org>
The VOP integrates a hardware counter which indicates the exact display
line that the VOP is scanning. If we're interested in a specific line, we
can write the line number to the VOP line_flag register and the VOP will
generate a line_flag interrupt for it.
For example, the eDP PSR function is interested in the vertical blanking
period, so the driver can set the line number to zero.
This patch exports a symbol that allows other drivers to wait for the line
flag event with a given timeout limit:
- rockchip_drm_wait_line_flag()
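A hedged usage sketch (the parameter list and the timeout unit are
assumptions, not taken from the header): wait for the scanline counter to
reach line 0, i.e. the start of vertical blanking, before entering PSR:

    ret = rockchip_drm_wait_line_flag(crtc, 0, 100 /* assumed ms */);
    if (ret)
            DRM_ERROR("timed out waiting for the line flag event\n");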
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Instead of just preparing the panel on bind, actually prepare/unprepare
during modeset/disable. The panel must be prepared in order to read hpd
status and edid, so we need to keep state around the prepares in order
to ensure we don't accidentally turn the panel off at the wrong time.
Reviewed-by: Yakir Yang <ykk@rock-chips.com>
Tested-by: Yakir Yang <ykk@rock-chips.com>
Reviewed-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
There are two VOPs in the rk3399 chip, VOP_BIG and VOP_LIT. The register
layout of the two VOPs is mostly the same and both use the VOP_FULL
framework; the major differences between them are:
VOP_BIG: max output resolution 4096x2160, supports four windows.
VOP_LIT: max output resolution 2560x1600, supports two windows.
The RK3399 VOP register layout is similar to the rk3288, so some features
can be reused from rk3288.
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Some new VOP registers support a write mask: bits [31:16] are the mask
and bits [15:0] are the value, with each mask bit corresponding to the
value bit below it.
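A small illustration of such a masked write (macro and field names are
made up for the example): the high half selects which bits take effect,
the low half carries the value, so a field can be updated without a
read-modify-write cycle.

    #define VOP_MASKED_WRITE(vop, off, mask, shift, v) \
            writel(((u32)(mask) << ((shift) + 16)) | \
                   (((v) & (mask)) << (shift)), (vop)->regs + (off))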
Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
[seanpaul masked 'v' per tfiga's review comments]
Reviewed-by: Mark Yao <mark.yao@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
No functional changes, sort the vop registers to make
code more readable.
Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
[seanpaul resolved conflict with name change from _3066 to _3036]
Reviewed-by: Mark Yao <mark.yao@rock-chips.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
This adds a function that also takes the console lock before calling
fb_set_suspend(), in contrast to drm_fb_helper_set_suspend(), which is
a plain wrapper around fb_set_suspend().
Resume is run asynchronously using a worker if the console lock is
already taken. This is modelled after the i915 driver.
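A simplified sketch of that behaviour (fbdev NULL checks and error
handling omitted; treat the resume_work field name as an assumption):

    void drm_fb_helper_set_suspend_unlocked(struct drm_fb_helper *helper,
                                            bool suspend)
    {
            if (suspend) {
                    console_lock();
                    fb_set_suspend(helper->fbdev, 1);
                    console_unlock();
                    return;
            }

            /* resume: don't block if the console lock is contended */
            if (!console_trylock()) {
                    schedule_work(&helper->resume_work);
                    return;
            }

            fb_set_suspend(helper->fbdev, 0);
            console_unlock();
    }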
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1471953246-29602-1-git-send-email-noralf@tronnes.org
We were using the same mask twice. Looking at radeon, it seems
we should be using HDMI_AVI_INFO_CONT instead as the second mask.
While there, fix typos in comments and improve readability.
I haven't looked at other DCEs; the mask may also be wrong for them.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make it more obvious what we are doing here.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
AMDGPU_GEM_CREATE_NO_CPU_ACCESS and AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED are
obviously mutually exclusive. So stop adding a dummy entry without effect when
both are specified.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make it clearer what this function does. No intended functional change.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Adding a BO can make it the insertion point for larger sizes as well.
v2: add a comment about the guard structure.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clean up whitespace and formatting.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For profiling.
v2: really bump the minor version
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
V2:
1. don't directly submit too many jobs at the same time.
2. delete unrelated printk.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
V2:
1. use a mutex instead of a spinlock for the shadow list, since the code
paths involved can sleep.
2. move list_del to bo destroy phase.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
V2:
move shadow parameter to amdgpu_pte_update_params.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: minutemaidpark@hotmail.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
V2:
Checking if shadow is valid.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the shadow flag to decide which direction to sync.
V2:
BO pinning isn't needed, so remove it.
V3:
1. Split into two functions: one is backup_to_shadow, the other is
restore_from_shadow.
2. Clean up the previous shadow direction definitions.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since dev_printk likes to print "(NULL device *):" when passed in a NULL
pointer, we have to manually call printk() ourselves.
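The shape of the fallback (simplified; the format string and the vaf/
function variables are illustrative):

    if (dev)
            dev_printk(level, dev, "[drm:%s] %pV", function_name, &vaf);
    else
            printk("%s[drm:%s] %pV", level, function_name, &vaf);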
Fixes: c4e68a5832 ("drm: Introduce DRM_DEV_* log messages")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819073750.16610-1-chris@chris-wilson.co.uk
The C standard does not specify the size of the integer used
to store an enum. Hence the structure drm_stats32_t may contain
padding bytes.
To avoid exposing bytes from the kernel stack it is
necessary to initialize variable s32 completely.
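The shape of the fix, sketched (not the full compat handler body):

    drm_stats32_t s32;

    memset(&s32, 0, sizeof(s32));   /* clear any padding before copying out */
    /* ... fill in the individual members as before ... */
    if (copy_to_user(argp, &s32, sizeof(s32)))
            return -EFAULT;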
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1471802179-2886-1-git-send-email-xypron.glpk@gmx.de
V2:
add a check for whether a backup is needed in amdgpu_bo_create.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Phenomenon: software hang when the device resumes; reading the UVD fence
returns 0xffffffff and reading the PCIe PID returns 0xffff.
The issue is caused by a VCE reset while updating the CG setting. Per the
HW programming guide, adjust the VCE CG update sequence.
The patch applies to VCE 2.0.
Signed-off-by: JimQu <Jim.Qu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Ken Wang <Qingqing.Wang@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Abort if the BO is already pinned to another domain.
Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Very old numbers indicate this is a 66% improvement when remapping the
entire object for fence contention - due to the elimination of
track_pfn_insert and its strcmp.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_fence_upload/performance
Testcase: igt/gem_mmap_gtt
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-6-chris@chris-wilson.co.uk
As io_mapping.h now always allocates the struct, we can avoid that
allocation and extra pointer dance by embedding the struct inside
drm_i915_private.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk
Only fbc1 is tied to using a fence. Later iterations of fbc are more
flexible and allow operation on unfenced frontbuffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-3-chris@chris-wilson.co.uk
If the frontbuffer doesn't have an associated fence, it will have a
fence reg of -1. If we attempt to OR this value into the FBC
control register, we end up setting all control bits, oops!
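An illustrative guard (the register and bit names below are placeholders,
not the exact i915 FBC code):

    u32 fbc_ctl = CTL_ENABLE | other_bits;

    if (fence_reg >= 0)             /* only OR in a real fence register */
            fbc_ctl |= CTL_FENCE_EN | fence_reg;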
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-2-chris@chris-wilson.co.uk
In the recent patch
bc3d674 drm/i915: Allow userspace to request no-error-capture upon ...
the final version moved the flags and the associated #defines around
so they were adjacent; unfortunately, they ended up between a comment
and the thing (hw_id) to which the comment applies :(
So this patch reshuffles the comment and subject back together.
Also, as we're touching 'hw_id', let's change it from just 'unsigned'
to a fully-specified 'unsigned int', because some code checking tools
(including checkpatch) object to plain 'unsigned'.
Fixes: bc3d674462 ("drm/i915: Allow userspace to request no-error-capture...")
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471616622-6919-1-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The plane .prepare_fb() and .cleanup_fb() helpers are optional, there's
no need to implement empty stubs, and no need to explicitly set the
function pointers to NULL either.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
[danvet: Resolved conflicts with Chris' patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The drivers have to modify the atomic plane state during the prepare_fb
callback so that they can track allocations, reservations and dependencies for
this atomic operation involving this fb. In particular, how else do we
set the plane->fence from the framebuffer!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818180017.20508-1-chris@chris-wilson.co.uk
A significant proportion of the cmdparsing time for some batches is the
cost to find the register in the mmiotable. We ensure that those tables
are in ascending order such that we could do a binary search if it was
ever merited. It is.
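A minimal sketch of the idea using the kernel's bsearch() helper (the
struct and field names follow the cmdparser tables but are not a verbatim
copy, and the table must already be sorted by offset):

    static int mmio_reg_cmp(const void *key, const void *elt)
    {
            const u32 offset = *(const u32 *)key;
            const struct drm_i915_reg_descriptor *reg = elt;

            if (offset < i915_mmio_reg_offset(reg->addr))
                    return -1;
            if (offset > i915_mmio_reg_offset(reg->addr))
                    return 1;
            return 0;
    }

    reg = bsearch(&offset, table->regs, table->num_regs,
                  sizeof(table->regs[0]), mmio_reg_cmp);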
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-38-chris@chris-wilson.co.uk
On the blitter (and in test code), we see long sequences of repeated
commands, e.g. XY_PIXEL_BLT, XY_SCANLINE_BLT, or XY_SRC_COPY. For these,
we can skip the hashtable lookup by remembering the previous command
descriptor and doing a straightforward compare of the command header.
The corollary is that we need to do one extra comparison before looking
up new commands.
v2: Less magic mask (ok, it is still magic, but now you cannot see!)
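The shape of the shortcut (a sketch; the cmd.value/cmd.mask fields mirror
the existing descriptor struct, but the exact check is an assumption):

    /* reuse the previous descriptor if the new header matches it */
    if (desc && ((cmd_header ^ desc->cmd.value) & desc->cmd.mask) == 0)
            return desc;

    return find_cmd_in_table(engine, cmd_header);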
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-36-chris@chris-wilson.co.uk
The existing code's hash function is very suboptimal (most 3D commands
use the same bucket, degrading the hash to a long list). The code even
acknowledges that the issue was known and the fix simple:
/*
* If we attempt to generate a perfect hash, we should be able to look at bits
* 31:29 of a command from a batch buffer and use the full mask for that
* client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
*/
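One way to act on that comment (a sketch only, not necessarily the merged
hash): fold the client bits into the key so that different clients no
longer share a bucket.

    static inline u32 cmd_header_key(u32 cmd_header)
    {
            /* keep the client bits so MI/2D/3D commands hash apart */
            return ((cmd_header >> INSTR_CLIENT_SHIFT) << 16) |
                   (cmd_header & 0xffff);
    }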
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-35-chris@chris-wilson.co.uk
For simplicity, we want to continue using a contiguous mapping of the
command buffer, but we can reduce the number of vmappings we hold by
switching over to a page-by-page copy from the user batch buffer to the
shadow. The cost for saving one linear mapping is about 5% in trivial
workloads - which is more or less the overhead in calling kmap_atomic().
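A sketch of such a page-by-page copy with kmap_atomic() (variable names
are illustrative):

    for (n = 0; n < num_pages; n++) {
            void *src = kmap_atomic(src_pages[n]);

            memcpy(dst + n * PAGE_SIZE, src, PAGE_SIZE);
            kunmap_atomic(src);
    }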
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-34-chris@chris-wilson.co.uk
The single largest factor in the overhead of parsing the commands is the
setup of the virtual mapping to provide a continuous block for the batch
buffer. If we keep those vmappings around (against the better judgement
of mm/vmalloc.c, which we offset by handwaving and looking suggestively
at the shrinker) we can dramatically improve the performance of the
parser for small batches (such as media workloads). Furthermore, we can
use the prepare shmem read/write functions to determine how best we
need to clflush the range (rather than every page of the object).
The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt
for the throughput on small batches. (Caching just the dst vmap and
iterating over the src, doing a page by page copy is roughly 5% slower
on both platforms. That may be an acceptable trade-off to eliminate one
cached vmapping, and we may be able to reduce the per-page copying overhead
further.) For *this* simple test case, the cmdparser is now within a
factor of 2 of ideal performance.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-33-chris@chris-wilson.co.uk
Since I have been using the BCS_TIMESTAMP to measure latency of
execution upon the blitter ring, allow regular userspace to also read
from that register. They are already allowed RCS_TIMESTAMP!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-32-chris@chris-wilson.co.uk
If the developer adds a register in the wrong order, we BUG during boot.
That makes development and testing very difficult. Let's be a bit more
friendly and disable the command parser with a big warning if the tables
are invalid.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-31-chris@chris-wilson.co.uk
Since commit 43566dedde ("drm/i915: Broaden application of
set-domain(GTT)") we allowed objects to be in the GTT domain, but unbound.
Therefore removing the GTT cache domain when removing the GGTT vma is no
longer semantically correct.
An unfortunate side-effect is we lose the wondrously named
i915_gem_object_finish_gtt(), not to be confused with
i915_gem_gtt_finish_object()!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-30-chris@chris-wilson.co.uk
We track the LRU access for eviction and bump the last access for the
user GGTT on set-to-gtt. When we do so we need to not only bump the
primary GGTT VMA but all partials as well. Similarly, we want to
bump the last-access tracking when unpinning an object from the
scanout so that it does not get promptly evicted and hopefully remains
available for reuse on the next frame.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-29-chris@chris-wilson.co.uk
When using the aliasing ppgtt and pageflipping with the shrinker/eviction
active, we note that we often have to rebind the backbuffer before
flipping onto the scanout because it has an invalid alignment. If we
store the worst-case alignment required for a VMA, we can avoid having
to rebind at critical junctures.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-28-chris@chris-wilson.co.uk
The existing ABI says that scanouts are pinned into the mappable region
so that legacy clients (e.g. old Xorg or plymouthd) can write directly
into the scanout through a GTT mapping. However if the surface does not
fit into the mappable region, we are better off just trying to fit it
anywhere and hoping for the best. (Any userspace that is capable of
using ginormous scanouts is also likely not to rely on pure GTT
updates.) With the partial vma fault support, we are no longer
restricted to only using scanouts that we can pin (though it is still
preferred for performance reasons and for powersaving features like
FBC).
v2: Skip fence pinning when not mappable.
v3: Add a comment to explain the possible ramifications of not being
able to use fences for unmappable scanouts.
v4: Rebase to skip over some local patches
v5: Rebase to defer until after we have unmappable GTT fault support
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-27-chris@chris-wilson.co.uk
Often times we do not want to evict mapped objects from the GGTT as
these are quite expensive to teardown and frequently reused (causing an
equally, if not more so, expensive setup). In particular, when faulting
in a new object we want to avoid evicting an active object, or else we
may trigger a page-fault-of-doom as we ping-pong between evicting two
objects.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-26-chris@chris-wilson.co.uk
If FBC is set on a framebuffer that is unmapped, all GTT faults will be
from a partial mapping. Writes by the user through the partial VMA are
then untracked by the FBC and so we must use the ORIGIN_CPU when flushing
the I915_GEM_DOMAIN_GTT.
v2: Keep ORIGIN_CPU for set-to-domain(.write=CPU)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-25-chris@chris-wilson.co.uk
If we want to create a partial vma from a chunk that is the same size as
the object, create a normal ggtt vma instead. The benefit is that it
will match future requests for the normal ggtt.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-24-chris@chris-wilson.co.uk
We want to always use the partial VMA as a fallback for a failure to
bind the object into the GGTT. This extends the support for partial
objects in the GGTT to cover everything, not just objects that are too large.
v2: Call the partial view, view not partial.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-23-chris@chris-wilson.co.uk
In order to support setting up fences for partial mappings of an object,
we have to align those mappings with the fence. The minimum chunksize we
choose is at least the size of a single tile row.
v2: Make minimum chunk size a define for later use
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-22-chris@chris-wilson.co.uk
In order to handle tiled partial GTT mmappings, we need to associate the
fence with an individual vma.
v2: A couple of silly drops replaced, spotted by Joonas
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk
Our current practice is to only name the actual list (here
dev_priv->fence_list) using "list", and elements upon that list are
referred to as "link". Further, the lru nature is of the list and not of
the node and including in the name does not disambiguate the link from
anything else.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-20-chris@chris-wilson.co.uk
Keep any error reported by the gup_worker until we are notified that the
arena has changed (via the mmu-notifier). This has the effect of making
two consecutive calls to i915_gem_object_get_pages() report the same
error, and of curtailing a loop of detecting a fault and requeueing a
gup_worker.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-19-chris@chris-wilson.co.uk
If we have stolen available, make use of it for ringbuffer allocation.
Previously this was restricted to !llc platforms, as writing to stolen
requires a GGTT mapping - but now that we have partial mappable support,
the mappable aperture isn't quite so precious so we can use it more
freely and ringbuffers are a good user for the otherwise wasted stolen.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-18-chris@chris-wilson.co.uk
Now that we have WC vmapping available, we can bind our rings anywhere
in the GGTT and do not need to restrict them to the mappable region.
Except for stolen objects, for which direct access is verboten and we
must use the mappable aperture.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-17-chris@chris-wilson.co.uk
By moving map-and-fenceable tracking from the object to the VMA, we gain
fine-grained tracking and the ability to track individual fences on the VMA
(subsequent patch).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-16-chris@chris-wilson.co.uk
If we cannot pin the entire object into the mappable region of the GTT,
try to pin a single page instead. This is much more likely to succeed,
and prevents us falling back to the clflush slow path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-14-chris@chris-wilson.co.uk
With the introduction of the reloc page cache, we are just one step away
from refactoring the relocation write functions into one. Not only does
it tidy the code (slightly), but it greatly simplifies the control logic
much to gcc's satisfaction.
v2: Add selftests to document the relationship between the clflush
flags, the KMAP bit and packing into the page-aligned pointer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-13-chris@chris-wilson.co.uk
There is an improbable, but not impossible, case that if we leave the
pages unpin as we operate on the object, then somebody via the shrinker
may steal the lock (which lock? right now, it is struct_mutex, THE lock)
and change the cache domains after we have already inspected them.
(Whilst here, avail ourselves of the opportunity to take a couple of
steps to make the two functions look more similar.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-11-chris@chris-wilson.co.uk
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete its write before
we start our read from the CPU.
The issue can be illustrated in userspace with:
gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
for (i = 0; i < OBJECT_SIZE / 64; i++) {
        int x = 16*i + (i%16);
        gtt[x] = i;
        clflush(&cpu[x], sizeof(cpu[x]));
        assert(cpu[x] == i);
}
Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.
Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
If we want to read the pages directly via the CPU, we have to be sure
that we flush the writes done via the GTT first (as the CPU cannot see
the address aliasing).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-9-chris@chris-wilson.co.uk
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clflushes are required in
order for the writes to be coherent.
Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.
v2: Add i915_gem_obj_finish_shmem_access() for symmetry
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-8-chris@chris-wilson.co.uk
When doing relocations, we have to obtain a mapping to the page
containing the target address. This is either a kmap or iomap depending
on GPU and its cache coherency. Neighbouring relocation entries are
typically within the same page and so we can cache our kmapping between
them and avoid those pesky TLB flushes.
Note that there is some sleight-of-hand in how the slow relocate works
as the reloc_entry_cache implies pagefaults disabled (as we are inside a
kmap_atomic section). However, the slow relocate code is meant to be the
fallback from the atomic fast path failing. Fortunately it works as we
already have performed the copy_from_user for the relocation array (no
more pagefaults there) and the kmap_atomic cache is enabled after we
have waited upon an active buffer (so no more sleeping in atomic).
Magic!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-7-chris@chris-wilson.co.uk
If we cannot release the fence (for example if someone is inexplicably
trying to write into a tiled framebuffer that is currently pinned to the
display! *cough* kms_frontbuffer_tracking *cough*) fallback to using the
page-by-page pwrite/pread interface, rather than fail the syscall
entirely.
Since this is triggerable by the user (along pwrite) we have to remove
the WARN_ON(fence->pin_count).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-6-chris@chris-wilson.co.uk
Similarly to invalidating beforehand, if the object is mmapped via
I915_MMAP_WC we cannot track writes through the I915_GEM_DOMAIN_GTT. At
the conclusion of the write, i915_gem_object_flush_gtt_writes() we also
need to treat the origin carefully in case it may have been untracked.
See also commit aeecc9696a ("drm/i915: use ORIGIN_CPU for frontbuffer
invalidation on WC mmaps").
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-5-chris@chris-wilson.co.uk
As pwrite does not use the fence for its GTT access, and may even go
through a secondary interface avoiding the main VMA, we cannot treat the
write as automatically invalidated by the hardware and so we require
ORIGIN_CPU frontbuffer invalidate/flushes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-4-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Since vfree() now likes to WARN when passed a non-page-aligned pointer,
we need to discard the low bits to comply with it.
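The gist of the fix, sketched (the flag bits packed into the low bits of
the cached pointer are described earlier in this series; the exact field
name and mask are assumptions):

    void *ptr = (void *)((unsigned long)obj->mapping & PAGE_MASK);

    if (is_vmalloc_addr(ptr))
            vfree(ptr);     /* vfree() now insists on a page-aligned pointer */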
Fixes: d31d7cb146 ("drm/i915: Support for creating write combined type vmaps")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-3-chris@chris-wilson.co.uk