Final pile of features for 4.13
New uabi:
- batch bo in first slot, for faster execbuf assembly in userspace
(Chris Wilson)
- (sub)slice getparam, needed for mesa perf support (Robert Bragg)
First pile of patches for cnl/cfl support, maintained by Rodrigo but
with lots of contributions from others. Still incomplete since public
review still ongoing.
Features/refactoring:
- Make execbuf faster (Chris Wilson), a pile of series to make execbuf
buffer handling have fewer passes, use less list walking, postpone
more work to async workers and shuffle buffers less, all to make the
common case much faster (in some cases at least).
- cold boot support for glk dsi (Madhav Chauhan)
- Clean up pipe A quirk and related old platform hacks (Ville)
- perf sampling support for kbl/glk (Lionel)
- perf cleanups (Robert Bragg)
- wire atomic state to backlight code, to avoid pipe lookup hacks
(Maarten)
- reduce request waiting latency/overhead to remove the spinning and
associated cpu cycle wasting (Chris)
- fix 90/270 rotation wm computation (Ville)
- new ddb allocation algo for skl (Kumar Mahesh)
- fix regression due to system suspend optimiazatino (Imre)
- the usual pile of small cleanups and refactors all over
GVT updates contained in this tag:
- optimization for per-VM mmio save/restore (Changbin)
- optimization for mmio hash table (Changbin)
- scheduler optimization with event (Ping)
- vGPU reset refinement (Fred)
- other misc refactor and cleanups, etc.
* tag 'drm-intel-next-2017-06-19' of git://anongit.freedesktop.org/git/drm-intel: (170 commits)
drm/i915: Update DRIVER_DATE to 20170619
drm/i915/cfl: Introduce Coffee Lake workarounds.
drm/i915: Store 9 bits of PCI Device ID for platforms with a LP PCH
drm/i915: Stash a pointer to the obj's resv in the vma
drm/i915: Async GPU relocation processing
drm/i915: Allow execbuffer to use the first object as the batch
drm/i915: Wait upon userptr get-user-pages within execbuffer
drm/i915: First try the previous execbuffer location
drm/i915: Store a persistent reference for an object in the execbuffer cache
drm/i915: Eliminate lots of iterations over the execobjects array
drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations
drm/i915: Pass vma to relocate entry
drm/i915: Store a direct lookup from object handle to vma
drm/i915: Fix retrieval of hangcheck stats
drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty
drm/i915: Mark CPU cache as dirty on every transition for CPU writes
drm/i915: Make i915_vma_destroy() static
drm/i915: Actually attach the tv_format property to the SDVO connector
Revert "drm/i915/skl: New ddb allocation algorithm"
drm/i915/glk: Add cold boot sequence for GLK DSI
...
This time around, the biggest thing is a bunch of GEM rework for more
fine grained locking and prep work to handle multiple address spaces
(ie. per-process pagetables). Also some HDMI fixes for 8x96
(snapdragon 820).
One unrelated bus patch, for something that seems to get merged
through whatever random tree (and has all the right ack's).
* tag 'drm-msm-next-2017-06-20' of git://people.freedesktop.org/~robclark/linux:
drm/msm: Fix potential buffer overflow issue
bus: SIMPLE_PM_BUS does not depend on ARCH_RENESAS
drm/msm: Separate locking of buffer resources from struct_mutex
drm/msm/hdmi: Fix HDMI pink strip issue seen on 8x96
drm/msm/hdmi: 8996 PLL: Populate unprepare
drm/msm/hdmi: Use bitwise operators when building register values
drm/msm: update generated headers
drm/msm: remove address-space id
drm/msm: support for an arbitrary number of address spaces
drm/msm: refactor how we handle vram carveout buffers
drm/msm: pass address-space to _get_iova() and friends
drm/msm/mdp4+5: move aspace/id to base class
drm/msm/mdp5: kill pipe_lock
drm/msm: fix locking inconsistency for gpu->hw_init()
drm/msm: Remove memptrs->wptr
drm/msm: Add a struct to pass configuration to msm_gpu_init()
drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
drm/msm: Remove idle function hook
drm/msm: Remove DRM_MSM_NUM_IOCTLS
drm/msm: gpu: Enable zap shader for A5XX
This change implements support for per-engine reset as an initial, less
intrusive hang recovery option to be attempted before falling back to the
legacy full GPU reset recovery mode if necessary. This is only supported
from Gen8 onwards.
Hangchecker determines which engines are hung and invokes error handler to
recover from it. Error handler schedules recovery for each of those engines
that are hung. The recovery procedure is as follows,
- identifies the request that caused the hang and it is dropped
- force engine to idle: this is done by issuing a reset request
- reset the engine
- re-init the engine to resume submissions.
If engine reset fails then we fall back to heavy weight full gpu reset
which resets all engines and reinitiazes complete state of HW and SW.
v2: Rebase.
v3: s/*engine_reset*/*reset_engine*/; freeze engine and irqs before
calling i915_gem_reset_engine (Chris).
v4: Rebase, modify i915_gem_reset_prepare to use a ring mask and
reuse the function for reset_engine.
v5: intel_reset_engine_start/cancel instead of request/unrequest_reset.
v6: Clean up reset_engine function to not require mutex, i.e. no need to call
revoke/restore_fences and _retire_requests (Chris).
v7: Remove leftovers from v5, i.e. no need to disable irq, hold
forcewake or wakeup the handoff bit (Chris).
v8: engine_retire_requests should be (and it was) static; explain that
we have to re-init the engine after reset, which is why the init_hw call
is needed; check reset-in-progress flag (Chris).
v9: Rebase, include code to pass the active request to gem_reset_engine
(as it is already done in full reset). Remove unnecessary
intel_reset_engine_start/cancel, these are executed as part of the
reset.
v10: Rebase, use the right I915_RESET_ENGINE flag.
v11: Fixup to call reset_finish_engine even on error.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-6-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-6-chris@chris-wilson.co.uk
This is a preparatory patch which modifies error handler to do per engine
hang recovery. The actual patch which implements this sequence follows
later in the series. The aim is to prepare existing recovery function to
adapt to this new function where applicable (which fails at this point
because core implementation is lacking) and continue recovery using legacy
full gpu reset.
A helper function is also added to query the availability of engine
reset. A subsequent patch will add the capability to query which type
of reset is present (engine -> full -> no-reset) via the get-param
ioctl.
It has been decided that the error events that are used to notify user of
reset will only be sent in case if full chip reset. In case of just
single (or multiple) engine resets, userspace won't be notified by these
events.
Note that this implementation of engine reset is for i915 directly
submitting to the ELSP, where the driver manages the hang detection,
recovery and resubmission. With GuC submission these tasks are shared
between driver and firmware; i915 will still responsible for detecting a
hang, and when it does it will have to request GuC to reset that Engine and
remind the firmware about the outstanding submissions. This will be
added in different patch.
v2: rebase, advertise engine reset availability in platform definition,
add note about GuC submission.
v3: s/*engine_reset*/*reset_engine*/. (Chris)
Handle reset as 2 level resets, by first going to engine only and fall
backing to full/chip reset as needed, i.e. reset_engine will need the
struct_mutex.
v4: Pass the engine mask to i915_reset. (Chris)
v5: Rebase, update selftests.
v6: Rebase, prepare for mutex-less reset engine.
v7: Pass reset_engine mask as a function parameter, and iterate over the
engine mask for reset_engine. (Chris)
v8: Use i915.reset >=2 in has_reset_engine; remove redundant reset
logging; add a reset-engine-in-progress flag to prevent concurrent
resets, and avoid dual purposing of reset-backoff. (Chris)
v9: Support reset of different engines in parallel (Chris)
v10: Handle reset-engine flag locking better (Chris)
v11: Squash in reporting of per-engine-reset availability.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ian Lister <ian.lister@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-4-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-5-chris@chris-wilson.co.uk
If we move the actual cleanup of the context to a worker, we can allow
the final free to be called from any context and avoid undue latency in
the caller.
v2: Negotiate handling the delayed contexts free by flushing the
workqueue before calling i915_gem_context_fini() and performing the final
free of the kernel context directly
v3: Flush deferred frees before new context allocations
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-2-chris@chris-wilson.co.uk
We do not want to carry on over missing constructors and don't
need a duplicated engine mask checking which is already done
in the setup phase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
So I've noticed a number of instances where it was not obvious from the
code whether ->task_list was for a wait-queue head or a wait-queue entry.
Furthermore, there's a number of wait-queue users where the lists are
not for 'tasks' but other entities (poll tables, etc.), in which case
the 'task_list' name is actively confusing.
To clear this all up, name the wait-queue head and entry list structure
fields unambiguously:
struct wait_queue_head::task_list => ::head
struct wait_queue_entry::task_list => ::entry
For example, this code:
rqw->wait.task_list.next != &wait->task_list
... is was pretty unclear (to me) what it's doing, while now it's written this way:
rqw->wait.head.next != &wait->entry
... which makes it pretty clear that we are iterating a list until we see the head.
Other examples are:
list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
list_for_each_entry(wq, &fence->wait.task_list, task_list) {
... where it's unclear (to me) what we are iterating, and during review it's
hard to tell whether it's trying to walk a wait-queue entry (which would be
a bug), while now it's written as:
list_for_each_entry_safe(pos, next, &x->head, entry) {
list_for_each_entry(wq, &fence->wait.head, entry) {
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename:
wait_queue_t => wait_queue_entry_t
'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.
Start sorting this out by renaming it to 'wait_queue_entry_t'.
This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
dev_addr isn't even a dma_addr_t, and DMA_ERROR_CODE has never been
a valid driver API. Add a bool mapped flag instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
DMA_ERROR_CODE already isn't a valid API to user for drivers and will
go away soon. exynos_drm_fb_dma_addr uses it a an error return when
the passed in index is invalid, but the callers never check for it
but instead pass the address straight to the hardware.
Add a WARN_ON instead and just return 0.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Unify and review everything, plus make sure it's all correct markup.
Drop the kernel-doc for internal functions. Also rework the overview
section, it's become rather outdated.
Unfortuantely the kernel-doc in drm_driver isn't rendered yet, but
that will change as soon as drm_driver is kernel-docified properly.
Also document properly that drm_vblank_cleanup is optional, the core
calls this already.
v2: Make it clear that cleanup happens in drm_dev_fini for drivers
with their own ->release callback (Thierry).
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170524145212.27837-11-daniel.vetter@ffwll.ch
A few more things for 4.13:
- Semaphore support using sync objects
- Drop fb location programming
- Optimize bo list ioctl
* 'drm-next-4.13' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: Optimize mutex usage (v4)
drm/amdgpu: Optimization of AMDGPU_BO_LIST_OP_CREATE (v2)
amdgpu: use drm sync objects for shared semaphores (v6)
amdgpu/cs: split out fence dependency checking (v2)
drm/amdgpu: don't check the default value for vm size
Here are the Mali DP driver changes. They include the mali-dp specific
changes from Jose Abreu on crtc->mode_valid() as well as a couple of
patches for fixing the sharing of IRQ lines and use of DRM CMA helper
for framebuffer physical address calculation. Please pull!
* 'for-upstream/mali-dp' of git://linux-arm.org/linux-ld:
drm/arm: mali-dp: Use CMA helper for plane buffer address calculation
drm/mali-dp: Check PM status when sharing interrupt lines
drm/arm: malidp: Use crtc->mode_valid() callback
- HDMI stereoscopic support
- Rework of display code to properly support SOR pad macro routing on
>=GM20x GPUs
- Various other fixes/improvements.
* 'linux-4.13' of git://github.com/skeggsb/linux: (73 commits)
drm/nouveau/disp/nv50-: avoid creating ORs that aren't present on HW
drm/nouveau: use proper prototype in nouveau_pmops_runtime() definition
drm/nouveau: Skip vga_fini on non-PCI device
drm/nouveau/tegra: Don't leave GPU in reset
drm/nouveau/tegra: Skip manual unpowergating when not necessary
drm/nouveau/hwmon: Change permissions to numeric
drm/nouveau/hwmon: expose the auto_point and pwm_min/max attrs
drm/nouveau/hwmon: Remove old code, add .write/.read operations
drm/nouveau/hwmon: Add nouveau_hwmon_ops structure with .is_visible/.read_string
drm/nouveau/hwmon: Add config for all sensors and their settings
drm/nouveau/disp/gm200-: allow non-identity mapping of SOR <-> macro links
drm/nouveau/disp/nv50-: implement a common supervisor 3.0
drm/nouveau/disp/nv50-: implement a common supervisor 2.2
drm/nouveau/disp/nv50-: implement a common supervisor 2.1
drm/nouveau/disp/nv50-: implement a common supervisor 2.0
drm/nouveau/disp/nv50-: implement a common supervisor 1.0
drm/nouveau/disp/nv50-gt21x: remove workaround for dp->tmds hotplug issues
drm/nouveau/disp/dp: use new devinit script interpreter entry-point
drm/nouveau/disp/dp: determine link bandwidth requirements from head state
drm/nouveau/disp: introduce acquire/release display path methods
...
drm/tegra: Changes for v4.13-rc1
This starts off with the addition of more documentation for the host1x
and DRM drivers and finishes with a slew of fixes and enhancements for
the staging IOCTLs as a result of the awesome work done by Dmitry and
Erik on the grate reverse-engineering effort.
* tag 'drm/tegra/for-4.13-rc1' of git://anongit.freedesktop.org/tegra/linux:
gpu: host1x: At first try a non-blocking allocation for the gather copy
gpu: host1x: Refactor channel allocation code
gpu: host1x: Remove unused host1x_cdma_stop() definition
gpu: host1x: Remove unused 'struct host1x_cmdbuf'
gpu: host1x: Check waits in the firewall
gpu: host1x: Correct swapped arguments in the is_addr_reg() definition
gpu: host1x: Forbid unrelated SETCLASS opcode in the firewall
gpu: host1x: Forbid RESTART opcode in the firewall
gpu: host1x: Forbid relocation address shifting in the firewall
gpu: host1x: Do not leak BO's phys address to userspace
gpu: host1x: Correct host1x_job_pin() error handling
gpu: host1x: Initialize firewall class to the job's one
drm/tegra: dc: Disable plane if it is invisible
drm/tegra: dc: Apply clipping to the plane
drm/tegra: dc: Avoid reset asserts on Tegra20
drm/tegra: Check syncpoint ID in the 'submit' IOCTL
drm/tegra: Correct copying of waitchecks and disable them in the 'submit' IOCTL
drm/tegra: Check for malformed offsets and sizes in the 'submit' IOCTL
drm/tegra: Add driver documentation
gpu: host1x: Flesh out kerneldoc
In function submit_create, if nr_cmds or nr_bos is assigned with
negative value, the allocated buffer may be small than intended.
Using this buffer will lead to buffer overflow issue.
Signed-off-by: Kasin Li <donglil@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
In original function amdgpu_bo_list_get, the waiting
for result->lock can be quite long while mutex
bo_list_lock was holding. It can make other tasks
waiting for bo_list_lock for long period.
Secondly, this patch allows several tasks(readers of idr)
to proceed at the same time.
v2: use rcu and kref (Dave Airlie and Christian König)
v3: update v1 commit message (Michel Dänzer)
v4: rebase on upstream (Alex Deucher)
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
v2: Remove duplication of zeroing of bo list (Christian König)
Move idr_alloc function to end of ioctl (Christian König)
Call kfree bo_list when amdgpu_bo_list_set return error.
Combine the previous two patches into this patch.
Add amdgpu_bo_list_set function prototype.
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>