linux

Author	SHA1	Message	Date
Daniel Vetter	6c5566a82c	drm/i915: Allow i915_gem_setup_global_gtt to fail We already needs this just as a safety check in case the preallocation reservation dance fails. But we definitely need this to be able to move tha aliasing ppgtt setup back out of the context code to this place, where it belongs. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-13 14:23:30 +02:00
Daniel Vetter	5dc383b05a	drm/i915: Add proper prefix to obj_to_ggtt Stuff in headers really aught to have this. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-13 14:23:29 +02:00
Daniel Vetter	841cd77375	drm/i915: Only refcount ppgtt if it actually is one This essentially unbreaks non-ppgtt operation where we'd scribble over random memory. While at it give the vm_to_ppgtt function a proper prefix and make it a bit more paranoid. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-13 14:23:29 +02:00
Daniel Vetter	ee960be7bb	drm/i915: Some cleanups for the ppgtt lifetime handling So when reviewing Michel's patch I've noticed a few things and cleaned them up: - The early checks in ppgtt_release are now redundant: The inactive list should always be empty now, so we can ditch these checks. Even for the aliasing ppgtt (though that's a different confusion) since we tear that down after all the objects are gone. - The ppgtt handling functions are splattered all over. Consolidate them in i915_gem_gtt.c, give them OCD prefixes and add wrappers for get/put. - There was a bit a confusion in ppgtt_release about whether it cares about the active or inactive list. It should care about them both, so augment the WARNINGs to check for both. There's still create_vm_for_ctx left to do, put that is blocked on the removal of ppgtt->ctx. Once that's done we can rename it to i915_ppgtt_create and move it to its siblings for handling ppgtts. v2: Move the ppgtt checks into the inline get/put functions as suggested by Chris. v3: Inline the now redundant ppgtt local variable. Cc: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-12 15:24:04 +02:00
Michel Thierry	b9d06dd9d1	drm/i915: vma/ppgtt lifetime rules VMAs should take a reference of the address space they use. Now, when the fd is closed, it will release the ref that the context was holding, but it will still be referenced by any vmas that are still active. ppgtt_release() should then only be called when the last thing referencing it releases the ref, and it can just call the base cleanup and free the ppgtt. Note that with this we will extend the lifetime of ppgtts which contain shared objects. But all the non-shared objects will get removed as soon as they drop of the active list and for the shared ones the shrinker can eventually reap them. Since we currently can't evict ppgtt pagetables either I don't think that temporary leak is important. Signed-off-by: Michel Thierry <michel.thierry@intel.com> [danvet: Add note about potential ppgtt leak with this approach.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-12 15:22:26 +02:00
Oscar Mateo	454afebde8	drm/i915/bdw: Skeleton for the new logical rings submission path Execlists are indeed a brave new world with respect to workload submission to the GPU. In previous version of these series, I have tried to impact the legacy ringbuffer submission path as little as possible (mostly, passing the context around and using the correct ringbuffer when I needed one) but Daniel is afraid (probably with a reason) that these changes and, especially, future ones, will end up breaking older gens. This commit and some others coming next will try to limit the damage by creating an alternative path for workload submission. The first step is here: laying out a new ring init/fini. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:40:57 +02:00
Oscar Mateo	a83014d3f8	drm/i915: Abstract the legacy workload submission mechanism away As suggested by Daniel Vetter. The idea, in subsequent patches, is to provide an alternative to these vfuncs for the Execlists submission mechanism. v2: Splitted into two and reordered to illustrate our intentions, instead of showing it off. Also, remove the add_request vfunc and added the stop_ring one. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: - Make checkpatch happy. - Be grumpy about the excessive vtable. - Ditch gt->is_ring_initialized.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:40:32 +02:00
Oscar Mateo	127f100369	drm/i915/bdw: Macro for LRCs and module option for Execlists GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts". These expanded contexts enable a number of new abilities, especially "Execlists". The macro is defined to off until we have things in place to hope to work. v2: Rename "advanced contexts" to the more correct "logical ring contexts". v3: Add a module parameter to enable execlists. Execlist are relatively new, and so it'd be wise to be able to switch back to ring submission to debug subtle problems that will inevitably arise. v4: Add an intel_enable_execlists function. v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 16:00:27 +02:00
Chris Wilson	82b6b6d786	drm/i915: Remove fenced_gpu_access and pending_fenced_gpu_access This migrates the fence tracking onto the existing seqno infrastructure so that the later conversion to tracking via requests is simplified. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 12:20:25 +02:00
Chris Wilson	e6a844687c	drm/i915: Force CPU relocations if not GTT mapped Move the decision on whether we need to have a mappable object during execbuffer to the fore and then reuse that decision by propagating the flag through to reservation. As a corollary, before doing the actual relocation through the GTT, we can make sure that we do have a GTT mapping through which to operate. Note that the key to make this work is to ditch the obj->map_and_fenceable unbind optimization - with full ppgtt it doesn't make a lot of sense any more anyway. v2: Revamp and resend to ease future patches. v3: Refresh patch rationale References: https://bugs.freedesktop.org/show_bug.cgi?id=81094 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: Explain why obj->map_and_fenceable is key and split out the secure batch fix.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 12:01:29 +02:00
Chris Wilson	dc8cd1e790	drm/i915: Only perform set-to-gtt domain for objects bound to the global gtt If an object is not bound into the global GTT, then it cannot be accessed via the GTT. This restores the original code that was muddled by ppGTT. In the process, we remove a WARN that had long outlived its usefulness and was simply being coded around instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-11 11:36:12 +02:00
Linus Torvalds	889fa782bf	Merge tag 'drm-intel-fixes-2014-08-08' of git://anongit.freedesktop.org/drm-intel Pull intel drm fixes from Daniel Vetter: "So I heard that proper pull requests have a revert on top ;-) So here we go with my usual mid-merge-window pile of fixes. [ Ed. This revert thing had better not become the "in" thing ] Big fix is the duct-tape for ring init on g4x platforms, we seem to have found the magic again to make those machines as happy as before (not perfect though unfortunately, but that was never the case). Otherwise fixes all over: - tune down some overzealous debug output - VDD power sequencing fix after resume - bunch of dsi fixes for baytrail among them hw state checker de-noising - bunch of error state capture fixes for bdw - misc tiny fixes/workarounds for various platforms Last minute rebase was to kick out two patches that shouldn't have been in here - they're for the state checker, so 0 functional code affected. Jani's back from vacation, so he'll take over -fixes from here" * tag 'drm-intel-fixes-2014-08-08' of git://anongit.freedesktop.org/drm-intel: (21 commits) Revert "drm/i915: Enable semaphores on BDW" drm/i915: read HEAD register back in init_ring_common() to enforce ordering drm/i915: Fix crash when failing to parse MIPI VBT drm/i915: Bring GPU Freq to min while suspending. drm/i915: Fix DEIER and GTIER collecting for BDW. drm/i915: Don't accumulate hangcheck score on forward progress drm/i915: Add the WaCsStallBeforeStateCacheInvalidate:bdw workaround. drm/i915: Refactor Broadwell PIPE_CONTROL emission into a helper. drm/i915: Fix threshold for choosing 32 vs. 64 precisions for VLV DDL values drm/i915: Fix drain latency precision multipler for VLV drm/i915: Collect gtier properly on HSW. drm/i915: Tune down MCH_SSKPD values warning drm/i915: Tune done rc6 enabling output drm/i915: Don't require dev->struct_mutex in psr_match_conditions drm/i915: Fix error state collecting drm/i915: fix VDD state tracking after system resume drm/i915: Add correct hw/sw config check for DSI encoder drm/i915: factor out intel_edp_panel_vdd_sanitize drm/i915: wait for all DSI FIFOs to be empty drm/i915: work around warning in i915_gem_gtt ...	2014-08-08 10:24:36 -07:00
Linus Torvalds	a7d7a143d0	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull DRM updates from Dave Airlie: "Like all good pull reqs this ends with a revert, so it must mean we tested it, [ Ed. That's _one_ way of looking at it ] This pull is missing nouveau, Ben has been stuck trying to track down a very longstanding bug that revealed itself due to some other changes. I've asked him to send you a direct pull request for nouveau once he cleans things up. I'm away until Monday so don't want to delay things, you can make a decision on that when he sends it, I have my phone so I can ack things just not really merge much. It has one trivial conflict with your tree in armada_drv.c, and also the pull request contains some component changes that are already in your tree, the base tree from Russell went via Greg's tree already, but some stuff still shows up in here that doesn't when I merge my tree into yours. Otherwise all pretty standard graphics fare, one new driver and changes all over the place. New drivers: - sti kms driver for STMicroelectronics chipsets stih416 and stih407. core: - lots of cleanups to the drm core - DP MST helper code merged - universal cursor planes. - render nodes enabled by default panel: - better panel interfaces - new panel support - non-continuous cock advertising ability ttm: - shrinker fixes i915: - hopefully ditched UMS support - runtime pm fixes - psr tracking and locking - now enabled by default - userptr fixes - backlight brightness fixes - MST support merged - runtime PM for dpms - primary planes locking fixes - gen8 hw semaphore support - fbc fixes - runtime PM on SOix sleep state hw. - mmio base page flipping - lots of vlv/chv fixes. - universal cursor planes radeon: - Hawaii fixes - display scalar support for non-fixed mode displays - new firmware format support - dpm on more asics by default - GPUVM improvements - uncached and wc GTT buffers - BOs > visible VRAM exynos: - i80 interface support - module auto-loading - ipp driver consolidated. armada: - irq handling in crtc layer only - crtc renumbering - add component support - DT interaction changes. tegra: - load as module fixes - eDP bpp and sync polarity fixed - DSI non-continuous clock mode support - better support for importing buffers from nouveau msm: - mdp5/adq8084 v1.3 hw enablement - devicetree clk changse - ifc6410 board working tda998x: - component support - DT documentation update vmwgfx: - fix compat shader namespace" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (551 commits) Revert "drm: drop redundant drm_file->is_master" drm/panel: simple: Use devm_gpiod_get_optional() drm/dsi: Replace upcasting macro by function drm/panel: ld9040: Replace upcasting macro by function drm/exynos: dp: Modify driver to support drm_panel drm/exynos: Move DP setup into commit() drm/panel: simple: Add AUO B133HTN01 panel support drm/panel: simple: Support delays in panel functions drm/panel: simple: Add proper definition for prepare and unprepare drm/panel: s6e8aa0: Add proper definition for prepare and unprepare drm/panel: ld9040: Add proper definition for prepare and unprepare drm/tegra: Add support for panel prepare and unprepare routines drm/exynos: dsi: Add support for panel prepare and unprepare routines drm/exynos: dpi: Add support for panel prepare and unprepare routines drm/panel: simple: Add dummy prepare and unprepare routines drm/panel: s6e8aa0: Add dummy prepare and unprepare routines drm/panel: ld9040: Add dummy prepare and unprepare routines drm/panel: Provide convenience wrapper for .get_modes() drm/panel: add .prepare() and .unprepare() functions drm/panel: simple: Remove simple-panel compatible ...	2014-08-07 17:36:12 -07:00
Deepak S	274fa1c1ac	drm/i915: Bring GPU Freq to min while suspending. We might be leaving the PGU Frequency (and thus vnn) high during the suspend. Flusing the delayed work queue should take care of this. Signed-off-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-08-07 14:04:09 +02:00
Linus Torvalds	e7fda6c4c3	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. - The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...	2014-08-05 17:46:42 -07:00
Thomas Gleixner	5ed0bdf21a	drm: i915: Use nsec based interfaces Use ktime_get_raw_ns() and get rid of the back and forth timespec conversions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 15:01:50 -07:00
Chris Wilson	eedd10f45b	drm/i915: Simplify i915_gem_release_all_mmaps() An object can only have an active gtt mapping if it is currently bound into the global gtt. Therefore we can simply walk the list of all bound objects and check the flag upon those for an active gtt mapping. From commit `48018a57a8` Author: Paulo Zanoni <paulo.r.zanoni@intel.com> Date: Fri Dec 13 15:22:31 2013 -0200 drm/i915: release the GTT mmaps when going into D3 Also note that the WARN is inappropriate for this function as GPU activity is orthogonal to GTT mmap status. Rather it is the caller that relies upon this condition and so it should assert that the GPU is idle itself. References: https://bugs.freedesktop.org/show_bug.cgi?id=80081 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [danvet: cherry-pick from -next to -fixes.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 16:09:51 +02:00
Armin Reese	9490edb588	drm/i915: Do not unmap object unless no other VMAs reference it When using an IOMMU, GEM objects are mapped by their DMA address as the physical address is unknown. This depends on the underlying IOMMU driver to map and unmap the physical pages properly as defined in intel_iommu.c. The current code will tell the IOMMU to unmap the GEM BO's pages on the destruction of the first VMA that "maps" that BO. This is clearly wrong as there may be other VMAs "mapping" that BO (using flink). The scanout is one such example. The patch fixes this issue by only unmapping the DMA maps when there are no more VMAs mapping that object. This is equivalent to when an object is considered unbound as can be seen by the code. On the first VMA that again because bound, we will remap. An alternate solution would be to move the dma mapping to object creation and destrubtion. I am not sure if this is considered an unfriendly thing to do. Some notes to backporters trying to backport full PPGTT: The bug can never be hit without enabling the IOMMU. The existing code will also do the right thing when the object is shared via dmabuf. The failure should be demonstrable with flink. In cases when not using intel_iommu_strict it is likely (likely, as defined by: off the top of my head) on current workloads to not hit this bug since we often teardown all VMAs for an object shared across multiple VMs. We also finish access to that object before the first dma_unmapping. intel_iommu_strict with flinked buffers is likely to hit this issue. Signed-off-by: Armin Reese <armin.c.reese@intel.com> [danvet: Add the excellent commit message provided by Ben.] Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 07:05:40 +02:00
Jesse Barnes	9df7575f1c	drm/i915: add helper for checking whether IRQs are enabled Now that we use the runtime IRQ enable/disable functions in our suspend path, we can simply check the pm._irqs_disabled flag everywhere. So rename it to catch the users, and add an inline for it to make the checks clear everywhere. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 07:05:34 +02:00
Chris Wilson	a1db2fa7c8	drm/i915: Abandon oom quickly if killed by a signal Whilst waiting to obtain our locks for the last resort shrinking before an oom, we check whether or not a fatal signal was pending. If there was, we do not need to keep waiting as the oom will be aborted. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-23 07:05:28 +02:00
Oscar Mateo	1b5d063faf	drm/i915: Generalize intel_ring_get_tail to take a ringbuf Again, it's low-level enough to simply take a ringbuf and nothing else. Trivial change. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-08 12:31:02 +02:00
Chris Wilson	ec5cc0f9b0	drm/i915: Restrict GPU boost to the RCS engine Make the assumption that media workloads are not as latency sensitive for __wait_seqno, and that upclocking the GPU does not affect the BLT engine. Under that assumption, we only wait to forcibly upclock the GPU when we are stalling for results from the render pipeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Deepak S<deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-08 10:25:17 +02:00
Rodrigo Vivi	ddd4dbc6c1	drm/i915: Updating comments. ring index calculation table was out of date after other rings were added, although the formula is flexible and scale when adding new rings. So this patch just update the comments and add a brief explanation why to use sync_seqno[ring index]. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 22:02:49 +02:00
Daniel Vetter	f99d70690e	drm/i915: Track frontbuffer invalidation/flushing So these are the guts of the new beast. This tracks when a frontbuffer gets invalidated (due to frontbuffer rendering) and hence should be constantly scaned out, and when it's flushed again and can be compressed/one-shot-upload. Rules for flushing are simple: The frontbuffer needs one more full upload starting from the next vblank. Which means that the flushing can _only_ be called once the frontbuffer update has been latched. But this poses a problem for pageflips: We can't just delay the flushing until the pageflip is latched, since that would pose the risk that we override frontbuffer rendering that has been scheduled in-between the pageflip ioctl and the actual latching. To handle this track asynchronous invalidations (and also pageflip) state per-ring and delay any in-between flushing until the rendering has completed. And also cancel any delayed flushing if we get a new invalidation request (whether delayed or not). Also call intel_mark_fb_busy in both cases in all cases to make sure that we keep the screen at the highest refresh rate both on flips, synchronous plane updates and for frontbuffer rendering. v2: Lots of improvements Suggestions from Chris: - Move invalidate/flush in flush__domain and set_to__domain. - Drop the flush in busy_ioctl since it's redundant. Was a leftover from an earlier concept to track flips/delayed flushes. - Don't forget about the initial modeset enable/final disable. Suggested by Chris. Track flips accurately, too. Since flips complete independently of rendering we need to track pending flips in a separate mask. Again if an invalidate happens we need to cancel the evenutal flush to avoid races. v3: Provide correct header declarations for flip functions. Currently not needed outside of intel_display.c, but part of the proper interface. v4: Add proper domain management to fbcon so that the fbcon buffer is also tracked correctly. v5: Fixup locking around the fbcon set_to_gtt_domain call. v6: More comments from Chris: - Split out fbcon changes. - Drop superflous checks for potential scanout before calling intel_fb functions - we can micro-optimize this later. - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem object. We already have precedence for fb_obj in the pin_and_fence functions. v7: Clarify the semantics of the flip flush handling by renaming things a bit: - Don't go through a gem object but take the relevant frontbuffer bits directly. These functions center on the plane, the actual object is irrelevant - even a flip to the same object as already active should cause a flush. - Add a new intel_frontbuffer_flip for synchronous plane updates. It currently just calls intel_frontbuffer_flush since the implemenation differs. This way we achieve a clear split between one-shot update events on one side and frontbuffer rendering with potentially a very long delay between the invalidate and flush. Chris and I also had some discussions about mark_busy and whether it is appropriate to call from flush. But mark busy is a state which should be derived from the 3 events (invalidate, flush, flip) we now have by the users, like psr does by tracking relevant information in psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for frontbuffer) needs to have similar logic. With that the overall mark_busy in the core could be removed. v8: Only when retiring gpu buffers only flush frontbuffer bits we actually invalidated in a batch. Just for safety since before any additional usage/invalidate we should always retire current rendering. Suggested by Chris Wilson. v9: Actually use intel_frontbuffer_flip in all appropriate places. Spotted by Chris. v10: Address more comments from Chris: - Don't call _flip in set_base when the crtc is inactive, avoids redunancy in the modeset case with the initial enabling of all planes. - Add comments explaining that the initial/final plane enable/disable still has work left to do before it's fully generic. v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris. v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 18:14:47 +02:00
Daniel Vetter	a071fa0064	drm/i915: Introduce accurate frontbuffer tracking So from just a quick look we seem to have enough information to accurately figure out whether a given gem bo is used as a frontbuffer and where exactly: We have obj->pin_count as a first check with no false negatives and only negligible false positives. And then we can just walk the modeset objects and figure out where exactly a buffer is used as scanout. Except that we can't due to locking order: If we already hold dev->struct_mutex we can't acquire any modeset locks, so could potential chase freed pointers and other evil stuff. So we need something else. For that introduce a new set of bits obj->frontbuffer_bits to track where a buffer object is used. That we can then chase without grabbing any modeset locks. Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be able to do their magic both when called from modeset and from gem code. But that can be easily achieved by adding locks for these specific subsystems which always nest within either kms or gem locking. This patch just adds the relevant update code to all places. Note that if we ever support multi-planar scanout targets then we need one frontbuffer tracking bit per attachment point that we expose to userspace. v2: - Fix more oopsen. Oops. - WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix the bugs this brought to light. - s/update_frontbuffer_bits/update_fb_bits/. More consistent with the fb tracking functions (fb for gem object, frontbuffer for raw bits). And the function name was way too long. v3: Size obj->frontbuffer_bits correctly so that all pipes fit in. v4: Don't update fb bits in set_base on failure. Noticed by Chris. v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few local enum pipe variables which are now no longer needed to make the function arguments no drop over the 80 char limit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 10:04:41 +02:00
Daniel Vetter	3108e99ea9	drm/i915: Drop schedule_back from psr_exit It doesn't make sense to never again schedule the work, since by the time we might want to re-enable psr the world might have changed and we can do it again. The only exception is when we shut down the pipe, but that's an entirely different thing and needs to be handled in psr_disable. Note that later patch will again split psr_exit into psr_invalidate and psr_flush. But the split is different and this simplification helps with the transition. v2: Improve the commit message a bit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 09:59:19 +02:00
Daniel Vetter	f25748ea73	drm/i915: Don't BUG_ON in i915_gem_obj_offset A WARN_ON is perfectly fine. The BUG in here seems to be the cause behind hard-hangs when I cat the i915_gem_pageflip debugfs file (which calls this from an irq spinlock). But only while running a full igt run after a while. I still need to root cause the underlying issue. I'll also start reject patches which add new BUG_ON but don't come with a really good justification for it. The general rule really should be to just WARN and hope the driver survives for long enough. v2: Make the WARN a bit more useful per Chris' suggestion. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-18 00:48:37 +02:00
Ville Syrjälä	beff0d0f61	drm/i915: Don't prefault the entire obj if the vma is smaller Take the minimum of the object size and the vma size and prefault only that much. Avoids a SIGBUS when mmapping only a portion of the object. Prefaulting was introduced here: commit `b90b91d870` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Jun 10 12:14:40 2014 +0100 drm/i915: Prefault the entire object on first page fault Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Testcase: igt/gem_mmap/short-mmap Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-18 00:48:35 +02:00
Sourab Gupta	84c33a64b4	drm/i915: Replaced Blitter ring based flips with MMIO flips This patch enables the framework for using MMIO based flip calls, in contrast with the CS based flip calls which are being used currently. MMIO based flip calls can be enabled on architectures where Render and Blitter engines reside in different power wells. The decision to use MMIO flips can be made based on workloads to give 100% residency for Media power well. v2: The MMIO flips now use the interrupt driven mechanism for issuing the flips when target seqno is reached. (Incorporating Ville's idea) v3: Rebasing on latest code. Code restructuring after incorporating Damien's comments v4: Addressing Ville's review comments -general cleanup -updating only base addr instead of calling update_primary_plane -extending patch for gen5+ platforms v5: Addressed Ville's review comments -Making mmio flip vs cs flip selection based on module parameter -Adding check for DRIVER_MODESET feature in notify_ring before calling notify mmio flip. -Other changes mostly in function arguments v6: -Having a seperate function to check condition for using mmio flips (Ville) -propogating error code from i915_gem_check_olr (Ville) v7: -Adding __must_check with i915_gem_check_olr (Chris) -Renaming mmio_flip_data to mmio_flip (Chris) -Rebasing on latest nightly v8: -Rebasing on latest code -squash 3rd patch in series(mmio setbase vs page flip race) with this patch -Added new tiling mode update in intel_do_mmio_flip (Chris) v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in intel_postpone_flip, as this is a more restrictive condition (Chris) v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch. These patches make the selection of CS vs MMIO flip at the page flip time, and make the module parameter for using mmio flips as tristate, the states being 'force CS flips', 'force mmio flips', 'driver discretion'. Changed the logic for driver discretion (Chris) v11: Minor code cleanup(better readability, fixing whitespace errors, using lockdep to check mutex locked status in postpone_flip, removal of __must_check in function definition) (Chris) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Akash Goel <akash.goel@intel.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # snb, ivb [danvet: Fix up parameter alignement checkpatch spotted.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-17 16:16:20 +02:00
Chris Wilson	6254b2042c	drm/i915: Simplify i915_gem_release_all_mmaps() An object can only have an active gtt mapping if it is currently bound into the global gtt. Therefore we can simply walk the list of all bound objects and check the flag upon those for an active gtt mapping. From commit `48018a57a8` Author: Paulo Zanoni <paulo.r.zanoni@intel.com> Date: Fri Dec 13 15:22:31 2013 -0200 drm/i915: release the GTT mmaps when going into D3 Also note that the WARN is inappropriate for this function as GPU activity is orthogonal to GTT mmap status. Rather it is the caller that relies upon this condition and so it should assert that the GPU is idle itself. References: https://bugs.freedesktop.org/show_bug.cgi?id=80081 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-16 19:52:20 +02:00
Rodrigo Vivi	7c8f8a7007	drm/i915: Force PSR exit by inactivating it. The perfect solution for psr_exit is the hardware tracking the changes and doing the psr exit by itself. This scenario works for HSW and BDW with some environments like Gnome and Wayland. However there are many other scenarios that this isn't true. Mainly one right now is KDE users on HSW and BDW with PSR on. User would miss many screen updates. For instances any key typed could be seen only when mouse cursor is moved. So this patch introduces the ability of trigger PSR exit on kernel side on some common cases that. Most of the cases are coverred by psr_exit at set_domain. The remaining cases are coverred by triggering it at set_domain, busy_ioctl, sw_finish and mark_busy. The downside here might be reducing the residency time on the cases this already work very wall like Gnome environment. But so far let's get focused on fixinge issues sio PSR couild be used for everybody and we could even get it enabled by default. Later we can add some alternatives to choose the level of PSR efficiency over boot flag of even over crtc property. v2: remove exit from connector_dpms. Daniel pointed this is the wrong way and also this isn't needed for BDW and HSW anyway. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 21:21:36 +02:00
Chris Wilson	b90b91d870	drm/i915: Prefault the entire object on first page fault Inserting additional PTEs has no side-effect for us as the pfn are fixed for the entire time the object is resident in the global GTT. The downside is that we pay the entire cost of faulting the object upon the first hit, for which we in return receive the benefit of removing the per-page faulting overhead. On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences, Upload rate for 2 linear surfaces: 8127MiB/s -> 8134MiB/s Upload rate for 2 tiled surfaces: 8607MiB/s -> 8625MiB/s Upload rate for 4 linear surfaces: 8127MiB/s -> 8127MiB/s Upload rate for 4 tiled surfaces: 8611MiB/s -> 8602MiB/s Upload rate for 8 linear surfaces: 8114MiB/s -> 8124MiB/s Upload rate for 8 tiled surfaces: 8601MiB/s -> 8603MiB/s Upload rate for 16 linear surfaces: 8110MiB/s -> 8123MiB/s Upload rate for 16 tiled surfaces: 8595MiB/s -> 8606MiB/s Upload rate for 32 linear surfaces: 8104MiB/s -> 8121MiB/s Upload rate for 32 tiled surfaces: 8589MiB/s -> 8605MiB/s Upload rate for 64 linear surfaces: 8107MiB/s -> 8121MiB/s Upload rate for 64 tiled surfaces: 2013MiB/s -> 3017MiB/s Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Goel, Akash" <akash.goel@intel.com> Testcasee: igt/gem_fence_upload/performance Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 15:17:41 +02:00
Chris Wilson	ef0cf27c4d	drm/i915: Use the .release hook to drop the stolen drm_mm tracking Now that we have a release hook into i915_gem_object_free, we can move the explicit call to the internal stolen function and hook it up throught the callback instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 15:17:36 +02:00
David Herrmann	f461d1be22	drm/i915: use shmem helpers if possible Instead of shuffling gfp-masks all the time, use the shmem_read_mapping_page() helper. Note that __GFP_IO and __GFP_WAIT are set in mapping_gfp_mask() for i915, so the behavior is still the same. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-11 16:57:38 +02:00
Dave Airlie	ecb889e620	Merge tag 'drm-intel-fixes-2014-06-06' of git://anongit.freedesktop.org/drm-intel into drm-next > Bunch of stuff for 3.16 still: > - Mipi dsi panel support for byt. Finally! From Shobhit&others. I've > squeezed this in since it's a regression compared to vbios and we've > been ridiculed about it a bit too often ... > - connection_mutex deadlock fix in get_connector (only affects i915). > - Core patches from Matt's primary plane from Matt Roper, I've pushed the > i915 stuff to 3.17. > - vlv power well sequencing fixes from Jesse. > - Fix for cursor size changes from Chris. > - agpbusy fixes from Ville. > - A few smaller things. > * tag 'drm-intel-fixes-2014-06-06' of git://anongit.freedesktop.org/drm-intel: (32 commits) drm/i915: BDW: Adding missing cursor offsets. drm: Fix getconnector connection_mutex locking drm/i915/bdw: Only use 2g GGTT for 32b platforms drm/i915: Nuke pipe A quirk on i830M drm/i915: fix display power sw state reporting drm/i915: Always apply cursor width changes drm/i915: tell the user if both KMS and UMS are disabled drm/plane-helper: Add drm_plane_helper_check_update() (v3) drm: Check CRTC compatibility in setplane drm/i915: use VBT to determine whether to enumerate the VGA port drm/i915: Don't WARN about ring idle bit on gen2 drm/i915: Silence the WARN if the user tries to GTT mmap an incoherent object drm/i915: Move the C3 LP write bit setup to gen3_init_clock_gating() for KMS drm/i915: Enable interrupt-based AGPBUSY# enable on 85x drm/i915: Flip the sense of AGPBUSY_DIS bit drm/i915: Set AGPBUSY# bit in init_clock_gating drm/i915/vlv: add pll assertion when disabling DPIO common well drm/i915/vlv: move DPIO common reset de-assert into __vlv_set_power_well drm/i915/vlv: re-order power wells so DPIO common comes after TX drm/i915/vlv: move CRI refclk enable into __vlv_set_power_well ...	2014-06-06 19:07:09 +10:00
Dave Airlie	8d4ad9d4bb	Merge commit '9e9a928eed8796a0a1aaed7e0b676db86ba84594' into drm-next Merge drm-fixes into drm-next. Both i915 and radeon need this done for later patches. Conflicts: drivers/gpu/drm/drm_crtc_helper.c drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/i915/i915_gem.c drivers/gpu/drm/i915/i915_gem_execbuffer.c drivers/gpu/drm/i915/i915_gem_gtt.c	2014-06-05 20:28:59 +10:00
Chris Wilson	ddeff6ee42	drm/i915: Silence the WARN if the user tries to GTT mmap an incoherent object If the user tries to mmap through the GTT an object that is marked as snooped, we report an error rather than allow the GPU to hang the machine. The choice of EINVAL, however, was unfortunate as we turn that into a WARN rather than a quiet SIGBUS. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-05 08:52:41 +02:00
Ville Syrjälä	dbb42748ac	drm/i915: Move the C3 LP write bit setup to gen3_init_clock_gating() for KMS Move the MI_ARB_STATE MI_ARB_C3_LP_WRITE_ENABLE setup to gen3_init_clock_gating() from i915_gem_load() when KMS is enabled. Leave it in i915_gem_load() for the UMS case, but add an explcit check, just to make it easier to spot it when we eventually rip out UMS support. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-05 08:52:40 +02:00
Chris Wilson	d23db88c3a	drm/i915: Prevent negative relocation deltas from wrapping This is pure evil. Userspace, I'm looking at you SNA, repacks batch buffers on the fly after generation as they are being passed to the kernel for execution. These batches also contain self-referenced relocations as a single buffer encompasses the state commands, kernels, vertices and sampler. During generation the buffers are placed at known offsets within the full batch, and then the relocation deltas (as passed to the kernel) are tweaked as the batch is repacked into a smaller buffer. This means that userspace is passing negative relocations deltas, which subsequently wrap to large values if the batch is at a low address. The GPU hangs when it then tries to use the large value as a base for its address offsets, rather than wrapping back to the real value (as one would hope). As the GPU uses positive offsets from the base, we can treat the relocation address as the minimum address read by the GPU. For the upper bound, we trust that userspace will not read beyond the end of the buffer. So, how do we fix negative relocations from wrapping? We can either check that every relocation looks valid when we write it, and then position each object such that we prevent the offset wraparound, or we just special-case the self-referential behaviour of SNA and force all batches to be above 256k. Daniel prefers the latter approach. This fixes a GPU hang when it tries to use an address (relocation + offset) greater than the GTT size. The issue would occur quite easily with full-ppgtt as each fd gets its own VM space, so low offsets would often be handed out. However, with the rearrangement of the low GTT due to capturing the BIOS framebuffer, it is already affecting kernels 3.15 onwards. I think only IVB+ is susceptible to this bug, but the workaround should only kick in rarely, so it seems sensible to always apply it. v3: Use a bias for batch buffers to prevent small negative delta relocations from wrapping. v4 from Daniel: - s/BIAS/BATCH_OFFSET_BIAS/ - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions were growing rather cumbersome. - Add a comment to eb_get_batch explaining why we do this. - Apply the batch offset bias everywhere but mention that we've only observed it on gen7 gpus. - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch. v5: Add static to eb_get_batch, spotted by 0-day tester. Testcase: igt/gem_bad_reloc Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3) Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-27 11:18:40 +03:00
Chris Wilson	00731155a7	drm/i915: Fix dynamic allocation of physical handles A single object may be referenced by multiple registers fundamentally breaking the static allotment of ids in the current design. When the object is used the second time, the physical address of the first assignment is relinquished and a second one granted. However, the hardware is still reading (and possibly writing) to the old physical address now returned to the system. Eventually hilarity will ensue, but in the short term, it just means that cursors are broken when using more than one pipe. v2: Fix up leak of pci handle when handling an error during attachment, and avoid a double kmap/kunmap. (Ville) Rebase against -fixes. v3: And fix the error handling added in v2 (Ville) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77351 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-27 11:18:39 +03:00
Oscar Mateo	273497e5cd	drm/i915: s/i915_hw_context/intel_context Up until now, contexts had one (and only one) backing object that was used by the hardware to save/restore render ring contexts (via the MI_SET_CONTEXT command). Other rings did not have or need this, so our i915_hw_context struct had a 1:1 relationship with a a real HW context. With Logical Ring Contexts and Execlists, this is not possible anymore: all rings need a backing object, and it cannot be reused. To prepare for that, rename our contexts to the more generic term intel_context. No functional changes. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:41:17 +02:00
Oscar Mateo	ee1b1e5ef3	drm/i915: Split the ringbuffers from the rings (2/3) This refactoring has been performed using the following Coccinelle semantic script: @@ struct intel_engine_cs r; @@ ( - (r).obj + r.buffer->obj \| - (r).virtual_start + r.buffer->virtual_start \| - (r).head + r.buffer->head \| - (r).tail + r.buffer->tail \| - (r).space + r.buffer->space \| - (r).size + r.buffer->size \| - (r).effective_size + r.buffer->effective_size \| - (r).last_retired_head + r.buffer->last_retired_head ) @@ struct intel_engine_cs *r; @@ ( - (r)->obj + r->buffer->obj \| - (r)->virtual_start + r->buffer->virtual_start \| - (r)->head + r->buffer->head \| - (r)->tail + r->buffer->tail \| - (r)->space + r->buffer->space \| - (r)->size + r->buffer->size \| - (r)->effective_size + r->buffer->effective_size \| - (r)->last_retired_head + r->buffer->last_retired_head ) @@ expression E; @@ ( - LP_RING(E)->obj + LP_RING(E)->buffer->obj \| - LP_RING(E)->virtual_start + LP_RING(E)->buffer->virtual_start \| - LP_RING(E)->head + LP_RING(E)->buffer->head \| - LP_RING(E)->tail + LP_RING(E)->buffer->tail \| - LP_RING(E)->space + LP_RING(E)->buffer->space \| - LP_RING(E)->size + LP_RING(E)->buffer->size \| - LP_RING(E)->effective_size + LP_RING(E)->buffer->effective_size \| - LP_RING(E)->last_retired_head + LP_RING(E)->buffer->last_retired_head ) Note: On top of this this patch also removes the now unused ringbuffer fields in intel_engine_cs. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> [danvet: Add note about fixup patch included here.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:27:25 +02:00
Oscar Mateo	a4872ba6d0	drm/i915: s/intel_ring_buffer/intel_engine_cs In the upcoming patches we plan to break the correlation between engine command streamers (a.k.a. rings) and ringbuffers, so it makes sense to refactor the code and make the change obvious. No functional changes. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:01:05 +02:00
Chris Wilson	340fbd8ca1	drm/i915: Only discard backing storage on releasing the last ref Before purging our pages (as opposed to copying back the contents from the GPU), make sure that there is not an exposed CPU mmapping through which the user can inspect the results. Regression from commit `5537252b6b` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Mar 25 13:23:06 2014 +0000 drm/i915: Invalidate our pages under memory pressure Testcase: igt/gem_mmap/new-object Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79005 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Guo Jinxian <jinxianx.guo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 15:06:34 +02:00
Chris Wilson	2cfcd32a92	drm/i915: Implement an oom-notifier for last resort shrinking Before the process killer is invoked, oom-notifiers are executed for one last try at recovering pages. We can hook into this callback to be sure that everything that can be is purged from our page lists, and to give a summary of how much memory is still pinned by the GPU in the case of an oom. This should be really valuable for debugging OOM issues. Note that the last-ditch effort call to shrink_all we've previously called from our normal shrinker when we could free as much as the vm demaned is moved into the oom notifier. Since the shrinker accounting races against bind/unbind operations we might have called shrink_all prematurely, which this approach with an oom notifier avoids. References: https://bugs.freedesktop.org/show_bug.cgi?id=72742 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: lu hua <huax.lu@intel.com> [danvet: Bikeshed logical \| into \|\| and pimp commit message.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 10:57:13 +02:00
Chris Wilson	5537252b6b	drm/i915: Invalidate our pages under memory pressure Try to flush out dirty pages into the swapcache (and from there into the swapfile) when under memory pressure and forced to drop GEM objects from memory. In effect, this should just allow us to discard unused pages for memory reclaim and to start writeback earlier. v2: Hugh Dickins warned that explicitly starting writeback from shrink_slab was prone to deadlocks within shmemfs. Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:51:18 +02:00
Chris Wilson	b453c4dbc3	drm/i915: Refactor common lock handling between shrinker count/scan We can share a few lines of tricky lock handling we need to use for both shrinker routines and in the process fix the return value for count() when reporting a deadlock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:46:52 +02:00
Chris Wilson	ceabbba524	drm/i915: Include bound and active pages in the count of shrinkable objects When the machine is under a lot of memory pressure and being stressed by multiple GPU threads, we quite often report fewer than shrinker->batch (i.e. SHRINK_BATCH) pages to be freed. This causes the shrink_control to skip calling into i915.ko to release pages, despite the GPU holding onto most of the physical pages in its active lists. References: https://bugs.freedesktop.org/show_bug.cgi?id=72742 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:46:06 +02:00
Chris Wilson	0820baf39b	drm/i915: Translate ENOSPC from shmem_get_page() to ENOMEM shmemfs first checks if there is enough memory to allocate the page and reports ENOSPC should there be insufficient, along with the usual ENOMEM for a genuine allocation failure. We use ENOSPC in our driver to mean that we have run out of aperture space and so want to translate the error from shmemfs back to our usual understanding of ENOMEM. None of the the other GEM users appear to distinguish between ENOMEM and ENOSPC in their error handling, hence it is easiest to do the fixup in i915.ko Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:45:22 +02:00
Chris Wilson	5cc9ed4b9a	drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl By exporting the ability to map user address and inserting PTEs representing their backing pages into the GTT, we can exploit UMA in order to utilize normal application data as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making the intermixed streaming of CPU and GPU operations fairly efficient. This ability has many widespread implications from faster rendering of client-side software rasterisers (chromium), mitigation of stalls due to read back (firefox) and to faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL). v2: Compile with CONFIG_MMU_NOTIFIER v3: We can sleep while performing invalidate-range, which we can utilise to drop our page references prior to the kernel manipulating the vma (for either discard or cloning) and so protect normal users. v4: Only run the invalidate notifier if the range intercepts the bo. v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers v6: Recheck after reacquire mutex for lost mmu. v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary. v8: Fix rebasing error after forwarding porting the back port. v9: Limit the userptr to page aligned entries. We now expect userspace to handle all the offset-in-page adjustments itself. v10: Prevent vma from being copied across fork to avoid issues with cow. v11: Drop vma behaviour changes -- locking is nigh on impossible. Use a worker to load user pages to avoid lock inversions. v12: Use get_task_mm()/mmput() for correct refcounting of mm. v13: Use a worker to release the mmu_notifier to avoid lock inversion v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer with its own locking and tree of objects for each mm/mmu_notifier. v15: Prevent overlapping userptr objects, and invalidate all objects within the mmu_notifier range v16: Fix a typo for iterating over multiple objects in the range and rearrange error path to destroy the mmu_notifier locklessly. Also close a race between invalidate_range and the get_pages_worker. v17: Close a race between get_pages_worker/invalidate_range and fresh allocations of the same userptr range - and notice that struct_mutex was presumed to be held when during creation it wasn't. v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory for the struct sg_table and to clear it before reporting an error. v19: Always error out on read-only userptr requests as we don't have the hardware infrastructure to support them at the moment. v20: Refuse to implement read-only support until we have the required infrastructure - but reserve the bit in flags for future use. v21: use_mm() is not required for get_user_pages(). It is only meant to be used to fix up the kernel thread's current->mm for use with copy_user(). v22: Use sg_alloc_table_from_pages for that chunky feeling v23: Export a function for sanity checking dma-buf rather than encode userptr details elsewhere, and clean up comments based on suggestions by Bradley. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Akash Goel <akash.goel@intel.com> Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Frob ioctl allocation to pick the next one - will cause a bit of fuss with create2 apparently, but such are the rules.] [danvet2: oops, forgot to git add after manual patch application] [danvet3: Appease sparse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-16 19:31:29 +02:00
Oscar Mateo	19656430a8	drm/i915: Gracefully handle obj not bound to GGTT in is_pin_display Otherwise, we do a NULL pointer dereference. I've seen this happen while handling an error in i915_gem_object_pin_to_display_plane(): If i915_gem_object_set_cache_level() fails, we call is_pin_display() to handle the error. At this point, the object is still not pinned to GGTT and maybe not even bound, so we have to check before we dereference its GGTT vma. The IGT kms_flip/bo-too-big tests for this bug. v2: Chris Wilson says restoring the old value is easier, but that is_pin_display is useful as a theory of operation. Take the solomonic decision: at least this way is_pin_display is a little more robust (until Chris can kill it off). v3: Chris suggests the WARN in i915_gem_obj_to_ggtt has outlived its usefulness: add a reminder to remove it. Issue: VIZ-3772 Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Testcase: igt/kms_flip/bo-too-big Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-16 16:24:39 +02:00
Daniel Vetter	8b1bc9b4f1	drm/i915: Only do gtt cleanup in vma_unbind for the global vma Otherwise we end up tearing down fences when e.g. the client quits way too early. Might or might not fix a fence pin_count BUG Ville has reported. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-14 18:39:54 +02:00
Daniel Vetter	aff10b30a1	drm/i915: Don't drop pinned fences Userspace can currently provoke this when e.g. trying to use a pinned scanout as a cursor or overlay target. Later on that might lead to some fun fence pin count mayhem. Spurred by Ville's report that something goes wrong here and originally I've thought that this might slip through the pwrite gtt fastpath. But that one checks of obj tiling, so should be ok. But one thing that _does_ blow up is the vma unbinding with more than one address space. The next patch will fix this. v2: Use a WARN_ON - Chris pointed out that we already catch all cases so userspace can't provoke this like I've originally feared. While reviewing relevant code I've noticed a pile of DRM_ERROR in the overlay&cursor code which are all triggerable by userspace. Tune them down while at it. v3: Split out the DRM_ERROR->DRM_DEBUG_KMS change into a separate patch, as requested by Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-14 18:39:31 +02:00
Daniel Vetter	d8ffa60b52	drm/i915: WARN_ON fence pin leaks The fence pin count should always be <= the bo pin count. If that's not the case then we have a funny problem and are leaking references somewhere. Which means we can catch fence pin leaks by checking for the same upper limit as we do for the bo pin count. Inspired by a discussion with Ville about a fence leak igt testcase. v2: Also check for fence->pin_count <= ggtt_vma->pin_count, since that might catch a leak even quicker. Also de-inline them, they're getting too big. v3: Don't separately check for MAX_PIN_COUNT since the > vma->pin_count check will catch that already (Chris). Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-13 17:16:12 +02:00
Chris Wilson	1cf0ba1474	drm/i915: Flush request queue when waiting for ring space During the review of commit `1f70999f90` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jan 27 22:43:07 2014 +0000 drm/i915: Prevent recursion by retiring requests when the ring is full Ville raised the point that our interaction with request->tail was likely to foul up other uses elsewhere (such as hang check comparing ACTHD against requests). However, we also need to restore the implicit retire requests that certain test cases depend upon (e.g. igt/gem_exec_lut_handle), this raises the spectre that the ppgtt will randomly call i915_gpu_idle() and recurse back into intel_ring_begin(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023 Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Remove now unused 'tail' variable as spotted by Brad.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-08 01:23:34 +02:00
Ben Widawsky	6e7186af3b	drm/i915: Make aliasing a 2nd class VM There is a good debate to be had about how best to fit the aliasing PPGTT into the code. However, as it stands right now, getting aliasing PPGTT bindings is a hack, and done through implicit arguments. To make this absolutely clear, WARN and return an error if a driver writer tries to do something they shouldn't. I have no issue with an eventual revert of this patch. It makes sense for what we have today. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-07 10:01:41 +02:00
Ben Widawsky	ebc348b2ad	drm/i915: Move semaphore specific ring members to struct This will be helpful in abstracting some of the code in preparation for gen8 semaphores. v2: Move mbox stuff to a separate struct v3: Rebased over VCS2 work Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 10:56:52 +02:00
Chris Wilson	c8725f3dc0	drm/i915: Do not call retire_requests from wait_for_rendering A common issue we have is that retiring requests causes recursion through GTT manipulation or page table manipulation which we can only handle at very specific points. However, to maintain internal consistency (enforced through our sanity checks on write_domain at various points in the GEM object lifecycle) we do need to retire the object prior to marking it with a new write_domain, and also clear the write_domain for the implicit flush following a batch. Note that this then allows the unbound objects to still be on the active lists, and so care must be taken when removing objects from unbound lists (similar to the caveats we face processing the bound lists). v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules, by refactoring it to call into __i915_gem_shrink(). v3: Missed an object-retire prior to changing cache domains in i915_gem_object_set_cache_leve() v4: Rebase Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:09:15 +02:00
Imre Deak	981a5aead1	drm/i915: vlv: clean up GTLC wake control/status register macros These will be needed by the upcoming VLV RPM helpers. Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:50 +02:00
Zhao Yakui	845f74a701	drm/i915:Initialize the second BSD ring on BDW GT3 machine Based on the hardware spec, the BDW GT3 machine has two independent BSD ring that can be used to dispatch the video commands. So just initialize it. V3->V4: Follow Imre's comment to do some minor updates. For example: more comments are added to describe the semaphore between ring. Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> [danvet: Fix up checkpatch error.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:46 +02:00
Chris Wilson	6099032045	drm/i915: Allow the module to load even if we fail to setup rings Even without enabling the ringbuffers to allow command execution, we can still control the display engines to enable modesetting. So make the ringbuffer initialization failure soft, and mark the GPU as wedged instead. v2: Only treat an EIO from ring initialisation as a soft failure, and abort module load for any other failure, such as allocation failures. v3: Add an ERROR prior to declaring the GPU wedged so that it stands out like a sore thumb in the logs Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:38 +02:00
Chris Wilson	e3efda49e7	drm/i915: Preserve ring buffers objects across resume Tearing down the ring buffers across resume is overkill, risks unnecessary failure and increases fragmentation. After failure, since the device is still active we may end up trying to write into the dangling iomapping and trigger an oops. v2: stop_ringbuffers() was meant to call stop(ring) not cleanup(ring) during resume! Reported-by: Jae-hyeon Park <jhyeon@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=72351 References: https://bugs.freedesktop.org/show_bug.cgi?id=76554 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> [danvet: s/ring->obj == NULL/!intel_ring_initialized(ring)/ as suggested by Oscar.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:37 +02:00
Dave Airlie	444c9a08bf	Merge branch 'drm-init-cleanup' of git://people.freedesktop.org/~danvet/drm into drm-next Next pull request, this time more of the drm de-midlayering work. The big thing is that his patch series here removes everything from drm_bus except the set_busid callback. Thierry has a few more patches on top of this to make that one optional to. With that we can ditch all the non-pci drm_bus implementations, which Thierry has already done for the fake tegra host1x drm_bus. Reviewed by Thierry, Laurent and David and now also survived some testing on my intel boxes to make sure the irq fumble is fixed correctly ;-) The last minute rebase was just to add the r-b tags from Thierry for the 2 patches I've redone. * 'drm-init-cleanup' of git://people.freedesktop.org/~danvet/drm: drm/<drivers>: don't set driver->dev_priv_size to 0 drm: Remove dev->kdriver drm: remove drm_bus->get_name drm: rip out dev->devname drm: inline drm_pci_set_unique drm: remove bus->get_irq implementations drm: pass the irq explicitly to drm_irq_install drm/irq: Look up the pci irq directly in the drm_control ioctl drm/irq: track the irq installed in drm_irq_install in dev->irq drm: rename dev->count_lock to dev->buf_lock drm: Rip out totally bogus vga_switcheroo->can_switch locking drm: kill drm_bus->bus_type drm: remove drm_dev_to_irq from drivers drm/irq: remove cargo-culted locking from irq_install/uninstall drm/irq: drm_control is a legacy ioctl, so pci devices only drm/pci: fold in irq_by_busid support drm/irq: simplify irq checks in drm_wait_vblank	2014-05-01 09:32:21 +10:00
Dave Airlie	885ac04ab3	Merge tag 'drm-intel-next-2014-04-16' of git://anongit.freedesktop.org/drm-intel into drm-next drm-intel-next-2014-04-16: - vlv infoframe fixes from Jesse - dsi/mipi fixes from Shobhit - gen8 pageflip fixes for LRI/SRM from Damien - cmd parser fixes from Brad Volkin - some prep patches for CHV, DRRS, ... - and tons of little things all over drm-intel-next-2014-04-04: - cmd parser for gen7 but only in enforcing and not yet granting mode - the batch copying stuff is still missing. Also performance is a bit ... rough (Brad Volkin + OACONTROL fix from Ken). - deprecate UMS harder (i.e. CONFIG_BROKEN) - interrupt rework from Paulo Zanoni - runtime PM support for bdw and snb, again from Paulo - a pile of refactorings from various people all over the place to prep for new stuff (irq reworks, power domain polish, ...) drm-intel-next-2014-04-04: - cmd parser for gen7 but only in enforcing and not yet granting mode - the batch copying stuff is still missing. Also performance is a bit ... rough (Brad Volkin + OACONTROL fix from Ken). - deprecate UMS harder (i.e. CONFIG_BROKEN) - interrupt rework from Paulo Zanoni - runtime PM support for bdw and snb, again from Paulo - a pile of refactorings from various people all over the place to prep for new stuff (irq reworks, power domain polish, ...) Conflicts: drivers/gpu/drm/i915/i915_gem_context.c	2014-05-01 09:11:37 +10:00
Daniel Vetter	bb0f1b5c16	drm: pass the irq explicitly to drm_irq_install Unfortunately this requires a drm-wide change, and I didn't see a sane way around that. Luckily it's fairly simple, we just need to inline the respective get_irq implementation from either drm_pci.c or drm_platform.c. With that we can now also remove drm_dev_to_irq from drm_irq.c. Reviewed-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-23 10:32:50 +02:00
Daniel Vetter	e090c53b21	drm/irq: remove cargo-culted locking from irq_install/uninstall The dev->struct_mutex locking in drm_irq.c only protects dev->irq_enabled. Which isn't really much at all and only prevents especially nasty ums userspace from concurrently installing the interrupt handling a few times. Or at least trying. There are tons of unlocked readers of dev->irqs_enabled in the vblank wait code (and by extension also in the pageflip code since that uses the same vblank timestamp engine). Real modesetting drivers should ensure that nothing can go haywire with a sane setup teardown sequence. So we only really need this for the drm_control ioctl, everywhere else this will just paper over nastiness. Note that drm/i915 is a bit specially due to the gem+ums combination. So there we also need to properly protect the entervt and leavevt ioctls. But it's definitely saner to do everything in one go than to drop the lock in-between. Finally there's the gpu reset code in drm/i915. That one's just race (concurrent userspace calls to for vblank waits of pageflips could spuriously fail). So wrap it up in with a nice comment since fixing this is more involved. v2: Rebase and fix commit message (Thierry) Reviewed-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-22 11:41:12 +02:00
Chris Wilson	691e6415c8	drm/i915: Always use kref tracking for all contexts. If we always initialize kref for the context, even if we are using fake contexts for hangstats when there is no hw support, we can forgo the dance to dereference the ctx->obj and inspect whether we are permitted to use kref inside i915_gem_context_reference() and _unreference(). My ulterior motive here is to improve the debugging of a use-after-free of ctx->obj. This patch avoids the dereference here and instead forces the assertion checks associated with kref. v2: Refactor the fake contexts to being even more like the real contexts, so that there is much less duplicated and special case code. v3: Tweaks. v4: Tweaks, minor. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76671 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: lu hua <huax.lu@intel.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> [Jani: tiny change to backport to drm-intel-fixes.] Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2014-04-11 13:29:51 +03:00
Mika Kuoppala	88b4aa8770	drm/i915: add flags to i915_ring_stop Piglit runner and QA are both looking at the dmesg for DRM_ERRORs with test cases. Add a flag to control those when we they are expected from related test cases. Also add flag to control if contexts should be banned that introduced the hang. Hangcheck is timer based and preventing bans by adding sleeps to testcases makes testing slower. v2: intel_ring_stopped(), readable comment (Chris) v3: keep compatibility (Daniel) References: https://bugs.freedesktop.org/show_bug.cgi?id=75876 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-09 15:07:42 +02:00
Dave Airlie	9f97ba806a	Merge tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel into drm-next Merge window -fixes pull request as usual. Well, I did sneak in Jani's drm_i915_private_t typedef removal, need to have fun with a big sed job too ;-) Otherwise: - hdmi interlaced fixes (Jesse&Ville) - pipe error/underrun/crc tracking fixes, regression in late 3.14-rc (but not cc: stable since only really relevant for igt runs) - large cursor wm fixes (Chris) - fix gpu turbo boost/throttle again, was getting stuck due to vlv rps patches (Chris+Imre) - fix runtime pm fallout (Paulo) - bios framebuffer inherit fix (Chris) - a few smaller things * tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel: (196 commits) Skip intel_crt_init for Dell XPS 8700 drm/i915: vlv: fix RPS interrupt mask setting Revert "drm/i915/vlv: fixup DDR freq detection per Punit spec" drm/i915: move power domain init earlier during system resume drm/i915: Fix the computation of required fb size for pipe drm/i915: don't get/put runtime PM at the debugfs forcewake file drm/i915: fix WARNs when reading DDI state while suspended drm/i915: don't read cursor registers on powered down pipes drm/i915: get runtime PM at i915_display_info drm/i915: don't read pp_ctrl_reg if we're suspended drm/i915: get runtime PM at i915_reg_read_ioctl drm/i915: don't schedule force_wake_timer at gen6_read drm/i915: vlv: reserve the GT power context only once during driver init drm/i915: prefer struct drm_i915_private to drm_i915_private_t drm/i915/overlay: prefer struct drm_i915_private to drm_i915_private_t drm/i915/ringbuffer: prefer struct drm_i915_private to drm_i915_private_t drm/i915/display: prefer struct drm_i915_private to drm_i915_private_t drm/i915/irq: prefer struct drm_i915_private to drm_i915_private_t drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t drm/i915/dma: prefer struct drm_i915_private to drm_i915_private_t ...	2014-04-05 16:14:21 +10:00
Lauri Kasanen	62347f9e0f	drm: Add support for two-ended allocation, v3 Clients like i915 need to segregate cache domains within the GTT which can lead to small amounts of fragmentation. By allocating the uncached buffers from the bottom and the cacheable buffers from the top, we can reduce the amount of wasted space and also optimize allocation of the mappable portion of the GTT to only those buffers that require CPU access through the GTT. For other drivers, allocating small bos from one end and large ones from the other helps improve the quality of fragmentation. Based on drm_mm work by Chris Wilson. v3: Changed to use a TTM placement flag v2: Updated kerneldoc Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Christian König <deathsimple@vodafone.de> Signed-off-by: Lauri Kasanen <cand@gmx.com> Signed-off-by: David Airlie <airlied@redhat.com>	2014-04-04 09:28:14 +10:00
Jani Nikula	3e31c6c017	drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t No functional changes. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-31 15:32:38 +02:00
Chris Wilson	df6f783a4e	drm/i915: Fix unsafe loop iteration over vma whilst unbinding them On non-LLC platforms, when changing the cache level of an object, we may need to unbind it so that prefetching across page boundaries does not cross into a different memory domain. This requires us to unbind conflicting vma, but we did so iterating over the objects vma in an unsafe manner (as the list was being modified as we iterated). The regression was introduced in commit `3089c6f239` Author: Ben Widawsky <ben@bwidawsk.net> Date: Wed Jul 31 17:00:03 2013 -0700 drm/i915: make caching operate on all address spaces apparently as far back as v3.12-rc1, but it has only just begun to trigger real world bug reports. Reported-and-tested-by: Nikolay Martynov <mar.kolya@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76384 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-21 16:13:08 +01:00
Paulo Zanoni	5d584b2eca	drm/i915: move pc8.irqs_disabled to pm.irqs_disabled When other platforms add runtime PM support they will also need to disable interrupts, so move the variable to the runtime PM struct. Also notice that the longer-term goal is to completely kill the regsave struct, and I even have patches for that. v2: - Rebase. v3: - Rebase. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-19 16:39:46 +01:00
Daniel Vetter	b80d6c781e	Merge branch 'topic/dp-aux-rework' into drm-intel-next-queued Conflicts: drivers/gpu/drm/i915/intel_dp.c A bit a mess with reverts which differe in details between -fixes and -next and some other unrelated shuffling. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-19 15:54:37 +01:00
Dave Airlie	d1583c9997	Merge branch 'drm-next' of git://people.freedesktop.org/~dvdhrm/linux into drm-next This is the 3rd respin of the drm-anon patches. They allow module unloading, use the pin_fs_* helpers recommended by Al and are rebased on top of drm-next. Note that there are minor conflicts with the "drm-minor" branch. * 'drm-next' of git://people.freedesktop.org/~dvdhrm/linux: drm: init TTM dev_mapping in ttm_bo_device_init() drm: use anon-inode instead of relying on cdevs drm: add pseudo filesystem for shared inodes	2014-03-18 19:17:02 +10:00
David Herrmann	6796cb16c0	drm: use anon-inode instead of relying on cdevs DRM drivers share a common address_space across all character-devices of a single DRM device. This allows simple buffer eviction and mapping-control. However, DRM core currently waits for the first ->open() on any char-dev to mark the underlying inode as backing inode of the device. This delayed initialization causes ugly conditions all over the place: if (dev->dev_mapping) do_sth(); To avoid delayed initialization and to stop reusing the inode of the char-dev, we allocate an anonymous inode for each DRM device and reset filp->f_mapping to it on ->open(). Signed-off-by: David Herrmann <dh.herrmann@gmail.com>	2014-03-16 12:23:33 +01:00
Ville Syrjälä	3ddffb7b8a	drm/i915: Unbind all vmas whose new cache_level doesn't agree with the neighbours When we change the cache_level for an object we need to make sure we don't put differing types of snoopable memory too close to each other on non-LLC machines. Currently i915_gem_object_set_cache_level() will stop looking when it finds just one vma that has such a conflict. Drop the bogus break statement to make sure it will unbind all vmas which need to be moved around to avoid the conflict. I suppose this is a theoretical issue as currently we don't enable ppgtt on non-LLC machines, so each object can only have one vma. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-13 12:22:44 +01:00
Chris Wilson	c2831a94b5	drm/i915: Do not force non-caching copies for pwrite along shmem path We don't always want to write into main memory with pwrite. The shmem fast path in particular is used for memory that is cacheable - under such circumstances forcing the cache eviction is undesirable. As we will always flush the cache when targeting incoherent buffers, we can rely on that second pass to apply the cache coherency rules and so benefit from in-cache copies otherwise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-08 00:03:26 +01:00
Chris Wilson	17793c9a46	drm/i915: Process page flags once rather than per pwrite/pread We used to lock individual pages inside the buffer object and so needed to update the page flags every time. However, we now pin the pages into the object for the duration of the pwrite/pread (and hopefully much longer) and so we can forgo the flag updates until we release all the pages. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-08 00:03:01 +01:00
Brad Volkin	4c914c0c7c	drm/i915: Refactor shmem pread setup The command parser is going to need the same synchronization and setup logic, so factor it out for reuse. v2: Add a check that the object is backed by shmem Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-07 22:36:59 +01:00
Damien Lespiau	cb216aa844	drm/i915: Make i915_gem_retire_requests_ring() static Its last usage outside of i915_gem.c was removed in: commit `1f70999f90` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jan 27 22:43:07 2014 +0000 drm/i915: Prevent recursion by retiring requests when the ring is full Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:39 +01:00
Chris Wilson	ab0e7ff9f2	drm/i915: Record pid/comm of hanging task After finding the guilty batch and request, we can use it to find the process that submitted the batch and then add the culprit into the error state. This is a slightly different approach from Ben's in that instead of adding the extra information into the struct i915_hw_context, we use the information already captured in struct drm_file which is then referenced from the request. v2: Also capture the workaround buffer for gen2, so that we can compare its contents against the intended batch for the active request. v3: Rebase (Mika) v4: Check for null context (Chris) checkpatch warnings fixed Link: http://lists.freedesktop.org/archives/intel-gfx/2013-August/032280.html Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v2) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v4) Acked-by: Ben Widawsky <ben@bwidawsk.net> Cc: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:24 +01:00
Chris Wilson	8d9fc7fd2d	drm/i915: Rely on accurate request tracking for finding hung batches In the past, it was possible to have multiple batches per request due to a stray signal or ENOMEM. As a result we had to scan each active object (filtered by those having the COMMAND domain) for the one that contained the ACTHD pointer. This was then made more complicated by the introduction of ppgtt, whereby ACTHD then pointed into the address space of the context and so also needed to be taken into account. This is a fairly robust approach (though the implementation is a little fragile and depends upon the per-generation setup, registers and parameters). However, due to the requirements for hangstats, we needed a robust method for associating batches with a particular request and having that we can rely upon it for finding the associated batch object for error capture. If the batch buffer tracking is not robust enough, that should become apparent quite quickly through an erroneous error capture. That should also help to make sure that the runtime reporting to userspace is robust. It also means that we then report the oldest incomplete batch on each ring, which can be useful for determining the state of userspace at the time of a hang. v2: Use i915_gem_find_active_request (Mika) v3: remove check for ring->get_seqno, split long lines (Ben) v4: check that context is available (Chris) checkpatch warnings fixed Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v3) Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v3) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:24 +01:00
Chris Wilson	64bf930379	drm/i915: Reset vma->mm_list after unbinding In place of true activity counting, we walk the list of vma associated with an object managing each on the vm's active/inactive list everytime we call move-to-inactive. This depends upon the vma->mm_list being cleared after unbinding, or else we run into difficulty when tracking the object in multiple vm's - we see a use-after free and corruption of the mm_list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:23 +01:00
Ville Syrjälä	ccc7bed05e	drm/i915: Don't ban default context when stop_rings!=0 If we've explicitly stopped the rings for testing purposes, don't ban the default context. Fixes kms_flip hang tests. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:14 +01:00
Chris Wilson	f62a007603	drm/i915: Accurately track when we mark the hardware as idle/busy We currently call intel_mark_idle() too often, as we do so as a side-effect of processing the request queue. However, we the calls to intel_mark_idle() are expected to be paired with a call to intel_mark_busy() (or else we try to idle the hardware by accessing registers that are already disabled). Make the idle/busy tracking explicit to prevent the multiple calls. v2: We can drop some of the complexity in __i915_add_request() as queue_delayed_work() already behaves as we want (not requeuing the item if it is already in the queue) and mark_busy/mark_idle imply that the idle task is inactive. v3: We do still need to cancel the pending idle task so that it is sent again after the current busy load completes (not in the middle of it). Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:10 +01:00
Daniel Vetter	8ea99c9287	drm/i915: Only bind each object rather than for every execbuffer One side-effect of the introduction of ppgtt was that we needed to rebind the object into the appropriate vm (and global gtt in some peculiar cases). For simplicity this was done twice for every object on every call to execbuffer. However, that adds a tremendous amount of CPU overhead (rewriting all the PTE for all objects into WC memory) per draw. The fix is to push all the decision about which vm to bind into and when down into the low-level bind routines through hints rather than performing the bind unconditionally in the execbuffer routine. Note that this is a regression introduced in the full ppgtt feature branch, before this we've only done re-bound objects when the relevant has_(aliasing_ppgtt\|global_gtt)_mapping flag was clear. But since that's per-object and not per-vma that optimization broke. v2: Split out prep work and unrelated changes. v3: Bring back functional change around PIN_GLOBAL that I've accidentally split out. v4: Remove the temporary hack for the old binding logic to avoid bisection issues. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72906 Tested-by: jianx.zhou@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:18:38 +01:00
Daniel Vetter	262de14531	drm/i915: Directly return the vma from bind_to_vm This is prep work for reworking the object_pin logic. Atm it still does a (now redundant) lookup of the vma. The next patch will fix this. Split out from Chris vma-bind rework. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:18:30 +01:00
Daniel Vetter	b287110e89	drm/i915: Simplify i915_gem_object_ggtt_unpin Split out from Chris vma-bind rework. Jani wondered why this is save, and the reason is that i915_vma_unbind does all these checks, too. So they're redundant. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:18:21 +01:00
Daniel Vetter	bf3d149b25	drm/i915: split PIN_GLOBAL out from PIN_MAPPABLE With abitrary pin flags it makes sense to split out a "please bind this into global gtt" from the "please allocate in the mappable range". Use this unconditionally in our global gtt pin helper since this is what its callers want. Later patches will drop PIN_MAPPABLE where it's not strictly needed. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:17:27 +01:00
Daniel Vetter	1ec9e26dda	drm/i915: Consolidate binding parameters into flags Anything more than just one bool parameter is just a pain to read, symbolic constants are much better. Split out from Chris' vma-binding rework patch. v2: Undo the behaviour change in object_pin that Chris spotted. v3: Split out misplaced hunk to handle set_cache_level errors, spotted by Jani. v4: Keep the current over-zealous binding logic in the execbuffer code working with a quick hack while the overall binding code gets shuffled around. v5: Reorder the PIN_ flags for more natural patch splitup. v6: Pull out the PIN_GLOBAL split-up again. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:16:58 +01:00
Chris Wilson	6e4930f6ee	drm/i915: Flush GPU rendering with a lockless wait during a pagefault Arjan van de Ven reported that on his test machine that he was seeing stalls of greater than 1 frame greatly impacting the user experience. He tracked this down to being the locked flush during a pagefault as being the culprit hogging the struct_mutex and so blocking any other user from proceeding. Stalling on a pagefault is bad behaviour on userspace's part, for one it means that they are ignoring the coherency rules on pointer access through the GTT, but fortunately we can apply the same trick as the set-to-domain ioctl to do a lightweight, nonblocking flush of outstanding rendering first. "Prior to the patch it looks like this (this one testrun does not show the 20ms+ I've seen occasionally) 4.99 ms 2.36 ms 31360 __wait_seqno i915_wait_seqno i915_gem_object_wait_rendering i915_gem_object_set_to_gtt_domain i915_gem_fault __do_fault handle_ +pte_fault handle_mm_fault __do_page_fault do_page_fault page_fault 4.99 ms 2.75 ms 107751 __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 4.99 ms 1.63 ms 1666 i915_mutex_lock_interruptible i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_fault do_page_fault page_fa +ult 4.93 ms 2.45 ms 980 i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_ +sysret 4.89 ms 2.20 ms 3283 i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 4.34 ms 1.66 ms 1715 i915_mutex_lock_interruptible i915_gem_pwrite_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 3.73 ms 3.73 ms 49 i915_mutex_lock_interruptible i915_gem_set_domain_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 3.17 ms 0.33 ms 931 i915_mutex_lock_interruptible i915_gem_madvise_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 2.97 ms 0.43 ms 1029 i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 2.55 ms 0.51 ms 735 i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret After the patch it looks like this: 4.99 ms 2.14 ms 22212 __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 4.86 ms 0.99 ms 14170 __wait_seqno i915_gem_object_wait_rendering__nonblocking i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_ +fault do_page_fault page_fault 3.59 ms 1.31 ms 325 i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 3.37 ms 3.37 ms 65 i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 2.58 ms 2.58 ms 65 i915_mutex_lock_interruptible i915_gem_do_execbuffer.isra.23 i915_gem_execbuffer2 drm_ioctl i915_compat_ioctl compat_sys_ioctl +ia32_sysret 2.19 ms 2.19 ms 65 i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_ +sysret 2.18 ms 2.18 ms 65 i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 1.66 ms 1.66 ms 65 i915_gem_set_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret It may not look like it, but this is quite a large difference, and I've been unable to reproduce > 5 msec delays at all, while before they do happen (just not in the trace above)." gem_gtt_hog on an old Pineview (GMA3150), before: 4969.119ms after: 4122.749ms Reported-by: Arjan van de Ven <arjan.van.de.ven@intel.com> Testcase: igt/gem_gtt_hog Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-12 18:52:59 +01:00
Chris Wilson	bd9b6a4ec5	drm/i915: Downgrade ERROR message for invalid user input When we detect that the user passed along an invalid handle or object, we emit a warning as an aide for debugging. Since these are indeed only for debugging user triggerable errors (and the errors are reported back to userspace by the errno), the messages should only be at the debug level and not claiming that there is a catastrophic error in the driver/hardware. References: https://bugs.freedesktop.org/show_bug.cgi?id=74704 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-12 18:52:54 +01:00
Damien Lespiau	3d13ef2e2d	drm/i915: Always use INTEL_INFO() to access the device_info structure If we make sure that all the dev_priv->info usages are wrapped by INTEL_INFO(), we can easily modify the ->info field to be structure and not a pointer while keeping the const protection in the INTEL_INFO() macro. v2: Rebased onto latest drm-nightly Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-12 18:52:50 +01:00
Chris Wilson	8c99e57d39	drm/i915: Treat using a purged buffer as a source of EFAULT Since a purged buffer is one without any associated pages, attempting to use it should generate EFAULT rather than EINVAL, as it is not strictly an invalid parameter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 17:03:32 +01:00
Chris Wilson	45d678173a	drm/i915: Convert EFAULT into a silent SIGBUS EFAULT will be a possible return code where backing storage is transient, such after it is purged by madvise. As such it is to be expected and so should not trigger a WARN inside i915_gem_fault() but be converted silently to SIGBUS. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 17:03:27 +01:00
Mika Kuoppala	e38486943e	drm/i915: release mutex in i915_gem_init()'s error path Found with smatch. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 12:10:45 +01:00
Mika Kuoppala	939fd76208	drm/i915: Get rid of acthd based guilty batch search As we seek the guilty batch using request and hangcheck score, this code is not needed anymore. v2: Rebase. Passing dev_priv instead of getting it from last_ring Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 11:57:29 +01:00
Mika Kuoppala	b6b0fac04d	drm/i915: Use hangcheck score to find guilty context With full ppgtt using acthd is not enough to find guilty batch buffer. We get multiple false positives as acthd is per vm. Instead of scanning which vm was running on a ring, to find corressponding context, use a different, simpler, strategy of finding batches that caused gpu hang: If hangcheck has declared ring to be hung, find first non complete request on that ring and claim it was guilty. v2: Rebase Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73652 Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 11:57:24 +01:00
Mika Kuoppala	3fac8978f5	drm/i915: Tune down debug output when context is banned If we have stopped rings then we know that test is running so no need for spam. In addition, only spam when default context gets banned. v2: - make sure default context ban gets shown (Chris) - use helper for checking for default context, everywhere (Chris) v3: - dont be quiet when debug is set (Ben, Daniel) Reference: https://bugs.freedesktop.org/show_bug.cgi?id=73652 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 17:25:38 +01:00
Mika Kuoppala	44e2c0705a	drm/i915: Use i915_hw_context to set reset stats With full ppgtt support drm_i915_file_private gained knowledge about the default context. Also reset stats are now inside i915_hw_context so we can use proper abstraction. v2: Move BUG_ON and WARN_ON to more proper locations (Ben) v3: Pass dev directly to i915_context_is_banned to avoid the need to dereference ctx->last_ring. Spotted by Mika when checking my s/BUG/WARN/ change, I've missed this ->last_ring dereference. Suggested-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v2) Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v2) [danvet: s/BUG/WARN/] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-30 17:24:36 +01:00
Daniel Vetter	6ba844b090	drm/i915: GEN7_MSG_CONTROL is ivb-only At least I couldn't find it in the Haswell Bspec any more and we've tried to test-boot a Haswell machine with num_pipes forced to 0 (i.e. hit the PCH_NOP path) and the unclaimed register logic complained. So restrict this dance to just ivb platforms. v2: Art pointed out that the bits simply moved on hsw+ v3: Buy code terseneness with a notch of sublety as suggested by Chris. v4: Frob the right bit, spotted by Art. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Arthur Ranyan <arthur.j.runyan@intel.com> Cc: Dave Airlie <airlied@gmail.com> Reviewed-by: Art Runyan <arthur.j.runyan@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-27 17:16:47 +01:00
Jani Nikula	d330a9530c	drm/i915: move module parameters into a struct, in a new file With 20+ module parameters, I think referring to them via a struct improves clarity over just having a bunch of globals. While at it, move the parameter initialization and definitions into a new file i915_params.c to reduce clutter in i915_drv.c. Apart from the ill-named i915_enable_rc6, i915_enable_fbc and i915_enable_ppgtt parameters, for which we lose the "i915_" prefix internally, the module parameters now look the same both on the kernel command line and in code. For example, "i915.modeset". The downsides of the change are losing static on a couple of variables and not having the initialization and module_param_named() right next to each other. On the other hand, all module parameters are now defined in one place at i915_params.c. Plus you can do this to find all module parameter references: $ git grep "i915\." -- drivers/gpu/drm/i915 v2: - move the definitions into a new file - s/i915_params/i915/ - make i915_try_reset i915.reset, for consistency Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-27 17:16:45 +01:00
Daniel Vetter	0e5539b923	Merge branch 'topic/ppgtt' into drm-intel-next-queued Because whatever.* * This should contain a fairly long list of issues and still unresolved resgressions, but I didn't really get a vote. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-25 21:14:57 +01:00
Chris Wilson	f72d21eddf	drm/i915: Place the Global GTT VM first in the list of VM This is useful for debugging as we then know that the first entry is always the global GTT, and all later entries the per-process GTT VM. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-25 20:07:15 +01:00
Chris Wilson	5dce5b9387	drm/i915: Wait for completion of pending flips when starved of fences On older generations (gen2, gen3) the GPU requires fences for many operations, such as blits. The display hardware also requires fences for scanouts and this leads to a situation where an arbitrary number of fences may be pinned by old scanouts following a pageflip but before we have executed the unpin workqueue. This is unpredictable by userspace and leads to random EDEADLK when submitting an otherwise benign execbuffer. However, we can detect when we have an outstanding flip and so cause userspace to wait upon their completion before finally declaring that the system is starved of fences. This is really no worse than forcing the GPU to stall waiting for older execbuffer to retire and release their fences before we can reallocate them for the next execbuffer. v2: move the test for a pending fb unpin to a common routine for later reuse during eviction Reported-and-tested-by: dimon@gmx.net Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-22 10:34:40 +01:00
Ben Widawsky	1d62beeaeb	drm/i915/ppgtt: Defer request freeing on reset We need to defer the free request until the object/vma is capable of being freed - or else we have a problem when we try to destroy the context. The exact same issue is described and fixed here: commit `e20780439b` Author: Ben Widawsky <ben@bwidawsk.net> Date: Fri Dec 6 14:11:22 2013 -0800 drm/i915: Defer request freeing I had this fix previously, but decided not to keep it for some reason I can no longer remember. gem_reset_stats is a really good test at hitting the problem. For the inquisitive: [ 170.516392] ------------[ cut here ]------------ [ 170.517227] WARNING: CPU: 1 PID: 105 at drivers/gpu/drm/drm_mm.c:578 drm_mm_takedown+0x2e/0x30 [drm]() [ 170.518064] Memory manager not clean during takedown. [ 170.518941] CPU: 1 PID: 105 Comm: kworker/1:1 Not tainted 3.13.0-rc4-BEN+ #28 [ 170.519787] Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012 [ 170.520662] Call Trace: [ 170.521517] [<ffffffff814f0589>] dump_stack+0x4e/0x7a [ 170.522373] [<ffffffff81049e6d>] warn_slowpath_common+0x7d/0xa0 [ 170.523227] [<ffffffff81049edc>] warn_slowpath_fmt+0x4c/0x50 [ 170.524079] [<ffffffffa06c414e>] drm_mm_takedown+0x2e/0x30 [drm] [ 170.524934] [<ffffffffa07213f3>] gen6_ppgtt_cleanup+0x23/0x110 [i915] [ 170.525777] [<ffffffffa07837ed>] ppgtt_release.part.5+0x24/0x29 [i915] [ 170.526603] [<ffffffffa071aaa5>] i915_gem_context_free+0x195/0x1a0 [i915] [ 170.527423] [<ffffffffa071189d>] i915_gem_free_request+0x9d/0xb0 [i915] [ 170.528247] [<ffffffffa0718af9>] i915_gem_reset+0x1f9/0x3f0 [i915] [ 170.529065] [<ffffffffa0700cce>] i915_reset+0x4e/0x180 [i915] [ 170.529870] [<ffffffffa070829d>] i915_error_work_func+0xcd/0x120 [i915] [ 170.530666] [<ffffffff8106c13a>] process_one_work+0x1fa/0x6d0 [ 170.531453] [<ffffffff8106c0d8>] ? process_one_work+0x198/0x6d0 [ 170.532230] [<ffffffff8106c72b>] worker_thread+0x11b/0x3a0 [ 170.532996] [<ffffffff8106c610>] ? process_one_work+0x6d0/0x6d0 [ 170.533771] [<ffffffff810743ef>] kthread+0xff/0x120 [ 170.534548] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80 [ 170.535322] [<ffffffff814f97ac>] ret_from_fork+0x7c/0xb0 [ 170.536089] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80 [ 170.536847] ---[ end trace 3d4c12892e42d58f ]--- v2: Whitespace fix. (Chris) Note: This is a bug that only hits the ppgtt topic branch but I've figured that doing the request cleanup in this order is generally the right thing to do. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Add a code comment to clarify what's actually going on since the lifetime rules aroung ppgtt cleanup are ... fuzzy a best atm. Also add a note about why we need this.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-22 10:34:37 +01:00
Daniel Vetter	8664850087	drm/i915: Tune down reset_stat output from ERROR to debug This is user-triggerable and hence we should not allow it to spam dmesg. Also, it upsets the nice dmesg tracking piglit does. Note that this is just extra debugging information, mostly unwanted, in case of a hang and that there is a separate message to the user giving instructions on how to report a bug for a GPU hang. v2: Add note as suggests in Chris' reply. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72740 Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-22 10:34:35 +01:00
Daniel Vetter	0d9d349d87	Merge commit origin/master into drm-intel-next Conflicts are getting out of hand, and now we have to shuffle even more in -next which was also shuffled in -fixes (the call for drm_mode_config_reset needs to move yet again). So do a proper backmerge. I wanted to wait with this for the 3.13 relaese, but alas let's just do this now. Conflicts: drivers/gpu/drm/i915/i915_reg.h drivers/gpu/drm/i915/intel_ddi.c drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/i915/intel_pm.c Besides the conflict around the forcewake get/put (where we chaged the called function in -fixes and added a new parameter in -next) code all the current conflicts are of the adjacent lines changed type. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-16 22:06:30 +01:00
Chris Wilson	e910303802	drm/i915: Free requests after object release when retiring requests Freeing a request triggers the destruction of the context. This needs to occur after all objects are themselves unbound from the context, and so the free request needs to occur after the object release during retire. This tidies up commit `e20780439b` Author: Ben Widawsky <ben@bwidawsk.net> Date: Fri Dec 6 14:11:22 2013 -0800 drm/i915: Defer request freeing by simply swapping the order of operations rather than introducing further complexity - as noted during review. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-10 08:21:52 +01:00
Daniel Vetter	bfca05275a	Revert "drm/i915: Do not allow buffers at offset 0" This reverts commit `4fe9adbc36`. The patch completely lacks a detailed explanation of what exactly blows up and how, so is insufficiently justified as a band-aid. Otoh the justification as a safety measure against userspace botching up relocations is also fairly weak: If we want real project we need to at least make the gab big enough that the gpu doesn't scribble over more important stuff. With 4k screens that would be 32MB. Also I think this would be much better in conjunction with a (debug) switch to disable our use of the scratch page. Hence revert this. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 17:50:39 +01:00
Daniel Vetter	02f6bcccf7	drm/i915: Reject the pin ioctl on gen6+ Especially with ppgtt this kinda stopped making sense. And if we indeed need this to hack around an issue, we need something that also works for non-root. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 16:30:22 +01:00
Ben Widawsky	7e0d96bc03	drm/i915: Use multiple VMs -- the point of no return As with processes which run on the CPU, the goal of multiple VMs is to provide process isolation. Specific to GEN, there is also the ability to map more objects per process (2GB each instead of 2Gb-2k total). For the most part, all the pipes have been laid, and all we need to do is remove asserts and actually start changing address spaces with the context switch. Since prior to this we've converted the setting of the page tables to a streamed version, this is quite easy. One important thing to point out (since it'd been hotly contested) is that with this patch, every context created will have it's own address space (provided the HW can do it). v2: Disable BDW on rebase NOTE: I tried to make this commit as small as possible. I needed one place where I could "turn everything on" and that is here. It could be split into finer commits, but I didn't really see much point. Cc: Eric Anholt <eric@anholt.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 16:24:52 +01:00
Daniel Vetter	3d7f0f9dcc	Merge commit drm-intel-fixes into topic/ppgtt I need the tricky do_switch fix before I can merge the final piece of the ppgtt enabling puzzle. Otherwise the conflict will be a real pain to resolve since the do_switch hunk from -fixes must be placed at the exact right place within a hunk in the next patch. Conflicts: drivers/gpu/drm/i915/i915_gem_context.c drivers/gpu/drm/i915/i915_gem_execbuffer.c drivers/gpu/drm/i915/intel_display.c Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 16:23:37 +01:00
Ben Widawsky	4fe9adbc36	drm/i915: Do not allow buffers at offset 0 This is primarily a band aid for an unexplainable error in gem_reloc_vs_gpu/forked-faulting-reloc-thrashing. Essentially as soon as a relocated buffer (which had a non-zero presumed offset) moved to offset 0, something goes bad. Since I have been unable to solve this, and potentially this is a good thing to do anyway, since many things can accidentally write to offset 0, why not? Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 16:15:40 +01:00
Ben Widawsky	e20780439b	drm/i915: Defer request freeing With context destruction, we always want to be able to tear down the underlying address space. This is invoked on the last unreference to the context which could happen before we've moved all objects to the inactive list. To enable a clean tear down the address space, make sure to process the request free lastly. Without this change, we cannot guarantee to we don't still have active objects in the VM. As an example of a failing case: CTX-A is created, count=1 CTX-A is used during execbuf does a context switch count = 2 and add_request count = 3 CTX B runs, switches, CTX-A count = 2 CTX-A is destroyed, count = 1 retire requests is called free_request from CTX-A, count = 0 <--- free context with active object As mentioned above, by doing the free request after processing the active list, we can avoid this case. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:52:51 +01:00
Ben Widawsky	41bde5535a	drm/i915: Get context early in execbuf We need to have the address space when reserving space for the objects. Since the address space and context are tied together, and reserve occurs before context switch (for good reason), we must lookup our context earlier in the process. This leaves some room for optimizations where we no longer need to use ctx_id in certain places. This will be addressed in a subsequent patch. Important tricky bit: Because slow relocations during execbuffer drop struct_mutex Perhaps it would be best to acquire the reference when we get the context, but I'll save that for another day (note I have written the patch before, and I found the changes required to be uglier than this). Note that since we currently access everything via context id, and not the data structure this is fine, though not desirable. The next change attempts to get the context only once via the context ID idr lookup, and as such, the following can happen: CTX-A is created, refcount = 1 CTX-A execbuf, mutex dropped close IOCTL called on CTX-A, refcount = 0 CTX-A resumes in execbuf. v2: Rebased on top of commit `b6359918b8` Author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Date: Wed Oct 30 15:44:16 2013 +0200 drm/i915: add i915_get_reset_stats_ioctl v3: Rebased on top of commit `25b3dfc87b` Author: Mika Westerberg <mika.westerberg@linux.intel.com> Date: Tue Nov 12 11:57:30 2013 +0200 Author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Date: Tue Nov 26 16:14:33 2013 +0200 drm/i915: check context reset stats before relocations Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:52:42 +01:00
Ben Widawsky	c482972a08	drm/i915: Piggy back hangstats off of contexts To simplify the codepaths somewhat, we can simply always create a context. Contexts already keep hangstat information. This prevents us from having to differentiate at other parts in the code. There is allocation overhead, but it should not be measurable. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:51:58 +01:00
Ben Widawsky	bdf4fd7ea0	drm/i915: Do aliasing PPGTT init with contexts We have a default context which suits the aliasing PPGTT well. Tie them together so it looks like any other context/PPGTT pair. This makes the code cleaner as it won't have to special case aliasing as often. The patch has one slightly tricky part in the default context creation function. In the future (and on aliased setup) we create a new VM for a context (potentially). However, if we have aliasing PPGTT, which occurs at this point in time for all platforms GEN6+, we can simply manage the refcounting to allow things to behave as normal. Now is a good time to recall that the aliasing_ppgtt doesn't have a real VM, it uses the GGTT drm_mm. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:32:14 +01:00
Ben Widawsky	a3d67d2396	drm/i915: PPGTT vfuncs should take a ppgtt argument Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:56 +01:00
Ben Widawsky	2fa48d8d4a	drm/i915: Split context enabling from init We need to do this for exactly 1 reason, because we want to embed a PPGTT into the context, but we don't want to special case the default context. To achieve that, we must be able to initialize contexts after the GTT is setup (so we can allocate and pin the default context's BO), but before the PPGTT and rings are initialized. This is because, currently, context initialization requires ring usage. We don't have rings until after the GTT is setup. If we split the enabling part of context initialization, the part requiring the ringbuffer, we can untangle this, and then later embed the PPGTT Incidentally this allows us to also adhere to the original design of context init/fini in future patches: they were only ever meant to be called at driver load and unload. v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris) v3: BUG_ON after checking for disabled contexts. Or else it blows up pre gen6 (Ben) v4: Forward port Modified enable for each ring, since that patch is earlier in the series Dropped ring arg from create_default_context so it can be used by others Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:55 +01:00
Ben Widawsky	acce9ffa48	drm/i915: Better reset handling for contexts This patch adds to changes for contexts on reset: Sets last context to default - this will prevent the context switch happening after a reset. That switch is not possible because the rings are hung during reset and context switch requires reset. This behavior will need to be reworked in the future, but this is what we want for now. In the future, we'll also want to reset the guilty context to uninitialized. We should wait for ARB_Robustness related code to land for that. This is somewhat for paranoia. Because we really don't know what the GPU was doing when it hung, or the state it was in (mid context write, for example), later restoring the context is a bad idea. By setting the flag to not initialized, the next load of that context will not restore the state, and thus on the subsequent switch away from the context will overwrite the old data. NOTE: This code needs a fixup when we actually have multiple VMs. The issue that can occur is inactive objects in a VM will need to be destroyed before the last context unref. This can now happen via the fake switch introduced in this patch (and it other ways in the future) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:54 +01:00
Ben Widawsky	e422b888eb	drm/i915: Add a context open function We'll be doing a bit more stuff with each file, so having our own open function should make things clean. This also allows us to easily add conditionals for stuff we don't want to do when we don't have HW contexts. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:51 +01:00
Ben Widawsky	6f65e29aca	drm/i915: Create bind/unbind abstraction for VMAs To sum up what goes on here, we abstract the vma binding, similarly to the previous object binding. This helps for distinguishing legacy binding, versus modern binding. To keep the code churn as minimal as possible, I am leaving in insert_entries(). It serves as the per platform pte writing basically. bind_vma and insert_entries do share a lot of similarities, and I did have designs to combine the two, but as mentioned already... too much churn in an already massive patchset. What follows are the 3 commits which existed discretely in the original submissions. Upon rebasing on Broadwell support, it became clear that separation was not good, and only made for more error prone code. Below are the 3 commit messages with all their history. drm/i915: Add bind/unbind object functions to VMA drm/i915: Use the new vm [un]bind functions drm/i915: reduce vm->insert_entries() usage drm/i915: Add bind/unbind object functions to VMA As we plumb the code with more VM information, it has become more obvious that the easiest way to deal with bind and unbind is to simply put the function pointers in the vm, and let those choose the correct way to handle the page table updates. This change allows many places in the code to simply be vm->bind, and not have to worry about distinguishing PPGTT vs GGTT. Notice that this patch has no impact on functionality. I've decided to save the actual change until the next patch because I think it's easier to review that way. I'm happy to squash the two, or let Daniel do it on merge. v2: Make ggtt handle the quirky aliasing ppgtt Add flags to bind object to support above Don't ever call bind/unbind directly for PPGTT until we have real, full PPGTT (use NULLs to assert this) Make sure we rebind the ggtt if there already is a ggtt binding. This happens on set cache levels. Use VMA for bind/unbind (Daniel, Ben) v3: Reorganize ggtt_vma_bind to be more concise and easier to read (Ville). Change logic in unbind to only unbind ggtt when there is a global mapping, and to remove a redundant check if the aliasing ppgtt exists. v4: Make the bind function a bit smarter about the cache levels to avoid unnecessary multiple remaps. "I accept it is a wart, I think unifying the pin_vma / bind_vma could be unified later" (Chris) Removed the git notes, and put version info here. (Daniel) v5: Update the comment to not suck (Chris) v6: Move bind/unbind to the VMA. It makes more sense in the VMA structure (always has, but I was previously lazy). With this change, it will allow us to keep a distinct insert_entries. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> drm/i915: Use the new vm [un]bind functions Building on the last patch which created the new function pointers in the VM for bind/unbind, here we actually put those new function pointers to use. Split out as a separate patch to aid in review. I'm fine with squashing into the previous patch if people request it. v2: Updated to address the smart ggtt which can do aliasing as needed Make sure we bind to global gtt when mappable and fenceable. I thought we could get away without this initialy, but we cannot. v3: Make the global GTT binding explicitly use the ggtt VM for bind_vma(). While at it, use the new ggtt_vma helper (Chris) At this point the original mailing list thread diverges. ie. v4^: use target_obj instead of obj for gen6 relocate_entry vma->bind_vma() can be called safely during pin. So simply do that instead of the complicated conditionals. Don't restore PPGTT bound objects on resume path Bug fix in resume path for globally bound Bos Properly handle secure dispatch Rebased on vma bind/unbind conversion Signed-off-by: Ben Widawsky <ben@bwidawsk.net> drm/i915: reduce vm->insert_entries() usage FKA: drm/i915: eliminate vm->insert_entries() With bind/unbind function pointers in place, we no longer need insert_entries. We could, and want, to remove clear_range, however it's not totally easy at this point. Since it's used in a couple of place still that don't only deal in objects: setup, ppgtt init, and restore gtt mappings. v2: Don't actually remove insert_entries, just limit its usage. It will be useful when we introduce gen8. It will always be called from the vma bind/unbind. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:50 +01:00
Ben Widawsky	d7f46fc4e7	drm/i915: Make pin count per VMA Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:49 +01:00
Ben Widawsky	feb822cfc2	drm/i915: Handle inactivating objects for all VMAs This came from a patch called, "drm/i915: Move active to vma" When moving an object to the inactive list, we do it for all VMs for which the object is bound. The primary difference from that patch is this time around we don't not track 'active' per vma, but rather by object. Therefore, we only need one unref. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:46 +01:00
Ben Widawsky	c39538a88d	drm/i915: Takedown drm_mm on failed gtt setup This was found by code inspection. If the GTT setup fails then we are left without properly tearing down the drm_mm. Hopefully this never happens. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:45 +01:00
Ben Widawsky	6e164c3382	drm/i915: Allow ggtt lookups to not WARN To be able to effectively use the GGTT object lookup function, we don't want to warn when there is no GGTT mapping. Let the caller deal with it instead. Originally, I had intended to have this behavior, and has not introduced the WARN. It was introduced during review with the addition of the follow commit commit `5c2abbeab7` Author: Ben Widawsky <benjamin.widawsky@intel.com> Date: Tue Sep 24 09:57:57 2013 -0700 drm/i915: Provide a cheap ggtt vma lookup Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:45 +01:00
Ben Widawsky	6f425321e0	drm/i915: Don't unconditionally try to deref aliasing ppgtt Since the beginning, the functions which try to properly reference the aliasing PPGTT have deferences a potentially null aliasing_ppgtt member. Since the accessors are meant to be global, this will not do. Introduced originally in: commit `a70a3148b0` Author: Ben Widawsky <ben@bwidawsk.net> Date: Wed Jul 31 16:59:56 2013 -0700 drm/i915: Make proper functions for VMs Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-18 15:27:44 +01:00
Paulo Zanoni	48018a57a8	drm/i915: release the GTT mmaps when going into D3 So we'll get a fault when someone tries to access the mmap, then we'll wake up from D3. v2: - Rebase v3: - Use gtt active/inactive Testcase: igt/pm_pc8/gem-mmap-gtt Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> [danvet: Add comment + WARN as discussed with Paulo on irc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-14 15:35:52 +01:00
Mika Kuoppala	168c3f2151	drm/i915: dont call irq_put when irq test is on If test is running, irq_get was not called so we should gain balance by not doing irq_put "So the rule is: if you access unlocked values, you use ACCESS_ONCE(). You don't say "but it can't matter". Because you simply don't know." -- Linus v2: use local variable so it can't change during test (Chris) v3: update commit msg and use ACCESS_ONCE (Ville) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-12 22:58:44 +01:00
Mika Kuoppala	47e9766df0	drm/i915: Fix timeout with missed interrupts in __wait_seqno Commit `094f9a54e3` ("drm/i915: Fix __wait_seqno to use true infinite timeouts") added support for __wait_seqno to detect missing interrupts and go around them by polling. As there is also timeout detection in __wait_seqno, the polling and timeout detection were done with the same timer. When there has been missed interrupts and polling is needed, the timer is set to trigger in (now + 1) jiffies in future, instead of the caller specified timeout. Now when io_schedule() returns, we calculate the jiffies left to timeout using the timer expiration value. As the current jiffies is now bound to be always equal or greater than the expiration value, the timeout_jiffies will become zero or negative and we return -ETIME to caller even tho the timeout was never reached. Fix this by decoupling timeout calculation from timer expiration. v2: Commit message with some sense in it (Chris Wilson) v3: add parenthesis on timeout_expire calculation v4: don't read jiffies without timeout (Chris Wilson) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-12 15:27:21 +01:00
Chris Wilson	4db080f9e9	drm/i915: Fix erroneous dereference of batch_obj inside reset_status As the rings may be processed and their requests deallocated in a different order to the natural retirement during a reset, /* Whilst this request exists, batch_obj will be on the * active_list, and so will hold the active reference. Only when this * request is retired will the the batch_obj be moved onto the * inactive_list and lose its active reference. Hence we do not need * to explicitly hold another reference here. */ is violated, and the batch_obj may be dereferenced after it had been freed on another ring. This can be simply avoided by processing the status update prior to deallocating any requests. Fixes regression (a possible OOPS following a GPU hang) from commit `aa60c664e6` Author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Date: Wed Jun 12 15:13:20 2013 +0300 drm/i915: find guilty batch buffer on ring resets Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [danvet: Add the code comment Chris supplied.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-12 10:49:05 +01:00
Paulo Zanoni	f65c916898	drm/i915: add runtime put/get calls at the basic places If I add code to enable runtime PM on my Haswell machine, start a desktop environment, then enable runtime PM, these functions will complain that they're trying to read/write registers while the graphics card is suspended. v2: - Simplify i915_gem_fault changes. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> [danvet: Drop the hunk in i915_hangcheck_elapsed, it's the wrong thing to do.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-10 22:47:33 +01:00
Chris Wilson	70903c3ba8	drm/i915: Fix ordering of unbind vs unpin pages It is useful to assert that if the object is bound, then it must have its pages pinned to prevent the shrinker from reaping its backing store. This is even more useful with the introduction of real-ppgtt whereupon we may have the object bound into several vma, with each instance pinning the backing store. This assertion breaks down during unbind where we unpinned the backing store before decoupling the vma binding. This can be fixed with a trivial reording of the unbind sequence, which reinforces the pin pages bind to vma ... unbind from vma unpin pages concept. v2: Bonus comment Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-12-04 12:10:50 +01:00
Ville Syrjälä	0bf2134780	drm/i915: MI_PREDICATE_RESULT_2 is HSW only The MI_PREDICATE_RESULT_2 register exits only on HSW. On other platforms the same offset is either reserved, or contains some other register. So write the register only on HSW. This regression has been introduced in commit `9435373ef8` Author: Rodrigo Vivi <rodrigo.vivi@gmail.com> Date: Wed Aug 28 16:45:46 2013 -0300 drm/i915: Report enabled slices on Haswell GT3 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> [danvet: Add regression notice.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-29 15:00:03 +01:00
Daniel Vetter	c09cd6e969	Merge branch 'backlight-rework' into drm-intel-next-queued Pull in Jani's backlight rework branch. This was merged through a separate branch to be able to sort out the Broadwell conflicts properly before pulling it into the main development branch. Conflicts: drivers/gpu/drm/i915/intel_display.c Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-15 10:02:39 +01:00
Ben Widawsky	31a5336e1c	drm/i915/bdw: Swizzling support Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-08 18:09:37 +01:00
Ben Widawsky	5ab31333ac	drm/i915/bdw: Fences on gen8 look just like gen7 Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-08 18:09:36 +01:00
Ben Widawsky	8245be3139	drm/i915: Require HW contexts (when possible) v2: Fixed the botched locking on init_hw failure in i915_reset (Ville) Call cleanup_ringbuffer on failed context create in init_hw (Ville) v3: Add dev argument ti clean_ringbuffer Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-11-07 09:35:44 +01:00
Paulo Zanoni	de45eaf7b9	drm/i915: fix open-coded DIV_ROUND_UP Use the nice Kernel macro, it makes the code much more readable. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-21 10:04:03 +02:00
Daniel Vetter	aa5f802181	drm/i915: Use unsigned long for obj->user_pin_count At least on linux sizeof(long) == sizeof(void*) and the thinking is that you can grab about as many references as there's memory. Doesn't really matter, just a bit of OCD since the fixed size data type in a pure in-kernel datastructure look off. v2: Ville asked for an overflow check since no one prevents userspace from incrementing the pin count forever. v3: s/INT/LONG/, noticed by Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-16 22:06:39 +02:00
Chris Wilson	45c5f2022c	drm/i915: Disable all GEM timers and work on unload We have two once very similar functions, i915_gpu_idle() and i915_gem_idle(). The former is used as the lower level operation to flush work on the GPU, whereas the latter is the high level interface to flush the GEM bookkeeping in addition to flushing the GPU. As such i915_gem_idle() also clears out the request and activity lists and cancels the delayed work. This is what we need for unloading the driver, unfortunately we called i915_gpu_idle() instead. In the process, make sure that when cancelling the delayed work and timer, which is synchronous, that we do not hold any locks to prevent a deadlock if the work item is already waiting upon the mutex. This requires us to push the mutex down from the caller to i915_gem_idle(). v2: s/i915_gem_idle/i915_gem_suspend/ Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70334 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: xunx.fang@intel.com [danvet: Only set ums.suspended for !kms as discussed earlier. Chris noticed that this slipped through.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-16 19:42:14 +02:00
Ben Widawsky	3d57e5bd12	drm/i915: Do a fuller init after reset I had this lying around from he original PPGTT series, and thought we might try to get it in by itself. It's convenient to just call i915_gem_init_hw at reset because we'll be adding new things to that function, and having just one function to call instead of reimplementing it in two places is nice. In order to accommodate we cleanup ringbuffers in order to bring them back up cleanly. Optionally, we could also teardown/re initialize the default context but this was causing some problems on reset which I wasn't able to fully debug, and is unnecessary with the previous context init/enable split. This essentially reverts: commit `8e88a2bd59` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Jun 19 18:40:00 2012 +0200 drm/i915: don't call modeset_init_hw in i915_reset It seems to work for me on ILK now. Perhaps it's due to: commit `8a5c2ae753` Author: Jesse Barnes <jbarnes@virtuousgeek.org> Date: Thu Mar 28 13:57:19 2013 -0700 drm/i915: fix ILK GPU reset for render Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-16 11:08:08 +02:00
Daniel Vetter	3bbbe706e8	drm/i915: check that the i965g/gm 4G limit is really obeyed In truly crazy circumstances shmem might give us the wrong type of page. So be a bit paranoid and double check this. Reviewer: Damien Lespiau <damien.lespiau@intel.com> Cc: Rob Clark <robdclark@gmail.com> References: http://lkml.org/lkml/2011/7/11/238 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-10 12:47:05 +02:00
Chris Wilson	d9973b4356	drm/i915: Fix type mismatch and accounting in i915_gem_shrink The interface uses an unsigned long, and we can use the unsigned counter throughout our code, so do so. In the process, we notice one instance where the shrink count is based on a heuristic rather than the result, and another where we ask for too many pages to be purged. v2: nr_to_scan needs to be promoted to a long as well, so just use sc->nr_to_scan directly. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-10 12:46:48 +02:00
Chris Wilson	5035c275af	drm/i915: Call io_schedule() whilst whilsting for the GPU Since we are waiting upon IO completion, inform the kernel through use of the io_schedule() call rather than the regular schedule(). This should allow the kernel to make better decisions regarding scheduling and power management. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-10 12:46:47 +02:00
Daniel Vetter	967ad7f148	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next The conflict in intel_drv.h tripped me up a bit since a patch in dinq moves all the functions around, but another one in drm-next removes a single function. So I'ev figured backing this into a backmerge would be good. i915_dma.c is just adjacent lines changed, nothing nefarious there. Conflicts: drivers/gpu/drm/i915/i915_dma.c drivers/gpu/drm/i915/intel_drv.h Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-10 12:44:43 +02:00
David Herrmann	16eb5f4379	drm: kill ->gem_init_object() and friends All drivers embed gem-objects into their own buffer objects. There is no reason to keep drm_gem_object_alloc(), gem->driver_private and ->gem_init_object() anymore. New drivers are highly encouraged to do the same. There is no benefit in allocating gem-objects separately. Cc: Dave Airlie <airlied@gmail.com> Cc: Alex Deucher <alexdeucher@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Inki Dae <inki.dae@samsung.com> Cc: Ben Skeggs <skeggsb@gmail.com> Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-10-09 14:38:02 +10:00
Chris Wilson	b29c19b645	drm/i915: Boost RPS frequency for CPU stalls If we encounter a situation where the CPU blocks waiting for results from the GPU, give the GPU a kick to boost its the frequency. This should work to reduce user interface stalls and to quickly promote mesa to high frequencies - but the cost is that our requested frequency stalls high (as we do not idle for long enough before rc6 to start reducing frequencies, nor are we aggressive at down clocking an underused GPU). However, this should be mitigated by rc6 itself powering off the GPU when idle, and that energy use is dependent upon the workload of the GPU in addition to its frequency (e.g. the math or sampler functions only consume power when used). Still, this is likely to adversely affect light workloads. In particular, this nearly eliminates the highly noticeable wake-up lag in animations from idle. For example, expose or workspace transitions. (However, given the situation where we fail to downclock, our requested frequency is almost always the maximum, except for Baytrail where we manually downclock upon idling. This often masks the latency of upclocking after being idle, so animations are typically smooth - at the cost of increased power consumption.) Stéphane raised the concern that this will punish good applications and reward bad applications - but due to the nature of how mesa performs its client throttling, I believe all mesa applications will be roughly equally affected. To address this concern, and to prevent applications like compositors from permanently boosting the RPS state, we ratelimit the frequency of the wait-boosts each client recieves. Unfortunately, this techinique is ineffective with Ironlake - which also has dynamic render power states and suffers just as dramatically. For Ironlake, the thermal/power headroom is shared with the CPU through Intelligent Power Sharing and the intel-ips module. This leaves us with no GPU boost frequencies available when coming out of idle, and due to hardware limitations we cannot change the arbitration between the CPU and GPU quickly enough to be effective. v2: Limit each client to receiving a single boost for each active period. Tested by QA to only marginally increase power, and to demonstrably increase throughput in games. No latency measurements yet. v3: Cater for front-buffer rendering with manual throttling. v4: Tidy up. v5: Sadly the compositor needs frequent boosts as it may never idle, but due to its picking mechanism (using ReadPixels) may require frequent waits. Those waits, along with the waits for the vrefresh swap, conspire to keep the GPU at low frequencies despite the interactive latency. To overcome this we ditch the one-boost-per-active-period and just ratelimit the number of wait-boosts each client can receive. Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Stéphane Marchesin <stephane.marchesin@gmail.com> Cc: Owen Taylor <otaylor@redhat.com> Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com> Cc: "Zhuang, Lena" <lena.zhuang@intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> [danvet: No extern for function prototypes in headers.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-03 20:01:31 +02:00

1 2 3 4 5 ...

1097 Commits