linux

Author	SHA1	Message	Date
Chris Wilson	ceabbba524	drm/i915: Include bound and active pages in the count of shrinkable objects When the machine is under a lot of memory pressure and being stressed by multiple GPU threads, we quite often report fewer than shrinker->batch (i.e. SHRINK_BATCH) pages to be freed. This causes the shrink_control to skip calling into i915.ko to release pages, despite the GPU holding onto most of the physical pages in its active lists. References: https://bugs.freedesktop.org/show_bug.cgi?id=72742 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:46:06 +02:00
Chris Wilson	0820baf39b	drm/i915: Translate ENOSPC from shmem_get_page() to ENOMEM shmemfs first checks if there is enough memory to allocate the page and reports ENOSPC should there be insufficient, along with the usual ENOMEM for a genuine allocation failure. We use ENOSPC in our driver to mean that we have run out of aperture space and so want to translate the error from shmemfs back to our usual understanding of ENOMEM. None of the the other GEM users appear to distinguish between ENOMEM and ENOSPC in their error handling, hence it is easiest to do the fixup in i915.ko Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:45:22 +02:00
Chris Wilson	5cc9ed4b9a	drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl By exporting the ability to map user address and inserting PTEs representing their backing pages into the GTT, we can exploit UMA in order to utilize normal application data as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making the intermixed streaming of CPU and GPU operations fairly efficient. This ability has many widespread implications from faster rendering of client-side software rasterisers (chromium), mitigation of stalls due to read back (firefox) and to faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL). v2: Compile with CONFIG_MMU_NOTIFIER v3: We can sleep while performing invalidate-range, which we can utilise to drop our page references prior to the kernel manipulating the vma (for either discard or cloning) and so protect normal users. v4: Only run the invalidate notifier if the range intercepts the bo. v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers v6: Recheck after reacquire mutex for lost mmu. v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary. v8: Fix rebasing error after forwarding porting the back port. v9: Limit the userptr to page aligned entries. We now expect userspace to handle all the offset-in-page adjustments itself. v10: Prevent vma from being copied across fork to avoid issues with cow. v11: Drop vma behaviour changes -- locking is nigh on impossible. Use a worker to load user pages to avoid lock inversions. v12: Use get_task_mm()/mmput() for correct refcounting of mm. v13: Use a worker to release the mmu_notifier to avoid lock inversion v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer with its own locking and tree of objects for each mm/mmu_notifier. v15: Prevent overlapping userptr objects, and invalidate all objects within the mmu_notifier range v16: Fix a typo for iterating over multiple objects in the range and rearrange error path to destroy the mmu_notifier locklessly. Also close a race between invalidate_range and the get_pages_worker. v17: Close a race between get_pages_worker/invalidate_range and fresh allocations of the same userptr range - and notice that struct_mutex was presumed to be held when during creation it wasn't. v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory for the struct sg_table and to clear it before reporting an error. v19: Always error out on read-only userptr requests as we don't have the hardware infrastructure to support them at the moment. v20: Refuse to implement read-only support until we have the required infrastructure - but reserve the bit in flags for future use. v21: use_mm() is not required for get_user_pages(). It is only meant to be used to fix up the kernel thread's current->mm for use with copy_user(). v22: Use sg_alloc_table_from_pages for that chunky feeling v23: Export a function for sanity checking dma-buf rather than encode userptr details elsewhere, and clean up comments based on suggestions by Bradley. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Akash Goel <akash.goel@intel.com> Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Frob ioctl allocation to pick the next one - will cause a bit of fuss with create2 apparently, but such are the rules.] [danvet2: oops, forgot to git add after manual patch application] [danvet3: Appease sparse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-16 19:31:29 +02:00
Oscar Mateo	19656430a8	drm/i915: Gracefully handle obj not bound to GGTT in is_pin_display Otherwise, we do a NULL pointer dereference. I've seen this happen while handling an error in i915_gem_object_pin_to_display_plane(): If i915_gem_object_set_cache_level() fails, we call is_pin_display() to handle the error. At this point, the object is still not pinned to GGTT and maybe not even bound, so we have to check before we dereference its GGTT vma. The IGT kms_flip/bo-too-big tests for this bug. v2: Chris Wilson says restoring the old value is easier, but that is_pin_display is useful as a theory of operation. Take the solomonic decision: at least this way is_pin_display is a little more robust (until Chris can kill it off). v3: Chris suggests the WARN in i915_gem_obj_to_ggtt has outlived its usefulness: add a reminder to remove it. Issue: VIZ-3772 Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Testcase: igt/kms_flip/bo-too-big Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-16 16:24:39 +02:00
Daniel Vetter	8b1bc9b4f1	drm/i915: Only do gtt cleanup in vma_unbind for the global vma Otherwise we end up tearing down fences when e.g. the client quits way too early. Might or might not fix a fence pin_count BUG Ville has reported. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-14 18:39:54 +02:00
Daniel Vetter	aff10b30a1	drm/i915: Don't drop pinned fences Userspace can currently provoke this when e.g. trying to use a pinned scanout as a cursor or overlay target. Later on that might lead to some fun fence pin count mayhem. Spurred by Ville's report that something goes wrong here and originally I've thought that this might slip through the pwrite gtt fastpath. But that one checks of obj tiling, so should be ok. But one thing that _does_ blow up is the vma unbinding with more than one address space. The next patch will fix this. v2: Use a WARN_ON - Chris pointed out that we already catch all cases so userspace can't provoke this like I've originally feared. While reviewing relevant code I've noticed a pile of DRM_ERROR in the overlay&cursor code which are all triggerable by userspace. Tune them down while at it. v3: Split out the DRM_ERROR->DRM_DEBUG_KMS change into a separate patch, as requested by Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-14 18:39:31 +02:00
Daniel Vetter	d8ffa60b52	drm/i915: WARN_ON fence pin leaks The fence pin count should always be <= the bo pin count. If that's not the case then we have a funny problem and are leaking references somewhere. Which means we can catch fence pin leaks by checking for the same upper limit as we do for the bo pin count. Inspired by a discussion with Ville about a fence leak igt testcase. v2: Also check for fence->pin_count <= ggtt_vma->pin_count, since that might catch a leak even quicker. Also de-inline them, they're getting too big. v3: Don't separately check for MAX_PIN_COUNT since the > vma->pin_count check will catch that already (Chris). Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-13 17:16:12 +02:00
Chris Wilson	1cf0ba1474	drm/i915: Flush request queue when waiting for ring space During the review of commit `1f70999f90` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jan 27 22:43:07 2014 +0000 drm/i915: Prevent recursion by retiring requests when the ring is full Ville raised the point that our interaction with request->tail was likely to foul up other uses elsewhere (such as hang check comparing ACTHD against requests). However, we also need to restore the implicit retire requests that certain test cases depend upon (e.g. igt/gem_exec_lut_handle), this raises the spectre that the ppgtt will randomly call i915_gpu_idle() and recurse back into intel_ring_begin(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023 Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Remove now unused 'tail' variable as spotted by Brad.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-08 01:23:34 +02:00
Ben Widawsky	6e7186af3b	drm/i915: Make aliasing a 2nd class VM There is a good debate to be had about how best to fit the aliasing PPGTT into the code. However, as it stands right now, getting aliasing PPGTT bindings is a hack, and done through implicit arguments. To make this absolutely clear, WARN and return an error if a driver writer tries to do something they shouldn't. I have no issue with an eventual revert of this patch. It makes sense for what we have today. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-07 10:01:41 +02:00
Ben Widawsky	ebc348b2ad	drm/i915: Move semaphore specific ring members to struct This will be helpful in abstracting some of the code in preparation for gen8 semaphores. v2: Move mbox stuff to a separate struct v3: Rebased over VCS2 work Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 10:56:52 +02:00
Chris Wilson	c8725f3dc0	drm/i915: Do not call retire_requests from wait_for_rendering A common issue we have is that retiring requests causes recursion through GTT manipulation or page table manipulation which we can only handle at very specific points. However, to maintain internal consistency (enforced through our sanity checks on write_domain at various points in the GEM object lifecycle) we do need to retire the object prior to marking it with a new write_domain, and also clear the write_domain for the implicit flush following a batch. Note that this then allows the unbound objects to still be on the active lists, and so care must be taken when removing objects from unbound lists (similar to the caveats we face processing the bound lists). v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules, by refactoring it to call into __i915_gem_shrink(). v3: Missed an object-retire prior to changing cache domains in i915_gem_object_set_cache_leve() v4: Rebase Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:09:15 +02:00
Imre Deak	981a5aead1	drm/i915: vlv: clean up GTLC wake control/status register macros These will be needed by the upcoming VLV RPM helpers. Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:50 +02:00
Zhao Yakui	845f74a701	drm/i915:Initialize the second BSD ring on BDW GT3 machine Based on the hardware spec, the BDW GT3 machine has two independent BSD ring that can be used to dispatch the video commands. So just initialize it. V3->V4: Follow Imre's comment to do some minor updates. For example: more comments are added to describe the semaphore between ring. Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> [danvet: Fix up checkpatch error.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:46 +02:00
Chris Wilson	6099032045	drm/i915: Allow the module to load even if we fail to setup rings Even without enabling the ringbuffers to allow command execution, we can still control the display engines to enable modesetting. So make the ringbuffer initialization failure soft, and mark the GPU as wedged instead. v2: Only treat an EIO from ring initialisation as a soft failure, and abort module load for any other failure, such as allocation failures. v3: Add an ERROR prior to declaring the GPU wedged so that it stands out like a sore thumb in the logs Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:38 +02:00
Chris Wilson	e3efda49e7	drm/i915: Preserve ring buffers objects across resume Tearing down the ring buffers across resume is overkill, risks unnecessary failure and increases fragmentation. After failure, since the device is still active we may end up trying to write into the dangling iomapping and trigger an oops. v2: stop_ringbuffers() was meant to call stop(ring) not cleanup(ring) during resume! Reported-by: Jae-hyeon Park <jhyeon@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=72351 References: https://bugs.freedesktop.org/show_bug.cgi?id=76554 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> [danvet: s/ring->obj == NULL/!intel_ring_initialized(ring)/ as suggested by Oscar.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-05 09:08:37 +02:00
Dave Airlie	444c9a08bf	Merge branch 'drm-init-cleanup' of git://people.freedesktop.org/~danvet/drm into drm-next Next pull request, this time more of the drm de-midlayering work. The big thing is that his patch series here removes everything from drm_bus except the set_busid callback. Thierry has a few more patches on top of this to make that one optional to. With that we can ditch all the non-pci drm_bus implementations, which Thierry has already done for the fake tegra host1x drm_bus. Reviewed by Thierry, Laurent and David and now also survived some testing on my intel boxes to make sure the irq fumble is fixed correctly ;-) The last minute rebase was just to add the r-b tags from Thierry for the 2 patches I've redone. * 'drm-init-cleanup' of git://people.freedesktop.org/~danvet/drm: drm/<drivers>: don't set driver->dev_priv_size to 0 drm: Remove dev->kdriver drm: remove drm_bus->get_name drm: rip out dev->devname drm: inline drm_pci_set_unique drm: remove bus->get_irq implementations drm: pass the irq explicitly to drm_irq_install drm/irq: Look up the pci irq directly in the drm_control ioctl drm/irq: track the irq installed in drm_irq_install in dev->irq drm: rename dev->count_lock to dev->buf_lock drm: Rip out totally bogus vga_switcheroo->can_switch locking drm: kill drm_bus->bus_type drm: remove drm_dev_to_irq from drivers drm/irq: remove cargo-culted locking from irq_install/uninstall drm/irq: drm_control is a legacy ioctl, so pci devices only drm/pci: fold in irq_by_busid support drm/irq: simplify irq checks in drm_wait_vblank	2014-05-01 09:32:21 +10:00
Dave Airlie	885ac04ab3	Merge tag 'drm-intel-next-2014-04-16' of git://anongit.freedesktop.org/drm-intel into drm-next drm-intel-next-2014-04-16: - vlv infoframe fixes from Jesse - dsi/mipi fixes from Shobhit - gen8 pageflip fixes for LRI/SRM from Damien - cmd parser fixes from Brad Volkin - some prep patches for CHV, DRRS, ... - and tons of little things all over drm-intel-next-2014-04-04: - cmd parser for gen7 but only in enforcing and not yet granting mode - the batch copying stuff is still missing. Also performance is a bit ... rough (Brad Volkin + OACONTROL fix from Ken). - deprecate UMS harder (i.e. CONFIG_BROKEN) - interrupt rework from Paulo Zanoni - runtime PM support for bdw and snb, again from Paulo - a pile of refactorings from various people all over the place to prep for new stuff (irq reworks, power domain polish, ...) drm-intel-next-2014-04-04: - cmd parser for gen7 but only in enforcing and not yet granting mode - the batch copying stuff is still missing. Also performance is a bit ... rough (Brad Volkin + OACONTROL fix from Ken). - deprecate UMS harder (i.e. CONFIG_BROKEN) - interrupt rework from Paulo Zanoni - runtime PM support for bdw and snb, again from Paulo - a pile of refactorings from various people all over the place to prep for new stuff (irq reworks, power domain polish, ...) Conflicts: drivers/gpu/drm/i915/i915_gem_context.c	2014-05-01 09:11:37 +10:00
Daniel Vetter	bb0f1b5c16	drm: pass the irq explicitly to drm_irq_install Unfortunately this requires a drm-wide change, and I didn't see a sane way around that. Luckily it's fairly simple, we just need to inline the respective get_irq implementation from either drm_pci.c or drm_platform.c. With that we can now also remove drm_dev_to_irq from drm_irq.c. Reviewed-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-23 10:32:50 +02:00
Daniel Vetter	e090c53b21	drm/irq: remove cargo-culted locking from irq_install/uninstall The dev->struct_mutex locking in drm_irq.c only protects dev->irq_enabled. Which isn't really much at all and only prevents especially nasty ums userspace from concurrently installing the interrupt handling a few times. Or at least trying. There are tons of unlocked readers of dev->irqs_enabled in the vblank wait code (and by extension also in the pageflip code since that uses the same vblank timestamp engine). Real modesetting drivers should ensure that nothing can go haywire with a sane setup teardown sequence. So we only really need this for the drm_control ioctl, everywhere else this will just paper over nastiness. Note that drm/i915 is a bit specially due to the gem+ums combination. So there we also need to properly protect the entervt and leavevt ioctls. But it's definitely saner to do everything in one go than to drop the lock in-between. Finally there's the gpu reset code in drm/i915. That one's just race (concurrent userspace calls to for vblank waits of pageflips could spuriously fail). So wrap it up in with a nice comment since fixing this is more involved. v2: Rebase and fix commit message (Thierry) Reviewed-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-22 11:41:12 +02:00
Chris Wilson	691e6415c8	drm/i915: Always use kref tracking for all contexts. If we always initialize kref for the context, even if we are using fake contexts for hangstats when there is no hw support, we can forgo the dance to dereference the ctx->obj and inspect whether we are permitted to use kref inside i915_gem_context_reference() and _unreference(). My ulterior motive here is to improve the debugging of a use-after-free of ctx->obj. This patch avoids the dereference here and instead forces the assertion checks associated with kref. v2: Refactor the fake contexts to being even more like the real contexts, so that there is much less duplicated and special case code. v3: Tweaks. v4: Tweaks, minor. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76671 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: lu hua <huax.lu@intel.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> [Jani: tiny change to backport to drm-intel-fixes.] Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2014-04-11 13:29:51 +03:00
Mika Kuoppala	88b4aa8770	drm/i915: add flags to i915_ring_stop Piglit runner and QA are both looking at the dmesg for DRM_ERRORs with test cases. Add a flag to control those when we they are expected from related test cases. Also add flag to control if contexts should be banned that introduced the hang. Hangcheck is timer based and preventing bans by adding sleeps to testcases makes testing slower. v2: intel_ring_stopped(), readable comment (Chris) v3: keep compatibility (Daniel) References: https://bugs.freedesktop.org/show_bug.cgi?id=75876 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-04-09 15:07:42 +02:00
Dave Airlie	9f97ba806a	Merge tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel into drm-next Merge window -fixes pull request as usual. Well, I did sneak in Jani's drm_i915_private_t typedef removal, need to have fun with a big sed job too ;-) Otherwise: - hdmi interlaced fixes (Jesse&Ville) - pipe error/underrun/crc tracking fixes, regression in late 3.14-rc (but not cc: stable since only really relevant for igt runs) - large cursor wm fixes (Chris) - fix gpu turbo boost/throttle again, was getting stuck due to vlv rps patches (Chris+Imre) - fix runtime pm fallout (Paulo) - bios framebuffer inherit fix (Chris) - a few smaller things * tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel: (196 commits) Skip intel_crt_init for Dell XPS 8700 drm/i915: vlv: fix RPS interrupt mask setting Revert "drm/i915/vlv: fixup DDR freq detection per Punit spec" drm/i915: move power domain init earlier during system resume drm/i915: Fix the computation of required fb size for pipe drm/i915: don't get/put runtime PM at the debugfs forcewake file drm/i915: fix WARNs when reading DDI state while suspended drm/i915: don't read cursor registers on powered down pipes drm/i915: get runtime PM at i915_display_info drm/i915: don't read pp_ctrl_reg if we're suspended drm/i915: get runtime PM at i915_reg_read_ioctl drm/i915: don't schedule force_wake_timer at gen6_read drm/i915: vlv: reserve the GT power context only once during driver init drm/i915: prefer struct drm_i915_private to drm_i915_private_t drm/i915/overlay: prefer struct drm_i915_private to drm_i915_private_t drm/i915/ringbuffer: prefer struct drm_i915_private to drm_i915_private_t drm/i915/display: prefer struct drm_i915_private to drm_i915_private_t drm/i915/irq: prefer struct drm_i915_private to drm_i915_private_t drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t drm/i915/dma: prefer struct drm_i915_private to drm_i915_private_t ...	2014-04-05 16:14:21 +10:00
Lauri Kasanen	62347f9e0f	drm: Add support for two-ended allocation, v3 Clients like i915 need to segregate cache domains within the GTT which can lead to small amounts of fragmentation. By allocating the uncached buffers from the bottom and the cacheable buffers from the top, we can reduce the amount of wasted space and also optimize allocation of the mappable portion of the GTT to only those buffers that require CPU access through the GTT. For other drivers, allocating small bos from one end and large ones from the other helps improve the quality of fragmentation. Based on drm_mm work by Chris Wilson. v3: Changed to use a TTM placement flag v2: Updated kerneldoc Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Christian König <deathsimple@vodafone.de> Signed-off-by: Lauri Kasanen <cand@gmx.com> Signed-off-by: David Airlie <airlied@redhat.com>	2014-04-04 09:28:14 +10:00
Jani Nikula	3e31c6c017	drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t No functional changes. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-31 15:32:38 +02:00
Chris Wilson	df6f783a4e	drm/i915: Fix unsafe loop iteration over vma whilst unbinding them On non-LLC platforms, when changing the cache level of an object, we may need to unbind it so that prefetching across page boundaries does not cross into a different memory domain. This requires us to unbind conflicting vma, but we did so iterating over the objects vma in an unsafe manner (as the list was being modified as we iterated). The regression was introduced in commit `3089c6f239` Author: Ben Widawsky <ben@bwidawsk.net> Date: Wed Jul 31 17:00:03 2013 -0700 drm/i915: make caching operate on all address spaces apparently as far back as v3.12-rc1, but it has only just begun to trigger real world bug reports. Reported-and-tested-by: Nikolay Martynov <mar.kolya@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76384 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-21 16:13:08 +01:00
Paulo Zanoni	5d584b2eca	drm/i915: move pc8.irqs_disabled to pm.irqs_disabled When other platforms add runtime PM support they will also need to disable interrupts, so move the variable to the runtime PM struct. Also notice that the longer-term goal is to completely kill the regsave struct, and I even have patches for that. v2: - Rebase. v3: - Rebase. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-19 16:39:46 +01:00
Daniel Vetter	b80d6c781e	Merge branch 'topic/dp-aux-rework' into drm-intel-next-queued Conflicts: drivers/gpu/drm/i915/intel_dp.c A bit a mess with reverts which differe in details between -fixes and -next and some other unrelated shuffling. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-19 15:54:37 +01:00
Dave Airlie	d1583c9997	Merge branch 'drm-next' of git://people.freedesktop.org/~dvdhrm/linux into drm-next This is the 3rd respin of the drm-anon patches. They allow module unloading, use the pin_fs_* helpers recommended by Al and are rebased on top of drm-next. Note that there are minor conflicts with the "drm-minor" branch. * 'drm-next' of git://people.freedesktop.org/~dvdhrm/linux: drm: init TTM dev_mapping in ttm_bo_device_init() drm: use anon-inode instead of relying on cdevs drm: add pseudo filesystem for shared inodes	2014-03-18 19:17:02 +10:00
David Herrmann	6796cb16c0	drm: use anon-inode instead of relying on cdevs DRM drivers share a common address_space across all character-devices of a single DRM device. This allows simple buffer eviction and mapping-control. However, DRM core currently waits for the first ->open() on any char-dev to mark the underlying inode as backing inode of the device. This delayed initialization causes ugly conditions all over the place: if (dev->dev_mapping) do_sth(); To avoid delayed initialization and to stop reusing the inode of the char-dev, we allocate an anonymous inode for each DRM device and reset filp->f_mapping to it on ->open(). Signed-off-by: David Herrmann <dh.herrmann@gmail.com>	2014-03-16 12:23:33 +01:00
Ville Syrjälä	3ddffb7b8a	drm/i915: Unbind all vmas whose new cache_level doesn't agree with the neighbours When we change the cache_level for an object we need to make sure we don't put differing types of snoopable memory too close to each other on non-LLC machines. Currently i915_gem_object_set_cache_level() will stop looking when it finds just one vma that has such a conflict. Drop the bogus break statement to make sure it will unbind all vmas which need to be moved around to avoid the conflict. I suppose this is a theoretical issue as currently we don't enable ppgtt on non-LLC machines, so each object can only have one vma. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-13 12:22:44 +01:00
Chris Wilson	c2831a94b5	drm/i915: Do not force non-caching copies for pwrite along shmem path We don't always want to write into main memory with pwrite. The shmem fast path in particular is used for memory that is cacheable - under such circumstances forcing the cache eviction is undesirable. As we will always flush the cache when targeting incoherent buffers, we can rely on that second pass to apply the cache coherency rules and so benefit from in-cache copies otherwise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-08 00:03:26 +01:00
Chris Wilson	17793c9a46	drm/i915: Process page flags once rather than per pwrite/pread We used to lock individual pages inside the buffer object and so needed to update the page flags every time. However, we now pin the pages into the object for the duration of the pwrite/pread (and hopefully much longer) and so we can forgo the flag updates until we release all the pages. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-08 00:03:01 +01:00
Brad Volkin	4c914c0c7c	drm/i915: Refactor shmem pread setup The command parser is going to need the same synchronization and setup logic, so factor it out for reuse. v2: Add a check that the object is backed by shmem Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-07 22:36:59 +01:00
Damien Lespiau	cb216aa844	drm/i915: Make i915_gem_retire_requests_ring() static Its last usage outside of i915_gem.c was removed in: commit `1f70999f90` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Jan 27 22:43:07 2014 +0000 drm/i915: Prevent recursion by retiring requests when the ring is full Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:39 +01:00
Chris Wilson	ab0e7ff9f2	drm/i915: Record pid/comm of hanging task After finding the guilty batch and request, we can use it to find the process that submitted the batch and then add the culprit into the error state. This is a slightly different approach from Ben's in that instead of adding the extra information into the struct i915_hw_context, we use the information already captured in struct drm_file which is then referenced from the request. v2: Also capture the workaround buffer for gen2, so that we can compare its contents against the intended batch for the active request. v3: Rebase (Mika) v4: Check for null context (Chris) checkpatch warnings fixed Link: http://lists.freedesktop.org/archives/intel-gfx/2013-August/032280.html Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v2) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v4) Acked-by: Ben Widawsky <ben@bwidawsk.net> Cc: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:24 +01:00
Chris Wilson	8d9fc7fd2d	drm/i915: Rely on accurate request tracking for finding hung batches In the past, it was possible to have multiple batches per request due to a stray signal or ENOMEM. As a result we had to scan each active object (filtered by those having the COMMAND domain) for the one that contained the ACTHD pointer. This was then made more complicated by the introduction of ppgtt, whereby ACTHD then pointed into the address space of the context and so also needed to be taken into account. This is a fairly robust approach (though the implementation is a little fragile and depends upon the per-generation setup, registers and parameters). However, due to the requirements for hangstats, we needed a robust method for associating batches with a particular request and having that we can rely upon it for finding the associated batch object for error capture. If the batch buffer tracking is not robust enough, that should become apparent quite quickly through an erroneous error capture. That should also help to make sure that the runtime reporting to userspace is robust. It also means that we then report the oldest incomplete batch on each ring, which can be useful for determining the state of userspace at the time of a hang. v2: Use i915_gem_find_active_request (Mika) v3: remove check for ring->get_seqno, split long lines (Ben) v4: check that context is available (Chris) checkpatch warnings fixed Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v3) Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v3) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:24 +01:00
Chris Wilson	64bf930379	drm/i915: Reset vma->mm_list after unbinding In place of true activity counting, we walk the list of vma associated with an object managing each on the vm's active/inactive list everytime we call move-to-inactive. This depends upon the vma->mm_list being cleared after unbinding, or else we run into difficulty when tracking the object in multiple vm's - we see a use-after free and corruption of the mm_list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:23 +01:00
Ville Syrjälä	ccc7bed05e	drm/i915: Don't ban default context when stop_rings!=0 If we've explicitly stopped the rings for testing purposes, don't ban the default context. Fixes kms_flip hang tests. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:14 +01:00
Chris Wilson	f62a007603	drm/i915: Accurately track when we mark the hardware as idle/busy We currently call intel_mark_idle() too often, as we do so as a side-effect of processing the request queue. However, we the calls to intel_mark_idle() are expected to be paired with a call to intel_mark_busy() (or else we try to idle the hardware by accessing registers that are already disabled). Make the idle/busy tracking explicit to prevent the multiple calls. v2: We can drop some of the complexity in __i915_add_request() as queue_delayed_work() already behaves as we want (not requeuing the item if it is already in the queue) and mark_busy/mark_idle imply that the idle task is inactive. v3: We do still need to cancel the pending idle task so that it is sent again after the current busy load completes (not in the middle of it). Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-03-05 21:30:10 +01:00
Daniel Vetter	8ea99c9287	drm/i915: Only bind each object rather than for every execbuffer One side-effect of the introduction of ppgtt was that we needed to rebind the object into the appropriate vm (and global gtt in some peculiar cases). For simplicity this was done twice for every object on every call to execbuffer. However, that adds a tremendous amount of CPU overhead (rewriting all the PTE for all objects into WC memory) per draw. The fix is to push all the decision about which vm to bind into and when down into the low-level bind routines through hints rather than performing the bind unconditionally in the execbuffer routine. Note that this is a regression introduced in the full ppgtt feature branch, before this we've only done re-bound objects when the relevant has_(aliasing_ppgtt\|global_gtt)_mapping flag was clear. But since that's per-object and not per-vma that optimization broke. v2: Split out prep work and unrelated changes. v3: Bring back functional change around PIN_GLOBAL that I've accidentally split out. v4: Remove the temporary hack for the old binding logic to avoid bisection issues. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72906 Tested-by: jianx.zhou@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:18:38 +01:00
Daniel Vetter	262de14531	drm/i915: Directly return the vma from bind_to_vm This is prep work for reworking the object_pin logic. Atm it still does a (now redundant) lookup of the vma. The next patch will fix this. Split out from Chris vma-bind rework. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:18:30 +01:00
Daniel Vetter	b287110e89	drm/i915: Simplify i915_gem_object_ggtt_unpin Split out from Chris vma-bind rework. Jani wondered why this is save, and the reason is that i915_vma_unbind does all these checks, too. So they're redundant. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:18:21 +01:00
Daniel Vetter	bf3d149b25	drm/i915: split PIN_GLOBAL out from PIN_MAPPABLE With abitrary pin flags it makes sense to split out a "please bind this into global gtt" from the "please allocate in the mappable range". Use this unconditionally in our global gtt pin helper since this is what its callers want. Later patches will drop PIN_MAPPABLE where it's not strictly needed. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:17:27 +01:00
Daniel Vetter	1ec9e26dda	drm/i915: Consolidate binding parameters into flags Anything more than just one bool parameter is just a pain to read, symbolic constants are much better. Split out from Chris' vma-binding rework patch. v2: Undo the behaviour change in object_pin that Chris spotted. v3: Split out misplaced hunk to handle set_cache_level errors, spotted by Jani. v4: Keep the current over-zealous binding logic in the execbuffer code working with a quick hack while the overall binding code gets shuffled around. v5: Reorder the PIN_ flags for more natural patch splitup. v6: Pull out the PIN_GLOBAL split-up again. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-14 14:16:58 +01:00
Chris Wilson	6e4930f6ee	drm/i915: Flush GPU rendering with a lockless wait during a pagefault Arjan van de Ven reported that on his test machine that he was seeing stalls of greater than 1 frame greatly impacting the user experience. He tracked this down to being the locked flush during a pagefault as being the culprit hogging the struct_mutex and so blocking any other user from proceeding. Stalling on a pagefault is bad behaviour on userspace's part, for one it means that they are ignoring the coherency rules on pointer access through the GTT, but fortunately we can apply the same trick as the set-to-domain ioctl to do a lightweight, nonblocking flush of outstanding rendering first. "Prior to the patch it looks like this (this one testrun does not show the 20ms+ I've seen occasionally) 4.99 ms 2.36 ms 31360 __wait_seqno i915_wait_seqno i915_gem_object_wait_rendering i915_gem_object_set_to_gtt_domain i915_gem_fault __do_fault handle_ +pte_fault handle_mm_fault __do_page_fault do_page_fault page_fault 4.99 ms 2.75 ms 107751 __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 4.99 ms 1.63 ms 1666 i915_mutex_lock_interruptible i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_fault do_page_fault page_fa +ult 4.93 ms 2.45 ms 980 i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_ +sysret 4.89 ms 2.20 ms 3283 i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 4.34 ms 1.66 ms 1715 i915_mutex_lock_interruptible i915_gem_pwrite_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 3.73 ms 3.73 ms 49 i915_mutex_lock_interruptible i915_gem_set_domain_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 3.17 ms 0.33 ms 931 i915_mutex_lock_interruptible i915_gem_madvise_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 2.97 ms 0.43 ms 1029 i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 2.55 ms 0.51 ms 735 i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret After the patch it looks like this: 4.99 ms 2.14 ms 22212 __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 4.86 ms 0.99 ms 14170 __wait_seqno i915_gem_object_wait_rendering__nonblocking i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_ +fault do_page_fault page_fault 3.59 ms 1.31 ms 325 i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 3.37 ms 3.37 ms 65 i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 2.58 ms 2.58 ms 65 i915_mutex_lock_interruptible i915_gem_do_execbuffer.isra.23 i915_gem_execbuffer2 drm_ioctl i915_compat_ioctl compat_sys_ioctl +ia32_sysret 2.19 ms 2.19 ms 65 i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_ +sysret 2.18 ms 2.18 ms 65 i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret 1.66 ms 1.66 ms 65 i915_gem_set_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret It may not look like it, but this is quite a large difference, and I've been unable to reproduce > 5 msec delays at all, while before they do happen (just not in the trace above)." gem_gtt_hog on an old Pineview (GMA3150), before: 4969.119ms after: 4122.749ms Reported-by: Arjan van de Ven <arjan.van.de.ven@intel.com> Testcase: igt/gem_gtt_hog Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-12 18:52:59 +01:00
Chris Wilson	bd9b6a4ec5	drm/i915: Downgrade ERROR message for invalid user input When we detect that the user passed along an invalid handle or object, we emit a warning as an aide for debugging. Since these are indeed only for debugging user triggerable errors (and the errors are reported back to userspace by the errno), the messages should only be at the debug level and not claiming that there is a catastrophic error in the driver/hardware. References: https://bugs.freedesktop.org/show_bug.cgi?id=74704 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-12 18:52:54 +01:00
Damien Lespiau	3d13ef2e2d	drm/i915: Always use INTEL_INFO() to access the device_info structure If we make sure that all the dev_priv->info usages are wrapped by INTEL_INFO(), we can easily modify the ->info field to be structure and not a pointer while keeping the const protection in the INTEL_INFO() macro. v2: Rebased onto latest drm-nightly Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-12 18:52:50 +01:00
Chris Wilson	8c99e57d39	drm/i915: Treat using a purged buffer as a source of EFAULT Since a purged buffer is one without any associated pages, attempting to use it should generate EFAULT rather than EINVAL, as it is not strictly an invalid parameter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 17:03:32 +01:00
Chris Wilson	45d678173a	drm/i915: Convert EFAULT into a silent SIGBUS EFAULT will be a possible return code where backing storage is transient, such after it is purged by madvise. As such it is to be expected and so should not trigger a WARN inside i915_gem_fault() but be converted silently to SIGBUS. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 17:03:27 +01:00
Mika Kuoppala	e38486943e	drm/i915: release mutex in i915_gem_init()'s error path Found with smatch. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-02-04 12:10:45 +01:00

1 2 3 4 5 ...

950 Commits