linux

Author	SHA1	Message	Date
Ben Widawsky	0a9ae0d7f8	drm/i915: pre-fixes for checkpatch Since I'll need to modify i915_gem_object_bind_to_gtt(), fix the errors now to get checkpatch to not complain. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Resolve conflict with Chris' improved debug output, and bikeshed the new variable with s/max/gtt_max/ a bit while at it.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-05-31 20:53:56 +02:00
Dave Airlie	e81f3d81e2	Merge tag 'drm-intel-next-2013-05-20-merged' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Daniel writes: Highlights (copy-pasted from my testing cycle mails): - fbc support for Haswell (Rodrigo) - streamlined workaround comments, including an igt tool to grep for them (Damien) - sdvo and TV out cleanups, including a fixup for sdvo multifunction devices - refactor our eDP mess a bit (Imre) - don't register the hdmi connector on haswell when desktop eDP is present - vlv support is no longer preliminary! - more vlv fixes from Jesse for stolen and dpll handling - more flexible power well checking infrastructure from Paulo - a few gtt patches from Ben - a bit of OCD cleanups for transcoder #defines and an assorted pile of smaller things. - fixes for the gmch modeset sequence - a bit of OCD around plane/pipe usage (Ville) - vlv turbo support (Jesse) - tons of vlv modeset fixes (Jesse et al.) - vlv pte write fixes (Kenneth Graunke) - hpd filtering to avoid costly probes on unaffected outputs (Egbert Eich) - intel dev_info cleanups and refactorings (Damien) - vlv rc6 support (Jesse) - random pile of fixes around non-24bpp modes handling - asle/opregion cleanups and locking fixes (Jani) - dp dpll refactoring - improvements for reduced_clock computation on g4x/ilk+ - pfit state refactored to use pipe_config (Jesse) - lots more computed modeset state moved to pipe_config, including readout and cross-check support - fdi auto-dithering for ivb B/C links, using the neat pipe_config improvements - drm_rect helpers plus sprite clipping fixes (Ville) - hw context refcounting (Mika + Ben) * tag 'drm-intel-next-2013-05-20-merged' of git://people.freedesktop.org/~danvet/drm-intel: (155 commits) drm/i915: add support for dvo Chrontel 7010B drm/i915: Use pipe config state to control gmch pfit enable/disable drm/i915: Use pipe_config state to disable ilk+ pfit drm/i915: panel fitter hw state readout&check support drm/i915: implement WADPOClockGatingDisable for LPT drm/i915: Add missing platform tags to FBC workaround comments drm/i915: rip out an unused lvds_reg variable drm/i915: Compute WR PLL dividers dynamically drm/i915: HSW FBC WaFbcDisableDpfcClockGating drm/i915: HSW FBC WaFbcAsynchFlipDisableFbcQueue drm/i915: Enable FBC at Haswell. drm/i915: IVB FBC WaFbcDisableDpfcClockGating drm/i915: IVB FBC WaFbcAsynchFlipDisableFbcQueue drm/i915: Add support for FBC on Ivybridge. drm/i915: Organize VBT stuff inside drm_i915_private drm/i915: make SDVO TV-out work for multifunction devices drm/i915: rip out now unused is_foo tracking from crtc code drm/i915: rip out TV-out lore ... drm/i915: drop TVclock special casing on ilk+ drm/i915: move sdvo TV clock computation to intel_sdvo.c ...	2013-05-31 12:56:05 +10:00
Chris Wilson	2dc8aae06d	drm/i915: Workaround incoherence with fence updates on Valleyview In commit `25ff1195f8` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Apr 4 21:31:03 2013 +0100 drm/i915: Workaround incoherence between fences and LLC across multiple CPUs we introduced an empirical workaround for memory corruption when using fences from multiple CPUs. At the time, we did not have any results for Valleyview, so the presumption was that it was limited to recent generations using LLC. Now we have evidence that Valleyview also suffers incoherence and requires a similar but different workaround. For Valleyview, the wbinvd instruction is insufficient and we require the serialising register write per-CPU. Conversely, that serialising register write is not enough for SNB/IVB/HSW. To compromise and keep the code relatively clean, employ both serialisation techniques in the same workaround. Reported-by: Jon Bloomfield <jon.bloomfield@intel.com> Tested-by: Jon Bloomfield <jon.bloomfield@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-05-23 12:51:31 +02:00
Chris Wilson	a36689cb77	drm/i915: Be more informative when reporting "too large for aperture" error This should help debugging the truly unexpected cases where it occurs - in particular to see which value is garbage. References: https://bugzilla.kernel.org/show_bug.cgi?id=58511 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: s/%ld/%zd/ as spotted by Wu Fengguang's autobuilder.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-05-23 12:51:29 +02:00
Imre Deak	e054cc3937	drm/i915: avoid premature timeouts in __wait_seqno() At the moment wait_event_timeout/wait_event_interruptible_timeout may time out 1 jiffy too early, as the calculated expiry time is 1 less than needed. Besides timing out too early this also means that the calculation of the remaining time will be incorrect and we will pass a non-zero remaining time to user space in case of a time out. This is one reason for the following bugzilla report: Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64270 Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-05-22 13:51:23 +02:00
Daniel Vetter	e1b73cba13	Linux 3.10-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJRmpexAAoJEHm+PkMAQRiGrRIH/1uWFW38RvaCV/PXm/ia6Z+x BfBJfBIvPxGwb4n7aQNQlhU25xkfrPZ6szO4WiBH5/KPH3xYi2I2OZ1AzffkYqMF BWkPmsPK6EsTdp16zsi6JtH2aXArG4SpYA7ZamPvDkmfigHuiZg7GlL/9eHTRPNV P7Q8JToOrcnP8RoGgNj0uFiQeQbc62Kmoq7WuPtUhVlpQCCCknXgOJiYgz9w6Xe9 /i79YFS8WRrzAquExT1NbIOh4ZMqB9MvuroaVWy8JDDLUyz7QUvOCe3tCDNguwgi FdWvU6nfkdQq5SLaWCWXDE9Rp/pL1MvfBn9vCOwFcp42aw0aQ0PgJVIXvsqufd0= =jgDI -----END PGP SIGNATURE----- Merge tag 'v3.10-rc2' into drm-intel-next-queued Backmerge Linux 3.10-rc2 since the various (rather trivial) conflicts grew a bit out of hand. intel_dp.c has the only real functional conflict since the logic changed while dev_priv->edp.bpp was moved around. Also squash in a whitespace fixup from Ben Widawsky for i915_gem_gtt.c, git seems to do something pretty strange in there (which I don't fully understand tbh). Conflicts: drivers/gpu/drm/i915/i915_reg.h drivers/gpu/drm/i915/intel_dp.c Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-05-21 09:52:16 +02:00
Mika Kuoppala	0e50e96bf2	drm/i915: add context into request struct Storing context reference into request struct allows us to inspect context and its associated objects when requests are retired. Both ppgtt and arb robustness work will need this. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-05-06 11:21:51 +02:00
Chris Wilson	4f42f4ef0d	drm/i915: Always normalize return timeout for wait_timeout_ioctl As we recompute the remaining timeout after waiting, there is a potential for that timeout to be less than zero and so need sanitizing. The timeout is always returned to userspace and validated, so we should always perform the sanitation. v2 [vsyrjala]: Only normalize the timespec if it's invalid v3: Add a comment to clarify the situation and remove the now useless WARN_ON() (ickle) Cc: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-04-30 10:50:32 +02:00
Ville Syrjälä	42b5aeabe9	drm/i915: IVB/HSW have 32 fence register Increase the number of fence registers to 32 on IVB/HSW. VLV however only has 16 fence registers according to the docs. Increasing the number of fences was attempted before [1], but there was some uncertainty about the maximum CPU fence number for FBC. Since then BSpec has been updated to state that there are in fact 32 fence registers, and the CPU fence number field in the SNB_DPFC_CTL_SA register is 5 bits, and the CPU fence number field in the ILK_DPFC_CONTROL register must be zero. So now it all makes sense. [1] http://lists.freedesktop.org/archives/intel-gfx/2011-October/012865.html v2: Include some background information based on the previous attempt Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-04-18 09:43:21 +02:00
Ben Widawsky	b7c36d2546	drm/i915: Allow PPGTT enable to fail I'm really not happy that we have to support this, but this will be the simplest way to handle cases where PPGTT init can fail, which I promise will be coming in the future. v2: Resolve conflicts due to patch series reordering. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-04-18 09:43:16 +02:00
Ben Widawsky	6197349bde	drm/i915: Abstract PPGTT enabling Since we've already set up a nice vtable to abstract other PPGTT functions, also abstract the actual register programming to enable things. This function will probably need to change a bit as we implement real processes. v2: Resolve conflicts due to patch series reordering. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-04-18 09:43:15 +02:00
Chris Wilson	25ff1195f8	drm/i915: Workaround incoherence between fences and LLC across multiple CPUs In order to fully serialize access to the fenced region and the update to the fence register we need to take extreme measures on SNB+, and manually flush writes to memory prior to writing the fence register in conjunction with the memory barriers placed around the register write. Fixes i-g-t/gem_fence_thrash v2: Bring a bigger gun v3: Switch the bigger gun for heavier bullets (Arjan van de Ven) v4: Remove changes for working generations. v5: Reduce to a per-cpu wbinvd() call prior to updating the fences. v6: Rewrite comments to ellide forgotten history. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Tested-by: Jon Bloomfield <jon.bloomfield@intel.com> (v2) Cc: stable@vger.kernel.org Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-04-18 09:43:10 +02:00
Ben Widawsky	88a2b2a32d	drm/i915: Don't wait for PCH on reset BIOS should be setting this, but in case it doesn't... v2: Define the bits we actually want to clear (Jesse) Make it an RMW op (Jesse) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-04-08 20:53:05 +02:00
Imre Deak	2db76d7c3c	lib/scatterlist: sg_page_iter: support sg lists w/o backing pages The i915 driver uses sg lists for memory without backing 'struct page' pages, similarly to other IO memory regions, setting only the DMA address for these. It does this, so that it can program the HW MMU tables in a uniform way both for sg lists with and without backing pages. Without a valid page pointer we can't call nth_page to get the current page in __sg_page_iter_next, so add a helper that relevant users can call separately. Also add a helper to get the DMA address of the current page (idea from Daniel). Convert all places in i915, to use the new API. Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-27 17:13:44 +01:00
Chris Wilson	f9c513e9d6	drm/i915: Always call fence-lost prior to removing the fence There is a minute window for a race between put-fence removing the fence and for a new transaction by an external party on the GTT mmap. That is we must zap the mmap prior to removing the fence and not afterwards. Fixes regression from commit `61050808bb` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Apr 17 15:31:31 2012 +0100 drm/i915: Refactor put_fence() to use the common fence writing routine v2: Remember the fence to remove with a local variable (gcc) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-26 20:16:18 +01:00
Imre Deak	90797e6d1e	drm/i915: create compact dma scatter lists for gem objects So far we created a sparse dma scatter list for gem objects, where each scatter list entry represented only a single page. In the future we'll have to handle compact scatter lists too where each entry can consist of multiple pages, for example for objects imported through PRIME. The previous patches have already fixed up all other places where the i915 driver _walked_ these lists. Here we have the corresponding fix to _create_ compact lists. It's not a performance or memory footprint improvement, but it helps to better exercise the new logic. Reference: http://www.spinics.net/lists/dri-devel/msg33917.html Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-23 12:17:09 +01:00
Imre Deak	67d5a50c04	drm/i915: handle walking compact dma scatter lists So far the assumption was that each dma scatter list entry contains only a single page. This might not hold in the future, when we'll introduce compact scatter lists, so prepare for this everywhere in the i915 code where we walk such a list. We'll fix the place _creating_ these lists separately in the next patch to help the reviewing/bisectability. Reference: http://www.spinics.net/lists/dri-devel/msg33917.html Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-23 12:16:36 +01:00
Daniel Vetter	0d4a42f6bd	Linux 3.9-rc3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJRRkrbAAoJEHm+PkMAQRiGy3oH/jrbHinYs0auurANgx4TdtWT /WNajstKBqLOJJ6cnTR7sOqwOVlptt65EbbTs+qGyZ2Z2W/Lg0BMenHvNHo4ER8C e7UbMdBCSLKBjAMKh1XCoZscGv4Exm8WRH3Vc5yP0Hafj3EzSAVLY1dta9WKKoQi bh7D1ErUlbU1zczA1w5YbPF0LqFKRvyZOwebMCCAKAxv5wWAxmbcPNxVR4sufkjg k6TkQ2ysgWivZAfy3tJYOcxiEu7ahpZVEuYdlZEJQXHRQUfoNljQlOp4BqKsYUai 5A0kaf2VpKay/7pkhvTfBBcF/jFJ68pYP6gQ2ThNdr0b5kOiAfMWj030Xyngnhg= =iO9t -----END PGP SIGNATURE----- Merge tag 'v3.9-rc3' into drm-intel-next-queued Backmerge so that I can merge Imre Deak's coalesced sg entries fixes, which depend upon the new for_each_sg_page introduce in commit `a321e91b6d` Author: Imre Deak <imre.deak@intel.com> Date: Wed Feb 27 17:02:56 2013 -0800 lib/scatterlist: add simple page iterator The merge itself is just two trivial conflicts: Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-19 09:47:30 +01:00
Jesse Barnes	d62b4892f3	drm/i915: allow force wake at init time on VLV v2 We need to set the 'allow force wake' bit to enable forcewake handling later on. v2: split from clock gating patch (Jani) check for allowwakeack (Ville) Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-19 09:38:32 +01:00
Ville Syrjälä	2bb4629add	drm/i915: Add to_user_ptr() to_user_ptr() simply casts a pointer passed as u64 from user space to void __user * correctly. Using this lets us get rid of all the tiresome casts. The idea came from Chris Wilson <chris@chris-wilson.co.uk>. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-03-03 19:49:11 +01:00
Linus Torvalds	d895cb1af1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle and ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...	2013-02-26 20:16:07 -08:00
Al Viro	496ad9aa8e	new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:31 -05:00
Daniel Vetter	eb32e4584d	drm/i915: Use HAS_L3_GPU_CACHE in i915_gem_l3_remap Yet another remnant ... this might explain why l3 remapping didn't really work on HSW. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57441 Spotted-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-02-20 00:21:47 +01:00
Imre Deak	769ce4643b	drm/i915: don't clflush gem objects in stolen memory As explained by Chris Wilson gem objects in stolen memory are always coherent with the GPU so we don't need to ever flush the CPU caches for these. This fixes a breakage - at least with the compact sg patches applied - during the resume/restore gtt mappings path, when we tried to clflush an FB object in stolen memory, but since stolen objects don't have backing pages we passed an invalid page pointer to drm_clflush_page(). Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-02-20 00:21:43 +01:00
Ben Widawsky	4fc7c971c3	drm/i915: Extract ring init from hw_init The ring initialization will differ a bit in upcoming generations, and this split will prepare the code for what's needed. This patch also fixes a bug introduced in: commit `9943393195` Author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Date: Tue Jan 22 14:12:17 2013 +0200 drm/i915: use gem_set_seqno() on hardware init After doing the extraction, the bad error handling became obvious. I acknowledge that this should be two patches, but it's a pretty small/trivial patch. If requested, I can certainly do the fix as a distinct patch. v2: Should be cleanup blt, not init blt on failure (Chris) v3: Forgot to git add on v2 Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-02-15 10:30:39 +01:00
Dave Airlie	cd17ef4114	Merge tag 'drm-intel-next-2013-02-01' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Daniel writes: "Probably the last feature pull for 3.9, there's some fixes outstanding thought that I'd like to sneak in. And maybe 3.8 takes a bit longer ... Anyway, highlights of this pull: - Kill the horrible IS_DISPLAYREG hack to handle the mmio offset movements on vlv, big thanks to Ville. - Dynamic power well support for Haswell, shaves away a bit when only using the eDP port on pipe A (Paulo). Plus unclaimed register fixes uncovered by this. - Clarifications of the gpu hang/reset state transitions, hopefully fixing a few spurious -EIO deaths in userspace. - Haswell ELD fixes. - Some more (pp)gtt cleanups from Ben. - A few smaller things all over. Plus all the stuff from the previous rather small pull request: - Broadcast RBG improvements and reduced color range fixes from Ville. - Ben is on a "kill legacy gtt code for good" spree, first pile of patches included. - No-relocs and bo lut improvements for faster execbuf from Chris. - Some refactorings from Imre." * tag 'drm-intel-next-2013-02-01' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits) GPU/i915: Fix acpi_bus_get_device() check in drivers/gpu/drm/i915/intel_opregion.c drm/i915: Set the SR01 "screen off" bit in i915_redisable_vga() too drm/i915: Kill IS_DISPLAYREG() drm/i915: Introduce i915_vgacntrl_reg() drm/i915: gen6_gmch_remove can be static drm/i915: dynamic Haswell display power well support drm/i915: check the power down well on assert_pipe() drm/i915: don't send DP "idle" pattern before "normal" on HSW PORT_A drm/i915: don't run hsw power well code on !hsw drm/i915: kill cargo-culted locking from power well code drm/i915: Only run idle processing from i915_gem_retire_requests_worker drm/i915: Fix CAGF for HSW drm/i915: Reclaim GTT space for failed PPGTT drm/i915: remove intel_gtt structure drm/i915: Add probe and remove to the gtt ops drm/i915: extract hw ppgtt setup/cleanup code drm/i915: pte_encode is gen6+ drm/i915: vfuncs for ppgtt drm/i915: vfuncs for gtt_clear_range/insert_entries drm/i915: Error state should print /sys/kernel/debug ...	2013-02-08 11:08:10 +10:00
Chris Wilson	725a5b5402	drm/i915: Only run idle processing from i915_gem_retire_requests_worker When adding the fb idle detection to mark-inactive, it was forgotten that userspace can drive the processing of retire-requests. We assumed that it would be principally driven by the retire requests worker, running once every second whilst active and so we would get the deferred timer for free. Instead we spend too many CPU cycles reclocking the LVDS preventing real work from being done. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reported-and-tested-by: Alexander Lam <lambchop468@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58843 Cc: stable@vger.kernel.org Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-31 11:50:09 +01:00
Mika Kuoppala	9943393195	drm/i915: use gem_set_seqno() on hardware init When machine was rebooted or module was reloaded, gem_hw_init() set last_seqno to be identical to next_seqno. This lead to situation that waits for first ever request always passed immediately regardless if it was actually executed. Use gem_set_seqno() to be consistent how hw is initialized on init, wrap and on resume. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-22 13:52:26 +01:00
Daniel Vetter	f69061bedd	drm/i915: create a race-free reset detection With the previous patch the state transition handling of the reset code itself is now (hopefully) race free and solid. But that still leaves out everyone else - with the various lock-free wait paths we have there's the possibility that the reset happens between the point where we read the seqno we should wait on and the actual wait. And if __wait_seqno then never sees the RESET_IN_PROGRESS state, we'll happily wait for a seqno which will in all likelyhood never signal. In practice this is not a big problem since the X server gets constantly interrupted, and can then submit more work (hopefully) to unblock everyone else: As soon as a new seqno write lands, all waiters will unblock. But running the i-g-t reset testcase ZZ_hangman can expose this race, especially on slower hw with fewer cpu cores. Now looking forward to ARB_robustness and friends that's not the best possible behaviour, hence this patch adds a reset_counter to be able to detect any reset, even if a given thread never observed the in-progress state. The important part is to correctly order things: - The write side needs to increment the counter after any seqno gets reset. Hence we need to do that at the end of the reset work, and again wake everyone up. We also need to place a barrier in between any possible seqno changes and the counter increment, since any unlock operations only guarantee that nothing leaks out, but not that at later load operation gets moved ahead. - On the read side we need to ensure that no reset can sneak in and invalidate the seqno. In all cases we can use the one-sided barrier that unlock operations guarantee (of the lock protecting the respective seqno/ring pair) to ensure correct ordering. Hence it is sufficient to place the atomic read before the mutex/spin_unlock and no additional barriers are required. The end-result of all this is that we need to wake up everyone twice in a reset operation: - First, before the reset starts, to get any lockholders of the locks, so that the reset can proceed. - Second, after the reset is completed, to allow waiters to properly and reliably detect the reset condition and bail out. I admit that this entire reset_counter thing smells a bit like overkill, but I think it's justified since it makes it really explicit what the bail-out condition is. And we need a reset counter anyway to implement ARB_robustness, and imo with finer-grained locking on the horizont this is the most resilient scheme I could think of. v2: Drop spurious change in the wait_for_error EXIT_COND - we only need to wait until we leave the reset-in-progress wedged state. v3: Don't play tricks with barriers in the throttle ioctl, the spin_unlock is barrier enough. I've also considered using a little helper to grab the current reset_counter, but then decided that hiding the atomic_read isn't a great idea, since having it explicitly show up in the code is a nice remainder to reviews to check the memory barriers. v4: Add a comment to explain why we need to fall through in __wait_seqno in the end variable assignments. v5: Review from Damien: - s/smb/smp/ in a comment - don't increment the reset counter after we've set it to WEDGED. Now we (again) properly wedge the gpu when the reset fails. Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-21 19:53:54 +01:00
Dave Airlie	735dc0d1e2	Merge branch 'drm-kms-locking' of git://people.freedesktop.org/~danvet/drm-intel into drm-next The aim of this locking rework is that ioctls which a compositor should be might call for every frame (set_cursor, page_flip, addfb, rmfb and getfb/create_handle) should not be able to block on kms background activities like output detection. And since each EDID read takes about 25ms (in the best case), that always means we'll drop at least one frame. The solution is to add per-crtc locking for these ioctls, and restrict background activities to only use the global lock. Change-the-world type of events (modeset, dpms, ...) need to grab all locks. Two tricky parts arose in the conversion: - A lot of current code assumes that a kms fb object can't disappear while holding the global lock, since the current code serializes fb destruction with it. Hence proper lifetime management using the already created refcounting for fbs need to be instantiated for all ioctls and interfaces/users. - The rmfb ioctl removes the to-be-deleted fb from all active users. But unconditionally taking the global kms lock to do so introduces an unacceptable potential stall point. And obviously changing the userspace abi isn't on the table, either. Hence this conversion opportunistically checks whether the rmfb ioctl holds the very last reference, which guarantees that the fb isn't in active use on any crtc or plane (thanks to the conversion to the new lifetime rules using proper refcounting). Only if this is not the case will the code go through the slowpath and grab all modeset locks. Sane compositors will never hit this path and so avoid the stall, but userspace relying on these semantics will also not break. All these cases are exercised by the newly added subtests for the i-g-t kms_flip, tested on a machine where a full detect cycle takes around 100 ms. It works, and no frames are dropped any more with these patches applied. kms_flip also contains a special case to exercise the above-describe rmfb slowpath. * 'drm-kms-locking' of git://people.freedesktop.org/~danvet/drm-intel: (335 commits) drm/fb_helper: check whether fbcon is bound drm/doc: updates for new framebuffer lifetime rules drm: don't hold crtc mutexes for connector ->detect callbacks drm: only grab the crtc lock for pageflips drm: optimize drm_framebuffer_remove drm/vmwgfx: add proper framebuffer refcounting drm/i915: dump refcount into framebuffer debugfs file drm: refcounting for crtc framebuffers drm: refcounting for sprite framebuffers drm: fb refcounting for dirtyfb_ioctl drm: don't take modeset locks in getfb ioctl drm: push modeset_lock_all into ->fb_create driver callbacks drm: nest modeset locks within fpriv->fbs_lock drm: reference framebuffers which are on the idr drm: revamp framebuffer cleanup interfaces drm: create drm_framebuffer_lookup drm: revamp locking around fb creation/destruction drm: only take the crtc lock for ->cursor_move drm: only take the crtc lock for ->cursor_set drm: add per-crtc locks ...	2013-01-21 07:44:58 +10:00
Chris Wilson	97c809fd9c	drm/i915: Only apply the mb() when flushing the GTT domain during a finish Now that we seem to have brought order to the GTT barriers, the last one to review is the terminal barrier before we unbind the buffer from the GTT. This needs to only be performed if the buffer still resides in the GTT domain, and so we can skip some needless barriers otherwise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-20 13:11:17 +01:00
Chris Wilson	d0a57789d5	drm/i915: Only insert the mb() before updating the fence parameter With a fence, we only need to insert a memory barrier around the actual fence alteration for CPU accesses through the GTT. Performing the barrier in flush-fence was inserting unnecessary and expensive barriers for never fenced objects. Note removing the barriers from flush-fence, which was effectively a barrier before every direct access through the GTT, revealed that we where missing a barrier before the first access through the GTT. Lack of that barrier was sufficient to cause GPU hangs. v2: Add a couple more comments to explain the new barriers Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-20 13:11:16 +01:00
Daniel Vetter	1f83fee08d	drm/i915: clear up wedged transitions We have two important transitions of the wedged state in the current code: - 0 -> 1: This means a hang has been detected, and signals to everyone that they please get of any locks, so that the reset work item can do its job. - 1 -> 0: The reset handler has completed. Now the last transition mixes up two states: "Reset completed and successful" and "Reset failed". To distinguish these two we do some tricks with the reset completion, but I simply could not convince myself that this doesn't race under odd circumstances. Hence split this up, and add a new terminal state indicating that the hw is gone for good. Also add explicit #defines for both states, update comments. v2: Split out the reset handling bugfix for the throttle ioctl. v3: s/tmp/wedged/ sugested by Chris Wilson. Also fixup up a rebase error which prevented this patch from actually compiling. v4: To unify the wedged state with the reset counter, keep the reset-in-progress state just as a flag. The terminally-wedged state is now denoted with a big number. v5: Add a comment to the reset_counter special values explaining that WEDGED & RESET_IN_PROGRESS needs to be true for the code to be correct. v6: Fixup logic errors introduced with the wedged+reset_counter unification. Since WEDGED implies reset-in-progress (in a way we're terminally stuck in the dead-but-reset-not-completed state), we need ensure that we check for this everywhere. The specific bug was in wait_for_error, which would simply have timed out. v7: Extract an inline i915_reset_in_progress helper to make the code more readable. Also annote the reset-in-progress case with an unlikely, to help the compiler optimize the fastpath. Do the same for the terminally wedged case with i915_terminally_wedged. Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-20 13:11:16 +01:00
Daniel Vetter	308887aad1	drm/i915: fix reset handling in the throttle ioctl While auditing the code I've noticed one place (the throttle ioctl) which does not yet wait for the reset handler to complete and doesn't properly decode the wedge state into -EAGAIN/-EIO. Fix this up by calling the right helpers. This might explain the oddball "my compositor just died in a successfull gpu reset" reports. Or maybe not, since current mesa doesn't use this ioctl to throttle command submission. The throttle ioctl doesn't take the struct_mutex, so to avoid busy-looping with -EAGAIN while a reset is in process, check for errors first and wait for the handler to complete if a reset is pending by calling i915_gem_wait_for_error. Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-20 13:11:15 +01:00
Daniel Vetter	33196dedda	drm/i915: move wedged to the other gpu error handling stuff And to make Ben Widawsky happier, use the gpu_error instead of the entire device as the argument in some functions. Drop the outdated comment on ->wedged for now, a follow-up patch will change the semantics and add a proper comment again. Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-20 13:11:15 +01:00
Daniel Vetter	99584db33b	drm/i915: extract hangcheck/reset/error_state state into substruct This has been sprinkled all over the place in dev_priv. I think it'd be good to also move all the code into a separate file like i915_gem_error.c, but that's for another patch. Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-20 13:11:14 +01:00
Ben Widawsky	93d187993b	drm/i915: Remove use of gtt_mappable_entries Mappable_end, ie. size is almost always what you want as opposed to the number of entries. Since we already have that information, we can scrap the number of entries and only calculate it when needed. If gtt_start is !0, this will have slightly different behavior. This difference can only occur in DRI1, and exists when we try to kick out the firmware fb. The new code seems like a bugfix to me. The other case where we've changed the behavior is during init we check the mappable region against our current known upper and lower limits (64MB, and 512MB). This now matches the comment, and makes things more convenient after removing gtt_mappable_entries. Also worth noting is the setting of mappable_end is taken out of setup because we do it earlier now in the DRI2 case and therefore need to add that tiny hunk to support the DRI1 IOCTL. v2: Move up mappable end to before legacy AGP init v3: Add the dev_priv inclusion here from previous rebase error in patch 5 Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> (v2) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: squash in fix for a printk format flag mismatch warning.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-20 13:09:20 +01:00
Ben Widawsky	5d4545aef5	drm/i915: Create a gtt structure The purpose of the gtt structure is to help isolate our gtt specific properties from the rest of the code (in doing so it help us finish the isolation from the AGP connection). The following members are pulled out (and renamed): gtt_start gtt_total gtt_mappable_end gtt_mappable gtt_base_addr gsm The gtt structure will serve as a nice place to put gen specific gtt routines in upcoming patches. As far as what else I feel belongs in this structure: it is meant to encapsulate the GTT's physical properties. This is why I've not added fields which track various drm_mm properties, or things like gtt_mtrr (which is itself a pretty transient field). Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> [Ben modified commit messages] Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:33:56 +01:00
Chris Wilson	43e28f092b	drm/i915: Bail if we attempt to allocate pages for a purged object Move the existing checking inside bind_to_gtt() to the more appropriate layer in order to prevent recreation of the pages after they have been explicitly truncated. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:07:59 +01:00
Chris Wilson	dd624afd53	drm/i915: Add a debug interface to forcibly evict and shrink our object caches As a means to investigate some bad system behaviour related to the purging of the active, inactive and unbound lists, it is useful to be able to manually control when those lists should be cleared. v2: use _safe list iterators as we kick objects from the list as we walk. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add a small comment explaining why we don't need to check and wait for gpu resets, acked by Chris on irc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:07:57 +01:00
Imre Deak	0fa8779651	drm/i915: use gtt_get_size() instead of open coding it Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:07:56 +01:00
Imre Deak	56c844e539	drm/i915: merge {i965, sandybridge}_write_fence_reg() The two functions are rather similar, so merge them. Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:07:55 +01:00
Imre Deak	d865110cc2	drm/i915: merge get_gtt_alignment/get_unfenced_gtt_alignment() The two functions are rather similar, so merge them. Signed-off-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-17 22:07:54 +01:00
Dave Airlie	b5cc6c0387	Merge tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Daniel writes: - seqno wrap fixes and debug infrastructure from Mika Kuoppala and Chris Wilson - some leftover kill-agp on gen6+ patches from Ben - hotplug improvements from Damien - clear fb when allocated from stolen, avoids dirt on the fbcon (Chris) - Stolen mem support from Chris Wilson, one of the many steps to get to real fastboot support. - Some DDI code cleanups from Paulo. - Some refactorings around lvds and dp code. - some random little bits&pieces * tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel: (93 commits) drm/i915: Return the real error code from intel_set_mode() drm/i915: Make GSM void drm/i915: Move GSM mapping into dev_priv drm/i915: Move even more gtt code to i915_gem_gtt drm/i915: Make next_seqno debugs entry to use i915_gem_set_seqno drm/i915: Introduce i915_gem_set_seqno() drm/i915: Always clear semaphore mboxes on seqno wrap drm/i915: Initialize hardware semaphore state on ring init drm/i915: Introduce ring set_seqno drm/i915: Missed conversion to gtt_pte_t drm/i915: Bug on unsupported swizzled platforms drm/i915: BUG() if fences are used on unsupported platform drm/i915: fixup overlay stolen memory leak drm/i915: clean up PIPECONF bpc #defines drm/i915: add intel_dp_set_signal_levels drm/i915: remove leftover display.update_wm assignment drm/i915: check for the PCH when setting pch_transcoder drm/i915: Clear the stolen fb before enabling drm/i915: Access to snooped system memory through the GTT is incoherent drm/i915: Remove stale comment about intel_dp_detect() ... Conflicts: drivers/gpu/drm/i915/intel_display.c	2013-01-17 20:34:08 +10:00
Daniel Vetter	93927ca52a	drm/i915: Revert shrinker changes from "Track unbound pages" This partially reverts commit `6c085a728c` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Aug 20 11:40:46 2012 +0200 drm/i915: Track unbound pages Closer inspection of that patch revealed a bunch of unrelated changes in the shrinker: - The shrinker count is now in pages instead of objects. - For counting the shrinkable objects the old code only looked at the inactive list, the new code looks at all bounds objects (including pinned ones). That is obviously in addition to the new unbound list. - The shrinker cound is no longer scaled with sysctl_vfs_cache_pressure. Note though that with the default tuning value of vfs_cache_pressue = 100 this doesn't affect the shrinker behaviour. - When actually shrinking objects, the old code first dropped purgeable objects, then normal (inactive) objects. Only then did it, in a last-ditch effort idle the gpu and evict everything. The new code omits the intermediate step of evicting normal inactive objects. Safe for the first change, which seems benign, and the shrinker count scaling, which is a bit a different story, the endresult of all these changes is that the shrinker is _much_ more likely to fall back to the last-ditch resort of idling the gpu and evicting everything. The old code could only do that if something else evicted lots of objects meanwhile (since without any other changes the nr_to_scan will be smaller than the object count). Reverting the vfs_cache_pressure behaviour itself is a bit bogus: Only dentry/inode object caches should scale their shrinker counts with vfs_cache_pressure. Originally I've had that change reverted, too. But Chris Wilson insisted that it's too bogus and shouldn't again see the light of day. Hence revert all these other changes and restore the old shrinker behaviour, with the minor adjustment that we now first scan the unbound list, then the inactive list for each object category (purgeable or normal). A similar patch has been tested by a few people affected by the gen4/5 hangs which started to appear in 3.7, which some people bisected to the "drm/i915: Track unbound pages" commit. But just disabling the unbound logic alone didn't change things at all. Note that this patch doesn't fix the referenced bugs, it only hides the underlying bug(s) well enough to restore pre-3.7 behaviour. The key to achieve that is to massively reduce the likelyhood of going into a full gpu stall and evicting everything. v2: Reword commit message a bit, taking Chris Wilson's comment into account. v3: On Chris Wilson's insistency, do not reinstate the rather bogus vfs_cache_pressure change. Tested-by: Greg KH <gregkh@linuxfoundation.org> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=55984 References: https://bugs.freedesktop.org/show_bug.cgi?id=57122 References: https://bugs.freedesktop.org/show_bug.cgi?id=56916 References: https://bugs.freedesktop.org/show_bug.cgi?id=57136 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-10 18:02:44 +01:00
Chris Wilson	93be8788e6	drm/i915; Only increment the user-pin-count after successfully pinning the bo As along the error path we do not correct the user pin-count for the failure, we may end up with userspace believing that it has a pinned object at offset 0 (when interrupted by a signal for example). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-01-07 10:30:53 +01:00
Dave Airlie	8be0e5c427	Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Some fixes for 3.8: - Watermark fixups from Chris Wilson (4 pieces). - 2 snb workarounds, seem to be recently added to our internal DB. - workaround for the infamous i830/i845 hang, seems now finally solid! Based on Chris' fix for SNA, now also for UXA/mesa&old SNA. - Some more fixlets for shrinker-pulls-the-rug issues (Chris&me). - Fix dma-buf flags when exporting (you). - Disable the VGA plane if it's enabled on lid open - similar fix in spirit to the one I've sent you last weeek, BIOS' really like to mess with the display when closing the lid (awesome debug work from Krzysztof Mazur). * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: disable shrinker lock stealing for create_mmap_offset drm/i915: optionally disable shrinker lock stealing drm/i915: fix flags in dma buf exporting i915: ensure that VGA plane is disabled drm/i915: Preallocate the drm_mm_node prior to manipulating the GTT drm_mm manager drm: Export routines for inserting preallocated nodes into the mm manager drm/i915: don't disable disconnected outputs drm/i915: Implement workaround for broken CS tlb on i830/845 drm/i915: Implement WaSetupGtModeTdRowDispatch drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled drm/i915: Prefer CRTC 'active' rather than 'enabled' during WM computations drm/i915: Clear self-refresh watermarks when disabled drm/i915: Double the cursor self-refresh latency on Valleyview drm/i915: Fixup cursor latency used for IVB lp3 watermarks	2012-12-30 13:54:12 +10:00
Ben Widawsky	d7e5008f7c	drm/i915: Move even more gtt code to i915_gem_gtt This really should have been part of the kill agp series. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-20 16:27:35 +01:00
Daniel Vetter	da494d7ca5	drm/i915: disable shrinker lock stealing for create_mmap_offset The mmap offset structure is not part of the drm/i915 code, but provided by gem helpers. To avoid leaky abstractions (by either depending upon implementation details of said helper wrt to preallocations, or reimplementing it in our code and so fuzzing around in internal details of that helpr) simply disable the shrinker lock stealing accross calls into the helper functions. This should fix igt/gem_tiled_swapping. v2: Fix cleanup path confusion bemoaned by Chris Wilson. Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-20 14:57:35 +01:00
Daniel Vetter	677feac291	drm/i915: optionally disable shrinker lock stealing commit `5774506f15` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 21 13:04:04 2012 +0000 drm/i915: Borrow our struct_mutex for the direct reclaim added a nice trick to steal the struct_mutex lock in the shrinker if it's the current task holding it. But this also caused the requirement that every place which allocates memory needs to be careful about the gem state of objects, since the shrinker could have pulled the rug out from under it. We've usually solved this by carefully preallocating things or ensure that buffers are pinned already. But the shrinker also reaps mmap offset, so allocating those needs to be careful, too. Now that code has been factored out into some common helpers, so either we have fragile code depending upon the common helper not doing something we don't want it to do. Or we need to reimplement the mmap offset creation and so also leak implementation details into our code. Since this all results in leaky abstraction, cop out by disabling the lock borrowing trick while calling down into the helpers. That way our craziness is nicely confined to files in drm/i915. v2: Split out the change to create_mmap_offset as request by Chris Wilson. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-20 14:56:04 +01:00
Mika Kuoppala	fca26bb453	drm/i915: Introduce i915_gem_set_seqno() This function can be used to set the driver's next_seqno to arbitrary value. i915_gem_set_seqno() will idle the gpu, retire outstanding requests, clear the semaphore mailboxes and set the hardware status page's seqno index. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-19 11:25:10 +01:00
Mika Kuoppala	ba1a7067c0	drm/i915: Always clear semaphore mboxes on seqno wrap In preparation for setting the seqno to arbitrary value on init or through debugfs. We need to always clear the semaphores and set the hws page seqno index by calling intel_ring_init_seqno(). v2: rewrote the commit message as suggested by Chris Wilson. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-19 11:17:41 +01:00
Mika Kuoppala	f7e98ad4d4	drm/i915: Initialize hardware semaphore state on ring init Hardware status page needs to have proper seqno set as our initial seqno can be arbitrary. If initial seqno is close to wrap boundary on init and i915_seqno_passed() (31bit space) refers to hw status page which contains zero, errorneous result will be returned. v2: clear mboxes and set hws page directly instead of going through rings. Suggested by Chris Wilson. v3: hws needs to be updated for all gens. Noticed by Chris Wilson. References: https://bugs.freedesktop.org/show_bug.cgi?id=58230 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-19 11:17:01 +01:00
Ben Widawsky	8782e26c0c	drm/i915: Bug on unsupported swizzled platforms Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-18 22:31:23 +01:00
Ben Widawsky	7dbf9d6e0f	drm/i915: BUG() if fences are used on unsupported platform Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-18 22:29:55 +01:00
Chris Wilson	dc9dd7a20f	drm/i915: Preallocate the drm_mm_node prior to manipulating the GTT drm_mm manager As we may reap neighbouring objects in order to free up pages for allocations, we need to be careful not to allocate in the middle of the drm_mm manager. To accomplish this, we can simply allocate the drm_mm_node up front and then use the combined search & insert drm_mm routines, reducing our code footprint in the process. Fixes (partially) i-g-t/gem_tiled_swapping Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com> [danvet: Again fixup atomic bikeshed.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-18 22:02:29 +01:00
Linus Torvalds	3c2e81ef34	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull DRM updates from Dave Airlie: "This is the one and only next pull for 3.8, we had a regression we found last week, so I was waiting for that to resolve itself, and I ended up with some Intel fixes on top as well. Highlights: - new driver: nvidia tegra 20/30/hdmi support - radeon: add support for previously unused DMA engines, more HDMI regs, eviction speeds ups and fixes - i915: HSW support enable, agp removal on GEN6, seqno wrapping - exynos: IPP subsystem support (image post proc), HDMI - nouveau: display class reworking, nv20->40 z compression - ttm: start of locking fixes, rcu usage for lookups, - core: documentation updates, docbook integration, monotonic clock usage, move from connector to object properties" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (590 commits) drm/exynos: add gsc ipp driver drm/exynos: add rotator ipp driver drm/exynos: add fimc ipp driver drm/exynos: add iommu support for ipp drm/exynos: add ipp subsystem drm/exynos: support device tree for fimd radeon: fix regression with eviction since evict caching changes drm/radeon: add more pedantic checks in the CP DMA checker drm/radeon: bump version for CS ioctl support for async DMA drm/radeon: enable the async DMA rings in the CS ioctl drm/radeon: add VM CS parser support for async DMA on cayman/TN/SI drm/radeon/kms: add evergreen/cayman CS parser for async DMA (v2) drm/radeon/kms: add 6xx/7xx CS parser for async DMA (v2) drm/radeon: fix htile buffer size computation for command stream checker drm/radeon: fix fence locking in the pageflip callback drm/radeon: make indirect register access concurrency-safe drm/radeon: add W\|RREG32_IDX for MM_INDEX\|DATA based mmio accesss drm/exynos: support extended screen coordinate of fimd drm/exynos: fix x, y coordinates for right bottom pixel drm/exynos: fix fb offset calculation for plane ...	2012-12-17 08:26:17 -08:00
Chris Wilson	eb119bd612	drm/i915: Access to snooped system memory through the GTT is incoherent We ignore all the user requests to handle flushing to the GTT domain if the user requests such on a snoopable bo, and as such access through the GTT to such pages remains incoherent. The specs even warn that such behaviour is undefined - a strong reason never to do so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-17 12:28:23 +01:00
Mika Kuoppala	9e8e36879f	drm/i915: Set initial seqno value close to wrap boundary To gain confidence in the wrap handling, make it happen quite soon after the boot. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-11 14:07:22 +01:00
Chris Wilson	107f27a5df	drm/i915: Open-code i915_gpu_idle() for handling seqno wrapping The complication is that during seqno wrapping we must be extremely careful not to write to any ring as that will require a new seqno, and so would recurse back into the seqno wrap handler. So we cannot call i915_gpu_idle() as that does additional work beyond simply retiring the current set of requests, and instead must do the minimal work ourselves during seqno wrapping. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-11 14:07:03 +01:00
Mika Kuoppala	f72b3435c1	drm/i915: Don't emit semaphore wait if wrap happened If wrap just happened we need to prevent emitting waits for pre wrap values. Detect this and emit no-ops instead. v2: Use olr > seqno to detect wrap instead of *seqno == 0 as suggested by Chris Wilson. v3: Use last used seqno to detect the wraparound. From Chris Wilson v4: Fixed unnecessary last_seqno assigment References: https://bugs.freedesktop.org/show_bug.cgi?id=57967 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-11 13:32:26 +01:00
Linus Torvalds	caf491916b	Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage This reverts commits `a50915394f` and `d7c3b937bd`. This is a revert of a revert of a revert. In addition, it reverts the even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the original commits in linux-next. It turns out that the original patch really was bogus, and that the original revert was the correct thing to do after all. We thought we had fixed the problem, and then reverted the revert, but the problem really is fundamental: waking up kswapd simply isn't the right thing to do, and direct reclaim sometimes simply _is_ the right thing to do. When certain allocations fail, we simply should try some direct reclaim, and if that fails, fail the allocation. That's the right thing to do for THP allocations, which can easily fail, and the GPU allocations want to do that too. So starting kswapd is sometimes simply wrong, and removing the flag that said "don't start kswapd" was a mistake. Let's hope we never revisit this mistake again - and certainly not this many times ;) Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-10 11:03:05 -08:00
Chris Wilson	e9b73c6739	drm/i915: Reduce memory pressure during shrinker by preallocating swizzle pages On a machine with bit17 swizzling, we need to store the bit17 of the physical page address in put-pages. This requires a memory allocation, on average less than a page, which may be difficult to satisfy is the request to put-pages is on behalf of the shrinker. We could allow that allocation to pull from the reserved memory pools, but it seems much safer to preallocate the array for tiled objects on affected machines. v2: Export i915_gem_object_needs_bit17_swizzle() for reuse. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-07 01:16:15 +01:00
Mika Kuoppala	498d2ac15c	drm/i915: Add intel_ring_handle_seqno wrap If there are pre-wrap values in semaphore-mbox registers after wrap, syncing against some after-wrap request will complete immediately. Fix this by emitting ring commands to set mbox registers to zero when the wrap happens. v2: Use __intel_ring_begin to emit ring commands, from Chris Wilson. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add a small comment to handle_seqno_wrap.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-06 13:14:34 +01:00
Daniel Vetter	1a240d4de2	drm/i915: fixup sparse warnings - __iomem where there is none (I love how we mix these things up). - Use gfp_t instead of an other plain type. - Unconfuse one place about enum pipe vs enum transcoder - for the pch transcoder we actually use the pipe enum. Fixup the other cases where we assign the pipe to the cpu transcoder with explicit casts. - Declare the mch_lock properly in a header. There is still a decent mess in intel_bios.c about __iomem, but heck, this is x86 and we're allowed to do that. Makes-sparse-happy: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Use a space after the cast consistently and fix up the newly-added cast in i915_irq.c to properly use __iomem.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-03 22:31:04 +01:00
Damien Lespiau	4239ca779d	drm/i915: Fix dieing -> dying typo Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-03 18:25:02 +01:00
Chris Wilson	a2165e3123	drm/i915: Decouple the object from the unbound list before freeing pages As we may actually allocate in order to save the physical swizzling bits during the free, we have to be careful not to trigger the shrinker on the same object. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Added a small comment in the code to really drive the scariness of this patch home.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-12-03 17:22:16 +01:00
Chris Wilson	42dcedd4f2	drm/i915: Use a slab for object allocation The primary purpose of this was to debug some use-after-free memory corruption that was causing an OOPS inside drm/i915. As it turned out the corruption was being caused elsewhere and i915.ko as a major user of many objects was being hit hardest. Indeed as we do frequent the generic kmalloc caches, dedicating one to ourselves (or at least naming one for us depending upon the core) aids debugging our own slab usage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-30 23:44:05 +01:00
Chris Wilson	0104fdbb84	drm/i915: Introduce i915_gem_object_create_stolen() Allow for the creation of GEM objects backed by stolen memory. As these are not backed by ordinary pages, we create a fake dma mapping and store the address in the scatterlist rather than obj->pages. v2: Mark _i915_gem_object_create_stolen() as static, as noticed by Jesse Barnes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-30 23:34:16 +01:00
Daniel Vetter	8dcf015eb9	drm/i915: optimize the shmem_pwrite slowpath handling Since we drop dev->struct_mutex when going through the slowpath, the object might have been moved out of the cpu domain. Hence we need to clflush the entire object to ensure that after the ioctl returns, everything is coherent again (interwoven writes are ill-defined anyway). But we only need to do this if we start in the cpu domain and the object requires flushing for coherency. So don't do the flushing if the object is coherent anyway or if we've done in-line clfushing already. v2: i915_gem_clflush_object already checks whether the object is coherent and if so, drops the flushing. Hence we don't need to check that ourselves, simplifying the condition. v3: Reorder the checks for better clarity (and adjust the comment accordingly), suggested by Chris Wilson. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 13:49:08 +01:00
Daniel Vetter	a39a68054f	drm/i915: simplify shmem pwrite/pread slowpath handling The shmem paths for pwrite/pread used a clever trick to hold onto a single page when dropping the big dev->struct_mutex for the slowpath. But this ran the risk of reinstating (or not completely purging) the backing storage when dropping purgeable objects. Hence the code needed to keep track of whether it ever dropped the lock, and if it did, manually check whether it needs to re-purge the backing storage. But thanks to the pages pin count introduced in commit `a5570178c0` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Sep 4 21:02:54 2012 +0100 drm/i915: Pin backing pages whilst exporting through a dmabuf vmap which allowed us to pin the backing storage and remove that page reference trick from shmem_pwrite/read in commit `f60d7f0c1d` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Sep 4 21:02:56 2012 +0100 drm/i915: Pin backing pages for pread and commit `755d22184f` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Sep 4 21:02:55 2012 +0100 drm/i915: Pin backing pages for pwrite we can now abolish this check. The slowpath cleanup completely disappears from pread, and for pwrite we're only left with the domain fixup in case someone moved the object out of the cpu domain from under us. A follow-on patch will optimize that a notch more. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 13:48:34 +01:00
Mika Kuoppala	7b01e260a6	drm/i915: Set sync_seqno properly after seqno wrap i915_gem_handle_seqno_wrap() will zero all sync_seqnos but as the wrap can happen inside ring->sync_to(), pre wrap seqno was carried over and overwrote the zeroed sync_seqno. When wrap is handled, all outstanding requests will be retired and objects moved to inactive queue, causing their last_read_seqno to be zero. Use this to update the sync_seqno correctly. RING_SYNC registers after wrap will contain pre wrap values which are >= seqno. So injecting the semaphore wait into ring completes immediately. Original idea for using last_read_seqno from Chris Wilson. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 11:43:54 +01:00
Chris Wilson	3e9605018a	drm/i915: Rearrange code to only have a single method for waiting upon the ring Replace the wait for the ring to be clear with the more common wait for the ring to be idle. The principle advantage is one less exported intel_ring_wait function, and the removal of a hardcoded value. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 11:43:53 +01:00
Chris Wilson	b662a06632	drm/i915: Simplify flushing activity on the ring As we now always preallocate the seqno before writing to the ring, we can trivially test if we have any pending activity on the ring by inspecting the olr. This makes it then possible to flush operations that are not normally associated with a request, like power-management. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 11:43:53 +01:00
Chris Wilson	9d7730914f	drm/i915: Preallocate next seqno before touching the ring Based on the work by Mika Kuoppala, we realised that we need to handle seqno wraparound prior to committing our changes to the ring. The most obvious point then is to grab the seqno inside intel_ring_begin(), and then to reuse that seqno for all ring operations until the next request. As intel_ring_begin() can fail, the callers must already be prepared to handle such failure and so we can safely add further checks. This patch looks like it should be split up into the interface changes and the tweaks to move seqno wrapping from the execbuffer into the core seqno increment. However, I found no easy way to break it into incremental steps without introducing further broken behaviour. v2: Mika found a silly mistake and a subtle error in the existing code; inside i915_gem_retire_requests() we were resetting the sync_seqno of the target ring based on the seqno from this ring - which are only related by the order of their allocation, not retirement. Hence we were applying the optimisation that the rings were synchronised too early, fortunately the only real casualty there is the handling of seqno wrapping. v3: Do not forget to reset the sync_seqno upon module reinitialisation, ala resume. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861 Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 11:43:52 +01:00
Chris Wilson	b5d177946a	drm/i915: Wait upon the last request seqno, rather than a future seqno In commit `69c2fc8913` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jul 20 12:41:03 2012 +0100 drm/i915: Remove the per-ring write list the explicit flush was removed from i915_ring_idle(). However, we continued to wait upon the next seqno which now did not correspond to any request (except for the unusual condition of a failure to queue a request after execbuffer) and so would wait indefinitely. This has an important side-effect that i915_gpu_idle() does not cause the seqno to be incremented. This is vital if we are to be able to idle the GPU to handle seqno wraparound, as in subsequent patches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-29 11:43:51 +01:00
Chris Wilson	5774506f15	drm/i915: Borrow our struct_mutex for the direct reclaim If we have hit oom whilst holding our struct_mutex, then currently we cannot reap our own GPU buffers which likely pin most of memory, making an outright OOM more likely. So if we are running in direct reclaim and already hold the mutex, attempt to free buffers knowing that the original function can not continue until we return. v2: Add a note explaining that the mutex may be stolen due to pre-emption, and that is bad. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-21 17:47:14 +01:00
Chris Wilson	8742267af4	drm/i915: Defer assignment of obj->gtt_space until after all possible mallocs As we may invoke the shrinker whilst trying to allocate memory to hold the gtt_space for this object, we need to be careful not to mark the drm_mm_node as activated (by assigning it to this object) before we have finished our sequence of allocations. Note: We also need to move the binding of the object into the actual pagetables down a bit. The best way seems to be to move it out into the callsites. Reported-by: Imre Deak <imre.deak@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Added small note to commit message to summarize review discussion.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-21 17:47:13 +01:00
Chris Wilson	c9839303d1	drm/i915: Pin the object whilst faulting it in In order to prevent reaping of the object whilst setting it up to handle the pagefault, we need to mark it as pinned. This has the nice side-effect of eliminating some special cases from the pagefault handler as well! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-21 17:45:04 +01:00
Chris Wilson	fbdda6fb5e	drm/i915: Guard pages being reaped by OOM whilst binding-to-GTT In the circumstances that the shrinker is allowed to steal the mutex in order to reap pages, we need to be careful to prevent it operating on the current object and shooting ourselves in the foot. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-21 17:45:04 +01:00
Dave Airlie	9fabd4eede	Merge branch 'for-airlied' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Daniel writes: Highlights of this -next round: - ivb fdi B/C fixes - hsw sprite/plane offset fixes from Damien - unified dp/hdmi encoder for hsw, finally external dp support on hsw (Paulo) - kill-agp and some other prep work in the gtt code from Ben - some fb handling fixes from Ville - massive pile of patches to align hsw VGA with the spec and make it actually work (Paulo) - pile of workarounds from Jesse, mostly for vlv, but also some other related platforms - start of a dev_priv reorg, that thing grew out of bounds and chaotic - small bits&pieces all over the place, down to better error handling for load-detect on gen2 (Chris, Jani, Mika, Zhenyu, ...) On top of the previous pile (just copypasta): - tons of hsw dp prep patches form Paulo - round scheduled work items and timers to nearest second (Chris) - some hw workarounds (Jesse&Damien) - vlv dp support and related fixups (Vijay et al.) - basic haswell dp support, not yet wired up for external ports (Paulo) - edp support (Paulo) - tons of refactorings to prepare for the above (Paulo) - panel rework, unifiying code between lvds and edp panels (Jani) - panel fitter scaling modes (Jani + Yuly Novikov) - panel power improvements, should now work without the BIOS setting it up - extracting some dp helpers from radeon/i915 and move them to drm_dp_helper.c - randome pile of workarounds (Damien, Ben, ...) - some cleanups for the register restore code for suspend/resume - secure batchbuffer support, should enable tear-free blits on gen6+ Chris) - random smaller fixlets and cleanups. * 'for-airlied' of git://people.freedesktop.org/~danvet/drm-intel: (231 commits) drm/i915: Restore physical HWS_PGA after resume drm/i915: Report amount of usable graphics memory in MiB drm/i915/i2c: Track users of GMBUS force-bit drm/i915: Allocate the proper size for contexts. drm/i915: Update load-detect failure paths for modeset-rework drm/i915: Clear unused fields of mode for framebuffer creation drm/i915: Always calculate 8xx WM values based on a 32-bpp framebuffer drm/i915: Fix sparse warnings in from AGP kill code drm/i915: Missed lock change with rps lock drm/i915: Move the remaining gtt code drm/i915: flush system agent TLBs on SNB drm/i915: Kill off now unused gen6+ AGP code drm/i915: Calculate correct stolen size for GEN7+ drm/i915: Stop using AGP layer for GEN6+ drm/i915: drop the double-OP_STOREDW usage in blt_ring_flush drm/i915: don't rewrite the GTT on resume v4 drm/i915: protect RPS/RC6 related accesses (including PCU) with a new mutex drm/i915: put ring frequency and turbo setup into a work queue v5 drm/i915: don't block resume on fb console resume v2 drm/i915: extract l3_parity substruct from dev_priv ...	2012-11-20 09:22:35 +10:00
Ben Widawsky	26b1ff35c8	drm/i915: Move the remaining gtt code It's pretty much all consolidated now that we've killed AGP. We can move the one outlier, and defines too. (Kill some unused defines in the process) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-11 23:51:44 +01:00
Ben Widawsky	e76e9aebcd	drm/i915: Stop using AGP layer for GEN6+ As a quick hack we make the old intel_gtt structure mutable so we can fool a bunch of the existing code which depends on elements in that data structure. We can/should try to remove this in a subsequent patch. This should preserve the old gtt init behavior which upon writing these patches seems incorrect. The next patch will fix these things. The one exception is VLV which doesn't have the preserved flush control write behavior. Since we want to do that for all GEN6+ stuff, we'll handle that in a later patch. Mainstream VLV support doesn't actually exist yet anyway. v2: Update the comment to remove the "voodoo" Check that the last pte written matches what we readback v3: actually kill cache_level_to_agp_type since most of the flags will disappear in an upcoming patch v4: v3 was actually not what we wanted (Daniel) Make the ggtt bind assertions better and stricter (Chris) Fix some uncaught errors at gtt init (Chris) Some other random stuff that Chris wanted v5: check for i==0 in gen6_ggtt_bind_object to shut up gcc (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by [v4]: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Make the cache_level -> agp_flags conversion for pre-gen6 a tad more robust by mapping everything != CACHE_NONE to the cached agp flag - we have a 1:1 uncached mapping, but different modes of cacheable (at least on later generations). Suggested by Chris Wilson.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-11 23:51:42 +01:00
Daniel Vetter	a4da4fa4e5	drm/i915: extract l3_parity substruct from dev_priv Pretty astonishing how far apart these two members landed ... Especially since I've already removed almost 200 lines in between. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-11-11 23:51:40 +01:00
Daniel Vetter	c2fb791692	Linux 3.7-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQgvdwAAoJEHm+PkMAQRiG+3AH/i2XsqqN3VctL0nnbWfvds+Q aKulfIdJTjKiVAsawPUtRqReZ8ijiebrgA/53lZLlrFOoPPQ5+LHmnSyQF6gErOY NuAE1lijXDRM1pwBlhvOBbAj26wUobGjqONFJ9OkKr758Ue8ds/Q7UdxyEgmYgmg tvVMzfRcICzryUV3PcqL+3cNPpCUdT6wGGRJ9DCv/jvGiWKExWhOle5oltrmxk+D NsqRcws5pEubfHE4J8BvNWr8lE1kHfYVhrJETiLJUiN2XAJcbI4Jy7rU/3EGteNS 0HMZdaPPjV874lohdM70X2225SbYrCVkAYB5hnZCTeC3tYyCawBBPMQoyAiOcmU= =+861 -----END PGP SIGNATURE----- Merge tag 'v3.7-rc2' into drm-intel-next-queued Linux 3.7-rc2 Backmerge to solve two ugly conflicts: - uapi. We've already added new ioctl definitions for -next. Do I need to say more? - wc support gtt ptes. We've had to revert this for snb+ for 3.7 and also fix a few other things in the code. Now we know how to make it work on snb+, but to avoid losing the other fixes do the backmerge first before re-enabling wc gtt ptes on snb+. And a few other minor things, among them git getting confused in intel_dp.c and seemingly causing a conflict out of nothing ... Conflicts: drivers/gpu/drm/i915/i915_reg.h drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/i915/intel_dp.c drivers/gpu/drm/i915/intel_modes.c include/drm/i915_drm.h Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-22 14:34:51 +02:00
Dave Airlie	64acba6a7a	Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes Daniel writes: The big thing is the disabling of the hsw support by default, cc: stable. We've aimed for basic hsw support in 3.6, but due to a few bad happenstances we've screwed up and only 3.8 will have better modeset support than vesa. To avoid yet another round of fallout from such a gaffle on for the next platform we've added a module option to disable early hw support by default. That should also give us more flexibility in bring-up. Otherwise just small fixes: - 3 fixes from Egbert for sdvo corner cases - invert-brightness quirk entry from Egbert - revert a dp link training change, it regresses some setups - and shut up a spurious WARN in our gem fault handler. - regression fix for an oops on bit17 swizzling machines, introduce in 3.7 - another no-lvds quirk * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle() drm/i915: Add no-lvds quirk for Supermicro X7SPA-H drm/i915: Insert i915_preliminary_hw_support variable. drm/i915: shut up spurious WARN in the gtt fault handler Revert "drm/i915: Try harder to complete DP training pattern 1" DRM/i915: Restore sdvo_flags after dtd->mode->dtd Roundrtrip. DRM/i915: Don't clone SDVO LVDS with analog. DRM/i915: Add QUIRK_INVERT_BRIGHTNESS for NCR machines. DRM/i915: Don't delete DPLL Multiplier during DAC init.	2012-10-22 09:55:48 +10:00
Chris Wilson	74ce6b6c63	drm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle() If we leave obj->pages set to NULL before attempting to deswizzle them, then an OOPS is well deserved. Fixes regression introduced in commit `9da3da660d` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jun 1 15:20:22 2012 +0100 drm/i915: Replace the array of pages with a scatterlist Reported-and-tested-by: Krzysztof Kolasa Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-19 21:52:52 +02:00
Daniel Vetter	a7c2e1aad6	drm/i915: shut up spurious WARN in the gtt fault handler -ENOSPC can happen if userspace is being simplistic and tries to map a too big object. To aid further spurious WARN debugging, also print out the error code. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56017 Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-17 11:56:40 +02:00
Dave Airlie	3459f62047	Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes Daniel writes: "- some register magic to fix hsw crw (Paulo&Ben) - fix backlight destruction for cpu edp (Jani) - fix gen ch7xxx dvo ->get_hw_state - fixup the plane->pipe fixup code, the broken version massively angers the modeset sanity checks - kill pipe A quirk for i855gm, otherwise I get a black screen with the above patch - fixup for gem_get_page helper (Chris) - fixup guardband clipping w/a (Ken), without this mesa master can erronously drop vertices on snb, mesa 9.0 has the optimization reverted - another pageflip vs. modeset fix - kill bogus BUG_ON which broke ums+gem from Willy Tarreau (gasp, people are still using this!)" * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: fix non-DP-D eDP backlight cleanup and module reload drm/i915: HSW CRW stability magic drm/i915/dvo-ch7xxx: fix get_hw_state drm/i915: fixup the plane->pipe fixup code drm/i915: rip out the pipe A quirk for i855gm drm/i915: disable wc gtt pte mappings on gen2 drm/i915: fixup i915_gem_object_get_page inline helper drm/i915: Disallow preallocation of requests drm/i915: Set guardband clipping workaround bit in the right register. drm/i915: paper over a pipe-enable vs pageflip race drm/i915: remove useless BUG_ON which caused a regression in 3.5.	2012-10-16 10:11:59 +10:00
Rodrigo Vivi	eda2d7f582	drm/i915: HSW CRW stability magic This magic brings stability to HSW CRW machines. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-12 10:59:11 +02:00
Chris Wilson	acb868d3d7	drm/i915: Disallow preallocation of requests The intention was to allow the caller to avoid a failure to queue a request having already written commands to the ring. However, this is a moot point as the i915_add_request() can fail for other reasons than a mere allocation failure and those failure cases are more likely than ENOMEM. So the overlay code already had to handle i915_add_request() failures, and due to commit `3bb73aba1e` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jul 20 12:40:59 2012 +0100 drm/i915: Allow late allocation of request for i915_add_request() the error handling code in intel_overlay.c was subject to causing double-frees, as found by coverity. Rather than further complicate i915_add_request() and callers, realise the battle is lost and adapt intel_overlay.c to take advantage of the late allocation of requests. v2: Handle callers passing in a NULL seqno. v3: Ditto. This time for sure. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-12 10:59:09 +02:00
Chris Wilson	bcb4508616	drm/i915: Align the retire_requests worker to the nearest second By using round_jiffies() we can align the wakeup of our worker to the nearest second in order to batch wakeups and reduce system load, which is useful for unimportant coarse tasks like our retire_requests. v2: round_jiffies_relative() already returns the relative timeout value, so no need to incorrectly perform the subtraction twice. The timer interface still leaves the possibility for the value of jiffies to change be we program the timer. Suggested-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Arjan van de Ven <arjan@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-08 18:45:21 +02:00
Chris Wilson	cecc21fea9	drm/i915: Align the hangcheck wakeup to the nearest second round_jiffies() aligns the wakeup time to the nearest second in order to batch wakeups and reduce system load, which is useful for unimportant coarse timers like our hangcheck. v2: round_jiffies_relative() returns the relative jiffie value, whereas we need the absolute value for the timer. Suggested-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Arjan van de Ven <arjan@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-08 18:44:36 +02:00
Willy Tarreau	c77d7162a7	drm/i915: remove useless BUG_ON which caused a regression in 3.5. starting an old X server causes a kernel BUG since commit `1b50247a8d`: ------------[ cut here ]------------ kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3661! invalid opcode: 0000 [#1] SMP Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss uvcvideo +videobuf2_core videodev videobuf2_vmalloc videobuf2_memops uhci_hcd ath9k mac80211 snd_hda_codec_realtek ath9k_common microcode +ath9k_hw psmouse serio_raw sg ath cfg80211 atl1c lpc_ich mfd_core ehci_hcd snd_hda_intel snd_hda_codec snd_hwdep snd_pcm rtc_cmos +snd_timer snd evdev eeepc_laptop snd_page_alloc sparse_keymap Pid: 2866, comm: X Not tainted 3.5.6-rc1-eeepc #1 ASUSTeK Computer INC. 1005HA/1005HA EIP: 0060:[<c12dc291>] EFLAGS: 00013297 CPU: 0 EIP is at i915_gem_entervt_ioctl+0xf1/0x110 EAX: f5941df4 EBX: f5940000 ECX: 00000000 EDX: 00020000 ESI: f5835400 EDI: 00000000 EBP: f51d7e38 ESP: f51d7e20 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 8005003b CR2: b760e0a0 CR3: 351b6000 CR4: 000007d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process X (pid: 2866, ti=f51d6000 task=f61af8d0 task.ti=f51d6000) Stack: 00000001 00000000 f5835414 f51d7e84 f5835400 f54f85c0 f51d7f10 c12b530b 00000001 c151b139 c14751b6 c152e030 00000b32 00006459 00000059 0000e200 00000001 00000000 00006459 c159ddd0 c12dc1a0 ffffffea 00000000 00000000 Call Trace: [<c12b530b>] drm_ioctl+0x2eb/0x440 [<c12dc1a0>] ? i915_gem_init+0xe0/0xe0 [<c1052b2b>] ? enqueue_hrtimer+0x1b/0x50 [<c1053321>] ? __hrtimer_start_range_ns+0x161/0x330 [<c10530b3>] ? lock_hrtimer_base+0x23/0x50 [<c1053163>] ? hrtimer_try_to_cancel+0x33/0x70 [<c12b5020>] ? drm_version+0x90/0x90 [<c10ca171>] vfs_ioctl+0x31/0x50 [<c10ca2e4>] do_vfs_ioctl+0x64/0x510 [<c10535de>] ? hrtimer_nanosleep+0x8e/0x100 [<c1052c20>] ? update_rmtp+0x80/0x80 [<c10ca7c9>] sys_ioctl+0x39/0x60 [<c1433949>] syscall_call+0x7/0xb Code: 83 c4 0c 5b 5e 5f 5d c3 c7 44 24 04 2c 05 53 c1 c7 04 24 6f ef 47 c1 e8 6e e0 fd ff c7 83 38 1e 00 00 00 00 00 00 e9 3f ff ff +ff <0f> 0b eb fe 0f 0b eb fe 8d b4 26 00 00 00 00 0f 0b eb fe 8d b6 EIP: [<c12dc291>] i915_gem_entervt_ioctl+0xf1/0x110 SS:ESP 0068:f51d7e20 ---[ end trace dd332ec083cbd513 ]--- The crash happens here in i915_gem_entervt_ioctl() : 3659 BUG_ON(!list_empty(&dev_priv->mm.active_list)); 3660 BUG_ON(!list_empty(&dev_priv->mm.flushing_list)); -> 3661 BUG_ON(!list_empty(&dev_priv->mm.inactive_list)); 3662 mutex_unlock(&dev->struct_mutex); Quoting Chris : "That BUG_ON there is silly and can simply be removed. The check is to verify that no batches were submitted to the kernel whilst the UMS/GEM client was suspended - to which the BUG_ONs are a crude approximation. Furthermore, the checks are too late, since it means we attempted to program the hardware whilst it was in an invalid state, the BUG_ONs are the least of your concerns at that point." Note that this regression has been introduced in commit `1b50247a8d` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Apr 24 15:47:30 2012 +0100 drm/i915: Remove the list of pinned inactive objects Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Willy Tarreau <w@1wt.eu> [danvet: Added note about the regressing commit and cc: stable.] Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-07 22:57:25 +02:00
Dave Airlie	1f31c69dac	Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Daniel writes: Bigger -fixes pile, mostly because I've included Ajax' DP dongle stuff, as discussed on irc. Otherwise just small things: - regression fix to finally make 6bpc auto-dither on dp work (Jani) - reinstate an snb ctx w/a that accidentally got lost in a rework (Chris) - fixup the DP train sequence, logic-goof-up uncovered by Coverty (Chris) - fix set_caching locking (Ben) - fix spurious segfault on con-current gtt mmap faulting (Dimitry and Mika) - some pageflip correctness fixes (still hunting down some issues, but these are the worst offenders of confused code that we've tracked down thus far) from Chris and me - fixup swizzling settings on vlv (Jesse) - gt_mode w/a from Ben added, fixes snb gt1 rc6+hw ctx hangs. * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: Fix GT_MODE default value drm/i915: don't frob the vblank ts in finish_page_flip drm/i915: call drm_handle_vblank before finish_page_flip drm/i915: print warning if vmi915_gem_fault error is not handled drm/i915: EBUSY status handling added to i915_gem_fault(). drm/i915: Try harder to complete DP training pattern 1 drm/i915: set swizzling to none on VLV drm/dp: Make sink count DP 1.2 aware drm/dp: Document DP spec versions for various DPCD registers drm/i915/dp: Be smarter about connection sense for branch devices drm/i915/dp: Fetch downstream port info if needed during DPCD fetch drm/dp: Update DPCD defines drm: Export drm_probe_ddc() drm/i915: Flush the pending flips on the CRTC before modification drm/i915: Actually invalidate the TLB for the SandyBridge HW contexts w/a drm/i915: Fix set_caching locking drm/i915: use adjusted_mode instead of mode for checking the 6bpc force flag	2012-10-07 21:13:54 +10:00
Mika Kuoppala	4d0f817e74	drm/i915: print warning if vmi915_gem_fault error is not handled Falling into default case in vmi915_gem_fault is a bug. Be more verbose about it. Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-04 10:33:42 +02:00
Dmitry Rogozhkin	e79e0fe380	drm/i915: EBUSY status handling added to i915_gem_fault(). Subsequent threads returning EBUSY from vm_insert_pfn() was not handled correctly. As a result concurrent access from new threads to mmapped data caused SIGBUS. Note that this fixes i-g-t/tests/gem_threaded_tiled_access. Tested-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-10-04 10:33:42 +02:00
Linus Torvalds	612a9aab56	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull drm merge (part 1) from Dave Airlie: "So first of all my tree and uapi stuff has a conflict mess, its my fault as the nouveau stuff didn't hit -next as were trying to rebase regressions out of it before we merged. Highlights: - SH mobile modesetting driver and associated helpers - some DRM core documentation - i915 modesetting rework, haswell hdmi, haswell and vlv fixes, write combined pte writing, ilk rc6 support, - nouveau: major driver rework into a hw core driver, makes features like SLI a lot saner to implement, - psb: add eDP/DP support for Cedarview - radeon: 2 layer page tables, async VM pte updates, better PLL selection for > 2 screens, better ACPI interactions The rest is general grab bag of fixes. So why part 1? well I have the exynos pull req which came in a bit late but was waiting for me to do something they shouldn't have and it looks fairly safe, and David Howells has some more header cleanups he'd like me to pull, that seem like a good idea, but I'd like to get this merge out of the way so -next dosen't get blocked." Tons of conflicts mostly due to silly include line changes, but mostly mindless. A few other small semantic conflicts too, noted from Dave's pre-merged branch. * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (447 commits) drm/nv98/crypt: fix fuc build with latest envyas drm/nouveau/devinit: fixup various issues with subdev ctor/init ordering drm/nv41/vm: fix and enable use of "real" pciegart drm/nv44/vm: fix and enable use of "real" pciegart drm/nv04/dmaobj: fixup vm target handling in preparation for nv4x pcie drm/nouveau: store supported dma mask in vmmgr drm/nvc0/ibus: initial implementation of subdev drm/nouveau/therm: add support for fan-control modes drm/nouveau/hwmon: rename pwm0* to pmw1* to follow hwmon's rules drm/nouveau/therm: calculate the pwm divisor on nv50+ drm/nouveau/fan: rewrite the fan tachometer driver to get more precision, faster drm/nouveau/therm: move thermal-related functions to the therm subdev drm/nouveau/bios: parse the pwm divisor from the perf table drm/nouveau/therm: use the EXTDEV table to detect i2c monitoring devices drm/nouveau/therm: rework thermal table parsing drm/nouveau/gpio: expose the PWM/TOGGLE parameter found in the gpio vbios table drm/nouveau: fix pm initialization order drm/nouveau/bios: check that fixed tvdac gpio data is valid before using it drm/nouveau: log channel debug/error messages from client object rather than drm client drm/nouveau: have drm debugging macros build on top of core macros ...	2012-10-03 23:29:23 -07:00
David Howells	760285e7e7	UAPI: (Scripted) Convert #include "..." to #include <path/...> in drivers/gpu/ Convert #include "..." to #include <path/...> in drivers/gpu/. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-02 18:01:07 +01:00
David Howells	4126d5d61f	UAPI: (Scripted) Remove redundant DRM UAPI header #inclusions from drivers/gpu/. Remove redundant DRM UAPI header #inclusions from drivers/gpu/. Remove redundant #inclusions of core DRM UAPI headers (drm.h, drm_mode.h and drm_sarea.h). They are now #included via drmP.h and drm_crtc.h via a preceding patch. Without this patch and the patch to make include the UAPI headers from the core headers, after the UAPI split, the DRM C sources cannot find these UAPI headers because the DRM code relies on specific -I flags to make #include "..." work on headers in include/drm/ - but that does not work after the UAPI split without adding more -I flags. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>	2012-10-02 18:01:05 +01:00
Ben Widawsky	3bc2913e2c	drm/i915: Fix set_caching locking On the EINVAL case we don't release struct_mutex. It should be safe to grab the lock after checking the parameters, which also resolves the issues. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-27 08:45:11 +02:00
Ben Widawsky	199adf40ae	drm/i915: s/cacheing/caching/ Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-26 09:24:36 +02:00
Daniel Vetter	398b7a1b88	Linux 3.6-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJQX7MuAAoJEHm+PkMAQRiG0h0IAJURkrMCAQUxA+Ik66ReH89s LQcVd0U9uL4UUOi7f5WR64Vf9Cfu6VVGX9ZKSvjpNskvlQaUQPMIt4pMe6g4X4dI u0bApEy4XZz3nGabUAghIU8jJ8cDmhCG6kPpSiS7pi7KHc0yIa4WFtJRrIpGaIWT xuK38YOiOHcSDRlLyWZzainMncQp/ixJdxnqVMTonkVLk0q0b84XzOr4/qlLE5lU i+TsK3PRKdQXgvZ4CebL+srPBwWX1dmgP3VkeBloQbSSenSeELICbFWavn2ml+sF GXi4dO93oNquL/Oy5SwI666T4uNcrRPaS+5X+xSZgBW/y2aQVJVJuNZg6ZP/uWk= =0v2l -----END PGP SIGNATURE----- Merge tag 'v3.6-rc7' into drm-intel-next-queued Manual backmerge of -rc7 to resolve a silent conflict leading to compile failure in drivers/gpu/drm/i915/intel_hdmi.c. This is due to the bugfix in -rc7: commit `b98b601672` Author: Wang Xingchao <xingchao.wang@intel.com> Date: Thu Sep 13 07:43:22 2012 +0800 drm/i915: HDMI - Clear Audio Enable bit for Hot Plug Since this code moved around a lot in -next git put that snippet at the wrong spot. I've tried to fix this by making the conflict explicit by merging a version for next with: commit `3cce574f01` Author: Wang Xingchao <xingchao.wang@intel.com> Date: Thu Sep 13 11:19:00 2012 +0800 drm/i915: HDMI - Clear Audio Enable bit for Hot Plug unconditionally But that failed to solve the entire problem. To avoid pushing out further -nightly branch to our QA where this is broken, do the backmerge and manually add the stuff git adds to -next from the patch in -fixes. Note that this doesn't show up in git's merge diff (and hence is also not handled by git rerere), which adds to the reasons why I'd like to fix this with a verbose backmerge. The git merge diff only shows a bunch of trivial conflicts of the "code changed in lines next to each another" kind. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-24 18:17:12 +02:00
Chris Wilson	2f745ad3d3	drm/i915: Convert the dmabuf object to use the new i915_gem_object_ops By providing a callback for when we need to bind the pages, and then release them again later, we can shorten the amount of time we hold the foreign pages mapped and pinned, and importantly the dmabuf objects then behave as any other normal object with respect to the shrinker and memory management. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:23:10 +02:00
Chris Wilson	9da3da660d	drm/i915: Replace the array of pages with a scatterlist Rather than have multiple data structures for describing our page layout in conjunction with the array of pages, we can migrate all users over to a scatterlist. One major advantage, other than unifying the page tracking structures, this offers is that we replace the vmalloc'ed array (which can be up to a megabyte in size) with a chain of individual pages which helps reduce memory pressure. The disadvantage is that we then do not have a simple array to iterate, or to access randomly. The common case for this is in the relocation processing, which will typically fit within a single scatterlist page and so be almost the same cost as the simple array. For iterating over the array, the extra function call could be optimised away, but in reality is an insignificant cost of either binding the pages, or performing the pwrite/pread. v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the trivial compile error from rebasing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:22:57 +02:00
Chris Wilson	f60d7f0c1d	drm/i915: Pin backing pages for pread By using the recently introduced pinning of pages, we can safely drop the mutex in the knowledge that the pages are not going to disappear beneath us, and so we can simplify the code for iterating over the pages. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:22:57 +02:00
Chris Wilson	755d22184f	drm/i915: Pin backing pages for pwrite By using the recently introduced pinning of pages, we can safely drop the mutex in the knowledge that the pages are not going to disappear jeneath us, and so we can simplify the code for iterating over the pages. Note: The old code had such complicated page refcounting since it used obj->pages as a micro-optimization if it's there, but that could (before this patch) disappear when we drop the dev->struct_mutex. Hence some manual page refcounting was required for the slow path, complicated by the fact that pages returned by shmem_read_mapping_page already have a pageref, which needs to be dropped again. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Added note to explain the question Ben raised in review.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:22:56 +02:00
Chris Wilson	a5570178c0	drm/i915: Pin backing pages whilst exporting through a dmabuf vmap We need to refcount our pages in order to prevent reaping them at inopportune times, such as when they currently vmapped or exported to another driver. However, we also wish to keep the lazy deallocation of our pages so we need to take a pin/unpinned approach rather than a simple refcount. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:22:56 +02:00
Chris Wilson	37e680a15f	drm/i915: Introduce drm_i915_gem_object_ops In order to specialise functions depending upon the type of object, we can attach vfuncs to each object via a new ->ops pointer. For instance, this will be used in future patches to only bind pages from a dma-buf for the duration that the object is used by the GPU - and so prevent them from pinning those pages for the entire of the object. v2: Bonus comments. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-20 14:22:55 +02:00
Chris Wilson	7e81a42e34	drm/i915: Reduce a pin-leak BUG into a WARN Pin-leaks persist and we get the perennial bug reports of machine lockups to the BUG_ON(pin_count==MAX). If we instead loudly report that the object cannot be pinned at that time it should prevent the driver from locking up, and hopefully restore a semblance of working whilst still leaving us a OOPS to debug. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-09-17 10:12:57 +02:00
Dave Airlie	65983bd605	Merge branch 'for-airlied' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Daniel writes: "New stuff for -next. Highlights: - prep patches for the modeset rework. Note that one of those patches touches the fb helper in the common drm code. - hasw hdmi audio support (Wang Xingchao) - improved instdone dumping for gen7 (Ben) - unbound tracking and a few follow-up patches from Chris - dma_buf->begin/end_cpu_access plus fix for drm/udl (Dave) - improve mmio error reporting for hsw - prep patch for WQ_NON_REENTRANT removal (Tejun Heo) " * 'for-airlied' of git://people.freedesktop.org/~danvet/drm-intel: (41 commits) drm/i915: Remove __GFP_NO_KSWAPD drm/i915: disable rc6 on ilk when vt-d is enabled drm/i915: Avoid unbinding due to an interrupted pin_and_fence during execbuffer drm/i915: Use new INSTDONE registers (Gen7+) drm/i915: Add new INSTDONE registers drm/i915: Extract reading INSTDONE drm/i915: Use a non-blocking wait for set-to-domain ioctl drm/i915: Juggle code order to ease flow of the next patch drm/i915: Use cpu relocations if the object is in the GTT but not mappable drm/i915: Extract general object init routine drm/i915: Protect private gem objects from truncate (such as imported dmabuf) drm/i915: Only pwrite through the GTT if there is space in the aperture i915: use alloc_ordered_workqueue() instead of explicit UNBOUND w/ max_active = 1 drm/i915: Find unclaimed MMIO writes. drm/i915: Add ERR_INT to gen7 error state drm/i915: Cantiga+ cannot handle a hsync front porch of 0 drm/i915: fix reassignment of variable "intel_dp->DP" drm/i915: Try harder to allocate an mmap_offset drm/i915: Show pin count in debugfs drm/i915: Show (count, size) of purgeable objects in i915_gem_objects ...	2012-09-03 12:05:01 +10:00
Sedat Dilek	d7c3b937bd	drm/i915: Remove __GFP_NO_KSWAPD When I pulled-in today's drm-intel-next into linux-next (next-20120824) I saw this build-breakage: drivers/gpu/drm/i915/i915_gem.c: In function 'i915_gem_object_get_pages_gtt': drivers/gpu/drm/i915/i915_gem.c:1778:40: error: '__GFP_NO_KSWAPD' undeclared (first use in this function) drivers/gpu/drm/i915/i915_gem.c:1778:40: note: each undeclared identifier is reported only once for each function it appears in This is caused by commit ba099ef165f8 ("mm: remove __GFP_NO_KSWAPD") and commit b6beae2c2014 ("mm: remove __GFP_NO_KSWAPD fixes") in linux-next (next-20120824). Fix this by removing __GFP_NO_KSWAPD from drm/i915 driver. Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-27 17:11:38 +02:00
Dave Airlie	93bb70e0c0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next There was some merge conflicts in -next and they weren't so pretty, so backmerge now to avoid them. Conflicts: drivers/gpu/drm/i915/i915_gem.c drivers/gpu/drm/i915/intel_modes.c	2012-08-27 16:22:20 +10:00
Chris Wilson	3236f57a01	drm/i915: Use a non-blocking wait for set-to-domain ioctl The principal use for set-to-domain is for userspace to serialise operations with a particular buffer, for example to maintain coherency with a CPU map or to ratelimit its rendering by waiting on all previous operations before continuing. As such we tend to hold the struct_mutex for long periods during the synchronisation and so cause contention issues with other users of the graphics device, even for independent operations as memory management. An example is the contention between compiz and X which causes jitter in the display and a drop in peak throughput. The ultimate solution would be a set of fine grained locks and lockless operations, but an intermediate step is to first attempt the synchronisation for set-to-domain without holding the mutex. This introduces a number of race conditions, so we limit it use to the ioctl periphery where we have no dependent state and can safely complete with a locked synchronisation afterwards. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-24 11:12:56 +02:00
Chris Wilson	b361237bcc	drm/i915: Juggle code order to ease flow of the next patch Move the wait-for-rendering logic around in the file so that we can group it together with the subsequent variations. The general goal is to have the lower level routines clustered together and then the higher level logic building upon those low level routines that came before. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-24 11:12:53 +02:00
Chris Wilson	0327d6ba99	drm/i915: Extract general object init routine As we wish to create specialised object constructions in the near future that share the same basic GEM object struct, export the default initializer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-24 02:04:38 +02:00
Chris Wilson	4d6294bf77	drm/i915: Protect private gem objects from truncate (such as imported dmabuf) If the object has no backing shmemfs filp, then we obviously cannot perform a truncation operation upon it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-24 02:04:31 +02:00
Chris Wilson	86a1ee26bb	drm/i915: Only pwrite through the GTT if there is space in the aperture Avoid stalling and waiting for the GPU by checking to see if there is sufficient inactive space in the aperture for us to bind the buffer prior to writing through the GTT. If there is inadequate space we will have to stall waiting for the GPU, and incur overheads moving objects about. Instead, only incur the clflush overhead on the target object by writing through shmem. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-24 02:03:33 +02:00
Chris Wilson	d8cb508669	drm/i915: Try harder to allocate an mmap_offset Given the persistence of an offset for the lifetime of an object, itis easy to contemplate how the mmap space becomes badly fragmented to the point that further allocations fail with ENOSPC. Our only recourse at this point is to try to purge the objects to release some space and reattempt the allocation. References: https://bugs.freedesktop.org/show_bug.cgi?id=39552 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-21 14:34:36 +02:00
Chris Wilson	c4670ad080	drm/i915: Add some sanity checks to unbound tracking A pair of universally true checks that just need to be put in the right place depending on where in the patch sequence you go. Note that i915_gem_object_put_pages_gtt() already gains the BUG_ON(obj->gtt_space), but on reflection that needed to migrate to put_pages(). Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-21 14:34:20 +02:00
Chris Wilson	6c085a728c	drm/i915: Track unbound pages When dealing with a working set larger than the GATT, or even the mappable aperture when touching through the GTT, we end up with evicting objects only to rebind them at a new offset again later. Moving an object into and out of the GTT requires clflushing the pages, thus causing a double-clflush penalty for rebinding. To avoid having to clflush on rebinding, we can track the pages as they are evicted from the GTT and only relinquish those pages on memory pressure. As usual, if it were not for the handling of out-of-memory condition and having to manually shrink our own bo caches, it would be a net reduction of code. Alas. Note: The patch also contains a few changes to the last-hope evict_everything logic in i916_gem_execbuffer.c - we no longer try to only evict the purgeable stuff in a first try (since that's superflous and only helps in OOM corner-cases, not fragmented-gtt trashing situations). Also, the extraction of the get_pages retry loop from bind_to_gtt (and other callsites) to get_pages should imo have been a separate patch. v2: Ditch the newly added put_pages (for unbound objects only) in i915_gem_reset. A quick irc discussion hasn't revealed any important reason for this, so if we need this, I'd like to have a git blame'able explanation for it. v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Split out code movements and rant a bit in the commit message with a few Notes. Done v2] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-21 14:34:11 +02:00
Daniel Vetter	225067eedf	drm/i915: move functions around Prep work to make Chris Wilson's unbound tracking patch a bit easier to read. Alas, I'd have preferred that moving the page allocation retry loop from bind to get_pages would have been a separate patch, too. But that looks like real work ;-) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-20 10:59:41 +02:00
Ben Widawsky	b6c7488df6	drm/i915/contexts: fix list corruption After reset we unconditionally reinitialize lists. If the context switch hasn't yet completed before the suspend, the default context object will end up on lists that are going to go away when we resume. The patch forces the context switch to be synchronous before suspend assuring that the active/inactive tracking is correct at the time of resume. References: https://bugs.freedesktop.org/show_bug.cgi?id=52429 Tested-by: Guang A Yang <guang.a.yang@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-17 09:21:34 +02:00
Chris Wilson	b2eadbc85b	drm/i915: Lazily apply the SNB+ seqno w/a Avoid the forcewake overhead when simply retiring requests, as often the last seen seqno is good enough to satisfy the retirment process and will be promptly re-run in any case. Only ensure that we force the coherent seqno read when we are explicitly waiting upon a completion event to be sure that none go missing, and also for when we are reporting seqno values in case of error or debugging. This greatly reduces the load for userspace using the busy-ioctl to track active buffers, for instance halving the CPU used by X in pushing the pixels from a software render (flash). The effect will be even more magnified with userptr and so providing a zero-copy upload path in that instance, or in similar instances where X is simply compositing DRI buffers. v2: Reverse the polarity of the tachyon stream. Daniel suggested that 'force' was too generic for the parameter name and that 'lazy_coherency' better encapsulated the semantics of it being an optimization and its purpose. Also notice that gen6_get_seqno() is only used by gen6/7 chipsets and so the test for IS_GEN6 \|\| IS_GEN7 is redundant in that function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-08-10 11:11:32 +02:00
Chris Wilson	e6994aeedc	drm/i915: Export ability of changing cache levels to userspace By selecting the cache level (essentially whether or not the CPU snoops any updates to the bo, and on more recent machines whether it resides inside the CPU's last-level-cache) a userspace driver is able to then manage all of its memory within buffer objects, if it so desires. This enables the userspace driver to accelerate uploads and more importantly downloads from the GPU and to able to mix CPU and GPU rendering/activity efficiently. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Added code comment about where we plan to stuff platform specific cacheing control bits in the ioctl struct.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-26 12:56:25 +02:00
Chris Wilson	42d6ab4839	drm/i915: Segregate memory domains in the GTT using coloring Several functions of the GPU have the restriction that differing memory domains cannot be placed next to each other (as the GPU may prefetch beyond the end of one domain and hang as it crosses into the other domain). We use the facility of the drm_mm to mark ranges with a particular color that corresponds to the cache attributes of those pages in order to prevent allocating adjacent blocks of differing memory types. v2: Rebase ontop of drm_mm coloring v2. v3: Fix rebinding existing gtt_space and add a verification routine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-26 12:56:25 +02:00
Chris Wilson	f047e395dd	drm/i915: Avoid concurrent access when marking the device as idle/busy As suggested by Daniel, rip out the independent timers for device and crtc busyness and integrate the manual powermanagement of the display engine into the GEM core and its request tracking. The benefits are that the code is a lot smaller, fewer moving parts and should fit more neatly into the overall activity tracking of the driver. v2: Complete overhaul and removal of the racy timers and workers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:56 +02:00
Chris Wilson	a7b9761d0a	drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs By moving the function to intel_ringbuffer and currying the appropriate parameter, hopefully we make the callsites easier to read and understand. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:55 +02:00
Chris Wilson	26b9c4a57f	drm/i915: Remove the explicit flush of the GPU write domain Rely instead on the insertion of the implicit flush before the seqno breadcrumb. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:54 +02:00
Chris Wilson	86d5bc3782	drm/i915: Remove explicit flush from i915_gem_object_flush_fence() As the flush is either performed explictly immediately after the execbuffer dispatch, or before the serialisation of last_fenced_seqno we can forgo the explict i915_gem_flush_ring(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:53 +02:00
Chris Wilson	69c2fc8913	drm/i915: Remove the per-ring write list This is now handled by a global flag to ensure we emit a flush before the next serialisation point (if we failed to queue one previously). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:53 +02:00
Chris Wilson	65ce302741	drm/i915: Remove the defunct flushing list As we guarantee to emit a flush before emitting the breadcrumb or the next batchbuffer, there is no further need for the flushing list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:52 +02:00
Chris Wilson	0201f1ecf4	drm/i915: Replace the pending_gpu_write flag with an explicit seqno As we always flush the GPU cache prior to emitting the breadcrumb, we no longer have to worry about the deferred flush causing the pending_gpu_write to be delayed. So we can instead utilize the known last_write_seqno to hopefully minimise the wait times. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:52 +02:00
Chris Wilson	e5f1d962a8	drm/i915: Remove assertion over write domain after i915_gem_object_sync() As we move to lazily clearing the GPU write domain only when the buffer becomes inactive, this leaves a window of opportunity for i915_gem_object_pin_to_display_plane() to detect a seemingly inconsistent value. This function is special as it tries to pipeline the operation to avoid the stall and so may not retires the buffer and we may not get the opportunity to clear the write domain. However, we know all is good, so drop the assertion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:51 +02:00
Chris Wilson	3bb73aba1e	drm/i915: Allow late allocation of request for i915_add_request() Request preallocation was added to i915_add_request() in order to support the overlay. However, not all users care and can quite happily ignore the failure to allocate the request as they will simply repeat the request in the future. By pushing the allocation down into i915_add_request(), we can then remove some rather ugly error handling in the callers. v2: Nullify request->file_priv otherwise we chase a garbage pointer when retiring requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:51 +02:00
Chris Wilson	e9808edd98	drm/i915: Return a mask of the active rings in the high word of busy_ioctl The intention is to help select which engine to use for copies with interoperating clients - such as a GL client making a request to the X server to perform a SwapBuffers, which may require copying from the active GL back buffer to the X front buffer. We choose to report a mask of the active rings to future proof the interface against any changes which may allow for the object to reside upon multiple rings. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: bikeshed away the write ring mask and add the explanation Chris sent in a follow-up mail why we decided to use masks.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 18:23:50 +02:00
Chris Wilson	eeef9b3874	drm/i915: Add -EIO to the list of known errors for __wait_seqno This prevents a WARN introduced with commit `de2b998552` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jul 4 22:52:50 2012 +0200 drm/i915: don't return a spurious -EIO from intel_ring_begin Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-25 10:39:57 +02:00
Chris Wilson	67b1b57182	drm/i915: Disable the BLT on pre-production SNB hardware It never quite worked despite the numerous workarounds, yet I still see people trying to use this hardware and filing bug reports. As we no longer even try to implement the workarounds, since `6a233c7887` (drm/i915/ringbuffer: kill snb blt workaround), simply disable the ring. v2: Add a message to inform the user about the limited capabilities of their pre-production hardware. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-20 12:21:37 +02:00
Chris Wilson	6b9d89b436	drm: Add colouring to the range allocator In order to support snoopable memory on non-LLC architectures (so that we can bind vgem objects into the i915 GATT for example), we have to avoid the prefetcher on the GPU from crossing memory domains and so prevent allocation of a snoopable PTE immediately following an uncached PTE. To do that, we need to extend the range allocator with support for tracking and segregating different node colours. This will be used by i915 to segregate memory domains within the GTT. v2: Now with more drm_mm helpers and less driver interference. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Airlie <airlied@redhat.com Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@gmail.com>	2012-07-16 05:59:37 +10:00
Daniel Vetter	a9340ccab5	drm/i915: properly SIGBUS on I/O errors ... instead of looping endless with no hope of ever serving that page-fault. We only need to break out of this loop when the gpu died, to run the reset work (and hopefully resurrect it). To clarify questions Chris raised on irc: This is about handling I/O errors not from our own code, but e.g. when the disk died when trying to swap in a gem bo. So this patch remidies the issue that the current handling only handles gpu-death-induced cases of -EIO. Admittedly, dying disks are much rarer than hanging gpus ...To clarify questions Chris raised on irc: This is about handling I/O errors not from our own code, but e.g. when the disk died when trying to swap in a gem bo. So this patch remidies the issue that the current handling only handles gpu-death-induced cases of -EIO. Admittedly, dying disks are much rarer than hanging gpus ... This seems to have been lost in: commit `d9bc7e9f32` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Feb 7 13:09:31 2011 +0000 drm/i915: Fix infinite loop regression from `21dd3734` Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-05 10:03:01 +02:00
Daniel Vetter	0a6759c6ba	drm/i915: don't hang userspace when the gpu reset is stuck With the gpu reset no longer using a trylock we've increased the chances of userspace getting stuck quite a bit. To make that (hopefully) rare case more paletable time out when waiting for the gpu reset code to complete and signal this little issue to the caller by returning -EIO. This should help userspace to somewhat gracefully fall back and hopefully allow the user to grab some logs and reboot the machine (instead of staring at a frozen X screen in agony). Suggested by Chris Wilson because I've been stubborn about allowing the gpu reset code no to fail, ever (by removing the trylock). Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-05 10:02:24 +02:00
Daniel Vetter	d6b2c790a4	drm/i915: non-interruptible sleeps can't handle -EAGAIN So don't return -EAGAIN, even in the case of a gpu hang. Remap it to -EIO instead. Note that this isn't really an issue with interruptability, but more that we have quite a few codepaths (mostly around kms stuff) that simply can't handle any errors and hence not even -EAGAIN. Instead of adding proper failure paths so that we could restart these ioctls we've opted for the cheap way out of sleeping non-interruptibly. Which works everywhere but when the gpu dies, which this patch fixes. So essentially interruptible == false means 'wait for the gpu or die trying'.' This patch is a bit ugly because intel_ring_begin is all non-interruptible and hence only returns -EIO. But as the comment in there says, auditing all the callsites would be a pain. To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno and intel_wait_ring_buffer. Also use the opportunity to clarify the different cases in i915_gem_check_wedge a bit with comments. v2: Don't access dev_priv->mm.interruptible from check_wedge - we might not hold dev->struct_mutex, making this racy. Instead pass interruptible in as a parameter. I've noticed this because I've hit a BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been added in commit `b4aca0106c` Author: Ben Widawsky <ben@bwidawsk.net> Date: Wed Apr 25 20:50:12 2012 -0700 drm/i915: extract some common olr+wedge code although that commit is missing any justification for this. I guess it's just copy&paste, because the same commit add the same BUG_ON check to check_olr, where it indeed makes sense. But in check_wedge everything we access is protected by other means, so this is superflous. And because it now gets in the way (we add a new caller in __wait_seqno, which can be called without dev->struct_mutext) let's just remove it. v3: Group all the i915_gem_check_wedge refactoring into this patch, so that this patch here is all about not returning -EAGAIN to callsites that can't handle syscall restarting. v4: Add clarification what interuptible == fales means in our code, requested by Ben Widawsky. v5: Fix EAGAIN mispell noticed by Chris Wilson. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-07-05 10:01:14 +02:00
Daniel Vetter	cc889e0f6c	drm/i915: disable flushing_list/gpu_write_list This is just the minimal patch to disable all this code so that we can do decent amounts of QA before we rip it all out. The complicating thing is that we need to flush the gpu caches after the batchbuffer is emitted. Which is past the point of no return where execbuffer can't fail any more (otherwise we risk submitting the same batch multiple times). Hence we need to add a flag to track whether any caches associated with that ring are dirty. And emit the flush in add_request if that's the case. Note that this has a quite a few behaviour changes: - Caches get flushed/invalidated unconditionally. - Invalidation now happens after potential inter-ring sync. I've bantered around a bit with Chris on irc whether this fixes anything, and it might or might not. The only thing clear is that with these changes it's much easier to reason about correctness. Also rip out a lone get_next_request_seqno in the execbuffer retire_commands function. I've dug around and I couldn't figure out why that is still there, with the outstanding lazy request stuff it shouldn't be necessary. v2: Chris Wilson complained that I also invalidate the read caches when flushing after a batchbuffer. Now optimized. v3: Added some comments to explain the new flushing behaviour. Cc: Eric Anholt <eric@anholt.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-20 13:54:28 +02:00
Ben Widawsky	f2ef6eb145	drm/i915: switch to default context on idle To keep things as sane as possible, switch to the default context before idling. This should help free context objects, as well as put things in a more well defined state before suspending. v2: remove seqno from context switch call (daniel) return error on failed context switch instead of WARN+continue (daniel) v3: move idling to i915_gpu idle (from i915_gem_idle) (Chris) Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2012-06-14 17:36:20 +02:00
Ben Widawsky	254f965c39	drm/i915: preliminary context support Very basic code for context setup/destruction in the driver. Adds the file i915_gem_context.c This file implements HW context support. On gen5+ a HW context consists of an opaque GPU object which is referenced at times of context saves and restores. With RC6 enabled, the context is also referenced as the GPU enters and exists from RC6 (GPU has it's own internal power context, except on gen5). Though something like a context does exist for the media ring, the code only supports contexts for the render ring. In software, there is a distinction between contexts created by the user, and the default HW context. The default HW context is used by GPU clients that do not request setup of their own hardware context. The default context's state is never restored to help prevent programming errors. This would happen if a client ran and piggy-backed off another clients GPU state. The default context only exists to give the GPU some offset to load as the current to invoke a save of the context we actually care about. In fact, the code could likely be constructed, albeit in a more complicated fashion, to never use the default context, though that limits the driver's ability to swap out, and/or destroy other contexts. All other contexts are created as a request by the GPU client. These contexts store GPU state, and thus allow GPU clients to not re-emit state (and potentially query certain state) at any time. The kernel driver makes certain that the appropriate commands are inserted. There are 4 entry points into the contexts, init, fini, open, close. The names are self-explanatory except that init can be called during reset, and also during pm thaw/resume. As we expect our context to be preserved across these events, we do not reinitialize in this case. As Adam Jackson pointed out, The cutoff of 1MB where a HW context is considered too big is arbitrary. The reason for this is even though context sizes are increasing with every generation, they have yet to eclipse even 32k. If we somehow read back way more than that, it probably means BIOS has done something strange, or we're running on a platform that wasn't designed for this. v2: rename load/unload to init/fini (daniel) remove ILK support for get_size() (indirectly daniel) add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel) added comments (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net>	2012-06-14 17:36:16 +02:00
Daniel Vetter	8ecd1a6615	drm/i915: call intel_enable_gtt When drm/i915 is in control of the gtt, we need to call the enable function at all the relevant places ourselves. Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-12 22:21:07 +02:00
Daniel Vetter	dd2757f8b5	drm/i915: stop using dev->agp->base For that to work we need to export the base address of the gtt mmio window from intel-gtt. Also replace all other uses of dev->agp by values we already have at hand. Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-12 22:18:06 +02:00
Ben Widawsky	eac1f14fd1	drm/i915: Inifite timeout for wait ioctl Change the ns_timeout parameter of the wait ioctl to a signed value. Doing this allows the kernel to provide an infinite wait when a timeout of less than 0 is provided. This mimics select/poll. Initially the parameter was meant to match up with the GL spec 1:1, but after being made aware of how much 2^64 - 1 nanoseconds actually is, I do not think anyone will ever notice the loss of 1 bit. The infinite timeout on waiting is similar to the existing i915 userspace interface with the exception that struct_mutex is dropped while doing the wait in this ioctl. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-06 12:25:46 +02:00
Daniel Vetter	30dfebf34b	drm/i915: extract object active state flushing code Both busy_ioctl and the new wait_ioct need to do the same dance (or at least should). Some slight changes: - busy_ioctl now unconditionally checks for olr. Before emitting a require flush would have prevent the olr check and hence required a second call to the busy ioctl to really emit the request. - the timeout wait now also retires request. Not really required for abi-reasons, but makes a notch more sense imo. I've tested this by pimping the i-g-t test some more and also checking the polling behviour of the wait_rendering_timeout ioctl versus what busy_ioctl returns. v2: Too many people complained about unplug, new color is flush_active. v3: Kill the comment about the unplug moniker. v4: s/un-active/inactive/ Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-02 20:51:03 +02:00
Daniel Vetter	e269f90f3d	Merge remote-tracking branch 'airlied/drm-prime-vmap' into drm-intel-next-queued We need the latest dma-buf code from Dave Airlie so that we can pimp the backing storage handling code in drm/i915 with Chris Wilson's unbound tracking and stolen mem backed gem object code. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-06-01 10:52:54 +02:00
Ben Widawsky	b9524a1e1c	drm/i915: remap l3 on hw init If any l3 rows have been previously remapped, we must remap them after GPU reset/resume too. v2: Just return (no warn) on remapping init if not IVB (Jesse) Move the check of schizo userspace to i915_gem_l3_remap (Jesse) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-31 12:11:29 +02:00
Dave Airlie	a21f976094	Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: tune down the noise of the RP irq limit fail drm/i915: Remove the error message for unbinding pinned buffers drm/i915: Limit page allocations to lowmem (dma32) for i965 drm/i915: always use RPNSWREQ for turbo change requests drm/i915: reject doubleclocked cea modes on dp drm/i915: Adding TV Out Missing modes. drm/i915: wait for a vblank to pass after tv detect drm/i915: no lvds quirk for HP t5740e Thin Client drm/i915: enable vdd when switching off the eDP panel drm/i915: Fix PCH PLL assertions to not assume CRTC:PLL relationship drm/i915: Always update RPS interrupts thresholds along with frequency drm/i915: properly handle interlaced bit for sdvo dtd conversion drm/i915: fix module unload since error_state rework drm/i915: be more careful when returning -ENXIO in gmbus transfer	2012-05-29 11:09:06 +01:00
Ben Widawsky	199b2bc25b	drm/i915: s/i915_wait_request/i915_wait_seqno/g Wait request is poorly named IMO. After working with these functions for some time, I feel it's much clearer to name the functions more appropriately. Of course we must update the callers to use the new name as well. This leaves room within our namespace for a real wait request function at some point. Note to maintainer: this patch is optional. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-25 14:18:42 +02:00
Ben Widawsky	23ba4fd0a4	drm/i915: wait render timeout ioctl This helps implement GL_ARB_sync but stops short of allowing full blown sync objects. Finally we can use the new timed seqno waiting function to allow userspace to wait on a buffer object with a timeout. This implements that interface. The IOCTL will take as input a buffer object handle, and a timeout in nanoseconds (flags is currently optional but will likely be used for permutations of flush operations). Users may specify 0 nanoseconds to instantly check. The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any non-zero timeout parameter the wait ioctl will wait for the given number of nanoseconds on an object becoming unbusy. Since the wait itself does so holding struct_mutex the object may become re-busied before this completes. A similar but shorter race condition exists in the busy ioctl. v2: ETIME/ERESTARTSYS instead of changing to EBUSY, and EGAIN (Chris) Flush the object from the gpu write domain (Chris + Daniel) Fix leaked refcount in good case (Chris) Naturally align ioctl struct (Chris) v3: Drop lock after getting seqno to avoid ugly dance (Chris) v4: check for 0 timeout after olr check to allow polling (Chris) v5: Updated the comment. (Chris) v6: Return -ETIME instead of -EBUSY when timeout_ns is 0 (Daniel) Fix the commit message comment to be less ugly (Ben) Add a warning to check the return timespec (Ben) v7: Use DRM_AUTH for the ioctl. (Eugeni) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-25 14:15:46 +02:00
Chris Wilson	31d8d651eb	drm/i915: Remove the error message for unbinding pinned buffers This is now used intentionally to prevent proliferation of is-pinned checks upon the inactive list following: commit `1b50247a8d` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Apr 24 15:47:30 2012 +0100 drm/i915: Remove the list of pinned inactive objects Reported-and-tested-by: guang.a.yang@intel.com Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50075 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-25 10:10:40 +02:00
Chris Wilson	bed1ea95a3	drm/i915: Limit page allocations to lowmem (dma32) for i965 Broadwater and Crestline share a limitation that prevent it from relocating general surface state above 4GiB. The only recourse we have since any buffer object may be used as a relocation target is then to limit all object allocations on 965g[m] to DMA32. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-25 10:07:06 +02:00
Ben Widawsky	5c81fe85da	drm/i915: timeout parameter for seqno wait Insert a wait parameter in the code so we can possibly timeout on a seqno wait if need be. The code should be functionally the same as before because all the callers will continue to retry if an arbitrary timeout elapses. We'd like to have nanosecond granularity, but the only way to do this is with hrtimer, and that doesn't fit well with the needs of this code. v2: Fix rebase error (Chris) Return proper time even in wedged + signal case (Chris + Ben) Use timespec constructs (Ben) Didn't take Daniel's advice regarding the Frankenstein-ness of the function. I did try his advice, but in the end I liked the way the original code looked, better. v3: Make wakeups far less frequent for infinite waits (Chris) v4: Remove dummy_wait variable (Daniel) Use raw monotonic time instead of jiffies (made the code a bit cleaner) (Ben) Added a couple of warnings (Ben) v5: Remove warnings (Daniel) Use more accurate time diff for default case (Daniel) Bikeshed for setting the return timespec in timeout case (Daniel) s/jiffies/time in one of the comments Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-25 09:55:08 +02:00
Daniel Vetter	1286ff7397	i915: add dmabuf/prime buffer sharing support. This adds handle->fd and fd->handle support to i915, this is to allow for offloading of rendering in one direction and outputs in the other. v2 from Daniel Vetter: - fixup conflicts with the prepare/finish gtt prep work. - implement ppgtt binding support. Note that we have squat i-g-t testcoverage for any of the lifetime and access rules dma_buf/prime support brings along. And there are quite a few intricate situations here. Also note that the integration with the existing code is a bit hackish, especially around get_gtt_pages and put_gtt_pages. It imo would be easier with the prep code from Chris Wilson's unbound series, but that is for 3.6. Also note that I didn't bother to put the new prepare/finish gtt hooks to good use by moving the dma_buf_map/unmap_attachment calls in there (like we've originally planned for). Last but not least this patch is only compile-tested, but I've changed very little compared to Dave Airlie's version. So there's a decent chance v2 on drm-next works as well as v1 on 3.4-rc. v3: Right when I've hit sent I've noticed that I've screwed up one obj->sg_list (for dmar support) and obj->sg_table (for prime support) disdinction. We should be able to merge these 2 paths, but that's material for another patch. v4: fix the error reporting bugs pointed out by ickle. v5: fix another error, and stop non-gtt mmaps on shared objects stop pread/pwrite on imported objects, add fake kmap Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-23 10:47:10 +01:00
Chris Wilson	b4519513e8	drm/i915: Introduce for_each_ring() macro In many places we wish to iterate over the rings associated with the GPU, so refactor them to use a common macro. Along the way, there are a few code removals that should be side-effect free and some rearrangement which should only have a cosmetic impact, such as error-state. Note that this slightly changes the semantics in the hangcheck code: We now always cycle through all enabled rings instead of short-circuiting the logic. v2: Pull in a couple of suggestions from Ben and Daniel for intel_ring_initialized() and not removing the warning (just moving them to a new home, closer to the error). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Added note to commit message about the small behaviour change, suggested by Ben Widawsky.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-19 22:39:53 +02:00
Daniel Vetter	5e13a0c5ec	Merge remote-tracking branch 'airlied/drm-core-next' into drm-intel-next-queued Backmerge of drm-next to resolve a few ugly conflicts and to get a few fixes from 3.4-rc6 (which drm-next has already merged). Note that this merge also restricts the stencil cache lra evict policy workaround to snb (as it should) - I had to frob the code anyway because the CM0_MASK_SHIFT define died in the masked bit cleanups. We need the backmerge to get Paulo Zanoni's infoframe regression fix for gm45 - further bugfixes from him touch the same area and would needlessly conflict. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-08 13:39:59 +02:00
Daniel Vetter	dc257cf154	Linux 3.4-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJPpvY9AAoJEHm+PkMAQRiGpEoIAJgbu+Y8gITnBK/wh9O6zy3S 5jie5KK4YWdbJsvO58WbNr3CyVIwGIqQ2dUZLiU59aBVLarlGw8xor0MmW+cZwhp 6fBHaf0qDYAV0MZjD+mnnExOiCRyISa2lPmsfu9dAWywh5KGe6/oAP6/qcXIyok3 KZyl3qQf4ENpaZPHwZPXCEkUvtuyHgNiszN+QXEadA3s19Ot4VGe9A3VGw+GNrSm JqFIq3acQAbKa5BYaqf7TQC02v2FI7//eqt6QHxTqbE6a7LGbTvLfX3HlJ2mnfqa 1R6QHhM4y4OZDHbaMT2raHZ8WuLXzhehJzhP8Co7AHFOKwVKOb5XbcUr2RrukMU= =HkMd -----END PGP SIGNATURE----- Merge tag 'v3.4-rc6' into drm-intel-next Conflicts: drivers/gpu/drm/i915/intel_display.c Ok, this is a fun story of git totally messing things up. There /shouldn't/ be any conflict in here, because the fixes in -rc6 do only touch functions that have not been changed in -next. The offending commits in drm-next are 14415745b2..1fa611065 which simply move a few functions from intel_display.c to intel_pm.c. The problem seems to be that git diff gets completely confused: $ git diff 14415745b2..1fa611065 is a nice mess in intel_display.c, and the diff leaks into totally unrelated functions, whereas $git diff --minimal 14415745b2..1fa611065 is exactly what we want. Unfortunately there seems to be no way to teach similar smarts to the merge diff and conflict generation code, because with the minimal diff there really shouldn't be any conflicts. For added hilarity, every time something in that area changes the + and - lines in the diff move around like crazy, again resulting in new conflicts. So I fear this mess will stay with us for a little longer (and might result in another backmerge down the road). Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-07 14:02:14 +02:00
Ben Widawsky	b4aca0106c	drm/i915: extract some common olr+wedge code The new wait_rendering ioctl also needs to check for an oustanding lazy request, and we already duplicate that logic at three places. So extract it. While at it, also extract the code to check the gpu wedging state to improve code flow. v2: Don't use seqno as an outparam (Chris) v3 by danvet: Kill stale comment and pimp commit message Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:32 +02:00
Daniel Vetter	53ca26cab8	drm/i915 disallow physical batchbuffers for KMS Even the horrible gen3 XvMC code has learned to do this right by the time xf86-video-intel releases learned to do kernel modesetting. So we can just disallow this. Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:25 +02:00
Daniel Vetter	8781342df7	drm/i915: create dev_priv->dri1 dragon dungeon^W^W sub-struct ... and shove allow_batchbuffer in there. More dragons will follow suit. There's the curious case that we allow this for KMS ... Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:25 +02:00
Ben Widawsky	3b88cc0dd7	drm/i915: use __wait_seqno for ring throttle It turns out throttle had an almost identical bit of code to do the wait. Now we can call the new helper directly. This is just a bonus, and not needed for the overall series. v2: remove irq_get/put which is now in __wait_seqno (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:23 +02:00
Ben Widawsky	4146b08d76	drm/i915: remove polled wait from throttle It's about to go away anyway. Just here to help bisection. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:22 +02:00
Ben Widawsky	604dd3ec75	drm/i915: extract __wait_seqno from i915_wait_request i915_wait_request is actually a fairly large function encapsulating quite a few different operations. Because being able to wait on seqnos in various conditions is useful, extracting that bit of code to a helper function seems useful v2: pull the irq_get/put as well (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:22 +02:00
Ben Widawsky	c58cf4f108	drm/i915: drop polled waits from i915_wait_request The only time irq_get should fail is during unload or suspend. Both of these points should try to quiesce the GPU before disabling interrupts and so the atomic polling should never occur. This was recommended by Chris Wilson as a way of reducing added complexity to the polled wait which I introduced in an RFC patch. 09:57 < ickle_> it's only there as a fudge for waiting after irqs after uninstalled during s&r, we aren't actually meant to hit it 09:57 < ickle_> so maybe we should just kill the code there and fix the breakage v2: return -ENODEV instead of -EBUSY when irq_get fails Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:22 +02:00
Ben Widawsky	9574b3fe29	drm/i915: kill waiting_seqno The waiting_seqno is not terribly useful, and as such we can remove it so that we'll be able to extract lockless code. v2: Keep the information for error_state (Chris) Check if ring is initialized in hangcheck (Chris) Capture the waiting ring (Chris) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: add some bikeshed to clarify a comment.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:21 +02:00
Ben Widawsky	be998e2e39	drm/i915: move vbetool invoked ier stuff This extra bit of interrupt enabling code doesn't belong in the wait seqno function. If anything we should pull it out to a helper so the throttle code can also use it. The history is a bit vague, but I am going to attempt to just dump it, unless someone can argue otherwise. Removing this allows for a shared lock free wait seqno function. To keep tabs on this issue though, the IER value is stored on error capture (recommended by Chris Wilson) v2: fixed typo EIR->IER (Ben) Fix some white space (Ben) Move IER capture to globally instead of per ring (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: ier is a 16 bit reg on gen2!] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:21 +02:00
Ben Widawsky	b2da9fe5d5	drm/i915: remove do_retire from i915_wait_request This originates from a hack by me to quickly fix a bug in an earlier patch where we needed control over whether or not waiting on a seqno actually did any retire list processing. Since the two operations aren't clearly related, we should pull the parameter out of the wait function, and make the caller responsible for retiring if the action is desired. The only function call site which did not get an explicit retire_request call (on purpose) is i915_gem_inactive_shrink(). That code was already calling retire_request a second time. v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:20 +02:00
Daniel Vetter	507432986c	drm/i915: use the new masked bit macro some more I've missed this one. v2: Chris Wilson noticed another register. v3: Color choice improvements. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:20 +02:00
Daniel Vetter	63ed2cb2d1	drm/i915: rip out GEM drm feature checks We always set it so there's no point in checking. We could instead add a bit that tells us whether gem is actually initialized (i.e. either kms or gem_init_ioctl called), but that's imho not worth it. So just rip it out. There's a little change in the wait_ring timeout, but we've never run with anything else than the 60 second timeout, even on dri1 userspace. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:14 +02:00
Daniel Vetter	7bb6fb8dd9	drm/i915: disallow gem ums init ioctl for kms This ioctl used in a kms driver is only useful to create massive havoc. v2: Bail out with -ENODEV as suggested by Chris Wilson. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:13 +02:00
Chris Wilson	1070a42b6b	drm/i915: Move GEM initialisation from i915_dma.c to i915_gem.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:12 +02:00
Chris Wilson	1488fc08c1	drm/i915: Remove the deferred-free list The use of the mm_list by deferred-free breaks the following patches to extend the range of objects tracked. We can simplify things if we just make the unbind during free uninterrutible. Note that unbinding should never fail, because we hold an additional reference on every active object. Only the ilk vt-d workaround breaks this, but already takes care of not failing by waiting for the gpu to quiescent non-interruptible. But the existence of the deferred free list casted some doubts on this theory, hence WARN if the unbind fails and only then retry non-interruptible. We can kill this additional code after a release in case the theory is indeed right and no one has hit that WARN. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:11 +02:00
Chris Wilson	1b50247a8d	drm/i915: Remove the list of pinned inactive objects Simplify object tracking by removing the inactive but pinned list. The only place where this was used is for counting the available memory, which is just as easy performed by checking all objects on the rare occasions it is required (application startup). For ease of debugging, we keep the reporting of pinned objects through the error-state and debugfs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:11 +02:00
Chris Wilson	a39d7efc62	drm/i915: Remove i915_gem_evict_inactive() This was only used by one external caller who would just be as happy with evict-everything, so perform the replacement and make the function private. In the process we note that unbinding the inactive list should not fail, and make it a warning instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:10 +02:00
Chris Wilson	8325a09dd0	drm/i915: Bump the inactive LRU on set-to-GTT-domain Currently, we only bump the inactive LRU of an object when we bind into the GTT for a page-fault. As the object may be used many times before its mapping is zapped, we do not mark it as active as frequently as we should. Userspace should be calling set-to-GTT-domain before each pointer deference (for synchronous access) and so is a good place to mark the buffer as active. Marking the buffer as recently used places it at the end of the inactive eviction queue, though still before anything with outstanding rendering. This reduces the likelihood of evicting a buffer that is going to be used again by the CPU in the near future. This way we can hopefully avoid to kick out upload buffers right before we use them on the gpu. Note that we need to check that the object is not active or pinned, for otherwise we create havoc on the active/pinned lists, which also use obj->mm_list. The active lists are sorted by and evicted in last GPU rendering order, access by the CPU to a still active buffer therefore does not affect its eviction ordering. Pinned objects are currently excluded from eviction, therefore the only list that we need to bump for GTT access by the CPU is the inactive list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Added further explanations to the commit message as discussed on irc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:10 +02:00
Daniel Vetter	6b26c86d61	drm/i915: create macros to handle masked bits ... and put them to so good use. Note that there's functional change in vlv clock gating code, we now no longer spuriously read back the current value of the bit. According to Bspec the high bits should always read zero, so ORing this in should have no effect. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:08 +02:00
Chris Wilson	5d82e3e642	drm/i915: Clarify the semantics of tiling_changed Rename obj->tiling_changed to obj->fence_dirty so that it is clear that it flags when the parameters for an active fence (including the no-fence) register are changed. Also, do not set this flag when the object does not have a fence register allocated currently and the gpu does not depend upon the unfence. This case works exactly like when a tiled object lost its fence and hence does not need additional handling for the tiling change in the code. v2: Use fence_dirty to better express what the flag tracks and add a few more details to the comments to serve as a reminder of how the GPU also uses the unfenced register slot. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add some bikeshed to the commit message about the stricter use of fence_dirty.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:06 +02:00
Ben Widawsky	4f0c7cfbb4	drm/i915: [sparse] __iomem fixes for gem As with one of the earlier patches in the series, we're forced to cast for copy_[to\|from]_user. Again because of the nature of the GEN x86 exclusivity, this should be safe. Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> [danvet: Added some bikeshed.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-05-03 11:18:01 +02:00
Linus Torvalds	6be5ceb02e	VM: add "vm_mmap()" helper function This continues the theme started with vm_brk() and vm_munmap(): vm_mmap() does the same thing as do_mmap(), but additionally does the required VM locking. This uninlines (and rewrites it to be clearer) do_mmap(), which sadly duplicates it in mm/mmap.c and mm/nommu.c. But that way we don't have to export our internal do_mmap_pgoff() function. Some day we hopefully don't have to export do_mmap() either, if all modular users can become the simpler vm_mmap() instead. We're actually very close to that already, with the notable exception of the (broken) use in i810, and a couple of stragglers in binfmt_elf. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-04-20 17:29:13 -07:00
Chris Wilson	14415745b2	drm/i915: Refactor get_fence() to use the common fence writing routine We can also take advantage of the new 'no retire' mode for seqno waiting to avoid having to take a reference on the old fence object whilst flushing an existing fence. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:40:51 +02:00
Chris Wilson	ada726c734	drm/i915: Refactor fence clearing to use the common fence writing routine Now that we have a routine that is able to clear the fences as well as setup up the register for a tiled object, remove the surplus routines to clear the fences. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:34:53 +02:00
Chris Wilson	61050808bb	drm/i915: Refactor put_fence() to use the common fence writing routine One clarification that we make is to the existing semantics of obj->tiling_changed to only mean that we need to update an associated fence register (including the NO_FENCE when executing an untiled but fenced GPU command). If we do not have a fence register or pending fenced GPU access for the object (after put_fence() for example), then we can clear the tiling_changed flag as any fence will necessarily be rewritten upon acquisition. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:34:30 +02:00
Chris Wilson	9ce079e481	drm/i915: Prepare to consolidate fence writing Update the existing architecture specific fence writing routines to either update the fence to point to a tiled object or to clear them in preparation to remove the other fence writing routes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:24:32 +02:00
Chris Wilson	1899184547	drm/i915: Remove the unsightly "optimisation" from flush_fence() As i915_wait_request() will first check for an already passed seqno, doing it also in the caller is a waste of space for a cold path. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:23:17 +02:00
Chris Wilson	8fe301add5	drm/i915: Simplify fence finding As the fences are stored in LRU order, we can simply reuse the oldest if we do not have an unused register. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:20:35 +02:00
Chris Wilson	1c293ea3b1	drm/i915: Discard the unused obj->last_fenced_ring As we now never pipeline a fence update, obj->last_fenced_ring is always the same as the obj->ring whenever obj->last_fenced_seqno is active, so remove it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:19:51 +02:00
Chris Wilson	69963e7c76	drm/i915: Remove unused ring->setup_seqno As we now no longer track a pipelined fence change, we never use ring->setup_seqno and can kill it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:18:52 +02:00
Chris Wilson	a360bb1a83	drm/i915: Remove fence pipelining Step 2 is then to replace the pipelined parameter with NULL and perform constant folding to remove dead code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:18:25 +02:00
Chris Wilson	06d9813157	drm/i915: Remove the pipelined parameter from get_fence() We never succeeded in getting pipelined fencing to work (unresolved spurious GPU hangs), so begin the process of dismantling and removal the broken code. Step 1 is the removal of the pipeline parameter to get_fence(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-18 13:15:43 +02:00
Daniel Vetter	48ecfa1090	drm/i915: properly set ppgtt cacheability on snb For some reason snb has 2 fields to set ppgtt cacheability. This one here does not exist on gen7. This might explain why ppgtt wasn't a win on snb like on ivb - not enough pte caching. v2: Fixup rebase fail. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-17 11:19:59 +02:00
Daniel Vetter	be901a5a1b	drm/i915: set w/a bit for snb pagefaults Bspec says that we need to set this: vol1c.3 "Blitter Command Streamer", Section 1.1.2.1 "GAB_CTL_REG - GAB Unit Control Register". We don't really rely on pagefaults, but who knows what this all affects. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-17 11:19:56 +02:00
Daniel Vetter	767878908e	Linux 3.4-rc3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEbBAABAgAGBQJPi3XOAAoJEHm+PkMAQRiGnsUH9RjHwH4YFVyuP/DKtKa6zs74 wqkpT15yITQ5WWMog4JaJFFg5rJCUd8QZr7AS/HSn0ijDyZX5VU7Rcs9cMudDzNR H/5K/AscS4fjb0HwWVqoltTWHRb9QGSwVN3+E3VCDLt9P89YJ0o3QztkkuEX5dkZ jc7reVXTfRnCcILEa9jleOzrn+OLM3j/jAjQ2hGunl8EDLzD4b17HHPoli4jEZ/5 5ibpSVsPD+AqzN+glbXvYjVItl12D0IQos/JdOwfuZriCVWLxysSSwHZTbPCyvBZ LHH4HR5T+XLSXbjJeNkUFHLzqU+d5gVRadIoWtJCxqxFjKbOs2YtzJ5Ai0nDiw== =kTkC -----END PGP SIGNATURE----- Merge tag 'v3.4-rc3' into drm-intel-next-queued Backmerge Linux 3.4-rc3 into drm-intel-next to resolve a few things that conflict/depend upon patches in -rc3: - Second part of the Sandybridge workaround series - it changes some of the same registers. - Preparation for Chris Wilson's fencing cleanup - we need the fix from -rc3 merged before we can move around all that code. - Resolve the gmbus conflict - gmbus has been disabled in 3.4 again, but should be enabled on all generations in 3.5. Conflicts: drivers/gpu/drm/i915/intel_i2c.c Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-17 11:16:20 +02:00
Daniel Vetter	c07496fa61	drm/i915: don't pwrite tiled objects through the gtt ... we will botch up the bit17 swizzling. Furthermore tiled pwrite is a (now) unused slowpath, so no one really cares. This fixes the last swizzling issues I have with i-g-t on my bit17 swizzling i915G. No regression, it's been broken since the dawn of gem, but it's nice for regression tracking when really _all_ i-g-t tests work. Actually this is not true, Chris Wilson noticed while reviewing this patch that the commit commit `d9e86c0ee6` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 10 16:40:20 2010 +0000 drm/i915: Pipelined fencing [infrastructure] contained a functional change that broke things. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-15 19:37:42 +02:00
Ben Widawsky	1500f7ea06	drm/i915: hide (seqno-1) in ringbuffer code Waiting for seqno-1 in our object synchronization code is an implementation detail given how we've decided to do the waits within the rest of our code. Requested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:14 +02:00
Ben Widawsky	e3a5a2250a	drm/i915: fix for when semaphore updates fail This fixes a long standing issue where emitting the semaphore updates may have failed, but we've already updated our internal data structure. Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:13 +02:00
Ben Widawsky	5816d648d5	drm/i915: i915_gem_object_sync must handle NULL When I extracted the synchronization code for implementing semaphorified pageflips (74f5f6e0), I neglected the non pipelined case which also calls this code. The modesetting code wants to make sure the object has finished rendering to the frame before configuring the scanout (ie. non-pipelined case). As a result of a follow on discussion on IRC, I've decided to add a comment about the function itself which received much inspiration from Chris as well. So really, this patch was ghost-written by Chris :). Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:13 +02:00
Chris Wilson	f84131905b	drm/i915: Allow concurrent read access between CPU and GPU domain Similar to allowing a buffer to be simultaneously read by the GPU and through the GTT, we wish to allow readback of the pages through the CPU domain whilst they are also being read by the GPU. Domain coherency is managed by allowing multiple readers, but only a single writer. This is used by mesa for its program cache which it may search for every new program every frame and then renews should it need to add. During renewal, mesa copies the program bo currently executing through a CPU mapping onto the new bo. This patch allows the search and that copy to proceed without causing a stall on the current batch. Testcase: i-g-t/tests/gem_cpu_concurrent_blit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:10 +02:00
Ben Widawsky	2911a35b2e	drm/i915: use semaphores for the display plane In theory this will have performance and power improvements. Performance because we don't need to stall when the scanout BO is busy, and power because we don't have to stall when the BO is busy (and the ring can even go to sleep if the HW supports it). v2: squash 2 patches into 1 (me) un-inline the enable_semaphores function (Daniel) remove comment about SNB hangs from i915_gem_object_sync (Chris) rename intel_enable_semaphores to i915_semaphore_is_enabled (me) removed page flip comment; "no why" (Chris) To address other comments from Daniel (irc): update the comment to say 'vt-d is crap, don't enable semaphores' - I think you misinterpreted Chris' comment, it already exists. checking out whether we can pageflip on the render ring on ivb (didn't work on early silicon) - We don't want to enable workarounds for early silicon unless we have to. - I can't find any references in the docs about this. optionally use it if the fb is already busy on the render ring - This should be how the code already worked, unless I am misunderstanding your meaning. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:05 +02:00
Chris Wilson	9a5a53b392	drm/i915: Reorganise rules for get_fence/put_fence By simplifying the rules to calling get_fence when writing to the through the GTT in a tiled manner, and calling put_fence before writing to the object through the GTT in a linear manner, the code becomes clearer and there is less chance of making a mistake. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: fixed up conflict with ppgtt code and spelling in a new comment.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 21:14:04 +02:00
Dave Airlie	effbc4fd8e	Merge branch 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel into drm-core-next Daniel Vetter wrote First pull request for 3.5-next, slightly large than usual because new things kept coming in since the last pull for 3.4. Highlights: - first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci ids are not yet added, and there's still quite a few patches to merge (mostly modesetting). To make QA easier I've decided to merge this stuff in pieces. - loads of cleanups and prep patches spurred by the above. Especially vlv is a real frankenstein chip, but also hsw is stretching our driver's code design. Expect more to come in this area for 3.5. - more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again, there are more patches needed (and some already queued up), but I wanted to split this a bit for better testing. - pwrite/pread rework and retuning. This series has been in the works for a few months already and a lot of i-g-t tests have been created for it. Now it's finally ready to be merged. Note that one patch in this series touches include/pagemap.h, that patch is acked-by akpm. - reduce mappable pressure and relocation throughput improvements from Chris. - mmap offset exhaustion mitigation by Chris Wilson. - a start at figuring out which codepaths in our messy dri1/ums+gem/kms driver we actually need to support by bailing out of unsupported case. The driver now refuses to load without kms on gen6+ and disallows a few ioctls that userspace never used in certain cases. More of this will definitely come. - More decoupling of global gtt and ppgtt. - Improved dual-link lvds detection by Takashi Iwai. - Shut up the compiler + plus fix the fallout (Ben) - Inverted panel brightness handling (mostly Acer manages to break things in this way). - Small fixlets and adjustements and some minor things to help debugging. Regression-wise QA reported quite a few issues on ivb, but all of them turned out to be hw stability issues which are already fixed in drm-intel-fixes (QA runs the nightly regression tests on -next alone, without -fixes automatically merged in). There's still one issue open on snb, it looks like occlusion query writes are not quite as cache coherent as we've expected. With some of the pwrite adjustements we can now reliably hit this. Kernel workaround for it is in the works." * 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits) drm/i915: VCS is not the last ring drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2 drm/i915: make quirks more verbose drm/i915: dump the DMA fetch addr register on pre-gen6 drm/i915/sdvo: Include YRPB as an additional TV output type drm/i915: disallow gem init ioctl on ilk drm/i915: refuse to load on gen6+ without kms drm/i915: extract gt interrupt handler drm/i915: use render gen to switch ring irq functions drm/i915: rip out old HWSTAM missed irq WA for vlv drm/i915: open code gen6+ ring irqs drm/i915: ring irq cleanups drm/i915: add SFUSE_STRAP registers for digital port detection drm/i915: add WM_LINETIME registers drm/i915: add WRPLL clocks drm/i915: add LCPLL control registers drm/i915: add SSC offsets for SBI access drm/i915: add port clock selection support for HSW drm/i915: add S PLL control drm/i915: add PIXCLK_GATE register ... Conflicts: drivers/char/agp/intel-agp.h drivers/char/agp/intel-gtt.c drivers/gpu/drm/i915/i915_debugfs.c	2012-04-12 10:27:01 +01:00
Daniel Vetter	15a13bbdff	drm/i915: clear fencing tracking state when retiring requests This fixes a resume regression introduced in commit `7dd4906586` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Mar 21 10:48:18 2012 +0000 drm/i915: Mark untiled BLT commands as fenced on gen2/3 which fixed fencing tracking for untiled blt commands. A side effect of that patch was that now also untiled objects have a non-zero obj->last_fenced_seqno to track when a fence can be set up after a pipelined tiling change. Unfortunately this was only cleared by the fence setup and teardown code, resulting in tons of untiled but inactive objects with non-zero last_fenced_seqno. Now after resume we completely reset the seqno tracking, both on the driver side (by setting dev_priv->next_seqno = 1) and on the hw side (by allocating a new hws page, which contains the seqnos). Hilarity and indefinite waits ensued from the stale seqnos in obj->last_fenced_seqno from before the suspend. The fix is to properly clear the fencing tracking state like we already do for the normal gpu rendering while moving objects off the active list. Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Jiri Slaby <jslaby@suse.cz> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-12 09:02:37 +02:00
Daniel Vetter	f534bc0b22	drm/i915: disallow gem init ioctl on ilk Ums is already disabled, but on ilk we can additionally disable gem initialization when using user mode setting. Upstream never support ilk without kernel modesetting and not even the RHEL ilk ums backport needs gem - that driver is based on xf86-video-intel version 2.2, which is pre-gem. Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-09 18:04:08 +02:00
Chris Wilson	7dd4906586	drm/i915: Mark untiled BLT commands as fenced on gen2/3 The BLT commands on gen2/3 utilize the fence registers and so we cannot modify any fences for the object whilst those commands are in flight. Currently we marked tiled commands as occupying a fence, but forgot to restrict the untiled commands from preventing a fence being assigned before they were completed. One side-effect is that we ten have to double check that a fence was allocated for a fenced buffer during move-to-active. Reported-by: Jiri Slaby <jirislaby@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990 Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Testcase: i-g-t/tests/gem_tiled_after_untiled_blt Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-01 12:26:05 +02:00
Daniel Vetter	55a254ac63	drm/i915: properly restore the ppgtt page directory on resume The ppgtt page directory lives in a snatched part of the gtt pte range. Which naturally gets cleared on hibernate when we pull the power. Suspend to ram (which is what I've tested) works because despite the fact that this is a mmio region, it is actually back by system ram. Fix this by moving the page directory setup code to the ppgtt init code (which gets called on resume). This fixes hibernate on my ivb and snb. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-04-01 12:25:29 +02:00
Jesse Barnes	23e3f9b37e	drm/i915: check for disabled interrupts on ValleyView Haven't seen this yet, but it doesn't hurt. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-29 00:11:46 +02:00
Daniel Vetter	e7e58eb5c0	drm/i915: mark pwrite/pread slowpaths with unlikely Beside helping the compiler untangle this maze they double-up as documentation for which parts of the code aren't performance-critical but just around to keep old (but already dead-slow) userspace from breaking. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:41:41 +02:00
Daniel Vetter	23c18c71da	drm/i915: fixup in-line clflushing on bit17 swizzled bos The issue is that with inline clflushing the clflushing isn't properly swizzled. Fix this by - always clflushing entire 128 byte chunks and - unconditionally flush before writes when swizzling a given page. We could be clever and check whether we pwrite a partial 128 byte chunk instead of a partial cacheline, but I've figured that's not worth it. Now the usual approach is to fold this into the original patch series, but I've opted against this because - this fixes a corner case only very old userspace relies on and - I'd like to not invalidate all the testing the pwrite rewrite has gotten. This fixes the regression notice by tests/gem_tiled_partial_prite_pread from i-g-t. Unfortunately it doesn't fix the issues with partial pwrites to tiled buffers on bit17 swizzling machines. But that is also broken without the pwrite patches, so likely a different issue (or a problem with the testcase). v2: Simplify the patch by dropping the overly clever partial write logic for swizzled pages. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:40:57 +02:00
Daniel Vetter	f56f821feb	mm: extend prefault helpers to fault in more than PAGE_SIZE drm/i915 wants to read/write more than one page in its fastpath and hence needs to prefault more than PAGE_SIZE bytes. Add new functions in filemap.h to make that possible. Also kill a copy&pasted spurious space in both functions while at it. v2: As suggested by Andrew Morton, add a multipage parameter to both functions to avoid the additional branch for the pagemap.c hotpath. My gcc 4.6 here seems to dtrt and indeed reap these branches where not needed. v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to the filemap.c hotpaths (that the compiler couldn't remove again), let's go with separate new functions for the multipage use-case. v4: Adjust comment to CodingStlye and fix spelling. Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:36:30 +02:00
Daniel Vetter	d174bd6472	drm/i915: extract copy helpers from shmem_pread\|pwrite While moving around things, this two functions slowly grew out of any sane bounds. So extract a few lines that do the copying and clflushing. Also add a few comments to explain what's going on. v2: Again do s/needs_clflush/needs_clflush_after/ in the write paths as suggested by Chris Wilson. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:30:33 +02:00
Daniel Vetter	117babcdd5	drm/i915: use uncached writes in pwrite It's around 20% faster. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:29:38 +02:00
Daniel Vetter	ffc62976d2	drm/i915: fall back to shmem pwrite when the buffer is not accessible It's too expensive to move it around just for that pwrite, especially when we're trashing on the mappable gtt part like crazy. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:29:08 +02:00
Daniel Vetter	586428852a	drm/i915: implement inline clflush for pwrite In micro-benchmarking of the usual pwrite use-pattern of alternating pwrites with gtt domain reads from the gpu, this yields around 30% improvement of pwrite throughput across all buffers size. The trick is that we can avoid clflush cachelines that we will overwrite completely anyway. Furthermore for partial pwrites it gives a proportional speedup on top of the 30% percent because we only clflush back the part of the buffer we're actually writing. v2: Simplify the clflush-before-write logic, as suggested by Chris Wilson. v3: Finishing touches suggested by Chris Wilson: - add comment to needs_clflush_before and only set this if the bo is uncached. - s/needs_clflush/needs_clflush_after/ in the write paths to clearly differentiate it from needs_clflush_before. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:28:45 +02:00
Daniel Vetter	96d79b5270	drm/i915: don't clobber userspace memory before commiting to the pread The pagemap.h prefault helpers do the prefaulting by simply writing some data into every page. Hence we should not prefault when we're not yet commited to to actually writing data to userspace. The problem is now that - we can't prefault while holding dev->struct_mutex for we could deadlock with our own pagefault handler - we need to grab dev->struct_mutex before copying to sync up with any outsanding gpu writes. Therefore only prefault when we're dropping the lock the first time in the pread slowpath - at that point we're committed to the write, don't wait on the gpu anymore and hence won't return early (with e.g. -EINTR). Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:28:32 +02:00
Daniel Vetter	935aaa692e	drm/i915: drop gtt slowpath With the proper prefault, it's extremely unlikely that we fall back to the gtt slowpath. So just kill it and use the shmem_pwrite path as fallback. To further clean up the code, move the preparatory gem calls into the respective pwrite functions. This way the gtt_fast->shmem fallback is much more obvious. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:27:21 +02:00
Daniel Vetter	692a576b9d	drm/i915: don't call shmem_read_mapping unnecessarily This speeds up pwrite and pread from ~120 µs ro ~100 µs for reading/writing 1mb on my snb (if the backing storage pages are already pinned, of course). v2: Chris Wilson pointed out a glaring page reference bug - I've unconditionally dropped the reference. With that fixed (and the associated reduction of dirt in dmesg) it's now even a notch faster. v3: Unconditionaly grab a page reference when dropping dev->struct_mutex to simplify the code-flow. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:27:03 +02:00
Daniel Vetter	3ae5378330	drm/i915: don't use gtt_pwrite on LLC cached objects ~120 µs instead fo ~210 µs to write 1mb on my snb. I like this. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:25:45 +02:00
Daniel Vetter	a0356fc373	drm/i915: kill ranged cpu read domain support No longer needed. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:25:32 +02:00
Daniel Vetter	8489731c9b	drm/i915: move clflushing into shmem_pread This is obviously gonna slow down pread. But for a half-way realistic micro-benchmark, it doesn't matter: Non-broken userspace reads back data from the gpu once before the gpu again dirties it. So all this ranged clflush tracking is just a waste of time. No pread performance change (neglecting the dumb benchmark of constantly reading the same data) measured. As an added bonus, this avoids clflush on read on coherent objects. Which means that partial preads on snb are now roughly 4x as fast. This will be usefull for e.g. the libva encoder - when I finally get around to fix that up. v2: Properly sync with the gpu on LLC machines. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:20:01 +02:00
Daniel Vetter	dbf7bff074	drm/i915: merge shmem_pread slow&fast-path With the previous rewrite, they've become essential identical. v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris Wilson. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:19:11 +02:00
Daniel Vetter	e244a443bf	drm/i915: merge shmem_pwrite slow&fast-path With the previous rewrite, they've become essential identical. v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris Wilson. Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:18:58 +02:00
Chris Wilson	dabdfe021a	drm/i915: Avoid using mappable space for relocation processing through the CPU We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these cases, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:16:17 +02:00
Daniel Vetter	644ec02b5d	drm/i915: s/i915_gem_do_init/i915_gem_init_global_gtt ... because this is what it actually doesn now that we have the global gtt vs. ppgtt split. Also move it to the other global gtt functions in i915_gem_gtt.c Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-27 13:14:59 +02:00
Chris Wilson	a14917eeb2	drm/i915: Release the mmap offset when purging a buffer If we discard a buffer due to memory pressure, also release its alloted mmap address space. As it may be sometime before userspace wakes up and notices that it has buffers to purge from its cache, we may waste valuable address space on unusable objects for a period of time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47738 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-23 11:04:35 +01:00
Ben Widawsky	eb2c0c818a	drm/i915: [dinq] shut up two instances -Wunitialized Introduced in commit `8461d226` and `8c59967c` Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: s/fix/shut up/ in the commit msg.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-22 17:44:09 +01:00
Daniel Vetter	0ebb982993	drm/i915: enable lazy global-gtt binding Now that everything is in place, only bind to the global gtt when actually required. Patch split-up suggested by Chris Wilson. Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-20 21:55:16 +01:00
Daniel Vetter	74898d7edc	drm/i915: bind objects to the global gtt only when needed And track the existence of such a binding similar to the aliasing ppgtt case. Speeds up binding/unbinding in the common case where we only need a ppgtt binding (which is accessed in a cpu coherent fashion by the gpu) and no gloabl gtt binding (which needs uc writes for the ptes). This patch just puts the required tracking in place. v2: Check that global gtt mappings exist in the error_state capture code (with Chris Wilson's llc reloc patches batchbuffers are no longer relocated as mappable in all situations, so this matters). Suggested by Chris Wilson. v3: Adapted to Chris' latest llc-reloc patches. v4: Fix a bug in the i915 error state capture code noticed by Chris Wilson. Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-20 21:52:01 +01:00
Daniel Vetter	741639079c	drm/i915: split out dma mapping from global gtt bind/unbind functions Note that there's a functional change buried in this patch wrt the ilk dmar workaround: We now only idle the gpu while tearing down the dmar mappings, not while clearing the gtt. Keeping the current semantics would have made for some really ugly code and afaik the issue is only with the dmar unmapping that needs a fully idle gpu. Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-20 21:51:41 +01:00
Chris Wilson	c501ae7f33	drm/i915: Only clear the GPU domains upon a successful finish By clearing the GPU read domains before waiting upon the buffer, we run the risk of the wait being interrupted and the domains prematurely cleared. The next time we attempt to wait upon the buffer (after userspace handles the signal), we believe that the buffer is idle and so skip the wait. There are a number of bugs across all generations which show signs of an overly haste reuse of active buffers. Such as: https://bugs.freedesktop.org/show_bug.cgi?id=29046 https://bugs.freedesktop.org/show_bug.cgi?id=35863 https://bugs.freedesktop.org/show_bug.cgi?id=38952 https://bugs.freedesktop.org/show_bug.cgi?id=40282 https://bugs.freedesktop.org/show_bug.cgi?id=41098 https://bugs.freedesktop.org/show_bug.cgi?id=41102 https://bugs.freedesktop.org/show_bug.cgi?id=41284 https://bugs.freedesktop.org/show_bug.cgi?id=42141 A couple of those pre-date i915_gem_object_finish_gpu(), so may be unrelated (such as a wild write from a userspace command buffer), but this does look like a convincing cause for most of those bugs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-03-01 21:36:13 +01:00
Chris Wilson	eadb29a9c5	drm/i915: Silence the error message from i915_wait_request() This error message has since been superseded by the hangcheck, and does not add any salient information beyond that already printed by hangcheck discovering the GPU hang that lead to i915_wait_request() bombing out in the first place. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-27 18:08:22 +01:00
Chris Wilson	a71d8d9452	drm/i915: Record the tail at each request and use it to estimate the head By recording the location of every request in the ringbuffer, we know that in order to retire the request the GPU must have finished reading it and so the GPU head is now beyond the tail of the request. We can therefore provide a conservative estimate of where the GPU is reading from in order to avoid having to read back the ring buffer registers when polling for space upon starting a new write into the ringbuffer. A secondary effect is that this allows us to convert intel_ring_buffer_wait() to use i915_wait_request() and so consolidate upon the single function to handle the complicated task of waiting upon the GPU. A necessary precaution is that we need to make that wait uninterruptible to match the existing conditions as all the callers of intel_ring_begin() have not been audited to handle ERESTARTSYS correctly. By using a conservative estimate for the head, and always processing all outstanding requests first, we prevent a race condition between using the estimate and direct reads of I915_RING_HEAD which could result in the value of the head going backwards, and the tail overflowing once again. We are also careful to mark any request that we skip over in order to free space in ring as consumed which provides a self-consistency check. Given sufficient abuse, such as a set of unthrottled GPU bound cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on Sandy Bridge (i5-2520m): firefox-paintball 18927ms -> 15646ms: 1.21x speedup firefox-fishtank 12563ms -> 11278ms: 1.11x speedup which is a mild consolation for the performance those traces achieved from exploiting the buggy autoreported head. v2: Add a few more comments and make request->tail a conservative estimate as suggested by Daniel Vetter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: resolve conflicts with retirement defering and the lack of the autoreport head removal (that will go in through -fixes).] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-15 14:26:03 +01:00
Daniel Vetter	53d227f282	drm/i915: fixup seqno allocation logic for lazy_request Currently we reserve seqnos only when we emit the request to the ring (by bumping dev_priv->next_seqno), but start using it much earlier for ring->oustanding_lazy_request. When 2 threads compete for the gpu and run on two different rings (e.g. ddx on blitter vs. compositor) hilarity ensued, especially when we get constantly interrupted while reserving buffers. Breakage seems to have been introduced in commit `6f392d5486` Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sat Aug 7 11:01:22 2010 +0100 drm/i915: Use a common seqno for all rings. This patch fixes up the seqno reservation logic by moving it into i915_gem_next_request_seqno. The ring->add_request functions now superflously still return the new seqno through a pointer, that will be refactored in the next patch. Note that with this change we now unconditionally allocate a seqno, even when ->add_request might fail because the rings are full and the gpu died. But this does not open up a new can of worms because we can already leave behind an outstanding_request_seqno if e.g. the caller gets interrupted with a signal while stalling for the gpu in the eviciton paths. And with the bugfix we only ever have one seqno allocated per ring (and only that ring), so there are no ordering issues with multiple outstanding seqnos on the same ring. v2: Keep i915_gem_get_seqno (but move it to i915_gem.c) to make it clear that we only have one seqno counter for all rings. Suggested by Chris Wilson. v3: As suggested by Chris Wilson use i915_gem_next_request_seqno instead of ring->oustanding_lazy_request to make the follow-up refactoring more clearly correct. Also improve the commit message with issues discussed on irc. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181 Tested-by: Nicolas Kalkhof nkalkhof()at()web.de Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-13 10:55:57 +01:00
Daniel Vetter	5391d0cffe	drm/i915: outstanding_lazy_request is a u32 So don't assign it false, that's just confusing ... No functional change here. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-13 10:55:48 +01:00
Daniel Vetter	e21af88d39	drm/i915: enable ppgtt We want to unconditionally enable ppgtt for two reasons: - Windows uses this on snb and later. - We need the basic hw support to work before we can think about real per-process address spaces and other cool features we want. But Chris Wilson was complaining all over irc and intel-gfx that this will blow up if we don't have a module option to disable it. Hence add one, to prevent this. ppgtt support seems to slightly change the timings and make crashy things slightly more or less crashy. Now in my testing and the testing this got on troublesome snb machines, it seems to have improved things only. But on ivb it makes quite a few crashes happen much more often, see https://bugs.freedesktop.org/show_bug.cgi?id=41353 Luckily Eugeni Dodonov seems to have a set of workarounds that fix this issue. v2: Don't try to enable ppgtt on pre-snb. v3: Pimp commit message and make Chris Wilson less grumpy by adding a module option. v4: New try at making Chris Wilson happy. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-09 21:49:30 +01:00
Daniel Vetter	7bddb01fb9	drm/i915: ppgtt binding/unbinding support This adds support to bind/unbind objects and wires it up. Objects are only put into the ppgtt when necessary, i.e. at execbuf time. Objects are still unconditionally put into the global gtt. v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind like for the global gtt function. Noticed by Chris Wilson. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-09 21:25:23 +01:00
Daniel Vetter	11782b0233	drm/i915: consolidate swizzling control bit frobbing On gen5 we also need to correctly set up swizzling in the display scanout engine, but only there. Consolidate this into the same function. This has a small effect on ums setups - the kernel now also sets this bit in addition to userspace setting it. Given that this code only runs when userspace either can't (resume, gpu reset) or explicitly won't(gem_init) touch the hw this shouldn't have an adverse effect. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-08 23:18:27 +01:00
Daniel Vetter	f691e2f4ce	drm/i915: swizzling support for snb/ivb We have to do this manually. Somebody had a Great Idea. I've measured speed-ups just a few percent above the noise level (below 5% for the best case), but no slowdows. Chris Wilson measured quite a bit more (10-20% above the usual snb variance) on a more recent and better tuned version of sna, but also recorded a few slow-downs on benchmarks know for uglier amounts of snb-induced variance. v2: Incorporate Ben Widawsky's preliminary review comments and elaborate a bit about the performance impact in the changelog. v3: Add a comment as to why we don't need to check the 3rd memory channel. v4: Fixup whitespace. Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-02-08 23:16:24 +01:00
Daniel Vetter	8461d22677	drm/i915: rewrite shmem_pread_slow to use copy_to_user Like for shmem_pwrite_slow. The only difference is that because we read data, we can leave the fetched cachelines in the cpu: In the case that the object isn't in the cpu read domain anymore, the clflush for the next cpu read domain invalidation will simply drop these cachelines. slow_shmem_bit17_copy is now ununsed, so kill it. With this patch tests/gem_mmap_gtt now actually works. v2: add __ to copy_to_user_swizzled as suggested by Chris Wilson. v3: Fixup the swizzling logic, it swizzled the wrong pages. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115 Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-30 23:34:34 +01:00
Daniel Vetter	8c59967c44	drm/i915: rewrite shmem_pwrite_slow to use copy_from_user ... instead of get_user_pages, because that fails on non page-backed user addresses like e.g. a gtt mapping of a bo. To get there essentially copy the vfs read path into pagecache. We can't call that right away because we have to take care of bit17 swizzling. To not deadlock with our own pagefault handler we need to completely drop struct_mutex, reducing the atomicty-guarantees of our userspace abi. Implications for racing with other gem ioctl: - execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's already the risk of the pwrite call not being atomic, no degration. - read/write access to mmaps: already fully racy, no degration. - set_tiling: Calling set_tiling while reading/writing is already pretty much undefined, now it just got a bit worse. set_tiling is only called by libdrm on unused/new bos, so no problem. - set_domain: When changing to the gtt domain while copying (without any read/write access, e.g. for synchronization), we might leave unflushed data in the cpu caches. The clflush_object at the end of pwrite_slow takes care of this problem. - truncating of purgeable objects: the shmem_read_mapping_page call could reinstate backing storage for truncated objects. The check at the end of pwrite_slow takes care of this. v2: - add missing intel_gtt_chipset_flush - add __ to copy_from_user_swizzled as suggest by Chris Wilson. v3: Fixup bit17 swizzling, it swizzled the wrong pages. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-30 23:34:21 +01:00
Daniel Vetter	5c0480f21f	drm/i915: fall through pwrite_gtt_slow to the shmem slow path The gtt_pwrite slowpath grabs the userspace memory with get_user_pages. This will not work for non-page backed memory, like a gtt mmapped gem object. Hence fall throuh to the shmem paths if we hit -EFAULT in the gtt paths. Now the shmem paths have exactly the same problem, but this way we only need to rearrange the code in one write path. v2: v1 accidentaly falls back to shmem pwrite for phys objects. Fixed. v3: Make the codeflow around phys_pwrite cleara as suggested by Chris Wilson. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-30 23:34:07 +01:00
Chris Wilson	068c6ff1cb	drm/i915: Remove the upper limit on the bo size for mapping into the CPU domain The original intention of comparing the bo against the mappable GTT limits was to prevent a subsequent faulting of the bo into the GTT from clearing the entire GTT in vain. However, that was clearly a cut'n'paste mistake as a CPU mapping never binds the bo into the aperture. Whilst there may be some merit to limiting the maximum size of the bo to something that can be utilized by the GPU, that limit itself does not belong as a safeguard to mmapping the bo, so remove the check entirely. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-30 17:54:35 +01:00
Daniel Vetter	39965b3766	drm/i915: don't trash the gtt when running out of fences With the fence accounting fixed up in the previous commit not finding enough fences is a fatal error and userspace bug. Trashing the entire gtt is not gonna turn up that missing fence, so don't to this by returning another error thatn ENOSPC. This has the added benefit that it's easier to distinguish fence accounting errors from gtt space accounting issues. TTM serves as precendence for the EDEADLK error code - it returns it when the reservation code needs resources already blocked by the current reservation. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-29 18:24:10 +01:00
Chris Wilson	1690e1eb7a	drm/i915: Separate fence pin counting from normal bind pin counting In order to correctly account for reserving space in the GTT and fences for a batch buffer, we need to independently track whether the fence is pinned due to a fenced GPU access in the batch or whether the buffer is pinned in the aperture. Currently we count the fenced as pinned if the buffer has already been seen in the execbuffer. This leads to a false accounting of available fence registers, causing frequent mass evictions. Worse, if coupled with the change to make i915_gem_object_get_fence() report EDADLK upon fence starvation, the batchbuffer can fail with only one fence required... Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Tested-by: Paul Neumann <paul104x@yahoo.de> [danvet: Resolve the functional conflict with Jesse Barnes sprite patches, acked by Chris Wilson on irc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-29 18:23:37 +01:00
Ben Widawsky	b93f9cf14e	drm/i915: argument to control retiring behavior Sometimes it may be the case when we idle the gpu or wait on something we don't actually want to process the retiring list. This patch allows callers to choose the behavior. Reviewed-by: Keith Packard <keithp@keithp.com> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-26 11:19:19 +01:00
Eugeni Dodonov	3d29b842e5	drm/i915: add a LLC feature flag in device description LLC is not SNB/IVB-specific, so we should check for it in a more generic way. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2012-01-17 20:01:45 +01:00
Eric Anholt	e959b5db4a	drm/i915: Make the fallback IRQ wait not sleep. The waits we do here are generally so short that sleeping is a bad idea unless we have an IRQ to wake us up. Improves regression test performance from 18 minutes to 3.5 minutes on gen7, which is now consistent with the previous generation. Signed-off-by: Eric Anholt <eric@anholt.net> Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Keith Packard <keithp@keithp.com>	2012-01-03 09:31:16 -08:00
Eric Anholt	7ea29b13e5	drm/i915: Do the fallback non-IRQ wait in ring throttle, too. As a workaround for IRQ synchronization issues in the gen7 BLT ring, we want to turn the two wait functions into polling loops. Signed-off-by: Eric Anholt <eric@anholt.net> Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Keith Packard <keithp@keithp.com>	2012-01-03 09:31:14 -08:00

... 3 4 5 6 7 ...

930 Commits