linux/drivers/gpu/drm/i915
Chris Wilson b47161858b drm/i915: Implement inter-engine read-read optimisations
Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read-read requests (as well as better optimise
certain CPU waits).
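
The tracking described above can be illustrated with a minimal user-space sketch. It is not the driver's code: the names struct object, struct request, object_mark_read(), object_mark_write(), object_sync_mask() and the NUM_ENGINES value are all hypothetical. The only idea taken from the text is keeping the last request per engine plus one last write request, so that a new read has to wait only on an outstanding write, while a new write has to wait on every outstanding reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_ENGINES 4 /* hypothetical engine count, for illustration only */

/* Hypothetical stand-in for a reference-counted GPU request. */
struct request {
	unsigned int engine; /* index of the engine the request runs on */
	uint32_t seqno;
};

/* Hypothetical stand-in for a buffer object. */
struct object {
	unsigned int active;                    /* bit n set: engine n has an outstanding access */
	struct request *last_read[NUM_ENGINES]; /* last request per engine */
	struct request *last_write;             /* last write request, on any engine */
};

/* Record that req reads obj on its engine. */
static void object_mark_read(struct object *obj, struct request *req)
{
	assert(req->engine < NUM_ENGINES);
	obj->last_read[req->engine] = req;
	obj->active |= 1u << req->engine;
}

/* A write is tracked as an access on its engine and as the last write. */
static void object_mark_write(struct object *obj, struct request *req)
{
	object_mark_read(obj, req);
	obj->last_write = req;
}

/*
 * Return the mask of engines that a new access from engine `to` must
 * wait for. A read-only access waits only for the last writer; a write
 * waits for every engine that still has an outstanding access. Requests
 * on the same engine execute in order, so `to` itself is masked out.
 */
static unsigned int object_sync_mask(const struct object *obj,
				     unsigned int to, bool readonly)
{
	unsigned int mask;

	if (readonly)
		mask = obj->last_write ? 1u << obj->last_write->engine : 0;
	else
		mask = obj->active;

	return mask & ~(1u << to);
}
```

With a single global last request, a read on one engine would have to wait for an earlier read on another engine. In this sketch, object_sync_mask() returns 0 in that case, so both reads can proceed concurrently.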

v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-contiguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.

Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k:		275.794µs
After:  Time to read-read 1024k:		123.260µs

hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k:		230.433µs
After:  Time to read-read 1024k:		124.593µs

bdw-u (w/o semaphores):             Before          After
Time to read-read 1x1:            26.274µs       10.350µs
Time to read-read 128x128:        40.097µs       21.366µs
Time to read-read 256x256:        77.087µs       42.608µs
Time to read-read 512x512:       281.999µs      181.155µs
Time to read-read 1024x1024:    1196.141µs     1118.223µs
Time to read-read 2048x2048:    5639.072µs     5225.837µs
Time to read-read 4096x4096:   22401.662µs    21137.067µs
Time to read-read 8192x8192:   89617.735µs    85637.681µs

Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
[danvet: s/\<rq\>/req/g]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-21 15:11:42 +02:00
dvo_ch7xxx.c
dvo_ch7017.c
dvo_ivch.c Enabled dithering in the intel VCH DVO for 18bpp pipelines. 2015-03-30 16:39:31 +02:00
dvo_ns2501.c drm/i915: Enable dithering on NatSemi DVO2501 for Fujitsu S6010 2015-04-23 21:31:58 +02:00
dvo_sil164.c
dvo_tfp410.c
dvo.h
i915_cmd_parser.c drm/i915: Tidy batch pool logic 2015-04-10 08:56:04 +02:00
i915_debugfs.c drm/i915: Implement inter-engine read-read optimisations 2015-05-21 15:11:42 +02:00
i915_dma.c drm/i915/skl: Add support to load SKL CSR firmware. 2015-05-08 13:03:10 +02:00
i915_drv.c drm/i915: Kill the dev variable in intel_suspend_complete() 2015-05-20 17:53:06 +02:00
i915_drv.h drm/i915: Implement inter-engine read-read optimisations 2015-05-21 15:11:42 +02:00
i915_gem_batch_pool.c drm/i915: Split batch pool into size buckets 2015-04-10 08:56:05 +02:00
i915_gem_batch_pool.h drm/i915: Split batch pool into size buckets 2015-04-10 08:56:05 +02:00
i915_gem_context.c drm/i915: Implement inter-engine read-read optimisations 2015-05-21 15:11:42 +02:00
i915_gem_debug.c drm/i915: Implement inter-engine read-read optimisations 2015-05-21 15:11:42 +02:00
i915_gem_dmabuf.c dma-buf: cleanup dma_buf_export() to make it easily extensible 2015-04-21 14:47:16 +05:30
i915_gem_evict.c drm/i915: kerneldoc for i915_gem_shrinker.c 2015-03-20 11:48:16 +01:00
i915_gem_execbuffer.c drm/i915: Fix possible security hole in command parsing 2015-05-08 17:26:01 +02:00
i915_gem_gtt.c drm/i915/gtt: Fix the boundary check for vm area 2015-05-20 11:25:47 +02:00
i915_gem_gtt.h drm/i915: Add a partial GGTT view type 2015-05-08 13:04:18 +02:00
i915_gem_render_state.c
i915_gem_render_state.h
i915_gem_shrinker.c drm/i915: Simplify object is-pinned checking for shrinker 2015-04-10 10:58:34 +02:00
i915_gem_stolen.c drm/i915: use proper FBC base register on all new platforms 2015-04-09 15:57:46 +02:00
i915_gem_tiling.c drm/i915: Simplify i915_gem_obj_is_pinned() test for set-tiling 2015-04-16 11:20:29 +02:00
i915_gem_userptr.c drm/i915: Use uninterruptible mutex_lock for userptr bo creation 2015-05-20 11:26:03 +02:00
i915_gem.c drm/i915: Implement inter-engine read-read optimisations 2015-05-21 15:11:42 +02:00
i915_gpu_error.c drm/i915: Implement inter-engine read-read optimisations 2015-05-21 15:11:42 +02:00
i915_ioc32.c
i915_irq.c drm/i915: Use HOTPLUG_INT_STATUS_G4X on VLV/CHV 2015-05-20 11:25:48 +02:00
i915_params.c drm/i915/skl: Add module parameter to select edp vswing table 2015-05-08 13:03:41 +02:00
i915_reg.h drm/i915/bxt: fix WaForceContextSaveRestoreNonCoherent on steppings B0+ 2015-05-21 14:02:06 +02:00
i915_suspend.c drm/i915: Remove regfile code&data for UMS suspend/resume 2015-02-27 18:10:39 +01:00
i915_sysfs.c drm/i915/skl: Updated the act_freq_mhz_show sysfs function 2015-03-17 22:30:25 +01:00
i915_trace_points.c
i915_trace.h Merge tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel into drm-next 2015-05-08 20:51:06 +10:00
i915_vgpu.c drm/i915: Adds graphic address space ballooning logic 2015-02-13 23:28:23 +01:00
i915_vgpu.h drm/i915: Add ULL postfix to VGT_MAGIC constant 2015-03-17 22:30:18 +01:00
intel_acpi.c
intel_atomic_plane.c drm/i915: Use atomic helpers for computing changed flags 2015-05-08 13:04:08 +02:00
intel_atomic.c drm/i915: Call drm helpers when duplicating crtc and plane states 2015-05-08 13:03:58 +02:00
intel_audio.c drm/i915/audio: do not mess with audio registers if port is invalid 2015-05-08 13:03:36 +02:00
intel_bios.c drm/i915/bios: be more explicit about discarding iomem address space 2015-05-20 11:26:01 +02:00
intel_bios.h drm/i915: Fix the VBT child device parsing for BSW 2015-04-10 08:56:14 +02:00
intel_crt.c drm/i915: Allocate connector state together with the connectors 2015-04-13 15:21:21 +03:00
intel_csr.c drm/i915/skl: Documentation for CSR firmware 2015-05-20 11:25:57 +02:00
intel_ddi.c drm/i915/bxt: Move around lane stagger calculation 2015-05-20 11:26:08 +02:00
intel_display.c drm/i915: Implement inter-engine read-read optimisations 2015-05-21 15:11:42 +02:00
intel_dp_mst.c drm/i915: Use for_each_connector_in_state helper macro 2015-05-08 13:03:58 +02:00
intel_dp.c drm/i915: add HAS_DP_MST feature test macro 2015-05-20 11:26:07 +02:00
intel_drv.h drm/i915: s/\<rq\>/req/g 2015-05-21 15:10:48 +02:00
intel_dsi_panel_vbt.c drm/i915/dsi: remove intel_dsi_cmd.c and the unused functions therein 2015-01-29 16:57:14 +01:00
intel_dsi_pll.c drm/i915/dsi: add support for DSI PLL N1 divisor values 2015-05-20 11:25:58 +02:00
intel_dsi.c drm/i915: Allocate connector state together with the connectors 2015-04-13 15:21:21 +03:00
intel_dsi.h drm/i915/dsi: add drm mipi dsi host support 2015-01-29 16:51:39 +01:00
intel_dvo.c drm/i915: Silence compiler warning in dvo 2015-04-29 14:37:48 +03:00
intel_fbc.c drm/i915: get rid of primary_enabled and use atomic state 2015-05-08 13:03:53 +02:00
intel_fbdev.c drm/i915: Pass in plane state when (un)pinning frame buffers 2015-03-23 15:00:57 +01:00
intel_fifo_underrun.c drm/i915: Check for driver readyness before handling an underrun interrupt 2015-03-04 10:04:19 +02:00
intel_frontbuffer.c drm/i915: PSR VLV: Add single frame update. 2015-04-14 19:15:23 +02:00
intel_hdmi.c drm/i915: Only wait for required lanes in vlv_wait_port_ready() 2015-05-08 17:26:02 +02:00
intel_i2c.c drm/i915: don't register invalid gmbus pins for skl 2015-05-20 11:25:50 +02:00
intel_lrc.c drm/i915: Implement inter-engine read-read optimisations 2015-05-21 15:11:42 +02:00
intel_lrc.h drm/i915: Move common request allocation code into a common function 2015-04-01 07:54:30 +02:00
intel_lvds.c Linux 4.1-rc4 2015-05-20 16:23:53 +10:00
intel_modes.c
intel_opregion.c drm/i915: Remove DRIVER_MODESET checks from modeset code 2015-02-27 18:10:53 +01:00
intel_overlay.c drm/i915: Implement inter-engine read-read optimisations 2015-05-21 15:11:42 +02:00
intel_panel.c drm/i915/bxt: BLC implementation 2015-05-08 13:03:38 +02:00
intel_pm.c drm/i915: s/\<rq\>/req/g 2015-05-21 15:10:48 +02:00
intel_psr.c drm/i915: PSR VLV: Add single frame update. 2015-04-14 19:15:23 +02:00
intel_renderstate_gen6.c drm/i915: Add headers to the various render state 2014-12-10 17:47:23 +01:00
intel_renderstate_gen7.c drm/i915: Add headers to the various render state 2014-12-10 17:47:23 +01:00
intel_renderstate_gen8.c drm/i915: Add headers to the various render state 2014-12-10 17:47:23 +01:00
intel_renderstate_gen9.c drm/i915: Add headers to the various render state 2014-12-10 17:47:23 +01:00
intel_renderstate.h
intel_ringbuffer.c drm/i915: Implement inter-engine read-read optimisations 2015-05-21 15:11:42 +02:00
intel_ringbuffer.h drm/i915: Split the batch pool by engine 2015-04-10 08:56:04 +02:00
intel_runtime_pm.c drm/i915: Fix typo in intel_runtime_pm.c 2015-05-20 11:25:39 +02:00
intel_sdvo_regs.h
intel_sdvo.c drm/i915: Use POSTING_READ() in intel_sdvo_write_sdvox() 2015-05-08 13:03:39 +02:00
intel_sideband.c drm/i915: Correct the IOSF Dev_FN field for IOSF transfers 2015-02-09 14:26:19 +02:00
intel_sprite.c drm/i915: Make the sprite formats const 2015-05-20 11:25:55 +02:00
intel_tv.c drm/i915: Allocate connector state together with the connectors 2015-04-13 15:21:21 +03:00
intel_uncore.c Merge tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel into drm-next 2015-05-08 20:51:06 +10:00
Kconfig drm/i915: Force clean compilation with -Werror 2015-05-21 11:56:12 +02:00
Kconfig.debug drm/i915: Force clean compilation with -Werror 2015-05-21 11:56:12 +02:00
Makefile drm/i915: Force clean compilation with -Werror 2015-05-21 11:56:12 +02:00