linux/drivers/gpu/drm/i915/Makefile

179 lines
4.5 KiB
Makefile
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 14:07:57 +00:00
# SPDX-License-Identifier: GPL-2.0
#
# Makefile for the drm device driver. This driver provides support for the
# Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
2017-10-24 18:15:47 +00:00
# Add a set of useful warning flags and enable -Werror for CI to prevent
# trivial mistakes from creeping in. We have to do this piecemeal as we reject
# any patch that isn't warning clean, so turning on -Wall -Wextra (or W=1) we
# need to filter out dubious warnings. Still it is our interest
# to keep running locally with W=1 C=1 until we are completely clean.
#
# Note the danger in using -Wall -Wextra is that when CI updates gcc we
# will most likely get a sudden build breakage... Hopefully we will fix
# new warnings before CI updates!
subdir-ccflags-y := -Wall -Wextra
subdir-ccflags-y += $(call cc-disable-warning, unused-parameter)
subdir-ccflags-y += $(call cc-disable-warning, type-limits)
subdir-ccflags-y += $(call cc-disable-warning, missing-field-initializers)
subdir-ccflags-y += $(call cc-disable-warning, implicit-fallthrough)
2017-10-24 18:15:47 +00:00
subdir-ccflags-$(CONFIG_DRM_I915_WERROR) += -Werror
# Fine grained warnings disable
CFLAGS_i915_pci.o = $(call cc-disable-warning, override-init)
CFLAGS_intel_fbdev.o = $(call cc-disable-warning, override-init)
2017-10-24 18:15:47 +00:00
drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory This patch provides the infrastructure for performing a 16-byte aligned read from WC memory using non-temporal instructions introduced with sse4.1. Using movntdqa we can bypass the CPU caches and read directly from memory and ignoring the page attributes set on the CPU PTE i.e. negating the impact of an otherwise UC access. Copying using movntdqa from WC is almost as fast as reading from WB memory, modulo the possibility of both hitting the CPU cache or leaving the data in the CPU cache for the next consumer. (The CPU cache itself my be flushed for the region of the movntdqa and on later access the movntdqa reads from a separate internal buffer for the cacheline.) The write back to the memory is however cached. This will be used in later patches to accelerate accessing WC memory. v2: Report whether the accelerated copy is successful/possible. v3: Function alignment override was only necessary when using the function target("sse4.1") - which is not necessary for emitting movntdqa from __asm__. v4: Improve notes on CPU cache behaviour vs non-temporal stores. v5: Fix byte offsets for unrolled moves. v6: Find all remaining typos of "movntqda", use kernel_fpu_begin. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Damien Lespiau <damien.lespiau@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-2-git-send-email-chris@chris-wilson.co.uk
2016-08-12 11:39:59 +00:00
subdir-ccflags-y += \
$(call as-instr,movntdqa (%eax)$(comma)%xmm0,-DCONFIG_AS_MOVNTDQA)
# Please keep these build lists sorted!
# core driver code
i915-y := i915_drv.o \
i915_irq.o \
drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory This patch provides the infrastructure for performing a 16-byte aligned read from WC memory using non-temporal instructions introduced with sse4.1. Using movntdqa we can bypass the CPU caches and read directly from memory and ignoring the page attributes set on the CPU PTE i.e. negating the impact of an otherwise UC access. Copying using movntdqa from WC is almost as fast as reading from WB memory, modulo the possibility of both hitting the CPU cache or leaving the data in the CPU cache for the next consumer. (The CPU cache itself my be flushed for the region of the movntdqa and on later access the movntdqa reads from a separate internal buffer for the cacheline.) The write back to the memory is however cached. This will be used in later patches to accelerate accessing WC memory. v2: Report whether the accelerated copy is successful/possible. v3: Function alignment override was only necessary when using the function target("sse4.1") - which is not necessary for emitting movntdqa from __asm__. v4: Improve notes on CPU cache behaviour vs non-temporal stores. v5: Fix byte offsets for unrolled moves. v6: Find all remaining typos of "movntqda", use kernel_fpu_begin. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Damien Lespiau <damien.lespiau@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-2-git-send-email-chris@chris-wilson.co.uk
2016-08-12 11:39:59 +00:00
i915_memcpy.o \
i915_mm.o \
i915_params.o \
i915_pci.o \
i915_suspend.o \
drm/i915: Squash repeated awaits on the same fence Track the latest fence waited upon on each context, and only add a new asynchronous wait if the new fence is more recent than the recorded fence for that context. This requires us to filter out unordered timelines, which are noted by DMA_FENCE_NO_CONTEXT. However, in the absence of a universal identifier, we have to use our own i915->mm.unordered_timeline token. v2: Throw around the debug crutches v3: Inline the likely case of the pre-allocation cache being full. v4: Drop the pre-allocation support, we can lose the most recent fence in case of allocation failure -- it just means we may emit more awaits than strictly necessary but will not break. v5: Trim allocation size for leaf nodes, they only need an array of u32 not pointers. v6: Create mock_timeline to tidy selftest writing v7: s/intel_timeline_sync_get/intel_timeline_sync_is_later/ (Tvrtko) v8: Prune the stale sync points when we idle. v9: Include a small benchmark in the kselftests v10: Separate the idr implementation into its own compartment. (Tvrkto) v11: Refactor igt_sync kselftests to avoid deep nesting (Tvrkto) v12: __sync_leaf_idx() to assert that p->height is 0 when checking leaves v13: kselftests to investigate struct i915_syncmap itself (Tvrtko) v14: Foray into ascii art graphs v15: Take into account that the random lookup/insert does 2 prng calls, not 1, when benchmarking, and use for_each_set_bit() (Tvrtko) v16: Improved ascii art Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-4-chris@chris-wilson.co.uk
2017-05-03 09:39:21 +00:00
i915_syncmap.o \
i915_sw_fence.o \
i915_sysfs.o \
intel_csr.o \
intel_device_info.o \
intel_pm.o \
intel_runtime_pm.o
i915-$(CONFIG_COMPAT) += i915_ioc32.o
i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o
# GEM code
i915-y += i915_cmd_parser.o \
drm/i915: Implement a framework for batch buffer pools This adds a small module for managing a pool of batch buffers. The only current use case is for the command parser, as described in the kerneldoc in the patch. The code is simple, but separating it out makes it easier to change the underlying algorithms and to extend to future use cases should they arise. The interface is simple: init to create an empty pool, fini to clean it up, get to obtain a new buffer. Note that all buffers are expected to be inactive before cleaning up the pool. Locking is currently based on the caller holding the struct_mutex. We already do that in the places where we will use the batch pool for the command parser. v2: - s/BUG_ON/WARN_ON/ for locking assertions - Remove the cap on pool size - Switch from alloc/free to init/fini v3: - Idiomatic looping structure in _fini - Correct handling of purged objects - Don't return a buffer that's too much larger than needed v4: - Rebased to latest -nightly v5: - Remove _put() function and clean up comments to match v6: - Move purged check inside the loop (danvet, from v4 1/7 feedback) v7: - Use single list instead of two. (Chris W) - s/active_list/cache_list - Squashed in debug patches (Chris W) drm/i915: Add a batch pool debugfs file It provides some useful information about the buffers in the global command parser batch pool. v2: rebase on global pool instead of per-ring pools v3: rebase drm/i915: Add batch pool details to i915_gem_objects debugfs To better account for the potentially large memory consumption of the batch pool. v8: - Keep cache in LRU order (danvet, from v6 1/5 feedback) Issue: VIZ-4719 Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com> Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-11 20:13:08 +00:00
i915_gem_batch_pool.o \
i915_gem_clflush.o \
drm/i915: preliminary context support Very basic code for context setup/destruction in the driver. Adds the file i915_gem_context.c This file implements HW context support. On gen5+ a HW context consists of an opaque GPU object which is referenced at times of context saves and restores. With RC6 enabled, the context is also referenced as the GPU enters and exists from RC6 (GPU has it's own internal power context, except on gen5). Though something like a context does exist for the media ring, the code only supports contexts for the render ring. In software, there is a distinction between contexts created by the user, and the default HW context. The default HW context is used by GPU clients that do not request setup of their own hardware context. The default context's state is never restored to help prevent programming errors. This would happen if a client ran and piggy-backed off another clients GPU state. The default context only exists to give the GPU some offset to load as the current to invoke a save of the context we actually care about. In fact, the code could likely be constructed, albeit in a more complicated fashion, to never use the default context, though that limits the driver's ability to swap out, and/or destroy other contexts. All other contexts are created as a request by the GPU client. These contexts store GPU state, and thus allow GPU clients to not re-emit state (and potentially query certain state) at any time. The kernel driver makes certain that the appropriate commands are inserted. There are 4 entry points into the contexts, init, fini, open, close. The names are self-explanatory except that init can be called during reset, and also during pm thaw/resume. As we expect our context to be preserved across these events, we do not reinitialize in this case. As Adam Jackson pointed out, The cutoff of 1MB where a HW context is considered too big is arbitrary. The reason for this is even though context sizes are increasing with every generation, they have yet to eclipse even 32k. If we somehow read back way more than that, it probably means BIOS has done something strange, or we're running on a platform that wasn't designed for this. v2: rename load/unload to init/fini (daniel) remove ILK support for get_size() (indirectly daniel) add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel) added comments (Ben) Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-04 21:42:42 +00:00
i915_gem_context.o \
i915_gem_dmabuf.o \
i915_gem_evict.o \
i915_gem_execbuffer.o \
i915_gem_fence_reg.o \
i915_gem_gtt.o \
drm/i915: Introduce an internal allocator for disposable private objects Quite a few of our objects used for internal hardware programming do not benefit from being swappable or from being zero initialised. As such they do not benefit from using a shmemfs backing storage and since they are internal and never directly exposed to the user, we do not need to worry about providing a filp. For these we can use an drm_i915_gem_object wrapper around a sg_table of plain struct page. They are not swap backed and not automatically pinned. If they are reaped by the shrinker, the pages are released and the contents discarded. For the internal use case, this is fine as for example, ringbuffers are pinned from being written by a request to be read by the hardware. Once they are idle, they can be discarded entirely. As such they are a good match for execlist ringbuffers and a small variety of other internal objects. In the first iteration, this is limited to the scratch batch buffers we use (for command parsing and state initialisation). v2: Allocate physically contiguous pages, where possible. v3: Reduce maximum order on subsequent requests following an allocation failure. v4: Fix up mismatch between swiotlb segment size and page count (it counts in 2k units, not 4k pages) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-7-chris@chris-wilson.co.uk
2016-10-28 12:58:30 +00:00
i915_gem_internal.o \
i915_gem.o \
drm/i915: Split obj->cache_coherent to track r/w Another month, another story in the cache coherency saga. This time, we come to the realisation that i915_gem_object_is_coherent() has been reporting whether we can read from the target without requiring a cache invalidate; but we were using it in places for testing whether we could write into the object without requiring a cache flush. So split the tracking into two, one to decide before reads, one after writes. See commit e27ab73d17ef ("drm/i915: Mark CPU cache as dirty on every transition for CPU writes") for the previous entry in this saga. v2: Be verbose v3: Remove unused function (i915_gem_object_is_coherent) v4: Fix inverted coherency check prior to execbuf (from v2) v5: Add comment for nasty code where we are optimising on gcc's behalf. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101109 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101555 Testcase: igt/kms_mmap_write_crc Testcase: igt/kms_pwrite_crc Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Tested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170811111116.10373-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-11 11:11:16 +00:00
i915_gem_object.o \
i915_gem_render_state.o \
i915_gem_request.o \
i915_gem_shrinker.o \
i915_gem_stolen.o \
i915_gem_tiling.o \
i915_gem_timeline.o \
drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl By exporting the ability to map user address and inserting PTEs representing their backing pages into the GTT, we can exploit UMA in order to utilize normal application data as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making the intermixed streaming of CPU and GPU operations fairly efficient. This ability has many widespread implications from faster rendering of client-side software rasterisers (chromium), mitigation of stalls due to read back (firefox) and to faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL). v2: Compile with CONFIG_MMU_NOTIFIER v3: We can sleep while performing invalidate-range, which we can utilise to drop our page references prior to the kernel manipulating the vma (for either discard or cloning) and so protect normal users. v4: Only run the invalidate notifier if the range intercepts the bo. v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers v6: Recheck after reacquire mutex for lost mmu. v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary. v8: Fix rebasing error after forwarding porting the back port. v9: Limit the userptr to page aligned entries. We now expect userspace to handle all the offset-in-page adjustments itself. v10: Prevent vma from being copied across fork to avoid issues with cow. v11: Drop vma behaviour changes -- locking is nigh on impossible. Use a worker to load user pages to avoid lock inversions. v12: Use get_task_mm()/mmput() for correct refcounting of mm. v13: Use a worker to release the mmu_notifier to avoid lock inversion v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer with its own locking and tree of objects for each mm/mmu_notifier. v15: Prevent overlapping userptr objects, and invalidate all objects within the mmu_notifier range v16: Fix a typo for iterating over multiple objects in the range and rearrange error path to destroy the mmu_notifier locklessly. Also close a race between invalidate_range and the get_pages_worker. v17: Close a race between get_pages_worker/invalidate_range and fresh allocations of the same userptr range - and notice that struct_mutex was presumed to be held when during creation it wasn't. v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory for the struct sg_table and to clear it before reporting an error. v19: Always error out on read-only userptr requests as we don't have the hardware infrastructure to support them at the moment. v20: Refuse to implement read-only support until we have the required infrastructure - but reserve the bit in flags for future use. v21: use_mm() is not required for get_user_pages(). It is only meant to be used to fix up the kernel thread's current->mm for use with copy_user(). v22: Use sg_alloc_table_from_pages for that chunky feeling v23: Export a function for sanity checking dma-buf rather than encode userptr details elsewhere, and clean up comments based on suggestions by Bradley. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Akash Goel <akash.goel@intel.com> Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Frob ioctl allocation to pick the next one - will cause a bit of fuss with create2 apparently, but such are the rules.] [danvet2: oops, forgot to git add after manual patch application] [danvet3: Appease sparse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-16 13:22:37 +00:00
i915_gem_userptr.o \
i915_gemfs.o \
i915_trace_points.o \
i915_vma.o \
drm/i915: Slaughter the thundering i915_wait_request herd One particularly stressful scenario consists of many independent tasks all competing for GPU time and waiting upon the results (e.g. realtime transcoding of many, many streams). One bottleneck in particular is that each client waits on its own results, but every client is woken up after every batchbuffer - hence the thunder of hooves as then every client must do its heavyweight dance to read a coherent seqno to see if it is the lucky one. Ideally, we only want one client to wake up after the interrupt and check its request for completion. Since the requests must retire in order, we can select the first client on the oldest request to be woken. Once that client has completed his wait, we can then wake up the next client and so on. However, all clients then incur latency as every process in the chain may be delayed for scheduling - this may also then cause some priority inversion. To reduce the latency, when a client is added or removed from the list, we scan the tree for completed seqno and wake up all the completed waiters in parallel. Using igt/benchmarks/gem_latency, we can demonstrate this effect. The benchmark measures the number of GPU cycles between completion of a batch and the client waking up from a call to wait-ioctl. With many concurrent waiters, with each on a different request, we observe that the wakeup latency before the patch scales nearly linearly with the number of waiters (before external factors kick in making the scaling much worse). After applying the patch, we can see that only the single waiter for the request is being woken up, providing a constant wakeup latency for every operation. However, the situation is not quite as rosy for many waiters on the same request, though to the best of my knowledge this is much less likely in practice. Here, we can observe that the concurrent waiters incur extra latency from being woken up by the solitary bottom-half, rather than directly by the interrupt. This appears to be scheduler induced (having discounted adverse effects from having a rbtree walk/erase in the wakeup path), each additional wake_up_process() costs approximately 1us on big core. Another effect of performing the secondary wakeups from the first bottom-half is the incurred delay this imposes on high priority threads - rather than immediately returning to userspace and leaving the interrupt handler to wake the others. To offset the delay incurred with additional waiters on a request, we could use a hybrid scheme that did a quick read in the interrupt handler and dequeued all the completed waiters (incurring the overhead in the interrupt handler, not the best plan either as we then incur GPU submission latency) but we would still have to wake up the bottom-half every time to do the heavyweight slow read. Or we could only kick the waiters on the seqno with the same priority as the current task (i.e. in the realtime waiter scenario, only it is woken up immediately by the interrupt and simply queues the next waiter before returning to userspace, minimising its delay at the expense of the chain, and also reducing contention on its scheduler runqueue). This is effective at avoid long pauses in the interrupt handler and at avoiding the extra latency in realtime/high-priority waiters. v2: Convert from a kworker per engine into a dedicated kthread for the bottom-half. v3: Rename request members and tweak comments. v4: Use a per-engine spinlock in the breadcrumbs bottom-half. v5: Fix race in locklessly checking waiter status and kicking the task on adding a new waiter. v6: Fix deciding when to force the timer to hide missing interrupts. v7: Move the bottom-half from the kthread to the first client process. v8: Reword a few comments v9: Break the busy loop when the interrupt is unmasked or has fired. v10: Comments, unnecessary churn, better debugging from Tvrtko v11: Wake all completed waiters on removing the current bottom-half to reduce the latency of waking up a herd of clients all waiting on the same request. v12: Rearrange missed-interrupt fault injection so that it works with igt/drv_missed_irq_hang v13: Rename intel_breadcrumb and friends to intel_wait in preparation for signal handling. v14: RCU commentary, assert_spin_locked v15: Hide BUG_ON behind the compiler; report on gem_latency findings. v16: Sort seqno-groups by priority so that first-waiter has the highest task priority (and so avoid priority inversion). v17: Add waiters to post-mortem GPU hang state. v18: Return early for a completed wait after acquiring the spinlock. Avoids adding ourselves to the tree if the is already complete, and skips the awkward question of why we don't do completion wakeups for waits earlier than or equal to ourselves. v19: Prepare for init_breadcrumbs to fail. Later patches may want to allocate during init, so be prepared to propagate back the error code. Testcase: igt/gem_concurrent_blit Testcase: igt/benchmarks/gem_latency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: "Goel, Akash" <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18 Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-01 16:23:15 +00:00
intel_breadcrumbs.o \
intel_engine_cs.o \
intel_hangcheck.o \
intel_lrc.o \
drm/i915: Added Programming of the MOCS This change adds the programming of the MOCS registers to the gen 9+ platforms. The set of MOCS configuration entries introduced by this patch is intended to be minimal but sufficient to cover the needs of current userspace - i.e. a good set of defaults. It is expected to be extended in the future to provide further default values or to allow userspace to redefine its private MOCS tables based on its demand for additional caching configurations. In this setup, userspace should only utilize the first N entries, higher entries are reserved for future use. It creates a fixed register set that is programmed across the different engines so that all engines have the same table. This is done as the main RCS context only holds the registers for itself and the shared L3 values. By trying to keep the registers consistent across the different engines it should make the programming for the registers consistent. v2: -'static const' for private data structures and style changes.(Matt Turner) v3: - Make the tables "slightly" more readable. (Damien Lespiau) - Updated tables fix performance regression. v4: - Code formatting. (Chris Wilson) - re-privatised mocs code. (Daniel Vetter) v5: - Changed the name of a function. (Chris Wilson) v6: - re-based - Added Mesa table entry (skylake & broxton) (Francisco Jerez) - Tidied up the readability defines (Francisco Jerez) - NUMBER of entries defines wrong. (Jim Bish) - Added comments to clear up the meaning of the tables (Jim Bish) Signed-off-by: Peter Antoine <peter.antoine@intel.com> v7 (Francisco Jerez): - Don't write L3-specific MOCS_ESC/SCC values into the e/LLC control tables. Prefix L3-specific defines consistently with L3_ and e/LLC-specific defines with LE_ to avoid this kind of confusion in the future. - Change L3CC WT define back to RESERVED (matches my hardware documentation and the original patch, probably a misunderstanding of my own previous comment). - Drop Android tables, define new minimal tables more suitable for the open source stack. - Add comment that the MOCS tables are part of the kernel ABI. - Move intel_logical_ring_begin() and _advance() calls one level down (Chris Wilson). - Minor formatting and style fixes. v8 (Francisco Jerez): - Add table size sanity check to emit_mocs_control/l3cc_table() (Chris Wilson). - Add comment about undefined entries being implicitly set to uncached for forwards compatibility. v9 (Francisco Jerez): - Minor style fixes. Signed-off-by: Francisco Jerez <currojerez@riseup.net> Acked-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-07-10 17:13:11 +00:00
intel_mocs.o \
intel_ringbuffer.o \
intel_uncore.o
# general-purpose microcontroller (GuC) support
i915-y += intel_uc.o \
intel_uc_fw.o \
intel_guc.o \
intel_guc_ct.o \
intel_guc_fw.o \
intel_guc_log.o \
intel_guc_submission.o \
intel_huc.o
# autogenerated null render state
i915-y += intel_renderstate_gen6.o \
intel_renderstate_gen7.o \
intel_renderstate_gen8.o \
intel_renderstate_gen9.o
# modesetting core code
i915-y += intel_audio.o \
intel_atomic.o \
intel_atomic_plane.o \
intel_bios.o \
intel_cdclk.o \
intel_color.o \
intel_display.o \
intel_dpio_phy.o \
intel_dpll_mgr.o \
intel_fbc.o \
intel_fifo_underrun.o \
intel_frontbuffer.o \
intel_hotplug.o \
intel_modes.o \
intel_overlay.o \
intel_psr.o \
intel_sideband.o \
intel_sprite.o
i915-$(CONFIG_ACPI) += intel_acpi.o intel_opregion.o
i915-$(CONFIG_DRM_FBDEV_EMULATION) += intel_fbdev.o
# modesetting output/encoder code
i915-y += dvo_ch7017.o \
dvo_ch7xxx.o \
dvo_ivch.o \
dvo_ns2501.o \
dvo_sil164.o \
dvo_tfp410.o \
intel_crt.o \
intel_ddi.o \
intel_dp_aux_backlight.o \
intel_dp_link_training.o \
2014-05-02 04:02:48 +00:00
intel_dp_mst.o \
intel_dp.o \
intel_dsi.o \
intel_dsi_dcs_backlight.o \
intel_dsi_pll.o \
intel_dsi_vbt.o \
intel_dvo.o \
intel_hdmi.o \
intel_i2c.o \
intel_lspcon.o \
intel_lvds.o \
intel_panel.o \
intel_sdvo.o \
intel_tv.o
# Post-mortem debug and GPU hang state capture
i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
drm/i915: Provide a hook for selftests Some pieces of code are independent of hardware but are very tricky to exercise through the normal userspace ABI or via debugfs hooks. Being able to create mock unit tests and execute them through CI is vital. Start by adding a central point where we can execute unit tests and a parameter to enable them. This is disabled by default as the expectation is that these tests will occasionally explode. To facilitate integration with igt, any parameter beginning with i915.igt__ is interpreted as a subtest executable independently via igt/drv_selftest. Two classes of selftests are recognised: mock unit tests and integration tests. Mock unit tests are run as soon as the module is loaded, before the device is probed. At that point there is no driver instantiated and all hw interactions must be "mocked". This is very useful for writing universal tests to exercise code not typically run on a broad range of architectures. Alternatively, you can hook into the live selftests and run when the device has been instantiated - hw interactions are real. v2: Add a macro for compiling conditional code for mock objects inside real objects. v3: Differentiate between mock unit tests and late integration test. v4: List the tests in natural order, use igt to sort after modparam. v5: s/late/live/ v6: s/unsigned long/unsigned int/ v7: Use igt_ prefixes for long helpers. v8: Deobfuscate macros overriding functions, stop using -I$(src) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-1-chris@chris-wilson.co.uk
2017-02-13 17:15:12 +00:00
i915-$(CONFIG_DRM_I915_SELFTEST) += \
selftests/i915_random.o \
selftests/i915_selftest.o
drm/i915: Introduce a PV INFO page structure for Intel GVT-g. Introduce a PV INFO structure, to facilitate the Intel GVT-g technology, which is a GPU virtualization solution with mediated pass-through. This page contains the shared information between i915 driver and the host emulator. For now, this structure utilizes an area of 4K bytes on HSW GPU's unused MMIO space. Future hardware will have the reserved window architecturally defined, and layout of the page will be added in future BSpec. The i915 driver load routine detects if it is running in a VM by reading the contents of this PV INFO page. Thereafter a flag, vgpu.active is set, and intel_vgpu_active() is used by checking this flag to conclude if GPU is virtualized with Intel GVT-g. By now, intel_vgpu_active() will return true, only when the driver is running as a guest in the Intel GVT-g enhanced environment on HSW platform. v2: take Chris' comments: - call the i915_check_vgpu() in intel_uncore_init() - sanitize i915_check_vgpu() by adding BUILD_BUG_ON() and debug info take Daniel's comments: - put the definition of PV INFO into a new header - i915_vgt_if.h other changes: - access mmio regs by readq/readw in i915_check_vgpu() v3: take Daniel's comments: - move the i915/vgt interfaces into a new i915_vgpu.c - update makefile - add kerneldoc to functions which are non-static - add a DOC: section describing some of the high-level design - update drm docbook other changes: - rename i915_vgt_if.h to i915_vgpu.h v4: take Tvrtko's comments: - fix a typo in commit message - add debug message when vgt version mismatches - rename low_gmadr/high_gmadr to mappable/non-mappable in PV INFO structure Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Eddie Dong <eddie.dong@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-02-10 11:05:47 +00:00
# virtual gpu code
i915-y += i915_vgpu.o
drm/i915: Add i915 perf infrastructure Adds base i915 perf infrastructure for Gen performance metrics. This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64 properties to configure a stream of metrics and returns a new fd usable with standard VFS system calls including read() to read typed and sized records; ioctl() to enable or disable capture and poll() to wait for data. A stream is opened something like: uint64_t properties[] = { /* Single context sampling */ DRM_I915_PERF_PROP_CTX_HANDLE, ctx_handle, /* Include OA reports in samples */ DRM_I915_PERF_PROP_SAMPLE_OA, true, /* OA unit configuration */ DRM_I915_PERF_PROP_OA_METRICS_SET, metrics_set_id, DRM_I915_PERF_PROP_OA_FORMAT, report_format, DRM_I915_PERF_PROP_OA_EXPONENT, period_exponent, }; struct drm_i915_perf_open_param parm = { .flags = I915_PERF_FLAG_FD_CLOEXEC | I915_PERF_FLAG_FD_NONBLOCK | I915_PERF_FLAG_DISABLED, .properties_ptr = (uint64_t)properties, .num_properties = sizeof(properties) / 16, }; int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param); Records read all start with a common { type, size } header with DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records contain an extensible number of fields and it's the DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that determine what's included in every sample. No specific streams are supported yet so any attempt to open a stream will return an error. v2: use i915_gem_context_get() - Chris Wilson v3: update read() interface to avoid passing state struct - Chris Wilson fix some rebase fallout, with i915-perf init/deinit v4: s/DRM_IORW/DRM_IOW/ - Emil Velikov Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-2-robert@sixbynine.org
2016-11-07 19:49:47 +00:00
# perf code
i915-y += i915_perf.o \
i915_oa_hsw.o \
i915_oa_bdw.o \
i915_oa_chv.o \
i915_oa_sklgt2.o \
i915_oa_sklgt3.o \
i915_oa_sklgt4.o \
i915_oa_bxt.o \
i915_oa_kblgt2.o \
i915_oa_kblgt3.o \
i915_oa_glk.o \
i915_oa_cflgt2.o \
i915_oa_cflgt3.o \
i915_oa_cnl.o
drm/i915: Add i915 perf infrastructure Adds base i915 perf infrastructure for Gen performance metrics. This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64 properties to configure a stream of metrics and returns a new fd usable with standard VFS system calls including read() to read typed and sized records; ioctl() to enable or disable capture and poll() to wait for data. A stream is opened something like: uint64_t properties[] = { /* Single context sampling */ DRM_I915_PERF_PROP_CTX_HANDLE, ctx_handle, /* Include OA reports in samples */ DRM_I915_PERF_PROP_SAMPLE_OA, true, /* OA unit configuration */ DRM_I915_PERF_PROP_OA_METRICS_SET, metrics_set_id, DRM_I915_PERF_PROP_OA_FORMAT, report_format, DRM_I915_PERF_PROP_OA_EXPONENT, period_exponent, }; struct drm_i915_perf_open_param parm = { .flags = I915_PERF_FLAG_FD_CLOEXEC | I915_PERF_FLAG_FD_NONBLOCK | I915_PERF_FLAG_DISABLED, .properties_ptr = (uint64_t)properties, .num_properties = sizeof(properties) / 16, }; int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param); Records read all start with a common { type, size } header with DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records contain an extensible number of fields and it's the DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that determine what's included in every sample. No specific streams are supported yet so any attempt to open a stream will return an error. v2: use i915_gem_context_get() - Chris Wilson v3: update read() interface to avoid passing state struct - Chris Wilson fix some rebase fallout, with i915-perf init/deinit v4: s/DRM_IORW/DRM_IOW/ - Emil Velikov Signed-off-by: Robert Bragg <robert@sixbynine.org> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-2-robert@sixbynine.org
2016-11-07 19:49:47 +00:00
drm/i915: gvt: Introduce the basic architecture of GVT-g This patch introduces the very basic framework of GVT-g device model, includes basic prototypes, definitions, initialization. v12: - Call intel_gvt_init() in driver early initialization stage. (Chris) v8: - Remove the GVT idr and mutex in intel_gvt_host. (Joonas) v7: - Refine the URL link in Kconfig. (Joonas) - Refine the introduction of GVT-g host support in Kconfig. (Joonas) - Remove the macro GVT_ALIGN(), use round_down() instead. (Joonas) - Make "struct intel_gvt" a data member in struct drm_i915_private.(Joonas) - Remove {alloc, free}_gvt_device() - Rename intel_gvt_{create, destroy}_gvt_device() - Expost intel_gvt_init_host() - Remove the dummy "struct intel_gvt" declaration in intel_gvt.h (Joonas) v6: - Refine introduction in Kconfig. (Chris) - The exposed API functions will take struct intel_gvt * instead of void *. (Chris/Tvrtko) - Remove most memebers of strct intel_gvt_device_info. Will add them in the device model patches.(Chris) - Remove gvt_info() and gvt_err() in debug.h. (Chris) - Move GVT kernel parameter into i915_params. (Chris) - Remove include/drm/i915_gvt.h, as GVT-g will be built within i915. - Remove the redundant struct i915_gvt *, as the functions in i915 will directly take struct intel_gvt *. - Add more comments for reviewer. v5: Take Tvrtko's comments: - Fix the misspelled words in Kconfig - Let functions take drm_i915_private * instead of struct drm_device * - Remove redundant prints/local varible initialization v3: Take Joonas' comments: - Change file name i915_gvt.* to intel_gvt.* - Move GVT kernel parameter into intel_gvt.c - Remove redundant debug macros - Change error handling style - Add introductions for some stub functions - Introduce drm/i915_gvt.h. Take Kevin's comments: - Move GVT-g host/guest check into intel_vgt_balloon in i915_gem_gtt.c v2: - Introduce i915_gvt.c. It's necessary to introduce the stubs between i915 driver and GVT-g host, as GVT-g components is configurable in kernel config. When disabled, the stubs here do nothing. Take Joonas' comments: - Replace boolean return value with int. - Replace customized info/warn/debug macros with DRM macros. - Document all non-static functions like i915. - Remove empty and unused functions. - Replace magic number with marcos. - Set GVT-g in kernel config to "n" by default. Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-5-git-send-email-zhi.a.wang@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-06-16 12:07:00 +00:00
ifeq ($(CONFIG_DRM_I915_GVT),y)
i915-y += intel_gvt.o
include $(src)/gvt/Makefile
endif
# LPE Audio for VLV and CHT
i915-y += intel_lpe_audio.o
obj-$(CONFIG_DRM_I915) += i915.o